Skip to content

Add mailmap support.#4586

Merged
ethomson merged 25 commits intolibgit2:masterfrom
emilio:mailmap
Jun 18, 2018
Merged

Add mailmap support.#4586
ethomson merged 25 commits intolibgit2:masterfrom
emilio:mailmap

Conversation

@emilio
Copy link
Copy Markdown
Contributor

@emilio emilio commented Mar 19, 2018

Fixes #4565.

@emilio
Copy link
Copy Markdown
Contributor Author

emilio commented Mar 19, 2018

cc @mystor, who ended up doing most of the work :)

@mystor
Copy link
Copy Markdown
Contributor

mystor commented Mar 19, 2018

I figure I'll add a high level summary of the API @emilio and I ended up making here:

  1. There is a new git_mailmap opaque struct which can be created from either a repository, or a byte buffer. It contains mailmap entries.
    a) When creating from a bare repository, the .mailmap file will be looked up in the root tree of HEAD
    b) When creating from a non-bare repository, the working directory will be checked for the .mailmap.
  2. This struct provides methods for accessing the individual entries (git_mailmap_entry_byindex), and resolving name/email pairs to real names/emails (git_mailmap_resolve).
  3. The mailmap APIs are also integrated into signatures, commits, and blame
    a) git_signature_with_mailmap resolves a signature to the real name/email.
    b) git_commit_{committer,author}_with_mailmap gets the committer/author's real name/email by using the given mailmap.
    c) A new flag, GIT_BLAME_USE_MAILMAP was added to the blame options. When specified, the mailmap from the repo will be used to ensure that the signatures in the blame chunk output use real names/emails.

This should be a fully non-breaking change, as it exclusively adds APIs, and doesn't change the behaviour of any existing functions. Mailmaps are never used implicitly in the API, and have to be requested.

I also added a bunch of tests into tests/mailmap/**.

Copy link
Copy Markdown
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!

Thanks a lot for this PR. In already does look great, especially considering it is your first contribution to libgit2. There's still some issues that need to be fixed up, but it's mostly just stylistic things you probably couldn't know any better, so don't feel discouraged ;)

There's only one thing where I'm a bit hesitant. I've recently started to use a common parser interface for parsing data such that we do not have to duplicate parser structures all over. It would've been cool if this PR used that interface. I don't think its a necessity to get this merged, but feedback and more users for that API would definitly be nice.

Comment thread include/git2/mailmap.h Outdated
* @param size size of the raw data buffer
* @return 0 on success
*/
GIT_EXTERN(int) git_mailmap_parse(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with git_mailmap_from_repo, a function name git_mailmap_from_buffer might in fact be favorable. This is only a proposal, the name is fine as is.

Comment thread include/git2/mailmap.h Outdated
* Resolve a name and email to the corresponding real name and email.
*
* @param name_out either 'name', or the real name to use.
* You should NOT free this value.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're simply returning either the original name/email or the mailmap-allocated ones? If so, you should state that lifetime of the out parameters is tied to mailmap and name/email

Comment thread include/git2/mailmap.h
* @param name the name to perform the lookup with.
* @param email the email to perform the lookup with.
* @return the corresponding mailmap entry, or NULL if it cannot be found.
*/
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can either name or email be NULL?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, although theoretically that could be handled.

Comment thread include/git2/mailmap.h Outdated
*
* @param out pointer to store the mailmap
* @param repo repository to find the .mailmap in
* @return 0 on success; GIT_ENOTFOUND if .mailmap could not be found.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise an error code?

Comment thread include/git2/mailmap.h Outdated
* @param out pointer to store the mailmap
* @param data raw data buffer to parse
* @param size size of the raw data buffer
* @return 0 on success
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise an error code?

Comment thread tests/mailmap/blame.c
cl_git_sandbox_cleanup();
g_repo = NULL;

git_blame_free(g_blame);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably free the blame before freeing the repo

Comment thread tests/mailmap/parsing.c Outdated
&resolved_email,
mailmap,
resolved[idx].replace_name,
resolved[idx].replace_email);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen this several times now, with overly many newlines. We usually don't use that many newlines throughout our codebase. The only thing we try to stick to is a maximum line length of 80 characters

Comment thread tests/mailmap/parsing.c Outdated

/* Check that the resolver behaves correctly */
for (idx = 0; idx < resolved_size; ++idx) {
git_mailmap_resolve(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should verify the return code with cl_git_pass

Comment thread tests/mailmap/parsing.c Outdated

void test_mailmap_parsing__string(void)
{
cl_check_pass(git_mailmap_parse(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't use cl_check_pass but cl_git_pass

@@ -0,0 +1,15 @@
#!/bin/sh
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably remove all these hooks just to keep the churn as small as necessary

Copy link
Copy Markdown
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for amending! I'm happy to see that you've used the git_parse_ctx stuff now. I left a few comments where I feel like the parsing interface is still awkward to use -- nothing that you need to address, just some notes for myself.

Comment thread src/mailmap.c Outdated
void git_mailmap_resolve(
const char **name_out, const char **email_out,
const git_mailmap *mailmap,
const char *name, const char *email)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice this was returning void at the first pass. I think you should return an error code here -- while the function cannot fail right now, you never know how you/we'll have to extend that function in the future. So returning error codes right now and having all callers check for failures is the right thing to do.

Comment thread src/mailmap.c
}

static int advance_until(
const char **start, size_t *len, git_parse_ctx *ctx, char needle)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably be pulled up into the parser interface itself.

Comment thread src/mailmap.c
{
*start = ctx->line;
while (ctx->line_len > 0 && *ctx->line != '#' && *ctx->line != needle)
git_parse_advance_chars(ctx, 1);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an interface git_parse_advance_char(&c, ctx, 1); would be nice to have. Advance one character and put the next one into c.

Comment thread src/mailmap.c
}

/* Parse a single entry from a mailmap file.
*
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-line comments should start out with an empty /*:

/*
 * Parse a...
 */

Comment thread src/mailmap.c
/* Parse the real name */
git_parse_advance_ws(ctx);
if (advance_until(&start, &len, ctx, '<') < 0)
return -1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (...) {
    giterr_set(GITERR_MAILMAP, "could not parse real name");
    return -1;
}

You'd have to define GITERR_MAILMAP, but I think it can be helpful to detail what's wrong here

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always ignore these parser errors & just skip the line (hence the potential desire for warnings). Should we still set giterr if we won't produce a visible error code?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that a bit later and thought to have deleted that comment. Existing code is fine as-is.

Comment thread tests/mailmap/basic.c Outdated
void test_mailmap_basic__initialize(void)
{
git_buf buf = GIT_BUF_INIT;
git_buf_attach_notowned(&buf, TEST_MAILMAP, sizeof(TEST_MAILMAP) - 1);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just use strlen(TEST_MAILMAP) here, which is a bit more obvious. The compiler should do it's thing and optimize that out

Comment thread tests/mailmap/basic.c
"Blatantly invalid line\n"
"Foo bar <foo@bar.com> <foo@bal.com>\n"
"<email@foo.com> <otheremail@foo.com>\n"
"<email@foo.com> Other Name <yetanotheremail@foo.com>\n";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this one is valid. At least git-check-mailmap does not mention this syntax

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git's documentation doesn't mention this format, but when I was testing mailmap, IIRC this format is supported.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, fair enough

Comment thread tests/mailmap/basic.c Outdated
cl_assert(!git__strcmp(entry->replace_email, "foo@baz.com"));

entry = git_mailmap_entry_byindex(mailmap, 10000);
cl_assert(!entry);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd put this one into a separate test. A test should only check one thing

Comment thread tests/mailmap/basic.c Outdated
entry = git_mailmap_entry_byindex(mailmap, 0);
cl_assert(entry);
cl_assert(!entry->replace_name);
cl_assert(!git__strcmp(entry->replace_email, "foo@baz.com"));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love if you'd validate all four entries.

struct {
    char *real_name;
    char *real_mail;
    char *replace_name;
    char *replace_mail;
} expected[] = {
    { "Foo bar", "foo@bar.com", "foo@baz.com" },
   ...
};

for (i = 0; i < git_mailmap_entry_count(mailmap); i++) {
    entry_byindex(mailmap, i);
   assert_all_things();
}

Comment thread tests/mailmap/basic.c Outdated
const git_mailmap_entry *entry = git_mailmap_entry_lookup(
mailmap, "Typoed the name once", "foo@baz.com");
cl_assert(entry);
cl_assert(!git__strcmp(entry->real_name, "Foo bar"));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's cl_assert_equal_s, for this one and the following tests

@mystor
Copy link
Copy Markdown
Contributor

mystor commented Apr 8, 2018

@pks-t So, I've ended up making a few more changes to the patch than I expected to. While I was looking at the documentation for git's mailmap support more closely I realized that I was looking up mailmap entries for a repository incorrectly. git has a specific way of doing it, as well as a set of configs (mailmap.blob and mailmap.file) which override each other in a particular order. Unfortunately, this behaviour was incompatible with the API we were exposing from the mailmap module.

I've rewritten the API to support the correct overriding behaviour, but it has caused a fairly large surface change. As an example, the individual entry objects are no longer part of the public API, and you can't index into the mailmap, as that doesn't make as much sense with the correct overriding behaviour. I'm mostly glad that I caught this before we merged it & had to make a breaking change >.<.

I decided to continue to implement the backing store as a sorted git_vector, rather than using a git_strmap or similar, because it was simpler to implement, and required less moving parts.

@pks-t
Copy link
Copy Markdown
Member

pks-t commented Apr 20, 2018

Push for myself such that this thread is further up in my mailbox and I remember to review this soon

Copy link
Copy Markdown
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating, finally here's another round of feedback. The issues are becoming less grave and more of stylistic nature, which is a good sign that this PR is on a good track.

Comment thread include/git2/signature.h Outdated
* @param mailmap mailmap to resolve with
* @return 0 or an error code
*/
GIT_EXTERN(int) git_signature_with_mailmap(git_signature **out, const git_signature *sig, const git_mailmap *mailmap);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unsure about this function. git_mailmap_resolve is in fact nearly the exact same thing, with the only difference being that this one here uses signatures instead of strings. So maybe move this into mailmap.h and call it git_mailmap_resolve_signature?

Comment thread include/git2/types.h Outdated
void (*free)(git_writestream *stream);
};

/** A parsed representation of a .mailmap file. */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessarily parsed in case you've manually added entries or in case you've just allocated it, without parsing from anywhere. So I'd just drop "parsed" here

Comment thread include/git2/mailmap.h Outdated
* @param buf the buffer to read the mailmap file from
* @return 0 on success, or an error code
*/
GIT_EXTERN(int) git_mailmap_add_buffer(git_mailmap *mm, const git_buf *buf);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this interface is external, we should probably not use a buffer here but a plain const char * pointer with an (optional) size_t len. In case len is 0, we should then fall back to use strlen. Using buffers would only be a nuisance for library users, as we don't expose most of its interface.

Comment thread src/mailmap.c
error = git_vector_insert_sorted(
&mm->entries, entry, mailmap_entry_replace);
if (error < 0 && error != GIT_EEXISTS)
goto cleanup;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all duplicated code. Just use git_mailmap_add_entry

Comment thread src/mailmap.c Outdated
cleanup:
mailmap_entry_free(entry);

/* We never allocate data in these buffers, but better safe than sorry */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do allocate data, but we detach the buffers so that they don't own it anymore. Slight difference ;)

Comment thread src/mailmap.c Outdated
}

static int mailmap_add_blob(
git_mailmap *mm, git_repository *repo, const char *spec)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spec is a bit weird. rev or revision would probably be more obvious

Comment thread src/mailmap.c Outdated
ssize_t fallback = -1;
size_t idx;
git_mailmap_entry *entry;
git_mailmap_entry needle = { NULL, NULL, NULL, (char *)email };
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say explicit initialization would be a bit nicer here. All struct members are char pointers, so in the unlikely case we ever rearrange its members, this would silently continue to work. So just zeroing out and then explicitly assigning to the fields would be a bit nicer.

Comment thread tests/mailmap/blame.c Outdated

cl_git_pass(git_blame_file(&g_blame, g_repo, "file.txt", &opts));
if (!g_blame)
return;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unexpected. We just silently abort the test, but in fact we should always exactly know what to expect (and error if g_blame is not set)

Comment thread tests/mailmap/blame.c Outdated

cl_git_pass(git_blame_file(&g_blame, g_repo, "file.txt", &opts));
if (!g_blame)
return;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

const char *real_email;
const char *replace_name;
const char *replace_email;
} mailmap_entry;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"mailmap_helpers.h" is a bit weird, as there are no helpers in here. It's more like data, so "mailmap_testdata.h" would fit better

@mystor
Copy link
Copy Markdown
Contributor

mystor commented May 18, 2018

@pks-t reminder that I posted some changes :-)

@pks-t
Copy link
Copy Markdown
Member

pks-t commented May 23, 2018

Thanks for the reminder, @mystor. I'll review this PR no later than this Friday

@pks-t
Copy link
Copy Markdown
Member

pks-t commented May 25, 2018

I'm happy with this PR. There's a memory leak introduced, though, as you don't ever free the mm->entries vector.

@mystor
Copy link
Copy Markdown
Contributor

mystor commented Jun 5, 2018

@pks-t I've fixed the memory leak you pointed out. Sorry for the delay. Is there anything else I should be doing for this PR?

@pks-t
Copy link
Copy Markdown
Member

pks-t commented Jun 7, 2018

Great, CI doesn't mention any memory leaks anymore. @ethomson, got any more comments on this PR?

Comment thread include/git2/mailmap.h
* @param replace_email the email to replace
* @return 0 if it was added, EEXISTS if it replaced an entry, or an error code
*/
GIT_EXTERN(int) git_mailmap_add_entry(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for the last-minute review, but I don't love the EEXISTS return here. That really makes it sound like an error case, not a success. Do you think that "replaced an entry" is really such a valuable state to know? If so, what do you think about an out-param that indicates that instead of overloading the return value? Curious what @pks-t thinks.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reasoning behind this decision was to match the git_vector APIs here. Most consumers of the mailmap API will probably just get it from a git_repository which just ignores errors like this.

I'd probably be fine with just not communicating whether or not an entry was replaced entirely.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe an int force parameter (instead of a int *replaced) ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably be fine with just not communicating whether or not an entry was replaced entirely.

I'm +1 on this.

@mystor
Copy link
Copy Markdown
Contributor

mystor commented Jun 14, 2018

@ethomson @pks-t I just hid the EEXISTS error from the public API - anything else I should change or is this patch ready to go now?

@pks-t
Copy link
Copy Markdown
Member

pks-t commented Jun 15, 2018

I'm still fine with this PR as is. :)

Copy link
Copy Markdown
Member

@ethomson ethomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only have two minor review comments. (Sorry I missed these during the first review.) Overall, this is a really great first contribution. Thanks @emilio and @mystor for putting together a large change that fits incredibly well stylistically with the rest of the codebase. 😀

Comment thread include/git2/mailmap.h Outdated
*
* @param out pointer to store the new mailmap
* @param buf buffer to parse the mailmap from
* @param len the length of the input buffer [optional: 0 defaults to 'strlen']
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a pattern that we have been using - our APIs either take a NUL-terminated string or a buffer with length. (And if it's the latter, then 0 really means 0.) I think it would be confusing to have an API that differs from this pattern.

I don't have a strong opinion on which way this should go - it seems like we should just take a buf? I'm happy either way but the 0 means we do the strlen doesn't align with the rest of our APIs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It used to take a buf, and I changed it in an earlier patch. I'm fine with switching back to that if it's what we want to do.

Comment thread include/git2/mailmap.h Outdated
* @param len the length of the input buffer [optional: 0 defaults to 'strlen']
* @return 0 on success, or an error code
*/
GIT_EXTERN(int) git_mailmap_add_buffer(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how much utility there is in making this a public method? It seems like most uses would prefer the git_mailmap_from_buffer. I'd be perfectly happy seeing this be an internal API.

If it does stay public (which is fine, if you have a use for it or think others will) then I do think that the name does not capture exactly what it does (at least when viewed through the lens of the names of our current APIs - add_buffer sounds like it will add a buffer, not add from a buffer).

I would propose git_mailmap_parse? Also, the same notes about the 0 len apply as with git_mailmap_from_buffer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'm fine with making this internal. Not super attached to any of the public APIs other than the from_repo one.

@mystor
Copy link
Copy Markdown
Contributor

mystor commented Jun 17, 2018

@ethomson Made the changes you requested :-)

@ethomson
Copy link
Copy Markdown
Member

Thanks again @emilio and @mystor. Sorry it took so long to get this reviewed and merged, but I'm really happy that you contributed this. I hope that you'll continue to contribute more in the future. 😃

@ethomson ethomson merged commit 96882f2 into libgit2:master Jun 18, 2018
@emilio
Copy link
Copy Markdown
Contributor Author

emilio commented Jun 18, 2018

@mystor did most of the work actually, so thank her! :). Thanks for all the reviews and for getting this merged :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants