Conversation
|
cc @mystor, who ended up doing most of the work :) |
|
I figure I'll add a high level summary of the API @emilio and I ended up making here:
This should be a fully non-breaking change, as it exclusively adds APIs, and doesn't change the behaviour of any existing functions. Mailmaps are never used implicitly in the API, and have to be requested. I also added a bunch of tests into tests/mailmap/**. |
pks-t
left a comment
There was a problem hiding this comment.
Hi!
Thanks a lot for this PR. In already does look great, especially considering it is your first contribution to libgit2. There's still some issues that need to be fixed up, but it's mostly just stylistic things you probably couldn't know any better, so don't feel discouraged ;)
There's only one thing where I'm a bit hesitant. I've recently started to use a common parser interface for parsing data such that we do not have to duplicate parser structures all over. It would've been cool if this PR used that interface. I don't think its a necessity to get this merged, but feedback and more users for that API would definitly be nice.
| * @param size size of the raw data buffer | ||
| * @return 0 on success | ||
| */ | ||
| GIT_EXTERN(int) git_mailmap_parse( |
There was a problem hiding this comment.
To be consistent with git_mailmap_from_repo, a function name git_mailmap_from_buffer might in fact be favorable. This is only a proposal, the name is fine as is.
| * Resolve a name and email to the corresponding real name and email. | ||
| * | ||
| * @param name_out either 'name', or the real name to use. | ||
| * You should NOT free this value. |
There was a problem hiding this comment.
So you're simply returning either the original name/email or the mailmap-allocated ones? If so, you should state that lifetime of the out parameters is tied to mailmap and name/email
| * @param name the name to perform the lookup with. | ||
| * @param email the email to perform the lookup with. | ||
| * @return the corresponding mailmap entry, or NULL if it cannot be found. | ||
| */ |
There was a problem hiding this comment.
Can either name or email be NULL?
There was a problem hiding this comment.
No, although theoretically that could be handled.
| * | ||
| * @param out pointer to store the mailmap | ||
| * @param repo repository to find the .mailmap in | ||
| * @return 0 on success; GIT_ENOTFOUND if .mailmap could not be found. |
| * @param out pointer to store the mailmap | ||
| * @param data raw data buffer to parse | ||
| * @param size size of the raw data buffer | ||
| * @return 0 on success |
| cl_git_sandbox_cleanup(); | ||
| g_repo = NULL; | ||
|
|
||
| git_blame_free(g_blame); |
There was a problem hiding this comment.
You should probably free the blame before freeing the repo
| &resolved_email, | ||
| mailmap, | ||
| resolved[idx].replace_name, | ||
| resolved[idx].replace_email); |
There was a problem hiding this comment.
I've seen this several times now, with overly many newlines. We usually don't use that many newlines throughout our codebase. The only thing we try to stick to is a maximum line length of 80 characters
|
|
||
| /* Check that the resolver behaves correctly */ | ||
| for (idx = 0; idx < resolved_size; ++idx) { | ||
| git_mailmap_resolve( |
There was a problem hiding this comment.
You should verify the return code with cl_git_pass
|
|
||
| void test_mailmap_parsing__string(void) | ||
| { | ||
| cl_check_pass(git_mailmap_parse( |
There was a problem hiding this comment.
You shouldn't use cl_check_pass but cl_git_pass
| @@ -0,0 +1,15 @@ | |||
| #!/bin/sh | |||
There was a problem hiding this comment.
You should probably remove all these hooks just to keep the churn as small as necessary
pks-t
left a comment
There was a problem hiding this comment.
Thanks for amending! I'm happy to see that you've used the git_parse_ctx stuff now. I left a few comments where I feel like the parsing interface is still awkward to use -- nothing that you need to address, just some notes for myself.
| void git_mailmap_resolve( | ||
| const char **name_out, const char **email_out, | ||
| const git_mailmap *mailmap, | ||
| const char *name, const char *email) |
There was a problem hiding this comment.
I didn't notice this was returning void at the first pass. I think you should return an error code here -- while the function cannot fail right now, you never know how you/we'll have to extend that function in the future. So returning error codes right now and having all callers check for failures is the right thing to do.
| } | ||
|
|
||
| static int advance_until( | ||
| const char **start, size_t *len, git_parse_ctx *ctx, char needle) |
There was a problem hiding this comment.
This can probably be pulled up into the parser interface itself.
| { | ||
| *start = ctx->line; | ||
| while (ctx->line_len > 0 && *ctx->line != '#' && *ctx->line != needle) | ||
| git_parse_advance_chars(ctx, 1); |
There was a problem hiding this comment.
I think an interface git_parse_advance_char(&c, ctx, 1); would be nice to have. Advance one character and put the next one into c.
| } | ||
|
|
||
| /* Parse a single entry from a mailmap file. | ||
| * |
There was a problem hiding this comment.
Multi-line comments should start out with an empty /*:
/*
* Parse a...
*/
| /* Parse the real name */ | ||
| git_parse_advance_ws(ctx); | ||
| if (advance_until(&start, &len, ctx, '<') < 0) | ||
| return -1; |
There was a problem hiding this comment.
if (...) {
giterr_set(GITERR_MAILMAP, "could not parse real name");
return -1;
}
You'd have to define GITERR_MAILMAP, but I think it can be helpful to detail what's wrong here
There was a problem hiding this comment.
We always ignore these parser errors & just skip the line (hence the potential desire for warnings). Should we still set giterr if we won't produce a visible error code?
There was a problem hiding this comment.
I realized that a bit later and thought to have deleted that comment. Existing code is fine as-is.
| void test_mailmap_basic__initialize(void) | ||
| { | ||
| git_buf buf = GIT_BUF_INIT; | ||
| git_buf_attach_notowned(&buf, TEST_MAILMAP, sizeof(TEST_MAILMAP) - 1); |
There was a problem hiding this comment.
You can just use strlen(TEST_MAILMAP) here, which is a bit more obvious. The compiler should do it's thing and optimize that out
| "Blatantly invalid line\n" | ||
| "Foo bar <foo@bar.com> <foo@bal.com>\n" | ||
| "<email@foo.com> <otheremail@foo.com>\n" | ||
| "<email@foo.com> Other Name <yetanotheremail@foo.com>\n"; |
There was a problem hiding this comment.
I don't think this one is valid. At least git-check-mailmap does not mention this syntax
There was a problem hiding this comment.
git's documentation doesn't mention this format, but when I was testing mailmap, IIRC this format is supported.
| cl_assert(!git__strcmp(entry->replace_email, "foo@baz.com")); | ||
|
|
||
| entry = git_mailmap_entry_byindex(mailmap, 10000); | ||
| cl_assert(!entry); |
There was a problem hiding this comment.
I'd put this one into a separate test. A test should only check one thing
| entry = git_mailmap_entry_byindex(mailmap, 0); | ||
| cl_assert(entry); | ||
| cl_assert(!entry->replace_name); | ||
| cl_assert(!git__strcmp(entry->replace_email, "foo@baz.com")); |
There was a problem hiding this comment.
I'd love if you'd validate all four entries.
struct {
char *real_name;
char *real_mail;
char *replace_name;
char *replace_mail;
} expected[] = {
{ "Foo bar", "foo@bar.com", "foo@baz.com" },
...
};
for (i = 0; i < git_mailmap_entry_count(mailmap); i++) {
entry_byindex(mailmap, i);
assert_all_things();
}
| const git_mailmap_entry *entry = git_mailmap_entry_lookup( | ||
| mailmap, "Typoed the name once", "foo@baz.com"); | ||
| cl_assert(entry); | ||
| cl_assert(!git__strcmp(entry->real_name, "Foo bar")); |
There was a problem hiding this comment.
There's cl_assert_equal_s, for this one and the following tests
|
@pks-t So, I've ended up making a few more changes to the patch than I expected to. While I was looking at the documentation for git's mailmap support more closely I realized that I was looking up mailmap entries for a repository incorrectly. git has a specific way of doing it, as well as a set of configs (mailmap.blob and mailmap.file) which override each other in a particular order. Unfortunately, this behaviour was incompatible with the API we were exposing from the mailmap module. I've rewritten the API to support the correct overriding behaviour, but it has caused a fairly large surface change. As an example, the individual entry objects are no longer part of the public API, and you can't index into the mailmap, as that doesn't make as much sense with the correct overriding behaviour. I'm mostly glad that I caught this before we merged it & had to make a breaking change >.<. I decided to continue to implement the backing store as a sorted git_vector, rather than using a git_strmap or similar, because it was simpler to implement, and required less moving parts. |
|
Push for myself such that this thread is further up in my mailbox and I remember to review this soon |
pks-t
left a comment
There was a problem hiding this comment.
Thanks for updating, finally here's another round of feedback. The issues are becoming less grave and more of stylistic nature, which is a good sign that this PR is on a good track.
| * @param mailmap mailmap to resolve with | ||
| * @return 0 or an error code | ||
| */ | ||
| GIT_EXTERN(int) git_signature_with_mailmap(git_signature **out, const git_signature *sig, const git_mailmap *mailmap); |
There was a problem hiding this comment.
I'm a bit unsure about this function. git_mailmap_resolve is in fact nearly the exact same thing, with the only difference being that this one here uses signatures instead of strings. So maybe move this into mailmap.h and call it git_mailmap_resolve_signature?
| void (*free)(git_writestream *stream); | ||
| }; | ||
|
|
||
| /** A parsed representation of a .mailmap file. */ |
There was a problem hiding this comment.
It's not necessarily parsed in case you've manually added entries or in case you've just allocated it, without parsing from anywhere. So I'd just drop "parsed" here
| * @param buf the buffer to read the mailmap file from | ||
| * @return 0 on success, or an error code | ||
| */ | ||
| GIT_EXTERN(int) git_mailmap_add_buffer(git_mailmap *mm, const git_buf *buf); |
There was a problem hiding this comment.
As this interface is external, we should probably not use a buffer here but a plain const char * pointer with an (optional) size_t len. In case len is 0, we should then fall back to use strlen. Using buffers would only be a nuisance for library users, as we don't expose most of its interface.
| error = git_vector_insert_sorted( | ||
| &mm->entries, entry, mailmap_entry_replace); | ||
| if (error < 0 && error != GIT_EEXISTS) | ||
| goto cleanup; |
There was a problem hiding this comment.
This is all duplicated code. Just use git_mailmap_add_entry
| cleanup: | ||
| mailmap_entry_free(entry); | ||
|
|
||
| /* We never allocate data in these buffers, but better safe than sorry */ |
There was a problem hiding this comment.
We do allocate data, but we detach the buffers so that they don't own it anymore. Slight difference ;)
| } | ||
|
|
||
| static int mailmap_add_blob( | ||
| git_mailmap *mm, git_repository *repo, const char *spec) |
There was a problem hiding this comment.
spec is a bit weird. rev or revision would probably be more obvious
| ssize_t fallback = -1; | ||
| size_t idx; | ||
| git_mailmap_entry *entry; | ||
| git_mailmap_entry needle = { NULL, NULL, NULL, (char *)email }; |
There was a problem hiding this comment.
I'd say explicit initialization would be a bit nicer here. All struct members are char pointers, so in the unlikely case we ever rearrange its members, this would silently continue to work. So just zeroing out and then explicitly assigning to the fields would be a bit nicer.
|
|
||
| cl_git_pass(git_blame_file(&g_blame, g_repo, "file.txt", &opts)); | ||
| if (!g_blame) | ||
| return; |
There was a problem hiding this comment.
This is unexpected. We just silently abort the test, but in fact we should always exactly know what to expect (and error if g_blame is not set)
|
|
||
| cl_git_pass(git_blame_file(&g_blame, g_repo, "file.txt", &opts)); | ||
| if (!g_blame) | ||
| return; |
| const char *real_email; | ||
| const char *replace_name; | ||
| const char *replace_email; | ||
| } mailmap_entry; |
There was a problem hiding this comment.
"mailmap_helpers.h" is a bit weird, as there are no helpers in here. It's more like data, so "mailmap_testdata.h" would fit better
|
@pks-t reminder that I posted some changes :-) |
|
Thanks for the reminder, @mystor. I'll review this PR no later than this Friday |
|
I'm happy with this PR. There's a memory leak introduced, though, as you don't ever free the |
|
@pks-t I've fixed the memory leak you pointed out. Sorry for the delay. Is there anything else I should be doing for this PR? |
|
Great, CI doesn't mention any memory leaks anymore. @ethomson, got any more comments on this PR? |
| * @param replace_email the email to replace | ||
| * @return 0 if it was added, EEXISTS if it replaced an entry, or an error code | ||
| */ | ||
| GIT_EXTERN(int) git_mailmap_add_entry( |
There was a problem hiding this comment.
I'm sorry for the last-minute review, but I don't love the EEXISTS return here. That really makes it sound like an error case, not a success. Do you think that "replaced an entry" is really such a valuable state to know? If so, what do you think about an out-param that indicates that instead of overloading the return value? Curious what @pks-t thinks.
There was a problem hiding this comment.
The reasoning behind this decision was to match the git_vector APIs here. Most consumers of the mailmap API will probably just get it from a git_repository which just ignores errors like this.
I'd probably be fine with just not communicating whether or not an entry was replaced entirely.
There was a problem hiding this comment.
Maybe an int force parameter (instead of a int *replaced) ?
There was a problem hiding this comment.
I'd probably be fine with just not communicating whether or not an entry was replaced entirely.
I'm +1 on this.
Not that it matters much anyway but...
This matches git.
Also matches git.
|
I'm still fine with this PR as is. :) |
| * | ||
| * @param out pointer to store the new mailmap | ||
| * @param buf buffer to parse the mailmap from | ||
| * @param len the length of the input buffer [optional: 0 defaults to 'strlen'] |
There was a problem hiding this comment.
This is not a pattern that we have been using - our APIs either take a NUL-terminated string or a buffer with length. (And if it's the latter, then 0 really means 0.) I think it would be confusing to have an API that differs from this pattern.
I don't have a strong opinion on which way this should go - it seems like we should just take a buf? I'm happy either way but the 0 means we do the strlen doesn't align with the rest of our APIs.
There was a problem hiding this comment.
It used to take a buf, and I changed it in an earlier patch. I'm fine with switching back to that if it's what we want to do.
| * @param len the length of the input buffer [optional: 0 defaults to 'strlen'] | ||
| * @return 0 on success, or an error code | ||
| */ | ||
| GIT_EXTERN(int) git_mailmap_add_buffer( |
There was a problem hiding this comment.
I wonder how much utility there is in making this a public method? It seems like most uses would prefer the git_mailmap_from_buffer. I'd be perfectly happy seeing this be an internal API.
If it does stay public (which is fine, if you have a use for it or think others will) then I do think that the name does not capture exactly what it does (at least when viewed through the lens of the names of our current APIs - add_buffer sounds like it will add a buffer, not add from a buffer).
I would propose git_mailmap_parse? Also, the same notes about the 0 len apply as with git_mailmap_from_buffer.
There was a problem hiding this comment.
Sure, I'm fine with making this internal. Not super attached to any of the public APIs other than the from_repo one.
…ap_add_buffer internal
|
@ethomson Made the changes you requested :-) |
|
@mystor did most of the work actually, so thank her! :). Thanks for all the reviews and for getting this merged :) |
Fixes #4565.