Skip to content

Patch (diff) application#4705

Merged
ethomson merged 60 commits intomasterfrom
ethomson/apply
Nov 11, 2018
Merged

Patch (diff) application#4705
ethomson merged 60 commits intomasterfrom
ethomson/apply

Conversation

@ethomson
Copy link
Copy Markdown
Member

@ethomson ethomson commented Jun 26, 2018

This pull request takes a git_diff that was either "generated" (ie, we calculated the difference between two commitish things) or "parsed" (we opened the output of git diff or a file that we created ourselves) and apply it in the same manner that git apply does.

In addition, it introduces git_apply_tree which will perform a fully in-memory patch application, applying a git_diff to a git_tree and returning a git_index.

My intent is to get this to a minimal viable product that we can merge and then continue to iterate on. I think that this is a useful API even if we don't support all the options that git apply itself does (eg, writing rejection hunks, etc). However, there are still a few things outstanding before I consider this mergeable:

@ethomson ethomson force-pushed the ethomson/apply branch 10 times, most recently from e6aff0a to 8d1926d Compare July 1, 2018 15:58
@ethomson
Copy link
Copy Markdown
Member Author

ethomson commented Jul 1, 2018

So as of 20acd2a, I'm pretty happy with where this is. It's not complete yet, or ready for merge -- I do want to add callbacks, at a minimum. But I do think that this is ready for a first round of review - the patch application is in good shape and the rest of the changes are additive.

If you're the type of person that likes to read commits in order then I've tried to make this a reasonable read.

Copy link
Copy Markdown
Contributor

@tiennou tiennou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor bits and bobs, but it was a really good read !

Comment thread src/reader.c
Comment thread src/reader.c
Comment thread src/reader.h Outdated
Comment thread src/apply.c
Comment thread src/apply.c Outdated
Comment thread tests/apply/workdir.c Outdated
Comment thread tests/apply/workdir.c Outdated
Comment thread src/apply.c
Copy link
Copy Markdown
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the code and tried to get a first grasp of what is happening. I didn't yet review the tests -- I'll do another, more involved, review at some point in time. Probably next week

Comment thread include/git2/apply.h Outdated
git_index **out,
git_repository *repo,
git_tree *preimage,
git_diff *diff);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honenstly find this function a bit strange. It makes it harder than necessary to apply multiple diffs to a tree in succession, as you first would need to derive a tree from the index again and write the resulting object into the ODB. Wouldn't it make more sense to have it return a tree, as well, or is there a specific reason why you provide it like this?

Comment thread src/reader.c Outdated
if ((error = git_tree_entry_bypath(&tree_entry,
reader->tree, filename)) < 0 ||
(error = git_blob_lookup(&blob,
git_tree_owner(reader->tree), git_tree_entry_id(tree_entry))) < 0)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit hard to read due to indentation. The second (error = git_blob_lookup(... line should probably be dedented by 4 spaces.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment thread src/reader.c Outdated
goto done;

git_buf_clear(out);
git_buf_put(out, git_blob_rawcontent(blob), git_blob_rawsize(blob));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to clear if you use git_buf_set

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment thread src/reader.c Outdated
static void tree_reader_free(git_reader *_reader)
{
GIT_UNUSED(_reader);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having all these noop foo_reader_free function we could just pass a NULL pointer and have git_reader_free only call the function if it's set.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that none of the readers provide a free implementation, so I'll remove it entirely.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Comment thread src/reader.c

reader->reader.read = tree_reader_read;
reader->reader.free = tree_reader_free;
reader->tree = tree;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we increment the refcount of the tree and decrement it in tree_reader_free again? We already had lifecycle problems in other places using diff where we didn't increment the refcount to keep it alive

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's necessary. These readers really just provide a handy abstraction over querying them, and their lifecycle is never longer than the apply functions -- meaning that a caller couldn't try to free a tree or index in the middle of application (at least not on a single thread).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer going for safety here IMHO, just to make sure the first person to use git_reader in a "non-atomic" fashion doesn't get to debug lifecycle problems. Though a comment stating the implementation doesn't support that would be fine as well 😉.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While not necessary now, I agree with @tiennou -- future proofing the API here might be sensible, as one will forget about this limitation and eventually be bitten by the limitation. The only thing keeping you from that is the additional required free function pointer, right?

Comment thread src/reader.h Outdated
};

extern int git_reader_for_nothing(
git_reader **out);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not have an implementation. Probably a leftover?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Comment thread src/reader.h
Comment thread src/apply.c
Comment thread src/apply.c Outdated
error = git_reader_for_workdir(&pre_reader, repo, true);
else if (opts.location == GIT_APPLY_LOCATION_INDEX)
error = git_reader_for_index(&pre_reader, repo, NULL);
else
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if we used a switch here. Like that, the compiler would throw a warning in case we ever add another enum

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to a switch.

Comment thread src/apply.c Outdated
if (opts.location == GIT_APPLY_LOCATION_INDEX)
error = git_apply__to_index(repo, diff, preimage, postimage, &opts);
else
error = git_apply__to_workdir(repo, diff, preimage, postimage, &opts);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to handle BOTH here, I guess, because of the way we initialized the reader? I'd personally vote for a switch, as well

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not excited about using a switch here, but I also don't care very much, so I changed it.

@dometto
Copy link
Copy Markdown

dometto commented Sep 17, 2018

@ethomson could you give an approximate timeline for this? Many thanks!

@ethomson
Copy link
Copy Markdown
Member Author

Further testing has revealed problems in the patch application, I don’t have an ETA.

@ethomson
Copy link
Copy Markdown
Member Author

I've added per-delta and per-hunk callbacks. I'll go through the notes from the prior review (up to
20acd2a) tomorrow

@ethomson
Copy link
Copy Markdown
Member Author

I've cherry-picked in @hackhaslam's tests and have fixed this up per the comments.

@ethomson
Copy link
Copy Markdown
Member Author

@pks-t, @tiennou are you happy with these changes or did you want another 👀 ?

@pks-t
Copy link
Copy Markdown
Member

pks-t commented Oct 18, 2018 via email

@ethomson ethomson changed the title WIP: Patch (diff) application Patch (diff) application Oct 18, 2018
@ethomson
Copy link
Copy Markdown
Member Author

I didn't yet care to do another review as this is still labelled as WIP ;)

FTFY.

Copy link
Copy Markdown
Contributor

@tiennou tiennou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only reviewed the full diff because of the many fixups. No nitpicks were harmed in the writing of this review 😉.

Comment thread include/git2/apply.h Outdated
Comment thread include/git2/apply.h Outdated
Comment thread include/git2/apply.h
Comment thread src/apply.c
error = ctx->opts.hunk_cb(&hunk->hunk, ctx->opts.payload);

if (error) {
if (error > 0) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this nested if confusing (especially given the implicit != 0 check).

if (error > 0) {
    /* current code */
} else if (error < 0) {
    goto done;
}

maybe ?

Also, GIT_EUSER comes to mind 😉.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does GIT_EUSER come to mind?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have mistook that for GIT_PASSTHROUGH, sorry.

Comment thread src/apply.c
error = opts->delta_cb(delta, opts->payload);

if (error) {
if (error > 0)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same complaint as above.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote up your suggestion, but I found it less readable than what's here now, tbqh. @pks-t can you break a tie?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A:

if (error > 0) {
    error = 0;
    goto done;
} else if (error < 0) {
    goto done;
}

B:

if (error) {
    if (error > 0)
        error = 0;

    goto done;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer style B, which matches what you have right now. Style A is a bit more subtle, as the reader would have to spot that neither of the conditionals catches the case error == 0, which is not that easy to see. Style B is a lot more explicit, as one cannot miss the fact that positive error codes are being reset.

Comment thread src/apply.c Outdated
Comment thread src/reader.c Outdated
Comment thread src/reader.c

reader->reader.read = tree_reader_read;
reader->reader.free = tree_reader_free;
reader->tree = tree;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer going for safety here IMHO, just to make sure the first person to use git_reader in a "non-atomic" fashion doesn't get to debug lifecycle problems. Though a comment stating the implementation doesn't support that would be fine as well 😉.

Comment thread src/reader.c
Comment thread tests/apply/both.c Outdated
@ethomson
Copy link
Copy Markdown
Member Author

I'm trying to accept your changes, and manually updating the commit message to be something that git rebase -i will parse as a fixup. (eg, the moral equivalent of git commit --fixup <id>. We'll see how this works... 😀

Comment thread src/reader.h

/**
* Create a `git_reader` that will allow random access to the given
* index, or the repository's index.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* index, or the repository's index.
* repository index.

@ethomson
Copy link
Copy Markdown
Member Author

Fixes #1980

@tiennou
Copy link
Copy Markdown
Contributor

tiennou commented Nov 1, 2018

BTW, I'd be interested in some kind of "report" on the usage of suggestions. They're quite easy to make, but I'd like to be sure that they're not a pain to re-fixup afterward (it sounds like it it).

Copy link
Copy Markdown
Member

@pks-t pks-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally got around to it. Sorry for taking so long, the security releases kept me busy. :/

Comment thread include/git2/apply.h Outdated
* @param diff the diff to apply
* @param options the options for the apply (or null for defaults)
*/
GIT_EXTERN(int) git_apply_tree(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would git_apply_to_tree be a better name? Right now it indicates that we try to apply a tree, while we in fact apply a diff to a tree.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. Fixed.

Comment thread src/apply.c
error = opts->delta_cb(delta, opts->payload);

if (error) {
if (error > 0)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer style B, which matches what you have right now. Style A is a bit more subtle, as the reader would have to spot that neither of the conditionals catches the case error == 0, which is not that easy to see. Style B is a lot more explicit, as one cannot miss the fact that positive error codes are being reset.

Comment thread src/apply.c

/* ENOTFOUND means the preimage was not found; apply failed. */
if (error == GIT_ENOTFOUND)
error = GIT_EAPPLYFAIL;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd make sense to also have an error message here. You overwrite GIT_ENOTFOUND, so the caller has no way of knowing that we failed due to the file not existing. So we should at least set an error message:

if (error == GIT_ENOTFOUND)
    error = apply_err("%s: file not found", delta->old_file.path);

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do overwrite GIT_ENOTFOUND, but I don't overwrite the error message itself, which would have been set by the reader. This means that it's actually more explicit, since the existing error message distinguishes between the index and the working directory. eg, if it's not found in the index, it will say index does not contain 'foo'.

Comment thread src/apply.c Outdated
}
}

if (delta->status == GIT_DELTA_DELETED)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of skipping the rest of this function, you could make this code path similar to the previous one by using if (delta_status != GIT_DELTA_DELETED), moving remaining code into that condition's scope.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I do like that better, it matches nicely the condition and formatting above it.

Comment thread src/apply.c Outdated
if ((error = git_reader_for_tree(&pre_reader, preimage)) < 0)
goto done;

/* put the current tree into the postimage as-is - the diff will
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: misformated comment

Comment thread src/reader.c
Comment thread src/reader.h Outdated
*/
extern int git_reader_for_index(
git_reader **out,
git_repository *repo);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be more sensible to pass an explicit index instead of an implicit one? First, this would match the previous git_reader_for_tree and second it would potentially open up further usecases

" in, first taking off their skins, by letting them stand a few minutes in\n" \
"-hot water, when they may be easily peeled. When made in this way you\n" \
"+Changed line.\n" \
" must thicken it with the flour only. Any part of the veal may be used,\n"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jeez, I'm hungry already

Comment thread tests/apply/both.c Outdated
idx_entry.mode = 0100644;
idx_entry.path = "asparagus.txt";
cl_git_pass(git_oid_fromstr(&idx_entry.id, "06d3fefb8726ab1099acc76e02dfb85e034b2538"));
git_index_add(index, &idx_entry);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cl_git_pass

Comment thread tests/apply/both.c
validate_apply_workdir(repo, workdir_expected, workdir_expected_cnt);

git_diff_free(diff);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I didn't just skip over them, then renames are missing. Might be interesting to have different rename cases, too, like two-to-one, one-to-two and one-one (file1 -> file2, file2 -> file3).

Another test case would be the previous thing I mentioned, with having two patches for the same file in one application.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add some additional tests. When you say two-to-one, you mean a mangled patch that has two sources being renamed to the same target? And by one-to-two, another mangled patch, this one with a single source being renamed to two different targets? These seem like reasonable tests.

Optionally hash the contents of files encountered in the filesystem or
working directory iterators.  This is not expected to be used in
production code paths, but may allow us to simplify some test contexts.

For working directory iterators, apply filters as appropriate, since we
have the context able to do it.
hackhaslam and others added 13 commits November 5, 2018 15:53
Create a test applying a patch with a rename and a modification of a
file.
When applying to both the index and the working directory, ensure that
the working directory's mode matches the index's mode.  It's not
sufficient to look only at the hashed object id to determine that the
file is unchanged, git also takes the mode into account.
Deltas containing exact renames are special; they simple indicate that a
file was renamed without providing additional metadata (like the
filemode).  Teach the reader to provide the file mode and use the
preimage's filemode in the case that the delta does not provide one.)
Test that we can rename some file from B->C and then rename some other
file from A->B.  Do this with both exact rename patches (eg `rename from
...` / `rename to ...`) and patches that remove the files and replace
them entirely.
Test a rename from A->B simultaneous with a rename from B->A.
Test that we can apply a patch that renames two different files to the
same target filename.  Git itself handles this scenario in a last-write
wins, such that the rename listed last is the one persisted in the
target.  Ensure that we do the same.
Test that a patch can contain two deltas that appear to rename an
initial source file to two different destination paths.  Git creates
both target files with the initial source contents; ensure that we do,
too.
git allows a patch file to contain multiple deltas to the same file:
although it does not produce files in this format itself, this could
be the result of concatenating two different patch files that affected
the same file.

git apply behaves by applying this next delta to the existing postimage
of the file.  We should do the same.  If we have previously seen a file,
and produced a postimage for it, we will load that postimage and apply
the current delta to that.  If we have not, get the file from the
preimage.
Ensure that we can apply a delta after renaming a file.
Multiple deltas can exist in a diff, and can be applied in-order.
However if there exists a delta that renames a file, it must be first,
so that other deltas can reference the resulting target file.

git enforces this (`error: already exists in index`), so ensure that we
do, too.
Multiple deltas can exist in a diff, and can be applied in-order.
If there exists a delta that modifies a file followed by a delta that
renames that file, then both will be captured.  The modification delta
will be applied and the resulting file will be staged with the original
filename.  The rename delta will be independently applied - to the
original file (not the modified file from the original delta) and staged
independently.
Ensure that we cannot modify a file after it's been renamed out of the
way.  If multiple deltas exist for a single path, ensure that we do not
attempt to modify a file after it's been renamed out of the way.

To support this, we must track the paths that have been removed or
renamed; add to a string map when we remove a path and remove from the
string map if we recreate a path.  Validate that we are not applying to
a path that is in this map, unless the delta is a rename, since git
supports renaming one file to two different places in two different
deltas.

Further, test that we cannot apply a modification delta to a path that
will be created in the future by a rename (a path that does not yet
exist.)
Ensure that we can add a file back after it's been removed.  Update the
renamed/deleted validation in application to not apply to deltas that
are adding files to support this.
@ethomson
Copy link
Copy Markdown
Member Author

ethomson commented Nov 5, 2018

I went through some more tests - in particular, the idea that you couldn't modify a file in a delta and then rename it in a second delta bothered me. I realized that my test was malformed (I was trying to rename a file to a different, already existing file, hence the error message). Recreating the test validates that git does indeed let you update some file named foo in one delta and then rename it in another.

I also added validation that we will not apply modification deltas to files after deleting them, and added several tests around strange behavior with multiple deltas.

@ethomson
Copy link
Copy Markdown
Member Author

Thanks everybody for the reviews! 🚢

@ethomson ethomson merged commit 11fbead into master Nov 11, 2018
@pks-t
Copy link
Copy Markdown
Member

pks-t commented Nov 13, 2018

Exciting times! Thanks a lot, @ethomson!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants