Patch (diff) application by ethomson · Pull Request #4705 · libgit2/libgit2

ethomson · 2018-06-26T14:37:26Z

This pull request takes a git_diff that was either "generated" (ie, we calculated the difference between two commitish things) or "parsed" (we opened the output of git diff or a file that we created ourselves) and apply it in the same manner that git apply does.

In addition, it introduces git_apply_tree which will perform a fully in-memory patch application, applying a git_diff to a git_tree and returning a git_index.

My intent is to get this to a minimal viable product that we can merge and then continue to iterate on. I think that this is a useful API even if we don't support all the options that git apply itself does (eg, writing rejection hunks, etc). However, there are still a few things outstanding before I consider this mergeable:

Tests for GIT_APPLY_LOCATION_INDEX
Tests for GIT_APPLY_LOCATION_BOTH
Whole-patch callbacks?
Hunk callbacks (eg, "partial patch application") a la patch: add support for partial patch application #4184

ethomson · 2018-07-01T21:33:15Z

So as of 20acd2a, I'm pretty happy with where this is. It's not complete yet, or ready for merge -- I do want to add callbacks, at a minimum. But I do think that this is ready for a first round of review - the patch application is in good shape and the rest of the changes are additive.

If you're the type of person that likes to read commits in order then I've tried to make this a reasonable read.

tiennou

Some minor bits and bobs, but it was a really good read !

pks-t

I went through the code and tried to get a first grasp of what is happening. I didn't yet review the tests -- I'll do another, more involved, review at some point in time. Probably next week

pks-t · 2018-08-03T10:29:11Z

+	git_index **out,
+	git_repository *repo,
+	git_tree *preimage,
+	git_diff *diff);


I honenstly find this function a bit strange. It makes it harder than necessary to apply multiple diffs to a tree in succession, as you first would need to derive a tree from the index again and write the resulting object into the ODB. Wouldn't it make more sense to have it return a tree, as well, or is there a specific reason why you provide it like this?

pks-t · 2018-08-03T10:37:49Z

+	if ((error = git_tree_entry_bypath(&tree_entry,
+		reader->tree, filename)) < 0 ||
+		(error = git_blob_lookup(&blob,
+		git_tree_owner(reader->tree), git_tree_entry_id(tree_entry))) < 0)


It's a bit hard to read due to indentation. The second (error = git_blob_lookup(... line should probably be dedented by 4 spaces.

pks-t · 2018-08-03T10:41:27Z

+		goto done;
+
+	git_buf_clear(out);
+	git_buf_put(out, git_blob_rawcontent(blob), git_blob_rawsize(blob));


No need to clear if you use git_buf_set

pks-t · 2018-08-03T10:45:24Z

+static void tree_reader_free(git_reader *_reader)
+{
+	GIT_UNUSED(_reader);
+}


Instead of having all these noop foo_reader_free function we could just pass a NULL pointer and have git_reader_free only call the function if it's set.

It turns out that none of the readers provide a free implementation, so I'll remove it entirely.

pks-t · 2018-08-03T10:47:06Z

+
+	reader->reader.read = tree_reader_read;
+	reader->reader.free = tree_reader_free;
+	reader->tree = tree;


Shouldn't we increment the refcount of the tree and decrement it in tree_reader_free again? We already had lifecycle problems in other places using diff where we didn't increment the refcount to keep it alive

I don't think that's necessary. These readers really just provide a handy abstraction over querying them, and their lifecycle is never longer than the apply functions -- meaning that a caller couldn't try to free a tree or index in the middle of application (at least not on a single thread).

I'd prefer going for safety here IMHO, just to make sure the first person to use git_reader in a "non-atomic" fashion doesn't get to debug lifecycle problems. Though a comment stating the implementation doesn't support that would be fine as well 😉.

While not necessary now, I agree with @tiennou -- future proofing the API here might be sensible, as one will forget about this limitation and eventually be bitten by the limitation. The only thing keeping you from that is the additional required free function pointer, right?

pks-t · 2018-08-03T11:08:36Z

+};
+
+extern int git_reader_for_nothing(
+	git_reader **out);


This does not have an implementation. Probably a leftover?

pks-t · 2018-08-03T11:22:21Z

+		error = git_reader_for_workdir(&pre_reader, repo, true);
+	else if (opts.location == GIT_APPLY_LOCATION_INDEX)
+		error = git_reader_for_index(&pre_reader, repo, NULL);
+	else


I'd prefer if we used a switch here. Like that, the compiler would throw a warning in case we ever add another enum

Changed this to a switch.

pks-t · 2018-08-03T11:25:07Z

+	if (opts.location == GIT_APPLY_LOCATION_INDEX)
+		error = git_apply__to_index(repo, diff, preimage, postimage, &opts);
+	else
+		error = git_apply__to_workdir(repo, diff, preimage, postimage, &opts);


No need to handle BOTH here, I guess, because of the way we initialized the reader? I'd personally vote for a switch, as well

I'm not excited about using a switch here, but I also don't care very much, so I changed it.

dometto · 2018-09-17T10:16:52Z

@ethomson could you give an approximate timeline for this? Many thanks!

ethomson · 2018-09-17T10:53:30Z

Further testing has revealed problems in the patch application, I don’t have an ETA.

ethomson · 2018-09-29T18:49:52Z

I've added per-delta and per-hunk callbacks. I'll go through the notes from the prior review (up to
20acd2a) tomorrow

ethomson · 2018-10-11T14:26:27Z

I've cherry-picked in @hackhaslam's tests and have fixed this up per the comments.

ethomson · 2018-10-17T13:33:06Z

@pks-t, @tiennou are you happy with these changes or did you want another 👀 ?

pks-t · 2018-10-18T10:57:55Z

On Wed, Oct 17, 2018 at 06:33:19AM -0700, Edward Thomson wrote: @pks-t, @tiennou are you happy with these changes or did you want another 👀 ?

I didn't yet care to do another review as this is still labelled as WIP ;) I can do that tomorrow, though

ethomson · 2018-10-18T11:30:42Z

I didn't yet care to do another review as this is still labelled as WIP ;)

FTFY.

tiennou

I only reviewed the full diff because of the many fixups. No nitpicks were harmed in the writing of this review 😉.

tiennou · 2018-10-18T22:16:53Z

+		error = ctx->opts.hunk_cb(&hunk->hunk, ctx->opts.payload);
+
+		if (error) {
+			if (error > 0) {


I find this nested if confusing (especially given the implicit != 0 check).

if (error > 0) { /* current code */ } else if (error < 0) { goto done; }

maybe ?

Also, GIT_EUSER comes to mind 😉.

Why does GIT_EUSER come to mind?

I might have mistook that for GIT_PASSTHROUGH, sorry.

tiennou · 2018-10-18T22:19:20Z

+		error = opts->delta_cb(delta, opts->payload);
+
+		if (error) {
+			if (error > 0)


Same complaint as above.

I wrote up your suggestion, but I found it less readable than what's here now, tbqh. @pks-t can you break a tie?

A:

if (error > 0) { error = 0; goto done; } else if (error < 0) { goto done; }

B:

if (error) { if (error > 0) error = 0; goto done; }

I think I prefer style B, which matches what you have right now. Style A is a bit more subtle, as the reader would have to spot that neither of the conditionals catches the case error == 0, which is not that easy to see. Style B is a lot more explicit, as one cannot miss the fact that positive error codes are being reset.

tiennou · 2018-10-18T22:46:33Z

+
+	reader->reader.read = tree_reader_read;
+	reader->reader.free = tree_reader_free;
+	reader->tree = tree;


I'd prefer going for safety here IMHO, just to make sure the first person to use git_reader in a "non-atomic" fashion doesn't get to debug lifecycle problems. Though a comment stating the implementation doesn't support that would be fine as well 😉.

ethomson · 2018-10-19T06:58:47Z

I'm trying to accept your changes, and manually updating the commit message to be something that git rebase -i will parse as a fixup. (eg, the moral equivalent of git commit --fixup <id>. We'll see how this works... 😀

tiennou · 2018-10-19T09:26:54Z

+
+/**
+ * Create a `git_reader` that will allow random access to the given
+ * index, or the repository's index.


Suggested change

* index, or the repository's index.

* repository index.

ethomson · 2018-10-21T02:28:58Z

Fixes #1980

tiennou · 2018-11-01T02:26:59Z

BTW, I'd be interested in some kind of "report" on the usage of suggestions. They're quite easy to make, but I'd like to be sure that they're not a pain to re-fixup afterward (it sounds like it it).

pks-t

Finally got around to it. Sorry for taking so long, the security releases kept me busy. :/

pks-t · 2018-11-02T11:19:11Z

+ * @param diff the diff to apply
+ * @param options the options for the apply (or null for defaults)
+ */
+GIT_EXTERN(int) git_apply_tree(


Would git_apply_to_tree be a better name? Right now it indicates that we try to apply a tree, while we in fact apply a diff to a tree.

Yes, I agree. Fixed.

pks-t · 2018-11-02T11:27:10Z

+		error = opts->delta_cb(delta, opts->payload);
+
+		if (error) {
+			if (error > 0)


I think I prefer style B, which matches what you have right now. Style A is a bit more subtle, as the reader would have to spot that neither of the conditionals catches the case error == 0, which is not that easy to see. Style B is a lot more explicit, as one cannot miss the fact that positive error codes are being reset.

pks-t · 2018-11-02T11:30:35Z

+
+		/* ENOTFOUND means the preimage was not found; apply failed. */
+		if (error == GIT_ENOTFOUND)
+			error = GIT_EAPPLYFAIL;


I think it'd make sense to also have an error message here. You overwrite GIT_ENOTFOUND, so the caller has no way of knowing that we failed due to the file not existing. So we should at least set an error message:

if (error == GIT_ENOTFOUND) error = apply_err("%s: file not found", delta->old_file.path);

I do overwrite GIT_ENOTFOUND, but I don't overwrite the error message itself, which would have been set by the reader. This means that it's actually more explicit, since the existing error message distinguishes between the index and the working directory. eg, if it's not found in the index, it will say index does not contain 'foo'.

pks-t · 2018-11-02T11:35:27Z

+		}
+	}
+
+	if (delta->status == GIT_DELTA_DELETED)


Instead of skipping the rest of this function, you could make this code path similar to the previous one by using if (delta_status != GIT_DELTA_DELETED), moving remaining code into that condition's scope.

Indeed, I do like that better, it matches nicely the condition and formatting above it.

pks-t · 2018-11-02T11:36:11Z

+	if ((error = git_reader_for_tree(&pre_reader, preimage)) < 0)
+		goto done;
+
+	/* put the current tree into the postimage as-is - the diff will


Nit: misformated comment

pks-t · 2018-11-02T12:17:10Z

+ */
+extern int git_reader_for_index(
+	git_reader **out,
+	git_repository *repo);


Wouldn't it be more sensible to pass an explicit index instead of an implicit one? First, this would match the previous git_reader_for_tree and second it would potentially open up further usecases

pks-t · 2018-11-02T12:18:01Z

+	" in, first taking off their skins, by letting them stand a few minutes in\n" \
+	"-hot water, when they may be easily peeled. When made in this way you\n" \
+	"+Changed line.\n" \
+	" must thicken it with the flour only. Any part of the veal may be used,\n"


Jeez, I'm hungry already

pks-t · 2018-11-02T12:19:19Z

+	idx_entry.mode = 0100644;
+	idx_entry.path = "asparagus.txt";
+	cl_git_pass(git_oid_fromstr(&idx_entry.id, "06d3fefb8726ab1099acc76e02dfb85e034b2538"));
+	git_index_add(index, &idx_entry);


cl_git_pass

pks-t · 2018-11-02T12:21:51Z

+	validate_apply_workdir(repo, workdir_expected, workdir_expected_cnt);
+
+	git_diff_free(diff);
+}


If I didn't just skip over them, then renames are missing. Might be interesting to have different rename cases, too, like two-to-one, one-to-two and one-one (file1 -> file2, file2 -> file3).

Another test case would be the previous thing I mentioned, with having two patches for the same file in one application.

I'll add some additional tests. When you say two-to-one, you mean a mangled patch that has two sources being renamed to the same target? And by one-to-two, another mangled patch, this one with a single source being renamed to two different targets? These seem like reasonable tests.

Optionally hash the contents of files encountered in the filesystem or working directory iterators. This is not expected to be used in production code paths, but may allow us to simplify some test contexts. For working directory iterators, apply filters as appropriate, since we have the context able to do it.

Create a test applying a patch with a rename and a modification of a file.

When applying to both the index and the working directory, ensure that the working directory's mode matches the index's mode. It's not sufficient to look only at the hashed object id to determine that the file is unchanged, git also takes the mode into account.

Deltas containing exact renames are special; they simple indicate that a file was renamed without providing additional metadata (like the filemode). Teach the reader to provide the file mode and use the preimage's filemode in the case that the delta does not provide one.)

Test that we can rename some file from B->C and then rename some other file from A->B. Do this with both exact rename patches (eg `rename from ...` / `rename to ...`) and patches that remove the files and replace them entirely.

Test a rename from A->B simultaneous with a rename from B->A.

Test that we can apply a patch that renames two different files to the same target filename. Git itself handles this scenario in a last-write wins, such that the rename listed last is the one persisted in the target. Ensure that we do the same.

Test that a patch can contain two deltas that appear to rename an initial source file to two different destination paths. Git creates both target files with the initial source contents; ensure that we do, too.

git allows a patch file to contain multiple deltas to the same file: although it does not produce files in this format itself, this could be the result of concatenating two different patch files that affected the same file. git apply behaves by applying this next delta to the existing postimage of the file. We should do the same. If we have previously seen a file, and produced a postimage for it, we will load that postimage and apply the current delta to that. If we have not, get the file from the preimage.

Ensure that we can apply a delta after renaming a file.

Multiple deltas can exist in a diff, and can be applied in-order. However if there exists a delta that renames a file, it must be first, so that other deltas can reference the resulting target file. git enforces this (`error: already exists in index`), so ensure that we do, too.

Multiple deltas can exist in a diff, and can be applied in-order. If there exists a delta that modifies a file followed by a delta that renames that file, then both will be captured. The modification delta will be applied and the resulting file will be staged with the original filename. The rename delta will be independently applied - to the original file (not the modified file from the original delta) and staged independently.

Ensure that we cannot modify a file after it's been renamed out of the way. If multiple deltas exist for a single path, ensure that we do not attempt to modify a file after it's been renamed out of the way. To support this, we must track the paths that have been removed or renamed; add to a string map when we remove a path and remove from the string map if we recreate a path. Validate that we are not applying to a path that is in this map, unless the delta is a rename, since git supports renaming one file to two different places in two different deltas. Further, test that we cannot apply a modification delta to a path that will be created in the future by a rename (a path that does not yet exist.)

Ensure that we can add a file back after it's been removed. Update the renamed/deleted validation in application to not apply to deltas that are adding files to support this.

ethomson · 2018-11-05T16:28:39Z

I went through some more tests - in particular, the idea that you couldn't modify a file in a delta and then rename it in a second delta bothered me. I realized that my test was malformed (I was trying to rename a file to a different, already existing file, hence the error message). Recreating the test validates that git does indeed let you update some file named foo in one delta and then rename it in another.

I also added validation that we will not apply modification deltas to files after deleting them, and added several tests around strange behavior with multiple deltas.

ethomson · 2018-11-11T16:40:51Z

Thanks everybody for the reviews! 🚢

pks-t · 2018-11-13T09:31:11Z

Exciting times! Thanks a lot, @ethomson!

ethomson force-pushed the ethomson/apply branch 10 times, most recently from e6aff0a to 8d1926d Compare July 1, 2018 15:58

tiennou reviewed Jul 17, 2018

View reviewed changes

Comment thread src/reader.c

Comment thread src/reader.c

Comment thread src/reader.h Outdated

Comment thread src/apply.c

Comment thread src/apply.c Outdated

Comment thread tests/apply/workdir.c Outdated

Comment thread tests/apply/workdir.c Outdated

Comment thread src/apply.c

pks-t reviewed Aug 3, 2018

View reviewed changes

ethomson mentioned this pull request Sep 3, 2018

Implement API for partial file adding similar to "git add -p" #591

Closed

dometto mentioned this pull request Sep 17, 2018

Breaks gollum's revert_page gollum/rugged_adapter#15

Closed

ethomson changed the title ~~WIP: Patch (diff) application~~ Patch (diff) application Oct 18, 2018

tiennou reviewed Oct 18, 2018

View reviewed changes

tiennou reviewed Oct 19, 2018

View reviewed changes

pks-t reviewed Nov 2, 2018

View reviewed changes

hackhaslam and others added 13 commits November 5, 2018 15:53

patch: add tests for aborting hunk callback

620ac9c

apply: test a patch with rename and modification

b73a42f

Create a test applying a patch with a rename and a modification of a file.

apply: test rename A -> B -> C scenarios

89b5a56

Test that we can rename some file from B->C and then rename some other file from A->B. Do this with both exact rename patches (eg `rename from ...` / `rename to ...`) and patches that remove the files and replace them entirely.

apply: test circular rename

235dc9b

Test a rename from A->B simultaneous with a rename from B->A.

apply: test rename 2 to 1

56a2ae0

Test that we can apply a patch that renames two different files to the same target filename. Git itself handles this scenario in a last-write wins, such that the rename listed last is the one persisted in the target. Ensure that we do the same.

apply: test rename 1 to 2

c71e964

Test that a patch can contain two deltas that appear to rename an initial source file to two different destination paths. Git creates both target files with the initial source contents; ensure that we do, too.

apply: test multiple deltas to new file

07e71bf

apply: test modify delta after rename delta

a3c1070

Ensure that we can apply a delta after renaming a file.

ethomson force-pushed the ethomson/apply branch from ee8a1ff to da4ebe1 Compare November 5, 2018 15:57

ethomson added 3 commits November 5, 2018 16:13

apply: test re-adding a file after removing it

f8b9493

Ensure that we can add a file back after it's been removed. Update the renamed/deleted validation in application to not apply to deltas that are adding files to support this.

test: ensure applying a patch can't delete a file twice

4e746d8

ethomson force-pushed the ethomson/apply branch from da4ebe1 to 4e746d8 Compare November 5, 2018 16:13

ethomson merged commit 11fbead into master Nov 11, 2018

This was referenced Dec 4, 2018

Add git_apply libgit2/pygit2#841

Closed

Added Repository.apply libgit2/pygit2#843

Merged

ethomson deleted the ethomson/apply branch January 9, 2019 10:15

tiennou mentioned this pull request Jan 29, 2019

Add support for apply-patch #1980

Closed

dometto mentioned this pull request Mar 12, 2019

Add Repository#apply libgit2/rugged#788

Merged

snyk-bot mentioned this pull request Feb 23, 2020

[Snyk] Upgrade nodegit from 0.4.1 to 0.26.4 saurabharch/Breezeblocks#1

Open

snyk-bot mentioned this pull request Apr 22, 2020

[Snyk] Upgrade nodegit from 0.24.3 to 0.26.5 aminatakonate000/Graviton-App#4

Open

snyk-bot mentioned this pull request May 5, 2020

[Snyk] Upgrade nodegit from 0.24.3 to 0.26.5 Barnstorm-Online/ngp-openapi-generator#1

Open

Conversation

ethomson commented Jun 26, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ethomson commented Jul 1, 2018

Uh oh!

tiennou left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pks-t left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dometto commented Sep 17, 2018

Uh oh!

ethomson commented Sep 17, 2018

Uh oh!

ethomson commented Sep 29, 2018

Uh oh!

ethomson commented Oct 11, 2018

Uh oh!

ethomson commented Oct 17, 2018

Uh oh!

pks-t commented Oct 18, 2018 via email

Uh oh!

ethomson commented Oct 18, 2018

Uh oh!

tiennou left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ethomson commented Jun 26, 2018 •

edited

Loading