Conversation
e6aff0a to
8d1926d
Compare
|
So as of 20acd2a, I'm pretty happy with where this is. It's not complete yet, or ready for merge -- I do want to add callbacks, at a minimum. But I do think that this is ready for a first round of review - the patch application is in good shape and the rest of the changes are additive. If you're the type of person that likes to read commits in order then I've tried to make this a reasonable read. |
tiennou
left a comment
There was a problem hiding this comment.
Some minor bits and bobs, but it was a really good read !
pks-t
left a comment
There was a problem hiding this comment.
I went through the code and tried to get a first grasp of what is happening. I didn't yet review the tests -- I'll do another, more involved, review at some point in time. Probably next week
| git_index **out, | ||
| git_repository *repo, | ||
| git_tree *preimage, | ||
| git_diff *diff); |
There was a problem hiding this comment.
I honenstly find this function a bit strange. It makes it harder than necessary to apply multiple diffs to a tree in succession, as you first would need to derive a tree from the index again and write the resulting object into the ODB. Wouldn't it make more sense to have it return a tree, as well, or is there a specific reason why you provide it like this?
| if ((error = git_tree_entry_bypath(&tree_entry, | ||
| reader->tree, filename)) < 0 || | ||
| (error = git_blob_lookup(&blob, | ||
| git_tree_owner(reader->tree), git_tree_entry_id(tree_entry))) < 0) |
There was a problem hiding this comment.
It's a bit hard to read due to indentation. The second (error = git_blob_lookup(... line should probably be dedented by 4 spaces.
| goto done; | ||
|
|
||
| git_buf_clear(out); | ||
| git_buf_put(out, git_blob_rawcontent(blob), git_blob_rawsize(blob)); |
There was a problem hiding this comment.
No need to clear if you use git_buf_set
| static void tree_reader_free(git_reader *_reader) | ||
| { | ||
| GIT_UNUSED(_reader); | ||
| } |
There was a problem hiding this comment.
Instead of having all these noop foo_reader_free function we could just pass a NULL pointer and have git_reader_free only call the function if it's set.
There was a problem hiding this comment.
It turns out that none of the readers provide a free implementation, so I'll remove it entirely.
|
|
||
| reader->reader.read = tree_reader_read; | ||
| reader->reader.free = tree_reader_free; | ||
| reader->tree = tree; |
There was a problem hiding this comment.
Shouldn't we increment the refcount of the tree and decrement it in tree_reader_free again? We already had lifecycle problems in other places using diff where we didn't increment the refcount to keep it alive
There was a problem hiding this comment.
I don't think that's necessary. These readers really just provide a handy abstraction over querying them, and their lifecycle is never longer than the apply functions -- meaning that a caller couldn't try to free a tree or index in the middle of application (at least not on a single thread).
There was a problem hiding this comment.
I'd prefer going for safety here IMHO, just to make sure the first person to use git_reader in a "non-atomic" fashion doesn't get to debug lifecycle problems. Though a comment stating the implementation doesn't support that would be fine as well 😉.
There was a problem hiding this comment.
While not necessary now, I agree with @tiennou -- future proofing the API here might be sensible, as one will forget about this limitation and eventually be bitten by the limitation. The only thing keeping you from that is the additional required free function pointer, right?
| }; | ||
|
|
||
| extern int git_reader_for_nothing( | ||
| git_reader **out); |
There was a problem hiding this comment.
This does not have an implementation. Probably a leftover?
| error = git_reader_for_workdir(&pre_reader, repo, true); | ||
| else if (opts.location == GIT_APPLY_LOCATION_INDEX) | ||
| error = git_reader_for_index(&pre_reader, repo, NULL); | ||
| else |
There was a problem hiding this comment.
I'd prefer if we used a switch here. Like that, the compiler would throw a warning in case we ever add another enum
There was a problem hiding this comment.
Changed this to a switch.
| if (opts.location == GIT_APPLY_LOCATION_INDEX) | ||
| error = git_apply__to_index(repo, diff, preimage, postimage, &opts); | ||
| else | ||
| error = git_apply__to_workdir(repo, diff, preimage, postimage, &opts); |
There was a problem hiding this comment.
No need to handle BOTH here, I guess, because of the way we initialized the reader? I'd personally vote for a switch, as well
There was a problem hiding this comment.
I'm not excited about using a switch here, but I also don't care very much, so I changed it.
|
@ethomson could you give an approximate timeline for this? Many thanks! |
|
Further testing has revealed problems in the patch application, I don’t have an ETA. |
|
I've added per-delta and per-hunk callbacks. I'll go through the notes from the prior review (up to |
|
I've cherry-picked in @hackhaslam's tests and have fixed this up per the comments. |
FTFY. |
tiennou
left a comment
There was a problem hiding this comment.
I only reviewed the full diff because of the many fixups. No nitpicks were harmed in the writing of this review 😉.
| error = ctx->opts.hunk_cb(&hunk->hunk, ctx->opts.payload); | ||
|
|
||
| if (error) { | ||
| if (error > 0) { |
There was a problem hiding this comment.
I find this nested if confusing (especially given the implicit != 0 check).
if (error > 0) {
/* current code */
} else if (error < 0) {
goto done;
}
maybe ?
Also, GIT_EUSER comes to mind 😉.
There was a problem hiding this comment.
Why does GIT_EUSER come to mind?
There was a problem hiding this comment.
I might have mistook that for GIT_PASSTHROUGH, sorry.
| error = opts->delta_cb(delta, opts->payload); | ||
|
|
||
| if (error) { | ||
| if (error > 0) |
There was a problem hiding this comment.
I wrote up your suggestion, but I found it less readable than what's here now, tbqh. @pks-t can you break a tie?
There was a problem hiding this comment.
A:
if (error > 0) {
error = 0;
goto done;
} else if (error < 0) {
goto done;
}
B:
if (error) {
if (error > 0)
error = 0;
goto done;
}
There was a problem hiding this comment.
I think I prefer style B, which matches what you have right now. Style A is a bit more subtle, as the reader would have to spot that neither of the conditionals catches the case error == 0, which is not that easy to see. Style B is a lot more explicit, as one cannot miss the fact that positive error codes are being reset.
|
|
||
| reader->reader.read = tree_reader_read; | ||
| reader->reader.free = tree_reader_free; | ||
| reader->tree = tree; |
There was a problem hiding this comment.
I'd prefer going for safety here IMHO, just to make sure the first person to use git_reader in a "non-atomic" fashion doesn't get to debug lifecycle problems. Though a comment stating the implementation doesn't support that would be fine as well 😉.
|
I'm trying to accept your changes, and manually updating the commit message to be something that |
|
|
||
| /** | ||
| * Create a `git_reader` that will allow random access to the given | ||
| * index, or the repository's index. |
There was a problem hiding this comment.
| * index, or the repository's index. | |
| * repository index. |
|
Fixes #1980 |
|
BTW, I'd be interested in some kind of "report" on the usage of suggestions. They're quite easy to make, but I'd like to be sure that they're not a pain to re-fixup afterward (it sounds like it it). |
pks-t
left a comment
There was a problem hiding this comment.
Finally got around to it. Sorry for taking so long, the security releases kept me busy. :/
| * @param diff the diff to apply | ||
| * @param options the options for the apply (or null for defaults) | ||
| */ | ||
| GIT_EXTERN(int) git_apply_tree( |
There was a problem hiding this comment.
Would git_apply_to_tree be a better name? Right now it indicates that we try to apply a tree, while we in fact apply a diff to a tree.
| error = opts->delta_cb(delta, opts->payload); | ||
|
|
||
| if (error) { | ||
| if (error > 0) |
There was a problem hiding this comment.
I think I prefer style B, which matches what you have right now. Style A is a bit more subtle, as the reader would have to spot that neither of the conditionals catches the case error == 0, which is not that easy to see. Style B is a lot more explicit, as one cannot miss the fact that positive error codes are being reset.
|
|
||
| /* ENOTFOUND means the preimage was not found; apply failed. */ | ||
| if (error == GIT_ENOTFOUND) | ||
| error = GIT_EAPPLYFAIL; |
There was a problem hiding this comment.
I think it'd make sense to also have an error message here. You overwrite GIT_ENOTFOUND, so the caller has no way of knowing that we failed due to the file not existing. So we should at least set an error message:
if (error == GIT_ENOTFOUND)
error = apply_err("%s: file not found", delta->old_file.path);
There was a problem hiding this comment.
I do overwrite GIT_ENOTFOUND, but I don't overwrite the error message itself, which would have been set by the reader. This means that it's actually more explicit, since the existing error message distinguishes between the index and the working directory. eg, if it's not found in the index, it will say index does not contain 'foo'.
| } | ||
| } | ||
|
|
||
| if (delta->status == GIT_DELTA_DELETED) |
There was a problem hiding this comment.
Instead of skipping the rest of this function, you could make this code path similar to the previous one by using if (delta_status != GIT_DELTA_DELETED), moving remaining code into that condition's scope.
There was a problem hiding this comment.
Indeed, I do like that better, it matches nicely the condition and formatting above it.
| if ((error = git_reader_for_tree(&pre_reader, preimage)) < 0) | ||
| goto done; | ||
|
|
||
| /* put the current tree into the postimage as-is - the diff will |
| */ | ||
| extern int git_reader_for_index( | ||
| git_reader **out, | ||
| git_repository *repo); |
There was a problem hiding this comment.
Wouldn't it be more sensible to pass an explicit index instead of an implicit one? First, this would match the previous git_reader_for_tree and second it would potentially open up further usecases
| " in, first taking off their skins, by letting them stand a few minutes in\n" \ | ||
| "-hot water, when they may be easily peeled. When made in this way you\n" \ | ||
| "+Changed line.\n" \ | ||
| " must thicken it with the flour only. Any part of the veal may be used,\n" |
| idx_entry.mode = 0100644; | ||
| idx_entry.path = "asparagus.txt"; | ||
| cl_git_pass(git_oid_fromstr(&idx_entry.id, "06d3fefb8726ab1099acc76e02dfb85e034b2538")); | ||
| git_index_add(index, &idx_entry); |
| validate_apply_workdir(repo, workdir_expected, workdir_expected_cnt); | ||
|
|
||
| git_diff_free(diff); | ||
| } |
There was a problem hiding this comment.
If I didn't just skip over them, then renames are missing. Might be interesting to have different rename cases, too, like two-to-one, one-to-two and one-one (file1 -> file2, file2 -> file3).
Another test case would be the previous thing I mentioned, with having two patches for the same file in one application.
There was a problem hiding this comment.
I'll add some additional tests. When you say two-to-one, you mean a mangled patch that has two sources being renamed to the same target? And by one-to-two, another mangled patch, this one with a single source being renamed to two different targets? These seem like reasonable tests.
Optionally hash the contents of files encountered in the filesystem or working directory iterators. This is not expected to be used in production code paths, but may allow us to simplify some test contexts. For working directory iterators, apply filters as appropriate, since we have the context able to do it.
Create a test applying a patch with a rename and a modification of a file.
When applying to both the index and the working directory, ensure that the working directory's mode matches the index's mode. It's not sufficient to look only at the hashed object id to determine that the file is unchanged, git also takes the mode into account.
Deltas containing exact renames are special; they simple indicate that a file was renamed without providing additional metadata (like the filemode). Teach the reader to provide the file mode and use the preimage's filemode in the case that the delta does not provide one.)
Test that we can rename some file from B->C and then rename some other file from A->B. Do this with both exact rename patches (eg `rename from ...` / `rename to ...`) and patches that remove the files and replace them entirely.
Test a rename from A->B simultaneous with a rename from B->A.
Test that we can apply a patch that renames two different files to the same target filename. Git itself handles this scenario in a last-write wins, such that the rename listed last is the one persisted in the target. Ensure that we do the same.
Test that a patch can contain two deltas that appear to rename an initial source file to two different destination paths. Git creates both target files with the initial source contents; ensure that we do, too.
git allows a patch file to contain multiple deltas to the same file: although it does not produce files in this format itself, this could be the result of concatenating two different patch files that affected the same file. git apply behaves by applying this next delta to the existing postimage of the file. We should do the same. If we have previously seen a file, and produced a postimage for it, we will load that postimage and apply the current delta to that. If we have not, get the file from the preimage.
Ensure that we can apply a delta after renaming a file.
Multiple deltas can exist in a diff, and can be applied in-order. However if there exists a delta that renames a file, it must be first, so that other deltas can reference the resulting target file. git enforces this (`error: already exists in index`), so ensure that we do, too.
Multiple deltas can exist in a diff, and can be applied in-order. If there exists a delta that modifies a file followed by a delta that renames that file, then both will be captured. The modification delta will be applied and the resulting file will be staged with the original filename. The rename delta will be independently applied - to the original file (not the modified file from the original delta) and staged independently.
ee8a1ff to
da4ebe1
Compare
Ensure that we cannot modify a file after it's been renamed out of the way. If multiple deltas exist for a single path, ensure that we do not attempt to modify a file after it's been renamed out of the way. To support this, we must track the paths that have been removed or renamed; add to a string map when we remove a path and remove from the string map if we recreate a path. Validate that we are not applying to a path that is in this map, unless the delta is a rename, since git supports renaming one file to two different places in two different deltas. Further, test that we cannot apply a modification delta to a path that will be created in the future by a rename (a path that does not yet exist.)
Ensure that we can add a file back after it's been removed. Update the renamed/deleted validation in application to not apply to deltas that are adding files to support this.
da4ebe1 to
4e746d8
Compare
|
I went through some more tests - in particular, the idea that you couldn't modify a file in a delta and then rename it in a second delta bothered me. I realized that my test was malformed (I was trying to rename a file to a different, already existing file, hence the error message). Recreating the test validates that git does indeed let you update some file named I also added validation that we will not apply modification deltas to files after deleting them, and added several tests around strange behavior with multiple deltas. |
|
Thanks everybody for the reviews! 🚢 |
|
Exciting times! Thanks a lot, @ethomson! |
This pull request takes a
git_diffthat was either "generated" (ie, we calculated the difference between two commitish things) or "parsed" (we opened the output ofgit diffor a file that we created ourselves) and apply it in the same manner thatgit applydoes.In addition, it introduces
git_apply_treewhich will perform a fully in-memory patch application, applying agit_diffto agit_treeand returning agit_index.My intent is to get this to a minimal viable product that we can merge and then continue to iterate on. I think that this is a useful API even if we don't support all the options that
git applyitself does (eg, writing rejection hunks, etc). However, there are still a few things outstanding before I consider this mergeable:GIT_APPLY_LOCATION_INDEXGIT_APPLY_LOCATION_BOTH