Use a diff for iteration in index_update_all and index_add_all by carlosmn · Pull Request #3109 · libgit2/libgit2

carlosmn · 2015-05-11T14:18:17Z

Re-adding every entry means we hash data we already have hashed, which proves quite a bit slower than it has to be. We can know whether we need to re-calculate hashes by iterating over the entries a diff gives us, rather than every entry in the index.

This should resolve #2687

ethomson · 2015-05-11T14:34:23Z

This is great. I'll look over this more deeply shortly, and I'm very excited to do so.

Until then, just a random thought that add_all is probably also not correct. If you had poorly configured line endings, this would re-filter what's in your working directory and may end up putting stuff into the index.

So this is very nice. Thanks for the nice monday morning surprise!

carlosmn · 2015-05-11T14:37:12Z

Hm, that's an interesting point about the endings. Maybe we should take the crlf configuration into account in the preparation for the diff?

carlosmn · 2015-05-11T19:25:49Z

Come to think of it, this new method is probably not correct for type changes, since it doesn't pass INCLUE_TYPECHANGE to diff.

ethomson · 2015-05-11T19:35:28Z

Sorry, I was a little vague earlier, I meant the current add_all is probably dreadfully wrong. I would expect that a diff-based approach would take advantage of the stat cache and not even investigate things that it knows couldn't have changed. I was responding to your first commit's message that this is not a correctness change; I think that it is both correctness and perf. :)

carlosmn · 2015-05-12T10:08:49Z

It did later think you might have been talking about the current code. I've updated the commit message and added TYPECHANGE to the things for diff to tell us about.

We currently iterate over all the entries and re-add them to the index. While this provides correctness, it is wasteful as we try to re-insert files which have not changed. Instead, take a diff between the index and the worktree and only re-add those which we already know have changed.

Refactor so we look like the code we're replacing, which should also allow us to more easily inplement add-all.

Instead of going through each entry we have and re-adding, which may not even be correct for certain crlf options and has bad performance, use the function which performs a diff against the worktree and try to add and remove files from that list.

Without this option, we would not be able to catch exec bit changes.

Use a diff for iteration in index_update_all and index_add_all

carlosmn force-pushed the cmn/index-use-diff branch from 8c9adeb to 097ab0e Compare May 12, 2015 10:08

carlosmn added 5 commits May 14, 2015 15:23

index: refactor diff-based update_all to match other applies

197307f

Refactor so we look like the code we're replacing, which should also allow us to more easily inplement add-all.

index: make add_all to act on a diff

0a78a52

Instead of going through each entry we have and re-adding, which may not even be correct for certain crlf options and has bad performance, use the function which performs a diff against the worktree and try to add and remove files from that list.

index: include TYPECHANGE in the diff

2b2dfe8

Without this option, we would not be able to catch exec bit changes.

index: add a CHANGELOG entry for the diff usage

874cc35

carlosmn force-pushed the cmn/index-use-diff branch from 097ab0e to 874cc35 Compare May 14, 2015 13:24

ethomson added a commit that referenced this pull request May 19, 2015

Merge pull request #3109 from libgit2/cmn/index-use-diff

acc573c

Use a diff for iteration in index_update_all and index_add_all

ethomson merged commit acc573c into master May 19, 2015

ethomson mentioned this pull request May 19, 2015

Include conflicts when diffing #3139

Merged

carlosmn deleted the cmn/index-use-diff branch June 23, 2015 14:11

aral mentioned this pull request Oct 13, 2015

libgit2 bug with workaround in index.addAll() is now fixed, we should update index.js nodegit/nodegit#751

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use a diff for iteration in index_update_all and index_add_all#3109

Use a diff for iteration in index_update_all and index_add_all#3109
ethomson merged 5 commits into
masterfrom
cmn/index-use-diff

carlosmn commented May 11, 2015

Uh oh!

ethomson commented May 11, 2015

Uh oh!

carlosmn commented May 11, 2015

Uh oh!

carlosmn commented May 11, 2015

Uh oh!

ethomson commented May 11, 2015

Uh oh!

carlosmn commented May 12, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

carlosmn commented May 11, 2015

Uh oh!

ethomson commented May 11, 2015

Uh oh!

carlosmn commented May 11, 2015

Uh oh!

carlosmn commented May 11, 2015

Uh oh!

ethomson commented May 11, 2015

Uh oh!

carlosmn commented May 12, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants