Skip to content

Use a diff for iteration in index_update_all and index_add_all#3109

Merged
ethomson merged 5 commits into
masterfrom
cmn/index-use-diff
May 19, 2015
Merged

Use a diff for iteration in index_update_all and index_add_all#3109
ethomson merged 5 commits into
masterfrom
cmn/index-use-diff

Conversation

@carlosmn
Copy link
Copy Markdown
Member

Re-adding every entry means we hash data we already have hashed, which proves quite a bit slower than it has to be. We can know whether we need to re-calculate hashes by iterating over the entries a diff gives us, rather than every entry in the index.

This should resolve #2687

@ethomson
Copy link
Copy Markdown
Member

This is great. I'll look over this more deeply shortly, and I'm very excited to do so.

Until then, just a random thought that add_all is probably also not correct. If you had poorly configured line endings, this would re-filter what's in your working directory and may end up putting stuff into the index.

So this is very nice. Thanks for the nice monday morning surprise!

@carlosmn
Copy link
Copy Markdown
Member Author

Hm, that's an interesting point about the endings. Maybe we should take the crlf configuration into account in the preparation for the diff?

@carlosmn
Copy link
Copy Markdown
Member Author

Come to think of it, this new method is probably not correct for type changes, since it doesn't pass INCLUE_TYPECHANGE to diff.

@ethomson
Copy link
Copy Markdown
Member

Sorry, I was a little vague earlier, I meant the current add_all is probably dreadfully wrong. I would expect that a diff-based approach would take advantage of the stat cache and not even investigate things that it knows couldn't have changed. I was responding to your first commit's message that this is not a correctness change; I think that it is both correctness and perf. :)

@carlosmn carlosmn force-pushed the cmn/index-use-diff branch from 8c9adeb to 097ab0e Compare May 12, 2015 10:08
@carlosmn
Copy link
Copy Markdown
Member Author

It did later think you might have been talking about the current code. I've updated the commit message and added TYPECHANGE to the things for diff to tell us about.

carlosmn added 5 commits May 14, 2015 15:23
We currently iterate over all the entries and re-add them to the
index. While this provides correctness, it is wasteful as we try to
re-insert files which have not changed.

Instead, take a diff between the index and the worktree and only re-add
those which we already know have changed.
Refactor so we look like the code we're replacing, which should also
allow us to more easily inplement add-all.
Instead of going through each entry we have and re-adding, which may not
even be correct for certain crlf options and has bad performance, use
the function which performs a diff against the worktree and try to add
and remove files from that list.
Without this option, we would not be able to catch exec bit changes.
@carlosmn carlosmn force-pushed the cmn/index-use-diff branch from 097ab0e to 874cc35 Compare May 14, 2015 13:24
ethomson added a commit that referenced this pull request May 19, 2015
Use a diff for iteration in index_update_all and index_add_all
@ethomson ethomson merged commit acc573c into master May 19, 2015
@carlosmn carlosmn deleted the cmn/index-use-diff branch June 23, 2015 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

git_index_add_all() and git_index_update_all() are about 25X slower than git CLT

2 participants