revwalk: avoid walking the entire history when output is unsorted by carlosmn · Pull Request #4606 · libgit2/libgit2

carlosmn · 2018-04-01T12:12:06Z

As part of reducing our divergence from git, its code for revwalk was ported
into our codebase. A detail about when to limit the list was lost and we ended
up always calling that code.

Limiting the list means performing the walk and creating the final list of
commits to be output during the preparation stage. This is unavoidable when
sorting and when there are negative refs.

We did this even when asked for unsorted output with no negative refs, which you
might do to retrieve something like the "last 10 commits on HEAD" for a
nominally unsorted meaning of "last".

This commit adds and sets a flag indicating when we do need to limit the list,
letting us avoid doing so when we can. The previously mentioned query thus no
longer loads the entire history of the project during the prepare stage, but
loads it iteratively during the walk.
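The difference between a limited and an unlimited walk can be seen in a small self-contained model. This is a hypothetical sketch, not libgit2's code. It assumes a purely linear history where commit `i` has parent `i - 1`, and it counts a "load" each time a commit would be read from the object database. It also ignores the extra parents a real walk queues up. The names `toy_repo`, `toy_walk_limited` and `toy_walk_lazy` are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical toy model, not libgit2's code: commit i has parent i - 1,
 * so commit ncommits - 1 is HEAD and commit 0 is the root. "Loading" a
 * commit stands in for reading it from the object database. */
typedef struct {
    size_t ncommits;  /* length of the linear history */
    size_t loaded;    /* how many commits have been "read" so far */
} toy_repo;

static void toy_load(toy_repo *repo) { repo->loaded++; }

/* Limited walk: the prepare stage reads the whole history up front and
 * only then hands out the first `want` commits (newest first). */
static size_t toy_walk_limited(toy_repo *repo, size_t want, size_t *out)
{
    size_t i, n = 0;
    for (i = repo->ncommits; i-- > 0; )
        toy_load(repo);
    for (i = repo->ncommits; i-- > 0 && n < want; )
        out[n++] = i;
    return n;
}

/* Unlimited walk: each step reads only the commit it returns. */
static size_t toy_walk_lazy(toy_repo *repo, size_t want, size_t *out)
{
    size_t n = 0, next = repo->ncommits;
    while (n < want && next > 0) {
        next--;
        toy_load(repo);
        out[n++] = next;
    }
    return n;
}
```

With a 1000-commit history and a request for 10 commits, the limited variant loads all 1000 commits while the lazy one loads 10. Both return the same commits, 999 down to 990.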


pks-t

Cool! Out of curiosity: have you done any benchmarks by how much this actually speeds up listings?

pks-t · 2018-04-03T12:01:55Z

+	}
+}
+
+static int get_revision(git_commit_list_node **out, git_revwalk *walk, git_commit_list **list)


There's not really a difference from get_one_revision right now, is there?

Right now it just calls the lower-level function, but extra features would get built here.

I'd say we should avoid having two equivalent functions for now, though. The code can still easily be extended later when those extra features are implemented.

pks-t · 2018-04-03T12:05:06Z

+	int error;
+	git_commit_list_node *commit;
+
+	while(true) {


This loop is useless right now: we always return during the first iteration.

Yeah, part of the porting. Here is where simplification handling happens in git. It's probably worth removing it to reduce confusion.
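The core of the unlimited path is that popping a commit is also the moment its parents get discovered, instead of that happening during the prepare stage. This is a minimal sketch under stated assumptions: the types and `toy_get_revision` are hypothetical, the queue is plain FIFO rather than ordered by commit date, and the fixed 64-slot queue assumes a small graph. When `limited` is set, the function does not add parents, reflecting the idea that the prepare stage has already produced the full list.

```c
#include <stddef.h>

/* Hypothetical toy types, not libgit2's: each commit has up to two
 * parents (-1 = none) and a "seen" mark so merge bases are queued once. */
typedef struct {
    int parents[2];
    int seen;
} toy_commit;

typedef struct {
    toy_commit *commits;
    int queue[64];    /* pending commit ids, FIFO; capacity assumed ample */
    size_t head, tail;
    int limited;      /* 1 if the prepare stage already built the list */
} toy_walk;

/* Pop one pending commit. In an unlimited walk, its parents are queued
 * right here, so history is read only as far as the caller iterates. */
static int toy_get_revision(toy_walk *w, int *out)
{
    int id, i;
    if (w->head == w->tail)
        return -1;                       /* iteration is over */
    id = w->queue[w->head++];
    if (!w->limited) {
        for (i = 0; i < 2; i++) {
            int p = w->commits[id].parents[i];
            if (p >= 0 && !w->commits[p].seen) {
                w->commits[p].seen = 1;
                w->queue[w->tail++] = p;
            }
        }
    }
    *out = id;
    return 0;
}
```

For a merge commit 3 with parents 1 and 2, which both descend from root 0, the walk yields 3, 1, 2, 0 and only queues commit 0 once.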

pks-t · 2018-04-03T12:13:22Z

 	}
+
+	if (sort_mode != GIT_SORT_NONE)
+		walk->limited = 1;


This has the limitation that we will stay limited if somebody switches the sorting back to GIT_SORT_NONE afterwards. Handling that case correctly would need another flag, though, so I guess that's okay.

Handling that would need something like `else if (!walk->did_hide)`, which is not too bad, but it would be quite an edge case where you're second-guessing what sorting you want for a single walk.
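The trade-off discussed here can be modelled with two flags. This is a hypothetical sketch: `toy_walker`, its fields and the `toy_*` functions are stand-ins named after the `limited` flag and the `did_hide` check mentioned in the review. `toy_hide` assumes that hiding a commit both records the hide and forces a limited walk, since negative refs require limiting.

```c
/* Hypothetical model, not libgit2's struct: only the two flags named in
 * the review are kept. */
typedef enum { TOY_SORT_NONE = 0, TOY_SORT_TIME = 1 } toy_sort;

typedef struct {
    int limited;   /* prepare stage must walk the whole history */
    int did_hide;  /* a negative ref (hidden commit) was pushed */
} toy_walker;

/* Assumption: hiding a commit always forces a limited walk. */
static void toy_hide(toy_walker *w)
{
    w->did_hide = 1;
    w->limited = 1;
}

/* As in the diff under review: asking for any sorting sets the flag,
 * and nothing clears it again. */
static void toy_sorting_sticky(toy_walker *w, toy_sort mode)
{
    if (mode != TOY_SORT_NONE)
        w->limited = 1;
}

/* The alternative floated in the review: go back to an unlimited walk
 * when unsorted output is requested and nothing has been hidden. */
static void toy_sorting_resettable(toy_walker *w, toy_sort mode)
{
    if (mode != TOY_SORT_NONE)
        w->limited = 1;
    else if (!w->did_hide)
        w->limited = 0;
}
```

Switching from time sorting back to no sorting leaves the sticky variant limited, while the resettable one becomes unlimited again unless a commit was hidden. The second variant is the extra bookkeeping the review decided to skip; a follow-up PR (#4688) later dealt with the interaction between the limited flag and sorting over resets.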

pks-t · 2018-04-03T12:19:00Z

+	return error;
 }

 static int revwalk_next_reverse(git_commit_list_node **object_out, git_revwalk *walk)


Wouldn't revwalk_next_reverse need the get_revision treatment, as well?

The reverse iterator does the full walk itself in prepare_walk by iterating via whatever other iterator we have set up and inserts them in this iterator.

We could potentially convert it into the same kind of loop, but, as with the pure time sort, we already have everything in the list.
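The reason reverse output gains nothing from this change can be shown in a sketch. To hand out the oldest commit first, the prepare step has to drain the underlying iterator completely. The callback type, the `toy_counter` source and both `toy_*` functions are hypothetical; the underlying iterator is simulated by a counter that yields ids from HEAD down to 0.

```c
#include <stddef.h>

/* Hypothetical stand-in for whichever iterator the walker is set up with:
 * returns 0 and stores the next commit id in *out, or -1 when exhausted. */
typedef int (*toy_next_fn)(void *state, int *out);

/* Simulated underlying walk: yields ids next_id, next_id - 1, ..., 0. */
typedef struct {
    int next_id;
    int calls;    /* how many times the iterator was asked for a commit */
} toy_counter;

static int toy_counter_next(void *state, int *out)
{
    toy_counter *c = state;
    c->calls++;
    if (c->next_id < 0)
        return -1;
    *out = c->next_id--;
    return 0;
}

/* Prepare step for reverse output: exhaust the underlying iterator and
 * buffer every commit before the first one can be returned. The buffer
 * capacity is assumed to be large enough for the whole history. */
static size_t toy_prepare_reverse(toy_next_fn next, void *state,
                                  int *buf, size_t cap)
{
    size_t n = 0;
    int id;
    while (n < cap && next(state, &id) == 0)
        buf[n++] = id;
    return n;
}

/* Reverse "next": hand the buffered commits back, last one first. */
static int toy_next_reverse(const int *buf, size_t *remaining, int *out)
{
    if (*remaining == 0)
        return -1;
    *out = buf[--*remaining];
    return 0;
}
```

Even if the caller only wants the first reversed commit, the underlying iterator has already been asked for every commit in the history.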

carlosmn · 2018-04-03T14:37:35Z

> have you done any benchmarks of how much this actually speeds up listings?

That depends on how big your history is. With this change, an "unsorted" walk without negative refs will only read as many commits as you walk down (plus some overhead for the parents that end up in the list) instead of reading the whole history.

My benchmark was to load up the GitHub repository and call git_revwalk_next 11 times. Without the patch this takes about 7 s; with it, about 20 ms.


carlosmn · 2018-04-12T00:55:26Z

I just pushed the simplification that removes the extra layer of function calls that didn't do anything. I think it'll be fine to skip trying to guess whether we can disable limiting when the sorting method is changed back to NONE.

ethomson · 2018-04-16T14:25:36Z

+	int error;
+	git_commit_list_node *commit;
+
+  commit = git_commit_list_pop(list);


Could you clean up the formatting in this function?

ethomson · 2018-06-18T11:10:14Z

Since @carlosmn updated this to address @pks-t's concerns, I went ahead and fixed up the formatting problems myself.

nebosite · 2018-12-18T20:48:07Z

I have a really big repository (many GB), and repository.Commits.Take(100) takes around 200 seconds to run. Compare that to git log -n 100, which takes about 1 second. The culprit is this line in libgit2sharp:

        int res = NativeMethods.git_revwalk_next(out ret, walker);

On the same repository, Diff.Compare(oldTree, newTree) takes over 200 seconds in libgit2sharp, but only 1 second from the commandline.

pks-t · 2018-12-19T09:22:31Z

@nebosite: Which version of libgit2 were you testing with? Did it already include the changes from this PR? If so, you should probably create another issue with a simple reproducer that shows the problem.

nebosite · 2018-12-28T22:54:46Z

I was using a same-day clone of master. I'm not sure I can reproduce this without a really large repo. I'm guessing the behavior might be exponential.

pks-t requested changes Apr 3, 2018


pks-t added the feedback provided label Apr 3, 2018

carlosmn mentioned this pull request Apr 5, 2018

git log -15 is slow libgit2/libgit2sharp#1558

Closed

revwalk: remove one useless layer of functions

ef68241

We don't currently need to have anything that's different between `get_revision` and `get_one_revision` so let's just remove the inner function and make the code more straightforward.

ethomson reviewed Apr 16, 2018


revwalk: formatting updates

ff98fec

ethomson merged commit cc9c952 into master Jun 18, 2018

mystor mentioned this pull request Jun 18, 2018

Fix interaction between limited flag and sorting over resets #4688

Merged

rgburke mentioned this pull request Jul 14, 2018

Limit the number of commits in list rgburke/grv#73

Closed

carlosmn deleted the cmn/revwalk-iteration branch July 23, 2018 15:35

carlosmn mentioned this pull request Jul 23, 2018

git_remote_fetch is slow #4736

Closed

snyk-bot mentioned this pull request Feb 23, 2020

[Snyk] Upgrade nodegit from 0.4.1 to 0.26.4 saurabharch/Breezeblocks#1

Open

snyk-bot mentioned this pull request Apr 22, 2020

[Snyk] Upgrade nodegit from 0.24.3 to 0.26.5 aminatakonate000/Graviton-App#4

Open

snyk-bot mentioned this pull request May 5, 2020

[Snyk] Upgrade nodegit from 0.24.3 to 0.26.5 Barnstorm-Online/ngp-openapi-generator#1

Open
