stash: reuse cached index entries in --patch temporary index#2306
stash: reuse cached index entries in --patch temporary index#2306adamchainz wants to merge 1 commit into
Conversation
d8438ee to
b228160
Compare
|
/preview |
|
Preview email sent as pull.2306.git.git.1779194003106.gitgitgadget@gmail.com |
|
/submit |
|
Submitted as pull.2306.git.git.1779194605735.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
|
This patch series was integrated into seen via 15bd84f. |
|
Junio C Hamano wrote on the Git mailing list (how to reply to this email): "Adam Johnson via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Adam Johnson <me@adamj.eu>
>
> `git stash -p` prepares the interactive selection by creating a
> temporary index at HEAD, switching `GIT_INDEX_FILE` to it, and then
> running the `add -p` machinery.
>
> That temporary index was created by running `git read-tree HEAD`. The
> resulting index had no useful cached stat data or fsmonitor-valid bits
> from the real index. When `run_add_p()` refreshed that temporary index
> before showing the first prompt, it could end up lstat(2)-ing every
> tracked file, even in a repository where `git diff` and `git restore -p`
> can use fsmonitor to avoid that work.
>
> Create the temporary index in-process instead. Use `unpack_trees()` to
> reset the real index contents to HEAD while writing the result to the
> temporary index path. For paths whose index entries already match HEAD,
> `oneway_merge()` reuses the existing cache entries, preserving their
> cached stat data and `CE_FSMONITOR_VALID` state.
Clever. As the fsmonitor_valid bit is in-core only, updating the
index in-process would be an obvious and probably the only sensible
way to preserve it.
I however have to wonder if simply replacing the external process
invocation with "git read-tree -m HEAD" (i.e., oneway merge) gives
a similar speed-up.
> This makes the refresh performed by `run_add_p()` behave like the one
> used by `git restore -p`: unchanged paths can be skipped via fsmonitor
> instead of being scanned again.
>
> In a 206k file repository with `core.fsmonitor` enabled and a one-line
> change in one file, time to first prompt dropped from 34.774 seconds to
> 0.659 seconds.
Interesting.
> diff --git a/t/t3904-stash-patch.sh b/t/t3904-stash-patch.sh
> index 90a4ff2c10..4b3241c8cd 100755
> --- a/t/t3904-stash-patch.sh
> +++ b/t/t3904-stash-patch.sh
> @@ -84,6 +84,24 @@ test_expect_success 'none of this moved HEAD' '
> verify_saved_head
> '
>
> +test_expect_success 'stash -p with unmodified tracked files present' '
> + git reset --hard &&
> + echo line1 >alpha &&
> + echo line1 >beta &&
> + git add alpha beta &&
> + git commit -m "add alpha and beta" &&
> + echo line2 >>alpha &&
> + echo y | git stash -p &&
> + echo line1 >expect &&
> + test_cmp expect alpha &&
> + test_cmp expect beta &&
> + git stash pop &&
> + printf "line1\nline2\n" >expect &&
> + test_cmp expect alpha &&
> + echo line1 >expect &&
> + test_cmp expect beta
> +'
What I read from the proposed log message is that the change is
purely about performance and should not change any behaviour. Why
do we need a new test in t/t3904? I would not have surprised if we
saw a new test in t/perf/, though.
Thanks. |
|
Junio C Hamano wrote on the Git mailing list (how to reply to this email): "Adam Johnson via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 2 files changed, 83 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/stash.c b/builtin/stash.c
> index 32dbc97b47..48189cb9f7 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -372,6 +372,57 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
> return 0;
> }
>
> +static int create_index_from_tree(const struct object_id *tree_id,
> + const char *index_path)
> +{
> + int nr_trees = 1;
> + int ret = 0;
> + struct unpack_trees_options opts;
> + struct tree_desc t[MAX_UNPACK_TREES];
> + struct tree *tree;
> + struct index_state dst_istate = INDEX_STATE_INIT(the_repository);
> + struct lock_file lock_file = LOCK_INIT;
> +
> + repo_read_index_preload(the_repository, NULL, 0);
> + if (refresh_index(the_repository->index, REFRESH_QUIET, NULL, NULL, NULL))
> + return -1;
Is this "non-zero return from refresh_index() leads to a failure"
intended? The old "git read-tree HEAD" wouldn't have cared if the
original index were unmerged, for example, but with this update, we
will see an immediate failure. There are other conditions that
refresh_index() flips its local variable has_errors on, which leads
to its non-zero return.
Since "git stash -p" is almost always invoked when the user has
unstaged modifications, I am not sure allowing refresh_index() to
notice and barf is what we want here. |
|
This patch series was integrated into seen via f9c94c1. |
|
This branch is now known as |
|
This patch series was integrated into seen via f7a955d. |
|
This patch series was integrated into seen via c691c15. |
lisamarielfabian
left a comment
There was a problem hiding this comment.
62c2cda8da6d82ae578064b1ae8c3dbed59eb012
|
This patch series was integrated into seen via 1cab94a. |
`git stash -p` prepares the interactive selection by creating a temporary index at HEAD, switching `GIT_INDEX_FILE` to it, and then running the `add -p` machinery. That temporary index was created by running `git read-tree HEAD`. The resulting index had no useful cached stat data or fsmonitor-valid bits from the real index. When `run_add_p()` refreshed that temporary index before showing the first prompt, it could end up lstat(2)-ing every tracked file, even in a repository where `git diff` and `git restore -p` can use fsmonitor to avoid that work. Create the temporary index in-process instead. Use `unpack_trees()` to reset the real index contents to HEAD while writing the result to the temporary index path. For paths whose index entries already match HEAD, `oneway_merge()` reuses the existing cache entries, preserving their cached stat data and `CE_FSMONITOR_VALID` state. This makes the refresh performed by `run_add_p()` behave like the one used by `git restore -p`: unchanged paths can be skipped via fsmonitor instead of being scanned again. In a 206k file repository with `core.fsmonitor` enabled and a one-line change in one file, time to first prompt dropped from 34.774 seconds to 0.659 seconds. The new perf test file demonstrates similar improvements, with maen times for without- and with-fsmonitor cases dropping from 6.90 and 6.83 seconds to 0.55 and 0.28 seconds, respectively. Signed-off-by: Adam Johnson <me@adamj.eu>
b228160 to
8785572
Compare
|
"Adam Johnson" wrote on the Git mailing list (how to reply to this email): > I however have to wonder if simply replacing the external process
> invocation with "git read-tree -m HEAD" (i.e., oneway merge) gives
> a similar speed-up.
Good idea, I just tried this, but it does not help. The subprocess runs
with GIT_INDEX_FILE set to a temporary index, so oneway_merge
never uses the CE_FSMONITOR_VALID fast path.
The in-process approach is necessary because it lets us set
opts.src_index to the real index with cached stat and fsmonitor data,
before switching GIT_INDEX_FILE.
> What I read from the proposed log message is that the change is
> purely about performance and should not change any behaviour. Why
> do we need a new test in t/t3904? I would not have surprised if we
> saw a new test in t/perf/, though.
Ah yeah, my bad. I added this while iterating to catch a bug I introduced,
but it's not necessary for the final patch. Will remove.
On Wed, 20 May 2026, at 03:08, Junio C Hamano wrote:
> "Adam Johnson via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Adam Johnson <me@adamj.eu>
> >
> > `git stash -p` prepares the interactive selection by creating a
> > temporary index at HEAD, switching `GIT_INDEX_FILE` to it, and then
> > running the `add -p` machinery.
> >
> > That temporary index was created by running `git read-tree HEAD`. The
> > resulting index had no useful cached stat data or fsmonitor-valid bits
> > from the real index. When `run_add_p()` refreshed that temporary index
> > before showing the first prompt, it could end up lstat(2)-ing every
> > tracked file, even in a repository where `git diff` and `git restore -p`
> > can use fsmonitor to avoid that work.
> >
> > Create the temporary index in-process instead. Use `unpack_trees()` to
> > reset the real index contents to HEAD while writing the result to the
> > temporary index path. For paths whose index entries already match HEAD,
> > `oneway_merge()` reuses the existing cache entries, preserving their
> > cached stat data and `CE_FSMONITOR_VALID` state.
>
> Clever. As the fsmonitor_valid bit is in-core only, updating the
> index in-process would be an obvious and probably the only sensible
> way to preserve it.
>
> I however have to wonder if simply replacing the external process
> invocation with "git read-tree -m HEAD" (i.e., oneway merge) gives
> a similar speed-up.
>
> > This makes the refresh performed by `run_add_p()` behave like the one
> > used by `git restore -p`: unchanged paths can be skipped via fsmonitor
> > instead of being scanned again.
> >
> > In a 206k file repository with `core.fsmonitor` enabled and a one-line
> > change in one file, time to first prompt dropped from 34.774 seconds to
> > 0.659 seconds.
>
> Interesting.
>
> > diff --git a/t/t3904-stash-patch.sh b/t/t3904-stash-patch.sh
> > index 90a4ff2c10..4b3241c8cd 100755
> > --- a/t/t3904-stash-patch.sh
> > +++ b/t/t3904-stash-patch.sh
> > @@ -84,6 +84,24 @@ test_expect_success 'none of this moved HEAD' '
> > verify_saved_head
> > '
> >
> > +test_expect_success 'stash -p with unmodified tracked files present' '
> > + git reset --hard &&
> > + echo line1 >alpha &&
> > + echo line1 >beta &&
> > + git add alpha beta &&
> > + git commit -m "add alpha and beta" &&
> > + echo line2 >>alpha &&
> > + echo y | git stash -p &&
> > + echo line1 >expect &&
> > + test_cmp expect alpha &&
> > + test_cmp expect beta &&
> > + git stash pop &&
> > + printf "line1\nline2\n" >expect &&
> > + test_cmp expect alpha &&
> > + echo line1 >expect &&
> > + test_cmp expect beta
> > +'
>
> What I read from the proposed log message is that the change is
> purely about performance and should not change any behaviour. Why
> do we need a new test in t/t3904? I would not have surprised if we
> saw a new test in t/perf/, though.
>
> Thanks.
> |
|
"Adam Johnson" wrote on the Git mailing list (how to reply to this email): > Is this "non-zero return from refresh_index() leads to a failure"
> intended?
Good catch, it’s not needed. Removing, we can make the call
unconditional.
On Wed, 20 May 2026, at 03:26, Junio C Hamano wrote:
> "Adam Johnson via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > 2 files changed, 83 insertions(+), 6 deletions(-)
> >
> > diff --git a/builtin/stash.c b/builtin/stash.c
> > index 32dbc97b47..48189cb9f7 100644
> > --- a/builtin/stash.c
> > +++ b/builtin/stash.c
> > @@ -372,6 +372,57 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
> > return 0;
> > }
> >
> > +static int create_index_from_tree(const struct object_id *tree_id,
> > + const char *index_path)
> > +{
> > + int nr_trees = 1;
> > + int ret = 0;
> > + struct unpack_trees_options opts;
> > + struct tree_desc t[MAX_UNPACK_TREES];
> > + struct tree *tree;
> > + struct index_state dst_istate = INDEX_STATE_INIT(the_repository);
> > + struct lock_file lock_file = LOCK_INIT;
> > +
> > + repo_read_index_preload(the_repository, NULL, 0);
> > + if (refresh_index(the_repository->index, REFRESH_QUIET, NULL, NULL, NULL))
> > + return -1;
>
> Is this "non-zero return from refresh_index() leads to a failure"
> intended? The old "git read-tree HEAD" wouldn't have cared if the
> original index were unmerged, for example, but with this update, we
> will see an immediate failure. There are other conditions that
> refresh_index() flips its local variable has_errors on, which leads
> to its non-zero return.
>
> Since "git stash -p" is almost always invoked when the user has
> unstaged modifications, I am not sure allowing refresh_index() to
> notice and barf is what we want here.
> |
|
/submit |
|
Submitted as pull.2306.v2.git.git.1779491545531.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
|
This patch series was integrated into seen via d5c7bc1. |
|
There was a status update in the "Cooking" section about the branch "git stash -p" has been optimized by reusing cached index entries in its temporary index, avoiding unnecessary lstat() calls on unchanged files. source: <pull.2306.git.git.1779194605735.gitgitgadget@gmail.com> |
CC: Thomas Gummerer t.gummerer@gmail.com, Elijah Newren newren@gmail.com, Phillip Wood phillip.wood@dunelm.org.uk, Victoria Dye vdye@github.com