Skip to content

stash: reuse cached index entries in --patch temporary index#2306

Open
adamchainz wants to merge 1 commit into
git:masterfrom
adamchainz:aj/optimize-stash-patch
Open

stash: reuse cached index entries in --patch temporary index#2306
adamchainz wants to merge 1 commit into
git:masterfrom
adamchainz:aj/optimize-stash-patch

Conversation

@adamchainz
Copy link
Copy Markdown
Contributor

CC: Thomas Gummerer t.gummerer@gmail.com, Elijah Newren newren@gmail.com, Phillip Wood phillip.wood@dunelm.org.uk, Victoria Dye vdye@github.com

@adamchainz
Copy link
Copy Markdown
Contributor Author

/preview

@gitgitgadget-git
Copy link
Copy Markdown

Preview email sent as pull.2306.git.git.1779194003106.gitgitgadget@gmail.com

@adamchainz
Copy link
Copy Markdown
Contributor Author

/submit

@gitgitgadget-git
Copy link
Copy Markdown

Submitted as pull.2306.git.git.1779194605735.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-2306/adamchainz/aj/optimize-stash-patch-v1

To fetch this version to local tag pr-git-2306/adamchainz/aj/optimize-stash-patch-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-2306/adamchainz/aj/optimize-stash-patch-v1

@gitgitgadget-git
Copy link
Copy Markdown

This patch series was integrated into seen via 15bd84f.

@gitgitgadget-git
Copy link
Copy Markdown

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Adam Johnson via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Adam Johnson <me@adamj.eu>
>
> `git stash -p` prepares the interactive selection by creating a
> temporary index at HEAD, switching `GIT_INDEX_FILE` to it, and then
> running the `add -p` machinery.
>
> That temporary index was created by running `git read-tree HEAD`.  The
> resulting index had no useful cached stat data or fsmonitor-valid bits
> from the real index.  When `run_add_p()` refreshed that temporary index
> before showing the first prompt, it could end up lstat(2)-ing every
> tracked file, even in a repository where `git diff` and `git restore -p`
> can use fsmonitor to avoid that work.
>
> Create the temporary index in-process instead.  Use `unpack_trees()` to
> reset the real index contents to HEAD while writing the result to the
> temporary index path.  For paths whose index entries already match HEAD,
> `oneway_merge()` reuses the existing cache entries, preserving their
> cached stat data and `CE_FSMONITOR_VALID` state.

Clever.  As the fsmonitor_valid bit is in-core only, updating the
index in-process would be an obvious and probably the only sensible
way to preserve it.

I however have to wonder if simply replacing the external process
invocation with "git read-tree -m HEAD" (i.e., oneway merge) gives
a similar speed-up.

> This makes the refresh performed by `run_add_p()` behave like the one
> used by `git restore -p`: unchanged paths can be skipped via fsmonitor
> instead of being scanned again.
>
> In a 206k file repository with `core.fsmonitor` enabled and a one-line
> change in one file, time to first prompt dropped from 34.774 seconds to
> 0.659 seconds.

Interesting.

> diff --git a/t/t3904-stash-patch.sh b/t/t3904-stash-patch.sh
> index 90a4ff2c10..4b3241c8cd 100755
> --- a/t/t3904-stash-patch.sh
> +++ b/t/t3904-stash-patch.sh
> @@ -84,6 +84,24 @@ test_expect_success 'none of this moved HEAD' '
>  	verify_saved_head
>  '
>  
> +test_expect_success 'stash -p with unmodified tracked files present' '
> +	git reset --hard &&
> +	echo line1 >alpha &&
> +	echo line1 >beta &&
> +	git add alpha beta &&
> +	git commit -m "add alpha and beta" &&
> +	echo line2 >>alpha &&
> +	echo y | git stash -p &&
> +	echo line1 >expect &&
> +	test_cmp expect alpha &&
> +	test_cmp expect beta &&
> +	git stash pop &&
> +	printf "line1\nline2\n" >expect &&
> +	test_cmp expect alpha &&
> +	echo line1 >expect &&
> +	test_cmp expect beta
> +'

What I read from the proposed log message is that the change is
purely about performance and should not change any behaviour.  Why
do we need a new test in t/t3904?  I would not have surprised if we
saw a new test in t/perf/, though.

Thanks.

@gitgitgadget-git
Copy link
Copy Markdown

Junio C Hamano wrote on the Git mailing list (how to reply to this email):

"Adam Johnson via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  2 files changed, 83 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/stash.c b/builtin/stash.c
> index 32dbc97b47..48189cb9f7 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -372,6 +372,57 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
>  	return 0;
>  }
>  
> +static int create_index_from_tree(const struct object_id *tree_id,
> +				  const char *index_path)
> +{
> +	int nr_trees = 1;
> +	int ret = 0;
> +	struct unpack_trees_options opts;
> +	struct tree_desc t[MAX_UNPACK_TREES];
> +	struct tree *tree;
> +	struct index_state dst_istate = INDEX_STATE_INIT(the_repository);
> +	struct lock_file lock_file = LOCK_INIT;
> +
> +	repo_read_index_preload(the_repository, NULL, 0);
> +	if (refresh_index(the_repository->index, REFRESH_QUIET, NULL, NULL, NULL))
> +		return -1;

Is this "non-zero return from refresh_index() leads to a failure"
intended?  The old "git read-tree HEAD" wouldn't have cared if the
original index were unmerged, for example, but with this update, we
will see an immediate failure.  There are other conditions that
refresh_index() flips its local variable has_errors on, which leads
to its non-zero return.

Since "git stash -p" is almost always invoked when the user has
unstaged modifications, I am not sure allowing refresh_index() to
notice and barf is what we want here.

@gitgitgadget-git
Copy link
Copy Markdown

This patch series was integrated into seen via f9c94c1.

@gitgitgadget-git
Copy link
Copy Markdown

This branch is now known as aj/stash-patch-optimize-temporary-index.

@gitgitgadget-git
Copy link
Copy Markdown

This patch series was integrated into seen via f7a955d.

@gitgitgadget-git
Copy link
Copy Markdown

This patch series was integrated into seen via c691c15.

Copy link
Copy Markdown

@lisamarielfabian lisamarielfabian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

62c2cda8da6d82ae578064b1ae8c3dbed59eb012

@gitgitgadget-git
Copy link
Copy Markdown

This patch series was integrated into seen via 1cab94a.

`git stash -p` prepares the interactive selection by creating a
temporary index at HEAD, switching `GIT_INDEX_FILE` to it, and then
running the `add -p` machinery.

That temporary index was created by running `git read-tree HEAD`.  The
resulting index had no useful cached stat data or fsmonitor-valid bits
from the real index.  When `run_add_p()` refreshed that temporary index
before showing the first prompt, it could end up lstat(2)-ing every
tracked file, even in a repository where `git diff` and `git restore -p`
can use fsmonitor to avoid that work.

Create the temporary index in-process instead.  Use `unpack_trees()` to
reset the real index contents to HEAD while writing the result to the
temporary index path.  For paths whose index entries already match HEAD,
`oneway_merge()` reuses the existing cache entries, preserving their
cached stat data and `CE_FSMONITOR_VALID` state.

This makes the refresh performed by `run_add_p()` behave like the one
used by `git restore -p`: unchanged paths can be skipped via fsmonitor
instead of being scanned again.

In a 206k file repository with `core.fsmonitor` enabled and a one-line
change in one file, time to first prompt dropped from 34.774 seconds to
0.659 seconds. The new perf test file demonstrates similar improvements,
with maen times for without- and with-fsmonitor cases dropping from 6.90
and 6.83 seconds to 0.55 and 0.28 seconds, respectively.

Signed-off-by: Adam Johnson <me@adamj.eu>
@adamchainz adamchainz force-pushed the aj/optimize-stash-patch branch from b228160 to 8785572 Compare May 22, 2026 21:26
@gitgitgadget-git
Copy link
Copy Markdown

"Adam Johnson" wrote on the Git mailing list (how to reply to this email):

> I however have to wonder if simply replacing the external process
> invocation with "git read-tree -m HEAD" (i.e., oneway merge) gives
> a similar speed-up.

Good idea, I just tried this, but it does not help. The subprocess runs
with GIT_INDEX_FILE set to a temporary index, so oneway_merge
never uses the CE_FSMONITOR_VALID fast path.

The in-process approach is necessary because it lets us set
opts.src_index to the real index with cached stat and fsmonitor data,
before switching GIT_INDEX_FILE.

> What I read from the proposed log message is that the change is
> purely about performance and should not change any behaviour.  Why
> do we need a new test in t/t3904?  I would not have surprised if we
> saw a new test in t/perf/, though.

Ah yeah, my bad. I added this while iterating to catch a bug I introduced,
but it's not necessary for the final patch. Will remove.

On Wed, 20 May 2026, at 03:08, Junio C Hamano wrote:
> "Adam Johnson via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > From: Adam Johnson <me@adamj.eu>
> >
> > `git stash -p` prepares the interactive selection by creating a
> > temporary index at HEAD, switching `GIT_INDEX_FILE` to it, and then
> > running the `add -p` machinery.
> >
> > That temporary index was created by running `git read-tree HEAD`.  The
> > resulting index had no useful cached stat data or fsmonitor-valid bits
> > from the real index.  When `run_add_p()` refreshed that temporary index
> > before showing the first prompt, it could end up lstat(2)-ing every
> > tracked file, even in a repository where `git diff` and `git restore -p`
> > can use fsmonitor to avoid that work.
> >
> > Create the temporary index in-process instead.  Use `unpack_trees()` to
> > reset the real index contents to HEAD while writing the result to the
> > temporary index path.  For paths whose index entries already match HEAD,
> > `oneway_merge()` reuses the existing cache entries, preserving their
> > cached stat data and `CE_FSMONITOR_VALID` state.
> 
> Clever.  As the fsmonitor_valid bit is in-core only, updating the
> index in-process would be an obvious and probably the only sensible
> way to preserve it.
> 
> I however have to wonder if simply replacing the external process
> invocation with "git read-tree -m HEAD" (i.e., oneway merge) gives
> a similar speed-up.
> 
> > This makes the refresh performed by `run_add_p()` behave like the one
> > used by `git restore -p`: unchanged paths can be skipped via fsmonitor
> > instead of being scanned again.
> >
> > In a 206k file repository with `core.fsmonitor` enabled and a one-line
> > change in one file, time to first prompt dropped from 34.774 seconds to
> > 0.659 seconds.
> 
> Interesting.
> 
> > diff --git a/t/t3904-stash-patch.sh b/t/t3904-stash-patch.sh
> > index 90a4ff2c10..4b3241c8cd 100755
> > --- a/t/t3904-stash-patch.sh
> > +++ b/t/t3904-stash-patch.sh
> > @@ -84,6 +84,24 @@ test_expect_success 'none of this moved HEAD' '
> >  verify_saved_head
> >  '
> >  
> > +test_expect_success 'stash -p with unmodified tracked files present' '
> > + git reset --hard &&
> > + echo line1 >alpha &&
> > + echo line1 >beta &&
> > + git add alpha beta &&
> > + git commit -m "add alpha and beta" &&
> > + echo line2 >>alpha &&
> > + echo y | git stash -p &&
> > + echo line1 >expect &&
> > + test_cmp expect alpha &&
> > + test_cmp expect beta &&
> > + git stash pop &&
> > + printf "line1\nline2\n" >expect &&
> > + test_cmp expect alpha &&
> > + echo line1 >expect &&
> > + test_cmp expect beta
> > +'
> 
> What I read from the proposed log message is that the change is
> purely about performance and should not change any behaviour.  Why
> do we need a new test in t/t3904?  I would not have surprised if we
> saw a new test in t/perf/, though.
> 
> Thanks.
> 

@gitgitgadget-git
Copy link
Copy Markdown

"Adam Johnson" wrote on the Git mailing list (how to reply to this email):

> Is this "non-zero return from refresh_index() leads to a failure"
> intended?

Good catch, it’s not needed. Removing, we can make the call
unconditional.

On Wed, 20 May 2026, at 03:26, Junio C Hamano wrote:
> "Adam Johnson via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> >  2 files changed, 83 insertions(+), 6 deletions(-)
> >
> > diff --git a/builtin/stash.c b/builtin/stash.c
> > index 32dbc97b47..48189cb9f7 100644
> > --- a/builtin/stash.c
> > +++ b/builtin/stash.c
> > @@ -372,6 +372,57 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
> >  return 0;
> >  }
> >  
> > +static int create_index_from_tree(const struct object_id *tree_id,
> > +   const char *index_path)
> > +{
> > + int nr_trees = 1;
> > + int ret = 0;
> > + struct unpack_trees_options opts;
> > + struct tree_desc t[MAX_UNPACK_TREES];
> > + struct tree *tree;
> > + struct index_state dst_istate = INDEX_STATE_INIT(the_repository);
> > + struct lock_file lock_file = LOCK_INIT;
> > +
> > + repo_read_index_preload(the_repository, NULL, 0);
> > + if (refresh_index(the_repository->index, REFRESH_QUIET, NULL, NULL, NULL))
> > + return -1;
> 
> Is this "non-zero return from refresh_index() leads to a failure"
> intended?  The old "git read-tree HEAD" wouldn't have cared if the
> original index were unmerged, for example, but with this update, we
> will see an immediate failure.  There are other conditions that
> refresh_index() flips its local variable has_errors on, which leads
> to its non-zero return.
> 
> Since "git stash -p" is almost always invoked when the user has
> unstaged modifications, I am not sure allowing refresh_index() to
> notice and barf is what we want here.
> 

@adamchainz
Copy link
Copy Markdown
Contributor Author

/submit

@gitgitgadget-git
Copy link
Copy Markdown

Submitted as pull.2306.v2.git.git.1779491545531.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-2306/adamchainz/aj/optimize-stash-patch-v2

To fetch this version to local tag pr-git-2306/adamchainz/aj/optimize-stash-patch-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-2306/adamchainz/aj/optimize-stash-patch-v2

@gitgitgadget-git
Copy link
Copy Markdown

This patch series was integrated into seen via d5c7bc1.

@gitgitgadget-git
Copy link
Copy Markdown

There was a status update in the "Cooking" section about the branch aj/stash-patch-optimize-temporary-index on the Git mailing list:

"git stash -p" has been optimized by reusing cached index
entries in its temporary index, avoiding unnecessary lstat()
calls on unchanged files.
source: <pull.2306.git.git.1779194605735.gitgitgadget@gmail.com>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants