
worktree: copy-on-write creation and shared-branch worktrees #2317

Open
nevion wants to merge 2 commits into git:master from nevion:worktree-reflink-cow

nevion commented May 29, 2026

When many worktrees share one repository -- e.g. a fleet of agents each
needing an isolated checkout -- "git worktree add" is costly at scale.
Objects are shared via the common dir, but the working tree is not: each
add rewrites every tracked file, so N worktrees cost N full checkouts of
disk and I/O. And a branch can only be checked out in one worktree.

Patch 1 adds "git worktree add --reflink": on a copy-on-write filesystem
it populates the new worktree by reflinking the current worktree's files
and index, then "git reset --hard" rewrites only the paths that differ
from the commit being checked out. A reflink_file() helper in copy.c
uses FICLONE (Linux) and clonefile() (macOS); elsewhere (other
filesystems, Windows) reflink support is probed up front and the command
falls back to a normal checkout. The default is set with the
worktree.reflink config (true/false/auto); --no-reflink overrides it.
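To make the clone-then-fall-back idea concrete, here is a minimal sketch in C. It is not the patch's reflink_file(): the names reflink_file_sketch(), copy_file_plain() and clone_or_copy() are hypothetical. It assumes FICLONE on Linux (cloning into a freshly created file on the same filesystem) and clonefile() on macOS, and it treats any clone failure as a signal to fall back to an ordinary copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/ioctl.h>
#ifndef FICLONE
#define FICLONE _IOW(0x94, 9, int) /* same value as <linux/fs.h> */
#endif
#elif defined(__APPLE__)
#include <sys/clonefile.h>
#endif

/* Clone src into a new file dst that shares extents with it.
 * Returns 0 on success, -1 with errno set if cloning is not possible. */
static int reflink_file_sketch(const char *src, const char *dst, mode_t mode)
{
#if defined(__linux__)
	int in, out, err;

	if ((in = open(src, O_RDONLY)) < 0)
		return -1;
	if ((out = open(dst, O_WRONLY | O_CREAT | O_EXCL, mode)) < 0) {
		err = errno;
		close(in);
		errno = err;
		return -1;
	}
	if (ioctl(out, FICLONE, in) < 0) {
		err = errno;
		close(out);
		close(in);
		unlink(dst); /* remove the empty file we created before falling back */
		errno = err;
		return -1;
	}
	close(in);
	return close(out);
#elif defined(__APPLE__)
	(void)mode; /* clonefile() takes the mode from the source */
	return clonefile(src, dst, 0);
#else
	(void)src; (void)dst; (void)mode;
	errno = ENOTSUP; /* no reflink primitive on this platform */
	return -1;
#endif
}

/* Ordinary byte-for-byte copy used when cloning fails. */
static int copy_file_plain(const char *src, const char *dst, mode_t mode)
{
	char buf[8192];
	ssize_t n = 0;
	int in, out, ret = 0;

	if ((in = open(src, O_RDONLY)) < 0)
		return -1;
	if ((out = open(dst, O_WRONLY | O_CREAT | O_EXCL, mode)) < 0) {
		close(in);
		return -1;
	}
	while (!ret && (n = read(in, buf, sizeof(buf))) > 0) {
		for (char *p = buf; n > 0; ) { /* handle short writes */
			ssize_t w = write(out, p, (size_t)n);
			if (w < 0) {
				ret = -1;
				break;
			}
			p += w;
			n -= w;
		}
	}
	if (n < 0)
		ret = -1;
	if (close(out) < 0)
		ret = -1;
	close(in);
	return ret;
}

/* Returns 1 if dst was reflinked, 0 if it was copied, -1 on error. */
static int clone_or_copy(const char *src, const char *dst, mode_t mode)
{
	if (!reflink_file_sketch(src, dst, mode))
		return 1;
	return copy_file_plain(src, dst, mode) ? -1 : 0;
}
```

Falling back on any clone error keeps callers simple. An unsupported filesystem (EOPNOTSUPP), a source and destination on different filesystems (EXDEV), and a platform without a reflink primitive all end up on the same copy path.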

Patch 2 lets a branch be checked out in several worktrees, for parallel
work on one checkout. A branch mid-rebase or mid-bisect elsewhere is
still refused.

Benchmark (Linux-kernel fork, 93k files, ~33 GB tree incl. build output,
btrfs): a normal add allocates ~1.3 GB of real disk per worktree (~5.3 GB
for four, growing linearly); --reflink allocates ~0 regardless of count
and also carries over the untracked build tree. ("Real disk" = btrfs
exclusive bytes.)

[Figure: worktree-reflink-bench]

Note: patch 2 changes default behavior (checking out a branch that is
already checked out in another worktree is now allowed); two t2400
assertions were updated accordingly.

@gitgitgadget-git

Welcome to GitGitGadget

Hi @nevion, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that either:

  • Your Pull Request has a good description, if it consists of multiple commits, as it will be used as the cover letter.
  • Your Pull Request description is empty, if it consists of a single commit, as the commit message should be descriptive enough by itself.

You can CC potential reviewers by adding a footer to the PR description with the following syntax:

CC: Revi Ewer <revi.ewer@example.com>, Ill Takalook <ill.takalook@example.net>

NOTE: DO NOT copy/paste your CC list from a previous GGG PR's description,
because it will result in a malformed CC list on the mailing list. See
example.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a rather specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "revisions:" to state which subsystem the change is about,
  • the commit messages' body should describe the "why?" of the change, and
  • finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated tests ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the Libera Chat IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this. Note that any reviewers CC'd via the list in the PR description will not actually be sent emails.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use Gmail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Libera Chat. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

@nevion
Author

nevion commented May 29, 2026

This feature is already used (reimplemented) within grok build https://x.com/theskory/status/2059729539287167068 (I have nothing to do with that project), but the best place for it is inside Git itself, so that all tools can benefit.

@nevion
Author

nevion commented May 29, 2026

Benchmark on a Linux-kernel fork (93k tracked files, ~33 GB working tree incl. build artifacts, btrfs):

[Figure: worktree --reflink benchmark]

Left: real disk written vs. number of worktrees (a normal add grows by ~1.3 GB per worktree; --reflink stays at 0). Right: a single worktree, where --reflink allocates ~0 new disk yet carries the full build tree (464k files), vs. the source-only checkout (93k files) of a normal add. "Real disk" = btrfs exclusive bytes; shared CoW extents count as 0.

nevion added 2 commits May 29, 2026 12:00
Creating many worktrees from the same base -- for example to run a
fleet of automated agents in parallel -- is expensive today: every
"git worktree add" materializes the entire working tree by writing
each tracked file out from the object store. The objects are shared
via the common directory, but the working tree is not: N worktrees
mean N full checkouts on disk and N times the file I/O.

Add a "--reflink" option that, on copy-on-write filesystems, populates
the new worktree by reflinking the current worktree's files and index
instead. The subsequent "git reset --hard" then only rewrites the
paths that actually differ between the current worktree and
<commit-ish>; everything else (including untracked files such as build
outputs) keeps sharing storage with the source until modified. Because
the cloned index still carries the source files' stat data, it is
refreshed against the reflinked files first so that reset recognizes
the unchanged paths as up to date and leaves them sharing extents
rather than rewriting them.
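Why the refresh is needed can be shown with a small model. A cloned file is a new inode, so its stat data never matches what the source's index recorded, and without a refresh every path would look stat-dirty to the subsequent reset. The sketch below is hypothetical: entry_sketch, refresh_entry() and the FNV-1a hash standing in for an object ID are assumptions, not Git's code. On a stat mismatch it re-hashes the file and, if the content is unchanged, adopts the new stat data instead of treating the path as modified.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical index entry: cached stat data plus a content hash that
 * stands in for the blob's object ID. */
struct entry_sketch {
	struct stat cached;
	uint64_t content_hash;
};

/* FNV-1a over the file's bytes (a stand-in for hashing the blob). */
static int hash_file(const char *path, uint64_t *out)
{
	FILE *f = fopen(path, "rb");
	uint64_t h = 14695981039346656037ULL;
	int c;

	if (!f)
		return -1;
	while ((c = fgetc(f)) != EOF) {
		h ^= (unsigned char)c;
		h *= 1099511628211ULL;
	}
	fclose(f);
	*out = h;
	return 0;
}

/* A subset of the stat fields an index entry records. */
static int stat_matches(const struct stat *a, const struct stat *b)
{
	return a->st_dev == b->st_dev && a->st_ino == b->st_ino &&
	       a->st_mode == b->st_mode && a->st_size == b->st_size &&
	       a->st_mtime == b->st_mtime && a->st_ctime == b->st_ctime;
}

/* Returns 1 if path matches the entry (refreshing the cached stat data
 * when only the stat data differed), 0 if the content changed, -1 on error. */
static int refresh_entry(struct entry_sketch *e, const char *path)
{
	struct stat st;
	uint64_t h;

	if (lstat(path, &st) < 0)
		return -1;
	if (stat_matches(&e->cached, &st))
		return 1;              /* fast path: no need to read the file */
	if (hash_file(path, &h) < 0)
		return -1;
	if (h != e->content_hash)
		return 0;              /* genuinely modified */
	e->cached = st;            /* same content: record the clone's stat data */
	return 1;
}
```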

The clones are made by a new reflink_file() helper in copy.c, which
uses the FICLONE ioctl on Linux and clonefile() on macOS and reports
an error otherwise so callers fall back to a normal copy. Support is
probed up front; when unavailable -- including on filesystems without
copy-on-write and on platforms such as Windows that lack a reflink
primitive -- "--reflink" transparently falls back to an ordinary
checkout, so the worst case is no slower than today rather than a
byte-for-byte copy of the source tree. The directory walk skips the
new worktree itself when it lives inside the source one, and preserves
symlinks and modes.
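The directory walk described above might look roughly like the following sketch. It rests on several assumptions. mirror_tree() and copy_regular() are hypothetical names. A plain copy stands in for reflink_file() so the sketch runs on any filesystem. The nested new worktree is recognized by device and inode number rather than by path. The .git entry is skipped on the assumption that repository metadata is not mirrored.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Plain copy standing in for reflink_file(); the file is created 0600 and
 * its real mode is applied by the caller. Short writes count as errors. */
static int copy_regular(const char *src, const char *dst)
{
	char buf[8192];
	ssize_t n = 0;
	int in, out, ret = 0;

	if ((in = open(src, O_RDONLY)) < 0)
		return -1;
	if ((out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600)) < 0) {
		close(in);
		return -1;
	}
	while (!ret && (n = read(in, buf, sizeof(buf))) > 0)
		if (write(out, buf, (size_t)n) != n)
			ret = -1;
	if (n < 0)
		ret = -1;
	if (close(out) < 0)
		ret = -1;
	close(in);
	return ret;
}

/* Recreate the tree under src inside dst (which must already exist).
 * skip is the lstat() of the new worktree; it is never descended into. */
static int mirror_tree(const char *src, const char *dst, const struct stat *skip)
{
	DIR *d = opendir(src);
	struct dirent *de;
	int ret = 0;

	if (!d)
		return -1;
	while (!ret && (de = readdir(d)) != NULL) {
		char s[PATH_MAX], t[PATH_MAX], target[PATH_MAX];
		struct stat st;
		ssize_t len;

		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, "..") ||
		    !strcmp(de->d_name, ".git")) /* assumption: metadata not mirrored */
			continue;
		if (snprintf(s, sizeof(s), "%s/%s", src, de->d_name) >= (int)sizeof(s) ||
		    snprintf(t, sizeof(t), "%s/%s", dst, de->d_name) >= (int)sizeof(t) ||
		    lstat(s, &st) < 0) {
			ret = -1;
			break;
		}
		if (S_ISLNK(st.st_mode)) {
			/* Recreate the link itself, never its target. */
			len = readlink(s, target, sizeof(target) - 1);
			if (len < 0) {
				ret = -1;
				break;
			}
			target[len] = '\0';
			if (symlink(target, t) < 0)
				ret = -1;
			continue;
		}
		if (S_ISDIR(st.st_mode)) {
			if (st.st_dev == skip->st_dev && st.st_ino == skip->st_ino)
				continue; /* the new worktree nested inside the source */
			if (mkdir(t, 0700) < 0 || mirror_tree(s, t, skip) < 0)
				ret = -1;
		} else if (S_ISREG(st.st_mode)) {
			if (copy_regular(s, t) < 0)
				ret = -1;
		} else {
			continue; /* FIFOs, sockets, devices: not copied */
		}
		/* Restore permission bits once the contents are in place. */
		if (!ret && chmod(t, st.st_mode & 07777) < 0)
			ret = -1;
	}
	closedir(d);
	return ret;
}
```

Comparing (st_dev, st_ino) instead of path strings avoids having to normalize relative paths, trailing slashes or symlinked parent directories when deciding whether an entry is the new worktree itself. Applying modes only after a directory has been filled also lets the walk populate directories that are read-only in the source.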

The behavior can be made the default with the worktree.reflink
configuration ("true", "false" or "auto", the last suppressing the
unsupported-filesystem warning), and turned off per-invocation with
--no-reflink. A configured default degrades quietly in modes that
cannot reflink (--orphan, --no-checkout) instead of erroring, so
enabling it never breaks those commands. The checkout step continues
to honor checkout.workers, so parallel checkout composes with
--reflink for the paths that do need rewriting.
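One way to read the precedence rules in this paragraph is as a small decision function. The sketch below is an interpretation, not the patch's code. The enum and function names are hypothetical. It assumes that an explicit --reflink combined with --orphan or --no-checkout is an error (the text only says that a configured default does not error). It also assumes that "true" warns when it falls back, while "auto" stays quiet.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names; the real implementation may differ. */
enum reflink_setting { REFLINK_UNSET, REFLINK_FALSE, REFLINK_TRUE, REFLINK_AUTO };
enum populate_mode { POPULATE_NORMAL, POPULATE_REFLINK, POPULATE_ERROR };

struct populate_decision {
	enum populate_mode mode;
	int warn_unsupported; /* warn that the filesystem cannot reflink */
};

/* Case-insensitive string equality. */
static int ieq(const char *a, const char *b)
{
	while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b))
		a++, b++;
	return tolower((unsigned char)*a) == tolower((unsigned char)*b);
}

/* Parse a worktree.reflink value: a boolean or "auto". Only the common
 * boolean spellings are handled; Git's own parser accepts more. */
static int parse_reflink_config(const char *v, enum reflink_setting *out)
{
	static const char *const yes[] = { "true", "yes", "on", "1" };
	static const char *const no[] = { "false", "no", "off", "0" };
	size_t i;

	if (ieq(v, "auto")) {
		*out = REFLINK_AUTO;
		return 0;
	}
	for (i = 0; i < sizeof(yes) / sizeof(yes[0]); i++) {
		if (ieq(v, yes[i])) {
			*out = REFLINK_TRUE;
			return 0;
		}
		if (ieq(v, no[i])) {
			*out = REFLINK_FALSE;
			return 0;
		}
	}
	return -1;
}

/* cli: REFLINK_UNSET, REFLINK_TRUE (--reflink) or REFLINK_FALSE (--no-reflink).
 * no_populate: --orphan or --no-checkout, where there is nothing to reflink. */
static struct populate_decision decide_populate(enum reflink_setting config,
		enum reflink_setting cli, int no_populate, int fs_can_reflink)
{
	struct populate_decision d = { POPULATE_NORMAL, 0 };
	enum reflink_setting want = cli != REFLINK_UNSET ? cli : config;

	if (want == REFLINK_UNSET || want == REFLINK_FALSE)
		return d;
	if (no_populate) {
		/* An explicit --reflink is an error; a configured default is not. */
		if (cli == REFLINK_TRUE)
			d.mode = POPULATE_ERROR;
		return d;
	}
	if (fs_can_reflink) {
		d.mode = POPULATE_REFLINK;
		return d;
	}
	/* Fall back to a normal checkout; "auto" suppresses the warning. */
	d.warn_unsupported = (want == REFLINK_TRUE);
	return d;
}
```

Keeping the command-line value separate from the configured one is what allows --no-reflink to win over worktree.reflink=true and lets a configured default degrade quietly where an explicit flag would not.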

Signed-off-by: Jason Newton <nevion@gmail.com>

When spinning up several worktrees on the same checkout for parallel
work (for example a fleet of agents working from one branch), git's
refusal to check out a branch that is already checked out elsewhere is
just in the way. The restriction exists to stop two worktrees from
moving the same branch underneath each other, but plain parallel
checkouts do not need that protection.

Drop the restriction: "git worktree add <branch>" now checks out a
branch even if it is in use by another worktree. The genuinely
dangerous case is kept -- a branch that another worktree is in the
middle of rebasing or bisecting is still refused, because a second
checkout could corrupt that operation. die_if_branch_busy() performs
that narrower check in place of the old die_if_checked_out(). The
separate guard against force-updating (e.g. with -B) a branch in use
elsewhere is left untouched.
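The difference between the old and the new check can be modeled as two lookups over the list of worktrees. This is a simplified, hypothetical sketch: struct wt_sketch, find_checked_out() and find_busy() are illustrative names, not die_if_checked_out() or die_if_branch_busy() themselves. It tracks the branch being rebased or bisected separately because HEAD is typically detached while those operations are in progress.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a worktree. */
struct wt_sketch {
	const char *path;
	const char *head_ref;      /* checked-out branch, NULL if detached */
	const char *rebasing_ref;  /* branch being rebased, or NULL */
	const char *bisecting_ref; /* branch being bisected, or NULL */
};

static int ref_eq(const char *a, const char *b)
{
	return a && b && !strcmp(a, b);
}

/* Old rule: any worktree that has the branch checked out blocks it. */
static const struct wt_sketch *find_checked_out(const struct wt_sketch *wts,
						size_t n, const char *ref)
{
	for (size_t i = 0; i < n; i++)
		if (ref_eq(wts[i].head_ref, ref))
			return &wts[i];
	return NULL;
}

/* New rule: only a rebase or bisect in progress on the branch blocks it. */
static const struct wt_sketch *find_busy(const struct wt_sketch *wts,
					 size_t n, const char *ref)
{
	for (size_t i = 0; i < n; i++)
		if (ref_eq(wts[i].rebasing_ref, ref) ||
		    ref_eq(wts[i].bisecting_ref, ref))
			return &wts[i];
	return NULL;
}
```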

Signed-off-by: Jason Newton <nevion@gmail.com>
@nevion
Author

nevion commented Jun 1, 2026

The non-Linux CI failures here are unrelated to this change (which only touches worktree, copy, and t2400):

  • win test (4): t3070.966 iwildmatch (via ls-files). This is in wildmatch/pathspec matching, which this series does not touch. It is already failing on git/git's own master CI, independent of this PR: e.g. run #11454 (1666c126) and run #11420 (c69baaf5, "The 9th batch", the very commit this branch is based on) both show win test (4) = failure, whereas it still passed at run #11401 (56a4f3c3). So it is a recent base regression on the Windows shard, not introduced here; t3070-wildmatch.sh and wildmatch.c are byte-identical to v2.54.0, so rebasing onto a release would not avoid it. It will be resolved on master independently of this topic.

  • osx-gcc / osx-meson (macos-14): intermittent. These passed on the first CI run for this exact commit and failed only on a re-run (failing step: ci/run-build-and-tests.sh, no specific test reported). This is macOS-runner flakiness; a re-run clears it.

  • linux-TEST-vars (ubuntu:20.04) (now green): the earlier failure was a runner infrastructure error (System.IO.IOException: No space left on device), not a test failure. The linux32/osx "cancelled" entries were fail-fast cancellations from that run. A re-run brought all linux-* jobs green.

Every job that actually exercises this change (all linux-*, almalinux, debian, fedora, and win test 0-3/5-9) passes.

@dscho
Member

dscho commented Jun 3, 2026

/allow

@dscho
Member

dscho commented Jun 3, 2026

Yes, sadly Git's CI is not always in a pristine shape :-(

@gitgitgadget-git

User nevion is now allowed to use GitGitGadget.
