perf: Deduplicate file hashing and parallelize globwalks #11902
Merged
anthonyshew merged 1 commit into main on Feb 18, 2026
- Deduplicate file hashing across tasks that share the same package and inputs, reducing redundant globwalks and file hash computations
- Remove the `IoSemaphore` (max=1) that serialized all globwalk operations, replacing it with retry-on-EMFILE backoff at the globwalk and `hash_objects` layers
- Change `Lockfile::all_dependencies` to return `Cow<HashMap>` for zero-copy pnpm lookups during transitive closure resolution
- Optimize transitive closure cache keys to use a single reusable `String` buffer instead of allocating 3 `String`s per lookup
- Convert pnpm importers from `BTreeMap` to `HashMap` for O(1) workspace lookups
Comment from the PR author (Contributor): The query test on Windows is hanging, and the hang is unrelated to this change. Merging through.
github-actions bot added a commit that referenced this pull request on Feb 18, 2026:
## Release v2.8.11-canary.4

Versioned docs: https://v2-8-11-canary-4.turborepo.dev

### Changes

- fix: Resolve npm packages in @turbo/gen compiled binary (#11900) (`c2266b0`)
- release(turborepo): 2.8.11-canary.3 (#11901) (`b21423e`)
- perf: Deduplicate file hashing and parallelize globwalks (#11902) (`57cf69c`)
- perf: Improve transitive dependency resolution cache sharing across workspaces (#11903) (`fd1b6e8`)

---------

Co-authored-by: Turbobot <turbobot@vercel.com>
## Summary

Optimizes `turbo run --dry` wall-clock time by up to 1.48x on large monorepos by eliminating redundant file hashing work and removing a serialization bottleneck in globwalk operations.

## Benchmarks

Tested across three repos of varying size:

The improvement scales with repo size, specifically with how many tasks share the same `(package, inputs)` combination.

## Changes
**File hash deduplication:** Multiple tasks in the same package with identical `inputs` config (e.g. `build`, `lint`, `typecheck` all in one package) previously each ran an independent globwalk + file hash computation. Now tasks are grouped by `(package_path, globs, include_default)` and each unique combination is computed once, with results shared across tasks.

**Parallel globwalks via retry-on-EMFILE:** The previous `IoSemaphore` (max=1) serialized all globwalk operations to prevent fd exhaustion, making this the dominant bottleneck on large repos. This replaces the semaphore with retry-with-exponential-backoff on `EMFILE` errors (the same pattern Node's `graceful-fs` uses), allowing globwalks to run fully parallel on rayon. If the OS returns "too many open files", the operation sleeps briefly and retries, up to 10 times with exponential backoff capped at 1s.

**Zero-copy lockfile dependency lookups:** `Lockfile::all_dependencies` now returns `Cow<'_, HashMap<String, String>>` instead of cloning the HashMap on every call. For pnpm (which pre-builds a dependency index), this eliminates ~329k HashMap clones during transitive closure resolution.

**Optimized transitive closure cache keys:** The `DashMap` resolve cache now uses a single null-byte-separated `String` key built into a reusable buffer, instead of allocating a `(String, String, String)` tuple per lookup.

**HashMap importers for pnpm:** Converted pnpm's `importers` field from `BTreeMap` to `HashMap` (with sorted serialization) for O(1) workspace lookups during `resolve_package`.
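The grouping behind the file hash deduplication can be illustrated with a minimal, single-threaded sketch. This is not the code from this PR: `HashKey`, `Task`, and `hash_deduplicated` are hypothetical names, the hash is a stand-in `u64`, and the sketch assumes globs compare equal only when listed in the same order.

```rust
use std::collections::HashMap;

/// Hypothetical grouping key: tasks with equal keys need identical file hashes.
/// Globs are compared in order; a real implementation might normalize them first.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct HashKey {
    package_path: String,
    globs: Vec<String>,
    include_default: bool,
}

/// Hypothetical task description; the real task type is not shown in the PR text.
struct Task {
    id: String,
    key: HashKey,
}

/// Calls `compute` once per unique key and returns one result per task id.
/// `compute` stands in for the expensive globwalk + file hash step.
fn hash_deduplicated<F>(tasks: &[Task], mut compute: F) -> HashMap<String, u64>
where
    F: FnMut(&HashKey) -> u64,
{
    let mut by_key: HashMap<&HashKey, u64> = HashMap::new();
    let mut by_task = HashMap::new();
    for task in tasks {
        // Only the first task with a given key pays for the computation.
        let hash = *by_key
            .entry(&task.key)
            .or_insert_with(|| compute(&task.key));
        by_task.insert(task.id.clone(), hash);
    }
    by_task
}
```

With `build`, `lint`, and `typecheck` in one package sharing the same key, `compute` runs once for that package and all three tasks receive the same result.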
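The retry-on-EMFILE behaviour described above could look roughly like the following sketch. Only the retry limit (10) and the 1s cap come from the description; the helper names, the 1 ms initial delay, and the hard-coded errno value 24 (EMFILE on Linux and macOS) are assumptions made for illustration.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// From the PR description: up to 10 retries, backoff capped at 1s.
const MAX_RETRIES: u32 = 10;
const MAX_BACKOFF: Duration = Duration::from_secs(1);
/// Assumption: the initial delay is not stated in the PR description.
const INITIAL_BACKOFF: Duration = Duration::from_millis(1);

/// Assumption: errno 24 is EMFILE ("too many open files") on Linux and macOS.
/// Other platforms report fd exhaustion with different codes.
fn is_emfile(err: &io::Error) -> bool {
    err.raw_os_error() == Some(24)
}

/// Runs `op`, retrying with exponential backoff while it fails with EMFILE.
/// Any other error, or an EMFILE after the last retry, is returned unchanged.
fn retry_on_emfile<T, F>(mut op: F) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    let mut delay = INITIAL_BACKOFF;
    let mut retries = 0;
    loop {
        match op() {
            Err(err) if is_emfile(&err) && retries < MAX_RETRIES => {
                thread::sleep(delay);
                // Double the delay each time, never exceeding the cap.
                delay = (delay * 2).min(MAX_BACKOFF);
                retries += 1;
            }
            other => return other,
        }
    }
}
```

Compared with a semaphore, this approach only pays a cost when the fd limit is actually hit, which is what lets the common case run fully in parallel.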
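To show why returning `Cow` avoids clones, here is a simplified sketch. The trait `SimpleLockfile` and both implementing types are hypothetical and much smaller than the real `Lockfile` trait, whose full signature (for example, its error handling) is not shown in this PR text.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

type Deps = HashMap<String, String>;

/// Hypothetical, simplified stand-in for the real `Lockfile` trait.
trait SimpleLockfile {
    fn all_dependencies(&self, key: &str) -> Option<Cow<'_, Deps>>;
}

/// A lockfile with a pre-built dependency index (as the PR says pnpm has)
/// can hand out borrowed maps without cloning.
struct Indexed {
    index: HashMap<String, Deps>,
}

impl SimpleLockfile for Indexed {
    fn all_dependencies(&self, key: &str) -> Option<Cow<'_, Deps>> {
        self.index.get(key).map(Cow::Borrowed)
    }
}

/// A lockfile without an index builds the map on demand and returns it owned.
/// Each entry is (package key, dependency name, version).
struct Computed {
    entries: Vec<(String, String, String)>,
}

impl SimpleLockfile for Computed {
    fn all_dependencies(&self, key: &str) -> Option<Cow<'_, Deps>> {
        let deps: Deps = self
            .entries
            .iter()
            .filter(|(k, _, _)| k.as_str() == key)
            .map(|(_, name, version)| (name.clone(), version.clone()))
            .collect();
        if deps.is_empty() {
            None
        } else {
            Some(Cow::Owned(deps))
        }
    }
}
```

Callers read through the `Cow` via `Deref` in both cases, so only implementations that must build a map pay for an allocation.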
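The reusable cache-key buffer can be sketched as follows. This uses a plain `HashMap` rather than `DashMap` to stay dependency-free, and the function names and the meaning of the three key parts are hypothetical. It assumes no key part contains a NUL byte, since otherwise two different tuples could produce the same key.

```rust
use std::collections::HashMap;

/// Writes the three key parts into `buf`, separated by NUL bytes.
/// `clear` keeps the buffer's capacity, so repeated calls rarely reallocate.
fn write_key(buf: &mut String, parts: [&str; 3]) {
    buf.clear();
    for (i, part) in parts.iter().enumerate() {
        if i > 0 {
            buf.push('\0');
        }
        buf.push_str(part);
    }
}

/// Looks up a cached value without allocating a new key per call.
fn lookup<'a>(
    cache: &'a HashMap<String, String>,
    buf: &mut String,
    parts: [&str; 3],
) -> Option<&'a String> {
    write_key(buf, parts);
    cache.get(buf.as_str())
}

/// Inserting still allocates once, because the map must own its key.
fn insert(
    cache: &mut HashMap<String, String>,
    buf: &mut String,
    parts: [&str; 3],
    resolved: &str,
) {
    write_key(buf, parts);
    cache.insert(buf.clone(), resolved.to_string());
}
```

The saving comes from the lookup path: a cache hit touches only the shared buffer instead of building three fresh `String`s.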