perf: Reduce filesystem syscalls in globwalk, SCM hashing, and task scheduling #11907
Merged
anthonyshew merged 3 commits into main on Feb 19, 2026
Literal file globs (e.g. package.json, turbo.json) are resolved with a single symlink_metadata() syscall instead of a full directory traversal. This eliminates thousands of redundant readdir calls in large monorepos where config files are appended to every task's include list. Also separates glob compilation from filesystem walking so EMFILE retries don't recompile patterns, and removes per-call tracing instrumentation from visit_file and glob_with_contextual_error (called ~6000 times in large repos).
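A minimal sketch of the literal-glob fast path described above, using only the standard library. The function names `is_literal_pattern` and `resolve_literal` are hypothetical and not taken from this PR, and the metacharacter check is a simplification of real glob syntax.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns true when `pattern` contains none of the common glob
/// metacharacters, so it can only ever name a single path.
/// Simplified on purpose: a real glob grammar has more syntax than this.
fn is_literal_pattern(pattern: &str) -> bool {
    !pattern
        .chars()
        .any(|c| matches!(c, '*' | '?' | '[' | ']' | '{' | '}' | '!' | '\\'))
}

/// Resolves a literal pattern with one metadata lookup instead of a
/// directory walk. `symlink_metadata` does not follow symlinks, so a
/// dangling symlink still counts as present.
fn resolve_literal(base: &Path, pattern: &str) -> io::Result<Option<PathBuf>> {
    let candidate = base.join(pattern);
    match fs::symlink_metadata(&candidate) {
        Ok(_) => Ok(Some(candidate)),
        // A missing file is a normal "no match", not an error.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

A caller would check `is_literal_pattern` first and fall back to the regular walk for anything that is not literal.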
Three optimizations to the SCM layer:
1. Exclude globs in get_package_file_hashes_from_inputs_and_index are now matched in-memory against already-known paths instead of spawning a separate filesystem globwalk.
2. hash_objects uses rayon to parallelize file hashing across threads. Each file already has its own EMFILE retry logic.
3. git_ls_tree_repo_root_sorted returns a BTreeMap directly, eliminating the intermediate HashMap->BTreeMap conversion in RepoGitIndex::new.
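To illustrate the first and third points, here is a rough sketch of filtering an already-sorted path index in memory. The `Exclude` enum is a hypothetical stand-in for a compiled glob and only understands two pattern shapes; `filter_known_paths` is likewise an assumed name, not a function from this PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a compiled exclude glob. It models only
/// `dir/**` style prefixes and `*.ext` style suffixes.
enum Exclude {
    DirPrefix(String),
    Extension(String),
}

impl Exclude {
    fn matches(&self, path: &str) -> bool {
        match self {
            // Require a `/` after the prefix so `dist` does not match `distribution.txt`.
            Exclude::DirPrefix(dir) => path
                .strip_prefix(dir.as_str())
                .map_or(false, |rest| rest.starts_with('/')),
            Exclude::Extension(ext) => path.ends_with(&format!(".{}", ext)),
        }
    }
}

/// Applies excludes to paths that are already known (path -> hash),
/// without touching the filesystem. Collecting straight into a BTreeMap
/// keeps the result sorted with no intermediate HashMap.
fn filter_known_paths(
    index: &BTreeMap<String, String>,
    excludes: &[Exclude],
) -> BTreeMap<String, String> {
    index
        .iter()
        .filter(|(path, _)| !excludes.iter().any(|e| e.matches(path.as_str())))
        .map(|(path, hash)| (path.clone(), hash.clone()))
        .collect()
}
```

The cost is proportional to the number of known paths times the number of exclude patterns, with no syscalls involved.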
- expanded_hashes now stores Arc<FileHashes> so task distribution is a refcount bump instead of cloning the entire HashMap per task.
- calculate_dependency_hashes acquires the tracker mutex once for all dependencies instead of once per dependency.
- Vendor::infer() hoisted before the scheduling loop.
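The first two bullets can be sketched roughly as follows. The `Tracker` struct, its field, and the `hashes_for` method are hypothetical simplifications; only the `Arc<FileHashes>` idea and the single lock acquisition come from the description above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Assumed shape for illustration: file path -> content hash.
type FileHashes = HashMap<String, String>;

/// Hypothetical tracker holding per-package hashes behind one mutex.
struct Tracker {
    package_hashes: Mutex<HashMap<String, Arc<FileHashes>>>,
}

impl Tracker {
    /// Looks up every dependency under a single lock acquisition.
    /// Each hit is an `Arc` clone (a refcount increment), so the
    /// underlying HashMap is never copied.
    fn hashes_for(&self, deps: &[&str]) -> Vec<Arc<FileHashes>> {
        let guard = self.package_hashes.lock().unwrap();
        deps.iter()
            .filter_map(|dep| guard.get(*dep).cloned())
            .collect()
    }
}
```

Locking once per batch trades slightly longer hold times for far fewer lock and unlock operations, which is usually a win when each lookup is cheap.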
github-actions bot added a commit that referenced this pull request on Feb 19, 2026

## Release v2.8.11-canary.5

Versioned docs: https://v2-8-11-canary-5.turborepo.dev

### Changes

- release(turborepo): 2.8.11-canary.4 (#11904) (`91cf34b`)
- perf: Reduce filesystem syscalls in globwalk, SCM hashing, and task scheduling (#11907) (`9f36363`)

---------

Co-authored-by: Turbobot <turbobot@vercel.com>
Summary
Reduces turbo run --dry wall-clock time by 5-12% on large monorepos (~1700 tasks) by cutting unnecessary filesystem operations and CPU work.

User time dropped 39% (3.7s → 2.25s) and system time dropped 30% (1.87s → 1.32s) on the largest benchmark repo. Wall-clock improvement is bounded by irreducible I/O (git subprocesses, remaining directory walks).
Changes
Commit 1: Fast-path invariant globs
Literal file globs like package.json and turbo.json, which are appended to every task's include list, now resolve with a single symlink_metadata() instead of a full directory traversal via wax::Glob::walk(). Also separates glob compilation from the walk so EMFILE retries don't recompile patterns, and removes per-call tracing from visit_file / glob_with_contextual_error (~6000 calls in large repos).
Commit 2: In-memory exclude matching + parallel hashing
Exclude globs are now matched against already-known paths in memory instead of spawning a separate filesystem globwalk. hash_objects uses rayon for parallel file hashing. git_ls_tree_repo_root_sorted builds a BTreeMap directly, skipping the HashMap→BTreeMap conversion.
Commit 3: Arc-share FileHashes + batch mutex
expanded_hashes stores Arc<FileHashes> so distributing results to ~1700 tasks is a refcount bump, not a HashMap clone. calculate_dependency_hashes acquires the tracker mutex once instead of per-dependency.
Benchmarks
hyperfine with --warmup 5, 10 runs each. faster = this PR, baseline = current main.
Large repo (~1700 tasks):
Medium repo (~200 tasks):
Small repo (~5 tasks):