perf(calls): make CALLS resolution linear instead of quadratic#1304
Open
mollyjester wants to merge 1 commit into
Open
perf(calls): make CALLS resolution linear instead of quadratic#1304mollyjester wants to merge 1 commit into
mollyjester wants to merge 1 commit into
Conversation
build_function_call_groups was O(total_calls x distinct_functions) on large repos. cProfile (300-file slice) pinned two hot spots: 1. functions_named() scanned the entire function_index on every call in the callback-argument pre-pass (~48% of runtime). Precompute a name -> [(file_path, func)] dict once (O(F)); build order matches the previous per-call scan order exactly, so resolved output is unchanged. 2. Path(x).resolve().as_posix() was called ~172k times, hitting the filesystem (realpath/lstat) on the same paths repeatedly (~18%). Memoize via a module-level lru_cache, cleared per run for freshness in long-lived processes. Measured on a 1,491-file Python repo (build_function_call_groups only, parse excluded): 257.6s -> 6.5s (39.5x); the curve goes from quadratic to linear in file count. Output is byte-identical: every resolved edge (caller/called/line/resolution_tier/confidence/labels) matches before and after, verified on Python, JavaScript and Java by running old vs new on the same parsed input. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
@mollyjester is attempting to deploy a commit to the shashankss1205's projects Team on Vercel. A member of the Team first needs to authorize it. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
build_function_call_groups(the CALLS-resolution pass) is O(total_calls × distinct_functions) on large repos, which makes it the dominant cost of indexing — on a 1,491-file Python repo this single pass took ~4.3 min of pure compute (and grows quadratically; larger repos effectively never finish). This PR makes it linear in repo size with byte-identical output.cProfileon a 300-file slice (21s total) pinned two hot spots:functions_named()— ~48% of runtime. It scanned the entirefunction_indexon every call inside the callback-argument pre-pass → the quadratic. Fixed by precomputing aname -> [(file_path, func)]dict once (O(F)) and turningfunctions_namedinto a dict lookup. The dict is built by iteratingfunction_index.items()in the same order the old per-call scan used, so results are order-for-order identical.Path(x).resolve().as_posix()— ~18%. Called ~172k times, hitting the filesystem (realpath/lstat) repeatedly on the same handful of paths. Fixed by memoizing through a module-levellru_cache(_resolved_posix), cleared at the entry ofbuild_function_call_groupsso long-lived processes stay fresh if files move between runs.Both changes are confined to
resolution/calls.py, language-agnostic, and add no new dependencies (onlyfunctools.lru_cache).Benchmark
build_function_call_groupsonly (parsing excluded), real Python repo:Before: 13.6 → 31.6 → 257s (quadratic). After: 1.4 → 2.4 → 6.5s (linear).
Correctness
The parse phase is mildly nondeterministic, so equivalence was proven by parsing once and running the old vs. new
build_function_call_groupson the same in-memory input. Every resolved edge —type, caller/called name + file + line,called_context,resolution_tier,confidence,confidence_label, and the C++caller_label/called_label— is identical before and after.Verified on Python, JavaScript, and Java samples (all IDENTICAL). The project's
tests/odoo/suite (112 tests) passes; the pre-existing parser-golden failures in my working tree are unrelated (they fail identically without this change).Risk
Low. No behavioral change to resolution logic — only memoization of a pure path transform and a one-time precompute replacing a repeated linear scan.