Skip to content

perf(calls): make CALLS resolution linear instead of quadratic#1304

Open
mollyjester wants to merge 1 commit into
CodeGraphContext:mainfrom
mollyjester:perf/cgc-calls-resolution-speedup
Open

perf(calls): make CALLS resolution linear instead of quadratic#1304
mollyjester wants to merge 1 commit into
CodeGraphContext:mainfrom
mollyjester:perf/cgc-calls-resolution-speedup

Conversation

@mollyjester

Copy link
Copy Markdown

Summary

build_function_call_groups (the CALLS-resolution pass) is O(total_calls × distinct_functions) on large repos, which makes it the dominant cost of indexing — on a 1,491-file Python repo this single pass took ~4.3 min of pure compute (and grows quadratically; larger repos effectively never finish). This PR makes it linear in repo size with byte-identical output.

cProfile on a 300-file slice (21s total) pinned two hot spots:

  1. functions_named() — ~48% of runtime. It scanned the entire function_index on every call inside the callback-argument pre-pass → the quadratic. Fixed by precomputing a name -> [(file_path, func)] dict once (O(F)) and turning functions_named into a dict lookup. The dict is built by iterating function_index.items() in the same order the old per-call scan used, so results are order-for-order identical.
  2. Path(x).resolve().as_posix() — ~18%. Called ~172k times, hitting the filesystem (realpath/lstat) repeatedly on the same handful of paths. Fixed by memoizing through a module-level lru_cache (_resolved_posix), cleared at the entry of build_function_call_groups so long-lived processes stay fresh if files move between runs.

Both changes are confined to resolution/calls.py, language-agnostic, and add no new dependencies (only functools.lru_cache).

Benchmark

build_function_call_groups only (parsing excluded), real Python repo:

files before after speedup
300 13.6s 1.4s ~10×
600 31.6s 2.4s ~12×
1,491 257.6s 6.5s 39.5×

Before: 13.6 → 31.6 → 257s (quadratic). After: 1.4 → 2.4 → 6.5s (linear).

Correctness

The parse phase is mildly nondeterministic, so equivalence was proven by parsing once and running the old vs. new build_function_call_groups on the same in-memory input. Every resolved edge — type, caller/called name + file + line, called_context, resolution_tier, confidence, confidence_label, and the C++ caller_label/called_label — is identical before and after.

Verified on Python, JavaScript, and Java samples (all IDENTICAL). The project's tests/odoo/ suite (112 tests) passes; the pre-existing parser-golden failures in my working tree are unrelated (they fail identically without this change).

Risk

Low. No behavioral change to resolution logic — only memoization of a pure path transform and a one-time precompute replacing a repeated linear scan.

build_function_call_groups was O(total_calls x distinct_functions) on
large repos. cProfile (300-file slice) pinned two hot spots:

1. functions_named() scanned the entire function_index on every call in
   the callback-argument pre-pass (~48% of runtime). Precompute a
   name -> [(file_path, func)] dict once (O(F)); build order matches the
   previous per-call scan order exactly, so resolved output is unchanged.
2. Path(x).resolve().as_posix() was called ~172k times, hitting the
   filesystem (realpath/lstat) on the same paths repeatedly (~18%).
   Memoize via a module-level lru_cache, cleared per run for freshness in
   long-lived processes.

Measured on a 1,491-file Python repo (build_function_call_groups only,
parse excluded): 257.6s -> 6.5s (39.5x); the curve goes from quadratic to
linear in file count. Output is byte-identical: every resolved edge
(caller/called/line/resolution_tier/confidence/labels) matches before and
after, verified on Python, JavaScript and Java by running old vs new on
the same parsed input.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 22, 2026

Copy link
Copy Markdown

@mollyjester is attempting to deploy a commit to the shashankss1205's projects Team on Vercel.

A member of the Team first needs to authorize it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog tasks

Development

Successfully merging this pull request may close these issues.

1 participant