diff --git a/.claude/handoffs/explore-flow-tool-adoption.md b/.claude/handoffs/explore-flow-tool-adoption.md new file mode 100644 index 00000000..b4993811 --- /dev/null +++ b/.claude/handoffs/explore-flow-tool-adoption.md @@ -0,0 +1,70 @@ +--- +name: explore-flow-tool-adoption +date: 2026-05-24 00:55 +project: codegraph +branch: architectural-improvements +summary: Investigated why codegraph's read savings don't convert to wall-clock; root cause is agent tool-CHOICE (under-uses trace). Shipped a chain of fixes; the breakthrough is "explore-surfaces-flow" — the first mechanism to show up in real agent runs by adapting the tool the agent already uses. +--- + +# Handoff: codegraph retrieval — tool adoption & explore-surfaces-flow + +## Resume here — read this first +**Current state:** A long investigation into making agents answer flow questions faster with codegraph. 6 commits on `architectural-improvements` (all probe-validated, suite green 815). The breakthrough: **`codegraph_explore` now surfaces the execution flow** from the symbol-bag the agent already passes it (`PmsProductController getList PmsProductService list PmsProductServiceImpl` → leads output with `getList → service-interface → impl`, riding synth edges). It's the FIRST mechanism this whole arc to actually appear in real agent runs (spring-mall A/B: flow surfaced both runs, reads 2.0→1.5) — because it adapts the tool the agent USES instead of trying to make it use `trace`. + +**Immediate next step:** The user is weighing how to push tool-USE quality next (their open question). Decide between: (a) **extend explore-flow to surface more reliably** (spring-halo's query didn't name a connected co-named chain → no flow), (b) accept we're at the model-behavior ceiling and **wrap up**, or (c) the user's ideas — better tool-description *examples* (≈ steering, low-leverage per the evidence) or a *query-builder tool* (adds a call + new-tool adoption problem). 
My read: keep ADAPTING THE USED TOOL (the only thing that's worked); examples/new-tools are the "change the agent" direction that failed all session. + +> Suggested next message: "explore-flow only surfaced on 2 of 3 repos — dig into why spring-halo's explore query didn't produce a flow and make it surface more reliably" — OR — "we're at the model-behavior ceiling; let's stop and write the CHANGELOG/PR for this branch" + +## Goal +Make an AI agent answer **flow questions** ("how does X reach Y", request→handler→service, state→render) fast: ~0 Read/Grep, few codegraph calls, lower wall-clock. `codegraph_trace` is the fastest tool (1 call = the path), but the agent under-uses it. Ultimate target = trace's speed, however the agent gets there. + +## Key findings (the through-line) +- **The wall is agent tool-CHOICE, not the graph.** Matrix-wide, codegraph cuts reads −75% but wall-clock only −16% (`docs/benchmarks/codegraph-ab-matrix.md`). The floor is round-trips + the synthesis turn. The agent reliably calls `context`/`explore`, rarely `trace` (3/37 flow cells). Full analysis: `docs/benchmarks/call-sequence-analysis.md`. +- **Steering does NOT move it** (arms B/F/G, 3 wording variants): an MCP `initialize` instruction / tool description can't match a CLI `--append-system-prompt`'s salience, and forcing trace where it doesn't connect regresses. Reverted. +- **Sufficiency works** (committed): a self-sufficient `trace` (hop bodies + destination callees inlined) lets the unsteered agent stop — but only when it calls trace. +- **THE breakthrough — adapt the tool the agent uses.** `explore`'s query is a precise symbol-bag spanning the flow, so `explore` finds the call path AMONG its named symbols and leads with it. First mechanism to surface in real runs + drop reads. +- **What FAILED:** option 1 (context-surfaces-flow) — fuzzy DESCRIPTION can't disambiguate endpoints → confident WRONG-feature flow; reverted. 
trace multi-source-BFS over ambiguous names — same wrong-feature; reverted. + +## Gotchas +- **Co-naming disambiguation must match qualifiedName SEGMENTS, not substrings** (`buildFlowFromNamedSymbols` in `src/mcp/tools.ts`): `list` is a substring of `getList` → kept every getList. Split `qualifiedName` on `::`/`.` and match segments. +- **BFS must cap consecutive UNNAMED hops at 1** — full-graph BFS wanders a god-function's fan-out (excalidraw `render()` → pointer handlers → mutateElement). ≤1 bridge crosses a missing intermediate without wandering. +- **`getCallees` returns non-`calls` edges too** (references) — filter `c.edge.kind === 'calls'`. +- **Resolver/synthesizer changes need a CLEAN reindex**: `rm -rf .codegraph && codegraph init -i` (the init edge count is contains-only — query the DB for the real count). The explore-flow change is query-time (no reindex). +- **n=2 A/B is noisy** — report ranges/patterns, never conclude from one run. Foreground `sleep` is blocked → run A/B batches with `run_in_background`. +- Java/Kotlin `qualifiedName` is `Class::method` (so `matchesSymbol` resolves `Class.method` qualified trace endpoints — the agent already passes these). + +## How to test & validate +- Probe flow surfacing (no agent): `node scripts/agent-eval/probe-explore.mjs ""` → look for the `## Flow` section. `probe-trace.mjs ` for trace. +- Synthesizer: `sqlite3 /.codegraph/codegraph.db "select count(*) from edges where json_extract(metadata,'$.synthesizedBy')='interface-impl'"`; node count stable before/after reindex (synth adds edges only). +- Agent A/B (the real test): `bash scripts/agent-eval/run-arms.sh "" I ` (arm I = body-trace build, no steering). Parse via the `cmp2.mjs`-style scripts in `/tmp`. Pass = flow surfaces (`flowShown=Y`) + reads ≤ baseline. +- `npm test` (vitest, 815 pass); `__tests__/mcp-tool-allowlist.test.ts` covers the allowlist. 
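The co-naming gotcha listed under Gotchas (match `qualifiedName` segments, not substrings) can be sketched as below. This is a minimal illustration, not the real `buildFlowFromNamedSymbols` in `src/mcp/tools.ts`: the helper names are hypothetical, and the case-insensitive comparison is an assumption made so that `list` actually collides with `getList` the way the note describes.

```typescript
// Illustrative only: these helper names are hypothetical and are not the
// real code in src/mcp/tools.ts.

/** Split a qualified name such as "Class::method" or "pkg.Class.method". */
function splitQualifiedName(qualifiedName: string): string[] {
  return qualifiedName
    .split(/::|\./)
    .filter((segment) => segment.length > 0);
}

/** Buggy variant: substring match (case-insensitive is an assumption). */
function matchesBySubstring(qualifiedName: string, token: string): boolean {
  return qualifiedName.toLowerCase().includes(token.toLowerCase());
}

/** Fixed variant: the token must equal one whole segment. */
function matchesBySegment(qualifiedName: string, token: string): boolean {
  const wanted = token.toLowerCase();
  return splitQualifiedName(qualifiedName).some(
    (segment) => segment.toLowerCase() === wanted,
  );
}

// Symbol names taken from the spring-mall example in this handoff.
const candidates = [
  "PmsProductController::getList",
  "PmsProductService::list",
  "PmsProductServiceImpl::list",
];

// Substring matching keeps getList as a match for the token "list".
const substringMatches = candidates.filter((name) =>
  matchesBySubstring(name, "list"),
);

// Segment matching keeps only the symbols whose method segment is "list".
const segmentMatches = candidates.filter((name) =>
  matchesBySegment(name, "list"),
);

console.log(substringMatches.length, segmentMatches);
```

With the substring variant every `getList` stays a candidate for the token `list`. The segment variant narrows the set to the service interface and its implementation, which appears to be the disambiguation the gotcha is after.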
+ +## Repo state +- branch `architectural-improvements`, last commit `bafae81 feat(mcp): codegraph_explore surfaces the execution flow from its named symbols`. +- uncommitted: clean (only untracked `.claude/handoffs/`). +- 6 session commits: `eab5cf3` self-sufficient trace + `CODEGRAPH_MCP_TOOLS` allowlist · `a6183d7` research log + arms harness · `bde8c19` node/trace line numbers · `98baf41` Java/Kotlin interface→impl synthesizer · `6f3c468` playbook · `bafae81` explore-surfaces-flow. +- NOT pushed/merged. No version bump. CHANGELOG `[Unreleased]` has all of it. + +## Open threads / TODO +- [ ] **User's open question** (answer in the next turn): better tool-description *examples* vs a *query-builder tool* vs keep adapting the used tool. Evidence favors the last. +- [x] explore-flow reliability: now resolves QUALIFIED tokens (`Class.method`) — the agent's most precise input was being dropped by the file-ext strip (`2765c3c`). spring-halo's publish flow stays absent on purpose — it's **reactive/reconciler dispatch** (`publishPost` calls `ReactiveExtensionClient.get`/`awaitPostPublished`, not `PostService.publish`), so there's no static call chain. That's the next COVERAGE frontier (reactive runtimes — like MediatR, Vue Proxy), not an explore-flow bug. +- [ ] Ship-prep for the whole branch (this arc + the earlier framework sweep): CHANGELOG version block + `package.json` bump + PR to main. Releases go through `.github/workflows/release.yml` only — do NOT `npm publish`. +- [ ] Frontiers: MediatR (`_mediator.Send`→Handle) and Vue/Compose reactive runtimes are still unbridged dynamic dispatch. + +## Recent transcript (oldest → newest) +### Turn — "improve the A/B matrix; trace works, reads near 0 — what else?" +- Diagnosed: reads at floor, wall-clock floor = round-trips + synthesis. Built `seq-matrix.mjs`; found trace adoption 3/37. +### Turn — "do explore/context/trace compete? one tool?" 
+- Ablation arms A–E (`run-arms.sh`/`arms-F.sh` + `CODEGRAPH_MCP_TOOLS` allowlist). explore = 68% of payload, load-bearing; trace path-scoped but under-adopted; trace alone insufficient. +### Turn — "prototype body-inlining trace + A/B" +- Arm F: self-sufficient trace wins WITH append-prompt steering. But steering isn't a shippable channel. +### Turn — "port the steering + re-run" +- Arms G (3 variants) all regressed vs baseline; arm H (body-trace, no steer) ≈ baseline. Steering reverted; body-trace + line-numbers + allowlist committed. +### Turn — "tee up connectivity (Spring interface-DI)" +- Built `interfaceOverrideEdges` (Java/Kotlin interface→impl, overload-aware). Probe: 3-hop trace connects. But A/B null — agent never called trace. Committed (probe-validated, adoption-gated). +### Turn — "make context surface the flow (option 1)" +- Failed: fuzzy query → wrong-feature flows. Reverted. +### Turn — "change explore to do trace in the backend" +- WIN: explore's query is a precise symbol-bag. `buildFlowFromNamedSymbols` (co-naming segment match + ≤1 bridge). Probe perfect (Spring + excalidraw full chains); A/B: flow surfaces + modest read drop. Committed `bafae81`. +### Turn — "update memory + handoff; what about better examples / a query-builder tool?" +- This handoff + memory update. Strategic answer pending (adapt-the-tool > change-the-agent). diff --git a/.claude/handoffs/framework-coverage-sweep-2026-05-23.md b/.claude/handoffs/framework-coverage-sweep-2026-05-23.md new file mode 100644 index 00000000..3ba99a5e --- /dev/null +++ b/.claude/handoffs/framework-coverage-sweep-2026-05-23.md @@ -0,0 +1,70 @@ +--- +name: framework-coverage-sweep-2026-05-23 +date: 2026-05-23 23:59 +project: codegraph +branch: architectural-improvements +summary: Dynamic-dispatch coverage sweep COMPLETE — all 14 README frameworks + every flow-relevant language validated (measure→fix→validate→test→playbook→commit). ~37 commits pushed, suite green. 
Ship-prep (CHANGELOG + PR to main) is the only thing left. +--- + +# Handoff: Dynamic-dispatch framework/language coverage sweep (complete) + +## Resume here — read this first +**Current state:** The coverage sweep is **done**, AND a **frontier pass** closed the tractable partials. Every framework in the README's 14-row table is ✅, every flow-relevant language is validated (TS/JS, Python, Go, Java, C#, PHP, Ruby, Rust, Swift, Dart, Kotlin, Lua/Luau, Scala, C/C++), and the frontier pass added: React object data-router (literal), Next.js false-positive fix, Flask-RESTful `add_resource` (redash 6→77), Flask tuple methods + broader detection (flask-realworld 0→19), gorilla/mux confirmed. All committed/pushed to `architectural-improvements` (tree clean except untracked `.claude/handoffs/`). Full suite green (**809 passed**, 2 skipped; flaky `watcher.test.ts > debounced sync` passes on re-run). **No CHANGELOG entry exists, and the branch is not yet merged to main.** +**Immediate next step:** Ship-prep — write a CHANGELOG entry grouping the whole sweep (route resolution for Flask/FastAPI/Drupal/Rust-Axum+actix/Vapor/Spring-Kotlin/Play + React Router routing; the Python builtin-name guard, Dart method-range, and C++ inheritance foundational fixes; the flutter-build and cpp-override synthesizer channels), bump `package.json`, then open a PR to main. + +> Suggested next message: "do ship-prep: write the CHANGELOG entry covering the whole framework/language coverage sweep on this branch, bump the version, and open a PR to main" + +## Goal +Close static-extraction holes for **dynamic dispatch** across every language/framework codegraph supports, so cross-symbol flows (request→route→handler→service, state→render, virtual→override) exist in the graph and an agent answers flow questions with few codegraph calls and ~0 Read/Grep. 
Per framework/language: canonical flow `trace`s end-to-end, agent A/B shows fewer reads, no node explosion, recorded in `docs/design/dynamic-dispatch-coverage-playbook.md` (the matrix §6 + per-item notes §7). **This goal is now met; what remains is ship-prep + documented frontiers.** + +## Key findings (this session's work, all committed) +- **Routing convention is the hole in every backend** — same pattern each time: the resolver/extractor assumed one syntax. Flask (intervening `@login_required`/stacked routes), FastAPI (empty `""` path), Drupal (`claimsReference` for FQCN `_form`/single-colon controllers + contrib `detect` via composer name/type/`.info.yml`), Rust/Axum (chained `get(h).post(h2)` + namespaced `mod::handler`), actix (builder API `web::resource().route(web::get().to(h))`), Vapor (grouped `routes.grouped("x"); x.get(use:h)` — was 0 on every real app), Spring **Kotlin** (`fun` handler syntax + `.kt`), Play (extensionless `conf/routes` → controller), React Router (`` JSX). +- **Three FOUNDATIONAL fixes (broad benefit, not framework-specific):** (1) Python **bare-name builtin guard** in `src/resolution/index.ts` — a handler named `index`/`get`/`update` was filtered as a builtin method; mirror the dotted-branch `knownNames` guard. (2) **Dart method-range** in `src/extraction/tree-sitter.ts` `createNode` — Dart bodies are SIBLINGS of the signature, so methods were `end==start` (signature-only); extend `endLine` to the resolved body (guarded, child-body grammars no-op). (3) **C++ inheritance** — `extractInheritance` handled `base_clause` (PHP) but not C++ `base_class_clause`; added it (leveldb extends 219→298). +- **Two new synthesizer channels** in `src/resolution/callback-synthesizer.ts` (Dart analog + C++ analog of react-render): `flutter-build` (a State method calling `setState(` → `build`) and `cpp-override` (base virtual method → subclass override of same name, gated to C++). 
+- **measure-first repeatedly split "needs work" from "already covered":** Svelte, NestJS (prior), and this session **Lua/Luau** (module dispatch already resolves) + **Compose** (composition is plain function calls, already static) needed NO code. The assumed hole wasn't real. +- **`claimsReference` pre-filter is the recurring gotcha** (`src/resolution/index.ts:497-503`): a route ref naming no declared symbol (FQCN, `Controller@method`, `controller#action`, `Class.method`) is dropped before `framework.resolve()` runs. Added for Drupal + Play this session. + +## Gotchas +- **`claimsReference`:** if a new framework's route refs don't resolve despite a correct `resolve()`, it's the pre-filter — add `claimsReference`. +- **Reindex picks up resolver changes only on a CLEAN index:** `codegraph index` is incremental (skips unchanged files); after `npm run build`, do `rm -rf .codegraph && codegraph init -i` to re-extract. The init message's edge count is contains-only (~misleading); query the DB for the real count. +- **Extraction changes are high blast radius** (shared `createNode`/`extractInheritance`): re-check node counts on control repos (excalidraw 9,290 / django 302) — the Dart/C++ fixes are guarded to only-extend / C++-only, controls unchanged. +- **Play `conf/routes` is extensionless** → needed `isPlayRoutesFile` opt-in in `grammars.ts` (isSourceFile + detectLanguage→'yaml' no-grammar path). Narrow match, only ADDS Play files. +- **Flaky:** `watcher.test.ts > debounced sync > should trigger sync after file change` — timing-based, passes on re-run; unrelated to any of this work. +- **Foreground `sleep` is blocked** in Bash → background A/B batches (`run_in_background: true`), read the task output file. zsh quirks: quote globs (`'*.vue'`); SQL `count(*)` in `$(...)` needs care with quotes. +- Global `codegraph` is npm-linked to this repo's `dist/`; `npm run build` then reindex. 
A/B harness: `scripts/agent-eval/run-all.sh "" headless` (with vs empty MCP), parse via `node scripts/agent-eval/parse-run.mjs`. + +## How to test & validate (the per-framework loop) +- Corpus in `/tmp/codegraph-corpus/` (clone S/M/L, `git clone --depth 1`). Index: `rm -rf .codegraph && codegraph init -i`. +- Measure holes: `sqlite3 .codegraph/codegraph.db "select count(*) from nodes where kind='route'"` + route→handler edges (`join edges on source where kind='references'`). Node-count before/after (no explosion). +- Flow: `node scripts/agent-eval/probe-node.mjs ` (shows Called-by/Calls trail) / `probe-trace.mjs `. +- Agent A/B (≥2 runs/arm, variance is real): `run-all.sh` headless, record Read/Grep/duration/codegraph. Pass = fewer reads with codegraph. +- Tests: `npm test` (vitest). Resolver extract tests in `__tests__/frameworks.test.ts`; end-to-end in `__tests__/frameworks-integration.test.ts` (real CodeGraph + indexAll); Dart range in `__tests__/extraction.test.ts`; Drupal in `__tests__/drupal.test.ts`. + +## Repo state +- branch `architectural-improvements`, last commit `42a0178 docs(playbook): record frontier pass; test(go): gorilla/mux`. +- uncommitted: clean (only untracked `.claude/handoffs/`). +- ~37 commits total on the branch (handoff's original 11 frameworks + this session's: Flask/FastAPI, Drupal, Rust/Axum, Vapor, React Router, actix, Dart, Kotlin, Lua, Scala/Play, C/C++ — each a feat + a docs(playbook) commit; Lua was docs-only). + +## Open threads / TODO +- [ ] **SHIP-PREP (the only blocker to merge):** CHANGELOG entry for the whole sweep, `package.json` bump, PR to main. Releases go through `.github/workflows/release.yml` only — do NOT `npm publish` (see CLAUDE.md). +- [x] **Frontier pass DONE (commits 0456915, 03e49ab, 42a0178):** React object data-router (literal), Next.js false-positive fix, Flask-RESTful `add_resource`, Flask tuple methods + detection, gorilla/mux confirmed. 
+- [ ] **Frontiers LEFT (deliberately, with rationale in playbook §7 "Frontier pass"):** anonymous/inline closures (def-use frontier), metaprogramming finders (AR/Eloquent/JPA/EF), reactive runtimes (Vue Proxy / Compose recomposition), Akka actors, C callback-struct 422-way fan-out, C++ pure-virtual base methods, React lazy data-router (variable paths + lazy imports), Play SIRD, Nuxt-specific. Forcing these adds noise. +- [ ] Pre-existing, unrelated: Next.js `*.config.mjs` in a `pages/` dir treated as a route (false-positive found in bulletproof-react). + +## Recent transcript (oldest → newest, this session) +### Turn — "what's left / what's next on coverage" → did Flask/FastAPI +- 3 holes: Flask intervening/stacked decorators, FastAPI empty path, **Python bare-name builtin guard** (handlers named `index`/`get` filtered). microblog 6→27, realworld 12→20, dispatch 290/290. Fixed 6 stale Laravel/Rails tests too. Committed + pushed. +### Turn — "Drupal next" +- `claimsReference` for FQCN/_form/single-colon controllers + contrib `detect` (composer type/name + `.info.yml`). core 536→731 (87%), admin_toolbar 0→14. OOP `#[Hook]` = frontier. Committed. +### Turn — "Rust: Axum/actix/Rocket" +- Axum chained methods + namespaced handlers (realworld 12→19, 19/19); Rocket already 99%; **actix builder API** `web::resource().route(web::get().to())` (examples 51→128). Committed (2 commits: axum, then actix). +### Turn — "Vapor (Swift)" +- Resolver was 0-routes on every real app; rewrote for any receiver + optional non-string paths + `.grouped` prefix tracking + `use:` discriminator. template 0→3, SteamPress 0→27, SPI 0→14. Committed. +### Turn — "2, 3, 4" (React Router, actix [done above], Dart/Flutter) +- React Router `` JSX (react-realworld 0→10). Dart/Flutter: **method-range fix** (foundational) + `flutter-build` setState→build synthesizer. Committed. 
+### Turn — "Kotlin next" +- Spring resolver `['java']`→`['java','kotlin']` + `fun` handler regex (petclinic-kotlin 0→18, 18/18; Java unchanged 19/19). Compose composition already static. Committed. +### Turn — "Lua/Luau, Scala, C/C++ (Lua first, but do all three)" +- **Lua:** measure-first → module dispatch already covered (telescope 335 cross-file calls); no code change, validated. **Scala/Play:** `conf/routes` file-walk opt-in + Play resolver (computer-database 0→8). **C/C++:** general dispatch strong (redis 29k); fixed C++ `base_class_clause` inheritance + `cpp-override` synthesizer (leveldb 12 precise). All committed + pushed. +### Turn — "wrap up + refresh handoff" +- This handoff. Sweep complete; ship-prep (CHANGELOG + PR) is the remaining work. diff --git a/.cursor/rules/codegraph.mdc b/.cursor/rules/codegraph.mdc index 3f23cf6b..c8616cce 100644 --- a/.cursor/rules/codegraph.mdc +++ b/.cursor/rules/codegraph.mdc @@ -16,6 +16,7 @@ Use codegraph for **structural** questions — what calls what, what would break | "Where is X defined?" / "Find symbol named X" | `codegraph_search` | | "What calls function Y?" | `codegraph_callers` | | "What does Y call?" | `codegraph_callees` | +| "How does X reach/become Y? / trace the flow from X to Y" | `codegraph_trace` (one call = the whole path, incl. callback/React/JSX dynamic hops) | | "What would break if I changed Z?" | `codegraph_impact` | | "Show me Y's signature / source / docstring" | `codegraph_node` | | "Give me focused context for a task/area" | `codegraph_context` | @@ -25,7 +26,7 @@ Use codegraph for **structural** questions — what calls what, what would break ### Rules of thumb -- **Answer directly — don't delegate exploration.** For "how does X work" / architecture / trace questions, answer with 2-3 codegraph calls: `codegraph_context` first, then ONE `codegraph_explore` for the source of the symbols it surfaces. 
Codegraph IS the pre-built index, so spawning a separate file-reading sub-task/agent — or running a grep + read loop — repeats work codegraph already did and costs more for the same answer. +- **Answer directly — don't delegate exploration.** For "how does X work" / architecture questions, answer with 2-3 codegraph calls: `codegraph_context` first, then ONE `codegraph_explore` for the source of the symbols it surfaces. For a specific **flow** ("how does X reach Y") start with `codegraph_trace` from→to — one call returns the whole path with dynamic hops bridged — then ONE `codegraph_explore` for the bodies; don't rebuild the path with `codegraph_search` + `codegraph_callers`. Codegraph IS the pre-built index, so spawning a separate file-reading sub-task/agent — or running a grep + read loop — repeats work codegraph already did and costs more for the same answer. - **Trust codegraph results.** They come from a full AST parse. Do NOT re-verify them with grep — that's slower, less accurate, and wastes context. - **Don't grep first** when looking up a symbol by name. `codegraph_search` is faster and returns kind + location + signature in one call. - **Don't chain `codegraph_search` + `codegraph_node`** when you just want context — `codegraph_context` is one call. diff --git a/.github/workflows/deploy-site.yml b/.github/workflows/deploy-site.yml new file mode 100644 index 00000000..b66dde95 --- /dev/null +++ b/.github/workflows/deploy-site.yml @@ -0,0 +1,43 @@ +name: Deploy site to GitHub Pages + +on: + push: + branches: [main] + paths: + - 'site/**' + - '.github/workflows/deploy-site.yml' + workflow_dispatch: + +# Allow GITHUB_TOKEN to deploy to Pages and verify the deployment origin. +permissions: + contents: read + pages: write + id-token: write + +# One deploy at a time; let an in-progress run finish. 
+concurrency: + group: pages + cancel-in-progress: false + +jobs: + build: + runs-on: ubuntu-latest + steps: + - name: Checkout + uses: actions/checkout@v4 + - name: Build with Astro + uses: withastro/action@v3 + with: + path: site + node-version: 22 + + deploy: + needs: build + runs-on: ubuntu-latest + environment: + name: github-pages + url: ${{ steps.deployment.outputs.page_url }} + steps: + - name: Deploy to GitHub Pages + id: deployment + uses: actions/deploy-pages@v4 diff --git a/CHANGELOG.md b/CHANGELOG.md index 3cfadd1a..d727e6cd 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,9 +7,66 @@ a [GitHub Release](https://github.com/colbymchenry/codegraph/releases) tagged This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## [0.9.4] - 2026-05-22 +## [0.9.4] - 2026-05-24 + +### Added +- **Framework-aware route resolution — `request → route → handler → service` + flows now resolve end-to-end across the supported stacks.** Added or fixed + routing for Express (inline arrow handlers → services), Rails, Spring (Java + + Kotlin; bare and class-prefixed mappings), Django/DRF (`router.register` → + ViewSet), Laravel (`Controller@method`), Flask/FastAPI (decorator stacks, + empty-path routers, Flask-RESTful `add_resource`), Gin/chi (group-var routing), + ASP.NET (feature-folder + bare attribute routes), Drupal, Rust (Axum chained + methods, actix builder API), Vapor (Swift grouped routes), Play (`conf/routes`), + Vue/Nuxt SFC templates, Svelte/SvelteKit, and React Router (`` JSX + + object data-router). 
+- **Dynamic-dispatch flow synthesis — `codegraph_trace`, `codegraph_callees`, and + `codegraph_explore` now follow flows that have no static call edge.** Bridged + channels: callback/observer registration, EventEmitter (`on`/`emit`), React + re-render (`setState` → `render`) and JSX children, Flutter `setState` → `build`, + C++ virtual overrides, and Java/Kotlin interface → implementation dispatch + (e.g. Spring `@Autowired svc.list()` → the impl). Each synthesized hop is + labeled inline in `trace` with where it was wired up. +- **`CODEGRAPH_MCP_TOOLS` — trim the exposed MCP tool surface.** Set it to a + comma-separated list of tool names (e.g. `trace,search,node,context`) to expose + only those codegraph tools over MCP; unset exposes all of them. Names match on + the short form, so `trace` and `codegraph_trace` are equivalent. Lets you + constrain an agent to a minimal surface (or A/B-test tool selection) without + editing the client's MCP config. Inert by default. +- **Release archives now ship with a `SHA256SUMS` file**, and the npm launcher + verifies the bundle it downloads against it — a mismatch aborts before anything + runs. Releases published before this change have no checksum file, so the + verification is skipped (not failed) when none is available. + +### Changed +- **`codegraph_trace` now returns a self-contained flow dossier.** Each hop on + the path is shown with its full body inline (previously just the call-site + line), and the destination's own outgoing calls are appended — so one trace + call usually answers a "how does X reach Y" flow question without a follow-up + `codegraph_explore`/`codegraph_node`/Read. Measured across real repos: fewer + tool calls and lower cost than the prior path-only output, with no wall-clock + regression. 
+- **`codegraph_node` and `codegraph_trace` now emit line-numbered source** + (`cat -n` style, matching `codegraph_explore` and Read), so an agent can cite + or edit exact lines without re-reading the file just to recover line numbers. +- **`codegraph_explore` now leads with the execution flow** when its query names + the symbols of a flow. Agents call `explore` far more than `trace`, passing a + bag of symbol names that usually spans the flow they're investigating + (`PmsProductController getList PmsProductService list PmsProductServiceImpl`); + `explore` now finds the call path *among those named symbols* — riding + synthesized dynamic-dispatch edges (callback / React re-render / JSX child / + interface→impl) — and shows it first. So a flow question answered through + `explore` gets the trace-quality path without the agent having to switch tools. + Scoped to the named symbols (no wrong-feature wandering) and bridge-capped (no + god-function fan-out); absent when the query is fuzzy or has no connected chain. ### Fixed +- **Static-extraction & resolution correctness fixes** underpinning the framework + work above: C++ inheritance (`base_class_clause` was unhandled, so C++ `extends` + edges were missing), Dart method body ranges (methods were extracted + signature-only), a Python builtin-name handler guard (handlers named + `index`/`get`/`update` were silently dropped), and an explore output-budget + regression that under-returned source on god-file repos. - **Orphaned `codegraph serve --mcp` processes after a parent SIGKILL.** When the MCP host (Claude Code, opencode, …) was force-killed — OOM killer, a `kill -9`, a container teardown — the child kept running indefinitely on @@ -21,13 +78,6 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). `5000`, `0` disables). Resolves [#277](https://github.com/colbymchenry/codegraph/issues/277). 
-### Added -- **Release archives now ship with a `SHA256SUMS` file**, and the npm launcher - verifies the bundle it downloads against it — a mismatch aborts before - anything runs. Releases published before this change have no checksum file, so - the verification is skipped (not failed) when none is available. - -### Fixed - **`codegraph: no prebuilt bundle for ` after installing through a registry mirror.** Installing `@colbymchenry/codegraph` from a registry that hadn't mirrored the matching per-platform package — most often the diff --git a/CLAUDE.md b/CLAUDE.md index be63c67b..a1131bfb 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -90,6 +90,71 @@ Cursor launches MCP subprocesses with the wrong cwd and doesn't pass `rootUri` i `src/mcp/server-instructions.ts` is sent back to the agent in the MCP `initialize` response. This is the *first* thing every agent sees about how to use the tools — treat it as the authoritative tool guidance and keep it in sync with `instructions-template.ts` and `.cursor/rules/codegraph.mdc`. +## Retrieval performance & dynamic-dispatch coverage (do not regress) + +CodeGraph's core value is letting an agent answer **structural/flow** questions ("how does X reach Y", trace, impact, callers) with a few **fast** codegraph calls and **zero Read/Grep**. The optimization target is **wall-clock latency + tool-call count** — *don't optimize for token cost*. (Cost is **lower**, not "flat" as earlier framing claimed: a current-build with-vs-without A/B across the 7 README repos, median of 4, saved on average **35% cost · 57% tokens · 46% time · 71% tool calls** — reproducing the published README. The mechanism is **far fewer turns over a much smaller accumulated context** — NOT cache-ability: the without-arm's huge token volume is *mostly* cheap cache-reads, which is why token-count savings (57%) look bigger than cost savings (35%). Measure tokens by **summing per-turn assistant usage**, not `result.usage` (last-turn only in current Claude Code). 
See `docs/benchmarks/call-sequence-analysis.md`.) The mechanism that drives everything here: **an agent falls back to Read/Grep the instant a codegraph answer is insufficient.** So every change is judged by one question — is codegraph's answer sufficient enough to *stop* the agent from reading? + +**Target behavior:** a flow question resolves in **1 codegraph call on small repos, scaling to 3–5 on large**, with **Read/Grep = 0**. When reviewing a PR or trying something new, do not regress this. + +### Adapt the tool to the agent — don't try to change the agent + +The lever that decides whether a retrieval change lands. **Test before building anything here: does this make a tool the agent _already calls_ do more with the input it _already gives_? If it instead needs the agent to behave differently — pick a different tool, query differently, learn from examples — it hits the low-salience wall and won't land.** + +CodeGraph's only channels to influence the agent are low-salience: the MCP `initialize` instructions (`server-instructions.ts`) and the tool descriptions. Changing them does **not** reliably move the agent's tool _choice_ or query style — validated: trace-first steering ported into the server-instructions + tool descriptions (3 wording variants) never reproduced what a CLI `--append-system-prompt` achieved, and **regressed** wall-clock vs baseline. New tools fare worse (rarely chosen — the agent under-picks even `trace`); "better examples" is the same steering. The agent's tool-choice does improve on its own as host models get better at tool use — but that is not ours to force. + +What works is meeting the agent where it already is: +- **Sufficiency** — `codegraph_trace` inlines each hop's body + the destination's own callees, so one trace call ends the flow investigation (no follow-up explore/node/Read). +- **explore-flow** — `codegraph_explore`'s query is a precise bag of symbol names (incl. 
qualified `Class.method`) spanning the flow the agent is after; explore finds the call path _among those named symbols_ (riding synthesized edges) and leads its output with it — delivering trace-quality flow through the call the agent reliably makes. (`buildFlowFromNamedSymbols`: segment/co-naming disambiguation; ≤1 unnamed bridge so it never wanders a god-function's fan-out.) + +What fails is the inverse — folding a precise answer into a **fuzzy-input** tool. `codegraph_context` gets a description, not symbols, so it can't disambiguate a flow's endpoints and surfaces the _wrong feature_. Precise output needs precise input. + +The remaining lever under this axis is **coverage**: every flow made to connect statically (a new dynamic-dispatch synthesizer) is then surfaced automatically by explore-flow/`trace`, no agent change needed. Reactive/reconciler runtimes (Halo's `ReactiveExtensionClient`, MediatR, Vue Proxy) are the frontier — flows there have no static edges, so nothing surfaces (correctly — silent beats wrong). Full investigation + A/B record: `docs/benchmarks/call-sequence-analysis.md`. + +### Explore budget — keep BOTH budgets monotonic with repo size + +Two functions in `src/mcp/tools.ts` scale explore with indexed file count. This is the expected resolution (a regression here silently forces agents back to Read): + +| Repo | files | explore calls | chars/call | per-file | +|---|---|---|---|---| +| express (small) | 147 | 1 | 18K | 3800 | +| excalidraw/django (medium) | 643–3043 | 2 | 28K | 6500 | +| vscode (large) | 10446 | 3 | 35K | 7000 | +| ~20k / ~40k | — | 4 / 5 | 38K | 7000 | + +- `getExploreBudget(fileCount)` → **call** budget: `<500→1, <5000→2, <15000→3, <25000→4, ≥25000→5` (max 5). +- `getExploreOutputBudget(fileCount)` → **per-call** output (chars / files / per-file). 
**Invariant: a larger tier must never get a smaller `maxCharsPerFile` than a smaller tier.** (Regression that motivated this doc: the `<5000` tier's 2500 was *below* the `<500` tier's 3800, so on a god-file repo — excalidraw's 415 KB `App.tsx` — one explore returned <1% of the file and forced a Read.) +- Explore output must **never tell the agent to "use Read"** — steer to another `codegraph_explore` and "treat returned source as already Read." + +### Dynamic-dispatch coverage — the flow must EXIST in the graph end-to-end + +Static tree-sitter extraction misses computed/indirect calls, so flows break at dynamic dispatch and the agent reads to reconstruct them. Synthesizers/resolvers bridge these so `trace`/`explore` connect end-to-end (`src/resolution/callback-synthesizer.ts`, `src/resolution/frameworks/`). Channels today: callback/observer, EventEmitter, **React re-render** (`setState`→`render`), **JSX child** (`render`→child component), django ORM descriptor. All synthesized edges are `provenance:'heuristic'` with `metadata.synthesizedBy` + `registeredAt` (the wiring site), surfaced inline in `trace`, the `node` trail, and `context` call-paths. + +**Principle: partial coverage is WORSE than none.** Bridging one boundary but not the next reveals a hop the agent then drills + reads to finish. Measured on excalidraw: react-render alone *raised* reads to 5–7; only completing the flow (adding the jsx-child hop) dropped it to 0–1. **Always close the flow end-to-end and re-measure** — never ship a half-bridged flow. + +### Validation methodology (REQUIRED for every new language/framework) + +For each **language × framework**, validate on **small, medium, and large** real repos with **≥3 different flow prompts** each: + +1. **Pick the canonical flow** for the framework ("how does X reach Y": state→render, request→handler→view, query→SQL, action→reducer→store…). +2. 
**Deterministic probes** (`scripts/agent-eval/probe-{trace,node,context,explore}.mjs` against the built `dist/`): `trace(from,to)` connects end-to-end with no break; **no node explosion** (`select count(*) from nodes` stable before/after re-index); synthesized-edge **precision** spot-check (`select … where provenance='heuristic'`). +3. **Agent A/B** (`scripts/agent-eval/run-all.sh ""`): with vs without codegraph, **≥2 runs/arm** (run-to-run variance is large — never conclude from n=1). Record **duration, total tool calls, Read, Grep**. Optional forced-Read-0 sufficiency proof via the block-read hook (`scripts/agent-eval/hook-settings.json`). +4. **Pass bar:** a normal flow question reaches **~0 Read/Grep within the repo's explore-call budget**, runs **faster** than without-codegraph, and shows **no regression on a control repo**. Record the numbers in `docs/design/dynamic-dispatch-coverage-playbook.md` (the coverage matrix). + +Full playbook + per-mechanism design: `docs/design/dynamic-dispatch-coverage-playbook.md` and `docs/design/callback-edge-synthesis.md`. + +### Worked example — Excalidraw (TS/React, medium, 643 files) + +The template to replicate per language/framework. Question: *"how does updating an element re-render the canvas on screen?"* (the full flow crosses three React boundaries: observer callback, `setState`→`render`, and JSX child). + +| Stage | duration | Read | Grep | codegraph | +|---|---|---|---|---| +| Without codegraph | 115–139s | 9–10 | 10–11 | 0 | +| Broken (explore-budget regression) | 131–139s | 5–10 | 3–5 | 6–14 | +| Fixed (budget + msgs + synthesis) | 64–112s | 0–2 | 2–4 | 3–**10** | +| + trace-first steering | **51–74s** | **0–2** | 0–4 | **3–4** | + +n=4 unhooked runs/stage, same prompt. After steering flow questions to `codegraph_trace` first: **best run 0 Read / 0 Grep / 3 codegraph / 51s**; **2 of 4 fully clean** (0 Read, 0 Grep). 
Steering eliminated the over-drill variance — call count tightened from 3–10 to 3–4, trace adoption went 3/4 → 4/4, and the `search`+`callers` path-reconstruction floundering dropped to 0. Run-to-run variance is still real; report the range, never a single run. **Residual reads/greps are all the nonce data-flow** (`canvasNonce` — a local prop with no graph edges); that's the def-use/data-flow frontier, left deliberately uncovered (tracking every local would explode the graph). Validated: `trace(mutateElement, renderStaticScene)` connects in **6 hops** across all three boundaries (`mutateElement → triggerUpdate → [callback] triggerRender → [react-render] render → [jsx] StaticCanvas → renderStaticScene`), each hop showing inline source + the wiring site; node count stable at 9,289; 1 callback + 46 react-render + 280 jsx-render synthesized edges (no explosion, precision-checked). + ## Tests Tests live in `__tests__/` and mirror the module they cover. Notable ones beyond the obvious: diff --git a/README.md b/README.md index faf357bc..0b348cb8 100644 --- a/README.md +++ b/README.md @@ -6,6 +6,8 @@ **~35% cheaper · ~70% fewer tool calls · 100% local** +### [Documentation & Website →](https://colbymchenry.github.io/codegraph/) + [![npm version](https://img.shields.io/npm/v/@colbymchenry/codegraph.svg)](https://www.npmjs.com/package/@colbymchenry/codegraph) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Self-contained](https://img.shields.io/badge/Node.js-bundled%20%C2%B7%20none%20required-brightgreen.svg)](https://nodejs.org/) @@ -76,26 +78,26 @@ When Claude Code explores a codebase, it spawns **Explore agents** that scan fil ### Benchmark Results -Tested across **7 real-world open-source codebases** spanning 7 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**. 
+Tested across **7 real-world open-source codebases** spanning 7 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**. _Re-validated on **v0.9.4** (2026-05-24)._ -> **Average: 35% cheaper · 59% fewer tokens · 49% faster · 70% fewer tool calls** +> **Average: 35% cheaper · 57% fewer tokens · 46% faster · 71% fewer tool calls** | Codebase | Language | Cost | Tokens | Time | Tool calls | |----------|----------|------|--------|------|------------| -| **VS Code** | TypeScript · ~10k files | 35% cheaper | 73% fewer | 41% faster | 72% fewer | -| **Excalidraw** | TypeScript · ~600 | 47% cheaper | 73% fewer | 60% faster | 86% fewer | -| **Django** | Python · ~2.7k | 34% cheaper | 64% fewer | 59% faster | 81% fewer | -| **Tokio** | Rust · ~700 | 52% cheaper | 81% fewer | 63% faster | 89% fewer | -| **OkHttp** | Java · ~640 | 17% cheaper | 41% fewer | 36% faster | 64% fewer | -| **Gin** | Go · ~150 | 22% cheaper | 23% fewer | 34% faster | 19% fewer | -| **Alamofire** | Swift · ~100 | 38% cheaper | 59% fewer | 51% faster | 77% fewer | +| **VS Code** | TypeScript · ~10k files | 26% cheaper | 78% fewer | 52% faster | 85% fewer | +| **Excalidraw** | TypeScript · ~640 | 52% cheaper | 90% fewer | 73% faster | 96% fewer | +| **Django** | Python · ~3k | 12% cheaper | 36% fewer | 19% faster | 53% fewer | +| **Tokio** | Rust · ~790 | 82% cheaper | 86% fewer | 71% faster | 92% fewer | +| **OkHttp** | Java · ~645 | 2% cheaper | 13% fewer | 31% faster | 45% fewer | +| **Gin** | Go · ~110 | 21% cheaper | 34% fewer | 27% faster | 40% fewer | +| **Alamofire** | Swift · ~110 | 47% cheaper | 64% fewer | 48% faster | 83% fewer | The gains scale with codebase size: on large repos the agent answers from the index in a handful of calls with **zero file reads**, while the no-CodeGraph agent fans out across grep/find/Read (and the sub-agents it spawns). 
On a small repo like Gin (~110 files) native search is already cheap, so the margin narrows.
Full benchmark details -**Methodology.** Each arm is `claude -p` (Claude Opus 4.7, Claude Code v2.1.145) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos cloned at `--depth 1` and indexed by the same CodeGraph build that served them. +**Methodology.** Each arm is `claude -p` (Claude Opus 4.7) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos cloned at `--depth 1` and indexed by the same CodeGraph build that served them. Re-validated on codegraph **v0.9.4** (2026-05-24); per-repo numbers move run-to-run with how hard the without-arm thrashes (the median-of-4 smooths it, but tails remain — e.g. Tokio's without-arm hit $2.41/3m one batch). 
**Queries:** | Codebase | Query | @@ -111,13 +113,13 @@ The gains scale with codebase size: on large repos the agent answers from the in **Raw medians — WITH → WITHOUT:** | Codebase | Cost | Tokens | Time | Tool calls | |----------|------|--------|------|------------| -| VS Code | $0.42 → $0.64 | 393k → 1.4M | 1m 0s → 1m 43s | 7 → 23 | -| Excalidraw | $0.54 → $1.02 | 851k → 3.2M | 1m 17s → 3m 14s | 12 → 83 | -| Django | $0.41 → $0.62 | 499k → 1.4M | 1m 0s → 2m 25s | 9 → 48 | -| Tokio | $0.50 → $1.04 | 657k → 3.4M | 1m 5s → 2m 56s | 9 → 75 | -| OkHttp | $0.36 → $0.44 | 352k → 596k | 45s → 1m 11s | 5 → 14 | -| Gin | $0.36 → $0.46 | 431k → 562k | 47s → 1m 11s | 7 → 8 | -| Alamofire | $0.61 → $0.99 | 1.1M → 2.6M | 1m 19s → 2m 41s | 15 → 64 | +| VS Code | $0.60 → $0.80 | 601k → 2.8M | 1m 10s → 2m 26s | 8 → 55 | +| Excalidraw | $0.43 → $0.90 | 344k → 3.5M | 48s → 2m 58s | 3 → 79 | +| Django | $0.59 → $0.67 | 739k → 1.2M | 1m 19s → 1m 38s | 9 → 19 | +| Tokio | $0.42 → $2.41 | 379k → 2.6M | 53s → 3m 2s | 4 → 53 | +| OkHttp | $0.47 → $0.47 | 636k → 730k | 42s → 1m 1s | 6 → 11 | +| Gin | $0.37 → $0.47 | 444k → 675k | 44s → 1m 0s | 6 → 10 | +| Alamofire | $0.61 → $1.14 | 1.0M → 2.8M | 1m 17s → 2m 27s | 12 → 69 | **Why CodeGraph wins:** with the index available, the agent answers directly — `codegraph_context` to map the area, then one `codegraph_explore` for the relevant source — and stops, usually with zero file reads. Without it, the agent (and the Explore sub-agents it spawns) spends most of its budget on discovery (find/ls/grep) before reading the right code. CodeGraph only helps when queried *directly*, so its instructions steer agents to answer directly rather than delegate exploration to file-reading sub-agents — otherwise a sub-agent reads files regardless and CodeGraph becomes overhead. 
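The percentage columns in the summary table can be sanity-checked against the raw medians above. The following is a minimal sketch, not part of the benchmark harness: `savingsPct` is a hypothetical helper, and it assumes savings are measured relative to the WITHOUT arm and that the headline figure is the plain mean of the per-repo percentages. Only the tool-call column is shown; the cost, token, and time columns would follow the same formula, though recomputing them from the displayed medians can land a point off the published value (for example VS Code cost), presumably because the displayed medians are themselves rounded.

```typescript
// Raw tool-call medians copied from the "Raw medians" table: [WITH, WITHOUT].
const toolCalls: Record<string, [number, number]> = {
  "VS Code": [8, 55],
  Excalidraw: [3, 79],
  Django: [9, 19],
  Tokio: [4, 53],
  OkHttp: [6, 11],
  Gin: [6, 10],
  Alamofire: [12, 69],
};

// Hypothetical helper (an assumption, not the project's script):
// percentage saved relative to the WITHOUT arm.
function savingsPct(withArm: number, withoutArm: number): number {
  return (1 - withArm / withoutArm) * 100;
}

// Per-repo savings, kept unrounded so the average is not skewed by rounding.
const perRepo: [string, number][] = Object.entries(toolCalls).map(
  ([repo, [w, wo]]) => [repo, savingsPct(w, wo)] as [string, number]
);

// Assumed aggregation: plain mean of the per-repo percentages.
const average =
  perRepo.reduce((sum, [, pct]) => sum + pct, 0) / perRepo.length;

for (const [repo, pct] of perRepo) {
  console.log(`${repo}: ${Math.round(pct)}% fewer tool calls`);
}
console.log(`Average: ${Math.round(average)}% fewer tool calls`);
```

With these inputs the rounded per-repo values agree with the published tool-call column (85, 96, 53, 92, 45, 40, 83) and with the 71% average, which suggests the table was derived this way, although the diff itself does not say so.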
@@ -263,25 +265,21 @@ CodeGraph builds a semantic knowledge graph of codebases for faster, smarter cod ### If `.codegraph/` exists in the project -**NEVER call `codegraph_explore` or `codegraph_context` directly in the main session.** These tools return large amounts of source code that fills up main session context. Instead, ALWAYS spawn an Explore agent for any exploration question (e.g., "how does X work?", "explain the Y system", "where is Z implemented?"). - -**When spawning Explore agents**, include this instruction in the prompt: - -> This project has CodeGraph initialized (.codegraph/ exists). Use `codegraph_explore` as your PRIMARY tool — it returns full source code sections from all relevant files in one call. -> -> **Rules:** -> 1. Follow the explore call budget in the `codegraph_explore` tool description — it scales automatically based on project size. -> 2. Do NOT re-read files that codegraph_explore already returned source code for. The source sections are complete and authoritative. -> 3. Only fall back to grep/glob/read for files listed under "Additional relevant files" if you need more detail, or if codegraph returned no results. +**Answer directly with CodeGraph — don't delegate exploration to a file-reading sub-agent or a grep/read loop.** CodeGraph *is* the pre-built search index; re-deriving its answers with grep + Read repeats work it already did and costs more for the same result. For "how does X work?", architecture, trace, or where-is-X questions, answer in a handful of CodeGraph calls and stop — typically with **zero file reads**. The returned source is complete and authoritative: treat it as already read and do not re-open those files. Reach for raw Read/Grep only to confirm a specific detail CodeGraph didn't cover. 
-**The main session may only use these lightweight tools directly** (for targeted lookups before making edits, not for exploration): +**Tool selection by intent:** | Tool | Use For | |------|---------| -| `codegraph_search` | Find symbols by name | -| `codegraph_callers` / `codegraph_callees` | Trace call flow | +| `codegraph_context` | Map a task / feature / area first — composes search + node + callers + callees in one call | +| `codegraph_trace` | "How does X reach Y" — the call path, each hop's body inline (follows dynamic-dispatch hops grep can't) | +| `codegraph_explore` | Survey several related symbols' source in ONE budget-capped call | +| `codegraph_search` | Find a symbol by name | +| `codegraph_callers` / `codegraph_callees` | Walk call flow one hop at a time | | `codegraph_impact` | Check what's affected before editing | -| `codegraph_node` | Get a single symbol's details | +| `codegraph_node` | Get a single symbol's source / signature | + +A direct CodeGraph answer is a handful of calls; a grep/read exploration is dozens. 
### If `.codegraph/` does NOT exist @@ -297,34 +295,23 @@ At the start of a session, ask the user if they'd like to initialize CodeGraph: ## How It Works ``` -┌─────────────────────────────────────────────────────────────────┐ -│ Claude Code │ -│ │ -│ "Implement user authentication" │ -│ │ │ -│ ▼ │ -│ ┌─────────────────┐ ┌─────────────────┐ │ -│ │ Explore Agent │ ──── │ Explore Agent │ │ -│ └────────┬────────┘ └────────┬────────┘ │ -│ │ │ │ -└───────────┼────────────────────────┼─────────────────────────────┘ - │ │ - ▼ ▼ ┌───────────────────────────────────────────────────────────────────┐ -│ CodeGraph MCP Server │ -│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ -│ │ Search │ │ Callers │ │ Context │ │ -│ │ "auth" │ │ "login()" │ │ for task │ │ -│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ -│ │ │ │ │ -│ └────────────────┼────────────────┘ │ -│ ▼ │ -│ ┌───────────────────────┐ │ -│ │ SQLite Graph DB │ │ -│ │ • 387 symbols │ │ -│ │ • 1,204 edges │ │ -│ │ • Instant lookups │ │ -│ └───────────────────────┘ │ +│ Claude Code │ +│ │ +│ "How does a request reach the database?" 
│ +│ calls CodeGraph tools directly — no Explore sub-agent │ +│ │ │ +└─────────────────────────────────┬─────────────────────────────────┘ + │ + ▼ +┌───────────────────────────────────────────────────────────────────┐ +│ CodeGraph MCP Server │ +│ │ +│ context · trace · explore · callers · callees · impact │ +│ │ │ +│ ▼ │ +│ SQLite knowledge graph │ +│ symbols · edges · files · FTS5 full-text search │ └───────────────────────────────────────────────────────────────────┘ ``` @@ -397,6 +384,7 @@ When running as an MCP server, CodeGraph exposes these tools to Claude Code: |------|---------| | `codegraph_search` | Find symbols by name across the codebase | | `codegraph_context` | Build relevant code context for a task | +| `codegraph_trace` | Trace the call path between two symbols ("how does X reach Y") in one call — each hop with its body inline, following dynamic-dispatch hops (callbacks, React re-render, interface→impl) that grep can't | | `codegraph_callers` | Find what calls a function | | `codegraph_callees` | Find what a function calls | | `codegraph_impact` | Analyze what code is affected by changing a symbol | diff --git a/__tests__/drupal.test.ts b/__tests__/drupal.test.ts index fda5415b..c4f4421e 100644 --- a/__tests__/drupal.test.ts +++ b/__tests__/drupal.test.ts @@ -87,6 +87,52 @@ describe('drupalResolver.detect', () => { const ctx = makeContext({ readFile: () => '{ bad json' }); expect(drupalResolver.detect(ctx)).toBe(false); }); + + it('returns true for a contrib module with empty require (composer name/type)', () => { + const ctx = makeContext({ + readFile: (f) => + f === 'composer.json' + ? 
JSON.stringify({ + name: 'drupal/admin_toolbar', + type: 'drupal-module', + require: {}, + }) + : null, + }); + expect(drupalResolver.detect(ctx)).toBe(true); + }); + + it('returns true via the *.info.yml fallback when composer.json is absent', () => { + const ctx = makeContext({ + readFile: () => null, + getAllFiles: () => [ + 'mymodule/mymodule.info.yml', + 'mymodule/mymodule.routing.yml', + ], + }); + expect(drupalResolver.detect(ctx)).toBe(true); + }); + + it('returns false for a stray *.info.yml with no Drupal PHP/route file', () => { + const ctx = makeContext({ + readFile: () => null, + getAllFiles: () => ['some/unrelated.info.yml'], + }); + expect(drupalResolver.detect(ctx)).toBe(false); + }); +}); + +describe('drupalResolver.claimsReference', () => { + it('claims FQCN handler refs and hook names the pre-filter would drop', () => { + expect(drupalResolver.claimsReference!('\\Drupal\\m\\Form\\SettingsForm')).toBe(true); + expect(drupalResolver.claimsReference!('\\Drupal\\m\\Controller\\C:setNoJsCookie')).toBe(true); + expect(drupalResolver.claimsReference!('hook_form_alter')).toBe(true); + }); + + it('does not claim ordinary identifiers or entity-handler dotted refs', () => { + expect(drupalResolver.claimsReference!('someHelperFunction')).toBe(false); + expect(drupalResolver.claimsReference!('comment.default')).toBe(false); + }); }); // --------------------------------------------------------------------------- @@ -435,6 +481,51 @@ describe('drupalResolver.resolve', () => { }; expect(drupalResolver.resolve(ref, ctx)).toBeNull(); }); + + it('resolves a single-colon controller-service ref (Class:method)', () => { + const methodNode = { + id: 'method:nojs1', + kind: 'method' as const, + name: 'setNoJsCookie', + qualifiedName: 'BigPipeController::setNoJsCookie', + filePath: 'core/modules/big_pipe/src/Controller/BigPipeController.php', + language: 'php' as const, + startLine: 10, + endLine: 20, + startColumn: 0, + endColumn: 0, + updatedAt: 0, + }; + const 
classNode = { + id: 'class:nojs2', + kind: 'class' as const, + name: 'BigPipeController', + qualifiedName: 'BigPipeController', + filePath: 'core/modules/big_pipe/src/Controller/BigPipeController.php', + language: 'php' as const, + startLine: 5, + endLine: 30, + startColumn: 0, + endColumn: 0, + updatedAt: 0, + }; + const ctx = makeContext({ + getNodesByName: (name) => (name === 'BigPipeController' ? [classNode] : []), + getNodesInFile: () => [classNode, methodNode], + }); + const ref = { + fromNodeId: 'route:x', + referenceName: '\\Drupal\\big_pipe\\Controller\\BigPipeController:setNoJsCookie', + referenceKind: 'references' as const, + line: 1, + column: 0, + filePath: 'big_pipe.routing.yml', + language: 'yaml' as const, + }; + const resolved = drupalResolver.resolve(ref, ctx); + expect(resolved).not.toBeNull(); + expect(resolved!.targetNodeId).toBe('method:nojs1'); + }); }); // --------------------------------------------------------------------------- diff --git a/__tests__/extraction.test.ts b/__tests__/extraction.test.ts index 92717759..99c38345 100644 --- a/__tests__/extraction.test.ts +++ b/__tests__/extraction.test.ts @@ -1151,6 +1151,11 @@ class UserService { const privateMethod = methodNodes.find((m) => m.name === '_privateMethod'); expect(privateMethod).toBeDefined(); expect(privateMethod?.visibility).toBe('private'); + + // Dart models a method body as a SIBLING of the signature, so the method + // node must be extended to span its body (not just the signature line) — + // required for body-level analysis (callees, the callback synthesizer). 
+ expect(findById!.endLine).toBeGreaterThan(findById!.startLine); }); it('should extract top-level function declarations', () => { diff --git a/__tests__/frameworks-integration.test.ts b/__tests__/frameworks-integration.test.ts index b64e8c66..2eb99447 100644 --- a/__tests__/frameworks-integration.test.ts +++ b/__tests__/frameworks-integration.test.ts @@ -57,3 +57,143 @@ describe('Django end-to-end framework extraction', () => { cg.close(); }); }); + +describe('Flask end-to-end framework extraction', () => { + let tmpDir: string | undefined; + afterEach(() => { + if (tmpDir) fs.rmSync(tmpDir, { recursive: true, force: true }); + tmpDir = undefined; + }); + + it('resolves stacked routes across @login_required to a view named after a builtin (index)', async () => { + tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-flask-')); + fs.writeFileSync(path.join(tmpDir, 'requirements.txt'), 'flask==3.0\n'); + fs.writeFileSync( + path.join(tmpDir, 'app.py'), + 'from flask import Blueprint, render_template\n' + + 'from flask_login import login_required\n' + + 'bp = Blueprint("main", __name__)\n' + + '\n' + + '@bp.route("/", methods=["GET", "POST"])\n' + + '@bp.route("/index", methods=["GET", "POST"])\n' + + '@login_required\n' + + 'def index():\n' + + ' return render_template("index.html")\n' + ); + + const cg = CodeGraph.initSync(tmpDir); + await cg.indexAll(); + + // Both stacked @bp.route decorators are extracted (the second was previously + // dropped because @login_required broke the "def must follow" assumption). + const routes = cg.getNodesByKind('route'); + expect(routes.map((r) => r.name).sort()).toEqual(['GET /', 'GET /index']); + + // The view function exists even though its name is a Python builtin method. + const fn = cg.getNodesByKind('function').find((n) => n.name === 'index'); + expect(fn).toBeDefined(); + + // Both routes resolve to it — exercises the bare-name builtin guard, which + // previously filtered the `index` reference as a builtin method. 
+ for (const route of routes) { + const edges = cg.getOutgoingEdges(route.id); + const toView = edges.find((e) => e.target === fn!.id && e.kind === 'references'); + expect(toView, `route ${route.name} should resolve to index()`).toBeDefined(); + } + + cg.close(); + }); +}); + +describe('Flutter end-to-end — setState→build synthesis', () => { + let tmpDir: string | undefined; + afterEach(() => { + if (tmpDir) fs.rmSync(tmpDir, { recursive: true, force: true }); + tmpDir = undefined; + }); + + it('synthesizes a handler→build edge when a State method calls setState', async () => { + tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-flutter-')); + fs.writeFileSync( + path.join(tmpDir, 'main.dart'), + 'import "package:flutter/material.dart";\n' + + 'class CounterPage extends StatefulWidget {\n' + + ' @override\n' + + ' State createState() => _CounterPageState();\n' + + '}\n' + + 'class _CounterPageState extends State {\n' + + ' int _count = 0;\n' + + ' void _increment() {\n' + + ' setState(() {\n' + + ' _count++;\n' + + ' });\n' + + ' }\n' + + ' @override\n' + + ' Widget build(BuildContext context) {\n' + + ' return Text("$_count");\n' + + ' }\n' + + '}\n' + ); + + const cg = CodeGraph.initSync(tmpDir); + await cg.indexAll(); + + const methods = cg.getNodesByKind('method'); + const increment = methods.find((n) => n.name === '_increment'); + const build = methods.find((n) => n.name === 'build'); + expect(increment).toBeDefined(); + expect(build).toBeDefined(); + + // setState re-runs build (Flutter-internal, no static edge). The synthesizer + // bridges the handler → build so the "tap → setState → rebuilt UI" flow connects. 
+ const edges = cg.getOutgoingEdges(increment!.id); + const toBuild = edges.find((e) => e.target === build!.id && e.kind === 'calls'); + expect(toBuild, '_increment should reach build via setState synthesis').toBeDefined(); + + cg.close(); + }); +}); + +describe('C++ end-to-end — virtual override synthesis', () => { + let tmpDir: string | undefined; + afterEach(() => { + if (tmpDir) fs.rmSync(tmpDir, { recursive: true, force: true }); + tmpDir = undefined; + }); + + it('bridges a base virtual method to the subclass override', async () => { + tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-cpp-')); + fs.writeFileSync( + path.join(tmpDir, 'iter.cpp'), + 'class Iterator {\n' + + ' public:\n' + + ' virtual void Next() { }\n' + + '};\n' + + 'class DBIter : public Iterator {\n' + + ' public:\n' + + ' void Next() override { advance(); }\n' + + ' void advance() { }\n' + + '};\n' + ); + + const cg = CodeGraph.initSync(tmpDir); + await cg.indexAll(); + + // Two methods named Next: the base virtual (lower line) and the override. + const nexts = cg + .getNodesByKind('method') + .filter((n) => n.name === 'Next') + .sort((a, b) => a.startLine - b.startLine); + expect(nexts.length).toBe(2); + const [baseNext, overrideNext] = nexts; + + // A vtable call to Iterator::Next dispatches to DBIter::Next — bridge it so + // trace/callees from the interface method reaches the implementation. 
+ const edge = cg + .getOutgoingEdges(baseNext!.id) + .find((e) => e.target === overrideNext!.id && e.kind === 'calls'); + expect(edge, 'Iterator::Next should reach DBIter::Next via override synthesis').toBeDefined(); + + cg.close(); + }); +}); diff --git a/__tests__/frameworks.test.ts b/__tests__/frameworks.test.ts index a5e5c56b..1c2c643f 100644 --- a/__tests__/frameworks.test.ts +++ b/__tests__/frameworks.test.ts @@ -123,6 +123,52 @@ def create_user(id): expect(nodes[0].name).toBe('POST /'); expect(references[0].referenceName).toBe('create_user'); }); + + it('resolves the handler across an intervening decorator (@login_required)', () => { + const src = ` +@bp.route('/profile') +@login_required +def profile(): + return render_template('profile.html') +`; + const { nodes, references } = flaskResolver.extract!('routes.py', src); + expect(nodes[0].name).toBe('GET /profile'); + expect(references[0].referenceName).toBe('profile'); + }); + + it('extracts stacked @x.route decorators bound to one view', () => { + const src = ` +@bp.route('/', methods=['GET', 'POST']) +@bp.route('/index', methods=['GET', 'POST']) +@login_required +def index(): + return render_template('index.html') +`; + const { nodes, references } = flaskResolver.extract!('routes.py', src); + expect(nodes.map((n) => n.name)).toEqual(['GET /', 'GET /index']); + expect(references.map((r) => r.referenceName)).toEqual(['index', 'index']); + }); + + it('extracts the method from a tuple methods=(...) 
(not just a list)', () => { + const src = ` +@blueprint.route('/api/articles', methods=('POST',)) +def make_article(): + pass +`; + const { nodes, references } = flaskResolver.extract!('views.py', src); + expect(nodes[0].name).toBe('POST /api/articles'); + expect(references[0].referenceName).toBe('make_article'); + }); + + it('extracts Flask-RESTful api.add_resource(Resource, paths) → the Resource class', () => { + const src = ` +api.add_resource(TodoResource, '/todos/') +api.add_org_resource(AlertResource, '/api/alerts/', endpoint='alert') +`; + const { nodes, references } = flaskResolver.extract!('api.py', src); + expect(nodes.map((n) => n.name)).toEqual(['ANY /todos/', 'ANY /api/alerts/']); + expect(references.map((r) => r.referenceName)).toEqual(['TodoResource', 'AlertResource']); + }); }); describe('fastapiResolver.extract', () => { @@ -147,6 +193,32 @@ def create_item(item: Item): expect(nodes[0].name).toBe('POST /items'); expect(references[0].referenceName).toBe('create_item'); }); + + it('extracts a route mounted at the router/prefix root (empty path)', () => { + const src = ` +@router.get("", response_model=ListOfArticles, name="articles:list") +async def list_articles(): + return [] +`; + const { nodes, references } = fastapiResolver.extract!('articles.py', src); + expect(nodes[0].name).toBe('GET /'); + expect(references[0].referenceName).toBe('list_articles'); + }); + + it('extracts a multi-line decorator with an empty path', () => { + const src = ` +@router.post( + "", + status_code=201, + response_model=ArticleInResponse, +) +async def create_article(): + pass +`; + const { nodes, references } = fastapiResolver.extract!('articles.py', src); + expect(nodes[0].name).toBe('POST /'); + expect(references[0].referenceName).toBe('create_article'); + }); }); import { expressResolver } from '../src/resolution/frameworks/express'; @@ -463,13 +535,13 @@ describe('laravelResolver.extract', () => { const src = `Route::get('/users', [UserController::class, 
'index']);\n`; const { nodes, references } = laravelResolver.extract!('routes/web.php', src); expect(nodes[0].name).toBe('GET /users'); - expect(references[0].referenceName).toBe('index'); + expect(references[0].referenceName).toBe('UserController@index'); }); it('extracts route with Controller@action syntax', () => { const src = `Route::post('/users', 'UserController@store');\n`; const { nodes, references } = laravelResolver.extract!('routes/web.php', src); - expect(references[0].referenceName).toBe('store'); + expect(references[0].referenceName).toBe('UserController@store'); }); it('extracts resource route', () => { @@ -487,13 +559,13 @@ describe('railsResolver.extract', () => { const src = `get '/users', to: 'users#index'\n`; const { nodes, references } = railsResolver.extract!('config/routes.rb', src); expect(nodes[0].name).toBe('GET /users'); - expect(references[0].referenceName).toBe('index'); + expect(references[0].referenceName).toBe('users#index'); }); it('extracts route without to: keyword', () => { const src = `post '/items' => 'items#create'\n`; const { nodes, references } = railsResolver.extract!('config/routes.rb', src); - expect(references[0].referenceName).toBe('create'); + expect(references[0].referenceName).toBe('items#create'); }); }); @@ -511,6 +583,75 @@ public List listUsers() { expect(nodes[0].name).toBe('GET /users'); expect(references[0].referenceName).toBe('listUsers'); }); + + it('extracts a Kotlin @GetMapping with a fun handler', () => { + const src = ` +@GetMapping("/vets") +fun showVetList(model: MutableMap): String { + return "vets" +} +`; + const { nodes, references } = springResolver.extract!('VetController.kt', src); + expect(nodes[0].name).toBe('GET /vets'); + expect(references[0].referenceName).toBe('showVetList'); + expect(nodes[0].language).toBe('kotlin'); + }); + + it('joins a Kotlin class @RequestMapping prefix and skips a stacked annotation', () => { + const src = ` +@RestController +@RequestMapping("/owners") +class 
OwnerController { + @GetMapping("/{ownerId}") + @ResponseBody + fun showOwner(@PathVariable ownerId: Int): String { + return "owner" + } +} +`; + const { nodes, references } = springResolver.extract!('OwnerController.kt', src); + expect(nodes[0].name).toBe('GET /owners/{ownerId}'); + expect(references[0].referenceName).toBe('showOwner'); + }); +}); + +import { playResolver } from '../src/resolution/frameworks/play'; +import { isSourceFile, isPlayRoutesFile } from '../src/extraction/grammars'; + +describe('playResolver.extract (conf/routes)', () => { + it('extracts METHOD /path Controller.action routes, dropping the package + args', () => { + const src = `# Routes +GET / controllers.Application.index +GET /computers controllers.Application.list(p: Int ?= 0, s: Int ?= 2) +POST /computers controllers.Application.save +-> /v1/posts v1.post.PostRouter +`; + const { nodes, references } = playResolver.extract!('conf/routes', src); + expect(nodes.map((n) => n.name)).toEqual([ + 'GET /', + 'GET /computers', + 'POST /computers', + ]); // the `->` include is skipped + expect(references.map((r) => r.referenceName)).toEqual([ + 'Application.index', + 'Application.list', + 'Application.save', + ]); + }); + + it('only runs on Play routes files', () => { + expect(playResolver.extract!('app/Foo.scala', 'GET / controllers.X.y').nodes).toHaveLength(0); + }); +}); + +describe('Play routes file detection', () => { + it('recognizes conf/routes (extensionless) and *.routes as source files', () => { + expect(isPlayRoutesFile('conf/routes')).toBe(true); + expect(isPlayRoutesFile('myapp/conf/routes')).toBe(true); + expect(isPlayRoutesFile('conf/admin.routes')).toBe(true); + expect(isSourceFile('conf/routes')).toBe(true); + expect(isPlayRoutesFile('src/routes.ts')).toBe(false); + }); }); import { goResolver } from '../src/resolution/frameworks/go'; @@ -528,6 +669,14 @@ describe('goResolver.extract', () => { const { nodes, references } = goResolver.extract!('main.go', src); 
expect(references[0].referenceName).toBe('createItem'); }); + + it('extracts gorilla/mux HandleFunc on a subrouter var, ignoring chained .Methods()', () => { + // `s` is a PathPrefix().Subrouter() var — any receiver is matched; the + // trailing .Methods("GET") doesn't break the handler capture. + const src = `s.HandleFunc("/users/{id}", listUsers).Methods("GET")\n`; + const { references } = goResolver.extract!('routes.go', src); + expect(references[0].referenceName).toBe('listUsers'); + }); }); import { rustResolver } from '../src/resolution/frameworks/rust'; @@ -539,6 +688,50 @@ describe('rustResolver.extract', () => { expect(nodes[0].name).toBe('GET /users'); expect(references[0].referenceName).toBe('list_users'); }); + + it('extracts every method from a chained axum .route (get().put())', () => { + const src = `let app = Router::new().route("/user", get(get_current_user).put(update_user));\n`; + const { nodes, references } = rustResolver.extract!('main.rs', src); + expect(nodes.map((n) => n.name)).toEqual(['GET /user', 'PUT /user']); + expect(references.map((r) => r.referenceName)).toEqual([ + 'get_current_user', + 'update_user', + ]); + }); + + it('extracts a multi-line axum .route with a namespaced handler', () => { + const src = ` +let app = Router::new() + .route( + "/articles/feed", + get(listing::feed_articles), + ); +`; + const { nodes, references } = rustResolver.extract!('main.rs', src); + expect(nodes[0].name).toBe('GET /articles/feed'); + expect(references[0].referenceName).toBe('feed_articles'); + }); + + it('extracts actix web::resource().route(web::METHOD().to(handler))', () => { + const src = `App::new().service(web::resource("/user/{id}").route(web::get().to(get_user)))\n`; + const { nodes, references } = rustResolver.extract!('main.rs', src); + expect(nodes[0].name).toBe('GET /user/{id}'); + expect(references[0].referenceName).toBe('get_user'); + }); + + it('extracts actix web::resource("/").to(handler) (all methods)', () => { + const src = 
`App::new().service(web::resource("/").to(index))\n`; + const { nodes, references } = rustResolver.extract!('main.rs', src); + expect(nodes[0].name).toBe('ANY /'); + expect(references[0].referenceName).toBe('index'); + }); + + it('extracts actix App-level .route("/path", web::METHOD().to(handler))', () => { + const src = `App::new().route("/health", web::get().to(health_check))\n`; + const { nodes, references } = rustResolver.extract!('main.rs', src); + expect(nodes[0].name).toBe('GET /health'); + expect(references[0].referenceName).toBe('health_check'); + }); }); describe('rustResolver.resolve cargo workspace crates', () => { @@ -871,22 +1064,94 @@ describe('vaporResolver.extract', () => { it('extracts route from app.get with use:', () => { const src = `app.get("users", use: listUsers)\n`; const { nodes, references } = vaporResolver.extract!('routes.swift', src); - expect(nodes[0].name).toBe('GET users'); + expect(nodes[0].name).toBe('GET /users'); expect(references[0].referenceName).toBe('listUsers'); }); + + it('extracts grouped RouteCollection routes with the group prefix and no path arg', () => { + const src = ` +func boot(routes: RoutesBuilder) throws { + let todos = routes.grouped("todos") + todos.get(use: index) + todos.post(use: create) + todos.group(":todoID") { todo in + todo.delete(use: delete) + } +} +`; + const { nodes, references } = vaporResolver.extract!('TodoController.swift', src); + expect(nodes.map((n) => n.name).sort()).toEqual([ + 'DELETE /todos/:todoID', + 'GET /todos', + 'POST /todos', + ]); + expect(references.map((r) => r.referenceName).sort()).toEqual([ + 'create', + 'delete', + 'index', + ]); + }); + + it('handles use: self.handler and non-string path segments', () => { + const src = `router.get("users", User.parameter, "edit", use: self.editUserHandler)\n`; + const { nodes, references } = vaporResolver.extract!('UserController.swift', src); + expect(nodes[0].name).toBe('GET /users/edit'); + 
expect(references[0].referenceName).toBe('editUserHandler'); + }); + + it('ignores non-route .get calls that lack use: (e.g. Environment.get)', () => { + const src = `let host = Environment.get("DATABASE_HOST") ?? "localhost"\n`; + const { nodes } = vaporResolver.extract!('configure.swift', src); + expect(nodes).toHaveLength(0); + }); }); import { reactResolver } from '../src/resolution/frameworks/react'; import { svelteResolver } from '../src/resolution/frameworks/svelte'; -describe('reactResolver.extract (smoke)', () => { - it('returns { nodes, references } shape', () => { +describe('reactResolver.extract — React Router', () => { + it('extracts a v6 }>', () => { const src = `}/>`; - const result = reactResolver.extract!('App.tsx', src); - expect(result).toHaveProperty('nodes'); - expect(result).toHaveProperty('references'); - expect(Array.isArray(result.nodes)).toBe(true); - expect(Array.isArray(result.references)).toBe(true); + const { nodes, references } = reactResolver.extract!('App.tsx', src); + const route = nodes.find((n) => n.kind === 'route'); + expect(route?.name).toBe('/users'); + expect(references[0]?.referenceName).toBe('UsersPage'); + }); + + it('extracts a v5 with attributes in any order', () => { + const src = ``; + const { nodes, references } = reactResolver.extract!('App.jsx', src); + const route = nodes.find((n) => n.kind === 'route'); + expect(route?.name).toBe('/login'); + expect(references[0]?.referenceName).toBe('Login'); + }); + + it('does not treat the container as a route', () => { + const src = `}/>`; + const routes = reactResolver.extract!('App.tsx', src).nodes.filter((n) => n.kind === 'route'); + expect(routes).toHaveLength(1); + expect(routes[0]?.name).toBe('/x'); + }); + + it('extracts createBrowserRouter object routes ({ path, element/Component })', () => { + const src = `const router = createBrowserRouter([ + { path: "/dashboard", element: }, + { path: "/login", Component: Login }, + ]);`; + const { nodes, references } = 
reactResolver.extract!('router.tsx', src); + const routes = nodes.filter((n) => n.kind === 'route'); + expect(routes.map((n) => n.name).sort()).toEqual(['/dashboard', '/login']); + expect(references.map((r) => r.referenceName).sort()).toEqual(['Dashboard', 'Login']); + }); + + it('does not treat config files or a nextjs-pages dir as Next.js routes', () => { + const cfg = reactResolver.extract!('apps/nextjs-pages/next.config.mjs', 'export default {}'); + expect(cfg.nodes.filter((n) => n.kind === 'route')).toHaveLength(0); + const vite = reactResolver.extract!('src/pages/vite.config.ts', 'export default {}'); + expect(vite.nodes.filter((n) => n.kind === 'route')).toHaveLength(0); + // a real page still works + const page = reactResolver.extract!('src/pages/about.tsx', 'export default function About(){return null}'); + expect(page.nodes.filter((n) => n.kind === 'route').map((n) => n.name)).toEqual(['/about']); }); }); @@ -969,7 +1234,7 @@ Route::get('/real', [RealController::class, 'index']); `; const { nodes, references } = laravelResolver.extract!('routes/web.php', src); expect(nodes.map((n) => n.name)).toEqual(['GET /real']); - expect(references.map((r) => r.referenceName)).toEqual(['index']); + expect(references.map((r) => r.referenceName)).toEqual(['RealController@index']); }); it('rails: skips =begin/=end and # commented routes', () => { @@ -982,7 +1247,7 @@ get '/real', to: 'real#index' `; const { nodes, references } = railsResolver.extract!('config/routes.rb', src); expect(nodes.map((n) => n.name)).toEqual(['GET /real']); - expect(references.map((r) => r.referenceName)).toEqual(['index']); + expect(references.map((r) => r.referenceName)).toEqual(['real#index']); }); it('spring: skips // and /* */ commented @GetMapping', () => { @@ -1046,7 +1311,7 @@ public IActionResult ListUsers() { return Ok(); } app.get("real", use: listUsers) `; const { nodes, references } = vaporResolver.extract!('routes.swift', src); - expect(nodes.map((n) => n.name)).toEqual(['GET 
real']); + expect(nodes.map((n) => n.name)).toEqual(['GET /real']); expect(references.map((r) => r.referenceName)).toEqual(['listUsers']); }); diff --git a/__tests__/mcp-tool-allowlist.test.ts b/__tests__/mcp-tool-allowlist.test.ts new file mode 100644 index 00000000..6f29616d --- /dev/null +++ b/__tests__/mcp-tool-allowlist.test.ts @@ -0,0 +1,58 @@ +/** + * CODEGRAPH_MCP_TOOLS allowlist — lets an operator (or an A/B harness) trim the + * exposed MCP tool surface without touching the client config. Inert when unset. + * Filtering happens in ListTools (getTools) and is enforced again on execute(). + */ +import { describe, it, expect, afterEach } from 'vitest'; +import { ToolHandler } from '../src/mcp/tools'; + +const ENV = 'CODEGRAPH_MCP_TOOLS'; + +describe('CODEGRAPH_MCP_TOOLS allowlist', () => { + const original = process.env[ENV]; + afterEach(() => { + if (original === undefined) delete process.env[ENV]; + else process.env[ENV] = original; + }); + + const listed = () => new ToolHandler(null).getTools().map(t => t.name).sort(); + + it('exposes the full tool surface when unset', () => { + delete process.env[ENV]; + const all = listed(); + expect(all).toContain('codegraph_explore'); + expect(all).toContain('codegraph_context'); + expect(all).toContain('codegraph_trace'); + expect(all.length).toBeGreaterThanOrEqual(10); + }); + + it('filters ListTools to the allowlisted short names', () => { + process.env[ENV] = 'trace,search,node'; + expect(listed()).toEqual(['codegraph_node', 'codegraph_search', 'codegraph_trace']); + }); + + it('accepts fully-qualified codegraph_ names and ignores whitespace', () => { + process.env[ENV] = ' codegraph_trace , search '; + expect(listed()).toEqual(['codegraph_search', 'codegraph_trace']); + }); + + it('treats an empty/whitespace value as unset (full surface)', () => { + process.env[ENV] = ' '; + expect(listed().length).toBeGreaterThanOrEqual(10); + }); + + it('rejects a disabled tool on execute (defense in depth)', async () => { + 
process.env[ENV] = 'trace'; + const res = await new ToolHandler(null).execute('codegraph_explore', {}); + expect(res.isError).toBe(true); + expect(res.content[0].text).toMatch(/disabled via CODEGRAPH_MCP_TOOLS/); + }); + + it('lets an allowlisted tool past the guard', async () => { + process.env[ENV] = 'search'; + // No CodeGraph attached, so it fails *after* the allowlist guard — the + // "disabled" message must NOT appear, proving the guard passed it through. + const res = await new ToolHandler(null).execute('codegraph_search', { query: 'x' }); + expect(res.content[0].text).not.toMatch(/disabled via CODEGRAPH_MCP_TOOLS/); + }); +}); diff --git a/docs/benchmarks/answer-directly-vs-explore-agent.md b/docs/benchmarks/answer-directly-vs-explore-agent.md new file mode 100644 index 00000000..09167ec1 --- /dev/null +++ b/docs/benchmarks/answer-directly-vs-explore-agent.md @@ -0,0 +1,88 @@ +# Answer directly vs. delegate to an Explore agent (interactive A/B) + +**Question:** Does answering a "how does X work?" question *directly* with CodeGraph in the +main session bloat main-session context — and would Claude Code be better off delegating that +exploration to a disposable **Explore agent** (which keeps main context lean by absorbing the +file reads in a sub-transcript)? And critically: **does the answer change at scale**, on a +codebase far larger than Excalidraw? + +**Short answer:** No. With CodeGraph, main-session context is roughly **scale-invariant (~50k)** +because the retrieval is targeted and the `explore` payload is budget-capped — it does not +balloon on a 16× larger repo. Answering directly wins at **every** scale: same-or-leaner main +context than the delegation path, **zero file reads**, and ~28% fewer tokens. The +delegation-for-hygiene advantage stays marginal even on a large codebase. + +## Methodology + +- **Harness:** interactive Claude Code TUI driven via `scripts/agent-eval/itrun.sh` (tmux), + **not** headless `claude -p`. 
This matters: headless spawns **0** Explore agents, so it cannot + measure delegation behavior at all; only the interactive TUI does. +- **Arms:** `WITH` = CodeGraph in the MCP config; `WITHOUT` = empty MCP config (`--strict-mcp-config`). +- **Model:** `opus`. **n = 3 runs per arm.** Main **and** sub-agent transcripts parsed + (`scripts/agent-eval/parse-session.mjs`); reads/bash are summed across main + sub-agents. +- **Repos:** Excalidraw (643 files, medium) and VS Code (~10.7k files, large — ~16× Excalidraw). +- **Build:** 0.9.4. **Date:** 2026-05-24. +- "main-session context" is the TUI's reported `Context X/Y` for the *main* thread (sub-agent + context does not count against it). "billable tokens" = summed per-turn assistant usage + (input + output + cache read + cache creation). + +## Excalidraw (643 files, medium) + +Question: *"How does Excalidraw render and update canvas elements?"* + +| metric | WITH codegraph | WITHOUT | +|---|---|---| +| Explore agents spawned | 0 / 0 / 0 | 0 / 1 / 1 (delegated 2 of 3) | +| main-session context | 51k / 49k / 50k (~50k) | 48k / 34k / 26k (~36k) | +| total tool calls | 4 / 4 / 4 | 16 / 55 / 37 | +| Reads (main+sub) | 0 / 0 / 0 | 6 / 25 / 16 | +| billable tokens | ~127k | ~175k | + +## VS Code (~10.7k files, large — ~16× Excalidraw) + +Question: *"How does the extension host communicate with the main process?"* + +| metric | WITH codegraph | WITHOUT | +|---|---|---| +| main-session context | 47k / 43k / 50k (~47k) | 54k / 29k / 31k (~38k) | +| Explore agents | 0 / 0 / 0 | 0 / 1 / 1 (delegated 2/3) | +| codegraph calls | ~8 (search + explore×2–3 + context) | 0 | +| Reads (main+sub) | 0 / 1 / 0 | 6 / 26 / 19 | +| billable tokens | ~126k | ~176k | + +## Findings + +**Main-session context is scale-invariant with CodeGraph.** With codegraph, main-session +context was **~47k on VS Code — essentially identical to Excalidraw's ~50k**, despite a 16× +bigger repo. It didn't balloon. 
Reason: codegraph's `explore` payload is **budget-capped** and +retrieval is **targeted** — answering one question pulls in the relevant *flow/area*, not more +just because the repo is huge. So codegraph makes main-session context roughly scale-invariant +(~50k). The delegation-for-hygiene advantage stays marginal even on a large codebase — exactly +the opposite of "it gets significant at scale." + +The thing that *would* balloon at scale is reading many big files directly into main — and +Claude Code avoids that **without** codegraph by delegating to an Explore agent (29–31k main), +but at the cost of **17–26 reads** and ~38% more tokens. CodeGraph keeps main lean a *better* +way: a capped, targeted payload — no delegation, **0 reads**. + +**On "the Explore agents use codegraph."** I couldn't reproduce it: across **6/6** +with-codegraph runs (both repos), Claude Code **never delegated** — it answered directly every +time. The Explore-agent path only appeared in the `without` arm (using grep/read, since codegraph +wasn't in that config). So with the current instructions + codegraph present, Claude Code stays +in the main session — the lean-main-via-Explore-agent best case simply isn't what happens; +lean-main-via-capped-codegraph is, and it's cheaper. + +## Verdict + +**"Answer directly with codegraph" wins for Claude Code too — at every scale.** No per-agent +split is needed; the unified "answer directly" instruction is right for Claude Code *and* for +Codex / Cursor / opencode (which have no Explore-agent mechanism and would otherwise read files +directly). This conclusion drove updating the README's `## CodeGraph` example block, which +previously told agents to "NEVER call `codegraph_explore` directly / ALWAYS spawn an Explore +agent" — i.e., it steered Claude Code toward the *worse* (17–26 read, ~38%-more-token) path. + +**Caveat / future work (not a blocker):** an Explore agent that *itself uses codegraph* could in +principle get lean-main *and* low-work.
But the "answer directly" instruction prevents delegation +in practice (0 delegations observed across 6 runs), the main-context gain would be marginal +(~50k → ~30k, both a few percent of a 1M window), and it adds a sub-agent round-trip. Worth a +future experiment, not a default. diff --git a/docs/benchmarks/call-sequence-analysis.md b/docs/benchmarks/call-sequence-analysis.md new file mode 100644 index 00000000..3c79bad5 --- /dev/null +++ b/docs/benchmarks/call-sequence-analysis.md @@ -0,0 +1,426 @@ +# Call-sequence analysis — why read savings don't convert to wall-clock + +**Date:** 2026-05-23 · **Branch:** `architectural-improvements` · **Source data:** the surviving +stream-json logs from the A/B matrix (`/tmp/ab-matrix//run-headless-{with,without}.jsonl`, +37 cells × 2 arms). Re-mined — **no re-runs** — with `scripts/agent-eval/seq-matrix.mjs`. + +## Why this exists + +The [A/B matrix](codegraph-ab-matrix.md) showed codegraph cuts **reads 75%** but **wall-clock only +~16%**, and 63% of the wall-clock win comes from just 3 large-repo cells. Reads are at the floor +(~0), so the remaining wall-clock is **round-trips + the synthesis turn** — neither of which read +count can explain. The matrix records tool *counts*, not the call **sequence** or per-call +**payload size**. This analysis recovers both, to find where the wall-clock actually goes. + +## TL;DR — the bottleneck is trace ADOPTION, not trace completeness + +1. **Trace is called in 3 of 37 cells** — even though every question is a canonical flow question + ("trace the controller → service → repository", "how does X reach Y"). The agent overwhelmingly + reaches for **`context → search → search → explore`** instead — the exact path-reconstruction + anti-pattern the instructions tell it to avoid. +2. **`explore` averages 17.9K chars/call; `trace` averages 0.8K** — a **22× payload difference**. + The path-scoped tool that solves the small-repo-bloat problem exists and is tiny. It's just not + being invoked. 
+3. **Small repos still get bloated payloads** because of the explore-default: a **6-file** repo + (`flutter_module_books`) pulls **17.4K**; a 10-file repo pulls 18.0K. This is precisely the + "too much context on small codebases" failure mode — happening right now, via explore. +4. **Round-trips are 25% fewer with codegraph (283 vs 375 turns)** but wall-clock is only 16% + faster — because the with-arm's turns each carry a ~18K explore payload, inflating TTFT and + eroding the turn savings. +5. **Root cause:** `src/mcp/server-instructions.ts` leads with *"answer directly … `codegraph_context` + first, then ONE `codegraph_explore`"* as the headline pattern. The trace-first guidance is buried + in a table + a chain list below it. Agents anchor on the prominent headline → context→explore. + +**Decision:** the next experiment is **trace-first steering / adoption**, not enriching trace. We +can't evaluate trace's completeness when it's used 3/37 times. Get adoption up first, then measure +whether the residual `node`/`explore` follow-ups need a richer trace. + +## Finding 1 — trace adoption: 3/37 + +| metric | value | +|---|---| +| flow-question cells | 37 (all of them) | +| cells that called `codegraph_trace` | **3** (`cpp-leveldb`, `excalidraw`, `c-redis`) | +| dominant pattern instead | `context` → `search`×N → `explore` | + +The 3 trace cells, and what followed the trace call: + +| repo | files | cg sequence | turns (with/without) | +|---|--:|---|---| +| cpp-leveldb | 134 | `trace, node, node` | 5 / 8 | +| excalidraw | 643 | `context, trace, trace, explore` | 6 / **19** | +| c-redis | 884 | `context, trace, explore, node` | 10 / 15 | + +Even when trace *is* used, the agent follows it with `node`/`explore` to fetch bodies — so a +secondary lever (after adoption) is making one trace call self-sufficient enough to kill those +follow-ups. But that's step 2. 
+ +## Finding 2 — payload size: path-scoped trace (0.8K) vs breadth-scoped explore (17.9K) + +Across all cells, per codegraph tool — call count and **average payload per call**: + +| tool | calls | avg/call | total | +|---|--:|--:|--:| +| `explore` | 32 | **17.9K** | 573K | +| `context` | 36 | 4.3K | 156K | +| `search` | 39 | 1.3K | 50K | +| `files` | 5 | 3.4K | 17K | +| `node` | 19 | 2.0K | 38K | +| `trace` | 4 | **0.8K** | 3.4K | + +`context` (used in 36/37 cells) is the default opener; `explore` is the default closer. Together +they are the ~22K breadth dump. `trace` — the tool that would replace that with the actual path — +is 22× smaller and barely used. This is the user's premise confirmed in numbers: explore is +breadth-scoped (returns the neighborhood), trace is path-scoped (returns the line). + +## Finding 3 — payload grows with repo size, and over-returns on small repos + +With-arm **total** codegraph payload by repo-size tier: + +| tier | cells | avg total payload | range | +|---|--:|--:|--:| +| S (<200 files) | 19 | 12.7K | 3.0–31.2K | +| M (<2000) | 9 | 32.4K | 5.4–58.2K | +| L (≥2000) | 9 | 34.0K | 20.2–43.1K | + +The small-repo waste is concrete — these all have a 2–3 file flow but pull a full neighborhood: + +| repo | files | with-arm payload | sequence | +|---|--:|--:|---| +| flutter_module_books | 6 | 17.4K | `context, explore` | +| computer-database | 10 | 18.0K | `context, search, status, explore` | +| aspnet-realworld | 78 | 22.2K | `context, explore` | +| django-realworld | 44 | 14.8K | `context, explore` | + +`explore`'s per-call budget is already adaptive (#185), but it doesn't help here because the agent +isn't choosing the path-scoped tool — it's choosing breadth. 
+ +## Finding 4 — round-trips, and the ToolSearch tax + +| metric | with | without | +|---|--:|--:| +| total turns (37 cells) | 283 | 375 | +| avg turns / cell | 7.6 | 10.1 | + +25% fewer turns, but only ~16% faster wall-clock — the gap is the per-turn cost of the big explore +payloads. Also: **every with-arm run opens with a `ToolSearch` round-trip** (MCP tools are deferred +in this harness), a fixed 1-turn tax before any codegraph call. Worth confirming whether the +production install defers codegraph tools the same way. + +## Conclusion → the experiment to run next + +Measure-first changed the plan. The hypothesis was "enrich trace so one call is self-sufficient." +The data says trace is **used 3/37 times**, so completeness is moot until adoption is fixed. + +**Experiment: trace-first steering A/B.** +- **Change:** rewrite the `server-instructions.ts` headline so a *flow* question (how does X reach Y + / trace / from→to) routes to `codegraph_trace` **first**, demoting the context→explore pattern to + non-flow/onboarding questions. Mirror into `instructions-template.ts` + `.cursor/rules/codegraph.mdc`. +- **Metric:** trace-adoption rate (target ≫ 3/37), with-arm total payload (expect ↓ sharply, + especially small repos), turns (expect ↓), wall-clock (expect the 16% gap to widen toward the + 25% turn gap as 18K explore payloads are replaced by <1K traces). +- **Control:** a non-flow "what's the deal with module X" question must still go context→explore — + don't over-steer everything to trace. +- **Then, step 2:** with adoption up, measure the `node`/`explore` follow-ups after trace + (cpp-leveldb/excalidraw/c-redis all had them). If they're frequent, enrich trace (per-hop body + snippet, capped per hop) so one trace call ends the flow investigation. + +## Reproduce + +```bash +node scripts/agent-eval/seq-matrix.mjs # regenerates every table above from /tmp/ab-matrix +``` + +--- + +# Ablation experiment — do `context`, `explore`, and `trace` compete? 
Is `trace` enough? + +**Date:** 2026-05-23 · 52 runs, ~$20. Tool surface trimmed **server-side** via the new +`CODEGRAPH_MCP_TOOLS` allowlist (so an ablated tool is genuinely absent from ListTools, not +denied-on-call); trace-first steering injected with `--append-system-prompt`. 6 repos (2 S / 2 M / +2 L) × 2 runs; arm E is a **non-flow** survey question on 2 repos. Driver `arms-matrix.sh`, +analysis `parse-arms.mjs`. + +| arm | tools | steering | adoption | reads | cgOut | turns | dur | +|---|---|---|--:|--:|--:|--:|--:| +| **A** control | all | none | 2/12 | 1.25 | 28.8K | 7.6 | 38s | +| **B** steer | all | trace-first | **8/12** | 1.00 | **32.0K** | 7.9 | 43s | +| **C** no-explore | hide explore | trace-first | 8/12 | **2.08** | **9.2K** | 9.0 | 44s | +| **D** trace-centric | hide explore+context | trace-first | 8/12 | 2.00 | 6.6K | 10.5 | 46s | +| **E** control-probe | hide explore+context | trace-first | 0/4 | 2.50 | 27.8K | **20.0** | **72s** | + +## What it says + +1. **Steering works for adoption, not for payload.** B lifted trace use **2/12 → 8/12** (and 4/4 on + the genuinely path-shaped questions — the 2 non-adopters, flutter "what widgets" and vapor "name + the route", aren't from→to questions). But B's payload (32.0K) is *bigger* than control (28.8K) + and it's slightly slower — because the agent calls trace **and still calls explore**. Steering + adds a trace hop without displacing the explore dump. +2. **`explore` is the payload, and it's load-bearing — but 3–5× too heavy.** Removing it (C) cuts + payload **71%** (32K→9.2K) — confirming it's the bloat. But reads **double** (1.0→2.1) and turns + rise: the agent Reads files to recover the bodies explore had inlined. So explore isn't + redundant; it's the only one-call body-supplier, just delivered with a 32K sledgehammer. +3. **`context` is the most redundant of the three — as a body-supplier.** Removing it on top of + explore (D vs C) left reads flat (2.08→2.00) but raised turns (9.0→10.5). 
It supplies no unique + bodies; it earns its keep only as a round-trip-saver (the composed orient call). +4. **Removing tools makes flow questions SLOWER, not faster.** Turns climb monotonically + A→D (7.6→10.5) and duration with them — the Read + trace-follow-up round-trips cost more + wall-clock than the saved payload. Leaner payload ≠ faster. +5. **`trace` is definitively NOT sufficient.** The non-flow probe (E) thrashed without the survey + tools — **20 turns, 72s** reconstructing an overview from search/node/files. Survey questions + need a survey tool; trace can't substitute. + +## Verdict on the three design questions + +- **Do we need all three?** Yes — but for different reasons. trace = flow tool (real, under-adopted). + explore = the one-call body-supplier (load-bearing, over-heavy). context = round-trip-saving + opener (redundant for bodies, useful for orientation). +- **Are they competing?** Yes: explore competes with trace and *wins by default* — even when steered, + the agent traces **and** explores, so the payload win never lands until explore is displaced. +- **Could trace be all we need?** No. E rules it out for non-flow questions; C/D rule it out even + for flow (reads double without explore's bodies). + +**Three cheap fixes are now ruled out by data:** "trace is all we need" (false), "just steer to +trace" (B: slower + bigger than control), and "remove explore" (C/D: more reads/turns, slower). + +## The fix the data points to → next experiment + +The only path that wins: **make `trace` self-sufficient by inlining per-hop bodies** (capped per +hop → still path-scoped) so one trace call supplies what explore does *and* what the Read fallback +recovers — displacing both for flow questions. Keep **one** survey tool (context; demote explore to +deep-survey, not the flow default) for the non-flow class E proved is load-bearing. + +- **Experiment:** enriched body-inlining `trace` + steering vs control. 
+- **Target:** C/D's lean payload (~7–9K, not 32K) **without** C/D's extra reads/turns, and **beat A + on wall-clock** (the bar B/C/D all failed). +- **Metric:** payload, reads (must stay ≈ A's ~1.0, not rise to 2.0), turns, duration. + +## Reproduce (ablation) + +```bash +bash scripts/agent-eval/arms-matrix.sh # 52 runs into /tmp/arms (RUNS=2 default) +node scripts/agent-eval/parse-arms.mjs # the arm-comparison tables above +``` + +--- + +# Validation — body-inlining trace (arm F) + +The ablation pointed to one fix: make `trace` self-sufficient by inlining per-hop **bodies** +(capped per hop → still path-scoped) so one trace call displaces both the explore dump and the +Read fallback. Implemented in `handleTrace` (`sourceRangeAt`, 28 lines / 1200 chars per hop, with a +`… (+N more lines)` marker). Arm **F** = arm B's surface (all tools + trace-first steering) run on +the body-inlining build, so **F vs B isolates the enrichment**. + +| arm | adoption | reads | cgOut | turns | dur | cost | +|---|--:|--:|--:|--:|--:|--:| +| A all/none | 2/12 | 1.25 | 28.8K | 7.6 | 38s | $0.390 | +| B all/steer (thin trace) | 8/12 | 1.00 | 32.0K | 7.9 | 43s | $0.411 | +| **F all/steer (body trace)** | 5/12 | **1.17** | **25.1K** | **6.8** | **37s** | **$0.348** | +| C no-explore | 8/12 | 2.08 | 9.2K | 9.0 | 44s | $0.356 | +| D trace-centric | 8/12 | 2.00 | 6.6K | 10.5 | 46s | $0.368 | + +**F is the best-balanced arm:** lowest turns (6.8), fastest (37s), cheapest, payload leaner than +A/B — and it hits the target the ablation set: **C/D-class efficiency without C/D's Read penalty** +(F reads 1.17 vs C/D's ~2.0). It gets there not by *removing* a tool but by giving the agent a +complete trace so it *stops early*. 
+ +**The win is clearest where trace connects** — excalidraw (the validated 6-hop path): + +| arm | sequence | turns | reads | dur | +|---|---|--:|--:|--:| +| B (thin) | `trace → context → explore → Grep → Read` | 7 | 1 | 47s | +| **F (body) r1** | `trace → context` | **4** | **0** | **31s** | +| F (body) r2 | `trace → trace → explore` | 5 | 0 | 42s | + +The body-trace ended the investigation in `trace → context` (run 1) — 0 reads, 0 grep, 0 explore. + +**Connectivity is the cap.** On flows that break at *unbridged* dynamic dispatch — aspnet-realworld +(MediatR `_mediator.Send → Handle`), vapor-spi (closure routing) — trace returns "no path" and the +agent falls back to explore, so F ≈ B (no regression, no gain). F's aggregate lift is therefore +**gated by dynamic-dispatch coverage**: the more flows the graph connects end-to-end, the more often +the self-sufficient trace fires. (n=2/arm — adoption and per-repo numbers are noisy; excalidraw and +spring-halo, the connecting repos, are 2/2 trace in both B and F.) + +## Verdict & ship list + +1. **Ship the body-inlining trace** — strict improvement (best-balanced arm; clean 0-read/4-turn win + on connecting traces; no regression on non-connecting ones). +2. **Strengthen the steering.** Arm A (shipped server-instructions, which *already* say "trace first + for flow") adopted trace only 2/12 — the guidance is too buried. The explicit + `--append-system-prompt` used in B–F lifted it. Port that into `server-instructions.ts` + + `instructions-template.ts` + `.cursor/rules/codegraph.mdc` (house rule: all three together), + flow-gated so non-flow survey questions still go context/explore (arm E proved they must). +3. **Next frontier to widen F's reach:** bridge more dynamic dispatch (MediatR/.NET, Vapor routing) — + every newly-connected flow converts an F≈B repo into an F-win repo. 
+ +## Reproduce (arm F) + +```bash +bash scripts/agent-eval/arms-F.sh # 12 runs (RUNS=2); needs the body-inlining build +node scripts/agent-eval/parse-arms.mjs # F appears alongside A/B/C/D/E +``` + +--- + +# Steering port — the negative result (arm G) + +F's win used `--append-system-prompt`, which real users don't get. Arm **G** = arm A's invocation +(NO append-prompt) on a build where the steering was ported into the production channels +(`server-instructions.ts` + the `context`/`trace` tool descriptions + `instructions-template.ts` + +`.cursor/rules`). Three wording iterations, 12 runs each: + +| arm | adoption | reads | payload | turns | dur | +|---|--:|--:|--:|--:|--:| +| A (shipped instructions) | 2/12 | 1.25 | 28.8K | 7.6 | **38s** | +| F (body-trace + append-prompt) | 5/12 | **1.17** | 25.1K | 6.8 | **37s** | +| G v1 — anti-explore wording | 6/12 | 2.08 | 13.8K | 8.8 | 46s | +| G v2 — restore explore as fallback | 6/12 | 1.67 | 22.0K | 7.8 | 46s | +| G v3 — restore context as opener | 6/12 | 2.08 | 11.7K | 8.9 | 46s | + +**Production-instruction steering does not reproduce F, and regresses the A baseline.** All three G +variants pin at **~46s** (slower than A's 38s and F's 37s) with reads at 1.7–2.1 (vs A 1.25, F 1.17). +Wording only shuffled the slack between Read and explore — v1 suppressed explore → Read; v2/v3 +restored explore → over-investigation — never landing F's lean `trace → context`. + +**Two root causes:** +1. **Salience.** The same trace-first wording works as a top-of-prompt `--append-system-prompt` (F) + but not as an MCP `initialize` instruction / tool description (G). An MCP server has no + higher-salience channel — this is an architectural limit, not a wording bug. +2. 
**Forcing trace-first backfires where trace doesn't connect.** Steering pushed trace onto + MediatR (`_mediator.Send`) and Spring interface-DI (`@Autowired` iface → impl) flows, where trace + returns no-path; the forced trace is then a wasted round-trip *before* the fallback → slower. + The **unsteered** agent (A) is better-calibrated: it traces only when trace will obviously + connect (2/12) and explores otherwise. + +## Arm H — body-trace alone (the ship candidate) regresses + +The clean ship test: body-inlining trace + ORIGINAL instructions + no steering (= A's invocation, +only the trace *tool* changed). H vs A isolates the body-trace feature with nothing else moving. + +| arm | adoption | reads | payload | turns | dur | +|---|--:|--:|--:|--:|--:| +| A (no body-trace) | 2/12 | 1.25 | 28.8K | 7.6 | **38s** | +| H (body-trace, no steering) | 3/12 | 1.50 | 29.7K | 8.0 | **45s** | +| F (body-trace + append-prompt) | 5/12 | 1.17 | 25.1K | 6.8 | 37s | + +**Body-trace alone does NOT beat A — it mildly regresses** (45s vs 38s). The sequences show why: +unsteered, the agent treats trace as just one more call in its usual loop — excalidraw H was +`context → trace → explore → node×3 → Grep → Read` (77s) — so the bigger body-trace payload is pure +added cost, not offset by fewer follow-ups. The body-trace only pays off when the agent **leads with +trace and stops after it**, which only the append-prompt (F) achieved. + +## Final verdict + +The body-inlining trace is a real win (F) but its value is **entirely contingent on +lead-with-and-stop-after-trace steering we cannot deliver through any production MCP channel** +(append-prompt salience ≫ server-instructions / tool-descriptions; G failed three times). On its own +(H) it regresses. So: + +- **SHIP: the `CODEGRAPH_MCP_TOOLS` allowlist** — independent, clean, validated. +- **DON'T ship the body-inlining trace or the steering as-is** — measured neutral-to-negative + without a steering channel we don't have. 
+- **The real lever is connectivity, not steering** — trace earns its keep only when flows connect + end-to-end; dynamic-dispatch synthesizers (MediatR/.NET, Spring interface-DI, Vapor closures) help + the *unsteered* agent, which already traces when trace will connect. +- **One untested lever** to rescue the body-trace: steer via the trace tool's OWN OUTPUT (the + highest-salience channel — the agent reads it fresh, right at the decision point) with a strong + leading "complete flow — answer from this, don't explore" banner. Instructions/descriptions are + too far from the action; the tool result is not. Unproven; the only remaining shot at making the + body-trace pay off in production. + +measure-first paid off three times: it killed three cheap fixes in the ablation, stopped a steering +change that would have shipped an ~8s/query regression (G), and stopped shipping the body-trace +itself on a confounded assumption (H showed it needs steering we can't deliver). + +## Reproduce (arm G) + +```bash +ARM=G bash scripts/agent-eval/arms-F.sh # production-instruction steering, no append-prompt +node scripts/agent-eval/parse-arms.mjs +``` + +--- + +# Arm I — sufficiency, not steering (the shippable win) + +An LLM stops investigating when its context is *sufficient*, not when it's told to stop. So arm I +makes the trace OUTPUT complete instead of steering — same invocation as H (original instructions, +**no steering**), only the trace tool changed: +1. **Hop bodies no longer clipped** at 28 lines (that clip is why H re-fetched `mutateElement`). +2. **The destination's own callees are inlined** — the "last mile" the agent otherwise explores/Reads + for (excalidraw: `renderStaticScene → _renderStaticScene / renderStaticSceneThrottled`). 
+ +| arm | adoption | reads | greps | payload | turns | dur | cost | +|---|--:|--:|--:|--:|--:|--:|--:| +| A baseline | 2/12 | 1.25 | 1.17 | 28.8K | 7.6 | 38s | $0.390 | +| H body-trace alone | 3/12 | 1.50 | 0.42 | 29.7K | 8.0 | 45s | $0.398 | +| **I body-trace + dest callees** | 2/12 | **1.17** | **0.25** | 27.2K | **7.0** | 39s | **$0.359** | +| F body-trace + append-steer | 5/12 | 1.17 | 0.17 | 25.1K | 6.8 | 37s | $0.348 | + +**I ≥ A on every axis** (reads, greps, turns, cost down; wall-clock flat) and **≈ F on outcomes with +zero steering** — despite *lower* trace adoption (2/12 vs F's 5/12). The destination-callees fix +turned the body-trace from a net-negative (H, 45s) into a net-positive (I, 39s): one richer trace +call now displaces the explore+node+Read follow-ups it used to trigger. excalidraw I-r2 was +`context → trace → explore` — **0 reads, 5 turns**, stopped because the data was present. The residual +reads (I-r1) are the `canvasNonce` data-flow — the def-use frontier the graph deliberately omits. + +This confirms the thesis: **completeness stops the agent; steering doesn't.** Every steering arm +(B/F append-prompt, G instructions) was either unshippable or a regression; the sufficiency arm (I) +ships and needs no steering. + +## Revised final verdict (supersedes the arm-G/H verdict above) + +- **SHIP: body-inlining trace + destination callees** (arm I) — ≥ A on all axes, no steering, no + regression; makes the self-sufficient-trace property real (one trace call answers the flow). +- **SHIP: the `CODEGRAPH_MCP_TOOLS` allowlist** — independent, validated. +- **DON'T ship steering** (instructions or tool descriptions) — three variants regressed; MCP can't + deliver append-prompt salience, and forcing trace where it doesn't connect backfires. 
+- **Connectivity is the multiplier** — arm I helps most where the trace connects; MediatR/.NET, + Spring interface-DI, and Vapor closures are the next synthesizers, and they help the *unsteered* + agent (which already traces when trace will connect). + +## Reproduce (arm I) + +```bash +ARM=I bash scripts/agent-eval/arms-F.sh # body-trace + destination callees, no steering +node scripts/agent-eval/parse-arms.mjs +``` + +--- + +# Current-build with/without A/B — the 7 README repos (2026-05-24) + +Re-ran the published README benchmark on the **current build** (all 7 repos freshly reindexed), +same queries, **median of 4 runs/arm** (headless: codegraph-only MCP vs empty MCP): + +| repo | time with→without | tools w→wo | tokens w→wo (saved) | cost w→wo (saved) | +|---|---|--:|--:|--:| +| vscode | 1m10s→2m26s | 8→55 | 601k→2.8M (78%) | $0.60→$0.80 (26%) | +| excalidraw | 48s→2m58s | 3→79 | 344k→3.5M (90%) | $0.43→$0.90 (52%) | +| django | 1m19s→1m38s | 9→19 | 739k→1.2M (36%) | $0.59→$0.67 (12%) | +| tokio | 53s→3m2s | 4→53 | 379k→2.6M (86%) | $0.42→$2.41 (82%) | +| okhttp | 42s→1m1s | 6→11 | 636k→730k (13%) | $0.47→$0.47 (2%) | +| gin | 44s→1m0s | 6→10 | 444k→675k (34%) | $0.37→$0.47 (21%) | +| alamofire | 1m17s→2m27s | 12→69 | 1.0M→2.8M (64%) | $0.61→$1.14 (47%) | + +**Average saved: 35% cost · 57% tokens · 46% time · 71% tool calls** — reproduces the published +README headline (35% / 59% / 49% / 70%); the current build holds the benchmark with no regression. + +**Cost is lower, not "flat"** (corrects the earlier note). But the **mechanism is volume, not +cache-ability**: codegraph answers in far fewer turns over a much smaller accumulated context, while +the without-arm fans out across many more turns (55–79 tool calls on the big repos), each +re-processing a large, growing context. The without-arm's token volume is *mostly* cheap cache-reads, +which is why **token-count savings (57%) look bigger than cost savings (35%)**. 
Per-repo margin tracks +how hard the without-arm thrashes that run (tokio blew up to $2.41/3m; django thrashed less). + +**Measurement gotcha:** `result.usage` in this Claude Code version is the **last turn only**, not +cumulative — using it under-counts tokens badly (an earlier excalidraw cut reported "−34% tokens" +off this bug; the real figure is ~90%). Sum **per-turn assistant `usage`** for the true total. +`total_cost_usd` and `duration_ms` are already cumulative/correct. + +Reproduce: +```bash +bash scripts/agent-eval/bench-readme.sh # 7 repos × with/without × 4 runs (RUNS=4) → /tmp/ab-readme +node scripts/agent-eval/parse-bench-readme.mjs # medians + % saved (summed per-turn tokens) +``` diff --git a/docs/benchmarks/codegraph-ab-matrix.md b/docs/benchmarks/codegraph-ab-matrix.md new file mode 100644 index 00000000..db9d0370 --- /dev/null +++ b/docs/benchmarks/codegraph-ab-matrix.md @@ -0,0 +1,125 @@ +# CodeGraph A/B benchmark — with vs without, every language × S/M/L + +**Date:** 2026-05-24 · **Branch:** `main` · **codegraph 0.9.4** + +A headless agent (Claude Opus, `--permission-mode bypassPermissions`) answers one +**canonical flow question** per repo — twice: **with** the codegraph MCP server, and +**without** any MCP (built-in Read/Grep/Glob/Bash only). Same model, same prompt; codegraph +is the only variable. Each cell was **re-indexed fresh** first (against a `dist/` build of the +current `main` HEAD), so the "with" arm reflects the shipped 0.9.4 resolvers. + +## Headline + +**Across 37 cells, codegraph cut total file reads from 159 → 38 — 76% fewer.** It never +*increased* reads in any cell (0 regressions). The mechanism: a few sub-millisecond codegraph +calls replace a read-and-grep exploration. + +**Cost stays roughly flat — marginally higher on the with-arm here** (summed across the 37 +cells: with `$15.4` vs without `$13.8`). 
On these short single-flow questions the without-arm +resolves in <10 calls and never balloons, so it doesn't reach the regime where codegraph's cost +savings compound, while the with-arm pays fixed MCP overhead (tool definitions in context + +tool-loading) that short tasks don't amortize. The win is **fewer tool calls (189 vs 321, −41%) ++ lower wall-clock** (mean **38s vs 48s**), which is the design target. On harder multi-turn +investigations cost flips to a net saving as the without-arm's accumulated context balloons — +see `docs/benchmarks/call-sequence-analysis.md`. + +The gap widens with repo size and flow complexity: on medium/large repos the without-codegraph +arm often **thrashes** — many greps/globs, shell `find`/`grep` (Bash), and occasionally spawning +a **sub-agent** — while the with-codegraph arm answers in 2–8 calls. On tiny repos (a handful of +files) the two arms tie or codegraph is marginally slower (MCP/index overhead doesn't pay off +when the whole flow fits in one or two files) — but reads still drop. + +## How to read the table + +- **R / G / Gl / B / Ag** = Read / Grep / Glob / Bash / sub-agent (Task) tool calls. +- **cg-calls** = codegraph MCP calls in the "with" arm (the trade for reads/greps). +- **dur** = wall-clock seconds. **files** = indexed file count (the size proxy). +- **reads saved** = without-reads − with-reads. +- One run per arm (a **snapshot** — run-to-run variance is real; treat ±1–2 reads and ±10s as + noise, look at the pattern across cells). 2-runs/arm headline numbers for several of these flows + live in `docs/design/dynamic-dispatch-coverage-playbook.md` §7. 
+ +## Results + +| Language | Size | Repo | files | **with** R/G | cg-calls | dur | **without** R/G | dur | reads saved | +|---|---|---|--:|---|--:|--:|---|--:|--:| +| C | L | `c-redis` | 884 | 0R / 2G | 4 | 42s | 5R / 6G | 51s | 5 | +| C# | S | `aspnet-realworld` | 78 | 0R / 0G | 2 | 27s | 5R / 3G / 2Gl | 54s | 5 | +| C# | M | `aspnet-eshop` | 262 | 0R / 1G | 5 | 39s | 9R / 2G / 5Gl | 58s | 9 | +| C# | L | `aspnet-jellyfin` | 2081 | 3R / 0G | 4 | 51s | 17R / 1G / 2Gl / 17B / 1Ag | 212s | 14 | +| C++ | M | `cpp-leveldb` | 134 | 0R / 0G | 3 | 26s | 4R / 2G | 37s | 4 | +| Dart | S | `flutter_module_books` | 6 | 1R / 0G | 2 | 24s | 2R / 0G / 1Gl | 29s | 1 | +| Dart | M | `compass_app` | 212 | 2R / 0G / 1Gl | 2 | 42s | 3R / 0G / 2Gl | 30s | 1 | +| Go | S | `gin-realworld` | 21 | 0R / 0G | 5 | 35s | 4R / 3G / 1Gl | 57s | 4 | +| Go | M | `gin-vueadmin` | 625 | 1R / 1G | 4 | 47s | 3R / 3G / 1Gl | 44s | 2 | +| Go | L | `gin-gitness` | 4438 | 4R / 3G | 4 | 64s | 8R / 7G / 2Gl | 57s | 4 | +| Java | S | `spring-realworld` | 117 | 2R / 0G | 3 | 35s | 8R / 1G / 5B | 57s | 6 | +| Java | M | `spring-mall` | 536 | 1R / 0G | 5 | 39s | 2R / 4G / 2Gl | 49s | 1 | +| Java | L | `spring-halo` | 2444 | 1R / 2G | 8 | 60s | 4R / 1G / 6B | 52s | 3 | +| Kotlin | S | `kotlin-petclinic` | 43 | 0R / 0G | 2 | 37s | 3R / 0G / 1Gl | 23s | 3 | +| Kotlin | M | `Jetcaster` | 166 | 1R / 0G | 3 | 36s | 1R / 0G / 2Gl | 46s | 0 | +| Lua | S | `lualine.nvim` | 123 | 1R / 1G | 4 | 48s | 4R / 0G / 2Gl | 49s | 3 | +| Lua | M | `telescope.nvim` | 84 | 0R / 0G | 1 | 15s | 1R / 0G / 1Gl | 20s | 1 | +| Luau | S | `Knit` | 11 | 0R / 0G | 2 | 30s | 5R / 0G / 2Gl | 37s | 5 | +| PHP | S | `laravel-realworld` | 114 | 1R / 0G | 6 | 40s | 5R / 1G / 3Gl | 39s | 4 | +| PHP | M | `laravel-firefly` | 2047 | 2R / 1G | 4 | 47s | 4R / 5G / 3Gl | 75s | 2 | +| PHP | L | `laravel-bookstack` | 2160 | 1R / 2G | 2 | 41s | 2R / 4G / 1Gl | 50s | 1 | +| Python | S | `django-realworld` | 44 | 2R / 1G | 2 | 47s | 9R / 0G / 1B | 38s | 7 
| +| Python | M | `django-wagtail` | 1672 | 2R / 0G | 4 | 45s | 8R / 3G / 3Gl / 1B | 66s | 6 | +| Python | L | `django-saleor` | 4429 | 2R / 2G | 4 | 52s | 4R / 6G / 1Gl | 64s | 2 | +| Ruby | S | `rails-realworld` | 59 | 0R / 0G | 2 | 30s | 3R / 0G / 2B | 33s | 3 | +| Ruby | M | `rails-spree` | 2905 | 2R / 3G / 1Gl | 5 | 43s | 3R / 3G / 2Gl / 1B | 55s | 1 | +| Ruby | L | `rails-forem` | 4658 | 3R / 1G | 3 | 43s | 4R / 2G / 3Gl | 48s | 1 | +| Rust | S | `rust-axum-realworld` | 13 | 0R / 0G | 2 | 21s | 3R / 0G / 1Gl | 38s | 3 | +| Rust | M | `rust-actix-examples` | 176 | 0R / 1G | 3 | 42s | 3R / 0G / 3B | 36s | 3 | +| Rust | L | `rust-cratesio` | 1053 | 1R / 0G | 3 | 22s | 1R / 2G | 18s | 0 | +| Scala | S | `computer-database` | 10 | 1R / 0G | 2 | 27s | 3R / 0G / 1Gl | 25s | 2 | +| Swift | S | `vapor-template` | 14 | 0R / 0G | 2 | 21s | 2R / 0G / 2Gl | 22s | 2 | +| Swift | M | `vapor-steampress` | 100 | 0R / 0G | 5 | 49s | 3R / 1G / 2Gl | 39s | 3 | +| Swift | L | `vapor-spi` | 542 | 1R / 1G | 4 | 27s | 2R / 5G | 34s | 1 | +| TypeScript/JS | S | `express-realworld` | 39 | 1R / 0G | 1 | 25s | 2R / 2G | 19s | 1 | +| TypeScript/JS | M | `excalidraw` | 643 | 1R / 0G | 3 | 55s | 7R / 5G / 3Gl / 1B | 87s | 6 | +| TypeScript/JS | L | `nest-immich` | 2759 | 1R / 0G | 7 | 50s | 3R / 0G / 1Gl | 44s | 2 | + +**Totals (37 cells):** with codegraph **38 reads / 22 greps**, without **159 reads / 72 greps** — +**76% fewer reads, ~69% fewer greps.** Codegraph never increased reads in any cell, and the +without-arm additionally ran **52 globs + 37 shell `find`/`grep` (Bash) + 1 sub-agent** that the +with-arm (**0 Bash, 0 sub-agents**) never needed. (74 agent runs, $29.18 total.) 
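The headline percentages follow directly from the totals. A tiny sketch of the arithmetic, where `pctFewer` is a hypothetical helper introduced only for this illustration:

```typescript
// Hypothetical helper: percentage reduction going from the without-arm to the with-arm.
function pctFewer(without: number, withCodegraph: number): number {
  return Math.round(((without - withCodegraph) / without) * 100);
}

const readsPct = pctFewer(159, 38); // file reads across the 37 cells
const grepsPct = pctFewer(72, 22); // greps across the 37 cells
const callsPct = pctFewer(321, 189); // total tool calls quoted in the headline
```

Rounded to whole percentages, these evaluate to 76, 69, and 41, matching the figures quoted in the text.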
+ +## Observations + +- **Biggest wins are medium/large backends with a real route→handler→service flow:** aspnet-jellyfin + (3R / 51s vs **17R + 17 Bash + a spawned sub-agent / 212s** — the single most dramatic cell), + aspnet-eshop (0R vs 9R), django-realworld (2R vs 9R), spring-realworld (2R vs 8R + 5 Bash), + django-wagtail (2R vs 8R), excalidraw (1R / 55s vs 7R / 87s), Luau Knit (0R vs 5R), aspnet-realworld + (0R vs 5R), c-redis (0R vs 5R). +- **Without codegraph, large repos make the agent thrash:** it falls back to shell `find`/`grep` + (37 Bash calls across the matrix) and on jellyfin even spawned a sub-agent — exactly the behavior + codegraph is meant to prevent. The with-arm answers those in 2–8 codegraph calls and used **0 Bash + and 0 sub-agents** anywhere. +- **Tie zone = tiny repos** (Kotlin Jetcaster 1R/1R, Rust cratesio 1R/1R, express 1R/2R, Swift template + 0R/2R): the whole flow fits in 1–2 files, so reading is already cheap; codegraph ties on reads and is + sometimes a few seconds slower (MCP + index overhead — Kotlin petclinic 37s vs 23s, cratesio 22s vs + 18s). This matches the design note that codegraph's value scales with repo size. +- **Duration tracks reads on the big repos** (jellyfin 51s vs 212s, excalidraw 55s vs 87s, aspnet-eshop + 39s vs 58s, django-wagtail 45s vs 66s) and is noise on small ones; mean wall-clock is 38s with vs 48s + without. +- Some "with" cells still read 2–4 files (jellyfin, gitness, forem, saleor, django) — the residual is + the documented frontier (anonymous handlers, deep service chains, dynamic finders); codegraph gets the + agent to the right file, then it reads one to confirm a detail. + +## Coverage note + +All 14 README frameworks and every flow-relevant language are validated (see the playbook). The +sizes here are by indexed file count; a few languages lack a clean third size in the corpus +(Dart/Kotlin = S/M, Scala/Luau = S only, C = L only, C++ = M only) — those cells are omitted rather +than faked. 
+ +## Reproduce + +Canonical harness: `scripts/agent-eval/run-all.sh "" headless` (with = codegraph-only +MCP, without = empty MCP), parsed from the stream-json logs. The throwaway matrix driver + parser used +for this table live in `/tmp/ab-matrix/`: `run.sh` (the `lang|size|repo|question` matrix — each cell does +`rm -rf .codegraph && codegraph init -i` then both arms), `parse-matrix.mjs` (cells → this table), and +`compare.mjs` (old-vs-new diff + aggregates). Build `dist/` from the target commit first so the MCP +server loads the code under test (`codegraph` on PATH is `npm link`ed to the dev `dist/`). diff --git a/docs/design/callback-edge-synthesis.md b/docs/design/callback-edge-synthesis.md new file mode 100644 index 00000000..7c4bfb06 --- /dev/null +++ b/docs/design/callback-edge-synthesis.md @@ -0,0 +1,179 @@ +# Design + status: general callback / observer edge synthesis + +**Status:** Phases 1–3 implemented & validated as a **prototype, uncommitted on `main`** +(as of 2026-05-22). This doc is the handoff for continuing the work. +**Motivation:** close the dynamic-dispatch hole that static extraction leaves for +observer / event-emitter / signal patterns, where a *dispatcher* invokes callbacks +registered elsewhere through a shared store — so flows like "how does an update +reach the screen" actually exist in the graph. + +--- + +## TL;DR for a new session + +We synthesize `dispatcher → callback` edges that static parsing misses. It works: + +- **Field observer** (excalidraw `Scene.onUpdate`/`triggerUpdate`): synthesizes + `triggerUpdate → triggerRender`. `trace(mutateElement, triggerRender)` now = 3 hops. +- **EventEmitter** (express `on('mount', …)`/`emit('mount')`): synthesizes `use → onmount`. +- Precision is high: excalidraw got **1** synthesized edge out of 27k (the correct one); + node count moved +3 after Phase 3 (no explosion). 
+ +**Files touched (all uncommitted on `main`):** +- `src/resolution/callback-synthesizer.ts` — the whole-graph synthesis pass (Phase 1 + 2). +- `src/resolution/index.ts` — calls `synthesizeCallbackEdges()` at the end of + `resolveAndPersistBatched()` (after base edges are persisted) + the import. +- `src/extraction/tree-sitter.ts` — `visitFunctionBody` now extracts **named** nested + functions (Phase 3), so inline named handlers become linkable nodes. + +**How to reproduce / test:** +```bash +npm run build +rm -rf /tmp/codegraph-corpus/excalidraw/.codegraph +( cd /tmp/codegraph-corpus/excalidraw && codegraph init -i ) +# synthesized edges (provenance='heuristic', metadata.synthesizedBy in {callback,event-emitter}): +sqlite3 /tmp/codegraph-corpus/excalidraw/.codegraph/codegraph.db \ + "select s.name||' → '||t.name||' '||coalesce(e.metadata,'') from edges e \ + join nodes s on e.source=s.id join nodes t on e.target=t.id where e.provenance='heuristic';" +# end-to-end trace (uses the dev probes): +node scripts/agent-eval/probe-trace.mjs /tmp/codegraph-corpus/excalidraw triggerUpdate triggerRender +``` +Probe scripts (dev-only, in `scripts/agent-eval/`): `probe-node.mjs` (symbol + trail), +`probe-trace.mjs` (call path), `probe-context.mjs`, `probe-explore.mjs`. EventEmitter +fixture lives at `/tmp/cb-fixture/bus.js` (ephemeral — recreate or move into `__tests__/`). + +--- + +## The hole + +```ts +class Scene { + private callbacks = new Set(); + onUpdate(cb: Callback) { this.callbacks.add(cb); } // REGISTRAR + triggerUpdate() { for (const cb of this.callbacks) cb(); } // DISPATCHER +} +this.scene.onUpdate(this.triggerRender); // REGISTRATION SITE +``` + +The runtime edge `triggerUpdate → triggerRender` does not exist statically: +`triggerUpdate`'s only literal call is `cb()` (anonymous). Measured: `triggerUpdate`'s +only callee was `randomInteger`; `trace(triggerUpdate, triggerRender)` returned no path. 
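To make the hole concrete, the following self-contained sketch runs the same registrar / dispatcher / registration shape and shows that the render callback is reached at runtime even though the dispatcher's source never names it. The `App` class and its `renders` counter are hypothetical stand-ins, and an array replaces the `Set` purely to keep the sketch portable.

```typescript
type Callback = () => void;

class Scene {
  // An array stands in for the Set in the snippet above; the dispatch shape is the same.
  private callbacks: Callback[] = [];

  // REGISTRAR: stores the callback in the shared field.
  onUpdate(cb: Callback): void {
    this.callbacks.push(cb);
  }

  // DISPATCHER: its only literal call is the anonymous `cb()`.
  triggerUpdate(): void {
    this.callbacks.forEach((cb) => cb());
  }
}

class App {
  renders = 0;
  readonly scene = new Scene();

  // Arrow property so `this` stays bound when the method is passed as a value.
  triggerRender = (): void => {
    this.renders += 1;
  };

  constructor() {
    // REGISTRATION SITE: the only place that ties triggerRender to the channel.
    this.scene.onUpdate(this.triggerRender);
  }
}

const app = new App();
app.scene.triggerUpdate(); // at runtime this reaches App.triggerRender
```

A purely syntactic call extractor sees `triggerUpdate` calling `cb` and nothing else, so recovering the `triggerUpdate → triggerRender` edge requires correlating the three sites, which is what the synthesis pass does.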
+ +## Why it's a whole-graph pass, not a `FrameworkResolver.resolve()` + +`resolve(ref)` answers "what does this **named** ref point to," one ref at a time. The +callback edge has **no ref to resolve** (`cb()` is anonymous) and needs **cross-file, +multi-site correlation** (registrar, registration, dispatcher). So it's a whole-graph +pass after base resolution, language-level (any OO observer), living in +`src/resolution/callback-synthesizer.ts` — **not** under `frameworks/`. + +> Sibling mechanism for the *other* dynamic-dispatch class — **named** attribute/ +> descriptor dispatch (e.g. django `self._iterable_class(...)`) — is the +> `claimsReference` hook (`resolution/types.ts` + `resolution/index.ts` pre-filter) +> + a `FrameworkResolver.resolve()` (django ORM resolver in `frameworks/python.ts`). +> That one *does* fit `resolve()` because the ref is named. Both are part of the same +> coverage effort; see the "Related work" section. + +--- + +## As-built algorithm (and where it diverged from the original design) + +### Field-observer channels (`fieldChannelEdges`, Phase 1) +1. **Candidates** by method/function **name** — registrar `^(on[A-Z]\w*|subscribe| + addListener|addEventListener|register|watch|listen|addCallback)$`; dispatcher + contains `(emit|trigger|notify|dispatch|fire|publish|flush)`. +2. **Confirm by body** (read via `ctx.readFile` + slice node lines): registrar has + `this.<F>.add|push|set(`; dispatcher has `for (… of [Array.from(]this.<F>)` + a call, + or `this.<F>.forEach(`. +3. **Pairing — DIVERGENCE:** the design said pair by *class*; the build pairs by + **same file + same field `F`** (file as a class proxy — getting the containing class + reliably was harder). Works for the common 1-class-per-file case; revisit for + multi-class files. +4. **Registrations:** `queries.getIncomingEdges(registrar.id, ['calls'])` → for each, + read the caller's source at the edge line and **regex-recover the arg** + (`\s*\(\s*(?:this\.)?(\w+)`). 
DIVERGENCE: design preferred tree-sitter + re-parse; build uses regex (named refs only — arrows/inline args are missed here). +5. **Synthesize** `dispatcher → fn` (`getNodesByName(arg)` → method|function). Capped at + `MAX_CALLBACKS_PER_CHANNEL = 40`. + +### EventEmitter channels (`eventEmitterEdges`, Phase 2) +- **File-oriented scan** (`ctx.getAllFiles()` + `readFile`, substring pre-filter on + `.emit(`/`.on(`/etc). `ON_RE` = `\.(?:on|once|addListener)\(\s*['"]([^'"]+)['"]\s*,\s* + (?:function\s+(\w+)|(?:this\.)?(\w+))`; `EMIT_RE` = `\.(?:emit|fire|dispatchEvent)\(\s*['"]([^'"]+)['"]`. +- Dispatcher = **enclosing function** of the `emit('e')` call (`enclosingFn` finds the + tightest function/method/component node containing the line). Handler = `getNodesByName` + of the on-handler name. +- Correlate by **event-name literal**; synthesize dispatcher → handler. +- **Precision — DIVERGENCE:** design proposed receiver-type matching; build uses an + **event fan-out cap** (`EVENT_FANOUT_CAP = 6`) — skip events with >6 handlers or + dispatchers (generic names like `error`/`change` would over-link without type info). + +### Provenance — DIVERGENCE +`Edge.provenance` is a fixed enum (`'tree-sitter'|'scip'|'heuristic'`), so synthesized +edges use **`provenance: 'heuristic'`** + `metadata: { synthesizedBy: 'callback'| +'event-emitter', via/event/field }`. The design's `'callback-synthesis'` provenance and +high/medium/low **confidence tiers were NOT implemented** — the fan-out cap + +registrar-name uniqueness + named-only handlers are the precision guards instead. + +### Phase 3 — inline callback extraction (`tree-sitter.ts`) +The real blocker for EventEmitter on real repos: inline handlers +(`on('mount', function onmount(){})`) weren't **nodes**, so nothing could link to them. +Root cause: `visitFunctionBody` walked *through* nested functions without extracting them. 
+Fix: in `visitForCallsAndStructure`, when a body node is a `functionType` and +`extractName` returns a real name, call `extractFunction` (which extracts it and walks +its own body) and return. **Named only** — anonymous arrows fall through to the existing +recursion (so their inner calls stay attributed to the enclosing fn). This bounded it: +excalidraw +3 nodes, no explosion, no regression. + +--- + +## Validation results (actual) + +| Repo | Result | +|---|---| +| excalidraw | 1 synthesized edge `triggerUpdate → triggerRender` (of 27,214); `trace(mutateElement, triggerRender)` = 3 hops; nodes 9,286 → 9,289 | +| express | after Phase 3: `use → onmount` `{event-emitter, event:"mount"}` (`onmount` now extracted at `application.js:109`) | +| `/tmp/cb-fixture/bus.js` | `tick → handleRefresh`, `persist → handleSave` (named-method EventEmitter handlers) | +| excalidraw / express | no Phase-1 regression; node counts stable | + +--- + +## Remaining work (prioritized for the next session) + +1. **Anonymous-arrow handlers** — `on('e', () => foo())` still produce no edge (no node, + intentionally not extracted in Phase 3). The fix is **synthesizer link-through-body**: + parse the arrow's body and link `dispatcher → (calls inside the arrow)`. Highest + remaining recall win; handles the most common modern callback shape. +2. **Wire into `resolveAndPersist`** (incremental sync) — synthesis currently runs only + in `resolveAndPersistBatched` (full index). Incremental re-index won't refresh + synthesized edges. +3. **Receiver-type matching** for EventEmitter precision (replace/augment the fan-out + cap) — use `type_of` edges so `x.emit('change')` only links to `y.on('change', fn)` + when `x`,`y` are the same type. Lets the fan-out cap relax. +4. **Tree-sitter arg recovery** (replace the regex in field-channel Stage 4) — robust for + arrows, multi-arg, line-wrapped calls. +5. 
**Single-callback fields** (`this.onChange = cb; … this.onChange()`) — scalar-store + variant of the field observer; not built. +6. **Broad precision/recall audit** — run across the full corpus; tally synthesized edges + per repo, spot-check, confirm no explosion on EventEmitter-heavy repos. +7. **Tests + CHANGELOG** — the fixture is a ready vitest case for the synthesizer; add + extractor tests for Phase 3 (named-nested-fn extraction; confirm other languages + unaffected — the change is in the shared walker), resolver tests for the django side. + +## Edge cases / model +- **Over-approximation across instances** is accepted (reachability, not instance + precision). `unregister`/`off` ignored. +- Synthesized edges are **additive** — never replace static edges; tooling can filter on + `provenance='heuristic'` + `metadata.synthesizedBy`. + +## Related work (same coverage effort) +This is one half of closing dynamic-dispatch coverage. The other artifacts on `main`: +- **Named attribute/descriptor resolver**: `claimsReference` (`resolution/types.ts`, + pre-filter in `resolution/index.ts`) + django ORM resolver (`frameworks/python.ts`, + `_iterable_class` → `ModelIterable.__iter__`). +- **Retrieval/UX changes** (separate from coverage): `explore` whole-small-file + glue + fixes, `node`-with-trail, `codegraph_trace`, `context` call-paths — all in + `src/mcp/tools.ts` / `src/context/index.ts`. +- **Full investigation context + findings:** auto-memory + `project_codegraph_read_displacement` (why coverage — not prompting/hooks/new-tools — + is the lever for getting agents to use codegraph over Read). diff --git a/docs/design/dynamic-dispatch-coverage-playbook.md b/docs/design/dynamic-dispatch-coverage-playbook.md new file mode 100644 index 00000000..c78d474d --- /dev/null +++ b/docs/design/dynamic-dispatch-coverage-playbook.md @@ -0,0 +1,548 @@ +# Dynamic-Dispatch Coverage Playbook + +**Audience:** a Claude agent continuing this work. 
+**Mission:** systematically close static-extraction coverage holes for **dynamic
+dispatch** across **every language and framework codegraph supports**, and validate
+each one the same way, so cross-symbol *flows* exist in the graph everywhere.
+
+> This is the top-level playbook. The deep design for one mechanism (the callback
+> synthesizer) is in [`callback-edge-synthesis.md`](./callback-edge-synthesis.md).
+> Full investigation context + findings: auto-memory `project_codegraph_read_displacement`.
+
+---
+
+## 1. The goal (why this matters)
+
+codegraph's value is being **the map** — answering structural/flow questions
+(`trace`, `impact`, callers, "how does X reach Y") that grep/Read cannot. Agents
+will use codegraph instead of Read **only when it is sufficient**. We proved
+empirically (see memory) that the lever for sufficiency is **coverage**, not
+prompting/hooks/new-tools: when a flow is missing from the graph, the agent reads
+the files to reconstruct it; when the flow *is* in the graph, the agent can answer
+completely without reading.
+
+**Validated end-to-end on excalidraw:** after closing the update-flow hole, 2/3
+headless agent runs answered the "how does an update reach the screen" question with
+**Read 0 and a complete answer** — impossible before, because the key edge wasn't in
+the graph. (Caveat: coverage *enables* the no-read path; agent confirm-by-reading
+variance means it doesn't *force* it. Completeness improves unconditionally.)
+
+The mission is to make that true for **all** languages/frameworks.
+
+---
+
+## 2. The problem class: dynamic dispatch
+
+Static tree-sitter extraction captures explicit calls (`foo()`, `this.bar()`). It
+**misses** any call whose target is computed/indirect. Four recurring shapes, with a
+**difficulty gradient** (do the cheap ones first):
+
+| # | Shape | Example | Fix mechanism | Cost |
+|---|---|---|---|---|
+| 1 | **Named attribute / descriptor** | django `self._iterable_class(self)` | framework resolver (`claimsReference` + `resolve()`) | **cheap** |
+| 2 | **Field-backed observer** | `onUpdate(cb)` + `for(cb of cbs)cb()` | callback synthesizer (whole-graph pass) | medium |
+| 3 | **String-keyed EventEmitter** | `on('e',fn)` / `emit('e')` | callback synthesizer (event-keyed) | medium |
+| 4 | **Inline callback handler** | `on('e', function h(){})` / `() => {}` | extraction (named) + synthesizer link-through-body (anon) | named: cheap · anon: hard |
+
+Key distinction driving the mechanism choice:
+- **A named ref exists** to resolve (`_iterable_class` is an attribute name) → **resolver**.
+- **No ref exists** (`cb()` is anonymous; needs registrar↔dispatcher correlation) → **synthesizer**.
+
+---
+
+## 3. Worked examples (the two mechanisms, end to end)
+
+### 3a. Django ORM descriptor — the **resolver** pattern (Python)
+- **Hole:** `QuerySet._fetch_all` calls `self._iterable_class(self)` (a runtime-chosen
+  iterable, default `ModelIterable`), whose `__iter__` runs the SQL compiler. Static
+  parsing can't resolve the attribute-as-callable → `_fetch_all`'s only callee was
+  `_prefetch_related_objects`; `trace(_fetch_all, execute_sql)` returned no path.
+- **Fix:** `djangoResolver` claims the unresolved `_iterable_class` ref through the
+  name-exists pre-filter, then resolves it to `ModelIterable.__iter__`.
+- **Files:** `src/resolution/types.ts` (`claimsReference?` on `FrameworkResolver`),
+  `src/resolution/index.ts` (pre-filter in `resolveOne` consults `claimsReference`),
+  `src/resolution/frameworks/python.ts` (`djangoResolver.resolve` + `claimsReference` +
+  `resolveModelIterableIter`).
+- **Result:** `trace(_fetch_all, execute_sql)` → `_fetch_all → __iter__ → execute_sql` (3 hops).
+
+### 3b. Excalidraw observer + EventEmitter — the **synthesizer** (TS)
+- **Hole:** `Scene.triggerUpdate` does `for (cb of this.callbacks) cb()`; `triggerRender`
+  is registered via `scene.onUpdate(this.triggerRender)`. The `triggerUpdate → triggerRender`
+  edge is dynamic → `trace` returned no path; the whole update flow broke.
+- **Fix:** a whole-graph pass that detects registrar/dispatcher channels, correlates
+  registration sites, and synthesizes `dispatcher → callback` edges. Plus extraction of
+  **named** inline callbacks so handlers like express's `function onmount(){}` are nodes.
+- **Files:** `src/resolution/callback-synthesizer.ts` (the pass — field observers +
+  EventEmitter), `src/resolution/index.ts` (calls `synthesizeCallbackEdges()` at the end
+  of `resolveAndPersistBatched`), `src/extraction/tree-sitter.ts` (`visitFunctionBody`
+  extracts named nested functions).
+- **Result:** `trace(mutateElement, triggerRender)` → 3 hops; express `use → onmount`.
+
+---
+
+## 4. The repeatable methodology (run this per language/framework)
+
+### Step 1 — Pick the framework's canonical *flow* question
+Every framework has a signature data/control flow. Pick the "how does X reach/become Y"
+question and a real repo (add to `.claude/skills/agent-eval/corpus.json`). Examples:
+- React state→DOM, Vue reactive→render, Svelte store→update
+- Rails request→controller→view, Spring request→`@Controller`→service
+- Express/Koa request→middleware→handler, FastAPI request→route→dependency
+- Redux action→reducer→store, RxJS subscribe→operator→observer
+- Any ORM: query builder → SQL execution (django pattern)
+
+### Step 2 — Measure the hole (deterministic, no agent)
+```bash
+rm -rf /.codegraph && ( cd && codegraph init -i )
+node scripts/agent-eval/probe-trace.mjs # does the flow break? where?
+node scripts/agent-eval/probe-node.mjs # trail: is the next hop missing?
+```
+A "No direct call path … breaks at dynamic dispatch" + a sparse trail at the break
+point **locates the hole** (this is exactly how `_iterable_class` and `triggerUpdate`
+were found). Confirm it's dynamic by reading the break symbol's body.
+
+### Step 3 — Classify → choose the mechanism (use the §2 table)
+- `self.(...)` / descriptor / metaclass → **resolver** (§3a).
+- `for(cb of store)cb()` / `store.forEach(cb=>cb())` → **field-observer synthesizer** (§3b).
+- `on('e',fn)` + `emit('e')` → **EventEmitter synthesizer** (§3b).
+- Inline handler not a node → **named:** extraction (already done generically in
+  `tree-sitter.ts`); **anonymous:** synthesizer link-through-body (not yet built).
+
+### Step 4 — Implement
+- **Resolver:** add to `src/resolution/frameworks/.ts` — a `resolve()` branch +
+  `claimsReference(name)` if the ref name isn't a declared symbol. Copy `djangoResolver`.
+- **Synthesizer channel:** extend `src/resolution/callback-synthesizer.ts` — add the
+  framework's registrar/dispatcher **name patterns** and **body patterns** (e.g. signals
+  use `.connect()`/`.emit()`; Rx uses `.subscribe()`/`.next()`).
+- Reindex (Step 2 command) and re-run `probe-trace` — the flow should now connect.
+
+### Step 5 — Validate (the same way every time)
+1. **Deterministic:** `probe-trace(from,to)` finds the path; `probe-node` shows the
+   bridged hop. The previously-broken hop is closed.
+2. **Precision:** count + spot-check synthesized/resolved edges — no explosion, correct targets:
+   ```bash
+   sqlite3 /.codegraph/codegraph.db \
+     "select s.name||' → '||t.name||' '||coalesce(e.metadata,'') from edges e \
+      join nodes s on e.source=s.id join nodes t on e.target=t.id where e.provenance='heuristic';"
+   ```
+   (Resolver edges aren't `heuristic`; verify via the trace + callees instead.)
+3. **Regression:** node count stable (`select count(*) from nodes;` before/after — a big
+   jump means an extraction change over-fired); existing traces on a control repo intact.
+4. **End-to-end agent eval:** run the flow question with codegraph and measure
+   **reads / answer-completeness / cost** vs a pre-fix baseline:
+   ```bash
+   # headless (exact cost + clean tool sequence)
+   bash scripts/agent-eval/run-agent.sh with ""
+   # or the full A/B + interactive Explore-subagent path:
+   scripts/agent-eval/audit.sh local "" all
+   ```
+   Then parse: `Read` count, codegraph-tool count, cost, and whether the answer now
+   contains the glue symbols (the ones that previously required a read).
+
+### Success criteria (per language/framework)
+- `trace` finds the canonical flow end-to-end (no dynamic-dispatch break).
+- Agent can answer the flow question with **Read 0** (achievable in at least some runs) and the
+  glue symbols appear in the answer.
+- **No node explosion** and no regression on a control repo.
+- Synthesized edges are precise on a spot-check (no generic-name over-linking).
+
+---
+
+## 5. Validation toolkit (reference)
+
+| Tool | Purpose |
+|---|---|
+| `scripts/agent-eval/probe-trace.mjs ` | call-path between two symbols (the hole detector) |
+| `scripts/agent-eval/probe-node.mjs [code]` | symbol + trail (callers/callees); `code` adds the body |
+| `scripts/agent-eval/probe-context.mjs ""` | context output incl. call-paths |
+| `scripts/agent-eval/probe-explore.mjs ""` | explore output |
+| `scripts/agent-eval/{audit,run-agent,itrun}.sh` | agent A/B (headless + interactive); also the `/agent-eval` skill |
+| `sqlite3 /.codegraph/codegraph.db` | direct edge/node inspection (provenance, metadata, counts) |
+
+Probe scripts use the built `dist/` — run `npm run build` first. Reindex after any
+extraction or resolution change (`rm -rf /.codegraph && codegraph init -i`) — the
+synthesizer/resolvers run at index time. Test fixtures: keep a tiny per-pattern fixture
+(see `/tmp/cb-fixture/bus.js`; **move into `__tests__/`** when shipping).
+
+---
+
+## 6. Coverage matrix (fill in as you go)
+
+Status legend: ✅ done+validated · 🔬 hole identified · ⬜ not started.
+`Mechanism`: R = resolver, S = synthesizer channel, X = extraction.
+
+| Language | Framework(s) | Canonical flow to test | Mechanism | Status |
+|---|---|---|---|---|
+| TypeScript/JS | React / observer / EventEmitter / React Router | state→render; dispatch→callback; route→component | S + X | ✅ rendering+dispatch (excalidraw); **React Router JSX routing** `` (v5) + `element={}` (v6) → component (react-realworld **0→10, 10/10**). + **object data-router** `createBrowserRouter([{path, element/Component}])` (literal form); Next.js config/`nextjs-pages` false-positives FIXED. 🔬 lazy data-router (`path: paths.x.path, lazy: () => import()` — variable paths + lazy modules) |
+| TypeScript/JS | Vue / Nuxt | template events (@click→handler); component composition; reactive→render | S + X | ✅ events + composition (vitepress S / vben M / element-plus L); 🔬 reactive→render (vue-core Proxy runtime — frontier, deferred) |
+| TypeScript/JS | Svelte / SvelteKit | template calls/composition; SvelteKit action→api; store→DOM | X | ✅ already strong (realworld S / skeleton M / shadcn L): template `{fn()}` calls, `` composition, `import * as api` namespace, `load`→api all work out of the box. + exported-const object-of-functions extraction (SvelteKit `actions`). 🔬 `$lib`-namespace-from-action + store/reactive frontier |
+| TypeScript/JS | Express / Koa | request → route → handler → service | R + X | ✅ named handlers + middleware + controller/service (resolver) + **inline arrow handlers → service body calls** (realworld S 19 / parse M / ghost L 65 edges). 🔬 custom routers (payload had 0 routes — not `app.get`-style) |
+| TypeScript/JS | NestJS | request → @Controller → DI service → repo | R | ✅ already well-covered (realworld S / immich M-L / amplication L): @decorator routes (HTTP/GraphQL/microservice/WS) via resolver + DI `this.svc.method()` controller→service resolves correctly at scale (name + co-location). No dynamic-dispatch hole. 🔬 committed `dist/` build output gets indexed (realworld) — general build-dir-ignore follow-up |
+| TypeScript/JS | RxJS / signals | subscribe → operator → observer | S | ⬜ |
+| Python | Django ORM | QuerySet → SQL compiler | R | ✅ |
+| Python | Django / DRF (views) | url → view → model | R + X | ✅ url→view (`path`/`url`/`as_view`) + **DRF `router.register`→ViewSet** (realworld S / wagtail M / saleor L); ORM QuerySet→SQL (prior work). 🔬 signals (`post_save`→receiver), DRF viewset CRUD actions (inherited), saleor GraphQL resolvers |
+| Python | Flask / FastAPI | request → route → handler → dependency | R + X | ✅ **Flask: handler resolved across intervening decorators (`@login_required`) + stacked `@x.route` lines** (microblog S 6→27, redash L decorator routes 6/6); **FastAPI: empty-path router-root routes `@router.get("")` incl. multi-line** (realworld S 12→20 / Netflix dispatch L **290/290 100%**) + **bare-name builtin guard** — a handler named after a Python builtin method (`index`/`get`/`update`/`count`…) was filtered as a builtin and lost its route→handler edge. + **Flask-RESTful `add_resource(Resource,'/x')` → Resource class** (redash 6→**77**) + **tuple `methods=('GET',)`** (was mislabeled GET) + **broadened detection** (requirements/Pipfile/setup + subdir app-factory entrypoints — flask-realworld 0→**19**). 🔬 FastAPI `Depends()` dependency edges (light validation) |
+| Go | Gin / chi / gorilla/mux / net-http | request → route → handler → service | X | ✅ **routes on ANY group var** (`v1.GET`, `PublicGroup.GET`) not just `r/router` (gin-vue-admin S→M 4→259 / realworld S / gitness L) — was missing all group-routed apps; named handlers resolve precisely. **gorilla/mux confirmed covered** by the any-receiver `HandleFunc`/`Handle` handling (subrouter-var `s.HandleFunc(...)` + namespaced handlers; `.Methods()` chain ignored). 🔬 inline `func(c){}` handlers (anonymous, body lost); subrouter/`PathPrefix` path-prefix not prepended (label only); gitness chi custom (26/321) |
+| Rust | Axum / actix / Rocket | request → route → handler | R + X | ✅ **Axum chained methods + namespaced handlers** — `.route("/x", get(h1).post(h2))` emitted only the first method+handler, and `get(mod::handler)` captured the module not the fn (realworld-axum S **12→19, 19/19**); balanced-paren scan + per-method nodes + last-`::`-segment handler. **Rocket attribute macros 550/556 (99%)** (Rocket repo L) — already strong. crates.io named axum routes resolve (6/8; rest are closures/var handlers; its API is mostly the utoipa `routes!` macro = frontier). Cargo-workspace module resolution (prior work). **actix builder API** `web::resource("/x").route(web::get().to(h))` / `.to(h)` / App `.route("/x", web::get().to(h))` (actix-examples **51→128 routes, 35→112 resolved**) — was the dominant actix style and fully missed (the handler is in `.to(h)`, not `get(h)`). 🔬 actix `web::scope("/api")` prefix (not prepended to nested resource paths) + anonymous `.to` closure handlers |
+| Java | Spring | request → @RestController → @Autowired service → repo | R + X | ✅ **bare `@GetMapping`/`@PostMapping` + class `@RequestMapping` prefix join → route→method** (realworld S / mall M / halo L) — was missing all path-less method mappings; DI controller→service resolves (name + dir) + **interface→impl dispatch synthesizer** (`interfaceOverrideEdges`: a class's `implements`/`extends` → link each interface/base method → its same-name override; JVM-gated, capped, **overload-aware**; mall **310** / halo **734** synth edges, node count unchanged) so trace follows controller→service-**interface**→**impl** instead of dead-ending at the abstract method — `trace("PmsProductController.getList","PmsProductServiceImpl.list")` connects in **3 hops** (probe-validated). ⚠️ **agent A/B null** (n=2: the agent went context→explore→Read and never invoked `trace`, so the synth edges weren't exercised — adoption-gated, the recurring wall; see `docs/benchmarks/call-sequence-analysis.md`). The fix is correct + improves trace/callees/impact/context connectivity regardless; agent-visible read reduction needs trace adoption. 🔬 Spring Data JPA derived queries (`findByEmail`) — metaprogramming frontier |
+| Kotlin | Spring Boot / Jetpack Compose | request → @RestController → service; @Composable → child | R + X | ✅ **Spring Boot Kotlin** — the Spring resolver was `['java']`-only with a Java-syntax method regex (`public X name()`); extended to `.kt` + Kotlin `fun name(` handler matching (petclinic-kotlin **0→18, 18/18**; class-prefix joins; DI controller→repo resolves — `showOwner ← GET /owners/{ownerId}` → `OwnerRepository.findById`). **Compose composition already static** (@Composable→child are plain function calls — Jetcaster `PodcastInformation→HtmlTextContainer`). Java Spring unchanged (realworld 19/19). 🔬 Ktor `routing { get("/x"){…} }` lambda handlers (anonymous) + Compose recomposition (implicit `mutableStateOf`, no setState gate) + coroutines/Flow |
+| Swift | Vapor | request → route → controller | R + X | ✅ **was 0 routes on every real app** — the extractor required an `app/router/routes` receiver + a `"path"` literal, but real Vapor routes on grouped builders (`let todos = routes.grouped("todos"); todos.get(use: index)`) with NO path arg. Rewrote: any receiver, optional/non-string path segments, `.grouped`/`.group{}` prefix tracking, `use:` discriminator. vapor-template S **0→3 (3/3**, nested `/todos/:todoID`), SteamPress M **0→27 (27/27)**, SwiftPackageIndex-Server L **0→14 (14/14** handler resolution). 🔬 typed-route enums (SPI `SiteURL.x.pathComponents` — path label only, handler still resolves) + closure handlers `app.get("x"){ }` (anonymous) |
+| C# | ASP.NET Core | request → [Http*] action → DI service → EF | X | ✅ **feature-folder detection** (realworld 0→19 — was undetected) + **bare `[HttpGet]` + class `[Route]` prefix** (eShopOnWeb 9→33 / jellyfin L) — co-located so no claimsReference needed. 🔬 EF Core LINQ/DbSet (metaprogramming frontier) |
+| Ruby | Rails / Sinatra | request → routes.rb → Controller#action → model | R | ✅ **RESTful `resources`/`resource` routing → controller#action** (realworld S 16 / spree M / forem L), pluralization + only/except + claimsReference; explicit routes fixed to precise `controller#action` too. 🔬 ActiveRecord dynamic finders (`Article.find_by_slug`) — metaprogramming frontier |
+| PHP | Laravel | request → route → controller → Eloquent | R | ✅ **precise `Route::get([Ctrl::class,'m'])` / `'Ctrl@m'` → Ctrl@method** (realworld S / firefly M / bookstack L) — was resolving the bare method name to the WRONG controller (every `index`→ArticleController); Route::resource→controller. 🔬 Eloquent dynamic finders/relationships (metaprogramming frontier) |
+| PHP | Drupal | request → *.routing.yml → _controller/_form | R | ✅ **`claimsReference` for FQCN handlers** (`\Drupal\…\Class::method` passed the pre-filter only because the `::method` name was known; bare `_form` FQCNs `\…\FormClass` and single-colon `Class:method` controller-services were dropped before resolve()) + **single-colon controller match** + **detect via composer `type:drupal-*` / `name:drupal/*` + `*.info.yml` fallback** (a contrib module with empty `require` was undetected → 0 routes). admin_toolbar S **0→14 (14/14)** / webform M 208 (**144**) / core L 836 (536→**731, 87%**). Remainder is the **entity-annotation handler frontier** (`_entity_form: type.op` resolves via the entity's PHP `#[ContentEntityType]` handlers, not a direct class). 🔬 **OOP `#[Hook]` attributes** — Drupal 11 moved ~all procedural hooks to attribute methods (core: 418 `#[Hook]` files vs 3 procedural), so the resolver's docblock/`module_hook` detection is obsolete for modern core (0 hook edges) |
+| C/C++ | C++ vtables / inheritance | virtual call → override; general direct dispatch | S + X | ✅ **general dispatch strong** (redis C **29k** cross-file calls / leveldb C++ **1.4k**) + **C++ inheritance extraction fix** (`base_class_clause` was unhandled, so C++ extends edges were missing — leveldb **219→298**) + **cpp-override synthesizer** (base virtual method → subclass override, gated to C++, capped — leveldb 12 precise: `Iterator::Next→MergingIterator`). 🔬 C callback structs (`s->fn()` → 422-way fan-out, too noisy to synthesize) + C++ pure-virtual base methods (`virtual void f()=0;` declarations aren't extracted as nodes, so those overrides can't bridge) |
+| Dart | Flutter | setState → build; build → child widgets | S + X | ✅ **setState→build synthesizer** (Dart analog of react-render: a State method whose body calls `setState(` → `build`) gated to `.dart` + **foundational Dart method-range fix** — Dart models a method body as a *sibling* of the signature, so method nodes were signature-only (`end==start`); now `endLine` spans the body (required for ALL body analysis: callees, context slices, the synthesizer's body scan). counter `initState→build`, books `build→BookDetail/BookForm`; widget composition already static (compass_app `build→ErrorIndicator/HomeButton`). Controls unchanged (excalidraw 9,290 / django 302 — the range fix only extends sibling-body grammars). 🔬 MVVM Command/ChangeNotifier dispatch (compass_app — no setState) + `Navigator.push(MaterialPageRoute(builder:))` nav routes |
+| Lua / Luau | Neovim / Roblox | module dispatch (require→mod, mod.fn); event/callback | — | ✅ **already covered for the dominant flow (measure-first, no code change)** — Neovim is module-heavy (`require('x')` + `x.fn()`), and the general import + name resolution already handles it: telescope.nvim **220 imports + 335 cross-file `mod.fn` calls**, traces end-to-end (`map_entries ← init.lua → get_current_picker (state.lua)`). Luau instance-path `require(game:GetService(...))` handled by the extractor. 🔬 event-callback registration (`vim.keymap.set(…, fn)`, autocmd `callback=`, Roblox `signal:Connect(fn)`) is predominantly INLINE anonymous closures (corpus ~12 inline vs ~2 named) — the anonymous-handler frontier; named handlers too rare to justify a synthesizer |
+| Scala | Play / Akka | request → conf/routes → controller action | R + X | ✅ **Play `conf/routes` → controller** — the extensionless `conf/routes` wasn't indexed; added narrow file-walk opt-in (`isPlayRoutesFile`) + a Play resolver parsing `METHOD /path Controller.action(args)` → the action method (computer-database **0→8, 7/8**; starter 0→4, 3/4 — the unresolved are Play's framework `Assets` controller, external). Scala general controller→DAO dispatch already resolves. No-regression: the file-walk change only ADDS Play routes files (excalidraw 9,290 / suite 800 unchanged). 🔬 SIRD programmatic router (`-> /v1 Router` include + `case GET(p"/x")` in code) + Akka actor `receive`/`Behaviors.receiveMessage` message→handler |
+
+(Verify the exact supported set against `src/extraction/languages/` and
+`src/resolution/frameworks/` before starting — this table is a starting point.)
+
+---
+
+## 7. Known limits & gotchas (from the excalidraw/django work)
+
+- **Coverage enables, doesn't force, the no-read path.** Agents still read to *confirm
+  source* sometimes; cost stays ~flat (codegraph calls trade for reads). The reliable
+  win is **completeness** + making Read-0 *possible*. Don't expect a guaranteed cost drop.
+- **Vue (validated 2026-05-23, vitepress S / vben M / element-plus L).** SFC `