From 0e2789ab712bd570f775602955d9f1329133662e Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Mon, 8 Jun 2026 10:25:40 -0400 Subject: [PATCH 01/51] docs(agent-eval): nested MCP attach is startup-latency, not a hard block (#735) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Corrects the "run non-nested only" conclusion from #734. The codegraph server is healthy (handshake ~165ms); the flakiness is that on a multi-step implementation task the agent dives into Read/grep before codegraph finishes its ~2-3s startup (worse under nested CPU contention), so it runs with no codegraph. Fix: pre-warm a persistent daemon (high idle timeout) + skip the startup re-exec (CODEGRAPH_WASM_RELAUNCHED=1) so claude connects before the agent's first turn. claude's init snapshot can show status:"pending" even when it then connects — judge by actual codegraph usage, not the init line. ab-new-vs-baseline.sh now bakes in the pre-warm + skip-re-exec. Validated: a clean A/B showed the new build's agent used codegraph 2x / 5 Reads vs the baseline's 0 / 8 on the same fully-implemented task. Co-authored-by: Claude Opus 4.8 (1M context) --- CLAUDE.md | 2 +- scripts/agent-eval/ab-new-vs-baseline.sh | 50 +++++++++++++++--------- 2 files changed, 33 insertions(+), 19 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index e2f40832c..bad199f20 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -137,7 +137,7 @@ For each **language × framework**, validate on **small, medium, and large** rea 1. **Pick the canonical flow** for the framework ("how does X reach Y": state→render, request→handler→view, query→SQL, action→reducer→store…). 2. **Deterministic probes** (`scripts/agent-eval/probe-{node,explore}.mjs` against the built `dist/`): `codegraph_explore` with the flow's symbol names connects from→to end-to-end with no break (its Flow section shows the path); **no node explosion** (`select count(*) from nodes` stable before/after re-index); synthesized-edge **precision** spot-check (`select … where provenance='heuristic'`). 3. **Agent A/B** (`scripts/agent-eval/run-all.sh ""`): with vs without codegraph, **≥2 runs/arm** (run-to-run variance is large — never conclude from n=1). Record **duration, total tool calls, Read, Grep**. Optional forced-Read-0 sufficiency proof via the block-read hook (`scripts/agent-eval/hook-settings.json`). - - **Run agent-evals in a REAL terminal — NEVER nested inside a Claude Code session** (don't spawn `claude -p` from a Bash tool call). The codegraph MCP server is healthy (full `initialize`→`tools/list` handshake ~165ms, daemon and in-process modes both fine), but a nested `claude -p` marks it `status:"pending"` / 0 tools under CPU/timing contention and the agent silently runs with no codegraph — it can connect early in a session, then degrade to consistent failure as nested spawns pile up. `CODEGRAPH_NO_DAEMON=1` and `< /dev/null` do NOT fix it (it's the nested client, not the server). Confirm via `parse-run.mjs` (`codegraph tools exposed: 0` = void run). To isolate a change — **new-build vs baseline-build, both codegraph-on** (vs run-all.sh's with-vs-without) — use `scripts/agent-eval/ab-new-vs-baseline.sh "" [baseline-ref]`. + - **MCP attach is a startup-latency issue, not a hard block.** On a multi-step task the agent dives into Read/grep before codegraph finishes its ~2-3s startup (worse when the eval is itself run nested inside a Claude session, under CPU contention), so it runs with no codegraph. Fix: **pre-warm a persistent daemon** for the target (`CODEGRAPH_DAEMON_IDLE_TIMEOUT_MS` high; spawn `serve --mcp --path "" [baseline-ref]` (it bakes in the pre-warm). 4. **Pass bar:** a normal flow question reaches **~0 Read/Grep within the repo's explore-call budget**, runs **faster** than without-codegraph, and shows **no regression on a control repo**. Record the numbers in `docs/design/dynamic-dispatch-coverage-playbook.md` (the coverage matrix). Full playbook + per-mechanism design: `docs/design/dynamic-dispatch-coverage-playbook.md` and `docs/design/callback-edge-synthesis.md`. diff --git a/scripts/agent-eval/ab-new-vs-baseline.sh b/scripts/agent-eval/ab-new-vs-baseline.sh index 7f5d58d1d..7e5cc84e5 100755 --- a/scripts/agent-eval/ab-new-vs-baseline.sh +++ b/scripts/agent-eval/ab-new-vs-baseline.sh @@ -2,18 +2,20 @@ # A/B a codegraph retrieval/steering change: the NEW build (current HEAD) vs a # BASELINE build (a git ref) — BOTH with codegraph attached — on the same # implementation task, measuring how many Read vs codegraph calls the agent -# makes. This ISOLATES the change (unlike run-all.sh, which is with-vs-without -# codegraph). The agent works on a throwaway copy of the target, so its edits -# never touch your repos. +# makes. ISOLATES the change (unlike run-all.sh's with-vs-without). The agent +# works on a throwaway copy of the target, so your repos are never touched. # -# *** RUN THIS IN A REAL TERMINAL — NOT nested inside a Claude Code session. *** -# A `claude -p` spawned from within another Claude session (e.g. from a Bash -# tool call) cannot reliably attach the codegraph MCP server: the server is -# healthy (full handshake ~165ms) but the nested client marks it -# status:"pending" / 0 tools under CPU/timing contention, and degrades to -# consistent failure over a long session. NO_DAEMON + `< /dev/null` do NOT fix -# it — it's the nested client, not the server. See codegraph/CLAUDE.md -# ("Running agent-evals — do NOT nest"). +# Reliable attach (works even when this is itself run nested inside a Claude +# session): each arm PRE-WARMS a persistent codegraph daemon for its target so +# claude connects to an already-bound, index-loaded daemon instantly — before +# the agent's first turn — and SKIPS codegraph's startup re-exec via +# CODEGRAPH_WASM_RELAUNCHED=1. Without this, on a multi-step task the agent +# dives into Read/grep before codegraph finishes its ~2-3s startup (worse under +# the CPU contention of a nested run) and runs with NO codegraph. +# +# Gotcha: claude's `system/init` snapshot can read status:"pending" / 0 tools +# even when the server then connects fine — judge by ACTUAL codegraph usage in +# parse-run.mjs's "by type", not the init line. # # Usage: ab-new-vs-baseline.sh "" [baseline-ref] # a repo with a .codegraph index (copied per arm) @@ -38,9 +40,13 @@ fi CHANGED=$(git -C "$ENGINE" diff --name-only "$BASE_REF" HEAD -- src 2>/dev/null) [ -n "$CHANGED" ] || { echo "no src/ changes between $BASE_REF and HEAD — nothing to A/B"; exit 1; } -# Always restore the engine to HEAD on exit, even if interrupted mid-arm. -restore() { git -C "$ENGINE" checkout HEAD -- $CHANGED 2>/dev/null; ( cd "$ENGINE" && npm run build >/dev/null 2>&1 ); } -trap restore EXIT +# On exit: kill any eval daemons + restore the engine to HEAD. +cleanup() { + pkill -9 -f "serve --mcp --path $OUT/" 2>/dev/null + git -C "$ENGINE" checkout HEAD -- $CHANGED 2>/dev/null + ( cd "$ENGINE" && npm run build >/dev/null 2>&1 ) +} +trap cleanup EXIT mkdir -p "$OUT" echo "###### engine=$ENGINE baseline=$BASE_REF" @@ -54,17 +60,25 @@ rm -rf "$OUT/t-new" "$OUT/t-base" rsync -a --exclude node_modules --exclude .git --exclude dist --exclude .codegraph "$TARGET/" "$OUT/t-new/" cp -R "$OUT/t-new" "$OUT/t-base" -cfg() { printf '{"mcpServers":{"codegraph":{"command":"%s","args":["serve","--mcp","--path","%s"]}}}' "$BIN" "$1" > "$2"; } +prewarm() { # target — spawn a persistent daemon (current $BIN) and wait for its socket + pkill -9 -f "serve --mcp --path $1" 2>/dev/null + CODEGRAPH_DAEMON_IDLE_TIMEOUT_MS=1800000 node "$BIN" serve --mcp --path "$1" /dev/null 2>&1 & + node -e 'const fs=require("fs");let n=0;const t=setInterval(()=>{if(fs.existsSync(process.argv[1]+"/.codegraph/daemon.sock")){clearInterval(t);process.exit(0)}if(n++>150){clearInterval(t);process.exit(1)}},100)' "$1" \ + && echo " daemon warm: $1" || echo " WARN: daemon never bound for $1 (arm may run without codegraph)" +} run_arm() { # label, target-copy local label="$1" tgt="$2" c="$OUT/mcp-$1.json" - cfg "$tgt" "$c" + # Connect to the pre-warmed daemon; skip the startup re-exec for a fast attach. + printf '{"mcpServers":{"codegraph":{"command":"env","args":["CODEGRAPH_WASM_RELAUNCHED=1","node","%s","serve","--mcp","--path","%s"]}}}' "$BIN" "$tgt" > "$c" + prewarm "$tgt" echo "############## ARM [$label] ##############" ( cd "$tgt" && claude -p "$TASK" \ --output-format stream-json --verbose --permission-mode bypassPermissions \ --model opus --max-budget-usd 4 --strict-mcp-config --mcp-config "$c" \ - < /dev/null > "$OUT/run-$label.jsonl" 2>"$OUT/run-$label.err" ) - node "$PARSE" "$OUT/run-$label.jsonl" 2>&1 | grep -E "tools exposed|by type|Result" || echo " (parse failed — see $OUT/run-$label.jsonl)" + "$OUT/run-$label.jsonl" 2>"$OUT/run-$label.err" ) + node "$PARSE" "$OUT/run-$label.jsonl" 2>&1 | grep -E "by type|Result" || echo " (parse failed — see $OUT/run-$label.jsonl)" + pkill -9 -f "serve --mcp --path $tgt" 2>/dev/null echo } From 1983590533a51c950b055937051b152e8473a30b Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Mon, 8 Jun 2026 13:48:42 -0400 Subject: [PATCH 02/51] =?UTF-8?q?feat(mcp):=20codegraph=5Fnode=20reads=20f?= =?UTF-8?q?iles=20like=20the=20Read=20tool=20=E2=80=94=20offset/limit,=20b?= =?UTF-8?q?yte-parity=20(#738)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Makes codegraph_node a drop-in faster Read for indexed source files (file-read mode: \t like Read, offset/limit, + blast-radius header; symbolsOnly for the map). Fixes the old file-view dropping imports/line-numbers. #383/#527 preserved. Validated by A/B: explore/node already return source + line numbers, so Read=0 when used. Includes the A/B eval harness scripts. Full suite green (1270). --- CHANGELOG.md | 2 +- __tests__/node-file-view.test.ts | 83 +++++++--- docs/design/agent-codegraph-adoption.md | 136 +++++++++++++++++ scripts/agent-eval/ab-adoption.sh | 91 +++++++++++ scripts/agent-eval/ab-hook.sh | 86 +++++++++++ scripts/agent-eval/ab-impl.sh | 78 ++++++++++ scripts/agent-eval/ab-sufficiency.sh | 78 ++++++++++ scripts/agent-eval/redirect-read-hook.sh | 38 +++++ src/mcp/server-instructions.ts | 5 +- src/mcp/tools.ts | 187 +++++++++++++++-------- 10 files changed, 702 insertions(+), 82 deletions(-) create mode 100644 docs/design/agent-codegraph-adoption.md create mode 100644 scripts/agent-eval/ab-adoption.sh create mode 100644 scripts/agent-eval/ab-hook.sh create mode 100644 scripts/agent-eval/ab-impl.sh create mode 100644 scripts/agent-eval/ab-sufficiency.sh create mode 100755 scripts/agent-eval/redirect-read-hook.sh diff --git a/CHANGELOG.md b/CHANGELOG.md index 255b192a7..a94dadb45 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -16,7 +16,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). ### New Features -- The `codegraph_node` MCP tool now accepts a file path on its own (no symbol) and returns that file's symbols plus which files depend on it — and the full source with `includeCode`. It's a drop-in upgrade for reading a source file: the same content, plus the file's blast radius, in one call. The agent-facing guidance was also retuned so assistants reach for codegraph while *implementing* a change (not only when answering questions), since one codegraph call returns more accurate context for fewer tokens than re-reading files. +- The `codegraph_node` MCP tool can now **read a whole source file like the built-in Read tool — only faster, served from the index**. Pass a file path with no symbol and it returns that file's current source with line numbers (the same `` shape Read produces, so an assistant can edit straight from it), narrowable with `offset`/`limit` exactly like Read, plus a one-line note of which files depend on it (the file's blast radius). Use it anywhere you'd reach for Read on an indexed source file. Pass `symbolsOnly: true` for just the file's structure. Configuration/data files (`.yml` / `.properties`) are summarized by key only, never dumped, so secrets in them are never surfaced. The agent-facing guidance was also retuned so assistants reach for codegraph while *implementing* a change (not only when answering questions), since one codegraph call returns the same bytes plus the blast radius, faster than re-reading the file. - New `codegraph upgrade` command updates CodeGraph to the latest release in place — it detects how you installed (the standalone `install.sh` / `install.ps1` bundle, npm, or npx) and does the right thing for each, on macOS, Linux, and Windows. Use `codegraph upgrade --check` to see whether an update is available without installing, or `codegraph upgrade ` to move to a specific version. After upgrading it reminds you to re-index your projects so they pick up the newer engine's improvements. (#679) - `codegraph status` now flags when a project's index was built by an older engine than the one you're running and recommends re-indexing (also surfaced in `codegraph status --json`), so you know when a `codegraph index -f` or `codegraph sync` will add coverage a newer release introduced. - Cross-file impact and blast-radius coverage now spans **all 22 supported languages and 14 web frameworks**, each validated on a real-world repo — see the new coverage table in the README. This release ships the cross-file resolution behind it, including Lua and Luau `require`, Shopify OS 2.0 Liquid section templates, Delphi form code-behind, Rust cross-module calls and Rocket route macros, Swift Fluent relationships, and the SvelteKit / Nuxt / Vapor / Axum route conventions. The residual everywhere is genuine static-analysis frontiers (runtime dispatch, reflection / DI, framework-convention entry points), never hidden. diff --git a/__tests__/node-file-view.test.ts b/__tests__/node-file-view.test.ts index 316ed555d..7d2a5703c 100644 --- a/__tests__/node-file-view.test.ts +++ b/__tests__/node-file-view.test.ts @@ -1,7 +1,9 @@ /** - * codegraph_node FILE-VIEW mode: a bare `file` (no `symbol`) returns that file's - * symbol map + graph role (dependents), and verbatim bodies with includeCode — - * a Read replacement for a source file that also surfaces the blast radius. + * codegraph_node FILE READ mode: a `file` with no `symbol` reads that file like + * the Read tool — current source with `\t` numbering (byte-for-byte + * Read's shape), narrowable with offset/limit — plus a one-line blast-radius + * header. `symbolsOnly` returns the structural map instead. Config/data files + * are summarized by key, never dumped (#383). */ import { describe, it, expect, beforeEach, afterEach } from 'vitest'; import * as fs from 'fs'; @@ -24,9 +26,23 @@ describe('codegraph_node file-view (Read replacement)', () => { ); fs.writeFileSync( path.join(dir, 'src', 'b.ts'), - "import { helper } from './a';\nexport function useHelper() { return helper(2); }\n", + "import { helper } from './a';\n\n// a comment between symbols\nconst SETTING = 7;\nexport function useHelper() { return helper(2) + SETTING; }\n", ); - cg = CodeGraph.initSync(dir, { config: { include: ['**/*.ts'], exclude: [] } }); + // A config/data file (#383): its values may be secrets and must never be + // dumped verbatim by the file-view. + fs.writeFileSync( + path.join(dir, 'src', 'application.properties'), + 'spring.datasource.password=SUPERSECRET123\nserver.port=8080\n', + ); + // A large file: exceeds the file-view line budget, so it must be windowed + // honestly (not silently truncated). + fs.writeFileSync( + path.join(dir, 'src', 'big.ts'), + 'export function big() {\n' + + Array.from({ length: 2000 }, (_, i) => ` const v${i} = ${i};`).join('\n') + + '\n return 0;\n}\n', + ); + cg = CodeGraph.initSync(dir, { config: { include: ['**/*.ts', '**/*.properties'], exclude: [] } }); await cg.indexAll(); h = new ToolHandler(cg); }); @@ -39,21 +55,54 @@ describe('codegraph_node file-view (Read replacement)', () => { const text = async (args: Record): Promise => (await h.execute('codegraph_node', args)).content.map((c) => c.text).join('\n'); - it("a bare file (no symbol) returns the file's symbols + dependents", async () => { + it('reads a whole file like Read by default — `\\t` lines (no pad), imports + gaps included', async () => { + const out = await text({ file: 'b.ts' }); // no includeCode needed — content is the default + // Byte-for-byte Read shape: line 1 is "1import …", NOT space-padded. + expect(out).toMatch(/^1\timport \{ helper \} from '\.\/a';$/m); + expect(out).toContain('// a comment between symbols'); // inter-symbol gap (Read has it; old reconstruction dropped it) + expect(out).toContain('const SETTING = 7'); // top-level statement + expect(out).toContain('useHelper'); // the symbol body too + expect(out).not.toContain('```'); // Read has no code fence; neither do we + }); + + it('leads with a one-line blast-radius header (the value-add over Read)', async () => { const out = await text({ file: 'a.ts' }); - expect(out).toContain('src/a.ts'); - expect(out).toContain('helper'); - expect(out).toContain('Widget'); - expect(out).toMatch(/depended on by 1 file/i); - expect(out).toContain('src/b.ts'); // the dependent file (blast radius) + expect(out).toMatch(/used by 1 file: src\/b\.ts/); // a.ts is imported by b.ts + expect(out).toContain('return x + 1'); // still returns the source + }); + + it('offset/limit narrow the window exactly like Read', async () => { + const out = await text({ file: 'big.ts', offset: 1000, limit: 3 }); + // Window starts at the requested line, numbered exactly: "1000 const v998 = 998;" + expect(out).toMatch(/^1000\t {2}const v998 = 998;$/m); + expect(out).not.toMatch(/^1\t/m); // line 1 is NOT shown + expect(out).toMatch(/lines 1000[–-]1002 of \d+/); // honest pagination note + }); + + it('an offset past EOF is reported, not a crash', async () => { + const out = await text({ file: 'a.ts', offset: 9999 }); + expect(out).toMatch(/past the end/i); + }); + + it('paginates a large file honestly by default — "lines 1–N of TOTAL", never a silent truncate', async () => { + const out = await text({ file: 'big.ts' }); + expect(out).toMatch(/lines 1[–-]\d+ of \d+/); // explicit window note + expect(out).not.toContain('(output truncated)'); // not the generic 15k chop + expect(out).toMatch(/^1\texport function big/m); // the head of the window is real source }); - it('resolves by basename and returns verbatim bodies with includeCode', async () => { - const out = await text({ file: 'a.ts', includeCode: true }); - expect(out).toContain('return x + 1'); // helper body - expect(out).toContain('class Widget'); // class body, verbatim - // It must NOT steer the agent back to Read — it is the Read replacement. - expect(out.toLowerCase()).not.toContain('read `src/a.ts`'); + it('does NOT dump a config/data file (yaml/properties) — #383 secret safety', async () => { + const out = await text({ file: 'application.properties' }); + expect(out).not.toContain('SUPERSECRET123'); // the value never reaches the agent + expect(out.toLowerCase()).toMatch(/config|values withheld/); + }); + + it('symbolsOnly returns the structural map, not the source', async () => { + const out = await text({ file: 'a.ts', symbolsOnly: true }); + expect(out).toContain('### Symbols'); + expect(out).toContain('helper'); + expect(out).toContain('Widget'); + expect(out).not.toContain('return x + 1'); // bodies are NOT included in the map }); it('still works as a normal symbol lookup (no regression)', async () => { diff --git a/docs/design/agent-codegraph-adoption.md b/docs/design/agent-codegraph-adoption.md new file mode 100644 index 000000000..8b6c1c061 --- /dev/null +++ b/docs/design/agent-codegraph-adoption.md @@ -0,0 +1,136 @@ +# Getting agents to actually use codegraph (not Read) — design notes & handoff + +> Working doc for a fresh session. Two problems to crack: +> **(P1)** agents still reach for `Read`/`grep` during implementation instead of codegraph; +> **(P2)** on startup the codegraph MCP server can be `pending` when the agent's first turn fires, so the agent runs with *no* codegraph at all. +> +> Read `codegraph/CLAUDE.md` → "Retrieval performance & dynamic-dispatch coverage" first — it's the doctrine these ideas must respect. + +--- + +## Context — what already shipped (so you don't repeat it) + +- **#733 (`7175dc4`)** — reframed the agent-facing steering (`src/mcp/server-instructions.ts` + the `codegraph_node`/`codegraph_explore` descriptions in `src/mcp/tools.ts`) to cover *implementation*, not just Q&A; and added **file-view mode**: `codegraph_node` now accepts a bare `file` (no `symbol`) → returns that file's symbol map + its dependents (blast radius) + verbatim bodies (`includeCode`). `handleFileView` in `src/mcp/tools.ts`. +- **Clean A/B result** (new build vs baseline build, both codegraph-connected, same fully-implemented task — `kindExclude` added to `codegraph_search`): + - **baseline:** 0 codegraph calls, 8 Reads (agent *ignored* available codegraph). + - **new:** 2 `codegraph_explore` calls, 5 Reads. + - So the reframe *did* move tool-choice — but the agent used `codegraph_explore`, **never the file-view**, and still Read 5×. n=1/arm. +- **Eval harness fix** (`#735`): nested attach is a *startup-latency* problem, not a hard block. `scripts/agent-eval/ab-new-vs-baseline.sh` now pre-warms a daemon + skips the re-exec; use it (run non-nested for cleanest results). + +**Doctrine constraints (from CLAUDE.md — do not relitigate):** +- *Adapt the tool to the agent.* Changing tool descriptions / `server-instructions.ts` is **low-salience** and has *regressed* wall-clock before. Wording alone won't reliably move tool-choice. +- *New tools fare worse than extending an existing one* (the agent under-picks even `trace`; `codegraph_context` was removed). +- The real levers that landed historically: **coverage** (more flows connect statically → `explore` surfaces them) and **sufficiency** (output complete enough that the agent *stops* reading). +- The optimization target is **wall-clock + tool-call count + Read=0**, not token cost (cost is lower as a side effect). + +--- + +## P1 — Agents under-use codegraph during implementation + +### STATUS — 2026-06-08 (RESOLVED via Read-parity, not a hook) + +**The fix: make `codegraph_node` read a file *exactly like the Read tool*, only +faster — so the agent reaches for it naturally. No forcing.** The owner's steer +settled the direction: *"codegraph should be able to Read just like the Read +tool… make it as good as Read. Read is slow and old; querying the index is fast. +You keep diverging away from using codegraph rather than pursuing the fix."* + +**DONE — `handleFileView` (`src/mcp/tools.ts`) is now full Read parity:** +- A `file` with no `symbol` returns the file's current source numbered + **byte-for-byte the way Read does — `\t`, no padding, trailing empty + line kept** (verified by reading the same file with both and diffing). The only + addition is a **one-line blast-radius header** (`used by N files: …`). +- **`offset` / `limit` mean exactly what they do on Read** (1-based start; max + lines; default whole file capped at 2000 lines like Read). Large files paginate + honestly (`(lines X–Y of N — pass offset/limit…)`), never the 15k `truncateOutput` chop. +- Content is the **default** (no `includeCode` needed); `symbolsOnly: true` returns + the cheap structural map instead. Security preserved: `yaml`/`properties` + summarized by key, never dumped (#383); reads via `validatePathWithinRoot` (#527). +- Tests: `__tests__/node-file-view.test.ts` (9, incl. strict format parity + `^1000\t const v998 = 998;` and unpadded `^1\timport …`). Full suite green + (1270). Descriptions / `server-instructions.ts` / CHANGELOG reframed: "read a + source file with codegraph_node instead of Read — same bytes, faster." + +**The hook (idea 1) — A/B'd and REJECTED. Do not ship.** Kept only as an eval +artifact (`scripts/agent-eval/redirect-read-hook.sh` + `ab-hook.sh`). +- Clean A/B (2 runs/arm, devpit "add `dp ping`, build it"; both arms codegraph-attached): + - **nohook:** 0 codegraph calls, 1 Read, **5–7 tool calls, 6–8 turns, 55–77s.** (Reproduces P1: agent ignores codegraph — but read-once-and-edit is *efficient* here.) + - **hook (deny-redirect):** 0 *successful* Reads + 1 file-view call (parity worked, edit compiled), but **8–9 tool calls, 9–10 turns, 200–239s**, and the agent **fought the deny** — `ToolSearch` to find the tool, reflexive re-Read (denied), then **`Bash python3` to read the file around the block.** + - Verdict: a blanket Read-deny **regresses the target metrics (~2× tool calls, more turns) on a simple edit** and the agent routes around it. Forcing is the wrong lever; making the tool genuinely better than Read is the right one. +- If routing is ever revisited: not a blanket hook. Either a narrow trigger (large + files only / after-N-reads) **with a clean A/B on a Read-heavy multi-file task** + (the hook's best case, untested), or just keep widening coverage + sufficiency. + +--- + +**Symptom:** even with codegraph attached + the new steering, the agent reflexively `Read`s/`grep`s mid-implementation, and never reaches for the file-view. Descriptions can't fix this (low-salience wall). + +### Ideas, ranked by expected leverage + +1. **PreToolUse(Read/Grep) hook that redirects to codegraph** — *highest leverage; the only channel that actually changes behavior.* + - Claude Code **hooks** can intercept a tool call and inject context or block it — unlike descriptions, this is *not* low-salience. We already have `scripts/agent-eval/block-read-hook.sh` + `hook-settings.json` (used to force Read=0 in evals). + - Ship a **recommended (opt-in) hook**: on `Read` (or `Grep`) of a path that's *indexed*, inject "this file is indexed — `codegraph_node {file}` returns it + its blast radius for fewer tokens; treat its output as already-Read." Soft nudge (don't hard-block, or it'll frustrate users on configs/docs codegraph doesn't index). + - The installer (`src/installer/targets/claude.ts`) could offer to add this hook (opt-in, like the auto-allow permissions). + - **Validate** with `ab-new-vs-baseline.sh` (Read count, with vs without the hook). This is the experiment most likely to move the needle. + - Open Qs: how to know a path is indexed from inside a hook (query `codegraph files`/`status`, or a fast local check against `.codegraph`); avoiding noise on non-indexed files; per-language false positives. + +2. **Sufficiency: make the file-view the obvious Read replacement so the agent *wants* it.** + - The A/B showed the agent never passed a `file` to `codegraph_node`. Why? It doesn't think "Read this file" → "codegraph_node file=X". Investigate: is the file-view's value (symbols + dependents + bodies) actually *better than Read* for the agent's next step (an `Edit`)? It returns bodies — but does it return enough surrounding context to `Edit` confidently? If not, the agent Reads anyway. + - Consider: when the agent *does* Read an indexed file, is there a way to make codegraph's prior `explore`/`node` output have *already* given it what it needed? (i.e. fix the upstream sufficiency, not the Read itself.) + +3. **Coverage — the durable lever.** Every flow that connects statically is one the agent doesn't Read to reconstruct. Keep closing dynamic-dispatch gaps (`src/resolution/`). Less about "stop Reading," more about "never need to." + +4. **Naming / affordance experiments (low confidence, cheap).** The file-view is buried inside `codegraph_node`. A dedicated, obviously-named affordance might get picked more — *but* "new tools fare worse," so this likely loses. If tried, A/B it; don't assume. + +**Recommendation:** prototype **idea 1 (the Read-redirect hook)** and A/B it. It's the one lever with a real chance of moving behavior. Everything else is incremental. + +--- + +## P2 — Agent runs without codegraph because the server is `pending` at startup + +**Symptom:** `serve --mcp` isn't ready when the agent's first turn fires (the host marks the MCP server `status:"pending"` / 0 tools), so the agent starts Read/grep and never uses codegraph. We saw this hard in nested evals (~2-3s startup vs the agent's turn-1); **real users hit a milder version** — the first query of a session may not have codegraph. + +### Root cause +`serve --mcp` does a `--liftoff-only` **re-exec** (for a node memory flag) **and** spawns/binds a detached **daemon** before tools are usable. Under load that exceeds the host's MCP-startup window. (`CODEGRAPH_WASM_RELAUNCHED=1` skips the re-exec; pre-warming a daemon removes the bind latency — both proven in `ab-new-vs-baseline.sh`. But a real user can't pre-warm.) + +### Ideas, ranked + +1. **CODEGRAPH-SIDE — expose the static tool list INSTANTLY, decoupled from the daemon. *Biggest shippable win; helps every user.*** + - Hypothesis: the host marks codegraph `pending` because `tools/list` (tool exposure) waits on the daemon connect. The local handshake already answers `initialize` fast (~107ms; `runLocalHandshakeProxy` in `src/mcp/proxy.ts`, `getStaticTools` is imported there). **Investigate: does `serve --mcp` answer `tools/list` *locally and instantly* from `getStaticTools`, or does it forward it to the still-connecting daemon?** If the latter, decouple it: advertise the static tools the moment the client asks, mark connected, and resolve the daemon in the background for actual tool *calls*. + - Verify with: `printf '\n\n\n' | node dist/bin/codegraph.js serve --mcp --path ` and time the `tools/list` response, daemon-mode vs in-process. In-process answered in ~165ms; daemon-mode is the suspect. + - If this lands, `pending`-at-startup largely disappears without any host change. + +2. **CODEGRAPH-SIDE — speed/skip the re-exec on the MCP serve path.** The re-exec exists for a V8 memory flag (`src/extraction/wasm-runtime-flags.ts`, `RELAUNCH_GUARD_ENV = CODEGRAPH_WASM_RELAUNCHED`). For MCP serving on a normal repo the flag may be unnecessary, or settable without a full process re-exec. Removing one process spawn from the cold path shaves the startup window. + +3. **CODEGRAPH-SIDE — a SessionStart hook that pre-warms the daemon.** Ship an opt-in Claude Code `SessionStart` hook (installer-added) that spawns/warms the daemon for the project at session start, so it's bound before the first query. Mitigation if (1) is hard. + +4. **HOST-SIDE — "wait/retry on pending" — this is what you asked about, but it's a Claude Code (MCP client) behavior, not codegraph's to fix.** codegraph can't make the agent retry. Options: (a) raise it with Anthropic as an MCP-client improvement (don't let the agent's first turn proceed until configured MCP servers finish connecting, or retry `pending` servers); (b) note `MCP_TIMEOUT` exists but did **not** help here, because the problem is *tool exposure timing*, not a connection timeout. Frame this as a request, and lean on (1)–(3) for what we control. + +**Recommendation:** chase **idea 1** (decouple `tools/list` from the daemon). It's the fix that makes codegraph "connected" instantly for everyone. Ship **idea 3** (pre-warm SessionStart hook) as a cheap mitigation in parallel. File the host-side request (4) but don't depend on it. + +--- + +## Key files / pointers + +- **Steering / tools:** `src/mcp/server-instructions.ts` (the `initialize` instructions — single source of truth), `src/mcp/tools.ts` (tool descriptions + handlers; `handleNode`/`handleFileView`/`handleSearch`, `getStaticTools`). +- **Startup / daemon / proxy:** `src/mcp/proxy.ts` (`runProxy`, `connectWithHello`, `runLocalHandshakeProxy`, PPID watchdog), `src/mcp/index.ts` (`runProxyWithLocalHandshake`, `spawnDetachedDaemon`), `src/mcp/daemon.ts`. +- **Runtime flags:** `src/extraction/wasm-runtime-flags.ts` (`RELAUNCH_GUARD_ENV=CODEGRAPH_WASM_RELAUNCHED`, `HOST_PPID_ENV=CODEGRAPH_HOST_PPID`). +- **Hooks (existing):** `scripts/agent-eval/block-read-hook.sh`, `scripts/agent-eval/hook-settings.json` (the eval's force-Read-0 hook — basis for the P1 redirect hook). +- **Installer (where to add a recommended hook):** `src/installer/targets/claude.ts`. +- **Eval harness:** `scripts/agent-eval/ab-new-vs-baseline.sh` (new-vs-baseline, pre-warm baked in), `run-all.sh` (with-vs-without), `parse-run.mjs` (tool-by-type counts; `codegraph tools exposed: 0` + 0 codegraph calls = ran without). +- **Doctrine:** `CLAUDE.md` → "Retrieval performance & dynamic-dispatch coverage" + the agent-eval note under "Validation methodology". + +## How to validate anything here +- **P1 (Read displacement):** `bash scripts/agent-eval/ab-new-vs-baseline.sh "" [baseline-ref]` — compare `Read` vs `mcp__codegraph__*` counts. ≥2 runs/arm (n=1 is noisy). Run non-nested for cleanest results. Use a *genuinely new* feature task (verify it doesn't already exist — the first A/B attempt wasted a run on an already-implemented `--quiet`). +- **P2 (startup):** time `tools/list` from `serve --mcp` (above); and count cold-start runs where `init` shows `connected` + tools > 0. Don't trust a single `pending` init snapshot — confirm by whether the agent actually called codegraph. + +## Constraints / gotchas to remember +- Descriptions/instructions are low-salience — **A/B every behavioral claim**, don't ship wording on faith. +- New tools < extending existing ones. +- The host's `init` snapshot can say `pending` even when the server then connects — judge by actual usage. +- Don't run evals nested for "clean" numbers unless pre-warmed; even then, a real terminal is better. + +## Suggested start order for the fresh session +1. **P2 idea 1** — verify whether `serve --mcp` answers `tools/list` locally/instantly; if not, decouple it from the daemon. (Highest-value, shippable, helps all users, no behavioral guesswork.) +2. **P1 idea 1** — prototype the PreToolUse(Read) redirect hook; A/B it. (Highest-value behavioral lever.) +3. Ship the P2 SessionStart pre-warm hook as a mitigation; file the host-side wait/retry request. diff --git a/scripts/agent-eval/ab-adoption.sh b/scripts/agent-eval/ab-adoption.sh new file mode 100644 index 000000000..eabf802c3 --- /dev/null +++ b/scripts/agent-eval/ab-adoption.sh @@ -0,0 +1,91 @@ +#!/usr/bin/env bash +# Does the agent PICK codegraph_node to read a file, vs the built-in Read tool? +# Build A/B: NEW build (HEAD, codegraph_node has Read parity) vs BASELINE build +# (a ref where it doesn't), BOTH codegraph-attached + pre-warmed, same task. The +# metric is tool CHOICE: Read calls vs codegraph_node[file] calls per run. +# +# Usage: ab-adoption.sh "" [runs-per-arm] [baseline-ref] +# Env: AGENT_EVAL_OUT (default: /tmp/ab-adoption) +set -uo pipefail +TARGET="${1:?usage: ab-adoption.sh \"\" [runs] [baseline-ref]}" +TASK="${2:?task required}" +RUNS="${3:-2}" +BASE_REF="${4:-HEAD~1}" +ENGINE="$(cd "$(dirname "$0")/../.." && pwd)" +BIN="$ENGINE/dist/bin/codegraph.js" +OUT="${AGENT_EVAL_OUT:-/tmp/ab-adoption}" + +command -v claude >/dev/null || { echo "claude CLI not on PATH"; exit 1; } +[ -d "$TARGET/.codegraph" ] || { echo "target not indexed: run 'codegraph init $TARGET' first"; exit 1; } +git -C "$ENGINE" diff --quiet && git -C "$ENGINE" diff --cached --quiet || { echo "engine has uncommitted changes — commit/stash first"; exit 1; } +CHANGED=$(git -C "$ENGINE" diff --name-only "$BASE_REF" HEAD -- src 2>/dev/null) +[ -n "$CHANGED" ] || { echo "no src/ changes between $BASE_REF and HEAD"; exit 1; } + +cleanup() { + pkill -9 -f "serve --mcp --path $OUT/" 2>/dev/null + git -C "$ENGINE" checkout HEAD -- $CHANGED 2>/dev/null + ( cd "$ENGINE" && npm run build >/dev/null 2>&1 ) +} +trap cleanup EXIT +mkdir -p "$OUT" +echo "###### target=$TARGET runs/arm=$RUNS baseline=$BASE_REF" +echo "###### changed: $(echo "$CHANGED" | tr '\n' ' ')" +echo "###### task=$TASK"; echo + +prewarm() { + pkill -9 -f "serve --mcp --path $1" 2>/dev/null + CODEGRAPH_DAEMON_IDLE_TIMEOUT_MS=1800000 node "$BIN" serve --mcp --path "$1" /dev/null 2>&1 & + node -e 'const fs=require("fs");let n=0;const t=setInterval(()=>{if(fs.existsSync(process.argv[1]+"/.codegraph/daemon.sock")){clearInterval(t);process.exit(0)}if(n++>150){clearInterval(t);process.exit(1)}},100)' "$1" >/dev/null 2>&1 +} + +# Per-run tool-choice counts: Read vs codegraph_node[file] vs [symbol]. +count() { + node -e ' + const fs=require("fs"); + const lines=fs.readFileSync(process.argv[1],"utf8").split("\n").filter(Boolean); + let read=0,cgFile=0,cgSym=0,cgOther=0,exposed="?"; + for(const l of lines){try{const o=JSON.parse(l); + if(o.type==="system"&&o.subtype==="init"){exposed=(o.tools||[]).filter(t=>/codegraph/.test(t)).length;} + const blocks=o.message?.content||[]; + for(const b of (Array.isArray(blocks)?blocks:[])){ + if(b.type!=="tool_use")continue; + if(b.name==="Read")read++; + else if(b.name==="mcp__codegraph__codegraph_node"){ if(b.input&&b.input.symbol)cgSym++; else cgFile++; } + else if(/mcp__codegraph__/.test(b.name))cgOther++; + } + }catch{}} + console.log(` Read=${read} codegraph_node[file]=${cgFile} codegraph_node[symbol]=${cgSym} other_cg=${cgOther} (cg exposed=${exposed})`); + ' "$1" +} + +run_arm() { # label, N + local label="$1" n="$2" + local c="$OUT/mcp-$label.json" + for i in $(seq 1 "$n"); do + local tgt="$OUT/t-$label-$i" + rm -rf "$tgt" + rsync -a --exclude node_modules --exclude .git --exclude dist --exclude .codegraph "$TARGET/" "$tgt/" + node "$BIN" init "$tgt" >/dev/null 2>&1 + printf '{"mcpServers":{"codegraph":{"command":"env","args":["CODEGRAPH_WASM_RELAUNCHED=1","node","%s","serve","--mcp","--path","%s"]}}}' "$BIN" "$tgt" > "$c" + prewarm "$tgt" + echo "----- [$label] run $i -----" + ( cd "$tgt" && claude -p "$TASK" \ + --output-format stream-json --verbose --permission-mode bypassPermissions \ + --model opus --max-budget-usd 4 --strict-mcp-config --mcp-config "$c" \ + "$OUT/run-$label-$i.jsonl" 2>"$OUT/run-$label-$i.err" ) + count "$OUT/run-$label-$i.jsonl" + pkill -9 -f "serve --mcp --path $tgt" 2>/dev/null + done + echo +} + +echo "== NEW build (HEAD: codegraph_node has Read parity) ==" +( cd "$ENGINE" && npm run build >/dev/null 2>&1 ) && echo "built" +run_arm new "$RUNS" + +echo "== BASELINE build ($BASE_REF) ==" +git -C "$ENGINE" checkout "$BASE_REF" -- $CHANGED +( cd "$ENGINE" && npm run build >/dev/null 2>&1 ) && echo "built" +run_arm baseline "$RUNS" + +echo "###### DONE — compare [new] vs [baseline]: does codegraph_node[file] rise / Read fall? Logs: $OUT" diff --git a/scripts/agent-eval/ab-hook.sh b/scripts/agent-eval/ab-hook.sh new file mode 100644 index 000000000..8c1af32a3 --- /dev/null +++ b/scripts/agent-eval/ab-hook.sh @@ -0,0 +1,86 @@ +#!/usr/bin/env bash +# A/B the PreToolUse(Read) REDIRECT hook (P1): does steering Read → codegraph_node +# file-view actually move the agent off Read during implementation? BOTH arms use +# the CURRENT build with codegraph attached and pre-warmed; the only difference is +# the hook. Isolates the hook's behavioral effect from the build/file-view change +# (use ab-new-vs-baseline.sh for the build A/B). +# +# arm [nohook] — codegraph on, no hook (does the better file-view get picked on its own?) +# arm [hook] — codegraph on, + redirect hook (does routing close it?) +# +# Reliable attach (works nested): each arm pre-warms a persistent daemon and skips +# the startup re-exec (CODEGRAPH_WASM_RELAUNCHED=1), so claude connects before the +# agent's first turn. Judge by ACTUAL codegraph usage in parse-run.mjs's "by type", +# not claude's init snapshot (which can read pending even when it then connects). +# +# Usage: ab-hook.sh "" [runs-per-arm] +# a repo with a .codegraph index (copied per arm; never mutated) +# "" a GENUINELY-NEW implementation task (verify it isn't already done) +# [runs-per-arm] default 2 (n=1 is noisy — the doctrine says >=2) +# Env: AGENT_EVAL_OUT (default: /tmp/ab-hook) +set -uo pipefail + +TARGET="${1:?usage: ab-hook.sh \"\" [runs-per-arm]}" +TASK="${2:?task required}" +RUNS="${3:-2}" +ENGINE="$(cd "$(dirname "$0")/../.." && pwd)" +BIN="$ENGINE/dist/bin/codegraph.js" +HOOK="$ENGINE/scripts/agent-eval/redirect-read-hook.sh" +OUT="${AGENT_EVAL_OUT:-/tmp/ab-hook}" +PARSE="$ENGINE/scripts/agent-eval/parse-run.mjs" + +command -v claude >/dev/null || { echo "claude CLI not on PATH"; exit 1; } +command -v jq >/dev/null || { echo "jq not on PATH (the hook needs it)"; exit 1; } +[ -d "$TARGET/.codegraph" ] || { echo "target not indexed: run 'codegraph init $TARGET' first"; exit 1; } +chmod +x "$HOOK" + +cleanup() { pkill -9 -f "serve --mcp --path $OUT/" 2>/dev/null; } +trap cleanup EXIT + +mkdir -p "$OUT" +echo "###### engine=$ENGINE" +echo "###### target=$TARGET runs/arm=$RUNS" +echo "###### task=$TASK" +echo + +( cd "$ENGINE" && npm run build >/dev/null 2>&1 ) && echo "built" + +# A settings file carrying ONLY the PreToolUse(Read) redirect hook. +HOOK_SETTINGS="$OUT/hook-settings.json" +jq -n --arg cmd "bash $HOOK" \ + '{hooks:{PreToolUse:[{matcher:"Read",hooks:[{type:"command",command:$cmd}]}]}}' > "$HOOK_SETTINGS" + +prewarm() { # target — spawn a persistent daemon and wait for its socket + pkill -9 -f "serve --mcp --path $1" 2>/dev/null + CODEGRAPH_DAEMON_IDLE_TIMEOUT_MS=1800000 node "$BIN" serve --mcp --path "$1" /dev/null 2>&1 & + node -e 'const fs=require("fs");let n=0;const t=setInterval(()=>{if(fs.existsSync(process.argv[1]+"/.codegraph/daemon.sock")){clearInterval(t);process.exit(0)}if(n++>150){clearInterval(t);process.exit(1)}},100)' "$1" \ + && echo " daemon warm: $1" || echo " WARN: daemon never bound for $1" +} + +run_one() { # arm-label, run-index, use-hook(0|1) + local label="$1" idx="$2" hook="$3" + local tgt="$OUT/t-$label-$idx" c="$OUT/mcp-$label.json" + rm -rf "$tgt" + rsync -a --exclude node_modules --exclude .git --exclude dist --exclude .codegraph "$TARGET/" "$tgt/" + node "$BIN" init "$tgt" >/dev/null 2>&1 + printf '{"mcpServers":{"codegraph":{"command":"env","args":["CODEGRAPH_WASM_RELAUNCHED=1","node","%s","serve","--mcp","--path","%s"]}}}' "$BIN" "$tgt" > "$c" + prewarm "$tgt" + local extra=() + [ "$hook" = "1" ] && extra=(--settings "$HOOK_SETTINGS") + echo "----- [$label] run $idx -----" + # ${extra[@]+...} guard: bash 3.2 (macOS) under `set -u` errors on an empty + # array expansion otherwise, which would skip the no-hook arm's claude run. + ( cd "$tgt" && claude -p "$TASK" \ + --output-format stream-json --verbose --permission-mode bypassPermissions \ + --model opus --max-budget-usd 4 --strict-mcp-config --mcp-config "$c" ${extra[@]+"${extra[@]}"} \ + "$OUT/run-$label-$idx.jsonl" 2>"$OUT/run-$label-$idx.err" ) + node "$PARSE" "$OUT/run-$label-$idx.jsonl" 2>&1 | grep -E "by type|Result" || echo " (parse failed — see $OUT/run-$label-$idx.jsonl)" + pkill -9 -f "serve --mcp --path $tgt" 2>/dev/null + echo +} + +for i in $(seq 1 "$RUNS"); do run_one nohook "$i" 0; done +for i in $(seq 1 "$RUNS"); do run_one hook "$i" 1; done + +echo "###### DONE. Compare [nohook] vs [hook] 'by type' — Read should fall and" +echo "###### mcp__codegraph__codegraph_node should rise in the [hook] arm. Logs: $OUT" diff --git a/scripts/agent-eval/ab-impl.sh b/scripts/agent-eval/ab-impl.sh new file mode 100644 index 000000000..c5c23b58e --- /dev/null +++ b/scripts/agent-eval/ab-impl.sh @@ -0,0 +1,78 @@ +#!/usr/bin/env bash +# Sufficiency A/B for an IMPLEMENTATION task (the agent edits): when it uses +# codegraph (explore/node) to understand before editing, does it still Read? Like +# ab-sufficiency.sh but copies+indexes a FRESH target per run (the agent mutates +# it), so runs don't see each other's edits. +# +# WITH codegraph (pre-warmed) vs WITHOUT (empty MCP), N runs each. Reports +# explore/node vs Read/Grep + the files Read, and whether the build still passes. +# +# Usage: ab-impl.sh "" [runs] [build-cmd] +# Env: AGENT_EVAL_OUT (default: /tmp/ab-impl) +set -uo pipefail +REPO="${1:?usage: ab-impl.sh \"\" [runs] [build-cmd]}" +Q="${2:?task required}" +RUNS="${3:-2}" +BUILD_CMD="${4:-}" +ENGINE="$(cd "$(dirname "$0")/../.." && pwd)" +BIN="$ENGINE/dist/bin/codegraph.js" +OUT="${AGENT_EVAL_OUT:-/tmp/ab-impl}" +command -v claude >/dev/null || { echo "claude CLI not on PATH"; exit 1; } +[ -d "$REPO/.codegraph" ] || { echo "no .codegraph index at $REPO"; exit 1; } +cleanup(){ pkill -9 -f "serve --mcp --path $OUT/" 2>/dev/null; } +trap cleanup EXIT +mkdir -p "$OUT" +( cd "$ENGINE" && npm run build >/dev/null 2>&1 ) && echo "built engine" +echo "###### repo=$REPO runs/arm=$RUNS" +echo "###### task=$Q"; echo +echo '{"mcpServers":{}}' > "$OUT/mcp-empty.json" + +prewarm(){ + pkill -9 -f "serve --mcp --path $1" 2>/dev/null + CODEGRAPH_DAEMON_IDLE_TIMEOUT_MS=1800000 node "$BIN" serve --mcp --path "$1" /dev/null 2>&1 & + node -e 'const fs=require("fs");let n=0;const t=setInterval(()=>{if(fs.existsSync(process.argv[1]+"/.codegraph/daemon.sock")){clearInterval(t);process.exit(0)}if(n++>150){clearInterval(t);process.exit(1)}},100)' "$1" >/dev/null 2>&1 +} + +analyze(){ + node -e ' + const fs=require("fs"); + const L=fs.readFileSync(process.argv[1],"utf8").split("\n").filter(Boolean); + let ex=0,nf=0,ns=0,oc=0,gr=0,ed=0,exposed="?";const reads=[]; + for(const l of L){try{const o=JSON.parse(l); + if(o.type==="system"&&o.subtype==="init")exposed=(o.tools||[]).filter(t=>/codegraph/.test(t)).length; + for(const b of (o.message?.content||[])){if(b.type!=="tool_use")continue; + if(b.name==="mcp__codegraph__codegraph_explore")ex++; + else if(b.name==="mcp__codegraph__codegraph_node"){if(b.input&&b.input.symbol)ns++;else nf++;} + else if(/mcp__codegraph__/.test(b.name))oc++; + else if(b.name==="Read")reads.push((b.input?.file_path||"").split("/").pop()); + else if(b.name==="Grep")gr++; + else if(b.name==="Edit"||b.name==="Write")ed++; + }}catch{}} + console.log(` explore=${ex} node[sym]=${ns} node[file]=${nf} other_cg=${oc} | Read=${reads.length}${reads.length?" ("+reads.join(", ")+")":""} Grep=${gr} Edit=${ed} [cg exposed=${exposed}]`); + ' "$1" +} + +run(){ # label, withCodegraph(0/1) + local label="$1" wcg="$2" + for i in $(seq 1 "$RUNS"); do + local tgt="$OUT/t-$label-$i" cfg="$OUT/mcp-$label.json" + rm -rf "$tgt" + rsync -a --exclude node_modules --exclude .git --exclude dist --exclude .codegraph "$REPO/" "$tgt/" + node "$BIN" init "$tgt" >/dev/null 2>&1 + if [ "$wcg" = "1" ]; then + printf '{"mcpServers":{"codegraph":{"command":"env","args":["CODEGRAPH_WASM_RELAUNCHED=1","node","%s","serve","--mcp","--path","%s"]}}}' "$BIN" "$tgt" > "$cfg" + prewarm "$tgt" + else cp "$OUT/mcp-empty.json" "$cfg"; fi + ( cd "$tgt" && claude -p "$Q" --output-format stream-json --verbose \ + --permission-mode bypassPermissions --model opus --max-budget-usd 4 \ + --strict-mcp-config --mcp-config "$cfg" "$OUT/$label-$i.jsonl" 2>"$OUT/$label-$i.err" ) + echo "[$label] run $i:"; analyze "$OUT/$label-$i.jsonl" + if [ -n "$BUILD_CMD" ]; then ( cd "$tgt" && eval "$BUILD_CMD" >/dev/null 2>&1 && echo " build: PASS" || echo " build: FAIL" ); fi + pkill -9 -f "serve --mcp --path $tgt" 2>/dev/null + done + echo +} + +echo "== WITH codegraph =="; run with 1 +echo "== WITHOUT (Read/Grep only) =="; run without 0 +echo "###### DONE: $OUT" diff --git a/scripts/agent-eval/ab-sufficiency.sh b/scripts/agent-eval/ab-sufficiency.sh new file mode 100644 index 000000000..066253657 --- /dev/null +++ b/scripts/agent-eval/ab-sufficiency.sh @@ -0,0 +1,78 @@ +#!/usr/bin/env bash +# Sufficiency A/B: on a real understanding/flow question, WHEN the agent uses +# codegraph (explore/node), does it still Read? Premise under test: explore/node +# return source WITH line numbers, so a Read should not be needed. +# +# WITH codegraph (pre-warmed daemon, reliable nested attach) vs WITHOUT (empty +# MCP, Read/Grep only), N runs each, on a throwaway copy of the repo. Reports +# explore/node vs Read/Grep, and LISTS the files Read in the WITH arm so a true +# sufficiency gap (an indexed source file) is distinguishable from out-of-scope +# (configs, docs, a file codegraph didn't index). +# +# Usage: ab-sufficiency.sh "" [runs-per-arm] +# Env: AGENT_EVAL_OUT (default: /tmp/ab-sufficiency) +set -uo pipefail +REPO="${1:?usage: ab-sufficiency.sh \"\" [runs]}" +Q="${2:?question required}" +RUNS="${3:-2}" +ENGINE="$(cd "$(dirname "$0")/../.." && pwd)" +BIN="$ENGINE/dist/bin/codegraph.js" +OUT="${AGENT_EVAL_OUT:-/tmp/ab-sufficiency}" +TGT="$OUT/target" +command -v claude >/dev/null || { echo "claude CLI not on PATH"; exit 1; } +[ -d "$REPO/.codegraph" ] || { echo "no .codegraph index at $REPO"; exit 1; } +cleanup(){ pkill -9 -f "serve --mcp --path $TGT" 2>/dev/null; } +trap cleanup EXIT +mkdir -p "$OUT" +( cd "$ENGINE" && npm run build >/dev/null 2>&1 ) && echo "built" + +# Throwaway copy + fresh index (the agent works here; a read-only question won't +# edit, but isolate anyway). Excludes the source repo's index/build/vcs. +rm -rf "$TGT" +rsync -a --exclude node_modules --exclude .git --exclude dist --exclude .codegraph "$REPO/" "$TGT/" +node "$BIN" init "$TGT" >/dev/null 2>&1 && echo "indexed copy ($(node "$BIN" status --json 2>/dev/null | node -e 'let s="";process.stdin.on("data",d=>s+=d).on("end",()=>{try{console.log(JSON.parse(s).fileCount+" files")}catch{console.log("?")}})' 2>/dev/null || echo '?'))" + +echo "###### repo=$REPO runs/arm=$RUNS" +echo "###### Q=$Q"; echo +echo '{"mcpServers":{}}' > "$OUT/mcp-empty.json" +printf '{"mcpServers":{"codegraph":{"command":"env","args":["CODEGRAPH_WASM_RELAUNCHED=1","node","%s","serve","--mcp","--path","%s"]}}}' "$BIN" "$TGT" > "$OUT/mcp-cg.json" + +prewarm(){ + pkill -9 -f "serve --mcp --path $TGT" 2>/dev/null + CODEGRAPH_DAEMON_IDLE_TIMEOUT_MS=1800000 node "$BIN" serve --mcp --path "$TGT" /dev/null 2>&1 & + node -e 'const fs=require("fs");let n=0;const t=setInterval(()=>{if(fs.existsSync(process.argv[1]+"/.codegraph/daemon.sock")){clearInterval(t);process.exit(0)}if(n++>150){clearInterval(t);process.exit(1)}},100)' "$TGT" >/dev/null 2>&1 +} + +analyze(){ + node -e ' + const fs=require("fs"); + const L=fs.readFileSync(process.argv[1],"utf8").split("\n").filter(Boolean); + let ex=0,nf=0,ns=0,oc=0,gr=0,exposed="?";const reads=[]; + for(const l of L){try{const o=JSON.parse(l); + if(o.type==="system"&&o.subtype==="init")exposed=(o.tools||[]).filter(t=>/codegraph/.test(t)).length; + for(const b of (o.message?.content||[])){if(b.type!=="tool_use")continue; + if(b.name==="mcp__codegraph__codegraph_explore")ex++; + else if(b.name==="mcp__codegraph__codegraph_node"){if(b.input&&b.input.symbol)ns++;else nf++;} + else if(/mcp__codegraph__/.test(b.name))oc++; + else if(b.name==="Read")reads.push((b.input?.file_path||"").split("/").pop()); + else if(b.name==="Grep")gr++; + }}catch{}} + console.log(` explore=${ex} node[sym]=${ns} node[file]=${nf} other_cg=${oc} | Read=${reads.length}${reads.length?" ("+reads.join(", ")+")":""} Grep=${gr} [cg exposed=${exposed}]`); + ' "$1" +} + +run(){ # label, cfg, prewarm(0/1) + local label="$1" cfg="$2" pw="$3" + for i in $(seq 1 "$RUNS"); do + [ "$pw" = "1" ] && prewarm + ( cd "$TGT" && claude -p "$Q" --output-format stream-json --verbose \ + --permission-mode bypassPermissions --model opus --max-budget-usd 4 \ + --strict-mcp-config --mcp-config "$cfg" "$OUT/$label-$i.jsonl" 2>"$OUT/$label-$i.err" ) + echo "[$label] run $i:"; analyze "$OUT/$label-$i.jsonl" + done + echo +} + +echo "== WITH codegraph (premise: explore/node used -> Read ~0) =="; run with "$OUT/mcp-cg.json" 1 +echo "== WITHOUT (Read/Grep only — the contrast) =="; run without "$OUT/mcp-empty.json" 0 +echo "###### DONE. In the WITH arm: are explore/node>0 and Read~0? Any Read of an INDEXED source file = sufficiency gap. Logs: $OUT" diff --git a/scripts/agent-eval/redirect-read-hook.sh b/scripts/agent-eval/redirect-read-hook.sh new file mode 100755 index 000000000..3dce75652 --- /dev/null +++ b/scripts/agent-eval/redirect-read-hook.sh @@ -0,0 +1,38 @@ +#!/usr/bin/env bash +# PreToolUse(Read) REDIRECT hook — prototype for A/B (P1: get agents off Read and +# onto codegraph_node during implementation, not just for Q&A). +# +# When the agent Reads a SOURCE file, deny it and steer to codegraph_node's +# file-view, which (as of the Lever-1 change) returns the WHOLE file verbatim +# WITH line numbers — imports, top-level code, comments and all — PLUS the file's +# blast radius, in one call. That output is a strict superset of Read, so the +# redirect is lossless: the agent loses nothing by taking it, and gains who- +# depends-on-this for the edit it's about to make. +# +# Differs from block-read-hook.sh (which steers to explore/node-by-symbol): this +# names the FILE-VIEW path explicitly (file:"" + includeCode:true), the +# 1:1 Read replacement we're trying to get picked during implementation. +# +# Non-source files (configs, docs, lockfiles, .env) pass through to a real Read. +# A redirect to a file codegraph hasn't indexed SELF-CORRECTS: the file-view +# replies "No indexed file matches … Read it directly", so a just-created file +# never dead-ends — the agent Reads it on the next turn. +# +# Wire via: claude ... --settings +# Eval artifact only. The production version is an indexed-aware `codegraph` +# subcommand (cross-platform — no bash/jq — and queries the index so it never +# bounces a new/un-indexed file), wired opt-in by the installer. +set -uo pipefail +input="$(cat)" +fp="$(printf '%s' "$input" | jq -r '.tool_input.file_path // empty' 2>/dev/null)" +[ -n "$fp" ] || exit 0 +base="$(basename "$fp")" + +case "$fp" in + *.ts|*.tsx|*.js|*.jsx|*.mjs|*.cjs|*.py|*.go|*.rs|*.java|*.rb|*.php|*.swift|*.kt|*.kts|*.scala|*.c|*.cc|*.cpp|*.h|*.hpp|*.cs|*.lua|*.vue|*.svelte|*.m|*.mm) + msg="codegraph has this file indexed (kept in sync on every edit). Call codegraph_node with file:\"$base\" and includeCode:true instead of Read — it returns the WHOLE file verbatim WITH line numbers (imports, top-level code and all — safe to base an Edit on) PLUS which files depend on it, in one call. Treat its output as already-Read; do not Read this file. (If it answers that the file isn't indexed — e.g. you just created it — then Read it directly.)" + jq -n --arg m "$msg" '{hookSpecificOutput:{hookEventName:"PreToolUse",permissionDecision:"deny",permissionDecisionReason:$m}}' + exit 0 + ;; +esac +exit 0 diff --git a/src/mcp/server-instructions.ts b/src/mcp/server-instructions.ts index 5e197a8a9..b246899c7 100644 --- a/src/mcp/server-instructions.ts +++ b/src/mcp/server-instructions.ts @@ -48,7 +48,8 @@ typically one to a few calls; a grep/read exploration is dozens. - **"How does X reach/become Y? / the flow / the path from X to Y"** → \`codegraph_explore\`, naming the symbols that span the flow (e.g. \`mutateElement renderScene\`) — it surfaces the call path among them, including dynamic-dispatch hops (callbacks, React re-render, JSX children) grep can't follow - **"What is the symbol named X?" (just its location)** → \`codegraph_search\` - **"What calls this?" / "What does this call?" / "What would changing this break?"** → \`codegraph_callers\` / \`codegraph_callees\` / \`codegraph_impact\` -- **About to read or edit a symbol you can name** → \`codegraph_node\` (SECONDARY — the after-explore depth tool) instead of \`Read\`: it returns the **verbatim current on-disk source** (safe to base an \`Edit\` on) PLUS its caller/callee trail — the same bytes Read gives you, plus who calls it and what your change would break, for fewer tokens. For an OVERLOADED name it returns EVERY matching definition's body in one call, so you never Read a file to find the right overload. Or pass a FILE PATH alone (no symbol) to get that whole file's symbol map + what depends on it — a Read replacement for a source file +- **Reading a source FILE (any time you'd use the \`Read\` tool)** → \`codegraph_node\` with a \`file\` path and no \`symbol\`. It returns the file's **current source with line numbers — the same \`\\t\` shape \`Read\` gives you, safe to \`Edit\` from** — narrowable with \`offset\`/\`limit\` exactly like \`Read\`, PLUS a one-line note of which files depend on it. Same bytes as \`Read\`, faster (served from the index), with the blast radius attached. Use it **instead of \`Read\`** for indexed source files; fall back to \`Read\` only for what codegraph doesn't index (configs, docs). Pass \`symbolsOnly: true\` for just the file's structure. +- **About to read or edit a symbol you can name** → \`codegraph_node\` with that \`symbol\` (SECONDARY — the after-explore depth tool): the verbatim source (\`includeCode: true\`) PLUS its caller/callee trail, so before changing it you see what calls it and what your edit would break. For an OVERLOADED name it returns EVERY matching definition's body in one call, so you never Read a file to find the right overload - **"What's in directory X?"** → \`codegraph_files\` - **"Is the index ready / what's its size?"** → \`codegraph_status\` @@ -65,7 +66,7 @@ typically one to a few calls; a grep/read exploration is dozens. - **Don't grep first** when looking up a symbol by name — \`codegraph_search\` is faster and returns kind + location + signature. - **Don't chain \`codegraph_search\` + \`codegraph_node\`** to understand an area — ONE \`codegraph_explore\` returns the relevant symbols' source together in a single round-trip. - **Don't loop \`codegraph_node\` over many symbols** — one \`codegraph_explore\` call returns them all grouped by file, while each separate call re-reads the whole context and costs far more. Use \`codegraph_node\` for a single symbol. -- **Don't \`Read\` a file just to see or edit a symbol you can name** — \`codegraph_node\` returns the same current source plus its caller/callee trail in one call, for fewer tokens. Reach for raw \`Read\` only for what codegraph doesn't index (configs, docs) or when the staleness banner flags a file as pending re-index. +- **Don't reach for the \`Read\` tool on an indexed source file** — \`codegraph_node\` with a \`file\` reads it for you (same \`\\t\` source, \`offset\`/\`limit\` like Read, faster, with its blast radius), and with a \`symbol\` it returns the source plus the caller/callee trail. Reach for raw \`Read\` only for what codegraph doesn't index (configs, docs) or when the staleness banner flags a file as pending re-index. - **After editing, check the staleness banner.** When a tool response starts with "⚠️ Some files referenced below were edited since the last index sync…", the listed files are pending re-index — Read those specific files for accurate content. Every file NOT in that banner is fresh, so still trust codegraph. \`codegraph_status\` also lists pending files under "Pending sync". ## Limitations diff --git a/src/mcp/tools.ts b/src/mcp/tools.ts index 8e1f47066..8a696fbc5 100644 --- a/src/mcp/tools.ts +++ b/src/mcp/tools.ts @@ -26,7 +26,7 @@ import { existsSync, readFileSync, } from 'fs'; -import { clamp, validatePathWithinRoot, validateProjectPath, isConfigLeafNode } from '../utils'; +import { clamp, validatePathWithinRoot, validateProjectPath, isConfigLeafNode, CONFIG_LEAF_LANGUAGES } from '../utils'; import { isGeneratedFile } from '../extraction/generated-detection'; import { resolve as resolvePath } from 'path'; @@ -463,26 +463,39 @@ export const tools: ToolDefinition[] = [ }, { name: 'codegraph_node', - description: 'SECONDARY (after codegraph_explore): the Read upgrade for ONE symbol you can name. Returns its location, signature, the verbatim CURRENT on-disk source (includeCode=true — the same bytes Read would give you, safe to base an Edit on), AND its caller/callee trail in a single call — so before changing a symbol you already see what calls it and what your edit would break, for fewer tokens than reading the file. Prefer it over Read whenever you know the symbol name. Or pass `file` ALONE (no symbol) to get that whole source file\'s symbol map + what depends on it — a Read replacement for a file. When the name is AMBIGUOUS (an overloaded method, or the same name on different types) it returns EVERY matching definition\'s full body in one call — so you never Read a file to find the right overload; pass `file` (and/or `line`) to pin one. Use codegraph_explore for several related symbols or the full flow.', + description: 'Two modes. (1) READ A FILE — use INSTEAD of the Read tool: pass `file` (a path or basename) with no `symbol` and it returns that file\'s current on-disk source with line numbers, exactly the shape Read gives you (`\\t`, safe to Edit from), narrowable with `offset`/`limit` just like Read — PLUS a one-line note of which files depend on it. Same bytes as Read, faster (served from the index), with the blast radius attached. Use it whenever you would Read a source file. (2) ONE SYMBOL you can name — its location, signature, verbatim source (includeCode=true) and caller/callee trail in one call, so before changing it you see what calls it and what your edit would break. For an AMBIGUOUS name it returns EVERY matching definition\'s body in one call (so you never Read a file to find the right overload); pass `file`/`line` to pin one. Use codegraph_explore for several related symbols or the full flow.', inputSchema: { type: 'object', properties: { symbol: { type: 'string', - description: 'Name of the symbol to get details for. Omit it and pass `file` alone to get the whole file\'s symbols + dependents (a Read replacement).', + description: 'Name of the symbol to read (symbol mode). Omit it and pass `file` alone to read a whole file like Read.', }, includeCode: { type: 'boolean', - description: 'Include full source bodies (default: false to minimize context). In file mode, returns every symbol\'s body up to a size budget.', + description: 'Symbol mode: include the symbol\'s full body (default: false). Ignored in file mode, which always returns source unless `symbolsOnly` is set.', default: false, }, file: { type: 'string', - description: 'A file path or basename (e.g. "harness.rs", "src/auth/session.ts"). Pass it ALONE (no symbol) to get that whole file\'s symbol map + dependents — a Read replacement. Or pass it WITH a symbol to disambiguate an overloaded name to the definition in this file.', + description: 'A file path or basename (e.g. "harness.rs", "src/auth/session.ts"). Pass it ALONE (no symbol) to READ the file like the Read tool — its full source with line numbers + which files depend on it. Or pass it WITH a symbol to disambiguate an overloaded name to the definition in this file.', + }, + offset: { + type: 'number', + description: 'File mode: 1-based line to start reading from, exactly like Read\'s offset. Defaults to the start of the file.', + }, + limit: { + type: 'number', + description: 'File mode: maximum number of lines to return, exactly like Read\'s limit. Defaults to the whole file (capped at 2000 lines, like Read).', + }, + symbolsOnly: { + type: 'boolean', + description: 'File mode: return just the file\'s symbol map + dependents (a cheap structural overview) instead of its source.', + default: false, }, line: { type: 'number', - description: 'Optional: disambiguate to the definition at/around this line (use with the file:line a trail showed you).', + description: 'Symbol mode only: disambiguate to the definition at/around this line (use with the file:line a trail showed you).', }, projectPath: projectPathProperty, }, @@ -2527,14 +2540,18 @@ export class ToolHandler { const includeCode = args.includeCode === true; const fileHint = typeof args.file === 'string' && args.file.trim() ? args.file.trim() : undefined; const lineHint = typeof args.line === 'number' && args.line > 0 ? args.line : undefined; + const offset = typeof args.offset === 'number' && args.offset > 0 ? Math.floor(args.offset) : undefined; + const limit = typeof args.limit === 'number' && args.limit > 0 ? Math.floor(args.limit) : undefined; + const symbolsOnly = args.symbolsOnly === true; const symbolRaw = typeof args.symbol === 'string' ? args.symbol.trim() : ''; - // FILE-VIEW MODE: a bare `file` with no `symbol` returns that file's symbol - // map + graph role (which files depend on it) — and, with includeCode, the - // bodies. A Read replacement for "show me file X" that also surfaces the - // blast radius, so an edit is made with impact in view. + // FILE READ MODE: a `file` with no `symbol` reads that file like the Read + // tool — its current on-disk source with line numbers, narrowable with + // `offset`/`limit` exactly as Read does — PLUS a one-line blast-radius + // header (which files depend on it). `symbolsOnly` returns just the + // structural map instead. Backed by the index: same bytes Read gives you. if (!symbolRaw && fileHint) { - return this.handleFileView(cg, fileHint, includeCode); + return this.handleFileView(cg, fileHint, { offset, limit, symbolsOnly }); } const symbol = this.validateString(args.symbol, 'symbol'); @@ -2634,11 +2651,23 @@ export class ToolHandler { } /** - * FILE-VIEW: resolve `fileArg` (path or basename) to an indexed file and - * return its symbol map + graph role (which files depend on it), plus bodies - * when `includeCode`. A Read replacement that also surfaces the blast radius. + * FILE READ MODE: resolve `fileArg` (path or basename) to an indexed file and + * read it like the Read tool — its current on-disk source with line numbers, + * narrowable with `offset`/`limit` exactly as Read's are — preceded by a + * one-line blast-radius header (which files depend on it). `symbolsOnly` + * returns just the structural map (symbols + dependents) instead of source. + * + * Parity goal: the numbered source block is byte-for-byte the shape Read + * returns (`\t`, no padding), so the agent treats it as a Read — only + * faster (served from the index) and with the blast radius attached. Security: + * yaml/properties files are summarized by key, never dumped (#383); reads go + * through validatePathWithinRoot (#527). */ - private async handleFileView(cg: CodeGraph, fileArg: string, includeCode: boolean): Promise { + private async handleFileView( + cg: CodeGraph, + fileArg: string, + opts: { offset?: number; limit?: number; symbolsOnly?: boolean } = {}, + ): Promise { const normalize = (p: string) => p.replace(/\\/g, '/').replace(/^(?:\.?\/+)+/, '').replace(/\/+$/, ''); const wantLower = normalize(fileArg).toLowerCase(); const allFiles = cg.getFiles(); @@ -2672,62 +2701,96 @@ export class ToolHandler { .sort((a, b) => a.startLine - b.startLine); const dependents = cg.getFileDependents(filePath); - const out: string[] = [`**${filePath}** — ${nodes.length} symbol${nodes.length === 1 ? '' : 's'}`]; - if (dependents.length) { - out.push( - `Depended on by ${dependents.length} file${dependents.length === 1 ? '' : 's'}` + - `${dependents.length > 8 ? ' (first 8)' : ''}: ${dependents.slice(0, 8).join(', ')}${dependents.length > 8 ? ', …' : ''}`, - '> Editing a symbol here can affect those files — run codegraph_impact on the specific symbol for its exact blast radius.', - ); - } else { - out.push('No other indexed file depends on this one.'); - } - out.push(''); + // Compact, one-line blast radius (codegraph's value-add over a plain Read). + const depSummary = dependents.length + ? `used by ${dependents.length} file${dependents.length === 1 ? '' : 's'}: ${dependents.slice(0, 8).join(', ')}${dependents.length > 8 ? `, +${dependents.length - 8} more` : ''}` + : 'no other indexed file depends on it'; + + // Symbol-map renderer — for symbolsOnly, the config fallback, and read errors. + const symbolMap = (heading: string, limit = 200): string[] => { + const lines: string[] = [heading]; + for (const n of nodes.slice(0, limit)) { + const sig = n.signature ? ` ${n.signature.replace(/\s+/g, ' ').trim()}` : ''; + lines.push(`- \`${n.name}\` (${n.kind})${sig} — :${n.startLine}`); + } + if (nodes.length > limit) lines.push(`- … +${nodes.length - limit} more`); + return lines; + }; - if (nodes.length === 0) { - out.push('_No indexed symbols in this file (codegraph may track it but not parse it for symbols)._'); + // symbolsOnly → the cheap structural overview, no source. + if (opts.symbolsOnly) { + const out = [`**${filePath}** — ${nodes.length} symbol${nodes.length === 1 ? '' : 's'}, ${depSummary}`, '']; + if (nodes.length) out.push(...symbolMap('### Symbols')); + else out.push('_No indexed symbols in this file._'); + out.push('', '> Drop `symbolsOnly` (or pass `offset`/`limit`) to read the source, like Read.'); return this.textResult(this.truncateOutput(out.join('\n'))); } - if (!includeCode) { - out.push('### Symbols'); - for (const n of nodes) { - const sig = n.signature ? ` ${n.signature.replace(/\s+/g, ' ').trim()}` : ''; - out.push(`- \`${n.name}\` (${n.kind})${sig} — :${n.startLine}`); - } - out.push('', '> Call again with `includeCode:true` for the bodies, or `codegraph_node ` for one symbol in full.'); + // SECURITY (#383): never dump a raw config/data file — a yaml/properties + // line is `key: `. Summarize by key and point to a real Read. + if (CONFIG_LEAF_LANGUAGES.has(resolved.language)) { + const out = [`**${filePath}** — configuration/data file, ${depSummary}`, '']; + if (nodes.length) out.push(...symbolMap('### Keys (values withheld for safety)')); + out.push('', '> Values may be secrets, so codegraph indexes keys only. Read the file directly if you need a value.'); return this.textResult(this.truncateOutput(out.join('\n'))); } - // Render each OUTERMOST symbol's verbatim body (a container's body already - // includes its members, so skip anything covered) — no duplication, and no - // "read the file" container outline. Budget-capped. - out.push('### Source (verbatim — treat as already Read)'); - const BODY_BUDGET = 14000; - const outermost = [...nodes].sort((a, b) => - a.startLine - b.startLine || (b.endLine ?? b.startLine) - (a.endLine ?? a.startLine)); - const covered: Array<[number, number]> = []; - let used = out.join('\n').length; - const listed: Node[] = []; - for (const n of outermost) { - const end = n.endLine ?? n.startLine; - if (covered.some(([s, e]) => s <= n.startLine && e >= end)) continue; - const code = await cg.getCode(n.id); - if (!code) continue; - const section = `#### \`${n.name}\` (${n.kind}) — :${n.startLine}\n\`\`\`\n${code}\n\`\`\``; - if (used + section.length <= BODY_BUDGET || used < 1500) { - out.push('', section); - used += section.length; - covered.push([n.startLine, end]); - } else { - listed.push(n); - } + // Read the current bytes from disk through the security chokepoint + // (validatePathWithinRoot: blocks `../` traversal and symlink escapes, #527). + const abs = validatePathWithinRoot(cg.getProjectRoot(), filePath); + let content: string | null = null; + if (abs) { + try { content = readFileSync(abs, 'utf-8'); } catch { content = null; } } - if (listed.length) { - out.push('', `### ${listed.length} more symbol${listed.length === 1 ? '' : 's'} (over the size budget — fetch with codegraph_node )`, - ...listed.slice(0, 30).map((n) => `- \`${n.name}\` (${n.kind}) — :${n.startLine}`)); + if (content === null) { + const out = [`**${filePath}** — could not read from disk (it may have moved since indexing). ${depSummary}`, '']; + if (nodes.length) out.push(...symbolMap('### Symbols')); + out.push('', `> Read \`${filePath}\` directly for its current content.`); + return this.textResult(this.truncateOutput(out.join('\n'))); } - return this.textResult(this.truncateOutput(out.join('\n'))); + + // Split exactly as Read does — keep the trailing empty line a final newline + // produces (Read numbers it too), so line numbers line up byte-for-byte. + const fileLines = content.split('\n'); + const total = fileLines.length; + + // Read-parity windowing: `offset`/`limit` mean exactly what they do on Read + // (1-based start line; max line count). Default: the whole file, capped like + // Read at 2000 lines and bounded by a char budget that tracks explore's + // proven-safe ~38k response ceiling. Overflow is stated explicitly (Read + // paginates too) — never the silent 15k truncateOutput chop. + const CHAR_BUDGET = 38000; + const DEFAULT_LIMIT = 2000; + const offset = Math.max(1, opts.offset ?? 1); + if (offset > total) { + return this.textResult(`**${filePath}** has ${total} line${total === 1 ? '' : 's'} — offset ${offset} is past the end. ${depSummary}`); + } + const maxLines = Math.max(1, opts.limit ?? DEFAULT_LIMIT); + const start = offset - 1; // 0-based + const header = `**${filePath}** — ${total} lines, ${nodes.length} symbol${nodes.length === 1 ? '' : 's'} · ${depSummary}`; + + // Numbered lines, byte-for-byte Read's shape: `\t`, no left-pad. + const numbered: string[] = []; + let used = header.length + 8; + let i = start; + for (; i < total && numbered.length < maxLines; i++) { + const ln = `${i + 1}\t${fileLines[i]}`; + if (used + ln.length + 1 > CHAR_BUDGET && numbered.length > 0) break; + numbered.push(ln); + used += ln.length + 1; + } + const shownEnd = start + numbered.length; + const complete = offset === 1 && shownEnd >= total; + + const out: string[] = [header, '', ...numbered]; + if (!complete) { + out.push( + '', + `(lines ${offset}–${shownEnd} of ${total} — pass \`offset\`/\`limit\` for another range, or \`codegraph_node \` for one symbol in full)`, + ); + } + // Self-bounded to CHAR_BUDGET — do NOT route through truncateOutput (15k). + return this.textResult(out.join('\n')); } /** Render one symbol: details + (optional) body/outline + its caller/callee trail. */ From 636d9fcb7d797485d059f38053aefa8b7f98544c Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Mon, 8 Jun 2026 19:05:36 -0400 Subject: [PATCH 03/51] feat(extraction): index string-literal names in generic tuple type aliases (#634) (#740) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit TypeScript service/RPC contracts written as a tuple of generic types — `type List = [Service<'query_apply_record', Req, Resp>, …]` — carry their names only as string-literal type arguments, so static extraction never indexed them and `codegraph query query_apply_record` returned nothing. Add a narrow TS/TSX type-alias pass that emits each tuple entry's string-literal name as a `method` node under the alias (qualifiedName `List::query_apply_record`), making it searchable. Scope is limited to a direct literal arg of a generic that is a direct tuple element, with a valid-identifier filter — so utility types (Pick/Omit/Record), deeper nested generics, and route paths produce no noise. Bumps EXTRACTION_VERSION so existing indexes get a re-index hint. Co-authored-by: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 1 + __tests__/extraction.test.ts | 53 ++++++++++++++++++++ src/extraction/extraction-version.ts | 2 +- src/extraction/tree-sitter.ts | 72 ++++++++++++++++++++++++++++ 4 files changed, 127 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a94dadb45..54bfc8951 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -24,6 +24,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). - ASP.NET Razor (`.cshtml`) and Blazor (`.razor`) markup are now parsed for code relationships. A `@model` / `@inherits` / `@inject` directive links the view to the C# view-model, base type, or service it names; a Blazor `` tag (plus `@typeof(...)` and generic `TItem="..."` arguments) links to the component class; and the C# inside `@code { }` / `@functions { }` / `@{ }` blocks is analyzed too, so services and types used in component logic are linked. A view-model, component, or service referenced only from markup is no longer reported as having no dependents, and editing it surfaces the views that use it. (ASP.NET, Blazor) - A Razor/Blazor type reference now resolves through the component's `@using` namespaces — including the folder's cascading `_Imports.razor` — so a simple name that exists in several namespaces lands on the right one. A `@model` / `` / `@code` reference to `CatalogBrand` resolves to the `@using`'d DTO (`BlazorShared.Models.CatalogBrand`) rather than a same-named domain entity. (ASP.NET, Blazor) - `codegraph status --json` now also reports the running CLI `version`, the index directory (`indexPath`), and a `lastIndexed` timestamp (ISO-8601, or null when nothing's indexed yet), so CI and scripts can pin the CLI version and check index freshness from a single command. A matching `CodeGraph.getLastIndexedAt()` library method exposes the same freshness check without shelling out. Thanks @12122J and @eddieran. (#329) +- TypeScript service/RPC contracts defined as a tuple of generic types — `type MyServiceList = [Service<'query_apply_record', …>, Service<'apply_confirm', …>]` — now index each entry's string-literal name as a searchable symbol. Previously these names existed only as type arguments, so `codegraph query query_apply_record` found nothing even though the names are the app's primary API surface. The pattern is common in typed RPC / BFF clients and mock servers where the types are the source of truth for a runtime proxy object. Utility types (`Pick`, `Omit`, `Record`) and route paths are deliberately left out to avoid noise. Thanks @jiezhiyong. (#634) (TypeScript) ### Fixes diff --git a/__tests__/extraction.test.ts b/__tests__/extraction.test.ts index 6d9af6066..89918c58b 100644 --- a/__tests__/extraction.test.ts +++ b/__tests__/extraction.test.ts @@ -451,6 +451,59 @@ type Internal = string; expect(exported).toHaveLength(2); expect(exported.map((n) => n.name).sort()).toEqual(['DateFormat', 'UnitSystem']); }); + + // A service/contract registry written as a tuple of generic instantiations — + // the names are string-literal type arguments, not declarations, so static + // extraction otherwise never indexes them (issue #634). + it('extracts string-literal contract names from a generic tuple type alias (#634)', () => { + const code = ` +interface Service { name: Name; } +export type MyServiceList = [ + Service<'query_apply_record', { pageNo: number }, { ok: boolean }>, + Service<'apply_confirm', { code: string }, { ok: boolean }> +]; +`; + const result = extractFromSource('services/api.ts', code); + + const names = result.nodes.filter( + (n) => n.kind === 'method' && n.qualifiedName.startsWith('MyServiceList::') + ); + expect(names.map((n) => n.name).sort()).toEqual(['apply_confirm', 'query_apply_record']); + + const queryNode = names.find((n) => n.name === 'query_apply_record'); + expect(queryNode?.qualifiedName).toBe('MyServiceList::query_apply_record'); + // Signature carries the full contract entry so search results show context. + expect(queryNode?.signature).toContain("Service<'query_apply_record'"); + + // The string-literal name is contained by the type alias. + const alias = result.nodes.find((n) => n.kind === 'type_alias' && n.name === 'MyServiceList'); + const containsEdge = result.edges.find( + (e) => e.kind === 'contains' && e.source === alias?.id && e.target === queryNode?.id + ); + expect(containsEdge).toBeDefined(); + }); + + it('does not extract string literals from utility types or nested generics (#634)', () => { + const code = ` +interface User { id: string; name: string; } +interface Service { name: Name; } +export type Picked = Pick; +export type Rec = Record<'foo' | 'bar', number>; +// Tuple entry, but the name is a non-identifier route path; the nested Pick's +// 'id' must also stay out (only DIRECT literal args of a tuple's generic count). +export type Routes = [Service<'/api/users', Pick, {}>]; +// Bare string-literal tuple — not generic type arguments. +export type Names = ['alpha', 'beta']; +`; + const result = extractFromSource('noise.ts', code); + + const leaked = result.nodes.filter( + (n) => + (n.kind === 'method' || n.kind === 'property') && + ['id', 'name', 'foo', 'bar', 'alpha', 'beta'].includes(n.name) + ); + expect(leaked).toEqual([]); + }); }); describe('Exported Variable Extraction', () => { diff --git a/src/extraction/extraction-version.ts b/src/extraction/extraction-version.ts index 5ca8f2d24..e216292cc 100644 --- a/src/extraction/extraction-version.ts +++ b/src/extraction/extraction-version.ts @@ -21,4 +21,4 @@ * turns the re-index hint into noise — keep it honest (see CLAUDE.md, "Honesty * in the product is load-bearing"). */ -export const EXTRACTION_VERSION = 1; +export const EXTRACTION_VERSION = 2; diff --git a/src/extraction/tree-sitter.ts b/src/extraction/tree-sitter.ts index f237efc5d..6febee652 100644 --- a/src/extraction/tree-sitter.ts +++ b/src/extraction/tree-sitter.ts @@ -1677,6 +1677,9 @@ export class TreeSitterExtractor { // an unrelated class method picked by path-proximity (#359). if (this.language === 'typescript' || this.language === 'tsx') { this.extractTsTypeAliasMembers(value, typeAliasNode); + // `type List = [ Service<'name', Req, Resp>, … ]` — surface each + // entry's string-literal name as a searchable member (issue #634). + this.extractTsTupleContractNames(value, typeAliasNode); } } } @@ -1763,6 +1766,75 @@ export class TreeSitterExtractor { this.nodeStack.pop(); } + /** + * Surface the string-literal "names" of a TypeScript service/contract + * registry written as a tuple of generic instantiations: + * + * type MyServiceList = [ + * Service<'query_apply_record', Req, Resp>, + * Service<'apply_confirm', Req, Resp>, + * ]; + * + * Each `Service<'name', …>` tags an entry with a string-literal name that a + * dynamic factory (`createService()`) turns into a callable + * property (`api.query_apply_record(…)`). Static extraction otherwise never + * sees that name — it's a type argument, not a declaration — so + * `codegraph query query_apply_record` returned nothing (issue #634). We emit + * each name as a `method` node under the type alias (qualifiedName + * `MyServiceList::query_apply_record`) so it's searchable and resolvable as a + * symbol. (A call through the proxy, `api.query_apply_record(…)`, still + * resolves to the imported `api` binding — the receiver's type isn't known — + * so this fixes discoverability, not the per-method call edge.) + * + * Scope is deliberately narrow to avoid noise: only a string literal that is + * a DIRECT type argument of a `generic_type` that is itself a DIRECT element + * of a `tuple_type`. This excludes utility types (`Pick`/`Omit`/`Record` are + * never written as tuples) and string args nested deeper + * (`Service<'a', Pick>` yields only `a`, never `id`). Names must be + * valid identifiers, which also rules out route paths / arbitrary strings. + */ + private extractTsTupleContractNames(value: SyntaxNode, typeAliasNode: Node): void { + const tuples: SyntaxNode[] = []; + const collectTuples = (n: SyntaxNode, depth: number): void => { + if (depth > 6) return; // a type expression is shallow; cap defensively + if (n.type === 'tuple_type') tuples.push(n); + for (let i = 0; i < n.namedChildCount; i++) { + const c = n.namedChild(i); + if (c) collectTuples(c, depth + 1); + } + }; + collectTuples(value, 0); + if (tuples.length === 0) return; + + this.nodeStack.push(typeAliasNode.id); + for (const tuple of tuples) { + for (let i = 0; i < tuple.namedChildCount; i++) { + const entry = tuple.namedChild(i); + if (!entry || entry.type !== 'generic_type') continue; + const typeArgs = getChildByField(entry, 'type_arguments'); + if (!typeArgs) continue; + for (let j = 0; j < typeArgs.namedChildCount; j++) { + const arg = typeArgs.namedChild(j); + if (!arg || arg.type !== 'literal_type') continue; + // literal_type wraps the actual literal; only a string is a name. + const strNode = arg.namedChild(0); + if (!strNode || strNode.type !== 'string') continue; + const name = getNodeText(strNode, this.source) + .trim() + .replace(/^['"`]/, '') + .replace(/['"`]$/, ''); + if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(name)) continue; + const signature = getNodeText(entry, this.source).replace(/\s+/g, ' ').trim().slice(0, 120); + this.createNode('method', name, entry, { + signature, + qualifiedName: `${typeAliasNode.name}::${name}`, + }); + } + } + } + this.nodeStack.pop(); + } + /** * `foo: () => T` → property_signature whose type_annotation contains a * `function_type`. Treat that as a method-shaped contract member, since From a56d9e6941ce7233a0fb9c7f5e1c225a50c97b36 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Mon, 8 Jun 2026 19:31:50 -0400 Subject: [PATCH 04/51] feat(directory): CODEGRAPH_DIR env var to override the index dir name (#636) (#741) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two environments that share one working tree — most concretely Windows and WSL — can't safely share a single `.codegraph/`: the daemon lockfile records a platform-specific pid + socket (named pipe vs Unix socket), and SQLite locking across the WSL2/Windows filesystem boundary is unreliable, so two daemons over one index risks corruption. Add a `CODEGRAPH_DIR` env var (default `.codegraph`) that overrides the per-project data directory name, so each environment keeps its own index in the same tree (e.g. `CODEGRAPH_DIR=.codegraph-win` on Windows). The name is resolved live and validated (rejects separators / `..` / absolute, falling back to the default with a one-time stderr warning). Indexing and file-watching now skip ANY `.codegraph-*` sibling so neither side trips over the other's data. Routes the previously-hardcoded `.codegraph` literals (db path, lockfile, error log, watcher ignore, file-scan skip, installer) through the resolver. No extraction-version bump — index content is unchanged. Co-authored-by: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 1 + README.md | 2 + __tests__/foundation.test.ts | 92 +++++++++++++++++++++++++++++++++++- src/bin/codegraph.ts | 4 +- src/db/index.ts | 3 +- src/directory.ts | 72 ++++++++++++++++++++++++++-- src/extraction/index.ts | 6 ++- src/index.ts | 3 +- src/installer/index.ts | 5 +- src/sync/watcher.ts | 7 ++- 10 files changed, 182 insertions(+), 13 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 54bfc8951..faf31754b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -25,6 +25,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). - A Razor/Blazor type reference now resolves through the component's `@using` namespaces — including the folder's cascading `_Imports.razor` — so a simple name that exists in several namespaces lands on the right one. A `@model` / `` / `@code` reference to `CatalogBrand` resolves to the `@using`'d DTO (`BlazorShared.Models.CatalogBrand`) rather than a same-named domain entity. (ASP.NET, Blazor) - `codegraph status --json` now also reports the running CLI `version`, the index directory (`indexPath`), and a `lastIndexed` timestamp (ISO-8601, or null when nothing's indexed yet), so CI and scripts can pin the CLI version and check index freshness from a single command. A matching `CodeGraph.getLastIndexedAt()` library method exposes the same freshness check without shelling out. Thanks @12122J and @eddieran. (#329) - TypeScript service/RPC contracts defined as a tuple of generic types — `type MyServiceList = [Service<'query_apply_record', …>, Service<'apply_confirm', …>]` — now index each entry's string-literal name as a searchable symbol. Previously these names existed only as type arguments, so `codegraph query query_apply_record` found nothing even though the names are the app's primary API surface. The pattern is common in typed RPC / BFF clients and mock servers where the types are the source of truth for a runtime proxy object. Utility types (`Pick`, `Omit`, `Record`) and route paths are deliberately left out to avoid noise. Thanks @jiezhiyong. (#634) (TypeScript) +- New `CODEGRAPH_DIR` environment variable sets the per-project index directory name (default `.codegraph`). This lets one working tree hold two independent indexes — most usefully when you open the same checkout from both **Windows** and **WSL**, which can't safely share a single `.codegraph/`: the background-server lock and the SQLite database are tied to the OS that wrote them, and SQLite locking across the WSL2/Windows filesystem boundary is unreliable. Set `CODEGRAPH_DIR=.codegraph-win` on the Windows side, leave WSL on the default, and each keeps its own index in the same folder without clobbering the other. CodeGraph also skips any sibling `.codegraph-*` directory when indexing and watching, so neither environment trips over the other's data. Thanks @rrtt2323. (#636) ### Fixes diff --git a/README.md b/README.md index 7c7b84a9e..ab147b548 100644 --- a/README.md +++ b/README.md @@ -684,6 +684,8 @@ Framework routing is validated the same way, on a canonical app per framework: E **Missing symbols** — The MCP server auto-syncs on save (wait a couple seconds). Run `codegraph sync` manually if needed. Check that the file's language is supported and isn't inside a `.gitignore`d or default-excluded directory (e.g. `node_modules`, `dist`). +**Sharing one checkout between Windows and WSL** — Don't point both at the same `.codegraph/`: the background-server lock and the SQLite index are tied to the OS that wrote them, and SQLite locking across the WSL2/Windows filesystem boundary is unreliable. Give each side its own index in the same tree by setting `CODEGRAPH_DIR` to a distinct name on one of them — e.g. `CODEGRAPH_DIR=.codegraph-win` on Windows, leaving WSL on the default `.codegraph`. CodeGraph skips any sibling `.codegraph-*` directory when indexing and watching, so the two never trip over each other. + ## Star History diff --git a/__tests__/foundation.test.ts b/__tests__/foundation.test.ts index 9933cf8c5..12b11c21f 100644 --- a/__tests__/foundation.test.ts +++ b/__tests__/foundation.test.ts @@ -10,7 +10,7 @@ import * as path from 'path'; import * as os from 'os'; import { CodeGraph } from '../src'; import { Node, Edge } from '../src/types'; -import { isInitialized, getCodeGraphDir, validateDirectory } from '../src/directory'; +import { isInitialized, getCodeGraphDir, validateDirectory, codeGraphDirName, isCodeGraphDataDir } from '../src/directory'; import { DatabaseConnection, getDatabasePath } from '../src/db'; // Create a temporary directory for each test @@ -306,3 +306,93 @@ describe('Query Builder', () => { expect(files).toEqual([]); }); }); + +// Two environments that share one working tree (Windows-native + WSL) must not +// share one `.codegraph/`. CODEGRAPH_DIR overrides the data directory name so +// each side keeps its own index in the same tree (issue #636). +describe('CODEGRAPH_DIR override (#636)', () => { + const saved = process.env.CODEGRAPH_DIR; + let tempDir: string; + + beforeEach(() => { + tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-dirname-')); + }); + afterEach(() => { + if (saved === undefined) delete process.env.CODEGRAPH_DIR; + else process.env.CODEGRAPH_DIR = saved; + fs.rmSync(tempDir, { recursive: true, force: true }); + }); + + describe('codeGraphDirName()', () => { + it('defaults to .codegraph when unset', () => { + delete process.env.CODEGRAPH_DIR; + expect(codeGraphDirName()).toBe('.codegraph'); + }); + + it('honors a valid override', () => { + process.env.CODEGRAPH_DIR = '.codegraph-win'; + expect(codeGraphDirName()).toBe('.codegraph-win'); + }); + + // Anything that isn't a plain segment could escape the project root or + // clobber it, so it's ignored in favor of the default. + it.each(['foo/bar', 'a\\b', '..', '../x', '.', '/abs/path', ' ', ''])( + 'falls back to .codegraph for invalid value %j', + (bad) => { + process.env.CODEGRAPH_DIR = bad; + expect(codeGraphDirName()).toBe('.codegraph'); + } + ); + }); + + describe('isCodeGraphDataDir()', () => { + it('matches the default, the active override, and .codegraph-* siblings', () => { + process.env.CODEGRAPH_DIR = '.codegraph-win'; + expect(isCodeGraphDataDir('.codegraph')).toBe(true); // the other env's dir + expect(isCodeGraphDataDir('.codegraph-win')).toBe(true); // active override + expect(isCodeGraphDataDir('.codegraph-wsl')).toBe(true); // any sibling + }); + + it('does not match unrelated directories', () => { + delete process.env.CODEGRAPH_DIR; + for (const name of ['src', 'node_modules', '.git', 'codegraph', '.codegraphextra']) { + expect(isCodeGraphDataDir(name)).toBe(false); + } + }); + }); + + it('init writes the index under the overridden directory, not .codegraph', () => { + process.env.CODEGRAPH_DIR = '.codegraph-win'; + const cg = CodeGraph.initSync(tempDir); + try { + expect(fs.existsSync(path.join(tempDir, '.codegraph-win', 'codegraph.db'))).toBe(true); + expect(fs.existsSync(path.join(tempDir, '.codegraph'))).toBe(false); + expect(getCodeGraphDir(tempDir)).toBe(path.join(tempDir, '.codegraph-win')); + expect(CodeGraph.isInitialized(tempDir)).toBe(true); + } finally { + cg.close(); + } + }); + + it('two index dirs coexist in one tree and the override side skips the sibling', async () => { + // WSL side: default `.codegraph`, with a source file. + delete process.env.CODEGRAPH_DIR; + fs.writeFileSync(path.join(tempDir, 'app.ts'), 'export function onlyReal() {}\n'); + const wsl = await CodeGraph.init(tempDir, { index: true }); + wsl.close(); + + // Windows side: override dir, same tree. Plant a decoy source file INSIDE + // the WSL data dir — the override-side index must not pick it up. + process.env.CODEGRAPH_DIR = '.codegraph-win'; + fs.writeFileSync(path.join(tempDir, '.codegraph', 'decoy.ts'), 'export function decoyLeak() {}\n'); + const win = await CodeGraph.init(tempDir, { index: true }); + try { + expect(fs.existsSync(path.join(tempDir, '.codegraph', 'codegraph.db'))).toBe(true); + expect(fs.existsSync(path.join(tempDir, '.codegraph-win', 'codegraph.db'))).toBe(true); + expect(win.searchNodes('onlyReal').length).toBeGreaterThan(0); + expect(win.searchNodes('decoyLeak')).toEqual([]); // sibling data dir not indexed + } finally { + win.close(); + } + }); +}); diff --git a/src/bin/codegraph.ts b/src/bin/codegraph.ts index bd667738b..879dbb078 100644 --- a/src/bin/codegraph.ts +++ b/src/bin/codegraph.ts @@ -356,7 +356,7 @@ function printIndexResult(clack: typeof import('@clack/prompts'), result: IndexR clack.log.info(`The index is fully usable ${getGlyphs().dash} only the failed files are missing.`); } } else if (projectPath) { - const logPath = path.join(projectPath, '.codegraph', 'errors.log'); + const logPath = path.join(getCodeGraphDir(projectPath), 'errors.log'); if (fs.existsSync(logPath)) { fs.unlinkSync(logPath); } @@ -367,7 +367,7 @@ function printIndexResult(clack: typeof import('@clack/prompts'), result: IndexR * Write detailed error log to .codegraph/errors.log */ function writeErrorLog(projectPath: string, errors: Array<{ message: string; filePath?: string; severity: string; code?: string }>): void { - const cgDir = path.join(projectPath, '.codegraph'); + const cgDir = getCodeGraphDir(projectPath); if (!fs.existsSync(cgDir)) return; const logPath = path.join(cgDir, 'errors.log'); diff --git a/src/db/index.ts b/src/db/index.ts index cbc08b8f0..e6d52d47a 100644 --- a/src/db/index.ts +++ b/src/db/index.ts @@ -9,6 +9,7 @@ import * as fs from 'fs'; import * as path from 'path'; import { SchemaVersion } from '../types'; import { runMigrations, getCurrentVersion, CURRENT_SCHEMA_VERSION } from './migrations'; +import { getCodeGraphDir } from '../directory'; export { SqliteDatabase, SqliteBackend } from './sqlite-adapter'; @@ -240,5 +241,5 @@ export const DATABASE_FILENAME = 'codegraph.db'; * Get the default database path for a project */ export function getDatabasePath(projectRoot: string): string { - return path.join(projectRoot, '.codegraph', DATABASE_FILENAME); + return path.join(getCodeGraphDir(projectRoot), DATABASE_FILENAME); } diff --git a/src/directory.ts b/src/directory.ts index 3a5c91d93..8f5abb092 100644 --- a/src/directory.ts +++ b/src/directory.ts @@ -7,16 +7,82 @@ import * as fs from 'fs'; import * as path from 'path'; +/** The default per-project data directory name. */ +const DEFAULT_CODEGRAPH_DIR = '.codegraph'; + +let warnedBadDirName = false; + +/** + * Resolve the per-project data directory name, honoring the `CODEGRAPH_DIR` + * environment override (default `.codegraph`). The override is a single path + * segment that lives in the project root. + * + * Why this exists: two environments that share one working tree must NOT share + * one `.codegraph/` — most concretely Windows-native and WSL (issue #636). The + * daemon lockfile (`.codegraph/daemon.pid`) records a platform-specific pid and + * socket path (a Windows named pipe vs a WSL Unix socket), and SQLite file + * locking across the WSL2 ↔ Windows filesystem boundary is unreliable, so two + * daemons sharing one index risks corruption. Setting `CODEGRAPH_DIR=.codegraph-win` + * on one side gives each environment its own index in the same tree. + * + * Read live (not captured at load) so it is both process-accurate and testable. + * An override that isn't a plain directory name — empty, containing a path + * separator, `.`, `..`/traversal, or absolute — is ignored (we keep the + * default) rather than risk writing the index outside the project or into the + * project root itself; we warn once to stderr so the misconfiguration is seen. + */ +export function codeGraphDirName(): string { + const raw = process.env.CODEGRAPH_DIR?.trim(); + if (!raw) return DEFAULT_CODEGRAPH_DIR; + const invalid = + raw === '.' || + raw.includes('..') || + raw.includes('/') || + raw.includes('\\') || + path.isAbsolute(raw); + if (invalid) { + if (!warnedBadDirName) { + warnedBadDirName = true; + // stderr only — stdout is the MCP protocol channel. + console.warn( + `[codegraph] Ignoring invalid CODEGRAPH_DIR="${raw}" — it must be a plain ` + + `directory name (no path separators, no "..", not absolute). Using "${DEFAULT_CODEGRAPH_DIR}".` + ); + } + return DEFAULT_CODEGRAPH_DIR; + } + return raw; +} + /** - * CodeGraph directory name + * CodeGraph directory name — a load-time snapshot of {@link codeGraphDirName}. + * A running process's environment is fixed, so this equals the live value; + * it's kept as a stable string export for backward compatibility. Internal code + * resolves the name through {@link codeGraphDirName} / {@link getCodeGraphDir} + * so the `CODEGRAPH_DIR` override always applies. */ -export const CODEGRAPH_DIR = '.codegraph'; +export const CODEGRAPH_DIR = codeGraphDirName(); + +/** + * Is `name` (a single path segment) a CodeGraph data directory? Matches the + * default `.codegraph`, the active `CODEGRAPH_DIR` override, and any + * `.codegraph-*` sibling. File-watching and the indexer skip ALL of these, so + * when two environments share one working tree (Windows + WSL, issue #636) + * neither indexes or watches the other's index directory. + */ +export function isCodeGraphDataDir(name: string): boolean { + return ( + name === DEFAULT_CODEGRAPH_DIR || + name === codeGraphDirName() || + name.startsWith(DEFAULT_CODEGRAPH_DIR + '-') + ); +} /** * Get the .codegraph directory path for a project */ export function getCodeGraphDir(projectRoot: string): string { - return path.join(projectRoot, CODEGRAPH_DIR); + return path.join(projectRoot, codeGraphDirName()); } /** diff --git a/src/extraction/index.ts b/src/extraction/index.ts index 36569258e..9f1831f87 100644 --- a/src/extraction/index.ts +++ b/src/extraction/index.ts @@ -18,6 +18,7 @@ import { import { QueryBuilder } from '../db/queries'; import { extractFromSource } from './tree-sitter'; import { detectLanguage, isSourceFile, isLanguageSupported, isFileLevelOnlyLanguage, initGrammars, loadGrammarsForLanguages } from './grammars'; +import { isCodeGraphDataDir } from '../directory'; import { logDebug, logWarn } from '../errors'; import { validatePathWithinRoot, normalizePath } from '../utils'; import ignore, { Ignore } from 'ignore'; @@ -454,8 +455,9 @@ function scanDirectoryWalk( } for (const entry of entries) { - // Never descend into git internals or our own data directory. - if (entry.name === '.git' || entry.name === '.codegraph') continue; + // Never descend into git internals or any CodeGraph data directory + // (the active one or a sibling another environment created — #636). + if (entry.name === '.git' || isCodeGraphDataDir(entry.name)) continue; const fullPath = path.join(dir, entry.name); const relativePath = normalizePath(path.relative(rootDir, fullPath)); diff --git a/src/index.ts b/src/index.ts index 55ef12e64..1b2642dbd 100644 --- a/src/index.ts +++ b/src/index.ts @@ -48,6 +48,7 @@ import { ContextBuilder, createContextBuilder } from './context'; import { Mutex, FileLock } from './utils'; import { FileWatcher, WatchOptions, PendingFile, LockUnavailableError } from './sync'; import { EXTRACTION_VERSION } from './extraction/extraction-version'; +import { getCodeGraphDir } from './directory'; import { CodeGraphPackageVersion } from './mcp/version'; // Re-export types for consumers @@ -154,7 +155,7 @@ export class CodeGraph { this.queries = queries; this.projectRoot = projectRoot; this.fileLock = new FileLock( - path.join(projectRoot, '.codegraph', 'codegraph.lock') + path.join(getCodeGraphDir(projectRoot), 'codegraph.lock') ); this.orchestrator = new ExtractionOrchestrator(projectRoot, queries); this.resolver = createResolver(projectRoot, queries); diff --git a/src/installer/index.ts b/src/installer/index.ts index edd48ecaf..a9be118be 100644 --- a/src/installer/index.ts +++ b/src/installer/index.ts @@ -28,6 +28,7 @@ import { getGlyphs } from '../ui/glyphs'; // installer must stay importable even when native modules can't load). import { watchDisabledReason } from '../sync/watch-policy'; import { isGitRepo, isSyncHookInstalled, installGitSyncHook } from '../sync/git-hooks'; +import { getCodeGraphDir, codeGraphDirName } from '../directory'; // Backwards-compat: keep these named exports — downstream code may // import them. The shim in `config-writer.ts` continues to re-export @@ -362,8 +363,8 @@ export async function runUninstaller(opts: RunUninstallerOptions): Promise // Step 4: for local uninstall, the index dir is separate — point at // `uninit` so the user knows it's still there (and how to remove it). - if (location === 'local' && fs.existsSync(path.join(process.cwd(), '.codegraph'))) { - clack.log.info('The .codegraph/ index for this project is still here. Run `codegraph uninit` to delete it.'); + if (location === 'local' && fs.existsSync(getCodeGraphDir(process.cwd()))) { + clack.log.info(`The ${codeGraphDirName()}/ index for this project is still here. Run \`codegraph uninit\` to delete it.`); } // Step 5: summary. diff --git a/src/sync/watcher.ts b/src/sync/watcher.ts index 8635a26cc..9bc654708 100644 --- a/src/sync/watcher.ts +++ b/src/sync/watcher.ts @@ -37,6 +37,7 @@ import type { Ignore } from 'ignore'; import { isSourceFile, buildDefaultIgnore } from '../extraction'; import { logDebug, logWarn } from '../errors'; import { normalizePath } from '../utils'; +import { isCodeGraphDataDir } from '../directory'; import { watchDisabledReason } from './watch-policy'; /** @@ -425,8 +426,12 @@ export class FileWatcher { /** Our own dirs are always ignored, regardless of .gitignore. */ private isAlwaysIgnored(rel: string): boolean { + // First path segment. Ignore any CodeGraph data dir — the active one AND a + // sibling like `.codegraph-win` a second environment (Windows/WSL) created + // in the same tree, so neither side watches the other's index (#636). + const top = rel.split('/')[0] ?? rel; return ( - rel === '.codegraph' || rel.startsWith('.codegraph/') || + isCodeGraphDataDir(top) || rel === '.git' || rel.startsWith('.git/') ); } From fd03f31b2cdfe3148e953a0231ca247d03dbb5f7 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Mon, 8 Jun 2026 20:18:17 -0400 Subject: [PATCH 05/51] fix(cpp): resolve calls through singletons/factories/chained getters (#645) (#742) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A C++ method call whose receiver is another call's result — `Foo::instance().bar()`, `WidgetFactory::create().draw()`, `openSession()->run()`, or the same stored in an `auto` local first — lost the receiver's type during extraction. The callee degraded to a bare method name, so when two classes shared a method name the call silently resolved to whichever was indexed first (or not at all), corrupting callers / impact / trace with a plausible-but-wrong edge. Three parts: - Capture C++ return types (new nodes.return_type column, schema v5): the function_definition's `type` field, normalized — smart-pointer pointee unwrapped, void/primitives dropped. - Preserve the inner-call receiver in extraction: a C/C++ field_expression whose receiver is itself a call is encoded `inner().method` instead of dropping to the bare name. Other languages keep the existing behavior. - New resolution strategy (matchCppCallChain): infer the receiver's class from the inner call's return type, then resolve AND validate the method on it. Handles singletons/accessors, factories returning a different type, free-function factories, make_unique/make_shared/new/direct construction, single-level member chains, and namespace-qualified inner calls. A wrong inference yields no edge, never a wrong one. EXTRACTION_VERSION 2->3 (re-index to populate return types). Validated on the issue repro + spdlog: node count stable (no explosion), deterministic, and ~100 pre-existing wrong `.size()`-style edges removed. Co-authored-by: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 1 + __tests__/extraction.test.ts | 35 ++++++ __tests__/foundation.test.ts | 2 +- __tests__/pr19-improvements.test.ts | 2 +- __tests__/resolution.test.ts | 108 +++++++++++++++++ src/db/migrations.ts | 12 +- src/db/queries.ts | 9 +- src/db/schema.sql | 1 + src/extraction/extraction-version.ts | 2 +- src/extraction/languages/c-cpp.ts | 52 +++++++++ src/extraction/tree-sitter-types.ts | 9 ++ src/extraction/tree-sitter.ts | 21 ++++ src/resolution/name-matcher.ts | 166 ++++++++++++++++++++++++++- src/types.ts | 9 ++ 14 files changed, 421 insertions(+), 8 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index faf31754b..08e6d4f80 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -29,6 +29,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). ### Fixes +- C++ method calls made through a singleton, factory, or chained getter now resolve to the correct class. A call like `Foo::instance().bar()`, `WidgetFactory::create().draw()`, `openSession()->run()`, or the same stored in an `auto` local first, used to lose the receiver's type — so when two classes had a same-named method the call silently attached to whichever was indexed first (or didn't resolve at all), corrupting callers, impact, and trace. CodeGraph now infers the receiver's type from what the inner call returns (capturing C++ return types for the first time) and creates the edge only when that class genuinely has the method, so a wrong guess produces no edge instead of a misleading one. Covers singletons and self-returning accessors, factories that return a different type, free-function factories, `make_unique` / `make_shared` / `new` / direct construction, and single-level member chains. Existing C/C++ indexes should be re-indexed (`codegraph index -f`) to benefit. Thanks @stabey. (#645) (C/C++) - The shared background server no longer logs a scary-looking `[error] … undefined` line on every session start. Attaching to the shared daemon is normal, healthy behavior, but the informational message was being surfaced by MCP hosts (Claude Code and others) as an error; it's now silent by default — set `CODEGRAPH_MCP_LOG_ATTACH=1` to surface it when debugging daemon attach. Thanks @mturac. (#618) - On Windows, CodeGraph's background processes no longer pile up without bound and saturate CPU over a long session. When the editor or agent that launched CodeGraph exited, its helper process couldn't tell its parent had gone — Windows reports process lineage differently than macOS and Linux — so the helper kept running, the shared background server never saw the client disconnect, and its idle timer never fired to shut it down. CodeGraph now detects parent-process exit directly on Windows, so helpers and the idle background server wind down promptly, the same as they already did on macOS and Linux. (#692, #576, #680) - The shared background server has two further safeguards against ever lingering: it now drops a client the moment it detects that client's process is gone (even if the disconnect arrived uncleanly — a force-quit or a dropped connection that never closed the socket), and it won't stay running indefinitely with clients attached but no activity. Together these guarantee it always winds down, on every platform. (#692) diff --git a/__tests__/extraction.test.ts b/__tests__/extraction.test.ts index 89918c58b..7bbaaea48 100644 --- a/__tests__/extraction.test.ts +++ b/__tests__/extraction.test.ts @@ -2369,6 +2369,41 @@ end }); }); + describe('C/C++ return type capture (#645)', () => { + it('captures the normalized return type of a C++ method/function', () => { + const code = ` +struct Widget { void draw(); }; +class Factory { public: static Widget create(); }; +Widget Factory::create() { return Widget(); } +void doNothing() {} +`; + const result = extractFromSource('f.cpp', code); + + const create = result.nodes.find( + (n) => n.name === 'create' && (n.kind === 'method' || n.kind === 'function') + ); + expect(create?.returnType).toBe('Widget'); + + // A `void` return records no type, so resolution never tries to resolve a + // method on it. + const doNothing = result.nodes.find((n) => n.name === 'doNothing'); + expect(doNothing).toBeDefined(); + expect(doNothing?.returnType).toBeUndefined(); + }); + + it('unwraps a smart-pointer return type to its pointee', () => { + const code = ` +#include +struct Widget {}; +std::unique_ptr makeWidget() { return nullptr; } +`; + const result = extractFromSource('f.cpp', code); + + const make = result.nodes.find((n) => n.name === 'makeWidget'); + expect(make?.returnType).toBe('Widget'); + }); + }); + describe('C/C++ imports', () => { it('should extract system include', () => { const code = `#include `; diff --git a/__tests__/foundation.test.ts b/__tests__/foundation.test.ts index 12b11c21f..405865b2f 100644 --- a/__tests__/foundation.test.ts +++ b/__tests__/foundation.test.ts @@ -242,7 +242,7 @@ describe('Database Connection', () => { const version = db.getSchemaVersion(); expect(version).not.toBeNull(); - expect(version?.version).toBe(4); + expect(version?.version).toBe(5); db.close(); }); diff --git a/__tests__/pr19-improvements.test.ts b/__tests__/pr19-improvements.test.ts index eb5200919..8e8ca8177 100644 --- a/__tests__/pr19-improvements.test.ts +++ b/__tests__/pr19-improvements.test.ts @@ -299,7 +299,7 @@ describe('Best-Candidate Resolution', () => { describe('Schema v2 Migration', () => { it.skipIf(!HAS_SQLITE)('should have correct current schema version', async () => { const { CURRENT_SCHEMA_VERSION } = await import('../src/db/migrations'); - expect(CURRENT_SCHEMA_VERSION).toBe(4); + expect(CURRENT_SCHEMA_VERSION).toBe(5); }); it.skipIf(!HAS_SQLITE)('should have migration for version 2', async () => { diff --git a/__tests__/resolution.test.ts b/__tests__/resolution.test.ts index 347cb635c..74ad5d7f5 100644 --- a/__tests__/resolution.test.ts +++ b/__tests__/resolution.test.ts @@ -1918,4 +1918,112 @@ func main() { } }); }); + + describe('C++ chained-call receiver resolution (#645)', () => { + async function indexCpp(files: Record): Promise { + for (const [name, content] of Object.entries(files)) { + fs.writeFileSync(path.join(tempDir, name), content); + } + cg = await CodeGraph.init(tempDir, { index: true }); + } + + function callerNamesOf(qualifiedName: string): string[] { + const target = cg.getNodesByKind('method').find((n) => n.qualifiedName === qualifiedName); + if (!target) return []; + const names = cg + .getIncomingEdges(target.id) + .filter((e) => e.kind === 'calls') + .map((e) => cg.getNode(e.source)?.name) + .filter((n): n is string => !!n); + return [...new Set(names)].sort(); + } + + it('resolves singleton chains and auto locals to the right class, never the first-sorted one', async () => { + // Two classes share writeLog; Logger sorts first so it wins any name-only + // tie. All three call forms target Metrics. + await indexCpp({ + 'logger.hpp': `#pragma once +#include +class Logger { public: static Logger& instance(); void writeLog(const std::string&); }; +class Metrics { public: static Metrics& instance(); void writeLog(const std::string&); }; +`, + 'impl.cpp': `#include "logger.hpp" +Logger& Logger::instance() { static Logger l; return l; } +Metrics& Metrics::instance() { static Metrics m; return m; } +void Logger::writeLog(const std::string&) {} +void Metrics::writeLog(const std::string&) {} +`, + 'app.cpp': `#include "logger.hpp" +void a() { Metrics::instance().writeLog("x"); } // chained singleton +void b() { auto& m = Metrics::instance(); m.writeLog("x"); } // stored in auto +void c() { Metrics& m = Metrics::instance(); m.writeLog("x"); } // explicit type +`, + }); + + expect(callerNamesOf('Metrics::writeLog')).toEqual(['a', 'b', 'c']); + expect(callerNamesOf('Logger::writeLog')).toEqual([]); + }); + + it('resolves factories, free-function factories, and member chains via the inner call return type', async () => { + await indexCpp({ + 'types.hpp': `#pragma once +#include +struct Widget { void draw(); }; +struct Session { void run(); }; +struct View { void render(); }; +class WidgetFactory { public: static Widget create(); }; +class Manager { public: View view(); }; +Session* openSession(); +// Decoy that sorts first and has all three methods — must never win. +struct Aaa { void draw(); void run(); void render(); }; +`, + 'impl.cpp': `#include "types.hpp" +void Widget::draw() {} +void Session::run() {} +void View::render() {} +void Aaa::draw() {} +void Aaa::run() {} +void Aaa::render() {} +Widget WidgetFactory::create() { return Widget(); } +View Manager::view() { return View(); } +Session* openSession() { return nullptr; } +`, + 'app.cpp': `#include "types.hpp" +void factory() { WidgetFactory::create().draw(); } // -> Widget::draw +void freefunc() { openSession()->run(); } // -> Session::run +void member() { Manager mgr; mgr.view().render(); } // -> View::render +void makeUnique() { auto w = std::make_unique(); w->draw(); } // -> Widget::draw +`, + }); + + expect(callerNamesOf('Widget::draw')).toEqual(['factory', 'makeUnique']); + expect(callerNamesOf('Session::run')).toEqual(['freefunc']); + expect(callerNamesOf('View::render')).toEqual(['member']); + // The first-sorted decoy never captures any of them. + expect(callerNamesOf('Aaa::draw')).toEqual([]); + expect(callerNamesOf('Aaa::run')).toEqual([]); + expect(callerNamesOf('Aaa::render')).toEqual([]); + }); + + it('creates NO edge when the inferred type lacks the method (silent miss, not a wrong edge)', async () => { + await indexCpp({ + 'types.hpp': `#pragma once +struct Widget { void draw(); }; +struct Other { void onlyOther(); }; +class WidgetFactory { public: static Widget create(); }; +`, + 'impl.cpp': `#include "types.hpp" +void Widget::draw() {} +void Other::onlyOther() {} +Widget WidgetFactory::create() { return Widget(); } +`, + 'app.cpp': `#include "types.hpp" +// Widget has no onlyOther() — must produce NO edge, never a wrong one to Other. +void wrong() { WidgetFactory::create().onlyOther(); } +`, + }); + + expect(callerNamesOf('Other::onlyOther')).toEqual([]); + }); + }); }); diff --git a/src/db/migrations.ts b/src/db/migrations.ts index 1a8d1c542..bfea9024d 100644 --- a/src/db/migrations.ts +++ b/src/db/migrations.ts @@ -9,7 +9,7 @@ import { SqliteDatabase } from './sqlite-adapter'; /** * Current schema version */ -export const CURRENT_SCHEMA_VERSION = 4; +export const CURRENT_SCHEMA_VERSION = 5; /** * Migration definition @@ -65,6 +65,16 @@ const migrations: Migration[] = [ `); }, }, + { + version: 5, + description: + 'Add nodes.return_type — normalized return/result type for receiver-type inference (C++ singletons/factories, #645)', + up: (db) => { + db.exec(` + ALTER TABLE nodes ADD COLUMN return_type TEXT; + `); + }, + }, ]; /** diff --git a/src/db/queries.ts b/src/db/queries.ts index 3f35c5b04..3e4e6e14a 100644 --- a/src/db/queries.ts +++ b/src/db/queries.ts @@ -72,6 +72,7 @@ interface NodeRow { is_abstract: number; decorators: string | null; type_parameters: string | null; + return_type: string | null; updated_at: number; } @@ -133,6 +134,7 @@ function rowToNode(row: NodeRow): Node { isAbstract: row.is_abstract === 1, decorators: row.decorators ? safeJsonParse(row.decorators, undefined) : undefined, typeParameters: row.type_parameters ? safeJsonParse(row.type_parameters, undefined) : undefined, + returnType: row.return_type ?? undefined, updatedAt: row.updated_at, }; } @@ -232,13 +234,13 @@ export class QueryBuilder { start_line, end_line, start_column, end_column, docstring, signature, visibility, is_exported, is_async, is_static, is_abstract, - decorators, type_parameters, updated_at + decorators, type_parameters, return_type, updated_at ) VALUES ( @id, @kind, @name, @qualifiedName, @filePath, @language, @startLine, @endLine, @startColumn, @endColumn, @docstring, @signature, @visibility, @isExported, @isAsync, @isStatic, @isAbstract, - @decorators, @typeParameters, @updatedAt + @decorators, @typeParameters, @returnType, @updatedAt ) `); } @@ -281,6 +283,7 @@ export class QueryBuilder { isAbstract: node.isAbstract ? 1 : 0, decorators: node.decorators ? JSON.stringify(node.decorators) : null, typeParameters: node.typeParameters ? JSON.stringify(node.typeParameters) : null, + returnType: node.returnType ?? null, updatedAt: node.updatedAt ?? Date.now(), }); } @@ -321,6 +324,7 @@ export class QueryBuilder { is_abstract = @isAbstract, decorators = @decorators, type_parameters = @typeParameters, + return_type = @returnType, updated_at = @updatedAt WHERE id = @id `); @@ -355,6 +359,7 @@ export class QueryBuilder { isAbstract: node.isAbstract ? 1 : 0, decorators: node.decorators ? JSON.stringify(node.decorators) : null, typeParameters: node.typeParameters ? JSON.stringify(node.typeParameters) : null, + returnType: node.returnType ?? null, updatedAt: node.updatedAt ?? Date.now(), }); } diff --git a/src/db/schema.sql b/src/db/schema.sql index b08c34f37..292981c82 100644 --- a/src/db/schema.sql +++ b/src/db/schema.sql @@ -37,6 +37,7 @@ CREATE TABLE IF NOT EXISTS nodes ( is_abstract INTEGER DEFAULT 0, decorators TEXT, -- JSON array type_parameters TEXT, -- JSON array + return_type TEXT, -- normalized return/result type name (e.g. C++ method return, for receiver-type inference) updated_at INTEGER NOT NULL ); diff --git a/src/extraction/extraction-version.ts b/src/extraction/extraction-version.ts index e216292cc..aa5106f07 100644 --- a/src/extraction/extraction-version.ts +++ b/src/extraction/extraction-version.ts @@ -21,4 +21,4 @@ * turns the re-index hint into noise — keep it honest (see CLAUDE.md, "Honesty * in the product is load-bearing"). */ -export const EXTRACTION_VERSION = 2; +export const EXTRACTION_VERSION = 3; diff --git a/src/extraction/languages/c-cpp.ts b/src/extraction/languages/c-cpp.ts index 5d13ddc5a..1365cc24c 100644 --- a/src/extraction/languages/c-cpp.ts +++ b/src/extraction/languages/c-cpp.ts @@ -45,6 +45,56 @@ function extractCppReceiverType(node: SyntaxNode, source: string): string | unde return parts.length > 1 ? parts.slice(0, -1).join('::') : undefined; } +/** + * Built-in / non-class return types that can never be a method receiver. We + * store no `returnType` for these so resolution never tries to resolve a method + * on `void` / `int` / etc. + */ +const CPP_NON_CLASS_RETURN = new Set([ + 'void', 'bool', 'char', 'short', 'int', 'long', 'float', 'double', 'unsigned', + 'signed', 'size_t', 'ssize_t', 'auto', 'wchar_t', 'char8_t', 'char16_t', + 'char32_t', 'int8_t', 'int16_t', 'int32_t', 'int64_t', 'uint8_t', 'uint16_t', + 'uint32_t', 'uint64_t', 'intptr_t', 'uintptr_t', 'nullptr_t', +]); + +/** + * Normalize a C++ return type to the bare class name a method could be called + * on. Unwraps smart-pointer / optional wrappers to their element type + * (`std::unique_ptr` → `Widget`) so a factory's `->method()` resolves on + * the pointee. Strips cv-qualifiers, `&`/`*`, namespace qualifiers, and other + * template args. Returns undefined for primitives / void / `auto` / empty. + */ +export function normalizeCppReturnType(raw: string): string | undefined { + let t = raw.trim(); + if (!t) return undefined; + // Unwrap smart pointers / optional to their pointee (the thing you call `->` on). + const wrapper = t.match(/\b(?:std\s*::\s*)?(?:unique_ptr|shared_ptr|weak_ptr|optional)\s*<\s*([^,>]+?)\s*>/); + if (wrapper && wrapper[1]) t = wrapper[1]; + t = t + .replace(/\b(?:const|volatile|typename|struct|class|enum)\b/g, ' ') + .replace(/<[^>]*>/g, ' ') + .replace(/[*&]+/g, ' ') + .replace(/\s+/g, ' ') + .trim(); + if (!t) return undefined; + const last = t.split('::').filter(Boolean).pop(); + if (!last) return undefined; + if (CPP_NON_CLASS_RETURN.has(last)) return undefined; + if (!/^[A-Za-z_]\w*$/.test(last)) return undefined; + return last; +} + +/** + * A function/method's return type lives in the `function_definition`'s `type` + * field (`Metrics& Metrics::instance()` → `Metrics`). Constructors, destructors, + * and conversion operators have no `type` field → undefined. + */ +function extractCppReturnType(node: SyntaxNode, source: string): string | undefined { + const typeNode = getChildByField(node, 'type'); + if (!typeNode) return undefined; + return normalizeCppReturnType(getNodeText(typeNode, source)); +} + export const cExtractor: LanguageExtractor = { functionTypes: ['function_definition'], classTypes: [], @@ -60,6 +110,7 @@ export const cExtractor: LanguageExtractor = { nameField: 'declarator', bodyField: 'body', paramsField: 'parameters', + getReturnType: extractCppReturnType, resolveTypeAliasKind: (node, _source) => { // C typedef: `typedef enum { ... } name;` or `typedef struct { ... } name;` // The inner enum_specifier/struct_specifier is anonymous, but we want the typedef name @@ -107,6 +158,7 @@ export const cppExtractor: LanguageExtractor = { paramsField: 'parameters', resolveName: extractCppQualifiedMethodName, getReceiverType: extractCppReceiverType, + getReturnType: extractCppReturnType, getVisibility: (node) => { // Check for access specifier in parent const parent = node.parent; diff --git a/src/extraction/tree-sitter-types.ts b/src/extraction/tree-sitter-types.ts index 8742961ca..cecd54c02 100644 --- a/src/extraction/tree-sitter-types.ts +++ b/src/extraction/tree-sitter-types.ts @@ -205,6 +205,15 @@ export interface LanguageExtractor { */ getReceiverType?: (node: SyntaxNode, source: string) => string | undefined; + /** + * Extract a function/method's normalized return type name (bare class name, + * smart-pointer pointee unwrapped), stored on the node as `returnType`. Used + * by C/C++ so resolution can infer a chained receiver's type from what the + * inner call returns (`Foo::instance().bar()` → resolve `bar` on `Foo`, + * issue #645). Return undefined for primitives / void / constructors. + */ + getReturnType?: (node: SyntaxNode, source: string) => string | undefined; + /** * Resolve the actual node kind for a type alias declaration. * Used by Go where `type_spec` is the named declaration wrapper for structs/interfaces: diff --git a/src/extraction/tree-sitter.ts b/src/extraction/tree-sitter.ts index 6febee652..ae28a6657 100644 --- a/src/extraction/tree-sitter.ts +++ b/src/extraction/tree-sitter.ts @@ -811,6 +811,7 @@ export class TreeSitterExtractor { const isExported = this.extractor.isExported?.(node, this.source); const isAsync = this.extractor.isAsync?.(node); const isStatic = this.extractor.isStatic?.(node); + const returnType = this.extractor.getReturnType?.(node, this.source); const funcNode = this.createNode('function', name, node, { docstring, @@ -819,6 +820,7 @@ export class TreeSitterExtractor { isExported, isAsync, isStatic, + returnType, }); if (!funcNode) return; @@ -930,12 +932,14 @@ export class TreeSitterExtractor { const visibility = this.extractor.getVisibility?.(node); const isAsync = this.extractor.isAsync?.(node); const isStatic = this.extractor.isStatic?.(node); + const returnType = this.extractor.getReturnType?.(node, this.source); const extraProps: Partial = { docstring, signature, visibility, isAsync, isStatic, + returnType, }; if (receiverType) { extraProps.qualifiedName = `${receiverType}::${name}`; @@ -2457,6 +2461,23 @@ export class TreeSitterExtractor { } else { calleeName = methodName; } + } else if ( + (this.language === 'cpp' || this.language === 'c') && + receiver && + receiver.type === 'call_expression' + ) { + // C/C++ receiver that is itself a call — `Foo::instance().bar()`, + // `openSession()->run()`, `mgr.view().render()`. Keep the inner + // call so resolution can infer bar()'s class from what the inner + // call RETURNS (#645). Encode as `().`; the + // `().` marker never appears in an ordinary ref, so the C++ + // resolver can detect and split it. Other languages keep the + // bare-name behavior (dropping the receiver) below. + const innerFn = getChildByField(receiver, 'function'); + const innerCallee = innerFn + ? getNodeText(innerFn, this.source).replace(/->/g, '.').replace(/\s+/g, '') + : ''; + calleeName = innerCallee ? `${innerCallee}().${methodName}` : methodName; } else { calleeName = methodName; } diff --git a/src/resolution/name-matcher.ts b/src/resolution/name-matcher.ts index d6bce5659..f01628c12 100644 --- a/src/resolution/name-matcher.ts +++ b/src/resolution/name-matcher.ts @@ -351,6 +351,7 @@ function inferCppReceiverType( receiverName: string, ref: UnresolvedRef, context: ResolutionContext, + depth = 0, ): string | null { const source = context.readFile(ref.filePath); if (!source) return null; @@ -368,7 +369,15 @@ function inferCppReceiverType( const declaratorMatch = line.match(declaratorRegex); if (declaratorMatch) { const normalized = normalizeCppTypeName(declaratorMatch[1] ?? ''); - if (normalized) return normalized; + if (normalized === 'auto') { + // `auto x = Foo::instance();` — the declared type is deduced; recover it + // from the initializer (call return type / construction) (#645). + const initType = inferCppAutoInitializerType(line, receiverName, ref, context, depth); + if (initType) return initType; + // No usable initializer on this line — keep scanning earlier ones. + } else if (normalized) { + return normalized; + } } } @@ -388,13 +397,158 @@ function inferCppReceiverType( const declaratorMatch = line.match(declaratorRegex); if (!declaratorMatch) continue; const normalized = normalizeCppTypeName(declaratorMatch[1] ?? ''); - if (normalized) return normalized; + if (normalized && normalized !== 'auto') return normalized; } } return null; } +/** + * Last `::`-separated segment of a (possibly namespace-qualified) C++ name. + */ +function cppLastSegment(name: string): string { + const parts = name.split('::').filter(Boolean); + return parts[parts.length - 1] ?? name; +} + +/** + * Return type captured at extraction for `Class::method` (or a free function), + * read off the indexed node's `returnType` (#645). Null when not indexed or no + * return type was recorded (e.g. a `void`/primitive return). + */ +function lookupCppReturnType( + callee: string, + ref: UnresolvedRef, + context: ResolutionContext, +): string | null { + let method = callee; + let cls: string | null = null; + if (callee.includes('::')) { + const parts = callee.split('::').filter(Boolean); + method = parts[parts.length - 1] ?? callee; + cls = parts.slice(0, -1).join('::'); + } + const candidates = context.getNodesByName(method).filter( + (n) => + (n.kind === 'method' || n.kind === 'function') && + n.language === ref.language && + !!n.returnType, + ); + if (cls) { + const want = `${cls}::${method}`; + // The call site may name the class with MORE namespace qualification than + // the stored node (`details::registry::instance` at the call vs + // `registry::instance` on the node — the receiver type only carries the + // immediate class), or LESS. Accept an exact match or either being a + // namespace-suffix of the other; the shared `::::` tail keeps + // it specific. + const m = candidates.find( + (n) => + n.qualifiedName === want || + n.qualifiedName.endsWith(`::${want}`) || + want.endsWith(`::${n.qualifiedName}`), + ); + return m?.returnType ?? null; + } + return candidates.find((n) => n.kind === 'function')?.returnType ?? null; +} + +/** Does the graph contain a class/struct named `name`'s last segment? */ +function cppClassExists(name: string, ref: UnresolvedRef, context: ResolutionContext): boolean { + const last = cppLastSegment(name); + return context + .getNodesByName(last) + .some((n) => (n.kind === 'class' || n.kind === 'struct') && n.language === ref.language); +} + +/** + * Infer the class produced by a C++ call/construction expression, using return + * types captured at extraction (#645). Handles, in order: + * - `make_unique()` / `make_shared()` → T + * - single-level member call `recv.method()` → recv's type, then method's return + * - `Class::method()` / free `func()` → the callee's recorded return type + * - direct construction `Type()` / `ns::Type()` → Type + * Returns null when undeterminable. Callers MUST still validate the outer method + * exists on the result before creating an edge, so a wrong guess stays silent. + */ +function resolveCppCallResultType( + inner: string, + ref: UnresolvedRef, + context: ResolutionContext, + depth = 0, +): string | null { + if (depth > 3) return null; // guard against pathological mutual recursion + const expr = inner.trim(); + + const make = expr.match(/(?:^|::)(?:make_unique|make_shared)\s*<\s*([A-Za-z_]\w*)/); + if (make) return make[1] ?? null; + + // Single-level member call `recv.method` (the `manager.view().render()` shape). + const dotIdx = expr.lastIndexOf('.'); + if (dotIdx > 0) { + const recv = expr.slice(0, dotIdx); + const method = expr.slice(dotIdx + 1); + if (recv.includes('.') || recv.includes('(') || recv.includes('::')) return null; // single level only + const recvType = inferCppReceiverType(recv, ref, context, depth + 1); + if (!recvType) return null; + return lookupCppReturnType(`${recvType}::${method}`, ref, context); + } + + const ret = lookupCppReturnType(expr, ref, context); + if (ret) return ret; + + // Direct construction — the callee itself names a class/struct. + if (cppClassExists(expr, ref, context)) return cppLastSegment(expr); + + return null; +} + +/** + * Recover the type of an `auto`-declared local from its initializer on the + * declaration line — `auto x = Foo::instance();`, `auto w = make_unique();`, + * `auto p = new W();`, `auto w = Widget();` (#645). + */ +function inferCppAutoInitializerType( + line: string, + receiverName: string, + ref: UnresolvedRef, + context: ResolutionContext, + depth: number, +): string | null { + const escaped = receiverName.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); + const m = line.match(new RegExp(`\\b${escaped}\\b\\s*=\\s*([^;]+)`)); + if (!m || !m[1]) return null; + const init = m[1].trim(); + + const neu = init.match(/^new\s+([A-Za-z_][\w:]*)/); + if (neu && neu[1]) return cppLastSegment(neu[1]); + + // A call or construction: `Foo(...)`, `A::b(...)`, `make_unique(...)`. + const call = init.match(/^([A-Za-z_][\w:]*(?:\s*<[^>;]*>)?)\s*\(/); + if (call && call[1]) return resolveCppCallResultType(call[1].replace(/\s+/g, ''), ref, context, depth + 1); + + return null; +} + +/** + * Resolve a C++ chained call whose receiver is itself a call — encoded by the + * extractor as `().` (#645). The receiver's type is what + * the inner call returns; the outer method is then resolved and VALIDATED on it + * (resolveMethodOnType requires `cls::method` to exist), so a wrong inference + * produces no edge rather than a wrong one. + */ +export function matchCppCallChain( + ref: UnresolvedRef, + context: ResolutionContext, +): ResolvedRef | null { + const m = ref.referenceName.match(/^(.+)\(\)\.(\w+)$/); + if (!m || !m[1] || !m[2]) return null; + const cls = resolveCppCallResultType(m[1], ref, context); + if (!cls) return null; + return resolveMethodOnType(cls, m[2], ref, context, 0.85, 'instance-method'); +} + /** * Java/Kotlin: infer a receiver's declared type by walking field declarations * in the class enclosing the call site. The field's `signature` is already in @@ -809,6 +963,14 @@ export function matchReference( result = matchByQualifiedName(ref, context); if (result) return result; + // 1b. C++ chained call whose receiver is another call — `Foo::instance().bar()` + // encoded as `Foo::instance().bar` by the extractor (#645). Resolve the + // receiver's type from what the inner call returns, then the method on it. + if (ref.language === 'cpp' || ref.language === 'c') { + result = matchCppCallChain(ref, context); + if (result) return result; + } + // 2. Method call pattern result = matchMethodCall(ref, context); if (result) return result; diff --git a/src/types.ts b/src/types.ts index 01aadae02..0ff4b7a5f 100644 --- a/src/types.ts +++ b/src/types.ts @@ -163,6 +163,15 @@ export interface Node { /** Generic type parameters */ typeParameters?: string[]; + /** + * Normalized return/result type name for a function/method (the bare class + * name, smart-pointer pointee unwrapped). Captured for C/C++ so resolution + * can infer a chained receiver's type from what the inner call returns — + * `Foo::instance().bar()` resolves `bar` on `Foo` (issue #645). Undefined for + * languages/symbols where it isn't captured. + */ + returnType?: string; + /** When the node was last updated */ updatedAt: number; } From 6e2a24d96ad1d969d6ef8b297ffc2280ee7e1193 Mon Sep 17 00:00:00 2001 From: Max Hsu Date: Tue, 9 Jun 2026 08:31:15 +0800 Subject: [PATCH 06/51] =?UTF-8?q?fix(extraction):=20map=20PHP=20include/re?= =?UTF-8?q?quire=20to=20file=E2=86=92file=20dependency=20edges=20(#660)=20?= =?UTF-8?q?(#663)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit PHP's importTypes only captured namespace_use_declaration, so include/require(_once) — the dependency mechanism in procedural and script-style PHP — never produced edges. callers, impact, and trace missed the entire file-include graph; only namespace `use` became a dependency edge. Capture the four include/require expression types and emit file→file imports edges, reusing the path-based resolution that C/C++ #include already goes through. Only static string-literal paths are resolved (relative to the including file); dynamic forms (include $var, require __DIR__ . '/x', interpolated strings) are skipped. Include PATHS are distinguished from namespace `use` symbols by shape: a path contains '/' or '.', which PHP identifiers and FQNs never do. A path-shaped include that doesn't resolve to a known project file is left unresolved and does NOT fall back to the symbol name-matcher, which would otherwise mis-connect "inc/db.php" to an unrelated db.php elsewhere — a wrong edge is worse than a missing one. Co-authored-by: Claude Opus 4.8 (1M context) Co-authored-by: Colby McHenry --- CHANGELOG.md | 1 + __tests__/extraction.test.ts | 37 +++++++++ __tests__/resolution.test.ts | 134 +++++++++++++++++++++++++++++- src/extraction/languages/php.ts | 41 ++++++++- src/resolution/import-resolver.ts | 71 ++++++++++++++++ src/resolution/index.ts | 14 +++- 6 files changed, 295 insertions(+), 3 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 08e6d4f80..f50a2d94f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -85,6 +85,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). - CodeGraph's MCP server now answers an agent's `resources/list` and `prompts/list` probes with an empty list instead of an error, clearing the `-32601` messages some clients (opencode, Codex) logged on connect. (#621) - Svelte and Vue components used through a barrel file — `export { default as Button } from './Button.svelte'` re-exported from an `index.ts` and imported elsewhere — are no longer falsely reported as having **0 callers**. CodeGraph now follows the default re-export all the way to the component and resolves the imports that `.svelte` / `.vue` files themselves use, so `codegraph_callers` and `codegraph_impact` see every place a component is used. This also covers components imported from another package in a workspace/monorepo (`@scope/ui/widgets`) and bare directory imports (`import { x } from './'`). Previously a live component consumed only through a barrel looked like dead code. Thanks @nakisen. (#629) - Components used in a Vue Single-File Component's `