Working doc for a fresh session. Two problems to crack: (P1) agents still reach for
Read/grepduring implementation instead of codegraph; (P2) on startup the codegraph MCP server can bependingwhen the agent's first turn fires, so the agent runs with no codegraph at all.Read
codegraph/CLAUDE.md→ "Retrieval performance & dynamic-dispatch coverage" first — it's the doctrine these ideas must respect.
- #733 (
7175dc4) — reframed the agent-facing steering (src/mcp/server-instructions.ts+ thecodegraph_node/codegraph_exploredescriptions insrc/mcp/tools.ts) to cover implementation, not just Q&A; and added file-view mode:codegraph_nodenow accepts a barefile(nosymbol) → returns that file's symbol map + its dependents (blast radius) + verbatim bodies (includeCode).handleFileViewinsrc/mcp/tools.ts. - Clean A/B result (new build vs baseline build, both codegraph-connected, same fully-implemented task —
kindExcludeadded tocodegraph_search):- baseline: 0 codegraph calls, 8 Reads (agent ignored available codegraph).
- new: 2
codegraph_explorecalls, 5 Reads. - So the reframe did move tool-choice — but the agent used
codegraph_explore, never the file-view, and still Read 5×. n=1/arm.
- Eval harness fix (
#735): nested attach is a startup-latency problem, not a hard block.scripts/agent-eval/ab-new-vs-baseline.shnow pre-warms a daemon + skips the re-exec; use it (run non-nested for cleanest results).
Doctrine constraints (from CLAUDE.md — do not relitigate):
- Adapt the tool to the agent. Changing tool descriptions /
server-instructions.tsis low-salience and has regressed wall-clock before. Wording alone won't reliably move tool-choice. - New tools fare worse than extending an existing one (the agent under-picks even
trace;codegraph_contextwas removed). - The real levers that landed historically: coverage (more flows connect statically →
exploresurfaces them) and sufficiency (output complete enough that the agent stops reading). - The optimization target is wall-clock + tool-call count + Read=0, not token cost (cost is lower as a side effect).
The fix: make codegraph_node read a file exactly like the Read tool, only
faster — so the agent reaches for it naturally. No forcing. The owner's steer
settled the direction: "codegraph should be able to Read just like the Read
tool… make it as good as Read. Read is slow and old; querying the index is fast.
You keep diverging away from using codegraph rather than pursuing the fix."
DONE — handleFileView (src/mcp/tools.ts) is now full Read parity:
- A
filewith nosymbolreturns the file's current source numbered byte-for-byte the way Read does —<n>\t<line>, no padding, trailing empty line kept (verified by reading the same file with both and diffing). The only addition is a one-line blast-radius header (used by N files: …). offset/limitmean exactly what they do on Read (1-based start; max lines; default whole file capped at 2000 lines like Read). Large files paginate honestly ((lines X–Y of N — pass offset/limit…)), never the 15ktruncateOutputchop.- Content is the default (no
includeCodeneeded);symbolsOnly: truereturns the cheap structural map instead. Security preserved:yaml/propertiessummarized by key, never dumped (#383); reads viavalidatePathWithinRoot(#527). - Tests:
__tests__/node-file-view.test.ts(9, incl. strict format parity^1000\t const v998 = 998;and unpadded^1\timport …). Full suite green (1270). Descriptions /server-instructions.ts/ CHANGELOG reframed: "read a source file with codegraph_node instead of Read — same bytes, faster."
The hook (idea 1) — A/B'd and REJECTED. Do not ship. Kept only as an eval
artifact (scripts/agent-eval/redirect-read-hook.sh + ab-hook.sh).
- Clean A/B (2 runs/arm, devpit "add
dp ping, build it"; both arms codegraph-attached):- nohook: 0 codegraph calls, 1 Read, 5–7 tool calls, 6–8 turns, 55–77s. (Reproduces P1: agent ignores codegraph — but read-once-and-edit is efficient here.)
- hook (deny-redirect): 0 successful Reads + 1 file-view call (parity worked, edit compiled), but 8–9 tool calls, 9–10 turns, 200–239s, and the agent fought the deny —
ToolSearchto find the tool, reflexive re-Read (denied), thenBash python3to read the file around the block. - Verdict: a blanket Read-deny regresses the target metrics (~2× tool calls, more turns) on a simple edit and the agent routes around it. Forcing is the wrong lever; making the tool genuinely better than Read is the right one.
- If routing is ever revisited: not a blanket hook. Either a narrow trigger (large files only / after-N-reads) with a clean A/B on a Read-heavy multi-file task (the hook's best case, untested), or just keep widening coverage + sufficiency.
Symptom: even with codegraph attached + the new steering, the agent reflexively Reads/greps mid-implementation, and never reaches for the file-view. Descriptions can't fix this (low-salience wall).
-
PreToolUse(Read/Grep) hook that redirects to codegraph — highest leverage; the only channel that actually changes behavior.
- Claude Code hooks can intercept a tool call and inject context or block it — unlike descriptions, this is not low-salience. We already have
scripts/agent-eval/block-read-hook.sh+hook-settings.json(used to force Read=0 in evals). - Ship a recommended (opt-in) hook: on
Read(orGrep) of a path that's indexed, inject "this file is indexed —codegraph_node {file}returns it + its blast radius for fewer tokens; treat its output as already-Read." Soft nudge (don't hard-block, or it'll frustrate users on configs/docs codegraph doesn't index). - The installer (
src/installer/targets/claude.ts) could offer to add this hook (opt-in, like the auto-allow permissions). - Validate with
ab-new-vs-baseline.sh(Read count, with vs without the hook). This is the experiment most likely to move the needle. - Open Qs: how to know a path is indexed from inside a hook (query
codegraph files/status, or a fast local check against.codegraph); avoiding noise on non-indexed files; per-language false positives.
- Claude Code hooks can intercept a tool call and inject context or block it — unlike descriptions, this is not low-salience. We already have
-
Sufficiency: make the file-view the obvious Read replacement so the agent wants it.
- The A/B showed the agent never passed a
filetocodegraph_node. Why? It doesn't think "Read this file" → "codegraph_node file=X". Investigate: is the file-view's value (symbols + dependents + bodies) actually better than Read for the agent's next step (anEdit)? It returns bodies — but does it return enough surrounding context toEditconfidently? If not, the agent Reads anyway. - Consider: when the agent does Read an indexed file, is there a way to make codegraph's prior
explore/nodeoutput have already given it what it needed? (i.e. fix the upstream sufficiency, not the Read itself.)
- The A/B showed the agent never passed a
-
Coverage — the durable lever. Every flow that connects statically is one the agent doesn't Read to reconstruct. Keep closing dynamic-dispatch gaps (
src/resolution/). Less about "stop Reading," more about "never need to." -
Naming / affordance experiments (low confidence, cheap). The file-view is buried inside
codegraph_node. A dedicated, obviously-named affordance might get picked more — but "new tools fare worse," so this likely loses. If tried, A/B it; don't assume.
Recommendation: prototype idea 1 (the Read-redirect hook) and A/B it. It's the one lever with a real chance of moving behavior. Everything else is incremental.
Symptom: serve --mcp isn't ready when the agent's first turn fires (the host marks the MCP server status:"pending" / 0 tools), so the agent starts Read/grep and never uses codegraph. We saw this hard in nested evals (~2-3s startup vs the agent's turn-1); real users hit a milder version — the first query of a session may not have codegraph.
serve --mcp does a --liftoff-only re-exec (for a node memory flag) and spawns/binds a detached daemon before tools are usable. Under load that exceeds the host's MCP-startup window. (CODEGRAPH_WASM_RELAUNCHED=1 skips the re-exec; pre-warming a daemon removes the bind latency — both proven in ab-new-vs-baseline.sh. But a real user can't pre-warm.)
-
CODEGRAPH-SIDE — expose the static tool list INSTANTLY, decoupled from the daemon. Biggest shippable win; helps every user.
- Hypothesis: the host marks codegraph
pendingbecausetools/list(tool exposure) waits on the daemon connect. The local handshake already answersinitializefast (~107ms;runLocalHandshakeProxyinsrc/mcp/proxy.ts,getStaticToolsis imported there). Investigate: doesserve --mcpanswertools/listlocally and instantly fromgetStaticTools, or does it forward it to the still-connecting daemon? If the latter, decouple it: advertise the static tools the moment the client asks, mark connected, and resolve the daemon in the background for actual tool calls. - Verify with:
printf '<initialize>\n<initialized>\n<tools/list>\n' | node dist/bin/codegraph.js serve --mcp --path <repo>and time thetools/listresponse, daemon-mode vs in-process. In-process answered in ~165ms; daemon-mode is the suspect. - If this lands,
pending-at-startup largely disappears without any host change.
- Hypothesis: the host marks codegraph
-
CODEGRAPH-SIDE — speed/skip the re-exec on the MCP serve path. The re-exec exists for a V8 memory flag (
src/extraction/wasm-runtime-flags.ts,RELAUNCH_GUARD_ENV = CODEGRAPH_WASM_RELAUNCHED). For MCP serving on a normal repo the flag may be unnecessary, or settable without a full process re-exec. Removing one process spawn from the cold path shaves the startup window. -
CODEGRAPH-SIDE — a SessionStart hook that pre-warms the daemon. Ship an opt-in Claude Code
SessionStarthook (installer-added) that spawns/warms the daemon for the project at session start, so it's bound before the first query. Mitigation if (1) is hard. -
HOST-SIDE — "wait/retry on pending" — this is what you asked about, but it's a Claude Code (MCP client) behavior, not codegraph's to fix. codegraph can't make the agent retry. Options: (a) raise it with Anthropic as an MCP-client improvement (don't let the agent's first turn proceed until configured MCP servers finish connecting, or retry
pendingservers); (b) noteMCP_TIMEOUTexists but did not help here, because the problem is tool exposure timing, not a connection timeout. Frame this as a request, and lean on (1)–(3) for what we control.
Recommendation: chase idea 1 (decouple tools/list from the daemon). It's the fix that makes codegraph "connected" instantly for everyone. Ship idea 3 (pre-warm SessionStart hook) as a cheap mitigation in parallel. File the host-side request (4) but don't depend on it.
- Steering / tools:
src/mcp/server-instructions.ts(theinitializeinstructions — single source of truth),src/mcp/tools.ts(tool descriptions + handlers;handleNode/handleFileView/handleSearch,getStaticTools). - Startup / daemon / proxy:
src/mcp/proxy.ts(runProxy,connectWithHello,runLocalHandshakeProxy, PPID watchdog),src/mcp/index.ts(runProxyWithLocalHandshake,spawnDetachedDaemon),src/mcp/daemon.ts. - Runtime flags:
src/extraction/wasm-runtime-flags.ts(RELAUNCH_GUARD_ENV=CODEGRAPH_WASM_RELAUNCHED,HOST_PPID_ENV=CODEGRAPH_HOST_PPID). - Hooks (existing):
scripts/agent-eval/block-read-hook.sh,scripts/agent-eval/hook-settings.json(the eval's force-Read-0 hook — basis for the P1 redirect hook). - Installer (where to add a recommended hook):
src/installer/targets/claude.ts. - Eval harness:
scripts/agent-eval/ab-new-vs-baseline.sh(new-vs-baseline, pre-warm baked in),run-all.sh(with-vs-without),parse-run.mjs(tool-by-type counts;codegraph tools exposed: 0+ 0 codegraph calls = ran without). - Doctrine:
CLAUDE.md→ "Retrieval performance & dynamic-dispatch coverage" + the agent-eval note under "Validation methodology".
- P1 (Read displacement):
bash scripts/agent-eval/ab-new-vs-baseline.sh <indexed-repo> "<implementation task>" [baseline-ref]— compareReadvsmcp__codegraph__*counts. ≥2 runs/arm (n=1 is noisy). Run non-nested for cleanest results. Use a genuinely new feature task (verify it doesn't already exist — the first A/B attempt wasted a run on an already-implemented--quiet). - P2 (startup): time
tools/listfromserve --mcp(above); and count cold-start runs whereinitshowsconnected+ tools > 0. Don't trust a singlependinginit snapshot — confirm by whether the agent actually called codegraph.
- Descriptions/instructions are low-salience — A/B every behavioral claim, don't ship wording on faith.
- New tools < extending existing ones.
- The host's
initsnapshot can saypendingeven when the server then connects — judge by actual usage. - Don't run evals nested for "clean" numbers unless pre-warmed; even then, a real terminal is better.
- P2 idea 1 — verify whether
serve --mcpanswerstools/listlocally/instantly; if not, decouple it from the daemon. (Highest-value, shippable, helps all users, no behavioral guesswork.) - P1 idea 1 — prototype the PreToolUse(Read) redirect hook; A/B it. (Highest-value behavioral lever.)
- Ship the P2 SessionStart pre-warm hook as a mitigation; file the host-side wait/retry request.