From cf7db7cb9856935b34cae34ccc19eaea3ef72afd Mon Sep 17 00:00:00 2001
From: Colby Mchenry
Date: Wed, 20 May 2026 10:32:08 -0500
Subject: [PATCH 01/47] fix(mcp): skip fs.watch on WSL2 /mnt drives that hang MCP startup (#199) (#210)

Recursive fs.watch on a WSL2 /mnt NTFS/9p mount walks the directory tree
with every readdir/stat crossing the Windows boundary, stalling the event
loop long enough to blow past opencode's 30s MCP handshake timeout so the
tools never appear. This is the file-watcher half of the #172 fix, which
moved the DB/WASM open off the handshake but left the watcher on the
critical path.

- Add watchDisabledReason() policy: CODEGRAPH_NO_WATCH (off) >
  CODEGRAPH_FORCE_WATCH (force on) > WSL2 + /mnt auto-detect (off).
  FileWatcher.start() and the MCP server both honor it; the server now logs
  why watching is off and how to refresh.
- Add `codegraph serve --mcp --no-watch`.
- When watching is off, init/install offer git sync hooks (post-commit,
  post-merge, post-checkout) that run `codegraph sync` in the background, or
  fall back to manual sync; either way the user is told the index stays
  frozen until re-synced. uninit removes the hooks.
- Tests: watch-policy + git-hooks (idempotency, user-content preservation,
  core.hooksPath).

Root-cause analysis and workaround by @mengfanbo123.
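To check whether a given mount is actually slow before setting CODEGRAPH_FORCE_WATCH=1, one rough probe is to time the recursive fs.watch() call directly. The helper below is a hypothetical diagnostic. It is not part of this patch or of CodeGraph's API.

It assumes the Node version in use walks the directory tree synchronously inside fs.watch(), which is what the event-loop stall described above implies. On a version that defers the walk, the measured time undercounts.

```typescript
import * as fs from 'fs';
import { performance } from 'perf_hooks';

/**
 * Hypothetical diagnostic, not part of CodeGraph: measure how long a
 * recursive fs.watch() call on `dir` takes to return, in milliseconds.
 * Returns null if the call throws (for example, on Node versions that
 * do not support recursive watching on Linux).
 */
export function timeRecursiveWatchSetup(dir: string): number | null {
  try {
    const start = performance.now();
    const watcher = fs.watch(dir, { recursive: true });
    const elapsedMs = performance.now() - start;
    watcher.close(); // release the watch handles right away
    return elapsedMs;
  } catch {
    return null;
  }
}
```

Calling it once on the project under /mnt/d and once on a copy under the Linux home directory gives a rough side-by-side comparison. A large gap between the two is the symptom this patch works around.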
Closes #199

Co-authored-by: Claude Opus 4.7 (1M context)
---
 CHANGELOG.md                   |  24 ++++
 __tests__/git-hooks.test.ts    | 129 ++++++++++++++++++++
 __tests__/watch-policy.test.ts |  95 +++++++++++++++
 src/bin/codegraph.ts           |  27 ++++-
 src/installer/index.ts         |  91 ++++++++++++++-
 src/mcp/index.ts               |  18 +++
 src/sync/git-hooks.ts          | 208 +++++++++++++++++++++++++++++++++
 src/sync/index.ts              |  12 ++
 src/sync/watch-policy.ts       | 104 +++++++++++++++++
 src/sync/watcher.ts            |  11 ++
 10 files changed, 714 insertions(+), 5 deletions(-)
 create mode 100644 __tests__/git-hooks.test.ts
 create mode 100644 __tests__/watch-policy.test.ts
 create mode 100644 src/sync/git-hooks.ts
 create mode 100644 src/sync/watch-policy.ts

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2f993857..ae3d8c7d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -20,6 +20,17 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
   the line number while the line-numbered arm answered with zero follow-up
   tool calls. Payload cost is small (~3-5%). Set
   `CODEGRAPH_EXPLORE_LINENUMS=0` to disable.
+- **MCP / watcher**: CodeGraph now skips the live file watcher on WSL2
+  `/mnt/*` drives, where recursive `fs.watch` is slow enough to break MCP
+  startup (see Fixed). When the watcher is off, `codegraph init` /
+  `codegraph install` offer to keep the index fresh via git hooks
+  (`post-commit`, `post-merge`, `post-checkout`) that run `codegraph sync`
+  in the background — accept for automatic refresh on commit / pull /
+  checkout, or decline and sync by hand. Either way you're told the index
+  stays frozen until it's re-synced. New controls: `CODEGRAPH_NO_WATCH=1`
+  (or `codegraph serve --mcp --no-watch`) forces the watcher off anywhere;
+  `CODEGRAPH_FORCE_WATCH=1` overrides the WSL auto-detect when your `/mnt`
+  setup is actually fast. `codegraph uninit` removes any hooks it installed.
 
 ### Changed
 - **MCP / explore**: `codegraph_explore` output is now adaptive to project
@@ -46,6 +57,19 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
   Thanks to [@essopsp](https://github.com/essopsp) for the repro.
 
 ### Fixed
+- **MCP**: the server no longer hangs on startup under WSL2 when the project
+  lives on an NTFS `/mnt/*` mount. Setting up the recursive file watcher
+  there took tens of seconds — every directory read crosses the Windows/9p
+  boundary — which blew past the host's initialization timeout (opencode's
+  30s), so the codegraph tools silently never appeared, even on small
+  projects. This is the file-watcher half of the
+  [#172](https://github.com/colbymchenry/codegraph/issues/172) startup fix:
+  that one moved the database/WASM open off the handshake, but the watcher
+  setup was still on the critical path. CodeGraph now auto-skips the watcher
+  on those mounts, with manual and git-hook sync fallbacks (see Added).
+  Closes [#199](https://github.com/colbymchenry/codegraph/issues/199).
+  Thanks to [@mengfanbo123](https://github.com/mengfanbo123) for the precise
+  root-cause analysis and workaround.
 - **Installer (Claude Code)**: project-local installs (`Just this project`)
   now write the MCP server to `.mcp.json` in the project root — the file
   Claude Code actually reads for project-scoped servers. Previously they
diff --git a/__tests__/git-hooks.test.ts b/__tests__/git-hooks.test.ts
new file mode 100644
index 00000000..4dfd80eb
--- /dev/null
+++ b/__tests__/git-hooks.test.ts
@@ -0,0 +1,129 @@
+/**
+ * Git Sync Hooks Tests
+ *
+ * Covers installing/removing the opt-in commit/merge/checkout hooks that
+ * keep the index fresh when the live watcher is disabled (issue #199).
+ * Exercises real git repos in temp dirs — no mocking.
+ */
+
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import { execFileSync } from 'child_process';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as os from 'os';
+import {
+  installGitSyncHook,
+  removeGitSyncHook,
+  isSyncHookInstalled,
+  isGitRepo,
+  DEFAULT_SYNC_HOOKS,
+} from '../src/sync/git-hooks';
+
+function gitInit(dir: string): void {
+  execFileSync('git', ['init', '-q'], { cwd: dir, stdio: 'ignore' });
+}
+
+function isExecutable(file: string): boolean {
+  if (process.platform === 'win32') return true; // mode bits not meaningful
+  return (fs.statSync(file).mode & 0o111) !== 0;
+}
+
+describe('git sync hooks', () => {
+  let repo: string;
+
+  beforeEach(() => {
+    repo = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-githooks-'));
+  });
+
+  afterEach(() => {
+    if (fs.existsSync(repo)) fs.rmSync(repo, { recursive: true, force: true });
+  });
+
+  it('installs all default hooks, executable, invoking codegraph sync', () => {
+    gitInit(repo);
+    const result = installGitSyncHook(repo);
+
+    expect(result.installed.sort()).toEqual([...DEFAULT_SYNC_HOOKS].sort());
+    expect(result.skipped).toBeUndefined();
+
+    for (const hook of DEFAULT_SYNC_HOOKS) {
+      const file = path.join(repo, '.git', 'hooks', hook);
+      expect(fs.existsSync(file)).toBe(true);
+      const body = fs.readFileSync(file, 'utf8');
+      expect(body).toContain('codegraph sync');
+      expect(body).toContain('command -v codegraph'); // no-op when not on PATH
+      expect(isExecutable(file)).toBe(true);
+    }
+    expect(isSyncHookInstalled(repo)).toBe(true);
+  });
+
+  it('is idempotent — re-install does not duplicate the block', () => {
+    gitInit(repo);
+    installGitSyncHook(repo);
+    installGitSyncHook(repo);
+
+    const body = fs.readFileSync(path.join(repo, '.git', 'hooks', 'post-commit'), 'utf8');
+    const occurrences = body.split('# >>> codegraph sync hook >>>').length - 1;
+    expect(occurrences).toBe(1);
+  });
+
+  it('preserves a pre-existing user hook and appends our block', () => {
+    gitInit(repo);
+    const file = path.join(repo, '.git', 'hooks', 'post-commit');
+    fs.writeFileSync(file, '#!/bin/sh\necho "my custom hook"\n', { mode: 0o755 });
+
+    installGitSyncHook(repo, ['post-commit']);
+
+    const body = fs.readFileSync(file, 'utf8');
+    expect(body).toContain('echo "my custom hook"');
+    expect(body).toContain('codegraph sync');
+  });
+
+  it('remove strips our block; deletes a hook that was only ours', () => {
+    gitInit(repo);
+    installGitSyncHook(repo, ['post-commit']);
+    const file = path.join(repo, '.git', 'hooks', 'post-commit');
+    expect(fs.existsSync(file)).toBe(true);
+
+    const result = removeGitSyncHook(repo, ['post-commit']);
+    expect(result.installed).toEqual(['post-commit']);
+    expect(fs.existsSync(file)).toBe(false); // was ours-only → deleted
+    expect(isSyncHookInstalled(repo)).toBe(false);
+  });
+
+  it('remove keeps user content when the hook is shared', () => {
+    gitInit(repo);
+    const file = path.join(repo, '.git', 'hooks', 'post-commit');
+    fs.writeFileSync(file, '#!/bin/sh\necho "keep me"\n', { mode: 0o755 });
+    installGitSyncHook(repo, ['post-commit']);
+
+    removeGitSyncHook(repo, ['post-commit']);
+
+    expect(fs.existsSync(file)).toBe(true);
+    const body = fs.readFileSync(file, 'utf8');
+    expect(body).toContain('echo "keep me"');
+    expect(body).not.toContain('codegraph sync');
+  });
+
+  it('honors core.hooksPath', () => {
+    gitInit(repo);
+    const customHooks = path.join(repo, '.husky');
+    fs.mkdirSync(customHooks);
+    execFileSync('git', ['config', 'core.hooksPath', '.husky'], { cwd: repo, stdio: 'ignore' });
+
+    const result = installGitSyncHook(repo, ['post-commit']);
+    expect(result.hooksDir).toBe(customHooks);
+    expect(fs.existsSync(path.join(customHooks, 'post-commit'))).toBe(true);
+    // The default .git/hooks dir should NOT have received the hook.
+    expect(fs.existsSync(path.join(repo, '.git', 'hooks', 'post-commit'))).toBe(false);
+  });
+
+  it('skips cleanly when not a git repository', () => {
+    expect(isGitRepo(repo)).toBe(false);
+    const result = installGitSyncHook(repo);
+    expect(result.installed).toEqual([]);
+    expect(result.hooksDir).toBeNull();
+    expect(result.skipped).toMatch(/not a git repository/);
+    expect(isSyncHookInstalled(repo)).toBe(false);
+  });
+});
diff --git a/__tests__/watch-policy.test.ts b/__tests__/watch-policy.test.ts
new file mode 100644
index 00000000..ee50d8c9
--- /dev/null
+++ b/__tests__/watch-policy.test.ts
@@ -0,0 +1,95 @@
+/**
+ * Watch Policy Tests
+ *
+ * Covers the decision of whether the live file watcher runs, including the
+ * WSL2 /mnt auto-detect and the env-var escape hatches (issue #199), plus
+ * that FileWatcher.start() honors the decision.
+ */
+
+import { describe, it, expect, afterEach, vi } from 'vitest';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as os from 'os';
+import { watchDisabledReason } from '../src/sync/watch-policy';
+import { FileWatcher } from '../src/sync/watcher';
+import type { CodeGraphConfig } from '../src/types';
+
+describe('watchDisabledReason', () => {
+  it('returns a reason when CODEGRAPH_NO_WATCH=1', () => {
+    const reason = watchDisabledReason('/home/me/project', {
+      env: { CODEGRAPH_NO_WATCH: '1' },
+      isWsl: false,
+    });
+    expect(reason).toBeTruthy();
+    expect(reason).toMatch(/CODEGRAPH_NO_WATCH/);
+  });
+
+  it('auto-disables on a WSL2 /mnt drive', () => {
+    const reason = watchDisabledReason('/mnt/d/code/project', { env: {}, isWsl: true });
+    expect(reason).toBeTruthy();
+    expect(reason).toMatch(/mnt/);
+  });
+
+  it('does NOT disable on a native WSL home path', () => {
+    expect(watchDisabledReason('/home/me/project', { env: {}, isWsl: true })).toBeNull();
+  });
+
+  it('does NOT disable on /mnt when not running under WSL', () => {
+    // A real Linux box may legitimately have a fast /mnt mount.
+    expect(watchDisabledReason('/mnt/d/code/project', { env: {}, isWsl: false })).toBeNull();
+  });
+
+  it('does NOT treat /mnt/wsl (fast Linux mount) as a Windows drive', () => {
+    expect(watchDisabledReason('/mnt/wsl/project', { env: {}, isWsl: true })).toBeNull();
+  });
+
+  it('CODEGRAPH_FORCE_WATCH=1 overrides WSL auto-detect', () => {
+    const reason = watchDisabledReason('/mnt/d/code/project', {
+      env: { CODEGRAPH_FORCE_WATCH: '1' },
+      isWsl: true,
+    });
+    expect(reason).toBeNull();
+  });
+
+  it('CODEGRAPH_NO_WATCH wins over CODEGRAPH_FORCE_WATCH', () => {
+    const reason = watchDisabledReason('/home/me/project', {
+      env: { CODEGRAPH_NO_WATCH: '1', CODEGRAPH_FORCE_WATCH: '1' },
+      isWsl: false,
+    });
+    expect(reason).toBeTruthy();
+  });
+});
+
+describe('FileWatcher honors the watch policy', () => {
+  let testDir: string;
+
+  const baseConfig: CodeGraphConfig = {
+    version: 1,
+    rootDir: '.',
+    include: ['**/*.ts'],
+    exclude: ['**/node_modules/**'],
+    languages: [],
+    frameworks: [],
+    maxFileSize: 1024 * 1024,
+    extractDocstrings: true,
+    trackCallSites: true,
+  };
+
+  afterEach(() => {
+    delete process.env.CODEGRAPH_NO_WATCH;
+    if (testDir && fs.existsSync(testDir)) {
+      fs.rmSync(testDir, { recursive: true, force: true });
+    }
+  });
+
+  it('does not start when CODEGRAPH_NO_WATCH=1', () => {
+    testDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-nowatch-'));
+    process.env.CODEGRAPH_NO_WATCH = '1';
+
+    const syncFn = vi.fn().mockResolvedValue({ filesChanged: 0, durationMs: 0 });
+    const watcher = new FileWatcher(testDir, baseConfig, syncFn);
+
+    expect(watcher.start()).toBe(false);
+    expect(watcher.isActive()).toBe(false);
+  });
+});
diff --git a/src/bin/codegraph.ts b/src/bin/codegraph.ts
index 2b497b98..de608c36 100644
--- a/src/bin/codegraph.ts
+++ b/src/bin/codegraph.ts
@@ -415,6 +415,10 @@ program
           clack.log.success(`${target.displayName}: ${file.action} ${file.path}`);
         }
       } catch { /* non-fatal */ }
+      try {
+        const { offerWatchFallback } = await import('../installer');
+        await offerWatchFallback(clack, projectPath);
+      } catch { /* non-fatal */ }
       clack.outro('');
       return;
     }
@@ -459,6 +463,11 @@ program
       clack.log.info('Run "codegraph index" to index the project');
     }
 
+    try {
+      const { offerWatchFallback } = await import('../installer');
+      await offerWatchFallback(clack, projectPath);
+    } catch { /* non-fatal */ }
+
     clack.outro('Done');
     cg.destroy();
   } catch (err) {
@@ -505,6 +514,15 @@ program
     const cg = CodeGraph.openSync(projectPath);
     cg.uninitialize();
 
+    // Clean up any git sync hooks we installed (no-op if none / not a repo).
+    try {
+      const { removeGitSyncHook } = await import('../sync/git-hooks');
+      const removed = removeGitSyncHook(projectPath);
+      if (removed.installed.length > 0) {
+        info(`Removed git ${removed.installed.join(', ')} sync hook${removed.installed.length > 1 ? 's' : ''}`);
+      }
+    } catch { /* non-fatal */ }
+
     success(`Removed CodeGraph from ${projectPath}`);
   } catch (err) {
     error(`Failed to uninitialize: ${err instanceof Error ? err.message : String(err)}`);
@@ -1085,9 +1103,16 @@ program
   .description('Start CodeGraph as an MCP server for AI assistants')
   .option('-p, --path ', 'Project path (optional for MCP mode, uses rootUri from client)')
   .option('--mcp', 'Run as MCP server (stdio transport)')
-  .action(async (options: { path?: string; mcp?: boolean }) => {
+  .option('--no-watch', 'Disable the file watcher (no auto-sync; useful on slow filesystems like WSL2 /mnt drives)')
+  .action(async (options: { path?: string; mcp?: boolean; watch?: boolean }) => {
     const projectPath = options.path ? resolveProjectPath(options.path) : undefined;
 
+    // Commander sets watch=false when --no-watch is passed. Route it through
+    // the same env-var chokepoint the watcher and MCP server already honor.
+    if (options.watch === false) {
+      process.env.CODEGRAPH_NO_WATCH = '1';
+    }
+
     try {
       if (options.mcp) {
         // Start MCP server - it handles initialization lazily based on rootUri from client
diff --git a/src/installer/index.ts b/src/installer/index.ts
index 833759da..687fc884 100644
--- a/src/installer/index.ts
+++ b/src/installer/index.ts
@@ -22,6 +22,11 @@ import {
 } from './targets/registry';
 import type { AgentTarget, Location, WriteResult } from './targets/types';
 import { getGlyphs } from '../ui/glyphs';
+// Import the lightweight submodules directly (not the ../sync barrel, which
+// re-exports FileWatcher and would transitively pull in ../extraction — the
+// installer must stay importable even when native modules can't load).
+import { watchDisabledReason } from '../sync/watch-policy';
+import { isGitRepo, isSyncHookInstalled, installGitSyncHook } from '../sync/git-hooks';
 
 // Backwards-compat: keep these named exports — downstream code may
 // import them. The shim in `config-writer.ts` continues to re-export
@@ -198,7 +203,7 @@ export async function runInstallerWithOptions(opts: RunInstallerOptions): Promis
 
   // Step 6: for local install, initialize the project.
   if (location === 'local') {
-    await initializeLocalProject(clack);
+    await initializeLocalProject(clack, useDefaults);
   }
 
   if (location === 'global') {
@@ -304,10 +309,14 @@ async function resolveTargets(
 }
 
 /**
- * Initialize CodeGraph in the current project (for local installs).
- * Unchanged from the pre-refactor version — agent-agnostic by nature.
+ * Initialize CodeGraph in the current project (for local installs), then
+ * offer the watch fallback when the live watcher won't run here (see
+ * offerWatchFallback). Agent-agnostic by nature.
  */
-async function initializeLocalProject(clack: typeof import('@clack/prompts')): Promise {
+async function initializeLocalProject(
+  clack: typeof import('@clack/prompts'),
+  useDefaults = false,
+): Promise {
   const projectPath = process.cwd();
 
   let CodeGraph: typeof import('../index').default;
@@ -323,6 +332,7 @@ async function initializeLocalProject(clack: typeof import('@clack/prompts')): P
   // Check if already initialized
   if (CodeGraph.isInitialized(projectPath)) {
     clack.log.info('CodeGraph already initialized in this project');
+    await offerWatchFallback(clack, projectPath, { yes: useDefaults });
     return;
   }
 
@@ -348,4 +358,77 @@ async function initializeLocalProject(clack: typeof import('@clack/prompts')): P
   }
 
   cg.close();
+
+  await offerWatchFallback(clack, projectPath, { yes: useDefaults });
+}
+
+/**
+ * When the live file watcher will be disabled for this project (e.g. WSL2
+ * /mnt drives, or CODEGRAPH_NO_WATCH), the index would silently go stale.
+ * Explain that, and offer to keep it fresh automatically via git hooks
+ * (commit / pull / checkout) instead of manual `codegraph sync`.
+ *
+ * No-op on environments where the watcher runs normally, so it's safe to
+ * call unconditionally after init.
+ */
+export async function offerWatchFallback(
+  clack: typeof import('@clack/prompts'),
+  projectPath: string,
+  opts: { yes?: boolean } = {},
+): Promise {
+  const reason = watchDisabledReason(projectPath);
+  if (!reason) return; // Watcher runs normally — nothing to set up.
+
+  clack.log.warn(`Live file watching is disabled here — ${reason}.`);
+  clack.log.info('Until you re-sync, the CodeGraph index stays frozen — it will not pick up edits on its own.');
+
+  // No git repo → the commit-hook path doesn't apply; point at manual sync.
+  if (!isGitRepo(projectPath)) {
+    clack.log.info('Run `codegraph sync` after changing files to refresh the index.');
+    return;
+  }
+
+  // Already wired up on a previous run — confirm and move on without nagging.
+  if (isSyncHookInstalled(projectPath)) {
+    clack.log.info('Git sync hooks are already installed — the index refreshes after commit / pull / checkout.');
+    return;
+  }
+
+  let choice: 'hook' | 'manual';
+  if (opts.yes) {
+    choice = 'hook';
+  } else {
+    const sel = await clack.select({
+      message: 'How should CodeGraph keep its index fresh?',
+      options: [
+        { value: 'hook' as const, label: 'Sync on git commit / pull / checkout', hint: 'installs git hooks (recommended)' },
+        { value: 'manual' as const, label: 'I\'ll run `codegraph sync` myself', hint: 'fully manual' },
+      ],
+      initialValue: 'hook' as const,
+    });
+    if (clack.isCancel(sel)) {
+      clack.log.info('Skipped — run `codegraph sync` after changes to refresh the index.');
+      return;
+    }
+    choice = sel;
+  }
+
+  if (choice === 'manual') {
+    clack.log.info('Run `codegraph sync` after changing files to refresh the index.');
+    return;
+  }
+
+  const result = installGitSyncHook(projectPath);
+  if (result.installed.length > 0) {
+    clack.log.success(
+      `Installed git ${result.installed.join(', ')} hook${result.installed.length > 1 ? 's' : ''} — ` +
+      'the index refreshes in the background after each.',
+    );
+    clack.log.info('Run `codegraph sync` anytime to refresh immediately.');
+  } else {
+    clack.log.warn(
+      `Could not install git hooks${result.skipped ? ` (${result.skipped})` : ''}. ` +
+      'Run `codegraph sync` after changes instead.',
+    );
+  }
 }
diff --git a/src/mcp/index.ts b/src/mcp/index.ts
index 924fd77e..b601c36e 100644
--- a/src/mcp/index.ts
+++ b/src/mcp/index.ts
@@ -17,6 +17,7 @@
 
 import * as path from 'path';
 import CodeGraph, { findNearestCodeGraphRoot } from '../index';
+import { watchDisabledReason } from '../sync';
 import { StdioTransport, JsonRpcRequest, JsonRpcNotification, ErrorCodes } from './transport';
 import { tools, ToolHandler } from './tools';
 import { SERVER_INSTRUCTIONS } from './server-instructions';
@@ -173,6 +174,18 @@ export class MCPServer {
   private startWatching(): void {
     if (!this.cg) return;
 
+    // When the watcher is intentionally disabled (e.g. WSL2 /mnt drives, or
+    // CODEGRAPH_NO_WATCH=1), say so explicitly and tell the user how to keep
+    // the graph fresh — otherwise the silent staleness is hard to diagnose.
+    const disabledReason = watchDisabledReason(this.projectPath ?? process.cwd());
+    if (disabledReason) {
+      process.stderr.write(
+        `[CodeGraph MCP] File watcher disabled — ${disabledReason}. ` +
+        `The graph will not auto-update; run \`codegraph sync\` (or install the git sync hooks via \`codegraph init\`) to refresh.\n`
+      );
+      return;
+    }
+
     const started = this.cg.watch({
       onSyncComplete: (result) => {
         if (result.filesChanged > 0) {
@@ -188,6 +201,11 @@ export class MCPServer {
 
     if (started) {
       process.stderr.write('[CodeGraph MCP] File watcher active — graph will auto-sync on changes\n');
+    } else {
+      // start() can also return false when recursive fs.watch isn't supported.
+      process.stderr.write(
+        '[CodeGraph MCP] File watcher unavailable on this platform — run `codegraph sync` to refresh the graph after changes.\n'
+      );
     }
   }
 
diff --git a/src/sync/git-hooks.ts b/src/sync/git-hooks.ts
new file mode 100644
index 00000000..3344c5ff
--- /dev/null
+++ b/src/sync/git-hooks.ts
@@ -0,0 +1,208 @@
+/**
+ * Git Sync Hooks
+ *
+ * When the live file watcher is disabled (e.g. on WSL2 `/mnt/*` drives,
+ * see watch-policy.ts), the CodeGraph index would otherwise go stale until
+ * the user runs `codegraph sync` by hand. As an opt-in alternative, we can
+ * install git hooks that refresh the index after the operations that change
+ * files on disk: commit, merge (covers `git pull`), and checkout.
+ *
+ * The hooks run `codegraph sync` in the background so they never block git,
+ * and are guarded by `command -v codegraph` so they no-op cleanly when the
+ * CLI isn't on PATH. Our snippet is delimited by marker comments so install
+ * is idempotent and removal preserves any user-authored hook content.
+ */
+
+import * as fs from 'fs';
+import * as path from 'path';
+import { execFileSync } from 'child_process';
+
+const MARKER_BEGIN = '# >>> codegraph sync hook >>>';
+const MARKER_END = '# <<< codegraph sync hook <<<';
+
+export type GitHookName = 'post-commit' | 'post-merge' | 'post-checkout';
+
+/** Hooks installed by default: commit, merge (git pull), and checkout. */
+export const DEFAULT_SYNC_HOOKS: GitHookName[] = ['post-commit', 'post-merge', 'post-checkout'];
+
+export interface GitHookResult {
+  /** Hook names that were created or updated. */
+  installed: GitHookName[];
+  /** Resolved hooks directory, or null when not a git repo. */
+  hooksDir: string | null;
+  /** Reason nothing happened (e.g. not a git repository). */
+  skipped?: string;
+}
+
+/**
+ * Whether `projectRoot` is inside a git working tree. Returns false if git
+ * isn't installed or the path isn't a repo.
+ */
+export function isGitRepo(projectRoot: string): boolean {
+  try {
+    const out = execFileSync('git', ['rev-parse', '--is-inside-work-tree'], {
+      cwd: projectRoot,
+      encoding: 'utf8',
+      stdio: ['ignore', 'pipe', 'ignore'],
+    }).trim();
+    return out === 'true';
+  } catch {
+    return false;
+  }
+}
+
+/**
+ * Resolve the git hooks directory for a project, honoring `core.hooksPath`
+ * and git worktrees. Returns an absolute path, or null when not a repo.
+ */
+function gitHooksDir(projectRoot: string): string | null {
+  try {
+    const out = execFileSync('git', ['rev-parse', '--git-path', 'hooks'], {
+      cwd: projectRoot,
+      encoding: 'utf8',
+      stdio: ['ignore', 'pipe', 'ignore'],
+    }).trim();
+    if (!out) return null;
+    return path.isAbsolute(out) ? out : path.resolve(projectRoot, out);
+  } catch {
+    return null;
+  }
+}
+
+/** The shell snippet (between markers) injected into each hook. */
+function markerBlock(): string {
+  return [
+    MARKER_BEGIN,
+    '# Keeps the CodeGraph index fresh while the live file watcher is off',
+    '# (e.g. WSL2 /mnt drives). Runs in the background so it never blocks git.',
+    '# Managed by codegraph; remove with `codegraph uninit` or delete this block.',
+    'if command -v codegraph >/dev/null 2>&1; then',
+    '  ( codegraph sync >/dev/null 2>&1 & ) >/dev/null 2>&1',
+    'fi',
+    MARKER_END,
+  ].join('\n');
+}
+
+/** Remove our marker block (and the marker lines) from hook content. */
+function stripMarkerBlock(content: string): string {
+  const lines = content.split('\n');
+  const kept: string[] = [];
+  let inBlock = false;
+  for (const line of lines) {
+    const trimmed = line.trim();
+    if (trimmed === MARKER_BEGIN) { inBlock = true; continue; }
+    if (trimmed === MARKER_END) { inBlock = false; continue; }
+    if (!inBlock) kept.push(line);
+  }
+  return kept.join('\n');
+}
+
+/** Whether a hook body is just a shebang / blank lines (i.e. only ever ours). */
+function isEffectivelyEmpty(content: string): boolean {
+  return content
+    .split('\n')
+    .map((l) => l.trim())
+    .every((l) => l.length === 0 || l.startsWith('#!'));
+}
+
+function chmodExecutable(file: string): void {
+  try {
+    fs.chmodSync(file, 0o755);
+  } catch {
+    /* chmod is a no-op / unsupported on some platforms (e.g. Windows) */
+  }
+}
+
+/**
+ * Install (or update) the CodeGraph sync hooks in a git repository.
+ * Idempotent: re-running replaces our marker block rather than duplicating
+ * it, and any user-authored hook content is preserved.
+ */
+export function installGitSyncHook(
+  projectRoot: string,
+  hooks: GitHookName[] = DEFAULT_SYNC_HOOKS,
+): GitHookResult {
+  const hooksDir = gitHooksDir(projectRoot);
+  if (!hooksDir) {
+    return { installed: [], hooksDir: null, skipped: 'not a git repository' };
+  }
+
+  try {
+    fs.mkdirSync(hooksDir, { recursive: true });
+  } catch {
+    return { installed: [], hooksDir, skipped: 'could not access the git hooks directory' };
+  }
+
+  const block = markerBlock();
+  const installed: GitHookName[] = [];
+
+  for (const hook of hooks) {
+    const file = path.join(hooksDir, hook);
+    let content: string;
+
+    if (fs.existsSync(file)) {
+      // Strip any prior block, then re-append the current one.
+      const base = stripMarkerBlock(fs.readFileSync(file, 'utf8')).replace(/\s*$/, '');
+      content = base.length > 0
+        ? `${base}\n\n${block}\n`
+        : `#!/bin/sh\n${block}\n`;
+    } else {
+      content = `#!/bin/sh\n${block}\n`;
+    }
+
+    fs.writeFileSync(file, content);
+    chmodExecutable(file);
+    installed.push(hook);
+  }
+
+  return { installed, hooksDir };
+}
+
+/**
+ * Remove the CodeGraph sync hooks. Strips only our marker block; deletes the
+ * hook file entirely when nothing but a shebang remains, otherwise rewrites
+ * the user's content untouched.
+ */
+export function removeGitSyncHook(
+  projectRoot: string,
+  hooks: GitHookName[] = DEFAULT_SYNC_HOOKS,
+): GitHookResult {
+  const hooksDir = gitHooksDir(projectRoot);
+  if (!hooksDir) {
+    return { installed: [], hooksDir: null, skipped: 'not a git repository' };
+  }
+
+  const removed: GitHookName[] = [];
+
+  for (const hook of hooks) {
+    const file = path.join(hooksDir, hook);
+    if (!fs.existsSync(file)) continue;
+
+    const original = fs.readFileSync(file, 'utf8');
+    if (!original.includes(MARKER_BEGIN)) continue;
+
+    const stripped = stripMarkerBlock(original);
+    if (isEffectivelyEmpty(stripped)) {
+      fs.unlinkSync(file);
+    } else {
+      fs.writeFileSync(file, `${stripped.replace(/\s*$/, '')}\n`);
+      chmodExecutable(file);
+    }
+    removed.push(hook);
+  }
+
+  return { installed: removed, hooksDir };
+}
+
+/** Whether any CodeGraph sync hook is currently installed. */
+export function isSyncHookInstalled(
+  projectRoot: string,
+  hooks: GitHookName[] = DEFAULT_SYNC_HOOKS,
+): boolean {
+  const hooksDir = gitHooksDir(projectRoot);
+  if (!hooksDir) return false;
+  return hooks.some((hook) => {
+    const file = path.join(hooksDir, hook);
+    return fs.existsSync(file) && fs.readFileSync(file, 'utf8').includes(MARKER_BEGIN);
+  });
+}
diff --git a/src/sync/index.ts b/src/sync/index.ts
index 51b8b6f6..1857c5a4 100644
--- a/src/sync/index.ts
+++ b/src/sync/index.ts
@@ -6,8 +6,20 @@
  *
  * Components:
  * - FileWatcher: Debounced fs.watch that auto-triggers sync on file changes
+ * - Watch policy: decides when the watcher must be disabled (e.g. WSL2 /mnt)
+ * - Git sync hooks: opt-in commit/merge/checkout hooks when watching is off
  * - Content hashing for change detection (in extraction module)
  * - Incremental reindexing (in extraction module)
  */
 
 export { FileWatcher, WatchOptions } from './watcher';
+export { watchDisabledReason, detectWsl } from './watch-policy';
+export {
+  installGitSyncHook,
+  removeGitSyncHook,
+  isSyncHookInstalled,
+  isGitRepo,
+  DEFAULT_SYNC_HOOKS,
+  type GitHookName,
+  type GitHookResult,
+} from './git-hooks';
diff --git a/src/sync/watch-policy.ts b/src/sync/watch-policy.ts
new file mode 100644
index 00000000..426a8869
--- /dev/null
+++ b/src/sync/watch-policy.ts
@@ -0,0 +1,104 @@
+/**
+ * Watch Policy
+ *
+ * Decides whether the live file watcher should run for a given project.
+ *
+ * Native recursive `fs.watch` is pathologically slow on WSL2 `/mnt/*`
+ * drives (NTFS exposed over the 9p/drvfs bridge): setting up the recursive
+ * watch walks the directory tree, and every readdir/stat crosses the
+ * Windows boundary. Inside an MCP server this stalls the event loop during
+ * startup long enough to blow past host handshake timeouts (opencode's 30s),
+ * so the tools never appear. See issue #199.
+ *
+ * This module centralizes the on/off decision so the watcher, the MCP
+ * server (for diagnostics), and the installer all agree.
+ */
+
+import * as fs from 'fs';
+import { normalizePath } from '../utils';
+
+let wslChecked = false;
+let wslValue = false;
+
+/**
+ * Detect whether the current process is running under WSL (Windows
+ * Subsystem for Linux). Result is cached after the first call.
+ *
+ * Checks the WSL-specific env vars first (no I/O), then falls back to
+ * `/proc/version`, which contains "microsoft" on WSL kernels.
+ */
+export function detectWsl(): boolean {
+  if (wslChecked) return wslValue;
+  wslChecked = true;
+
+  if (process.platform !== 'linux') {
+    wslValue = false;
+    return wslValue;
+  }
+  if (process.env.WSL_DISTRO_NAME || process.env.WSL_INTEROP) {
+    wslValue = true;
+    return wslValue;
+  }
+  try {
+    const version = fs.readFileSync('/proc/version', 'utf8').toLowerCase();
+    wslValue = version.includes('microsoft') || version.includes('wsl');
+  } catch {
+    wslValue = false;
+  }
+  return wslValue;
+}
+
+/**
+ * True for WSL Windows-drive mounts like `/mnt/c` or `/mnt/d/project`.
+ * Deliberately matches only single-letter drive mounts, so genuinely fast
+ * Linux mounts such as `/mnt/wsl/...` are not flagged.
+ */
+function isWindowsDriveMount(projectRoot: string): boolean {
+  return /^\/mnt\/[a-z](\/|$)/i.test(normalizePath(projectRoot));
+}
+
+/**
+ * Inputs that can be overridden in tests so the decision is deterministic
+ * without touching real env vars or `/proc/version`.
+ */
+export interface WatchProbe {
+  /** Defaults to `process.env`. */
+  env?: NodeJS.ProcessEnv;
+  /** Defaults to `detectWsl()`. */
+  isWsl?: boolean;
+}
+
+/**
+ * Decide whether the file watcher should be disabled for a project, and why.
+ *
+ * Returns a short human-readable reason when watching should be skipped, or
+ * `null` when it should run normally.
+ *
+ * Precedence (first match wins):
+ *   1. `CODEGRAPH_NO_WATCH=1`    → off (explicit opt-out always wins)
+ *   2. `CODEGRAPH_FORCE_WATCH=1` → on  (overrides auto-detection)
+ *   3. WSL2 + `/mnt/*` drive     → off (recursive fs.watch is too slow; #199)
+ */
+export function watchDisabledReason(projectRoot: string, probe: WatchProbe = {}): string | null {
+  const env = probe.env ?? process.env;
+
+  if (env.CODEGRAPH_NO_WATCH === '1') {
+    return 'CODEGRAPH_NO_WATCH=1 is set';
+  }
+  if (env.CODEGRAPH_FORCE_WATCH === '1') {
+    return null;
+  }
+
+  const isWsl = probe.isWsl ?? detectWsl();
+  if (isWsl && isWindowsDriveMount(projectRoot)) {
+    return 'project is on a WSL2 /mnt/ drive, where recursive fs.watch is too slow to be reliable';
+  }
+
+  return null;
+}
+
+/** Test-only: reset the cached WSL detection. */
+export function __resetWslCacheForTests(): void {
+  wslChecked = false;
+  wslValue = false;
+}
diff --git a/src/sync/watcher.ts b/src/sync/watcher.ts
index d3ef24b3..2c16d82a 100644
--- a/src/sync/watcher.ts
+++ b/src/sync/watcher.ts
@@ -13,6 +13,7 @@ import { CodeGraphConfig } from '../types';
 import { shouldIncludeFile } from '../extraction';
 import { logDebug, logWarn } from '../errors';
 import { normalizePath } from '../utils';
+import { watchDisabledReason } from './watch-policy';
 
 /**
  * Options for the file watcher
@@ -82,6 +83,16 @@ export class FileWatcher {
     if (this.watcher) return true; // Already watching
     this.stopped = false;
 
+    // Some environments make recursive fs.watch unusable — most notably WSL2
+    // /mnt/ drives, where setup blocks long enough to break MCP startup
+    // handshakes (issue #199). Skip watching there; callers fall back to
+    // manual `codegraph sync` or the git sync hooks.
+ const disabledReason = watchDisabledReason(this.projectRoot); + if (disabledReason) { + logDebug('File watcher disabled', { reason: disabledReason, projectRoot: this.projectRoot }); + return false; + } + try { this.watcher = fs.watch( this.projectRoot, From 1cd162a66da5475d2590ef6731512e20f5e90b93 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Wed, 20 May 2026 11:35:13 -0500 Subject: [PATCH 02/47] fix(mcp): auto-detect project via roots/list when no rootUri (#196) (#214) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MCP tools failed with "CodeGraph not initialized" when a client launched the server outside the project and sent no rootUri/workspaceFolders — the server fell back to its own cwd, missed the project's .codegraph/, and returned a misleading "run codegraph init" error on every call. The only workaround was passing projectPath by hand to each tool. When no explicit path is given, the server now asks the client for its workspace root via the standard MCP roots/list request (gated on the client advertising the roots capability) before falling back to cwd. This required teaching the stdio transport to send server->client requests and match their responses by id (previously responses were dropped as invalid). When a project still can't be resolved, the error now names the directory it searched and tells the user to pass projectPath or add --path to the MCP config, instead of pointing at a re-init they don't need. 
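The resolution order described above can be sketched as a pure function. This is for illustration only:

- `resolveProjectRoot`, `InitSignals` and `RootsLister` are hypothetical names, not CodeGraph's API.
- The sketch uses Node's built-in `fileURLToPath` in place of CodeGraph's own `fileUriToPath`.
- It omits the upward walk to the nearest `.codegraph/` directory.

Explicit signals win first, in this order: `rootUri`, then `workspaceFolders`, then `--path`. `roots/list` is consulted only when the client advertised the `roots` capability. The working directory is the last resort.

```typescript
import { fileURLToPath } from 'url';

// Hypothetical summary of what the server knows once `initialize` has been handled.
interface InitSignals {
  rootUri?: string;
  workspaceFolders?: Array<{ uri: string; name: string }>;
  cliPath?: string; // value of --path, if the server was launched with it
  clientSupportsRoots: boolean; // client advertised capabilities.roots
}

// Stand-in for a server->client `roots/list` request. Expected to reject on
// timeout or on a JSON-RPC error response.
type RootsLister = () => Promise<{ roots?: Array<{ uri?: string }> }>;

async function resolveProjectRoot(
  signals: InitSignals,
  listRoots: RootsLister,
  cwd: string,
): Promise<{ root: string; source: string }> {
  // 1. Explicit signals win, strongest first.
  if (signals.rootUri) {
    return { root: fileURLToPath(signals.rootUri), source: 'rootUri' };
  }
  const folderUri = signals.workspaceFolders?.[0]?.uri;
  if (folderUri) {
    return { root: fileURLToPath(folderUri), source: 'workspaceFolders' };
  }
  if (signals.cliPath) {
    return { root: signals.cliPath, source: '--path' };
  }

  // 2. Ask the client, but only if it said it supports roots.
  if (signals.clientSupportsRoots) {
    try {
      const result = await listRoots();
      const uri = result.roots?.[0]?.uri;
      if (typeof uri === 'string') {
        return { root: fileURLToPath(uri), source: 'roots/list' };
      }
    } catch {
      // Timeout or error response: fall through to cwd instead of failing.
    }
  }

  // 3. Last resort: the process working directory.
  return { root: cwd, source: 'cwd' };
}
```

Putting the `roots/list` call behind a capability check and a `catch` bounds the cost of a client that never answers to one timeout. After that, the server falls back to the old cwd behavior instead of failing the tool call.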
Reported-by: @zhangyu1197 Closes #196 Co-authored-by: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 17 ++++ __tests__/mcp-roots.test.ts | 180 ++++++++++++++++++++++++++++++++++++ src/mcp/index.ts | 120 ++++++++++++++++++++---- src/mcp/tools.ts | 22 ++++- src/mcp/transport.ts | 69 ++++++++++++++ 5 files changed, 388 insertions(+), 20 deletions(-) create mode 100644 __tests__/mcp-roots.test.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index ae3d8c7d..4f150837 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -57,6 +57,23 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). Thanks to [@essopsp](https://github.com/essopsp) for the repro. ### Fixed +- **MCP**: tools no longer fail with "CodeGraph not initialized" when the index + actually exists. This hit clients that launch the MCP server from a directory + other than your project and don't report a workspace root in `initialize` + (some IDE/JetBrains-family integrations) — the server fell back to its own + working directory, missed the project's `.codegraph/`, and returned the + misleading "Run 'codegraph init' first" on every call. The only workaround + was passing `projectPath` to each tool by hand. Now, when no project path is + supplied, the server asks the client for its workspace root via the standard + MCP `roots/list` request (when the client advertises the `roots` capability) + before falling back to the working directory — so detection just works for + spec-compliant clients. When it still can't resolve a project, the error is + now actionable: it names the directory it searched and tells you to pass + `projectPath` or add `--path /abs/project` to the server's MCP config args, + instead of pointing you at a re-init you don't need. Closes + [#196](https://github.com/colbymchenry/codegraph/issues/196). Thanks to + [@zhangyu1197](https://github.com/zhangyu1197) for the report and the + `projectPath` workaround. 
- **MCP**: the server no longer hangs on startup under WSL2 when the project lives on an NTFS `/mnt/*` mount. Setting up the recursive file watcher there took tens of seconds — every directory read crosses the Windows/9p diff --git a/__tests__/mcp-roots.test.ts b/__tests__/mcp-roots.test.ts new file mode 100644 index 00000000..8e1d4520 --- /dev/null +++ b/__tests__/mcp-roots.test.ts @@ -0,0 +1,180 @@ +/** + * MCP project-resolution regression tests (issue #196). + * + * When an MCP client launches the server outside the project directory AND + * doesn't pass a `rootUri`/`workspaceFolders` in `initialize`, the server used + * to fall straight back to `process.cwd()` — which for many IDE clients is the + * wrong directory. Every tool call without an explicit `projectPath` then + * failed with a misleading "CodeGraph not initialized. Run 'codegraph init'." + * + * The fix: when no explicit path is provided, the server asks the client for + * its workspace root via the spec-blessed `roots/list` request (if the client + * advertised the `roots` capability), and only falls back to cwd otherwise. + * When it still can't resolve, the error now says exactly how to fix it. + * + * These tests drive the real stdio transport via a spawned subprocess — no + * mocking — so they also exercise the new bidirectional request/response path. + */ +import { describe, it, expect, beforeEach, afterEach } from 'vitest'; +import { spawn, ChildProcessWithoutNullStreams } from 'child_process'; +import * as fs from 'fs'; +import * as path from 'path'; +import * as os from 'os'; +import { CodeGraph } from '../src'; + +const BIN = path.resolve(__dirname, '../dist/bin/codegraph.js'); + +function spawnServer(cwd: string): ChildProcessWithoutNullStreams { + // --no-watch keeps the test deterministic and avoids watcher startup noise. 
+ return spawn(process.execPath, [BIN, 'serve', '--mcp', '--no-watch'], { + cwd, + stdio: ['pipe', 'pipe', 'pipe'], + }) as ChildProcessWithoutNullStreams; +} + +/** Parse every JSON-RPC message the server writes to stdout into an array. */ +function collectMessages(child: ChildProcessWithoutNullStreams): Array<Record<string, any>> { + const messages: Array<Record<string, any>> = []; + let buf = ''; + child.stdout.on('data', (chunk) => { + buf += chunk.toString('utf8'); + let idx; + while ((idx = buf.indexOf('\n')) !== -1) { + const line = buf.slice(0, idx).trim(); + buf = buf.slice(idx + 1); + if (!line) continue; + try { messages.push(JSON.parse(line)); } catch { /* ignore non-JSON */ } + } + }); + return messages; +} + +function waitForMessage( + messages: ReadonlyArray<Record<string, any>>, + predicate: (m: Record<string, any>) => boolean, + timeoutMs: number, +): Promise<Record<string, any>> { + return new Promise((resolve, reject) => { + const started = Date.now(); + const tick = () => { + const hit = messages.find(predicate); + if (hit) return resolve(hit); + if (Date.now() - started > timeoutMs) { + return reject(new Error(`Timed out.
Messages so far: ${JSON.stringify(messages)}`)); + } + setTimeout(tick, 20); + }; + tick(); + }); +} + +function send(child: ChildProcessWithoutNullStreams, msg: object): void { + child.stdin.write(JSON.stringify(msg) + '\n'); +} + +const CLIENT_INFO = { name: 'test', version: '0.0.0' }; + +describe('MCP project resolution via roots/list (issue #196)', () => { + let cwdDir: string; // where the server is launched — has NO .codegraph + let projectDir: string; // the real indexed project the client reports + let child: ChildProcessWithoutNullStreams | null = null; + + beforeEach(() => { + cwdDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-mcp-cwd-')); + projectDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-mcp-proj-')); + }); + + afterEach(() => { + if (child && !child.killed) { + child.kill('SIGKILL'); + child = null; + } + fs.rmSync(cwdDir, { recursive: true, force: true }); + fs.rmSync(projectDir, { recursive: true, force: true }); + }); + + it('resolves the project from the client roots/list when no rootUri is sent', async () => { + const cg = await CodeGraph.init(projectDir); + cg.close(); + + child = spawnServer(cwdDir); + const messages = collectMessages(child); + + // Advertise the roots capability but pass NO rootUri/workspaceFolders. + send(child, { + jsonrpc: '2.0', id: 0, method: 'initialize', + params: { protocolVersion: '2025-11-25', capabilities: { roots: {} }, clientInfo: CLIENT_INFO }, + }); + await waitForMessage(messages, (m) => m.id === 0 && !!m.result, 5000); + send(child, { jsonrpc: '2.0', method: 'notifications/initialized' }); + + // First tool call (no projectPath) drives the server to ask us for roots. 
+ send(child, { jsonrpc: '2.0', id: 1, method: 'tools/call', params: { name: 'codegraph_status', arguments: {} } }); + + const rootsReq = await waitForMessage(messages, (m) => m.method === 'roots/list', 5000); + expect(typeof rootsReq.id).toBe('string'); // server-initiated id + send(child, { + jsonrpc: '2.0', id: rootsReq.id, + result: { roots: [{ uri: `file://${projectDir}`, name: 'proj' }] }, + }); + + // The status call now succeeds against the resolved project. + const resp = await waitForMessage(messages, (m) => m.id === 1, 8000); + const text = resp.result.content[0].text as string; + expect(text).toContain('CodeGraph Status'); + expect(text).not.toContain('No CodeGraph project is loaded'); + }, 20000); + + it('returns an actionable error when there is no rootUri and no roots capability', async () => { + child = spawnServer(cwdDir); + const messages = collectMessages(child); + + send(child, { + jsonrpc: '2.0', id: 0, method: 'initialize', + params: { protocolVersion: '2025-11-25', capabilities: {}, clientInfo: CLIENT_INFO }, + }); + await waitForMessage(messages, (m) => m.id === 0 && !!m.result, 5000); + send(child, { jsonrpc: '2.0', method: 'notifications/initialized' }); + + send(child, { jsonrpc: '2.0', id: 1, method: 'tools/call', params: { name: 'codegraph_status', arguments: {} } }); + const resp = await waitForMessage(messages, (m) => m.id === 1, 8000); + const text = resp.result.content[0].text as string; + + expect(text).toContain('No CodeGraph project is loaded'); + expect(text).toContain('projectPath'); + expect(text).toContain('--path'); + // Names the directory it actually searched (the wrong cwd) so the user can + // see why detection missed. basename survives any symlink realpath-ing. + expect(text).toContain(path.basename(cwdDir)); + // It must not have hung waiting on roots/list — the client never offered it. 
+ expect(messages.some((m) => m.method === 'roots/list')).toBe(false); + }, 20000); + + it('honors an explicit rootUri without asking the client for roots', async () => { + const cg = await CodeGraph.init(projectDir); + cg.close(); + + child = spawnServer(cwdDir); + const messages = collectMessages(child); + + send(child, { + jsonrpc: '2.0', id: 0, method: 'initialize', + params: { + protocolVersion: '2025-11-25', + capabilities: { roots: {} }, + clientInfo: CLIENT_INFO, + rootUri: `file://${projectDir}`, + }, + }); + await waitForMessage(messages, (m) => m.id === 0 && !!m.result, 5000); + send(child, { jsonrpc: '2.0', method: 'notifications/initialized' }); + + send(child, { jsonrpc: '2.0', id: 1, method: 'tools/call', params: { name: 'codegraph_status', arguments: {} } }); + const resp = await waitForMessage(messages, (m) => m.id === 1, 8000); + const text = resp.result.content[0].text as string; + + expect(text).toContain('CodeGraph Status'); + // rootUri is a stronger signal than roots — we never needed to ask. + expect(messages.some((m) => m.method === 'roots/list')).toBe(false); + }, 20000); +}); diff --git a/src/mcp/index.ts b/src/mcp/index.ts index b601c36e..c790a4bc 100644 --- a/src/mcp/index.ts +++ b/src/mcp/index.ts @@ -54,6 +54,26 @@ const SERVER_INFO = { */ const PROTOCOL_VERSION = '2024-11-05'; +/** + * How long to wait for the client's `roots/list` response before giving up + * and falling back to the process cwd. + */ +const ROOTS_LIST_TIMEOUT_MS = 5000; + +/** + * Extract the first usable filesystem path from a `roots/list` result. + * Shape per MCP spec: `{ roots: [{ uri: "file:///path", name?: string }] }`. + * Returns null if the result is empty or malformed. 
+ */ +function firstRootPath(result: unknown): string | null { + if (!result || typeof result !== 'object') return null; + const roots = (result as { roots?: unknown }).roots; + if (!Array.isArray(roots) || roots.length === 0) return null; + const first = roots[0] as { uri?: unknown }; + if (typeof first?.uri !== 'string') return null; + return fileUriToPath(first.uri); +} + /** * MCP Server for CodeGraph * @@ -68,6 +88,13 @@ export class MCPServer { // In-flight background init kicked off from handleInitialize. Tracked so the // sync retry path doesn't race against it (double-opening the SQLite file). private initPromise: Promise<void> | null = null; + // Whether the client advertised the MCP `roots` capability during initialize. + // If so, and no explicit project path was given, we ask it for the workspace + // root via roots/list rather than guessing from the (often wrong) cwd. + private clientSupportsRoots = false; + // Guards the one-shot deferred resolution (roots/list or cwd) so we don't + // re-issue roots/list on every tool call. + private rootsAttempted = false; constructor(projectPath?: string) { this.projectPath = projectPath || null; @@ -108,6 +135,9 @@ export class MCPServer { * are still possible. */ private async tryInitializeDefault(projectPath: string): Promise<void> { + // Record where we searched so a later "not initialized" error can name it. + this.toolHandler.setDefaultProjectHint(projectPath); + // Walk up parent directories to find nearest .codegraph/ const resolvedRoot = findNearestCodeGraphRoot(projectPath); @@ -146,10 +176,28 @@ export class MCPServer { // Already initialized successfully if (this.toolHandler.hasDefaultCodeGraph()) return; - // No project path to retry with - if (!this.projectPath) return; - const resolvedRoot = findNearestCodeGraphRoot(this.projectPath); + // No explicit path was given at initialize. Resolve it now, exactly once: + // ask the client via roots/list (if it advertised roots), else use cwd.
+ // Deferring to here lets a roots answer override the wrong cwd, and the + // one-shot guard means we never re-issue roots/list per tool call. + if (!this.projectPath && !this.rootsAttempted) { + this.rootsAttempted = true; + this.initPromise = ( + this.clientSupportsRoots + ? this.initFromRoots() + : this.tryInitializeDefault(process.cwd()) + ).finally(() => { this.initPromise = null; }); + try { await this.initPromise; } catch { /* fall through to last-resort below */ } + if (this.toolHandler.hasDefaultCodeGraph()) return; + } + + // Last resort: re-walk from the best candidate we have. Picks up projects + // initialized after the server started, and covers clients that sent no + // usable initialize signal at all. + const candidate = this.projectPath ?? process.cwd(); + this.toolHandler.setDefaultProjectHint(candidate); + const resolvedRoot = findNearestCodeGraphRoot(candidate); if (!resolvedRoot) return; try { @@ -167,6 +215,28 @@ export class MCPServer { } } + /** + * Resolve the project root via the MCP `roots/list` request and initialize + * from the first root the client reports. Falls back to the process cwd if + * the client returns no usable root or doesn't answer in time. See issue #196. + */ + private async initFromRoots(): Promise<void> { + let target = process.cwd(); + try { + const result = await this.transport.request('roots/list', undefined, ROOTS_LIST_TIMEOUT_MS); + const rootPath = firstRootPath(result); + if (rootPath) { + target = rootPath; + } else { + process.stderr.write('[CodeGraph MCP] Client returned no workspace roots; falling back to process cwd.\n'); + } + } catch (err) { + const msg = err instanceof Error ? err.message : String(err); + process.stderr.write(`[CodeGraph MCP] roots/list request failed (${msg}); falling back to process cwd.\n`); + } + await this.tryInitializeDefault(target); + } + /** * Start file watching on the active CodeGraph instance. * Logs sync activity to stderr for diagnostics.
@@ -279,20 +349,25 @@ export class MCPServer { const params = request.params as { rootUri?: string; workspaceFolders?: Array<{ uri: string; name: string }>; + capabilities?: { roots?: unknown }; } | undefined; - // Extract project path from rootUri or workspaceFolders - let projectPath = this.projectPath; + // Does the client support the MCP `roots` protocol? If so, and we have no + // explicit path, we ask it for the workspace root after the handshake + // instead of falling back to the (frequently wrong) cwd. See issue #196. + this.clientSupportsRoots = !!params?.capabilities?.roots; + // Explicit project signal, strongest first: a client-provided rootUri / + // workspaceFolders (LSP-style, non-standard but some clients send it), else + // the --path the server was launched with. cwd is NOT used here — we defer + // it so a roots/list answer can win over it. + let explicitPath: string | null = null; if (params?.rootUri) { - projectPath = fileUriToPath(params.rootUri); + explicitPath = fileUriToPath(params.rootUri); } else if (params?.workspaceFolders?.[0]?.uri) { - projectPath = fileUriToPath(params.workspaceFolders[0].uri); - } - - // Fall back to current working directory if no path provided - if (!projectPath) { - projectPath = process.cwd(); + explicitPath = fileUriToPath(params.workspaceFolders[0].uri); + } else if (this.projectPath) { + explicitPath = this.projectPath; } // Respond to the handshake BEFORE doing any heavy initialization. Loading @@ -315,13 +390,20 @@ export class MCPServer { instructions: SERVER_INSTRUCTIONS, }); - // Kick off the default-project init in the background. Tool calls that - // arrive before it finishes will see the "not initialized yet" path and - // fall through to `retryInitIfNeeded`, which now waits for this promise - // rather than racing against it with a second open. 
- this.initPromise = this.tryInitializeDefault(projectPath).finally(() => { - this.initPromise = null; - }); + // If we know the project dir, kick off init in the background now. Tool + // calls that arrive before it finishes fall through to `retryInitIfNeeded`, + // which waits for this promise rather than racing it with a second open. + // + // If we DON'T know it (no rootUri, no --path), defer: the first tool call + // resolves it via roots/list (when the client supports roots) or cwd. This + // is the fix for issue #196 — clients that launch the server outside the + // project and don't pass a rootUri previously got a misleading "not + // initialized" error on every call. + if (explicitPath) { + this.initPromise = this.tryInitializeDefault(explicitPath).finally(() => { + this.initPromise = null; + }); + } } /** diff --git a/src/mcp/tools.ts b/src/mcp/tools.ts index 7b0d55b0..204ee59c 100644 --- a/src/mcp/tools.ts +++ b/src/mcp/tools.ts @@ -440,6 +440,9 @@ export const tools: ToolDefinition[] = [ export class ToolHandler { // Cache of opened CodeGraph instances for cross-project queries private projectCache: Map<string, CodeGraph> = new Map(); + // The directory the server last searched for a default project. Surfaced in + // the "not initialized" error so users can see why detection missed. + private defaultProjectHint: string | null = null; constructor(private cg: CodeGraph | null) {} @@ -450,6 +453,14 @@ export class ToolHandler { this.cg = cg; } + /** + * Record the directory the server tried to resolve the default project from. + * Used only to make the "no default project" error actionable. + */ + setDefaultProjectHint(searchedPath: string): void { + this.defaultProjectHint = searchedPath; + } + /** * Whether a default CodeGraph instance is available */ @@ -495,7 +506,16 @@ export class ToolHandler { private getCodeGraph(projectPath?: string): CodeGraph { if (!projectPath) { if (!this.cg) { - throw new Error('CodeGraph not initialized for this project.
Run \'codegraph init\' first.'); + const searched = this.defaultProjectHint ?? process.cwd(); + throw new Error( + 'No CodeGraph project is loaded for this session.\n' + + `Searched for a .codegraph/ directory starting from: ${searched}\n` + + 'The index is likely fine — this is a working-directory detection issue: ' + + "the MCP client launched the server outside your project and didn't report the " + + 'workspace root. Fix it either way:\n' + + ' • Pass projectPath to the tool call, e.g. projectPath: "/absolute/path/to/your/project"\n' + + ' • Or add --path to the server\'s MCP config args: ["serve", "--mcp", "--path", "/absolute/path/to/your/project"]' + ); } return this.cg; } diff --git a/src/mcp/transport.ts b/src/mcp/transport.ts index 44038918..2638600d 100644 --- a/src/mcp/transport.ts +++ b/src/mcp/transport.ts @@ -63,6 +63,13 @@ export type MessageHandler = (message: JsonRpcRequest | JsonRpcNotification) => export class StdioTransport { private rl: readline.Interface | null = null; private messageHandler: MessageHandler | null = null; + // Outstanding server-initiated requests (e.g. roots/list), keyed by the id + // we sent. Responses from the client are matched back here. + private pending = new Map<string | number, { + resolve: (value: unknown) => void; + reject: (error: Error) => void; + }>(); + private nextRequestId = 1; /** * Start listening for messages on stdin @@ -89,12 +96,42 @@ export class StdioTransport { * Stop listening */ stop(): void { + // Fail any in-flight server-initiated requests so their awaiters don't hang. + for (const { reject } of this.pending.values()) { + reject(new Error('Transport stopped')); + } + this.pending.clear(); if (this.rl) { this.rl.close(); this.rl = null; } } + /** + * Send a server-initiated request to the client and await its response. + * + * MCP is bidirectional: the server can ask the client questions too. We use + * this for `roots/list` — the spec-blessed way to learn the workspace root + * when the client didn't pass one in `initialize` (see issue #196).
Rejects + * on timeout so callers can fall back rather than hang forever. + */ + request(method: string, params?: unknown, timeoutMs = 5000): Promise<unknown> { + const id = `cg-srv-${this.nextRequestId++}`; + return new Promise((resolve, reject) => { + const timer = setTimeout(() => { + this.pending.delete(id); + reject(new Error(`Timed out after ${timeoutMs}ms waiting for "${method}" response`)); + }, timeoutMs); + // Don't let a pending request keep the process alive on shutdown. + timer.unref?.(); + this.pending.set(id, { + resolve: (value) => { clearTimeout(timer); resolve(value); }, + reject: (error) => { clearTimeout(timer); reject(error); }, + }); + process.stdout.write(JSON.stringify({ jsonrpc: '2.0', id, method, params }) + '\n'); + }); + } + /** * Send a response */ @@ -152,6 +189,20 @@ export class StdioTransport { return; } + // Response to a server-initiated request (has id + result/error, no method). + // Route it to the awaiting requester instead of the message handler — these + // used to be dropped as "Invalid Request" because they carry no method. + const obj = parsed as Record<string, unknown>; + if ( + obj?.jsonrpc === '2.0' && + typeof obj.method !== 'string' && + 'id' in obj && + ('result' in obj || 'error' in obj) + ) { + this.handleResponse(obj); + return; + } + // Validate basic JSON-RPC structure if (!this.isValidMessage(parsed)) { this.sendError(null, ErrorCodes.InvalidRequest, 'Invalid Request: not a valid JSON-RPC 2.0 message'); @@ -174,6 +225,24 @@ export class StdioTransport { } } + /** + * Resolve (or reject) the pending server-initiated request matching this + * response's id. Unknown ids are ignored — the client may echo something we + * never sent, or a request may have already timed out.
+ */ + private handleResponse(msg: Record<string, unknown>): void { + const id = msg.id as string | number; + const pending = this.pending.get(id); + if (!pending) return; + this.pending.delete(id); + if ('error' in msg && msg.error) { + const err = msg.error as { message?: string }; + pending.reject(new Error(err.message || 'Request failed')); + } else { + pending.resolve(msg.result); + } + } + /** * Check if message is a valid JSON-RPC 2.0 message */ From b3d3ddbd931bf8e234e5d7602e92db0276c5cdd0 Mon Sep 17 00:00:00 2001 From: Colby McHenry Date: Wed, 20 May 2026 11:36:45 -0500 Subject: [PATCH 03/47] refactor(eval): rename /audit skill to /agent-eval Renames the `.claude/skills/audit/` directory and all internal references to `agent-eval`, aligning the skill name with the `/agent-eval` command it invokes. --- .claude/skills/{audit => agent-eval}/SKILL.md | 6 +++--- .claude/skills/{audit => agent-eval}/corpus.json | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) rename .claude/skills/{audit => agent-eval}/SKILL.md (91%) rename .claude/skills/{audit => agent-eval}/corpus.json (96%) diff --git a/.claude/skills/audit/SKILL.md b/.claude/skills/agent-eval/SKILL.md similarity index 91% rename from .claude/skills/audit/SKILL.md rename to .claude/skills/agent-eval/SKILL.md index ee13ebe1..2e894a75 100644 --- a/.claude/skills/audit/SKILL.md +++ b/.claude/skills/agent-eval/SKILL.md @@ -1,6 +1,6 @@ --- -name: audit -description: Benchmark CodeGraph retrieval quality on a real codebase by comparing agent behavior with vs without CodeGraph. Use when the user runs /audit or asks to test, benchmark, audit, or validate a codegraph version (the local dev build or a published npm version) against a language's repo. +name: agent-eval +description: Benchmark CodeGraph retrieval quality on a real codebase by comparing agent behavior with vs without CodeGraph.
Use when the user runs /agent-eval or asks to test, benchmark, audit, or validate a codegraph version (the local dev build or a published npm version) against a language's repo. --- # CodeGraph Quality Audit @@ -32,7 +32,7 @@ user type a specific version (e.g. `0.7.10`). Map the answer to a VERSION token: - "Latest published" → `latest` - a typed version → that string (e.g. `0.7.10`) -**Step 2 — language.** Read `.claude/skills/audit/corpus.json`. Ask with +**Step 2 — language.** Read `.claude/skills/agent-eval/corpus.json`. Ask with `AskUserQuestion` which language to test, listing the languages that have entries. **Step 3 — repo.** From the chosen language's entries, ask which repo. Label each diff --git a/.claude/skills/audit/corpus.json b/.claude/skills/agent-eval/corpus.json similarity index 96% rename from .claude/skills/audit/corpus.json rename to .claude/skills/agent-eval/corpus.json index 4b48dab0..6e223526 100644 --- a/.claude/skills/audit/corpus.json +++ b/.claude/skills/agent-eval/corpus.json @@ -1,5 +1,5 @@ { - "_comment": "Test corpus for /audit. Add entries freely. size: Small (<~150 files), Medium (~150-1500), Large (>~1500). 'question' is a representative architectural question that exercises cross-file understanding.", + "_comment": "Test corpus for /agent-eval. Add entries freely. size: Small (<~150 files), Medium (~150-1500), Large (>~1500). 'question' is a representative architectural question that exercises cross-file understanding.", "TypeScript": [ { "name": "ky", "repo": "https://github.com/sindresorhus/ky", "size": "Small", "files": "~25", "question": "How does ky implement request retries and timeouts?" }, { "name": "excalidraw", "repo": "https://github.com/excalidraw/excalidraw", "size": "Medium", "files": "~600", "question": "How does Excalidraw render and update canvas elements?" 
}, From 9b06b0edde65ea932bcb7eb317353d4ac3d7f2ff Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Wed, 20 May 2026 11:54:19 -0500 Subject: [PATCH 04/47] fix(db): require better-sqlite3 ^12.4.1 so Node 24 gets the native backend (#203) (#216) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit better-sqlite3 ^11.0.0 (latest 11.10.0) ships no prebuilt binary for Node 24's ABI (node-v137) and predates Node 24, so every Node 24 install silently fell back to the 5-10x-slower WASM backend. Bump to ^12.4.1 — the first 12.x with the Node 24 prebuild — and raise the engines floor to Node 20 (Node 18 is EOL and dropped from better-sqlite3 12.x prebuilds). Verified on macOS Node 24.15.0 (ABI 137): prebuilt binary used with no compiler (installs even with CC/CXX sabotaged), `codegraph init -i` shows no WASM banner, and `codegraph status` reports Backend: native. 639/639 tests pass on Node 22. Co-authored-by: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 20 ++++++++++++++++++++ package-lock.json | 13 ++++++++----- package.json | 4 ++-- 3 files changed, 30 insertions(+), 7 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 4f150837..a3c76ee2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -33,6 +33,12 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). setup is actually fast. `codegraph uninit` removes any hooks it installed. ### Changed +- **Minimum Node.js is now 20** (was 18). Node 18 is end-of-life and the + native SQLite binding (`better-sqlite3` 12.x) no longer ships a Node 18 + prebuilt binary. Node 22 LTS and Node 24 get the native backend out of the + box; on other Node versions CodeGraph still runs via the WASM fallback + (slower, but functional). Node 25+ remains blocked (V8 WASM JIT crash, see + [#81](https://github.com/colbymchenry/codegraph/issues/81)). - **MCP / explore**: `codegraph_explore` output is now adaptive to project size. 
The tool used to apply a fixed 35KB cap regardless of how large the codebase was, which on small projects (~100 files) produced bigger @@ -57,6 +63,20 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). Thanks to [@essopsp](https://github.com/essopsp) for the repro. ### Fixed +- **Native SQLite backend on Node 24**: indexing on Node 24 always dropped to + the 5-10x-slower WASM backend, printing a `better-sqlite3 unavailable` + warning that `npm rebuild better-sqlite3` / `xcode-select --install` could + not clear ([#203](https://github.com/colbymchenry/codegraph/issues/203)). + The bundled `better-sqlite3` was pinned to a v11 release that ships no + prebuilt binary for Node 24's ABI (`node-v137`), so every Node 24 install + silently degraded — and because CodeGraph is usually installed globally, the + `npm install` / `npm rebuild` people ran in their own project never touched + CodeGraph's copy. CodeGraph now requires `better-sqlite3` `^12.4.1`, whose + prebuilds include Node 24, so a fresh install on Node 22 or Node 24 gets the + native backend with no compiler. On an already-broken install, reinstall + CodeGraph (e.g. `npm install -g @colbymchenry/codegraph`) to pull the new + binding; `codegraph status` should then report `Backend: native`. Thanks to + [@Finndersen](https://github.com/Finndersen) for the report. - **MCP**: tools no longer fail with "CodeGraph not initialized" when the index actually exists. 
This hit clients that launch the MCP server from a directory other than your project and don't report a workspace root in `initialize` diff --git a/package-lock.json b/package-lock.json index 2d4e515a..1b4ce89d 100644 --- a/package-lock.json +++ b/package-lock.json @@ -31,10 +31,10 @@ "vitest": "^2.1.9" }, "engines": { - "node": ">=18.0.0 <25.0.0" + "node": ">=20.0.0 <25.0.0" }, "optionalDependencies": { - "better-sqlite3": "^11.0.0" + "better-sqlite3": "^12.4.1" } }, "node_modules/@clack/core": { @@ -992,15 +992,18 @@ "optional": true }, "node_modules/better-sqlite3": { - "version": "11.10.0", - "resolved": "https://registry.npmjs.org/better-sqlite3/-/better-sqlite3-11.10.0.tgz", - "integrity": "sha512-EwhOpyXiOEL/lKzHz9AW1msWFNzGc/z+LzeB3/jnFJpxu+th2yqvzsSWas1v9jgs9+xiXJcD5A8CJxAG2TaghQ==", + "version": "12.10.0", + "resolved": "https://registry.npmjs.org/better-sqlite3/-/better-sqlite3-12.10.0.tgz", + "integrity": "sha512-CyzaZRQKyHkB2ZInfTTl2nvT33EbDpjkLEbE8/Zck3Ll6O0qqvuGdrJ45HgtH+HykRg88ITY3AdreBGN70aBSQ==", "hasInstallScript": true, "license": "MIT", "optional": true, "dependencies": { "bindings": "^1.5.0", "prebuild-install": "^7.1.1" + }, + "engines": { + "node": "20.x || 22.x || 23.x || 24.x || 25.x || 26.x" } }, "node_modules/bindings": { diff --git a/package.json b/package.json index 60dc5c71..202e9a48 100644 --- a/package.json +++ b/package.json @@ -51,9 +51,9 @@ "vitest": "^2.1.9" }, "optionalDependencies": { - "better-sqlite3": "^11.0.0" + "better-sqlite3": "^12.4.1" }, "engines": { - "node": ">=18.0.0 <25.0.0" + "node": ">=20.0.0 <25.0.0" } } From 07c093cc3f9ae0dd799acb26d539dff77a68e24e Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Wed, 20 May 2026 12:03:43 -0500 Subject: [PATCH 05/47] fix(extraction): index nested non-submodule git repos (#193) (#217) `codegraph init -i` from a git super-repo containing independent nested git repositories (not submodules) reported "No files found to index": git ls-files reports an embedded repo only as an 
opaque `subdir/` entry and never lists its files. Detect embedded repos via that trailing-slash signal and recurse `git ls-files` into each, indexing tracked + untracked source and honoring each repo's own .gitignore. Reported by @timxx. Co-authored-by: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 11 +++++ __tests__/extraction.test.ts | 73 ++++++++++++++++++++++++++++++++ src/extraction/index.ts | 80 ++++++++++++++++++++++++------------ 3 files changed, 138 insertions(+), 26 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a3c76ee2..0e3656c5 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -63,6 +63,17 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). Thanks to [@essopsp](https://github.com/essopsp) for the repro. ### Fixed +- **Indexing**: `codegraph init -i` now finds source inside nested, independent + git repositories — separate clones living inside the workspace that are **not** + git submodules (common in CMake "super-repo" layouts). When the top-level + workspace is itself a git repo, `git ls-files` reports an embedded repo only as + an opaque `subdir/` entry and never lists its files, so indexing from the + workspace root reported "No files found to index" even though indexing each + sub-repo individually worked. CodeGraph now detects these embedded repos and + indexes their tracked and untracked source, honoring each repo's own + `.gitignore`. Closes + [#193](https://github.com/colbymchenry/codegraph/issues/193). Thanks to + [@timxx](https://github.com/timxx) for the report. 
- **Native SQLite backend on Node 24**: indexing on Node 24 always dropped to the 5-10x-slower WASM backend, printing a `better-sqlite3 unavailable` warning that `npm rebuild better-sqlite3` / `xcode-select --install` could diff --git a/__tests__/extraction.test.ts b/__tests__/extraction.test.ts index cb69e2ab..b08408a4 100644 --- a/__tests__/extraction.test.ts +++ b/__tests__/extraction.test.ts @@ -3132,6 +3132,79 @@ describe('Git Submodules', () => { }); }); +describe('Nested non-submodule git repos', () => { + let tempDir: string; + + beforeEach(() => { + tempDir = createTempDir(); + }); + + afterEach(() => { + cleanupTempDir(tempDir); + }); + + it('should index files in embedded git repos run from a git super-repo (issue #193)', async () => { + const { execFileSync } = await import('child_process'); + const git = (cwd: string, ...args: string[]) => + execFileSync('git', args, { cwd, stdio: 'pipe' }); + + // Top-level workspace is itself a git repo, holding no source directly — + // the CMake "super-repo" layout from the issue. + const root = path.join(tempDir, 'root'); + fs.mkdirSync(path.join(root, 'coding'), { recursive: true }); + git(root, 'init', '-q'); + git(root, 'config', 'user.email', 'test@test.com'); + git(root, 'config', 'user.name', 'Test'); + fs.writeFileSync(path.join(root, 'CMakeLists.txt'), 'cmake_minimum_required(VERSION 3.10)\n'); + + // Two independent clones living inside the workspace (NOT submodules): + // one with committed source, one with only untracked source. 
+ const sub1 = path.join(root, 'sub_repo1', 'src'); + fs.mkdirSync(sub1, { recursive: true }); + git(path.join(root, 'sub_repo1'), 'init', '-q'); + git(path.join(root, 'sub_repo1'), 'config', 'user.email', 'test@test.com'); + git(path.join(root, 'sub_repo1'), 'config', 'user.name', 'Test'); + fs.writeFileSync(path.join(sub1, 'one.ts'), 'export const one = 1;'); + git(path.join(root, 'sub_repo1'), 'add', '-A'); + git(path.join(root, 'sub_repo1'), 'commit', '-q', '-m', 'sub1 init'); + + const sub2 = path.join(root, 'sub_repo2', 'src'); + fs.mkdirSync(sub2, { recursive: true }); + git(path.join(root, 'sub_repo2'), 'init', '-q'); + fs.writeFileSync(path.join(sub2, 'two.ts'), 'export const two = 2;'); + + const config = { ...DEFAULT_CONFIG, rootDir: root }; + const files = scanDirectory(root, config); + + // Both committed and untracked source from the nested repos must be found. + expect(files).toContain('sub_repo1/src/one.ts'); + expect(files).toContain('sub_repo2/src/two.ts'); + }); + + it('should respect each embedded repo\'s own .gitignore', async () => { + const { execFileSync } = await import('child_process'); + const git = (cwd: string, ...args: string[]) => + execFileSync('git', args, { cwd, stdio: 'pipe' }); + + const root = path.join(tempDir, 'root'); + fs.mkdirSync(root, { recursive: true }); + git(root, 'init', '-q'); + + const sub = path.join(root, 'sub_repo', 'src'); + fs.mkdirSync(sub, { recursive: true }); + git(path.join(root, 'sub_repo'), 'init', '-q'); + fs.writeFileSync(path.join(root, 'sub_repo', '.gitignore'), 'src/generated.ts\n'); + fs.writeFileSync(path.join(sub, 'real.ts'), 'export const real = 1;'); + fs.writeFileSync(path.join(sub, 'generated.ts'), 'export const generated = 1;'); + + const config = { ...DEFAULT_CONFIG, rootDir: root }; + const files = scanDirectory(root, config); + + expect(files).toContain('sub_repo/src/real.ts'); + expect(files).not.toContain('sub_repo/src/generated.ts'); + }); +}); + // 
============================================================================= // Scala // ============================================================================= diff --git a/src/extraction/index.ts b/src/extraction/index.ts index bf1e6319..b5269cbe 100644 --- a/src/extraction/index.ts +++ b/src/extraction/index.ts @@ -125,10 +125,61 @@ export function shouldIncludeFile( return false; } +/** + * Collect git-visible files (tracked + untracked, .gitignore-respected) from the + * git repository rooted at `repoDir`, adding each to `files` with `prefix` + * prepended so paths stay relative to the original scan root. + * + * Recurses into embedded git repositories — nested repos that are NOT submodules + * (independent clones living inside the workspace, common in CMake "super-repo" + * layouts). The parent repo's `git ls-files` cannot see into them: tracked output + * skips them entirely, and untracked output reports them only as an opaque + * "subdir/" entry (trailing slash) rather than expanding their files. Each + * embedded repo is its own git boundary, so we re-run `git ls-files` inside it. + * (See issue #193.) + */ +function collectGitFiles(repoDir: string, prefix: string, files: Set): void { + const gitOpts = { cwd: repoDir, encoding: 'utf-8' as const, timeout: 30000, maxBuffer: 50 * 1024 * 1024, stdio: ['pipe', 'pipe', 'pipe'] as ['pipe', 'pipe', 'pipe'] }; + + // Tracked files. --recurse-submodules pulls in files from active submodules, + // which the index would otherwise represent only as a commit pointer. + // Without this, monorepos using submodules index 0 files. (See issue #147.) + // Note: --recurse-submodules only supports -c/--cached and --stage modes — it + // can't be combined with -o, so untracked files are gathered separately below. 
+ const tracked = execFileSync('git', ['ls-files', '-c', '--recurse-submodules'], gitOpts); + for (const line of tracked.split('\n')) { + const trimmed = line.trim(); + if (trimmed) { + files.add(normalizePath(prefix + trimmed)); + } + } + + // Untracked files (submodules manage their own untracked state). Embedded git + // repos surface here as a single "subdir/" entry that git refuses to descend + // into — recurse into those as their own repos so their source gets indexed. + const untracked = execFileSync('git', ['ls-files', '-o', '--exclude-standard'], gitOpts); + for (const line of untracked.split('\n')) { + const trimmed = line.trim(); + if (!trimmed) continue; + if (trimmed.endsWith('/')) { + // git only emits a trailing-slash directory entry for an embedded repo. + // Guard with a .git check anyway, and skip anything else exactly as git + // itself skips it (we never descend into a non-repo opaque dir). + const childDir = path.join(repoDir, trimmed); + if (fs.existsSync(path.join(childDir, '.git'))) { + collectGitFiles(childDir, prefix + trimmed, files); + } + continue; + } + files.add(normalizePath(prefix + trimmed)); + } +} + /** * Get all files visible to git (tracked + untracked but not ignored). - * Respects .gitignore at all levels (root, subdirectories). - * Returns null on failure (non-git project) so callers can fall back. + * Respects .gitignore at all levels (root, subdirectories) and descends into + * embedded (nested, non-submodule) git repos. Returns null on failure + * (non-git project) so callers can fall back to a filesystem walk. */ function getGitVisibleFiles(rootDir: string): Set | null { try { @@ -157,30 +208,7 @@ function getGitVisibleFiles(rootDir: string): Set | null { } const files = new Set(); - const gitOpts = { cwd: rootDir, encoding: 'utf-8' as const, timeout: 30000, maxBuffer: 50 * 1024 * 1024, stdio: ['pipe', 'pipe', 'pipe'] as ['pipe', 'pipe', 'pipe'] }; - - // Tracked files. 
--recurse-submodules pulls in files from active submodules, - // which the main repo's index would otherwise represent only as a commit pointer. - // Without this, monorepos using submodules index 0 files. (See issue #147.) - // Note: --recurse-submodules only supports -c/--cached and --stage modes — it - // can't be combined with -o, so untracked files are gathered separately below. - const tracked = execFileSync('git', ['ls-files', '-c', '--recurse-submodules'], gitOpts); - for (const line of tracked.split('\n')) { - const trimmed = line.trim(); - if (trimmed) { - files.add(normalizePath(trimmed)); - } - } - - // Untracked files in the main repo (submodules manage their own untracked state). - const untracked = execFileSync('git', ['ls-files', '-o', '--exclude-standard'], gitOpts); - for (const line of untracked.split('\n')) { - const trimmed = line.trim(); - if (trimmed) { - files.add(normalizePath(trimmed)); - } - } - + collectGitFiles(rootDir, '', files); return files; } catch { return null; From a47355780b138e87ef423ef54c86a32d1678f099 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Wed, 20 May 2026 12:15:54 -0500 Subject: [PATCH 06/47] fix(sync): stop reporting git-untracked files as pending after sync (#206) (#218) Both git fast-paths in ExtractionOrchestrator (sync and getChangedFiles) classified every untracked (`??`) file as "added" without checking the index. Indexing a file doesn't make git track it, so the file stayed `??` and was re-reported as pending and re-indexed on every run: `codegraph status` listed it under Pending Changes forever and each `sync` re-added it, even though its symbols were already queryable. Merge the modified + added handling into a single hash-compared loop so untracked files get the same treatment as tracked ones: "added" only if missing from the index, "modified" if contents changed, skipped otherwise. The non-git fallback path already did this and is unchanged. Closes #206. Reported by @15290391025. 
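
The hash-compared loop described above can be sketched as follows. This is a minimal illustration, not the orchestrator's actual code: `classify`, `hashContent`, the `indexedHashes` map, and the choice of SHA-256 are all assumptions made for the example.

```typescript
// Hypothetical sketch: classify one path reported by `git status` (tracked
// or untracked) against the content hashes already stored in the index.
import { createHash } from 'crypto';

type Change = 'added' | 'modified' | 'unchanged';

// Assumed hash function; the real index may store a different digest.
function hashContent(content: string): string {
  return createHash('sha256').update(content).digest('hex');
}

function classify(
  filePath: string,
  content: string,
  indexedHashes: Map<string, string>,
): Change {
  const stored = indexedHashes.get(filePath);
  // Missing from the index: genuinely new, whatever git says about it.
  if (stored === undefined) return 'added';
  // Present in the index: re-index only if the contents actually changed.
  return stored === hashContent(content) ? 'unchanged' : 'modified';
}
```

Because the decision depends only on what the index already holds, an untracked file indexed by a previous sync classifies as `unchanged` and is skipped, even though git still reports it as `??`.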
Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 12 +++++++++++ __tests__/sync.test.ts | 44 +++++++++++++++++++++++++++++++++++++++++ src/extraction/index.ts | 27 +++++++++++-------------- 3 files changed, 67 insertions(+), 16 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 0e3656c5..d0723efa 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -63,6 +63,18 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). Thanks to [@essopsp](https://github.com/essopsp) for the repro. ### Fixed +- **Sync / status**: git-untracked files are no longer reported as pending + "Added" forever. After `codegraph sync` indexed a newly-created untracked + source file, `codegraph status` kept listing it under Pending Changes and + every subsequent `sync` re-indexed it from scratch — even though its symbols + were already queryable. Change detection trusted `git status` and counted + every untracked (`??`) entry as new without checking the index, but indexing + a file doesn't make git track it, so the file stayed `??` and got re-added on + each run. CodeGraph now hash-compares untracked files against the index the + same way it does tracked files: a file counts as "added" only if it's missing + from the index, "modified" if its contents changed, and is skipped otherwise. + Closes [#206](https://github.com/colbymchenry/codegraph/issues/206). Thanks to + [@15290391025](https://github.com/15290391025) for the report. - **Indexing**: `codegraph init -i` now finds source inside nested, independent git repositories — separate clones living inside the workspace that are **not** git submodules (common in CMake "super-repo" layouts). 
When the top-level diff --git a/__tests__/sync.test.ts b/__tests__/sync.test.ts index 8365f630..374e7788 100644 --- a/__tests__/sync.test.ts +++ b/__tests__/sync.test.ts @@ -225,6 +225,50 @@ describe('Sync Module', () => { expect(nodes.length).toBeGreaterThan(0); }); + it('should stop reporting untracked files once they are indexed (issue #206)', async () => { + // Untracked files stay `??` in git status even after codegraph indexes + // them. Change detection must compare them against the DB by hash, not + // report every untracked file as "added" on every sync/status. + fs.writeFileSync( + path.join(testDir, 'src', 'new.ts'), + `export function newFunc() { return 42; }` + ); + + // First sync indexes the untracked file. + const first = await cg.sync(); + expect(first.filesAdded).toBe(1); + + // The file is still untracked in git, but now lives in the DB. + expect(cg.searchNodes('newFunc').length).toBeGreaterThan(0); + + // status must not keep flagging it as a pending addition... + const changes = cg.getChangedFiles(); + expect(changes.added).not.toContain('src/new.ts'); + expect(changes.modified).not.toContain('src/new.ts'); + + // ...and a second sync must be a no-op for it. + const second = await cg.sync(); + expect(second.filesAdded).toBe(0); + expect(second.filesModified).toBe(0); + }); + + it('should re-index an untracked file when its contents change', async () => { + const filePath = path.join(testDir, 'src', 'new.ts'); + fs.writeFileSync(filePath, `export function newFunc() { return 42; }`); + await cg.sync(); + + // Modify the still-untracked file. 
+ fs.writeFileSync(filePath, `export function renamedFunc() { return 7; }`); + + const changes = cg.getChangedFiles(); + expect(changes.modified).toContain('src/new.ts'); + + const result = await cg.sync(); + expect(result.filesModified).toBe(1); + expect(cg.searchNodes('renamedFunc').length).toBeGreaterThan(0); + expect(cg.searchNodes('newFunc').length).toBe(0); + }); + it('should detect deleted files via git', async () => { fs.unlinkSync(path.join(testDir, 'src', 'index.ts')); diff --git a/src/extraction/index.ts b/src/extraction/index.ts index b5269cbe..18086bdf 100644 --- a/src/extraction/index.ts +++ b/src/extraction/index.ts @@ -1261,8 +1261,12 @@ export class ExtractionOrchestrator { } } - // Handle modified files — read + hash only these files - for (const filePath of gitChanges.modified) { + // Handle modified + added files — read + hash only these. Untracked + // (`??`) files stay untracked in git even after we index them, so they + // can't be trusted as "new": re-hash and compare against the DB exactly + // like modified files. Otherwise every sync re-indexes them and status + // reports them as pending forever. (See issue #206.) + for (const filePath of [...gitChanges.modified, ...gitChanges.added]) { const fullPath = path.join(this.rootDir, filePath); let content: string; try { @@ -1285,13 +1289,6 @@ export class ExtractionOrchestrator { filesModified++; } } - - // Handle added (untracked) files - for (const filePath of gitChanges.added) { - filesToIndex.push(filePath); - changedFilePaths.push(filePath); - filesAdded++; - } } else { // === Fallback: full scan (non-git project or git failure) === const currentFiles = new Set(scanDirectory(this.rootDir, this.config)); @@ -1395,8 +1392,11 @@ export class ExtractionOrchestrator { } } - // Modified files — read + hash only these, compare with DB - for (const filePath of gitChanges.modified) { + // Modified + added files — read + hash, compare with DB. 
Untracked (`??`) + // files stay untracked in git even after indexing, so they must be + // hash-compared like modified files instead of always counting as added — + // otherwise status reports them as pending forever. (See issue #206.) + for (const filePath of [...gitChanges.modified, ...gitChanges.added]) { const fullPath = path.join(this.rootDir, filePath); let content: string; try { @@ -1416,11 +1416,6 @@ export class ExtractionOrchestrator { } } - // Added (untracked) files - for (const filePath of gitChanges.added) { - added.push(filePath); - } - return { added, modified, removed }; } From f5bbc26c602ac56b9fc5b0a49d0ecaed163e30e6 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Wed, 20 May 2026 16:33:50 -0500 Subject: [PATCH 07/47] =?UTF-8?q?perf(mcp):=20answer-directly=20steering?= =?UTF-8?q?=20=E2=80=94=20~35%=20cheaper,=20~70%=20fewer=20tool=20calls=20?= =?UTF-8?q?(#224)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * perf(mcp): steer agents to answer directly instead of delegating to subagents CodeGraph beats native grep/read on cost only when the agent queries it directly. When the agent delegates to file-reading sub-agents, those sub-agents read files regardless of the index, so CodeGraph becomes net overhead on top of the reads. The install templates even told agents to "spawn a subagent for explore-class questions" — the expensive path. Changes: - server-instructions + both install templates: add an "Answer directly — don't delegate exploration" directive; reposition codegraph_explore as the efficient one-call multi-symbol tool (was: "spawn a subagent for it"). - codegraph_explore: hard-cap output to its adaptive budget (it overran, ~30k vs a 28k cap) and tighten the medium tier (28k->13k). - codegraph_node: return a member outline for container kinds instead of the full class body. 
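
One way to read "hard-cap output to its adaptive budget" is to append sections in priority order and truncate at the budget, so trailing sections such as a relationship map can never push the payload over the cap. A minimal sketch, assuming the output is built from ordered string sections; `capToBudget` and `TRUNCATION_MARKER` are hypothetical names, not CodeGraph's code.

```typescript
// Hypothetical marker appended when a section had to be cut.
const TRUNCATION_MARKER = '\n[output truncated to budget]';

// Concatenate sections in priority order without ever exceeding `budget`
// characters. The first section that does not fit is cut (leaving room for
// the marker) and every later section is dropped.
function capToBudget(sections: string[], budget: number): string {
  let out = '';
  for (const section of sections) {
    const remaining = budget - out.length;
    if (section.length <= remaining) {
      out += section; // the whole section fits
      continue;
    }
    const room = remaining - TRUNCATION_MARKER.length;
    if (room >= 0) {
      // out.length + room + marker.length === budget exactly.
      out += section.slice(0, room) + TRUNCATION_MARKER;
    }
    break;
  }
  return out;
}
```

The invariant `result.length <= budget` holds regardless of how many sections follow, which is the property an "overrun once trailers are appended" bug violates.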
Rigorous N>=4-per-arm warm-block benchmark (median total_cost_usd): excalidraw (~600 files): WITH $0.54 vs native $1.02 (-47%) vscode (~10k files): WITH $0.41 vs native $0.72 (-42%) ky (~25 files): WITH $0.46 vs native $0.44 (wash) Answers were equal-or-better (correct, file:line-cited) with ~6x fewer tool calls; the directive drove the direct path on 14/14 codegraph runs. Co-Authored-By: Claude Opus 4.7 (1M context) * docs(readme): rebuild benchmark with real-world repos + cost/token/time/tool savings Replace the "Claude Code (Python+Rust/Java)" rows — which benchmarked the Claude Code CLI repo, not real codebases in those languages — with real open-source projects per language: Django (Python), Tokio (Rust), OkHttp (Java), Gin (Go), plus Alamofire (Swift) and the existing TypeScript repos (VS Code, Excalidraw). The table now reports all four savings the change targets — cost, tokens, time, tool calls — as the median of 4 runs per arm (Claude Opus 4.7, headless claude -p, with vs empty MCP config). Averages across the 7 repos: 35% cheaper, 59% fewer tokens, 49% faster, 70% fewer tool calls. Adds a methodology note and raw WITH->WITHOUT medians. Co-Authored-By: Claude Opus 4.7 (1M context) --------- Co-authored-by: Claude Opus 4.7 (1M context) --- .cursor/rules/codegraph.mdc | 5 +- CHANGELOG.md | 26 +++++++- README.md | 81 ++++++++++-------------- src/installer/instructions-template.ts | 5 +- src/mcp/server-instructions.ts | 16 ++++- src/mcp/tools.ts | 88 ++++++++++++++++++++++---- 6 files changed, 154 insertions(+), 67 deletions(-) diff --git a/.cursor/rules/codegraph.mdc b/.cursor/rules/codegraph.mdc index dac86b3a..3f23cf6b 100644 --- a/.cursor/rules/codegraph.mdc +++ b/.cursor/rules/codegraph.mdc @@ -19,16 +19,17 @@ Use codegraph for **structural** questions — what calls what, what would break | "What would break if I changed Z?" 
| `codegraph_impact` | | "Show me Y's signature / source / docstring" | `codegraph_node` | | "Give me focused context for a task/area" | `codegraph_context` | -| "Survey an unfamiliar module/topic" | `codegraph_explore` | +| "See several related symbols' source at once" | `codegraph_explore` | | "What files exist under path/" | `codegraph_files` | | "Is the index healthy?" | `codegraph_status` | ### Rules of thumb +- **Answer directly — don't delegate exploration.** For "how does X work" / architecture / trace questions, answer with 2-3 codegraph calls: `codegraph_context` first, then ONE `codegraph_explore` for the source of the symbols it surfaces. Codegraph IS the pre-built index, so spawning a separate file-reading sub-task/agent — or running a grep + read loop — repeats work codegraph already did and costs more for the same answer. - **Trust codegraph results.** They come from a full AST parse. Do NOT re-verify them with grep — that's slower, less accurate, and wastes context. - **Don't grep first** when looking up a symbol by name. `codegraph_search` is faster and returns kind + location + signature in one call. - **Don't chain `codegraph_search` + `codegraph_node`** when you just want context — `codegraph_context` is one call. -- **`codegraph_explore` is the heavy hitter** for unfamiliar areas — it returns full source from all relevant files in one call, but is token-heavy. If your harness supports parallel subagents (e.g., Claude Code's Task tool), spawn one for explore-class questions to keep main session context clean. +- **Don't loop `codegraph_node` over many symbols** — one `codegraph_explore` call returns several symbols' source grouped in a single capped call, while each separate node/Read call re-reads the whole context and costs far more. - **Index lag**: the file watcher debounces ~500ms behind writes; don't re-query immediately after editing a file in the same turn. 
### If `.codegraph/` doesn't exist diff --git a/CHANGELOG.md b/CHANGELOG.md index d0723efa..4a36bdb8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -33,6 +33,25 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). setup is actually fast. `codegraph uninit` removes any hooks it installed. ### Changed +- **MCP / agent guidance**: CodeGraph now tells agents to answer "how does X + work" / architecture questions *directly* — `codegraph_context`, then one + `codegraph_explore` for the surfaced symbols — instead of delegating to a + file-reading sub-agent or a grep+read loop. The server instructions and the + installed instruction files (`CLAUDE.md`, `.cursor/rules/codegraph.mdc`, + `AGENTS.md`) previously suggested *spawning a sub-agent* for explore-class + questions, which produced the opposite, more expensive behavior: the + sub-agent reads files regardless of the index, so CodeGraph became overhead + stacked on top of the reads. In rigorous N≥4-per-arm benchmarks this cut the + cost of an architecture question by ~42–47% versus a no-CodeGraph agent on + medium and large repos (Excalidraw ~600 files, VS Code ~10k), with + equal-or-better, `file:line`-cited answers and ~6× fewer tool calls; on a + tiny repo (~25 files) it's a wash, since native grep is already trivially + cheap there. +- **MCP / codegraph_node**: `includeCode=true` on a class/interface/struct/enum + now returns a compact member outline (fields + method signatures + line + numbers) instead of the entire class body — which could be thousands of + characters and was rarely needed in full. Functions and methods still return + their full body; request a specific member for its source. - **Minimum Node.js is now 20** (was 18). Node 18 is end-of-life and the native SQLite binding (`better-sqlite3` 12.x) no longer ships a Node 18 prebuilt binary. 
Node 22 LTS and Node 24 get the native backend out of the @@ -48,7 +67,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). now scales with indexed file count: small projects (<500 files) cap at ~18KB and skip the "Additional relevant files" / completeness / explore- budget reminders that earn their keep on bigger codebases; medium - (<5,000) caps at ~28KB; large (<15,000) keeps the historical ~35KB; very + (<5,000) caps at ~13KB; large (<15,000) keeps the historical ~35KB; very large goes up to ~38KB. A new per-file char cap also prevents a single file with many adjacent symbols from collapsing into one whole-file dump (the Alamofire `Session.swift` case from #185). Per-file cluster @@ -63,6 +82,11 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). Thanks to [@essopsp](https://github.com/essopsp) for the repro. ### Fixed +- **MCP / explore**: `codegraph_explore` output is now hard-capped to its + adaptive size budget. It could previously overrun (e.g. ~30K against a 28K + cap) once the relationship map and trailer sections were appended; the + oversized payload then sat in the agent's context and was re-read on every + later turn. - **Sync / status**: git-untracked files are no longer reported as pending "Added" forever. 
After `codegraph sync` indexed a newly-created untracked source file, `codegraph status` kept listing it under Pending Changes and diff --git a/README.md b/README.md index 49cf8d54..663d7d9c 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ ### Supercharge Claude Code, Cursor, Codex, and OpenCode with Semantic Code Intelligence -**94% fewer tool calls · 77% faster exploration · 100% local** +**~35% cheaper · ~70% fewer tool calls · 100% local** [![npm version](https://img.shields.io/npm/v/@colbymchenry/codegraph.svg)](https://www.npmjs.com/package/@colbymchenry/codegraph) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) @@ -50,61 +50,50 @@ When Claude Code explores a codebase, it spawns **Explore agents** that scan fil ### Benchmark Results -Tested across 6 real-world codebases comparing Claude Code's Explore agent **with** and **without** CodeGraph: +Tested across **7 real-world open-source codebases** spanning 6 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**.
-> **Average: 92% fewer tool calls · 71% faster** +> **Average: 35% cheaper · 59% fewer tokens · 49% faster · 70% fewer tool calls** -| Codebase | With CG | Without CG | Improvement | -|----------|---------|------------|-------------| -| **VS Code** · TypeScript | 3 calls, 17s | 52 calls, 1m 37s | **94% fewer · 82% faster** | -| **Excalidraw** · TypeScript | 3 calls, 29s | 47 calls, 1m 45s | **94% fewer · 72% faster** | -| **Claude Code** · Python + Rust | 3 calls, 39s | 40 calls, 1m 8s | **93% fewer · 43% faster** | -| **Claude Code** · Java | 1 call, 19s | 26 calls, 1m 22s | **96% fewer · 77% faster** | -| **Alamofire** · Swift | 3 calls, 22s | 32 calls, 1m 39s | **91% fewer · 78% faster** | -| **Swift Compiler** · Swift/C++ | 6 calls, 35s | 37 calls, 2m 8s | **84% fewer · 73% faster** | +| Codebase | Language | Cost | Tokens | Time | Tool calls | +|----------|----------|------|--------|------|------------| +| **VS Code** | TypeScript · ~10k files | 35% cheaper | 73% fewer | 41% faster | 72% fewer | +| **Excalidraw** | TypeScript · ~600 | 47% cheaper | 73% fewer | 60% faster | 86% fewer | +| **Django** | Python · ~2.7k | 34% cheaper | 64% fewer | 59% faster | 81% fewer | +| **Tokio** | Rust · ~700 | 52% cheaper | 81% fewer | 63% faster | 89% fewer | +| **OkHttp** | Java · ~640 | 17% cheaper | 41% fewer | 36% faster | 64% fewer | +| **Gin** | Go · ~150 | 22% cheaper | 23% fewer | 34% faster | 19% fewer | +| **Alamofire** | Swift · ~100 | 38% cheaper | 59% fewer | 51% faster | 77% fewer | + +The gains scale with codebase size: on large repos the agent answers from the index in a handful of calls with **zero file reads**, while the no-CodeGraph agent fans out across grep/find/Read (and the sub-agents it spawns). On a small repo like Gin (~150 files) native search is already cheap, so the margin narrows.
Full benchmark details -All tests used Claude Opus 4.6 (1M context) with Claude Code v2.1.91. Each test spawned a single Explore agent with the same question. +**Methodology.** Each arm is `claude -p` (Claude Opus 4.7, Claude Code v2.1.145) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos cloned at `--depth 1` and indexed by the same CodeGraph build that served them. -**Queries used:** +**Queries:** | Codebase | Query | |----------|-------| | VS Code | "How does the extension host communicate with the main process?" | -| Excalidraw | "How does collaborative editing and real-time sync work?" | -| Claude Code (Python+Rust) | "How does tool execution work end to end?" | -| Claude Code (Java) | "How does tool execution work end to end?" | -| Alamofire | "Trace how a request flows from Session.request() through to the URLSession layer" | -| Swift Compiler | "How does the Swift compiler handle error diagnostics?" 
| - -**With CodeGraph — the agent uses `codegraph_explore` and stops:** -| Codebase | Files Indexed | Nodes | Tool Uses | Tokens | Time | File Reads | -|----------|--------------|-------|-----------|--------|------|------------| -| VS Code (TypeScript) | 4,002 | 59,377 | 3 | 56.6k | 17s | 0 | -| Excalidraw (TypeScript) | 626 | 9,859 | 3 | 57.1k | 29s | 0 | -| Claude Code (Python+Rust) | 115 | 3,080 | 3 | 67.1k | 39s | 0 | -| Claude Code (Java) | — | — | 1 | 40.8k | 19s | 0 | -| Alamofire (Swift) | 102 | 2,624 | 3 | 57.3k | 22s | 0 | -| Swift Compiler (Swift/C++) | 25,874 | 272,898 | 6 | 77.4k | 35s | 0 | - -**Without CodeGraph — the agent uses grep, find, ls, and Read extensively:** -| Codebase | Tool Uses | Tokens | Time | File Reads | -|----------|-----------|--------|------|------------| -| VS Code (TypeScript) | 52 | 89.4k | 1m 37s | ~15 | -| Excalidraw (TypeScript) | 47 | 77.9k | 1m 45s | ~20 | -| Claude Code (Python+Rust) | 40 | 69.3k | 1m 8s | ~15 | -| Claude Code (Java) | 26 | 73.3k | 1m 22s | ~15 | -| Alamofire (Swift) | 32 | 52.4k | 1m 39s | ~10 | -| Swift Compiler (Swift/C++) | 37 | 99.1k | 2m 8s | ~20 | - -**Key observations:** -- With CodeGraph, the agent **never fell back to reading files** — it trusted the codegraph_explore results completely -- Without CodeGraph, agents spent most of their time on discovery (find, ls, grep) before they could even start reading relevant code -- The Java codebase needed only **1 codegraph_explore call** to answer the entire question -- Cross-language queries (Python+Rust) worked seamlessly — CodeGraph's graph traversal found connections across language boundaries -- The Swift benchmark (Alamofire) traced a **9-step call chain** from `Session.request()` to `URLSession.dataTask()` — CodeGraph's graph traversal at depth 3 captured the full chain in one explore call -- The **Swift Compiler** benchmark is the largest codebase tested (**25,874 files, 272,898 nodes**) — CodeGraph indexed it in under 4 minutes and the agent 
answered a complex cross-cutting question with **6 explore calls and zero file reads** in 35 seconds +| Excalidraw | "How does Excalidraw render and update canvas elements?" | +| Django | "How does Django's ORM build and execute a query from a QuerySet?" | +| Tokio | "How does tokio schedule and run async tasks on its runtime?" | +| OkHttp | "How does OkHttp process a request through its interceptor chain?" | +| Gin | "How does gin route requests through its middleware chain?" | +| Alamofire | "How does Alamofire build, send, and validate a request?" | + +**Raw medians — WITH → WITHOUT:** +| Codebase | Cost | Tokens | Time | Tool calls | +|----------|------|--------|------|------------| +| VS Code | $0.42 → $0.64 | 393k → 1.4M | 1m 0s → 1m 43s | 7 → 23 | +| Excalidraw | $0.54 → $1.02 | 851k → 3.2M | 1m 17s → 3m 14s | 12 → 83 | +| Django | $0.41 → $0.62 | 499k → 1.4M | 1m 0s → 2m 25s | 9 → 48 | +| Tokio | $0.50 → $1.04 | 657k → 3.4M | 1m 5s → 2m 56s | 9 → 75 | +| OkHttp | $0.36 → $0.44 | 352k → 596k | 45s → 1m 11s | 5 → 14 | +| Gin | $0.36 → $0.46 | 431k → 562k | 47s → 1m 11s | 7 → 8 | +| Alamofire | $0.61 → $0.99 | 1.1M → 2.6M | 1m 19s → 2m 41s | 15 → 64 | + +**Why CodeGraph wins:** with the index available, the agent answers directly — `codegraph_context` to map the area, then one `codegraph_explore` for the relevant source — and stops, usually with zero file reads. Without it, the agent (and the Explore sub-agents it spawns) spends most of its budget on discovery (find/ls/grep) before reading the right code. CodeGraph only helps when queried *directly*, so its instructions steer agents to answer directly rather than delegate exploration to file-reading sub-agents — otherwise a sub-agent reads files regardless and CodeGraph becomes overhead.
diff --git a/src/installer/instructions-template.ts b/src/installer/instructions-template.ts
index e7e4cdde..10b6b7ca 100644
--- a/src/installer/instructions-template.ts
+++ b/src/installer/instructions-template.ts
@@ -37,16 +37,17 @@ Use codegraph for **structural** questions — what calls what, what would break
 | "What would break if I changed Z?" | \`codegraph_impact\` |
 | "Show me Y's signature / source / docstring" | \`codegraph_node\` |
 | "Give me focused context for a task/area" | \`codegraph_context\` |
-| "Survey an unfamiliar module/topic" | \`codegraph_explore\` |
+| "See several related symbols' source at once" | \`codegraph_explore\` |
 | "What files exist under path/" | \`codegraph_files\` |
 | "Is the index healthy?" | \`codegraph_status\` |

 ### Rules of thumb

+- **Answer directly — don't delegate exploration.** For "how does X work" / architecture / trace questions, answer with 2-3 codegraph calls: \`codegraph_context\` first, then ONE \`codegraph_explore\` for the source of the symbols it surfaces. Codegraph IS the pre-built index, so spawning a separate file-reading sub-task/agent — or running a grep + read loop — repeats work codegraph already did and costs more for the same answer.
 - **Trust codegraph results.** They come from a full AST parse. Do NOT re-verify them with grep — that's slower, less accurate, and wastes context.
 - **Don't grep first** when looking up a symbol by name. \`codegraph_search\` is faster and returns kind + location + signature in one call.
 - **Don't chain \`codegraph_search\` + \`codegraph_node\`** when you just want context — \`codegraph_context\` is one call.
-- **\`codegraph_explore\` is the heavy hitter** for unfamiliar areas — it returns full source from all relevant files in one call, but is token-heavy. If your harness supports parallel subagents (e.g., Claude Code's Task tool), spawn one for explore-class questions to keep main session context clean.
+- **Don't loop \`codegraph_node\` over many symbols** — one \`codegraph_explore\` call returns several symbols' source grouped in a single capped call, while each separate node/Read call re-reads the whole context and costs far more.
 - **Index lag**: the file watcher debounces ~500ms behind writes; don't re-query immediately after editing a file in the same turn.

 ### If \`.codegraph/\` doesn't exist
diff --git a/src/mcp/server-instructions.ts b/src/mcp/server-instructions.ts
index 0c715ea8..d82a3091 100644
--- a/src/mcp/server-instructions.ts
+++ b/src/mcp/server-instructions.ts
@@ -22,6 +22,18 @@ in the workspace. Reads are sub-millisecond; the index lags writes by about a
 second through the file watcher. Consult it BEFORE writing or
 editing code, not during.

+## Answer directly — don't delegate exploration
+
+For "how does X work", architecture, trace, or where-is-X questions,
+answer DIRECTLY using 2-3 codegraph calls: \`codegraph_context\` first,
+then ONE \`codegraph_explore\` for the source of the symbols it surfaces.
+Codegraph IS the pre-built search index — so delegating the lookup to a
+separate file-reading sub-task/agent, or running your own grep + read
+loop, repeats work codegraph already did and costs more for the same
+answer. Reach for raw Read/Grep only to confirm a specific detail
+codegraph didn't cover. A direct codegraph answer is typically a handful
+of calls; a grep/read exploration is dozens.
+
+## Tool selection by intent
+
+- **"What is the symbol named X?"** → \`codegraph_search\`
@@ -30,7 +42,7 @@ editing code, not during.
 - **"What does this call?"** → \`codegraph_callees\`
 - **"What would changing this break?"** → \`codegraph_impact\`
 - **"Show me this symbol's source / signature / docstring."** → \`codegraph_node\`
-- **"Survey an unfamiliar topic / pattern / module."** → \`codegraph_explore\` (heavier; deep dive)
+- **"Show me several related symbols' source / survey an area."** → \`codegraph_explore\` (ONE capped call; prefer over many codegraph_node/Read)
 - **"What's in directory X?"** → \`codegraph_files\`
 - **"Is the index ready / what's its size?"** → \`codegraph_status\`

@@ -44,7 +56,7 @@ editing code, not during.
 - **Don't grep first** when looking up a symbol by name — \`codegraph_search\` is faster and returns kind + location + signature.
 - **Don't chain \`codegraph_search\` + \`codegraph_node\`** when you just want context — \`codegraph_context\` is one round-trip.
-- **Don't use \`codegraph_explore\` for narrow questions** — it's a multi-call deep dive, expensive in tokens. Save it for genuine "I'm new here" surveys.
+- **Don't loop \`codegraph_node\` over many symbols** — one \`codegraph_explore\` call returns them all grouped by file, while each separate call re-reads the whole context and costs far more. Use \`codegraph_node\` for a single symbol.
 - **Don't query the index immediately after editing a file** — the watcher needs ~500ms to debounce + sync. Wait for the next turn.

 ## Limitations
diff --git a/src/mcp/tools.ts b/src/mcp/tools.ts
index 204ee59c..1c8721b9 100644
--- a/src/mcp/tools.ts
+++ b/src/mcp/tools.ts
@@ -25,6 +25,16 @@ const MAX_OUTPUT_LENGTH = 15000;
  */
 const RUST_PATH_PREFIXES = new Set(['crate', 'super', 'self']);

+/**
+ * Node kinds that contain other symbols. For these, `codegraph_node` with
+ * `includeCode=true` returns a structural outline (member names + signatures
+ * + line numbers) instead of the full body, which for a large class is a
+ * multi-thousand-character wall of source that bloats the agent's context.
+ */ +const CONTAINER_NODE_KINDS = new Set([ + 'class', 'struct', 'interface', 'trait', 'protocol', 'enum', 'namespace', 'module', +]); + /** Last `::` / `.` / `/`-separated segment of a qualified symbol. */ function lastQualifierPart(symbol: string): string { const parts = symbol.split(/::|[./]/).filter((p) => p.length > 0); @@ -102,12 +112,12 @@ export function getExploreOutputBudget(fileCount: number): ExploreOutputBudget { } if (fileCount < 5000) { return { - maxOutputChars: 28000, - defaultMaxFiles: 9, - maxCharsPerFile: 5000, - gapThreshold: 12, - maxSymbolsInFileHeader: 10, - maxEdgesPerRelationshipKind: 10, + maxOutputChars: 13000, + defaultMaxFiles: 6, + maxCharsPerFile: 2500, + gapThreshold: 10, + maxSymbolsInFileHeader: 8, + maxEdgesPerRelationshipKind: 8, includeRelationships: true, includeAdditionalFiles: true, includeCompletenessSignal: true, @@ -263,7 +273,7 @@ export const tools: ToolDefinition[] = [ }, { name: 'codegraph_context', - description: 'PRIMARY TOOL: Build comprehensive context for a task. Returns entry points, related symbols, and key code - often enough to understand the codebase without additional tool calls. NOTE: This provides CODE context, not product requirements. For new features, still clarify UX/behavior questions with the user before implementing.', + description: 'PRIMARY TOOL — call this FIRST for any "how does X work", architecture, feature, or bug-context question. Composes search + node + callers + callees and returns entry points, related symbols, and key code in ONE call — usually enough to answer with no further search/Read/Grep. Prefer this over chaining codegraph_search + codegraph_node, and over codegraph_explore. 
NOTE: provides CODE context, not product requirements; for new features still clarify UX/edge cases with the user.', inputSchema: { type: 'object', properties: { @@ -348,7 +358,7 @@ export const tools: ToolDefinition[] = [ }, { name: 'codegraph_node', - description: 'Get detailed information about a specific code symbol. Use includeCode=true only when you need the full source code - otherwise just get location and signature to minimize context usage.', + description: 'Get detailed info about ONE symbol (location, signature, docstring). Pass includeCode=true for source: a function/method returns its body; a class/interface/struct/enum returns a compact member OUTLINE (fields + method signatures + line numbers), not every method body — Read or codegraph_node a specific member for its body. Keep includeCode=false to minimize context. For SEVERAL related symbols, make ONE codegraph_explore (or codegraph_context) call instead of many node calls — repeated node calls each re-read the whole context and cost far more.', inputSchema: { type: 'object', properties: { @@ -368,7 +378,7 @@ export const tools: ToolDefinition[] = [ }, { name: 'codegraph_explore', - description: 'Deep exploration tool — returns comprehensive context for a topic in a SINGLE call. Groups all relevant source code by file (contiguous sections, not snippets), includes a relationship map, and uses deeper graph traversal. Designed to replace multiple codegraph_node + file Read calls. Use this instead of codegraph_context when you need thorough understanding. IMPORTANT: Use specific symbol names, file names, or short code terms in your query — NOT natural language sentences. Before calling this, use codegraph_search to discover relevant symbol names, then include those names in your query. Bad: "how are agent prompts loaded and passed to the CLI". 
Good: "readAgentsFromDirectory createClaudeSession chat-manager agents.ts".', + description: 'Returns source for SEVERAL related symbols grouped by file, plus a relationship map, in ONE capped call. This is the efficient way to inspect many related symbols at once — strongly prefer it over a series of codegraph_node or Read calls (each separate call re-reads the whole context, so 8 node calls cost far more than 1 explore). Use it after codegraph_context when you need to see the actual source of several symbols. Query with specific symbol/file/code terms, NOT natural-language sentences — run codegraph_search first to find names. Bad: "how are agent prompts loaded and passed to the CLI". Good: "renderStaticScene drawElementOnCanvas ShapeCache renderElement.ts".', inputSchema: { type: 'object', properties: { @@ -1241,7 +1251,20 @@ export class ToolHandler { } } - return this.textResult(lines.join('\n')); + // Hard-cap to the adaptive budget. The per-file loop bounds the source + // sections, but the relationship map, additional-files list, and + // completeness/budget notes can still push the assembled output past + // maxOutputChars (observed 30k against a 28k tier cap). A fat explore + // payload persists in the agent's context and is re-read as cache-input + // on every subsequent turn, so the overrun is paid many times over. + const output = lines.join('\n'); + if (output.length > budget.maxOutputChars) { + const cut = output.slice(0, budget.maxOutputChars); + const lastNewline = cut.lastIndexOf('\n'); + const safe = lastNewline > budget.maxOutputChars * 0.8 ? cut.slice(0, lastNewline) : cut; + return this.textResult(safe + '\n\n... 
(explore output truncated to budget — use codegraph_node or Read for more)'); + } + return this.textResult(output); } /** @@ -1261,12 +1284,24 @@ export class ToolHandler { } let code: string | null = null; + let outline: string | null = null; if (includeCode) { - code = await cg.getCode(match.node.id); + // For container symbols (class/interface/struct/…), the full body is the + // sum of every method body — a wall of source (e.g. a 10k-char class) + // that bloats context and is rarely needed in full. Return a structural + // outline (members + signatures + line numbers) instead; the agent can + // Read or codegraph_node a specific method for its body. Leaf symbols + // (function/method/etc.) return their full body as before. + if (CONTAINER_NODE_KINDS.has(match.node.kind)) { + outline = this.buildContainerOutline(cg, match.node); + } + if (!outline) { + code = await cg.getCode(match.node.id); + } } - const formatted = this.formatNodeDetails(match.node, code) + match.note; + const formatted = this.formatNodeDetails(match.node, code, outline) + match.note; return this.textResult(this.truncateOutput(formatted)); } @@ -1716,7 +1751,29 @@ export class ToolHandler { return lines.join('\n'); } - private formatNodeDetails(node: Node, code: string | null): string { + /** + * Build a compact structural outline of a container symbol from its + * indexed children (methods, fields, properties, …) — name, kind, + * line number, and signature — so the agent gets the shape of a class + * without the full source of every method. Returns '' when the container + * has no indexed children, so the caller can fall back to full source. + */ + private buildContainerOutline(cg: CodeGraph, node: Node): string { + const children = cg.getChildren(node.id) + .filter(c => c.kind !== 'import' && c.kind !== 'export') + .sort((a, b) => (a.startLine ?? 0) - (b.startLine ?? 
0)); + if (children.length === 0) return ''; + + const lines = [`**Members (${children.length}):**`, '']; + for (const c of children) { + const loc = c.startLine ? `:${c.startLine}` : ''; + const sig = c.signature ? ` — \`${c.signature}\`` : ''; + lines.push(`- ${c.name} (${c.kind})${loc}${sig}`); + } + return lines.join('\n'); + } + + private formatNodeDetails(node: Node, code: string | null, outline?: string | null): string { const location = node.startLine ? `:${node.startLine}` : ''; const lines: string[] = [ `## ${node.name} (${node.kind})`, @@ -1733,7 +1790,10 @@ export class ToolHandler { lines.push('', node.docstring); } - if (code) { + if (outline) { + lines.push('', outline, '', + `> Structural outline only. Read \`${node.filePath}\` or call codegraph_node on a specific member for its body.`); + } else if (code) { lines.push('', '```' + node.language, code, '```'); } From 5c6e5d5c67a6b3d3871d934b81d67a5e3d5419be Mon Sep 17 00:00:00 2001 From: Colby McHenry Date: Wed, 20 May 2026 16:37:34 -0500 Subject: [PATCH 08/47] Update README --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 663d7d9c..27632d1d 100644 --- a/README.md +++ b/README.md @@ -499,7 +499,7 @@ MIT
-**Made for the Claude Code community**
+**Made for AI coding agents — Claude Code, Cursor, Codex CLI, and opencode**

 [Report Bug](https://github.com/colbymchenry/codegraph/issues) · [Request Feature](https://github.com/colbymchenry/codegraph/issues)

From 948b287536d5ea524dd1aaff65e58b31767f2ca0 Mon Sep 17 00:00:00 2001
From: Colby Mchenry
Date: Wed, 20 May 2026 17:09:54 -0500
Subject: [PATCH 09/47] feat(frameworks): add NestJS support (#220) (#225)

Detect NestJS projects and emit `route` nodes (each linked by a
`references` edge to its handler method) across all four transport
layers:

- HTTP controllers: @Controller prefix joined with
  @Get/@Post/@Put/@Patch/@Delete/@Head/@Options/@All
- GraphQL resolvers: @Query/@Mutation/@Subscription
- Microservices: @MessagePattern/@EventPattern
- WebSocket gateways: @SubscribeMessage (prefixed with gateway namespace)

Detected from any @nestjs/* dependency in package.json (falls back to
scanning *.controller.ts/.js, *.module.ts, *.resolver.ts and
*.gateway.ts files for Nest decorators). Handles class+method path
joining with empty @Controller()/@Get(), a string-aware balanced-paren
arg reader so GraphQL type thunks (@Query(() => [User])) aren't
truncated, stacked decorators (@UseGuards) when locating the handler,
and disambiguates the @Query() GraphQL method decorator from the REST
@Query() param decorator (GraphQL only counts inside @Resolver
classes). Also resolves injected *Service/*Controller refs to their
classes by Nest file-naming convention. Adds 18 framework tests;
updates the README framework table and CHANGELOG.
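A minimal sketch of two of the parsing steps described above, for reviewers. It assumes comment-stripped input (the extractor runs `stripCommentsForRegex` first) and treats template literals as opaque strings, without handling `${...}` interpolation. `joinRoutePath` and `readBalancedArgs` are hypothetical names; this is not the code in `src/resolution/frameworks/nestjs.ts`, whose `joinHttpPath` and decorator scanner may behave differently.

```typescript
// Hypothetical helper (not the nestjs.ts implementation): join a class-level
// @Controller prefix with a method-level path. Empty segments are dropped,
// so `@Controller()` + `@Get()` yields "/" and `users` + `` yields "/users".
function joinRoutePath(prefix: string, path: string): string {
  const segments = `${prefix}/${path}`.split("/").filter((s) => s.length > 0);
  return "/" + segments.join("/");
}

// Hypothetical helper: return the text between the "(" at `open` and its
// matching ")". Inside a quoted string, parentheses are ignored, so a ")"
// in a string argument does not end the list, and nested parens such as a
// GraphQL type thunk `() => [User]` are read whole.
function readBalancedArgs(src: string, open: number): string | null {
  if (src[open] !== "(") return null;
  let depth = 0;
  let quote: string | null = null;
  for (let i = open; i < src.length; i++) {
    const ch = src[i];
    if (quote !== null) {
      if (ch === "\\") {
        i++; // skip the escaped character
      } else if (ch === quote) {
        quote = null; // string closed
      }
      continue;
    }
    if (ch === "'" || ch === '"' || ch === "`") {
      quote = ch; // string opened
    } else if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      depth--;
      if (depth === 0) return src.slice(open + 1, i);
    }
  }
  return null; // no matching ")"
}

const decorator = "@Query(() => [User])";
console.log(readBalancedArgs(decorator, decorator.indexOf("("))); // () => [User]
console.log(joinRoutePath("cats", ":id")); // /cats/:id
```

Dropping empty segments is what makes `@Controller()` plus `@Post()` come out as `/`, matching the `POST /` expectation in the tests; tracking quote state is what keeps a `)` inside a string argument from ending the argument list early.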
Co-authored-by: Claude Opus 4.7 (1M context)
---
 CHANGELOG.md                        |  11 +
 README.md                           |   1 +
 __tests__/frameworks.test.ts        | 298 +++++++++++++++++++
 src/resolution/frameworks/index.ts  |   3 +
 src/resolution/frameworks/nestjs.ts | 438 ++++++++++++++++++++++++++++
 5 files changed, 751 insertions(+)
 create mode 100644 src/resolution/frameworks/nestjs.ts

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4a36bdb8..b661dfd5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,17 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [Unreleased]

 ### Added
+- **Framework routes (NestJS)**: CodeGraph now recognises NestJS projects and
+  emits `route` nodes — each linked by a `references` edge to its handler
+  method — across all four transport layers: HTTP controllers (the
+  `@Controller` prefix joined with `@Get`/`@Post`/`@Put`/`@Patch`/`@Delete`/
+  `@Head`/`@Options`/`@All`, including empty `@Controller()`/`@Get()`),
+  GraphQL resolvers (`@Query`/`@Mutation`/`@Subscription`), microservice
+  handlers (`@MessagePattern`/`@EventPattern`), and WebSocket gateways
+  (`@SubscribeMessage`, prefixed with the gateway namespace). Detected
+  automatically from any `@nestjs/*` dependency in `package.json`. Querying a
+  controller method or resolver now surfaces the route that binds it.
+  Resolves [#220](https://github.com/colbymchenry/codegraph/issues/220).
 - **MCP / explore**: `codegraph_explore` source sections now carry line
   numbers (cat -n style `\t`, matching the Read tool). This lets the
   agent cite `file:line` straight from the explore payload instead of
diff --git a/README.md b/README.md
index 27632d1d..e36fcf7f 100644
--- a/README.md
+++ b/README.md
@@ -123,6 +123,7 @@ CodeGraph detects web-framework routing files and emits `route` nodes linked by
 | **Flask** | `@app.route('/path', methods=[...])`, blueprint routes |
 | **FastAPI** | `@app.get(...)`, `@router.post(...)`, all standard methods |
 | **Express** | `app.get(...)`, `router.post(...)` with middleware chains |
+| **NestJS** | `@Controller` + `@Get/@Post/...`, GraphQL `@Resolver` + `@Query/@Mutation`, `@MessagePattern`/`@EventPattern`, `@SubscribeMessage` |
 | **Laravel** | `Route::get()`, `Route::resource()`, `Controller@action`, tuple syntax |
 | **Rails** | `get '/x', to: 'users#index'`, hash-rocket `=>` syntax |
 | **Spring** | `@GetMapping`, `@PostMapping`, `@RequestMapping` on methods |
diff --git a/__tests__/frameworks.test.ts b/__tests__/frameworks.test.ts
index 8eb33e2e..a5e5c56b 100644
--- a/__tests__/frameworks.test.ts
+++ b/__tests__/frameworks.test.ts
@@ -175,6 +175,287 @@ describe('expressResolver.extract', () => {
   });
 });

+import { nestjsResolver } from '../src/resolution/frameworks/nestjs';
+
+describe('nestjsResolver.extract — HTTP', () => {
+  it('joins @Controller prefix with @Get and links the handler', () => {
+    const src = `
+@Controller('users')
+export class UsersController {
+  @Get()
+  findAll() { return []; }
+}
+`;
+    const { nodes, references } = nestjsResolver.extract!('users.controller.ts', src);
+    expect(nodes).toHaveLength(1);
+    expect(nodes[0].kind).toBe('route');
+    expect(nodes[0].name).toBe('GET /users');
+    expect(references[0].referenceName).toBe('findAll');
+    expect(references[0].referenceKind).toBe('references');
+    expect(references[0].fromNodeId).toBe(nodes[0].id);
+  });
+
+  it('joins controller prefix with a method-level path param', () => {
+    const src = `
+@Controller('cats')
+export class CatsController {
+  @Get(':id')
+  findOne(@Param('id') id: string) { return id; }
+}
+`;
+    const { nodes, references } = nestjsResolver.extract!('cats.controller.ts', src);
+    expect(nodes[0].name).toBe('GET /cats/:id');
+    expect(references[0].referenceName).toBe('findOne');
+  });
+
+  it('handles an empty @Controller() and empty @Post()', () => {
+    const src = `
+@Controller()
+export class AppController {
+  @Post()
+  create() {}
+}
+`;
+    const { nodes, references } = nestjsResolver.extract!('app.controller.ts', src);
+    expect(nodes[0].name).toBe('POST /');
+    expect(references[0].referenceName).toBe('create');
+  });
+
+  it('covers HTTP verbs and skips intervening method decorators', () => {
+    const src = `
+@Controller('todos')
+export class TodosController {
+  @Put(':id')
+  @UseGuards(AuthGuard)
+  update(@Param('id') id: string) {}
+
+  @Delete(':id')
+  async remove(@Param('id') id: string) {}
+}
+`;
+    const { nodes, references } = nestjsResolver.extract!('todos.controller.ts', src);
+    expect(nodes.map((n) => n.name)).toEqual(['PUT /todos/:id', 'DELETE /todos/:id']);
+    expect(references.map((r) => r.referenceName)).toEqual(['update', 'remove']);
+  });
+
+  it('attributes methods to the right controller when a file has two', () => {
+    const src = `
+@Controller('a')
+export class AController {
+  @Get('x')
+  ax() {}
+}
+
+@Controller('b')
+export class BController {
+  @Get('y')
+  by() {}
+}
+`;
+    const { nodes } = nestjsResolver.extract!('multi.controller.ts', src);
+    expect(nodes.map((n) => n.name)).toEqual(['GET /a/x', 'GET /b/y']);
+  });
+});
+
+describe('nestjsResolver.extract — GraphQL', () => {
+  it('emits QUERY/MUTATION nodes from a resolver, defaulting to the method name', () => {
+    const src = `
+@Resolver(() => User)
+export class UsersResolver {
+  @Query(() => [User])
+  users() { return []; }
+
+  @Mutation(() => User)
+  createUser(@Args('input') input: CreateUserInput) {}
+}
+`;
+    const { nodes, references } = nestjsResolver.extract!('users.resolver.ts', src);
+    expect(nodes.map((n) => n.name)).toEqual(['QUERY users', 'MUTATION createUser']);
+    expect(references.map((r) => r.referenceName)).toEqual(['users', 'createUser']);
+  });
+
+  it('uses an explicit operation name when given', () => {
+    const src = `
+@Resolver()
+export class CatsResolver {
+  @Query(() => Cat, { name: 'cat' })
+  getCat() {}
+}
+`;
+    const { nodes } = nestjsResolver.extract!('cats.resolver.ts', src);
+    expect(nodes[0].name).toBe('QUERY cat');
+  });
+
+  it('does NOT treat the REST @Query() parameter decorator as a GraphQL op', () => {
+    const src = `
+@Controller('search')
+export class SearchController {
+  @Get()
+  search(@Query() query: SearchDto) { return query; }
+}
+`;
+    const { nodes } = nestjsResolver.extract!('search.controller.ts', src);
+    // Only the HTTP route — the @Query() param decorator must be ignored.
+    expect(nodes.map((n) => n.name)).toEqual(['GET /search']);
+  });
+});
+
+describe('nestjsResolver.extract — microservices & websockets', () => {
+  it('extracts @MessagePattern and @EventPattern handlers', () => {
+    const src = `
+@Controller()
+export class MathController {
+  @MessagePattern({ cmd: 'sum' })
+  accumulate(data: number[]) {}
+
+  @EventPattern('user.created')
+  handleUserCreated(data: any) {}
+}
+`;
+    const { nodes, references } = nestjsResolver.extract!('math.controller.ts', src);
+    expect(nodes.map((n) => n.name)).toEqual(['MESSAGE sum', 'EVENT user.created']);
+    expect(references.map((r) => r.referenceName)).toEqual(['accumulate', 'handleUserCreated']);
+  });
+
+  it('extracts @SubscribeMessage handlers with the gateway namespace', () => {
+    const src = `
+@WebSocketGateway({ namespace: 'chat' })
+export class ChatGateway {
+  @SubscribeMessage('message')
+  handleMessage(@MessageBody() data: string) {}
+}
+`;
+    const { nodes, references } = nestjsResolver.extract!('chat.gateway.ts', src);
+    expect(nodes[0].name).toBe('WS chat:message');
+    expect(references[0].referenceName).toBe('handleMessage');
+  });
+
+  it('extracts @SubscribeMessage without a namespace', () => {
+    const src = `
+@WebSocketGateway()
+export class EventsGateway {
+  @SubscribeMessage('events')
+  onEvent() {}
+}
+`;
+    const { nodes } = nestjsResolver.extract!('events.gateway.ts', src);
+    expect(nodes[0].name).toBe('WS events');
+  });
+
+  it('returns empty for a non-JS/TS file', () => {
+    const { nodes, references } = nestjsResolver.extract!('thing.py', '@Controller("x")');
+    expect(nodes).toEqual([]);
+    expect(references).toEqual([]);
+  });
+});
+
+describe('nestjsResolver.detect', () => {
+  const baseContext = {
+    getNodesInFile: () => [],
+    getNodesByName: () => [],
+    getNodesByQualifiedName: () => [],
+    getNodesByKind: () => [],
+    fileExists: () => false,
+    getProjectRoot: () => '/test',
+    getAllFiles: () => [],
+    getNodesByLowerName: () => [],
+    getImportMappings: () => [],
+  };
+
+  it('detects @nestjs/* in package.json', () => {
+    const context = {
+      ...baseContext,
+      readFile: (p: string) =>
+        p === 'package.json'
+          ? JSON.stringify({ dependencies: { '@nestjs/common': '^10.0.0' } })
+          : null,
+    };
+    expect(nestjsResolver.detect(context as any)).toBe(true);
+  });
+
+  it('detects @Controller in a *.controller.ts file when package.json is absent', () => {
+    const context = {
+      ...baseContext,
+      getAllFiles: () => ['src/users.controller.ts'],
+      readFile: (p: string) =>
+        p === 'src/users.controller.ts'
+          ? `@Controller('users')\nexport class UsersController {}`
+          : null,
+    };
+    expect(nestjsResolver.detect(context as any)).toBe(true);
+  });
+
+  it('returns false for a non-Nest project', () => {
+    const context = {
+      ...baseContext,
+      readFile: (p: string) =>
+        p === 'package.json' ? JSON.stringify({ dependencies: { express: '^4' } }) : null,
+    };
+    expect(nestjsResolver.detect(context as any)).toBe(false);
+  });
+});
+
+describe('nestjsResolver.resolve', () => {
+  const baseContext = {
+    getNodesInFile: () => [],
+    getNodesByName: () => [],
+    getNodesByQualifiedName: () => [],
+    getNodesByKind: () => [],
+    fileExists: () => false,
+    readFile: () => null,
+    getProjectRoot: () => '/test',
+    getAllFiles: () => [],
+    getNodesByLowerName: () => [],
+    getImportMappings: () => [],
+  };
+
+  it('resolves an injected *Service reference to the class in a *.service.ts file', () => {
+    const svcNode: Node = {
+      id: 'class:src/users/users.service.ts:UsersService:3',
+      kind: 'class',
+      name: 'UsersService',
+      qualifiedName: 'src/users/users.service.ts::UsersService',
+      filePath: 'src/users/users.service.ts',
+      language: 'typescript',
+      startLine: 3,
+      endLine: 3,
+      startColumn: 0,
+      endColumn: 0,
+      updatedAt: Date.now(),
+    };
+    const context = {
+      ...baseContext,
+      getNodesByName: (n: string) => (n === 'UsersService' ? [svcNode] : []),
+    };
+    const ref = {
+      fromNodeId: 'class:src/users/users.controller.ts:UsersController:5',
+      referenceName: 'UsersService',
+      referenceKind: 'references' as const,
+      line: 6,
+      column: 4,
+      filePath: 'src/users/users.controller.ts',
+      language: 'typescript' as const,
+    };
+    const result = nestjsResolver.resolve(ref, context as any);
+    expect(result?.targetNodeId).toBe(svcNode.id);
+    expect(result?.resolvedBy).toBe('framework');
+    expect(result?.confidence).toBeGreaterThanOrEqual(0.85);
+  });
+
+  it('returns null for a name without a provider suffix', () => {
+    const ref = {
+      fromNodeId: 'x',
+      referenceName: 'doThing',
+      referenceKind: 'references' as const,
+      line: 1,
+      column: 1,
+      filePath: 'a.ts',
+      language: 'typescript' as const,
+    };
+    expect(nestjsResolver.resolve(ref, baseContext as any)).toBeNull();
+  });
+});
+
 import { laravelResolver } from '../src/resolution/frameworks/laravel';

 describe('laravelResolver.extract', () => {
@@ -768,4 +1049,21 @@ app.get("real", use: listUsers)
     expect(nodes.map((n) => n.name)).toEqual(['GET real']);
     expect(references.map((r) => r.referenceName)).toEqual(['listUsers']);
   });
+
+  it('nestjs: skips // and /* */ commented decorators', () => {
+    const src = `
+@Controller('users')
+export class UsersController {
+  // @Get('fake')
+  // fake() {}
+  /* @Post('also-fake')
+  alsoFake() {} */
+  @Get('real')
+  real() {}
+}
+`;
+    const { nodes, references } = nestjsResolver.extract!('users.controller.ts', src);
+    expect(nodes.map((n) => n.name)).toEqual(['GET /users/real']);
+    expect(references.map((r) => r.referenceName)).toEqual(['real']);
+  });
 });
diff --git a/src/resolution/frameworks/index.ts b/src/resolution/frameworks/index.ts
index f50ea84a..188b5e48 100644
--- a/src/resolution/frameworks/index.ts
+++ b/src/resolution/frameworks/index.ts
@@ -8,6 +8,7 @@ import { FrameworkResolver, ResolutionContext } from '../types';
 import type { Language } from '../../types';
 import { laravelResolver } from './laravel';
 import { expressResolver } from './express';
+import { nestjsResolver } from './nestjs';
 import { reactResolver } from './react';
 import { svelteResolver } from './svelte';
 import { vueResolver } from './vue';
@@ -27,6 +28,7 @@ const FRAMEWORK_RESOLVERS: FrameworkResolver[] = [
   laravelResolver,
   // JavaScript/TypeScript
   expressResolver,
+  nestjsResolver,
   reactResolver,
   svelteResolver,
   vueResolver,
@@ -105,6 +107,7 @@ export function registerFrameworkResolver(resolver: FrameworkResolver): void {
 // Re-export framework resolvers
 export { laravelResolver, FACADE_MAPPINGS } from './laravel';
 export { expressResolver } from './express';
+export { nestjsResolver } from './nestjs';
 export { reactResolver } from './react';
 export { svelteResolver } from './svelte';
 export { vueResolver } from './vue';
diff --git a/src/resolution/frameworks/nestjs.ts b/src/resolution/frameworks/nestjs.ts
new file mode 100644
index 00000000..3a8c1e9a
--- /dev/null
+++ b/src/resolution/frameworks/nestjs.ts
@@ -0,0 +1,438 @@
+/**
+ * NestJS Framework Resolver
+ *
+ * Handles NestJS decorator-based routing across its transport layers:
+ * - HTTP: @Controller(prefix) + @Get/@Post/@Put/@Patch/@Delete/@Head/@Options/@All
+ * - GraphQL: @Resolver + @Query/@Mutation/@Subscription
+ * - Microservices: @MessagePattern / @EventPattern
+ * - WebSockets: @WebSocketGateway(namespace) + @SubscribeMessage(event)
+ *
+ * Like the other framework extractors this is regex-over-source (comment-
+ * stripped), not AST traversal. NestJS differs from Spring/ASP.NET in two ways
+ * that this resolver has to account for:
+ *
+ * 1. An HTTP route's path is split across TWO decorators — the class-level
+ *    `@Controller` prefix and the method-level `@Get`/`@Post` path — and both
+ *    are frequently empty (`@Controller()`, `@Get()`). We pair each method
+ *    decorator with its enclosing class and join the two paths.
+ *
+ * 2. `@Query()` is overloaded: it's a GraphQL *method* decorator (from
+ *    `@nestjs/graphql`) AND a REST *parameter* decorator (from
+ *    `@nestjs/common`). We only treat it as GraphQL when it sits inside an
+ *    `@Resolver` class, which is what disambiguates the two.
+ */
+
+import { Node } from '../../types';
+import {
+  FrameworkResolver,
+  UnresolvedRef,
+  ResolvedRef,
+  ResolutionContext,
+} from '../types';
+import { stripCommentsForRegex } from '../strip-comments';
+
+type JsLang = 'typescript' | 'javascript';
+
+const HTTP_METHODS = ['Get', 'Post', 'Put', 'Patch', 'Delete', 'Head', 'Options', 'All'];
+const GQL_OPS = ['Query', 'Mutation', 'Subscription'];
+
+export const nestjsResolver: FrameworkResolver = {
+  name: 'nestjs',
+  languages: ['typescript', 'javascript'],
+
+  detect(context: ResolutionContext): boolean {
+    // Primary, fast path: any @nestjs/* dependency in package.json.
+    const packageJson = context.readFile('package.json');
+    if (packageJson) {
+      try {
+        const pkg = JSON.parse(packageJson);
+        const deps = { ...pkg.dependencies, ...pkg.devDependencies };
+        if (Object.keys(deps).some((k) => k.startsWith('@nestjs/'))) {
+          return true;
+        }
+      } catch {
+        // Invalid JSON — fall through to the source scan.
+      }
+    }
+
+    // Fallback: NestJS-specific decorators in conventionally named files.
+    const allFiles = context.getAllFiles();
+    for (const file of allFiles) {
+      if (
+        file.endsWith('.controller.ts') ||
+        file.endsWith('.controller.js') ||
+        file.endsWith('.module.ts') ||
+        file.endsWith('.resolver.ts') ||
+        file.endsWith('.gateway.ts')
+      ) {
+        const content = context.readFile(file);
+        if (
+          content &&
+          (content.includes('@nestjs/') ||
+            content.includes('@Controller') ||
+            content.includes('@Module(') ||
+            content.includes('@Resolver(') ||
+            content.includes('@WebSocketGateway('))
+        ) {
+          return true;
+        }
+      }
+    }
+
+    return false;
+  },
+
+  resolve(ref: UnresolvedRef, context: ResolutionContext): ResolvedRef | null {
+    // Resolve provider/controller references (e.g. constructor-injected
+    // `UsersService`) to their class, preferring the Nest file-name
+    // convention (`*.service.ts`, `*.controller.ts`, …).
+    for (const [suffix, convention] of PROVIDER_CONVENTIONS) {
+      if (!suffix.test(ref.referenceName)) continue;
+      const candidates = context
+        .getNodesByName(ref.referenceName)
+        .filter((n) => n.kind === 'class');
+      if (candidates.length === 0) return null;
+      const preferred = candidates.find((n) => n.filePath.includes(convention));
+      const target = preferred ?? candidates[0]!;
+      return {
+        original: ref,
+        targetNodeId: target.id,
+        confidence: preferred ? 0.85 : 0.7,
+        resolvedBy: 'framework',
+      };
+    }
+    return null;
+  },
+
+  extract(filePath, content) {
+    if (!/\.(m?js|tsx?|cjs)$/.test(filePath)) return { nodes: [], references: [] };
+    const nodes: Node[] = [];
+    const references: UnresolvedRef[] = [];
+    const now = Date.now();
+    const lang = detectLanguage(filePath);
+    const safe = stripCommentsForRegex(content, lang);
+
+    const addRoute = (
+      index: number,
+      method: string,
+      path: string,
+      length: number,
+      handler: string | null
+    ): void => {
+      const line = lineAt(safe, index);
+      const node: Node = {
+        id: `route:${filePath}:${line}:${method}:${path}`,
+        kind: 'route',
+        name: `${method} ${path}`,
+        qualifiedName: `${filePath}::${method}:${path}`,
+        filePath,
+        startLine: line,
+        endLine: line,
+        startColumn: 0,
+        endColumn: length,
+        language: lang,
+        updatedAt: now,
+      };
+      nodes.push(node);
+      if (handler) {
+        references.push({
+          fromNodeId: node.id,
+          referenceName: handler,
+          referenceKind: 'references',
+          line,
+          column: 0,
+          filePath,
+          language: lang,
+        });
+      }
+    };
+
+    const scopes = buildClassScopes(safe);
+
+    // HTTP routes: method decorator path joined onto the enclosing controller's prefix.
+    for (const hit of findDecorators(safe, HTTP_METHODS)) {
+      const scope = scopeFor(scopes, hit.index);
+      const prefix = scope && scope.kind === 'controller' ? scope.prefix : '';
+      const path = joinHttpPath(prefix, parseStringArg(hit.args));
+      addRoute(hit.index, hit.name.toUpperCase(), path, hit.length, methodNameAfter(safe, hit.end));
+    }
+
+    // GraphQL operations: only inside an @Resolver class (disambiguates the
+    // REST `@Query()` parameter decorator, which lives inside @Controller classes).
+ for (const hit of findDecorators(safe, GQL_OPS)) { + const scope = scopeFor(scopes, hit.index); + if (!scope || scope.kind !== 'resolver') continue; + const handler = methodNameAfter(safe, hit.end); + const name = parseGraphqlName(hit.args, handler); + addRoute(hit.index, hit.name.toUpperCase(), name, hit.length, handler); + } + + // Microservice message/event handlers. + for (const hit of findDecorators(safe, ['MessagePattern', 'EventPattern'])) { + const verb = hit.name === 'EventPattern' ? 'EVENT' : 'MESSAGE'; + const handler = methodNameAfter(safe, hit.end); + addRoute(hit.index, verb, parseStringArg(hit.args) || handler || '', hit.length, handler); + } + + // WebSocket message handlers, prefixed with the gateway namespace when present. + for (const hit of findDecorators(safe, ['SubscribeMessage'])) { + const scope = scopeFor(scopes, hit.index); + const namespace = scope && scope.kind === 'gateway' ? scope.prefix : ''; + const handler = methodNameAfter(safe, hit.end); + const event = parseStringArg(hit.args) || handler || ''; + addRoute(hit.index, 'WS', namespace ? `${namespace}:${event}` : event, hit.length, handler); + } + + return { nodes, references }; + }, +}; + +// --------------------------------------------------------------------------- +// Provider resolution conventions +// --------------------------------------------------------------------------- + +const PROVIDER_CONVENTIONS: Array<[RegExp, string]> = [ + [/Service$/, '.service.'], + [/Controller$/, '.controller.'], + [/Resolver$/, '.resolver.'], + [/Gateway$/, '.gateway.'], + [/Repository$/, '.repository.'], + [/Guard$/, '.guard.'], + [/Interceptor$/, '.interceptor.'], + [/Pipe$/, '.pipe.'], + [/Module$/, '.module.'], +]; + +// --------------------------------------------------------------------------- +// Decorator scanning +// --------------------------------------------------------------------------- + +interface DecoratorHit { + /** Decorator name without the leading `@` (e.g. `Get`). 
*/ + name: string; + /** Raw text between the decorator's parentheses. */ + args: string; + /** Index of the leading `@` in the (comment-stripped) source. */ + index: number; + /** Index just past the decorator's closing `)`. */ + end: number; + /** Character length of the whole `@Name(...)` decorator. */ + length: number; +} + +/** + * Find every `@Name(...)` decorator whose name is in `names`. Uses a + * string-aware balanced-paren reader for the argument list so type thunks + * like `@Query(() => [User])` are captured whole rather than truncated at the + * inner `()`. + */ +function findDecorators(safe: string, names: string[]): DecoratorHit[] { + const hits: DecoratorHit[] = []; + const re = new RegExp(`@(${names.join('|')})\\s*\\(`, 'g'); + let m: RegExpExecArray | null; + while ((m = re.exec(safe)) !== null) { + const openIndex = m.index + m[0].length - 1; // position of '(' + const parsed = readArgs(safe, openIndex); + if (!parsed) continue; + hits.push({ + name: m[1]!, + args: parsed.args, + index: m.index, + end: parsed.end, + length: parsed.end - m.index, + }); + re.lastIndex = parsed.end; // resume past the args so nested text isn't re-scanned + } + return hits; +} + +/** + * Read a balanced `(...)` starting at `openIndex` (which must point at `(`). + * String-aware, so parens inside string literals don't unbalance the count. + * Returns the inner text and the index just past the closing `)`. 
+ */ +function readArgs(s: string, openIndex: number): { args: string; end: number } | null { + if (s[openIndex] !== '(') return null; + let depth = 0; + let inStr: string | null = null; + for (let i = openIndex; i < s.length; i++) { + const ch = s[i]!; + if (inStr) { + if (ch === '\\') { + i++; + continue; + } + if (ch === inStr) inStr = null; + continue; + } + if (ch === '"' || ch === "'" || ch === '`') { + inStr = ch; + continue; + } + if (ch === '(') depth++; + else if (ch === ')') { + depth--; + if (depth === 0) return { args: s.slice(openIndex + 1, i), end: i + 1 }; + } + } + return null; +} + +/** + * Starting just after a method decorator's `)`, return the name of the method + * it decorates. Skips any further stacked decorators (`@UseGuards(...)`, + * `@HttpCode(204)`, …) and access/async modifiers in between. + */ +function methodNameAfter(safe: string, start: number): string | null { + let i = start; + const ws = /\s*/y; + const decoName = /@[\w.]+/y; + const modifier = /(?:public|private|protected|async|static)\b/y; + const ident = /([A-Za-z_$][\w$]*)\s*\(/y; + + const eatWs = (): void => { + ws.lastIndex = i; + if (ws.exec(safe)) i = ws.lastIndex; + }; + + // Skip stacked decorators. + for (;;) { + eatWs(); + if (safe[i] !== '@') break; + decoName.lastIndex = i; + if (!decoName.exec(safe)) break; + i = decoName.lastIndex; + eatWs(); + if (safe[i] === '(') { + const parsed = readArgs(safe, i); + if (!parsed) return null; + i = parsed.end; + } + } + + // Skip access/async/static modifiers. + for (;;) { + eatWs(); + modifier.lastIndex = i; + if (modifier.exec(safe) && modifier.lastIndex > i) { + i = modifier.lastIndex; + continue; + } + break; + } + + eatWs(); + ident.lastIndex = i; + const m = ident.exec(safe); + return m ? m[1]! 
: null; +} + +// --------------------------------------------------------------------------- +// Class scopes (controller / resolver / gateway boundaries) +// --------------------------------------------------------------------------- + +type ClassKind = 'controller' | 'resolver' | 'gateway' | 'other'; + +interface ClassScope { + kind: ClassKind; + /** HTTP prefix (controller) or WS namespace (gateway); '' otherwise. */ + prefix: string; + start: number; + end: number; +} + +/** + * Build the list of class-level decorator scopes, sorted by position. Each + * scope runs from its decorator up to the next class decorator (of any kind), + * which lets a method decorator find its enclosing class regardless of how + * many classes share a file. + */ +function buildClassScopes(safe: string): ClassScope[] { + const defs: Array<{ kind: ClassKind; name: string; prefixOf: (a: string) => string }> = [ + { kind: 'controller', name: 'Controller', prefixOf: parseControllerPrefix }, + { kind: 'resolver', name: 'Resolver', prefixOf: () => '' }, + { kind: 'gateway', name: 'WebSocketGateway', prefixOf: parseGatewayNamespace }, + { kind: 'other', name: 'Injectable', prefixOf: () => '' }, + { kind: 'other', name: 'Module', prefixOf: () => '' }, + { kind: 'other', name: 'Catch', prefixOf: () => '' }, + ]; + + const raw: Array<{ kind: ClassKind; prefix: string; index: number }> = []; + for (const def of defs) { + for (const hit of findDecorators(safe, [def.name])) { + raw.push({ kind: def.kind, prefix: def.prefixOf(hit.args), index: hit.index }); + } + } + raw.sort((a, b) => a.index - b.index); + + return raw.map((r, i) => ({ + kind: r.kind, + prefix: r.prefix, + start: r.index, + end: i + 1 < raw.length ? 
raw[i + 1]!.index : safe.length, + })); +} + +function scopeFor(scopes: ClassScope[], index: number): ClassScope | null { + for (const s of scopes) { + if (index >= s.start && index < s.end) return s; + } + return null; +} + +// --------------------------------------------------------------------------- +// Argument parsing +// --------------------------------------------------------------------------- + +/** First string literal anywhere in the args, or '' (covers `'x'`, `{ k: 'x' }`). */ +function parseStringArg(args: string): string { + const m = args.match(/['"`]([^'"`]*)['"`]/); + return m ? m[1]! : ''; +} + +/** `@Controller('users')` | `@Controller({ path: 'users', host })` | `@Controller(['a','b'])` | `@Controller()`. */ +function parseControllerPrefix(args: string): string { + const obj = args.match(/path\s*:\s*['"`]([^'"`]*)['"`]/); + if (obj) return obj[1]!; + return parseStringArg(args); +} + +/** `@WebSocketGateway({ namespace: 'chat' })` | `@WebSocketGateway(81, { namespace: '/chat' })` | `@WebSocketGateway()`. */ +function parseGatewayNamespace(args: string): string { + const m = args.match(/namespace\s*:\s*['"`]([^'"`]*)['"`]/); + return m ? m[1]! : ''; +} + +/** + * GraphQL operation name. Prefers an explicit `{ name: 'x' }` or a leading + * string literal (`@Query('users')`); otherwise the field name defaults to the + * handler method name. Avoids mistaking a `description` string for the name. + */ +function parseGraphqlName(args: string, handler: string | null): string { + const named = args.match(/name\s*:\s*['"`]([^'"`]*)['"`]/); + if (named) return named[1]!; + const lead = args.match(/^\s*['"`]([^'"`]*)['"`]/); + if (lead) return lead[1]!; + return handler ?? ''; +} + +// --------------------------------------------------------------------------- +// Path helpers +// --------------------------------------------------------------------------- + +/** Join a controller prefix and method path into a single normalised `/path`. 
*/ +function joinHttpPath(prefix: string, sub: string): string { + const parts = [prefix, sub] + .map((p) => p.trim().replace(/^\/+|\/+$/g, '')) + .filter((p) => p.length > 0); + return '/' + parts.join('/'); +} + +function lineAt(safe: string, index: number): number { + return safe.slice(0, index).split('\n').length; +} + +function detectLanguage(filePath: string): JsLang { + if (filePath.endsWith('.ts') || filePath.endsWith('.tsx')) return 'typescript'; + return 'javascript'; +} From d2664e87b0cf3df78f7ff282398a003d81efccd9 Mon Sep 17 00:00:00 2001 From: Colby McHenry Date: Wed, 20 May 2026 18:36:18 -0500 Subject: [PATCH 10/47] Readme updated and detect test files in kotlin and swift --- README.md | 2 +- __tests__/is-test-file.test.ts | 53 +++++++++++++++++++++++++++ src/search/query-utils.ts | 65 +++++++++++++++++++--------------- 3 files changed, 90 insertions(+), 30 deletions(-) create mode 100644 __tests__/is-test-file.test.ts diff --git a/README.md b/README.md index e36fcf7f..559e8845 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ [![npm version](https://img.shields.io/npm/v/@colbymchenry/codegraph.svg)](https://www.npmjs.com/package/@colbymchenry/codegraph) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) -[![Node.js](https://img.shields.io/badge/Node.js-18+-green.svg)](https://nodejs.org/) +[![Node.js](https://img.shields.io/badge/Node.js-20--24-green.svg)](https://nodejs.org/) [![Windows](https://img.shields.io/badge/Windows-supported-blue.svg)](#) [![macOS](https://img.shields.io/badge/macOS-supported-blue.svg)](#) diff --git a/__tests__/is-test-file.test.ts b/__tests__/is-test-file.test.ts new file mode 100644 index 00000000..e3fc6d03 --- /dev/null +++ b/__tests__/is-test-file.test.ts @@ -0,0 +1,53 @@ +/** + * isTestFile heuristic — test-file detection used to deprioritize test code in + * search/explore ranking. 
+ * + * Regression coverage for the cold-query fix: the heuristic previously only + * knew Java/JS/Python conventions, so Kotlin (`*Test.kt`, `jvmTest/`), Swift + * (`*Tests.swift`), and camelCase test source-set dirs slipped through — which + * let OkHttp's tests flood `codegraph_explore` results on a plain-language + * query. The false-positive guards matter just as much: `latest.kt` / + * `manifest.kt` / a `RealCall.kt` production file must NOT be flagged. + */ +import { describe, it, expect } from 'vitest'; +import { isTestFile } from '../src/search/query-utils'; + +describe('isTestFile', () => { + it('flags Kotlin test files and source sets', () => { + expect(isTestFile('okhttp/src/jvmTest/kotlin/okhttp3/CallTest.kt')).toBe(true); + expect(isTestFile('okhttp/src/commonTest/kotlin/okhttp3/CompressionInterceptorTest.kt')).toBe(true); + expect(isTestFile('app/src/androidTest/java/com/example/FooTest.kt')).toBe(true); + expect(isTestFile('module/src/integrationTest/kotlin/BarSpec.kt')).toBe(true); + }); + + it('flags Swift test files', () => { + expect(isTestFile('Tests/SessionTests.swift')).toBe(true); + expect(isTestFile('Sources/FooTest.swift')).toBe(true); + }); + + it('still flags the previously-supported conventions', () => { + expect(isTestFile('foo/test_bar.py')).toBe(true); + expect(isTestFile('pkg/bar_test.go')).toBe(true); + expect(isTestFile('src/foo.test.ts')).toBe(true); + expect(isTestFile('src/foo.spec.ts')).toBe(true); + expect(isTestFile('com/example/FooTest.java')).toBe(true); + expect(isTestFile('com/example/FooTestCase.java')).toBe(true); + expect(isTestFile('project/__tests__/foo.ts')).toBe(true); + expect(isTestFile('project/tests/foo.rb')).toBe(true); + }); + + it('does NOT flag production files that merely contain "test" lowercase', () => { + // The fix is capital-led so camelCase boundaries distinguish these. 
+ expect(isTestFile('src/latest/loader.kt')).toBe(false); + expect(isTestFile('lib/manifest.kt')).toBe(false); + expect(isTestFile('okhttp/src/jvmMain/kotlin/okhttp3/internal/connection/RealCall.kt')).toBe(false); + expect(isTestFile('src/contestEntry.ts')).toBe(false); + expect(isTestFile('pkg/greatest.go')).toBe(false); + }); + + it('does NOT flag ordinary production source', () => { + expect(isTestFile('src/flask/app.py')).toBe(false); + expect(isTestFile('src/vs/workbench/api/common/extensionHostMain.ts')).toBe(false); + expect(isTestFile('okhttp/src/commonJvmAndroid/kotlin/okhttp3/OkHttpClient.kt')).toBe(false); + }); +}); diff --git a/src/search/query-utils.ts b/src/search/query-utils.ts index 9a61acae..da0645f8 100644 --- a/src/search/query-utils.ts +++ b/src/search/query-utils.ts @@ -207,36 +207,43 @@ export function scorePathRelevance(filePath: string, query: string): number { */ export function isTestFile(filePath: string): boolean { const lower = filePath.toLowerCase(); - const fileName = path.basename(lower); - - // Common test file patterns - return ( - fileName.startsWith('test_') || - fileName.startsWith('test.') || - fileName.endsWith('.test.ts') || - fileName.endsWith('.test.js') || - fileName.endsWith('.test.tsx') || - fileName.endsWith('.test.jsx') || - fileName.endsWith('.spec.ts') || - fileName.endsWith('.spec.js') || - fileName.endsWith('_test.go') || - fileName.endsWith('_test.py') || - fileName.endsWith('_test.rs') || - fileName.endsWith('Tests.java') || - fileName.endsWith('Test.java') || - fileName.endsWith('Tester.java') || - fileName.endsWith('TestCase.java') || - lower.includes('/tests/') || - lower.includes('/test/') || - lower.includes('/__tests__/') || - lower.includes('/spec/') || - lower.includes('/testlib/') || + const fileName = path.basename(filePath); // original case — needed for camelCase boundaries + const lowerName = fileName.toLowerCase(); + + // --- Filename patterns --- + if ( + lowerName.startsWith('test_') || // 
python: test_foo.py + lowerName.startsWith('test.') || + // separator-delimited: foo_test.go, foo.test.ts, foo-spec.rb, bar_spec.py + /[._-](test|tests|spec|specs)\.[a-z0-9]+$/.test(lowerName) || + // CamelCase suffix (Java/Kotlin/Swift/C#/Scala): FooTest.kt, BarTests.swift, + // BazSpec.scala, QuxTestCase.java. Capital-led so "latest.kt"/"manifest.kt" + // (lowercase "test") are NOT matched. + /(?:Test|Tests|TestCase|Tester|Spec|Specs)\.[A-Za-z0-9]+$/.test(fileName) + ) { + return true; + } + + // --- Directory patterns --- + if ( + lower.includes('/tests/') || lower.includes('/test/') || + lower.includes('/__tests__/') || lower.includes('/spec/') || + lower.includes('/specs/') || lower.includes('/testlib/') || lower.includes('/testing/') || - // Non-production directories: examples, samples, benchmarks, fixtures, demos. - // Check both mid-path (/integration/) and start-of-path (integration/) since - // file paths may be stored as relative paths without a leading slash. - matchesNonProductionDir(lower) - ); + lower.startsWith('test/') || lower.startsWith('tests/') || + lower.startsWith('spec/') || lower.startsWith('specs/') || + // CamelCase test source-set dirs (Kotlin Multiplatform / Gradle / Xcode): + // jvmTest/, commonTest/, androidTest/, iosTest/, integrationTest/. Capital-led + // so "latest/" / "manifest/" are not matched. + /(?:^|\/)[A-Za-z0-9]*(?:Test|Tests|Spec)\//.test(filePath) + ) { + return true; + } + + // Non-production directories: examples, samples, benchmarks, fixtures, demos. + // Check both mid-path (/integration/) and start-of-path (integration/) since + // file paths may be stored as relative paths without a leading slash. 
+ return matchesNonProductionDir(lower); } /** From c3f1e273d4c5e7052c8a9ec6bd3109c042f3af8c Mon Sep 17 00:00:00 2001 From: Colby McHenry Date: Wed, 20 May 2026 18:41:42 -0500 Subject: [PATCH 11/47] release: 0.8.0 --- CHANGELOG.md | 15 ++++++++++++++- package-lock.json | 4 ++-- package.json | 2 +- 3 files changed, 17 insertions(+), 4 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index b661dfd5..321721ae 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,7 +7,7 @@ a [GitHub Release](https://github.com/colbymchenry/codegraph/releases) tagged This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## [Unreleased] +## [0.8.0] - 2026-05-20 ### Added - **Framework routes (NestJS)**: CodeGraph now recognises NestJS projects and @@ -91,6 +91,18 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). VS Code ~12%. Agent-trust floor still holds — the Relationships section, scored cluster selection, and structured-source output are all retained. Thanks to [@essopsp](https://github.com/essopsp) for the repro. +- **Search ranking (Kotlin / Swift / Scala / C#)**: test files in these + languages are now correctly de-prioritized in `codegraph_search`, + `codegraph_context`, and `codegraph affected`. Detection previously only + recognized `snake_case`/`.test.`-style names plus a handful of Java + suffixes, so CamelCase test files (`FooTest.kt`, `BarTests.swift`, + `BazSpec.scala`, `QuxTestCase.cs`) and Gradle / Kotlin-Multiplatform / + Xcode test source-set directories (`jvmTest/`, `commonTest/`, + `androidTest/`, `iosTest/`, `integrationTest/`) were treated as production + code and could outrank the real implementation. Detection now matches + capital-led `*Test` / `*Tests` / `*Spec` / `*TestCase` filenames and + source-set directories — deliberately capital-led so lowercase look-alikes + like `latest.kt` and `manifest.kt` are not misclassified. 
### Fixed - **MCP / explore**: `codegraph_explore` output is now hard-capped to its @@ -235,6 +247,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). returns `null` instead of resolving to an unrelated `rollback` in the same file. +[0.8.0]: https://github.com/colbymchenry/codegraph/releases/tag/v0.8.0 [0.7.10]: https://github.com/colbymchenry/codegraph/releases/tag/v0.7.10 ## [0.7.8] - 2026-05-17 diff --git a/package-lock.json b/package-lock.json index 1b4ce89d..44e4c829 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,12 +1,12 @@ { "name": "@colbymchenry/codegraph", - "version": "0.7.11", + "version": "0.8.0", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "@colbymchenry/codegraph", - "version": "0.7.11", + "version": "0.8.0", "license": "MIT", "dependencies": { "@clack/prompts": "^1.3.0", diff --git a/package.json b/package.json index 202e9a48..58f9f0ab 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "@colbymchenry/codegraph", - "version": "0.7.11", + "version": "0.8.0", "description": "Supercharge Claude Code with semantic code intelligence. 94% fewer tool calls • 77% faster exploration • 100% local.", "main": "dist/index.js", "types": "dist/index.d.ts", From 2fc0df71088a9eed5f64389c07ed01da18108958 Mon Sep 17 00:00:00 2001 From: Colby McHenry Date: Thu, 21 May 2026 08:27:12 -0500 Subject: [PATCH 12/47] chore: remove leftover debug scripts and stale docs Delete dead dev scratch (debug_python_ast*.js, test_python_inheritance.js), the obsolete tree-sitter-dart native patch (Dart loads via WASM now and the package is no longer a dependency), and orphaned docs superseded by the current code and the agent-eval skill (IMPLEMENTATION_PLAN.md, DELPHI-SUPPORT.md, run-interactive-test.md). 
Co-Authored-By: Claude Opus 4.7 (1M context) --- DELPHI-SUPPORT.md | 157 --- IMPLEMENTATION_PLAN.md | 1736 ----------------------------- debug_python_ast.js | 26 - debug_python_ast2.js | 26 - run-interactive-test.md | 131 --- scripts/patch-tree-sitter-dart.js | 112 -- test_python_inheritance.js | 35 - 7 files changed, 2223 deletions(-) delete mode 100644 DELPHI-SUPPORT.md delete mode 100644 IMPLEMENTATION_PLAN.md delete mode 100644 debug_python_ast.js delete mode 100644 debug_python_ast2.js delete mode 100644 run-interactive-test.md delete mode 100644 scripts/patch-tree-sitter-dart.js delete mode 100644 test_python_inheritance.js diff --git a/DELPHI-SUPPORT.md b/DELPHI-SUPPORT.md deleted file mode 100644 index 7d452451..00000000 --- a/DELPHI-SUPPORT.md +++ /dev/null @@ -1,157 +0,0 @@ -# Pascal / Delphi Support for CodeGraph - -## Why Delphi? - -Delphi (Object Pascal) remains one of the most widely used languages for Windows desktop and enterprise applications. With an estimated **1.5–3 million active developers** and a strong presence in industries like healthcare, finance, logistics, and government, Delphi projects often involve large, long-lived codebases that benefit significantly from semantic code intelligence. - -Many Delphi codebases have grown over decades — making structural understanding, impact analysis, and cross-file navigation exactly the kind of tooling gap CodeGraph is designed to fill. - -Adding Delphi support positions CodeGraph as a uniquely valuable tool for a community that has historically been underserved by modern static analysis and AI-assisted development tools. 
- -## What Was Implemented - -### Pascal / Object Pascal (tree-sitter) - -Full extraction support for `.pas`, `.dpr`, `.dpk`, and `.lpr` files using the `tree-sitter-pascal` grammar: - -| Feature | NodeKind | Details | -|---------|----------|---------| -| Units / Programs | `module` | `unit`, `program`, `package`, `library` | -| Classes | `class` | Including inheritance and interface implementation | -| Records | `class` | Treated as classes (consistent with AST structure) | -| Interfaces | `interface` | With GUID support | -| Methods | `method` | Constructor, destructor, procedures, functions | -| Functions / Procedures | `function` | Top-level (non-class) routines | -| Properties | `property` | With read/write accessors | -| Fields | `field` | Class and record fields | -| Constants | `constant` | `const` declarations | -| Enums | `enum` | With enum members | -| Type Aliases | `type_alias` | `type TFoo = ...` | -| Uses / Imports | `import` | `uses` clause extraction | -| Function Calls | — | `calls` edges for call graph | -| Visibility | — | `public`, `private`, `protected` on methods/fields | -| Static Methods | — | `class function` / `class procedure` | -| Containment | — | `contains` edges (class → method, unit → type, etc.) 
| -| Inheritance | — | `extends` / `implements` edges | - -### DFM / FMX Form Files (custom extractor) - -Support for Delphi form files (`.dfm` for VCL, `.fmx` for FireMonkey) using a regex-based custom extractor — no tree-sitter grammar exists for this format: - -| Feature | NodeKind / EdgeKind | Details | -|---------|---------------------|---------| -| Components | `component` | `object Button1: TButton` | -| Nested hierarchy | `contains` | Panel1 → Button1 | -| Event handlers | `references` (unresolved) | `OnClick = Button1Click` → links UI to Pascal methods | -| `inherited` keyword | `component` | Inherited form components | -| Multi-line properties | — | Correctly skipped during parsing | -| Item collections | — | `...` blocks correctly handled | - -The DFM ↔ PAS linkage via event handlers enables **cross-file impact analysis**: renaming a method in `.pas` immediately reveals which UI components reference it. - -## Architecture - -The implementation follows CodeGraph's established patterns: - -- **Pascal extraction** uses the standard `TreeSitterExtractor` with a Pascal-specific `LanguageExtractor` configuration and a `visitPascalNode()` hook for AST nodes that require special handling (e.g., `declType` wrappers, `defProc` implementation bodies) -- **DFM/FMX extraction** uses a `DfmExtractor` class — analogous to `LiquidExtractor` and `SvelteExtractor` — that parses the line-based format with regex -- **Routing** in `extractFromSource()` dispatches `.dfm`/`.fmx` files to `DfmExtractor` before reaching the tree-sitter path -- **`tree-sitter-pascal`** is declared as an `optionalDependency` (consistent with all other grammars), pinned to a specific commit for reproducible builds - -## Performance Improvements - -Testing with a large Delphi codebase (~3,400 files, ~244k nodes) uncovered performance bottlenecks in the reference resolution pipeline. 
The following fixes **benefit all languages**, not just Pascal: - -| Fix | Scope | Impact | -|-----|-------|--------| -| **Fuzzy match index** — replaced O(n) linear scan with lazily-built case-insensitive `Map` index | `name-matcher.ts` (all languages) | O(1) lookup per ref instead of iterating all nodes | -| **Import mapping cache** — cached per-file import mappings instead of re-reading/re-parsing for every ref | `import-resolver.ts` (all languages) | Eliminated redundant file I/O during resolution | -| **Kind cache** — pre-populated `getNodesByKind` results during warm-up | `resolution/index.ts` (all languages) | Avoided repeated DB queries for the same node kinds | -| **Pascal built-in filtering** — skip known RTL/VCL/FMX identifiers before resolution | `resolution/index.ts` (Pascal-specific) | ~60 built-in identifiers filtered out early | -| **Method index for `defProc`** — replaced O(n) `find()` with `Map` lookup when linking implementation bodies to declarations | `tree-sitter.ts` (Pascal-specific) | O(1) per implementation body | -| **Delphi-specific excludes** — `__history/**`, `__recovery/**`, `*.dcu` added to default excludes | `types.ts` (Pascal-specific) | Skips Delphi IDE temp files during indexing | - -**Result:** Reference resolution on a large Delphi project dropped from **~30 minutes to ~15 seconds** (120x speedup). The general improvements (fuzzy index, import cache, kind cache) will benefit all CodeGraph users. 
- -## Files Changed - -| File | Change | -|------|--------| -| `src/types.ts` | Added `'pascal'` to `Language` type, file patterns to `DEFAULT_CONFIG.include` | -| `src/extraction/grammars.ts` | Grammar loader, extension mappings (`.pas`, `.dpr`, `.dpk`, `.lpr`, `.dfm`, `.fmx`), display name | -| `src/extraction/tree-sitter.ts` | Pascal `LanguageExtractor`, `visitPascalNode()` with 7 helper methods, `DfmExtractor` class, routing in `extractFromSource()`, method index | -| `src/resolution/index.ts` | Pascal built-in filtering, kind cache, cache clearing | -| `src/resolution/import-resolver.ts` | Import mapping cache | -| `src/resolution/name-matcher.ts` | Fuzzy match index (case-insensitive `Map`) | -| `package.json` | `tree-sitter-pascal` in `optionalDependencies` (pinned commit) | -| `__tests__/extraction.test.ts` | 37 new tests covering all Pascal and DFM extraction features | - -## Test Results - -- **36 new tests**, all passing -- **0 regressions** — the same 28 pre-existing failures (unrelated: missing Swift/Dart grammars, database path issues, MCP truncation test) are unchanged -- Tests cover: language detection, modules, imports, classes, records, interfaces, methods, visibility, static methods, enums, properties, constants, type aliases, calls, containment, full fixture files (UAuth.pas, UTypes.pas, MainForm.dfm) - -## Dependency Note - -The npm package `tree-sitter-pascal@0.0.1` is outdated (uses NAN bindings, incompatible with Node.js v24+). The implementation uses the actively maintained GitHub repository ([Isopod/tree-sitter-pascal](https://github.com/Isopod/tree-sitter-pascal), v0.10.2) with a pinned commit hash for deterministic builds. This is consistent with how `@sengac/tree-sitter-dart` handles a similar situation. - -## Testing Instructions - -### Prerequisites - -- Node.js >= 18 -- npm -- Git - -### 1. 
Clone and build - -```bash -git clone -b delphi-support https://github.com/omonien/codegraph.git -cd codegraph -npm install -npm run build -``` - -### 2. Link globally - -```bash -npm link -``` - -Verify with: - -```bash -codegraph --version -``` - -### 3. Index a Delphi project - -```bash -cd /path/to/your/delphi-project -codegraph init -i -codegraph index -``` - -### 4. Query the code graph - -```bash -codegraph status # Show index statistics -codegraph query "TFormMain" # Search for a symbol -codegraph context "What does TCustomer do?" # Build AI context -``` - -### 5. Set up the MCP server (for Claude Code) - -```bash -codegraph install -``` - -This configures the MCP server, tool permissions, auto-sync hooks, and CLAUDE.md in one step. After that, start Claude Code in the project — CodeGraph tools will be available immediately. - -### 6. Clean up - -```bash -npm unlink -g @colbymchenry/codegraph # Remove global link -rm -rf /path/to/delphi-project/.codegraph # Remove project index -``` diff --git a/IMPLEMENTATION_PLAN.md b/IMPLEMENTATION_PLAN.md deleted file mode 100644 index 65d99d82..00000000 --- a/IMPLEMENTATION_PLAN.md +++ /dev/null @@ -1,1736 +0,0 @@ -# CodeGraph: Universal Code Knowledge Graph - -## Overview - -CodeGraph is a local-first code intelligence system that builds a semantic knowledge graph from any codebase. It provides structural understanding of code relationships—not just text similarity—enabling AI assistants to understand how code connects, what depends on what, and what breaks when something changes. - -**Type:** Headless library (no UI components — purely an API) -**Runtime:** Node.js (works standalone, in Electron, or any Node environment) -**Distribution:** npm package, installable in any project -**Per-Project Data:** `.codegraph/` directory in each indexed project -**Core Principle:** Deterministic extraction from AST, not AI-generated summaries - -### Use Cases - -1. 
**Beads Dashboard** — Integrated as a library to provide code intelligence -2. **Claude Code CLI users** — Install globally, run `codegraph init` in any project -3. **Any Node.js application** — Import as a library for code analysis -4. **MCP Server** — Expose as an MCP tool that Claude Code can query directly - ---- - -## Goals - -1. **Universal language support** via tree-sitter (PHP, Swift, Kotlin, Java, TypeScript, Python, Liquid, Ruby, Go, Rust, C#, etc.) -2. **Zero external API dependencies** for core functionality (local embeddings, local database) -3. **Portable per-project installation** — each project gets its own `.codegraph/` directory -4. **Incremental updates** via git hooks and hash-based change detection -5. **Rich structural queries** — callers, callees, impact radius, dependency chains -6. **Semantic search** — vector similarity to find entry points, then graph expansion - ---- - -## Architecture - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ CONSUMERS │ -│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │ -│ │ Beads │ │ Claude │ │ Any Node.js App │ │ -│ │ Dashboard │ │ Code CLI │ │ / MCP Server │ │ -│ │ (Electron) │ │ (Terminal) │ │ │ │ -│ └──────┬───────┘ └──────┬───────┘ └──────────┬───────────┘ │ -│ │ │ │ │ -│ └─────────────────┼──────────────────────┘ │ -│ │ │ -│ ▼ │ -├─────────────────────────────────────────────────────────────────┤ -│ CODEGRAPH LIBRARY │ -│ (npm package) │ -│ │ -│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │ -│ │ Context │ │ Query │ │ Sync │ │ -│ │ Builder │ │ Engine │ │ Manager │ │ -│ └──────┬──────┘ └──────┬──────┘ └──────────┬──────────────┘ │ -│ │ │ │ │ -│ └────────────────┼─────────────────────┘ │ -│ │ │ -│ ▼ │ -│ ┌─────────────────────────────────────────────────────────────┐│ -│ │ STORAGE LAYER ││ -│ │ SQLite + sqlite-vss (per project) ││ -│ │ .codegraph/graph.db ││ -│ └─────────────────────────────────────────────────────────────┘│ -│ ▲ │ -│ │ │ -│ 
┌─────────────────────────────────────────────────────────────┐│ -│ │ EXTRACTION LAYER ││ -│ │ ││ -│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ ││ -│ │ │ Tree-sitter │ │ Reference │ │ Framework │ ││ -│ │ │ Parser │ │ Resolver │ │ Patterns │ ││ -│ │ └─────────────┘ └─────────────┘ └─────────────────────┘ ││ -│ └─────────────────────────────────────────────────────────────┘│ -│ ▲ │ -│ │ │ -│ ┌─────────────────────────────────────────────────────────────┐│ -│ │ EMBEDDING LAYER ││ -│ │ Local ONNX Runtime + nomic-embed ││ -│ └─────────────────────────────────────────────────────────────┘│ -│ │ -└─────────────────────────────────────────────────────────────────┘ - -Per-Project Installation (created by codegraph init): -┌─────────────────────────────────────────────────────────────────┐ -│ my-laravel-app/ │ -│ ├── .codegraph/ │ -│ │ ├── graph.db # SQLite database with vectors │ -│ │ ├── config.json # Project-specific settings │ -│ │ └── .gitignore # Ignore db, keep config │ -│ ├── .git/ │ -│ │ └── hooks/ │ -│ │ └── post-commit # Triggers incremental reindex │ -│ ├── app/ │ -│ ├── routes/ │ -│ └── ... 
│ -└─────────────────────────────────────────────────────────────────┘ -``` - ---- - -## File Structure (npm package) - -``` -codegraph/ -├── package.json -├── tsconfig.json -├── README.md -│ -├── src/ -│ ├── index.ts # Main CodeGraph class, public API -│ ├── types.ts # TypeScript interfaces -│ │ -│ ├── db/ -│ │ ├── index.ts # Database initialization -│ │ ├── schema.sql # Table definitions -│ │ ├── migrations.ts # Schema versioning -│ │ └── queries.ts # Prepared statements -│ │ -│ ├── extraction/ -│ │ ├── index.ts # Extraction orchestrator -│ │ ├── tree-sitter.ts # Universal parser wrapper -│ │ ├── grammars.ts # Grammar loading and caching -│ │ └── queries/ # Tree-sitter query files (.scm) -│ │ ├── typescript.scm -│ │ ├── javascript.scm -│ │ ├── php.scm -│ │ ├── swift.scm -│ │ ├── kotlin.scm -│ │ ├── java.scm -│ │ ├── python.scm -│ │ ├── ruby.scm -│ │ ├── liquid.scm -│ │ ├── go.scm -│ │ └── csharp.scm -│ │ -│ ├── resolution/ -│ │ ├── index.ts # Reference resolver orchestrator -│ │ ├── name-matcher.ts # Symbol name matching -│ │ ├── import-resolver.ts # Import path resolution -│ │ └── frameworks/ # Framework-specific patterns -│ │ ├── index.ts -│ │ ├── laravel.ts -│ │ ├── express.ts -│ │ ├── nextjs.ts -│ │ ├── rails.ts -│ │ ├── shopify.ts -│ │ ├── spring.ts -│ │ └── swiftui.ts -│ │ -│ ├── graph/ -│ │ ├── index.ts # Graph query interface -│ │ ├── traversal.ts # BFS/DFS, impact radius -│ │ └── serialize.ts # Subgraph to context format -│ │ -│ ├── vectors/ -│ │ ├── index.ts # Vector operations interface -│ │ ├── embedder.ts # ONNX runtime + model -│ │ └── search.ts # Similarity search -│ │ -│ ├── sync/ -│ │ ├── index.ts # Sync orchestrator -│ │ ├── git-hooks.ts # Hook installation -│ │ └── hasher.ts # Content hashing for diffing -│ │ -│ └── context/ -│ ├── index.ts # Context builder -│ └── formatter.ts # Output formatting for Claude -│ -├── bin/ -│ └── codegraph.ts # CLI entry point (optional standalone usage) -│ -└── __tests__/ # Test files mirror src structure - ├── 
extraction/ - ├── resolution/ - ├── graph/ - └── fixtures/ # Sample code files for testing -``` - ---- - -## Database Schema - -**File: `src/db/schema.sql`** - -```sql --- ============================================================ --- CODEGRAPH SCHEMA v1 --- ============================================================ - --- Metadata table for schema versioning and project info -CREATE TABLE IF NOT EXISTS meta ( - key TEXT PRIMARY KEY, - value TEXT NOT NULL -); - --- ============================================================ --- NODES: Every significant code entity --- ============================================================ -CREATE TABLE IF NOT EXISTS nodes ( - id TEXT PRIMARY KEY, -- Unique ID: "func:src/auth.ts:validateToken:45" - kind TEXT NOT NULL, -- file, function, method, class, interface, type, variable, route, component, config - name TEXT NOT NULL, -- Human-readable: "validateToken" - qualified_name TEXT, -- Full path: "AuthService.validateToken" - file_path TEXT NOT NULL, -- Relative path: "src/services/auth.ts" - start_line INTEGER, - end_line INTEGER, - start_column INTEGER, - end_column INTEGER, - language TEXT NOT NULL, -- typescript, php, swift, etc. 
- signature TEXT, -- For functions: "(token: string) => Promise" - docstring TEXT, -- Extracted documentation - code_snippet TEXT, -- First ~500 chars of code for quick preview - code_hash TEXT NOT NULL, -- SHA256 of full code block - metadata TEXT, -- JSON: extra language/framework-specific data - created_at INTEGER NOT NULL, - updated_at INTEGER NOT NULL -); - --- ============================================================ --- EDGES: Relationships between nodes --- ============================================================ -CREATE TABLE IF NOT EXISTS edges ( - id INTEGER PRIMARY KEY AUTOINCREMENT, - source_id TEXT NOT NULL, - target_id TEXT NOT NULL, - kind TEXT NOT NULL, -- imports, calls, extends, implements, returns_type, throws, reads, writes, renders, instantiates - resolved INTEGER DEFAULT 0, -- 0 = unresolved (name only), 1 = resolved to actual node - target_name TEXT, -- Original name before resolution (for unresolved edges) - line_number INTEGER, -- Where this relationship occurs - metadata TEXT, -- JSON: additional context - UNIQUE(source_id, target_id, kind, line_number), - FOREIGN KEY (source_id) REFERENCES nodes(id) ON DELETE CASCADE - -- Note: target_id may reference non-existent node if unresolved/external -); - --- ============================================================ --- FILES: Track file-level state for incremental updates --- ============================================================ -CREATE TABLE IF NOT EXISTS files ( - path TEXT PRIMARY KEY, -- Relative file path - content_hash TEXT NOT NULL, -- SHA256 of file contents - language TEXT NOT NULL, - last_indexed INTEGER NOT NULL, -- Unix timestamp - node_count INTEGER DEFAULT 0, - error TEXT -- Last indexing error, if any -); - --- ============================================================ --- VECTOR EMBEDDINGS (sqlite-vss) --- ============================================================ - --- Virtual table for vector similarity search --- Dimension 384 for nomic-embed-text-v1.5 
-CREATE VIRTUAL TABLE IF NOT EXISTS node_vectors USING vss0( - embedding(384) -); - --- Map vector rowids to nodes -CREATE TABLE IF NOT EXISTS vector_map ( - rowid INTEGER PRIMARY KEY, - node_id TEXT NOT NULL UNIQUE, - text_hash TEXT NOT NULL, -- Hash of text that was embedded - FOREIGN KEY (node_id) REFERENCES nodes(id) ON DELETE CASCADE -); - --- ============================================================ --- INDEXES --- ============================================================ -CREATE INDEX IF NOT EXISTS idx_nodes_file ON nodes(file_path); -CREATE INDEX IF NOT EXISTS idx_nodes_kind ON nodes(kind); -CREATE INDEX IF NOT EXISTS idx_nodes_name ON nodes(name); -CREATE INDEX IF NOT EXISTS idx_nodes_language ON nodes(language); -CREATE INDEX IF NOT EXISTS idx_edges_source ON edges(source_id); -CREATE INDEX IF NOT EXISTS idx_edges_target ON edges(target_id); -CREATE INDEX IF NOT EXISTS idx_edges_kind ON edges(kind); -CREATE INDEX IF NOT EXISTS idx_edges_resolved ON edges(resolved); -``` - ---- - -## Type Definitions - -**File: `src/types.ts`** - -```typescript -// ============================================================ -// CORE TYPES -// ============================================================ - -export type NodeKind = - | 'file' - | 'function' - | 'method' - | 'class' - | 'interface' - | 'type' - | 'variable' - | 'constant' - | 'route' - | 'component' - | 'config' - | 'module' - | 'namespace'; - -export type EdgeKind = - | 'imports' - | 'exports' - | 'calls' - | 'called_by' // Reverse of calls, computed - | 'extends' - | 'implements' - | 'returns_type' - | 'throws' - | 'reads' - | 'writes' - | 'renders' // React/Vue component rendering - | 'instantiates' - | 'decorates' // Decorators/attributes - | 'depends_on'; // Generic dependency - -export type Language = - | 'typescript' - | 'javascript' - | 'php' - | 'swift' - | 'kotlin' - | 'java' - | 'python' - | 'ruby' - | 'go' - | 'rust' - | 'csharp' - | 'liquid' - | 'vue' - | 'svelte'; - -export interface Node { 
- id: string; - kind: NodeKind; - name: string; - qualifiedName?: string; - filePath: string; - startLine?: number; - endLine?: number; - startColumn?: number; - endColumn?: number; - language: Language; - signature?: string; - docstring?: string; - codeSnippet?: string; - codeHash: string; - metadata?: Record; - createdAt: number; - updatedAt: number; -} - -export interface Edge { - id?: number; - sourceId: string; - targetId: string; - kind: EdgeKind; - resolved: boolean; - targetName?: string; - lineNumber?: number; - metadata?: Record; -} - -export interface FileRecord { - path: string; - contentHash: string; - language: Language; - lastIndexed: number; - nodeCount: number; - error?: string; -} - -// ============================================================ -// EXTRACTION TYPES -// ============================================================ - -export interface ExtractionResult { - nodes: Node[]; - edges: Edge[]; - errors: ExtractionError[]; -} - -export interface ExtractionError { - filePath: string; - line?: number; - message: string; - recoverable: boolean; -} - -export interface UnresolvedReference { - sourceId: string; - targetName: string; - kind: EdgeKind; - lineNumber?: number; - context?: string; // Surrounding code for better resolution -} - -// ============================================================ -// QUERY TYPES -// ============================================================ - -export interface Subgraph { - nodes: Node[]; - edges: Edge[]; - entryPoints: string[]; // Node IDs that initiated the query - stats: { - totalNodes: number; - totalEdges: number; - maxDepth: number; - }; -} - -export interface TraversalOptions { - maxDepth?: number; // Default: 2 - maxNodes?: number; // Default: 50 - edgeKinds?: EdgeKind[]; // Filter by edge type - nodeKinds?: NodeKind[]; // Filter by node type - direction?: 'outbound' | 'inbound' | 'both'; -} - -export interface SearchOptions { - limit?: number; // Default: 10 - nodeKinds?: NodeKind[]; // Filter 
results - minScore?: number; // Similarity threshold -} - -export interface SearchResult { - node: Node; - score: number; -} - -// ============================================================ -// CONTEXT TYPES -// ============================================================ - -export interface Context { - subgraph: Subgraph; - codeBlocks: CodeBlock[]; - summary: string; - relatedFiles: string[]; -} - -export interface CodeBlock { - nodeId: string; - nodeName: string; - nodeKind: NodeKind; - filePath: string; - startLine: number; - endLine: number; - code: string; - language: Language; -} - -// ============================================================ -// CONFIG TYPES -// ============================================================ - -export interface CodeGraphConfig { - version: number; - projectName?: string; - languages: Language[]; - exclude: string[]; // Glob patterns to ignore - include?: string[]; // Override: only index these - frameworks: FrameworkHint[]; // Help with resolution - embeddingModel: 'nomic-embed-text-v1.5' | 'all-MiniLM-L6-v2'; - chunkStrategy: 'ast' | 'hybrid'; - maxFileSize: number; // Skip files larger than this (bytes) - gitHooksEnabled: boolean; -} - -export type FrameworkHint = - | 'laravel' - | 'express' - | 'nextjs' - | 'nuxt' - | 'rails' - | 'django' - | 'flask' - | 'spring' - | 'swiftui' - | 'uikit' - | 'android' - | 'shopify' - | 'react' - | 'vue' - | 'svelte'; - -export const DEFAULT_CONFIG: CodeGraphConfig = { - version: 1, - languages: [], - exclude: [ - 'node_modules/**', - 'vendor/**', - '.git/**', - 'dist/**', - 'build/**', - '*.min.js', - '*.bundle.js', - '__pycache__/**', - '.venv/**', - 'Pods/**', - '.gradle/**', - ], - frameworks: [], - embeddingModel: 'nomic-embed-text-v1.5', - chunkStrategy: 'ast', - maxFileSize: 1024 * 1024, // 1MB - gitHooksEnabled: true, -}; -``` - ---- - -## Public API - -**File: `src/index.ts`** - -```typescript -export class CodeGraph { - // 
============================================================ - // LIFECYCLE - // ============================================================ - - /** - * Initialize CodeGraph for a project directory. - * Creates .codegraph/ if it doesn't exist. - */ - static async init(projectPath: string, config?: Partial): Promise; - - /** - * Open existing CodeGraph for a project. - * Throws if not initialized. - */ - static async open(projectPath: string): Promise; - - /** - * Check if a project has CodeGraph initialized. - */ - static async isInitialized(projectPath: string): Promise; - - /** - * Close database connections and cleanup. - */ - async close(): Promise; - - // ============================================================ - // INDEXING - // ============================================================ - - /** - * Full index of the entire project. - * Use for initial setup or complete rebuild. - */ - async indexAll(options?: { - onProgress?: (progress: IndexProgress) => void; - signal?: AbortSignal; - }): Promise; - - /** - * Index specific files only. - * Use for incremental updates. - */ - async indexFiles(filePaths: string[]): Promise; - - /** - * Sync with current file state. - * Detects changes via content hashing, reindexes only changed files. - */ - async sync(): Promise; - - /** - * Get current index status. - */ - async getStatus(): Promise; - - // ============================================================ - // GRAPH QUERIES - // ============================================================ - - /** - * Get a node by ID. - */ - async getNode(nodeId: string): Promise; - - /** - * Find nodes by name (exact or fuzzy). - */ - async findNodes(query: string, options?: { - fuzzy?: boolean; - kinds?: NodeKind[]; - limit?: number; - }): Promise; - - /** - * Get all edges from/to a node. - */ - async getEdges(nodeId: string, direction?: 'outbound' | 'inbound' | 'both'): Promise; - - /** - * Get nodes that call this node. 
- */ - async getCallers(nodeId: string): Promise; - - /** - * Get nodes that this node calls. - */ - async getCallees(nodeId: string): Promise; - - /** - * Get nodes that this node depends on. - */ - async getDependencies(nodeId: string): Promise; - - /** - * Get nodes that depend on this node. - */ - async getDependents(nodeId: string): Promise; - - /** - * Traverse the graph from starting nodes. - * Returns a subgraph of connected nodes up to maxDepth. - */ - async traverse(startNodeIds: string[], options?: TraversalOptions): Promise; - - /** - * Get impact radius: what could be affected by changing this node. - */ - async getImpactRadius(nodeId: string, options?: TraversalOptions): Promise; - - /** - * Find paths between two nodes. - */ - async findPaths(fromId: string, toId: string, options?: { - maxDepth?: number; - maxPaths?: number; - }): Promise; - - // ============================================================ - // SEMANTIC SEARCH - // ============================================================ - - /** - * Search for nodes by semantic similarity. - */ - async search(query: string, options?: SearchOptions): Promise; - - /** - * Find relevant subgraph for a natural language query. - * Combines semantic search with graph traversal. - */ - async findRelevantContext(query: string, options?: { - searchLimit?: number; - traversalDepth?: number; - maxNodes?: number; - }): Promise; - - // ============================================================ - // CONTEXT BUILDING - // ============================================================ - - /** - * Build context for a task/issue. - * Returns structured context ready to inject into Claude. - */ - async buildContext(input: string | { title: string; description?: string }, options?: { - maxNodes?: number; - includeCode?: boolean; - format?: 'markdown' | 'json'; - }): Promise; - - /** - * Get the full code for a node. 
- */ - async getCode(nodeId: string): Promise; - - // ============================================================ - // GIT INTEGRATION - // ============================================================ - - /** - * Install git hooks for automatic incremental indexing. - */ - async installGitHooks(): Promise; - - /** - * Remove git hooks. - */ - async removeGitHooks(): Promise; - - /** - * Get files changed since last index. - */ - async getChangedFiles(): Promise; - - // ============================================================ - // UTILITIES - // ============================================================ - - /** - * Get statistics about the indexed codebase. - */ - async getStats(): Promise; - - /** - * Export the graph to JSON. - */ - async export(): Promise; - - /** - * Update configuration. - */ - async updateConfig(config: Partial): Promise; - - /** - * Get current configuration. - */ - getConfig(): CodeGraphConfig; -} - -// ============================================================ -// RESULT TYPES -// ============================================================ - -export interface IndexProgress { - phase: 'scanning' | 'parsing' | 'resolving' | 'embedding'; - current: number; - total: number; - currentFile?: string; -} - -export interface IndexResult { - success: boolean; - filesIndexed: number; - nodesCreated: number; - edgesCreated: number; - errors: ExtractionError[]; - duration: number; -} - -export interface SyncResult { - filesChecked: number; - filesChanged: number; - filesAdded: number; - filesRemoved: number; - nodesUpdated: number; - duration: number; -} - -export interface IndexStatus { - initialized: boolean; - lastIndexed?: number; - totalFiles: number; - totalNodes: number; - totalEdges: number; - languages: Language[]; - unresolvedReferences: number; -} - -export interface GraphStats { - files: number; - nodes: { - total: number; - byKind: Record; - byLanguage: Record; - }; - edges: { - total: number; - byKind: Record; - resolved: number; 
- unresolved: number; - }; - vectors: number; -} - -export interface Path { - nodes: Node[]; - edges: Edge[]; - length: number; -} - -export interface ExportedGraph { - version: number; - exportedAt: number; - config: CodeGraphConfig; - stats: GraphStats; - nodes: Node[]; - edges: Edge[]; -} -``` - ---- - -## Tree-sitter Extraction Queries - -These `.scm` files define what to extract from each language. - -**File: `src/extraction/queries/typescript.scm`** - -```scheme -; ============================================================ -; TYPESCRIPT/JAVASCRIPT EXTRACTION QUERIES -; ============================================================ - -; Functions -(function_declaration - name: (identifier) @function.name - parameters: (formal_parameters) @function.params - return_type: (type_annotation)? @function.return_type - body: (statement_block) @function.body -) @function.definition - -; Arrow functions assigned to variables -(lexical_declaration - (variable_declarator - name: (identifier) @function.name - value: (arrow_function - parameters: (formal_parameters) @function.params - return_type: (type_annotation)? @function.return_type - body: (_) @function.body - ) - ) -) @function.definition - -; Classes -(class_declaration - name: (type_identifier) @class.name - (class_heritage - (extends_clause - value: (identifier) @class.extends - )? - (implements_clause - (type_identifier) @class.implements - )* - )? - body: (class_body) @class.body -) @class.definition - -; Methods -(method_definition - name: (property_identifier) @method.name - parameters: (formal_parameters) @method.params - return_type: (type_annotation)? @method.return_type - body: (statement_block) @method.body -) @method.definition - -; Interfaces -(interface_declaration - name: (type_identifier) @interface.name - (extends_type_clause - (type_identifier) @interface.extends - )? 
- body: (interface_body) @interface.body -) @interface.definition - -; Type aliases -(type_alias_declaration - name: (type_identifier) @type.name - value: (_) @type.value -) @type.definition - -; Imports -(import_statement - (import_clause - (identifier)? @import.default - (named_imports - (import_specifier - name: (identifier) @import.named - alias: (identifier)? @import.alias - )* - )? - )? - source: (string) @import.source -) @import.statement - -; Exports -(export_statement - (export_clause - (export_specifier - name: (identifier) @export.name - )* - )? - declaration: (_)? @export.declaration -) @export.statement - -; Function calls -(call_expression - function: [ - (identifier) @call.function - (member_expression - object: (_) @call.object - property: (property_identifier) @call.method - ) - ] - arguments: (arguments) @call.args -) @call.expression - -; Variable declarations (const/let with significant values) -(lexical_declaration - (variable_declarator - name: (identifier) @variable.name - value: (_) @variable.value - ) -) @variable.declaration - -; JSDoc comments -(comment) @comment -``` - -**File: `src/extraction/queries/php.scm`** - -```scheme -; ============================================================ -; PHP EXTRACTION QUERIES -; ============================================================ - -; Classes -(class_declaration - name: (name) @class.name - (base_clause - (name) @class.extends - )? - (class_interface_clause - (name) @class.implements - )* - body: (declaration_list) @class.body -) @class.definition - -; Methods -(method_declaration - (visibility_modifier)? @method.visibility - name: (name) @method.name - parameters: (formal_parameters) @method.params - return_type: (return_type)? @method.return_type - body: (compound_statement) @method.body -) @method.definition - -; Functions -(function_definition - name: (name) @function.name - parameters: (formal_parameters) @function.params - return_type: (return_type)? 
@function.return_type - body: (compound_statement) @function.body -) @function.definition - -; Interfaces -(interface_declaration - name: (name) @interface.name - (base_clause - (name) @interface.extends - )? - body: (declaration_list) @interface.body -) @interface.definition - -; Traits -(trait_declaration - name: (name) @trait.name - body: (declaration_list) @trait.body -) @trait.definition - -; Use statements (imports) -(namespace_use_declaration - (namespace_use_clause - (qualified_name) @import.name - (namespace_aliasing_clause - (name) @import.alias - )? - ) -) @import.statement - -; Static method calls (e.g., User::find()) -(scoped_call_expression - scope: (name) @call.class - name: (name) @call.method - arguments: (arguments) @call.args -) @call.static - -; Instance method calls -(member_call_expression - object: (_) @call.object - name: (name) @call.method - arguments: (arguments) @call.args -) @call.instance - -; Function calls -(function_call_expression - function: (name) @call.function - arguments: (arguments) @call.args -) @call.expression - -; Route definitions (Laravel-specific pattern) -(member_call_expression - object: (name) @_route (#eq? @_route "Route") - name: (name) @route.method - arguments: (arguments - (argument - (string) @route.path - ) - ) -) @route.definition - -; PHPDoc comments -(comment) @comment -``` - -**File: `src/extraction/queries/swift.scm`** - -```scheme -; ============================================================ -; SWIFT EXTRACTION QUERIES -; ============================================================ - -; Classes -(class_declaration - name: (type_identifier) @class.name - (type_inheritance_clause - (type_identifier) @class.inherits - )? - body: (class_body) @class.body -) @class.definition - -; Structs -(struct_declaration - name: (type_identifier) @struct.name - (type_inheritance_clause - (type_identifier) @struct.conforms - )? 
- body: (struct_body) @struct.body -) @struct.definition - -; Protocols -(protocol_declaration - name: (type_identifier) @protocol.name - body: (protocol_body) @protocol.body -) @protocol.definition - -; Functions -(function_declaration - name: (simple_identifier) @function.name - (parameter_clause) @function.params - (function_result - (type_annotation) @function.return_type - )? - body: (function_body) @function.body -) @function.definition - -; Methods (inside class/struct) -(function_declaration - name: (simple_identifier) @method.name - (parameter_clause) @method.params - body: (function_body) @method.body -) @method.definition - -; Properties -(property_declaration - (pattern - (simple_identifier) @property.name - ) - (type_annotation)? @property.type -) @property.definition - -; Imports -(import_declaration - (identifier) @import.module -) @import.statement - -; Function calls -(call_expression - (simple_identifier) @call.function - (call_suffix - (value_arguments) @call.args - ) -) @call.expression - -; Method calls -(call_expression - (navigation_expression - (_) @call.object - (navigation_suffix - (simple_identifier) @call.method - ) - ) - (call_suffix - (value_arguments) @call.args - ) -) @call.method - -; SwiftUI View bodies -(computed_property - name: (simple_identifier) @_body (#eq? @_body "body") - (type_annotation - (user_type - (type_identifier) @_view (#match? @_view "View") - ) - )? 
- getter: (_) @view.body -) @view.definition - -; Documentation comments -(comment) @comment -(multiline_comment) @comment.multiline -``` - ---- - -## Framework Pattern Resolvers - -**File: `src/resolution/frameworks/laravel.ts`** - -```typescript -import { FrameworkResolver, UnresolvedReference, ResolvedReference } from '../types'; - -export const laravelResolver: FrameworkResolver = { - name: 'laravel', - - // Detect if this is a Laravel project - detect: async (projectPath: string): Promise => { - return await fileExists(join(projectPath, 'artisan')); - }, - - patterns: [ - // Eloquent Model static calls: User::find(), Post::where() - { - pattern: /^([A-Z][a-zA-Z]+)::(\w+)$/, - resolve: async (match, context) => { - const [, className, methodName] = match; - - // Check app/Models first (Laravel 8+) - let modelPath = `app/Models/${className}.php`; - if (await context.fileExists(modelPath)) { - return { filePath: modelPath, className, methodName }; - } - - // Fall back to app/ (Laravel 7 and below) - modelPath = `app/${className}.php`; - if (await context.fileExists(modelPath)) { - return { filePath: modelPath, className, methodName }; - } - - return null; - } - }, - - // Facade calls: Auth::user(), Cache::get() - { - pattern: /^(Auth|Cache|DB|Log|Mail|Queue|Session|Storage|Validator)::(\w+)$/, - resolve: async (match, context) => { - const [, facade, method] = match; - // Facades resolve to underlying service - we can link to the facade for now - return { - filePath: `vendor/laravel/framework/src/Illuminate/Support/Facades/${facade}.php`, - className: facade, - methodName: method, - isExternal: true - }; - } - }, - - // Route helpers: route('checkout.store') - { - pattern: /route\(['"]([^'"]+)['"]\)/, - resolve: async (match, context) => { - const [, routeName] = match; - // Search routes/web.php and routes/api.php for ->name('routeName') - const routeFiles = ['routes/web.php', 'routes/api.php']; - for (const file of routeFiles) { - const content = await 
context.readFile(file); - if (content?.includes(`name('${routeName}')`)) { - return { filePath: file, routeName }; - } - } - return null; - } - }, - - // View helpers: view('checkout.form') - { - pattern: /view\(['"]([^'"]+)['"]\)/, - resolve: async (match, context) => { - const [, viewName] = match; - const viewPath = viewName.replace(/\./g, '/'); - - // Check both .blade.php and .php - const candidates = [ - `resources/views/${viewPath}.blade.php`, - `resources/views/${viewPath}.php` - ]; - - for (const candidate of candidates) { - if (await context.fileExists(candidate)) { - return { filePath: candidate, viewName }; - } - } - return null; - } - }, - - // Controller references in routes - { - pattern: /\[([A-Z][a-zA-Z]+Controller)::class,\s*['"](\w+)['"]\]/, - resolve: async (match, context) => { - const [, controller, method] = match; - const controllerPath = `app/Http/Controllers/${controller}.php`; - if (await context.fileExists(controllerPath)) { - return { filePath: controllerPath, className: controller, methodName: method }; - } - return null; - } - } - ], - - // Additional node detection specific to Laravel - extractNodes: async (filePath: string, content: string) => { - const nodes: Node[] = []; - - // Detect route definitions - const routePattern = /Route::(get|post|put|patch|delete)\(\s*['"]([^'"]+)['"]/g; - let match; - while ((match = routePattern.exec(content)) !== null) { - const [, method, path] = match; - const line = content.slice(0, match.index).split('\n').length; - nodes.push({ - id: `route:${filePath}:${method.toUpperCase()}:${path}`, - kind: 'route', - name: `${method.toUpperCase()} ${path}`, - filePath, - startLine: line, - language: 'php', - metadata: { httpMethod: method.toUpperCase(), path } - }); - } - - return nodes; - } -}; -``` - -**File: `src/resolution/frameworks/shopify.ts`** - -```typescript -import { FrameworkResolver } from '../types'; - -export const shopifyResolver: FrameworkResolver = { - name: 'shopify', - - detect: async 
(projectPath: string): Promise => { - return await fileExists(join(projectPath, 'shopify.theme.toml')) || - await fileExists(join(projectPath, 'config/settings_schema.json')); - }, - - patterns: [ - // Render tags: {% render 'product-card' %} - { - pattern: /\{%\s*render\s+['"]([^'"]+)['"]/, - resolve: async (match, context) => { - const [, snippetName] = match; - const snippetPath = `snippets/${snippetName}.liquid`; - if (await context.fileExists(snippetPath)) { - return { filePath: snippetPath, kind: 'renders' }; - } - return null; - } - }, - - // Include tags: {% include 'header' %} - { - pattern: /\{%\s*include\s+['"]([^'"]+)['"]/, - resolve: async (match, context) => { - const [, snippetName] = match; - const snippetPath = `snippets/${snippetName}.liquid`; - if (await context.fileExists(snippetPath)) { - return { filePath: snippetPath, kind: 'includes' }; - } - return null; - } - }, - - // Section tags: {% section 'header' %} - { - pattern: /\{%\s*section\s+['"]([^'"]+)['"]/, - resolve: async (match, context) => { - const [, sectionName] = match; - const sectionPath = `sections/${sectionName}.liquid`; - if (await context.fileExists(sectionPath)) { - return { filePath: sectionPath, kind: 'renders' }; - } - return null; - } - }, - - // Asset URLs: {{ 'style.css' | asset_url }} - { - pattern: /['"]([\w\-\.]+)['"]\s*\|\s*asset_url/, - resolve: async (match, context) => { - const [, assetName] = match; - const assetPath = `assets/${assetName}`; - if (await context.fileExists(assetPath)) { - return { filePath: assetPath, kind: 'references' }; - } - return null; - } - } - ], - - extractNodes: async (filePath: string, content: string) => { - const nodes: Node[] = []; - - // Detect schema in sections - const schemaMatch = content.match(/\{%\s*schema\s*%\}([\s\S]*?)\{%\s*endschema\s*%\}/); - if (schemaMatch) { - try { - const schema = JSON.parse(schemaMatch[1]); - if (schema.name) { - nodes.push({ - id: `section:${filePath}`, - kind: 'component', - name: schema.name, - 
filePath, - language: 'liquid', - metadata: { - schemaSettings: schema.settings?.map(s => s.id), - schemaBlocks: schema.blocks?.map(b => b.type) - } - }); - } - } catch (e) { - // Invalid JSON in schema - } - } - - return nodes; - } -}; -``` - ---- - -## Context Builder Output Format - -**File: `src/context/formatter.ts`** - -```typescript -export function formatContextAsMarkdown(context: Context): string { - const lines: string[] = []; - - lines.push('## Code Context\n'); - - // Graph structure section - lines.push('### Structure\n'); - lines.push('```'); - for (const nodeId of context.subgraph.entryPoints) { - const node = context.subgraph.nodes.find(n => n.id === nodeId); - if (node) { - lines.push(formatNodeTree(node, context.subgraph, 0)); - } - } - lines.push('```\n'); - - // Code blocks section - if (context.codeBlocks.length > 0) { - lines.push('### Code\n'); - for (const block of context.codeBlocks) { - lines.push(`#### ${block.nodeName} (${block.filePath}:${block.startLine})\n`); - lines.push('```' + block.language); - lines.push(block.code); - lines.push('```\n'); - } - } - - // Related files section - if (context.relatedFiles.length > 0) { - lines.push('### Related Files\n'); - for (const file of context.relatedFiles) { - lines.push(`- ${file}`); - } - } - - return lines.join('\n'); -} - -function formatNodeTree(node: Node, subgraph: Subgraph, depth: number): string { - const indent = ' '.repeat(depth); - const lines: string[] = []; - - // Node header - const location = node.startLine ? 
`:${node.startLine}` : ''; - lines.push(`${indent}${node.name} (${node.filePath}${location})`); - - // Outbound edges - const outbound = subgraph.edges.filter(e => e.sourceId === node.id); - for (const edge of outbound) { - const target = subgraph.nodes.find(n => n.id === edge.targetId); - const targetName = target?.name || edge.targetName || 'unknown'; - lines.push(`${indent}├── ${edge.kind} → ${targetName}`); - } - - return lines.join('\n'); -} - -// Example output: -// -// ## Code Context -// -// ### Structure -// ``` -// CheckoutController (app/Http/Controllers/CheckoutController.php:15) -// ├── calls → CartService.getCart -// ├── calls → PaymentService.processPayment -// ├── calls → OrderService.create -// ├── throws → PaymentException -// -// PaymentService (app/Services/PaymentService.php:8) -// ├── calls → StripeClient.charge -// ├── calls → TransactionRepository.save -// ├── throws → PaymentException -// ├── throws → StripeTimeoutException -// ``` -// -// ### Code -// -// #### store (app/Http/Controllers/CheckoutController.php:45) -// ```php -// public function store(Request $request) -// { -// $cart = $this->cartService->getCart($request->user()); -// $payment = $this->paymentService->processPayment($cart); -// ... 
-// } -// ``` -``` - ---- - -## Installation & Integration - -**How to use CodeGraph (headless library, no UI):** - -### Option 1: CLI (for any project, no code required) - -```bash -# Install globally -npm install -g codegraph - -# Initialize in any project -cd /path/to/my-laravel-app -codegraph init - -# Index the codebase -codegraph index - -# Query the graph -codegraph query "what calls PaymentService" -codegraph impact "app/Services/AuthService.php" - -# Build context for a task (outputs markdown) -codegraph context "Fix checkout silent failure" - -# Check status -codegraph status - -# Sync after changes -codegraph sync -``` - -### Option 2: Library (for integration into apps like Beads Dashboard) - -```typescript -import { CodeGraph } from 'codegraph'; - -// Initialize for a project -const graph = await CodeGraph.init('/path/to/project'); - -// Full index with optional progress callback -await graph.indexAll({ - onProgress: (progress) => { - console.log(`${progress.phase}: ${progress.current}/${progress.total}`); - } -}); - -// Or open existing and sync -const graph = await CodeGraph.open('/path/to/project'); -const syncResult = await graph.sync(); - -// Build context for a task (returns structured data) -const context = await graph.buildContext('Fix checkout silent failure'); - -// Query the graph directly -const callers = await graph.getCallers('func:src/payment.ts:processPayment:45'); -const impact = await graph.getImpactRadius('class:AuthService', { maxDepth: 2 }); - -// Search semantically -const results = await graph.search('authentication middleware'); - -// Clean up -await graph.close(); -``` - -### Option 3: MCP Server (for Claude Code CLI integration) - -```bash -# Run as MCP server (Claude Code can query directly) -codegraph serve --mcp - -# In Claude Code's MCP config, add: -# { -# "codegraph": { -# "command": "codegraph", -# "args": ["serve", "--mcp", "--project", "/path/to/project"] -# } -# } -``` - -Then Claude Code can use tools like: -- 
`codegraph_search` — semantic search -- `codegraph_context` — build context for a task -- `codegraph_callers` — who calls this function -- `codegraph_impact` — what's affected if I change this - -**What gets created in the project:** - -``` -my-project/ -├── .codegraph/ -│ ├── graph.db # SQLite database (gitignored) -│ ├── config.json # User can customize (committed) -│ └── .gitignore # Contains: graph.db -└── .git/ - └── hooks/ - └── post-commit # Auto-installed hook -``` - -**Default `.codegraph/config.json`:** - -```json -{ - "version": 1, - "exclude": [ - "node_modules/**", - "vendor/**", - "dist/**", - "build/**" - ], - "frameworks": ["laravel"], - "gitHooksEnabled": true -} -``` - ---- - -## Implementation Phases - -### Phase 1: Foundation (Week 1) -- [ ] Project structure setup (npm package) -- [ ] SQLite database initialization with schema -- [ ] Basic types and interfaces -- [ ] Config file handling -- [ ] .codegraph/ directory management - -### Phase 2: Tree-sitter Extraction (Week 1-2) -- [ ] Tree-sitter native bindings setup (works in Node.js, Electron, etc.) 
-- [ ] Grammar loading system -- [ ] TypeScript/JavaScript extraction queries -- [ ] PHP extraction queries -- [ ] Basic node/edge extraction from AST - -### Phase 3: Reference Resolution (Week 2) -- [ ] Name-based symbol matching -- [ ] Import path resolution -- [ ] Laravel framework patterns -- [ ] Express/Next.js patterns -- [ ] Unresolved reference tracking - -### Phase 4: Graph Queries (Week 2-3) -- [ ] Basic traversal (callers, callees) -- [ ] Impact radius calculation -- [ ] Path finding between nodes -- [ ] Subgraph extraction - -### Phase 5: Vector Embeddings (Week 3) -- [ ] ONNX runtime integration -- [ ] nomic-embed-text model loading -- [ ] sqlite-vss setup -- [ ] Embedding generation for nodes -- [ ] Similarity search - -### Phase 6: Context Builder (Week 3-4) -- [ ] Semantic search → graph expansion pipeline -- [ ] Context formatting for Claude -- [ ] Code snippet extraction -- [ ] Output size management - -### Phase 7: Sync & Freshness (Week 4) -- [ ] Content hashing for change detection -- [ ] Incremental reindexing -- [ ] Git hook installation -- [ ] Post-commit handler - -### Phase 8: Additional Languages (Week 4+) -- [ ] Swift extraction queries -- [ ] Kotlin extraction queries -- [ ] Java extraction queries -- [ ] Liquid/Shopify patterns -- [ ] Ruby/Rails patterns - -### Phase 9: Polish & Hardening (Week 5) -- [ ] Error handling and recovery -- [ ] Performance optimization -- [ ] Memory management for large codebases -- [ ] Concurrent indexing safety -- [ ] API documentation and JSDoc comments - -### Phase 10: CLI (Week 5-6, Optional) -- [ ] CLI argument parsing (commander or yargs) -- [ ] `codegraph init` command -- [ ] `codegraph index` command -- [ ] `codegraph query` command -- [ ] `codegraph context` command -- [ ] `codegraph status` command -- [ ] `codegraph sync` command - -### Phase 11: MCP Server (Week 6, Optional) -- [ ] MCP protocol implementation -- [ ] `codegraph_search` tool -- [ ] `codegraph_context` tool -- [ ] 
`codegraph_callers` / `codegraph_callees` tools -- [ ] `codegraph_impact` tool -- [ ] Stdio transport for Claude Code integration - ---- - -## Testing Strategy - -```typescript -// Example test structure - -describe('CodeGraph', () => { - describe('extraction', () => { - it('extracts functions from TypeScript', async () => { - const code = ` - export function processPayment(amount: number): Promise { - return stripe.charge(amount); - } - `; - const result = await extract(code, 'typescript'); - - expect(result.nodes).toContainEqual(expect.objectContaining({ - kind: 'function', - name: 'processPayment', - signature: '(amount: number): Promise' - })); - - expect(result.edges).toContainEqual(expect.objectContaining({ - kind: 'calls', - targetName: 'stripe.charge' - })); - }); - - it('extracts Laravel routes from PHP', async () => { - const code = ` - Route::post('/checkout', [CheckoutController::class, 'store'])->name('checkout.store'); - `; - const result = await extract(code, 'php'); - - expect(result.nodes).toContainEqual(expect.objectContaining({ - kind: 'route', - name: 'POST /checkout' - })); - }); - }); - - describe('resolution', () => { - it('resolves Laravel model calls', async () => { - const graph = await createTestGraph({ - 'app/Models/User.php': 'class User extends Model { public static function find($id) {} }', - 'app/Http/Controllers/UserController.php': 'User::find($id);' - }); - - const edges = await graph.getEdges('controller:UserController:show'); - expect(edges).toContainEqual(expect.objectContaining({ - kind: 'calls', - targetId: 'method:app/Models/User.php:find', - resolved: true - })); - }); - }); - - describe('traversal', () => { - it('finds impact radius', async () => { - const graph = await createTestGraph(/* ... 
*/); - const subgraph = await graph.getImpactRadius('class:PaymentService', { maxDepth: 2 }); - - expect(subgraph.nodes.map(n => n.name)).toContain('CheckoutController'); - expect(subgraph.nodes.map(n => n.name)).toContain('OrderService'); - }); - }); -}); -``` - ---- - -## Open Questions / Decisions Needed - -1. **Embedding model size vs quality**: nomic-embed-text-v1.5 (275MB) vs all-MiniLM-L6-v2 (90MB)? - -2. **Tree-sitter WASM vs native**: WASM is easier for Electron distribution, native is faster. Start with WASM? - -3. **Max context size**: How many nodes/code blocks before we truncate? Configurable? - -4. **Unresolved references**: Show them in context (with "unresolved" marker) or hide them? - -5. **Multi-language projects**: Projects mixing PHP + JS + Liquid — handle all simultaneously? - -6. **Binary/asset files**: Track references to images, fonts, etc. or ignore? - ---- - -## Success Criteria - -1. **Accuracy**: >90% of function calls correctly linked to definitions -2. **Speed**: Full index of 10k file project in <60 seconds -3. **Freshness**: Incremental update after commit in <5 seconds -4. **Context quality**: Generated context helps Claude solve issues faster (qualitative) -5. 
**Portability**: Works on any macOS machine without additional setup - ---- - -## Resources - -- Tree-sitter: https://tree-sitter.github.io/tree-sitter/ -- Tree-sitter WASM: https://github.com/nicolo-ribaudo/nicolo-nicolo-tree-sitter/tree-sitter-wasm-builds/tree/main -- sqlite-vss: https://github.com/asg017/sqlite-vss -- nomic-embed: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5 -- ONNX Runtime Node: https://onnxruntime.ai/docs/get-started/with-javascript.html diff --git a/debug_python_ast.js b/debug_python_ast.js deleted file mode 100644 index edfff62f..00000000 --- a/debug_python_ast.js +++ /dev/null @@ -1,26 +0,0 @@ -const { getParser, initGrammars, loadAllGrammars } = require('./dist/extraction/grammars'); - -(async () => { - await initGrammars(); - await loadAllGrammars(); - - const parser = getParser('python'); - - const code = `class Child(Parent): - pass`; - - const tree = parser.parse(code); - - function walk(node, depth = 0) { - const indent = ' '.repeat(depth); - const preview = node.text.substring(0, 30).replace(/\n/g, '\\n'); - console.log(`${indent}${node.type} [${node.startPosition.row}:${node.startPosition.column}] "${preview}"`); - - for (let i = 0; i < node.namedChildCount; i++) { - const child = node.namedChild(i); - if (child) walk(child, depth + 1); - } - } - - walk(tree.rootNode); -})(); diff --git a/debug_python_ast2.js b/debug_python_ast2.js deleted file mode 100644 index b92d5f0b..00000000 --- a/debug_python_ast2.js +++ /dev/null @@ -1,26 +0,0 @@ -const { getParser, initGrammars, loadAllGrammars } = require('./dist/extraction/grammars'); - -(async () => { - await initGrammars(); - await loadAllGrammars(); - - const parser = getParser('python'); - - const code = `class Child(Parent, Mixin, Base): - pass`; - - const tree = parser.parse(code); - - function walk(node, depth = 0) { - const indent = ' '.repeat(depth); - const preview = node.text.substring(0, 40).replace(/\n/g, '\\n'); - console.log(`${indent}${node.type} "${preview}"`); - 
- for (let i = 0; i < node.namedChildCount; i++) { - const child = node.namedChild(i); - if (child) walk(child, depth + 1); - } - } - - walk(tree.rootNode); -})(); diff --git a/run-interactive-test.md b/run-interactive-test.md deleted file mode 100644 index 448c9e62..00000000 --- a/run-interactive-test.md +++ /dev/null @@ -1,131 +0,0 @@ -# Running the agent-behavior test (how agents actually use codegraph) - -This explains how to measure **how a Claude Code agent uses the codegraph MCP -tools** on a real repo — which tools it calls (does it lead with -`codegraph_explore`?), how many follow-up `Read`/`Grep`s it does, and the token -cost. Use it when changing tool guidance (`server-instructions.ts`, -`instructions-template.ts`, tool descriptions) or retrieval, to verify the -change actually shifts agent behavior. - -Scripts live in `scripts/agent-eval/`. - -## Why two harnesses (read this first) - -| | Interactive (`itrun.sh`) | Headless (`run-agent.sh`) | -|---|---|---| -| Drives | the real TUI via tmux | `claude -p` print mode | -| Subagent it picks | **Explore** (matches real UX) | general-purpose (diverges) | -| Metrics | tool breakdown (from session logs) + `Done(…)` token summary | exact per-tool calls + tokens/cost (stream-json) | -| Cost | Claude Max subscription | API $ (`total_cost_usd`) | - -**Headless `claude -p` does NOT reproduce what users see** — it silently picks -the general-purpose subagent, while interactive sessions delegate to the -read-first **Explore** subagent. So for "what does my session actually do," use -the interactive harness. For a clean per-tool/token breakdown in one shot, use -headless (and ask for the Explore subagent in the prompt if you want that path). - -## Prerequisites - -- **tmux 3.0+** -- A logged-in `claude` CLI (Claude Max or API). -- codegraph configured as an MCP server (`claude mcp list` shows `codegraph`). 
- The interactive harness uses your global config, so it runs whatever - `codegraph` resolves to — point that at your dev build (`npm link` / the - symlinked global) to test local changes. -- A target repo, cloned and indexed: - ```bash - git clone --depth 1 https://github.com/square/okhttp /tmp/corpus/okhttp - cd /tmp/corpus/okhttp && codegraph init -i - ``` - Good scale spread for a sweep: Alamofire (~100 files), Excalidraw (~600), - OkHttp (~640), VS Code (~10k). - -## Interactive test (the faithful one) - -```bash -scripts/agent-eval/itrun.sh
@@ -40,7 +41,7 @@ npx @colbymchenry/codegraph # zero-install, or: npm i -g @colbymchenry/codegraph ``` -CodeGraph bundles its own runtime — nothing to compile, no native build, works the same everywhere. The interactive installer auto-configures your agent(s) — Claude Code, Cursor, Codex CLI, opencode. +CodeGraph bundles its own runtime — nothing to compile, no native build, works the same everywhere. The interactive installer auto-configures your agent(s) — Claude Code, Cursor, Codex CLI, opencode, Hermes Agent. ### Initialize Projects @@ -159,7 +160,7 @@ npx @colbymchenry/codegraph ``` The installer will: -- Ask which agent(s) to configure — auto-detects installed ones from: **Claude Code**, **Cursor**, **Codex CLI**, **opencode** +- Ask which agent(s) to configure — auto-detects installed ones from: **Claude Code**, **Cursor**, **Codex CLI**, **opencode**, **Hermes Agent** - Prompt to install `codegraph` on your PATH (so agents can launch the MCP server) - Ask whether configs apply to all your projects or just this one - Write each chosen agent's MCP server config + an instructions file (e.g. `CLAUDE.md`, `.cursor/rules/codegraph.mdc`, `~/.codex/AGENTS.md`) @@ -185,7 +186,7 @@ codegraph install --print-config codex # print snippet, no file wr ### 2. Restart Your Agent -Restart your agent (Claude Code / Cursor / Codex CLI / opencode) for the MCP server to load. +Restart your agent (Claude Code / Cursor / Codex CLI / opencode / Hermes Agent) for the MCP server to load. ### 3. Initialize Projects @@ -498,7 +499,7 @@ MIT
-**Made for AI coding agents — Claude Code, Cursor, Codex CLI, and opencode** +**Made for AI coding agents — Claude Code, Cursor, Codex CLI, opencode, and Hermes Agent** [Report Bug](https://github.com/colbymchenry/codegraph/issues) · [Request Feature](https://github.com/colbymchenry/codegraph/issues) diff --git a/__tests__/installer-targets.test.ts b/__tests__/installer-targets.test.ts index bb6c69ea..44e90d68 100644 --- a/__tests__/installer-targets.test.ts +++ b/__tests__/installer-targets.test.ts @@ -31,13 +31,25 @@ function mkTmpDir(label: string): string { // `os.homedir()` reads first. Same trick the rest of the suite uses // when it needs a mock home. function setHome(dir: string): { restore: () => void } { - const prev = { HOME: process.env.HOME, USERPROFILE: process.env.USERPROFILE }; + const prev = { + HOME: process.env.HOME, + USERPROFILE: process.env.USERPROFILE, + APPDATA: process.env.APPDATA, + XDG_CONFIG_HOME: process.env.XDG_CONFIG_HOME, + HERMES_HOME: process.env.HERMES_HOME, + }; process.env.HOME = dir; process.env.USERPROFILE = dir; + process.env.APPDATA = path.join(dir, '.config'); + process.env.XDG_CONFIG_HOME = path.join(dir, '.config'); + delete process.env.HERMES_HOME; return { restore() { if (prev.HOME === undefined) delete process.env.HOME; else process.env.HOME = prev.HOME; if (prev.USERPROFILE === undefined) delete process.env.USERPROFILE; else process.env.USERPROFILE = prev.USERPROFILE; + if (prev.APPDATA === undefined) delete process.env.APPDATA; else process.env.APPDATA = prev.APPDATA; + if (prev.XDG_CONFIG_HOME === undefined) delete process.env.XDG_CONFIG_HOME; else process.env.XDG_CONFIG_HOME = prev.XDG_CONFIG_HOME; + if (prev.HERMES_HOME === undefined) delete process.env.HERMES_HOME; else process.env.HERMES_HOME = prev.HERMES_HOME; }, }; } @@ -298,12 +310,59 @@ describe('Installer targets — partial-state idempotency', () => { it('opencode: local install writes ./opencode.jsonc and ./AGENTS.md in cwd', () => { const opencode = 
getTarget('opencode')!; const result = opencode.install('local', { autoAllow: true }); - const paths = result.files.map((f) => f.path); + const paths = result.files.map((f) => f.path.replace(/\\/g, '/')); // macOS realpath shenanigans (/var vs /private/var) — suffix match. expect(paths.some((p) => p.endsWith('/opencode.jsonc'))).toBe(true); expect(paths.some((p) => p.endsWith('/AGENTS.md'))).toBe(true); }); + it('hermes: install adds codegraph MCP server and cli toolset, preserving existing yaml', () => { + const hermes = getTarget('hermes')!; + const config = path.join(tmpHome, '.hermes', 'config.yaml'); + fs.mkdirSync(path.dirname(config), { recursive: true }); + fs.writeFileSync(config, [ + 'model:', + ' default: qwen-3.7', + 'mcp_servers:', + ' other:', + ' command: other', + 'platform_toolsets:', + ' cli:', + ' - hermes-cli', + ' discord:', + ' - hermes-discord', + '', + ].join('\n')); + + const result = hermes.install('global', { autoAllow: true }); + expect(result.files[0].action).toBe('updated'); + const body = fs.readFileSync(config, 'utf-8'); + expect(body).toContain('model:\n default: qwen-3.7'); + expect(body).toContain('mcp_servers:\n other:\n command: other'); + expect(body).toContain(' codegraph:\n command: codegraph'); + expect(body).toContain(' - hermes-cli'); + expect(body).toContain(' - mcp-codegraph'); + expect(body).toContain(' discord:\n - hermes-discord'); + + const second = hermes.install('global', { autoAllow: true }); + expect(second.files[0].action).toBe('unchanged'); + }); + + it('hermes: uninstall removes only codegraph MCP server and toolset entry', () => { + const hermes = getTarget('hermes')!; + const config = path.join(tmpHome, '.hermes', 'config.yaml'); + fs.mkdirSync(path.dirname(config), { recursive: true }); + + hermes.install('global', { autoAllow: true }); + fs.appendFileSync(config, 'custom:\n keep: true\n'); + + hermes.uninstall('global'); + const body = fs.readFileSync(config, 'utf-8'); + 
expect(body).not.toContain('codegraph:'); + expect(body).not.toContain('mcp-codegraph'); + expect(body).toContain('custom:\n keep: true'); + }); + it('opencode: uninstall removes only mcp.codegraph, preserves comments and siblings', () => { const opencode = getTarget('opencode')!; const dir = path.join(tmpHome, '.config', 'opencode'); @@ -358,7 +417,7 @@ describe('Installer targets — partial-state idempotency', () => { const claude = getTarget('claude')!; const result = claude.install('local', { autoAllow: false }); // The MCP entry lands in ./.mcp.json — the file Claude Code reads. - expect(result.files.some((f) => f.path.endsWith('/.mcp.json'))).toBe(true); + expect(result.files.some((f) => f.path.replace(/\\/g, '/').endsWith('/.mcp.json'))).toBe(true); expect(fs.existsSync(path.join(tmpCwd, '.mcp.json'))).toBe(true); expect(fs.existsSync(path.join(tmpCwd, '.claude.json'))).toBe(false); const cfg = JSON.parse(fs.readFileSync(path.join(tmpCwd, '.mcp.json'), 'utf-8')); @@ -556,6 +615,7 @@ describe('Installer targets — registry', () => { expect(getTarget('cursor')?.id).toBe('cursor'); expect(getTarget('codex')?.id).toBe('codex'); expect(getTarget('opencode')?.id).toBe('opencode'); + expect(getTarget('hermes')?.id).toBe('hermes'); expect(getTarget('not-a-real-target')).toBeUndefined(); }); diff --git a/src/bin/codegraph.ts b/src/bin/codegraph.ts index b1d5f0a1..dac8ce1e 100644 --- a/src/bin/codegraph.ts +++ b/src/bin/codegraph.ts @@ -1341,7 +1341,7 @@ program */ program .command('install') - .description('Install codegraph MCP server into one or more agents (Claude Code, Cursor, Codex CLI, opencode)') + .description('Install codegraph MCP server into one or more agents (Claude Code, Cursor, Codex CLI, opencode, Hermes Agent)') .option('-t, --target ', 'Target agent(s): comma-separated ids, or "auto"|"all"|"none". Default: prompt') .option('-l, --location ', 'Install location: "global" or "local". 
Default: prompt') .option('-y, --yes', 'Non-interactive: defaults to --location=global --target=auto, auto-allow on') diff --git a/src/installer/index.ts b/src/installer/index.ts index 687fc884..e5b18411 100644 --- a/src/installer/index.ts +++ b/src/installer/index.ts @@ -2,7 +2,8 @@ * CodeGraph Interactive Installer * * Multi-target: writes MCP server config + instructions for the - * agents the user picks (Claude Code, Cursor, Codex CLI, opencode). + * agents the user picks (Claude Code, Cursor, Codex CLI, opencode, + * Hermes Agent). * Defaults to the Claude-only behavior for backwards compatibility * when no targets are explicitly chosen and nothing else is detected. * diff --git a/src/installer/targets/hermes.ts b/src/installer/targets/hermes.ts new file mode 100644 index 00000000..b6abfb94 --- /dev/null +++ b/src/installer/targets/hermes.ts @@ -0,0 +1,299 @@ +/** + * Hermes Agent target. + * + * Hermes reads MCP servers from `$HERMES_HOME/config.yaml` under the + * top-level `mcp_servers` key, and exposes discovered MCP tools through + * dynamic toolsets named `mcp-`. We add: + * + * mcp_servers.codegraph -> `codegraph serve --mcp` + * platform_toolsets.cli -> `mcp-codegraph` + * + * The second entry matters because Hermes CLI profiles often enable an + * explicit `platform_toolsets.cli` list. Without `mcp-codegraph` in that + * list, the MCP server can be configured and connected but its tools may + * still be filtered out of normal CLI sessions. 
+ */ + +import * as fs from 'fs'; +import * as path from 'path'; +import * as os from 'os'; +import { + AgentTarget, + DetectionResult, + InstallOptions, + Location, + WriteResult, +} from './types'; +import { atomicWriteFileSync } from './shared'; + +type LineRange = { start: number; end: number }; + +class HermesTarget implements AgentTarget { + readonly id = 'hermes' as const; + readonly displayName = 'Hermes Agent'; + readonly docsUrl = 'https://hermes-agent.nousresearch.com'; + + supportsLocation(loc: Location): boolean { + return loc === 'global'; + } + + detect(loc: Location): DetectionResult { + if (loc !== 'global') { + return { installed: false, alreadyConfigured: false }; + } + const file = configPath(); + const content = readText(file); + const installed = fs.existsSync(hermesHome()) || fs.existsSync(file); + return { + installed, + alreadyConfigured: hasCodeGraphMcpServer(content), + configPath: file, + }; + } + + install(loc: Location, _opts: InstallOptions): WriteResult { + if (loc !== 'global') { + return { + files: [], + notes: ['Hermes Agent uses $HERMES_HOME/config.yaml; re-run with --location=global.'], + }; + } + return { + files: [writeHermesConfig()], + notes: ['Start a new Hermes session for MCP changes to take effect.'], + }; + } + + uninstall(loc: Location): WriteResult { + if (loc !== 'global') return { files: [] }; + const file = configPath(); + if (!fs.existsSync(file)) { + return { files: [{ path: file, action: 'not-found' }] }; + } + + const before = readText(file); + const after = removeCodeGraphToolset(removeCodeGraphMcpServer(before)); + if (after === before) { + return { files: [{ path: file, action: 'not-found' }] }; + } + atomicWriteFileSync(file, ensureTrailingNewline(after)); + return { files: [{ path: file, action: 'removed' }] }; + } + + printConfig(loc: Location): string { + if (loc !== 'global') { + return '# Hermes Agent uses $HERMES_HOME/config.yaml; use --location=global.\n'; + } + return [ + `# Add to ${configPath()}`, 
+ '', + renderCodeGraphMcpBlock().join('\n'), + '', + 'platform_toolsets:', + ' cli:', + ' - hermes-cli', + ' - mcp-codegraph', + '', + ].join('\n'); + } + + describePaths(loc: Location): string[] { + return loc === 'global' ? [configPath()] : []; + } +} + +function hermesHome(): string { + return process.env.HERMES_HOME + ? path.resolve(process.env.HERMES_HOME) + : path.join(os.homedir(), '.hermes'); +} + +function configPath(): string { + return path.join(hermesHome(), 'config.yaml'); +} + +function readText(file: string): string { + try { + return fs.readFileSync(file, 'utf-8'); + } catch { + return ''; + } +} + +function writeHermesConfig(): WriteResult['files'][number] { + const file = configPath(); + const existed = fs.existsSync(file); + const before = readText(file); + const afterMcp = upsertCodeGraphMcpServer(before); + const after = upsertCodeGraphToolset(afterMcp); + + if (after === before) { + return { path: file, action: 'unchanged' }; + } + atomicWriteFileSync(file, ensureTrailingNewline(after)); + return { path: file, action: existed ? 'updated' : 'created' }; +} + +function ensureTrailingNewline(text: string): string { + return text.endsWith('\n') ? text : text + '\n'; +} + +function splitLines(content: string): string[] { + return content.replace(/\r\n/g, '\n').replace(/\r/g, '\n').split('\n'); +} + +function joinLines(lines: string[]): string { + while (lines.length > 0 && lines[lines.length - 1] === '') lines.pop(); + return lines.join('\n') + '\n'; +} + +function topLevelRange(lines: string[], key: string): LineRange | null { + const start = lines.findIndex((line) => line.trim() === `${key}:`); + if (start === -1) return null; + let end = lines.length; + for (let i = start + 1; i < lines.length; i++) { + const line = lines[i] ?? 
''; + if (line.trim() === '') continue; + if (/^[A-Za-z_][A-Za-z0-9_-]*:\s*(?:#.*)?$/.test(line)) { + end = i; + break; + } + } + return { start, end }; +} + +function childRange(lines: string[], parent: LineRange, child: string): LineRange | null { + const startPattern = new RegExp(`^ ${escapeRegExp(child)}:\\s*(?:#.*)?$`); + let start = -1; + for (let i = parent.start + 1; i < parent.end; i++) { + if (startPattern.test(lines[i] ?? '')) { + start = i; + break; + } + } + if (start === -1) return null; + + let end = parent.end; + for (let i = start + 1; i < parent.end; i++) { + const line = lines[i] ?? ''; + if (line.trim() === '') continue; + if (/^ \S/.test(line)) { + end = i; + break; + } + } + while (end > start + 1 && (lines[end - 1] ?? '').trim() === '') { + end--; + } + return { start, end }; +} + +function escapeRegExp(value: string): string { + return value.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); +} + +function renderCodeGraphMcpChild(): string[] { + return [ + ' codegraph:', + ' command: codegraph', + ' args:', + ' - serve', + ' - --mcp', + ' timeout: 120', + ' connect_timeout: 60', + ' enabled: true', + ]; +} + +function renderCodeGraphMcpBlock(): string[] { + return ['mcp_servers:', ...renderCodeGraphMcpChild()]; +} + +function hasCodeGraphMcpServer(content: string): boolean { + const lines = splitLines(content); + const parent = topLevelRange(lines, 'mcp_servers'); + return !!parent && !!childRange(lines, parent, 'codegraph'); +} + +function upsertCodeGraphMcpServer(content: string): string { + const lines = splitLines(content); + const parent = topLevelRange(lines, 'mcp_servers'); + const child = parent ? 
childRange(lines, parent, 'codegraph') : null; + const replacement = renderCodeGraphMcpChild(); + + if (!parent) { + if (lines.length > 0 && lines[lines.length - 1] === '') lines.pop(); + if (lines.length > 0) lines.push(''); + lines.push(...renderCodeGraphMcpBlock()); + return joinLines(lines); + } + + if (child) { + const existing = lines.slice(child.start, child.end); + if (arrayEqual(existing, replacement)) return joinLines(lines); + lines.splice(child.start, child.end - child.start, ...replacement); + return joinLines(lines); + } + + lines.splice(parent.end, 0, ...replacement); + return joinLines(lines); +} + +function removeCodeGraphMcpServer(content: string): string { + const lines = splitLines(content); + const parent = topLevelRange(lines, 'mcp_servers'); + const child = parent ? childRange(lines, parent, 'codegraph') : null; + if (!child) return content; + lines.splice(child.start, child.end - child.start); + return joinLines(lines); +} + +function upsertCodeGraphToolset(content: string): string { + const lines = splitLines(content); + const parent = topLevelRange(lines, 'platform_toolsets'); + const cli = parent ? childRange(lines, parent, 'cli') : null; + + if (!parent) { + if (lines.length > 0 && lines[lines.length - 1] === '') lines.pop(); + if (lines.length > 0) lines.push(''); + lines.push('platform_toolsets:', ' cli:', ' - hermes-cli', ' - mcp-codegraph'); + return joinLines(lines); + } + + if (!cli) { + lines.splice(parent.end, 0, ' cli:', ' - hermes-cli', ' - mcp-codegraph'); + return joinLines(lines); + } + + const hasEntry = lines + .slice(cli.start + 1, cli.end) + .some((line) => line.trim() === '- mcp-codegraph'); + if (hasEntry) return joinLines(lines); + + lines.splice(cli.end, 0, ' - mcp-codegraph'); + return joinLines(lines); +} + +function removeCodeGraphToolset(content: string): string { + const lines = splitLines(content); + const parent = topLevelRange(lines, 'platform_toolsets'); + const cli = parent ? 
childRange(lines, parent, 'cli') : null; + if (!cli) return content; + + const hasEntry = lines + .slice(cli.start + 1, cli.end) + .some((line) => line.trim() === '- mcp-codegraph'); + if (!hasEntry) return content; + + const next = lines.filter((line, idx) => { + if (idx <= cli.start || idx >= cli.end) return true; + return line.trim() !== '- mcp-codegraph'; + }); + return joinLines(next); +} + +function arrayEqual(a: string[], b: string[]): boolean { + return a.length === b.length && a.every((value, idx) => value === b[idx]); +} + +export const hermesTarget: AgentTarget = new HermesTarget(); diff --git a/src/installer/targets/registry.ts b/src/installer/targets/registry.ts index e671fd19..0091ab64 100644 --- a/src/installer/targets/registry.ts +++ b/src/installer/targets/registry.ts @@ -12,12 +12,14 @@ import { claudeTarget } from './claude'; import { cursorTarget } from './cursor'; import { codexTarget } from './codex'; import { opencodeTarget } from './opencode'; +import { hermesTarget } from './hermes'; export const ALL_TARGETS: readonly AgentTarget[] = Object.freeze([ claudeTarget, cursorTarget, codexTarget, opencodeTarget, + hermesTarget, ]); export function getTarget(id: string): AgentTarget | undefined { diff --git a/src/installer/targets/types.ts b/src/installer/targets/types.ts index fdff0d77..290f13ce 100644 --- a/src/installer/targets/types.ts +++ b/src/installer/targets/types.ts @@ -19,7 +19,7 @@ export type Location = 'global' | 'local'; * lookup. New targets add a value here when they're added to the * registry. Keep these short and lowercase. */ -export type TargetId = 'claude' | 'cursor' | 'codex' | 'opencode'; +export type TargetId = 'claude' | 'cursor' | 'codex' | 'opencode' | 'hermes'; /** * Result of `target.detect(location)`. 
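The Hermes target above edits `config.yaml` by scanning line ranges rather than round-tripping it through a YAML parser. That is why comments and unrelated keys (`model:`, `discord:`) survive in its tests. The standalone sketch below isolates the `platform_toolsets.cli` half of that idea.

`ensureCliToolset` is a hypothetical name and is not part of the patch. The sketch makes three assumptions:

- indentation is two spaces, as in the installer's own output;
- keys are written without inline comments;
- any unindented, non-blank line ends a top-level block.

Unlike the patch, it does not seed `- hermes-cli` when it has to create the `cli:` list.

```typescript
// Hypothetical helper (not from the patch): ensure `entry` is listed under
// platform_toolsets.cli in a Hermes-style config.yaml, editing lines in place
// so comments and unrelated keys are left untouched.
// Assumes two-space indentation and keys without inline comments.
function ensureCliToolset(yaml: string, entry: string): string {
  const lines = yaml.replace(/\r\n?/g, '\n').split('\n');
  while (lines.length > 0 && lines[lines.length - 1] === '') lines.pop();
  const item = `    - ${entry}`;

  // Top-level `platform_toolsets:` block; missing entirely -> append a new one.
  const parent = lines.indexOf('platform_toolsets:');
  if (parent === -1) {
    if (lines.length > 0) lines.push('');
    lines.push('platform_toolsets:', '  cli:', item);
    return lines.join('\n') + '\n';
  }
  // The block runs until the next unindented, non-blank line.
  let parentEnd = lines.length;
  for (let i = parent + 1; i < lines.length; i++) {
    if (lines[i].trim() !== '' && !lines[i].startsWith(' ')) { parentEnd = i; break; }
  }
  while (parentEnd > parent + 1 && lines[parentEnd - 1].trim() === '') parentEnd--;

  // `  cli:` child; missing -> add it at the end of the parent block.
  const cli = lines.findIndex((l, i) => i > parent && i < parentEnd && l === '  cli:');
  if (cli === -1) {
    lines.splice(parentEnd, 0, '  cli:', item);
    return lines.join('\n') + '\n';
  }
  // The child block runs until the next sibling key (exactly two-space indent).
  let cliEnd = parentEnd;
  for (let i = cli + 1; i < parentEnd; i++) {
    if (/^ {2}\S/.test(lines[i])) { cliEnd = i; break; }
  }
  while (cliEnd > cli + 1 && lines[cliEnd - 1].trim() === '') cliEnd--;

  // Idempotent: only insert when the entry is not already listed.
  const present = lines.slice(cli + 1, cliEnd).some((l) => l.trim() === `- ${entry}`);
  if (!present) lines.splice(cliEnd, 0, item);
  return lines.join('\n') + '\n';
}

// Usage: an existing cli list gains the entry; the discord sibling is kept.
const before = 'platform_toolsets:\n  cli:\n    - hermes-cli\n  discord:\n    - hermes-discord\n';
const after = ensureCliToolset(before, 'mcp-codegraph');
console.log(after);
```

On a second call, the function returns its input unchanged. This is the same idempotency that the patch's installer test checks through the `unchanged` write action.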
From 5b71a89574f96c660d7d702c2d470fed1f589509 Mon Sep 17 00:00:00 2001 From: Marcelo Vani Date: Thu, 21 May 2026 23:11:19 +0100 Subject: [PATCH 25/47] feat(frameworks): add Drupal 8/9/10/11 support (#271) Detects Drupal projects via composer.json drupal/* deps; extracts routes from *.routing.yml (route nodes + references edges to controllers/forms/entity handlers) and Drupal hook implementations from .module/.install/.theme/.inc. Adds yaml/twig as file-level languages and excludes core/contrib by default. Resolves #268. --- CHANGELOG.md | 22 ++ README.md | 3 +- __tests__/drupal.test.ts | 518 ++++++++++++++++++++++++++++ src/extraction/grammars.ts | 17 +- src/extraction/tree-sitter.ts | 5 + src/resolution/frameworks/drupal.ts | 373 ++++++++++++++++++++ src/resolution/frameworks/index.ts | 3 + src/types.ts | 16 + 8 files changed, 955 insertions(+), 2 deletions(-) create mode 100644 __tests__/drupal.test.ts create mode 100644 src/resolution/frameworks/drupal.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index 7bf3686f..87a4a3b9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,28 @@ a [GitHub Release](https://github.com/colbymchenry/codegraph/releases) tagged This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [Unreleased] + +### Added +- **Framework support: Drupal 8/9/10/11** — CodeGraph now detects Drupal + projects (via a `drupal/*` dependency in `composer.json`) and adds three + levels of intelligence: + - **Route extraction**: `*.routing.yml` files emit a `route` node per route, + linked by a `references` edge to the `_controller`, `_form`, or + entity-handler class/method, so querying a controller method surfaces the + URL route that binds it. + - **Hook detection**: hook implementations in `.module`, `.install`, `.theme`, + and `.inc` files are detected via docblock (`Implements hook_X()`) with a + module-name-prefix fallback. 
Each emits a `references` edge to the canonical + `hook_X` name, so that when Drupal core's hook definitions are indexed, `codegraph_callers("hook_form_alter")` returns every + implementation across modules. + - **Resolution**: `_controller`/`_form` FQCNs resolve to their PHP + class/method nodes. + New `yaml`/`twig` languages are tracked at the file level, the Drupal PHP + extensions (`.module`/`.install`/`.theme`/`.inc`) are indexed with the PHP + grammar, and `web/core`, `web/modules/contrib`, `web/themes/contrib` are + excluded by default. Resolves [#268](https://github.com/colbymchenry/codegraph/issues/268). + ## [0.9.1] - 2026-05-21 ### Fixed diff --git a/README.md b/README.md index 59b8dcbb..17bd2042 100644 --- a/README.md +++ b/README.md @@ -124,7 +124,7 @@ The gains scale with codebase size: on large repos the agent answers from the in | **Impact Analysis** | Trace callers, callees, and the full impact radius of any symbol before making changes | | **Always Fresh** | File watcher uses native OS events (FSEvents/inotify/ReadDirectoryChangesW) with debounced auto-sync — the graph stays current as you code, zero config | | **19+ Languages** | TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Dart, Lua, Luau, Svelte, Liquid, Pascal/Delphi | -| **Framework-aware Routes** | Recognizes web-framework routing files and links URL patterns to their handlers across 13 frameworks | +| **Framework-aware Routes** | Recognizes web-framework routing files and links URL patterns to their handlers across 14 frameworks | | **100% Local** | No data leaves your machine. No API keys. No external services.
SQLite database only | --- @@ -141,6 +141,7 @@ CodeGraph detects web-framework routing files and emits `route` nodes linked by | **Express** | `app.get(...)`, `router.post(...)` with middleware chains | | **NestJS** | `@Controller` + `@Get/@Post/...`, GraphQL `@Resolver` + `@Query/@Mutation`, `@MessagePattern`/`@EventPattern`, `@SubscribeMessage` | | **Laravel** | `Route::get()`, `Route::resource()`, `Controller@action`, tuple syntax | +| **Drupal** | `*.routing.yml` routes (`_controller`, `_form`, entity handlers); `hook_*` implementations in `.module`/`.theme`/`.install`/`.inc` | | **Rails** | `get '/x', to: 'users#index'`, hash-rocket `=>` syntax | | **Spring** | `@GetMapping`, `@PostMapping`, `@RequestMapping` on methods | | **Gin / chi / gorilla / mux** | `r.GET(...)`, `router.HandleFunc(...)` | diff --git a/__tests__/drupal.test.ts b/__tests__/drupal.test.ts new file mode 100644 index 00000000..fda5415b --- /dev/null +++ b/__tests__/drupal.test.ts @@ -0,0 +1,518 @@ +/** + * Tests for Drupal framework resolver. + * + * Unit tests cover drupalResolver.detect(), extract() (routes + hooks), and resolve(). + * Integration tests use a real CodeGraph instance with a temporary Drupal project layout. 
+ */ + +import * as fs from 'fs'; +import * as os from 'os'; +import * as path from 'path'; +import { afterEach, beforeAll, describe, expect, it } from 'vitest'; +import { CodeGraph } from '../src'; +import { initGrammars, loadAllGrammars } from '../src/extraction/grammars'; +import { drupalResolver } from '../src/resolution/frameworks/drupal'; +import type { ResolutionContext } from '../src/resolution/types'; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +function makeContext( + overrides: Partial<ResolutionContext> = {}, +): ResolutionContext { + return { + getNodesInFile: () => [], + getNodesByName: () => [], + getNodesByQualifiedName: () => [], + getNodesByKind: () => [], + fileExists: () => false, + readFile: () => null, + getProjectRoot: () => '/project', + getAllFiles: () => [], + getNodesByLowerName: () => [], + getImportMappings: () => [], + ...overrides, + }; +} + +// --------------------------------------------------------------------------- +// detect() +// --------------------------------------------------------------------------- + +describe('drupalResolver.detect', () => { + it('returns true when composer.json has a drupal/ dependency', () => { + const ctx = makeContext({ + readFile: (f) => + f === 'composer.json' + ? JSON.stringify({ + require: { + 'drupal/core-recommended': '~10.5', + 'drush/drush': '^13', + }, + }) + : null, + }); + expect(drupalResolver.detect(ctx)).toBe(true); + }); + + it('returns true when drupal/ dependency is in require-dev', () => { + const ctx = makeContext({ + readFile: (f) => + f === 'composer.json' + ? JSON.stringify({ 'require-dev': { 'drupal/core': '^10' } }) + : null, + }); + expect(drupalResolver.detect(ctx)).toBe(true); + }); + + it('returns false when composer.json has no drupal/ dependencies', () => { + const ctx = makeContext({ + readFile: (f) => + f === 'composer.json' + ?
JSON.stringify({ + require: { 'laravel/framework': '^10', php: '>=8.1' }, + }) + : null, + }); + expect(drupalResolver.detect(ctx)).toBe(false); + }); + + it('returns false when composer.json is absent', () => { + const ctx = makeContext({ readFile: () => null }); + expect(drupalResolver.detect(ctx)).toBe(false); + }); + + it('returns false when composer.json is malformed JSON', () => { + const ctx = makeContext({ readFile: () => '{ bad json' }); + expect(drupalResolver.detect(ctx)).toBe(false); + }); +}); + +// --------------------------------------------------------------------------- +// extract() — routing.yml +// --------------------------------------------------------------------------- + +describe('drupalResolver.extract — routing.yml', () => { + const routing = ` +mymodule.example: + path: '/mymodule/example' + defaults: + _controller: '\\Drupal\\mymodule\\Controller\\MyController::build' + _title: 'Example page' + requirements: + _permission: 'access content' +`; + + it('emits a route node for each YAML route', () => { + const { nodes } = drupalResolver.extract!( + 'mymodule/mymodule.routing.yml', + routing, + ); + expect(nodes).toHaveLength(1); + expect(nodes[0]!.kind).toBe('route'); + expect(nodes[0]!.name).toBe('/mymodule/example'); + }); + + it('sets qualifiedName to filePath::routeName', () => { + const { nodes } = drupalResolver.extract!( + 'mymodule/mymodule.routing.yml', + routing, + ); + expect(nodes[0]!.qualifiedName).toBe( + 'mymodule/mymodule.routing.yml::mymodule.example', + ); + }); + + it('emits a references edge to the controller FQCN', () => { + const { references } = drupalResolver.extract!( + 'mymodule/mymodule.routing.yml', + routing, + ); + expect(references).toHaveLength(1); + expect(references[0]!.referenceName).toBe( + '\\Drupal\\mymodule\\Controller\\MyController::build', + ); + expect(references[0]!.referenceKind).toBe('references'); + }); + + it('emits a references edge to a _form handler', () => { + const src = ` 
+mymodule.settings_form: + path: '/admin/config/mymodule' + defaults: + _form: '\\Drupal\\mymodule\\Form\\SettingsForm' + _title: 'MyModule settings' + requirements: + _permission: 'administer site configuration' +`; + const { nodes, references } = drupalResolver.extract!( + 'mymodule/mymodule.routing.yml', + src, + ); + expect(nodes).toHaveLength(1); + expect(references[0]!.referenceName).toBe( + '\\Drupal\\mymodule\\Form\\SettingsForm', + ); + }); + + it('handles multiple routes in one file', () => { + const src = ` +mod.page_one: + path: '/page-one' + defaults: + _controller: '\\Drupal\\mod\\Controller\\PageController::one' + requirements: + _permission: 'access content' + +mod.page_two: + path: '/page-two' + defaults: + _controller: '\\Drupal\\mod\\Controller\\PageController::two' + requirements: + _permission: 'access content' +`; + const { nodes, references } = drupalResolver.extract!( + 'mod/mod.routing.yml', + src, + ); + expect(nodes).toHaveLength(2); + expect(nodes.map((n) => n.name)).toContain('/page-one'); + expect(nodes.map((n) => n.name)).toContain('/page-two'); + expect(references).toHaveLength(2); + }); + + it('skips commented-out lines', () => { + const src = ` +mod.page: + path: '/page' + defaults: + #_controller: '\\Drupal\\mod\\Controller\\Old::build' + _controller: '\\Drupal\\mod\\Controller\\New::build' + requirements: + _permission: 'access content' +`; + const { references } = drupalResolver.extract!('mod/mod.routing.yml', src); + expect(references).toHaveLength(1); + expect(references[0]!.referenceName).toContain('New'); + }); + + it('includes HTTP methods in the route node name when present', () => { + const src = ` +mod.api: + path: '/api/resource' + defaults: + _controller: '\\Drupal\\mod\\Controller\\ApiController::get' + methods: [GET, POST] + requirements: + _permission: 'access content' +`; + const { nodes } = drupalResolver.extract!('mod/mod.routing.yml', src); + expect(nodes[0]!.name).toContain('GET'); + 
expect(nodes[0]!.name).toContain('POST'); + }); + + it('returns empty result for non-routing-yml files', () => { + const { nodes, references } = drupalResolver.extract!( + 'mymodule.module', + ' { + const { nodes, references } = drupalResolver.extract!( + 'some.routing.yml', + '# empty\n', + ); + expect(nodes).toHaveLength(0); + expect(references).toHaveLength(0); + }); +}); + +// --------------------------------------------------------------------------- +// extract() — hook detection in .module files +// --------------------------------------------------------------------------- + +describe('drupalResolver.extract — hook detection', () => { + it('detects hook implementation via docblock (Strategy A)', () => { + const src = ` r.referenceName === 'hook_form_alter', + ); + expect(hookRef).toBeDefined(); + expect(hookRef!.referenceKind).toBe('references'); + }); + + it('detects hook implementation via name pattern (Strategy B)', () => { + const src = ` r.referenceName === 'hook_views_data', + ); + expect(hookRef).toBeDefined(); + }); + + it('does not emit a hook ref for non-hook helper functions', () => { + // 'other_module_helper' doesn't start with 'mymodule_', so no hook ref + const src = ` { + const src = ` r.referenceName === 'hook_schema'); + expect(hookRef).toBeDefined(); + }); + + it('detects hooks in .theme files', () => { + const src = ` r.referenceName === 'hook_preprocess_node', + ); + expect(hookRef).toBeDefined(); + }); + + it('does not duplicate refs when both docblock and name pattern match', () => { + // Strategy A matches first and adds to docblockMatched set; + // Strategy B skips already-matched functions. 
+ const src = ` r.referenceName === 'hook_form_alter', + ); + expect(hookRefs).toHaveLength(1); + }); +}); + +// --------------------------------------------------------------------------- +// resolve() +// --------------------------------------------------------------------------- + +describe('drupalResolver.resolve', () => { + it('resolves a _controller FQCN with ::method to the method node', () => { + const methodNode = { + id: 'method:abc123', + kind: 'method' as const, + name: 'build', + qualifiedName: 'MyController::build', + filePath: 'web/modules/custom/mymodule/src/Controller/MyController.php', + language: 'php' as const, + startLine: 10, + endLine: 20, + startColumn: 0, + endColumn: 0, + updatedAt: 0, + }; + const classNode = { + id: 'class:def456', + kind: 'class' as const, + name: 'MyController', + qualifiedName: 'MyController', + filePath: 'web/modules/custom/mymodule/src/Controller/MyController.php', + language: 'php' as const, + startLine: 5, + endLine: 30, + startColumn: 0, + endColumn: 0, + updatedAt: 0, + }; + const ctx = makeContext({ + getNodesByName: (name) => (name === 'MyController' ? 
[classNode] : []), + getNodesInFile: () => [classNode, methodNode], + }); + const ref = { + fromNodeId: 'route:x', + referenceName: '\\Drupal\\mymodule\\Controller\\MyController::build', + referenceKind: 'references' as const, + line: 1, + column: 0, + filePath: 'mymodule.routing.yml', + language: 'yaml' as const, + }; + const resolved = drupalResolver.resolve(ref, ctx); + expect(resolved).not.toBeNull(); + expect(resolved!.targetNodeId).toBe('method:abc123'); + expect(resolved!.confidence).toBeGreaterThanOrEqual(0.85); + }); + + it('resolves a _form FQCN (no ::method) to the class node', () => { + const classNode = { + id: 'class:form123', + kind: 'class' as const, + name: 'SettingsForm', + qualifiedName: 'SettingsForm', + filePath: 'web/modules/custom/mymodule/src/Form/SettingsForm.php', + language: 'php' as const, + startLine: 1, + endLine: 50, + startColumn: 0, + endColumn: 0, + updatedAt: 0, + }; + const ctx = makeContext({ + getNodesByName: (name) => (name === 'SettingsForm' ? [classNode] : []), + }); + const ref = { + fromNodeId: 'route:x', + referenceName: '\\Drupal\\mymodule\\Form\\SettingsForm', + referenceKind: 'references' as const, + line: 1, + column: 0, + filePath: 'mymodule.routing.yml', + language: 'yaml' as const, + }; + const resolved = drupalResolver.resolve(ref, ctx); + expect(resolved).not.toBeNull(); + expect(resolved!.targetNodeId).toBe('class:form123'); + }); + + it('returns null when the target class cannot be found', () => { + const ctx = makeContext({ getNodesByName: () => [] }); + const ref = { + fromNodeId: 'route:x', + referenceName: '\\Drupal\\mymodule\\Controller\\Missing::method', + referenceKind: 'references' as const, + line: 1, + column: 0, + filePath: 'mymodule.routing.yml', + language: 'yaml' as const, + }; + expect(drupalResolver.resolve(ref, ctx)).toBeNull(); + }); +}); + +// --------------------------------------------------------------------------- +// End-to-end integration test +// 
--------------------------------------------------------------------------- + +beforeAll(async () => { + await initGrammars(); + await loadAllGrammars(); +}); + +describe('Drupal end-to-end — route node linked to controller method', () => { + let tmpDir: string | undefined; + afterEach(() => { + if (tmpDir) fs.rmSync(tmpDir, { recursive: true, force: true }); + tmpDir = undefined; + }); + + it('creates a route→controller edge from routing.yml to PHP class', async () => { + tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-drupal-')); + + // Minimal composer.json to trigger Drupal detection + fs.writeFileSync( + path.join(tmpDir, 'composer.json'), + JSON.stringify({ require: { 'drupal/core-recommended': '~10.5' } }), + ); + + // Module directory structure + const modDir = path.join(tmpDir, 'web', 'modules', 'custom', 'my_module'); + fs.mkdirSync(path.join(modDir, 'src', 'Controller'), { recursive: true }); + + // routing.yml + fs.writeFileSync( + path.join(modDir, 'my_module.routing.yml'), + [ + 'my_module.hello:', + " path: '/hello'", + ' defaults:', + " _controller: '\\Drupal\\my_module\\Controller\\HelloController::build'", + " _title: 'Hello'", + ' requirements:', + " _permission: 'access content'", + ].join('\n') + '\n', + ); + + // PHP controller + fs.writeFileSync( + path.join(modDir, 'src', 'Controller', 'HelloController.php'), + [ + ' 'Hello'];", + ' }', + '}', + ].join('\n') + '\n', + ); + + const cg = CodeGraph.initSync(tmpDir); + await cg.indexAll(); + + // Route node must exist + const routes = cg.getNodesByKind('route'); + expect(routes.length).toBeGreaterThan(0); + const route = routes.find((n) => n.name.includes('/hello')); + expect(route).toBeDefined(); + + // Controller method must be indexed + const methods = cg.getNodesByKind('method'); + const buildMethod = methods.find((n) => n.name === 'build'); + expect(buildMethod).toBeDefined(); + + // Edge: route → build method (or class fallback) + const edges = cg.getOutgoingEdges(route!.id); + 
+ expect(edges.length).toBeGreaterThan(0); + + cg.close(); + }); +}); diff --git a/src/extraction/grammars.ts b/src/extraction/grammars.ts index 15f224d9..a67d36bb 100644 --- a/src/extraction/grammars.ts +++ b/src/extraction/grammars.ts @@ -10,7 +10,7 @@ import * as path from 'path'; import { Parser, Language as WasmLanguage } from 'web-tree-sitter'; import { Language } from '../types'; -export type GrammarLanguage = Exclude; +export type GrammarLanguage = Exclude; /** * WASM filename map — maps each language to its .wasm grammar file @@ -63,6 +63,16 @@ export const EXTENSION_MAP: Record<string, Language> = { '.hxx': 'cpp', '.cs': 'csharp', '.php': 'php', + // Drupal-specific PHP file extensions + '.module': 'php', + '.install': 'php', + '.theme': 'php', + '.inc': 'php', + // YAML (used for Drupal routing files; no symbol extraction, file-level tracking only) + '.yml': 'yaml', + '.yaml': 'yaml', + // Twig templates (file-level tracking only, no symbol extraction) + '.twig': 'twig', '.rb': 'ruby', '.rake': 'ruby', '.swift': 'swift', @@ -215,6 +225,8 @@ export function isLanguageSupported(language: Language): boolean { if (language === 'svelte') return true; // custom extractor (script block delegation) if (language === 'vue') return true; // custom extractor (script block delegation) if (language === 'liquid') return true; // custom regex extractor + if (language === 'yaml') return true; // file-level tracking only; Drupal routing extraction via framework resolver + if (language === 'twig') return true; // file-level tracking only if (language === 'unknown') return false; return language in WASM_GRAMMAR_FILES; } @@ -224,6 +236,7 @@ export function isLanguageSupported(language: Language): boolean { */ export function isGrammarLoaded(language: Language): boolean { if (language === 'svelte' || language === 'vue' || language === 'liquid') return true; + if (language === 'yaml' || language === 'twig') return true; // no WASM grammar needed return languageCache.has(language); } @@ -301,6
+314,8 @@ export function getLanguageDisplayName(language: Language): string { scala: 'Scala', lua: 'Lua', luau: 'Luau', + yaml: 'YAML', + twig: 'Twig', unknown: 'Unknown', }; return names[language] || language; diff --git a/src/extraction/tree-sitter.ts b/src/extraction/tree-sitter.ts index 5a40c75a..28022409 100644 --- a/src/extraction/tree-sitter.ts +++ b/src/extraction/tree-sitter.ts @@ -2535,6 +2535,11 @@ export function extractFromSource( // Use custom extractor for Liquid const extractor = new LiquidExtractor(filePath, source); result = extractor.extract(); + } else if (detectedLanguage === 'yaml' || detectedLanguage === 'twig') { + // No symbol extraction — file is tracked at the file-record level only. + // Framework extractors (e.g. Drupal routing resolver) run below and may + // add route nodes / references for yaml files such as *.routing.yml. + result = { nodes: [], edges: [], unresolvedReferences: [], errors: [], durationMs: 0 }; } else if ( detectedLanguage === 'pascal' && (fileExtension === '.dfm' || fileExtension === '.fmx') diff --git a/src/resolution/frameworks/drupal.ts b/src/resolution/frameworks/drupal.ts new file mode 100644 index 00000000..2049d264 --- /dev/null +++ b/src/resolution/frameworks/drupal.ts @@ -0,0 +1,373 @@ +/** + * Drupal Framework Resolver + * + * Supports Drupal 8/9/10/11 (Composer-based projects). Drupal 7 is not supported. + * + * ## What this resolver does + * + * 1. **Detection** — reads composer.json and checks for any `drupal/*` dependency in + * `require` or `require-dev`. + * + * 2. **Route extraction** — parses `*.routing.yml` files and emits `route` nodes for each + * Drupal route, with `references` edges to the `_controller`, `_form`, or entity handler + * class/method. + * + * 3. **Hook detection** — scans `.module`, `.install`, `.theme`, and `.inc` files for Drupal + * hook implementations. Two strategies are used: + * a. 
Docblock: `Implements hook_X().` (optionally `@Implements`) → precise, no false positives. + * b.
Name pattern: function `{moduleName}_{hookSuffix}()` → catches hooks without + * docblocks but may produce false positives on helper functions. + * Detected hooks emit an `UnresolvedRef` from the implementing function node to the + * canonical `hook_X` name, linking implementations to the hook when `codegraph_callers` + * is invoked. + * + * ## Design decisions (review in future iterations) + * + * - Hook graph resolution (v1): hook references are stored as UnresolvedRef pointing to the + * canonical `hook_X` name. If Drupal core is indexed, these will resolve to core hook + * definitions. Without core, they remain unresolved but are still searchable via + * `codegraph_search("form_alter")`. Full hook-node creation (virtual nodes for every hook) + * is deferred to a future iteration. + * + * - Services / plugins (out of scope for v1): `*.services.yml` service definitions and plugin + * annotations (`@Block`, `@FormElement`, etc.) are not extracted; see the TODOs + * below. + * + * - Twig templates (out of scope for v1): `.twig` files are tracked as file nodes but no + * symbol extraction is performed (no tree-sitter Twig grammar). Implement when a Twig + * grammar WASM is available. + * + * ## TODOs for future iterations + * + * - TODO: Extract service definitions from `*.services.yml` files (class → service-id edges). + * - TODO: Extract plugin annotations (`@Block`, `@FormElement`, etc.) from PHP + * docblocks and emit plugin nodes with references to the annotated class. + * - TODO: Add Twig symbol extraction when a tree-sitter Twig grammar becomes available. + * - TODO: Improve hook resolution: create virtual `hook_*` nodes so `codegraph_callers` + * returns all implementations even when Drupal core is not indexed.
+ */ + +import { generateNodeId } from '../../extraction/tree-sitter-helpers'; +import { Node } from '../../types'; +import { FrameworkResolver, ResolutionContext, ResolvedRef, UnresolvedRef } from '../types'; + +// --------------------------------------------------------------------------- +// Helpers +// --------------------------------------------------------------------------- + +/** + * Parse the last PHP namespace segment from a FQCN like `\Drupal\mymodule\Controller\Foo`. + * Returns `null` for strings that don't look like a FQCN. + */ +function lastSegment(fqcn: string): string | null { + const clean = fqcn.replace(/^\\+/, '').trim(); + if (!clean.includes('\\')) return null; + const parts = clean.split('\\'); + return parts[parts.length - 1] ?? null; +} + +/** + * Derive the Drupal module name from a file path. + * e.g. `web/modules/custom/my_module/my_module.module` → `my_module` + */ +function moduleNameFromPath(filePath: string): string | null { + const match = filePath.match(/\/([^/]+)\.[^./]+$/); + return match ? match[1]! : null; +} + +// --------------------------------------------------------------------------- +// Route extraction helpers +// --------------------------------------------------------------------------- + +/** + * Extract route nodes and handler references from a Drupal `*.routing.yml` file. 
+ * + * Drupal routing YAML format: + * + * route.name: + * path: '/some/path' + * defaults: + * _controller: '\Drupal\module\Controller\MyController::method' + * _form: '\Drupal\module\Form\MyForm' + * _title: 'Page title' + * requirements: + * _permission: 'access content' + * methods: [GET, POST] # optional + */ +function extractDrupalRoutes( + filePath: string, + content: string +): { nodes: Node[]; references: UnresolvedRef[] } { + const nodes: Node[] = []; + const references: UnresolvedRef[] = []; + const now = Date.now(); + + const lines = content.split('\n'); + + type PendingRoute = { name: string; lineNum: number }; + let pending: PendingRoute | null = null; + let currentPath: string | null = null; + let handlerRefs: string[] = []; + let methods: string[] = []; + + const flushRoute = () => { + if (!pending || !currentPath) return; + + const methodTag = methods.length > 0 ? ` [${methods.join(',')}]` : ''; + const routeNode: Node = { + id: `route:${filePath}:${pending.lineNum}:${currentPath}`, + kind: 'route', + name: `${currentPath}${methodTag}`, + qualifiedName: `${filePath}::${pending.name}`, + filePath, + startLine: pending.lineNum, + endLine: pending.lineNum, + startColumn: 0, + endColumn: 0, + language: 'yaml', + updatedAt: now, + }; + nodes.push(routeNode); + + for (const handler of handlerRefs) { + references.push({ + fromNodeId: routeNode.id, + referenceName: handler, + referenceKind: 'references', + line: pending.lineNum, + column: 0, + filePath, + language: 'yaml', + }); + } + }; + + for (let i = 0; i < lines.length; i++) { + const line = lines[i]!; + const trimmed = line.trim(); + + if (!trimmed || trimmed.startsWith('#')) continue; + + // Top-level route name: no leading whitespace, ends with a colon (no value after) + if (/^\S.*:\s*$/.test(line) && !/^\s/.test(line)) { + flushRoute(); + pending = { name: trimmed.slice(0, -1).trim(), lineNum: i + 1 }; + currentPath = null; + handlerRefs = []; + methods = []; + continue; + } + + // path: 
'/some/path' + const pathMatch = trimmed.match(/^path:\s*['"]?([^'"#\n]+?)['"]?\s*(?:#.*)?$/); + if (pathMatch) { + currentPath = pathMatch[1]!.trim(); + continue; + } + + // _controller: '\Drupal\...\Class::method' + const controllerMatch = trimmed.match(/^_controller:\s*['"]?([^'"#\n]+?)['"]?\s*(?:#.*)?$/); + if (controllerMatch) { + handlerRefs.push(controllerMatch[1]!.trim()); + continue; + } + + // _form: '\Drupal\...\Form\MyForm' + const formMatch = trimmed.match(/^_form:\s*['"]?([^'"#\n]+?)['"]?\s*(?:#.*)?$/); + if (formMatch) { + handlerRefs.push(formMatch[1]!.trim()); + continue; + } + + // _entity_form / _entity_list / _entity_view: entity.type + const entityMatch = trimmed.match(/^_(entity_form|entity_list|entity_view):\s*['"]?([^'"#\n]+?)['"]?\s*(?:#.*)?$/); + if (entityMatch) { + handlerRefs.push(entityMatch[2]!.trim()); + continue; + } + + // methods: [GET, POST] or methods: [GET] + const methodsMatch = trimmed.match(/^methods:\s*\[([^\]]+)\]/); + if (methodsMatch) { + methods = methodsMatch[1]!.split(',').map((m) => m.trim().toUpperCase()).filter(Boolean); + continue; + } + } + + flushRoute(); + return { nodes, references }; +} + +// --------------------------------------------------------------------------- +// Hook detection helpers +// --------------------------------------------------------------------------- + +const HOOK_FILE_EXTENSIONS = ['.module', '.install', '.theme', '.inc']; + +function isDrupalHookFile(filePath: string): boolean { + return HOOK_FILE_EXTENSIONS.some((ext) => filePath.endsWith(ext)); +} + +/** + * Extract hook implementation references from a Drupal PHP file. + * + * Strategy A (primary): look for docblocks containing `Implements hook_X().` + * followed immediately by the function definition. This is the Drupal coding + * standard and is precise. + * + * Strategy B (fallback): for functions whose name starts with `{moduleName}_`, + * treat the suffix as the hook name. 
Catches hooks without docblocks but may + * produce false positives on non-hook helper functions. + * + * Each detected hook emits an UnresolvedRef from the implementing function node + * (identified by computing the same ID tree-sitter would generate) to the + * canonical hook name, e.g. `hook_form_alter`. + */ +function extractDrupalHooks( + filePath: string, + content: string +): { nodes: Node[]; references: UnresolvedRef[] } { + const references: UnresolvedRef[] = []; + + // Build a map of function name → 1-indexed line number for all top-level functions. + // This mirrors tree-sitter's line numbering so we can reconstruct node IDs. + const funcLineMap = new Map<string, number>(); + const funcDef = /^function\s+(\w+)\s*\(/gm; + let fm: RegExpExecArray | null; + while ((fm = funcDef.exec(content)) !== null) { + const name = fm[1]!; + if (!funcLineMap.has(name)) { + // line = number of newlines before match start + 1 + funcLineMap.set(name, content.slice(0, fm.index).split('\n').length); + } + } + + const emitHookRef = (hookName: string, funcName: string) => { + const lineNum = funcLineMap.get(funcName); + if (lineNum === undefined) return; + const nodeId = generateNodeId(filePath, 'function', funcName, lineNum); + references.push({ + fromNodeId: nodeId, + referenceName: hookName, + referenceKind: 'references', + line: lineNum, + column: 0, + filePath, + language: 'php', + }); + }; + + // Strategy A: docblock `Implements hook_X().` followed by function definition. + // The docblock and function may be separated by blank lines.
+ const docblockPattern = + /\/\*\*[\s\S]*?(?:@|\*\s+)Implements\s+(hook_\w+)\s*\(\)[\s\S]*?\*\/\s*\n(?:\s*\n)*function\s+(\w+)\s*\(/g; + const docblockMatched = new Set<string>(); + let match: RegExpExecArray | null; + while ((match = docblockPattern.exec(content)) !== null) { + const [, hookName, funcName] = match; + emitHookRef(hookName!, funcName!); + docblockMatched.add(funcName!); + } + + // Strategy B: fallback name-pattern matching for functions without docblocks. + // Only applies to functions whose name starts with {moduleName}_ and that were + // not already matched by Strategy A. + const moduleName = moduleNameFromPath(filePath); + if (moduleName) { + const prefix = moduleName + '_'; + for (const [funcName] of funcLineMap) { + if (docblockMatched.has(funcName)) continue; + if (!funcName.startsWith(prefix)) continue; + const hookSuffix = funcName.slice(prefix.length); + if (!hookSuffix) continue; + // Emit a reference to hook_{suffix} — the resolver will link it if the + // hook is defined somewhere in the indexed graph (e.g. Drupal core). + emitHookRef(`hook_${hookSuffix}`, funcName); + } + } + + return { nodes: [], references }; +} + +// --------------------------------------------------------------------------- +// Resolver +// --------------------------------------------------------------------------- + +export const drupalResolver: FrameworkResolver = { + name: 'drupal', + languages: ['php', 'yaml'], + + detect(context: ResolutionContext): boolean { + const composer = context.readFile('composer.json'); + if (!composer) return false; + try { + const json = JSON.parse(composer) as { require?: Record<string, string>; 'require-dev'?: Record<string, string> }; + const deps = { ...json.require, ...(json['require-dev'] ??
{}) }; + return Object.keys(deps).some((k) => k.startsWith('drupal/')); + } catch { + return false; + } + }, + + resolve(ref: UnresolvedRef, context: ResolutionContext): ResolvedRef | null { + const name = ref.referenceName; + + // _controller: '\Drupal\module\...\ClassName::methodName' + const controllerMatch = name.match(/^\\?(?:Drupal\\[^:]+\\)?([^\\:]+)::(\w+)$/); + if (controllerMatch) { + const [, className, methodName] = controllerMatch; + const classNodes = context.getNodesByName(className!); + for (const cls of classNodes) { + if (cls.kind !== 'class') continue; + const fileNodes = context.getNodesInFile(cls.filePath); + const method = fileNodes.find((n) => n.kind === 'method' && n.name === methodName); + if (method) { + return { original: ref, targetNodeId: method.id, confidence: 0.9, resolvedBy: 'framework' }; + } + return { original: ref, targetNodeId: cls.id, confidence: 0.7, resolvedBy: 'framework' }; + } + } + + // _form / _entity_form: '\Drupal\module\...\ClassName' (no ::method) + if (name.includes('\\') && !name.includes('::')) { + const className = lastSegment(name); + if (className) { + const classNodes = context.getNodesByName(className); + const cls = classNodes.find((n) => n.kind === 'class'); + if (cls) { + return { original: ref, targetNodeId: cls.id, confidence: 0.85, resolvedBy: 'framework' }; + } + } + } + + // hook_X — find any function whose name ends in _{hookSuffix} in a hook file + if (name.startsWith('hook_')) { + const hookSuffix = name.slice(5); // strip 'hook_' + const candidates = context.getNodesByKind('function').filter( + (n) => n.name.endsWith(`_${hookSuffix}`) && isDrupalHookFile(n.filePath) + ); + if (candidates.length > 0) { + return { + original: ref, + targetNodeId: candidates[0]!.id, + confidence: 0.75, + resolvedBy: 'framework', + }; + } + } + + return null; + }, + + extract(filePath: string, content: string): { nodes: Node[]; references: UnresolvedRef[] } { + if (filePath.endsWith('.routing.yml')) { + return 
extractDrupalRoutes(filePath, content); + } + + if (isDrupalHookFile(filePath) || filePath.endsWith('.php')) { + return extractDrupalHooks(filePath, content); + } + + return { nodes: [], references: [] }; + }, +}; diff --git a/src/resolution/frameworks/index.ts b/src/resolution/frameworks/index.ts index 188b5e48..755718b6 100644 --- a/src/resolution/frameworks/index.ts +++ b/src/resolution/frameworks/index.ts @@ -6,6 +6,7 @@ import { FrameworkResolver, ResolutionContext } from '../types'; import type { Language } from '../../types'; +import { drupalResolver } from './drupal'; import { laravelResolver } from './laravel'; import { expressResolver } from './express'; import { nestjsResolver } from './nestjs'; @@ -26,6 +27,7 @@ import { swiftUIResolver, uikitResolver, vaporResolver } from './swift'; const FRAMEWORK_RESOLVERS: FrameworkResolver[] = [ // PHP laravelResolver, + drupalResolver, // JavaScript/TypeScript expressResolver, nestjsResolver, @@ -105,6 +107,7 @@ export function registerFrameworkResolver(resolver: FrameworkResolver): void { } // Re-export framework resolvers +export { drupalResolver } from './drupal'; export { laravelResolver, FACADE_MAPPINGS } from './laravel'; export { expressResolver } from './express'; export { nestjsResolver } from './nestjs'; diff --git a/src/types.ts b/src/types.ts index f7880407..54485ac0 100644 --- a/src/types.ts +++ b/src/types.ts @@ -87,6 +87,8 @@ export const LANGUAGES = [ 'scala', 'lua', 'luau', + 'yaml', + 'twig', 'unknown', ] as const; @@ -522,6 +524,15 @@ export const DEFAULT_CONFIG: CodeGraphConfig = { '**/*.cs', // PHP '**/*.php', + // Drupal-specific PHP extensions + '**/*.module', + '**/*.install', + '**/*.theme', + '**/*.inc', + // Drupal routing YAML + '**/*.routing.yml', + // Twig templates + '**/*.twig', // Ruby '**/*.rb', // Swift @@ -667,6 +678,11 @@ export const DEFAULT_CONFIG: CodeGraphConfig = { '**/storage/framework/**', '**/bootstrap/cache/**', + // Drupal - core and contrib are rarely customised; 
index only custom code + '**/web/core/**', + '**/web/modules/contrib/**', + '**/web/themes/contrib/**', + // Ruby '**/.bundle/**', '**/tmp/cache/**',
From f6772dac7cbcc8d45c15d7f54f1e6161962aa0ea Mon Sep 17 00:00:00 2001
From: Colby Mchenry
Date: Thu, 21 May 2026 18:06:02 -0500
Subject: [PATCH 26/47] feat: zero-config indexing driven by .gitignore (#283) (#285)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Remove .codegraph/config.json and the entire config surface. CodeGraph
now indexes every file whose extension maps to a supported language and
respects .gitignore everywhere: git repos via git itself, non-git
projects via the `ignore` library (root and nested .gitignore files,
matched the same way git does).

- Remove CodeGraphConfig/DEFAULT_CONFIG, src/config.ts, and the public
  config API (the `config` option on init,
  getConfig/updateConfig/getConfigPath).
- Derive the source-file allowlist from EXTENSION_MAP (isSourceFile);
  maxFileSize is now a constant. Drop the .codegraphignore marker.
- Behavior change: committed, non-gitignored dirs (vendor/, a committed
  dist/) are now indexed; .gitignore is the single source of truth.

Earlier inert fields (languages, frameworks, extractDocstrings,
trackCallSites, customPatterns) and their dead helpers are removed as
part of this.

Resolves #283.
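The extension-driven allowlist with a fixed size cap can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in src/extraction. The map is a hypothetical subset of EXTENSION_MAP. `shouldIndex` is a hypothetical helper. .gitignore filtering is left out because git or the `ignore` library handles it as a separate step.

```typescript
// Hypothetical subset of an extension-to-language map. The real EXTENSION_MAP
// covers every supported language; these entries are for illustration only.
const EXTENSION_MAP: Record<string, string> = {
  '.ts': 'typescript',
  '.tsx': 'typescript',
  '.js': 'javascript',
  '.py': 'python',
  '.php': 'php',
  '.module': 'php', // Drupal modules are PHP files with a custom extension
};

// Fixed size cap: 1 MB, the same value as the old maxFileSize default (1048576).
const MAX_FILE_SIZE = 1024 * 1024;

// Extension of the last path segment, or '' when there is none.
// Backslashes are normalized so Windows-style paths behave the same.
// A dotfile such as ".gitignore" has no extension (its only dot is at index 0).
function extensionOf(filePath: string): string {
  const normalized = filePath.replace(/\\/g, '/');
  const base = normalized.slice(normalized.lastIndexOf('/') + 1);
  const dot = base.lastIndexOf('.');
  return dot > 0 ? base.slice(dot) : '';
}

// A file is a source file iff its extension maps to a supported language.
function isSourceFile(filePath: string): boolean {
  const ext = extensionOf(filePath);
  return ext !== '' && Object.prototype.hasOwnProperty.call(EXTENSION_MAP, ext);
}

// Hypothetical combined check: supported extension and not over the size cap.
// .gitignore filtering is a separate step and is not modelled here.
function shouldIndex(filePath: string, sizeBytes: number): boolean {
  return isSourceFile(filePath) && sizeBytes <= MAX_FILE_SIZE;
}
```

With this map, `isSourceFile('.hidden/index.ts')` is true. `isSourceFile('.gitignore')` and `isSourceFile('Makefile')` are false, matching the cases exercised in `__tests__/security.test.ts`. Keying purely on the extension means new languages need no per-project setup.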
Co-authored-by: Claude Opus 4.7 (1M context)
---
 CHANGELOG.md                   |  31 ++++
 README.md                      |  39 ++---
 __tests__/extraction.test.ts   |  70 ++++----
 __tests__/foundation.test.ts   |  68 +-------
 __tests__/security.test.ts     |  90 ++--------
 __tests__/sync.test.ts         |   8 +-
 __tests__/watch-policy.test.ts |  15 +-
 __tests__/watcher.test.ts      |  31 +---
 package-lock.json              |  10 ++
 package.json                   |   1 +
 src/config.ts                  | 297 ---------------------------------
 src/extraction/grammars.ts     |  11 ++
 src/extraction/index.ts        | 181 +++++++++-----------
 src/index.ts                   |  68 +-------
 src/sync/watcher.ts            |  12 +-
 src/types.ts                   | 291 --------------------------------
 16 files changed, 225 insertions(+), 998 deletions(-)
 delete mode 100644 src/config.ts

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 87a4a3b9..20a2b9bc 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -29,6 +29,37 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
   grammar, and `web/core`, `web/modules/contrib`, `web/themes/contrib` are
   excluded by default. Resolves
   [#268](https://github.com/colbymchenry/codegraph/issues/268).
+### Changed
+- **Zero-config indexing that respects `.gitignore`.** CodeGraph no longer has a
+  config file. It indexes every file whose extension maps to a supported language
+  and honors your `.gitignore` everywhere: in git repos via git itself, and in
+  non-git projects (e.g. a freshly-scaffolded app before `git init`) by reading
+  `.gitignore` files directly — root and nested, the same way git does (via the
+  `ignore` library, so negation/anchoring/nested rules all behave correctly). To
+  keep something out of the graph, add it to `.gitignore`. **Behavior change:**
+  committed files that are *not* gitignored are now indexed even under `vendor/`,
+  `Pods/`, or a committed `dist/` — previously a hardcoded exclude list skipped
+  those names; now `.gitignore` is the single source of truth. Resolves
+  [#283](https://github.com/colbymchenry/codegraph/issues/283).
+
+### Removed
+- **`.codegraph/config.json` and the entire config surface.** Every field was
+  either inert or now redundant with `.gitignore`:
+  - `languages`/`frameworks` never affected indexing (languages are detected per
+    file from extensions; frameworks are auto-detected). `languages` was also
+    broken — its validator only knew the original 8 languages, so setting it to
+    anything newer (C#, PHP, Ruby, C/C++, Swift, Kotlin, Dart, Vue, Scala, Lua, …)
+    threw `Invalid configuration format`.
+  - `extractDocstrings`/`trackCallSites`/`customPatterns` were never read by any
+    extractor.
+  - `include` is now derived from the supported language extensions, `exclude` is
+    replaced by `.gitignore`, and `maxFileSize` (1 MB) is a constant.
+
+  **Breaking (library API):** the `CodeGraphConfig` type, the `config` option on
+  `CodeGraph.init()`, and the `getConfig()`/`updateConfig()`/`getConfigPath`
+  exports are gone. Existing `.codegraph/config.json` files are simply ignored.
+  The `.codegraphignore` marker is no longer supported — use `.gitignore`.
+
 ## [0.9.1] - 2026-05-21
 
 ### Fixed
diff --git a/README.md b/README.md
index 17bd2042..598ac5b0 100644
--- a/README.md
+++ b/README.md
@@ -418,28 +418,23 @@ cg.close();
 
 ## Configuration
 
-The `.codegraph/config.json` file controls indexing:
-
-```json
-{
-  "version": 1,
-  "languages": ["typescript", "javascript"],
-  "exclude": ["node_modules/**", "dist/**", "build/**", "*.min.js"],
-  "frameworks": [],
-  "maxFileSize": 1048576,
-  "extractDocstrings": true,
-  "trackCallSites": true
-}
-```
-
-| Option | Description | Default |
-|--------|-------------|---------|
-| `languages` | Languages to index (auto-detected if empty) | `[]` |
-| `exclude` | Glob patterns to ignore | `["node_modules/**", ...]` |
-| `frameworks` | Framework hints for better resolution | `[]` |
-| `maxFileSize` | Skip files larger than this (bytes) | `1048576` (1MB) |
-| `extractDocstrings` | Extract docstrings from code | `true` |
-| `trackCallSites` | Track call site locations | `true` |
+There isn't any — CodeGraph is zero-config. It indexes every file whose
+extension maps to a [supported language](#supported-languages) and **respects
+your `.gitignore`**: in git repos via git itself, and in non-git projects by
+reading `.gitignore` files directly (root and nested, the same way git would).
+
+What that means in practice:
+
+- Anything git ignores — `node_modules`, build output, secrets in `.env` — is
+  never indexed. **To keep something out of the graph, add it to `.gitignore`.**
+- There's no config file to write or keep in sync, and nothing to wire up per
+  language: support is automatic from the file extension.
+- Files larger than 1 MB are skipped (generated bundles, minified JS, vendored
+  blobs) — they cost parse budget for no useful symbols.
+
+> Committed files that aren't gitignored *are* indexed, even under `vendor/` or a
+> committed `dist/`. If you commit a dependency or build directory you don't want
+> in the graph, add it to `.gitignore`.
## Supported Languages diff --git a/__tests__/extraction.test.ts b/__tests__/extraction.test.ts index 1b121478..92717759 100644 --- a/__tests__/extraction.test.ts +++ b/__tests__/extraction.test.ts @@ -9,10 +9,9 @@ import * as fs from 'fs'; import * as path from 'path'; import * as os from 'os'; import { CodeGraph } from '../src'; -import { extractFromSource, scanDirectory, shouldIncludeFile } from '../src/extraction'; +import { extractFromSource, scanDirectory } from '../src/extraction'; import { detectLanguage, isLanguageSupported, getSupportedLanguages, initGrammars, loadAllGrammars } from '../src/extraction/grammars'; import { normalizePath } from '../src/utils'; -import { DEFAULT_CONFIG } from '../src/types'; beforeAll(async () => { await initGrammars(); @@ -3003,39 +3002,57 @@ describe('Directory Exclusion', () => { cleanupTempDir(tempDir); }); - it('should exclude node_modules directories', () => { - // Create structure: src/index.ts + node_modules/pkg/index.js + it('should exclude directories listed in .gitignore', () => { + // Create structure: src/index.ts + node_modules/pkg/index.js, gitignore node_modules const srcDir = path.join(tempDir, 'src'); const nmDir = path.join(tempDir, 'node_modules', 'pkg'); fs.mkdirSync(srcDir, { recursive: true }); fs.mkdirSync(nmDir, { recursive: true }); fs.writeFileSync(path.join(srcDir, 'index.ts'), 'export const x = 1;'); fs.writeFileSync(path.join(nmDir, 'index.js'), 'module.exports = {};'); + fs.writeFileSync(path.join(tempDir, '.gitignore'), 'node_modules/\n'); - const config = { ...DEFAULT_CONFIG, rootDir: tempDir }; - const files = scanDirectory(tempDir, config); + const files = scanDirectory(tempDir); expect(files).toContain('src/index.ts'); expect(files.every((f) => !f.includes('node_modules'))).toBe(true); }); - it('should exclude nested node_modules directories', () => { - // Create structure: packages/app/node_modules/pkg/index.js + it('should exclude nested node_modules via a root .gitignore', () => { + // A 
trailing-slash pattern with no leading slash matches at any depth. const srcDir = path.join(tempDir, 'packages', 'app', 'src'); const nmDir = path.join(tempDir, 'packages', 'app', 'node_modules', 'pkg'); fs.mkdirSync(srcDir, { recursive: true }); fs.mkdirSync(nmDir, { recursive: true }); fs.writeFileSync(path.join(srcDir, 'index.ts'), 'export const x = 1;'); fs.writeFileSync(path.join(nmDir, 'index.js'), 'module.exports = {};'); + fs.writeFileSync(path.join(tempDir, '.gitignore'), 'node_modules/\n'); - const config = { ...DEFAULT_CONFIG, rootDir: tempDir }; - const files = scanDirectory(tempDir, config); + const files = scanDirectory(tempDir); expect(files).toContain('packages/app/src/index.ts'); expect(files.every((f) => !f.includes('node_modules'))).toBe(true); }); - it('should exclude .git directories', () => { + it('should apply a nested .gitignore only to its own subtree', () => { + const appSrc = path.join(tempDir, 'app', 'src'); + fs.mkdirSync(appSrc, { recursive: true }); + fs.writeFileSync(path.join(appSrc, 'keep.ts'), 'export const a = 1;'); + fs.writeFileSync(path.join(appSrc, 'skip.ts'), 'export const b = 2;'); + fs.writeFileSync(path.join(tempDir, 'app', '.gitignore'), 'src/skip.ts\n'); + // A sibling with the same name outside app/ must NOT be ignored. 
+ const otherDir = path.join(tempDir, 'other', 'src'); + fs.mkdirSync(otherDir, { recursive: true }); + fs.writeFileSync(path.join(otherDir, 'skip.ts'), 'export const c = 3;'); + + const files = scanDirectory(tempDir); + + expect(files).toContain('app/src/keep.ts'); + expect(files).not.toContain('app/src/skip.ts'); + expect(files).toContain('other/src/skip.ts'); + }); + + it('should always skip .git directories', () => { const srcDir = path.join(tempDir, 'src'); const gitDir = path.join(tempDir, '.git', 'objects'); fs.mkdirSync(srcDir, { recursive: true }); @@ -3043,8 +3060,7 @@ describe('Directory Exclusion', () => { fs.writeFileSync(path.join(srcDir, 'index.ts'), 'export const x = 1;'); fs.writeFileSync(path.join(gitDir, 'pack.ts'), 'export const y = 2;'); - const config = { ...DEFAULT_CONFIG, rootDir: tempDir }; - const files = scanDirectory(tempDir, config); + const files = scanDirectory(tempDir); expect(files).toContain('src/index.ts'); expect(files.every((f) => !f.includes('.git'))).toBe(true); @@ -3055,29 +3071,12 @@ describe('Directory Exclusion', () => { fs.mkdirSync(srcDir, { recursive: true }); fs.writeFileSync(path.join(srcDir, 'Button.tsx'), 'export function Button() {}'); - const config = { ...DEFAULT_CONFIG, rootDir: tempDir }; - const files = scanDirectory(tempDir, config); + const files = scanDirectory(tempDir); expect(files.length).toBe(1); expect(files[0]).toBe('src/components/Button.tsx'); expect(files[0]).not.toContain('\\'); }); - - it('should respect .codegraphignore marker', () => { - const srcDir = path.join(tempDir, 'src'); - const vendorDir = path.join(tempDir, 'vendor'); - fs.mkdirSync(srcDir, { recursive: true }); - fs.mkdirSync(vendorDir, { recursive: true }); - fs.writeFileSync(path.join(srcDir, 'index.ts'), 'export const x = 1;'); - fs.writeFileSync(path.join(vendorDir, 'lib.ts'), 'export const y = 2;'); - fs.writeFileSync(path.join(vendorDir, '.codegraphignore'), ''); - - const config = { ...DEFAULT_CONFIG, rootDir: tempDir }; - 
const files = scanDirectory(tempDir, config); - - expect(files).toContain('src/index.ts'); - expect(files.every((f) => !f.includes('vendor'))).toBe(true); - }); }); describe('Git Submodules', () => { @@ -3124,8 +3123,7 @@ describe('Git Submodules', () => { ); git(mainDir, 'commit', '-q', '-m', 'add submodule'); - const config = { ...DEFAULT_CONFIG, rootDir: mainDir }; - const files = scanDirectory(mainDir, config); + const files = scanDirectory(mainDir); expect(files).toContain('app.ts'); expect(files).toContain('libs/lib/lib.ts'); @@ -3173,8 +3171,7 @@ describe('Nested non-submodule git repos', () => { git(path.join(root, 'sub_repo2'), 'init', '-q'); fs.writeFileSync(path.join(sub2, 'two.ts'), 'export const two = 2;'); - const config = { ...DEFAULT_CONFIG, rootDir: root }; - const files = scanDirectory(root, config); + const files = scanDirectory(root); // Both committed and untracked source from the nested repos must be found. expect(files).toContain('sub_repo1/src/one.ts'); @@ -3197,8 +3194,7 @@ describe('Nested non-submodule git repos', () => { fs.writeFileSync(path.join(sub, 'real.ts'), 'export const real = 1;'); fs.writeFileSync(path.join(sub, 'generated.ts'), 'export const generated = 1;'); - const config = { ...DEFAULT_CONFIG, rootDir: root }; - const files = scanDirectory(root, config); + const files = scanDirectory(root); expect(files).toContain('sub_repo/src/real.ts'); expect(files).not.toContain('sub_repo/src/generated.ts'); diff --git a/__tests__/foundation.test.ts b/__tests__/foundation.test.ts index 4e8f204a..78ebfce4 100644 --- a/__tests__/foundation.test.ts +++ b/__tests__/foundation.test.ts @@ -9,8 +9,7 @@ import * as fs from 'fs'; import * as path from 'path'; import * as os from 'os'; import { CodeGraph } from '../src'; -import { DEFAULT_CONFIG, Node, Edge } from '../src/types'; -import { loadConfig, saveConfig } from '../src/config'; +import { Node, Edge } from '../src/types'; import { isInitialized, getCodeGraphDir, validateDirectory } from 
'../src/directory'; import { DatabaseConnection, getDatabasePath } from '../src/db'; @@ -60,41 +59,12 @@ describe('CodeGraph Foundation', () => { cg.close(); }); - it('should create config.json with defaults', () => { - const cg = CodeGraph.initSync(tempDir); - - const configPath = path.join(getCodeGraphDir(tempDir), 'config.json'); - expect(fs.existsSync(configPath)).toBe(true); - - const config = cg.getConfig(); - expect(config.version).toBe(DEFAULT_CONFIG.version); - expect(config.include).toEqual(DEFAULT_CONFIG.include); - expect(config.exclude).toEqual(DEFAULT_CONFIG.exclude); - - cg.close(); - }); - it('should throw if already initialized', () => { const cg = CodeGraph.initSync(tempDir); cg.close(); expect(() => CodeGraph.initSync(tempDir)).toThrow(/already initialized/i); }); - - it('should accept custom config options', () => { - const cg = CodeGraph.initSync(tempDir, { - config: { - maxFileSize: 500000, - extractDocstrings: false, - }, - }); - - const config = cg.getConfig(); - expect(config.maxFileSize).toBe(500000); - expect(config.extractDocstrings).toBe(false); - - cg.close(); - }); }); describe('Opening Projects', () => { @@ -112,17 +82,6 @@ describe('CodeGraph Foundation', () => { it('should throw if not initialized', () => { expect(() => CodeGraph.openSync(tempDir)).toThrow(/not initialized/i); }); - - it('should preserve configuration across open/close', () => { - const cg1 = CodeGraph.initSync(tempDir, { - config: { maxFileSize: 123456 }, - }); - cg1.close(); - - const cg2 = CodeGraph.openSync(tempDir); - expect(cg2.getConfig().maxFileSize).toBe(123456); - cg2.close(); - }); }); describe('Static Methods', () => { @@ -182,31 +141,6 @@ describe('CodeGraph Foundation', () => { }); }); - describe('Configuration', () => { - it('should load and merge config with defaults', () => { - const cg = CodeGraph.initSync(tempDir); - cg.close(); - - const config = loadConfig(tempDir); - expect(config.version).toBe(DEFAULT_CONFIG.version); - 
expect(config.rootDir).toBe(path.resolve(tempDir)); - }); - - it('should update configuration', () => { - const cg = CodeGraph.initSync(tempDir); - - cg.updateConfig({ maxFileSize: 999999 }); - - expect(cg.getConfig().maxFileSize).toBe(999999); - - cg.close(); - - // Verify persistence - const config = loadConfig(tempDir); - expect(config.maxFileSize).toBe(999999); - }); - }); - describe('Directory Management', () => { it('should validate directory structure', () => { const cg = CodeGraph.initSync(tempDir); diff --git a/__tests__/security.test.ts b/__tests__/security.test.ts index b923a342..782b99da 100644 --- a/__tests__/security.test.ts +++ b/__tests__/security.test.ts @@ -15,9 +15,7 @@ import * as os from 'os'; import { FileLock } from '../src/utils'; import CodeGraph from '../src/index'; import { ToolHandler, tools } from '../src/mcp/tools'; -import { shouldIncludeFile, scanDirectory } from '../src/extraction'; -import { shouldIncludeFile as configShouldInclude } from '../src/config'; -import { CodeGraphConfig, DEFAULT_CONFIG } from '../src/types'; +import { scanDirectory, isSourceFile } from '../src/extraction'; import { DatabaseConnection, getDatabasePath } from '../src/db'; import { QueryBuilder } from '../src/db/queries'; @@ -298,58 +296,24 @@ describe('Atomic Writes', () => { }); }); -describe('Glob Matching (picomatch)', () => { - const makeConfig = (include: string[], exclude: string[]): CodeGraphConfig => ({ - ...DEFAULT_CONFIG, - rootDir: '/test', - include, - exclude, +describe('Source file detection (isSourceFile)', () => { + it('selects files by supported extension', () => { + expect(isSourceFile('src/index.ts')).toBe(true); + expect(isSourceFile('src/deep/nested/file.ts')).toBe(true); + expect(isSourceFile('src/component.tsx')).toBe(true); + expect(isSourceFile('lib/util.js')).toBe(true); + expect(isSourceFile('src/main.py')).toBe(true); }); - it('should match standard glob patterns in extraction', () => { - const config = makeConfig(['**/*.ts'], 
['node_modules/**']); - - expect(shouldIncludeFile('src/index.ts', config)).toBe(true); - expect(shouldIncludeFile('src/deep/nested/file.ts', config)).toBe(true); - expect(shouldIncludeFile('src/index.js', config)).toBe(false); - expect(shouldIncludeFile('node_modules/lib/index.ts', config)).toBe(false); - }); - - it('should match standard glob patterns in config', () => { - const config = makeConfig(['**/*.py'], ['__pycache__/**']); - - expect(configShouldInclude('src/main.py', config)).toBe(true); - expect(configShouldInclude('src/main.ts', config)).toBe(false); - expect(configShouldInclude('__pycache__/module.py', config)).toBe(false); - }); - - it('should handle complex glob patterns correctly', () => { - const config = makeConfig(['src/**/*.{ts,tsx}', 'lib/**/*.js'], []); - - expect(shouldIncludeFile('src/component.ts', config)).toBe(true); - expect(shouldIncludeFile('src/component.tsx', config)).toBe(true); - expect(shouldIncludeFile('lib/util.js', config)).toBe(true); - expect(shouldIncludeFile('src/component.css', config)).toBe(false); - }); - - it('should handle patterns that previously caused ReDoS', () => { - // This pattern would cause catastrophic backtracking with hand-rolled regex - const evilPattern = '**/**/**/**/**/**/**/**/**/**/**/**/**/**/a'; - const config = makeConfig([evilPattern], []); - - const start = Date.now(); - // This should return quickly, not hang - shouldIncludeFile('x/x/x/x/x/x/x/x/x/x/x/x/x/x/b', config); - const elapsed = Date.now() - start; - - // Should complete in under 100ms, not seconds - expect(elapsed).toBeLessThan(100); + it('rejects unsupported extensions and extensionless files', () => { + expect(isSourceFile('src/component.css')).toBe(false); + expect(isSourceFile('README.md')).toBe(false); + expect(isSourceFile('Makefile')).toBe(false); + expect(isSourceFile('.gitignore')).toBe(false); }); - it('should handle dot files correctly', () => { - const config = makeConfig(['**/*.ts'], []); - - 
expect(shouldIncludeFile('.hidden/index.ts', config)).toBe(true); + it('matches regardless of leading dot directories', () => { + expect(isSourceFile('.hidden/index.ts')).toBe(true); }); }); @@ -464,15 +428,9 @@ describe('Symlink Cycle Detection', () => { return; } - const config: CodeGraphConfig = { - ...DEFAULT_CONFIG, - rootDir: tempDir, - include: ['**/*.ts'], - exclude: [], - }; // This should complete without hanging - const files = scanDirectory(tempDir, config); + const files = scanDirectory(tempDir); // Should find the real file but not loop infinitely expect(files).toContain('src/index.ts'); @@ -496,14 +454,8 @@ describe('Symlink Cycle Detection', () => { return; } - const config: CodeGraphConfig = { - ...DEFAULT_CONFIG, - rootDir: tempDir, - include: ['**/*.ts'], - exclude: [], - }; - const files = scanDirectory(tempDir, config); + const files = scanDirectory(tempDir); // Should find files from both the real dir and via the symlink // But deduplicate since they resolve to the same real path @@ -521,15 +473,9 @@ describe('Symlink Cycle Detection', () => { return; } - const config: CodeGraphConfig = { - ...DEFAULT_CONFIG, - rootDir: tempDir, - include: ['**/*.ts'], - exclude: [], - }; // Should not throw - const files = scanDirectory(tempDir, config); + const files = scanDirectory(tempDir); expect(files).toContain('src/valid.ts'); }); }); diff --git a/__tests__/sync.test.ts b/__tests__/sync.test.ts index 374e7788..708a92a4 100644 --- a/__tests__/sync.test.ts +++ b/__tests__/sync.test.ts @@ -281,11 +281,11 @@ describe('Sync Module', () => { expect(nodes.length).toBe(0); }); - it('should skip files not matching config', async () => { - // Create a .js file which doesn't match **/*.ts + it('should skip files with unsupported extensions', async () => { + // A .txt file has no supported grammar, so sync must not index it. 
fs.writeFileSync( - path.join(testDir, 'src', 'ignored.js'), - `function ignored() {}` + path.join(testDir, 'src', 'notes.txt'), + `just some notes` ); const result = await cg.sync(); diff --git a/__tests__/watch-policy.test.ts b/__tests__/watch-policy.test.ts index ee50d8c9..5cb92ce7 100644 --- a/__tests__/watch-policy.test.ts +++ b/__tests__/watch-policy.test.ts @@ -12,7 +12,6 @@ import * as path from 'path'; import * as os from 'os'; import { watchDisabledReason } from '../src/sync/watch-policy'; import { FileWatcher } from '../src/sync/watcher'; -import type { CodeGraphConfig } from '../src/types'; describe('watchDisabledReason', () => { it('returns a reason when CODEGRAPH_NO_WATCH=1', () => { @@ -63,18 +62,6 @@ describe('watchDisabledReason', () => { describe('FileWatcher honors the watch policy', () => { let testDir: string; - const baseConfig: CodeGraphConfig = { - version: 1, - rootDir: '.', - include: ['**/*.ts'], - exclude: ['**/node_modules/**'], - languages: [], - frameworks: [], - maxFileSize: 1024 * 1024, - extractDocstrings: true, - trackCallSites: true, - }; - afterEach(() => { delete process.env.CODEGRAPH_NO_WATCH; if (testDir && fs.existsSync(testDir)) { @@ -87,7 +74,7 @@ describe('FileWatcher honors the watch policy', () => { process.env.CODEGRAPH_NO_WATCH = '1'; const syncFn = vi.fn().mockResolvedValue({ filesChanged: 0, durationMs: 0 }); - const watcher = new FileWatcher(testDir, baseConfig, syncFn); + const watcher = new FileWatcher(testDir, syncFn); expect(watcher.start()).toBe(false); expect(watcher.isActive()).toBe(false); diff --git a/__tests__/watcher.test.ts b/__tests__/watcher.test.ts index f3638e6d..fde5f593 100644 --- a/__tests__/watcher.test.ts +++ b/__tests__/watcher.test.ts @@ -9,7 +9,6 @@ import * as fs from 'fs'; import * as path from 'path'; import * as os from 'os'; import { FileWatcher } from '../src/sync/watcher'; -import type { CodeGraphConfig } from '../src/types'; import CodeGraph from '../src/index'; /** @@ -34,18 +33,6 
@@ function waitFor( describe('FileWatcher', () => { let testDir: string; - const baseConfig: CodeGraphConfig = { - version: 1, - rootDir: '.', - include: ['**/*.ts', '**/*.js'], - exclude: ['**/node_modules/**', '**/dist/**'], - languages: [], - frameworks: [], - maxFileSize: 1024 * 1024, - extractDocstrings: true, - trackCallSites: true, - }; - beforeEach(() => { testDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-watcher-')); // Create a source file so the directory isn't empty @@ -63,7 +50,7 @@ describe('FileWatcher', () => { describe('start/stop lifecycle', () => { it('should start and stop without errors', () => { const syncFn = vi.fn().mockResolvedValue({ filesChanged: 0, durationMs: 0 }); - const watcher = new FileWatcher(testDir, baseConfig, syncFn); + const watcher = new FileWatcher(testDir, syncFn); const started = watcher.start(); expect(started).toBe(true); @@ -75,7 +62,7 @@ describe('FileWatcher', () => { it('should be idempotent on double start', () => { const syncFn = vi.fn().mockResolvedValue({ filesChanged: 0, durationMs: 0 }); - const watcher = new FileWatcher(testDir, baseConfig, syncFn); + const watcher = new FileWatcher(testDir, syncFn); expect(watcher.start()).toBe(true); expect(watcher.start()).toBe(true); // Should not throw @@ -86,7 +73,7 @@ describe('FileWatcher', () => { it('should be idempotent on double stop', () => { const syncFn = vi.fn().mockResolvedValue({ filesChanged: 0, durationMs: 0 }); - const watcher = new FileWatcher(testDir, baseConfig, syncFn); + const watcher = new FileWatcher(testDir, syncFn); watcher.start(); watcher.stop(); @@ -98,7 +85,7 @@ describe('FileWatcher', () => { describe('debounced sync', () => { it('should trigger sync after file change', async () => { const syncFn = vi.fn().mockResolvedValue({ filesChanged: 1, durationMs: 10 }); - const watcher = new FileWatcher(testDir, baseConfig, syncFn, { debounceMs: 200 }); + const watcher = new FileWatcher(testDir, syncFn, { debounceMs: 200 }); 
watcher.start(); @@ -114,7 +101,7 @@ describe('FileWatcher', () => { it('should debounce rapid changes into a single sync', async () => { const syncFn = vi.fn().mockResolvedValue({ filesChanged: 1, durationMs: 10 }); - const watcher = new FileWatcher(testDir, baseConfig, syncFn, { debounceMs: 500 }); + const watcher = new FileWatcher(testDir, syncFn, { debounceMs: 500 }); watcher.start(); @@ -140,7 +127,7 @@ describe('FileWatcher', () => { describe('filtering', () => { it('should ignore files not matching include patterns', async () => { const syncFn = vi.fn().mockResolvedValue({ filesChanged: 0, durationMs: 0 }); - const watcher = new FileWatcher(testDir, baseConfig, syncFn, { debounceMs: 200 }); + const watcher = new FileWatcher(testDir, syncFn, { debounceMs: 200 }); watcher.start(); @@ -160,7 +147,7 @@ describe('FileWatcher', () => { it('should ignore .codegraph directory changes', async () => { const syncFn = vi.fn().mockResolvedValue({ filesChanged: 0, durationMs: 0 }); - const watcher = new FileWatcher(testDir, baseConfig, syncFn, { debounceMs: 200 }); + const watcher = new FileWatcher(testDir, syncFn, { debounceMs: 200 }); watcher.start(); @@ -185,7 +172,7 @@ describe('FileWatcher', () => { it('should call onSyncComplete after successful sync', async () => { const syncFn = vi.fn().mockResolvedValue({ filesChanged: 2, durationMs: 50 }); const onSyncComplete = vi.fn(); - const watcher = new FileWatcher(testDir, baseConfig, syncFn, { + const watcher = new FileWatcher(testDir, syncFn, { debounceMs: 200, onSyncComplete, }); @@ -203,7 +190,7 @@ describe('FileWatcher', () => { it('should call onSyncError when sync throws', async () => { const syncFn = vi.fn().mockRejectedValue(new Error('sync failed')); const onSyncError = vi.fn(); - const watcher = new FileWatcher(testDir, baseConfig, syncFn, { + const watcher = new FileWatcher(testDir, syncFn, { debounceMs: 200, onSyncError, }); diff --git a/package-lock.json b/package-lock.json index 05a37245..d96712a0 100644 
--- a/package-lock.json +++ b/package-lock.json @@ -13,6 +13,7 @@ "commander": "^14.0.2", "fast-string-width": "^3.0.2", "fast-wrap-ansi": "^0.2.0", + "ignore": "^7.0.5", "jsonc-parser": "^3.3.1", "picomatch": "^4.0.3", "sisteransi": "^1.0.5", @@ -1145,6 +1146,15 @@ "node": "^8.16.0 || ^10.6.0 || >=11.0.0" } }, + "node_modules/ignore": { + "version": "7.0.5", + "resolved": "https://registry.npmjs.org/ignore/-/ignore-7.0.5.tgz", + "integrity": "sha512-Hs59xBNfUIunMFgWAbGX5cq6893IbWg4KnrjbYwX3tx0ztorVgTDA6B2sxf8ejHJ4wz8BqGUMYlnzNBer5NvGg==", + "license": "MIT", + "engines": { + "node": ">= 4" + } + }, "node_modules/jsonc-parser": { "version": "3.3.1", "resolved": "https://registry.npmjs.org/jsonc-parser/-/jsonc-parser-3.3.1.tgz", diff --git a/package.json b/package.json index bdf1d6c1..fdd59185 100644 --- a/package.json +++ b/package.json @@ -36,6 +36,7 @@ "commander": "^14.0.2", "fast-string-width": "^3.0.2", "fast-wrap-ansi": "^0.2.0", + "ignore": "^7.0.5", "jsonc-parser": "^3.3.1", "picomatch": "^4.0.3", "sisteransi": "^1.0.5", diff --git a/src/config.ts b/src/config.ts deleted file mode 100644 index 9ab1032a..00000000 --- a/src/config.ts +++ /dev/null @@ -1,297 +0,0 @@ -/** - * Configuration Management - * - * Load, save, and validate CodeGraph configuration. - */ - -import * as fs from 'fs'; -import * as path from 'path'; -import picomatch from 'picomatch'; -import { CodeGraphConfig, DEFAULT_CONFIG, Language, NodeKind } from './types'; -import { normalizePath } from './utils'; - -/** - * Configuration filename - */ -export const CONFIG_FILENAME = 'config.json'; - -/** - * Get the config file path for a project - */ -export function getConfigPath(projectRoot: string): string { - return path.join(projectRoot, '.codegraph', CONFIG_FILENAME); -} - -/** - * Check if a regex pattern is safe from ReDoS attacks. - * - * Rejects patterns with nested quantifiers (e.g., (a+)+, (a*)*) which - * are the primary source of catastrophic backtracking. 
Also rejects - * excessively long patterns and validates compilability. - */ -function isSafeRegex(pattern: string): boolean { - // Reject excessively long patterns - if (pattern.length > 500) return false; - - // Reject nested quantifiers: (...)+ followed by +, *, or { - // These are the primary cause of catastrophic backtracking - if (/([+*}])\s*[+*{]/.test(pattern)) return false; - if (/\([^)]*[+*][^)]*\)[+*{]/.test(pattern)) return false; - - // Verify the pattern is a valid regex - try { - new RegExp(pattern); - return true; - } catch { - return false; - } -} - -/** - * Validate a configuration object - */ -export function validateConfig(config: unknown): config is CodeGraphConfig { - if (typeof config !== 'object' || config === null) { - return false; - } - - const c = config as Record; - - // Required fields - if (typeof c.version !== 'number') return false; - if (typeof c.rootDir !== 'string') return false; - if (!Array.isArray(c.include)) return false; - if (!Array.isArray(c.exclude)) return false; - if (!Array.isArray(c.languages)) return false; - if (!Array.isArray(c.frameworks)) return false; - if (typeof c.maxFileSize !== 'number') return false; - if (typeof c.extractDocstrings !== 'boolean') return false; - if (typeof c.trackCallSites !== 'boolean') return false; - - // Validate include/exclude are string arrays - if (!c.include.every((p) => typeof p === 'string')) return false; - if (!c.exclude.every((p) => typeof p === 'string')) return false; - - // Validate languages - const validLanguages: Language[] = [ - 'typescript', - 'javascript', - 'python', - 'go', - 'rust', - 'java', - 'svelte', - 'unknown', - ]; - if (!c.languages.every((l) => validLanguages.includes(l as Language))) return false; - - // Validate frameworks - for (const fw of c.frameworks) { - if (typeof fw !== 'object' || fw === null) return false; - const framework = fw as Record; - if (typeof framework.name !== 'string') return false; - } - - // Validate custom patterns if present - 
if (c.customPatterns !== undefined) { - if (!Array.isArray(c.customPatterns)) return false; - for (const pattern of c.customPatterns) { - if (typeof pattern !== 'object' || pattern === null) return false; - const p = pattern as Record; - if (typeof p.name !== 'string') return false; - if (typeof p.pattern !== 'string') return false; - if (typeof p.kind !== 'string') return false; - - // Validate regex is compilable and reject patterns with known ReDoS risks - if (!isSafeRegex(p.pattern)) return false; - } - } - - return true; -} - -/** - * Merge configuration with defaults - */ -function mergeConfig( - defaults: CodeGraphConfig, - overrides: Partial -): CodeGraphConfig { - return { - version: overrides.version ?? defaults.version, - rootDir: overrides.rootDir ?? defaults.rootDir, - include: overrides.include ?? defaults.include, - exclude: overrides.exclude ?? defaults.exclude, - languages: overrides.languages ?? defaults.languages, - frameworks: overrides.frameworks ?? defaults.frameworks, - maxFileSize: overrides.maxFileSize ?? defaults.maxFileSize, - extractDocstrings: overrides.extractDocstrings ?? defaults.extractDocstrings, - trackCallSites: overrides.trackCallSites ?? defaults.trackCallSites, - customPatterns: overrides.customPatterns ?? 
defaults.customPatterns, - }; -} - -/** - * Load configuration from a project - */ -export function loadConfig(projectRoot: string): CodeGraphConfig { - const configPath = getConfigPath(projectRoot); - - if (!fs.existsSync(configPath)) { - // Return default config with adjusted rootDir - return { - ...DEFAULT_CONFIG, - rootDir: projectRoot, - }; - } - - try { - const content = fs.readFileSync(configPath, 'utf-8'); - const parsed = JSON.parse(content) as unknown; - - // Merge with defaults to ensure all fields are present - const merged = mergeConfig(DEFAULT_CONFIG, parsed as Partial); - merged.rootDir = projectRoot; // Always use actual project root - - if (!validateConfig(merged)) { - throw new Error('Invalid configuration format'); - } - - return merged; - } catch (error) { - if (error instanceof SyntaxError) { - throw new Error(`Invalid JSON in config file: ${configPath}`); - } - throw error; - } -} - -/** - * Save configuration to a project - */ -export function saveConfig(projectRoot: string, config: CodeGraphConfig): void { - const configPath = getConfigPath(projectRoot); - const dir = path.dirname(configPath); - - // Ensure directory exists - if (!fs.existsSync(dir)) { - fs.mkdirSync(dir, { recursive: true }); - } - - // Create a copy without rootDir (it's always derived from project path) - const toSave = { ...config }; - delete (toSave as Partial).rootDir; - - const content = JSON.stringify(toSave, null, 2); - - // Atomic write: write to temp file then rename to prevent partial/corrupt configs - const tmpPath = configPath + '.tmp'; - fs.writeFileSync(tmpPath, content, 'utf-8'); - fs.renameSync(tmpPath, configPath); -} - -/** - * Create default configuration for a new project - */ -export function createDefaultConfig(projectRoot: string): CodeGraphConfig { - return { - ...DEFAULT_CONFIG, - rootDir: projectRoot, - }; -} - -/** - * Update specific configuration values - */ -export function updateConfig( - projectRoot: string, - updates: Partial -): 
CodeGraphConfig { - const current = loadConfig(projectRoot); - const updated = mergeConfig(current, updates); - updated.rootDir = projectRoot; - saveConfig(projectRoot, updated); - return updated; -} - -/** - * Add patterns to include list - */ -export function addIncludePatterns(projectRoot: string, patterns: string[]): CodeGraphConfig { - const config = loadConfig(projectRoot); - const newPatterns = patterns.filter((p) => !config.include.includes(p)); - config.include = [...config.include, ...newPatterns]; - saveConfig(projectRoot, config); - return config; -} - -/** - * Add patterns to exclude list - */ -export function addExcludePatterns(projectRoot: string, patterns: string[]): CodeGraphConfig { - const config = loadConfig(projectRoot); - const newPatterns = patterns.filter((p) => !config.exclude.includes(p)); - config.exclude = [...config.exclude, ...newPatterns]; - saveConfig(projectRoot, config); - return config; -} - -/** - * Add a custom pattern - */ -export function addCustomPattern( - projectRoot: string, - name: string, - pattern: string, - kind: NodeKind -): CodeGraphConfig { - const config = loadConfig(projectRoot); - - if (!config.customPatterns) { - config.customPatterns = []; - } - - // Check for duplicate name - const existing = config.customPatterns.find((p) => p.name === name); - if (existing) { - existing.pattern = pattern; - existing.kind = kind; - } else { - config.customPatterns.push({ name, pattern, kind }); - } - - saveConfig(projectRoot, config); - return config; -} - -/** - * Check if a file path matches the include/exclude patterns - */ -export function shouldIncludeFile(filePath: string, config: CodeGraphConfig): boolean { - // Normalize to forward slashes so Windows backslash paths match glob patterns - filePath = normalizePath(filePath); - - // Simple glob matching (for now, just check if any pattern matches) - // A full implementation would use a proper glob library - - const matchesPattern = (pattern: string, filePath: string): 
boolean => { - return picomatch.isMatch(filePath, pattern, { dot: true }); - }; - - // Check exclude patterns first - for (const pattern of config.exclude) { - if (matchesPattern(pattern, filePath)) { - return false; - } - } - - // Check include patterns - for (const pattern of config.include) { - if (matchesPattern(pattern, filePath)) { - return true; - } - } - - // Default to not including if no pattern matches - return false; -} diff --git a/src/extraction/grammars.ts b/src/extraction/grammars.ts index a67d36bb..c78c52ce 100644 --- a/src/extraction/grammars.ts +++ b/src/extraction/grammars.ts @@ -94,6 +94,17 @@ export const EXTENSION_MAP: Record = { '.luau': 'luau', }; +/** + * Whether a file is one CodeGraph can parse, based purely on its extension. + * This is the single source of truth for "should we index this file" — derived + * from EXTENSION_MAP so parser support and indexing selection never drift. + */ +export function isSourceFile(filePath: string): boolean { + const dot = filePath.lastIndexOf('.'); + if (dot < 0) return false; + return filePath.slice(dot).toLowerCase() in EXTENSION_MAP; +} + /** * Caches for loaded grammars and parsers */ diff --git a/src/extraction/index.ts b/src/extraction/index.ts index 18086bdf..d502a24f 100644 --- a/src/extraction/index.ts +++ b/src/extraction/index.ts @@ -14,14 +14,13 @@ import { FileRecord, ExtractionResult, ExtractionError, - CodeGraphConfig, } from '../types'; import { QueryBuilder } from '../db/queries'; import { extractFromSource } from './tree-sitter'; -import { detectLanguage, isLanguageSupported, initGrammars, loadGrammarsForLanguages } from './grammars'; +import { detectLanguage, isSourceFile, isLanguageSupported, initGrammars, loadGrammarsForLanguages } from './grammars'; import { logDebug, logWarn } from '../errors'; import { validatePathWithinRoot, normalizePath } from '../utils'; -import picomatch from 'picomatch'; +import ignore, { Ignore } from 'ignore'; import { detectFrameworks } from 
'../resolution/frameworks'; import type { ResolutionContext } from '../resolution/types'; @@ -94,36 +93,11 @@ export function hashContent(content: string): string { } /** - * Check if a path matches any glob pattern (simplified) + * Skip files larger than this (bytes). Generated bundles, minified JS, and + * vendored blobs blow the WASM heap and the worker-recycle budget for no useful + * symbols. 1 MB covers essentially all hand-written source. */ -function matchesGlob(filePath: string, pattern: string): boolean { - filePath = normalizePath(filePath); - return picomatch.isMatch(filePath, pattern, { dot: true }); -} - -/** - * Check if a file should be included based on config - */ -export function shouldIncludeFile( - filePath: string, - config: CodeGraphConfig -): boolean { - // Check exclude patterns first - for (const pattern of config.exclude) { - if (matchesGlob(filePath, pattern)) { - return false; - } - } - - // Check include patterns - for (const pattern of config.include) { - if (matchesGlob(filePath, pattern)) { - return true; - } - } - - return false; -} +const MAX_FILE_SIZE = 1024 * 1024; /** * Collect git-visible files (tracked + untracked, .gitignore-respected) from the @@ -230,7 +204,7 @@ interface GitChanges { * Use `git status` to detect changed files instead of scanning every file. * Returns null on failure so callers fall back to full scan. */ -function getGitChangedFiles(rootDir: string, config: CodeGraphConfig): GitChanges | null { +function getGitChangedFiles(rootDir: string): GitChanges | null { try { const output = execFileSync( 'git', @@ -248,8 +222,8 @@ function getGitChangedFiles(rootDir: string, config: CodeGraphConfig): GitChange const statusCode = line.substring(0, 2); const filePath = normalizePath(line.substring(3)); - // Skip files that don't match include/exclude config - if (!shouldIncludeFile(filePath, config)) continue; + // Skip non-source files (git status already omits .gitignored paths). 
+ if (!isSourceFile(filePath)) continue; if (statusCode === '??') { added.push(filePath); @@ -268,20 +242,14 @@ function getGitChangedFiles(rootDir: string, config: CodeGraphConfig): GitChange } /** - * Marker file name that indicates a directory (and all children) should be skipped - */ -const CODEGRAPH_IGNORE_MARKER = '.codegraphignore'; - -/** - * Recursively scan directory for source files. + * Recursively scan a directory for source files. * - * In git repos, uses `git ls-files` to get the file list (inherently - * respects .gitignore at all levels), then filters by config include patterns. - * Falls back to filesystem walk for non-git projects. + * In git repos, uses `git ls-files` (inherently respects .gitignore at all + * levels), then keeps files with a supported source extension. For non-git + * projects, falls back to a filesystem walk that parses .gitignore itself. */ export function scanDirectory( rootDir: string, - config: CodeGraphConfig, onProgress?: (current: number, file: string) => void ): string[] { // Fast path: use git to get all visible files (respects .gitignore everywhere) @@ -290,7 +258,7 @@ export function scanDirectory( const files: string[] = []; let count = 0; for (const filePath of gitFiles) { - if (shouldIncludeFile(filePath, config)) { + if (isSourceFile(filePath)) { files.push(filePath); count++; onProgress?.(count, filePath); @@ -300,7 +268,7 @@ export function scanDirectory( } // Fallback: walk filesystem for non-git projects - return scanDirectoryWalk(rootDir, config, onProgress); + return scanDirectoryWalk(rootDir, onProgress); } /** @@ -309,7 +277,6 @@ export function scanDirectory( */ export async function scanDirectoryAsync( rootDir: string, - config: CodeGraphConfig, onProgress?: (current: number, file: string) => void ): Promise { const gitFiles = getGitVisibleFiles(rootDir); @@ -317,7 +284,7 @@ export async function scanDirectoryAsync( const files: string[] = []; let count = 0; for (const filePath of gitFiles) { - if 
(shouldIncludeFile(filePath, config)) { + if (isSourceFile(filePath)) { files.push(filePath); count++; onProgress?.(count, filePath); @@ -330,7 +297,7 @@ export async function scanDirectoryAsync( return files; } - return scanDirectoryWalk(rootDir, config, onProgress); + return scanDirectoryWalk(rootDir, onProgress); } /** @@ -338,14 +305,44 @@ export async function scanDirectoryAsync( */ function scanDirectoryWalk( rootDir: string, - config: CodeGraphConfig, onProgress?: (current: number, file: string) => void ): string[] { const files: string[] = []; let count = 0; const visitedDirs = new Set(); - function walk(dir: string): void { + // A .gitignore matcher scoped to the directory that declared it. Patterns in + // a nested .gitignore are relative to that directory, so we keep the dir + // alongside the matcher and test paths relative to it — mirroring how git + // applies .gitignore files at every level. + interface ScopedIgnore { + dir: string; + ig: Ignore; + } + + const loadIgnore = (dir: string): ScopedIgnore | null => { + try { + const giPath = path.join(dir, '.gitignore'); + if (fs.existsSync(giPath)) { + return { dir, ig: ignore().add(fs.readFileSync(giPath, 'utf-8')) }; + } + } catch { + // Unreadable .gitignore — treat as absent. + } + return null; + }; + + const isIgnored = (fullPath: string, isDir: boolean, matchers: ScopedIgnore[]): boolean => { + for (const { dir, ig } of matchers) { + let rel = normalizePath(path.relative(dir, fullPath)); + if (!rel || rel.startsWith('..')) continue; // not under this matcher's dir + if (isDir) rel += '/'; // dir-only rules (e.g. 
`build/`) only match with the slash + if (ig.ignores(rel)) return true; + } + return false; + }; + + function walk(dir: string, matchers: ScopedIgnore[]): void { let realDir: string; try { realDir = fs.realpathSync(dir); @@ -360,12 +357,9 @@ function scanDirectoryWalk( } visitedDirs.add(realDir); - // Check for .codegraphignore marker file - const ignoreMarker = path.join(dir, CODEGRAPH_IGNORE_MARKER); - if (fs.existsSync(ignoreMarker)) { - logDebug('Skipping directory due to .codegraphignore marker', { dir }); - return; - } + // This directory's own .gitignore (if present) applies to everything below it. + const own = loadIgnore(dir); + const active = own ? [...matchers, own] : matchers; let entries: fs.Dirent[]; try { @@ -376,6 +370,9 @@ function scanDirectoryWalk( } for (const entry of entries) { + // Never descend into git internals or our own data directory. + if (entry.name === '.git' || entry.name === '.codegraph') continue; + const fullPath = path.join(dir, entry.name); const relativePath = normalizePath(path.relative(rootDir, fullPath)); @@ -384,19 +381,11 @@ function scanDirectoryWalk( const realTarget = fs.realpathSync(fullPath); const stat = fs.statSync(realTarget); if (stat.isDirectory()) { - const dirPattern = relativePath + '/'; - let excluded = false; - for (const pattern of config.exclude) { - if (matchesGlob(dirPattern, pattern) || matchesGlob(relativePath, pattern)) { - excluded = true; - break; - } - } - if (!excluded) { - walk(fullPath); + if (!isIgnored(fullPath, true, active)) { + walk(fullPath, active); } } else if (stat.isFile()) { - if (shouldIncludeFile(relativePath, config)) { + if (!isIgnored(fullPath, false, active) && isSourceFile(relativePath)) { files.push(relativePath); count++; onProgress?.(count, relativePath); @@ -409,19 +398,11 @@ function scanDirectoryWalk( } if (entry.isDirectory()) { - const dirPattern = relativePath + '/'; - let excluded = false; - for (const pattern of config.exclude) { - if (matchesGlob(dirPattern, 
pattern) || matchesGlob(relativePath, pattern)) { - excluded = true; - break; - } - } - if (!excluded) { - walk(fullPath); + if (!isIgnored(fullPath, true, active)) { + walk(fullPath, active); } } else if (entry.isFile()) { - if (shouldIncludeFile(relativePath, config)) { + if (!isIgnored(fullPath, false, active) && isSourceFile(relativePath)) { files.push(relativePath); count++; onProgress?.(count, relativePath); @@ -430,7 +411,7 @@ function scanDirectoryWalk( } } - walk(rootDir); + walk(rootDir, []); return files; } @@ -439,7 +420,6 @@ function scanDirectoryWalk( */ export class ExtractionOrchestrator { private rootDir: string; - private config: CodeGraphConfig; private queries: QueryBuilder; /** * Names of frameworks detected for this project, populated by indexAll(). @@ -449,9 +429,8 @@ export class ExtractionOrchestrator { */ private detectedFrameworkNames: string[] | null = null; - constructor(rootDir: string, config: CodeGraphConfig, queries: QueryBuilder) { + constructor(rootDir: string, queries: QueryBuilder) { this.rootDir = rootDir; - this.config = config; this.queries = queries; } @@ -500,7 +479,7 @@ export class ExtractionOrchestrator { */ private ensureDetectedFrameworks(files?: string[]): string[] { if (this.detectedFrameworkNames !== null) return this.detectedFrameworkNames; - const fileList = files ?? scanDirectory(this.rootDir, this.config); + const fileList = files ?? scanDirectory(this.rootDir); const context = this.buildDetectionContext(fileList); this.detectedFrameworkNames = detectFrameworks(context).map((r) => r.name); return this.detectedFrameworkNames; @@ -534,7 +513,7 @@ export class ExtractionOrchestrator { total: 0, }); - const files = await scanDirectoryAsync(this.rootDir, this.config, (current, file) => { + const files = await scanDirectoryAsync(this.rootDir, (current, file) => { onProgress?.({ phase: 'scanning', current, @@ -802,18 +781,16 @@ export class ExtractionOrchestrator { continue; } - // Honour config.maxFileSize. 
Without this check, vendored - // generated headers, minified bundles, and other multi-MB - // files get indexed despite the user setting a size cap — - // wasting WASM heap and the worker recycle budget on inputs - // the user explicitly opted out of. The single-file extractFile - // path already enforces this; the bulk path used to silently - // skip the check. - if (stats.size > this.config.maxFileSize) { + // Honour MAX_FILE_SIZE. Without this check, vendored generated + // headers, minified bundles, and other multi-MB files get indexed, + // wasting WASM heap and the worker recycle budget on inputs with no + // useful symbols. The single-file extractFile path already enforces + // this; the bulk path used to silently skip the check. + if (stats.size > MAX_FILE_SIZE) { processed++; filesSkipped++; errors.push({ - message: `File exceeds max size (${stats.size} > ${this.config.maxFileSize})`, + message: `File exceeds max size (${stats.size} > ${MAX_FILE_SIZE})`, filePath, severity: 'warning', code: 'size_exceeded', @@ -1108,14 +1085,14 @@ export class ExtractionOrchestrator { } // Check file size - if (stats.size > this.config.maxFileSize) { + if (stats.size > MAX_FILE_SIZE) { return { nodes: [], edges: [], unresolvedReferences: [], errors: [ { - message: `File exceeds max size (${stats.size} > ${this.config.maxFileSize})`, + message: `File exceeds max size (${stats.size} > ${MAX_FILE_SIZE})`, filePath: relativePath, severity: 'warning', code: 'size_exceeded', @@ -1245,7 +1222,7 @@ export class ExtractionOrchestrator { }); const filesToIndex: string[] = []; - const gitChanges = getGitChangedFiles(this.rootDir, this.config); + const gitChanges = getGitChangedFiles(this.rootDir); if (gitChanges) { // === Git fast path === @@ -1291,7 +1268,7 @@ export class ExtractionOrchestrator { } } else { // === Fallback: full scan (non-git project or git failure) === - const currentFiles = new Set(scanDirectory(this.rootDir, this.config)); + const currentFiles = new 
Set(scanDirectory(this.rootDir)); filesChecked = currentFiles.size; // Build Map for O(1) lookups instead of .find() per file @@ -1376,7 +1353,7 @@ export class ExtractionOrchestrator { * Uses git status as a fast path when available, falling back to full scan. */ getChangedFiles(): { added: string[]; modified: string[]; removed: string[] } { - const gitChanges = getGitChangedFiles(this.rootDir, this.config); + const gitChanges = getGitChangedFiles(this.rootDir); if (gitChanges) { // === Git fast path === @@ -1420,7 +1397,7 @@ export class ExtractionOrchestrator { } // === Fallback: full scan (non-git project or git failure) === - const currentFiles = new Set(scanDirectory(this.rootDir, this.config)); + const currentFiles = new Set(scanDirectory(this.rootDir)); const trackedFiles = this.queries.getAllFiles(); // Build Map for O(1) lookups @@ -1467,4 +1444,4 @@ export class ExtractionOrchestrator { // Re-export useful types and functions export { extractFromSource } from './tree-sitter'; -export { detectLanguage, isLanguageSupported, isGrammarLoaded, getSupportedLanguages, initGrammars, loadGrammarsForLanguages, loadAllGrammars } from './grammars'; +export { detectLanguage, isSourceFile, isLanguageSupported, isGrammarLoaded, getSupportedLanguages, initGrammars, loadGrammarsForLanguages, loadAllGrammars } from './grammars'; diff --git a/src/index.ts b/src/index.ts index 99b55ad7..b2acf346 100644 --- a/src/index.ts +++ b/src/index.ts @@ -7,7 +7,6 @@ import * as path from 'path'; import { - CodeGraphConfig, Node, Edge, FileRecord, @@ -25,7 +24,6 @@ import { } from './types'; import { DatabaseConnection, getDatabasePath } from './db'; import { QueryBuilder } from './db/queries'; -import { loadConfig, saveConfig, createDefaultConfig } from './config'; import { isInitialized, createDirectory, @@ -53,7 +51,6 @@ import { FileWatcher, WatchOptions } from './sync'; // Re-export types for consumers export * from './types'; export { getDatabasePath } from './db'; -export { 
getConfigPath } from './config'; export { getCodeGraphDir, isInitialized, @@ -85,9 +82,6 @@ export { MCPServer } from './mcp'; * Options for initializing a new CodeGraph project */ export interface InitOptions { - /** Custom configuration overrides */ - config?: Partial; - /** Whether to run initial indexing after init */ index?: boolean; @@ -128,7 +122,6 @@ export interface IndexOptions { export class CodeGraph { private db: DatabaseConnection; private queries: QueryBuilder; - private config: CodeGraphConfig; private projectRoot: string; private orchestrator: ExtractionOrchestrator; private resolver: ReferenceResolver; @@ -148,17 +141,15 @@ export class CodeGraph { private constructor( db: DatabaseConnection, queries: QueryBuilder, - config: CodeGraphConfig, projectRoot: string ) { this.db = db; this.queries = queries; - this.config = config; this.projectRoot = projectRoot; this.fileLock = new FileLock( path.join(projectRoot, '.codegraph', 'codegraph.lock') ); - this.orchestrator = new ExtractionOrchestrator(projectRoot, config, queries); + this.orchestrator = new ExtractionOrchestrator(projectRoot, queries); this.resolver = createResolver(projectRoot, queries); this.graphManager = new GraphQueryManager(queries); this.traverser = new GraphTraverser(queries); @@ -194,19 +185,12 @@ export class CodeGraph { // Create directory structure createDirectory(resolvedRoot); - // Create and save configuration - const config = createDefaultConfig(resolvedRoot); - if (options.config) { - Object.assign(config, options.config); - } - saveConfig(resolvedRoot, config); - // Initialize database const dbPath = getDatabasePath(resolvedRoot); const db = DatabaseConnection.initialize(dbPath); const queries = new QueryBuilder(db.getDb()); - const instance = new CodeGraph(db, queries, config, resolvedRoot); + const instance = new CodeGraph(db, queries, resolvedRoot); // Run initial indexing if requested if (options.index) { @@ -219,7 +203,7 @@ export class CodeGraph { /** * Initialize 
synchronously (without indexing) */ - static initSync(projectRoot: string, options: Omit = {}): CodeGraph { + static initSync(projectRoot: string): CodeGraph { const resolvedRoot = path.resolve(projectRoot); // Check if already initialized @@ -230,19 +214,12 @@ export class CodeGraph { // Create directory structure createDirectory(resolvedRoot); - // Create and save configuration - const config = createDefaultConfig(resolvedRoot); - if (options.config) { - Object.assign(config, options.config); - } - saveConfig(resolvedRoot, config); - // Initialize database const dbPath = getDatabasePath(resolvedRoot); const db = DatabaseConnection.initialize(dbPath); const queries = new QueryBuilder(db.getDb()); - return new CodeGraph(db, queries, config, resolvedRoot); + return new CodeGraph(db, queries, resolvedRoot); } /** @@ -267,15 +244,12 @@ export class CodeGraph { throw new Error(`Invalid CodeGraph directory: ${validation.errors.join(', ')}`); } - // Load configuration - const config = loadConfig(resolvedRoot); - // Open database const dbPath = getDatabasePath(resolvedRoot); const db = DatabaseConnection.open(dbPath); const queries = new QueryBuilder(db.getDb()); - const instance = new CodeGraph(db, queries, config, resolvedRoot); + const instance = new CodeGraph(db, queries, resolvedRoot); // Sync if requested if (options.sync) { @@ -302,15 +276,12 @@ export class CodeGraph { throw new Error(`Invalid CodeGraph directory: ${validation.errors.join(', ')}`); } - // Load configuration - const config = loadConfig(resolvedRoot); - // Open database const dbPath = getDatabasePath(resolvedRoot); const db = DatabaseConnection.open(dbPath); const queries = new QueryBuilder(db.getDb()); - return new CodeGraph(db, queries, config, resolvedRoot); + return new CodeGraph(db, queries, resolvedRoot); } /** @@ -330,32 +301,6 @@ export class CodeGraph { this.db.close(); } - // =========================================================================== - // Configuration - // 
=========================================================================== - - /** - * Get the current configuration - */ - getConfig(): CodeGraphConfig { - return { ...this.config }; - } - - /** - * Update configuration - */ - updateConfig(updates: Partial): void { - Object.assign(this.config, updates); - saveConfig(this.projectRoot, this.config); - // Recreate orchestrator and resolver with new config - this.orchestrator = new ExtractionOrchestrator( - this.projectRoot, - this.config, - this.queries - ); - this.resolver = createResolver(this.projectRoot, this.queries); - } - /** * Get the project root directory */ @@ -515,7 +460,6 @@ export class CodeGraph { this.watcher = new FileWatcher( this.projectRoot, - this.config, async () => { const result = await this.sync(); const filesChanged = result.filesAdded + result.filesModified + result.filesRemoved; diff --git a/src/sync/watcher.ts b/src/sync/watcher.ts index 2c16d82a..68e60fff 100644 --- a/src/sync/watcher.ts +++ b/src/sync/watcher.ts @@ -9,8 +9,7 @@ */ import * as fs from 'fs'; -import { CodeGraphConfig } from '../types'; -import { shouldIncludeFile } from '../extraction'; +import { isSourceFile } from '../extraction'; import { logDebug, logWarn } from '../errors'; import { normalizePath } from '../utils'; import { watchDisabledReason } from './watch-policy'; @@ -44,7 +43,7 @@ export interface WatchOptions { * Design goals: * - Minimal resource usage (native OS file events, no polling) * - Debounced to avoid thrashing on rapid saves - * - Filters against CodeGraph include/exclude patterns + * - Filters to supported source files by extension * - Ignores .codegraph/ directory changes */ export class FileWatcher { @@ -55,7 +54,6 @@ export class FileWatcher { private stopped = false; private readonly projectRoot: string; - private readonly config: CodeGraphConfig; private readonly debounceMs: number; private readonly syncFn: () => Promise<{ filesChanged: number; durationMs: number }>; private readonly 
onSyncComplete?: WatchOptions['onSyncComplete']; @@ -63,12 +61,10 @@ export class FileWatcher { constructor( projectRoot: string, - config: CodeGraphConfig, syncFn: () => Promise<{ filesChanged: number; durationMs: number }>, options: WatchOptions = {} ) { this.projectRoot = projectRoot; - this.config = config; this.syncFn = syncFn; this.debounceMs = options.debounceMs ?? 2000; this.onSyncComplete = options.onSyncComplete; @@ -112,8 +108,8 @@ export class FileWatcher { return; } - // Filter against include/exclude patterns - if (!shouldIncludeFile(normalized, this.config)) { + // Only sync changes to files we can actually parse. + if (!isSourceFile(normalized)) { return; } diff --git a/src/types.ts b/src/types.ts index 54485ac0..0168665d 100644 --- a/src/types.ts +++ b/src/types.ts @@ -426,297 +426,6 @@ export interface CodeBlock { node?: Node; } -// ============================================================================= -// Configuration Types -// ============================================================================= - -/** - * Framework-specific hints for better extraction - */ -export interface FrameworkHint { - /** Framework name (react, express, django, etc.) 
*/ - name: string; - - /** Version constraint if relevant */ - version?: string; - - /** Custom patterns for this framework */ - patterns?: { - /** Component detection patterns */ - components?: string[]; - /** Route detection patterns */ - routes?: string[]; - /** Model detection patterns */ - models?: string[]; - }; -} - -/** - * Configuration for a CodeGraph project - */ -export interface CodeGraphConfig { - /** Schema version for migrations */ - version: number; - - /** Root directory of the project */ - rootDir: string; - - /** Glob patterns for files to include */ - include: string[]; - - /** Glob patterns for files to exclude */ - exclude: string[]; - - /** Languages to process (auto-detected if empty) */ - languages: Language[]; - - /** Framework hints for better extraction */ - frameworks: FrameworkHint[]; - - /** Maximum file size to process (in bytes) */ - maxFileSize: number; - - /** Whether to extract docstrings */ - extractDocstrings: boolean; - - /** Whether to track call sites */ - trackCallSites: boolean; - - /** Custom symbol patterns to extract */ - customPatterns?: { - /** Name for this pattern group */ - name: string; - /** Regex pattern to match */ - pattern: string; - /** Node kind to assign */ - kind: NodeKind; - }[]; -} - -/** - * Default configuration values - */ -export const DEFAULT_CONFIG: CodeGraphConfig = { - version: 1, - rootDir: '.', - include: [ - // TypeScript/JavaScript - '**/*.ts', - '**/*.tsx', - '**/*.js', - '**/*.jsx', - // Python - '**/*.py', - // Go - '**/*.go', - // Rust - '**/*.rs', - // Java - '**/*.java', - // C/C++ - '**/*.c', - '**/*.h', - '**/*.cpp', - '**/*.hpp', - '**/*.cc', - '**/*.cxx', - // C# - '**/*.cs', - // PHP - '**/*.php', - // Drupal-specific PHP extensions - '**/*.module', - '**/*.install', - '**/*.theme', - '**/*.inc', - // Drupal routing YAML - '**/*.routing.yml', - // Twig templates - '**/*.twig', - // Ruby - '**/*.rb', - // Swift - '**/*.swift', - // Kotlin - '**/*.kt', - '**/*.kts', - // Dart - 
'**/*.dart', - // Svelte - '**/*.svelte', - // Vue - '**/*.vue', - // Liquid (Shopify themes) - '**/*.liquid', - // Pascal / Delphi - '**/*.pas', - '**/*.dpr', - '**/*.dpk', - '**/*.lpr', - '**/*.dfm', - '**/*.fmx', - // Scala - '**/*.scala', - '**/*.sc', - // Lua - '**/*.lua', - // Luau - '**/*.luau', - ], - exclude: [ - // Version control - '**/.git/**', - - // Dependencies - '**/node_modules/**', - '**/vendor/**', - '**/Pods/**', - - // Generic build outputs - '**/dist/**', - '**/build/**', - '**/out/**', - '**/bin/**', - '**/obj/**', - '**/target/**', - - // JavaScript/TypeScript - '**/*.min.js', - '**/*.bundle.js', - '**/.next/**', - '**/.nuxt/**', - '**/.svelte-kit/**', - '**/.output/**', - '**/.turbo/**', - '**/.cache/**', - '**/.parcel-cache/**', - '**/.vite/**', - '**/.astro/**', - '**/.docusaurus/**', - '**/.gatsby/**', - '**/.webpack/**', - '**/.nx/**', - '**/.yarn/cache/**', - '**/.pnpm-store/**', - '**/storybook-static/**', - - // React Native / Expo - '**/.expo/**', - '**/web-build/**', - '**/ios/Pods/**', - '**/ios/build/**', - '**/android/build/**', - '**/android/.gradle/**', - - // Python - '**/__pycache__/**', - '**/.venv/**', - '**/venv/**', - '**/site-packages/**', - '**/dist-packages/**', - '**/.pytest_cache/**', - '**/.mypy_cache/**', - '**/.ruff_cache/**', - '**/.tox/**', - '**/.nox/**', - '**/*.egg-info/**', - '**/.eggs/**', - - // Go - '**/go/pkg/mod/**', - - // Rust - '**/target/debug/**', - '**/target/release/**', - - // Java/Kotlin/Gradle - '**/.gradle/**', - '**/.m2/**', - '**/generated-sources/**', - '**/.kotlin/**', - - // Dart/Flutter - '**/.dart_tool/**', - - // C#/.NET - '**/.vs/**', - '**/.nuget/**', - '**/artifacts/**', - '**/publish/**', - - // C/C++ - '**/cmake-build-*/**', - '**/CMakeFiles/**', - '**/bazel-*/**', - '**/vcpkg_installed/**', - '**/.conan/**', - '**/Debug/**', - '**/Release/**', - '**/x64/**', - '**/.pio/**', // Platform.io (IoT/embedded build artifacts and library deps) - - // Electron - '**/release/**', - 
'**/*.app/**', - '**/*.asar', - - // Swift/iOS/Xcode - '**/DerivedData/**', - '**/.build/**', - '**/.swiftpm/**', - '**/xcuserdata/**', - '**/Carthage/Build/**', - '**/SourcePackages/**', - - // Delphi/Pascal - '**/__history/**', - '**/__recovery/**', - '**/*.dcu', - - // PHP - '**/.composer/**', - '**/storage/framework/**', - '**/bootstrap/cache/**', - - // Drupal - core and contrib are rarely customised; index only custom code - '**/web/core/**', - '**/web/modules/contrib/**', - '**/web/themes/contrib/**', - - // Ruby - '**/.bundle/**', - '**/tmp/cache/**', - '**/public/assets/**', - '**/public/packs/**', - '**/.yardoc/**', - - // Testing/Coverage - '**/coverage/**', - '**/htmlcov/**', - '**/.nyc_output/**', - '**/test-results/**', - '**/.coverage/**', - - // IDE/Editor - '**/.idea/**', - - // Logs and temp - '**/logs/**', - '**/tmp/**', - '**/temp/**', - - // Documentation build output - '**/_build/**', - '**/docs/_build/**', - '**/site/**', - ], - languages: [], - frameworks: [], - maxFileSize: 1024 * 1024, // 1MB - extractDocstrings: true, - trackCallSites: true, -}; - // ============================================================================= // Database Types // ============================================================================= From c41559a9d022c47a8c316eae31fac493a2da866f Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 21 May 2026 21:22:50 -0500 Subject: [PATCH 27/47] fix(installer): Windows npm launcher EINVAL on modern Node (#289) (#292) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The npm thin-installer shim spawned the per-platform bundle's `.cmd` launcher directly. Modern Node on Windows refuses to spawn `.cmd`/`.bat` without `shell: true` (the CVE-2024-27980 hardening), so every `codegraph` command failed with `spawnSync …\codegraph.cmd EINVAL` (seen on Node 24). 
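A minimal TypeScript sketch of the launcher choice this fix describes. It assumes a hypothetical bundle layout: `node.exe` at the bundle root, the app entry at `app/dist/bin/codegraph.js`, and a Unix launcher at `bin/codegraph`. The real `scripts/npm-shim.js` paths may differ, and `buildLaunchCommand` and `runBundle` are illustrative helpers, not the shim's actual functions.

```typescript
import { spawnSync } from 'node:child_process';
import * as path from 'node:path';

// Hypothetical layout (assumption, not the real bundle's paths):
//   <bundleDir>/node.exe                    bundled Node (Windows)
//   <bundleDir>/app/dist/bin/codegraph.js   app entry point (Windows)
//   <bundleDir>/bin/codegraph               launcher script (Unix)
interface LaunchCommand {
  command: string;
  args: string[];
}

export function buildLaunchCommand(
  bundleDir: string,
  userArgs: string[],
  platform: string = process.platform
): LaunchCommand {
  if (platform === 'win32') {
    // Spawning a .cmd/.bat without `shell: true` fails with EINVAL on Node
    // releases hardened for CVE-2024-27980. Running node.exe against the entry
    // script directly avoids the .cmd and any shell argument quoting.
    return {
      command: path.win32.join(bundleDir, 'node.exe'),
      args: [path.win32.join(bundleDir, 'app', 'dist', 'bin', 'codegraph.js'), ...userArgs],
    };
  }
  // Unix: exec the bundle's launcher script directly (behavior unchanged).
  return {
    command: path.posix.join(bundleDir, 'bin', 'codegraph'),
    args: userArgs,
  };
}

export function runBundle(bundleDir: string, userArgs: string[]): number {
  const { command, args } = buildLaunchCommand(bundleDir, userArgs);
  // No `shell: true`: the command is always an executable, never a .cmd.
  const result = spawnSync(command, args, { stdio: 'inherit' });
  if (result.error) throw result.error;
  return result.status ?? 1;
}
```

Keeping command construction separate from spawning makes the Windows branch testable on any OS, since `path.win32` joins with backslashes regardless of the host platform.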
On Windows the shim now invokes the bundled `node.exe` against the app entry point directly, bypassing the `.cmd` (and avoiding the arg-quoting pitfalls of `shell: true`). Unix is unchanged. Validated end-to-end against a real win32-x64 bundle: `npm install` of the packed tarballs + `codegraph init -i`/`status` run on the bundled Node 24. Also cuts release 0.9.2, rolling up the pending Drupal, zero-config, config-removal, Hermes-installer, and symlink-security changes. Co-authored-by: Claude Opus 4.7 (1M context) --- BUNDLING.md | 4 +++- CHANGELOG.md | 25 ++++++++++++++++++++++++- package-lock.json | 4 ++-- package.json | 2 +- scripts/npm-shim.js | 20 ++++++++++++++++---- 5 files changed, 46 insertions(+), 9 deletions(-) diff --git a/BUNDLING.md b/BUNDLING.md index 8cba3309..dc21ab53 100644 --- a/BUNDLING.md +++ b/BUNDLING.md @@ -50,7 +50,9 @@ linux/amd64`). bundles ship as per-platform `optionalDependencies` (`@colbymchenry/codegraph-` with `os`/`cpu`), so npm installs only the matching one. The shim — run by the user's Node — execs the bundle, so the - real work runs on the bundled Node 24. Works even on old Node. + real work runs on the bundled Node 24. Works even on old Node. On Windows it + invokes the bundled `node.exe` against the app entry directly (not the `.cmd` + launcher) — modern Node throws `EINVAL` when asked to spawn a `.cmd`/`.bat`. 3. **Windows** ([`install.ps1`](install.ps1)) — `irm … | iex`; same flow as install.sh (detect arch, pull the `.zip` from Releases, add to PATH). 4. **Homebrew / Scoop** — TODO (tap + cask pointing at the Release archives). diff --git a/CHANGELOG.md b/CHANGELOG.md index 20a2b9bc..b544414b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,9 +7,13 @@ a [GitHub Release](https://github.com/colbymchenry/codegraph/releases) tagged This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 
-## [Unreleased] +## [0.9.2] - 2026-05-21 ### Added +- **Installer target: Hermes Agent (Nous Research).** `codegraph install` now + supports Hermes Agent — it writes the `mcp_servers.codegraph` entry and ensures + `platform_toolsets.cli` includes `mcp-codegraph` in `$HERMES_HOME/config.yaml`, + so Hermes can drive the CodeGraph knowledge graph like the other agents. - **Framework support: Drupal 8/9/10/11** — CodeGraph now detects Drupal projects (via a `drupal/*` dependency in `composer.json`) and adds three levels of intelligence: @@ -42,6 +46,15 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). those names; now `.gitignore` is the single source of truth. Resolves [#283](https://github.com/colbymchenry/codegraph/issues/283). +### Fixed +- **Windows: `npm i -g @colbymchenry/codegraph` then any `codegraph` command + failed with `spawnSync …\codegraph.cmd EINVAL`.** The npm launcher spawned the + bundle's `.cmd` file directly, which modern Node refuses to do on Windows + (the CVE-2024-27980 hardening — seen on Node 24). The launcher now invokes the + bundled `node.exe` against the app directly, so `codegraph` works on Windows + regardless of your Node version. Resolves + [#289](https://github.com/colbymchenry/codegraph/issues/289). + ### Removed - **`.codegraph/config.json` and the entire config surface.** Every field was either inert or now redundant with `.gitignore`: @@ -60,6 +73,15 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). exports are gone. Existing `.codegraph/config.json` files are simply ignored. The `.codegraphignore` marker is no longer supported — use `.gitignore`. +### Security +- **MCP session marker no longer follows symlinks** (CWE-59). 
Every + `codegraph_context` call writes a `codegraph-consulted-*` marker into the + system temp dir; the previous write followed symlinks, so on a multi-user + system another local user could pre-plant that path as a symlink and redirect + the write onto a victim-writable file. The marker is now opened with + `O_NOFOLLOW` and mode `0600`, and a planted symlink is refused rather than + followed. Resolves [#280](https://github.com/colbymchenry/codegraph/issues/280). + ## [0.9.1] - 2026-05-21 ### Fixed @@ -71,6 +93,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). find its bundle. The release pipeline now verifies every package reached the registry (and is idempotent), so a release can't pass green-but-broken again. +[0.9.2]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.2 [0.9.1]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.1 ## [0.9.0] - 2026-05-21 diff --git a/package-lock.json b/package-lock.json index d96712a0..49342496 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,12 +1,12 @@ { "name": "@colbymchenry/codegraph", - "version": "0.9.1", + "version": "0.9.2", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "@colbymchenry/codegraph", - "version": "0.9.1", + "version": "0.9.2", "license": "MIT", "dependencies": { "@clack/prompts": "^1.3.0", diff --git a/package.json b/package.json index fdd59185..4ea93215 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "@colbymchenry/codegraph", - "version": "0.9.1", + "version": "0.9.2", "description": "Supercharge Claude Code with semantic code intelligence. 
94% fewer tool calls • 77% faster exploration • 100% local.", "main": "dist/index.js", "types": "dist/index.d.ts", diff --git a/scripts/npm-shim.js b/scripts/npm-shim.js index e12f6fb7..bea905f3 100755 --- a/scripts/npm-shim.js +++ b/scripts/npm-shim.js @@ -19,11 +19,23 @@ var childProcess = require('child_process'); var target = process.platform + '-' + process.arch; // e.g. darwin-arm64, linux-x64 var pkg = '@colbymchenry/codegraph-' + target; -var launcher = process.platform === 'win32' ? 'bin/codegraph.cmd' : 'bin/codegraph'; +var isWindows = process.platform === 'win32'; -var binPath; +// On Windows the bundle's launcher is a .cmd batch file. Modern Node refuses to +// spawn .cmd/.bat directly — spawnSync throws EINVAL (the CVE-2024-27980 +// hardening, observed on Node 24). So on Windows we skip the .cmd and invoke the +// bundled node.exe against the app entry point directly. On unix the bin launcher +// is a shell script that spawns cleanly. +var command, args; try { - binPath = require.resolve(pkg + '/' + launcher); + if (isWindows) { + command = require.resolve(pkg + '/node.exe'); + var entry = require.resolve(pkg + '/lib/dist/bin/codegraph.js'); + args = [entry].concat(process.argv.slice(2)); + } else { + command = require.resolve(pkg + '/bin/codegraph'); + args = process.argv.slice(2); + } } catch (e) { process.stderr.write( 'codegraph: no prebuilt bundle for ' + target + '.\n' + @@ -35,7 +47,7 @@ try { process.exit(1); } -var res = childProcess.spawnSync(binPath, process.argv.slice(2), { stdio: 'inherit' }); +var res = childProcess.spawnSync(command, args, { stdio: 'inherit' }); if (res.error) { process.stderr.write('codegraph: ' + res.error.message + '\n'); process.exit(1); From 359e5820d21c646cde88482f2875d6a2d3d9a335 Mon Sep 17 00:00:00 2001 From: Colby McHenry Date: Thu, 21 May 2026 21:30:14 -0500 Subject: [PATCH 28/47] ci: bump checkout/setup-node to v6 (Node 24 runtime) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: 8bit GitHub deprecated Node.js 20 for actions: actions/checkout@v4 and actions/setup-node@v4 run on Node 20 and emit a deprecation warning. Node 24 becomes the forced default on 2026-06-02 and Node 20 is removed on 2026-09-16. Bump both to @v6 (Node 24). Config is unchanged — node-version: 22 and registry-url are both supported in v6. Co-Authored-By: Claude Opus 4.7 (1M context) --- .github/workflows/release.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index dcb20613..ff1a1577 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -20,8 +20,8 @@ jobs: release: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v4 - - uses: actions/setup-node@v4 + - uses: actions/checkout@v6 + - uses: actions/setup-node@v6 with: node-version: 22 registry-url: https://registry.npmjs.org From bf73f4d05cc7e0683444f14940a018ddbe8c8186 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Fri, 22 May 2026 04:51:42 -0500 Subject: [PATCH 29/47] feat(installer): add `codegraph uninstall` command (#313) (#318) Adds a cross-channel uninstall that removes CodeGraph from every agent it's configured on (Claude Code, Cursor, Codex CLI, opencode, Hermes). Prompts global-vs-local up front (no flags required) and reports which providers it actually hit; --location / --target / --yes supported for non-interactive use. Removes only what install wrote; leaves the .codegraph/ index to `uninit`. Also fixes Cursor uninstall leaving an orphaned .cursor/rules/codegraph.mdc (its description: CodeGraph frontmatter lingered); the dedicated rules file is now deleted outright while user content outside our markers is preserved. Validated end-to-end on macOS and Docker Linux (global + local sweeps clean). Adds 8 tests; full suite 730 passing. Bumps to 0.9.3 with CHANGELOG entry. Resolves #313. 
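The Cursor cleanup rule can be sketched as a pure decision function. This is an illustration, not the target's code: `planRulesCleanup` and `RulesCleanup` are hypothetical, and the marker strings and frontmatter below are placeholders for the real `CODEGRAPH_SECTION_START`, `CODEGRAPH_SECTION_END`, and `MDC_FRONTMATTER` values, which this patch does not show.

```typescript
// Hypothetical result type; the real target reports WriteResult file actions.
type RulesCleanup =
  | { kind: 'delete' }
  | { kind: 'rewrite'; content: string }
  | { kind: 'keep' };

function planRulesCleanup(
  content: string,
  start: string,
  end: string,
  frontmatter: string,
): RulesCleanup {
  const ours = frontmatter.trim();
  const s = content.indexOf(start);
  const e = content.indexOf(end);
  if (s !== -1 && e > s) {
    // Cut our marked block out and rejoin whatever surrounded it.
    const before = content.slice(0, s).replace(/\s+$/, '');
    const after = content.slice(e + end.length).replace(/^\s+/, '');
    const rest = (before + (before && after ? '\n\n' : '') + after).trim();
    // Nothing left but our own frontmatter: the whole file is ours.
    if (rest === '' || rest === ours) return { kind: 'delete' };
    return { kind: 'rewrite', content: rest + '\n' };
  }
  // No block, but a pristine frontmatter-only file is still ours.
  if (content.trim() === ours) return { kind: 'delete' };
  // Content we don't recognize: leave the file alone.
  return { kind: 'keep' };
}

// Placeholder markers and frontmatter (assumptions, not the real values).
const START = '<!-- CODEGRAPH_START -->';
const END = '<!-- CODEGRAPH_END -->';
const FM = '---\ndescription: CodeGraph rules\n---';

const installed = FM + '\n\n' + START + '\nUse codegraph_search first.\n' + END + '\n';
const customized = installed + '\n## My own rule\nkeep me\n';

const fresh = planRulesCleanup(installed, START, END, FM); // { kind: 'delete' }
const edited = planRulesCleanup(customized, START, END, FM); // rewrite, keeps "keep me"
const foreign = planRulesCleanup('# unrelated\n', START, END, FM); // { kind: 'keep' }
```

Separating the decision from the file I/O keeps the "delete only when the file is entirely ours" rule testable on plain strings.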
Co-authored-by: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 24 ++++ __tests__/installer-targets.test.ts | 155 ++++++++++++++++++++++++++ package-lock.json | 4 +- package.json | 2 +- src/bin/codegraph.ts | 37 +++++++ src/installer/index.ts | 163 +++++++++++++++++++++++++++- src/installer/targets/cursor.ts | 58 +++++++++- 7 files changed, 435 insertions(+), 8 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index b544414b..df309681 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,29 @@ a [GitHub Release](https://github.com/colbymchenry/codegraph/releases) tagged This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [0.9.3] - 2026-05-22 + +### Added +- **`codegraph uninstall` command.** Cleanly removes CodeGraph from every agent + it's configured on — Claude Code, Cursor, Codex CLI, opencode, and Hermes + Agent — in one step. It asks up front whether to remove the global config + (`~/.claude`, `~/.codex`, …) or just this project's local config (no flags + required), then prints exactly which agents it touched so you can see what + changed. `--location`, `--target`, and `--yes` are accepted for scripted / + non-interactive use. It removes only what `install` wrote (MCP server entry, + instructions block, permissions) and leaves your `.codegraph/` index alone + (use `codegraph uninit` for that). Resolves + [#313](https://github.com/colbymchenry/codegraph/issues/313) — previously the + only cleanup path was an npm `preuninstall` hook that the published bundle + never shipped, so `npm uninstall -g` left every agent pointing at a CodeGraph + MCP server that no longer existed. + +### Fixed +- **Cursor uninstall left an orphaned `.cursor/rules/codegraph.mdc`.** It + stripped the rule body but left the file and its `description: CodeGraph …` + frontmatter behind. 
The dedicated rules file is now deleted outright on + uninstall, while any content you added outside CodeGraph's markers is kept. + ## [0.9.2] - 2026-05-21 ### Added @@ -93,6 +116,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). find its bundle. The release pipeline now verifies every package reached the registry (and is idempotent), so a release can't pass green-but-broken again. +[0.9.3]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.3 [0.9.2]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.2 [0.9.1]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.1 diff --git a/__tests__/installer-targets.test.ts b/__tests__/installer-targets.test.ts index 44e90d68..59e869e2 100644 --- a/__tests__/installer-targets.test.ts +++ b/__tests__/installer-targets.test.ts @@ -19,6 +19,7 @@ import * as fs from 'fs'; import * as path from 'path'; import * as os from 'os'; import { ALL_TARGETS, getTarget, resolveTargetFlag } from '../src/installer/targets/registry'; +import { uninstallTargets } from '../src/installer'; import { upsertTomlTable, removeTomlTable, buildTomlTable } from '../src/installer/targets/toml'; import { cleanupLegacyHooks } from '../src/installer/targets/claude'; @@ -723,6 +724,160 @@ describe('Installer targets — TOML serializer (Codex backbone)', () => { }); }); +describe('Installer — uninstallTargets sweep (codegraph uninstall)', () => { + let tmpHome: string; + let tmpCwd: string; + let origCwd: string; + let homeRestore: { restore: () => void }; + + beforeEach(() => { + tmpHome = mkTmpDir('un-home'); + tmpCwd = mkTmpDir('un-cwd'); + origCwd = process.cwd(); + process.chdir(tmpCwd); + homeRestore = setHome(tmpHome); + }); + + afterEach(() => { + homeRestore.restore(); + process.chdir(origCwd); + fs.rmSync(tmpHome, { recursive: true, force: true }); + fs.rmSync(tmpCwd, { recursive: true, force: true }); + }); + + it('sweeps every agent it was installed on and reports removed for each (global)', 
() => { + for (const t of ALL_TARGETS) { + if (t.supportsLocation('global')) t.install('global', { autoAllow: true }); + } + + const reports = uninstallTargets(ALL_TARGETS, 'global'); + + for (const t of ALL_TARGETS) { + const r = reports.find((x) => x.id === t.id)!; + expect(r.status).toBe('removed'); + expect(r.removedPaths.length).toBeGreaterThan(0); + // The actual config is gone afterward. + expect(t.detect('global').alreadyConfigured).toBe(false); + } + }); + + it('is safe on a clean slate — every agent reports not-configured, nothing removed', () => { + const reports = uninstallTargets(ALL_TARGETS, 'global'); + for (const r of reports) { + expect(r.status).toBe('not-configured'); + expect(r.removedPaths).toEqual([]); + } + }); + + it('reports removed only for agents that were actually configured', () => { + // Install on Claude only; the rest stay untouched. + getTarget('claude')!.install('global', { autoAllow: true }); + + const reports = uninstallTargets(ALL_TARGETS, 'global'); + + const claude = reports.find((r) => r.id === 'claude')!; + expect(claude.status).toBe('removed'); + expect(claude.displayName).toBe(getTarget('claude')!.displayName); + + for (const r of reports.filter((x) => x.id !== 'claude')) { + expect(r.status).toBe('not-configured'); + } + }); + + it('marks global-only agents as unsupported for a local sweep (and never touches them)', () => { + const reports = uninstallTargets(ALL_TARGETS, 'local'); + for (const t of ALL_TARGETS) { + const r = reports.find((x) => x.id === t.id)!; + if (t.supportsLocation('local')) { + expect(r.status).toBe('not-configured'); + } else { + expect(r.status).toBe('unsupported'); + expect(r.removedPaths).toEqual([]); + expect(r.notes[0]).toMatch(/global-only/); + } + } + }); + + it('is idempotent — a second sweep finds nothing left to remove', () => { + for (const t of ALL_TARGETS) { + if (t.supportsLocation('global')) t.install('global', { autoAllow: true }); + } + const first = uninstallTargets(ALL_TARGETS, 
'global'); + expect(first.some((r) => r.status === 'removed')).toBe(true); + + const second = uninstallTargets(ALL_TARGETS, 'global'); + for (const r of second) { + expect(r.status).toBe('not-configured'); + expect(r.removedPaths).toEqual([]); + } + }); + + it('a --target subset removes only the chosen agents, leaving siblings configured', () => { + getTarget('claude')!.install('global', { autoAllow: true }); + getTarget('cursor')!.install('global', { autoAllow: true }); + + const reports = uninstallTargets(resolveTargetFlag('claude', 'global'), 'global'); + + expect(reports.map((r) => r.id)).toEqual(['claude']); + expect(reports[0].status).toBe('removed'); + // Cursor was not in the subset — still configured. + expect(getTarget('cursor')!.detect('global').alreadyConfigured).toBe(true); + expect(getTarget('claude')!.detect('global').alreadyConfigured).toBe(false); + }); +}); + +describe('Installer — Cursor rules file cleanup on uninstall', () => { + let tmpHome: string; + let tmpCwd: string; + let origCwd: string; + let homeRestore: { restore: () => void }; + const cursor = getTarget('cursor')!; + + beforeEach(() => { + tmpHome = mkTmpDir('cur-home'); + tmpCwd = mkTmpDir('cur-cwd'); + origCwd = process.cwd(); + process.chdir(tmpCwd); + homeRestore = setHome(tmpHome); + }); + + afterEach(() => { + homeRestore.restore(); + process.chdir(origCwd); + fs.rmSync(tmpHome, { recursive: true, force: true }); + fs.rmSync(tmpCwd, { recursive: true, force: true }); + }); + + const rulesFile = () => path.join(process.cwd(), '.cursor', 'rules', 'codegraph.mdc'); + + it('deletes the dedicated codegraph.mdc entirely (no orphaned frontmatter left behind)', () => { + cursor.install('local', { autoAllow: true }); + expect(fs.existsSync(rulesFile())).toBe(true); + + cursor.uninstall('local'); + + // The whole file — frontmatter included — is gone, not just the block. 
+ expect(fs.existsSync(rulesFile())).toBe(false); + expect(cursor.detect('local').alreadyConfigured).toBe(false); + }); + + it('preserves user content added outside the codegraph markers (strips only our block)', () => { + cursor.install('local', { autoAllow: true }); + const withUserContent = + fs.readFileSync(rulesFile(), 'utf-8') + '\n## My own rule\nkeep me\n'; + fs.writeFileSync(rulesFile(), withUserContent); + + cursor.uninstall('local'); + + expect(fs.existsSync(rulesFile())).toBe(true); + const after = fs.readFileSync(rulesFile(), 'utf-8'); + expect(after).toContain('keep me'); + // Our tool-usage block is gone. + expect(after).not.toContain('codegraph_search'); + expect(after).not.toContain('CODEGRAPH_START'); + }); +}); + function listAllFiles(dir: string): string[] { if (!fs.existsSync(dir)) return []; const out: string[] = []; diff --git a/package-lock.json b/package-lock.json index 49342496..36c592b1 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,12 +1,12 @@ { "name": "@colbymchenry/codegraph", - "version": "0.9.2", + "version": "0.9.3", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "@colbymchenry/codegraph", - "version": "0.9.2", + "version": "0.9.3", "license": "MIT", "dependencies": { "@clack/prompts": "^1.3.0", diff --git a/package.json b/package.json index 4ea93215..f813c1e6 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "@colbymchenry/codegraph", - "version": "0.9.2", + "version": "0.9.3", "description": "Supercharge Claude Code with semantic code intelligence. 
94% fewer tool calls • 77% faster exploration • 100% local.", "main": "dist/index.js", "types": "dist/index.d.ts", diff --git a/src/bin/codegraph.ts b/src/bin/codegraph.ts index dac8ce1e..6f90e6fe 100644 --- a/src/bin/codegraph.ts +++ b/src/bin/codegraph.ts @@ -7,6 +7,7 @@ * Usage: * codegraph Run interactive installer (when no args) * codegraph install Run interactive installer + * codegraph uninstall Remove CodeGraph from your agents * codegraph init [path] Initialize CodeGraph in a project * codegraph uninit [path] Remove CodeGraph from a project * codegraph index [path] Index all files in the project @@ -1398,6 +1399,42 @@ program } }); +/** + * codegraph uninstall + * + * Inverse of `install`. Removes the codegraph MCP server entry, + * instructions block, and permissions from every agent (or a + * `--target` subset). Prompts global-vs-local when not given. Does NOT + * delete the `.codegraph/` index — that's `codegraph uninit`. + */ +program + .command('uninstall') + .description('Remove codegraph from your agents (Claude Code, Cursor, Codex CLI, opencode, Hermes Agent)') + .option('-t, --target ', 'Target agent(s): comma-separated ids, or "all". Default: all') + .option('-l, --location ', 'Uninstall location: "global" or "local". Default: prompt') + .option('-y, --yes', 'Non-interactive: defaults to --location=global --target=all') + .action(async (opts: { + target?: string; + location?: string; + yes?: boolean; + }) => { + const { runUninstaller } = await import('../installer'); + if (opts.location && opts.location !== 'global' && opts.location !== 'local') { + error(`--location must be "global" or "local" (got "${opts.location}").`); + process.exit(1); + } + try { + await runUninstaller({ + target: opts.target, + location: opts.location as 'global' | 'local' | undefined, + yes: opts.yes, + }); + } catch (err) { + error(err instanceof Error ? 
err.message : String(err)); + process.exit(1); + } + }); + // Parse and run program.parse(); diff --git a/src/installer/index.ts b/src/installer/index.ts index e5b18411..0826d8da 100644 --- a/src/installer/index.ts +++ b/src/installer/index.ts @@ -21,7 +21,7 @@ import { getTarget, resolveTargetFlag, } from './targets/registry'; -import type { AgentTarget, Location, WriteResult } from './targets/types'; +import type { AgentTarget, Location, TargetId, WriteResult } from './targets/types'; import { getGlyphs } from '../ui/glyphs'; // Import the lightweight submodules directly (not the ../sync barrel, which // re-exports FileWatcher and would transitively pull in ../extraction — the @@ -217,6 +217,167 @@ export async function runInstallerWithOptions(opts: RunInstallerOptions): Promis clack.outro(finalNote); } +export interface RunUninstallerOptions { + /** + * Comma-separated target list, or `auto` / `all` / `none`. Defaults + * to `all` — uninstall sweeps every known agent and reports which + * ones it actually touched, so the user doesn't have to know where + * they configured it. + */ + target?: string; + /** Skip the location prompt; use this value directly. */ + location?: Location; + /** Non-interactive: location=global, target=all, no prompts. */ + yes?: boolean; +} + +export type UninstallStatus = 'removed' | 'not-configured' | 'unsupported'; + +/** + * Per-target outcome of an uninstall sweep. `removed` means we deleted + * at least one thing; `not-configured` means the agent had no codegraph + * config at this location (nothing to do); `unsupported` means the + * agent has no config concept for this location (e.g. Codex is + * global-only, so a `local` uninstall skips it). + */ +export interface UninstallReport { + id: TargetId; + displayName: string; + status: UninstallStatus; + /** Absolute paths we actually edited/removed (action === 'removed'). */ + removedPaths: string[]; + /** Verbatim notes from the target (rare for uninstall). 
*/ + notes: string[]; +} + +/** + * Pure uninstall sweep — no prompts, no I/O beyond the targets' own + * file edits. Exposed (and unit-tested) separately from the clack UI in + * `runUninstaller` so the aggregation logic can be asserted directly. + * + * Each target's `uninstall()` is already safe to call when nothing was + * installed (it returns `not-found` actions), so this is safe to run + * across every target unconditionally. + */ +export function uninstallTargets( + targets: readonly AgentTarget[], + location: Location, +): UninstallReport[] { + return targets.map((target) => { + if (!target.supportsLocation(location)) { + const only: Location = location === 'local' ? 'global' : 'local'; + return { + id: target.id, + displayName: target.displayName, + status: 'unsupported' as const, + removedPaths: [], + notes: [`no ${location} config — this agent is ${only}-only`], + }; + } + const result = target.uninstall(location); + const removedPaths = result.files + .filter((f) => f.action === 'removed') + .map((f) => f.path); + return { + id: target.id, + displayName: target.displayName, + status: removedPaths.length > 0 ? ('removed' as const) : ('not-configured' as const), + removedPaths, + notes: result.notes ?? [], + }; + }); +} + +/** + * Interactive uninstaller — the inverse of `runInstallerWithOptions`. + * Asks global-vs-local first (unless `--location`/`--yes` is given), + * then sweeps every agent target (or the `--target` subset) and prints + * one block per agent so the user sees exactly which providers it hit. + * + * Removes only what install wrote (MCP server entry, instructions + * block, permissions) — never the `.codegraph/` index, which `codegraph + * uninit` owns. 
+ */ +export async function runUninstaller(opts: RunUninstallerOptions): Promise<void> { + const clack = await importESM('@clack/prompts'); + + clack.intro(`CodeGraph v${getVersion()} — uninstall`); + + const useDefaults = opts.yes === true; + + // Step 1: which location — asked FIRST, the one decision the user + // must make. Global sweeps ~/.claude, ~/.codex, etc.; local sweeps + // the configs in this project directory. + let location: Location; + if (opts.location) { + location = opts.location; + } else if (useDefaults) { + location = 'global'; + } else { + const sel = await clack.select({ + message: 'Remove CodeGraph from all your projects, or just this one?', + options: [ + { value: 'global' as const, label: 'All projects (global)', hint: '~/.claude, ~/.cursor, ~/.codex, ~/.config/opencode, ~/.hermes' }, + { value: 'local' as const, label: 'Just this project (local)', hint: './.claude, ./.cursor, ./opencode.jsonc' }, + ], + initialValue: 'global' as const, + }); + if (clack.isCancel(sel)) { + clack.cancel('Uninstall cancelled.'); + process.exit(0); + } + location = sel; + } + + // Step 2: which agents. Default is every agent, so the user doesn't + // have to remember where they installed it — unconfigured agents are + // reported as "nothing to remove" and left untouched. An explicit + // --target subsets this. + let targets: AgentTarget[]; + if (opts.target !== undefined) { + targets = resolveTargetFlag(opts.target, location); + } else { + targets = [...ALL_TARGETS]; + } + if (targets.length === 0) { + clack.outro('No agent targets selected — nothing to do.'); + return; + } + + // Step 3: sweep + per-agent feedback.
+ const reports = uninstallTargets(targets, location); + const removed = reports.filter((r) => r.status === 'removed'); + + for (const r of reports) { + if (r.status === 'removed') { + for (const p of r.removedPaths) { + clack.log.success(`${r.displayName}: removed ${tildify(p)}`); + } + } else if (r.status === 'not-configured') { + clack.log.info(`${r.displayName}: not configured — nothing to remove`); + } else { + clack.log.info(`${r.displayName}: skipped — ${r.notes[0] ?? 'unsupported location'}`); + } + } + + // Step 4: for local uninstall, the index dir is separate — point at + // `uninit` so the user knows it's still there (and how to remove it). + if (location === 'local' && fs.existsSync(path.join(process.cwd(), '.codegraph'))) { + clack.log.info('The .codegraph/ index for this project is still here. Run `codegraph uninit` to delete it.'); + } + + // Step 5: summary. + if (removed.length > 0) { + const names = removed.map((r) => r.displayName).join(', '); + clack.outro( + `Removed CodeGraph from ${removed.length} agent${removed.length > 1 ? 's' : ''}: ${names}. ` + + `Restart ${removed.length > 1 ? 'them' : 'it'} to apply.`, + ); + } else { + clack.outro(`CodeGraph was not configured in any ${location} agent — nothing to remove.`); + } +} + /** * For every target that has a global config and exposes * `wireProjectSurfaces`, write its project-local surfaces (e.g. 
diff --git a/src/installer/targets/cursor.ts b/src/installer/targets/cursor.ts index 850b6fc8..fb60a002 100644 --- a/src/installer/targets/cursor.ts +++ b/src/installer/targets/cursor.ts @@ -46,7 +46,6 @@ import { getMcpServerConfig, jsonDeepEqual, readJsonFile, - removeMarkedSection, replaceOrAppendMarkedSection, writeJsonFile, } from './shared'; @@ -140,9 +139,7 @@ class CursorTarget implements AgentTarget { } if (loc === 'local') { - const rules = rulesPath(); - const action = removeMarkedSection(rules, CODEGRAPH_SECTION_START, CODEGRAPH_SECTION_END); - files.push({ path: rules, action }); + files.push(removeRulesEntry()); } return { files }; @@ -237,4 +234,57 @@ function writeRulesEntry(): WriteResult['files'][number] { return { path: file, action: mapped }; } +/** + * Remove the Cursor rules file on uninstall. + * + * Unlike the shared CLAUDE.md / AGENTS.md files (where codegraph owns + * only a marker-delimited section), `.cursor/rules/codegraph.mdc` is a + * file we create OUTRIGHT — the frontmatter is ours too. So a plain + * `removeMarkedSection` is wrong here: it would strip our instruction + * block but leave the orphaned `description: CodeGraph ...` frontmatter + * behind, so the file lingers and still "mentions" codegraph. + * + * Instead: strip our block, and if nothing but our own frontmatter + * remains, delete the whole file. Only when the user has added their + * own content outside our markers do we keep the file (minus our block). 
+ */ +function removeRulesEntry(): WriteResult['files'][number] { + const file = rulesPath(); + if (!fs.existsSync(file)) return { path: file, action: 'not-found' }; + + let content: string; + try { + content = fs.readFileSync(file, 'utf-8'); + } catch { + return { path: file, action: 'not-found' }; + } + + const ourFrontmatter = MDC_FRONTMATTER.trim(); + const startIdx = content.indexOf(CODEGRAPH_SECTION_START); + const endIdx = content.indexOf(CODEGRAPH_SECTION_END); + + // Our marked block is present — strip it, then decide what's left. + if (startIdx !== -1 && endIdx > startIdx) { + const before = content.substring(0, startIdx).trimEnd(); + const after = content.substring(endIdx + CODEGRAPH_SECTION_END.length).trimStart(); + const remainder = (before + (before && after ? '\n\n' : '') + after).trim(); + if (remainder === '' || remainder === ourFrontmatter) { + try { fs.unlinkSync(file); } catch { /* ignore */ } + } else { + atomicWriteFileSync(file, remainder + '\n'); + } + return { path: file, action: 'removed' }; + } + + // No block, but the file is still our pristine frontmatter-only file + // — it's ours, so remove it. + if (content.trim() === ourFrontmatter) { + try { fs.unlinkSync(file); } catch { /* ignore */ } + return { path: file, action: 'removed' }; + } + + // Foreign content we don't recognize — leave it alone. 
+ return { path: file, action: 'not-found' }; +} + export const cursorTarget: AgentTarget = new CursorTarget(); From e5d633075c9c8c77fcb69c1ed553296fab868082 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Fri, 22 May 2026 05:39:24 -0500 Subject: [PATCH 30/47] fix: prevent V8 turboshaft WASM Zone OOM during indexing (#298, #293) (#322) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Large multi-language indexes crashed with `Fatal process out of memory: Zone` on Node 22/24 (including the bundled runtime) — V8's turboshaft optimizing WASM compiler exhausts its per-compilation Zone arena while compiling tree-sitter grammars on a background thread, even with tens of GB free (the Zone is a V8-internal arena, not the JS heap). Run node with V8 `--liftoff-only`, which keeps grammar compilation on the Liftoff baseline and never reaches the optimizing tier. Delivered via the bundled launcher + a one-shot CLI re-exec guard for all other launch paths. Empirically only `--liftoff-only` stops it (`--no-wasm-tier-up` / `--no-wasm-dynamic-tiering` do not), and it must be on node's command line (setFlagsFromString / worker execArgv / NODE_OPTIONS all fail). Reproduced the exact crash with the real indexer on Node 24.16 against a 2,880-file / 18-language repo and confirmed the fix eliminates it; full suite + 7 new tests pass. Bumps to 0.9.4. 
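The re-exec guard can be sketched as two pure functions plus a few lines of wiring. The names `needsRelaunch`, `relaunchArgs`, and the loop-guard variable `CODEGRAPH_RELAUNCHED` are hypothetical; the patch's own helpers (`processHasWasmRuntimeFlags`, `buildRelaunchArgv`) may differ in signature and detail. The only behavior taken from the commit is that `--liftoff-only` must sit on node's own command line, so it cannot be applied after startup.

```typescript
// Flag(s) that must be on node's command line (from this patch).
const REQUIRED_FLAGS: readonly string[] = ['--liftoff-only'];
// Hypothetical env marker that limits the relaunch to a single hop.
const GUARD_ENV = 'CODEGRAPH_RELAUNCHED';

function needsRelaunch(
  execArgv: readonly string[],
  env: Record<string, string | undefined>,
): boolean {
  // Already relaunched once: never loop, even if the flag is still missing.
  if (env[GUARD_ENV] === '1') return false;
  return REQUIRED_FLAGS.some((f) => execArgv.indexOf(f) === -1);
}

function relaunchArgs(execArgv: readonly string[], argv: readonly string[]): string[] {
  // V8 flags go before the script path; argv[0] is the node binary itself.
  const missing = REQUIRED_FLAGS.filter((f) => execArgv.indexOf(f) === -1);
  return [...missing, ...execArgv, ...argv.slice(1)];
}

// Wiring in the CLI entry point (Node-only, shown as a comment):
//   if (needsRelaunch(process.execArgv, process.env)) {
//     const r = spawnSync(process.execPath, relaunchArgs(process.execArgv, process.argv), {
//       stdio: 'inherit',
//       env: { ...process.env, [GUARD_ENV]: '1' },
//     });
//     process.exit(r.status ?? 1);
//   }

const plain = needsRelaunch([], {}); // true
const flagged = needsRelaunch(['--liftoff-only'], {}); // false
const argvOut = relaunchArgs(['--max-old-space-size=4096'], ['/usr/bin/node', '/app/cli.js', 'index']);
// ['--liftoff-only', '--max-old-space-size=4096', '/app/cli.js', 'index']
```

The environment marker is what makes the guard one-shot: if the child still lacks the flag, it proceeds without it instead of re-execing forever, and `r.status ?? 1` maps a signal-terminated child (null status) to a failing exit code.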
Co-authored-by: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 21 ++++++ __tests__/wasm-runtime-flags.test.ts | 87 +++++++++++++++++++++++++ package-lock.json | 4 +- package.json | 2 +- scripts/build-bundle.sh | 14 +++- scripts/npm-shim.js | 5 +- src/bin/codegraph.ts | 8 +++ src/extraction/wasm-runtime-flags.ts | 96 ++++++++++++++++++++++++++++ 8 files changed, 231 insertions(+), 6 deletions(-) create mode 100644 __tests__/wasm-runtime-flags.test.ts create mode 100644 src/extraction/wasm-runtime-flags.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index df309681..93e558a0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,26 @@ a [GitHub Release](https://github.com/colbymchenry/codegraph/releases) tagged This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [0.9.4] - 2026-05-22 + +### Fixed +- **`Fatal process out of memory: Zone` crash while indexing large projects.** + On Node.js 22 and 24 — including CodeGraph's own bundled runtime — running + `codegraph index` / `codegraph init` on a large multi-language repo could + abort the entire process partway through parsing with + `Fatal process out of memory: Zone`, even with tens of GB of RAM free (the + failure is in a V8-internal compilation arena, not the JS heap). The cause is + V8's "turboshaft" optimizing WASM compiler exhausting its Zone budget while + compiling tree-sitter's large WebAssembly grammars on a background thread. + CodeGraph now runs with V8's `--liftoff-only`, which keeps grammar compilation + on the baseline compiler and never reaches the optimizing tier, eliminating + the crash; indexing output is otherwise unchanged. The bundled launcher passes + the flag directly, and any other launch path (from source, `npx`, a globally + linked dev build) re-execs once with it automatically. 
Resolves + [#298](https://github.com/colbymchenry/codegraph/issues/298) and + [#293](https://github.com/colbymchenry/codegraph/issues/293). (Node 25 stays + blocked — its variant of this V8 bug is not resolved by `--liftoff-only`.) + ## [0.9.3] - 2026-05-22 ### Added @@ -116,6 +136,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). find its bundle. The release pipeline now verifies every package reached the registry (and is idempotent), so a release can't pass green-but-broken again. +[0.9.4]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.4 [0.9.3]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.3 [0.9.2]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.2 [0.9.1]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.1 diff --git a/__tests__/wasm-runtime-flags.test.ts b/__tests__/wasm-runtime-flags.test.ts new file mode 100644 index 00000000..a4dae8bb --- /dev/null +++ b/__tests__/wasm-runtime-flags.test.ts @@ -0,0 +1,87 @@ +/** + * WASM runtime flags — the workaround for the V8 turboshaft WASM Zone OOM + * (`Fatal process out of memory: Zone`) that crashed `codegraph index` on large + * polyglot repos under Node >= 22. See issues #293 and #298. + * + * The crash was reproduced with the real indexer on the bundled Node 24 runtime; + * empirically only `--liftoff-only` prevents it (`--no-wasm-tier-up` / + * `--no-wasm-dynamic-tiering` do not), and the flag must be on node's command + * line — `setFlagsFromString`, worker `execArgv`, and `NODE_OPTIONS` all fail. + * These tests pin that contract so it can't silently regress. 
+ */ +import { describe, it, expect } from 'vitest'; +import { spawnSync } from 'child_process'; +import * as fs from 'fs'; +import * as os from 'os'; +import * as path from 'path'; +import { + WASM_RUNTIME_FLAGS, + processHasWasmRuntimeFlags, + buildRelaunchArgv, +} from '../src/extraction/wasm-runtime-flags'; + +describe('WASM_RUNTIME_FLAGS', () => { + it('pins --liftoff-only (the only flag shown to stop the turboshaft Zone OOM)', () => { + // On Node 24, --no-wasm-tier-up and --no-wasm-dynamic-tiering both still + // crash; only --liftoff-only forces grammars onto the Liftoff baseline and + // off the optimizing tier. Pin it so it can't be swapped for an ineffective + // flag. + expect(WASM_RUNTIME_FLAGS).toContain('--liftoff-only'); + }); + + it('every flag is a real, accepted flag on the running Node/V8 runtime', () => { + // node rejects unknown CLI flags at startup, so a renamed/removed flag would + // break the bundled launcher and make the relaunch guard a silent no-op. + // Prove each flag actually launches node here. 
+ const res = spawnSync( + process.execPath, + [...WASM_RUNTIME_FLAGS, '-e', 'process.exit(0)'], + { encoding: 'utf8' } + ); + expect(res.status, `node rejected ${WASM_RUNTIME_FLAGS.join(' ')}:\n${res.stderr}`).toBe(0); + }); +}); + +describe('processHasWasmRuntimeFlags', () => { + it('is true only when every required flag is present', () => { + expect(processHasWasmRuntimeFlags(['--liftoff-only'])).toBe(true); + expect(processHasWasmRuntimeFlags(['--liftoff-only', '--enable-source-maps'])).toBe(true); + }); + + it('is false when the flags are absent', () => { + expect(processHasWasmRuntimeFlags([])).toBe(false); + expect(processHasWasmRuntimeFlags(['--max-old-space-size=4096'])).toBe(false); + }); +}); + +describe('buildRelaunchArgv', () => { + it('places the wasm flags first, then the script and its args', () => { + expect(buildRelaunchArgv('/x/codegraph.js', ['index', '/repo'], [])).toEqual([ + '--liftoff-only', + '/x/codegraph.js', + 'index', + '/repo', + ]); + }); + + it('preserves other existing node flags without duplicating ours', () => { + expect( + buildRelaunchArgv('/x/codegraph.js', ['status'], ['--liftoff-only', '--enable-source-maps']) + ).toEqual(['--liftoff-only', '--enable-source-maps', '/x/codegraph.js', 'status']); + }); + + it('produces an argv that actually launches node WITH the flag applied', () => { + // End-to-end proof of the delivery mechanism without needing the crash: + // run the constructed argv and confirm the child sees the flag in execArgv. 
+ const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-relaunch-')); + try { + const harness = path.join(dir, 'harness.cjs'); + fs.writeFileSync(harness, 'process.stdout.write(JSON.stringify(process.execArgv));'); + const res = spawnSync(process.execPath, buildRelaunchArgv(harness, []), { encoding: 'utf8' }); + expect(res.status, res.stderr).toBe(0); + expect(JSON.parse(res.stdout)).toContain('--liftoff-only'); + } finally { + fs.rmSync(dir, { recursive: true, force: true }); + } + }); +}); diff --git a/package-lock.json b/package-lock.json index 36c592b1..cad34c1b 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,12 +1,12 @@ { "name": "@colbymchenry/codegraph", - "version": "0.9.3", + "version": "0.9.4", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "@colbymchenry/codegraph", - "version": "0.9.3", + "version": "0.9.4", "license": "MIT", "dependencies": { "@clack/prompts": "^1.3.0", diff --git a/package.json b/package.json index f813c1e6..5455ced9 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "@colbymchenry/codegraph", - "version": "0.9.3", + "version": "0.9.4", "description": "Supercharge Claude Code with semantic code intelligence. 94% fewer tool calls • 77% faster exploration • 100% local.", "main": "dist/index.js", "types": "dist/index.d.ts", diff --git a/scripts/build-bundle.sh b/scripts/build-bundle.sh index a00f3369..120ac981 100755 --- a/scripts/build-bundle.sh +++ b/scripts/build-bundle.sh @@ -70,9 +70,18 @@ rm -f "$STAGE/lib/package-lock.json" # 4. Vendored Node + launcher (the launcher uses the bundled Node by relative # path, so no system Node is ever needed). +# +# `--liftoff-only`: keep tree-sitter's large WASM grammars on V8's Liftoff +# baseline compiler so they never reach the turboshaft optimizing tier, whose +# per-compilation Zone arena OOMs the whole process (`Fatal process out of +# memory: Zone`) on Node >= 22 — even with tens of GB free. 
The flag is read at +# V8 engine init so it must be on node's command line; the parse worker inherits +# it. See issues #293/#298 and src/extraction/wasm-runtime-flags.ts. (The CLI +# also self-relaunches with this flag when launched without it, so non-bundled +# runs are covered too; passing it here avoids that extra spawn.) if [ "$OSFAM" = "win32" ]; then cp "$NODE_BIN" "$STAGE/node.exe" - printf '@"%%~dp0..\\node.exe" "%%~dp0..\\lib\\dist\\bin\\codegraph.js" %%*\r\n' \ + printf '@"%%~dp0..\\node.exe" --liftoff-only "%%~dp0..\\lib\\dist\\bin\\codegraph.js" %%*\r\n' \ > "$STAGE/bin/codegraph.cmd" else cp "$NODE_BIN" "$STAGE/node" @@ -89,7 +98,8 @@ while [ -L "$SELF" ]; do esac done DIR="$(cd "$(dirname "$SELF")/.." && pwd)" -exec "$DIR/node" "$DIR/lib/dist/bin/codegraph.js" "$@" +# --liftoff-only: avoid the V8 turboshaft WASM Zone OOM (issues #293/#298). +exec "$DIR/node" --liftoff-only "$DIR/lib/dist/bin/codegraph.js" "$@" LAUNCH chmod +x "$STAGE/bin/codegraph" fi diff --git a/scripts/npm-shim.js b/scripts/npm-shim.js index bea905f3..81012124 100755 --- a/scripts/npm-shim.js +++ b/scripts/npm-shim.js @@ -31,7 +31,10 @@ try { if (isWindows) { command = require.resolve(pkg + '/node.exe'); var entry = require.resolve(pkg + '/lib/dist/bin/codegraph.js'); - args = [entry].concat(process.argv.slice(2)); + // --liftoff-only: keep tree-sitter's WASM grammars off V8's turboshaft tier + // to avoid the Zone OOM on Node >= 22 (issues #293/#298). The unix launcher + // passes this too; on Windows we invoke node.exe directly so add it here. 
+ args = ['--liftoff-only', entry].concat(process.argv.slice(2)); } else { command = require.resolve(pkg + '/bin/codegraph'); args = process.argv.slice(2); diff --git a/src/bin/codegraph.ts b/src/bin/codegraph.ts index 6f90e6fe..711d39c8 100644 --- a/src/bin/codegraph.ts +++ b/src/bin/codegraph.ts @@ -27,6 +27,7 @@ import { createShimmerProgress } from '../ui/shimmer-progress'; import { getGlyphs } from '../ui/glyphs'; import { buildNode25BlockBanner, buildNodeTooOldBanner, MIN_NODE_MAJOR } from './node-version-check'; +import { relaunchWithWasmRuntimeFlagsIfNeeded } from '../extraction/wasm-runtime-flags'; // Lazy-load heavy modules (CodeGraph, runInstaller) to keep CLI startup fast. async function loadCodeGraph(): Promise { @@ -75,6 +76,13 @@ if (nodeMajor < MIN_NODE_MAJOR) { // Override active — banner shown for visibility, continuing. } +// Re-exec with V8's `--liftoff-only` if it isn't already set, so tree-sitter's +// large WASM grammars never hit the turboshaft Zone OOM (`Fatal process out of +// memory: Zone`) on Node >= 22. No-op under the bundled launcher, which already +// passes the flag. Must run before any grammar (in the parse worker, which +// inherits this process's flags) is compiled. See ../extraction/wasm-runtime-flags. +relaunchWithWasmRuntimeFlagsIfNeeded(__filename); + // Check if running with no arguments - run installer if (process.argv.length === 2) { import('../installer').then(({ runInstaller }) => diff --git a/src/extraction/wasm-runtime-flags.ts b/src/extraction/wasm-runtime-flags.ts new file mode 100644 index 00000000..f33a19ff --- /dev/null +++ b/src/extraction/wasm-runtime-flags.ts @@ -0,0 +1,96 @@ +/** + * WASM runtime flags — workaround for the V8 turboshaft WASM Zone OOM. + * + * tree-sitter grammars are large WebAssembly modules. 
On Node >= 22 the V8 + * "turboshaft" optimizing WASM compiler can exhaust its per-compilation Zone + * arena while compiling these grammars on a background thread, aborting the + * whole process with `Fatal process out of memory: Zone` — even with tens of + * GB of system memory free, because the Zone is a V8-internal arena, not the + * JS heap. Reproduced on Node 22 and 24; Node 25 is already hard-blocked for + * the same crash (see ../bin/node-version-check.ts). See issues #293 and #298. + * + * `--liftoff-only` forces every WASM module to the Liftoff baseline compiler + * and never runs turboshaft, which eliminates the crash. Parsing stays fully + * correct; we only forgo the (marginal, and for grammars rarely reached) + * optimized-tier speedup. + * + * This flag MUST be on node's command line — it is read by V8 at engine init, + * before any of our JS runs. Empirically (Node 24) none of these work: + * - `v8.setFlagsFromString('--liftoff-only')` at runtime — too late. + * - Worker `execArgv: ['--liftoff-only']` — rejected (ERR_WORKER_INVALID_EXEC_ARGV). + * - `NODE_OPTIONS=--liftoff-only` — not on Node's NODE_OPTIONS allowlist. + * Also empirically, `--no-wasm-tier-up` / `--no-wasm-dynamic-tiering` do NOT + * prevent the crash — only disabling the optimizing tier entirely does. + * + * Delivery: the bundled launcher passes the flag directly (see + * scripts/build-bundle.sh and scripts/npm-shim.js); for any other launch path + * (running dist directly, from source, etc.) the CLI re-execs itself once with + * the flag via {@link relaunchWithWasmRuntimeFlagsIfNeeded}. V8 flags are + * PROCESS-global, and the parse worker is created with default (inherited) + * execArgv, so flagging the main process governs the worker's WASM compilation + * too. + */ +import { spawnSync } from 'child_process'; + +/** + * The V8 flag(s) that keep tree-sitter grammar compilation off the turboshaft + * optimizing tier. 
Single source of truth: the relaunch guard and the test + * suite both read this (a test asserts each is a real flag on the running + * runtime, so a rename can't silently regress the fix). + */ +export const WASM_RUNTIME_FLAGS: readonly string[] = ['--liftoff-only']; + +/** + * Env var set on the relaunched child so a detection slip can never cause an + * infinite re-exec loop. Also lets users force-disable the relaunch. + */ +const RELAUNCH_GUARD_ENV = 'CODEGRAPH_WASM_RELAUNCHED'; + +/** True when every required WASM runtime flag is already present in `execArgv`. */ +export function processHasWasmRuntimeFlags( + execArgv: readonly string[] = process.execArgv +): boolean { + return WASM_RUNTIME_FLAGS.every((flag) => execArgv.includes(flag)); +} + +/** + * Build the argv for re-execing node with the WASM runtime flags: our flags + * first, then any node flags already in `execArgv` (deduped), then the script + * and its args. Pure — exported for unit testing. + */ +export function buildRelaunchArgv( + scriptPath: string, + scriptArgs: readonly string[], + execArgv: readonly string[] = process.execArgv +): string[] { + const preserved = execArgv.filter((arg) => !WASM_RUNTIME_FLAGS.includes(arg)); + return [...WASM_RUNTIME_FLAGS, ...preserved, scriptPath, ...scriptArgs]; +} + +/** + * If the current process is missing the WASM runtime flags, re-exec it once + * with them and exit with the child's status. No-op when the flags are already + * present (the normal bundled-launcher path), when already relaunched, or when + * disabled via CODEGRAPH_NO_RELAUNCH. + * + * On spawn failure, returns so the caller runs in-process anyway — risking the + * OOM is still better than refusing to start. 
+ */ +export function relaunchWithWasmRuntimeFlagsIfNeeded(scriptPath: string): void { + if (processHasWasmRuntimeFlags()) return; + if (process.env[RELAUNCH_GUARD_ENV]) return; + if (process.env.CODEGRAPH_NO_RELAUNCH) return; + + const argv = buildRelaunchArgv(scriptPath, process.argv.slice(2)); + const result = spawnSync(process.execPath, argv, { + stdio: 'inherit', + env: { ...process.env, [RELAUNCH_GUARD_ENV]: '1' }, + }); + + if (result.error) { + // Couldn't relaunch (e.g. execPath unavailable) — fall through and run in + // this process. Degraded (may OOM on huge repos) but not broken. + return; + } + process.exit(result.status ?? (result.signal ? 1 : 0)); +} From c09dfd071e62d6ad502f181010e44049b5830c3a Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Fri, 22 May 2026 05:44:37 -0500 Subject: [PATCH 31/47] release: roll WASM Zone OOM fix into 0.9.3 (not 0.9.4) (#323) 0.9.3 was prepped in the repo but never released (latest published is 0.9.2), so the turboshaft WASM Zone OOM fix ships as part of 0.9.3. Fold its changelog entry into [0.9.3] and revert the version bump. Co-authored-by: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 37 ++++++++++++++++--------------------- package-lock.json | 4 ++-- package.json | 2 +- 3 files changed, 19 insertions(+), 24 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 93e558a0..51a656e0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,26 +7,6 @@ a [GitHub Release](https://github.com/colbymchenry/codegraph/releases) tagged This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 
-## [0.9.4] - 2026-05-22 - -### Fixed -- **`Fatal process out of memory: Zone` crash while indexing large projects.** - On Node.js 22 and 24 — including CodeGraph's own bundled runtime — running - `codegraph index` / `codegraph init` on a large multi-language repo could - abort the entire process partway through parsing with - `Fatal process out of memory: Zone`, even with tens of GB of RAM free (the - failure is in a V8-internal compilation arena, not the JS heap). The cause is - V8's "turboshaft" optimizing WASM compiler exhausting its Zone budget while - compiling tree-sitter's large WebAssembly grammars on a background thread. - CodeGraph now runs with V8's `--liftoff-only`, which keeps grammar compilation - on the baseline compiler and never reaches the optimizing tier, eliminating - the crash; indexing output is otherwise unchanged. The bundled launcher passes - the flag directly, and any other launch path (from source, `npx`, a globally - linked dev build) re-execs once with it automatically. Resolves - [#298](https://github.com/colbymchenry/codegraph/issues/298) and - [#293](https://github.com/colbymchenry/codegraph/issues/293). (Node 25 stays - blocked — its variant of this V8 bug is not resolved by `--liftoff-only`.) - ## [0.9.3] - 2026-05-22 ### Added @@ -45,6 +25,22 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). MCP server that no longer existed. ### Fixed +- **`Fatal process out of memory: Zone` crash while indexing large projects.** + On Node.js 22 and 24 — including CodeGraph's own bundled runtime — running + `codegraph index` / `codegraph init` on a large multi-language repo could + abort the entire process partway through parsing with + `Fatal process out of memory: Zone`, even with tens of GB of RAM free (the + failure is in a V8-internal compilation arena, not the JS heap). 
The cause is + V8's "turboshaft" optimizing WASM compiler exhausting its Zone budget while + compiling tree-sitter's large WebAssembly grammars on a background thread. + CodeGraph now runs with V8's `--liftoff-only`, which keeps grammar compilation + on the baseline compiler and never reaches the optimizing tier, eliminating + the crash; indexing output is otherwise unchanged. The bundled launcher passes + the flag directly, and any other launch path (from source, `npx`, a globally + linked dev build) re-execs once with it automatically. Resolves + [#298](https://github.com/colbymchenry/codegraph/issues/298) and + [#293](https://github.com/colbymchenry/codegraph/issues/293). (Node 25 stays + blocked — its variant of this V8 bug is not resolved by `--liftoff-only`.) - **Cursor uninstall left an orphaned `.cursor/rules/codegraph.mdc`.** It stripped the rule body but left the file and its `description: CodeGraph …` frontmatter behind. The dedicated rules file is now deleted outright on @@ -136,7 +132,6 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). find its bundle. The release pipeline now verifies every package reached the registry (and is idempotent), so a release can't pass green-but-broken again. 
-[0.9.4]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.4 [0.9.3]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.3 [0.9.2]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.2 [0.9.1]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.1 diff --git a/package-lock.json b/package-lock.json index cad34c1b..36c592b1 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,12 +1,12 @@ { "name": "@colbymchenry/codegraph", - "version": "0.9.4", + "version": "0.9.3", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "@colbymchenry/codegraph", - "version": "0.9.4", + "version": "0.9.3", "license": "MIT", "dependencies": { "@clack/prompts": "^1.3.0", diff --git a/package.json b/package.json index 5455ced9..f813c1e6 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "@colbymchenry/codegraph", - "version": "0.9.4", + "version": "0.9.3", "description": "Supercharge Claude Code with semantic code intelligence. 94% fewer tool calls • 77% faster exploration • 100% local.", "main": "dist/index.js", "types": "dist/index.d.ts", From 5aae9c4bbff4fe02f8284ef5f91dd9d5391027f6 Mon Sep 17 00:00:00 2001 From: Colby McHenry Date: Fri, 22 May 2026 05:58:34 -0500 Subject: [PATCH 32/47] Uninstall info in readme --- README.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/README.md b/README.md index 598ac5b0..511e2094 100644 --- a/README.md +++ b/README.md @@ -56,6 +56,16 @@ codegraph init -i
+### Uninstall + +Changed your mind? One command removes CodeGraph from every agent it configured: + +```bash +codegraph uninstall +``` + +Reverses the installer — strips CodeGraph's MCP server config, instructions, and permissions from each configured agent. Your project indexes (`.codegraph/`) are left untouched; remove those per-project with `codegraph uninit`. Use `--target` to remove from specific agents, or `--yes` to run non-interactively. + --- ## Why CodeGraph? @@ -333,6 +343,7 @@ At the start of a session, ask the user if they'd like to initialize CodeGraph: ```bash codegraph # Run interactive installer codegraph install # Run installer (explicit) +codegraph uninstall # Remove CodeGraph from your agents (inverse of install) codegraph init [path] # Initialize in a project (--index to also index) codegraph uninit [path] # Remove CodeGraph from a project (--force to skip prompt) codegraph index [path] # Full index (--force to re-index, --quiet for less output) From 15072aa29fea795a7b506f96563700e6788f0889 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Fri, 22 May 2026 11:38:28 -0500 Subject: [PATCH 33/47] fix: self-heal missing platform bundle from GitHub Releases (#303) (#335) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Installing from a registry mirror (npmmirror/cnpm) that hadn't mirrored the per-platform optionalDependency left codegraph failing with "no prebuilt bundle for " — npm treats an unfetchable optional dep as success and silently skips it. The npm-shim now self-heals: when the bundle is missing it downloads the matching archive from GitHub Releases (checksum-verified, with a download timeout) and caches it, so a global install works on any registry. release.yml now publishes SHA256SUMS and triggers an npmmirror sync after publish. Adds hermetic tests for the shim (resolution, cache reuse, disable knob, download + checksum match/mismatch/absent). 
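The checksum step can be sketched with Node's built-in `crypto` module. This sketch assumes SHA256SUMS follows the `sha256sum` output format that the release workflow generates, with one `<hex>  <basename>` pair per line. `parseSha256Sums` and `verifyArchive` are hypothetical names. The sketch treats an asset with no matching entry as "unavailable", which is an assumption: the shim's own handling of that case is not shown here.

```typescript
import { createHash } from 'node:crypto';
import * as path from 'node:path';

// Hypothetical helper: parse `sha256sum` output into basename -> lowercase hex digest.
function parseSha256Sums(text: string): Map<string, string> {
  const sums = new Map<string, string>();
  for (const line of text.split(/\r?\n/)) {
    // Digest, whitespace, then the file name (binary mode prefixes it with '*').
    const m = /^([0-9a-fA-F]{64})\s+\*?(.+)$/.exec(line.trim());
    if (m && m[1] && m[2]) sums.set(path.basename(m[2]), m[1].toLowerCase());
  }
  return sums;
}

type Verdict = 'verified' | 'mismatch' | 'unavailable';

// Compare downloaded archive bytes against SHA256SUMS. `sumsText === null`
// models a release with no SHA256SUMS asset (HTTP 404): skip, don't fail.
function verifyArchive(bytes: Uint8Array, assetName: string, sumsText: string | null): Verdict {
  if (sumsText === null) return 'unavailable';
  const expected = parseSha256Sums(sumsText).get(assetName);
  if (expected === undefined) return 'unavailable'; // assumption, see lead-in
  const actual = createHash('sha256').update(bytes).digest('hex');
  return actual === expected ? 'verified' : 'mismatch';
}
```

A "mismatch" should abort before anything is extracted or executed, as the commit describes. An "unavailable" result lets older releases, published without SHA256SUMS, proceed unverified.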
Co-authored-by: Claude Opus 4.7 (1M context) --- .github/workflows/release.yml | 27 +++- CHANGELOG.md | 26 ++++ __tests__/npm-shim.test.ts | 208 +++++++++++++++++++++++++++++ package.json | 2 +- scripts/npm-shim.js | 242 ++++++++++++++++++++++++++++++---- 5 files changed, 475 insertions(+), 30 deletions(-) create mode 100644 __tests__/npm-shim.test.ts diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index ff1a1577..51dea151 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -36,6 +36,13 @@ jobs: done ls -lh release + - name: Generate SHA256SUMS + # Published as a release asset; the npm launcher verifies downloaded + # bundles against it (basenames only, so its path.basename match works). + run: | + ( cd release && sha256sum codegraph-* > SHA256SUMS ) + cat release/SHA256SUMS + - name: Resolve version id: ver run: echo "version=$(node -p "require('./package.json').version")" >> "$GITHUB_OUTPUT" @@ -58,9 +65,9 @@ jobs: TAG="v${{ steps.ver.outputs.version }}" # Idempotent: create the release once, otherwise (re-run) refresh assets. if gh release view "$TAG" >/dev/null 2>&1; then - gh release upload "$TAG" release/codegraph-* --clobber + gh release upload "$TAG" release/codegraph-* release/SHA256SUMS --clobber else - gh release create "$TAG" release/codegraph-* --title "$TAG" --notes-file notes.md + gh release create "$TAG" release/codegraph-* release/SHA256SUMS --title "$TAG" --notes-file notes.md fi - name: Publish to npm @@ -96,3 +103,19 @@ jobs: [ -n "$ok" ] || { echo "::error::$name@$V never appeared on the registry"; exit 1; } echo "verified $name@$V" done + + - name: Sync packages to npmmirror + # npmmirror/cnpm mirror lazily and frequently never pull the per-platform + # optionalDependencies on their own, so `npm i` there fails with + # "no prebuilt bundle" (issue #303). Nudge a sync now so mirror users get + # the bundle without waiting. 
Best-effort — the launcher also self-heals + # from GitHub Releases — so a mirror hiccup never fails the release. + continue-on-error: true + run: | + for dir in release/npm/codegraph-* release/npm/main; do + name=$(node -p "require('./$dir/package.json').name") + enc=$(node -p "encodeURIComponent(require('./$dir/package.json').name)") + echo "sync $name" + curl -s -X PUT "https://registry.npmmirror.com/-/package/$enc/syncs" || true + echo + done diff --git a/CHANGELOG.md b/CHANGELOG.md index 51a656e0..535b0ce9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,31 @@ a [GitHub Release](https://github.com/colbymchenry/codegraph/releases) tagged This project follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [0.9.4] - 2026-05-22 + +### Added +- **Release archives now ship with a `SHA256SUMS` file**, and the npm launcher + verifies the bundle it downloads against it — a mismatch aborts before + anything runs. Releases published before this change have no checksum file, so + the verification is skipped (not failed) when none is available. + +### Fixed +- **`codegraph: no prebuilt bundle for ` after installing through a + registry mirror.** Installing `@colbymchenry/codegraph` from a registry that + hadn't mirrored the matching per-platform package — most often the + npmmirror/cnpm mirrors, but any lazily-syncing mirror or corporate proxy can + do it — left every command failing with `no prebuilt bundle for `. + The runtime ships as a per-platform `optionalDependency`, and npm treats an + optional package it can't fetch as a success and silently skips it, so the + bundle simply went missing. The launcher now self-heals: when the platform + bundle isn't installed, it downloads the same archive from GitHub Releases + (cached under `~/.codegraph/bundles/` for next time) and runs that — so a + global install works even on a mirror that never carried the platform package. 
+ Set `CODEGRAPH_NO_DOWNLOAD=1` to disable the network fallback, or + `CODEGRAPH_DOWNLOAD_BASE=` to point it at your own mirror of the release + archives; the standalone `install.sh` remains the no-Node alternative. Resolves + [#303](https://github.com/colbymchenry/codegraph/issues/303). + ## [0.9.3] - 2026-05-22 ### Added @@ -132,6 +157,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). find its bundle. The release pipeline now verifies every package reached the registry (and is idempotent), so a release can't pass green-but-broken again. +[0.9.4]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.4 [0.9.3]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.3 [0.9.2]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.2 [0.9.1]: https://github.com/colbymchenry/codegraph/releases/tag/v0.9.1 diff --git a/__tests__/npm-shim.test.ts b/__tests__/npm-shim.test.ts new file mode 100644 index 00000000..16e70506 --- /dev/null +++ b/__tests__/npm-shim.test.ts @@ -0,0 +1,208 @@ +/** + * npm thin-installer launcher (`scripts/npm-shim.js`) tests. + * + * The shim runs on the user's own Node, locates the per-platform optionalDependency + * bundle, and — when a registry mirror failed to deliver it (issue #303) — falls + * back to downloading the bundle from GitHub Releases. These tests exercise that + * shim as a real subprocess from a temp "main package" dir (its own package.json + * + node_modules), so resolution and version lookup behave hermetically. + * + * The download/checksum paths run against a local self-signed HTTPS server via + * CODEGRAPH_DOWNLOAD_BASE — no real network, no published release needed. The + * shim is launched with async `spawn` (not spawnSync), so the test's event loop + * stays free to serve those requests. + * + * POSIX only: the fake bundle launcher is a shell script and extraction uses the + * system `tar`. Skipped on Windows (where the shim's exec path differs anyway). 
+ */ + +import { describe, it, expect, beforeAll, afterAll } from 'vitest'; +import { spawn, execSync } from 'child_process'; +import * as https from 'https'; +import * as fs from 'fs'; +import * as os from 'os'; +import * as path from 'path'; +import * as crypto from 'crypto'; +import type { AddressInfo } from 'net'; + +const SHIM_SRC = path.join(__dirname, '..', 'scripts', 'npm-shim.js'); +const target = `${process.platform}-${process.arch}`; +const asset = `codegraph-${target}.tar.gz`; +const isWindows = process.platform === 'win32'; + +function hasOpenssl(): boolean { + try { execSync('openssl version', { stdio: 'ignore' }); return true; } catch { return false; } +} +const CAN_NET = !isWindows && hasOpenssl(); + +function mkTmp(label: string): string { + return fs.mkdtempSync(path.join(os.tmpdir(), `cg-shim-${label}-`)); +} + +// A temp dir standing in for the installed @colbymchenry/codegraph main package. +function makePkg(version = '9.9.9-test'): string { + const dir = mkTmp('pkg'); + fs.copyFileSync(SHIM_SRC, path.join(dir, 'npm-shim.js')); + fs.writeFileSync(path.join(dir, 'package.json'), + JSON.stringify({ name: '@colbymchenry/codegraph', version }) + '\n'); + return dir; +} + +// A fake bundle launcher that prints a marker + its args, so we can prove the +// shim found and exec'd it (and passed args through). +function writeLauncher(binDir: string): void { + fs.mkdirSync(binDir, { recursive: true }); + const p = path.join(binDir, 'codegraph'); + fs.writeFileSync(p, '#!/bin/sh\necho "FAKE_BUNDLE_RAN args:$*"\n'); + fs.chmodSync(p, 0o755); +} + +// Launch the shim with async spawn so the in-process HTTPS server can respond +// while it runs (spawnSync would block this event loop and deadlock). 
+function runShim(pkgDir: string, args: string[], env: Record) { + return new Promise<{ status: number | null; stdout: string; stderr: string }>((resolve) => { + const child = spawn(process.execPath, [path.join(pkgDir, 'npm-shim.js'), ...args], { + env: { ...process.env, ...env }, + }); + let stdout = '', stderr = ''; + child.stdout.on('data', (d) => { stdout += d.toString(); }); + child.stderr.on('data', (d) => { stderr += d.toString(); }); + child.on('close', (status) => resolve({ status, stdout, stderr })); + }); +} + +describe.skipIf(isWindows)('npm-shim launcher', () => { + it('runs the installed optional-dependency bundle without any download', async () => { + const pkg = makePkg(); + const platformPkg = path.join(pkg, 'node_modules', '@colbymchenry', `codegraph-${target}`); + writeLauncher(path.join(platformPkg, 'bin')); + fs.writeFileSync(path.join(platformPkg, 'package.json'), + JSON.stringify({ name: `@colbymchenry/codegraph-${target}`, version: '9.9.9-test' }) + '\n'); + const cache = mkTmp('cache'); + const r = await runShim(pkg, ['--probe-abc'], { CODEGRAPH_INSTALL_DIR: cache }); + + expect(r.status).toBe(0); + expect(r.stdout).toContain('FAKE_BUNDLE_RAN'); + expect(r.stdout).toContain('--probe-abc'); // args passed through + expect(r.stderr).not.toContain('downloading'); // never reached the fallback + expect(fs.existsSync(path.join(cache, 'bundles'))).toBe(false); + }); + + it('uses an already-cached bundle even when downloads are disabled', async () => { + const pkg = makePkg('1.2.3-cached'); + const cache = mkTmp('cache'); + writeLauncher(path.join(cache, 'bundles', `${target}-1.2.3-cached`, 'bin')); + const r = await runShim(pkg, ['--probe-xyz'], { + CODEGRAPH_INSTALL_DIR: cache, + CODEGRAPH_NO_DOWNLOAD: '1', + }); + + expect(r.status).toBe(0); + expect(r.stdout).toContain('FAKE_BUNDLE_RAN'); + expect(r.stdout).toContain('--probe-xyz'); + expect(r.stderr).toBe(''); + }); + + it('prints actionable guidance and exits 1 when disabled with no bundle', 
async () => { + const pkg = makePkg(); + const r = await runShim(pkg, ['--version'], { + CODEGRAPH_INSTALL_DIR: mkTmp('cache'), + CODEGRAPH_NO_DOWNLOAD: '1', + }); + + expect(r.status).toBe(1); + expect(r.stderr).toContain(`no prebuilt bundle for ${target}`); + expect(r.stderr).toContain(`@colbymchenry/codegraph-${target}`); + expect(r.stderr).toContain('--registry=https://registry.npmjs.org'); + expect(r.stderr).toContain('install.sh'); + }); +}); + +describe.skipIf(!CAN_NET)('npm-shim download fallback (local HTTPS)', () => { + let server: https.Server; + let port = 0; + let fixtureBytes: Buffer; + let fixtureSha: string; + let sumsBody: string | null = null; // per-test: SHA256SUMS contents, or null for 404 + + beforeAll(async () => { + // Self-signed cert for the mock release host. + const cdir = mkTmp('tls'); + const keyP = path.join(cdir, 'key.pem'); + const certP = path.join(cdir, 'cert.pem'); + execSync( + `openssl req -x509 -newkey rsa:2048 -nodes -keyout ${keyP} -out ${certP} -days 1 -subj "/CN=localhost"`, + { stdio: 'ignore' }, + ); + + // Build a fake bundle archive (codegraph-/bin/codegraph), like a real release asset. 
+ const work = mkTmp('fixture'); + writeLauncher(path.join(work, `codegraph-${target}`, 'bin')); + const archive = path.join(work, asset); + execSync(`tar -czf ${JSON.stringify(archive)} -C ${JSON.stringify(work)} codegraph-${target}`); + fixtureBytes = fs.readFileSync(archive); + fixtureSha = crypto.createHash('sha256').update(fixtureBytes).digest('hex'); + + server = https.createServer({ key: fs.readFileSync(keyP), cert: fs.readFileSync(certP) }, (req, res) => { + const url = req.url || ''; + if (url.endsWith(`/${asset}`)) { + res.writeHead(200); res.end(fixtureBytes); + } else if (url.endsWith('/SHA256SUMS')) { + if (sumsBody === null) { res.writeHead(404); res.end('not found'); } + else { res.writeHead(200); res.end(sumsBody); } + } else { + res.writeHead(404); res.end('not found'); + } + }); + await new Promise<void>((resolve) => server.listen(0, '127.0.0.1', resolve)); + port = (server.address() as AddressInfo).port; + }, 30000); + + afterAll(() => { server?.close(); }); + + function netEnv(cache: string): Record<string, string> { + return { + CODEGRAPH_INSTALL_DIR: cache, + CODEGRAPH_DOWNLOAD_BASE: `https://127.0.0.1:${port}`, + NODE_TLS_REJECT_UNAUTHORIZED: '0', + }; + } + + it('downloads, verifies the checksum, extracts, and execs the bundle', async () => { + sumsBody = `${fixtureSha} ${asset}\n`; + const pkg = makePkg('5.0.0-net'); + const cache = mkTmp('cache'); + const r = await runShim(pkg, ['--probe-net'], netEnv(cache)); + + expect(r.stderr).toContain('downloading'); + expect(r.stderr).toContain('checksum verified'); + expect(r.status).toBe(0); + expect(r.stdout).toContain('FAKE_BUNDLE_RAN'); + expect(r.stdout).toContain('--probe-net'); + expect(fs.existsSync(path.join(cache, 'bundles', `${target}-5.0.0-net`, 'bin', 'codegraph'))).toBe(true); + }, 20000); + + it('aborts (exit 1) on a checksum mismatch and caches nothing', async () => { + sumsBody = `${'0'.repeat(64)} ${asset}\n`; + const pkg = makePkg('5.0.0-bad'); + const cache = mkTmp('cache'); + const r = await
runShim(pkg, ['--version'], netEnv(cache)); + + expect(r.status).toBe(1); + expect(r.stderr).toContain('checksum mismatch'); + expect(r.stdout).not.toContain('FAKE_BUNDLE_RAN'); // never exec'd a tampered bundle + expect(fs.existsSync(path.join(cache, 'bundles', `${target}-5.0.0-bad`))).toBe(false); + }, 20000); + + it('proceeds when no SHA256SUMS is published (older releases)', async () => { + sumsBody = null; // 404 + const pkg = makePkg('5.0.0-nosums'); + const cache = mkTmp('cache'); + const r = await runShim(pkg, ['--version'], netEnv(cache)); + + expect(r.status).toBe(0); + expect(r.stderr).toContain('downloading'); + expect(r.stderr).not.toContain('checksum verified'); // skipped, not failed + expect(r.stdout).toContain('FAKE_BUNDLE_RAN'); + }, 20000); +}); diff --git a/package.json b/package.json index f813c1e6..5455ced9 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "@colbymchenry/codegraph", - "version": "0.9.3", + "version": "0.9.4", "description": "Supercharge Claude Code with semantic code intelligence. 94% fewer tool calls • 77% faster exploration • 100% local.", "main": "dist/index.js", "types": "dist/index.d.ts", diff --git a/scripts/npm-shim.js b/scripts/npm-shim.js index 81012124..09b435e5 100755 --- a/scripts/npm-shim.js +++ b/scripts/npm-shim.js @@ -11,48 +11,236 @@ // (with node:sqlite), regardless of the user's Node version. The user's Node is // only ever a launcher; even an ancient version can run this file. // +// Self-heal (issue #303): some registries — notably the npmmirror/cnpm mirrors, +// and some corporate proxies — don't reliably mirror the per-platform +// optionalDependencies. npm treats an unfetchable optional dep as success and +// silently skips it, so the bundle goes missing and every command fails. 
When +// the installed bundle can't be resolved, this shim falls back to downloading +// the matching bundle straight from GitHub Releases — the very archive +// install.sh uses — into a cache dir, then runs that. Knobs: +// CODEGRAPH_NO_DOWNLOAD=1 disable the network fallback (print guidance) +// CODEGRAPH_INSTALL_DIR=DIR cache location (default: ~/.codegraph) +// CODEGRAPH_DOWNLOAD_BASE=URL release-download base (for mirrors/air-gapped) +// // Wired up at release time as the main package's `bin`: -// "bin": { "codegraph": "scripts/npm-shim.js" } +// "bin": { "codegraph": "npm-shim.js" } // with the platform packages listed in `optionalDependencies`. var childProcess = require('child_process'); +var fs = require('fs'); +var os = require('os'); +var path = require('path'); var target = process.platform + '-' + process.arch; // e.g. darwin-arm64, linux-x64 var pkg = '@colbymchenry/codegraph-' + target; var isWindows = process.platform === 'win32'; +var REPO = 'colbymchenry/codegraph'; + +main().catch(function (e) { + process.stderr.write('codegraph: ' + (e && e.message ? e.message : String(e)) + '\n'); + process.exit(1); +}); + +async function main() { + // Happy path: the npm-installed optional dependency. Fall back to a download + // when the registry didn't deliver it. + var resolved = resolveInstalledBundle() || (await selfHealBundle()); + var res = childProcess.spawnSync(resolved.command, resolved.args, { stdio: 'inherit' }); + if (res.error) { + process.stderr.write('codegraph: ' + res.error.message + '\n'); + process.exit(1); + } + process.exit(res.status === null ? 1 : res.status); +} -// On Windows the bundle's launcher is a .cmd batch file. Modern Node refuses to -// spawn .cmd/.bat directly — spawnSync throws EINVAL (the CVE-2024-27980 -// hardening, observed on Node 24). So on Windows we skip the .cmd and invoke the -// bundled node.exe against the app entry point directly. On unix the bin launcher -// is a shell script that spawns cleanly. 
-var command, args; -try { +// Resolve the launcher from the installed per-platform optionalDependency. +// Returns {command, args} or null if the package isn't installed. +function resolveInstalledBundle() { + try { + if (isWindows) { + // Modern Node refuses to spawn the bundle's .cmd directly (EINVAL, the + // CVE-2024-27980 hardening on Node 24), so invoke the bundled node.exe + // against the app entry point and pass --liftoff-only here. + var nodeExe = require.resolve(pkg + '/node.exe'); + var entry = require.resolve(pkg + '/lib/dist/bin/codegraph.js'); + return { command: nodeExe, args: liftoff(entry) }; + } + return { command: require.resolve(pkg + '/bin/codegraph'), args: process.argv.slice(2) }; + } catch (e) { + return null; + } +} + +// Locate the launcher inside an extracted GitHub bundle directory (same +// node/lib/bin layout as the npm platform package). Returns {command, args} or +// null when the directory doesn't hold a usable bundle yet. +function launcherIn(dir) { if (isWindows) { - command = require.resolve(pkg + '/node.exe'); - var entry = require.resolve(pkg + '/lib/dist/bin/codegraph.js'); - // --liftoff-only: keep tree-sitter's WASM grammars off V8's turboshaft tier - // to avoid the Zone OOM on Node >= 22 (issues #293/#298). The unix launcher - // passes this too; on Windows we invoke node.exe directly so add it here. 
- args = ['--liftoff-only', entry].concat(process.argv.slice(2)); + var nodeExe = path.join(dir, 'node.exe'); + var entry = path.join(dir, 'lib', 'dist', 'bin', 'codegraph.js'); + if (fs.existsSync(nodeExe) && fs.existsSync(entry)) { + return { command: nodeExe, args: liftoff(entry) }; + } } else { - command = require.resolve(pkg + '/bin/codegraph'); - args = process.argv.slice(2); + var launcher = path.join(dir, 'bin', 'codegraph'); + if (fs.existsSync(launcher)) return { command: launcher, args: process.argv.slice(2) }; + } + return null; +} + +// --liftoff-only keeps tree-sitter's WASM grammars off V8's turboshaft tier to +// avoid the Zone OOM on Node >= 22 (issues #293/#298). The unix bin/codegraph +// launcher already passes it; on Windows we invoke node.exe directly so add it. +function liftoff(entry) { + return ['--liftoff-only', entry].concat(process.argv.slice(2)); +} + +// Download + cache the platform bundle from GitHub Releases. Returns +// {command, args}; exits the process with guidance if it can't. +async function selfHealBundle() { + var version = readVersion(); + var bundlesDir = path.join(process.env.CODEGRAPH_INSTALL_DIR || path.join(os.homedir(), '.codegraph'), 'bundles'); + var dest = path.join(bundlesDir, target + '-' + version); + + // Already downloaded by a previous run? Use it even when downloads are + // disabled — CODEGRAPH_NO_DOWNLOAD blocks fetching, not a cached bundle. + var cached = launcherIn(dest); + if (cached) return cached; + + if (process.env.CODEGRAPH_NO_DOWNLOAD) { + fail('the network fallback is disabled (CODEGRAPH_NO_DOWNLOAD is set).'); } -} catch (e) { + + var asset = 'codegraph-' + target + (isWindows ? 
'.zip' : '.tar.gz'); + var base = process.env.CODEGRAPH_DOWNLOAD_BASE || ('https://github.com/' + REPO + '/releases/download'); + var url = base + '/v' + version + '/' + asset; + process.stderr.write( - 'codegraph: no prebuilt bundle for ' + target + '.\n' + - 'Expected the optional package ' + pkg + ' to be installed.\n' + - 'Try reinstalling: npm i -g @colbymchenry/codegraph\n' + - 'Or use the standalone installer (no Node required):\n' + - ' curl -fsSL https://raw.githubusercontent.com/colbymchenry/codegraph/main/install.sh | sh\n' + 'codegraph: platform bundle missing (registry did not provide ' + pkg + ').\n' + + 'codegraph: downloading ' + asset + ' from GitHub Releases (' + version + ')...\n' ); - process.exit(1); + + // Stage inside bundlesDir so the final rename is on the same filesystem (atomic, + // no EXDEV across tmpfs). Strip the archive's top-level codegraph-/ dir. + fs.mkdirSync(bundlesDir, { recursive: true }); + var stage = fs.mkdtempSync(path.join(bundlesDir, '.dl-')); + try { + var archivePath = path.join(stage, asset); + await download(url, archivePath, 6); + await verifyChecksum(archivePath, asset, base, version); + var extracted = path.join(stage, 'bundle'); + fs.mkdirSync(extracted); + extract(archivePath, extracted); + + var raced = launcherIn(dest); // another process may have finished meanwhile + if (raced) { rmrf(stage); return raced; } + try { + fs.renameSync(extracted, dest); + } catch (e) { + var other = launcherIn(dest); // lost the race but theirs is valid + if (other) { rmrf(stage); return other; } + throw e; + } + } catch (e) { + rmrf(stage); + fail('download failed (' + e.message + ').\n URL: ' + url); + } + rmrf(stage); + + var ready = launcherIn(dest); + if (!ready) fail('downloaded bundle is missing its launcher under ' + dest + '.'); + process.stderr.write('codegraph: bundle ready.\n'); + return ready; +} + +function readVersion() { + try { + return require(path.join(__dirname, 'package.json')).version; + } catch (e) { + 
+ fail('could not read this package\'s version to locate a matching release.'); + } } -var res = childProcess.spawnSync(command, args, { stdio: 'inherit' }); -if (res.error) { - process.stderr.write('codegraph: ' + res.error.message + '\n'); +// GET with manual redirect following (GitHub release URLs redirect to a CDN). +function download(url, dest, redirectsLeft) { + return new Promise(function (resolve, reject) { + var https = require('https'); + // timeout is an idle/inactivity timeout — it won't kill a slow-but-progressing + // download, only a stalled connection (so a blocked mirror fails fast with + // guidance instead of hanging the user's command forever). + var req = https.get(url, { headers: { 'User-Agent': 'codegraph-npm-shim' }, timeout: 30000 }, function (res) { + var status = res.statusCode; + if (status >= 300 && status < 400 && res.headers.location) { + res.resume(); + if (redirectsLeft <= 0) { reject(new Error('too many redirects')); return; } + download(new URL(res.headers.location, url).toString(), dest, redirectsLeft - 1).then(resolve, reject); + return; + } + if (status !== 200) { res.resume(); reject(new Error('HTTP ' + status)); return; } + var file = fs.createWriteStream(dest); + res.on('error', reject); + res.pipe(file); + file.on('error', reject); + file.on('finish', function () { file.close(function () { resolve(); }); }); + }); + req.on('timeout', function () { req.destroy(new Error('connection timed out')); }); + req.on('error', reject); + }); +} + +// Best-effort integrity check. When the release publishes a SHA256SUMS file, the +// downloaded archive MUST match its listed hash or we abort. When that file is +// absent (older releases) or simply unreachable, we proceed: the archive still +// arrived over HTTPS (from GitHub, or CODEGRAPH_DOWNLOAD_BASE). A published checksum +// catches tampering or corruption; a missing one never breaks an install.
+async function verifyChecksum(archivePath, asset, base, version) { + var sumsPath = archivePath + '.SHA256SUMS'; + try { + await download(base + '/v' + version + '/SHA256SUMS', sumsPath, 6); + } catch (e) { + return; // not published / unreachable → skip + } + var expected = null; + var lines = fs.readFileSync(sumsPath, 'utf8').split('\n'); + for (var i = 0; i < lines.length; i++) { + var m = lines[i].trim().match(/^([0-9a-fA-F]{64})\s+\*?(.+)$/); + if (m && path.basename(m[2].trim()) === asset) { expected = m[1].toLowerCase(); break; } + } + if (!expected) return; // asset not listed → nothing to check + var actual = require('crypto').createHash('sha256').update(fs.readFileSync(archivePath)).digest('hex'); + if (actual !== expected) { + throw new Error('checksum mismatch for ' + asset + + ' (expected ' + expected.slice(0, 12) + '…, got ' + actual.slice(0, 12) + '…)'); + } + process.stderr.write('codegraph: checksum verified.\n'); +} + +// Extract via the system tar — present on macOS, Linux, and Windows 10+ +// (bsdtar reads .zip too). No third-party dependency in the shim. +function extract(archive, destDir) { + var args = isWindows + ? ['-xf', archive, '-C', destDir, '--strip-components=1'] + : ['-xzf', archive, '-C', destDir, '--strip-components=1']; + var res = childProcess.spawnSync('tar', args, { stdio: 'ignore' }); + if (res.error) throw new Error('tar unavailable: ' + res.error.message); + if (res.status !== 0) throw new Error('tar exited ' + res.status); +} + +function rmrf(p) { + try { fs.rmSync(p, { recursive: true, force: true }); } catch (e) { /* best effort */ } +} + +function fail(reason) { + process.stderr.write( + 'codegraph: no prebuilt bundle for ' + target + '.\n' + + (reason ? 'codegraph: ' + reason + '\n' : '') + + 'Expected the optional package ' + pkg + ' to be installed.\n' + + 'A registry mirror (e.g. npmmirror/cnpm) that did not mirror the per-platform\n' + + 'package is the usual cause. 
Fixes:\n' + + ' - install from the official registry:\n' + + ' npm i -g @colbymchenry/codegraph --registry=https://registry.npmjs.org\n' + + ' - or use the standalone installer (no Node required):\n' + + ' curl -fsSL https://raw.githubusercontent.com/' + REPO + '/main/install.sh | sh\n' + ); process.exit(1); } -process.exit(res.status === null ? 1 : res.status); From 4e34ba8399198585743b06af8ea168dc7263d4aa Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Fri, 22 May 2026 12:53:07 -0500 Subject: [PATCH 34/47] fix: resolve install.sh latest version without the GitHub API (#325) (#336) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The standalone installer resolved the latest release via the GitHub API, which rate-limits unauthenticated requests to 60/hr per IP and returns 403 on shared or cloud hosts (devboxes, CI) — leaving "could not resolve latest version". It now reads the version from the releases/latest web redirect (no rate limit), falling back to the API, and normalizes CODEGRAPH_VERSION so a bare "0.9.4" works as well as "v0.9.4". Co-authored-by: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 9 +++++++++ install.sh | 14 +++++++++++++- 2 files changed, 22 insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 535b0ce9..3e35df64 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -31,6 +31,15 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). `CODEGRAPH_DOWNLOAD_BASE=` to point it at your own mirror of the release archives; the standalone `install.sh` remains the no-Node alternative. Resolves [#303](https://github.com/colbymchenry/codegraph/issues/303). 
+- **`install.sh` failing with `403` / "could not resolve latest version" on + shared or cloud hosts.** The standalone installer resolved the latest release + through the GitHub API, whose unauthenticated limit is 60 requests/hour per IP + — routinely exhausted on cloud devboxes and CI where many users share an + address, returning `403` (issue #325). It now resolves the version from the + `releases/latest` web redirect, which isn't subject to that API limit (it still + falls back to the API). `CODEGRAPH_VERSION` also accepts a bare `0.9.4` in addition to + `v0.9.4`. Resolves + [#325](https://github.com/colbymchenry/codegraph/issues/325). ## [0.9.3] - 2026-05-22 diff --git a/install.sh b/install.sh index 5cf01346..b4004fb1 100755 --- a/install.sh +++ b/install.sh @@ -44,12 +44,24 @@ esac target="${os}-${arch}" # 2. Resolve the version (latest release unless pinned). +# +# Resolve "latest" from the releases/latest *web* redirect, not the GitHub API: +# the unauthenticated API is rate-limited to 60 requests/hour per IP and returns +# 403 once exhausted — routine on shared/cloud hosts and CI (issue #325). The +# redirect (github.com//releases/latest -> .../releases/tag/vX.Y.Z) has no +# such limit. Fall back to the API if the redirect can't be read. version="${CODEGRAPH_VERSION:-}" +if [ -z "$version" ]; then + version="$(curl -fsSLI -o /dev/null -w '%{url_effective}' "https://github.com/$REPO/releases/latest" \ + | sed -n 's#.*/releases/tag/##p')" +fi if [ -z "$version" ]; then version="$(curl -fsSL "https://api.github.com/repos/$REPO/releases/latest" \ | sed -n 's/.*"tag_name": *"\([^"]*\)".*/\1/p' | head -n1)" fi -[ -n "$version" ] || { echo "codegraph: could not resolve latest version; set CODEGRAPH_VERSION." >&2; exit 1; } +[ -n "$version" ] || { echo "codegraph: could not resolve latest version; set CODEGRAPH_VERSION (e.g. CODEGRAPH_VERSION=v0.9.4)." >&2; exit 1; } +# Release tags are vX.Y.Z; accept a bare X.Y.Z in CODEGRAPH_VERSION too.
+case "$version" in v*) ;; *) version="v$version" ;; esac # 3. Download + extract the bundle. url="https://github.com/$REPO/releases/download/$version/codegraph-${target}.tar.gz" From b13f2f1ba184bed299e000d31c746f75fa4654c6 Mon Sep 17 00:00:00 2001 From: andreinknv Date: Fri, 22 May 2026 14:16:34 -0400 Subject: [PATCH 35/47] perf(db): batch node lookups, fix insertNode cache, run maintenance after writes (#108) Batch getNodesByIds to collapse N+1 reads in graph traversal, invalidate the insertNode LRU cache so INSERT OR REPLACE doesn't serve a stale row, and run incremental PRAGMA optimize + passive WAL checkpoint after bulk writes. Closes #108 --- __tests__/db-perf.test.ts | 161 ++++++++++++++++++++++++++++++++++++++ src/db/index.ts | 30 +++++++ src/db/queries.ts | 59 ++++++++++++++ src/graph/traversal.ts | 116 ++++++++++++++++----------- src/index.ts | 11 +++ 5 files changed, 330 insertions(+), 47 deletions(-) create mode 100644 __tests__/db-perf.test.ts diff --git a/__tests__/db-perf.test.ts b/__tests__/db-perf.test.ts new file mode 100644 index 00000000..256cf92c --- /dev/null +++ b/__tests__/db-perf.test.ts @@ -0,0 +1,161 @@ +/** + * DB Performance / Correctness Tests + * + * Regression tests for three changes: + * 1. Batch `getNodesByIds` collapses graph-traversal N+1 reads. + * 2. `insertNode` invalidates the LRU cache so INSERT OR REPLACE + * doesn't serve a stale cached row on next `getNodeById`. + * 3. `runMaintenance` runs `PRAGMA optimize` + `wal_checkpoint(PASSIVE)` + * after indexAll/sync without throwing. 
+ */ + +import { describe, it, expect, beforeEach, afterEach } from 'vitest'; +import * as fs from 'fs'; +import * as path from 'path'; +import * as os from 'os'; +import { DatabaseConnection } from '../src/db'; +import { QueryBuilder } from '../src/db/queries'; +import { Node } from '../src/types'; + +function makeNode(id: string, name = id): Node { + return { + id, + kind: 'function', + name, + qualifiedName: name, + filePath: 'a.ts', + language: 'typescript', + startLine: 1, + endLine: 1, + startColumn: 0, + endColumn: 0, + updatedAt: Date.now(), + }; +} + +describe('getNodesByIds (batch lookup)', () => { + let dir: string; + let db: DatabaseConnection; + let q: QueryBuilder; + + beforeEach(() => { + dir = fs.mkdtempSync(path.join(os.tmpdir(), 'db-perf-batch-')); + db = DatabaseConnection.initialize(path.join(dir, 'test.db')); + q = new QueryBuilder(db.getDb()); + }); + + afterEach(() => { + db.close(); + if (fs.existsSync(dir)) fs.rmSync(dir, { recursive: true, force: true }); + }); + + it('returns a Map keyed by id, with one entry per existing node', () => { + q.insertNodes([makeNode('n1'), makeNode('n2'), makeNode('n3')]); + const out = q.getNodesByIds(['n1', 'n2', 'n3']); + expect(out.size).toBe(3); + expect(out.get('n1')!.name).toBe('n1'); + expect(out.get('n3')!.name).toBe('n3'); + }); + + it('omits missing IDs from the result map (no nulls, no exceptions)', () => { + q.insertNodes([makeNode('n1'), makeNode('n2')]); + const out = q.getNodesByIds(['n1', 'missing', 'n2']); + expect(out.size).toBe(2); + expect(out.has('missing')).toBe(false); + expect(out.has('n1')).toBe(true); + expect(out.has('n2')).toBe(true); + }); + + it('handles an empty input array', () => { + expect(q.getNodesByIds([]).size).toBe(0); + }); + + it('handles batches over the SQLite parameter limit (chunking)', () => { + // Insert 1500 nodes; the helper chunks at 500 internally. 
+ const nodes = Array.from({ length: 1500 }, (_, i) => makeNode(`n${i}`)); + q.insertNodes(nodes); + const ids = nodes.map((n) => n.id); + const out = q.getNodesByIds(ids); + expect(out.size).toBe(1500); + // Spot-check a few from the first / middle / last chunk. + expect(out.has('n0')).toBe(true); + expect(out.has('n750')).toBe(true); + expect(out.has('n1499')).toBe(true); + }); + + it('serves cache hits from memory and queries only the misses', () => { + q.insertNodes([makeNode('n1'), makeNode('n2'), makeNode('n3')]); + // Warm the cache for n1 only. + q.getNodeById('n1'); + // Update the row directly in SQL (bypassing the cache) so a cache hit is distinguishable from a fresh read. + db.getDb().prepare('UPDATE nodes SET name = ? WHERE id = ?').run('changed', 'n1'); + const out = q.getNodesByIds(['n1', 'n2']); + // The cached n1 (still 'n1', not 'changed') must be returned. + expect(out.get('n1')!.name).toBe('n1'); + expect(out.get('n2')!.name).toBe('n2'); + }); +}); + +describe('insertNode cache invalidation', () => { + let dir: string; + let db: DatabaseConnection; + let q: QueryBuilder; + + beforeEach(() => { + dir = fs.mkdtempSync(path.join(os.tmpdir(), 'db-perf-cache-')); + db = DatabaseConnection.initialize(path.join(dir, 'test.db')); + q = new QueryBuilder(db.getDb()); + }); + + afterEach(() => { + db.close(); + if (fs.existsSync(dir)) fs.rmSync(dir, { recursive: true, force: true }); + }); + + it('does not serve a stale cached node after INSERT OR REPLACE', () => { + // Regression: insertNode (which uses INSERT OR REPLACE) used to skip + // cache invalidation, so the next getNodeById returned the pre-replace + // version until LRU eviction. + const original = makeNode('n1', 'oldName'); + q.insertNode(original); + const beforeReplace = q.getNodeById('n1'); + expect(beforeReplace!.name).toBe('oldName'); + + // Replace via insertNode (the bug path).
+ q.insertNode({ ...original, name: 'newName', updatedAt: Date.now() }); + const afterReplace = q.getNodeById('n1'); + expect(afterReplace!.name).toBe('newName'); + }); +}); + +describe('runMaintenance', () => { + let dir: string; + let db: DatabaseConnection; + + beforeEach(() => { + dir = fs.mkdtempSync(path.join(os.tmpdir(), 'db-perf-maint-')); + db = DatabaseConnection.initialize(path.join(dir, 'test.db')); + }); + + afterEach(() => { + db.close(); + if (fs.existsSync(dir)) fs.rmSync(dir, { recursive: true, force: true }); + }); + + it('runs without throwing on a fresh database', () => { + expect(() => db.runMaintenance()).not.toThrow(); + }); + + it('runs without throwing after writes', () => { + const q = new QueryBuilder(db.getDb()); + q.insertNodes([makeNode('n1'), makeNode('n2')]); + expect(() => db.runMaintenance()).not.toThrow(); + }); + + it('swallows failures rather than propagating (best-effort)', () => { + // Close the DB so the underlying handle would normally throw on any + // exec(). runMaintenance must still not propagate. + db.close(); + expect(() => db.runMaintenance()).not.toThrow(); + }); +}); diff --git a/src/db/index.ts b/src/db/index.ts index 36212de1..cbc08b8f 100644 --- a/src/db/index.ts +++ b/src/db/index.ts @@ -186,6 +186,36 @@ export class DatabaseConnection { this.db.exec('ANALYZE'); } + /** + * Lightweight, non-blocking maintenance to run after bulk writes + * (indexAll, sync). Two operations: + * + * - `PRAGMA optimize` — incremental ANALYZE; SQLite only re-analyzes + * tables whose row counts changed materially since the last + * ANALYZE. Without it, the query planner has no statistics on the + * freshly-bulk-loaded tables and can pick suboptimal indexes. + * + * - `PRAGMA wal_checkpoint(PASSIVE)` — fold pending WAL pages back + * into the main database file so the WAL file doesn't grow + * unboundedly between automatic checkpoints (auto-fires at 1000 + * pages by default; large indexAll runs blow past that). 
+ * + * Failures in either operation are silently swallowed: both are a + * best-effort optimization, never load-bearing for correctness. + */ + runMaintenance(): void { + try { + this.db.exec('PRAGMA optimize'); + } catch { + // ignore + } + try { + this.db.exec('PRAGMA wal_checkpoint(PASSIVE)'); + } catch { + // ignore (e.g., not in WAL mode) + } + } + /** * Close the database connection */ diff --git a/src/db/queries.ts b/src/db/queries.ts index ebba66e6..fae3b754 100644 --- a/src/db/queries.ts +++ b/src/db/queries.ts @@ -224,6 +224,12 @@ export class QueryBuilder { return; } + // INSERT OR REPLACE may overwrite a node we have cached. Drop the + // stale entry so the next getNodeById sees the new row, not the old + // one (matches the cache-invalidation pattern used by updateNode and + // deleteNode below). + this.nodeCache.delete(node.id); + try { this.stmts.insertNode.run({ id: node.id, @@ -380,6 +386,59 @@ export class QueryBuilder { return node; } + /** + * Batch lookup: fetch many nodes by ID in one SQL round-trip per 500-id chunk. + * + * Replaces the N+1 pattern in graph traversal where every edge would + * trigger its own `getNodeById` call. For a function with 50 callers + * this collapses 50 point reads into one IN-list query (~10-50x + * faster end-to-end). + * + * Returns a Map keyed by id so callers can preserve their own ordering + * (typically the order edges were returned from the graph). Missing IDs + * are simply absent from the map. + * + * Cache-aware: ids already in the LRU cache are served from memory and + * the SQL query only touches the misses. + */ + getNodesByIds(ids: readonly string[]): Map<string, Node> { + const out = new Map<string, Node>(); + if (ids.length === 0) return out; + + // Serve cache hits first; build the miss list for SQL.
+ const misses: string[] = []; + for (const id of ids) { + const cached = this.nodeCache.get(id); + if (cached !== undefined) { + // LRU touch + this.nodeCache.delete(id); + this.nodeCache.set(id, cached); + out.set(id, cached); + } else { + misses.push(id); + } + } + if (misses.length === 0) return out; + + // Chunk under SQLite's bound-parameter limit (SQLITE_MAX_VARIABLE_NUMBER is + // 999 before SQLite 3.32.0 and 32766 since); 500 stays safely under both + // and keeps each query plan simple. + const CHUNK = 500; + for (let i = 0; i < misses.length; i += CHUNK) { + const chunk = misses.slice(i, i + CHUNK); + const placeholders = chunk.map(() => '?').join(','); + const rows = this.db + .prepare(`SELECT * FROM nodes WHERE id IN (${placeholders})`) + .all(...chunk) as NodeRow[]; + for (const row of rows) { + const node = rowToNode(row); + out.set(node.id, node); + this.cacheNode(node); + } + } + return out; + } + /** * Add a node to the cache, evicting oldest if needed */ diff --git a/src/graph/traversal.ts b/src/graph/traversal.ts index dd5b5029..c366721b 100644 --- a/src/graph/traversal.ts +++ b/src/graph/traversal.ts @@ -90,29 +90,24 @@ export class GraphTraverser { return priority(a) - priority(b); }); + // Batch-fetch the unvisited neighbors in one query (was N+1 per BFS step). + const wantIds = adjacentEdges + .map((e) => (e.source === node.id ? e.target : e.source)) + .filter((id) => !visited.has(id)); + const neighborNodes = wantIds.length > 0 ? this.queries.getNodesByIds(wantIds) : new Map(); + for (const adjEdge of adjacentEdges) { - // Determine next node: for 'both' direction, edges can be either - // incoming or outgoing, so pick whichever end is not the current node const nextNodeId = adjEdge.source === node.id ?
adjEdge.target : adjEdge.source; + if (visited.has(nextNodeId)) continue; - if (visited.has(nextNodeId)) { - continue; - } - - const nextNode = this.queries.getNodeById(nextNodeId); - if (!nextNode) { - continue; - } + const nextNode = neighborNodes.get(nextNodeId); + if (!nextNode) continue; - // Apply node kind filter if (opts.nodeKinds && opts.nodeKinds.length > 0 && !opts.nodeKinds.includes(nextNode.kind)) { continue; } - // Add node to result nodes.set(nextNode.id, nextNode); - - // Queue for further traversal queue.push({ node: nextNode, edge: adjEdge, depth: depth + 1 }); } } @@ -176,19 +171,18 @@ export class GraphTraverser { // Get adjacent edges const adjacentEdges = this.getAdjacentEdges(node.id, opts.direction, opts.edgeKinds); + // Batch-fetch unvisited neighbors (was N+1 per DFS step). + const wantIds = adjacentEdges + .map((e) => (e.source === node.id ? e.target : e.source)) + .filter((id) => !visited.has(id)); + const neighborNodes = wantIds.length > 0 ? this.queries.getNodesByIds(wantIds) : new Map(); + for (const edge of adjacentEdges) { - // Determine next node: for 'both' direction, edges can be either - // incoming or outgoing, so pick whichever end is not the current node const nextNodeId = edge.source === node.id ? 
edge.target : edge.source; + if (visited.has(nextNodeId)) continue; - if (visited.has(nextNodeId)) { - continue; - } - - const nextNode = this.queries.getNodeById(nextNodeId); - if (!nextNode) { - continue; - } + const nextNode = neighborNodes.get(nextNodeId); + if (!nextNode) continue; // Apply node kind filter if (opts.nodeKinds && opts.nodeKinds.length > 0 && !opts.nodeKinds.includes(nextNode.kind)) { @@ -255,9 +249,15 @@ export class GraphTraverser { visited.add(nodeId); const incomingEdges = this.queries.getIncomingEdges(nodeId, ['calls', 'references', 'imports']); + if (incomingEdges.length === 0) return; + + // Batch-fetch all caller nodes in one round-trip instead of one + // getNodeById per edge (was N+1 — meaningful on functions with many callers). + const sourceIds = incomingEdges.map((e) => e.source); + const callerNodes = this.queries.getNodesByIds(sourceIds); for (const edge of incomingEdges) { - const callerNode = this.queries.getNodeById(edge.source); + const callerNode = callerNodes.get(edge.source); if (callerNode && !visited.has(callerNode.id)) { result.push({ node: callerNode, edge }); this.getCallersRecursive(callerNode.id, maxDepth, currentDepth + 1, result, visited); @@ -294,9 +294,14 @@ export class GraphTraverser { visited.add(nodeId); const outgoingEdges = this.queries.getOutgoingEdges(nodeId, ['calls', 'references', 'imports']); + if (outgoingEdges.length === 0) return; + + // Batch-fetch callee nodes (was N+1 — see getCallersRecursive note). 
+ const targetIds = outgoingEdges.map((e) => e.target); + const calleeNodes = this.queries.getNodesByIds(targetIds); for (const edge of outgoingEdges) { - const calleeNode = this.queries.getNodeById(edge.target); + const calleeNode = calleeNodes.get(edge.target); if (calleeNode && !visited.has(calleeNode.id)) { result.push({ node: calleeNode, edge }); this.getCalleesRecursive(calleeNode.id, maxDepth, currentDepth + 1, result, visited); @@ -388,9 +393,11 @@ export class GraphTraverser { visited.add(nodeId); const outgoingEdges = this.queries.getOutgoingEdges(nodeId, ['extends', 'implements']); + if (outgoingEdges.length === 0) return; + const parents = this.queries.getNodesByIds(outgoingEdges.map((e) => e.target)); for (const edge of outgoingEdges) { - const parentNode = this.queries.getNodeById(edge.target); + const parentNode = parents.get(edge.target); if (parentNode && !nodes.has(parentNode.id)) { nodes.set(parentNode.id, parentNode); edges.push(edge); @@ -411,9 +418,11 @@ export class GraphTraverser { visited.add(nodeId); const incomingEdges = this.queries.getIncomingEdges(nodeId, ['extends', 'implements']); + if (incomingEdges.length === 0) return; + const children = this.queries.getNodesByIds(incomingEdges.map((e) => e.source)); for (const edge of incomingEdges) { - const childNode = this.queries.getNodeById(edge.source); + const childNode = children.get(edge.source); if (childNode && !nodes.has(childNode.id)) { nodes.set(childNode.id, childNode); edges.push(edge); @@ -433,12 +442,13 @@ export class GraphTraverser { // Get all incoming edges (references, calls, type_of, etc.) const incomingEdges = this.queries.getIncomingEdges(nodeId); + if (incomingEdges.length === 0) return result; + // Batch-fetch source nodes (was N+1). 
+ const sources = this.queries.getNodesByIds(incomingEdges.map((e) => e.source)); for (const edge of incomingEdges) { - const sourceNode = this.queries.getNodeById(edge.source); - if (sourceNode) { - result.push({ node: sourceNode, edge }); - } + const sourceNode = sources.get(edge.source); + if (sourceNode) result.push({ node: sourceNode, edge }); } return result; @@ -496,13 +506,16 @@ export class GraphTraverser { const containerKinds = new Set(['class', 'interface', 'struct', 'trait', 'protocol', 'module', 'enum']); if (containerKinds.has(focalNode.kind)) { const containsEdges = this.queries.getOutgoingEdges(nodeId, ['contains']); - for (const edge of containsEdges) { - const childNode = this.queries.getNodeById(edge.target); - if (childNode && !visited.has(childNode.id)) { - nodes.set(childNode.id, childNode); - edges.push(edge); - // Recurse into children at the same depth (they're part of the same symbol) - this.getImpactRecursive(childNode.id, maxDepth, currentDepth, nodes, edges, visited); + if (containsEdges.length > 0) { + const children = this.queries.getNodesByIds(containsEdges.map((e) => e.target)); + for (const edge of containsEdges) { + const childNode = children.get(edge.target); + if (childNode && !visited.has(childNode.id)) { + nodes.set(childNode.id, childNode); + edges.push(edge); + // Recurse into children at the same depth (they're part of the same symbol) + this.getImpactRecursive(childNode.id, maxDepth, currentDepth, nodes, edges, visited); + } } } } @@ -510,9 +523,11 @@ export class GraphTraverser { // Get all incoming edges (things that depend on this node) const incomingEdges = this.queries.getIncomingEdges(nodeId); + if (incomingEdges.length === 0) return; + const sources = this.queries.getNodesByIds(incomingEdges.map((e) => e.source)); for (const edge of incomingEdges) { - const sourceNode = this.queries.getNodeById(edge.source); + const sourceNode = sources.get(edge.source); if (sourceNode && !nodes.has(sourceNode.id)) { 
nodes.set(sourceNode.id, sourceNode); edges.push(edge); @@ -564,10 +579,17 @@ export class GraphTraverser { nodeId, edgeKinds.length > 0 ? edgeKinds : undefined ); + if (outgoingEdges.length === 0) continue; + + // Batch-fetch only the unvisited targets (was N+1 per BFS frontier). + const wantIds = outgoingEdges + .map((e) => e.target) + .filter((id) => !visited.has(id)); + const nextNodes = wantIds.length > 0 ? this.queries.getNodesByIds(wantIds) : new Map(); for (const edge of outgoingEdges) { if (!visited.has(edge.target)) { - const nextNode = this.queries.getNodeById(edge.target); + const nextNode = nextNodes.get(edge.target); if (nextNode) { queue.push({ nodeId: edge.target, @@ -627,15 +649,15 @@ export class GraphTraverser { */ getChildren(nodeId: string): Node[] { const containsEdges = this.queries.getOutgoingEdges(nodeId, ['contains']); - const children: Node[] = []; + if (containsEdges.length === 0) return []; + // Batch-fetch (was N+1). + const childNodes = this.queries.getNodesByIds(containsEdges.map((e) => e.target)); + const children: Node[] = []; for (const edge of containsEdges) { - const childNode = this.queries.getNodeById(edge.target); - if (childNode) { - children.push(childNode); - } + const childNode = childNodes.get(edge.target); + if (childNode) children.push(childNode); } - return children; } } diff --git a/src/index.ts b/src/index.ts index b2acf346..784bdbfa 100644 --- a/src/index.ts +++ b/src/index.ts @@ -347,6 +347,12 @@ export class CodeGraph { }); } + // Refresh planner stats + checkpoint the WAL after bulk writes. + // Cheap and non-blocking; never load-bearing for correctness. + if (result.success && result.filesIndexed > 0) { + this.db.runMaintenance(); + } + return result; } finally { this.fileLock.release(); @@ -428,6 +434,11 @@ export class CodeGraph { } } + // Refresh planner stats + checkpoint the WAL after bulk writes. 
+ if (result.filesAdded > 0 || result.filesModified > 0 || result.filesRemoved > 0) { + this.db.runMaintenance(); + } + return result; } finally { this.fileLock.release(); From 7340892290cb36ae4471e086e65661620602b057 Mon Sep 17 00:00:00 2001 From: SRIKANTH A <147837484+srikaanthh@users.noreply.github.com> Date: Fri, 22 May 2026 14:18:26 -0400 Subject: [PATCH 36/47] fix: bound resolver caches, validate MCP input sizes, add integration tests (#213) Replace the 7 unbounded ReferenceResolver Map caches with a bounded LRU (env-tunable via CODEGRAPH_RESOLVER_CACHE_SIZE) so memory stays flat on large codebases, and add length caps on MCP tool string inputs (query/task/symbol + projectPath/path/pattern) to prevent oversized-payload DoS. Includes LRU, MCP-input-limit, and full-pipeline integration tests. Closes #213 --- __tests__/integration/full-pipeline.test.ts | 244 ++++++++++++++++++ __tests__/integration/lru-cache.test.ts | 96 +++++++ .../integration/mcp-input-limits.test.ts | 109 ++++++++ src/mcp/tools.ts | 73 +++++- src/resolution/index.ts | 48 +++- src/resolution/lru-cache.ts | 62 +++++ 6 files changed, 623 insertions(+), 9 deletions(-) create mode 100644 __tests__/integration/full-pipeline.test.ts create mode 100644 __tests__/integration/lru-cache.test.ts create mode 100644 __tests__/integration/mcp-input-limits.test.ts create mode 100644 src/resolution/lru-cache.ts diff --git a/__tests__/integration/full-pipeline.test.ts b/__tests__/integration/full-pipeline.test.ts new file mode 100644 index 00000000..cb01aa5c --- /dev/null +++ b/__tests__/integration/full-pipeline.test.ts @@ -0,0 +1,244 @@ +/** + * End-to-end pipeline integration tests + * + * Exercises the full happy path that unit tests cover in isolation: + * init → indexAll → resolveReferences → searchNodes/getCallers/buildContext → sync + * + * Also covers two error paths that were previously uncovered: + * - Indexing a file that contains a syntactically invalid snippet + * (parse errors must not abort the 
batch). + * - Sync correctly applies adds + modifies + removes in a single pass. + * + * The main pipeline test generates a synthetic ~120-file project (5k + * files would dwarf the test runner; 120 files of varied TS shape is + * enough to stress the resolver and graph layers without slowing the + * suite to a crawl). The smaller cases build their own tiny fixtures. + */ + +import { describe, it, expect, beforeEach, afterEach } from 'vitest'; +import * as fs from 'fs'; +import * as path from 'path'; +import * as os from 'os'; +import CodeGraph from '../../src/index'; + +function createTempDir(prefix = 'codegraph-int-'): string { + return fs.mkdtempSync(path.join(os.tmpdir(), prefix)); +} + +function cleanupTempDir(dir: string): void { + if (fs.existsSync(dir)) { + fs.rmSync(dir, { recursive: true, force: true }); + } +} + +/** + * Generate a synthetic TypeScript project with the given module count. + * Each module exports a function that calls the previous module's + * function so that the resolver has real import edges + call edges to + * resolve. The first module is a leaf; the last is the root. + */ +function generateSyntheticProject(root: string, moduleCount: number): void { + const srcDir = path.join(root, 'src'); + fs.mkdirSync(srcDir, { recursive: true }); + + // Leaf module — no imports. + fs.writeFileSync( + path.join(srcDir, `mod0.ts`), + `export function fn0(x: number): number { return x + 1; }\n` + + `export class Mod0 { ping(): string { return 'mod0'; } }\n` + ); + + for (let i = 1; i < moduleCount; i++) { + const prev = i - 1; + fs.writeFileSync( + path.join(srcDir, `mod${i}.ts`), + `import { fn${prev}, Mod${prev} } from './mod${prev}';\n` + + `export function fn${i}(x: number): number { return fn${prev}(x) + 1; }\n` + + `export class Mod${i} extends Mod${prev} {\n` + + ` call${i}(): number { return fn${i}(${i}); }\n` + + `}\n` + ); + } + + // Entry point file.
+ fs.writeFileSync( + path.join(srcDir, 'index.ts'), + `import { fn${moduleCount - 1}, Mod${moduleCount - 1} } from './mod${moduleCount - 1}';\n` + + `export function entry(): number {\n` + + ` const m = new Mod${moduleCount - 1}();\n` + + ` return fn${moduleCount - 1}(0) + m.call${moduleCount - 1}();\n` + + `}\n` + ); +} + +describe('Integration: full pipeline', () => { + let tempDir: string; + + beforeEach(() => { + tempDir = createTempDir(); + }); + + afterEach(() => { + cleanupTempDir(tempDir); + }); + + it('runs init → index → resolve → search → callers → context → sync', async () => { + const MODULE_COUNT = 120; + generateSyntheticProject(tempDir, MODULE_COUNT); + + // ── init ────────────────────────────────────────────────────── + const cg = await CodeGraph.init(tempDir, { + config: { include: ['**/*.ts'], exclude: [] }, + }); + + try { + // ── indexAll ──────────────────────────────────────────────── + const indexResult = await cg.indexAll(); + // Synthetic project: MODULE_COUNT mod files + 1 index file. + expect(indexResult.filesIndexed).toBeGreaterThanOrEqual(MODULE_COUNT); + + const statsAfterIndex = cg.getStats(); + expect(statsAfterIndex.fileCount).toBeGreaterThanOrEqual(MODULE_COUNT); + expect(statsAfterIndex.nodeCount).toBeGreaterThan(MODULE_COUNT * 2); + + // ── resolveReferences ──────────────────────────────────────── + // Many call-site edges are wired up during extraction itself, so + // the unresolved-reference queue may already be drained by the + // time we get here. We assert that resolve completes cleanly and + // returns a well-formed result; downstream callers/callees + // assertions verify the graph is actually populated. 
+ cg.reinitializeResolver(); + const resolution = cg.resolveReferences(); + expect(resolution).toBeDefined(); + expect(resolution.stats).toBeDefined(); + expect(typeof resolution.stats.total).toBe('number'); + expect(typeof resolution.stats.resolved).toBe('number'); + + // ── searchNodes ────────────────────────────────────────────── + const entryResults = cg.searchNodes('entry', { limit: 10 }); + expect(entryResults.length).toBeGreaterThan(0); + const entryNode = entryResults.find((r) => r.node.name === 'entry'); + expect(entryNode).toBeDefined(); + + const midResults = cg.searchNodes(`fn50`, { limit: 10 }); + expect(midResults.find((r) => r.node.name === 'fn50')).toBeDefined(); + + // ── getCallers / getCallees ────────────────────────────────── + const fn0Results = cg.searchNodes('fn0', { limit: 5 }); + const fn0Node = fn0Results.find((r) => r.node.name === 'fn0'); + expect(fn0Node).toBeDefined(); + const callers = cg.getCallers(fn0Node!.node.id); + // fn0 is called by fn1 (at least). After resolution this should + // be wired up. + expect(Array.isArray(callers)).toBe(true); + + // ── buildContext ───────────────────────────────────────────── + const context = await cg.buildContext('entry function chain', { + maxNodes: 10, + format: 'markdown', + }); + expect(typeof context).toBe('string'); + expect((context as string).length).toBeGreaterThan(0); + + // ── sync (add + modify + remove in one pass) ───────────────── + // Add: a new file referencing entry(). + fs.writeFileSync( + path.join(tempDir, 'src', 'consumer.ts'), + `import { entry } from './index';\nexport const result = entry();\n` + ); + // Modify: change mod0. 
+ fs.writeFileSync( + path.join(tempDir, 'src', 'mod0.ts'), + `export function fn0(x: number): number { return x + 2; }\n` + + `export function newHelper(): string { return 'new'; }\n` + + `export class Mod0 { ping(): string { return 'mod0v2'; } }\n` + ); + // Remove: drop mod1 — note this will leave dangling imports in + // mod2, which the resolver should tolerate. + fs.unlinkSync(path.join(tempDir, 'src', 'mod1.ts')); + + const syncResult = await cg.sync(); + expect(syncResult.filesAdded).toBeGreaterThanOrEqual(1); + expect(syncResult.filesModified).toBeGreaterThanOrEqual(1); + expect(syncResult.filesRemoved).toBeGreaterThanOrEqual(1); + + // New symbol must now be findable; removed file's symbols gone. + expect(cg.searchNodes('newHelper').length).toBeGreaterThan(0); + + // The removed file must no longer contribute any nodes. + // (FTS prefix matching makes name-based assertions unreliable here — + // Mod10/Mod11/… all start with "Mod1" — so we query the file's nodes + // directly instead.) + const nodesInRemovedFile = cg.getNodesInFile('src/mod1.ts'); + expect(nodesInRemovedFile).toHaveLength(0); + } finally { + cg.destroy(); + } + }, 60_000); + + it('keeps indexing files when one file has a parse error', async () => { + const srcDir = path.join(tempDir, 'src'); + fs.mkdirSync(srcDir, { recursive: true }); + + // Valid files + fs.writeFileSync( + path.join(srcDir, 'good1.ts'), + `export function good1(): number { return 1; }\n` + ); + fs.writeFileSync( + path.join(srcDir, 'good2.ts'), + `export function good2(): number { return 2; }\n` + ); + // Intentionally broken file — unclosed brace, stray tokens. + fs.writeFileSync( + path.join(srcDir, 'broken.ts'), + `export function broken(\n this is { not valid typescript at all\n` + ); + + const cg = await CodeGraph.init(tempDir, { + config: { include: ['**/*.ts'], exclude: [] }, + }); + + try { + const result = await cg.indexAll(); + // The two good files must still be indexed regardless of the + // broken one.
Tree-sitter is error-tolerant so it may still + // extract a partial AST from broken.ts — but the test only + // requires that the batch completes and finds the good symbols. + expect(result.filesIndexed).toBeGreaterThanOrEqual(2); + + const good1 = cg.searchNodes('good1'); + const good2 = cg.searchNodes('good2'); + expect(good1.find((r) => r.node.name === 'good1')).toBeDefined(); + expect(good2.find((r) => r.node.name === 'good2')).toBeDefined(); + } finally { + cg.destroy(); + } + }, 30_000); + + it('handles repeated sync calls when nothing has changed', async () => { + generateSyntheticProject(tempDir, 10); + + const cg = await CodeGraph.init(tempDir, { + config: { include: ['**/*.ts'], exclude: [] }, + }); + + try { + await cg.indexAll(); + const statsBefore = cg.getStats(); + + const first = await cg.sync(); + const second = await cg.sync(); + + // Subsequent sync with no changes should be a no-op. + expect(first.filesAdded + first.filesModified + first.filesRemoved).toBe(0); + expect(second.filesAdded + second.filesModified + second.filesRemoved).toBe(0); + + const statsAfter = cg.getStats(); + expect(statsAfter.fileCount).toBe(statsBefore.fileCount); + expect(statsAfter.nodeCount).toBe(statsBefore.nodeCount); + } finally { + cg.destroy(); + } + }, 30_000); +}); diff --git a/__tests__/integration/lru-cache.test.ts b/__tests__/integration/lru-cache.test.ts new file mode 100644 index 00000000..8156760a --- /dev/null +++ b/__tests__/integration/lru-cache.test.ts @@ -0,0 +1,96 @@ +/** + * LRUCache unit tests + * + * Covers the eviction guarantees that the resolver relies on: + * - capacity is enforced (never exceeds max) + * - LRU ordering: hot keys survive eviction passes + * - has()/get()/set()/clear() behave like the original Map shape + * - null values are storable (the fileCache uses null for "failed read") + */ + +import { describe, it, expect } from 'vitest'; +import { LRUCache } from '../../src/resolution/lru-cache'; + +describe('LRUCache', () => { + 
it('enforces capacity by evicting the oldest entry on overflow', () => { + const cache = new LRUCache(3); + cache.set('a', 1); + cache.set('b', 2); + cache.set('c', 3); + cache.set('d', 4); // evicts 'a' + + expect(cache.size).toBe(3); + expect(cache.has('a')).toBe(false); + expect(cache.get('a')).toBeUndefined(); + expect(cache.get('b')).toBe(2); + expect(cache.get('c')).toBe(3); + expect(cache.get('d')).toBe(4); + }); + + it('promotes touched keys to most-recent so they survive eviction', () => { + const cache = new LRUCache(3); + cache.set('a', 1); + cache.set('b', 2); + cache.set('c', 3); + + // Touch 'a' — it should now be most-recent. + expect(cache.get('a')).toBe(1); + + cache.set('d', 4); // evicts the LRU, which is now 'b' (not 'a') + + expect(cache.has('a')).toBe(true); + expect(cache.has('b')).toBe(false); + expect(cache.has('c')).toBe(true); + expect(cache.has('d')).toBe(true); + }); + + it('overwriting an existing key refreshes its recency but does not grow size', () => { + const cache = new LRUCache(2); + cache.set('a', 1); + cache.set('b', 2); + cache.set('a', 99); // 'a' is now most-recent + + expect(cache.size).toBe(2); + expect(cache.get('a')).toBe(99); + + cache.set('c', 3); // should evict 'b', not 'a' + + expect(cache.has('a')).toBe(true); + expect(cache.has('b')).toBe(false); + expect(cache.has('c')).toBe(true); + }); + + it('stores null values (used by the file content cache)', () => { + const cache = new LRUCache(2); + cache.set('missing.ts', null); + expect(cache.has('missing.ts')).toBe(true); + expect(cache.get('missing.ts')).toBeNull(); + }); + + it('clear() resets the cache', () => { + const cache = new LRUCache(3); + cache.set('a', 1); + cache.set('b', 2); + cache.clear(); + expect(cache.size).toBe(0); + expect(cache.has('a')).toBe(false); + }); + + it('rejects non-positive capacity', () => { + expect(() => new LRUCache(0)).toThrow(); + expect(() => new LRUCache(-1)).toThrow(); + expect(() => new LRUCache(NaN)).toThrow(); + }); + + 
it('stays bounded under heavy churn (regression for OOM scenario)', () => { + const cache = new LRUCache(100); + for (let i = 0; i < 10_000; i++) { + cache.set(`key${i}`, i); + } + expect(cache.size).toBe(100); + // The last 100 keys should still be present, the rest evicted. + expect(cache.has('key9999')).toBe(true); + expect(cache.has('key9900')).toBe(true); + expect(cache.has('key0')).toBe(false); + }); +}); diff --git a/__tests__/integration/mcp-input-limits.test.ts b/__tests__/integration/mcp-input-limits.test.ts new file mode 100644 index 00000000..495d4933 --- /dev/null +++ b/__tests__/integration/mcp-input-limits.test.ts @@ -0,0 +1,109 @@ +/** + * MCP tool input-size limits + * + * Regression coverage for the DoS vector: MCP clients can ship + * unbounded payloads (`query`, `task`, `symbol`, `projectPath`, + * `path`, `pattern`). Before the cap, a 100MB string would hit + * the FTS5 layer and pin the server. These tests assert that the + * tool layer rejects oversize inputs early. 
+ */ + +import { describe, it, expect, beforeEach, afterEach } from 'vitest'; +import * as fs from 'fs'; +import * as path from 'path'; +import * as os from 'os'; +import CodeGraph from '../../src/index'; +import { ToolHandler } from '../../src/mcp/tools'; + +describe('MCP input size limits', () => { + let tempDir: string; + let cg: CodeGraph; + let handler: ToolHandler; + + beforeEach(async () => { + tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'codegraph-mcp-limits-')); + fs.mkdirSync(path.join(tempDir, 'src'), { recursive: true }); + fs.writeFileSync( + path.join(tempDir, 'src', 'a.ts'), + `export function alpha(): number { return 1; }\n` + ); + cg = await CodeGraph.init(tempDir, { + config: { include: ['**/*.ts'], exclude: [] }, + }); + await cg.indexAll(); + handler = new ToolHandler(cg); + }); + + afterEach(() => { + if (cg) cg.destroy(); + if (fs.existsSync(tempDir)) { + fs.rmSync(tempDir, { recursive: true, force: true }); + } + }); + + it('accepts a normal-sized query', async () => { + const result = await handler.execute('codegraph_search', { query: 'alpha' }); + expect(result.isError).toBeFalsy(); + }); + + it('rejects an oversize query on codegraph_search', async () => { + const huge = 'a'.repeat(20_000); + const result = await handler.execute('codegraph_search', { query: huge }); + expect(result.isError).toBe(true); + expect(result.content[0]!.text).toMatch(/maximum length/i); + }); + + it('rejects an oversize task on codegraph_context', async () => { + const huge = 'b'.repeat(50_000); + const result = await handler.execute('codegraph_context', { task: huge }); + expect(result.isError).toBe(true); + expect(result.content[0]!.text).toMatch(/maximum length/i); + }); + + it('rejects an oversize symbol on codegraph_callers', async () => { + const huge = 'c'.repeat(15_000); + const result = await handler.execute('codegraph_callers', { symbol: huge }); + expect(result.isError).toBe(true); + expect(result.content[0]!.text).toMatch(/maximum length/i); + }); 
+ + it('rejects an oversize symbol on codegraph_impact', async () => { + const huge = 'd'.repeat(11_000); + const result = await handler.execute('codegraph_impact', { symbol: huge }); + expect(result.isError).toBe(true); + expect(result.content[0]!.text).toMatch(/maximum length/i); + }); + + it('rejects an oversize projectPath', async () => { + const hugePath = '/tmp/' + 'x'.repeat(5_000); + const result = await handler.execute('codegraph_search', { + query: 'alpha', + projectPath: hugePath, + }); + expect(result.isError).toBe(true); + expect(result.content[0]!.text).toMatch(/projectPath/); + }); + + it('rejects an oversize path filter on codegraph_files', async () => { + const hugePath = 'src/' + 'y'.repeat(5_000); + const result = await handler.execute('codegraph_files', { path: hugePath }); + expect(result.isError).toBe(true); + expect(result.content[0]!.text).toMatch(/path/); + }); + + it('rejects an oversize glob pattern on codegraph_files', async () => { + const hugePattern = '*'.repeat(5_000); + const result = await handler.execute('codegraph_files', { pattern: hugePattern }); + expect(result.isError).toBe(true); + expect(result.content[0]!.text).toMatch(/pattern/); + }); + + it('rejects a non-string projectPath', async () => { + const result = await handler.execute('codegraph_search', { + query: 'alpha', + projectPath: 12345 as unknown as string, + }); + expect(result.isError).toBe(true); + expect(result.content[0]!.text).toMatch(/projectPath/); + }); +}); diff --git a/src/mcp/tools.ts b/src/mcp/tools.ts index 3ceb8551..f15cdc5d 100644 --- a/src/mcp/tools.ts +++ b/src/mcp/tools.ts @@ -22,6 +22,22 @@ import { join } from 'path'; /** Maximum output length to prevent context bloat (characters) */ const MAX_OUTPUT_LENGTH = 15000; +/** + * Maximum length for free-form string inputs (query, task, symbol). 
+ * Bounds memory and CPU when a buggy or hostile MCP client sends a + * huge payload — without this an attacker could ship a 100MB string + * and force a full FTS5 scan / OOM the server. 10 000 characters is + * far beyond any realistic legitimate query. + */ +const MAX_INPUT_LENGTH = 10_000; + +/** + * Maximum length for path-like string inputs (projectPath, path + * filter, glob pattern). Paths beyond a few thousand chars are + * never legitimate and signal abuse or a bug upstream. + */ +const MAX_PATH_LENGTH = 4_096; + /** * Rust path roots that have no file-system equivalent — `crate` is the * current crate, `super` is the parent module, `self` is the current @@ -609,12 +625,46 @@ export class ToolHandler { } /** - * Validate that a value is a non-empty string + * Validate that a value is a non-empty string within length bounds. + * + * The `maxLength` cap protects against MCP clients that ship huge + * payloads (10MB+ query strings either by accident or maliciously). + * Without this, a single oversized input can pin the FTS5 index or + * exhaust memory before any real work runs. */ - private validateString(value: unknown, name: string): string | ToolResult { + private validateString( + value: unknown, + name: string, + maxLength: number = MAX_INPUT_LENGTH + ): string | ToolResult { if (typeof value !== 'string' || value.length === 0) { return this.errorResult(`${name} must be a non-empty string`); } + if (value.length > maxLength) { + return this.errorResult( + `${name} exceeds maximum length of ${maxLength} characters (got ${value.length})` + ); + } + return value; + } + + /** + * Validate an optional path-like string input. Returns the value if + * valid (or undefined), or a ToolResult with the error. 
+ */ + private validateOptionalPath( + value: unknown, + name: string + ): string | undefined | ToolResult { + if (value === undefined || value === null) return undefined; + if (typeof value !== 'string') { + return this.errorResult(`${name} must be a string`); + } + if (value.length > MAX_PATH_LENGTH) { + return this.errorResult( + `${name} exceeds maximum length of ${MAX_PATH_LENGTH} characters (got ${value.length})` + ); + } return value; } @@ -623,6 +673,25 @@ export class ToolHandler { */ async execute(toolName: string, args: Record): Promise { try { + // Cross-cutting input validation for path-shaped arguments. All + // tools accept an optional `projectPath`, so its length is bounded + // here once; free-form `query`/`task`/`symbol` inputs are capped by + // validateString inside each handler. + const pathCheck = this.validateOptionalPath(args.projectPath, 'projectPath'); + if (typeof pathCheck === 'object' && pathCheck !== undefined) { + return pathCheck; + } + // The `path` and `pattern` properties used by codegraph_files are + // also path-shaped — apply the same cap. + if (args.path !== undefined) { + const check = this.validateOptionalPath(args.path, 'path'); + if (typeof check === 'object' && check !== undefined) return check; + } + if (args.pattern !== undefined) { + const check = this.validateOptionalPath(args.pattern, 'pattern'); + if (typeof check === 'object' && check !== undefined) return check; + } + switch (toolName) { case 'codegraph_search': return await this.handleSearch(args); diff --git a/src/resolution/index.ts b/src/resolution/index.ts index 34aa4b90..2ae85ccb 100644 --- a/src/resolution/index.ts +++ b/src/resolution/index.ts @@ -22,6 +22,24 @@ import { detectFrameworks } from './frameworks'; import { loadProjectAliases, type AliasMap } from './path-aliases'; import { logDebug } from '../errors'; import type { ReExport } from './types'; +import { LRUCache } from './lru-cache'; + +/** + * Cache size limits.
Each per-resolver cache is bounded so memory + * stays flat on large codebases (20k+ files). Sizes were chosen to + * cover the working set for typical resolution batches without + * exceeding a few hundred MB worst-case. Override via the env var + * `CODEGRAPH_RESOLVER_CACHE_SIZE` (single integer applied to all + * caches) when tuning for very large or very small projects. + */ +const DEFAULT_CACHE_LIMIT = 5_000; +function resolveCacheLimit(): number { + const raw = process.env.CODEGRAPH_RESOLVER_CACHE_SIZE; + if (!raw) return DEFAULT_CACHE_LIMIT; + const parsed = Number.parseInt(raw, 10); + if (Number.isFinite(parsed) && parsed > 0) return parsed; + return DEFAULT_CACHE_LIMIT; +} // Re-export types export * from './types'; @@ -121,13 +139,16 @@ export class ReferenceResolver { private queries: QueryBuilder; private context: ResolutionContext; private frameworks: FrameworkResolver[] = []; - private nodeCache: Map = new Map(); // per-file node cache (bounded) - private fileCache: Map = new Map(); // per-file content cache (bounded) - private importMappingCache: Map = new Map(); - private reExportCache: Map = new Map(); - private nameCache: Map = new Map(); // name → nodes cache - private lowerNameCache: Map = new Map(); // lower(name) → nodes cache - private qualifiedNameCache: Map = new Map(); // qualified_name → nodes cache + // All per-resolver caches are LRU-bounded. Previously these were + // unbounded Maps that grew with every distinct lookup and OOM'd on + // codebases with 20k+ files (see issue: unbounded cache growth). 
+ private nodeCache: LRUCache; // per-file node cache + private fileCache: LRUCache; // per-file content cache + private importMappingCache: LRUCache; + private reExportCache: LRUCache; + private nameCache: LRUCache; // name → nodes cache + private lowerNameCache: LRUCache; // lower(name) → nodes cache + private qualifiedNameCache: LRUCache; // qualified_name → nodes cache private knownNames: Set | null = null; // all known symbol names for fast pre-filtering private knownFiles: Set | null = null; private cachesWarmed = false; @@ -139,6 +160,19 @@ export class ReferenceResolver { constructor(projectRoot: string, queries: QueryBuilder) { this.projectRoot = projectRoot; this.queries = queries; + + const limit = resolveCacheLimit(); + // The content cache is heavier (full file text), so we give it a + // smaller budget than the metadata caches. + const contentLimit = Math.max(64, Math.floor(limit / 5)); + this.nodeCache = new LRUCache(limit); + this.fileCache = new LRUCache(contentLimit); + this.importMappingCache = new LRUCache(limit); + this.reExportCache = new LRUCache(limit); + this.nameCache = new LRUCache(limit); + this.lowerNameCache = new LRUCache(limit); + this.qualifiedNameCache = new LRUCache(limit); + this.context = this.createContext(); } diff --git a/src/resolution/lru-cache.ts b/src/resolution/lru-cache.ts new file mode 100644 index 00000000..2a597ddb --- /dev/null +++ b/src/resolution/lru-cache.ts @@ -0,0 +1,62 @@ +/** + * Simple LRU cache backed by JavaScript's insertion-ordered Map. + * + * Used by ReferenceResolver to bound the per-resolver caches that + * previously grew without limit and OOM'd on large codebases (20k+ + * files). Each cache is sized independently — see `index.ts` for + * the chosen limits per cache type. + * + * Eviction is plain LRU: on `set`, if the cache is full, the + * least-recently-used entry (the first one in iteration order) is + * evicted. 
Touching via `get` moves the entry to the most-recently-used + * position so hot keys survive eviction passes. + */ +export class LRUCache<K, V> { + private readonly max: number; + private readonly store = new Map<K, V>(); + + constructor(max: number) { + if (!Number.isFinite(max) || max < 1) { + throw new Error(`LRUCache max must be a finite number >= 1, got ${max}`); + } + this.max = Math.floor(max); + } + + get size(): number { + return this.store.size; + } + + get(key: K): V | undefined { + const value = this.store.get(key); + if (value === undefined) { + // Missing key. Callers never store undefined, so a stored-undefined + // entry is handled the same way and its recency is not refreshed. + return undefined; + } + // Refresh recency by re-inserting. + this.store.delete(key); + this.store.set(key, value); + return value; + } + + has(key: K): boolean { + return this.store.has(key); + } + + set(key: K, value: V): void { + if (this.store.has(key)) { + this.store.delete(key); + } else if (this.store.size >= this.max) { + // Evict the oldest entry — first key in iteration order. + const oldest = this.store.keys().next().value; + if (oldest !== undefined) { + this.store.delete(oldest); + } + } + this.store.set(key, value); + } + + clear(): void { + this.store.clear(); + } +} From 23ad4ea923b31ed7f3aafe313f137ec7a822f91c Mon Sep 17 00:00:00 2001 From: Baijack-star <71923891+Baijack-star@users.noreply.github.com> Date: Sat, 23 May 2026 02:20:08 +0800 Subject: [PATCH 37/47] fix(mcp): cap codegraph_context output to prevent context bloat (#296) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Route handleContext's output through the shared truncateOutput cap (MAX_OUTPUT_LENGTH) so codegraph_context can no longer blow past the context budget — every sibling MCP tool already truncates; this was the one uncapped output path.
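For illustration, here is a minimal sketch of such a shared output cap. It assumes a hypothetical standalone `truncateOutput(text, maxLength)` that cuts at the last newline inside `MAX_OUTPUT_LENGTH` and appends the `... (output truncated)` marker the new test checks for. The actual `ToolHandler.truncateOutput` may choose its cut point and marker spacing differently.

```typescript
// Hypothetical stand-in for ToolHandler.truncateOutput; the real helper may differ.
const MAX_OUTPUT_LENGTH = 15000; // same value as the constant in src/mcp/tools.ts
const TRUNCATION_MARKER = '\n\n... (output truncated)'; // assumed spacing

function truncateOutput(text: string, maxLength: number = MAX_OUTPUT_LENGTH): string {
  // Outputs within budget pass through untouched.
  if (text.length <= maxLength) return text;
  // Prefer cutting at the last newline inside the budget so no line is split.
  const cut = text.lastIndexOf('\n', maxLength);
  const end = cut > 0 ? cut : maxLength;
  return text.slice(0, end) + TRUNCATION_MARKER;
}

// Usage: mirrors the 400-line oversized context built by the new security test.
const oversized = Array.from({ length: 400 }, (_, i) => `line-${i} ${'x'.repeat(80)}`).join('\n');
const capped = truncateOutput(oversized);
console.log(capped.length < oversized.length); // true
console.log(capped.endsWith('... (output truncated)')); // true
```

Because the marker is appended after the cut, the result can exceed `maxLength` by the marker's length; a stricter variant would subtract that length from the budget first.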
Closes #296 Co-authored-by: Baijack-star <71923891+Baijack-star@users.noreply.github.com> --- __tests__/security.test.ts | 14 ++++++++++++++ src/mcp/tools.ts | 4 ++-- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/__tests__/security.test.ts b/__tests__/security.test.ts index 782b99da..c57158c2 100644 --- a/__tests__/security.test.ts +++ b/__tests__/security.test.ts @@ -239,6 +239,20 @@ describe('MCP Input Validation', () => { expect(result.content[0].text).toContain('non-empty string'); }); + it('should truncate oversized codegraph_context output', async () => { + const oversizedContext = Array.from({ length: 400 }, (_, i) => `line-${i} ${'x'.repeat(80)}`).join('\n'); + const fakeCg = { + buildContext: async () => oversizedContext, + }; + const fakeHandler = new ToolHandler(fakeCg as unknown as CodeGraph); + + const result = await fakeHandler.execute('codegraph_context', { task: 'find example' }); + + expect(result.isError).toBeFalsy(); + expect(result.content[0].text.length).toBeLessThan(oversizedContext.length); + expect(result.content[0].text).toContain('... 
(output truncated)'); + }); + it('should reject non-string symbol in codegraph_impact', async () => { const result = await handler.execute('codegraph_impact', { symbol: [] }); expect(result.isError).toBe(true); diff --git a/src/mcp/tools.ts b/src/mcp/tools.ts index f15cdc5d..dfd41542 100644 --- a/src/mcp/tools.ts +++ b/src/mcp/tools.ts @@ -775,11 +775,11 @@ export class ToolHandler { // buildContext returns string when format is 'markdown' if (typeof context === 'string') { - return this.textResult(context + reminder); + return this.textResult(this.truncateOutput(context + reminder)); } // If it returns TaskContext, format it - return this.textResult(this.formatTaskContext(context) + reminder); + return this.textResult(this.truncateOutput(this.formatTaskContext(context) + reminder)); } /** From c9d2a25b73c3fc66c0f464a1ec0e0eb1cf53de65 Mon Sep 17 00:00:00 2001 From: Colby McHenry Date: Fri, 22 May 2026 14:12:40 -0500 Subject: [PATCH 38/47] docs: validate Windows PRs via Parallels+SSH; gitignore .parallels Document the Mac-host -> Parallels Windows 11 SSH workflow for validating Windows-specific behavior, the win32-gated test convention (it.runIf), and guest toolchain quirks (PATH refresh, Windows-local clone, VC++ ARM64 redist). Co-Authored-By: Claude Opus 4.7 (1M context) --- .gitignore | 3 +++ CLAUDE.md | 20 ++++++++++++++++++++ 2 files changed, 23 insertions(+) diff --git a/.gitignore b/.gitignore index 435882b3..f7aa9d68 100644 --- a/.gitignore +++ b/.gitignore @@ -40,6 +40,9 @@ npm-debug.log* # Local Claude settings .claude/settings.local.json +# Parallels Windows VM SSH/connection config (local machine, see CLAUDE.md) +.parallels + # CodeGraph data directories (in test projects) .codegraph/ diff --git a/CLAUDE.md b/CLAUDE.md index d5222f37..be63c67b 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -101,6 +101,26 @@ Tests live in `__tests__/` and mirror the module they cover. Notable ones beyond Tests create temp dirs with `fs.mkdtempSync` and clean up in `afterEach`. 
They write real files and exercise real SQLite — there is no DB mocking. +### Windows-gated tests + +Behavior that differs by platform (path resolution, drive letters, `SENSITIVE_PATHS`, `%APPDATA%` config dirs, CRLF) must be gated, not assumed. Use `it.runIf(process.platform === 'win32')(...)` for Windows-only assertions and `it.runIf(process.platform !== 'win32')(...)` for POSIX-only ones — e.g. `/etc` is sensitive on POSIX but resolves to `C:\etc` (non-existent) on Windows, so an ungated `/etc` assertion fails on Windows. Validate the Windows side for real (see below); don't merge a Windows-gated test you haven't seen run. + +## Windows validation (Parallels + SSH) + +For any Windows-specific PR, bug, or implementation, validate it on the real Windows VM rather than guessing. Connection details live in the gitignored **`.parallels`** file at the repo root (VM name, guest IP, SSH user/key). `prlctl exec` needs Parallels Pro and is unavailable, so SSH is the bridge. + +- Connect / run from the Mac host: `ssh @ "..."`. For multi-line work, pipe PowerShell over stdin and **refresh PATH from the registry** first (sshd's session has a stale PATH after winget installs): + ``` + ssh colby@10.211.55.3 "powershell -NoProfile -ExecutionPolicy Bypass -Command -" <<'PS' + $env:Path = [Environment]::GetEnvironmentVariable("Path","Machine") + ";" + [Environment]::GetEnvironmentVariable("Path","User") + Set-Location C:\dev\codegraph + PS + ``` +- Clone fresh into a **Windows-local** path (`C:\dev\codegraph`) and `npm ci` there — never run npm against the shared Mac repo, since `esbuild`/`rollup` ship platform-specific binaries. +- Guest toolchain (winget): Node LTS, Git, and the **VC++ ARM64 redistributable** (required by `@rollup/rollup-win32-arm64-msvc`, which vitest pulls in). +- Fetch a contributor PR head straight from their fork to dodge `pull//head` lag: `git fetch ` then `git checkout -f FETCH_HEAD`. 
+- Known pre-existing Windows failure: `security.test.ts > Session marker symlink resistance > does not follow a pre-planted symlink` (symlink creation needs privileges on Windows). Unrelated to current work; don't let it mask new regressions. + ## Releases Released to npm and mirrored as [GitHub Releases](https://github.com/colbymchenry/codegraph/releases). `CHANGELOG.md` is the source of truth; GitHub Release notes are extracted from it. From 7d5dd4cda7402bb2c9f467851ceed7f7115919a3 Mon Sep 17 00:00:00 2001 From: "Leon.C" <160379708+zichen0116@users.noreply.github.com> Date: Sat, 23 May 2026 03:13:42 +0800 Subject: [PATCH 39/47] fix: remove dead try/catch in insertNode; fix SENSITIVE_PATHS case-sensitivity (#327) Drop the no-op try/catch around insertNode.run, and lowercase the Windows SENSITIVE_PATHS entries so validateProjectPath's case-insensitive check actually blocks c:\windows. Adds a validateProjectPath test (POSIX + Windows-gated); the Windows-gated case was validated on a real Windows 11 VM. 
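A minimal sketch of why the lowercase entries matter. It assumes a trimmed-down, hypothetical `isSensitive` helper and only the Windows entries of SENSITIVE_PATHS; the real validateProjectPath in src/utils.ts does more, such as existence checks. `path.win32` is used so the sketch behaves the same on any host OS. Because the probe is lowercased before lookup, mixed-case entries can never match.

```typescript
import * as path from 'path';

// Hypothetical subsets of SENSITIVE_PATHS, before and after the fix.
const MIXED_CASE = new Set<string>(['C:\\', 'C:\\Windows', 'C:\\Windows\\System32']);
const LOWERCASE = new Set<string>(['c:\\', 'c:\\windows', 'c:\\windows\\system32']);

// Hypothetical helper: resolve with Windows semantics, then compare
// case-insensitively by lowercasing the probe (as validateProjectPath does).
function isSensitive(entries: Set<string>, input: string): boolean {
  const resolved = path.win32.resolve(input);
  return entries.has(resolved.toLowerCase());
}

for (const p of ['C:\\Windows', 'c:\\windows', 'C:\\WINDOWS\\System32']) {
  // Mixed-case entries never match a lowercased probe; lowercase ones do.
  console.log(p, isSensitive(MIXED_CASE, p), isSensitive(LOWERCASE, p));
}
```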
Closes #327 --- __tests__/security.test.ts | 32 ++++++++++++++++++++++++- src/db/queries.ts | 48 +++++++++++++++++--------------------- src/utils.ts | 2 +- 3 files changed, 54 insertions(+), 28 deletions(-) diff --git a/__tests__/security.test.ts b/__tests__/security.test.ts index c57158c2..abb70fe6 100644 --- a/__tests__/security.test.ts +++ b/__tests__/security.test.ts @@ -12,7 +12,7 @@ import { describe, it, expect, beforeEach, afterEach } from 'vitest'; import * as fs from 'fs'; import * as path from 'path'; import * as os from 'os'; -import { FileLock } from '../src/utils'; +import { FileLock, validateProjectPath } from '../src/utils'; import CodeGraph from '../src/index'; import { ToolHandler, tools } from '../src/mcp/tools'; import { scanDirectory, isSourceFile } from '../src/extraction'; @@ -176,6 +176,36 @@ describe('Path Traversal Prevention', () => { }); }); +describe('validateProjectPath — sensitive directory blocking', () => { + // POSIX-only: on Windows '/etc' resolves to C:\etc (non-existent), not a + // sensitive dir — the Windows case is covered by the win32-gated test below. + it.runIf(process.platform !== 'win32')('blocks POSIX system directories (exact match)', () => { + expect(validateProjectPath('/')).toMatch(/sensitive system directory/i); + expect(validateProjectPath('/etc')).toMatch(/sensitive system directory/i); + }); + + it('allows a normal, existing directory', () => { + const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cg-validate-')); + try { + expect(validateProjectPath(dir)).toBeNull(); + } finally { + fs.rmSync(dir, { recursive: true, force: true }); + } + }); + + // SENSITIVE_PATHS stores the Windows entries lowercase and validateProjectPath + // matches via resolved.toLowerCase(), so 'C:\\Windows' and 'c:\\windows' are + // both blocked. path.resolve is platform-specific, so this only runs on Windows. 
+ it.runIf(process.platform === 'win32')( + 'blocks Windows system directories regardless of case', + () => { + expect(validateProjectPath('C:\\Windows')).toMatch(/sensitive system directory/i); + expect(validateProjectPath('c:\\windows')).toMatch(/sensitive system directory/i); + expect(validateProjectPath('C:\\WINDOWS\\System32')).toMatch(/sensitive system directory/i); + } + ); +}); + describe('MCP Input Validation', () => { let testDir: string; let cg: CodeGraph; diff --git a/src/db/queries.ts b/src/db/queries.ts index fae3b754..9419a313 100644 --- a/src/db/queries.ts +++ b/src/db/queries.ts @@ -230,32 +230,28 @@ export class QueryBuilder { // deleteNode below). this.nodeCache.delete(node.id); - try { - this.stmts.insertNode.run({ - id: node.id, - kind: node.kind, - name: node.name, - qualifiedName: node.qualifiedName ?? node.name, - filePath: node.filePath, - language: node.language, - startLine: node.startLine ?? 0, - endLine: node.endLine ?? 0, - startColumn: node.startColumn ?? 0, - endColumn: node.endColumn ?? 0, - docstring: node.docstring ?? null, - signature: node.signature ?? null, - visibility: node.visibility ?? null, - isExported: node.isExported ? 1 : 0, - isAsync: node.isAsync ? 1 : 0, - isStatic: node.isStatic ? 1 : 0, - isAbstract: node.isAbstract ? 1 : 0, - decorators: node.decorators ? JSON.stringify(node.decorators) : null, - typeParameters: node.typeParameters ? JSON.stringify(node.typeParameters) : null, - updatedAt: node.updatedAt ?? Date.now(), - }); - } catch (error) { - throw error; - } + this.stmts.insertNode.run({ + id: node.id, + kind: node.kind, + name: node.name, + qualifiedName: node.qualifiedName ?? node.name, + filePath: node.filePath, + language: node.language, + startLine: node.startLine ?? 0, + endLine: node.endLine ?? 0, + startColumn: node.startColumn ?? 0, + endColumn: node.endColumn ?? 0, + docstring: node.docstring ?? null, + signature: node.signature ?? null, + visibility: node.visibility ?? 
null, + isExported: node.isExported ? 1 : 0, + isAsync: node.isAsync ? 1 : 0, + isStatic: node.isStatic ? 1 : 0, + isAbstract: node.isAbstract ? 1 : 0, + decorators: node.decorators ? JSON.stringify(node.decorators) : null, + typeParameters: node.typeParameters ? JSON.stringify(node.typeParameters) : null, + updatedAt: node.updatedAt ?? Date.now(), + }); } /** diff --git a/src/utils.ts b/src/utils.ts index e75e58e0..1ee1c937 100644 --- a/src/utils.ts +++ b/src/utils.ts @@ -43,7 +43,7 @@ import * as path from 'path'; const SENSITIVE_PATHS = new Set([ '/', '/etc', '/usr', '/bin', '/sbin', '/var', '/tmp', '/dev', '/proc', '/sys', '/root', '/boot', '/lib', '/lib64', '/opt', - 'C:\\', 'C:\\Windows', 'C:\\Windows\\System32', + 'c:\\', 'c:\\windows', 'c:\\windows\\system32', ]); /** From 02ea482b3734c6eff1c0293d360fe75ea3086000 Mon Sep 17 00:00:00 2001 From: Aditya Rawat Date: Sat, 23 May 2026 00:45:02 +0530 Subject: [PATCH 40/47] fix: validate projectPath in MCP handler to block sensitive directories (#230) Validate projectPath in getCodeGraph so MCP clients can't open a codegraph in a sensitive system directory. Guarded with existsSync so nested/not-yet-created sub-paths still resolve up to the default project (preserves issue #238). Adds MCP-handler rejection tests (POSIX + Windows-gated); validated on a real Windows 11 VM. Closes #230 --- __tests__/security.test.ts | 28 ++++++++++++++++++++++++++++ src/mcp/tools.ts | 14 +++++++++++++- 2 files changed, 41 insertions(+), 1 deletion(-) diff --git a/__tests__/security.test.ts b/__tests__/security.test.ts index abb70fe6..75ac8432 100644 --- a/__tests__/security.test.ts +++ b/__tests__/security.test.ts @@ -307,6 +307,34 @@ describe('MCP Input Validation', () => { const result = await handler.execute('codegraph_search', { query: 'example', limit: -5 }); expect(result.isError).toBeFalsy(); }); + + // #230: getCodeGraph must reject a sensitive system directory passed as + // projectPath before opening it. 
The error surfaces through execute()'s + // catch as an isError result. /etc is sensitive on POSIX; C:\Windows on + // Windows (path.resolve is platform-specific, so each case is gated). + it.runIf(process.platform !== 'win32')( + 'rejects a sensitive POSIX projectPath (/etc) via the MCP handler', + async () => { + const result = await handler.execute('codegraph_search', { + query: 'example', + projectPath: '/etc', + }); + expect(result.isError).toBe(true); + expect(result.content[0].text).toMatch(/sensitive system directory/i); + } + ); + + it.runIf(process.platform === 'win32')( + 'rejects a sensitive Windows projectPath (C:\\Windows) via the MCP handler', + async () => { + const result = await handler.execute('codegraph_search', { + query: 'example', + projectPath: 'C:\\Windows', + }); + expect(result.isError).toBe(true); + expect(result.content[0].text).toMatch(/sensitive system directory/i); + } + ); }); describe('Atomic Writes', () => { diff --git a/src/mcp/tools.ts b/src/mcp/tools.ts index dfd41542..deb8dfdc 100644 --- a/src/mcp/tools.ts +++ b/src/mcp/tools.ts @@ -15,7 +15,7 @@ import { readFileSync, writeSync, } from 'fs'; -import { clamp, validatePathWithinRoot } from '../utils'; +import { clamp, validatePathWithinRoot, validateProjectPath } from '../utils'; import { tmpdir } from 'os'; import { join } from 'path'; @@ -579,6 +579,18 @@ export class ToolHandler { return this.projectCache.get(projectPath)!; } + // Reject sensitive system directories before opening. Only validate a + // path that actually exists — a nested or not-yet-created sub-path of a + // real project must still be allowed to resolve UP to its .codegraph/ + // root below (issue #238), so we don't run the existence-checking + // validator on paths that are meant to walk up. 
+ if (existsSync(projectPath)) { + const pathError = validateProjectPath(projectPath); + if (pathError) { + throw new Error(pathError); + } + } + // Walk up parent directories to find nearest .codegraph/ const resolvedRoot = findNearestCodeGraphRoot(projectPath); From 6f4b52151202fe04a086bd999b6d6239f72fe33b Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Fri, 22 May 2026 14:23:10 -0500 Subject: [PATCH 41/47] fix(mcp): make session-marker symlink resistance work on Windows (#337) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit O_NOFOLLOW is undefined on Windows (libuv ignores it), so the bitwise-OR silently dropped it and markSessionConsulted would follow a pre-planted symlink at the tmp marker path — the CWE-59 gap #280 closed on POSIX but not Windows. Add a cross-platform lstatSync isSymbolicLink() refuse-check before openSync (O_NOFOLLOW stays as the atomic, TOCTOU-free guard on POSIX). The existing Session-marker-symlink-resistance test now passes on Windows. Refs #280 Co-authored-by: Claude Opus 4.7 (1M context) --- src/mcp/tools.ts | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/src/mcp/tools.ts b/src/mcp/tools.ts index deb8dfdc..16df373d 100644 --- a/src/mcp/tools.ts +++ b/src/mcp/tools.ts @@ -11,6 +11,7 @@ import { constants as fsConstants, closeSync, existsSync, + lstatSync, openSync, readFileSync, writeSync, @@ -224,6 +225,16 @@ function markSessionConsulted(sessionId: string): void { try { const hash = createHash('md5').update(sessionId).digest('hex').slice(0, 16); const markerPath = join(tmpdir(), `codegraph-consulted-${hash}`); + // Refuse to follow a pre-planted symlink at the marker path (CWE-59). + // O_NOFOLLOW (below) is the atomic, TOCTOU-free guard on POSIX, but it is + // `undefined` on Windows (libuv ignores it), so the bitwise-OR silently + // drops it and openSync would follow the link. 
This lstat check closes that + // gap cross-platform; ENOENT (path is free) falls through to create it. + try { + if (lstatSync(markerPath).isSymbolicLink()) return; + } catch { + // No existing entry (or stat failed) — nothing to refuse; proceed. + } // O_NOFOLLOW makes openSync throw ELOOP if markerPath is already a symlink. // O_CREAT + O_TRUNC keep the original "create-or-overwrite" semantics, and // mode 0o600 prevents readback by other local users (the marker payload is From fd6a649518d306a02d61b58c1e480ddcffbf4b21 Mon Sep 17 00:00:00 2001 From: Andrew Barnes Date: Fri, 22 May 2026 15:49:49 -0400 Subject: [PATCH 42/47] docs(readme): link support badges to sections (#326) Point the previously-dead (#) support badges at new Supported Platforms / Supported Agents sections, grouped with Supported Languages near the bottom of the README. Co-authored-by: Andrew Barnes Co-authored-by: Claude Opus 4.7 (1M context) --- README.md | 40 ++++++++++++++++++++++++++++++++-------- 1 file changed, 32 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 511e2094..a2c8801b 100644 --- a/README.md +++ b/README.md @@ -10,15 +10,15 @@ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Self-contained](https://img.shields.io/badge/Node.js-bundled%20%C2%B7%20none%20required-brightgreen.svg)](https://nodejs.org/) -[![Windows](https://img.shields.io/badge/Windows-supported-blue.svg)](#) -[![macOS](https://img.shields.io/badge/macOS-supported-blue.svg)](#) -[![Linux](https://img.shields.io/badge/Linux-supported-blue.svg)](#) +[![Windows](https://img.shields.io/badge/Windows-supported-blue.svg)](#supported-platforms) +[![macOS](https://img.shields.io/badge/macOS-supported-blue.svg)](#supported-platforms) +[![Linux](https://img.shields.io/badge/Linux-supported-blue.svg)](#supported-platforms) -[![Claude Code](https://img.shields.io/badge/Claude_Code-supported-blueviolet.svg)](#) 
-[![Cursor](https://img.shields.io/badge/Cursor-supported-blueviolet.svg)](#) -[![Codex CLI](https://img.shields.io/badge/Codex_CLI-supported-blueviolet.svg)](#) -[![opencode](https://img.shields.io/badge/opencode-supported-blueviolet.svg)](#) -[![Hermes Agent](https://img.shields.io/badge/Hermes_Agent-supported-blueviolet.svg)](#) +[![Claude Code](https://img.shields.io/badge/Claude_Code-supported-blueviolet.svg)](#supported-agents) +[![Cursor](https://img.shields.io/badge/Cursor-supported-blueviolet.svg)](#supported-agents) +[![Codex CLI](https://img.shields.io/badge/Codex_CLI-supported-blueviolet.svg)](#supported-agents) +[![opencode](https://img.shields.io/badge/opencode-supported-blueviolet.svg)](#supported-agents) +[![Hermes Agent](https://img.shields.io/badge/Hermes_Agent-supported-blueviolet.svg)](#supported-agents) @@ -447,6 +447,30 @@ What that means in practice: > committed `dist/`. If you commit a dependency or build directory you don't want > in the graph, add it to `.gitignore`. +## Supported Platforms + +Every release ships a self-contained build (bundled Node runtime — nothing to +compile) for all three desktop OSes, on both Intel/AMD (x64) and ARM (arm64): + +| Platform | Architectures | Install | +|----------|---------------|---------| +| Windows | x64, arm64 | PowerShell installer or npm | +| macOS | x64, arm64 | shell installer or npm | +| Linux | x64, arm64 | shell installer or npm | + +See [Get Started](#get-started) for the one-line install commands. 
+ +## Supported Agents + +The interactive installer auto-detects and configures each of these — wiring up +the MCP server and writing its instructions file: + +- **Claude Code** +- **Cursor** +- **Codex CLI** +- **opencode** +- **Hermes Agent** + ## Supported Languages | Language | Extension | Status | From fb45959af74851b4322242633b758a81967ad7ac Mon Sep 17 00:00:00 2001 From: Infinity_Block <105136435+evanclan@users.noreply.github.com> Date: Sat, 23 May 2026 05:02:29 +0900 Subject: [PATCH 43/47] fix(mcp): reap serve --mcp child when parent is SIGKILL'd (#286) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a PPID watchdog to the MCP server so a `codegraph serve --mcp` child terminates when its host (Claude Code, opencode, …) is force-killed — OOM killer, `kill -9`, container teardown — and the stdin close handlers don't fire. The child would otherwise linger indefinitely, holding inotify watches, file descriptors, and the SQLite WAL. Also propagates the host PID across the `--liftoff-only` re-exec (CODEGRAPH_HOST_PPID) so the watchdog reaps the orphan on the from-source path too, not just the bundled launcher. Poll interval is CODEGRAPH_PPID_POLL_MS (default 5000ms, 0 disables). Resolves #277. Co-authored-by: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 12 ++ __tests__/mcp-ppid-watchdog.test.ts | 168 +++++++++++++++++++++++++++ src/extraction/wasm-runtime-flags.ts | 15 ++- src/mcp/index.ts | 97 ++++++++++++++++ 4 files changed, 291 insertions(+), 1 deletion(-) create mode 100644 __tests__/mcp-ppid-watchdog.test.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index 3e35df64..3cfadd1a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,18 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 
## [0.9.4] - 2026-05-22 +### Fixed +- **Orphaned `codegraph serve --mcp` processes after a parent SIGKILL.** When + the MCP host (Claude Code, opencode, …) was force-killed — OOM killer, a + `kill -9`, a container teardown — the child kept running indefinitely on + Linux, holding inotify watches, file descriptors, and the SQLite WAL. The + kernel doesn't propagate parent death to children, and the stdin + `end`/`close` handlers we relied on don't always fire. The MCP server now + polls `process.ppid` and shuts down the moment it changes from the value + observed at startup; the poll interval is `CODEGRAPH_PPID_POLL_MS` (default + `5000`, `0` disables). Resolves + [#277](https://github.com/colbymchenry/codegraph/issues/277). + ### Added - **Release archives now ship with a `SHA256SUMS` file**, and the npm launcher verifies the bundle it downloads against it — a mismatch aborts before diff --git a/__tests__/mcp-ppid-watchdog.test.ts b/__tests__/mcp-ppid-watchdog.test.ts new file mode 100644 index 00000000..0e3dc188 --- /dev/null +++ b/__tests__/mcp-ppid-watchdog.test.ts @@ -0,0 +1,168 @@ +/** + * PPID watchdog regression test (#277). + * + * On Linux, when an MCP host (Claude Code, opencode, …) is SIGKILL'd by the + * OOM killer / a force-quit / a container teardown, the kernel does NOT + * propagate the death to its `codegraph serve --mcp` child. The child gets + * reparented to init/systemd, its stdin stays half-open in some + * configurations, and the existing `stdin.on('end' | 'close')` handlers + * never fire — the server lingers indefinitely, holding inotify watches, + * file descriptors, and the SQLite WAL. + * + * `src/mcp/index.ts` polls `process.ppid` and shuts down the moment it + * diverges from the value observed at startup. This test stands up a + * four-tier process tree (vitest → wrapper → {stdin-holder, codegraph}) and + * SIGKILL's the wrapper. The stdin-holder is a long-lived sibling whose + * `stdout` pipe is dup'd into codegraph's `stdin`. 
After the wrapper dies + * the pipe stays open (stdin-holder still owns the write-end), so the + * existing stdin close handlers do **not** fire — the only thing that can + * terminate codegraph then is the PPID watchdog. + * + * Windows is excluded — `process.kill(pid, 'SIGKILL')` does not actually + * deliver SIGKILL there, and the per-OS reparenting semantics the watchdog + * relies on are POSIX-specific. + */ +import { describe, it, expect, afterEach } from 'vitest'; +import { spawn, ChildProcessWithoutNullStreams } from 'child_process'; +import * as fs from 'fs'; +import * as os from 'os'; +import * as path from 'path'; + +const BIN = path.resolve(__dirname, '../dist/bin/codegraph.js'); + +function isAlive(pid: number): boolean { + try { + process.kill(pid, 0); + return true; + } catch { + return false; + } +} + +function waitForExit(pid: number, timeoutMs: number): Promise<boolean> { + return new Promise((resolve) => { + const start = Date.now(); + const tick = () => { + if (!isAlive(pid)) return resolve(true); + if (Date.now() - start > timeoutMs) return resolve(false); + setTimeout(tick, 100); + }; + tick(); + }); +} + +describe.skipIf(process.platform === 'win32')('MCP PPID watchdog (#277)', () => { + let wrapper: ChildProcessWithoutNullStreams | null = null; + let childPid: number | null = null; + let stdinHolderPid: number | null = null; + + afterEach(() => { + if (wrapper && !wrapper.killed) { + try { wrapper.kill('SIGKILL'); } catch { /* already gone */ } + } + // Belt and suspenders — don't leak processes if an assertion failed. + for (const pid of [childPid, stdinHolderPid]) { + if (pid !== null && isAlive(pid)) { + try { process.kill(pid, 'SIGKILL'); } catch { /* already gone */ } + } + } + wrapper = null; + childPid = null; + stdinHolderPid = null; + }); + + it("shuts down when its parent is SIGKILL'd and stdin stays open", async () => { + // The wrapper: + // 1.
Spawns a "stdin-holder" — a tiny long-lived node process whose + // `stdout` pipe is dup'd into codegraph's `stdin`. As long as the + // stdin-holder is alive (it is — it's an orphan after the wrapper + // dies), codegraph's stdin never sees EOF. + // 2. Spawns codegraph with that pipe as fd 0 and its stderr redirected + // to a tmp file that survives the wrapper, then reports both PIDs. + // 3. Idles until SIGKILL'd from the test. + // + // CODEGRAPH_PPID_POLL_MS=200 keeps the watchdog responsive in test; the + // production default is 5000ms. + const stderrLog = path.join( + fs.mkdtempSync(path.join(os.tmpdir(), 'cg-ppid-watchdog-')), + 'codegraph.stderr.log', + ); + // The wrapper waits 800ms before reporting the PIDs so the codegraph + // child has time to finish its async start() (dynamic import + transport + // setup + watchdog registration). Otherwise the test races: it + // SIGKILL's the wrapper before the watchdog interval is installed, and + // nothing terminates codegraph. + const wrapperSrc = ` + const { spawn } = require('child_process'); + const fs = require('fs'); + const stderrFd = fs.openSync(${JSON.stringify(stderrLog)}, 'a'); + const stdinHolder = spawn(process.execPath, ['-e', 'setInterval(() => {}, 60000)'], { + stdio: ['ignore', 'pipe', 'ignore'], + detached: true, + }); + stdinHolder.unref(); + const child = spawn(process.execPath, [${JSON.stringify(BIN)}, 'serve', '--mcp'], { + stdio: [stdinHolder.stdout, 'ignore', stderrFd], + env: { ...process.env, CODEGRAPH_PPID_POLL_MS: '200' }, + detached: true, + }); + child.unref(); + setTimeout(() => { + process.stdout.write(JSON.stringify({ pid: child.pid, stdinHolderPid: stdinHolder.pid }) + '\\n'); + }, 800); + setInterval(() => {}, 60000); + `; + wrapper = spawn(process.execPath, ['-e', wrapperSrc], { + stdio: ['pipe', 'pipe', 'pipe'], + }) as ChildProcessWithoutNullStreams; + + const pids = await new Promise<{ pid: number; stdinHolderPid: number }>((resolve, reject) => { + let buf = ''; + const 
timer = setTimeout( + () => reject(new Error('wrapper did not report PIDs in time')), + 10000, + ); + wrapper!.stdout.on('data', (chunk: Buffer) => { + buf += chunk.toString('utf8'); + const m = buf.match(/\{"pid":(\d+),"stdinHolderPid":(\d+)\}/); + if (m) { + clearTimeout(timer); + resolve({ pid: parseInt(m[1], 10), stdinHolderPid: parseInt(m[2], 10) }); + } + }); + wrapper!.on('exit', () => { + clearTimeout(timer); + reject(new Error('wrapper exited before reporting PIDs')); + }); + }); + childPid = pids.pid; + stdinHolderPid = pids.stdinHolderPid; + + expect(isAlive(childPid)).toBe(true); + expect(isAlive(stdinHolderPid)).toBe(true); + + // SIGKILL the wrapper — no cleanup runs, just like a real OOM kill. + // codegraph and the stdin-holder both get reparented to init/systemd. + // Crucially, the pipe between them stays open, so codegraph's stdin + // doesn't close: only the watchdog can take it down. + wrapper.kill('SIGKILL'); + + // Watchdog runs every 200ms in this test → 5s gives ~25 polls of headroom. + const exited = await waitForExit(childPid, 5000); + const stderrContent = fs.existsSync(stderrLog) ? fs.readFileSync(stderrLog, 'utf-8') : ''; + expect( + exited, + `codegraph child (pid=${childPid}) did not exit within 5s after wrapper was SIGKILL'd.\nstderr:\n${stderrContent}`, + ).toBe(true); + // The watchdog announces itself before tearing down — assert that the + // shutdown came from the parent-death path, not from any other signal. + expect(stderrContent).toMatch(/Parent process exited.*shutting down/); + + // The stdin-holder is now an orphan — kill it explicitly so it doesn't + // outlive the test. It's still tracked in `stdinHolderPid` for the + // afterEach safety net, but we tidy up proactively here too. 
+ if (isAlive(stdinHolderPid)) { + try { process.kill(stdinHolderPid, 'SIGKILL'); } catch { /* race */ } + } + }, 20000); +}); diff --git a/src/extraction/wasm-runtime-flags.ts b/src/extraction/wasm-runtime-flags.ts index f33a19ff..e44c84d8 100644 --- a/src/extraction/wasm-runtime-flags.ts +++ b/src/extraction/wasm-runtime-flags.ts @@ -46,6 +46,19 @@ export const WASM_RUNTIME_FLAGS: readonly string[] = ['--liftoff-only']; */ const RELAUNCH_GUARD_ENV = 'CODEGRAPH_WASM_RELAUNCHED'; +/** + * Env var carrying the *host* PID (the relauncher's own parent) across the + * re-exec. Without `--liftoff-only` the CLI re-execs itself once, inserting an + * intermediate process between the MCP host and the server. That intermediate + * stays alive (blocked in spawnSync) even after the host is killed, so the + * server's PPID watchdog can't detect the host's death by watching its own + * `process.ppid`. Passing the host PID through lets the watchdog poll it + * directly. Unset on the no-re-exec path (bundled launcher / flag already + * present), where the server is already a direct child of the host. See + * src/mcp/index.ts (#277). + */ +export const HOST_PPID_ENV = 'CODEGRAPH_HOST_PPID'; + /** True when every required WASM runtime flag is already present in `execArgv`. 
*/ export function processHasWasmRuntimeFlags( execArgv: readonly string[] = process.execArgv @@ -84,7 +97,7 @@ export function relaunchWithWasmRuntimeFlagsIfNeeded(scriptPath: string): void { const argv = buildRelaunchArgv(scriptPath, process.argv.slice(2)); const result = spawnSync(process.execPath, argv, { stdio: 'inherit', - env: { ...process.env, [RELAUNCH_GUARD_ENV]: '1' }, + env: { ...process.env, [RELAUNCH_GUARD_ENV]: '1', [HOST_PPID_ENV]: String(process.ppid) }, }); if (result.error) { diff --git a/src/mcp/index.ts b/src/mcp/index.ts index c790a4bc..8d0e35d7 100644 --- a/src/mcp/index.ts +++ b/src/mcp/index.ts @@ -21,6 +21,7 @@ import { watchDisabledReason } from '../sync'; import { StdioTransport, JsonRpcRequest, JsonRpcNotification, ErrorCodes } from './transport'; import { tools, ToolHandler } from './tools'; import { SERVER_INSTRUCTIONS } from './server-instructions'; +import { HOST_PPID_ENV } from '../extraction/wasm-runtime-flags'; /** * Convert a file:// URI to a filesystem path. @@ -60,6 +61,51 @@ const PROTOCOL_VERSION = '2024-11-05'; */ const ROOTS_LIST_TIMEOUT_MS = 5000; +/** + * How often to poll `process.ppid` to detect parent process death (see #277). + * 5s is a deliberate trade-off: the failure mode being guarded against is rare + * (parent SIGKILL'd), and longer poll = less wakeup overhead while idle. + */ +const DEFAULT_PPID_POLL_MS = 5000; + +/** + * Resolve the PPID watchdog poll interval from an env override. A value of + * `0` disables the watchdog entirely (escape hatch for embedded scenarios + * where the parent legitimately re-parents the server on purpose). Anything + * non-numeric or negative falls back to the default. 
+ */ +function parsePpidPollMs(raw: string | undefined): number { + if (raw === undefined || raw === '') return DEFAULT_PPID_POLL_MS; + const parsed = Number(raw); + if (!Number.isFinite(parsed)) return DEFAULT_PPID_POLL_MS; + if (parsed < 0) return DEFAULT_PPID_POLL_MS; + return Math.floor(parsed); +} + +/** + * Parse the host PID propagated across the `--liftoff-only` re-exec + * ({@link HOST_PPID_ENV}). Returns a positive integer PID, or null when + * unset/invalid — the direct-launch path, where the watchdog falls back to + * `process.ppid` divergence. PIDs of 0/1 are rejected (0 = unknown, 1 = init, + * i.e. already orphaned), so the watchdog doesn't latch onto init. + */ +function parseHostPpid(raw: string | undefined): number | null { + if (raw === undefined || raw === '') return null; + const parsed = Number(raw); + if (!Number.isInteger(parsed) || parsed <= 1) return null; + return parsed; +} + +/** True if a process with `pid` currently exists (signal-0 probe). */ +function isProcessAlive(pid: number): boolean { + try { + process.kill(pid, 0); + return true; + } catch { + return false; + } +} + /** * Extract the first usable filesystem path from a `roots/list` result. * Shape per MCP spec: `{ roots: [{ uri: "file:///path", name?: string }] }`. @@ -95,6 +141,19 @@ export class MCPServer { // Guards the one-shot deferred resolution (roots/list or cwd) so we don't // re-issue roots/list on every tool call. private rootsAttempted = false; + // PPID watchdog — see start(). Captured at construction so we always have a + // baseline, even if start() runs after a fork-style reparent. + private originalPpid: number = process.ppid; + // The MCP host's PID, propagated across the `--liftoff-only` re-exec (see + // HOST_PPID_ENV). When set, the watchdog polls it directly: the re-exec + // inserts an intermediate process whose *death* — not just our reparenting — + // is what we'd otherwise miss. null on the direct (bundled) launch path. 
+ private hostPpid: number | null = parseHostPpid(process.env[HOST_PPID_ENV]); + private ppidWatchdog: ReturnType<typeof setInterval> | null = null; + // Idempotency guard for stop(). Without it, the watchdog can race with the + // stdin `end`/`close` handlers (or SIGTERM/SIGINT) and double-close cg and + // the transport before process.exit() lands. + private stopped = false; constructor(projectPath?: string) { this.projectPath = projectPath || null; @@ -122,6 +181,38 @@ export class MCPServer { // Detect this and shut down gracefully to prevent orphaned processes. process.stdin.on('end', () => this.stop()); process.stdin.on('close', () => this.stop()); + + // PPID watchdog (#277). Linux doesn't propagate parent death to children, + // so when the MCP host (Claude Code, opencode, …) is SIGKILL'd by the OOM + // killer / a force-quit / a container teardown, the child is reparented to + // init/systemd and the stdin `end`/`close` events don't always fire. The + // server would then linger indefinitely, holding inotify watches, file + // descriptors, and the SQLite WAL. Poll `process.ppid` and shut down the + // moment it changes from what we observed at startup. Cross-platform: + // reparenting changes ppid on Linux *and* macOS; on Windows the value can + // also drop to 0 once the parent is gone. When the CLI re-execs itself for + // `--liftoff-only`, an intermediate process sits between us and the host and + // outlives it, so our own ppid wouldn't change — in that case we poll the + // host PID (propagated via HOST_PPID_ENV) for liveness instead. The watchdog + // is `.unref()`'d so it never holds the event loop open on its own. + const pollMs = parsePpidPollMs(process.env.CODEGRAPH_PPID_POLL_MS); + if (pollMs > 0) { + this.ppidWatchdog = setInterval(() => { + const current = process.ppid; + const ppidChanged = current !== this.originalPpid; + const hostGone = this.hostPpid !== null && !isProcessAlive(this.hostPpid); + if (ppidChanged || hostGone) { + const reason = ppidChanged + ?
`ppid ${this.originalPpid} -> ${current}` + : `host pid ${this.hostPpid} exited`; + process.stderr.write( + `[CodeGraph MCP] Parent process exited (${reason}); shutting down.\n` + ); + this.stop(); + } + }, pollMs); + this.ppidWatchdog.unref(); + } } /** @@ -283,6 +374,12 @@ export class MCPServer { * Stop the server */ stop(): void { + if (this.stopped) return; + this.stopped = true; + if (this.ppidWatchdog) { + clearInterval(this.ppidWatchdog); + this.ppidWatchdog = null; + } // Close all cached cross-project connections first this.toolHandler.closeAll(); // Close the main CodeGraph instance From 1f11de73ffbc2fd31e064dec97e156d842a3ef3a Mon Sep 17 00:00:00 2001 From: zhuchaokn Date: Sat, 23 May 2026 04:18:23 +0800 Subject: [PATCH 44/47] feat(cli): add callers, callees, impact commands for CLI/MCP parity (#204) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add `codegraph callers`, `codegraph callees`, and `codegraph impact` CLI commands, bringing the CLI to parity with the codegraph_callers/callees/impact MCP tools — so the graph-traversal queries work in scripts, CI, and git hooks without a running MCP server. All three support `--path` and `--json`; `impact` groups output by file to match the MCP layout. 
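The selection logic shared by the `callers` and `callees` commands in this patch (exact-name filter, de-duplication by node id, top-match fallback, then `--limit`) can be read as one pure function. This is a minimal sketch, assuming simplified `GraphNode`/`SearchMatch` shapes and a hypothetical `collectNeighbours` helper; neither is CodeGraph's actual API, and the real commands call `cg.getCallers` / `cg.getCallees` inline.

```typescript
// Simplified, hypothetical shapes; CodeGraph's real node and search types are not shown here.
interface GraphNode {
  id: string;
  name: string;
  kind: string;
  filePath: string;
  startLine?: number;
}

interface SearchMatch {
  node: GraphNode;
}

// Same rule as the CLI: a bare name, or one qualified with `.` (Class.method)
// or `::` (ns::func), counts as an exact match for `symbol`.
function isExactSymbolMatch(name: string, symbol: string): boolean {
  return name === symbol || name.endsWith(`.${symbol}`) || name.endsWith(`::${symbol}`);
}

// Gathers neighbours (callers or callees) of every exact match, de-duplicated
// by node id. Non-exact matches are skipped only when there is more than one
// match. If nothing was collected, falls back to the top search hit.
function collectNeighbours(
  symbol: string,
  matches: SearchMatch[],
  neighboursOf: (id: string) => GraphNode[],
  limit: number,
): GraphNode[] {
  const seen = new Set<string>();
  const out: GraphNode[] = [];
  const addFrom = (id: string): void => {
    for (const n of neighboursOf(id)) {
      if (!seen.has(n.id)) {
        seen.add(n.id);
        out.push(n);
      }
    }
  };

  for (const m of matches) {
    if (!isExactSymbolMatch(m.node.name, symbol) && matches.length > 1) continue;
    addFrom(m.node.id);
  }
  if (out.length === 0 && matches[0]) addFrom(matches[0].node.id);

  return out.slice(0, limit);
}
```

Keeping the filter as a separate predicate makes the `Class.method` / `ns::func` rule easy to test on its own.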
Co-authored-by: Claude Opus 4.7 (1M context) --- README.md | 3 + src/bin/codegraph.ts | 261 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 264 insertions(+) diff --git a/README.md b/README.md index a2c8801b..467fdd1d 100644 --- a/README.md +++ b/README.md @@ -352,6 +352,9 @@ codegraph status [path] # Show statistics codegraph query # Search symbols (--kind, --limit, --json) codegraph files [path] # Show file structure (--format, --filter, --max-depth, --json) codegraph context # Build context for AI (--format, --max-nodes) +codegraph callers <symbol> # Find what calls a function/method (--limit, --json) +codegraph callees <symbol> # Find what a function/method calls (--limit, --json) +codegraph impact <symbol> # Analyze what code is affected by changing a symbol (--depth, --json) codegraph affected [files...] # Find test files affected by changes (see below) codegraph serve --mcp # Start MCP server ``` diff --git a/src/bin/codegraph.ts b/src/bin/codegraph.ts index 711d39c8..6bc63b3f 100644 --- a/src/bin/codegraph.ts +++ b/src/bin/codegraph.ts @@ -16,6 +16,9 @@ * codegraph query Search for symbols * codegraph files [options] Show project file structure * codegraph context Build context for a task + * codegraph callers <symbol> Find what calls a function/method + * codegraph callees <symbol> Find what a function/method calls + * codegraph impact <symbol> Analyze what code is affected by changing a symbol * codegraph affected [files] Find test files affected by changes */ @@ -1207,6 +1210,264 @@ program } }); +/** + * codegraph callers <symbol> + * + * CLI parity with the MCP graph tools (codegraph_callers/callees/impact) so the + * traversal queries work in scripts, CI, and git hooks without a running MCP + * server.
+ */ +program + .command('callers <symbol>') + .description('Find all functions/methods that call a specific symbol') + .option('-p, --path <path>', 'Project path') + .option('-l, --limit <number>', 'Maximum results', '20') + .option('-j, --json', 'Output as JSON') + .action(async (symbol: string, options: { path?: string; limit?: string; json?: boolean }) => { + const projectPath = resolveProjectPath(options.path); + + try { + if (!isInitialized(projectPath)) { + error(`CodeGraph not initialized in ${projectPath}`); + process.exit(1); + } + + const { default: CodeGraph } = await loadCodeGraph(); + const cg = await CodeGraph.open(projectPath); + const limit = parseInt(options.limit || '20', 10); + + const matches = cg.searchNodes(symbol, { limit: 50 }); + if (matches.length === 0) { + info(`Symbol "${symbol}" not found`); + cg.destroy(); + return; + } + + const seen = new Set(); + const allCallers: Array<{ name: string; kind: string; filePath: string; startLine?: number }> = []; + + for (const match of matches) { + const exactMatch = match.node.name === symbol || match.node.name.endsWith(`.${symbol}`) || match.node.name.endsWith(`::${symbol}`); + if (!exactMatch && matches.length > 1) continue; + for (const c of cg.getCallers(match.node.id)) { + if (!seen.has(c.node.id)) { + seen.add(c.node.id); + allCallers.push({ name: c.node.name, kind: c.node.kind, filePath: c.node.filePath, startLine: c.node.startLine }); + } + } + } + + // Fallback: if exact filter removed everything, use the top match + if (allCallers.length === 0 && matches[0]) { + for (const c of cg.getCallers(matches[0].node.id)) { + if (!seen.has(c.node.id)) { + seen.add(c.node.id); + allCallers.push({ name: c.node.name, kind: c.node.kind, filePath: c.node.filePath, startLine: c.node.startLine }); + } + } + } + + const limited = allCallers.slice(0, limit); + + if (options.json) { + console.log(JSON.stringify({ symbol, callers: limited }, null, 2)); + } else if (limited.length === 0) { + info(`No callers found for
"${symbol}"`); + } else { + console.log(chalk.bold(`\nCallers of "${symbol}" (${limited.length}):\n`)); + for (const node of limited) { + const loc = node.startLine ? `:${node.startLine}` : ''; + console.log( + chalk.cyan(node.kind.padEnd(12)) + + chalk.white(node.name) + ); + console.log(chalk.dim(` ${node.filePath}${loc}`)); + console.log(); + } + } + + cg.destroy(); + } catch (err) { + error(`callers failed: ${err instanceof Error ? err.message : String(err)}`); + process.exit(1); + } + }); + +/** + * codegraph callees <symbol> + */ +program + .command('callees <symbol>') + .description('Find all functions/methods that a specific symbol calls') + .option('-p, --path <path>', 'Project path') + .option('-l, --limit <number>', 'Maximum results', '20') + .option('-j, --json', 'Output as JSON') + .action(async (symbol: string, options: { path?: string; limit?: string; json?: boolean }) => { + const projectPath = resolveProjectPath(options.path); + + try { + if (!isInitialized(projectPath)) { + error(`CodeGraph not initialized in ${projectPath}`); + process.exit(1); + } + + const { default: CodeGraph } = await loadCodeGraph(); + const cg = await CodeGraph.open(projectPath); + const limit = parseInt(options.limit || '20', 10); + + const matches = cg.searchNodes(symbol, { limit: 50 }); + if (matches.length === 0) { + info(`Symbol "${symbol}" not found`); + cg.destroy(); + return; + } + + const seen = new Set(); + const allCallees: Array<{ name: string; kind: string; filePath: string; startLine?: number }> = []; + + for (const match of matches) { + const exactMatch = match.node.name === symbol || match.node.name.endsWith(`.${symbol}`) || match.node.name.endsWith(`::${symbol}`); + if (!exactMatch && matches.length > 1) continue; + for (const c of cg.getCallees(match.node.id)) { + if (!seen.has(c.node.id)) { + seen.add(c.node.id); + allCallees.push({ name: c.node.name, kind: c.node.kind, filePath: c.node.filePath, startLine: c.node.startLine }); + } + } + } + + if (allCallees.length === 0 && matches[0])
{ + for (const c of cg.getCallees(matches[0].node.id)) { + if (!seen.has(c.node.id)) { + seen.add(c.node.id); + allCallees.push({ name: c.node.name, kind: c.node.kind, filePath: c.node.filePath, startLine: c.node.startLine }); + } + } + } + + const limited = allCallees.slice(0, limit); + + if (options.json) { + console.log(JSON.stringify({ symbol, callees: limited }, null, 2)); + } else if (limited.length === 0) { + info(`No callees found for "${symbol}"`); + } else { + console.log(chalk.bold(`\nCallees of "${symbol}" (${limited.length}):\n`)); + for (const node of limited) { + const loc = node.startLine ? `:${node.startLine}` : ''; + console.log( + chalk.cyan(node.kind.padEnd(12)) + + chalk.white(node.name) + ); + console.log(chalk.dim(` ${node.filePath}${loc}`)); + console.log(); + } + } + + cg.destroy(); + } catch (err) { + error(`callees failed: ${err instanceof Error ? err.message : String(err)}`); + process.exit(1); + } + }); + +/** + * codegraph impact <symbol> + */ +program + .command('impact <symbol>') + .description('Analyze what code is affected by changing a symbol') + .option('-p, --path <path>', 'Project path') + .option('-d, --depth <number>', 'Traversal depth', '2') + .option('-j, --json', 'Output as JSON') + .action(async (symbol: string, options: { path?: string; depth?: string; json?: boolean }) => { + const projectPath = resolveProjectPath(options.path); + + try { + if (!isInitialized(projectPath)) { + error(`CodeGraph not initialized in ${projectPath}`); + process.exit(1); + } + + const { default: CodeGraph } = await loadCodeGraph(); + const cg = await CodeGraph.open(projectPath); + const depth = Math.min(Math.max(parseInt(options.depth || '2', 10), 1), 10); + + const matches = cg.searchNodes(symbol, { limit: 50 }); + if (matches.length === 0) { + info(`Symbol "${symbol}" not found`); + cg.destroy(); + return; + } + + // Merge impact subgraphs across all exact-matching symbols + const mergedNodes = new Map(); + const seenEdges = new Set(); + let edgeCount = 0; + + for (const
match of matches) { + const exactMatch = match.node.name === symbol || match.node.name.endsWith(`.${symbol}`) || match.node.name.endsWith(`::${symbol}`); + if (!exactMatch && matches.length > 1) continue; + const impact = cg.getImpactRadius(match.node.id, depth); + for (const [id, n] of impact.nodes) { + mergedNodes.set(id, { name: n.name, kind: n.kind, filePath: n.filePath, startLine: n.startLine }); + } + for (const e of impact.edges) { + const key = `${e.source}->${e.target}:${e.kind}`; + if (!seenEdges.has(key)) { + seenEdges.add(key); + edgeCount++; + } + } + } + + // Fallback to top match if exact filter removed everything + if (mergedNodes.size === 0 && matches[0]) { + const impact = cg.getImpactRadius(matches[0].node.id, depth); + for (const [id, n] of impact.nodes) { + mergedNodes.set(id, { name: n.name, kind: n.kind, filePath: n.filePath, startLine: n.startLine }); + } + edgeCount = impact.edges.length; + } + + if (options.json) { + console.log(JSON.stringify({ + symbol, + depth, + nodeCount: mergedNodes.size, + edgeCount, + affected: Array.from(mergedNodes.values()), + }, null, 2)); + } else if (mergedNodes.size === 0) { + info(`No affected symbols found for "${symbol}"`); + } else { + console.log(chalk.bold(`\nImpact of changing "${symbol}" — ${mergedNodes.size} affected symbols:\n`)); + + // Group by file + const byFile = new Map<string, Array<{ name: string; kind: string; startLine?: number }>>(); + for (const node of mergedNodes.values()) { + const list = byFile.get(node.filePath) || []; + list.push({ name: node.name, kind: node.kind, startLine: node.startLine }); + byFile.set(node.filePath, list); + } + + for (const [file, nodes] of byFile) { + console.log(chalk.cyan(file)); + for (const node of nodes) { + const loc = node.startLine ? `:${node.startLine}` : ''; + console.log(` ${chalk.dim(node.kind.padEnd(12))}${node.name}${chalk.dim(loc)}`); + } + console.log(); + } + } + + cg.destroy(); + } catch (err) { + error(`impact failed: ${err instanceof Error ?
err.message : String(err)}`); + process.exit(1); + } + }); + /** * codegraph affected [files...] * From f366222dbd6b7e43047072a9417289b1b02ae457 Mon Sep 17 00:00:00 2001 From: Aimore Date: Fri, 22 May 2026 21:23:24 +0100 Subject: [PATCH 45/47] docs(readme): add codegraph_explore to the MCP Tools table (#226) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add the missing `codegraph_explore` row to the 'MCP server exposes these tools' table — tools.ts exports 9 tools but the table listed 8. (The PR's Node-badge bump was dropped: that badge was replaced by 'Node.js bundled · none required' when the runtime became self-contained.) Co-authored-by: Claude Opus 4.7 (1M context) --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 467fdd1d..faf357bc 100644 --- a/README.md +++ b/README.md @@ -401,6 +401,7 @@ When running as an MCP server, CodeGraph exposes these tools to Claude Code: | `codegraph_callees` | Find what a function calls | | `codegraph_impact` | Analyze what code is affected by changing a symbol | | `codegraph_node` | Get details about a specific symbol (optionally with source code) | +| `codegraph_explore` | Return source for several related symbols grouped by file, plus a relationship map, in one call | | `codegraph_files` | Get indexed file structure (faster than filesystem scanning) | | `codegraph_status` | Check index health and statistics | From 025ebc88d6d708edd3732f5cb68516148719a061 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Sun, 24 May 2026 04:41:04 -0500 Subject: [PATCH 46/47] Release 0.9.4: framework-aware routing + dynamic-dispatch coverage + retrieval improvements (#365) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * feat(resolution): close dynamic-dispatch coverage holes (callback synthesis + django ORM) Static tree-sitter extraction misses calls whose target is computed or indirect, so flows through callbacks, 
observers, and descriptors were absent from the graph. - callback-synthesizer.ts: whole-graph pass after base resolution. Detects registrar/dispatcher channels (field-backed observers + string-keyed EventEmitters), correlates registration sites, and synthesizes dispatcher->callback `calls` edges (provenance:'heuristic'). Records the registration site (registeredAt) in edge metadata. Precision guards: named handlers only, registrar-name match, event fan-out cap. - frameworks/python.ts + resolution/{index,types}.ts: claimsReference hook + django ORM resolver (_iterable_class -> ModelIterable.__iter__). - extraction/tree-sitter.ts: extract named nested functions so inline named handlers become linkable nodes. trace(mutateElement, triggerRender) and trace(_fetch_all, execute_sql) now connect; node count stable (no explosion). Co-Authored-By: Claude Opus 4.7 (1M context) * feat(mcp): self-sufficient flow output + fix explore budget regression - Surface synthesized-edge evidence in trace, the node trail, and context call paths: a dynamic-dispatch hop now shows "callback via onUpdate @App.tsx:3148" with the registration site inline (and trace inlines each hop's call-site source line) -- the exact glue agents previously Read/Grep'd to reconstruct. - Fix non-monotonic explore output budget: the 500-5000 file tier capped maxCharsPerFile at 2500, BELOW the <500 tier's 3800, so on god-file projects (excalidraw's 415 KB App.tsx) one explore returned <1% of the file and forced a Read. Raised to 6500/file, 28000 total. - Stop explore from inviting Read: truncation/trim notes said "use Read for more"; they now steer to another codegraph_explore and treat returned source as already Read. Measured on excalidraw: best-case flow answer went from 5 reads / 131s to 0 reads / 73s with ~3-4 codegraph calls. 
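The callback-synthesis pass in this series can be pictured as a join between registration sites and dispatcher channels, filtered by its precision guards. This is a minimal sketch, assuming hypothetical `Registration`/`Channel` shapes and an assumed fan-out cap of 20; it is not the code in callback-synthesizer.ts. It keeps only named handlers, requires the registrar name to be one known for the channel, skips channels whose fan-out exceeds the cap, and records the registration site as `registeredAt` on a `calls` edge with `provenance: 'heuristic'`.

```typescript
// Hypothetical shapes; not the types used by callback-synthesizer.ts.
interface Registration {
  channel: string;          // field-backed list or event name the handler is stored under
  registrarName: string;    // method used to register, e.g. "onUpdate"
  handlerId: string | null; // resolved node id of a named handler; null if anonymous
  site: string;             // registration site, e.g. "App.tsx:42"
}

interface Channel {
  key: string;
  dispatcherId: string;        // node that invokes everything on the channel
  registrarNames: Set<string>; // registrars known to write into this channel
}

interface SynthEdge {
  source: string;
  target: string;
  kind: 'calls';
  provenance: 'heuristic';
  metadata: { registeredAt: string };
}

const MAX_FAN_OUT = 20; // assumed cap for illustration

function synthesizeCallbackEdges(channels: Channel[], regs: Registration[]): SynthEdge[] {
  // Group registration sites by the channel they write into.
  const byChannel = new Map<string, Registration[]>();
  for (const r of regs) {
    const list = byChannel.get(r.channel) ?? [];
    list.push(r);
    byChannel.set(r.channel, list);
  }

  const edges: SynthEdge[] = [];
  for (const ch of channels) {
    // Precision guards: named handlers only, and the registrar must match.
    const hits = (byChannel.get(ch.key) ?? []).filter(
      (r): r is Registration & { handlerId: string } =>
        r.handlerId !== null && ch.registrarNames.has(r.registrarName),
    );
    // Fan-out cap: a channel with too many handlers is treated as too ambiguous to link.
    if (hits.length > MAX_FAN_OUT) continue;
    for (const r of hits) {
      edges.push({
        source: ch.dispatcherId,
        target: r.handlerId,
        kind: 'calls',
        provenance: 'heuristic',
        metadata: { registeredAt: r.site },
      });
    }
  }
  return edges;
}
```

Carrying the registration site on the edge is what lets a trace print a hop like "callback via onUpdate @App.tsx:3148" without a follow-up Read.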
Co-Authored-By: Claude Opus 4.7 (1M context) * chore(agent-eval): coverage probes, block-read hook, and design docs Dev-only validation harness for the dynamic-dispatch coverage work: - probe-{trace,node,context,explore}.mjs: drive MCP tools against a built index without a full agent run. - block-read-hook.sh + hook-settings.json: PreToolUse experiment that denies source Reads to measure codegraph sufficiency (forced Read-0). - docs/design/: callback-edge-synthesis + dynamic-dispatch-coverage playbook. Co-Authored-By: Claude Opus 4.7 (1M context) * feat(resolution): bridge React boundaries — re-render + JSX child synthesis Closes the two dynamic-dispatch hops that broke "state mutation -> on-screen render" flows in React apps. Both are call-invisible (React-internal) but the code between them is fully call-connected, so one synthesized edge each makes the whole flow trace end-to-end. - reactRenderEdges: setState(...) re-runs the component's render(). For each class with a render method, link sibling methods calling this.setState -> render. The setState gate keeps it to React class components. - reactJsxChildEdges: a component that returns <Child /> mounts Child. Link parent -> each capitalized JSX child, resolved to a component/function/class node (the resolution gate drops TS generics like Array). File-oriented, capped per parent. - Surface both in synthEdgeNote (trace + node trail) and context call-paths. Validated on excalidraw: trace(mutateElement, renderStaticScene) now connects in 6 hops across callback -> react-render -> jsx-child; 1 + 46 + 280 synthesized edges, node count stable (no explosion). Partial coverage is worse than none: react-render alone raised agent reads (revealed a hop it then drilled); adding the jsx hop closed the flow and dropped reads to 0-1.
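The JSX child rule in the React-boundaries commit (capitalized tags only, a resolution gate, a per-parent cap) reduces to a short filter. This is a minimal sketch, assuming a hypothetical `resolve` callback and `ResolvedNode` shape, an assumed cap of 25, and an assumed `calls` edge kind; the actual reactJsxChildEdges implementation is not shown in this series.

```typescript
// Hypothetical shapes; not the reactJsxChildEdges implementation.
interface ResolvedNode {
  id: string;
  kind: string; // e.g. 'component', 'function', 'class', 'variable'
}

interface JsxChildEdge {
  source: string;
  target: string;
  kind: 'calls'; // assumed edge kind
  provenance: 'heuristic';
}

const MAX_JSX_CHILDREN_PER_PARENT = 25; // assumed cap
const COMPONENT_KINDS = new Set(['component', 'function', 'class']);

// Links a parent component to each distinct capitalized JSX child that
// resolves to a component/function/class node, up to the per-parent cap.
function synthesizeJsxChildEdges(
  parentId: string,
  jsxTagNames: string[],
  resolve: (name: string) => ResolvedNode | undefined,
): JsxChildEdge[] {
  const seen = new Set<string>();
  const edges: JsxChildEdge[] = [];
  for (const tag of jsxTagNames) {
    // Lowercase tags (div, span) are host elements, not components.
    if (!/^[A-Z]/.test(tag)) continue;
    // Resolution gate: names that do not resolve to a component-like node,
    // such as a generic type name misread as a tag, are dropped.
    const target = resolve(tag);
    if (!target || !COMPONENT_KINDS.has(target.kind) || seen.has(target.id)) continue;
    seen.add(target.id);
    edges.push({ source: parentId, target: target.id, kind: 'calls', provenance: 'heuristic' });
    if (edges.length >= MAX_JSX_CHILDREN_PER_PARENT) break;
  }
  return edges;
}
```

Resolving through the project graph, rather than trusting capitalization alone, is what keeps names like `Array` from becoming spurious edges.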
Co-Authored-By: Claude Opus 4.7 (1M context) * docs(claude): retrieval performance contract + coverage validation methodology Add a "Retrieval performance & dynamic-dispatch coverage" section so future changes/PRs don't silently regress agent retrieval: - the explore call+output budget table by repo size, with the monotonic-per-file invariant (the bug that started this: <5000 tier's 2500 < <500 tier's 3800). - the "partial coverage is worse than none" principle. - the required validation methodology (small/medium/large x >=3 prompts per language x framework; deterministic probes + agent A/B; pass bar). - the Excalidraw worked example (before/after numbers) as the template to replicate for every language/framework. Co-Authored-By: Claude Opus 4.7 (1M context) * docs(claude): use full n=4 measured range in Excalidraw worked example Best run 0 Read/3 cg/76s; typical ~1 Read/~4 cg; occasional over-drill outlier. Report the range, not a single run — run-to-run variance is large. Co-Authored-By: Claude Opus 4.7 (1M context) * feat(mcp): steer flow questions to codegraph_trace first (tightens variance) codegraph_trace was absent from every steering intent map — all three guidance files routed "how does X reach Y" to context+explore, never to the trace tool. So agents used trace only by chance; when one didn't, it floundered reconstructing the path with search+callers (an 18-call run vs ~6 for trace-users). Add codegraph_trace to the intent map + a "flow" common chain (trace from->to FIRST = the whole path in one call, then ONE explore for bodies) across all three synced files (server-instructions, instructions-template, .cursor rule). 
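The monotonic-per-file invariant in the retrieval performance contract can be checked mechanically. This is a minimal sketch, assuming a hypothetical `ExploreBudgetTier` shape; only the per-file figures (3800 for the <500-file tier, and 2500 before / 6500 after the fix for the 500-5000-file tier) come from this series.

```typescript
// Hypothetical tier shape; only the per-file numbers come from this series.
interface ExploreBudgetTier {
  maxFiles: number; // tier applies to repos with fewer files than this
  maxCharsPerFile: number;
}

// Returns each pair of adjacent tiers (ordered by repo size) where the
// per-file budget shrinks as the repo grows, i.e. violations of the invariant.
function findNonMonotonicTiers(tiers: ExploreBudgetTier[]): Array<[ExploreBudgetTier, ExploreBudgetTier]> {
  const sorted = [...tiers].sort((a, b) => a.maxFiles - b.maxFiles);
  const violations: Array<[ExploreBudgetTier, ExploreBudgetTier]> = [];
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i].maxCharsPerFile < sorted[i - 1].maxCharsPerFile) {
      violations.push([sorted[i - 1], sorted[i]]);
    }
  }
  return violations;
}

// The regression described in this series, and the fix.
const beforeFix: ExploreBudgetTier[] = [
  { maxFiles: 500, maxCharsPerFile: 3800 },
  { maxFiles: 5000, maxCharsPerFile: 2500 },
];
const afterFix: ExploreBudgetTier[] = [
  { maxFiles: 500, maxCharsPerFile: 3800 },
  { maxFiles: 5000, maxCharsPerFile: 6500 },
];
```

Run over the full tier table in a unit test, a check like this would flag any future edit that reintroduces the 2500 < 3800 inversion.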
Validated on excalidraw (hard "to the screen" Q, n=4 before/after): - call count 3-10 -> 3-4 (over-drill outlier gone) - duration 64-112s -> 51-74s - trace adoption 3/4 -> 4/4; search+callers path-reconstruction -> 0 - fully-clean runs (0 Read, 0 Grep) 0/4 -> 2/4; best 3 cg / 0 / 0 / 51s Co-Authored-By: Claude Opus 4.7 (1M context) * feat(resolution): Vue SFC template coverage (events + kebab components) The .vue extractor only parses