From d21d2dfa504b1e61979ea518e7f7a07a60571c0a Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 11 Jun 2026 00:35:49 -0400 Subject: [PATCH 01/31] fix(objc): resolve chained message-send calls [[Foo create] doIt] (#750) (#786) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Ports the #645/#608 chained-receiver mechanism to Objective-C. A message send whose receiver is itself a message send — `[[Foo create] doIt]` — used to drop the receiver, so `doIt` name-matched a same-named method on an unrelated class (commonly a test helper's `init` or an Apple-SDK method). - objc.ts: getReturnType reads the method's `method_type`, SKIPPING nullability / ARC qualifiers (`nonnull instancetype` must yield instancetype, not `nonnull`). - tree-sitter.ts: the message_expression branch now re-encodes a chained send `[[Foo create] doIt]` as `Foo.create().doIt` when the inner receiver is a capitalized class and the outer selector is unary. - name-matcher.ts: `objc` joins the dotted-chain gate + CHAIN_LANGUAGES. A class-message factory returns an instance of the RECEIVER class by convention (`instancetype`), so when the factory's own return type isn't recoverable (`alloc`/`new`/`shared…` return instancetype, or aren't user nodes), the receiver's type is the class itself — this resolves the ubiquitous `[[X alloc] init]` and singleton chains. resolveMethodOnType validates against the class and its supertypes, so a wrong inference yields no edge. Validation: 4 synthetic tests (factory+decoy, superclass conformance, absent-method safety, the nonnull-instancetype singleton). Real-repo A/B on SDWebImage (208 files): +35 / -75 — all corrections (the -75 are wrong `init` mis-matches to a test helper / wrong class, retargeted to the right class's init in the +35, plus 2 Apple-SDK chains on unindexed classes). db stable, no node explosion. EXTRACTION_VERSION 14->15. Full suite green. 
Co-authored-by: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 1 + __tests__/resolution.test.ts | 137 +++++++++++++++++++++++++++ src/extraction/extraction-version.ts | 2 +- src/extraction/languages/objc.ts | 41 ++++++++ src/extraction/tree-sitter.ts | 27 ++++++ src/resolution/index.ts | 2 +- src/resolution/name-matcher.ts | 32 +++++-- 7 files changed, 233 insertions(+), 9 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 382cec9c2..fe1778ca2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -33,6 +33,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). - Scala method calls made through a companion-object factory, a fluent chain, or a case-class `apply` now resolve to the correct type. A call like `Foo.create().bar()` or `Builder(cfg).bar()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated type — most often mis-attributing a standard-library `Option` / `Iterator` `.map` / `.flatMap` / `.foreach` onto your own same-named class. CodeGraph now captures Scala return types (a generic `List[Foo]` resolves to its container `List`, a qualified `pkg.Foo` to `Foo`), infers the chained receiver's type from what the inner call returns or constructs, and resolves the method on it — including methods inherited from a trait the type extends — creating the edge only when that type or one of its traits genuinely has the method (so a wrong inference produces no edge instead of a misleading one). Existing Scala indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Scala) - Rust method calls made through a chained associated function now resolve to the correct type. A call like `Foo::new().bar()` or `Foo::with(cfg).build()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated type — or didn't resolve. 
CodeGraph now captures Rust return types (`-> Self` resolves to the implementing type), infers the chained receiver's type from what the associated function returns, and resolves the method on it — including methods provided by a trait the type implements (via the new `impl Trait for Type` relationships) — creating the edge only when the type or one of its traits genuinely has the method. Existing Rust indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Rust) - Dart method calls made through a static factory, a factory or named constructor, or a fluent chain now resolve to the correct type. A call like `Foo.create().bar()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated type — most often mis-attributing a standard-library `Option` / `Iterator` `.map` / `.where` onto your own same-named class. CodeGraph now indexes Dart **factory and named constructors** (`factory Foo.create()`, `Foo.named()`) as first-class members so calls to them resolve, captures Dart return types (a generic `List<Foo>` resolves to its container `List`), infers the chained receiver's type from what the inner call returns or constructs, and resolves the method on it — including methods inherited from a superclass or mixin — creating the edge only when that type genuinely has the method. Plain construction (`Foo(...)`) is still recorded as instantiation. Existing Dart indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Dart) +- Objective-C methods called through a chained message send now resolve to the correct class. A call like `[[Foo create] doIt]` used to drop the receiver, so `doIt` silently attached to a same-named method on an unrelated class — most often a test helper or stdlib class. CodeGraph now captures Objective-C method return types and infers the chained receiver's type from what the inner message returns.
For the ubiquitous `[[X alloc] init]` and singleton (`[[X sharedInstance] …]`) patterns — where the factory returns `instancetype` — the receiver is the class `X` itself, so the chained method resolves on `X` (including methods inherited from a superclass), creating the edge only when the class genuinely has the method. Existing Objective-C indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Objective-C) - Chained method calls now resolve when the chained method is **inherited from a superclass or declared on an interface/protocol** the receiver's type conforms to — for example a call on a sealed-subclass instance (`Either.Right(x).combine(...)`) that invokes a method defined on its parent type. Previously these chains found no caller edge even though the factory's type was known, so the call was invisible to callers, impact, and trace. CodeGraph now walks the type's supertypes (its `extends` / `implements` relationships) to find the method, creating the edge only when a supertype genuinely declares it (so a wrong inference still produces no edge). This makes Java, Kotlin, and C# factory and fluent chains more complete. Existing indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) - Swift method calls made through a static factory, fluent chain, or constructor now resolve to the correct class. A call like `Foo.make().draw()` or `Foo().draw()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated class — or didn't resolve at all. CodeGraph now captures Swift return types and infers the chained receiver's type from what the inner call returns (or the constructed type), creating the edge only when that class genuinely has the method (so a wrong inference produces no edge instead of a misleading one). Existing Swift indexes should be re-indexed (`codegraph index -f`) to benefit. 
(#750) (Swift) - C# method calls made through a static factory or fluent chain now resolve to the correct class. A call like `Foo.Create().Bar()` or `JObject.Parse(s).Property(...)` used to lose the receiver's type, so the chained method didn't resolve and the call was invisible to callers/impact/trace. CodeGraph now captures C# return types and infers the chained receiver's type from what the inner call returns, creating the edge only when that class genuinely has the method (so a wrong inference produces no edge). Existing C# indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (C#) diff --git a/__tests__/resolution.test.ts b/__tests__/resolution.test.ts index ea2b3c5ca..868e9b07a 100644 --- a/__tests__/resolution.test.ts +++ b/__tests__/resolution.test.ts @@ -2994,4 +2994,141 @@ void run() { expect(incoming.some((e) => e.kind === 'instantiates')).toBe(true); }); }); + + describe('Objective-C chained message-send call resolution (#645/#608 mechanism)', () => { + function callerNamesOf(qualifiedName: string): string[] { + const target = cg.getNodesByKind('method').find((n) => n.qualifiedName === qualifiedName); + if (!target) return []; + const names = cg + .getIncomingEdges(target.id) + .filter((e) => e.kind === 'calls') + .map((e) => cg.getNode(e.source)?.name) + .filter((n): n is string => !!n); + return [...new Set(names)].sort(); + } + + it('resolves a chained message send [[Foo create] doIt] via the return type, never a same-named decoy', async () => { + fs.writeFileSync( + path.join(tempDir, 'main.m'), + `@interface Bar : NSObject +- (void)doIt; +@end +@implementation Bar +- (void)doIt {} +@end +@interface Decoy : NSObject +- (void)doIt; +@end +@implementation Decoy +- (void)doIt {} +@end +@interface Foo : NSObject ++ (Bar *)create; +@end +@implementation Foo ++ (Bar *)create { return nil; } +- (void)run { [[Foo create] doIt]; } +@end +` + ); + cg = await CodeGraph.init(tempDir, { index: true }); + 
expect(callerNamesOf('Bar::doIt')).toEqual(['run']); + expect(callerNamesOf('Decoy::doIt')).toEqual([]); + }); + + it('resolves a chained message whose method is inherited from a superclass (via conformance)', async () => { + fs.writeFileSync( + path.join(tempDir, 'main.m'), + `@interface Base : NSObject +- (void)render; +@end +@implementation Base +- (void)render {} +@end +@interface Widget : Base +@end +@implementation Widget +@end +@interface Decoy : NSObject +- (void)render; +@end +@implementation Decoy +- (void)render {} +@end +@interface Factory : NSObject ++ (Widget *)make; +@end +@implementation Factory ++ (Widget *)make { return nil; } +- (void)run { [[Factory make] render]; } +@end +` + ); + cg = await CodeGraph.init(tempDir, { index: true }); + expect(callerNamesOf('Base::render')).toEqual(['run']); + expect(callerNamesOf('Decoy::render')).toEqual([]); + }); + + it('creates NO edge when the factory return type lacks the method (silent miss)', async () => { + fs.writeFileSync( + path.join(tempDir, 'main.m'), + `@interface Bar : NSObject +@end +@implementation Bar +@end +@interface Other : NSObject +- (void)onlyOther; +@end +@implementation Other +- (void)onlyOther {} +@end +@interface Foo : NSObject ++ (Bar *)create; +@end +@implementation Foo ++ (Bar *)create { return nil; } +- (void)run { [[Foo create] onlyOther]; } +@end +` + ); + cg = await CodeGraph.init(tempDir, { index: true }); + // Bar has no onlyOther — must not mis-attach to the same-named Other::onlyOther. + expect(callerNamesOf('Other::onlyOther')).toEqual([]); + }); + + it('resolves a singleton chain [[Cache shared] clearAll] whose factory returns nonnull instancetype', async () => { + // The factory returns `nonnull instancetype` — the nullability qualifier must + // be skipped (not captured AS the type), and an instancetype class-message + // factory returns the receiver class, so clearAll resolves on Cache, never a + // same-named decoy. 
(Regression for both: the captured-`nonnull` bug and the + // ubiquitous `[[X alloc] init]` / singleton pattern.) + fs.writeFileSync( + path.join(tempDir, 'main.m'), + `@interface Cache : NSObject ++ (nonnull instancetype)shared; +- (void)clearAll; +@end +@implementation Cache ++ (nonnull instancetype)shared { return nil; } +- (void)clearAll {} +@end +@interface Decoy : NSObject +- (void)clearAll; +@end +@implementation Decoy +- (void)clearAll {} +@end +@interface Caller : NSObject +- (void)run; +@end +@implementation Caller +- (void)run { [[Cache shared] clearAll]; } +@end +` + ); + cg = await CodeGraph.init(tempDir, { index: true }); + expect(callerNamesOf('Cache::clearAll')).toEqual(['run']); + expect(callerNamesOf('Decoy::clearAll')).toEqual([]); + }); + }); }); diff --git a/src/extraction/extraction-version.ts b/src/extraction/extraction-version.ts index 70ede13de..2aba578ae 100644 --- a/src/extraction/extraction-version.ts +++ b/src/extraction/extraction-version.ts @@ -21,4 +21,4 @@ * turns the re-index hint into noise — keep it honest (see CLAUDE.md, "Honesty * in the product is load-bearing"). */ -export const EXTRACTION_VERSION = 14; +export const EXTRACTION_VERSION = 15; diff --git a/src/extraction/languages/objc.ts b/src/extraction/languages/objc.ts index 6671284aa..cf5ecc4d7 100644 --- a/src/extraction/languages/objc.ts +++ b/src/extraction/languages/objc.ts @@ -31,6 +31,46 @@ function extractObjcMethodName(node: SyntaxNode, source: string): string | undef return identifiers.map((id) => `${getNodeText(id, source)}:`).join(''); } +/** Nullability / ARC qualifiers that sit where a return type's first type + * identifier does (`(nonnull instancetype)`, `(nullable Bar *)`) — never the type. 
*/ +const OBJC_TYPE_QUALIFIERS = new Set([ + 'nonnull', 'nullable', 'null_unspecified', 'null_resettable', + '_Nonnull', '_Nullable', '_Null_unspecified', '__nonnull', '__nullable', + 'const', 'volatile', 'strong', 'weak', 'copy', 'assign', 'retain', 'oneway', + '__strong', '__weak', '__unsafe_unretained', '__autoreleasing', '__kindof', +]); + +/** Collect the type identifiers under a `method_type`, in document order. */ +function collectTypeIdentifiers(node: SyntaxNode, source: string, out: string[]): void { + if (node.type === 'type_identifier') out.push(getNodeText(node, source).trim()); + for (let i = 0; i < node.namedChildCount; i++) { + const child = node.namedChild(i); + if (child) collectTypeIdentifiers(child, source, out); + } +} + +/** + * Capture an ObjC method's declared return type as a bare class name, for the + * chained static-factory call mechanism (#750). `+ (Bar *)create` yields `Bar`; + * a nullability/ARC qualifier (`(nonnull instancetype)`, `(nullable Bar *)`) is + * skipped to reach the real type. `void` / `id` / `instancetype` / primitives + * yield undefined — for a class-message factory that means the receiver's type + * is the class itself (handled in resolution), so `[[X alloc] init]` and + * singleton chains still resolve. 
+ */ +function extractObjcReturnType(node: SyntaxNode, source: string): string | undefined { + if (node.type !== 'method_definition' && node.type !== 'method_declaration') return undefined; + const methodType = node.namedChildren.find((c) => c.type === 'method_type'); + if (!methodType) return undefined; + const ids: string[] = []; + collectTypeIdentifiers(methodType, source, ids); + const name = ids.find((n) => !OBJC_TYPE_QUALIFIERS.has(n)); + if (!name || !/^[A-Za-z_]\w*$/.test(name) || name === 'void' || name === 'id' || name === 'instancetype') { + return undefined; + } + return name; +} + function extractObjcPropertyName(node: SyntaxNode, source: string): string | null { if (node.type !== 'property_declaration') return null; @@ -73,6 +113,7 @@ export const objcExtractor: LanguageExtractor = { nameField: 'declarator', bodyField: 'body', paramsField: 'parameters', + getReturnType: extractObjcReturnType, resolveName: extractObjcMethodName, extractPropertyName: extractObjcPropertyName, resolveBody: (node, bodyField) => { diff --git a/src/extraction/tree-sitter.ts b/src/extraction/tree-sitter.ts index 86c4c6d60..546c66dc9 100644 --- a/src/extraction/tree-sitter.ts +++ b/src/extraction/tree-sitter.ts @@ -2482,6 +2482,33 @@ export class TreeSitterExtractor { } else { calleeName = methodName; } + } else if (receiverField && receiverField.type === 'message_expression' && /^\w+$/.test(methodName)) { + // Chained message send `[[Foo create] doIt]` — the receiver is itself a + // class message. Recover the inner `Class.selector` and encode + // `Class.selector().doIt` so resolution infers doIt's class from what + // `Class.selector` RETURNS (#645/#608). Only a CLASS-factory chain + // (capitalized inner receiver); a unary outer selector is required + // because the chain resolver's method part is `\w+` (no `:`). An + // instance chain (`[[obj foo] bar]`, lowercase inner) stays bare. 
+ const innerRecv = getChildByField(receiverField, 'receiver'); + const innerRecvName = innerRecv ? getNodeText(innerRecv, this.source) : ''; + if (innerRecv?.type === 'identifier' && /^[A-Z]/.test(innerRecvName)) { + const innerKw: string[] = []; + for (let i = 0; i < receiverField.namedChildCount; i++) { + if (receiverField.fieldNameForNamedChild(i) === 'method') { + const kw = receiverField.namedChild(i); + if (kw) innerKw.push(getNodeText(kw, this.source)); + } + } + let innerColon = false; + for (let i = 0; i < receiverField.childCount; i++) { + if (receiverField.child(i)?.type === ':') { innerColon = true; break; } + } + const innerSelector = innerColon ? innerKw.map((k) => `${k}:`).join('') : innerKw[0]; + calleeName = innerSelector ? `${innerRecvName}.${innerSelector}().${methodName}` : methodName; + } else { + calleeName = methodName; + } } else { calleeName = methodName; } diff --git a/src/resolution/index.ts b/src/resolution/index.ts index a8cdbe707..9435dac37 100644 --- a/src/resolution/index.ts +++ b/src/resolution/index.ts @@ -37,7 +37,7 @@ const SUPERTYPE_BEARING_KINDS = new Set([ * second pass. Dotted-receiver languages resolve via matchDottedCallChain; the * `::`-receiver ones (Rust) via matchScopedCallChain. */ -const CHAIN_LANGUAGES = new Set(['java', 'kotlin', 'csharp', 'swift', 'rust', 'go', 'scala', 'dart']); +const CHAIN_LANGUAGES = new Set(['java', 'kotlin', 'csharp', 'swift', 'rust', 'go', 'scala', 'dart', 'objc']); const SCOPED_CHAIN_LANGUAGES = new Set(['rust']); /** The extractor's chained-receiver encoding: `().`. 
*/ diff --git a/src/resolution/name-matcher.ts b/src/resolution/name-matcher.ts index fff8219f5..19f0a7a70 100644 --- a/src/resolution/name-matcher.ts +++ b/src/resolution/name-matcher.ts @@ -673,7 +673,23 @@ export function matchDottedCallChain( const factoryMethod = inner.slice(lastDot + 1); if (!factoryClass || !factoryMethod) return null; const ret = lookupCalleeReturnType(`${factoryClass}::${factoryMethod}`, ref, context); - if (!ret) return null; + if (!ret) { + // Objective-C: a class-message factory — `[X alloc]`, `[X new]`, + // `[X sharedFoo]` — returns an instance of the RECEIVER class `X` by + // convention (`instancetype`). So when the factory's own return type isn't + // recoverable (its selector returns `instancetype`, or `alloc`/`new` aren't + // user-defined nodes at all), the receiver's type is the class `X` itself. + // This resolves the ubiquitous `[[X alloc] init]` and singleton chains. + // resolveMethodOnType validates against X (and its supertypes), so a class + // whose method actually lives elsewhere yields NO edge, not a wrong one — and + // crucially this does NOT fire when a concrete return type WAS captured but + // simply lacks the method (that already returned null above: absent-method + // safety, so a same-named decoy is still never matched). + if (ref.language === 'objc' && /^[A-Z]/.test(factoryClass)) { + return resolveMethodOnType(factoryClass, method, ref, context, 0.8, 'instance-method', importedFqnOf(factoryClass, ref, context)); + } + return null; + } return resolveMethodOnType(ret, method, ref, context, 0.85, 'instance-method', importedFqnOf(ret, ref, context)); } @@ -1123,11 +1139,12 @@ export function matchReference( } // 1d. 
Dotted chained static-factory / fluent call (Java / Kotlin / C# / Swift / - // Go / Scala / Dart) — `Foo.getInstance().bar()` encoded as `Foo.getInstance().bar`, - // Go's bare-factory `New().Method()` as `New().Method`, Scala's companion factory - // `Foo.create().bar()`, or Dart's static factory / factory-constructor - // `Foo.create().bar()` (#645/#608 mechanism). Resolve the method's class from the - // inner call's declared return type, then validate it. + // Go / Scala / Dart / Objective-C) — `Foo.getInstance().bar()` encoded as + // `Foo.getInstance().bar`, Go's bare-factory `New().Method()` as `New().Method`, + // Scala's companion factory, Dart's static factory / factory-constructor, or + // ObjC's chained message send `[[Foo create] doIt]` encoded as `Foo.create().doIt` + // (#645/#608 mechanism). Resolve the method's class from the inner call's + // declared return type, then validate it. if ( ref.language === 'java' || ref.language === 'kotlin' || @@ -1135,7 +1152,8 @@ export function matchReference( ref.language === 'swift' || ref.language === 'go' || ref.language === 'scala' || - ref.language === 'dart' + ref.language === 'dart' || + ref.language === 'objc' ) { result = matchDottedCallChain(ref, context); if (result) return result; From a4d19a5ed8b416c54a21c4f7daace660256a54f1 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 11 Jun 2026 00:51:31 -0400 Subject: [PATCH 02/31] docs(design): record the chained static-factory call resolution mechanism (#750) (#787) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A checked-in design doc for the #645/#608/#750 chained-call mechanism — the permanent, discoverable record the work previously lacked (it lived only in git history, the tracking issue, and an untracked scratch handoff). 
Covers the 3-part mechanism, the three shared resolvers + receiver styles, the per-language coverage matrix (12 shipped with A/B results), the conformance pass, and the full 21-language README classification (incl. why TypeScript + Luau were skipped and Pascal is blocked). Co-authored-by: Claude Opus 4.8 (1M context) --- docs/design/chained-call-resolution.md | 145 +++++++++++++++++++++++++ 1 file changed, 145 insertions(+) create mode 100644 docs/design/chained-call-resolution.md diff --git a/docs/design/chained-call-resolution.md b/docs/design/chained-call-resolution.md new file mode 100644 index 000000000..4cf38ebef --- /dev/null +++ b/docs/design/chained-call-resolution.md @@ -0,0 +1,145 @@ +# Design + status: chained static-factory / fluent call resolution + +**Status:** SHIPPED for **12 languages** (C++, C, PHP, Java, Kotlin, C#, Swift, Rust, +Go, Scala, Dart, Objective-C) + a conformance pass. **TypeScript and Luau were evaluated +and intentionally skipped** (both gradually typed → the mechanism is +0 / regresses on +real code). **Pascal/Delphi** is blocked on a larger prerequisite (its method-call +extraction is broadly incomplete). See "Full README classification" below. Tracking +issue: **#750** (which began as "the statically-typed README languages" but that +enumeration was incomplete — it missed ObjC / Pascal / Luau). + +**Motivation:** a call whose **receiver is itself a call** — a factory / singleton / +builder that returns an object — should produce a `calls` edge to the chained method: + +```java +Foo.getInstance().bar(); // bar() should resolve to Foo::bar, never a same-named decoy +``` + +Before this work, every statically-typed language **dropped the receiver** and +name-matched the bare method (`bar`), so in 7 of 9 languages it silently attached to a +**same-named method on an unrelated type** — a correctness bug, not just missing coverage. + +--- + +## The 3-part mechanism (per language) + +1.
**Capture the factory's declared return type** — a per-language `getReturnType` + hook writes `nodes.return_type` (schema v5). `*Foo`→`Foo`, `List<Foo>`→`List`, + `pkg.Foo`→`Foo`, `-> Self` / `: self` / `this.type` → the declaring type. +2. **Preserve the chained receiver at extraction** — `tree-sitter.ts` (or a bespoke + extractor) encodes `Foo.getInstance().bar()` as the marker string + `Foo.getInstance().bar` (the `().` marker never appears in an ordinary ref). A + per-language gate keeps **instance** chains (`list.map().filter()`) bare so their + existing resolution is untouched — only capitalized-receiver / factory chains re-encode. +3. **Resolve AND VALIDATE** — at resolution the receiver's type is inferred from what + the inner call returns, then the outer method is resolved **on that type** and + validated: the method must exist on the type (or a supertype it conforms to), so a + wrong inference yields **no edge**, never a wrong one. + +Three shared resolvers in `src/resolution/name-matcher.ts`, all calling +`resolveMethodOnType` (which has the conformance supertype-walk): + +| Resolver | Receiver style | Languages | +|---|---|---| +| `matchCppCallChain` | `field_expression` (`Foo::instance().bar`) | C++, C | +| `matchScopedCallChain` | `::` (`Cls::for($x)->m`, `Foo::new().bar`) | PHP, Rust | +| `matchDottedCallChain` | `.` (`Foo.create().bar`) | Java, Kotlin, C#, Swift, Go, Scala, Dart, Objective-C | + +**Conformance pass (#754).** When the chained method lives on a **supertype** the +return type conforms to (an inherited / default-interface / trait / mixin / embedded +method), the first pass can't see it — `implements`/`extends` edges aren't built yet. +So failed chain refs are deferred (`CHAIN_LANGUAGES` in `resolution/index.ts`) and +re-resolved in a second pass `resolveChainedCallsViaConformance()` after edges exist, +walking `context.getSupertypes(...)`.
+ +**Adding a language:** `getReturnType` in `languages/*.ts`; encode the chained receiver ++ a node-type gate; add the language to the right `matchReference` gate (and +`CONSTRUCTS_VIA_BARE_CALL` if a bare capitalized call constructs the class); add to +`CHAIN_LANGUAGES`; synthetic tests + a real-repo A/B; bump `EXTRACTION_VERSION`. + +--- + +## Coverage (validated — each via synthetic decoy/absent-method tests + a real-repo A/B) + +| Language | PR | Receiver | Real-repo A/B (unique `calls` edges) | Notes | +|---|---|---|---|---| +| **C++ / C** | #645 (#742) | `field_expression` | — | The original: singletons / factories / chained getters. | +| **PHP** | #608 (#749) | `::` → `->` | — | `Cls::for($x)->method()` — the Laravel per-tenant client idiom. `: self`/`: static`. | +| **Java** | #751 | `.` | Guava **+1,507 / −0** | Missing-edge → purely additive. | +| **Kotlin** | #752 | `.` | arrow **+49 / −438** | Wrong-edge → precision win (438 removed = test/doc noise + wrong). Needed the capitalized-receiver gate + constructor-receiver handling. | +| **C#** | #753 | `.` | Newtonsoft +3 / NodaTime **+73 / −0** | Additive. Return type is the `returns` field; extension-method chains correctly don't resolve. | +| **conformance** | #754 | (resolver upgrade) | arrow **+22 / −0** | Supertype walk — enables Swift protocol-ext, Rust trait, Go embedded, Dart mixin, Java/Kotlin/C# inherited chains. | +| **Swift** | #755 | `.` | Alamofire / Kingfisher **0 / 0** | Neutral-safe (unique fluent names already bare-resolved). Needed a nested-extension naming fix (`KF.Builder`→`KF::Builder`). | +| **Rust** | #757 | `::` | clap **+937 / −775** | Precision win (622 wrong→right retargets, +162 net). `-> Self`; trait-default methods via conformance. Single-hop. | +| **Go** | #760 | `.` | gin **net-zero** | `New().Method()`; embedded structs via conformance. Variable-inner fallback. 
**Found + fixed a batched-resolver runaway** (a mutated `original.referenceName` looped the offset-0 batch → 5M edges / 1.4 GB; fixed by tying the fallback to the original ref + a non-progress guard). | +| **Scala** | #761 | `.` | gatling **+14 / −59** | Precision win (−59 = stdlib `Option`/`Iterator` `.map`/`.flatMap` the baseline mis-tied to gatling's `Validation::*`). Companion factories + case-class `apply`. | +| **Dart** | #762 | `.` | localsend hand-written **+17 / −10** | Precision win **+ constructors made first-class** (factory/named ctors `Foo.create()`/`Foo._()` are now indexed; unnamed `Foo()` stays `instantiates`). `dartCtorInfo` validates a ctor against the enclosing class name — handles a tree-sitter misparse where `@override (A,B) m()` makes `m()` look like a ctor. | +| **Objective-C** | #786 | message send | SDWebImage **+35 / −75** | Precision win. Chained message send `[[Foo create] doIt]` over `message_expression`. getReturnType skips nullability qualifiers (`nonnull instancetype`). A class-message factory returns the receiver class by convention, so `[[X alloc] init]` / singleton chains resolve on `X` (validated). The −75 are wrong `init` mis-matches retargeted to the right class. | +| **TypeScript** | — | `.` | typeorm +0/−6 · nest **+0/−164** | **Evaluated, NOT shipped** — gradual typing; see below. | +| **Luau** | — | `:` / `.` | Fusion +0/−0 · matter +0/−0 | **Evaluated, NOT shipped** — gradually typed; additive-safe (missing-edge gap, no regression) but real Luau rarely annotates factory returns, so +0 on both benchmarks. Works for `Foo.create(): Bar` then `:doIt()` (synthetic). | + +`EXTRACTION_VERSION` is now **15** (C++→…→Dart→Objective-C). Re-index with `codegraph index -f` +to pick up the newer extractor on an existing graph. + +## Why TypeScript was skipped + +The mechanism resolves a chain from the factory's **declared** return type. TypeScript +leans on **type inference** — e.g. 
NestJS's `Test.createTestingModule(m) { return new +TestingModuleBuilder(...) }` has no `: TestingModuleBuilder` annotation — so the +factory's type can't be recovered, the re-encoded chain can't resolve, and it **drops +the bare-name edge** the existing resolver found. Real-repo A/B was **+0 added on both +typeorm and nest** with a net recall regression (−164 on nest, mostly the ubiquitous +`Test.createTestingModule({…}).compile()` pattern). The removed edges were mostly +*wrong* (baseline mis-resolved `.compile()` to `ModuleCompiler::compile`), so it's +precision-positive but recall-negative — against the recall-first invariant, and adding +nothing where it doesn't hurt (TS method names are unique enough that bare-name already +lands them). It was fully implemented (5 synthetic tests passed, runaway-safe bare-name +fallback) and consciously not shipped. The only path to a TS win would be reading +**inferred** return types (resolving `return new X()` in the factory body) — a much +larger change. Full write-up on issue #750. + +--- + +## Full README classification (all 21 languages) + +The mechanism's real requirement is a **declared return type** to recover the receiver's +type — not "statically typed" (PHP qualifies via its `: self` / `: Type` return +declarations). Against the README's full supported-language list: + +| Bucket | Languages | +|---|---| +| **Covered** (12) | C++, C, PHP, Java, Kotlin, C#, Swift, Rust, Go, Scala, Dart, Objective-C | +| **Evaluated, skipped** (2) | **TypeScript** — gradual typing → inference-typed factories can't be recovered; net recall regression. **Luau** — gradually typed; additive-safe but +0 on Fusion AND matter (real Luau rarely annotates factory returns). Both: the mechanism needs reliably-declared return types, which gradually-typed code too often omits. 
| +| **Blocked by a prerequisite** (1) | **Pascal/Delphi** — statically typed (so the mechanism *would* pay off), but its method-call extraction from procedure bodies is broadly incomplete: paren-less calls (`TFoo.GetInstance.DoIt`) parse as a bare `exprDot` (not in `callTypes`), and even paren'd calls (`f.Regular()`) produce no edge (no receiver-type tracking for Pascal locals). Building Pascal's call graph is a substantial standalone extractor effort; the chained-call port is a small part of it. Separate follow-up. | +| **Out of scope — no declared return types** (6) | JavaScript, Ruby, Lua, Svelte, Vue, Liquid (Liquid has no methods/chains at all) | +| **Partial / separate** (1) | Python — only optional `-> T` hints; tracked as #578, not part of this mechanism | + +So #750's original framing ("the 9 statically-typed README languages") was incomplete — +it missed three more typed languages. Resolved: **Objective-C** shipped (#786, same +wrong-edge gap, mechanism ports directly); **Luau** evaluated and skipped (gradual +typing → +0 on real repos, additive-safe); **Pascal** is gated on unrelated extractor +work (its call graph is broadly incomplete). + +The through-line: this mechanism fits languages with **reliably-declared return types** +(the 12 shipped). Gradually-typed languages (TypeScript, Luau) omit them too often for +it to pay off, and dynamically-typed languages have none. + +--- + +## Edge cases / model +- **Single-hop**: a chain re-encodes one hop; deeper hops (`a.b().c().d()`) keep the + bare name (the inner `()` defeats the `Class::method` split). Re-measure on deep + fluent-builder repos. +- **Validation, not guessing**: every resolver ends in `resolveMethodOnType`, so an + unknown / wrong inferred type produces **no edge** — the decoy / absent-method + guarantee that makes this safe to ship. 
+- **Per-language receiver gate** keeps instance chains bare so existing resolution is + never regressed; the A/B "removed" counts are wrong-edge corrections, not losses. + +## Related work +- **Dynamic-dispatch / callback synthesis** (a *different* mechanism): observer / + EventEmitter / React-render / JSX-child / django-ORM edge synthesis lives in + `callback-edge-synthesis.md` + `dynamic-dispatch-coverage-playbook.md`. +- The verbose session working-notes for #750 are in + `.claude/handoffs/chained-call-multilang-probe.md` (scratch; this doc is the + permanent record). From af56f3539d16be4dcdbc3d97696815d5d6135dd9 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 11 Jun 2026 08:37:04 -0400 Subject: [PATCH 03/31] fix(pascal): resolve chained factory calls TFoo.GetInstance().DoIt() (#750) (#791) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Ports the #645/#608 chained-receiver mechanism to Pascal/Delphi — which I'd previously mis-scoped as blocked. The paren'd chained form extracts fine; it just hit the chained-call gap like the others (with a decoy, `TFoo.GetInstance().DoIt()` mis-resolved to a same-named method on an unrelated class). - pascal.ts: getReturnType reads the method's `typeref` (a `function GetInstance: TBar` returns TBar; an interface return `IFoo` is captured too). - tree-sitter.ts: extractPascalCall now re-encodes a chained call `TFoo.GetInstance().DoIt` (the exprDot's receiver is an exprCall) instead of collapsing it to bare `DoIt`. Gated on the Delphi type-naming convention (`TFoo`/`IFoo`) so a capitalized VARIABLE chain (Pascal capitalizes locals too — `Curve.X().Y()`, `Self.X().Y()`) stays bare and keeps its existing bare-name resolution. - name-matcher.ts: `pascal` joins the dotted-chain gate + CHAIN_LANGUAGES + CONSTRUCTS_VIA_BARE_CALL (a `TFoo(x)` typecast yields a TFoo). 
When the factory's return type wasn't captured (a `constructor Create` has no `: TBar` but returns its class), resolve the method on the factory class itself. resolveMethodOnType validates, so a wrong inference yields no edge. Validation: 4 synthetic tests (factory+decoy, constructor chain, typecast chain, absent-method safety). Real-repo A/B on PascalCoin (772 files): +19 / -18 — 15 of the -18 are correct class→interface retargets (`GetInstance(): IAsn1OctetString` resolves `.GetOctets` on the declared interface, not baseline's concrete-class guess); 3 are negligible drops (0.02%). EXTRACTION_VERSION 15->16. Full suite green. Co-authored-by: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 1 + __tests__/resolution.test.ts | 135 +++++++++++++++++++++++++++ src/extraction/extraction-version.ts | 2 +- src/extraction/languages/pascal.ts | 10 ++ src/extraction/tree-sitter.ts | 41 ++++++-- src/resolution/index.ts | 2 +- src/resolution/name-matcher.ts | 21 ++++- 7 files changed, 201 insertions(+), 11 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index fe1778ca2..dd60c4af4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -34,6 +34,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). - Rust method calls made through a chained associated function now resolve to the correct type. A call like `Foo::new().bar()` or `Foo::with(cfg).build()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated type — or didn't resolve. CodeGraph now captures Rust return types (`-> Self` resolves to the implementing type), infers the chained receiver's type from what the associated function returns, and resolves the method on it — including methods provided by a trait the type implements (via the new `impl Trait for Type` relationships) — creating the edge only when the type or one of its traits genuinely has the method. Existing Rust indexes should be re-indexed (`codegraph index -f`) to benefit. 
(#750) (Rust) - Dart method calls made through a static factory, a factory or named constructor, or a fluent chain now resolve to the correct type. A call like `Foo.create().bar()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated type — most often mis-attributing a standard-library `Option` / `Iterator` `.map` / `.where` onto your own same-named class. CodeGraph now indexes Dart **factory and named constructors** (`factory Foo.create()`, `Foo.named()`) as first-class members so calls to them resolve, captures Dart return types (a generic `List` resolves to its container `List`), infers the chained receiver's type from what the inner call returns or constructs, and resolves the method on it — including methods inherited from a superclass or mixin — creating the edge only when that type genuinely has the method. Plain construction (`Foo(...)`) is still recorded as instantiation. Existing Dart indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Dart) - Objective-C methods called through a chained message send now resolve to the correct class. A call like `[[Foo create] doIt]` used to drop the receiver, so `doIt` silently attached to a same-named method on an unrelated class — most often a test helper or stdlib class. CodeGraph now captures Objective-C method return types and infers the chained receiver's type from what the inner message returns. For the ubiquitous `[[X alloc] init]` and singleton (`[[X sharedInstance] …]`) patterns — where the factory returns `instancetype` — the receiver is the class `X` itself, so the chained method resolves on `X` (including methods inherited from a superclass), creating the edge only when the class genuinely has the method. Existing Objective-C indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Objective-C) +- Pascal/Delphi methods called through a chained factory call now resolve to the correct class. 
A call like `TFoo.GetInstance().DoIt()` used to drop the receiver, so `DoIt` silently attached to a same-named method on an unrelated class. CodeGraph now captures Pascal return types and infers the chained receiver's type from what the factory function returns — resolving to the declared type (including an interface return like `IFoo`), and for a constructor (`TFoo.Create().…`) or a typecast (`TFoo(x).…`) to the class `TFoo` itself, since both yield a `TFoo`. The edge is created only when that type genuinely has the method (so a wrong inference produces no edge). Existing Pascal/Delphi indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Pascal/Delphi) - Chained method calls now resolve when the chained method is **inherited from a superclass or declared on an interface/protocol** the receiver's type conforms to — for example a call on a sealed-subclass instance (`Either.Right(x).combine(...)`) that invokes a method defined on its parent type. Previously these chains found no caller edge even though the factory's type was known, so the call was invisible to callers, impact, and trace. CodeGraph now walks the type's supertypes (its `extends` / `implements` relationships) to find the method, creating the edge only when a supertype genuinely declares it (so a wrong inference still produces no edge). This makes Java, Kotlin, and C# factory and fluent chains more complete. Existing indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) - Swift method calls made through a static factory, fluent chain, or constructor now resolve to the correct class. A call like `Foo.make().draw()` or `Foo().draw()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated class — or didn't resolve at all. 
CodeGraph now captures Swift return types and infers the chained receiver's type from what the inner call returns (or the constructed type), creating the edge only when that class genuinely has the method (so a wrong inference produces no edge instead of a misleading one). Existing Swift indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Swift) - C# method calls made through a static factory or fluent chain now resolve to the correct class. A call like `Foo.Create().Bar()` or `JObject.Parse(s).Property(...)` used to lose the receiver's type, so the chained method didn't resolve and the call was invisible to callers/impact/trace. CodeGraph now captures C# return types and infers the chained receiver's type from what the inner call returns, creating the edge only when that class genuinely has the method (so a wrong inference produces no edge). Existing C# indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (C#) diff --git a/__tests__/resolution.test.ts b/__tests__/resolution.test.ts index 868e9b07a..f33197eda 100644 --- a/__tests__/resolution.test.ts +++ b/__tests__/resolution.test.ts @@ -3131,4 +3131,139 @@ void run() { expect(callerNamesOf('Decoy::clearAll')).toEqual([]); }); }); + + describe('Pascal/Delphi chained static-factory call resolution (#645/#608 mechanism)', () => { + function callerNamesOf(qualifiedName: string): string[] { + const target = cg.getNodesByKind('method').find((n) => n.qualifiedName === qualifiedName); + if (!target) return []; + const names = cg + .getIncomingEdges(target.id) + .filter((e) => e.kind === 'calls') + .map((e) => cg.getNode(e.source)?.name) + .filter((n): n is string => !!n); + return [...new Set(names)].sort(); + } + function isCalled(qn: string): boolean { + const t = cg.getNodesByKind('method').find((n) => n.qualifiedName === qn); + return !!t && cg.getIncomingEdges(t.id).some((e) => e.kind === 'calls'); + } + + it('resolves a chained factory call TFoo.GetInstance().DoIt() via the 
return type, never a same-named decoy', async () => { + fs.writeFileSync( + path.join(tempDir, 'main.pas'), + `unit Main; +interface +type + TBar = class + procedure DoIt; + end; + TDecoy = class + procedure DoIt; + end; + TFoo = class + class function GetInstance: TBar; + end; +implementation +procedure TBar.DoIt; begin end; +procedure TDecoy.DoIt; begin end; +class function TFoo.GetInstance: TBar; begin Result := nil; end; +procedure Run; +begin + TFoo.GetInstance().DoIt(); +end; +end. +` + ); + cg = await CodeGraph.init(tempDir, { index: true }); + expect(isCalled('TBar::DoIt')).toBe(true); + expect(isCalled('TDecoy::DoIt')).toBe(false); + }); + + it('resolves a constructor chain TFoo.Create().Configure() on the constructed class', async () => { + fs.writeFileSync( + path.join(tempDir, 'main.pas'), + `unit Main; +interface +type + TFoo = class + constructor Create; + procedure Configure; + end; + TDecoy = class + procedure Configure; + end; +implementation +constructor TFoo.Create; begin end; +procedure TFoo.Configure; begin end; +procedure TDecoy.Configure; begin end; +procedure Run; +begin + TFoo.Create().Configure(); +end; +end. +` + ); + cg = await CodeGraph.init(tempDir, { index: true }); + // A constructor returns its own class (no `: TBar` annotation), so Configure + // resolves on TFoo, not the same-named decoy. + expect(isCalled('TFoo::Configure')).toBe(true); + expect(isCalled('TDecoy::Configure')).toBe(false); + }); + + it('resolves a typecast chain TFoo(x).DoIt() on the cast type', async () => { + fs.writeFileSync( + path.join(tempDir, 'main.pas'), + `unit Main; +interface +type + TFoo = class + procedure DoIt; + end; + TDecoy = class + procedure DoIt; + end; +implementation +procedure TFoo.DoIt; begin end; +procedure TDecoy.DoIt; begin end; +procedure Run(obj: TObject); +begin + TFoo(obj).DoIt(); +end; +end. 
+` + ); + cg = await CodeGraph.init(tempDir, { index: true }); + expect(isCalled('TFoo::DoIt')).toBe(true); + expect(isCalled('TDecoy::DoIt')).toBe(false); + }); + + it('creates NO edge when the factory return type lacks the method (silent miss)', async () => { + fs.writeFileSync( + path.join(tempDir, 'main.pas'), + `unit Main; +interface +type + TBar = class + end; + TOther = class + procedure OnlyOther; + end; + TFoo = class + class function GetInstance: TBar; + end; +implementation +procedure TOther.OnlyOther; begin end; +class function TFoo.GetInstance: TBar; begin Result := nil; end; +procedure Run; +begin + TFoo.GetInstance().OnlyOther(); +end; +end. +` + ); + cg = await CodeGraph.init(tempDir, { index: true }); + // TBar has no OnlyOther — must not mis-attach to the same-named TOther::OnlyOther. + expect(isCalled('TOther::OnlyOther')).toBe(false); + }); + }); }); diff --git a/src/extraction/extraction-version.ts b/src/extraction/extraction-version.ts index 2aba578ae..1b847d000 100644 --- a/src/extraction/extraction-version.ts +++ b/src/extraction/extraction-version.ts @@ -21,4 +21,4 @@ * turns the re-index hint into noise — keep it honest (see CLAUDE.md, "Honesty * in the product is load-bearing"). */ -export const EXTRACTION_VERSION = 15; +export const EXTRACTION_VERSION = 16; diff --git a/src/extraction/languages/pascal.ts b/src/extraction/languages/pascal.ts index aed6a59fe..004dadc83 100644 --- a/src/extraction/languages/pascal.ts +++ b/src/extraction/languages/pascal.ts @@ -17,6 +17,16 @@ export const pascalExtractor: LanguageExtractor = { bodyField: 'body', paramsField: 'args', returnField: 'type', + // Pascal/Delphi `function GetInstance: TBar` — the return type is a `typeref` + // child. Capture its bare class name for the chained static-factory call + // mechanism (#750). A procedure (no return) has no typeref → undefined. 
+ getReturnType: (node, source) => { + const typeref = node.namedChildren.find((c: SyntaxNode) => c.type === 'typeref'); + if (!typeref) return undefined; + const id = typeref.namedChildren.find((c: SyntaxNode) => c.type === 'identifier') ?? typeref; + const name = getNodeText(id, source).trim(); + return /^[A-Za-z_]\w*$/.test(name) ? name : undefined; + }, getSignature: (node, source) => { const args = getChildByField(node, 'args'); const returnType = node.namedChildren.find( diff --git a/src/extraction/tree-sitter.ts b/src/extraction/tree-sitter.ts index 546c66dc9..253bc3af9 100644 --- a/src/extraction/tree-sitter.ts +++ b/src/extraction/tree-sitter.ts @@ -4312,12 +4312,41 @@ export class TreeSitterExtractor { let calleeName = ''; if (firstChild.type === 'exprDot') { - // Qualified call: Obj.Method(...) - const identifiers = firstChild.namedChildren.filter( - (c: SyntaxNode) => c.type === 'identifier' - ); - if (identifiers.length > 0) { - calleeName = identifiers.map((id: SyntaxNode) => getNodeText(id, this.source)).join('.'); + // Chained static-factory call: `TFoo.GetInstance().DoIt()` — the exprDot's + // receiver is itself an `exprCall`, so the bare identifier list would + // collapse to just `DoIt` and mis-resolve to a same-named method on an + // unrelated class. Encode `TFoo.GetInstance().DoIt` so resolution infers + // DoIt's class from what `TFoo.GetInstance` RETURNS (#645/#608). Only a + // capitalized class-factory chain; a unary outer method. + const innerCall = firstChild.namedChildren.find((c: SyntaxNode) => c.type === 'exprCall'); + const outerId = firstChild.namedChildren.filter((c: SyntaxNode) => c.type === 'identifier').pop(); + const method = outerId ? 
getNodeText(outerId, this.source) : ''; + if (innerCall && method && /^\w+$/.test(method)) { + const innerFirst = innerCall.namedChild(0); + let innerCallee = ''; + if (innerFirst?.type === 'exprDot') { + innerCallee = innerFirst.namedChildren + .filter((c: SyntaxNode) => c.type === 'identifier') + .map((id: SyntaxNode) => getNodeText(id, this.source)) + .join('.'); + } else if (innerFirst?.type === 'identifier') { + innerCallee = getNodeText(innerFirst, this.source); + } + // Gate on the Delphi type-naming convention — `TFoo` classes / `IFoo` + // interfaces — so a class-factory chain re-encodes but a capitalized + // VARIABLE/parameter chain (Pascal capitalizes locals too: `Curve.X().Y()`, + // `Self.X().Y()`) stays bare and keeps its existing bare-name resolution. + calleeName = innerCallee && /^[TI][A-Z]/.test(innerCallee) + ? `${innerCallee}().${method}` + : method; + } else { + // Qualified call: Obj.Method(...) + const identifiers = firstChild.namedChildren.filter( + (c: SyntaxNode) => c.type === 'identifier' + ); + if (identifiers.length > 0) { + calleeName = identifiers.map((id: SyntaxNode) => getNodeText(id, this.source)).join('.'); + } } } else if (firstChild.type === 'identifier') { calleeName = getNodeText(firstChild, this.source); diff --git a/src/resolution/index.ts b/src/resolution/index.ts index 9435dac37..96484001e 100644 --- a/src/resolution/index.ts +++ b/src/resolution/index.ts @@ -37,7 +37,7 @@ const SUPERTYPE_BEARING_KINDS = new Set([ * second pass. Dotted-receiver languages resolve via matchDottedCallChain; the * `::`-receiver ones (Rust) via matchScopedCallChain. */ -const CHAIN_LANGUAGES = new Set(['java', 'kotlin', 'csharp', 'swift', 'rust', 'go', 'scala', 'dart', 'objc']); +const CHAIN_LANGUAGES = new Set(['java', 'kotlin', 'csharp', 'swift', 'rust', 'go', 'scala', 'dart', 'objc', 'pascal']); const SCOPED_CHAIN_LANGUAGES = new Set(['rust']); /** The extractor's chained-receiver encoding: `().`. 
*/ diff --git a/src/resolution/name-matcher.ts b/src/resolution/name-matcher.ts index 19f0a7a70..b1280a78f 100644 --- a/src/resolution/name-matcher.ts +++ b/src/resolution/name-matcher.ts @@ -603,9 +603,11 @@ export function matchScopedCallChain( * so a bare `Foo()` there is a method call, not construction — excluded. Scala's * `Foo(args)` is a case-class / companion `apply`, which conventionally returns * `Foo` — and resolveMethodOnType validates, so a non-conventional `apply` that - * returns another type simply yields no edge rather than a wrong one. + * returns another type simply yields no edge rather than a wrong one. Pascal/Delphi: + * a `TFoo(x)` is a TYPECAST whose result is a `TFoo`, so `TFoo(x).method()` resolves + * the method on `TFoo` — same shape, same validation. */ -const CONSTRUCTS_VIA_BARE_CALL = new Set(['kotlin', 'swift', 'scala', 'dart']); +const CONSTRUCTS_VIA_BARE_CALL = new Set(['kotlin', 'swift', 'scala', 'dart', 'pascal']); /** * Resolve a dotted chained call whose receiver is a static factory / fluent call — @@ -688,6 +690,18 @@ export function matchDottedCallChain( if (ref.language === 'objc' && /^[A-Z]/.test(factoryClass)) { return resolveMethodOnType(factoryClass, method, ref, context, 0.8, 'instance-method', importedFqnOf(factoryClass, ref, context)); } + // Pascal/Delphi: the extractor only re-encodes a `TFoo`/`IFoo`-prefixed chain + // (the type-naming convention), so `factoryClass` is always a real class here. + // A factory whose return type wasn't captured is a CONSTRUCTOR + // (`TFileMem.Create().SetCachePerformance` — `constructor Create` has no `: + // TBar` annotation but returns its own class) or an unannotated function. In + // both cases the receiver's type is the class itself, so resolve the method on + // `factoryClass`. 
resolveMethodOnType validates against it (and its + // supertypes), so a wrong inference yields no edge — and this never fires when + // a return type WAS captured but lacks the method (absent-method safety above). + if (ref.language === 'pascal' && /^[TI]/.test(factoryClass)) { + return resolveMethodOnType(factoryClass, method, ref, context, 0.8, 'instance-method', importedFqnOf(factoryClass, ref, context)); + } return null; } return resolveMethodOnType(ret, method, ref, context, 0.85, 'instance-method', importedFqnOf(ret, ref, context)); @@ -1153,7 +1167,8 @@ export function matchReference( ref.language === 'go' || ref.language === 'scala' || ref.language === 'dart' || - ref.language === 'objc' + ref.language === 'objc' || + ref.language === 'pascal' ) { result = matchDottedCallChain(ref, context); if (result) return result; From 4c35b72136ba58febdaefbee2c395939e11288f3 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 11 Jun 2026 08:39:34 -0400 Subject: [PATCH 04/31] =?UTF-8?q?docs(design):=20Pascal/Delphi=20chained?= =?UTF-8?q?=20calls=20shipped=20(#791)=20=E2=80=94=2013=20languages=20(#75?= =?UTF-8?q?0)=20(#792)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Updates the chained-call design doc: Pascal moves from "blocked" to covered (#791) — the earlier "blocked" read was wrong, caused by probing only the paren-less form. 13 languages now shipped; EXTRACTION_VERSION 16. 
Co-authored-by: Claude Opus 4.8 (1M context) --- docs/design/chained-call-resolution.md | 31 +++++++++++++------------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/docs/design/chained-call-resolution.md b/docs/design/chained-call-resolution.md index 4cf38ebef..8485a02f7 100644 --- a/docs/design/chained-call-resolution.md +++ b/docs/design/chained-call-resolution.md @@ -1,12 +1,11 @@ # Design + status: chained static-factory / fluent call resolution -**Status:** SHIPPED for **11 languages** (C++, C, PHP, Java, Kotlin, C#, Swift, Rust, -Go, Scala, Dart, Objective-C) + a conformance pass. **TypeScript and Luau were evaluated -and intentionally skipped** (both gradually typed → the mechanism is +0 / regresses on -real code). **Pascal/Delphi** is blocked on a larger prerequisite (its method-call -extraction is broadly incomplete). See "Full README classification" below. Tracking -issue: **#750** (which began as "the statically-typed README languages" but that -enumeration was incomplete — it missed ObjC / Pascal / Luau). +**Status:** SHIPPED for **13 languages** (C++, C, PHP, Java, Kotlin, C#, Swift, Rust, +Go, Scala, Dart, Objective-C, Pascal/Delphi) + a conformance pass. **TypeScript and Luau +were evaluated and intentionally skipped** (both gradually typed → the mechanism is +0 / +regresses on real code). See "Full README classification" below. Tracking issue: +**#750** (which began as "the statically-typed README languages" but that enumeration was +incomplete — it missed ObjC / Pascal / Luau). **Motivation:** a call whose **receiver is itself a call** — a factory / singleton / builder that returns an object — should produce a `calls` edge to the chained method: @@ -75,10 +74,11 @@ walking `context.getSupertypes(...)`. | **Scala** | #761 | `.` | gatling **+14 / −59** | Precision win (−59 = stdlib `Option`/`Iterator` `.map`/`.flatMap` the baseline mis-tied to gatling's `Validation::*`). Companion factories + case-class `apply`. 
| | **Dart** | #762 | `.` | localsend hand-written **+17 / −10** | Precision win **+ constructors made first-class** (factory/named ctors `Foo.create()`/`Foo._()` are now indexed; unnamed `Foo()` stays `instantiates`). `dartCtorInfo` validates a ctor against the enclosing class name — handles a tree-sitter misparse where `@override (A,B) m()` makes `m()` look like a ctor. | | **Objective-C** | #786 | message send | SDWebImage **+35 / −75** | Precision win. Chained message send `[[Foo create] doIt]` over `message_expression`. getReturnType skips nullability qualifiers (`nonnull instancetype`). A class-message factory returns the receiver class by convention, so `[[X alloc] init]` / singleton chains resolve on `X` (validated). The −75 are wrong `init` mis-matches retargeted to the right class. | +| **Pascal/Delphi** | #791 | `.` (`exprDot`) | PascalCoin **+19 / −18** | Precision win. `TFoo.GetInstance().DoIt()` over Pascal's `exprCall`/`exprDot`. getReturnType from the `typeref` (incl. interface returns `IFoo`). Re-encoding gated on the Delphi `TFoo`/`IFoo` type convention so capitalized *variable* chains stay bare. A constructor (no `: TBar`) or typecast `TFoo(x)` resolves on the class. 15 of the −18 are correct class→interface retargets (`GetInstance(): IAsn1OctetString`). | | **TypeScript** | — | `.` | typeorm +0/−6 · nest **+0/−164** | **Evaluated, NOT shipped** — gradual typing; see below. | | **Luau** | — | `:` / `.` | Fusion +0/−0 · matter +0/−0 | **Evaluated, NOT shipped** — gradually typed; additive-safe (missing-edge gap, no regression) but real Luau rarely annotates factory returns, so +0 on both benchmarks. Works for `Foo.create(): Bar` then `:doIt()` (synthetic). | -`EXTRACTION_VERSION` is now **15** (C++→…→Dart→Objective-C). Re-index with `codegraph index -f` +`EXTRACTION_VERSION` is now **16** (C++→…→Objective-C→Pascal). Re-index with `codegraph index -f` to pick up the newer extractor on an existing graph. 
## Why TypeScript was skipped @@ -108,20 +108,21 @@ declarations). Against the README's full supported-language list: | Bucket | Languages | |---|---| -| **Covered** (12) | C++, C, PHP, Java, Kotlin, C#, Swift, Rust, Go, Scala, Dart, Objective-C | +| **Covered** (13) | C++, C, PHP, Java, Kotlin, C#, Swift, Rust, Go, Scala, Dart, Objective-C, Pascal/Delphi | | **Evaluated, skipped** (2) | **TypeScript** — gradual typing → inference-typed factories can't be recovered; net recall regression. **Luau** — gradually typed; additive-safe but +0 on Fusion AND matter (real Luau rarely annotates factory returns). Both: the mechanism needs reliably-declared return types, which gradually-typed code too often omits. | -| **Blocked by a prerequisite** (1) | **Pascal/Delphi** — statically typed (so the mechanism *would* pay off), but its method-call extraction from procedure bodies is broadly incomplete: paren-less calls (`TFoo.GetInstance.DoIt`) parse as a bare `exprDot` (not in `callTypes`), and even paren'd calls (`f.Regular()`) produce no edge (no receiver-type tracking for Pascal locals). Building Pascal's call graph is a substantial standalone extractor effort; the chained-call port is a small part of it. Separate follow-up. | +| **Known limitation (not blocking)** | **Pascal/Delphi** is shipped (#791), but only the **paren'd** chain `TFoo.GetInstance().DoIt()` is covered — the **paren-less** form `TFoo.GetInstance.DoIt` parses as a bare `exprDot` (not in `callTypes`) and isn't extracted as a call at all. Emitting paren-less method calls is a separate extractor follow-up (and a broader Pascal-coverage win independent of chains). 
| | **Out of scope — no declared return types** (6) | JavaScript, Ruby, Lua, Svelte, Vue, Liquid (Liquid has no methods/chains at all) | | **Partial / separate** (1) | Python — only optional `-> T` hints; tracked as #578, not part of this mechanism | So #750's original framing ("the 9 statically-typed README languages") was incomplete — -it missed three more typed languages. Resolved: **Objective-C** shipped (#786, same -wrong-edge gap, mechanism ports directly); **Luau** evaluated and skipped (gradual -typing → +0 on real repos, additive-safe); **Pascal** is gated on unrelated extractor -work (its call graph is broadly incomplete). +it missed three more typed languages, all now resolved: **Objective-C** shipped (#786, +same wrong-edge gap, mechanism ports directly); **Pascal/Delphi** shipped (#791, a clean +port for the paren'd chain — an initial "blocked" read was wrong, caused by probing only +the paren-less form); **Luau** evaluated and skipped (gradual typing → +0 on real repos, +additive-safe). The through-line: this mechanism fits languages with **reliably-declared return types** -(the 12 shipped). Gradually-typed languages (TypeScript, Luau) omit them too often for +(the 13 shipped). Gradually-typed languages (TypeScript, Luau) omit them too often for it to pay off, and dynamically-typed languages have none. --- From 35dce04e1fe28c0ff187bb3b10efb203815a4a90 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 11 Jun 2026 08:54:17 -0400 Subject: [PATCH 05/31] feat(pascal): extract paren-less method calls (Obj.Free; / TFoo.GetInstance.DoIt;) (#793) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Pascal/Delphi lets a no-arg method or procedure drop its parens, so the call parses as a bare `exprDot` (not an `exprCall`) and was never recorded as a call — callers/impact/trace missed all of them (e.g. `Obj.Free`, `List.Clear`, the paren-less factory chain `TFoo.GetInstance.DoIt`). 
extractPascalParenlessCall handles these, wired into visitPascalBlock scoped to STATEMENT position only: a bare `Obj.Field;` statement is a no-op, so a statement-level dot expression is a call — but a dot in assignment LHS/RHS or a condition is left alone, since there it's genuinely ambiguous with a field/property access. The chained paren-less form reuses the #750 chain encoding (gated on the Delphi `TFoo`/`IFoo` type convention) and resolves the same way. PascalCoin A/B: +1131 / -1 — purely additive, and all 1131 new edges resolve to METHOD nodes (zero field/property false positives, confirming the statement-level gate). 3 new synthetic tests (paren-less call, paren-less chained factory, and the property-write/read non-extraction guard). EXTRACTION_VERSION 16->17. Full suite green. Co-authored-by: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 1 + __tests__/resolution.test.ts | 87 ++++++++++++++++++++++++++++ src/extraction/extraction-version.ts | 2 +- src/extraction/tree-sitter.ts | 80 +++++++++++++++++++++++-- 4 files changed, 164 insertions(+), 6 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index dd60c4af4..19f406db4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -35,6 +35,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). - Dart method calls made through a static factory, a factory or named constructor, or a fluent chain now resolve to the correct type. A call like `Foo.create().bar()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated type — most often mis-attributing a standard-library `Option` / `Iterator` `.map` / `.where` onto your own same-named class. 
CodeGraph now indexes Dart **factory and named constructors** (`factory Foo.create()`, `Foo.named()`) as first-class members so calls to them resolve, captures Dart return types (a generic `List` resolves to its container `List`), infers the chained receiver's type from what the inner call returns or constructs, and resolves the method on it — including methods inherited from a superclass or mixin — creating the edge only when that type genuinely has the method. Plain construction (`Foo(...)`) is still recorded as instantiation. Existing Dart indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Dart) - Objective-C methods called through a chained message send now resolve to the correct class. A call like `[[Foo create] doIt]` used to drop the receiver, so `doIt` silently attached to a same-named method on an unrelated class — most often a test helper or stdlib class. CodeGraph now captures Objective-C method return types and infers the chained receiver's type from what the inner message returns. For the ubiquitous `[[X alloc] init]` and singleton (`[[X sharedInstance] …]`) patterns — where the factory returns `instancetype` — the receiver is the class `X` itself, so the chained method resolves on `X` (including methods inherited from a superclass), creating the edge only when the class genuinely has the method. Existing Objective-C indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Objective-C) - Pascal/Delphi methods called through a chained factory call now resolve to the correct class. A call like `TFoo.GetInstance().DoIt()` used to drop the receiver, so `DoIt` silently attached to a same-named method on an unrelated class. 
CodeGraph now captures Pascal return types and infers the chained receiver's type from what the factory function returns — resolving to the declared type (including an interface return like `IFoo`), and for a constructor (`TFoo.Create().…`) or a typecast (`TFoo(x).…`) to the class `TFoo` itself, since both yield a `TFoo`. The edge is created only when that type genuinely has the method (so a wrong inference produces no edge). Existing Pascal/Delphi indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Pascal/Delphi) +- Pascal/Delphi **paren-less method calls are now tracked**. Pascal lets a no-argument method or procedure drop its parentheses (`Obj.Free;`, `List.Clear;`, `TFoo.GetInstance.DoIt;`), which previously weren't recorded as calls at all — so callers, impact, and trace missed them. CodeGraph now extracts these, scoped to statement position so a field or property access (which looks identical) is never mistaken for a call. On a real Delphi codebase this added ~1,100 previously-missing call edges with no false positives. Existing Pascal/Delphi indexes should be re-indexed (`codegraph index -f`) to benefit. (Pascal/Delphi) - Chained method calls now resolve when the chained method is **inherited from a superclass or declared on an interface/protocol** the receiver's type conforms to — for example a call on a sealed-subclass instance (`Either.Right(x).combine(...)`) that invokes a method defined on its parent type. Previously these chains found no caller edge even though the factory's type was known, so the call was invisible to callers, impact, and trace. CodeGraph now walks the type's supertypes (its `extends` / `implements` relationships) to find the method, creating the edge only when a supertype genuinely declares it (so a wrong inference still produces no edge). This makes Java, Kotlin, and C# factory and fluent chains more complete. Existing indexes should be re-indexed (`codegraph index -f`) to benefit. 
(#750) - Swift method calls made through a static factory, fluent chain, or constructor now resolve to the correct class. A call like `Foo.make().draw()` or `Foo().draw()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated class — or didn't resolve at all. CodeGraph now captures Swift return types and infers the chained receiver's type from what the inner call returns (or the constructed type), creating the edge only when that class genuinely has the method (so a wrong inference produces no edge instead of a misleading one). Existing Swift indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Swift) - C# method calls made through a static factory or fluent chain now resolve to the correct class. A call like `Foo.Create().Bar()` or `JObject.Parse(s).Property(...)` used to lose the receiver's type, so the chained method didn't resolve and the call was invisible to callers/impact/trace. CodeGraph now captures C# return types and infers the chained receiver's type from what the inner call returns, creating the edge only when that class genuinely has the method (so a wrong inference produces no edge). Existing C# indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (C#) diff --git a/__tests__/resolution.test.ts b/__tests__/resolution.test.ts index f33197eda..35607e3dd 100644 --- a/__tests__/resolution.test.ts +++ b/__tests__/resolution.test.ts @@ -3265,5 +3265,92 @@ end. // TBar has no OnlyOther — must not mis-attach to the same-named TOther::OnlyOther. 
expect(isCalled('TOther::OnlyOther')).toBe(false); }); + + it('extracts paren-less method calls (Pascal lets a no-arg method drop its parens)', async () => { + fs.writeFileSync( + path.join(tempDir, 'main.pas'), + `unit Main; +interface +type + TFoo = class + procedure DoThing; + procedure Reset; + end; +implementation +procedure TFoo.DoThing; begin end; +procedure TFoo.Reset; begin end; +procedure Run(f: TFoo); +begin + f.DoThing; + f.Reset; +end; +end. +` + ); + cg = await CodeGraph.init(tempDir, { index: true }); + expect(isCalled('TFoo::DoThing')).toBe(true); + expect(isCalled('TFoo::Reset')).toBe(true); + }); + + it('resolves a PAREN-LESS chained factory call TFoo.GetInstance.DoIt via the return type', async () => { + fs.writeFileSync( + path.join(tempDir, 'main.pas'), + `unit Main; +interface +type + TBar = class + procedure DoIt; + end; + TDecoy = class + procedure DoIt; + end; + TFoo = class + class function GetInstance: TBar; + end; +implementation +procedure TBar.DoIt; begin end; +procedure TDecoy.DoIt; begin end; +class function TFoo.GetInstance: TBar; begin Result := nil; end; +procedure Run; +begin + TFoo.GetInstance.DoIt; +end; +end. +` + ); + cg = await CodeGraph.init(tempDir, { index: true }); + expect(isCalled('TBar::DoIt')).toBe(true); + expect(isCalled('TDecoy::DoIt')).toBe(false); + }); + + it('does NOT turn a property write/read into a call edge (only statement-level dots are calls)', async () => { + fs.writeFileSync( + path.join(tempDir, 'main.pas'), + `unit Main; +interface +type + TFoo = class + function GetValue: Integer; + procedure SetValue(v: Integer); + property Value: Integer read GetValue write SetValue; + end; +implementation +function TFoo.GetValue: Integer; begin Result := 0; end; +procedure TFoo.SetValue(v: Integer); begin end; +procedure Run(f: TFoo); +var x: Integer; +begin + f.Value := 5; + x := f.Value; +end; +end. 
+` + ); + cg = await CodeGraph.init(tempDir, { index: true }); + // A property read/write is a bare dot in assignment position, not a statement, + // so it must not be mis-extracted as a call to the property's getter/setter. + expect(isCalled('TFoo::GetValue')).toBe(false); + expect(isCalled('TFoo::SetValue')).toBe(false); + }); }); }); diff --git a/src/extraction/extraction-version.ts b/src/extraction/extraction-version.ts index 1b847d000..435b263e9 100644 --- a/src/extraction/extraction-version.ts +++ b/src/extraction/extraction-version.ts @@ -21,4 +21,4 @@ * turns the re-index hint into noise — keep it honest (see CLAUDE.md, "Honesty * in the product is load-bearing"). */ -export const EXTRACTION_VERSION = 16; +export const EXTRACTION_VERSION = 17; diff --git a/src/extraction/tree-sitter.ts b/src/extraction/tree-sitter.ts index 253bc3af9..c641ab96b 100644 --- a/src/extraction/tree-sitter.ts +++ b/src/extraction/tree-sitter.ts @@ -4371,6 +4371,69 @@ export class TreeSitterExtractor { } } + /** + * Extract a PAREN-LESS Pascal method/procedure call (`Obj.Method;`, + * `TFoo.GetInstance.DoIt;`). Pascal lets a no-arg method drop its parens, so it + * parses as a bare `exprDot` (not an `exprCall`). A bare `exprDot` is + * syntactically identical to a field/property access, so this is only ever + * called for a STATEMENT-level exprDot (caller-gated): a bare `Obj.Field;` + * statement is a no-op, so a statement-level dot expression is a call. (An + * exprDot in assignment LHS/RHS or a condition is left alone — there it really + * can be a field/property read.) + */ + private extractPascalParenlessCall(node: SyntaxNode): void { + if (this.nodeStack.length === 0) return; + const callerId = this.nodeStack[this.nodeStack.length - 1]; + if (!callerId) return; + + const receiver = node.namedChild(0); + const outerId = node.namedChildren.filter((c: SyntaxNode) => c.type === 'identifier').pop(); + const method = outerId ? 
getNodeText(outerId, this.source) : ''; + if (!method) return; + + let calleeName = ''; + // Chained: the receiver is itself a call — a paren-less `TFoo.GetInstance` (an + // inner exprDot) or a paren'd `TFoo.GetInstance()` (an exprCall). Encode the + // chain `TFoo.GetInstance().DoIt` so resolution infers DoIt's class from what + // the factory RETURNS (#645/#608), gated on the Delphi `TFoo`/`IFoo` type + // convention; a capitalized VARIABLE chain stays a bare method name. + if ((receiver?.type === 'exprDot' || receiver?.type === 'exprCall') && /^\w+$/.test(method)) { + const innerCalleeNode = receiver.type === 'exprCall' ? receiver.namedChild(0) : receiver; + const innerCallee = !innerCalleeNode + ? '' + : innerCalleeNode.type === 'identifier' + ? getNodeText(innerCalleeNode, this.source) + : innerCalleeNode.namedChildren + .filter((c: SyntaxNode) => c.type === 'identifier') + .map((id: SyntaxNode) => getNodeText(id, this.source)) + .join('.'); + if (innerCallee && /^[TI][A-Z]/.test(innerCallee)) { + calleeName = `${innerCallee}().${method}`; + // The T/I-prefixed inner is itself a real call — record it too. + if (receiver.type === 'exprCall') this.extractPascalCall(receiver); + else this.extractPascalParenlessCall(receiver); + } else { + calleeName = method; // non-class receiver: a bare method ref (no field-access ref) + } + } else { + // Simple: `Obj.Method` → the dotted name (resolves via the receiver / bare name). 
+ calleeName = node.namedChildren + .filter((c: SyntaxNode) => c.type === 'identifier') + .map((id: SyntaxNode) => getNodeText(id, this.source)) + .join('.'); + } + + if (calleeName) { + this.unresolvedReferences.push({ + fromNodeId: callerId, + referenceName: calleeName, + referenceKind: 'calls', + line: node.startPosition.row + 1, + column: node.startPosition.column, + }); + } + } + /** * Recursively visit a Pascal block/statement tree for call expressions */ @@ -4381,11 +4444,18 @@ export class TreeSitterExtractor { if (child.type === 'exprCall') { this.extractPascalCall(child); } else if (child.type === 'exprDot') { - // Check if exprDot contains an exprCall - for (let j = 0; j < child.namedChildCount; j++) { - const grandchild = child.namedChild(j); - if (grandchild?.type === 'exprCall') { - this.extractPascalCall(grandchild); + // A STATEMENT-level bare exprDot is a paren-less call (`Obj.Free;`, + // `TFoo.GetInstance.DoIt;`). Anywhere else (assignment side, condition, + // expression) a bare exprDot is ambiguous with a field/property access, + // so there we only descend for paren'd inner calls. + if (node.type === 'statement') { + this.extractPascalParenlessCall(child); + } else { + for (let j = 0; j < child.namedChildCount; j++) { + const grandchild = child.namedChild(j); + if (grandchild?.type === 'exprCall') { + this.extractPascalCall(grandchild); + } } } } else { From 5342f7a93e5b3e70c77715cad4d5fc34ecacf0e0 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 11 Jun 2026 08:55:27 -0400 Subject: [PATCH 06/31] docs(design): Pascal paren-less method calls now extracted (#793) (#794) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Updates the chained-call design doc: the Pascal paren-less-call follow-up is done (#793) — `Obj.Free;` / `TFoo.GetInstance.DoIt;` are now extracted (scoped to statement position so field/property accesses aren't mistaken for calls). PascalCoin +1131/-1. EXTRACTION_VERSION 17. 
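
The statement-position rule described above can be sketched on a toy tree. This is a minimal illustration under stated assumptions, not the extractor's code: the `MiniNode` shape and the helpers `dottedName` and `collectParenlessCalls` are hypothetical, and only the node type names (`statement`, `exprDot`, `identifier`) are taken from the patch. The real visitor also descends into non-statement dots to find paren'd inner calls, which this sketch leaves out.

```typescript
// Hypothetical stand-in for a syntax node; this is not the tree-sitter API.
interface MiniNode {
  type: string;
  text?: string;
  children?: MiniNode[];
}

// Join the identifier children of a dot expression into `A.B`.
function dottedName(dot: MiniNode): string {
  return (dot.children ?? [])
    .filter((c) => c.type === "identifier")
    .map((c) => c.text ?? "")
    .join(".");
}

// An exprDot counts as a paren-less call only when its direct parent is a
// statement. In any other position it may be a field or property access,
// so it is not recorded.
function collectParenlessCalls(node: MiniNode, out: string[] = []): string[] {
  for (const child of node.children ?? []) {
    if (child.type === "exprDot" && node.type === "statement") {
      out.push(dottedName(child));
    } else {
      collectParenlessCalls(child, out);
    }
  }
  return out;
}

const id = (text: string): MiniNode => ({ type: "identifier", text });
const dot = (...names: string[]): MiniNode => ({
  type: "exprDot",
  children: names.map((n) => id(n)),
});

// Toy tree for `f.Reset;` followed by `x := f.Value;`
const block: MiniNode = {
  type: "block",
  children: [
    { type: "statement", children: [dot("f", "Reset")] },
    {
      type: "statement",
      children: [{ type: "assignment", children: [id("x"), dot("f", "Value")] }],
    },
  ],
};

const calls = collectParenlessCalls(block);
console.log(calls);
```

In this toy input only `f.Reset` is recorded; the `f.Value` read on the right-hand side of the assignment is skipped, which mirrors the property read/write test in the patch.
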
Co-authored-by: Claude Opus 4.8 (1M context) --- docs/design/chained-call-resolution.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/design/chained-call-resolution.md b/docs/design/chained-call-resolution.md index 8485a02f7..a4d9338d1 100644 --- a/docs/design/chained-call-resolution.md +++ b/docs/design/chained-call-resolution.md @@ -78,7 +78,7 @@ walking `context.getSupertypes(...)`. | **TypeScript** | — | `.` | typeorm +0/−6 · nest **+0/−164** | **Evaluated, NOT shipped** — gradual typing; see below. | | **Luau** | — | `:` / `.` | Fusion +0/−0 · matter +0/−0 | **Evaluated, NOT shipped** — gradually typed; additive-safe (missing-edge gap, no regression) but real Luau rarely annotates factory returns, so +0 on both benchmarks. Works for `Foo.create(): Bar` then `:doIt()` (synthetic). | -`EXTRACTION_VERSION` is now **16** (C++→…→Objective-C→Pascal). Re-index with `codegraph index -f` +`EXTRACTION_VERSION` is now **17** (C++→…→Pascal chains→Pascal paren-less calls). Re-index with `codegraph index -f` to pick up the newer extractor on an existing graph. ## Why TypeScript was skipped @@ -110,7 +110,7 @@ declarations). Against the README's full supported-language list: |---|---| | **Covered** (13) | C++, C, PHP, Java, Kotlin, C#, Swift, Rust, Go, Scala, Dart, Objective-C, Pascal/Delphi | | **Evaluated, skipped** (2) | **TypeScript** — gradual typing → inference-typed factories can't be recovered; net recall regression. **Luau** — gradually typed; additive-safe but +0 on Fusion AND matter (real Luau rarely annotates factory returns). Both: the mechanism needs reliably-declared return types, which gradually-typed code too often omits. | -| **Known limitation (not blocking)** | **Pascal/Delphi** is shipped (#791), but only the **paren'd** chain `TFoo.GetInstance().DoIt()` is covered — the **paren-less** form `TFoo.GetInstance.DoIt` parses as a bare `exprDot` (not in `callTypes`) and isn't extracted as a call at all. 
Emitting paren-less method calls is a separate extractor follow-up (and a broader Pascal-coverage win independent of chains). | +| **Pascal paren-less calls** | **Resolved (#793).** Pascal lets a no-arg method drop its parens (`Obj.Free;`, `TFoo.GetInstance.DoIt;`), which parse as a bare `exprDot` and weren't extracted as calls at all. Now extracted, scoped to STATEMENT position (a bare dot in assignment/condition position is left alone — there it's ambiguous with a field/property access). The paren-less chain reuses the same `TFoo`/`IFoo`-gated encoding. PascalCoin A/B **+1131 / −1**, all new edges resolve to methods (zero field/property false positives). | | **Out of scope — no declared return types** (6) | JavaScript, Ruby, Lua, Svelte, Vue, Liquid (Liquid has no methods/chains at all) | | **Partial / separate** (1) | Python — only optional `-> T` hints; tracked as #578, not part of this mechanism | From dac00e7d449a7b27c15292bc66f5c0c108bc9f9e Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 11 Jun 2026 09:05:08 -0400 Subject: [PATCH 07/31] fix(pascal): attribute a free routine's calls to it, not the file (#795) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A Pascal/Delphi procedure or function defined ONLY in the implementation section (no interface declaration, not a class method) had no node of its own, so extractPascalDefProc's caller lookup fell through to the nodeStack top — the file node. Every call in such a routine's body was lumped under the unit: callers returned the file, and impact couldn't attribute the call to the routine. (Methods were fine — they get a node from their class declaration.) Fix: when extractPascalDefProc finds no existing node for a FREE routine (a name with no `.`), create a function node for it and attribute the body's calls to it. 
Interface-declared free routines already have a node (found via the methodIndex), so there's no duplicate; methods keep their existing class-declaration node. PascalCoin A/B: +511 / -145 — the +511 are calls now correctly attributed to their actual routine (`allocate_new_datablock -> TDisposables::GetMem`), replacing -145 file-level aggregates; +248 new function nodes for the implementation-only routines. New synthetic test asserts a free routine's call attributes to it alongside a method caller. EXTRACTION_VERSION 17->18. Full suite green. Co-authored-by: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 1 + __tests__/resolution.test.ts | 25 +++++++++++++++++++++++++ src/extraction/extraction-version.ts | 2 +- src/extraction/tree-sitter.ts | 26 +++++++++++++++++++++++--- 4 files changed, 50 insertions(+), 4 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 19f406db4..eb3a57c94 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -36,6 +36,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). - Objective-C methods called through a chained message send now resolve to the correct class. A call like `[[Foo create] doIt]` used to drop the receiver, so `doIt` silently attached to a same-named method on an unrelated class — most often a test helper or stdlib class. CodeGraph now captures Objective-C method return types and infers the chained receiver's type from what the inner message returns. For the ubiquitous `[[X alloc] init]` and singleton (`[[X sharedInstance] …]`) patterns — where the factory returns `instancetype` — the receiver is the class `X` itself, so the chained method resolves on `X` (including methods inherited from a superclass), creating the edge only when the class genuinely has the method. Existing Objective-C indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Objective-C) - Pascal/Delphi methods called through a chained factory call now resolve to the correct class. 
A call like `TFoo.GetInstance().DoIt()` used to drop the receiver, so `DoIt` silently attached to a same-named method on an unrelated class. CodeGraph now captures Pascal return types and infers the chained receiver's type from what the factory function returns — resolving to the declared type (including an interface return like `IFoo`), and for a constructor (`TFoo.Create().…`) or a typecast (`TFoo(x).…`) to the class `TFoo` itself, since both yield a `TFoo`. The edge is created only when that type genuinely has the method (so a wrong inference produces no edge). Existing Pascal/Delphi indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Pascal/Delphi) - Pascal/Delphi **paren-less method calls are now tracked**. Pascal lets a no-argument method or procedure drop its parentheses (`Obj.Free;`, `List.Clear;`, `TFoo.GetInstance.DoIt;`), which previously weren't recorded as calls at all — so callers, impact, and trace missed them. CodeGraph now extracts these, scoped to statement position so a field or property access (which looks identical) is never mistaken for a call. On a real Delphi codebase this added ~1,100 previously-missing call edges with no false positives. Existing Pascal/Delphi indexes should be re-indexed (`codegraph index -f`) to benefit. (Pascal/Delphi) +- Pascal/Delphi calls inside a **standalone procedure or function** (one with no `interface` declaration, defined only in the `implementation` section) are now attributed to that routine instead of the whole file. Previously such a routine had no symbol of its own, so everything it called was lumped under the unit — `codegraph_callers` returned the file, and impact couldn't tell which routine was responsible. These routines are now indexed and their calls attributed correctly. Existing Pascal/Delphi indexes should be re-indexed (`codegraph index -f`) to benefit. 
(Pascal/Delphi) - Chained method calls now resolve when the chained method is **inherited from a superclass or declared on an interface/protocol** the receiver's type conforms to — for example a call on a sealed-subclass instance (`Either.Right(x).combine(...)`) that invokes a method defined on its parent type. Previously these chains found no caller edge even though the factory's type was known, so the call was invisible to callers, impact, and trace. CodeGraph now walks the type's supertypes (its `extends` / `implements` relationships) to find the method, creating the edge only when a supertype genuinely declares it (so a wrong inference still produces no edge). This makes Java, Kotlin, and C# factory and fluent chains more complete. Existing indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) - Swift method calls made through a static factory, fluent chain, or constructor now resolve to the correct class. A call like `Foo.make().draw()` or `Foo().draw()` used to drop the receiver, so the chained method silently attached to a same-named method on an unrelated class — or didn't resolve at all. CodeGraph now captures Swift return types and infers the chained receiver's type from what the inner call returns (or the constructed type), creating the edge only when that class genuinely has the method (so a wrong inference produces no edge instead of a misleading one). Existing Swift indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (Swift) - C# method calls made through a static factory or fluent chain now resolve to the correct class. A call like `Foo.Create().Bar()` or `JObject.Parse(s).Property(...)` used to lose the receiver's type, so the chained method didn't resolve and the call was invisible to callers/impact/trace. 
CodeGraph now captures C# return types and infers the chained receiver's type from what the inner call returns, creating the edge only when that class genuinely has the method (so a wrong inference produces no edge). Existing C# indexes should be re-indexed (`codegraph index -f`) to benefit. (#750) (C#) diff --git a/__tests__/resolution.test.ts b/__tests__/resolution.test.ts index 35607e3dd..12131f3cc 100644 --- a/__tests__/resolution.test.ts +++ b/__tests__/resolution.test.ts @@ -3352,5 +3352,30 @@ end. expect(isCalled('TFoo::GetValue')).toBe(false); expect(isCalled('TFoo::SetValue')).toBe(false); }); + + it('attributes an implementation-only free procedure\'s calls to the procedure, not the file', async () => { + fs.writeFileSync( + path.join(tempDir, 'main.pas'), + `unit Main; +interface +type + TTgt = class + procedure Hit; + end; + TFoo = class + procedure DoStuff; + end; +implementation +procedure TTgt.Hit; begin end; +procedure TFoo.DoStuff; var t: TTgt; begin t.Hit; end; +procedure Helper; var t: TTgt; begin t.Hit; end; +` + ); + cg = await CodeGraph.init(tempDir, { index: true }); + // `Helper` is implementation-only (no interface decl, not a method), but its + // body's call must attribute to `Helper`, not the file/module — alongside the + // method `DoStuff`. + expect(callerNamesOf('TTgt::Hit')).toEqual(['DoStuff', 'Helper']); + }); }); }); diff --git a/src/extraction/extraction-version.ts b/src/extraction/extraction-version.ts index 435b263e9..7b2df06d4 100644 --- a/src/extraction/extraction-version.ts +++ b/src/extraction/extraction-version.ts @@ -21,4 +21,4 @@ * turns the re-index hint into noise — keep it honest (see CLAUDE.md, "Honesty * in the product is load-bearing"). 
*/ -export const EXTRACTION_VERSION = 17; +export const EXTRACTION_VERSION = 18; diff --git a/src/extraction/tree-sitter.ts b/src/extraction/tree-sitter.ts index c641ab96b..8eb04c6e8 100644 --- a/src/extraction/tree-sitter.ts +++ b/src/extraction/tree-sitter.ts @@ -4281,10 +4281,30 @@ export class TreeSitterExtractor { } } - const parentId = + let parentId = this.methodIndex.get(fullNameKey) || - this.methodIndex.get(shortNameKey) || - this.nodeStack[this.nodeStack.length - 1]; + this.methodIndex.get(shortNameKey); + + // No existing node? This is an implementation-only **free** procedure/function + // (`procedure Helper; begin … end;` with no interface declaration and not a + // class method). Create a function node so its body's calls attribute to it, + // not to the enclosing file/module. A method (`TClass.Method`, a dotted name) + // always has a node from its class declaration, so this only fires for free + // routines — and the methodIndex lookup above already covers interface-declared + // free routines, so there's no duplicate. 
+ if (!parentId && !fullName.includes('.')) { + const fnNode = this.createNode('function', fullName, declProc, { + signature: this.extractor?.getSignature?.(declProc, this.source), + visibility: this.extractor?.getVisibility?.(declProc), + }); + if (fnNode) { + parentId = fnNode.id; + this.methodIndex.set(fullNameKey, fnNode.id); + if (!this.methodIndex.has(shortNameKey)) this.methodIndex.set(shortNameKey, fnNode.id); + } + } + + if (!parentId) parentId = this.nodeStack[this.nodeStack.length - 1]; if (!parentId) return; // Visit the block for calls From 0b3f3f969c988409ae2fcadbf51ec86800c57b01 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 11 Jun 2026 09:06:37 -0400 Subject: [PATCH 08/31] docs(design): Pascal free-routine call attribution fixed (#795) (#796) Records the second Pascal call-coverage follow-up (#795): a free routine defined only in the implementation section now gets a function node so its body's calls attribute to it, not the file. EXTRACTION_VERSION 18. Co-authored-by: Claude Opus 4.8 (1M context) --- docs/design/chained-call-resolution.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/design/chained-call-resolution.md b/docs/design/chained-call-resolution.md index a4d9338d1..9fa34a6e6 100644 --- a/docs/design/chained-call-resolution.md +++ b/docs/design/chained-call-resolution.md @@ -78,7 +78,7 @@ walking `context.getSupertypes(...)`. | **TypeScript** | — | `.` | typeorm +0/−6 · nest **+0/−164** | **Evaluated, NOT shipped** — gradual typing; see below. | | **Luau** | — | `:` / `.` | Fusion +0/−0 · matter +0/−0 | **Evaluated, NOT shipped** — gradually typed; additive-safe (missing-edge gap, no regression) but real Luau rarely annotates factory returns, so +0 on both benchmarks. Works for `Foo.create(): Bar` then `:doIt()` (synthetic). | -`EXTRACTION_VERSION` is now **17** (C++→…→Pascal chains→Pascal paren-less calls). 
Re-index with `codegraph index -f` +`EXTRACTION_VERSION` is now **18** (C++→…→Pascal chains→paren-less calls→free-routine attribution). Re-index with `codegraph index -f` to pick up the newer extractor on an existing graph. ## Why TypeScript was skipped @@ -110,7 +110,7 @@ declarations). Against the README's full supported-language list: |---|---| | **Covered** (13) | C++, C, PHP, Java, Kotlin, C#, Swift, Rust, Go, Scala, Dart, Objective-C, Pascal/Delphi | | **Evaluated, skipped** (2) | **TypeScript** — gradual typing → inference-typed factories can't be recovered; net recall regression. **Luau** — gradually typed; additive-safe but +0 on Fusion AND matter (real Luau rarely annotates factory returns). Both: the mechanism needs reliably-declared return types, which gradually-typed code too often omits. | -| **Pascal paren-less calls** | **Resolved (#793).** Pascal lets a no-arg method drop its parens (`Obj.Free;`, `TFoo.GetInstance.DoIt;`), which parse as a bare `exprDot` and weren't extracted as calls at all. Now extracted, scoped to STATEMENT position (a bare dot in assignment/condition position is left alone — there it's ambiguous with a field/property access). The paren-less chain reuses the same `TFoo`/`IFoo`-gated encoding. PascalCoin A/B **+1131 / −1**, all new edges resolve to methods (zero field/property false positives). | +| **Pascal call-coverage follow-ups** | Two gaps from the chained-call work, both resolved. **Paren-less calls (#793):** Pascal lets a no-arg method drop its parens (`Obj.Free;`, `TFoo.GetInstance.DoIt;`), which parse as a bare `exprDot` and weren't extracted as calls at all. Now extracted, scoped to STATEMENT position (a bare dot in assignment/condition position is left alone — ambiguous with a field/property access). PascalCoin A/B **+1131 / −1**, all new edges resolve to methods. 
**Free-routine attribution (#795):** a procedure/function defined only in the `implementation` section (no interface decl, not a method) had no node, so its body's calls were lumped under the file; now it gets a function node and its calls attribute to it. PascalCoin A/B **+511 / −145** (file-level aggregates → per-routine edges). | | **Out of scope — no declared return types** (6) | JavaScript, Ruby, Lua, Svelte, Vue, Liquid (Liquid has no methods/chains at all) | | **Partial / separate** (1) | Python — only optional `-> T` hints; tracked as #578, not part of this mechanism | From b7b7c8b4e8794e0108ddbd67402a29a6bea9b095 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 11 Jun 2026 09:25:01 -0400 Subject: [PATCH 09/31] =?UTF-8?q?docs(readme):=20update=20Pascal/Delphi=20?= =?UTF-8?q?coverage=2075.7%=20=E2=86=92=2077.4%=20(#797)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The paren-less call extraction (#793) and free-routine attribution (#795) added real call coverage on PascalCoin. Controlled A/B on a fresh clone, same source-file filter, only the build differing: baseline (pre-Pascal-work, d21d2df): 75.79% (≈ the documented 75.7%) current (main, v18): 77.37% (+1.58) The baseline reproducing the documented 75.7% confirms the metric is the same one the README table uses; the +1.58 is the measured coverage gain from this session's Pascal extraction work. 
Co-authored-by: Claude Opus 4.8 (1M context) --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ab147b548..c2356b23d 100644 --- a/README.md +++ b/README.md @@ -665,7 +665,7 @@ Impact and blast-radius queries are only as good as the dependency graph behind | Lua | nvim-telescope/telescope.nvim | 84.2% | | Luau | dphfox/Fusion | 92.2% | | Liquid | Shopify/dawn | 73.8% | -| Pascal / Delphi | PascalCoin | 75.7% | +| Pascal / Delphi | PascalCoin | 77.4% | Framework routing is validated the same way, on a canonical app per framework: Express 100%, FastAPI 98%, Flask 100%, NestJS 96.8%, Gin 96.5%, Axum 100%, Rocket 93.8%, Vapor 100%, Laravel 92%, Rails 89.6%, React Router 100% — and the convention/reflection-heavy ones at their honest static-analysis ceiling: ASP.NET 83.9%, Spring 83.3%, Drupal 78.9%, Django 74.1%. From c39b4b938ec9cca48c5b953987ac26e1c0b6d5e6 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 11 Jun 2026 09:57:48 -0400 Subject: [PATCH 10/31] =?UTF-8?q?docs(readme):=20fill=20framework-coverage?= =?UTF-8?q?=20gaps=20=E2=80=94=20add=20Play,=20Vue/Nuxt,=20Scala=20(#798)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The framework story was missing several supported frameworks: - Play (Scala/Java) — absent from both the Framework-aware Routes table and the routing-coverage line. Measured 76.3% (106/139 routes resolved to a handler) across the 31 verb-route apps in playframework/play-samples; every miss is Play's framework-provided `Assets` controller (vendored library code, not app source). Slots into the convention-ceiling bucket. - Vue Router / Nuxt — recognized (file-based pages/, server/api/, middleware) but missing from the routes table. - Scala + Vue — missing from the "20+ Languages" highlight. 
File-based routers (SvelteKit, Vue/Nuxt) have no separate handler edge — the page IS the handler — so their coverage is the fair-coverage language figure (Svelte/SvelteKit 100%, Vue/Nuxt 93.5%), now cited explicitly. Existing framework numbers left untouched (they were measured ad-hoc; a fresh re-measure would shift them and isn't part of this gap-fill). Co-authored-by: Claude Opus 4.8 (1M context) --- README.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index c2356b23d..11c379b48 100644 --- a/README.md +++ b/README.md @@ -225,8 +225,8 @@ CodeGraph cuts **tokens, tool calls, and wall-clock time on every repo** — acr | **Full-Text Search** | Find code by name instantly across your entire codebase, powered by FTS5 | | **Impact Analysis** | Trace callers, callees, and the full impact radius of any symbol before making changes | | **Always Fresh** | File watcher uses native OS events (FSEvents/inotify/ReadDirectoryChangesW) with debounced auto-sync — the graph stays current as you code, zero config | -| **20+ Languages** | TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Objective-C, Swift, Kotlin, Dart, Lua, Luau, Svelte, Liquid, Pascal/Delphi | -| **Framework-aware Routes** | Recognizes web-framework routing files and links URL patterns to their handlers across 14 frameworks | +| **20+ Languages** | TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Objective-C, Swift, Kotlin, Scala, Dart, Lua, Luau, Svelte, Vue, Liquid, Pascal/Delphi | +| **Framework-aware Routes** | Recognizes web-framework routing files and links URL patterns to their handlers across 16 frameworks | | **Mixed iOS / React Native / Expo** | Closes cross-language flows that static parsing misses: Swift ↔ ObjC bridging, React Native legacy bridge + TurboModules + Fabric view components, native → JS event emitters, Expo Modules | | **100% Local** | No data leaves your machine. No API keys. 
No external services. SQLite database only | @@ -274,11 +274,13 @@ CodeGraph detects web-framework routing files and emits `route` nodes linked by | **Drupal** | `*.routing.yml` routes (`_controller`, `_form`, entity handlers); `hook_*` implementations in `.module`/`.theme`/`.install`/`.inc` | | **Rails** | `get '/x', to: 'users#index'`, hash-rocket `=>` syntax | | **Spring** | `@GetMapping`, `@PostMapping`, `@RequestMapping` on methods | +| **Play** | `GET`/`POST`/… verb routes in `conf/routes` → `Controller.method` actions (Scala + Java) | | **Gin / chi / gorilla / mux** | `r.GET(...)`, `router.HandleFunc(...)` | | **Axum / actix / Rocket** | `.route("/x", get(handler))` | | **ASP.NET** | `[HttpGet("/x")]` attributes on action methods | | **Vapor** | `app.get("x", use: handler)` | | **React Router** / **SvelteKit** | Route component nodes | +| **Vue Router** / **Nuxt** | `pages/` file-based routes, `server/api/` endpoints, route middleware | --- @@ -667,7 +669,7 @@ Impact and blast-radius queries are only as good as the dependency graph behind | Liquid | Shopify/dawn | 73.8% | | Pascal / Delphi | PascalCoin | 77.4% | -Framework routing is validated the same way, on a canonical app per framework: Express 100%, FastAPI 98%, Flask 100%, NestJS 96.8%, Gin 96.5%, Axum 100%, Rocket 93.8%, Vapor 100%, Laravel 92%, Rails 89.6%, React Router 100% — and the convention/reflection-heavy ones at their honest static-analysis ceiling: ASP.NET 83.9%, Spring 83.3%, Drupal 78.9%, Django 74.1%. +Framework routing is validated the same way, on a canonical app per framework: Express 100%, FastAPI 98%, Flask 100%, NestJS 96.8%, Gin 96.5%, Axum 100%, Rocket 93.8%, Vapor 100%, Laravel 92%, Rails 89.6%, React Router 100% — and the convention/reflection-heavy ones at their honest static-analysis ceiling: ASP.NET 83.9%, Spring 83.3%, Drupal 78.9%, Play 76.3%, Django 74.1%. 
SvelteKit and Vue/Nuxt use file-based routing, so their page/endpoint coverage is the Svelte/SvelteKit (100%) and Vue/Nuxt (93.5%) figures in the table above. ## Troubleshooting From 9a0f1447702709b5027ce346110f4dc86a7f1eb1 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 11 Jun 2026 11:09:05 -0500 Subject: [PATCH 11/31] fix(directory): self-heal a stale .codegraph/.gitignore so daemon.pid is ignored (#788) (#802) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Versions <= 0.9.9 wrote an explicit-allowlist .codegraph/.gitignore (*.db, cache/, .dirty, ...) that never listed daemon.pid or the socket, so the daemon's runtime pidfile got committed. The wildcard rewrite in #654/#492/#484 fixed new inits, but the file is only written when absent, so existing installs kept their stale file forever — the fix never reached the people hitting it. Make the gitignore self-heal: ensureGitignore() writes the file if absent and upgrades a stale CodeGraph-generated default in place, leaving a user-authored file untouched. A "stale default" is one that carries our `# CodeGraph data files` header but predates the wildcard ignore (no bare `*` line) — a header match heals every historical variant (v0.7.x..0.9.9, all verified to share it) and is idempotent. validateDirectory() runs on every open()/openSync(), so existing repos heal on the next codegraph command after upgrading. The duplicated template (previously inlined in two formats) is consolidated into one GITIGNORE_CONTENT constant. Co-authored-by: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 1 + __tests__/foundation.test.ts | 40 +++++++++++++++++ src/directory.ts | 87 +++++++++++++++++++++++++++--------- 3 files changed, 107 insertions(+), 21 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index eb3a57c94..6f790c7c3 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -93,6 +93,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 
- Indexing a very large repository no longer aborts during its first sync with a "too many SQL variables" error. (#540) - Files under directories with non-ASCII names (for example CJK characters) are no longer silently skipped during indexing. (#541) - The `.codegraph/` index folder no longer clutters `git status`: its generated ignore file now excludes everything in the folder except itself, so the database, `daemon.pid`, sockets, and logs stop showing up as untracked changes. (#492, #484) +- Projects initialized by an older version now get that fix automatically: a `.codegraph/.gitignore` written before this change — which listed only the database, cache, and logs and so let the daemon's `daemon.pid` get committed — is upgraded in place the next time you run any CodeGraph command. A `.gitignore` you've customized yourself is left untouched. (#788) - SAP HANA `.xsjs` / `.xsjslib` files are now indexed as JavaScript. (#556) - TypeScript `.mts` and `.cts` module files are now indexed instead of being skipped. (#366) - JavaScript modules that wrap their code in an anonymous function — AMD/RequireJS, NetSuite SuiteScript, IIFE bundles — now have their inner functions and calls indexed, instead of the file coming up nearly empty. (#528) diff --git a/__tests__/foundation.test.ts b/__tests__/foundation.test.ts index 405865b2f..05fa79804 100644 --- a/__tests__/foundation.test.ts +++ b/__tests__/foundation.test.ts @@ -159,6 +159,46 @@ describe('CodeGraph Foundation', () => { expect(validation.valid).toBe(false); expect(validation.errors.length).toBeGreaterThan(0); }); + + it('upgrades a stale pre-wildcard .gitignore in place (issue #788)', () => { + const cg = CodeGraph.initSync(tempDir); + cg.close(); + + const gitignorePath = path.join(getCodeGraphDir(tempDir), '.gitignore'); + // A .gitignore written by an older version (<= 0.9.9): an explicit + // allowlist that never ignored daemon.pid, so the daemon's runtime + // pidfile got committed. 
+ const staleV099 = + '# CodeGraph data files\n' + + '# These are local to each machine and should not be committed\n\n' + + '# Database\n*.db\n*.db-wal\n*.db-shm\n\n' + + '# Cache\ncache/\n\n# Logs\n*.log\n\n# Hook markers\n.dirty\n'; + fs.writeFileSync(gitignorePath, staleV099, 'utf-8'); + + // Opening the project runs validateDirectory, which self-heals. + const cg2 = CodeGraph.openSync(tempDir); + cg2.close(); + + const upgraded = fs.readFileSync(gitignorePath, 'utf-8'); + expect(upgraded).toContain('\n*\n'); // wildcard ignores everything… + expect(upgraded).toContain('!.gitignore'); // …except this file + expect(upgraded).not.toContain('.dirty'); // old explicit list is gone + }); + + it('leaves a user-customized .codegraph/.gitignore untouched', () => { + const cg = CodeGraph.initSync(tempDir); + cg.close(); + + const gitignorePath = path.join(getCodeGraphDir(tempDir), '.gitignore'); + // No CodeGraph header → user-authored → must not be rewritten. + const custom = '# my own rules\n*.db\n!keep-this.json\n'; + fs.writeFileSync(gitignorePath, custom, 'utf-8'); + + const cg2 = CodeGraph.openSync(tempDir); + cg2.close(); + + expect(fs.readFileSync(gitignorePath, 'utf-8')).toBe(custom); + }); }); describe('Uninitialize', () => { diff --git a/src/directory.ts b/src/directory.ts index 8f5abb092..1c7729a42 100644 --- a/src/directory.ts +++ b/src/directory.ts @@ -129,6 +129,61 @@ export function findNearestCodeGraphRoot(startPath: string): string | null { return null; } +/** + * Contents of `.codegraph/.gitignore`. A single wildcard ignore keeps every + * transient file in the index dir — the database, `daemon.pid`, the socket, + * logs, cache, and anything future versions add — out of git, without having + * to enumerate each name (issues #788, #492, #484). Older versions wrote an + * explicit allowlist that never listed `daemon.pid` or the socket, so those + * runtime files were silently committed. 
+ */ +const GITIGNORE_CONTENT = `# CodeGraph data files — local to each machine, not for committing. +# Ignore everything in .codegraph/ except this file itself, so transient +# files (the database, daemon.pid, sockets, logs) never show up in git. +* +!.gitignore +`; + +/** Header line that prefixes every .gitignore CodeGraph has auto-generated. */ +const GITIGNORE_MARKER = '# CodeGraph data files'; + +/** + * Is `content` a stale CodeGraph-generated `.gitignore` that should be + * regenerated in place? True when it carries our header but predates the + * wildcard ignore (it has no bare `*` line) — i.e. one of the old explicit + * allowlists (`*.db`, `cache/`, `.dirty`, …) that never ignored `daemon.pid` + * or the socket (issue #788). A file WITHOUT our header is user-authored and + * is left untouched; one that already has the wildcard is current. Matching + * on the header (not a byte-exact list of past defaults) heals every old + * variant — v0.7.x through 0.9.9 — and is idempotent once upgraded. + */ +function isStaleDefaultGitignore(content: string): boolean { + if (!content.trimStart().startsWith(GITIGNORE_MARKER)) return false; + return !content.split('\n').some((line) => line.trim() === '*'); +} + +/** + * Write `.codegraph/.gitignore` if it's absent, or upgrade a stale + * CodeGraph-generated default in place; a user-customized file is left alone. + * Best-effort — returns `false` only if a needed write failed. + */ +function ensureGitignore(gitignorePath: string): boolean { + let existing: string | null; + try { + existing = fs.readFileSync(gitignorePath, 'utf-8'); + } catch { + existing = null; // absent (ENOENT) or unreadable — (re)create below + } + // Current default or a user-authored file: nothing to do. 
+ if (existing !== null && !isStaleDefaultGitignore(existing)) return true; + try { + fs.writeFileSync(gitignorePath, GITIGNORE_CONTENT, 'utf-8'); + return true; + } catch { + return false; + } +} + /** * Create the .codegraph directory structure * Note: Only throws if codegraph.db already exists, not just if .codegraph/ exists. @@ -146,18 +201,9 @@ export function createDirectory(projectRoot: string): void { // Create main directory (if it doesn't exist) fs.mkdirSync(codegraphDir, { recursive: true }); - // Create .gitignore inside .codegraph (if it doesn't exist) - const gitignorePath = path.join(codegraphDir, '.gitignore'); - if (!fs.existsSync(gitignorePath)) { - const gitignoreContent = `# CodeGraph data files — local to each machine, not for committing. -# Ignore everything in .codegraph/ except this file itself, so transient -# files (the database, daemon.pid, sockets, logs) never show up in git. -* -!.gitignore -`; - - fs.writeFileSync(gitignorePath, gitignoreContent, 'utf-8'); - } + // Write .gitignore inside .codegraph (create if absent, upgrade a stale + // pre-wildcard default left by an older version — issue #788). + ensureGitignore(path.join(codegraphDir, '.gitignore')); } /** @@ -296,16 +342,15 @@ export function validateDirectory(projectRoot: string): { return { valid: false, errors }; } - // Auto-repair missing .gitignore (non-critical file) + // Auto-repair / upgrade .gitignore (non-critical file). A missing one is + // recreated; a stale pre-wildcard default that never ignored daemon.pid is + // regenerated in place (issue #788); a user-authored file is left alone. 
const gitignorePath = path.join(codegraphDir, '.gitignore'); - if (!fs.existsSync(gitignorePath)) { - try { - const gitignoreContent = `# CodeGraph data files — local to each machine, not for committing.\n# Ignore everything in .codegraph/ except this file itself, so transient\n# files (the database, daemon.pid, sockets, logs) never show up in git.\n*\n!.gitignore\n`; - fs.writeFileSync(gitignorePath, gitignoreContent, 'utf-8'); - } catch { - // Non-fatal: warn but don't block - errors.push('.gitignore missing in .codegraph directory and could not be created'); - } + const existedBefore = fs.existsSync(gitignorePath); + if (!ensureGitignore(gitignorePath) && !existedBefore) { + // Only a missing-and-uncreatable file is surfaced; a failed in-place + // upgrade of an existing file is non-fatal — the index still works. + errors.push('.gitignore missing in .codegraph directory and could not be created'); } return { From d0e649969a73d369ae4f0f41ff802da94cccdd95 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 11 Jun 2026 11:25:13 -0500 Subject: [PATCH 12/31] fix(graph): treat class instantiation as a caller/callee edge (#774) (#804) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `callers ` returned "No callers found" (or only the importing file) even when a class's constructor was called from many sites, and the instantiation sites were invisible — the opposite of what "what breaks if I change this class?" should answer. The `instantiates` edges already existed in the graph, correctly attributed to the constructing function; they were simply excluded from the caller/callee traversal, which queried only calls/references/imports. Constructing a class is calling its constructor, so add `instantiates` to the edge-kind set in both getCallers and getCallees (kept symmetric so they stay inverses and `trace` can cross the instantiation boundary, function -> class -> its methods). 
impact already traversed all edge kinds, so it was unaffected. Query-layer only — existing indexes benefit on upgrade with no re-index. Verified on a Python fixture: `callers Supervisor` now returns the construction sites (main/work/test_it), and a new graph test asserts main() <-> DerivedClass via the instantiation. Full suite green. Co-authored-by: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 1 + __tests__/graph.test.ts | 19 +++++++++++++++++++ src/graph/traversal.ts | 13 +++++++++++-- 3 files changed, 31 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 6f790c7c3..4aca757b7 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -58,6 +58,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). - `codegraph affected` now reports the tests and files that actually depend on your changes. It used to follow only `import` statements — but those never cross file boundaries in CodeGraph's graph — so it returned **no affected tests for any change, in every language**. It now traces the real cross-file usage graph (calls, references, instantiations, and class `extends` / `implements`), so `git diff --name-only | codegraph affected` surfaces the test files that exercise the changed code. Circular-dependency detection, which had the same blind spot, now works too. - Blast radius, callers, and `codegraph affected` now recognize far more of the dependencies that were already in your code. A symbol now counts as a dependency whether it's called, used only in a type annotation inside a function body (`const items: Foo[] = []`), imported and placed in a registry array or passed as an argument, used as a JSX component, simply re-exported from a barrel (`export { X } from './x'`), or pulled in as a namespace (`import * as ns from '@/x'`) — including through tsconfig path aliases like `@/`. 
Previously only called, instantiated, or signature-typed symbols created a cross-file link, so a file that used a dependency in any other way could look like it depended on nothing — and the file that defined a widely-used symbol could look like nothing depended on it. The graph still indexes exactly the same symbols; it just connects the ones that were already there. (TypeScript/JavaScript) - The same completeness fix now applies to **Python**: a name brought in with `from module import X` is recorded as a dependency on that module even when `X` is only stored in a list/dict, passed as an argument, used as a decorator, or re-exported through an `__init__.py`. Previously Python linked only imports that were called or instantiated, so a module consumed purely by value — or only re-exported — looked like nothing depended on it. +- `codegraph_callers` (and the `callers` command) now lists the places a class is **instantiated**, not just where it's imported. Constructing a class — `Foo(...)` in Python, `new Foo()` elsewhere — is calling its constructor, so asking who calls a class now returns the construction sites, and `codegraph_callees` / trace cross the instantiation the same way. Previously a class's instantiation sites were invisible to `callers`, so "what breaks if I change this class?" could come back empty even when the constructor was called from many places. Works on your existing index — no re-index needed. (#774) - Rust impact and `codegraph affected` now connect far more of the module graph. 
Struct literals (`Widget { n: 1 }`) are recorded as instantiations; a `use` / `pub use` brings its item into the dependency graph — so a `pub use` re-export hub (a `mod.rs` re-exporting its submodules) depends on the modules it re-exports — resolved by Rust module path (`crate::`/`self::`/`super::`), so a re-export of a common name like `read` links to the right module instead of a same-named symbol elsewhere; and trait dispatch reaches implementations — a struct whose methods cover a trait's is treated as implementing it, and a call through `&dyn Trait` resolves to the concrete method. Previously a Rust type linked only when called or used in a type position, so structs built by literal, modules surfaced only through `pub use`, and trait-only implementations looked like they had no dependents. (#584 for Rust traits) - Rust cross-module function calls now resolve to the right file. A call to a sibling submodule's function — `users::router()`, the common router-assembly / handler-registration pattern where `mod users;` makes `users` a child of the current module — is now resolved relative to the current module, not only the crate root. Deeper module-path calls (`database::profiles::find()` — the `db.run(|c| …)` data-access shape) now resolve too; these were being discarded before resolution even ran, because the path's leaf function name was never checked. Previously such a call linked to nothing, so a module reached only as `module::path::function()` looked like it had no dependents; a web app wired this way (Axum, Rocket, and similar) now surfaces its handler and data-access modules' real callers. (Rust) - Rocket route handlers now connect to where they're mounted. A handler registered in a `routes![a::b::handler, …]` or `catchers![…]` macro used to be invisible — the macro body is a raw token tree, so the handler looked like it had no caller (Rocket mounts it at runtime) and its file showed no dependents. 
The handler paths are now read out of the macro and linked to the `mount`/`register` call, so editing a Rocket handler surfaces its route registration and a routes module is no longer reported as unused. (Rust, Rocket) diff --git a/__tests__/graph.test.ts b/__tests__/graph.test.ts index 5ddbd028f..bc25942ac 100644 --- a/__tests__/graph.test.ts +++ b/__tests__/graph.test.ts @@ -293,6 +293,25 @@ export { main }; expect(Array.isArray(callees)).toBe(true); }); + + it('treats class instantiation as a caller/callee of the class (#774)', () => { + // main() does `new DerivedClass(10, 'test')`. Constructing a class is + // calling its constructor, so main is a caller of DerivedClass and + // DerivedClass is a callee of main. Before #774 the `instantiates` edge + // was excluded from the caller/callee traversal, so `callers ` + // returned the importing file (or nothing) and missed every + // construction site. + const derived = cg.getNodesByKind('class').find((n) => n.name === 'DerivedClass'); + const main = cg.getNodesByKind('function').find((n) => n.name === 'main'); + expect(derived).toBeDefined(); + expect(main).toBeDefined(); + + const callerNames = cg.getCallers(derived!.id).map((c) => c.node.name); + expect(callerNames).toContain('main'); + + const calleeNames = cg.getCallees(main!.id).map((c) => c.node.name); + expect(calleeNames).toContain('DerivedClass'); + }); }); describe('getImpactRadius()', () => { diff --git a/src/graph/traversal.ts b/src/graph/traversal.ts index 82fc208d3..c50b877fe 100644 --- a/src/graph/traversal.ts +++ b/src/graph/traversal.ts @@ -248,7 +248,12 @@ export class GraphTraverser { } visited.add(nodeId); - const incomingEdges = this.queries.getIncomingEdges(nodeId, ['calls', 'references', 'imports']); + // `instantiates` counts as a caller: constructing a class (`Foo(...)` / + // `new Foo()`) is calling its constructor, so the instantiation site is a + // caller of the class. 
Without it, `callers ` surfaced only the + // importing file (via `imports`) and missed every construction site — + // the opposite of "what breaks if I change this class?" (#774). + const incomingEdges = this.queries.getIncomingEdges(nodeId, ['calls', 'references', 'imports', 'instantiates']); if (incomingEdges.length === 0) return; // Batch-fetch all caller nodes in one round-trip instead of one @@ -293,7 +298,11 @@ export class GraphTraverser { } visited.add(nodeId); - const outgoingEdges = this.queries.getOutgoingEdges(nodeId, ['calls', 'references', 'imports']); + // Symmetric with getCallers: a function that constructs a class + // (`Foo(...)` / `new Foo()`) has that class as a callee, so callers and + // callees stay inverses of each other and `trace` can cross the + // instantiation boundary (function → class → its methods) (#774). + const outgoingEdges = this.queries.getOutgoingEdges(nodeId, ['calls', 'references', 'imports', 'instantiates']); if (outgoingEdges.length === 0) return; // Batch-fetch callee nodes (was N+1 — see getCallersRecursive note). From 0b1a2eed97e54cb362d501cc93d9133f706c3e41 Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 11 Jun 2026 12:04:51 -0500 Subject: [PATCH 13/31] fix(mcp): treat a stdin 'error' as shutdown so the server can't orphan/spin (#799) (#805) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A stdio MCP server's lifeline is stdin: when the host/client goes away, stdin should end and the server should exit. The server paths listened for stdin 'end'/'close' but NOT 'error'. That gap bites with a socket-backed stdin — the shape VS Code / Claude Code use (a socketpair, not a pipe). On client death the socket can surface as an 'error' (ECONNRESET/hangup) instead of a clean 'close'. Unhandled, it escalated to the process-wide uncaughtException handler, which logs and keeps running — so the server orphaned instead of exiting. 
On Linux a POLLHUP socket fd left registered in epoll then wakes the event loop continuously, pinning a core at 100% CPU; once the main thread spins, the setInterval PPID watchdog can't even fire, so the orphan runs forever (the report's 28+ minutes). Add treatStdinFailureAsShutdown(): listen for 'error' as well as 'end'/'close', and DESTROY the stdin stream on any terminal event so the fd leaves epoll and can't churn, then run the path's shutdown. Wired into the live paths — startDirect, the local-handshake proxy, and StdioTransport — plus the legacy pipe proxy. Fires once (re-entry guard). Note: this is hardening for a class of failure that matches every piece of the report's evidence (socket stdin, userspace main-thread spin, high involuntary context switches, watchdog never firing), but the exact 100% CPU spin could not be reproduced in Docker (Linux) across /dev/null EOF, socket peer-death (RST/FIN), the reporter's 0.9.7 bundle, and the npx chain — all exited cleanly — so the trigger is environment-specific. Co-authored-by: Claude Opus 4.8 (1M context) --- CHANGELOG.md | 1 + __tests__/stdin-teardown.test.ts | 46 ++++++++++++++++++++++++++++++++ src/mcp/index.ts | 7 +++-- src/mcp/proxy.ts | 18 ++++++++++--- src/mcp/stdin-teardown.ts | 46 ++++++++++++++++++++++++++++++++ src/mcp/transport.ts | 16 +++++++++-- 6 files changed, 126 insertions(+), 8 deletions(-) create mode 100644 __tests__/stdin-teardown.test.ts create mode 100644 src/mcp/stdin-teardown.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index 4aca757b7..c2a8dbc96 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -49,6 +49,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). - C++ method calls made through a singleton, factory, or chained getter now resolve to the correct class. 
A call like `Foo::instance().bar()`, `WidgetFactory::create().draw()`, `openSession()->run()`, or the same stored in an `auto` local first, used to lose the receiver's type — so when two classes had a same-named method the call silently attached to whichever was indexed first (or didn't resolve at all), corrupting callers, impact, and trace. CodeGraph now infers the receiver's type from what the inner call returns (capturing C++ return types for the first time) and creates the edge only when that class genuinely has the method, so a wrong guess produces no edge instead of a misleading one. Covers singletons and self-returning accessors, factories that return a different type, free-function factories, `make_unique` / `make_shared` / `new` / direct construction, and single-level member chains. Existing C/C++ indexes should be re-indexed (`codegraph index -f`) to benefit. Thanks @stabey. (#645) (C/C++) - The shared background server no longer logs a scary-looking `[error] … undefined` line on every session start. Attaching to the shared daemon is normal, healthy behavior, but the informational message was being surfaced by MCP hosts (Claude Code and others) as an error; it's now silent by default — set `CODEGRAPH_MCP_LOG_ATTACH=1` to surface it when debugging daemon attach. Thanks @mturac. (#618) - On Windows, CodeGraph's background processes no longer pile up without bound and saturate CPU over a long session. When the editor or agent that launched CodeGraph exited, its helper process couldn't tell its parent had gone — Windows reports process lineage differently than macOS and Linux — so the helper kept running, the shared background server never saw the client disconnect, and its idle timer never fired to shut it down. CodeGraph now detects parent-process exit directly on Windows, so helpers and the idle background server wind down promptly, the same as they already did on macOS and Linux. 
(#692, #576, #680) +- The MCP server now shuts down cleanly when its editor/agent connection drops abruptly, instead of risking an orphaned process that pins a CPU core. Editors talk to a stdio MCP server over a socket; if that socket failed with an error rather than closing cleanly — which can happen when the editor window is reloaded or the launching process is killed — the server didn't treat it as a disconnect and could be left running. CodeGraph now treats any failure of its input stream as a shutdown signal and tears the stream down, so an orphaned server exits promptly. (#799) - The shared background server has two further safeguards against ever lingering: it now drops a client the moment it detects that client's process is gone (even if the disconnect arrived uncleanly — a force-quit or a dropped connection that never closed the socket), and it won't stay running indefinitely with clients attached but no activity. Together these guarantee it always winds down, on every platform. (#692) - A session no longer loses CodeGraph when the shared background server is restarted out from under it — for example when your MCP host (opencode and others) stops and restarts the server as you open another session. Previously the affected session's connection died silently and any request in flight at that moment hung; now CodeGraph keeps that session working by serving it locally, so the tools stay available without restarting the session. (#662) - React Native native→JS events now connect through the common `sendEvent(context, "X", body)` wrapper. Many libraries (react-native-device-info and others) wrap the event emitter behind a helper whose `.emit(eventName, …)` takes a *variable*, so the matcher — which looked for `.emit("literal", …)` — missed it; the literal event name actually lives in the wrapper call. 
Now a native method that fires `sendEvent(…, "batteryLevelChanged", …)` links to the JS `addListener('batteryLevelChanged', …)` handler, so editing the native emitter surfaces the JS subscriber. (React Native) diff --git a/__tests__/stdin-teardown.test.ts b/__tests__/stdin-teardown.test.ts new file mode 100644 index 000000000..c538ac5b2 --- /dev/null +++ b/__tests__/stdin-teardown.test.ts @@ -0,0 +1,46 @@ +/** + * #799 — a socket-backed stdin that fails must shut the server down, not + * orphan/busy-spin. treatStdinFailureAsShutdown is the shared guard. + */ +import { describe, it, expect } from 'vitest'; +import { PassThrough } from 'stream'; +import { treatStdinFailureAsShutdown } from '../src/mcp/stdin-teardown'; + +describe('treatStdinFailureAsShutdown (#799)', () => { + it("treats a stdin 'error' (ECONNRESET/hangup) as a shutdown signal", () => { + const s = new PassThrough(); + let calls = 0; + treatStdinFailureAsShutdown(() => { calls++; }, s); + + // No extra 'error' listener would throw here — the guard registers one. 
+ s.emit('error', new Error('read ECONNRESET')); + expect(calls).toBe(1); + }); + + it("also fires on 'end' and on 'close'", () => { + for (const ev of ['end', 'close'] as const) { + const s = new PassThrough(); + let calls = 0; + treatStdinFailureAsShutdown(() => { calls++; }, s); + s.emit(ev); + expect(calls, `event ${ev}`).toBe(1); + } + }); + + it('destroys the stream so a hung fd leaves epoll', () => { + const s = new PassThrough(); + treatStdinFailureAsShutdown(() => { /* noop */ }, s); + s.emit('error', new Error('boom')); + expect(s.destroyed).toBe(true); + }); + + it('fires onTerminal at most once, even across error → close', () => { + const s = new PassThrough(); + let calls = 0; + treatStdinFailureAsShutdown(() => { calls++; }, s); + s.emit('error', new Error('boom')); // fire() also destroys → emits 'close' + s.emit('close'); // must not double-fire + s.emit('end'); + expect(calls).toBe(1); + }); +}); diff --git a/src/mcp/index.ts b/src/mcp/index.ts index fa939dfbb..9007ba6e9 100644 --- a/src/mcp/index.ts +++ b/src/mcp/index.ts @@ -50,6 +50,7 @@ import { import { connectWithHello, runLocalHandshakeProxy } from './proxy'; import { getDaemonSocketPath } from './daemon-paths'; import { supervisionLostReason } from './ppid-watchdog'; +import { treatStdinFailureAsShutdown } from './stdin-teardown'; import { HOST_PPID_ENV } from '../extraction/wasm-runtime-flags'; /** @@ -330,8 +331,10 @@ export class MCPServer { // Detect parent-process death — same logic as pre-refactor. When stdin // closes we go through StdioTransport's `process.exit(0)` already, but // SIGKILL of the parent doesn't reliably close stdin on Linux (#277). - process.stdin.on('end', () => this.stop()); - process.stdin.on('close', () => this.stop()); + // Also treat a stdin `'error'` (a socket-backed stdin can fail with + // ECONNRESET/hangup instead of a clean close) as shutdown, and destroy the + // stream so a hung fd can't busy-spin the event loop (#799). 
+ treatStdinFailureAsShutdown(() => this.stop()); this.mode = 'direct'; this.installSignalHandlers(); diff --git a/src/mcp/proxy.ts b/src/mcp/proxy.ts index d18649678..2efe25a48 100644 --- a/src/mcp/proxy.ts +++ b/src/mcp/proxy.ts @@ -23,6 +23,7 @@ import * as net from 'net'; import { HOST_PPID_ENV } from '../extraction/wasm-runtime-flags'; import { DaemonClientHello, DaemonHello, MAX_HELLO_LINE_BYTES } from './daemon'; import { supervisionLostReason } from './ppid-watchdog'; +import { treatStdinFailureAsShutdown } from './stdin-teardown'; import { CodeGraphPackageVersion } from './version'; import { SERVER_INFO, PROTOCOL_VERSION } from './session'; import { SERVER_INSTRUCTIONS } from './server-instructions'; @@ -298,8 +299,11 @@ export async function runLocalHandshakeProxy(deps: LocalHandshakeDeps): Promise< } } }); - process.stdin.on('end', shutdown); - process.stdin.on('close', shutdown); + // Shut down when stdin ends/closes — and also on a stdin `'error'`, which a + // socket-backed stdin (the VS Code stdio shape) can emit on client death + // instead of a clean close; destroying the stream stops a hung fd from + // busy-spinning the event loop (#799). + treatStdinFailureAsShutdown(shutdown); startPpidWatchdogNoSocket(shutdown); // ---- daemon connection (background) ---- @@ -459,10 +463,16 @@ function pipeUntilClose(socket: net.Socket): Promise { try { socket.end(); } catch { /* ignore */ } done(); }); - process.stdin.on('close', () => { + // 'close' and 'error' both tear down: a socket-backed stdin can fail with + // an 'error' (ECONNRESET/hangup) rather than a clean close; destroying it + // stops a hung fd from busy-spinning the event loop (#799). 
+ const teardown = () => { + try { process.stdin.destroy(); } catch { /* ignore */ } try { socket.destroy(); } catch { /* ignore */ } done(); - }); + }; + process.stdin.on('close', teardown); + process.stdin.on('error', teardown); socket.on('data', (chunk) => { try { process.stdout.write(chunk); } catch { /* ignore */ } diff --git a/src/mcp/stdin-teardown.ts b/src/mcp/stdin-teardown.ts new file mode 100644 index 000000000..1d60f7490 --- /dev/null +++ b/src/mcp/stdin-teardown.ts @@ -0,0 +1,46 @@ +/** + * Treat a stdin failure as a shutdown signal — issue #799. + * + * An MCP stdio server's lifeline is its stdin: when the host/client goes away, + * stdin should end and the server should exit. The server paths listened for + * `'end'` and `'close'` — but NOT `'error'`. + * + * That gap bites with a socket-backed stdin, which is the shape VS Code / + * Claude Code use (a socketpair, not a pipe). When the client dies, the socket + * can surface as an `'error'` (ECONNRESET / hangup) rather than a clean + * `'close'`. With no `'error'` listener, Node escalates it to the process-wide + * `uncaughtException` handler, which logs and keeps running — so the server + * orphans instead of exiting. Worse, on Linux a `POLLHUP` socket fd left + * registered in epoll wakes the event loop continuously, pinning a core at + * 100% CPU (the spin reported in #799); once the main thread spins, the + * `setInterval` PPID watchdog can't even fire, so the orphan runs forever. + * + * Fix: listen for `'error'` as well, and DESTROY the stdin stream on any + * terminal event so the fd leaves epoll and can't keep churning, then run the + * caller's shutdown. Fires `onTerminal` at most once — callers' shutdowns are + * already re-entry-guarded, but the single-shot guard also keeps `destroy()`'s + * follow-on `'close'` from re-invoking it. + * + * `stream` is injectable for tests; it defaults to `process.stdin`. 
+ */ +export function treatStdinFailureAsShutdown( + onTerminal: () => void, + stream: NodeJS.ReadableStream = process.stdin +): void { + let fired = false; + const fire = (): void => { + if (fired) return; + fired = true; + // Drop the fd from epoll so a hung/half-closed socket can't keep waking + // the loop. Best-effort: the stream may already be torn down. + try { + (stream as Partial<{ destroy(): void }>).destroy?.(); + } catch { + /* already gone */ + } + onTerminal(); + }; + stream.on('end', fire); + stream.on('close', fire); + stream.on('error', fire); +} diff --git a/src/mcp/transport.ts b/src/mcp/transport.ts index aecc0368f..de1038be5 100644 --- a/src/mcp/transport.ts +++ b/src/mcp/transport.ts @@ -286,12 +286,24 @@ export class StdioTransport extends LineBasedJsonRpcTransport { await this.handleLine(line); }); - this.rl.on('close', () => { + // readline 'close' fires on a clean stdin EOF. But a socket-backed stdin + // (the VS Code stdio shape) can fail with an 'error' (ECONNRESET/hangup) + // that readline doesn't surface as 'close' — unhandled, it escalated to + // the global uncaughtException handler (which keeps running), orphaning + // the server and, on Linux, busy-spinning a POLLHUP fd at 100% CPU. Treat + // 'error' as terminal too, and destroy stdin so the fd leaves epoll (#799). 
+ let closed = false; + const onStreamEnd = (): void => { + if (closed) return; + closed = true; + try { process.stdin.destroy(); } catch { /* already gone */ } this.opts.onClose(); if (this.opts.exitOnClose) { process.exit(0); } - }); + }; + this.rl.on('close', onStreamEnd); + process.stdin.on('error', onStreamEnd); } stop(): void { From 0df9246752692ff3cbfe5ed09fa2fd05fb0825fe Mon Sep 17 00:00:00 2001 From: Colby Mchenry Date: Thu, 11 Jun 2026 12:38:22 -0500 Subject: [PATCH 14/31] fix(extraction): capture & clean docstrings across all README languages (#780) (#806) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * fix(extraction): capture docstrings for export/const/decorator-wrapped symbols (#780) getPrecedingDocstring walked previousNamedSibling from the EMITTED declaration node, so it only found a leading comment when the comment was a direct sibling of that node. For a declaration nested under a wrapper — `export class X` / `export const f = () => {}` (export_statement / lexical_declaration), a plain const arrow (variable_declarator), or a decorated Python def/class (decorated_definition) — the comment is a sibling of the WRAPPER, so the inner node had no preceding comment and the docstring was stored as NULL. Climb out through the wrapper node(s) before scanning for the comment. Each wrapper holds exactly one declaration, so this can't mis-attribute a comment to a sibling (verified: an uncommented method does NOT inherit its class's comment). Also strip leading `#` from Python/Ruby/shell line comments, which the cleanup chain missed (Python docstrings used to keep their `#`). Query/extraction-layer change to a parse helper; re-index to pick up docstrings on already-indexed files. Verified on the reporter's JS/TS and Python repros (8/8 now captured) plus over-walk controls; +3 tests. 
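The wrapper climb described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this patch: `SketchNode`, `WRAPPER_TYPES`, `findLeadingComment`, and `stripLineMarkers` are hypothetical names, the node shape is a cut-down stand-in for a tree-sitter node, and the `comment` node type is assumed. Only `//` and `#` line markers are handled here.

```typescript
// Hypothetical, cut-down node shape (assumption): a real tree-sitter node
// exposes many more fields than the four used in this sketch.
interface SketchNode {
  type: string;
  text: string;
  parent: SketchNode | null;
  previousNamedSibling: SketchNode | null;
}

// Wrapper node types named in the commit message above.
const WRAPPER_TYPES = new Set<string>([
  'export_statement',
  'lexical_declaration',
  'variable_declarator',
  'decorated_definition',
]);

// Strip a leading `//` or `#` marker from each line of a line comment.
function stripLineMarkers(text: string): string {
  return text
    .split('\n')
    .map((line) => line.replace(/^\s*(?:\/\/|#)\s?/, ''))
    .join('\n')
    .trim();
}

// Climb out through wrapper parents, then look for a comment that directly
// precedes the outermost wrapper. Non-wrapper parents (a class body, for
// example) stop the climb, so a method never picks up its class's comment.
function findLeadingComment(node: SketchNode): string | null {
  let anchor = node;
  while (anchor.parent !== null && WRAPPER_TYPES.has(anchor.parent.type)) {
    anchor = anchor.parent;
  }
  const prev = anchor.previousNamedSibling;
  if (prev === null || prev.type !== 'comment') return null;
  return stripLineMarkers(prev.text);
}
```

Stopping the climb at the first non-wrapper parent is what keeps the over-walk control mentioned above intact: only nodes whose parent exists solely to wrap one declaration are skipped.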
Co-Authored-By: Claude Opus 4.8 (1M context) * fix(extraction): clean comment markers across all supported languages (#780) Validating docstring capture across every README language surfaced that the marker cleanup only knew C-style `//` and `/* */`, plus the `#` added earlier this branch. Doc comments in other styles were captured but left their markers in the stored text: - Rust/Swift/Kotlin doc lines `///` and `//!` -> leading `/` / `!` leaked - Lua/Luau `--` and `--[[ ]]` -> not stripped - Pascal `{ }` and `(* *)` -> not stripped Extract the cleanup into cleanCommentMarkers() and handle every style. Paired block delimiters are stripped only when the comment OPENS with one, so a line comment that happens to end with `}` / `*)` / `]]` is never truncated; per-line markers stay anchored at line start. Validated end-to-end (extract -> index -> codegraph_node output) across all 19 tree-sitter code languages plus Svelte/Vue ` + +`; + const result = extractFromSource('Guard.astro', code); + + const templateRefs = result.unresolvedReferences.filter( + (r) => r.referenceKind === 'references' && r.referenceName === 'FakeComponent' + ); + expect(templateRefs).toHaveLength(0); + + // maybeCall/scriptCall come from the delegated TS extraction (once), + // not double-counted by the template scanner + const maybeCalls = result.unresolvedReferences.filter( + (r) => r.referenceName === 'maybeCall' && r.referenceKind === 'calls' + ); + expect(maybeCalls.length).toBeLessThanOrEqual(1); + }); + + it('should extract +`; + const result = extractFromSource('Tracker.astro', code); + + const fn = result.nodes.find((n) => n.kind === 'function' && n.name === 'trackView'); + expect(fn).toBeDefined(); + expect(fn?.startLine).toBe(6); + expect(fn?.language).toBe('astro'); + }); + + it('should create component node for a frontmatter-less template-only file', () => { + const code = `
Static content
+`; + const result = extractFromSource('Static.astro', code); + + const componentNode = result.nodes.find((n) => n.kind === 'component'); + expect(componentNode).toBeDefined(); + expect(componentNode?.name).toBe('Static'); + expect(componentNode?.language).toBe('astro'); + }); + + it('should treat an unclosed frontmatter fence as no frontmatter', () => { + const code = `--- +const broken = true; +
never closed
+`; + const result = extractFromSource('Broken.astro', code); + + // No TS delegation happened (the fence never closes), but the component + // node still exists and nothing throws. + const componentNode = result.nodes.find((n) => n.kind === 'component'); + expect(componentNode).toBeDefined(); + expect(result.nodes.find((n) => n.name === 'broken')).toBeUndefined(); + }); + + it('should create containment edges from component to frontmatter nodes', () => { + const code = `--- +const value = 42; +--- +
{value}
+`; + const result = extractFromSource('Contained.astro', code); + + const componentNode = result.nodes.find((n) => n.kind === 'component'); + expect(componentNode).toBeDefined(); + + const containEdges = result.edges.filter( + (e) => e.source === componentNode!.id && e.kind === 'contains' + ); + expect(containEdges.length).toBeGreaterThan(0); + }); +}); + describe('Instantiates + Decorates edge extraction', () => { it('emits an instantiates ref for `new Foo()`', () => { const code = ` diff --git a/__tests__/frameworks.test.ts b/__tests__/frameworks.test.ts index c0e874908..ff1abb57b 100644 --- a/__tests__/frameworks.test.ts +++ b/__tests__/frameworks.test.ts @@ -1373,6 +1373,7 @@ func boot(routes: RoutesBuilder) throws { import { reactResolver } from '../src/resolution/frameworks/react'; import { svelteResolver } from '../src/resolution/frameworks/svelte'; +import { astroResolver } from '../src/resolution/frameworks/astro'; describe('reactResolver.extract — React Router', () => { it('extracts a v6 }>', () => { @@ -1428,6 +1429,77 @@ describe('svelteResolver.extract (smoke)', () => { }); }); +describe('astroResolver.extract — src/pages file-based routing', () => { + const routeNames = (filePath: string): string[] => + astroResolver.extract!(filePath, '').nodes.filter((n) => n.kind === 'route').map((n) => n.name); + + it('maps index.astro to /', () => { + expect(routeNames('src/pages/index.astro')).toEqual(['/']); + }); + + it('maps nested index and plain pages', () => { + expect(routeNames('src/pages/blog/index.astro')).toEqual(['/blog']); + expect(routeNames('src/pages/about.astro')).toEqual(['/about']); + }); + + it('converts [param] and [...rest] syntax', () => { + expect(routeNames('src/pages/blog/[slug].astro')).toEqual(['/blog/:slug']); + expect(routeNames('src/pages/[...path].astro')).toEqual(['/*path']); + }); + + it('maps .ts endpoints under src/pages to routes', () => { + expect(routeNames('src/pages/api/posts.ts')).toEqual(['/api/posts']); + 
expect(routeNames('src/pages/rss.xml.js')).toEqual(['/rss.xml']); + }); + + it('excludes underscore-prefixed segments and config files', () => { + expect(routeNames('src/pages/_partial.astro')).toEqual([]); + expect(routeNames('src/pages/blog/_components/Card.astro')).toEqual([]); + expect(routeNames('src/pages/vite.config.ts')).toEqual([]); + }); + + it('ignores .astro files outside src/pages', () => { + expect(routeNames('src/components/Button.astro')).toEqual([]); + expect(routeNames('docs/pages/guide.astro')).toEqual([]); + }); +}); + +describe('astroResolver.resolve — Astro global and virtual modules', () => { + const ctx = {} as never; + const baseRef = { + fromNodeId: 'component:a', + line: 1, + column: 0, + filePath: 'src/pages/index.astro', + language: 'astro', + }; + + it('claims Astro.* global references as framework-provided', () => { + const res = astroResolver.resolve( + { ...baseRef, referenceName: 'Astro.props', referenceKind: 'references' } as never, + ctx + ); + expect(res?.resolvedBy).toBe('framework'); + expect(res?.confidence).toBe(1.0); + }); + + it('claims astro:content virtual module imports', () => { + const res = astroResolver.resolve( + { ...baseRef, referenceName: 'astro:content', referenceKind: 'imports' } as never, + ctx + ); + expect(res?.resolvedBy).toBe('framework'); + }); + + it('leaves ordinary names alone', () => { + const res = astroResolver.resolve( + { ...baseRef, referenceName: 'astrolabe', referenceKind: 'calls' } as never, + { getNodesByName: () => [] } as never + ); + expect(res).toBeNull(); + }); +}); + // Regression tests: commented-out and docstring route examples must NOT // surface as phantom route nodes. These would have failed before the // strip-comments wiring (the regex would happily scan comments/docstrings). 
diff --git a/__tests__/resolution.test.ts b/__tests__/resolution.test.ts index 47c6b9220..3059392d4 100644 --- a/__tests__/resolution.test.ts +++ b/__tests__/resolution.test.ts @@ -1438,6 +1438,47 @@ func main() { expect(callers.some((c) => c.node.filePath === 'src/Bar.svelte')).toBe(true); }); + it('links an .astro page to the component and TS util it uses (#768)', async () => { + // The canonical Astro shape: a page imports a layout/component in + // frontmatter and uses it as a template tag; the component's template + // calls an imported .ts util. Both hops must produce graph edges or + // an Astro project is invisible to callers/impact. + fs.mkdirSync(path.join(tempDir, 'src/components'), { recursive: true }); + fs.mkdirSync(path.join(tempDir, 'src/utils'), { recursive: true }); + fs.mkdirSync(path.join(tempDir, 'src/pages'), { recursive: true }); + fs.writeFileSync( + path.join(tempDir, 'src/utils/format.ts'), + `export function formatDate(d: Date): string { return d.toISOString(); }\n` + ); + fs.writeFileSync( + path.join(tempDir, 'src/components/PostCard.astro'), + `---\nimport { formatDate } from '../utils/format';\nconst { date } = Astro.props;\n---\n\n` + ); + fs.writeFileSync( + path.join(tempDir, 'src/pages/index.astro'), + `---\nimport PostCard from '../components/PostCard.astro';\n---\n\n` + ); + + cg = await CodeGraph.init(tempDir, { index: true }); + cg.resolveReferences(); + + // Hop 1: page → component (template tag through the frontmatter import) + const cardNode = cg + .getNodesByKind('component') + .find((n) => n.name === 'PostCard' && n.filePath === 'src/components/PostCard.astro'); + expect(cardNode).toBeDefined(); + const cardCallers = cg.getCallers(cardNode!.id); + expect(cardCallers.some((c) => c.node.filePath === 'src/pages/index.astro')).toBe(true); + + // Hop 2: component template call → .ts util + const fmtNode = cg + .getNodesByKind('function') + .find((n) => n.name === 'formatDate' && n.filePath === 'src/utils/format.ts'); + 
expect(fmtNode).toBeDefined(); + const fmtCallers = cg.getCallers(fmtNode!.id); + expect(fmtCallers.some((c) => c.node.filePath === 'src/components/PostCard.astro')).toBe(true); + }); + it('resolves a bare directory import (import { x } from "." / "./") to index.ts (#629)', async () => { // `import { helper } from '.'` (or './') must map to the // directory's index.ts before the re-export chase can run. The diff --git a/src/extraction/astro-extractor.ts b/src/extraction/astro-extractor.ts new file mode 100644 index 000000000..e38989375 --- /dev/null +++ b/src/extraction/astro-extractor.ts @@ -0,0 +1,365 @@ +import { Node, Edge, ExtractionResult, ExtractionError, UnresolvedReference } from '../types'; +import { generateNodeId } from './tree-sitter-helpers'; +import { TreeSitterExtractor } from './tree-sitter'; +import { isLanguageSupported } from './grammars'; + +/** + * Astro built-in components — compiler-provided (``) or shipped by + * `astro:components` (``, ``), not user code. + */ +const ASTRO_BUILTIN_COMPONENTS = new Set(['Fragment', 'Code', 'Debug']); + +/** + * AstroExtractor - Extracts code relationships from Astro component files + * + * Astro files are multi-language: a TypeScript frontmatter block fenced by + * `---` lines, a JSX-like HTML template, and optional