Skip to content

feat(agent): serve context sources and live MCP servers over the agent socket#26526

Draft
kylecarbs wants to merge 1 commit into
mainfrom
kylecarbs/agent-context-socket
Draft

feat(agent): serve context sources and live MCP servers over the agent socket#26526
kylecarbs wants to merge 1 commit into
mainfrom
kylecarbs/agent-context-socket

Conversation

@kylecarbs

@kylecarbs kylecarbs commented Jun 18, 2026

Copy link
Copy Markdown
Member

Splits the agent-only changes out of #26466 so they can be reviewed and merged independently. There are no changes outside agent/.

This adds the agent-side capabilities the chat-context work in #26466 builds on:

  • Agent socket context API. New ContextSources, GetContextSource, AddContextSource, RemoveContextSource, GetContextSnapshot, and ResyncContext RPCs (proto, generated code, client, and service), wired through WithContextManager. The socket returns a clean error when no context manager is attached.
  • Live MCP servers in the context snapshot. The resolver takes a non-blocking MCP tool source (ManagerOptions.MCPTools, a func() []workspacesdk.MCPToolInfo) and folds the tools into one KindMCPServer resource per server: grouping by server, sorting servers and tools deterministically, and hashing each server's tool set so any change flips the snapshot's aggregate hash. The conversion from MCP tools to context resources lives in agentcontext itself. The agent backs the source with a small tool cache that syncMCPServersFromContext refreshes from the MCP manager on each context change, deduping by both the discovered .mcp.json path set and the tool set so the re-resolve it triggers cannot loop.
  • MCP participates in the snapshot hash. .mcp.json files get lightweight structural validation (surfaced as StatusInvalid), and MCP resources contribute to the snapshot aggregate hash like any other resource. Deciding when a connecting server should (or should not) re-pin a chat is left to the coderd layer in a follow-up PR.

Notably, this needs no changes to agent/x/agentmcp: the resolver reads the manager's existing Tools cache rather than adding new accessors or callbacks.

Part of #26466. The coderd, codersdk, CLI, and frontend pieces follow in separate PRs.

Design notes
  • In-package conversion, no interface. Because the MCP-to-Resource conversion and hashing now live in agentcontext (mcp.go), there is no cross-package boundary to abstract: the resolver holds a plain MCPToolSource (func() []workspacesdk.MCPToolInfo) and calls buildMCPServerResources directly. An earlier revision wrapped this in an MCPProvider interface; that indirection was removed once the conversion moved in-package.
  • Event-driven refresh, no poller. agentmcp.Manager.Tools reloads synchronously when a .mcp.json snapshot changes, so syncMCPServersFromContext can refresh tools off the existing context-change signal instead of a callback or timer. It publishes new tools into an atomic.Pointer cache and Triggers a re-resolve; deduping by tool-set equality stops the trigger from looping.
  • Hashing. Per-server content hashes reuse the resolver's length-prefixed framing (writeLengthPrefixed) and resourceID scheme so MCP resources sort and hash consistently with filesystem resources.
  • Provenance. The agent socket and agentcontext validation pieces are taken from feat: consume pinned chat context, and surface drift, diff, refresh (UI + CLI) #26466 (head a96951b); the MCP wiring was reworked for this split to drop the agentmcp changes and the drift-hash exclusion. make pre-commit (gen, fmt, lint, build) passes locally with no make gen diff.

This PR was created by Coder Agents on behalf of @kylecarbs. Split from #26466.

Copy link
Copy Markdown
Member Author

/coder-agents-review

@coder-agents-review

coder-agents-review Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Chat: Review in progress | View chat
Requested: 2026-06-18 18:38 UTC by @kylecarbs
Spend: $85.74 / $100.00

Review history
  • R1 (2026-06-18): 21 reviewers, 4 Nit, 2 Note, 2 P2, 5 P3, COMMENT. Review
  • R2 (2026-06-18): 6 reviewers, 6 Nit, 2 Note, 2 P2, 8 P3, COMMENT. Review

deep-review v0.8.0 | Round 3 | 023a1c9..5d4b950

Status: Churn guard

Last posted: Round 2, 18 findings (2 P2, 8 P3, 6 Nit, 2 Note), COMMENT. Review

Finding inventory

Findings

# Sev Status Location Summary Round Reviewer Posted
CRF-1 Nit Author fixed (a9283c6) configwatcher_internal_test.go:35 Comment references deleted function name cachedTools (should be CachedTools) R1 Netero Yes
CRF-2 Note Author fixed (a9283c6) contextmcp.go:115 resourceID duplicated across packages (contextmcp.go and resolve.go) R1 Netero, Robin P3, Knov Note Yes
CRF-3 P2 Author fixed (a9283c6) contextmcp.go:119 hashMCPServer/hashMCPServerError comments falsely claim hashes flip the aggregate drift hash R1 Leorio P2, Razor P3 Yes
CRF-4 P2 Author fixed (a9283c6) manager.go:568 Server map and health map updated in separate lock acquisitions; CachedServers sees inconsistent state R1 Hisoka P2, Meruem P3 Yes
CRF-5 P3 Author fixed (a9283c6) manager.go:848 CachedServers returns in nondeterministic map-iteration order R1 Komugi Yes
CRF-6 P3 Author fixed (a9283c6) context_test.go:177 NoManagerErrors test exercises only 1 of 6 context RPCs R1 Chopper Yes
CRF-7 P3 Author fixed (a9283c6) contextmcp.go:130 hashMCPServer silently drops schema from hash when json.Marshal fails R1 Hisoka P3, Chopper P3 Yes
CRF-8 P3 Author fixed (a9283c6) contextmcp.go:157 writeHashField duplicates writeLengthPrefixed from resolve.go:1062 R1 Zoro P3, Robin Nit Yes
CRF-9 P3 Author fixed (a9283c6) agent.go:1558 8-line inline comment duplicates syncMCPServersFromContext's doc comment 22 lines below R1 Gon Yes
CRF-10 Nit Author fixed (a9283c6) agent.go:1642 Map-to-sorted-slice could use slices.Sorted(maps.Keys(seen)) R1 Ging-Go Yes
CRF-11 Nit Author fixed (a9283c6) manager.go:163 ServerStatus.Err is Error everywhere else R1 Gon Yes
CRF-12 Nit Author fixed (a9283c6) agentsocket.proto:107 ContextResource comment says "minus the payload" but also omits Tools R1 Razor Yes
CRF-13 Note Author accepted R2 (verbose comments trimmed where rework removed them; kept mcp.go comments for determinism/hashing contracts) contextmcp.go:39 Comment verbosity pattern: 30 of 54 comments longer than minimum R1 Gon Yes
CRF-14 Note Dropped by orchestrator (structural observation, not actionable defect) agent.go:1588 syncMCPServersFromContext wakes on every snapshot broadcast including irrelevant tool changes R1 Meruem P3, Ryosuke P3 No
CRF-15 Note Dropped by orchestrator (correct by default: new kinds hash unless excluded) resolve.go:1012 driftResources is a manually maintained allow-list R1 Meruem, Chopper No
CRF-16 P4 Dropped by orchestrator (transport-only consumer today; premature to move) service.go:30 ContextManager interface lives in transport layer instead of domain layer R1 Ryosuke No
CRF-17 P3 Open mcp.go:82 buildMCPServerResources drops Required from MCPToolInfo; required-field changes invisible to content hash R2 Meruem P3, Razor P3 Yes
CRF-18 P3 Open agent.go:1622 lastPaths set before Reload succeeds; transient failure permanently skips reload for those paths R2 Hisoka Yes
CRF-19 P3 Open service.go:309 contextSnapshotToProto duplicates snapshotResponse conversion; Name field already drifted R2 Robin Yes
CRF-20 Nit Open mcp.go:92 Map-to-sorted-slice manual pattern; CRF-10 fix not applied to new instance R2 Robin Yes
CRF-21 Nit Open resolve.go:510 validateMCPConfig iterates map nondeterministically; error message varies for multi-entry invalid files R2 Komugi (downgraded from P3) Yes

Law analysis

Effective LOC: 1476 (813 production, 663 test, 1429 generated). Head SHA: 0c574da. Verdict: Don't split. Enforcement: Advisory.
Reasoning: Single risk domain (all under agent/), moderate effective LOC, already a split of #26466, layered dependent concerns with clean topological order.

Contested and acknowledged

CRF-13 (Note, contextmcp.go:39) - Comment verbosity pattern

  • Finding: 30 of 54 production comments are longer than the minimum text needed to deliver their category. Proposed trimming each to minimum draft.
  • Author accepted: Partly addressed by the rework (verbose contextmcp.go/driftResources blocks gone). Remaining mcp.go comments kept deliberately because they document non-obvious determinism and hashing contracts.

Round log

Round 1

Panel (20 reviewers). 2 P2, 5 P3, 3 Nit, 2 Note new. 3 dropped. Reviewed against 023a1c9..0c574da.

Round 2

Churn guard: PROCEED. 12 addressed, 1 acknowledged. Panel (6 reviewers). 3 P3, 2 Nit new. 0 dropped. Reviewed against 023a1c9..a9283c6.

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer Focus
Bisky tests
Chopper ops/errors
Churn-guard change verification
Ging language modernization
Gon naming
Hisoka edge cases
Killua perf
Kite change integrity
Knov contracts
Knuckle SQL
Komugi flake/determinism
Kurapika security
Law decomposition
Leorio docs
Luffy product
Mafu-san process
Mafuuu contracts
Melody dispatch/pairing
Meruem structural
Nami frontend
Netero mechanical checks
Pariston premise testing
Pen-botter product gaps
Razor verification
Robin duplication
Ryosuke Go arch
Takumi concurrency
Zoro shape

🤖 Managed by Coder Agents.

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-structured agent-side feature. The provider seam, drift-hash exclusion, and feedback-loop dedup are clean. The test suite (44.9% density) covers every new code path including real subprocess failures and hash sensitivity. Particularly nice: the slices.Equal(paths, last) guard that prevents reload storms from MCP tool broadcasts, and the default-include switch in driftResources that makes new resource kinds hash-safe by default.

Severity count: 2 P2, 5 P3, 3 Nit, 2 Note.

The two P2s are (1) false claims in doc comments about aggregate-hash participation, and (2) a stale-data window between installServers and recordServerHealth that Resync can observe. Both are fixable in-place.

"Two tools identical in name and description but differing only in schema would hash identically, producing a false negative in drift detection." (Hisoka, on the silent schema drop)


agent/x/agentmcp/configwatcher_internal_test.go:35

Nit [CRF-1] The awaitTools doc comment still says cachedTools after the rename to CachedTools.

(Netero)

🤖

🤖 This review was automatically generated with Coder Agents.

Comment thread agent/contextmcp.go Outdated
Comment thread agent/x/agentmcp/manager.go Outdated
Comment thread agent/x/agentmcp/manager.go Outdated
Comment thread agent/agentsocket/context_test.go
Comment thread agent/contextmcp.go Outdated
Comment thread agent/agent.go Outdated
Comment thread agent/agent.go Outdated
Comment thread agent/x/agentmcp/manager.go Outdated
Comment thread agent/agentsocket/proto/agentsocket.proto
Comment thread agent/contextmcp.go Outdated
@kylecarbs kylecarbs force-pushed the kylecarbs/agent-context-socket branch 2 times, most recently from 025644d to a9283c6 Compare June 18, 2026 17:55
@kylecarbs

Copy link
Copy Markdown
Member Author

Reworked the MCP wiring and addressed the prior review (all 12 threads replied to and resolved).

What changed since the first push

  • No changes to agent/x/agentmcp. That package is reverted to main; the provider reads its existing Tools() cache instead of new CachedServers/SetOnToolsChanged accessors. This dissolves both P2s (the doc-comment false claims and the installServers/recordServerHealth cross-lock race).
  • MCP-to-Resource conversion moved into agent/agentcontext/mcp.go, next to the MCPProvider seam, built from a non-blocking func() []workspacesdk.MCPToolInfo source. It reuses the package's resourceID/writeLengthPrefixed helpers (no duplication).
  • driftResources removed: MCP resources now participate in the snapshot aggregate hash normally. Deciding when a connecting server re-pins a chat moves to the coderd layer in a follow-up.
  • Applied the surviving P3/Nits directly (all-six-RPC nil-manager test, slices.Sorted(maps.Keys), marshal-error sentinel, trimmed comment, proto comment).

Force-pushed as a single clean commit (a9283c6). make pre-commit passes with no make gen diff.

This update was made by Coder Agents on behalf of @kylecarbs.

@kylecarbs

Copy link
Copy Markdown
Member Author

/coder-agents-review

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean rework. The agentmcp revert eliminates both R1 P2s (the false-claim comments and the cross-lock race), and the MCP-to-Resource conversion landing in agentcontext dissolves the duplication findings. All 13 R1 findings are resolved.

The reworked code is well-structured: the provider reads the manager's existing Tools cache via an atomic.Pointer, the sync loop deduplicates by both path-set and tool-set equality, and the marshal-error sentinel from CRF-7 is properly carried over.

R2 severity count: 3 P3, 2 Nit.

"When a tool changes which parameters are required without changing property definitions, hashMCPServer produces the same hash." (Meruem, on the Required gap)

🤖 This review was automatically generated with Coder Agents.

Comment thread agent/agentcontext/mcp.go
if t.ServerName == "" {
continue
}
byServer[t.ServerName] = append(byServer[t.ServerName], MCPTool{

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [CRF-17] buildMCPServerResources converts MCPToolInfo to MCPTool, mapping Name, Description, and Schema (properties) but dropping Required []string. hashMCPServer covers only MCPTool fields, so when a tool changes which parameters are required without any other schema change, the server's ContentHash and the aggregate hash stay the same.

The sync loop does detect the change: mcpToolsChanged uses reflect.DeepEqual on the full MCPToolInfo (which includes Required), so a re-resolve fires and the snapshot version increments. But the hashes don't flip, so hash-based consumers (e.g. coderd's push path in follow-up PRs) would miss the update.

Fix: add Required []string to MCPTool, populate it here, and include it in hashMCPServer's hash computation.

(Meruem P3, Razor P3)

🤖

Comment thread agent/agent.go
lastPaths []string
lastTools []workspacesdk.MCPToolInfo
)
refresh := func() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [CRF-18] lastPaths = paths is committed before Reload at line 1623. If Reload fails (e.g. manager closing during a shutdown race), the next wake sees equal paths and short-circuits, permanently skipping the explicit reload for those paths. Self-healing requires a file edit on disk that changes the discovered path set.

In practice this bites only during shutdown races, and the agent is tearing down anyway. But the fix is trivial: set lastPaths after Reload succeeds, or clear it on error.

if !slices.Equal(paths, lastPaths) {
    if err := a.mcpManager.Reload(ctx, paths); err != nil {
        a.logger.Warn(ctx, "failed to reload workspace MCP servers", slog.Error(err))
    } else {
        lastPaths = paths
    }
}

(Hisoka)

🤖

// contextSnapshotToProto converts an agentcontext.Snapshot to its on-wire
// form. Payload bytes are intentionally omitted; they reach coderd via the
// drpc PushContextState path.
func contextSnapshotToProto(s agentcontext.Snapshot) *proto.ContextSnapshot {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [CRF-19] contextSnapshotToProto and snapshotResponse at agentcontext/api.go:180 are structurally identical converters from agentcontext.Snapshot. The Name field has already drifted: contextSnapshotToProto maps it (line 327), but snapshotResponse's SnapshotResource type lacks it. Adding a field to Resource now requires updating two converters in two packages.

A shared resourceToWire helper or at minimum syncing the field sets would prevent further drift.

(Robin)

🤖

Comment thread agent/agentcontext/mcp.go
return nil
}

servers := make([]string, 0, len(byServer))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-20] slices.Sorted(maps.Keys(byServer)) replaces these four lines. CRF-10 fixed this pattern in mcpConfigPaths (same PR), but this new instance uses the old manual form.

(Robin)

🤖

// entry is a structural error the MCP manager would reject.
// The top-level Unmarshal above already rejects malformed JSON,
// so a well-formed value starting with '{' is a complete object.
for name, raw := range shape.MCPServers {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-21] validateMCPConfig iterates shape.MCPServers in nondeterministic map order. When a .mcp.json has multiple invalid entries, the error message varies between resolves. Status (StatusInvalid) and the aggregate hash are unaffected (hash does not include Error), so this is cosmetic only. Sorting the keys before iterating would make the error stable.

(Komugi)

🤖

…t socket

Split from #26466 (agent-only changes); no behavior outside agent/.

- agentsocket: add context source CRUD (ContextSources, GetContextSource,
  AddContextSource, RemoveContextSource) plus GetContextSnapshot and
  ResyncContext RPCs, with matching client methods and proto. The server
  takes the context Manager via WithContextManager and returns a clean
  error when it is absent.
- agentcontext: take a non-blocking MCP tool source via
  ManagerOptions.MCPTools and validate .mcp.json structure (StatusInvalid
  on malformed). The resolver folds the tools into one KindMCPServer
  resource per server, grouping by server, sorting deterministically, and
  hashing the tool set so any change flips the snapshot's aggregate hash.
  The conversion from MCP tools to context resources lives in this package
  (no provider interface, since it is no longer cross-package).
- agent: back the MCP tool source with a non-blocking cache that
  syncMCPServersFromContext refreshes from the MCP manager on each context
  change, deduping by the discovered .mcp.json path set and by the tool
  set so the re-resolve it triggers does not loop. No changes to
  agent/x/agentmcp.
@kylecarbs kylecarbs force-pushed the kylecarbs/agent-context-socket branch from a9283c6 to 5d4b950 Compare June 18, 2026 18:37
@kylecarbs

Copy link
Copy Markdown
Member Author

Follow-up simplification (per review feedback): removed the MCPProvider interface seam now that the MCP-to-Resource conversion lives in agentcontext itself. The resolver takes a plain MCPToolSource (func() []workspacesdk.MCPToolInfo) and calls buildMCPServerResources in-package — no NewMCPProvider/mcpToolProvider indirection. Tests were updated to drive the resolver through the tool source; the byte-cap test became a count-cap test (MCP resources carry Tools, not Payload). Pushed as 5d4b950.

This update was made by Coder Agents on behalf of @kylecarbs.

@kylecarbs

Copy link
Copy Markdown
Member Author

/coder-agents-review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant