Skip to content

feat(coderd): copy agent context resources into the per-chat pin#26438

Merged
kylecarbs merged 4 commits into
mainfrom
kylecarbs/chat-context-copy-on-pin
Jun 17, 2026
Merged

feat(coderd): copy agent context resources into the per-chat pin#26438
kylecarbs merged 4 commits into
mainfrom
kylecarbs/chat-context-copy-on-pin

Conversation

@kylecarbs

Copy link
Copy Markdown
Member

What

Populates chat_context_resources (the per-chat pinned copy added in #26430) by copying from workspace_agent_context_resources at the points where a chat's context_aggregate_hash is set, in the same transaction, so the pinned hash and pinned bodies always agree. No prompt-building change yet; consuming the pinned copy in prepareGeneration is a later, experiment-gated PR.

How

  • HydrateAgentChatsContext now hydrates NULL-hash chats and copies the agent's resources onto them in one statement (a data-modifying CTE), so the chat-create and agent-push paths need no Go change.
  • New queries InsertAgentContextResourcesIntoChat, DeleteChatContextResources, ListChatContextResources, each with a hand-written dbauthz wrapper (per-chat update/read) and a MethodTestSuite entry.
  • RefreshChatContext re-pins resources via a shared repinChatContext helper (clear-then-copy in a transaction). A dirty chat keeps its old bodies until refresh.
  • On agent rebind (e.g. a workspace rebuild produces a new agent), the chat's context is re-pinned to the new agent so it stops injecting the previous agent's resources. Best-effort: a context error never fails the binding.

Invariant

A chat's chat_context_resources always correspond to its context_aggregate_hash. Bodies are (re)written only when the hash is set (hydrate, refresh, rebind); a dirty chat keeps its old bodies until refresh.

Testing

Extends the context integration test to push real resources and assert the copy across hydrate, dirty (no re-copy), and refresh. The dbauthz MethodTestSuite covers the three new methods.

Why clear-then-copy (two statements)

The refresh/rebind re-pin clears the chat's rows then inserts the agent's. It uses two sequential statements inside the transaction rather than a single WITH cleared AS (DELETE ...) INSERT ..., because a data-modifying CTE cannot see its own delete under snapshot isolation, so overlapping sources (the common case: the same files re-pinned) would collide on the (chat_id, source) primary key. The hydrate path inserts into never-pinned (NULL-hash) chats and uses ON CONFLICT DO UPDATE defensively.

Follow-ups
  • prepareGeneration consuming the pinned instructions and skills (experiment-gated).
  • codersdk.ChatContext resources plus changed diff, and the frontend indicator/refresh.
  • Removing the per-turn pull and last_injected_context.

This PR was created by Coder Agents on behalf of @kylecarbs. Builds on #26430.

Populate chat_context_resources (the per-chat pinned copy added in
#26430) by copying from workspace_agent_context_resources at the points
where a chat's context_aggregate_hash is set, in the same transaction,
so the pinned hash and pinned bodies always agree. No prompt-building
change yet; that is a later, experiment-gated PR.

- HydrateAgentChatsContext now hydrates NULL-hash chats AND copies the
  agent's resources onto them in one statement, so the chat-create and
  agent-push paths need no Go change.
- Add InsertAgentContextResourcesIntoChat, DeleteChatContextResources,
  and ListChatContextResources, with dbauthz wrappers (per-chat
  update/read) and MethodTestSuite entries.
- RefreshChatContext re-pins resources via a shared repinChatContext
  helper (clear-then-copy in a transaction). A dirty chat keeps its old
  bodies until refresh.
- On agent rebind (e.g. workspace rebuild), re-pin the chat's context to
  the new agent so it stops injecting the previous agent's resources.
  Best-effort: a context error never fails the binding.

Extends the context integration test to push real resources and assert
the copy across hydrate, dirty (no re-copy), and refresh.

Copy link
Copy Markdown
Member Author

/coder-agents-review

@coder-agents-review

coder-agents-review Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Chat: Review posted | View chat
Requested: 2026-06-17 06:25 UTC by @kylecarbs
Spend: $85.71 / $100.00

Review history
  • R1 (2026-06-16): 14 reviewers, 3 Nit, 1 P2, 1 P3, COMMENT. Review
  • R2 (2026-06-16): 8 reviewers, 3 Nit, 1 P2, 2 P3, 1 P4, COMMENT. Review
  • R3 (2026-06-16): 8 reviewers, 3 Nit, 1 P2, 2 P3, 1 P4, APPROVE. Review
  • R4 (2026-06-17): 6 reviewers, 3 Nit, 1 P2, 2 P3, 1 P4, APPROVE. Review

deep-review v0.7.1 | Round 4 | 53a6459..f0b1591

Last posted: Round 4, 7 findings (1 P2, 2 P3, 1 P4, 3 Nit), APPROVE. Review

Finding inventory

Findings

# Sev Status Location Summary Round Reviewer Posted
CRF-1 Nit Author fixed (ce6f7ed) chats.sql:1563 DeleteChatContextResources and ListChatContextResources omit ByChatID suffix R1 Netero Yes
CRF-2 P2 Author fixed (ce6f7ed) chatd.go:716 Agent rebind re-pin path has zero test coverage R1 Bisky P2, Hisoka P2, Meruem P2, Mafu-san P3, Mafuuu P3, Pariston P3, Kite P3, Chopper P3, Knov P3, Zoro P3 Yes
CRF-3 P3 Author fixed (ce6f7ed) context_hydration.go:184 RefreshChatContext uses stale AgentID on ReadModifyUpdate retry after concurrent rebind R1 Hisoka Yes
CRF-4 Nit Author fixed (ce6f7ed) chats.sql:1503 "Does not bump updated_at" ambiguous now that statement writes two tables R1 Leorio Yes
CRF-5 Nit Author fixed (ce6f7ed) context_integration_test.go:126 nolint comment misidentifies row ownership R1 Chopper Yes
CRF-6 P3 Author fixed (83aa54d) context_hydration.go:102 hydrateChatContextOnCreate reads snapshot hash outside CTE; concurrent push can leave hash v1 with resources v2 R2 Mafuuu Yes
CRF-7 P4 Author fixed (83aa54d) context_hydration.go:158 repinChatContext no-snapshot clearing branch untested R2 Bisky Yes

Contested and acknowledged

(none)

Round log

Round 1

Netero pass: 1 Nit. Panel: Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Knuckle, Ging-Go, Gon, Leorio, Meruem, Kite, Chopper, Knov, Zoro. 1 P2, 1 P3, 3 Nits. Reviewed against 53a6459..7e6519e.

Round 2

CRF-1 through CRF-5 addressed. Panel: Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Ging-Go, Meruem, Kite. 1 P3, 1 P4 new. Reviewed against 53a6459..ce6f7ed.

Round 3

CRF-6 and CRF-7 addressed. Panel: Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Ging-Go, Meruem, Knov. 0 new findings. All 7 prior findings verified fixed. Reviewed against 53a6459..83aa54d.

Round 4

No open findings. Delta: 17-line test mock helper. Panel: Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Chopper. 0 new findings. Reviewed against 53a6459..f0b1591.

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer Focus
Bisky tests
Chopper ops/errors
Churn-guard change verification
Ging language modernization
Gon naming
Hisoka edge cases
Killua perf
Kite change integrity
Knov contracts
Knuckle SQL
Kurapika security
Law decomposition
Leorio docs
Luffy product
Mafu-san process
Mafuuu contracts
Melody dispatch/pairing
Meruem structural
Nami frontend
Netero mechanical checks
Pariston premise testing
Pen-botter product gaps
Razor verification
Robin duplication
Ryosuke Go arch
Takumi concurrency
Zoro shape

🤖 Managed by Coder Agents.

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean work overall. The CTE in HydrateAgentChatsContext that atomically stamps the hash and copies resources is well-designed, the clear-then-copy rationale is sound (snapshot isolation prevents a single CTE from seeing its own DELETE), and the negative test assertion that dirty marking does NOT re-copy resources guards the invariant precisely. The comment quality and PR description are thorough.

Severity count: 1 P2, 1 P3, 3 Nits.

The P2 is a convergent finding from the full panel (12 of 14 reviewers flagged it independently): the agent-rebind re-pin path ships with zero test coverage. The shared repinChatContext helper is proven through the refresh path, but the rebind-specific wiring (condition guard, AsChatd escalation, ReadModifyUpdate wrapper, error swallowing) is not exercised.

The P3 is a narrow race in RefreshChatContext: the function captures chat.AgentID from the HTTP middleware read outside the ReadModifyUpdate closure, so on a serialization conflict retry after a concurrent rebind, the retry re-pins using the old agent's ID. Self-heals on the next push, but the refresh actively undoes correct state.

"A data-modifying CTE cannot see its own delete under snapshot isolation, so overlapping sources would collide on the (chat_id, source) primary key." Good catch documenting this; it's the kind of subtle Postgres behavior that trips up future editors.

Notes for awareness: ListChatContextResources has no production caller yet (documented as follow-up). The test asserts resource metadata (hash, kind, status) but not the Body payload; hash uniqueness makes this safe today, but asserting one representative body round-trip would prove the feature, not just the plumbing. Two structurally distinct hydration mechanisms (single CTE vs. three-statement transaction) maintain the same invariant from different layers; the function-level comments document this well, but the constraint is not visible from either layer alone.

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/x/chatd/chatd.go
Comment thread coderd/x/chatd/context_hydration.go Outdated
Comment thread coderd/database/queries/chats.sql Outdated
Comment thread coderd/database/queries/chats.sql Outdated
Comment thread coderd/x/chatd/context_integration_test.go Outdated
- Cover the agent-rebind re-pin path in persistBuildAgentBinding: a
  real-DB test asserts the pinned hash and resources switch to the new
  agent, a guard test asserts a first-time binding does not re-pin, and
  a mock test asserts a re-pin failure is swallowed so it never fails
  the binding. (CRF-2)
- RefreshChatContext re-reads the chat inside the ReadModifyUpdate
  closure so a serialization-conflict retry re-pins against the chat's
  current agent instead of a stale captured AgentID. (CRF-3)
- Rename DeleteChatContextResources and ListChatContextResources to
  DeleteChatContextResourcesByChatID and ListChatContextResourcesByChatID
  to match the filtered-query naming convention. (CRF-1)
- Clarify the HydrateAgentChatsContext comment: it does not bump
  chats.updated_at, but the resource upsert sets
  chat_context_resources.updated_at on rewritten rows. (CRF-4)
- Correct the integration test nolint comment: the rows are chat-owned,
  read as the chatd subject because the test carries no per-user actor.
  (CRF-5)

Copy link
Copy Markdown
Member Author

/coder-agents-review

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All five R1 findings addressed cleanly in ce6f7ed. The rebind test suite (context_rebind_internal_test.go) is well-structured: real-DB for the happy path, mock for error injection, and a guard-condition case. The CRF-3 fix re-reads the chat inside the ReadModifyUpdate closure with a clear comment explaining why.

Severity count for new findings: 1 P3, 1 P4.

The P3 is a narrow TOCTOU in hydrateChatContextOnCreate: the hash is read from workspace_agent_context_snapshots in Go, then passed as a parameter to the CTE, which reads resources from workspace_agent_context_resources at statement time. A concurrent push that commits between the two calls can leave a chat with hash v1 but resources v2. Self-heals on the next push (hash mismatch triggers dirty marking). Resources are not consumed yet, so no user-visible impact today, but the stated invariant is violated.

The P4 is a minor test coverage gap: repinChatContext's "no snapshot" clearing branch (agent exists but has not pushed) is not exercised by any test.

"The push's HydrateAgentChatsContext CTE runs before the new chat is committed, so it doesn't see the chat. The push commits. The on-create CTE runs at READ COMMITTED and copies v2 resources, but stamps v1 hash." (Mafuuu, tracing the interleaving that makes this race possible)


coderd/x/chatd/context_hydration.go:102

P3 [CRF-6] hydrateChatContextOnCreate reads the snapshot hash outside the CTE, so a concurrent push can leave a chat pinned to hash v1 with resources v2.

latestAgentSnapshot (line 102) reads the hash from workspace_agent_context_snapshots. The CTE at line 111 stamps that hash as a Go parameter but copies resources from workspace_agent_context_resources at statement time. A concurrent push that commits between the two calls produces a mismatch:

  1. Agent has pushed v1. latestAgentSnapshot reads v1.
  2. Push v2 starts. Its HydrateAgentChatsContext CTE runs before the new chat is committed, so it misses the chat.
  3. The new chat commits.
  4. Push v2 commits (v2 resources and v2 snapshot now visible).
  5. The on-create CTE runs at READ COMMITTED: stamps v1 hash (parameter), copies v2 resources (now committed).

The chat has hash v1, resources v2, and no dirty marker. Self-heals when push v3 marks the chat dirty (v1 != v3). The resources are strictly newer (v2), so no user sees worse data. But the stated invariant ("A chat's chat_context_resources always correspond to its context_aggregate_hash") is violated for the interval between create-time hydration and the next push.

Fix: have the CTE join workspace_agent_context_snapshots to read the hash directly, so hash and resources come from the same statement snapshot, or wrap both reads in a lightweight transaction. (Mafuuu)

🤖

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/x/chatd/context_hydration.go
…-pin

Round 2 review fixes:

- hydrateChatContextOnCreate now reads the snapshot hash and copies the
  agent's resources in one repeatable-read transaction, so a concurrent
  push cannot commit between the two and leave a chat stamped with one
  snapshot's hash but another snapshot's resources. (CRF-6)
- Add a test for repinChatContext's no-snapshot branch: rebinding to an
  agent that has not pushed yet clears the chat's pinned hash and
  resources. (CRF-7)
@kylecarbs

Copy link
Copy Markdown
Member Author

Re CRF-6 (hydrateChatContextOnCreate create-time race): fixed in 83aa54d. The snapshot-hash read and the resource-copy CTE now run in one repeatable-read transaction, so a concurrent push cannot commit between them and leave a chat stamped with one snapshot's hash but another snapshot's resources. The NULL-hash guard inside the statement still keeps a concurrent push that already hydrated the chat from being clobbered.

(CRF-6 was reported in the review summary rather than as a line thread, hence this top-level reply.)

Coder Agents on behalf of @kylecarbs.

Copy link
Copy Markdown
Member Author

/coder-agents-review

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 7 findings from rounds 1 and 2 verified fixed across three review rounds. No new findings from the round 3 panel (8 reviewers, unanimous clean).

The PR delivers a well-designed resource-pinning lifecycle: the CTE atomically stamps hash and copies resources for bulk hydration, the clear-then-copy helper handles single-chat re-pin for refresh and rebind, and every write path runs inside appropriate transaction boundaries. The CRF-6 fix (wrapping create-time hydration in ReadModifyUpdate) closes the last TOCTOU window. Test density is 67.9% with four distinct testing strategies: E2E integration through real agent RPC, real-DB rebind through internal test, mock-based error injection, and dbauthz authorization checks.

"I tried to build a case against this change and could not. The problem is correctly understood, the solution is proportional to the problem, and the fix is at the right causal level." (Pariston)

🤖 This review was automatically generated with Coder Agents.

@kylecarbs kylecarbs marked this pull request as ready for review June 17, 2026 06:02
The dbmock-based agent-switch tests drive persistBuildAgentBinding, which
fires a best-effort context re-pin whenever a turn rebinds a chat to a
different agent. That re-pin opens a transaction via database.ReadModifyUpdate,
so the mock store now needs to expect InTx and the no-snapshot clear path.

Add a shared expectBestEffortContextRepin helper that registers AnyTimes
mocks for InTx, GetLatestWorkspaceAgentContextSnapshot (ErrNoRows),
SetChatContextSnapshot, and DeleteChatContextResourcesByChatID, and wire it
into the three affected tests. The re-pin behavior itself stays covered by
TestPersistBuildAgentBindingRepinsContext.

Copy link
Copy Markdown
Member Author

/coder-agents-review

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The delta since R3 is a 17-line test mock helper (expectBestEffortContextRepin) that lets three existing agent-switch tests work with the new rebind re-pin behavior. No production code changed. All 7 prior findings remain verified fixed. 6 reviewers, unanimous clean.

This PR is ready to merge.

🤖 This review was automatically generated with Coder Agents.

@kylecarbs kylecarbs merged commit 1c78bd8 into main Jun 17, 2026
29 checks passed
@kylecarbs kylecarbs deleted the kylecarbs/chat-context-copy-on-pin branch June 17, 2026 07:10
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 17, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants