feat(coderd): copy agent context resources into the per-chat pin#26438
Conversation
Populate chat_context_resources (the per-chat pinned copy added in #26430) by copying from workspace_agent_context_resources at the points where a chat's context_aggregate_hash is set, in the same transaction, so the pinned hash and pinned bodies always agree. No prompt-building change yet; that is a later, experiment-gated PR. - HydrateAgentChatsContext now hydrates NULL-hash chats AND copies the agent's resources onto them in one statement, so the chat-create and agent-push paths need no Go change. - Add InsertAgentContextResourcesIntoChat, DeleteChatContextResources, and ListChatContextResources, with dbauthz wrappers (per-chat update/read) and MethodTestSuite entries. - RefreshChatContext re-pins resources via a shared repinChatContext helper (clear-then-copy in a transaction). A dirty chat keeps its old bodies until refresh. - On agent rebind (e.g. workspace rebuild), re-pin the chat's context to the new agent so it stops injecting the previous agent's resources. Best-effort: a context error never fails the binding. Extends the context integration test to push real resources and assert the copy across hydrate, dirty (no re-copy), and refresh.
|
/coder-agents-review |
|
Chat: Review posted | View chat Review historydeep-review v0.7.1 | Round 4 | Last posted: Round 4, 7 findings (1 P2, 2 P3, 1 P4, 3 Nit), APPROVE. Review Finding inventoryFindings
Contested and acknowledged(none) Round logRound 1Netero pass: 1 Nit. Panel: Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Knuckle, Ging-Go, Gon, Leorio, Meruem, Kite, Chopper, Knov, Zoro. 1 P2, 1 P3, 3 Nits. Reviewed against 53a6459..7e6519e. Round 2CRF-1 through CRF-5 addressed. Panel: Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Ging-Go, Meruem, Kite. 1 P3, 1 P4 new. Reviewed against 53a6459..ce6f7ed. Round 3CRF-6 and CRF-7 addressed. Panel: Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Ging-Go, Meruem, Knov. 0 new findings. All 7 prior findings verified fixed. Reviewed against 53a6459..83aa54d. Round 4No open findings. Delta: 17-line test mock helper. Panel: Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Chopper. 0 new findings. Reviewed against 53a6459..f0b1591. About deep-reviewCRF = Coder Review Finding (P0-P4, Nit, Note)
|
There was a problem hiding this comment.
Clean work overall. The CTE in HydrateAgentChatsContext that atomically stamps the hash and copies resources is well-designed, the clear-then-copy rationale is sound (snapshot isolation prevents a single CTE from seeing its own DELETE), and the negative test assertion that dirty marking does NOT re-copy resources guards the invariant precisely. The comment quality and PR description are thorough.
Severity count: 1 P2, 1 P3, 3 Nits.
The P2 is a convergent finding from the full panel (12 of 14 reviewers flagged it independently): the agent-rebind re-pin path ships with zero test coverage. The shared repinChatContext helper is proven through the refresh path, but the rebind-specific wiring (condition guard, AsChatd escalation, ReadModifyUpdate wrapper, error swallowing) is not exercised.
The P3 is a narrow race in RefreshChatContext: the function captures chat.AgentID from the HTTP middleware read outside the ReadModifyUpdate closure, so on a serialization conflict retry after a concurrent rebind, the retry re-pins using the old agent's ID. Self-heals on the next push, but the refresh actively undoes correct state.
"A data-modifying CTE cannot see its own delete under snapshot isolation, so overlapping sources would collide on the (chat_id, source) primary key." Good catch documenting this; it's the kind of subtle Postgres behavior that trips up future editors.
Notes for awareness: ListChatContextResources has no production caller yet (documented as follow-up). The test asserts resource metadata (hash, kind, status) but not the Body payload; hash uniqueness makes this safe today, but asserting one representative body round-trip would prove the feature, not just the plumbing. Two structurally distinct hydration mechanisms (single CTE vs. three-statement transaction) maintain the same invariant from different layers; the function-level comments document this well, but the constraint is not visible from either layer alone.
🤖 This review was automatically generated with Coder Agents.
- Cover the agent-rebind re-pin path in persistBuildAgentBinding: a real-DB test asserts the pinned hash and resources switch to the new agent, a guard test asserts a first-time binding does not re-pin, and a mock test asserts a re-pin failure is swallowed so it never fails the binding. (CRF-2) - RefreshChatContext re-reads the chat inside the ReadModifyUpdate closure so a serialization-conflict retry re-pins against the chat's current agent instead of a stale captured AgentID. (CRF-3) - Rename DeleteChatContextResources and ListChatContextResources to DeleteChatContextResourcesByChatID and ListChatContextResourcesByChatID to match the filtered-query naming convention. (CRF-1) - Clarify the HydrateAgentChatsContext comment: it does not bump chats.updated_at, but the resource upsert sets chat_context_resources.updated_at on rewritten rows. (CRF-4) - Correct the integration test nolint comment: the rows are chat-owned, read as the chatd subject because the test carries no per-user actor. (CRF-5)
|
/coder-agents-review |
There was a problem hiding this comment.
All five R1 findings addressed cleanly in ce6f7ed. The rebind test suite (context_rebind_internal_test.go) is well-structured: real-DB for the happy path, mock for error injection, and a guard-condition case. The CRF-3 fix re-reads the chat inside the ReadModifyUpdate closure with a clear comment explaining why.
Severity count for new findings: 1 P3, 1 P4.
The P3 is a narrow TOCTOU in hydrateChatContextOnCreate: the hash is read from workspace_agent_context_snapshots in Go, then passed as a parameter to the CTE, which reads resources from workspace_agent_context_resources at statement time. A concurrent push that commits between the two calls can leave a chat with hash v1 but resources v2. Self-heals on the next push (hash mismatch triggers dirty marking). Resources are not consumed yet, so no user-visible impact today, but the stated invariant is violated.
The P4 is a minor test coverage gap: repinChatContext's "no snapshot" clearing branch (agent exists but has not pushed) is not exercised by any test.
"The push's HydrateAgentChatsContext CTE runs before the new chat is committed, so it doesn't see the chat. The push commits. The on-create CTE runs at READ COMMITTED and copies v2 resources, but stamps v1 hash." (Mafuuu, tracing the interleaving that makes this race possible)
coderd/x/chatd/context_hydration.go:102
P3 [CRF-6] hydrateChatContextOnCreate reads the snapshot hash outside the CTE, so a concurrent push can leave a chat pinned to hash v1 with resources v2.
latestAgentSnapshot (line 102) reads the hash from workspace_agent_context_snapshots. The CTE at line 111 stamps that hash as a Go parameter but copies resources from workspace_agent_context_resources at statement time. A concurrent push that commits between the two calls produces a mismatch:
- Agent has pushed v1.
latestAgentSnapshotreads v1. - Push v2 starts. Its
HydrateAgentChatsContextCTE runs before the new chat is committed, so it misses the chat. - The new chat commits.
- Push v2 commits (v2 resources and v2 snapshot now visible).
- The on-create CTE runs at READ COMMITTED: stamps v1 hash (parameter), copies v2 resources (now committed).
The chat has hash v1, resources v2, and no dirty marker. Self-heals when push v3 marks the chat dirty (v1 != v3). The resources are strictly newer (v2), so no user sees worse data. But the stated invariant ("A chat's chat_context_resources always correspond to its context_aggregate_hash") is violated for the interval between create-time hydration and the next push.
Fix: have the CTE join workspace_agent_context_snapshots to read the hash directly, so hash and resources come from the same statement snapshot, or wrap both reads in a lightweight transaction. (Mafuuu)
🤖
🤖 This review was automatically generated with Coder Agents.
…-pin Round 2 review fixes: - hydrateChatContextOnCreate now reads the snapshot hash and copies the agent's resources in one repeatable-read transaction, so a concurrent push cannot commit between the two and leave a chat stamped with one snapshot's hash but another snapshot's resources. (CRF-6) - Add a test for repinChatContext's no-snapshot branch: rebinding to an agent that has not pushed yet clears the chat's pinned hash and resources. (CRF-7)
|
Re CRF-6 ( (CRF-6 was reported in the review summary rather than as a line thread, hence this top-level reply.) Coder Agents on behalf of @kylecarbs. |
|
/coder-agents-review |
There was a problem hiding this comment.
All 7 findings from rounds 1 and 2 verified fixed across three review rounds. No new findings from the round 3 panel (8 reviewers, unanimous clean).
The PR delivers a well-designed resource-pinning lifecycle: the CTE atomically stamps hash and copies resources for bulk hydration, the clear-then-copy helper handles single-chat re-pin for refresh and rebind, and every write path runs inside appropriate transaction boundaries. The CRF-6 fix (wrapping create-time hydration in ReadModifyUpdate) closes the last TOCTOU window. Test density is 67.9% with four distinct testing strategies: E2E integration through real agent RPC, real-DB rebind through internal test, mock-based error injection, and dbauthz authorization checks.
"I tried to build a case against this change and could not. The problem is correctly understood, the solution is proportional to the problem, and the fix is at the right causal level." (Pariston)
🤖 This review was automatically generated with Coder Agents.
The dbmock-based agent-switch tests drive persistBuildAgentBinding, which fires a best-effort context re-pin whenever a turn rebinds a chat to a different agent. That re-pin opens a transaction via database.ReadModifyUpdate, so the mock store now needs to expect InTx and the no-snapshot clear path. Add a shared expectBestEffortContextRepin helper that registers AnyTimes mocks for InTx, GetLatestWorkspaceAgentContextSnapshot (ErrNoRows), SetChatContextSnapshot, and DeleteChatContextResourcesByChatID, and wire it into the three affected tests. The re-pin behavior itself stays covered by TestPersistBuildAgentBindingRepinsContext.
|
/coder-agents-review |
There was a problem hiding this comment.
The delta since R3 is a 17-line test mock helper (expectBestEffortContextRepin) that lets three existing agent-switch tests work with the new rebind re-pin behavior. No production code changed. All 7 prior findings remain verified fixed. 6 reviewers, unanimous clean.
This PR is ready to merge.
🤖 This review was automatically generated with Coder Agents.
What
Populates
chat_context_resources(the per-chat pinned copy added in #26430) by copying fromworkspace_agent_context_resourcesat the points where a chat'scontext_aggregate_hashis set, in the same transaction, so the pinned hash and pinned bodies always agree. No prompt-building change yet; consuming the pinned copy inprepareGenerationis a later, experiment-gated PR.How
HydrateAgentChatsContextnow hydrates NULL-hash chats and copies the agent's resources onto them in one statement (a data-modifying CTE), so the chat-create and agent-push paths need no Go change.InsertAgentContextResourcesIntoChat,DeleteChatContextResources,ListChatContextResources, each with a hand-written dbauthz wrapper (per-chat update/read) and aMethodTestSuiteentry.RefreshChatContextre-pins resources via a sharedrepinChatContexthelper (clear-then-copy in a transaction). A dirty chat keeps its old bodies until refresh.Invariant
A chat's
chat_context_resourcesalways correspond to itscontext_aggregate_hash. Bodies are (re)written only when the hash is set (hydrate, refresh, rebind); a dirty chat keeps its old bodies until refresh.Testing
Extends the context integration test to push real resources and assert the copy across hydrate, dirty (no re-copy), and refresh. The dbauthz
MethodTestSuitecovers the three new methods.Why clear-then-copy (two statements)
The refresh/rebind re-pin clears the chat's rows then inserts the agent's. It uses two sequential statements inside the transaction rather than a single
WITH cleared AS (DELETE ...) INSERT ..., because a data-modifying CTE cannot see its own delete under snapshot isolation, so overlapping sources (the common case: the same files re-pinned) would collide on the(chat_id, source)primary key. The hydrate path inserts into never-pinned (NULL-hash) chats and usesON CONFLICT DO UPDATEdefensively.Follow-ups
prepareGenerationconsuming the pinned instructions and skills (experiment-gated).codersdk.ChatContextresources plus changed diff, and the frontend indicator/refresh.last_injected_context.This PR was created by Coder Agents on behalf of @kylecarbs. Builds on #26430.