fix(coderd/x/chatd): always commit a workspace context marker#26520
fix(coderd/x/chatd): always commit a workspace context marker#26520hugodutka wants to merge 1 commit into
Conversation
|
/coder-agents-review |
|
Chat: Review posted | View chat Review history
deep-review v0.8.0 | Round 1 | Last posted: Round 1, 4 findings (1 P3, 2 Nit, 1 Note), COMMENT. Review Finding inventoryFindings
Round logRound 1Panel: Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Chopper, Ging-Go, Gon, Leorio, Komugi, Meruem, Kite, Knov (wildcard). Netero: no findings. 1 P3, 2 Nit, 1 Note. Reviewed against a5a4c49..2bd76cd. About deep-reviewCRF = Coder Review Finding (P0-P4, Nit, Note)
|
There was a problem hiding this comment.
The fix is well-diagnosed and proportional. The PR description traces the stuck-chat bug through production logs to the exact three-function call chain, and the solution reuses the existing empty-marker pattern from buildWorkspaceContext (lines 84-91) rather than inventing new machinery. Test coverage is strong at 71% test LOC with assertions on marker shape, model call count, system prompt content, and agent dial avoidance.
1 P3, 2 Nit, 1 Note.
"Boring code. No fight here. ♠" — Hisoka
coderd/x/chatd/workspace_context_builder.go:57
P3 [CRF-1] Stale comment describes dead behavior.
This comment says "fall back to no-op so the generation action exits without committing stale context." After this PR, the caller (persistWorkspaceContext) catches errWorkspaceContextUnavailable and commits a marker instead of exiting. Someone debugging a stuck chat reads this, forms the mental model "the generation action exits here," and looks in the wrong direction.
Suggested fix:
// If the workspace or agent is gone, report unavailable so the
// caller can decide how to proceed.(Leorio P3, Pariston Nit)
🤖
🤖 This review was automatically generated with Coder Agents.
| return matched | ||
| } | ||
|
|
||
| func contextFileMarkersForAgent( |
There was a problem hiding this comment.
Nit [CRF-2] contextFileMarkersForAgent returns all context-file parts for the agent, not just markers.
The only difference from the adjacent contextFilePartsForAgent is the dropped ContextFileContent == "" filter. If a future test calls this expecting only empty-content sentinels while content-bearing parts also exist for the same agent, it will silently return both. The test compensates with require.Empty(t, markers[0].ContextFileContent), so correctness is fine today.
Consider allContextFilePartsForAgent, or a single function with a requireContent bool parameter.
(Bisky, Gon, Meruem, Kite)
🤖
| }) | ||
| } | ||
|
|
||
| func workspaceContextMarkerMessage(chat database.Chat, activeAPIKeyID string) (chatstate.Message, error) { |
There was a problem hiding this comment.
Nit [CRF-3] Missing doc comment on workspaceContextMarkerMessage.
Every neighboring function in this file has a doc comment (persistWorkspaceContext, beginGenerationAttempt, compactionModel). This function constructs the sentinel that unsticks the entire generation loop, the core mechanism of the fix. A reader in a year sees an undocumented function building a message with empty ContextFileContent and has to trace callers and consumers to understand why.
Suggested:
// workspaceContextMarkerMessage builds an empty context-file sentinel
// for the chat's selected agent. Committing this marker lets the
// generation loop proceed when the agent is unreachable.(Leorio)
🤖
| // agent missing, unreachable, etc.). Exit the action without | ||
| // committing so the next worker task can re-read the chat. | ||
| return errTaskExpectedExit | ||
| if err != nil || len(result.Messages) == 0 { |
There was a problem hiding this comment.
Note [CRF-4] Two observations on this condition.
-
Error conflation. The old code distinguished
errWorkspaceContextUnavailablefrom otherbuildWorkspaceContexterrors. The newerr != nilcatches all errors and commits a marker. Today all errors are non-transient (errWorkspaceContextUnavailablefor agent gone,MarshalPartsfor programming bug), so this is correct. TheerrWorkspaceContextUnavailablesentinel identity is now vestigial from the caller's perspective (noerrors.Isconsumer), though it still serves documentation and log-string value. IfbuildWorkspaceContextlater gains a transient error path, the marker would silently absorb it. -
Defensive guard. The
len(result.Messages) == 0branch witherr == nilis unreachable.buildWorkspaceContextalways returns at least one message on success. Harmless defense against a future builder change.
No action needed. Noting for future readers.
(Pariston, Kite, Knov, Chopper, Meruem, Leorio)
🤖
Closes CODAGT-629. A stuck chat had these logs associated with it:
"workspace agent not resolvable" is printed by
fetchContextForBuild. this causesbuildWorkspaceContextto exit with aerrWorkspaceContextUnavailableerror. That in turn is interpreted bypersistWorkspaceContextas an "expected exit" scenario. That's a bug: because the task exits without changing the chat state, the runner never issues another task to process the chat any further. But even if it did, it'd go through the same code path and exit again. We need to ensure thatpersistWorkspaceContextcommits a marker file even if it cannot reach the agent.