Skip to content

fix(coderd/x/chatd): always commit a workspace context marker#26520

Open
hugodutka wants to merge 1 commit into
mainfrom
hugodutka/workspace-ctx-marker-fix
Open

fix(coderd/x/chatd): always commit a workspace context marker#26520
hugodutka wants to merge 1 commit into
mainfrom
hugodutka/workspace-ctx-marker-fix

Conversation

@hugodutka

Copy link
Copy Markdown
Contributor

Closes CODAGT-629. A stuck chat had these logs associated with it:

1781735334322	2026-06-17T22:28:54.322Z	2026-06-17 22:28:54.322 [debu]  coderd.chatd.processor: workspace context build: workspace agent not resolvable  chat_id=d4524ebb-4494-47df-b258-d933c0248942  owner_id=d96bf761-3f94-46b3-a1da-6316e2e4735d
1781735334298	2026-06-17T22:28:54.298Z	2026-06-17 22:28:54.298 [debu]  coderd.chatd.processor: plan path instruction: agent not reachable  chat_id=d4524ebb-4494-47df-b258-d933c0248942  owner_id=d96bf761-3f94-46b3-a1da-6316e2e4735d  chat_id=d4524ebb-4494-47df-b258-d933c0248942 ...
    error= workspace has no running agent: the workspace is likely stopped. Use the start_workspace tool to start it:
               github.com/coder/coder/v2/coderd/x/chatd.init
                   <autogenerated>:1

"workspace agent not resolvable" is printed by fetchContextForBuild. this causes buildWorkspaceContext to exit with a errWorkspaceContextUnavailable error. That in turn is interpreted by persistWorkspaceContext as an "expected exit" scenario. That's a bug: because the task exits without changing the chat state, the runner never issues another task to process the chat any further. But even if it did, it'd go through the same code path and exit again. We need to ensure that persistWorkspaceContext commits a marker file even if it cannot reach the agent.

@linear-code

linear-code Bot commented Jun 18, 2026

Copy link
Copy Markdown

CODAGT-629

@hugodutka

Copy link
Copy Markdown
Contributor Author

/coder-agents-review

@coder-agents-review

coder-agents-review Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Chat: Review posted | View chat
Requested: 2026-06-18 13:49 UTC by @hugodutka
Spend: $26.68 / $100.00

Review history
  • R1 (2026-06-18): 13 reviewers, 2 Nit, 1 Note, 1 P3, COMMENT. Review

deep-review v0.8.0 | Round 1 | a5a4c49..2bd76cd

Last posted: Round 1, 4 findings (1 P3, 2 Nit, 1 Note), COMMENT. Review

Finding inventory

Findings

# Sev Status Location Summary Round Reviewer Posted
CRF-1 P3 Open workspace_context_builder.go:57 Stale comment describes dead exit behavior R1 Leorio P3, Pariston Nit Yes
CRF-2 Nit Open chatd_test.go:13065 contextFileMarkersForAgent name misleads R1 Bisky, Gon, Meruem, Kite Yes
CRF-3 Nit Open generation.go:851 Missing doc comment on workspaceContextMarkerMessage R1 Leorio Yes
CRF-4 Note Open generation.go:833 Error conflation safe today; defensive guard unreachable R1 Pariston, Kite, Knov, Chopper, Meruem, Leorio Yes

Round log

Round 1

Panel: Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Chopper, Ging-Go, Gon, Leorio, Komugi, Meruem, Kite, Knov (wildcard). Netero: no findings. 1 P3, 2 Nit, 1 Note. Reviewed against a5a4c49..2bd76cd.

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer Focus
Bisky tests
Chopper ops/errors
Churn-guard change verification
Ging language modernization
Gon naming
Hisoka edge cases
Killua perf
Kite change integrity
Knov contracts
Knuckle SQL
Komugi flake/determinism
Kurapika security
Law decomposition
Leorio docs
Luffy product
Mafu-san process
Mafuuu contracts
Melody dispatch/pairing
Meruem structural
Nami frontend
Netero mechanical checks
Pariston premise testing
Pen-botter product gaps
Razor verification
Robin duplication
Ryosuke Go arch
Takumi concurrency
Zoro shape

🤖 Managed by Coder Agents.

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is well-diagnosed and proportional. The PR description traces the stuck-chat bug through production logs to the exact three-function call chain, and the solution reuses the existing empty-marker pattern from buildWorkspaceContext (lines 84-91) rather than inventing new machinery. Test coverage is strong at 71% test LOC with assertions on marker shape, model call count, system prompt content, and agent dial avoidance.

1 P3, 2 Nit, 1 Note.

"Boring code. No fight here. ♠" — Hisoka


coderd/x/chatd/workspace_context_builder.go:57

P3 [CRF-1] Stale comment describes dead behavior.

This comment says "fall back to no-op so the generation action exits without committing stale context." After this PR, the caller (persistWorkspaceContext) catches errWorkspaceContextUnavailable and commits a marker instead of exiting. Someone debugging a stuck chat reads this, forms the mental model "the generation action exits here," and looks in the wrong direction.

Suggested fix:

// If the workspace or agent is gone, report unavailable so the
// caller can decide how to proceed.

(Leorio P3, Pariston Nit)

🤖

🤖 This review was automatically generated with Coder Agents.

return matched
}

func contextFileMarkersForAgent(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-2] contextFileMarkersForAgent returns all context-file parts for the agent, not just markers.

The only difference from the adjacent contextFilePartsForAgent is the dropped ContextFileContent == "" filter. If a future test calls this expecting only empty-content sentinels while content-bearing parts also exist for the same agent, it will silently return both. The test compensates with require.Empty(t, markers[0].ContextFileContent), so correctness is fine today.

Consider allContextFilePartsForAgent, or a single function with a requireContent bool parameter.

(Bisky, Gon, Meruem, Kite)

🤖

})
}

func workspaceContextMarkerMessage(chat database.Chat, activeAPIKeyID string) (chatstate.Message, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-3] Missing doc comment on workspaceContextMarkerMessage.

Every neighboring function in this file has a doc comment (persistWorkspaceContext, beginGenerationAttempt, compactionModel). This function constructs the sentinel that unsticks the entire generation loop, the core mechanism of the fix. A reader in a year sees an undocumented function building a message with empty ContextFileContent and has to trace callers and consumers to understand why.

Suggested:

// workspaceContextMarkerMessage builds an empty context-file sentinel
// for the chat's selected agent. Committing this marker lets the
// generation loop proceed when the agent is unreachable.

(Leorio)

🤖

// agent missing, unreachable, etc.). Exit the action without
// committing so the next worker task can re-read the chat.
return errTaskExpectedExit
if err != nil || len(result.Messages) == 0 {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note [CRF-4] Two observations on this condition.

  1. Error conflation. The old code distinguished errWorkspaceContextUnavailable from other buildWorkspaceContext errors. The new err != nil catches all errors and commits a marker. Today all errors are non-transient (errWorkspaceContextUnavailable for agent gone, MarshalParts for programming bug), so this is correct. The errWorkspaceContextUnavailable sentinel identity is now vestigial from the caller's perspective (no errors.Is consumer), though it still serves documentation and log-string value. If buildWorkspaceContext later gains a transient error path, the marker would silently absorb it.

  2. Defensive guard. The len(result.Messages) == 0 branch with err == nil is unreachable. buildWorkspaceContext always returns at least one message on success. Harmless defense against a future builder change.

No action needed. Noting for future readers.

(Pariston, Kite, Knov, Chopper, Meruem, Leorio)

🤖

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant