Skip to content

feat: remove chat chain mode#26980

Draft
johnstcn wants to merge 5 commits into
mainfrom
cian/codagt-445-remove-chain-mode-entirely
Draft

feat: remove chat chain mode#26980
johnstcn wants to merge 5 commits into
mainfrom
cian/codagt-445-remove-chain-mode-entirely

Conversation

@johnstcn

@johnstcn johnstcn commented Jul 3, 2026

Copy link
Copy Markdown
Member

Removes OpenAI Responses "chain mode" from chatd. Chain mode sent previous_response_id on follow-up turns instead of replaying full history. The provider periodically loses chain state (ZDR, key rotation, item GC), each new error message format broke the chain-broken classifier, and the recovery path added complexity while still dead-ending in some scenarios. Every turn now replays full input.

Closes CODAGT-445.

Changes

  • Delete chatopenai/responses.go (chain detection, activation, prompt filtering, response ID extraction) and its tests.
  • Delete the ChainBroken classification in chaterror and the chatloop retry bookkeeping that disabled chain mode mid-generation.
  • Drop the chain_broken label from the coderd_chatd_stream_retries_total metric.
  • Stop reading and writing chat_messages.provider_response_id; delete the dead ClearChatMessageProviderResponseIDsByChatID query. The column stays on disk for rollback safety; dropping it is a follow-up migration.
  • Delete the never-invoked ReloadMessages and DisableChainMode chatloop hooks and stale chain-mode comments.

Not touched: the aibridge provider_response_id columns (unrelated feature), the Store provider option (still a valid OpenAI dashboard-logging knob), and the go.mod fantasy fork (carries 13 other changes; only its replay-safety item related to chain mode, and that logic still protects full replay).

Follow-ups

  • Migration to drop chat_messages.provider_response_id after a release cycle: CODAGT-748.
  • Product decision on keeping the Store option now that it has no in-product consumer.
  • Optional: simplify reference-replay logic in coder/fantasy now that no consumer sends previous_response_id.
Implementation plan

Executed in three phases on this branch:

  1. Red (test(coderd/x/chatd): drop chain mode test coverage): deleted chain-mode tests: chatopenai/responses_test.go, six TestActiveServer_ChainBroken* tests, TestClassify_ChainBroken*, TestAdvisorChainMode_SnapshotKeepsFullHistory, TestOpenAIResponsesChainMode*. chatd_chainmode_test.go was renamed to chatd_helpers_test.go because it housed shared helpers used by ~20 other test sites.
  2. Green (feat(coderd/x/chatd): remove chain mode): deleted chatopenai/responses.go entirely; the chain activation block in generation_preparer.go; ChainModeDisabled plumbing and the chain_broken log field in generation.go; ChainBroken and chainBrokenClassification in chaterror; the chain_broken metric label; PersistedStep.ProviderResponseID in chatloop; the PreviousResponseID reset in chatadvisor (Store=false behavior kept); ProviderResponseID from internal step structs, chatstate params, and the CreateChatMessages INSERT; the dead ClearChatMessageProviderResponseIDsByChatID query with its dbauthz wrapper; regenerated DB and metrics artifacts via make gen.
  3. Refactor (chore(coderd/x/chatd): drop dead ReloadMessages hook and stale chain comments): removed the orphaned ReloadMessages chatloop hook and reworded three stale chain-mode comments.

Deliberately skipped: shrinking the RecordStreamRetry signature (the ClassifiedError param encodes a real invariant), merging the chatadvisor clone/reset helpers (the defensive clone is cheap insurance), and any change to generation_preparer.go's promptRows fetch (verified still multi-purpose).

Verification: make build, make gen (no diff on re-run), make fmt, make lint, and go test ./coderd/x/chatd/... ./coderd/database/dbauthz/... all pass locally with postgres.


🤖 Generated by Coder Agents on behalf of @johnstcn.

@linear-code

linear-code Bot commented Jul 3, 2026

Copy link
Copy Markdown

CODAGT-445

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown

Docs preview

📖 View docs preview for docs/admin/integrations/prometheus.md

@johnstcn johnstcn changed the title feat(coderd/x/chatd): remove chain mode feat: remove chat chain mode Jul 3, 2026
@johnstcn

johnstcn commented Jul 3, 2026

Copy link
Copy Markdown
Member Author

/coder-agents-review

@coder-agents-review

coder-agents-review Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Chat: Review posted | View chat
Requested: 2026-07-03 16:49 UTC by @johnstcn
Spend: $38.96 / $100.00

Review history
  • R1 (2026-07-03): 15 reviewers, 2 Nit, 3 Note, 2 P3, COMMENT. Review

deep-review v0.9.0 | Round 1 | 121107f..e38f363

Last posted: Round 1, 7 findings (2 P3, 2 Nit, 3 Note), COMMENT. Review

Finding inventory

Findings

# Sev Status Location Summary Round Reviewer Posted
CRF-1 Nit Open chatd_helpers_test.go:161 Stale hardcoded chat title "chain mode test" in general-purpose helper R1 Netero, Chopper Yes
CRF-2 P3 Open chatd_test.go:12495 Doc comment first line truncated during adjacent test deletion R1 Hisoka Yes
CRF-3 P3 Open chatd.go:3787 buildSystemPrompt comment says "initial prompt assembly" but runs every turn R1 Razor Nit, Gon P2 Yes
CRF-4 Nit Open chatd_test.go:5899 require.True(strings.Contains) should be require.Contains for diagnostics R1 Leorio Yes
CRF-5 Note Open PR description Orphaned column has no drop-migration ticket R1 Knuckle Yes
CRF-6 Note Open chatloop/metrics.go:111 chain_broken label removal is metric contract change for operators R1 Chopper Yes
CRF-7 Note Open chatadvisor/runtime.go:64 Comment verbosity pattern: several rewritten comments lead with mechanism the code already shows R1 Gon P2 Yes
CRF-8 Nit Dropped by orchestrator (file comment on test helper is Go convention; filename is self-documenting) chatd_helpers_test.go:3 File comment restates filename R1 Gon No
CRF-9 P2 Dropped by orchestrator (comment is accurate per Leorio; verbosity covered by CRF-7 pattern note) chatadvisor/runtime.go:93 resetProviderOptionsForNestedCall doc opens with mechanism narration R1 Gon No
CRF-10 P2 Dropped by orchestrator (comment explains why, not just what; subsumed by CRF-7) chatadvisor/runner.go:46 Inline comment restates function calls below R1 Gon No
CRF-11 P2 Dropped by orchestrator (comment is accurate per Leorio; verbosity covered by CRF-7) context_prompt.go:20 AgentChatContextSentinelPath doc carries redundant phrasing R1 Gon No
CRF-12 P2 Dropped by orchestrator (comment is accurate per Leorio; verbosity covered by CRF-7) chatloop/chatloop.go:65 PersistedStep doc enumerates content types visible from type constraint R1 Gon No

Contested and acknowledged

(none)

Round log

Round 1

Panel (15 reviewers). 0 P0, 0 P1, 0 P2, 2 P3, 2 Nit, 3 Note posted. 5 dropped (comment verbosity subsumed by pattern note). Reviewed against 121107f..e38f363.

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer Focus
Bisky tests
Chopper ops/errors
Churn-guard change verification
Ging language modernization
Gon naming
Hisoka edge cases
Killua perf
Kite change integrity
Knov contracts
Knuckle SQL
Komugi flake/determinism
Kurapika security
Law decomposition
Leorio docs
Luffy product
Mafu-san process
Mafuuu contracts
Melody dispatch/pairing
Meruem structural
Nami frontend
Netero mechanical checks
Pariston premise testing
Pen-botter product gaps
Razor verification
Robin duplication
Ryosuke Go arch
Takumi concurrency
Zoro shape

🤖 Managed by Coder Agents.

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, thorough removal of chain mode. The deletion scope is well-chosen: every PreviousResponseID, ChainBroken, chain_broken, ChainModeDisabled, ReloadMessages, DisableChainMode, and PrepareMessages reference is gone from production code. The three dead chatloop hooks were correctly identified as never-wired. Tests pass. The follow-ups (column drop migration, Store option product decision, fantasy fork simplification) are concrete and properly scoped.

Severity count: 2 P3, 2 Nit, 3 Note. No blockers.

Note for operators: the chain_broken label removal from coderd_chatd_stream_retries_total is a metric contract change. Any Prometheus queries or Grafana dashboards referencing that label will need updating at deploy time.

Note: the PR description says "Executed in three phases" but the branch has four commits; the 4th commit removes the dead PrepareMessages hook and inlines single-use test helpers, which the Changes section doesn't mention. Minor description inconsistency.

"I tried to build a case against this and could not." (Pariston)

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/x/chatd/chatd_helpers_test.go Outdated
OrganizationID: orgID,
OwnerID: userID,
APIKeyID: testAPIKeyID(t, db, userID),
Title: "chain mode test",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-1] Stale hardcoded chat title "chain mode test" in a general-purpose helper.

createChatThroughServer is now used by ~20 tests across retry, compaction, provider-switch, and sanitization flows. The hardcoded title is a leftover from when this function lived in chatd_chainmode_test.go. Change to a generic title like "test chat" or accept a title parameter.

(Netero, Chopper)

🤖

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9fd43b2: title is now "test chat". Audited the diff for other stale chain-mode strings; this was the only one. 🤖 Coder Agents on behalf of @johnstcn

}

// TestProviderSwitchSanitizesAndRestoresPEToolHistory verifies the A→B→A
// provider-switch contract:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [CRF-2] Doc comment first line truncated during adjacent test deletion.

The deletion of TestAdvisorChainMode_SnapshotKeepsFullHistory above this comment cut the opening line:

// TestProviderSwitchSanitizesAndRestoresPEToolHistory verifies the A→B→A

The comment now starts mid-sentence with // provider-switch contract:, violating Go doc convention (doc comments on test functions should lead with the function name). Prepend the missing line.

(Hisoka)

🤖

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9fd43b2: restored the missing first line. Scanned all touched files for other name/doc mismatches; one more exists at chatd_test.go:11718 but it predates this PR (present on main), so left alone. 🤖 Coder Agents on behalf of @johnstcn

Comment thread coderd/x/chatd/chatd.go Outdated
Comment on lines +3786 to +3787
// buildSystemPrompt applies system-level prompt injections in the
// canonical order. It is used by both the initial prompt assembly
// and the ReloadMessages callback to keep them in sync.
// canonical order for the initial prompt assembly.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 [CRF-3] Comment says "for the initial prompt assembly" but buildSystemPrompt runs on every turn.

buildSystemPrompt is called from generation_preparer.go:344 inside prepareGeneration, which executes every turn, not only the first. The old comment ("used by both the initial prompt assembly and the ReloadMessages callback") was inaccurate in a different way; this replacement trades one inaccuracy for another.

Suggested fix: drop the scope qualifier.

// buildSystemPrompt applies system-level prompt injections in
// canonical order.

(Razor, Gon)

🤖

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9fd43b2: dropped the scope qualifier as suggested. The same audit found one more stale-rationale comment in chatadvisor/runner.go (claimed chatloop mutates options across calls; it no longer mutates provider options at all), also fixed. 🤖 Coder Agents on behalf of @johnstcn

Comment thread coderd/x/chatd/chatd_test.go Outdated
require.Len(t, generationRequests, 2)
recovered := generationRequests[1]
require.True(t, anthropicSystemHasEphemeralCacheControl(t, recovered))
require.True(t, strings.Contains(string(recovered.System), `"cache_control":{"type":"ephemeral"}`))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit [CRF-4] require.True(t, strings.Contains(...)) gives a useless failure message.

On failure, testify prints "Should be true" with no context about what the system prompt actually contained. Compare with line 6907 which correctly uses require.Contains. Use:

require.Contains(t, string(recovered.System), `"cache_control":{"type":"ephemeral"}`)

(Leorio)

🤖

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9fd43b2: now require.Contains. Only instance in the diff. 🤖 Coder Agents on behalf of @johnstcn

Name: "stream_retries_total",
Help: "Total LLM stream retries.",
}, []string{"provider", "model", "kind", "chain_broken"}),
}, []string{"provider", "model", "kind"}),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note [CRF-6] Removing the chain_broken label from coderd_chatd_stream_retries_total is a metric contract change. Any Prometheus queries, Grafana dashboards, or alerting rules referencing this label will break after deploy. Since chain mode is fully removed and the label would only ever read "false" going forward, the removal is correct.

(Chopper)

🤖

total_cost_micros,
runtime_ms,
provider_response_id
runtime_ms

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note [CRF-5] The orphaned chat_messages.provider_response_id column has no drop-migration ticket. The PR description lists it as a follow-up, but no linked ticket exists. The column is nullable TEXT with no index, no constraints, no active writers (after this PR), and no active readers in chatd. Cost of carrying it is near zero (one bit per row in the null bitmap), but worth a ticket so it doesn't get forgotten.

(Knuckle)

🤖

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed CODAGT-748 for the column drop and linked it in the PR description. 🤖 Coder Agents on behalf of @johnstcn

Comment thread coderd/x/chatd/chatadvisor/runtime.go Outdated
Comment on lines +64 to +71
// cloneProviderOptions returns a copy of opts with pointer entries for known,
// in-place mutated provider option types replaced by a shallow struct copy.
// chatloop mutates the OpenAI Responses entry (PreviousResponseID) on
// chain-mode exit, so sharing the pointer with the parent run would let an
// advisor call corrupt the parent's chain state. Value fields such as
// Metadata and Include are still shared with the parent; nothing in this
// package mutates them, but callers that need true deep-copy semantics must
// handle those fields explicitly.
// resetProviderOptionsForNestedCall mutates the OpenAI Responses entry
// (Store), so sharing the pointer with the parent run would let an advisor
// call corrupt the parent's options. Value fields such as Metadata and
// Include are still shared with the parent; nothing in this package mutates
// them, but callers that need true deep-copy semantics must handle those
// fields explicitly.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note [CRF-7] Several rewritten comments in this PR lead with mechanism narration the code already shows, burying the substantive content (the trap, the invariant, the why) deeper in the paragraph. This comment is a representative example: the first two lines restate what the function body does; the useful parts are the Store mutation risk and the shared-value-fields caveat.

The pattern also appears in resetProviderOptionsForNestedCall (runtime.go:93), runner.go:46, context_prompt.go:20, and chatloop.go:65. All comments are accurate (and the technical writing is noted as good), but leading with the category-bearing content would make them more concise per project style.

(Gon)

🤖

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 9fd43b2: restructured all five listed comments to lead with the invariant/trap (cloneProviderOptions, resetProviderOptionsForNestedCall, runner.go, AgentChatContextSentinelPath, PersistedStep). 🤖 Coder Agents on behalf of @johnstcn

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant