Skip to content

fix(coderd/x/chatd): drop foreign provider-executed tools on model switch#26555

Draft
johnstcn wants to merge 6 commits into
mainfrom
cian/codagt-471-sanitize-provider-switch-tool-history
Draft

fix(coderd/x/chatd): drop foreign provider-executed tools on model switch#26555
johnstcn wants to merge 6 commits into
mainfrom
cian/codagt-471-sanitize-provider-switch-tool-history

Conversation

@johnstcn

Copy link
Copy Markdown
Member

Switching model configs mid-chat must be safe, but starting on Anthropic (which provider-executes web_search) and switching to Bedrock currently fails: Bedrock shares the Anthropic wire format yet rejects web_search_tool_result blocks with HTTP 400 (CODAGT-471). The same class of failure applies to any provider-executed tool replayed to a provider that did not produce it.

This drops provider-executed tool history (calls and results) from assistant rows whose producing model-config provider differs from the target turn's provider, before the prompt is built. Same-provider history is left untouched, so normal web_search replay is unaffected.

Approach

  • Equality is on the resolved upstream provider of the producing model config (ModelConfigID -> AIProvider.Type via resolveModelConfigAndNormalizedProvider), not on model.Provider() or any fantasy wire signal. This is deliberate: once requests route through aibridged, fantasyConfigForAIBridge serializes both Anthropic- and Bedrock-typed providers as the anthropic fantasy provider, so model.Provider() and the ProviderOptions wire key both collapse the Anthropic-vs-Bedrock distinction this bug depends on. The model-config provider (derived from AIProvider.Type) keeps them distinct and survives removal of the direct route.
  • Sanitization runs at the database.ChatMessage row level in prepareGeneration, before ConvertMessagesWithFiles, where ModelConfigID is unambiguous (no fragile promptRows <-> prompt index alignment).
  • Foreign provider-executed results are dropped, not text-ified. The assistant's own text summary of a search is a normal text part and is preserved.
  • Unknown origin (unresolvable ModelConfigID) fails closed (strip).

Testing

  • New unit tests for the pure stripForeignProviderExecutedToolRows: same-provider no-op, anthropic->bedrock drop, foreign-only row dropped, multi-provider keep-native/strip-foreign, non-provider-executed untouched, empty target no-op, unknown-origin fail-closed.
  • go test ./coderd/x/chatd/ -run TestStripForeignProviderExecutedToolRows, go build ./coderd/x/chatd/..., go vet, gofmt all clean.

Non-goals / follow-ups

  • Long-term home: once everything routes through aibridged, the most natural place for this is inside aibridged (it owns provider translation and sees the real upstream). Filed as a follow-up; this chatd fix works on both routes today and survives removal of the direct route.
  • Bedrock-internal protocol switch (same bedrock provider hosting Anthropic vs OpenAI underneath) is not distinguished; provider-executed web_search originates from direct Anthropic/OpenAI today.
  • Computer-use turns reassign the provider after resolveChatModel; this uses the primary turn's modelConfig.ID.
  • OpenAI previous_response_id chain-mode invalidation on provider switch is a separate concern.
Implementation plan and decision log

The design was grilled through several iterations before implementation:

  1. Started from the abandoned draft fix(coderd/x/chatd/chatsanitize): sanitize web_search history for non-Anthropic providers #25593 (binary target != "anthropic" strip, wired into now-stale chatd.go sites; the reload_messages site no longer exists on main).
  2. Reframed the trigger to provider equality (a provider-executed block is only valid for the provider that produced it).
  3. Considered provenance via ProviderOptions wire-key sniffing (Option A). Rejected once the aibridge direction surfaced: the wire key collapses Anthropic and Bedrock to anthropic, so it cannot distinguish them.
  4. Settled on ModelConfigID -> AIProvider.Type provenance (authoritative, aibridge-safe) and row-level stripping before conversion (Option C, no index-alignment fragility).
  5. Ran an over-engineering pass: dropped the cheap ModelConfigID gate, the per-ToolCallID grouping, a duplicate logger/stats, and (per review) the result text-ification path in favor of dropping foreign results outright.

Single call site (generation_preparer.go); chatloop pre-request and reload paths untouched.


This pull request was created by Coder Agents on behalf of @johnstcn.

@linear-code

linear-code Bot commented Jun 19, 2026

Copy link
Copy Markdown

CODAGT-471

@johnstcn

Copy link
Copy Markdown
Member Author

/coder-agents-review

@coder-agents-review

coder-agents-review Bot commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Chat: Review posted | View chat
Requested: 2026-06-19 20:45 UTC by @johnstcn
Spend: $71.62 / $100.00

Review history
  • R1 (2026-06-19): 13 reviewers, 3 Nit, 1 Note, 3 P3, 1 P4, COMMENT. Review
  • R2 (2026-06-19), 3 Nit, 1 Note, 3 P3, 1 P4, COMMENT. Review
  • R3 (2026-06-19): 9 reviewers, 3 Nit, 1 Note, 6 P3, 1 P4, COMMENT. Review
  • R4 (2026-06-19): 10 reviewers, 3 Nit, 1 Note, 6 P3, 1 P4, APPROVE. Review

deep-review v0.8.0 | Round 4 | d5ec26b..c46d4ad

Last posted: Round 4, 11 findings (6 P3, 1 P4, 3 Nit, 1 Note), APPROVE. Review

Finding inventory

Findings

# Sev Status Location Summary Round Reviewer Posted
CRF-1 P3 Author fixed (e339f11) provider_switch_sanitize.go:118 Silent return when target resolution fails; no log R1 Mafuuu P3, Chopper P3 Yes
CRF-2 P3 Author contested; panel closed R3 (5/5 accept) provider_switch_sanitize.go:85 Dropped PE-only rows can produce consecutive same-role messages; implicit coupling to downstream sanitizer R1 Hisoka Yes
CRF-3 P3 Author fixed (e339f11) provider_switch_sanitize.go:111 Wrapper name drops "Executed" from inner function name, implying broader stripping R1 Gon Yes
CRF-4 P4 Author fixed (e339f11) provider_switch_sanitize.go:52 Comment claims PE blocks only on assistant rows; legacy tool-role PE rows exist R1 Meruem Yes
CRF-5 Note Dropped by orchestrator (pre-existing pattern, acknowledged in PR non-goals) generation_preparer.go:223 Computer-use provider override after sanitization R1 Knov No
CRF-6 Nit Author contested; panel closed R3 (4/4 accept) generation_preparer.go:223 Commit subject 83 chars, over 72-char convention R1 Leorio Yes
CRF-7 Nit Author contested; panel closed R3 (4/4 accept) generation_preparer.go:223 Commit body empty; decision context only in PR description R1 Leorio Yes
CRF-8 Nit Author fixed (e339f11) provider_switch_sanitize_test.go:78 cfgByProvider name inverts mapping direction R1 Gon Yes
CRF-9 Note Dropped by orchestrator (Gon-Leorio disagreement; style preference) provider_switch_sanitize.go:23 Comment verbosity pattern (5 instances) R1 Gon P2, Leorio contradicts No
CRF-10 Note Author accepted R3 (deferred without ticket; gap tolerable now, per-origin breakdown can be added when multi-provider histories are common) provider_switch_sanitize.go:135 Log missing foreign origin provider names R1 Chopper Yes
CRF-11 P3 Author fixed (c46d4ad) provider_switch_sanitize_internal_test.go:169 Parse-error fallback path at lines 64-67 has zero test coverage R3 Bisky Yes
CRF-12 P3 Author contested; panel closed R4 (7/7 accept) provider_switch_sanitize.go:136 Origin resolution inherits enabled checks; disabled same-provider configs falsely treated as foreign R3 Mafuuu Yes
CRF-13 P3 Author fixed (c46d4ad) provider_switch_sanitize.go:137 Origin resolution errors silently discarded; asymmetry with target path logging R3 Chopper Yes
CRF-14 Nit Dropped by orchestrator (Gon-Leorio persistent disagreement on comment style) provider_switch_sanitize.go:108 Wrapper doc comment restates identifiers with zero informational category R3 Gon P2→Nit No
CRF-15 Nit Dropped by orchestrator (weakest instance of comment pattern) provider_switch_sanitize.go:14 Stats type doc comment restates type name R3 Gon P2→Nit No
CRF-16 Nit Dropped by orchestrator (Leorio explicitly praised this comment in R3) generation_preparer.go:218 Call-site comment partially restates function name R3 Gon Nit No

Contested and acknowledged

CRF-2 (P3, provider_switch_sanitize.go:85) - Consecutive same-role messages after row drop

  • Finding: Dropping a foreign-PE-only assistant row can produce consecutive same-role user messages. Requested a comment documenting the coupling.
  • Author defense: SanitizeAnthropicProviderToolHistory already produces consecutive same-role turns. Anthropic concatenates rather than rejecting.
  • Panel closure (R3, 5/5): Hisoka, Mafuuu, Pariston, Chopper, Razor confirmed both routes handle this. PE-only row trigger extremely unlikely.

CRF-6 (Nit, commit) - Commit subject over 72 chars

  • Finding: 83 chars.
  • Author defense: Squash-merge; PR title shortened.
  • Panel closure (R3, 4/4): Defense holds.

CRF-7 (Nit, commit) - Commit body empty

  • Finding: Empty body.
  • Author defense: PR description becomes squash body.
  • Panel closure (R3, 4/4): Defense holds.

CRF-10 (Note, provider_switch_sanitize.go:135) - Log missing foreign origin providers

  • Finding: Log omits stripped foreign origins.
  • Author accepted: Gap tolerable now, can add when multi-provider common. No ticket.

CRF-12 (P3, provider_switch_sanitize.go:136) - Disabled config false-positive stripping

  • Finding: resolveModelConfigAndNormalizedProvider returns sql.ErrNoRows for disabled configs. When an admin rotates configs within the same provider, disabled same-provider configs are falsely treated as foreign and their PE blocks are stripped.
  • Author defense: Fail-closed is intentional. Resolving disabled configs requires ignoring the Enabled gate, a special-case that risks leaking disabled configs into other callers. Failure mode is bounded (text summary survives). Alternative is fail-open which reintroduces HTTP 400. Observability improved via CRF-13.
  • Panel closure (R4, 7/7): Hisoka, Mafuuu, Pariston, Chopper, Meruem, Kite, Mafu-san accepted. Fail-closed with observability is the correct engineering choice. The trigger is narrow (admin config rotation during active PE chat), the consequence is bounded (text summary survives), and the alternative adds a special-case path that risks fail-open replay.

Round log

Round 1

Panel. 3 P3, 1 P4, 3 Nit, 1 Note posted (8 inline). 1 Note, 1 Note dropped. Reviewed against d5ec26b..ea1d6c9.

Round 2

BLOCKED. All 8 open findings silent. Reviewed against d5ec26b..80a6a93.

Round 3

Panel. CRF-1,3,4,8 fixed (verified). CRF-2,6,7 contested and closed by panel. CRF-10 acknowledged. 3 P3 new. 3 Nit dropped. Reviewed against d5ec26b..e339f11.

Round 4

Panel. CRF-11,13 fixed (verified). CRF-12 contested and closed by panel (7/7 accept). No new findings. All findings resolved. Reviewed against d5ec26b..c46d4ad.

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer Focus
Bisky tests
Chopper ops/errors
Churn-guard change verification
Ging language modernization
Gon naming
Hisoka edge cases
Killua perf
Kite change integrity
Knov contracts
Knuckle SQL
Komugi flake/determinism
Kurapika security
Law decomposition
Leorio docs
Luffy product
Mafu-san process
Mafuuu contracts
Melody dispatch/pairing
Meruem structural
Nami frontend
Netero mechanical checks
Pariston premise testing
Pen-botter product gaps
Razor verification
Robin duplication
Ryosuke Go arch
Takumi concurrency
Zoro shape

🤖 Managed by Coder Agents.

@datadog-coder

This comment has been minimized.

@johnstcn

Copy link
Copy Markdown
Member Author

/coder-agents-review

@johnstcn

Copy link
Copy Markdown
Member Author

/coder-agents-review

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-scoped fix. The two-layer sanitization design (row-level before conversion, message-level after), provenance via ModelConfigID -> AIProvider.Type, and the fail-closed/fail-open asymmetry are all defensible. Test suite is honest: 7 subtests with real marshal/parse round-tripping, no mocking, and each proves a distinct behavior.

3 P3, 1 P4, 3 Nit, 1 Note.

"I tried to build a case against this and could not." (Pariston)

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/x/chatd/provider_switch_sanitize.go
Comment thread coderd/x/chatd/provider_switch_sanitize.go
Comment thread coderd/x/chatd/provider_switch_sanitize.go Outdated
Comment thread coderd/x/chatd/provider_switch_sanitize.go Outdated
Comment thread coderd/x/chatd/generation_preparer.go Outdated
Comment thread coderd/x/chatd/generation_preparer.go Outdated
Comment thread coderd/x/chatd/provider_switch_sanitize_internal_test.go
Comment thread coderd/x/chatd/provider_switch_sanitize.go

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 2 blocked. 8 open findings from round 1 have no author response and no code fix.

Three fixup commits landed (import formatting, spelling correction, test file rename to _internal_test.go, helper simplification) but none address review findings.

Unaddressed findings:

  • CRF-1 (P3) Silent return when target resolution fails; no log
  • CRF-2 (P3) Dropped PE-only rows can produce consecutive same-role messages; implicit coupling to downstream sanitizer
  • CRF-3 (P3) Wrapper name drops "Executed" from inner function name
  • CRF-4 (P4) Comment claims PE blocks only on assistant rows; legacy tool-role PE rows exist
  • CRF-6 (Nit) Commit subject 83 chars, over 72-char convention
  • CRF-7 (Nit) Commit body empty
  • CRF-8 (Nit) cfgByProvider name inverts mapping direction
  • CRF-10 (Note) Log missing foreign origin provider names

Further review is blocked until the author responds to existing threads (fix, acknowledge, or contest) or pushes fixes.

🤖 This review was automatically generated with Coder Agents.

@johnstcn johnstcn changed the title fix(coderd/x/chatd): drop foreign provider-executed tool history on provider switch fix(coderd/x/chatd): drop foreign provider-executed tools on model switch Jun 19, 2026
@johnstcn

Copy link
Copy Markdown
Member Author

/coder-agents-review

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 3. CRF-1, CRF-3, CRF-4, CRF-8 verified fixed. CRF-2, CRF-6, CRF-7 contested and closed by panel (defenses hold). CRF-10 acknowledged by author without ticket.

3 new P3.

"Every sentence in the comment restates identifiers visible from the method signature and body." (Gon, on the wrapper doc comment, which 4 other reviewers praised)

CRF-10 (log missing foreign origin providers): author accepted the gap but filed no ticket. This needs a human decision: file a ticket or explicitly accept the gap as permanent.

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/x/chatd/provider_switch_sanitize.go
Comment thread coderd/x/chatd/provider_switch_sanitize.go
Comment thread coderd/x/chatd/provider_switch_sanitize_internal_test.go
@johnstcn

Copy link
Copy Markdown
Member Author

/coder-agents-review

@coder-agents-review coder-agents-review Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 4. CRF-11 and CRF-13 verified fixed. CRF-12 contested and closed by panel (7/7 accept: fail-closed with observability is the correct engineering choice for this edge case).

All 16 findings across 4 rounds resolved: 8 fixed, 4 contested and panel-closed, 1 acknowledged, 3 dropped.

No new findings. Clean to merge.

🤖 This review was automatically generated with Coder Agents.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant