fix: decouple Bedrock AI providers#26035
Conversation
|
/coder-agents-review |
|
Chat: Review in progress | View chat deep-review v0.7.1 | Round 10 | Status: Panel review (4 reviewers) Last posted: Round 9, 19 findings (1 P0, 2 P1, 1 P2, 10 P3, 4 Nit, 1 Note), COMMENT. Review Finding inventoryFindings
Round logRound 1Netero-only. 1 P1, 1 Nit. Reviewed against 8a9580a..993f16c. Round 2Panel (22 reviewers). CRF-1, CRF-2 addressed. 1 P1, 1 P2, 4 P3, 1 Nit, 1 Note new. Reviewed against 8a9580a..fe6bc63. Round 3BLOCKED. CRF-3, CRF-6, CRF-7, CRF-9 addressed. CRF-4, CRF-5, CRF-8, CRF-10 silent. No review. Round 4Panel (14 reviewers). CRF-4, CRF-5, CRF-8, CRF-10 addressed. CRF-11 confirmed broken by 3 reviewers. 1 P3, 1 Nit new. Reviewed against 8a9580a..f82e103. Round 5Panel (3 reviewers). CRF-12, CRF-13 addressed. CRF-11 confirmed still broken by all 3 reviewers. 1 P3 new. Reviewed against c5631a8..7f32073. Round 6Panel (3 reviewers). CRF-11, CRF-14 addressed. Netero clean. 1 P3 new (test gap). Reviewed against c5631a8..45f12b7. Round 7Panel (3 reviewers). CRF-15 addressed. Netero clean. 1 P3, 1 Nit new. Reviewed against c5631a8..4b4c146. Round 8Panel (4 reviewers). CRF-16, CRF-17 addressed. Netero clean. No new findings. All 17 findings resolved. APPROVED. Reviewed against c5631a8..37b8ae5. Round 9Scope expanded (CODAGT-549 folded in from #26038). Effective LOC grew from 972 to 1369. Law: don't split (advisory). Netero clean. Panel (12 reviewers). 2 P3 new. Reviewed against 6b556ea..0c1bc27. Law analysisEffective LOC: 1369. Head SHA: 0c1bc27. Verdict: Don't split. Enforcement: Advisory. About deep-reviewCRF = Coder Review Finding (P0-P4, Nit, Note)
|
There was a problem hiding this comment.
First-pass review (Netero). This is a mechanical scan only; the full review panel has not yet reviewed this PR. The panel will review after these findings are addressed.
1 P1, 1 Nit.
The decoupling approach is sound: canonicalizing Bedrock at runtime avoids a migration while ensuring models stay visible regardless of Anthropic provider presence. The test coverage for the seeding and CRUD paths is thorough.
Netero says: "The mock is incomplete; the test file imports a transitive dependency it never declared."
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Panel review (Round 2). R1 findings (CRF-1, CRF-2) verified fixed.
The canonicalization design is sound: promoting legacy type=anthropic + bedrock settings to type=bedrock at every API boundary avoids a migration while fixing the visibility bug. The test coverage is thorough (47.5% test density) and the integration test BedrockOnlyProviderRemainsVisibleWithoutAnthropic directly exercises the fix. Melody walked 12 pairings across the Go/TS wire boundary and found all in agreement. Mafu-san confirmed every boundary is covered, proportional output, and intentional architectural constraints are visible.
1 P1, 1 P2, 4 P3, 1 Nit.
The P1 (seeder startup failure) was raised by 8 of 22 reviewers independently. The P2 (chatd bypasses canonicalization) was flagged by Ryosuke, who traced 6+ call sites in the chat runtime that read provider.Type raw.
Meruem: "Consequence: SeedAIProvidersFromEnv returns an error that blocks Coder startup. The operator must remove CODER_AIBRIDGE_ANTHROPIC_KEY to recover. No data loss, but unexpected downtime on what otherwise looks like a routine upgrade."
coderd/x/chatd/chatd.go:8663
P2 [CRF-4] chatd reads provider.Type raw without canonicalization at 6+ call sites.
aiProviderConfigFromKeys sets Provider: string(provider.Type) directly from the database row, bypassing the canonicalization this PR introduces. The same pattern appears at:
model_routing_direct.go:92(provider hint)model_routing_aibridge.go:245,292(route selection)subagent.go:163,509(provider matching)chatd.go:8767(provider filtering)
For legacy rows where type=anthropic with bedrock settings, chatd would use "anthropic" as the provider hint while model configs created post-PR store provider="bedrock". The API layer (exp_chats.go, db2sdk.go) canonicalizes correctly, but chatd is the primary chat runtime and misses it.
The seeder's hash-match-skip behavior means legacy rows retain type="anthropic" indefinitely unless manually updated via the API. Every new code path reading provider.Type must remember to canonicalize, which is a maintenance hazard this PR demonstrates.
Could
canonicalAIProviderTypeForRow(or its equivalent) be applied in chatd'saiProviderConfig,directProviderHintAndProviderForConfig, and the provider-type matching loops? Alternatively, if the seeder wrote the canonical type on match (a one-time self-healing write), chatd's raw reads would produce the correct value without needing canonicalization.
(Ryosuke)
🤖
coderd/ai_providers_migrate.go:152
P3 [CRF-5] Misleading error message names wrong provider when the bedrock fallback adopts a legacy row.
When the fallback at lines 106-117 adopts the old name="anthropic" row for the desired name="bedrock" provider, existing.Name is "anthropic" but dp.Name is "bedrock". If the hash comparison fails, the error says:
AI provider "bedrock" already exists in the database and differs
The operator searches for a provider named "bedrock" and finds nothing. The actual row is named "anthropic."
Could the error include both names? E.g.,
"AI provider %q (matched existing row %q) already exists..."
(Chopper)
🤖
coderd/exp_chats.go:7411
P3 [CRF-8] convertChatModelConfig passes through the raw config.Provider column without canonicalization.
Old chat_model_configs rows created for bedrock-via-anthropic providers store provider = "anthropic". New configs store provider = "bedrock". The admin model config listing returns whichever value the row carries, so the admin panel would show some models under "anthropic" and others under "bedrock" for the same underlying AI provider.
The runtime availability path (getUserChatProviderAvailability) canonicalizes correctly via modelProviderName, so chat routing is unaffected. The inconsistency is confined to the admin panel display and cost reporting breakdowns (convertChatCostModelBreakdown).
Could
convertChatModelConfigresolve the canonical provider type whenconfig.AIProviderIDis valid? This matches whatmodelProviderNamedoes ingetUserChatProviderAvailability.
(Knov P2, Meruem P3, Knuckle P3)
🤖
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Round 3: BLOCKED. 4 of 8 open findings were addressed in 857d679 (CRF-3, CRF-6, CRF-7, CRF-9). The remaining 4 have no author response or code change. Further review is blocked until each is addressed, acknowledged with rationale, or deferred with a ticket.
Unaddressed findings:
CRF-4 (P2) chatd reads provider.Type raw without canonicalization at 6+ call sites (chatd.go:8663, model_routing_direct.go:92, model_routing_aibridge.go:245,292, subagent.go:163,509, chatd.go:8767). Response needed: fix in this PR, file a GitHub issue for follow-up, or explain why the legacy type=anthropic value in chatd is safe.
CRF-5 (P3) Misleading error message names wrong provider when the bedrock fallback hits a hash mismatch (ai_providers_migrate.go:152). With CRF-3 addressed, this error path may be restructured. Response needed: confirm whether this path is still reachable and whether the message is now accurate, or fix the message.
CRF-8 (P3) convertChatModelConfig passes raw config.Provider without canonicalization (exp_chats.go:7404). Old model configs show provider="anthropic" in the admin panel for bedrock-backed models. Response needed: fix in this PR, file a GitHub issue, or explain why the admin display inconsistency is acceptable.
CRF-10 (Note) Duplicate bedrock API key rejection on create (ai_providers.go:167). The handler check preempts Validate(), making the structured validation path dead code for this condition. Response needed: acknowledge, or remove the duplication.
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Re-review (Round 4, 14 reviewers). All R2 findings now addressed. CRF-4 (chatd canonicalization), CRF-5 (error message), CRF-8 (convertChatModelConfig), CRF-10 (duplicate validation) verified fixed.
The fix commits are solid. chatd canonicalization covers all 6+ call sites. The seeder upgrade path handles the legacy row gracefully. The admin model config listing now resolves canonical types via providerNameByID. Test density is 48.8%.
1 P0, 1 P3, 1 Nit.
The P0 is a test that was added to address CRF-6 but deterministically fails. Confirmed by running it. Three reviewers independently flagged it.
Pariston: "The approach is sound. The canonicalization is conservative: it only converts anthropic to bedrock when settings.Bedrock != nil. There is no path where a legitimate Anthropic provider has settings.Bedrock != nil."
coderd/x/chatd/model_routing_aibridge.go:228
P3 [CRF-13] aiGatewayRequestFormatForProviderType receives raw provider.Type without canonicalization.
The CRF-4 fix canonicalized all provider-type reads in chatd's config and routing paths, but this function takes a database.AIProviderType directly from the switch at its call site. Currently safe because the switch handles both anthropic and bedrock in the same case branch, but if Bedrock ever needs distinct request-format handling, the raw type would route incorrectly.
The same pattern appears at fantasyConfigForAIBridge (line 134).
(Ryosuke P3, Razor P3)
🤖
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
|
/coder-agents-review set-spend-limit:250 |
ce9b0de to
7f32073
Compare
There was a problem hiding this comment.
Round 5 (3 reviewers). CRF-12 (duplicate helper) and CRF-13 (raw provider.Type in AI gateway) verified fixed.
CRF-11 remains broken (now 2 rounds). The fix attempt in 7f32073 changed the failure mode from wrong-error to no-error. All 3 reviewers confirmed independently. The structural cause is a round-trip gap: AIProviderSettings{}.MarshalJSON() returns null, which the server treats as "no settings change." The wire format has no representation for "clear settings."
1 P0 (CRF-11, 2nd round), 1 P3 (CRF-14, new).
12 of 14 findings resolved across 5 rounds.
Meruem: "The API has no wire representation for 'clear settings' vs. 'don't change settings.' Both serialize to the same null. This is the class of bug, not just this test."
coderd/exp_chats.go:7094
P3 [CRF-14] Update model config response returns stale provider type for legacy rows.
When an admin updates a legacy chat model config (stored with provider = "anthropic" but backed by a Bedrock AI provider) without changing AIProviderID, the handler sets provider = existing.Provider (this line), preserving the raw database value. The response from convertChatModelConfig(updated, nil) passes nil for the provider name map, so the response carries the stale "anthropic" value.
The listing endpoint returns "bedrock" for the same config because it builds a canonicalization map. Same config, different provider values depending on the endpoint.
Impact is cosmetic; chat routing canonicalizes independently.
Could the non-
req.AIProviderIDbranch look up the existing AI provider (whenexisting.AIProviderID.Valid) and canonicalize before settingupdateParams.Provider?
(Mafuuu P3, Meruem P3)
🤖
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Round 6 (3 reviewers, Netero clean). CRF-11 (P0, failing test) and CRF-14 (P3, stale update response) verified fixed. The custom MarshalJSON on UpdateAIProviderRequest correctly breaks the null/{} serialization ambiguity, and the update handler now canonicalizes legacy provider types.
14 of 15 findings resolved across 6 rounds. 1 new P3 (test gap).
Pariston: "I tried to build a case against this PR's premises and could not. The problem is correctly understood, the solution is proportional, and the fix is at the right causal level."
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Round 7 (3 reviewers, Netero clean). CRF-15 (legacy ListChatModels test gap) verified fixed in 4b4c146 with LegacyBedrockModelConfigUsesCanonicalProvider. Test density is now 50.7%.
15 of 17 findings resolved across 7 rounds. 2 new findings (1 P3 test gap, 1 Nit naming).
Pariston: "I tried to build a case against this change and could not. The solution is proportional. Every boundary that reads provider.Type now routes through CanonicalAIProviderType. I confirmed zero raw reads remain via grep."
Mafuuu: no findings. Contract fidelity confirmed.
The PR is in strong shape. The remaining P3 and Nit are test quality observations, not functional defects. All production code paths are canonicalized and tested.
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Round 8 (4 reviewers, Netero clean). CRF-16 (seeder cross-name error test) and CRF-17 (misleading test name) verified fixed. All 4 reviewers and Netero report no findings.
17 of 17 findings resolved across 8 rounds. Test density 50.7%.
The canonicalization design covers every boundary: Go SDK, db2sdk, coderd handlers, chatd routing, frontend provider mapping. The seeder handles legacy upgrades gracefully. The custom MarshalJSON on UpdateAIProviderRequest breaks the null/empty-object serialization ambiguity. Every canonicalization path is integration-tested with dbgen-inserted legacy data.
Pariston (R6): "I tried to build a case against this change and could not. The solution is proportional. Every boundary that reads provider.Type now routes through CanonicalAIProviderType. I confirmed zero raw reads remain via grep."
🤖 This review was automatically generated with Coder Agents.
# Conflicts: # coderd/exp_chats.go # coderd/exp_chats_test.go
|
/coder-agents-review |
There was a problem hiding this comment.
Round 9 (12 reviewers, Netero clean, Law: don't split). Scope expanded with CODAGT-549 folded in from #26038 (+403 new additions, 7 new files). All 17 prior findings remain resolved.
9 of 12 reviewers found no findings. The expanded scope (chatd canonical-type-first matching, Bedrock-as-Anthropic computer-use routing, aiProviderTypeCanSatisfyRequest) is well-integrated. Melody traced 14 pairings across the wire boundary and confirmed all agree. Law analyzed 10 concerns with a linear dependency chain and concluded the PR delivers one reviewable idea.
2 new P3s, both individual (no convergence). 17 of 19 findings resolved.
Pariston: "I tried to build a case against this change and could not. The CODAGT-549 additions are a necessary consequence of the type change. chatd's canonical-type-first matching with raw fallback handles the upgrade window correctly."
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
|
@ibetitsmike ⛔ This review has reached its per-chat spend limit ($287.00 / $250.00). Further review rounds are paused. To raise the limit and continue, comment: This is a per-chat budget, separate from any account-level usage limit.
|
|
/coder-agents-review set-spend-limit:300 |
|
OK fine, @ibetitsmike, $300.00 it is... but I'm telling finance about this. 💸
|
|
/coder-agents-review |
Bedrock AI providers could disappear from Agents unless an Anthropic provider was also configured, because legacy Bedrock rows were treated as Anthropic at provider boundaries while model configs referenced Bedrock.
This keeps new Bedrock writes native to
type=bedrock, canonicalizes legacy Anthropic-shaped Bedrock rows at provider boundaries, splits legacy Anthropic and Bedrock env seeding, and routes Bedrock creation from the simplified Agents provider form to the full AI settings flow. This intentionally avoids database migrations.This also folds in CODAGT-549 from #26038. Linked Bedrock Agents model configs now serialize and render with Bedrock identity, chatd matches providers by canonical type first with raw fallback for safe routing, and Bedrock can satisfy Anthropic computer-use routing when it uses the Anthropic-compatible runtime schema.
Supersedes #26038.
Refs CODAGT-548
Refs CODAGT-549