fix: backfill legacy Bedrock AI provider rows and stale model config strings#26155
Conversation
| // Promote legacy type=anthropic rows that carry Bedrock settings. | ||
| // Must run after newAPI so options.Database is dbcrypt-wrapped. | ||
| coderd.BackfillBedrockProviderType(aibridgeInitCtx, options.Database, logger.Named("aibridge.backfill")) | ||
| // Fix stale provider strings on model configs after provider types are corrected. | ||
| // Both backfills share aibridgeInitCtx; if the first times out the second | ||
| // will be skipped and retried on the next startup. | ||
| coderd.BackfillChatModelConfigProviderStrings(aibridgeInitCtx, options.Database, logger.Named("aibridge.backfill")) |
There was a problem hiding this comment.
self-review: these should both be fairly quick, but I'm happy to bump aibridgeInitCtx if needed.
There was a problem hiding this comment.
Yeah I think we should be fine 👍
| //nolint:gocritic // Startup-only backfill; no user actor is present. | ||
| sysCtx := dbauthz.AsSystemRestricted(ctx) |
There was a problem hiding this comment.
self-review: I'm happy to change this to a different actor but this seems like one of those rare cases where AsSystemRestricted is actually defensible.
|
/coder-agents-review |
|
Chat: Review in progress | View chat deep-review v0.7.1 | Round 3 | Last posted: Round 3, 16 findings (1 P0, 3 P2, 4 P3, 8 Nit), APPROVE. Review Finding inventoryFinding InventoryFindings
Round logRound 2Panel (re-review). CRF-1 through CRF-12 all addressed. CRF-5 accepted. 1 P2, 1 P3, 2 Nit new. 1 dropped (CRF-15 subsumed by CRF-17). Reviewed against 75dbc57..8c37323. Round 3Netero-only (all findings resolved). CRF-16, CRF-17, CRF-19 fixed. CRF-18 accepted. Netero clean. Reviewed against 75dbc57..1f727d5. About deep-reviewCRF = Coder Review Finding (P0-P4, Nit, Note)
|
There was a problem hiding this comment.
Pull request overview
Adds idempotent startup-time database backfills to repair legacy “Bedrock stored as Anthropic” AI provider rows and to update stale chat_model_configs.provider strings so the UI and routing reflect the canonical provider type, without introducing a numbered migration that would break cross-branch upgrade paths.
Changes:
- Add startup backfills to (1) promote legacy
ai_providers.type=anthropicrows that contain Bedrock settings totype=bedrock, and (2) updatechat_model_configs.providerfrom"anthropic"→"bedrock"when linked providers were promoted. - Update
UpdateAIProviderSQL/query bindings so thetypecolumn is written on update, and adjust call sites/tests accordingly. - Add DB/sqlc plumbing (query, interface, authz wrapper, metrics wrapper, mocks) and comprehensive tests (including encrypted-settings coverage).
Reviewed changes
Copilot reviewed 13 out of 16 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| enterprise/dbcrypt/dbcrypt_internal_test.go | Updates AI provider update tests for new Type parameter. |
| enterprise/coderd/x/chatd/chatd_test.go | Updates test helper to pass provider Type in updates. |
| enterprise/coderd/ai_providers_backfill_test.go | Adds enterprise test ensuring backfill works with encrypted settings. |
| coderd/x/chatd/chatd_test.go | Updates test helper to pass provider Type in updates. |
| coderd/database/queries/chatmodelconfigs.sql | Adds BackfillChatModelConfigProvider query to fix stale provider strings. |
| coderd/database/queries/ai_providers.sql | Updates UpdateAIProvider to also write the type column. |
| coderd/database/queries.sql.go | Regenerates sqlc output for new query and updated UpdateAIProvider params. |
| coderd/database/querier.go | Extends sqlc querier interface to include the backfill query (returns sql.Result). |
| coderd/database/dbmock/dbmock.go | Adds gomock method for BackfillChatModelConfigProvider. |
| coderd/database/dbmetrics/querymetrics.go | Adds query metrics wrapper for the new backfill query. |
| coderd/database/dbauthz/dbauthz.go | Adds authz wrapper for the new backfill query. |
| coderd/database/dbauthz/dbauthz_test.go | Adds method accounting test coverage for the new authz wrapper. |
| coderd/ai_providers.go | Ensures API update path supplies Type to UpdateAIProvider. |
| coderd/ai_providers_backfill.go | Implements the two startup backfill functions. |
| coderd/ai_providers_backfill_test.go | Adds DB-backed and failure-path tests for both backfills. |
| cli/server.go | Wires both backfills into server startup after newAPI() / dbcrypt wrapping. |
Files not reviewed (3)
- coderd/database/dbmetrics/querymetrics.go: Language not supported
- coderd/database/dbmock/dbmock.go: Language not supported
- coderd/database/querier.go: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
dannykopping
left a comment
There was a problem hiding this comment.
LGTM modulo a few minor comments
| // Promote legacy type=anthropic rows that carry Bedrock settings. | ||
| // Must run after newAPI so options.Database is dbcrypt-wrapped. | ||
| coderd.BackfillBedrockProviderType(aibridgeInitCtx, options.Database, logger.Named("aibridge.backfill")) | ||
| // Fix stale provider strings on model configs after provider types are corrected. | ||
| // Both backfills share aibridgeInitCtx; if the first times out the second | ||
| // will be skipped and retried on the next startup. | ||
| coderd.BackfillChatModelConfigProviderStrings(aibridgeInitCtx, options.Database, logger.Named("aibridge.backfill")) |
There was a problem hiding this comment.
Yeah I think we should be fine 👍
| UPDATE | ||
| ai_providers | ||
| SET | ||
| type = @type::ai_provider_type, |
There was a problem hiding this comment.
This does present an interesting/awkward semantic whereby types could be changed imperatively by operators later, but this would need API/UI changes so for now I think it's fine.
There was a problem hiding this comment.
The backfill logic is well-structured: idempotent, resilient to partial failure, correctly ordered, and tested at high density (73.9%). The SQL query for model config backfill is atomic and properly constrained. The enterprise dbcrypt test proves the encrypted round-trip works. Test coverage of edge cases (deleted rows, disabled rows, unparsable settings, non-Bedrock Anthropic) is thorough.
Severity breakdown: 1 P0, 1 P2, 3 P3, 6 Nit.
The P0 blocks merge: SeedAIProvidersFromEnv still assigns type=anthropic for legacy Bedrock configs (line 330 of ai_providers_migrate.go). After the backfill promotes the row to type=bedrock, the next startup's seed sees a hash mismatch and the server refuses to start. The seed and the backfill fight over the same row's type, and the seed's drift detection treats the backfill's fix as corruption.
"A 20-line doc comment that narrates chatd's routing internals and the frontend's icon selection is a love letter that arrived in the wrong mailbox." (Gon)
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
All 12 R1 findings addressed: 11 fixed, 1 accepted (CRF-5). The P0 fix is solid; the seed now writes type=bedrock directly for legacy Bedrock configs and normalizes existing rows before hash comparison. The test fixes, comment trimming, and success logging are all clean.
Severity breakdown (R2): 1 P2, 1 P3, 2 Nit.
The P2 is a test coverage gap in the CRF-1 fix itself: SeedIsIdempotentAfterBedrockBackfill does not exercise the canonical-type normalization code that prevents a startup crash on upgrade. The test starts from a fresh DB where the seed creates type=bedrock directly, so the normalization branch (ai_providers_migrate.go:129-135) is never reached. Four reviewers independently found this.
Pariston notes a simplification opportunity: if the backfill ran before the seed (instead of after), the normalization code at lines 129-135 would be unnecessary. On a fresh deployment the backfill would be a no-op; on a legacy deployment it would promote before the seed checks hashes. Not a defect, but worth considering.
"Coverage data confirms: the normalization branch at ai_providers_migrate.go:133-135 is executed 0 times across all seed tests." (Pariston)
🤖 This review was automatically generated with Coder Agents.
dannykopping
left a comment
There was a problem hiding this comment.
LGTM, thanks a mil!
Discussed this with @dannykopping. While the idea is genuinely interesting, it mixes concerns and may not handle the .0 -> .N upgrade path correctly. |
|
/coder-agents-review |
There was a problem hiding this comment.
All findings resolved across three rounds (19 findings total: 14 fixed, 2 accepted, 3 dropped). Netero R3 and panel R3 are clean.
R1 caught a P0 (server crash on restart when seed and backfill fight over the type column) that was fixed promptly. R2 caught a P2 (the regression test for the P0 fix didn't exercise the normalization code it existed to protect) that was also fixed correctly: the test now inserts a pre-upgrade-state row directly rather than going through the updated seed. The author's responses throughout were substantive and thorough.
The backfill logic is solid: idempotent, resilient to partial failure, correctly ordered, and tested at high density (77.5%). The seed normalization handles the upgrade path. CI is green.
🤖 This review was automatically generated with Coder Agents.
|
Pending smoke-test before merge |
|
@dannykopping smoke-tested and no magic smoke escaped ✅ |
Bedrock AI providers could disappear from the Agents page unless an Anthropic provider was also configured, because legacy deployments stored Bedrock providers as
type=anthropicwith Bedrock settings in the settings blob. The frontend also showed Anthropic styling and icons for Bedrock-backed model configs because the storedproviderstring was"anthropic".A numbered database migration cannot safely fix this: release/2.34 is at migration 509 while main is at 516, so inserting a migration on both branches creates a hole that breaks the upgrade path from 2.34 to 2.35.
Instead, two idempotent startup backfills run after
newAPI()(which applies the dbcrypt wrapper so encrypted settings are readable):BackfillBedrockProviderType: promotesai_providersrows fromtype=anthropicwith Bedrock settings totype=bedrock.BackfillChatModelConfigProviderStrings: fixes stalechat_model_configs.provider = "anthropic"strings on rows whose linked provider was just promoted.The
UpdateAIProviderquery now also writes thetypecolumn, so the fix persists on any subsequent PATCH.Refs CODAGT-548