refactor: load AI providers from the database at startup#25672
Conversation
Docs preview📖 View docs preview for |
df99b3e to
0702534
Compare
This stack of pull requests is managed by Graphite. Learn more about stacking. |
0702534 to
5a884de
Compare
|
/coder-agents-review |
|
Review posted | Chat Review historydeep-review v0.5.0 | Round 2 | Last posted: Round 2, 19 findings (1 P2, 11 P3, 7 Nit), COMMENT. Review Finding inventoryFindings
Contested and acknowledged(none) Round logRound 1Panel. 1 P2, 8 P3, 8 Nit/Note. Reviewed against 32ed9f1..5a884de. Netero + 16-reviewer panel (Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Ging-Go, Gon, Leorio, Ryosuke, Kurapika, Kite, Chopper, Meruem, Robin, Knov, Razor). Round 2Panel. CRF-6 re-raised (author claimed fix but test not added). 2 new P3s. 14 R1 fixes verified. CI failing (test-go-pg). Reviewed against 32ed9f1..3ec6f29. Netero + 9-reviewer panel (Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Chopper, Kite, Kurapika, Meruem). About deep-reviewCRF = Coder Review Finding (P0-P4, Nit, Note)
|
There was a problem hiding this comment.
Clean refactor. The two-phase boot (seed env to DB, then build from DB) is the right architecture: it unifies the source of truth, lets API-created providers participate, and paves the way for runtime refresh. The decomposition into buildAIProviderFromRow, buildAIProviderKeyPool, bedrockConfigFromRow, and circuitBreakerConfig keeps the switch tight. The enterprise test rewrite to pure unit scope is a genuine improvement. Kurapika confirmed the RBAC grant is minimal and the provider-name regex prevents directory traversal in the dump path.
1 P2, 8 P3, 8 Nit. The P2 is a behavioral regression: keyless OpenAI/Anthropic providers (BYOK-only) are now rejected at startup where they previously worked. The P3s cluster around test coverage gaps for the new error-handling contract, a misleading RBAC comment, and a few interaction-level concerns.
"The daemon starts, appears healthy, and every user request fails. The operator has to notice individual error log lines or wait for user complaints." (Mafuuu, on what happens when all providers fail construction)
coderd/ai_providers_migrate.go:294
P3 [CRF-14] When both CODER_AI_GATEWAY_ANTHROPIC_KEY and CODER_AIBRIDGE_BEDROCK_* are set, the hasLegacyBedrock branch takes precedence and dp.Keys is never populated. The Anthropic API key is silently dropped.
Before this PR, the old BuildProviders carried both the keypool (from the Anthropic key) and the Bedrock config to NewAnthropicProvider. Now the seed decides what reaches the DB, and the key is lost. The indexed provider path handles the same situation correctly at line 376-382 by logging "ignoring bearer keys configured on Bedrock AI provider".
An admin upgrading from a dual-credential deployment gets no log explaining why their Anthropic key is now unused. Could we add a logger.Warn when hasLegacyBedrock && hasAnthropicKey, mirroring the indexed path? (Razor)
🤖
🤖 This review was automatically generated with Coder Agents.
Addresses all review findings: - CRF-1: Use filepath.Join instead of path.Join for filesystem paths - CRF-2: Remove no-op empty string concatenation in test - CRF-3: Use dumpBase constant instead of hardcoded path in test - CRF-4: Gate BYOK key requirement on cfg.AllowBYOK.Value() so keyless OpenAI/Anthropic providers work when BYOK is enabled - CRF-5: Trim RBAC comment to only mention startup loading (pubsub refresh is implemented upstack) - CRF-7: Add Model and SmallFastModel assertions to BedrockSettingsPresent test - CRF-8: Add logger.Warn when all enabled providers fail to build - CRF-9: Remove dead nil guard (buildAIProviderFromRow never returns nil, nil) - CRF-10: Restore IsBedrockConfigured guard in bedrockConfigFromRow - CRF-11: Use GetAIProviderKeysByProviderIDs to load only enabled provider keys - CRF-12: Wrap GetAIProviders and key query in a read-only transaction - CRF-13: Rename logBase to dumpBase in test - CRF-14: Log warning when legacy Anthropic key is dropped due to Bedrock - CRF-15: Trim redundant doc comment first sentences - CRF-16: Use ptr.NilToEmpty instead of manual nil-pointer dereference - CRF-17: Fix log message to use gerund form (skipping misconfigured ai provider)
|
/coder-agents-review |
Addresses all review findings: - CRF-1: Use filepath.Join instead of path.Join for filesystem paths - CRF-2: Remove no-op empty string concatenation in test - CRF-3: Use dumpBase constant instead of hardcoded path in test - CRF-4: Gate BYOK key requirement on cfg.AllowBYOK.Value() so keyless OpenAI/Anthropic providers work when BYOK is enabled - CRF-5: Trim RBAC comment to only mention startup loading (pubsub refresh is implemented upstack) - CRF-7: Add Model and SmallFastModel assertions to BedrockSettingsPresent test - CRF-8: Add logger.Warn when all enabled providers fail to build - CRF-9: Remove dead nil guard (buildAIProviderFromRow never returns nil, nil) - CRF-10: Restore IsBedrockConfigured guard in bedrockConfigFromRow - CRF-11: Use GetAIProviderKeysByProviderIDs to load only enabled provider keys - CRF-12: Wrap GetAIProviders and key query in a read-only transaction - CRF-13: Rename logBase to dumpBase in test - CRF-14: Log warning when legacy Anthropic key is dropped due to Bedrock - CRF-15: Trim redundant doc comment first sentences - CRF-16: Use ptr.NilToEmpty instead of manual nil-pointer dereference - CRF-17: Fix log message to use gerund form (skipping misconfigured ai provider)
3ec6f29 to
1f8832a
Compare
There was a problem hiding this comment.
R2 re-review. 16 of 17 R1 findings verified fixed; excellent fix commit with itemized tracking. 3 open findings remain (1 re-raised, 2 new). All P3.
The fix commit is thorough: CRF-4 gates on AllowBYOK, CRF-12 wraps queries in a read-only transaction, CRF-10 restores the IsBedrockConfigured guard, and the enterprise tests were correctly decoupled from the DB-backed builder. The order-independent map-based test lookups are a genuine improvement over index-based assertions.
CRF-6 was claimed fixed but the skip-and-continue path in BuildProviders still has no direct test. The UnknownType test exercises the seed-level skip, not the build-time skip. 4 of 9 reviewers independently verified this.
"The commit message is honest about what it fixes; the inventory status is inaccurate." (Mafu-san, on CRF-6)
CI is failing on test-go-pg across all platforms.
🤖 This review was automatically generated with Coder Agents.
- CRF-6: Add direct BuildProviders skip-and-continue test coverage for corrupt settings, unsupported types, and mixed bad/good rows. - CRF-18: Scope OpenAI keypool err locally so an unrelated outer err cannot surface as "openai key pool: ..." in BYOK paths. - CRF-19: Gate bedrockConfigFromRow on AIProviderBedrockSettings.IsConfigured instead of IsBedrockConfigured(row.BaseUrl, ...). The row's generic BaseUrl is always non-empty and so cannot serve as a Bedrock signal; add a BedrockSettingsEmpty test for the non-nil-but-empty case. - Drop the now-meaningless APIDumpDir() assertions and unused dumpBase config in IndexedOnly and CopilotVariants. Providers receive the base dump directory directly; the per-provider subdirectory is appended downstream by apidump.NewBridgeMiddleware.
Documentation CheckUpdates Needed
New Documentation Needed
Automated review via Coder Agents |
Replace the env-based BuildProviders with a DB-backed loader. The database is now the single source of truth for runtime provider configuration; env config arrives via SeedAIProvidersFromEnv (run at boot) and BuildProviders reads it back as [aibridge.Provider] instances. cli/server.go and enterprise/cli/server.go both call the same path, so aibridged and aibridgeproxyd see the same provider set. Move the per-provider DumpDir to a top-level CODER_AI_GATEWAY_DUMP_DIR base. Each provider's effective dump path is <base>/<provider name>, keeping the debug feature alive without adding a dump_dir column to ai_providers. dbauthz.AsAIBridged gains ResourceAIProvider:Read so the loader can read the live provider snapshot under a narrowly scoped subject. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses all review findings: - CRF-1: Use filepath.Join instead of path.Join for filesystem paths - CRF-2: Remove no-op empty string concatenation in test - CRF-3: Use dumpBase constant instead of hardcoded path in test - CRF-4: Gate BYOK key requirement on cfg.AllowBYOK.Value() so keyless OpenAI/Anthropic providers work when BYOK is enabled - CRF-5: Trim RBAC comment to only mention startup loading (pubsub refresh is implemented upstack) - CRF-7: Add Model and SmallFastModel assertions to BedrockSettingsPresent test - CRF-8: Add logger.Warn when all enabled providers fail to build - CRF-9: Remove dead nil guard (buildAIProviderFromRow never returns nil, nil) - CRF-10: Restore IsBedrockConfigured guard in bedrockConfigFromRow - CRF-11: Use GetAIProviderKeysByProviderIDs to load only enabled provider keys - CRF-12: Wrap GetAIProviders and key query in a read-only transaction - CRF-13: Rename logBase to dumpBase in test - CRF-14: Log warning when legacy Anthropic key is dropped due to Bedrock - CRF-15: Trim redundant doc comment first sentences - CRF-16: Use ptr.NilToEmpty instead of manual nil-pointer dereference - CRF-17: Fix log message to use gerund form (skipping misconfigured ai provider)
- CRF-6: Add direct BuildProviders skip-and-continue test coverage for corrupt settings, unsupported types, and mixed bad/good rows. - CRF-18: Scope OpenAI keypool err locally so an unrelated outer err cannot surface as "openai key pool: ..." in BYOK paths. - CRF-19: Gate bedrockConfigFromRow on AIProviderBedrockSettings.IsConfigured instead of IsBedrockConfigured(row.BaseUrl, ...). The row's generic BaseUrl is always non-empty and so cannot serve as a Bedrock signal; add a BedrockSettingsEmpty test for the non-nil-but-empty case. - Drop the now-meaningless APIDumpDir() assertions and unused dumpBase config in IndexedOnly and CopilotVariants. Providers receive the base dump directory directly; the per-provider subdirectory is appended downstream by apidump.NewBridgeMiddleware.
50ed5f8 to
10ef9c7
Compare
Merge activity
|

Replace the env-based
BuildProviderswith a DB-backed loader. The database is now the single source of truth for runtime provider configuration; env config arrives viaSeedAIProvidersFromEnv(run at boot) andBuildProvidersreads it back asaibridge.Providerinstances.cli/server.goandenterprise/cli/server.goboth call the same path, so aibridged and aibridgeproxyd see the same provider set.Per-provider
DumpDiris replaced by a top-levelCODER_AI_GATEWAY_DUMP_DIRbase; each provider's effective dump path is<base>/<provider name>.