Skip to content

refactor: load AI providers from the database at startup#25672

Merged
dannykopping merged 5 commits into
mainfrom
dk/aibridge-providers-db-load
May 26, 2026
Merged

refactor: load AI providers from the database at startup#25672
dannykopping merged 5 commits into
mainfrom
dk/aibridge-providers-db-load

Conversation

@dannykopping
Copy link
Copy Markdown
Contributor

@dannykopping dannykopping commented May 26, 2026

Replace the env-based BuildProviders with a DB-backed loader. The database is now the single source of truth for runtime provider configuration; env config arrives via SeedAIProvidersFromEnv (run at boot) and BuildProviders reads it back as aibridge.Provider instances. cli/server.go and enterprise/cli/server.go both call the same path, so aibridged and aibridgeproxyd see the same provider set.

Per-provider DumpDir is replaced by a top-level CODER_AI_GATEWAY_DUMP_DIR base; each provider's effective dump path is <base>/<provider name>.

@github-actions
Copy link
Copy Markdown

Docs preview

📖 View docs preview for docs/reference/api/general.md

Copy link
Copy Markdown
Contributor Author

dannykopping commented May 26, 2026

@dannykopping dannykopping force-pushed the dk/aibridge-providers-db-load branch from 0702534 to 5a884de Compare May 26, 2026 09:14
Copy link
Copy Markdown
Contributor Author

/coder-agents-review

@coder-agents-review
Copy link
Copy Markdown
Contributor

coder-agents-review Bot commented May 26, 2026

Review posted | Chat
Requested: 2026-05-26 10:49 UTC by @dannykopping
Spend: $78.79 / $100.00

Review history
  • R1 (2026-05-26): 16 reviewers, 7 Nit, 1 P2, 9 P3, COMMENT. Review
  • R2 (2026-05-26): 9 reviewers, 7 Nit, 1 P2, 11 P3, COMMENT. Review

deep-review v0.5.0 | Round 2 | 32ed9f1..3ec6f29

Last posted: Round 2, 19 findings (1 P2, 11 P3, 7 Nit), COMMENT. Review

Finding inventory

Findings

# Sev Status Location Summary Round Reviewer Posted
CRF-1 P3 Author fixed (3ec6f29) cli/aibridged.go:121 path.Join used for filesystem path; should be filepath.Join R1 Netero Yes
CRF-2 Nit Author fixed (3ec6f29) cli/aibridged_internal_test.go:221 No-op ""+ string concatenation R1 Netero Yes
CRF-3 Nit Author fixed (3ec6f29) cli/aibridged_internal_test.go:202 Hardcoded /tmp/aibridge-dump instead of logBase constant R1 Netero Yes
CRF-4 P2 Author fixed (3ec6f29) cli/aibridged.go:126 BYOK-only OpenAI/Anthropic providers silently dropped at startup R1 Hisoka P2, Meruem P2, Kite Note Yes
CRF-5 P3 Author fixed (3ec6f29) coderd/database/dbauthz/dbauthz.go:630 RBAC comment claims non-existent pubsub-driven provider refresh R1 Gon, Leorio, Meruem, Ryosuke, Razor Yes
CRF-6 P3 Open (re-raised R2; author claimed fix but no test added) cli/aibridged_internal_test.go:29 No test exercises BuildProviders skip-and-continue path directly R1 Bisky P3, Kite P3, Chopper P3 Yes
CRF-7 P3 Author fixed (3ec6f29) cli/aibridged_internal_test.go:264 BedrockSettingsPresent test missing Model/SmallFastModel assertions R1 Bisky Yes
CRF-8 P3 Author fixed (3ec6f29) cli/aibridged.go:100 All-providers-fail returns success (empty, nil); no aggregate diagnostic R1 Mafuuu P3, Knov P3 Yes
CRF-9 Nit Author fixed (3ec6f29) cli/aibridged.go:95 Dead prov == nil guard; buildAIProviderFromRow never returns (nil, nil) R1 Ryosuke, Kite, Chopper, Meruem, Knov, Pariston, Mafuuu, Gon, Mafu-san, Razor Yes
CRF-10 P3 Author fixed (3ec6f29) cli/aibridged.go:196 bedrockConfigFromRow drops IsBedrockConfigured guard from old code R1 Hisoka Yes
CRF-11 P3 Author fixed (3ec6f29) cli/aibridged.go:74 GetAIProviderKeys loads plaintext keys for disabled providers R1 Meruem P3, Ryosuke Note Yes
CRF-12 P3 Author fixed (3ec6f29) cli/aibridged.go:66 GetAIProviders and GetAIProviderKeys run without a transaction R1 Ryosuke Yes
CRF-13 Nit Author fixed (3ec6f29) cli/aibridged_internal_test.go:65 logBase names a dump directory, not a log directory R1 Gon Yes
CRF-14 P3 Author fixed (3ec6f29) coderd/ai_providers_migrate.go:294 Legacy seed drops Anthropic API key when Bedrock also configured R1 Razor Yes
CRF-15 Nit Author fixed (3ec6f29) cli/aibridged.go:182 Doc comments could be tighter (four instances restate signatures) R1 Gon Yes
CRF-16 Nit Author fixed (3ec6f29) cli/aibridged.go:200 Manual nil-pointer dereference; ptr.NilToEmpty exists R1 Robin Yes
CRF-17 Nit Author fixed (3ec6f29) cli/aibridged.go:87 Log message "skip ai provider" breaks gerund convention R1 Leorio Yes
CRF-18 P3 Open cli/aibridged.go:152 OpenAI keypool err reuses outer err; Anthropic correctly scopes it R2 Hisoka P3, Chopper P3, Meruem P3, Mafuuu Nit, Pariston Nit, Kite Nit Yes
CRF-19 P3 Open cli/aibridged.go:223 IsBedrockConfigured guard is vacuous for DB-loaded rows (BaseUrl always non-empty) R2 Mafuuu Yes

Contested and acknowledged

(none)

Round log

Round 1

Panel. 1 P2, 8 P3, 8 Nit/Note. Reviewed against 32ed9f1..5a884de. Netero + 16-reviewer panel (Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Ging-Go, Gon, Leorio, Ryosuke, Kurapika, Kite, Chopper, Meruem, Robin, Knov, Razor).

Round 2

Panel. CRF-6 re-raised (author claimed fix but test not added). 2 new P3s. 14 R1 fixes verified. CI failing (test-go-pg). Reviewed against 32ed9f1..3ec6f29. Netero + 9-reviewer panel (Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Chopper, Kite, Kurapika, Meruem).

About deep-review

CRF = Coder Review Finding (P0-P4, Nit, Note)

Reviewer Focus
Bisky tests
Chopper ops/errors
Churn-guard change verification
Ging language modernization
Gon naming
Hisoka edge cases
Killua perf
Kite change integrity
Knov contracts
Knuckle SQL
Kurapika security
Leorio docs
Luffy product
Mafu-san process
Mafuuu contracts
Melody dispatch/pairing
Meruem structural
Nami frontend
Netero mechanical checks
Pariston premise testing
Pen-botter product gaps
Razor verification
Robin duplication
Ryosuke Go arch
Takumi concurrency
Zoro shape

🤖 Managed by Coder Agents.

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean refactor. The two-phase boot (seed env to DB, then build from DB) is the right architecture: it unifies the source of truth, lets API-created providers participate, and paves the way for runtime refresh. The decomposition into buildAIProviderFromRow, buildAIProviderKeyPool, bedrockConfigFromRow, and circuitBreakerConfig keeps the switch tight. The enterprise test rewrite to pure unit scope is a genuine improvement. Kurapika confirmed the RBAC grant is minimal and the provider-name regex prevents directory traversal in the dump path.

1 P2, 8 P3, 8 Nit. The P2 is a behavioral regression: keyless OpenAI/Anthropic providers (BYOK-only) are now rejected at startup where they previously worked. The P3s cluster around test coverage gaps for the new error-handling contract, a misleading RBAC comment, and a few interaction-level concerns.

"The daemon starts, appears healthy, and every user request fails. The operator has to notice individual error log lines or wait for user complaints." (Mafuuu, on what happens when all providers fail construction)


coderd/ai_providers_migrate.go:294

P3 [CRF-14] When both CODER_AI_GATEWAY_ANTHROPIC_KEY and CODER_AIBRIDGE_BEDROCK_* are set, the hasLegacyBedrock branch takes precedence and dp.Keys is never populated. The Anthropic API key is silently dropped.

Before this PR, the old BuildProviders carried both the keypool (from the Anthropic key) and the Bedrock config to NewAnthropicProvider. Now the seed decides what reaches the DB, and the key is lost. The indexed provider path handles the same situation correctly at line 376-382 by logging "ignoring bearer keys configured on Bedrock AI provider".

An admin upgrading from a dual-credential deployment gets no log explaining why their Anthropic key is now unused. Could we add a logger.Warn when hasLegacyBedrock && hasAnthropicKey, mirroring the indexed path? (Razor)

🤖

🤖 This review was automatically generated with Coder Agents.

Comment thread cli/aibridged.go Outdated
Comment thread coderd/database/dbauthz/dbauthz.go Outdated
Comment thread cli/aibridged_internal_test.go
Comment thread cli/aibridged_internal_test.go
Comment thread cli/aibridged.go
Comment thread cli/aibridged_internal_test.go Outdated
Comment thread cli/aibridged_internal_test.go Outdated
Comment thread cli/aibridged.go Outdated
Comment thread cli/aibridged.go Outdated
Comment thread cli/aibridged.go Outdated
dannykopping added a commit that referenced this pull request May 26, 2026
Addresses all review findings:

- CRF-1: Use filepath.Join instead of path.Join for filesystem paths
- CRF-2: Remove no-op empty string concatenation in test
- CRF-3: Use dumpBase constant instead of hardcoded path in test
- CRF-4: Gate BYOK key requirement on cfg.AllowBYOK.Value() so keyless
  OpenAI/Anthropic providers work when BYOK is enabled
- CRF-5: Trim RBAC comment to only mention startup loading (pubsub
  refresh is implemented upstack)
- CRF-7: Add Model and SmallFastModel assertions to BedrockSettingsPresent test
- CRF-8: Add logger.Warn when all enabled providers fail to build
- CRF-9: Remove dead nil guard (buildAIProviderFromRow never returns nil, nil)
- CRF-10: Restore IsBedrockConfigured guard in bedrockConfigFromRow
- CRF-11: Use GetAIProviderKeysByProviderIDs to load only enabled provider keys
- CRF-12: Wrap GetAIProviders and key query in a read-only transaction
- CRF-13: Rename logBase to dumpBase in test
- CRF-14: Log warning when legacy Anthropic key is dropped due to Bedrock
- CRF-15: Trim redundant doc comment first sentences
- CRF-16: Use ptr.NilToEmpty instead of manual nil-pointer dereference
- CRF-17: Fix log message to use gerund form (skipping misconfigured ai provider)
Copy link
Copy Markdown
Contributor Author

/coder-agents-review

dannykopping added a commit that referenced this pull request May 26, 2026
Addresses all review findings:

- CRF-1: Use filepath.Join instead of path.Join for filesystem paths
- CRF-2: Remove no-op empty string concatenation in test
- CRF-3: Use dumpBase constant instead of hardcoded path in test
- CRF-4: Gate BYOK key requirement on cfg.AllowBYOK.Value() so keyless
  OpenAI/Anthropic providers work when BYOK is enabled
- CRF-5: Trim RBAC comment to only mention startup loading (pubsub
  refresh is implemented upstack)
- CRF-7: Add Model and SmallFastModel assertions to BedrockSettingsPresent test
- CRF-8: Add logger.Warn when all enabled providers fail to build
- CRF-9: Remove dead nil guard (buildAIProviderFromRow never returns nil, nil)
- CRF-10: Restore IsBedrockConfigured guard in bedrockConfigFromRow
- CRF-11: Use GetAIProviderKeysByProviderIDs to load only enabled provider keys
- CRF-12: Wrap GetAIProviders and key query in a read-only transaction
- CRF-13: Rename logBase to dumpBase in test
- CRF-14: Log warning when legacy Anthropic key is dropped due to Bedrock
- CRF-15: Trim redundant doc comment first sentences
- CRF-16: Use ptr.NilToEmpty instead of manual nil-pointer dereference
- CRF-17: Fix log message to use gerund form (skipping misconfigured ai provider)
@dannykopping dannykopping force-pushed the dk/aibridge-providers-db-load branch from 3ec6f29 to 1f8832a Compare May 26, 2026 11:40
Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R2 re-review. 16 of 17 R1 findings verified fixed; excellent fix commit with itemized tracking. 3 open findings remain (1 re-raised, 2 new). All P3.

The fix commit is thorough: CRF-4 gates on AllowBYOK, CRF-12 wraps queries in a read-only transaction, CRF-10 restores the IsBedrockConfigured guard, and the enterprise tests were correctly decoupled from the DB-backed builder. The order-independent map-based test lookups are a genuine improvement over index-based assertions.

CRF-6 was claimed fixed but the skip-and-continue path in BuildProviders still has no direct test. The UnknownType test exercises the seed-level skip, not the build-time skip. 4 of 9 reviewers independently verified this.

"The commit message is honest about what it fixes; the inventory status is inaccurate." (Mafu-san, on CRF-6)

CI is failing on test-go-pg across all platforms.

🤖 This review was automatically generated with Coder Agents.

Comment thread cli/aibridged.go
Comment thread cli/aibridged_internal_test.go
Comment thread cli/aibridged.go Outdated
@dannykopping dannykopping marked this pull request as ready for review May 26, 2026 12:00
Comment thread cli/server_aibridge_internal_test.go
Comment thread enterprise/cli/server.go
dannykopping added a commit that referenced this pull request May 26, 2026
- CRF-6: Add direct BuildProviders skip-and-continue test coverage for
  corrupt settings, unsupported types, and mixed bad/good rows.
- CRF-18: Scope OpenAI keypool err locally so an unrelated outer err
  cannot surface as "openai key pool: ..." in BYOK paths.
- CRF-19: Gate bedrockConfigFromRow on AIProviderBedrockSettings.IsConfigured
  instead of IsBedrockConfigured(row.BaseUrl, ...). The row's generic
  BaseUrl is always non-empty and so cannot serve as a Bedrock signal;
  add a BedrockSettingsEmpty test for the non-nil-but-empty case.
- Drop the now-meaningless APIDumpDir() assertions and unused dumpBase
  config in IndexedOnly and CopilotVariants. Providers receive the base
  dump directory directly; the per-provider subdirectory is appended
  downstream by apidump.NewBridgeMiddleware.
Comment thread cli/testdata/coder_server_--help.golden
@dannykopping dannykopping requested a review from johnstcn May 26, 2026 12:34
@coder-tasks
Copy link
Copy Markdown
Contributor

coder-tasks Bot commented May 26, 2026

Documentation Check

Updates Needed

  • docs/ai-coder/ai-gateway/setup.md - Removed DUMP_DIR as a per-provider field and the per-provider warning

New Documentation Needed

  • docs/ai-coder/ai-gateway/setup.md - New "API Dumps" section documents:
    • Global --ai-gateway-dump-dir / CODER_AI_GATEWAY_DUMP_DIR configuration
    • Per-provider dump path structure: <base>/<provider_name>
    • Sensitive header redaction
    • Empty value disables dumping
    • Clear warning about diagnostic use and data sensitivity
  • docs/ai-coder/ai-gateway/setup.md - YAML configuration example added to the "API Dumps" section for consistency with other top-level settings

Automated review via Coder Agents

dannykopping and others added 5 commits May 26, 2026 15:39
Replace the env-based BuildProviders with a DB-backed loader. The
database is now the single source of truth for runtime provider
configuration; env config arrives via SeedAIProvidersFromEnv (run at
boot) and BuildProviders reads it back as [aibridge.Provider]
instances. cli/server.go and enterprise/cli/server.go both call the
same path, so aibridged and aibridgeproxyd see the same provider set.

Move the per-provider DumpDir to a top-level CODER_AI_GATEWAY_DUMP_DIR
base. Each provider's effective dump path is <base>/<provider name>,
keeping the debug feature alive without adding a dump_dir column to
ai_providers.

dbauthz.AsAIBridged gains ResourceAIProvider:Read so the loader can
read the live provider snapshot under a narrowly scoped subject.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses all review findings:

- CRF-1: Use filepath.Join instead of path.Join for filesystem paths
- CRF-2: Remove no-op empty string concatenation in test
- CRF-3: Use dumpBase constant instead of hardcoded path in test
- CRF-4: Gate BYOK key requirement on cfg.AllowBYOK.Value() so keyless
  OpenAI/Anthropic providers work when BYOK is enabled
- CRF-5: Trim RBAC comment to only mention startup loading (pubsub
  refresh is implemented upstack)
- CRF-7: Add Model and SmallFastModel assertions to BedrockSettingsPresent test
- CRF-8: Add logger.Warn when all enabled providers fail to build
- CRF-9: Remove dead nil guard (buildAIProviderFromRow never returns nil, nil)
- CRF-10: Restore IsBedrockConfigured guard in bedrockConfigFromRow
- CRF-11: Use GetAIProviderKeysByProviderIDs to load only enabled provider keys
- CRF-12: Wrap GetAIProviders and key query in a read-only transaction
- CRF-13: Rename logBase to dumpBase in test
- CRF-14: Log warning when legacy Anthropic key is dropped due to Bedrock
- CRF-15: Trim redundant doc comment first sentences
- CRF-16: Use ptr.NilToEmpty instead of manual nil-pointer dereference
- CRF-17: Fix log message to use gerund form (skipping misconfigured ai provider)
- CRF-6: Add direct BuildProviders skip-and-continue test coverage for
  corrupt settings, unsupported types, and mixed bad/good rows.
- CRF-18: Scope OpenAI keypool err locally so an unrelated outer err
  cannot surface as "openai key pool: ..." in BYOK paths.
- CRF-19: Gate bedrockConfigFromRow on AIProviderBedrockSettings.IsConfigured
  instead of IsBedrockConfigured(row.BaseUrl, ...). The row's generic
  BaseUrl is always non-empty and so cannot serve as a Bedrock signal;
  add a BedrockSettingsEmpty test for the non-nil-but-empty case.
- Drop the now-meaningless APIDumpDir() assertions and unused dumpBase
  config in IndexedOnly and CopilotVariants. Providers receive the base
  dump directory directly; the per-provider subdirectory is appended
  downstream by apidump.NewBridgeMiddleware.
@dannykopping dannykopping force-pushed the dk/aibridge-providers-db-load branch from 50ed5f8 to 10ef9c7 Compare May 26, 2026 13:40
@dannykopping dannykopping removed the request for review from jdomeracki-coder May 26, 2026 13:41
@dannykopping dannykopping merged commit 282ab7d into main May 26, 2026
49 of 53 checks passed
Copy link
Copy Markdown
Contributor Author

Merge activity

@dannykopping dannykopping deleted the dk/aibridge-providers-db-load branch May 26, 2026 13:57
@github-actions github-actions Bot locked and limited conversation to collaborators May 26, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants