feat!: seed ai_providers from env on server startup#24895
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
6f5778a to
2ce6bfd
Compare
44db962 to
ec333ba
Compare
2ce6bfd to
66e791f
Compare
ec333ba to
87d51ac
Compare
7270b26 to
1fe121a
Compare
87d51ac to
847aded
Compare
1fe121a to
7ff75d3
Compare
847aded to
8cb5a08
Compare
8cb5a08 to
eb0f556
Compare
7ff75d3 to
ae1c48e
Compare
eb0f556 to
99f7de6
Compare
06a86cd to
75af16f
Compare
9085404 to
190628c
Compare
35194e7 to
893ed76
Compare
2502c3c to
9ac5fc7
Compare
893ed76 to
783cd4d
Compare
|
/coder-agents-review |
There was a problem hiding this comment.
Round 2 is blocked. All three P0 findings from round 1 remain unaddressed (no author response in threads, despite new commits being pushed).
Unaddressed findings:
- DEREM-1 (P0):
encodeBedrockSettingsundefined, package does not compile - DEREM-2 (P0):
*stringvsstringtype mismatch on AccessKey/AccessKeySecret (6 sites) - DEREM-3 (P0):
database.AIProviderKeynot inAuditableunion
Further review is blocked until the author responds to or resolves each finding thread. The panel will review once these are addressed.
🤖 This review was automatically generated with Coder Agents.
286f036 to
00e8b40
Compare
be49026 to
d148e00
Compare
|
/coder-agents-review |
There was a problem hiding this comment.
Round 3 is blocked (same as round 2). All three P0 compilation findings from round 1 appear fixed in the current code, but the author resolved each thread without a substantive response describing the fix.
The churn guard confirmed the fixes:
- DEREM-1: Now uses
encodeAIProviderSettings(codersdk.AIProviderSettings{Bedrock: dp.Bedrock})atcoderd/ai_providers_migrate.go:90 - DEREM-2: Now uses
ptr.Ref(key)/ptr.Ref(secret)for pointer assignments - DEREM-3:
AIProviderKeyadded toAuditableunion via #25534
To unblock: reply in each finding thread with a brief description of how the finding was addressed (e.g. "Fixed in commit X by doing Y"). The panel review will proceed on the next round.
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
The architecture is well-structured: advisory-lock serialization, canonical hashing for drift detection, soft-delete stickiness, and comprehensive test coverage (14 subcases against real Postgres). The separation of env parsing from DB reconciliation is clean, and the test for LegacyAnthropicKeyOnlyIgnoresBedrockModelDefaults using real deployment defaults is particularly strong.
The R1 compilation errors (DEREM-1 through DEREM-3) are all correctly fixed.
1 P1, 4 P2, 2 P3, 3 Nit.
The P1 is a migration data loss: Bedrock BaseURL (CODER_AIBRIDGE_BEDROCK_BASE_URL) is parsed and validated but never stored. Operators with custom VPC/FIPS endpoints lose their endpoint override when the runtime switches to DB-sourced providers. Five reviewers converged on this independently.
Process note: the original commit shipped 300+ lines of Go without running go build, producing three independent compilation failures. The agent should build before pushing.
"An operator sets CODER_AIBRIDGE_PROVIDER_0_TYPE=copilot. The seed code hits the default case and drops it. No error, no warning. The operator sees a clean startup and assumes the provider is seeded." (Chopper)
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Round 5 is blocked. Code fixes were pushed for 8 of 10 findings (commit bf14b1b), but 9 findings have no thread response. The churn guard confirmed the fixes target the right code, but the process requires a brief reply in each thread (like the "Fixed." replies on DEREM-1/2/3 that unblocked round 4).
Churn guard confirmed code changes for:
- DEREM-4 (P1): Now stores Bedrock BaseURL; new
IsBedrockConfiguredunifies detection - DEREM-7 (P2):
logger.Warnadded for unknown provider type skip - DEREM-8 (P2): Error message reworded to remove misattribution
- DEREM-9 (P3): Audit log assertion added
- DEREM-10 (P3): Comment rewritten
- DEREM-11 (Nit): Comments trimmed
- DEREM-12 (Nit): Changed to
dbtime.Now() - DEREM-13 (Nit): Changed to
slices.Sorted(maps.Keys())
Findings needing author response:
- DEREM-4, 7, 8, 9, 10, 11, 12, 13: Reply "Fixed." in each thread
- DEREM-5 (contested): Author defense received ("table will be empty"). Panel will evaluate.
- DEREM-6 (deferred): "Leave this for now" needs a linked ticket, or an explicit acceptance that false-positive drift on upgrade is acceptable
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
Round 6 is blocked. DEREM-5 is now addressed (author replied "Fixed.", code confirmed). The remaining 9 findings have confirmed code fixes from R5 but still lack thread replies.
To unblock, reply "Fixed." in each of these threads:
- DEREM-4 (P1): Bedrock BaseURL
- DEREM-7 (P2): Unknown type log
- DEREM-8 (P2): Drift error message
- DEREM-9 (P3): Audit test assertion
- DEREM-10 (P3): Hash comment
- DEREM-11 (Nit): Comment trim
- DEREM-12 (Nit): dbtime.Now()
- DEREM-13 (Nit): slices.Sorted
DEREM-6 (P2, model defaults drift): "Leave this for now" needs either a linked ticket or an explicit human decision that false-positive drift on upgrade is acceptable. This needs a human decision.
🤖 This review was automatically generated with Coder Agents.
|
/coder-agents-review |
There was a problem hiding this comment.
All 10 findings from round 4 are correctly addressed. The IsBedrockConfigured unification is well done: 4 callsites now use the same detection logic, eliminating the three-definition divergence that caused DEREM-4. The DEREM-6 fix (excluding Model/SmallFastModel from the canonical hash) is the right call, and the comment at line 196 clearly explains why.
1 P2, 5 P3, 2 Nit.
The P2 is a transactional correctness issue: audit.BackgroundAudit is called inside db.InTx, so if a later insert fails and the transaction rolls back, the audit entries for earlier inserts persist as phantom records. The CRUD handler in ai_providers.go separates these correctly.
"If provider A inserts and audits, then provider B's insert fails, the transaction rolls back: provider A's row disappears but its audit entry persists. The audit log records a 'create' for a resource that was never committed." (Meruem)
🤖 This review was automatically generated with Coder Agents.
johnstcn
left a comment
There was a problem hiding this comment.
This change has enough potential cases for failure that I think warrants:
- Calling out as breaking
- Adding docs on how this works and what to do if it complains
Reconciles CODER_AIBRIDGE_PROVIDER_<N>_* (and the legacy single-provider env vars) with the ai_providers / ai_provider_keys tables at server startup. Runs on the AGPL startup codepath unconditionally so operators can seed providers via env without enabling the bridge or proxy features. Concurrent server starts are serialized via a Postgres advisory lock; conflicts between env and DB fail startup with a clear error. Soft-deleted rows are not resurrected; existing keys are not duplicated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…conflict handling Bedrock detection now lives on AIProviderBedrockSettings.IsConfigured() so the legacy and indexed paths share one rule, and both paths skip model-defaults that would otherwise force every deployment to look like Bedrock. Credentials drop out of the canonical hash so operators can rotate them via the API without restart, and an Anthropic provider can no longer carry both a bearer key and Bedrock settings (the CLI rejects indexed configs upfront; legacy env vars are validated alongside). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add codersdk.IsBedrockConfigured as the single canonical "is Bedrock configured?" predicate, used by the seed, runtime config builder, and legacy validator. - Add codersdk.NewAIProviderBedrockSettings to promote credential strings into the pointer-typed fields without per-call boilerplate. - Preserve Bedrock BaseURL (LegacyBedrock.BaseURL / BedrockBaseURL) when seeding so custom VPC, FIPS, or proxy endpoints survive. - Warn instead of silently skipping indexed providers with unsupported types (e.g. copilot). - Reword the drift error to stop blaming the operator and use dbtime.Now() for key timestamps. - slices.Sorted(maps.Keys(out)) over sort.Strings. - Assert audit entries are emitted on the initial seed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merge activity
|

Disclaimer: implemented by a Coder Agent using Claude Opus 4.7
Part of the implementation of RFC: Common AI Provider Configs (AIGOV-201).
Note
This change can cause a previously working installation to fail to start should a conflict exist between the providers configured in the environment & those now migrated to the database.
I'll raise a PR upstack to document this process and workarounds should a startup fail.
What this PR does
Reconciles environment-derived AI provider configuration with the
ai_providerstable at server startup. The seed runs before the aibridged daemon is initialized, so the runtime always reads providers from the database; the legacyCODER_AIBRIDGE_*environment variables become a one-shot migration source.Behavior
LockIDAIProvidersEnvSeed).copilot, until the DB enum is widened) are skipped with a log entry rather than failing startup.Canonical hashing
The
canonicalAIProvidershape captures exactly the fields that determine runtime behavior —type,base_url, and the Bedrock subset of settings (access key, access key secret, region, model, small fast model) — and is hashed with SHA-256. The hash is computed on demand from the row + env, never persisted, so the database does not need a new column for it. API keys live in the separateai_provider_keystable and are intentionally excluded from the hash so operators can rotate keys via the API without forcing a server restart.Decision log
iotaslot incoderd/database/lock.gorather thanGenLockIDso it's stable, easy to audit, and matches the convention used for every other startup lock.ai_providers_changedpubsub channel from the seed because the seed completes before any subscriber is started; the follow-up PR introduces that channel.