Skip to content

feat!: seed ai_providers from env on server startup#24895

Merged
dannykopping merged 3 commits into
mainfrom
dk/aibridge-providers-env-migration
May 22, 2026
Merged

feat!: seed ai_providers from env on server startup#24895
dannykopping merged 3 commits into
mainfrom
dk/aibridge-providers-env-migration

Conversation

@dannykopping
Copy link
Copy Markdown
Contributor

@dannykopping dannykopping commented May 1, 2026

Disclaimer: implemented by a Coder Agent using Claude Opus 4.7

Part of the implementation of RFC: Common AI Provider Configs (AIGOV-201).

Note

This change can cause a previously working installation to fail to start should a conflict exist between the providers configured in the environment & those now migrated to the database.

I'll raise a PR upstack to document this process and workarounds should a startup fail.

What this PR does

Reconciles environment-derived AI provider configuration with the ai_providers table at server startup. The seed runs before the aibridged daemon is initialized, so the runtime always reads providers from the database; the legacy CODER_AIBRIDGE_* environment variables become a one-shot migration source.

Behavior

  • Concurrent server starts are serialized through a Postgres advisory lock (LockIDAIProvidersEnvSeed).
  • Missing rows are inserted with an audit entry attributed to the system actor.
  • Existing rows whose canonical hash matches the env-derived hash are left alone (the common no-op restart path).
  • Existing rows whose canonical hash does not match cause server startup to fail with a descriptive error so the operator can explicitly resolve the conflict in either env or DB.
  • Soft-deleted rows are NOT resurrected from env; an explicit operator deletion is sticky across restarts.
  • Indexed providers whose name conflicts with a legacy env var fail startup with a clear remediation message.
  • Unknown provider types (e.g. copilot, until the DB enum is widened) are skipped with a log entry rather than failing startup.

Canonical hashing

The canonicalAIProvider shape captures exactly the fields that determine runtime behavior — type, base_url, and the Bedrock subset of settings (access key, access key secret, region, model, small fast model) — and is hashed with SHA-256. The hash is computed on demand from the row + env, never persisted, so the database does not need a new column for it. API keys live in the separate ai_provider_keys table and are intentionally excluded from the hash so operators can rotate keys via the API without forcing a server restart.

Decision log
  • The hash is intentionally not persisted in the database. The RFC discussed this trade-off; computing on demand keeps the schema minimal and lets the canonical shape evolve without a migration.
  • The lock uses an iota slot in coderd/database/lock.go rather than GenLockID so it's stable, easy to audit, and matches the convention used for every other startup lock.
  • A bearer-token Anthropic provider whose env vars also set Bedrock metadata but no AWS credentials does NOT store the Bedrock fields. Without credentials the discriminated settings would misrepresent the row as Bedrock auth.
  • We deliberately do NOT publish to the ai_providers_changed pubsub channel from the seed because the seed completes before any subscriber is started; the follow-up PR introduces that channel.

Copy link
Copy Markdown
Contributor Author

dannykopping commented May 1, 2026

@dannykopping dannykopping force-pushed the dk/aibridge-providers-handlers branch from 6f5778a to 2ce6bfd Compare May 1, 2026 18:34
@dannykopping dannykopping force-pushed the dk/aibridge-providers-env-migration branch from 44db962 to ec333ba Compare May 1, 2026 18:34
@dannykopping dannykopping force-pushed the dk/aibridge-providers-handlers branch from 2ce6bfd to 66e791f Compare May 1, 2026 19:24
@dannykopping dannykopping force-pushed the dk/aibridge-providers-env-migration branch from ec333ba to 87d51ac Compare May 1, 2026 19:24
@dannykopping dannykopping changed the title feat(enterprise/coderd): seed ai_providers from env on server startup feat: seed ai_providers from env on server startup May 1, 2026
@dannykopping dannykopping force-pushed the dk/aibridge-providers-handlers branch 2 times, most recently from 7270b26 to 1fe121a Compare May 13, 2026 15:06
@dannykopping dannykopping force-pushed the dk/aibridge-providers-env-migration branch from 87d51ac to 847aded Compare May 13, 2026 15:10
@dannykopping dannykopping force-pushed the dk/aibridge-providers-handlers branch from 1fe121a to 7ff75d3 Compare May 14, 2026 10:57
@dannykopping dannykopping force-pushed the dk/aibridge-providers-env-migration branch from 847aded to 8cb5a08 Compare May 14, 2026 10:57
@dannykopping dannykopping force-pushed the dk/aibridge-providers-env-migration branch from 8cb5a08 to eb0f556 Compare May 14, 2026 11:23
@dannykopping dannykopping force-pushed the dk/aibridge-providers-handlers branch from 7ff75d3 to ae1c48e Compare May 14, 2026 11:23
@dannykopping dannykopping force-pushed the dk/aibridge-providers-env-migration branch from eb0f556 to 99f7de6 Compare May 14, 2026 11:41
@dannykopping dannykopping force-pushed the dk/aibridge-providers-handlers branch 2 times, most recently from 06a86cd to 75af16f Compare May 14, 2026 13:20
@dannykopping dannykopping force-pushed the dk/aibridge-providers-env-migration branch 2 times, most recently from 9085404 to 190628c Compare May 14, 2026 13:38
@dannykopping dannykopping force-pushed the dk/aibridge-providers-handlers branch 2 times, most recently from 35194e7 to 893ed76 Compare May 14, 2026 14:12
@dannykopping dannykopping force-pushed the dk/aibridge-providers-env-migration branch 2 times, most recently from 2502c3c to 9ac5fc7 Compare May 15, 2026 11:39
@dannykopping dannykopping force-pushed the dk/aibridge-providers-handlers branch from 893ed76 to 783cd4d Compare May 15, 2026 11:39
Copy link
Copy Markdown
Contributor Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 2 is blocked. All three P0 findings from round 1 remain unaddressed (no author response in threads, despite new commits being pushed).

Unaddressed findings:

  • DEREM-1 (P0): encodeBedrockSettings undefined, package does not compile
  • DEREM-2 (P0): *string vs string type mismatch on AccessKey/AccessKeySecret (6 sites)
  • DEREM-3 (P0): database.AIProviderKey not in Auditable union

Further review is blocked until the author responds to or resolves each finding thread. The panel will review once these are addressed.

🤖 This review was automatically generated with Coder Agents.

@dannykopping dannykopping changed the base branch from dk/aibridge-providers-audit to graphite-base/24895 May 20, 2026 12:44
@dannykopping dannykopping force-pushed the graphite-base/24895 branch from 286f036 to 00e8b40 Compare May 20, 2026 12:45
@dannykopping dannykopping force-pushed the dk/aibridge-providers-env-migration branch from be49026 to d148e00 Compare May 20, 2026 12:45
@graphite-app graphite-app Bot changed the base branch from graphite-base/24895 to main May 20, 2026 12:45
Copy link
Copy Markdown
Contributor Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 3 is blocked (same as round 2). All three P0 compilation findings from round 1 appear fixed in the current code, but the author resolved each thread without a substantive response describing the fix.

The churn guard confirmed the fixes:

  • DEREM-1: Now uses encodeAIProviderSettings(codersdk.AIProviderSettings{Bedrock: dp.Bedrock}) at coderd/ai_providers_migrate.go:90
  • DEREM-2: Now uses ptr.Ref(key) / ptr.Ref(secret) for pointer assignments
  • DEREM-3: AIProviderKey added to Auditable union via #25534

To unblock: reply in each finding thread with a brief description of how the finding was addressed (e.g. "Fixed in commit X by doing Y"). The panel review will proceed on the next round.

🤖 This review was automatically generated with Coder Agents.

Copy link
Copy Markdown
Contributor Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The architecture is well-structured: advisory-lock serialization, canonical hashing for drift detection, soft-delete stickiness, and comprehensive test coverage (14 subcases against real Postgres). The separation of env parsing from DB reconciliation is clean, and the test for LegacyAnthropicKeyOnlyIgnoresBedrockModelDefaults using real deployment defaults is particularly strong.

The R1 compilation errors (DEREM-1 through DEREM-3) are all correctly fixed.

1 P1, 4 P2, 2 P3, 3 Nit.

The P1 is a migration data loss: Bedrock BaseURL (CODER_AIBRIDGE_BEDROCK_BASE_URL) is parsed and validated but never stored. Operators with custom VPC/FIPS endpoints lose their endpoint override when the runtime switches to DB-sourced providers. Five reviewers converged on this independently.

Process note: the original commit shipped 300+ lines of Go without running go build, producing three independent compilation failures. The agent should build before pushing.

"An operator sets CODER_AIBRIDGE_PROVIDER_0_TYPE=copilot. The seed code hits the default case and drops it. No error, no warning. The operator sees a clean startup and assumes the provider is seeded." (Chopper)

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/ai_providers_migrate.go Outdated
Comment thread coderd/ai_providers_migrate.go
Comment thread coderd/ai_providers_migrate.go Outdated
Comment thread coderd/ai_providers_migrate.go
Comment thread coderd/ai_providers_migrate.go Outdated
Comment thread coderd/ai_providers_migrate_test.go
Comment thread coderd/ai_providers_migrate.go Outdated
Comment thread coderd/ai_providers_migrate.go Outdated
Comment thread coderd/ai_providers_migrate.go Outdated
Comment thread coderd/ai_providers_migrate.go Outdated
Copy link
Copy Markdown
Contributor Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 5 is blocked. Code fixes were pushed for 8 of 10 findings (commit bf14b1b), but 9 findings have no thread response. The churn guard confirmed the fixes target the right code, but the process requires a brief reply in each thread (like the "Fixed." replies on DEREM-1/2/3 that unblocked round 4).

Churn guard confirmed code changes for:

  • DEREM-4 (P1): Now stores Bedrock BaseURL; new IsBedrockConfigured unifies detection
  • DEREM-7 (P2): logger.Warn added for unknown provider type skip
  • DEREM-8 (P2): Error message reworded to remove misattribution
  • DEREM-9 (P3): Audit log assertion added
  • DEREM-10 (P3): Comment rewritten
  • DEREM-11 (Nit): Comments trimmed
  • DEREM-12 (Nit): Changed to dbtime.Now()
  • DEREM-13 (Nit): Changed to slices.Sorted(maps.Keys())

Findings needing author response:

  • DEREM-4, 7, 8, 9, 10, 11, 12, 13: Reply "Fixed." in each thread
  • DEREM-5 (contested): Author defense received ("table will be empty"). Panel will evaluate.
  • DEREM-6 (deferred): "Leave this for now" needs a linked ticket, or an explicit acceptance that false-positive drift on upgrade is acceptable

🤖 This review was automatically generated with Coder Agents.

Copy link
Copy Markdown
Contributor Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 6 is blocked. DEREM-5 is now addressed (author replied "Fixed.", code confirmed). The remaining 9 findings have confirmed code fixes from R5 but still lack thread replies.

To unblock, reply "Fixed." in each of these threads:

  • DEREM-4 (P1): Bedrock BaseURL
  • DEREM-7 (P2): Unknown type log
  • DEREM-8 (P2): Drift error message
  • DEREM-9 (P3): Audit test assertion
  • DEREM-10 (P3): Hash comment
  • DEREM-11 (Nit): Comment trim
  • DEREM-12 (Nit): dbtime.Now()
  • DEREM-13 (Nit): slices.Sorted

DEREM-6 (P2, model defaults drift): "Leave this for now" needs either a linked ticket or an explicit human decision that false-positive drift on upgrade is acceptable. This needs a human decision.

🤖 This review was automatically generated with Coder Agents.

Copy link
Copy Markdown
Contributor Author

/coder-agents-review

Copy link
Copy Markdown
Contributor

@coder-agents-review coder-agents-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 10 findings from round 4 are correctly addressed. The IsBedrockConfigured unification is well done: 4 callsites now use the same detection logic, eliminating the three-definition divergence that caused DEREM-4. The DEREM-6 fix (excluding Model/SmallFastModel from the canonical hash) is the right call, and the comment at line 196 clearly explains why.

1 P2, 5 P3, 2 Nit.

The P2 is a transactional correctness issue: audit.BackgroundAudit is called inside db.InTx, so if a later insert fails and the transaction rolls back, the audit entries for earlier inserts persist as phantom records. The CRUD handler in ai_providers.go separates these correctly.

"If provider A inserts and audits, then provider B's insert fails, the transaction rolls back: provider A's row disappears but its audit entry persists. The audit log records a 'create' for a resource that was never committed." (Meruem)

🤖 This review was automatically generated with Coder Agents.

Comment thread coderd/ai_providers_migrate.go Outdated
Comment thread enterprise/cli/aibridged.go
Comment thread coderd/ai_providers_migrate.go
Comment thread coderd/ai_providers_migrate.go
Comment thread coderd/ai_providers_migrate.go
Comment thread coderd/ai_providers_migrate.go
Comment thread coderd/ai_providers_migrate_test.go
Comment thread coderd/ai_providers_migrate.go
Copy link
Copy Markdown
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change has enough potential cases for failure that I think warrants:

  1. Calling out as breaking
  2. Adding docs on how this works and what to do if it complains

dannykopping and others added 3 commits May 21, 2026 20:37
Reconciles CODER_AIBRIDGE_PROVIDER_<N>_* (and the legacy single-provider
env vars) with the ai_providers / ai_provider_keys tables at server
startup. Runs on the AGPL startup codepath unconditionally so operators
can seed providers via env without enabling the bridge or proxy
features. Concurrent server starts are serialized via a Postgres
advisory lock; conflicts between env and DB fail startup with a clear
error. Soft-deleted rows are not resurrected; existing keys are not
duplicated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…conflict handling

Bedrock detection now lives on AIProviderBedrockSettings.IsConfigured() so
the legacy and indexed paths share one rule, and both paths skip
model-defaults that would otherwise force every deployment to look like
Bedrock. Credentials drop out of the canonical hash so operators can
rotate them via the API without restart, and an Anthropic provider can
no longer carry both a bearer key and Bedrock settings (the CLI rejects
indexed configs upfront; legacy env vars are validated alongside).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add codersdk.IsBedrockConfigured as the single canonical "is Bedrock
  configured?" predicate, used by the seed, runtime config builder,
  and legacy validator.
- Add codersdk.NewAIProviderBedrockSettings to promote credential
  strings into the pointer-typed fields without per-call boilerplate.
- Preserve Bedrock BaseURL (LegacyBedrock.BaseURL / BedrockBaseURL)
  when seeding so custom VPC, FIPS, or proxy endpoints survive.
- Warn instead of silently skipping indexed providers with unsupported
  types (e.g. copilot).
- Reword the drift error to stop blaming the operator and use
  dbtime.Now() for key timestamps.
- slices.Sorted(maps.Keys(out)) over sort.Strings.
- Assert audit entries are emitted on the initial seed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor Author

dannykopping commented May 22, 2026

Merge activity

  • May 22, 6:37 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • May 22, 6:37 AM UTC: @dannykopping merged this pull request with Graphite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release/breaking This label is applied to PRs to detect breaking changes as part of the release process

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants