feat(enterprise): hot-reload aibridged provider pool from DB on pubsub #24897
Closed
dannykopping wants to merge 1 commit into
… on pubsub
Switches the in-memory aibridged daemon from a static, env-derived
provider list to a database-backed list that hot-reloads via pubsub.
After this PR:
- aibridged loads providers from ai_providers at startup (system
actor, dbauthz-gated) and joins them with ai_provider_keys to
pick the operator-preferred primary key (first by created_at).
- Non-Bedrock providers with zero ai_provider_keys are skipped
with a warning; Bedrock providers always have zero keys and
authenticate via the encrypted settings blob (AWS access key +
secret).
- The CRUD handlers from the previous PR publish on
'ai_providers_changed' after every successful Insert/Update/
SoftDelete of a provider AND after every Insert/Delete of a
key, because key changes alone affect the runtime pool.
- Each replica subscribes to that channel and triggers
aibridged.Server.Reload, which atomically swaps the providers
slice on the pool and clears the cached RequestBridge instances.
- In-flight requests continue against their existing
RequestBridge until completion; the cache's OnEvict shutdown
closes MCP connections in the background after a 5-second grace
period.
The proxy daemon is intentionally NOT reloaded yet to keep this PR
focused; it still receives the boot-time provider snapshot. A
follow-up will introduce a Pooler interface for the proxy and mirror
this pattern.
Pool changes:
- CachedBridgePool stores providers via atomic.Pointer[[]Provider]
instead of a fixed slice.
- New Reload(providers) method on the Pooler interface that
atomically swaps the snapshot, calls cache.Clear, and waits for
buffered writes to drain so a subsequent Acquire always sees the
new set.
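A minimal sketch of the swap-and-clear pattern described above, under stated assumptions: the `pool` and `bridge` types are hypothetical, and a mutex-guarded map stands in for the real cache, so no drain step is needed here. It is meant only to show why a bridge acquired before `Reload` keeps its old snapshot while the next `Acquire` builds against the new one.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type Provider struct{ Name string }

// bridge is a hypothetical stand-in for RequestBridge: it captures the
// provider snapshot that was current when it was built.
type bridge struct{ providers []Provider }

type pool struct {
	providers atomic.Pointer[[]Provider] // current snapshot, readable without the lock

	mu     sync.Mutex
	closed bool
	cache  map[string]*bridge // stands in for the real cache (an assumption)
}

func newPool(providers []Provider) *pool {
	p := &pool{cache: make(map[string]*bridge)}
	p.providers.Store(&providers)
	return p
}

// Acquire returns the cached bridge for key, building one from the current
// snapshot on a cache miss.
func (p *pool) Acquire(key string) *bridge {
	p.mu.Lock()
	defer p.mu.Unlock()
	if b, ok := p.cache[key]; ok {
		return b
	}
	b := &bridge{providers: *p.providers.Load()}
	p.cache[key] = b
	return b
}

// Reload swaps the snapshot and drops every cached bridge. It does nothing
// once Shutdown has run, so a late notification cannot revive the pool.
func (p *pool) Reload(providers []Provider) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.closed {
		return
	}
	p.providers.Store(&providers)
	p.cache = make(map[string]*bridge)
}

// Shutdown marks the pool closed and drops cached bridges.
func (p *pool) Shutdown() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.closed = true
	p.cache = make(map[string]*bridge)
}

func main() {
	p := newPool([]Provider{{Name: "old"}})
	inFlight := p.Acquire("user-1")

	p.Reload([]Provider{{Name: "new"}})
	fresh := p.Acquire("user-1")

	fmt.Println("in-flight bridge still sees:", inFlight.providers[0].Name)
	fmt.Println("fresh bridge sees:", fresh.providers[0].Name)

	p.Shutdown()
	p.Reload([]Provider{{Name: "late"}})
	fmt.Println("snapshot after post-shutdown Reload:", (*p.providers.Load())[0].Name)
}
```

With a buffered cache such as the one the PR uses, clearing alone would not be enough; that is the reason the real `Reload` also waits for buffered writes to drain.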
Tests:
- TestPoolReload covers the happy path: build a pool, acquire a
bridge, Reload, ensure the next Acquire targets the new provider
set.
- TestPoolReloadAfterShutdown ensures Reload is a no-op post-Close
so a stale subscriber notification cannot resurrect a torn-down
pool.
- TestAIProvidersPubsubPublish exercises the producer side: each
of Insert/Update/Delete on a provider emits a notification on
AIBridgeProvidersChangedChannel.
- TestAIProviderKeysPubsubPublish does the same for the keys
sub-resource (Insert and Delete).

Disclaimer: implemented by a Coder Agent using Claude Opus 4.7
Part of the implementation of RFC: Common AI Provider Configs (AIGOV-201).
What this PR does

Switches the in-memory `aibridged` daemon from a static, env-derived provider list to a database-backed list that hot-reloads via pubsub. After this PR:

- `aibridged` loads providers from `ai_providers` at startup (system actor, dbauthz-gated).
- The CRUD handlers publish on `ai_providers_changed` after every successful Insert/Update/SoftDelete on a provider or key.
- Each replica subscribes to that channel and triggers `aibridged.Server.Reload`, which atomically swaps the providers slice on the pool and clears the cached `RequestBridge` instances.
- In-flight requests continue against their existing `RequestBridge` until completion; the cache's `OnEvict` shutdown closes MCP connections in the background after the existing 5-second grace period.

Pubsub channel

The channel name (`ai_providers_changed`, exported as `coderd.AIProvidersChangedChannel`) is provider-generic, not aibridge-specific: any future consumer of `ai_providers` rows can subscribe. Today aibridged is the only subscriber.

Pool changes

- `CachedBridgePool` stores providers via `atomic.Pointer[[]Provider]` instead of a fixed slice.
- New `Reload(providers)` method on the `Pooler` interface that atomically swaps the snapshot, calls `cache.Clear`, and then calls `cache.Wait` so that buffered writes drain and a subsequent `Acquire` always sees the cleared state.

Wire-up

- `enterprise/cli/server.go` now calls `loadProvidersFromDB(ctx, db, cfg)` instead of `buildProviders(cfg)`. The legacy env-driven `buildProviders` is preserved for the proxy daemon path until the proxy reload follow-up lands.
- The server subscribes to `coderd.AIProvidersChangedChannel`. Each notification re-loads providers from the database and calls `aibridgeDaemon.Reload(...)`.

Tests

- `TestPoolReload`: prime the cache, `Reload`, observe that the next `Acquire` is a fresh build (`KeysAdded=1`, `Misses=1` against the post-Reload zeroed metrics).
- `TestPoolReloadAfterShutdown`: `Reload` is a safe no-op after `Shutdown`.
- `TestAIProvidersPubsubPublish` / `TestAIProviderKeysPubsubPublish`: end-to-end; each handler mutation publishes a notification on the providers-changed channel.
- Existing CRUD tests (`TestAIProvidersCRUD`, `TestAIProviderKeysCRUD`) still pass with the new publish hook in the handlers.

Decision log

- The channel is named `ai_providers_changed` rather than `aibridge_providers_changed` because the rows it announces are not aibridge-specific. Subscribers are currently aibridged only, but the contract does not assume that.
- `cache.Wait()` after `cache.Clear()` is required because ristretto's set/clear operations are buffered. Without it, a Reload immediately followed by an Acquire could see the old (cached) bridge.
- Considered having `aibridged` own the pubsub subscription, but the package currently has no database/pubsub dependencies. Wiring the subscription in `enterprise/cli/server.go` keeps the daemon's interface narrow (`Reload(providers)`) and matches how other daemon lifecycles are managed.
- `MockPooler` was regenerated rather than hand-edited so future regen passes are deterministic.