feat(coderd/aibridged): fetch providers over DRPC (GetAIProviders)#26650
Draft
dannykopping wants to merge 1 commit into
Split aibridged so the AI Gateway no longer reads the database directly: coderd stays the source of truth and serves provider configuration over a new ProviderConfigurator DRPC service (GetAIProviders, API v1.1). The handler reads providers + keys under a read-only transaction that acquires LockIDAIProvidersEnvSeed, so the response is never a partial, mid-seed snapshot. Both embedded and standalone gateways init via this RPC; embedded keeps pubsub as the hot-reload trigger, only the data path moves from direct-DB to the in-memory RPC. Standalone retries until the first fetch succeeds (an empty list is valid). Removes the temporary env-based provider building for standalone (BuildProvidersFromConfig / ProvidersFromConfig) and the obsolete BuildProviders DB-read path; the DB read now lives server-side in the handler and the builder half stays as buildProvider on the client.
AIGOV-455: Extend DRPC with provider fetch (GetAIProviders)
Context
We are splitting aibridged (the AI Gateway) into a standalone process that must not access the database directly. coderd remains the source of truth: it seeds the ai_providers / ai_provider_keys tables from the environment (SeedAIProvidersFromEnv, holding LockIDAIProvidersEnvSeed). The standalone gateway has no provider env vars and no DB access, so it must fetch provider configuration from coderd over DRPC. While seeding is in flight, the gateway's fetch must synchronize on the seed lock so it does not race and observe a partial snapshot.
AIGOV-465 (publish a provider-seed completion signal so the gateway can refresh)
is a follow-up. This issue is initialization-only; refresh for the standalone
gateway is out of scope here.
Already done on the current branch (pawel/aigov-315-...)
- coderd/aibridged/dialer.go, the /api/v2/aibridge/serve endpoint (enterprise/coderd/aibridgeserve.go), yamux + DRPC, AI Gateway key auth, proto version negotiation.
- buildProvider(aiProviderSpec, cfg, metrics) is refactored to be DB-neutral (cli/aibridged.go) and is reusable as-is for an RPC response.
- aibridged.Server.Client() blocks until connected - a natural hook for "fetch providers after connect".
- Env-based provider building for standalone (BuildProvidersFromConfig / ReadAIProvidersFromEnv). This is temporary and is removed by this issue.
Decisions
- aibridge.Provider
- AcquireLock(LockIDAIProvidersEnvSeed), then reads
- srv first, async initial reload, drop the boot-time pre-build
- ProviderConfigurator service, unary GetAIProviders
- AIProvider Bedrock proto message
- CurrentMinor to 1
- enabled flag; keys only for enabled
- BuildProvidersFromConfig + ProvidersFromConfig in this PR
Design
Proto (coderd/aibridged/proto/aibridged.proto, then make gen)
- version.go: bump CurrentMinor to 1; extend the version-history comment.
The version gate prevents a v1.1 gateway (which needs GetAIProviders) from connecting to a v1.0 coderd that lacks it, while an old v1.0 gateway still works against a new coderd.
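The gate described above amounts to "same major, and the client's minor must not exceed the server's". The sketch below illustrates that rule only; `Version` and `checkCompatible` are hypothetical names and this is not the negotiation code used by coderd.

```go
package main

import "fmt"

// Version is a hypothetical major.minor pair for this sketch.
type Version struct {
	Major int
	Minor int
}

// checkCompatible returns nil when a client at version `client` may talk to a
// server at version `server`: majors must match, and the client must not
// require a newer minor than the server provides.
func checkCompatible(server, client Version) error {
	if client.Major != server.Major {
		return fmt.Errorf("major mismatch: client v%d.%d, server v%d.%d",
			client.Major, client.Minor, server.Major, server.Minor)
	}
	if client.Minor > server.Minor {
		return fmt.Errorf("client v%d.%d needs features the server v%d.%d lacks",
			client.Major, client.Minor, server.Major, server.Minor)
	}
	return nil
}
```

Under this rule a v1.0 gateway passes against a v1.1 coderd, while a v1.1 gateway is rejected by a v1.0 coderd before it can call GetAIProviders.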
Server (coderd/aibridgedserver)
- GetAIProviders handler:
  - Read-only InTx.
  - AcquireLock(LockIDAIProvidersEnvSeed) so the handler waits for any in-flight seed transaction to commit/rollback before reading.
  - GetAIProviders{IncludeDisabled: true} + GetAIProviderKeysByProviderIDs for the enabled provider IDs only.
  - dbauthz.AsAIBridged (same subject/permissions that cli.BuildProviders uses today).
- store interface with InTx, AcquireLock, GetAIProviders, GetAIProviderKeysByProviderIDs.
- ProviderConfigurator in register.go.
Client plumbing
- Add proto.DRPCProviderConfiguratorClient to the DRPCClient union (coderd/aibridged/client.go) and to the concrete Client struct.
- dialer.go (standalone) and CreateInMemoryAIBridgeServer (embedded).
Provider building (cli layer)
- AIProvider -> aiProviderSpec -> buildProvider. Disabled entries -> NewDisabledProviderStub. Reuses the existing DB-neutral buildProvider, so construction is byte-identical across embedded and standalone.
Embedded (cli/aibridged.go, cli/server.go)
- newAIBridgeDaemon: build an empty pool, create srv (aibridged.New), then subscribe a reloader whose Reload does srv.Client() -> GetAIProviders -> build -> pool.ReplaceProviders.
- The initial Reload is asynchronous so startup does not park on Client().
- Drop the cli/server.go BuildProviders pre-build and the providers argument to newAIBridgeDaemon.
- Pubsub stays the reload trigger (AIProvidersChangedChannel), so embedded keeps hot-reload; only the data path changes from direct-DB to the in-memory RPC.
Standalone (enterprise/cli/aigatewaystart.go)
- Remove the ReadAIProvidersFromEnv + BuildProvidersFromConfig usage.
- Create srv + websocket dialer, then run a retry-until-success loop: srv.Client() -> GetAIProviders -> build -> ReplaceProviders -> serve. A successful empty list is valid and ends the loop.
Cleanup (this PR)
- Remove cli.BuildProvidersFromConfig and coderd.ProvidersFromConfig.
- Remove cli.BuildProviders: its DB-read half (rows + keys in a tx) moves server-side into the handler; its builder half is already buildProvider.
- Keep ReadAIProvidersFromEnv (still used by the embedded env->DB seed at cli/server.go:934).
Tests
- Postgres-backed test (CODER_PG_CONNECTION_URL, DB=ci): hold LockIDAIProvidersEnvSeed in one tx (simulating an in-flight seed), fire GetAIProviders concurrently, assert it blocks until release, then returns the seeded set. Advisory locks are Postgres-specific, so this cannot use the mock.
Known tradeoffs / implications
- BuildProviders is split - DB-read half to the server handler, builder half stays as buildProvider on the client.
- ... brief and consistent across modes.
- ... propagate to a running standalone gateway; embedded still hot-reloads.
- ... depends on the in-memory Client() being available.
- ... posture: the Authorizer.IsAuthorized RPC already carries a full API token on the same channel. TLS is governed by the client URL scheme for standalone and is trivially safe in-memory for embedded.
Out of scope (AIGOV-465 follow-up)
WatchAIProviders streaming RPC and standalone refresh-on-signal.