feat: add automatic key failover for AI Bridge OpenAI #24847

Merged
ssncferreira merged 4 commits into main from ssncf/aibridge-openai-key-failover
May 7, 2026

@ssncferreira commented Apr 30, 2026

Description

Adds automatic key failover for the centralized OpenAI provider, covering both the chat completions and responses APIs. It has the same shape as the Anthropic PR: each upstream call walks the configured key pool, and a key is marked as temporarily failed on 429 (with a cooldown taken from Retry-After) or permanently failed on 401/403. Each agentic-loop iteration gets its own fresh walker, so a tool-call continuation can fail over independently of the initial request.

BYOK is unchanged: BYOK requests run as a single attempt with no failover.

Changes

  • config.OpenAI carries a KeyPool. Key remains for BYOK, where the Authorization Bearer token is set per interception.
  • Chat completions blocking interceptor: walks the pool via newChatCompletionWithKeyFailover, marks keys on key-specific failures, returns on first success or non-failover error.
  • Chat completions streaming interceptor: per-iteration walker. Pre-stream failures fail over to the next key; mid-stream errors are relayed as SSE events.
  • Responses blocking interceptor: extracts newResponseWithKeyFailover, parallel to the chat completions helper.
  • Responses streaming interceptor: per-iteration walker, retains the existing buffer-then-forward design.

Related Issues

Related to: coder/internal#1446
Related to: https://linear.app/codercom/issue/AIGOV-197/aibridge-automatic-key-failover-for-bridged-and-passthrough-routes

Follow-up PRs

  • Bedrock multi-key support.
  • Refactor provider vs interceptor config separation.
  • Record the actually-used key in the interception credential hint after failover.

Note

Initially generated by Claude Opus 4.7, modified and reviewed by @ssncferreira

@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch from d60c20c to 358cc72 (April 30, 2026 17:57)
@ssncferreira force-pushed the ssncf/aibridge-anthropic-key-failover branch from c486d89 to cb3d525 (April 30, 2026 17:57)
@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch from 358cc72 to b282407 (April 30, 2026 17:57)
@ssncferreira force-pushed the ssncf/aibridge-anthropic-key-failover branch from cb3d525 to afef97c (May 4, 2026 08:55)
@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch from b282407 to ae98c2d (May 4, 2026 08:56)
@ssncferreira force-pushed the ssncf/aibridge-anthropic-key-failover branch from afef97c to 615b83c (May 4, 2026 09:18)
@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch from ae98c2d to 541a917 (May 4, 2026 09:18)
@ssncferreira force-pushed the ssncf/aibridge-anthropic-key-failover branch from 615b83c to 6dfa0c0 (May 4, 2026 09:36)
@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch 2 times, most recently from 337ee29 to 2f7c02d (May 4, 2026 09:59)
@ssncferreira force-pushed the ssncf/aibridge-anthropic-key-failover branch from 6dfa0c0 to 1ae6384 (May 4, 2026 09:59)
@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch 2 times, most recently from c38a2a8 to 8ac3606 (May 4, 2026 11:01)
@ssncferreira force-pushed the ssncf/aibridge-anthropic-key-failover branch 2 times, most recently from 9f7d1d5 to cee332e (May 4, 2026 11:14)
@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch 2 times, most recently from 38ed74c to 866efb9 (May 4, 2026 14:52)
@ssncferreira force-pushed the ssncf/aibridge-anthropic-key-failover branch 2 times, most recently from 37e9958 to cfd1a7a (May 4, 2026 15:35)
@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch from 866efb9 to 3ddd9a2 (May 4, 2026 15:35)
@ssncferreira force-pushed the ssncf/aibridge-anthropic-key-failover branch from cfd1a7a to f13fb00 (May 4, 2026 16:03)
@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch from 3ddd9a2 to 8590bdf (May 4, 2026 16:03)
@ssncferreira force-pushed the ssncf/aibridge-anthropic-key-failover branch from f13fb00 to 63d2574 (May 4, 2026 16:08)
@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch 3 times, most recently from 02cc359 to ca5e0ce (May 4, 2026 18:35)
@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch from 1f50a85 to 38e4faa (May 5, 2026 11:49)
Comment thread aibridge/intercept/chatcompletions/base.go
Comment thread aibridge/intercept/chatcompletions/base.go
Comment thread aibridge/intercept/chatcompletions/base.go
Comment thread aibridge/intercept/chatcompletions/base.go
Comment thread aibridge/intercept/chatcompletions/base.go Outdated
Comment thread aibridge/intercept/responses/blocking.go
Comment thread aibridge/intercept/responses/blocking.go
Comment thread aibridge/intercept/responses/base.go
Comment thread aibridge/intercept/responses/streaming.go
@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch 10 times, most recently from 554ca68 to a6072ef (May 5, 2026 19:22)
@ssncferreira requested a review from pawbana (May 6, 2026 10:44)
@ssncferreira force-pushed the ssncf/aibridge-anthropic-key-failover branch from 4ae41e2 to e9b9e65 (May 7, 2026 07:51)
@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch 3 times, most recently from 3acbc6f to 0ea0412 (May 7, 2026 11:17)
@ssncferreira force-pushed the ssncf/aibridge-anthropic-key-failover branch from 4bd785f to 2682942 (May 7, 2026 11:36)
@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch from 0ea0412 to 14fca1b (May 7, 2026 11:36)
// Then: 1 request, 429 response, no failover, upstream
// Retry-After propagated to the client.
name: "byok_no_failover",
byokKey: "user-byok",

Should keys be defined here? That would test that BYOK takes precedence over the configured keys.

},
expectedRequestCount: 3,
expectedSeenKeys: []string{"k0", "k0", "k1"},
expectedStatusCode: http.StatusTooManyRequests,

Retry-After could be checked?

}

// mockServerProxier is a test implementation of mcp.ServerProxier.
type mockServerProxier struct {

Could be moved to testutil package.


// stubToolCaller is a minimal mcp.ToolCaller that returns a fixed
// text result, so the agentic continuation can proceed.
type stubToolCaller struct{}

Copy-pasted from the Anthropic tests.

Comment thread aibridge/intercept/openai_errors.go
expectedNil bool
expectedStatus int
expectedRetryAfter time.Duration
}{

A test case with a nil error could be added.

Comment thread aibridge/intercept/responses/blocking_test.go
Comment thread aibridge/internal/integrationtest/keypool_failover_test.go
Comment thread aibridge/provider/openai.go

ssncferreira commented May 7, 2026

Merge activity

  • May 7, 1:57 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • May 7, 1:59 PM UTC: Graphite rebased this pull request as part of a merge.
  • May 7, 2:10 PM UTC: Graphite couldn't merge this PR because it was not satisfying all requirements (Failed CI: 'required', 'test-go-pg-17').
  • May 7, 2:35 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • May 7, 2:35 PM UTC: @ssncferreira merged this pull request with Graphite.

@ssncferreira changed the base branch from ssncf/aibridge-anthropic-key-failover to graphite-base/24847 (May 7, 2026 13:57)
@ssncferreira changed the base branch from graphite-base/24847 to main (May 7, 2026 13:57)
@ssncferreira force-pushed the ssncf/aibridge-openai-key-failover branch from 14fca1b to 3d646d3 (May 7, 2026 13:58)
@ssncferreira merged commit b6dacb4 into main (May 7, 2026)
46 of 48 checks passed
@ssncferreira deleted the ssncf/aibridge-openai-key-failover branch (May 7, 2026 14:35)