feat(opencode): add LLM provider fallback chain #26292
Conversation
The following comment was made by an LLM; it may be inaccurate: Based on the search results, I found several related PRs that address similar functionality: Potential Related PRs
These PRs address related concerns around provider fallback chains, transient error handling, and retry logic, though they appear to be separate implementations in different components. PR #26292 appears to be the consolidated, comprehensive implementation of this functionality.
Thanks for updating your PR! It now meets our contributing guidelines. 👍
Issue for this PR
Closes #7602
Type of change
What does this PR do?
Adds a configurable fallback chain so that when a provider returns a transient error (rate limit, overload, 5xx), OpenCode automatically retries on the next model in the chain instead of failing the session.
```json
{
  "model": "anthropic/claude-sonnet-4-20250514",
  "fallbacks": ["openai/gpt-4.1", "deepseek/deepseek-v4"],
  "cooldown_seconds": 300
}
```

`fallbacks` can be set at the top level or per-agent. `cooldown_seconds` defaults to 300 — after a retryable failure, that provider/model is skipped for the cooldown duration so you don't wait on retries to an overloaded provider.

Why built-in instead of a proxy: cheaper providers are unreliable, and routing through LiteLLM degrades tool-call quality. When a provider gets overloaded, falling through immediately is faster than retrying the same one.
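The selection logic described above can be sketched as follows. This is illustrative only — `isTransient` and `pickModel` are assumed names for this sketch, not identifiers from the PR.

```typescript
// Illustrative sketch, not the PR's actual code.
// A transient failure (rate limit, overload, 5xx) makes the current model
// ineligible; the chain is then walked for the next model not on cooldown.

function isTransient(status: number): boolean {
  // 429 = rate limit; 5xx covers overload and server errors.
  return status === 429 || (status >= 500 && status < 600);
}

function pickModel(
  chain: string[],
  coolingDown: Set<string>,
): string | undefined {
  // First model in the chain that is not currently on cooldown.
  return chain.find((m) => !coolingDown.has(m));
}
```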
Design
Data flow
Cooldown durations
- `cooldown_seconds` (default 300)
- the provider's `retry-after` header, when present

On success, the winning provider's cooldown is cleared so it's immediately available next request.
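A minimal sketch of the duration selection above, assuming the `retry-after` header carries a seconds value (the PR's actual parsing, and its handling of the HTTP-date form, is not shown here):

```typescript
// Illustrative sketch — the configured default is overridden by a
// provider-supplied retry-after value when one parses as positive seconds.
const DEFAULT_COOLDOWN_S = 300;

function cooldownSeconds(retryAfterHeader?: string): number {
  const parsed = Number(retryAfterHeader);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_COOLDOWN_S;
}
```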
Key decisions
Cooldown over session state — We track what's failed, not what's succeeded. No "sticky" fallback — once a provider recovers, traffic routes back naturally.
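As a rough sketch of this decision (names and shapes are assumptions; the PR's actual `CooldownManager` may differ), a failure-tracking map with lazy expiry could look like:

```typescript
// Illustrative sketch: tracks which provider/model pairs have failed and
// until when. Nothing is recorded about successes, so a recovered provider
// naturally receives traffic again once its entry expires or is cleared.
class CooldownManager {
  private until = new Map<string, number>();

  put(model: string, seconds: number, now = Date.now()): void {
    this.until.set(model, now + seconds * 1000);
  }

  isCoolingDown(model: string, now = Date.now()): boolean {
    const t = this.until.get(model);
    if (t === undefined) return false;
    if (now >= t) {
      this.until.delete(model); // expired entries are cleaned up lazily
      return false;
    }
    return true;
  }

  clear(model: string): void {
    this.until.delete(model);
  }
}
```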
Stream-level error detection — Providers can return HTTP 200 with an error in the stream body. We wrap `fullStream` to throw on `{ type: "error" }` chunks, triggering fallback just like a connection error.

Quota limits trigger fallback (6h cooldown) — A quota-limited provider is unavailable, not the session. Fallback keeps the session alive. The 6h cooldown prevents hammering a capped provider.
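The stream-level detection above can be sketched as a wrapper that surfaces in-band error chunks as thrown errors. This is a hypothetical shape — the chunk type and wrapper name are assumptions, not the PR's code:

```typescript
// Illustrative sketch: re-yield chunks until an in-band error chunk appears,
// then throw so the caller's fallback logic treats it like a connection failure.
type Chunk = { type: string; error?: unknown };

async function* throwOnErrorChunk(
  stream: Iterable<Chunk> | AsyncIterable<Chunk>,
): AsyncIterable<Chunk> {
  for await (const chunk of stream) {
    if (chunk.type === "error") {
      throw new Error(`provider stream error: ${JSON.stringify(chunk.error)}`);
    }
    yield chunk;
  }
}
```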
No dedup in the chain — The same model can appear twice. After falling through once, the primary may be worth retrying with fresh context. Cooldown handles this naturally — if the primary is still on cooldown, it's skipped; once it's cleared, trying it again is valid.
Model attribution updates on fallback — When a fallback succeeds, `usedFallback` propagates back so events, logs, and billing reflect the actual provider that handled the request.

Config contract
New fields:
`fallbacks` (array of `provider/model` strings), `cooldown_seconds` (positive int, default 300).

Not in scope
(only `cooldown_seconds` globally for now)

How did you verify your code works?
Unit tests cover `CooldownManager` (put/get/clear/expiry) and config validation (`fallbacks` array and `cooldown_seconds`). `bun typecheck` passes for all 12 packages in the monorepo.

Screenshots / recordings
N/A — no UI changes visible in screenshots (toast is a runtime notification)
Checklist
Comparison with related PRs
Reviewed #24369, #26192, #24013, #18443, and the closed #13189:
`resolveFallbackChain` utility — minor convenience we can add later.

Key differences in our approach: