
feat(opencode): add LLM provider fallback chain #26292

Open
j3k0 wants to merge 1 commit into anomalyco:dev from j3k0:feat/llm-fallback

Conversation

@j3k0 j3k0 commented May 8, 2026

Issue for this PR

Closes #7602

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Adds a configurable fallback chain so that when a provider returns a transient error (rate limit, overload, 5xx), OpenCode automatically retries on the next model in the chain instead of failing the session.

{
  "model": "anthropic/claude-sonnet-4-20250514",
  "fallbacks": ["openai/gpt-4.1", "deepseek/deepseek-v4"],
  "cooldown_seconds": 300
}

fallbacks can be set at the top level or per-agent. cooldown_seconds defaults to 300 — after a retryable failure, that provider/model is skipped for the cooldown duration so you don't wait on retries to an overloaded provider.
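The cooldown bookkeeping described above can be sketched as a small in-memory map. The class and method names below mirror the CooldownManager unit tests mentioned later in this PR, but the implementation itself is an assumption, not the PR's actual code:

```typescript
// Minimal sketch of cooldown tracking: a map from model id to expiry time.
// The `now` parameters exist only to make the sketch testable.
class CooldownManager {
  private expiries = new Map<string, number>(); // model id -> expiry (ms epoch)

  // Put a model on cooldown for `seconds` starting from `now`.
  put(model: string, seconds: number, now = Date.now()): void {
    this.expiries.set(model, now + seconds * 1000);
  }

  // A model is skipped while its cooldown has not yet expired.
  isOnCooldown(model: string, now = Date.now()): boolean {
    const expiry = this.expiries.get(model);
    return expiry !== undefined && expiry > now;
  }

  // Cleared on success, so the provider is immediately available again.
  clear(model: string): void {
    this.expiries.delete(model);
  }
}
```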

Why built-in instead of a proxy: cheaper providers are unreliable, and routing through LiteLLM degrades tool-call quality. When a provider gets overloaded, falling through immediately is faster than retrying the same one.

Design

Data flow

User sends message
       │
       ▼
  LLM.run(input)
       │
       ▼
  pickStart(primary, fallbacks, cooldown)
       │
       ├─ primary available → call primary
       │
       └─ primary on cooldown → find first available fallback
                                 │
                                 └─ all on cooldown → pick soonest expiry
       │
       ▼
  streamText() → Effect Stream
       │
       ├── success → Stream ends normally. Clear cooldown on the winning provider.
       │
       └── retryable error → CooldownManager.put(failed, duration)
                              │
                              ▼
              chainFallback → try next in chain
                              │
                              ├── success → prepend notice event, continue stream
                              │
                              └── error → chain next fallback (or fail)
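The pickStart step in the diagram can be sketched as a standalone function. This is an illustration of the selection rule (first available model in the chain, else the soonest-expiring one), not the PR's actual implementation, which presumably works against the cooldown manager:

```typescript
// Sketch of pickStart: prefer the primary, then the first available fallback;
// if everything is on cooldown, pick the model whose cooldown expires soonest
// so the wait is minimal. `expiries` maps model id -> cooldown expiry (ms).
function pickStart(
  primary: string,
  fallbacks: string[],
  expiries: Map<string, number>,
  now = Date.now(),
): string {
  const chain = [primary, ...fallbacks];
  const available = chain.find((m) => (expiries.get(m) ?? 0) <= now);
  if (available !== undefined) return available;
  // All on cooldown: choose the soonest expiry.
  return chain.reduce((best, m) =>
    (expiries.get(m) ?? 0) < (expiries.get(best) ?? 0) ? m : best,
  );
}
```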

Cooldown durations

Trigger                          Duration                         Source
───────────────────────────────  ───────────────────────────────  ─────────────────
Rate limit / 5xx / overload      cooldown_seconds (default 300)   Config
Quota limit (weekly/monthly)     6 hours                          Hardcoded
Provider sends retry-after       Parsed value                     Provider response

On success, the winning provider's cooldown is cleared so it's immediately available next request.
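The duration table above could be implemented roughly as follows. The error shape (`kind`, `retryAfterSeconds`) and the precedence of a parsed retry-after header over the other rules are assumptions; only the durations themselves come from the table:

```typescript
// Sketch of choosing a cooldown duration per the table above.
// The error shape here is hypothetical, not the PR's actual type.
type RetryableError = {
  kind: "rate_limit" | "overload" | "server_error" | "quota";
  retryAfterSeconds?: number; // parsed from a retry-after header, if present
};

function cooldownDuration(err: RetryableError, configSeconds = 300): number {
  // Assumed precedence: a provider-suggested retry-after wins.
  if (err.retryAfterSeconds !== undefined) return err.retryAfterSeconds;
  // Weekly/monthly quota limits get a long, hardcoded cooldown.
  if (err.kind === "quota") return 6 * 60 * 60;
  // Rate limit / overload / 5xx use the configured default.
  return configSeconds;
}
```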

Key decisions

  1. Cooldown over session state — We track what's failed, not what's succeeded. No "sticky" fallback — once a provider recovers, traffic routes back naturally.

  2. Stream-level error detection — Providers can return HTTP 200 with an error in the stream body. We wrap fullStream to throw on { type: "error" } chunks, triggering fallback just like a connection error.

  3. Quota limits trigger fallback (6h cooldown) — A quota-limited provider is unavailable, not the session. Fallback keeps the session alive. The 6h cooldown prevents hammering a capped provider.

  4. No dedup in the chain — The same model can appear twice. After falling through once, the primary may be worth retrying with fresh context. Cooldown handles this naturally — if primary is still on cooldown, it's skipped; if it's cleared, trying it again is valid.

  5. Model attribution updates on fallback — When a fallback succeeds, usedFallback propagates back so events, logs, and billing reflect the actual provider that handled the request.
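Decision 2 can be sketched with a wrapper that re-yields stream chunks but converts an in-band error chunk into a thrown error, so a 200-with-error response triggers fallback exactly like a connection failure. The real code wraps the AI SDK's fullStream, which is asynchronous; a synchronous iterable and a simplified chunk shape are used here for brevity:

```typescript
// Simplified sketch: surface { type: "error" } chunks as thrown errors.
type Chunk = { type: "text"; text: string } | { type: "error"; error: unknown };

function* throwOnErrorChunk(chunks: Iterable<Chunk>): Generator<Chunk> {
  for (const chunk of chunks) {
    if (chunk.type === "error") {
      // Throwing here makes an in-stream error indistinguishable from a
      // transport failure, so the same fallback path handles both.
      throw new Error(`provider returned error chunk: ${String(chunk.error)}`);
    }
    yield chunk;
  }
}
```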

Config contract

// Top-level (applies to all agents)
{
  "model": "anthropic/claude-sonnet-4-20250514",
  "fallbacks": ["openai/gpt-4.1", "deepseek/deepseek-v4"],
  "cooldown_seconds": 300
}

// Per-agent (overrides top-level)
{
  "agents": {
    "code": {
      "model": "anthropic/claude-sonnet-4-20250514",
      "fallbacks": ["openai/gpt-4.1"]
    }
  }
}

New fields: fallbacks (array of provider/model strings), cooldown_seconds (positive int, default 300).
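A validation sketch for the two new fields, matching the contract above (array of provider/model strings; positive integer defaulting to 300). The function name and error messages are assumptions; only the field semantics come from this PR:

```typescript
// Sketch of validating the new config fields described above.
function validateFallbackConfig(cfg: { fallbacks?: unknown; cooldown_seconds?: unknown }) {
  const fallbacks = (cfg.fallbacks ?? []) as unknown[];
  if (
    !Array.isArray(fallbacks) ||
    !fallbacks.every((f) => typeof f === "string" && f.includes("/"))
  ) {
    throw new Error("fallbacks must be an array of provider/model strings");
  }
  const cooldown = (cfg.cooldown_seconds ?? 300) as number;
  if (!Number.isInteger(cooldown) || cooldown <= 0) {
    throw new Error("cooldown_seconds must be a positive integer");
  }
  return { fallbacks: fallbacks as string[], cooldownSeconds: cooldown };
}
```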

Not in scope

  • Per-provider cooldown configuration (use cooldown_seconds globally for now)
  • TTFT timeout thresholds (per-provider config, separate feature)
  • Admin API to inspect/clear cooldown state (runtime only, no persistence)
  • Persisted cooldown across restarts (cooldowns reset on process restart)

How did you verify your code works?

  • Unit tests for CooldownManager (put/get/clear/expiry) and config validation (fallbacks array and cooldown_seconds)
  • Integration test: stream error with fallbacks triggers fallback, stream error without fallbacks halts with error
  • Running this in production for 1 week across daily work without issues
  • bun typecheck passes for all 12 packages in the monorepo

Screenshots / recordings

N/A — no UI changes visible in screenshots (toast is a runtime notification)

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Comparison with related PRs

Reviewed #24369, #26192, #24013, #18443, and the closed #13189:

Key differences in our approach:

  • Cooldown is superior to session state: remembers what's failed, not what's succeeded — no "sticky" fallback
  • Stream error detection via first-chunk peek catches 200-with-error responses
  • retry-after header parsing respects provider-suggested backoff
  • Quota limits trigger fallback with 6h cooldown: the provider is unavailable, not the session
  • No dedup in the chain: deliberate, so the primary can be retried after falling through once

@github-actions github-actions Bot added the needs:compliance label ("This means the issue will auto-close after 2 hours.") May 8, 2026

github-actions Bot commented May 8, 2026

The following comment was made by an LLM; it may be inaccurate:

Based on the search results, I found several related PRs that address similar functionality:

Potential Related PRs

  1. PR #26192 - fix(session): add fallback retry handling and harden pre-push bun path

    • Related because it adds fallback retry handling in the session layer, which is closely related to the fallback chain feature
  2. PR #24369 - feat(processor): add model fallback chain when retries are exhausted

    • Similar feature, but for the processor: implements a fallback chain mechanism when retries are exhausted
  3. PR #24013 - fix(opencode): stop retrying non-transient rate limits

    • Related to distinguishing transient errors (rate limits, 5xx) that should trigger fallbacks
  4. PR #18443 - fix(retry): retry transient 429 responses even when provider marks non-retryable

    • Related to handling transient 429 rate limit errors, which is a key trigger for the fallback chain

These PRs address related concerns around provider fallback chains, transient error handling, and retry logic, though they appear to be separate implementations in different components. PR #26292 appears to be the consolidated, comprehensive implementation of this functionality.

@github-actions github-actions Bot removed the needs:compliance label May 8, 2026

github-actions Bot commented May 8, 2026

Thanks for updating your PR! It now meets our contributing guidelines. 👍

@j3k0 j3k0 force-pushed the feat/llm-fallback branch 6 times, most recently from 9526a6b to bd85c37, on May 10, 2026 at 13:15
@j3k0 j3k0 force-pushed the feat/llm-fallback branch 5 times, most recently from bf9c1e7 to 0d07b43, on May 15, 2026 at 16:12


Development

Successfully merging this pull request may close these issues.

[FEATURE]: Native Model Fallback / Failover Support
