
feat(opencode): add LLM provider fallback chain #26292

Open
j3k0 wants to merge 1 commit into anomalyco:dev from j3k0:feat/llm-fallback

Conversation

@j3k0 j3k0 commented May 8, 2026

Issue for this PR

Closes #7602

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Adds a configurable fallback chain so that when a provider returns a transient error (rate limit, overload, 5xx), OpenCode automatically retries on the next model in the chain instead of failing the session.

{
  "model": "anthropic/claude-sonnet-4-20250514",
  "fallbacks": ["openai/gpt-4.1", "deepseek/deepseek-v4"],
  "cooldown_seconds": 300
}

fallbacks can be set at the top level or per-agent. cooldown_seconds defaults to 300 — after a retryable failure, that provider/model is skipped for the cooldown duration so you don't wait on retries to an overloaded provider.
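The cooldown bookkeeping described above can be sketched as a small in-memory map. The class and method names below mirror the CooldownManager unit tests mentioned later in this PR, but the implementation itself is an assumption, not the PR's actual code:

```typescript
// Minimal sketch of cooldown tracking: a map from model id to expiry time.
// The `now` parameters exist only to make the sketch testable.
class CooldownManager {
  private expiries = new Map<string, number>(); // model id -> expiry (ms epoch)

  // Put a model on cooldown for `seconds` starting from `now`.
  put(model: string, seconds: number, now = Date.now()): void {
    this.expiries.set(model, now + seconds * 1000);
  }

  // A model is skipped while its cooldown has not yet expired.
  isOnCooldown(model: string, now = Date.now()): boolean {
    const expiry = this.expiries.get(model);
    return expiry !== undefined && expiry > now;
  }

  // Cleared on success, so the provider is immediately available again.
  clear(model: string): void {
    this.expiries.delete(model);
  }
}
```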

Why built-in instead of a proxy: cheaper providers are unreliable, and routing through LiteLLM degrades tool-call quality. When a provider gets overloaded, falling through immediately is faster than retrying the same one.

Design

Data flow

User sends message
       │
       ▼
  LLM.run(input)
       │
       ▼
  pickStart(primary, fallbacks, cooldown)
       │
       ├─ primary available → call primary
       │
       └─ primary on cooldown → find first available fallback
                                 │
                                 └─ all on cooldown → pick soonest expiry
       │
       ▼
  streamText() → Effect Stream
       │
       ├── success → Stream ends normally. Clear cooldown on the winning provider.
       │
       └── retryable error → CooldownManager.put(failed, duration)
                              │
                              ▼
              chainFallback → try next in chain
                              │
                              ├── success → prepend notice event, continue stream
                              │
                              └── error → chain next fallback (or fail)
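The pickStart step in the diagram can be sketched as a standalone function. This is an illustration of the selection rule (first available model in the chain, else the soonest-expiring one), not the PR's actual implementation, which presumably works against the cooldown manager:

```typescript
// Sketch of pickStart: prefer the primary, then the first available fallback;
// if everything is on cooldown, pick the model whose cooldown expires soonest
// so the wait is minimal. `expiries` maps model id -> cooldown expiry (ms).
function pickStart(
  primary: string,
  fallbacks: string[],
  expiries: Map<string, number>,
  now = Date.now(),
): string {
  const chain = [primary, ...fallbacks];
  const available = chain.find((m) => (expiries.get(m) ?? 0) <= now);
  if (available !== undefined) return available;
  // All on cooldown: choose the soonest expiry.
  return chain.reduce((best, m) =>
    (expiries.get(m) ?? 0) < (expiries.get(best) ?? 0) ? m : best,
  );
}
```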

Cooldown durations

Trigger                          Duration                         Source
───────────────────────────────  ───────────────────────────────  ─────────────────
Rate limit / 5xx / overload      cooldown_seconds (default 300)   Config
Quota limit (weekly/monthly)     6 hours                          Hardcoded
Provider sends retry-after       Parsed value                     Provider response

On success, the winning provider's cooldown is cleared so it's immediately available next request.
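The duration table above could be implemented roughly as follows. The error shape (`kind`, `retryAfterSeconds`) and the precedence of a parsed retry-after header over the other rules are assumptions; only the durations themselves come from the table:

```typescript
// Sketch of choosing a cooldown duration per the table above.
// The error shape here is hypothetical, not the PR's actual type.
type RetryableError = {
  kind: "rate_limit" | "overload" | "server_error" | "quota";
  retryAfterSeconds?: number; // parsed from a retry-after header, if present
};

function cooldownDuration(err: RetryableError, configSeconds = 300): number {
  // Assumed precedence: a provider-suggested retry-after wins.
  if (err.retryAfterSeconds !== undefined) return err.retryAfterSeconds;
  // Weekly/monthly quota limits get a long, hardcoded cooldown.
  if (err.kind === "quota") return 6 * 60 * 60;
  // Rate limit / overload / 5xx use the configured default.
  return configSeconds;
}
```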

Key decisions

  1. Cooldown over session state — We track what's failed, not what's succeeded. No "sticky" fallback — once a provider recovers, traffic routes back naturally.

  2. Stream-level error detection — Providers can return HTTP 200 with an error in the stream body. We wrap fullStream to throw on { type: "error" } chunks, triggering fallback just like a connection error.

  3. Quota limits trigger fallback (6h cooldown) — A quota-limited provider is unavailable, not the session. Fallback keeps the session alive. The 6h cooldown prevents hammering a capped provider.

  4. No dedup in the chain — The same model can appear twice. After falling through once, the primary may be worth retrying with fresh context. Cooldown handles this naturally — if primary is still on cooldown, it's skipped; if it's cleared, trying it again is valid.

  5. Model attribution updates on fallback — When a fallback succeeds, usedFallback propagates back so events, logs, and billing reflect the actual provider that handled the request.
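Decision 2 can be sketched with a wrapper that re-yields stream chunks but converts an in-band error chunk into a thrown error, so a 200-with-error response triggers fallback exactly like a connection failure. The real code wraps the AI SDK's fullStream, which is asynchronous; a synchronous iterable and a simplified chunk shape are used here for brevity:

```typescript
// Simplified sketch: surface { type: "error" } chunks as thrown errors.
type Chunk = { type: "text"; text: string } | { type: "error"; error: unknown };

function* throwOnErrorChunk(chunks: Iterable<Chunk>): Generator<Chunk> {
  for (const chunk of chunks) {
    if (chunk.type === "error") {
      // Throwing here makes an in-stream error indistinguishable from a
      // transport failure, so the same fallback path handles both.
      throw new Error(`provider returned error chunk: ${String(chunk.error)}`);
    }
    yield chunk;
  }
}
```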

Config contract

// Top-level (applies to all agents)
{
  "model": "anthropic/claude-sonnet-4-20250514",
  "fallbacks": ["openai/gpt-4.1", "deepseek/deepseek-v4"],
  "cooldown_seconds": 300
}

// Per-agent (overrides top-level)
{
  "agents": {
    "code": {
      "model": "anthropic/claude-sonnet-4-20250514",
      "fallbacks": ["openai/gpt-4.1"]
    }
  }
}

New fields: fallbacks (array of provider/model strings), cooldown_seconds (positive int, default 300).
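A validation sketch for the two new fields, matching the contract above (array of provider/model strings; positive integer defaulting to 300). The function name and error messages are assumptions; only the field semantics come from this PR:

```typescript
// Sketch of validating the new config fields described above.
function validateFallbackConfig(cfg: { fallbacks?: unknown; cooldown_seconds?: unknown }) {
  const fallbacks = (cfg.fallbacks ?? []) as unknown[];
  if (
    !Array.isArray(fallbacks) ||
    !fallbacks.every((f) => typeof f === "string" && f.includes("/"))
  ) {
    throw new Error("fallbacks must be an array of provider/model strings");
  }
  const cooldown = (cfg.cooldown_seconds ?? 300) as number;
  if (!Number.isInteger(cooldown) || cooldown <= 0) {
    throw new Error("cooldown_seconds must be a positive integer");
  }
  return { fallbacks: fallbacks as string[], cooldownSeconds: cooldown };
}
```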

Not in scope

  • Per-provider cooldown configuration (use cooldown_seconds globally for now)
  • TTFT timeout thresholds (per-provider config, separate feature)
  • Admin API to inspect/clear cooldown state (runtime only, no persistence)
  • Persisted cooldown across restarts (cooldowns reset on process restart)

How did you verify your code works?

  • Unit tests for CooldownManager (put/get/clear/expiry) and config validation (fallbacks array and cooldown_seconds)
  • Integration test: stream error with fallbacks triggers fallback, stream error without fallbacks halts with error
  • Running this in production for 1 week across daily work without issues
  • bun typecheck passes for all 12 packages in the monorepo

Screenshots / recordings

N/A — no UI changes visible in screenshots (toast is a runtime notification)

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Comparison with related PRs

Reviewed #24369, #26192, #24013, #18443, and the closed #13189:

Key differences in our approach:

  • Cooldown is superior to session state: remembers what's failed, not what's succeeded — no "sticky" fallback
  • Stream error detection via first-chunk peek catches 200-with-error responses
  • retry-after header parsing respects provider-suggested backoff
  • Quota limits trigger fallback with 6h cooldown: the provider is unavailable, not the session
  • No dedup in the chain: deliberate, so the primary can be retried after falling through once

@github-actions github-actions Bot added the needs:compliance label ("This means the issue will auto-close after 2 hours.") May 8, 2026

github-actions Bot commented May 8, 2026

The following comment was made by an LLM; it may be inaccurate:

Based on the search results, I found several related PRs that address similar functionality:

Potential Related PRs

  1. PR #26192 - fix(session): add fallback retry handling and harden pre-push bun path

    • Related because it adds fallback retry handling in the session layer, which is closely related to the fallback chain feature
  2. PR #24369 - feat(processor): add model fallback chain when retries are exhausted

    • Similar feature, but for the processor: implements a fallback chain mechanism when retries are exhausted
  3. PR #24013 - fix(opencode): stop retrying non-transient rate limits

    • Related to distinguishing transient errors (rate limits, 5xx) that should trigger fallbacks
  4. PR #18443 - fix(retry): retry transient 429 responses even when provider marks non-retryable

    • Related to handling transient 429 rate limit errors, which is a key trigger for the fallback chain

These PRs address related concerns around provider fallback chains, transient error handling, and retry logic, though they appear to be separate implementations in different components. PR #26292 appears to be the consolidated, comprehensive implementation of this functionality.

@github-actions github-actions Bot removed the needs:compliance label May 8, 2026

github-actions Bot commented May 8, 2026

Thanks for updating your PR! It now meets our contributing guidelines. 👍

@j3k0 j3k0 force-pushed the feat/llm-fallback branch 6 times, most recently from 9526a6b to bd85c37, on May 10, 2026 at 13:15
@j3k0 j3k0 force-pushed the feat/llm-fallback branch 5 times, most recently from bf9c1e7 to 0d07b43, on May 15, 2026 at 16:12


Development

Successfully merging this pull request may close these issues.

[FEATURE]: Native Model Fallback / Failover Support
