feat(session): add configurable fallback model chain #27939
Open
loss-and-quick wants to merge 3 commits into
Conversation
Signed-off-by: minicx <minicx@disroot.org>
Contributor
The following comment was made by an LLM, it may be inaccurate: Based on my search, I found one related PR that should be noted: Related PR:
Other potentially relevant PRs (not duplicates):
The current PR (#27939) is a continuation/improvement of #26292, not a duplicate. It's the successor that fixes issues found in production use of the original feature.
Author
@nexxeln, can you please review?
Fallback notices (e.g. 'Using GLM-4.5-Air while ... is cooling down') are stored with ignored: true but were not filtered out when converting assistant parts to model messages, causing them to leak into the LLM context and be repeated in responses.
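The fix can be pictured with a minimal sketch; the types and names here are illustrative, not the actual internals of the PR's codebase. Parts flagged `ignored: true` are dropped before assistant parts are converted into model messages:

```typescript
// Illustrative shapes only -- the real part/message types live in the codebase.
interface AssistantPart {
  text: string
  ignored?: boolean // fallback notices are stored with ignored: true
}

interface ModelMessage {
  role: "assistant"
  content: string
}

// Drop ignored parts (e.g. "Using GLM-4.5-Air while ... is cooling down")
// before conversion, so they never leak back into the LLM context.
function toModelMessage(parts: AssistantPart[]): ModelMessage {
  const visible = parts.filter((p) => !p.ignored)
  return { role: "assistant", content: visible.map((p) => p.text).join("") }
}
```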
Issue for this PR
Closes #7602
Related: refactors and extends the approach from #26292.
Type of change
What does this PR do?
Adds a configurable fallback chain: when the primary model returns a retryable error (rate limit, 5xx, overload, quota), the session switches to the next model in the chain instead of failing. The failed model is parked in a process-local cooldown so subsequent turns skip it until it recovers, then traffic flows back to primary on its own.
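The chain behaviour above can be sketched roughly as follows; this is a simplification with made-up names, not the PR's actual code. A process-local map records when each failed model recovers, and the picker walks the chain skipping parked entries:

```typescript
// Process-local cooldown registry: model id -> epoch-ms when it recovers.
// Names here are illustrative; the real implementation differs.
const cooldowns = new Map<string, number>()

// Park a model after a retryable error so subsequent turns skip it.
function park(model: string, ms: number, now = Date.now()): void {
  cooldowns.set(model, now + ms)
}

// Walk the chain (primary first, then fallbacks) and return the first model
// that is not cooling down. Traffic flows back to primary automatically once
// its cooldown expires, because primary is always checked first.
function pickModel(primary: string, fallbacks: string[], now = Date.now()): string | undefined {
  for (const model of [primary, ...fallbacks]) {
    const until = cooldowns.get(model) ?? 0
    if (until <= now) return model
  }
  return undefined // every model in the chain is parked
}
```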
```json
{
  "model": "anthropic/claude-sonnet-4-20250514",
  "fallbacks": ["openai/gpt-4.1", "deepseek/deepseek-v4"],
  "cooldown_seconds": 300
}
```

`fallbacks` can be set at the top level (applies to the main chat agent) or per-agent. `cooldown_seconds` defaults to `300`. Quota errors (weekly / monthly / "exceeded your …") take a hardcoded 6h cooldown instead, and `retry-after`/`retry-after-ms` headers are honoured when the provider sends them.

This is a continuation of #26292 with three correctness fixes that came up while running it:
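The cooldown rules described here (300s default, a hardcoded 6h for quota errors, provider retry hints taking precedence) could look roughly like this; the identifiers are made up for illustration, and the real parsing in the PR is more involved:

```typescript
const DEFAULT_COOLDOWN_MS = 300 * 1000         // cooldown_seconds default
const QUOTA_COOLDOWN_MS = 6 * 60 * 60 * 1000   // hardcoded 6h for quota errors

// Sketch of the cooldown-duration decision; e.g. a real retry-after header
// can also carry an HTTP date, which this sketch does not handle.
function cooldownMs(message: string, headers: Record<string, string>): number {
  // Provider-supplied retry hints win when present.
  const hintMs = headers["retry-after-ms"]
  if (hintMs && !Number.isNaN(Number(hintMs))) return Number(hintMs)
  const hintSec = headers["retry-after"]
  if (hintSec && !Number.isNaN(Number(hintSec))) return Number(hintSec) * 1000
  // Weekly/monthly quota exhaustion parks the model much longer.
  if (/quota|exceeded your/i.test(message)) return QUOTA_COOLDOWN_MS
  return DEFAULT_COOLDOWN_MS
}
```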
1. **Model attribution actually updates.** The original patch communicated "we used a fallback" by mutating the `StreamInput` object, but `llm.ts` spreads that object on the way in (`run({ ...input, abort })`), so the mutation never reached `prompt.ts`. As a result the DB kept the primary model, `wasOnFallback` was always `false`, the "Switched to …" toast fired on every subsequent turn, and the model name under each assistant message didn't change. Fix: `withFallback` publishes a `FallbackUsed` bus event instead. `SessionProcessor.process` subscribes for the lifetime of one process call, updates `ctx.assistantMessage.modelID`/`providerID`, and publishes `SessionEvent.Model.Updated`. The subscription is released via `Effect.ensuring`. This also makes the flow work for subsessions and title generation, which both go through the same processor.
2. **Hang after the first fallback.** If every model in the chain was on cooldown, `pickStart` returned a wait decision but the caller proceeded straight into `deps.call(primary)` without actually waiting — on some providers (notably self-hosted) this hung on a kept-alive socket. Replaced with a real `Effect.sleep` bounded by `WAIT_CAP_MS = 30s`, then re-pick the start entry.
3. **Toast spam.** The original patch fired both `FallbackTriggered` and `FallbackUsed` toasts, and there is also an inline `~> Switching to …` notice in the message stream — three notifications for one fallback. Kept only the `FallbackTriggered` warning toast; dropped the `FallbackUsed` info toast, since the inline notice already shows the same info with attribution to the specific message.

A few smaller things:
- The `sync-v2.tsx` handler for `session.next.model.updated` was using `draft.find(...)` (first assistant in the session) — changed to `activeAssistant(draft) ?? [...].reverse().find(...)` so it updates the current message.
- Fallback notices distinguish the three cases (`using`/`switch`/`resume`) with different colours, so it's obvious whether we're starting on a fallback because primary is cooling down, switched mid-stream after an error, or returning to primary after recovery.
- Title generation uses `agent.fallbacks` only, not the top-level `fallbacks`. The top-level list is sized for the main chat model and is typically too expensive for the small title pass; users who want title fallbacks configure them on the title agent.

How did you verify your code works?
Screenshots / recordings
Checklist