fix(retry): retry transient 429 responses even when provider marks non-retryable#18443
Conversation
|
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please:
See CONTRIBUTING.md for details. |
|
The following comment was made by an LLM, it may be inaccurate: Based on my search, I found several related PRs that address retry logic, though none appear to be direct duplicates of PR #18443: Related PRs (not duplicates):
No duplicate PRs found |
|
Please provide a bit more info, what providers are u hitting this error with? When does a 429 set isRetryable to false? |
|
Thanks for updating your PR! It now meets our contributing guidelines. 👍 |
|
Good question. We’re seeing this on OpenAI-compatible/proxy paths (including OpenRouter-style setups). In those paths, a transient 429 can still be tagged with This PR only changes that case: 429s that look transient are retried; hard quota failures are still treated as terminal ( |
|
Rebased on upstream/dev (HEAD: 1256228). CI re-running. |
9065b79 to
1256228
Compare
|
Already on upstream/dev. HEAD: 1256228 |
|
Automated PR Cleanup Thank you for contributing to opencode. Due to the high volume of PRs from users and AI agents, we periodically close older PRs using automated criteria so maintainers can focus review time on the most active and community-supported contributions. This PR was closed because it matched the following cleanup criteria:
PRs created within the last month are not affected by this cleanup. If you believe this PR was closed incorrectly, or if you are still actively working on it, please leave a comment explaining why it should be reopened. A maintainer can review and reopen it if appropriate. Thanks again for taking the time to contribute. |
Issue for this PR
Closes #1712
Type of change
What does this PR do?
This fixes a retry classification gap for API 429 responses. Today, retry decisions in
SessionRetry.retryabletrustisRetryablefirst. Some OpenAI-compatible/provider-proxy paths can returnstatusCode=429while still markingisRetryable=false, which causes transient rate limits to stop the run instead of retrying.This change treats 429 as retryable when the error body looks like a transient rate limit, while still not retrying known quota-exhausted responses (
insufficient_quota,usage_not_included, free-usage exhaustion signatures). This keeps retries for temporary throttling but avoids retry loops for hard quota exhaustion.How did you verify your code works?
cd packages/opencode && bun run typecheckcd packages/opencode && bun test test/session/retry.test.tsisRetryable=falsefor transient casesScreenshots / recordings
Not a UI change.
Checklist