fix(coderd/x/chatd): retry quota 429s#26200
Conversation
|
/coder-agents-review |
|
Chat: Review in progress | View chat deep-review v0.7.1 | Round 2 | Status: Panel review (4 reviewers) Last posted: Round 1, 5 findings (1 P3, 3 Nit, 1 Note), COMMENT. Review Finding inventoryFindings
Round logRound 1Panel: Bisky, Hisoka, Mafu-san, Mafuuu, Pariston, Gon, Leorio, Ging-Go, Chopper, Ryosuke, Meruem, Kite, Knov. Netero: no findings. 1 P3, 3 Nit, 1 Note. Reviewed against ace1a5a..3957b18. About deep-reviewCRF = Coder Review Finding (P0-P4, Nit, Note)
|
There was a problem hiding this comment.
Precise fix for a real provider interop problem. The two-tier pattern split (usageLimitAnyStatusPatterns vs usageLimitPatterns gated on non-429) uses HTTP semantics as the disambiguation signal, which is the right abstraction. High test density (88.7%) with positive and negative coverage for the behavioral change, including realistic Gemini, Azure OpenAI, and structured provider error shapes. The relocated QuotaWith429Status test was expanded with a structured-error variant, not dropped.
1 P3, 3 Nit, 1 Note. The P3 is a test gap for non-429 structured-detail matching.
Hisoka: "Clean fight. The 429 gate is sound, the insufficient_quota carve-out is precise, and the test matrix covers the real provider error shapes."
🤖 This review was automatically generated with Coder Agents.
| deadline := errors.Is(err, context.DeadlineExceeded) || strings.Contains(lower, "context deadline exceeded") | ||
| overloadedMatch := statusCode == 529 || containsAny(lower, overloadedPatterns...) | ||
| usageLimitMatch := containsAny(lower, usageLimitPatterns...) | ||
| usageLimitText := lower + "\n" + strings.ToLower(structured.detail) |
There was a problem hiding this comment.
Note [CRF-5] structured.detail is now part of usage-limit matching but not other signals.
Other signal matchers (configMatch, authStrong, rateLimitMatch, etc.) still check only lower. This is correct scoping: rate limits have 429, auth has 401/403, and config patterns describe model names that appear in the error chain. Usage limits lack a dedicated HTTP code, so body text is their primary signal. Eight reviewers noted this independently and all concurred the asymmetry is intentional. Worth keeping in mind if future providers surface critical signals only in their response body. (Hisoka, Mafuuu, Gon, Leorio, Ryosuke, Meruem, Kite, Knov)
🤖
|
/coder-agents-review |
Provider 429 responses that mention quota or billing were classified as non-retryable usage limits before the rate-limit rule could run, which suppressed retries for Gemini and Azure OpenAI rate limits.
Treat broad quota and billing prose as rate-limit retryable when the provider returns HTTP 429, while preserving
insufficient_quotaas a terminal usage-limit signal for OpenAI billing exhaustion. Structured provider details are included in usage-limit matching so response-body error codes are classified consistently.Refs https://linear.app/codercom/issue/CODAGT-495/broad-quota-pattern-suppresses-retries-on-provider-429-rate-limit
Implementation plan and review context
Plan:
/home/coder/.coder/plans/PLAN-e03d6baf-4aad-48dc-9868-c5d836c6fae4.mdDeep review found that OpenAI documents
429 - You exceeded your current quota, please check your plan and billing detailsas a terminal billing exhaustion case, while Google Gemini and Azure OpenAI document 429 quota wording as retryable rate limiting. The implementation therefore gates broad quota/billing text on non-429 status while keeping the explicitinsufficient_quotacode terminal at any status.Generated by Coder Agents on behalf of @johnstcn.