Mitigate safe-output PR review rate-limit failures#39797
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR improves reliability of safe-output PR review submission under GitHub App installation token rate limiting by making execution-state metadata capture best-effort and adding bounded retry behavior around pulls.createReview, while avoiding additional metadata calls once the flow has degraded.
Changes:
- Add best-effort “before/after” review state capture that logs warnings and proceeds without metadata when rate-limited or otherwise failing.
- Wrap
pulls.createReviewin bounded retry logic (single retry, capped delay) and reuse it across fallback review submission paths. - Add targeted tests for degraded-mode state capture and
createReviewrate-limit retry behavior.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/pr_review_buffer.cjs | Makes review state capture non-fatal, skips after-state fetch in degraded mode, and introduces createReview retry wrapper. |
| actions/setup/js/pr_review_buffer.test.cjs | Adds tests covering rate-limited before-state capture and one-retry behavior for createReview. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 3
| const { repo, repoParts, pullRequestNumber, pullRequest } = reviewContext; | ||
| const beforeState = await fetchPullRequestReviewState(github, repoParts, pullRequestNumber); | ||
| const beforeState = await fetchReviewStateBestEffort(repoParts, pullRequestNumber, "before"); | ||
|
|
| const REVIEW_RATE_LIMIT_RETRY_CONFIG = { | ||
| ...RATE_LIMIT_RETRY_CONFIG, | ||
| maxRetries: 1, | ||
| maxDelayMs: 60000, | ||
| }; |
| async function fetchReviewStateBestEffort(repoParts, pullRequestNumber, phase) { | ||
| try { | ||
| return await fetchPullRequestReviewState(github, repoParts, pullRequestNumber); | ||
| } catch (error) { | ||
| core.warning(`Failed to capture ${phase} PR review state for #${pullRequestNumber}: ${getErrorMessage(error)}. Continuing without execution-state metadata.`); |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #39797 does not have the 'implementation' label (has_implementation_label=false) and has 0 new lines of code in business logic directories (well below the 100-line threshold). |
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
✅ Test Quality Sentinel completed test quality analysis. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /diagnose and /tdd — good targeted fix with solid primary tests; a few coverage gaps and one config-readability note worth addressing.
📋 Key Themes & Highlights
Key Themes
- Test coverage gaps: retry/degraded-mode tests exist only for the main submission path; the own-PR fallback, body-only fallback, and exhausted-retry paths are untested with the new behaviour
- Config transparency:
REVIEW_RATE_LIMIT_RETRY_CONFIGsilently inheritsinitialDelayMsandjitterMsfrom the base config; the actual first-retry sleep (~30 s withoutRetry-After) is not visible at the definition site
Positive Highlights
- ✅ Root cause correctly addressed — both the metadata-capture path and the
createReviewcall are now rate-limit resilient - ✅
fetchAfterStateIfAvailableshort-circuiting after-state when before-state failed is an elegant quota-conserving design - ✅
REVIEW_RATE_LIMIT_RETRY_CONFIG.maxRetries: 1is the right call — no runaway waits that eat into the workflow timeout window - ✅
attachExecutionState's falsy-guard onbeforeState/afterStatemeans null flows through cleanly; the test assertiontoBeUndefined()is correct - ✅ Both new tests use realistic error shapes (403 + rate-limit headers) matching what the GitHub API actually returns
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer
| ...RATE_LIMIT_RETRY_CONFIG, | ||
| maxRetries: 1, | ||
| maxDelayMs: 60000, | ||
| }; |
There was a problem hiding this comment.
[/diagnose] REVIEW_RATE_LIMIT_RETRY_CONFIG only overrides maxRetries and maxDelayMs — the inherited initialDelayMs: 15000 and jitterMs: 5000 are invisible here, but they determine the actual retry sleep duration.
💡 What this means in practice
Without a retry-after response header, the single retry will sleep:
min(initialDelayMs × backoffMultiplier, maxDelayMs) + jitter
= min(15 000 × 2, 60 000) + rand(0, 4 999)
= 30 000–34 999 ms (~30–35 s)
With retry-after: 1 (as in the test) it drops to 1 000–5 999 ms — but a live installation-token exhaustion often does not include that header.
This is almost certainly intentional (a 30 s single-stall budget seems fine), but nothing in the block makes that visible. Consider adding comments or explicit overrides:
const REVIEW_RATE_LIMIT_RETRY_CONFIG = {
...RATE_LIMIT_RETRY_CONFIG,
maxRetries: 1,
maxDelayMs: 60_000,
// inherits initialDelayMs:15000 → first backoff ~30 s when no Retry-After header
// inherits jitterMs:5000 → up to 5 s spread
};| } finally { | ||
| setTimeoutSpy.mockRestore(); | ||
| } | ||
| }); |
There was a problem hiding this comment.
[/tdd] The new retry test covers only the success-after-one-retry happy path. There is no test for the exhausted-retry case (initial call + retry both fail with rate-limit errors), which leaves the caller's error-handling path unverified.
💡 Why this matters
When maxRetries: 1 is exhausted, withRetry rethrows an enhanced error with extra metadata (retryable: true, suggestion: ...). The outer catch (error) block in submitReview calls getErrorMessage(error) and returns { success: false, error: ... }. That path looks correct, but a regression test would lock it in.
Suggested addition (sibling to this it block):
it("should return success:false when createReview rate-limit retry is exhausted", async () => {
buffer.setReviewMetadata("Looks good", "COMMENT");
buffer.setReviewContext({ repo: "owner/repo", repoParts: { owner: "owner", repo: "repo" }, pullRequestNumber: 42, pullRequest: { head: { sha: "abc123" } } });
const rateLimitError = new Error("API rate limit exceeded for installation");
rateLimitError.response = { status: 403, headers: { "x-ratelimit-remaining": "0", "retry-after": "1" } };
// Both attempts fail
mockGithub.rest.pulls.createReview.mockRejectedValue(rateLimitError);
const result = await buffer.submitReview();
expect(result.success).toBe(false);
expect(result.error).toBeDefined();
});| const { data: review } = await github.rest.pulls.createReview(requestParams); | ||
| const { data: review } = await createReviewWithRetry(requestParams); | ||
| await maybeSupersedeOlderReviews(review.id); | ||
| const afterState = await fetchAfterStateIfAvailable(); |
There was a problem hiding this comment.
[/tdd] The own-PR COMMENT fallback (here) and the body-only fallback (line ~626) both call createReviewWithRetry and fetchAfterStateIfAvailable, but neither path has a rate-limit or degraded-state test.
💡 Why this is worth covering
The main submission path now has two regression tests. The fallback paths share the same helpers so they're unlikely to break independently — but they're only reached via specific error conditions, so a rate-limit during one of them would produce an interleaved error-handling flow that hasn't been exercised.
At minimum, one test confirming that a rate-limit on createReview inside the own-PR fallback still returns { success: true } (after retry) or { success: false } (after exhaustion) would round out the story.
🧪 Test Quality Sentinel Report✅ Test Quality Score: 100/100 — Excellent
📊 Metrics & Test Classification (2 tests analyzed)
Go: 0 ( Verdict
|
There was a problem hiding this comment.
✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). Both tests cover behavioral contracts (rate-limit graceful degradation and retry-with-bounded-delay policy) with full error-path coverage.
There was a problem hiding this comment.
Request changes — four medium-severity issues collectively work against the quota-reduction goal this PR sets out to achieve.
💡 Blocking concerns
1. fetchReviewStateBestEffort catches all exceptions (new — inline comment, line 111)
The catch block is written as a blanket exception handler. Auth failures, wrong-repo targeting, 404s, and code bugs are silently swallowed and logged as warnings. Only rate-limit / transient failures should be suppressed; everything else must propagate so real breakage is visible.
2. fetchPullRequestReviewState fires both API requests concurrently under rate pressure (line 112)
When already rate-limited, the helper still launches both pulls.get and pulls.listReviews in parallel, burning quota on the second call even though the first will fail. A sequential / bail-fast pattern would match the stated quota-reduction goal.
3. REVIEW_RATE_LIMIT_RETRY_CONFIG inherits initialDelayMs: 15000 / jitterMs: 5000 (line 44)
With one retry and no Retry-After header, backoff produces a ~30–35 s sleep. Jitter is also added after maxDelayMs is applied, so the actual sleep can exceed maxDelayMs by up to ~5 s. If the intent is a short bounded wait for quota stalls, both values should be overridden explicitly.
4. beforeState captured before early-exit guards (line 276)
fetchReviewStateBestEffort is called before the staged-mode, empty-review, and missing-SHA guards. Two API calls are spent even when the review will be skipped entirely — the opposite of the PR's quota-reduction intent.
🔎 Code quality review by PR Code Quality Reviewer
| async function fetchReviewStateBestEffort(repoParts, pullRequestNumber, phase) { | ||
| try { | ||
| return await fetchPullRequestReviewState(github, repoParts, pullRequestNumber); | ||
| } catch (error) { |
There was a problem hiding this comment.
Catch-all hides real failures: this catches every exception — auth errors, wrong-repo targeting, permission regressions, unexpected API contract changes — and silently treats them all as null state. The PR intent is specifically quota-stall resilience, but the implementation is much wider.
💡 Suggested fix — narrow to rate-limit / transient errors only
} catch (error) {
// Only suppress transient / rate-limit failures; let real errors propagate.
if (!isTransientError(error)) throw error;
core.warning(`Failed to capture ${phase} PR review state for #${pullRequestNumber}: ${getErrorMessage(error)}. Continuing without execution-state metadata.`);
return null;
}Import isTransientError from error_recovery.cjs (already handles rate-limit patterns). A broken fetchPullRequestReviewState call signature, wrong credentials, or a 404 should surface as a real error — not disappear into a warning log line.
|
@copilot run pr-finisher skill |
|
@copilot review all comments and summarize the remaining blockers.
|
Executed the pr-finisher pass on this branch.
|
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Completed. I ran a full pr-finisher pass and pushed the fixes in commit |
Safe-output review finalization was failing with
403 API rate limit exceeded for installationafter buffering review comments. The hot path was spending quota on pre/post review metadata fetches aroundpulls.createReview, so a brief quota stall could drop the entire review.Review finalization
pulls.get/pulls.listReviewsfailures occur under rate pressurebefore_state/after_statemetadata when it is actually availableRate-limit recovery
pulls.createReviewin bounded retry logic for rate-limit responsesQuota reduction in degraded mode
pulls.getfirst, thenpulls.listReviews) to avoid spending an extra call when PR fetch failsTests
createReviewrate-limit failuresCOMMENTretry path is itself rate-limited and retried