Inspect all completed forecast runs in `gh aw forecast` by Copilot · Pull Request #39102 · github/gh-aw

Copilot · 2026-06-13T15:46:30Z

gh aw forecast was sampling only successful runs, which undercounted AI credit usage for workflows where failed jobs still consumed credits. This change broadens sampling to all completed runs while continuing to exclude skipped runs from forecast inputs.

Behavior change
- Request completed workflow runs instead of success runs when building forecast samples.
- Apply the same completed-run filter in evaluation/backtesting so forecast inputs and validation observe the same run population.
Skipped-run handling
- Preserve the existing guard that excludes runs with conclusion == "skipped" from sampling and metric computation.
- This keeps no-op executions out of AIC and success-rate estimates.
Test coverage
- Update the forecast unit test to assert that the command now queries completed runs rather than only successful runs.

opts := ListWorkflowRunsOptions{
    WorkflowName: apiName,
    Status:       "completed",
    StartDate:    startDate,
    Limit:        config.SampleSize,
}

func isCompletedNonSkippedRun(r WorkflowRun) bool {
    return r.Status == "completed" && r.Conclusion != "skipped"
}

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions · 2026-06-13T16:00:59Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-06-13T16:01:04Z

🧪 Test Quality Sentinel completed test quality analysis.

Copilot

Pull request overview

This pull request updates gh aw forecast to sample all completed workflow runs (instead of only successful ones) so AI credit usage estimates include runs that consume credits even when jobs fail, while still excluding skipped runs from sampling and evaluation.

Changes:

Update forecast sampling to request workflow runs with Status: "completed" instead of Status: "success".
Apply the same completed-run query behavior in forecast evaluation/backtesting to keep training and validation populations aligned.
Update the unit test to assert the new "completed" status query.

Show a summary per file

File	Description
pkg/cli/forecast.go	Switches workflow run listing filters from `success` to `completed` for both forecasting and evaluation.
pkg/cli/forecast_test.go	Renames/updates the test to verify the CLI now requests `completed` runs.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 2/2 changed files
Comments generated: 0

github-actions · 2026-06-13T16:02:16Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #39102 does not have the implementation label (has_implementation_label=false) and has only 4 new lines in business logic directories, well below the 100-line threshold (requires_adr_by_default_volume=false).

github-actions

Skills-Based Review 🧠

Applied /diagnose and /tdd — overall the fix is correct and well-scoped; two suggestions to strengthen it.

📋 Key Themes & Highlights

Key Themes

Test coverage gap: the evaluateForecast backtesting path uses listWorkflowRunsWithPagination directly (not the injectable forecastListWorkflowRunsPaginated), so the "success" → "completed" fix there has no unit test regression guard.
Testability asymmetry: forecastWorkflow and evaluateForecast follow different injection patterns, making symmetric test coverage hard without a small refactor.

Positive Highlights

✅ Root cause correctly identified and fixed in both paths (forecastWorkflow and evaluateForecast), keeping sampling and backtesting consistent.
✅ Existing isCompletedNonSkippedRun guard preserved — skipped runs remain excluded from AIC and success-rate estimates.
✅ Test renamed and assertion updated to match the new behaviour — clean, minimal diff.
✅ PR description is precise about what changed and why.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 226.1 AIC · ⌖ 13.5 AIC · ⊞ 27.9K

github-actions · 2026-06-13T16:07:19Z

 	opts := ListWorkflowRunsOptions{
 		WorkflowName: apiName,
-		Status:       "success",
+		Status:       "completed",


[/diagnose] This fix is correct, but the evaluateForecast path has no unit test regression coverage.

Unlike forecastWorkflow, which calls listRunsWithBackoff → forecastListWorkflowRunsPaginated (the injectable var at line 50), evaluateForecast calls listWorkflowRunsWithPagination directly on line 1126. That direct call means no mock can capture capturedOpts.Status here, so the "success" → "completed" fix in this function cannot be pinned with a unit test today.

💡 Suggestion: introduce an injectable variable for `evaluateForecast`

Add a package-level var mirroring the existing pattern:

// near top of file, next to forecastListWorkflowRunsPaginated var forecastEvalListWorkflowRuns = listWorkflowRunsWithPagination

Then replace the direct call on line 1126:

// before runs, _, err := listWorkflowRunsWithPagination(opts) // after runs, _, err := forecastEvalListWorkflowRuns(opts)

This lets a test capture opts.Status on the validation path, closing the coverage gap symmetrically with TestForecastWorkflow_RequestsCompletedRuns.

github-actions · 2026-06-13T16:07:19Z

 	}, 30)
 	require.NoError(t, err)
-	assert.Equal(t, "success", capturedOpts.Status)
+	assert.Equal(t, "completed", capturedOpts.Status)


[/tdd] This test covers the forecastWorkflow sampling path — consider adding a parallel test for evaluateForecast to lock in the fix on the backtesting path too.

💡 Sketch of a parallel test

Once forecastEvalListWorkflowRuns (or equivalent) is made injectable:

func TestEvaluateForecast_RequestsCompletedRuns(t *testing.T) { original := forecastEvalListWorkflowRuns t.Cleanup(func() { forecastEvalListWorkflowRuns = original }) var capturedOpts ListWorkflowRunsOptions forecastEvalListWorkflowRuns = func(opts ListWorkflowRunsOptions) ([]WorkflowRun, int, error) { capturedOpts = opts return nil, 0, nil } evaluateForecast(context.Background(), "smoke-copilot", ForecastWorkflowResult{}, "2026-01-01", "2026-02-01", ForecastConfig{}) assert.Equal(t, "completed", capturedOpts.Status) }

This mirrors the existing TestForecastWorkflow_RequestsCompletedRuns and ensures both sampling paths stay in sync.

github-actions · 2026-06-13T16:08:15Z

🧪 Test Quality Sentinel Report

⚠️ Test Quality Score: 70/100 — Acceptable, with suggestions

Analyzed 1 test (modified): 1 design test, 0 implementation tests, 0 guideline violations. The test correctly encodes the behavioral contract of the "completed" status filter change.

📊 Metrics & Test Classification (1 test analyzed)

Metric	Value
New/modified tests analyzed	1
✅ Design tests (behavioral contracts)	1 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	0 (0%)
Duplicate test clusters	0
Test inflation detected	No
🚨 Coding-guideline violations	0

Test Classification Details

Test	File	Classification	Issues Detected
`TestForecastWorkflow_RequestsCompletedRuns`	`pkg/cli/forecast_test.go:353`	✅ Design	Missing assertion messages; no error-path coverage

Language Support

Tests analyzed:

🐹 Go (*_test.go): 1 test — unit (//go:build !integration)

⚠️ Flagged Tests — Requires Review (1 issue)

⚠️ `TestForecastWorkflow_RequestsCompletedRuns` (`pkg/cli/forecast_test.go:353`)

Classification: Design test ✅ (but with quality gaps)

Issue 1 — Missing assertion messages: Both assertions lack a descriptive message argument:

require.NoError(t, err)                             // no message
assert.Equal(t, "completed", capturedOpts.Status)  // no message

The project guideline requires a message on every assertion call so failures are self-describing. Suggested:

require.NoError(t, err, "forecastWorkflow should not return an error")
assert.Equal(t, "completed", capturedOpts.Status, "forecastWorkflow must request completed runs, not just successful ones")

Issue 2 — Happy-path only: The test verifies the status filter is set correctly when the call succeeds, but does not test the error path (e.g. what happens when forecastListWorkflowRunsPaginated returns an error). A complementary table-driven test would increase confidence.

What design invariant does this test enforce? It verifies that forecastWorkflow requests "completed" runs from the GitHub API — the behavioral requirement that both successful and failed/cancelled runs are sampled for forecasting purposes.

What would break if deleted? A future refactor reverting Status to "success" would go undetected, silently excluding failed and cancelled runs from the forecast sample and producing skewed results.

Verdict

✅ Check passed. 0% of new/modified tests are implementation tests (threshold: 30%). No coding-guideline violations detected.

The edit is a focused correctness fix — the renamed test and updated assertion accurately reflect the new behavioral contract. The quality score is 70 rather than higher only because the test covers the happy path exclusively and is missing assertion messages.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §27471709397

🧪 Test quality analysis by Test Quality Sentinel · 177.3 AIC · ⌖ 27.5 AIC · ⊞ 27.3K · ◷

github-actions

✅ Test Quality Sentinel: 70/100. Test quality is acceptable — 0% of new/modified tests are implementation tests (threshold: 30%).

Inspect completed forecast runs

7ef545b

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI assigned Copilot and pelikhan Jun 13, 2026

Copilot AI changed the title ~~Inspect completed runs in forecast command~~ Inspect all completed forecast runs in gh aw forecast Jun 13, 2026

Copilot finished work on behalf of pelikhan June 13, 2026 15:46

Copilot created this pull request from a session on behalf of pelikhan June 13, 2026 15:46 View session

Copilot AI requested a review from pelikhan June 13, 2026 15:46

pelikhan marked this pull request as ready for review June 13, 2026 15:59

Copilot AI review requested due to automatic review settings June 13, 2026 15:59

Copilot started reviewing on behalf of pelikhan June 13, 2026 16:00 View session

Copilot AI reviewed Jun 13, 2026

View reviewed changes

pelikhan merged commit 3fa56b1 into main Jun 13, 2026
68 of 80 checks passed

pelikhan deleted the copilot/review-forecast-command branch June 13, 2026 16:03

github-actions Bot mentioned this pull request Jun 13, 2026

[aw] No-Op Runs #38951

Closed

github-actions Bot reviewed Jun 13, 2026

View reviewed changes

github-actions Bot approved these changes Jun 13, 2026

View reviewed changes

This was referenced Jun 13, 2026

Smoke Test: Claude - 27471836462 #39110

Closed

[docs] Update documentation for features from 2026-06-14 #39214

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inspect all completed forecast runs in `gh aw forecast`#39102

Inspect all completed forecast runs in `gh aw forecast`#39102
pelikhan merged 1 commit into
mainfrom
copilot/review-forecast-command

Copilot AI commented Jun 13, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

github-actions Bot commented Jun 13, 2026 •

edited

Loading

Uh oh!

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot Jun 13, 2026

Uh oh!

github-actions Bot Jun 13, 2026

Uh oh!

github-actions Bot commented Jun 13, 2026

Test Classification Details

Language Support

⚠️ `TestForecastWorkflow_RequestsCompletedRuns` (`pkg/cli/forecast_test.go:353`)

Uh oh!

github-actions Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Copilot AI commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

github-actions Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Key Themes

Positive Highlights

Uh oh!

github-actions Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 13, 2026

🧪 Test Quality Sentinel Report

Test Classification Details

Language Support

⚠️ TestForecastWorkflow_RequestsCompletedRuns (pkg/cli/forecast_test.go:353)

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Jun 13, 2026 •

edited

Loading

github-actions Bot commented Jun 13, 2026 •

edited

Loading

github-actions Bot commented Jun 13, 2026 •

edited

Loading

github-actions Bot commented Jun 13, 2026 •

edited

Loading

⚠️ `TestForecastWorkflow_RequestsCompletedRuns` (`pkg/cli/forecast_test.go:353`)