Inspect all completed forecast runs in gh aw forecast#39102
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
gh aw forecast
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
🧪 Test Quality Sentinel completed test quality analysis. |
There was a problem hiding this comment.
Pull request overview
This pull request updates gh aw forecast to sample all completed workflow runs (instead of only successful ones) so AI credit usage estimates include runs that consume credits even when jobs fail, while still excluding skipped runs from sampling and evaluation.
Changes:
- Update forecast sampling to request workflow runs with
Status: "completed"instead ofStatus: "success". - Apply the same completed-run query behavior in forecast evaluation/backtesting to keep training and validation populations aligned.
- Update the unit test to assert the new
"completed"status query.
Show a summary per file
| File | Description |
|---|---|
| pkg/cli/forecast.go | Switches workflow run listing filters from success to completed for both forecasting and evaluation. |
| pkg/cli/forecast_test.go | Renames/updates the test to verify the CLI now requests completed runs. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 0
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #39102 does not have the implementation label (has_implementation_label=false) and has only 4 new lines in business logic directories, well below the 100-line threshold (requires_adr_by_default_volume=false). |
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /diagnose and /tdd — overall the fix is correct and well-scoped; two suggestions to strengthen it.
📋 Key Themes & Highlights
Key Themes
- Test coverage gap: the
evaluateForecastbacktesting path useslistWorkflowRunsWithPaginationdirectly (not the injectableforecastListWorkflowRunsPaginated), so the"success" → "completed"fix there has no unit test regression guard. - Testability asymmetry:
forecastWorkflowandevaluateForecastfollow different injection patterns, making symmetric test coverage hard without a small refactor.
Positive Highlights
- ✅ Root cause correctly identified and fixed in both paths (
forecastWorkflowandevaluateForecast), keeping sampling and backtesting consistent. - ✅ Existing
isCompletedNonSkippedRunguard preserved — skipped runs remain excluded from AIC and success-rate estimates. - ✅ Test renamed and assertion updated to match the new behaviour — clean, minimal diff.
- ✅ PR description is precise about what changed and why.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 226.1 AIC · ⌖ 13.5 AIC · ⊞ 27.9K
| opts := ListWorkflowRunsOptions{ | ||
| WorkflowName: apiName, | ||
| Status: "success", | ||
| Status: "completed", |
There was a problem hiding this comment.
[/diagnose] This fix is correct, but the evaluateForecast path has no unit test regression coverage.
Unlike forecastWorkflow, which calls listRunsWithBackoff → forecastListWorkflowRunsPaginated (the injectable var at line 50), evaluateForecast calls listWorkflowRunsWithPagination directly on line 1126. That direct call means no mock can capture capturedOpts.Status here, so the "success" → "completed" fix in this function cannot be pinned with a unit test today.
💡 Suggestion: introduce an injectable variable for `evaluateForecast`
Add a package-level var mirroring the existing pattern:
// near top of file, next to forecastListWorkflowRunsPaginated
var forecastEvalListWorkflowRuns = listWorkflowRunsWithPaginationThen replace the direct call on line 1126:
// before
runs, _, err := listWorkflowRunsWithPagination(opts)
// after
runs, _, err := forecastEvalListWorkflowRuns(opts)This lets a test capture opts.Status on the validation path, closing the coverage gap symmetrically with TestForecastWorkflow_RequestsCompletedRuns.
| }, 30) | ||
| require.NoError(t, err) | ||
| assert.Equal(t, "success", capturedOpts.Status) | ||
| assert.Equal(t, "completed", capturedOpts.Status) |
There was a problem hiding this comment.
[/tdd] This test covers the forecastWorkflow sampling path — consider adding a parallel test for evaluateForecast to lock in the fix on the backtesting path too.
💡 Sketch of a parallel test
Once forecastEvalListWorkflowRuns (or equivalent) is made injectable:
func TestEvaluateForecast_RequestsCompletedRuns(t *testing.T) {
original := forecastEvalListWorkflowRuns
t.Cleanup(func() { forecastEvalListWorkflowRuns = original })
var capturedOpts ListWorkflowRunsOptions
forecastEvalListWorkflowRuns = func(opts ListWorkflowRunsOptions) ([]WorkflowRun, int, error) {
capturedOpts = opts
return nil, 0, nil
}
evaluateForecast(context.Background(), "smoke-copilot", ForecastWorkflowResult{}, "2026-01-01", "2026-02-01", ForecastConfig{})
assert.Equal(t, "completed", capturedOpts.Status)
}This mirrors the existing TestForecastWorkflow_RequestsCompletedRuns and ensures both sampling paths stay in sync.
🧪 Test Quality Sentinel Report
📊 Metrics & Test Classification (1 test analyzed)
Test Classification Details
Language SupportTests analyzed:
|
gh aw forecastwas sampling only successful runs, which undercounted AI credit usage for workflows where failed jobs still consumed credits. This change broadens sampling to all completed runs while continuing to exclude skipped runs from forecast inputs.Behavior change
completedworkflow runs instead ofsuccessruns when building forecast samples.Skipped-run handling
conclusion == "skipped"from sampling and metric computation.Test coverage