Skip to content

Fix audit comparison turns delta for cached summaries#43501

Open
pelikhan with Copilot wants to merge 3 commits into
mainfrom
copilot/cli-tools-test-fix-comparison-delta
Open

Fix audit comparison turns delta for cached summaries#43501
pelikhan with Copilot wants to merge 3 commits into
mainfrom
copilot/cli-tools-test-fix-comparison-delta

Conversation

Copilot AI commented Jul 5, 2026

Copy link
Copy Markdown
Contributor

audit comparison could report comparison.delta.turns.after = 0 even when the run had non-zero turns, causing false turns_decrease classifications. The bug occurred on cached-summary rendering paths where Run.Turns could be stale/zero while Metrics.Turns was correct.

  • Current-run snapshot backfill in cached path

    • Updated processedRunFromSummary to backfill processedRun.Run.Turns from summary.Metrics.Turns only when Run.Turns is zero.
    • Preserves existing non-zero Run.Turns values.
  • Regression coverage

    • Added tests for:
      • backfilling turns from metrics when cached run turns are missing
      • preserving existing run turns when already present
if processedRun.Run.Turns == 0 && summary.Metrics.Turns > 0 {
	processedRun.Run.Turns = summary.Metrics.Turns
}

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix comparison.delta.turns.after reporting incorrect value Fix audit comparison turns delta for cached summaries Jul 5, 2026
Copilot AI requested a review from pelikhan July 5, 2026 07:39
@pelikhan pelikhan marked this pull request as ready for review July 5, 2026 07:40
Copilot AI review requested due to automatic review settings July 5, 2026 07:40

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an audit comparison bug where cached-summary rendering could incorrectly treat Run.Turns as 0 (stale) even when the cached Metrics.Turns was non-zero, leading to incorrect turns_decrease classifications.

Changes:

  • Backfill ProcessedRun.Run.Turns from RunSummary.Metrics.Turns when Run.Turns is 0 and metrics turns are available.
  • Add regression tests covering both the backfill behavior and the “do not overwrite non-zero turns” behavior.
Show a summary per file
File Description
pkg/cli/audit.go Backfills processedRun.Run.Turns from cached summary metrics when the cached run metadata is missing turns.
pkg/cli/audit_test.go Adds unit tests to prevent regressions in cached-summary turns handling.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 2/2 changed files
  • Comments generated: 0
  • Review effort level: Low

@github-actions

github-actions Bot commented Jul 5, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jul 5, 2026

Copy link
Copy Markdown
Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions

github-actions Bot commented Jul 5, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jul 5, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #43501 does not have the 'implementation' label and has only 26 new lines of code in business logic directories (threshold: >100).

@github-actions github-actions Bot mentioned this pull request Jul 5, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Fix audit comparison turns delta for cached summaries

The fix is correct, well-scoped, and well-tested.

Root cause: processedRunFromSummary copies summary.Run into processedRun.Run, but Run.Turns can be zero when the summary was cached before that field was populated — while summary.Metrics.Turns is always authoritative (computed from logs). The backfill guard (Run.Turns == 0 && Metrics.Turns > 0) is the right condition and correctly preserves any pre-existing non-zero value.

What was reviewed:

  • processedRunFromSummary fix in pkg/cli/audit.go — minimal, correct, placed before the LogsPath assignment
  • Two regression tests in pkg/cli/audit_test.go — cover the backfill path and preserve-existing-turns path with clear assertions
  • Downstream in audit_comparison.godelta.Turns.After now gets the correct non-zero value, fixing the false turns_decrease classification

No issues found. LGTM.

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • proxy.golang.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "proxy.golang.org"

See Network Configuration for more information.

🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 31.1 AIC · ⌖ 4.99 AIC · ⊞ 4.8K

@github-actions

github-actions Bot commented Jul 5, 2026

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 90/100 — Excellent

Analyzed 2 test(s): 2 design, 0 implementation, 0 violation(s).

📊 Metrics (2 tests)
Metric Value
Analyzed 2 (Go: 2, JS: 0)
✅ Design 2 (100%)
⚠️ Implementation 0 (0%)
Edge/error coverage 2 (100%)
Duplicate clusters 0
Inflation ⚠️ Yes (7.67:1 — 23 test lines / 3 prod lines)
🚨 Violations 0
Test File Classification Issues
TestProcessedRunFromSummaryBackfillsTurnsFromMetrics pkg/cli/audit_test.go design_test / behavioral_contract / high_value None
TestProcessedRunFromSummaryPreservesExistingTurns pkg/cli/audit_test.go design_test / behavioral_contract / high_value None

Verdict

Passed. 0% implementation tests (threshold: 30%). Both tests verify distinct behavioral contracts of processedRunFromSummary: (1) backfilling turns from metrics when the run has no turn data, and (2) preserving existing turns when present. All assertions include descriptive failure messages. Build tag //go:build !integration is present.

⚠️ Inflation note: test-to-production line ratio is 7.67:1, above the 2:1 threshold. This is expected for a focused 3-line production fix — the test setup requires more lines than the conditional being tested. The tests are substantive, not inflated.

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • awmgmcpg

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "awmgmcpg"

See Network Configuration for more information.

🧪 Test quality analysis by Test Quality Sentinel · 28.5 AIC · ⌖ 12.7 AIC · ⊞ 6.8K ·
Comment /review to run again

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 90/100. 0% implementation tests (threshold: 30%). Both tests cover distinct behavioral contracts with descriptive assertions and no violations.

@github-actions

github-actions Bot commented Jul 5, 2026

Copy link
Copy Markdown
Contributor

Skills-Based Review 🧠

Applied /diagnosing-bugs and /tdd — two minor suggestions on this clean, well-scoped bug fix.

📋 Key Themes & Highlights

Suggestions

  1. Missing code comment (audit.go:501) — The backfill condition is non-obvious without knowing why Run.Turns can be stale on cached-summary paths. Without a comment, a future developer may treat it as dead code and remove it. See inline comment.

  2. Uncovered edge case (audit_test.go:130) — No test covers the case where both Run.Turns and Metrics.Turns are zero. The code handles it correctly (no backfill), but documenting it as an explicit invariant protects against future drift. See inline comment.

Positive Highlights

  • ✅ Surgical, minimal fix — 3 lines targeting only the stale-turns problem
  • ✅ Excellent test-to-code ratio: 23 lines of test coverage for 3 lines of logic
  • ✅ Defensive condition (Run.Turns == 0 && Metrics.Turns > 0) correctly preserves non-zero existing values
  • ✅ Test names read as specifications — easy to understand without reading the body
  • ✅ PR description clearly explains the root cause and fix approach

@copilot please address the review comments above.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 45.6 AIC · ⌖ 5.51 AIC · ⊞ 6.6K ·
Comment /matt to run again

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 45.6 AIC · ⌖ 5.51 AIC · ⊞ 6.6K
Comment /matt to run again

Comment thread pkg/cli/audit.go
GitHubRateLimitUsage: summary.GitHubRateLimitUsage,
JobDetails: summary.JobDetails,
}
if processedRun.Run.Turns == 0 && summary.Metrics.Turns > 0 {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnosing-bugs] The fix is correct, but the why is missing — future readers will not know that Run.Turns can be stale/zero specifically in the cached-summary rendering path.

💡 Suggested comment
// Run.Turns may be zero on cached-summary paths where the RunSummary was
// serialised before the run completed. Metrics.Turns is populated from log
// parsing and is authoritative; backfill here so that audit comparison deltas
// are computed from an accurate value.
if processedRun.Run.Turns == 0 && summary.Metrics.Turns > 0 {
    processedRun.Run.Turns = summary.Metrics.Turns
}

Without this context, the next developer may remove it as apparently defensive dead code.

@copilot please address this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the explanatory comment in 3d0db57.

Comment thread pkg/cli/audit_test.go
}
}

func TestProcessedRunFromSummaryBackfillsTurnsFromMetrics(t *testing.T) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Consider also testing the edge case where both Run.Turns and Metrics.Turns are zero — the current fix silently leaves turns as 0 in that scenario, which would still produce a false turns_decrease in comparison. A dedicated test makes the invariant explicit.

💡 Suggested test
func TestProcessedRunFromSummaryBothTurnsZero(t *testing.T) {
    summary := &RunSummary{
        Run:     WorkflowRun{DatabaseID: 789, Turns: 0},
        Metrics: LogMetrics{Turns: 0},
    }

    processed := processedRunFromSummary(summary, "/tmp/run-output")

    // Both are zero — no backfill occurs, and turns should remain 0.
    assert.Equal(t, 0, processed.Run.Turns, "turns should remain zero when neither Run.Turns nor Metrics.Turns is available")
}

Documents the known limitation: if a run genuinely has zero turns and Metrics.Turns is also zero, the delta will still reflect 0.

@copilot please address this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TestProcessedRunFromSummaryBothTurnsZero in 3d0db57.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REQUEST_CHANGES — incomplete fix scope

The processedRunFromSummary backfill is correct, but the same staleness bug exists in computeRunMetricsDiff (audit_diff.go) which reads summary.Run.Turns directly and is not patched by this PR. Old cached summaries will still produce zero turns in audit diff output. See inline comments for details and suggested fixes.

Review summary

High: computeRunMetricsDiff reads summary{1,2}.Run.Turns directly without the backfill — audit diff remains broken for old cached summaries.

Medium: Test suite is missing the edge case where both Run.Turns == 0 and Metrics.Turns == 0, leaving the Metrics.Turns > 0 guard untested.

🔎 Code quality review by PR Code Quality Reviewer · 112.3 AIC · ⌖ 6.55 AIC · ⊞ 5.4K
Comment /review to run again

Comment thread pkg/cli/audit_test.go
Metrics: LogMetrics{Turns: 34},
}

processed := processedRunFromSummary(summary, "/tmp/run-output")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test for the both-zeros edge case (Run.Turns == 0, Metrics.Turns == 0). The two new tests cover backfill (zero→non-zero) and preservation (non-zero→non-zero). Neither exercises the path where Metrics.Turns == 0 too — meaning the guard summary.Metrics.Turns > 0 is never exercised as false. A future regression that drops the > 0 check and unconditionally writes summary.Metrics.Turns back to processedRun.Run.Turns would not be caught by the current suite.

💡 Suggested test
func TestProcessedRunFromSummaryBothTurnsZero(t *testing.T) {
    summary := &RunSummary{
        Run:     WorkflowRun{DatabaseID: 789, Turns: 0},
        Metrics: LogMetrics{Turns: 0},
    }

    processed := processedRunFromSummary(summary, "/tmp/run-output")

    // Neither field has data; turns should remain 0, not be set to Metrics.Turns.
    assert.Equal(t, 0, processed.Run.Turns, "run turns should remain 0 when both Run.Turns and Metrics.Turns are 0")
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TestProcessedRunFromSummaryBothTurnsZero and also applied the same Run.Turns backfill from Metrics.Turns inside computeRunMetricsDiff (audit_diff.go) so audit diff is fixed for old cached summaries too — see 3d0db57.

@gh-aw-bot

Copy link
Copy Markdown
Collaborator

@copilot please run the pr-finisher skill, review the remaining feedback, ensure the branch is current, and rerun checks.

Generated by 👨‍🍳 PR Sous Chef · 16.1 AIC · ⌖ 6.23 AIC · ⊞ 4.6K ·
Comment /souschef to run again

…backfill, add both-zeros test

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Copilot AI requested a review from gh-aw-bot July 5, 2026 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[cli-tools-test] [exploratory-test] audit: comparison.delta.turns.after always reports 0 regardless of actual turn count

4 participants