optimize: reduce claude-code-user-docs-review AIC cost ~20–33% by eliminating redundant main-agent work #38401

Merged
pelikhan merged 3 commits into main from copilot/claude-code-user-docs-review
Jun 10, 2026

Conversation

Copilot AI commented Jun 10, 2026

The claude-code-user-docs-review workflow was consuming ~705 AIC/run due to the main agent duplicating work already delegated to sub-agents (redundant file reads, post-processing loops, and an over-specified output template).
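
As a rough worked example of the headline numbers, the sketch below applies the ~20–33% reduction from the PR title to the ~705 AIC/run baseline. It assumes the percentage applies to that baseline as a whole; the function name `projected_cost` is illustrative and not part of the workflow.

```python
def projected_cost(baseline_aic: float, reduction: float) -> float:
    """Return the per-run cost after removing `reduction` (a 0..1 fraction) of the baseline."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be a fraction between 0 and 1")
    return baseline_aic * (1.0 - reduction)


BASELINE_AIC = 705.0  # per-run figure quoted in the PR description

low_saving = projected_cost(BASELINE_AIC, 0.20)   # conservative end of the estimate
high_saving = projected_cost(BASELINE_AIC, 0.33)  # optimistic end of the estimate
print(f"projected range: {high_saving:.0f} to {low_saving:.0f} AIC/run")
```

Under that assumption the projected cost lands at roughly 472 to 564 AIC per run.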

Phase 1 — eliminate duplicate file reads

Replaced the 6-item numbered file list + trailing sub-agent dispatch with a single direct sub-agent launch. Previously the main agent cat'd all 6 docs before launching doc-reader, which reads the same files. The main agent now delegates entirely:

-Start by reading the essential documentation files…
-1. README.md  2. quick-start.md  3. …
-
-Use the `doc-reader` agent to gather structured facts…
+Launch the `doc-reader` agent and wait for its JSON output.
+Use its output as the sole factual basis for Phases 2, 3, and 7 — do not read the documentation files directly.

Phases 4–6 — remove post-sub-agent synthesis loops

Stripped the "Analyze:" / "Questions to answer:" / "Check for:" follow-up blocks. Sub-agent outputs (parity_observations, classification table, auth_gaps_or_missing_instructions) now pipe directly into Phase 7 sections without the main agent re-answering questions the sub-agents already answer.

Phase 7 — collapse output template

Reduced from 9 sections to 5 with an explicit 1,000-word cap. Removed: Persona Context (redundant with workflow description), standalone Example Parity (subsumed into Engine & Tool Matrix), and Conclusion /10 (redundant with Recommended Actions priority ranking).
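
A minimal sketch of how the 1,000-word cap could be checked after the fact. The prompt does not say how words are counted, so splitting on whitespace is an assumption, and `within_word_cap` is a hypothetical helper rather than anything in the workflow.

```python
def within_word_cap(report: str, cap: int = 1000) -> bool:
    """Return True when the report has at most `cap` whitespace-delimited words."""
    # str.split() with no arguments collapses runs of whitespace and newlines.
    return len(report.split()) <= cap


if __name__ == "__main__":
    draft = "Executive summary of today's review. " * 10
    print(within_word_cap(draft))
```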

History file consolidation

Added a directive to write findings only to review-history.jsonl and ignore legacy files (claude-doc-review-history.jsonl, claude-docs-review-trend.json, etc.), preventing history proliferation that grows startup context each run.

Copilot AI and others added 2 commits June 10, 2026 16:43
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…0-33%)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Optimize Claude Code User Documentation Review" to "optimize: reduce claude-code-user-docs-review AIC cost ~20–33% by eliminating redundant main-agent work" Jun 10, 2026
Copilot AI requested a review from pelikhan June 10, 2026 16:52
@pelikhan pelikhan marked this pull request as ready for review June 10, 2026 16:56
Copilot AI review requested due to automatic review settings June 10, 2026 16:56
@pelikhan pelikhan merged commit dada0da into main Jun 10, 2026
54 of 65 checks passed
@pelikhan pelikhan deleted the copilot/claude-code-user-docs-review branch June 10, 2026 16:58

github-actions Bot commented Jun 10, 2026

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #38401 does not have the implementation label and has 0 new lines of code in business logic directories (≤100 threshold). Neither Condition A nor Condition B is met.

github-actions Bot commented Jun 10, 2026

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions Bot commented Jun 10, 2026

PR Code Quality Reviewer completed the code quality review.

github-actions Bot commented Jun 10, 2026

🧪 Test Quality Sentinel completed test quality analysis.

No test files were added or modified in this PR. The PR only modifies workflow markdown and lock files (.github/workflows/claude-code-user-docs-review.md, .github/workflows/claude-code-user-docs-review.lock.yml, .github/workflows/daily-token-consumption-report.lock.yml). Test Quality Sentinel skipped.

Copilot AI left a comment

Pull request overview

This PR updates the claude-code-user-docs-review workflow prompt to reduce AI credit (AIC) usage by pushing more work into sub-agents and simplifying the main agent’s required output, and also aligns generated workflow artifacts/env to include models.json.

Changes:

  • Simplifies Phase 1 by delegating documentation reading entirely to the doc-reader sub-agent and removing several post-sub-agent synthesis loops.
  • Collapses the Phase 7 report template to fewer sections with an explicit 1,000-word cap, and adds guidance to consolidate run history into a single JSONL file.
  • Updates the daily token consumption report lock workflow to include /tmp/gh-aw/models.json in artifacts and to point GH_AW_MODELS_JSON_PATH at /tmp/gh-aw/models.json.

| File | Description |
| --- | --- |
| .github/workflows/daily-token-consumption-report.lock.yml | Adds models.json to activation artifacts and standardizes GH_AW_MODELS_JSON_PATH to /tmp/gh-aw/models.json. |
| .github/workflows/claude-code-user-docs-review.md | Refactors the agent prompt to eliminate redundant main-agent work and reduces the required output template. |
| .github/workflows/claude-code-user-docs-review.lock.yml | Regenerates the compiled lockfile metadata to reflect the updated prompt content. |

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Comment on lines +69 to +70
Launch the `doc-reader` agent and wait for its JSON output.
Use its output as the sole factual basis for Phases 2, 3, and 7 — do not read the documentation files directly.
- Focus on the **user experience** of reading and following the docs
- Think about what would prevent successful adoption, not perfection
- This is a daily workflow - findings should be stored in cache-memory for tracking trends over time
- Write findings summary ONLY to `review-history.jsonl` (append one JSON line per run). Do not create new history file names. Ignore legacy files if they exist.
@github-actions github-actions Bot mentioned this pull request Jun 10, 2026

@github-actions github-actions Bot left a comment

Skills-Based Review 🧠

Applied /zoom-out and /improve-codebase-architecture — requesting changes on reliability and quality-gate gaps.

The optimization direction is sound and the cost rationale is well-explained. Most concerns below are addressable in 1–2 lines each.

📋 Key Themes & Highlights

Key Themes

  • Sub-agent fallback gap: Phase 1 now fully trusts doc-reader with no recovery path if it fails or returns partial data; a single defensive sentence would close this.
  • Quality gates removed wholesale: Phases 4–6 stripped all main-agent cross-checks. The auth criteria (Phase 6) are the most valuable to retain (~15 tokens) since they guard the workflow's primary finding category for Claude users.
  • Trend metric lost: Removing the /10 score breaks cross-run comparability; this can be preserved in review-history.jsonl with zero impact on discussion word count.
  • Ambiguous directive: "Ignore legacy files" should be "Do not read legacy files" to actually prevent context growth.
  • Unrelated lock file change: daily-token-consumption-report.lock.yml fixes a path bug unrelated to the doc-review optimization; worth noting in the PR description.

Positive Highlights

  • ✅ Excellent PR description — each phase change is clearly justified with before/after diffs
  • ✅ History consolidation to review-history.jsonl is a smart operational improvement
  • ✅ Merging Engine Comparison + Tool Availability + Example Parity into one "Engine & Tool Matrix" section is cleaner architecture
  • ✅ 1,000-word cap on Phase 7 is a good guardrail
  • ✅ The doc-reader sub-agent was already purpose-built to read those 6 files — delegating fully to it is the right long-term direction

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 348.6 AIC · ⌖ 14 AIC


Use the `doc-reader` agent to gather structured facts from the six core documentation files. Use its JSON output as the factual basis for Phases 2, 3, and 7.
Launch the `doc-reader` agent and wait for its JSON output.
Use its output as the sole factual basis for Phases 2, 3, and 7 — do not read the documentation files directly.

[/zoom-out] Removing the direct file reads with no fallback means the main agent is completely dependent on doc-reader—if it returns incomplete or malformed JSON there is no recovery path.

💡 Suggested defensive clause

Add a single line to guard against silent failure:

Launch the `doc-reader` agent and wait for its JSON output.
Use its output as the sole factual basis for Phases 2, 3, and 7 — do not read the documentation files directly.
**If the output is missing key fields or appears malformed, note the gap in the Executive Summary and continue with available data.**

Previously the 6-item file list served as both instructions and an implicit quality signal; without a fallback, a timed-out or empty sub-agent response will silently produce a hallucinated review with zero grounding.
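
The defensive clause above amounts to validating the sub-agent's output before trusting it. A minimal sketch of that check follows, assuming the output arrives as a JSON string. The field names in `REQUIRED_FIELDS` are hypothetical, since the real `doc-reader` schema is not shown in this PR.

```python
from __future__ import annotations

import json

# Hypothetical field names; the actual doc-reader schema is not shown in this PR.
REQUIRED_FIELDS = ("files_read", "facts")


def check_doc_reader_output(raw: str) -> tuple[dict, list[str]]:
    """Parse sub-agent output and return (data, gaps) so gaps can be reported."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return {}, ["doc-reader output was not valid JSON"]
    if not isinstance(data, dict):
        return {}, ["doc-reader output was not a JSON object"]
    # A field counts as a gap when it is absent or empty.
    gaps = [
        f"missing or empty field: {name}"
        for name in REQUIRED_FIELDS
        if not data.get(name)
    ]
    return data, gaps
```

Any non-empty `gaps` list would be surfaced in the Executive Summary rather than silently ignored.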

- Do Claude workflows have the same capabilities as Copilot workflows?
- Are there features that only work with specific engines?
- Is it clear which tools are engine-agnostic?
Use the `engine-example-counter` agent. Its `parity_observations` field feeds directly into Phase 7 "Engine & Tool Matrix" section.

[/zoom-out] The "Analyze:" block was the main agent's independent verification layer over engine-example-counter output—its removal means the main agent passes sub-agent data through unchecked.

💡 A lightweight quality gate preserves most of the savings

The full question list cost tokens; a one-line sanity check does not:

Use the `engine-example-counter` agent. Its `parity_observations` field feeds directly into Phase 7 "Engine & Tool Matrix" section.
> Sanity-check: confirm at least one Claude and one Copilot example appear; escalate to Major if either is absent.

This retains ~85% of the token reduction while keeping one explicit signal the main agent must validate.
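
The one-line sanity check could be expressed as the sketch below. It assumes the sub-agent exposes per-engine example counts as a mapping; `example_counts` and its keys are hypothetical, because only the `parity_observations` field is named in this PR.

```python
from __future__ import annotations


def parity_severity(example_counts: dict[str, int]) -> str | None:
    """Return "Major" when Claude or Copilot has no examples, otherwise None."""
    for engine in ("claude", "copilot"):
        # A missing key is treated the same as a count of zero.
        if example_counts.get(engine, 0) < 1:
            return "Major"
    return None
```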

- Assumption that everyone uses Copilot tokens
- No alternative secret names documented
- No guidance on obtaining Claude API keys
Use the `auth-doc-extractor` agent. Its `auth_gaps_or_missing_instructions` feeds directly into Phase 7 "Auth Gaps" section.

[/zoom-out] The removed "Check for:" list covered the most user-facing blockers (missing Claude API key docs, Copilot-centric token assumptions). Removing it means the main agent will silently accept auth-doc-extractor output even if it omits these critical gaps.

💡 Single-sentence retention for the highest-priority signal

Auth coverage is the core value of this workflow for Claude users. Consider keeping one line:

Use the `auth-doc-extractor` agent. Its `auth_gaps_or_missing_instructions` feeds directly into Phase 7 "Auth Gaps" section.
**Priority check**: if Claude API key setup or non-Copilot token alternatives are undocumented, mark as Critical.

This costs ~15 tokens and protects the workflow's primary finding category.

- **Severity Findings**: Critical Blockers → Major Obstacles → Minor Confusion (combined in one `<details>` block)
- **Engine & Tool Matrix** — merge engine comparison and tool-engine-classifier output into one table (Copilot / Claude / Codex / Custom × Setup / Examples / Auth / Score); incorporate `parity_observations` from engine-example-counter
- **Auth Gaps** — use `auth-doc-extractor` JSON directly
- **Recommended Actions** (Priority 1 / 2 / 3)

[/improve-codebase-architecture] Removing "Conclusion — answer 'Can Claude Code users adopt gh-aw?' with overall score /10" breaks the one comparable metric across daily runs. Priority rankings (1/2/3) cannot be plotted as a trend.

💡 Preserve the score in JSONL without adding words to the discussion body

Since findings are written to review-history.jsonl, include the score there instead:

{"date": "2026-06-10", "score": 7, "critical": 1, "major": 3, "top_finding": "..."}

Alternatively, fold it into the Executive Summary as a single line: "Overall adoption readiness: 7/10 — auth gap is the primary blocker." This adds ~15 words and zero sections while keeping the trend signal alive.
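
To illustrate why a numeric score in the history file keeps the trend plottable, here is a minimal sketch that appends one record per run and reads the scores back. It uses the `date` and `score` field names from the example record above; the helper names are illustrative only.

```python
from __future__ import annotations

import json
from pathlib import Path


def append_run(history: Path, record: dict) -> None:
    """Append one JSON object to the history file as a single line."""
    with history.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def score_trend(history: Path) -> list[tuple[str, int]]:
    """Return (date, score) pairs in file order, skipping blank lines."""
    trend = []
    for line in history.read_text(encoding="utf-8").splitlines():
        if line.strip():
            row = json.loads(line)
            trend.append((row["date"], row["score"]))
    return trend
```

Each daily run adds one line, so the resulting list can be plotted directly against date.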

- Focus on the **user experience** of reading and following the docs
- Think about what would prevent successful adoption, not perfection
- This is a daily workflow - findings should be stored in cache-memory for tracking trends over time
- Write findings summary ONLY to `review-history.jsonl` (append one JSON line per run). Do not create new history file names. Ignore legacy files if they exist.

[/improve-codebase-architecture] "Ignore legacy files if they exist" is ambiguous — it is unclear whether the agent should avoid reading them, writing to them, or both. If the agent globs *.jsonl at startup, legacy files will still inflate the context window despite this directive.

💡 Clearer and more actionable phrasing
- Write findings summary ONLY to `review-history.jsonl` (append one JSON line per run). Do not create new history file names. Ignore legacy files if they exist.
+ Write findings summary ONLY to `review-history.jsonl` (append one JSON line per run). Do not create new history file names. Do not read `claude-doc-review-history.jsonl`, `claude-docs-review-trend.json`, or any other legacy history file.

Naming the legacy files explicitly also makes a one-off cleanup step obvious.
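
The difference between globbing the cache directory and reading one named file can be sketched as follows. The legacy names are the two quoted in the PR description, which ends its list with "etc.", so the set is not exhaustive; the directory layout and helper names are assumptions.

```python
from __future__ import annotations

from pathlib import Path

CANONICAL = "review-history.jsonl"
# Legacy names quoted in the PR description; that list ends with "etc.",
# so this set is known to be incomplete.
LEGACY = {"claude-doc-review-history.jsonl", "claude-docs-review-trend.json"}


def history_files_to_read(cache_dir: Path) -> list[Path]:
    """Return only the canonical history file instead of globbing the directory."""
    target = cache_dir / CANONICAL
    return [target] if target.is_file() else []


def legacy_files_present(cache_dir: Path) -> list[str]:
    """List known legacy files so a one-off cleanup step can remove them."""
    return sorted(name for name in LEGACY if (cache_dir / name).is_file())
```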

path: |
/tmp/gh-aw/aw_info.json
/tmp/gh-aw/model_multipliers.json
/tmp/gh-aw/models.json

[/zoom-out] This lock file change (adding models.json to the cache artifact list and correcting GH_AW_MODELS_JSON_PATH from ${RUNNER_TEMP}/gh-aw/actions/models.json to /tmp/gh-aw/models.json at two sites) is unrelated to the claude-code-user-docs-review optimization and is not mentioned in the PR description.

💡 What this likely means and what to do

If the previous path was wrong, the token-consumption report has been running silently without model data — that is a latent bug worth calling out explicitly. Two options:

  1. Separate PR: extract these lock file changes into their own PR with a clear description of the bug being fixed.
  2. PR description update: if this is intentional and tied to the doc-review change, add a ### Bonus fix section to the PR body explaining the path correction.

As written, a reviewer cannot tell whether this is a deliberate fix, a rebase artifact, or an accidental inclusion.

@github-actions github-actions Bot left a comment

Non-blocking observations

Two medium-severity concerns in claude-code-user-docs-review.md; the lock file changes are correct.

Findings

Lock file changes (daily-token-consumption-report.lock.yml, claude-code-user-docs-review.lock.yml)

The GH_AW_MODELS_JSON_PATH fix is correct: the old path pointed at the unmerged source file (${RUNNER_TEMP}/gh-aw/actions/models.json), bypassing frontmatter model-cost overrides. The new path (/tmp/gh-aw/models.json) correctly uses the merged output produced by writeMergedModelsJSON in generate_aw_info.cjs. Adding models.json to the artifact upload/download list is the necessary complement.

claude-code-user-docs-review.md — Phase 1 (medium)

Removing direct doc reads and making doc-reader output the sole factual basis with no fallback creates a silent-failure mode: if doc-reader returns empty or malformed JSON, the daily Discussion is posted with hallucinated or missing findings. Inline comment posted at line 70.

claude-code-user-docs-review.md: review-history.jsonl (medium)

The append-one-JSON-line instruction omits a schema. Trend analysis requires stable field names across runs. Inline comment posted at line 146.

🔎 Code quality review by PR Code Quality Reviewer · ⌖ 13.5 AIC


Use the `doc-reader` agent to gather structured facts from the six core documentation files. Use its JSON output as the factual basis for Phases 2, 3, and 7.
Launch the `doc-reader` agent and wait for its JSON output.
Use its output as the sole factual basis for Phases 2, 3, and 7 — do not read the documentation files directly.

Single point of failure: doc-reader is now the sole factual basis with no fallback, so if it returns incomplete or malformed JSON, phases 2, 3, and 7 silently operate on empty/wrong facts — the daily discussion post will be misleading with no indication of failure.

💡 Suggested fix

Add an explicit fallback clause:

Launch the doc-reader agent and wait for its JSON output.
Use its output as the primary factual basis for Phases 2, 3, and 7.
If doc-reader fails or returns invalid/empty JSON, fall back to reading the six documentation files directly before continuing.

The previous instruction listed the six files explicitly — keeping that list as a fallback path preserves correctness when the small-model sub-agent is unreliable, without adding cost on the happy path.

- Focus on the **user experience** of reading and following the docs
- Think about what would prevent successful adoption, not perfection
- This is a daily workflow - findings should be stored in cache-memory for tracking trends over time
- Write findings summary ONLY to `review-history.jsonl` (append one JSON line per run). Do not create new history file names. Ignore legacy files if they exist.

review-history.jsonl schema is undefined: "append one JSON line per run" specifies no fields, so each run may write a different structure, making trend-tracking unreliable by design.

💡 Suggested fix

Pin a minimal schema inline:

Write findings summary ONLY to `review-history.jsonl` (append one JSON line per run).
Required fields: {"date":"YYYY-MM-DD","score_10":N,"critical_count":N,"major_count":N,"minor_count":N,"top_finding":"..."}

Without stable field names, any downstream consumer cannot parse the history file reliably.
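
A downstream consumer could enforce the pinned schema with a check like the sketch below. The field names come from the suggested schema above. Treating each `N` as an integer is an assumption, and `validate_record` is a hypothetical helper.

```python
from __future__ import annotations

import json

# Field names from the suggested schema; the int/str types are assumed.
REQUIRED = {
    "date": str,
    "score_10": int,
    "critical_count": int,
    "major_count": int,
    "minor_count": int,
    "top_finding": str,
}


def validate_record(line: str) -> list[str]:
    """Return a list of schema problems for one JSONL line (empty means valid)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(record, dict):
        return ["not a JSON object"]
    problems = []
    for name, expected in REQUIRED.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[name], bool) or not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}")
    return problems
```

A record written with the earlier `score` key instead of `score_10` would be reported as missing fields rather than parsed incorrectly.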

Development

Successfully merging this pull request may close these issues.

[agentic-token-optimizer] Optimize: Claude Code User Documentation Review (705 AIC/run)

3 participants