optimize: reduce claude-code-user-docs-review AIC cost ~20–33% by eliminating redundant main-agent work #38401
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…0-33%) Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #38401 does not have the implementation label and has 0 new lines of code in business logic directories (≤100 threshold). Neither Condition A nor Condition B is met.
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅
✅ PR Code Quality Reviewer completed the code quality review.
🧪 Test Quality Sentinel completed test quality analysis. No test files were added or modified in this PR. The PR only modifies workflow markdown and lock files (.github/workflows/claude-code-user-docs-review.md, .github/workflows/claude-code-user-docs-review.lock.yml, .github/workflows/daily-token-consumption-report.lock.yml). Test Quality Sentinel skipped.
Pull request overview
This PR updates the claude-code-user-docs-review workflow prompt to reduce AI credit (AIC) usage by pushing more work into sub-agents and simplifying the main agent’s required output, and also aligns generated workflow artifacts/env to include models.json.
Changes:
- Simplifies Phase 1 by delegating documentation reading entirely to the `doc-reader` sub-agent and removing several post-sub-agent synthesis loops.
- Collapses the Phase 7 report template to fewer sections with an explicit 1,000-word cap, and adds guidance to consolidate run history into a single JSONL file.
- Updates the daily token consumption report lock workflow to include `/tmp/gh-aw/models.json` in artifacts and to point `GH_AW_MODELS_JSON_PATH` at `/tmp/gh-aw/models.json`.
Show a summary per file
| File | Description |
|---|---|
| .github/workflows/daily-token-consumption-report.lock.yml | Adds models.json to activation artifacts and standardizes GH_AW_MODELS_JSON_PATH to /tmp/gh-aw/models.json. |
| .github/workflows/claude-code-user-docs-review.md | Refactors the agent prompt to eliminate redundant main-agent work and reduces the required output template. |
| .github/workflows/claude-code-user-docs-review.lock.yml | Regenerates the compiled lockfile metadata to reflect the updated prompt content. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 2
| Launch the `doc-reader` agent and wait for its JSON output. | ||
| Use its output as the sole factual basis for Phases 2, 3, and 7 — do not read the documentation files directly. |
| - Focus on the **user experience** of reading and following the docs | ||
| - Think about what would prevent successful adoption, not perfection | ||
| - This is a daily workflow - findings should be stored in cache-memory for tracking trends over time | ||
| - Write findings summary ONLY to `review-history.jsonl` (append one JSON line per run). Do not create new history file names. Ignore legacy files if they exist. |
Skills-Based Review 🧠
Applied /zoom-out and /improve-codebase-architecture — requesting changes on reliability and quality-gate gaps.
The optimization direction is sound and the cost rationale is well-explained. Most concerns below are addressable in 1–2 lines each.
📋 Key Themes & Highlights
Key Themes
- Sub-agent fallback gap: Phase 1 now fully trusts `doc-reader` with no recovery path if it fails or returns partial data; a single defensive sentence would close this.
- Quality gates removed wholesale: Phases 4–6 stripped all main-agent cross-checks. The auth criteria (Phase 6) are the most valuable to retain (~15 tokens) since they guard the workflow's primary finding category for Claude users.
- Trend metric lost: Removing the /10 score breaks cross-run comparability; this can be preserved in `review-history.jsonl` with zero impact on discussion word count.
- Ambiguous directive: "Ignore legacy files" should be "Do not read legacy files" to actually prevent context growth.
- Unrelated lock file change: `daily-token-consumption-report.lock.yml` fixes a path bug unrelated to the doc-review optimization; worth noting in the PR description.
Positive Highlights
- ✅ Excellent PR description — each phase change is clearly justified with before/after diffs
- ✅ History consolidation to `review-history.jsonl` is a smart operational improvement
- ✅ Merging Engine Comparison + Tool Availability + Example Parity into one "Engine & Tool Matrix" section is cleaner architecture
- ✅ 1,000-word cap on Phase 7 is a good guardrail
- ✅ The `doc-reader` sub-agent was already purpose-built to read those 6 files; delegating fully to it is the right long-term direction
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 348.6 AIC · ⌖ 14 AIC
| Use the `doc-reader` agent to gather structured facts from the six core documentation files. Use its JSON output as the factual basis for Phases 2, 3, and 7. | ||
| Launch the `doc-reader` agent and wait for its JSON output. | ||
| Use its output as the sole factual basis for Phases 2, 3, and 7 — do not read the documentation files directly. |
[/zoom-out] Removing the direct file reads with no fallback means the main agent is completely dependent on doc-reader—if it returns incomplete or malformed JSON there is no recovery path.
💡 Suggested defensive clause
Add a single line to guard against silent failure:
Launch the `doc-reader` agent and wait for its JSON output.
Use its output as the sole factual basis for Phases 2, 3, and 7 — do not read the documentation files directly.
**If the output is missing key fields or appears malformed, note the gap in the Executive Summary and continue with available data.**
Previously the 6-item file list served as both instructions and an implicit quality signal; without a fallback, a timed-out or empty sub-agent response will silently produce a hallucinated review with zero grounding.
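The check that the defensive clause asks of the main agent can be sketched in Python. This is only an illustration of the logic, not part of the workflow: the field names `files_read` and `facts` are hypothetical, since the real `doc-reader` output schema is not shown in this PR.

```python
import json

# Hypothetical field names; the actual doc-reader schema is not shown in this PR.
REQUIRED_KEYS = ("files_read", "facts")


def check_subagent_output(raw, required_keys=REQUIRED_KEYS):
    """Parse a sub-agent's raw output and list the gaps to report.

    Returns (data, gaps): the parsed object (empty dict on failure) and a
    list of human-readable gap descriptions for the Executive Summary.
    """
    try:
        data = json.loads(raw) if raw and raw.strip() else None
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Empty, timed-out, or malformed output: nothing usable to ground on.
        return {}, ["output missing or not a JSON object"]
    # A field that is absent or empty counts as a gap.
    gaps = [
        f"missing or empty field: {key}"
        for key in required_keys
        if not data.get(key)
    ]
    return data, gaps
```

An empty `gaps` list means the review can proceed normally; a non-empty list is what the clause would have the agent surface instead of continuing silently.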
| - Do Claude workflows have the same capabilities as Copilot workflows? | ||
| - Are there features that only work with specific engines? | ||
| - Is it clear which tools are engine-agnostic? | ||
| Use the `engine-example-counter` agent. Its `parity_observations` field feeds directly into Phase 7 "Engine & Tool Matrix" section. |
[/zoom-out] The "Analyze:" block was the main agent's independent verification layer over engine-example-counter output—its removal means the main agent passes sub-agent data through unchecked.
💡 A lightweight quality gate preserves most of the savings
The full question list cost tokens; a one-line sanity check does not:
Use the `engine-example-counter` agent. Its `parity_observations` field feeds directly into Phase 7 "Engine & Tool Matrix" section.
> Sanity-check: confirm at least one Claude and one Copilot example appear; escalate to Major if either is absent.
This retains ~85% of the token reduction while keeping one explicit signal the main agent must validate.
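As a rough illustration, the sanity check reduces to a few lines of Python. The sketch assumes, hypothetically, that the `engine-example-counter` output can be reduced to a mapping from engine name to example count; the real output shape is not shown in this PR.

```python
def parity_severity(example_counts, required=("claude", "copilot")):
    """Return ("Major", missing_engines) if a required engine has no examples.

    `example_counts` is a hypothetical mapping such as {"claude": 2, "copilot": 9}.
    Returns (None, []) when every required engine has at least one example.
    """
    missing = [engine for engine in required if example_counts.get(engine, 0) < 1]
    if missing:
        return "Major", missing
    return None, []
```

For instance, `parity_severity({"claude": 0, "copilot": 4})` returns `("Major", ["claude"])`, matching the escalation rule in the suggested sanity check.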
| - Assumption that everyone uses Copilot tokens | ||
| - No alternative secret names documented | ||
| - No guidance on obtaining Claude API keys | ||
| Use the `auth-doc-extractor` agent. Its `auth_gaps_or_missing_instructions` feeds directly into Phase 7 "Auth Gaps" section. |
[/zoom-out] The removed "Check for:" list covered the most user-facing blockers (missing Claude API key docs, Copilot-centric token assumptions). Removing it means the main agent will silently accept auth-doc-extractor output even if it omits these critical gaps.
💡 Single-sentence retention for the highest-priority signal
Auth coverage is the core value of this workflow for Claude users. Consider keeping one line:
Use the `auth-doc-extractor` agent. Its `auth_gaps_or_missing_instructions` feeds directly into Phase 7 "Auth Gaps" section.
**Priority check**: if Claude API key setup or non-Copilot token alternatives are undocumented, mark as Critical.
This costs ~15 tokens and protects the workflow's primary finding category.
| - **Severity Findings**: Critical Blockers → Major Obstacles → Minor Confusion (combined in one `<details>` block) | ||
| - **Engine & Tool Matrix** — merge engine comparison and tool-engine-classifier output into one table (Copilot / Claude / Codex / Custom × Setup / Examples / Auth / Score); incorporate `parity_observations` from engine-example-counter | ||
| - **Auth Gaps** — use `auth-doc-extractor` JSON directly | ||
| - **Recommended Actions** (Priority 1 / 2 / 3) |
[/improve-codebase-architecture] Removing "Conclusion — answer 'Can Claude Code users adopt gh-aw?' with overall score /10" breaks the one comparable metric across daily runs. Priority rankings (1/2/3) cannot be plotted as a trend.
💡 Preserve the score in JSONL without adding words to the discussion body
Since findings are written to review-history.jsonl, include the score there instead:
{"date": "2026-06-10", "score": 7, "critical": 1, "major": 3, "top_finding": "..."}
Alternatively, fold it into the Executive Summary as a single line: "Overall adoption readiness: 7/10; auth gap is the primary blocker." This adds ~15 words and zero sections while keeping the trend signal alive.
| - Focus on the **user experience** of reading and following the docs | ||
| - Think about what would prevent successful adoption, not perfection | ||
| - This is a daily workflow - findings should be stored in cache-memory for tracking trends over time | ||
| - Write findings summary ONLY to `review-history.jsonl` (append one JSON line per run). Do not create new history file names. Ignore legacy files if they exist. |
[/improve-codebase-architecture] "Ignore legacy files if they exist" is ambiguous — it is unclear whether the agent should avoid reading them, writing to them, or both. If the agent globs *.jsonl at startup, legacy files will still inflate the context window despite this directive.
💡 Clearer and more actionable phrasing
- Write findings summary ONLY to `review-history.jsonl` (append one JSON line per run). Do not create new history file names. Ignore legacy files if they exist.
+ Write findings summary ONLY to `review-history.jsonl` (append one JSON line per run). Do not create new history file names. Do not read `claude-doc-review-history.jsonl`, `claude-docs-review-trend.json`, or any other legacy history file.
Naming the legacy files explicitly also makes a one-off cleanup step obvious.
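The difference between ignoring legacy files and never reading them can be sketched in Python. The loader below opens exactly one file by name instead of globbing `*.jsonl`, so legacy files in the same directory never enter the context. The function name and the `cache_dir` parameter are illustrative assumptions, not part of the workflow.

```python
import json
from pathlib import Path

HISTORY_FILE = "review-history.jsonl"


def load_history(cache_dir):
    """Load run records from review-history.jsonl only.

    The file is opened by exact name, so legacy files in the same directory
    (for example claude-doc-review-history.jsonl) are never read.
    """
    path = Path(cache_dir) / HISTORY_FILE
    if not path.exists():
        return []
    records = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip a corrupted line rather than failing the run
    return records
```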
| path: | | ||
| /tmp/gh-aw/aw_info.json | ||
| /tmp/gh-aw/model_multipliers.json | ||
| /tmp/gh-aw/models.json |
[/zoom-out] This lock file change (adding models.json to the cache artifact list and correcting GH_AW_MODELS_JSON_PATH from ${RUNNER_TEMP}/gh-aw/actions/models.json to /tmp/gh-aw/models.json at two sites) is unrelated to the claude-code-user-docs-review optimization and is not mentioned in the PR description.
💡 What this likely means and what to do
If the previous path was wrong, the token-consumption report has been running silently without model data — that is a latent bug worth calling out explicitly. Two options:
- Separate PR: extract these lock file changes into their own PR with a clear description of the bug being fixed.
- PR description update: if this is intentional and tied to the doc-review change, add a `### Bonus fix` section to the PR body explaining the path correction.
As written, a reviewer cannot tell whether this is a deliberate fix, a rebase artifact, or an accidental inclusion.
Non-blocking observations
Two medium-severity concerns in claude-code-user-docs-review.md; the lock file changes are correct.
Findings
Lock file changes (daily-token-consumption-report.lock.yml, claude-code-user-docs-review.lock.yml)
The GH_AW_MODELS_JSON_PATH fix is correct: the old path pointed at the unmerged source file (${RUNNER_TEMP}/gh-aw/actions/models.json), bypassing frontmatter model-cost overrides. The new path (/tmp/gh-aw/models.json) correctly uses the merged output produced by writeMergedModelsJSON in generate_aw_info.cjs. Adding models.json to the artifact upload/download list is the necessary complement.
claude-code-user-docs-review.md — Phase 1 (medium)
Removing direct doc reads and making doc-reader output the sole factual basis with no fallback creates a silent-failure mode: if doc-reader returns empty or malformed JSON, the daily Discussion is posted with hallucinated or missing findings. Inline comment posted at line 70.
claude-code-user-docs-review.md — review-history.jsonl (medium)
The append-one-JSON-line instruction omits a schema. Trend analysis requires stable field names across runs. Inline comment posted at line 146.
🔎 Code quality review by PR Code Quality Reviewer · ⌖ 13.5 AIC
| Use the `doc-reader` agent to gather structured facts from the six core documentation files. Use its JSON output as the factual basis for Phases 2, 3, and 7. | ||
| Launch the `doc-reader` agent and wait for its JSON output. | ||
| Use its output as the sole factual basis for Phases 2, 3, and 7 — do not read the documentation files directly. |
Single point of failure: doc-reader is now the sole factual basis with no fallback, so if it returns incomplete or malformed JSON, phases 2, 3, and 7 silently operate on empty/wrong facts — the daily discussion post will be misleading with no indication of failure.
💡 Suggested fix
Add an explicit fallback clause:
Launch the `doc-reader` agent and wait for its JSON output.
Use its output as the primary factual basis for Phases 2, 3, and 7.
If `doc-reader` fails or returns invalid/empty JSON, fall back to reading the six documentation files directly before continuing.
The previous instruction listed the six files explicitly — keeping that list as a fallback path preserves correctness when the small-model sub-agent is unreliable, without adding cost on the happy path.
| - Focus on the **user experience** of reading and following the docs | ||
| - Think about what would prevent successful adoption, not perfection | ||
| - This is a daily workflow - findings should be stored in cache-memory for tracking trends over time | ||
| - Write findings summary ONLY to `review-history.jsonl` (append one JSON line per run). Do not create new history file names. Ignore legacy files if they exist. |
review-history.jsonl schema is undefined: "append one JSON line per run" specifies no fields, so each run may write a different structure, making trend-tracking unreliable by design.
💡 Suggested fix
Pin a minimal schema inline:
Write findings summary ONLY to `review-history.jsonl` (append one JSON line per run).
Required fields: {"date":"YYYY-MM-DD","score_10":N,"critical_count":N,"major_count":N,"minor_count":N,"top_finding":"..."}
Without stable field names, any downstream consumer cannot parse the history file reliably.
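A minimal sketch of what pinning the schema buys, written in Python under the assumption that the history file is appended by a helper like the one below. The helper name `append_run` is hypothetical; the field names are taken from the suggested schema above.

```python
import json
from pathlib import Path

# Field names taken from the suggested schema in the review comment above.
REQUIRED_FIELDS = (
    "date", "score_10", "critical_count",
    "major_count", "minor_count", "top_finding",
)


def append_run(history_path, record):
    """Append one run record as a single JSON line, enforcing the schema."""
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    extra = [f for f in record if f not in REQUIRED_FIELDS]
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={missing}, extra={extra}")
    # Emit fields in a fixed order so every line has the same layout.
    line = json.dumps({f: record[f] for f in REQUIRED_FIELDS})
    with Path(history_path).open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line
```

Rejecting both missing and unexpected fields keeps every line parseable by the same downstream consumer, which is the property the review comment is asking for.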
The `claude-code-user-docs-review` workflow was consuming ~705 AIC/run due to the main agent duplicating work already delegated to sub-agents (redundant file reads, post-processing loops, and an over-specified output template).
Phase 1: eliminate duplicate file reads
Replaced the 6-item numbered file list + trailing sub-agent dispatch with a single direct sub-agent launch. Previously the main agent `cat`'d all 6 docs before launching `doc-reader`, which reads the same files. The main agent now delegates entirely to `doc-reader`.
Phases 4–6: remove post-sub-agent synthesis loops
Stripped the "Analyze:" / "Questions to answer:" / "Check for:" follow-up blocks. Sub-agent outputs (`parity_observations`, classification table, `auth_gaps_or_missing_instructions`) now pipe directly into Phase 7 sections without the main agent re-answering questions the sub-agents already answer.
Phase 7: collapse output template
Reduced from 9 sections to 5 with an explicit 1,000-word cap. Removed: Persona Context (redundant with workflow description), standalone Example Parity (subsumed into Engine & Tool Matrix), and Conclusion /10 (redundant with Recommended Actions priority ranking).
History file consolidation
Added a directive to write findings only to `review-history.jsonl` and ignore legacy files (`claude-doc-review-history.jsonl`, `claude-docs-review-trend.json`, etc.), preventing history proliferation that grows startup context each run.