| 1 | +# Claude Code Instructions for matplotlib ezmon Data Collection |
| 2 | + |
| 3 | +## Quick Start |
| 4 | + |
| 5 | +**Read the full process document first:** |
| 6 | +``` |
| 7 | +Read /tmp/matplotlib/DATA_COLLECTION_PROCESS.md |
| 8 | +``` |
| 9 | + |
| 10 | +## What This Is |
| 11 | + |
| 12 | +This is a **matplotlib fork** used to validate **ezmon** (our pytest-testmon fork with NetDB support). We replay upstream CI history and verify ezmon correctly selects tests. |
| 13 | + |
| 14 | +## Current Task |
| 15 | + |
| 16 | +Continue processing workflow runs from the tracking table in `DATA_COLLECTION_PROCESS.md`. |
| 17 | + |
| 18 | +## Key Commands |
| 19 | + |
| 20 | +```bash |
| 21 | +# Check current state |
| 22 | +git log --oneline -3 |
| 23 | +gh run list --workflow=tests.yml --limit 3 |
| 24 | + |
| 25 | +# See remaining work |
| 26 | +git fetch upstream |
| 27 | +git log --oneline 9b61b471d0..upstream/main --first-parent | wc -l |
| 28 | +``` |
| 29 | + |
| 30 | +## Important Files |
| 31 | + |
| 32 | +| File/Directory | Purpose | |
| 33 | +|----------------|---------| |
| 34 | +| `DATA_COLLECTION_PROCESS.md` | **Main documentation** - process, tracking table | |
| 35 | +| `reports/` | **Detailed run reports** - organized in files of 5 runs each | |
| 36 | +| `.github/workflows/tests.yml` | Our ezmon-enabled workflow (preserve this!) | |
| 37 | +| `scripts/` | Automation scripts | |
| 38 | + |
| 39 | +## Report File Structure |
| 40 | + |
| 41 | +Reports are stored in the `reports/` directory, one file per group of 5 runs: |
| 42 | +- `reports/runs_001-005.md` - Runs 1-5 (includes historical Commit 1) |
| 43 | +- `reports/runs_006-010.md` - Runs 6-10 |
| 44 | +- `reports/runs_011-015.md` - Runs 11-15 |
| 45 | +- `reports/runs_016-020.md` - Runs 16-20 |
| 46 | +- (continue pattern for future runs) |
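The naming pattern above can be computed rather than worked out by hand. The following is a minimal sketch, assuming the `runs_NNN-NNN.md` scheme continues unchanged (groups of 5, zero-padded to three digits). `report_file_for_run` is a hypothetical helper, not an existing script in `scripts/`, and it expects a plain decimal run number, so a label such as `56-retry` would need to be passed as `56`.

```shell
# Hypothetical helper: map a run number to its report file.
# Assumes the grouping shown above: 1-5, 6-10, 11-15, ...
report_file_for_run() {
    run=$1
    # First run of the group this run belongs to: 1, 6, 11, ...
    start=$(( (run - 1) / 5 * 5 + 1 ))
    end=$(( start + 4 ))
    printf 'reports/runs_%03d-%03d.md\n' "$start" "$end"
}

report_file_for_run 17   # prints reports/runs_016-020.md
```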
| 47 | + |
| 48 | +## Critical: Reset Workflow |
| 49 | + |
| 50 | +When processing each run, you must reset to upstream state. **ALWAYS save our files first:** |
| 51 | + |
| 52 | +```bash |
| 53 | +# 1. SAVE before reset (including reports directory!) |
| 54 | +cp .github/workflows/tests.yml /tmp/our-tests.yml |
| 55 | +cp DATA_COLLECTION_PROCESS.md /tmp/DATA_COLLECTION_PROCESS.md |
| 56 | +cp CLAUDE.md /tmp/CLAUDE.md |
| 57 | +rm -rf /tmp/our-reports && cp -r reports /tmp/our-reports  # clear stale copy so cp -r does not nest |
| 58 | +rm -rf /tmp/our-scripts; cp -r scripts /tmp/our-scripts 2>/dev/null || true |
| 59 | + |
| 60 | +# 2. Reset (this wipes our files!) |
| 61 | +git reset --hard $UPSTREAM_SHA |
| 62 | + |
| 63 | +# 3. RESTORE after reset |
| 64 | +cp /tmp/our-tests.yml .github/workflows/tests.yml |
| 65 | +cp /tmp/DATA_COLLECTION_PROCESS.md DATA_COLLECTION_PROCESS.md |
| 66 | +cp /tmp/CLAUDE.md CLAUDE.md |
| 67 | +mkdir -p reports && cp -r /tmp/our-reports/* reports/ |
| 68 | +mkdir -p scripts && cp -r /tmp/our-scripts/* scripts/ 2>/dev/null || true |
| 69 | + |
| 70 | +# 4. Commit and push |
| 71 | +git add .github/workflows/tests.yml DATA_COLLECTION_PROCESS.md CLAUDE.md reports/ scripts/ |
| 72 | +git commit -m "Run N: Match upstream $UPSTREAM_SHA (PR #XXXX) - description" |
| 73 | +git push origin main --force |
| 74 | +``` |
| 75 | + |
| 76 | +## Required Git Diffs in Reports |
| 77 | + |
| 78 | +For EVERY run report, include TWO git diffs: |
| 79 | + |
| 80 | +### 1. Upstream Code Changes Diff |
| 81 | +Shows what changed between the previous run and current run: |
| 82 | +```bash |
| 83 | +git diff $PREVIOUS_UPSTREAM_SHA $CURRENT_UPSTREAM_SHA -- '*.py' |
| 84 | +``` |
| 85 | + |
| 86 | +### 2. Code Parity Verification Diff |
| 87 | +Shows that our commit matches upstream (only infrastructure files should differ): |
| 88 | +```bash |
| 89 | +git diff $UPSTREAM_SHA $OUR_COMMIT --stat |
| 90 | +git diff $UPSTREAM_SHA $OUR_COMMIT --name-only -- '*.py' '*.pyi' # Should be empty! |
| 91 | +``` |
| 92 | + |
| 93 | +Include both diffs in the report under: |
| 94 | +- **Git Diff (upstream code changes: Run N-1 → Run N)** |
| 95 | +- **Git Diff (code parity: our commit vs upstream)** |
| 96 | + |
| 97 | +## Code Parity Verification |
| 98 | + |
| 99 | +**CRITICAL**: Before pushing, ALWAYS verify: |
| 100 | +```bash |
| 101 | +# These must return 0 / empty: |
| 102 | +git diff $UPSTREAM_SHA HEAD -- '*.py' '*.pyi' | wc -l # Must be 0 |
| 103 | +git diff $UPSTREAM_SHA HEAD -- lib/matplotlib/ | wc -l # Must be 0 |
| 104 | + |
| 105 | +# Only these files should differ: |
| 106 | +git diff $UPSTREAM_SHA HEAD --name-only |
| 107 | +# Expected: .github/workflows/tests.yml, CLAUDE.md, DATA_COLLECTION_PROCESS.md, reports/*, scripts/* |
| 108 | +``` |
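The "only these files should differ" check can also be automated instead of read by eye. The sketch below is one possible approach, not part of the documented process: `unexpected_paths` is a hypothetical filter that reads path names on stdin and prints any that fall outside the expected infrastructure set listed above. Empty output means parity holds.

```shell
# Hypothetical filter: print every path that is NOT one of the expected
# infrastructure files. The allowed set is copied from this document.
unexpected_paths() {
    while IFS= read -r path; do
        case "$path" in
            .github/workflows/tests.yml|CLAUDE.md|DATA_COLLECTION_PROCESS.md) ;;
            reports/*|scripts/*) ;;
            '') ;;                          # ignore blank lines
            *) printf '%s\n' "$path" ;;     # anything else breaks parity
        esac
    done
}

# Intended use (requires UPSTREAM_SHA to be set in the environment):
#   git diff "$UPSTREAM_SHA" HEAD --name-only | unexpected_paths
```

Any path it prints is a file that should be investigated before pushing.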
| 109 | + |
| 110 | +## Commit Efficiency |
| 111 | + |
| 112 | +**Bundle documentation updates with run commits** to avoid redundant CI runs: |
| 113 | +- Do NOT commit report updates separately |
| 114 | +- Include report updates in the NEXT run's commit |
| 115 | +- This prevents triggering extra workflow runs for doc-only changes |
| 116 | + |
| 117 | +## Comparing Results with Upstream |
| 118 | + |
| 119 | +**Our test matrix (5 variants):** |
| 120 | +- macOS: Python 3.11/3.12 on macos-14, Python 3.13 on macos-15 |
| 121 | +- Linux: Python 3.12 on ubuntu-22.04 (added in Run 56) |
| 122 | +- Linux ARM: Python 3.12 on ubuntu-24.04-arm (added in Run 56-retry) |
| 123 | + |
| 124 | +**Compare matching variants** when checking upstream results: |
| 125 | +- macOS variants: `gh api repos/matplotlib/matplotlib/actions/runs/ID/jobs --jq '.jobs[] | select(.name | test("macos")) | {name, conclusion}'` |
| 126 | +- Linux variants: `gh api repos/matplotlib/matplotlib/actions/runs/ID/jobs --jq '.jobs[] | select(.name | test("ubuntu")) | {name, conclusion}'` |
| 127 | + |
| 128 | +**Note**: Run 56 is the baseline for ubuntu-22.04; Run 56-retry is the baseline for ubuntu-24.04-arm. |
| 129 | + |
| 130 | +## Investigation Requirements |
| 131 | + |
| 132 | +For EVERY run, investigate thoroughly: |
| 133 | +1. **Check upstream job results** - compare with our matching platform variants |
| 134 | +2. **Investigate any discrepancies** - different test counts, failures, etc. |
| 135 | +3. **Check for NetDB race conditions** - compare changed file counts across parallel jobs |
| 136 | +4. **Document findings** - include investigation results in reports |
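For step 3, the comparison itself is simple once the per-job counts are in hand. The sketch below assumes you have already extracted one changed-file count per parallel job. How those counts are obtained from the job logs is not specified here, and `counts_agree` is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if every count passed in is identical.
# A mismatch between parallel jobs is a hint of a possible NetDB race.
counts_agree() {
    first=$1
    for count in "$@"; do
        [ "$count" = "$first" ] || return 1
    done
    return 0
}

# Example with made-up counts from three jobs:
if counts_agree 12 12 12; then
    echo "changed-file counts match across jobs"
fi
```

A non-zero exit status only flags a discrepancy; the cause still needs to be investigated and documented in the report.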
| 137 | + |
| 138 | +## Do NOT |
| 139 | + |
| 140 | +- Modify code in `lib/matplotlib/` beyond what upstream has |
| 141 | +- Run `git reset --hard` without saving our files first (including reports/) |
| 142 | +- Push without preserving our workflow file, docs, AND reports |
| 143 | +- Skip the code parity verification |
| 144 | +- Skip including both git diffs in reports |
| 145 | +- Forget to save/restore the `reports/` directory during reset |
| 146 | +- Commit report updates separately (bundle with next run) |
| 147 | +- Compare with non-matching upstream variants (compare macOS with macOS jobs and Linux with Linux jobs) |
| 148 | + |
| 149 | +## Session Initiation |
| 150 | + |
| 151 | +Future sessions should be started with: |
| 152 | +```bash |
| 153 | +cd /tmp/matplotlib && claude |
| 154 | +``` |
| 155 | + |
| 156 | +Then say: "Continue the matplotlib ezmon data collection process." |