Commit 3bad25e

Authored by qunabu, github-advanced-security[bot], and cursoragent

Handsontable (scrolling) performance comparison harness (#12229)

* Peformance Tests
* Forgot about actually run the test
* Potential fix for code scanning alert no. 53: Incomplete string escaping or encoding
* fix(perf-tests): remove unused standalone chromium launch
* Fixing PR comment
* fix(perf-tests): address remaining bugbot harness issues

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Mateusz Wojczal <qunabu@users.noreply.github.com>

1 parent 7d66873 commit 3bad25e

16 files changed: 1859 additions & 0 deletions

GitHub Actions workflow (Performance tests): 94 additions & 0 deletions
```yaml
name: Performance tests

on:
  pull_request:
    types:
      - opened
      - reopened
      - synchronize

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

permissions:
  contents: read
  pull-requests: write

jobs:
  performance-tests:
    name: Performance tests
    runs-on: ubuntu-latest
    timeout-minutes: 90
    steps:
      - name: Checkout
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # https://github.com/actions/checkout/releases/tag/v6.0.2

      - name: Install pnpm
        uses: pnpm/action-setup@41ff72655975bd51cab0327fa583b6e92b6d3061 # https://github.com/pnpm/action-setup/releases/tag/v4.2.0
        with:
          version: 10.30.2

      - name: Use Node.js
        uses: actions/setup-node@6044e13b5dc448c55e2357c09f80417699197238 # https://github.com/actions/setup-node/releases/tag/v6.2.0
        with:
          node-version-file: '.nvmrc'
          cache: 'pnpm'
          registry-url: https://registry.npmjs.org/

      - name: Run performance test setup
        working-directory: performance-tests
        run: bash test.sh

      # - name: Install Playwright system dependencies
      #   working-directory: performance-tests
      #   run: npx playwright install-deps chromium

      # - name: Run Playwright performance tests
      #   working-directory: performance-tests
      #   env:
      #     CI: true
      #   run: npx playwright test

      - name: Upload performance-tests output
        if: always()
        uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # https://github.com/actions/upload-artifact/releases/tag/v7.0.0
        with:
          name: performance-tests-output
          path: performance-tests/output
          retention-days: 14
          if-no-files-found: warn

      - name: Ensure PR comment body exists
        if: always()
        run: |
          mkdir -p performance-tests/output
          if [ ! -f performance-tests/output/result.md ]; then
            {
              echo '## Performance tests'
              echo
              echo 'No result.md was generated. Check the workflow log and downloaded artifacts.'
            } > performance-tests/output/result.md
          fi

      - name: Compose pull request comment body
        if: always()
        env:
          RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
        run: |
          {
            echo '## Performance tests results'
            echo
            echo "Artifacts: **[performance-tests-output](${RUN_URL})** (open the workflow run, then download **performance-tests-output**)."
            echo
            echo '---'
            echo
            cat performance-tests/output/result.md
          } > performance-tests/output/pr-comment.md

      - name: Post performance summary to pull request
        if: always()
        uses: marocchino/sticky-pull-request-comment@773744901bac0e8cbb5a0dc842800d45e9b2b405 # https://github.com/marocchino/sticky-pull-request-comment/tree/v2.9.4
        with:
          header: performance-tests
          path: performance-tests/output/pr-comment.md
```

performance-tests/.gitignore

Lines changed: 10 additions & 0 deletions

```
# Playwright
node_modules/
/blob-report/
/playwright/.cache/
output
!output/.gitkeep
# Playwright
/playwright/.auth/
.DS_Store
```

performance-tests/README.md

Lines changed: 173 additions & 0 deletions

# Handsontable performance comparison

This project benchmarks **two npm versions of Handsontable** under the same Playwright scenario (scroll stress on a grid). Chrome performance traces are recorded during each run; after all tests finish, a **global teardown** script aggregates those traces and writes a **Markdown comparison** to `output/result.md`.

## Prerequisites

- Node.js
- Playwright browsers: `npx playwright install` (if not already installed)

## How to run

```bash
VERSION_1=0.0.0-next-b032e34-20260319 VERSION_2=17.0.0 ITERATIONS=3 npx playwright test
```

### Environment variables

| Variable | Meaning |
|-------------|---------|
| **`VERSION_1`** | **Baseline** Handsontable version string (npm tag or exact version). Used to load that release from the CDN in the generated fixture and to name trace files (`output/test-{version}-{iteration}.json`). The teardown report treats **the first version as baseline** when computing percent and absolute deltas. |
| **`VERSION_2`** | **Comparison** Handsontable version—the build you want to compare against `VERSION_1`. Same fixture/trace naming rules as `VERSION_1`. |
| **`ITERATIONS`** | How many times **each** version is run. Defaults to **3** if unset. Traces are written as `test-{version}-1.json` through `test-{version}-{ITERATIONS}.json`. The teardown step **averages** metrics across those files per version (arithmetic mean), then compares the two averaged results. |

If you omit these variables, the tests fall back to `VERSION_1=15.0.0`, `VERSION_2=15.1.0`, and `ITERATIONS=3` (see `tests/scroll-down.spec.ts` and `tests/teardown.mjs`).
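The averaging described above is a plain arithmetic mean per metric across the iteration summaries. A minimal sketch, assuming a simple flat metrics object per iteration (the function name and metric keys are illustrative, not the actual teardown code):

```javascript
// Minimal sketch of the per-version averaging the teardown performs:
// an arithmetic mean of each metric across the ITERATIONS trace summaries.
// (Function and metric names are illustrative, not the actual teardown code.)
function averageMetrics(summaries) {
  const averaged = {};
  for (const key of Object.keys(summaries[0])) {
    const sum = summaries.reduce((total, summary) => total + summary[key], 0);
    averaged[key] = sum / summaries.length;
  }
  return averaged;
}

// Example: three iterations of one version's Scripting/Rendering totals (ms).
const runs = [
  { scripting: 2900, rendering: 2100 },
  { scripting: 3000, rendering: 2150 },
  { scripting: 3013, rendering: 2095 },
];
console.log(averageMetrics(runs)); // { scripting: 2971, rendering: 2115 }
```

The two averaged objects (one per version) are then compared row by row to produce the Markdown table.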
## What happens when you run the suite

1. **Tests** (`tests/scroll-down.spec.ts`): For each version and iteration, a fixture HTML is built from `tests/template.html` (Handsontable loaded at the chosen version). Playwright opens the page, starts **Chrome DevTools Protocol** tracing (`Tracing.start` with timeline and CPU profiler categories), performs repeated wheel scrolling on the table holder, stops tracing, and saves the JSON trace to `output/test-{version}-{iteration}.json`.

2. **Teardown** (`tests/teardown.mjs`, configured as `globalTeardown` in `playwright.config.ts`): After all tests complete, it loads the trace files for `VERSION_1` and `VERSION_2`, averages each side across iterations, and builds a Markdown table (metrics, both versions, % change, absolute change). Output is written to **`output/result.md`** and also printed to the console.
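Step 1 can be sketched roughly as follows. `newCDPSession` and the `Tracing.*` commands are real Playwright/CDP APIs; the helper names, category list, and scroll counts below are assumptions for illustration, not the actual spec code:

```javascript
// Hedged sketch of the tracing flow in step 1 (names and counts assumed).

// Trace files are named output/test-{version}-{iteration}.json.
function traceFilePath(version, iteration) {
  return `output/test-${version}-${iteration}.json`;
}

// Records a CDP performance trace around a wheel-scroll loop.
// `page` is a Playwright Page running in Chromium.
async function recordScrollTrace(page, version, iteration) {
  const client = await page.context().newCDPSession(page);
  const events = [];
  client.on('Tracing.dataCollected', (chunk) => events.push(...chunk.value));
  const finished = new Promise((resolve) =>
    client.on('Tracing.tracingComplete', resolve)
  );

  await client.send('Tracing.start', {
    categories:
      'devtools.timeline,disabled-by-default-devtools.timeline,disabled-by-default-v8.cpu_profiler',
    transferMode: 'ReportEvents',
  });

  // Scroll stress: repeated wheel events over the grid.
  for (let i = 0; i < 100; i += 1) {
    await page.mouse.wheel(0, 200);
  }

  await client.send('Tracing.end');
  await finished;
  // The real spec would then write `events` as JSON to the trace path.
  return { path: traceFilePath(version, iteration), events };
}
```

The tracing-complete event arrives only after `Tracing.end`, which is why the promise is set up before starting and awaited after stopping.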
You can re-run the aggregation without Playwright:

```bash
node tests/teardown.mjs
```

(Use the same `VERSION_1`, `VERSION_2`, and `ITERATIONS` in the environment so the file names match what was produced.)
## Comparison methodology: same machine, relative—not golden records

**What you are measuring:** Each full `npx playwright test` run exercises **both** `VERSION_1` and `VERSION_2` on the **same physical machine**, under the **same OS session**, and in **one continuous batch** (the suite is configured with a **single worker**, so tests run **sequentially** and close together in time). The report compares those two builds **to each other** under those shared conditions.

**What you are *not* doing:** There is **no golden record** in the repo—no stored baseline trace or pass/fail threshold for absolute milliseconds or heap size. Numbers in `output/result.md` are **real values for that machine and that run** only. Another laptop, Chrome update, or background load will shift the raw figures; the **useful signal** is usually the **direction and size of the gap** between the two columns (and patterns like **Max Nodes**), not whether e.g. “Scripting = 3000 ms” matches a fixed target.

**Why this strategy:** Comparing versions **back-to-back on one machine** keeps **most environmental noise shared**: both builds see the same CPU model, GPU, memory, power mode, and roughly the same thermal and OS jitter. That makes **relative** comparisons (percent change, which side is higher on Scripting or node count) **much more stable** than publishing a single absolute trace as a canonical benchmark. **Iterations** (`ITERATIONS`) then average out **within-run** variance (GC pauses, scheduling spikes).

**Tips if you need steadier numbers:** Use AC power where relevant, close heavy background apps, use a consistent Chrome/Playwright channel, and avoid starting other large workloads mid-suite. For anything you publish externally, note **hardware and Chrome version** alongside the table so readers know the absolutes are contextual.
## How metrics are calculated (DevTools-aligned)

Summaries are **not** ad-hoc timings: they follow the same model as **Chrome DevTools Performance** when you load a trace JSON—category times (System, Scripting, Rendering, Loading, Painting, etc.), the auto-selected **main-thread time range**, and optional **UpdateCounters** / heap-style stats from the trace.

Implementation details, including references to the corresponding `devtools-frontend` sources, are documented in **`trace-parser.mjs`** (see the file header and inline comments). That module parses the performance JSON, maps events to DevTools categories, applies the same windowing logic as the Timeline UI, and can average multiple trace files for one version.

To inspect a single trace from the CLI:

```bash
node trace-parser.mjs output/test-17.0.0-1.json
# Multiple files → average across runs:
node trace-parser.mjs output/test-17.0.0-1.json output/test-17.0.0-2.json
```

Flags: `--debug` (extra internal details), `--full` (include rarely used categories in the text output).
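The category mapping can be pictured with a toy bucketing function. The real table lives in `devtools-frontend` and is much larger, and the real parser also accounts for event nesting and self time; everything below (event names, durations, the ignored nesting) is a simplified, synthetic illustration:

```javascript
// Toy illustration of DevTools-style category bucketing: map trace event
// names to categories and sum their durations. Real DevTools also handles
// nesting/self time; this sketch deliberately ignores that.
const EVENT_CATEGORY = {
  FunctionCall: 'scripting',
  EvaluateScript: 'scripting',
  Layout: 'rendering',
  UpdateLayoutTree: 'rendering',
  Paint: 'painting',
};

// Trace `dur` values are microseconds; totals are reported in milliseconds.
function categoryTotalsMs(events) {
  const totals = {};
  for (const event of events) {
    const category = EVENT_CATEGORY[event.name] ?? 'other';
    totals[category] = (totals[category] ?? 0) + (event.dur ?? 0) / 1000;
  }
  return totals;
}

// Synthetic events, not taken from a real trace:
const sampleEvents = [
  { name: 'FunctionCall', dur: 4000 },
  { name: 'Layout', dur: 2500 },
  { name: 'Paint', dur: 1500 },
  { name: 'FunctionCall', dur: 1000 },
];
console.log(categoryTotalsMs(sampleEvents));
// { scripting: 5, rendering: 2.5, painting: 1.5 }
```

Unmapped events fall into an "other" bucket, which is the same spirit as the unattributed time DevTools reports.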
## Example output and how to read it

### Use case: 0.0.0-next-b032e34-20260319 vs 17.0.0 — scrolling performance improved

Below is a **real example** from one run (`VERSION_1=0.0.0-next-b032e34-20260319`, `VERSION_2=17.0.0`, `ITERATIONS=3`). Your numbers will differ by machine and run; the pattern is what matters.

### Sample comparison table

| Metric | 0.0.0-next-b032e34-20260319 | 17.0.0 | % change | Value change |
| --------------------- | ---------------------------: | -------: | -------: | -----------: |
| Total (ms) | 11860 | 13379 | +12.8% | +1519 |
| System (ms) | 612 | 498 | -18.6% | -114 |
| Scripting (ms) | 2971 | 5504 | +85.3% | +2533 |
| Rendering (ms) | 2115 | 2253 | +6.5% | +138 |
| Loading (ms) | 2 | 1 | -50.0% | -1 |
| Painting (ms) | 395 | 335 | -15.2% | -60 |
| Experience (ms) | | | | |
| Idle (ms) | 5764 | 4787 | -17.0% | -977 |
| Min JS heap (kb) | 925 kb | 925 kb | +0.0% | 0 |
| Max JS heap (Mb) | 99.9 Mb | 100.9 Mb | +1.0% | +1.0 Mb |
| Min Documents (count) | 2 | 2 | +0.0% | 0 |
| Max Documents (count) | 4 | 4 | +0.0% | 0 |
| Min Nodes (count) | 20 | 20 | +0.0% | 0 |
| Max Nodes (count) | 182408 | 182282 | -0.1% | -126 |
| Min Listeners (count) | 0 | 0 | 0% | 0 |
| Max Listeners (count) | 156 | 153 | -1.9% | -3 |

### Why the next build looks much better here (for this scroll test)

In this trace, **`0.0.0-next-b032e34-20260319` is clearly ahead on the work that dominates scrolling**:

1. **Scripting (~3.0s vs ~5.5s)** — This is the largest gap. DevTools **Scripting** is time spent in JavaScript and closely related main-thread work (timers, rAF, event dispatch, GC, etc.). Handsontable does most of its scroll path in JS (virtualisation, cell updates, hooks). **Roughly 2.5s more scripting** on 17.0.0 in the same scenario means the stable build is spending far more CPU in script during the wheel loop; that usually translates directly to jank risk and lower scroll throughput.

2. **Total (ms) (~11.9s vs ~13.4s)** — In this report, **Total** matches the **length of the auto-selected main-thread window** in the trace (same basis as the category rows). A longer window with **higher Scripting** and **Rendering** fits the story: more main-thread activity is packed into the interaction. So the next build is not only “less scripting”; the summarized slice is also **shorter**, which is consistent with finishing the same scripted scroll with less sustained busy work.

3. **Rendering** — **17.0.0** is slightly higher (~138 ms averaged). That is smaller than the scripting delta but still directionally “more layout/style/paint pipeline work” on the older line.

4. **Where 17.0.0 looks better on individual lines** — **System** and **Painting** are a bit lower, and **Idle** is lower on 17.0.0. Those numbers do **not** mean 17.0.0 was “smoother” overall here: **Idle** is “unattributed time in the selected range,” and category totals are interdependent. The decisive signal for scroll cost is still **Scripting** (and secondarily **Rendering**), where the next build wins decisively.

5. **Heap / DOM counters** — **Max JS heap** is only ~1% higher on 17.0.0; **nodes and listeners** are nearly identical. So this comparison is **not** explained by a much heavier DOM tree; it points to **how much work each version does per scroll** (especially script), not a gross memory shape change.

**Caveat:** One machine and three iterations is enough to spot a large regression; for release decisions, repeat on cold/warm runs and different hardware, and inspect traces in DevTools for hotspots.
### Use case: 15.1.0 vs 15.0.0 — DOM nodes not removed while scrolling

This run compares **15.1.0** (baseline, first column) against **15.0.0** (second column). Command:

```bash
VERSION_1=15.1.0 VERSION_2=15.0.0 ITERATIONS=3 npx playwright test
```

**What went wrong in 15.1.0:** In that line, **scrolling no longer tears down off-screen cell DOM** the way 15.0.0 does. New rows keep getting materialised, so **nodes accumulate** for the whole scroll instead of staying bounded. That is a **virtualisation / lifecycle regression**: the grid still “scrolls,” but the document keeps growing.

**What the report shows:** **Max Nodes (count)** is the smoking gun — tens or hundreds of thousands on **15.1.0** versus a **small, stable** peak on **15.0.0** (~2k in this run). **Max JS heap** follows the same story (larger peak on 15.1.0). Extra DOM also drives **more Scripting** and **Rendering** time on 15.1.0 and a slightly longer **Total** window, because the browser does more work per tick on a much larger tree.

#### Sample comparison table (same scenario, three iterations)

| Metric | 15.1.0 | 15.0.0 | % change | Value change |
| --------------------- | ------: | ------: | -------: | -----------: |
| Total (ms) | 10571 | 9786 | -7.4% | -785 |
| System (ms) | 442 | 422 | -4.5% | -20 |
| Scripting (ms) | 3530 | 3100 | -12.2% | -430 |
| Rendering (ms) | 2308 | 1758 | -23.8% | -550 |
| Loading (ms) | 2 | 1 | -50.0% | -1 |
| Painting (ms) | 311 | 292 | -6.1% | -19 |
| Experience (ms) | | | | |
| Idle (ms) | 3978 | 4213 | +5.9% | +235 |
| Min JS heap (kb) | 925 kb | 925 kb | +0.0% | 0 |
| Max JS heap (Mb) | 98.2 Mb | 88.0 Mb | -10.4% | -10.2 Mb |
| Min Documents (count) | 2 | 2 | +0.0% | 0 |
| Max Documents (count) | 4 | 4 | +0.0% | 0 |
| Min Nodes (count) | 20 | 20 | +0.0% | 0 |
| Max Nodes (count) | 182242 | 2144 | -98.8% | -180098 |
| Min Listeners (count) | 0 | 0 | 0% | 0 |
| Max Listeners (count) | 151 | 149 | -1.3% | -2 |

Percent and value change are **relative to the first column (15.1.0)**. Large negative **% change** on nodes/heap/scripting/rendering means **15.0.0** is lower on those metrics — i.e. healthier for this bug.
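The two delta columns follow directly from that convention: value change is `comparison - baseline`, percent change divides that by the baseline. A small sketch (the function name and sign formatting are illustrative, not the actual teardown code):

```javascript
// Sketch of the report's delta columns: percent and value change are
// computed relative to the baseline (first column), with an explicit
// leading sign for increases. (Illustrative, not the actual teardown code.)
function deltaRow(baseline, comparison) {
  const valueChange = comparison - baseline;
  const percent = baseline === 0 ? 0 : (valueChange / baseline) * 100;
  const sign = valueChange >= 0 ? '+' : '';
  return {
    percentChange: `${sign}${percent.toFixed(1)}%`,
    valueChange: `${sign}${valueChange}`,
  };
}

// Max Nodes from the 15.1.0 vs 15.0.0 table: 182242 → 2144.
console.log(deltaRow(182242, 2144));
// { percentChange: '-98.8%', valueChange: '-180098' }
```

Plugging in the Total row from the first example (`deltaRow(11860, 13379)`) reproduces its `+12.8%` / `+1519` cells as well.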
#### DevTools screenshots

The graphs below are from Chrome DevTools on the same kind of scroll session: **15.0.0** keeps the node count bounded; **15.1.0** shows a **sustained climb** as nodes are added and not removed.

##### 15.0.0

![15.0.0 — node count stays bounded during scroll](docs/15.0.0.png)

##### 15.1.0

![15.1.0 — nodes accumulate while scrolling](docs/15.1.0.png)
### Use cases: which rows to trust for what you care about

| Your goal | Primary metrics | Notes |
| --------- | ---------------- | ----- |
| Smoother scrolling / less jank during interaction | **Scripting**, **Rendering**, **Painting** | Scripting usually dominates for data grids; Rendering/Painting catch layout and compositing cost. |
| Shorter “busy” interval in the trace window | **Total (ms)** with **Scripting** / **Rendering** | Total is the DevTools auto-range width; interpret it together with categories, not alone. |
| Memory creep or tab stability | **Max JS heap**, **Max Nodes**, **Max Listeners** | Taken from `UpdateCounters` in the trace; same idea as the Memory timeline in DevTools. |
| Virtualisation / DOM churn (e.g. cells not detached on scroll) | **Max Nodes** (then **Max JS heap**, **Scripting**, **Rendering**) | A bounded **Max Nodes** vs a runaway count across the same scroll usually matches DevTools Performance or Memory; compare the **15.1.0 vs 15.0.0** example above with `docs/15.0.0.png` and `docs/15.1.0.png`. |
| Less “mystery” main-thread time | **System** vs **Scripting** | Large scripting with DevTools-aligned parsing is usually actionable (profiles, stacks); very high System may warrant a separate trace look. |
| CLS / interaction timing (if populated) | **Experience** | Often empty unless the trace includes the right events; an empty row is normal for some scenarios. |

## Project layout (short)

- `tests/scroll-down.spec.ts` — scenario, CDP tracing, trace file output
- `tests/teardown.mjs` — post-run Markdown report → `output/result.md`
- `tests/template.html` — Handsontable test page template (`{{version}}` placeholder)
- `trace-parser.mjs` — trace JSON → DevTools-style stats
- `output/` — trace JSON files and `result.md` (ensure the directory exists or is created by the tests)

performance-tests/docs/15.0.0.png (314 KB)

performance-tests/docs/15.1.0.png (336 KB)
Lines changed: 2 additions & 0 deletions

```
*.js
*.min.js
```

performance-tests/fixtures/.gitkeep

Whitespace-only changes.
Lines changed: 27 additions & 0 deletions

```html
<!DOCTYPE html>
<html>

<head>
  <title>Test 1</title>
</head>

<body>
  <script src="handsontable.full.js"></script>

  <div id="example1" class="hot"></div>

  <script type="module">
    const container = document.querySelector('#example1');
    new Handsontable(container, {
      data: Handsontable.helper.createSpreadsheetData(5000, 10),
      width: 600,
      height: 600,
      colHeaders: true,
      rowHeaders: true,
      licenseKey: 'non-commercial-and-evaluation',
    });
  </script>
</body>

</html>
```
Lines changed: 27 additions & 0 deletions

```html
<!DOCTYPE html>
<html>

<head>
  <title>Test 1</title>
</head>

<body>
  <script src="https://cdn.jsdelivr.net/npm/handsontable@latest/dist/handsontable.full.js"></script>

  <div id="example1" class="hot"></div>

  <script type="module">
    const container = document.querySelector('#example1');
    new Handsontable(container, {
      data: Handsontable.helper.createSpreadsheetData(5000, 10),
      width: 600,
      height: 600,
      colHeaders: true,
      rowHeaders: true,
      licenseKey: 'non-commercial-and-evaluation',
    });
  </script>
</body>

</html>
```
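The README describes `tests/template.html` carrying a `{{version}}` placeholder that the spec fills in per version. A hypothetical sketch of that substitution step; the helper name and the CDN URL pattern here are assumptions for illustration, not taken from the actual spec:

```javascript
// Hypothetical sketch: build a per-version fixture from a template with a
// {{version}} placeholder. (Helper name and CDN URL pattern are assumed,
// not taken from the actual spec.)
function buildFixtureHtml(templateHtml, version) {
  // Point the script tag at the requested Handsontable release.
  return templateHtml.replaceAll('{{version}}', version);
}

const template =
  '<script src="https://cdn.jsdelivr.net/npm/handsontable@{{version}}/dist/handsontable.full.js"></script>';

console.log(buildFixtureHtml(template, '17.0.0'));
// The script tag now pins handsontable@17.0.0 instead of the placeholder.
```

Each (version, iteration) pair then gets its own generated page, so both builds run through an otherwise identical scenario.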

0 commit comments
