fix(benchmarks): codspeed improvements#20972
Conversation
|
|
This PR is packaged and the instant preview is available (936939b). Install it locally:
npm i -D webpack@https://pkg.pr.new/webpack@936939b
yarn add -D webpack@https://pkg.pr.new/webpack@936939b
pnpm add -D webpack@https://pkg.pr.new/webpack@936939b |
| return { | ||
| filter: | ||
| typeof raw.filter === "string" && raw.filter | ||
| ? new RegExp(raw.filter) |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #20972 +/- ##
=======================================
Coverage 90.91% 90.91%
=======================================
Files 573 573
Lines 58639 58639
Branches 15774 15774
=======================================
Hits 53312 53312
Misses 5327 5327
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR updates webpack’s benchmark harness to integrate CodSpeed’s Tinybench plugin and to support walltime (timing) benchmarks on CodSpeed’s codspeed-macro runners, simplifying the benchmark setup and CI workflow.
Changes:
- Switch benchmark instrumentation from a custom
@codspeed/coreintegration to@codspeed/tinybench-plugin. - Rework
BenchmarkTestCases.benchmark.mjsto register walltime-friendly scenarios (cold builds, watch rebuilds, and cache warmups) without baseline/sharding logic. - Update the GitHub Actions workflow to run as a single CodSpeed walltime job on
codspeed-macro.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
test/BenchmarkTestCases.benchmark.mjs |
Replaces custom CodSpeed harness with @codspeed/tinybench-plugin, adds scenario-based task registration, removes sharding/baseline logic. |
.github/workflows/benchmarks.yml |
Moves benchmarks to codspeed-macro runner and switches to walltime mode; simplifies job structure. |
package.json |
Updates benchmark script flags and swaps dependency from @codspeed/core to @codspeed/tinybench-plugin. |
yarn.lock |
Adds lock entries for @codspeed/tinybench-plugin and transitive deps (e.g., stack-trace). |
Comments suppressed due to low confidence (1)
.github/workflows/benchmarks.yml:44
- The workflow no longer excludes the
*-on-schedulebenchmark cases (previously done viaNEGATIVE_FILTER: on-schedule). As a result,test/benchmarkCases/typescript-long-on-schedulewill run on every PR/push, which likely increases runtime substantially. If these cases are still intended to be schedule-only, restoreNEGATIVE_FILTER: on-schedule(or pass--negative-filter on-scheduletoyarn benchmark).
- name: Run benchmarks
uses: CodSpeedHQ/action@3194d9a39c4d46684cb44bf7207fc56626aad8fd # v4.15.1
with:
run: yarn benchmark
mode: "walltime"
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Merging this PR will create unknown performance changes
Performance Changes
Comparing Footnotes
|
| token: ${{ secrets.CODSPEED_TOKEN }} | ||
| env: | ||
| LAST_COMMIT: 1 | ||
| NEGATIVE_FILTER: on-schedule |
There was a problem hiding this comment.
Please spit them for performance reasons on CI, memory and walltime should be different CI tasks, memory are very unstable if you will mix them
There was a problem hiding this comment.
Where is that documented? As far as I could tell, it's recommended to run when in a single CI step
There was a problem hiding this comment.
We asked about unstable memory test using chat at their site, it was recommendation from them
There was a problem hiding this comment.
I think this is the current recommendation, but I can ask
| "three-long": PROD_ONLY, | ||
| "typescript-long-on-schedule": PROD_ONLY, | ||
| "wasm-modules-async": DEV_AND_REBUILD, | ||
| "wasm-modules-sync": DEV_AND_REBUILD |
There was a problem hiding this comment.
We should make all scenarios for all cases, we already in past have problems where we miss performance degradation due missing scenario
| "@changesets/cli": "^2.29.8", | ||
| "@changesets/get-github-info": "^0.8.0", | ||
| "@codspeed/core": "^5.4.0", | ||
| "@codspeed/tinybench-plugin": "^5.4.0", |
There was a problem hiding this comment.
tinybench-plugin uses old version of tinybench
There was a problem hiding this comment.
Yes, it's off by one major version
| ); | ||
| }); | ||
|
|
||
| bench.run(); |
There was a problem hiding this comment.
We have this script because I have internal CI which check different performance between version, commits and etc, in future I want to share it and make it publicly to see how we faster or slow between versions, not sure we really can remove it, but yes, we can simplify it
There was a problem hiding this comment.
I'm afraid I don't understand this request. The benchmarking file shouldn't be responsible for evaluating this commit versus that commit for each benchmark, that's what CodSpeed is for
There was a problem hiding this comment.
Yes, we should support it
alexander-akait
left a comment
There was a problem hiding this comment.
We need a discussion about many things here, also I don't think it make our benchmarks stable, I don't see why it should be, code in our benchmark file 1:1 like in the plugin
|
The main question - why do you think it will have stable benchmarks? |
I think by using walltime, which measures both CPU time and I/O time, we will be able to see a bigger picture. Plus, by not having additional logic in the benchmark runner, we can discount any issues in that file as causing memory issues |
We already use it, look at the original code and our code, we have the same just with some internal changes to be more stable and have ability to check something between versions, I really don't see any changes which can make it more stable, if I something missing, please show me |
Replaces our custom benchmark tool with
@codspeed/plugin-tinybench