Changes CI test sharding from per-project to per-task granularity. #11847
Draft
AlexeyKuznetsov-DD wants to merge 2 commits into
Empty commit so this branch runs the current per-project test-slot sharding through CI as a baseline, before the per-task sharding change is committed on top for comparison. No files changed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rework -Pslot=X/Y sharding so a module's test variants (e.g. jdbc's test/forkedTest/oldH2Test/oldPostgresTest) hash to independent slots instead of serializing in a single job. Parse the slot selection once and cache it on the root project; keep Project.isInSelectedSlot (used by runMuzzle) at project granularity and add a task-level gate for Test tasks. The *Check aggregate and all coverage builds stay whole-module, project-slotted so per-module JaCoCo sees complete execution data. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What Does This Do
Changes CI test sharding from per-project to per-task granularity.
Previously, the -Pslot=X/Y filter was evaluated once per project via Project.isInSelectedSlot: every Test task in a module (e.g. jdbc's test/forkedTest/oldH2Test/oldPostgresTest) was pinned to the same slot and serialized inside one job. Now each test variant is hashed independently on the key "<projectPath>:<taskName>", so a module's variants spread across different CI slots.
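As a rough illustration of the two keys, here is a minimal Kotlin sketch. It is not the code in CIJobsExtensions.kt: slotFor, projectSlot, and taskSlot are hypothetical names, the project path is illustrative, and the sketch assumes the per-task hash uses an abs(hashCode % totalSlots) + 1 formula.

```kotlin
import kotlin.math.abs

// Hypothetical helper: maps a string key to a slot in 1..totalSlots.
// The abs(hash % total) + 1 formula is an assumption for this sketch.
fun slotFor(key: String, totalSlots: Int): Int {
    require(totalSlots > 0) { "totalSlots must be positive" }
    return abs(key.hashCode() % totalSlots) + 1
}

// Per-project keying: every Test task in a project shares the project's slot.
fun projectSlot(projectPath: String, totalSlots: Int): Int =
    slotFor(projectPath, totalSlots)

// Per-task keying: each task hashes on "<projectPath>:<taskName>".
fun taskSlot(projectPath: String, taskName: String, totalSlots: Int): Int =
    slotFor("$projectPath:$taskName", totalSlots)

fun main() {
    val project = ":instrumentation:jdbc" // illustrative path, not taken from the build
    val variants = listOf("test", "forkedTest", "oldH2Test", "oldPostgresTest")
    val totalSlots = 8
    println("project-level slot for all variants: ${projectSlot(project, totalSlots)}")
    for (task in variants) {
        println("task-level slot for $task: ${taskSlot(project, task, totalSlots)}")
    }
}
```

With per-project keying all four variants print the same slot; with per-task keying each variant gets its own hash, so they can land in different slots (they are not guaranteed to, since distinct keys can still collide).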
Key changes in CIJobsExtensions.kt:
- Task.isInSelectedSlot shards at task granularity; Project.isInSelectedSlot is kept for whole-project aggregates like runMuzzle.
- createRootTask now depends directly on the in-slot Test tasks the umbrella would run (via testTaskFilter), instead of gating with onlyIf against a project-level slot. Out-of-slot modules aren't pulled into the job at all.
- The slot selection is parsed once and cached on the root project (SlotSelection/SlotHolder).
- Coverage builds (-PcheckCoverage or a forceCoverage aggregate -> coverageEnabled) deliberately *fall back to whole-project slotting*, and the check/JaCoCo aggregate stays project-level (testTaskFilter = null). This keeps per-module JaCoCo execution data complete.
Motivation
The pipeline is gated by the slowest test_inst shard. Serializing all of a module's test variants in one job made that shard taller than it needed to be. Sharding per task rebalances the work.
A/B comparison of two full pipelines on identical code (only the CI logic differs):
[Table rows: test_inst shard (critical path), test_inst_latest slowest shard, test_smoke slowest shard]
The slowest shard drops ~368 s and the pipeline wall-clock follows almost exactly: ~4 minutes faster (−7.6 %) while using less total compute.
Additional Notes
While A/B testing, the pipeline's own aggregate_test_counts job revealed the old per-project logic was not running every test:
The gap is entirely in core modules that have multiple test tasks per module:
[Table rows: test_base, test_profiling, test_debugger, test_inst, test_inst_latest, test_smoke]
Root cause. The old onlyIf gated each task on abs(project.path.hashCode() % totalSlots) + 1 == selectedSlot, but totalSlots (the hash divisor) did not match the number of splits a job kind actually launched. Confirmed in a baseline test_profiling job: it ran with CI_NODE_INDEX=3, CI_NODE_TOTAL=13, so the hash produced buckets 1..13 while the job only ran selected=3. No profiling module hashed to bucket 3, so 0 profiling tests ran. The baseline pipeline even emitted its own ⚠️ WARNING: 6 job(s) with zero tests. The same misalignment skipped ~57 % of test_base (6,610 -> 15,268 tests per JVM, same modules).
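The failure mode can be reproduced in isolation. The sketch below applies the same bucket formula to a small set of module paths and lists the buckets nobody hashes to. The module paths and the helper names (bucketOf, bucketsFor, emptyBuckets) are hypothetical, chosen only to show how a divisor larger than the populated range leaves some jobs with nothing to run.

```kotlin
import kotlin.math.abs

// Same bucket formula as the old onlyIf gate quoted above.
fun bucketOf(projectPath: String, totalSlots: Int): Int =
    abs(projectPath.hashCode() % totalSlots) + 1

// Groups paths by bucket; buckets that no path hashes to are absent from the map.
fun bucketsFor(projectPaths: List<String>, totalSlots: Int): Map<Int, List<String>> =
    projectPaths.groupBy { bucketOf(it, totalSlots) }

// Buckets in 1..totalSlots that received no project at all.
fun emptyBuckets(projectPaths: List<String>, totalSlots: Int): List<Int> {
    val used = bucketsFor(projectPaths, totalSlots).keys
    return (1..totalSlots).filter { it !in used }
}

fun main() {
    // Hypothetical module paths, for illustration only.
    val modules = listOf(":profiling:a", ":profiling:b", ":profiling:c", ":profiling:d")
    val totalSlots = 13
    println("populated buckets: ${bucketsFor(modules, totalSlots).keys.sorted()}")
    println("empty buckets: ${emptyBuckets(modules, totalSlots)}")
}
```

With 4 modules spread over 13 buckets, at least 9 buckets are necessarily empty, so any job pinned to one of them runs zero tests regardless of which modules exist.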
Why it's now covered. Per-task sharding hashes on "<projectPath>:<taskName>", and, more importantly, createRootTask makes each aggregate depend directly on the Test tasks selected for its own slot rather than relying on an onlyIf compared against a globally-numbered slot that didn't line up. Every test task now lands in exactly one slot that actually runs.
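The "exactly one slot" property follows from selecting tasks by a deterministic hash of their key. A minimal sketch of that selection, assuming an abs(hash % total) + 1 formula and using a hypothetical TestTask model and tasksInSlot filter in place of the real Gradle types and testTaskFilter:

```kotlin
import kotlin.math.abs

// Hypothetical model of a Gradle Test task: just its project path and name.
data class TestTask(val projectPath: String, val taskName: String) {
    val key: String get() = "$projectPath:$taskName"
}

// Assumed slot formula for this sketch: abs(hash % total) + 1 over the task key.
fun slotOf(task: TestTask, totalSlots: Int): Int =
    abs(task.key.hashCode() % totalSlots) + 1

// Stand-in for the task filter: the Test tasks one slot's aggregate would depend on.
fun tasksInSlot(all: List<TestTask>, slot: Int, totalSlots: Int): List<TestTask> =
    all.filter { slotOf(it, totalSlots) == slot }

fun main() {
    // Illustrative project paths and task names.
    val all = listOf(
        TestTask(":jdbc", "test"),
        TestTask(":jdbc", "forkedTest"),
        TestTask(":jdbc", "oldH2Test"),
        TestTask(":jdbc", "oldPostgresTest"),
        TestTask(":other", "test"),
    )
    val totalSlots = 3
    val perSlot = (1..totalSlots).associateWith { tasksInSlot(all, it, totalSlots) }
    perSlot.forEach { (slot, tasks) -> println("slot $slot: ${tasks.map { it.key }}") }
    // Each task matches exactly one slot, so the per-slot sizes add up to the total.
    println("every task covered once: ${perSlot.values.sumOf { it.size } == all.size}")
}
```

Because each job depends only on the tasks returned for its own slot, a slot with no matching tasks simply has no dependencies, and no task can be selected by two slots.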
Why this isn't double-counting. test_inst, test_inst_latest, and test_smoke are unchanged to within 0.06 %. Those modules have ~one test task each, so they were never affected by the bug; if the new code ran anything twice, they would inflate too. Only the multi-variant core modules gained tests, which is the signature of restored coverage, not duplicated work.
Implemented and tested with Claude.