chore: add nats benchmarking pkg by sreya · Pull Request #26396 · coder/coder

sreya · 2026-06-16T05:30:23Z

Just an FYI: I let the AI run with this one a little more than on previous PRs that touched production code, since this is only for internal use.

Here's a comparison of the prototype vs. the current implementation. These numbers were taken prior to the message queue change, so we should rerun to make sure that change didn't cause a serious regression.

High Volume Single Subject

| Row | Old Pubs/sec | New Pubs/sec | Old Dels/sec | New Dels/sec |
| --- | --- | --- | --- | --- |
| 100P x 10S x 8KB | 397,993 | 529,403 | 3,415,410 | 4,516,496 |
| 100P x 10S x 64KB | 51,683 | 55,435 | 516,818 | 554,347 |
| 10P x 100S x 8KB | 353,889 | 441,064 | 6,745,004 | 5,478,560 |
| 10P x 100S x 64KB | 61,924 | 65,222 | 4,263,735 | 5,794,167 |

High Volume Multi Subject (10 Subjects)

| Row | Old Pubs/sec | New Pubs/sec | Old Dels/sec | New Dels/sec |
| --- | --- | --- | --- | --- |
| 100P x 10S x 8KB | 614,206 | 647,814 | 560,404 | 578,822 |
| 100P x 10S x 64KB | 61,162 | 69,478 | 61,161 | 69,478 |
| 10P x 100S x 8KB | 613,889 | 658,842 | 5,354,673 | 5,722,741 |
| 10P x 100S x 64KB | 76,577 | 87,521 | 765,762 | 875,149 |

High Cardinality Publish Fan In

| Row | Old Pubs/sec | New Pubs/sec | Old Dels/sec | New Dels/sec |
| --- | --- | --- | --- | --- |
| 100P x 100S x 8KB | 610,315 | 728,160 | 559,369 | 628,184 |
| 100P x 100S x 64KB | 57,750 | 65,128 | 57,747 | 65,122 |

High Cardinality Fanout

| Row | Old Pubs/sec | New Pubs/sec | Old Dels/sec | New Dels/sec |
| --- | --- | --- | --- | --- |
| 300P x 100S x 8KB | 575,351 | 664,537 | 522,973 | 599,190 |
| 300P x 100S x 64KB | 46,218 | 59,080 | 46,217 | 59,080 |

High Cardinality Fanout

| Row | Old Pubs/sec | New Pubs/sec | Old Dels/sec | New Dels/sec |
| --- | --- | --- | --- | --- |
| 100P x 300S x 8KB | 605,637 | 676,330 | 1,609,660 | 1,801,120 |
| 100P x 300S x 64KB | 51,456 | 65,459 | 154,358 | 196,373 |

Global Broadcast

| Row | Messages | Old Pubs/sec | New Pubs/sec | Old Dels/sec | New Dels/sec |
| --- | --- | --- | --- | --- | --- |
| 10P x 10S x 8KB | 100,000 | 70,358 | 73,329 | 679,927 | 704,078 |
| 10P x 10S x 64KB | 20,000 | 9,053 | 8,618 | 79,210 | 73,614 |

Global Broadcast Subscriber Fanout

| Row | Old Pubs/sec | New Pubs/sec | Old Dels/sec | New Dels/sec |
| --- | --- | --- | --- | --- |
| 10P x 100S x 8KB | 67,993 | 71,440 | 6,467,409 | 6,676,229 |
| 10P x 100S x 64KB | 8,568 | 8,466 | 766,898 | 737,429 |

Sharded Broadcast, 10 Subjects

| Row | Old Pubs/sec | New Pubs/sec | Old Dels/sec | New Dels/sec |
| --- | --- | --- | --- | --- |
| 10P x 100S x 8KB | 688,390 | 804,489 | 5,548,640 | 6,615,891 |
| 10P x 100S x 64KB | 40,044 | 44,122 | 400,347 | 441,221 |

Sharded High-Cardinality Thin Fanout

| Row | Messages | Old Pubs/sec | New Pubs/sec | Old Dels/sec | New Dels/sec |
| --- | --- | --- | --- | --- | --- |
| 100P x 100S x 8KB | 1,000,000 | 759,281 | 800,269 | 625,637 | 763,617 |
| 100P x 100S x 64KB | 200,000 | 39,539 | 45,424 | 39,535 | 45,424 |

Add an importable benchmark library for the NATS-backed pubsub that measures Pubs/sec and Deliveries/sec under high fan-out load across configurable subjects, payload sizes, publishers, subscribers, and replica counts.

- Deterministic plan maps publishers/subscribers to subjects and replicas and precomputes exact per-subscriber delivery counts.
- Probe-based readiness gate proves cross-route subscription interest has propagated before the measured phase, since routed drops are silent.
- Workload-derived sizing for listener queues and server max pending prevents slow-consumer drops; any drop signal invalidates the run.
- Bounded phases fail with shortfall, server-stats, and goroutine-dump diagnostics instead of hanging.
- TestBenchMatrix (gated behind CODER_TEST_NATS_BENCH=1) runs the 8 KiB / 64 KiB x 1/5/10 replica matrix and renders grouped markdown tables; invalid runs never report a throughput number.
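
The deterministic plan in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the package's actual plan builder: the round-robin (modulo) assignment policy and the names `sketchPlan` and `buildSketchPlan` are assumptions made for the example, and node/replica assignment is left out for brevity.

```go
package main

import "fmt"

// sketchPlan is a hypothetical stand-in for the benchmark's plan type.
type sketchPlan struct {
	pubSubject []int // pubSubject[i] is the subject publisher i publishes to
	subSubject []int // subSubject[j] is the subject subscriber j listens on
	expected   []int // expected[j] is the exact delivery count for subscriber j
}

// buildSketchPlan assigns publishers and subscribers to subjects round-robin
// (an assumed policy) and precomputes each subscriber's expected deliveries:
// publishers on its subject times messages per publisher.
func buildSketchPlan(publishers, subscribers, subjects, msgsPerPublisher int) sketchPlan {
	p := sketchPlan{
		pubSubject: make([]int, publishers),
		subSubject: make([]int, subscribers),
		expected:   make([]int, subscribers),
	}
	pubsOnSubject := make([]int, subjects)
	for i := range p.pubSubject {
		p.pubSubject[i] = i % subjects
		pubsOnSubject[i%subjects]++
	}
	for j := range p.subSubject {
		s := j % subjects
		p.subSubject[j] = s
		p.expected[j] = pubsOnSubject[s] * msgsPerPublisher
	}
	return p
}

func main() {
	p := buildSketchPlan(4, 6, 2, 100)
	total := 0
	for _, n := range p.expected {
		total += n
	}
	fmt.Println("per-subscriber:", p.expected, "total deliveries:", total)
}
```

Because the counts are fixed before any message is sent, a run can compare each subscriber's observed deliveries against `expected` and report a shortfall instead of waiting indefinitely.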

…ndings

Address code review findings:

- Derive MaxPending from the per-node sum of subject volumes with local subscribers, since MaxPending is a per-connection budget and one subscribe connection carries every coalesced subscription on its node. The previous per-subscriber derivation undersized multi-subject nodes.
- Derive the per-subscription pending byte limit (new LocalQueueBytes knob) alongside the message limit; previously the 512 MiB default could trip before the derived message limit.
- Pad message-count budgets with probe headroom so in-flight readiness probes cannot consume capacity sized for the benchmark burst.
- Warn when the derived local queue hits its cap and can no longer guarantee a drop-free run.
- Return partial Results on publish and flush errors for diagnostics, matching the documented Run contract.
- Register subscriber cleanup before subscribing so partial subscribe failures are cleaned up by the workload itself.
- Remove a no-op subscriber-node flush whose comment misattributed the interest guarantee; SubscribeWithErr flushes the SUB itself.
- Record effective (overridden) configs in matrix report rows.
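
One way to read the per-node MaxPending derivation in the first and third bullets is sketched below. It is an illustration under assumptions, not the package's sizing code: the function name `derivePendingBudget`, its parameters, and the choice to add headroom once per node are all hypothetical. Each subject is counted once per node because the commit describes subscriptions as coalesced onto one connection.

```go
package main

import "fmt"

// derivePendingBudget returns a per-node pending-message budget: the sum of
// the message volumes of every subject that has at least one subscriber on
// that node, plus probe headroom (hypothetical helper, assumed policy).
func derivePendingBudget(subjectVolume, subNode, subSubject []int, nodes, probeHeadroom int) []int {
	budget := make([]int, nodes)
	seen := make(map[[2]int]bool) // (node, subject) pairs already counted
	for j, node := range subNode {
		key := [2]int{node, subSubject[j]}
		if seen[key] {
			continue // coalesced: a subject counts once per node
		}
		seen[key] = true
		budget[node] += subjectVolume[subSubject[j]]
	}
	for n := range budget {
		if budget[n] > 0 {
			budget[n] += probeHeadroom
		}
	}
	return budget
}

func main() {
	subjectVolume := []int{1000, 500, 200} // messages published per subject
	subNode := []int{0, 0, 0, 1}           // subNode[j] is subscriber j's node
	subSubject := []int{0, 1, 0, 2}        // subSubject[j] is subscriber j's subject
	fmt.Println(derivePendingBudget(subjectVolume, subNode, subSubject, 2, 10))
}
```

A per-subscriber derivation would size node 0 for a single subject's volume, which is the undersizing the first bullet describes for multi-subject nodes.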

- Replace the env-gated TestBenchMatrix test with a cmd/natsbench CLI: no flags runs the default matrix, -scenario runs one named scenario, and shape flags (-payload/-subjects/-publishers/-subscribers/-replicas) run a custom configuration. Markdown goes to stdout, logs to stderr, and a failed run exits nonzero. The report-to-file env var is gone; redirect stdout instead.
- Remove Config.withDefaults: Run now requires a fully populated config and validates that Timeout is positive. The CLI defaults the timeout to 2 minutes.
- Collapse the readiness gate's two plan inversions into a single subjectNodes mapping that serves as both the probe schedule and each subscriber's required probe set.
- Document why startPublishers parks on a barrier and when the zero-expectation pre-close of allDone applies.
- Drop digit-separator underscores from numeric literals.

- Encode readiness probes as a sentinel byte plus the decimal node index instead of a BigEndian uint64, dropping the encoding/binary and math dependencies and the overflow guard.
- Return publisher errors over a buffered, closed-on-completion channel instead of writing into a shared slice, removing any question of a data race on the error collection.
- Move the CLI driver into the natsbench package as the exported Main plus a testable cliRun.scenarios; cmd/natsbench is now a thin entrypoint. Adds unit coverage for scenario selection.
- Expand the plan doc comment with a concrete worked example of the publisher/subscriber to subject/node assignment and expected counts.
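
The error-collection pattern in the second bullet can be sketched as below. This is a generic illustration of a buffered, closed-on-completion error channel, not the package's code; `runPublishers` and `errPublishFailed` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errPublishFailed is a placeholder error used only by this example.
var errPublishFailed = errors.New("publish failed")

// runPublishers starts n publisher goroutines and returns a channel that
// yields each publisher's error (if any) and is closed once all have finished.
func runPublishers(n int, publish func(id int) error) <-chan error {
	errs := make(chan error, n) // buffered to n, so a send never blocks
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			if err := publish(id); err != nil {
				errs <- fmt.Errorf("publisher %d: %w", id, err)
			}
		}(i)
	}
	go func() {
		wg.Wait()
		close(errs) // closing lets the consumer range until completion
	}()
	return errs
}

func main() {
	errs := runPublishers(4, func(id int) error {
		if id%2 == 1 {
			return errPublishFailed
		}
		return nil
	})
	var all []error
	for err := range errs {
		all = append(all, err)
	}
	fmt.Println(len(all), "publisher errors")
}
```

Each goroutine only sends on the channel, so there is no shared slice to index or append to concurrently, and ranging over the channel doubles as the wait for completion.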

Replace the 0x5b sentinel byte with a 'natsbench-probe:' string prefix. Both distinguish probes from the all-zero benchmark payloads equally well, but the prefix is self-documenting in packet captures and debuggers. Decode with strings.CutPrefix.
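
A minimal sketch of the prefix-based probe encoding, assuming the payload is the prefix followed by the decimal node index as described in the earlier commit. The helper names `encodeProbe` and `decodeProbe` are hypothetical; only the `natsbench-probe:` prefix and `strings.CutPrefix` come from the commit message.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const probePrefix = "natsbench-probe:"

// encodeProbe builds a probe payload for the given node index.
func encodeProbe(node int) []byte {
	return []byte(probePrefix + strconv.Itoa(node))
}

// decodeProbe reports the node index if payload is a probe. All-zero
// benchmark payloads never start with the prefix, so they return ok=false.
// Note that string(payload) copies the whole payload.
func decodeProbe(payload []byte) (node int, ok bool) {
	rest, found := strings.CutPrefix(string(payload), probePrefix)
	if !found {
		return 0, false
	}
	n, err := strconv.Atoi(rest)
	if err != nil || n < 0 {
		return 0, false
	}
	return n, true
}

func main() {
	fmt.Println(decodeProbe(encodeProbe(7)))
	fmt.Println(decodeProbe(make([]byte, 8192)))
}
```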

Collapse the library and its cmd/natsbench entrypoint into a single package main with a main() that calls runCLI. The benchmark is now run directly with 'go run ./coderd/x/nats/natsbench/'. Tests still live in the same directory and continue to pass.

…comments

- Rename awaitReadiness -> awaitTopologyReady, readinessConverged -> isReady, readinessShortfall -> unreadySubscribers.
- Give each plan field its own comment line.
- Note why probe flushing dedupes pubNode (it is indexed by publisher, so multiple publishers share a node).

Compute the sorted distinct publisher and subscriber node sets once in buildPlan (plan.pubNodes / plan.subNodes) instead of recomputing uniqueInts at each call site, including on every iteration of the readiness gate loop. Several publishers or subscribers can share a node, so per-node work (flushing, burst sizing) needs the deduped set.
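
As a sketch of the deduplication this commit hoists into buildPlan, the helper below computes a sorted distinct set once so per-node loops can reuse it. The name `sortedDistinct` is hypothetical (the commit refers to an existing `uniqueInts`, whose implementation is not shown here).

```go
package main

import (
	"fmt"
	"sort"
)

// sortedDistinct returns the sorted set of distinct values in xs without
// modifying xs.
func sortedDistinct(xs []int) []int {
	out := append([]int(nil), xs...)
	sort.Ints(out)
	n := 0
	for i, v := range out {
		if i == 0 || v != out[n-1] {
			out[n] = v
			n++
		}
	}
	return out[:n]
}

func main() {
	pubNode := []int{2, 0, 2, 1, 0}     // pubNode[i] is the node hosting publisher i
	pubNodes := sortedDistinct(pubNode) // computed once, reused by every per-node loop
	fmt.Println(pubNodes)
}
```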

Drop the redundant Scenario column (the payload group header and the Replicas column already identify each row) and the always-zero Drops and always-empty Notes columns. A Status column is now included only for groups that contain an invalid run, so clean matrices render as a compact four-column table.

Pad every table cell to its column's widest value so the raw markdown also lines up in a fixed-width terminal, instead of relying on a markdown viewer to align ragged pipes. Numeric columns stay right-aligned and the Status column is left-aligned.
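
The padding described here can be sketched as follows. This is an illustrative renderer, not the package's report code: `renderTable` and its `leftAlign` parameter are hypothetical, every row is assumed to have the same number of cells as the header, and widths are measured in bytes, which is only correct for ASCII cells.

```go
package main

import (
	"fmt"
	"strings"
)

// renderTable pads every cell to its column's widest value. Columns listed
// in leftAlign are left-aligned; all others are right-aligned.
func renderTable(header []string, rows [][]string, leftAlign map[int]bool) string {
	widths := make([]int, len(header))
	for c, h := range header {
		widths[c] = len(h)
	}
	for _, row := range rows {
		for c, cell := range row {
			if len(cell) > widths[c] {
				widths[c] = len(cell)
			}
		}
	}
	var b strings.Builder
	writeRow := func(cells []string) {
		b.WriteString("|")
		for c, cell := range cells {
			if leftAlign[c] {
				fmt.Fprintf(&b, " %-*s |", widths[c], cell)
			} else {
				fmt.Fprintf(&b, " %*s |", widths[c], cell)
			}
		}
		b.WriteString("\n")
	}
	writeRow(header)
	b.WriteString("|")
	for _, w := range widths {
		b.WriteString(" " + strings.Repeat("-", w) + " |")
	}
	b.WriteString("\n")
	for _, row := range rows {
		writeRow(row)
	}
	return b.String()
}

func main() {
	fmt.Print(renderTable(
		[]string{"Replicas", "Pubs/sec", "Status"},
		[][]string{{"1", "529,403", "ok"}, {"10", "55,435", "invalid"}},
		map[int]bool{2: true}, // Status is the only left-aligned column
	))
}
```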

The standard matrix now runs with 3 publisher and 3 subscriber connections (DefaultConns) to match the prior natsbench harness, which spreads same-subject hashing across connections and raises single-node throughput over the production 1/1 default. New -publish-conns and -subscribe-conns flags apply to every run, so 1/1 production behavior is still reproducible with -publish-conns 1 -subscribe-conns 1.

Drop the trailing colons from table separator rows. Cells are already padded for terminal alignment, so the GitHub markdown alignment hints added visual noise without changing the rendered terminal output.

Add Subjects, Publishers, and Subscribers columns to the report so the workload shape is explicit. The default matrix holds these constant, but named-scenario overrides and custom runs vary them, and a table that hides the shape is easy to misread.

probeNode ran string(payload) on every delivered message, allocating a full copy of the (up to 64 KiB) payload per delivery. At high fan-out this dominated runtime via GC pressure and understated throughput by up to ~10x. Compare the probe prefix as bytes against a package-level byte slice and convert only the tiny trailing node index to a string, and only for actual probes, so benchmark payloads cost no allocation.
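
The fix can be sketched as below, assuming the `natsbench-probe:` prefix followed by a decimal node index. This is an illustration of the byte-wise comparison the commit describes, not the package's exact probeNode; the signature shown is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// probePrefix is a package-level byte slice so the comparison needs no
// per-call conversion.
var probePrefix = []byte("natsbench-probe:")

// probeNode returns the node index for probe payloads. Benchmark payloads
// fail the prefix check and return before any string conversion, so the
// (up to 64 KiB) payload is never copied.
func probeNode(payload []byte) (int, bool) {
	if !bytes.HasPrefix(payload, probePrefix) {
		return 0, false
	}
	// Only the short trailing node index is converted to a string.
	n, err := strconv.Atoi(string(payload[len(probePrefix):]))
	if err != nil || n < 0 {
		return 0, false
	}
	return n, true
}

func main() {
	probe := append([]byte("natsbench-probe:"), '1', '2')
	fmt.Println(probeNode(probe))
	fmt.Println(probeNode(make([]byte, 64*1024)))
}
```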

Run validates a fully populated config and applies no defaults, so the Messages comment claiming 'Zero means DefaultMessages' was a false contract (validate rejects Messages < 1). Likewise PublishConns and SubscribeConns do not default in natsbench; zero passes through to nats.Options, which applies the single-connection default. Clarify on the Config type that defaulting happens in the CLI, required fields must be set, and only LocalQueue*/MaxPending are derived when zero.

- Default matrix now uses replica counts (1, 3, 9) coprime with the subject count (10) so cluster scenarios actually exercise cross-node routing; previously divisor counts co-located every pub/sub pair and the readiness gate proved nothing. TestRunCluster likewise uses coprime Subjects/Replicas for cross-node integration coverage.
- applySizing now warns when an explicit LocalQueueBytes is below the derived size, matching LocalQueueMsgs and MaxPending.
- Wire SIGINT/SIGTERM cancellation through the CLI; the run loop stops launching scenarios once interrupted instead of emitting confusing topology errors. Move os.Exit out of the deferred-stop scope.
- Replace hand-rolled formatInt with humanize.Comma.
- Add unit tests for the drop-invalidation path (dropState, listener drop accounting, awaitPhase fail-fast).
- Trim probe comments to the why; use wg.Go for publisher goroutines.
- Document that DefaultScenarios leaves Timeout unset for the caller.
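
The coprime point in the first bullet can be checked with a small sketch. It assumes, purely for illustration, that client index i maps to subject `i % subjects` and node `i % replicas`; the actual assignment in the plan may differ, and `crossNodePairs` is a hypothetical helper.

```go
package main

import "fmt"

// crossNodePairs counts publisher/subscriber pairs that share a subject but
// sit on different nodes, under the assumed modulo assignment.
func crossNodePairs(publishers, subscribers, subjects, replicas int) int {
	cross := 0
	for p := 0; p < publishers; p++ {
		for s := 0; s < subscribers; s++ {
			if p%subjects == s%subjects && p%replicas != s%replicas {
				cross++
			}
		}
	}
	return cross
}

func main() {
	for _, replicas := range []int{5, 10, 3, 9} {
		fmt.Printf("subjects=10 replicas=%d cross-node pairs=%d\n",
			replicas, crossNodePairs(100, 100, 10, replicas))
	}
}
```

Under this assignment, two indices that agree modulo 10 also agree modulo any divisor of 10, so replica counts of 5 or 10 put every same-subject pair on the same node and no message ever crosses a route. With 3 or 9 replicas the indices can differ modulo the replica count, so cross-node pairs appear.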

- Drop vestigial Status/Scenario NotContains assertions in the clean-group render test.
- Clarify closeAll comment refers to Pubsub.Close.
- De-stutter the subscriber registration error message.

Call signal stop() explicitly instead of deferring it, so os.Exit no longer skips it and the two-function split added only to satisfy the exitAfterDefer lint is unnecessary.

…rops

The clean-group render test asserted NotContains 'Drops', which never appears in any output and so passed trivially. Restore the meaningful NotContains 'Status' assertion: Status is a conditional column header added only for groups with an invalid run, and this test exists to verify clean groups omit it.

sreya added 20 commits June 11, 2026 22:03

refactor(coderd/x/nats/natsbench): rename cli.go to main.go

36b4f76

refactor(coderd/x/nats/natsbench): plain separator dashes in report

31c65fb

refactor(coderd/x/nats/natsbench): review nits

ad647f3

refactor(coderd/x/nats/natsbench): collapse runMain into main

430f033

sreya requested a review from spikecurtis June 16, 2026 05:30

github-actions Bot assigned sreya Jun 16, 2026

sreya changed the title ~~Nats benchmarking~~ chore: add nats benchmarking pkg Jun 16, 2026

sreya wants to merge 20 commits into main from nats-benchmarking
