Reduce sharing in struct client between main and IO threads#15006

Open
abokhalill wants to merge 1 commit into redis:unstable from abokhalill:fix/client-struct-false-sharing

Conversation


@abokhalill abokhalill commented Apr 6, 2026

Cache line 0 of struct client mixes main-thread-hot fields (flags, conn)
with IO-thread-hot fields (tid, io_flags). Under threaded I/O, every command
dispatch invalidates the IO thread's L1d copy of the line, and vice versa.

This patch reorders fields so that the IO-thread fields start at cache line 1.
It is a pure field reorder; no logic changes.

Microbenchmark (500K samples, pinned to separate physical cores, AMD Zen 4):

|               | Shared line | Isolated | Delta |
|---------------|-------------|----------|-------|
| Mean TSC/iter | 103.6       | 88.9     | −14%  |
| p99 TSC/iter  | 141         | 94       | −33%  |

Welch's t = 5.92, p = 0.0001.

Builds clean and passes redis-benchmark --threads 4 against a server running with --io-threads 4.

Found via static analysis of struct layout relative to cache line boundaries.


Note

Medium Risk
Reorders fields in the hot-path client struct to change memory layout, which can subtly affect alignment/ABI assumptions and any code relying on field offsets, despite no logic changes.

Overview
Reorders struct client fields in server.h to separate main-thread hot fields (id/flags/conn/resp/db metadata) from IO-thread hot fields (tid/running_tid/io_flags/read_error) onto different cache lines, reducing false sharing under threaded I/O.

Adds an explicit padding int (_pad0) and cache-line grouping comments to keep alignment stable after the reorder.

Reviewed by Cursor Bugbot for commit 649cc36.

Reorder fields in struct client so that IO thread hot fields (tid,
running_tid, io_flags, read_error) start at cache line 1 instead of
sharing cache line 0 with main thread hot fields.

Under threaded I/O, the main thread writes flags on every command
dispatch while an IO thread reads/writes io_flags on every I/O event.
With both on cache line 0, each access triggers a cross-core L1d
invalidation.

Before: 103.6 TSC/iter mean, 141 TSC p99
After:  88.9 TSC/iter mean, 94 TSC p99
Delta: -14.2% mean, -33.3% tail latency per atomic store
Welch t-test: t=5.92, p=0.0001

CLAassistant commented Apr 6, 2026

CLA assistant check
All committers have signed the CLA.


jit-ci Bot commented Apr 6, 2026

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.


@ShooterIT
Member

Is there any performance improvement?

@fcostaoliveira
Collaborator

@ShooterIT we've just kicked off the following Redis benchmarks on bare-metal AWS runners (3 datapoints each, both x86 m7i and ARM m8g) to verify the gains under high-concurrency io-threads workloads:

  • memtier_benchmark-1Mkeys-string-setget2000c-1KiB-pipeline-16 (2000 conns, pipeline-16)
  • memtier_benchmark-1Mkeys-string-setget2000c-1KiB-pipeline-10 (2000 conns, pipeline-10)
  • memtier_benchmark-3Mkeys-string-mixed-20-80-with-512B-values-pipeline-10-5200_conns (5200 conns, mixed)
  • memtier_benchmark-3Mkeys-string-get-with-1KiB-values-pipeline-10-2000_conns (2000 conns, read-only)

All four use the oss-standalone-12/16-io-threads topologies — these should stress the main↔IO-thread false-sharing window the most.

One caveat worth flagging: x86 Sapphire Rapids has 64-byte cache lines while ARM Neoverse V2 (m8g) has 128-byte cache lines. The _pad0 separation may not be sufficient on ARM if the surrounding fields still land within the same 128-byte line — so the gains might hold on x86 but be flat (or need extra padding) on ARM. We'll see what the data shows.

Will report numbers once the runs complete (~30-45 min).

@fcostaoliveira
Collaborator

fcostaoliveira commented Apr 10, 2026

CE Performance Automation : step 2 of 2 (benchmark) RUNNING...

This comment was automatically generated given a benchmark was triggered.

Started benchmark suite at 2026-04-10 13:34:18.218282 and took 602.649904 seconds up until now.
Status: [####################------------------------------------------------------------] 25.0% completed.

In total will run 4 benchmarks.
- 3 pending.
- 1 completed:
- 1 successful.
- 0 failed.
You can check the status in detail via the Grafana link.

@fcostaoliveira
Collaborator

CE Performance Automation : step 1 of 2 (build) STARTING...

This comment was automatically generated given a benchmark was triggered.
Started building at 2026-04-10 14:54:53.659481
You can check each build/benchmark progress in grafana:

  • git hash: ae95526
  • git branch: unstable
  • commit date and time: 2026-04-09 17:58:37+03:00
  • commit summary: RED-183356: Automate tarball creation (RED-183356: Automate tarball creation #14911)
  • test filters:
    • command priority lower limit: 0
    • command priority upper limit: 100000
    • test name regex: memtier_benchmark-1Mkeys-string-setget2000c-1KiB-pipeline-16|memtier_benchmark-1Mkeys-string-setget2000c-1KiB-pipeline-10|memtier_benchmark-3Mkeys-string-mixed-20-80-with-512B-values-pipeline-10-5200_conns|memtier_benchmark-3Mkeys-string-get-with-1KiB-values-pipeline-10-2000_conns
    • command group regex: .*

@fcostaoliveira
Collaborator

fcostaoliveira commented Apr 10, 2026

CE Performance Automation : step 1 of 2 (build) DONE.

This comment was automatically generated given a benchmark was triggered.
Started building at 2026-04-11 19:56:20.515020 and took 58 seconds.
You can check each build/benchmark progress in grafana:

  • git hash: 0d85627
  • git branch: unstable
  • commit date and time: 2026-04-10 23:25:56+08:00
  • commit summary: Use no_value dict type for stream_idmp_keys to explicitly mark it as a key-only set (Use no_value dict type for stream_idmp_keys to explicitly mark it as a key-only set #14987)
  • test filters:
    • command priority lower limit: 0
    • command priority upper limit: 100000
    • test name regex: memtier_benchmark-1Mkeys-string-setget2000c-1KiB-pipeline-16|memtier_benchmark-1Mkeys-string-setget2000c-1KiB-pipeline-10|memtier_benchmark-3Mkeys-string-get-with-1KiB-values-pipeline-10-2000_conns
    • command group regex: .*

You can check the comparison in detail via the Grafana link.

@abokhalill
Author

@fcostaoliveira did the bare-metal benchmark results come back? I see the automation completed on Apr 10-11, but I didn't see the final numbers posted. Happy to address the ARM 128-byte cache-line concern with additional padding if needed.
