Enable x86 TSC monotonic clock by default with runtime calibration#15018

Open
fcostaoliveira wants to merge 5 commits into redis:unstable from filipecosta90:hw-clock-x86-default

Conversation

Collaborator

@fcostaoliveira fcostaoliveira commented Apr 8, 2026

Summary

On x86_64 Linux, Redis's hardware TSC clock path was gated behind a
compile-time USE_PROCESSOR_CLOCK flag. Without it, Redis falls back to
clock_gettime(CLOCK_MONOTONIC) — a syscall that costs ~50-100ns per
invocation vs ~5-10ns for RDTSC.

This is already the default on ARM (Generic Timer), but x86 users had to
opt in manually. Additionally, the existing x86 calibration parsed the
CPU "model name" field for a GHz string, which fails on CPUs that don't
include a frequency in that field.

This change:

  1. Removes the USE_PROCESSOR_CLOCK compile-time gate on x86_64 Linux
  2. Enables HW TSC by default when constant_tsc is present in /proc/cpuinfo flags
  3. Replaces the fragile GHz regex parsing with runtime calibration: measures
    RDTSC ticks over a 10ms clock_gettime interval at startup
  4. Falls back to POSIX clock if constant_tsc is absent

Benchmark Results — io-threads validation (3 datapoints / cell)

Test: memtier_benchmark-1Mkeys-string-setget2000c-1KiB-pipeline-10 (2000 client connections, 1M keys, 10% SET / 90% GET, 1KiB values, pipeline 10).
Platform: x86-aws-m7i.metal-24xl — Intel Xeon Platinum 8488C (Sapphire Rapids), 96 cores bare metal.
Comparison: hw-clock-x86-default (3ce8b055a) vs unstable (0fa78fd8f), 3 independent runs each.

| Topology | hw-clock (mean ± σ, ops/sec) | unstable (mean ± σ, ops/sec) | Δ mean |
| --- | --- | --- | --- |
| oss-standalone | 811,266 ± 2,951 | 803,820 ± 5,376 | +0.9% (flat) |
| oss-standalone-02-io-threads | 914,202 ± 8,092 | 924,562 ± 7,690 | −1.1% (flat) |
| oss-standalone-04-io-threads | 2,353,548 ± 33,466 | 2,334,689 ± 28,363 | +0.8% (flat) |
| oss-standalone-08-io-threads | 2,992,087 ± 53,282 | 2,776,512 ± 43,426 | +7.8% |
| oss-standalone-12-io-threads | 2,969,603 ± 57,797 | 2,726,743 ± 67,507 | +8.9% |
| oss-standalone-16-io-threads | 2,786,357 ± 38,964 | 2,618,620 ± 15,894 | +6.4% |

Interpretation. The improvement concentrates exactly where the theory predicts: topologies where multiple io-threads contend on clock_gettime (8/12/16 io-threads) show consistent +6.4% to +8.9% gains with non-overlapping confidence intervals across 3 runs. Low-thread-count topologies (standalone, 2/4 io-threads) are flat — no regressions, but no measurable win either, since the syscall-per-command cost isn't the bottleneck at low concurrency.

ARM: No change expected and none observed — ARM already uses the HW clock path (Generic Timer / CNTVCT_EL0) by default. This PR only affects the x86_64 Linux path.

The underlying mechanism is the existing call() optimization at server.c:3910-3935, which skips ustime() / gettimeofday() when a HW monotonic clock is available. By flipping x86 TSC on by default (when constant_tsc is reported), that fast path is taken on Intel Sapphire Rapids and equivalents without the user having to rebuild.


Note

Medium Risk
Changes the default monotonic clock source on x86_64 Linux and adds a startup calibration path; incorrect calibration or platform quirks could impact timing behavior, though it falls back to POSIX on detected issues.

Overview
On x86_64 Linux, the hardware TSC monotonic clock path is now attempted by default (no longer gated by USE_PROCESSOR_CLOCK) when /proc/cpuinfo reports constant_tsc.

The x86 init logic drops CPU model-name GHz parsing and instead calibrates ticks/us at startup by measuring RDTSC over a ~10ms CLOCK_MONOTONIC interval, with additional error handling (EINTR-safe nanosleep, non-monotonic TSC detection) that triggers fallback to the POSIX clock.

Reviewed by Cursor Bugbot for commit ddcc038.

Remove the USE_PROCESSOR_CLOCK compile-time gate for x86_64 Linux so the
hardware TSC is enabled automatically when the CPU advertises constant_tsc.

Replace the fragile "model name" GHz regex parsing with runtime calibration:
measure RDTSC ticks over a 10 ms wall-clock interval to determine the TSC
frequency.  This works on CPUs whose /proc/cpuinfo model-name line does not
include a "@ X.XGHz" suffix.

With the HW clock active, the call() hot path can use getMonotonicUs()
instead of gettimeofday(), eliminating 2-3 system calls per command on x86.
@augmentcode

augmentcode Bot commented Apr 8, 2026

🤖 Augment PR Summary

Summary: Enables the x86_64 Linux TSC-based monotonic clock by default, using runtime calibration.
Changes:

  • Removes the compile-time USE_PROCESSOR_CLOCK gate for the x86_64 Linux path.
  • Enables the HW TSC clock when constant_tsc is present in /proc/cpuinfo.
  • Replaces model-name GHz parsing with startup calibration (measure __rdtsc() over ~10ms via CLOCK_MONOTONIC).
  • Falls back to POSIX CLOCK_MONOTONIC when the flag is missing or calibration fails.
  • Updates documentation/comments to reflect the new default behavior.



@augmentcode augmentcode Bot left a comment


Review completed. 1 suggestion posted.


Comment thread src/monotonic.c Outdated

```c
/* Sleep ~10 ms to accumulate enough ticks for an accurate measurement. */
struct timespec req = {0, 10000000};
nanosleep(&req, NULL);
```


src/monotonic.c:90: The nanosleep() and surrounding clock_gettime() calls ignore return values; if nanosleep is interrupted (EINTR) or clock_gettime fails, the calibration can compute an incorrect mono_ticksPerMicrosecond and skew all monotonic timing. Consider checking/handling these return codes so you reliably fall back to the POSIX path on failure.

Severity: medium


Collaborator Author


Already addressed in 2488aba4b (on this branch — Handle calibration syscall failures in monotonicInit_x86linux):

  • clock_gettime() return values checked at both sample points; on failure we log and return, leaving getMonotonicUs NULL so the POSIX clock path is used.
  • nanosleep() return value checked; EINTR retries with the remaining time, any other errno logs and returns to POSIX fallback.
  • errno.h included for the EINTR check.

Thread can be resolved.

@fcostaoliveira fcostaoliveira requested a review from sundb April 8, 2026 19:34
Check return values of clock_gettime() and nanosleep() during TSC
calibration.  On failure (or EINTR for nanosleep), fall back to the
POSIX clock path instead of computing an incorrect tick rate.
@sundb sundb added this to Redis 8.8 Apr 9, 2026
@github-project-automation github-project-automation Bot moved this to Todo in Redis 8.8 Apr 9, 2026
Member

@ShooterIT ShooterIT left a comment


LGTM

Comment thread src/monotonic.c
Collaborator

@sundb sundb left a comment


lgtm

Addresses ShooterIT's review comment on src/monotonic.c.

Invariant TSC on modern x86 is guaranteed to be monotonic across a single
core's context, but TSC migration across sockets/cores with misaligned
TSC, virtualisation, or firmware quirks can still produce a non-monotonic
sample pair. Subtracting uint64_t in that case wraps to a huge value and
computes a nonsense tick rate.

Guard against tsc_end <= tsc_start in monotonicInit_x86linux and bail out
to the POSIX clock path when detected, matching the behaviour of the
other calibration-failure branches in the same function.
@fcostaoliveira
Collaborator Author

fcostaoliveira commented Apr 22, 2026

CE Performance Automation : step 1 of 2 (build) DONE.

This comment was automatically generated given a benchmark was triggered.
Started building at 2026-04-22 13:12:42.151837 and took 59 seconds.
You can check each build/benchmark progress in grafana:

  • git hash: ddcc038
  • git branch: hw-clock-x86-default
  • commit date and time: 2026-04-22 01:03:49+01:00
  • commit summary: Reject non-monotonic TSC sample during x86 calibration
  • test filters:
    • command priority lower limit: 0
    • command priority upper limit: 100000
    • test name regex: .*string.pipeline-(100|500).
    • command group regex: .*

You can check a comparison in detail via the grafana link

@fcostaoliveira
Collaborator Author

fcostaoliveira commented Apr 22, 2026

CE Performance Automation : step 2 of 2 (benchmark) RUNNING...

This comment was automatically generated given a benchmark was triggered.

Started benchmark suite at 2026-04-22 13:13:42.993307 and took 130.941803 seconds up until now.
Status: [###########---------------------------------------------------------------------] 14.29% completed.

In total will run 7 benchmarks:

  • 6 pending.
  • 1 completed:
    • 1 successful.
    • 0 failed.
You can check the status in detail via the grafana link

Comment thread src/monotonic.c
```c
struct timespec ts_start, ts_end;
uint64_t tsc_start, tsc_end;

if (clock_gettime(CLOCK_MONOTONIC, &ts_start) != 0) {
```
Member


This seems very fragile and inaccurate to me.
A context switch between the system call and the TSC reading would cause inaccuracies that in rare cases could be huge; besides, even a small inaccuracy can cause a large drift over a long period.
Luckily we don't use the monotonic clock for anything important, but still, I think this manual calibration is wrong.
How about using it only when the parsing of the official one fails?

Comment thread src/monotonic.c


```c
const char * monotonicInit(void) {
#if defined(USE_PROCESSOR_CLOCK) && defined(__x86_64__) && defined(__linux__)
```
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember this being discussed back when this code was originally added, I remember two claims that bother me:

  1. There is hardware with unreliable TSCs. I remember seeing a comment listing the affected models in some Linux source file.
  2. I remember observations indicating that, with the exception of some bad hypervisor platforms, the system call is actually a very fast VDSO call: Linux already knows when it's safe to use the HW clock without a real syscall, and when it does, it's nearly as fast as using the HW clock directly.

Do you have any new findings on these points, or evidence to contradict that research?

@yoav-steinberg feel free to comment from memory.

