This module contains JMH-based performance benchmarks for the Java Profiler. The benchmarks are organized into two main categories:
- Throughput Benchmarks (`scenarios.throughput`): Measure raw performance and scalability
- Counter Benchmarks (`scenarios.counters`): Measure feature-specific behavior with profiler metrics
All benchmarks require the WhiteboxProfiler to be enabled, which starts/stops the profiler between iterations and collects internal metrics.
Run all benchmarks:
./gradlew :ddprof-stresstest:jmh

Run specific benchmark class:
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
CallTraceStorageQuickBenchmark

Run specific benchmark method:
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
CallTraceStorageBaselineBenchmark.baseline01Thread

Common options:
- `-Pjmh.fork=N`: Number of JVM forks (default: 3)
- `-Pjmh.wi=N`: Warmup iterations (default: 3-5)
- `-Pjmh.i=N`: Measurement iterations (default: 3-5)
- `-Pjmh.wt=N`: Warmup time in seconds (default: 1-2)
- `-Pjmh.t=N`: Measurement time in seconds (default: 3-5)
- `-Pjmh.resultFormat=json|csv|text`: Output format
- `-Pjmh.resultFile=path`: Output file path
- `-Pjmh.prof='profiler'`: JMH profiler to use
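The fork, iteration, and time options multiply, so it helps to estimate a run's duration before starting it. The sketch below is a rough lower bound for one benchmark method with one parameter combination. It assumes JVM startup, profiler start/stop, and result processing take no time, so real runs will be longer. The class and method names are illustrative and not part of this module.

```java
// Rough lower-bound estimate for one benchmark method and one parameter combination.
// Ignores JVM startup, profiler start/stop, and result processing.
final class JmhRunEstimate {
    /** Seconds spent in warmup and measurement iterations across all forks. */
    static long estimateSeconds(int forks, int warmupIters, int warmupSecs,
                                int measureIters, int measureSecs) {
        long perFork = (long) warmupIters * warmupSecs
                     + (long) measureIters * measureSecs;
        return forks * perFork;
    }

    public static void main(String[] args) {
        // Lower end of the documented defaults: fork=3, wi=3, wt=1, i=3, t=3.
        System.out.println(estimateSeconds(3, 3, 1, 3, 3));
        // Reduced settings: fork=1, wi=1, i=2, with the same per-iteration times.
        System.out.println(estimateSeconds(1, 1, 1, 2, 3));
    }
}
```

Multiply the result by the number of benchmark methods and parameter combinations to approximate a full class run.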
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
-Pjmh.fork=1 -Pjmh.wi=1 -Pjmh.i=2 \
YourBenchmark

Located in `scenarios.throughput.*`
Measure end-to-end profiling engine performance including signal handlers, stack walking, CallTraceStorage operations, and JFR processing under various thread lifecycle patterns.
Quick smoke test (~2 minutes):
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
ProfilerThroughputQuickBenchmark

Baseline scaling (~12 minutes):
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
ProfilerThroughputBaselineBenchmark

Thread churn (~20-30 minutes):
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
ProfilerThroughputThreadChurnBenchmark

Slot exhaustion (~15-20 minutes):
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
ProfilerThroughputSlotExhaustionBenchmark

Documentation: See `doc/architecture/CallTraceStorage.md` for detailed CallTraceStorage architecture, benchmark results analysis, and optimization recommendations.
Compare performance of JNI-based native vs DirectByteBuffer-based Java implementations for thread context storage.
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
ThreadContextBenchmark

Tests various thread counts to measure both single-threaded overhead and multi-threaded contention.
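To make the DirectByteBuffer side of that comparison concrete, here is a minimal sketch of per-thread context storage backed by a direct buffer. It is not the profiler's implementation. The two-slot layout (span ID and root span ID), the offsets, and all names are assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical layout: two 8-byte slots per thread (span ID, root span ID).
// The profiler's real layout and API may differ.
final class BufferContextSketch {
    private static final int SPAN_ID_OFFSET = 0;
    private static final int ROOT_SPAN_ID_OFFSET = 8;

    // One direct buffer per thread, so a write never touches another thread's slot.
    private static final ThreadLocal<ByteBuffer> SLOT = ThreadLocal.withInitial(
            () -> ByteBuffer.allocateDirect(16).order(ByteOrder.nativeOrder()));

    static void setContext(long spanId, long rootSpanId) {
        ByteBuffer buf = SLOT.get();
        // Absolute puts write straight into off-heap memory; no JNI call is made here.
        buf.putLong(SPAN_ID_OFFSET, spanId);
        buf.putLong(ROOT_SPAN_ID_OFFSET, rootSpanId);
    }

    static long spanId() {
        return SLOT.get().getLong(SPAN_ID_OFFSET);
    }

    static long rootSpanId() {
        return SLOT.get().getLong(ROOT_SPAN_ID_OFFSET);
    }

    public static void main(String[] args) throws InterruptedException {
        setContext(42L, 7L);
        Thread other = new Thread(() -> setContext(1L, 1L));
        other.start();
        other.join();
        // The main thread's slot is unaffected by the other thread's write.
        System.out.println(spanId() + " " + rootSpanId());
    }
}
```

A JNI-based variant would instead cross into native code on every update, which is the per-call cost the benchmark is meant to compare against.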
Measure thread filtering performance and overhead.
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
scenarios.throughput.ThreadFilterBenchmark

Located in `scenarios.counters.*`
These benchmarks focus on measuring specific profiler features with metric collection enabled.
Measures profiler behavior under parallel work with distributed tracing context propagation.
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
TracedParallelWork

Parameters:
- `tagCardinality`: Number of unique tag values (10, 100, 1000)
- `command`: Profiler configuration with attributes
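As an illustration of what a tag cardinality bound means, the sketch below cycles work items through a fixed number of distinct tag values. The `tag-N` naming and the helper methods are hypothetical; the actual benchmark may generate its tags differently.

```java
import java.util.HashSet;
import java.util.Set;

final class TagCardinalitySketch {
    /** Tag for the i-th work item; yields at most `cardinality` distinct values. */
    static String tagFor(int i, int cardinality) {
        return "tag-" + Math.floorMod(i, cardinality);
    }

    /** Counts the distinct tags produced over a run of work items. */
    static int distinctTags(int workItems, int cardinality) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < workItems; i++) {
            seen.add(tagFor(i, cardinality));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Same values as the documented tagCardinality parameter.
        for (int cardinality : new int[] {10, 100, 1000}) {
            System.out.println(cardinality + " -> " + distinctTags(10_000, cardinality));
        }
    }
}
```

Higher cardinality means more distinct context values for the profiler to track for the same amount of work.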
Measures overhead of JFR recording dump operations.
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
DumpRecording

Measures profiler performance with complex object graph mutations.
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
GraphMutation

Measures timing overhead and profiler impact on high-frequency time measurements.
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
NanoTime

Measures profiler impact on lambda capture and invocation performance.
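For readers unfamiliar with the distinction, the sketch below contrasts a non-capturing lambda with a capturing one. It is only an illustration of the language feature, not the benchmark's code, and the names are made up.

```java
import java.util.function.IntSupplier;

final class LambdaCaptureSketch {
    // Non-capturing: uses no variables from the enclosing scope.
    static IntSupplier constant() {
        return () -> 1;
    }

    // Capturing: closes over `base`, so the lambda object must carry that value.
    // The JVM typically creates a new instance per evaluation unless it optimizes it away.
    static IntSupplier addOne(int base) {
        return () -> base + 1;
    }

    public static void main(String[] args) {
        int sum = 0;
        for (int i = 0; i < 1_000; i++) {
            sum += addOne(i).getAsInt(); // a fresh capture on every iteration
        }
        System.out.println(sum + constant().getAsInt());
    }
}
```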
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
CapturingLambdas

Thread filtering with counter metrics collection.
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
scenarios.counters.ThreadFilterBenchmark

JSON output:
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
-Pjmh.resultFormat=json \
-Pjmh.resultFile=build/benchmark-results.json \
YourBenchmark

CSV output:
./gradlew :ddprof-stresstest:jmh \
-Pjmh.resultFormat=csv \
-Pjmh.resultFile=build/benchmark-results.csv \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
YourBenchmark

Override benchmark parameters:
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
-Pjmh.p='command=cpu=50us,wall=50us' \
YourBenchmark

Use simple class names, not regex patterns:
- ❌ Wrong: `-Pjmh.includes='.*CallTrace.*'`
- ✅ Right: `CallTraceStorageQuickBenchmark`
Use reduced iterations:
-Pjmh.fork=1 -Pjmh.wi=1 -Pjmh.i=1

Verify profiler library loads:
./gradlew :ddprof-test:testDebug -Ptests=JavaProfilerTest.testGetInstance

Note: The `-Ptests` property works uniformly across all platforms with config-specific test tasks.
- Reduce concurrent thread counts
- Use smaller parameter values
- Increase JVM heap: `-Pjmh.jvmArgs='-Xmx4g'`
For CI environments with reduced iterations:
./gradlew :ddprof-stresstest:jmh \
-Pjmh.prof='com.datadoghq.profiler.stresstest.WhiteboxProfiler' \
-Pjmh.fork=1 -Pjmh.wi=2 -Pjmh.i=3 \
-Pjmh.resultFormat=json \
-Pjmh.resultFile=build/ci-results.json \
CallTraceStorageQuickBenchmark

ddprof-stresstest/
├── README.md                                  # This file
├── src/jmh/java/
│   └── com/datadoghq/profiler/stresstest/
│       ├── Configuration.java                 # Base benchmark configuration
│       ├── WhiteboxProfiler.java              # Custom JMH profiler
│       └── scenarios/
│           ├── throughput/                    # Raw performance benchmarks
│           │   ├── ProfilerThroughput*        # End-to-end profiling engine suite
│           │   ├── ThreadContext*             # ThreadContext benchmarks
│           │   └── ThreadFilter*              # ThreadFilter benchmarks
│           └── counters/                      # Feature-specific benchmarks
│               ├── TracedParallelWork         # Distributed tracing overhead
│               ├── DumpRecording              # JFR dump overhead
│               ├── GraphMutation              # Object graph mutations
│               ├── NanoTime                   # Timing overhead
│               ├── CapturingLambdas           # Lambda performance
│               └── ThreadFilter*              # Thread filtering with counters

- CallTraceStorage Architecture: `doc/architecture/CallTraceStorage.md` - Detailed triple-buffer architecture, benchmark results, and optimization guide
- Main README: `README.md` (project root) - General project overview
- Build Configuration: `CLAUDE.md` - Build system and development guidelines
When adding new benchmarks:
- Place in appropriate category (`throughput` or `counters`)
- Extend `Configuration.java` for common setup
- Use `WhiteboxProfiler` for profiler metric collection
- Document in this README with:
  - Purpose
  - Example command
  - Key parameters
- Add detailed analysis to `doc/` if architectural (like CallTraceStorage)
Benchmarks help establish and validate:
- Scalability: Linear scaling up to core count
- Overhead: <5% impact on application performance
- Throughput: Millions of samples per second
- Latency: <1μs per profiling operation
- Memory: Bounded memory usage under load
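One way to check the scalability target is to compare a multi-threaded score against the single-threaded score multiplied by the thread count. The sketch below does that arithmetic. The numbers in `main` are illustrative and not measured results, and the class name is hypothetical.

```java
final class ScalingEfficiency {
    /** 1.0 means perfectly linear scaling relative to the single-thread score. */
    static double efficiency(double singleThreadOps, int threads, double measuredOps) {
        return measuredOps / (singleThreadOps * threads);
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not measured results.
        double oneThread = 1_000_000.0;
        double eightThreads = 7_200_000.0;
        System.out.printf("%.2f%n", efficiency(oneThread, 8, eightThreads));
    }
}
```

Values well below 1.0 at thread counts at or under the core count suggest contention worth investigating.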
Run benchmarks before and after a change to catch performance regressions.
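A before/after comparison can be reduced to a percent change on the throughput score. The sketch below is one simple way to flag a regression. The 5% tolerance is an assumption chosen for illustration, the scores are made up, and none of the names come from this module.

```java
final class RegressionCheck {
    /** Percent change of a throughput score; negative means the candidate is slower. */
    static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    /** Flags a regression when throughput drops by more than tolerancePercent. */
    static boolean isRegression(double baseline, double candidate, double tolerancePercent) {
        return percentChange(baseline, candidate) < -tolerancePercent;
    }

    public static void main(String[] args) {
        // Illustrative scores in ops/s, not measured results.
        System.out.println(isRegression(1000.0, 930.0, 5.0)); // 7% drop
        System.out.println(isRegression(1000.0, 980.0, 5.0)); // 2% drop
    }
}
```

JMH reports an error margin with each score, so a drop smaller than that margin is better treated as noise than as a regression.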