# Load Testing and Benchmarking Framework Spec

Originally extracted from the HTTP/2 design spec (`degroff/http2` branch, `docs/plans/2026-02-13-http2-design.md`, Section 15)
and the implementation plan (`docs/plans/2026-02-13-http2-implementation.md`, Phases 6 and 8).

Implemented on the `degroff/load_tests` branch.

---

## Goal

A self-contained, reproducible benchmark suite that:
1. Tests java-http against multiple competing Java HTTP servers with identical workloads
2. Produces structured, machine-readable JSON results with system metadata
3. Auto-generates the README performance table from the latest results
4. Can be run locally via a single script

## Benchmark Tools

### wrk (primary)

C-based HTTP benchmark tool using kqueue (macOS) / epoll (Linux). Very fast — not the bottleneck. Provides latency percentiles (p50, p90, p99) via a Lua `done()` callback that outputs JSON. Scenarios are defined as Lua files in `load-tests/scenarios/`.

Install: `brew install wrk` (macOS), `apt install wrk` (Linux).
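
Since the `done()` callback emits JSON after wrk's human-readable summary, the orchestrator has to fish the structured metrics out of mixed stdout. A minimal sketch, assuming `json-report.lua` prints the metrics as a single JSON line at the end of the output (the exact output shape is an assumption, not a guarantee about the actual script):

```python
import json

def extract_wrk_json(stdout: str) -> dict:
    # Scan from the end of wrk's output for the one line that is a
    # JSON object; everything above it is the human-readable summary.
    for line in reversed(stdout.splitlines()):
        line = line.strip()
        if line.startswith("{"):
            return json.loads(line)
    raise ValueError("no JSON line found in wrk output")

sample = """Running 30s test @ http://localhost:8080/
  12 threads and 100 connections
{"requests": 1117484, "rps": 110638.58, "p50_us": 833, "p99_us": 2331}"""
metrics = extract_wrk_json(sample)
```

Scanning from the end keeps the parser robust if wrk's summary format changes, as long as the JSON stays last.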

### fusionauth-load-tests (secondary comparison)

Java-based load generator using virtual threads and the JDK HttpClient (`~/dev/fusionauth/fusionauth-load-tests`). Provides a "real Java client" perspective. Achieves lower absolute RPS (~50-60K vs ~110K for wrk) due to Java client overhead. Sends a small request body with each GET request.

Useful for cross-validating relative server rankings. The `--tool both` flag runs both tools against each server.

### Decisions

- **Dropped Apache Bench**: Single-threaded; bottlenecks at ~30-50K req/s.
- **Dropped h2load**: Originally planned for HTTP/2 benchmarks. Will reconsider when HTTP/2 lands.
- **Dropped GitHub Actions workflow**: GHA shared runners (2 vCPU, 7 GB RAM) produce poor, noisy performance numbers that are not useful for benchmarking. Benchmarks should be run on dedicated hardware.

## Vendor Servers

All servers implement the same 5 endpoints on port 8080:

| Server | Directory | Implementation | Status |
|---|---|---|---|
| java-http | `load-tests/self/` | LoadHandler (5 endpoints) | Done |
| JDK HttpServer | `load-tests/jdk-httpserver/` | `com.sun.net.httpserver.HttpServer` | Done |
| Jetty | `load-tests/jetty/` | Jetty 12.0.x embedded | Done |
| Netty | `load-tests/netty/` | Netty 4.1.x with HTTP codec | Done |
| Apache Tomcat | `load-tests/tomcat/` | Tomcat 8.5.x embedded | Done (kept at 8.5.x; upgrade to 10.x deferred to HTTP/2 work) |

Endpoints:
- `GET /` — No-op (reads body, returns empty 200)
- `GET /no-read` — No-op (does not read body, returns empty 200)
- `GET /hello` — Returns "Hello world"
- `GET /file?size=N` — Returns N bytes of generated content (default 1MB)
- `POST /load` — Base64-encodes request body and returns it
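
Because all five servers must agree on these response contracts for the comparison to be fair, it can help to express the expected responses as pure functions and reuse them in a smoke test against each implementation. A minimal sketch (these helper names are hypothetical, not part of the repo):

```python
import base64

def expected_hello() -> bytes:
    # GET /hello returns the literal string "Hello world".
    return b"Hello world"

def expected_file_size(size: int = 1_048_576) -> int:
    # GET /file?size=N returns exactly N bytes of generated content
    # (default 1MB); only the byte count is part of the contract.
    return size

def expected_load(body: bytes) -> bytes:
    # POST /load Base64-encodes the request body and returns it.
    return base64.b64encode(body)
```

A smoke test would POST a known body to each server's `/load` and compare the response to `expected_load(body)`, flagging any implementation that drifts from the shared contract.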

Each server follows the same pattern:
- `build.savant` — Savant build config with proper dependency resolution (including `maven()` fetch for transitive deps)
- `src/main/java/io/fusionauth/http/load/` — Server implementation
- `src/main/script/start.sh` — Startup script

## Benchmark Scenarios

| Scenario | Method | Endpoint | wrk Threads | Connections | Purpose |
|---|---|---|---|---|---|
| `baseline` | GET | `/` | 12 | 100 | No-op throughput ceiling |
| `hello` | GET | `/hello` | 12 | 100 | Small response body |
| `post-load` | POST | `/load` | 12 | 100 | POST with body, Base64 response |
| `large-file` | GET | `/file?size=1048576` | 4 | 10 | 1MB response throughput |
| `high-concurrency` | GET | `/` | 12 | 1000 | Connection pressure |
| `mixed` | Mixed | Rotates all endpoints | 12 | 100 | Real-world mix (wrk only) |

Note: The `mixed` scenario is skipped for fusionauth-load-tests since it only supports a single URL per configuration.
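
The rotation in the `mixed` scenario lives in `mixed.lua`, but the logic is simple enough to sketch outside Lua. A Python illustration (the rotation order is an assumption mirroring the endpoint list above; the real script's details may differ):

```python
from itertools import cycle

# Endpoint list taken from the Vendor Servers section; the actual
# order in mixed.lua is an assumption.
ENDPOINTS = ["/", "/no-read", "/hello", "/file?size=1048576", "/load"]

rotation = cycle(ENDPOINTS)

def next_endpoint() -> str:
    # Round-robin over all endpoints, the way wrk's request()
    # callback could pick a path for each outgoing request.
    return next(rotation)

first_six = [next_endpoint() for _ in range(6)]
# The sixth request wraps back to the first endpoint.
```

Round-robin keeps the mix deterministic across runs, which matters for reproducibility more than a randomized mix would.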

## Scripts

### run-benchmarks.sh

Main orchestrator. Builds each server via Savant, starts it, runs benchmarks, stops it, and aggregates JSON results.

```
./run-benchmarks.sh [OPTIONS]

Options:
  --servers <list>     Comma-separated server list (default: all)
  --scenarios <list>   Comma-separated scenario list (default: all)
  --tool <name>        Benchmark tool: wrk, fusionauth, or both (default: wrk)
  --label <name>       Label for the results file
  --output <dir>       Output directory (default: load-tests/results/)
  --duration <time>    Duration per scenario (default: 30s)
```

### update-readme.sh

Reads the latest JSON from `load-tests/results/`, generates a markdown performance table, and replaces the `## Performance` section in the project root `README.md`.
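
The table generation step can be sketched as follows, assuming result entries shaped like the schema under Structured Output Format below (the function name is hypothetical; the real script is shell):

```python
def performance_table(results: list[dict]) -> str:
    # Build a markdown table of RPS per server for one scenario,
    # sorted fastest-first, from entries shaped like the result JSON.
    rows = sorted(results, key=lambda r: r["metrics"]["rps"], reverse=True)
    lines = ["| Server | RPS | p50 (us) | p99 (us) |", "|---|---|---|---|"]
    for r in rows:
        m = r["metrics"]
        lines.append(f"| {r['server']} | {m['rps']:.0f} | {m['p50_us']} | {m['p99_us']} |")
    return "\n".join(lines)

sample = [
    {"server": "netty", "metrics": {"rps": 98210.4, "p50_us": 930, "p99_us": 2710}},
    {"server": "self", "metrics": {"rps": 110638.58, "p50_us": 833, "p99_us": 2331}},
]
table = performance_table(sample)
```

The `netty` numbers in `sample` are made up for illustration; only the `self` row reuses figures from the example result below. Sorting by RPS means the README table always leads with the fastest server for that scenario.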

### compare-results.sh

Compares two result JSON files side-by-side with normalized ratios. Useful for detecting regressions or improvements between runs.
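
Normalization here means expressing each (server, scenario) RPS in run B as a ratio of the same pair in run A, so cross-scenario magnitude differences don't obscure regressions. A sketch of that calculation (the helper name is hypothetical):

```python
def rps_ratios(run_a: list[dict], run_b: list[dict]) -> dict:
    # Key each result by (server, scenario) and report B/A RPS ratios;
    # a ratio below 1.0 flags a regression in run B.
    index_a = {(r["server"], r["scenario"]): r["metrics"]["rps"] for r in run_a}
    ratios = {}
    for r in run_b:
        key = (r["server"], r["scenario"])
        if key in index_a:
            ratios[key] = round(r["metrics"]["rps"] / index_a[key], 3)
    return ratios

run_a = [{"server": "self", "scenario": "baseline", "metrics": {"rps": 100000.0}}]
run_b = [{"server": "self", "scenario": "baseline", "metrics": {"rps": 95000.0}}]
ratios = rps_ratios(run_a, run_b)
```

Pairs present in only one run are silently skipped here; the real script may instead want to report them as missing.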

## Structured Output Format

Results are saved as JSON files named with an ISO-8601-style timestamp (colons replaced with dashes for filesystem safety): `results/YYYY-MM-DDTHH-MM-SSZ.json`

Results are `.gitignore`d — they are machine-specific and not committed to the repo.

```json
{
  "version": 1,
  "timestamp": "2026-02-18T16:35:25Z",
  "system": {
    "os": "Darwin",
    "arch": "arm64",
    "cpuModel": "Apple M4",
    "cpuCores": 10,
    "ramGB": 24,
    "javaVersion": "openjdk version \"21.0.10\" 2026-01-20",
    "description": "Local benchmark"
  },
  "tools": {
    "selected": "wrk",
    "wrkVersion": "wrk 4.2.0 [kqueue] ..."
  },
  "results": [
    {
      "server": "self",
      "tool": "wrk",
      "protocol": "http/1.1",
      "scenario": "baseline",
      "config": {
        "threads": 12,
        "connections": 100,
        "duration": "30s",
        "endpoint": "/"
      },
      "metrics": {
        "requests": 1117484,
        "duration_us": 10100310,
        "rps": 110638.58,
        "avg_latency_us": 885.05,
        "p50_us": 833,
        "p90_us": 979,
        "p99_us": 2331,
        "max_us": 89174,
        "errors_connect": 0,
        "errors_read": 0,
        "errors_write": 0,
        "errors_timeout": 0
      }
    }
  ]
}
```
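
The metrics are internally redundant (`rps` should equal `requests` divided by the run duration), which makes a cheap sanity check possible when aggregating results. A sketch, assuming the field names shown above:

```python
def check_rps(metrics: dict, tolerance: float = 0.01) -> bool:
    # duration_us is microseconds; derive RPS from raw counts and
    # confirm it agrees with the reported rps within a small tolerance.
    seconds = metrics["duration_us"] / 1_000_000
    derived = metrics["requests"] / seconds
    return abs(derived - metrics["rps"]) / metrics["rps"] < tolerance

# Figures from the example result entry above:
# 1,117,484 requests over ~10.1 s is ~110,639 req/s.
sample = {"requests": 1_117_484, "duration_us": 10_100_310, "rps": 110_638.58}
```

A failed check usually means the tool's JSON was truncated or mis-parsed rather than a genuine measurement anomaly.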

## Directory Structure

```
load-tests/
  .gitignore             # Ignores *.iml files
  README.md              # Usage documentation
  run-benchmarks.sh      # Main orchestrator script
  update-readme.sh       # Parses results, updates project README
  compare-results.sh     # Compares two result files
  results/               # JSON results (gitignored)
  scenarios/             # wrk Lua scenario files
    baseline.lua
    hello.lua
    post-load.lua
    large-file.lua
    high-concurrency.lua
    mixed.lua
    json-report.lua      # Shared done() function for JSON output
  self/                  # java-http
  jdk-httpserver/        # JDK built-in HttpServer
  jetty/                 # Eclipse Jetty 12.0.x
  netty/                 # Netty 4.1.x
  tomcat/                # Apache Tomcat 8.5.x
```

## Performance Optimization Investigation

Once several benchmark runs have been collected, investigate optimizations to get java-http consistently #1 in RPS across all scenarios.

Areas to investigate:
- Profile under load with `async-profiler` or JDK Flight Recorder (lock contention, allocation pressure, syscall overhead)
- Compare request processing paths against Netty and Jetty
- Thread scheduling and virtual thread usage — blocking where we could be non-blocking?
- Socket/channel configuration (TCP_NODELAY, SO_REUSEPORT, buffer sizes)
- Read/write loop for unnecessary copies or allocations per request
- Selector strategy and worker thread pool sizing for high-connection-count scenarios

## Future Work

- **HTTP/2 benchmarks**: Add h2load scenarios when HTTP/2 lands on the `degroff/http2` branch. Upgrade Tomcat to 10.x for HTTP/2 support.
- **Performance optimization**: Profile and optimize java-http based on benchmark data.