# Load Testing and Benchmarking Framework Spec

Originally extracted from the HTTP/2 design spec (`degroff/http2` branch, `docs/plans/2026-02-13-http2-design.md`, Section 15)
and the implementation plan (`docs/plans/2026-02-13-http2-implementation.md`, Phases 6 and 8).

Implemented on the `degroff/load_tests` branch.

---

## Goal

A self-contained, reproducible benchmark suite that:
1. Tests java-http against multiple competing Java HTTP servers with identical workloads
2. Produces structured, machine-readable JSON results with system metadata
3. Auto-generates the README performance table from the latest results
4. Can be run locally via a single script

## Benchmark Tools

### wrk (primary)

C-based HTTP benchmark tool using kqueue (macOS) / epoll (Linux). Very fast — not the bottleneck. Provides latency percentiles (p50, p90, p99) via a Lua `done()` callback that outputs JSON. Scenarios are defined as Lua files in `load-tests/scenarios/`.

Install: `brew install wrk` (macOS), `apt install wrk` (Linux).
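
Since the `done()` callback emits JSON after wrk's human-readable summary, the orchestrator has to fish the structured metrics out of mixed stdout. A minimal sketch, assuming `json-report.lua` prints the metrics as a single JSON line at the end of the output (the exact output shape is an assumption, not a guarantee about the actual script):

```python
import json

def extract_wrk_json(stdout: str) -> dict:
    # Scan from the end of wrk's output for the one line that is a
    # JSON object; everything above it is the human-readable summary.
    for line in reversed(stdout.splitlines()):
        line = line.strip()
        if line.startswith("{"):
            return json.loads(line)
    raise ValueError("no JSON line found in wrk output")

sample = """Running 30s test @ http://localhost:8080/
  12 threads and 100 connections
{"requests": 1117484, "rps": 110638.58, "p50_us": 833, "p99_us": 2331}"""
metrics = extract_wrk_json(sample)
```

Scanning from the end keeps the parser robust if wrk's summary format changes, as long as the JSON stays last.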

### fusionauth-load-tests (secondary comparison)

Java-based load generator using virtual threads and the JDK HttpClient (`~/dev/fusionauth/fusionauth-load-tests`). Provides a "real Java client" perspective. Achieves lower absolute RPS (~50-60K vs ~110K for wrk) due to Java client overhead. Sends a small request body with each GET request.

Useful for cross-validating relative server rankings. The `--tool both` flag runs both tools against each server.

### Decisions

- **Dropped Apache Bench**: Single-threaded; bottlenecks at ~30-50K req/s.
- **Dropped h2load**: Originally planned for HTTP/2 benchmarks. Will reconsider when HTTP/2 lands.
- **Dropped GitHub Actions workflow**: GHA shared runners (2 vCPU, 7 GB RAM) produce poor, noisy performance numbers that are not useful for benchmarking. Benchmarks should be run on dedicated hardware.

## Vendor Servers

All servers implement the same 5 endpoints on port 8080:

| Server | Directory | Implementation | Status |
|---|---|---|---|
| java-http | `load-tests/self/` | LoadHandler (5 endpoints) | Done |
| JDK HttpServer | `load-tests/jdk-httpserver/` | `com.sun.net.httpserver.HttpServer` | Done |
| Jetty | `load-tests/jetty/` | Jetty 12.0.x embedded | Done |
| Netty | `load-tests/netty/` | Netty 4.1.x with HTTP codec | Done |
| Apache Tomcat | `load-tests/tomcat/` | Tomcat 8.5.x embedded | Done (kept at 8.5.x; upgrade to 10.x deferred to HTTP/2 work) |

Endpoints:
- `GET /` — No-op (reads body, returns empty 200)
- `GET /no-read` — No-op (does not read body, returns empty 200)
- `GET /hello` — Returns "Hello world"
- `GET /file?size=N` — Returns N bytes of generated content (default 1MB)
- `POST /load` — Base64-encodes request body and returns it
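
Because all five servers must agree on these response contracts for the comparison to be fair, it can help to express the expected responses as pure functions and reuse them in a smoke test against each implementation. A minimal sketch (these helper names are hypothetical, not part of the repo):

```python
import base64

def expected_hello() -> bytes:
    # GET /hello returns the literal string "Hello world".
    return b"Hello world"

def expected_file_size(size: int = 1_048_576) -> int:
    # GET /file?size=N returns exactly N bytes of generated content
    # (default 1MB); only the byte count is part of the contract.
    return size

def expected_load(body: bytes) -> bytes:
    # POST /load Base64-encodes the request body and returns it.
    return base64.b64encode(body)
```

A smoke test would POST a known body to each server's `/load` and compare the response to `expected_load(body)`, flagging any implementation that drifts from the shared contract.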

Each server follows the same pattern:
- `build.savant` — Savant build config with proper dependency resolution (including `maven()` fetch for transitive deps)
- `src/main/java/io/fusionauth/http/load/` — Server implementation
- `src/main/script/start.sh` — Startup script

## Benchmark Scenarios

| Scenario | Method | Endpoint | wrk Threads | Connections | Purpose |
|---|---|---|---|---|---|
| `baseline` | GET | `/` | 12 | 100 | No-op throughput ceiling |
| `hello` | GET | `/hello` | 12 | 100 | Small response body |
| `post-load` | POST | `/load` | 12 | 100 | POST with body, Base64 response |
| `large-file` | GET | `/file?size=1048576` | 4 | 10 | 1MB response throughput |
| `high-concurrency` | GET | `/` | 12 | 1000 | Connection pressure |
| `mixed` | Mixed | Rotates all endpoints | 12 | 100 | Real-world mix (wrk only) |

Note: The `mixed` scenario is skipped for fusionauth-load-tests since it only supports a single URL per configuration.
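
The rotation in the `mixed` scenario lives in `mixed.lua`, but the logic is simple enough to sketch outside Lua. A Python illustration (the rotation order is an assumption mirroring the endpoint list above; the real script's details may differ):

```python
from itertools import cycle

# Endpoint list taken from the Vendor Servers section; the actual
# order in mixed.lua is an assumption.
ENDPOINTS = ["/", "/no-read", "/hello", "/file?size=1048576", "/load"]

rotation = cycle(ENDPOINTS)

def next_endpoint() -> str:
    # Round-robin over all endpoints, the way wrk's request()
    # callback could pick a path for each outgoing request.
    return next(rotation)

first_six = [next_endpoint() for _ in range(6)]
# The sixth request wraps back to the first endpoint.
```

Round-robin keeps the mix deterministic across runs, which matters for reproducibility more than a randomized mix would.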

## Scripts

### run-benchmarks.sh

Main orchestrator. Builds each server via Savant, starts it, runs benchmarks, stops it, and aggregates JSON results.

```
./run-benchmarks.sh [OPTIONS]

Options:
  --servers <list>     Comma-separated server list (default: all)
  --scenarios <list>   Comma-separated scenario list (default: all)
  --tool <name>        Benchmark tool: wrk, fusionauth, or both (default: wrk)
  --label <name>       Label for the results file
  --output <dir>       Output directory (default: load-tests/results/)
  --duration <time>    Duration per scenario (default: 30s)
```

### update-readme.sh

Reads the latest JSON from `load-tests/results/`, generates a markdown performance table, and replaces the `## Performance` section in the project root `README.md`.
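
The table generation step can be sketched as follows, assuming result entries shaped like the schema under Structured Output Format below (the function name is hypothetical; the real script is shell):

```python
def performance_table(results: list[dict]) -> str:
    # Build a markdown table of RPS per server for one scenario,
    # sorted fastest-first, from entries shaped like the result JSON.
    rows = sorted(results, key=lambda r: r["metrics"]["rps"], reverse=True)
    lines = ["| Server | RPS | p50 (us) | p99 (us) |", "|---|---|---|---|"]
    for r in rows:
        m = r["metrics"]
        lines.append(f"| {r['server']} | {m['rps']:.0f} | {m['p50_us']} | {m['p99_us']} |")
    return "\n".join(lines)

sample = [
    {"server": "netty", "metrics": {"rps": 98210.4, "p50_us": 930, "p99_us": 2710}},
    {"server": "self", "metrics": {"rps": 110638.58, "p50_us": 833, "p99_us": 2331}},
]
table = performance_table(sample)
```

The `netty` numbers in `sample` are made up for illustration; only the `self` row reuses figures from the example result below. Sorting by RPS means the README table always leads with the fastest server for that scenario.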

### compare-results.sh

Compares two result JSON files side-by-side with normalized ratios. Useful for detecting regressions or improvements between runs.
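
Normalization here means expressing each (server, scenario) RPS in run B as a ratio of the same pair in run A, so cross-scenario magnitude differences don't obscure regressions. A sketch of that calculation (the helper name is hypothetical):

```python
def rps_ratios(run_a: list[dict], run_b: list[dict]) -> dict:
    # Key each result by (server, scenario) and report B/A RPS ratios;
    # a ratio below 1.0 flags a regression in run B.
    index_a = {(r["server"], r["scenario"]): r["metrics"]["rps"] for r in run_a}
    ratios = {}
    for r in run_b:
        key = (r["server"], r["scenario"])
        if key in index_a:
            ratios[key] = round(r["metrics"]["rps"] / index_a[key], 3)
    return ratios

run_a = [{"server": "self", "scenario": "baseline", "metrics": {"rps": 100000.0}}]
run_b = [{"server": "self", "scenario": "baseline", "metrics": {"rps": 95000.0}}]
ratios = rps_ratios(run_a, run_b)
```

Pairs present in only one run are silently skipped here; the real script may instead want to report them as missing.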

## Structured Output Format

Results are saved as JSON files named with an ISO-8601-style timestamp (colons replaced with dashes for filesystem safety): `results/YYYY-MM-DDTHH-MM-SSZ.json`

Results are `.gitignore`d — they are machine-specific and not committed to the repo.

```json
{
  "version": 1,
  "timestamp": "2026-02-18T16:35:25Z",
  "system": {
    "os": "Darwin",
    "arch": "arm64",
    "cpuModel": "Apple M4",
    "cpuCores": 10,
    "ramGB": 24,
    "javaVersion": "openjdk version \"21.0.10\" 2026-01-20",
    "description": "Local benchmark"
  },
  "tools": {
    "selected": "wrk",
    "wrkVersion": "wrk 4.2.0 [kqueue] ..."
  },
  "results": [
    {
      "server": "self",
      "tool": "wrk",
      "protocol": "http/1.1",
      "scenario": "baseline",
      "config": {
        "threads": 12,
        "connections": 100,
        "duration": "30s",
        "endpoint": "/"
      },
      "metrics": {
        "requests": 1117484,
        "duration_us": 10100310,
        "rps": 110638.58,
        "avg_latency_us": 885.05,
        "p50_us": 833,
        "p90_us": 979,
        "p99_us": 2331,
        "max_us": 89174,
        "errors_connect": 0,
        "errors_read": 0,
        "errors_write": 0,
        "errors_timeout": 0
      }
    }
  ]
}
```
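
The metrics are internally redundant (`rps` should equal `requests` divided by the run duration), which makes a cheap sanity check possible when aggregating results. A sketch, assuming the field names shown above:

```python
def check_rps(metrics: dict, tolerance: float = 0.01) -> bool:
    # duration_us is microseconds; derive RPS from raw counts and
    # confirm it agrees with the reported rps within a small tolerance.
    seconds = metrics["duration_us"] / 1_000_000
    derived = metrics["requests"] / seconds
    return abs(derived - metrics["rps"]) / metrics["rps"] < tolerance

# Figures from the example result entry above:
# 1,117,484 requests over ~10.1 s is ~110,639 req/s.
sample = {"requests": 1_117_484, "duration_us": 10_100_310, "rps": 110_638.58}
```

A failed check usually means the tool's JSON was truncated or mis-parsed rather than a genuine measurement anomaly.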

## Directory Structure

```
load-tests/
  .gitignore             # Ignores *.iml files
  README.md              # Usage documentation
  run-benchmarks.sh      # Main orchestrator script
  update-readme.sh       # Parses results, updates project README
  compare-results.sh     # Compares two result files
  results/               # JSON results (gitignored)
  scenarios/             # wrk Lua scenario files
    baseline.lua
    hello.lua
    post-load.lua
    large-file.lua
    high-concurrency.lua
    mixed.lua
    json-report.lua      # Shared done() function for JSON output
  self/                  # java-http
  jdk-httpserver/        # JDK built-in HttpServer
  jetty/                 # Eclipse Jetty 12.0.x
  netty/                 # Netty 4.1.x
  tomcat/                # Apache Tomcat 8.5.x
```

## Performance Optimization Investigation

Once several benchmark runs have been collected, investigate optimizations to get java-http consistently #1 in RPS across all scenarios.

Areas to investigate:
- Profile under load with `async-profiler` or JDK Flight Recorder (lock contention, allocation pressure, syscall overhead)
- Compare request processing paths against Netty and Jetty
- Thread scheduling and virtual thread usage — blocking where we could be non-blocking?
- Socket/channel configuration (TCP_NODELAY, SO_REUSEPORT, buffer sizes)
- Read/write loop for unnecessary copies or allocations per request
- Selector strategy and worker thread pool sizing for high-connection-count scenarios

## Future Work

- **HTTP/2 benchmarks**: Add h2load scenarios when HTTP/2 lands on the `degroff/http2` branch. Upgrade Tomcat to 10.x for HTTP/2 support.
- **Performance optimization**: Profile and optimize java-http based on benchmark data.