From 132818ecd97fd760f286ddc502612b8c3da39120 Mon Sep 17 00:00:00 2001
From: Thomas Cederholm <thomas.cederholm@extendaretail.com>
Date: Tue, 19 May 2026 16:24:19 +0200
Subject: [PATCH] docs: Add performance section with Java vs Node comparison

---
 README.md                  | 43 +++++++++++++----
 docs/perf-1.0.3.svg        | 94 ++++++++++++++++++++++++++++++++++++++
 docs/perf-java-vs-node.svg | 83 +++++++++++++++++++++++++++++++++
 3 files changed, 212 insertions(+), 8 deletions(-)
 create mode 100644 docs/perf-1.0.3.svg
 create mode 100644 docs/perf-java-vs-node.svg

diff --git a/README.md b/README.md
index 1e092db..a5520f8 100644
--- a/README.md
+++ b/README.md
@@ -653,17 +653,44 @@ Schemas are located under test resources folder.
 - Example requests can be found under `acceptance/k6` that can be a base for exploring the functionality.
 - The logger in the configuration needs to be enabled to get some insight into the code.
 
-## Performance and caveats
-
-The library wraps the JDK's bundled `com.sun.net.httpserver.HttpServer` and uses a virtual-thread-per-request executor. On a developer laptop (Apple Silicon, single instance, default JVM flags) it sustains roughly:
-
-- **~32k requests/second** for small JSON GETs and POSTs (~300 byte bodies), measured via `k6` at 30 sustained VUs over 45 seconds (1.4M requests, **100% of checks passing**, 0% HTTP failures).
-
-A few things to know:
+## Caveats
 
 - **Single-process model.** No horizontal scaling primitives are bundled; run multiple instances behind a load balancer for production scale.
-- **JDK HttpServer is the throughput ceiling.** It's documented as a low-throughput / dev-test server. If you need to go materially above the rates above, the handler-facing API (`Request`, `Response`, `RequestHandler`, `RequestInterceptor`, `ResponseDecorator`, `TypeMapper`) is transport-neutral by design — `Request` is built from primitives (body bytes, raw query string, path parameters, a header lookup function), not a JDK `HttpExchange`. A future enhancement could plug in a higher-throughput backend (Jetty, Helidon Níma, Netty) by writing a new adapter behind `com.retailsvc.http.internal` while leaving handlers untouched.
+- **JDK `HttpServer` is the throughput ceiling.** It's documented as a low-throughput / dev-test server. If you need to go materially above the rates shown under [Performance](#performance), the handler-facing API (`Request`, `Response`, `RequestHandler`, `RequestInterceptor`, `ResponseDecorator`, `TypeMapper`) is transport-neutral by design — `Request` is built from primitives (body bytes, raw query string, path parameters, a header lookup function), not a JDK `HttpExchange`. A future enhancement could plug in a higher-throughput backend (Jetty, Helidon Níma, Netty) by writing a new adapter behind `com.retailsvc.http.internal` while leaving handlers untouched.
 - **Per-request state uses `ScopedValue`** (Java 25, JEP 506). This matters if a handler offloads work to an executor that's not a `StructuredTaskScope`-managed child thread: the `ScopedValue` is not visible there, so the handler must capture the values it needs (e.g. `byte[] body = request.bytes();`) before submitting.
 - **Empty responses use `Response.empty()` (204) or `Response.status(code)` for other no-body statuses.** The renderer sends `responseLength = -1` (`Content-Length: 0`, no body) for any `Response` with `body() == null`, regardless of status code. Passing `0` to the JDK directly produces a chunked response with zero chunks, which is technically non-conformant — `Response` factories handle this for you.
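+
+A minimal sketch of the capture-before-submit pattern from the `ScopedValue` caveat, in plain Java 25 and independent of this library's API. `REQUEST_BODY` is a hypothetical stand-in for the framework's internal per-request binding. The virtual-thread-per-task executor plays the part of an executor that is not managed by a `StructuredTaskScope`:
+
+```java
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+
+class ScopedValueCapture {
+
+  // Hypothetical stand-in for the framework's internal per-request binding.
+  static final ScopedValue<byte[]> REQUEST_BODY = ScopedValue.newInstance();
+
+  /** Reads the binding from a task on a plain executor, without capturing it first. */
+  static boolean visibleOnPlainExecutor() throws Exception {
+    try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
+      return ScopedValue.where(REQUEST_BODY, "hello".getBytes())
+          // The task runs on a new thread that does not inherit the binding.
+          .call(() -> pool.submit(() -> REQUEST_BODY.isBound()).get());
+    }
+  }
+
+  /** Captures the value on the binding thread, then hands the plain byte[] to the task. */
+  static int capturedLength() throws Exception {
+    try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
+      return ScopedValue.where(REQUEST_BODY, "hello".getBytes())
+          .call(() -> {
+            byte[] body = REQUEST_BODY.get(); // capture while the binding is visible
+            Future<Integer> length = pool.submit(() -> body.length);
+            return length.get();
+          });
+    }
+  }
+
+  public static void main(String[] args) throws Exception {
+    System.out.println("binding visible on executor: " + visibleOnPlainExecutor());
+    System.out.println("captured body length: " + capturedLength());
+  }
+}
+```
+
+Running `main` should report `false` for the binding on the executor thread and `5` for the captured copy.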
 
+## Performance
+
+The chart below shows sustained throughput and 95th-percentile (p95) latency of `openapi-httpserver-java` under a mixed-CRUD load. The load comes from 50 concurrent k6 virtual users, measured over a 75 s window after a 20 s warmup. The bench handlers do the minimum: parse the request via the registered `TypeMapper`, hit an in-memory store, and return a `Response`. There are no synthetic sleeps, no downstream calls, and no database. What you see is therefore the framework path itself: routing, OpenAPI validation, JSON (de)serialisation and response rendering.
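+
+The sketch below makes "hit an in-memory store" concrete with a minimal thread-safe store of that kind. `InMemoryStore` and its methods are hypothetical names, not the bench's actual code. The framework-facing parts (`TypeMapper`, `Response`) are deliberately left out. Each operation is a single concurrent-map call, which should cost little next to routing, validation and (de)serialisation:
+
+```java
+import java.util.Map;
+import java.util.Optional;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicLong;
+
+/** Hypothetical thread-safe store of the kind a mixed-CRUD bench handler might touch. */
+final class InMemoryStore<T> {
+  private final Map<Long, T> items = new ConcurrentHashMap<>();
+  private final AtomicLong nextId = new AtomicLong(1);
+
+  /** Create: assigns a fresh id and stores the item. */
+  long create(T item) {
+    long id = nextId.getAndIncrement();
+    items.put(id, item);
+    return id;
+  }
+
+  /** Read: empty if the id is unknown. */
+  Optional<T> read(long id) {
+    return Optional.ofNullable(items.get(id));
+  }
+
+  /** Update: replaces an existing entry only; returns false if the id is unknown. */
+  boolean update(long id, T item) {
+    return items.replace(id, item) != null;
+  }
+
+  /** Delete: returns true if an entry was removed. */
+  boolean delete(long id) {
+    return items.remove(id) != null;
+  }
+
+  int size() {
+    return items.size();
+  }
+}
+```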
+
+Two profiles were measured, each in a CPU- and memory-capped Docker container running Temurin 25 on an Apple M1 Max:
+
+- **2 CPU / 1 GB** (the default profile): 10,242 req/s with a p95 of 6.9 ms.
+- **1 CPU / 512 MB** (the constrained profile): 5,438 req/s with a p95 of 23.5 ms. Throughput roughly halves along with the CPU allocation because the framework is CPU-bound, not lock- or IO-bound. The tighter memory limit also pushes G1 into more old-generation collections, which widens the p95. The median request still completes in about 4 ms.
+
+![Performance: openapi-httpserver-java 1.0.3 throughput and p95 latency across two CPU/memory profiles](docs/perf-1.0.3.svg)
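+
+You can sanity-check the CPU-bound reading by dividing each profile's throughput by its CPU allocation. If the framework scales with CPU, requests per second per CPU should stay roughly constant. The sketch below does that arithmetic with the figures plotted in the chart above. The class and record names are only for illustration:
+
+```java
+/** Requests per second per allocated CPU for the two charted profiles. */
+final class PerCpuThroughput {
+
+  /** One benchmark profile; the figures are copied from the chart. */
+  record Profile(String name, int cpus, double reqPerSec) {
+    double perCpu() {
+      return reqPerSec / cpus;
+    }
+  }
+
+  static final Profile DEFAULT = new Profile("2 CPU / 1 GB", 2, 10_242);
+  static final Profile CONSTRAINED = new Profile("1 CPU / 512 MB", 1, 5_438);
+
+  public static void main(String[] args) {
+    for (Profile p : new Profile[] {DEFAULT, CONSTRAINED}) {
+      System.out.printf("%-15s %6.0f req/s per CPU%n", p.name(), p.perCpu());
+    }
+    // Constrained vs default throughput: about 0.53, i.e. roughly half.
+    System.out.printf("throughput ratio: %.2f%n", CONSTRAINED.reqPerSec() / DEFAULT.reqPerSec());
+  }
+}
+```
+
+The two profiles land at about 5,100 and 5,400 req/s per CPU, which is consistent with CPU-bound scaling.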
+
+### How does that compare?
+
+This is not a competition — different runtimes, different ecosystems, different sweet spots. It's a sanity check: where does `openapi-httpserver-java` land against a familiar reference point on the same hardware, under the same load?
+
+The reference point is a deliberately minimal Node.js service: Express 4 on Node 22 with `express-openapi-validator` against the same OpenAPI spec. Its handlers are stripped to the same "parse, touch in-memory store, respond" shape, with no synthetic sleeps. Both services run under the same 1 CPU / 512 MB Docker limits, and k6 drives the same mixed-CRUD workload at 50 VUs for 5 minutes of sustained measurement. This is a separate, longer run than the 75 s profile chart, so its Java figures differ from that chart's 1 CPU / 512 MB bar.
+
+| Metric (1 CPU / 512 MB) | openapi-httpserver-java | Node + Express |
+|---|---|---|
+| Aggregate throughput | **10,680 req/s** | 4,595 req/s |
+| p50 latency | 3.5 ms | 8.7 ms |
+| p95 latency | 12.8 ms | 24.0 ms |
+| p99 latency | 24.7 ms | 35.4 ms |
+
+![Java vs Node performance comparison: throughput and p95 latency at 1 CPU / 512 MB](docs/perf-java-vs-node.svg)
+
+A few things worth keeping in mind when reading this:
+
+- **Both stacks held up for the full 5 minutes** with stable tails — nothing pathological on either side.
+- **The Java advantage is mostly the JIT and the threading model.** Once warm, the framework runs each request through JIT-compiled code on its own virtual thread, multiplexed onto a small pool of OS carrier threads. Node runs all JavaScript, including the per-request validation in `express-openapi-validator`, on a single event loop.
+- **It is not a 10× story.** At 1 CPU both runtimes are CPU-bound on essentially the same task. Expect roughly 2.3× the throughput, about 1.9× lower p95 latency and about 1.4× lower p99 latency, not a runaway.
+- **The Node service is intentionally minimal.** A tuned Fastify + AJV setup would likely close some of the gap. A Go or Rust service would likely open it again in the other direction, ahead of Java. The point of the comparison is to give you a feel for the ballpark, not to crown a winner.
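+
+The ratios behind the "not a 10×" point can be recomputed straight from the table. The sketch below does so. The class and method names are only for illustration, and the figures are copied from the table:
+
+```java
+/** Java-over-Node ratios for the 1 CPU / 512 MB comparison table. */
+final class ComparisonRatios {
+
+  // Aggregate throughput in req/s.
+  static final double JAVA_RPS = 10_680;
+  static final double NODE_RPS = 4_595;
+
+  // Latency percentiles in ms, ordered p50, p95, p99.
+  static final String[] PERCENTILES = {"p50", "p95", "p99"};
+  static final double[] JAVA_MS = {3.5, 12.8, 24.7};
+  static final double[] NODE_MS = {8.7, 24.0, 35.4};
+
+  /** How many times more requests per second the Java service handled. */
+  static double throughputRatio() {
+    return JAVA_RPS / NODE_RPS;
+  }
+
+  /** How many times lower the Java latency was at percentile index i. */
+  static double latencyRatio(int i) {
+    return NODE_MS[i] / JAVA_MS[i];
+  }
+
+  public static void main(String[] args) {
+    System.out.printf("throughput: %.2fx higher%n", throughputRatio());
+    for (int i = 0; i < PERCENTILES.length; i++) {
+      System.out.printf("%s latency: %.2fx lower%n", PERCENTILES[i], latencyRatio(i));
+    }
+  }
+}
+```
+
+The latency advantage narrows toward the far tail: the p99 ratio is noticeably smaller than the p50 and p95 ratios.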
+
 ## Known limitations or missing features
diff --git a/docs/perf-1.0.3.svg b/docs/perf-1.0.3.svg
new file mode 100644
index 0000000..f6cf6a3
--- /dev/null
+++ b/docs/perf-1.0.3.svg
@@ -0,0 +1,94 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 400" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif" font-size="12">
+  <style>
+    .title    { font-size: 14px; font-weight: 600; fill: #0f172a; }
+    .axis     { stroke: #94a3b8; stroke-width: 1; }
+    .grid     { stroke: #cbd5e1; stroke-width: 1; stroke-dasharray: 3 3; }
+    .tick     { fill: #475569; font-size: 11px; }
+    .catlabel { fill: #1e293b; font-size: 12px; font-weight: 500; }
+    .valuelbl { fill: #0f172a; font-size: 12px; font-weight: 600; }
+    .bar-rps  { fill: #2563eb; }
+    .bar-p95  { fill: #0ea5e9; }
+    .caption  { fill: #64748b; font-size: 11px; }
+    @media (prefers-color-scheme: dark) {
+      .title    { fill: #f1f5f9; }
+      .axis     { stroke: #64748b; }
+      .grid     { stroke: #334155; }
+      .tick     { fill: #cbd5e1; }
+      .catlabel { fill: #f1f5f9; }
+      .valuelbl { fill: #f8fafc; }
+      .bar-rps  { fill: #60a5fa; }
+      .bar-p95  { fill: #38bdf8; }
+      .caption  { fill: #94a3b8; }
+    }
+  </style>
+
+  <!-- ============ Panel 1: Throughput ============ -->
+  <g transform="translate(0,0)">
+    <text x="200" y="28" text-anchor="middle" class="title">Throughput (req/s)</text>
+
+    <!-- gridlines + y-tick labels at 0, 3000, 6000, 9000, 12000 over plot y∈[60,320] -->
+    <g>
+      <!-- 12000 -->
+      <line x1="70" y1="60" x2="380" y2="60" class="grid"/>
+      <text x="62" y="64" text-anchor="end" class="tick">12000</text>
+      <!-- 9000 -->
+      <line x1="70" y1="125" x2="380" y2="125" class="grid"/>
+      <text x="62" y="129" text-anchor="end" class="tick">9000</text>
+      <!-- 6000 -->
+      <line x1="70" y1="190" x2="380" y2="190" class="grid"/>
+      <text x="62" y="194" text-anchor="end" class="tick">6000</text>
+      <!-- 3000 -->
+      <line x1="70" y1="255" x2="380" y2="255" class="grid"/>
+      <text x="62" y="259" text-anchor="end" class="tick">3000</text>
+      <!-- 0 baseline -->
+      <line x1="70" y1="320" x2="380" y2="320" class="axis"/>
+      <text x="62" y="324" text-anchor="end" class="tick">0</text>
+    </g>
+
+    <!-- bars: range 0..12000 mapped to y 320..60 (260px tall plot) -->
+    <!-- 10242 → height = (10242/12000)*260 ≈ 221.9 → y = 320 - 222 = 98 -->
+    <rect x="120" y="98" width="80" height="222" class="bar-rps"/>
+    <text x="160" y="92" text-anchor="middle" class="valuelbl">10 242</text>
+
+    <!-- 5438 → height = (5438/12000)*260 ≈ 117.8 → y = 320 - 118 = 202 -->
+    <rect x="250" y="202" width="80" height="118" class="bar-rps"/>
+    <text x="290" y="196" text-anchor="middle" class="valuelbl">5 438</text>
+
+    <!-- category labels -->
+    <text x="160" y="342" text-anchor="middle" class="catlabel">2 CPU / 1 GB</text>
+    <text x="290" y="342" text-anchor="middle" class="catlabel">1 CPU / 512 MB</text>
+  </g>
+
+  <!-- ============ Panel 2: p95 latency ============ -->
+  <g transform="translate(410,0)">
+    <text x="200" y="28" text-anchor="middle" class="title">p95 latency (ms)</text>
+
+    <!-- gridlines at 0, 10, 20, 30 over plot y∈[60,320] -->
+    <g>
+      <line x1="70" y1="60"  x2="380" y2="60"  class="grid"/>
+      <text x="62" y="64"  text-anchor="end" class="tick">30</text>
+      <line x1="70" y1="147" x2="380" y2="147" class="grid"/>
+      <text x="62" y="151" text-anchor="end" class="tick">20</text>
+      <line x1="70" y1="233" x2="380" y2="233" class="grid"/>
+      <text x="62" y="237" text-anchor="end" class="tick">10</text>
+      <line x1="70" y1="320" x2="380" y2="320" class="axis"/>
+      <text x="62" y="324" text-anchor="end" class="tick">0</text>
+    </g>
+
+    <!-- bars: range 0..30 → 260px plot -->
+    <!-- 6.9 → height = (6.9/30)*260 ≈ 59.8 → y = 320 - 60 = 260 -->
+    <rect x="120" y="260" width="80" height="60" class="bar-p95"/>
+    <text x="160" y="254" text-anchor="middle" class="valuelbl">6.9</text>
+
+    <!-- 23.5 → height = (23.5/30)*260 ≈ 203.7 → y = 320 - 204 = 116 -->
+    <rect x="250" y="116" width="80" height="204" class="bar-p95"/>
+    <text x="290" y="110" text-anchor="middle" class="valuelbl">23.5</text>
+
+    <text x="160" y="342" text-anchor="middle" class="catlabel">2 CPU / 1 GB</text>
+    <text x="290" y="342" text-anchor="middle" class="catlabel">1 CPU / 512 MB</text>
+  </g>
+
+  <!-- footer caption -->
+  <text x="410" y="380" text-anchor="middle" class="caption">openapi-httpserver-java 1.0.3 · 50 VU mixed CRUD · 75 s measure window · Temurin 25 · Apple M1 Max</text>
+</svg>
diff --git a/docs/perf-java-vs-node.svg b/docs/perf-java-vs-node.svg
new file mode 100644
index 0000000..84e5c01
--- /dev/null
+++ b/docs/perf-java-vs-node.svg
@@ -0,0 +1,83 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 820 400" font-family="-apple-system,BlinkMacSystemFont,'Segoe UI',Roboto,sans-serif" font-size="12">
+  <style>
+    .title    { font-size: 14px; font-weight: 600; fill: #0f172a; }
+    .axis     { stroke: #94a3b8; stroke-width: 1; }
+    .grid     { stroke: #cbd5e1; stroke-width: 1; stroke-dasharray: 3 3; }
+    .tick     { fill: #475569; font-size: 11px; }
+    .catlabel { fill: #1e293b; font-size: 12px; font-weight: 500; }
+    .valuelbl { fill: #0f172a; font-size: 12px; font-weight: 600; }
+    .bar-java { fill: #2563eb; }
+    .bar-node { fill: #16a34a; }
+    .caption  { fill: #64748b; font-size: 11px; }
+    @media (prefers-color-scheme: dark) {
+      .title    { fill: #f1f5f9; }
+      .axis     { stroke: #64748b; }
+      .grid     { stroke: #334155; }
+      .tick     { fill: #cbd5e1; }
+      .catlabel { fill: #f1f5f9; }
+      .valuelbl { fill: #f8fafc; }
+      .bar-java { fill: #60a5fa; }
+      .bar-node { fill: #4ade80; }
+      .caption  { fill: #94a3b8; }
+    }
+  </style>
+
+  <!-- ============ Panel 1: Throughput ============ -->
+  <g transform="translate(0,0)">
+    <text x="200" y="28" text-anchor="middle" class="title">Aggregate throughput (req/s)</text>
+
+    <g>
+      <line x1="70" y1="60"  x2="380" y2="60"  class="grid"/>
+      <text x="62" y="64"  text-anchor="end" class="tick">12000</text>
+      <line x1="70" y1="125" x2="380" y2="125" class="grid"/>
+      <text x="62" y="129" text-anchor="end" class="tick">9000</text>
+      <line x1="70" y1="190" x2="380" y2="190" class="grid"/>
+      <text x="62" y="194" text-anchor="end" class="tick">6000</text>
+      <line x1="70" y1="255" x2="380" y2="255" class="grid"/>
+      <text x="62" y="259" text-anchor="end" class="tick">3000</text>
+      <line x1="70" y1="320" x2="380" y2="320" class="axis"/>
+      <text x="62" y="324" text-anchor="end" class="tick">0</text>
+    </g>
+
+    <!-- Java 10680 → (10680/12000)*260 = 231.4 → y = 88.6 -->
+    <rect x="120" y="89" width="80" height="231" class="bar-java"/>
+    <text x="160" y="83" text-anchor="middle" class="valuelbl">10 680</text>
+
+    <!-- Node 4595 → (4595/12000)*260 = 99.6 → y = 220.4 -->
+    <rect x="250" y="220" width="80" height="100" class="bar-node"/>
+    <text x="290" y="214" text-anchor="middle" class="valuelbl">4 595</text>
+
+    <text x="160" y="342" text-anchor="middle" class="catlabel">openapi-httpserver-java</text>
+    <text x="290" y="342" text-anchor="middle" class="catlabel">Node + Express</text>
+  </g>
+
+  <!-- ============ Panel 2: p95 latency ============ -->
+  <g transform="translate(410,0)">
+    <text x="200" y="28" text-anchor="middle" class="title">p95 latency (ms)</text>
+
+    <g>
+      <line x1="70" y1="60"  x2="380" y2="60"  class="grid"/>
+      <text x="62" y="64"  text-anchor="end" class="tick">30</text>
+      <line x1="70" y1="147" x2="380" y2="147" class="grid"/>
+      <text x="62" y="151" text-anchor="end" class="tick">20</text>
+      <line x1="70" y1="233" x2="380" y2="233" class="grid"/>
+      <text x="62" y="237" text-anchor="end" class="tick">10</text>
+      <line x1="70" y1="320" x2="380" y2="320" class="axis"/>
+      <text x="62" y="324" text-anchor="end" class="tick">0</text>
+    </g>
+
+    <!-- Java 12.8 → (12.8/30)*260 = 110.9 → y = 209.1 -->
+    <rect x="120" y="209" width="80" height="111" class="bar-java"/>
+    <text x="160" y="203" text-anchor="middle" class="valuelbl">12.8</text>
+
+    <!-- Node 24.0 → (24/30)*260 = 208 → y = 112 -->
+    <rect x="250" y="112" width="80" height="208" class="bar-node"/>
+    <text x="290" y="106" text-anchor="middle" class="valuelbl">24.0</text>
+
+    <text x="160" y="342" text-anchor="middle" class="catlabel">openapi-httpserver-java</text>
+    <text x="290" y="342" text-anchor="middle" class="catlabel">Node + Express</text>
+  </g>
+
+  <text x="410" y="380" text-anchor="middle" class="caption">1 CPU / 512 MB · 50 VU mixed CRUD · 5 min sustained · Temurin 25 vs Node 22 + express-openapi-validator</text>
+</svg>