|
| 1 | +# Performance Findings — JFR @ 2026-05-08 |
| 2 | + |
| 3 | +**Date:** 2026-05-08 |
| 4 | +**Status:** Documented for follow-up; not yet implemented |
| 5 | +**Source:** Java Flight Recorder run during a 45 s / 30 VU k6 load test against `ServerLauncher` on the post-refactor branch (`refactor/openapi-3.1-readiness`, commit ~`bbb3c07`). 1,449,132 HTTP requests, ~32k rps, 0% HTTP failures, 100% k6 checks passing. |
| 6 | + |
| 7 | +> The JFR file itself is not committed; this document captures the analysis. Re-run with `-XX:StartFlightRecording=…` against `com.retailsvc.http.start.ServerLauncher` to reproduce. |
| 8 | +
|
| 9 | +## TL;DR |
| 10 | + |
| 11 | +The library is functionally correct under load, but several easy wins exist for both throughput and GC pressure. Three changes — caching `Spec.basePath`, caching compiled `Pattern` for string validation, and memoising ref resolution — together remove ~470 MB of unnecessary allocation per million requests and visible CPU samples in `Spec.basePath` / `Pattern.compile`. Two further small wins clean up parameter validation. One large item (streaming body parse) is a possible future direction but is a public-API change and is parked. |
| 12 | + |
| 13 | +## Recording profile |
| 14 | + |
| 15 | +- 33,798 `jdk.ExecutionSample` events (sampling profiler) |
| 16 | +- 12,834 `jdk.ObjectAllocationSample` events |
| 17 | +- ~164 s wall-clock; load applied for the middle 45 s |
| 18 | +- Most CPU samples (21,278) are `Unsafe.park` — virtual threads parked on I/O, healthy. |
| 19 | + |
| 20 | +## Hot frames in our code (CPU samples) |
| 21 | + |
| 22 | +| Samples | Location | Notes | |
| 23 | +|---:|---|---| |
| 24 | +| 77 | `RequestPreparationFilter.doFilter` | Per-request orchestration; the bulk is `readAllBytes` | |
| 25 | +| 44 | `start.PostDataHandler.handle` | Test handler echo | |
| 26 | +| 39 | `start.ParamHandler.handle` | Test handler | |
| 27 | +| 33 | `start.GetDataHandler.handle` | Test handler | |
| 28 | +| 31 | `validate.DefaultValidator.validateString` | **Pattern recompilation** — see W2 | |
| 29 | +| 26 | `spec.Spec.basePath` | **URI recreation per request** — see W1 | |
| 30 | +| 18 | `start.PostListObjectsHandler.handle` | Test handler | |
| 31 | +| 16 | `RequestPreparationFilter.validateParameters` | Pointer-string + query-map assembly — see W4/W5 | |
| 32 | +| 14 | `spec.PathTemplate.match` | Already a precompiled regex; nothing to do | |
| 33 | +| 12 | `Request.operationId` | Inside JDK `ScopedValue.get()` traversal | |
| 34 | + |
| 35 | +## Allocation hot-spots in our code (sampled) |
| 36 | + |
| 37 | +The numbers below are extrapolated from JFR's allocation sample weights to total estimated bytes over the 45 s of load. Rank order is what matters; absolute numbers should be read as "MB-scale" indicators. |
| 38 | + |
| 39 | +| Site | Est. alloc | Cause | Action | |
| 40 | +|---|---:|---|---| |
| 41 | +| `RequestPreparationFilter.doFilter` → `byte[]` | ~24 GB | `getRequestBody().readAllBytes()` — fresh buffer per request | Defer (see W6) | |
| 42 | +| `Request.operationId` → `Object[]` | ~237 MB | JDK-internal `ScopedValue.get()` carrier traversal | Mitigate by reading the context once per handler (see W7) | |
| 43 | +| `validate.DefaultValidator.validateString` → `int[]` + `Matcher` | ~215 MB | `Pattern.compile(s.pattern())` per request | **W2** | |
| 44 | +| `RequestPreparationFilter.validateParameters` → `String` + `HashMap` | ~180 MB | Pointer-string concat + per-request query parse | **W4 + W5** | |
| 45 | +| `start.*Handler` → `LoggingEvent` (logback) | ~190 MB | `LOG.debug(...)` allocates even when level disabled in some logback paths | Cosmetic; only test handlers | |
| 46 | +| `Spec.stripPrefix` → `String` + `byte[]` | ~150 MB | `ref.substring(prefix.length())` per `$ref` resolve | **W3** | |
| 47 | +| `ScopedValue.Carrier` | ~117 MB | One per request; inherent to `ScopedValue.where(...)` | Don't address | |
| 48 | +| `Spec.basePath` → `URI` | ~105 MB | `URI.create(servers[0].url())` per request | **W1** | |
| 49 | + |
| 50 | +## Recommended changes |
| 51 | + |
| 52 | +Numbered W1–W7 in priority order (easy + high-impact first). |
| 53 | + |
| 54 | +### W1 — Cache `Spec.basePath` once at construction |
| 55 | + |
| 56 | +**Files:** `src/main/java/com/retailsvc/http/spec/Spec.java` |
| 57 | + |
| 58 | +`basePath()` currently does: |
| 59 | +```java |
| 60 | +public String basePath() { |
| 61 | + if (servers.isEmpty()) { |
| 62 | + throw new IllegalStateException("no servers declared"); |
| 63 | + } |
| 64 | + return Optional.ofNullable(URI.create(servers.get(0).url()).getPath()).orElse(""); |
| 65 | +} |
| 66 | +``` |
| 67 | + |
| 68 | +Called from `RequestPreparationFilter.stripBasePath` on **every** request. The result is spec-static — compute once. |
| 69 | + |
| 70 | +**Sketch:** |
| 71 | +- Add a `private final String basePath;` field on `Spec`. |
| 72 | +- Compute it in the canonical constructor (or in `Spec.from` factory) — same logic, run once. |
| 73 | +- Replace the method body with `return basePath;`. |
| 74 | + |
| 75 | +**Impact:** ~100 MB of URI allocs gone; ~26 CPU samples eliminated; small but consistent latency win on every request. |
| 76 | + |
| 77 | +### W2 — Cache compiled `Pattern` for string validation |
| 78 | + |
| 79 | +**Files:** `src/main/java/com/retailsvc/http/validate/DefaultValidator.java` |
| 80 | + |
| 81 | +Currently: |
| 82 | +```java |
| 83 | +if (s.pattern() != null && !Pattern.compile(s.pattern()).matcher(str).matches()) { |
| 84 | + fail(pointer, "pattern", "does not match pattern " + s.pattern(), str); |
| 85 | +} |
| 86 | +``` |
| 87 | + |
| 88 | +`Pattern.compile` is recompiled on every validation. Patterns are immutable and thread-safe. |
| 89 | + |
| 90 | +**Sketch:** |
| 91 | +- Option A (smaller): hold a `ConcurrentHashMap<String, Pattern>` field on `DefaultValidator`; `getOrDefault`/`computeIfAbsent` to memoise. Cache is unbounded but pattern strings are spec-static so it's bounded by the spec. |
| 92 | +- Option B (cleaner): compile the `Pattern` at `SchemaParser.parse(...)` time and store it on `StringSchema` as an extra component. Requires changing `StringSchema` to carry a `Pattern` (or a record containing both raw + compiled). Slightly bigger surface change, but eliminates the cache lookup too. |
| 93 | + |
| 94 | +Recommendation: **A first**, B as a refinement if profiling shows the lookup itself is hot. |
| 95 | + |
| 96 | +**Impact:** ~215 MB of `int[]`/`Matcher` allocations gone; the 31 `validateString` CPU samples drop substantially. |
| 97 | + |
| 98 | +### W3 — Memoise `Spec.resolveSchema` / `Spec.resolveParameter` |
| 99 | + |
| 100 | +**Files:** `src/main/java/com/retailsvc/http/spec/Spec.java` |
| 101 | + |
| 102 | +`stripPrefix(...)` does `ref.substring(prefix.length())` plus a map lookup on every ref resolution. Refs are spec-static; once resolved, they don't change. |
| 103 | + |
| 104 | +**Sketch:** |
| 105 | +- Two `ConcurrentHashMap` fields on `Spec`: `resolvedSchemas` and `resolvedParameters`. |
| 106 | +- `computeIfAbsent` on lookup. Cache size is bounded by the spec's component count. |
| 107 | + |
| 108 | +**Impact:** ~150 MB of allocs gone; tightens the validator hot path when schemas use `$ref`. |
| 109 | + |
| 110 | +### W4 — Skip `parseQuery` when the operation has no QUERY parameters |
| 111 | + |
| 112 | +**Files:** `src/main/java/com/retailsvc/http/internal/RequestPreparationFilter.java` |
| 113 | + |
| 114 | +`validateParameters` currently calls `parseQuery(...)` unconditionally, allocating a HashMap even for routes that take no query params (most of them in the smoke test). |
| 115 | + |
| 116 | +**Sketch:** |
| 117 | +```java |
| 118 | +private void validateParameters(HttpExchange exchange, Operation op, Map<String, String> pathParams) { |
| 119 | + Map<String, String> query = null; // build only on demand |
| 120 | + for (Parameter p : op.parameters()) { |
| 121 | + String value = switch (p.in()) { |
| 122 | + case PATH -> pathParams.get(p.name()); |
| 123 | + case QUERY -> { |
| 124 | + if (query == null) query = parseQuery(exchange.getRequestURI().getQuery()); |
| 125 | + yield query.get(p.name()); |
| 126 | + } |
| 127 | + case HEADER -> exchange.getRequestHeaders().getFirst(p.name()); |
| 128 | + case COOKIE -> null; |
| 129 | + }; |
| 130 | + … |
| 131 | + } |
| 132 | +} |
| 133 | +``` |
| 134 | + |
| 135 | +**Impact:** Eliminates a HashMap (and the `String.split("&")` work) on every GET that has no query parameters. |
| 136 | + |
| 137 | +### W5 — Precompute `Parameter` JSON-pointer at parse time |
| 138 | + |
| 139 | +**Files:** `src/main/java/com/retailsvc/http/spec/Parameter.java`, `src/main/java/com/retailsvc/http/internal/RequestPreparationFilter.java` |
| 140 | + |
| 141 | +The pointer string `"/" + p.in().name().toLowerCase(Locale.ROOT) + "/" + p.name()` is rebuilt per request per parameter. It's spec-static. |
| 142 | + |
| 143 | +**Sketch:** |
| 144 | +- Add a `private final String pointer` (or computed accessor) on `Parameter`. Possibly via a `pointer()` method that lazily memoises, since `Parameter` is currently a plain record. A non-record helper on the validate side would also work. |
| 145 | +- Replace the runtime `"/" + … + p.name()` with `p.pointer()`. |
| 146 | + |
| 147 | +**Impact:** Eliminates a few `StringBuilder` + `String` allocs per parameter per request. Small per-request, but parameters validate on every request. |
| 148 | + |
| 149 | +### W6 — (Future / public API) Stream-validate the request body |
| 150 | + |
| 151 | +**Files:** `src/main/java/com/retailsvc/http/JsonMapper.java`, `RequestPreparationFilter.java` |
| 152 | + |
| 153 | +`getRequestBody().readAllBytes()` allocates ~24 GB worth of throwaway `byte[]` over a million requests at moderate body sizes. Eliminating that means reading-and-validating in a single pass — `JsonMapper.mapFrom(InputStream)` instead of `mapFrom(byte[])`, and forgoing storing the raw bytes for handlers that use `Request.bytes()`. |
| 154 | + |
| 155 | +This is a **public API change** and changes handler ergonomics: handlers that need the raw bytes would have to opt in to caching. Defer until / unless we have a concrete throughput goal that needs it. |
| 156 | + |
| 157 | +### W7 — Read context once per handler |
| 158 | + |
| 159 | +**Files:** `src/main/java/com/retailsvc/http/Request.java` (consumer-facing) |
| 160 | + |
| 161 | +`Request.bytes()`, `parsed()`, `operationId()`, `pathParams()` each do `CONTEXT.get()` independently. Every `get()` walks the JDK's scope chain and allocates a small `Object[]` (sampled at ~237 MB across the run). Internal callers (us) and well-written external handlers should hoist a single `RequestContext`: |
| 162 | + |
| 163 | +**Optional new accessor:** |
| 164 | +```java |
| 165 | +public static RequestContext current() { return CONTEXT.get(); } |
| 166 | +``` |
| 167 | + |
| 168 | +Add as an internal escape hatch (or a documented optimisation in handler code) without removing the four typed accessors. `DispatchHandler` should use `current().operationId()` to halve its `ScopedValue.get` cost. |
| 169 | + |
| 170 | +**Impact:** Marginal but free; documents the off-thread-capture pattern users will need anyway. |
| 171 | + |
| 172 | +## Out of scope for the perf PR |
| 173 | + |
| 174 | +- `ScopedValue.Carrier` per-request allocation — JDK-API-mandated, not ours to fix. |
| 175 | +- Test-handler logging allocs — those handlers are example code, not library code. |
| 176 | +- `LinkedList`/`ReferencePipeline` use in `PostListObjectsHandler`/`ParamHandler` — same, test handlers. |
| 177 | +- The k6 / load-shape itself — that's just the measurement harness. |
| 178 | + |
| 179 | +## Sequencing and verification |
| 180 | + |
| 181 | +Suggested PR shape: **W1 → W2 → W3 → W4 → W5** in five small commits, each verifiable independently with `mvn verify`. Add a `JfrAware` benchmark or rerun the k6 + JFR session before/after to confirm impact. W7 is a one-liner addition that can ride along. |
| 182 | + |
| 183 | +W6 deserves its own design doc and probably its own PR after the immediate wins land — and only if a real throughput target makes it necessary. |
0 commit comments