Skip to content

Commit 69c176c

Browse files
committed
docs: Capture JFR-based performance findings for follow-up PR
Documents the seven actionable items (W1–W7) found in a 30-VU / 45 s JFR recording: cache Spec.basePath, cache compiled Pattern in the validator, memoise ref resolution, skip parseQuery when not needed, precompute parameter pointer strings, plus a deferred stream-body change and a read-context-once mitigation for ScopedValue.get traversal cost.
1 parent bbb3c07 commit 69c176c

1 file changed

Lines changed: 183 additions & 0 deletions

File tree

Lines changed: 183 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,183 @@
1+
# Performance Findings — JFR @ 2026-05-08
2+
3+
**Date:** 2026-05-08
4+
**Status:** Documented for follow-up; not yet implemented
5+
**Source:** Java Flight Recorder run during a 45 s / 30 VU k6 load test against `ServerLauncher` on the post-refactor branch (`refactor/openapi-3.1-readiness`, commit ~`bbb3c07`). 1,449,132 HTTP requests, ~32k rps, 0% HTTP failures, 100% k6 checks passing.
6+
7+
> The JFR file itself is not committed; this document captures the analysis. Re-run with `-XX:StartFlightRecording=…` against `com.retailsvc.http.start.ServerLauncher` to reproduce.
8+
9+
## TL;DR
10+
11+
The library is functionally correct under load, but several easy wins exist for both throughput and GC pressure. Three changes — caching `Spec.basePath`, caching compiled `Pattern` for string validation, and memoising ref resolution — together remove ~470 MB of unnecessary allocation per million requests and visible CPU samples in `Spec.basePath` / `Pattern.compile`. Two further small wins clean up parameter validation. One large item (streaming body parse) is a possible future direction but is a public-API change and is parked.
12+
13+
## Recording profile
14+
15+
- 33,798 `jdk.ExecutionSample` events (sampling profiler)
16+
- 12,834 `jdk.ObjectAllocationSample` events
17+
- ~164 s wall-clock; load applied for the middle 45 s
18+
- Most CPU samples (21,278) are `Unsafe.park` — virtual threads parked on I/O, healthy.
19+
20+
## Hot frames in our code (CPU samples)
21+
22+
| Samples | Location | Notes |
23+
|---:|---|---|
24+
| 77 | `RequestPreparationFilter.doFilter` | Per-request orchestration; the bulk is `readAllBytes` |
25+
| 44 | `start.PostDataHandler.handle` | Test handler echo |
26+
| 39 | `start.ParamHandler.handle` | Test handler |
27+
| 33 | `start.GetDataHandler.handle` | Test handler |
28+
| 31 | `validate.DefaultValidator.validateString` | **Pattern recompilation** — see W2 |
29+
| 26 | `spec.Spec.basePath` | **URI recreation per request** — see W1 |
30+
| 18 | `start.PostListObjectsHandler.handle` | Test handler |
31+
| 16 | `RequestPreparationFilter.validateParameters` | Pointer-string + query-map assembly — see W4/W5 |
32+
| 14 | `spec.PathTemplate.match` | Already a precompiled regex; nothing to do |
33+
| 12 | `Request.operationId` | Inside JDK `ScopedValue.get()` traversal |
34+
35+
## Allocation hot-spots in our code (sampled)
36+
37+
The numbers below are extrapolated from JFR's allocation sample weights to total estimated bytes over the 45 s of load. Rank order is what matters; absolute numbers should be read as "MB-scale" indicators.
38+
39+
| Site | Est. alloc | Cause | Action |
40+
|---|---:|---|---|
41+
| `RequestPreparationFilter.doFilter``byte[]` | ~24 GB | `getRequestBody().readAllBytes()` — fresh buffer per request | Defer (see W6) |
42+
| `Request.operationId``Object[]` | ~237 MB | JDK-internal `ScopedValue.get()` carrier traversal | Mitigate by reading the context once per handler (see W7) |
43+
| `validate.DefaultValidator.validateString``int[]` + `Matcher` | ~215 MB | `Pattern.compile(s.pattern())` per request | **W2** |
44+
| `RequestPreparationFilter.validateParameters``String` + `HashMap` | ~180 MB | Pointer-string concat + per-request query parse | **W4 + W5** |
45+
| `start.*Handler``LoggingEvent` (logback) | ~190 MB | `LOG.debug(...)` allocates even when level disabled in some logback paths | Cosmetic; only test handlers |
46+
| `Spec.stripPrefix``String` + `byte[]` | ~150 MB | `ref.substring(prefix.length())` per `$ref` resolve | **W3** |
47+
| `ScopedValue.Carrier` | ~117 MB | One per request; inherent to `ScopedValue.where(...)` | Don't address |
48+
| `Spec.basePath``URI` | ~105 MB | `URI.create(servers[0].url())` per request | **W1** |
49+
50+
## Recommended changes
51+
52+
Numbered W1–W7 in priority order (easy + high-impact first).
53+
54+
### W1 — Cache `Spec.basePath` once at construction
55+
56+
**Files:** `src/main/java/com/retailsvc/http/spec/Spec.java`
57+
58+
`basePath()` currently does:
59+
```java
60+
public String basePath() {
61+
if (servers.isEmpty()) {
62+
throw new IllegalStateException("no servers declared");
63+
}
64+
return Optional.ofNullable(URI.create(servers.get(0).url()).getPath()).orElse("");
65+
}
66+
```
67+
68+
Called from `RequestPreparationFilter.stripBasePath` on **every** request. The result is spec-static — compute once.
69+
70+
**Sketch:**
71+
- Add a `private final String basePath;` field on `Spec`.
72+
- Compute it in the canonical constructor (or in `Spec.from` factory) — same logic, run once.
73+
- Replace the method body with `return basePath;`.
74+
75+
**Impact:** ~100 MB of URI allocs gone; ~26 CPU samples eliminated; small but consistent latency win on every request.
76+
77+
### W2 — Cache compiled `Pattern` for string validation
78+
79+
**Files:** `src/main/java/com/retailsvc/http/validate/DefaultValidator.java`
80+
81+
Currently:
82+
```java
83+
if (s.pattern() != null && !Pattern.compile(s.pattern()).matcher(str).matches()) {
84+
fail(pointer, "pattern", "does not match pattern " + s.pattern(), str);
85+
}
86+
```
87+
88+
`Pattern.compile` is recompiled on every validation. Patterns are immutable and thread-safe.
89+
90+
**Sketch:**
91+
- Option A (smaller): hold a `ConcurrentHashMap<String, Pattern>` field on `DefaultValidator`; `getOrDefault`/`computeIfAbsent` to memoise. Cache is unbounded but pattern strings are spec-static so it's bounded by the spec.
92+
- Option B (cleaner): compile the `Pattern` at `SchemaParser.parse(...)` time and store it on `StringSchema` as an extra component. Requires changing `StringSchema` to carry a `Pattern` (or a record containing both raw + compiled). Slightly bigger surface change, but eliminates the cache lookup too.
93+
94+
Recommendation: **A first**, B as a refinement if profiling shows the lookup itself is hot.
95+
96+
**Impact:** ~215 MB of `int[]`/`Matcher` allocations gone; the 31 `validateString` CPU samples drop substantially.
97+
98+
### W3 — Memoise `Spec.resolveSchema` / `Spec.resolveParameter`
99+
100+
**Files:** `src/main/java/com/retailsvc/http/spec/Spec.java`
101+
102+
`stripPrefix(...)` does `ref.substring(prefix.length())` plus a map lookup on every ref resolution. Refs are spec-static; once resolved, they don't change.
103+
104+
**Sketch:**
105+
- Two `ConcurrentHashMap` fields on `Spec`: `resolvedSchemas` and `resolvedParameters`.
106+
- `computeIfAbsent` on lookup. Cache size is bounded by the spec's component count.
107+
108+
**Impact:** ~150 MB of allocs gone; tightens the validator hot path when schemas use `$ref`.
109+
110+
### W4 — Skip `parseQuery` when the operation has no QUERY parameters
111+
112+
**Files:** `src/main/java/com/retailsvc/http/internal/RequestPreparationFilter.java`
113+
114+
`validateParameters` currently calls `parseQuery(...)` unconditionally, allocating a HashMap even for routes that take no query params (most of them in the smoke test).
115+
116+
**Sketch:**
117+
```java
118+
private void validateParameters(HttpExchange exchange, Operation op, Map<String, String> pathParams) {
119+
Map<String, String> query = null; // build only on demand
120+
for (Parameter p : op.parameters()) {
121+
String value = switch (p.in()) {
122+
case PATH -> pathParams.get(p.name());
123+
case QUERY -> {
124+
if (query == null) query = parseQuery(exchange.getRequestURI().getQuery());
125+
yield query.get(p.name());
126+
}
127+
case HEADER -> exchange.getRequestHeaders().getFirst(p.name());
128+
case COOKIE -> null;
129+
};
130+
131+
}
132+
}
133+
```
134+
135+
**Impact:** Eliminates a HashMap (and the `String.split("&")` work) on every GET that has no query parameters.
136+
137+
### W5 — Precompute `Parameter` JSON-pointer at parse time
138+
139+
**Files:** `src/main/java/com/retailsvc/http/spec/Parameter.java`, `src/main/java/com/retailsvc/http/internal/RequestPreparationFilter.java`
140+
141+
The pointer string `"/" + p.in().name().toLowerCase(Locale.ROOT) + "/" + p.name()` is rebuilt per request per parameter. It's spec-static.
142+
143+
**Sketch:**
144+
- Add a `private final String pointer` (or computed accessor) on `Parameter`. Possibly via a `pointer()` method that lazily memoises, since `Parameter` is currently a plain record. A non-record helper on the validate side would also work.
145+
- Replace the runtime `"/" + … + p.name()` with `p.pointer()`.
146+
147+
**Impact:** Eliminates a few `StringBuilder` + `String` allocs per parameter per request. Small per-request, but parameters validate on every request.
148+
149+
### W6 — (Future / public API) Stream-validate the request body
150+
151+
**Files:** `src/main/java/com/retailsvc/http/JsonMapper.java`, `RequestPreparationFilter.java`
152+
153+
`getRequestBody().readAllBytes()` allocates ~24 GB worth of throwaway `byte[]` over a million requests at moderate body sizes. Eliminating that means reading-and-validating in a single pass — `JsonMapper.mapFrom(InputStream)` instead of `mapFrom(byte[])`, and forgoing storing the raw bytes for handlers that use `Request.bytes()`.
154+
155+
This is a **public API change** and changes handler ergonomics: handlers that need the raw bytes would have to opt in to caching. Defer until / unless we have a concrete throughput goal that needs it.
156+
157+
### W7 — Read context once per handler
158+
159+
**Files:** `src/main/java/com/retailsvc/http/Request.java` (consumer-facing)
160+
161+
`Request.bytes()`, `parsed()`, `operationId()`, `pathParams()` each do `CONTEXT.get()` independently. Every `get()` walks the JDK's scope chain and allocates a small `Object[]` (sampled at ~237 MB across the run). Internal callers (us) and well-written external handlers should hoist a single `RequestContext`:
162+
163+
**Optional new accessor:**
164+
```java
165+
public static RequestContext current() { return CONTEXT.get(); }
166+
```
167+
168+
Add as an internal escape hatch (or a documented optimisation in handler code) without removing the four typed accessors. `DispatchHandler` should use `current().operationId()` to halve its `ScopedValue.get` cost.
169+
170+
**Impact:** Marginal but free; documents the off-thread-capture pattern users will need anyway.
171+
172+
## Out of scope for the perf PR
173+
174+
- `ScopedValue.Carrier` per-request allocation — JDK-API-mandated, not ours to fix.
175+
- Test-handler logging allocs — those handlers are example code, not library code.
176+
- `LinkedList`/`ReferencePipeline` use in `PostListObjectsHandler`/`ParamHandler` — same, test handlers.
177+
- The k6 / load-shape itself — that's just the measurement harness.
178+
179+
## Sequencing and verification
180+
181+
Suggested PR shape: **W1 → W2 → W3 → W4 → W5** in five small commits, each verifiable independently with `mvn verify`. Add a `JfrAware` benchmark or rerun the k6 + JFR session before/after to confirm impact. W7 is a one-liner addition that can ride along.
182+
183+
W6 deserves its own design doc and probably its own PR after the immediate wins land — and only if a real throughput target makes it necessary.

0 commit comments

Comments
 (0)