Use this procedure when reproducing the checked-in Lattice benchmark methodology on a dedicated Linux host, especially when placement, affinity, or NUMA behavior is part of the claim.
- Bare-metal Linux (or a hypervisor with CPU pinning), kernel >= 5.15.
- At least 8 isolated cores via
isolcpus=on the kernel command line, plusnohz_full=andrcu_nocbs=covering the same range. cpupower frequency-set -g performanceon every isolated core.- THP set to
madviseornever; SMT disabled or explicitly accounted for. - The native library built with
cargo build --releaseand present onjava.library.path.
-
JDK 21 Temurin (the Gradle toolchain default).
-
Recommended JVM args (already set in
build.gradlefor the JMH suite):-Xms2g -Xmx2g -XX:+AlwaysPreTouch -XX:+UnlockDiagnosticVMOptions -XX:+UseParallelGCGraph runtime behavior is configured per benchmark graph through
FusionSpec,MetricsSpec,GraphPlacementSpec, andDiagnosticsSpec, not through process-global runtime properties.
./gradlew nativeBuildRelease
./gradlew jmh \
-PjmhInclude='com\.lattice\.benchmark\..*(Disruptor|OptimalPath).*' \
-PjmhJvmArgs='-Djava.library.path=native/static-topology-native/target/release'For steady-state comparison numbers use at least the checked-in head-to-head profile: 3 forks, 5 x 5 s warmup, and 8 x 5 s measurement. Longer campaigns such as 3 x 10 s warmup and 5 x 10 s measurement are appropriate for release attachments and hardware-specific reports.
For every benchmark JSON checked into docs/benchmark-results/<version>/:
- The JMH JSON output verbatim.
- A
host.mdnext to it with:lscpu,uname -a, JDK version, Gradle version, kernel cmdline, microcode, isolated CPU set, the exacttasksetinvocation if used, and whether the native library was loaded. - A short
notes.mdwith anything anomalous (thermal throttling, shared hardware, etc.).
Use the same payload model, completion semantics, observability toggles, JVM flags, and Disruptor wait strategy as the checked-in baseline when extending a "vs Disruptor" claim to a new host.
For explicit Lattice CPU-set placement, record both the requested policy and
the host's allowed CPU mask. The Linux native backend applies the intersection
of those sets, which is the right behavior under cgroups and taskset; an
empty intersection is a placement failure and should not be used for pinned
benchmark claims.