A workload catalogue + harness for measuring cljw against other runtimes.
Each benchmarks/NN_<name>/ directory holds a meta.yaml (name / category /
expected_output / description) and one source file per language
(bench.clj, bench.c, bench.zig, Bench.java, bench.py, bench.rb,
bench.js, bench.go). expected_output doubles as a correctness oracle —
a runtime that prints the wrong value is shown as a SKIP, never timed.
The cross-language table below is generated from the measurement YAML by
gen_cross_table.py, never hand-maintained — a hand-curated table drifts from the data (the lesson from cw's predecessor).
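The expected_output oracle described above can be pictured with a small Python sketch. This is not the harness's actual code (the harness is a shell script); `oracle_status`, the whitespace-trimming comparison, the non-zero-exit rule, and the timeout are all assumptions made for illustration.

```python
import subprocess
import sys


def oracle_status(cmd, expected_output, timeout=60):
    """Return "OK" when cmd prints expected_output, otherwise "SKIP".

    Hypothetical helper: the real harness may compare output differently.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return "SKIP"  # could not launch, or ran past the timeout
    if proc.returncode != 0:
        return "SKIP"  # assumption: a failing exit code is never timed
    # Assumption: compare after trimming surrounding whitespace.
    if proc.stdout.strip() == str(expected_output).strip():
        return "OK"
    return "SKIP"


if __name__ == "__main__":
    # Stand-in workload: a Python one-liner instead of a real bench file.
    print(oracle_status([sys.executable, "-c", "print(6 * 7)"], "42"))
```

Only a run that returns "OK" would go on to be timed; anything else is reported as a SKIP.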
# cljw-only: per-bench millisecond table (builds ReleaseSafe, uses hyperfine)
bash bench/run_bench.sh # all native workloads
bash bench/run_bench.sh --bench=sieve # one workload
bash bench/run_bench.sh --no-wasm # skip the -Dwasm FFI workloads
# Cross-language comparison → measurement YAML
bash bench/compare_langs.sh --yaml=bench/cross-lang-latest.yaml
# Regenerate the Markdown table from the YAML
yq -o=json bench/cross-lang-latest.yaml | python3 bench/gen_cross_table.py

The cross-language harness compiles each language on the fly and auto-skips any
toolchain that is absent (command -v guard); run it from inside nix develop
for the pinned, reproducible toolchains. cljw is built ReleaseSafe (the
shipped build); timing uses hyperfine and is reported as cold-start wall-clock
(process launch → exit, startup included).
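As a rough illustration of the two mechanisms in this paragraph (the toolchain guard and the hyperfine timing), here is a Python sketch. The harness itself is a shell script, so none of these function names come from the repo. `shutil.which` stands in for the `command -v` guard, the 2 warmup + 6 runs defaults follow the stated measurement conditions, and the benchmarked command string is a placeholder.

```python
import json
import shutil


def available(tool):
    """Python analogue of a `command -v tool` guard."""
    return shutil.which(tool) is not None


def hyperfine_argv(command, json_path, warmup=2, runs=6):
    """Build a hyperfine command line that exports its timings as JSON."""
    return [
        "hyperfine",
        "--warmup", str(warmup),
        "--runs", str(runs),
        "--export-json", json_path,
        command,
    ]


def mean_us(hyperfine_json):
    """Convert hyperfine's mean (reported in seconds) to whole microseconds."""
    result = json.loads(hyperfine_json)["results"][0]
    return round(result["mean"] * 1_000_000)


if __name__ == "__main__":
    # Placeholder command; the real harness decides how each language is run.
    if available("python3") and available("hyperfine"):
        print(" ".join(hyperfine_argv("python3 bench.py", "timing.json")))
    else:
        print("SKIP: toolchain absent")
```

Because hyperfine times the whole process from launch to exit, the exported mean is already the cold-start wall-clock figure; no startup subtraction is involved.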
Cold-start wall-clock (process launch → exit), µs, lower is better. Columns: ClojureWasm, then Python / Ruby / Node.js / Babashka, then Java / Go / C. Only cold-start is published: it is the metric that compares uniformly across languages. A startup-subtracted "compute" table is intentionally omitted — for the fast languages the per-run compute sits below process-spawn noise (~3 ms ± 10%), so subtracting startup would report noise, not signal.
Conditions: MacBook Pro (Mac16,8), Apple M4 Pro, 12-core (8P+4E), 48 GB RAM, macOS 26.5 (25F71), hyperfine 2 warmup + 6 runs, 2026-06-11.
| Benchmark | ClojureWasm | Python | Ruby | Node.js | Babashka | Java | Go | C |
|---|---|---|---|---|---|---|---|---|
| fib_recursive | 23911 | 19738 | 31766 | 43100 | 23980 | 25705 | 3069 | 1090 |
| fib_loop | 5138 | 14511 | 29523 | 43113 | 12300 | 22580 | 2648 | 516 |
| tak | 9906 | 15969 | 31667 | 43309 | 13980 | 23393 | 2090 | 1706 |
| arith_loop | 49303 | 56217 | 52302 | 47965 | 47772 | 25307 | 2699 | 893 |
| map_filter_reduce | 14128 | 16333 | 30540 | 45260 | 16803 | 27568 | 3080 | 564 |
| vector_ops | 8307 | 15525 | 30477 | 44576 | 13276 | 25082 | 2471 | 937 |
| map_ops | 5382 | 14217 | 29893 | 43762 | 12143 | 23846 | 2252 | 321 |
| list_build | 6362 | 17735 | 29800 | 43924 | 12441 | 24803 | 2616 | 594 |
| sieve | 28518 | 16279 | 33127 | 44786 | 13301 | 31248 | 2435 | 831 |
| nqueens | 21000 | 22846 | 37508 | 46085 | 20502 | 25091 | 2389 | 912 |
| atom_swap | 7096 | 15961 | 30530 | 44489 | 13105 | 24248 | 2751 | 854 |
| gc_stress | 32235 | 29057 | 38666 | 45687 | 32313 | 37068 | 5521 | 1600 |
| lazy_chain | 20770 | 17741 | 30196 | 45242 | 13578 | 23347 | 2320 | 896 |
| transduce | 12303 | 17294 | 30774 | 44645 | 14293 | 23844 | 2044 | 2680 |
| keyword_lookup | 17958 | 20142 | 34684 | 40644 | 18838 | 27850 | 2907 | 1018 |
| protocol_dispatch | 8039 | 15899 | 30430 | 41616 | — | 25974 | 1633 | 1216 |
| nested_update | 25636 | 17016 | 31555 | 44769 | 13810 | 25879 | 2128 | 1251 |
| string_ops | 30166 | 28040 | 37720 | 45909 | 18232 | 28024 | 3161 | 3682 |
| multimethod_dispatch | 8302 | 17508 | 31345 | 45336 | 14470 | 24839 | 2666 | 996 |
| real_workload | 15150 | 17702 | 32775 | 46654 | 15049 | 28878 | 2445 | 1226 |
| gc_alloc_rate | 101830 | — | — | 46930 | 32255 | — | — | — |
| gc_large_heap | 66300 | — | — | 51998 | 28818 | — | — | — |
| bigint_factorial | 23691 | 15778 | 43834 | 47193 | 15787 | 32444 | 2786 | — |
| ratio_sum | 112978 | — | — | — | 36504 | — | — | — |
| stm_refs | 35145 | — | — | — | 38388 | — | — | — |
| regex_count | 44527 | 19759 | 40118 | 45999 | 15361 | 30817 | 6240 | 10600 |
| sort | 7159 | 14860 | 31128 | 45146 | 12364 | 25240 | 2315 | 1016 |
| destructure | 77354 | — | — | — | 40713 | — | — | — |
| edn_roundtrip | 22883 | — | — | — | 100632 | — | — | — |
- A few ClojureWasm facts (no comparison intended): cold-start folds in loading the embedded AOT-compiled clojure.core (ADR-0056), so the table is the real time-to-first-eval. For cljw (a tree-walking + bytecode-VM interpreter) that is mostly genuine work; for the compiled languages a small-workload run is mostly process spawn. cljw is built ReleaseSafe (optimised, all safety checks on).
- Reproducibility. Toolchains are pinned in flake.nix; run from inside nix develop for the full, reproducible column set. The harness auto-skips a language whose toolchain is absent.
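
The table-generation step (measurement YAML, converted to JSON by yq, rendered as Markdown) might look roughly like the sketch below. It is not gen_cross_table.py: the JSON shape (`benchmarks`, `name`, `results`) and the column keys are hypothetical, since the measurement YAML's schema is not shown here, and only four columns are included to keep it short.

```python
import json

# Hypothetical result keys mapped to column headings.
COLUMNS = [("cljw", "ClojureWasm"), ("python", "Python"), ("go", "Go"), ("c", "C")]
MISSING = "\u2014"  # dash shown for a skipped or absent runtime


def render_table(data):
    """Render measurements (in microseconds) as a Markdown table string."""
    header = ["Benchmark"] + [title for _, title in COLUMNS]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "---|" * len(header),
    ]
    for bench in data["benchmarks"]:
        cells = [bench["name"]]
        for key, _ in COLUMNS:
            value = bench["results"].get(key)
            cells.append(MISSING if value is None else str(round(value)))
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Inline sample in the assumed shape; values copied from the map_ops row.
    sample = json.loads(
        '{"benchmarks": [{"name": "map_ops",'
        ' "results": {"cljw": 5382, "python": 14217, "go": 2252, "c": 321}}]}'
    )
    print(render_table(sample))
```

In a pipeline, the sample would be replaced by `json.load(sys.stdin)` so the script can sit on the receiving end of the yq command. Rendering every cell from the data, including the dash for a missing runtime, is what keeps the published table from drifting away from the measurements.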