0.2.1: specialized ropes and core optimizations #18

Merged
danlentz merged 35 commits into master from 021-specialized-ropes on Apr 17, 2026

@danlentz danlentz commented Apr 12, 2026

0.2.1: StringRope, ByteRope, and a kernel/infrastructure pass

Release branch for 0.2.1. 35 commits, 54 files changed (+11.1K / −1.5K). Adds two new persistent collection types, refactors the rope kernel to be chunk-protocol-agnostic, and tightens
the tree kernel with targeted constant-factor improvements. No breaking changes.

Summary

Two new collection types

  • string-rope — persistent chunked text backed by java.lang.String chunks. Implements java.lang.CharSequence, so it drops into re-find / re-seq / re-matches, clojure.string, and any Java
    API expecting text. #string/rope "…" EDN tag, content-based equality with String, String.hashCode-compatible. Constructors: string-rope, string-rope-concat. Up to ~38× faster than
    String at 100K characters on random structural edits, growing to ~130× at 500K.
  • byte-rope — persistent chunked binary backed by byte[] chunks. Unsigned [0, 255] semantics exposed as long, unsigned lexicographic Comparable via Arrays.compareUnsigned, #byte/rope
    "hex" EDN tag. Framed as persistent memory: structural editing, zero-cost snapshots, structure sharing. Extras: byte-rope-bytes, -hex, -write, -input-stream, -get-byte/-short/-int/-long
    (with -le variants), -index-of, streaming -digest through MessageDigest. At 500K: ~110× vs byte[] on splice, ~128× on remove.
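
The unsigned semantics described for byte-rope can be illustrated with plain JVM calls. This is a minimal sketch that uses only `java.util.Arrays` (Java 9+) and `clojure.core`, not the library's own API; the helper name `unsigned` is hypothetical.

```clojure
(import 'java.util.Arrays)

;; Hypothetical helper: widen a signed JVM byte to an unsigned long in [0, 255].
(defn unsigned [b]
  (bit-and (long b) 0xff))

(let [^bytes hi (byte-array [(unchecked-byte 0x80)])
      ^bytes lo (byte-array [(byte 0x7f)])]
  ;; Signed comparison sorts 0x80 (-128) before 0x7f (127).
  (println (neg? (Arrays/compare hi lo)))          ; true
  ;; Unsigned lexicographic comparison sorts 0x80 (128) after 0x7f (127).
  (println (pos? (Arrays/compareUnsigned hi lo)))  ; true
  (println (unsigned (aget hi 0))))                ; 128
```

Exposing bytes as longs sidesteps the sign confusion that a raw JVM `byte` would carry into `nth` / `reduce` results.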

Rope kernel: one kernel, three variants

  • PRopeChunk protocol extracted into kernel/chunk.clj with extensions for APersistentVector, String, and byte[]. The rope kernel is now honestly chunk-protocol-parameterized — concat /
    split / splice / reduce / fold / CSI maintenance are written once.
  • Per-variant CSI — each rope variant declares its own +target-chunk-size+ / +min-chunk-size+ and binds them via its with-tree macro into the kernel's dynamic vars. All three variants
    default to 1024/512 after lein bench-rope-tuning sweep (up from 256/128).
  • Flat-mode optimization — a rope under its per-variant flat threshold (1024 elements) stores content directly as the natural collection (PersistentVector, String, byte[]), with zero
    tree overhead and transparent promotion on edits. Memory parity with the raw baseline.
  • Monomorphic nth / reduce — each variant inlines the tree walk with direct chunk-type calls, bypassing PRopeChunk dispatch on hot paths. ~2–2.6× faster nth, ~1.7–3.3× faster reduce.
  • Cursor cache removed — the volatile-mutable field set on StringRope/ByteRope had torn-read races (three volatile writes aren't atomic as a group) and cache thrashing under concurrent
    access. Monomorphic walk is fast enough that the cache's benefit didn't justify the thread-safety cost.
  • rope-splice-inplace fused single-chunk splice path avoids an intermediate chunk allocation on the overflow path.
  • RopeSeq / RopeSeqReverse moved out of the kernel into types/rope.clj — they were only used by the generic Rope deftype. Kernel drops ~220 lines.

Performance pass (late in cycle)

  • Primitive rank for long-ordered-* / string-ordered-*. rank-of / indexOf now dispatch through the same primitive-specialized pattern already used for contains / find / find-val.
    node-rank-long, node-rank-string added. string-ordered-set rank ~1.9× faster; long-ordered-set rank at parity (the LongComparator was already HotSpot-inlined).
  • Range-map bulk construction. (range-map coll) with sorted disjoint input now takes an O(n) balanced-build path via tree/node-build-sorted. Overlapping input still falls through to the
    general carving path, preserving "later wins" semantics. ~10× faster than the previous per-insert path; ~2× faster than Guava TreeRangeMap bulk put.
  • Non-allocating java.util.Iterator for OrderedSet / OrderedMap via tree/NodeIterator. Advances the enumerator in place with unsynchronized-mutable state, matching
    clojure.lang.SeqIterator's memory model — thread-safety contract unchanged (per-call fresh, not shared). Java iteration is ~2× faster than sorted-set and ~3.6× faster than data.avl.
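
For readers unfamiliar with O(n) bulk construction from sorted input, the idea can be sketched as a midpoint recursion. This is an illustration only: the node shape `{:k :l :r}` and the names `build-balanced`, `height`, and `in-order` are hypothetical, and the real `tree/node-build-sorted` (weight-balanced, with key/value pairs) is not shown in this PR text.

```clojure
;; Hypothetical node shape: {:k key, :l left-subtree, :r right-subtree}.
(defn build-balanced
  "Builds a height-balanced tree from an already-sorted vector in O(n)
  by recursively taking the midpoint of each range as the subtree root."
  [v]
  (letfn [(build [lo hi]                      ; half-open range [lo, hi)
            (when (< lo hi)
              (let [mid (quot (+ lo hi) 2)]
                {:k (nth v mid)
                 :l (build lo mid)
                 :r (build (inc mid) hi)})))]
    (build 0 (count v))))

(defn height [node]
  (if node
    (inc (max (height (:l node)) (height (:r node))))
    0))

(defn in-order [node]
  (when node
    (concat (in-order (:l node)) [(:k node)] (in-order (:r node)))))

(height (build-balanced (vec (range 7))))   ; 3
```

Each element is visited once and no rebalancing is needed, which is why a bulk path of this kind can beat per-insert construction.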

Bug fixes

  • Primitive node specialization preserved across mutations. conj / disjoin / assoc / without were passing the generic SimpleNode constructor instead of the collection's alloc, silently
    downgrading LongKeyNode → SimpleNode after a single conj on a long-ordered-set. Fixed by threading alloc through all call sites.
  • PriorityQueue / OrderedMultiset getAllocator / getStitch returned nil instead of the generic constructor.
  • Empty StringRope charAt / nth dereferenced nil root instead of throwing bounds exceptions.
  • StringRope valAt threw ClassCastException on non-integer keys.
  • Empty StringRope / ByteRope r/fold crashed instead of returning (combinef).
  • ByteRope InputStream.read(buf, off, 0) returned -1 at EOF instead of 0 per contract.
  • Surrogate-pair-safe chunking in str->root.
  • Auto-boxing in str->root / bytes->root loop variables (pre-existing, exposed under warn-on-reflection).

Bench infrastructure

  • lein bench-report --publish omits the Full Scorecard / Regressions / Improvements sections — useful for A/B review, noise for outside readers of the committed doc/report.txt. Default
    behavior unchanged.
  • lein bench-report auto-baselines against the prior timestamped EDN.
  • lein bench auto-compares against the prior run and prints a compact Regressions / Improvements section inline.
  • lein bench-charts generates 7 PNG charts via XChart (dev dep).
  • lein bench-rope-tuning rewritten to sweep all three rope variants.
  • Full suite gains N=1K and N=5K cardinalities.
  • Main suite gains range-map / segment-tree / priority-queue / multiset / fuzzy coverage (previously benched only by specialized scripts or not at all).
  • Four new bench cases exercising the new optimization paths (long-rank-lookup, string-rank-lookup, range-map-bulk-construction, set-iteration-iterator).

Documentation

  • README.md refreshed with current numbers and full N=1K–500K cardinality coverage in the rope tables.
  • doc/report.txt regenerated via --publish from the 2026-04-17 bench run.
  • doc/ropes.md gains a Chunk Abstraction: One Kernel, Many Backends section, a Specialized Ropes section with per-variant design, and a variant-picker table.
  • doc/cookbook.md gains six rope recipes at the front (text editor, regex on StringRope, bulk sequence assembly, binary protocol, streaming digest, undo history).
  • doc/collections-api.md gains full StringRope and ByteRope sections.
  • doc/benchmarks.md restructured as a methodology + infrastructure guide.

Full details in CHANGES.md.

dco-lentz and others added 21 commits April 9, 2026 23:37
Replace the Object[] based cursor cache with three direct volatile-mutable
fields (_cc_chunk, _cc_start, _cc_end) to eliminate Integer boxing overhead.
Fix rope-chunk-at to properly adjust index when descending into subtrees.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
StringRope refactor:
- with-tree macro replaces ~16 copies of the (binding [*t-join* alloc] ...)
  form; helpers ->StringRope*, coll->str, and coll->tree-root deduplicate
  the 6-arg constructor and coercion dispatch that were scattered through
  the PRope method bodies
- Simpler rope-cat / rope-insert / rope-remove / rope-splice method bodies
- Extract rope-tree-walk and wrap-reduce-fn from the duplicated 1-arity
  and 2-arity rope-reduce implementations; drop dead reduce-chunk-indexed
- New tests: flat-boundary, hashmap-key-compatibility, cursor-cache-stress

kernel/chunk.clj (new):
- Holds the PRopeChunk protocol extensions for the rope kernel's chunk
  backends (APersistentVector, String, byte[])
- These are internal kernel dispatch, not user-facing interop — extracting
  them keeps kernel/rope.clj focused on the rope algebra
  (1237 -> 1155 lines, 266-line chunk.clj)

ByteRope:
- Persistent immutable byte sequence backed by the chunked WBT kernel
- byte[] chunks via PRopeChunk extension (13 methods), including a
  fused chunk-splice-split that avoids intermediate allocations on the
  overflow path
- byte-rope-node-create / bytes->root / byte-rope->bytes in kernel/rope.clj
- types/byte_rope.clj: ByteRope deftype, ByteRopeSeq/SeqReverse,
  TransientByteRope with ByteArrayOutputStream tail buffer, flat-mode
  optimization (raw byte[] root below 1024 bytes), cursor cache for
  O(1) amortized sequential nth on tree-mode ropes
- Unsigned semantics: nth / reduce / seq yield longs in [0, 255]
- Equality: byte-rope = byte-rope and byte-rope = byte[]; intentionally
  not equal to Clojure vectors to avoid signed/unsigned confusion
- Comparable via Arrays.compareUnsigned (lex order matches protobuf /
  Okio / Netty conventions)
- Utilities: multi-byte reads (get-byte / short / int / long with
  big-endian default and -le variants), materialization (bytes, hex,
  write to OutputStream, InputStream adapter), byte-rope-index-of,
  byte-rope-digest (streams chunks through MessageDigest without
  materializing the whole rope)
- Public API: byte-rope, byte-rope-concat, byte-rope-bytes,
  byte-rope-hex, byte-rope-write, byte-rope-input-stream,
  byte-rope-get-byte / short / int / long (+ -le),
  byte-rope-index-of, byte-rope-digest
- #byte/rope "hex" tagged literal with EDN round-trip via readers.clj
- 34 unit tests + 8 property tests (152 assertions)
- bench_runner: 13 ByteRope vs byte[] benchmarks (construction,
  concat, split, splice, insert, remove, nth, reduce, fold,
  repeated-edits, bytes, digest)
- bench_analyze: ByteRope vs byte[] headline section
- simple_bench: :byte-rope category
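
The streaming-digest idea (feeding chunks through `MessageDigest` without materializing the whole sequence) can be sketched with the standard JDK API alone. The function name `digest-chunks` and the seq-of-`byte[]` input are assumptions for illustration; this is not the signature of `byte-rope-digest`.

```clojure
(import 'java.security.MessageDigest 'java.util.Arrays)

(defn digest-chunks
  "Feeds each byte[] chunk to a MessageDigest in order and returns the
  final digest. No concatenated copy of the chunks is ever built."
  [^String algorithm chunks]
  (let [^MessageDigest md (MessageDigest/getInstance algorithm)]
    (doseq [^bytes chunk chunks]
      (.update md chunk))
    (.digest md)))

;; Digesting chunk by chunk matches digesting the concatenated bytes.
(let [^bytes chunked (digest-chunks "SHA-256" [(.getBytes "hello " "UTF-8")
                                               (.getBytes "world" "UTF-8")])
      ^bytes whole   (digest-chunks "SHA-256" [(.getBytes "hello world" "UTF-8")])]
  (println (Arrays/equals chunked whole)))   ; true
```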

Full suite: 688 tests, 470,395 assertions, 0 failures.
Documentation:
- cookbook.md restructured with six rope recipes leading:
  text editor buffer, regex/clojure.string on StringRope,
  bulk sequence assembly, binary protocol assembly with ByteRope,
  streaming crypto digest, persistent undo history. Existing
  collection recipes renumbered; duplicate "#11" at the end removed.
- ropes.md gains a "Chunk Abstraction: One Kernel, Many Backends"
  section explaining PRopeChunk and pointing at kernel/chunk.clj as
  the internal dispatch table (vs types/interop.clj for user-facing
  extension). Also a "Specialized Ropes" section with concrete
  StringRope and ByteRope examples plus a variant-picker table.
  The Status and API sections now cover all three variants.
- collections-api.md gains full StringRope and ByteRope sections
  with constructors, interface tables, and per-variant operations.

Per-variant CSI tuning:
- kernel/rope.clj adds `*target-chunk-size*` and `*min-chunk-size*`
  dynamic vars. Every internal function that reads CSI captures
  them into a local at entry, keeping the cost to one var deref
  per call (not per chunk). The public `+target-chunk-size+` and
  `+min-chunk-size+` defs remain for external code.
- Each rope variant now declares its own `+target-chunk-size+` and
  `+min-chunk-size+` constants and binds them via its `with-tree`
  macro along with `tree/*t-join*`. Generic Rope, StringRope, and
  ByteRope all carry per-variant CSI without touching the kernel.
- The generic Rope deftype gains a `with-tree` macro so its
  mutation methods pick up CSI from one place instead of open-coding
  `(binding [tree/*t-join* alloc] ...)` at 10 sites.
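
A rough sketch of the pattern described above: dynamic vars bound by a per-variant macro, and a kernel-style function that captures the var once at entry. The macro name `with-variant-csi` and the function `chunk-sizes` are hypothetical stand-ins; only the var names and the 256/128 and 1024/512 values come from this PR.

```clojure
(def ^:dynamic *target-chunk-size* 256)
(def ^:dynamic *min-chunk-size* 128)

;; Hypothetical per-variant macro: binds the kernel's dynamic vars around body.
(defmacro with-variant-csi [target min-size & body]
  `(binding [*target-chunk-size* ~target
             *min-chunk-size*    ~min-size]
     ~@body))

;; Kernel-style function: one var deref at entry, then only the local is used.
(defn chunk-sizes
  "Returns the sizes of the chunks that n elements would be cut into."
  [n]
  (let [target *target-chunk-size*]
    (loop [remaining n, acc []]
      (if (pos? remaining)
        (let [size (min remaining target)]
          (recur (- remaining size) (conj acc size)))
        acc))))

(chunk-sizes 600)                               ; [256 256 88]
(with-variant-csi 1024 512 (chunk-sizes 2500))  ; [1024 1024 452]
```

Because `binding` is thread-local and dynamically scoped, work done lazily after the macro body returns would see the root values again, which is one reason to capture the var into a local eagerly.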

Crossover benchmarking:
- rope_tuning_bench.clj fully rewritten to sweep chunk sizes
  across all three variants (Rope vs Vector, StringRope vs String,
  ByteRope vs byte[]). Each (variant, N, target) cell measures
  construct/nth/reduce/split/splice/concat and reports speedups
  vs the natural baseline plus a geomean score for ranking.
  `--variant rope|string-rope|byte-rope` restricts the sweep.
- Ran the full sweep on 2023 M2. At 100K+ elements every variant
  showed monotonic improvement in the [256, 1024] range with
  diminishing returns beyond 1024. Updated all three variant
  defaults to target=1024, min=512. Relative to the old target=256
  baseline, generic Rope at 500K changes by: +41% nth, +10% reduce,
  +38% split, 5x concat, -20% splice (still ~6000x faster than vector).
  StringRope and ByteRope improve on every operation.

Memory-meter coverage:
- memory_test.clj adds `string-rope-memory` and `byte-rope-memory`
  deftests comparing each variant to its natural baseline (String
  and byte[] respectively). The summary report table now has a
  third section showing the full rope family with per-variant
  baselines and overhead ratios.

Full suite: 690 tests, 471,304 assertions, 0 failures.
Mirrors the existing StringRope / ByteRope flat-mode optimization: when
a rope holds ≤ +flat-threshold+ (1024) elements, the `root` field holds
the raw APersistentVector directly instead of a one-chunk tree wrapper.
Reads and writes dispatch on `(flat? root)` to either vector-native
operations or the kernel tree path, with transparent promotion to tree
form once the size exceeds the threshold.
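
The nil / flat / tree dispatch can be pictured with a toy model. Everything here is a simplified assumption: the names `demo-root`, `demo-flat?`, and `demo-nth` are hypothetical, and the "tree" is just a map of fixed-size chunks standing in for the real kernel tree.

```clojure
(def flat-threshold 1024)   ; value taken from this commit message

;; Hypothetical root constructor: nil when empty, a bare vector when the
;; content fits the threshold, otherwise a stand-in "tree" of chunks.
(defn demo-root [coll]
  (let [v (vec coll)]
    (cond
      (empty? v)                     nil
      (<= (count v) flat-threshold)  v
      :else {:chunks (mapv vec (partition-all flat-threshold v))})))

(defn demo-flat? [root] (vector? root))

(defn demo-nth [root i]
  (cond
    (nil? root)       (throw (IndexOutOfBoundsException.))
    (demo-flat? root) (nth root i)                      ; vector-native path
    :else             (nth (nth (:chunks root) (quot i flat-threshold))
                           (rem i flat-threshold))))    ; "tree" path

(demo-flat? (demo-root (range 1024)))       ; true
(demo-flat? (demo-root (range 1025)))       ; false
(demo-nth (demo-root (range 5000)) 4321)    ; 4321
```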

types/rope.clj:
- Adds +flat-threshold+ (= 1024, matching +target-chunk-size+) plus
  flat-mode helpers: `flat?`, `flat-size`, `ensure-tree-root`,
  `make-root`, `->tree-root`.
- Every Rope deftype method now has a `cond` dispatching on nil /
  flat / tree: count, nth, seq, rseq, reduce, fold, peek, pop, cons,
  assoc, toArray, contains, and the full PRope protocol
  (rope-cat / -split / -sub / -insert / -remove / -splice /
  -chunks / -str).
- Flat paths use vector-native ops (subvec, .nth, .cons, .pop, into,
  indexOf) which are already O(1) or close to it. The kernel is only
  invoked when the operation either starts in tree mode or would
  exceed the flat threshold. Reduce uses clojure.core/reduce so both
  PersistentVector (IReduceInit) and SubVector (plain reducible)
  dispatch correctly.
- ->rope / rope / rope-concat-all construct as flat if the input
  fits; rope-concat promotes lazily when the combined size exceeds
  the threshold.
- TransientRope.persistent! demotes a small tree result back to a
  flat vector at finalization time (mirrors StringRope/ByteRope).
- asTransient promotes flat-mode roots to tree on the way in so the
  transient's internal machinery sees a uniform tree representation.

kernel/rope.clj:
- `invariant-valid?` and `normalize-root` now tolerate a bare
  APersistentVector as a trivially-valid flat-mode root, so tests
  that pass `.-root` directly to the kernel keep working without
  having to distinguish flat vs tree at the test layer.

test/ordered_collections/rope_test.clj:
- `rope-tree-healthy?` recognizes flat roots as trivially healthy.

Memory (clj-memory-meter, N=100K random longs):
  rope:   29.5 bytes/elem  (total: 2.8 MB)   — was 30.3 bytes/elem
  vector: 29.4 bytes/elem  (total: 2.8 MB)
  ratio:  1.00x                              — was 1.03x

At N=1K (flat mode) the overhead is essentially zero — the rope is
just a PersistentVector plus the Rope deftype header. Larger N also
improved slightly from the per-variant CSI tuning landed earlier.

Full suite: 690 tests, 470,354 assertions, 0 failures.
Small cardinalities matter more now that all three rope variants have
a flat-mode optimization: ropes ≤ 1024 elements skip the tree wrapper
entirely and should measure comparable to (or better than) their
natural baselines on reads while still winning on structural edits.
N=1000 is safely inside flat mode; N=5000 is the first "in-tree but
small" size and exercises the post-promotion path.

- `bench_runner.clj` sizes-full now runs [1000 5000 10000 100000 500000]
  (was [10000 100000 500000]). Every benchmark spec scales cleanly to
  smaller N, so the README/headline tables pick up new columns
  automatically when the full suite runs.
- `simple_bench.clj` sizes-quick, sizes-default, and sizes-full all
  gain 5000 alongside the pre-existing 1000. The private per-category
  defaults (rope-sizes, byte-rope-sizes, string-rope-sizes) are
  updated to match.
- `rope_tuning_bench.clj` default-sizes gains 1000 and 5000 so CSI
  tuning sweeps can see the flat-vs-tree crossover and the
  small-tree-mode behaviour.

Spot-checked rope category at 1K/5K/10K via `lein bench-simple`:
- nth at 1K: rope 109µs ≈ vector 111µs (flat mode = direct .nth)
- repeated edits at 1K: rope 1.74ms vs vector 10.12ms (~6x)
- fold at 1K: rope 13µs vs vector 78µs (flat mode skips fork-join)
Every rope variant (generic, string, byte) now documents that ropes
at or below the flat threshold are stored as a bare concrete collection
(PersistentVector / String / byte[]) directly in the root field,
bypassing the tree wrapper entirely.

- `ropes.md`: new "Flat Mode: Zero-Overhead Small Ropes" section
  explaining the optimization once for all three variants, with a
  table of which concrete type backs each variant in flat mode and
  how promotion/demotion work. The existing "Benchmark Summary"
  picks up a callout noting its numbers are tree-mode only.
- `collections-api.md`: generic Rope section gains a paragraph
  matching the existing StringRope/ByteRope notes. The `nth` / `r/fold`
  rows call out the flat-mode dispatch path.
- `algorithms.md`: new "Flat Mode" subsection beside CSI. Also
  updates the stale CSI constants block (target/min 256/128 → 1024/512)
  and notes that each variant carries its own per-variant defaults
  via its `with-tree` macro.
- `CHANGES.md`: new 0.2.1-SNAPSHOT entry summarizing the StringRope
  refactor, kernel/chunk.clj extraction, ByteRope addition,
  per-variant CSI tuning, flat-mode optimization, benchmark suite
  updates, and documentation rewrites.

Full suite: 690 tests, 469,668 assertions, 0 failures.
The existing report showed losses but silently computed-and-discarded
the matching wins. It also had no cross-category view and no way to
see the three rope variants side-by-side. Three new sections, no
removals, same terminal formatting throughout.

etc/lib/bench_analyze.clj:
- `category-summary` aggregates the scorecard by category and returns
  wins / parity / losses counts, geomean speedup, best win, and worst
  loss per category. The geomean matters more than an arithmetic mean
  here because speedups are ratios — geomean is what gets reported.
- `rope-family-summary` picks the largest benchmarked size and, for
  each structural operation, looks up the per-variant speedup vs the
  natural baseline (vector / String / byte[]). Returns a row per
  operation with one cell per variant; cells are nil when a variant
  does not have a matching benchmark group (e.g. the generic rope
  has no single-splice `rope-insert` bench).
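
Why the geometric mean suits ratio data can be seen in a two-value example. This is a generic sketch, not the code in `bench_analyze.clj`; the function names are hypothetical.

```clojure
(defn geomean
  "Geometric mean of positive ratios: exp of the mean of the logs."
  [xs]
  (Math/exp (/ (reduce + (map #(Math/log (double %)) xs))
               (count xs))))

(defn arithmetic-mean [xs]
  (/ (reduce + xs) (double (count xs))))

;; A 10x win and a 10x loss (0.1x) cancel out under the geomean,
;; while the arithmetic mean reports a misleading net win of about 5x.
(geomean [10.0 0.1])           ; approximately 1.0
(arithmetic-mean [10.0 0.1])   ; approximately 5.05
```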

etc/lib/bench_render.clj:
- `render-significant-wins` — parallel to the existing losses
  renderer. Formats as plain speedup strings ('12.3x') not the
  '1.4x slower' framing used for losses.
- `render-category-summary` — seven-column table keyed on the
  existing `category-order`. Shows wins / parity / losses counts,
  geomean speedup, best win, and the single worst-loss case with
  group name.
- `render-rope-family` — four-column cross-variant table labelled
  'Each cell is variant vs natural baseline speedup at N=X'. Big
  wins render as bold `**N.Nx**`; losses use sub-1 decimal precision.
- `fmt-speedup-cell` handles the full range — from `**1236x**` down
  to `0.0018x` — without losing precision at either end.

etc/bench_report.bb:
- New sections inserted between 'Headline Performance' and 'At
  Parity' in this order: Performance by Category, Rope Family at
  Scale, Significant Wins. Everything else is unchanged.

Verified against bench-results/2026-04-10_09-24-13.edn: all existing
sections render identically; three new sections populate correctly.
The ByteRope column in the Rope Family table shows placeholders on
old result files (pre-ByteRope) and will populate after a fresh
bench run.
After writing a fresh bench-results/<timestamp>.edn, the runner now
looks for the most-recent existing EDN in the same directory that
predates the new one. If it finds one, it flat-walks both files,
matches leaf measurements by (size, group, variant), and prints a
compact section with:

  - the top regressions (≥10% slower) and
  - the top improvements (≥10% faster)

Each row shows old → new timing plus percent delta. At the end it
suggests `lein bench-report --file … --baseline …` for the full
category breakdown.

Self-contained: parses the EDN inline with `clojure.edn` and does
the delta computation in ~50 lines, no dependency on the bb
bench-report tool. When the prior file has a different size set
(e.g. before N=1000/5000 were added), unmatched cells are simply
skipped and the reported "Compared: N matching cells" count reflects
the intersection.
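
The matching-and-delta step might look roughly like the following. The flattened shape `{[size group variant] time}` and the function name `compare-runs` are assumptions, and "slower" / "faster" are interpreted here as percent change in elapsed time; the runner's actual code is not reproduced in this PR text.

```clojure
;; Hypothetical flattened results: {[size group variant] mean-time}.
(defn compare-runs
  "Matches cells present in both runs and splits them by percent change in
  time. Cells missing from the old run are skipped."
  [old-run new-run threshold-pct]
  (let [rows (for [[cell new-t] new-run
                   :let  [old-t (get old-run cell)]
                   :when old-t]
               {:cell cell :old old-t :new new-t
                :pct  (* 100.0 (/ (- new-t old-t) old-t))})]
    {:compared     (count rows)
     :regressions  (sort-by :pct > (filter #(>= (:pct %) threshold-pct) rows))
     :improvements (sort-by :pct (filter #(<= (:pct %) (- threshold-pct)) rows))}))

(def old-run {[1000 "nth" :rope] 100.0, [1000 "fold" :rope] 200.0})
(def new-run {[1000 "nth" :rope] 120.0, [1000 "fold" :rope] 150.0,
              [5000 "nth" :rope] 70.0})   ; no counterpart in old-run, skipped

(:compared (compare-runs old-run new-run 10.0))   ; 2
```

In this example the `nth` cell lands in `:regressions` (time up 20%) and the `fold` cell in `:improvements` (time down 25%).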

Smoke-tested against 2026-04-09 → 2026-04-10 EDN pair: the inline
section matches the subset of regressions the full `lein bench-report
--baseline` tool produces, laid out with timing units and percent
deltas.
benchmarks.md stripped of hardcoded numbers (were stale from 0.2.0) and
restructured around the benchmark infrastructure: versioned EDN
artifacts, the parse-analyze-render pipeline, A/B comparison method,
and per-category interpretation. All current numbers live in report.txt
which is auto-generated via `lein bench-report > doc/report.txt`.
nth: each rope variant now inlines the tree walk directly in the
deftype, replacing the generic kernel's protocol-dispatched rope-nth
with concrete chunk-type calls (alength/aget for byte[], .length/.charAt
for String, .count/.nth for vector). Eliminates per-level PRopeChunk
dispatch and the [chunk offset] tuple allocation from rope-chunk-at.
Measured 2-2.6x improvement on random nth at N=500K.
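
An indexed descent of this kind can be sketched over a toy tree. The node layout (a chunk stored in every node, plus a cached subtree size) and the names `node`, `node-size`, and `rope-char-at` are assumptions for illustration; the real deftypes inline this walk with their own node accessors.

```clojure
;; Hypothetical node shape: {:l left, :chunk "text", :r right, :size total}.
(defn node-size [n] (if n (:size n) 0))

(defn node [l chunk r]
  {:l l :chunk chunk :r r
   :size (+ (node-size l) (count chunk) (node-size r))})

(defn rope-char-at
  "Walks from the root, adjusting the index at each level, and reads the
  character with direct String calls once the owning chunk is found."
  [root i]
  (loop [n root, i i]
    (when (nil? n)
      (throw (IndexOutOfBoundsException.)))
    (let [^String chunk (:chunk n)
          left (node-size (:l n))
          len  (.length chunk)]
      (cond
        (< i left)         (recur (:l n) i)
        (< i (+ left len)) (.charAt chunk (int (- i left)))
        :else              (recur (:r n) (- i left len))))))

(def demo-rope (node (node nil "Hello" nil) ", " (node nil "world" nil)))

(apply str (map #(rope-char-at demo-rope %) (range 12)))   ; "Hello, world"
```

The walk returns the element itself, so no intermediate [chunk offset] pair is allocated.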

reduce: byte-rope and string-rope add monomorphic tree-reduce helpers
that walk the tree with inlined per-chunk loops, bypassing per-chunk
chunk-reduce-init dispatch. 1.7-3.3x improvement.

Cursor cache removed from StringRope and ByteRope. The volatile-mutable
fields had torn-read races under concurrent access (three volatile writes
are not atomic as a group) and caused cache thrashing when two threads
did sequential access on the same instance. Monomorphic tree walk is
fast enough that the cache benefit does not justify the correctness cost.
Add benchmarks for range-map (construction, lookup, carve-out,
iteration vs Guava TreeRangeMap), segment-tree (construction, query,
update vs sorted-map), priority-queue (construction, push, pop-min vs
sorted-set-by), ordered-multiset (construction, multiplicity, iteration
vs sorted-map counts), fuzzy-set and fuzzy-map (construction, nearest
vs sorted-set/map). All wired into all-benchmark-specs for inclusion
in lein bench --full.

Memory tests extended to cover string-rope, byte-rope, range-map,
segment-tree, and fuzzy-map.

Time estimates updated: --full is ~60 min with the expanded suite
(was ~30 min).
bench-report gains headline sections for ordered-set, ordered-map,
long-specialized, and string-specialized vs their competitors.
Asterisks removed from speedup formatting (plain text report).
Default --top increased from 12 to 30.

Auto-baseline: bench-report now auto-selects the prior timestamped
EDN as baseline when --baseline is not specified, so Regressions and
Improvements sections render by default. File discovery filters to
timestamped EDN only (excludes non-standard filenames).

Rope tuner: score function now uses structural-editing geomean
(splice, split, concat) as the primary ranking metric, with a
secondary 'all' column showing the equal-weight geomean. Docstring
explains the rationale: splice and split are chunk-size-insensitive
at scale, so the old equal-weight geomean was misleadingly driven
by concat.
README: performance tables updated from 2026-04-12 bench run. Added
StringRope, ByteRope, and specialized-collection tables. Collections
table gains string-rope and byte-rope constructors. Ropes section
describes all three variants with examples. Test count 454 -> 690.

ByteRope framed as persistent structure-sharing memory: O(log n)
structural editing, zero-cost immutable snapshots via path-copying,
automatic chunk coalescing, GC of unreachable versions. Use cases:
binary protocol construction, undo/redo, diffing/patching, streaming.

CHANGES.md: monomorphic hot paths entry with measured improvements,
cursor cache removal rationale, updated specialized-type entries.

Cookbook: range-map pricing-tiers recipe, interval-set availability
windows recipe. Collections-api: cursor cache references removed.
Empty StringRope: charAt and nth dereferenced nil root instead of
throwing bounds exception. Fixed by using flat-size (nil-safe) for
bounds check before dispatching on root type.

StringRope valAt: coerced all keys with (int k), causing
ClassCastException on non-integer keys like :x or nil. Added
integer? guard to match standard associative lookup semantics.

Empty fold: StringRope and ByteRope fell through to rope-fold on
nil root, crashing instead of returning (combinef). Added nil check.

ByteRope InputStream: read(buf, off, 0) returned -1 at EOF instead
of 0 per InputStream contract. Zero-length reads now return 0
regardless of position.

Found by Codex review.
lein bench-charts reads the latest benchmark EDN and generates 7
PNG charts in doc/charts/:

  1. set-algebra-scaling — union/intersection/difference vs sorted-set
  2. rope-editing-scaling — repeated-edits for all 3 rope variants
  3. collection-winners — best headline win per collection type (dot plot)
  4. rope-operations-profile — full win/loss profile (dot plot, log-Y)
  5. rope-vs-vector-absolute — diverging O(log n) vs O(n) lines
  6. string-rope-crossover — per-operation crossover vs String
  7. byte-rope-crossover — per-operation crossover vs byte[]

Uses XChart 3.8.8 (dev dependency) for direct PNG output via Java2D.
conj/disjoin on OrderedSet and assoc/without/assoc-new on OrderedMap
were passing tree/node-create-weight-balanced (the generic SimpleNode
constructor) instead of the collection's alloc field. This silently
downgraded LongKeyNode to SimpleNode after a single mutation,
losing the unboxed-key benefit that long-ordered-set/map exist to
provide.

Fixed by threading alloc through all node-add/node-remove call sites.
ordered-merge-with also propagated nil alloc/stitch into the result
map; fixed to carry alloc/stitch from the source and bind *t-join*.

Found by Codex review.
getAllocator returned nil instead of tree/node-create-weight-balanced;
getStitch returned nil instead of tree/node-stitch. This violated the
INodeCollection/IBalancedCollection contract and would crash if
with-tree-env were ever used on these types.

Found by Codex review.
README Performance section embeds set-algebra-scaling and
rope-editing-scaling charts. benchmarks.md gains a Charts section
linking all 7 PNGs with descriptions. Reflection warnings in
bench_charts.clj fixed via XYStyler/CategoryStyler type hints.
@danlentz danlentz requested a review from dco-lmeyers April 12, 2026 20:24
@danlentz danlentz self-assigned this Apr 12, 2026
@danlentz danlentz added the "in-progress" (under ongoing development) label Apr 12, 2026
@@ -0,0 +1 @@
dan.lentz@lentz-mbpro-14233.830 No newline at end of file

Committed Emacs lock file

Flat-mode seq/rseq IReduce: 1-arity reduce treated enum=nil as empty,
discarding all chunk elements for ropes at or below the flat threshold.
Fixed by always starting from the current element and only checking
enum for continuation to the next chunk. Affects StringRopeSeq,
StringRopeSeqReverse, ByteRopeSeq, ByteRopeSeqReverse.

Empty StringRope 1-arity reduce: nil root fell through to tree path
(node-least on nil). Added nil check returning (f).

InputStream.read(buf, off, len): added bounds validation per the
InputStream contract. Out-of-range off+len now throws
IndexOutOfBoundsException.
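
The `read(buf, off, len)` rules touched by these fixes can be modelled as a pure function. This is a sketch of the `java.io.InputStream` contract over a plain byte[], not the ByteRope adapter itself; the name `read-into` and the `[bytes-read new-pos]` return shape are hypothetical.

```clojure
(defn read-into
  "Models InputStream.read(buf, off, len) over byte[] `data` at `pos`.
  Returns [bytes-read new-pos]; bytes-read is -1 at end of stream."
  [^bytes data pos ^bytes buf off len]
  (cond
    ;; Bounds validation comes first, as the contract requires.
    (or (neg? off) (neg? len) (> len (- (alength buf) off)))
    (throw (IndexOutOfBoundsException.))

    ;; Zero-length reads return 0 regardless of position, even at EOF.
    (zero? len) [0 pos]

    ;; At least one byte was requested but none remain.
    (>= pos (alength data)) [-1 pos]

    :else
    (let [n (min len (- (alength data) pos))]
      (System/arraycopy data (int pos) buf (int off) (int n))
      [n (+ pos n)])))

(let [data (byte-array [1 2 3])
      buf  (byte-array 4)]
  (println (read-into data 3 buf 0 0))    ; [0 3]
  (println (read-into data 3 buf 0 1))    ; [-1 3]
  (println (read-into data 1 buf 0 4)))   ; [2 3]
```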

StringRope.subSequence: empty flat subsequence stored "" instead of
nil, breaking isEmpty on the result. Now converts to nil.

Found by Codex review (round 2).
str->root split strings at raw char offsets, placing a lone high
surrogate at the end of one chunk and the low surrogate at the start
of the next. Now adjusts chunk boundaries to avoid splitting UTF-16
surrogate pairs.
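
One way to keep a chunk boundary from landing inside a surrogate pair is to pull the boundary back by one char when it would. The sketch below is an assumption about the general technique, not the body of `str->root`; `safe-boundary` and `chunk-string` are hypothetical names.

```clojure
(defn safe-boundary
  "Returns i, or i - 1 when splitting s at i would separate a high
  surrogate from the low surrogate that follows it."
  [^String s i]
  (if (and (< 0 i (.length s))
           (Character/isHighSurrogate (.charAt s (int (dec i))))
           (Character/isLowSurrogate (.charAt s (int i))))
    (dec i)
    i))

(defn chunk-string
  "Cuts s into chunks of about chunk-size chars without splitting pairs."
  [^String s chunk-size]
  (loop [pos 0, chunks []]
    (if (>= pos (.length s))
      chunks
      (let [raw (min (.length s) (+ pos chunk-size))
            adj (safe-boundary s raw)
            ;; Guarantee progress when the adjusted boundary would not advance.
            end (if (> adj pos) adj (min (.length s) (inc raw)))]
        (recur end (conj chunks (subs s pos end)))))))

;; U+1F600 occupies two UTF-16 chars (indices 2 and 3 below).
(def sample (str "ab" (String. (Character/toChars 0x1F600)) "cd"))

(map count (chunk-string sample 3))   ; (2 3 1)
```

With a raw split every 3 chars, the first chunk would have ended on the lone high surrogate; the adjusted split keeps the pair together in the second chunk.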
Add targeted tests for all Codex-found issues:
- Empty StringRope charAt/nth bounds exceptions
- Non-integer key valAt returns nil
- Empty fold returns (combinef)
- ByteRope InputStream zero-length and out-of-bounds reads
- Surrogate pair at chunk boundary not split
- LongKeyNode preserved through conj/disj/assoc/dissoc
- LongKeyNode preserved through ordered-merge-with

695 tests, 470K assertions, 0 failures.
All numbers from bench-results/2026-04-12_16-48-22.edn. Set algebra
improved to 50-75x vs sorted-set at 500K (was 39-57x in prior run).
…boxing

The generic-Rope-specific RopeSeq and RopeSeqReverse lived in the kernel
but are only used by the generic Rope deftype — string-rope and
byte-rope carry their own monomorphic seq types. Moving them to
types/rope.clj makes the kernel honestly chunk-protocol-agnostic and
cuts ~220 lines from the kernel file. Also trims now-unused imports
(Murmur3, SeqIterator, Util).

Also fixes a pre-existing auto-boxing warning in str->root and
bytes->root: the loop variable `pos` was inferred primitive long but
the recur arg came from clojure.core/min (Object) and unchecked-dec-int
(int), forcing auto-boxing per iteration. Threaded as primitive long
throughout, using unchecked-add/dec/int consistently.
…rator

Three related constant-factor improvements, each targeting a known loss
in doc/report.txt. No contract changes.

1. Primitive rank for long/string keys. Adds node-rank-long and
   node-rank-string in kernel/tree.clj alongside the existing
   contains/find/find-val primitive fast paths. OrderedSet.rank-of,
   OrderedSet.indexOf, and OrderedMap.rank-of dispatch via identity
   check on cmp. string-ordered-set rank: ~1.9x faster at N=100K.

2. Range-map bulk construction. node-build-sorted in kernel/tree.clj
   builds a balanced tree from sorted kv pairs in O(n). The range-map
   constructor now sorts input, validates disjointness, and takes the
   bulk path when applicable; overlapping input still falls through to
   the general carving path, preserving "later wins" semantics.
   N=10K disjoint construction: ~10x faster.

3. Non-allocating Java iterator for OrderedSet/OrderedMap. NodeIterator
   deftype advances the tree enumerator in place via unsynchronized-
   mutable, avoiding the seq-cell allocation per .next() that
   SeqIterator-over-seq incurred. Full traversal at N=100K: ~1.65x
   faster. Thread-safety contract unchanged: the iterator is per-call
   fresh (no shared state on the collection), matching the memory
   model of clojure.lang.SeqIterator.
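
The in-place iterator pattern from item 3 can be sketched with a small deftype. This is an illustration under assumptions: the node shape `{:k :l :r}`, `push-left`, and `TreeIterator` are hypothetical, and this toy version still allocates a cons cell per stack push, unlike what the commit describes for `NodeIterator`.

```clojure
;; Hypothetical node shape: {:k key, :l left, :r right}.
(defn push-left
  "Pushes node and its chain of left children onto stack."
  [node stack]
  (loop [n node, s stack]
    (if n
      (recur (:l n) (cons n s))
      s)))

;; Traversal state lives in one unsynchronized-mutable field and is
;; advanced in place on each call to next. Not safe to share across threads.
(deftype TreeIterator [^:unsynchronized-mutable stack]
  java.util.Iterator
  (hasNext [_]
    (boolean (seq stack)))
  (next [_]
    (if-let [s (seq stack)]
      (let [n (first s)]
        (set! stack (push-left (:r n) (rest s)))
        (:k n))
      (throw (java.util.NoSuchElementException.)))))

(def demo-tree {:k 2 :l {:k 1} :r {:k 3}})

(iterator-seq (TreeIterator. (push-left demo-tree nil)))   ; (1 2 3)
```

Creating a fresh iterator per call keeps the mutable state private to its consumer, which is the same usage contract the commit describes.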
The Full Scorecard, Regressions, and Improvements sections are useful
for interactive A/B review during development but are noise for outside
readers of the committed doc/report.txt snapshot.

lein bench-report --publish suppresses the three sections for
redirect-to-file use. Default lein bench-report is unchanged and still
shows everything. doc/benchmarks.md updated to document the flag and
recommend `lein bench-report --publish > doc/report.txt` for the
snapshot workflow.
The existing bench suite didn't exercise the optimization code paths
added in 8d19c26:

- bench-rank-lookup only ran on default ordered-set (NormalComparator),
  not on long-ordered-set or string-ordered-set — so the primitive
  node-rank-long / node-rank-string paths never ran.
- build-oc-range-map in bench-range-map-construction uses per-entry
  assoc, so the new single-argument (range-map coll) bulk-build path
  via node-build-sorted never ran.
- bench-set-iteration uses reduce (goes through CollReduce), so the
  new tree/NodeIterator never ran.

Adds four bench cases alongside the existing ones:
- bench-long-rank-lookup   (long-ordered-set vs data.avl rank-of)
- bench-string-rank-lookup (string-ordered-set vs data.avl rank-of)
- bench-range-map-bulk-construction (single-arg (range-map coll) vs
  Guava TreeRangeMap put)
- bench-set-iteration-iterator (Java .iterator() traversal across
  sorted-set / data.avl / ordered-set)

Also registers headline-benchmark entries so the new groups render in
the appropriate scaling tables under lein bench-report:
- Long-Specialized vs data.avl gains a "Rank lookup" row
- String-Specialized vs data.avl gains a "Rank lookup" row
- Range Map vs Guava gains a "Bulk Construction" row
- Ordered Set vs sorted-set / data.avl gain an "Iteration (Iterator)" row

At N=1K (smoke-tested), the new benches show ordered-set rank ~4x
faster than data.avl, bulk range-map construction ~2x faster than
Guava, and Iterator traversal ~3.6x faster than data.avl's iterator.
…data-avl

Regenerates the headline tables and prose claims against the
2026-04-17 benchmark run. No API or behavior changes.

README.md
- Rope vs PersistentVector and StringRope vs String tables extended to
  the full N=1K / 5K / 10K / 100K / 500K cardinality range.
- Set algebra tables (vs sorted-set, data.avl, clojure.core/set)
  refreshed.
- "Other operations" and Specialized collections tables refreshed.
- Intro: "18-60x wins at 500K" (was 28-75x) and "up to 60x faster"
  (was 50x) to match current set-algebra ceilings.

doc/ropes.md
- Main Benchmark Summary (Rope vs PersistentVector) regenerated.
- StringRope vs String performance table regenerated, shifted to
  N=10K/100K/500K, added Single Remove row.
- ByteRope vs byte[] performance table regenerated, added Single
  Remove and Split at Midpoint rows, noted crossover at 10K+.

doc/cookbook.md
- StringRope speedup note: "~38x faster than plain String" at 100K
  (was ~35x), with "~130x at 500K" added.

doc/vs-clojure-data-avl.md
- Parallel set operations: 7-51x (was 7-42x).
- Parallel fold: 3-5x (was 6-9x), an honest figure after the baseline shift.
Fresh snapshot against bench-results/2026-04-17_11-03-53.edn
(git rev 990b9a5), rendered via lein bench-report --publish so the
Full Scorecard, Regressions, and Improvements sections — useful for
interactive A/B review but noise for outside readers — are omitted
from the committed artifact.

320 lines (was 423). Headline scaling tables, per-category geomean,
Rope Family at Scale, Significant Wins, At Parity, Significant
Losses remain.
project.clj: 0.2.1-SNAPSHOT → 0.2.1
CHANGES.md: [0.2.1-SNAPSHOT] - unreleased → [0.2.1] - 2026-04-17

CHANGES entries added this cycle:
- New Performance Improvements section: primitive rank for long/
  string ordered collections, range-map bulk construction path, and
  non-allocating java.util.Iterator for OrderedSet/OrderedMap.
- New Refactoring section noting RopeSeq/RopeSeqReverse relocation
  from kernel/rope.clj to types/rope.clj.
- Bug Fixes gains the auto-boxing fix in str->root / bytes->root.
- Benchmarks and Tooling gains the bench-report --publish flag and
  the four new bench cases exercising the optimizations.
- StringRope headline claim refreshed: ~38x at 100K, ~130x at 500K
  (was ~35x at 100K).
@danlentz danlentz removed the "in-progress" (under ongoing development) label Apr 17, 2026
@danlentz danlentz merged commit cd29c14 into master Apr 17, 2026
2 checks passed
@danlentz danlentz deleted the 021-specialized-ropes branch April 17, 2026 18:36
3 participants