
New Arena Range #855

Draft: mjp41 wants to merge 31 commits into main from BackendArenaRange


mjp41 commented Jun 9, 2026


This PR replaces snmalloc's buddy-allocator backend ranges (LargeBuddyRange
and SmallBuddyRange) with a new backend that handles snmalloc's full
(exponent, mantissa) size-class sequence end-to-end, while still
maintaining the snmalloc invariant that any allocation is aligned to the
largest power of two that divides its size.

Free blocks are kept in a set of bins, with a few extra bins to handle the
alignment subtleties: a 5-unit block at the wrong address cannot serve a
4-unit request (because that request needs 4-unit alignment), whereas a
7-unit block can serve every smaller size wherever it starts. A small
precomputed mask per request hides exactly those bins that cannot serve it.
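The alignment subtlety can be made concrete with a brute-force fit predicate. This is a minimal sketch, not the table-driven ArenaBins code. `natural_align` and `can_serve` are hypothetical helpers, and sizes and addresses are counted in units. The sketch assumes a free block may start at any unit address, so an n-unit request with natural alignment a may need up to a - 1 units of leading padding.

```cpp
#include <cassert>
#include <cstddef>

// Largest power of two dividing n (n > 0): the alignment snmalloc's
// invariant requires for an n-unit allocation.
constexpr std::size_t natural_align(std::size_t n)
{
  return n & (~n + 1);
}

// Hypothetical helper: can the free block [start, start + len), measured
// in units, hold an n-unit allocation aligned to natural_align(n)?
constexpr bool can_serve(std::size_t start, std::size_t len, std::size_t n)
{
  const std::size_t a = natural_align(n);
  // First address at or after start that satisfies the alignment.
  const std::size_t first = (start + a - 1) & ~(a - 1);
  return first + n <= start + len;
}

// A 5-unit block serves a 4-unit request only if it starts at an address
// congruent to 0 or 3 modulo 4.
static_assert(can_serve(0, 5, 4), "aligned 5-unit block fits 4 units");
static_assert(!can_serve(1, 5, 4), "misaligned 5-unit block does not");
```

Because the padding never exceeds a - 1 units, any block of at least n + a - 1 units serves an n-unit request regardless of where it starts. Whether a shorter block can serve the request depends on its address.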

Consolidation is Doug Lea-style: on free we look left and right and merge
maximally. Blocks live in two layers of red-black trees: one tree per
non-empty bin (for selection within a bin) and one address-keyed tree over
all free blocks in the allocator (for left/right neighbour lookup on
free). Because the trees scale, the same backend stacks at multiple levels
of the range pipeline, the way buddy allocators used to.

mjp41 and others added 29 commits May 15, 2026 12:37
Co-authored-by: Copilot <copilot@github.com>
Phase 1 of the BackendArena refactor (see PLAN.md). Introduces
src/snmalloc/backend_helpers/backend_arena_bins.h, which owns the
chunk-unit size-class scheme and the non-empty-bins bitmap that
later phases will use to drive bin selection inside BackendArena.

Public surface (the integration contract for future phases):

  * range_t, carve_t, carve(block, n_chunks), max_supported_chunks().
  * Nested Bitmap with add(block), find_for_request(n_chunks),
    clear(bin_id), and TOTAL_BINS.

Everything else (the size-class encoding, the per-SC tables, the
free-side classifier bin_index) is private. The unit test reaches
it via a friend struct BackendArenaBinsTestAccess<B> that is only
forward-declared in the header and defined in the test translation
unit, so the production header carries no test-only surface.

Implementation:

  * Two power-of-two-sized rodata tables indexed by raw sc id with
    shift+add. bitmap_info_t (4 words via alignas) feeds
    Bitmap::find_for_request; carve_info_t (2 words) feeds carve
    and the free-side cascade-fit predicate.
  * bitmap_info_t fields (start_word, first_mask, second_mask) are
    pre-shifted into the bitmap's word layout so find_for_request
    is two ANDs on the hot word + word-boundary fall-through.
  * Tables are populated at constexpr build time by BinTable()
    consuming the canonical bin_subsets table; the strict-chain
    invariant on bin_subsets is checked at compile time via throw
    in the constexpr constructor.
  * Fast path uses the runtime CLZ intrinsic via the new
    bits::to_exp_mant<MANTISSA_BITS, LOW_BITS> (paired with the
    existing to_exp_mant_const); the _const variant is restricted
    to constexpr table construction and test static_asserts.
    bits::prev_pow2_bits / prev_pow2_bits_const are added alongside
    for symmetric runtime / constexpr access.
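The two-AND lookup described above can be sketched as follows. The layout is an assumption, not ArenaBins' real encoding. It uses 128 bins in two 64-bit words. `first_mask` selects the bins in `start_word` that can serve the request, and `second_mask` selects those in the following word. Any bin in a later word is treated as able to serve. `Bitmap`, `BitmapInfo`, `lowest_bit` and `NOT_FOUND` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Per-request lookup data, pre-shifted into the bitmap's word layout.
struct BitmapInfo
{
  std::size_t start_word;
  std::uint64_t first_mask;  // serving bins within words[start_word]
  std::uint64_t second_mask; // serving bins within words[start_word + 1]
};

// Index of the lowest set bit; precondition: x != 0.
constexpr std::size_t lowest_bit(std::uint64_t x)
{
  std::size_t n = 0;
  while ((x & 1) == 0)
  {
    x >>= 1;
    n++;
  }
  return n;
}

struct Bitmap
{
  static constexpr std::size_t WORDS = 2;
  static constexpr std::size_t NOT_FOUND = SIZE_MAX;
  std::uint64_t words[WORDS]{};

  void add(std::size_t bin) { words[bin / 64] |= std::uint64_t{1} << (bin % 64); }
  void clear(std::size_t bin) { words[bin / 64] &= ~(std::uint64_t{1} << (bin % 64)); }

  // Lowest non-empty bin able to serve the request, or NOT_FOUND.
  std::size_t find_for_request(const BitmapInfo& info) const
  {
    // Hot word: one AND.
    std::uint64_t hit = words[info.start_word] & info.first_mask;
    if (hit != 0)
      return info.start_word * 64 + lowest_bit(hit);
    // Word-boundary fall-through: a second AND on the next word.
    const std::size_t next = info.start_word + 1;
    if (next < WORDS)
    {
      hit = words[next] & info.second_mask;
      if (hit != 0)
        return next * 64 + lowest_bit(hit);
    }
    // Assumption: every bin beyond the masked words can serve the request.
    for (std::size_t w = next + 1; w < WORDS; w++)
      if (words[w] != 0)
        return w * 64 + lowest_bit(words[w]);
    return NOT_FOUND;
  }
};
```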

The new test cross-checks bin classification, carve, and
find_for_request against a brute-force scanner derived directly
from bin_subsets, for B in {1, 2, 3}. Exhaustive single-bit and
multi-bit randomised bitmap states are covered, plus word-boundary
straddle cases enumerated automatically from the table.

No production code path is changed: BackendArenaBins<B> is unused
in the build until later phases compose it into BackendArena.

Also lands PLAN.md (single forward-looking spec for the whole
BackendArena refactor) and claude.md (development guidance for
this branch).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a public RBTree method that returns the strict neighbours of a
probe value K in a single root-to-leaf descent:

  - every left turn (parent > K) records the parent as the current
    successor candidate
  - every right turn (parent < K) records the parent as the current
    predecessor candidate

At loop exit the tightest neighbours are returned as
`stl::Pair<K, K>{pred, succ}`; either component is `Rep::null` when no
such neighbour exists.

The "K not in tree" precondition is asserted via SNMALLOC_ASSERT and
expands to nothing in Release. BackendArena, the planned caller, relies
on the invariant that two free blocks cannot share a starting address.

test_neighbours exercises the algorithm against std::set::lower_bound /
upper_bound as the oracle. Boundary probes (K=0, K=size+1) plus random
probes that skip oracle hits keep every call within the precondition.
The sweep reuses the existing test()'s size range but caps to the first
few seeds per size to keep the per-test time budget in check.

PLAN.md Phase 2 spec records the K-not-in-tree precondition.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce BackendArena, a free-range allocator that manages chunks
within a bounded arena using a dual red-black-tree scheme:

- Bin trees: one per size-class bin, for best-fit allocation lookups
  driven by a non-empty-bins bitmap.
- Range tree: keyed by address, for O(log n) neighbour lookup during
  consolidation of adjacent free blocks.

Key design decisions:
- Single-chunk (min-size) blocks live only in bin tree 0, not the
  range tree, keeping range-tree overhead proportional to multi-chunk
  blocks. The min-size bin is probed as a fallback during consolidation.
- Three-variant encoding (Min/TwoMin/Large) in pagemap metadata bits
  avoids a range-tree lookup for the common 1-chunk and 2-chunk cases.
- WordRef handle and TreeRep<RefFn> template follow the existing
  BackendStateWordRef / BuddyChunkRep patterns from largebuddyrange.h.
- Consolidation in add_block checks predecessor then successor,
  merging adjacent blocks and re-inserting the result.
- remove_block uses Bins::carve to split oversized blocks, re-inserting
  remainders.
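The predecessor-then-successor merge can be illustrated with `std::map` standing in for the address-keyed range tree. This is a simplified sketch under stated assumptions. It leaves out the bin trees, the Min/TwoMin/Large variants and `Rep::can_consolidate`, and it assumes the new block does not overlap any free block. `FreeRanges` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

class FreeRanges
{
  std::map<std::size_t, std::size_t> by_addr; // start address -> size

public:
  // Insert [addr, addr + size), merging maximally with free neighbours.
  void add_block(std::size_t addr, std::size_t size)
  {
    auto succ = by_addr.lower_bound(addr);
    // Predecessor first: merge if it ends exactly where we start.
    if (succ != by_addr.begin())
    {
      auto pred = std::prev(succ);
      if (pred->first + pred->second == addr)
      {
        addr = pred->first;
        size += pred->second;
        by_addr.erase(pred); // erasing pred leaves succ valid
      }
    }
    // Then successor: merge if it starts exactly where we end.
    if (succ != by_addr.end() && succ->first == addr + size)
    {
      size += succ->second;
      by_addr.erase(succ);
    }
    by_addr[addr] = size;
  }

  std::size_t block_count() const { return by_addr.size(); }

  // Size of the free block starting at addr, or 0 if there is none.
  std::size_t size_at(std::size_t addr) const
  {
    auto it = by_addr.find(addr);
    return it == by_addr.end() ? 0 : it->second;
  }
};
```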

Also:
- Add neighbours() to RBTree: single-descent strict-neighbour query.
- Add for_each() to RBTree: in-order traversal for invariant checking.
- Make BackendArenaBins::bin_index public (sole consumer is BackendArena).
- Add BackendArenaBins::Bitmap::test() for invariant verification.
- Five-clause structural invariant gated on bool parameter (defaults to
  Debug), checked at entry/exit of add_block and remove_block.
- Comprehensive test suite: word-level round-trips, tree operations,
  empty-state invariant, add/remove without consolidation, consolidation
  case matrix (8 pred/succ combinations), overflow detection, and
  randomised stress test with oracle validation (50 seeds x 500 ops).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tests two BackendArena instances sharing a single MockRep pagemap:
- Basic migration: blocks move between arenas
- Consolidation after migration: gap block consolidates with neighbours
- Randomised stress: 50 seeds x 500 ops with add/remove/migrate

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 9 of the BackendArena refactor:

- Make BackendArena fully generic over its Rep, mirroring the
  Buddy/Rep layering. The class no longer holds any bit-layout
  constants; Rep supplies the full RBTree Rep for both the bin
  trees and the range tree, owning red-bit (and any tag-bit)
  packing privately.
- Rep concept now requires:
    using BinRep         -- full RBTree Rep for the bin trees
    using RangeRep       -- full RBTree Rep for the range tree
    get_variant / set_variant
    get_large_size_chunks / set_large_size_chunks
    can_consolidate(higher_addr) -> bool
- Add can_consolidate checks in add_block before each (predecessor
  and successor) merge, and update the invariants to tolerate
  boundary-blocked adjacency.
- MockRep grows inner BinRep / RangeRep structs that each provide
  the full RBTree Rep interface over the mock-entry array, with a
  private red-bit at bit 8.
- New tests verify that can_consolidate returning false at a
  specific address prevents predecessor- and successor-side merges
  independently, including at min-block boundaries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 10 of the BackendArena work. Adds the BackendArenaRange wrapper
that drops into the LargeBuddyRange slot, generalises BackendArena and
BackendArenaBins on MIN_SIZE_BITS, and converts the arena/range API
boundary to bytes throughout.

* BackendArenaRange<REFILL_SIZE_BITS, MAX_SIZE_BITS, Pagemap,
  MIN_REFILL_SIZE_BITS> with a PagemapRep that packs variant tag, RB
  red bit and the consolidated large-block size into the first pagemap
  word, and uses the second word for in-tree links. Provides
  alloc_range / dealloc_range / add_range over the bin-tree arena.
* parent_dealloc unifies the old parent_dealloc_range and
  dealloc_overflow paths; add_range uses bits::align_up /
  bits::align_down for parent-input trimming.
* BackendArenaBins<B, MIN_SIZE_BITS> generalises the bin scheme so its
  range_t, carve and find_for_request all speak bytes (multiples of
  UNIT_SIZE = 1 << MIN_SIZE_BITS). Tests cover MIN_SIZE_BITS in {0, 4,
  14}.
* BackendArena<Rep, MIN_SIZE_BITS, MAX_SIZE_BITS>: add_block /
  remove_block / variant_of / insert_block / range_from_addr /
  invariants all work in bytes. remove_block returns a scalar address
  (0 = failure); the size half of the old pair was tautological.
  CHUNKS_BITS / addr_to_chunk / chunk_to_addr removed.
* PagemapRep::get_large_size / set_large_size are bytes-in / bytes-out;
  storage still scales by MIN_SIZE_BITS so the shifted field fits a
  pagemap word.
* Tests: func-backend_arena_range exercises alloc/dealloc/refill/large
  paths against a mock parent; func-backend_arena and
  func-backend_arena_bins updated for the bytes-throughout convention
  (chunk_size(N) helper at the test boundary).
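Trimming parent input amounts to rounding the start up and the end down to UNIT_SIZE. The following is a byte-based sketch. The helpers `align_up`, `align_down` and `trim_to_units` are hypothetical local stand-ins for the `bits::` functions. The sketch assumes UNIT_SIZE is a power of two and that a range with no whole unit left after trimming is dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Range
{
  std::uintptr_t base;
  std::size_t size;
};

constexpr std::uintptr_t align_up(std::uintptr_t x, std::uintptr_t a)
{
  return (x + a - 1) & ~(a - 1);
}

constexpr std::uintptr_t align_down(std::uintptr_t x, std::uintptr_t a)
{
  return x & ~(a - 1);
}

// Largest sub-range of [base, base + size) whose ends are unit-aligned.
constexpr Range
trim_to_units(std::uintptr_t base, std::size_t size, std::uintptr_t unit)
{
  const std::uintptr_t start = align_up(base, unit);
  const std::uintptr_t end = align_down(base + size, unit);
  if (end <= start)
    return Range{0, 0}; // nothing usable remains
  return Range{start, static_cast<std::size_t>(end - start)};
}
```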

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Every `git commit`, `--amend`, `push`, `reset`, `rebase`, or
`gh pr create` must be preceded by an explicit ask_user approval for
that specific commit/PR. "Begin the next phase" does not authorise
committing later work — only "commit this" for the change in hand
counts as approval.

If a commit has already been made without approval, offer
`git reset --soft HEAD~1` to undo it while preserving the staged
changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mechanical substitution of every LargeBuddyRange instantiation in the
default in-tree range pipelines:

- src/snmalloc/backend/standard_range.h (GlobalR, LargeObjectRange).
- src/snmalloc/backend/meta_protected_range.h (GlobalR, CentralObjectRange,
  CentralMetaRange, the conditional_t huge-page cache, ObjectRange,
  MetaRange).

After this change snmalloc uses the BackendArena bin-tree allocator
instead of the power-of-two buddy for all large-range management in
the default pipelines. LargeBuddyRange and BuddyChunkRep remain in
the tree, available for alternative configurations.

Two issues uncovered during Phase 12 testing and fixed here:

1. backend_arena.h: BackendArena::add_block's successor-min branch
   called Rep::can_consolidate(succ_addr) before contains_min(succ_addr)
   confirmed succ_addr is in our region. For a block added at the very
   top of a registered region (e.g. last 8 MiB of a 256 MiB fixed
   region), succ_addr = addr + size sits one chunk past the pagemap's
   mapped backing, and the can_consolidate probe segfaults. The fix
   reorders the checks so the tree-membership test gates the pagemap
   read, matching the documented pattern in buddy.h:90-93.

   Regression coverage: MockRep gains a per-chunk `boundary` field on
   `mock_entry`. `MockRep::can_consolidate(addr)` now returns
   `!mock_store[mock_index(addr)].boundary` — faithful to the real
   `PagemapRep::can_consolidate` reading `entry.is_boundary()`. The
   `mock_index` bounds assertion fires on any out-of-range probe, so
   the unsafe pattern trips in unit tests rather than only as a
   segfault in production. A new test_block_at_arena_top_edge adds a
   block whose succ_addr would address chunk MOCK_ARENA_CHUNKS;
   without the reorder this reproduces the original failure.

   This unification also subsumed the previous BoundaryMockRep and its
   boundary_addrs global std::set: the four boundary tests now run on
   Arena<K> and set mock_store[mock_index(addr)].boundary = true
   instead. Net -35 lines in backend_arena.cc.

2. backend_arena_bins.h: the BinTable constexpr constructor used
   throw "..." as a constexpr-eval-fails trick to surface invariant
   violations as compile errors. throw requires exception support,
   which is disabled in the main allocator (-fno-exceptions), so this
   broke Phase 12 builds. Replaced with SNMALLOC_CHECK(false && "..."),
   which calls a non-constexpr error path and achieves the same
   compile-time failure without runtime exception machinery.
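The replacement in item 2 relies on a general C++ rule: if a constant evaluation reaches a call to a non-constexpr function, the result is not a constant expression. A `constexpr` object whose constructor takes that path therefore fails to compile, with no exception machinery involved. The following is a minimal sketch. `table_check_failed` is a hypothetical stand-in for the checking macro's failure path, and a strictly-increasing check stands in for the real bin_subsets invariant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Deliberately not constexpr: reaching it during constant evaluation
// turns the invariant violation into a compile error.
[[noreturn]] inline void table_check_failed(const char* msg)
{
  std::fputs(msg, stderr);
  std::abort();
}

struct StrictChain
{
  static constexpr std::size_t N = 4;
  std::size_t entries[N]{};

  constexpr explicit StrictChain(const std::size_t (&src)[N])
  {
    for (std::size_t i = 0; i < N; i++)
    {
      if (i > 0 && src[i] <= src[i - 1])
        table_check_failed("table must be strictly increasing\n");
      entries[i] = src[i];
    }
  }
};

constexpr std::size_t good_input[StrictChain::N] = {1, 2, 4, 8};
constexpr StrictChain good_table(good_input); // compiles

// constexpr std::size_t bad_input[StrictChain::N] = {1, 4, 2, 8};
// constexpr StrictChain bad_table(bad_input); // error: not a constant expression

static_assert(good_table.entries[3] == 8, "table built at compile time");
```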

Full ctest suite passes (86/86, --timeout 120 -j 4).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
On closer inspection at Phase 13 start, the two-path conditional in
BackendArenaRange::refill turns out to be load-bearing rather than
vestigial:

- Aligned-parent path serves caller sizes up to (1 << MAX_SIZE_BITS) - 1;
  unaligned-parent path caps at ~REFILL_SIZE / 2 because of its
  while (needed_size <= refill_size) guard. Unifying on the unaligned
  strategy reduces capability for aligned-parent configs.

- The aligned-parent carve shortcut is precise, not a perf
  optimisation: it hands the caller's `size` bytes back directly and
  calls add_range with refill_size - size, which is strictly less than
  refill_size and so satisfies add_block's size < 2^MAX_SIZE_BITS
  precondition even when REFILL_SIZE_BITS == MAX_SIZE_BITS (the
  LargeObjectRange config). A unified "add the whole refill then
  recurse" path violates that precondition for the same config, and
  the workarounds (cut LocalCacheSizeBits by 1, or bump MAX_SIZE_BITS
  by 1) carry real cost for no behavioural win.

- LargeBuddyRange would still consume Aligned under the agreed-minimal
  (a)+(ii) scope, so the Aligned field's footprint in pass-through
  ranges doesn't shrink — defeating the only structural-cleanup
  motivation.

The BackendArena refactor (Phases 1-12) ends with Phase 12.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the tagged small/large encoding and the leading-zero-count
large-class indexing with a single uniform exp+mantissa scheme:

    value == 0                              : unmapped sentinel
    value in [1, 1 + NUM_SMALL_SIZECLASSES) : small  (sc = value - 1)
    value in [1 + NUM_SMALL_SIZECLASSES,
             1 + NUM_SMALL_SIZECLASSES + NUM_LARGE_CLASSES)
                                            : large  (lc = ...)

Small classes use `from_exp_mant(sc)` (unchanged). Large classes
continue the same exp+mantissa namespace as
`from_exp_mant(NUM_SMALL_SIZECLASSES + lc)`. The discriminator tag bit
is gone — small and large share one contiguous index space — and the
sentinel slot 0 lets the size-lookup fast path return 0 / 0 for
unmapped pointers without a branch.
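The uniform index space can be sketched as a table indexed directly by the raw encoded value. The parameters are hypothetical: `B = 2` mantissa bits, with eight small and eight large classes. The `class_to_size` formula is a stand-in for `from_exp_mant` and snmalloc's real table sizes. Row 0 holds the unmapped sentinel, and every other row applies the same formula at index `value - 1`, so the size lookup needs no branch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t B = 2; // mantissa bits (hypothetical)
constexpr std::size_t NUM_SMALL_SIZECLASSES = 8;
constexpr std::size_t NUM_LARGE_CLASSES = 8;
constexpr std::size_t TABLE_ROWS = 1 + NUM_SMALL_SIZECLASSES + NUM_LARGE_CLASSES;

// Size (in units) of uniform exp+mantissa class sc:
// 1, 2, ..., 8, 10, 12, 14, 16, 20, 24, ... for B = 2.
constexpr std::size_t class_to_size(std::size_t sc)
{
  constexpr std::size_t M = std::size_t{1} << B;
  if (sc < M)
    return sc + 1;
  const std::size_t k = sc - M;
  return (M + k % M + 1) << (k / M);
}

constexpr std::array<std::size_t, TABLE_ROWS> make_size_table()
{
  std::array<std::size_t, TABLE_ROWS> t{}; // t[0] == 0: unmapped sentinel
  for (std::size_t v = 1; v < TABLE_ROWS; v++)
    t[v] = class_to_size(v - 1); // small: sc = v - 1; large: NUM_SMALL + lc = v - 1
  return t;
}

constexpr auto SIZE_TABLE = make_size_table();

// Branch-free: an unmapped pointer (value 0) reports size 0.
constexpr std::size_t size_of_value(std::size_t value)
{
  return SIZE_TABLE[value];
}

constexpr bool is_large_value(std::size_t value)
{
  return value >= 1 + NUM_SMALL_SIZECLASSES;
}
```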

The `SIZECLASS_REP_SIZE` / `REMOTE_BACKEND_MARKER` / `REMOTE_MIN_ALIGN`
chain is re-derived from the new `SIZECLASS_BITS` (renamed from
`TAG_SIZECLASS_BITS`); RED_BIT / VARIANT_SHIFT / LARGE_SIZE_SHIFT in
`backend_arena_range.h` and RED_BIT in `largebuddyrange.h` derive
from the new public `MetaEntryBase::BACKEND_LAYOUT_FIRST_FREE_BIT` so
future widenings propagate automatically.

A new `MAX_LARGE_SIZECLASS_SIZE` constant gates user-supplied sizes at
the API boundary (`alloc_not_small`, `round_size`, `check_size`,
`rust_realloc`) — replacing the loose `> 2^63` bound. `ENCODED_ADDRESS_BITS`
caps the encoding at `BITS - 1` so the constant survives 32-bit
platforms where `DefaultPal::address_bits == BITS`.

The pre-Phase-13 `large_size_to_chunk_sizeclass` helper is removed —
its `+NUM_SMALL_SIZECLASSES` / `-NUM_SMALL_SIZECLASSES` round-trip
through an `lc` index cancels in the uniform scheme, so
`size_to_sizeclass_full`'s large branch inlines the `to_exp_mant`
directly.

Front-end semantics are unchanged: `large_size_to_chunk_size` still
returns `next_pow2(size)` and the front end still reserves pow2 chunk
sizes. The non-pow2 large sizeclasses exist in `sizeclass_metadata`
(with `slab_mask = info.align - 1`) but are unreachable from
`size_to_sizeclass_full` until Phase 15 drops the `next_pow2` rounding.

Tests:
- `sizeclass.cc`: sentinel sanity, raw-value adjacency, range disjoint,
  large monotonicity, pow2 round-trip, non-pow2 rounds up.
- `rounding.cc`: extends to pow2 large sizeclasses, verifying
  `index_in_object` / `is_start_of_object` at representative offsets.
- `cheri.cc`: large-class verification loop bound updated to
  `NUM_LARGE_CLASSES`.
- Loop bounds in tests use `ENCODED_ADDRESS_BITS` to avoid
  `bits::one_at_bit(BITS)` UB on 32-bit.

ctest: 86/86 passing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
get_mut<true> base-adjusted p before calling register_range, which then
re-applied the base subtraction internally and tripped its out-of-range
guard for legitimate in-range addresses. The path is reachable on PALs
without LazyCommit (e.g. PALNoAlloc<PALLinux>) when get<true>/get_mut<true>
is called on an in-range address of a bounded pagemap.

Move the register_range call before the p = p - base adjust so it sees
the un-adjusted address that its bounds check expects. Add a regression
test in func-pagemap that wraps DefaultPal with a stub stripping
LazyCommit; this exercises the previously-broken path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Encode (sizeclass, slab-offset) jointly in the pagemap entry so the
front end can recover the allocation start for an arbitrary interior
chunk of a multi-slab-tile large allocation. The front end still
only issues pow2 large requests, so every materialised entry today
has offset=0; this lays the groundwork for Phase 15+ non-pow2 large
support without front-end changes.

Key pieces:
- offset_and_sizeclass_t packs sizeclass into the low SIZECLASS_BITS
  and per-chunk offset into the next OFFSET_BITS of one word.
- Backend::alloc_chunk loops over slab tiles, writing each tile's
  slab_index into the offset bits of its pagemap entry.
- SizeClassTable is split into three by purpose:
  * start_ (sizeclass_data_start, 32B/row, indexed by osc): hot
    path for start_of_object on every dealloc.
  * align_ (sizeclass_data_align, 16B/row, indexed by sc): used by
    is_start_of_object alignment check in -check builds.
  * slab_ (sizeclass_data_slab, 4B/row, indexed by sc): cold; slab
    init thresholds.
- start_of_object branches on osc.offset() == 0 (testable from bits
  already loaded in osc.raw()), so the offset=0 hot path skips the
  offset_bytes load and offset-shift arithmetic. Combined with the
  table split, perf-external_pointer-fast matches the baseline
  (~290 ms median) with no regression; perf-singlethread-check is
  within noise.
- New src/test/func/large_offset targeted test reaches the
  multi-slab-tile branch via the public backend API.
- check_invariant in BackendArena now uses SNMALLOC_CHECK rather
  than SNMALLOC_ASSERT, so callers that opt in via enabled=true get
  the invariant checks even in Release builds (which is what the
  tests want); the #ifndef NDEBUG wrapper is no longer needed.
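The joint encoding and the offset == 0 fast path can be sketched as follows. The field widths, `OffsetAndSizeclass` and `large_start_of_object` are hypothetical. The sketch assumes a large allocation is laid out as consecutive slab tiles of `slab_size` bytes (a power of two), with tile i recording offset i. It ignores small-object start computation entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t SIZECLASS_BITS = 8; // hypothetical widths
constexpr std::size_t OFFSET_BITS = 8;

class OffsetAndSizeclass
{
  std::uintptr_t raw_;

public:
  constexpr OffsetAndSizeclass(std::size_t sc, std::size_t offset)
  : raw_(sc | (offset << SIZECLASS_BITS))
  {}

  constexpr std::size_t sizeclass() const
  {
    return raw_ & ((std::uintptr_t{1} << SIZECLASS_BITS) - 1);
  }

  constexpr std::size_t offset() const
  {
    return (raw_ >> SIZECLASS_BITS) & ((std::uintptr_t{1} << OFFSET_BITS) - 1);
  }

  constexpr std::uintptr_t raw() const { return raw_; }
};

// Start of the large allocation containing p.
constexpr std::uintptr_t large_start_of_object(
  std::uintptr_t p, OffsetAndSizeclass osc, std::uintptr_t slab_size)
{
  const std::uintptr_t tile = p & ~(slab_size - 1);
  // Fast path: offset 0 means p's tile is the first one; no multiply needed.
  if (osc.offset() == 0)
    return tile;
  return tile - osc.offset() * slab_size;
}
```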

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
snmalloc::alloc<size, Conts, align>() applies aligned_size(align, size)
internally; snmalloc::dealloc<size>(p) did not. When the alignment
upgrade pushed the reservation into a different sizeclass than `size`,
check_size fired under the check flavour. Reproducer:
alloc<33*1024, _, 128*1024>(); dealloc<33*1024>(p)
=> "Dealloc rounded size mismatch: 0xa000 != 0x20000".

Merge dealloc<size> into a single template `dealloc<size, align = 1>`
applying aligned_size(align, size) before check_size. The default
align=1 preserves existing one-argument-template behaviour because
aligned_size(1, size) == size.
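The reproducer's numbers follow from rounding the size up to the requested alignment. This sketch assumes `aligned_size(align, size)` rounds `size` up to a multiple of a power-of-two `align`, which matches the 0x20000 in the error message. `dealloc_check_size` is a hypothetical stand-in for the size that a merged `dealloc<size, align>` would pass to `check_size`.

```cpp
#include <cassert>
#include <cstddef>

// Round size up to a multiple of alignment (a power of two); size > 0.
constexpr std::size_t aligned_size(std::size_t alignment, std::size_t size)
{
  return ((alignment - 1) | (size - 1)) + 1;
}

// Hypothetical: the size a merged dealloc<size, align = 1> would validate.
template<std::size_t size, std::size_t align = 1>
constexpr std::size_t dealloc_check_size()
{
  return aligned_size(align, size);
}

// align = 1 leaves the size unchanged, so one-argument callers are unaffected.
static_assert(dealloc_check_size<33 * 1024>() == 33 * 1024, "");
// The reproducer: the alignment upgrade reserves 0x20000 bytes.
static_assert(dealloc_check_size<33 * 1024, 128 * 1024>() == 0x20000, "");
```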

Move aligned_size from sizeclasstable.h to sizeclassstatic.h so the
test library header can use it without pulling in the full runtime
sizeclass machinery. Existing consumers still get it transitively via
the pal.h -> ds_core.h -> sizeclassstatic.h include chain.

Mirror the merge in the test library header: dealloc<size, align=1>
and alloc<size, ZeroMem, align=1>. Add aligned_dealloc to
TESTLIB_ONLY_TESTS.

Includes src/test/func/aligned_dealloc/ with the canonical reproducer
and additional (S, A) pairs.

Also captures the planning context in PLAN.md (pre-Phase-15 + Phase 15
sections).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
A request like malloc(70 KiB) at the default INTERMEDIATE_BITS = 2
now reserves the smallest enclosing exp+mantissa sizeclass (80 KiB)
rather than next_pow2(size) (128 KiB). Sizes that already land on a
class boundary reserve exactly that size; mid-exponent sizes shrink
by up to ~33%.
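Ceiling to the smallest enclosing exp+mantissa class can be sketched directly on byte sizes. With B intermediate bits, the classes in [2^e, 2^(e+1)] are spaced 2^(e - B) apart, so rounding up to that spacing finds the class. `round_to_class`, `floor_log2` and `next_pow2` are hypothetical helpers. They ignore snmalloc's real small-class and chunk-unit details.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t floor_log2(std::size_t x)
{
  std::size_t e = 0;
  while (x >>= 1)
    e++;
  return e;
}

constexpr std::size_t next_pow2(std::size_t x)
{
  std::size_t p = 1;
  while (p < x)
    p <<= 1;
  return p;
}

// Smallest exp+mantissa class >= size (size > 0), where each power-of-two
// interval [2^e, 2^(e+1)] is split into 2^b equal steps.
constexpr std::size_t round_to_class(std::size_t size, std::size_t b)
{
  const std::size_t e = floor_log2(size);
  if (e <= b)
    return size; // spacing is at most 1, so every size is a class
  const std::size_t step = std::size_t{1} << (e - b);
  return (size + step - 1) & ~(step - 1);
}
```

With b = 2, this gives 80 KiB for a 70 KiB request, compared with 128 KiB from `next_pow2`.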

Mechanics:

  sizeclasstable.h
    - size_to_sizeclass_full drops next_pow2(size); to_exp_mant ceils
      directly to the smallest enclosing class.
    - round_size's large branch matches the reservation
      (sizeclass_full_to_size of the chosen class), so
      DefaultConts::success zeroes exactly the reservation for calloc.
    - large_size_to_chunk_size removed (the one caller in corealloc
      uses sizeclass_full_to_size(sc) directly with a hoisted sc).
    - compute_max_large_slab_index tightened to meta.size / slab_size - 1
      (the actual worst case the runtime pagemap loop writes).

  backend.h
    - alloc_chunk's pow2 precondition relaxed to the slab-tile
      invariant: size is a positive multiple of slab_size.

  corealloc.h
    - large alloc path hoists size_to_sizeclass_full / chunk size into
      locals so each table lookup happens once.

Tests:

  - large_offset_frontend/: new front-end counterpart to
    large_offset/. Exhaustively round-trips every large sizeclass and
    walks every chunk-aligned interior pointer for a boundary and a
    non-boundary request.
  - memory/: adds test_calloc_non_pow2_large as a calloc zeroing smoke
    test; clamps the end-of-stride probe in check_external_pointer_large
    since non-pow2 reservations are tighter than the next pow2.
  - sizeclass/: deterministic round_size gate over every large class
    (S maps to itself; S_prev+1 ceils to S).
  - large_offset/: backend test now passes the chunk-multiple reserve
    (= sizeclass_full_to_size(sc)) instead of next_pow2(size).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- RBTree::neighbours's "value absent" precondition is now release-checked
  with a single post-descent comparison. A duplicate key would otherwise
  return an arbitrary neighbour pair that callers (e.g.
  BackendArena::add_block) would consume as valid, corrupting
  dual-tree consolidation. Equivalent to the prior per-node debug assert
  for BST-ordered trees, with no per-node cost.

- BackendArena::range_from_addr's Large branch asserts the structural
  invariants on the size returned by Rep::get_large_size (> TWO_UNITS,
  unit-aligned, below the arena-size cap). Debug-only: this is an internal
  Rep invariant, not defense against external corruption.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Range tree stores chunk-aligned addresses in Word::Two of the pagemap
entry. The markerless ownership discriminator
(is_backend_owned == (remote_and_sizeclass & COMBINED_MASK) == 0) requires
those addresses to have zero in the BACKEND_RESERVED_MASK_WORD_TWO
(= COMBINED_MASK) bits — i.e., the reserved mask must fit entirely below
the chunk alignment. The invariant held silently for default configs; a
future config change shrinking the chunk alignment or growing
INTERMEDIATE_BITS would have turned backend-owned writes into spurious
frontend entries with no compile-time guard.

This assertion mirrors the existing Word::One BIN_META_MASK assert.
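The guarded property is a plain bit-level fact. A chunk-aligned address has zeros in every bit below the chunk alignment, so under the mask test it reads as backend-owned exactly when the mask fits below that alignment. The following sketch uses hypothetical values for `MIN_CHUNK_BITS` and `COMBINED_MASK`, and `is_backend_owned` here is a local illustration of the discriminator.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t MIN_CHUNK_BITS = 14; // hypothetical
constexpr std::uintptr_t CHUNK_ALIGN = std::uintptr_t{1} << MIN_CHUNK_BITS;
constexpr std::uintptr_t COMBINED_MASK = 0xFF; // hypothetical reserved bits

// Compile-time guard: every reserved bit lies below the chunk alignment.
static_assert(
  (COMBINED_MASK & ~(CHUNK_ALIGN - 1)) == 0,
  "reserved mask must fit below the chunk alignment");

// Markerless discriminator applied to the second pagemap word.
constexpr bool is_backend_owned(std::uintptr_t word_two)
{
  return (word_two & COMBINED_MASK) == 0;
}
```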

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
unlink_block called bitmap.add(range) purely to compute the bin id,
relying on the idempotent set-bit side effect being harmless on the
unlink path. Bins::bin_index is the pure classifier with no side effect
and a name that matches the intent. The bitmap-tree consistency
invariant is unchanged (still verified by check_invariant Clause 4).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
A null probe value can never collide with a tree entry (null is not
insertable), so it must be exempt from the post-descent duplicate
check. The redblack functional test deliberately probes with key 0
to exercise the boundary case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
alloc_range and dealloc_range both bypassed to the parent when the
request equalled or exceeded the bin range, but used mask_bits(N) =
(1<<N)-1 while BackendArena::add_block's precondition is
size < one_at_bit(N) = 1<<N. The mask-based test was off-by-one but
harmless because aligned chunk sizes never land on (1<<N)-1.
Replace both call sites with a single is_too_large helper so the
boundary cannot drift between paths.
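The boundary difference shows up clearly with concrete numbers. This sketch assumes the old test was `size >= mask_bits(N)`. It uses hypothetical local `one_at_bit` and `mask_bits` helpers with `MAX_SIZE_BITS = 24` and a 2^14-byte chunk. The two tests disagree only at (1 << N) - 1. That value is odd, so it is never a multiple of the chunk size.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t MAX_SIZE_BITS = 24; // hypothetical

constexpr std::size_t one_at_bit(std::size_t n) { return std::size_t{1} << n; }
constexpr std::size_t mask_bits(std::size_t n) { return one_at_bit(n) - 1; }

// Single boundary shared by alloc_range and dealloc_range, matching
// add_block's precondition size < one_at_bit(MAX_SIZE_BITS).
constexpr bool is_too_large(std::size_t size)
{
  return size >= one_at_bit(MAX_SIZE_BITS);
}

// Assumed shape of the previous mask-based comparison.
constexpr bool old_is_too_large(std::size_t size)
{
  return size >= mask_bits(MAX_SIZE_BITS);
}
```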

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Note that MetaEntryBase::operator= is load-bearing: the pagemap
  writes back through it, so META_BOUNDARY_BIT survives every
  metadata mutation without explicit preservation by callers.
- Correct the BackendStateWordRef single-pointer ctor comment: it
  is required by RBRepMethods for sentinel construction from
  &Rep::root, not a legacy convenience.
- BackendArena: drop spurious const on contains_min and
  check_invariant (no const callers exist), removing the
  const_cast laundering.
- BackendArena::check_invariant: lift the five clause titles into
  the docblock; trim the inline labels to single-line markers.
- BackendArena::add_block: drop cross-file line-number reference
  to buddy.h.
- backend_arena_range.h / backend_arena_bins.h: replace
  SNMALLOC_CHECK(false && "msg") with SNMALLOC_CHECK_MSG.
- backend_arena_range.h: rename `auto refill` to `refill_range`
  to avoid shadowing the enclosing function.
- Tests: use "test/..." quoted include style for consistency.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduces the building blocks for Phase A of the SmallBuddyRange ->
SmallArenaRange migration. Nothing is wired into the production pipeline
yet (the existing SmallBuddyRange remains the LocalMetaRange) — this
commit only adds the new components and their gate test.

* InplaceRep<Authmap, ChunkBounds>: in-band red-black-tree node Rep
  for BackendArena that stores the tree pointers inside the free block
  itself. Supports CHERI provenance via the Authmap mechanism (the same
  write-once cap table used by dealloc_meta_data); node accesses go
  through Authmap::amplify_from_address. can_consolidate refuses
  merging across MIN_CHUNK_SIZE boundaries to keep BackendArena's
  MAX_SIZE_BITS == MIN_CHUNK_BITS invariant intact.

* SmallArenaRange<Authmap>::Type<ParentRange>: a wrapper around
  BackendArena<InplaceRep<Authmap, ChunkBounds>, MIN_BITS,
  MIN_CHUNK_BITS> presenting the standard Range interface. Serves
  arbitrarily-unit-aligned sizes (not just powers of two). Replaces
  the historical alloc_range_with_leftover with
  alloc_size_with_align(size, align), which makes alignment an
  explicit parameter and donates the unit-aligned tail back to the
  arena.

* amplify_from_address<bool potentially_out_of_range>(address_t) on
  DummyAuthmap (pass-through reinterpret_cast) and BasicAuthmap (lookup
  + pointer_offset). Lets InplaceRep recover an arena cap for an
  address it knows only as an integer.

* New test target backend_arena_inplace covering the rep accessor
  round-trips, arena add/remove/consolidation/carve, a 30-seed x 500-op
  stress, the can_consolidate chunk-boundary refusal, and four
  alloc_size_with_align scenarios (exact fit, pow2 align over non-pow2
  size, align larger than size, MIN_CHUNK_SIZE bypass).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase B of the SmallBuddyRange -> SmallArenaRange migration.

* StandardLocalState and MetaProtectedRangeLocalState gain an
  Authmap template parameter, plumbed through alongside Pagemap.
  Both configs and the domestication test pass their Authmap into
  the LocalState instantiation.
* The three SmallBuddyRange uses in the meta-range pipes are
  replaced with SmallArenaRange<Authmap>.
* BackendAllocator::alloc_meta_data calls the new
  alloc_size_with_align(size, alignment) primitive, with
  alignment = max(next_pow2(size), MetaRangeT::UNIT_SIZE). The
  next_pow2 keeps Phase B behaviour identical to the previous
  buddy-rounded path; the max floors the alignment at the meta
  range's UNIT_SIZE so alloc_size_with_align's precondition holds
  for any positive size.
* FixedRangeConfig's inline Authmap gains amplify_from_address
  (the new SmallArenaRange path needs it).

SmallBuddyRange.h is now orphaned but stays in tree until Phase D
removes it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SmallBuddyRange was orphaned by the previous commit; LargeBuddyRange,
SmallBuddyRange and their shared buddy.h are now all dead. Delete
them (-848 lines) and clean up stale references in comments, README,
AddressSpace.md, and the MIN_HEAP_SIZE_FOR_THREAD_LOCAL_BUDDY
constant (renamed ..._CACHE).

Now that there is only one Arena type and the Small/Large pair of
range adapters built on it, rename for symmetry and to drop the
redundant 'Backend' prefix:

  BackendArena       -> Arena
  BackendArenaBins   -> ArenaBins
  BackendArenaRange  -> LargeArenaRange   (pairs with SmallArenaRange)

Files and test directories renamed to match. The test-internal
'using Arena = ...<...>;' aliases become 'TestArena' to avoid
colliding with the renamed class template.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
BackendArenaRange / SmallArenaRange accept any UNIT_SIZE-aligned
request; the pow2 rounding the backend was applying to metadata
sizes was a leftover from the buddy era and inflated every slab's
metadata block to the next power of two. With a ClientMeta
provider whose per-slab storage is non-pow2 (e.g. allocation
bitmap + small fixed header), this rounding doubled the metadata
overhead.

Publish MIN_META_ALIGN on each LocalState (= MetaRange::UNIT_SIZE).
Add BackendAllocator::meta_size_round, which pads to MIN_META_ALIGN
and steps up to MIN_CHUNK_SIZE for requests that would bypass the
small range to the parent. Replace all four next_pow2-rounded
metadata sites in backend.h with this helper.
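The padding can be sketched as follows. It uses hypothetical constants (`MIN_META_ALIGN = 64`, `MIN_CHUNK_SIZE = 16384`) and an assumed reading of "steps up": requests that the small range would pass to its parent are rounded to a whole number of chunks. The `meta_size_round` here is an illustration, not the code in backend.h.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t MIN_META_ALIGN = 64; // hypothetical
constexpr std::size_t MIN_CHUNK_SIZE = 16384;

constexpr std::size_t align_up(std::size_t x, std::size_t a)
{
  return (x + a - 1) & ~(a - 1);
}

constexpr std::size_t next_pow2(std::size_t x)
{
  std::size_t p = 1;
  while (p < x)
    p <<= 1;
  return p;
}

// Pad to the small range's unit. Requests too large for the small range
// bypass to the parent, so (assumption) round those up to whole chunks.
constexpr std::size_t meta_size_round(std::size_t size)
{
  const std::size_t padded = align_up(size, MIN_META_ALIGN);
  if (padded >= MIN_CHUNK_SIZE)
    return align_up(padded, MIN_CHUNK_SIZE);
  return padded;
}
```

For a 3000-byte request, this reserves 3008 bytes rather than the 4096 bytes produced by next_pow2 rounding.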

A new test func/client_meta_nonpow2 installs a ClientMetaDataProvider
whose per-slab storage is non-pow2 and exercises alloc/dealloc
round-tripping across several sizeclasses; any disagreement between
alloc-side and dealloc-side rounding would trip the meta range's
dealloc_range assertions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rewrite the design intro as a docs/ companion to AddressSpace.md:

* Drop the LargeBuddyRange framing (that range no longer exists).
* Align the mechanism description with the in-tree code, which builds
  positive serve masks rather than the inverse skip masks the original
  sketch used.
* Add brief sections on the two-tree structure (one bin tree per
  non-empty bin + one range tree for coalescing) and on the two reps
  Arena ships with: PagemapRep behind LargeArenaRange for whole-chunk
  allocations, InplaceRep behind SmallArenaRange for sub-chunk metadata.
* Link out to AddressSpace.md and the prototype scripts.

PLAN.md is the working planning document; untrack it and add to
.gitignore so it stays local.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* arenabins.h: Bitmap needs a constexpr default ctor for clang's
  require_constant_initialization on threadalloc.h's default_alloc.
* redblacktree.h: print had an uninitialised s_indent; inlining changes
  from the new neighbours / for_each helpers were enough to make GCC's
  -Wmaybe-uninitialized fire.
* Several test locals are only consumed by SNMALLOC_ASSERT and become
  unused in -DNDEBUG builds. Wrap with UNUSED(...) to match the
  convention already in the file.
* GCC's -Warray-bounds cannot prove the upper bound of indices read
  from compile-time tables once asserts are stripped. Add
  SNMALLOC_ASSUME alongside the existing asserts in
  ArenaBins::Bitmap::find_for_request and the test's mock_index.
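The assert-plus-assume pattern from the last bullet works roughly as shown below. This sketch does not reproduce SNMALLOC_ASSUME. `SKETCH_ASSUME`, `bin_of`, `bin_counts` and `record_request` are hypothetical, and the macro relies on the compiler builtins `__builtin_unreachable` (GCC/Clang) and `__assume` (MSVC). The idea is that `assert` disappears under `-DNDEBUG`, but the assumption still tells the optimiser the index bound, so `-Warray-bounds` can see that the access is in range.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for SNMALLOC_ASSUME. The condition must really
// hold: assuming something false is undefined behaviour.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_ASSUME(x) \
    do \
    { \
      if (!(x)) \
        __builtin_unreachable(); \
    } while (0)
#elif defined(_MSC_VER)
#  define SKETCH_ASSUME(x) __assume(x)
#else
#  define SKETCH_ASSUME(x) ((void)0)
#endif

// Illustrative tables: a compile-time map from request size to bin id,
// and a runtime array indexed by that bin id.
constexpr size_t NUM_SIZES = 8;
constexpr size_t NUM_BINS = 4;
constexpr unsigned char bin_of[NUM_SIZES] = {0, 1, 1, 2, 2, 2, 3, 3};
static unsigned bin_counts[NUM_BINS] = {};

inline void record_request(size_t n)
{
  assert(n < NUM_SIZES); // checked only in debug builds
  SKETCH_ASSUME(n < NUM_SIZES); // bound survives -DNDEBUG
  size_t bin = bin_of[n]; // index read from a compile-time table
  assert(bin < NUM_BINS);
  SKETCH_ASSUME(bin < NUM_BINS); // lets the compiler bound the next access
  bin_counts[bin]++;
}
```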

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… sign-conv

* sizeclasstable.h: add operator!= alongside operator==. Apple Clang in
  gnu++17 mode does not synthesise it from ==, so the release-rounding
  test fails to build on macOS.
* arena.cc test mock_index: in release builds, GCC's -Warray-bounds still
  saw an out-of-bounds path at the mock_store[...] read site, even with
  SNMALLOC_ASSUME inside mock_index. Replace the indirect probe in
  can_consolidate with an explicit in-range guard that returns false for
  out-of-arena addresses (matching PagemapRep semantics: no neighbour
  outside the arena). The initializer-list size_t loop in
  test_large_size_roundtrip now uses size_t{} literals to silence
  -Wsign-conversion under the clang+UBSan+TSan CI configuration.
* smallarenarange.cc: MSVC rejects alignas with values as large as
  MIN_CHUNK_SIZE (16384) on static storage. Oversize both backing
  buffers by one chunk and align the base up at runtime via
  base_addr() / pool_base(). Fix the resulting -Wsign-conversion in
  the pointer-difference index calculation.
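The smallarenarange.cc workaround oversizes the buffer by one chunk and aligns the base up at runtime. A rough sketch of that approach follows. `CHUNK`, `backing`, `base_addr_sketch` and `chunk_index` are hypothetical names, and the 16 KiB value mirrors the MIN_CHUNK_SIZE quoted above. The explicit `static_cast<size_t>` on the pointer difference is the kind of fix that silences `-Wsign-conversion`.

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t CHUNK = 16384; // assumed MIN_CHUNK_SIZE
constexpr size_t USABLE_CHUNKS = 4; // illustrative pool size

// No alignas(CHUNK) here: one spare chunk of slack guarantees that
// USABLE_CHUNKS aligned chunks fit wherever the linker places the array.
static unsigned char backing[(USABLE_CHUNKS + 1) * CHUNK];

// Round the buffer's start up to the next CHUNK boundary.
inline unsigned char* base_addr_sketch()
{
  uintptr_t raw = reinterpret_cast<uintptr_t>(backing);
  uintptr_t aligned = (raw + CHUNK - 1) & ~uintptr_t(CHUNK - 1);
  return backing + (aligned - raw);
}

// Chunk index of p within the aligned region. The pointer difference is a
// signed ptrdiff_t; convert it explicitly before unsigned arithmetic.
inline size_t chunk_index(const unsigned char* p)
{
  return static_cast<size_t>(p - base_addr_sketch()) / CHUNK;
}
```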

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 14 changed the sentinel sizeclass slot to be the
zero-initialised row 0 with both size = 0 and slab_mask = 0.
Pre-Phase 14, the sentinel had slab_mask = SIZE_MAX (from
`size - 1` underflow in the large-class init loop). The bounds-
checked memcpy shim (`bounds_checks.h::check_bound`) calls
`remaining_bytes` unconditionally on every memcpy destination,
including foreign (non-snmalloc) heap addresses reached via
LD_PRELOAD before snmalloc has seen them. With slab_mask = 0,
`start_of_object(addr) = addr`, so `remaining_bytes = 0`, and
every memcpy on a foreign pointer fatals.

Restore the pre-Phase-14 behaviour by explicitly setting the
sentinel slot's slab_mask to ~size_t(0), so `start_of_object`
collapses to 0 and `remaining_bytes` underflows to a huge value
that trivially passes any reasonable bound check.
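The arithmetic in this commit message can be checked with a simplified, hypothetical model of a sizeclass row. The model assumes `start_of_object` masks the address with `~slab_mask` to find the slab base and then rounds down to a multiple of `size`. It also assumes `remaining_bytes` is `start + size - addr`. `SizeClassRow` and both function bodies are illustrations, not snmalloc's definitions.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified model of a sizeclass table row (assumption: the real row
// carries more fields, e.g. a precomputed division multiplier).
struct SizeClassRow
{
  size_t size;
  size_t slab_mask;
};

inline uintptr_t start_of_object(const SizeClassRow& sc, uintptr_t addr)
{
  uintptr_t slab_base = addr & ~static_cast<uintptr_t>(sc.slab_mask);
  if (sc.size == 0) // sentinel row: no objects to round down to
    return slab_base;
  return slab_base + ((addr - slab_base) / sc.size) * sc.size;
}

// Bytes from addr to the end of its object; unsigned arithmetic wraps.
inline size_t remaining_bytes(const SizeClassRow& sc, uintptr_t addr)
{
  return static_cast<size_t>(start_of_object(sc, addr) + sc.size - addr);
}
```

With `slab_mask = 0` the start is `addr` itself and the result is 0, so every bound check fails. With `slab_mask = ~size_t(0)` the start collapses to 0, and `0 + 0 - addr` wraps to a huge value.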

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot commented Jun 10, 2026


Coverage report (cross-platform merged)

Lines covered (src/snmalloc/**): 3045 / 3430 (88.78%)

Merged line coverage is the per-line union across all platforms. Region coverage is reported per-platform only; no cross-platform region total is computed.

Per-directory breakdown

| Directory | Lines covered | Lines executable | % |
|---|---|---|---|
| src/snmalloc/stl | 2 | 4 | 50.00% |
| src/snmalloc/override | 94 | 156 | 60.26% |
| src/snmalloc/pal | 330 | 449 | 73.50% |
| src/snmalloc/mitigations | 14 | 18 | 77.78% |
| src/snmalloc/aal | 51 | 59 | 86.44% |
| src/snmalloc/ds_core | 381 | 440 | 86.59% |
| src/snmalloc/global | 265 | 306 | 86.60% |
| src/snmalloc/ds | 338 | 365 | 92.60% |
| src/snmalloc/mem | 839 | 876 | 95.78% |
| src/snmalloc/ds_aal | 108 | 112 | 96.43% |
| src/snmalloc/backend_helpers | 521 | 540 | 96.48% |
| src/snmalloc/backend | 102 | 105 | 97.14% |
Per-platform contributions (advisory)
| Platform | Lines covered | Lines executable | Lines % | Regions covered | Regions executable | Regions % |
|---|---|---|---|---|---|---|
| freebsd-14 | 6081 | 6674 | 91.11% | 6007 | 9481 | 63.36% |
| linux-self-host-shim-checks | 6214 | 6811 | 91.23% | 6269 | 10283 | 60.96% |
| linux-self-host-shim-checks-selfhost | 1870 | 2565 | 72.90% | 1900 | 3777 | 50.30% |
| macos-14 | 6093 | 6633 | 91.86% | 6118 | 9835 | 62.21% |
| windows-2022 | 6002 | 6653 | 90.21% | 6042 | 9789 | 61.72% |

mjp41 and others added 2 commits June 10, 2026 10:32
1. start_of_object: replace inlined 64-bit div_mult fast path
   with a call to slab_index, which has the correct 32-bit
   offset / size fallback. The inlined version overflowed
   size_t on 32-bit arm-linux-gnueabihf, producing wrong
   remaining_bytes (off by one allocation size) for
   func-memory-check::test_remaining_bytes.

2. func-largearenarange-check test: pass MinBaseSizeBits<Pal>()
   as the MIN_REFILL_SIZE_BITS template parameter so the first
   parent allocation is at least the PAL's minimum reserve
   size. Windows VirtualAlloc cannot reserve below 64 KiB
   allocation granularity, so PalRange returned nullptr for
   the test's 16 KiB request. Matches what production code
   (standard_range.h, meta_protected_range.h) does.
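Item 1 concerns a division-by-multiplication fast path that is only safe when the product fits in `size_t`. The sketch below is a hypothetical illustration of that hazard and of a 32-bit fallback, not snmalloc's `slab_index`. `div_mult_for` and `slab_index_sketch` are made-up names. The multiplier is `ceil(2^32 / size)`, which is exact only under the stated assumption that both `offset` and `size` are below 2^16.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical reciprocal ceil(2^32 / size). For offset, size < 2^16 the
// error term keeps (offset * mult) >> 32 equal to offset / size.
constexpr uint64_t div_mult_for(size_t size)
{
  return ((uint64_t{1} << 32) + size - 1) / size;
}

inline size_t slab_index_sketch(size_t offset, size_t size)
{
  if constexpr (sizeof(size_t) >= sizeof(uint64_t))
  {
    // 64-bit size_t: offset * mult is below 2^48, so the fast path is exact.
    return static_cast<size_t>((offset * div_mult_for(size)) >> 32);
  }
  else
  {
    // 32-bit size_t: computing the same product in size_t would wrap
    // (the overflow item 1 describes), so divide directly instead.
    return offset / size;
  }
}
```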

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extracted the exact diff from the CI clang-format-15 run on this PR
and applied it. 13 files: whitespace, ternary line-breaks, include
reordering, friend-decl single-line, init-list one-per-line for
size_t{} literals.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>