New Arena Range #855
Draft
mjp41 wants to merge 31 commits into
Phase 1 of the BackendArena refactor (see PLAN.md). Introduces
src/snmalloc/backend_helpers/backend_arena_bins.h, which owns the
chunk-unit size-class scheme and the non-empty-bins bitmap that
later phases will use to drive bin selection inside BackendArena.
Public surface (the integration contract for future phases):
* range_t, carve_t, carve(block, n_chunks), max_supported_chunks().
* Nested Bitmap with add(block), find_for_request(n_chunks),
clear(bin_id), and TOTAL_BINS.
Everything else (the size-class encoding, the per-SC tables, the
free-side classifier bin_index) is private. The unit test reaches
it via a friend struct BackendArenaBinsTestAccess<B> that is only
forward-declared in the header and defined in the test translation
unit, so the production header carries no test-only surface.
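A minimal sketch of the friend-based test access pattern, using hypothetical names (`Bins`, `BinsTestAccess`) and a toy `bin_index` in place of the real classifier. The production part only forward-declares the accessor and names one specialisation as a friend; the accessor's definition lives with the test.

```cpp
#include <cstddef>

// Production header: forward declaration only, no test-only definitions.
template<size_t B>
struct BinsTestAccess;

template<size_t B>
class Bins
{
  // Grants the (not yet defined) accessor access to private members.
  friend struct BinsTestAccess<B>;

  // Hypothetical private classifier standing in for the real bin_index.
  static constexpr size_t bin_index(size_t n_units)
  {
    return n_units == 0 ? 0 : (n_units - 1) / B;
  }

public:
  static constexpr size_t TOTAL_BINS = 8;
};

// Test translation unit: the only place the accessor is defined.
template<size_t B>
struct BinsTestAccess
{
  static constexpr size_t bin_index(size_t n_units)
  {
    return Bins<B>::bin_index(n_units);
  }
};
```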
Implementation:
* Two power-of-two-sized rodata tables indexed by raw sc id with
shift+add. bitmap_info_t (4 words via alignas) feeds
Bitmap::find_for_request; carve_info_t (2 words) feeds carve
and the free-side cascade-fit predicate.
* bitmap_info_t fields (start_word, first_mask, second_mask) are
pre-shifted into the bitmap's word layout so find_for_request
is two ANDs on the hot word + word-boundary fall-through.
* Tables are populated at constexpr build time by BinTable()
consuming the canonical bin_subsets table; the strict-chain
invariant on bin_subsets is checked at compile time via throw
in the constexpr constructor.
* Fast path uses the runtime CLZ intrinsic via the new
bits::to_exp_mant<MANTISSA_BITS, LOW_BITS> (paired with the
existing to_exp_mant_const); the _const variant is restricted
to constexpr table construction and test static_asserts.
bits::prev_pow2_bits / prev_pow2_bits_const are added alongside
for symmetric runtime / constexpr access.
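The word-level search can be illustrated with a simplified bitmap. This sketch assumes 64-bit words and, unlike the table-driven scheme above (per-sizeclass start_word / first_mask / second_mask), treats every bin at or above a starting bin as able to serve the request; `SketchBitmap` and `find_from` are hypothetical names. It still shows the shape of the hot path: one AND on the starting word, then a fall-through to later words.

```cpp
#include <cstddef>
#include <cstdint>

template<size_t TOTAL_BINS>
class SketchBitmap
{
  static constexpr size_t WORD_BITS = 64;
  static constexpr size_t WORDS = (TOTAL_BINS + WORD_BITS - 1) / WORD_BITS;
  uint64_t words[WORDS]{};

  // Count trailing zeros; x must be non-zero.
  static size_t ctz(uint64_t x)
  {
    size_t n = 0;
    while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
    return n;
  }

public:
  void add(size_t bin)
  {
    words[bin / WORD_BITS] |= uint64_t{1} << (bin % WORD_BITS);
  }

  void clear(size_t bin)
  {
    words[bin / WORD_BITS] &= ~(uint64_t{1} << (bin % WORD_BITS));
  }

  // Lowest non-empty bin >= min_bin, or TOTAL_BINS if there is none.
  size_t find_from(size_t min_bin) const
  {
    if (min_bin >= TOTAL_BINS)
      return TOTAL_BINS;
    size_t w = min_bin / WORD_BITS;
    uint64_t first_mask = ~uint64_t{0} << (min_bin % WORD_BITS);
    uint64_t hit = words[w] & first_mask; // the single AND on the hot word
    while (hit == 0)
    {
      if (++w == WORDS)
        return TOTAL_BINS;
      hit = words[w]; // word-boundary fall-through
    }
    return w * WORD_BITS + ctz(hit);
  }
};
```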
The new test cross-checks bin classification, carve, and
find_for_request against a brute-force scanner derived directly
from bin_subsets, for B in {1, 2, 3}. Exhaustive single-bit and
multi-bit randomised bitmap states are covered, plus word-boundary
straddle cases enumerated automatically from the table.
No production code path is changed: BackendArenaBins<B> is unused
in the build until later phases compose it into BackendArena.
Also lands PLAN.md (single forward-looking spec for the whole
BackendArena refactor) and claude.md (development guidance for
this branch).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a public RBTree method that returns the strict neighbours of a
probe value K in a single root-to-leaf descent:
- every left turn (parent > K) records the parent as the current
successor candidate
- every right turn (parent < K) records the parent as the current
predecessor candidate
At loop exit the tightest neighbours are returned as
`stl::Pair<K, K>{pred, succ}`; either component is `Rep::null` when no
such neighbour exists.
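The single-descent search can be sketched on a plain (unbalanced) binary search tree. Colouring and the Rep abstraction are omitted, `Node` is a hypothetical type, and 0 stands in for `Rep::null`, so 0 is assumed never to be stored as a key.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

struct Node
{
  uintptr_t key;
  Node* left = nullptr;
  Node* right = nullptr;
};

// Returns {pred, succ}: the strict neighbours of k, with 0 meaning "none".
std::pair<uintptr_t, uintptr_t> neighbours(const Node* root, uintptr_t k)
{
  uintptr_t pred = 0;
  uintptr_t succ = 0;
  for (const Node* n = root; n != nullptr;)
  {
    assert(n->key != k && "precondition: k is not in the tree");
    if (n->key > k)
    {
      succ = n->key; // left turn: parent is a tighter successor
      n = n->left;
    }
    else
    {
      pred = n->key; // right turn: parent is a tighter predecessor
      n = n->right;
    }
  }
  return {pred, succ};
}
```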
The "K not in tree" precondition is asserted via SNMALLOC_ASSERT, which
expands to nothing in Release. BackendArena, the planned caller, relies
on the invariant that two free blocks cannot share a starting address.
test_neighbours exercises the algorithm against std::set::lower_bound /
upper_bound as oracle. Boundary probes (K=0, K=size+1) plus random
probes that skip oracle hits keep every call within the precondition.
The sweep reuses the existing test()'s size range but caps to the first
few seeds per size to keep the per-test time budget in check.
PLAN.md Phase 2 spec records the K-not-in-tree precondition.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce BackendArena, a free-range allocator that manages chunks within a bounded arena using a dual red-black-tree scheme:
- Bin trees: one per size-class bin, for best-fit allocation lookups driven by a non-empty-bins bitmap.
- Range tree: keyed by address, for O(log n) neighbour lookup during consolidation of adjacent free blocks.
Key design decisions:
- Single-chunk (min-size) blocks live only in bin tree 0, not the range tree, keeping range-tree overhead proportional to multi-chunk blocks. The min-size bin is probed as a fallback during consolidation.
- Three-variant encoding (Min/TwoMin/Large) in pagemap metadata bits avoids a range-tree lookup for the common 1-chunk and 2-chunk cases.
- WordRef handle and TreeRep<RefFn> template follow the existing BackendStateWordRef / BuddyChunkRep patterns from largebuddyrange.h.
- Consolidation in add_block checks predecessor then successor, merging adjacent blocks and re-inserting the result.
- remove_block uses Bins::carve to split oversized blocks, re-inserting remainders.
Also:
- Add neighbours() to RBTree: single-descent strict-neighbour query.
- Add for_each() to RBTree: in-order traversal for invariant checking.
- Make BackendArenaBins::bin_index public (sole consumer is BackendArena).
- Add BackendArenaBins::Bitmap::test() for invariant verification.
- Five-clause structural invariant gated on bool parameter (defaults to Debug), checked at entry/exit of add_block and remove_block.
- Comprehensive test suite: word-level round-trips, tree operations, empty-state invariant, add/remove without consolidation, consolidation case matrix (8 pred/succ combinations), overflow detection, and randomised stress test with oracle validation (50 seeds x 500 ops).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
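Address-ordered consolidation (predecessor first, then successor) can be sketched with `std::map` standing in for the range tree. This is an illustration under simplifying assumptions: there are no bin trees, no Min/TwoMin/Large variants and no can_consolidate hook, and `FreeRanges` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

class FreeRanges
{
  std::map<uintptr_t, size_t> blocks; // start address -> size in bytes

public:
  // Inserts [addr, addr + size), merging with an adjacent predecessor and/or
  // successor. Assumes the new range does not overlap existing free blocks.
  void add_block(uintptr_t addr, size_t size)
  {
    auto succ = blocks.upper_bound(addr); // first block starting above addr
    if (succ != blocks.begin())
    {
      auto pred = std::prev(succ);
      if (pred->first + pred->second == addr) // predecessor ends where we start
      {
        addr = pred->first;
        size += pred->second;
        blocks.erase(pred); // erasing pred does not invalidate succ
      }
    }
    if (succ != blocks.end() && succ->first == addr + size) // successor starts where we end
    {
      size += succ->second;
      blocks.erase(succ);
    }
    blocks[addr] = size;
  }

  size_t count() const { return blocks.size(); }

  size_t size_at(uintptr_t addr) const
  {
    auto it = blocks.find(addr);
    return it == blocks.end() ? 0 : it->second;
  }
};
```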
Tests two BackendArena instances sharing a single MockRep pagemap:
- Basic migration: blocks move between arenas
- Consolidation after migration: gap block consolidates with neighbours
- Randomised stress: 50 seeds x 500 ops with add/remove/migrate
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 9 of the BackendArena refactor:
- Make BackendArena fully generic over its Rep, mirroring the
Buddy/Rep layering. The class no longer holds any bit-layout
constants; Rep supplies the full RBTree Rep for both the bin
trees and the range tree, owning red-bit (and any tag-bit)
packing privately.
- Rep concept now requires:
using BinRep -- full RBTree Rep for the bin trees
using RangeRep -- full RBTree Rep for the range tree
get_variant / set_variant
get_large_size_chunks / set_large_size_chunks
can_consolidate(higher_addr) -> bool
- Add can_consolidate checks in add_block before each (predecessor
and successor) merge, and update the invariants to tolerate
boundary-blocked adjacency.
- MockRep grows inner BinRep / RangeRep structs that each provide
the full RBTree Rep interface over the mock-entry array, with a
private red-bit at bit 8.
- New tests verify that can_consolidate returning false at a
specific address prevents predecessor- and successor-side merges
independently, including at min-block boundaries.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 10 of the BackendArena work. Adds the BackendArenaRange wrapper
that drops into the LargeBuddyRange slot, generalises BackendArena and
BackendArenaBins on MIN_SIZE_BITS, and converts the arena/range API
boundary to bytes throughout.
* BackendArenaRange<REFILL_SIZE_BITS, MAX_SIZE_BITS, Pagemap,
MIN_REFILL_SIZE_BITS> with a PagemapRep that packs variant tag, RB
red bit and the consolidated large-block size into the first pagemap
word, and uses the second word for in-tree links. Provides
alloc_range / dealloc_range / add_range over the bin-tree arena.
* parent_dealloc unifies the old parent_dealloc_range and
dealloc_overflow paths; add_range uses bits::align_up /
bits::align_down for parent-input trimming.
* BackendArenaBins<B, MIN_SIZE_BITS> generalises the bin scheme so its
range_t, carve and find_for_request all speak bytes (multiples of
UNIT_SIZE = 1 << MIN_SIZE_BITS). Tests cover MIN_SIZE_BITS in {0, 4,
14}.
* BackendArena<Rep, MIN_SIZE_BITS, MAX_SIZE_BITS>: add_block /
remove_block / variant_of / insert_block / range_from_addr /
invariants all work in bytes. remove_block returns a scalar address
(0 = failure); the size half of the old pair was tautological.
CHUNKS_BITS / addr_to_chunk / chunk_to_addr removed.
* PagemapRep::get_large_size / set_large_size are bytes-in / bytes-out;
storage still scales by MIN_SIZE_BITS so the shifted field fits a
pagemap word.
* Tests: func-backend_arena_range exercises alloc/dealloc/refill/large
paths against a mock parent; func-backend_arena and
func-backend_arena_bins updated for the bytes-throughout convention
(chunk_size(N) helper at the test boundary).
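To make the single-word packing concrete, here is a sketch of a red bit, a two-bit variant tag and a large-size field sharing one word, with the size stored in units of `1 << MIN_SIZE_BITS` but exposed in bytes. The bit positions, the `PackedWord` name and the variant width are assumptions for illustration, not PagemapRep's actual layout.

```cpp
#include <cstddef>
#include <cstdint>

template<unsigned FIRST_FREE_BIT, unsigned MIN_SIZE_BITS>
struct PackedWord
{
  static constexpr unsigned RED_BIT = FIRST_FREE_BIT;
  static constexpr unsigned VARIANT_SHIFT = FIRST_FREE_BIT + 1;
  static constexpr uintptr_t VARIANT_MASK = uintptr_t{3} << VARIANT_SHIFT;
  static constexpr unsigned LARGE_SIZE_SHIFT = VARIANT_SHIFT + 2;
  static constexpr uintptr_t LOW_MASK = (uintptr_t{1} << LARGE_SIZE_SHIFT) - 1;

  uintptr_t word = 0; // bits below FIRST_FREE_BIT belong to other users

  bool is_red() const { return ((word >> RED_BIT) & 1) != 0; }

  void set_red(bool red)
  {
    uintptr_t bit = uintptr_t{1} << RED_BIT;
    word = red ? (word | bit) : (word & ~bit);
  }

  unsigned get_variant() const
  {
    return unsigned((word & VARIANT_MASK) >> VARIANT_SHIFT);
  }

  void set_variant(unsigned v)
  {
    word = (word & ~VARIANT_MASK) | (uintptr_t(v & 3) << VARIANT_SHIFT);
  }

  // Bytes in / bytes out; storage is in units of 1 << MIN_SIZE_BITS.
  size_t get_large_size() const
  {
    return size_t(word >> LARGE_SIZE_SHIFT) << MIN_SIZE_BITS;
  }

  void set_large_size(size_t bytes)
  {
    word = (word & LOW_MASK) |
      (uintptr_t(bytes >> MIN_SIZE_BITS) << LARGE_SIZE_SHIFT);
  }
};
```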
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Every `git commit`, `--amend`, `push`, `reset`, `rebase`, or `gh pr create` must be preceded by an explicit ask_user approval for that specific commit/PR. "Begin the next phase" does not authorise committing later work — only "commit this" for the change in hand counts as approval. If a commit has already been made without approval, offer `git reset --soft HEAD~1` to undo it while preserving the staged changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mechanical substitution of every LargeBuddyRange instantiation in the default in-tree range pipelines:
- src/snmalloc/backend/standard_range.h (GlobalR, LargeObjectRange).
- src/snmalloc/backend/meta_protected_range.h (GlobalR, CentralObjectRange, CentralMetaRange, the conditional_t huge-page cache, ObjectRange, MetaRange).
After this change snmalloc uses the BackendArena bin-tree allocator instead of the power-of-two buddy for all large-range management in the default pipelines. LargeBuddyRange and BuddyChunkRep remain in the tree, available for alternative configurations.
Two issues uncovered during Phase 12 testing and fixed here:
1. backend_arena.h: BackendArena::add_block's successor-min branch called Rep::can_consolidate(succ_addr) before contains_min(succ_addr) confirmed succ_addr is in our region. For a block added at the very top of a registered region (e.g. the last 8 MiB of a 256 MiB fixed region), succ_addr = addr + size sits one chunk past the pagemap's mapped backing, and the can_consolidate probe segfaults. The fix reorders the checks so the tree-membership test gates the pagemap read, matching the documented pattern in buddy.h:90-93.
   Regression coverage: MockRep gains a per-chunk `boundary` field on `mock_entry`. `MockRep::can_consolidate(addr)` now returns `!mock_store[mock_index(addr)].boundary`, faithful to the real `PagemapRep::can_consolidate` reading `entry.is_boundary()`. The `mock_index` bounds assertion fires on any out-of-range probe, so the unsafe pattern trips in unit tests rather than only as a segfault in production. A new test_block_at_arena_top_edge adds a block whose succ_addr would address chunk MOCK_ARENA_CHUNKS; without the reorder this reproduces the original failure.
   This unification also subsumed the previous BoundaryMockRep and its boundary_addrs global std::set: the four boundary tests now run on Arena<K> and set mock_store[mock_index(addr)].boundary = true instead. Net -35 lines in backend_arena.cc.
2. backend_arena_bins.h: the BinTable constexpr constructor used throw "..." as a constexpr-eval-fails trick to surface invariant violations as compile errors. throw requires exception support, which is disabled in the main allocator (-fno-exceptions), so this broke Phase 12 builds. Replaced with SNMALLOC_CHECK(false && "..."), which calls a non-constexpr error path and achieves the same compile-time failure without runtime exception machinery.
Full ctest suite passes (86/86, --timeout 120 -j 4).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
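The replacement technique can be sketched as follows: a constexpr constructor validates its invariant and, on failure, calls a function that is not constexpr, so a violated invariant becomes a compile error during constant initialisation without needing exceptions. `table_invariant_failed` is a hypothetical stand-in for the non-constexpr error path SNMALLOC_CHECK reaches, and the table contents are placeholders.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Deliberately not constexpr: reaching this call during constant evaluation
// makes the enclosing constant expression ill-formed, i.e. a compile error.
inline void table_invariant_failed(const char* msg)
{
  std::fputs(msg, stderr);
  std::abort();
}

struct StrictChainTable
{
  size_t limit[4]{};

  constexpr StrictChainTable()
  {
    for (size_t i = 0; i < 4; i++)
      limit[i] = (i + 1) * (i + 1); // placeholder data: 1, 4, 9, 16
    for (size_t i = 1; i < 4; i++)
      if (limit[i] <= limit[i - 1])
        table_invariant_failed("table is not strictly increasing");
  }
};

// Constant initialisation forces the check to run at compile time.
constexpr StrictChainTable table{};
static_assert(table.limit[3] == 16, "table built at compile time");
```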
On closer inspection at the start of Phase 13, the two-path conditional in BackendArenaRange::refill turns out to be load-bearing rather than vestigial:
- The aligned-parent path serves caller sizes up to (1 << MAX_SIZE_BITS) - 1; the unaligned-parent path caps at ~REFILL_SIZE / 2 because of its while (needed_size <= refill_size) guard. Unifying on the unaligned strategy would reduce capability for aligned-parent configs.
- The aligned-parent carve shortcut is about precision, not performance: it hands the caller's `size` bytes back directly and calls add_range with refill_size - size, which is strictly less than refill_size and so satisfies add_block's size < 2^MAX_SIZE_BITS precondition even when REFILL_SIZE_BITS == MAX_SIZE_BITS (the LargeObjectRange config). A unified "add the whole refill then recurse" path violates that precondition for the same config, and the workarounds (cut LocalCacheSizeBits by 1, or bump MAX_SIZE_BITS by 1) carry real cost for no behavioural win.
- LargeBuddyRange would still consume Aligned under the agreed-minimal (a)+(ii) scope, so the Aligned field's footprint in pass-through ranges doesn't shrink, which defeats the only structural-cleanup motivation.
The BackendArena refactor (Phases 1-12) ends with Phase 12.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the tagged small/large encoding and the leading-zero-count
large-class indexing with a single uniform exp+mantissa scheme:
value == 0 : unmapped sentinel
value in [1, 1 + NUM_SMALL_SIZECLASSES) : small (sc = value - 1)
value in [1 + NUM_SMALL_SIZECLASSES,
1 + NUM_SMALL_SIZECLASSES + NUM_LARGE_CLASSES)
: large (lc = ...)
Small classes use `from_exp_mant(sc)` (unchanged). Large classes
continue the same exp+mantissa namespace as
`from_exp_mant(NUM_SMALL_SIZECLASSES + lc)`. The discriminator tag bit
is gone — small and large share one contiguous index space — and the
sentinel slot 0 lets the size-lookup fast path return 0 / 0 for
unmapped pointers without a branch.
The `SIZECLASS_REP_SIZE` / `REMOTE_BACKEND_MARKER` / `REMOTE_MIN_ALIGN`
chain is re-derived from the new `SIZECLASS_BITS` (renamed from
`TAG_SIZECLASS_BITS`); RED_BIT / VARIANT_SHIFT / LARGE_SIZE_SHIFT in
`backend_arena_range.h` and RED_BIT in `largebuddyrange.h` derive
from the new public `MetaEntryBase::BACKEND_LAYOUT_FIRST_FREE_BIT` so
future widenings propagate automatically.
A new `MAX_LARGE_SIZECLASS_SIZE` constant gates user-supplied sizes at
the API boundary (`alloc_not_small`, `round_size`, `check_size`,
`rust_realloc`) — replacing the loose `> 2^63` bound. `ENCODED_ADDRESS_BITS`
caps the encoding at `BITS - 1` so the constant survives 32-bit
platforms where `DefaultPal::address_bits == BITS`.
The pre-Phase-13 `large_size_to_chunk_sizeclass` helper is removed —
its `+NUM_SMALL_SIZECLASSES` / `-NUM_SMALL_SIZECLASSES` round-trip
through an `lc` index cancels in the uniform scheme, so
`size_to_sizeclass_full`'s large branch inlines the `to_exp_mant`
directly.
Front-end semantics are unchanged: `large_size_to_chunk_size` still
returns `next_pow2(size)` and the front end still reserves pow2 chunk
sizes. The non-pow2 large sizeclasses exist in `sizeclass_metadata`
(with `slab_mask = info.align - 1`) but are unreachable from
`size_to_sizeclass_full` until Phase 15 drops the `next_pow2` rounding.
Tests:
- `sizeclass.cc`: sentinel sanity, raw-value adjacency, range disjoint,
large monotonicity, pow2 round-trip, non-pow2 rounds up.
- `rounding.cc`: extends to pow2 large sizeclasses, verifying
`index_in_object` / `is_start_of_object` at representative offsets.
- `cheri.cc`: large-class verification loop bound updated to
`NUM_LARGE_CLASSES`.
- Loop bounds in tests use `ENCODED_ADDRESS_BITS` to avoid
`bits::one_at_bit(BITS)` UB on 32-bit.
ctest: 86/86 passing.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
get_mut<true> base-adjusted p before calling register_range, which then re-applied the base subtraction internally and tripped its out-of-range guard for legitimate in-range addresses. The path is reachable on PALs without LazyCommit (e.g. PALNoAlloc<PALLinux>) when get<true>/get_mut<true> is called on an in-range address of a bounded pagemap. Move the register_range call before the p = p - base adjust so it sees the un-adjusted address that its bounds check expects. Add a regression test in func-pagemap that wraps DefaultPal with a stub stripping LazyCommit; this exercises the previously-broken path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Encode (sizeclass, slab-offset) jointly in the pagemap entry so the
front end can recover the allocation start for an arbitrary interior
chunk of a multi-slab-tile large allocation. The front end still
only issues pow2 large requests, so every materialised entry today
has offset=0; this lays the groundwork for Phase 15+ non-pow2 large
support without front-end changes.
Key pieces:
- offset_and_sizeclass_t packs sizeclass into the low SIZECLASS_BITS
and per-chunk offset into the next OFFSET_BITS of one word.
- Backend::alloc_chunk loops over slab tiles, writing each tile's
slab_index into the offset bits of its pagemap entry.
- SizeClassTable is split into three by purpose:
* start_ (sizeclass_data_start, 32B/row, indexed by osc): hot
path for start_of_object on every dealloc.
* align_ (sizeclass_data_align, 16B/row, indexed by sc): used by
is_start_of_object alignment check in -check builds.
* slab_ (sizeclass_data_slab, 4B/row, indexed by sc): cold; slab
init thresholds.
- start_of_object branches on osc.offset() == 0 (testable from bits
already loaded in osc.raw()), so the offset=0 hot path skips the
offset_bytes load and offset-shift arithmetic. Combined with the
table split, perf-external_pointer-fast matches the baseline
(~290 ms median) with no regression; perf-singlethread-check is
within noise.
- New src/test/func/large_offset targeted test reaches the
multi-slab-tile branch via the public backend API.
- check_invariant in BackendArena now uses SNMALLOC_CHECK rather
than SNMALLOC_ASSERT, so callers that opt in via enabled=true get
the invariant checks even in Release builds (which is what the
tests want); the #ifndef NDEBUG wrapper is no longer needed.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
snmalloc::alloc<size, Conts, align>() applies aligned_size(align, size) internally; snmalloc::dealloc<size>(p) did not. When the alignment upgrade pushed the reservation into a different sizeclass than `size`, check_size fired under the check flavour.
Reproducer: alloc<33*1024, _, 128*1024>(); dealloc<33*1024>(p) => "Dealloc rounded size mismatch: 0xa000 != 0x20000".
Merge dealloc<size> into a single template `dealloc<size, align = 1>` applying aligned_size(align, size) before check_size. The default align=1 preserves existing one-argument-template behaviour because aligned_size(1, size) == size.
Move aligned_size from sizeclasstable.h to sizeclassstatic.h so the test library header can use it without pulling in the full runtime sizeclass machinery. Existing consumers still get it transitively via the pal.h -> ds_core.h -> sizeclassstatic.h include chain.
Mirror the merge in the test library header: dealloc<size, align=1> and alloc<size, ZeroMem, align=1>. Add aligned_dealloc to TESTLIB_ONLY_TESTS.
Includes src/test/func/aligned_dealloc/ with the canonical reproducer and additional (S, A) pairs.
Also captures the planning context in PLAN.md (pre-Phase-15 + Phase 15 sections).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
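The size check on both paths can be modelled as follows. The `aligned_size` formula is one common way to round up to a power-of-two alignment and agrees with the reproducer's numbers (128 KiB alignment turns a 33 KiB request into 0x20000); `dealloc_check_size` is a hypothetical helper, not the test library's API.

```cpp
#include <cassert>
#include <cstddef>

// Rounds size (>= 1) up to the next multiple of alignment (a power of two).
// aligned_size(1, size) == size.
constexpr size_t aligned_size(size_t alignment, size_t size)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((alignment - 1) | (size - 1)) + 1;
}

// Hypothetical mirror of the merged dealloc<size, align = 1>: the size fed
// to the size check is the aligned one, matching what alloc reserved.
template<size_t Size, size_t Align = 1>
constexpr size_t dealloc_check_size()
{
  static_assert(
    Align != 0 && (Align & (Align - 1)) == 0, "Align must be a power of two");
  return aligned_size(Align, Size);
}
```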
A request like malloc(70 KiB) at the default INTERMEDIATE_BITS = 2
now reserves the smallest enclosing exp+mantissa sizeclass (80 KiB)
rather than next_pow2(size) (128 KiB). Sizes that already land on a
class boundary reserve exactly that size; mid-exponent sizes shrink
by up to ~33%.
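With two mantissa bits, the classes in [64 KiB, 128 KiB) are 64, 80, 96 and 112 KiB, so a 70 KiB request ceils to 80 KiB. The hypothetical helper below reproduces that arithmetic; unlike to_exp_mant it returns a size rather than a class index, and it leaves sizes at or below `1 << mantissa_bits` unchanged instead of modelling the small-size region.

```cpp
#include <cassert>
#include <cstddef>

// Index of the highest set bit; x must be non-zero.
constexpr unsigned highest_bit(size_t x)
{
  unsigned p = 0;
  while (x >>= 1)
  {
    p++;
  }
  return p;
}

// Smallest value >= size of the form (2^m + k) << s with 0 <= k < 2^m.
constexpr size_t round_to_exp_mant_class(size_t size, unsigned mantissa_bits)
{
  assert(size != 0);
  if (size <= (size_t{1} << mantissa_bits))
    return size; // simplification: small region left as-is
  // size - 1 >= 2^m, so p >= mantissa_bits.
  unsigned p = highest_bit(size - 1);
  // Classes in (2^p, 2^(p+1)] are spaced 2^(p - m) apart.
  size_t step = size_t{1} << (p - mantissa_bits);
  return (size + step - 1) & ~(step - 1);
}
```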
Mechanics:
sizeclasstable.h
- size_to_sizeclass_full drops next_pow2(size); to_exp_mant ceils
directly to the smallest enclosing class.
- round_size's large branch matches the reservation
(sizeclass_full_to_size of the chosen class), so
DefaultConts::success zeroes exactly the reservation for calloc.
- large_size_to_chunk_size removed (the one caller in corealloc
uses sizeclass_full_to_size(sc) directly with a hoisted sc).
- compute_max_large_slab_index tightened to meta.size / slab_size
- 1 (the actual worst case the runtime pagemap loop writes).
backend.h
- alloc_chunk's pow2 precondition relaxed to the slab-tile
invariant: size is a positive multiple of slab_size.
corealloc.h
- large alloc path hoists size_to_sizeclass_full / chunk size into
locals so each table lookup happens once.
Tests:
- large_offset_frontend/: new front-end counterpart to
large_offset/. Exhaustively round-trips every large sizeclass and
walks every chunk-aligned interior pointer for a boundary and a
non-boundary request.
- memory/: adds test_calloc_non_pow2_large as a calloc zeroing smoke
test; clamps the end-of-stride probe in check_external_pointer_large
since non-pow2 reservations are tighter than the next pow2.
- sizeclass/: deterministic round_size gate over every large class
(S maps to itself; S_prev+1 ceils to S).
- large_offset/: backend test now passes the chunk-multiple reserve
(= sizeclass_full_to_size(sc)) instead of next_pow2(size).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- RBTree::neighbours's "value absent" precondition is now release-checked with a single post-descent comparison. A duplicate key would otherwise return an arbitrary neighbour pair that callers (e.g. BackendArena::add_block) would consume as valid, corrupting dual-tree consolidation. Equivalent to the prior per-node debug assert for BST-ordered trees, with no per-node cost.
- BackendArena::range_from_addr's Large branch asserts the structural invariants on the size returned by Rep::get_large_size (> TWO_UNITS, unit-aligned, below the arena-size cap). Debug-only: this is an internal Rep invariant, not defense against external corruption.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Range tree stores chunk-aligned addresses in Word::Two of the pagemap entry. The markerless ownership discriminator (is_backend_owned == (remote_and_sizeclass & COMBINED_MASK) == 0) requires those addresses to have zero in the BACKEND_RESERVED_MASK_WORD_TWO (= COMBINED_MASK) bits — i.e., the reserved mask must fit entirely below the chunk alignment. The invariant held silently for default configs; a future config change shrinking the chunk alignment or growing INTERMEDIATE_BITS would have turned backend-owned writes into spurious frontend entries with no compile-time guard. This assertion mirrors the existing Word::One BIN_META_MASK assert. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
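The assertion's shape can be sketched as follows. The mask and alignment values are illustrative only, and `masked_bits_clear` is a hypothetical helper showing why the condition matters: once it holds, any chunk-aligned address reads as zero under the reserved mask.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative values only; the real constants come from snmalloc's config.
constexpr uintptr_t BACKEND_RESERVED_MASK_WORD_TWO = 0x7F;
constexpr size_t CHUNK_ALIGNMENT = size_t{1} << 14;

// Every reserved bit must lie strictly below the chunk alignment.
static_assert(
  (BACKEND_RESERVED_MASK_WORD_TWO & ~(uintptr_t(CHUNK_ALIGNMENT) - 1)) == 0,
  "reserved Word::Two bits must fit below the chunk alignment");

// True when the reserved bits of a stored word are all clear.
constexpr bool masked_bits_clear(uintptr_t stored)
{
  return (stored & BACKEND_RESERVED_MASK_WORD_TWO) == 0;
}
```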
unlink_block called bitmap.add(range) purely to compute the bin id, relying on the idempotent set-bit side effect being harmless on the unlink path. Bins::bin_index is the pure classifier with no side effect and a name that matches the intent. The bitmap-tree consistency invariant is unchanged (still verified by check_invariant Clause 4). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
A null probe value can never collide with a tree entry (null is not insertable), so it must be exempt from the post-descent duplicate check. The redblack functional test deliberately probes with key 0 to exercise the boundary case. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
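A sketch of the release-checked variant on a plain binary search tree (hypothetical `BstNode`, with 0 as the null value). Letting equal keys turn right means a present probe ends up as the recorded predecessor, so one comparison after the loop detects it; the null probe is exempt because a zero predecessor just means none was found.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <utility>

struct BstNode
{
  uintptr_t key;
  BstNode* left;
  BstNode* right;
};

// 0 plays the role of the null value: it is never stored in the tree.
std::pair<uintptr_t, uintptr_t>
neighbours_checked(const BstNode* root, uintptr_t k)
{
  uintptr_t pred = 0;
  uintptr_t succ = 0;
  for (const BstNode* n = root; n != nullptr;)
  {
    if (n->key > k)
    {
      succ = n->key;
      n = n->left;
    }
    else
    {
      // Equal keys also turn right; below that point only larger keys
      // remain on the path, so a present k is left behind in pred.
      pred = n->key;
      n = n->right;
    }
  }
  // Single post-descent check, active in all builds.
  if (k != 0 && pred == k)
  {
    std::fputs("neighbours: probe value already in tree\n", stderr);
    std::abort();
  }
  return {pred, succ};
}
```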
alloc_range and dealloc_range both bypassed to the parent when the request equalled or exceeded the bin range, but used mask_bits(N) = (1<<N)-1 while BackendArena::add_block's precondition is size < one_at_bit(N) = 1<<N. The mask-based test was off-by-one but harmless because aligned chunk sizes never land on (1<<N)-1. Replace both call sites with a single is_too_large helper so the boundary cannot drift between paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
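The boundary can be illustrated with a one-line helper. `old_mask_test` is a reconstruction of the earlier comparison from the description above, assumed to be `size >= mask_bits(N)`. The two disagree only at `(1 << N) - 1`, which is odd and so cannot equal any chunk-aligned size.

```cpp
#include <cstddef>

// Matches add_block's precondition (size < 1 << N): anything at or above
// one_at_bit(N) must bypass the arena and go to the parent.
template<unsigned N>
constexpr bool is_too_large(size_t size)
{
  return size >= (size_t{1} << N);
}

// Assumed shape of the earlier mask-based test, for comparison only.
template<unsigned N>
constexpr bool old_mask_test(size_t size)
{
  return size >= ((size_t{1} << N) - 1);
}
```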
- Note that MetaEntryBase::operator= is load-bearing: the pagemap writes back through it, so META_BOUNDARY_BIT survives every metadata mutation without explicit preservation by callers.
- Correct the BackendStateWordRef single-pointer ctor comment: it is required by RBRepMethods for sentinel construction from &Rep::root, not a legacy convenience.
- BackendArena: drop spurious const on contains_min and check_invariant (no const callers exist), removing the const_cast laundering.
- BackendArena::check_invariant: lift the five clause titles into the docblock; trim the inline labels to single-line markers.
- BackendArena::add_block: drop cross-file line-number reference to buddy.h.
- backend_arena_range.h / backend_arena_bins.h: replace SNMALLOC_CHECK(false && "msg") with SNMALLOC_CHECK_MSG.
- backend_arena_range.h: rename `auto refill` to `refill_range` to avoid shadowing the enclosing function.
- Tests: use "test/..." quoted include style for consistency.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduces the building blocks for Phase A of the SmallBuddyRange -> SmallArenaRange migration. Nothing is wired into the production pipeline yet (the existing SmallBuddyRange remains the LocalMetaRange); this commit only adds the new components and their gate test.
* InplaceRep<Authmap, ChunkBounds>: in-band red-black-tree node Rep for BackendArena that stores the tree pointers inside the free block itself. Supports CHERI provenance via the Authmap mechanism (the same write-once cap table used by dealloc_meta_data); node accesses go through Authmap::amplify_from_address. can_consolidate refuses to merge across MIN_CHUNK_SIZE boundaries to keep BackendArena's MAX_SIZE_BITS == MIN_CHUNK_BITS invariant intact.
* SmallArenaRange<Authmap>::Type<ParentRange>: a wrapper around BackendArena<InplaceRep<Authmap, ChunkBounds>, MIN_BITS, MIN_CHUNK_BITS> presenting the standard Range interface. Serves arbitrarily-unit-aligned sizes (not just powers of two). Replaces the historical alloc_range_with_leftover with alloc_size_with_align(size, align), which makes alignment an explicit parameter and donates the unit-aligned tail back to the arena.
* amplify_from_address<bool potentially_out_of_range>(address_t) on DummyAuthmap (pass-through reinterpret_cast) and BasicAuthmap (lookup + pointer_offset). Lets InplaceRep recover an arena cap for an address it knows only as an integer.
* New test target backend_arena_inplace covering the rep accessor round-trips, arena add/remove/consolidation/carve, a 30-seed x 500-op stress, the can_consolidate chunk-boundary refusal, and four alloc_size_with_align scenarios (exact fit, pow2 align over non-pow2 size, align larger than size, MIN_CHUNK_SIZE bypass).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase B of the SmallBuddyRange -> SmallArenaRange migration.
* StandardLocalState and MetaProtectedRangeLocalState gain an Authmap template parameter, plumbed through alongside Pagemap. Both configs and the domestication test pass their Authmap into the LocalState instantiation.
* The three SmallBuddyRange uses in the meta-range pipes are replaced with SmallArenaRange<Authmap>.
* BackendAllocator::alloc_meta_data calls the new alloc_size_with_align(size, alignment) primitive, with alignment = max(next_pow2(size), MetaRangeT::UNIT_SIZE). The next_pow2 keeps Phase B behaviour identical to the previous buddy-rounded path; the max floors the alignment at the meta range's UNIT_SIZE so alloc_size_with_align's precondition holds for any positive size.
* FixedRangeConfig's inline Authmap gains amplify_from_address (the new SmallArenaRange path needs it).
SmallBuddyRange.h is now orphaned but stays in tree until Phase D removes it.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SmallBuddyRange was orphaned by the previous commit; LargeBuddyRange, SmallBuddyRange and their shared buddy.h are now all dead. Delete them (-848 lines) and clean up stale references in comments, README, AddressSpace.md, and the MIN_HEAP_SIZE_FOR_THREAD_LOCAL_BUDDY constant (renamed ..._CACHE).
Now that there is only one Arena type and the Small/Large pair of range adapters built on it, rename for symmetry and to drop the redundant 'Backend' prefix:
  BackendArena      -> Arena
  BackendArenaBins  -> ArenaBins
  BackendArenaRange -> LargeArenaRange (pairs with SmallArenaRange)
Files and test directories renamed to match. The test-internal 'using Arena = ...<...>;' aliases become 'TestArena' to avoid colliding with the renamed class template.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
LargeArenaRange / SmallArenaRange accept any UNIT_SIZE-aligned request; the pow2 rounding the backend was applying to metadata sizes was a leftover from the buddy era and inflated every slab's metadata block to the next power of two. With a ClientMeta provider whose per-slab storage is non-pow2 (e.g. allocation bitmap + small fixed header), this rounding doubled the metadata overhead.
Publish MIN_META_ALIGN on each LocalState (= MetaRange::UNIT_SIZE). Add BackendAllocator::meta_size_round, which pads to MIN_META_ALIGN and steps up to MIN_CHUNK_SIZE for requests that would bypass the small range to the parent. Replace all four next_pow2-rounded metadata sites in backend.h with this helper.
A new test func/client_meta_nonpow2 installs a ClientMetaDataProvider whose per-slab storage is non-pow2 and exercises alloc/dealloc round-tripping across several sizeclasses; any disagreement between alloc-side and dealloc-side rounding would trip the meta range's dealloc_range assertions.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
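A sketch of the rounding rule as described: pad to MIN_META_ALIGN and, when the padded size would bypass the small range, round up to a whole number of chunks. The behaviour above MIN_CHUNK_SIZE is an assumption ("steps up" read as rounding to a chunk multiple), and this free-function form is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Both alignments are assumed to be powers of two.
constexpr size_t
meta_size_round(size_t size, size_t min_meta_align, size_t min_chunk_size)
{
  assert((min_meta_align & (min_meta_align - 1)) == 0);
  assert((min_chunk_size & (min_chunk_size - 1)) == 0);
  size_t padded = (size + min_meta_align - 1) & ~(min_meta_align - 1);
  if (padded < min_chunk_size)
    return padded; // served by the small range at unit granularity
  // Would bypass to the parent: round up to a whole number of chunks.
  return (padded + min_chunk_size - 1) & ~(min_chunk_size - 1);
}
```

Compared with next_pow2, a 100-byte request with 16-byte units costs 112 bytes rather than 128.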
Rewrite the design intro as a docs/ companion to AddressSpace.md:
* Drop the LargeBuddyRange framing (that range no longer exists).
* Align the mechanism description with the in-tree code, which builds positive serve masks rather than the inverse skip masks the original sketch used.
* Add brief sections on the two-tree structure (one bin tree per non-empty bin + one range tree for coalescing) and on the two reps Arena ships with: PagemapRep behind LargeArenaRange for whole-chunk allocations, InplaceRep behind SmallArenaRange for sub-chunk metadata.
* Link out to AddressSpace.md and the prototype scripts.
PLAN.md is the working planning document; untrack it and add it to .gitignore so it stays local.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* arenabins.h: Bitmap needs a constexpr default ctor for clang's require_constant_initialization on threadalloc.h's default_alloc.
* redblacktree.h: print had an uninitialised s_indent; inlining changes from the new neighbours / for_each helpers were enough to trigger GCC's -Wmaybe-uninitialized.
* Several test locals are only consumed by SNMALLOC_ASSERT and become unused in -DNDEBUG builds. Wrap with UNUSED(...) to match the convention already in the file.
* GCC's -Warray-bounds cannot prove the upper bound of indices read from compile-time tables once asserts are stripped. Add SNMALLOC_ASSUME alongside the existing asserts in ArenaBins::Bitmap::find_for_request and the test's mock_index.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… sign-conv
* sizeclasstable.h: add operator!= alongside operator==. Apple Clang in
gnu++17 mode does not synthesise it from ==, so the release-rounding
test fails to build on macOS.
* arena.cc test mock_index: GCC's release -Warray-bounds still saw the
OOB path at the mock_store[...] read site even with SNMALLOC_ASSUME
inside mock_index. Replace the indirect probe in can_consolidate
with an explicit in-range guard returning false on out-of-arena
addresses (matches PagemapRep semantics: no neighbour outside the
arena). The initializer-list size_t loop in test_large_size_roundtrip
now uses size_t{} literals to silence -Wsign-conversion under the
clang+UBSan+TSan CI configuration.
* smallarenarange.cc: MSVC rejects alignas with values as large as
MIN_CHUNK_SIZE (16384) on static storage. Oversize both backing
buffers by one chunk and align the base up at runtime via
base_addr() / pool_base(). Fix the resulting -Wsign-conversion in
the pointer-difference index calculation.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Phase 14 changed the sentinel sizeclass slot to be the zero-initialised row 0 with both size = 0 and slab_mask = 0. Pre-Phase 14, the sentinel had slab_mask = SIZE_MAX (from `size - 1` underflow in the large-class init loop).
The bounds-checked memcpy shim (`bounds_checks.h::check_bound`) calls `remaining_bytes` unconditionally on every memcpy destination, including foreign (non-snmalloc) heap addresses reached via LD_PRELOAD before snmalloc has seen them. With slab_mask = 0, `start_of_object(addr) = addr`, so `remaining_bytes = 0`, and every memcpy on a foreign pointer fails fatally.
Restore the pre-Phase-14 behaviour by explicitly setting the sentinel slot's slab_mask to ~size_t(0), so `start_of_object` collapses to 0 and `remaining_bytes` underflows to a huge value that trivially passes any reasonable bound check.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
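The arithmetic can be checked with a simplified model that assumes one object per slab, so `start_of_object` is just `addr & ~slab_mask`; the real functions also handle multiple objects per slab. `SizeclassRow` is a hypothetical stand-in for a sizeclass table row.

```cpp
#include <cstddef>
#include <cstdint>

struct SizeclassRow
{
  size_t size;
  size_t slab_mask;
};

constexpr uintptr_t start_of_object(uintptr_t addr, SizeclassRow row)
{
  return addr & ~uintptr_t(row.slab_mask);
}

// Bytes from addr to the end of its object; unsigned wrap-around is
// intended, so the sentinel row yields a very large value.
constexpr size_t remaining_bytes(uintptr_t addr, SizeclassRow row)
{
  return size_t(start_of_object(addr, row) + row.size - addr);
}
```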
1. start_of_object: replace the inlined 64-bit div_mult fast path with a
call to slab_index, which has the correct 32-bit offset / size
fallback. The inlined version overflowed size_t on 32-bit
arm-linux-gnueabihf, producing a wrong remaining_bytes (off by one
allocation size) in func-memory-check::test_remaining_bytes.
2. func-largearenarange-check test: pass MinBaseSizeBits<Pal>() as the
MIN_REFILL_SIZE_BITS template parameter so that the first parent
allocation is at least the PAL's minimum reserve size. Windows
VirtualAlloc cannot reserve below the 64 KiB allocation granularity,
so PalRange returned nullptr for the test's 16 KiB request. This
matches what production code (standard_range.h,
meta_protected_range.h) does.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
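The first fix concerns division by multiplication. A common way to divide an offset by a non-power-of-two object size is to multiply by a precomputed reciprocal and shift; the product needs twice the width of the operands, so forming it in a 32-bit `size_t` wraps silently. The sketch below forms it explicitly in `uint64_t`. The 32-bit shift and all names are assumptions for illustration, not snmalloc's `div_mult` or `slab_index`. With `recip = ceil(2^32 / size)`, the result equals `offset / size` whenever `offset * size <= 2^32`, which covers slab-sized offsets.

```cpp
#include <cstddef>
#include <cstdint>

constexpr unsigned RECIP_SHIFT = 32;

// ceil(2^32 / size); a real allocator would precompute this per sizeclass.
constexpr uint64_t reciprocal(uint32_t size)
{
  return ((uint64_t{1} << RECIP_SHIFT) + size - 1) / size;
}

// offset / size via multiply-and-shift. The product can need up to 64 bits,
// so it is formed in uint64_t regardless of the width of size_t.
// Exact when offset * size <= 2^32.
constexpr uint32_t index_div_mult(uint32_t offset, uint32_t size)
{
  return static_cast<uint32_t>(
    (uint64_t{offset} * reciprocal(size)) >> RECIP_SHIFT);
}
```

For example, `12288 * reciprocal(48)` is about 1.1e12, far above 2^32, which is why a fast path written in terms of `size_t` is only safe where `size_t` is 64 bits wide.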
Extracted the exact diff from the CI clang-format-15 run on this PR
and applied it. 13 files: whitespace, ternary line-breaks, include
reordering, friend-decl single-line, init-list one-per-line for
size_t{} literals.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This PR replaces them with a new backend that handles snmalloc's full
(exponent, mantissa) size class sequence end-to-end, and still maintains
the snmalloc invariant that any allocation is aligned by the largest power
of two that divides its size.
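The alignment invariant reduces to one line of arithmetic: the largest power of two dividing `size` is its lowest set bit, which two's-complement negation isolates as `size & -size`. A minimal sketch, with hypothetical helper names and sizes counted in abstract units:

```cpp
#include <cstddef>

// Largest power of two dividing size (size > 0): its lowest set bit.
constexpr size_t natural_alignment(size_t size)
{
  return size & (~size + 1); // unsigned negation, well-defined modulo 2^N
}

// Does an allocation of `size` units at `addr` satisfy the invariant?
constexpr bool satisfies_invariant(size_t addr, size_t size)
{
  return addr % natural_alignment(size) == 0;
}
```

So a 12-unit allocation must be 4-aligned, a 6-unit one 2-aligned, and a 5-unit one needs no extra alignment.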
Free blocks are kept in a set of bins, with a few extra bins to handle the
alignment subtleties: a 5-unit block at the wrong address cannot serve a
4-unit request (because of the higher alignment), whereas a 6-unit block
can serve every smaller size. A small precomputed mask per request hides
exactly those bins that cannot serve it.
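The alignment subtlety comes down to a small predicate: a free block of `n` units starting at unit address `a` can serve an `r`-unit request only if it contains an `r`-unit sub-block aligned to the largest power of two dividing `r`. The sketch below is a hypothetical brute-force check of that condition, not the precomputed per-request masks the backend uses.

```cpp
#include <cstddef>

// Largest power of two dividing r (r > 0).
constexpr size_t natural_alignment(size_t r)
{
  return r & (~r + 1);
}

// Can the free block [a, a + n) serve a naturally aligned r-unit request?
constexpr bool can_serve(size_t a, size_t n, size_t r)
{
  size_t align = natural_alignment(r);
  size_t start = (a + align - 1) / align * align; // first aligned unit >= a
  return start + r <= a + n;
}
```

For example, with `a = 1` the first 4-aligned unit is 4, while a 5-unit block ends at 6, so `can_serve(1, 5, 4)` is false even though the block is larger than the request.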
Consolidation is Doug Lea-style: on free we look left and right and merge
maximally. Blocks live in two layers of red-black trees: one tree per
non-empty bin (for selection within a bin) and one address-keyed tree over
all free blocks in the allocator (for left/right neighbour lookup on
free). Because red-black tree operations stay logarithmic in the number of
free blocks, the same backend can be stacked at multiple levels of the
range pipeline, the way buddy allocators used to be.