research(seprag): BET 2⊗4 filtered-ANN region-pruning — qualified NO-GO (ADR-201) [finding, not a feature] by shaal · Pull Request #536 · ruvnet/RuVector

shaal · 2026-06-04T20:48:02Z

⚠️ This is a research finding, not a feature. The verdict is a qualified NO-GO. Merging records an ADR + reusable benchmark tooling — it does not add a production code path or claim a win. No urgency to merge; opened for visibility and the record (same as the ADR-199 kill).

TL;DR

BET 2 ⊗ BET 4 of the SepRAG research line (#534): does region-pruned IVF search beat the ruvector-acorn incumbent on correlated filtered queries? Pre-registered ≥5× distance-eval gate.

Verdict: qualified NO-GO. Region-pruning beats vanilla ACORN 6–48× at selectivity ≤ 1% — but the win does not survive the mandatory adversarial check: giving ACORN a predicate-aware entry (a simple, standard enhancement) collapses the gap to ~2× at high correlation, below the bar. A real but narrow edge remains at moderate correlation (ρ≈0.7). Full reasoning: docs/adr/ADR-201.

What's in this PR (independent of #535)

New self-contained crate ruvector-filtered-bench: depends only on ruvector-acorn + ruvector-rairs; zero dependency on ruvector-seprag or PR #535 (SepRAG: CCH-inspired retrieval exploration + customizable re-weighting, ADRs 196-200).
ADR-201 + the pre-registration doc (gate frozen before the run).
Additive, result-preserving instrumentation of ruvector-acorn: acorn_search_counted, flat_filtered_search_counted, acorn_search_seeded_counted. Existing functions delegate unchanged; 13 acorn tests prove behavior is preserved. Useful tooling for anyone measuring ACORN's distance-eval cost.

Why it may still be worth landing despite the NO-GO

ADR-201 is a documented, cited finding — kills are first-class in this thread (cf. ADR-199).
The ACORN eval-counting variants and the ρ-correlation-knob benchmark are reusable for the named follow-ups (multi-predicate conjunctions; large-n).

Honest verdict (cost at matched recall, n=20k arxiv)

ρ	selectivity	A (region-pruned IVF) vs vanilla ACORN	A vs predicate-aware-entry ACORN
1.0	0.1%	25.9×	2.4× (below the 5× bar)
1.0	1%	6.1×	1.9× (below the 5× bar)
0.7	1%	9.0×	6.5× (holds)

Central lesson: a filtered-ANN cost claim is meaningless without a predicate-aware-entry baseline.

Notes for review

Branches off main; does not affect PR #535 (SepRAG: CCH-inspired retrieval exploration + customizable re-weighting, ADRs 196-200, ruvector-seprag), which is BET 1, a separate WIN.
Data-dependent tests skip gracefully without the ogbn-arxiv download → CI is green without data.
ruvector-acorn is touched only additively (measurement); the core algorithm is unchanged.

Resolves the BET 2 ⊗ BET 4 item of #534 (qualified NO-GO). Follow-ups (conjunctions, large-n, BET 4 standalone) noted in ADR-201 and #534.

… (issue ruvnet#534)

Region-pruned filtered ANN vs tuned ACORN. New self-contained crate ruvector-filtered-bench, depending only on ruvector-acorn (incumbent + oracle) and ruvector-rairs (IVF); independent of ruvector-seprag/PR ruvnet#535.

Pre-registration (docs/plans/bet2-filtered-ann/PRE-REGISTRATION.md) freezes a selectivity-shaped win/kill gate before any contender runs: at correlation rho>=0.7, contender A within 2% filtered-recall@10 of tuned ACORN at >=5x fewer distance-evals/query at sel<=1% (>=2x at sel=5%), monotonic in selectivity; graceful-degradation and wall-clock honesty guards; rho=0 recall-collapse kill control.

M0 (plumbing, pre-freeze-safe):
- data.rs: aligned ogbn-arxiv feat/label/year loader.
- predicate.rs: rho-correlation knob holding selectivity exactly constant across rho, plus natural label/year predicate families.
- tests/oracle_gate.rs: exact_filtered_knn cross-checked against an independent brute force on a real arxiv slice (sel x rho grid).

5 tests green, clippy clean.

… baseline

Instrument ruvector-acorn with additive, result-preserving counted-search variants (acorn_search_counted, flat_filtered_search_counted) so distance-evals, the pre-registered primary cost metric, are measured exactly on ACORN-as-shipped. 13 acorn tests pass incl. a counted==uncounted + flat-evals==#matches invariant.

filtered-bench contenders (src/contenders.rs):
- B: ACORN predicate-agnostic search (the incumbent), exact eval counts.
- C: classic post-filter (retrieve top-pool unfiltered, then filter): the floor.

M1 findings (n=20k arxiv, ρ=1, k=10):
- TEETH (examples/teeth.rs): at the gate-relevant low selectivity, post-filter collapses while ACORN holds (recall, ACORN vs post-filter). sel=0.1%: 73.7% vs 22.7%; sel=0.5%: 90.4% vs 59.7%; sel=1%: 92.6% vs 79.3%. At sel>=5% post-filter is fine (as theory predicts). Benchmark is demonstrably sensitive (50+ pt recall swing): this is the negative control.
- TUNED ACORN (examples/acorn_tune.rs): ACORN reaches ~92.6% recall at sel=1% with gamma=2, ef=512, at ~1622 evals/query; evals are ~flat in ef (early-termination bound), so "tuned" = crank ef for recall at near-constant cost. This is the fair incumbent baseline for the M3 gate, and it validates the >=5x bar: contender A must reach >=90.6% recall at <=~324 evals/query to win.
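The threshold arithmetic behind "A must reach >=90.6% recall at <=~324 evals/query" can be sketched as below. This is a minimal illustration, not code from ruvector-filtered-bench: the `Gate` type and its method names are hypothetical, and "within 2%" is read as 2 absolute recall points, which is what the 92.6% -> 90.6% figure implies.

```rust
/// Hypothetical helper mirroring the pre-registered gate (not the crate's API).
#[derive(Debug, Clone, Copy)]
pub struct Gate {
    /// Allowed recall deficit vs the incumbent, in absolute recall (0.02 = 2 points).
    pub recall_slack: f64,
    /// Required reduction factor in distance-evals/query (5.0 at sel <= 1%).
    pub min_speedup: f64,
}

impl Gate {
    /// Lowest recall the contender may have and still qualify.
    pub fn recall_floor(&self, incumbent_recall: f64) -> f64 {
        incumbent_recall - self.recall_slack
    }

    /// Largest evals/query the contender may spend and still qualify.
    pub fn eval_budget(&self, incumbent_evals: f64) -> f64 {
        incumbent_evals / self.min_speedup
    }

    /// True when the contender clears both the recall floor and the eval budget.
    pub fn passes(
        &self,
        incumbent_recall: f64,
        incumbent_evals: f64,
        contender_recall: f64,
        contender_evals: f64,
    ) -> bool {
        contender_recall >= self.recall_floor(incumbent_recall)
            && contender_evals <= self.eval_budget(incumbent_evals)
    }
}
```

With the tuned-ACORN baseline above (recall 0.926 at 1622 evals/query), `recall_floor` gives 0.906 and `eval_budget` gives 324.4, matching the numbers quoted in the commit message.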

src/prune.rs: RegionPruneIvf, built on ruvector-rairs k-means (ADR-193 substrate). Two stacked prunings realizing the salvaged SepRAG kernel on the treewidth-immune IVF hierarchy:
1. predicate pruning: skip clusters with zero matching members (the BET-2 win).
2. branch-and-bound distance pruning: triangle-inequality lower bound (dist(q,centroid) - radius); once the top-k heap is full, clusters whose LB exceeds the worst result are skipped.

Probe in LB order so the bound lets us break, not just skip: a strict improvement over the M2-sketch's match-count ordering, and it yields EXACT filtered top-k.

Cost metric = nclusters (routing) + matching members scanned; the O(1) predicate gates the expensive distance, so non-matching points cost nothing (the asymmetry vs ACORN, which evaluates a distance per expanded node regardless of predicate).

max_probe knob: None = exact B&B (recall 1.0); Some(p) caps match-clusters probed (trades recall for fewer evals, mirroring ACORN's ef) for equal-recall comparison.

Tests: exact_bb_matches_oracle (recall 1.0 vs exact_filtered_knn on 20 queries) and zero_match_clusters_are_skipped (1% selectivity → <1000 evals vs 4000 full scan). 8 unit + 1 integration green, clippy clean.
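For readers who want the two stacked prunings in concrete form, here is a minimal sketch under stated assumptions. It is not the RegionPruneIvf implementation: the `Cluster` layout, the function name, and the sorted-Vec top-k (instead of a heap) are all illustrative choices, and there is no max_probe cap, so it corresponds to the exact (None) mode only.

```rust
/// Illustrative cluster layout (an assumption, not the crate's type).
pub struct Cluster {
    pub centroid: Vec<f32>,
    /// Max distance from the centroid to any member.
    pub radius: f32,
    /// Indices into the corpus.
    pub members: Vec<usize>,
}

fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Exact filtered top-k. Returns ((distance, id) sorted ascending, evals charged).
pub fn region_pruned_search(
    corpus: &[Vec<f32>],
    clusters: &[Cluster],
    query: &[f32],
    k: usize,
    pred: impl Fn(usize) -> bool,
) -> (Vec<(f32, usize)>, usize) {
    // Routing is charged for every cluster, as in the cost metric above.
    let mut evals = clusters.len();
    if k == 0 {
        return (Vec::new(), evals);
    }
    // Pruning 1: keep only clusters with at least one matching member.
    // Each survivor gets the lower bound max(0, dist(q, centroid) - radius).
    let mut order: Vec<(f32, usize)> = clusters
        .iter()
        .enumerate()
        .filter(|(_, c)| c.members.iter().any(|&i| pred(i)))
        .map(|(ci, c)| ((l2(query, &c.centroid) - c.radius).max(0.0), ci))
        .collect();
    order.sort_by(|a, b| a.0.total_cmp(&b.0));

    let mut top: Vec<(f32, usize)> = Vec::new();
    for (lb, ci) in order {
        // Pruning 2: clusters are in ascending-LB order, so once one bound
        // exceeds the current k-th best, every later cluster does too.
        if top.len() == k && lb > top[k - 1].0 {
            break;
        }
        for &i in &clusters[ci].members {
            if !pred(i) {
                continue; // predicate test is O(1) and not charged
            }
            evals += 1;
            top.push((l2(query, &corpus[i]), i));
            top.sort_by(|a, b| a.0.total_cmp(&b.0));
            top.truncate(k);
        }
    }
    (top, evals)
}
```

The break is safe because any member x of a cluster satisfies dist(q,x) >= dist(q,centroid) - radius, so a cluster whose bound exceeds the current worst result cannot improve the top-k.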

@1622

… sel<=1%)

examples/sweep.rs: full selectivity x rho grid, cost-at-matched-recall comparison (tune A's probe cap to ACORN's recall, then compare distance-evals), with the wall-clock honesty guard and the rho=0 kill control.

VERDICT vs the frozen gate (n=20k, ACORN gamma=2 ef=512, IVF nclusters=64):
- WIN at sel<=1%, rho>=0.7: region-pruned IVF beats tuned ACORN by 6.1-48x evals and 4.7-26x wall-clock at equal-or-better recall (A's exact B&B recall >= ACORN). e.g. rho=1 sel=1%: ACORN 92.6%@1622 evals vs A 99.9%@264 evals = 6.1x (4.7x wall).
- MISS at sel=5%: best 1.5x (gate wanted >=2x). The win is a low-selectivity (<=1%) phenomenon: the dominant production metadata-filter regime, but a real boundary, not the full pre-registered claim.
- Mechanism partly refuted: A also wins at rho=0 (low sel), so the eval advantage is selectivity-driven (few matches -> cheap exact B&B) more than correlation-driven; correlation governs recall, not cost. Reported, not buried.
- rho=0 kill control: A does NOT collapse (recall-safe); high-sel (>=10%) A loses as expected (ACORN's regime). Wall-clock guard: PASS (win survives the clock).

nclusters is A's tuning knob (parallel to ACORN's ef): 64 beats 128 in the win regime (cheaper routing); both confirm the same boundary.
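The cost-at-matched-recall comparison described above (tune the contender's knob until it reaches the incumbent's recall, then compare evals) might look roughly like this. The `OpPoint` type and both function names are hypothetical, not taken from examples/sweep.rs.

```rust
/// One measured operating point of a contender (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct OpPoint {
    /// Tuning knob value, e.g. a probe cap for A or ef for ACORN.
    pub knob: usize,
    /// Filtered recall@k in [0, 1].
    pub recall: f64,
    /// Mean distance-evals per query.
    pub evals: f64,
}

/// Cheapest point in the sweep whose recall is at least `target`.
pub fn cheapest_at_recall(sweep: &[OpPoint], target: f64) -> Option<OpPoint> {
    sweep
        .iter()
        .copied()
        .filter(|p| p.recall >= target)
        .min_by(|a, b| a.evals.total_cmp(&b.evals))
}

/// Incumbent evals divided by the contender's evals at matched-or-better recall.
/// Returns None if the contender never reaches the incumbent's recall.
pub fn eval_ratio(incumbent: OpPoint, contender_sweep: &[OpPoint]) -> Option<f64> {
    cheapest_at_recall(contender_sweep, incumbent.recall).map(|p| incumbent.evals / p.evals)
}
```

With the rho=1, sel=1% figures from the verdict (ACORN 92.6% at 1622 evals, A 99.9% at 264 evals), the ratio is 1622 / 264, about 6.1x. Returning None when the contender never reaches the target keeps a recall shortfall from being reported as a cost win.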

…y fails the gate

Adds predicate-aware-entry ACORN (the rule-ruvnet#5 "tune harder" adversary):
- ruvector-acorn: acorn_search_seeded_counted (beam starts from caller seeds instead of multi-probe entry); acorn_search_impl refactored to take Option<seeds>, existing fns pass None. 13 acorn tests still green (behavior preserved).
- contenders.rs: Acorn::search_predicate_entry: stride-sample probes, predicate-test free, distance-eval only matching probes, seed the beam from the nearest matches.
- examples/adversarial.rs: A vs best-of(vanilla-B, predicate-entry-D) at matched recall.

FINDING (rule ruvnet#5 changed the verdict): predicate-aware entry slashes ACORN's cost at HIGH correlation (rho=1 sel=0.1%: 3753 -> 203 evals), collapsing A's advantage from 44.7x (vs vanilla) to 2.4x, BELOW the pre-registered 5x bar.

A vs best ACORN:
- rho=1.0: 2.4x / 2.3x / 1.9x (sel .001/.005/.01): MISS at the 5x bar.
- rho=0.7: 38.8x / 14.6x / 6.5x: WIN (D's seeding is weak at moderate correlation, where matches are scattered so a seeded walk still wanders).

So A and predicate-entry-ACORN exploit the SAME structure and converge (~2x) at high correlation; A's clean win is NOT robust to a properly-tuned ACORN. Honest verdict: largely a KILL at the pre-registered bar, with a narrower conditional edge at rho~0.7.

Caveat favoring A: D's seeding leans on ~16k "free" predicate tests (the eval metric ignores the O(1) predicate scan); at scale that scan isn't free, restoring some edge.
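The seeding step of the predicate-aware entry (stride-sample, test the predicate for free, pay a distance-eval only on matches, keep the nearest as seeds) can be sketched as follows. This is an assumption-laden illustration, not Acorn::search_predicate_entry: the function name, parameters, and the use of squared L2 for ranking are hypothetical, and the hand-off to acorn_search_seeded_counted is omitted because its signature is not given here.

```rust
fn sq_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Pick beam seeds from predicate-matching probes.
/// Returns (seed ids nearest-first, distance-evals charged).
pub fn predicate_entry_seeds(
    corpus: &[Vec<f32>],
    query: &[f32],
    stride: usize,
    n_seeds: usize,
    pred: impl Fn(usize) -> bool,
) -> (Vec<usize>, usize) {
    let mut evals = 0;
    let mut scored: Vec<(f32, usize)> = Vec::new();
    // step_by panics on 0, so clamp the stride to at least 1.
    for i in (0..corpus.len()).step_by(stride.max(1)) {
        if !pred(i) {
            continue; // predicate test: O(1), not charged by the eval metric
        }
        evals += 1; // only matching probes cost a distance-eval
        scored.push((sq_l2(query, &corpus[i]), i));
    }
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.truncate(n_seeds);
    (scored.into_iter().map(|(_, i)| i).collect(), evals)
}
```

The sketch makes the caveat above visible: the loop touches every stride-th point with a predicate test that the eval metric never charges, so with stride 1 it is a full predicate scan.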

…O-GO (M4)

Writes up the BET 2 ⊗ BET 4 outcome with ADR-199/200 honesty.

Verdict: region-pruned IVF beats VANILLA ACORN 6-48x evals (4.7-26x wall-clock) at sel<=1%, but the pre-registered >=5x WIN does NOT survive the rule-ruvnet#5 adversarial check: giving ACORN a predicate-aware entry collapses the gap to ~2x at high correlation (rho=1), below the bar. A retains a narrow conditional edge at moderate correlation (rho~0.7, 6-39x) plus an at-scale caveat (D's seeding leans on a ~full predicate scan the eval metric treats as free). Net: the bet does not cleanly pay; the clean win was an artifact of an under-equipped incumbent. Central lesson: a filtered-ANN cost claim is meaningless without a predicate-aware-entry baseline.

Also strips a stray tag from the pre-registration doc (non-semantic).

The experiment's own evidence points to two flip conditions (conjunctions where ACORN's predicate-seeding degrades but cluster-skip composes; large-n where the predicate scan stops being free) and the open BET 4 standalone baseline.

… hold)

A conjunction is a single O(1) boolean predicate of selectivity = product; in the distance-eval metric it reduces to (selectivity, scatter), both already swept. The 'exponentially-unlikely seed' reasoning was wrong (testing a conjunction is O(1)). Residual leads downgraded to narrow/speculative (predicate-eval cost, large-n). Recommend closing BET 2 ⊗ BET 4; thread value is BET 1 productionization + BET 3.
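A small sketch of why a conjunction adds nothing new to the sweep: composing two predicates yields one O(1) predicate, and when the two are independent its measured selectivity is the product of the parts. The helper names `and` and `selectivity` are hypothetical, and independence is an assumption of the example (correlated predicates would not multiply).

```rust
/// Compose two O(1) predicates into one O(1) predicate.
pub fn and<P, Q>(p: P, q: Q) -> impl Fn(usize) -> bool
where
    P: Fn(usize) -> bool,
    Q: Fn(usize) -> bool,
{
    move |i| p(i) && q(i)
}

/// Fraction of ids in 0..n that satisfy `pred`.
pub fn selectivity(n: usize, pred: impl Fn(usize) -> bool) -> f64 {
    if n == 0 {
        return 0.0;
    }
    (0..n).filter(|&i| pred(i)).count() as f64 / n as f64
}
```

For example, over ids 0..100, "even" has selectivity 0.5 and "multiple of 5" has 0.2; their conjunction ("multiple of 10") has 0.1, the product. From the benchmark's point of view it is just another predicate at sel=10%.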

…e - scale-gated WIN (ADR-206) (#542)

* docs(bet4): pre-register LB-B&B IVF vs plain-IVF nprobe gate (FROZEN)

Closes the BET 4 caveat left open by ADR-201: the region-pruning IVF kernel was only run against ACORN (BET 2), never against its natural incumbent, plain IVF nprobe, on unfiltered ANN. Frozen gate: WIN = >=2x member-scan reduction at matched recall@10 (R=0.95) AND wall-clock win across nclusters in {64,256,1024}; KILL = <1.5x or wall-clock reverses. Two controls: exact-vs-exact pruning-fraction probe + low-d (PCA-8) soundness control. Honest prior: NO-GO lean (128-d concentration makes the triangle-inequality bound loose), the IVF-level companion to ADR-199. Branch off clean main; B&B kernel rebuilt self-contained (BET 2's lives only on #536).

* feat(bet4): M0 - self-contained BnBIvf kernel + oracle gate (exactness certified)

New crate ruvector-bet4-ivf-bench (deps: ruvector-rairs, rand).
- data.rs: aligned arxiv 128-d feature CSV loader.
- kernel.rs: BnBIvf: IVF probed in ascending lower-bound order with B&B early termination (break when LB >= kth-best); LB(q,c)=max(0,|q-mu_c|-r_c), r_c=max member radius. Full budget = exact; max_probe cap = nprobe analogue. Built on ruvector-rairs kmeans so it shares centroids with the IvfFlat incumbent (shared-index pre-reg requirement).
- oracle.rs: brute-force exact kNN + recall@k + shared true-L2 helper.
- M0 gate test PASSES on real arxiv slice: full-budget B&B == oracle (recall@10 >= 0.999) → B&B invariant certified. clippy clean.

Frozen gate: docs/plans/bet4-ivf-pruning/PRE-REGISTRATION.md. Off clean main.

* feat(bet4): M1 - instrumented plain-IVF incumbent on shared index + faithfulness gate

BnBIvf::search_nprobe: the plain-IVF incumbent strategy (nprobe nearest centroids, scan all members, no B&B) on the SAME centroids/lists as the B&B contender, with member-eval counting. Refactored top-k accumulation into shared consider()/finalize() so both strategies accumulate identically and only the probe loop differs (shared-index pre-reg requirement). New gate instrumented_nprobe_matches_rairs PASSES: recall matches ruvector-rairs::IvfFlat within 0.01 at matched params → the cost-measured incumbent is algorithmically the real one. 3 tests green.

* feat(bet4): M2/M3 - steelman B&B + PCA-8 control + matched-recall sweep

- kernel: search_bnb_skip: the STEELMAN. Centroid-distance order (the effective nprobe ordering) + per-cluster LB-skip (correctness-safe in any order, unlike the LB-order global break). The strongest cluster-level B&B: if it can't beat tuned nprobe, the bound doesn't pay.
- pca: minimal power-iteration top-m PCA (no linalg dep) for the low-dim control; projects real arxiv features to 8-d where the bound is tight.
- examples/ivf_pruning_sweep: 3 contenders share one index per nclusters (plain nprobe / B&B LB-order / B&B steelman) x 2 regimes (128-d, PCA-8), exact-regime pruning probe, matched-recall@0.95, frozen-gate verdict.

RESULT (n=20k & n=50k both): steelman = 1.00x evals vs nprobe in EVERY cell, BOTH regimes. NO-GO. Mechanism is structural, not dimensional: the LB bound only prunes FAR clusters that tuned nprobe already skips, so it's redundant with nprobe's centroid-distance cutoff. Exact-prune fraction scales correctly with dim (0-13% @128-d, 8-87% @PCA-8) => kernel sound; the redundancy is fundamental. LB-ORDER (faithful BET-2 kernel) is strictly WORSE (0.18-0.25x): LB-ordering probes far large-radius clusters early.

* docs(bet4): ADR-205 - cluster-pruning vs plain IVF nprobe = structural NO-GO

Verdict: NO-GO (robust, structural). Steelman B&B (centroid order + LB-skip) ties tuned nprobe at exactly 1.00x member-evals in every cell, n=20k & n=50k, 128-d & PCA-8. Mechanism: the triangle-inequality bound only prunes FAR clusters that tuned nprobe already skips => redundant with nprobe's centroid-distance cutoff; win is structurally impossible, not just hard in high-d. LB-order (faithful BET-2 kernel) strictly worse (0.18-0.25x). Companion to ADR-199. Honest deviation recorded: the pre-registered PCA-8 control expected a B&B WIN (tight bound). It tied instead: the premise was false (tight bound beats full-scan, not tuned nprobe). Control still valid: exact-prune fraction scales correctly with dim (0-13% @128-d, 8-82% @PCA-8) => kernel sound; it revealed the structural redundancy. Scoreboard 2 WINS / 4 KILLS.

* chore(bet4): lockfile for ruvector-bet4-ivf-bench workspace member

* docs(bet5): FROZEN pre-registration - PQ/IVFADC within-list pruning vs tuned nprobe

Opens the one lever ADR-205 left explicitly open (within-list PQ asymmetric distance, orthogonal to the killed cluster-level bound). Frozen gate: PQ must beat the cheaper of {plain full-L2, early-abandon exact-L2} nprobe by >=2x full-L2-equivalent member-evals at recall@10=0.95 AND wall-clock, across nclusters{64,256,1024} at >=1 scale N>=50k. Honest prior: ~55% win-at-scale, named kill-paths = amortization crossover + concentration re-rank ceiling. Stacked on feat/seprag-bet4-ivf-pruning to reuse ruvector-bet4-ivf-bench. Thread #534.

* feat(bet5): M0 - PqIvf (IVFADC) kernel + early-abandon steelman + gate

PqIvf trains m sub-quantizers on the shared ruvector-rairs k-means substrate (kmeans assignments ARE the PQ codes), encodes corpus to m-byte codes, and adds search_adc_rerank (cheap ADC scan of nprobe lists + exact L2 re-rank of top-R) plus search_adc_only (pure-ADC ceiling probe). AdcCost charges everything in one honest unit: 256 (LUT) + adc_members*m/D + rerank*1 full-L2-equivalents. BnBIvf gains search_nprobe_abandon = the early-abandon exact-L2 steelman incumbent (user-confirmed verdict-setter), charged in dims_touched/D. Gates (real 2k arxiv slice): PqIvf shares centroids w/ BnBIvf; PQ@full-rerank exact (recall>=0.999); early-abandon exact vs full L2 (<0.001). 6 tests green, clippy clean. Thread #534, BET5 pre-reg frozen at 1d920b3.

* feat(bet5): M1/M2/M3 - matched-recall PQ sweep harness

examples/pq_pruning_sweep.rs: shared index per nclusters; tune incumbent nprobe to min reaching recall@10>=0.95; PQ scans the SAME nprobe lists (cannot rerank an unscanned neighbour) and we tune the smallest re-rank R recovering >=0.95. Charges all PQ ops in full-L2-equivalents (256 LUT + adc*m/D + R rerank). Reports pure-ADC ceiling, R*, early-abandon dim-prune fraction, wall-clock, crossover n*, frozen gate. Thread #534.

* style(bet5): clippy-clean PQ kernel + sweep (iterator idioms, type alias)

* perf(bet5): shared IvfParts - build k-means once per cell, not per contender

Extract build_ivf -> IvfParts; BnBIvf::from_parts + PqIvf::from_parts reuse one seeded k-means for the incumbent and every PQ(m). Cuts the worst cell (nc=1024 @100k) from 3x k-means to 1x while guaranteeing the shared-index property by construction. Behavior-preserving (N=5000 numbers identical). 6 tests green.

* fix(bet5): charge routing (nclusters centroid evals) to both contenders

Pre-reg accounting + 'no free routing' adversarial check require the nclusters query-centroid routing evals charged equally to incumbent AND PQ. Harness omitted it, silently flattering PQ where routing dominates (high nclusters). Now prints member-only ratio (transparency) AND the gate-deciding TOTAL ratio with routing; verdict decided on total. Wall-clock already included routing (search computes centroid dists) so the wall guard was already honest. Re-run authoritative.

* docs(bet5): ADR-206 - PQ/IVFADC within-list pruning = scale-gated WIN

Opens ADR-205's one open lever (within-list PQ asymmetric distance, orthogonal to the killed cluster-level bound). PQ (cheap ADC scan + exact top-R rerank) beats tuned plain nprobe AND the early-abandon exact-L2 steelman by >=2x full-L2-equivalent member-evals at recall@10=0.95 AND wall-clock, across all three nclusters{64,256,1024} at N=100k. Win GROWS with N, crossover n* RISES with nclusters (routing amortization) -> >=2x at nclusters~sqrt(n) from n~20-50k. Honest caveats (none buried): win rides on the exact rerank not pure ADC (ceiling ~0.5) = IVFADC+refine validated, not a new method; scale-gated (full sweep only at 100k); nc=1024/100k knife-edge 2.03x; m=16 tuned; recall-floor tunability flatters PQ modestly; steelman halved the naive-L2 ratio. Routing charge bug in my own harness caught by the pre-registered 'no free routing' check (nc=1024/50k 2.24x member -> 1.65x total). Scoreboard 3 WINS / 4 KILLS. Thread #534, pre-reg frozen at 1d920b3.

---------

Co-authored-by: ruv <ruvnet@users.noreply.github.com>
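The full-L2-equivalent accounting quoted in the BET 5 commits (256 for the LUT + adc_members*m/D + one unit per re-ranked candidate, with routing charged on top after the fix) can be written out as a short sketch. AdcCost is named in the commit message, but the fields and method names below are assumptions made for illustration, not the crate's definition.

```rust
/// Illustrative cost record; field and method names are hypothetical.
#[derive(Debug, Clone, Copy)]
pub struct AdcCost {
    /// Members scanned with the cheap ADC (table-lookup) distance.
    pub adc_members: usize,
    /// Candidates re-ranked with an exact full-dimension L2.
    pub rerank: usize,
}

impl AdcCost {
    /// Member-only cost in full-L2-equivalents:
    /// 256 (LUT build) + adc_members * m / dim + rerank * 1.
    pub fn member_l2_equiv(&self, m: usize, dim: usize) -> f64 {
        256.0 + (self.adc_members * m) as f64 / dim as f64 + self.rerank as f64
    }

    /// Gate-deciding total: member cost plus one centroid eval per cluster
    /// for routing, which is charged to the incumbent as well.
    pub fn total_l2_equiv(&self, m: usize, dim: usize, nclusters: usize) -> f64 {
        nclusters as f64 + self.member_l2_equiv(m, dim)
    }
}
```

As a worked case with made-up scan counts: m=16, D=128, 1000 ADC-scanned members and 50 re-ranks cost 256 + 125 + 50 = 431 member units, or 495 in total at nclusters=64. The fixed routing term grows with nclusters, which is why the member-only ratio overstated PQ's win at high nclusters.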

shaal added 7 commits June 4, 2026 14:44

shaal mentioned this pull request Jun 4, 2026

SepRAG: CCH-inspired retrieval exploration + customizable re-weighting for self-learning ANN #534

Open

5 tasks

shaal mentioned this pull request Jun 5, 2026

SepRAG BET 3 — curated-KG treewidth probe → NO-GO (finding, ADR-203) #538

Open

shaal wants to merge 8 commits into ruvnet:main from shaal:docs/seprag-bet2-filtered-ann
