SepRAG: CCH-inspired retrieval exploration + customizable re-weighting for self-learning ANN

## Summary

Exploration of **CCH-inspired hierarchical retrieval** ("SepRAG") for RuVector's hybrid
vector + knowledge-graph memory, run as a disciplined, evidence-gated research thread. It
produced one **negative** result (recorded honestly) and one **proven, shippable** result.

This is a tracking/epic issue. The first deliverable (ADRs 196–200 + a reference crate +
six experiment harnesses) is in the companion PR. The remaining bets stay open below.

## Measured impact (proven, with scope)

A **new capability** — adapt the ANN index to a *changed relevance metric* by re-weighting
the existing graph instead of rebuilding it. It does **not** change query latency or
absolute recall of existing search; it removes the rebuild cost a metric change otherwise
incurs.

- **Update cost (metric change), reference Vamana:** 141.8s → 0.035s at n=100k (**~4,000×**
  lower); ratio ~1,000–4,000× across n=5k–100k (rebuild is super-linear; re-weight ≈ a
  medoid recompute).
- **Production `ruvector-diskann` Vamana (n=20k):** rebuild ~7s → re-weight ~0.01s, recall
  within **2%** of rebuild (96–99% absolute).
- **Recall preserved:** within 2% of full rebuild across diagonal / rotational / non-linear
  / region-local / compounding drift, up to n=100k.
- **Hybrid operating point:** re-weight every step + rebuild every ~4 → **98.8% vs 99.1%
  (always) / 94.4% (never), at 25% of rebuild cost.**
- **Not yet proven:** drift is synthetic (parametric), not a live `ruvector-gnn`
  learned-metric trajectory; production-loop integration is backlog item #1. No query-latency
  or recall improvement to existing search.

## Motivation

Customizable Contraction Hierarchies (CCH) give ~35,000× speedups on road networks by
pre-processing a separator hierarchy. The question: can the same machinery (nested
dissection, separators, contraction shortcuts, elimination trees, separator-tree k-NN, and
**metric-independent + customizable** preprocessing) give large search-space reduction and
cheap **self-learning re-weighting** for embedding + KG retrieval?

## Findings

### ❌ CCH full-contraction does NOT fit embedding retrieval (NO-GO, measured)

Validated implementation (`ruvector-seprag`, separator-tree k-NN == Dijkstra oracle on
toy + real graphs), then measured on real public data:

| Backbone (n=1500) | blowup `\|G+\|/\|G_nav\|` | elim-tree height |
|---|---|---|
| roadNet-PA (control) | 7.6× | ~3.5·√n |
| ogbn-arxiv citation | 23.8× | ~0.6·n |
| ogbn-arxiv feature kNN | 42.4× | ~0.56·n |

The road control proves the implementation is sound; both embedding-derived backbones are
intrinsically high-treewidth, so contraction blows up. HNSW already owns embedding kNN.
(ADR-199.)

### ✅ Customizable re-weighting (the salvaged idea) — proven WIN

Decoupled from CCH: in a self-learning store the metric drifts; instead of rebuilding the
ANN index, **reuse the fixed topology and re-score under the new metric.** Validated on the
production `ruvector-diskann` Vamana:

- Recall within **2%** of a full rebuild across **diagonal / rotational / non-linear /
  region-local / compounding** drift, up to **n=100k**, at **~1,000–4,000× lower update cost**.
- Hybrid operating policy: re-weight every step + rebuild every ~4 steps → **98.8% recall
  (vs 99.1% always-rebuild, 94.4% never) at 25% of the rebuild cost**.
- Honest caveats: the recall gap widens with scale/churn (defer/batch rebuilds, not never);
  a drift-*triggered* rebuild monitor underperformed simple periodic (needs a better signal).

(ADR-200.)

## Deliverables (companion PR)

- ADR-196…200 (`docs/adr/`) — the full decision record (one dead end + one win).
- `docs/plans/seprag-cch-retrieval/` — milestone plans, `FUTURE-DIRECTIONS.md` (backlog +
  "prove-not-hype" protocol).
- `crates/ruvector-seprag/` — reference CCH separator-tree k-NN + shared ANN engine +
  6 experiment harnesses (`reweight_vs_rebuild`, `scale_drift`, `region_drift`,
  `diskann_drift`, `hybrid_policy`, `blowup_report`). Tests + clippy green.

## Backlog (open — same prove-not-hype protocol: one claim+number, beat a tuned in-repo
incumbent, public data, pre-registered win/kill gate, adversarial check)

- [ ] **Productionize the win** — wire "re-weight + periodic rebuild" into the
  `ruvector-diskann`/`ruvector-gnn` loop behind a flag, on a real learned-metric trajectory.
- [ ] **Smarter rebuild trigger** — sampled-recall probe vs the Frobenius monitor.
- [ ] **BET 2** — filtered ANN (predicate-constrained) vs `ruvector-acorn`.
- [ ] **BET 3** — multi-hop Graph-RAG on a sparse curated KG (HotpotQA / MuSiQue ground truth).
- [ ] **BET 4** — region-pruning query on an IVF/clustering hierarchy (`rairs-ivf`, treewidth-immune).

## Reproduce

Datasets are public (ogbn-arxiv, SNAP roadNet-PA). Each harness prints its pre-registered
gate. See `docs/plans/seprag-cch-retrieval/FUTURE-DIRECTIONS.md` for the protocol.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly