RLV Locator: add lightweight semantic search as third RRF signal #89

@unamedkr

Summary

Add a lightweight sentence-embedding model (all-MiniLM-L6-v2, 22M parameters) as the third signal in the locator's Reciprocal Rank Fusion (RRF), to catch semantic matches that BM25 and keyword overlap miss.

Problem

The current locator combines BM25 with keyword overlap. On the 1.3 MB large-doc test, Q15 fails: "temnein" (Greek for "to cut") appears in chunk 531, but BM25 picks chunk 553, which discusses "Stegocephalia" ("roof-headed"). The semantic connection between "what does temnein mean" and the chunk containing "temnein (to cut)" is obvious to a human reader but invisible to keyword matching.

Proposed Solution

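# Three-signal RRF (currently two signals); k = 60 is the standard RRF constant.
# rank_*[cid] is the rank of chunk cid in that signal's list (1 = best).
rrf[cid] = (1 / (60 + rank_keyword[cid]) +
            1 / (60 + rank_bm25[cid]) +
            1 / (60 + rank_semantic[cid]))  # NEW: semantic signal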
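The fusion step itself is small. Below is a minimal pure-Python sketch, assuming each signal produces a best-first list of chunk ids; `rrf_fuse` is a hypothetical helper name, not the locator's actual code. A chunk missing from one signal's list simply receives no contribution from that signal.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

RRF_K = 60  # the smoothing constant used in the formula above


def rrf_fuse(rankings: Sequence[Sequence[int]], k: int = RRF_K) -> List[int]:
    """Fuse several best-first lists of chunk ids with Reciprocal Rank Fusion.

    Ranks are 1-based, so the top chunk of each list contributes 1 / (k + 1).
    Returns chunk ids sorted by fused score, highest first.
    """
    scores: Dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, cid in enumerate(ranking, start=1):
            scores[cid] += 1.0 / (k + rank)
    # Highest score first; ties broken by chunk id so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


if __name__ == "__main__":
    # Toy rankings for illustration only (made up, not measured results).
    keyword = [531, 553]
    bm25 = [553, 531]
    semantic = [531, 553]
    print(rrf_fuse([keyword, bm25, semantic]))  # [531, 553]
```

In this toy case keyword overlap and BM25 disagree, so the semantic list acts as the tie-breaker; when both existing signals agree on the wrong chunk, a single semantic vote may not be enough to flip the fused order.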

Embedding model selection

Model              Params  CPU latency/query  Quality
all-MiniLM-L6-v2   22M     ~30 ms             Good
BGE-small-en       33M     ~50 ms             Better
nomic-embed-text   137M    ~200 ms            Best

all-MiniLM-L6-v2 is recommended: ~30 ms per query on CPU, with no GPU required.

Pre-computation

Chunk embeddings are computed once during quantcpp index and stored alongside the KV caches. The per-query cost is then a single embedding of the query (~30 ms) plus a similarity comparison against the stored chunk vectors.
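The index-time/query-time split can be sketched in pure Python. This is a minimal sketch, not quantcpp's implementation: `build_chunk_index`, `semantic_ranking`, and the `embed` callable are hypothetical names, and `toy_embed` is a placeholder standing in for all-MiniLM-L6-v2 so the example runs without a model. Vectors are L2-normalized at index time, so query-time cosine similarity reduces to a dot product.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = List[float]
Embedder = Callable[[str], Sequence[float]]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [0.0] * len(v)


def build_chunk_index(chunks: Dict[int, str], embed: Embedder) -> Dict[int, Vector]:
    """Index time (runs once): embed every chunk and store unit vectors."""
    return {cid: l2_normalize(embed(text)) for cid, text in chunks.items()}


def semantic_ranking(query: str, index: Dict[int, Vector], embed: Embedder,
                     top_k: Optional[int] = None) -> List[int]:
    """Query time: one embedding call, then cosine similarity against the index.

    Returns chunk ids best-first, i.e. the list rank_semantic would be read from.
    """
    q = l2_normalize(embed(query))  # the only embedding computed per query
    sims = {cid: sum(a * b for a, b in zip(q, vec)) for cid, vec in index.items()}
    ranked = sorted(sims, key=lambda cid: (-sims[cid], cid))
    return ranked if top_k is None else ranked[:top_k]


def toy_embed(text: str) -> Vector:
    """HYPOTHETICAL stand-in for all-MiniLM-L6-v2: term counts over a tiny vocabulary.

    A real sentence encoder returns dense vectors (384-d for MiniLM); this toy
    only exercises the plumbing and is itself purely lexical.
    """
    vocab = ["cut", "temnein", "roof", "head"]
    words = [w.strip('().,?"') for w in text.lower().split()]
    return [float(words.count(term)) for term in vocab]


if __name__ == "__main__":
    chunks = {531: "temnein (to cut)", 553: "Stegocephalia (roof-headed)"}
    index = build_chunk_index(chunks, toy_embed)
    print(semantic_ranking("what does temnein mean", index, toy_embed))  # [531, 553]
```

A brute-force scan over all stored vectors is the simplest choice here; an approximate nearest-neighbor index would likely only be worth it at much larger chunk counts.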

Expected Impact

  • Q15 (temnein): semantic similarity catches the correct chunk
  • 19/20 → 20/20 on large-doc test
  • General: better handling of paraphrased/synonym queries

Priority: P2

Labels: enhancement (New feature or request)
