Local hybrid retrieval index (Vectorize + BM25) over extracted chunks #14

## Context
Grounded chat (`memwalChat`, `apps/api/src/worker.ts` ~3498) delegates semantic recall entirely to the MemWal relayer (`MemWalMcpClient.recallSiteContext`), with a keyword-only fallback and a largely fixed `topK`; there is no local vector index. The relayer is flaky enough that `packages/memwal` already wraps it in retries. Meanwhile, `buildChunks` already produces stable, citable chunks at build time (`worker.ts` ~2186 and ~2790), but those chunks are never embedded or indexed.

## Goal / user story
As a user chatting against a namespace, I want fast, reliable hybrid retrieval (vector + keyword) over the namespace's own chunks, so answers are well-grounded even when the MemWal relayer is slow or down.

## Acceptance criteria
- [ ] A Vectorize binding is added to `wrangler.jsonc`/`WorkerEnv`, with an index dimension matching the chosen Workers AI embedding model.
- [ ] At build time (queue consumer, where `buildChunks` already runs) each chunk is embedded via Workers AI (`@cf/baai/bge-*`) and upserted to Vectorize keyed by `${namespace}:${chunkId}` with metadata `{ namespace, chunkId, routePath, url }`; chunk text is persisted (D1 or R2) for retrieval-time hydration.
- [ ] `memwalChat` performs **hybrid retrieval**: embed the query, take the Vectorize topK (filtered to the namespace), fuse it with a keyword/BM25 pass using reciprocal rank fusion (RRF), then de-duplicate with maximal marginal relevance (MMR), returning a configurable `topK`.
- [ ] MemWal relayer recall becomes a **fallback used only when the local index returns nothing**, removing it from the hot path.
- [ ] The chat `sources[]` shape (url/routePath/quote/blobId) is preserved so the existing UI keeps rendering per-turn provenance.
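The retrieval-time ranking in the hybrid-retrieval criterion can be sketched as follows. This is a minimal, self-contained sketch, not the planned implementation: `Candidate`, `bm25Rank`, `rrfFuse`, `mmrSelect`, and `hybridRank` are hypothetical names, BM25 is computed only over the candidate set (no global inverted index), and MMR redundancy uses token Jaccard overlap as a stand-in for embedding cosine similarity so that keyword-only hits can be compared too. The constants (`k = 60`, `k1 = 1.2`, `b = 0.75`, `lambda = 0.5`) are common defaults, assumed rather than tuned.

```typescript
// Hypothetical shapes: a candidate is a chunk hydrated from the D1/R2 text store.
interface Candidate {
  id: string; // e.g. `${namespace}:${chunkId}`
  text: string;
}

interface Scored {
  id: string;
  score: number;
}

// Lowercase alphanumeric tokens; a deliberately simple tokenizer.
function tokenize(s: string): string[] {
  return s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Okapi BM25 over the candidate set only. Returns ids with a positive score, best first.
function bm25Rank(query: string, docs: Candidate[], k1 = 1.2, b = 0.75): string[] {
  const terms = Array.from(new Set(tokenize(query)));
  const docTokens = docs.map((d) => tokenize(d.text));
  const n = docs.length;
  const avgLen = docTokens.reduce((sum, t) => sum + t.length, 0) / Math.max(n, 1) || 1;
  const scored: Scored[] = docs.map((d, i) => {
    const tf = new Map<string, number>();
    for (const w of docTokens[i]) tf.set(w, (tf.get(w) ?? 0) + 1);
    let score = 0;
    for (const t of terms) {
      const f = tf.get(t) ?? 0;
      if (f === 0) continue;
      const df = docTokens.filter((toks) => toks.includes(t)).length;
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      score += (idf * f * (k1 + 1)) / (f + k1 * (1 - b + (b * docTokens[i].length) / avgLen));
    }
    return { id: d.id, score };
  });
  return scored.filter((s) => s.score > 0).sort((x, y) => y.score - x.score).map((s) => s.id);
}

// Reciprocal rank fusion: each ranking adds 1 / (k + rank) per id, with 1-based rank.
function rrfFuse(rankings: string[][], k = 60): Scored[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1)));
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((x, y) => y.score - x.score);
}

function jaccard(a: Set<string>, b: Set<string>): number {
  let inter = 0;
  a.forEach((t) => { if (b.has(t)) inter++; });
  const union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}

// MMR: greedily pick the candidate maximizing lambda * relevance - (1 - lambda) * redundancy.
// Relevance is the RRF score normalized by the top score; redundancy is max Jaccard overlap
// with already-selected chunks (assumption: stand-in for embedding cosine similarity).
function mmrSelect(fused: Scored[], textById: Map<string, string>, topK: number, lambda = 0.5): string[] {
  const maxScore = fused.length > 0 ? fused[0].score : 1;
  const tokens = new Map<string, Set<string>>();
  for (const f of fused) tokens.set(f.id, new Set(tokenize(textById.get(f.id) ?? "")));
  const remaining = fused.slice();
  const selected: string[] = [];
  while (selected.length < topK && remaining.length > 0) {
    let bestIdx = 0;
    let bestVal = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const cand = remaining[i];
      const candTokens = tokens.get(cand.id)!;
      let redundancy = 0;
      for (const s of selected) redundancy = Math.max(redundancy, jaccard(candTokens, tokens.get(s)!));
      const val = lambda * (cand.score / maxScore) - (1 - lambda) * redundancy;
      if (val > bestVal) { bestVal = val; bestIdx = i; }
    }
    selected.push(remaining[bestIdx].id);
    remaining.splice(bestIdx, 1);
  }
  return selected;
}

// vectorRanking: ids in Vectorize score order; candidates: hydrated chunk texts.
function hybridRank(query: string, vectorRanking: string[], candidates: Candidate[], topK = 8): string[] {
  const fused = rrfFuse([vectorRanking, bm25Rank(query, candidates)]);
  const textById = new Map<string, string>();
  for (const c of candidates) textById.set(c.id, c.text);
  return mmrSelect(fused, textById, topK);
}
```

In `memwalChat`, `vectorRanking` would come from the namespace-filtered Vectorize query and `candidates` from hydrating those ids from the chunk text store; an empty result from `hybridRank` is the point where the MemWal relayer fallback would kick in. The ranked ids then map back to chunk metadata to fill the existing `sources[]` shape.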

## Implementation notes
- Touch `apps/api/cloudflare/wrangler.jsonc`, `apps/api/src/worker.ts` (build/queue consumer + `memwalChat`), plus a small D1 migration for a `contextmem_chunks` text store (or reuse R2 `chunks.ndjson`).
- Pick the model and dimensions explicitly: `@cf/baai/bge-base-en-v1.5` (768) or `@cf/baai/bge-small-en-v1.5` (384). Batch embedding requests to respect Workers AI rate limits.
- BM25: a lightweight keyword score over the candidate set is sufficient; a full inverted index is optional. RRF (each ranked list contributes `1/(k+rank)` per document, with `k` commonly set to 60) keeps fusion simple and needs almost no tuning.
- Gotchas: Vectorize upserts are **eventually consistent** (don't query immediately after upsert in the same run); use metadata filtering for per-namespace isolation; account for Vectorize/AI cost per build. This index is off-chain edge infra — verifiability still comes from the receipt + `chunkGraphDigest` + stored ciphertext, not the index.
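The build-time step (embedding each chunk in batches and upserting it keyed by `${namespace}:${chunkId}`) might look roughly like this. It is a minimal sketch under stated assumptions: `BuiltChunk` is a hypothetical shape for `buildChunks` output, `AiBinding` and `VectorIndexBinding` are structural stand-ins covering only the calls used here (not the real generated binding types), the Workers AI embedding call is assumed to return `{ data: number[][] }` with one vector per input text, and `indexChunks`, `vectorId`, and the batch size of 50 are illustrative. Two Vectorize details are worth confirming against the current docs: vector IDs have a documented maximum length (64 bytes at the time of writing), so long namespaces may need hashing, and filtering on `namespace` requires a metadata index on that property, created before vectors are inserted.

```typescript
// Structural stand-ins for the Workers AI and Vectorize bindings (assumption:
// only the members used here; the real WorkerEnv types are richer).
interface AiBinding {
  run(model: string, input: { text: string[] }): Promise<{ shape?: number[]; data: number[][] }>;
}

interface ChunkVector {
  id: string;
  values: number[];
  metadata: { namespace: string; chunkId: string; routePath: string; url: string };
}

interface VectorIndexBinding {
  upsert(vectors: ChunkVector[]): Promise<unknown>;
}

// Hypothetical shape of one chunk produced by buildChunks.
interface BuiltChunk {
  chunkId: string;
  routePath: string;
  url: string;
  text: string;
}

const EMBED_MODEL = "@cf/baai/bge-base-en-v1.5";
const EMBED_DIMS = 768; // must equal the Vectorize index dimension

function vectorId(namespace: string, chunkId: string): string {
  return `${namespace}:${chunkId}`;
}

// Embeds chunks in fixed-size batches and upserts one vector per chunk.
// Returns the number of vectors upserted. Upserts are eventually consistent,
// so callers should not query the index for these vectors in the same run.
async function indexChunks(
  ai: AiBinding,
  index: VectorIndexBinding,
  namespace: string,
  chunks: BuiltChunk[],
  batchSize = 50, // assumed; check the model's per-request input limit
): Promise<number> {
  let upserted = 0;
  for (let start = 0; start < chunks.length; start += batchSize) {
    const batch = chunks.slice(start, start + batchSize);
    const { data } = await ai.run(EMBED_MODEL, { text: batch.map((c) => c.text) });
    if (data.length !== batch.length) {
      throw new Error(`expected ${batch.length} embeddings, got ${data.length}`);
    }
    const vectors: ChunkVector[] = batch.map((c, i) => {
      if (data[i].length !== EMBED_DIMS) {
        throw new Error(`chunk ${c.chunkId}: got ${data[i].length} dims, index expects ${EMBED_DIMS}`);
      }
      return {
        id: vectorId(namespace, c.chunkId),
        values: data[i],
        metadata: { namespace, chunkId: c.chunkId, routePath: c.routePath, url: c.url },
      };
    });
    await index.upsert(vectors);
    upserted += vectors.length;
  }
  return upserted;
}
```

In the queue consumer this could be called as something like `indexChunks(env.AI, env.VECTORIZE, namespace, chunks)`, where the binding names are assumptions that must match `wrangler.jsonc`. Chunk text still goes to the D1/R2 store separately, since only ids, vectors, and the small metadata object live in Vectorize.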

## Sui Overflow angle
A visibly fast, well-cited grounded-chat answer is the centerpiece of the live demo; replacing the flaky external relayer with a local hybrid index makes that moment reliable on stage and showcases the quality of on-Walrus context.

## Dependencies
Builds on "Token-aware chunking with overlap" (better chunks → better embeddings). Decision needed: commit to Cloudflare Vectorize binding/cost vs. continue delegating recall to MemWal.

_Part of the ContextMEM roadmap (#4) • Sui Overflow build._
