Local hybrid retrieval index (Vectorize + BM25) over extracted chunks #14

Description

@harrymove-ctrl

Context

Grounded chat (memwalChat, apps/api/src/worker.ts ~3498) delegates semantic recall entirely to the MemWal relayer (MemWalMcpClient.recallSiteContext), with only a keyword fallback and an effectively fixed topK; there is no local vector index. The relayer is flaky enough that packages/memwal already wraps it in retries. Meanwhile, buildChunks already produces stable, citable chunks at build time (worker ~2186/2790), but they are never embedded or indexed.

Goal / user story

As a user chatting against a namespace, I want fast, reliable hybrid retrieval (vector + keyword) over the namespace's own chunks, so answers are well-grounded even when the MemWal relayer is slow or down.

Acceptance criteria

  • A Vectorize binding is added to wrangler.jsonc/WorkerEnv, with an index dimension matching the chosen Workers AI embedding model.
  • At build time (queue consumer, where buildChunks already runs) each chunk is embedded via Workers AI (@cf/baai/bge-*) and upserted to Vectorize keyed by ${namespace}:${chunkId} with metadata { namespace, chunkId, routePath, url }; chunk text is persisted (D1 or R2) for retrieval-time hydration.
  • memwalChat performs hybrid retrieval: embed the query → Vectorize topK (namespace-filtered) fused with a keyword/BM25 pass via reciprocal-rank fusion, then MMR de-dup, with a configurable topK.
  • MemWal relayer recall becomes a fallback used only when the local index returns nothing, removing it from the hot path.
  • The chat sources[] shape (url/routePath/quote/blobId) is preserved so the existing UI keeps rendering per-turn provenance.
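
The build-time indexing criterion above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `Chunk` shape, the helper names `toBatches` and `toVectorRecords`, and the batch size are assumptions. The embedding call (Workers AI) and the upsert (Vectorize binding) are left out so the helpers stay pure and testable.

```typescript
// Hypothetical chunk shape; the real buildChunks output may carry other fields.
interface Chunk {
  chunkId: string;
  routePath: string;
  url: string;
  text: string;
}

// Record shape following the id/metadata layout named in the criteria above.
interface VectorRecord {
  id: string;
  values: number[];
  metadata: { namespace: string; chunkId: string; routePath: string; url: string };
}

// Split items into fixed-size batches so each embedding call stays small.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("batch size must be at least 1");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Pair each chunk with its embedding and build the record to upsert.
function toVectorRecords(
  namespace: string,
  chunks: Chunk[],
  vectors: number[][],
): VectorRecord[] {
  if (chunks.length !== vectors.length) {
    throw new Error("expected exactly one embedding per chunk");
  }
  return chunks.map((c, i) => ({
    id: `${namespace}:${c.chunkId}`,
    values: vectors[i],
    metadata: { namespace, chunkId: c.chunkId, routePath: c.routePath, url: c.url },
  }));
}

// Example with placeholder data and fake 3-dimensional vectors.
const demoChunks: Chunk[] = [
  { chunkId: "c1", routePath: "/a", url: "https://example.com/a", text: "first chunk" },
  { chunkId: "c2", routePath: "/b", url: "https://example.com/b", text: "second chunk" },
  { chunkId: "c3", routePath: "/c", url: "https://example.com/c", text: "third chunk" },
];
const demoBatches = toBatches(demoChunks, 2);
const demoRecords = toVectorRecords("docs", demoBatches[0], [
  [0.1, 0.2, 0.3],
  [0.4, 0.5, 0.6],
]);
```

In the queue consumer, each batch's text would presumably be embedded first and the resulting records handed to the Vectorize binding's upsert; the chunk text itself would go to the D1 or R2 store rather than into vector metadata.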

Implementation notes

  • Touch apps/api/cloudflare/wrangler.jsonc, apps/api/src/worker.ts (build/queue consumer + memwalChat), plus a small D1 migration for a contextmem_chunks text store (or reuse R2 chunks.ndjson).
  • Pick model + dims explicitly: @cf/baai/bge-base-en-v1.5 (768) or @cf/baai/bge-small-en-v1.5 (384). The Vectorize index dimension must match the chosen model. Batch embeddings to respect Workers AI rate limits.
  • BM25: a lightweight keyword score over the candidate set is sufficient; a full inverted index is optional. RRF (sum of 1/(k + rank) over the ranked lists) keeps fusion simple, with k as the only constant to choose.
  • Gotchas: Vectorize upserts are eventually consistent (don't query immediately after upsert in the same run); use metadata filtering for per-namespace isolation; account for Vectorize/AI cost per build. This index is off-chain edge infra — verifiability still comes from the receipt + chunkGraphDigest + stored ciphertext, not the index.
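
The BM25 and RRF notes above could look roughly like this. It is a sketch under stated assumptions: the function names `bm25Rank` and `rrfFuse`, the tokenizer, and the constants (k1 = 1.2, b = 0.75, k = 60) are common defaults chosen for illustration, not values from this issue. BM25 statistics are computed over the candidate set only, as the note suggests.

```typescript
// Naive ASCII tokenizer; an assumption, real content may need something richer.
const tokenizeText = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Rank candidate chunks by BM25, with statistics taken from the candidates themselves.
function bm25Rank(
  query: string,
  docs: { chunkId: string; text: string }[],
  k1 = 1.2,
  b = 0.75,
): string[] {
  const tokenized = docs.map((d) => tokenizeText(d.text));
  const avgLen = tokenized.reduce((n, t) => n + t.length, 0) / Math.max(docs.length, 1);
  const terms = Array.from(new Set(tokenizeText(query)));
  const scored = docs.map((d, i) => {
    const toks = tokenized[i];
    let score = 0;
    for (const term of terms) {
      const tf = toks.filter((t) => t === term).length;
      if (tf === 0) continue;
      const df = tokenized.filter((t) => t.includes(term)).length;
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * toks.length) / avgLen));
    }
    return { chunkId: d.chunkId, score };
  });
  // Drop chunks with no keyword match so they do not earn a fusion rank.
  return scored
    .filter((s) => s.score > 0)
    .sort((x, y) => y.score - x.score)
    .map((s) => s.chunkId);
}

// Reciprocal-rank fusion: each list contributes 1 / (k + rank), with 1-based ranks.
function rrfFuse(lists: string[][], k = 60, topK = 8): { chunkId: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((chunkId, i) => {
      scores.set(chunkId, (scores.get(chunkId) ?? 0) + 1 / (k + i + 1));
    });
  }
  return Array.from(scores.entries())
    .map(([chunkId, score]) => ({ chunkId, score }))
    .sort((x, y) => y.score - x.score || x.chunkId.localeCompare(y.chunkId))
    .slice(0, topK);
}

// Placeholder candidates; vectorHits stands in for a namespace-filtered vector query result.
const candidates = [
  { chunkId: "c1", text: "Vectorize stores embeddings for semantic recall" },
  { chunkId: "c2", text: "BM25 keyword scoring over chunk text" },
  { chunkId: "c3", text: "Walrus blobs hold the stored ciphertext" },
];
const vectorHits = ["c1", "c2"];
const keywordHits = bm25Rank("keyword scoring", candidates);
const fused = rrfFuse([vectorHits, keywordHits]);
```

Because RRF works on ranks rather than raw scores, the cosine similarities from the vector pass and the BM25 scores never need to be normalized against each other.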

Sui Overflow angle

A visibly fast, well-cited grounded-chat answer is the centerpiece of the live demo; replacing the flaky external relayer with a local hybrid index makes that moment reliable on stage and showcases the quality of on-Walrus context.

Dependencies

Builds on "Token-aware chunking with overlap" (better chunks → better embeddings). Decision needed: commit to Cloudflare Vectorize binding/cost vs. continue delegating recall to MemWal.

Part of the ContextMEM roadmap (#4) • Sui Overflow build.

Metadata

Assignees

No one assigned

    Labels

    P0 (Demo-blocking: required for a working Sui Overflow demo)
    crawling (Web/Walrus crawling and context-extraction quality)
    feature (User- or agent-facing capability)
    platform (Backend platform plumbing: Worker, D1, queues, secrets, metering)
