Local hybrid retrieval index (Vectorize + BM25) over extracted chunks #14
Open
Labels
- P0 (Demo-blocking: required for a working Sui Overflow demo)
- crawling (Web/Walrus crawling and context-extraction quality)
- feature (User- or agent-facing capability)
- platform (Backend platform plumbing: Worker, D1, queues, secrets, metering)
Context
Grounded chat (memwalChat, apps/api/src/worker.ts ~3498) delegates semantic recall entirely to the MemWal relayer (MemWalMcpClient.recallSiteContext) with a keyword-only fallback and a fixed-ish topK; there is no local vector index. The relayer is flaky enough that packages/memwal already wraps it in retries. Meanwhile buildChunks already produces stable, citable chunks at build time (worker ~2186/2790) that are never embedded or indexed.
Goal / user story
As a user chatting against a namespace, I want fast, reliable hybrid retrieval (vector + keyword) over the namespace's own chunks, so answers are well-grounded even when the MemWal relayer is slow or down.
Acceptance criteria
- A Vectorize binding in wrangler.jsonc/WorkerEnv, with an index dimension matching the chosen Workers AI embedding model.
- At build time (where buildChunks already runs) each chunk is embedded via Workers AI (@cf/baai/bge-*) and upserted to Vectorize keyed by ${namespace}:${chunkId} with metadata { namespace, chunkId, routePath, url }; chunk text is persisted (D1 or R2) for retrieval-time hydration.
- memwalChat performs hybrid retrieval: embed the query → Vectorize topK (namespace-filtered) fused with a keyword/BM25 pass via reciprocal-rank fusion, then MMR de-dup, with a configurable topK.
- The sources[] shape (url/routePath/quote/blobId) is preserved so the existing UI keeps rendering per-turn provenance.
Implementation notes
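The fusion step named in the acceptance criteria (combining the Vectorize ranking with the keyword/BM25 ranking via reciprocal-rank fusion) can be sketched roughly as below. This is an illustration only: the type RankedHit and the function fuseByReciprocalRank are hypothetical names, not code from worker.ts, and k = 60 is a commonly used default rather than a value this issue specifies. Each input list is assumed to contain a chunk at most once; MMR de-dup would run as a separate pass on the fused output.

```typescript
// Hypothetical shape for a retrieval hit; only the chunk ID matters for fusion.
type RankedHit = { chunkId: string };

// Reciprocal-rank fusion: every list contributes 1 / (k + rank) to a chunk's
// score, where rank is the 1-based position of the chunk in that list.
// k = 60 is an assumed default; the issue does not fix a value.
function fuseByReciprocalRank(
  rankedLists: RankedHit[][],
  topK: number,
  k = 60,
): { chunkId: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((hit, index) => {
      const rank = index + 1;
      scores.set(hit.chunkId, (scores.get(hit.chunkId) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([chunkId, score]) => ({ chunkId, score }))
    // Highest fused score first; ties broken by chunkId for determinism.
    .sort((a, b) => b.score - a.score || a.chunkId.localeCompare(b.chunkId))
    .slice(0, topK);
}

// Example with made-up chunk IDs: one list from the vector pass, one from
// the keyword pass, each already sorted best-first.
const vectorHits: RankedHit[] = [{ chunkId: "c1" }, { chunkId: "c2" }, { chunkId: "c3" }];
const keywordHits: RankedHit[] = [{ chunkId: "c2" }, { chunkId: "c4" }, { chunkId: "c1" }];
const fused = fuseByReciprocalRank([vectorHits, keywordHits], 3);
console.log(fused.map((hit) => hit.chunkId)); // [ 'c2', 'c1', 'c4' ]
```

Because only ranks are used, the vector similarity scores and BM25 scores never need to be normalized against each other, which is what makes this approach low on tuning.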
- apps/api/cloudflare/wrangler.jsonc, apps/api/src/worker.ts (build/queue consumer + memwalChat), plus a small D1 migration for a contextmem_chunks text store (or reuse R2 chunks.ndjson).
- @cf/baai/bge-base-en-v1.5 (768) or bge-small (384). Batch embeddings to respect Workers AI rate limits.
- Reciprocal-rank fusion (1/(k+rank)) keeps fusion simple and tuning-free.
- chunkGraphDigest + stored ciphertext, not the index.
Sui Overflow angle
A visibly fast, well-cited grounded-chat answer is the centerpiece of the live demo; replacing the flaky external relayer with a local hybrid index makes that moment reliable on stage and showcases the quality of on-Walrus context.
Dependencies
Builds on "Token-aware chunking with overlap" (better chunks → better embeddings). Decision needed: commit to Cloudflare Vectorize binding/cost vs. continue delegating recall to MemWal.
Part of the ContextMEM roadmap (#4) • Sui Overflow build.