feat: support deferred embedding generation #12

Open
andinux wants to merge 2 commits into main from feat/deferred-embeddings

@andinux andinux commented Jun 10, 2026

Contributor

Why

UI applications (e.g. the SQLite Cloud dashboard) need uploads to appear instantly. Today every memory_add_* call computes embeddings inline, so adding content blocks on the embedding engine — and even requires a configured model just to store a file. This PR decouples the two steps: store content immediately, generate embeddings later from a background process, and let the UI track progress.

Implementation choices

No schema change. "Pending" is a derived state: a dbmem_content row with a non-empty stored value and no dbmem_vault rows. We deliberately avoided adding a status/n_chunks column to dbmem_content because that table is the cloudsync-replicated one (cloudsync_init('dbmem_content')), while "has this content been embedded" is per-node local state — a replicated column would propagate node A's "embedded" claim to node B whose local vault is empty, and would require a coordinated schema migration across replicas. The local-only dbmem_vault table is the correct home for this state, and memory_reindex()'s existing has_vault check already understands it.
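For reviewers, a minimal sketch of the derived "pending" state, run against a simplified in-memory stand-in. The two-table schema and the `hash` join key are assumptions made for illustration; the real dbmem_content and dbmem_vault definitions may differ, and this is not the query the extension runs.

```python
import sqlite3

# Hypothetical, simplified schema: the real dbmem_content / dbmem_vault
# tables have more columns, and the join key name "hash" is an assumption.
SCHEMA = """
CREATE TABLE dbmem_content (hash TEXT PRIMARY KEY, path TEXT, value TEXT);
CREATE TABLE dbmem_vault   (hash TEXT, seq INTEGER, embedding BLOB);
"""

# "Pending" = content with a non-empty stored value and no vault rows.
PENDING_SQL = """
SELECT c.path
FROM dbmem_content AS c
WHERE c.value IS NOT NULL AND length(c.value) > 0
  AND NOT EXISTS (SELECT 1 FROM dbmem_vault AS v WHERE v.hash = c.hash)
ORDER BY c.path
"""

def pending_paths(conn):
    """Return paths of content rows that have no local vault rows yet."""
    return [row[0] for row in conn.execute(PENDING_SQL)]

def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dbmem_content VALUES (?, ?, ?)",
        [("h1", "docs/a.md", "hello"),   # embedded below
         ("h2", "docs/b.md", "world"),   # stored only, so pending
         ("h3", "docs/empty.md", "")],   # empty value, never pending
    )
    conn.execute("INSERT INTO dbmem_vault VALUES ('h1', 0, x'00000000')")
    return pending_paths(conn)
```

Because the state is computed from the local vault table, two replicas holding the same replicated content rows can legitimately return different results from this kind of query.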

Deferred add is a skip, not a new pipeline. With defer_embeddings=1, dbmem_process_buffer performs everything it does today (dedup, stale-path cleanup, content insert, SAVEPOINT) and only skips the chunk/embed/FTS step. memory_embed_pending() reuses the same row-processing logic as memory_reindex() (one SAVEPOINT per file, hash-consistency rekeying), so a file is always either fully indexed or untouched, and an interrupted worker can simply be restarted.
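The "fully indexed or untouched" guarantee can be sketched as follows. This is an illustration of the one-SAVEPOINT-per-file pattern in Python's sqlite3 module, not the extension's C code; `embed_file`, the savepoint name, and the reduced dbmem_vault columns are hypothetical.

```python
import sqlite3

def embed_file(conn, content_hash, chunks, embed):
    """Index one file atomically: all of its vault rows land, or none do."""
    conn.execute("SAVEPOINT embed_one")
    try:
        for seq, chunk in enumerate(chunks):
            conn.execute(
                "INSERT INTO dbmem_vault (hash, seq, embedding) VALUES (?, ?, ?)",
                (content_hash, seq, embed(chunk)),
            )
    except Exception:
        # Undo this file's partial rows, then close the savepoint.
        conn.execute("ROLLBACK TO embed_one")
        conn.execute("RELEASE embed_one")
        return False
    conn.execute("RELEASE embed_one")
    return True

def demo():
    # isolation_level=None: the sqlite3 module issues no implicit BEGIN,
    # so the SAVEPOINT statements above control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE dbmem_vault (hash TEXT, seq INTEGER, embedding BLOB)")

    def flaky_embed(chunk):
        # Simulated embedding engine that fails on one specific chunk.
        if chunk == "boom":
            raise RuntimeError("embedding engine failed")
        return bytes(4)

    ok = embed_file(conn, "h1", ["a", "b"], flaky_embed)
    failed = embed_file(conn, "h2", ["a", "boom"], flaky_embed)
    rows = conn.execute(
        "SELECT hash, count(*) FROM dbmem_vault GROUP BY hash"
    ).fetchall()
    return ok, failed, rows
```

In the demo the second file fails on its second chunk, and its first chunk's row is rolled back with it, so that file still counts as pending and a restarted worker would pick it up again.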

New API surface

  • memory_set_option('defer_embeddings', 1) — all memory_add_* functions store content without computing embeddings or FTS entries. No embedding model is required for deferred adds. Requires save_content=1 (rejected otherwise, since the content could never be embedded later). Deferred content is invisible to memory_search until embedded.
  • memory_embed_pending([limit]) — embeds up to limit pending rows (all when omitted), returns the number processed. Designed for background workers looping with a small batch size.
  • memory_pending_count() — number of rows still waiting for embeddings, for progress reporting.
  • memory_list_files() — file nodes now include "indexed": true|false so a UI can render per-file badges. Empty files and directory markers always report true.
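A background worker built on memory_embed_pending([limit]) could look like the sketch below. `drain_pending` and its `max_batches` safety cap are hypothetical names, and the demo registers a Python stand-in under the same SQL function name so the loop can run without the extension loaded.

```python
import sqlite3

def drain_pending(conn, batch_size=10, max_batches=1000):
    """Call memory_embed_pending(batch_size) until it reports 0 rows processed."""
    total = 0
    for _ in range(max_batches):  # cap is an assumption, guards against a stuck loop
        (processed,) = conn.execute(
            "SELECT memory_embed_pending(?)", (batch_size,)
        ).fetchone()
        if processed == 0:
            break
        total += processed
    return total

def demo():
    # Stand-in for the real extension: a Python function registered under
    # the same SQL name, backed by a simple counter instead of real rows.
    conn = sqlite3.connect(":memory:")
    backlog = {"pending": 25}

    def fake_embed_pending(limit):
        done = min(limit, backlog["pending"])
        backlog["pending"] -= done
        return done

    conn.create_function("memory_embed_pending", 1, fake_embed_pending)
    return drain_pending(conn, batch_size=10), backlog["pending"]
```

With the real extension loaded in place of the stand-in, the same loop should apply unchanged; a small batch size keeps each call short so the UI can poll memory_pending_count() between batches.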

Zero-chunk contents

Content whose parsing yields no chunks (e.g. whitespace-only text) now inserts a single sentinel row in dbmem_vault (zeroblob(0) embedding, n_tokens=0) that marks it as processed. Without the sentinel, such rows would look pending forever and the worker loop would never converge. sqlite-vector ≥ 0.9.80 skips NULL/undersized blobs in all scan paths, so search should be unaffected; the manual check against a real sqlite-vector build is still listed under Todo. Side fixes: memory_reindex() stops re-parsing these rows on every run, and a zero-chunk first add no longer persists dimension=0 to settings (which previously broke memory_search on reopened connections).
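To illustrate why the sentinel lets the pending count converge, here is a small sketch against a hypothetical minimal schema (column names are assumptions). It only demonstrates the bookkeeping; it does not exercise sqlite-vector's scan behavior.

```python
import sqlite3

def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schema; real column names may differ.
    conn.executescript("""
        CREATE TABLE dbmem_content (hash TEXT PRIMARY KEY, value TEXT);
        CREATE TABLE dbmem_vault (hash TEXT, embedding BLOB, n_tokens INTEGER);
        INSERT INTO dbmem_content VALUES ('ws', '   ');
    """)
    pending_sql = """
        SELECT count(*) FROM dbmem_content AS c
        WHERE length(c.value) > 0
          AND NOT EXISTS (SELECT 1 FROM dbmem_vault AS v WHERE v.hash = c.hash)
    """
    # Whitespace-only content is non-empty, so it counts as pending.
    before = conn.execute(pending_sql).fetchone()[0]

    # Parsing produced zero chunks: record a sentinel instead of nothing.
    conn.execute("INSERT INTO dbmem_vault VALUES ('ws', zeroblob(0), 0)")
    after = conn.execute(pending_sql).fetchone()[0]

    # The sentinel is a zero-length blob, which is distinct from NULL.
    length, is_null = conn.execute(
        "SELECT length(embedding), embedding IS NULL FROM dbmem_vault"
    ).fetchone()
    return before, after, length, is_null
```

The whitespace-only row is pending before the sentinel is written and not pending afterwards, and the sentinel's embedding has length 0 without being NULL.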

UI flow

-- once per database
SELECT memory_set_option('defer_embeddings', 1);

-- on upload (instant, no model needed)
SELECT memory_add_content('docs/api.md', :content);
SELECT memory_list_files();          -- file appears with "indexed":false

-- background worker
SELECT memory_embed_pending(10);     -- repeat until it returns 0

-- UI polling
SELECT memory_pending_count();       -- progress = 1 - pending/total; refresh badges via memory_list_files()
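
On the UI side, the progress formula in the last comment could be computed as sketched below. `upload_progress` is a hypothetical helper, and where `total` comes from is an assumption: this PR does not add a call that returns it, so the caller would have to count content rows or file nodes itself.

```python
def upload_progress(pending: int, total: int) -> float:
    """Fraction of content already embedded: 1 - pending/total."""
    if total <= 0:
        return 1.0  # nothing stored yet: treat as fully indexed
    # Clamp in case the two counters were read at slightly different times.
    pending = max(0, min(pending, total))
    return 1.0 - pending / total
```

For example, 3 pending rows out of 12 gives 0.75. The clamp is a defensive choice for the case where the worker or an upload changes the counts between the two reads.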

Validations

Done:

  • Full unit test suite passes: 181/181 (make test DEFINES="-DTEST_SQLITE_EXTENSION")
  • New tests: deferred add stores content with no vault/FTS rows and needs no model; defer_embeddings + save_content=0 is rejected; memory_embed_pending batches correctly and converges to 0; sentinel row created for zero-chunk content (direct and deferred) with no embedding computed; indexed flag in memory_list_files(); zero-chunk first add does not persist dimension=0
  • Existing tests updated for the intended behavior changes (8 exact-JSON list_files assertions gained "indexed", whitespace-only test now expects the sentinel)
  • Extension builds cleanly (dist/memory.dylib)

Todo:

  • Manual vector_full_scan check with a sentinel row present, using a real sqlite-vector build (the unit-test harness does not load sqlite-vector) — expected: sentinel skipped, no error
  • End-to-end search-on-reopen check in the integration/sync test setup where the full extension stack is loaded
  • Confirm deployed sqlite-vector version is ≥ 0.9.80 everywhere the dashboard ships this feature

🤖 Generated with Claude Code

andinux and others added 2 commits June 9, 2026 22:45
Add a defer_embeddings option that stores content without computing
embeddings or FTS entries, so callers (e.g. a dashboard upload) can add
files instantly without an embedding model and index them later from a
background process.

- memory_set_option('defer_embeddings', 1): memory_add_* functions only
  store content in dbmem_content; requires save_content=1
- memory_embed_pending([limit]): embeds pending rows in batches, one
  SAVEPOINT per file, so an interrupted worker can be safely retried;
  rekeys rows whose stored hash no longer matches the current
  preserve_duplicate_paths scope
- memory_pending_count(): number of rows awaiting embeddings, for
  progress reporting
- memory_list_files(): file nodes now include an "indexed" boolean
- content parsing to zero chunks (e.g. whitespace-only) now inserts a
  zero-length sentinel row in dbmem_vault marking it processed, so it
  exits the pending state and memory_reindex stops re-parsing it
  (sqlite-vector >= 0.9.80 skips undersized blobs during scans)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

A parse producing no chunks (e.g. whitespace-only content) reached the
dimension persistence block with ctx->dimension still 0, writing
dimension=0 to dbmem_settings and latching dimension_saved. Later real
embeddings then updated the dimension in memory only, so a reopened
connection that only searches saw dimension=0 and reported that no
content has been indexed.

Persist the dimension only when at least one real embedding was
computed (chunks_added > 0), which guarantees ctx->dimension is set.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
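
The latch bug described in the second commit can be modeled in a few lines. This is a simplified Python model, not the C implementation: the field names mirror ctx->dimension, dimension_saved and chunks_added from the commit message, while `Ctx`, `finish_add`, the `fixed` switch and the dimension value 384 are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Ctx:
    dimension: int = 0             # in-memory embedding dimension
    dimension_saved: bool = False  # latch: settings already written
    settings: dict = field(default_factory=dict)  # stands in for dbmem_settings

def finish_add(ctx, chunks_added, model_dimension, fixed=True):
    """Model the end of one add: update the dimension, then maybe persist it."""
    if chunks_added > 0:
        ctx.dimension = model_dimension
    # Old behavior persisted unconditionally; the fix requires a real embedding.
    may_persist = chunks_added > 0 if fixed else True
    if may_persist and not ctx.dimension_saved:
        ctx.settings["dimension"] = ctx.dimension
        ctx.dimension_saved = True

def persisted_dimension(fixed):
    ctx = Ctx()
    finish_add(ctx, chunks_added=0, model_dimension=384, fixed=fixed)  # whitespace-only
    finish_add(ctx, chunks_added=3, model_dimension=384, fixed=fixed)  # real content
    return ctx.settings.get("dimension")
```

In this model, the old behavior leaves 0 in the persisted settings after a zero-chunk first add followed by a real one, while the fixed behavior persists 384.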
@andinux andinux requested a review from marcobambini June 10, 2026 05:13

2 participants