Skip to content

fix: expand chunk IDs in remove() when chunking is enabled#13

Merged
harlan-zw merged 3 commits intomainfrom
fix/remove-with-chunking
Mar 30, 2026
Merged

fix: expand chunk IDs in remove() when chunking is enabled#13
harlan-zw merged 3 commits intomainfrom
fix/remove-with-chunking

Conversation

@oritwoen
Copy link
Copy Markdown
Collaborator

@oritwoen oritwoen commented Mar 30, 2026

remove() is a no-op when chunking is on. The driver only knows about chunk IDs like doc1#chunk-0, but remove() sends the bare parent ID doc1 - no match, nothing deleted.

Funny thing is listIds() already handles the reverse direction (collapsing chunks back to parent IDs), remove() just never got the
same treatment.

Fix: query stored IDs first, match chunks to their parents, then forward the expanded list. Added listIds() to sqlite-fts along the way since it was the only driver missing it.

Summary by CodeRabbit

  • Bug Fixes

    • Removal now correctly deletes all chunks associated with a parent document, preventing orphaned chunk results after deletion.
  • Tests

    • Added a test that verifies removing a parent document also removes its generated chunks while leaving other documents intact.

Documents indexed with a chunker are stored as `{id}#chunk-{i}` but
remove() was passing parent IDs directly to drivers, matching nothing.

Now retriv expands parent IDs to their chunk IDs via listIds() before
forwarding to drivers. Also adds listIds() to the sqlite-fts driver.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 30, 2026

📝 Walkthrough

Walkthrough

This change adds a provider API listIds() (sqlite FTS) and updates retriv.remove(ids) to use it when chunking is enabled, expanding removals to include matching chunk IDs and cleaning up in-memory parentDocs. A test validating parent-ID-aware chunk removal was added.

Changes

Cohort / File(s) Summary
SQLite FTS Provider
src/db/sqlite-fts.ts
Added listIds(): Promise<string[]> that returns all id values from documents_meta.
Document removal & chunk handling
src/retriv.ts
remove(ids) now calls provider listIds() when chunking is enabled, expands removeIds to include chunk IDs whose parent matches input IDs, calls drivers with expanded set, and deletes original ids from parentDocs.
Tests
test/retriv.test.ts
Added test that indexes two chunked docs, removes one parent ID, and asserts chunked entries of the removed doc are gone while the other doc remains searchable.

Sequence Diagram

sequenceDiagram
    actor Caller
    participant Retriv as retriv.remove()
    participant Provider as SearchProvider (driver)
    participant ParentMap as parentDocs Map

    Caller->>Retriv: remove(['doc1'])
    alt chunking enabled
        Retriv->>Provider: listIds()
        Provider-->>Retriv: ['doc1','doc1#chunk-0','doc1#chunk-1','doc2',...]
        Retriv->>Retriv: compute removeIds (include chunks whose parent == 'doc1')
    end
    Retriv->>Provider: remove(removeIds)
    Provider-->>Retriv: ok
    Retriv->>ParentMap: delete entries for 'doc1'
    ParentMap-->>Retriv: ok
    Retriv-->>Caller: complete
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 I nibble through ids and tidy the heap,
Chunks of a parent I gather and sweep,
A list I request, then prune with finesse,
parentDocs cleared — no clutter, no mess! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: expanding chunk IDs in the remove() method when chunking is enabled, which directly addresses the core issue described in the PR objectives.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/remove-with-chunking

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@oritwoen oritwoen self-assigned this Mar 30, 2026
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="test/retriv.test.ts">

<violation number="1" location="test/retriv.test.ts:149">
P2: Vacuous-truth risk: `every()` on an empty array always returns `true`, so this assertion passes even if `remove()` was never exercised (e.g. indexing silently failed). Asserting the length directly is more robust.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/retriv.ts`:
- Around line 224-239: When chunker is enabled the code currently only handles
chunk-aware deletes if at least one driver implements listIds(); to avoid
silently leaving chunk rows behind, add a guard after computing lister: if
chunker is true and no lister is found but there are drivers with a remove
method, throw an error (or return a rejected promise) explaining that drivers
must implement listIds() for chunked deletes; reference the existing
variables/functions chunker, drivers, lister, listIds(), remove, and removeIds
so you locate the check and replace the silent fallback with a fast failure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9ccdb3ad-6392-44ad-a597-73ab65e12a70

📥 Commits

Reviewing files that changed from the base of the PR and between 3f231ad and d4918e6.

📒 Files selected for processing (3)
  • src/db/sqlite-fts.ts
  • src/retriv.ts
  • test/retriv.test.ts

Address review feedback:
- throw instead of silently falling back to bare IDs
- use filter().toHaveLength(0) instead of every() to avoid vacuous truth
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/retriv.ts`:
- Around line 223-228: If remove([]) is called with an empty ids array, avoid
running the chunker expansion and potential listIds lookup; add an early no-op
return when ids is empty at the start of remove() (before the chunker logic).
Specifically, in remove() check if ids is falsy or length === 0 and return
immediately so the subsequent code that creates lister via drivers.find(d =>
d.listIds) and calls lister.listIds() is skipped, preventing the error when no
listIds-capable driver exists.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 970fa173-43dd-41db-8570-c4fd89214241

📥 Commits

Reviewing files that changed from the base of the PR and between d4918e6 and b82d62d.

📒 Files selected for processing (2)
  • src/retriv.ts
  • test/retriv.test.ts
✅ Files skipped from review due to trivial changes (1)
  • test/retriv.test.ts

Comment on lines +223 to +228
let removeIds = ids
if (chunker) {
const lister = drivers.find(d => d.listIds)
if (!lister)
throw new Error('remove() with chunking requires a driver that implements listIds()')
const allIds = await lister.listIds!()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Handle empty remove() input as a fast no-op.

At Line 224, chunk-aware expansion runs even when ids is empty. With chunking enabled and no listIds() driver, remove([]) throws instead of returning a no-op result.

Suggested fix
    async remove(ids: string[]) {
+      if (ids.length === 0)
+        return { count: 0 }
       let removeIds = ids
       if (chunker) {
         const lister = drivers.find(d => d.listIds)
         if (!lister)
           throw new Error('remove() with chunking requires a driver that implements listIds()')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/retriv.ts` around lines 223 - 228, If remove([]) is called with an empty
ids array, avoid running the chunker expansion and potential listIds lookup; add
an early no-op return when ids is empty at the start of remove() (before the
chunker logic). Specifically, in remove() check if ids is falsy or length === 0
and return immediately so the subsequent code that creates lister via
drivers.find(d => d.listIds) and calls lister.listIds() is skipped, preventing
the error when no listIds-capable driver exists.

@harlan-zw harlan-zw merged commit e9d89b9 into main Mar 30, 2026
3 checks passed
@harlan-zw harlan-zw deleted the fix/remove-with-chunking branch March 30, 2026 10:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants