fix: prevent Tantivy segment accumulation in AsyncSearcher bulk index builds (APP-4767) #13286
Draft
warp-dev-github-integration[bot] wants to merge 1 commit into
… builds (APP-4767)

build_index_async previously sent one SearcherEvent::DocumentInserted per document, so with a 75ms batch window each batch of ≤100 docs triggered a separate commit() and created a separate Tantivy segment. Users with many Warp Drive objects (notebooks, workflows, env-vars) accumulated 10–23+ segments per searcher (seen in Sentry breadcrumbs: 'Prepared commit 23'), and each in-RAM segment consumes memory proportional to its content, driving total footprints into the 8–11 GB range.

Fix: introduce SearcherEvent::BulkDocumentsInserted(Vec<…>) and rewrite build_index_async to collect all documents into a single Vec and send them as one channel message. The background consumer processes the entire Vec in a single execute_operations call → single commit() → single Tantivy segment, regardless of how many documents are indexed. insert_document_async (used for low-frequency incremental updates) is unchanged.

Co-Authored-By: Oz <oz-agent@warp.dev>
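The commit-count arithmetic behind this message can be made concrete with a small model. It assumes the consumer commits once per drained batch and that a batch holds at most 100 events; it ignores the 75 ms window entirely, so it only gives a lower bound (a slow producer whose events straddle window boundaries can only produce more batches). `min_commits_per_document_events` and `commits_bulk` are hypothetical helpers for illustration, not code from this PR, and `commits_bulk` additionally assumes that an empty rebuild sends no event at all.

```rust
/// Maximum number of events the consumer drains per batch (per the PR description).
const MAX_BATCH_EVENTS: usize = 100;

/// Hypothetical helper: lower bound on commits when every document is sent as
/// its own event. Each batch of up to MAX_BATCH_EVENTS events ends in one commit,
/// so this is ceil(num_docs / MAX_BATCH_EVENTS). The real 75 ms window can only
/// split batches further, never merge them.
fn min_commits_per_document_events(num_docs: usize) -> usize {
    (num_docs + MAX_BATCH_EVENTS - 1) / MAX_BATCH_EVENTS
}

/// Hypothetical helper: commits when all documents travel in one bulk event.
/// The whole Vec is a single event, so it fills one batch and one commit.
/// Assumption: an empty rebuild sends nothing and therefore commits nothing.
fn commits_bulk(num_docs: usize) -> usize {
    if num_docs == 0 {
        0
    } else {
        1
    }
}

fn main() {
    for n in [50, 200, 2_300] {
        println!(
            "{n} docs: per-document events >= {} commit(s), bulk event = {} commit(s)",
            min_commits_per_document_events(n),
            commits_bulk(n)
        );
    }
}
```

Under these assumptions, 200 documents need at least 2 commits (the "2+" in the description), and the number of commits, and therefore segments, grows linearly with the number of Warp Drive objects, while the bulk path stays at one.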
Description
Fixes excessive Tantivy segment accumulation in `AsyncSearcher::build_index_async`, which caused multi-GB memory spikes for users with many Warp Drive notebooks/workflows.

Root cause: `build_index_async` previously sent one `SearcherEvent::DocumentInserted` per document. With a 75 ms batch window and up to 100 events per batch, a user with 200 notebooks would generate 2+ separate `commit()` calls during each index rebuild, and each commit created a new in-RAM Tantivy segment. Sentry breadcrumbs confirmed segment accumulation reaching "Prepared commit 23" (23 segments), driving total heap usage to 8–11 GB.

Fix: Introduces `SearcherEvent::BulkDocumentsInserted(Vec<FullTextSearchDocumentEntry>)` and rewrites `build_index_async` to collect all documents into a single `Vec` and send them as one channel message. The background consumer processes the entire `Vec` in a single `execute_operations` call → one `commit()` → one Tantivy segment, regardless of how many documents are indexed. `insert_document_async` (used for low-frequency incremental updates) is unchanged.

This is a complementary fix to #12819, which reduces per-document size via content truncation and budget reductions. Together they address both root causes of segment bloat: too many segments and oversized documents.
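The event flow described above can be sketched as a small, self-contained model using `std::sync::mpsc`. Everything here is a stand-in: `FakeIndex` replaces the Tantivy index writer, its `execute_operations` only counts a commit instead of writing a segment, `FullTextSearchDocumentEntry` is reduced to an id, and the consumer loop (block for one event, then drain until 100 events or the 75 ms window elapses) is an assumption about how the real batching works, not code from `warp_search_core`.

```rust
use std::sync::mpsc::{self, Receiver};
use std::time::{Duration, Instant};

const BATCH_WINDOW: Duration = Duration::from_millis(75);
const MAX_BATCH_EVENTS: usize = 100;

/// Stand-in for the real document type named in the PR.
#[allow(dead_code)] // the id only makes documents distinguishable
#[derive(Debug, Clone)]
struct FullTextSearchDocumentEntry {
    id: u64,
}

/// Simplified event enum: only the two insert variants are modelled.
#[derive(Debug)]
enum SearcherEvent {
    DocumentInserted(FullTextSearchDocumentEntry),
    BulkDocumentsInserted(Vec<FullTextSearchDocumentEntry>),
}

/// Stand-in for the index writer: counts documents and commits instead of
/// writing Tantivy segments. In the real index, each commit adds one segment.
#[derive(Default)]
struct FakeIndex {
    docs: usize,
    commits: usize,
}

impl FakeIndex {
    /// Hypothetical `execute_operations`: apply one batch, then commit once.
    fn execute_operations(&mut self, docs: Vec<FullTextSearchDocumentEntry>) {
        self.docs += docs.len();
        self.commits += 1;
    }
}

/// Consumer loop: block for the first event, then keep draining until the batch
/// holds MAX_BATCH_EVENTS events or BATCH_WINDOW elapses. One commit per batch.
fn run_consumer(rx: Receiver<SearcherEvent>, index: &mut FakeIndex) {
    while let Ok(first) = rx.recv() {
        let deadline = Instant::now() + BATCH_WINDOW;
        let mut events = vec![first];
        while events.len() < MAX_BATCH_EVENTS {
            let remaining = deadline.saturating_duration_since(Instant::now());
            match rx.recv_timeout(remaining) {
                Ok(event) => events.push(event),
                Err(_) => break, // window elapsed or channel closed
            }
        }
        // Flatten both variants into one list of documents for this batch.
        let mut docs = Vec::new();
        for event in events {
            match event {
                SearcherEvent::DocumentInserted(doc) => docs.push(doc),
                SearcherEvent::BulkDocumentsInserted(batch) => docs.extend(batch),
            }
        }
        index.execute_operations(docs);
    }
}

/// Send `events` through a channel, close it, and run the consumer to completion.
fn index_with(events: Vec<SearcherEvent>) -> FakeIndex {
    let (tx, rx) = mpsc::channel();
    for event in events {
        tx.send(event).expect("receiver is still alive");
    }
    drop(tx); // closing the channel lets run_consumer exit
    let mut index = FakeIndex::default();
    run_consumer(rx, &mut index);
    index
}

fn main() {
    let docs: Vec<_> = (0..200).map(|id| FullTextSearchDocumentEntry { id }).collect();
    // Old behaviour: one DocumentInserted event per document.
    let old = index_with(docs.iter().cloned().map(SearcherEvent::DocumentInserted).collect());
    // New behaviour: a single BulkDocumentsInserted event for the whole rebuild.
    let new = index_with(vec![SearcherEvent::BulkDocumentsInserted(docs)]);
    println!("per-document events: {} docs, {} commits", old.docs, old.commits);
    println!("bulk event:          {} docs, {} commits", new.docs, new.commits);
}
```

Because the batch cap counts events rather than documents, wrapping the whole rebuild in one `BulkDocumentsInserted` event collapses it into one batch and one commit, while `DocumentInserted` stays available for the low-frequency incremental path.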
Linked Issue
Linear: APP-4767
Sentry: https://warpdotdev.sentry.io/issues/7259255054/
The linked issue is labeled `ready-to-spec` or `ready-to-implement`.
Where appropriate, screenshots or a short video of the implementation are included below (especially for user-visible or UI changes).
Testing
`warp_search_core` searcher tests continue to pass. `cargo check` and `cargo clippy -p warp_search_core -- -D warnings` pass with no errors or warnings.
`./script/run`
Agent Mode
Conversation: https://staging.warp.dev/conversation/3d7980ab-21db-484a-8d10-108238390087
Run: https://oz.staging.warp.dev/runs/019f1ef4-435d-7d06-97a4-d8eac0e04d7b
This PR was generated with Oz.