fix: guard codebase fragment retrieval against oversized files (APP-4801) #13285
Draft
warp-dev-github-integration[bot] wants to merge 1 commit into
Add a file-size check via is_file_parsable() before each async_fs::read_to_string call in build_fragments_from_metadata.

The retrieval path had no upper bound on file size; async_fs::read_to_string (backed by std::fs::read_to_string) calls String::try_reserve_exact with the full file-metadata length before reading. On macOS, virtual-memory reservations for sparse or otherwise very large files can succeed, inflating the process footprint by many GiB and tripping the Excessive memory usage monitor.

During indexing, is_file_parsable enforces a 3 MB ceiling for the same reason. The retrieval path (build_fragments_from_metadata) skipped this guard, allowing a file that grows large after indexing to cause a memory spike at search time.

Sentry: https://sentry.io/organizations/warpdotdev/issues/7259255054/
Linear: https://linear.app/warpdotdev/issue/APP-4801

Co-Authored-By: Oz <oz-agent@warp.dev>
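The guard described above can be sketched roughly as follows. This is a minimal, synchronous illustration using only the standard library, not the code from this PR: `is_within_size_limit` and `read_if_small` are hypothetical names standing in for `is_file_parsable` and the guarded read, and the constant's value is assumed from the "3 MB ceiling" wording (the real value lives in `repo_metadata::entry::MAX_FILE_SIZE` and may be defined differently).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Assumed ceiling, taken from the "3 MB" figure in the description.
/// The actual constant in the codebase may use a different exact value.
const MAX_FILE_SIZE: u64 = 3 * 1024 * 1024;

/// Hypothetical stand-in for `is_file_parsable`: returns true only for
/// regular files whose reported length is at or below the ceiling.
/// Files whose metadata cannot be read are treated as not readable.
fn is_within_size_limit(path: &Path) -> bool {
    match fs::metadata(path) {
        Ok(meta) => meta.is_file() && meta.len() <= MAX_FILE_SIZE,
        Err(_) => false,
    }
}

/// Reads the file only if it passes the size check.
/// Returns `Ok(None)` when the file is skipped, so the caller can drop
/// that fragment instead of reserving a buffer for the whole file.
fn read_if_small(path: &Path) -> io::Result<Option<String>> {
    if !is_within_size_limit(path) {
        return Ok(None);
    }
    fs::read_to_string(path).map(Some)
}
```

A caller would invoke `read_if_small(&path)` for each candidate file and skip any that yield `Ok(None)`. Because the check happens before the read, a file that has grown (or is sparse) since indexing is rejected on its reported length alone. The check and the read are still two separate steps, so a file could in principle grow between them; the sketch does not try to close that window.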
Description
build_fragments_from_metadata in the codebase embedding retrieval path read source files via async_fs::read_to_string without any file size check. std::fs::read_to_string (underneath) calls String::try_reserve_exact(metadata.len()) before reading; on macOS, virtual-memory reservations for sparse or very large files succeed even when physical RAM is unavailable, inflating process footprint by tens of GiB and tripping the "Excessive memory usage detected" Sentry monitor.
This is a follow-up to PRs #13196 and #13198 (APP-4801), which already fixed warp_files and project/global rule files. The code search retrieval path (build_fragments_from_metadata) was left uncovered.
Fix: add is_file_parsable(&path) before each async_fs::read_to_string call in build_fragments_from_metadata. This enforces the same 3 MB ceiling already used during indexing (repo_metadata::entry::MAX_FILE_SIZE), so a file that grows large after being indexed can no longer cause a memory spike at search time.
Linked Issue
APP-4801 — https://linear.app/warpdotdev/issue/APP-4801/memory-spike-28-gib-unbounded-async-fsread-to-string-reserves-entire
Sentry: https://sentry.io/organizations/warpdotdev/issues/7259255054/
ready-to-spec or ready-to-implement.
Testing
cargo check -p ai --features local_fs: passes (1m 40s)
cargo clippy -p ai --features local_fs -- -D warnings: clean, no warnings
./script/format: clean, no diffs
I have manually tested my changes locally with ./script/run
Agent Mode
Conversation: https://staging.warp.dev/conversation/0c4ea16b-31e2-4093-9be8-ea58e9bd1501
Run: https://oz.staging.warp.dev/runs/019f1ef4-435d-7d97-97c9-67077d5d056f
This PR was generated with Oz.