fix: guard codebase fragment retrieval against oversized files (APP-4801) #13285

Draft
warp-dev-github-integration[bot] wants to merge 1 commit into master from oz/mem-triage-app4801-codebase-index

Description

build_fragments_from_metadata, in the codebase embedding retrieval path, read source files via async_fs::read_to_string without any file size check. The underlying std::fs::read_to_string calls String::try_reserve_exact(metadata.len()) before reading. On macOS, virtual-memory reservations for sparse or very large files succeed even when physical RAM is unavailable, which inflates the process footprint by tens of GiB and trips the "Excessive memory usage detected" Sentry monitor.
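
To illustrate why a sparse file is enough to trigger this, here is a minimal sketch using only the standard library. The helper names `create_sparse_file` and `reported_len` are hypothetical and are not part of this PR. The sketch shows that file metadata reports the logical length of a file even when no data was ever written, and that length is the size used for the upfront reservation.

```rust
use std::fs::{self, File};
use std::io;
use std::path::Path;

/// Hypothetical helper: creates a file whose logical length is `len` bytes
/// without writing any data. On filesystems that support sparse files this
/// consumes almost no disk space.
fn create_sparse_file(path: &Path, len: u64) -> io::Result<()> {
    let file = File::create(path)?;
    file.set_len(len)?;
    Ok(())
}

/// Hypothetical helper: returns the logical length reported by file metadata.
/// A whole-file read that sizes its buffer from metadata would use this value.
fn reported_len(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}
```

A file created this way costs nothing on disk, yet any reader that trusts `metadata.len()` will try to reserve the full logical size.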

This is a follow-up to PRs #13196 and #13198 (APP-4801), which already fixed warp_files and project/global rule files. The code search retrieval path (build_fragments_from_metadata) was left uncovered.

Fix: add is_file_parsable(&path) before each async_fs::read_to_string call in build_fragments_from_metadata. This enforces the same 3 MB ceiling already used during indexing (repo_metadata::entry::MAX_FILE_SIZE), so a file that grows large after being indexed can no longer cause a memory spike at search time.
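
The shape of the guard can be sketched as follows. This is not the code in the PR: it uses synchronous std::fs instead of async_fs, `is_within_size_limit` and `read_fragment_source` are hypothetical stand-ins (the real check is is_file_parsable, whose implementation is not shown here), and the constant below only assumes that the 3 MB ceiling means 3 * 1024 * 1024 bytes.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Assumed value for illustration; the real limit is
/// `repo_metadata::entry::MAX_FILE_SIZE` and may differ.
const MAX_FILE_SIZE: u64 = 3 * 1024 * 1024;

/// Hypothetical stand-in for `is_file_parsable`: true only for a regular
/// file whose metadata length does not exceed `max_bytes`.
fn is_within_size_limit(path: &Path, max_bytes: u64) -> bool {
    match fs::metadata(path) {
        Ok(meta) => meta.is_file() && meta.len() <= max_bytes,
        // Unreadable metadata is treated as "do not read".
        Err(_) => false,
    }
}

/// Hypothetical helper: reads the file only if it passes the size check.
/// Returns `Ok(None)` when the file is skipped, so the caller can drop the
/// fragment instead of failing the whole search.
fn read_fragment_source(path: &Path) -> io::Result<Option<String>> {
    if !is_within_size_limit(path, MAX_FILE_SIZE) {
        return Ok(None);
    }
    fs::read_to_string(path).map(Some)
}
```

Because the check happens before the read, the large reservation is never attempted for an oversized file. A file can still grow between the check and the read, so this narrows the window rather than closing it completely.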

Linked Issue

APP-4801: https://linear.app/warpdotdev/issue/APP-4801/memory-spike-28-gib-unbounded-async-fsread-to-string-reserves-entire
Sentry: https://sentry.io/organizations/warpdotdev/issues/7259255054/

  • The linked issue is labeled ready-to-spec or ready-to-implement.

Testing

  • cargo check -p ai --features local_fs — passes (1m 40s)

  • cargo clippy -p ai --features local_fs -- -D warnings — clean, no warnings

  • ./script/format — clean, no diffs

  • I have manually tested my changes locally with ./script/run

Agent Mode

  • Warp Agent Mode - This PR was created via Warp's AI Agent Mode

Conversation: https://staging.warp.dev/conversation/0c4ea16b-31e2-4093-9be8-ea58e9bd1501
Run: https://oz.staging.warp.dev/runs/019f1ef4-435d-7d97-97c9-67077d5d056f
This PR was generated with Oz.

Add a file-size check via is_file_parsable() before each
async_fs::read_to_string call in build_fragments_from_metadata.

The retrieval path had no upper bound on file size; async_fs::read_to_string
(backed by std::fs::read_to_string) calls String::try_reserve_exact with the
full file-metadata length before reading. On macOS, virtual-memory reservations
for sparse or otherwise very large files can succeed, inflating the process
footprint by many GiB and tripping the Excessive memory usage monitor.

During indexing, is_file_parsable enforces a 3 MB ceiling for the same reason.
The retrieval path (build_fragments_from_metadata) skipped this guard, allowing
a file that grows large after indexing to cause a memory spike at search time.

Sentry: https://sentry.io/organizations/warpdotdev/issues/7259255054/
Linear: https://linear.app/warpdotdev/issue/APP-4801

Co-Authored-By: Oz <oz-agent@warp.dev>