fix(glob): bound gitignore matching memory to prevent scan OOM #1377
Draft
Simon (simonhj) wants to merge 3 commits into
socket fix and socket scan aborted with "FATAL ERROR: CALL_AND_RETRY_LAST ... heap out of memory" (SIGABRT) on large monorepos. globWithGitIgnore discovers every nested .gitignore and unions their patterns; the non-negated code path handed that whole set to fast-glob's native ignore option. fast-glob re-compiles and re-tests its entire ignore array inside each directory scan, so a set of tens of thousands of patterns exhausts V8 code space, which raising --max-old-space-size does not relieve.

Route the high-cardinality gitignore set through a single reused ignore instance (which compiles each rule once and memoizes it) and hand fast-glob only the small bounded set it needs to prune directories during the walk. The negated-pattern path already worked this way; this unifies both paths and removes the asymmetry that left the common case crashing.

Add a regression test that builds a 100k-pattern nested-.gitignore tree and asserts the walk completes with the correct manifests, and correct a comment in getPackageFilesForScan that overstated what the streaming filter prevents.
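The compile-once, memoized matching described in this commit can be sketched roughly as follows. This is an illustration only: `MemoizedIgnoreMatcher` and `toRegexSource` are hypothetical names, the pattern translation covers only a tiny subset of gitignore syntax, and none of it is the `ignore` package's actual implementation or the code in this PR.

```typescript
// Hypothetical translation of a root-anchored pattern into regex source.
// Only handles `*` wildcards and a trailing `/` for directories (assumption).
function toRegexSource(pattern: string): string {
  const isDir = pattern.endsWith("/");
  const raw = isDir ? pattern.slice(0, -1) : pattern;
  const body = raw
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, "[^/]*"); // `*` matches within one path segment
  return isDir ? `^${body}/` : `^${body}(?:/|$)`;
}

// Hypothetical stand-in for a single reused matcher instance.
class MemoizedIgnoreMatcher {
  private readonly rules: RegExp[];
  private readonly cache = new Map<string, boolean>();

  constructor(patterns: string[], caseSensitive = true) {
    const flags = caseSensitive ? "" : "i";
    // Every rule is compiled exactly once, here, not once per directory scan.
    this.rules = patterns.map((p) => new RegExp(toRegexSource(p), flags));
  }

  // Expects a forward-slash path relative to the scan root.
  ignores(relPosixPath: string): boolean {
    const cached = this.cache.get(relPosixPath);
    if (cached !== undefined) {
      return cached;
    }
    const result = this.rules.some((rule) => rule.test(relPosixPath));
    this.cache.set(relPosixPath, result);
    return result;
  }
}

// Apply the one matcher to each entry as it is produced by the walk.
const matcher = new MemoizedIgnoreMatcher(["packages/a/dist/", "packages/b/*.log"]);
const streamed = [
  "packages/a/package.json",
  "packages/a/dist/index.js",
  "packages/b/debug.log",
];
const kept = streamed.filter((entry) => !matcher.ignores(entry));
console.log(kept); // kept contains only "packages/a/package.json"
```

The point of the design is that the cost of compiling the large pattern set is paid once per scan rather than once per directory, while the walker itself only sees a small pruning list.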
Routing the non-negated path through the ignore package introduced two parity gaps versus fast-glob's native ignore matching:

- The ignore package defaults to case-insensitive matching, while fast-glob (caseSensitiveMatch defaults to true) and git match case-sensitively. Build the matcher with ignorecase derived from caseSensitiveMatch so a `dist/` entry no longer also ignores a differently-cased `Dist/` sibling.
- path.relative yields backslash-separated paths on Windows, which never match the forward-slash-anchored patterns. Normalize the relative path with normalizePath before ig.ignores(), matching how the patterns are anchored.

Add a case-sensitivity regression test (dist/ vs Dist/).
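A minimal sketch of the two parity fixes, under stated assumptions: `toPosixRelative` and `deriveIgnoreCase` are hypothetical helpers written for illustration, and the PR's own `normalizePath` is not shown in the source, so the separator replacement below is only one plausible way to do it. The `pathImpl` parameter exists solely so the Windows behaviour can be demonstrated on any platform.

```typescript
import * as path from "node:path";

// Hypothetical helper: compute a root-relative path and force forward
// slashes so it can match forward-slash-anchored patterns.
function toPosixRelative(
  root: string,
  absPath: string,
  pathImpl: path.PlatformPath = path,
): string {
  return pathImpl.relative(root, absPath).split(pathImpl.sep).join("/");
}

// Hypothetical helper: derive the matcher's case-folding flag from the
// glob option. fast-glob treats a missing caseSensitiveMatch as true, so
// only an explicit `false` turns case-insensitive matching on.
function deriveIgnoreCase(options: { caseSensitiveMatch?: boolean }): boolean {
  return options.caseSensitiveMatch === false;
}

// On Windows, path.relative returns backslashes; normalizing fixes that.
const rel = toPosixRelative(
  "C:\\repo",
  "C:\\repo\\packages\\a\\dist\\index.js",
  path.win32,
);
console.log(rel); // packages/a/dist/index.js

// The derived flag would then be passed to the matcher, for example as
// ignore({ ignorecase: deriveIgnoreCase(globOptions) }) with the ignore package.
console.log(deriveIgnoreCase({})); // false
```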
`socket fix` and `socket scan` abort with `FATAL ERROR: CALL_AND_RETRY_LAST … heap out of memory` on large monorepos that contain many nested `.gitignore` files.

Cause

`globWithGitIgnore` discovers every nested `.gitignore` and unions all their translated patterns into one set; before this change it handed that entire set to fast-glob's native `ignore` option. fast-glob re-compiles and re-tests its whole ignore array inside every directory scan, so a union of tens of thousands of patterns exhausts V8 code space and aborts the process. Raising `--max-old-space-size` does not help, because the allocation is executable regex code, not the data heap.

Fix

Match the high-cardinality gitignore set through a single reused `ignore` instance (which compiles each rule once and memoizes it), applied per streamed entry, and hand fast-glob only the small bounded set it needs to prune directories during the walk. The negated-pattern path already worked this way; this unifies both paths and removes the asymmetry that left the common, non-negated case crashing.

Two parity details versus fast-glob's native ignore matching are preserved:

- Case sensitivity follows `caseSensitiveMatch` (default case-sensitive, matching git) rather than the `ignore` package's case-insensitive default, so `dist/` no longer also ignores a differently-cased `Dist/`.
- Relative paths are normalized to forward slashes with `normalizePath` before `ig.ignores()`, because `path.relative` yields backslash-separated paths on Windows that never match the forward-slash-anchored patterns.

Tests

- A regression test builds a 100k-pattern nested `.gitignore` tree and asserts the walk completes with the correct manifests; the pre-fix path exhausts a constrained worker heap at that count.
- A case-sensitivity test checks that `dist/` ignores `dist/` but leaves `Dist/` alone.
- The `glob` suite stays green.
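
For readers who want to reproduce the shape of the large-tree regression fixture, a rough sketch follows. `buildNestedGitignoreTree`, the `pkg-N` directory names, the `generated-…/` patterns, and the split of patterns across directories are all assumptions made for illustration; the PR's actual test code is not shown in the source.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical fixture builder: creates `dirCount` sibling packages, each
// with its own .gitignore holding `patternsPerDir` distinct directory
// patterns and one package.json manifest that the walk should still find.
function buildNestedGitignoreTree(
  dirCount: number,
  patternsPerDir: number,
): { root: string; manifests: string[] } {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), "gitignore-tree-"));
  const manifests: string[] = [];
  for (let d = 0; d < dirCount; d++) {
    const dir = path.join(root, `pkg-${d}`);
    fs.mkdirSync(dir);
    const patterns = Array.from(
      { length: patternsPerDir },
      (_, i) => `generated-${d}-${i}/`,
    );
    fs.writeFileSync(path.join(dir, ".gitignore"), patterns.join("\n") + "\n");
    const manifest = path.join(dir, "package.json");
    fs.writeFileSync(manifest, "{}\n");
    manifests.push(manifest);
  }
  return { root, manifests };
}
```

A call such as `buildNestedGitignoreTree(1000, 100)` would yield 100,000 patterns in total; a test could then run the glob under a constrained heap, compare the returned manifests with `manifests`, and remove `root` afterwards with `fs.rmSync(root, { recursive: true, force: true })`.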