
fix(glob): bound gitignore matching memory to prevent scan OOM #1377

Draft
Simon (simonhj) wants to merge 3 commits into v1.x from simon/glob-gitignore-oom-fix

@simonhj

socket fix and socket scan abort with FATAL ERROR: CALL_AND_RETRY_LAST … heap out of memory on large monorepos that contain many nested .gitignore files.

Cause

globWithGitIgnore discovers every nested .gitignore and unions all their translated patterns into one set. On the non-negated code path, it handed that entire set to fast-glob's native ignore option. fast-glob re-compiles and re-tests its whole ignore array inside every directory scan, so a union of tens of thousands of patterns exhausts V8 code space and aborts the process. Raising --max-old-space-size does not help, because the allocation is regex executable code, not the data heap.

Fix

Match the high-cardinality gitignore set through a single reused ignore instance (which compiles each rule once and memoizes it) applied per streamed entry, and hand fast-glob only the small bounded set it needs to prune directories during the walk. The negated-pattern path already worked this way; this unifies both paths and removes the asymmetry that left the common, non-negated case crashing.
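
The shape of this change can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `CompiledIgnore` is a hypothetical stand-in for the reused ignore instance, it only understands anchored directory-prefix patterns such as `packages/a/dist/`, and `filterEntries` is an assumed name for the filter applied to each streamed entry.

```typescript
// Hypothetical stand-in for the reused matcher: every rule is compiled exactly
// once in the constructor, and per-path verdicts are memoized.
class CompiledIgnore {
  private readonly rules: RegExp[]
  private readonly cache = new Map<string, boolean>()

  constructor(patterns: string[], caseSensitive = true) {
    const flags = caseSensitive ? '' : 'i'
    this.rules = patterns.map(pattern => {
      // Simplification: treat each pattern as an anchored directory prefix.
      const dir = pattern.replace(/\/+$/, '')
      const escaped = dir.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
      return new RegExp(`^${escaped}(/|$)`, flags)
    })
  }

  // Expects a cwd-relative path that already uses POSIX separators.
  ignores(posixRelPath: string): boolean {
    let verdict = this.cache.get(posixRelPath)
    if (verdict === undefined) {
      verdict = this.rules.some(rule => rule.test(posixRelPath))
      this.cache.set(posixRelPath, verdict)
    }
    return verdict
  }
}

// Apply the single matcher to entries as they stream out of the walk,
// instead of handing the whole pattern set to the walker.
function* filterEntries(
  entries: Iterable<string>,
  matcher: CompiledIgnore,
): Generator<string> {
  for (const entry of entries) {
    if (!matcher.ignores(entry)) {
      yield entry
    }
  }
}

const ig = new CompiledIgnore(['packages/a/dist/', 'node_modules/'])
const kept = Array.from(
  filterEntries(
    [
      'package.json',
      'packages/a/package.json',
      'packages/a/dist/package.json',
      'node_modules/x/package.json',
    ],
    ig,
  ),
)
```

In this sketch the cost of compiling patterns is paid once per matcher rather than once per directory scan, which is the property the fix relies on.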

Two parity details versus fast-glob's native ignore matching are preserved:

  • Case sensitivity tracks caseSensitiveMatch (default case-sensitive, matching git) rather than the ignore package's case-insensitive default, so dist/ no longer also ignores a differently-cased Dist/.
  • The cwd-relative path is normalized to POSIX separators before matching, so it still matches the forward-slash-anchored patterns on Windows.
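
Both parity details can be illustrated with a small sketch. The helper names `toPosix` and `makeDirMatcher` are assumptions made for illustration, and the regex-based matcher is a simplified stand-in for the ignore package; it handles only plain directory patterns such as `dist/`.

```typescript
// Assumed helper: convert a backslash-separated relative path to POSIX form.
function toPosix(relPath: string): string {
  return relPath.replace(/\\/g, '/')
}

// Assumed helper: build a matcher for a directory pattern such as `dist/`.
// Case handling follows a caseSensitiveMatch-style flag (default: sensitive).
function makeDirMatcher(
  pattern: string,
  caseSensitiveMatch = true,
): (relPath: string) => boolean {
  const dir = pattern.replace(/\/+$/, '')
  const escaped = dir.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  const re = new RegExp(`(^|/)${escaped}(/|$)`, caseSensitiveMatch ? '' : 'i')
  // Normalize separators before matching so Windows-style paths still match.
  return relPath => re.test(toPosix(relPath))
}

const strict = makeDirMatcher('dist/')
const loose = makeDirMatcher('dist/', false)

const windowsPathMatches = strict('packages\\a\\dist\\index.js')
const differentCaseStrict = strict('packages/a/Dist/index.js')
const differentCaseLoose = loose('packages/a/Dist/index.js')
```

With the case-sensitive default, `dist/` matches the backslash-separated `dist` path after normalization but leaves `Dist/` alone; only the case-insensitive matcher also catches `Dist/`.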

Tests

  • A regression test builds a 100k-pattern nested-.gitignore tree and asserts the walk completes with the correct manifests; the pre-fix path exhausts a constrained worker heap at that count.
  • A case-sensitivity test asserts dist/ ignores dist/ but leaves Dist/ alone.
  • The existing glob suite stays green.
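
A fixture of that size could be planned along these lines. This is a hedged sketch rather than the test in this PR: the split into 1000 directories of 100 patterns each, the `pkg-N` and `ignored-N-M/` names, the use of `package.json` as the manifest, and the `planGitignoreTree` helper are all assumptions. The function only describes the files in memory; writing them to a temporary directory is left to the caller.

```typescript
interface FixtureFile {
  path: string
  contents: string
}

// Plan a tree in which each directory carries its own .gitignore plus one
// manifest that the walk is expected to find.
function planGitignoreTree(
  dirCount: number,
  patternsPerDir: number,
): FixtureFile[] {
  const files: FixtureFile[] = []
  for (let d = 0; d < dirCount; d++) {
    const dir = `pkg-${d}`
    const patterns: string[] = []
    for (let p = 0; p < patternsPerDir; p++) {
      patterns.push(`ignored-${d}-${p}/`)
    }
    files.push({ path: `${dir}/.gitignore`, contents: patterns.join('\n') + '\n' })
    files.push({ path: `${dir}/package.json`, contents: '{}\n' })
  }
  return files
}

const plan = planGitignoreTree(1000, 100)

// Count the patterns across every planned .gitignore file.
const totalPatterns = plan
  .filter(file => file.path.endsWith('.gitignore'))
  .reduce((sum, file) => sum + file.contents.trim().split('\n').length, 0)
```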

socket fix and socket scan aborted with
"FATAL ERROR: CALL_AND_RETRY_LAST ... heap out of memory" (SIGABRT) on
large monorepos. globWithGitIgnore discovers every nested .gitignore and
unions their patterns; the non-negated code path handed that whole set to
fast-glob's native ignore option. fast-glob re-compiles and re-tests its
entire ignore array inside each directory scan, so a set of tens of
thousands of patterns exhausts V8 code space, which raising
--max-old-space-size does not relieve.

Route the high-cardinality gitignore set through a single reused ignore
instance (which compiles each rule once and memoizes it) and hand fast-glob
only the small bounded set it needs to prune directories during the walk.
The negated-pattern path already worked this way; this unifies both paths
and removes the asymmetry that left the common case crashing.

Add a regression test that builds a 100k-pattern nested-.gitignore tree and
asserts the walk completes with the correct manifests, and correct a
comment in getPackageFilesForScan that overstated what the streaming filter
prevents.

Routing the non-negated path through the ignore package introduced two
parity gaps versus fast-glob's native ignore matching:

- The ignore package defaults to case-insensitive matching, while fast-glob
  (caseSensitiveMatch defaults to true) and git match case-sensitively. Build
  the matcher with ignorecase derived from caseSensitiveMatch so a `dist/`
  entry no longer also ignores a differently-cased `Dist/` sibling.
- path.relative yields backslash-separated paths on Windows, which never
  match the forward-slash-anchored patterns. Normalize the relative path with
  normalizePath before ig.ignores(), matching how the patterns are anchored.

Add a case-sensitivity regression test (dist/ vs Dist/).