Enforce zip size limits on actual decompressed bytes#638
Conversation
Switches @tryghost/zip from `extract-zip` to `yauzl` directly so that the new size limits are enforced against actual decompressed bytes, not just the central-directory metadata. A zip whose header lies about its uncompressed size (the classic zip-bomb shape) previously slipped past the pre-flight check and then hung the underlying extract-zip pipeline indefinitely — because yauzl's AssertByteCountStream overrides `destroy` in a way that swallows mid-stream errors. The new implementation opens yauzl with `validateEntrySizes: false` and runs its own StreamingCounter Transform that propagates errors cleanly through the pipeline. Behaviour is otherwise preserved: same path-traversal, __MACOSX, symlink, filename-length, and mode-handling logic ported verbatim from extract-zip; same error codes and shapes; same onEntry hook. extract-zip is removed as a dependency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WalkthroughThis PR implements optional uncompressed size limits for zip extraction in Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/zip/test/zip.test.js (1)
519-524: ⚡ Quick winTighten the expected error code in the cumulative-limit test.
This test configures only
totalUncompressedBytes, so acceptingENTRY_TOO_LARGEcan mask regressions. AssertTOTAL_TOO_LARGEonly.Suggested fix
- assert.equal( - err.code === 'TOTAL_TOO_LARGE' || err.code === 'ENTRY_TOO_LARGE', - true, - `expected TOTAL_TOO_LARGE or ENTRY_TOO_LARGE, got ${err.code}`, - ); + assert.equal(err.code, 'TOTAL_TOO_LARGE');🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/zip/test/zip.test.js` around lines 519 - 524, The test currently allows either 'TOTAL_TOO_LARGE' or 'ENTRY_TOO_LARGE' for err.code which is too permissive for the cumulative-limit scenario; update the assertion in the test (the assert.equal checking err.code) to require err.code === 'TOTAL_TOO_LARGE' only and adjust the failure message to reflect that it expected 'TOTAL_TOO_LARGE' (remove 'ENTRY_TOO_LARGE' from the check and message), leaving the surrounding assert.equal(err.errorType, 'UnsupportedMediaTypeError') intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/zip/lib/extract.js`:
- Around line 334-350: The code currently calls fs.promises.mkdir(containerDir)
before verifying the resolved path is contained in dir, allowing a crafted entry
(e.g., ../outside/...) to create outside directories; fix by computing the
intended absolute target with path.resolve(dir, entry.fileName) (or
path.resolve(dir, containerDir)) and perform the containment check (use
path.relative(dir, target) and ensure it does not start with '..' or contain
'..' path segments) before calling fs.promises.mkdir; keep the existing
post-mkdir realpath check if you want to defend against symlinks but ensure the
pre-mkdir resolve-and-validate happens first in the extract.js logic around
entryPath/containerDir and the mkdir call.
---
Nitpick comments:
In `@packages/zip/test/zip.test.js`:
- Around line 519-524: The test currently allows either 'TOTAL_TOO_LARGE' or
'ENTRY_TOO_LARGE' for err.code which is too permissive for the cumulative-limit
scenario; update the assertion in the test (the assert.equal checking err.code)
to require err.code === 'TOTAL_TOO_LARGE' only and adjust the failure message to
reflect that it expected 'TOTAL_TOO_LARGE' (remove 'ENTRY_TOO_LARGE' from the
check and message), leaving the surrounding assert.equal(err.errorType,
'UnsupportedMediaTypeError') intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 498c3a1e-3ac3-4c2b-a1b6-4060f056192a
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (4)
packages/zip/README.mdpackages/zip/lib/extract.jspackages/zip/package.jsonpackages/zip/test/zip.test.js
| const entryPath = path.join(dir, entry.fileName); | ||
| const isDir = isDirectoryEntry(entry); | ||
| const mode = getEntryMode(entry, opts, isDir); | ||
| const containerDir = isDir ? entryPath : path.dirname(entryPath); | ||
|
|
||
| await fs.promises.mkdir(containerDir, { recursive: true }); | ||
|
|
||
| // Path-traversal guard: after mkdir, resolve the actual on-disk path and | ||
| // make sure it stays within `dir`. Catches both `../` shenanigans and | ||
| // symlink-target shenanigans created in earlier entries of the same zip. | ||
| const canonicalContainerDir = await fs.promises.realpath(containerDir); | ||
| const relative = path.relative(dir, canonicalContainerDir); | ||
| if (relative.split(path.sep).includes('..')) { | ||
| throw new Error( | ||
| `Out of bound path "${canonicalContainerDir}" found while processing file ${entry.fileName}`, | ||
| ); | ||
| } |
There was a problem hiding this comment.
Run containment checks before creating directories.
At Line 339, mkdir(containerDir) executes before the out-of-bounds check at Line 346. A crafted path like ../outside/new/file.txt can create directories outside destination before rejection.
Suggested fix
const entryPath = path.join(dir, entry.fileName);
const isDir = isDirectoryEntry(entry);
const mode = getEntryMode(entry, opts, isDir);
const containerDir = isDir ? entryPath : path.dirname(entryPath);
+ // Lexical pre-check to prevent creating directories outside `dir`.
+ const lexicalRelative = path.relative(dir, containerDir);
+ if (path.isAbsolute(lexicalRelative) || lexicalRelative.split(path.sep).includes('..')) {
+ throw new Error(
+ `Out of bound path "${containerDir}" found while processing file ${entry.fileName}`,
+ );
+ }
+
await fs.promises.mkdir(containerDir, { recursive: true });
// Path-traversal guard: after mkdir, resolve the actual on-disk path and
// make sure it stays within `dir`. Catches both `../` shenanigans and
// symlink-target shenanigans created in earlier entries of the same zip.
const canonicalContainerDir = await fs.promises.realpath(containerDir);
const relative = path.relative(dir, canonicalContainerDir);
- if (relative.split(path.sep).includes('..')) {
+ if (path.isAbsolute(relative) || relative.split(path.sep).includes('..')) {
throw new Error(
`Out of bound path "${canonicalContainerDir}" found while processing file ${entry.fileName}`,
);
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/zip/lib/extract.js` around lines 334 - 350, The code currently calls
fs.promises.mkdir(containerDir) before verifying the resolved path is contained
in dir, allowing a crafted entry (e.g., ../outside/...) to create outside
directories; fix by computing the intended absolute target with
path.resolve(dir, entry.fileName) (or path.resolve(dir, containerDir)) and
perform the containment check (use path.relative(dir, target) and ensure it does
not start with '..' or contain '..' path segments) before calling
fs.promises.mkdir; keep the existing post-mkdir realpath check if you want to
defend against symlinks but ensure the pre-mkdir resolve-and-validate happens
first in the extract.js logic around entryPath/containerDir and the mkdir call.
Summary
Follow-up to #632. The size limits added in that PR are enforced at metadata pre-flight (
entry.uncompressedSize), which catches honestly-declared oversize archives but not zips that lie about their declared size.This PR replaces
extract-zipwithyauzldirectly and enforces the limits on actual decompressed bytes during streaming, closing that gap and removing one dependency.Why the dependency swap was necessary
extract-zipis a thin wrapper aroundyauzl. yauzl already includes anAssertByteCountStreamthat should error mid-stream when actual bytes exceed declared. But yauzl overrides that stream'sdestroywith a closure that silently swallows the error — so when an oversize entry is detected, no error propagates, andextract-zip'spipeline()waits forever. Result on main today: a zip with mismatched header sizes hangsextract()indefinitely.The fix is to open yauzl with
validateEntrySizes: falseto skip yauzl's broken counter, and run our ownStreamingCounterTransform whose errors propagate cleanly throughpipeline().extract-zipis no longer compatible with that approach, and it has been unmaintained since 2020, so we drop it.What's preserved
All security-critical logic from
extract-zipis ported verbatim:realpath+path.relative+..check)__MACOSX/resource-fork skipdefaultDirMode/defaultFileModehandlingonEntryhook contractAll existing error codes and shapes are preserved (
ENTRY_TOO_LARGE,TOTAL_TOO_LARGE,INVALID_ENTRY_SIZE,INVALID_ZIP_LIMIT).New regression tests
rejects a zip whose central directory lies about uncompressed size— builds a real zip, forges its central directoryuncompressedSizeto 5 bytes while actual payload is 1 MB, asserts clean rejection (not a hang)rejects a zip whose cumulative actual bytes exceed the total limit— two-entry version of the same scenario against the totalextracts a lying zip when no limits are configured— same forged zip with no limits — used to hang underextract-zip+yauzl, now resolves cleanly with the actual bytes on diskrejects an entry whose path escapes the destination directory— locks in the path-traversal guard (via yauzl mock, since archiver normalizes../out)(Supersedes #637 — same code, less scary branch name.)
Test plan
pnpm --filter @tryghost/zip test— 19/19 pass, coverage above thresholds, lint clean🤖 Generated with Claude Code