
Enforce zip size limits on actual decompressed bytes#638

Closed
ErisDS wants to merge 1 commit into main from fix/zip-streaming-size-limits


Conversation


@ErisDS ErisDS commented May 14, 2026

Summary

Follow-up to #632. The size limits added in that PR are enforced at metadata pre-flight (entry.uncompressedSize), which catches honestly-declared oversize archives but not zips that lie about their declared size.

This PR replaces extract-zip with yauzl directly and enforces the limits on actual decompressed bytes during streaming, closing that gap and removing one dependency.

Why the dependency swap was necessary

extract-zip is a thin wrapper around yauzl. yauzl already includes an AssertByteCountStream that should error mid-stream when actual bytes exceed declared. But yauzl overrides that stream's destroy with a closure that silently swallows the error — so when an oversize entry is detected, no error propagates, and extract-zip's pipeline() waits forever. Result on main today: a zip with mismatched header sizes hangs extract() indefinitely.

The fix is to open yauzl with validateEntrySizes: false to skip yauzl's broken counter and run our own StreamingCounter Transform, whose errors propagate cleanly through pipeline(). extract-zip is not compatible with that approach, and it has been unmaintained since 2020, so we drop it.

What's preserved

All security-critical logic from extract-zip is ported verbatim:

  • Path-traversal guard (realpath + path.relative + .. check)
  • __MACOSX/ resource-fork skip
  • Symlink / directory / Windows-DOS-mode-16 detection
  • defaultDirMode / defaultFileMode handling
  • onEntry hook contract

All existing error codes and shapes are preserved (ENTRY_TOO_LARGE, TOTAL_TOO_LARGE, INVALID_ENTRY_SIZE, INVALID_ZIP_LIMIT).

New regression tests

  • rejects a zip whose central directory lies about uncompressed size — builds a real zip, forges its central directory uncompressedSize to 5 bytes while actual payload is 1 MB, asserts clean rejection (not a hang)
  • rejects a zip whose cumulative actual bytes exceed the total limit — two-entry version of the same scenario against the total
  • extracts a lying zip when no limits are configured — same forged zip with no limits — used to hang under extract-zip+yauzl, now resolves cleanly with the actual bytes on disk
  • rejects an entry whose path escapes the destination directory — locks in the path-traversal guard (via yauzl mock, since archiver normalizes ../ out)
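The "lying zip" in the first test above hinges on one patch to the central directory. A hedged sketch of just that forging step (forgeUncompressedSize is an illustrative helper; the real test builds its zip with archiver first):

```javascript
// The central directory file header starts with the signature
// "PK\x01\x02"; its uncompressedSize field sits 24 bytes in
// (see PKWARE APPNOTE.TXT section 4.3.12). Patching that field makes
// the declared size lie about the actual payload.
const CEN_SIGNATURE = Buffer.from([0x50, 0x4b, 0x01, 0x02]);
const UNCOMPRESSED_SIZE_OFFSET = 24;

function forgeUncompressedSize(zipBuffer, fakeSize) {
    const sigIndex = zipBuffer.indexOf(CEN_SIGNATURE);
    if (sigIndex === -1) {
        throw new Error('no central directory file header found');
    }
    zipBuffer.writeUInt32LE(fakeSize, sigIndex + UNCOMPRESSED_SIZE_OFFSET);
    return zipBuffer;
}
```

A pre-flight check that trusts this field sees 5 bytes; only counting the actual stream catches the real size.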

(Supersedes #637 — same code, less scary branch name.)

Test plan

  • pnpm --filter @tryghost/zip test — 19/19 pass, coverage above thresholds, lint clean
  • Sanity-check: extract a real Ghost theme zip with no limits set
  • Sanity-check: extract a real Ghost theme zip with limits set above the theme's actual size
  • Verify the lying-header regression test fails on main (it hangs)

🤖 Generated with Claude Code

Switches @tryghost/zip from `extract-zip` to `yauzl` directly so that
the new size limits are enforced against actual decompressed bytes,
not just the central-directory metadata. A zip whose header lies about
its uncompressed size (the classic zip-bomb shape) previously slipped
past the pre-flight check and then hung the underlying extract-zip
pipeline indefinitely — because yauzl's AssertByteCountStream overrides
`destroy` in a way that swallows mid-stream errors. The new
implementation opens yauzl with `validateEntrySizes: false` and runs
its own StreamingCounter Transform that propagates errors cleanly
through the pipeline.

Behaviour is otherwise preserved: same path-traversal, __MACOSX,
symlink, filename-length, and mode-handling logic ported verbatim
from extract-zip; same error codes and shapes; same onEntry hook.

extract-zip is removed as a dependency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
coderabbitai Bot commented May 14, 2026

Review Change Stack

Walkthrough

This PR implements optional uncompressed size limits for zip extraction in @tryghost/zip, replacing extract-zip with yauzl for fully streamed processing. The implementation adds perEntryUncompressedBytes and totalUncompressedBytes configuration options that enforce limits against actual decompressed bytes during streaming, not untrusted central-directory metadata. It includes security hardening against path traversal, symlinks, and overly long filenames, plus zip-bomb defense through a StreamingCounter Transform that fails the extraction pipeline when actual decompressed size exceeds configured limits.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • TryGhost/framework#632: Implements the same @tryghost/zip size limit feature, modifying the same extract.js and test files with overlapping limit enforcement and test coverage logic.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 21.43%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title directly and concisely describes the main change: enforcing zip size limits on actual decompressed bytes during streaming instead of relying on metadata.
  • Description check: ✅ Passed. The description is comprehensive and clearly related to the changeset, explaining the motivation, implementation details, security preservation, and test additions.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.




coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/zip/test/zip.test.js (1)

519-524: ⚡ Quick win

Tighten the expected error code in the cumulative-limit test.

This test configures only totalUncompressedBytes, so accepting ENTRY_TOO_LARGE can mask regressions. Assert TOTAL_TOO_LARGE only.

Suggested fix
-                assert.equal(
-                    err.code === 'TOTAL_TOO_LARGE' || err.code === 'ENTRY_TOO_LARGE',
-                    true,
-                    `expected TOTAL_TOO_LARGE or ENTRY_TOO_LARGE, got ${err.code}`,
-                );
+                assert.equal(err.code, 'TOTAL_TOO_LARGE');


📥 Commits

Reviewing files that changed from the base of the PR and between 0cf8046 and 3b82d12.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (4)
  • packages/zip/README.md
  • packages/zip/lib/extract.js
  • packages/zip/package.json
  • packages/zip/test/zip.test.js

Comment on lines +334 to +350
const entryPath = path.join(dir, entry.fileName);
const isDir = isDirectoryEntry(entry);
const mode = getEntryMode(entry, opts, isDir);
const containerDir = isDir ? entryPath : path.dirname(entryPath);

await fs.promises.mkdir(containerDir, { recursive: true });

// Path-traversal guard: after mkdir, resolve the actual on-disk path and
// make sure it stays within `dir`. Catches both `../` shenanigans and
// symlink-target shenanigans created in earlier entries of the same zip.
const canonicalContainerDir = await fs.promises.realpath(containerDir);
const relative = path.relative(dir, canonicalContainerDir);
if (relative.split(path.sep).includes('..')) {
    throw new Error(
        `Out of bound path "${canonicalContainerDir}" found while processing file ${entry.fileName}`,
    );
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run containment checks before creating directories.

At Line 339, mkdir(containerDir) executes before the out-of-bounds check at Line 346. A crafted path like ../outside/new/file.txt can create directories outside destination before rejection.

Suggested fix
     const entryPath = path.join(dir, entry.fileName);
     const isDir = isDirectoryEntry(entry);
     const mode = getEntryMode(entry, opts, isDir);
     const containerDir = isDir ? entryPath : path.dirname(entryPath);

+    // Lexical pre-check to prevent creating directories outside `dir`.
+    const lexicalRelative = path.relative(dir, containerDir);
+    if (path.isAbsolute(lexicalRelative) || lexicalRelative.split(path.sep).includes('..')) {
+        throw new Error(
+            `Out of bound path "${containerDir}" found while processing file ${entry.fileName}`,
+        );
+    }
+
     await fs.promises.mkdir(containerDir, { recursive: true });

     // Path-traversal guard: after mkdir, resolve the actual on-disk path and
     // make sure it stays within `dir`. Catches both `../` shenanigans and
     // symlink-target shenanigans created in earlier entries of the same zip.
     const canonicalContainerDir = await fs.promises.realpath(containerDir);
     const relative = path.relative(dir, canonicalContainerDir);
-    if (relative.split(path.sep).includes('..')) {
+    if (path.isAbsolute(relative) || relative.split(path.sep).includes('..')) {
         throw new Error(
             `Out of bound path "${canonicalContainerDir}" found while processing file ${entry.fileName}`,
         );
     }

@ErisDS ErisDS closed this May 14, 2026
@ErisDS ErisDS deleted the fix/zip-streaming-size-limits branch May 14, 2026 12:00