perf(rar): switch to unrar-ng batch extract_all_with_callback #968

Open
ttys3 wants to merge 5 commits into ouch-org:main from ttys3:feature/impl-rar-extract-using-unrar-ng
@ttys3
Contributor

@ttys3 ttys3 commented Apr 29, 2026

resolve #714

perf test:

RAR Extraction Performance Comparison (Linux kernel v7.0 source)

These tests were run on a physical machine, under tmpfs to avoid filesystem I/O effects.

The builtin time command comes from the fish shell.

Linux: kernel 7.0.3
shell: fish
rar/unrar: 7.22
unzip: UnZip 6.00 of 20 April 2009, by Info-ZIP

CPU: 12th Gen Intel(R) Core(TM) i7-12700
RAM: 32GB

Test file: kernel-v7.0.rar (created from Linux kernel v7.0 source tree)

Extraction Performance

| Tool | Command | Executed (wall) | User CPU | Sys CPU |
|---|---|---|---|---|
| unrar (official) | `unrar x kernel-v7.0.rar` | 7.15 s | 7.07 s | 1.25 s |
| ouch 0.7.1 (old release) | `ouch decompress kernel-v7.0.rar` | 70.69 s | 68.03 s | 4.71 s |
| ouch (this PR) | `ouch decompress kernel-v7.0.rar` | 6.88 s | 6.74 s | 1.35 s |

Reference: Other Formats

| Tool | Command | Executed (wall) | User CPU | Sys CPU |
|---|---|---|---|---|
| unzip | `unzip v7.0.zip` | 7.17 s | 6.20 s | 0.95 s |
| rar (compression) | `rar a -r kernel-v7.0.rar ./linux-7.0/` | 20.12 s | 102.85 s | 10.30 s |

Key Takeaways

  • This PR (backed by unrar-ng 0.7.6) is ~10.3x faster than the previous release (70.69 s → 6.88 s).
  • Performance is on par with the official unrar tool (6.88 s vs 7.15 s).
  • Comparable to unzip extracting the same content (6.88 s vs 7.17 s).

Detailed test steps:

# ensure /tmp is tmpfs, not disk
❯ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,size=15858944k,nr_inodes=1048576,inode64,usrquota)

❯ cd /tmp
❯ mkdir perf-test
❯ cd perf-test/
❯ curl -LZO https://github.com/ouch-org/ouch/releases/download/0.7.1/ouch-x86_64-unknown-linux-gnu.tar.gz
❯ curl -LZO https://github.com/torvalds/linux/archive/refs/tags/v7.0.zip

❯ tar xvzf ouch-x86_64-unknown-linux-gnu.tar.gz 

❯ time unzip v7.0.zip
Executed in    7.17 secs    fish           external
   usr time    6.20 secs    0.00 millis    6.20 secs
   sys time    0.95 secs    1.99 millis    0.95 secs


❯ time rar a -r kernel-v7.0.rar ./linux-7.0/
________________________________________________________
Executed in   20.12 secs    fish           external
   usr time  102.85 secs    1.40 millis  102.85 secs
   sys time   10.30 secs    1.78 millis   10.29 secs

❯ rm -rf linux-7.0/

❯ time unrar x kernel-v7.0.rar 
________________________________________________________
Executed in    7.15 secs    fish           external
   usr time    7.07 secs    0.00 millis    7.07 secs
   sys time    1.25 secs    2.73 millis    1.25 secs

❯ rm -rf linux-7.0/

❯ time ./ouch-x86_64-unknown-linux-gnu/ouch decompress -d ouch-extracted  kernel-v7.0.rar 
________________________________________________________
Executed in   70.69 secs    fish           external
   usr time   68.03 secs    1.47 millis   68.03 secs
   sys time    4.71 secs    1.74 millis    4.71 secs

❯ time ~/repo/rust/ouch/target/release/ouch decompress -d ouch-pr-extracted  kernel-v7.0.rar 
________________________________________________________
Executed in    6.88 secs    fish           external
   usr time    6.74 secs    0.19 millis    6.73 secs
   sys time    1.35 secs    3.04 millis    1.35 secs

Migrate RAR decompression from upstream unrar 0.5.7 to the maintained fork unrar-ng 0.7.3 via Cargo dep alias (source-level use unrar::* preserved). Replace the per-file read_header + extract_with_base loop with OpenArchive::extract_all_with_callback, which uses the C batch path internally and avoids per-file FFI overhead -- noticeably faster for archives with many small files.

Behavioral notes:

  • Directory entries in the archive are now materialized on disk by the C library; the previous loop explicitly skipped non-file headers so empty directories were not created. Most users expect the archive's original directory layout, so this is treated as a fix.
  • Per-file errors are captured in the callback and surfaced as Error::Custom with a "failed to extract <path>" title plus a human-readable detail, rather than relying on a bare From<UnrarError> conversion (which omitted the filename context).
  • From<UnrarError> now formats via Display (err.to_string()) instead of the previous Debug-formatted err.code, so messages such as "Wrong password was specified" replace the bare BadPassword enum debug text. Covers LargeDict and the new Unmapped(i32) fallback too.
  • Closure handles the new ExtractEvent::LargeDictWarning by emitting an info line with the required vs supported dictionary size, then letting the DLL fail naturally to Err(LargeDict).
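
The Display-versus-Debug note above can be illustrated with a minimal sketch. The `Code` enum here is a hypothetical stand-in, not the real unrar-ng type, and only the "Wrong password was specified" text comes from this PR; the other two messages are placeholders invented for the example.

```rust
use std::fmt;

// Hypothetical stand-in for the backend's error code enum (an assumption,
// not the actual unrar-ng definition).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Code {
    BadPassword,
    LargeDict,
    Unmapped(i32),
}

impl fmt::Display for Code {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Message text quoted in the PR description.
            Code::BadPassword => write!(f, "Wrong password was specified"),
            // Placeholder wording, invented for this sketch.
            Code::LargeDict => write!(f, "Dictionary size exceeds what this build supports"),
            // Placeholder wording, invented for this sketch.
            Code::Unmapped(raw) => write!(f, "Unrecognized error code {raw}"),
        }
    }
}

/// Old style: the derived Debug output, which is just the variant name.
pub fn debug_message(code: Code) -> String {
    format!("{code:?}")
}

/// New style: the human-readable Display output via `to_string()`.
pub fn display_message(code: Code) -> String {
    code.to_string()
}
```

With these definitions, `debug_message(Code::BadPassword)` yields the bare variant name, while `display_message(Code::BadPassword)` yields the sentence a user can act on.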

@ttys3 ttys3 force-pushed the feature/impl-rar-extract-using-unrar-ng branch from 3d25528 to 1279708 Compare April 29, 2026 17:03
@valoq
Collaborator

valoq commented Apr 29, 2026

Thanks for the patch and for picking up the unrar repo.
This replaces the unmaintained https://github.com/muja/unrar.rs with https://github.com/ttys3/unrar.rs as a dependency.

The PR actually fixes some important issues, including an actively exploited CVE in the backend unrar library.

Ideally I would like to switch to a pure Rust solution, but that does not seem to exist and may be impossible given the unrar library's license situation, so this seems to be the best option for ouch at the moment.

Aside from the failed CI tests, there are a few minor things:

src/archive/rar.rs:52: the closure returns true unconditionally, including from the Err arm. The unrar-ng docs state that returning false from Start/Ok/Err cancels the rest of the extraction. The previous code bailed on the first error via ?, so extraction now continues past per-file errors and only the first one is recorded (the rest are silently dropped because of the if first_err.is_none() guard at line 38).

The description says the closure handles LargeDictWarning by "letting the DLL fail naturally to Err(LargeDict)", but per the unrar-ng docs, returning true from LargeDictWarning permits the oversized dictionary and proceeds; returning false is what rejects it and produces Code::LargeDict. The closure at line 52 returns true, so it is actually permitting the oversized dict. Or did I miss something?

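The first point can be reproduced with a small self-contained model. Everything below is a hypothetical stand-in: `ExtractEvent`, `run_batch`, and `extract` are invented for illustration and do not mirror the real unrar-ng signatures. The only behavior assumed from the review is that a callback returning false stops the batch.

```rust
// Hypothetical, simplified event type (an assumption, not the unrar-ng API).
#[derive(Debug, Clone, PartialEq)]
pub enum ExtractEvent {
    Ok(String),          // filename
    Err(String, String), // (filename, message)
}

#[derive(Debug, PartialEq)]
pub struct Outcome {
    pub delivered: usize,          // events handed to the callback
    pub first_err: Option<String>, // the only error that gets reported
    pub dropped_errors: usize,     // later errors hidden by the guard
}

/// Toy driver: delivers events until the callback returns false.
pub fn run_batch<F: FnMut(&ExtractEvent) -> bool>(events: &[ExtractEvent], mut callback: F) -> usize {
    let mut delivered = 0;
    for event in events {
        delivered += 1;
        if !callback(event) {
            break;
        }
    }
    delivered
}

/// `cancel_on_error = false` models the closure that always returns true;
/// `cancel_on_error = true` models returning false from the Err arm.
pub fn extract(events: &[ExtractEvent], cancel_on_error: bool) -> Outcome {
    let mut first_err: Option<String> = None;
    let mut dropped_errors = 0;
    let delivered = run_batch(events, |event| match event {
        ExtractEvent::Ok(_) => true,
        ExtractEvent::Err(name, message) => {
            if first_err.is_none() {
                first_err = Some(format!("failed to extract {name}: {message}"));
            } else {
                // This is the silent drop: the error is seen but never surfaced.
                dropped_errors += 1;
            }
            !cancel_on_error
        }
    });
    Outcome { delivered, first_err, dropped_errors }
}
```

In this model, feeding two failing entries with `cancel_on_error = false` leaves one error unreported, while `cancel_on_error = true` stops at the first failure, which matches the old `?`-on-first-error behavior.
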
@ttys3 ttys3 force-pushed the feature/impl-rar-extract-using-unrar-ng branch from 1279708 to c7c96e4 Compare April 30, 2026 15:08
Migrate RAR decompression from upstream unrar 0.5.7 to the maintained
fork unrar-ng 0.7.4 via Cargo dep alias (source-level use unrar::*
preserved). Replace the per-file read_header + extract_with_base loop
with OpenArchive::extract_all_with_callback, which uses the C batch
path internally and avoids per-file FFI overhead -- noticeably faster
for archives with many small files.

Behavioral notes:

- Directory entries in the archive are now materialized on disk by
  the C library; the previous loop explicitly skipped non-file
  headers so empty directories were not created. Most users expect
  the archive's original directory layout, so this is treated as a
  fix.
- Per-file errors are captured in the callback and surfaced as
  Error::Custom with a "failed to extract <path>" title plus a
  human-readable detail, rather than relying on a bare
  From<UnrarError> conversion (which omitted the filename context).
  The Err arm returns false to cancel the rest of the extraction --
  matching the previous loop's ?-on-first-error semantics and
  preventing the first_err.is_none() guard from silently swallowing
  any subsequent per-file errors the C library might surface.
- From<UnrarError> now formats via Display (err.to_string()) instead
  of the previous Debug-formatted err.code, so messages such as
  "Wrong password was specified" replace the bare BadPassword enum
  debug text. Covers LargeDict and the new Unmapped(i32) fallback
  too.
- Closure handles the new ExtractEvent::LargeDictWarning by emitting
  an info line with the required vs supported dictionary size, then
  returning false to reject the oversized dictionary so the DLL
  surfaces the failure as Err(Code::LargeDict). Returning true would
  permit extraction to proceed with a dictionary the build cannot
  actually decompress, which is not the behavior we want.
- Pinned to 0.7.4 (not 0.7.3) because 0.7.3 fails to compile on
  Windows: the const offset_of assertions in unrar-ng-sys hard-coded
  the HeaderDataEx field offsets for 4-byte wchar_t (Linux/macOS) and
  panicked at compile time on Windows where wchar_t is 2 bytes. 0.7.4
  parameterizes those offsets by sizeof(wchar_t).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
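
The Windows compile failure described in the last note of this commit message can be sketched as follows. `Header` is a hypothetical, heavily shortened struct made up for the example; it is not the real HeaderDataEx layout, and `expected_flags_offset` is an invented helper. The sketch only shows the general idea of deriving an expected field offset from the wide-character width instead of hard-coding it.

```rust
use std::mem::{offset_of, size_of};

// Hypothetical C-layout struct: a short wide-char buffer followed by a
// 32-bit field. `W` stands in for wchar_t (u16 on Windows, u32 on
// Linux/macOS in this model).
#[repr(C)]
pub struct Header<W> {
    pub name: [W; 4],
    pub flags: u32,
}

/// Expected offset of `flags`, parameterized by the wide-char size rather
/// than assuming 4 bytes.
pub const fn expected_flags_offset(wchar_size: usize) -> usize {
    4 * wchar_size
}

// Compile-time checks that hold for both widths. A hard-coded
// `offset_of!(Header<u16>, flags) == 16` would instead fail the build,
// which is the kind of failure the note describes.
const _: () = assert!(offset_of!(Header<u16>, flags) == expected_flags_offset(size_of::<u16>()));
const _: () = assert!(offset_of!(Header<u32>, flags) == expected_flags_offset(size_of::<u32>()));
```

Because the assertions sit in `const` items, a layout mismatch surfaces as a build error rather than as memory corruption at run time.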
@ttys3 ttys3 force-pushed the feature/impl-rar-extract-using-unrar-ng branch from c7c96e4 to 90226cb Compare April 30, 2026 15:27
@ttys3 ttys3 changed the title feat(rar): switch to unrar-ng 0.7.3 batch extract_all_with_callback feat(rar): switch to unrar-ng batch extract_all_with_callback Apr 30, 2026
ExtractEvent::Start fires before per-file extraction begins, so the
"extracted (size) filename" line printed too early -- before the
bytes actually landed on disk, and even when the file subsequently
errored out. Move the info call to ExtractEvent::Ok and stash the
size from Start (Ok carries only the filename), so the log line now
reflects a completed extraction. The wording matches zip/sevenz,
which already log post-success.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
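
A minimal sketch of the stash-on-Start, log-on-Ok pattern this commit describes. The event shapes and the `log_lines` helper are assumptions made for illustration (the real unrar-ng variants may carry different fields), and the "B" unit in the log line is invented.

```rust
// Hypothetical event shapes: Start carries the size, Ok only the filename.
#[derive(Debug, Clone, PartialEq)]
pub enum ExtractEvent {
    Start { filename: String, size: u64 },
    Ok { filename: String },
    Err { filename: String, message: String },
}

/// Returns the "extracted ..." log lines, emitted only after success.
pub fn log_lines(events: &[ExtractEvent]) -> Vec<String> {
    let mut pending_size: Option<u64> = None;
    let mut lines = Vec::new();
    for event in events {
        match event {
            // Remember the size, but do not log yet: nothing is on disk.
            ExtractEvent::Start { size, .. } => pending_size = Some(*size),
            // The file is complete, so log it with the stashed size.
            ExtractEvent::Ok { filename } => {
                let size = pending_size.take().unwrap_or(0);
                lines.push(format!("extracted ({size} B) {filename}"));
            }
            // A failed file never produces an "extracted" line.
            ExtractEvent::Err { .. } => pending_size = None,
        }
    }
    lines
}
```

A sequence of Start, Ok, Start, Err therefore yields a single log line for the first file and none for the second.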
@ttys3
Contributor Author

ttys3 commented Apr 30, 2026

@valoq updated, PTAL

ttys3 and others added 3 commits April 30, 2026 15:54
unrar-ng 0.7.5 added the uncompressed file size to the
ExtractEvent::Ok variant (it was previously only on Start), so the
ouch-side workaround that stashed pending_size between Start and Ok
events is no longer necessary. Drop the pending_size local and the
ExtractEvent::Start arm; read the size directly from Ok.

Bumps the dep alias from 0.7.4 to 0.7.5 and syncs Cargo.lock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Other archive backends (zip, sevenz, tar) format paths with PathFmt,
which wraps the path in quotes and strips the leading "./" noise via
NoQuotePathFmt. Switch the rar backend's two raw .display() sites
(the per-file extracted log and the failed-to-extract error title)
over to PathFmt so the user-facing output matches the rest of the
project.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
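
The formatting behavior this commit message attributes to PathFmt (quoting the path and dropping a leading "./") might look roughly like the sketch below. `QuotedPath` is a hypothetical wrapper written for this example; it is not ouch's actual PathFmt or NoQuotePathFmt implementation.

```rust
use std::fmt;
use std::path::Path;

// Hypothetical display wrapper (an assumption, not ouch's real type).
pub struct QuotedPath<'a>(pub &'a Path);

impl fmt::Display for QuotedPath<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Drop a leading "./" component if present; otherwise keep the path.
        let trimmed = self.0.strip_prefix(".").unwrap_or(self.0);
        // Wrap the result in double quotes for user-facing output.
        write!(f, "\"{}\"", trimmed.display())
    }
}
```

Routing every log and error site through one wrapper like this keeps the output consistent across backends, which is the stated goal of the change.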
@ttys3 ttys3 changed the title feat(rar): switch to unrar-ng batch extract_all_with_callback perf(rar): switch to unrar-ng batch extract_all_with_callback May 4, 2026
@ttys3
Contributor Author

ttys3 commented May 4, 2026

Updated the PR description with test results from a real physical machine.

Development

Successfully merging this pull request may close these issues.

Huge slowdown when decompressing big RAR archive