perf(rar): switch to unrar-ng batch extract_all_with_callback#968
perf(rar): switch to unrar-ng batch extract_all_with_callback#968ttys3 wants to merge 5 commits intoouch-org:mainfrom
Conversation
3d25528 to
1279708
Compare
|
Thanks for the patch and picking up the unrar repo. The PR actually fixes some important issues, including an actively exploited CVE in the backend unrar library ideally I would like to switch to a pure rust solution, but that does not seem to exist and may be impossible with the unrar library license situation, so this seems to be the best option for ouch at the moment. Aside from the failed CI tests, there are a few minor things: src/archive/rar.rs:52 — the closure returns true unconditionally, including from the Err arm. unrar-ng docs states returning false from Start/Ok/Err cancels the rest of the extraction. The previous code bailed on the first error via ?, so extraction now continues past per-file errors and only the first one is recorded (the rest are silently dropped because of the if first_err.is_none() guard at line 38). "lets the DLL fail naturally to Err(LargeDict)", but per unrar-ng docs, returning true from LargeDictWarning permits the oversized dictionary and proceeds; it's false that rejects it and produces Code::LargeDict. The closure at line 52 returns true, so it's actually permitting the oversized dict. Or did I miss something? |
1279708 to
c7c96e4
Compare
Migrate RAR decompression from upstream unrar 0.5.7 to the maintained fork unrar-ng 0.7.4 via Cargo dep alias (source-level use unrar::* preserved). Replace the per-file read_header + extract_with_base loop with OpenArchive::extract_all_with_callback, which uses the C batch path internally and avoids per-file FFI overhead -- noticeably faster for archives with many small files. Behavioral notes: - Directory entries in the archive are now materialized on disk by the C library; the previous loop explicitly skipped non-file headers so empty directories were not created. Most users expect the archive's original directory layout, so this is treated as a fix. - Per-file errors are captured in the callback and surfaced as Error::Custom with a "failed to extract <path>" title plus a human-readable detail, rather than relying on a bare From<UnrarError> conversion (which omitted the filename context). The Err arm returns false to cancel the rest of the extraction -- matching the previous loop's ?-on-first-error semantics and preventing the first_err.is_none() guard from silently swallowing any subsequent per-file errors the C library might surface. - From<UnrarError> now formats via Display (err.to_string()) instead of the previous Debug-formatted err.code, so messages such as "Wrong password was specified" replace the bare BadPassword enum debug text. Covers LargeDict and the new Unmapped(i32) fallback too. - Closure handles the new ExtractEvent::LargeDictWarning by emitting an info line with the required vs supported dictionary size, then returning false to reject the oversized dictionary so the DLL surfaces the failure as Err(Code::LargeDict). Returning true would permit extraction to proceed with a dictionary the build cannot actually decompress, which is not the behavior we want. - Pinned to 0.7.4 (not 0.7.3) because 0.7.3 fails to compile on Windows: the const offset_of assertions in unrar-ng-sys hard-coded the HeaderDataEx field offsets for 4-byte wchar_t (Linux/macOS) and panicked at compile time on Windows where wchar_t is 2 bytes. 0.7.4 parameterizes those offsets by sizeof(wchar_t). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c7c96e4 to
90226cb
Compare
ExtractEvent::Start fires before per-file extraction begins, so the "extracted (size) filename" line printed too early -- before the bytes actually landed on disk, and even when the file subsequently errored out. Move the info call to ExtractEvent::Ok and stash the size from Start (Ok carries only the filename), so the log line now reflects a completed extraction. The wording matches zip/sevenz, which already log post-success. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@valoq updated, PTAL |
unrar-ng 0.7.5 added the uncompressed file size to the ExtractEvent::Ok variant (it was previously only on Start), so the ouch-side workaround that stashed pending_size between Start and Ok events is no longer necessary. Drop the pending_size local and the ExtractEvent::Start arm; read the size directly from Ok. Bumps the dep alias from 0.7.4 to 0.7.5 and syncs Cargo.lock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Other archive backends (zip, sevenz, tar) format paths with PathFmt, which wraps the path in quotes and strips the leading "./" noise via NoQuotePathFmt. Switch the rar backend's two raw .display() sites (the per-file extracted log and the failed-to-extract error title) over to PathFmt so the user-facing output matches the rest of the project. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
updated PR description with real physical machine tests result |
resolve #714
perf test:
RAR Extraction Performance Comparison (Linux kernel v7.0 source)
this is test runs on a physical machine
tested under
tmpfsto avoid filesystem I/O impact.builtin
timecommand comes from fish shellLinux: kernel 7.0.3
shell: fish
rar/unrar: 7.22
unzip: UnZip 6.00 of 20 April 2009, by Info-ZIP
CPU: 12th Gen Intel(R) Core(TM) i7-12700
RAM: 32GB
Test file:
kernel-v7.0.rar(created from Linux kernel v7.0 source tree)Extraction Performance
unrar(official)unrar x kernel-v7.0.rarouch0.7.1 (old release)ouch decompress kernel-v7.0.rarouch(this PR)ouch decompress kernel-v7.0.rarReference: Other Formats
unzipunzip v7.0.ziprar(compression)rar a -r kernel-v7.0.rar ./linux-7.0/Key Takeaways
unrar-ng0.7.6) is ~10.3x faster than the previous release (70.69 s → 6.88 s).unrartool (6.88 s vs 7.15 s).unzipextracting the same content (6.88 s vs 7.17 s).Click here to expand and show detailed test steps below
Migrate RAR decompression from upstream unrar 0.5.7 to the maintained fork unrar-ng 0.7.3 via Cargo dep alias (source-level use unrar::* preserved). Replace the per-file read_header + extract_with_base loop with OpenArchive::extract_all_with_callback, which uses the C batch path internally and avoids per-file FFI overhead -- noticeably faster for archives with many small files.
Behavioral notes: