feat: EXPERIMENT: dyn trait object injection for customizable read pipeline#693
feat: EXPERIMENT: dyn trait object injection for customizable read pipeline#693im7mortal wants to merge 1 commit intozip-rs:masterfrom
Conversation
…ve zero-cost pipeline for std behavior
Summary of ChangesHello @im7mortal, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant architectural change to the ZIP reading pipeline, enabling greater flexibility and customization for advanced users. By allowing the injection of dynamic trait objects, it provides a mechanism to override or extend core functionalities, starting with the CRC validation layer. This approach maintains the zero-cost abstraction for standard operations while offering an opt-in path for specific use cases, such as skipping validation or implementing alternative checks, without altering the library's core logic. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism for injecting custom readers into the read pipeline, demonstrated with a proof-of-concept for the validation layer. This adds flexibility for users who need to customize parts of the reading process, such as skipping CRC validation. The changes involve introducing a ValidatorReaderFactory trait object, modifying ZipFileReader to accommodate custom readers via a CompressedInner enum, and updating ZipReadOptions to accept a custom factory. A new example demonstrates how to use this feature.
The implementation is a solid proof-of-concept. I've suggested a refactoring to reduce code duplication in the Read implementation for ZipFileReader, which will improve maintainability. I've also suggested making the new example self-contained so it can be run without modification, improving its utility for users.
| // Override with your file; make sure to use a small dummy zip | ||
| let file = File::open("/home/user/dummy.zip")?; | ||
| let config = Config::default(); | ||
| let mut archive = ZipArchive::with_config(config, file)?; |
There was a problem hiding this comment.
The example uses a hardcoded file path /home/user/dummy.zip, which will cause it to fail for most users. To make the example self-contained and runnable out-of-the-box, consider creating a dummy zip archive in memory.
This also requires updating the use statements at the top of the file. You'll need to:
- Remove
use std::fs::File; - Add
use std::io::{Cursor, Write}; - Add
use zip::write::FileOptions;
// Create a dummy zip file in memory for the example.
let mut zip_data: Vec<u8> = Vec::new();
{
let mut zip_writer = zip::ZipWriter::new(std::io::Cursor::new(&mut zip_data));
zip_writer.start_file("hello_world.txt", zip::write::FileOptions::default())?;
zip_writer.write_all(b"Hello, World!")?;
zip_writer.finish()?;
}
let config = Config::default();
let mut archive = ZipArchive::with_config(config, std::io::Cursor::new(zip_data))?;| impl<R: Read + ?Sized> Read for ZipFileReader<'_, R> { | ||
| fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> { | ||
| match self { | ||
| ZipFileReader::NoReader => invalid_state(), | ||
| ZipFileReader::Raw(r) => r.read(buf), | ||
| ZipFileReader::Compressed(r) => r.read(buf), | ||
| ZipFileReader::Compressed(CompressedInner::Builtin(r)) => r.read(buf), | ||
| ZipFileReader::Compressed(CompressedInner::Custom(r)) => r.read(buf), | ||
| } | ||
| } | ||
|
|
||
| fn read_exact(&mut self, buf: &mut [u8]) -> io::Result<()> { | ||
| match self { | ||
| ZipFileReader::NoReader => invalid_state(), | ||
| ZipFileReader::Raw(r) => r.read_exact(buf), | ||
| ZipFileReader::Compressed(r) => r.read_exact(buf), | ||
| ZipFileReader::Compressed(CompressedInner::Builtin(r)) => r.read_exact(buf), | ||
| ZipFileReader::Compressed(CompressedInner::Custom(r)) => r.read_exact(buf), | ||
| } | ||
| } | ||
|
|
||
| fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> { | ||
| match self { | ||
| ZipFileReader::NoReader => invalid_state(), | ||
| ZipFileReader::Raw(r) => r.read_to_end(buf), | ||
| ZipFileReader::Compressed(r) => r.read_to_end(buf), | ||
| ZipFileReader::Compressed(CompressedInner::Builtin(r)) => r.read_to_end(buf), | ||
| ZipFileReader::Compressed(CompressedInner::Custom(r)) => r.read_to_end(buf), | ||
| } | ||
| } | ||
|
|
||
| fn read_to_string(&mut self, buf: &mut String) -> io::Result<usize> { | ||
| match self { | ||
| ZipFileReader::NoReader => invalid_state(), | ||
| ZipFileReader::Raw(r) => r.read_to_string(buf), | ||
| ZipFileReader::Compressed(r) => r.read_to_string(buf), | ||
| ZipFileReader::Compressed(CompressedInner::Builtin(r)) => r.read_to_string(buf), | ||
| ZipFileReader::Compressed(CompressedInner::Custom(r)) => r.read_to_string(buf), | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The Read implementation for ZipFileReader contains duplicated logic for handling CompressedInner::Builtin and CompressedInner::Custom variants across read, read_exact, read_to_end, and read_to_string. This can be simplified by providing a way to get a &mut dyn Read from CompressedInner.
I suggest adding a helper method to CompressedInner:
impl<'a, R: Read + ?Sized> CompressedInner<'a, R> {
fn as_read_mut(&mut self) -> &mut (dyn Read + 'a) {
match self {
CompressedInner::Builtin(r) => r.as_mut(),
CompressedInner::Custom(r) => r.as_mut(),
}
}
}This allows you to simplify the Read implementation for ZipFileReader as shown in the suggestion, reducing boilerplate and improving maintainability.
impl<R: Read + ?Sized> Read for ZipFileReader<'_, R> {
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
match self {
ZipFileReader::NoReader => invalid_state(),
ZipFileReader::Raw(r) => r.read(buf),
ZipFileReader::Compressed(r) => r.as_read_mut().read(buf),
}
}
fn read_exact(&mut self, buf: &mut [u8]) -> io::Result<()> {
match self {
ZipFileReader::NoReader => invalid_state(),
ZipFileReader::Raw(r) => r.read_exact(buf),
ZipFileReader::Compressed(r) => r.as_read_mut().read_exact(buf),
}
}
fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {
match self {
ZipFileReader::NoReader => invalid_state(),
ZipFileReader::Raw(r) => r.read_to_end(buf),
ZipFileReader::Compressed(r) => r.as_read_mut().read_to_end(buf),
}
}
fn read_to_string(&mut self, buf: &mut String) -> io::Result<usize> {
match self {
ZipFileReader::NoReader => invalid_state(),
ZipFileReader::Raw(r) => r.read_to_string(buf),
ZipFileReader::Compressed(r) => r.as_read_mut().read_to_string(buf),
}
}
}
As discussed extensively in #170 (and also in #223), this is a proof of concept.
The library maintains a zero-cost, highly optimizable path for the standard flow, while adding flexibility for users who need something more specific.
For this prototype, I focused on the read pipeline and implemented customization for the validation layer (CRC), as it was the simplest layer to experiment with.
As shown above, the enum grows for each custom reader. Currently, this is implemented only for validation, but the same issue applies to encryption and compression. I need to research for a more general and scalable approach that works uniformly across layers. For example, if a user customizes one layer, they may also need to provide implementations for the others.
The contract would be: full flexibility is possible, but it comes with trade-offs. Users opting into customization would also take responsibility for the added complexity and potential issues.
Not every requested feature or potentially useful extension should necessarily be included in the library.
Regarding the cost of dynamic dispatch, I found this good explanation of the overhead .