Skip to content

feat: EXPERIMENT: dyn trait object injection for customizable read pipeline#693

Draft
im7mortal wants to merge 1 commit intozip-rs:masterfrom
im7mortal:experiment-injections
Draft

feat: EXPERIMENT: dyn trait object injection for customizable read pipeline#693
im7mortal wants to merge 1 commit intozip-rs:masterfrom
im7mortal:experiment-injections

Conversation

@im7mortal
Copy link
Contributor

As discussed extensively in #170 (and also in #223), this is a proof of concept.

The library maintains a zero-cost, highly optimizable path for the standard flow, while adding flexibility for users who need something more specific.

For this prototype, I focused on the read pipeline and implemented customization for the validation layer (CRC), as it was the simplest layer to experiment with.

            ZipFileReader::NoReader => invalid_state(),
            ZipFileReader::Raw(r) => r.read(buf),
            ZipFileReader::Compressed(CompressedInner::Builtin(r)) => r.read(buf),
 >>>        ZipFileReader::Compressed(CompressedInner::Custom(r)) => r.read(buf),
        

As shown above, the enum grows for each custom reader. Currently, this is implemented only for validation, but the same issue applies to encryption and compression. I need to research for a more general and scalable approach that works uniformly across layers. For example, if a user customizes one layer, they may also need to provide implementations for the others.

The contract would be: full flexibility is possible, but it comes with trade-offs. Users opting into customization would also take responsibility for the added complexity and potential issues.
Not every requested feature or potentially useful extension should necessarily be included in the library.

Regarding the cost of dynamic dispatch, I found this good explanation of the overhead .

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @im7mortal, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant architectural change to the ZIP reading pipeline, enabling greater flexibility and customization for advanced users. By allowing the injection of dynamic trait objects, it provides a mechanism to override or extend core functionalities, starting with the CRC validation layer. This approach maintains the zero-cost abstraction for standard operations while offering an opt-in path for specific use cases, such as skipping validation or implementing alternative checks, without altering the library's core logic.

Highlights

  • Customizable Read Pipeline: Introduced the ability to inject custom dyn Trait objects into the ZIP file reading pipeline, specifically targeting the validation layer (CRC) as a proof of concept.
  • ValidatorReaderFactory: Added a new ValidatorReaderFactory type alias, which is a dyn Fn that allows users to provide their own reader for the final stage of the decompression chain, enabling custom validation logic or skipping built-in checks like CRC.
  • ZipFileReader Enhancements: Refactored the internal ZipFileReader enum with a new CompressedInner enum to differentiate between built-in and custom compressed readers, allowing for dynamic dispatch of the final reader.
  • ZipReadOptions Extension: Extended ZipReadOptions with a final_reader_factory method, providing a public API for users to supply their custom ValidatorReaderFactory when opening a ZIP entry.
  • CRC Skip Example: Included a new example (injection_validation_reader.rs) demonstrating how to use the new final_reader_factory to disable CRC validation by injecting a pass-through reader.
Changelog
  • examples/injection_validation_reader.rs
    • Added a new example file to demonstrate how to use a custom final reader to skip CRC validation.
  • src/lib.rs
    • Exported the new ValidatorReaderFactory type for public use.
  • src/read.rs
    • Defined ValidatorReaderFactory as a type alias for a dyn Fn that creates a custom reader.
    • Introduced CompressedInner enum within ZipFileReader to encapsulate either a built-in Crc32Reader or a user-provided custom dyn Read.
    • Updated the Read trait implementation for ZipFileReader to dispatch calls based on the CompressedInner variant.
    • Modified take_raw_reader to explicitly disallow its use when a custom final reader is active, returning an error instead.
    • Updated the make_reader function to accept an optional final_reader_factory and use it to construct either a built-in Crc32Reader or a custom reader.
    • Added a validation_reader_factory field to the ZipReadOptions struct.
    • Implemented a final_reader_factory builder method for ZipReadOptions to set the custom factory.
    • Adjusted the Drop implementation for ZipFile to correctly exhaust custom readers when take_raw_reader is not supported.
    • Modified read_zipfile_from_stream and read_zipfile_from_stream_with_compressed_size to pass None for the new final_reader_factory parameter, maintaining default behavior.
Activity
  • No activity has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism for injecting custom readers into the read pipeline, demonstrated with a proof-of-concept for the validation layer. This adds flexibility for users who need to customize parts of the reading process, such as skipping CRC validation. The changes involve introducing a ValidatorReaderFactory trait object, modifying ZipFileReader to accommodate custom readers via a CompressedInner enum, and updating ZipReadOptions to accept a custom factory. A new example demonstrates how to use this feature.

The implementation is a solid proof-of-concept. I've suggested a refactoring to reduce code duplication in the Read implementation for ZipFileReader, which will improve maintainability. I've also suggested making the new example self-contained so it can be run without modification, improving its utility for users.

Comment on lines +20 to +23
// Override with your file; make sure to use a small dummy zip
let file = File::open("/home/user/dummy.zip")?;
let config = Config::default();
let mut archive = ZipArchive::with_config(config, file)?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The example uses a hardcoded file path /home/user/dummy.zip, which will cause it to fail for most users. To make the example self-contained and runnable out-of-the-box, consider creating a dummy zip archive in memory.

This also requires updating the use statements at the top of the file. You'll need to:

  • Remove use std::fs::File;
  • Add use std::io::{Cursor, Write};
  • Add use zip::write::FileOptions;
    // Create a dummy zip file in memory for the example.
    let mut zip_data: Vec<u8> = Vec::new();
    {
        let mut zip_writer = zip::ZipWriter::new(std::io::Cursor::new(&mut zip_data));
        zip_writer.start_file("hello_world.txt", zip::write::FileOptions::default())?;
        zip_writer.write_all(b"Hello, World!")?;
        zip_writer.finish()?;
    }

    let config = Config::default();
    let mut archive = ZipArchive::with_config(config, std::io::Cursor::new(zip_data))?;

Comment on lines 228 to 264
impl<R: Read + ?Sized> Read for ZipFileReader<'_, R> {
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
match self {
ZipFileReader::NoReader => invalid_state(),
ZipFileReader::Raw(r) => r.read(buf),
ZipFileReader::Compressed(r) => r.read(buf),
ZipFileReader::Compressed(CompressedInner::Builtin(r)) => r.read(buf),
ZipFileReader::Compressed(CompressedInner::Custom(r)) => r.read(buf),
}
}

fn read_exact(&mut self, buf: &mut [u8]) -> io::Result<()> {
match self {
ZipFileReader::NoReader => invalid_state(),
ZipFileReader::Raw(r) => r.read_exact(buf),
ZipFileReader::Compressed(r) => r.read_exact(buf),
ZipFileReader::Compressed(CompressedInner::Builtin(r)) => r.read_exact(buf),
ZipFileReader::Compressed(CompressedInner::Custom(r)) => r.read_exact(buf),
}
}

fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {
match self {
ZipFileReader::NoReader => invalid_state(),
ZipFileReader::Raw(r) => r.read_to_end(buf),
ZipFileReader::Compressed(r) => r.read_to_end(buf),
ZipFileReader::Compressed(CompressedInner::Builtin(r)) => r.read_to_end(buf),
ZipFileReader::Compressed(CompressedInner::Custom(r)) => r.read_to_end(buf),
}
}

fn read_to_string(&mut self, buf: &mut String) -> io::Result<usize> {
match self {
ZipFileReader::NoReader => invalid_state(),
ZipFileReader::Raw(r) => r.read_to_string(buf),
ZipFileReader::Compressed(r) => r.read_to_string(buf),
ZipFileReader::Compressed(CompressedInner::Builtin(r)) => r.read_to_string(buf),
ZipFileReader::Compressed(CompressedInner::Custom(r)) => r.read_to_string(buf),
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The Read implementation for ZipFileReader contains duplicated logic for handling CompressedInner::Builtin and CompressedInner::Custom variants across read, read_exact, read_to_end, and read_to_string. This can be simplified by providing a way to get a &mut dyn Read from CompressedInner.

I suggest adding a helper method to CompressedInner:

impl<'a, R: Read + ?Sized> CompressedInner<'a, R> {
    fn as_read_mut(&mut self) -> &mut (dyn Read + 'a) {
        match self {
            CompressedInner::Builtin(r) => r.as_mut(),
            CompressedInner::Custom(r) => r.as_mut(),
        }
    }
}

This allows you to simplify the Read implementation for ZipFileReader as shown in the suggestion, reducing boilerplate and improving maintainability.

impl<R: Read + ?Sized> Read for ZipFileReader<'_, R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self {
            ZipFileReader::NoReader => invalid_state(),
            ZipFileReader::Raw(r) => r.read(buf),
            ZipFileReader::Compressed(r) => r.as_read_mut().read(buf),
        }
    }

    fn read_exact(&mut self, buf: &mut [u8]) -> io::Result<()> {
        match self {
            ZipFileReader::NoReader => invalid_state(),
            ZipFileReader::Raw(r) => r.read_exact(buf),
            ZipFileReader::Compressed(r) => r.as_read_mut().read_exact(buf),
        }
    }

    fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {
        match self {
            ZipFileReader::NoReader => invalid_state(),
            ZipFileReader::Raw(r) => r.read_to_end(buf),
            ZipFileReader::Compressed(r) => r.as_read_mut().read_to_end(buf),
        }
    }

    fn read_to_string(&mut self, buf: &mut String) -> io::Result<usize> {
        match self {
            ZipFileReader::NoReader => invalid_state(),
            ZipFileReader::Raw(r) => r.read_to_string(buf),
            ZipFileReader::Compressed(r) => r.as_read_mut().read_to_string(buf),
        }
    }
}

@Pr0methean Pr0methean marked this pull request as draft March 1, 2026 16:46
@Pr0methean Pr0methean changed the title EXPERIMENT: dyn trait object injection for customizable read pipeline feat: EXPERIMENT: dyn trait object injection for customizable read pipeline Mar 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant