feat(pack): add PackStats utility for pack object type distribution#139
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR adds a lightweight pack scanning API that collects object-type distribution statistics (including deltas) and expands the test suite to validate the new stats behavior against real and synthetic pack inputs.
Changes:
- Introduce
PackStats(withDisplay) andPack::pack_stats()to scan a pack stream and report object counts by type. - Refactor pack decoding tests to use local
tests/data/packs/...fixtures instead of downloading. - Add extensive unit tests for
pack_statsnormal/error paths and basic formatting.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f3899d6745
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 862fe12881
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4c6ca21cce
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Add PackStats struct and Pack::pack_stats function that scans a pack file and returns object counts broken down by type (commits, trees, blobs, tags) and delta subtypes (offset, hash, zstd). The implementation reuses existing check_header, read_type_and_varint_size, read_offset_encoding, and ZlibDecoder helpers without duplicating the full Decode main loop (no thread pool, cache, or waitlist). Includes 17 tests covering normal path (SHA-1/SHA-256, with/without deltas, cross-validation against Pack::decode) and error path (invalid magic, invalid version, truncated data, invalid type bits, empty/short input).
Address PR review feedback: - Remove unnecessary Send bound from pack_stats signature - Replace read_to_end with io::sink() to avoid unbounded memory allocation when scanning large objects - Extract consume_zlib helper to eliminate duplicate decompression logic - Fix test comment (invalid_magic, not file-not-found) - Relax brittle assertion on byte formatting in error message
Wrap the input reader in Wrapper so a running SHA-1/SHA-256 hash is maintained across all bytes read. After scanning all objects, compare the computed hash against the stored trailer and reject the pack on mismatch — matching the integrity guarantee of Pack::decode. Add test_pack_stats_trailer_mismatch that flips a trailer byte and asserts the error is returned.
Address Copilot review: - Derive initial offset from header_data.len() instead of hardcoding 12 - Lift hash_buf allocation outside the main loop and reuse the buffer across all REF_DELTA objects to reduce allocator churn
4c6ca21 to
7d4a426
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7d4a426123
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Place `use super::PackStats` before `use crate::` to satisfy the import ordering enforced by nightly rustfmt in CI.
Rust 1.96 introduces stricter clippy lints that cause CI failures: - pack_index.rs: replace sort_by with sort_by_key - protocol/pack.rs: collapse nested if into match guard
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 940cc05363
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
本 PR 共 6 个 commit,已处理 CI 中所有与本 PR 相关的错误:
本地验证: 改动文件:
|
Nightly rustfmt expects `use super::` and `use crate::` to be adjacent without an intervening blank line.
The 9 pre-existing tests in decode.rs directly open pack files from tests/data/packs/ using unwrap(), which panics when LFS objects have not been pulled. Add is_valid_pack() guard to all 7 helper functions and the standalone test_pack_check_header so they gracefully skip instead of failing CI.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 91be8ab371
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3ea2e52cb1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7163cbd8ef
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 838c541fad
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
# Conflicts: # src/internal/pack/decode.rs
Add PackStats struct and Pack::pack_stats function that scans a pack file and returns object counts broken down by type (commits, trees, blobs, tags) and delta subtypes (offset, hash, zstd).
The implementation reuses existing check_header, read_type_and_varint_size, read_offset_encoding, and ZlibDecoder helpers without duplicating the full Decode main loop (no thread pool, cache, or waitlist).
Includes 17 tests covering normal path (SHA-1/SHA-256, with/without deltas, cross-validation against Pack::decode) and error path (invalid magic, invalid version, truncated data, invalid type bits, empty/short input).