Skip to content

Fix corrupt data handling edge cases#116

Merged
oschwald merged 6 commits into
mainfrom
greg/fix-review-findings
Jun 28, 2026
Merged

Fix corrupt data handling edge cases#116
oschwald merged 6 commits into
mainfrom
greg/fix-review-findings

Conversation

@oschwald

@oschwald oschwald commented Jun 28, 2026

Copy link
Copy Markdown
Owner

Summary

  • Harden skipped ignored/unknown-field decoding so corrupt data cannot bypass data-section bounds or nesting-depth limits.
  • Fix within()/networks() for IPv6 databases without an IPv4 subtree, including comments for the non-obvious terminal-node checks.
  • Make Metadata::build_time() overflow-safe and add Metadata::checked_build_time() for callers that need to detect overflow.
  • Add focused benchmarks and next-release changelog entries for the fixes.

Benchmarks

  • Decoder skip fix: serde benchmarks improved or stayed within noise; lookup stayed within noise.
  • Iterator fix: networks_city_test no change, city_subnet improved about 7.4%, mixed iteration within noise.
  • Metadata fix: direct metadata/build_time benchmark improved about 21.5%.

Validation

  • cargo fmt --check
  • cargo test
  • cargo bench --no-run
  • cargo clippy --all-targets -- -D warnings
  • cargo test --features mmap
  • cargo test --features simdutf8
  • cargo test --features unsafe-str-decode
  • cargo test --features "mmap simdutf8"
  • cargo test --features "mmap unsafe-str-decode"
  • cargo clippy --all-targets --features mmap -- -D warnings
  • cargo clippy --all-targets --features simdutf8 -- -D warnings
  • cargo clippy --all-targets --features unsafe-str-decode -- -D warnings
  • cargo clippy --all-targets --features "mmap simdutf8" -- -D warnings
  • cargo clippy --all-targets --features "mmap unsafe-str-decode" -- -D warnings

Summary by CodeRabbit

  • Bug Fixes
    • Improved within()/network iteration for IPv6 databases without an IPv4 search tree, including correct /32 handling and improved address normalization.
    • Made metadata build_time overflow-safe and changed it to return Result<SystemTime, _> for unrepresentable timestamps.
    • Strengthened decoder bounds/ignored-field skipping and updated verification to reject truncated/corrupt database data more reliably.
  • Tests
    • Added regression tests for build-time conversion/rejection, IPv6 iteration edge cases, and multiple corrupted/invalid fixtures.
  • Chores
    • Added new Criterion benchmarks for metadata, within(), and serde decoding micro-cases.

@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR adds safer metadata time conversion, tighter decoder bounds and skip validation, IPv6 within/networks handling without an IPv4 subtree, and new benchmark targets for the affected paths.

Changes

Bug fixes and benchmarks

Layer / File(s) Summary
Data-section and metadata boundaries
src/reader.rs, src/result.rs, src/metadata.rs, src/reader_test.rs
Reader stores derived data-section boundaries, lazy decoding is limited to the metadata boundary, and metadata build-time conversion gains a checked path with an overflow fallback.
Skip validation and depth limits
src/decoder.rs, src/result.rs, src/reader_test.rs
The decoder skip path adds a recursion-depth cap, limit-based offset checks, post-skip validation, and regression coverage for truncated pointers and deep nesting.
IPv6 within and network normalization
src/reader.rs, src/within.rs, src/reader_test.rs
within() caches IPv4-subtree presence and handles IPv6 databases without an IPv4 subtree, while Within centralizes network-kind selection and emitted address formatting.
Benchmarks and release notes
Cargo.toml, benches/within.rs, benches/metadata.rs, benches/serde_usage.rs, CHANGELOG.md, README.md, UPGRADING.md
New benchmark targets and Criterion benchmarks are added for within, metadata, and serde decoding, and the changelog and upgrade notes record the 0.29.0 release changes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐇 I hopped through bytes and counted the rain,
Safe little times won’t overflow again.
I bounded my burrow, sharp and bright,
And IPv6 twinkled just right.
With benches and notes in a tidy array,
This rabbit says: hop, hooray!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title is concise and clearly reflects the main focus on hardening corrupt data handling edge cases.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch greg/fix-review-findings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/decoder.rs`:
- Around line 713-727: The pointer-handling branch in decoder logic does not
verify the encoded pointer payload when follow_pointers is false, allowing
truncated pointers to be accepted after decode_pointer() clamps them. Update the
TYPE_POINTER path in the decoder to advance past the raw pointer bytes with
skip_bytes(size) in the no-follow case instead of immediately returning Ok(()),
while keeping the existing follow_pointers flow and validate_skip_end() behavior
for the pointer-following path.

In `@src/within.rs`:
- Around line 211-216: The normalization in the IPv6 handling path is zeroing
out low IPv6 addresses, which turns real networks like ::1/128 into ::/128.
Update the conversion logic in the within.rs IPv6 branch (the code that matches
on NetworkKind::V6 and IpAddr::V4, plus the related low-key handling later in
the same function) to reconstruct the IPv6 address from the original
IpInt::V6(ip) value instead of replacing it with zero, so the low bits are
preserved.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a2071a19-c1e5-49ff-b2d8-f8583fd4ee85

📥 Commits

Reviewing files that changed from the base of the PR and between 8e86128 and 4cd8151.

📒 Files selected for processing (11)
  • CHANGELOG.md
  • Cargo.toml
  • benches/metadata.rs
  • benches/serde_usage.rs
  • benches/within.rs
  • src/decoder.rs
  • src/metadata.rs
  • src/reader.rs
  • src/reader_test.rs
  • src/result.rs
  • src/within.rs

Comment thread src/decoder.rs Outdated
Comment thread src/within.rs Outdated
@oschwald

Copy link
Copy Markdown
Owner Author

Addressed review feedback with three fixup commits:

  • fixup! fix: bound skipped data decoding
    • Validate raw pointer payload bytes in the verification/no-follow skip path.
    • Added regression coverage for a truncated pointer payload.
    • Added comments for the top-level container overrun checks.
    • Kept data-section bounds via a decoder limit while avoiding duplicate hot-path bounds checks.
  • fixup! fix: handle iteration without IPv4 subtree
    • Preserve low IPv6 address bits instead of normalizing ::1/128-style records to ::/128.
    • Reused the shared network_kind() helper in both iterator result branches.

Benchmark comparison against fresh main baselines (pr116_main_full):

  • lookup/maxminddb: no change detected (-0.27% to +0.37%).
  • lookup/maxminddb_par: no change detected (-0.02% to +6.18%, p=0.06).
  • serde_usage/lookup_only: within noise (+0.24% to +0.81%).
  • serde_usage/decode_city_only: within noise (+0.32% to +1.15%).
  • serde_usage/decode_country_only: improved (-4.96% to -4.46%).
  • serde_usage/decode_path_country_iso: improved (-13.81% to -13.28%).
  • serde_usage/decode_path_city_name: improved (-14.74% to -13.37%).

Branch-only benchmarks also ran:

  • serde_usage/decode_empty_record: 35.205 µs to 35.363 µs.
  • within/networks_city_test: 15.315 µs to 15.377 µs.
  • within/city_subnet: 2.5273 µs to 2.5345 µs.
  • within/networks_mixed_test: 3.5452 µs to 3.5998 µs.
  • metadata/build_time: 761.61 ps to 762.99 ps.

Validation rerun after the fixups:

  • cargo fmt --check
  • cargo test
  • cargo clippy --all-targets -- -D warnings
  • cargo test --features mmap
  • cargo test --features simdutf8
  • cargo test --features unsafe-str-decode
  • cargo test --features "mmap simdutf8"
  • cargo test --features "mmap unsafe-str-decode"
  • cargo clippy --all-targets --features mmap -- -D warnings
  • cargo clippy --all-targets --features simdutf8 -- -D warnings
  • cargo clippy --all-targets --features unsafe-str-decode -- -D warnings
  • cargo clippy --all-targets --features "mmap simdutf8" -- -D warnings
  • cargo clippy --all-targets --features "mmap unsafe-str-decode" -- -D warnings

@oschwald

Copy link
Copy Markdown
Owner Author

Addressed the review.md marker-boundary finding with bb9d77df (fixup! fix: bound skipped data decoding).

What changed:

  • Exclude the 14-byte metadata marker from the data-section end used by pointer bounds, verifier decoding, lazy result decoding, and within() empty-value checks.
  • Cache the marker-excluding data-section length on Reader so hot-path decoder setup does not recompute it.
  • Added regression coverage for a search-tree pointer resolving exactly to the metadata marker.
  • Added a short comment explaining why find_metadata_start() must be adjusted back to the marker start.

Validation after the fix:

  • cargo fmt --check
  • cargo test --quiet
  • cargo clippy --all-targets -- -D warnings
  • cargo test --features "mmap unsafe-str-decode" --quiet
  • cargo clippy --all-targets --features mmap -- -D warnings
  • cargo clippy --all-targets --features simdutf8 -- -D warnings
  • cargo clippy --all-targets --features unsafe-str-decode -- -D warnings
  • cargo clippy --all-targets --features "mmap simdutf8" -- -D warnings
  • cargo clippy --all-targets --features "mmap unsafe-str-decode" -- -D warnings
  • Focused regression: cargo test test_rejects_data_pointer_to_metadata_marker --quiet

Full benchmark pass against saved main baseline pr116_main_full:

  • lookup/maxminddb: within Criterion noise threshold (-1.13% to -0.49%).
  • lookup/maxminddb_par: no change detected (-2.83% to +3.55%).
  • serde_usage/lookup_only: improved (-1.77% to -1.18%).
  • serde_usage/decode_city_only: within Criterion noise threshold (-1.78% to -0.97%).
  • serde_usage/decode_country_only: improved (-6.28% to -5.70%).
  • serde_usage/decode_path_country_iso: improved (-12.89% to -12.27%).
  • serde_usage/decode_path_city_name: improved (-13.34% to -12.04%).

Branch-only benchmark targets newly added on this PR:

  • serde_usage/decode_empty_record: 35.409 us to 35.500 us.
  • within/networks_city_test: 15.369 us to 15.597 us.
  • within/city_subnet: 2.5300 us to 2.5411 us.
  • within/networks_mixed_test: 3.5663 us to 3.5955 us.
  • metadata/build_time: 760.86 ps to 763.01 ps.

No regressions showed up in this pass.

@oschwald oschwald force-pushed the greg/fix-review-findings branch 2 times, most recently from c25c6f6 to 3a53611 Compare June 28, 2026 21:26
@oschwald

Copy link
Copy Markdown
Owner Author

Updated the metadata build-time fix after follow-up discussion and force-pushed the autosquashed branch.

Current behavior:

  • Version is bumped to 0.29.0.
  • Metadata::build_time() now returns Result<SystemTime, MaxMindDbError>.
  • Metadata::checked_build_time() is removed.
  • Reader::from_source() / open_readfile() and verify() reject unrepresentable build_epoch values as InvalidDatabase.
  • The changelog lists this breaking change first, and UPGRADING.md includes the 0.28 -> 0.29 migration note.

Validation after autosquash:

  • cargo fmt --check
  • cargo test --quiet
  • cargo clippy --all-targets -- -D warnings
  • Feature tests: mmap, simdutf8, unsafe-str-decode, mmap simdutf8, mmap unsafe-str-decode
  • Feature clippy: mmap, simdutf8, unsafe-str-decode, mmap simdutf8, mmap unsafe-str-decode

Benchmark comparison for the affected API against the previous commit:

  • metadata/build_time: no change detected (-0.1192% to +0.3296%, p=0.42).

The earlier full main comparison for lookup/serde paths remains applicable; this follow-up changes metadata validation and the metadata benchmarked accessor.

@oschwald oschwald force-pushed the greg/fix-review-findings branch from 3a53611 to c589a6f Compare June 28, 2026 21:46
@oschwald

Copy link
Copy Markdown
Owner Author

Added the two agreed 0.29.0 breaking cleanups and autosquashed them into the metadata commit.

Additional changes:

  • Reader::metadata is now private; use Reader::metadata() for shared access to validated metadata.
  • Reader::from_source() / open now reject unsupported major format versions, unsupported IP versions, zero-node search trees, and unrepresentable build epochs.
  • Minor binary format versions remain accepted for forward compatibility.
  • Removed the unreleased checked_build_time() wording from CHANGELOG.md and UPGRADING.md.

Validation after autosquash:

  • cargo test --quiet
  • cargo clippy --all-targets -- -D warnings

Feature validation run before autosquash on the same tree:

  • cargo test --features mmap --quiet
  • cargo test --features simdutf8 --quiet
  • cargo test --features unsafe-str-decode --quiet
  • cargo test --features "mmap simdutf8" --quiet
  • cargo test --features "mmap unsafe-str-decode" --quiet
  • cargo clippy --all-targets --features mmap -- -D warnings
  • cargo clippy --all-targets --features simdutf8 -- -D warnings
  • cargo clippy --all-targets --features unsafe-str-decode -- -D warnings
  • cargo clippy --all-targets --features "mmap simdutf8" -- -D warnings
  • cargo clippy --all-targets --features "mmap unsafe-str-decode" -- -D warnings

Focused previous-commit benchmark comparisons:

  • metadata/build_time: within Criterion noise threshold (+0.6891% to +1.6779%).
  • within/networks_city_test: within Criterion noise threshold (+0.8514% to +1.7294%).
  • within/city_subnet: within Criterion noise threshold (+0.3233% to +1.2495%).
  • within/networks_mixed_test: within Criterion noise threshold (-1.7150% to -0.1483%).

@oschwald oschwald merged commit 7a44f85 into main Jun 28, 2026
44 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant