Parquet: Handle decimal and UUID literals in filter pushdown #16051

Open
laserninja wants to merge 1 commit into apache:main from laserninja:fix/16035-decimal-uuid-filter-pushdown

@laserninja

Summary

Handle BigDecimal and UUID literal types in ParquetFilters to enable filter pushdown for decimal and UUID predicates.

Problem

Closes #16035

ParquetFilters.getParquetPrimitive() throws UnsupportedOperationException for BigDecimal and UUID values. Additionally, decimal predicates were always routed through binaryColumn() regardless of precision, producing incorrect Parquet filter predicates for decimals that are stored as INT32 or INT64.

Solution

  • Decimal: Route predicates through the correct Parquet column type based on precision:
    • intColumn for precision ≤ 9 (stored as INT32)
    • longColumn for precision 10 through 18 (stored as INT64)
    • binaryColumn for precision > 18 (stored as FIXED_LEN_BYTE_ARRAY)
    • Added helper methods getDecimalAsInt(), getDecimalAsLong(), and getDecimalAsBinary() for the conversions
  • UUID: Convert via UUIDUtil.convert() to 16-byte Binary in getParquetPrimitive()
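
The decimal routing above can be illustrated with a minimal, JDK-only sketch. It is not the code from this PR: the class `DecimalTierSketch` and the methods `tierFor`, `asInt`, `asLong`, and `asFixedBytes` are hypothetical stand-ins for the idea behind `getDecimalAsInt()`, `getDecimalAsLong()`, and `getDecimalAsBinary()`, whose bodies are not shown here. The sketch assumes the stored value is the decimal's unscaled integer, written as big-endian two's complement in the fixed-length case, with the byte length supplied by the caller.

```java
import java.math.BigDecimal;
import java.util.Arrays;

// Hypothetical illustration only; names are not taken from the PR.
public class DecimalTierSketch {
  enum Tier { INT32, INT64, FIXED_LEN_BYTE_ARRAY }

  // Pick the physical type from the decimal precision, per the tiers listed above.
  static Tier tierFor(int precision) {
    if (precision <= 9) {
      return Tier.INT32;
    } else if (precision <= 18) {
      return Tier.INT64;
    }
    return Tier.FIXED_LEN_BYTE_ARRAY;
  }

  // setScale without a rounding mode throws ArithmeticException if rounding
  // would be needed, so a literal is never silently altered.
  static int asInt(BigDecimal value, int scale) {
    return value.setScale(scale).unscaledValue().intValueExact();
  }

  static long asLong(BigDecimal value, int scale) {
    return value.setScale(scale).unscaledValue().longValueExact();
  }

  // Sign-extend the big-endian two's complement unscaled value to `length` bytes.
  // `length` is an assumption supplied by the caller in this sketch.
  static byte[] asFixedBytes(BigDecimal value, int scale, int length) {
    byte[] unscaled = value.setScale(scale).unscaledValue().toByteArray();
    if (unscaled.length > length) {
      throw new IllegalArgumentException("Value does not fit in " + length + " bytes");
    }
    byte[] out = new byte[length];
    byte pad = (byte) (value.signum() < 0 ? 0xFF : 0x00);
    Arrays.fill(out, 0, length - unscaled.length, pad);
    System.arraycopy(unscaled, 0, out, length - unscaled.length, unscaled.length);
    return out;
  }

  public static void main(String[] args) {
    System.out.println(tierFor(9) + " " + tierFor(18) + " " + tierFor(38));
    System.out.println(asInt(new BigDecimal("12.34"), 2));
    System.out.println(Arrays.toString(asFixedBytes(new BigDecimal("-1"), 0, 4)));
  }
}
```

The exact-value conversions (`intValueExact`, `longValueExact`) are a design choice of this sketch: they fail loudly when a literal does not fit the column's physical type rather than producing a truncated comparison value.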

Testing

  • Existing tests pass (./gradlew :iceberg-parquet:test)
  • Added TestParquetFilters with 10 tests covering:
    • All three decimal precision tiers (INT32, INT64, FIXED_LEN_BYTE_ARRAY)
    • Decimal comparison predicates (lt, lte, gt, gte)
    • Decimal null/notNull predicates on optional fields
    • UUID equality and comparison predicates
    • UUID null/notNull predicates on optional fields
    • Regression tests for string and integer filter pushdown
  • ./gradlew spotlessApply passes

Note: This contribution was AI-assisted. The implementation logic and design decisions were reviewed and verified by the author.

Fix ParquetFilters.getParquetPrimitive() throwing
UnsupportedOperationException for BigDecimal and UUID values.

For decimals, the fix routes predicates through the correct Parquet
column type based on precision: INT32 (precision <= 9), INT64
(precision <= 18), or FIXED_LEN_BYTE_ARRAY (precision > 18),
matching how TypeToMessageType stores decimal values.

For UUIDs, the fix converts UUID values to 16-byte Binary via
UUIDUtil.convert() in getParquetPrimitive().
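
As a rough illustration of the UUID conversion, the sketch below packs a `java.util.UUID` into 16 bytes using only the JDK. It is not the body of `UUIDUtil.convert()`; the class `UuidBytesSketch` and its methods are hypothetical, and the layout (most significant 8 bytes first, big-endian) is an assumption based on how Parquet's UUID logical type is laid out.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical illustration only; not the implementation of UUIDUtil.convert().
public class UuidBytesSketch {
  // ByteBuffer is big-endian by default, so the most significant bits come first.
  static byte[] toBytes(UUID uuid) {
    return ByteBuffer.allocate(16)
        .putLong(uuid.getMostSignificantBits())
        .putLong(uuid.getLeastSignificantBits())
        .array();
  }

  // Inverse conversion, useful for checking the round trip.
  static UUID fromBytes(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    return new UUID(buf.getLong(), buf.getLong());
  }

  public static void main(String[] args) {
    UUID uuid = UUID.fromString("00112233-4455-6677-8899-aabbccddeeff");
    byte[] bytes = toBytes(uuid);
    System.out.println(bytes.length);
    System.out.println(fromBytes(bytes).equals(uuid));
  }
}
```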

Closes apache#16035

Linked issue: Parquet: Filter pushdown throws UnsupportedOperationException for decimal and UUID predicates