
fix(parquet): fallback nested field reading to RowGroup reading (workaround) #371

Open
zhf999 wants to merge 8 commits into alibaba:main from zhf999:nested-col-fallback


zhf999 commented Jun 17, 2026

Contributor

Purpose

  • Fix a correctness issue when predicate pushdown enables page-level filtering on Parquet files containing nested columns.
  • When the read schema contains nested fields (struct/list/map), page-level predicate filtering relies on the BuildPageFilteredSchema mapping, which does not correctly align nested Parquet leaf columns; as a result, reads of nested columns fail.
  • Add a fallback path: when the read schema contains nested columns, disable page-level index filtering and fall back to row-group-level reading, which preserves correctness.

Changes

  • In src/paimon/format/parquet/parquet_file_batch_reader.cpp:
    • Detect nested columns in the file schema via ArrowSchemaValidator::IsNestedType.
    • Skip page-level index filter initialization/build when a nested field exists.
    • This makes the reader use the row-group filtered flow for nested schemas, avoiding the failing nested leaf-index mapping path.
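The fallback rule above can be sketched as follows. This is a minimal, self-contained C++ illustration under stated assumptions, not the code in parquet_file_batch_reader.cpp: `FieldKind`, `Field`, `ReadPath`, `IsNested`, `ShouldUsePageLevelFilter`, and `ChooseReadPath` are hypothetical names, a predicate is assumed to be present, and only top-level fields are classified. The real reader works on Arrow schemas through ArrowSchemaValidator::IsNestedType.

```cpp
#include <algorithm>
#include <cassert>  // assert(), for quick checks like the ones described below
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins: the real reader inspects an Arrow
// schema; these types only model the decision described in the PR.
enum class FieldKind { kPrimitive, kStruct, kList, kMap };

struct Field {
  std::string name;
  FieldKind kind;
};

// Hypothetical labels for the two read flows mentioned in the PR.
enum class ReadPath { kPageFiltered, kRowGroupFiltered };

// Struct, list, and map fields count as nested; primitives are leaves.
bool IsNested(const Field& field) {
  return field.kind != FieldKind::kPrimitive;
}

// Page-level index filtering is only used when no field is nested.
bool ShouldUsePageLevelFilter(const std::vector<Field>& schema) {
  return std::none_of(schema.begin(), schema.end(), IsNested);
}

// Assuming a predicate is present: keep page-level filtering for flat
// schemas, and fall back to row-group reading once a nested field appears.
ReadPath ChooseReadPath(const std::vector<Field>& schema) {
  return ShouldUsePageLevelFilter(schema) ? ReadPath::kPageFiltered
                                          : ReadPath::kRowGroupFiltered;
}
```

For example, a schema of `id` (primitive) plus `s` (struct) yields `ReadPath::kRowGroupFiltered` in this sketch, while an all-primitive schema keeps `ReadPath::kPageFiltered`. Checking for any nested field, rather than only the columns referenced by the predicate, matches the conservative "skip when a nested field exists" behavior described above.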

Tests

  • Added/updated unit tests in src/paimon/format/parquet/page_filtered_row_group_reader_test.cpp (no public API test harness changes).
  • Validated expected behavior on:
    • nested struct full read with predicate
    • reading only a nested struct, with a predicate on a non-selected top-level field
    • nested list read with predicate
    • nested map read with predicate
    • coexistence of adjacent nested columns

API and Format

  • No API changes.
  • No storage format/protocol changes.

Documentation

  • No user-facing feature/documentation updates required.

Generative AI tooling

  • No

Copilot AI review requested due to automatic review settings June 17, 2026 02:21

Copilot AI left a comment

Contributor


Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

zhf999 changed the title from "test: add test cases for nested columns" to "fix(parquet): fallback nested field reading to RowGroup reading (workaround)" Jun 17, 2026
Comment thread src/paimon/format/parquet/parquet_file_batch_reader.cpp Outdated
Comment thread src/paimon/format/parquet/page_filtered_row_group_reader_test.cpp Outdated
Comment thread src/paimon/format/parquet/page_filtered_row_group_reader_test.cpp Outdated
Comment thread src/paimon/format/parquet/page_filtered_row_group_reader_test.cpp

lxy-9602 left a comment

Collaborator


+1


Labels

None yet

Projects

None yet


3 participants