feat: support read for nested type sub column #355

Open
lucasfang wants to merge 22 commits into alibaba:main from lucasfang:macro

Conversation

@lucasfang lucasfang commented Jun 10, 2026

Collaborator

Title: feat: support nested struct sub-column recall and map selected-keys recall

Purpose

Linked issue: close #xxx

This PR adds end-to-end read-side projection support for two capabilities:

  1. Nested struct sub-column recall
  • Support reading partial nested fields from STRUCT/LIST/MAP hierarchies.
  • Add recursive nested projection utilities at framework level, including schema-level detection for nested sub-field projection.
  • Keep format-specific behavior in format layer:
    • Avro: fail fast for unsupported nested sub-field projection with explicit error.
    • Lance: add the same fail-fast protection for unsupported nested sub-field projection.
  2. Map selected-keys recall
  • Support MAP_SELECTED_KEYS metadata driven key filtering during read.
  • Define and implement empty-string semantics for MAP_SELECTED_KEYS:
    • A metadata key that is present with an empty value means filter all keys.
    • Empty tokens in a comma-separated value are ignored.
  • Add utility and integration coverage for selected-keys behavior.
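
The selected-keys semantics above can be sketched as follows. This is a minimal illustration, not the code in this PR: `FieldMetadata`, `MapRow`, `ParseSelectedKeys`, and `FilterBySelectedKeys` are hypothetical names, the literal key string is assumed to equal the constant name, and "filter all keys" is read here as dropping every map entry.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for field-level key/value metadata.
using FieldMetadata = std::unordered_map<std::string, std::string>;

// Assumption: the metadata key string matches the constant name used in the PR text.
constexpr char kMapSelectedKeys[] = "MAP_SELECTED_KEYS";

// Hypothetical stand-in for one map value: parallel key and value columns.
struct MapRow {
    std::vector<std::string> keys;
    std::vector<std::string> values;
};

// Returns std::nullopt when the metadata key is absent (no key filtering requested).
// Returns a set, possibly empty, when the key is present; empty tokens are skipped.
std::optional<std::set<std::string>> ParseSelectedKeys(const FieldMetadata& metadata) {
    auto it = metadata.find(kMapSelectedKeys);
    if (it == metadata.end()) {
        return std::nullopt;
    }
    std::set<std::string> selected;
    const std::string& value = it->second;
    std::size_t start = 0;
    while (start <= value.size()) {
        std::size_t comma = value.find(',', start);
        if (comma == std::string::npos) {
            comma = value.size();
        }
        if (comma > start) {  // skip empty tokens such as the middle of "a,,b"
            selected.insert(value.substr(start, comma - start));
        }
        start = comma + 1;
    }
    return selected;
}

// Keeps only entries whose key is selected. An empty selection keeps nothing,
// which is this sketch's reading of "filter all keys".
MapRow FilterBySelectedKeys(const MapRow& row,
                            const std::optional<std::set<std::string>>& selected) {
    assert(row.keys.size() == row.values.size());
    if (!selected.has_value()) {
        return row;  // metadata key absent: leave the map untouched
    }
    MapRow kept;
    for (std::size_t i = 0; i < row.keys.size(); ++i) {
        if (selected->count(row.keys[i]) > 0) {
            kept.keys.push_back(row.keys[i]);
            kept.values.push_back(row.values[i]);
        }
    }
    return kept;
}
```

Returning `std::optional` keeps "key absent" and "key present but empty" distinguishable, which is the distinction the empty-string rule depends on.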

Additional alignment/fixes included in this commit range:

  • Refactor and consolidate nested projection detection logic into NestedProjectionUtils and simplify format readers.
  • Parquet read path fix for out-of-order projected schema (align output to read schema order).
  • Field mapping relaxations to avoid false mismatches for valid nested/projection scenarios.
  • Extend integration and unit tests around nested pruning and map selected-keys.
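
The Parquet fix above is described only as aligning output to read schema order. One way to picture that alignment is the sketch below. It is an assumption about the approach, not the reader's actual code: `Column` and `AlignToReadSchema` are hypothetical names, and a real reader would hold array handles rather than `std::vector<int>`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a decoded column.
struct Column {
    std::string name;
    std::vector<int> values;
};

// Reorders columns decoded in file order so that they follow the read schema order.
// Returns std::nullopt if a requested column was not decoded.
std::optional<std::vector<Column>> AlignToReadSchema(
    const std::vector<Column>& decoded,
    const std::vector<std::string>& read_schema_names) {
    std::unordered_map<std::string, std::size_t> index_by_name;
    for (std::size_t i = 0; i < decoded.size(); ++i) {
        bool inserted = index_by_name.emplace(decoded[i].name, i).second;
        assert(inserted && "duplicate column name in decoded batch");
        (void)inserted;
    }
    std::vector<Column> aligned;
    aligned.reserve(read_schema_names.size());
    for (const auto& name : read_schema_names) {
        auto it = index_by_name.find(name);
        if (it == index_by_name.end()) {
            return std::nullopt;  // requested column is missing from the decoded batch
        }
        aligned.push_back(decoded[it->second]);
    }
    return aligned;
}
```

For example, a batch decoded as `a, b, c` with a read schema of `c, a` comes back as `c, a`.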

Tests

Unit tests:

  • NestedProjectionUtilsTest.HasNestedSubfieldProjection_NoProjection
  • NestedProjectionUtilsTest.HasNestedSubfieldProjection_WithProjection
  • NestedProjectionUtils map selected-keys related tests, including:
    • GetMapSelectedKeys_EmptyString
    • GetMapSelectedKeys_ContainsEmptyToken
    • FilterMapArrayBySelectedKeys_EmptyKeyMeansFilterAll

Format tests:

  • AvroFileBatchReaderTest.TestSetReadSchemaRejectNestedSubFieldProjection
  • Compression/AvroFileFormatTest.TestNestedMap/*

Integration tests:

  • FileFormats/NestedColumnPruningInteTest.MapSelectedKeysEmptyStringMeansFilterAll/*
  • Nested column pruning scenarios for parquet/orc matrix (struct/deep nested/special fields/map selected-keys)

Note:

  • The Lance nested sub-field projection fail-fast test is included in the source tree but only runs in a build configured with PAIMON_ENABLE_LANCE=ON.

API and Format

  • API impact:

    • Read projection now supports nested sub-column recall and map selected-keys filtering.
    • No public API is removed in an incompatible way.
  • Storage/protocol impact:

    • No on-disk format or protocol change.
    • Behavior change is in reader projection and validation paths.

Documentation

  • No new external feature switch introduced.
  • Existing behavior is clarified by tests for nested projection and map selected-keys semantics.

Generative AI tooling

Generated-by: GitHub Copilot (GPT-5.3-Codex)

Copilot AI review requested due to automatic review settings June 10, 2026 08:04
@lucasfang lucasfang marked this pull request as draft June 10, 2026 08:07

Copilot AI left a comment

Contributor

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@lucasfang lucasfang force-pushed the macro branch 2 times, most recently from 4ec45b2 to b1972bb Compare June 12, 2026 09:17
@lucasfang lucasfang marked this pull request as ready for review June 12, 2026 09:28
@lucasfang lucasfang force-pushed the macro branch 3 times, most recently from fd6e284 to 19ea62b Compare June 16, 2026 01:11
@lucasfang lucasfang requested a review from Copilot June 16, 2026 01:25

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 5 comments.

Comment thread include/paimon/read_context.h
Comment thread src/paimon/core/operation/internal_read_context.cpp
Comment thread src/paimon/format/parquet/parquet_file_batch_reader.cpp Outdated
Comment thread src/paimon/format/parquet/parquet_file_batch_reader.cpp Outdated
Comment thread src/paimon/core/utils/nested_projection_utils.h Outdated
2 participants