IPC reader: handling of dictionaries with only null values #9595

@pierrebelzile

Describe the bug
The IPC specification (https://arrow.apache.org/docs/format/Columnar.html#format-ipc) states:

An edge-case for interleaved dictionary and record batches occurs when the record batches contain dictionary encoded arrays that are completely null. In this case, the dictionary for the encoded column might appear after the first record batch.

This does not seem to be implemented in the Rust IPC reader. The problem is that C++ (tested with version 17) does not serialize dictionaries that contain only null values, so the Rust reader cannot read the resulting streams.

To Reproduce
dict.rs.gz

Expected behavior
The reader should create a record batch that matches the source record batch: a typed array of nulls with an empty dictionary.
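
To make the expected behavior concrete, here is a minimal, std-only sketch of the fallback. It does not use the arrow-rs API: `Message`, `DecodedColumn`, and `read_stream` are hypothetical stand-ins for the IPC stream messages and the reader loop. The assumption is that a record batch whose dictionary-encoded column is entirely null can be decoded with an empty dictionary when no dictionary batch has arrived yet, while a missing dictionary for non-null keys remains an error.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-ins for IPC stream messages.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    /// Dictionary values for the column identified by `id`.
    DictionaryBatch { id: i64, values: Vec<String> },
    /// A single dictionary-encoded column; `None` keys are nulls.
    RecordBatch { dict_id: i64, keys: Vec<Option<usize>> },
}

/// A decoded column: the keys plus the dictionary they index into.
#[derive(Debug, Clone, PartialEq)]
struct DecodedColumn {
    keys: Vec<Option<usize>>,
    dictionary: Vec<String>,
}

/// Reads messages in stream order, tolerating a record batch that arrives
/// before its dictionary as long as every key in that batch is null.
fn read_stream(messages: &[Message]) -> Result<Vec<DecodedColumn>, String> {
    let mut dictionaries: HashMap<i64, Vec<String>> = HashMap::new();
    let mut batches = Vec::new();

    for message in messages {
        match message {
            Message::DictionaryBatch { id, values } => {
                dictionaries.insert(*id, values.clone());
            }
            Message::RecordBatch { dict_id, keys } => {
                let dictionary = match dictionaries.get(dict_id) {
                    Some(values) => values.clone(),
                    // Edge case from the spec: an all-null column may precede
                    // its dictionary, so fall back to an empty dictionary.
                    None if keys.iter().all(Option::is_none) => Vec::new(),
                    None => {
                        return Err(format!(
                            "dictionary {} missing for non-null keys",
                            dict_id
                        ))
                    }
                };
                batches.push(DecodedColumn {
                    keys: keys.clone(),
                    dictionary,
                });
            }
        }
    }
    Ok(batches)
}
```

Feeding this function an all-null `RecordBatch` followed by a `DictionaryBatch` and a second `RecordBatch` yields two decoded columns, the first with an empty dictionary. A real fix in the reader would also need to build the empty dictionary with the value type declared in the schema, which this sketch does not model.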
