native_datafusion more permissive than Spark 3.x when reading Parquet TimestampNTZ columns #4219

@andygrove

Describe the bug

Spark 3.x does not allow reading Parquet timestamp columns as TimestampNTZ due to the ambiguity. Spark 4.0 relaxed this restriction.

See https://issues.apache.org/jira/browse/SPARK-47447 for details.

The behavior of native_datafusion scan is correct when running with Spark 4.0+ but incorrect when running with Spark 3.x because it returns a value rather than throwing an exception.

The returned value is correct per the Spark 4 definition, so this is not a data correctness issue per se, but Comet is more permissive than Spark itself when running with Spark 3.x.
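To make the expected version-dependent behavior concrete, here is a minimal Python sketch. It is only a model of the behavior described above, not Comet or Spark code: the function name `read_parquet_timestamp_as_ntz`, the exception class `UnsupportedTimestampReadError`, and the use of a raw microsecond integer as the value are all hypothetical and chosen for illustration.

```python
class UnsupportedTimestampReadError(Exception):
    """Hypothetical stand-in for the exception Spark 3.x throws."""


def read_parquet_timestamp_as_ntz(spark_version: str, value_micros: int) -> int:
    """Model the expected result of reading a Parquet timestamp as TimestampNTZ.

    spark_version: the Spark version in use, e.g. "3.5.1" or "4.0.0".
    value_micros: illustrative raw timestamp value stored in the file.
    """
    major = int(spark_version.split(".")[0])
    if major < 4:
        # Per this issue, Spark 3.x throws instead of returning a value.
        raise UnsupportedTimestampReadError(
            f"Spark {spark_version} does not allow this read"
        )
    # Spark 4.0+ relaxed the restriction (SPARK-47447) and returns the value.
    return value_micros
```

In these terms, the bug is that the native_datafusion scan currently takes the second branch regardless of the Spark version, whereas it should fail the read under Spark 3.x.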

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response
