[SPARK-57257][SQL] Support nanosecond-precision timestamps in Hive results by MaxGekk · Pull Request #56320 · apache/spark

MaxGekk · 2026-06-04T10:41:08Z

What changes were proposed in this pull request?

This PR modifies HiveResult to support the nanosecond-precision timestamp types TIMESTAMP_LTZ(p) (TimestampLTZNanosType) and TIMESTAMP_NTZ(p) (TimestampNTZNanosType), p in [7, 9]. Two cases are added to HiveResult.toHiveStringDefault, mirroring the existing microsecond timestamp cases:

(i: Instant, _: TimestampLTZNanosType) -> rendered in the session time zone.
(l: LocalDateTime, _: TimestampNTZNanosType) -> rendered zone-independently.

The external collected values are Instant (LTZ) and LocalDateTime (NTZ); they are converted to the physical TimestampNanosVal at the column precision and formatted with the nanosecond-aware TimestampFormatter (formatNanos / formatWithoutTimeZoneNanos, SPARK-57162), flooring sub-p digits and trimming trailing zeros. This is the same rendering used by casting these types to string (SPARK-57256), so Hive output stays consistent.

Why are the changes needed?

Before the change, formatting a nanosecond timestamp column through HiveResult (e.g. end-to-end SQL / golden-file tests, spark-sql CLI, Thrift server output) hits the catch-all match and fails with a MatchError, analogous to the TimeType issue fixed in SPARK-51517:

scala.MatchError
(2020-01-01T00:00:00.123456789Z, TimestampLTZNanosType(9)) (of class scala.Tuple2)

Does this PR introduce any user-facing change?

Yes. It fixes the error above. After the change, nanosecond timestamp values are rendered as proper strings in Hive results (only reachable when spark.sql.timestampNanosTypes.enabled=true).

How was this patch tested?

New cases in HiveResultSuite covering TIMESTAMP_LTZ(p) / TIMESTAMP_NTZ(p) for p in [7, 9]: precision-driven fraction width, trailing-zero trimming, nanosWithinMicro 0 and 999, LTZ session-zone rendering vs. zone-independent NTZ, and nested (array/map/struct) values.
New golden-file end-to-end tests timestamp-ltz-nanos.sql and timestamp-ntz-nanos.sql (as SPARK-51517 added time.sql), disabled in ThriftServerQueryTestSuite.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.7.0

…sults ### What changes were proposed in this pull request? This PR modifies `HiveResult` to support the nanosecond-precision timestamp types `TIMESTAMP_LTZ(p)` (`TimestampLTZNanosType`) and `TIMESTAMP_NTZ(p)` (`TimestampNTZNanosType`), `p` in [7, 9]. Two cases are added to `HiveResult.toHiveStringDefault`, mirroring the existing microsecond timestamp cases: - `(i: Instant, _: TimestampLTZNanosType)` -> rendered in the session time zone. - `(l: LocalDateTime, _: TimestampNTZNanosType)` -> rendered zone-independently. The external collected values are `Instant` (LTZ) and `LocalDateTime` (NTZ); they are converted to the physical `TimestampNanosVal` at the column precision and formatted with the nanosecond-aware `TimestampFormatter` (`formatNanos` / `formatWithoutTimeZoneNanos`, SPARK-57162), flooring sub-`p` digits and trimming trailing zeros. This is the same rendering used by casting these types to string (SPARK-57256), so Hive output stays consistent. ### Why are the changes needed? Before the change, formatting a nanosecond timestamp column through `HiveResult` (e.g. end-to-end SQL / golden-file tests, spark-sql CLI, Thrift server output) hits the catch-all match and fails with a `MatchError`, analogous to the `TimeType` issue fixed in SPARK-51517: ``` scala.MatchError (2020-01-01T00:00:00.123456789Z, TimestampLTZNanosType(9)) (of class scala.Tuple2) ``` ### Does this PR introduce _any_ user-facing change? Yes. It fixes the error above. After the change, nanosecond timestamp values are rendered as proper strings in Hive results (only reachable when `spark.sql.timestampNanosTypes.enabled=true`). ### How was this patch tested? - New cases in `HiveResultSuite` covering `TIMESTAMP_LTZ(p)` / `TIMESTAMP_NTZ(p)` for `p` in [7, 9]: precision-driven fraction width, trailing-zero trimming, nanosWithinMicro 0 and 999, LTZ session-zone rendering vs. zone-independent NTZ, and nested (array/map/struct) values. - New golden-file end-to-end tests `timestamp-ltz-nanos.sql` and `timestamp-ntz-nanos.sql` (as SPARK-51517 added `time.sql`), disabled in `ThriftServerQueryTestSuite`. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor 1.7.0

MaxGekk · 2026-06-05T09:34:58Z

@peter-toth @dongjoon-hyun @HyukjinKwon Could you review this PR, please. This PR allows to write end-to-end tests in *.sql for timestamps with nanosecond precision.

uros-b · 2026-06-05T12:08:00Z

Do we have any test cases with a pre-1970 value like 1960-01-01 00:00:00.000000001 which exercises the negative-epoch path?

Good catch -- all the previous cases used a post-epoch base. Added a pre-1970 base (1960-01-01) so we now cover the negative-epochMicros + positive-nanosWithinMicro path, both in the HiveResultSuite matrix and in the timestamp-{ltz,ntz}-nanos.sql golden files. Done in d578fbb.

uros-b · 2026-06-05T12:08:37Z

It would be nice to add at least some test cases with null values (both top-level and nested/complex).

Added. Note: NULLs for these columns are handled by the generic (null, _) branch in HiveResult.toHiveString, which runs before the new TimestampLTZNanosType / TimestampNTZNanosType cases, so the path is type-agnostic -- but it's worth locking in. Added top-level (NULL) and nested array/map/struct NULL cases to both golden files (and the suite). Done in d578fbb.

peter-toth

LGTM, but the test gaps that @uros-b's mentioned are valid. Please add them before merging.

…mestamps in Hive results Address review feedback on apache#56320: - Add a pre-1970 base (1960-01-01) to exercise the negative-epoch path (negative epochMicros + positive nanosWithinMicro), in both HiveResultSuite and the timestamp-{ltz,ntz}-nanos.sql golden files. - Add top-level and nested (array/map/struct) NULL cases. NULLs are handled by the generic `(null, _)` branch in `HiveResult.toHiveString`, but this locks in the behavior. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor 1.7.0

MaxGekk · 2026-06-05T13:53:58Z

Thanks for the review @peter-toth and @uros-b! Added the pre-1970 (negative-epoch) and NULL (top-level + nested) coverage in d578fbb, in both HiveResultSuite and the timestamp-{ltz,ntz}-nanos.sql golden files.

dongjoon-hyun

+1, LGTM.

MaxGekk · 2026-06-05T19:09:16Z

The failure Run / Build modules: pyspark-connect-old-client is not related to this PR, I believe.

Merging to master/4.x. Thank you, @peter-toth @uros-b @dongjoon-hyun for review.

…sults ### What changes were proposed in this pull request? This PR modifies `HiveResult` to support the nanosecond-precision timestamp types `TIMESTAMP_LTZ(p)` (`TimestampLTZNanosType`) and `TIMESTAMP_NTZ(p)` (`TimestampNTZNanosType`), `p` in [7, 9]. Two cases are added to `HiveResult.toHiveStringDefault`, mirroring the existing microsecond timestamp cases: - `(i: Instant, _: TimestampLTZNanosType)` -> rendered in the session time zone. - `(l: LocalDateTime, _: TimestampNTZNanosType)` -> rendered zone-independently. The external collected values are `Instant` (LTZ) and `LocalDateTime` (NTZ); they are converted to the physical `TimestampNanosVal` at the column precision and formatted with the nanosecond-aware `TimestampFormatter` (`formatNanos` / `formatWithoutTimeZoneNanos`, SPARK-57162), flooring sub-`p` digits and trimming trailing zeros. This is the same rendering used by casting these types to string (SPARK-57256), so Hive output stays consistent. ### Why are the changes needed? Before the change, formatting a nanosecond timestamp column through `HiveResult` (e.g. end-to-end SQL / golden-file tests, spark-sql CLI, Thrift server output) hits the catch-all match and fails with a `MatchError`, analogous to the `TimeType` issue fixed in SPARK-51517: ``` scala.MatchError (2020-01-01T00:00:00.123456789Z, TimestampLTZNanosType(9)) (of class scala.Tuple2) ``` ### Does this PR introduce _any_ user-facing change? Yes. It fixes the error above. After the change, nanosecond timestamp values are rendered as proper strings in Hive results (only reachable when `spark.sql.timestampNanosTypes.enabled=true`). ### How was this patch tested? - New cases in `HiveResultSuite` covering `TIMESTAMP_LTZ(p)` / `TIMESTAMP_NTZ(p)` for `p` in [7, 9]: precision-driven fraction width, trailing-zero trimming, nanosWithinMicro 0 and 999, LTZ session-zone rendering vs. zone-independent NTZ, and nested (array/map/struct) values. - New golden-file end-to-end tests `timestamp-ltz-nanos.sql` and `timestamp-ntz-nanos.sql` (as SPARK-51517 added `time.sql`), disabled in `ThriftServerQueryTestSuite`. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor 1.7.0 Closes #56320 from MaxGekk/nanos-hiveresult. Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit 637803e) Signed-off-by: Max Gekk <max.gekk@gmail.com>

MaxGekk mentioned this pull request Jun 5, 2026

[SPARK-57256][SQL] Cast nanosecond-precision timestamps to string #56317

Open

uros-b reviewed Jun 5, 2026

View reviewed changes

peter-toth approved these changes Jun 5, 2026

View reviewed changes

dongjoon-hyun approved these changes Jun 5, 2026

View reviewed changes

MaxGekk closed this in 637803e Jun 5, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-57257][SQL] Support nanosecond-precision timestamps in Hive results#56320

[SPARK-57257][SQL] Support nanosecond-precision timestamps in Hive results#56320
MaxGekk wants to merge 2 commits into
apache:masterfrom
MaxGekk:nanos-hiveresult

MaxGekk commented Jun 4, 2026

Uh oh!

MaxGekk commented Jun 5, 2026

Uh oh!

uros-b Jun 5, 2026

Uh oh!

MaxGekk Jun 5, 2026

Uh oh!

uros-b Jun 5, 2026

Uh oh!

MaxGekk Jun 5, 2026

Uh oh!

peter-toth left a comment

Uh oh!

MaxGekk commented Jun 5, 2026

Uh oh!

dongjoon-hyun left a comment

Uh oh!

MaxGekk commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

MaxGekk commented Jun 4, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

MaxGekk commented Jun 5, 2026

Uh oh!

uros-b Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

uros-b Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

peter-toth left a comment

Choose a reason for hiding this comment

Uh oh!

MaxGekk commented Jun 5, 2026

Uh oh!

dongjoon-hyun left a comment

Choose a reason for hiding this comment

Uh oh!

MaxGekk commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants