[SPARK-57257][SQL] Support nanosecond-precision timestamps in Hive results #56320

Closed
MaxGekk wants to merge 2 commits into apache:master from MaxGekk:nanos-hiveresult

Member

MaxGekk commented Jun 4, 2026

What changes were proposed in this pull request?

This PR modifies HiveResult to support the nanosecond-precision timestamp types TIMESTAMP_LTZ(p) (TimestampLTZNanosType) and TIMESTAMP_NTZ(p) (TimestampNTZNanosType), p in [7, 9]. Two cases are added to HiveResult.toHiveStringDefault, mirroring the existing microsecond timestamp cases:

  • (i: Instant, _: TimestampLTZNanosType) -> rendered in the session time zone.
  • (l: LocalDateTime, _: TimestampNTZNanosType) -> rendered zone-independently.
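
The difference between the two cases can be sketched with plain `java.time`. This is only an illustration: `ZoneRenderingSketch`, `renderLtz` and `renderNtz` are hypothetical names, and the fixed nine-digit fraction pattern is an assumption made to keep the sketch short. It is not the `TimestampFormatter` code path used by the PR.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter

object ZoneRenderingSketch {
  // Assumption: a fixed 9-digit fraction, chosen only to keep the sketch short.
  private val pattern = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSSSS")

  // LTZ: the Instant is shifted into the session time zone before formatting.
  def renderLtz(i: Instant, sessionZone: ZoneId): String =
    pattern.format(i.atZone(sessionZone))

  // NTZ: the LocalDateTime carries no zone, so it is formatted as-is.
  def renderNtz(l: LocalDateTime): String =
    pattern.format(l)
}
```

The same `Instant` therefore yields different strings under different session zones, while a `LocalDateTime` always yields the same string.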

The external collected values are Instant (LTZ) and LocalDateTime (NTZ); they are converted to the physical TimestampNanosVal at the column precision and formatted with the nanosecond-aware TimestampFormatter (formatNanos / formatWithoutTimeZoneNanos, SPARK-57162), flooring sub-p digits and trimming trailing zeros. This is the same rendering used by casting these types to string (SPARK-57256), so Hive output stays consistent.
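
A minimal sketch of the "floor sub-p digits, then trim trailing zeros" rule described above. `FractionSketch.renderFraction` is a hypothetical helper written for illustration, not the `formatNanos` implementation, and omitting an all-zero fraction entirely is an assumption.

```scala
object FractionSketch {
  /** Renders the fractional-second part for a column of precision `p` (7 to 9). */
  def renderFraction(nanosOfSecond: Int, p: Int): String = {
    require(p >= 7 && p <= 9, s"precision must be in [7, 9], got $p")
    require(nanosOfSecond >= 0 && nanosOfSecond < 1000000000, "nanos out of range")
    // Keep the first p of the 9 fractional digits; the remaining digits are floored away.
    val digits = f"$nanosOfSecond%09d".take(p)
    // Trim trailing zeros. Assumption: an all-zero fraction is omitted entirely.
    val trimmed = digits.reverse.dropWhile(_ == '0').reverse
    if (trimmed.isEmpty) "" else "." + trimmed
  }
}
```

For example, `renderFraction(123456789, 7)` keeps seven digits, and `renderFraction(123456700, 9)` produces the same string because the two trailing zeros are trimmed.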

Why are the changes needed?

Before the change, formatting a nanosecond timestamp column through HiveResult (e.g. end-to-end SQL / golden-file tests, spark-sql CLI, Thrift server output) matches none of the existing cases and fails with a MatchError, analogous to the TimeType issue fixed in SPARK-51517:

scala.MatchError
(2020-01-01T00:00:00.123456789Z, TimestampLTZNanosType(9)) (of class scala.Tuple2)
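
The failure mode can be reproduced in isolation. In the sketch below, the `DataType` hierarchy and the `renderBefore` / `renderAfter` functions are hypothetical stand-ins, not Spark's classes or the body of `toHiveStringDefault`. It only shows why a `(value, type)` pair without a matching case raises `scala.MatchError`, and how adding the two cases removes it.

```scala
import java.time.{Instant, LocalDateTime}

object MatchErrorSketch {
  // Hypothetical stand-ins for Spark's data types; not the real classes.
  sealed trait DataType
  case object TimestampType extends DataType
  final case class TimestampLTZNanosType(precision: Int) extends DataType
  final case class TimestampNTZNanosType(precision: Int) extends DataType

  // Before: only the microsecond case exists, so a nanosecond pair matches nothing.
  // @unchecked silences the exhaustivity warning; the gap is the point of the sketch.
  def renderBefore(a: (Any, DataType)): String = (a: @unchecked) match {
    case (i: Instant, TimestampType) => s"micros:$i"
  }

  // After: two extra cases mirror the microsecond one.
  def renderAfter(a: (Any, DataType)): String = (a: @unchecked) match {
    case (i: Instant, TimestampType) => s"micros:$i"
    case (i: Instant, _: TimestampLTZNanosType) => s"ltz-nanos:$i"
    case (l: LocalDateTime, _: TimestampNTZNanosType) => s"ntz-nanos:$l"
  }
}
```

Calling `renderBefore` with an `Instant` paired with `TimestampLTZNanosType(9)` throws `scala.MatchError`, whereas `renderAfter` returns a string for the same input.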

Does this PR introduce any user-facing change?

Yes. It fixes the error above. After the change, nanosecond timestamp values are rendered as proper strings in Hive results (only reachable when spark.sql.timestampNanosTypes.enabled=true).

How was this patch tested?

  • New cases in HiveResultSuite covering TIMESTAMP_LTZ(p) / TIMESTAMP_NTZ(p) for p in [7, 9]: precision-driven fraction width, trailing-zero trimming, nanosWithinMicro 0 and 999, LTZ session-zone rendering vs. zone-independent NTZ, and nested (array/map/struct) values.
  • New golden-file end-to-end tests timestamp-ltz-nanos.sql and timestamp-ntz-nanos.sql (as SPARK-51517 added time.sql), disabled in ThriftServerQueryTestSuite.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.7.0

Member Author

MaxGekk commented Jun 5, 2026

@peter-toth @dongjoon-hyun @HyukjinKwon Could you review this PR, please? It allows writing end-to-end tests in *.sql files for timestamps with nanosecond precision.

Member

Do we have any test cases with a pre-1970 value like 1960-01-01 00:00:00.000000001 which exercises the negative-epoch path?

Member Author

Good catch -- all the previous cases used a post-epoch base. Added a pre-1970 base (1960-01-01) so we now cover the negative-epochMicros + positive-nanosWithinMicro path, both in the HiveResultSuite matrix and in the timestamp-{ltz,ntz}-nanos.sql golden files. Done in d578fbb.
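
For readers unfamiliar with the negative-epoch path, here is a rough sketch of how a pre-1970 `Instant` splits into a negative microsecond count plus a non-negative sub-microsecond remainder. `NanosVal`, `fromInstant` and `toInstant` are hypothetical names; the field layout is assumed from the `epochMicros` / `nanosWithinMicro` terms used in this thread and is not the actual `TimestampNanosVal` definition.

```scala
import java.time.Instant

object NegativeEpochSketch {
  // Hypothetical stand-in for the physical value: whole microseconds since the
  // epoch plus the leftover nanoseconds (0 to 999) inside that microsecond.
  final case class NanosVal(epochMicros: Long, nanosWithinMicro: Int)

  def fromInstant(i: Instant): NanosVal = {
    // getNano is always in [0, 999999999], even for pre-1970 instants, so the
    // negative sign lives entirely in getEpochSecond.
    val micros = Math.addExact(
      Math.multiplyExact(i.getEpochSecond, 1000000L),
      (i.getNano / 1000).toLong)
    NanosVal(micros, i.getNano % 1000)
  }

  def toInstant(v: NanosVal): Instant = {
    // Floor division keeps the nano-of-second part non-negative for negative micros.
    val seconds = Math.floorDiv(v.epochMicros, 1000000L)
    val microsOfSecond = Math.floorMod(v.epochMicros, 1000000L)
    Instant.ofEpochSecond(seconds, microsOfSecond * 1000L + v.nanosWithinMicro)
  }
}
```

With this layout, `1960-01-01T00:00:00.000000001Z` maps to a negative `epochMicros` and `nanosWithinMicro = 1`, and the value round-trips through `toInstant`. Truncating division instead of `floorDiv` / `floorMod` would shift such values by one second.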

Member

It would be nice to add at least some test cases with null values (both top-level and nested/complex).

Member Author

Added. Note: NULLs for these columns are handled by the generic (null, _) branch in HiveResult.toHiveString, which runs before the new TimestampLTZNanosType / TimestampNTZNanosType cases, so the path is type-agnostic -- but it's worth locking in. Added top-level (NULL) and nested array/map/struct NULL cases to both golden files (and the suite). Done in d578fbb.
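
To illustrate why the order of the branches matters, here is a small sketch with a hypothetical `render` function and stand-in types (not the real `HiveResult.toHiveString`). A Scala type pattern such as `i: Instant` never matches `null`, so the generic `(null, _)` case is what handles NULLs regardless of the column type.

```scala
import java.time.Instant

object NullBranchSketch {
  // Hypothetical stand-ins for Spark's data types; not the real classes.
  sealed trait DataType
  final case class TimestampLTZNanosType(precision: Int) extends DataType

  def render(a: (Any, DataType)): String = (a: @unchecked) match {
    // Generic branch: listed first and independent of the data type.
    case (null, _) => "NULL"
    // Typed branch: a type pattern such as `i: Instant` never matches null.
    case (i: Instant, _: TimestampLTZNanosType) => i.toString
  }
}
```

Here `render((null, TimestampLTZNanosType(9)))` returns `"NULL"` without reaching the typed case.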

Contributor

peter-toth left a comment

LGTM, but the test gaps that @uros-b mentioned are valid. Please add them before merging.

…mestamps in Hive results

Address review feedback on apache#56320:
- Add a pre-1970 base (1960-01-01) to exercise the negative-epoch path
  (negative epochMicros + positive nanosWithinMicro), in both HiveResultSuite
  and the timestamp-{ltz,ntz}-nanos.sql golden files.
- Add top-level and nested (array/map/struct) NULL cases. NULLs are handled by
  the generic `(null, _)` branch in `HiveResult.toHiveString`, but this locks in
  the behavior.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.7.0
Member Author

MaxGekk commented Jun 5, 2026

Thanks for the review @peter-toth and @uros-b! Added the pre-1970 (negative-epoch) and NULL (top-level + nested) coverage in d578fbb, in both HiveResultSuite and the timestamp-{ltz,ntz}-nanos.sql golden files.

Member

dongjoon-hyun left a comment

+1, LGTM.

Member Author

MaxGekk commented Jun 5, 2026

The failure in Run / Build modules: pyspark-connect-old-client is not related to this PR, I believe.

Merging to master/4.x. Thank you, @peter-toth, @uros-b and @dongjoon-hyun, for the review.

MaxGekk closed this in 637803e on Jun 5, 2026
MaxGekk added a commit that referenced this pull request Jun 5, 2026
…sults

Closes #56320 from MaxGekk/nanos-hiveresult.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 637803e)
Signed-off-by: Max Gekk <max.gekk@gmail.com>