[SPARK-57256][SQL] Cast nanosecond-precision timestamps to string by MaxGekk · Pull Request #56317 · apache/spark

MaxGekk · 2026-06-04T08:23:52Z

What changes were proposed in this pull request?

Implement casting of the nanosecond-precision timestamp types TIMESTAMP_NTZ(p) (TimestampNTZNanosType) and TIMESTAMP_LTZ(p) (TimestampLTZNanosType), p in [7, 9], to STRING.

Casting is implemented in ToStringBase (mixed into Cast), so this change also fixes ToPrettyString (and therefore Dataset.show()) for these types via the shared base.

The change wires the SPARK-57162 formatter methods into the existing cast-to-string paths (interpreted and codegen):

TimestampLTZNanosType(p) -> TimestampFormatter.formatNanos(v, p) (renders in the session time zone).
TimestampNTZNanosType(p) -> TimestampFormatter.formatWithoutTimeZoneNanos(v, p) (zone-independent, UTC wall-clock grid).

The fractional-second precision p is taken from the source type; sub-p digits are floored and trailing zeros are trimmed, consistent with the microsecond cast path (both use FractionTimestampFormatter).
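The flooring and trimming rule above can be sketched in a few lines. This is an illustration only, not the Spark implementation: `format_fraction` is a hypothetical helper, and it assumes the fraction is available as nanoseconds within the second.

```python
def format_fraction(nanos_of_second: int, precision: int) -> str:
    """Render the fractional-second suffix for a TIMESTAMP_*(p) value.

    Hypothetical helper (not a Spark API): floors digits beyond `precision`
    and trims trailing zeros, as described in the PR text.
    """
    if not 7 <= precision <= 9:
        raise ValueError("precision must be in [7, 9]")
    if not 0 <= nanos_of_second < 1_000_000_000:
        raise ValueError("nanos_of_second must be in [0, 999999999]")
    # Floor: zero out every digit below position p (no rounding).
    step = 10 ** (9 - precision)
    floored = nanos_of_second - nanos_of_second % step
    # Trim trailing zeros; an all-zero fraction yields no suffix at all.
    digits = f"{floored:09d}".rstrip("0")
    return f".{digits}" if digits else ""


print(format_fraction(123456789, 9))
print(format_fraction(123456789, 7))
print(format_fraction(123400000, 9))
```

Under these assumptions, precision 7 drops the last two digits without rounding, and a value whose remaining digits are all zero prints no fractional part.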

Cast.needsTimeZone is extended so that TimestampLTZNanosType -> StringType resolves the session time zone (mirroring TimestampType -> StringType); the NTZ variant does not need a time zone.
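To illustrate why only the LTZ cast needs a time zone, here is a minimal sketch of the two rendering paths. The function names are hypothetical, precision handling is omitted (p = 9 is assumed), and a fixed UTC-08:00 offset stands in for a real session zone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def render(epoch_micros: int, nanos_within_micro: int, zone: timezone) -> str:
    # Shift the instant into `zone`, then read off the wall-clock fields.
    local = (EPOCH + timedelta(microseconds=epoch_micros)).astimezone(zone)
    nanos = local.microsecond * 1000 + nanos_within_micro
    frac = f"{nanos:09d}".rstrip("0")
    text = local.strftime("%Y-%m-%d %H:%M:%S")
    return f"{text}.{frac}" if frac else text


def cast_ltz_to_string(epoch_micros: int, nanos: int, session_zone: timezone) -> str:
    # LTZ: the stored instant is rendered in the session time zone.
    return render(epoch_micros, nanos, session_zone)


def cast_ntz_to_string(epoch_micros: int, nanos: int) -> str:
    # NTZ: the stored value is read on the UTC wall-clock grid, so no zone is needed.
    return render(epoch_micros, nanos, timezone.utc)


session = timezone(timedelta(hours=-8))  # assumed stand-in for a session zone
print(cast_ntz_to_string(1577836800123456, 789))
print(cast_ltz_to_string(1577836800123456, 789, session))
```

The same stored pair prints as 2020-01-01 00:00:00.123456789 on the NTZ path and as 2019-12-31 16:00:00.123456789 on the LTZ path with the assumed offset.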

Why are the changes needed?

Today Cast permits these casts at analysis time (the generic (_, StringType) rule), but at runtime the nanosecond types have no dedicated case in ToStringBase and fall through to the default String.valueOf(...) branch, producing the internal form TimestampNanosVal(epochMicros, nanosWithinMicro) instead of a proper SQL timestamp string. Producing a correct textual representation is a prerequisite for nanosecond support in expressions, SHOW/pretty output, and downstream text-based sinks.

Does this PR introduce any user-facing change?

User-facing only when spark.sql.timestampNanosTypes.enabled=true; these types are not available otherwise. Casting to string never fails, so ANSI and non-ANSI modes behave identically.

With spark.sql.timestampNanosTypes.enabled=true:

SELECT CAST(ts AS STRING);
-- TIMESTAMP_NTZ(9) value 2020-01-01 00:00:00.123456789
--   before: TimestampNanosVal(1577836800123456, 789)
--   after:  2020-01-01 00:00:00.123456789

How was this patch tested?

New cases in CastSuiteBase (run under both ANSI on/off; checkEvaluation exercises the interpreted and codegen paths): precision 7/8/9, trailing-zero trimming, nanosWithinMicro 0 and 999, LTZ time-zone shift under a non-UTC session zone vs. NTZ remaining unshifted, pre-epoch and year-9999 boundaries, and null input.
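The nanosWithinMicro and pre-epoch cases are easier to reason about with the value decomposition written out. The sketch below assumes that the internal pair is (whole microseconds since the epoch, remaining nanoseconds in [0, 999]) and that pre-epoch values use floor division; both are assumptions for illustration, not confirmed details of TimestampNanosVal.

```python
from typing import Tuple


def split_epoch_nanos(epoch_nanos: int) -> Tuple[int, int]:
    """Split nanoseconds since the epoch into (epoch_micros, nanos_within_micro).

    Python's divmod floors toward negative infinity, so the remainder stays
    in [0, 999] even for instants before 1970 (assumed convention).
    """
    return divmod(epoch_nanos, 1000)


def join_epoch_nanos(epoch_micros: int, nanos_within_micro: int) -> int:
    # Inverse of split_epoch_nanos.
    return epoch_micros * 1000 + nanos_within_micro


print(split_epoch_nanos(1577836800123456789))
print(split_epoch_nanos(-1))
```

With this convention, one nanosecond before the epoch maps to (-1, 999) rather than (0, -1), which is why the boundary values 0 and 999 are the interesting ones on both sides of 1970.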

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor

MaxGekk · 2026-06-05T09:35:36Z

@stevomitric @uros-b Could you review this PR, please?

uros-b

Shall we add some end-to-end SQL test coverage?

uros-b

Also, it would be nice to add at least one complex type cast test (e.g. using array).

uros-b

Thanks for the changes @MaxGekk. Left a few comments, otherwise LGTM!

MaxGekk · 2026-06-05T09:45:29Z

> Shall we add some end-to-end SQL test coverage?

@uros-b I agree, that would be nice. PR #56320 should make it possible to write such tests in *.sql files. How about merging it first?

MaxGekk · 2026-06-05T14:15:20Z

> Also, it would be nice to add at least one complex type cast test (e.g. using array).

Added in a50de21. The new CastSuiteBase test "SPARK-57256: cast complex types with nanosecond timestamps to string" covers array<timestamp_ntz_nanos> and array<timestamp_ltz_nanos> (with a null element to exercise the recursive nullString path in ToStringBase), plus a struct nesting both nanosecond variants. These go through the same recursive element path, confirming the nanos cases work nested inside complex types.
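The recursive element path can be pictured roughly as follows. This is a simplified sketch, not the ToStringBase code: `cast_nested` is a hypothetical function, Python lists and tuples stand in for arrays and structs, and the bracket and null conventions are assumed to follow Spark's usual cast-to-string output.

```python
from typing import Any, Callable


def cast_nested(value: Any, format_leaf: Callable[[Any], str],
                null_string: str = "null") -> str:
    """Recursively render a nested value as a string (illustrative only)."""
    if value is None:
        # A null at any nesting depth is rendered with the configured null string.
        return null_string
    if isinstance(value, list):  # stands in for array<...>
        inner = ", ".join(cast_nested(v, format_leaf, null_string) for v in value)
        return f"[{inner}]"
    if isinstance(value, tuple):  # stands in for struct<...>
        inner = ", ".join(cast_nested(v, format_leaf, null_string) for v in value)
        return f"{{{inner}}}"
    # Leaf: delegate to the per-type formatter (e.g. the nanosecond cases).
    return format_leaf(value)


ts = "2020-01-01 00:00:00.123456789"  # leaf already formatted, for brevity
print(cast_nested([ts, None], str))
print(cast_nested(([ts, None], ts), str, null_string="NULL"))
```

Because the leaf formatter is the only type-specific part, adding the nanosecond cases at the leaf level is enough for them to work inside arrays and structs.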


… complex-type cast coverage

Rename the two nanosecond pretty-string tests in ToPrettyStringSuite with the `SPARK-57256:` prefix for traceability, matching the CastSuiteBase tests.

Add a cast-to-string test for complex types nesting nanosecond timestamps: array<timestamp_ntz_nanos> / array<timestamp_ltz_nanos> (with a null element to cover the recursive nullString path) and a struct nesting both variants.

…d add SQL tests

After rebasing onto the merged SPARK-57257, master routes interpreted CAST(nanos AS STRING) through the Types Framework's zone-less format(), which deliberately raised UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING as a placeholder. That shadowed the zone-aware formatting added here.

Bypass TypeApiOps for the nanosecond timestamp types in ToStringBase.castToString so they use the zone-aware castToStringDefault cases (LTZ renders in the session time zone, mirroring the microsecond timestamp types). Remove the now-dead codegen error case and drop the obsolete TimestampNanosRowSuite test that asserted the error; positive cast coverage lives in CastSuiteBase/ToPrettyStringSuite. The framework format()/toSQLValue() still raise for the zone-less EXPLAIN / SQL-literal paths.

Add end-to-end golden-file checks to cast.sql now that SPARK-57257 wired HiveResult: precision flooring, trailing-zero trimming, nanosWithinMicro boundaries, pre-1970 negative-epoch, complex types with nested NULL, top-level NULL, and string-context use. All result columns are STRING.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.7.0

MaxGekk · 2026-06-05T20:06:00Z

@uros-b Now that #56320 (SPARK-57257, HiveResult support) is merged, I rebased onto it and added end-to-end golden-file coverage in cast.sql (so it runs in both ANSI and, via --IMPORT, non-ANSI). The new CAST(... AS STRING) cases cover precision flooring (7/8/9), trailing-zero trimming, nanosWithinMicro boundaries (.000000999, sub-precision floored to 0), pre-1970 negative-epoch values (1960-...), complex types with a nested NULL (array/map/struct), a top-level NULL, and use in a real string context (concat). Every result column is STRING (the nanos type is only intermediate), so no ThriftServerQueryTestSuite ignore entry is needed, and the nanos preview flag is already on by default under tests.

uros-b reviewed Jun 5, 2026

Comment thread sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ToPrettyStringSuite.scala Outdated

uros-b reviewed Jun 5, 2026

Comment thread sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ToPrettyStringSuite.scala Outdated

uros-b reviewed Jun 5, 2026

uros-b approved these changes Jun 5, 2026

MaxGekk added 6 commits June 5, 2026 21:17

[SPARK-57256][SQL] Add DST transition test for nanosecond LTZ cast to…

39a0261

… string Co-authored-by: Max Gekk

[SPARK-57256][SQL] Add ToPrettyStringSuite tests for nanosecond times…

76a1be2

…tamp types Co-authored-by: Max Gekk

[SPARK-57256][SQL] Include nanosecond timestamp types in null-cast-to…

f40005f

…-string sweep Co-authored-by: Max Gekk

MaxGekk force-pushed the cast-nanos-to-string branch from a50de21 to ec68a98 Compare June 5, 2026 20:03

MaxGekk wants to merge 6 commits into apache:master from MaxGekk:cast-nanos-to-string
