Skip to content

[SPARK-57256][SQL] Cast nanosecond-precision timestamps to string#56317

Open
MaxGekk wants to merge 6 commits into
apache:masterfrom
MaxGekk:cast-nanos-to-string
Open

[SPARK-57256][SQL] Cast nanosecond-precision timestamps to string#56317
MaxGekk wants to merge 6 commits into
apache:masterfrom
MaxGekk:cast-nanos-to-string

Conversation

@MaxGekk
Copy link
Copy Markdown
Member

@MaxGekk MaxGekk commented Jun 4, 2026

What changes were proposed in this pull request?

Implement casting of the nanosecond-precision timestamp types TIMESTAMP_NTZ(p) (TimestampNTZNanosType) and TIMESTAMP_LTZ(p) (TimestampLTZNanosType), p in [7, 9], to STRING.

Casting is implemented in ToStringBase (mixed into Cast), so this change also fixes ToPrettyString (and therefore Dataset.show()) for these types via the shared base.

The change wires the SPARK-57162 formatter methods into the existing cast-to-string paths (interpreted and codegen):

  • TimestampLTZNanosType(p) -> TimestampFormatter.formatNanos(v, p) (renders in the session time zone).
  • TimestampNTZNanosType(p) -> TimestampFormatter.formatWithoutTimeZoneNanos(v, p) (zone-independent, UTC wall-clock grid).

The fractional-second precision p is taken from the source type; sub-p digits are floored and trailing zeros are trimmed, consistent with the microsecond cast path (both use FractionTimestampFormatter).

Cast.needsTimeZone is extended so that TimestampLTZNanosType -> StringType resolves the session time zone (mirroring TimestampType -> StringType); the NTZ variant does not need a time zone.

Why are the changes needed?

Today Cast permits these casts at analysis time (the generic (_, StringType) rule), but at runtime the nanosecond types have no dedicated case in ToStringBase and fall through to the default String.valueOf(...) branch, producing the internal form TimestampNanosVal(epochMicros, nanosWithinMicro) instead of a proper SQL timestamp string. Producing a correct textual representation is a prerequisite for nanosecond support in expressions, SHOW/pretty output, and downstream text-based sinks.

Does this PR introduce any user-facing change?

User-facing only when spark.sql.timestampNanosTypes.enabled=true; these types are not available otherwise. Casting to string never fails, so ANSI and non-ANSI modes behave identically.

With spark.sql.timestampNanosTypes.enabled=true:

SELECT CAST(ts AS STRING);
-- TIMESTAMP_NTZ(9) value 2020-01-01 00:00:00.123456789
--   before: TimestampNanosVal(1577836800000000, 789)
--   after:  2020-01-01 00:00:00.123456789

How was this patch tested?

New cases in CastSuiteBase (run under both ANSI on/off; checkEvaluation exercises the interpreted and codegen paths): precision 7/8/9, trailing-zero trimming, nanosWithinMicro 0 and 999, LTZ time-zone shift under a non-UTC session zone vs. NTZ remaining unshifted, pre-epoch and year-9999 boundaries, and null input.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor

@MaxGekk
Copy link
Copy Markdown
Member Author

MaxGekk commented Jun 5, 2026

@stevomitric @uros-b Could you review this PR, please.

Copy link
Copy Markdown
Member

@uros-b uros-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add some end-to-end SQL test coverage?

Copy link
Copy Markdown
Member

@uros-b uros-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it would be nice to add at least one complex type cast test (e.g. using array).

Copy link
Copy Markdown
Member

@uros-b uros-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @MaxGekk. Left a few comments, otherwise LGTM!

@MaxGekk
Copy link
Copy Markdown
Member Author

MaxGekk commented Jun 5, 2026

Shall we add some end-to-end SQL test coverage?

@uros-b I agree. It would be nice. This PR #56320 should allow to write tests in *.sql. How about to merge it at first.

@MaxGekk
Copy link
Copy Markdown
Member Author

MaxGekk commented Jun 5, 2026

Also, it would be nice to add at least one complex type cast test (e.g. using array).

Added in a50de21. The new CastSuiteBase test "SPARK-57256: cast complex types with nanosecond timestamps to string" covers array<timestamp_ntz_nanos> and array<timestamp_ltz_nanos> (with a null element to exercise the recursive nullString path in ToStringBase), plus a struct nesting both nanosecond variants. These go through the same recursive element path, confirming the nanos cases work nested inside complex types.

MaxGekk added 6 commits June 5, 2026 21:17
### What changes were proposed in this pull request?
Implement casting of the nanosecond-precision timestamp types `TIMESTAMP_NTZ(p)`
(`TimestampNTZNanosType`) and `TIMESTAMP_LTZ(p)` (`TimestampLTZNanosType`),
`p` in [7, 9], to `STRING`.

Casting is implemented in `ToStringBase` (mixed into `Cast`), so this also fixes
`ToPrettyString` (and therefore `Dataset.show()`) for these types via the shared base.

The change wires the SPARK-57162 formatter methods into the existing cast-to-string
paths (interpreted and codegen):
- `TimestampLTZNanosType(p)` -> `TimestampFormatter.formatNanos(v, p)` (session time zone).
- `TimestampNTZNanosType(p)` -> `TimestampFormatter.formatWithoutTimeZoneNanos(v, p)`
  (zone-independent, UTC wall-clock grid).

The fractional-second precision `p` is taken from the source type; sub-`p` digits are
floored and trailing zeros are trimmed, consistent with the microsecond cast path (both
use `FractionTimestampFormatter`). `Cast.needsTimeZone` is extended so that
`TimestampLTZNanosType -> StringType` resolves the session time zone (mirroring
`TimestampType -> StringType`); the NTZ variant does not need a time zone.

### Why are the changes needed?
Today `Cast` permits these casts at analysis time (the generic `(_, StringType)` rule),
but at runtime the nanosecond types have no dedicated case in `ToStringBase` and fall
through to the default `String.valueOf(...)` branch, producing the internal form
`TimestampNanosVal(epochMicros, nanosWithinMicro)` instead of a proper SQL timestamp
string. A correct textual representation is a prerequisite for nanosecond support in
expressions, SHOW/pretty output, and downstream text-based sinks.

### Does this PR introduce _any_ user-facing change?
User-facing only when `spark.sql.timestampNanosTypes.enabled=true`; these types are not
available otherwise. Casting to string never fails, so ANSI and non-ANSI modes behave
identically.

With `spark.sql.timestampNanosTypes.enabled=true`:
```
SELECT CAST(ts AS STRING);
-- TIMESTAMP_NTZ(9) value 2020-01-01 00:00:00.123456789
--   before: TimestampNanosVal(1577836800000000, 789)
--   after:  2020-01-01 00:00:00.123456789
```

### How was this patch tested?
New cases in `CastSuiteBase` (run under both ANSI on/off; `checkEvaluation` exercises the
interpreted and codegen paths): precision 7/8/9, trailing-zero trimming, `nanosWithinMicro`
0 and 999, LTZ time-zone shift under a non-UTC session zone vs. NTZ remaining unshifted,
pre-epoch and year-9999 boundaries, and null input.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor
… complex-type cast coverage

Rename the two nanosecond pretty-string tests in ToPrettyStringSuite with the
`SPARK-57256:` prefix for traceability, matching the CastSuiteBase tests.

Add a cast-to-string test for complex types nesting nanosecond timestamps:
array<timestamp_ntz_nanos> / array<timestamp_ltz_nanos> (with a null element to
cover the recursive nullString path) and a struct nesting both variants.
…d add SQL tests

After rebasing onto the merged SPARK-57257, master routes interpreted
CAST(nanos AS STRING) through the Types Framework's zone-less format(),
which deliberately raised UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_TO_STRING
as a placeholder. That shadowed the zone-aware formatting added here.

Bypass TypeApiOps for the nanosecond timestamp types in
ToStringBase.castToString so they use the zone-aware castToStringDefault
cases (LTZ renders in the session time zone, mirroring the microsecond
timestamp types). Remove the now-dead codegen error case and drop the
obsolete TimestampNanosRowSuite test that asserted the error; positive
cast coverage lives in CastSuiteBase/ToPrettyStringSuite. The framework
format()/toSQLValue() still raise for the zone-less EXPLAIN / SQL-literal
paths.

Add end-to-end golden-file checks to cast.sql now that SPARK-57257 wired
HiveResult: precision flooring, trailing-zero trimming, nanosWithinMicro
boundaries, pre-1970 negative-epoch, complex types with nested NULL,
top-level NULL, and string-context use. All result columns are STRING.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.7.0
@MaxGekk MaxGekk force-pushed the cast-nanos-to-string branch from a50de21 to ec68a98 Compare June 5, 2026 20:03
@MaxGekk
Copy link
Copy Markdown
Member Author

MaxGekk commented Jun 5, 2026

@uros-b Now that #56320 (SPARK-57257, HiveResult support) is merged, I rebased onto it and added end-to-end golden-file coverage in cast.sql (so it runs in both ANSI and, via --IMPORT, non-ANSI). The new CAST(... AS STRING) cases cover precision flooring (7/8/9), trailing-zero trimming, nanosWithinMicro boundaries (.000000999, sub-precision floored to 0), pre-1970 negative-epoch values (1960-...), complex types with a nested NULL (array/map/struct), a top-level NULL, and use in a real string context (concat). Every result column is STRING (the nanos type is only intermediate), so no ThriftServerQueryTestSuite ignore entry is needed, and the nanos preview flag is already on by default under tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants