-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add TimestampWithOffset canonical extension type
#8743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add TimestampWithOffset canonical extension type
#8743
Conversation
westonpace
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems pretty straightforward and reasonable
TimestampWithOffset extension typeTimestampWithOffset extension type
|
There are several TODOs in this PRs description. Is that intended? |
|
@alamb We're currently still discussing the FORMAT in the For
Realistically, even after the FORMAT PR goes into the spec I think this will take me a few more weeks to fully flesh out. |
|
Thanks for the clarification @serramatutu -- I'll mark it as a draft again then |
…48002) ### Rationale for this change Closes #44248 Arrow has no built-in canonical way of representing the `TIMESTAMP WITH TIME ZONE` SQL type, which is present across multiple different database systems. Not having a native way to represent this forces users to either convert to UTC and drop the time zone, which may have correctness implications, or use bespoke workarounds. A new `arrow.timestamp_with_offset` extension type would introduce a standard canonical way of representing that information. Rust implementation: apache/arrow-rs#8743 Go implementation: apache/arrow-go#558 [DISCUSS] [thread in the mailing list](https://lists.apache.org/thread/yhbr3rj9l59yoxv92o2s6dqlop16sfnk). ### What changes are included in this PR? Proposal and documentation for `arrow.timestamp_with_offset` canonical extension type. ### Are these changes tested? N/A ### Are there any user-facing changes? Yes, this is an extension to the arrow format. * GitHub Issue: #44248 --------- Co-authored-by: David Li <li.davidm96@gmail.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Co-authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
This commit adds a new `TimestampWithOffset` extension type. This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes.
d187bb9 to
f6cda58
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thank you @serramatutu and @westonpace
| /// The extension type for `TimestampWithOffset`. | ||
| /// | ||
| /// <https://arrow.apache.org/docs/format/CanonicalExtensions.html#timestamp-with-offset> | ||
| TimestampWithOffset(TimestampWithOffset), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a pub enum, unfortunately this will need to wait until the next major arrow release (in Jan):
TimestampWithOffset extension typeTimestampWithOffset canonical extension type
|
Looks like we got a CI flake and need to rerun:
|
| /// This type represents a timestamp column that stores potentially different timezone offsets per value. | ||
| /// The timestamp is stored in UTC alongside the original timezone offset in minutes. This extension type | ||
| /// is intended to be compatible with ANSI SQL's `TIMESTAMP WITH TIME ZONE`, which is supported by multiple | ||
| /// database engines. | ||
| /// | ||
| /// The storage type of the extension is a `Struct` with 2 fields, in order: | ||
| /// - `timestamp`: a non-nullable `Timestamp(time_unit, "UTC")`, where `time_unit` is any Arrow `TimeUnit` (s, ms, us or ns). | ||
| /// - `offset_minutes`: a non-nullable signed 16-bit integer (`Int16`) representing the offset in minutes | ||
| /// from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent | ||
| /// east. Offsets normally range from -779 (-12:59) to +780 (+13:00). | ||
| /// | ||
| /// This type has no type parameters. | ||
| /// | ||
| /// Metadata is either empty or an empty string. | ||
| /// | ||
| /// It is also *permissible* for the `offset_minutes` field to be dictionary-encoded with a preferred (*but not required*) | ||
| /// index type of `int8`, or run-end-encoded with a preferred (*but not required*) runs type of `int8`. | ||
| /// | ||
| /// It's worth noting that the data source needs to resolve timezone strings such as `UTC` or | ||
| /// `Americas/Los_Angeles` into an offset in minutes in order to construct a `TimestampWithOffset`. | ||
| /// This makes `TimestampWithOffset` type "lossy" in the sense that any original "unresolved" | ||
| /// timezone string gets lost in this conversion. It's a tradeoff for optimizing the row | ||
| /// representation and simplifying the client code, which does not need to know how to convert | ||
| /// from timezone string to its corresponding offset in minutes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you gq the paragraphs on vim? Or the equivalent on your editor? Keeping them around 70 characters with uniform length.
Which issue does this PR close?
Implement
arrow.timestamp_with_offsetcanonical extension type.Rationale for this change
Be compliant with Arrow spec: apache/arrow#48002
What changes are included in this PR?
This commit adds a new
TimestampWithOffsetextension type. This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes.Are these changes tested?
Yes.
Are there any user-facing changes?
Yes, this is a new canonical extension type.