Add support for datetime, date, and frozenset to FastPrimitivesCoder#38590
Add support for datetime, date, and frozenset to FastPrimitivesCoder#38590mattalbr wants to merge 3 commits into
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the FastPrimitivesCoder by adding native support for datetime, date, and frozenset types. By moving away from the generic pickling fallback for these common types, the changes significantly improve serialization efficiency and reduce the amount of data transmitted over the wire. Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces support for datetime.datetime, datetime.date, and frozenset types in the FastPrimitivesCoder, utilizing RFC 9557 for lossless datetime timezone encoding. It also includes extensive formatting updates, such as quote normalization. Review feedback points out critical compatibility issues with Python 3.8 due to the direct use of the zoneinfo module, which requires conditional imports and availability checks. Additionally, the encoding logic for datetime and date needs to be corrected to properly handle nested contexts by using a single write operation and correctly passing the nested parameter to the stream.
| elif t is datetime.datetime: | ||
| # We use RFC 9557 for lossless encoding of timezone info. | ||
| stream.write_byte(DATETIME_TYPE) | ||
| stream.write(value.isoformat().encode("utf-8")) | ||
| if value.tzinfo is not None and type( | ||
| value.tzinfo) is not datetime.timezone: | ||
| stream.write(f"[{value.tzinfo}]".encode("utf-8")) | ||
| if type( | ||
| value.tzinfo) is datetime.timezone and (tzname := | ||
| value.tzname()) is not None: | ||
| stream.write(f"[tzn={tzname}]".encode("utf-8")) | ||
| if value.fold != 0: | ||
| stream.write(f"[f={value.fold}]".encode("utf-8")) |
There was a problem hiding this comment.
The current implementation of datetime encoding has two issues:
- It performs multiple
stream.writecalls. Ifnested=True, each call would potentially create a separate length-prefixed block (depending on theOutputStreamimplementation), but the decoder uses a singlestream.read_all(nested)call, which would only read the first part. - It omits the
nestedparameter in thestream.writecalls, which will cause a mismatch with the decoder'sread_all(nested)when the coder is used in a nested context (e.g., as an element in a list).
You should construct the full string first and write it once using the nested parameter.
| elif t is datetime.datetime: | |
| # We use RFC 9557 for lossless encoding of timezone info. | |
| stream.write_byte(DATETIME_TYPE) | |
| stream.write(value.isoformat().encode("utf-8")) | |
| if value.tzinfo is not None and type( | |
| value.tzinfo) is not datetime.timezone: | |
| stream.write(f"[{value.tzinfo}]".encode("utf-8")) | |
| if type( | |
| value.tzinfo) is datetime.timezone and (tzname := | |
| value.tzname()) is not None: | |
| stream.write(f"[tzn={tzname}]".encode("utf-8")) | |
| if value.fold != 0: | |
| stream.write(f"[f={value.fold}]".encode("utf-8")) | |
| elif t is datetime.datetime: | |
| # We use RFC 9557 for lossless encoding of timezone info. | |
| stream.write_byte(DATETIME_TYPE) | |
| dt_str = value.isoformat() | |
| if value.tzinfo is not None and type(value.tzinfo) is not datetime.timezone: | |
| dt_str += f"[{value.tzinfo}]" | |
| if type(value.tzinfo) is datetime.timezone and (tzname := value.tzname()) is not None: | |
| dt_str += f"[tzn={tzname}]" | |
| if value.fold != 0: | |
| dt_str += f"[f={value.fold}]" | |
| stream.write(dt_str.encode("utf-8"), nested) |
There was a problem hiding this comment.
nested defaults to False and this matches the style of the other implementations which also omit the nested parameter when not needed.
It's also likely more efficient to write to the stream multiple times which has a byte buffer vs constructing new immutable strings
| elif t is datetime.date: | ||
| stream.write_byte(DATE_TYPE) | ||
| stream.write(value.isoformat().encode("utf-8")) |
There was a problem hiding this comment.
The stream.write call for date is missing the nested parameter. This will cause a mismatch with the decoder's read_all(nested) when this coder is used in a nested context.
| elif t is datetime.date: | |
| stream.write_byte(DATE_TYPE) | |
| stream.write(value.isoformat().encode("utf-8")) | |
| elif t is datetime.date: | |
| stream.write_byte(DATE_TYPE) | |
| stream.write(value.isoformat().encode("utf-8"), nested) |
There was a problem hiding this comment.
nested defaults to False and this matches the style of the other implementations which also omit the nested parameter when not needed
| import sys | ||
| import textwrap | ||
| import unittest | ||
| import zoneinfo |
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
Significantly reduces bytes written to the wire for datetime, date and frozenset which previously would have fallen through to pickling.
Fixes #38549
Sorry for the significant formatting changes, they appear to be a result of the yapf pre-commit. Let me know if you'd like me to keep them or not.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.