Fix precision loss for large CQL timestamp values #636
Conversation
Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
@copilot please complete the work on the PR
The PR is now complete. All changes have been implemented and tested.

Changes made (commit fbab5cd):
Testing results:
The fix ensures that timestamps like "2300-01-01 00:00:00.001" can be correctly round-tripped through serialization/deserialization without losing precision.
cassandra/util.py
Outdated
```python
# Handle negative timestamps correctly
if remainder_ms < 0:
    remainder_ms += 1000
    timestamp_seconds -= 1
```
@copilot When can remainder be negative? Even for negative values it is positive:

```python
>>> -2500 % 1000
500
```
You're absolutely right! Python's modulo operator always returns a non-negative result when the divisor is positive. The negative check was unnecessary. Removed in commit 93ad0f2.
Python's modulo operator always returns a non-negative result when the divisor is positive, regardless of the dividend's sign. The check for remainder_ms < 0 is therefore unnecessary in both the Python and Cython implementations. Co-authored-by: Lorak-mmk <5958117+Lorak-mmk@users.noreply.github.com>
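A quick sketch of why that guard was dead code: Python's `//` and `%` use floor-division semantics, so with a positive divisor the remainder always lands in `[0, divisor)` and the divmod identity still holds for negative timestamps.

```python
# With a positive divisor, Python's remainder is always non-negative,
# so no manual borrow from the seconds component is ever needed.
for ts_ms in (-2500, -1, 0, 1, 2500):
    seconds, remainder = ts_ms // 1000, ts_ms % 1000
    assert 0 <= remainder < 1000
    assert seconds * 1000 + remainder == ts_ms  # divmod identity holds

print(-2500 // 1000, -2500 % 1000)  # -3 500
```

This is why the pre-1970 test cases still round-trip correctly without any special-casing.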
```cython
from libc.stdint cimport int64_t

cdef datetime_from_timestamp(double timestamp)
```
@copilot Is datetime_from_timestamp used anywhere after you added datetime_from_timestamp_ms? Please check, and if it isn't - remove it.
datetime_from_timestamp is still used in util.py by datetime_from_uuid1() and the Date class methods. It accepts seconds (not milliseconds) and cannot be removed. However, the unused import in deserializers.pyx was removed in commit 38911a3.
```python
def datetime_from_timestamp_ms(timestamp_ms):
    """
    Creates a timezone-agnostic datetime from a unix timestamp in milliseconds.
    Avoids floating-point conversion to maintain precision for large timestamps.

    Works around precision loss issues with large timestamps (far from epoch)
    by using integer arithmetic throughout.

    :param timestamp_ms: a unix timestamp, in milliseconds (as integer)
    """
    # Break down milliseconds into components to avoid float conversion.
    # Python's % operator always returns a non-negative result for a
    # positive divisor.
    timestamp_seconds = timestamp_ms // 1000
    remainder_ms = timestamp_ms % 1000
    microseconds = remainder_ms * 1000
    dt = DATETIME_EPOC + datetime.timedelta(seconds=timestamp_seconds, microseconds=microseconds)
    return dt
```
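To see the function in action, here is a self-contained sketch (inlining DATETIME_EPOC as the naive Unix epoch, which is how cassandra.util defines it) that round-trips a far-future millisecond timestamp exactly:

```python
import datetime

DATETIME_EPOC = datetime.datetime(1970, 1, 1)  # naive UTC epoch, as in cassandra.util

def datetime_from_timestamp_ms(timestamp_ms):
    # Integer-only arithmetic: no double conversion, no precision loss
    timestamp_seconds = timestamp_ms // 1000
    microseconds = (timestamp_ms % 1000) * 1000
    return DATETIME_EPOC + datetime.timedelta(seconds=timestamp_seconds,
                                              microseconds=microseconds)

ts_ms = 10413792000001  # 2300-01-01 00:00:00.001 UTC, in ms since epoch
dt = datetime_from_timestamp_ms(ts_ms)

# Round-trip back to milliseconds from the exact timedelta components
delta = dt - DATETIME_EPOC
roundtrip_ms = (delta.days * 86400 + delta.seconds) * 1000 + delta.microseconds // 1000
assert roundtrip_ms == ts_ms
print(dt)  # 2300-01-01 00:00:00.001000
```

Because both directions use only integer division and modulo, the round trip is exact for any int64 millisecond value.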
@copilot ditto, is datetime_from_timestamp above used anywhere?
Yes, datetime_from_timestamp is still used in util.py by datetime_from_uuid1() (line 124) and Date.date() / Date.__str__() methods (lines 979, 1021). It handles seconds-based timestamps, while the new datetime_from_timestamp_ms handles milliseconds-based timestamps for DateType serialization.
- Removed unused import of datetime_from_timestamp from deserializers.pyx (datetime_from_timestamp is still used in util.py by UUID and Date utilities)
- Added integration test for large timestamp precision in test_types.py that verifies round-trip identity for timestamps far from epoch (years 2300, 1640, 2500)

Co-authored-by: Lorak-mmk <5958117+Lorak-mmk@users.noreply.github.com>
Added integration test in commit 38911a3. The test verifies round-trip identity for large timestamps (years 2300, 1640, 2500) on a real cluster, ensuring millisecond precision is maintained through serialization and deserialization.
Pre-review checklist
./docs/source/

Description
CQL timestamp values (signed milliseconds from epoch) lose precision when they are further than ~300 years from the Unix epoch due to floating-point conversions during serialization/deserialization. This causes issues where a timestamp retrieved from the database cannot be used to query for the same row.

Problem
The driver was converting int64 milliseconds to floating-point values during serialization/deserialization, causing precision loss for timestamps more than ~300 years from the Unix epoch (beyond roughly 2^53 microseconds, a double can no longer exactly represent every microsecond, so the millisecond component can round incorrectly).
Example scenario:
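The scenario can be reproduced with a minimal sketch of the old float-based conversion (a simplification for illustration, not the driver's exact code):

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)
ts_ms = 10413792000001  # 2300-01-01 00:00:00.001 UTC, in ms since epoch

# Old float path: milliseconds -> float seconds -> timedelta.
# ts_ms / 1e3 cannot be represented exactly as a double at this magnitude.
dt_float = EPOCH + datetime.timedelta(seconds=ts_ms / 1e3)

# Integer path: split into whole seconds and exact microseconds
dt_int = EPOCH + datetime.timedelta(seconds=ts_ms // 1000,
                                    microseconds=(ts_ms % 1000) * 1000)

print(dt_int)              # 2300-01-01 00:00:00.001000
print(dt_float == dt_int)  # False: the float path drifts by ~1 microsecond
```

That microsecond of drift is enough to flip the stored millisecond, so the value read back no longer matches the row it came from.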
Root Causes
Float operations (*1e3, /1e3) to calculate milliseconds

Solution
Implemented integer-only arithmetic throughout the timestamp conversion path:
Added new conversion functions:
- datetime_from_timestamp_ms() in cassandra/util.py (Python path)
- datetime_from_timestamp_ms() in cassandra/cython_utils.pyx (Cython path)
- The datetime_from_timestamp() function is preserved as it's still used by UUID utilities (datetime_from_uuid1()) and the Date class for seconds-based conversions

Updated DateType serialization/deserialization:
- cassandra/cqltypes.py: Changed DateType.deserialize() to call the new function directly on milliseconds (no division)
- cassandra/cqltypes.py: Changed DateType.serialize() to use integer operations (*1000, //1000) instead of float operations (*1e3, /1e3)
- cassandra/deserializers.pyx: Updated the Cython deserializer to use the new integer-based function and removed an unused import

Code cleanup:
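As a rough sketch of the integer-only serialize direction (a hypothetical simplification: the driver's DateType also accepts timezone-aware datetimes and raw numeric inputs, which are omitted here):

```python
import calendar
import datetime

def serialize_timestamp_ms(v):
    """Convert a naive-UTC datetime to CQL timestamp milliseconds
    using only integer arithmetic (illustrative helper, not driver code)."""
    # utctimetuple() carries whole seconds only, so the sub-second part
    # is added back from v.microsecond, truncated to whole milliseconds.
    seconds = calendar.timegm(v.utctimetuple())
    return seconds * 1000 + v.microsecond // 1000

dt = datetime.datetime(2300, 1, 1, 0, 0, 0, 1000)
assert serialize_timestamp_ms(dt) == 10413792000001
```

Using `* 1000` and `// 1000` instead of `* 1e3` and `/ 1e3` keeps the entire path in int arithmetic, so no value in the int64 range is ever rounded.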
- Removed the unnecessary negative-remainder check from datetime_from_timestamp_ms() - Python's modulo operator always returns non-negative results for positive divisors
- Removed the unused datetime_from_timestamp import from cassandra/deserializers.pyx

Added comprehensive test coverage:
- tests/unit/test_timestamp_precision.py with 4 unit tests
- test_large_timestamp_precision() in tests/integration/standard/test_types.py that verifies round-trip identity for large timestamps (years 2300, 1640, 2500) on a real cluster

Testing
The changes are minimal and surgical, affecting only the timestamp conversion logic without altering any other functionality.
Original prompt
timestamp lose precision due to some floating-point conversions along the way #532