[CASSANDRA-21349] Fix backslash doubling in COPY TO/FROM round-trip on Python 3.10+ #4780
Open
howiezhao wants to merge 1 commit into apache:trunk from
Contributor: Do you have a Jira ticket for this?
Contributor: @howiezhao can you write test cases? Refer to test_copyutil.py.
Author: Hi @omniCoder77
Nope, should I create one?
Thanks for the reminder, I've already added it.
Contributor: @howiezhao after creating the ticket, update the PR description to include the link to the Jira ticket.
Jira Link: https://issues.apache.org/jira/browse/CASSANDRA-21349
Issue
`COPY TO` followed by `COPY FROM` corrupts string values containing backslashes on Python 3.10+: the backslash count doubles on every round-trip. For example, `https:\/\/google.com` becomes `https:\\/\\/google.com` after one cycle.
RC
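Before tracing the two layers, the csv layer can be checked on its own. The following is a minimal sketch using only the standard library. It assumes a dialect with `escapechar='\\'`, the default `"` quotechar and `QUOTE_MINIMAL`, which approximates but may not exactly match the options cqlsh passes. `csv_encode` and `csv_decode` are hypothetical helper names. On Python 3.10+ the encoded line should contain twice as many backslashes as the input, and `csv.reader` should restore the original value, so csv by itself round-trips cleanly.

```python
import csv
import io
import sys


def csv_encode(value: str) -> str:
    """Serialize a single field with a backslash escapechar (hypothetical helper)."""
    buf = io.StringIO()
    # Assumed dialect: default quotechar '"', QUOTE_MINIMAL, escapechar '\'.
    writer = csv.writer(buf, escapechar="\\", lineterminator="\n")
    writer.writerow([value])
    return buf.getvalue()


def csv_decode(line: str) -> str:
    """Parse the field back with matching csv.reader settings (hypothetical helper)."""
    return next(csv.reader(io.StringIO(line), escapechar="\\"))[0]


value = "https:\\/\\/google.com"  # the text https:\/\/google.com
encoded = csv_encode(value)

# On 3.10+ csv.writer escapes each backslash itself, so the encoded line
# should hold twice as many backslashes, and csv.reader removes exactly that layer.
print(sys.version_info[:2], repr(encoded), repr(csv_decode(encoded)))
```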
There are two independent backslash-escaping layers in the COPY pipeline:
- `format_value_text` (pre-doubling, always present): doubles every `\` → `\\` before passing to `csv.writer`.
- `csv.writer` escapechar handling (Python-version-dependent): Python 3.10 fixed bpo-12178, making `csv.writer` properly escape `\` → `\\` in all fields.

Before Python 3.10, `csv.writer` did not escape bare backslashes in unquoted fields, so the pre-doubling from `format_value_text` was exactly cancelled by `csv.reader` on import. On Python 3.10 and later, both layers apply, but `csv.reader` removes only one, leaving the value with doubled backslashes.
Fix
In `ExportProcess.format_value`, undo `format_value_text`'s pre-doubling on Python 3.10+ before handing the value to `csv.writer`, making `csv.writer` the sole escaping layer.
No change for Python < 3.10.
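The behavior before and after the change can be modeled end to end with the standard library. This is a sketch under stated assumptions, not the PR's diff. `pre_double`, `undo_pre_double`, `export_field`, `import_field` and `round_trip` are hypothetical helpers. `pre_double` assumes `format_value_text` doubles every backslash and does nothing else relevant here, and the csv dialect only approximates cqlsh's defaults. On Python 3.10+ the unfixed path should double the backslash count on each cycle, while the fixed path should return the original value every time.

```python
import csv
import io
import sys

BACKSLASH = "\\"


def pre_double(value: str) -> str:
    """Hypothetical stand-in for format_value_text: every backslash is doubled."""
    return value.replace(BACKSLASH, BACKSLASH * 2)


def undo_pre_double(text: str) -> str:
    """Reverse pre_double; safe because its output holds only backslash pairs."""
    return text.replace(BACKSLASH * 2, BACKSLASH)


def export_field(value: str, fixed: bool) -> str:
    """Simplified export: pre-double, optionally undo it on 3.10+, then csv.writer."""
    text = pre_double(value)
    if fixed and sys.version_info >= (3, 10):
        # csv.writer already escapes backslashes here, so drop the extra layer.
        text = undo_pre_double(text)
    buf = io.StringIO()
    csv.writer(buf, escapechar=BACKSLASH, lineterminator="\n").writerow([text])
    return buf.getvalue()


def import_field(line: str) -> str:
    """Simplified import: csv.reader strips exactly one escaping layer."""
    return next(csv.reader(io.StringIO(line), escapechar=BACKSLASH))[0]


def round_trip(value: str, cycles: int, fixed: bool) -> str:
    """Run `cycles` COPY TO / COPY FROM cycles and return the resulting value."""
    for _ in range(cycles):
        value = import_field(export_field(value, fixed))
    return value


original = "https:\\/\\/google.com"  # the text https:\/\/google.com
for cycles in (1, 2, 3):
    print(cycles, round_trip(original, cycles, fixed=False),
          round_trip(original, cycles, fixed=True))
```

Undoing the pre-doubling with a plain `str.replace` on backslash pairs is only safe because, under the assumption above, the pre-doubled text consists entirely of such pairs; the actual change in `ExportProcess.format_value` may be written differently.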