[CASSANDRA-21349] Fix backslash doubling in COPY TO/FROM round-trip on Python 3.10+ #4780

Open
howiezhao wants to merge 1 commit into apache:trunk from howiezhao:patch-1

@howiezhao howiezhao commented Apr 30, 2026

Jira Link: https://issues.apache.org/jira/browse/CASSANDRA-21349

Issue

COPY TO followed by COPY FROM corrupts string values containing backslashes on Python 3.10+: the backslash count doubles on every round-trip. For example, https:\/\/google.com becomes https:\\/\\/google.com after one cycle.

Root cause

There are two independent backslash-escaping layers in the COPY pipeline:

  1. format_value_text (pre-doubling, always present): doubles every backslash (\ becomes \\) before passing the value to csv.writer
  2. csv.writer escapechar handling (Python-version-dependent): Python 3.10 fixed bpo-12178, so csv.writer now properly escapes the escapechar (\) in all fields

Before Python 3.10, csv.writer did not escape bare backslashes in unquoted fields, so the pre-doubling from format_value_text was exactly cancelled by csv.reader on import. On Python 3.10 and later, both layers apply but csv.reader removes only one, leaving the value with doubled backslashes.
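The interaction of the two layers can be reproduced outside cqlsh with the standard csv module. The sketch below is only an approximation: `pre_double`, `write_field`, `read_field`, and `CSV_OPTS` are hypothetical names, and the dialect options are assumed to resemble a backslash-escaped export rather than copied from the cqlsh source.

```python
import csv
import io
import sys

# Assumed dialect options (not taken from cqlsh): backslash as escapechar,
# no quote doubling.
CSV_OPTS = dict(delimiter=",", quotechar='"', escapechar="\\", doublequote=False)


def pre_double(value: str) -> str:
    """Stand-in for the pre-doubling layer: every backslash becomes two."""
    return value.replace("\\", "\\\\")


def write_field(value: str) -> str:
    """Serialize a single field with csv.writer and return the raw CSV text."""
    buf = io.StringIO(newline="")
    csv.writer(buf, **CSV_OPTS).writerow([value])
    return buf.getvalue().rstrip("\r\n")


def read_field(line: str) -> str:
    """Parse a single-field CSV line back into its value with csv.reader."""
    return next(csv.reader(io.StringIO(line), **CSV_OPTS))[0]


def round_trip(value: str, with_pre_doubling: bool) -> str:
    """Export then import one value, optionally applying the pre-doubling layer."""
    exported = pre_double(value) if with_pre_doubling else value
    return read_field(write_field(exported))


if __name__ == "__main__":
    original = "https:\\/\\/google.com"  # the text https:\/\/google.com
    print("python version :", sys.version_info[:2])
    print("original       :", original)
    print("pre-doubled    :", round_trip(original, with_pre_doubling=True))
    print("writer only    :", round_trip(original, with_pre_doubling=False))
```

On Python 3.10 or later, the pre-doubled round trip should come back with twice as many backslashes as the original, while the writer-only round trip should return the value unchanged.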

Fix

In ExportProcess.format_value, undo format_value_text's pre-doubling on Python 3.10+ before handing the value to csv.writer, making csv.writer the sole escaping layer:

if sys.version_info >= (3, 10):
    formatted = formatted.replace('\\\\', '\\')

No change for Python < 3.10.
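A minimal sketch of how the version guard restores the round trip, under stated assumptions: `format_for_export` is a hypothetical stand-in for the export formatting step (it is not the real ExportProcess.format_value), `pre_double` imitates the pre-doubling layer, and the dialect options are assumed rather than taken from cqlsh. Collapsing each pair of backslashes is an exact inverse here because pre-doubling only ever produces even-length runs.

```python
import csv
import io
import sys

# Assumed dialect options (not taken from cqlsh).
CSV_OPTS = dict(delimiter=",", quotechar='"', escapechar="\\", doublequote=False)


def pre_double(value: str) -> str:
    """Stand-in for the pre-doubling layer: every backslash becomes two."""
    return value.replace("\\", "\\\\")


def format_for_export(value: str) -> str:
    """Hypothetical export step: pre-double, then undo it on Python 3.10+."""
    formatted = pre_double(value)
    if sys.version_info >= (3, 10):
        # csv.writer escapes backslashes itself on 3.10+, so collapse each
        # doubled backslash back to one and let the writer be the only layer.
        formatted = formatted.replace("\\\\", "\\")
    return formatted


def round_trip(value: str) -> str:
    """Write the formatted value with csv.writer and read it back."""
    buf = io.StringIO(newline="")
    csv.writer(buf, **CSV_OPTS).writerow([format_for_export(value)])
    buf.seek(0)
    return next(csv.reader(buf, **CSV_OPTS))[0]


if __name__ == "__main__":
    for sample in ["https:\\/\\/google.com", "C:\\temp\\file", "plain"]:
        print(repr(sample), "->", repr(round_trip(sample)))
```

A test along these lines would compare each sample with its round-tripped value and expect them to be equal.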

@omniCoder77
Contributor

Do you have a jira for this?

@omniCoder77
Contributor

@howiezhao can you write test cases? refer to test_copyutil.py

@howiezhao
Author

Hi @omniCoder77,

> Do you have a jira for this?

Nope, should I create one?

> @howiezhao can you write test cases? refer to test_copyutil.py

Thanks for the reminder, I've already added it.

@omniCoder77
Contributor

@howiezhao after ticket creation, update the PR description to include a link to the Jira ticket

@howiezhao howiezhao changed the title from "Fix backslash doubling in COPY TO/FROM round-trip on Python 3.10+" to "[CASSANDRA-21349] Fix backslash doubling in COPY TO/FROM round-trip on Python 3.10+" on May 4, 2026