Fix _split_cells to handle non-unit width characters correctly #4155

Open
nisha-muthurajan wants to merge 2 commits into Textualize:master from nisha-muthurajan:fix/split-cells-non-unit-chars

Conversation

@nisha-muthurajan

Fixes #3299

The previous implementation used a proportional heuristic to estimate the starting character position, which overshot for multi-cell characters like emoji. Replace with a linear scan that accumulates real cell widths.

Type of changes

  • ✅ Bug fix
  • New feature
  • Documentation / docstrings
  • ✅ Tests
  • Other

AI?

  • AI was used to generate this PR

AI generated PRs may be accepted, but only if @willmcgugan has responded on an issue or discussion.

Checklist

  • I've run the latest black with default args on new code.
  • I've updated CHANGELOG.md and CONTRIBUTORS.md where appropriate (see note about typos above).
  • ✅ I've added tests for new code.
  • ✅ I accept that @willmcgugan may be pedantic in the code review.

Description

Fixes #3299

Segment._split_cells used a proportional heuristic to guess the starting
character position:

pos = int((cut / cell_length) * len(text))

This overshot for multi-cell characters (emoji, CJK) because it assumed all
characters have equal width. The fallback loop then couldn't recover correctly,
producing wrong splits like ('🦊🦊 ', ' abcdef') instead of ('🦊 ', ' abcdef').
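
To make the overshoot concrete, here is a minimal sketch that evaluates only the proportional estimate quoted above. The helper names `approx_cell_len` and `heuristic_start` are hypothetical, the width function is a rough stand-in built on `unicodedata.east_asian_width` rather than Rich's own cell-width logic, and the input string is an assumption chosen for illustration (the exact inputs from the issue are not reproduced here).

```python
import unicodedata


def approx_cell_len(text: str) -> int:
    """Approximate terminal width: 2 cells for Wide/Fullwidth characters, else 1.

    This is an assumption for illustration, not Rich's implementation.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text
    )


def heuristic_start(text: str, cut: int) -> int:
    """The proportional guess from the description: assumes equal-width characters."""
    return int((cut / approx_cell_len(text)) * len(text))


text = "\U0001F98A\U0001F98Aabcdef"  # two fox emoji followed by "abcdef"
cut = 3

pos = heuristic_start(text, cut)
print(len(text), approx_cell_len(text))  # 8 characters, 10 cells
print(pos)                               # 2
print(approx_cell_len(text[:pos]))       # 4, already past the cut at cell 3
```

The estimate lands after both emoji, so the text before the guessed position is already wider than the requested cut.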

Fixed by replacing the heuristic + loop with a simple linear scan that
accumulates real cell widths character by character, stopping precisely at
the cut point.
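
The linear scan can be sketched roughly as follows. This is not the code from the PR: `split_at_cell` and `char_width` are hypothetical names, the function works on plain strings rather than `Segment` objects, and the width lookup is an approximation based on `unicodedata.east_asian_width` that ignores zero-width and combining characters. Padding a straddled double-width character with a space on each side is inferred from the expected tuple quoted above.

```python
import unicodedata
from typing import Tuple


def char_width(ch: str) -> int:
    """Approximate cell width: 2 for East Asian Wide/Fullwidth, else 1 (assumption)."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def split_at_cell(text: str, cut: int) -> Tuple[str, str]:
    """Split text so the left part occupies exactly `cut` cells.

    Walks the string once, accumulating real cell widths. If the cut falls
    in the middle of a double-width character, that character is replaced
    by one space on each side of the split.
    """
    position = 0  # cells consumed so far
    for index, ch in enumerate(text):
        if position == cut:
            # The cut lands exactly on a character boundary.
            return text[:index], text[index:]
        width = char_width(ch)
        if position + width > cut:
            # A double-width character straddles the cut point.
            return text[:index] + " ", " " + text[index + 1:]
        position += width
    # The cut is at or beyond the end of the text.
    return text, ""


fox = "\U0001F98A"
print(split_at_cell(fox + fox + "abcdef", 3))  # ('🦊 ', ' abcdef')
print(split_at_cell(fox + fox + "abcdef", 2))  # ('🦊', '🦊abcdef')
```

The scan is O(n) in the length of the text, trading the heuristic's shortcut for a position that never needs correcting afterwards.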

Added a regression test test_split_cells_emoji covering the two examples
from the issue plus an exact-boundary case.


Development

Successfully merging this pull request may close these issues.

[BUG] Segment._split_cells doesn't handle non-unit characters well