fix: preserve literal HTML entities #3

Merged
code-yeongyu merged 1 commit into main from code-yeongyu/webfetch-entity-preservation
Jun 25, 2026

code-yeongyu (Owner) commented Jun 25, 2026

Summary

Fixes a follow-up content-fidelity issue in the reader-mode normalizer. The DOM and Turndown paths already decode HTML entities once, so normalization must not decode a second time and turn literal entity examples such as &lt;custom-element&gt; into active-looking tags.

Changes

  • Keep entity decoding only in the raw regex fallback path, where HTML has not been parsed by DOM/Turndown.
  • Add a regression test covering markdown and text extraction for literal entity examples.
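
A minimal sketch of the double-decode problem described above. The `decodeEntitiesOnce` helper and the sample string are hypothetical and are not taken from this PR's code; the first call only stands in for the decode that a DOM parser or Turndown already performs.

```typescript
// Hypothetical helper (not this repository's function): decodes a small set
// of named/numeric entities in a single left-to-right pass.
const ENTITIES: Record<string, string> = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
};

function decodeEntitiesOnce(input: string): string {
  // One regex pass: "&amp;lt;" becomes "&lt;" and is not scanned again.
  return input.replace(/&(?:amp|lt|gt|quot|#39);/g, (m) => ENTITIES[m] ?? m);
}

// A page that documents the entities themselves escapes the ampersand.
const sourceHtml = "AT&amp;amp;T uses &amp;lt;custom-element&amp;gt;";

// Stand-in for what DOM parsing yields: one decode layer removed.
const afterDomParse = decodeEntitiesOnce(sourceHtml);

// Decoding again in normalization loses the literal entity text.
const doubleDecoded = decodeEntitiesOnce(afterDomParse);

console.log(afterDomParse); // AT&amp;T uses &lt;custom-element&gt;
console.log(doubleDecoded); // AT&T uses <custom-element>
```

The first output is what the reader should see; the second is what an extra decode in the normalizer would produce.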

QA / Evidence

  • RED: npx vitest --run test/webfetch.test.ts failed before the fix on literal entity preservation. Artifact: /Users/yeongyu/local-workspaces/senpi/.omo/evidence/webfetch-tistory-worktree-pr/external-entity-red.txt.
  • GREEN: npx vitest --run test/webfetch.test.ts passed 16 tests and npm run check passed. Artifact: /Users/yeongyu/local-workspaces/senpi/.omo/evidence/webfetch-tistory-worktree-pr/external-entity-green.txt.

Risks

  • The raw fallback still decodes once after tag stripping, preserving existing behavior for simple entities in non-DOM fallback scenarios.

Secret safety

No secret-bearing logs, tokens, auth headers, cookies, or credentials are included.


Summary by cubic

Fixes double-decoding of HTML entities in reader-mode normalization so literals like &lt;custom-element&gt; and AT&amp;T stay literal in markdown and text outputs.

  • Bug Fixes
    • Remove entity decoding from markdown/plain-text normalization; decode only in the raw fallback path.
    • Normalize whitespace separately in the fallback, then decode once after tag stripping.
    • Add regression test covering markdown and text to ensure one decode layer is preserved.
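
The fallback ordering in the bullets above might look roughly like the following. This is a sketch under assumptions: the function names, the naive tag-stripping regex, and the placement of whitespace normalization between stripping and decoding are illustrative and not confirmed by the PR.

```typescript
// Hypothetical raw fallback for HTML that never went through DOM/Turndown.
function decodeBasicEntities(input: string): string {
  const table: Record<string, string> = {
    amp: "&",
    lt: "<",
    gt: ">",
    quot: '"',
    "#39": "'",
  };
  // Single pass, so each entity is decoded exactly one layer.
  return input.replace(/&(amp|lt|gt|quot|#39);/g, (_m, name: string) => table[name]);
}

function normalizeWhitespace(input: string): string {
  // Whitespace handling only; deliberately no entity decoding here.
  return input.replace(/[ \t\r\n]+/g, " ").trim();
}

function rawFallbackText(html: string): string {
  const stripped = html.replace(/<[^>]*>/g, " ");   // 1. strip tags (naive regex)
  const normalized = normalizeWhitespace(stripped); // 2. collapse whitespace
  return decodeBasicEntities(normalized);           // 3. decode once, last
}

const fallbackSample = rawFallbackText("<p>AT&amp;T  sells\n<b>&lt;custom-element&gt;</b></p>");
console.log(fallbackSample); // AT&T sells <custom-element>
```

Decoding last matters in this sketch: if entities were decoded before tag stripping, the resulting `<custom-element>` text would be removed as if it were a real tag.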

Written for commit 971ac6b. Summary will update on new commits.

code-yeongyu merged commit 1757b3e into main on Jun 25, 2026
6 checks passed
code-yeongyu deleted the code-yeongyu/webfetch-entity-preservation branch on June 25, 2026, 02:41