
fix: preserve Tistory article bodies #2

Merged
code-yeongyu merged 1 commit into main from code-yeongyu/tistory-reader-mode
Jun 25, 2026

code-yeongyu (Owner) commented Jun 25, 2026

Summary

Fixes the reader-mode extraction path for Tistory-style posts, where Mozilla Readability can pick the surrounding blog chrome instead of the article body. The fetcher now also sends browser-shaped navigation headers and no longer falls back to a bot-like retry identity.

Changes

  • Prefer explicit article containers used by Tistory and common blog themes, while stripping related-post/sidebar/comment/ad chrome from the cloned article node.
  • Preserve readable line breaks by converting HTML fragments through a DOM pass before falling back to regex cleanup.
  • Use undici.request with browser navigation headers, manual redirect handling, and bounded body reads, and do not retry Cloudflare challenges with the pi-webfetch user agent.
  • Add regression tests for Tistory wrappers, newline preservation, navigation headers, Cloudflare no-retry behavior, and redirect-limit body handling.
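
The manual redirect handling described in the list above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the code in this PR: `nextStep`, `Step`, `NAVIGATION_HEADERS`, the header values, and the default limit of 5 hops are all hypothetical names and numbers chosen for the example.

```typescript
// Illustrative browser-style navigation headers. The exact set and values
// used by the PR are not shown in this description; these are assumptions.
const NAVIGATION_HEADERS: Record<string, string> = {
  accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
  "accept-language": "en-US,en;q=0.9",
  "sec-fetch-dest": "document",
  "sec-fetch-mode": "navigate",
  "sec-fetch-site": "none",
  "upgrade-insecure-requests": "1",
};

// HTTP status codes that carry a Location header to follow.
const REDIRECT_STATUSES = new Set([301, 302, 303, 307, 308]);

// What the fetch loop should do after seeing one response.
type Step =
  | { kind: "follow"; url: string }
  | { kind: "done" }
  | { kind: "redirect-limit" };

// Hypothetical helper: decide the next step from a single response.
// `hopsTaken` is the number of redirects already followed.
function nextStep(
  currentUrl: string,
  statusCode: number,
  location: string | undefined,
  hopsTaken: number,
  maxRedirects: number = 5,
): Step {
  if (!REDIRECT_STATUSES.has(statusCode) || !location) {
    return { kind: "done" };
  }
  if (hopsTaken >= maxRedirects) {
    return { kind: "redirect-limit" };
  }
  // Resolve relative Location values against the URL that produced them.
  return { kind: "follow", url: new URL(location, currentUrl).toString() };
}
```

A fetch loop would pass each response's status code and `location` header to `nextStep`, re-send `NAVIGATION_HEADERS` on every `follow`, and stop on `done` or `redirect-limit`. Keeping the decision separate from the transport makes the redirect limit easy to cover in a unit test without network access.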

QA / Evidence

  • npx vitest --run test/webfetch.test.ts: 15 tests passed. Artifact: /Users/yeongyu/local-workspaces/senpi/.omo/evidence/webfetch-tistory-worktree-pr/external-verification-after-discard-fix.txt.
  • npm run check: tsgo --noEmit && biome check . passed. Artifact: /Users/yeongyu/local-workspaces/senpi/.omo/evidence/webfetch-tistory-worktree-pr/external-verification-after-discard-fix.txt.
  • A public Tistory smoke test was captured earlier in the ULW evidence set and confirmed that the extracted text included the article content and excluded another_category. Artifact: /Users/yeongyu/local-workspaces/senpi/.omo/evidence/webfetch-tistory-worktree-pr/external-live-tistory.txt.

Risks

  • The explicit article selector path is intentionally conservative: it only wins when the cloned article text is at least 30 characters; otherwise Readability remains the fallback.
  • Browser headers may need periodic UA refresh later, but this PR keeps the header shape covered by tests.
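
The 30-character rule in the first risk above amounts to a simple selection step. The sketch below is a hypothetical illustration, not the PR's implementation: `chooseArticleText`, `Extraction`, and the callback-style fallback are invented for the example, and measuring the length after trimming whitespace is an assumption.

```typescript
// Threshold taken from the PR description: explicit selectors only win
// when the extracted text is at least this long.
const MIN_ARTICLE_CHARS = 30;

interface Extraction {
  source: "explicit-selector" | "readability";
  text: string;
}

// Hypothetical helper. `explicitCandidates` holds the text of each
// explicit article container that matched (null when a selector missed);
// `readabilityFallback` stands in for a call into Mozilla Readability.
function chooseArticleText(
  explicitCandidates: Array<string | null>,
  readabilityFallback: () => string | null,
  minChars: number = MIN_ARTICLE_CHARS,
): Extraction | null {
  for (const candidate of explicitCandidates) {
    // Assumption: length is measured after trimming whitespace.
    const text = (candidate ?? "").trim();
    if (text.length >= minChars) {
      return { source: "explicit-selector", text };
    }
  }
  // No explicit container was long enough, so defer to Readability.
  const fallback = readabilityFallback();
  return fallback ? { source: "readability", text: fallback } : null;
}
```

With this shape, a short or empty wrapper never shadows Readability, which matches the conservative behavior the risk note describes.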

Secret safety

No secret-bearing logs, tokens, auth headers, cookies, or credentials are included in the evidence or PR body.


Summary by cubic

Fixes Tistory reader-mode by preferring real article containers and preserving line breaks, so posts no longer pick sidebar/category chrome. Also switches to browser-shaped fetching via undici.request for more stable loads and safer redirects.

  • Bug Fixes
    • Prefer known Tistory/common blog selectors; strip sidebars, comments, ads; fall back to Readability only if needed.
    • Preserve readable line breaks using a DOM pass for text and normalize markdown spacing.
    • Use browser navigation headers with undici.request, manual redirect handling, bounded body reads, and no bot-identity retry on Cloudflare challenges.
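
The line-break preservation mentioned in the list above could work roughly as follows. This is a sketch only: `MiniNode` is a hypothetical stand-in for a real DOM node so the example runs without a DOM library, and the block-tag list and spacing rules are assumptions rather than what the PR ships.

```typescript
// Hypothetical stand-in for a DOM node; a real pass would walk an actual DOM tree.
interface MiniNode {
  tag?: string; // lowercase element name; undefined for text nodes
  text?: string; // content of a text node
  children?: MiniNode[];
}

// Elements treated as block-level for the purpose of inserting line breaks (assumed list).
const BLOCK_TAGS = new Set([
  "p", "div", "li", "blockquote", "pre",
  "h1", "h2", "h3", "h4", "h5", "h6",
]);

// Depth-first walk that emits text plus newlines at <br> and block boundaries.
function collectText(node: MiniNode, out: string[]): void {
  if (node.tag === undefined) {
    out.push(node.text ?? "");
    return;
  }
  if (node.tag === "br") {
    out.push("\n");
    return;
  }
  const isBlock = BLOCK_TAGS.has(node.tag);
  if (isBlock) out.push("\n");
  for (const child of node.children ?? []) {
    collectText(child, out);
  }
  if (isBlock) out.push("\n");
}

// Convert a node tree to text, then normalize spacing.
function toReadableText(root: MiniNode): string {
  const parts: string[] = [];
  collectText(root, parts);
  return parts
    .join("")
    .replace(/[ \t]+\n/g, "\n") // drop trailing spaces before a newline
    .replace(/\n{3,}/g, "\n\n") // collapse runs of blank lines to one
    .trim();
}
```

Walking the tree keeps paragraph and `<br>` boundaries that a tag-stripping regex would flatten, which is why a regex cleanup is better suited as the fallback than as the first pass.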

Written for commit 83266e8. Summary will update on new commits.


@code-yeongyu code-yeongyu merged commit c65eb2d into main Jun 25, 2026
6 checks passed
@code-yeongyu code-yeongyu deleted the code-yeongyu/tistory-reader-mode branch June 25, 2026 02:27