fix: skip readability after explicit article match #4

Merged
code-yeongyu merged 1 commit into main from code-yeongyu/webfetch-entity-preservation
Jun 25, 2026

@code-yeongyu commented Jun 25, 2026

Summary

An explicit article container that matches is now returned immediately, without invoking Mozilla Readability. This preserves the Tistory-style article body that the direct selector already identified, even when Readability would throw or select surrounding page chrome.

Changes

  • Return the explicit article candidate before invoking Readability.parse().
  • Add a regression test that mocks Readability to throw and verifies explicit article conversion still succeeds.
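
The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `extractArticle`, `Extraction`, `MIN_TEXT_LENGTH` (and its value of 200), and the injected `readabilityFallback` callback are all hypothetical names chosen for the example, and the fallback stands in for the real `Readability.parse()` call.

```typescript
// Hypothetical sketch: names, types, and the threshold value are assumptions,
// not taken from the project's source.
type Extraction = { source: "explicit" | "readability"; text: string };

// Assumed minimum text length for an explicit candidate to count as a match.
const MIN_TEXT_LENGTH = 200;

function extractArticle(
  explicitText: string | null,
  readabilityFallback: () => string | null,
): Extraction | null {
  const candidate = (explicitText ?? "").trim();

  // Short-circuit: a sufficiently long explicit match is returned as-is,
  // so the fallback is never called and cannot throw or mis-select.
  if (candidate.length >= MIN_TEXT_LENGTH) {
    return { source: "explicit", text: candidate };
  }

  // Generic pages still go through the fallback (Readability in the real code).
  const parsed = readabilityFallback();
  return parsed ? { source: "readability", text: parsed } : null;
}

// Usage mirroring the regression scenario: the fallback throws, but the
// explicit candidate is long enough, so extraction still succeeds.
const throwingFallback = (): string | null => {
  throw new Error("Readability failed");
};
const result = extractArticle("x".repeat(300), throwingFallback);
console.log(result?.source); // prints "explicit"
```

Passing the fallback in as a callback is only a convenience for this sketch: it makes the "Readability throws" case easy to exercise without a module mock, whereas the actual regression test mocks @mozilla/readability directly.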

QA / Evidence

  • npm run check: typecheck and Biome passed.
    • Artifact: /Users/yeongyu/local-workspaces/senpi/.omo/evidence/webfetch-tistory-worktree-pr/external-npm-run-check-after-explicit.txt
  • npm test: full Vitest suite passed, 4 files / 35 tests.
    • Artifact: /Users/yeongyu/local-workspaces/senpi/.omo/evidence/webfetch-tistory-worktree-pr/external-vitest-full-after-explicit.txt
  • Focused regression: npm test -- test/webfetch-explicit-article.test.ts passed.
    • Artifact: /Users/yeongyu/local-workspaces/senpi/.omo/evidence/webfetch-tistory-worktree-pr/external-explicit-article-green.txt

Risks

Low. The change only short-circuits after an existing explicit article selector has already passed the minimum text threshold; generic pages still use the existing Readability fallback.

Secret safety

Evidence contains command output only. No tokens, auth headers, cookies, or credential-bearing logs are included.


Summary by cubic

Return explicit article containers before running @mozilla/readability to preserve Tistory-style article bodies and avoid failures when Readability mis-parses or throws. Fixes cases where page chrome was selected or parsing failed.

  • Bug Fixes
    • Short-circuit to the explicit article when it meets the text threshold; skip Readability.parse().
    • Add a regression test that mocks @mozilla/readability and verifies skipping while producing the expected markdown.

Written for commit a5a65a7. Summary will update on new commits.

@code-yeongyu merged commit ce0d63a into main on Jun 25, 2026
6 checks passed
@code-yeongyu deleted the code-yeongyu/webfetch-entity-preservation branch on June 25, 2026, 02:58