Skip to content

fix(fetch): reject binary responses and harden HTML sniffing#2690

Open
amitksingh1490 wants to merge 5 commits intomainfrom
cp-9246df63b
Open

fix(fetch): reject binary responses and harden HTML sniffing#2690
amitksingh1490 wants to merge 5 commits intomainfrom
cp-9246df63b

Conversation

@amitksingh1490
Copy link
Contributor

@amitksingh1490 amitksingh1490 commented Mar 25, 2026

Summary

Reject binary responses in fetch and harden HTML sniffing to avoid UTF-8 boundary panics.

Context

The fetch tool is designed for text-based web content. Binary endpoints (for example archive downloads) can produce unreadable output and confusing failures when treated as text. The initial HTML sniffing logic also sliced by byte index, which could panic on multibyte UTF-8 boundaries.

Changes

  • Added binary content-type detection before response body parsing
  • Returned a clear, actionable error for binary responses with a suggested curl -fLo command
  • Made HTML sniffing UTF-8-safe by selecting a valid character boundary before slicing
  • Added unit tests for text allowlist, binary detection, and case-insensitive content types
  • Updated the net fetch tool description to explicitly document binary-content behavior and fallback guidance

Key Implementation Details

Binary detection is implemented via is_binary_content_type, which allowlists text-oriented content types (text/*, JSON, XML, JavaScript, YAML, TOML, CSV, HTML, SVG, Markdown, and empty content-type) and treats all others as binary.

Use Cases

  • Fetching a .tar.gz URL now fails fast with clear guidance instead of trying to parse binary bytes as text
  • Responses containing multibyte UTF-8 characters near the sniffing boundary no longer risk panics
  • Regular text endpoints (HTML, JSON, XML) continue to work as expected

Testing

cargo test -p forge_services test_is_binary_content_type -- --nocapture

Links

  • Related issues: N/A

}

/// Returns true if the Content-Type header indicates binary (non-text) content.
fn is_binary_content_type(content_type: &str) -> bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a util function we created to detect binary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok let me find it don't think it was based on contect-type header

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no binary detection on http-headers, its on file system. I have moved this to util as we don't want to download and then fail we should fail fast.

@amitksingh1490 amitksingh1490 changed the title fix(fetch): enhance binary content detection and error handling fix(fetch): reject binary responses and harden HTML sniffing Mar 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants