Skip to content

fix(crewai-files): block SSRF to non-public addresses in FileUrl fetch#6268

Open
zied-jlassi wants to merge 3 commits into
crewAIInc:mainfrom
zied-jlassi:fix/crewai-files-ssrf-fileurl
Open

fix(crewai-files): block SSRF to non-public addresses in FileUrl fetch#6268
zied-jlassi wants to merge 3 commits into
crewAIInc:mainfrom
zied-jlassi:fix/crewai-files-ssrf-fileurl

Conversation

@zied-jlassi

@zied-jlassi zied-jlassi commented Jun 20, 2026

Copy link
Copy Markdown

Summary

FileUrl.read() / FileUrl.aread() (in crewai-files) fetch arbitrary http(s) URLs with follow_redirects=True and no network validation. Because the library auto-coerces any "http(s)://…" string into a FileUrl (in _normalize_source, normalize_input_files, and the Pydantic _FileSourceCoercer used by every BaseFile.source), an application that forwards an agent/LLM/user-supplied URL as a file input can be steered into:

  • loopback / private / link-local services (http://127.0.0.1:…, http://10.0.0.5:6379/),
  • cloud metadata (http://169.254.169.254/latest/meta-data/…),
  • or an internal target reached via redirect from a public URL (the previous code followed redirects without re-validating each hop).

This is a Server-Side Request Forgery (CWE-918).

The only prior validation was the URL scheme (_validate_url), which controls the protocol but says nothing about the destination address.

Fix

  • Resolve the URL host and reject non-public addresses before each request — loopback, private, link-local, reserved, multicast, and unspecified ranges.
  • Follow redirects manually (follow_redirects=False) so every hop is re-validated; a public URL can no longer redirect into an internal/metadata address. Capped at _MAX_REDIRECTS = 5.
  • Normalize IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) before classifying, because ipaddress.IPv6Address does not flag the mapped form as loopback/private on its own — a common SSRF-guard bypass.
  • aread() performs the DNS lookup via loop.run_in_executor(...) so it does not block the event loop, and shares the exact same classification logic as read() (no sync/async drift).

No new runtime dependency: only stdlib ipaddress, socket, urllib.parse.

Tests

tests/test_file_url.py, new TestFileUrlSSRF:

  • 7 parametrized blocked addresses incl. 169.254.169.254 and the IPv4-mapped ::ffff:127.0.0.1 bypass — asserts read() raises and never issues the HTTP request;
  • a false-positive guard: a normal public URL still fetches;
  • a public URL that redirects to 169.254.169.254 is blocked;
  • a redirect loop raises Too many redirects;
  • aread() applies the same guard.

Existing fetch tests are kept offline via an autouse mock_public_dns fixture, and test_read_fetches_content was updated to assert the new safe call (follow_redirects=False).

Known limitation (declared, not hidden)

This guard resolves the host and then lets httpx connect, so a determined attacker could still attempt DNS rebinding (TOCTOU) between the check and the connection. Closing that fully requires pinning the validated IP into the transport; I'm happy to follow up with that in a separate PR if you'd prefer it bundled. The common SSRF vectors (direct private/metadata targets, redirect-to-internal, IPv4-mapped) are closed here.

Notes

  • One logical change (SSRF guard on FileUrl fetch).
  • Conventional Commit title; Google-style docstrings; built-in generics; full annotations.
  • Ran the package's own ruff/mypy/pytest expectations locally on the changed module; the full suite should be exercised by CI (uv run pytest lib/crewai-files/tests/).

AI disclosure: AI-assisted audit (audit by AI and Zied Jlassi, Architect AI). This PR carries the llm-generated label per CONTRIBUTING.md.

Summary by CodeRabbit

Release Notes

  • Bug Fixes / Security

    • Strengthened remote file URL fetching with SSRF protections, blocking non-public/internal IP targets after DNS resolution.
    • Improved redirect handling by following redirects manually with strict validation and a maximum redirect limit.
  • Tests

    • Made the test suite offline-safe by mocking DNS resolution.
    • Expanded coverage for SSRF denial, DNS “pinning,” and redirect behavior for both synchronous and asynchronous fetching.
  • Chores

    • Updated lint rule ignores for test files.

FileUrl.read/aread fetched any http(s) URL with follow_redirects=True and
no network validation. Because the library auto-coerces "http(s)://..."
strings into FileUrl, a URL forwarded as a file input could reach loopback,
private, link-local or cloud-metadata addresses (169.254.169.254), or be
redirected into them from a public URL -- a Server-Side Request Forgery
(CWE-918).

Resolve the host and reject non-public addresses before each request, and
follow redirects manually so every hop is re-validated. IPv4-mapped IPv6
addresses are normalized so ::ffff:127.0.0.1 cannot bypass the check. The
async path resolves DNS in the default executor to avoid blocking the loop.

AI-assisted audit (audit by AI and Zied Jlassi). PR labeled llm-generated
per CONTRIBUTING.md.
@coderabbitai

coderabbitai Bot commented Jun 20, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c84ba97d-c8d8-41b5-ae22-606c94514475

📥 Commits

Reviewing files that changed from the base of the PR and between b2f5ad9 and 7952193.

📒 Files selected for processing (2)
  • lib/crewai-files/tests/test_file_url.py
  • pyproject.toml
✅ Files skipped from review due to trivial changes (1)
  • pyproject.toml

📝 Walkthrough

Walkthrough

SSRF protection is added to FileUrl.read() and FileUrl.aread() in sources.py. New helper functions classify IP addresses as non-public and validate URL hostnames via DNS resolution. Both fetch methods now disable httpx automatic redirect-following and instead manually loop over redirects, re-validating each hop's hostname, enforcing a maximum redirect count, and parsing content-type from the final response. Tests are updated with DNS mocking fixtures and a dedicated TestFileUrlSSRF class covering blocked IPs, pinning behavior, public allowance, and redirect exhaustion.

Changes

SSRF Protection for FileUrl Fetching

Layer / File(s) Summary
SSRF helper functions and imports
lib/crewai-files/src/crewai_files/core/sources.py
Adds imports for asyncio, ipaddress, socket, urljoin, urlparse, and expanded typing; introduces _MAX_REDIRECTS and helpers to normalize/classify IPs as blocked, extract URL hosts, reject non-public resolved addresses, and validate hostnames synchronously (socket.getaddrinfo) and asynchronously (run_in_executor).
FileUrl.read()/aread() manual redirect loop
lib/crewai-files/src/crewai_files/core/sources.py
Replaces httpx automatic redirect-following with a manual loop: disables follow_redirects, reads Location header, re-validates each hop via SSRF guard, enforces _MAX_REDIRECTS, raises ValueError on exhaustion, calls raise_for_status(), caches response.content, and parses content-type into _content_type.
Test mocking infrastructure and fixtures
lib/crewai-files/tests/test_file_url.py
Adds _addrinfo and _response helpers plus an autouse fixture patching socket.getaddrinfo to a fixed public IP for offline testing; introduces _sync_client and _async_client mocked httpx client builders supporting context management, request building, and ordered send return values.
Existing test updates to new mocking patterns
lib/crewai-files/tests/test_file_url.py
Rewires test_read_fetches_content, test_aread_fetches_content, test_aread_caches_content, and Bedrock URL resolution test to patch httpx.Client and httpx.AsyncClient with new mocked builders instead of httpx.get, verifying caching and content-type behavior through mocked send responses.
Comprehensive SSRF security test class and config
lib/crewai-files/tests/test_file_url.py, pyproject.toml
Adds TestFileUrlSSRF class with parametrized tests verifying rejection of non-public IPs (loopback, metadata, private ranges, IPv6 variants) without outbound requests; tests DNS pinning (URL rewriting with validated IP, Host/SNI preservation, one-time DNS lookup, explicit port preservation, IPv6 bracketing); allows public addresses; blocks internal redirects; enforces redirect-loop limit using _MAX_REDIRECTS. Updates Ruff config to suppress S104 for test directory.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically identifies the main change: fixing SSRF protection in FileUrl fetch operations by blocking non-public addresses.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@zied-jlassi

Copy link
Copy Markdown
Author

Disclosure: this PR was prepared with AI assistance (audit by AI and Zied Jlassi). Per CONTRIBUTING.md it should carry the llm-generated label, but as an external contributor I don't have permission to add labels — could a maintainer please apply it? Flagging proactively so the AI authorship is on the record.

@corridor-security corridor-security Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Issues

  • Server-Side Request Forgery via DNS rebinding
    The new guard validates the hostname with socket.getaddrinfo() and then lets httpx resolve the hostname again when connecting. An attacker-controlled domain can return a public IP during validation and an internal or metadata IP during the request, preserving SSRF to non-public addresses.

Summary: This PR adds SSRF filtering and redirect re-validation for FileUrl URL fetches, but the fetch path still crosses a DNS trust boundary without binding the validated address to the connection.

Risk: High risk. A public attacker who can provide a FileUrl could use DNS rebinding to reach internal services or cloud metadata despite the new non-public address checks.

Recommendations:

  • Pin the validated IP address for the actual HTTP connection, or use an HTTP transport/resolver that reuses the validated DNS result and rejects any changed or non-public address at connect time.
  • Apply the same pinned-resolution behavior to both read() and aread() redirect hops.

current = self.url
for _ in range(_MAX_REDIRECTS + 1):
_assert_url_allowed(current)
response = httpx.get(current, follow_redirects=False)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The URL is validated by a separate DNS lookup before the request, but httpx.get(current, ...) resolves current again when opening the connection:

_assert_url_allowed(current)
response = httpx.get(current, follow_redirects=False)

For an attacker-controlled hostname, DNS can answer with a public IP for _assert_url_allowed() and a private or metadata IP for the subsequent httpx lookup, bypassing the SSRF guard. The async path uses the same pattern in aread().

Remediation: Bind the validated address to the request, for example with a custom resolver/transport or by connecting to the validated IP while preserving the original Host/SNI, and reject any redirect hop whose actual connection address is non-public.

For more details, see the finding in Corridor.

Provide feedback: Reply with whether this is a valid vulnerability or false positive to help improve Corridor's accuracy.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid finding — thanks. Fixed in b2f5ad9: the host is now resolved once, every returned address is validated, and the request connects to the validated IP directly (URL host replaced with the IP, original Host header and TLS sni_hostname preserved). The hostname is never resolved a second time by httpx, so the rebinding window is closed, and each redirect hop is resolved+validated the same way. Added tests for IP pinning, Host/SNI preservation, single-resolution, mixed public/private records, ports, IPv6 bracketing, and redirect re-validation.

The previous guard validated the host with a separate DNS lookup and then let
httpx resolve the hostname again when connecting, leaving a DNS-rebinding
(TOCTOU) window: an attacker-controlled host could answer public for the check
and private/metadata for the connection.

Resolve the host once, validate every returned address, and connect to the
validated IP directly (the URL host is replaced with the IP, while the original
Host header and TLS SNI hostname are preserved). The hostname is never resolved
a second time, so rebinding cannot occur. Every redirect hop is resolved and
validated the same way.

Adds tests for IP pinning, Host/SNI preservation, single-resolution (rebinding),
mixed public/private records, explicit ports, IPv6 bracketing, and redirect
re-validation.

AI-assisted audit (audit by AI and Zied Jlassi). Addresses the corridor-security
DNS-rebinding finding on this PR.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
lib/crewai-files/tests/test_file_url.py (2)

521-544: ⚡ Quick win

Mirror redirect SSRF coverage for aread().

The async path has its own redirect loop, but these async tests only cover direct denial and pinning. Add async equivalents for redirect-to-internal and redirect-exhaustion so aread() cannot regress while sync tests still pass.

Example async redirect coverage
+    `@pytest.mark.asyncio`
+    async def test_aread_blocks_redirect_to_internal(self):
+        url = FileUrl(url="https://example.com/start")
+        redirect = _response(
+            headers={"location": "http://169.254.169.254/latest/meta-data/"},
+            is_redirect=True,
+        )
+        client = _async_client([redirect])
+
+        def fake_getaddrinfo(host, *_args, **_kwargs):
+            return _addrinfo(
+                {
+                    "example.com": "93.184.216.34",
+                    "169.254.169.254": "169.254.169.254",
+                }[host]
+            )
+
+        with patch("socket.getaddrinfo", side_effect=fake_getaddrinfo):
+            with patch("httpx.AsyncClient", return_value=client):
+                with pytest.raises(ValueError, match="SSRF protection"):
+                    await url.aread()
+
+    `@pytest.mark.asyncio`
+    async def test_aread_blocks_redirect_bomb(self):
+        url = FileUrl(url="https://example.com/a")
+        redirects = [
+            _response(headers={"location": "https://example.com/a"}, is_redirect=True)
+            for _ in range(_MAX_REDIRECTS + 2)
+        ]
+        client = _async_client(redirects)
+        with patch("socket.getaddrinfo", return_value=_addrinfo("93.184.216.34")):
+            with patch("httpx.AsyncClient", return_value=client):
+                with pytest.raises(ValueError, match="Too many redirects"):
+                    await url.aread()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai-files/tests/test_file_url.py` around lines 521 - 544, Add two new
async test methods following the existing test_aread_blocks_non_public_address
and test_aread_pins_connection_to_validated_ip methods to provide equivalent
redirect coverage for the aread() async path. Create a
test_aread_blocks_redirect_to_internal method that verifies aread() rejects
redirects pointing to non-public IP addresses, and a
test_aread_blocks_excessive_redirects method that verifies aread() terminates
after too many redirect hops, mirroring the SSRF and redirect exhaustion
protections that already exist in the synchronous read() tests to prevent
regression in the async redirect loop implementation.

503-519: Add assertions to verify follow_redirects is disabled in SSRF tests.

The redirect tests use mocked clients that don't auto-follow redirects regardless of constructor arguments. If production code regresses to follow_redirects=True (or removes the explicit False), these tests would still pass. Capture the patched constructor and assert follow_redirects is explicitly False in both the SSRF redirect and redirect-bomb tests.

Suggested test hardening
-        with patch("socket.getaddrinfo", side_effect=fake_getaddrinfo):
-            with patch("httpx.Client", return_value=client):
+        with patch("socket.getaddrinfo", side_effect=fake_getaddrinfo):
+            with patch("httpx.Client", return_value=client) as client_cls:
                 with pytest.raises(ValueError, match="SSRF protection"):
                     url.read()
+            assert client_cls.call_args.kwargs.get("follow_redirects") is False
-        with patch("socket.getaddrinfo", return_value=_addrinfo("93.184.216.34")):
-            with patch("httpx.Client", return_value=client):
+        with patch("socket.getaddrinfo", return_value=_addrinfo("93.184.216.34")):
+            with patch("httpx.Client", return_value=client) as client_cls:
                 with pytest.raises(ValueError, match="Too many redirects"):
                     url.read()
+            assert client_cls.call_args.kwargs.get("follow_redirects") is False
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai-files/tests/test_file_url.py` around lines 503 - 519, The SSRF and
redirect bomb tests do not verify that follow_redirects is explicitly disabled
in the httpx.Client constructor calls, which means the tests would still pass
even if the production code regresses to using follow_redirects=True. Capture
the patched httpx.Client constructor (the one being patched in both test
methods) by assigning it to a variable, then add assertions after the test
methods complete to verify that the constructor was called with
follow_redirects=False as an argument in both the SSRF protection test and the
test_read_blocks_redirect_bomb test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@lib/crewai-files/tests/test_file_url.py`:
- Around line 521-544: Add two new async test methods following the existing
test_aread_blocks_non_public_address and
test_aread_pins_connection_to_validated_ip methods to provide equivalent
redirect coverage for the aread() async path. Create a
test_aread_blocks_redirect_to_internal method that verifies aread() rejects
redirects pointing to non-public IP addresses, and a
test_aread_blocks_excessive_redirects method that verifies aread() terminates
after too many redirect hops, mirroring the SSRF and redirect exhaustion
protections that already exist in the synchronous read() tests to prevent
regression in the async redirect loop implementation.
- Around line 503-519: The SSRF and redirect bomb tests do not verify that
follow_redirects is explicitly disabled in the httpx.Client constructor calls,
which means the tests would still pass even if the production code regresses to
using follow_redirects=True. Capture the patched httpx.Client constructor (the
one being patched in both test methods) by assigning it to a variable, then add
assertions after the test methods complete to verify that the constructor was
called with follow_redirects=False as an argument in both the SSRF protection
test and the test_read_blocks_redirect_bomb test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 087e354a-fc77-4f16-9753-dcfac75a661c

📥 Commits

Reviewing files that changed from the base of the PR and between 14974af and b2f5ad9.

📒 Files selected for processing (2)
  • lib/crewai-files/src/crewai_files/core/sources.py
  • lib/crewai-files/tests/test_file_url.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/crewai-files/src/crewai_files/core/sources.py

…edirects

Mirror the sync redirect-revalidation and redirect-bomb tests for aread(),
and assert both read()/aread() construct the httpx client with
follow_redirects disabled so a regression cannot silently re-enable
auto-follow and skip per-hop SSRF checks. Ignore S104 on the bind-all
literal used as an SSRF test fixture.

AI-assisted audit (audit by AI and Zied Jlassi, Architect AI).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant