fix(crewai-files): block SSRF to non-public addresses in FileUrl fetch#6268
fix(crewai-files): block SSRF to non-public addresses in FileUrl fetch#6268zied-jlassi wants to merge 3 commits into
Conversation
FileUrl.read/aread fetched any http(s) URL with follow_redirects=True and no network validation. Because the library auto-coerces "http(s)://..." strings into FileUrl, a URL forwarded as a file input could reach loopback, private, link-local or cloud-metadata addresses (169.254.169.254), or be redirected into them from a public URL -- a Server-Side Request Forgery (CWE-918). Resolve the host and reject non-public addresses before each request, and follow redirects manually so every hop is re-validated. IPv4-mapped IPv6 addresses are normalized so ::ffff:127.0.0.1 cannot bypass the check. The async path resolves DNS in the default executor to avoid blocking the loop. AI-assisted audit (audit by AI and Zied Jlassi). PR labeled llm-generated per CONTRIBUTING.md.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughSSRF protection is added to ChangesSSRF Protection for FileUrl Fetching
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Disclosure: this PR was prepared with AI assistance (audit by AI and Zied Jlassi). Per CONTRIBUTING.md it should carry the |
There was a problem hiding this comment.
Security Issues
- Server-Side Request Forgery via DNS rebinding
The new guard validates the hostname withsocket.getaddrinfo()and then letshttpxresolve the hostname again when connecting. An attacker-controlled domain can return a public IP during validation and an internal or metadata IP during the request, preserving SSRF to non-public addresses.
Summary: This PR adds SSRF filtering and redirect re-validation for FileUrl URL fetches, but the fetch path still crosses a DNS trust boundary without binding the validated address to the connection.
Risk: High risk. A public attacker who can provide a FileUrl could use DNS rebinding to reach internal services or cloud metadata despite the new non-public address checks.
Recommendations:
- Pin the validated IP address for the actual HTTP connection, or use an HTTP transport/resolver that reuses the validated DNS result and rejects any changed or non-public address at connect time.
- Apply the same pinned-resolution behavior to both
read()andaread()redirect hops.
| current = self.url | ||
| for _ in range(_MAX_REDIRECTS + 1): | ||
| _assert_url_allowed(current) | ||
| response = httpx.get(current, follow_redirects=False) |
There was a problem hiding this comment.
The URL is validated by a separate DNS lookup before the request, but httpx.get(current, ...) resolves current again when opening the connection:
_assert_url_allowed(current)
response = httpx.get(current, follow_redirects=False)For an attacker-controlled hostname, DNS can answer with a public IP for _assert_url_allowed() and a private or metadata IP for the subsequent httpx lookup, bypassing the SSRF guard. The async path uses the same pattern in aread().
Remediation: Bind the validated address to the request, for example with a custom resolver/transport or by connecting to the validated IP while preserving the original Host/SNI, and reject any redirect hop whose actual connection address is non-public.
For more details, see the finding in Corridor.
Provide feedback: Reply with whether this is a valid vulnerability or false positive to help improve Corridor's accuracy.
There was a problem hiding this comment.
Valid finding — thanks. Fixed in b2f5ad9: the host is now resolved once, every returned address is validated, and the request connects to the validated IP directly (URL host replaced with the IP, original Host header and TLS sni_hostname preserved). The hostname is never resolved a second time by httpx, so the rebinding window is closed, and each redirect hop is resolved+validated the same way. Added tests for IP pinning, Host/SNI preservation, single-resolution, mixed public/private records, ports, IPv6 bracketing, and redirect re-validation.
The previous guard validated the host with a separate DNS lookup and then let httpx resolve the hostname again when connecting, leaving a DNS-rebinding (TOCTOU) window: an attacker-controlled host could answer public for the check and private/metadata for the connection. Resolve the host once, validate every returned address, and connect to the validated IP directly (the URL host is replaced with the IP, while the original Host header and TLS SNI hostname are preserved). The hostname is never resolved a second time, so rebinding cannot occur. Every redirect hop is resolved and validated the same way. Adds tests for IP pinning, Host/SNI preservation, single-resolution (rebinding), mixed public/private records, explicit ports, IPv6 bracketing, and redirect re-validation. AI-assisted audit (audit by AI and Zied Jlassi). Addresses the corridor-security DNS-rebinding finding on this PR.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
lib/crewai-files/tests/test_file_url.py (2)
521-544: ⚡ Quick winMirror redirect SSRF coverage for
aread().The async path has its own redirect loop, but these async tests only cover direct denial and pinning. Add async equivalents for redirect-to-internal and redirect-exhaustion so
aread()cannot regress while sync tests still pass.Example async redirect coverage
+ `@pytest.mark.asyncio` + async def test_aread_blocks_redirect_to_internal(self): + url = FileUrl(url="https://example.com/start") + redirect = _response( + headers={"location": "http://169.254.169.254/latest/meta-data/"}, + is_redirect=True, + ) + client = _async_client([redirect]) + + def fake_getaddrinfo(host, *_args, **_kwargs): + return _addrinfo( + { + "example.com": "93.184.216.34", + "169.254.169.254": "169.254.169.254", + }[host] + ) + + with patch("socket.getaddrinfo", side_effect=fake_getaddrinfo): + with patch("httpx.AsyncClient", return_value=client): + with pytest.raises(ValueError, match="SSRF protection"): + await url.aread() + + `@pytest.mark.asyncio` + async def test_aread_blocks_redirect_bomb(self): + url = FileUrl(url="https://example.com/a") + redirects = [ + _response(headers={"location": "https://example.com/a"}, is_redirect=True) + for _ in range(_MAX_REDIRECTS + 2) + ] + client = _async_client(redirects) + with patch("socket.getaddrinfo", return_value=_addrinfo("93.184.216.34")): + with patch("httpx.AsyncClient", return_value=client): + with pytest.raises(ValueError, match="Too many redirects"): + await url.aread()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai-files/tests/test_file_url.py` around lines 521 - 544, Add two new async test methods following the existing test_aread_blocks_non_public_address and test_aread_pins_connection_to_validated_ip methods to provide equivalent redirect coverage for the aread() async path. Create a test_aread_blocks_redirect_to_internal method that verifies aread() rejects redirects pointing to non-public IP addresses, and a test_aread_blocks_excessive_redirects method that verifies aread() terminates after too many redirect hops, mirroring the SSRF and redirect exhaustion protections that already exist in the synchronous read() tests to prevent regression in the async redirect loop implementation.
503-519: Add assertions to verifyfollow_redirectsis disabled in SSRF tests.The redirect tests use mocked clients that don't auto-follow redirects regardless of constructor arguments. If production code regresses to
follow_redirects=True(or removes the explicitFalse), these tests would still pass. Capture the patched constructor and assertfollow_redirectsis explicitlyFalsein both the SSRF redirect and redirect-bomb tests.Suggested test hardening
- with patch("socket.getaddrinfo", side_effect=fake_getaddrinfo): - with patch("httpx.Client", return_value=client): + with patch("socket.getaddrinfo", side_effect=fake_getaddrinfo): + with patch("httpx.Client", return_value=client) as client_cls: with pytest.raises(ValueError, match="SSRF protection"): url.read() + assert client_cls.call_args.kwargs.get("follow_redirects") is False- with patch("socket.getaddrinfo", return_value=_addrinfo("93.184.216.34")): - with patch("httpx.Client", return_value=client): + with patch("socket.getaddrinfo", return_value=_addrinfo("93.184.216.34")): + with patch("httpx.Client", return_value=client) as client_cls: with pytest.raises(ValueError, match="Too many redirects"): url.read() + assert client_cls.call_args.kwargs.get("follow_redirects") is False🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai-files/tests/test_file_url.py` around lines 503 - 519, The SSRF and redirect bomb tests do not verify that follow_redirects is explicitly disabled in the httpx.Client constructor calls, which means the tests would still pass even if the production code regresses to using follow_redirects=True. Capture the patched httpx.Client constructor (the one being patched in both test methods) by assigning it to a variable, then add assertions after the test methods complete to verify that the constructor was called with follow_redirects=False as an argument in both the SSRF protection test and the test_read_blocks_redirect_bomb test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@lib/crewai-files/tests/test_file_url.py`:
- Around line 521-544: Add two new async test methods following the existing
test_aread_blocks_non_public_address and
test_aread_pins_connection_to_validated_ip methods to provide equivalent
redirect coverage for the aread() async path. Create a
test_aread_blocks_redirect_to_internal method that verifies aread() rejects
redirects pointing to non-public IP addresses, and a
test_aread_blocks_excessive_redirects method that verifies aread() terminates
after too many redirect hops, mirroring the SSRF and redirect exhaustion
protections that already exist in the synchronous read() tests to prevent
regression in the async redirect loop implementation.
- Around line 503-519: The SSRF and redirect bomb tests do not verify that
follow_redirects is explicitly disabled in the httpx.Client constructor calls,
which means the tests would still pass even if the production code regresses to
using follow_redirects=True. Capture the patched httpx.Client constructor (the
one being patched in both test methods) by assigning it to a variable, then add
assertions after the test methods complete to verify that the constructor was
called with follow_redirects=False as an argument in both the SSRF protection
test and the test_read_blocks_redirect_bomb test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 087e354a-fc77-4f16-9753-dcfac75a661c
📒 Files selected for processing (2)
lib/crewai-files/src/crewai_files/core/sources.pylib/crewai-files/tests/test_file_url.py
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/crewai-files/src/crewai_files/core/sources.py
…edirects Mirror the sync redirect-revalidation and redirect-bomb tests for aread(), and assert both read()/aread() construct the httpx client with follow_redirects disabled so a regression cannot silently re-enable auto-follow and skip per-hop SSRF checks. Ignore S104 on the bind-all literal used as an SSRF test fixture. AI-assisted audit (audit by AI and Zied Jlassi, Architect AI).
Summary
FileUrl.read()/FileUrl.aread()(increwai-files) fetch arbitraryhttp(s)URLs withfollow_redirects=Trueand no network validation. Because the library auto-coerces any"http(s)://…"string into aFileUrl(in_normalize_source,normalize_input_files, and the Pydantic_FileSourceCoercerused by everyBaseFile.source), an application that forwards an agent/LLM/user-supplied URL as a file input can be steered into:http://127.0.0.1:…,http://10.0.0.5:6379/),http://169.254.169.254/latest/meta-data/…),This is a Server-Side Request Forgery (CWE-918).
The only prior validation was the URL scheme (
_validate_url), which controls the protocol but says nothing about the destination address.Fix
follow_redirects=False) so every hop is re-validated; a public URL can no longer redirect into an internal/metadata address. Capped at_MAX_REDIRECTS = 5.::ffff:127.0.0.1) before classifying, becauseipaddress.IPv6Addressdoes not flag the mapped form as loopback/private on its own — a common SSRF-guard bypass.aread()performs the DNS lookup vialoop.run_in_executor(...)so it does not block the event loop, and shares the exact same classification logic asread()(no sync/async drift).No new runtime dependency: only stdlib
ipaddress,socket,urllib.parse.Tests
tests/test_file_url.py, newTestFileUrlSSRF:169.254.169.254and the IPv4-mapped::ffff:127.0.0.1bypass — assertsread()raises and never issues the HTTP request;169.254.169.254is blocked;Too many redirects;aread()applies the same guard.Existing fetch tests are kept offline via an autouse
mock_public_dnsfixture, andtest_read_fetches_contentwas updated to assert the new safe call (follow_redirects=False).Known limitation (declared, not hidden)
This guard resolves the host and then lets
httpxconnect, so a determined attacker could still attempt DNS rebinding (TOCTOU) between the check and the connection. Closing that fully requires pinning the validated IP into the transport; I'm happy to follow up with that in a separate PR if you'd prefer it bundled. The common SSRF vectors (direct private/metadata targets, redirect-to-internal, IPv4-mapped) are closed here.Notes
FileUrlfetch).ruff/mypy/pytestexpectations locally on the changed module; the full suite should be exercised by CI (uv run pytest lib/crewai-files/tests/).AI disclosure: AI-assisted audit (audit by AI and Zied Jlassi, Architect AI). This PR carries the
llm-generatedlabel perCONTRIBUTING.md.Summary by CodeRabbit
Release Notes
Bug Fixes / Security
Tests
Chores