Skip to content

Comments

fix: prevent path traversal in knowledge source file paths (#4547)#4548

Open
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
devin/1771591862-fix-path-traversal-knowledge-sources
Open

fix: prevent path traversal in knowledge source file paths (#4547)#4548
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
devin/1771591862-fix-path-traversal-knowledge-sources

Conversation

@devin-ai-integration
Copy link
Contributor

fix: prevent path traversal in knowledge source file paths (#4547)

Summary

Adds path boundary validation to prevent path traversal attacks in knowledge source file path handling. Previously, convert_to_path() simply concatenated user-provided strings with KNOWLEDGE_DIRECTORY (e.g., Path("knowledge/" + path)), allowing ../ sequences to escape the knowledge directory and read arbitrary files on the system.

The fix uses Path.resolve() + Path.is_relative_to() to validate that the final resolved path remains within the knowledge directory, raising ValueError if it doesn't. Applied to three locations:

  • BaseFileKnowledgeSource.convert_to_path()
  • ExcelKnowledgeSource.convert_to_path()
  • CrewDoclingSource.validate_content() (inline path construction)

Review & Testing Checklist for Human

  • Path objects bypass validation: When a Path object (not a string) is passed, the traversal check is skipped entirely — only string inputs are validated. Verify this is acceptable for your threat model, since Path("../../etc/passwd") would not be caught.
  • Relative KNOWLEDGE_DIRECTORY + resolve(): KNOWLEDGE_DIRECTORY is "knowledge" (relative). Path("knowledge").resolve() resolves against CWD. Verify this behaves correctly in all deployment contexts (e.g., different working directories, symlinks).
  • test_valid_paths_still_work_in_convert_to_path passes a Path object, so it exercises the non-validated branch. Consider whether a test with a valid string path (e.g., "test_valid.txt" where knowledge/test_valid.txt exists) would better cover the happy path through the new validation logic.
  • Docling test will be skipped in environments without the docling package — the crew_docling_source.py fix may not be tested in CI.

Suggested manual test: instantiate a TextFileKnowledgeSource(file_paths=["../../../etc/passwd"]) and confirm it raises ValueError.

Notes

Add path boundary validation in convert_to_path() to ensure resolved
paths stay within the knowledge directory. This prevents attackers from
using '../' sequences to read arbitrary files outside the knowledge dir.

Fixes:
- BaseFileKnowledgeSource.convert_to_path()
- ExcelKnowledgeSource.convert_to_path()
- CrewDoclingSource.validate_content() inline path construction

Added tests covering path traversal rejection and valid path acceptance.

Co-Authored-By: João <joao@crewai.com>
@devin-ai-integration
Copy link
Contributor Author

Prompt hidden (unlisted session)

@devin-ai-integration
Copy link
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants