Skip to content

fix: prevent path traversal in LocalStorageProvider (CWE-22)#2087

Open
sebastiondev wants to merge 1 commit intolangbot-app:masterfrom
sebastiondev:fix/cwe22-localstorage-local-9971
Open

fix: prevent path traversal in LocalStorageProvider (CWE-22)#2087
sebastiondev wants to merge 1 commit intolangbot-app:masterfrom
sebastiondev:fix/cwe22-localstorage-local-9971

Conversation

@sebastiondev
Copy link
Copy Markdown

Vulnerability Summary

CWE-22 — Path Traversal | Severity: Critical | CVSS ~9.3 (unauthenticated RCE-adjacent)

LocalStorageProvider in src/langbot/pkg/storage/providers/localstorage.py constructs file paths via:

os.path.join(LOCAL_STORAGE_PATH, key)

When key is an absolute path (e.g. "/etc/passwd"), os.path.join() discards all preceding components and returns the absolute path directly. When key contains .. sequences, the path resolves outside the storage root. This enables arbitrary file read, write, and delete on the host filesystem.

Data Flow

Attacker ──WebSocket──► _process_image_components()
                           │
                           ▼ component["path"] (no validation)
                    storage_provider.load(key="/etc/passwd")
                           │
                           ▼ os.path.join("data/storage", "/etc/passwd")
                                        → "/etc/passwd"
                           │
                           ▼ aiofiles.open("/etc/passwd", "rb")
                           │
                           ▼ base64-encoded content broadcast back to attacker
                           │
                           ▼ os.remove("/etc/passwd")  ← FILE DELETED

The primary attack vector is the WebSocket endpoint at /api/v1/pipelines/<pipeline_uuid>/ws/connect, which:

  • Uses @self.quart_app.websocket() directly, bypassing all auth middleware
  • Accepts Image message components with an attacker-controlled path field
  • Passes path directly to storage_provider.load() → reads the file
  • Then calls os.remove() on the same path → deletes the file

No authentication is required. The server binds 0.0.0.0 with CORS allow_origin="*".

All Callers of storage_provider Methods

# Caller Source of key Attacker-Reachable?
1 websocket_adapter._process_image_components() component["path"] from unauthenticated WebSocket Yes — primary vector
2 files.py GET /image/<path:image_key> URL path (has partial .. check, but misses absolute paths) Partially mitigated
3 plugin/handler.py GET_CONFIG_FILE data["file_key"] from plugin IPC Requires malicious plugin
4 kbmgr.py file.file_name from DB records Not directly reachable
5 rag/service/runtime.py Already has its own normpath + .. validation Has own mitigation
6 delete_dir_recursive callers Various Some reachable

Fix Description

Added a _safe_resolve(base, key) function at the storage provider layer that:

  1. Joins base and key with os.path.join()
  2. Canonicalises via os.path.realpath() (resolves symlinks and ..)
  3. Verifies the resolved path starts with os.path.realpath(base) + os.sep
  4. Raises ValueError if the path escapes the storage root

Every method on LocalStorageProvider (save, load, exists, delete, size, delete_dir_recursive) now calls _safe_resolve() before performing any filesystem operation.

Rationale

  • Defense-in-depth: The fix is at the lowest I/O layer, protecting all 6 callers simultaneously rather than patching each call site individually.
  • Textbook-correct: os.path.realpath() + prefix check is the standard mitigation for CWE-22 in Python (recommended by OWASP, used by Django/Flask internally).
  • No functional regression: Legitimate keys like "image_abc.png" and "bot_log_images/img_001.png" continue to work normally.
  • Minimal footprint: Only 2 files changed — the provider and a new test file.

Changes

 src/langbot/pkg/storage/providers/localstorage.py       | 42 +++++-
 tests/unit_tests/storage/test_localstorage_path_traversal.py | 181 ++++++
 2 files changed, 214 insertions(+), 9 deletions(-)

Test Results

9 new unit tests covering all attack vectors:

Test Vector Result
test_absolute_path_save_rejected Absolute path in save() ✅ Pass
test_absolute_path_load_rejected Absolute path in load() ✅ Pass
test_absolute_path_exists_rejected Absolute path in exists() ✅ Pass
test_absolute_path_delete_rejected Absolute path in delete() ✅ Pass
test_absolute_path_size_rejected Absolute path in size() ✅ Pass
test_dot_dot_path_traversal_rejected ../ relative traversal in load() ✅ Pass
test_delete_dir_recursive_traversal_rejected Absolute path in delete_dir_recursive() ✅ Pass
test_legitimate_key_works Normal key (flat file) ✅ Pass
test_legitimate_subdirectory_key_works Normal key (subdirectory) ✅ Pass

All tests use tmp_path fixtures and mock LOCAL_STORAGE_PATH, so they are safe, isolated, and reproducible.


Disprove Analysis

We systematically attempted to disprove this finding:

Auth Check

The WebSocket endpoint uses @self.quart_app.websocket() directly, bypassing all auth middleware (does not use self.route() which enforces auth). The GET /api/v1/files/image/<path:image_key> endpoint is explicitly auth_type=group.AuthType.NONE. No authentication protects the exploitable paths.

Network Check

  • Server binds on 0.0.0.0 (all interfaces)
  • CORS set to allow_origin="*"
  • No network-level restrictions found in configuration

Deployment Context

Dockerfile runs uv run main.py directly. No reverse proxy, VPN, or service mesh configured. The README describes it as a "production-grade platform" intended to be internet-facing.

Existing Mitigations Found

  1. files.py GET endpoint has a partial .. and \\ check — but misses absolute paths
  2. rag/service/runtime.py has its own normpath validation — only covers that one caller
  3. Quart <path:> converter normalises URLs — mitigates some HTTP-based attacks only
  4. None of these mitigate the WebSocket attack vector

Input Validation

Zero input validation exists in localstorage.py before this fix. The websocket_adapter._process_image_components() also has zero validation on component["path"].

Prior Reports

No prior security issues found via gh issue list. No SECURITY.md exists. No recent commits to this file address security.

Fix Adequacy

The fix adds _safe_resolve() at the storage provider layer, meaning it protects all callers simultaneously. This is the correct architectural location — defense-in-depth at the lowest I/O layer. No parallel code path bypasses this fix.

Verdict

CONFIRMED_VALID — High confidence. The vulnerability is unambiguously exploitable through the unauthenticated WebSocket endpoint. The os.path.join("data/storage", "/etc/passwd")"/etc/passwd" behavior is trivially reproducible.


Preconditions for Exploitation

  1. Network access to the LangBot instance (port 5300 by default)
  2. LangBot using LocalStorageProvider (default for non-S3 deployments)
  3. No additional preconditions — no auth required, CORS is *, binds 0.0.0.0

Note: The unauthenticated WebSocket endpoint is a separate (broader) security issue worth addressing independently. This PR focuses specifically on the storage-layer path traversal vulnerability.

Add _safe_resolve() helper that uses os.path.realpath() to canonicalize
the joined path and verifies it stays within LOCAL_STORAGE_PATH.

All six public methods (save, load, exists, delete, size,
delete_dir_recursive) now validate the key before performing any I/O.

This prevents absolute-path injection (e.g. key="/etc/passwd") and
relative traversal (e.g. key="../../etc/passwd") from escaping the
storage root directory.

CWE-22
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Mar 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant