From ec80e8f5fd5a171576085b52032613d562f96688 Mon Sep 17 00:00:00 2001 From: Jarek Potiuk Date: Mon, 1 Jun 2026 19:50:05 +0200 Subject: [PATCH] Add a draft THREAT_MODEL.md + SECURITY.md and link it from AGENTS.md Generated-by: Claude Code --- AGENTS.md | 7 ++ SECURITY.md | 13 +++ THREAT_MODEL.md | 224 ++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 244 insertions(+) create mode 100644 SECURITY.md create mode 100644 THREAT_MODEL.md diff --git a/AGENTS.md b/AGENTS.md index 44a294f60512..5937d25d8377 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -179,3 +179,10 @@ refactor(core): Simplify error handling - Use `just` for common development tasks - Check CI workflows for platform-specific requirements - Services are tested against real backends (credentials required) + +## Security + +Security model: [SECURITY.md](./SECURITY.md) + +Agents that scan this repository should consult `SECURITY.md` and the +threat model it links before reporting issues. diff --git a/SECURITY.md b/SECURITY.md new file mode 100644 index 000000000000..6bc0b5718992 --- /dev/null +++ b/SECURITY.md @@ -0,0 +1,13 @@ +# Security Policy + +## Reporting a Vulnerability + +`apache/opendal` follows the [Apache Software Foundation security process](https://www.apache.org/security/). Please report suspected +vulnerabilities privately to `security@apache.org`; do not open public +GitHub issues or pull requests for security reports. + +## Threat Model + +What the project treats as in scope and out of scope, the security +properties it provides and disclaims, the adversary model, and how +findings are triaged are documented in [THREAT_MODEL.md](./THREAT_MODEL.md). diff --git a/THREAT_MODEL.md b/THREAT_MODEL.md new file mode 100644 index 000000000000..5affa073d098 --- /dev/null +++ b/THREAT_MODEL.md @@ -0,0 +1,224 @@ +# Apache OpenDAL — Threat Model (v1 draft) + +**Status:** v1 draft, authored by the ASF Security Team for OpenDAL PMC +review. Drafted against `apache/opendal` default branch, 2026-06-01. + +**Provenance legend:** *(documented)* — from the project's own docs/README/ +source; *(maintainer)* — ratified by an OpenDAL PMC member; *(inferred)* — +reasoned from code structure / domain norms, **not yet confirmed** (every +*(inferred)* claim has a matching item in §14 Open questions). + +**Draft confidence:** documented ~24 · maintainer 0 · inferred ~21. This is a +react-to-me draft, not a finished model — the OpenDAL PMC (xuanwo, tison) +confirms/corrects, especially the §14 items. + +**Revision triggers:** a new service backend with a novel auth/credential +shape; a new layer that touches credentials, paths, or request routing; a +change to the FFI/binding boundary; a change to default TLS/endpoint handling. + +--- + +## §1 Header + +- **Project:** Apache OpenDAL™ (`apache/opendal`) — "One Layer, All Storage": + a unified data-access layer (`Operator` abstraction) over object storage, + file systems, cloud SaaS, databases, protocols, and key-value services. + *(documented — README)* +- **Scope of this model:** the `apache/opendal` repository — the Rust core + (`core/`), the ~70 service backends (`core/services/`), the ~25 layers + (`core/layers/`), and the language bindings (`bindings/`). The sibling + repos `opendal-reqsign`, `opendal-go-services`, and `opendal-oli` get their + own models (separate Glasswing PRs). *(documented — repo layout)* +- **What OpenDAL is not:** not a storage server, not a sandbox, not an + auth/identity provider. It is an in-process client library that brokers a + caller's data access to a configured backend. *(inferred — Q1)* + +## §2 Scope and intended use + +OpenDAL is embedded **in-process** in a host application (or invoked through +a language binding) to read/write/list data on a backend the host configures. +*(documented — README "in-process library", `Operator`)* + +### Component families (distinct threat profiles) + +| Family | Path | Role / trust notes | +|---|---|---| +| Core (`Operator`/`Accessor`, types, raw) | `core/src/` | The public API surface + the abstraction every backend implements. *(documented)* | +| Service backends (~70) | `core/services/` | Each speaks a remote/local protocol: cloud object stores (s3, gcs, azblob, azdls, oss, cos, obs, b2, swift), HDFS/WebHDFS, SFTP/FTP/WebDAV, SaaS (gdrive, onedrive, dropbox, …), databases (mysql, postgresql, mongodb, surrealdb, …), KV/cache (redis, memcached, etcd, tikv, rocksdb, …), local FS, and HTTP. Each handles its own **credentials + auth + transport**. *(documented — `core/services/`)* | +| Layers (~25) | `core/layers/` | Cross-cutting middleware: retry, timeout, throttle, concurrent-limit, logging, tracing, metrics, capability-check, immutable-index, etc. Some touch security-relevant behaviour (logging may surface request metadata; retry/concurrent-limit affect resource bounds). *(documented — `core/layers/`)* | +| Language bindings | `bindings/` | C, C++, Go, Python, Java, Node.js, and more — each crosses an FFI/ABI boundary into the Rust core. *(documented — README/`bindings/`)* | + +**In scope:** core + service backends + layers + the bindings' glue to the +core. **Intended caller trust:** the host application and its configuration +(endpoints, credentials) are **trusted**; the *data* and *paths* the host +passes through OpenDAL may be attacker-influenced. *(inferred — Q2)* + +## §3 Out of scope (explicit non-goals) + +- `examples/`, `fuzz/`, `testkit/`, `tests/`, `benches/`, `edge/`, `dev/`, + `fixtures/` — not shipped as supported product. *(inferred — Q3)* +- The **remote services themselves** (S3, GCS, a Postgres server, …). OpenDAL + is a *client*; a vulnerability in the backend service is the backend's, not + OpenDAL's. *(inferred — Q3)* +- **Credential acquisition / storage / rotation.** OpenDAL consumes + credentials the host supplies via config; how the host obtains, stores, and + rotates them is the host's responsibility. *(inferred — Q10)* +- **Encryption at rest** and object-level integrity beyond what the backend + protocol provides. *(inferred — Q9)* +- Host-language memory safety on the far side of a binding's FFI boundary + (the binding marshals correctly; what the host language does with the bytes + is the host's). *(inferred — Q3)* + +## §4 Trust boundaries and data flow + +1. **Caller → `Operator` API.** The in-process boundary. The caller is + trusted to be non-malicious (an in-process caller already has full process + control — §7). What crosses is paths, data, and config. *(inferred — Q4)* +2. **OpenDAL → remote backend (network).** The high-value boundary: requests + carry credentials/signatures; responses are **backend-controlled bytes**. + TLS (where the backend supports it) protects this hop. *(inferred — Q5,Q6)* +3. **Config/credential ingestion.** Endpoints, regions, keys, tokens enter via + builder/config. A caller that lets *untrusted* input reach config (e.g. a + user-supplied endpoint for the `http` service) changes the trust picture + (SSRF surface). *(inferred — Q7)* +4. **FFI/binding boundary.** Each binding marshals across the Rust↔host ABI. + *(documented — `bindings/`)* + +## §5 Assumptions about the environment + +- Rust core; async via the host's runtime (typically Tokio). *(inferred — Q19)* +- For local backends (`fs`, `sftp`, `ftp`, `hdfs`, …) the OS path semantics and + filesystem permissions are the OS's; OpenDAL passes paths through. *(inferred — Q7)* +- Network reachability to the configured backend; system trust store for TLS. + *(inferred — Q6)* + +## §6 Assumptions about inputs — per-parameter trust + +| Input | Source | Trusted? | Notes | +|---|---|---|---| +| Backend config (endpoint, region, bucket) | Host app | Trusted | If host lets untrusted input set the endpoint (esp. `http`), that's an SSRF surface the host owns. *(inferred — Q7)* | +| Credentials (keys, tokens, SAS) | Host app | Trusted (secret) | OpenDAL must not log/leak them (§8/§9, Q8). | +| Object path / key | Caller (may be attacker-influenced) | Untrusted-capable | Path normalization + traversal handling across backends is security-relevant. *(inferred — Q7)* | +| Object data (write payload) | Caller | Pass-through | OpenDAL doesn't interpret payload semantics. *(inferred)* | +| Range / offset / length params | Caller | Untrusted-capable | Bounds handling on reads/multipart. *(inferred — Q11)* | +| Backend HTTP responses (bodies, headers, redirects, listing XML/JSON) | Remote backend | **Untrusted-capable** | A hostile/compromised backend can return adversarial sizes, redirects, malformed listings. Is parsing hardened against this? *(inferred — Q5)* | + +## §7 Adversary model + +- **Primary (in scope):** an actor who controls or influences the **data and + paths** the host passes through OpenDAL, and/or who sits on the + **network path** to the backend (MITM, absent/misconfigured TLS). + *(inferred — Q8)* +- **Secondary (likely in scope):** a **malicious or compromised backend + endpoint** returning adversarial responses (oversized bodies, decompression + pressure, malicious redirects, malformed list output). *(inferred — Q5)* +- **Out of scope:** the **in-process caller** — already has full control of the + host process; not a meaningful adversary at this layer. *(inferred — Q4)* + Side-channel / co-tenant adversaries against the host process. *(inferred — Q10)* + +## §8 Security properties the project provides *(all pending PMC confirmation — Q8/Q9/Q11)* + +- **Credential confidentiality in logs/traces:** the logging/tracing layers + are expected **not** to emit raw credentials or full signed Authorization + headers. *(inferred — Q8)* — violation symptom: a key/token/SAS appears in + log/trace/metric output; severity: high. +- **Transport security to cloud backends:** HTTPS endpoints by default for + cloud object stores. *(inferred — Q6)* — violation: silent downgrade to + plaintext for a backend that supports TLS; severity: high. +- **Memory safety** for well-formed inputs (Rust core, `#![forbid(unsafe)]`? + unknown for FFI). *(inferred — Q12)* +- **Resource bounds are opt-in** via layers (`concurrent-limit`, `throttle`, + `timeout`, `retry` caps). *(documented — `core/layers/`)* — i.e. OpenDAL + provides the *mechanism*; the host sets the policy. + +## §9 Security properties the project does *not* provide + +- **Not a sandbox / not an isolation boundary.** A trusted caller can do + anything the configured backend + credentials allow. *(inferred — Q1)* +- **No defense against a fully hostile backend** beyond protocol correctness — + if you point OpenDAL at a malicious endpoint, response bytes are still + parsed. *(inferred — Q5)* +- **Presigned URLs are bearer capabilities,** not an authentication boundary: + anyone holding a presigned URL has the access it encodes for its lifetime. + *(inferred — Q13)* +- **No object integrity/authenticity guarantee** beyond what the backend + protocol offers (e.g. an ETag/CRC is error-detection, not a MAC against a + malicious backend). *(inferred — Q9)* +- **No automatic bounding of object/listing sizes** — a `read` of a + 10 GiB object returns 10 GiB unless the caller streams/limits. *(inferred — Q11)* +- **No SSRF protection** if the host wires untrusted input into the endpoint/ + `http`-service config — endpoint choice is the host's. *(inferred — Q7)* + +## §10 Downstream (integrator) responsibilities + +- Manage, scope, and rotate credentials; never source backend **config** + (esp. endpoints) from untrusted input without validation. *(inferred — Q7,Q10)* +- Sanitize/validate **paths** derived from untrusted input before passing them + as object keys (path-traversal on `fs`/`sftp`/`ftp`/`webdav`). *(inferred — Q7)* +- Bound object sizes / streaming / concurrency via the provided layers for + untrusted or unbounded data. *(inferred — Q11)* +- Treat presigned URLs as secrets with the right TTL. *(inferred — Q13)* +- Ensure TLS endpoints for sensitive data; supply the trust store. *(inferred — Q6)* + +## §11 Known misuse patterns + +- Wiring an **attacker-controlled endpoint/URL** into the `http` service (or + any endpoint config) → SSRF to internal resources. *(inferred — Q7)* +- Using a **non-TLS** endpoint for a cloud backend over an untrusted network. + *(inferred — Q6)* +- Passing **untrusted user input as an object path** without normalization. + *(inferred — Q7)* +- Treating the backend's checksum/ETag as proof of authenticity against a + malicious backend. *(inferred — Q9)* + +### §11a Known non-findings (recurring false positives) *(seed — PMC to expand, Q18a)* + +- "OpenDAL connects to whatever endpoint is configured" — connecting to an + internal/operator-chosen endpoint is **operator configuration**, not a + library vuln. *(inferred — Q7)* +- "Credentials are read from env/config" — credential sourcing is the host's; + presence of an access-key field is not a finding. *(inferred — Q10)* +- "The retry layer can multiply requests" — that is a configured tradeoff, not + a defect; the host sets retry policy. *(inferred — Q11)* +- "A presigned URL grants access without login" — by design (bearer + capability). *(inferred — Q13)* + +## §12 Conditions that would change this model + +New backend with a novel auth/credential model; a layer that newly logs or +routes credentials/paths; FFI/ABI change in a binding; a change to default +endpoint/TLS behaviour. *(inferred)* + +## §13 Triage dispositions + +| Disposition | When | Licensing section | +|---|---|---| +| **VALID** | Violates a §8 property under the §7 adversary, in in-scope code | §7, §8 | +| **OUT-OF-MODEL** | Adversary is the in-process caller, or code in `examples/`/`tests/`/`fuzz/` | §3, §7 | +| **DOWNSTREAM-RESPONSIBILITY** | Credential mgmt, endpoint choice, path sanitization, size bounding | §10 | +| **DISCLAIMED** | A §9 non-guarantee (presign bearer, hostile-backend bytes, integrity-vs-MAC) | §9 | +| **MODEL-GAP** | Plausible but not covered here → escalate to PMC, may revise | §12 | + +## §14 Open questions for the maintainers + +Each maps to an *(inferred)* claim above; a one-line confirm/correct is enough. + +1. **(Q1)** Is "in-process client library, not a sandbox/server" the right framing? +2. **(Q2)** Confirm: host app + its config/credentials are trusted; data + paths passed through may be attacker-influenced. +3. **(Q3)** Confirm out-of-scope set (examples/fuzz/testkit/tests/benches/edge; remote services themselves; far-side-of-FFI host code). +4. **(Q4)** Is the in-process caller out of the adversary model (as for most libraries)? +5. **(Q5)** Is a **malicious/compromised backend** in scope? How hardened is response/listing parsing against adversarial bytes/redirects/sizes? +6. **(Q6)** TLS posture: HTTPS by default for cloud backends? Any silent plaintext fallbacks? Who supplies the trust store? +7. **(Q7)** SSRF / endpoint trust: is untrusted-endpoint config explicitly the host's responsibility? Path-traversal handling for local backends — normalized by OpenDAL or passed through? +8. **(Q8)** Credential confidentiality: do logging/tracing/metrics layers guarantee no raw credentials/Authorization headers are emitted? At which log levels? +9. **(Q9)** Object integrity: does OpenDAL claim any integrity/authenticity beyond the backend protocol? Confirm ETag/CRC ≠ MAC framing. +10. **(Q10)** Confirm credential acquisition/storage/rotation + co-tenant/side-channel are downstream/out-of-scope. +11. **(Q11)** Resource bounds: confirm size/concurrency bounding is opt-in via layers, and unbounded reads are not a library defect. +12. **(Q12)** Memory-safety posture: is the core `unsafe`-free? What about the FFI/binding layer? +13. **(Q13)** Presign semantics: confirm presigned URLs are bearer capabilities (not an auth boundary), TTL is the caller's choice. +14. **(Q18a)** What do scanners most often report against OpenDAL that you consider a non-finding? (Expand §11a.) +15. **(meta)** Should this model live as `THREAT_MODEL.md` in-repo (this PR), and is the `AGENTS.md → SECURITY.md → THREAT_MODEL.md` chain the discoverability path you want? + +## §15 Machine-readable companion + +Not generated for v1. Optional; can follow once prose is ratified.