From 7d72efc27a949746c7186ad845d1dc3f14e68262 Mon Sep 17 00:00:00 2001 From: John Myers Date: Thu, 21 May 2026 16:10:58 -0700 Subject: [PATCH] docs(rfc): propose sandbox proxy egress adapter model Signed-off-by: John Myers --- .../README.md | 420 ++++++++++++++++++ .../current-shape.md | 167 +++++++ .../implementation-plan.md | 127 ++++++ .../technical-design.md | 259 +++++++++++ 4 files changed, 973 insertions(+) create mode 100644 rfc/0004-sandbox-proxy-egress-adapter/README.md create mode 100644 rfc/0004-sandbox-proxy-egress-adapter/current-shape.md create mode 100644 rfc/0004-sandbox-proxy-egress-adapter/implementation-plan.md create mode 100644 rfc/0004-sandbox-proxy-egress-adapter/technical-design.md diff --git a/rfc/0004-sandbox-proxy-egress-adapter/README.md b/rfc/0004-sandbox-proxy-egress-adapter/README.md new file mode 100644 index 000000000..01001ca15 --- /dev/null +++ b/rfc/0004-sandbox-proxy-egress-adapter/README.md @@ -0,0 +1,420 @@ +--- +authors: + - "@johntmyers" +state: draft +links: + - https://github.com/NVIDIA/OpenShell/issues/1107 + - https://github.com/NVIDIA/OpenShell/pull/1083 + - https://github.com/NVIDIA/OpenShell/pull/1151 +--- + +# RFC 0004 - Sandbox Proxy Egress Adapter Model + + + +## Summary + +Refactor sandbox egress around one shared authorization and relay pipeline. +CONNECT, forward HTTP proxy, transparent native TCP, policy DNS, +`inference.local`, and `policy.local` should become adapters that translate +userland entry points into a common egress intent. Policy evaluation, +destination validation, credential injection, request-body rewrite, +WebSocket upgrade handling, protocol parsing, and relay ownership should happen +behind shared boundaries. + +This RFC keeps the main direction in this document. 
Supporting detail lives in: + +- [Current shape appendix](current-shape.md) +- [Technical design appendix](technical-design.md) +- [Implementation plan](implementation-plan.md) + +## Motivation + +The sandbox proxy has accumulated separate egress paths for CONNECT, forward +HTTP, local services, inference routing, endpoint metadata, credential +injection, and L7 policy. That makes security changes easy to apply to one path +and miss in another. + +The target shape separates three concerns: + +- **Adapters** describe how userland reached the proxy. +- **Authorization** decides whether that egress is allowed and what endpoint + behavior applies. +- **Relays** own bytes, credentials, protocol parsing, and upstream dialing. + +## Non-goals + +- Replace CONNECT with forward proxy as the only explicit proxy mode. +- Add SOCKS support. +- Add HTTP/2 L7 parsing in this refactor. +- Redesign provider credential storage. +- Reintroduce iptables as the sandbox packet filtering backend. +- Use eBPF connect hooks for transparent capture. Native TCP capture needs a + userland proxy in the byte stream for TLS termination and protocol parsing. + +## Proposal + +### Migration Big Rocks + +1. **Transport adapters.** CONNECT, forward HTTP, transparent TCP, policy DNS, + and local service routes become small entry adapters. They parse their + surface and produce either an egress intent, a local response, or a DNS + answer. They do not duplicate policy evaluation. +2. **Egress intent and decision.** The shared authorization boundary evaluates + L4 policy once per connection intent and returns one decision containing the + matched policy, matched endpoint, process identity, allowed IP metadata, TLS + behavior, and protocol enforcement. +3. **Relays.** Relays receive an authorized destination connector, not an + already-open upstream socket. 
HTTP relays evaluate every request before + dialing, own REST request-body credential rewrite, and hand allowed + WebSocket upgrades to the WebSocket relay. TCP application parsers own their + protocol loop and decide when a validated upstream connection is needed. + +### Unified Adapter Flow + +```mermaid +flowchart TD + User["Userland payload / harness"] + + subgraph ExplicitProxy["Explicit proxy listener"] + ProxyBytes["HTTP proxy bytes"] + IsConnect{"CONNECT request?"} + Connect["CONNECT adapter"] + Forward["Forward HTTP adapter"] + ProxyBytes --> IsConnect + IsConnect -- Yes --> Connect + IsConnect -- No --> Forward + end + + subgraph NativeTcp["Policy DNS + native TCP"] + NameLookup["Userland DNS lookup"] + PolicyDns["Policy DNS adapter"] + DnsAnswer["DNS answer"] + NativeConnect["Userland connect(ip:port)"] + TcpAdapter["Transparent TCP adapter"] + NameLookup --> PolicyDns + PolicyDns --> DnsAnswer + DnsAnswer --> NativeConnect + NativeConnect --> TcpAdapter + end + + subgraph LocalApis["Sandbox-local services"] + InferenceReq["Request to inference.local"] + PolicyReq["Request to policy.local"] + InferenceAdapter["Inference local adapter"] + PolicyAdapter["Policy local adapter"] + InferenceReq --> InferenceAdapter + PolicyReq --> PolicyAdapter + end + + subgraph Shared["Shared egress pipeline"] + Intent["Egress intent"] + Auth["Authorize and select endpoint"] + Decision["Egress decision"] + Validate["Resolve and validate destination"] + Relay["Relay"] + Deny["Adapter-specific deny response"] + Intent --> Auth + Auth --> Allowed{"Allowed?"} + Allowed -- No --> Deny + Allowed -- Yes --> Decision + Decision --> Validate + Validate --> Relay + end + + User --> ProxyBytes + User --> NameLookup + User --> NativeConnect + User --> InferenceReq + User --> PolicyReq + + Connect --> Intent + Forward --> Intent + TcpAdapter --> Intent + InferenceAdapter --> InferenceResp["Local inference response"] + PolicyAdapter --> PolicyResp["Local policy response"] +``` + +### 
Relay Flow
+
+```mermaid
+flowchart TD
+    Start["Authorized egress + destination connector"]
+    Start --> HasFirst{"First HTTP request already parsed?"}
+
+    HasFirst -- Yes --> ForwardMode{"Selected enforcement"}
+    ForwardMode -- "L4 only" --> HttpCred["HTTP relay<br/>credential injection only"]
+    ForwardMode -- "HTTP rules" --> HttpL7["HTTP relay<br/>REST/GraphQL/WebSocket policy"]
+    ForwardMode -- "TCP app rules" --> BadForward["Deny: HTTP request for TCP app endpoint"]
+
+    HasFirst -- No --> Inspect["Inspect tunnel or native stream bytes"]
+    Inspect --> SkipTls{"Endpoint says skip TLS handling?"}
+    SkipTls -- Yes --> TcpBytes["TCP relay<br/>byte copy"]
+    SkipTls -- No --> Peek["Peek client bytes"]
+    Peek --> IsTls{"TLS ClientHello?"}
+    IsTls -- Yes --> Tls["Shared TLS terminator"]
+    IsTls -- No --> Readable["Readable client stream"]
+    Tls --> Readable
+
+    Readable --> Mode{"Selected enforcement"}
+    Mode -- "L4 only" --> SniffHttp{"Looks like HTTP?"}
+    SniffHttp -- Yes --> HttpCred
+    SniffHttp -- No --> TcpBytes
+
+    Mode -- "HTTP rules" --> MustHttp{"Looks like HTTP?"}
+    MustHttp -- Yes --> HttpL7
+    MustHttp -- No --> DenyHttp["Deny: expected HTTP"]
+
+    Mode -- "TCP app rules" --> TcpParser["TCP relay<br/>application parser owns loop"]
+
+    HttpCred --> Creds["Resolve and redact credentials"]
+    HttpL7 --> CredsL7["Resolve and redact credentials"]
+    CredsL7 --> ParseHttp["Parse and evaluate each HTTP request"]
+    ParseHttp --> HttpAllowed{"Request allowed?"}
+    HttpAllowed -- No --> HttpDeny["Local HTTP deny<br/>no upstream connect"]
+    HttpAllowed -- Yes --> Rewrite["Rewrite configured credential slots"]
+    Creds --> Rewrite
+    Rewrite --> HttpDial["Connect or reuse upstream"]
+    HttpDial --> HttpResponse["Write request and relay response"]
+    HttpResponse --> Upgrade{"101 WebSocket upgrade?"}
+    Upgrade -- No --> NextHttp["Continue HTTP request loop"]
+    Upgrade -- Yes --> WsMode{"WebSocket inspection needed?"}
+    WsMode -- No --> RawUpgrade["Raw upgraded stream"]
+    WsMode -- Yes --> WsRelay["WebSocket relay<br/>text-frame rewrite / message policy"]
+    NextHttp --> ParseHttp
+
+    TcpParser --> ParserDial["Parser dials upstream when protocol allows"]
+    TcpBytes --> TcpDial["Connect upstream"]
+    TcpDial --> ByteCopy["Copy bytes"]
+```
+
+Relay rules:
+
+- HTTP credential injection happens in both HTTP modes: L4-only HTTP and
+  HTTP-inspected.
+- Credential injection includes request target, query, headers, opt-in REST
+  request bodies, and opt-in client-to-server WebSocket text frames.
+- HTTP L7 policy is evaluated before upstream dial for each request.
+- WebSocket upgrade policy is evaluated as HTTP first. After an allowed `101`
+  upgrade, the WebSocket relay owns frame parsing when text-frame credential
+  rewrite, WebSocket transport policy, GraphQL-over-WebSocket policy, or safe
+  compression handling is configured. Other upgraded streams remain raw.
+- Forward HTTP must stay in the shared HTTP relay loop. It must not evaluate
+  one request and then switch to raw bidirectional copy. Keeping forward HTTP
+  single-request with `Connection: close` is also acceptable, but the invariant
+  is that no follow-on request bytes reach upstream unevaluated.
+- `protocol: tcp` means L4 authorization plus byte copy unless HTTP is detected
+  for credential injection.
+- Future TCP application parsers, such as Redis or Postgres, own the full
+  message loop and can parse multiple commands over one TCP session.
+
+### CONNECT Adapter
+
+CONNECT remains the standard explicit proxy tunnel for HTTPS and arbitrary TCP.
+It parses the CONNECT line into an egress intent, then waits for the shared
+relay to decide if and when an upstream connection should be opened.
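
As a rough illustration of the parsing step described above, the sketch below turns a CONNECT request line into a host and port. `ConnectTarget` and `parse_connect_line` are hypothetical names, not existing proxy code, and a real adapter would also attach transport and process identity to the egress intent.

```rust
/// Hypothetical, reduced stand-in for the destination part of an egress intent.
#[derive(Debug, PartialEq)]
pub struct ConnectTarget {
    pub host: String,
    pub port: u16,
}

/// Parse a request line such as `CONNECT example.com:443 HTTP/1.1`.
/// Returns `None` for anything that is not a well-formed CONNECT line.
pub fn parse_connect_line(line: &str) -> Option<ConnectTarget> {
    let mut parts = line.trim_end().split(' ');
    let method = parts.next()?;
    let authority = parts.next()?;
    let version = parts.next()?;
    if parts.next().is_some() || method != "CONNECT" || !version.starts_with("HTTP/1.") {
        return None;
    }
    // Split on the last ':' so a bracketed IPv6 literal keeps its inner colons.
    let (raw_host, raw_port) = authority.rsplit_once(':')?;
    let host = match raw_host.strip_prefix('[').and_then(|h| h.strip_suffix(']')) {
        Some(inner) => inner,
        // A colon outside brackets is ambiguous, so reject it.
        None if raw_host.contains(':') => return None,
        None => raw_host,
    };
    if host.is_empty() || host.contains(|c: char| c == '[' || c == ']') {
        return None;
    }
    if raw_port.is_empty() || !raw_port.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let port: u16 = raw_port.parse().ok()?;
    if port == 0 {
        return None;
    }
    Some(ConnectTarget { host: host.to_ascii_lowercase(), port })
}
```

A malformed line yields `None`, which the adapter would map to its own CONNECT deny response without consulting policy.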
+ +```mermaid +flowchart TD + Client["Client sends CONNECT host:port"] --> Parse["Parse target"] + Parse --> Intent["Build egress intent"] + Intent --> Auth["Shared authorization"] + Auth --> Allowed{"Destination allowed?"} + Allowed -- No --> Deny["CONNECT deny response"] + Allowed -- Yes --> Ready["Return tunnel-ready response"] + Ready --> Relay["Relay inspects tunneled bytes"] + Relay --> Dial["Relay or parser connects upstream when allowed"] +``` + +CONNECT should stay because forward proxy is only a plaintext HTTP request +format. CONNECT is still the generic explicit proxy mode for TLS and non-HTTP +TCP clients. + +### Forward HTTP Adapter + +Forward HTTP is compatibility for clients that send absolute-form HTTP requests. +The adapter parses the first request and hands it to the shared HTTP relay or +an equivalent guarded single-request relay. + +```mermaid +flowchart TD + Req["Absolute-form HTTP request"] --> Parse["Parse URI and first request"] + Parse --> Intent["Build egress intent"] + Intent --> Auth["Shared authorization"] + Auth --> Allowed{"Allowed?"} + Allowed -- No --> Deny["HTTP deny response"] + Allowed -- Yes --> Relay["Shared or guarded HTTP relay"] + Relay --> Mode{"Connection mode"} + Mode -- "Persistent" --> Loop["Evaluate every request on this connection"] + Mode -- "Single request" --> Close["Force Connection: close"] +``` + +### Transparent TCP Adapter + +Transparent TCP supports native clients that do not know they are using a +proxy. The capture mechanism should be network namespace interception into a +userland proxy listener. Since main now uses nftables for sandbox bypass +enforcement, transparent capture should be designed as nftables +REDIRECT/TPROXY state in the inner sandbox network namespace, not as an +iptables path. 
+ +```mermaid +flowchart TD + Policy["Policy load / reload"] --> Register["Register native TCP names"] + Lookup["Userland DNS lookup"] --> Dns["Policy DNS adapter"] + Register --> Dns + Dns --> Answer["Return approved IPs"] + Answer --> Capture["Enable capture for active IP:port"] + Connect["Userland connect(ip:port)"] --> Capture + Capture --> Adapter["Transparent TCP adapter"] + Adapter --> Intent["Build egress intent from original destination"] + Intent --> Shared["Shared authorization and relay"] +``` + +### Policy DNS + +Policy DNS replaces static `/etc/hosts` snapshots for native TCP names. It is +query-driven: check whether the name is policy-eligible, resolve through trusted +DNS, filter returned IPs, publish the active endpoint mapping, and answer +userland. + +```mermaid +flowchart TD + Query["DNS query from userland"] --> Adapter["Policy DNS adapter"] + Adapter --> Known{"Registered native TCP policy name?"} + Known -- No --> Refuse["NXDOMAIN / REFUSED / SERVFAIL"] + Known -- Yes --> Upstream["Trusted upstream DNS lookup"] + Upstream --> Filter["Filter answers against endpoint policy"] + Filter --> Publish["Publish active mapping and capture rule"] + Publish --> Answer["DNS answer"] +``` + +The later `connect(ip:port)` still creates the egress intent and runs through +normal authorization. + +### Network Enforcement Substrate + +Current main uses nftables for bypass enforcement. It accepts proxy-bound +traffic and loopback, accepts established flows, then rejects and optionally +logs other TCP/UDP traffic for the bypass monitor. That is enforcement, not +native TCP capture. 
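
The verdict order described above can be modeled as a small pure function. This is only a sketch of the rule ordering, not nftables syntax. `Packet`, `Verdict`, and `bypass_verdict` are hypothetical names, and matching loopback by destination address is an assumption, since this section does not say how the real table matches it.

```rust
use std::net::SocketAddr;

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Accept,
    /// Rejected; `logged` mirrors the optional log rule read by the bypass monitor.
    Reject { logged: bool },
}

/// Minimal packet view for the model (hypothetical).
pub struct Packet {
    pub dst: SocketAddr,
    /// True when the packet belongs to an already established flow.
    pub established: bool,
}

/// Apply the bypass rules in order: proxy-bound, loopback, established, then reject.
pub fn bypass_verdict(pkt: &Packet, proxy: SocketAddr, log_enabled: bool) -> Verdict {
    if pkt.dst == proxy || pkt.dst.ip().is_loopback() || pkt.established {
        return Verdict::Accept;
    }
    Verdict::Reject { logged: log_enabled }
}
```

Nothing in this model redirects traffic, which matches the point that the current table is enforcement only.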
+ +```mermaid +flowchart TD + Conn["Userland packet"] --> ProxyDest{"Proxy destination?"} + ProxyDest -- Yes --> AcceptProxy["nftables accept"] + ProxyDest -- No --> Capture{"Future native TCP capture match?"} + Capture -- Yes --> Redirect["nftables redirect/TPROXY to transparent adapter"] + Capture -- No --> Reject["nftables log + reject bypass"] + Reject --> Monitor["Bypass monitor emits OCSF"] +``` + +The transparent TCP work should extend this nftables model with explicit +capture rules that run before the reject path and are scoped to active policy +DNS mappings. + +### Local Service Adapters + +`inference.local` and `policy.local` are sandbox-local APIs. They should use +the adapter model, but they do not represent normal external egress. + +```mermaid +flowchart TD + A["Request to inference.local"] --> B["Inference local adapter"] + B --> C{"TLS and inference context available?"} + C -- No --> D["Local denial and log"] + C -- Yes --> E["Terminate client TLS"] + E --> F["Parse HTTP request"] + F --> G{"Known inference route?"} + G -- Yes --> H["Route through openshell-router"] + H --> I["Strip caller auth and inject provider auth/model"] + I --> J["Stream response with limits"] + G -- No --> K["403 and close"] +``` + +```mermaid +flowchart TD + A["Request to policy.local"] --> B["Policy local adapter"] + B --> C{"Local route"} + C -- "Current policy" --> D["Policy snapshot response"] + C -- "Recent denials" --> E["Bounded denial summaries"] + C -- "Policy proposal" --> F["Validate and submit proposal"] + D --> G["HTTP response"] + E --> G + F --> G +``` + +### Deployment Modes + +The first implementation can remain embedded in `openshell-sandbox`, but the +proxy should be shaped around explicit runtime contracts. 
+ +| Mode | Shape | Main concern | +|------|-------|--------------| +| Embedded | Current sandbox process owns proxy modules | Lowest migration cost | +| Standalone process | Sandbox supervisor launches a proxy binary | Clear process/API boundary | +| Sidecar | Proxy runs outside the payload container but inside the sandbox boundary | Reliable process identity across namespaces | + +A pluggable proxy must expose the configured userland surfaces, implement the +gateway APIs it needs, and prove equivalent policy enforcement through tests. +The nftables rules that force or reject userland traffic belong to the sandbox +network boundary even if the proxy process later moves into a standalone binary +or sidecar. + +## Implementation plan + +The migration plan lives in [implementation-plan.md](implementation-plan.md). +The intended order is: first add regression coverage, then introduce the shared +authorization result and destination validation, then preserve the current +forward HTTP single-request/guarded-relay invariant, then add shared TLS +handling, TCP parser boundaries, nftables-backed policy DNS capture, local +service adapters, and finally the runtime boundary cleanup. + +## Risks + +- Tightening endpoint metadata failures from fail-open to deny may expose + latent policy or Rego errors. +- Deterministic endpoint selection may reject policies that currently load but + only work by accident. +- Transparent TCP capture adds network namespace interception complexity. +- Transparent TCP capture must coexist with the current nftables bypass + reject/log table without creating gaps where direct egress bypasses the proxy. +- Sidecar mode needs a reliable identity source for binary/path scoped policy. +- `policy.local` expands the sandbox-local control surface and needs strict + route validation, body limits, redaction, and gateway authentication. + +## Alternatives + +- Keep patching each current proxy path separately. 
This has the lowest short + term cost but keeps the security surface duplicated. +- Replace CONNECT with forward proxy. This does not work for arbitrary TCP and + is not a replacement for HTTPS tunnels. +- Build only transparent TCP. This helps native clients but does not replace + explicit proxy support used by common HTTP tooling. + +## Open questions + +1. Should overlapping endpoint metadata be rejected at policy load time, or + should policy name plus endpoint index define precedence? +2. Should missing TLS state fail closed for credential-capable or inspected + endpoints? +3. Should direct IP connects to a policy-DNS-resolved TCP endpoint be accepted, + or should DNS query correlation be required for stricter modes? +4. What TTL cap and stale-generation grace period should policy DNS use? +5. Which process identity source should sidecar mode use when it cannot inspect + payload process metadata through local `/proc`? +6. Which proxy capabilities should be negotiated with the gateway at startup? + +## Expected result + +Adding a new HTTP-family protocol parser should require parser code, policy +schema/Rego support, tests, and docs. It should not require new CONNECT and +forward-proxy branches. REST, GraphQL, WebSocket upgrade policy, request-body +credential rewrite, and WebSocket text-frame rewrite should all remain behind +the shared HTTP/WebSocket relay boundary. + +Adding a native TCP application parser should require policy DNS/capture +support, a TCP application parser, policy rules, tests, and docs. Plain +`protocol: tcp` remains L4 authorization plus byte relay. 
diff --git a/rfc/0004-sandbox-proxy-egress-adapter/current-shape.md b/rfc/0004-sandbox-proxy-egress-adapter/current-shape.md new file mode 100644 index 000000000..b428fed14 --- /dev/null +++ b/rfc/0004-sandbox-proxy-egress-adapter/current-shape.md @@ -0,0 +1,167 @@ +# Current Shape Appendix + +This appendix records the current proxy shape and the review findings that +motivate the adapter model. The main RFC intentionally keeps these details out +of the direction document. + +## Current Entry Points + +The sandbox proxy currently handles multiple userland-facing paths in the same +large module: + +- CONNECT proxy traffic for HTTPS and generic TCP tunnels. +- Forward HTTP proxy traffic for absolute-form HTTP requests. +- Local service routes such as `inference.local`. +- Network namespace bypass enforcement through nftables reject/log rules. +- Policy and endpoint metadata lookups through OPA/Rego. +- DNS resolution and endpoint validation for CONNECT and forward HTTP egress. +- Credential injection and redaction for provider-backed HTTP egress. +- Opt-in REST request-body credential rewrite. +- L7 REST, GraphQL, WebSocket, and GraphQL-over-WebSocket enforcement. + +The issue is not that these features exist. The issue is that entry mechanisms, +policy evaluation, endpoint metadata lookup, credential injection, and byte +relay decisions are interleaved. 
+ +## Current CONNECT Shape + +```mermaid +flowchart TD + Client["Client CONNECT host:port"] --> Parse["Parse CONNECT target"] + Parse --> L4["Evaluate network policy"] + L4 --> Allowed{"Allowed?"} + Allowed -- No --> Deny["CONNECT denial"] + Allowed -- Yes --> Meta["Query endpoint metadata"] + Meta --> Config{"L7 or credential config?"} + Config -- No --> Raw["Open upstream and copy bytes"] + Config -- Yes --> Tunnel["Return tunnel-ready response"] + Tunnel --> Inspect["Parse tunneled HTTP when possible"] + Inspect --> L7["Evaluate HTTP policy"] + L7 --> Inject["Inject credentials if configured"] + Inject --> Upstream["Write upstream and relay response"] +``` + +This path has the strongest HTTP relay behavior because it can keep parsing +requests on a long-lived tunnel and enforce L7 rules per request. + +## Current Forward HTTP Shape + +```mermaid +flowchart TD + Client["Absolute-form HTTP request"] --> Parse["Parse first request"] + Parse --> L4["Evaluate network policy"] + L4 --> Allowed{"Allowed?"} + Allowed -- No --> Deny["HTTP denial"] + Allowed -- Yes --> L7{"Matching L7 endpoint?"} + L7 -- Yes --> Eval["Evaluate REST/GraphQL/WebSocket policy"] + Eval --> Rewrite["Rewrite to origin-form + configured credentials"] + L7 -- No --> Rewrite + Rewrite --> Close["Force Connection: close except WebSocket upgrade"] + Close --> Upstream["Open upstream"] + Upstream --> Relay["Guarded HTTP relay / upgrade relay"] +``` + +The latest main branch no longer has the old raw-copy-after-first-request shape +for ordinary forward HTTP. It rewrites ordinary requests with `Connection: +close`, uses guarded HTTP relay helpers for body handling, and sends allowed +WebSocket upgrades through the same upgrade relay. That is a narrower surface +than the historical bidirectional copy, but it is still implemented separately +from the CONNECT relay path. 
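
To make the rewrite step concrete, here is a minimal sketch of turning an absolute-form request head into origin-form with `Connection: close`. It is not the code on main: `rewrite_forward_head` is a hypothetical helper, it covers only ordinary `http://` requests, and it ignores the WebSocket upgrade exception and credential injection.

```rust
/// Rewrite an absolute-form request head into origin-form and force
/// `Connection: close`. Returns `None` when the target is not an `http://` URI.
pub fn rewrite_forward_head(request_line: &str, headers: &[(String, String)]) -> Option<String> {
    let mut parts = request_line.split(' ');
    let (method, target, version) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() {
        return None;
    }
    let rest = target.strip_prefix("http://")?;
    // The authority ends at the first '/' or '?', or at the end of the target.
    let split = rest.find(|c: char| c == '/' || c == '?').unwrap_or(rest.len());
    let (authority, path) = rest.split_at(split);
    if authority.is_empty() {
        return None;
    }
    let path = if path.is_empty() {
        "/".to_string()
    } else if path.starts_with('?') {
        format!("/{path}")
    } else {
        path.to_string()
    };
    let mut out = format!("{method} {path} {version}\r\nHost: {authority}\r\n");
    for (name, value) in headers {
        // Drop the client's Host and any header that could keep the connection open.
        let lower = name.to_ascii_lowercase();
        if matches!(lower.as_str(), "host" | "connection" | "proxy-connection" | "keep-alive") {
            continue;
        }
        out.push_str(&format!("{name}: {value}\r\n"));
    }
    out.push_str("Connection: close\r\n\r\n");
    Some(out)
}
```

Because the head always ends with `Connection: close`, the upstream should not expect a second request on the same connection, which is the single-request property the guarded relay relies on.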
+ +## Current Network Namespace Enforcement + +```mermaid +flowchart TD + Start["Process in sandbox network namespace"] --> Dest{"Destination"} + Dest -- "Proxy host_ip:port" --> Proxy["Accept to sandbox proxy"] + Dest -- "Loopback" --> Loopback["Accept loopback"] + Dest -- "Established/related" --> Established["Accept response packet"] + Dest -- "Other TCP/UDP" --> Reject["nftables log + reject"] + Reject --> Monitor["Bypass monitor reads dmesg"] + Monitor --> OCSF["OCSF network + detection events"] +``` + +The sandbox now installs an `inet` nftables filter table for bypass +enforcement. The table accepts proxy-bound traffic, loopback, and established +flows, then rejects and optionally logs other TCP/UDP traffic. It does not +currently redirect native TCP connections into the proxy. + +## Current Local Service Shape + +```mermaid +flowchart TD + Request["Request to local name"] --> Match{"Known local route?"} + Match -- "inference.local" --> Inference["Inference routing logic"] + Match -- "policy.local" --> Policy["Policy local logic"] + Match -- No --> External["Normal egress path"] + Inference --> LocalResponse["Local response"] + Policy --> LocalResponse +``` + +Local routes are userland-facing proxy surfaces. They should stay distinct from +external egress while still fitting the adapter model. + +## Findings To Preserve + +### Invariant: forward proxy must not relay unevaluated follow-on HTTP bytes + +The historical forward path evaluated at most the first absolute-form request, +rewrote it, then switched to bidirectional copy. Bytes already buffered after +the first header block, or later pipelined requests on the same client/upstream +connection, could reach upstream without the CONNECT L7 relay's per-request +parser/evaluator. + +Latest main mitigates this by forcing ordinary forward HTTP to one request per +connection and by using guarded relay helpers. 
The adapter model should +preserve the invariant either by keeping forward HTTP single-request/close or +by passing the first parsed request into a shared HTTP relay loop. + +### Endpoint config is not tied to deterministic matched policy + +The policy name used for L4 authorization and logging can be selected through a +different precedence rule than endpoint metadata. With overlapping host, port, +and binary rules, allowed IPs, TLS behavior, enforcement, and +`allow_encoded_slash` can come from a different endpoint than the policy name +logged and used for L4 allow. + +The adapter model requires authorization to return one decision with one +deterministic matched endpoint. + +### Endpoint metadata query failures fail open to L4 behavior + +If endpoint metadata lookup fails, callers can interpret the result as no L7 +configuration and downgrade to credential-only or raw L4 relay. + +The adapter model treats endpoint metadata as part of the authorization result. +Failure to materialize required metadata should deny rather than erase extended +configuration. + +### Control-plane port block only applies on one resolution path + +Blocked control-plane ports are enforced inside one allowed-IPs validation +path, while the normal host-based path uses a different validation route. + +The adapter model moves resolution, allowed IP checks, SSRF checks, and +control-plane port blocks into shared destination validation. + +## Existing Feature Inventory + +The refactor should preserve: + +- CONNECT explicit proxy support. +- Forward HTTP explicit proxy support. +- nftables bypass reject/log enforcement. +- Provider credential injection and redaction. +- REST request-body credential rewrite. +- WebSocket text-frame credential rewrite. +- REST endpoint method/path policy. +- GraphQL L7 policy. +- WebSocket transport and GraphQL-over-WebSocket policy. +- Inference routing through `inference.local`. +- Agent-facing policy routes through `policy.local`. 
+- Timeout and resource tracking for client, upstream, and local service work. +- Structured OCSF logging for network and HTTP policy outcomes. +- SSRF and internal address protections. +- Control-plane port protection. +- `allowed_ips` endpoint restrictions. +- TLS termination for inspectable client connections. diff --git a/rfc/0004-sandbox-proxy-egress-adapter/implementation-plan.md b/rfc/0004-sandbox-proxy-egress-adapter/implementation-plan.md new file mode 100644 index 000000000..94ba53b7f --- /dev/null +++ b/rfc/0004-sandbox-proxy-egress-adapter/implementation-plan.md @@ -0,0 +1,127 @@ +# Implementation Plan + +This plan is intentionally separate from the main RFC so the proposal can stay +direction-focused. + +## Phase 0 - Regression Tests + +- Add tests for forward HTTP pipelining and keep-alive follow-on requests, + including the current `Connection: close` mitigation. +- Add tests for overlapping endpoint metadata selection. +- Add tests for endpoint metadata query failures. +- Add tests for control-plane port blocking through all destination validation + paths. +- Add nftables bypass enforcement tests that verify proxy-bound traffic is + accepted while direct TCP/UDP egress is rejected and logged when available. + +## Phase 1 - Authorization Result + +- Introduce `EgressIntent` and `EgressDecision`. +- Make authorization return matched policy and matched endpoint metadata + together. +- Fail closed when required endpoint metadata cannot be materialized. +- Emit consistent OCSF network denial events from the shared boundary. + +## Phase 2 - Shared Destination Validation + +- Move DNS resolution, allowed IP filtering, SSRF checks, and control-plane port + checks into one destination validation path. +- Return an `UpstreamConnector` rather than an opened upstream socket. +- Add tests proving CONNECT, forward HTTP, and transparent TCP use the same + validation behavior. 
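
The Phase 2 checks could be combined roughly as follows. This is a sketch under stated assumptions: DNS resolution is left out and resolved addresses are passed in, the blocked port list is supplied by the caller, and treating an endpoint's allowed IP list as authoritative over the internal-address check is a guess rather than documented behavior. All names are hypothetical, and `ValidatedDestination` stands in for the data an `UpstreamConnector` would hold.

```rust
use std::net::{IpAddr, SocketAddr};

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    ControlPlanePort(u16),
    NoPermittedAddress,
}

/// Validated dial targets; no socket is opened here.
#[derive(Debug, PartialEq)]
pub struct ValidatedDestination {
    pub addrs: Vec<SocketAddr>,
}

/// Internal address ranges rejected by the SSRF check (illustrative set).
fn is_internal(ip: &IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
        }
        IpAddr::V6(v6) => {
            let first = v6.segments()[0];
            v6.is_loopback()
                || v6.is_unspecified()
                || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
                || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
        }
    }
}

pub fn validate_destination(
    resolved: &[IpAddr],
    port: u16,
    allowed_ips: Option<&[IpAddr]>,
    blocked_ports: &[u16],
) -> Result<ValidatedDestination, ValidationError> {
    // The port block runs first so it applies on every path.
    if blocked_ports.contains(&port) {
        return Err(ValidationError::ControlPlanePort(port));
    }
    let addrs: Vec<SocketAddr> = resolved
        .iter()
        .copied()
        .filter(|ip| match allowed_ips {
            Some(list) => list.contains(ip),
            None => !is_internal(ip),
        })
        .map(|ip| SocketAddr::new(ip, port))
        .collect();
    if addrs.is_empty() {
        return Err(ValidationError::NoPermittedAddress);
    }
    Ok(ValidatedDestination { addrs })
}
```

Putting the port check ahead of the address filtering is what keeps it from being tied to one resolution path.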
+ +## Phase 3 - Forward HTTP Adapter + +- Convert forward HTTP into an adapter that parses the first absolute-form + request and builds an egress intent. +- Route the parsed first request into the shared HTTP relay or preserve the + current guarded single-request relay behavior. +- Keep the no-raw-copy invariant after the first request. + +## Phase 4 - HTTP And WebSocket Relay Consolidation + +- Centralize HTTP request parsing, REST policy, GraphQL policy, WebSocket + upgrade policy, credential resolution, redaction, request rewrite, upstream + dial, and response relay. +- Evaluate every HTTP request before upstream write. +- Ensure denied HTTP requests do not create upstream TCP sessions. +- Preserve opt-in REST request-body credential rewrite behind the shared HTTP + relay, including bounded buffering, supported content-type handling, + `Content-Length` recomputation, and fail-closed unresolved placeholders. +- Preserve WebSocket upgrade handling behind the shared relay, including + opt-in client-to-server text-frame credential rewrite, WebSocket transport + message policy, GraphQL-over-WebSocket policy, and raw passthrough for other + upgraded protocols. + +## Phase 5 - Shared TLS Termination + +- Move client-side TLS detection and termination before the HTTP/TCP relay + split. +- Keep endpoint TLS behavior on `EgressDecision`. +- Remove duplicate HTTP-specific and TCP-specific TLS termination decisions. + +## Phase 6 - TCP Relay And Parser Boundary + +- Rename raw TCP relay concepts to `TcpRelay`. +- Add a TCP application parser dispatch point for future protocol enforcement. +- Keep `protocol: tcp` as L4 authorization plus byte copy. +- Let TCP application parsers own their message loop and call the connector + when protocol state allows. + +## Phase 7 - Policy DNS And Transparent TCP + +- Add policy DNS registration for native TCP endpoint names. +- Replace static host-file mapping with query-driven DNS answers. 
+- Publish active DNS answer state and capture rules. +- Implement nftables REDIRECT/TPROXY capture rules ahead of the bypass reject + path; do not add a parallel iptables path. +- Implement transparent TCP adapter lookup from captured original destination + to active endpoint generation. +- Decide TTL and stale-generation behavior. + +## Phase 8 - Local Service Adapters + +- Model `inference.local` as a local adapter with TLS termination, route + validation, provider auth injection, streaming limits, and OCSF logging. +- Model `policy.local` as a local adapter for current policy, bounded denial + summaries, and policy proposals. +- Keep both paths outside normal external egress relay. + +## Phase 9 - Runtime Boundary + +- Keep embedded mode for the first migration. +- Define the proxy runtime API needed for a future standalone binary: + configured listeners, policy updates, gateway calls, telemetry, and shutdown. +- Identify process identity requirements for standalone and sidecar modes. + +## Phase 10 - Cleanup + +- Remove duplicated endpoint metadata queries from relay paths. +- Remove duplicated deny rendering where adapters can own response shape. +- Remove any remaining forward HTTP raw-copy fallback. +- Update architecture docs once implementation lands. + +## Testing Plan + +- Unit-test each adapter's intent construction and deny response shape. +- Unit-test authorization precedence for overlapping policy and endpoint rules. +- Integration-test shared destination validation across CONNECT, forward HTTP, + and transparent TCP. +- Integration-test HTTP keep-alive and pipelined requests with REST, GraphQL, + and WebSocket upgrade enforcement. +- Integration-test credential injection in L4-only HTTP and HTTP-inspected + paths. +- Integration-test REST request-body credential rewrite for JSON, + form-url-encoded, `text/*`, unsupported content types, chunked framing, body + caps, and unresolved placeholders. 
+- Integration-test WebSocket text-frame credential rewrite, raw upgraded + passthrough, WebSocket message policy, GraphQL-over-WebSocket policy, and + safe compression negotiation. +- Integration-test TLS termination before HTTP/TCP relay split. +- Integration-test `protocol: tcp` byte-copy behavior. +- Add parser harness tests before adding Redis, Postgres, or similar TCP + application parsers. +- Integration-test policy DNS TTL, stale generation handling, and captured + connect correlation. +- Integration-test `inference.local` and `policy.local` body limits, timeout + behavior, redaction, and local denial responses. diff --git a/rfc/0004-sandbox-proxy-egress-adapter/technical-design.md b/rfc/0004-sandbox-proxy-egress-adapter/technical-design.md new file mode 100644 index 000000000..b13e259f4 --- /dev/null +++ b/rfc/0004-sandbox-proxy-egress-adapter/technical-design.md @@ -0,0 +1,259 @@ +# Technical Design Appendix + +This appendix carries the implementation-level design details behind the main +RFC. + +## Shared Data Boundaries + +### EgressIntent + +`EgressIntent` is the normalized description of what userland is trying to do. + +It should carry: + +- entry transport: CONNECT, forward HTTP, transparent TCP, or local HTTP; +- requested destination host/port or captured original IP/port; +- process identity inputs collected by the adapter/runtime; +- optional first HTTP request for forward proxy traffic; +- optional local service route. + +Adapters build intents. They should not query endpoint metadata or select +relays. + +### EgressDecision + +`EgressDecision` is the policy result consumed by validation and relay code. + +It should carry: + +- allow or deny; +- deterministic matched policy identifier; +- deterministic matched endpoint identifier and endpoint metadata; +- process identity used for evaluation; +- destination and allowed IP constraints; +- TLS behavior; +- protocol enforcement; +- logging context and denial reason. 
+
+Relay code should read this decision. It should not query OPA again for
+endpoint metadata, TLS mode, allowed IPs, or parser selection.
+
+## Protocol Enforcement
+
+Use a protocol enforcement value derived from endpoint policy:
+
+| Policy protocol | Enforcement | Relay behavior |
+|-----------------|-------------|----------------|
+| omitted / `tcp` | None | L4 authorization plus byte relay, with optional HTTP sniff for credential injection |
+| `rest` | HTTP | HTTP request parser with REST rules, plus opt-in request-body and WebSocket text-frame credential rewrite |
+| `graphql` | HTTP | HTTP request parser with GraphQL rules |
+| `websocket` | HTTP | HTTP upgrade policy followed by WebSocket frame policy or GraphQL-over-WebSocket policy |
+| future `redis`, `postgres`, `mysql`, ... | TCP application | Protocol-specific TCP parser owns the message loop |
+
+`protocol: tcp` is effectively the default L4 mode. It should not run TCP
+application parsers.
+
+Avoid using the term "provider" for these parser concepts because providers
+are already a first-class credential and routing domain in OpenShell.
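The protocol table above can be read as one mapping from the endpoint's `protocol` value to an enforcement class. The following is a minimal sketch of that mapping, not the proposed implementation: `Enforcement`, `HttpFamily`, `enforcement_for`, and the `known_tcp_parsers` registry are hypothetical names, and rejecting an unknown protocol value (instead of downgrading it to L4) is an assumption.

```rust
/// Hypothetical HTTP-family protocols from the table above.
#[derive(Debug, PartialEq, Eq)]
enum HttpFamily {
    Rest,
    Graphql,
    Websocket,
}

/// Hypothetical, simplified enforcement classes.
#[derive(Debug, PartialEq, Eq)]
enum Enforcement {
    /// L4 authorization plus byte relay.
    None,
    /// HTTP request parsing with protocol-specific rules.
    Http(HttpFamily),
    /// A protocol-specific TCP parser owns the message loop.
    TcpApplication(String),
}

/// Map the optional policy `protocol` value to an enforcement class.
/// `known_tcp_parsers` stands in for whatever parser registry exists
/// (an assumption). Unknown values are rejected, not downgraded to L4.
fn enforcement_for(
    protocol: Option<&str>,
    known_tcp_parsers: &[&str],
) -> Result<Enforcement, String> {
    match protocol {
        // Omitted and `tcp` are the same default L4 mode.
        None | Some("tcp") => Ok(Enforcement::None),
        Some("rest") => Ok(Enforcement::Http(HttpFamily::Rest)),
        Some("graphql") => Ok(Enforcement::Http(HttpFamily::Graphql)),
        Some("websocket") => Ok(Enforcement::Http(HttpFamily::Websocket)),
        Some(other) if known_tcp_parsers.contains(&other) => {
            Ok(Enforcement::TcpApplication(other.to_string()))
        }
        Some(other) => Err(format!("unsupported protocol: {other}")),
    }
}
```

Returning an error for unrecognized values keeps a typo in policy from silently becoming plain byte relay; whether that check belongs at policy load or at decision time is left open here.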
+
+## Suggested Types
+
+The exact Rust shape can evolve, but the boundaries should look like this:
+
+```rust
+enum EgressTransport {
+    Connect,
+    ForwardHttp,
+    TransparentTcp,
+    LocalHttp,
+}
+
+struct EgressIntent {
+    transport: EgressTransport,
+    destination: RequestedDestination,
+    process: ProcessIdentity,
+    first_request: Option<FirstHttpRequest>, // placeholder type name
+    local_route: Option<LocalServiceRoute>, // placeholder type name
+}
+
+struct EgressDecision {
+    outcome: PolicyOutcome,
+    matched_policy: Option<PolicyId>, // placeholder type name
+    endpoint: Option<MatchedEndpoint>,
+    log_context: EgressLogContext,
+}
+
+struct MatchedEndpoint {
+    id: EndpointId,
+    allowed_ips: AllowedIpPolicy,
+    tls: TlsPolicy,
+    enforcement: ProtocolEnforcement,
+}
+
+enum ProtocolEnforcement {
+    None,
+    Http(HttpL7Config),
+    TcpApplication(TcpApplicationConfig),
+}
+
+enum HttpL7Protocol {
+    Rest,
+    Graphql,
+    Websocket,
+}
+
+struct HttpL7Config {
+    protocol: HttpL7Protocol,
+    allow_encoded_slash: bool,
+    websocket_credential_rewrite: bool,
+    request_body_credential_rewrite: bool,
+    websocket_graphql_policy: bool,
+}
+
+struct RelayContext {
+    decision: EgressDecision,
+    connector: UpstreamConnector,
+    deadlines: RelayDeadlines,
+    telemetry: RelayTelemetry,
+}
+```
+
+`UpstreamConnector` is the relay-owned dial boundary. It encapsulates the
+validated destination and lets relays/parsers open an upstream connection only
+after protocol policy allows it.
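One way to make "encapsulates the validated destination" concrete is to let validation be the only constructor of the connector. The sketch below is illustrative only: the `validated` and `targets` methods, the `Option` allowlist, and the reduced SSRF check (loopback and unspecified addresses only) are assumptions, not the RFC's design.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical connector: it can only be built through validation, so a
/// relay never holds an address that skipped the destination checks.
#[derive(Debug)]
struct UpstreamConnector {
    targets: Vec<SocketAddr>, // private: relays cannot add addresses
}

impl UpstreamConnector {
    /// Keep only resolved addresses that pass a simplified SSRF check and,
    /// when the endpoint carries an allowlist, appear in it.
    /// `allowed == None` means "no IP constraint" (an assumption).
    fn validated(
        resolved: &[IpAddr],
        allowed: Option<&[IpAddr]>,
        port: u16,
    ) -> Result<Self, String> {
        let targets: Vec<SocketAddr> = resolved
            .iter()
            .copied()
            .filter(|ip| !ip.is_loopback() && !ip.is_unspecified())
            .filter(|ip| allowed.map_or(true, |list| list.contains(ip)))
            .map(|ip| SocketAddr::new(ip, port))
            .collect();
        if targets.is_empty() {
            // Fail closed: nothing survived validation.
            return Err("no resolved address passed destination validation".to_string());
        }
        Ok(Self { targets })
    }

    /// Addresses a relay or parser may dial once protocol policy allows it.
    fn targets(&self) -> &[SocketAddr] {
        &self.targets
    }
}
```

Because the field is private and there is no other constructor, a relay that receives an `UpstreamConnector` can dial only what the destination validator approved.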
+
+## Module Layout
+
+A future split could look like:
+
+| Module | Responsibility |
+|--------|----------------|
+| `proxy::adapter::connect` | Parse CONNECT and render CONNECT responses |
+| `proxy::adapter::forward_http` | Parse absolute-form HTTP and preserve first request |
+| `proxy::adapter::transparent_tcp` | Recover captured original destination |
+| `proxy::adapter::policy_dns` | Answer eligible DNS queries and publish active mappings |
+| `proxy::adapter::local` | Implement `inference.local` and `policy.local` surfaces |
+| `proxy::auth` | Build decisions from intents and OPA results |
+| `proxy::destination` | Resolve, filter, and validate destinations |
+| `proxy::netfilter` | Own nftables bypass and future transparent capture rules |
+| `proxy::relay::http` | HTTP request loop, credentials, REST/GraphQL/WebSocket upgrade policy |
+| `proxy::relay::websocket` | WebSocket frame validation, text-frame rewrite, and message policy |
+| `proxy::relay::tcp` | TCP byte relay and TCP application parser dispatch |
+| `proxy::relay::tls` | Shared client-side TLS termination |
+| `proxy::parser` | HTTP, WebSocket, and TCP application parser traits/config |
+| `proxy::telemetry` | OCSF and tracing helpers |
+
+## Policy DNS And Resolved TCP State
+
+Policy DNS should be query-driven rather than a static `/etc/hosts` snapshot.
+
+1. Policy load registers eligible native TCP endpoint names.
+2. Userland performs DNS lookup.
+3. Policy DNS checks whether the name is registered for native TCP.
+4. Policy DNS resolves through trusted upstream DNS.
+5. Answers are filtered against endpoint metadata and SSRF controls.
+6. The adapter publishes the DNS answer, endpoint generation, and capture rule.
+7. Userland later calls `connect(ip:port)`.
+8. Transparent TCP recovers the original destination and maps it to the active
+   endpoint generation.
+9. Normal egress authorization and relay selection run.
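Steps 6 and 8 above imply a small piece of shared state: policy DNS writes mappings and transparent TCP reads them. The sketch below shows one possible shape for it. All names (`ResolvedEndpoints`, `ActiveMapping`, `publish`, `lookup`) are hypothetical, and treating expired or stale-generation entries as misses is an assumption, since TTL and stale-generation behavior are still to be decided.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// One published DNS answer for a policy-eligible native TCP endpoint.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ActiveMapping {
    endpoint: String, // hypothetical endpoint identifier
    generation: u64,  // policy generation that produced the answer
    expires_at: Instant,
}

/// Hypothetical resolved endpoint store: filled by policy DNS answers
/// (step 6) and read by transparent TCP connects (step 8).
#[derive(Default)]
struct ResolvedEndpoints {
    current_generation: u64,
    by_destination: HashMap<(IpAddr, u16), ActiveMapping>,
}

impl ResolvedEndpoints {
    /// Record a filtered DNS answer under the current generation.
    fn publish(&mut self, endpoint: &str, ip: IpAddr, port: u16, ttl: Duration, now: Instant) {
        let mapping = ActiveMapping {
            endpoint: endpoint.to_string(),
            generation: self.current_generation,
            expires_at: now + ttl,
        };
        self.by_destination.insert((ip, port), mapping);
    }

    /// A policy reload bumps the generation; older mappings stop matching.
    fn advance_generation(&mut self) {
        self.current_generation += 1;
    }

    /// Map a captured original destination back to its endpoint.
    /// Assumption: expired or stale-generation entries are misses.
    fn lookup(&self, ip: IpAddr, port: u16, now: Instant) -> Option<&ActiveMapping> {
        self.by_destination
            .get(&(ip, port))
            .filter(|m| m.generation == self.current_generation && now < m.expires_at)
    }
}
```

A miss here would leave the connection to the nftables reject path or to a deny decision; the sketch does not model capture-rule installation, which would need to change together with these entries.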
+
+The resolved endpoint store is therefore not a preemptive global DNS snapshot.
+It is active state produced by policy-eligible lookups and consumed by
+transparent TCP connects.
+
+## nftables Boundary
+
+Current main uses nftables, not iptables, for sandbox network bypass
+enforcement. The installed `inet` table accepts traffic to the sandbox proxy,
+loopback, and established/related flows, then rejects and optionally logs other
+TCP/UDP traffic. The bypass monitor reads those log lines and emits OCSF
+network and detection events.
+
+Transparent TCP capture should build on this same nftables substrate:
+
+- capture rules must run before the generic bypass reject rules;
+- capture rules should be scoped to active policy DNS IP/port mappings;
+- capture state should be updated atomically with endpoint generation changes;
+- reject/log rules remain the fallback for unmatched TCP/UDP egress;
+- VM or Podman driver nftables rules are infrastructure NAT/isolation and
+  should not be treated as the proxy policy enforcement point.
+
+## Endpoint Selection And OPA
+
+OPA/Rego should return policy and endpoint metadata through one deterministic
+authorization result. It should not let policy name and endpoint config be
+selected by different precedence rules.
+
+Two acceptable approaches:
+
+- Reject overlapping endpoint metadata at load or merge time.
+- Define a single deterministic precedence key and use it for both policy name
+  and endpoint metadata.
+
+Endpoint metadata query failures should fail closed when metadata is required
+for the selected endpoint. They should not silently downgrade to L4 behavior.
+
+## Credential Injection Boundary
+
+Credential injection belongs in the HTTP relay:
+
+1. Authorization selects the endpoint and confirms credentials may be used.
+2. The HTTP relay resolves credentials only when it has an allowed HTTP request.
+3. Secrets are redacted from logs and policy-visible metadata.
+4. The final upstream request or frame is rewritten with real credentials
+   immediately before write.
+
+Both L4-only HTTP and HTTP-inspected paths can inject credentials. The
+difference is whether REST, GraphQL, or WebSocket policy is evaluated before
+the rewrite.
+
+Credential rewrite slots should be explicit:
+
+- request target, query values, and headers for HTTP-family traffic;
+- REST request bodies only when `request_body_credential_rewrite` is enabled;
+- client-to-server WebSocket text frames only when
+  `websocket_credential_rewrite` is enabled;
+- GraphQL-over-WebSocket connection/control messages when they are carried in
+  text frames and the endpoint enables the WebSocket rewrite path.
+
+Request-body rewrite is REST-only. It should buffer bounded UTF-8 textual
+bodies, including JSON, form-url-encoded, and `text/*`, recompute
+`Content-Length`, preserve unsupported bodies that contain no reserved
+credential markers, and fail closed when a reserved placeholder cannot be
+resolved safely. Binary WebSocket frames are not rewritten.
+
+## Parser Boundary
+
+Protocol parsers operate on streams owned by the relay.
+
+- HTTP parsing converts bytes into request metadata, evaluates request policy,
+  and loops for keep-alive or pipelined requests.
+- WebSocket parsing starts only after an allowed HTTP upgrade. It validates the
+  handshake/frame stream and owns client-to-server text-frame inspection when
+  credential rewrite, transport message policy, GraphQL-over-WebSocket policy,
+  or compression handling is configured.
+- TCP application parsers read client and upstream streams as needed and own
+  their message loop.
+- A TCP parser can deny before dialing, dial for a server handshake, or keep
+  evaluating commands/queries throughout the session.
+
+This avoids a separate dial strategy enum. The parser knows which protocol
+milestone is sufficient to call the validated connector.
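To illustrate why no dial strategy enum is needed, here is a hedged sketch of a parser that decides for itself when to call the connector. The `Connector` trait, `CommandParser`, `Step`, and the line-based command protocol are all invented for illustration. The parser denies a disallowed command without ever dialing and opens the upstream on the first allowed one.

```rust
/// Hypothetical stand-in for the validated connector handed to relays.
trait Connector {
    type Upstream;
    fn dial(&mut self) -> Result<Self::Upstream, String>;
}

/// Outcome of feeding one client message to a parser.
#[derive(Debug, PartialEq, Eq)]
enum Step {
    Deny(&'static str),
    Forward,
}

/// Hypothetical TCP application parser for a line-based protocol.
/// It dials lazily: only once a client command has passed policy.
struct CommandParser<C: Connector> {
    connector: C,
    upstream: Option<C::Upstream>,
    allowed: Vec<&'static str>,
}

impl<C: Connector> CommandParser<C> {
    fn new(connector: C, allowed: Vec<&'static str>) -> Self {
        Self { connector, upstream: None, allowed }
    }

    /// Evaluate one client command; dial on the first allowed command.
    fn on_client_command(&mut self, line: &str) -> Result<Step, String> {
        let verb = line.split_whitespace().next().unwrap_or("");
        if !self.allowed.iter().any(|a| a.eq_ignore_ascii_case(verb)) {
            // Denied before any upstream connection exists.
            return Ok(Step::Deny("command not allowed by policy"));
        }
        if self.upstream.is_none() {
            // The protocol milestone is reached: open the upstream now.
            self.upstream = Some(self.connector.dial()?);
        }
        Ok(Step::Forward)
    }
}

/// Test double that counts dial attempts instead of opening a socket.
struct CountingConnector {
    dials: usize,
}

impl Connector for CountingConnector {
    type Upstream = ();
    fn dial(&mut self) -> Result<(), String> {
        self.dials += 1;
        Ok(())
    }
}
```

A parser for a protocol where the server speaks first would call `dial` at the start of its loop instead; the connector interface stays the same in both cases.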
+
+## Timeout And Resource Ownership
+
+| Owner | Resource |
+|-------|----------|
+| Adapter | Client-side parse timeout and adapter-specific deny response |
+| Authorization | OPA deadline and policy evaluation telemetry |
+| Destination validator | DNS timeout, allowed IP checks, SSRF checks, control-plane port checks |
+| TLS terminator | Client TLS handshake timeout and certificate selection |
+| HTTP relay | Per-request read/write deadlines, body caps, request-body rewrite caps, upstream reuse |
+| WebSocket relay | Upgrade validation, frame limits, text-frame rewrite, compression limits, message policy |
+| TCP relay | Byte-copy idle timeout and half-close handling |
+| TCP parser | Protocol message timeouts and parser-specific limits |
+| Local service adapter | Local route body limits, response caps, gateway call timeout |
+
+Timeouts should be recorded in telemetry at the owner boundary that can explain
+the failure.