feat(policy): add agentic approval loop#1528

Draft
zredlined wants to merge 6 commits into main from 1097-agentic-policy-approval-loop

zredlined (Collaborator) commented May 22, 2026

Summary

Ships the agentic policy approval loop end-to-end. When the sandbox denies a network request, an agent inside the sandbox can propose a narrow policy refinement, and the gateway runs a formal prover against the merged-policy delta. Safe proposals (no new findings) auto-approve in ~1s; risky ones land in pending with structured evidence the reviewer can act on. The agent waits on a socket, so no LLM tokens are spent during human review.

This is the loop the platform has been building toward: agents do the narrowing work, the prover catches changes the operator should know about, and the audit trail makes every approval reconstructable.

Closes #1097
Refs #1062
Refs #1532

What this PR ships

The loop. Sandbox denial → agent reads /etc/openshell/skills/policy_advisor.md → agent POSTs a narrow proposal to policy.local → gateway runs the prover → either auto-approve (empty delta) or pending (any finding) → on approval, sandbox hot-reloads → agent retries.

Prover wired in as the auto-approval referee. Every proposal (mechanistic and agent-authored alike) runs through openshell-prover. The prover answers four categorical questions about the proposed change — see What the prover decides. The gateway computes the delta vs the baseline policy and the auto-approval gate fires only when the delta is empty.
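
The delta-and-gate step described above can be sketched as a set difference followed by an emptiness check. This is a minimal illustration, not the code in openshell-server: the `Finding` shape and the function names `finding_delta` and `gate_allows_auto_approval` are assumptions made for the example; only the four category names come from this PR.

```rust
use std::collections::BTreeSet;

/// The four categories named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Category {
    LinkLocalReach,
    L7BypassCredentialed,
    CredentialReachExpansion,
    CapabilityExpansion,
}

/// Hypothetical finding shape: a category plus the path it was raised on.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Finding {
    pub category: Category,
    pub binary: String,
    pub host: String,
    pub port: u16,
}

/// Findings raised under the merged policy that the baseline did not already raise.
pub fn finding_delta(
    baseline: &BTreeSet<Finding>,
    merged: &BTreeSet<Finding>,
) -> BTreeSet<Finding> {
    merged.difference(baseline).cloned().collect()
}

/// The gate is binary: it fires only when the delta is empty.
pub fn gate_allows_auto_approval(delta: &BTreeSet<Finding>) -> bool {
    delta.is_empty()
}
```

Under this sketch a finding that already exists in the baseline never blocks a proposal; only findings the proposal introduces do.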

Providers-v2 in the loop. The prover validates against the effective policy — provider profiles composed in via providers-v2 are part of the model the prover reasons over. Agent-authored chunks for endpoints a provider profile covers land as their own rules (Fix A in merge.rs) instead of getting silently absorbed into the provider rule, so the prover sees the agent's narrow contribution honestly.

Default-deny posture preserved. Auto-approval is opt-in via the proposal_approval_mode runtime setting, at gateway scope (openshell settings set --global proposal_approval_mode auto) or sandbox scope (openshell settings set <name> proposal_approval_mode auto); gateway scope wins. The default is manual: when the setting is unset or "manual", every proposal routes to human review regardless of the prover verdict. The CLI exposes a shorthand at create time, openshell sandbox create --approval-mode <manual|auto>, which writes the sandbox-scoped setting post-create. The audit event carries resolved_from=<gateway|sandbox|default> so operators can see why a given approval was auto vs manual.
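
The precedence rule can be sketched as a small resolver. The type and function names here are hypothetical, and one behavior is an assumption rather than something this PR states: an unrecognized value at gateway scope is treated as still winning over sandbox scope while resolving to manual.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalMode {
    Manual,
    Auto,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResolvedFrom {
    Gateway,
    Sandbox,
    Default,
}

/// Only the exact string "auto" opts in; "manual" and unknown values stay manual.
fn parse_mode(raw: &str) -> ApprovalMode {
    if raw == "auto" {
        ApprovalMode::Auto
    } else {
        ApprovalMode::Manual
    }
}

/// Gateway scope wins over sandbox scope; with neither set, the default is manual.
/// Assumption: a set-but-unknown gateway value still takes precedence.
pub fn resolve_approval_mode(
    gateway: Option<&str>,
    sandbox: Option<&str>,
) -> (ApprovalMode, ResolvedFrom) {
    match (gateway, sandbox) {
        (Some(g), _) => (parse_mode(g), ResolvedFrom::Gateway),
        (None, Some(s)) => (parse_mode(s), ResolvedFrom::Sandbox),
        (None, None) => (ApprovalMode::Manual, ResolvedFrom::Default),
    }
}
```

For example, `resolve_approval_mode(Some("manual"), Some("auto"))` yields manual resolved from the gateway, which is how a fleet-wide pin overrides a per-sandbox opt-in.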

Demo that walks the full loop. examples/agent-driven-policy-management/demo.sh runs a Codex agent through a two-path flow against a local gateway. One un-credentialed action auto-approves silently. One credentialed action escalates with a categorical finding; demo.sh approves it on the reviewer's behalf, the agent retries, and the file lands in GitHub. The run takes ~50–110s end-to-end with one human-visible escalation, exactly the kind the prover cannot decide unilaterally.

Reconstructable audit. Every auto-approval emits a CONFIG:APPROVED OCSF event with unmapped fields auto=true, source=<mechanistic|agent_authored>, prover_delta=empty, and resolved_from=<gateway|sandbox|default>. The chunk's persisted validation_result carries the categorical finding lines for human-reviewed approvals.
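
As a rough illustration of the audit payload, the sketch below assembles the four unmapped fields and the message text shown in the demo output. The helper names and the use of a string map are assumptions for the example; the real OCSF event builder is not shown in this PR description.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Mechanistic,
    AgentAuthored,
}

impl Source {
    pub fn as_str(self) -> &'static str {
        match self {
            Source::Mechanistic => "mechanistic",
            Source::AgentAuthored => "agent_authored",
        }
    }
}

/// Unmapped fields attached to a CONFIG:APPROVED event for an auto-approval.
/// `resolved_from` is expected to be "gateway", "sandbox", or "default".
pub fn auto_approval_unmapped(
    source: Source,
    resolved_from: &str,
) -> BTreeMap<&'static str, String> {
    let mut fields = BTreeMap::new();
    fields.insert("auto", "true".to_string());
    fields.insert("source", source.as_str().to_string());
    fields.insert("prover_delta", "empty".to_string());
    fields.insert("resolved_from", resolved_from.to_string());
    fields
}

/// Message text in the form the demo output shows.
pub fn auto_approval_message(source: Source) -> String {
    format!(
        "auto-approved: no new prover findings (source={})",
        source.as_str()
    )
}
```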

Provider profile tightening. providers/github.yaml defaults api.github.com from read-write to read-only. Writes (gh / git via REST) now flow through the agentic loop — the loop becomes the on-ramp to write access, and the prover audits each capability change.

What the prover decides

The prover answers four formal questions about each proposed change. Each "yes" is its own categorical finding — no severity grade. Any finding blocks auto-approval; empty delta means the change is provably safe under the model.

| Category | The prover detects |
| --- | --- |
| link_local_reach | Reach to a host in 169.254.0.0/16 or fe80::/10 (cloud-metadata range, serves credentials). |
| l7_bypass_credentialed | A binary using a wire protocol the L7 proxy cannot inspect (git-remote-https, ssh, nc) reaches a host where a credential is in scope. |
| credential_reach_expansion | A binary gains credentialed reach to a (host, port) it could not reach before. |
| capability_expansion | On a (binary, host, port) that already had credentialed reach, the policy adds a new HTTP method. Finding cites the specific method. |

Detail in crates/openshell-prover/README.md.
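
For the link_local_reach category, the two ranges in the table reduce to a prefix test on the address. The function below is a standalone sketch using only the Rust standard library; `is_link_local` is a name chosen for the example, and how the prover resolves hostnames to addresses is out of scope here.

```rust
use std::net::IpAddr;

/// True when `ip` falls in 169.254.0.0/16 (IPv4) or fe80::/10 (IPv6).
pub fn is_link_local(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            let octets = v4.octets();
            octets[0] == 169 && octets[1] == 254
        }
        // A /10 prefix fixes the top 10 bits of the first 16-bit segment.
        IpAddr::V6(v6) => (v6.segments()[0] & 0xffc0) == 0xfe80,
    }
}
```

Note that fe80::/10 covers first segments fe80 through febf, so a check against the literal `fe80` alone would miss part of the range.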

What the demo shows

==> Step 1 — un-credentialed reach (auto-approves)
   curl GET raw.githubusercontent.com/.../api.github.com.json
   prover: no findings (no credential in scope for the host)
   gateway: auto-approved in ~1s
   audit: "auto-approved: no new prover findings (source=agent_authored)"

==> Step 2 — credentialed capability change (escalates)
   curl PUT api.github.com/.../specific.md
   prover: credential_reach_expansion (or capability_expansion) on api.github.com:443
   gateway: pending — human review required
   demo.sh approves on behalf → agent retries → file lands in github

Acceptance criteria (deterministic, in tests)

  1. Un-credentialed reach auto-approves under auto-mode (zero findings, terminal status approved).
  2. Credentialed reach expansion lands in pending with credential_reach_expansion in validation_result.
  3. Capability expansion on an already-reached credentialed host lands in pending with capability_expansion citing the new method.
  4. Link-local reach lands in pending unconditionally with link_local_reach.
  5. L7-bypass binary with credential lands in pending with l7_bypass_credentialed.
  6. Implicit supersede works in both directions on (host, port, binary) overlap.
  7. Default approval mode is manual — an empty delta does NOT auto-approve when the proposal_approval_mode setting is unset at both scopes, set to "manual", or set to any unknown future value. Gateway scope wins over sandbox scope.
  8. Approval mode resolves through settings: gateway scope wins over sandbox scope, and CLI --approval-mode auto writes the sandbox-scoped setting after create.
  9. Auto-approval audit carries auto=true, source=<mode>, prover_delta=empty, and resolved_from=<gateway|sandbox|default> as unmapped OCSF fields.
  10. Agent-submitted rule names using the reserved _provider_ prefix are rejected at submit time.
  11. Categorical findings (no severity tiers) appear in validation_result.

All covered by unit and integration tests in crates/openshell-server/src/grpc/policy.rs::tests.

Testing

  • cargo test --workspace --lib — 534 gateway tests, all 16 crates green.
  • cargo clippy -p openshell-server -p openshell-cli -p openshell-core --all-targets -- -D warnings — clean.
  • cargo fmt --check — clean.
  • ./examples/agent-driven-policy-management/demo.sh runs end-to-end against the local Docker gateway and writes the demo file to GitHub.

Explicitly deferred (follow-up PRs)

  • LLM-based contextual review layered on top of the deterministic gate.
  • Intent files / per-sandbox config of "which findings auto-reject vs. escalate."
  • Credential scope modeling (read-only vs write-scoped tokens).
  • MCP as a third L7 surface (REST + GraphQL + MCP).
  • Per-binary credential isolation (binaries see only the credentials their policy authorizes).
  • L7 watch mode for L4 grants (record HTTP requests through approved L4 tunnels for later L4→L7 conversion).
  • Trust tiers per sandbox class (production sandboxes get tighter defaults).
  • Dedicated CONFIG:AUTO_APPROVED OCSF event class (today reuses CONFIG:APPROVED with auto=true unmapped).
  • User-facing docs page under docs/ for the agentic loop.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

zredlined added 4 commits May 19, 2026 11:07
Signed-off-by: Alexander Watson <zredlined@gmail.com>
…roval

Run the prover on every proposal regardless of analysis_mode. Auto-approve
proposals whose merged-policy delta is empty (proposer-agnostic, with the
global-policy gate respected). Calibrate prover findings to a single HIGH
severity emitted on link-local hosts, L4+credential-in-scope, and
bypass-L7-binary+credential-in-scope. Add implicit supersede on
(host, port, binary): newer submissions auto-reject older pending chunks,
and incoming mechanistic chunks auto-reject when an approved agent_authored
chunk already covers the same endpoint.

Audit auto-approvals via CONFIG:APPROVED OCSF events carrying auto=true,
source=<mode>, prover_delta=empty as unmapped fields, with message text
"auto-approved: no new prover findings". Build credential set from
sandbox-attached providers (presence only — no scope modeling in v1).
Signed-off-by: Alexander Watson <zredlined@gmail.com>
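
The implicit-supersede rules in the commit message above can be sketched as follows. All type and function names are hypothetical, and the ordering is an assumption: an incoming mechanistic chunk that loses to an approved agent_authored chunk is rejected before it can supersede anything.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Approved,
    Rejected,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Mechanistic,
    AgentAuthored,
}

#[derive(Debug, Clone)]
pub struct Chunk {
    pub id: u64,
    pub host: String,
    pub port: u16,
    pub binary: String,
    pub source: Source,
    pub status: Status,
}

/// Two chunks overlap when they share the (host, port, binary) key.
fn same_endpoint(a: &Chunk, b: &Chunk) -> bool {
    a.host == b.host && a.port == b.port && a.binary == b.binary
}

/// Applies implicit supersede and returns the status the incoming chunk starts in.
pub fn apply_supersede(existing: &mut [Chunk], incoming: &Chunk) -> Status {
    // An approved agent_authored chunk on the same endpoint beats a mechanistic one.
    let covered = existing.iter().any(|c| {
        c.status == Status::Approved
            && c.source == Source::AgentAuthored
            && same_endpoint(c, incoming)
    });
    if incoming.source == Source::Mechanistic && covered {
        return Status::Rejected;
    }
    // Otherwise the newer submission auto-rejects older pending chunks on the same key.
    for c in existing.iter_mut() {
        if c.status == Status::Pending && same_endpoint(c, incoming) {
            c.status = Status::Rejected;
        }
    }
    Status::Pending
}
```
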
The prover now answers four formal questions about a proposed policy
change and emits one finding per "yes" answer:

  - link_local_reach
  - l7_bypass_credentialed
  - credential_reach_expansion
  - capability_expansion

There is no severity grade. The category name is the signal; the
per-path evidence carries the structured detail. The auto-approval
gate is binary — empty delta or not. This removes the previous
HIGH/MEDIUM/CRITICAL severity tiers and the narrowness classifier
that was inconsistent across the access-shorthand / explicit-rules
boundary.

Gateway-side finding_delta gains category suppression:
capability_expansion paths whose (binary, host, port) appears in the
credential_reach_expansion delta are suppressed, so a brand-new
credentialed reach surfaces as one finding rather than one reach plus
N method findings.

The github provider profile now defaults api.github.com to read-only
(was: read-write). Writes flow through the agentic loop — the prover
audits each capability change rather than treating broad write access
as the default.

Demo, sandbox skill, and architecture docs updated to describe the
four-category model. Prover gains a README.md documenting the formal
queries, evidence shape, and how to add a new category.
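
The category suppression described in this commit can be sketched as a filter over the capability_expansion paths. The `Path` alias and the function name are assumptions for the example; the actual finding_delta types live in the gateway and are not shown here.

```rust
use std::collections::HashSet;

/// (binary, host, port), the key both categories are reported on.
pub type Path = (String, String, u16);

/// Drops capability_expansion entries whose path is already reported as a
/// brand-new credentialed reach, so the reviewer sees one finding, not 1 + N.
pub fn suppress_capability_findings(
    credential_reach_expansion: &[Path],
    capability_expansion: Vec<(Path, String)>,
) -> Vec<(Path, String)> {
    let new_reach: HashSet<&Path> = credential_reach_expansion.iter().collect();
    capability_expansion
        .into_iter()
        .filter(|(path, _method)| !new_reach.contains(path))
        .collect()
}
```

Capability findings on paths that already had credentialed reach pass through untouched, which keeps the per-method evidence for the case it was designed for.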
copy-pr-bot Bot commented May 22, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


zredlined added 2 commits May 22, 2026 13:45
…iasing

Move proposal_approval_mode out of SandboxSpec and into the existing runtime-mutable settings model so it can be flipped on a running sandbox and pinned fleet-wide via gateway scope. Precedence matches the rest of the settings model: gateway wins over sandbox, default is manual. The CLI's --approval-mode flag on `sandbox create` is now a shorthand that writes the sandbox-scoped setting post-create. Auto-approval audit events carry resolved_from=<gateway|sandbox|default>.

Reject agent proposals whose rule_name starts with `_provider_`. That namespace is reserved for provider-profile-synthesized rules; allowing agents to address them by name would bypass the merge guard that splits agent contributions into their own rule so the prover sees them honestly.
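
A submit-time check for the reserved namespace could look like the sketch below. The constant and function names, and the error text, are illustrative only; the PR states the rule (reject rule_name values starting with `_provider_`) but not its implementation.

```rust
/// Namespace reserved for provider-profile-synthesized rules.
pub const RESERVED_PREFIX: &str = "_provider_";

/// Rejects agent-submitted rule names that start with the reserved prefix.
pub fn validate_rule_name(rule_name: &str) -> Result<(), String> {
    if rule_name.starts_with(RESERVED_PREFIX) {
        return Err(format!(
            "rule name '{}' uses the reserved '{}' prefix",
            rule_name, RESERVED_PREFIX
        ));
    }
    Ok(())
}
```

The check is a prefix match, so a name such as `my_provider_rule` that merely contains the string is still accepted in this sketch.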

Refs #1097
@zredlined zredlined self-assigned this May 23, 2026
@zredlined zredlined added the topic:l7 Application-layer policy and inspection work label May 23, 2026
Development

Successfully merging this pull request may close these issues.

feat(gateway): persist and validate agent policy proposal operations
