feat(policy): add agentic approval loop by zredlined · Pull Request #1528 · NVIDIA/OpenShell

zredlined · 2026-05-22T15:26:28Z

Summary

Ships the agentic policy approval loop end-to-end. When the sandbox denies a network request, an agent inside the sandbox can propose a narrow policy refinement; the gateway runs a formal prover against the merged-policy delta; safe proposals (no new findings) auto-approve in ~1s; risky ones land in pending with structured evidence the reviewer can act on. The agent waits on a socket — zero LLM tokens burn during human review.

This is the loop the platform has been building toward: agents do the narrowing work, the prover catches changes the operator should know about, and the audit trail makes every approval reconstructable.

Closes #1097
Refs #1062
Refs #1532

What this PR ships

The loop. Sandbox denial → agent reads /etc/openshell/skills/policy_advisor.md → agent POSTs a narrow proposal to policy.local → gateway runs the prover → either auto-approve (empty delta) or pending (any finding) → on approval, sandbox hot-reloads → agent retries.
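The proposal side of this loop can be pictured as a small state machine. The sketch below is illustrative only: the `ProposalState` and `Event` names and the `step` function are hypothetical, not types from the OpenShell crates, and it assumes that a proposal with an empty prover delta still goes to pending when auto mode is off.

```rust
/// Hypothetical lifecycle states for a single policy proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProposalState {
    Submitted,
    Pending,
    Approved,
    Rejected,
}

/// Hypothetical events that move a proposal between states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    DeltaEmpty,
    DeltaHasFindings,
    ReviewerApproved,
    ReviewerRejected,
}

/// Advance a proposal by one event. `auto_mode` stands in for the resolved
/// proposal_approval_mode being "auto".
pub fn step(state: ProposalState, event: Event, auto_mode: bool) -> ProposalState {
    match (state, event) {
        // Empty delta plus auto mode is the only path that skips review.
        (ProposalState::Submitted, Event::DeltaEmpty) if auto_mode => ProposalState::Approved,
        // Any finding, or manual mode, routes to human review.
        (ProposalState::Submitted, Event::DeltaEmpty)
        | (ProposalState::Submitted, Event::DeltaHasFindings) => ProposalState::Pending,
        (ProposalState::Pending, Event::ReviewerApproved) => ProposalState::Approved,
        (ProposalState::Pending, Event::ReviewerRejected) => ProposalState::Rejected,
        // Every other combination leaves the state as it was.
        (unchanged, _) => unchanged,
    }
}
```

On reaching `Approved`, the sandbox hot-reloads and the agent retries; that part is outside this sketch.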

Prover wired in as the auto-approval referee. Every proposal (mechanistic and agent-authored alike) runs through openshell-prover. The prover answers four categorical questions about the proposed change — see What the prover decides. The gateway computes the delta vs the baseline policy and the auto-approval gate fires only when the delta is empty.
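One way to read "delta vs the baseline policy" is as a set difference over findings. The following is a minimal sketch under that assumption; the `Finding` struct, the signature given to `finding_delta`, and the `gate_fires` helper are hypothetical shapes chosen for illustration and may not match the gateway code.

```rust
use std::collections::BTreeSet;

/// Hypothetical finding shape: a category plus the path it applies to.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Finding {
    pub category: String,
    pub binary: String,
    pub host: String,
    pub port: u16,
}

/// Findings the prover reports for the proposed (merged) policy that it did
/// not already report for the baseline policy.
pub fn finding_delta(
    baseline: &BTreeSet<Finding>,
    proposed: &BTreeSet<Finding>,
) -> BTreeSet<Finding> {
    proposed.difference(baseline).cloned().collect()
}

/// The gate is binary: it fires only when the delta is empty.
pub fn gate_fires(delta: &BTreeSet<Finding>) -> bool {
    delta.is_empty()
}
```

Under this reading, a finding that already exists in the baseline does not block a proposal; only findings introduced by the change do.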

Providers-v2 in the loop. The prover validates against the effective policy — provider profiles composed in via providers-v2 are part of the model the prover reasons over. Agent-authored chunks for endpoints a provider profile covers land as their own rules (Fix A in merge.rs) instead of getting silently absorbed into the provider rule, so the prover sees the agent's narrow contribution honestly.

Default-deny posture preserved. Auto-approval is opt-in via the proposal_approval_mode runtime setting, set at gateway scope (openshell settings set --global proposal_approval_mode auto) or sandbox scope (openshell settings set <name> proposal_approval_mode auto), with gateway scope winning. The default, "manual" (which also applies when no setting exists at either scope), routes every proposal to human review regardless of prover verdict. The CLI exposes a shorthand at create time: openshell sandbox create --approval-mode <manual|auto>, which writes the sandbox-scoped setting post-create. The audit event carries resolved_from=<gateway|sandbox|default> so operators can see why a given approval was auto vs manual.
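The precedence rule can be sketched as a small pure function. This is an illustration, not the gateway's code: `ApprovalMode`, `ResolvedFrom`, and `resolve_approval_mode` are hypothetical names, and the sketch assumes that an unknown value at the winning scope resolves to manual while still being attributed to that scope.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalMode {
    Manual,
    Auto,
}

/// Mirrors the resolved_from=<gateway|sandbox|default> audit field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResolvedFrom {
    Gateway,
    Sandbox,
    Default,
}

/// Only the literal "auto" opts in; anything else is treated as manual.
fn parse_mode(raw: &str) -> ApprovalMode {
    match raw {
        "auto" => ApprovalMode::Auto,
        _ => ApprovalMode::Manual,
    }
}

/// Gateway scope wins over sandbox scope; with neither set, default to manual.
pub fn resolve_approval_mode(
    gateway: Option<&str>,
    sandbox: Option<&str>,
) -> (ApprovalMode, ResolvedFrom) {
    match (gateway, sandbox) {
        (Some(g), _) => (parse_mode(g), ResolvedFrom::Gateway),
        (None, Some(s)) => (parse_mode(s), ResolvedFrom::Sandbox),
        (None, None) => (ApprovalMode::Manual, ResolvedFrom::Default),
    }
}
```

Because gateway scope wins, a gateway-level "manual" pins the whole fleet to human review even if an individual sandbox is set to "auto".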

Demo that walks the full loop. examples/agent-driven-policy-management/demo.sh runs a Codex agent through a two-path flow against a local gateway: one un-credentialed action auto-approves silently; one credentialed action escalates with a categorical finding, demo.sh approves it on the reviewer's behalf, the agent retries, and the file lands in GitHub. The run takes ~50–110s end-to-end with one human-visible escalation, exactly the kind the prover cannot decide unilaterally.

Reconstructable audit. Every auto-approval emits a CONFIG:APPROVED OCSF event with unmapped fields auto=true, source=<mechanistic|agent_authored>, prover_delta=empty, and resolved_from=<gateway|sandbox|default>. The chunk's persisted validation_result carries the categorical finding lines for human-reviewed approvals.
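As a rough picture of what the audit record carries, the four unmapped fields can be modeled as a string map. The function name and the map representation below are assumptions made for illustration; the actual OCSF event construction in the gateway is not shown in this PR text.

```rust
use std::collections::BTreeMap;

/// Build the unmapped fields described for an auto-approval event.
/// `source` is expected to be "mechanistic" or "agent_authored";
/// `resolved_from` is expected to be "gateway", "sandbox", or "default".
pub fn auto_approval_unmapped(source: &str, resolved_from: &str) -> BTreeMap<String, String> {
    let mut fields = BTreeMap::new();
    fields.insert("auto".to_string(), "true".to_string());
    fields.insert("source".to_string(), source.to_string());
    // Auto-approval only happens on an empty delta, so this value is fixed.
    fields.insert("prover_delta".to_string(), "empty".to_string());
    fields.insert("resolved_from".to_string(), resolved_from.to_string());
    fields
}
```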

Provider profile tightening. providers/github.yaml defaults api.github.com from read-write to read-only. Writes (gh / git via REST) now flow through the agentic loop — the loop becomes the on-ramp to write access, and the prover audits each capability change.

What the prover decides

The prover answers four formal questions about each proposed change. Each "yes" is its own categorical finding — no severity grade. Any finding blocks auto-approval; empty delta means the change is provably safe under the model.

| Category | The prover detects |
| --- | --- |
| `link_local_reach` | Reach to a host in `169.254.0.0/16` or `fe80::/10` (cloud-metadata range, serves credentials). |
| `l7_bypass_credentialed` | A binary using a wire protocol the L7 proxy cannot inspect (`git-remote-https`, `ssh`, `nc`) reaches a host where a credential is in scope. |
| `credential_reach_expansion` | A binary gains credentialed reach to a (host, port) it could not reach before. |
| `capability_expansion` | On a (binary, host, port) that already had credentialed reach, the policy adds a new HTTP method. Finding cites the specific method. |

Detail in crates/openshell-prover/README.md.
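For the `link_local_reach` category, the address test itself is simple to state. The sketch below checks the two ranges named in the table using only the standard library; the function name is hypothetical, and how the prover resolves hostnames to addresses is not covered here.

```rust
use std::net::IpAddr;

/// True when `addr` falls in 169.254.0.0/16 (IPv4) or fe80::/10 (IPv6).
pub fn is_link_local(addr: IpAddr) -> bool {
    match addr {
        IpAddr::V4(v4) => {
            let octets = v4.octets();
            octets[0] == 169 && octets[1] == 254
        }
        // fe80::/10 fixes the top 10 bits of the first 16-bit segment.
        IpAddr::V6(v6) => (v6.segments()[0] & 0xffc0) == 0xfe80,
    }
}
```

The well-known metadata address 169.254.169.254 matches the IPv4 branch.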

What the demo shows

==> Step 1 — un-credentialed reach (auto-approves)
   curl GET raw.githubusercontent.com/.../api.github.com.json
   prover: no findings (no credential in scope for the host)
   gateway: auto-approved in ~1s
   audit: "auto-approved: no new prover findings (source=agent_authored)"

==> Step 2 — credentialed capability change (escalates)
   curl PUT api.github.com/.../specific.md
   prover: credential_reach_expansion (or capability_expansion) on api.github.com:443
   gateway: pending — human review required
   demo.sh approves on behalf → agent retries → file lands in github

Acceptance criteria (deterministic, in tests)

Un-credentialed reach auto-approves under auto-mode (zero findings, terminal status approved).
Credentialed reach expansion lands in pending with credential_reach_expansion in validation_result.
Capability expansion on an already-reached credentialed host lands in pending with capability_expansion citing the new method.
Link-local reach lands in pending unconditionally with link_local_reach.
L7-bypass binary with credential lands in pending with l7_bypass_credentialed.
Implicit supersede works in both directions on (host, port, binary) overlap.
Default approval mode is manual: an empty delta does NOT auto-approve when the proposal_approval_mode setting is unset at both scopes, set to "manual", or set to any unknown future value.
Approval mode resolves through settings: gateway scope wins over sandbox scope, and CLI --approval-mode auto writes the sandbox-scoped setting after create.
Auto-approval audit carries auto=true, source=<mode>, prover_delta=empty, and resolved_from=<gateway|sandbox|default> as unmapped OCSF fields.
Agent-submitted rule names using the reserved _provider_ prefix are rejected at submit time.
Categorical findings (no severity tiers) appear in validation_result.

All covered by unit and integration tests in crates/openshell-server/src/grpc/policy.rs::tests.
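The implicit-supersede criterion is the least obvious of these, so here is a minimal sketch of both directions as the commit messages describe them: a newer submission rejects older pending chunks on the same (host, port, binary), and an incoming mechanistic chunk is rejected when an approved agent-authored chunk already covers that endpoint. The `Chunk`, `Source`, and `Status` types and the `submit` function are hypothetical, not the server's types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Mechanistic,
    AgentAuthored,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Approved,
    Rejected,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Chunk {
    pub host: String,
    pub port: u16,
    pub binary: String,
    pub source: Source,
    pub status: Status,
}

/// Two chunks overlap when they target the same (host, port, binary).
fn overlaps(a: &Chunk, b: &Chunk) -> bool {
    a.host == b.host && a.port == b.port && a.binary == b.binary
}

/// Apply both supersede directions, then record the incoming chunk.
pub fn submit(existing: &mut Vec<Chunk>, mut incoming: Chunk) {
    // Direction 1: an approved agent-authored chunk shadows a mechanistic one.
    let covered = incoming.source == Source::Mechanistic
        && existing.iter().any(|c| {
            overlaps(c, &incoming)
                && c.source == Source::AgentAuthored
                && c.status == Status::Approved
        });
    if covered {
        incoming.status = Status::Rejected;
    } else {
        // Direction 2: the newer submission replaces older pending chunks.
        for c in existing.iter_mut() {
            if c.status == Status::Pending && overlaps(c, &incoming) {
                c.status = Status::Rejected;
            }
        }
    }
    existing.push(incoming);
}
```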

Testing

cargo test --workspace --lib — 534 gateway tests, all 16 crates green.
cargo clippy -p openshell-server -p openshell-cli -p openshell-core --all-targets -- -D warnings — clean.
cargo fmt --check — clean.
./examples/agent-driven-policy-management/demo.sh runs end-to-end against the local Docker gateway and writes the demo file to GitHub.

Explicitly deferred (follow-up PRs)

LLM-based contextual review layered on top of the deterministic gate.
Intent files / per-sandbox config of "which findings auto-reject vs. escalate."
Credential scope modeling (read-only vs write-scoped tokens).
MCP as a third L7 surface (REST + GraphQL + MCP).
Per-binary credential isolation (binaries see only the credentials their policy authorizes).
L7 watch mode for L4 grants (record HTTP requests through approved L4 tunnels for later L4→L7 conversion).
Trust tiers per sandbox class (production sandboxes get tighter defaults).
Dedicated CONFIG:AUTO_APPROVED OCSF event class (today reuses CONFIG:APPROVED with auto=true unmapped).
User-facing docs page under docs/ for the agentic loop.

Checklist

Follows Conventional Commits
Commits are signed off (DCO)

Signed-off-by: Alexander Watson <zredlined@gmail.com>

…roval

Run the prover on every proposal regardless of analysis_mode. Auto-approve proposals whose merged-policy delta is empty (proposer-agnostic, with the global-policy gate respected).

Calibrate prover findings to a single HIGH severity emitted on link-local hosts, L4+credential-in-scope, and bypass-L7-binary+credential-in-scope.

Add implicit supersede on (host, port, binary): newer submissions auto-reject older pending chunks, and incoming mechanistic chunks auto-reject when an approved agent_authored chunk already covers the same endpoint.

Audit auto-approvals via CONFIG:APPROVED OCSF events carrying auto=true, source=<mode>, prover_delta=empty as unmapped fields, with message text "auto-approved: no new prover findings".

Build credential set from sandbox-attached providers (presence only; no scope modeling in v1).

Signed-off-by: Alexander Watson <zredlined@gmail.com>

The prover now answers four formal questions about a proposed policy change and emits one finding per "yes" answer:

- link_local_reach
- l7_bypass_credentialed
- credential_reach_expansion
- capability_expansion

There is no severity grade. The category name is the signal; the per-path evidence carries the structured detail. The auto-approval gate is binary: empty delta or not. This removes the previous HIGH/MEDIUM/CRITICAL severity tiers and the narrowness classifier that was inconsistent across the access-shorthand / explicit-rules boundary.

Gateway-side finding_delta gains category suppression: capability_expansion paths whose (binary, host, port) appears in the credential_reach_expansion delta are suppressed, so a brand-new credentialed reach surfaces as one finding rather than one reach plus N method findings.

The github provider profile now defaults api.github.com to read-only (was: read-write). Writes flow through the agentic loop, so the prover audits each capability change rather than treating broad write access as the default.

Demo, sandbox skill, and architecture docs updated to describe the four-category model. Prover gains a README.md documenting the formal queries, evidence shape, and how to add a new category.
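The category suppression described in this commit message can be sketched as a two-pass filter over the delta. The `PathFinding` struct and `suppress_redundant` function are hypothetical names for illustration; only the category strings come from the PR.

```rust
use std::collections::HashSet;

/// Hypothetical per-path finding; `detail` would hold the HTTP method for
/// capability_expansion findings.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PathFinding {
    pub category: String,
    pub binary: String,
    pub host: String,
    pub port: u16,
    pub detail: String,
}

/// Drop capability_expansion findings whose (binary, host, port) already
/// appears as a credential_reach_expansion finding in the same delta.
pub fn suppress_redundant(delta: Vec<PathFinding>) -> Vec<PathFinding> {
    // Pass 1: collect every path that is a brand-new credentialed reach.
    let new_reach: HashSet<(String, String, u16)> = delta
        .iter()
        .filter(|f| f.category == "credential_reach_expansion")
        .map(|f| (f.binary.clone(), f.host.clone(), f.port))
        .collect();
    // Pass 2: keep everything except method findings on those paths.
    delta
        .into_iter()
        .filter(|f| {
            f.category != "capability_expansion"
                || !new_reach.contains(&(f.binary.clone(), f.host.clone(), f.port))
        })
        .collect()
}
```

A reviewer then sees one `credential_reach_expansion` line for a new endpoint instead of that line plus one `capability_expansion` line per method.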

copy-pr-bot · 2026-05-22T15:26:33Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

…iasing

Move proposal_approval_mode out of SandboxSpec and into the existing runtime-mutable settings model so it can be flipped on a running sandbox and pinned fleet-wide via gateway scope. Precedence matches the rest of the settings model: gateway wins over sandbox, default is manual. The CLI's --approval-mode flag on `sandbox create` is now a shorthand that writes the sandbox-scoped setting post-create. Auto-approval audit events carry resolved_from=<gateway|sandbox|default>.

Reject agent proposals whose rule_name starts with `_provider_`. That namespace is reserved for provider-profile-synthesized rules; allowing agents to address them by name would bypass the merge guard that splits agent contributions into their own rule so the prover sees them honestly.

Refs #1097
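The submit-time check on reserved rule names amounts to a prefix test. A minimal sketch, assuming a hypothetical `validate_rule_name` helper and error text (only the `_provider_` prefix itself comes from the PR):

```rust
/// Namespace reserved for provider-profile-synthesized rules.
pub const RESERVED_PREFIX: &str = "_provider_";

/// Reject agent-submitted rule names that try to address the reserved namespace.
pub fn validate_rule_name(rule_name: &str) -> Result<(), String> {
    if rule_name.starts_with(RESERVED_PREFIX) {
        return Err(format!(
            "rule name `{}` uses the reserved `{}` prefix",
            rule_name, RESERVED_PREFIX
        ));
    }
    Ok(())
}
```

Note that this sketch only rejects names that begin with the prefix; a name that merely contains `_provider_` elsewhere passes.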

zredlined added 4 commits May 19, 2026 11:07

feat(policy): validate agent-authored proposals

59bebfd

Signed-off-by: Alexander Watson <zredlined@gmail.com>

feat(policy): refine agentic approval demo

216bb35

Signed-off-by: Alexander Watson <zredlined@gmail.com>

zredlined added 2 commits May 22, 2026 13:45

fix(policy): move approval mode into settings

ff2eb8a

zredlined self-assigned this May 23, 2026

zredlined added the topic:l7 Application-layer policy and inspection work label May 23, 2026

zredlined wants to merge 6 commits into main from 1097-agentic-policy-approval-loop
