
Commit ea2fddb

feat(policy): agent-driven policy management — the agent half (#1323)
* feat(policy): plumb chunk_ids and rejection_reason through proposal pipeline

Prereq plumbing for the agent revise-and-resubmit loop. Two narrow additive proto changes unblock the upcoming /wait endpoint (#1092), prover validation badge (#1097), and reject --guidance surfaces (#1098).

- SubmitPolicyAnalysisResponse: add accepted_chunk_ids so the in-sandbox agent gets handles to watch its proposals. Surfaced through the typed grpc_client wrapper and policy.local's POST /v1/proposals 202 body. Closes #1094.
- PolicyChunk + StoredDraftChunk + DraftChunkPayload: add validation_result (gateway prover verdict, populated by #1097) and rejection_reason (operator free-form text). Both plain strings; no enums, no parsing on the read path. Closes #1096.
- RejectDraftChunk now persists the existing reason field into the chunk's rejection_reason so it round-trips back to the agent via GetDraftPolicy. UndoDraftChunk clears it on the way back to pending so consumers cannot read a stale guidance string from a prior reject -> re-approve -> undo cycle.

Whole surface stays gated behind agent_policy_proposals_enabled. Two focused tests cover the round-trip and the undo-clears guarantee.

Signed-off-by: Alexander Watson <zredlined@gmail.com>

* feat(sandbox): add /v1/proposals/{id} and /wait long-poll to policy.local

The agent feedback channel back from policy.local. Two new routes let the in-sandbox agent learn its proposal's outcome on a single blocking HTTP call, with zero LLM tokens spent during the wait.

- GET /v1/proposals/{chunk_id} returns the chunk's current state in one gateway call.
- GET /v1/proposals/{chunk_id}/wait?timeout=<s> blocks until the chunk transitions out of pending. Default 60s, clamped [1, 300]. Agent re-issues on timeout to extend.

Response carries the chunk's status plus the two feedback fields shipped in the prereq commit: rejection_reason (free-form reviewer text) and validation_result (gateway prover verdict, empty until #1097).
On timeout: same shape with timed_out: true so the agent can disambiguate without parsing.

Wait handler short-polls GetDraftPolicy every 1s inside the request with a tokio::time::Instant deadline. One gateway connection is opened per request and reused across all polls, so a 60s wait does one TLS handshake instead of sixty. A future commit can swap the loop body for a tokio::sync::broadcast driven by a watcher task; the agent-visible contract (URL, query, response shape) is independent of the polling implementation.

All routes stay behind agent_policy_proposals_enabled. Closes #1092.

Signed-off-by: Alexander Watson <zredlined@gmail.com>

* docs(sandbox): teach policy_advisor skill the wait + redraft loop

The agent-facing instructions for the feedback loop. The endpoints exist; this is the doc that makes them usable.

policy_advisor.md gains:

- API entries for GET /v1/proposals/{chunk_id} and /wait?timeout=<s>, including the field semantics (status, rejection_reason, validation_result, timed_out).
- A note on the submit response's accepted_chunk_ids / rejection_reasons split so the agent handles partial acceptance.
- Step 6 saves the chunk_ids and addresses any submit-time rejections before waiting.
- Step 7 walks the four wait outcomes: approved (retry, with the honest "may still fail" caveat), rejected (read rejection_reason AND validation_result; address whichever has content), still-pending with timed_out (re-call), non-2xx (surface, do not retry).

skills.rs gains two assertions on the skill content so a future edit cannot drop the wait endpoint or the rejection_reason directive silently. Closes #1095.

Signed-off-by: Alexander Watson <zredlined@gmail.com>

* test(policy-advisor): add end-to-end smoke for the agent feedback loop

A focused smoke that exercises the new policy.local /wait endpoint on a live gateway + sandbox, separate from the existing no-LLM regression harness (which still drives the OLD retry-with-bash-loop recovery pattern).
Two flows:

- Flow A (approve-and-retry): agent submits, /wait blocks, host runs `openshell rule approve`, /wait returns status=approved. Confirms the happy path round-trip latency.
- Flow B (reject-with-guidance): agent submits, /wait blocks, host runs `openshell rule reject --reason "..."`, /wait returns status=rejected with the exact reviewer text in rejection_reason. Confirms the free-form guidance contract round-trips through the agent feedback channel.

No GitHub credentials needed: proposals are synthetic and never trigger outbound traffic. Both flows expect agent_policy_proposals_enabled=true and a running gateway.

Adds three cases to sandbox-runner.sh: submit-test-proposal (no GH deps), proposal-status, proposal-wait. The existing put-file and submit-proposal cases used by test.sh are untouched.

Signed-off-by: Alexander Watson <zredlined@gmail.com>

* fix(policy-advisor): surface real CLI errors from wait-smoke preflight

The preflight piped openshell's stderr to /dev/null and relied on jq to default the missing setting key to "<unset>", but under `set -euo pipefail` a non-zero exit from openshell makes the whole pipeline fail and the command substitution exits the script silently before the intended fail() message can print. Capture stderr explicitly, check the CLI exit code, and surface the real error plus the expected fix (port-forward + gateway add + select) when the CLI cannot reach the gateway.

Signed-off-by: Alexander Watson <zredlined@gmail.com>

* fix(policy-advisor): pass --json to settings get in wait-smoke preflight

`openshell settings get --global` defaults to a human-readable table; jq cannot parse it and the preflight died with a numeric-literal error. Pass --json so jq gets actual JSON. Also touched up the suggested recovery commands in the preflight error to match the real CLI shape (`gateway add <endpoint> --name <name>` and the env-var override warning).
Signed-off-by: Alexander Watson <zredlined@gmail.com>

* fix(policy): dedup draft chunks only in mechanistic mode; return effective id

The smoke harness for the agent feedback loop caught a real bug in the gateway: SubmitPolicyAnalysis's response carried a chunk_id that was never persisted whenever the SQL ON CONFLICT path fired. Two failure modes, both load-bearing:

- Agent-authored proposals targeting the same host/port/binary (e.g. the redraft-after-rejection loop) silently folded into one row and any RejectDraftChunk by the new chunk_id failed with "chunk not found." Latent since #1151, surfaced by #1094 returning chunk_ids.
- Mechanistic mode had the same class of bug: the dedup fold-in is the intended behavior there, but the response still advertised the newly-generated UUID instead of the existing row's id. Less visible because no current caller reads mechanistic chunk_ids back, but the proto contract was violated either way.

Fix in three parts:

- put_draft_chunk now takes Option<&str> dedup_key explicitly and returns the effective row id (via RETURNING). None binds NULL to the dedup_key column, which bypasses the partial-index ON CONFLICT path entirely. Caller-decides semantics replace store-side magic.
- handle_submit_policy_analysis picks dedup_key per chunk using an allowlist (only "mechanistic" dedups) and pushes the returned effective_id to accepted_chunk_ids. New modes default to no-dedup so a misconfigured caller cannot silently lose proposals.
- The two-copy draft_chunk_dedup_key helper consolidated to one observation_dedup_key in policy_store.rs with a doc comment.

Tests:

- agent_authored_submits_for_same_endpoint_do_not_dedup pins the redraft-loop contract: two intentional submissions with the same host/port/binary get distinct chunk_ids, both findable via GetDraftPolicy, both rejectable by id.
- mechanistic_submits_for_same_endpoint_dedup_into_one_chunk locks in the observation-mode dedup AND asserts both submits return the same effective_id, which would have caught the deeper bug.

Proto: SubmitPolicyAnalysisRequest.analysis_mode doc updated to describe the actual semantics (mechanistic dedups, agent_authored and unknown modes do not).

Signed-off-by: Alexander Watson <zredlined@gmail.com>

* docs(examples): retarget policy-management demo at the /wait endpoint

The narrated demo (examples/agent-driven-policy-management) has been the public face of this feature since #1151. Its agent prompt told Codex to retry the original PUT every few seconds for up to 120 seconds, a polling workaround for the missing /wait endpoint that this branch shipped. Update the demo to exercise /wait so the canonical reading of the feature reflects the actual UX win.

- agent-task.md: step 4 is now "call /wait, branch on status" with the three outcomes spelled out (approved → retry once; rejected → read rejection_reason and revise or stop; pending+timed_out → re-issue /wait once, do NOT busy-loop or shorten the timeout). Also makes explicit that the demo submits one rule per proposal so accepted_chunk_ids[0] is the safe single id to wait on.
- demo.sh: header docstring rewritten as a six-step loop that mirrors the README. narrate_sandbox_workflow drops its parallel numbering and uses bullets (the runtime narration is the agent's sub-actions, not a separate decomposition of the loop). Approve step header and success message now reference /wait waking the agent, not "policy hot-reload retry."
- README.md: top-of-file flow expanded from 5 to 6 steps to include the /wait call and chunk_id capture; "Going further" section now describes both regression scripts and the boundary between them (real-GitHub retry vs. synthetic /wait wire test). Slow-path qualifier corrected from "image pull on first run" to "sandbox cold-start (SSH bring-up plus Codex install)".
- wait-smoke.sh header rewritten to make it unambiguous this is a regression, NOT a tutorial, with explicit prereq commands instead of prereq descriptions, and a pointer at demo.sh for the narrated story.

No code paths change; this is the readability pass.

Signed-off-by: Alexander Watson <zredlined@gmail.com>

* fix(examples): pass --yes on demo.sh's global setting writes

Global setting updates require explicit confirmation in non-interactive mode; demo.sh's enable_agent_proposals and the cleanup restore path were missing --yes and hard-failed the preflight. Pre-existing issue that surfaced now that more of the demo runs through this path. No other behavior change.

Signed-off-by: Alexander Watson <zredlined@gmail.com>

* feat(sandbox): emit OCSF audit events for policy proposal lifecycle

The demo's policy decision trace previously showed only the proxy enforcement story (HTTP:PUT DENIED, CONFIG:LOADED, HTTP:PUT ALLOWED). It was silent about who proposed what or who decided what; the audit-trail receipts for the agent feedback loop were missing.

policy.local now emits sandbox-side OCSF events at the observation moments, into the same stream as the existing CONFIG:LOADED:

- CONFIG:PROPOSED on submit_proposal acceptance. Per accepted chunk: the message names the chunk_id, target endpoint, L7 method/path, and binary so the trace correlates against the inbox card via chunk_id.
- CONFIG:APPROVED on /wait observation of approved status.
- CONFIG:REJECTED on /wait observation of rejected status. Carries the reviewer's free-form rejection_reason in the message AND as an unmapped field, both sanitized (control chars stripped, capped at 200 chars with an ellipsis marker). The agent still reads the raw text via GET /v1/proposals/{id}; sanitization is audit-side only, per AGENTS.md's no-secrets-in-OCSF rule.
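
The audit-side sanitization described for CONFIG:REJECTED (control characters stripped, text capped at 200 characters with an ellipsis marker) could look roughly like the sketch below. The commit page does not show the real helper, so the function name `sanitize_rejection_reason`, the constant `MAX_AUDIT_REASON_CHARS`, the use of `…` as the marker, and the choice to count characters rather than bytes are all assumptions made for illustration.

```rust
/// Cap taken from the commit message ("capped at 200 chars").
/// The constant name is hypothetical.
const MAX_AUDIT_REASON_CHARS: usize = 200;

/// Sketch of an audit-side sanitizer (hypothetical name and signature).
/// Drops control characters, then truncates to the cap and appends an
/// ellipsis marker only when text was actually cut. Lengths are counted in
/// Unicode scalar values, so multi-byte text is never split mid-character.
fn sanitize_rejection_reason(raw: &str) -> String {
    let cleaned: String = raw.chars().filter(|c| !c.is_control()).collect();
    if cleaned.chars().count() <= MAX_AUDIT_REASON_CHARS {
        return cleaned;
    }
    let mut truncated: String = cleaned.chars().take(MAX_AUDIT_REASON_CHARS).collect();
    truncated.push('…');
    truncated
}
```

In this sketch the marker sits on top of the 200-character cap, so a truncated string is 201 characters long. Whether the real helper counts the marker inside the cap is not stated in the commit message.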
The submit path defends the audit_summaries / accepted_chunk_ids index pairing against a future gateway change that compresses past rejected chunks (the proto doesn't promise 1:1 ordering with the request). Today client-side validation makes the lengths always match; if they don't, the pairing falls back to a generic per-id event rather than mis-attribute.

The wait handler's emit site fires once per terminal-status observation. Multiple concurrent waiters on the same chunk would each emit one event; acceptable for single-waiter-per-chunk demos and the right place to dedup is the SIEM.

demo.sh's trace filter now surfaces the four CONFIG: events alongside HTTP:PUT, so the trace at the end of every run tells the full story from deny to allow via propose -> approve. wait-smoke.sh's prereq notes recommend redirecting kubectl port-forward output so its "Handling connection for 8090" lines don't bleed into demo narration.

Three new unit tests on the sandbox-side helpers: summary builder happy path, fallback, and the rejection_reason sanitizer.

Signed-off-by: Alexander Watson <zredlined@gmail.com>

* feat(policy): /wait awaits local policy reload; demo auto-approves redrafts

Three things in one commit, all surfaced by running the demo end-to-end against a real gateway and finding the agent had to draft a broader second proposal.

1. /wait race fix. Previously /wait returned `approved` the moment it observed the gateway's chunk status flip, but the local supervisor reloads policy on its own poll cycle (~10s in practice). The agent's retry would race the reload and hit the still-old policy, getting denied. Codex then drafted a broader rule and re-submitted, which is sound agent behavior but not what /wait should provoke. Now /wait captures the local policy version at start, and after observed-approved waits for the supervisor to load a strictly-newer version before returning. Bounded by the caller's deadline; best-effort return if the deadline elapses without the version bumping.
Two new unit tests pin the happy path and the deadline-clamped fallback.

2. demo.sh auto-approve loop. Replaces approve_when_pending + wait_for_agent with one approve_pending_until_agent_exits function that keeps watching for pending chunks and approving them until the agent process exits (or the configured timeout). Defense in depth against future redraft scenarios for any reason; today (post-fix #1) the agent should only submit one proposal per task, but we don't want to hang silently if it does submit more.

3. UX. Step headers now carry "[t+1.2s]" relative timestamps so reading the run output makes latency visible (the demo's whole point is that the wait is cheap, so surface that). A spin_wait helper renders an ASCII spinner during the watch loop so the demo never looks frozen on a TTY. Falls back to plain sleep on non-TTY contexts.

Closes the race condition diagnosed from the trace timing where the gateway approved at t+0, sandbox observed at t+0.3s, but the supervisor didn't load v2 until t+9.4s, well after the agent had already retried and been denied.

Signed-off-by: Alexander Watson <zredlined@gmail.com>

* fix(sandbox): /wait detects policy reload by content, not the schema version

The previous attempt at the /wait-after-approve race fix compared `SandboxPolicy.version` between /wait start and the policy reload, but that field is the *schema* version (constant 1), not a revision counter. Every comparison was `current(1) > baseline(1) == false`, so the wait blocked until the agent's 300s timeout regardless of whether the supervisor had actually reloaded. The demo SSH connection then timed out around the 240s mark.

Diagnosed from a live run's OCSF trace: supervisor pulled v2 at +8.5s after approval (CONFIG:LOADED), but the sandbox-side CONFIG:APPROVED that my /wait emits didn't fire until +304s, exactly at the 300s deadline.

Fix: compare the whole policy via prost's derived PartialEq.
Any field change (network_policies map being the only one that actually mutates today) flips equality. A clone-per-200ms-tick on a few-KB proto is cheap inside the bounded wait window.

Tests rewritten to match the new contract: the supervisor-reload fixture now keeps `version: 1` constant and changes `network_policies` contents, mirroring the exact failure mode from the live run.

Signed-off-by: Alexander Watson <zredlined@gmail.com>

* fix(examples): redact tokens with python literal-string replace, not sed

The sed-based redact_log in demo.sh broke when one of the auth tokens contained a character that conflicted with sed's pattern parser ("unterminated substitute pattern" on the Codex JWT). The whole log tail then blanks on failure, hiding the very failure context we're trying to surface.

Switch to a python subprocess that takes the tokens via argv and does literal str.replace. No regex, no delimiter games, no truncation.

Signed-off-by: Alexander Watson <zredlined@gmail.com>

* fix(sandbox): scope /wait reload check to the approved rule

Reviewer (John Myers) flagged two failure modes in the prior whole-policy fingerprint approach used by policy.local /wait:

- False sleep: when the supervisor reloads between two /wait calls (the skill tells the agent to re-issue on timed_out), the new call snapshots the already-updated policy as baseline and burns the full timeout waiting for a change that never comes.
- False wakeup: any unrelated reload (other agent's approval, settings change) flips the diff, but the chunk's actual rule may not be loaded yet, so the agent retries and hits policy_denied for no real signal.

Replace the diff with rule-coverage. New public helper openshell_policy::policy_covers_rule reuses endpoints_overlap (so it matches add_rule's merge semantics, including the fold-into-existing-key case) plus an L7 allow check on method/path (so an existing endpoint that doesn't yet contain the proposed method doesn't signal coverage).
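
The two failure modes of the whole-policy diff, and why a coverage check avoids them, can be illustrated with a toy model. The `Policy` and `Rule` types below are simplified stand-ins rather than the real `SandboxPolicy` proto types, and `covers` only loosely mirrors `policy_covers_rule`: it uses exact set membership where the real helper uses `endpoints_overlap` plus an L7 allow check.

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for one allow rule (not the real proto type).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Rule {
    host: String,
    port: u16,
    method: String,
    path: String,
}

/// Simplified stand-in for a loaded policy: just a set of rules.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct Policy {
    rules: BTreeSet<Rule>,
}

/// Convenience constructor for the toy rule type.
fn rule(host: &str, port: u16, method: &str, path: &str) -> Rule {
    Rule {
        host: host.to_string(),
        port,
        method: method.to_string(),
        path: path.to_string(),
    }
}

/// Whole-policy diff: "did anything change since the baseline snapshot?"
fn changed_since(baseline: &Policy, current: &Policy) -> bool {
    baseline != current
}

/// Rule coverage: "is the approved rule present in the current policy?"
fn covers(current: &Policy, approved: &Rule) -> bool {
    current.rules.contains(approved)
}
```

With these definitions, an unrelated reload makes `changed_since` return true while `covers` stays false (the false-wakeup case), and a baseline snapshotted after the reload makes `changed_since` return false while `covers` is already true (the false-sleep case).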
Add policy_reloaded: true|false to the /wait response on approve, with a 500ms floor on the reload-wait phase so approvals arriving near the deadline still get a fair shot at reloaded=true.

Update the policy_advisor skill to branch on it: reloaded=true → retry; reloaded=false → re-issue /wait once with timeout=30, then surface to user. Don't loop tightly.

Tests:

- 9 new unit tests in openshell-policy pinning coverage semantics (L4-only, L7 method gap, fold-into-existing-key, empty binaries).
- 4 new tokio tests in policy_local mirroring John's exact scenarios.
- wait-smoke.sh asserts policy_reloaded=true on Flow A.

Signed-off-by: Alexander Watson <zredlined@gmail.com>

* fix(server): make GetDraftPolicy dual-auth so /wait works under OIDC

policy.local calls GetDraftPolicy from inside the sandbox supervisor via the sandbox gRPC client, which authenticates with the shared x-sandbox-secret. GetDraftPolicy was listed only in the Bearer-auth scope table (config:read) and was not in SANDBOX_SECRET_METHODS or DUAL_AUTH_METHODS, so OIDC-enabled gateways rejected those calls and the /wait long-poll surfaced gateway_lookup_failed. Local/no-OIDC setups happened to work because the auth check is short-circuited.

Add GetDraftPolicy to DUAL_AUTH_METHODS, matching the existing GetSandboxConfig pattern (called by both CLI reviewer surfaces with Bearer and the sandbox supervisor with x-sandbox-secret). Dual-auth short-circuits the scope check for sandbox-secret callers, so the config:read entry in authz.rs continues to gate Bearer-only flows. Mirror the openshell_get_sandbox_config_is_dual_auth assertion for GetDraftPolicy.

Note: ssh_handshake_secret is server-wide, not per-sandbox, so a sandbox-secret caller can today name any sandbox in a SubmitPolicyAnalysis request, and now in a GetDraftPolicy request. The exposure is symmetric with the existing SANDBOX_SECRET_METHODS pattern. Filed as a follow-up: per-sandbox secret binding, tracked separately.
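
The dual-auth routing described above can be sketched as a small decision function. The table names `SANDBOX_SECRET_METHODS` and `DUAL_AUTH_METHODS` come from the commit message, but their contents here are assumptions limited to the method names the message mentions (in particular, listing `SubmitPolicyAnalysis` as sandbox-secret-only is inferred, not confirmed). The `Credential` and `AuthDecision` types and the `decide` function are hypothetical.

```rust
/// Which credential the caller presented (illustrative type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Credential {
    Bearer,
    SandboxSecret,
}

/// Assumed table contents; only methods named in the commit message appear.
const SANDBOX_SECRET_METHODS: &[&str] = &["SubmitPolicyAnalysis"];
const DUAL_AUTH_METHODS: &[&str] = &["GetSandboxConfig", "GetDraftPolicy"];

/// Outcome of the routing step (illustrative type).
#[derive(Debug, PartialEq, Eq)]
enum AuthDecision {
    /// Sandbox-secret caller on a permitted method: the scope check is skipped.
    AllowSandboxSecret,
    /// Bearer caller: still gated by the scope table (e.g. config:read).
    CheckBearerScope,
    /// Credential type is not accepted for this method.
    Reject,
}

/// Hypothetical routing: dual-auth methods accept either credential, and
/// sandbox-secret callers bypass the Bearer scope check.
fn decide(method: &str, cred: Credential) -> AuthDecision {
    let dual = DUAL_AUTH_METHODS.contains(&method);
    let secret_only = SANDBOX_SECRET_METHODS.contains(&method);
    match cred {
        Credential::SandboxSecret if dual || secret_only => AuthDecision::AllowSandboxSecret,
        Credential::SandboxSecret => AuthDecision::Reject,
        Credential::Bearer if secret_only => AuthDecision::Reject,
        Credential::Bearer => AuthDecision::CheckBearerScope,
    }
}
```

Before this fix, `GetDraftPolicy` would have fallen into the `Reject` arm for sandbox-secret callers in a model like this one, which is consistent with the gateway_lookup_failed symptom on OIDC-enabled gateways.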
Signed-off-by: Alexander Watson <zredlined@gmail.com>

* fix(ci): address rebased check failures

Signed-off-by: Alexander Watson <zredlined@gmail.com>

---------

Signed-off-by: Alexander Watson <zredlined@gmail.com>
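
The /wait long-poll contract from the feat(sandbox) commit above (default 60s, clamped to [1, 300], short-poll until the chunk leaves pending or the deadline passes, `timed_out: true` on expiry) can be sketched in synchronous, std-only Rust. This is not the real handler, which the commit message describes as async with a `tokio::time::Instant` deadline; `Status`, `WaitOutcome`, `effective_timeout_secs`, and `wait_for_decision` are hypothetical names, and the `fetch` closure stands in for the GetDraftPolicy call.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Illustrative chunk states.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status {
    Pending,
    Approved,
    Rejected,
}

/// Illustrative response shape: same fields whether or not the wait expired.
#[derive(Debug, PartialEq, Eq)]
struct WaitOutcome {
    status: Status,
    timed_out: bool,
}

/// Default 60s, clamped to [1, 300], as stated in the commit message.
fn effective_timeout_secs(requested: Option<u64>) -> u64 {
    requested.unwrap_or(60).clamp(1, 300)
}

/// Polls `fetch` every `interval` until the status leaves `Pending` or the
/// deadline passes. The final sleep is shortened so it never overshoots.
fn wait_for_decision<F>(mut fetch: F, timeout: Duration, interval: Duration) -> WaitOutcome
where
    F: FnMut() -> Status,
{
    let deadline = Instant::now() + timeout;
    loop {
        let status = fetch();
        if status != Status::Pending {
            return WaitOutcome { status, timed_out: false };
        }
        let now = Instant::now();
        if now >= deadline {
            return WaitOutcome { status, timed_out: true };
        }
        sleep(interval.min(deadline - now));
    }
}
```

Because the caller only sees the returned shape, the loop body could later be replaced by a push-based notification without changing the contract, which matches the commit message's remark about swapping in a broadcast channel.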
1 parent 96d909d commit ea2fddb

19 files changed

Lines changed: 2453 additions & 106 deletions

File tree

crates/openshell-policy/src/lib.rs

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ use serde::{Deserialize, Serialize};
 pub use compose::{ProviderPolicyLayer, compose_effective_policy, provider_rule_name};
 pub use merge::{
     PolicyMergeError, PolicyMergeOp, PolicyMergeResult, PolicyMergeWarning, generated_rule_name,
-    merge_policy,
+    merge_policy, policy_covers_rule,
 };
 
 // ---------------------------------------------------------------------------

crates/openshell-policy/src/merge.rs

Lines changed: 365 additions & 0 deletions
@@ -207,6 +207,78 @@ pub struct PolicyMergeResult {
     pub changed: bool,
 }
 
+/// Returns true iff `policy` semantically contains the rule an `AddRule`
+/// merge of `proposed` would produce.
+///
+/// "Contains" means: for every endpoint in `proposed`, some rule in
+/// `policy.network_policies` has an endpoint with overlapping
+/// host/path/port set AND containing every L7 allow (method/path) the
+/// proposed endpoint requested, and that rule's binaries cover every
+/// binary in `proposed`.
+///
+/// The sandbox's `policy.local /wait` long-poll uses this to decide when
+/// the local supervisor has actually loaded a policy that includes the
+/// chunk the agent just had approved. A whole-policy hash compare is wrong
+/// in both directions: it can wake the wait on unrelated reloads (false
+/// wakeup) and can fail to wake when the supervisor reloaded between two
+/// `/wait` calls (false sleep). This check is the property the agent
+/// actually cares about — "is my rule in effect right now?".
+///
+/// L4-vs-L7 split: endpoint overlap reuses `endpoints_overlap` so the
+/// L4 surface (host/path/port) lines up with the `add_rule` merge — if
+/// the gateway folded the chunk into an existing rule under a different
+/// key, this check still returns true. The L7 layer is checked
+/// separately because `endpoints_overlap` is intentionally L4-only:
+/// without the L7 check, coverage would return true the instant the
+/// supervisor reloaded *any* change to an overlapping endpoint, even
+/// before the new method/path actually landed — exactly the false-wakeup
+/// mode this fix exists to prevent, just one layer down.
+pub fn policy_covers_rule(policy: &SandboxPolicy, proposed: &NetworkPolicyRule) -> bool {
+    if proposed.endpoints.is_empty() {
+        return false;
+    }
+    proposed.endpoints.iter().all(|target_endpoint| {
+        policy.network_policies.values().any(|rule| {
+            rule.endpoints.iter().any(|endpoint| {
+                endpoints_overlap(endpoint, target_endpoint)
+                    && endpoint_l7_covers(endpoint, target_endpoint)
+            }) && proposed.binaries.iter().all(|target_binary| {
+                rule.binaries
+                    .iter()
+                    .any(|binary| binary.path == target_binary.path)
+            })
+        })
+    })
+}
+
+/// L7 coverage for a single endpoint match. If the proposed endpoint
+/// declared explicit L7 allow rules (method+path), every one of them must
+/// be present in the merged endpoint's `rules`. An empty `proposed.rules`
+/// is treated as "L4-only" and returns true (the endpoint match alone is
+/// sufficient).
+///
+/// Conservative on access presets: if a merged endpoint uses
+/// `access: read-write` instead of explicit rules, this returns false
+/// even though the preset would permit the method at runtime. That
+/// produces a one-cycle re-issue on the agent's side — preferable to a
+/// false-positive coverage signal that lets the agent retry too early.
+fn endpoint_l7_covers(merged: &NetworkEndpoint, proposed: &NetworkEndpoint) -> bool {
+    if proposed.rules.is_empty() {
+        return true;
+    }
+    proposed.rules.iter().all(|proposed_rule| {
+        let Some(proposed_allow) = proposed_rule.allow.as_ref() else {
+            return true;
+        };
+        merged.rules.iter().any(|existing| {
+            existing.allow.as_ref().is_some_and(|existing_allow| {
+                existing_allow.method == proposed_allow.method
+                    && existing_allow.path == proposed_allow.path
+            })
+        })
+    })
+}
+
 pub fn merge_policy(
     policy: SandboxPolicy,
     operations: &[PolicyMergeOp],
@@ -782,6 +854,7 @@ mod tests {
 
     use super::{
         PolicyMergeError, PolicyMergeOp, PolicyMergeWarning, generated_rule_name, merge_policy,
+        policy_covers_rule,
     };
     use crate::restrictive_default_policy;
     use openshell_core::proto::{
@@ -1187,6 +1260,298 @@ mod tests {
         assert!(!result.policy.network_policies.contains_key("github"));
     }
 
+    #[test]
+    fn policy_covers_rule_returns_true_when_merged_rule_present() {
+        let proposed = NetworkPolicyRule {
+            name: "agent_proposed".to_string(),
+            endpoints: vec![endpoint("api.github.com", 443)],
+            binaries: vec![NetworkBinary {
+                path: "/usr/bin/curl".to_string(),
+                ..Default::default()
+            }],
+        };
+
+        let merged = merge_policy(
+            restrictive_default_policy(),
+            &[PolicyMergeOp::AddRule {
+                rule_name: "allow_api_github_com_443".to_string(),
+                rule: proposed.clone(),
+            }],
+        )
+        .expect("merge should succeed");
+
+        assert!(policy_covers_rule(&merged.policy, &proposed));
+    }
+
+    #[test]
+    fn policy_covers_rule_returns_false_when_unrelated_rule_present() {
+        let proposed = NetworkPolicyRule {
+            name: "agent_proposed".to_string(),
+            endpoints: vec![endpoint("api.github.com", 443)],
+            binaries: vec![NetworkBinary {
+                path: "/usr/bin/curl".to_string(),
+                ..Default::default()
+            }],
+        };
+
+        // Merge an *unrelated* rule for a different host. The proposed rule
+        // for api.github.com is still not present — this is John's
+        // "false-wakeup" case: an unrelated policy reload must not signal
+        // that the agent's rule is loaded.
+        let merged = merge_policy(
+            restrictive_default_policy(),
+            &[PolicyMergeOp::AddRule {
+                rule_name: "allow_api_example_com_443".to_string(),
+                rule: rule_with_endpoint("unrelated", "api.example.com", 443),
+            }],
+        )
+        .expect("merge should succeed");
+
+        assert!(!policy_covers_rule(&merged.policy, &proposed));
+    }
+
+    #[test]
+    fn policy_covers_rule_handles_merge_into_existing_endpoint() {
+        // The merge logic folds a new rule into an existing rule when their
+        // endpoints overlap, even under a different network_policies key.
+        // Coverage must survive that fold — name-keyed checks would miss it.
+        let proposed = NetworkPolicyRule {
+            name: "agent_proposed".to_string(),
+            endpoints: vec![endpoint("api.github.com", 443)],
+            binaries: vec![NetworkBinary {
+                path: "/usr/bin/curl".to_string(),
+                ..Default::default()
+            }],
+        };
+
+        let mut policy = restrictive_default_policy();
+        policy.network_policies.insert(
+            "preexisting_github".to_string(),
+            NetworkPolicyRule {
+                name: "preexisting_github".to_string(),
+                endpoints: vec![endpoint("api.github.com", 443)],
+                binaries: vec![NetworkBinary {
+                    path: "/usr/bin/git".to_string(),
+                    ..Default::default()
+                }],
+            },
+        );
+
+        let merged = merge_policy(
+            policy,
+            &[PolicyMergeOp::AddRule {
+                rule_name: "allow_api_github_com_443".to_string(),
+                rule: proposed.clone(),
+            }],
+        )
+        .expect("merge should succeed");
+
+        assert!(
+            !merged
+                .policy
+                .network_policies
+                .contains_key("allow_api_github_com_443"),
+            "proposed rule should have been folded into the existing key"
+        );
+        assert!(policy_covers_rule(&merged.policy, &proposed));
+    }
+
1359+
#[test]
1360+
fn policy_covers_rule_returns_false_when_binary_missing() {
1361+
let proposed = NetworkPolicyRule {
1362+
name: "agent_proposed".to_string(),
1363+
endpoints: vec![endpoint("api.github.com", 443)],
1364+
binaries: vec![NetworkBinary {
1365+
path: "/usr/bin/curl".to_string(),
1366+
..Default::default()
1367+
}],
1368+
};
1369+
1370+
// Endpoint exists in the policy but with a *different* binary. The
1371+
// agent's retry would still be denied; reload coverage should
1372+
// reflect that.
1373+
let mut policy = restrictive_default_policy();
1374+
policy.network_policies.insert(
1375+
"existing".to_string(),
1376+
NetworkPolicyRule {
1377+
name: "existing".to_string(),
1378+
endpoints: vec![endpoint("api.github.com", 443)],
1379+
binaries: vec![NetworkBinary {
1380+
path: "/usr/bin/git".to_string(),
1381+
..Default::default()
1382+
}],
1383+
},
1384+
);
1385+
1386+
assert!(!policy_covers_rule(&policy, &proposed));
1387+
}
1388+
1389+
#[test]
1390+
fn policy_covers_rule_returns_false_for_empty_proposed_endpoints() {
1391+
// Defensive: a rule with no endpoints carries no signal we can match
1392+
// on, so coverage is never true.
1393+
let proposed = NetworkPolicyRule::default();
1394+
let policy = restrictive_default_policy();
1395+
assert!(!policy_covers_rule(&policy, &proposed));
1396+
}
1397+
1398+
#[test]
1399+
fn policy_covers_rule_returns_false_when_proposed_l7_method_not_loaded() {
1400+
// John's false-wakeup mode at L7: the supervisor has an
1401+
// overlapping endpoint loaded (e.g. read-only GET), but the
1402+
// chunk's proposed PUT method is not in the merged endpoint's
1403+
// rules yet. Coverage must NOT return true here, or the agent
1404+
// retries the PUT and hits another policy_denied.
1405+
let proposed = NetworkPolicyRule {
1406+
name: "agent_put".to_string(),
1407+
endpoints: vec![NetworkEndpoint {
1408+
host: "api.github.com".to_string(),
1409+
port: 443,
1410+
ports: vec![443],
1411+
protocol: "rest".to_string(),
1412+
rules: vec![rest_rule("PUT", "/repos/foo/bar/contents/x.md")],
1413+
..Default::default()
1414+
}],
1415+
binaries: vec![NetworkBinary {
1416+
path: "/usr/bin/curl".to_string(),
1417+
..Default::default()
1418+
}],
1419+
};
1420+
1421+
let mut policy = restrictive_default_policy();
1422+
policy.network_policies.insert(
1423+
"existing_readonly".to_string(),
1424+
NetworkPolicyRule {
1425+
name: "existing_readonly".to_string(),
1426+
endpoints: vec![NetworkEndpoint {
1427+
host: "api.github.com".to_string(),
1428+
port: 443,
1429+
ports: vec![443],
1430+
protocol: "rest".to_string(),
1431+
rules: vec![rest_rule("GET", "/repos/foo/bar/contents/x.md")],
1432+
..Default::default()
1433+
}],
1434+
binaries: vec![NetworkBinary {
1435+
path: "/usr/bin/curl".to_string(),
1436+
..Default::default()
1437+
}],
1438+
},
1439+
);
1440+
1441+
assert!(
1442+
!policy_covers_rule(&policy, &proposed),
1443+
"endpoint overlaps but L7 PUT not loaded yet; must not signal coverage"
1444+
);
1445+
}
1446+
1447+
#[test]
1448+
fn policy_covers_rule_returns_true_after_l7_merge_lands() {
1449+
// Same setup as above, but with the proposed L7 rule merged in.
1450+
// Coverage must now return true.
1451+
let proposed = NetworkPolicyRule {
1452+
name: "agent_put".to_string(),
1453+
endpoints: vec![NetworkEndpoint {
1454+
host: "api.github.com".to_string(),
1455+
port: 443,
1456+
ports: vec![443],
1457+
protocol: "rest".to_string(),
1458+
rules: vec![rest_rule("PUT", "/repos/foo/bar/contents/x.md")],
1459+
..Default::default()
1460+
}],
1461+
binaries: vec![NetworkBinary {
1462+
path: "/usr/bin/curl".to_string(),
1463+
..Default::default()
1464+
}],
1465+
};
1466+
1467+
let mut policy = restrictive_default_policy();
1468+
policy.network_policies.insert(
1469+
"existing".to_string(),
1470+
NetworkPolicyRule {
1471+
name: "existing".to_string(),
1472+
endpoints: vec![NetworkEndpoint {
1473+
host: "api.github.com".to_string(),
1474+
port: 443,
1475+
ports: vec![443],
1476+
protocol: "rest".to_string(),
1477+
rules: vec![
1478+
rest_rule("GET", "/repos/foo/bar/contents/x.md"),
1479+
rest_rule("PUT", "/repos/foo/bar/contents/x.md"),
1480+
],
1481+
..Default::default()
1482+
}],
1483+
binaries: vec![NetworkBinary {
1484+
path: "/usr/bin/curl".to_string(),
1485+
..Default::default()
1486+
}],
1487+
},
1488+
);
1489+
1490+
assert!(policy_covers_rule(&policy, &proposed));
1491+
}
1492+
1493+
#[test]
1494+
fn policy_covers_rule_returns_true_for_l4_only_proposed_when_endpoint_present() {
1495+
// A chunk that targets a non-REST surface (no L7 rules) needs
1496+
// only the L4 endpoint match to be considered covered. Empty
1497+
// proposed.rules must not be treated as "no method matches".
1498+
let proposed = NetworkPolicyRule {
1499+
name: "ssh_clone".to_string(),
1500+
endpoints: vec![NetworkEndpoint {
1501+
host: "github.com".to_string(),
1502+
port: 22,
1503+
ports: vec![22],
1504+
..Default::default()
1505+
}],
1506+
binaries: vec![NetworkBinary {
1507+
path: "/usr/bin/git".to_string(),
1508+
..Default::default()
1509+
}],
1510+
};
1511+
1512+
let merged = merge_policy(
1513+
restrictive_default_policy(),
1514+
&[PolicyMergeOp::AddRule {
1515+
rule_name: "allow_github_com_22".to_string(),
1516+
rule: proposed.clone(),
1517+
}],
1518+
)
1519+
.expect("merge should succeed");
1520+
1521+
assert!(policy_covers_rule(&merged.policy, &proposed));
1522+
}
1523+
1524+
#[test]
1525+
fn policy_covers_rule_treats_empty_proposed_binaries_as_any_binary() {
1526+
// A proposed rule with no binaries is the "any binary" shape.
1527+
// The merged rule keeps its own binaries; coverage holds iff
1528+
// endpoint and (vacuously satisfied) binary set match. Document
1529+
// the semantics so a future reader doesn't flip it accidentally.
1530+
let proposed = NetworkPolicyRule {
1531+
name: "any_binary_rule".to_string(),
1532+
endpoints: vec![endpoint("api.github.com", 443)],
1533+
binaries: vec![],
1534+
};
1535+
1536+
let mut policy = restrictive_default_policy();
1537+
policy.network_policies.insert(
1538+
"existing".to_string(),
1539+
NetworkPolicyRule {
1540+
name: "existing".to_string(),
1541+
endpoints: vec![endpoint("api.github.com", 443)],
1542+
binaries: vec![NetworkBinary {
1543+
path: "/usr/bin/curl".to_string(),
1544+
..Default::default()
1545+
}],
1546+
},
1547+
);
1548+
1549+
assert!(
1550+
policy_covers_rule(&policy, &proposed),
1551+
"empty proposed binaries should match any merged binary set"
1552+
);
1553+
}
1554+
11901555
#[test]
11911556
fn add_rule_without_existing_match_inserts_requested_key() {
11921557
let policy = restrictive_default_policy();
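
The L7 half of the coverage check, including its documented conservatism on access presets, can be reproduced in isolation with simplified types. `Allow` and `Endpoint` below are stand-ins for the proto-generated types in the diff, `access_preset` is a hypothetical field that models the `access: read-write` case from the doc comment, and `l7_covers` is a reduced version of `endpoint_l7_covers` written for illustration only.

```rust
/// Simplified stand-in for an L7 allow entry (method + path).
#[derive(Clone, Debug, PartialEq, Eq)]
struct Allow {
    method: String,
    path: String,
}

/// Simplified stand-in for an endpoint: explicit allows plus an optional
/// access preset (hypothetical field, modelling `access: read-write`).
#[derive(Clone, Debug, Default)]
struct Endpoint {
    allows: Vec<Allow>,
    access_preset: Option<String>,
}

/// Reduced model of the L7 coverage rule described in the diff's doc comment.
fn l7_covers(merged: &Endpoint, proposed: &Endpoint) -> bool {
    // An L4-only proposal carries no L7 requirement, so it is covered.
    if proposed.allows.is_empty() {
        return true;
    }
    // Only explicit allows count. A preset on the merged side is ignored,
    // which is the conservative behaviour the doc comment describes.
    proposed
        .allows
        .iter()
        .all(|want| merged.allows.iter().any(|have| have == want))
}
```

In this model a merged endpoint that only carries a preset reports no coverage for an explicit PUT proposal, mirroring the trade-off the doc comment accepts: one extra /wait cycle rather than a premature retry.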
