feat: rule nudges — agent-facing prompts on deny verdicts #53
Merged
…onse

Adds optional human-readable hint propagation from policy YAML `evaluate[].nudge` through the policy evaluator into the gate.check HTTP response. The hint is surfaced only on deny verdicts; allow, monitor downgrades, and daemon-monitor suppression all clear it, so the agent never sees remediation guidance for a call it is allowed to make.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…replace parallel slice with struct

C1: nudge was stripped at the policy layer during monitor downgrade, so when daemon=firewall escalated a MonitorMatch back to deny, the hint was already gone. Move nudge-stripping up to the API layer (applyDaemonModeOverride): policy.Evaluate now carries Nudge through the monitor branch, and the daemon override decides whether to surface or clear it based on the final user-visible verdict.

I1: replace the parallel slices Gate.Evaluators []Evaluator and Gate.EvalNudges []string with a single Gate.Evals []evalEntry. The two slices were a footgun (they could drift out of sync); welding them together removes the bounds-check at the firing-index lookup. External callers use the new Gate.Evaluators() accessor for type-name introspection.

M1: strengthen the daemon-monitor strip-site comment to call out that the policy layer leaves Nudge populated and the API layer clears it because the agent is being allowed to proceed.

Tests: add TestApplyDaemonModeOverride_FirewallEscalatesWithNudge covering the C1 win, plus _MonitorMatchStripsNudge (relocated cousin of the old policy-layer test) and _MonitorSuppressesDenyAndStripsNudge. The renamed TestEvaluate_MonitorDowngradeKeepsNudge in the policy package now asserts the new behaviour: nudge survives the downgrade.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
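The C1 behaviour described above (the nudge survives the policy layer and is kept or cleared only once the final verdict is known) can be sketched roughly as follows. This is an illustration, not the PR's code: the `Result` type, its fields, the string mode names, and the `applyOverride` function are all hypothetical stand-ins, and the real `applyDaemonModeOverride` signature is not shown in the source.

```go
package main

import "fmt"

// Result is a hypothetical stand-in for the policy evaluator's output.
// Field names are assumptions made for this sketch.
type Result struct {
	Verdict      string // "allow" or "deny" (assumed encoding)
	MonitorMatch bool   // true when a rule matched but was downgraded to monitor
	Nudge        string // hint carried through from the matching rule
}

// applyOverride is a hypothetical analogue of applyDaemonModeOverride.
// It decides the final user-visible verdict first, then clears the nudge
// unless that final verdict is a deny.
func applyOverride(r Result, daemonMode string) Result {
	switch {
	case daemonMode == "firewall" && r.MonitorMatch:
		// Firewall mode escalates a monitor match back to deny;
		// the nudge is still present because the policy layer kept it.
		r.Verdict = "deny"
	case daemonMode == "monitor" && r.Verdict == "deny":
		// Monitor mode suppresses the deny; the agent proceeds.
		r.Verdict = "allow"
	}
	if r.Verdict != "deny" {
		// The agent is allowed to proceed, so no remediation hint.
		r.Nudge = ""
	}
	return r
}

func main() {
	escalated := applyOverride(Result{Verdict: "allow", MonitorMatch: true, Nudge: "use trash"}, "firewall")
	suppressed := applyOverride(Result{Verdict: "deny", Nudge: "use trash"}, "monitor")
	fmt.Printf("%+v\n%+v\n", escalated, suppressed)
}
```

Clearing the nudge in one place, after the verdict is final, avoids the ordering problem C1 describes, where an earlier strip could not be undone by a later escalation.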
When a policy rule with `nudge:` produces a deny verdict, surface the hint to the model through the harness's only inbound channel — the deny-reason text — by appending it as `\n\n→ Suggested: <hint>` to both permissionDecisionReason and stopReason. Implemented as a shared denyReasonWithNudge helper invoked by claudePreToolUseHandler, codexPreToolUseHandler, cursorGateHandler (preToolUse + beforeMCPExecution), and cursorBeforeShellHandler. Allow / monitor / non-matching paths see Nudge == "" so they pass through unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
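A minimal sketch of what a helper like `denyReasonWithNudge` might look like, based only on the format described above. The two-string signature and the `nudgeSeparator` constant name are assumptions; the PR's actual helper may take different arguments.

```go
package main

import "fmt"

// nudgeSeparator is the separator described in the commit message.
// The constant name is an assumption made for this sketch.
const nudgeSeparator = "\n\n→ Suggested: "

// denyReasonWithNudge appends the hint to the deny reason.
// An empty nudge leaves the reason unchanged, which matches the
// allow / monitor / non-matching paths where Nudge == "".
func denyReasonWithNudge(reason, nudge string) string {
	if nudge == "" {
		return reason
	}
	return reason + nudgeSeparator + nudge
}

func main() {
	fmt.Printf("%q\n", denyReasonWithNudge("rm -rf is blocked", "move files to trash instead"))
	fmt.Printf("%q\n", denyReasonWithNudge("rm -rf is blocked", ""))
}
```

Keeping the formatting in one shared helper means every handler emits the same string, so tooling that searches for `→ Suggested: ` sees a consistent marker.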
…eferences Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add three end-to-end tests that exercise the full nudge wire path — real daemon, real CLI subprocess, real policy file — covering both deny-with-nudge and allow-without-nudge cases. The omitempty case asserts wire-level absence (Object.keys / `in`) rather than empty string, so a regression that emits `nudge: ""` on allow paths is caught. The e2e fixture policy now includes two nudge-bearing gates: `safety.rm-suggest-trash` (Bash, `rm -rf`) and `safety.secret-read-suggest-skill` (Read, `**/.aws/credentials`). The Read path is chosen so it does not collide with the existing `rogue.secret-read` globs (`**/.env*`, `**/.ssh/**`). `safety.rm-suggest-trash` is intentionally placed BEFORE `rogue.destructive-bash` so a plain `rm -rf` fires the nudge rule; the legacy destructive-bash test was retargeted to `git push --force` (the second alternation in destructive-bash) so that rule's coverage is preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a brief request-fields list mirroring the new response-fields list so the gates/check section is balanced rather than docs-half-an-endpoint. Per CodeRabbit feedback on PR #53. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Rule authors can now attach a `nudge: <hint>` to any `evaluate[]` clause. When the matched verdict is `deny`, every harness shim (Claude Code, Codex, Cursor) appends the nudge to the deny reason as `"\n\n→ Suggested: <hint>"`. The format string is intentionally stable so external tooling can grep for `→ Suggested: ` to spot the hint.
Use cases (per the screenshot context this came from):
Companion PR adding the `nudge` field to the schema + two example rules: openagentlock/rules#2.
Backward compat
Purely additive. Existing rules without a `nudge:` continue to work unchanged. The `nudge` field on `/v1/gates/check` JSON uses `omitempty` — clients that ignore it see the same wire shape as before.
What's in this PR (5 commits)
Daemon mode interaction matrix (verified by tests):
Test plan
Out of scope
🤖 Generated with Claude Code