
feat: rule nudges — agent-facing prompts on deny verdicts#53

Merged
knhn1004 merged 6 commits into main from feat/rule-nudges on May 3, 2026


knhn1004 commented May 3, 2026

Summary

Rule authors can now attach a `nudge: <hint>` to any `evaluate[]` clause. When the matched verdict is `deny`, every harness shim (Claude Code, Codex, Cursor) appends the nudge to the deny reason as `"\n\n→ Suggested: <hint>"`. The format string is intentionally stable, so external tooling can grep for `→ Suggested: ` to spot the hint.
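Because the separator is fixed, pulling the hint back out of a deny reason is a plain string split. A minimal Go sketch, assuming the reason ends with exactly the `\n\n→ Suggested: <hint>` suffix described above; `extractNudge` is a hypothetical helper for external tooling, not something this PR ships:

```go
package main

import (
	"fmt"
	"strings"
)

// nudgeMarker is the stable separator the hook shims place before the hint.
const nudgeMarker = "\n\n→ Suggested: "

// extractNudge (hypothetical) splits a deny reason into the base reason and
// the nudge hint. ok is false when the reason carries no nudge.
func extractNudge(reason string) (base, hint string, ok bool) {
	// The hint is appended last, so split on the LAST occurrence of the marker.
	i := strings.LastIndex(reason, nudgeMarker)
	if i < 0 {
		return reason, "", false
	}
	return reason[:i], reason[i+len(nudgeMarker):], true
}

func main() {
	reason := "blocked by safety.rm-suggest-trash" + nudgeMarker + "use trash instead"
	base, hint, ok := extractNudge(reason)
	fmt.Printf("base=%q hint=%q ok=%v\n", base, hint, ok)
}
```

Splitting on the last occurrence keeps the base reason intact even if it happens to contain the marker text itself.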

Use cases (per the screenshot context this came from):

  • Block this command, prefer this one: `rm -rf` is denied with the nudge ``"use `trash ` instead — recoverable from Trash"``. The agent sees the hint and retries with `trash`.
  • Force a skill: secret reads are denied with the nudge `"use the secret-fetcher skill from openagentlock/skills"`. The agent learns the right entrypoint instead of just being blocked.

Companion PR adding the `nudge` field to the schema + two example rules: openagentlock/rules#2.

Backward compat

Purely additive. Existing rules without a `nudge:` continue to work unchanged. The `nudge` field in the `/v1/gates/check` JSON response uses `omitempty`, so responses without a nudge have exactly the same wire shape as before, and clients that ignore the field are unaffected.

What's in this PR (5 commits; a sixth, docs-only commit addressing review feedback landed before merge)

  1. `feat(daemon): plumb nudge field from policy rule through verdict response` — `policy.EvalResult.Nudge`, `Gate.Evals` (replaces parallel slices), `/v1/gates/check` JSON `nudge` field with `omitempty`. 8 unit + integration tests.
  2. `fix(daemon): preserve nudge through monitor for firewall escalation; replace parallel slice with struct` — addresses the firewall-escalation gap (daemon-firewall + policy-monitor → deny was dropping the nudge). Strip moved up to `mode.go` based on FINAL verdict.
  3. `feat(hooks): concatenate nudge into deny reason for claude/codex/cursor` — new `denyReasonWithNudge` helper; wired into 4 deny-reply sites (1 Claude, 1 Codex, 2 Cursor — pre-tool + before-shell). 4 daemon httptest tests + 3 CLI subprocess tests.
  4. `docs: document the rule nudge field across policies, api, and hooks references` — `docs/guide/policies.md` (new `### Nudges` subsection), `docs/reference/api.md` (response shape), `docs/reference/hooks.md` (deny reply format). `mkdocs build --strict` clean.
  5. `test(e2e): nudge round-trip via fake-hook for Bash and Read tools` — 3 new e2e tests covering deny+nudge for Bash, deny+nudge for Read, and `omitempty` wire-level absence on allow.

Daemon mode interaction matrix (verified by tests):

| daemon mode | policy mode | rule has nudge | final verdict | nudge on wire |
|---|---|---|---|---|
| default | enforce | yes | deny | yes |
| default | monitor | yes | allow (monitor pass) | no |
| firewall | monitor | yes | deny (escalation) | yes |
| monitor | enforce | yes | allow (suppression) | no |
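The matrix reduces to one rule: apply the daemon-mode override first, then keep the nudge only if the final verdict is `deny`. A hedged Go sketch of that decision; `evalOutcome`, `finalVerdict`, and the mode strings are illustrative names, loosely modeled on `applyDaemonModeOverride` in `mode.go`, whose real signature is not shown here:

```go
package main

import "fmt"

// evalOutcome is a hypothetical view of what the policy layer hands upward.
type evalOutcome struct {
	Verdict      string // "allow" or "deny" as produced by the policy layer
	MonitorMatch bool   // a monitor-mode rule matched, so Verdict was downgraded to "allow"
	Nudge        string // kept populated by the policy layer, even on monitor downgrades
}

// finalVerdict applies the daemon-mode override, then decides whether the
// nudge survives based only on the FINAL verdict.
func finalVerdict(daemonMode string, in evalOutcome) (verdict, nudge string) {
	verdict = in.Verdict
	switch daemonMode {
	case "firewall":
		if in.MonitorMatch {
			verdict = "deny" // escalate a monitor match back to deny
		}
	case "monitor":
		verdict = "allow" // suppress denies; the agent is allowed to proceed
	}
	if verdict == "deny" {
		return verdict, in.Nudge
	}
	return verdict, "" // never surface remediation hints on an allowed call
}

func main() {
	rows := []struct {
		daemon string
		in     evalOutcome
	}{
		{"default", evalOutcome{Verdict: "deny", Nudge: "use trash"}},
		{"default", evalOutcome{Verdict: "allow", MonitorMatch: true, Nudge: "use trash"}},
		{"firewall", evalOutcome{Verdict: "allow", MonitorMatch: true, Nudge: "use trash"}},
		{"monitor", evalOutcome{Verdict: "deny", Nudge: "use trash"}},
	}
	for _, r := range rows {
		v, n := finalVerdict(r.daemon, r.in)
		fmt.Printf("%-8s -> %-5s nudge=%q\n", r.daemon, v, n)
	}
}
```

Keeping `Nudge` populated through the monitor downgrade is what lets the firewall row recover it; clearing it at the policy layer would leave the escalated deny without a hint.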

Test plan

  • `go test -race ./internal/api/... ./internal/policy/...` from `control-plane/` — clean.
  • `bun test` from `cli/` — 153 pass / 1 skipped (e2e gated on `go` availability) / 0 fail.
  • `bun test tests/e2e.test.ts` from `cli/` (with `go` + ledger staticlib available) — 27 pass / 1 skip / 0 fail.
  • `mkdocs build --strict` — clean.
  • CI green.

Out of scope

  • Hard command rewrite (`rm` → `trash` via daemon-side substitution). The user's design conversation discussed this; we landed on nudge-only as the MVP because (a) it's purely additive across every harness contract, (b) the agent self-corrects on the nudge text just as effectively, and (c) command rewriting requires per-harness contract changes that aren't equally well-supported across Claude Code / Codex / Cursor today. Nudge-only ships the user's intent ("If you see something, do something instead — and that 'do something' is a prompt") without invasive harness work.

🤖 Generated with Claude Code

knhn1004 and others added 6 commits May 3, 2026 14:58
feat(daemon): plumb nudge field from policy rule through verdict response

Adds optional human-readable hint propagation from policy YAML
`evaluate[].nudge` through the policy evaluator into the gate.check
HTTP response. Hint is only surfaced on deny verdicts; allow / monitor
downgrades / daemon-monitor suppression all clear it so the agent never
sees remediation guidance for a call it's allowed to make.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(daemon): preserve nudge through monitor for firewall escalation; replace parallel slice with struct

C1: nudge was stripped at the policy layer during monitor downgrade,
so when daemon=firewall escalated a MonitorMatch back to deny, the
hint was already gone. Move nudge-stripping up to the API layer
(applyDaemonModeOverride): policy.Evaluate now carries Nudge through
the monitor branch, and the daemon override decides whether to
surface or clear it based on the final user-visible verdict.

I1: replace Gate.Evaluators []Evaluator + Gate.EvalNudges []string
parallel slices with a single Gate.Evals []evalEntry. The two slices
were a footgun (could drift out of sync); welding them together
removes the bounds-check at the firing-index lookup. External
callers use the new Gate.Evaluators() accessor for type-name
introspection.
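Welding each evaluator to its nudge means the firing index can never point past the end of a second slice. A sketch of the `Gate.Evals []evalEntry` shape with the `Evaluators()` accessor; the `Evaluator` interface, its `Matches` method, the toy `contains` evaluator, and `firingNudge` are assumptions made so the example runs:

```go
package main

import (
	"fmt"
	"strings"
)

// Evaluator is a hypothetical stand-in for the policy package's evaluator type.
type Evaluator interface {
	Matches(input string) bool
}

// contains is a toy evaluator used only in this sketch.
type contains struct{ substr string }

func (c contains) Matches(input string) bool { return strings.Contains(input, c.substr) }

// evalEntry keeps an evaluator and its nudge together so they cannot drift apart.
type evalEntry struct {
	Evaluator Evaluator
	Nudge     string
}

type Gate struct {
	Evals []evalEntry
}

// Evaluators exposes just the evaluators, e.g. for type-name introspection.
func (g Gate) Evaluators() []Evaluator {
	out := make([]Evaluator, len(g.Evals))
	for i, e := range g.Evals {
		out[i] = e.Evaluator
	}
	return out
}

// firingNudge (hypothetical) returns the nudge of the first matching entry.
// The nudge lives in the same entry, so no second slice is indexed.
func (g Gate) firingNudge(input string) (string, bool) {
	for _, e := range g.Evals {
		if e.Evaluator.Matches(input) {
			return e.Nudge, true
		}
	}
	return "", false
}

func main() {
	g := Gate{Evals: []evalEntry{
		{Evaluator: contains{"rm -rf"}, Nudge: "use trash instead"},
		{Evaluator: contains{"git push --force"}},
	}}
	for _, ev := range g.Evaluators() {
		fmt.Printf("%T\n", ev) // type-name introspection via the accessor
	}
	if nudge, ok := g.firingNudge("rm -rf build/"); ok {
		fmt.Println("nudge:", nudge)
	}
}
```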

M1: strengthen the daemon-monitor strip-site comment to call out
that the policy layer leaves Nudge populated and the API layer
clears it because the agent is being allowed to proceed.

Tests: add TestApplyDaemonModeOverride_FirewallEscalatesWithNudge
covering the C1 win, plus _MonitorMatchStripsNudge (relocated cousin
of the old policy-layer test) and _MonitorSuppressesDenyAndStripsNudge.
The renamed TestEvaluate_MonitorDowngradeKeepsNudge in the policy
package now asserts the new behaviour: nudge survives the downgrade.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(hooks): concatenate nudge into deny reason for claude/codex/cursor
When a policy rule with `nudge:` produces a deny verdict, surface the
hint to the model through the harness's only inbound channel — the
deny-reason text — by appending it as `\n\n→ Suggested: <hint>` to both
permissionDecisionReason and stopReason. Implemented as a shared
denyReasonWithNudge helper invoked by claudePreToolUseHandler,
codexPreToolUseHandler, cursorGateHandler (preToolUse +
beforeMCPExecution), and cursorBeforeShellHandler. Allow / monitor /
non-matching paths see Nudge == "" so they pass through unchanged.
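The helper is small enough to sketch in full. This assumes a `(reason, nudge string) string` signature, which the PR does not spell out, and uses a hypothetical `claudeDenyReply` struct and `buildDenyReply` function to show the same text landing in both reason fields:

```go
package main

import "fmt"

// denyReasonWithNudge appends the hint in the stable "→ Suggested:" format.
// An empty nudge returns the reason unchanged (assumed signature).
func denyReasonWithNudge(reason, nudge string) string {
	if nudge == "" {
		return reason
	}
	return reason + "\n\n→ Suggested: " + nudge
}

// claudeDenyReply is a hypothetical, trimmed-down deny reply carrying the two
// fields the commit says receive the nudge.
type claudeDenyReply struct {
	PermissionDecisionReason string
	StopReason               string
}

// buildDenyReply (hypothetical) fills both fields from one composed message.
func buildDenyReply(reason, nudge string) claudeDenyReply {
	msg := denyReasonWithNudge(reason, nudge)
	return claudeDenyReply{PermissionDecisionReason: msg, StopReason: msg}
}

func main() {
	r := buildDenyReply("rm -rf is blocked by policy", "use trash instead")
	fmt.Println(r.PermissionDecisionReason)
}
```

An empty nudge short-circuits, which is why allow, monitor, and non-matching paths need no special casing at the call sites.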

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs: document the rule nudge field across policies, api, and hooks references

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

test(e2e): nudge round-trip via fake-hook for Bash and Read tools
Add three end-to-end tests that exercise the full nudge wire path —
real daemon, real CLI subprocess, real policy file — covering both
deny-with-nudge and allow-without-nudge cases. The omitempty case
asserts wire-level absence (Object.keys / `in`) rather than empty
string, so a regression that emits `nudge: ""` on allow paths is
caught.
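The e2e tests make this check in TypeScript with `Object.keys` / `in`. The same absent-versus-empty distinction in Go needs a map decode, because a plain `string` struct field cannot tell a missing key from `""`. `nudgeKeyPresent` is a hypothetical helper, and the `verdict` key is assumed:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nudgeKeyPresent (hypothetical) reports whether the JSON object has a
// "nudge" key at all, so `"nudge": ""` counts as present (a regression).
func nudgeKeyPresent(body []byte) (bool, error) {
	var obj map[string]json.RawMessage
	if err := json.Unmarshal(body, &obj); err != nil {
		return false, err
	}
	_, ok := obj["nudge"]
	return ok, nil
}

func main() {
	for _, body := range []string{
		`{"verdict":"allow"}`,            // correct allow response
		`{"verdict":"allow","nudge":""}`, // the regression the e2e test guards against
	} {
		present, err := nudgeKeyPresent([]byte(body))
		fmt.Println(body, "nudge key present:", present, "err:", err)
	}
}
```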

The e2e fixture policy now includes two nudge-bearing gates:
`safety.rm-suggest-trash` (Bash, `rm -rf`) and
`safety.secret-read-suggest-skill` (Read, `**/.aws/credentials`).
The Read path is chosen so it does not collide with the existing
`rogue.secret-read` globs (`**/.env*`, `**/.ssh/**`).
`safety.rm-suggest-trash` is intentionally placed BEFORE
`rogue.destructive-bash` so a plain `rm -rf` fires the nudge rule;
the legacy destructive-bash test was retargeted to
`git push --force` (the second alternation in destructive-bash) so
that rule's coverage is preserved.
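The ordering note implies first-match-wins evaluation, where declaration order decides which gate (and which nudge) fires. A sketch under that assumption; the `gate` type, `firstMatch`, and both regular expressions are illustrative, not the fixture's actual patterns:

```go
package main

import (
	"fmt"
	"regexp"
)

// gate is a hypothetical, simplified rule: name, command pattern, optional nudge.
type gate struct {
	name    string
	pattern *regexp.Regexp
	nudge   string
}

// firstMatch returns the first gate whose pattern matches cmd, so the order
// of the slice decides which rule fires (assumed first-match-wins semantics).
func firstMatch(gates []gate, cmd string) (gate, bool) {
	for _, g := range gates {
		if g.pattern.MatchString(cmd) {
			return g, true
		}
	}
	return gate{}, false
}

// fixture mirrors the ordering described above; the patterns are illustrative.
var fixture = []gate{
	{"safety.rm-suggest-trash", regexp.MustCompile(`\brm\s+-rf\b`), "use trash instead"},
	{"rogue.destructive-bash", regexp.MustCompile(`\brm\s+-rf\b|\bgit\s+push\s+--force\b`), ""},
}

func main() {
	for _, cmd := range []string{"rm -rf build/", "git push --force origin main"} {
		if g, ok := firstMatch(fixture, cmd); ok {
			fmt.Printf("%-30s -> %s (nudge: %q)\n", cmd, g.name, g.nudge)
		}
	}
}
```

Swapping the two entries would route `rm -rf` to `rogue.destructive-bash` with no nudge. With this order, only the `git push --force` alternation still reaches destructive-bash, which is why the legacy test was retargeted.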

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a brief request-fields list mirroring the new response-fields list,
so the gates/check section documents both halves of the endpoint rather
than only the response. Per CodeRabbit feedback on PR #53.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
knhn1004 merged commit a3740ee into main on May 3, 2026
5 checks passed
knhn1004 deleted the feat/rule-nudges branch May 3, 2026 22:52