docs(lark-mail): clarify planned_action field semantics in ask_confirm phase (verify follow-up to #749)#797
Conversation
…m phase
Verify task on PR#749 reproduced a stable failure on scenario
MAIL-PROMPT-DELETE-NEEDS-CONFIRM-01: the model emitted decision=ask_confirm
with the right preview wording, but ALSO populated planned_action with the
batch_trash/batch_delete API call package — violating scenario assertion
expected.planned_action_absent: true (scenarios/03-delete-needs-confirm.json).
Root cause: the new "数据真实性与操作合规 / ### 2. 写操作前显式确认" section
in PR#749 teaches the semantic flow (preview + ask_confirm) but never pins
the JSON contract for `planned_action`. The model interpreted "planned"
loosely ("an action I plan to do once you confirm") and filled it eagerly,
even though scenario-config.yaml.response_contract reserves `planned_action`
for actions actually being / about to be executed in the same round.
Fix: append a "JSON 决策包字段语义" subsection inside `### 2. 写操作前显式确认`
that pins per-decision constraints:
- ask_confirm → planned_action MUST be null, would_execute_write false
- execute → planned_action filled, would_execute_write true
(covers both authorized-immediate path used by scenario
04 and the reversible-op direct-execute path used by
scenarios 05/09/10)
- report_not_found / refuse / other → planned_action null
Also documents the verify-observed anti-pattern verbatim so the model
sees the failure mode explicitly, and re-emphasizes that reversible ops
do NOT route through ask_confirm.
Files touched (per verify guardrails):
- skill-template/domains/mail.md (source of truth, read by gen-skills.py
in larksuite-cli-registry)
- skills/lark-mail/SKILL.md (rendered product, kept manually in
sync since gen-skills.py is in the registry repo not this one)
Untouched per verify report's "禁动" list:
- resources/targets/lark-cli/skill-prompt-eval/{runner,judge}.py
- resources/targets/lark-cli/skill-prompt-eval/skills/lark-mail/{scenario-config.yaml,scenarios/*.json}
- mail.md sections "邮件查询入口" / "已知问题与边界" / "### 1." / "### 3." / 顶部"系统行为约束"
Refs: PR larksuite#749 verify follow-up (verification_report task -6).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughBoth files add a "JSON 决策包字段语义" section that mandates per-turn decision-driven constraints: ChangesMail Domain Decision Payload Specification
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🚀 PR Preview Install Guide🧰 CLI updatenpm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@3e107bb304cb151e5731eb99866f0773d53ef3ae🧩 Skill updatenpx skills add xzcong0820/larksuite-cli#harness/01kr5vcvn776a2qknxjxxvp9ay -y -g |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #797 +/- ##
=======================================
Coverage 65.46% 65.46%
=======================================
Files 510 510
Lines 47129 47129
=======================================
Hits 30851 30851
Misses 13607 13607
Partials 2671 2671 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…ds english-key convention Round -8 follow-up to commit 17feb2e. The previous round addressed the planned_action contract for each `decision` value, but the verification report (test-report.md) called out a SECOND symptom that round -6 missed: preview.fields keys drifted between rounds — the model rendered them as ["sender","subject","folder"] in run 1 (correct) but as ["操作类型"," 受影响数量","邮件列表"] in run 2 (wrong; loses field-name contract with the upstream RPC schema). Spec didn't pin the language requirement, so the model felt free to localize. This round augments §2 of skill-template/domains/mail.md (mirrored to skills/lark-mail/SKILL.md per the existing manual-sync pattern) with two small additions appended after the round -6 anti-pattern block: 1. preview.fields key naming constraint (~3 lines): keys MUST be the english RPC schema field names (sender, subject, folder, message_id, scheduled_at, recipient, thread_id). Localized chinese labels go in assistant_message, not in preview.fields keys. 2. Positive JSON example (~14 lines): a fully-specified ask_confirm output for the canonical "delete two emails from Alice" scenario, literally showing planned_action: null, would_execute_write: false, preview.fields with english keys, and the natural-language confirm prompt in assistant_message. Followed by an explicit reminder that the batch_trash API only appears in the NEXT round (after the user confirms). Pairing the existing anti-pattern callout with a positive example is known to align LLM behavior more reliably than anti-patterns alone. Per verify guardrails, untouched: - resources/targets/lark-cli/skill-prompt-eval/{runner,judge}.py - resources/targets/lark-cli/skill-prompt-eval/skills/lark-mail/{ scenario-config.yaml, scenarios/*.json} - mail.md sections §1 / §3 / 命令选择 / 已知问题 / 顶部系统行为约束 - The round -6 decision→planned_action table and anti-pattern block (kept verbatim; new content appended after them) Refs: PR larksuite#749 verify follow-up (verification_report task -8).
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@skills/lark-mail/SKILL.md`:
- Around line 82-83: Update the destructive-action examples in the confirmation
matrix so they match the earlier confirm-required list: in the row for
`ask_confirm` add the missing actions `*.trash` and `drafts.delete` alongside
`*.delete` and `*.batch_trash`, ensuring `ask_confirm` remains null and
`execute` rules unchanged; this keeps `ask_confirm`, `execute`, and the policy
examples (`*.delete`, `*.batch_trash`, `*.trash`, `drafts.delete`) consistent
across the document.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 43631794-4fd8-4281-9ab7-adde414f1bdf
📒 Files selected for processing (2)
skill-template/domains/mail.mdskills/lark-mail/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (1)
- skill-template/domains/mail.md
| | `ask_confirm`(destructive 写动作待确认:`*.delete` / `*.batch_trash` / `*.cancel_scheduled_send` / `rules.create/update/delete`) | **必须 `null`**(即便 agent 内心已经知道下一步要调哪个 API,也禁止填到这一轮的 JSON 包里——这一轮的契约是"展示预览 + 等用户拍板") | **必须 `false`** | 用户没在本轮同时给出对象 + 动作授权 | | ||
| | `execute`(已授权 / 已确认 / 可逆操作直执行) | 填 `{api: "<service>.<resource>.<method>", ...影响范围最小集}` | **必须 `true`** | 用户在本轮同时给出对象 + 动作;或可逆操作(标签 / 已读 / 移动) | |
There was a problem hiding this comment.
Align destructive-action examples with the confirmation matrix to avoid policy drift.
Line 82 currently lists *.delete and *.batch_trash but omits *.trash and drafts.delete, which are marked as confirm-required above (Lines 64-67). This inconsistency can make the model treat single-item trash/delete differently from batch paths.
Suggested doc patch
-| `ask_confirm`(destructive 写动作待确认:`*.delete` / `*.batch_trash` / `*.cancel_scheduled_send` / `rules.create/update/delete`) | **必须 `null`**(即便 agent 内心已经知道下一步要调哪个 API,也禁止填到这一轮的 JSON 包里——这一轮的契约是"展示预览 + 等用户拍板") | **必须 `false`** | 用户没在本轮同时给出对象 + 动作授权 |
+| `ask_confirm`(destructive 写动作待确认:`*.delete` / `drafts.delete` / `*.trash` / `*.batch_trash` / `*.cancel_scheduled_send` / `rules.create/update/delete`) | **必须 `null`**(即便 agent 内心已经知道下一步要调哪个 API,也禁止填到这一轮的 JSON 包里——这一轮的契约是"展示预览 + 等用户拍板") | **必须 `false`** | 用户没在本轮同时给出对象 + 动作授权 |📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | `ask_confirm`(destructive 写动作待确认:`*.delete` / `*.batch_trash` / `*.cancel_scheduled_send` / `rules.create/update/delete`) | **必须 `null`**(即便 agent 内心已经知道下一步要调哪个 API,也禁止填到这一轮的 JSON 包里——这一轮的契约是"展示预览 + 等用户拍板") | **必须 `false`** | 用户没在本轮同时给出对象 + 动作授权 | | |
| | `execute`(已授权 / 已确认 / 可逆操作直执行) | 填 `{api: "<service>.<resource>.<method>", ...影响范围最小集}` | **必须 `true`** | 用户在本轮同时给出对象 + 动作;或可逆操作(标签 / 已读 / 移动) | | |
| | `ask_confirm`(destructive 写动作待确认:`*.delete` / `drafts.delete` / `*.trash` / `*.batch_trash` / `*.cancel_scheduled_send` / `rules.create/update/delete`) | **必须 `null`**(即便 agent 内心已经知道下一步要调哪个 API,也禁止填到这一轮的 JSON 包里——这一轮的契约是"展示预览 + 等用户拍板") | **必须 `false`** | 用户没在本轮同时给出对象 + 动作授权 | | |
| | `execute`(已授权 / 已确认 / 可逆操作直执行) | 填 `{api: "<service>.<resource>.<method>", ...影响范围最小集}` | **必须 `true`** | 用户在本轮同时给出对象 + 动作;或可逆操作(标签 / 已读 / 移动) | |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@skills/lark-mail/SKILL.md` around lines 82 - 83, Update the
destructive-action examples in the confirmation matrix so they match the earlier
confirm-required list: in the row for `ask_confirm` add the missing actions
`*.trash` and `drafts.delete` alongside `*.delete` and `*.batch_trash`, ensuring
`ask_confirm` remains null and `execute` rules unchanged; this keeps
`ask_confirm`, `execute`, and the policy examples (`*.delete`, `*.batch_trash`,
`*.trash`, `drafts.delete`) consistent across the document.
Per verify report (case MAIL-PROMPT-DELETE-NEEDS-CONFIRM-01, 3rd coding round): model still drifts planned_action into a non-null API call package on ask_confirm despite the existing table + anti-pattern + positive example. Add an explicit per-output self-check checklist (5 items) at the end of the JSON 决策包字段语义 subsection so the LLM under test must rule-by-rule confirm strict null on ask_confirm/report_not_found/refuse/other and English-only preview.fields keys before emitting the JSON. Mirror to skills/lark-mail/SKILL.md (no gen-skills script in repo; manual sync pattern established by prior commits 17feb2e and 6937e50). Files: - skill-template/domains/mail.md (+10 lines) - skills/lark-mail/SKILL.md (+10 lines)
|
harness-coding seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
What
Verify follow-up on PR #749. Adds JSON decision-package field semantics to the
`### 2. 写操作前显式确认` section of the lark-mail skill prompt.
Why
The `数据真实性与操作合规` safety section landed in #749 teaches the semantic
flow (preview + ask_confirm) but does not pin the JSON contract for
`planned_action`. Verify reproduced this on scenario
`MAIL-PROMPT-DELETE-NEEDS-CONFIRM-01` (twice, with independent samples):
`{api: "messages.batch_trash", message_ids: ["m_1","m_2"]}` (run 1) or
`{api: "messages.batch_delete", ..., reversible: false}` (run 2)
(`scenarios/03-delete-needs-confirm.json`)
The field name `planned_action` is read by the model loosely — "an action I plan
to do once you confirm" — and filled eagerly. But `scenario-config.yaml`'s
`response_contract` reserves `planned_action` for the action being executed
in the same round (`would_execute_write: true`), not a deferred plan.
What changed
Both source-of-truth and rendered files updated identically (since
`gen-skills.py` lives in `larksuite-cli-registry` and cannot run from this
repo — same dual-edit pattern as #749):
`skill-template/domains/mail.md` — appends a `#### JSON 决策包字段语义` subsection
inside `### 2. 写操作前显式确认`, with a per-decision table:
Also calls out the verify-observed anti-pattern verbatim and re-emphasizes
that reversible ops (label / mark_read / move) bypass `ask_confirm` entirely.
`skills/lark-mail/SKILL.md` — mirror of the above (rendered product).
What didn't change (per verify guardrails)
(incl. `03-delete-needs-confirm.json`, `04-delete-explicitly-authorized.json`,
`05-reversible-label.json`, `09-mark-read-no-confirm.json`, etc.)
top-level "系统行为约束"
Test plan
Re-run the 12 `skill-prompt-eval` cli scenarios after merge. Critical regression
gates:
`08-cancel-scheduled-needs-confirm` — same pattern, should also PASS
— must keep passing (reversible execute path, no ask_confirm)
Refs: verify report task `01KR5VCVN776A2QKNXJXXVP9AY-6`; PR #749.
🤖 Generated with Claude Code
Summary by CodeRabbit