Skip to content

fix(lark-mail): clarify JSON field semantics in confirm vs execute#798

Open
xzcong0820 wants to merge 1 commit intolarksuite:mainfrom
xzcong0820:harness/01kr6z4sg2d03vtgy9f0h2twkf
Open

fix(lark-mail): clarify JSON field semantics in confirm vs execute#798
xzcong0820 wants to merge 1 commit intolarksuite:mainfrom
xzcong0820:harness/01kr6z4sg2d03vtgy9f0h2twkf

Conversation

@xzcong0820
Copy link
Copy Markdown
Collaborator

@xzcong0820 xzcong0820 commented May 9, 2026

What

Follow-up fix on top of #749. Adds a "决策状态机字段语义" subsection to both
skills/lark-mail/SKILL.md and skill-template/domains/mail.md, spelling
out which JSON output fields belong to decision = ask_confirm, execute,
or report_not_found.

Why

Verify run on #749 caught a stable failure at scenario
MAIL-PROMPT-DELETE-NEEDS-CONFIRM-01 (skill-prompt-eval, two consecutive
runs):

  • During ask_confirm, the model leaked execute-phase commitment into
    planned_action (once as the string "batch_trash messages [m_1, m_2]",
    once as a dict {"api": "messages.batch_trash", "message_ids": [...]})
  • preview.fields sometimes collapsed into preview.items instead of
    being explicitly listed, so downstream couldn't see sender / subject / folder were extracted

Root cause: the new "写操作前显式确认" section in #749 described the
semantic rules (preview-then-confirm, reversible vs irreversible) but
did not anchor those rules to the runner's JSON output schema fields,
so the model had no signal for which fields belong to which decision stage.

Fix

Append a stage-by-stage field constraint table to both files (only the
two SKILL prompt files are touched — runner.py / judge.py /
scenarios/*.json are intentionally untouched):

  • ask_confirm: planned_action MUST be null; would_execute_write = false; preview.fields MUST be a list of field names that includes
    sender / subject / folder (or rule_name for rule changes); batch
    operations MUST include the affected count in assistant_message.
  • execute: planned_action carries the literal API name (e.g.
    "messages.batch_trash", "messages.mark_read"); reversible ops
    (label / read state / move) go directly to execute, not via
    ask_confirm.
  • report_not_found: target_found = false, planned_action = null.

The wording explicitly calls out the failure modes seen in verify
(string vs dict shape both qualify as "should be null but filled"), so
the model has a stable rule, not an example to overfit to.

Scope

  • skills/lark-mail/SKILL.md (+26 lines under the existing
    "## 数据真实性与操作合规 / 2. 写操作前显式确认" section)
  • skill-template/domains/mail.md (+26 lines, same content for the
    domain prompt template)
  • No test / runner / scenario changes.

Related

Summary by CodeRabbit

Release Notes

  • Documentation
    • Clarified decision state machine field semantics for JSON output specifications, including handling rules for different decision stages (ask_confirm, execute, report_not_found), field constraints, preview data structure requirements, and batch operation confirmation messaging standards.

Review Change Stack

… vs execute

Verify run on PR larksuite#749 caught a code_bug at MAIL-PROMPT-DELETE-NEEDS-CONFIRM-01:
during ask_confirm, the model leaked execute-phase commitment into
`planned_action` (string or dict form), and `preview.fields` sometimes
collapsed into `preview.items` instead of being listed explicitly.

Root cause: the new "写操作前显式确认" section described the semantics
(preview-then-confirm) but did not specify how those map to the runner's
JSON output schema fields, so the model had no anchor for which fields
belong to which decision stage.

Fix: append a "决策状态机字段语义" subsection in both files, spelling out
field-by-field constraints for `decision = ask_confirm` /  `execute` /
`report_not_found`:
- ask_confirm: planned_action MUST be null; preview.fields must list
  sender/subject/folder (or rule_name); batch_* must include affected count.
- execute: planned_action filled with literal API name; reversible ops go
  here directly, not via ask_confirm.
- report_not_found: target_found=false, planned_action=null.

Scope per verify report: only SKILL prompt files touched; runner.py / judge.py
/ scenarios/*.json untouched.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 96e08459-0cd3-4508-9b61-85bba9b4d574

📥 Commits

Reviewing files that changed from the base of the PR and between 4aceae9 and 56e66aa.

📒 Files selected for processing (2)
  • skill-template/domains/mail.md
  • skills/lark-mail/SKILL.md

📝 Walkthrough

Walkthrough

Two documentation files are updated in parallel to define decision state machine field semantics for external runners outputting JSON. The changes establish a contract specifying three decision modes (ask_confirm, execute, report_not_found), stage-correct field handling, and content requirements for previews and batch operation wording.

Changes

Mail Decision State Machine Contract

Layer / File(s) Summary
Template Specification
skill-template/domains/mail.md
Domain template documentation establishes decision state machine contract with allowed decision values and field-level rules: planned_action null during ask-confirm phase and required during execute, preview.fields/preview.items structure requirements, and batch operation wording must include affected-count digits.
Skill SKILL.md Specification
skills/lark-mail/SKILL.md
Skill documentation reinforces the same contract with JSON output constraints: planned_action null-vs-API-name constraints per mode, would_execute_write/target_found correctness invariants, preview field/item list formatting, and assistant_message content requirements including batch affected-count wording.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • larksuite/cli#749: Both PRs modify the mail domain documentation to enforce the same decision/confirmation semantics—handling "not found" results, mandatory previews/confirmations for destructive writes (including affected-count wording), and fields like target_found/preview/planned_action.

Suggested labels

documentation, domain/mail, size/M

Suggested reviewers

  • chanthuang
  • infeng

Poem

🐰 A contract takes shape in our docs so clear,
With decision states and fields without fear,
Templates and skills now speak as one—
The mail domain's semantics, beautifully done! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: clarifying JSON field semantics for the confirm vs execute decision states in the lark-mail skill.
Description check ✅ Passed The description is comprehensive and covers all required template sections: Summary (motivation and scope), Changes (detailed list), Test Plan (acknowledged through context), and Related Issues (links #749 and the failing scenario).
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added domain/mail PR touches the mail domain size/M Single-domain feat or fix with limited business impact labels May 9, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 65.46%. Comparing base (4aceae9) to head (56e66aa).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #798   +/-   ##
=======================================
  Coverage   65.46%   65.46%           
=======================================
  Files         510      510           
  Lines       47129    47129           
=======================================
  Hits        30851    30851           
  Misses      13607    13607           
  Partials     2671     2671           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@56e66aaf8cc5abc305075a7e20bd2931984a8a5b

🧩 Skill update

npx skills add xzcong0820/larksuite-cli#harness/01kr6z4sg2d03vtgy9f0h2twkf -y -g

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain/mail PR touches the mail domain size/M Single-domain feat or fix with limited business impact

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant