Skip to content

fix(#230): explicit-delegation precheck — alias equality + self-exclusion + LLM fall-through#232

Merged
s2agi merged 2 commits into
mainfrom
fix/230-delegation-precheck
Jun 11, 2026
Merged

fix(#230): explicit-delegation precheck — alias equality + self-exclusion + LLM fall-through#232
s2agi merged 2 commits into
mainfrom
fix/230-delegation-precheck

Conversation

@vansin

@vansin vansin commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Author

Agent: 通信SDK马
Reviewer: 通信牛 (mutual reviewer pair continues)

Re-do of #231 with a clean base

#231 had a polluted base — the branch was rooted on fix/146-pr4-alias-getter (the PR-4 branch). Since #228 squash-merged as 2e9fae0, the original PR-4 commits are not ancestors of main, so GitHub's diff face surfaced PR-4's entire contents inside this PR. The substantive change is ~200 lines but the review face was 1337. 通信龙 caught it.

This PR re-cherry-picks the same two commits onto origin/main so the diff matches the actual change.

Diff: 3 files / 250 lines (agent-node/src/runtime/delegation-precheck.ts + its test + agent-node/src/cli.ts wire).

Summary

Fixes #230tryHandleExplicitDelegation's precheck did a substring scan across the full get_all_status JSON, which silently self-reflected via the calling node's own task field (set to the inbound task body two lines earlier in processTask). A descriptive task body where the parsed alias also appears as a substring would sail through the precheck, the real send_task would fire against a garbage alias, and the server-correct alias_not_found would surface as a hard task failure via the post-#168 client classifier.

The bug pre-dates the recent hub upgrade window. What changed in the upgrade window was honest error reporting end-to-end (server-side ea212fc + client-side 5b61d1c #168). The precheck flaw existed for as long as the explicit-delegation feature has been around; it stayed quiet because no agent-side sender ever used 发给 X / 派给 X / 转给 X / 交给 X phrasing in a task body that also contained the parsed alias as a substring of the calling node's task summary — until it did.

Two commits, intentionally split

Commit A — a724f2c Minimum fix (alias equality + self-exclusion)

Pure helper in agent-node/src/runtime/delegation-precheck.ts:

delegationTargetExists(sessions, target, self) 
  { exists: true, source: "session" }
  | { exists: false, reason: "not_in_sessions" | "self_only" | "empty_sessions" | "no_sessions_field" }
  • Exact alias equality on the sessions array (not substring on full JSON)
  • Self-exclusion so the calling node's own row cannot satisfy the precheck even if a future code path makes self-delegation meaningful
  • Whitespace trimming so accidental padding doesn't mask a hit
  • Defensive skip on rows with missing / non-string alias fields

9 unit cases in delegation-precheck.test.ts covering happy path, the #230 self-reflection repro, self-only match, typo alias, empty sessions, missing sessions field, empty target alias, whitespace, and malformed rows.

In this commit tryHandleExplicitDelegation still returns the polite "未找到目标 alias" failure string on miss — same behaviour, correct verdict.

Commit B — 7882f87 Orthogonal improvement (fall-through to LLM)

Precheck miss / TOCTOU send_task appLevel rejection now return null, so processTask falls through to think() for the normal LLM turn instead of short-circuiting with a polite-string failure.

The explicit-delegation parser is a shortcut — when it's wrong, the LLM is the right next handler. Descriptive text gets understood in context rather than rejected outright.

Split into its own commit so the behaviour change can be reverted independently if it surfaces unexpected fall-out — e.g. an LLM wasting a turn when the imperative form was correct but the target really went offline mid-tick.

Tests

agent-node/src/runtime/delegation-precheck.test.ts — 9 cases covering happy path, the self-reflection repro, self-only match, typo alias, empty/missing sessions, empty target, whitespace, malformed rows.

Full agent-node bun test src/: 173 pass / 0 fail (164 + 9 new).

Repro phrasing (redacted)

The triggering pattern is descriptive past-tense in the task body:

"...刚才发给 <Person> 的 <artifact>..."

Greedy capture eats the spaces and Chinese characters, producing a multi-word "alias" that no real session would have. The calling node's own task field reflects the parsed alias back at the substring precheck, the check wrongly passes, the dispatch fires, the server rejects.

<Person> and <artifact> redacted per 通信龙 — not sensitive but exercising discretion. The structural pattern is sufficient to reproduce.

Ship vehicle

Per #230 + the rename-family promote plan: rides in agent-node 2.4.11 Phase B (the next batch after the rename family chain). Not single-shipped as a preview — affected nodes pick this up on their next restart, and there is no deployment value in pushing an interim preview that nobody will install until then.

Refs

…sion

The previous precheck did
  `JSON.stringify(get_all_status).includes(parsed.alias)`
to decide whether the parsed target alias actually exists before
firing a real `send_task`. That substring scan covers every
session's `task` field — including the calling node's own row,
which it has just set to the inbound task body via
`reportStatus("working", task.slice(0, 200))` two lines earlier in
`processTask`.

When the parsed alias is itself a substring of the original task
text (the natural case for a descriptive rather than imperative
trigger phrase, e.g. `"...刚才发给 <某人> 的 <某物>..."`), the
calling node's own `task` field reflects the parsed alias back at
the precheck, the check wrongly passes, and the real `send_task`
fires against a garbage alias. The server correctly rejects with
`alias_not_found`, the post-#168 client classifier surfaces it as
an `appLevel` rejection, and `processTask` returns
`grok 错误: app-level rejection: alias_not_found` as the failed
task result — usually within ~1 s of receipt, before any LLM turn
has a chance to handle the descriptive text properly.

The bug pre-dates the recent hub upgrade window; it just stayed
quiet because no agent-side sender to the affected node had ever
used `发给 X` / `派给 X` / `转给 X` / `交给 X` imperative-like
phrasing in their task body. What changed in the upgrade window
was honest error reporting end-to-end (server-side `ea212fc` +
client-side `5b61d1c` #168), which surfaced the precheck false
positive instead of swallowing it.

Fix — agent-node/src/runtime/delegation-precheck.ts (new pure
module, 9 unit cases):

  delegationTargetExists(sessions, target, self) returns a typed
  result distinguishing:
    - exists: true  (target alias is online and is not us)
    - exists: false, reason: "not_in_sessions"
    - exists: false, reason: "self_only"
    - exists: false, reason: "empty_sessions"
    - exists: false, reason: "no_sessions_field"

  Exact alias equality on the `sessions` array, self-exclusion so
  the calling node's own row cannot satisfy the precheck even if a
  future code path makes self-delegation meaningful, whitespace
  trimming so accidental padding doesn't mask a real hit, and
  defensive skip on rows with missing / non-string alias fields.

agent-node/src/cli.ts wire-up swaps the inline substring check for
the helper call. The polite-string failure behaviour is preserved
in this commit — a follow-up commit on this PR converts the miss
into a `return null` so `processTask` can fall through to the LLM
instead of short-circuiting descriptive-text tasks entirely.

Test coverage in agent-node/src/runtime/delegation-precheck.test.ts:
  - imperative happy path (real other session is found)
  - the #230 self-reflection repro (descriptive-text false positive
    with the calling node's own `task` field containing the parsed
    alias as a substring; old substring check would pass, new
    equality check correctly reports not_in_sessions)
  - self-only match (target alias equals the caller's own alias →
    self_only)
  - typo alias
  - empty sessions / missing sessions field
  - empty target alias defended
  - whitespace trimming
  - sessions with missing / non-string alias fields skipped

Full agent-node bun test src/: 173 pass / 0 fail (164 + 9 new).

Refs:
- #230 (the issue)
- #168 / 5b61d1c (the client classifier whose honest error
  reporting surfaced the latent precheck bug)
- ea212fc (the server-side companion that started returning real
  alias_not_found instead of swallowing)
Orthogonal improvement on top of the alias-equality fix from the
previous commit. With the equality check in place, precheck misses
are now structurally correct ("the parsed alias really isn't an
online session"). But the response was still to short-circuit
`processTask` with a polite string failure, which means a
descriptive task body like
  `"...刚才发给 <某人> 的 <某物>..."`
gets parsed → precheck correctly misses → user gets a polite
"未找到目标 alias" reply, and the LLM never sees the actual task.

That's the wrong default. The parser is a heuristic that should
SHORTCUT the LLM when it's confident, not REPLACE the LLM when
it's wrong. When the precheck says the parsed target isn't there,
the LLM is the right next handler — it can read the description,
understand the user's actual intent, and either decline politely
in context or take a reasonable next step.

Behaviour change on this commit:

  - `tryHandleExplicitDelegation` returns `null` instead of a polite
    failure string when:
      a) the precheck reports the parsed alias isn't online, OR
      b) the precheck passes but the real `send_task` itself comes
         back with an `appLevel` CommHubError (TOCTOU: alias went
         offline between the precheck and the dispatch, or some
         future server rejection we haven't anticipated).
  - On (a) and (b), `processTask` falls through to `think()` →
    normal LLM turn handles the task.
  - On a non-app-level error (transport / 5xx), the throw still
    bubbles up — those are real infrastructure failures, not
    parser-disagreement signals, and the LLM can't do anything
    useful about them.

The `processTask` catch block at cli.ts:1948 still classifies any
throw from `think()` itself as `failed: true`, so the dispatcher
still sees a real error response when the LLM genuinely can't run.

Split into its own commit so the behaviour change can be reverted
independently if it surfaces unexpected fall-out — e.g. an LLM
that wastes a turn on a task whose imperative form was correct
and whose target really is offline. Test coverage for the parser
side stays in delegation-precheck.test.ts (covers the upstream
precheck result types); the wire-up here is a 3-line policy choice
whose correctness is implied by the helper's typed return.

Refs:
- #230 (the issue)
- Companion commit on this PR: alias equality + self-exclusion
@vansin

vansin commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

复核 #232:PASS。干净 base 后 diff 范围正确,3 files / delegation-precheck + test + cli wire,没有 PR-4 污染。

检查点:

  • delegationTargetExists 用 exact alias equality,不再 JSON substring;self-only 和 not-in-sessions 分开,符合 [bug][P1] explicit-delegation precheck self-reflection — own task field pollutes JSON substring match → alias_not_found false failure #230 的诊断价值。
  • tryHandleExplicitDelegation precheck miss 现在 fall through 到 LLM,不再把 descriptive-text parser miss 变成硬失败。
  • send_task TOCTOU / server-side app-level reject 走鸭子 typing (name === "CommHubError" && appLevel === true) fall through;transport/JSON-RPC 这类非 app-level 仍 throw,边界合理。
  • self-exclusion 没有破坏真实其它 session 命中;typed helper 覆盖了 no sessions / empty sessions / whitespace / malformed rows。

验证:

  • bun test src/runtime/delegation-precheck.test.ts src/cli-explicit-delegation.test.ts → 30/0 PASS
  • bun test src/ → 173/0 PASS
  • bun run build → PASS
  • git diff --check → PASS

CI 备注:当前 L0 + L1 (report-only) 红仍是 server/dashboard L1 fixture 的身份校验类失败(send_reply / send_task / cancel_task),和本 PR 只改 agent-node explicit delegation path 的 diff 不在同一路径。我不把它作为 #232 blocker。

结论:同意 merge。

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7882f873c8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread agent-node/src/cli.ts
if (!statusText.includes(parsed.alias)) {
return `未找到目标 alias:${parsed.alias}。已查询 CommHub 在线状态,但列表中没有该精确 alias。`;
const sessions = (status?.sessions ?? null) as Array<{ alias?: string }> | null;
const check = delegationTargetExists(sessions, parsed.alias, currentAlias());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve canonical alias redirects in the precheck

When a node has been renamed, this exact-session check rejects explicit delegations that use the old alias even though the server-side send_task path still supports them by calling resolveCanonicalAlias before delivery (server/src/tools.ts:729, and tests/qa-hub-17-rename-canonicalization/run.sh:114 asserts MCP send_task old→new redirects). In that context get_all_status only contains the new session alias, so 给 old-agent 发任务... now falls out of the deterministic delegation wrapper instead of dispatching and waiting for the child task.

Useful? React with 👍 / 👎.

Comment thread agent-node/src/cli.ts
Comment on lines +1907 to +1908
if (e?.name === "CommHubError" && e?.appLevel === true) {
debug(`[explicit-delegation] send_task rejected server-side (${e.message}); falling through to LLM`);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Do not fall through after queuing offline tasks

If the target alias exists but is offline, the precheck passes because get_all_status returns offline sessions, and the server's send_task writes the inbox/tasks row before returning ok:false, error:"alias_offline", queued:true, task_id... (covered by tests/qa-hub-18-delivery-semantics/run.sh:104). classifyCommHubResponse turns that into an app-level CommHubError, so this catch returns null after a child task has already been queued; the original task then goes to the LLM with no knowledge of the queued task_id, which can cause duplicate dispatches or a misleading response instead of reporting the queued child task.

Useful? React with 👍 / 👎.

@s2agi s2agi merged commit c4a5f47 into main Jun 11, 2026
0 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug][P1] explicit-delegation precheck self-reflection — own task field pollutes JSON substring match → alias_not_found false failure

3 participants