fix(#230): explicit-delegation precheck — alias equality + self-exclusion + LLM fall-through by vansin · Pull Request #231 · sleep2agi/agent-network

vansin · 2026-06-11T23:26:07Z

Author

Agent: 通信SDK马
Reviewer: 通信牛 (mutual reviewer pair; PR-4 / #228 just closed the loop, this PR continues the alternation)

Summary

Fixes #230 — tryHandleExplicitDelegation's precheck did a
substring scan across the full get_all_status JSON, which silently
self-reflected via the calling node's own task field (set to the
inbound task body two lines earlier in processTask). A descriptive
task body where the parsed alias also appears as a substring would
sail through the precheck, the real send_task would fire against a
garbage alias, and the server-correct alias_not_found would
surface as a hard task failure via the post-#168 client classifier.

The bug pre-dates the recent hub upgrade window — what changed in the
upgrade window was honest error reporting end-to-end
(server-side ea212fc + client-side 5b61d1c #168). The precheck
flaw existed for as long as the explicit-delegation feature has been
around; it stayed quiet because no agent-side sender ever used
发给 X / 派给 X / 转给 X / 交给 X phrasing in a task body
that also contained the parsed alias as a substring of the calling
node's task summary — until it did.

Two commits, intentionally split

Commit A — `292364c` Minimum fix (alias equality + self-exclusion)

Pure helper in agent-node/src/runtime/delegation-precheck.ts:

delegationTargetExists(sessions, target, self) →
  { exists: true, source: "session" }
  | { exists: false, reason: "not_in_sessions" | "self_only" | "empty_sessions" | "no_sessions_field" }

Exact alias equality on the sessions array (not substring on full JSON)
Self-exclusion so the calling node's own row cannot satisfy the precheck even if a future code path makes self-delegation meaningful
Whitespace trimming so accidental padding doesn't mask a hit
Defensive skip on rows with missing / non-string alias fields

9 unit cases in delegation-precheck.test.ts covering happy path, the self-reflection repro, self-only match, typo, empty sessions, missing sessions field, empty target alias, whitespace, and malformed rows.

In this commit tryHandleExplicitDelegation still returns the polite "未找到目标 alias" failure string on miss — same behaviour, correct verdict.

Commit B — `f9bd8a3` Orthogonal improvement (fall-through to LLM)

Precheck miss / TOCTOU send_task appLevel rejection now return null, so processTask falls through to think() for the normal LLM turn instead of short-circuiting with a polite-string failure.

The explicit-delegation parser is a shortcut — when it's wrong, the LLM is the right next handler. Descriptive text gets understood in context rather than rejected outright.

Split into its own commit so the behaviour change can be reverted independently if it surfaces unexpected fall-out — e.g. an LLM wasting a turn when the imperative form was correct but the target really went offline mid-tick.

Tests

agent-node/src/runtime/delegation-precheck.test.ts — 9 cases as listed above.

Full agent-node bun test src/: 173 pass / 0 fail (164 + 9 new).

The wire-up in cli.ts is a 3-line policy swap whose correctness is structurally implied by the helper's typed return — no separate cli.ts integration test is added.

Repro phrasing (redacted)

The triggering pattern is descriptive past-tense in the task body:

"...刚才发给 <Person> 的 <artifact>..."

Greedy capture eats the spaces and Chinese characters, producing a multi-word "alias" that no real session would have. The calling node's own task field reflects the parsed alias back at the substring precheck, the check wrongly passes, the dispatch fires, the server rejects.

<Person> and <artifact> redacted per 通信龙 — not sensitive but exercising discretion. The structural pattern is sufficient to reproduce.

Ship vehicle

Per #230 + the rename-family promote plan: rides in agent-node 2.4.11 Phase B (the next batch after the rename family chain). Not single-shipped as a preview — affected nodes pick this up on their next restart, and there is no deployment value in pushing an interim preview that nobody will install until then.

Affected node operator note (for 通信龙 to relay through Vincent UAT): when restarting nodes to pick up the 2.4.11 bundle, the new precheck takes effect immediately, and any task body that previously tripped the false positive (descriptive 发给/派给/转给/交给 X phrasing) will now be handled by the LLM instead of bouncing back as alias_not_found.

Refs

[bug][P1] explicit-delegation precheck self-reflection — own task field pollutes JSON substring match → alias_not_found false failure #230 — the issue this PR closes
#168 / 5b61d1c — the client classifier whose honest error reporting surfaced the latent precheck bug
ea212fc — the server-side companion that started returning real alias_not_found instead of swallowing

…tion Rename family unifying-fix runtime lane (per #146 verdict). PR-1+2 (commit 2dc166c) gave us the server-side handles — tasks.from_node_id / inbox.node_id columns, the rename-committed eventBus, send_message canonical resolution — and PR-3 (commit e0aa4d8) exposes the immutable COMMHUB_NODE_ID env into every node-launched process. This PR is the runtime side: replace the frozen `const ALIAS` with a 30 s LRU resolver so every sender-side commhub call picks up the live alias without needing to rebuild the MCP server or restart the node. ROOT CAUSE this PR addresses `agent-node/src/cli.ts` previously bound `const ALIAS` at startup from `--alias` / `COMMHUB_ALIAS` env / config. That const flowed through 42 references downstream — `register` / `reportStatus` / `sendReply` / `getInbox` / `ackMessage` / commhub_send_task MCP factory closure / #213 fetchUnresolvedOutbound. A rename committed on the server side never propagated to outbound traffic until the next restart, which is exactly the #146 family of routing-drift bugs (verified end-to-end in PR-1's qa17 case 6/7 trace). WHAT SHIPS A. agent-node/src/runtime/current-alias.ts (NEW, ~150 LOC pure module). `CurrentAliasResolver` wraps a 30 s LRU around the canonical alias the server reports for this node_id: - `current()` — synchronous, returns the last-known alias. Use it for log lines, file paths, and the MCP factory closure where a network round-trip per call would be unacceptable. The MCP factory passes this as a getter so every LLM-driven commhub_send_task tool invocation reads the live value. - `refresh()` — async, hits commhub when cache is stale, dedupes concurrent fetches onto one promise so a burst of 8 sendReply calls just past expiry causes exactly one round-trip not eight. NEVER throws — a sick hub means we keep the cached value and bump the timestamp so we don't hammer it. Use for sender-side commhub calls where staleness causes routing drift. - `set()` — force-install when a server response (e.g. report_status) authoritatively confirms the canonical alias. Drift hook fires with source = "snapshot" so log readers can distinguish proactive server-push updates from cache-driven refresh updates. - `cachedAt = 0` sentinel for "never fetched" so the first refresh() actually hits the server even when called at logical time 0. B. agent-node/src/cli.ts wire-up. - NODE_ID resolution gains a `process.env.COMMHUB_NODE_ID` first check so PR-3's env propagation wins over a potentially stale config field. - Single module-level `aliasResolver` instance. Fetch hook uses `GET /api/status?network_id=...` with a 2.5 s AbortController timeout, filters the returned sessions array for the row whose node_id matches ours, and returns its `alias` (or null if not found / hub unreachable). - `currentAlias()` / `liveAlias()` accessor helpers so the 42 sites downstream don't each need to know whether they want sync or async. - Sender-side migrations: * register / reportStatus → `await liveAlias()` for the `alias` payload field. * register also calls `aliasResolver.set(result.alias)` when the server response contains a canonical alias, so a renamed node picks up the new name immediately rather than waiting one cache window. * getInbox / ackMessage → `await liveAlias()`. * sendReply (cli.ts:677) → `from_session: await liveAlias()`. * explicit-delegation wrapper send_task (cli.ts:1738) → same. * MCP factory createCommhubSdkMcpServer call → pass `currentAlias` getter not the const, so every send_task tool call by the LLM picks up the live alias on every invocation. - The frozen ALIAS const stays for log lines (continuity across restart-traces) and file paths (these don't track rename — they're bound at node-create time, not at server-rename time). - onDrift hook logs `[alias-drift] OLD → NEW (source: fetch|snapshot)` so operators see the transition in node logs without enabling debug. C. agent-node/src/commhub-mcp.ts. `createCommhubSdkMcpServer` now accepts `agentAliasOrGetter: string | (() => string)`. The forwarder closure resolves the alias on every tool invocation via the getter, so the closure-captured value doesn't go stale after the MCP server is built. Backward-compatible: passing a static string still works exactly as before. D. agent-node/src/runtime/grok-build-acp/resume-hint.ts migration. `ListTasksHook` params type gains `from_node_id?: string`. `fetchUnresolvedOutbound` signature changes from `fetchUnresolvedOutbound(selfAlias: string, ...)` to `fetchUnresolvedOutbound({ nodeId, alias }, ...)`. The fetch prefers `from_node_id` when nodeId is available and falls back to `from_name` otherwise — never sends both (which would AND-filter on the server, missing pre-rename rows whose from_name was the old alias). cli.ts call site updated to pass `{ nodeId: NODE_ID || null, alias: currentAlias() }`. TESTS E. agent-node/src/runtime/current-alias.test.ts (NEW, 15 cases). Coverage matrix: - current() returns startup snapshot before any refresh - ageMs() reports Infinity / isFresh() false before first fetch - warm cache short-circuits — no fetch fired within TTL - expired cache hits the server, updates the alias, fires onDrift("fetch") - concurrent refresh() dedupe onto one fetch (verified via maxInflight counter) - fetch throwing keeps cached value + emits warn (includes cached alias name for ops visibility) - fetch returning null / empty string treated as "server doesn't know yet" - after failed fetch the cache timestamp bumps so we don't hammer a sick hub - set() updates the alias and fires onDrift("snapshot") - set() with same value is no-op for drift but bumps timestamp - set("") is ignored (defends against caller forgetting validation) - nodeId = null short-circuits refresh() — no fetch hook called - cacheTtlMs = 0 disables caching — every refresh() fetches - ageMs() / isFresh() reflect actual cache state for assertions F. agent-node/src/runtime/grok-build-acp/resume-hint.test.ts. Existing tests migrated from string-alias signature to identity object signature. Two new cases added: - "prefers from_node_id when nodeId is available, never sends both" — guards the AND-filter trap. - "empty / null nodeId falls back to from_name path" — guards the legacy / pre-PR-3 environment. Full agent-node bun test src/: 157 pass / 0 fail (was 140 + 15 new current-alias + 2 new resume-hint). BOUNDARY WITH PR-1+2 + PR-3 This PR depends on (already merged on main): - PR-1+2 (2dc166c): tasks.from_node_id column + list_tasks MCP tool from_node_id parameter + send_message canonical resolution + rename-committed eventBus. Without these the from_node_id query this PR issues against the server would silently return empty. - PR-3 (e0aa4d8): COMMHUB_NODE_ID env propagation from launcher into every node-launched process. Without this, the NODE_ID source would be limited to the node's config.json which can go stale. Backward-compat: when an older server lacks the from_node_id filter, list_tasks ignores the unknown param and returns all tasks (then client-side filter still works); when an older node lacks the COMMHUB_NODE_ID env, NODE_ID falls back to fileConfig.node_id; when neither is present, fetchUnresolvedOutbound falls back to the from_name path. Behaviour degrades gracefully one layer at a time. REVIEWER 通信牛 (mutual reviewer pair per #146 staffing). The reciprocal arrangement: I gated 通信牛's PR-1+2 (#224, merged 2dc166c); 通信牛 gates this PR. REFS - #146 verdict (this PR's spec): #146 (comment) - PR-1+2 merge commit: 2dc166c - PR-3 merge commit: e0aa4d8 - prior runtime fixes in the same area: 5b61d1c (#168 reply reliability), d206fdd (#213 resume hint), 21b0aa2 (#214 dim 5 A6 startup log)

…ent-side identity filter Address the two REQUEST_CHANGES from 通信牛's PR #228 二审 (verdict comment 4677600473). Both substantive, both fixed in this iteration. Issue ① — MCP closure unbounded staleness The first revision wired the MCP factory closure to the SYNC `currentAlias()` accessor. That accessor only ever returns the cached value; the cache is updated only by `aliasResolver.set()` (at register) or by an explicit `aliasResolver.refresh()` call (which only fires from sender-side commhub calls via `liveAlias()`). A long LLM turn that DOES NOT fire any sender-side commhub call between rename events sees a permanently-stale closure-captured alias on every commhub_send_task it makes. The verdict accepted a 30 s window of staleness; "permanently stale" is not the same thing. Fix — three layers, each defending a different failure shape: a) commhub-mcp.ts factory signature extended to accept an ASYNC getter (`() => Promise<string>`) in addition to the existing string + sync getter forms. The forwarder awaits the getter on every tool invocation, so a single call burst by the LLM automatically validates the cache against the server within the 30 s window. Cache hit is microseconds; cache miss is one round-trip with a 2.5 s budget. Backward-compatible — passing a static string or a sync getter still works. b) cli.ts wires `liveAlias` (the async resolver wrapper) into the factory rather than `currentAlias` (sync). Every LLM-driven send_task / send_message tool call now revalidates. c) cli.ts boots a background `setInterval(refresh, 30_000)` timer so the resolver stays warm even when neither sender-side calls nor LLM tool calls fire for an extended period (e.g. an idle node waiting for inbox). `.unref()` so the timer doesn't keep the process alive on its own. Cheap — one HTTP round-trip every 30 s. Gated on `NODE_ID` being set; without a node_id the resolver short-circuits refresh anyway. Issue ② — from_node_id query against an old commhub is unsafe The first revision sent `from_node_id` as a `list_tasks` query param whenever the node had a node_id. Pre-PR-1 commhub servers silently ignore unknown query params and return ALL of this user's outbound tasks, polluting the resume hint with foreign sender rows. Without a guard, a stable v0.10.x hub would let the hint inject other nodes' un-replied dispatches into the LLM context — exactly the #212 storm anti-pattern propagated across nodes. Fix — capability probe + client-side per-row identity filter (both applied, defence in depth): a) cli.ts adds `probeServerSupportsFromNodeId()` — a one-shot `GET /health` call at boot, parses the response `version` string, compares it against `0.8.6-preview.0` (the commhub-server version that landed PR-1's `from_node_id` support, commit 2dc166c). Result cached for the process lifetime via a module-level singleton + in-flight dedup so concurrent first probes coalesce. The version comparator handles `MAJOR.MINOR.PATCH` and `MAJOR.MINOR.PATCH-preview.N` shapes. Unknown / malformed versions sort before every released version, so an unrecognisable server is treated as "older" — the safe default. b) `fetchUnresolvedOutbound` opts gain `serverSupportsFromNodeId`. When `true` AND the caller has a node_id, send `from_node_id`. Otherwise send `from_name` (the pre-PR-1 path). Never send both. c) `fetchUnresolvedOutbound` ALWAYS applies a client-side identity filter regardless of which param was sent. Each returned row is checked: prefer `row.from_node_id === selfIdentity.nodeId` when the row has a node_id; otherwise demand `row.from_name === selfIdentity.alias`. Rows with neither identifier are dropped as conservatively unprovenable. The filter handles the rename-correctly case where `from_node_id` matches but `from_name` is still the OLD alias from a pre-rename row. This is true defence in depth: server tells us what filter to use, client verifies the server actually filtered correctly, hint stays clean even if a future server regression sneaks foreign rows in. TESTS agent-node/src/runtime/grok-build-acp/resume-hint.test.ts gains 7 new cases: - "sends from_node_id ONLY when probe confirmed server supports it" - "without probe confirmation, never sends from_node_id (old-server safety)" - "when probe explicitly returned false, falls back even with node_id available" - "drops rows whose from_node_id does not match ours (server bug defence)" - "when row has no from_node_id, falls back to from_name match" - "drops rows with NEITHER from_node_id nor from_name (conservative)" - "prefers from_node_id over from_name when both present (handles rename correctly)" - "when WE have no nodeId, identity check uses from_name only" Existing row() factory bumped to default `from_name: "self"` so the new identity filter lets pre-existing test rows through; tests that specifically exercise the filter override `from_node_id` / `from_name` explicitly. Full agent-node bun test src/: 164 pass / 0 fail (was 157 + 7 new). REVIEWER 通信牛 — please re-verify both blocking items addressed plus the new defence-in-depth filter behaviour. Mutual reviewer pair continues; I gated your #224, you gate this one. Re-review turnaround whenever you have a slot, no urgency. REFS - 通信牛二审 verdict: #228 (comment) - 通信龙 relay + accept both blocking items as substantive: dispatch d4995e5b - Prior PR-4 commit: 65bd54a (squashed in this branch) - PR-1+2 server merge: 2dc166c (provides /health version + from_node_id filter support starting at commhub-server 0.8.6-preview.0) - PR-3 CLI merge: e0aa4d8 (COMMHUB_NODE_ID env propagation)

- tests/docker-config-priority.sh +1: `mkdir -p /root/.anet` before first `echo > /root/.anet/config.json` write. Pre-fix: `set -e` exits on permission error, Config Priority suite never reached Results line. - tests/test-all.sh +11: sanity guard treats `0 passed + 0 failed` as CRASHED instead of silent green (issue #65 author recommendation to prevent future silent-skip rot). Verified in Docker (anet-tests:65 build): - Pre-fix: suite crashes at line 22 (`echo > config.json` fails), test-all shows `✅ Config Priority: 0 passed, 0 failed` (silent). - Post-fix: suite reaches Step 1 ("Global hub fallback..."), then crashes on V3 auth (`anet node create` requires login) — this is #63's territory, properly out of scope here. test-all now shows `⚠️ Config Priority (16): SKIPPED/CRASHED (0 ran — ...)` and adds 1 to TOTAL_FAIL. Evidence: docs/tests/p-65-config-priority/{test-config-run.log, test-all-run.log} 红线: Docker only, host hub 未碰, /tmp/cfg-test workdir 即清。 Author-Agent: 通信测试马 Refs: #65, depends-on #63 for full suite green

V3 auth (RFC-001) rejects anonymous + master-token writes. Pre-fix §4 had: - mcp_call() anonymous (no Authorization header) → send_task 401 - Query GETs anonymous → empty network-scoped responses Fix (tests/docker-e2e-networks.sh): - mcp_call() takes user token as $3, injects `Authorization: Bearer <tok>` on both initialize and tools/call legs - §4 send_task calls pass matching owner token (TOKEN_A for alpha, TOKEN_B for beta) - §4 query GETs `?network_id=...` add owner-matching Bearer header - §5 stats GET adds owner Bearer header Docker verify (anet-tests:64): - Pre-fix: ~4 fail in §4/§5 (mcp_call anon + query anon) - Post-fix: 20 passed, 2 failed - 2 remaining fails ("alpha/beta task missing") are deeper isolation semantic (send_task body's network_id may not bind tasks correctly to non-default networks when from_session="tester" has no node in that network) — beyond #64 fixture-auth scope, may be product-side or test redesign. Documented in evidence log. Evidence: docs/tests/p-64-networks/{test-networks-run.log, fixed-script.sh} 红线: Docker only, host hub 未碰, /tmp workdir 即清。 Author-Agent: 通信测试马 Refs: #64 (partial close — 2/4 mcp_call auth fails fixed cleanly; 2 isolation semantic fails deferred), depends-on #63 for any remaining anon REST sites

…sion The previous precheck did `JSON.stringify(get_all_status).includes(parsed.alias)` to decide whether the parsed target alias actually exists before firing a real `send_task`. That substring scan covers every session's `task` field — including the calling node's own row, which it has just set to the inbound task body via `reportStatus("working", task.slice(0, 200))` two lines earlier in `processTask`. When the parsed alias is itself a substring of the original task text (the natural case for a descriptive rather than imperative trigger phrase, e.g. `"...刚才发给 <某人> 的 <某物>..."`), the calling node's own `task` field reflects the parsed alias back at the precheck, the check wrongly passes, and the real `send_task` fires against a garbage alias. The server correctly rejects with `alias_not_found`, the post-#168 client classifier surfaces it as an `appLevel` rejection, and `processTask` returns `grok 错误: app-level rejection: alias_not_found` as the failed task result — usually within ~1 s of receipt, before any LLM turn has a chance to handle the descriptive text properly. The bug pre-dates the recent hub upgrade window; it just stayed quiet because no agent-side sender to the affected node had ever used `发给 X` / `派给 X` / `转给 X` / `交给 X` imperative-like phrasing in their task body. What changed in the upgrade window was honest error reporting end-to-end (server-side `ea212fc` + client-side `5b61d1c` #168), which surfaced the precheck false positive instead of swallowing it. Fix — agent-node/src/runtime/delegation-precheck.ts (new pure module, 9 unit cases): delegationTargetExists(sessions, target, self) returns a typed result distinguishing: - exists: true (target alias is online and is not us) - exists: false, reason: "not_in_sessions" - exists: false, reason: "self_only" - exists: false, reason: "empty_sessions" - exists: false, reason: "no_sessions_field" Exact alias equality on the `sessions` array, self-exclusion so the calling node's own row cannot satisfy the precheck even if a future code path makes self-delegation meaningful, whitespace trimming so accidental padding doesn't mask a real hit, and defensive skip on rows with missing / non-string alias fields. agent-node/src/cli.ts wire-up swaps the inline substring check for the helper call. The polite-string failure behaviour is preserved in this commit — a follow-up commit on this PR converts the miss into a `return null` so `processTask` can fall through to the LLM instead of short-circuiting descriptive-text tasks entirely. Test coverage in agent-node/src/runtime/delegation-precheck.test.ts: - imperative happy path (real other session is found) - the #230 self-reflection repro (descriptive-text false positive with the calling node's own `task` field containing the parsed alias as a substring; old substring check would pass, new equality check correctly reports not_in_sessions) - self-only match (target alias equals the caller's own alias → self_only) - typo alias - empty sessions / missing sessions field - empty target alias defended - whitespace trimming - sessions with missing / non-string alias fields skipped Full agent-node bun test src/: 173 pass / 0 fail (164 + 9 new). Refs: - #230 (the issue) - #168 / 5b61d1c (the client classifier whose honest error reporting surfaced the latent precheck bug) - ea212fc (the server-side companion that started returning real alias_not_found instead of swallowing)

Orthogonal improvement on top of the alias-equality fix from the previous commit. With the equality check in place, precheck misses are now structurally correct ("the parsed alias really isn't an online session"). But the response was still to short-circuit `processTask` with a polite string failure, which means a descriptive task body like `"...刚才发给 <某人> 的 <某物>..."` gets parsed → precheck correctly misses → user gets a polite "未找到目标 alias" reply, and the LLM never sees the actual task. That's the wrong default. The parser is a heuristic that should SHORTCUT the LLM when it's confident, not REPLACE the LLM when it's wrong. When the precheck says the parsed target isn't there, the LLM is the right next handler — it can read the description, understand the user's actual intent, and either decline politely in context or take a reasonable next step. Behaviour change on this commit: - `tryHandleExplicitDelegation` returns `null` instead of a polite failure string when: a) the precheck reports the parsed alias isn't online, OR b) the precheck passes but the real `send_task` itself comes back with an `appLevel` CommHubError (TOCTOU: alias went offline between the precheck and the dispatch, or some future server rejection we haven't anticipated). - On (a) and (b), `processTask` falls through to `think()` → normal LLM turn handles the task. - On a non-app-level error (transport / 5xx), the throw still bubbles up — those are real infrastructure failures, not parser-disagreement signals, and the LLM can't do anything useful about them. The `processTask` catch block at cli.ts:1948 still classifies any throw from `think()` itself as `failed: true`, so the dispatcher still sees a real error response when the LLM genuinely can't run. Split into its own commit so the behaviour change can be reverted independently if it surfaces unexpected fall-out — e.g. an LLM that wastes a turn on a task whose imperative form was correct and whose target really is offline. Test coverage for the parser side stays in delegation-precheck.test.ts (covers the upstream precheck result types); the wire-up here is a 3-line policy choice whose correctness is implied by the helper's typed return. Refs: - #230 (the issue) - Companion commit on this PR: alias equality + self-exclusion

vansin · 2026-06-11T23:29:10Z

Closing in favour of #232 — clean branch rebased on origin/main after 通信龙 catch (the original branch was rooted on fix/146-pr4-alias-getter, and PR-4 squash-merged as 2e9fae0 meant the squash dropped the original commits from main's history, causing GitHub to surface PR-4's full diff in this review face). Same two commits cherry-picked clean.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f9bd8a306c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-11T23:32:59Z

+// every launched node, fall back to the config field for back-compat with
+// nodes started before PR-3 landed. The env wins because the launcher
+// always knows the canonical node id; a stale config file would mislead.
+const NODE_ID = process.env.COMMHUB_NODE_ID || fileConfig.node_id || "";


Keep config node_id ahead of COMMHUB_NODE_ID

When a user starts agent-node with an explicit config while their shell still has COMMHUB_NODE_ID exported from another node, this line makes the process adopt the stale env identity instead of fileConfig.node_id. That changes RESUME_ID, report_status.node_id, and the alias resolver lookup to the wrong node, so the agent can overwrite another session or start sending/acking under the other node's canonical alias. The launcher path you added does set the env correctly, but manual/debug starts and inherited environments still need the loaded config to win when it contains a node_id.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-11T23:32:59Z

+  const alias = await liveAlias();
+  return (await callCommHub("get_inbox", { alias, limit: 20 }))?.messages || [];


Poll old-alias inbox rows after rename

If a task was already queued before a rename and not yet acked, switching get_inbox to only the live canonical alias strands that message: commitRename explicitly keeps inbox history under the old alias, and get_inbox still filters only inbox.session_name = alias rather than node_id. In that rename-with-pending-inbox scenario the node stops seeing the old row as soon as the resolver refreshes, so the task is never processed unless another path migrates or queries by node_id/old alias.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-11T23:32:59Z

+          return false;
+        }
+        const body = (await res.json()) as { version?: string };
+        const supported = compareCommhubVersion(body.version, "0.8.6-preview.0") >= 0;


Don't disable node-id resume hints on current hubs

The capability probe currently treats this repo's own server as unsupported: /health returns SERVER_VERSION from server/package.json, which is still 0.8.5, while list_tasks already accepts from_node_id. Against that hub this branch always falls back to from_name, so after a node rename the Grok resume hint misses pre-rename outbound tasks and the duplicate-dispatch guardrail it was meant to provide does not run.

Useful? React with 👍 / 👎.

SmartFlowAITeam added 6 commits June 11, 2026 13:48

vansin closed this Jun 11, 2026

vansin mentioned this pull request Jun 11, 2026

fix(#230): explicit-delegation precheck — alias equality + self-exclusion + LLM fall-through #232

Merged

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(#230): explicit-delegation precheck — alias equality + self-exclusion + LLM fall-through#231

fix(#230): explicit-delegation precheck — alias equality + self-exclusion + LLM fall-through#231
vansin wants to merge 6 commits into
mainfrom
fix/230-explicit-delegation-self-reflection

vansin commented Jun 11, 2026

Uh oh!

vansin commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		const alias = await liveAlias();
		return (await callCommHub("get_inbox", { alias, limit: 20 }))?.messages \|\| [];

Conversation

vansin commented Jun 11, 2026

Author

Summary

Two commits, intentionally split

Commit A — 292364c Minimum fix (alias equality + self-exclusion)

Commit B — f9bd8a3 Orthogonal improvement (fall-through to LLM)

Tests

Repro phrasing (redacted)

Ship vehicle

Refs

Uh oh!

vansin commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Commit A — `292364c` Minimum fix (alias equality + self-exclusion)

Commit B — `f9bd8a3` Orthogonal improvement (fall-through to LLM)