From 3da4b95e03c1baaec79b4e3b7ac94480017a3ca5 Mon Sep 17 00:00:00 2001 From: screenleon Date: Thu, 2 Jul 2026 11:17:19 +0900 Subject: [PATCH 1/5] feat(CC-434): extract shared detached-launch lib for gate/dispatch supervisors CC-433 spike judged the sentinel/setsid mechanics in dispatch-supervisor.sh and gate-supervisor.sh as byte-identical and safely extractable; this lands scripts/lib/detached-launch.sh with the 8 shared functions, migrates both supervisors and their pmctl wait/run_detached counterparts to use it, and adds a fixture test guarding the two inline REPO_ROOT resolution blocks (kept inline due to a bootstrap circular dependency) against drift. Also records the poll->notify IPC evaluation as CC-435: judged not worth the added complexity/risk given the current single-waiter usage pattern, tracked as a conditionally-triggered someday ticket rather than a committed follow-up. Co-Authored-By: Claude Sonnet 5 --- BACKLOG.md | 57 +++++- cli/pmctl | 2 +- docs/spikes/CC-433.md | 251 +++++++++++++++++++++++++ scripts/dispatch-supervisor.sh | 16 +- scripts/gate-supervisor.sh | 22 ++- scripts/lib/detached-launch.sh | 156 ++++++++++++++++ scripts/lib/pmctl-dispatch.sh | 130 ++++++------- scripts/lib/pmctl-gate.sh | 91 ++++----- scripts/run-all-tests.sh | 2 + scripts/test-detached-launch.sh | 291 +++++++++++++++++++++++++++++ scripts/test-dispatch-lifecycle.sh | 5 + scripts/test-gate-lifecycle.sh | 2 +- scripts/test-pmctl-gate.sh | 3 +- scripts/test-run-all-tests.sh | 2 + 14 files changed, 885 insertions(+), 145 deletions(-) create mode 100644 docs/spikes/CC-433.md create mode 100644 scripts/lib/detached-launch.sh create mode 100755 scripts/test-detached-launch.sh diff --git a/BACKLOG.md b/BACKLOG.md index 7854afd7..10631f2a 100644 --- a/BACKLOG.md +++ b/BACKLOG.md @@ -76,7 +76,9 @@ CC-001/CC-002 were consumed by PR #24 fix bundle inline, with no standalone entr | CC-425 | ✅ closed 2026-07-02 | `pr-gate.sh --head ` added; diffs a fixed base..head ref pair (branch/tag/commit) without involving 
a PR or the working tree; existing `--base` support already makes the PR optional. Mutually exclusive with `--allow-dirty` (explicitly rejected). | ops/gate | 2026-06-25 | pr:#355 | P3 | — | | CC-431 | 🟢 someday | **[test-e2e.sh + release-verify.sh: opencode adapter support]** `--adapter` currently accepts only `claude\|codex\|auto`; the e2e verification path was not updated when opencode was added in v0.6.0. Needed: (1) add opencode to the adapter validation list of both scripts; (2) Phase B dispatch support for opencode; (3) Phase C pr-gate smoke: evaluate whether the opencode executor can be used (currently hard-coded to codex). Trigger: release-verify --e2e --adapter opencode is rejected (exit 2). | ops/test | 2026-06-30 | — | P3 | — | | CC-432 | ✅ closed 2026-07-02 | test-release-verify.sh: the 12 duplicate `--no-suite` calls now share a cache (`rv_no_suite_once`), 380s → ~127s; direction A (fake-repo isolation) and narrowing the serialization coupling were both evaluated and shelved (risk outweighs benefit) | ops/test | 2026-07-01 | pr:#354 | P2 | design | -| CC-433 | 🟢 someday | **[detached lifecycle: extract shared sentinel lib + switch wait to active notification]** (1) `scripts/dispatch-supervisor.sh` and `scripts/gate-supervisor.sh` share the same structure for the setsid/nohup launch + nonce-authenticated sentinel write, but each rewrites it; it should be extracted into a shared lib, leaving each side only its own business logic (preflight+adapter vs. direct exec of pr-gate.sh); (2) `pmctl dispatch wait`/`pmctl gate wait` currently poll the sentinel file with `sleep \$POLL_INTERVAL`; they should switch to active notification (e.g. a blocking read on a FIFO, or inotify), so the supervisor wakes the waiter on completion instead of the waiter waking every N seconds to check. Solution not settled; converge the design first via `/pre-impl` or `/spike`. | arch/gate | 2026-07-01 | — | P3 | design | +| CC-433 | ✅ closed 2026-07-02 | detached lifecycle spike: `docs/spikes/CC-433.md`; shared lib extraction GREEN (adopt, opens CC-434); poll→notify migration AMBER (mkfifo technically feasible but multi-waiter data corruption unresolved; keep polling) | arch/gate | 2026-07-01 | — | P3 | spike | +| CC-434 | 🔵 active | **[detached lifecycle: extract shared sentinel lib scripts/lib/detached-launch.sh]** Implements the CC-433 spike conclusion: `dispatch-supervisor.sh`/`gate-supervisor.sh` and `pmctl_dispatch_wait`/`pmctl_gate_wait` switch to shared functions for nonce generation, key-dir hardening and sentinel write/poll; `resolve_repo_root` stays inline because of a bootstrap circular dependency, with a fixture test guarding the two inline blocks against drift; dispatch-side security preflight (native-arg smuggling guard/adapter/brief/guard checks) is left untouched. | arch/gate | 2026-07-02 | — | P2 | — | +| CC-435 | 🟢 someday | **[poll→notify single-waiter guard: conditionally triggered, not a committed follow-up]** To be raised only when a scenario actually arises in which multiple waiters must wait on the same run_id/gate_id 
at once; candidate designs are in `docs/spikes/CC-433.md` Open risks (Plan A: `flock` lock, losers fall back to polling; Plan B: a dedicated fifo per waiter plus supervisor broadcast). Cost/benefit was re-assessed after CC-434. For a single waiter waiting minutes, polling and blocking read differ negligibly in resource use. The latency gain (≤2s → near-instant) is imperceptible to a human waiting on a gate result. Both plans, however, would introduce new race conditions into security-sensitive supervisor files. Return on investment is currently insufficient, so this is not scheduled for implementation; the design is recorded only as a starting point for when the trigger condition holds. | arch/gate | 2026-07-02 | — | P3 | design | --- @@ -1217,7 +1219,9 @@ Fix: document the `GOPATH=/tmp/gopath go build` convention in brief self_verify go bui **area**: ops/test **Priority**: P2 — does not block CC-423, but slows day-to-day development iteration; queued for priority analysis and planning in the next PR. -## CC-433 — detached lifecycle: extract shared sentinel lib + switch wait to active notification 🟢 someday +## CC-433 — detached lifecycle: extract shared sentinel lib + switch wait to active notification ✅ 2026-07-02 + +**See**: docs/spikes/CC-433.md **Problem**: When CC-423 (gate detached lifecycle) was implemented, `scripts/gate-supervisor.sh` was written by copying the setsid/nohup launch + nonce-authenticated sentinel write pattern straight from `scripts/dispatch-supervisor.sh`. The two files are structurally identical in the "launch detached process + write sentinel" part (the shape of `_write_sentinel`/`_die`, the `/tmp/pm-*-sentinel--` naming, the per-user mode-700 key directory), yet each rewrites it and no shared lib was extracted. @@ -1232,8 +1236,55 @@ Fix: document the `GOPATH=/tmp/gopath go build` convention in brief self_verify go bui - Polling → active notification: evaluate viable mechanisms, e.g. (a) named pipe/FIFO: the supervisor writes to the FIFO on completion and wait wakes on a blocking read instead of a `sleep` loop; (b) `inotifywait` (if the tool can be installed reliably on the target platforms) watching for the sentinel file's creation; (c) other IPC mechanisms. Cross-platform compatibility (especially CI/macOS/WSL2) must be evaluated, along with whether the existing fail-closed/timeout/indeterminate (exit 3) semantics are affected. - Both changes touch security-sensitive supervisor files (the dispatch side in particular has full preflight defenses); plan test coverage carefully so that sharing the code does not accidentally weaken the dispatch security boundary. +**Investigation scope**: +1. Shared lib boundary: compare the setsid/nohup launch + sentinel write/nonce-key-file management logic of `scripts/dispatch-supervisor.sh` and `scripts/gate-supervisor.sh`. Delimit the shared functions that can move to `scripts/lib/detached-launch.sh`, and the logic each side must keep (dispatch's adapter/guard preflight + `pmctl_dispatch_execute_tail`; gate's direct exec of pr-gate.sh + result completeness check). Confirm that the extraction does not weaken the existing dispatch-side security preflight. +2.
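Poll→notify mechanism: evaluate the feasibility and compatibility of FIFO blocking read / `inotifywait` / other IPC on the three target platforms (CI, macOS, WSL2), and confirm that under the chosen mechanism the existing fail-closed/timeout/indeterminate (exit 3) semantics of `pmctl dispatch wait`/`pmctl gate wait` stay unchanged. + +**Done-when**: `docs/spikes/CC-433.md` gives an explicit recommendation for each of the two items above, with a GREEN/AMBER/RED feasibility verdict for each. It includes a pilot walkthrough of at least one real call site; migrating the gate side first is recommended because its blast radius is smaller. Follow-up implementation tickets are written from that result. + +**Result log**: Done; see `docs/spikes/CC-433.md` (2026-07-02). Verdicts: + +- Shared lib extraction: **GREEN**. 8 of the 9 proposed functions extract cleanly into `scripts/lib/detached-launch.sh`; `resolve_repo_root` stays inline because of a circular dependency. The boundary is clear and security is not weakened; recommend opening CC-434 for the implementation. + +- poll→notify migration: **AMBER**. mkfifo blocking read is technically feasible and cuts latency sharply, but concurrent multi-waiter use risks data corruption. A single-waiter guard must be designed before it can be adopted, so polling stays for this round. + **Trigger**: After the CC-423 (gate detached lifecycle) pr-gate iterations, the user reviewed how much `scripts/gate-supervisor.sh` and `scripts/dispatch-supervisor.sh` duplicate each other. The user also noticed that the wait side is currently implemented by polling, and asked for this to be recorded as a follow-up improvement ticket (2026-07-01). **area**: arch/gate **Priority**: P3 (someday). -**Cross-link**: [[CC-423]], [[CC-432]]. +**Cross-link**: [[CC-423]], [[CC-432]], [[CC-434]]. + +## CC-434 — detached lifecycle: extract shared sentinel lib scripts/lib/detached-launch.sh 🔵 active + +**Problem**: The CC-433 spike (`docs/spikes/CC-433.md`) rated shared lib extraction GREEN. The "launch detached process + write sentinel" code in `dispatch-supervisor.sh`/`gate-supervisor.sh` is byte-identical today but rewritten in each file, and the maintenance cost already surfaced during CC-423. This ticket lands that recommendation. + +**Requirement** (converged from the spike's Pilot walkthrough; not open for redesign): +- Add `scripts/lib/detached-launch.sh` with 8 shared functions (`detached_launch_generate_nonce`, `detached_launch_key_file`, `detached_launch_secure_key_dir`, `detached_launch_write_key_file`, `detached_launch_sentinel_path`, `detached_launch_under_setsid`, `detached_launch_write_sentinel`, `detached_launch_wait_for_sentinel`). +- `resolve_repo_root` (symlink resolution) **stays inline** at the top of `dispatch-supervisor.sh`/`gate-supervisor.sh` because of a bootstrap circular dependency: the script must resolve REPO_ROOT before it can source the lib. Add a fixture test that brackets both blocks with marker comments and diffs them verbatim, so a future edit to one copy cannot silently skip the other. +- `gate-supervisor.sh`/`pmctl_gate_wait` (`scripts/lib/pmctl-gate.sh`) and 
`dispatch-supervisor.sh`/`pmctl_dispatch_wait` (`scripts/lib/pmctl-dispatch.sh`) all switch to the shared functions. The wait side keeps polling (`detached_launch_wait_for_sentinel`) and the IPC mechanism is not touched, because the CC-433 spike rated the poll→notify migration AMBER (unresolved). +- All dispatch-side security preflight gets **zero changes** and does not move into the shared lib: native-arg smuggling guard, adapter resolution, brief validation, guard check, run-spec schema validation, and brief-snapshot path-equality cleanup. +- Gate-specific logic gets **zero changes** and stays in `pmctl-gate.sh`: the `gate_result_verify` structural completeness check, the `--cd`→run-dir containment check, and the timeout hint text. +- Sentinel file names and key-dir path naming stay byte-identical to today (`/tmp/pm-supervisor-sentinel--`, `/tmp/pm-gate-sentinel--`, the `pm-dispatch`/`pm-gate-dispatch` key-dir namespaces). This is pure delegation with no change to any external behavior. + +**Done-when**: + +- `scripts/lib/detached-launch.sh` has landed and is used by both supervisors and both wait functions. + +- The REPO_ROOT inline-block drift guard test exists and passes. + +- The existing dispatch/gate lifecycle suites (`test-dispatch-lifecycle.sh`, `test-gate-lifecycle.sh`, `test-pmctl-dispatch.sh`, `test-pmctl-gate.sh`, etc.) all pass with no behavior change. + +- `run-all-tests.sh` is fully green. + +**area**: arch/gate +**Priority**: P2. +**Cross-link**: [[CC-433]], [[CC-423]], [[CC-435]]. + +## CC-435 — poll→notify single-waiter guard: conditionally triggered, not a committed follow-up 🟢 someday + +**Problem**: `docs/spikes/CC-433.md` rated the poll→notify migration AMBER. mkfifo blocking read is technically feasible and cuts latency sharply, but concurrent waiters reading the same fifo were found to risk byte-level data corruption, a correctness risk the polling design does not have. After CC-434 was implemented, two candidate guard designs were discussed further with the user. After re-assessing cost/benefit, it was decided not to schedule implementation. + +**Why** (assessment conclusions; the reason this ticket only starts once its trigger condition holds): +- **Resource use**: under real usage ("one waiter per run/gate, waiting minutes to tens of minutes"), polling (`sleep 2s` + `stat()`) and blocking read differ negligibly. Both are in the "asleep, no CPU" class, so this is no reason to switch. +- **Latency**: the only meaningful measurable difference is that polling can notice completion up to 2 seconds late, while a listener is near-instant. For a person waiting on a PR gate/dispatch result that delay is imperceptible. +- **Complexity/risk**: both candidate designs would add new race conditions, new cleanup responsibilities, and new test surface to the security-sensitive supervisor files (`dispatch-supervisor.sh`/`gate-supervisor.sh`) and the wait side. The return on investment does not justify that complexity. + +**Requirement** (draft candidate designs; only a starting point for when the trigger condition holds, not a spec for immediate work): +- **Plan A**: take an exclusive `flock -n` lock on the sentinel's `.waitlock` file. + - The waiter that wins the lock takes the mkfifo blocking-read fast path. + - Waiters that lose fall back safely to the existing polling (`detached_launch_wait_for_sentinel`) without touching the fifo. + - A TOCTOU fix is needed: after acquiring the lock and before mkfifo, check whether the sentinel already exists (the case where the supervisor finished first). + - `detached_launch_write_sentinel` needs a best-effort broadcast step (attempting a non-blocking write only if the fifo 
exists; a failure does not affect the file write, which remains the single source of truth). +- **Plan B**: each waiter creates its own dedicated (unshared) fifo; on completion the supervisor scans a registry directory and broadcasts to each registered waiter's fifo in turn. No waiter ever has to fall back to polling. The cost is handling the registration race (also solved with a TOCTOU check) and stale-fifo cleanup. That cleanup can follow the existing precedent of `pmctl_dispatch_wait` key files being reclaimed by tmpwatch, which does not affect correctness. + +**Done-when**: only needs to be converged once the trigger condition (below) holds. At that point it should include at least 3 new test cases: + +- two or more waiters waiting on the same run_id/gate_id at once; + +- the supervisor finishing before any waiter; + +- behavior when fifo/lock creation fails. + +**Trigger** (conditional, not scheduled): **raise this only when a scenario actually arises that needs multiple waiters waiting on the same run_id/gate_id at once** (for example, an orchestration flow designed to fan out notifications to multiple consumers). Today every `pmctl dispatch wait`/`gate wait` call pattern is "one caller waits for one result", so the condition does not hold yet; hence someday rather than a milestone. + +**area**: arch/gate +**Priority**: P3 (someday, conditionally triggered). +**Cross-link**: [[CC-433]], [[CC-434]]. diff --git a/cli/pmctl b/cli/pmctl index 9fd444dd..40899eed 100755 --- a/cli/pmctl +++ b/cli/pmctl @@ -16,7 +16,7 @@ done REPO_ROOT="$(cd "$(dirname "$_pmctl_self")/.." && pwd)" unset _pmctl_self _pmctl_dir -for _lib in pmctl-policy pmctl-fs pmctl-adapter pmctl-backlog pmctl-guard executor-router pmctl-dispatch pmctl-trace pmctl-task pmctl-decision gate-result-verify pmctl-gate pmctl-safe pmctl-validate pmctl-context pmctl-memory pmctl-artifacts pmctl-pre-release; do +for _lib in detached-launch pmctl-policy pmctl-fs pmctl-adapter pmctl-backlog pmctl-guard executor-router pmctl-dispatch pmctl-trace pmctl-task pmctl-decision gate-result-verify pmctl-gate pmctl-safe pmctl-validate pmctl-context pmctl-memory pmctl-artifacts pmctl-pre-release; do # shellcheck source=/dev/null [[ -r "$REPO_ROOT/scripts/lib/$_lib.sh" ]] && . "$REPO_ROOT/scripts/lib/$_lib.sh" done diff --git a/docs/spikes/CC-433.md b/docs/spikes/CC-433.md new file mode 100644 index 00000000..c9f976eb --- /dev/null +++ b/docs/spikes/CC-433.md @@ -0,0 +1,251 @@ +# CC-433 — detached lifecycle: extract shared sentinel lib + switch wait to active notification (spike result) + +**Status**: complete +**Date**: 2026-07-02 +**Ticket**: BACKLOG.md CC-433 + +## Investigation scope + +1.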
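Shared lib boundary: compare the setsid/nohup launch + sentinel write/ + nonce-key-file management logic of `scripts/dispatch-supervisor.sh` and + `scripts/gate-supervisor.sh`, delimit the shared functions that can move to `scripts/lib/detached-launch.sh` + and the logic each side must keep, and confirm that the extraction does not weaken + the existing dispatch-side security preflight. +2. Poll→notify mechanism: evaluate the feasibility and compatibility of FIFO blocking read / `inotifywait` / other IPC on + the three target platforms (CI, macOS, WSL2), and confirm that under the chosen mechanism + the existing fail-closed/timeout/ + indeterminate (exit 3) semantics of `pmctl dispatch wait`/`pmctl gate wait` stay unchanged. + +## Angles + +### a1 — sentinel lib boundary audit + +`dispatch-supervisor.sh` and `gate-supervisor.sh` are already +byte-identical today in the following mechanisms, which can move into a shared lib safely with zero weakening: + +- REPO_ROOT resolution +- nonce generation (`/dev/urandom` + `$RANDOM` fallback) +- key-dir hardening (`mkdir` + `chmod 700` + owner-uid verification) +- sentinel write / poll + +Every dispatch-side security check is +dispatch-only: the native-arg smuggling guard (D149-159), adapter resolution (D169-171), brief validation (D173-175), guard check (D177-180), run-spec +schema validation, and brief-snapshot path-equality cleanup (D76-85/221-228). +These checks must explicitly **not** move into the shared lib and must **not** become optional or parameterized. +Folding them into a shared lib would force either "gate must run them too" or "dispatch may skip them", and both +are weakenings. + +Nine shared function signatures were proposed: `resolve_repo_root`, `generate_nonce`, `key_dir`, +`secure_key_dir`, `write_key_file`, `sentinel_path`, `under_setsid`, +`write_sentinel`, `wait_for_sentinel`. + +### a2 — poll→notify IPC mechanism evaluation (measured on WSL2) + +`mkfifo` blocking read (`mkfifo; exec 3<>"$fifo"; read -t timeout -u 3`) +works in testing. It correctly preserves the timeout (rc=124) and indeterminate semantics; the indeterminate case (key file absent, exit 3) +happens entirely before the fifo is created and is unaffected. Latency drops from up to `POLL_INTERVAL` (currently +2 seconds) to near-instant. + +**Correctness risk found**: concurrent waiters reading the same fifo cause byte-level data corruption +(with two readers reading at once, the bytes of a single multi-line write get split or interleaved). Polling does +not have this problem: under polling, a second waiter at worst gets a clean exit 3 for an already-consumed sentinel. + +`inotifywait` was rejected for cross-platform reasons. It is a Linux-only kernel API, macOS has no native +support, and most CI images do not preinstall it; any fallback-to-polling amounts to not solving the problem. + +**Recommendation**: keep polling; do not switch to mkfifo unless a single-waiter guard (e.g. `flock` on the key-file) +is designed and validated first. + +### a3 — gate-side pilot walkthrough + +Drafted a complete `scripts/lib/detached-launch.sh` (implementations of the 8 extractable functions out of the 9), +a slimmed-down `scripts/gate-supervisor.sh` using the shared functions, and +`pmctl_gate_wait` switched to the shared `detached_launch_wait_for_sentinel` (still polling, +no mkfifo, because a2's single-waiter guard has not converged). +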
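The launcher, supervisor and waiter roles described in a1 and a3 reduce to a short protocol:

- the nonce lives only in a private key file;
- both sides derive the sentinel path from (id, nonce);
- the waiter simply polls for that path.

Below is a minimal, self-contained bash sketch of that round trip, under stated assumptions. A throwaway `mktemp -d` directory stands in for `/tmp` and `$XDG_RUNTIME_DIR`. The names `gen_nonce`, `sentinel_path`, `wait_for` and `DEMO_NONCE` are hypothetical illustration names, not the `detached-launch.sh` API.

```shell
#!/usr/bin/env bash
# Sketch of the nonce-authenticated sentinel round trip: launcher -> supervisor -> waiter.
# Assumption: a throwaway temp dir stands in for /tmp and $XDG_RUNTIME_DIR; every helper
# name below is hypothetical and only illustrates the protocol.
set -eu

root="$(mktemp -d)"
trap 'rm -rf "$root"' EXIT

gen_nonce() {  # 32 alphanumeric chars from /dev/urandom, $RANDOM fallback if empty
  local n
  n="$(LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom 2>/dev/null | head -c 32)" || true
  [[ -n "$n" ]] || n="${RANDOM}${RANDOM}${RANDOM}"
  printf '%s' "$n"
}

# Both sides derive the sentinel path from (id, nonce); the path itself is never stored.
sentinel_path() { printf '%s/pm-demo-sentinel-%s-%s' "$root" "$1" "$2"; }

wait_for() {  # pure existence poll: returns 0 when the file appears, 124 on timeout
  local path="$1" timeout="$2" interval="$3" start="$SECONDS"
  while true; do
    [[ -f "$path" ]] && return 0
    (( SECONDS - start >= timeout )) && return 124
    sleep "$interval"
  done
}

run_id="run-demo-1"
key_dir="$root/pm-demo-keys"      # private key dir, tightened to mode 700
mkdir -p "$key_dir" && chmod 700 "$key_dir"

# Launcher: the nonce is written ONLY to the private key file.
nonce="$(gen_nonce)"
printf '%s' "$nonce" > "$key_dir/$run_id"

# Supervisor: receives the nonce through a prefix assignment and writes key=value lines.
supervisor() {
  sleep 1
  printf 'final_state=%s\nexit_code=%s\n' done 0 \
    > "$(sentinel_path "$run_id" "$DEMO_NONCE")"
}
DEMO_NONCE="$nonce" supervisor &

# Waiter: a missing key file would be the "indeterminate" case (exit 3 in pmctl).
[[ -r "$key_dir/$run_id" ]] || { echo "indeterminate"; exit 3; }
wait_nonce="$(cat "$key_dir/$run_id")"

rc=0
wait_for "$(sentinel_path "$run_id" "$wait_nonce")" 10 0.2 || rc=$?
wait  # reap the background supervisor so its write has finished before we read
echo "rc=$rc"
cat "$(sentinel_path "$run_id" "$wait_nonce")"

# A guessed nonce derives a different path, so that wait simply times out.
rc_forged=0
wait_for "$(sentinel_path "$run_id" forged)" 1 0.2 || rc_forged=$?
echo "forged rc=$rc_forged"
```

Two design points show up in the sketch.

- **The nonce acts as the sentinel's authentication.** A waiter holding a wrong or guessed nonce derives a different path and just times out with 124.
- **The file can exist before it is complete.** A redirection creates the file before its contents are written, so an existence check can fire before the last line lands. The sketch sidesteps this by reaping the supervisor with `wait` before reading. A waiter that cannot `wait` on the writer would need another guard, for example writing to a temporary name and renaming it into place.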
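+**Key new finding (not covered by a1/a2)**: `detached_launch_resolve_repo_root` +cannot be extracted as a sourced lib function because of a circular dependency: the supervisor script must resolve REPO_ROOT +before it can source the lib, but this function itself would live in the lib being sourced. Conclusion: this +function must stay inline at the top of both supervisor scripts (about 8 duplicated lines, an acceptable +duplication cost); the remaining 8 functions extract cleanly. + +Confirmed gate-specific behavior that must not move into shared logic: the `gate_result_verify` structural completeness +check, the `--cd`→run-dir containment check, and the wording of the timeout hint text. These are entirely different from dispatch's +durable-record mechanism and stay as they are. + +A stretch option is attached: a draft function for a flock-guarded mkfifo variant, explicitly marked "not recommended +for this round's migration PR; recorded only as a starting point for a future ticket". + +The nonce environment variable is passed to the child process via a function-call prefix (`VAR=val detached_launch_under_setsid ...`). +The semantics are correct but easy to miss when drafting, so the implementation PR should have a test covering this. + +## Findings + +- **The shared boundary is clear and verifiable**: the dispatch and gate supervisor scripts are byte-identical today in + the "launch detached process + write sentinel" part. The maintenance cost of two + independent rewrites already showed in CC-423: the gate side gained fail-closed result-completeness + logic that the dispatch side never had, so the two have started to evolve separately. +- **8 of 9 functions extract cleanly; 1 must stay inline because of a circular dependency** (`resolve_repo_root`), + at an acceptable duplication cost (~8 lines per file). +- **All dispatch-side security preflight is dispatch-only**. Extracting the shared lib + does not, and must not, touch those functions; the a1 and a3 audits agree, with no risk of weakening security. +- **The poll→notify migration is technically feasible (mkfifo blocking read cuts latency sharply), but has + an unresolved concurrency correctness bug**: when multiple waiters blocking-read the same fifo + at once, the written content can be interleaved and torn. This failure mode never occurs under the polling architecture. + `inotifywait` was excluded for lack of cross-platform portability (no native macOS support, not necessarily installed in CI) + and is not a candidate this round. +- The existing fail-closed/timeout (rc=124)/indeterminate (exit 3) semantics stay unchanged in the + mkfifo prototype, but only the single-waiter case was verified. Multi-waiter + data corruption has no guard design or test coverage yet. + +## Recommendation + +**Shared lib extraction: GREEN (adopt)** +The boundary is clean and verifiable, with no security weakening. 8 functions can move straight into +`scripts/lib/detached-launch.sh`. 1 function (`resolve_repo_root`) stays inline in both supervisor scripts because of a +bootstrapping circular dependency (an acceptable duplication +cost). Migrate the gate side first (smaller blast radius; CC-423 just landed, so its tests are fresh), +with the dispatch side following immediately. Both changes should land in the same implementation PR, so that no intermediate state +(one side on the lib, the other still on the old code) can cause behavioral drift. + +**poll→notify migration: AMBER (conditional, not yet adoptable as-is)** +Technical feasibility is verified (mkfifo blocking read markedly improves latency and preserves existing semantics), +but there is an unresolved concurrency correctness risk (multi-waiter data corruption). Adoption is not recommended until +a single-waiter guard has been designed and validated, e.g. `flock` on the key-file, or a dedicated +fifo/sentinel file per waiter. This round's implementation PR should keep polling +(`sleep $POLL_INTERVAL`). After CC-434, cost/benefit was re-assessed with the user +(see Open risks): at current usage the return on investment is insufficient, so polling stays. The item becomes +**conditionally triggered** (CC-435, only when a real multi-waiter need appears) rather than a committed follow-up ticket. + +## Pilot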
walkthrough + +Below is a3's before/after draft of the `scripts/gate-supervisor.sh` migration. It covers only +the shared lib extraction, not poll→notify, which was rated AMBER and not adopted. The draft's call signatures predate the landed `scripts/lib/detached-launch.sh` and do not match it: + +```diff +--- a/scripts/gate-supervisor.sh (before) ++++ b/scripts/gate-supervisor.sh (after) +@@ -1,10 +1,14 @@ + #!/usr/bin/env bash + set -euo pipefail + +-# --- inline: REPO_ROOT resolution (duplicated in dispatch-supervisor.sh) --- ++# REPO_ROOT resolution stays inline: this script must resolve its own root ++# before it can `source` the shared lib, so the resolver cannot itself live ++# in the lib it bootstraps. ~8 duplicated lines, accepted cost (see a3). + SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)" + +-# --- inline: nonce generation, key-dir hardening, sentinel write/poll --- +-# (~80 lines, byte-identical to dispatch-supervisor.sh's equivalent block) +-... ++# shellcheck disable=SC1091 ++source "$REPO_ROOT/scripts/lib/detached-launch.sh" + + # --- gate-only: exec pr-gate.sh directly, no adapter/guard preflight --- + GATE_ID="$1" +-NONCE="$(generate_nonce_inline)" # was: local duplicate impl +-KEY_DIR="$(secure_key_dir_inline "$GATE_ID")" ++NONCE="$(detached_launch_generate_nonce)" ++KEY_DIR="$(detached_launch_secure_key_dir "gate" "$GATE_ID")" + + setsid_run() { +- # ~15 lines duplicated setsid/nohup wrapper +- ... ++ NONCE="$NONCE" detached_launch_under_setsid \ ++ -- bash "$REPO_ROOT/scripts/pr-gate.sh" "$@" + } + + setsid_run "$@" +-write_sentinel_inline "$KEY_DIR" "$?" # was: local duplicate impl ++detached_launch_write_sentinel "$KEY_DIR" "$?"
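+ + # gate-only: result completeness check stays here, unchanged + gate_result_verify "$KEY_DIR" +``` + +```diff +--- a/scripts/lib/pmctl-gate.sh (pmctl_gate_wait, before) ++++ b/scripts/lib/pmctl-gate.sh (pmctl_gate_wait, after) +@@ +-pmctl_gate_wait() { +- local gate_id="$1" +- local sentinel +- sentinel="$(_pmctl_gate_sentinel_key_file "$gate_id")" +- while true; do +- [[ -f "$sentinel" ]] && break +- sleep "${POLL_INTERVAL:-2}" +- done +- # gate-only: result completeness fail-closed check +- gate_result_verify "$sentinel" +-} ++pmctl_gate_wait() { ++ local gate_id="$1" ++ local sentinel ++ sentinel="$(detached_launch_sentinel_path "gate" "$gate_id")" ++ detached_launch_wait_for_sentinel "$sentinel" "${POLL_INTERVAL:-2}" ++ # gate-only: result completeness fail-closed check stays here, unchanged ++ gate_result_verify "$sentinel" ++} +``` + +`detached_launch_wait_for_sentinel` keeps a polling implementation internally (no mkfifo), +so the existing timeout/indeterminate return codes and caller behavior are fully unchanged. This pilot only +verifies how cleanly the shared lib extracts and involves no change to the IPC mechanism. + +## Open risks + +- **No guard design yet for mkfifo multi-waiter data corruption**: so far the problem is only confirmed, with no + prototype or test of a fix. This is the direct reason poll→notify stays AMBER. + After CC-434 was implemented, two candidate guard designs were discussed further with the user (Plan A: `flock` + lock, losers fall back to polling; Plan B: a dedicated fifo per waiter, with the supervisor broadcasting to + every registered waiter), and cost/benefit was re-assessed, concluding: + - **Resource use**: in this scenario, polling (`sleep 2s` + `stat()`) and blocking read differ negligibly + (single waiter, waits of minutes to tens of minutes). Both are in the "asleep, + no CPU" class, which is no reason to switch. + - **Latency**: the only meaningful measurable difference is that polling can notice completion up to 2 seconds late, + while a listener is near-instant. For a person waiting on a PR gate/dispatch result, + that delay is imperceptible and not a difference users would notice. + - **Complexity/risk**: both plans would introduce new problems into the security-sensitive supervisor files + (`dispatch-supervisor.sh`/`gate-supervisor.sh`) and the wait side: new + race conditions (TOCTOU: supervisor finishes first vs. waiter not yet ready), + new cleanup responsibilities (stale lock/fifo), and new test surface (concurrent waiters, + supervisor finishing first, fifo creation failure). + - **Conclusion**: at today's real usage of "one waiter per run/gate", the return + on investment does not justify this complexity, so polling stays. Re-evaluate if a scenario ever arises that truly needs multiple + waiters waiting on the same run_id/gate_id at once (e.g. an orchestration + flow designed to fan out notifications to multiple consumers); see CC-435. +- **Inline duplication of `resolve_repo_root`**: the top of each supervisor script + maintains about 8 identical lines. If REPO_ROOT resolution ever needs to change (e.g. to support + symlinked repos or 
monorepo subdirectories), both copies must be updated together. The mitigation has converged + into a concrete to-do (see Next tasks / CC-434): a fixture + test, bracketed by marker comments, that compares the two inline blocks verbatim so drift cannot go unnoticed. +- **The nonce environment variable is passed via a function-call prefix** (`VAR=val detached_launch_under_setsid ...`). + The semantics are correct in the pilot draft but easy to lose in later maintenance (e.g. switching to `export` + or forgetting the prefix at a call site), so it needs dedicated test coverage rather than relying on code review alone. +- The Pilot walkthrough covers only the gate side. On the dispatch side, the interface between + dispatch-only logic (such as the native-arg smuggling guard) and the new lib calls has not yet + been drafted line by line, so the implementation PR must apply a3's walkthrough method to the dispatch side as well. + +## Next tasks + +- **CC-434** (recommended implementation ticket): shared lib extraction + migration of both gate and dispatch. + Scope: + - Add `scripts/lib/detached-launch.sh` (8 shared functions; + `resolve_repo_root` stays inline). + - Migrate `gate-supervisor.sh` first + (smaller blast radius), with `dispatch-supervisor.sh` following in the same PR. + - Switch both wait sides (`pmctl_gate_wait`/`pmctl_dispatch_wait`) to the shared + `wait_for_sentinel` (still polling, IPC mechanism unchanged). + - Tests must cover correct propagation of the nonce environment + variable and zero weakening of dispatch-side security preflight (regression tests). + - **In the same PR, + also add a drift guard for the inline `resolve_repo_root` blocks**: the ~8 lines of REPO_ROOT resolution at the top of each supervisor + script must be identical verbatim. Add a fixture test to the + `test-*.sh` suite that extracts that block's source text from both scripts and compares them. + For example, bracket the block with fixed begin/end marker comments, have the test run + `sed -n '/# BEGIN resolve-root/,/# END resolve-root/p'` on each script, and diff the two copies + line by line. The moment the two diverge, the test fails closed, so a future edit to one copy cannot forget + the other. This avoids repeating the CC-423 lesson, where the gate-side result-completeness logic drifted. +- **CC-435** (someday, conditionally triggered): single-waiter guard design (Plan A: + `flock` lock, losers fall back to polling; Plan B: a dedicated fifo per waiter, supervisor + broadcasts). **Raise it only when a scenario actually arises that needs multiple waiters waiting on the same run_id/gate_id + at once**; the return on investment is currently judged insufficient (see Open risks above). + This spike's mkfifo prototype, flock-guard draft, and Plan A/B race analysis can serve as + that ticket's starting point; implementing them before a real need exists is not recommended. diff --git a/scripts/dispatch-supervisor.sh b/scripts/dispatch-supervisor.sh index 8f3f7d46..ae57626d 100755 --- a/scripts/dispatch-supervisor.sh +++ b/scripts/dispatch-supervisor.sh @@ -19,8 +19,12 @@ # later resolves the terminal outcome from the durable dispatch record. set -euo pipefail -# Resolve the real script path through symlinks so REPO_ROOT is the actual repo -# directory regardless of how the supervisor is launched (pattern from cli/pmctl).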
+# REPO_ROOT resolution stays inline (BEGIN/END markers below): this script +# must resolve its own root before it can `source` the shared lib, so the +# resolver cannot itself live in the lib it bootstraps. Duplicated verbatim +# in gate-supervisor.sh; scripts/test-detached-launch.sh diffs the two +# marked blocks to catch drift (see docs/spikes/CC-433.md angle a3). +# BEGIN resolve-root _self="${BASH_SOURCE[0]}" while [[ -L "$_self" ]]; do _dir="$(cd "$(dirname "$_self")" && pwd)" @@ -29,8 +33,9 @@ while [[ -L "$_self" ]]; do done REPO_ROOT="$(cd "$(dirname "$_self")/.." && pwd)" unset _self _dir +# END resolve-root -for _lib in pmctl-config portable runner-kind executor-router pmctl-guard dispatch-record pmctl-dispatch; do +for _lib in pmctl-config portable runner-kind executor-router pmctl-guard dispatch-record pmctl-dispatch detached-launch; do # shellcheck source=/dev/null [[ -r "$REPO_ROOT/scripts/lib/$_lib.sh" ]] && . "$REPO_ROOT/scripts/lib/$_lib.sh" done @@ -53,8 +58,9 @@ _write_sentinel() { local _state="${1:-failed}" _rc="${2:-2}" if [[ "${spec_run_id:-}" =~ ^run-[A-Za-z0-9]+-[A-Za-z0-9]+$ ]] \ && [[ -n "${_sentinel_nonce:-}" ]]; then - printf 'final_state=%s\nexit_code=%s\n' "$_state" "$_rc" \ - > "/tmp/pm-supervisor-sentinel-${spec_run_id}-${_sentinel_nonce}" 2>/dev/null || true + local _sentinel_path + _sentinel_path="$(detached_launch_sentinel_path "pm-supervisor" "$spec_run_id" "$_sentinel_nonce")" + detached_launch_write_sentinel "$_sentinel_path" "final_state=$_state" "exit_code=$_rc" fi } diff --git a/scripts/gate-supervisor.sh b/scripts/gate-supervisor.sh index 69772148..5d62b90a 100755 --- a/scripts/gate-supervisor.sh +++ b/scripts/gate-supervisor.sh @@ -13,8 +13,12 @@ # -- set -euo pipefail -# Resolve the real script path through symlinks so REPO_ROOT is the actual repo -# directory regardless of how the supervisor is launched (pattern from cli/pmctl). 
+# REPO_ROOT resolution stays inline (BEGIN/END markers below): this script +# must resolve its own root before it can `source` the shared lib, so the +# resolver cannot itself live in the lib it bootstraps. Duplicated verbatim +# in dispatch-supervisor.sh; scripts/test-detached-launch.sh diffs the two +# marked blocks to catch drift (see docs/spikes/CC-433.md angle a3). +# BEGIN resolve-root _self="${BASH_SOURCE[0]}" while [[ -L "$_self" ]]; do _dir="$(cd "$(dirname "$_self")" && pwd)" @@ -23,6 +27,10 @@ while [[ -L "$_self" ]]; do done REPO_ROOT="$(cd "$(dirname "$_self")/.." && pwd)" unset _self _dir +# END resolve-root + +# shellcheck disable=SC1091 +. "$REPO_ROOT/scripts/lib/detached-launch.sh" # Capture the parent-supplied sentinel nonce and immediately unset it so # pr-gate.sh (and any reviewer session it spawns) cannot read it and forge the @@ -38,11 +46,11 @@ gate_id="" _write_sentinel() { local _state="${1:-failed}" _rc="${2:-2}" _result="${3:-}" if [[ "$gate_id" =~ ^gate-[0-9]{8}-[0-9]{6}-[A-Za-z0-9]{6,}$ ]] && [[ -n "$_sentinel_nonce" ]]; then - { - printf 'final_state=%s\n' "$_state" - printf 'exit_code=%s\n' "$_rc" - [[ -n "$_result" ]] && printf 'result_file=%s\n' "$_result" - } > "/tmp/pm-gate-sentinel-${gate_id}-${_sentinel_nonce}" 2>/dev/null || true + local _sentinel_path + _sentinel_path="$(detached_launch_sentinel_path "pm-gate" "$gate_id" "$_sentinel_nonce")" + local -a _pairs=("final_state=$_state" "exit_code=$_rc") + [[ -n "$_result" ]] && _pairs+=("result_file=$_result") + detached_launch_write_sentinel "$_sentinel_path" "${_pairs[@]}" fi } diff --git a/scripts/lib/detached-launch.sh b/scripts/lib/detached-launch.sh new file mode 100644 index 00000000..34aaca50 --- /dev/null +++ b/scripts/lib/detached-launch.sh @@ -0,0 +1,156 @@ +#!/usr/bin/env bash +# detached-launch.sh — shared setsid/nohup + nonce-authenticated sentinel +# primitives (CC-434, spike CC-433 angle a1/a3). 
+# +# Extracted from the byte-identical portions of scripts/gate-supervisor.sh / +# scripts/lib/pmctl-gate.sh and scripts/dispatch-supervisor.sh / +# scripts/lib/pmctl-dispatch.sh. Owns ONLY the detach/sentinel mechanics that +# are provably identical across both callers: +# - nonce generation, per-user key-dir management +# - setsid/nohup process launch +# - sentinel write (opaque key=value passthrough) / poll-for-existence +# +# Deliberately does NOT own: +# - REPO_ROOT self-resolution (circular: the caller must resolve its own +# root BEFORE it can source this file — see docs/spikes/CC-433.md angle a3) +# - sentinel CONTENT semantics (which keys, how to parse/verify them) — that +# stays with each caller (gate_result_verify vs dispatch_record) +# - any dispatch-only security preflight (adapter/route/guard/brief checks) +# — those never applied to the gate side and must not be introduced here +# +# Sourced by both scripts/gate-supervisor.sh / scripts/dispatch-supervisor.sh +# (after each resolves its own REPO_ROOT) and scripts/lib/pmctl-gate.sh / +# scripts/lib/pmctl-dispatch.sh. Do NOT set -euo pipefail here (callers carry +# their own flags). + +# Generate a nonce suitable for sentinel-path unguessability. /dev/urandom +# first, $RANDOM concatenation fallback if urandom is unavailable/empty. +detached_launch_generate_nonce() { + local nonce + nonce="$(LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom 2>/dev/null | head -c 32 2>/dev/null)" \ + || true # under a caller's pipefail, tr's SIGPIPE makes this nonzero even on success + [[ -n "$nonce" ]] || nonce="${RANDOM}${RANDOM}${RANDOM}" + printf '%s' "$nonce" +} + +# Per-user private key-file path for a given namespace (e.g. "pm-dispatch", +# "pm-gate-dispatch") and id. Prefers XDG_RUNTIME_DIR (tmpfs, already +# per-user/per-session) and falls back to a uid-suffixed /tmp dir.
+detached_launch_key_file() { + local namespace="${1:?namespace required}" id="${2:?id required}" uid key_dir + uid="$(id -u 2>/dev/null)" || uid="0" + if [[ -n "${XDG_RUNTIME_DIR:-}" && -d "${XDG_RUNTIME_DIR}" ]]; then + key_dir="${XDG_RUNTIME_DIR}/${namespace}" + else + key_dir="/tmp/${namespace}-${uid}" + fi + printf '%s/%s' "$key_dir" "$id" +} + +# mkdir -p + chmod 700 + owner-uid verification on a key dir. `mkdir -m 700 +# -p` is insufficient: -m only applies to the deepest *new* dir, and a +# pre-existing dir keeps its prior mode/owner — a pre-seeded permissive or +# foreign-owned dir could expose nonce files. Fails closed; distinguishes +# failure stages via return code so callers can print a stage-specific +# message (create/secure/ownership), matching pre-extraction behavior: +# returns 0 — key_dir exists, mode 700, owned by current uid +# returns 1 — mkdir failed +# returns 2 — chmod failed (not owner of a pre-existing dir?) +# returns 3 — owned by a different uid +detached_launch_secure_key_dir() { + local key_dir="${1:?key_dir required}" + mkdir -p "$key_dir" 2>/dev/null || return 1 + chmod 700 "$key_dir" 2>/dev/null || return 2 + local owner + owner="$(stat -c '%u' "$key_dir" 2>/dev/null || stat -f '%u' "$key_dir" 2>/dev/null || true)" + if [[ -n "$owner" && "$owner" != "$(id -u)" ]]; then + return 3 + fi + return 0 +} + +# Write nonce (arg 2) to key_file (arg 1). Caller must have called +# detached_launch_secure_key_dir on dirname(key_file) first. +detached_launch_write_key_file() { + local key_file="${1:?key_file required}" nonce="${2:?nonce required}" + printf '%s' "$nonce" > "$key_file" 2>/dev/null +} + +# Deterministic sentinel path for a given /tmp prefix ("pm-supervisor" for +# dispatch, "pm-gate" for gate), id, and nonce pair. Both the launcher +# (writes) and the waiter (polls) derive this independently — the path is +# never stored in a workspace-readable location.
+detached_launch_sentinel_path() { + local prefix="${1:?prefix required}" id="${2:?id required}" nonce="${3:?nonce required}" + printf '/tmp/%s-sentinel-%s-%s' "$prefix" "$id" "$nonce" +} + +# Launch `bash script_path [args...]` detached via setsid+nohup (falling back to +# nohup+disown when setsid is unavailable). stdout+stderr go to log_file; +# if pid_file is non-empty, the backgrounded PID is recorded there. +# Env vars assigned on the call itself (e.g. `NONCE="$x" detached_launch_under_setsid ...`) +# propagate to the launched process for the lifetime of this function call, +# same as any other simple-command prefix assignment in bash. +# +# Usage: detached_launch_under_setsid script_path log_file pid_file [--] [args...] +# Returns 0 once the process is launched and (if requested) the PID is +# persisted; does not wait for the process to complete. +detached_launch_under_setsid() { + local script_path="${1:?script_path required}" log_file="${2:?log_file required}" pid_file="${3-}" + shift 3 + [[ "${1:-}" == "--" ]] && shift + + mkdir -p "$(dirname "$log_file")" || return 1 + [[ -n "$pid_file" ]] && { mkdir -p "$(dirname "$pid_file")" || return 1; } + + local pid + if command -v setsid >/dev/null 2>&1; then + setsid nohup bash "$script_path" "$@" </dev/null >"$log_file" 2>&1 & + pid=$! + else + nohup bash "$script_path" "$@" </dev/null >"$log_file" 2>&1 & + pid=$! + disown "$pid" 2>/dev/null || true + fi + + if [[ -n "$pid_file" ]]; then + printf '%s\n' "$pid" > "$pid_file" || return 1 + fi + return 0 +} + +# Write an opaque key=value sentinel file. Content semantics (which keys, in +# what order) are entirely the caller's decision — this function does not +# interpret the pairs, just serializes them one per line. +# +# Usage: detached_launch_write_sentinel sentinel_path "final_state=GO" "exit_code=0" ["result_file=/path"]...
+detached_launch_write_sentinel() { + local sentinel_path="${1:?sentinel_path required}" + shift + local pair + { + for pair in "$@"; do + printf '%s\n' "$pair" + done + } > "$sentinel_path" 2>/dev/null || true +} + +# Poll for a sentinel file's existence. Pure poll, no parse, no cleanup: the +# caller reads/removes the file itself once this returns 0. Does NOT handle +# the "key file absent" (indeterminate/exit 3) case — that check happens +# before this function is ever called, against the per-user key file, not +# the sentinel itself (see pmctl_gate_wait / pmctl_dispatch_wait). +# returns 0 — sentinel appeared within timeout +# returns 124 — timed out waiting +detached_launch_wait_for_sentinel() { + local sentinel_path="${1:?sentinel_path required}" timeout="${2:?timeout required}" + local poll_interval="${3:-2}" + local start elapsed + start="$SECONDS" + while true; do + [[ -f "$sentinel_path" ]] && return 0 + elapsed=$((SECONDS - start)) + (( elapsed >= timeout )) && return 124 + sleep "$poll_interval" + done +} diff --git a/scripts/lib/pmctl-dispatch.sh b/scripts/lib/pmctl-dispatch.sh index 12a6b567..3cbb6635 100644 --- a/scripts/lib/pmctl-dispatch.sh +++ b/scripts/lib/pmctl-dispatch.sh @@ -819,35 +819,14 @@ pmctl_dispatch_write_runspec() { # Both pmctl_dispatch_run_detached and pmctl_dispatch_wait use the same derivation # so they find the same file without storing the path in the workspace. 
_pmctl_sentinel_key_file() { - local _run_id="${1:-}" _uid _key_dir - _uid="$(id -u 2>/dev/null)" || _uid="0" - if [[ -n "${XDG_RUNTIME_DIR:-}" && -d "${XDG_RUNTIME_DIR}" ]]; then - _key_dir="${XDG_RUNTIME_DIR}/pm-dispatch" - else - _key_dir="/tmp/pm-dispatch-${_uid}" - fi - printf '%s/%s' "$_key_dir" "$_run_id" + local _run_id="${1:-}" + detached_launch_key_file "pm-dispatch" "$_run_id" } _pmctl_dispatch_launch_supervisor() { local repo_root="${1:-}" spec_path="${2:-}" supervisor_log="${3:-}" pid_file="${4:-}" - local supervisor pid - - supervisor="$repo_root/scripts/dispatch-supervisor.sh" - mkdir -p "$(dirname "$supervisor_log")" "$(dirname "$pid_file")" || return 1 - - if command -v setsid >/dev/null 2>&1; then - setsid nohup bash "$supervisor" --run-spec "$spec_path" \ - </dev/null >"$supervisor_log" 2>&1 & - pid=$! - else - nohup bash "$supervisor" --run-spec "$spec_path" \ - </dev/null >"$supervisor_log" 2>&1 & - pid=$! - disown "$pid" 2>/dev/null || true - fi - printf '%s\n' "$pid" > "$pid_file" || return 1 - return 0 + local supervisor="$repo_root/scripts/dispatch-supervisor.sh" + detached_launch_under_setsid "$supervisor" "$supervisor_log" "$pid_file" -- --run-spec "$spec_path" } # Detached lifecycle launcher. Splits the core --cd / @@ -865,6 +844,14 @@ pmctl_dispatch_run_detached() { shift 8 || true local -a forward=("$@") + if [[ "$(type -t detached_launch_generate_nonce 2>/dev/null)" != function ]]; then + local _dl_lib="$repo_root/scripts/lib/detached-launch.sh" + if [[ -r "$_dl_lib" ]]; then + # shellcheck disable=SC1090,SC1091 + . "$_dl_lib" 2>/dev/null || true + fi + fi + # Separate the core --cd / --brief-file (recorded as trusted scalars) from the # native passthrough args. The caller passes the EFFECTIVE brief (the auto-pack # augmented copy when auto-pack ran, else the original) as brief_file AND has @@ -925,9 +912,7 @@ pmctl_dispatch_run_detached() { # an executor that can only read workspace files.
The key file is stored in a # per-user private directory (mode 700) so only the owning user can access it. local _sup_nonce _sup_key_file _sup_key_dir - _sup_nonce="$(LC_ALL=C tr -dc 'A-Za-z0-9' /dev/null | head -c 32 2>/dev/null)" \ - || _sup_nonce="${RANDOM}${RANDOM}${RANDOM}" - [[ -n "$_sup_nonce" ]] || _sup_nonce="${RANDOM}${RANDOM}${RANDOM}" + _sup_nonce="$(detached_launch_generate_nonce)" _sup_key_file="$(_pmctl_sentinel_key_file "$run_id")" _sup_key_dir="$(dirname "$_sup_key_file")" # Create the per-user key dir, then verify it is owner-only AND owned by us. @@ -936,21 +921,14 @@ pmctl_dispatch_run_detached() { # permissive or foreign-owned dir could expose nonce files. So: mkdir, chmod 700 # (tightens an owner-owned-but-loose dir; fails if we do not own it), and refuse # any dir not owned by the current uid. - mkdir -p "$_sup_key_dir" 2>/dev/null || { - printf 'pmctl dispatch run: failed to create private key directory: %s\n' "$_sup_key_dir" >&2 - return 2 - } - chmod 700 "$_sup_key_dir" 2>/dev/null || { - printf 'pmctl dispatch run: failed to secure private key directory (not owner?): %s\n' "$_sup_key_dir" >&2 - return 2 - } - local _key_dir_owner - _key_dir_owner="$(stat -c '%u' "$_sup_key_dir" 2>/dev/null || stat -f '%u' "$_sup_key_dir" 2>/dev/null || true)" - if [[ -n "$_key_dir_owner" && "$_key_dir_owner" != "$(id -u)" ]]; then - printf 'pmctl dispatch run: refusing key directory not owned by current user (owner uid=%s): %s\n' "$_key_dir_owner" "$_sup_key_dir" >&2 - return 2 - fi - printf '%s' "$_sup_nonce" > "$_sup_key_file" 2>/dev/null || { + local _dl_rc=0; detached_launch_secure_key_dir "$_sup_key_dir" || _dl_rc=$? + case "$_dl_rc"
in + 0) : ;; + 1) printf 'pmctl dispatch run: failed to create private key directory: %s\n' "$_sup_key_dir" >&2; return 2 ;; + 2) printf 'pmctl dispatch run: failed to secure private key directory (not owner?): %s\n' "$_sup_key_dir" >&2; return 2 ;; + 3) printf 'pmctl dispatch run: refusing key directory not owned by current user: %s\n' "$_sup_key_dir" >&2; return 2 ;; + esac + detached_launch_write_key_file "$_sup_key_file" "$_sup_nonce" || { printf 'pmctl dispatch run: failed to write sentinel key file\n' >&2 return 2 } @@ -996,8 +974,9 @@ pmctl_dispatch_run_detached() { "failed" 2 "$model" "$brief_file" "" "$created_ts" "dispatched" 2>/dev/null || true pmctl_dispatch_write_record_soft "$run_id" "$adapter" "$model" "$brief_file" \ "$work_dir" 2 "failed" "supervisor launch failed" "" "" "" "$created_ts" "$_launch_fail_ts" - printf 'final_state=failed\nexit_code=2\n' \ - > "/tmp/pm-supervisor-sentinel-${run_id}-${_sup_nonce}" 2>/dev/null || true + detached_launch_write_sentinel \ + "$(detached_launch_sentinel_path "pm-supervisor" "$run_id" "$_sup_nonce")" \ + "final_state=failed" "exit_code=2" rm -f "$brief_file" 2>/dev/null || true # Keep the key file: dispatch wait must read the nonce to authenticate the # failure sentinel above (exit 2). Removing it here would force wait into the @@ -1016,6 +995,14 @@ pmctl_dispatch_wait() { shift || true local run_id="" work_dir="" timeout=3600 + if [[ "$(type -t detached_launch_wait_for_sentinel 2>/dev/null)" != function ]]; then + local _dl_lib="$_repo_root/scripts/lib/detached-launch.sh" + if [[ -r "$_dl_lib" ]]; then + # shellcheck disable=SC1090,SC1091 + . 
"$_dl_lib" 2>/dev/null || true + fi + fi + while [[ $# -gt 0 ]]; do case "$1" in --cd) @@ -1100,38 +1087,31 @@ pmctl_dispatch_wait() { return 2 fi - local _sentinel="/tmp/pm-supervisor-sentinel-${run_id}-${_key_nonce}" - local start elapsed - start="$SECONDS" - while true; do - if [[ -f "$_sentinel" ]]; then - local _sent_state _sent_exit - _sent_state="$(grep -m1 '^final_state=' "$_sentinel" 2>/dev/null | cut -d= -f2-)" || true - _sent_exit="$(grep -m1 '^exit_code=' "$_sentinel" 2>/dev/null | cut -d= -f2-)" || true - rm -f "$_sentinel" "$_key_file" 2>/dev/null || true - [[ "$_sent_exit" =~ ^-?[0-9]+$ ]] || _sent_exit="1" - if dispatch_record_read_state "$work_dir" "$run_id"; then - printf 'run: %s state: %s exit: %s\n' "$run_id" \ - "${_sent_state:-$DISPATCH_RECORD_STATE}" "${_sent_exit:-$DISPATCH_RECORD_EXIT}" - if [[ -n "${DISPATCH_RECORD_SUMMARY:-}" ]]; then - printf '%s\n' "$DISPATCH_RECORD_SUMMARY" - fi - else - printf 'run: %s state: %s exit: %s\n' "$run_id" \ - "${_sent_state:-unknown}" "${_sent_exit:-1}" - if [[ "${_sent_exit:-1}" -eq 0 ]]; then - printf 'pmctl dispatch wait: WARN: sentinel ok but durable dispatch record not found for %s in %s\n' "$run_id" "$work_dir" >&2 - fi + local _sentinel + _sentinel="$(detached_launch_sentinel_path "pm-supervisor" "$run_id" "$_key_nonce")" + if detached_launch_wait_for_sentinel "$_sentinel" "$timeout" "${PM_DISPATCH_WAIT_POLL_INTERVAL:-2}"; then + local _sent_state _sent_exit + _sent_state="$(grep -m1 '^final_state=' "$_sentinel" 2>/dev/null | cut -d= -f2-)" || true + _sent_exit="$(grep -m1 '^exit_code=' "$_sentinel" 2>/dev/null | cut -d= -f2-)" || true + rm -f "$_sentinel" "$_key_file" 2>/dev/null || true + [[ "$_sent_exit" =~ ^-?[0-9]+$ ]] || _sent_exit="1" + if dispatch_record_read_state "$work_dir" "$run_id"; then + printf 'run: %s state: %s exit: %s\n' "$run_id" \ + "${_sent_state:-$DISPATCH_RECORD_STATE}" "${_sent_exit:-$DISPATCH_RECORD_EXIT}" + if [[ -n "${DISPATCH_RECORD_SUMMARY:-}" ]]; then + printf '%s\n' 
"$DISPATCH_RECORD_SUMMARY" + fi + else + printf 'run: %s state: %s exit: %s\n' "$run_id" \ + "${_sent_state:-unknown}" "${_sent_exit:-1}" + if [[ "${_sent_exit:-1}" -eq 0 ]]; then + printf 'pmctl dispatch wait: WARN: sentinel ok but durable dispatch record not found for %s in %s\n' "$run_id" "$work_dir" >&2 fi - return "${_sent_exit:-1}" - fi - elapsed=$((SECONDS - start)) - if (( elapsed >= timeout )); then - printf 'pmctl dispatch wait: timed out after %ss waiting for %s in %s\n' "$timeout" "$run_id" "$work_dir" >&2 - return 124 fi - sleep "${PM_DISPATCH_WAIT_POLL_INTERVAL:-2}" - done + return "${_sent_exit:-1}" + fi + printf 'pmctl dispatch wait: timed out after %ss waiting for %s in %s\n' "$timeout" "$run_id" "$work_dir" >&2 + return 124 } pmctl_dispatch_run() { diff --git a/scripts/lib/pmctl-gate.sh b/scripts/lib/pmctl-gate.sh index e9ee615e..8fc88ddd 100644 --- a/scripts/lib/pmctl-gate.sh +++ b/scripts/lib/pmctl-gate.sh @@ -18,14 +18,8 @@ _pmctl_gate_hex6() { # mirroring _pmctl_sentinel_key_file in pmctl-dispatch.sh but rooted at a # separate /tmp namespace so gate and dispatch sentinels never collide. _pmctl_gate_sentinel_key_file() { - local _gate_id="${1:-}" _uid _key_dir - _uid="$(id -u 2>/dev/null)" || _uid="0" - if [[ -n "${XDG_RUNTIME_DIR:-}" && -d "${XDG_RUNTIME_DIR}" ]]; then - _key_dir="${XDG_RUNTIME_DIR}/pm-gate-dispatch" - else - _key_dir="/tmp/pm-gate-dispatch-${_uid}" - fi - printf '%s/%s' "$_key_dir" "$_gate_id" + local _gate_id="${1:-}" + detached_launch_key_file "pm-gate-dispatch" "$_gate_id" } pmctl_gate_run() { @@ -163,6 +157,14 @@ pmctl_gate_run_detached() { local repo_root="$1" effective_cd="$2"; shift 2 local -a forward=("$@") + if [[ "$(type -t detached_launch_generate_nonce 2>/dev/null)" != function ]]; then + local _dl_lib="$repo_root/scripts/lib/detached-launch.sh" + if [[ -r "$_dl_lib" ]]; then + # shellcheck disable=SC1090,SC1091 + . 
"$_dl_lib" 2>/dev/null || true + fi + fi + + local gate_script="$repo_root/scripts/gate-supervisor.sh" if [[ ! -x "$gate_script" ]]; then printf 'pmctl gate run: gate-supervisor.sh not found or not executable: %s\n' "$gate_script" >&2 @@ -202,41 +204,25 @@ pmctl_gate_run_detached() { # nonce is passed to the supervisor via env (never written to a # workspace-readable file), mirroring pmctl_dispatch_run_detached. local _nonce _key_file _key_dir - _nonce="$(LC_ALL=C tr -dc 'A-Za-z0-9' /dev/null | head -c 32 2>/dev/null)" \ - || _nonce="${RANDOM}${RANDOM}${RANDOM}" - [[ -n "$_nonce" ]] || _nonce="${RANDOM}${RANDOM}${RANDOM}" + _nonce="$(detached_launch_generate_nonce)" _key_file="$(_pmctl_gate_sentinel_key_file "$gate_id")" _key_dir="$(dirname "$_key_file")" - mkdir -p "$_key_dir" 2>/dev/null || { - printf 'pmctl gate run: failed to create private key directory: %s\n' "$_key_dir" >&2 - return 2 - } - chmod 700 "$_key_dir" 2>/dev/null || { - printf 'pmctl gate run: failed to secure private key directory (not owner?): %s\n' "$_key_dir" >&2 - return 2 - } - local _key_dir_owner - _key_dir_owner="$(stat -c '%u' "$_key_dir" 2>/dev/null || stat -f '%u' "$_key_dir" 2>/dev/null || true)" - if [[ -n "$_key_dir_owner" && "$_key_dir_owner" != "$(id -u)" ]]; then - printf 'pmctl gate run: refusing key directory not owned by current user (owner uid=%s): %s\n' "$_key_dir_owner" "$_key_dir" >&2 - return 2 - fi - printf '%s' "$_nonce" > "$_key_file" 2>/dev/null || { + local _dl_rc=0; detached_launch_secure_key_dir "$_key_dir" || _dl_rc=$? + case "$_dl_rc"
in + 0) : ;; + 1) printf 'pmctl gate run: failed to create private key directory: %s\n' "$_key_dir" >&2; return 2 ;; + 2) printf 'pmctl gate run: failed to secure private key directory (not owner?): %s\n' "$_key_dir" >&2; return 2 ;; + 3) printf 'pmctl gate run: refusing key directory not owned by current user: %s\n' "$_key_dir" >&2; return 2 ;; + esac + detached_launch_write_key_file "$_key_file" "$_nonce" || { printf 'pmctl gate run: failed to write sentinel key file\n' >&2 return 2 } local supervisor_log="$gate_run_dir/supervisor.log" - if command -v setsid >/dev/null 2>&1; then - PM_GATE_SUPERVISOR_NONCE="$_nonce" setsid nohup bash "$gate_script" \ - --gate-id "$gate_id" --cd "$effective_cd" --run-dir "$gate_run_dir" -- ${forward[@]+"${forward[@]}"} \ - "$supervisor_log" 2>&1 & - else - PM_GATE_SUPERVISOR_NONCE="$_nonce" nohup bash "$gate_script" \ - --gate-id "$gate_id" --cd "$effective_cd" --run-dir "$gate_run_dir" -- ${forward[@]+"${forward[@]}"} \ - "$supervisor_log" 2>&1 & - disown $! 2>/dev/null || true - fi + PM_GATE_SUPERVISOR_NONCE="$_nonce" detached_launch_under_setsid \ + "$gate_script" "$supervisor_log" "" \ + -- --gate-id "$gate_id" --cd "$effective_cd" --run-dir "$gate_run_dir" -- ${forward[@]+"${forward[@]}"} printf '%s\n' "$gate_id" return 0 @@ -252,6 +238,14 @@ pmctl_gate_wait() { shift || true local gate_id="" work_dir="" timeout="${PM_GATE_WAIT_DEFAULT_TIMEOUT:-1200}" + if [[ "$(type -t detached_launch_wait_for_sentinel 2>/dev/null)" != function ]]; then + local _dl_lib="$repo_root/scripts/lib/detached-launch.sh" + if [[ -r "$_dl_lib" ]]; then + # shellcheck disable=SC1090,SC1091 + . 
"$_dl_lib" 2>/dev/null || true + fi + fi + while [[ $# -gt 0 ]]; do case "$1" in --cd) @@ -321,11 +315,9 @@ pmctl_gate_wait() { return 2 fi - local _sentinel="/tmp/pm-gate-sentinel-${gate_id}-${_key_nonce}" - local start elapsed - start="$SECONDS" - while true; do - if [[ -f "$_sentinel" ]]; then + local _sentinel + _sentinel="$(detached_launch_sentinel_path "pm-gate" "$gate_id" "$_key_nonce")" + if detached_launch_wait_for_sentinel "$_sentinel" "$timeout" "${PM_GATE_WAIT_POLL_INTERVAL:-2}"; then local _state _exit _result _state="$(grep -m1 '^final_state=' "$_sentinel" 2>/dev/null | cut -d= -f2-)" || true _exit="$(grep -m1 '^exit_code=' "$_sentinel" 2>/dev/null | cut -d= -f2-)" || true @@ -390,17 +382,12 @@ pmctl_gate_wait() { fi fi return "$_exit" - fi - elapsed=$((SECONDS - start)) - if (( elapsed >= timeout )); then - printf 'pmctl gate wait: timed out after %ss waiting for %s in %s\n' "$timeout" "$gate_id" "$work_dir" >&2 - # shellcheck disable=SC2016 # literal markdown backticks in the format string, not a command substitution - printf 'pmctl gate wait: the gate may still be running detached; retry `pmctl gate wait %s --cd %s`, or inspect `pmctl artifacts show %s --cd %s` for the supervisor log\n' \ - "$gate_id" "$work_dir" "$gate_id" "$work_dir" >&2 - return 124 - fi - sleep "${PM_GATE_WAIT_POLL_INTERVAL:-2}" - done + fi + printf 'pmctl gate wait: timed out after %ss waiting for %s in %s\n' "$timeout" "$gate_id" "$work_dir" >&2 + # shellcheck disable=SC2016 # literal markdown backticks in the format string, not a command substitution + printf 'pmctl gate wait: the gate may still be running detached; retry `pmctl gate wait %s --cd %s`, or inspect `pmctl artifacts show %s --cd %s` for the supervisor log\n' \ + "$gate_id" "$work_dir" "$gate_id" "$work_dir" >&2 + return 124 } # pmctl gate verify diff --git a/scripts/run-all-tests.sh b/scripts/run-all-tests.sh index 83438196..430ab42a 100755 --- a/scripts/run-all-tests.sh +++ b/scripts/run-all-tests.sh @@ -52,6 
+52,7 @@ SUITE_NAMES=( test-run-all-tests test-timeout-resolve test-dispatch-common + test-detached-launch test-lint-model-aliases test-core-schemas test-pm-prep-snapshot @@ -120,6 +121,7 @@ declare -A SUITE_PATHS=( [test-run-all-tests]="scripts/test-run-all-tests.sh" [test-timeout-resolve]="scripts/test-timeout-resolve.sh" [test-dispatch-common]="scripts/test-dispatch-common.sh" + [test-detached-launch]="scripts/test-detached-launch.sh" [test-lint-model-aliases]="scripts/test-lint-model-aliases.sh" [test-core-schemas]="scripts/test-core-schemas.sh" [test-pm-prep-snapshot]="scripts/test-pm-prep-snapshot.sh" diff --git a/scripts/test-detached-launch.sh b/scripts/test-detached-launch.sh new file mode 100755 index 00000000..23ddb3af --- /dev/null +++ b/scripts/test-detached-launch.sh @@ -0,0 +1,291 @@ +#!/usr/bin/env bash +# Regression tests for scripts/lib/detached-launch.sh (CC-434). +# +# Covers the shared nonce/key-dir/sentinel primitives, plus a drift guard: the +# REPO_ROOT symlink-resolution block is intentionally NOT extracted into this +# lib (circular dependency — see docs/spikes/CC-433.md angle a3), so it is +# duplicated verbatim between scripts/gate-supervisor.sh and +# scripts/dispatch-supervisor.sh inside BEGIN/END resolve-root markers. This +# suite extracts and diffs the two marked blocks so a future edit to one that +# forgets the other fails loudly instead of silently drifting. +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)" +LIB="$REPO_ROOT/scripts/lib/detached-launch.sh" + +# shellcheck source=scripts/lib/test-harness.sh +. "$SCRIPT_DIR/lib/test-harness.sh" +th_init "$@" + +# shellcheck source=scripts/lib/detached-launch.sh +. 
"$LIB" + +_extract_marked_block() { + local file="$1" + sed -n '/# BEGIN resolve-root/,/# END resolve-root/p' "$file" +} + +# ---- 1: resolve-root inline blocks stay byte-identical across supervisors ---- +case_resolve_root_blocks_identical() { + local name="detached-launch/gate + dispatch supervisor resolve-root blocks match" + should_run "$name" || return 0 + + local gate_block dispatch_block + gate_block="$(_extract_marked_block "$REPO_ROOT/scripts/gate-supervisor.sh")" + dispatch_block="$(_extract_marked_block "$REPO_ROOT/scripts/dispatch-supervisor.sh")" + + if [[ -z "$gate_block" ]]; then + fail "$name" "no BEGIN/END resolve-root block found in gate-supervisor.sh" + return + fi + if [[ -z "$dispatch_block" ]]; then + fail "$name" "no BEGIN/END resolve-root block found in dispatch-supervisor.sh" + return + fi + + if [[ "$gate_block" == "$dispatch_block" ]]; then + pass "$name" + else + fail "$name" "resolve-root blocks diverged between gate-supervisor.sh and dispatch-supervisor.sh: +--- gate --- +$gate_block +--- dispatch --- +$dispatch_block" + fi +} + +# ---- 2: generate_nonce produces non-empty alnum output ---------------------- +case_generate_nonce_nonempty() { + local name="detached-launch/generate_nonce produces non-empty output" + should_run "$name" || return 0 + + local nonce + nonce="$(detached_launch_generate_nonce)" + if [[ -n "$nonce" ]] && [[ "$nonce" =~ ^[A-Za-z0-9]+$ ]]; then + pass "$name" + else + fail "$name" "nonce=$nonce" + fi +} + +# ---- 3: generate_nonce is not deterministic across calls -------------------- +case_generate_nonce_varies() { + local name="detached-launch/generate_nonce varies across calls" + should_run "$name" || return 0 + + local n1 n2 + n1="$(detached_launch_generate_nonce)" + n2="$(detached_launch_generate_nonce)" + if [[ "$n1" != "$n2" ]]; then + pass "$name" + else + fail "$name" "two consecutive nonces were identical: $n1" + fi +} + +# ---- 4: key_file honours XDG_RUNTIME_DIR when set ---------------------------- 
+case_key_file_xdg_runtime_dir() { + local name="detached-launch/key_file uses XDG_RUNTIME_DIR when set" + should_run "$name" || return 0 + + local xdg="$tmp_root/xdg1" + mkdir -p "$xdg" + local out + out="$(XDG_RUNTIME_DIR="$xdg" detached_launch_key_file "pm-test-ns" "some-id")" + if [[ "$out" == "$xdg/pm-test-ns/some-id" ]]; then + pass "$name" + else + fail "$name" "out=$out" + fi +} + +# ---- 5: key_file falls back to /tmp when XDG_RUNTIME_DIR is unset/missing --- +case_key_file_tmp_fallback() { + local name="detached-launch/key_file falls back to /tmp when XDG_RUNTIME_DIR absent" + should_run "$name" || return 0 + + local out uid + uid="$(id -u)" + out="$(unset XDG_RUNTIME_DIR; detached_launch_key_file "pm-test-ns" "some-id")" + if [[ "$out" == "/tmp/pm-test-ns-${uid}/some-id" ]]; then + pass "$name" + else + fail "$name" "out=$out" + fi +} + +# ---- 6: secure_key_dir creates a mode-700 dir owned by current user ---------- +case_secure_key_dir_creates_700() { + local name="detached-launch/secure_key_dir creates mode-700 dir" + should_run "$name" || return 0 + + local dir="$tmp_root/keydir1/nested" + if detached_launch_secure_key_dir "$dir"; then + local mode + mode="$(stat -c '%a' "$dir" 2>/dev/null || stat -f '%Lp' "$dir" 2>/dev/null)" + if [[ "$mode" == "700" ]]; then + pass "$name" + else + fail "$name" "mode=$mode" + fi + else + fail "$name" "secure_key_dir returned non-zero" + fi +} + +# ---- 7: secure_key_dir returns 1 when mkdir -p fails (portable: parent is a file) -- +case_secure_key_dir_mkdir_failure() { + local name="detached-launch/secure_key_dir returns 1 when mkdir -p fails" + should_run "$name" || return 0 + + local notadir="$tmp_root/notadir7" + : > "$notadir" + local rc=0 + detached_launch_secure_key_dir "$notadir/sub" || rc=$?
+ if [[ "$rc" -eq 1 ]]; then + pass "$name" + else + fail "$name" "rc=$rc (expected 1 for mkdir failure under a non-directory parent)" + fi +} + +# ---- 8: write_key_file + read back round-trips the nonce -------------------- +case_write_key_file_roundtrip() { + local name="detached-launch/write_key_file round-trips nonce" + should_run "$name" || return 0 + + local dir="$tmp_root/keydir2" file + detached_launch_secure_key_dir "$dir" + file="$dir/some-id" + detached_launch_write_key_file "$file" "abc123" + local got + got="$(cat "$file" 2>/dev/null)" + if [[ "$got" == "abc123" ]]; then + pass "$name" + else + fail "$name" "got=$got" + fi +} + +# ---- 9: sentinel_path is deterministic given the same inputs ---------------- +case_sentinel_path_deterministic() { + local name="detached-launch/sentinel_path deterministic" + should_run "$name" || return 0 + + local p1 p2 + p1="$(detached_launch_sentinel_path "pm-gate" "gate-1" "nonceX")" + p2="$(detached_launch_sentinel_path "pm-gate" "gate-1" "nonceX")" + if [[ "$p1" == "$p2" ]] && [[ "$p1" == "/tmp/pm-gate-sentinel-gate-1-nonceX" ]]; then + pass "$name" + else + fail "$name" "p1=$p1 p2=$p2" + fi +} + +# ---- 10: write_sentinel + wait_for_sentinel round-trip on immediate write --- +case_wait_for_sentinel_success() { + local name="detached-launch/wait_for_sentinel returns 0 once file exists" + should_run "$name" || return 0 + + local sentinel="$tmp_root/sentinel1" + detached_launch_write_sentinel "$sentinel" "final_state=ok" "exit_code=0" + + if detached_launch_wait_for_sentinel "$sentinel" 5 1; then + local state + state="$(grep -m1 '^final_state=' "$sentinel" | cut -d= -f2-)" + if [[ "$state" == "ok" ]]; then + pass "$name" + else + fail "$name" "state=$state" + fi + else + fail "$name" "wait_for_sentinel timed out on a pre-existing file" + fi +} + +# ---- 11: wait_for_sentinel returns 124 on timeout ---------------------------- +case_wait_for_sentinel_timeout() { + local name="detached-launch/wait_for_sentinel returns 
124 on timeout" + should_run "$name" || return 0 + + local sentinel="$tmp_root/sentinel-never-appears" + local rc=0 + detached_launch_wait_for_sentinel "$sentinel" 1 1 || rc=$? + if [[ "$rc" -eq 124 ]]; then + pass "$name" + else + fail "$name" "rc=$rc" + fi +} + +# ---- 12: under_setsid launches a script and records its pid ----------------- +case_under_setsid_launches_and_records_pid() { + local name="detached-launch/under_setsid launches script and writes pid_file" + should_run "$name" || return 0 + + local script="$tmp_root/probe.sh" log="$tmp_root/probe.log" pidfile="$tmp_root/probe.pid" done_marker="$tmp_root/probe.done" + cat > "$script" <