feat: autonomous owner harness — self-prompting controllers over flow#71
Conversation
|
Addressed an independent review pass:
Full suite green, vet clean. |
|
Closed out the independent review's test-gap section ( All review findings now addressed (blocking race, bonus self-paced-next-wake clobber, all nits, all test gaps). Full suite green, vet clean. Marking ready for review. |
|
Known follow-up (not in this PR — it's a behavior change, not a review fix): #75 — owner ticks don't capture a session id / transcript, so "what happened in the last tick" isn't inspectable the way playbook runs are. Proposed fix is to mint + record a fresh per-tick session id ( |
|
Second independent review (Fable 5) pass addressed in Two HIGH gaps fixed:
Mediums: absolute charter/journal paths in the tick prompt (headless tick no longer creates a stray Minors: 8 new TDD regressions, full suite green, vet clean. Verified no regressions to existing surfaces. Note: branch is behind |
…r flow Adds `owner`: a durable, repo-scoped self-prompting controller that takes ongoing responsibility for an outcome. Not a single session — it is a charter + a clock that wakes on an interval, runs a fresh headless tick, acts, and reschedules itself. Data layer (internal/flowdb): - owners table (additive migration) + Owner CRUD: Create/Get/List/Update - DueOwners with timezone-correct, parse-based time comparison - live-tick bookkeeping columns (tick_pid / tick_started) CLI (internal/app): - flow add owner / owner list (with next-tick) / show / start / pause - the tick: `flow owner tick-due` (scheduler scan) -> detached `flow __owner-tick` (fresh sessionless headless run via SkipPermissionsRun) - live tick indicator: `tick: running (pid ...)` in owner show, with read-time pid reconciliation (dead -> last tick) - flow add task --tag (repeatable); owner show splits out playbook runs Model: owners ORCHESTRATE, never execute inline. A tick is sessionless and gets no `flow done` KB/project sweep, so work is routed through playbook-runs (recurring) and tasks (one-time) that self-close with the sweep. Human questions are tasks tagged `question` + `owner:<slug>`. Owner<->task linkage and the question marker reuse the existing tag system (the only new table is `owners`). Docs: design spec at docs/superpowers/specs/2026-06-08-autonomous-owner-harness-design.md. Skill section 4.17 teaches the ownership model to harness sessions. All TDD (RED -> GREEN); full suite green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The tick is a fresh sessionless run, so all continuity must live on disk. Two fixes: - Review step now uses `flow owner show <slug>` (lists in-flight tasks, playbook runs, AND questions with status) instead of `flow list tasks --tag` — which defaults to kind=regular and HID the owner's own playbook-runs, so a tick could re-spawn a run already in progress. - Tick journal: each tick now READS its recent notes under owners/<slug>/updates/ at the start (to recover what it dispatched / is waiting on) and WRITES a short note before exiting (what it observed, what it dispatched with slugs, what the next tick should check). Same updates/ convention as tasks and playbooks — the owner's durable memory across blank-session ticks. Skill §4.17 + §4.5 updated to teach the journal + owner-show review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… wake
Adds an on-demand way to wake an owner now, instead of only the scheduler:
flow owner tick <slug> interactive: spawns a tab the user drives
(may use AskUserQuestion, refine the charter live)
flow owner tick <slug> --auto headless on-demand (same as a scheduler tick)
Mirrors `flow do` vs `flow do --auto`. Interactive ticks mint a throwaway
session and reuse the spawner; they get a distinct prompt that permits the
human to navigate and folds lessons back into the charter. The on-demand
tick is extra — it does not disturb next_wake_at.
Use case: run the FIRST tick interactively to shepherd the owner and tune
the charter before it runs unattended (skill §4.17 nudges this when an
owner has never ticked). Scheduled ticks stay headless.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaces the fixed recurring interval with dynamic, per-run scheduling: - `flow owner next <slug> --in <dur> | --at <when>` sets next_wake_at. Each tick calls it before exiting to choose when it next needs to run (e.g. +15m while watching a deploy, +1d when idle). - `--every` is demoted to a fallback heartbeat floor: optional, defaults to 24h, and only re-wakes the owner if a tick crashes or never sets its next wake. It is no longer a rigid cron cadence. - Both tick prompts gain a self-pace step; the agent sets the cadence itself (no user confirmation, even on an interactive tick). - Intake asks only for a loose fallback interval, not a fixed schedule. Skill §4.17 + command reference updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
flow stays pure CLI with no OS-specific scheduler code. Firing ticks is delegated to the host, set up by the skill — not compiled into the binary. - Overlap guard (portable Go): `flow owner tick-due` now skips any owner whose tick is still running (live tick_pid), so a slow tick never gets a second one stacked on it. A dead pid falls through and re-dispatches. - Scheduler is a SKILL recipe (§4.17), not a flow command: check → install if missing → verify → respawn, per host. macOS launchd LaunchAgent polling `flow owner tick-due` every 60s; Linux systemd-user-timer / cron variant. Idempotent, opt-in, easy to unload (stops all owners). No launchd code in flow — the OS glue lives in the skill, adapting per-OS at runtime. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
An interactive (`flow owner tick`) launch spawned the tab but never recorded anything, so `flow owner show` reported `last tick: (never)` even after a full interactive tick ran, journaled, and dispatched work. Now the interactive path stamps last_tick_at + last_tick_status='interactive' after a successful spawn. (Completion isn't tracked — the user drives the tab — but the session self-paces via `flow owner next` and journals as before.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…chd plist Hit in practice: launchd runs with a minimal PATH, so a dispatched tick failed with `exec: "claude": executable file not found in $PATH`. The ensure-scheduler recipe now requires embedding the user's full interactive $PATH via EnvironmentVariables.PATH in the plist (claude/gh/git live in ~/.local/bin, homebrew, etc.). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds a proper lifecycle end for owners (replacing hand-deleting DB rows): - `flow owner retire <slug>` — graceful: status=retired + archived. Stops ticking (DueOwners requires active), drops off the default list; charter, journal, tick logs, and owned tasks are preserved. - `flow owner retire <slug> --delete` — hard: removes the owner row AND the owners/<slug>/ directory. Owned tasks (tag owner:<slug>) are independent and left intact either way. flowdb: RetireOwner + DeleteOwner. Skill §4.17 lifecycle + reference updated (confirm via AskUserQuestion before retiring/deleting). All TDD. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The owner skill section had grown verbose with repeated rationale (orchestrate-never-inline stated 3×, self-pacing in 3 places) and a restated tick procedure that duplicates the tick prompt. Condensed to ~39% smaller while keeping every command, the tag contract, the journal, the scheduler recipe (incl. the launchd PATH gotcha), lifecycle, and all four anti-patterns. No behavior change — pure density. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Owners are a peer object of task/project/playbook, so they should be reachable via the same verb-first surface, not only the grouped `flow owner <verb>` form. Add `flow list owners` and `flow show owner <slug>` as aliases that delegate to the existing ownerList/ownerShow renderers (zero duplication — same function call). Scope is deliberately just list + show: those are the read/inspect verbs owners share with the other top-level objects, and the surface mismatch users actually hit. Lifecycle verbs (start/pause/next/retire/ tick) stay grouped under `flow owner` since they have no task analogue and read naturally as `flow owner pause` etc. `flow add owner` is unchanged. Tests assert the alias output is byte-identical to the grouped form so the two surfaces can't drift. Skill command reference notes both forms. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The scheduler (parent, `flow owner tick-due`) and the detached tick
(child, `flow __owner-tick`) both wrote the owner row via the full-row
`flowdb.UpdateOwner`. Writing a stale in-memory struct clobbered the
other writer's columns:
- A tick that finished before the scheduler recorded its pid had its
`last_tick_status` ('ok'/'error') overwritten with the pre-tick value,
and showed "running" with a dead pid until the next reconcile.
- A tick that self-paced mid-run (`flow owner next`) had its new
`next_wake_at` clobbered back to the value loaded when the tick
started — so self-pacing silently didn't stick.
Replace the full-row writes with targeted column updates that each touch
only the columns that writer owns, so the writes commute:
dispatch/start → tick_pid, tick_started (+ next_wake_at for scheduler)
finish → last_tick_at, last_tick_status, clear tick_pid/started
interactive → last_tick_* only (leaves next_wake_at for self-pace)
`reconcileOwnerTick` is now guarded (`WHERE tick_pid IS NOT NULL`) so it
can't overwrite a genuine result with 'dead' if a finish lands first —
mirroring the `--auto` path's `reconcileAutoRun` guard.
The one remaining overlap (a stale dead tick_pid from a fast-finishing
child) is self-healing via the dead-pid reconcile + DueOwners overlap
guard; no durable result is lost.
TDD: TestOwnerTickDueFastFinishingChildKeepsResult and
TestCmdOwnerTickPreservesSelfPacedNextWake both RED before, GREEN after.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…endly slug error Knocks out the non-blocking findings from the independent review: - `flow owner next` now rejects a wake time in the past (stale --at or a negative --in) instead of leaving the owner perpetually due and ticking every scheduler pass. - Tick paths guard on owner status: the detached `flow __owner-tick` skips a non-active owner cleanly (covers the dispatch→exec window and manual --auto on a paused owner), and the hand-triggered `flow owner tick` refuses a paused/retired owner with resume guidance instead of silently spawning a session. - `flow add owner --slug <existing>` now fails with a friendly "slug already exists" message rather than a raw UNIQUE-constraint error. - Skill + Owner doc-comment wording: drop the stale "fixed interval" phrasing for `--every` in favor of "fallback heartbeat floor" (it self-paces via `flow owner next`), matching the §4.17 body. (The reconcile-guard nit was already handled in the prior race-fix commit.) TDD: four regressions added (RED→GREEN), full suite green, vet clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds the test cases the independent review flagged as missing — all exercise existing defensive/behavioral paths that were previously unasserted: - DueOwners skips an unparseable next_wake_at (doesn't crash the pass). - tick-due skips an owner with an invalid --every (CreateOwner/direct writes don't validate the interval like `flow add owner` does). - A second tick-due pass does not re-dispatch an owner whose next_wake was just advanced (proves the schedule-advance, with the pid forced dead so the overlap guard isn't what's masking it). - `flow owner next --at` accepts a date expression (tomorrow), not just RFC3339. - The detached tick's child env is sessionless — CLAUDE_CODE_SESSION_ID is stripped, so each tick is a fresh run (the central owner invariant). No production code changes. Full suite green, vet clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…, tick paths + robustness
Fixes the functionality gaps from the second independent review:
- **retire→start zombie (HIGH):** `ownerStart` now clears `archived_at`
(new `flowdb.ActivateOwner`), so `flow owner start` un-retires a retired
owner instead of leaving it active-but-archived (hidden from the list,
never dispatched by DueOwners). Skill updated: retire is reversible via
start.
- **`--tag` didn't intersect (HIGH):** `flow list tasks --tag a --tag b`
is now repeatable and ANDs the tags (new `TaskFilter.Tags`), so the
skill-documented ledger query `--tag owner:<slug> --tag question`
returns only that owner's questions instead of every owner's.
- **Tick prompt relative paths (MED):** the prompt now points at the
ABSOLUTE `$FLOW_ROOT/owners/<slug>/{charter.md,updates/}` (via
ownerDirFor) and warns it's not under work_dir — a headless tick
(cwd=work_dir) no longer fails to find its charter or creates a stray
owners/ dir in the user's repo.
- **PID-reuse staleness cap (MED):** a tick_pid that looks alive long
after tick_started (reboot pid-reuse, incl. EPERM=alive) no longer
stalls an owner forever — `ownerTickStale` (1h cap) lets the overlap
guard re-dispatch and reconcile heal it.
- **Lifecycle full-row writes (LOW-MED):** `ownerNext`/`Start`/`Pause`
now use targeted column writes (SetOwnerNextWake/ActivateOwner/
PauseOwner) so they can't clobber concurrent tick bookkeeping (extends
the cea8462 fix to the human-facing commands).
- **Status-guard skip leaked a stale pid:** the non-active skip in
`cmdOwnerTick` now clears tick_pid/tick_started so reconcile doesn't
later mislabel a clean skip as 'dead'.
- **Minors:** `flow help` now lists the owner commands; `owner show`
drops done tasks from the "in flight" section.
TDD: 8 new regressions (RED→GREEN) across flowdb + app; one existing test
updated (a "running tick" needs a recent tick_started now that the
staleness cap exists). Full suite green, vet clean. No regressions to
existing surfaces.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…l pattern Adds the documented path for making an owner event-driven (reactive on demand) while staying a cheap poller by default — the "own the whole latency curve" capability. - `flow owner tick --auto` now carries the same overlap guard as the scheduler: it refuses to stack a headless tick on top of one already running (dead/stale pid falls through). This is the prerequisite for event triggers — a watcher's `owner tick --auto` can now race a scheduled tick without double-firing. (Closes the deferred Fable nit.) - Skill §4.17: new "Event-driven owners" subsection. A tick can't hold a Monitor (it's headless and exits), so it dispatches a BOUNDED watcher task (`flow do --auto` + Monitor tool) that, on the event, writes a focus note to the journal and fires a focused `flow owner tick --auto`, then exits — back to spaced polling. Framed as bounded-only (a watcher is a living session; never make it permanent). TDD: TestOwnerTickManualAutoRefusesWhileTickRunning (RED→GREEN). Full suite green, vet clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What
Adds owners — durable, named, repo-scoped self-prompting controllers that take ongoing responsibility for an outcome (re-wake, re-evaluate, act until done), vs. flow's one-shot
flow do --autoand manual playbooks.An owner is not a single Claude session. It is a charter (operating manual) + a journal + a clock. Each tick is a fresh headless run that reads its charter + journal, reviews what it owns, orchestrates (never executes inline — it routes work through tasks/playbook-runs that self-close with the
flow doneKB sweep), parks human decisions as question-tasks, self-paces its next wake, and journals.Design doc:
docs/superpowers/specs/2026-06-08-autonomous-owner-harness-design.mdHow ticks fire (no daemon, portable)
flow owner tick-due(scan due owners → dispatch detached ticks).Commands
Model highlights
flow donesweep), so they route work through tasks (one-time) / playbooks (recurring) that self-close and capture learnings.owners; owner↔task linkage and the question marker reuse the existing tag system (owner:<slug>,question).flow owner next);--everyis just a fallback heartbeat floor.owners/<slug>/updates/, read at tick start, written before exit (the owner's memory across blank-session ticks).Tests
All TDD (RED → GREEN). Full suite green, no regressions.
Status
Draft — opening for review before merge. Dogfooding live: an
af-stabilityowner is already watching Agent Factory prod (found a stuck scheduled job, autonomously raised a GitHub issue + TDD'd PR, stopped before merge).🤖 Generated with Claude Code