
fix(add-workers): pass discovered emails to cap-probe + retry discover without --exclude-tmux #137

Merged
NagyVikt merged 1 commit into main from agent/claude/add-workers-discovery-fix-2026-05-16-00-41
May 15, 2026


@NagyVikt
Contributor

Automated by gx branch finish (PR flow).


Two cascading bugs in `scripts/codex-fleet/add-workers.sh` made the discovery
wrapper report `only 0 healthy unallocated accounts available` and then fail
with `no healthy accounts available`, even when ≥1 account in the canonical
`agent-auth list` pool passed the 5h<100% / weekly<90% / not-already-active
filter.
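For illustration, the health filter can be written as a one-line awk predicate. This is a sketch under an assumed input format: the column layout (email, 5h usage %, weekly usage %, active flag) and the `filter_healthy` name are hypothetical. The real `agent-auth list` output is not shown in this PR, and the real wrapper learns "already active" from `fleet-active-accounts.txt` rather than from a column.

```shell
#!/usr/bin/env bash
# Hypothetical input, one account per line, tab-separated:
#   email <TAB> 5h-usage-% <TAB> weekly-usage-% <TAB> active (0|1)
# Prints the emails that pass 5h<100 / weekly<90 / not-already-active.
filter_healthy() {
  awk -F'\t' '$2 < 100 && $3 < 90 && $4 == 0 { print $1 }'
}

printf 'a@example.com\t40\t10\t0\nb@example.com\t100\t10\t0\nc@example.com\t20\t95\t0\nd@example.com\t5\t5\t1\n' \
  | filter_healthy
# prints: a@example.com  (b is 5h-capped, c is weekly-capped, d is active)
```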

Root cause #1 — cap-probe invoked with no email arguments

`pick_accounts()` called `cap-probe.sh "$need"` (only the count). cap-probe's
usage is `<need_n> email1 email2 ...`: after `shift` drops the count, it
iterates `for email in "$@"` over an empty list, probes nothing, and exits
with 0 healthy rows. The wrapper then took the empty result as "0 healthy"
and moved on.
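A minimal reproduction of this argument-handling trap follows. `cap_probe_sketch` is a hypothetical stand-in: only the `<need_n> email1 email2 ...` calling convention comes from the PR, and the loop body is placeholder logic rather than cap-probe's real probing.

```shell
#!/usr/bin/env bash
# Stand-in for cap-probe.sh's argument handling (body is hypothetical).
cap_probe_sketch() {
  local need="$1" email probed=0
  shift                       # drop <need_n>; "$@" is now the email list
  for email in "$@"; do       # with no emails, the body never runs
    echo "probing $email" >&2
    probed=$((probed + 1))
  done
  echo "need=$need probed=$probed"
}

cap_probe_sketch 2                                # old call: prints need=2 probed=0
cap_probe_sketch 2 a@example.com b@example.com    # fixed call: prints need=2 probed=2
```

Nothing here errors out: an empty `"$@"` is a perfectly valid loop, which is why the failure surfaced only as "0 healthy" rather than as a crash.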

Root cause #2 — discover-accounts fail-closes when target tmux session is absent

`bash discover-accounts.sh --exclude-tmux <session>` runs
`tmux list-panes -s -t <session> | sed | sort | tr | sed` under
`set -eo pipefail`. On a host where `<session>` doesn't exist on the
default tmux server (e.g. when running `add-workers.sh` outside the fleet
session, or with `CODEX_FLEET_TMUX_SOCKET` unset so the wrapper falls back
to the operator's default tmux), tmux exits 1. pipefail propagates that
failure, `set -e` aborts the helper before it reaches its Python emitter,
and the wrapper sees an empty tempfile. The wrapper then treated the empty
result as "all candidates allocated" instead of "tmux filter unusable,
retry without it".
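The failure mode can be reproduced without tmux. In this sketch, `false` stands in for `tmux list-panes -s -t <missing-session>` and `helper_sketch` is a hypothetical name; the `sort | tr` stages are placeholders for the helper's real sed/sort/tr/sed chain.

```shell
#!/usr/bin/env bash
helper_sketch() (
  set -eo pipefail
  # Under pipefail the pipeline's status is 1 (from `false`), so this
  # assignment fails and set -e aborts the subshell right here.
  panes=$(false | sort | tr '\n' ' ')
  # Never reached: stands in for the helper's Python TSV emitter.
  printf 'candidate@example.com\t%s\n' "$panes"
)

out=$(helper_sketch)
status=$?
printf 'status=%d bytes=%d\n' "$status" "${#out}"   # prints status=1 bytes=0
```

A caller that only checks whether the output is empty cannot tell this case apart from "every candidate is already allocated".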

Fix (surgical, in-file only)

1. After the first discover-accounts call, if the tempfile is empty,
   retry without `--exclude-tmux`. We still keep `--exclude-active` so
   accounts already in `fleet-active-accounts.txt` are skipped.
2. Before invoking cap-probe, extract the email column from the
   discovered TSV and pass each email as a positional arg so cap-probe
   has something to probe. Empty discovery skips cap-probe entirely.
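The two steps above can be sketched as follows. This is an illustrative sketch, not the merged diff. `discover_accounts` and `cap_probe` are hypothetical stubs for the real `discover-accounts.sh` / `cap-probe.sh` calls, the stub deliberately returns nothing when `--exclude-tmux` is passed, and treating column 1 of the TSV as the email column is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical stub: fails with no output whenever --exclude-tmux is passed,
# mimicking the missing-session case.
discover_accounts() {
  local arg
  for arg in "$@"; do
    if [ "$arg" = "--exclude-tmux" ]; then
      return 1
    fi
  done
  printf 'a@example.com\t12\t30\nb@example.com\t40\t55\n'
}

# Hypothetical stub: echoes what it was asked to probe.
cap_probe() {
  local need="$1"
  shift
  printf 'need=%s emails=%s\n' "$need" "$*"
}

pick_accounts() {
  local need="$1" session="$2" tmp email
  local -a emails=()
  tmp=$(mktemp) || return 1

  # Step 1: try with the tmux filter; on failure the tempfile stays empty.
  discover_accounts --exclude-active --exclude-tmux "$session" >"$tmp" || true
  if [ ! -s "$tmp" ]; then
    # Retry without --exclude-tmux, still skipping already-active accounts.
    discover_accounts --exclude-active >"$tmp" || true
  fi

  # Step 2: collect the email column (assumed to be column 1).
  while IFS=$'\t' read -r email _; do
    [ -n "$email" ] && emails+=("$email")
  done <"$tmp"
  rm -f "$tmp"

  # Empty discovery: skip cap-probe entirely.
  if [ "${#emails[@]}" -eq 0 ]; then
    return 0
  fi
  # Each email becomes its own positional argument.
  cap_probe "$need" "${emails[@]}"
}

pick_accounts 2 fleet-session   # prints need=2 emails=a@example.com b@example.com
```

Passing `"${emails[@]}"` keeps each address a separate argument, which matches the `for email in "$@"` loop on the cap-probe side.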

The helper-side bug (discover-accounts.sh exiting 1 when the tmux
session is missing instead of treating an empty tmux query as
"no live panes to exclude") is left untouched per file-scope contract;
the wrapper now compensates for it.
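For comparison, a helper-side fix could look roughly like the sketch below. It is hypothetical and not part of this PR: the function name and the `#{pane_title}` format are assumptions, and the helper's real sed/sort/tr/sed extraction is not reproduced. `tmux has-session -t` is used as an `if` condition, so a missing session (or missing server) yields no output and exit 0 even under `set -eo pipefail`.

```shell
#!/usr/bin/env bash
live_pane_accounts() (
  set -eo pipefail
  session="$1"
  # As an `if` condition, a failing has-session does not trip set -e.
  if tmux has-session -t "$session" 2>/dev/null; then
    # '#{pane_title}' is an assumed source of account names.
    tmux list-panes -s -t "$session" -F '#{pane_title}' | sed '/^$/d' | sort -u
  fi
)

live_pane_accounts no-such-session; echo "exit=$?"   # exit=0 when no such session exists
```

Checking for the session first, rather than appending `|| true` to the pipeline, keeps genuine `list-panes` failures fatal when the session does exist.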

Verified on host 2026-05-16:
  bash -n scripts/codex-fleet/add-workers.sh             # exit 0
  docker run koalaman/shellcheck:stable …add-workers.sh  # only pre-existing findings
  bash scripts/codex-fleet/add-workers.sh 1 --dry-run    # picks admin-mite (1 healthy)
  bash scripts/codex-fleet/add-workers.sh 2 --dry-run    # picks 2 healthy
@NagyVikt NagyVikt merged commit 04b9ad3 into main May 15, 2026
@NagyVikt NagyVikt deleted the agent/claude/add-workers-discovery-fix-2026-05-16-00-41 branch May 15, 2026 22:46