Skip to content

fix(spawn): reject unknown harness with 400 instead of opaque 500#211

Open
codebanditssss wants to merge 1 commit into
mainfrom
fix/spawn-unknown-harness-400
Open

fix(spawn): reject unknown harness with 400 instead of opaque 500#211
codebanditssss wants to merge 1 commit into
mainfrom
fix/spawn-unknown-harness-400

Conversation

@codebanditssss

Copy link
Copy Markdown
Collaborator

Closes #210.

Problem

ao spawn --harness <unknown> returned an opaque INTERNAL_ERROR 500. The harness was only resolved at m.agents.Agent(cfg.Harness) deep inside Manager.Spawnafter the seed row, worktree, and provisioning. On a miss it returned a plain (untyped) error, so toAPIError fell through to its default and the envelope collapsed it to 500. The late failure also left a terminated orphan row and a wasted worktree (the "create row before validating preconditions" pattern from #152).

This is the unknown-harness-name sibling of #152 Bug 6 (unknown binary on PATHAGENT_BINARY_NOT_FOUND); that fix doesn't cover a harness with no registered adapter.

Change

  • New sentinel ErrUnknownHarness, mapped in toAPIErrorapierr.Invalid("UNKNOWN_HARNESS", ...) (400).
  • Validate the harness against the registry immediately after effectiveHarness, before CreateSession. No durable state is created for a doomed spawn — no orphan row, no worktree.

Test

  • TestSpawn_RejectsUnknownHarness: returns ErrUnknownHarness, and asserts no session row, workspace, or runtime is created.
  • TestToAPIErrorMapsWorkspaceBranchSentinels: adds the UNKNOWN_HARNESS mapping case.
  • go build ./... and full go test ./... pass.
  • End-to-end against a live daemon: --harness bogus-agent now returns unknown agent harness: "bogus-agent" (UNKNOWN_HARNESS) (exit 1) instead of Internal server error; a valid --harness codex still spawns.

An unknown --harness was only caught at the agent-registry lookup deep in
Spawn, after the seed row and worktree were already created: the untyped
error collapsed to INTERNAL_ERROR 500 and left a terminated orphan row.

Validate the harness against the registry before any durable state is
created and return a typed ErrUnknownHarness mapped to UNKNOWN_HARNESS (400).
Sibling to #152 Bug 6 (unknown binary on PATH), which did not cover a harness
with no registered adapter.

Closes #210
@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Fixes opaque 500s when ao spawn --harness <unknown> is called by introducing ErrUnknownHarness and validating the harness registry immediately after effectiveHarness, before any durable state (CreateSession, workspace, runtime) is created.

  • manager.go: adds the early guard; ErrUnknownHarness sentinel exported for mapping. A now-unreachable late fallback (agent, ok := m.agents.Agent(cfg.Harness); if !ok { ... }) was not removed and could mislead future readers.
  • service.go: maps ErrUnknownHarnessapierr.Invalid(\"UNKNOWN_HARNESS\", ...) (400) in toAPIError, consistent with the surrounding sentinel pattern.
  • Tests cover both the manager-level rejection (no session row / workspace / runtime created) and the service-level error-code mapping.

Confidence Score: 4/5

Safe to merge; the early harness guard correctly prevents orphan rows and the 400 mapping is consistent with existing sentinel handling.

The change is well-scoped and the new code paths are covered by tests. The one thing worth a second look is the unreachable late fallback in manager.go — it doesn't cause incorrect behaviour today, but it reads as if it still handles the unknown-harness case and could mislead someone who later rearranges the guard.

backend/internal/session_manager/manager.go — the stale agent, ok := m.agents.Agent block after workspace creation should be cleaned up.

Important Files Changed

Filename Overview
backend/internal/session_manager/manager.go Adds early harness guard before CreateSession; leaves a now-unreachable late fallback that becomes misleading dead code
backend/internal/service/session/service.go Adds ErrUnknownHarness → UNKNOWN_HARNESS 400 mapping in toAPIError, consistent with surrounding sentinel cases
backend/internal/session_manager/manager_test.go Adds missingAgents stub and TestSpawn_RejectsUnknownHarness, correctly asserting no session row, workspace, or runtime is created
backend/internal/service/session/service_test.go Adds UNKNOWN_HARNESS test case to toAPIError sentinel table; error wrapping correctly exercises errors.Is

Sequence Diagram

sequenceDiagram
    participant CLI as ao CLI
    participant Svc as session.Service
    participant Mgr as session_manager.Manager
    participant Reg as agents.Registry
    participant DB as Store

    CLI->>Svc: POST /spawn (harness: "bogus")
    Svc->>Mgr: Spawn(cfg)
    Mgr->>Mgr: loadProject()
    Mgr->>Mgr: effectiveHarness()
    Mgr->>Reg: Agent("bogus")
    Reg-->>Mgr: nil, false
    Mgr-->>Svc: ErrUnknownHarness (no DB write, no workspace)
    Svc->>Svc: toAPIError → UNKNOWN_HARNESS
    Svc-->>CLI: 400 UNKNOWN_HARNESS

    Note over Mgr,DB: Before this PR: Store.CreateSession() called here,<br/>leaving an orphan row on harness miss
Loading

Comments Outside Diff (1)

  1. backend/internal/session_manager/manager.go, line 219-224 (link)

    P2 Late harness fallback is now unreachable dead code

    The early guard (introduced in this PR) calls m.agents.Agent(cfg.Harness) and returns before any durable state is created; if execution reaches line 219, the registry was already confirmed to have the harness, so the !ok branch can never fire. Leaving the stale block in place is misleading — a future maintainer might soften the early check (e.g. move it later "for symmetry") without realising the late block is what prevents orphan rows, because the late block looks like it handles that case. The second lookup should either be simplified to agent, _ := m.agents.Agent(cfg.Harness) with a comment, or the early check should capture and propagate agent so the registry is only queried once.

Reviews (1): Last reviewed commit: "fix(spawn): reject unknown harness with ..." | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ao spawn --harness <unknown> returns opaque 500 instead of 400

2 participants