Skip to content

Add idle_output_timeout as Per-Step Recipe Override#793

Merged
Trecek merged 10 commits intointegrationfrom
add-idle-output-timeout-as-per-step-recipe-override/785
Apr 13, 2026
Merged

Add idle_output_timeout as Per-Step Recipe Override#793
Trecek merged 10 commits intointegrationfrom
add-idle-output-timeout-as-per-step-recipe-override/785

Conversation

@Trecek
Copy link
Copy Markdown
Collaborator

@Trecek Trecek commented Apr 13, 2026

Summary

Thread idle_output_timeout through the same per-step override chain that stale_threshold
already uses. Add the field to RecipeStep, parse it from YAML, validate it (0 = disabled,
negative = error), whitelist it in rules_tools.py, add it to the HeadlessExecutor protocol,
thread it through tools_execution.py and both DefaultHeadlessExecutor.run() and
run_headless_core(), then set idle_output_timeout: 0 (disabled) on the five compute-heavy
steps in research.yaml.

The key semantic difference from stale_threshold: 0 is a valid recipe value meaning
"no idle timeout for this step" (maps to None in run_managed_async's API), whereas
stale_threshold rejects 0 as invalid. The 0→None conversion happens at the resolution
layer (both DefaultHeadlessExecutor.run() and run_headless_core()).

Architecture Impact

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 45, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([START: Recipe YAML loaded])
    COMPLETE_WATCH([COMPLETE: idle watchdog active])
    COMPLETE_NO_WATCH([COMPLETE: no idle watchdog])
    IDLE_STALL([IDLE_STALL: process killed])

    subgraph RecipeLayer ["Recipe Layer — ● schema.py · ● io.py · ● validator.py · ● rules_tools.py"]
        direction TB
        ParseStep["● _parse_step()<br/>━━━━━━━━━━<br/>recipe/io.py<br/>data.get('idle_output_timeout')"]
        StepSchema["● RecipeStep<br/>━━━━━━━━━━<br/>idle_output_timeout: int | None<br/>None = use global cfg<br/>0 = disabled for this step"]
        Validate["● validate_recipe()<br/>━━━━━━━━━━<br/>recipe/validator.py<br/>must be int ≥ 0, or None"]
        RulesTools["● _TOOL_PARAMS[run_skill]<br/>━━━━━━━━━━<br/>recipe/rules_tools.py<br/>idle_output_timeout registered<br/>as first-class MCP param"]
    end

    subgraph MCPLayer ["MCP Layer — ● tools_execution.py"]
        direction TB
        RunSkill["● run_skill MCP tool<br/>━━━━━━━━━━<br/>server/tools_execution.py<br/>idle_output_timeout: int | None<br/>→ float(v) if v is not None else None"]
    end

    subgraph ExecutorLayer ["Executor Layer — ● _type_protocols.py · ● headless.py"]
        direction TB
        Protocol["● HeadlessExecutor.run()<br/>━━━━━━━━━━<br/>core/_type_protocols.py<br/>idle_output_timeout: float | None"]
        DefaultExec["● DefaultHeadlessExecutor.run()<br/>━━━━━━━━━━<br/>execution/headless.py<br/>_raw = val ?? float(cfg.idle_output_timeout=600)"]
        ZeroGate{"_raw_idle > 0.0?<br/>━━━━━━━━━━<br/>0 → None gate"}
    end

    subgraph CoreExec ["Core Execution — ● headless.py · process.py"]
        direction TB
        HeadlessCore["● run_headless_core()<br/>━━━━━━━━━━<br/>execution/headless.py<br/>passes effective_idle to runner<br/>(0→None guard is no-op here)"]
        ManagedAsync["run_managed_async()<br/>━━━━━━━━━━<br/>execution/process.py<br/>idle_output_timeout: float | None"]
        WatchDecision{"idle_output_timeout<br/>is not None?<br/>━━━━━━━━━━<br/>start watchdog?"}
        WatchStdout["_watch_stdout_idle()<br/>━━━━━━━━━━<br/>anyio task group coroutine<br/>polls stdout for idle time"]
    end

    %% FLOW %%
    START --> ParseStep
    ParseStep --> StepSchema
    StepSchema --> Validate
    Validate -->|"valid: int ≥ 0 or None"| RulesTools
    Validate -->|"invalid: raises error"| IDLE_STALL
    RulesTools --> RunSkill

    RunSkill --> Protocol
    Protocol --> DefaultExec
    DefaultExec --> ZeroGate
    ZeroGate -->|"yes → effective_idle: float"| HeadlessCore
    ZeroGate -->|"no (0.0) → effective_idle: None"| HeadlessCore
    HeadlessCore --> ManagedAsync
    ManagedAsync --> WatchDecision
    WatchDecision -->|"not None"| WatchStdout
    WatchDecision -->|"None (disabled)"| COMPLETE_NO_WATCH
    WatchStdout -->|"idle < threshold"| COMPLETE_WATCH
    WatchStdout -->|"idle ≥ threshold"| IDLE_STALL

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class COMPLETE_WATCH,COMPLETE_NO_WATCH output;
    class IDLE_STALL detector;
    class StepSchema stateNode;
    class ParseStep,RulesTools phase;
    class Validate detector;
    class RunSkill handler;
    class Protocol phase;
    class DefaultExec,HeadlessCore,ManagedAsync handler;
    class ZeroGate,WatchDecision stateNode;
    class WatchStdout phase;
Loading

Data Lineage Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 65, 'curve': 'basis'}}}%%
flowchart LR
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph Origins ["Data Origins"]
        direction TB
        DEFAULTS["defaults.yaml<br/>━━━━━━━━━━<br/>run_skill:<br/>  idle_output_timeout: 600"]
        RESEARCH["● research.yaml<br/>━━━━━━━━━━<br/>5 steps: idle_output_timeout: 0<br/>implement / troubleshoot / experiment"]
    end

    subgraph GlobalCfg ["Global Config (Fallback Source of Truth)"]
        SETTINGS["settings.py RunSkillConfig<br/>━━━━━━━━━━<br/>idle_output_timeout: int = 600<br/>loaded from defaults.yaml"]
    end

    subgraph SchemaValidation ["● Schema, Parse & Validation"]
        direction TB
        PARSE["● io.py _parse_step()<br/>━━━━━━━━━━<br/>data.get('idle_output_timeout')<br/>raw int | None from YAML"]
        RECIPE_STEP["● schema.py RecipeStep<br/>━━━━━━━━━━<br/>idle_output_timeout: int | None<br/>None = use global; 0 = disabled"]
        VALIDATOR["● validator.py<br/>━━━━━━━━━━<br/>non-negative int required when set<br/>0 allowed (explicit disable)"]
        RULES["● rules_tools.py<br/>━━━━━━━━━━<br/>idle_output_timeout whitelisted<br/>in per-step field set (lint-time)"]
    end

    subgraph ExecPipeline ["● Execution Pipeline"]
        direction TB
        PROTOCOL["● _type_protocols.py<br/>━━━━━━━━━━<br/>HeadlessExecutor.run() protocol<br/>idle_output_timeout: float | None"]
        RUN_SKILL["● tools_execution.run_skill()<br/>━━━━━━━━━━<br/>int | None param<br/>→ float | None cast before executor"]
        HEADLESS["● HeadlessExecutor.run()<br/>━━━━━━━━━━<br/>None → fallback to cfg.idle_output_timeout<br/>0.0 → None (disabled)"]
        CORE["● run_headless_core()<br/>━━━━━━━━━━<br/>effective_idle: float | None<br/>passes to subprocess runner"]
    end

    subgraph Watchdog ["Process Watchdog (Consumption)"]
        direction TB
        MANAGED["run_managed_async()<br/>━━━━━━━━━━<br/>spawns _watch_stdout_idle task<br/>only when value is positive float"]
        WATCH["_watch_stdout_idle()<br/>━━━━━━━━━━<br/>polls stdout file size every 5s<br/>fires if size stalls ≥ timeout"]
        KILL["async_kill_process_tree()<br/>━━━━━━━━━━<br/>IDLE_STALL → kill headless Claude<br/>acc.idle_stall = True"]
    end

    %% PRIMARY DATA FLOWS %%
    DEFAULTS -->|"dynaconf load"| SETTINGS
    RESEARCH -->|"YAML parse"| PARSE
    PARSE -->|"RecipeStep(idle_output_timeout=...)"| RECIPE_STEP
    RECIPE_STEP -->|"validate at load time"| VALIDATOR
    RECIPE_STEP -->|"step.idle_output_timeout"| RUN_SKILL
    SETTINGS -->|"fallback when None"| HEADLESS
    RUN_SKILL -->|"float | None"| HEADLESS
    HEADLESS -->|"effective_idle: float | None"| CORE
    CORE -->|"runner(..., idle_output_timeout=...)"| MANAGED
    MANAGED -->|"tg.start_soon()"| WATCH
    WATCH -->|"trigger.set() on expiry"| KILL

    %% SECONDARY / STRUCTURAL FLOWS %%
    RECIPE_STEP -.->|"lint-time field whitelist"| RULES
    PROTOCOL -.->|"declares interface"| HEADLESS

    %% CLASS ASSIGNMENTS %%
    class DEFAULTS,RESEARCH cli;
    class SETTINGS,RECIPE_STEP,PROTOCOL stateNode;
    class PARSE,RUN_SKILL,MANAGED handler;
    class HEADLESS,CORE phase;
    class VALIDATOR,RULES,WATCH detector;
    class KILL integration;
Loading

Operational Diagram

%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% ── CONFIGURATION SOURCES ── %%
    subgraph Config ["CONFIGURATION (read-only inputs)"]
        direction TB

        GLOBAL_CFG["RunSkillConfig<br/>━━━━━━━━━━<br/>idle_output_timeout = 600<br/>(global default, seconds)<br/>config/settings.py"]

        RESEARCH_YAML["● research.yaml<br/>━━━━━━━━━━<br/>5 compute-heavy steps:<br/>implement_phase<br/>run_experiment<br/>re_run_experiment<br/>adjust_experiment<br/>troubleshoot_implement_failure<br/>→ idle_output_timeout: 0"]
    end

    %% ── RECIPE PIPELINE ── %%
    subgraph RecipePipeline ["● RECIPE PIPELINE (parse → validate → schema)"]
        direction TB

        PARSE["● recipe/io.py<br/>━━━━━━━━━━<br/>_parse_step()<br/>data.get('idle_output_timeout')<br/>→ int | None"]

        SCHEMA["● recipe/schema.py<br/>━━━━━━━━━━<br/>RecipeStep.idle_output_timeout<br/>type: int | None<br/>default: None<br/>None = use global cfg<br/>0 = disabled for this step"]

        VALIDATE["● recipe/validator.py<br/>━━━━━━━━━━<br/>must be non-negative int or None<br/>negative values rejected<br/>0 explicitly permitted"]

        RULES["● recipe/rules_tools.py<br/>━━━━━━━━━━<br/>registered as known run_skill param<br/>prevents dead-with-param false positive"]
    end

    %% ── MCP SERVER ── %%
    subgraph MCPServer ["● MCP SERVER — run_skill tool"]
        direction TB

        RUN_SKILL["● tools_execution.py<br/>━━━━━━━━━━<br/>run_skill(idle_output_timeout: int | None)<br/>None → pass None (executor falls back to global)<br/>int → float(value) passed to executor<br/>0 → float(0.0) → executor disables timeout"]
    end

    %% ── PROTOCOL BOUNDARY ── %%
    subgraph Protocol ["● PROTOCOL BOUNDARY"]
        direction TB

        PROTOCOL["● _type_protocols.py<br/>━━━━━━━━━━<br/>HeadlessExecutor.run(<br/>  idle_output_timeout: float | None<br/>)<br/>None = use global cfg"]
    end

    %% ── EXECUTION LAYER ── %%
    subgraph Execution ["● EXECUTION LAYER"]
        direction TB

        EXECUTOR["● execution/headless.py<br/>━━━━━━━━━━<br/>DefaultHeadlessExecutor.run()<br/>Resolution:<br/>  None → float(cfg.idle_output_timeout)<br/>  0.0 → effective_idle = None (disabled)<br/>  >0.0 → effective_idle = value"]

        CORE["● run_headless_core()<br/>━━━━━━━━━━<br/>Receives pre-resolved float | None<br/>Passes effective_idle to runner<br/>None = no idle kill signal"]
    end

    %% ── SUBPROCESS ── %%
    subgraph Subprocess ["SUBPROCESS RUNNER"]
        direction TB

        RUNNER["Process Runner<br/>━━━━━━━━━━<br/>idle_output_timeout=effective_idle<br/>None → timeout disabled<br/>600.0 → kill after 600s idle<br/>custom float → step override active"]
    end

    %% ── FLOWS ── %%
    RESEARCH_YAML -->|"idle_output_timeout: 0 (YAML field)"| PARSE
    PARSE --> SCHEMA
    SCHEMA --> VALIDATE
    SCHEMA --> RULES
    VALIDATE -->|"valid RecipeStep.idle_output_timeout"| RUN_SKILL

    GLOBAL_CFG -->|"fallback default (600s)"| EXECUTOR

    RUN_SKILL -->|"int|None → float|None"| PROTOCOL
    PROTOCOL --> EXECUTOR
    EXECUTOR -->|"resolved float|None"| CORE
    CORE --> RUNNER

    %% CLASS ASSIGNMENTS %%
    class GLOBAL_CFG phase;
    class RESEARCH_YAML cli;
    class PARSE,SCHEMA handler;
    class VALIDATE,RULES detector;
    class RUN_SKILL handler;
    class PROTOCOL phase;
    class EXECUTOR handler;
    class CORE handler;
    class RUNNER output;
Loading

Closes #785

Implementation Plan

Plan file: /home/talon/projects/autoskillit-runs/impl-785-20260412-214909-105758/.autoskillit/temp/make-plan/idle_output_timeout_per_step_plan_2026-04-12_215200.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

Step uncached output cache_read cache_write count time
plan 183 20.7k 809.0k 51.9k 1 10m 58s
verify 216 15.6k 1.6M 70.9k 1 4m 23s
implement 688 33.9k 5.4M 85.4k 1 11m 9s
prepare_pr 70 6.0k 224.2k 25.6k 1 1m 44s
run_arch_lenses 182 22.0k 483.6k 133.0k 3 10m 24s
compose_pr 67 7.6k 232.0k 28.9k 1 2m 10s
Total 1.4k 105.8k 8.7M 395.8k 40m 52s

Trecek added 8 commits April 12, 2026 22:08
…parser

Add idle_output_timeout: int | None = None to RecipeStep dataclass,
register it in _PARSE_STEP_HANDLED_FIELDS, and thread it through
_parse_step() via data.get(). 0 means disabled for this step; None
means fall back to global config.
…ist entry

Validate that idle_output_timeout is a non-negative integer when set
(0 is valid and means disabled). Add idle_output_timeout to the
run_skill tool parameter whitelist so the recipe orchestrator can
pass it as a kwarg.
Extend HeadlessExecutor.run() protocol signature with optional
idle_output_timeout: float | None = None parameter, matching the
same pattern as stale_threshold.
Add idle_output_timeout: int | None = None parameter to run_skill(),
convert to float and pass to executor.run(). The None passthrough is
intentional; 0→None (disabled) conversion happens in the executor layer.
… layers

Add idle_output_timeout: float | None = None to run_headless_core() and
DefaultHeadlessExecutor.run(). Both layers apply the same 0→None
resolution (raw=0 means disabled, maps to None for run_managed_async).
When not provided, falls back to float(cfg.idle_output_timeout).
…h steps

Add idle_output_timeout: 0 (disabled) to implement_phase,
troubleshoot_implement_failure, run_experiment, adjust_experiment,
and re_run_experiment. These steps use skills that may produce no
stdout for long periods (LLM thinking, compilation, experiments).
T1: RecipeStep parses idle_output_timeout from YAML (120, None, 0)
T2: Validator rejects negative values, accepts 0 and positive
T3: HeadlessExecutor protocol includes idle_output_timeout param
T4/T5: run_skill passes idle_output_timeout to executor (120→120.0, None→None)
T6: run_headless_core applies 0→None conversion and cfg fallback
Also update MockExecutor signatures in existing server tests.
Reformatted rules_tools.py, tools_execution.py, and test_context.py
to comply with ruff line-length and style rules.
Copy link
Copy Markdown
Collaborator Author

@Trecek Trecek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AutoSkillit PR Review — Verdict: changes_requested

Found 14 actionable findings (10 critical, 4 warning). See inline comments.

tool_ctx.runner.push(self._success_payload(marker))
await run_headless_core(
"/investigate foo", cwd="/tmp", ctx=tool_ctx, idle_output_timeout=None
)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[critical] tests: Same fragile call_args_list 4-tuple unpacking issue as L3599.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. Same as L3599 — MockSubprocessRunner stores explicit 4-tuples, documented in conftest.py:49. Unpacking is correct.

) -> None:
"""idle_output_timeout=None falls back to float(cfg.idle_output_timeout)."""
from autoskillit.execution.headless import run_headless_core

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] tests: Tautological assertion: cfg_value is read from the live config then asserted equal to the result. If the implementation simply reads the same config field, this always passes regardless of correctness. Assert against a known constant (e.g. == 600.0) to make the test meaningful.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — fixed in e8e772a. Replaced cfg_value = float(tool_ctx.config.run_skill.idle_output_timeout) reference with a direct assertion against 600.0 (the known config default per defaults.yaml:47 and settings.py:96). Test now catches implementations that ignore the config entirely.

_raw_idle = (
idle_output_timeout
if idle_output_timeout is not None
else float(cfg.idle_output_timeout)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] defense: float(cfg.idle_output_timeout) is called unconditionally with no error context. If dynaconf surfaces this as a string via an env-var override, the resulting ValueError is uncontextualised. Validate the type in the config dataclass or wrap with a descriptive try/except.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. settings.py:402 performs idle_output_timeout=int(val(...)) at config load time, coercing env-var overrides before they reach headless.py. By the time float(cfg.idle_output_timeout) is called at headless.py:995, the value is a validated int. An env-var string like 'abc' would raise ValueError at config load (settings.py), not here.

expected_output_patterns=expected_output_patterns,
write_behavior=write_spec,
stale_threshold=float(stale_threshold) if stale_threshold is not None else None,
idle_output_timeout=float(idle_output_timeout)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[warning] defense: float(idle_output_timeout) is called without guarding against non-numeric input. FastMCP deserialises from JSON; an unexpected type raises a confusing TypeError instead of a clean validation error. Add an isinstance guard or try/except with a descriptive message.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigated — this is intentional. Parameter typed as int | None at tools_execution.py:154. FastMCP validates JSON schema types before invoking the function body; non-numeric inputs are rejected at the MCP transport layer with a schema error, never reaching the float() call.

Trecek and others added 2 commits April 12, 2026 23:08
…ltHeadlessExecutor.run()

DefaultHeadlessExecutor.run() was pre-resolving idle_output_timeout (0.0→None) before
passing it to run_headless_core(). run_headless_core() then treated None as "not supplied"
and fell back to float(cfg.idle_output_timeout), re-enabling the very timeout the step
intended to disable. Fix: pass raw idle_output_timeout directly to run_headless_core(),
which is the single authoritative resolution layer.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…put_timeout fallback test

The previous assertion read float(cfg.idle_output_timeout) from the same config
the implementation reads, making the test tautological — it would pass even if the
implementation used an arbitrary config field. Assert the known default 600.0 directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Trecek Trecek added this pull request to the merge queue Apr 13, 2026
Merged via the queue into integration with commit 974920e Apr 13, 2026
2 checks passed
@Trecek Trecek deleted the add-idle-output-timeout-as-per-step-recipe-override/785 branch April 13, 2026 06:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant