Implementation Plan: Generalize generate-report Template for Non-Software Research #796
Merged
Trecek merged 5 commits into integration on Apr 13, 2026
Trecek (Collaborator, Author) commented on Apr 13, 2026:
AutoSkillit PR Review — Verdict: changes_requested (3 actionable, 3 require human decision)
Trecek (Collaborator, Author) commented on Apr 13, 2026:
AutoSkillit review found 6 findings (3 actionable fixes + 3 requiring human decision). See inline comments. Verdict: changes_requested.
…ests
Add four new contract tests to test_generate_report_contracts.py:
- test_no_rust_specific_package_manager
- test_domain_adaptive_ordering_guidance
- test_data_availability_section_supported
- test_recommendations_or_discussion_framing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
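A minimal sketch of what static contract checks of this kind might look like, assuming they treat SKILL.md as plain text. The sample text, the `check_*` helper names, and the exact strings being matched are illustrative assumptions, not the contents of the real test file.

```python
# Hypothetical stand-in for the text of SKILL.md. The real tests would read
# the file from disk; its location is not shown in this PR.
SAMPLE_SKILL_TEXT = """
## Environment
Capture dependency versions (e.g. pip freeze, conda list, lock files).

### Domain-Adaptive Section Ordering
Engineering (default), biology/biomedical, economics/social science.

## Recommendations
For non-engineering domains this section may be titled
"Discussion and Future Directions".

## Data Availability
Optional: describe where the underlying data can be obtained.
"""


def check_no_rust_specific_package_manager(text: str) -> bool:
    # The Rust-only example should no longer appear anywhere in the template.
    return "cargo tree" not in text.lower()


def check_domain_adaptive_ordering_guidance(text: str) -> bool:
    # Targets the specific guidance phrase rather than the bare word "domain".
    return "domain-adaptive" in text.lower()


def check_data_availability_section_supported(text: str) -> bool:
    return "data availability" in text.lower()


def check_recommendations_or_discussion_framing(text: str) -> bool:
    return "discussion and future directions" in text.lower()


ALL_CHECKS = [
    check_no_rust_specific_package_manager,
    check_domain_adaptive_ordering_guidance,
    check_data_availability_section_supported,
    check_recommendations_or_discussion_framing,
]

# Map each check name to its pass/fail result for the sample text.
results = {check.__name__: check(SAMPLE_SKILL_TEXT) for check in ALL_CHECKS}
```

Because each check is a pure function of the text, a regression in the template would show up as a single `False` entry in `results` without importing any application module.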
…domains
Four targeted changes to SKILL.md make the report template domain-adaptive without breaking existing software engineering reports:
1. Remove Rust-specific `cargo tree` example from Environment section; use language-agnostic examples (pip freeze, conda list, lock files).
2. Add Domain-Adaptive Section Ordering guidance in Step 3 covering engineering (default), biology/biomedical (IMRaD), and economics/social science ordering conventions.
3. Add optional Data Availability section between Recommendations and Appendix for biology/medical/social science journal requirements.
4. Update Recommendations heading to allow "Discussion and Future Directions" as an alternative framing for non-engineering domains.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…in-adaptive' Reviewer flagged that `assert "domain" in lower` provides no signal since the word appears throughout any technical documentation. Replace with `assert "domain-adaptive" in lower` which targets the specific guidance phrase added in this branch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
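The reviewer's point can be illustrated with a small sketch. The two text snippets and the helper names below are hypothetical and exist only to show why the broader substring check carries no signal.

```python
# Hypothetical documentation snippets, used only for illustration.
unrelated_doc = "This tool resolves the domain name before opening a socket."
generalized_doc = "Step 3: Domain-Adaptive Section Ordering for each research field."


def weak_check(text: str) -> bool:
    # Passes for almost any technical document that mentions "domain".
    return "domain" in text.lower()


def targeted_check(text: str) -> bool:
    # Passes only when the specific guidance phrase is present.
    return "domain-adaptive" in text.lower()


weak_on_unrelated = weak_check(unrelated_doc)
targeted_on_unrelated = targeted_check(unrelated_doc)
targeted_on_generalized = targeted_check(generalized_doc)
```

The weak check accepts the unrelated text, so it could not detect removal of the ordering guidance; the targeted check rejects it while still accepting text that contains the phrase.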
…diction, and mandatory sections
Three related corrections to the Domain-Adaptive Section Ordering in SKILL.md:
1. Biology ordering: remove non-standard 'Analysis' step (standard IMRaD is Introduction→Methods→Results→Discussion); update label from 'Follow IMRaD' to 'Adapted from IMRaD' and add 'What We Learned' to the ordering.
2. Executive Summary rule: add academic-journal exception: biology/medical/social science submissions may replace Executive Summary with a structured Abstract.
3. Economics/social science ordering: add 'What We Learned' to resolve the contradiction with the mandatory-sections rule.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Reviewer noted that italic markup inside an H2 heading renders unexpectedly in some Markdown processors. Separate the alternative-framing note into its own paragraph below the heading. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
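One way to guard against this kind of heading markup is a small lint over the Markdown text. The sketch below is an assumption about how such a check could be written, and the two heading samples are hypothetical wordings rather than the actual SKILL.md text. It only looks for asterisk-style emphasis, since underscores would produce false positives on snake_case identifiers.

```python
import re

# Matches an H2 heading line ("## ...") that contains *...* emphasis.
# "[ \t]+" keeps the match on one line; "###" headings are not matched.
H2_WITH_EMPHASIS = re.compile(r"^##[ \t]+.*\*[^*\n]+\*", re.MULTILINE)


def headings_with_emphasis(markdown_text: str) -> list:
    """Return every H2 heading that carries asterisk emphasis markup."""
    return [m.group(0) for m in H2_WITH_EMPHASIS.finditer(markdown_text)]


# Hypothetical "before" form: the alternative framing lives inside the heading.
before = "## Recommendations *(or Discussion and Future Directions)*\n"

# Hypothetical "after" form: the note moves to its own paragraph.
after = (
    "## Recommendations\n"
    "\n"
    "*May be titled Discussion and Future Directions in non-engineering domains.*\n"
)

flagged_before = headings_with_emphasis(before)
flagged_after = headings_with_emphasis(after)
```

The first sample is flagged and the second is not, which mirrors the change described in the commit message.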
3e30c0c to dca18f8
Summary
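The generate-report skill's SKILL.md contained software-engineering defaults that reduced credibility and usefulness for non-software research (biology, economics, social science).
Four targeted changes make the template domain-adaptive without breaking existing software
research reports:
1. Remove the Rust-specific `cargo tree` example from the Environment section.
2. Add domain-adaptive section ordering guidance (software order as default).
3. Add an optional Data Availability section (required by biology, medical, and social science journals).
4. Allow "Discussion and Future Directions" as an alternative to the "Recommendations" section in non-engineering domains.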
All changes are strictly additive loosening of the template. The existing default ordering
and section names remain valid; agents gain permission to adapt them.
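The "default remains valid, agents may adapt" idea can be sketched as a lookup with a fallback. The engineering order below is taken from the process flow diagram in this PR; the biology override, the `DOMAIN_ORDERS` table, and both helper functions are hypothetical illustrations, and the exact position of "What We Learned" in the biology ordering is an assumption.

```python
from typing import Dict, List

# Default ordering, as listed for Engineering / Computational in the
# process flow diagram of this PR.
ENGINEERING_ORDER: List[str] = [
    "Executive Summary",
    "Background",
    "Methodology",
    "Results",
    "Observations",
    "Analysis",
    "What We Learned",
    "Conclusions",
    "Recommendations",
]

# Hypothetical override table; section positions are illustrative only.
DOMAIN_ORDERS: Dict[str, List[str]] = {
    "biology": [
        "Executive Summary",
        "Background",
        "Methodology",
        "Results",
        "What We Learned",
        "Discussion and Future Directions",
    ],
}

# Sections the PR describes as mandatory regardless of domain (subset).
MANDATORY_SECTIONS = ("What We Learned",)


def select_section_order(domain: str) -> List[str]:
    """Return the domain-specific order, falling back to the default."""
    return DOMAIN_ORDERS.get(domain.strip().lower(), ENGINEERING_ORDER)


def missing_mandatory(order: List[str]) -> List[str]:
    """List mandatory sections that an ordering fails to include."""
    return [s for s in MANDATORY_SECTIONS if s not in order]
```

An unknown domain falls through to the engineering order, which is the sense in which the change loosens the template without invalidating existing reports.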
Architecture Impact
Scenarios (Validation) Diagram
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 55, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    subgraph S1 ["SCENARIO 1: Conclusive Research Report (Primary Happy Path)"]
        direction LR
        S1_RECIPE["Recipe Orchestrator<br/>━━━━━━━━━━<br/>research.yaml<br/>generate_report step"]
        S1_RUNSKILL["run_skill MCP Tool<br/>━━━━━━━━━━<br/>tools_execution.py<br/>resolve_target_skill → headless session"]
        S1_SKILL["● SKILL.md<br/>━━━━━━━━━━<br/>generate-report<br/>Domain-generalized template<br/>instructs headless agent"]
        S1_ARTIFACTS["Experiment Artifacts<br/>━━━━━━━━━━<br/>results_path · *_metrics.json<br/>visualization-plan.md<br/>AUTOSKILLIT_TEMP/"]
        S1_VIZ["Visualization Pipeline<br/>━━━━━━━━━━<br/>.plot-venv<br/>matplotlib / seaborn / plotly<br/>per-figure scripts"]
        S1_REPORT["report.md<br/>━━━━━━━━━━<br/>research/YYYY-MM-DD/{slug}/<br/>committed to worktree"]
        S1_CAPTURE["context.report_path<br/>━━━━━━━━━━<br/>capture: report_path token<br/>on_success → test step"]
    end

    subgraph S2 ["SCENARIO 2: Domain-Adaptive Non-Software Report (Biology / Social Science)"]
        direction LR
        S2_RECIPE["Recipe Orchestrator<br/>━━━━━━━━━━<br/>research.yaml<br/>generate_report step<br/>(non-software domain)"]
        S2_SKILL["● SKILL.md<br/>━━━━━━━━━━<br/>Domain-adaptive ordering<br/>Language-agnostic env capture<br/>No Rust/cargo assumptions"]
        S2_ADAPT["Domain Adapter<br/>━━━━━━━━━━<br/>Biology → IMRaD order<br/>Economics / Social Science<br/>→ alternate section order"]
        S2_SECTIONS["Domain-Specific Sections<br/>━━━━━━━━━━<br/>Data Availability (optional)<br/>Discussion & Future Dirs<br/>Data Scope Statement (mandatory)"]
        S2_REPORT["report.md<br/>━━━━━━━━━━<br/>Domain-appropriate format<br/>committed to worktree"]
    end

    subgraph S3 ["SCENARIO 3: Inconclusive Experiment Report"]
        direction LR
        S3_ENSURE["ensure_results<br/>━━━━━━━━━━<br/>Retry exhaustion detected<br/>No usable experiment output"]
        S3_RUNSKILL["run_skill MCP Tool<br/>━━━━━━━━━━<br/>generate_report_inconclusive step<br/>--inconclusive flag"]
        S3_SKILL["● SKILL.md<br/>━━━━━━━━━━<br/>Inconclusive path:<br/>No failure framing<br/>Emphasize what was learned"]
        S3_REPORT["report.md<br/>━━━━━━━━━━<br/>What We Learned framing<br/>Figure specs under details<br/>committed to worktree"]
        S3_CAPTURE["context.report_path<br/>━━━━━━━━━━<br/>capture: report_path token<br/>on_success → test step"]
    end

    subgraph S4 ["SCENARIO 4: Contract Validation (CI Static Analysis)"]
        direction LR
        S4_TESTS["● test_generate_report<br/>_contracts.py<br/>━━━━━━━━━━<br/>tests/contracts/<br/>9 static content assertions"]
        S4_SKILL["● SKILL.md<br/>━━━━━━━━━━<br/>generate-report<br/>read as plain text"]
        S4_CHECKS["Domain Generalization Guards<br/>━━━━━━━━━━<br/>Data Scope Statement present<br/>Metrics Provenance present<br/>Gate Enforcement (no substitution)<br/>No Rust/cargo tooling<br/>biology + domain-adaptive ordering<br/>Data Availability section<br/>Discussion & Future Dirs"]
        S4_RESULT["task test-check<br/>━━━━━━━━━━<br/>PASS: all domain guards satisfied<br/>FAIL: regression in SKILL.md"]
    end

    %% SCENARIO 1 FLOW %%
    S1_RECIPE -->|"tool: run_skill"| S1_RUNSKILL
    S1_RUNSKILL -->|"loads instructions"| S1_SKILL
    S1_RUNSKILL -->|"reads"| S1_ARTIFACTS
    S1_SKILL -->|"instructs agent to synthesize"| S1_VIZ
    S1_ARTIFACTS -->|"feeds"| S1_VIZ
    S1_VIZ -->|"writes & commits"| S1_REPORT
    S1_REPORT -->|"emits report_path token"| S1_CAPTURE

    %% SCENARIO 2 FLOW %%
    S2_RECIPE -->|"tool: run_skill"| S2_SKILL
    S2_SKILL -->|"selects ordering"| S2_ADAPT
    S2_ADAPT -->|"structures"| S2_SECTIONS
    S2_SECTIONS -->|"writes & commits"| S2_REPORT

    %% SCENARIO 3 FLOW %%
    S3_ENSURE -->|"on_failure path"| S3_RUNSKILL
    S3_RUNSKILL -->|"loads instructions"| S3_SKILL
    S3_SKILL -->|"instructs inconclusive framing"| S3_REPORT
    S3_REPORT -->|"emits report_path token"| S3_CAPTURE

    %% SCENARIO 4 FLOW %%
    S4_TESTS -->|"reads SKILL.md text"| S4_SKILL
    S4_SKILL -->|"content asserted against"| S4_CHECKS
    S4_CHECKS -->|"produces"| S4_RESULT

    %% CLASS ASSIGNMENTS %%
    class S1_RECIPE,S2_RECIPE,S3_ENSURE cli;
    class S1_RUNSKILL,S3_RUNSKILL,S1_VIZ,S2_ADAPT,S2_SECTIONS handler;
    class S1_SKILL,S2_SKILL,S3_SKILL,S4_SKILL phase;
    class S1_ARTIFACTS,S1_CAPTURE,S3_CAPTURE stateNode;
    class S1_REPORT,S2_REPORT,S3_REPORT,S4_RESULT output;
    class S4_TESTS detector;
    class S4_CHECKS integration;
Module Dependency (Structural) Diagram
%%{init: {'flowchart': {'nodeSpacing': 45, 'rankSpacing': 65, 'curve': 'basis'}}}%%
graph TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph L3 ["LAYER 3 — APPLICATION"]
        direction LR
        SERVER["server/<br/>━━━━━━━━━━<br/>tools_execution.py<br/>run_skill handler<br/>tools_kitchen.py"]
        CLI["cli/<br/>━━━━━━━━━━<br/>app.py · _cook.py<br/>_prompts.py"]
    end

    subgraph L2 ["LAYER 2 — BUSINESS LOGIC"]
        direction LR
        RECIPE["recipe/<br/>━━━━━━━━━━<br/>rules_skill_content.py<br/>rules_skills.py · validator.py<br/>contracts.py"]
        MIGRATION["migration/<br/>━━━━━━━━━━<br/>engine.py · _api.py<br/>loader.py"]
    end

    subgraph L1 ["LAYER 1 — INFRASTRUCTURE"]
        direction LR
        CONFIG["config/<br/>━━━━━━━━━━<br/>settings.py<br/>defaults.yaml<br/>ingredient_defaults.py"]
        PIPELINE["pipeline/<br/>━━━━━━━━━━<br/>context.py · gate.py<br/>audit.py · timings.py<br/>tokens.py"]
        EXECUTION["execution/<br/>━━━━━━━━━━<br/>headless.py · process.py<br/>ci.py · session.py<br/>quota.py"]
        WORKSPACE["workspace/<br/>━━━━━━━━━━<br/>skills.py (SkillResolver)<br/>session_skills.py<br/>worktree.py · clone.py"]
    end

    subgraph L0 ["LAYER 0 — FOUNDATION (fan-in: 102 files)"]
        CORE["core/<br/>━━━━━━━━━━<br/>types · io · paths · logging<br/>constants · kitchen_state<br/>claude_conventions"]
    end

    subgraph SKILL_LAYER ["SKILL ARTIFACTS (not Python modules)"]
        direction LR
        SKILL["● generate-report/<br/>━━━━━━━━━━<br/>SKILL.md<br/>domain-adaptive template<br/>biology · econ · engineering"]
        OTHER_SKILLS["skills/ · skills_extended/<br/>━━━━━━━━━━<br/>Tier 1/2/3 skill definitions<br/>open-kitchen · sous-chef<br/>arch-lens-* · exp-lens-*"]
    end

    subgraph HOOKS_LAYER ["HOOKS (standalone scripts)"]
        HOOKS["hooks/<br/>━━━━━━━━━━<br/>quota_guard · branch_protection<br/>pretty_output · skill_cmd_guard"]
        HOOK_REG["hook_registry.py<br/>━━━━━━━━━━<br/>HookDef · HOOK_REGISTRY<br/>generate_hooks_json"]
    end

    subgraph TEST_LAYER ["TEST LAYER"]
        direction LR
        CONTRACT_TEST["● test_generate_report<br/>_contracts.py<br/>━━━━━━━━━━<br/>tests/contracts/<br/>8 assertions (3 new)<br/>pure filesystem read"]
        ARCH_TESTS["tests/arch/<br/>━━━━━━━━━━<br/>test_layer_enforcement.py<br/>test_subpackage_isolation.py<br/>AST rule enforcement"]
    end

    %% ── L3 → L2 (valid downward) ──
    SERVER -->|"imports"| RECIPE
    SERVER -->|"imports"| MIGRATION
    SERVER -->|"imports"| WORKSPACE
    CLI -->|"imports"| RECIPE
    CLI -->|"imports"| MIGRATION
    CLI -->|"imports"| WORKSPACE

    %% ── L3 lateral (same level, deferred) ──
    CLI -.->|"deferred L3↔L3"| SERVER

    %% ── L3 → L1 (valid downward) ──
    SERVER -->|"imports"| CONFIG
    SERVER -->|"imports"| PIPELINE
    SERVER -->|"imports"| EXECUTION
    CLI -->|"imports"| CONFIG
    CLI -->|"imports"| EXECUTION

    %% ── L3 → Hooks (L3→scripts boundary) ──
    SERVER -.->|"narrow const import<br/>(boundary note)"| HOOKS

    %% ── L2 lateral (same level, deferred) ──
    RECIPE -.->|"deferred noqa (L2↔L2)<br/>7 call-site imports"| WORKSPACE
    MIGRATION -.->|"deferred (L2↔L2)<br/>6 method-body imports"| RECIPE

    %% ── L1 → L0 (valid downward) ──
    CONFIG -->|"imports"| CORE
    PIPELINE -->|"imports"| CORE
    EXECUTION -->|"imports"| CORE
    WORKSPACE -->|"imports"| CORE

    %% ── L1 TYPE_CHECKING-only upward refs (not runtime) ──
    EXECUTION -.->|"TYPE_CHECKING only<br/>(not runtime)"| CONFIG
    EXECUTION -.->|"TYPE_CHECKING only<br/>(not runtime)"| PIPELINE
    WORKSPACE -.->|"TYPE_CHECKING only<br/>(not runtime)"| CONFIG

    %% ── Hooks → Core ──
    HOOKS -->|"via hook_registry"| HOOK_REG
    HOOK_REG -->|"imports"| CORE

    %% ── Skill Resolution Chain ──
    WORKSPACE ==>|"discovers & loads"| SKILL
    WORKSPACE -->|"discovers & loads"| OTHER_SKILLS
    RECIPE ==>|"validates skill content"| SKILL
    RECIPE -->|"validates skill refs"| OTHER_SKILLS
    SERVER ==>|"invokes via run_skill"| SKILL

    %% ── Contract Test ──
    CONTRACT_TEST ==>|"reads SKILL.md<br/>(filesystem only,<br/>no module import)"| SKILL
    ARCH_TESTS -->|"AST-checks layer rules"| L1
    ARCH_TESTS -->|"AST-checks layer rules"| L0

    %% CLASS ASSIGNMENTS %%
    class SERVER,CLI cli;
    class RECIPE,MIGRATION phase;
    class CONFIG,PIPELINE,EXECUTION,WORKSPACE handler;
    class CORE stateNode;
    class SKILL,OTHER_SKILLS output;
    class HOOKS,HOOK_REG integration;
    class CONTRACT_TEST,ARCH_TESTS detector;
Process Flow (Physiological) Diagram
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([START])
    COMPLETE([COMPLETE])
    ERROR([ERROR])

    subgraph Args ["Argument Parsing"]
        direction TB
        A1["worktree_path<br/>━━━━━━━━━━<br/>Abs path to worktree"]
        A2["results_path<br/>━━━━━━━━━━<br/>Abs path to experiment results"]
        A3{"--inconclusive?"}
        A4{"output-mode = local<br/>AND --issue-url?"}
    end

    subgraph Gather ["Step 1 — Gather Artifacts"]
        direction TB
        G1["Read experiment-plan.md<br/>━━━━━━━━━━<br/>AUTOSKILLIT_TEMP/experiment-plan.md"]
        G2["Read scope report<br/>━━━━━━━━━━<br/>AUTOSKILLIT_TEMP/scope/ (if present)"]
        G3["Read results file<br/>━━━━━━━━━━<br/>results_path arg"]
        G4["Read raw run data<br/>━━━━━━━━━━<br/>AUTOSKILLIT_TEMP/run-experiment/"]
        G5{"*_metrics.json<br/>files present?"}
        G6["Read metrics JSON files<br/>━━━━━━━━━━<br/>accuracy_metrics.json, etc."]
        G7["Scan experiment code<br/>━━━━━━━━━━<br/>scripts, fixtures, tools in worktree"]
    end

    subgraph IssueRef ["Step 1.5 — Issue Reference Header"]
        direction TB
        IR1["Parse issue number<br/>━━━━━━━━━━<br/>Last numeric segment of URL"]
        IR2["Prepend blockquote ref<br/>━━━━━━━━━━<br/>> This research addresses Issue #N"]
    end

    subgraph ReportType ["Step 2 — Determine Report Type"]
        direction TB
        RT1{"--inconclusive flag<br/>or status = INCONCLUSIVE?"}
        RT2["Conclusive Report<br/>━━━━━━━━━━<br/>Full findings + recommendations"]
        RT3["Inconclusive Report<br/>━━━━━━━━━━<br/>Boundary conditions + what was learned<br/>Distinguish negative vs inconclusive"]
    end

    subgraph Viz ["Step 2.5 — Produce Visualizations"]
        direction TB
        V1{"visualization-plan.md<br/>exists?"}
        V2{"Zero figure specs<br/>in plan?"}
        V3["Create plot venv<br/>━━━━━━━━━━<br/>matplotlib + seaborn (+ plotly if needed)"]
        V4["For each figure-spec:<br/>━━━━━━━━━━<br/>Write fig{N}_{slug}.py script<br/>Run script → confirm output<br/>On failure: emit MISSING + continue"]
        V5["Commit scripts and images<br/>━━━━━━━━━━<br/>git add research/ && git commit"]
    end

    subgraph DomainOrder ["● Step 3 — Domain-Adaptive Section Ordering"]
        direction TB
        DO1{"Research domain?"}
        DO2["Engineering / Computational<br/>━━━━━━━━━━<br/>Exec Summary → Background →<br/>Methodology → Results →<br/>Observations → Analysis →<br/>What We Learned → Conclusions →<br/>Recommendations"]
        DO3["Biology / Biomedical<br/>━━━━━━━━━━<br/>IMRaD: Background → Methodology →<br/>Results → Analysis →<br/>Discussion and Future Directions<br/>(Nature-style: Methods to supplementary)"]
        DO4["Economics / Social Science<br/>━━━━━━━━━━<br/>Background → Methodology →<br/>Results → Analysis →<br/>Discussion and Future Directions"]
        DO5["Mandatory sections (all domains)<br/>━━━━━━━━━━<br/>Data Scope Statement<br/>Metrics Provenance Check<br/>Gate Enforcement<br/>What We Learned<br/>Exec Summary first / Appendices last"]
    end

    subgraph WriteReport ["Step 3 (cont.) — Write Report File"]
        direction TB
        WR1["Create research dir<br/>━━━━━━━━━━<br/>research/YYYY-MM-DD-{slug}/"]
        WR2["Validate metrics provenance<br/>━━━━━━━━━━<br/>Check timestamp + content relevance<br/>Disclose stale or irrelevant metrics"]
        WR3["Enforce hypothesis gates<br/>━━━━━━━━━━<br/>Use gate from experiment plan<br/>No threshold substitution"]
        WR4["● Write report.md<br/>━━━━━━━━━━<br/>incl. Data Availability section<br/>incl. Discussion or Recommendations<br/>(domain-adaptive framing)"]
    end

    subgraph Commit ["Step 4 — Commit and Emit"]
        direction TB
        C1["git add research/<br/>━━━━━━━━━━<br/>git commit: 'Add research report: {title}'"]
        C2["Emit report_path token<br/>━━━━━━━━━━<br/>report_path = {absolute path to report.md}"]
    end

    %% MAIN FLOW %%
    START --> A1
    START --> A2
    A1 & A2 --> A3
    A3 --> A4
    A4 -->|"yes"| IR1
    A4 -->|"no"| G1
    IR1 --> IR2 --> G1
    G1 --> G2 --> G3 --> G4 --> G5
    G5 -->|"yes"| G6
    G5 -->|"no"| G7
    G6 --> G7
    G7 --> RT1
    RT1 -->|"conclusive"| RT2
    RT1 -->|"inconclusive"| RT3
    RT2 & RT3 --> V1
    V1 -->|"no"| DO1
    V1 -->|"yes"| V2
    V2 -->|"empty plan"| DO1
    V2 -->|"has specs"| V3
    V3 --> V4 --> V5 --> DO1
    DO1 -->|"engineering default"| DO2
    DO1 -->|"biology/biomedical"| DO3
    DO1 -->|"economics/social science"| DO4
    DO2 & DO3 & DO4 --> DO5
    DO5 --> WR1 --> WR2 --> WR3 --> WR4
    WR4 --> C1 --> C2 --> COMPLETE
    WR2 -->|"unrecoverable artifact error"| ERROR

    %% CLASS ASSIGNMENTS %%
    class START,COMPLETE,ERROR terminal;
    class A1,A2 stateNode;
    class A3,A4,RT1,V1,V2,G5,DO1 stateNode;
    class G1,G2,G3,G4,G6,G7 handler;
    class IR1,IR2 handler;
    class RT2,RT3 handler;
    class V3,V4,V5 handler;
    class DO2,DO3,DO4 phase;
    class DO5 detector;
    class WR1,WR2,WR3 handler;
    class WR4 output;
    class C1,C2 output;
Closes #789
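Step 1.5 of the process flow diagram (parse the issue number as the last numeric segment of the URL, then prepend a blockquote reference) could be sketched as follows. The function names and the example URL are hypothetical; only the "last numeric segment" rule and the blockquote wording come from the diagram.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def parse_issue_number(issue_url: str) -> Optional[int]:
    """Return the last all-digit path segment of the URL, or None."""
    path = urlparse(issue_url).path
    numeric = [seg for seg in path.split("/") if re.fullmatch(r"[0-9]+", seg)]
    return int(numeric[-1]) if numeric else None


def issue_reference_header(issue_url: str) -> str:
    """Build the blockquote line to prepend, or an empty string if no number."""
    number = parse_issue_number(issue_url)
    if number is None:
        return ""
    return f"> This research addresses Issue #{number}"


# Hypothetical URL used only to exercise the helpers.
example_url = "https://github.com/example-org/example-repo/issues/789"
header = issue_reference_header(example_url)
```

Parsing the path through `urlparse` keeps query strings and fragments from being mistaken for part of the issue number.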
Implementation Plan
Plan file: /home/talon/projects/autoskillit-runs/impl-789-20260412-214910-831191/.autoskillit/temp/make-plan/generalize_generate_report_template_plan_2026-04-12_000000.md
🤖 Generated with Claude Code via AutoSkillit
Token Usage Summary