108 changes: 108 additions & 0 deletions .codex/skills/runnable-libcrypto-canonical-eval/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
---
name: runnable-libcrypto-canonical-eval
description: Run or audit the canonical libcrypto evaluation contract for SYM-20 using the repo-local 2026-04-28 ground-truth bundle, fresh runnable-cmp-eval compares, and explicit address mapping. Use when a task mentions SYM-20, canonical libcrypto eval, gtBlock.pb, the 2026-04-28 ground truth, or when historical CSV/coverage-sidecar results must be distinguished from the authoritative compare path.
---

# Runnable Libcrypto Canonical Eval

Use this skill when the task is specifically about the authoritative libcrypto benchmark contract, not just generic Runnable compare metrics.

## Canonical Contract

For `SYM-20`, the authoritative repo-local contract is:

1. Canonical binary:
   - `python3 scripts/libcrypto_bench_paths.py binary --must-exist`
2. Canonical ground truth protobuf:
   - `python3 scripts/libcrypto_bench_paths.py groundtruth-pb --must-exist`
3. Address mapping:
   - derive `.text` start from the ELF
   - quick check: `python3 scripts/libcrypto_bench_paths.py text-start`
   - keep Runnable rebase at `0x50000000` unless the user explicitly provides another contract
4. Compare path:
   - `python3 scripts/validate_libcrypto_ground_truth.py cmp --ll /abs/path/to/file.ll --out-dir /abs/path/to/out`

The wrapper records fresh `HIT`, `MISMATCH`, `OBJ_ONLY`, `LL_ONLY`, `FALSE_NEGATIVE`, `FALSE_POSITIVE`, `precision`, and `recall`.
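
As a reference for reading the `precision` and `recall` fields, here is a minimal sketch of the standard definitions. This skill does not state which counter the wrapper treats as the true-positive count, so the sketch takes it as a parameter; the function name and the sample numbers are hypothetical and are not SYM-20 results.

```python
from typing import Tuple


def precision_recall(true_positive: int, false_positive: int,
                     false_negative: int) -> Tuple[float, float]:
    """Return (precision, recall); an empty denominator yields 0.0."""
    predicted = true_positive + false_positive  # everything the lift reported
    actual = true_positive + false_negative     # everything the ground truth expects
    precision = true_positive / predicted if predicted else 0.0
    recall = true_positive / actual if actual else 0.0
    return precision, recall


# Hypothetical counters for illustration only.
precision, recall = precision_recall(true_positive=90, false_positive=10,
                                     false_negative=30)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Always report the values the wrapper itself emits; a helper like this is only useful for sanity-checking them against the raw counters.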

## Quick Start

Gap-audit the canonical GT bundle:

```bash
python3 scripts/validate_libcrypto_ground_truth.py gap-audit \
  --out-dir runs/groundtruth_validation/canonical_gap_audit
```

Compare a lift under the canonical contract:

```bash
python3 scripts/validate_libcrypto_ground_truth.py cmp \
  --ll /abs/path/to/libcrypto.ll \
  --out-dir runs/groundtruth_validation/canonical_cmp
```

If you already know the exact binary and want the lower-level wrapper:

```bash
python3 .codex/skills/runnable-cmp-eval/scripts/run_cmp_eval.py \
  --binary /abs/path/to/libcrypto.so.3 \
  --ll /abs/path/to/libcrypto.ll \
  --text-start 0x... \
  --runnable-base 0x50000000
```
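
When the lower-level wrapper is driven from a script, it can help to build the argument list in one place and reject relative paths early. The sketch below only assembles the flags shown above; the helper name `build_cmp_argv` is hypothetical, and it assumes the wrapper accepts `0x`-prefixed hex for both address flags, as in the command above.

```python
from pathlib import PurePosixPath
from typing import List

RUN_CMP_EVAL = ".codex/skills/runnable-cmp-eval/scripts/run_cmp_eval.py"


def build_cmp_argv(binary: str, ll: str, text_start: int,
                   runnable_base: int = 0x50000000) -> List[str]:
    """Assemble the run_cmp_eval.py command line without executing it."""
    for label, value in (("binary", binary), ("ll", ll)):
        # The contract reports absolute paths, so refuse relative ones up front.
        if not PurePosixPath(value).is_absolute():
            raise ValueError(f"{label} path must be absolute: {value}")
    return [
        "python3", RUN_CMP_EVAL,
        "--binary", binary,
        "--ll", ll,
        "--text-start", hex(text_start),
        "--runnable-base", hex(runnable_base),
    ]


# Placeholder paths; substitute the real binary and lift.
print(build_cmp_argv("/abs/path/to/libcrypto.so.3",
                     "/abs/path/to/libcrypto.ll", 0xCEF80))
```

The returned list can be passed to `subprocess.run(argv, check=True)` from the repository root.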

## Historical / Non-Canonical Paths

Do **not** report the following as the canonical `SYM-20` result without an explicit label:

- `coverage_sidecar_union_threadpool16`
- CSV-only GT flows such as `ground_truth.csv`
- `rebase_base=0x50400000`
- old libcrypto binaries whose SHA differs from the repo-local canonical `2026-04-28` bundle

Those paths are useful for forensics and side-by-side experiments, but they are not the authoritative compare contract.
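
To decide whether a binary belongs to the old-binary category, its digest can be compared with the canonical one. This is a minimal sketch that assumes the "SHA" in question is SHA-256 and that the canonical digest is supplied by the caller; this skill records neither the algorithm nor the value, and both helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries are not read in one go."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_canonical_binary(path: Path, canonical_sha256: str) -> bool:
    """True when the file digest equals the caller-supplied canonical digest."""
    return sha256_of(path) == canonical_sha256.strip().lower()
```

A matching digest only rules out the old-binary case; the ground-truth source and the rebase base still have to match the contract before a result is labeled canonical.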

When historical artifacts are involved, report them as:

- `historical sidecar-union / non-canonical`
- `old-binary compare / non-canonical`
- `canonical gtBlock.pb compare / authoritative`

## Required Reporting

Always include:

- absolute binary path
- absolute GT path
- absolute `.ll` path
- binary SHA when contract mismatch is possible
- `.text` start
- Runnable base
- `HIT`, `MISMATCH`, `OBJ_ONLY`, `LL_ONLY`
- `FALSE_NEGATIVE`, `FALSE_POSITIVE`
- `precision`, `recall`
- whether the result is `canonical` or `non-canonical`
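
The checklist above can be captured as a small record that refuses incomplete reports. The sketch below is illustrative: the class name, field names, and checks are hypothetical and are not part of the repo's scripts.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Optional


@dataclass(frozen=True)
class EvalReport:
    binary_path: str
    gt_path: str
    ll_path: str
    text_start: int
    runnable_base: int
    hit: int
    mismatch: int
    obj_only: int
    ll_only: int
    false_negative: int
    false_positive: int
    precision: float
    recall: float
    canonical: bool
    binary_sha: Optional[str] = None  # needed only when contract mismatch is possible

    def problems(self) -> List[str]:
        """List violations of the reporting checklist; empty means reportable."""
        issues = []
        for name in ("binary_path", "gt_path", "ll_path"):
            if not PurePosixPath(getattr(self, name)).is_absolute():
                issues.append(f"{name} must be absolute")
        for name in ("hit", "mismatch", "obj_only", "ll_only",
                     "false_negative", "false_positive"):
            if getattr(self, name) < 0:
                issues.append(f"{name} must be non-negative")
        for name in ("precision", "recall"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                issues.append(f"{name} must be within [0, 1]")
        return issues
```

Calling `problems()` before publishing catches relative paths and out-of-range metrics; it does not check that the counters are mutually consistent.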

## Current Repo Evidence

Use this note when you need the latest repo-local evidence and caveats:

- `docs/exp/2026-05-10-libcrypto-canonical-eval-contract.md`

Current repo-local comparable pair:

- baseline:
  - `runs/runnable-dev-2026-0429-textstart-serial/libcrypto.so.3.entry_0x500cef80.ll`
- optimized:
  - `runs/runnable-dev-2026-0501-textstart-dynamic-reapfix/libcrypto.so.3.entry_0x500cef80.dynamic.ll`

Both already compare against the canonical `2026-04-28` binary with:

- `.text_start=0xcef80`
- `runnable_base=0x50000000`

Prefer these over old `2026-04-14` baseline artifacts when the goal is to produce a same-contract baseline/optimized comparison for `SYM-20`.
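
The `.text` start and Runnable base recorded above are consistent with the `entry_0x500cef80` suffix in both artifact names if the rebase is a plain additive offset. That additive model is an assumption inferred from these numbers, not something this skill states, and the helper names below are hypothetical.

```python
RUNNABLE_BASE = 0x50000000  # canonical Runnable rebase from this skill
TEXT_START = 0xCEF80        # .text start recorded for the canonical binary


def to_runnable(elf_addr: int, base: int = RUNNABLE_BASE) -> int:
    """Map an ELF virtual address into the rebased Runnable address space."""
    return base + elf_addr


def to_elf(runnable_addr: int, base: int = RUNNABLE_BASE) -> int:
    """Inverse mapping; rejects addresses below the rebase base."""
    if runnable_addr < base:
        raise ValueError(f"{hex(runnable_addr)} is below base {hex(base)}")
    return runnable_addr - base


print(hex(to_runnable(TEXT_START)))              # 0x500cef80
print(hex(to_runnable(TEXT_START, 0x50400000)))  # 0x504cef80, the non-canonical base
```

Under this model, an artifact lifted with `rebase_base=0x50400000` shows entry addresses offset by `0x400000` from canonical ones, which is a quick way to spot a non-canonical lift.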

## Caveat

The canonical `gtBlock.pb` bundle still has documented coverage gaps. Run the gap audit and report those counts instead of assuming objdump and GT are identical.
@@ -0,0 +1,4 @@
interface:
display_name: "Runnable Libcrypto Canonical Eval"
short_description: "Evaluate libcrypto under the SYM-20 canonical contract"
default_prompt: "Use $runnable-libcrypto-canonical-eval to run or audit the canonical SYM-20 libcrypto evaluation contract, distinguish it from historical sidecar-union metrics, and report fresh HIT/FN/FP/precision/recall counters."
93 changes: 93 additions & 0 deletions .codex/skills/runnable-symphony-followup/SKILL.md
@@ -0,0 +1,93 @@
---
name: runnable-symphony-followup
description: Draft or submit Runnable follow-up or backlog Linear issues discovered during implementation under the current Symphony workflow. Use when the main task reveals out-of-scope work that should become a separate Runnable issue with clear title, description, acceptance criteria, and validation.
---

# Runnable Symphony Follow-Up

Use this skill when a new issue should be split out from current work instead of expanding the active ticket.

This repo's `WORKFLOW.md` explicitly requires out-of-scope findings to become separate `Backlog` issues with clear scope. This skill standardizes that split.

## Default Goal

Create a follow-up issue draft that:

- is clearly separate from the current ticket,
- is small enough to be executed independently,
- is suitable for `Backlog` by default,
- explains why it was split out instead of handled now.

## Default Behavior

- Default target state: `Backlog`.
- Default relationship intent: mark it as related to the current issue when the available Linear tool supports verified relation creation.
- If the follow-up cannot proceed until the current issue lands, note that it should also be linked with `blockedBy` when the available tool supports it.
- Keep the issue narrowly scoped to one concrete gap.

If a Linear tool is available in-session, prefer creating the follow-up issue instead of only drafting it, but only create `related` / `blockedBy` links when the tool exposes a verified relation operation.

## Required Structure

```md
Title: <concise outcome-oriented title>

Reason for split:
- <why this is out of scope for the current issue>

## Summary

<short paragraph>

## Problem

- <concrete gap>
- <how it was discovered>
- <why it should not be folded into the current ticket>

## Scope

- In scope: <bounded work>
- Out of scope: <what remains excluded>

## Acceptance Criteria

- [ ] <observable outcome 1>
- [ ] <observable outcome 2>

## Validation

- [ ] <proof item>
```
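
If follow-up drafts are generated programmatically, the structure above can be rendered from plain values. This is a minimal sketch; `render_followup` and its parameters are hypothetical and not part of the repo.

```python
from typing import List, Sequence


def _bullets(items: Sequence[str], prefix: str = "- ") -> List[str]:
    return [f"{prefix}{item}" for item in items]


def render_followup(title: str, reason: str, summary: str,
                    problems: Sequence[str], in_scope: str, out_of_scope: str,
                    acceptance: Sequence[str], validation: Sequence[str]) -> str:
    """Render a follow-up draft in the required section order."""
    lines = [
        f"Title: {title}", "",
        "Reason for split:", f"- {reason}", "",
        "## Summary", "", summary, "",
        "## Problem", "", *_bullets(problems), "",
        "## Scope", "",
        f"- In scope: {in_scope}",
        f"- Out of scope: {out_of_scope}", "",
        "## Acceptance Criteria", "", *_bullets(acceptance, "- [ ] "), "",
        "## Validation", "", *_bullets(validation, "- [ ] "),
    ]
    return "\n".join(lines) + "\n"
```

Keeping acceptance and validation as separate lists makes it harder to omit either section by accident.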

## When To Use

- During implementation you notice a real bug or missing capability not required to complete the current issue.
- The current issue would become much harder to review if the new work were included.
- A cleanup, metrics extension, robustness pass, or doc correction deserves separate tracking.

## When Not To Use

- The new work is required to satisfy the current issue's acceptance criteria.
- The problem is only speculative and has no concrete evidence yet.
- The issue is so large that it should be split into multiple follow-ups instead of one.

## Runnable-Specific Guidance

- Mention concrete evidence paths when available: run directories, docs, scripts, tests, logs.
- For experiment-related follow-ups, specify the expected report artifact path.
- For code follow-ups, specify the likely code/test/doc surfaces but keep implementation flexibility.
- If you know the current issue identifier, mention that the new issue should be linked as `related` when supported by the available Linear tool.
- If the dependency is real, explicitly say the new issue should be linked as `blockedBy` the current issue when supported by the available Linear tool.
- If you are inside a Symphony issue session, prefer creating the follow-up under the same project/team as the current issue.
- If relation creation is not supported by the available Linear tool in-session, still create or draft the issue and report the intended `related` / `blockedBy` linkage explicitly instead of guessing mutation syntax.

## Output Style

Prefer returning:

1. a final proposed title,
2. the draft issue body,
3. a one-line note saying `Suggested initial state: Backlog`.

Do not bloat the follow-up with execution plans that belong in the future workpad comment.
4 changes: 4 additions & 0 deletions .codex/skills/runnable-symphony-followup/agents/openai.yaml
@@ -0,0 +1,4 @@
interface:
display_name: "Runnable Symphony Follow-Up"
short_description: "Split out Symphony follow-up / backlog issues for Runnable"
default_prompt: "Use $runnable-symphony-followup to turn an out-of-scope finding into a separate Runnable Linear backlog issue with clear scope, acceptance criteria, and validation."
132 changes: 132 additions & 0 deletions .codex/skills/runnable-symphony-issue/SKILL.md
@@ -0,0 +1,132 @@
---
name: runnable-symphony-issue
description: Create, submit, or refine Runnable Linear issues intended to be executed through Symphony. Use when the user wants to file or submit a new issue, rewrite an issue draft, or standardize ticket content so Runnable work can be picked up cleanly by the current Symphony workflow in `WORKFLOW.md`.
---

# Runnable Symphony Issue

Use this skill when the task is to write a new Runnable issue for the Symphony + Linear workflow, not to implement the issue itself.

This repo's issue workflow is defined in `WORKFLOW.md`. Follow that contract instead of inventing a generic ticket format.

## Default Goal

Produce issue text that is immediately usable by the current Runnable Symphony flow:

- scoped narrowly enough for one agent run,
- concrete enough to reproduce,
- explicit about acceptance criteria,
- explicit about validation,
- safe to start from `Todo` and move through `In Progress` -> `Human Review` -> `Merging` -> `Done`.

When the user asks to actually create the issue in Linear, prefer doing that in-session if a Linear MCP tool or Symphony `linear_graphql` tool is available.

## Runnable-Specific Rules

- Prefer one issue per concrete outcome. Do not pack multiple experiments or refactors into one ticket.
- Write for this repo's real workflows: `scripts/`, `docs/exp/`, `docs/reference/`, `tests/`, `openspec/changes/`, and libcrypto validation/eval paths.
- If the task is exploratory, define the expected artifact clearly: for example a report under `docs/exp/...`, a script under `scripts/...`, or a bounded code change plus validation.
- If the task could sprawl, split it now and keep the current issue to the smallest reviewable slice.
- If the task is discovered while doing other work and is out of scope, prefer the follow-up skill `runnable-symphony-followup`.

## Required Structure

Unless the user asks for a different format, draft the issue with these sections:

```md
## Summary

<1 short paragraph describing the problem and intended outcome>

## Problem

- <concrete current behavior or gap>
- <why it matters>
- <reproduction signal or evidence path, if known>

## Scope

- In scope: <bounded list>
- Out of scope: <bounded list>

## Acceptance Criteria

- [ ] <observable outcome 1>
- [ ] <observable outcome 2>

## Validation

- [ ] <exact command, script, or document check>
- [ ] <second proof item when needed>

## Notes

- <optional constraints, file hints, prior runs, issue links>
```
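
A draft can be checked against this structure before submission. The sketch below reports missing `##` sections and treats `Notes` as optional, matching the template; the function name is hypothetical.

```python
from typing import List

REQUIRED_SECTIONS = ("Summary", "Problem", "Scope",
                     "Acceptance Criteria", "Validation")


def missing_sections(body: str) -> List[str]:
    """Return required section names that have no '## <name>' heading."""
    found = {line[3:].strip() for line in body.splitlines()
             if line.startswith("## ")}
    return [name for name in REQUIRED_SECTIONS if name not in found]


draft = "## Summary\n\nFix X.\n\n## Problem\n\n- Y fails\n\n## Scope\n\n- In scope: X\n"
print(missing_sections(draft))  # ['Acceptance Criteria', 'Validation']
```

This only checks that headings exist; whether each acceptance criterion is observable still needs a human or reviewer pass.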

## Writing Guidance

- `Summary` should say what will be true after the issue is complete.
- `Problem` should describe the current failure mode, missing capability, or quality gap.
- `Scope` should constrain the execution so Symphony does not grow the task during implementation.
- `Acceptance Criteria` must be reviewer-visible outcomes, not implementation steps.
- `Validation` must name concrete commands, scripts, or doc artifacts whenever possible.
- `Notes` is optional and should hold paths, run IDs, comparison baselines, or dependencies.

## State Guidance

- New work intended for immediate execution should usually be created in `Todo`.
- Follow-up work discovered during implementation should usually be created in `Backlog`.
- Do not tell Symphony to start from `Backlog`; `WORKFLOW.md` explicitly treats it as out of scope until a human moves it.

## Submission Mode

If the user asks to "submit", "create", or "file" the issue:

1. Draft the final title and body first.
2. If a Linear tool is available, create the issue instead of stopping at prose.
3. If no Linear tool is available, return a ready-to-paste title/body/state package.

Prefer these target states:

- `Todo` for new work intended to be executed soon by Symphony.
- `Backlog` only when the user explicitly wants deferred work or the task is an out-of-scope follow-up.
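
The state rules in this skill can be summarized as a small helper. It models only the happy path named under Default Goal; `WORKFLOW.md` may define further states or transitions that this sketch does not cover, and the function names are hypothetical.

```python
from typing import Optional

# Happy path named in this skill; Backlog sits outside it until a human moves the issue.
HAPPY_PATH = ("Todo", "In Progress", "Human Review", "Merging", "Done")


def initial_state(is_followup: bool = False, deferred: bool = False) -> str:
    """Backlog for out-of-scope follow-ups or explicitly deferred work, else Todo."""
    return "Backlog" if (is_followup or deferred) else "Todo"


def next_state(current: str) -> Optional[str]:
    """Next happy-path state, or None after Done; ValueError for other states."""
    index = HAPPY_PATH.index(current)
    return HAPPY_PATH[index + 1] if index + 1 < len(HAPPY_PATH) else None


print(initial_state())                  # Todo
print(initial_state(is_followup=True))  # Backlog
print(next_state("Human Review"))       # Merging
```

Because `next_state("Backlog")` raises `ValueError`, the helper reflects the rule that Symphony does not start from `Backlog`.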

## Linear / Symphony Guardrails

- Treat Runnable's tracker project slug as `runnable-e97c680b3b79`.
- If you are already inside a Symphony issue session, prefer reusing the current issue's `project.id` and `team.id` when creating a sibling or follow-up issue.
- If you have the current issue context, use the verified `ResolveStateId` and `CreateIssue` GraphQL patterns from `WORKFLOW.md`.
- `WORKFLOW.md` does not define a verified GraphQL mutation for issue relations, so treat `related` / `blockedBy` creation as best-effort via trusted tooling only.
- If you do not have current issue context and do not have a trusted project/team lookup helper in-session, do not invent unknown GraphQL schema fields just to force submission. Fall back to a ready-to-submit draft.
- If the user asks for links or dependencies between issues, add the relation only when the available Linear tooling already exposes a verified way to do it in-session.

## Good Runnable Issue Shapes

- Tight code fix plus a targeted test.
- One bounded experiment run plus a report artifact.
- One metrics or validation improvement with explicit before/after proof.
- One documentation or workflow correction tied to a concrete misleading behavior.

## Avoid

- Multi-week epics disguised as one issue.
- Acceptance criteria like "investigate" with no artifact.
- Validation like "make sure it works" with no command or output path.
- Mixing implementation, benchmark campaign, paper-writing, and cleanup into one ticket.
- Telling the future agent to ask humans for missing details unless there is a real auth or permission blocker.

## Runnable Defaults

When the user gives only a rough intent, bias toward:

- clear file/path anchors,
- reproducible commands,
- explicit output artifacts,
- narrow scope that can plausibly complete in one Symphony ticket cycle.

If helpful, also read:

- `WORKFLOW.md` for the active state machine and workpad rules.
- [issue-templates.md](references/issue-templates.md) for ready-to-use Runnable ticket templates.
- [runnable-linear-context.md](references/runnable-linear-context.md) for repo-specific Linear/Symphony defaults.
4 changes: 4 additions & 0 deletions .codex/skills/runnable-symphony-issue/agents/openai.yaml
@@ -0,0 +1,4 @@
interface:
display_name: "Runnable Symphony Issue"
short_description: "Write Linear issues for Runnable that Symphony can execute"
default_prompt: "Use $runnable-symphony-issue to draft or rewrite a Runnable Linear issue so it matches the current Symphony workflow in WORKFLOW.md, including clear scope, acceptance criteria, and validation."