131 changes: 131 additions & 0 deletions .codex/skills/fn-fp-root-cause-analysis/SKILL.md
@@ -0,0 +1,131 @@
---
name: fn-fp-root-cause-analysis
description: Use when runnable lift metrics already exist and you need structured false-negative or false-positive root-cause analysis from ground-truth, merged LL, and shard artifacts such as coverage CSVs, illegalEntry logs, stderr logs, or shard result manifests
---

# FN/FP Root Cause Analysis

## Overview

This skill is for explaining *why* runnable lift missed or over-lifted addresses.
Do not use it just to compute aggregate precision or recall.

## When To Use

- You already have ground-truth and lifted output.
- You need evidence-backed FN or FP categories instead of only `tp/fp/fn`.
- You have shard artifacts such as `shard_results.json`, `*.coverage.csv`, `*.illegalEntry.log`, `*.stderr.log`, `*.ll`, `*.li.csv`, or `*.need.csv`.

Do not use this skill when the task is only “run evaluation and report metrics”.
For that, run the validator script first and stop there unless asked for causes.

## Inputs

- Ground truth:
  - preferred: final-layout CSV like `ground_truth.csv`
  - plus `function_symbols.csv` for function-to-shard attribution
- Lift output:
  - merged LL such as `merged.ll`
- Optional shard evidence:
  - `shard_results.json`
  - shard `*.coverage.csv`
  - shard `*.illegalEntry.log`
  - shard `*.stderr.log`
  - shard `*.ll`

## Workflow

1. Compute metrics and address sets:

```bash
python3 runnable/scripts/validate_libcrypto_ground_truth.py \
  --ground-truth-csv <ground_truth.csv> \
  --ll <merged.ll> \
  --function-symbols-csv <function_symbols.csv> \
  --csv-image-base <csv image base> \
  --rebase-base <runtime base> \
  --summary-out <validation.json>
```

2. Analyze FN and FP causes:

```bash
python3 runnable/scripts/analyze_fn_fp_root_causes.py \
  --validation-summary <validation.json> \
  --ground-truth-csv <ground_truth.csv> \
  --function-symbols-csv <function_symbols.csv> \
  --merged-ll <merged.ll> \
  --shard-results-json <shard_results.json> \
  --csv-image-base <csv image base> \
  --rebase-base <runtime base> \
  --summary-out <analysis.json>
```
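
Both commands take `--csv-image-base` and `--rebase-base`. The sketch below shows what such a pair of flags usually implies, assuming a plain linear rebase (subtract the CSV image base, add the runtime base). The scripts' actual mapping is not shown in this document, and the helper name `to_runtime_address` is hypothetical:

```python
def to_runtime_address(csv_addr, csv_image_base, rebase_base):
    """Translate a ground-truth CSV address into the rebased address space.

    ASSUMPTION: the mapping is a constant offset. Check the validator
    script before relying on this for real runs.
    """
    if csv_addr < csv_image_base:
        raise ValueError(
            f"address {csv_addr:#x} is below image base {csv_image_base:#x}"
        )
    return csv_addr - csv_image_base + rebase_base


if __name__ == "__main__":
    # The two bases match the fixture values used in this skill's minimal
    # reproduction; 0x1040 is an illustrative address, not fixture data.
    print(hex(to_runtime_address(0x1040, 0x1000, 0x50000000)))
```

If an FN or FP address looks off by a constant, comparing it against a mapping like this one is a quick way to rule out a base mismatch before digging into shard evidence.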

## Minimal Reproduction

```bash
python3 runnable/scripts/validate_libcrypto_ground_truth.py \
  --ground-truth-csv test/fixtures/fn_fp_root_cause/ground_truth.csv \
  --ll test/fixtures/fn_fp_root_cause/merged.ll \
  --function-symbols-csv test/fixtures/fn_fp_root_cause/function_symbols.csv \
  --csv-image-base 0x1000 \
  --rebase-base 0x50000000 \
  --summary-out /tmp/fnfp.validation.json

python3 runnable/scripts/analyze_fn_fp_root_causes.py \
  --validation-summary /tmp/fnfp.validation.json \
  --ground-truth-csv test/fixtures/fn_fp_root_cause/ground_truth.csv \
  --function-symbols-csv test/fixtures/fn_fp_root_cause/function_symbols.csv \
  --merged-ll test/fixtures/fn_fp_root_cause/merged.ll \
  --shard-results-json test/fixtures/fn_fp_root_cause/shard_results.json \
  --csv-image-base 0x1000 \
  --rebase-base 0x50000000 \
  --summary-out /tmp/fnfp.analysis.json
```

## Output Shape

Expect JSON with:

- `findings[]`
- each finding includes:
  - `kind`: `fn` or `fp`
  - `address`
  - `reason`
  - `priority`
  - `symbol` and `range` when attributable
  - `evidence_paths`
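
A minimal sketch of a shape check for this output, assuming every finding always carries `kind`, `address`, `reason`, `priority`, and `evidence_paths`, while `symbol` and `range` are optional. The helper names `check_finding` and `check_analysis` are hypothetical and not part of the repository scripts:

```python
# Keys assumed to be mandatory on every finding (an assumption, see lead-in).
REQUIRED_KEYS = ("kind", "address", "reason", "priority", "evidence_paths")
VALID_KINDS = ("fn", "fp")


def check_finding(finding):
    """Return a list of problems for one finding; empty means well-formed."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in finding]
    if "kind" in finding and finding["kind"] not in VALID_KINDS:
        problems.append(f"unexpected kind: {finding['kind']!r}")
    return problems


def check_analysis(analysis):
    """Map finding index -> problems, only for findings that have any."""
    report = {}
    for index, finding in enumerate(analysis.get("findings", [])):
        problems = check_finding(finding)
        if problems:
            report[index] = problems
    return report
```

Running a check like this on `<analysis.json>` before interpreting it helps separate analyzer bugs from genuine lift problems.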

## Supported Reason Categories

- `illegal_entry_suppression`
- `shard_timeout`
- `shard_error`
- `shard_empty`
- `merge_missing`
- `continuation_byte`
- `ground_truth_gap`
- `padding`
- `outside_gt_coverage`
- `extra_lifted_bytes`

## Interpretation Rules

- Prefer shard-state explanations for FN when shard evidence exists.
- Prefer address-shape explanations for FP:
  - continuation bytes
  - GT gap near nearby instruction starts
  - padding or data-section spill
  - outside coverage
  - extra lifted bytes inside covered space

## Reporting Standard

When summarizing results for reviewers, include:

- representative addresses
- category counts
- exact evidence paths
- first-priority investigation directions

Do not collapse everything back into only aggregate metrics.
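
The reporting items above can be assembled mechanically from `findings[]`. The following is a sketch only, assuming the field names from the Output Shape section; `summarize_for_review` and its `max_examples` parameter are hypothetical:

```python
from collections import defaultdict


def summarize_for_review(findings, max_examples=3):
    """Group findings by (kind, reason) with counts, sample addresses, evidence."""
    groups = defaultdict(
        lambda: {"count": 0, "examples": [], "evidence_paths": set()}
    )
    for finding in findings:
        group = groups[(finding["kind"], finding["reason"])]
        group["count"] += 1
        # Keep only the first few addresses as representatives.
        if len(group["examples"]) < max_examples:
            group["examples"].append(finding["address"])
        group["evidence_paths"].update(finding.get("evidence_paths", []))

    # Largest categories first, ties broken by kind and reason name.
    ordered = sorted(groups.items(), key=lambda item: (-item[1]["count"], item[0]))
    return [
        {
            "kind": kind,
            "reason": reason,
            "count": group["count"],
            "examples": group["examples"],
            "evidence_paths": sorted(group["evidence_paths"]),
        }
        for (kind, reason), group in ordered
    ]
```

Investigation directions still need human judgment; the sketch only gathers the counts, addresses, and paths a reviewer would want next to them.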
80 changes: 80 additions & 0 deletions .codex/skills/runnable-libcrypto-canonical-eval/SKILL.md
@@ -0,0 +1,80 @@
---
name: runnable-libcrypto-canonical-eval
description: Run or audit the canonical libcrypto evaluation contract for SYM-20 using the repo-local ground-truth bundle reachable from this workspace, fresh compare_runnable_text-based compares, and explicit address mapping. Use when a task mentions SYM-20, canonical libcrypto eval, gtBlock.pb, or when historical CSV/coverage-sidecar results must be distinguished from the authoritative compare path.
---

# Runnable Libcrypto Canonical Eval

Use this skill when the task is specifically about the authoritative libcrypto benchmark contract, not just generic Runnable compare metrics.

## Canonical Contract

For `SYM-20`, the authoritative contract in this workspace is:

1. Canonical binary:
   - `python3 runnable/scripts/libcrypto_bench_paths.py binary --must-exist`
2. Canonical ground truth protobuf:
   - `python3 runnable/scripts/libcrypto_bench_paths.py groundtruth-pb --must-exist`
3. Protobuf loader:
   - `python3 runnable/scripts/libcrypto_bench_paths.py blocks-pb2 --must-exist`
4. Compare tool:
   - `python3 runnable/scripts/libcrypto_bench_paths.py cmp-tool --must-exist`
5. Address mapping:
   - derive `.text` start from the ELF
   - quick check: `python3 runnable/scripts/libcrypto_bench_paths.py text-start`
   - keep Runnable rebase at `0x50000000` unless the user explicitly provides another contract
6. Compare path:
   - `python3 runnable/scripts/validate_libcrypto_ground_truth.py cmp --ll /abs/path/to/file.ll --out-dir /abs/path/to/out`

The wrapper records fresh `HIT`, `MISMATCH`, `OBJ_ONLY`, `LL_ONLY`, `FALSE_NEGATIVE`, `FALSE_POSITIVE`, `precision`, and `recall`.

## Quick Start

Gap-audit the canonical GT bundle:

```bash
python3 runnable/scripts/validate_libcrypto_ground_truth.py gap-audit \
  --out-dir runs/groundtruth_validation/canonical_gap_audit
```

Compare a lift under the canonical contract:

```bash
python3 runnable/scripts/validate_libcrypto_ground_truth.py cmp \
  --ll /abs/path/to/libcrypto.ll \
  --out-dir runs/groundtruth_validation/canonical_cmp
```

## Historical / Non-Canonical Paths

Do **not** report results that involve any of the following as the canonical `SYM-20` result; give each one an explicit non-canonical label:

- `coverage_sidecar_union_threadpool16`
- CSV-only GT flows such as `ground_truth.csv`
- `rebase_base=0x50400000`
- old libcrypto binaries whose SHA differs from the repo-local canonical bundle

When historical artifacts are involved, label each result as one of:

- `historical sidecar-union / non-canonical`
- `old-binary compare / non-canonical`
- `canonical gtBlock.pb compare / authoritative`
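
One way to keep these labels consistent is to derive them from the run's inputs. This is a rough sketch, not repository code: the function `result_label`, the `gt_source` tags (`"gtblock_pb"`, `"csv"`, `"sidecar_union"`), and the generic `"non-canonical"` fallback for cases the list above does not name are all assumptions made for illustration.

```python
CANONICAL_REBASE_BASE = 0x50000000  # Runnable rebase from the contract above


def result_label(gt_source, rebase_base, binary_sha, canonical_sha):
    """Pick a reporting label for one compare result.

    gt_source is a hypothetical tag: "gtblock_pb", "csv", or "sidecar_union".
    """
    if gt_source == "sidecar_union":
        return "historical sidecar-union / non-canonical"
    if binary_sha != canonical_sha:
        return "old-binary compare / non-canonical"
    if gt_source == "gtblock_pb" and rebase_base == CANONICAL_REBASE_BASE:
        return "canonical gtBlock.pb compare / authoritative"
    # CSV-only GT or a different rebase base: no dedicated label is listed,
    # so fall back to a generic marker (an assumption of this sketch).
    return "non-canonical"
```

The order of the checks is a design choice: anything historical is labeled first, so a result can only be called authoritative when every input matches the contract.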

## Required Reporting

Always include:

- absolute binary path
- absolute GT path
- absolute `.ll` path
- `.text` start
- Runnable base
- `HIT`, `MISMATCH`, `OBJ_ONLY`, `LL_ONLY`
- `FALSE_NEGATIVE`, `FALSE_POSITIVE`
- `precision`, `recall`
- whether the result is `canonical` or `non-canonical`

## Repo Notes

- The canonical binary / protobuf are auto-discovered from this workspace's `GroudTruth` checkout, not from a checked-in `archives/` directory under `Runnable-Rewriting`.
- `runnable/scripts/validate_libcrypto_ground_truth.py` still supports the legacy CSV validation mode used by existing fn/fp root-cause tests; use `csv-validate` only for that older flow.
109 changes: 109 additions & 0 deletions .codex/skills/runnable-parallel-lift/SKILL.md
@@ -0,0 +1,109 @@
---
name: runnable-parallel-lift
description: Use when working on runnable parallel lift workflows and deciding whether a task should use the current dynamic branch-driven runnable-lift path or the legacy offline static sharding path.
---

# Runnable Parallel Lift

## Overview

There are two different parallel lift models in this repository. The default current architecture is the online dynamic branch-driven mode inside `runnable-lift`. The older offline static sharding flow exists only for legacy address-ranged experiments and must not be treated as the current default.

## Default Workflow: Dynamic Branch-Driven `runnable-lift`

Use this when the task is about the current parallel lift architecture.

Entry command:

```bash
runnable-lift <binary> <output.ll> \
  -dynamic-parallel \
  -parallel-workers=<N> \
  -parallel-fragment-dir=<fragment-dir> \
  [other normal runnable-lift flags]
```

Current user-facing flags:

- `-dynamic-parallel`
- `-parallel-workers=<N>`
- `-parallel-fragment-dir=<PATH>`

Internal flags that agents should not pass manually:

- `-parallel-worker-mode`
- `-parallel-seed-pc`
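
When a script assembles the command, the user-facing and internal flag lists can be enforced in code. A minimal sketch, assuming the flag spellings shown above; `build_lift_command` is a hypothetical wrapper, not an existing helper in this repository:

```python
# Worker-internal flag names, without leading dashes.
INTERNAL_FLAG_NAMES = ("parallel-worker-mode", "parallel-seed-pc")


def build_lift_command(binary, output_ll, workers, fragment_dir, extra_flags=()):
    """Build an argv list for a dynamic-parallel runnable-lift run."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    for flag in extra_flags:
        # Strip leading dashes and any "=value" suffix before comparing.
        name = flag.lstrip("-").split("=", 1)[0]
        if name in INTERNAL_FLAG_NAMES:
            raise ValueError(f"{flag} is worker-internal and must not be passed manually")
    return [
        "runnable-lift",
        binary,
        output_ll,
        "-dynamic-parallel",
        f"-parallel-workers={workers}",
        f"-parallel-fragment-dir={fragment_dir}",
        *extra_flags,
    ]
```

The returned list can be handed to `subprocess.run` directly, which avoids shell quoting issues with paths.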

Dynamic artifacts:

- coordinator output: `<output.ll>`
- worker fragments: `<fragment-dir>/worker_<pc>.ll`
- worker logs:
  - `<fragment-dir>/worker_<pc>.ll.stdout.log`
  - `<fragment-dir>/worker_<pc>.ll.stderr.log`
- temporary merge output before rename: `<output.ll>.merged.ll`
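
For inspecting a fragment directory after a run, the naming scheme above is enough to pair each fragment with its logs. A sketch under the assumption that the `<pc>` part is an opaque string (its exact format is not specified here); `collect_worker_fragments` is a hypothetical helper:

```python
from pathlib import Path


def collect_worker_fragments(fragment_dir):
    """Return one record per worker_<pc>.ll fragment, with its logs if present."""
    records = []
    # The glob matches only names ending in ".ll", so the *.log files are skipped.
    for fragment in sorted(Path(fragment_dir).glob("worker_*.ll")):
        pc = fragment.name[len("worker_"):-len(".ll")]
        stdout_log = fragment.with_name(fragment.name + ".stdout.log")
        stderr_log = fragment.with_name(fragment.name + ".stderr.log")
        records.append({
            "pc": pc,
            "fragment": fragment,
            "stdout_log": stdout_log if stdout_log.exists() else None,
            "stderr_log": stderr_log if stderr_log.exists() else None,
        })
    return records
```

A record with a fragment but no stderr log, or the reverse, is a useful first signal when a merge result looks incomplete.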

Dynamic merge behavior:

- `runnable/tools/runnable-lift/CodeGenerator.cpp` spawns branch workers from the coordinator.
- Successful worker fragments are merged back into the top-level module with `runnable/scripts/merge_dynamic_runnable_fragments.py`.
- The final merged module replaces the coordinator output path. Dynamic mode does not produce `merged_full.ll`, `shard_results.json`, `raw/`, or `shards/`.

Recommended validation:

```bash
rg -n "dynamic-parallel|parallel-workers|parallel-fragment-dir" \
  runnable/tools/runnable-lift/Main.cpp \
  runnable/tools/runnable-lift/CodeGenerator.cpp

python3 -m unittest discover -s test -p 'test_merge_dynamic_runnable_fragments.py' -v
```

## Legacy Workflow: Offline Static Sharding

This is the old address-ranged family. Historical wrappers for it were `scripts/libcrypto_parallel_lift.py` together with `run_libcrypto_parallel_lift_stable.py`.

In this repository snapshot, the surviving legacy helper and CLI specification for that workflow is `runnable/scripts/_merge_dynamic_fragments_lib.py`. That path expects `runnable-lift` builds with `-addr-range-min` and `-addr-range-max` support and describes the offline per-function/static-shard artifact model instead of online branch spawning.

Legacy outputs are different from the dynamic mode:

- `shards/`
- `raw/`
- `eval/`
- `logs/`
- `shard_results.json`
- `shard_results.jsonl`
- `status.json`
- `merged.ll`
- `merged_full.ll`
- `final_report.json`
- `final_report.md`

Use the legacy flow only when the task explicitly calls for static address-ranged lifting, libcrypto/OpenSSL-style offline evaluation, or comparing against historical shard reports.

Recommended validation:

```bash
python3 runnable/scripts/_merge_dynamic_fragments_lib.py --help
git show origin/wip/runnable-20260409-121909:test/lift_openssl_parallel.py | sed -n '1,120p'
```

## Guardrails

- Treat dynamic branch-driven `runnable-lift` as the default explanation unless the task explicitly asks for the old static sharding flow.
- Do not mix dynamic worker fragments (`worker_<pc>.ll`) with legacy `shards/` or `raw/` outputs in the same interpretation or report.
- Do not describe `scripts/libcrypto_parallel_lift.py` or `run_libcrypto_parallel_lift_stable.py` as the current architecture.
- Do not expect `merged_full.ll`, `parallel.eval.json`, or shard manifests from dynamic mode.
- Do not feed dynamic fragments into the legacy offline evaluation pipeline. Dynamic fragments are only for `merge_dynamic_runnable_fragments.py`.
- Do not pass `-parallel-worker-mode` or `-parallel-seed-pc` by hand. Those are worker-internal flags.
- If a task depends on `-addr-range-min` or `-addr-range-max`, call it legacy/static sharding explicitly and verify the target branch still exposes those flags before proceeding.
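
The "do not mix" guardrail can be checked before interpreting a directory. The sketch below is illustrative only: `artifact_model` is a hypothetical helper, and it assumes that `worker_*.ll` files mark dynamic mode while `shards/`, `raw/`, `shard_results.json`, and `merged_full.ll` mark the legacy flow, as described above.

```python
from pathlib import Path

# Artifacts that dynamic mode does not produce, per the description above.
LEGACY_MARKERS = ("shards", "raw", "shard_results.json", "merged_full.ll")


def artifact_model(run_dir):
    """Classify a directory as "dynamic", "legacy", "mixed", or "unknown"."""
    root = Path(run_dir)
    has_dynamic = any(root.glob("worker_*.ll"))
    has_legacy = any((root / name).exists() for name in LEGACY_MARKERS)
    if has_dynamic and has_legacy:
        return "mixed"
    if has_dynamic:
        return "dynamic"
    if has_legacy:
        return "legacy"
    return "unknown"
```

A `"mixed"` result is a reason to stop and separate the artifacts before writing any report.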

## References

- `README.md` dynamic parallel prototype section
- `runnable/tools/runnable-lift/Main.cpp`
- `runnable/tools/runnable-lift/CodeGenerator.cpp`
- `runnable/scripts/merge_dynamic_runnable_fragments.py`
- `runnable/scripts/_merge_dynamic_fragments_lib.py`
- `docs/superpowers/plans/2026-04-29-runnable-lift-dynamic-branch-parallel.md`
- `test/test_merge_dynamic_runnable_fragments.py`
39 changes: 39 additions & 0 deletions README.md
@@ -73,6 +73,45 @@ $ runnable-lift hello hello.ll 2>hello.log
$ runnable translate hello
```

## Dynamic Parallel Lift (Prototype)

The `codex/dynamic-parallel-lift` branch adds an experimental dynamic
branch-driven parallel mode to `runnable-lift`.

User-facing flags:

- `-dynamic-parallel`: enable dynamic branch-driven worker spawning
- `-parallel-workers=<N>`: cap the number of worker subprocesses
- `-parallel-fragment-dir=<PATH>`: directory for worker `.ll` fragments and logs

Example:

```
$ mkdir -p /tmp/runnable-fragments
$ runnable-lift hello hello.ll \
    -dynamic-parallel \
    -parallel-workers=4 \
    -parallel-fragment-dir=/tmp/runnable-fragments \
    2>hello.parallel.log
```

Artifacts:

- coordinator output: `hello.ll`
- worker fragments: `/tmp/runnable-fragments/worker_<pc>.ll`
- worker stdout/stderr logs:
  - `/tmp/runnable-fragments/worker_<pc>.ll.stdout.log`
  - `/tmp/runnable-fragments/worker_<pc>.ll.stderr.log`

Notes:

- `-parallel-worker-mode` and `-parallel-seed-pc` are internal flags used by
  worker subprocesses and should not be passed manually.
- Successful worker fragments are merged back into the final top-level `.ll`
  output with the repository helper `runnable/scripts/merge_dynamic_runnable_fragments.py`.
- This branch is still experimental. If fragment merge fails, the coordinator
  output and worker `.ll` fragments are still left on disk for manual inspection.


## Experimental Evaluation
