diff --git a/.codex/skills/runnable-libcrypto-canonical-eval/SKILL.md b/.codex/skills/runnable-libcrypto-canonical-eval/SKILL.md
new file mode 100644
index 000000000..534f827b5
--- /dev/null
+++ b/.codex/skills/runnable-libcrypto-canonical-eval/SKILL.md
@@ -0,0 +1,108 @@
+---
+name: runnable-libcrypto-canonical-eval
+description: Run or audit the canonical libcrypto evaluation contract for SYM-20 using the repo-local 2026-04-28 ground-truth bundle, fresh runnable-cmp-eval compares, and explicit address mapping. Use when a task mentions SYM-20, canonical libcrypto eval, gtBlock.pb, the 2026-04-28 ground truth, or when historical CSV/coverage-sidecar results must be distinguished from the authoritative compare path.
+---
+
+# Runnable Libcrypto Canonical Eval
+
+Use this skill when the task is specifically about the authoritative libcrypto benchmark contract, not just generic Runnable compare metrics.
+
+## Canonical Contract
+
+For `SYM-20`, the authoritative repo-local contract is:
+
+1. Canonical binary:
+   - `python3 scripts/libcrypto_bench_paths.py binary --must-exist`
+2. Canonical ground truth protobuf:
+   - `python3 scripts/libcrypto_bench_paths.py groundtruth-pb --must-exist`
+3. Address mapping:
+   - derive `.text` start from the ELF
+   - quick check: `python3 scripts/libcrypto_bench_paths.py text-start`
+   - keep Runnable rebase at `0x50000000` unless the user explicitly provides another contract
+4. Compare path:
+   - `python3 scripts/validate_libcrypto_ground_truth.py cmp --ll /abs/path/to/file.ll --out-dir /abs/path/to/out`
+
+The wrapper records fresh `HIT`, `MISMATCH`, `OBJ_ONLY`, `LL_ONLY`, `FALSE_NEGATIVE`, `FALSE_POSITIVE`, `precision`, and `recall`.
+
+## Quick Start
+
+Gap-audit the canonical GT bundle:
+
+```bash
+python3 scripts/validate_libcrypto_ground_truth.py gap-audit \
+  --out-dir runs/groundtruth_validation/canonical_gap_audit
+```
+
+Compare a lift under the canonical contract:
+
+```bash
+python3 scripts/validate_libcrypto_ground_truth.py cmp \
+  --ll /abs/path/to/libcrypto.ll \
+  --out-dir runs/groundtruth_validation/canonical_cmp
+```
+
+If you already know the exact binary and want the lower-level wrapper:
+
+```bash
+python3 .codex/skills/runnable-cmp-eval/scripts/run_cmp_eval.py \
+  --binary /abs/path/to/libcrypto.so.3 \
+  --ll /abs/path/to/libcrypto.ll \
+  --text-start 0x... \
+  --runnable-base 0x50000000
+```
+
+## Historical / Non-Canonical Paths
+
+Do **not** report the following as the canonical `SYM-20` result without an explicit label:
+
+- `coverage_sidecar_union_threadpool16`
+- CSV-only GT flows such as `ground_truth.csv`
+- `rebase_base=0x50400000`
+- old libcrypto binaries whose SHA differs from the repo-local canonical `2026-04-28` bundle
+
+Those paths are useful for forensics and side-by-side experiments, but they are not the authoritative compare contract.
+
+When historical artifacts are involved, report them as:
+
+- `historical sidecar-union / non-canonical`
+- `old-binary compare / non-canonical`
+- `canonical gtBlock.pb compare / authoritative`
+
+## Required Reporting
+
+Always include:
+
+- absolute binary path
+- absolute GT path
+- absolute `.ll` path
+- binary SHA when a contract mismatch is possible
+- `.text` start
+- Runnable base
+- `HIT`, `MISMATCH`, `OBJ_ONLY`, `LL_ONLY`
+- `FALSE_NEGATIVE`, `FALSE_POSITIVE`
+- `precision`, `recall`
+- whether the result is `canonical` or `non-canonical`
+
+## Current Repo Evidence
+
+Use this note when you need the latest repo-local evidence and caveats:
+
+- `docs/exp/2026-05-10-libcrypto-canonical-eval-contract.md`
+
+Current repo-local comparable pair:
+
+- baseline:
+  - `runs/runnable-dev-2026-0429-textstart-serial/libcrypto.so.3.entry_0x500cef80.ll`
+- optimized:
+  - `runs/runnable-dev-2026-0501-textstart-dynamic-reapfix/libcrypto.so.3.entry_0x500cef80.dynamic.ll`
+
+Both already compare against the canonical `2026-04-28` binary with:
+
+- `.text_start=0xcef80`
+- `runnable_base=0x50000000`
+
+Prefer these over old `2026-04-14` baseline artifacts when the goal is to produce a same-contract baseline/optimized comparison for `SYM-20`.
+
+## Caveat
+
+The canonical `gtBlock.pb` bundle still has documented coverage gaps. Run the gap audit and report those counts instead of assuming objdump and GT are identical.
diff --git a/.codex/skills/runnable-libcrypto-canonical-eval/agents/openai.yaml b/.codex/skills/runnable-libcrypto-canonical-eval/agents/openai.yaml
new file mode 100644
index 000000000..2f86c9b63
--- /dev/null
+++ b/.codex/skills/runnable-libcrypto-canonical-eval/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Runnable Libcrypto Canonical Eval"
+  short_description: "Evaluate libcrypto under the SYM-20 canonical contract"
+  default_prompt: "Use $runnable-libcrypto-canonical-eval to run or audit the canonical SYM-20 libcrypto evaluation contract, distinguish it from historical sidecar-union metrics, and report fresh HIT/FN/FP/precision/recall counters."
diff --git a/.codex/skills/runnable-symphony-followup/SKILL.md b/.codex/skills/runnable-symphony-followup/SKILL.md
new file mode 100644
index 000000000..21ef7424b
--- /dev/null
+++ b/.codex/skills/runnable-symphony-followup/SKILL.md
@@ -0,0 +1,93 @@
+---
+name: runnable-symphony-followup
+description: Draft or submit Runnable follow-up or backlog Linear issues discovered during implementation under the current Symphony workflow. Use when the main task reveals out-of-scope work that should become a separate Runnable issue with clear title, description, acceptance criteria, and validation.
+---
+
+# Runnable Symphony Follow-Up
+
+Use this skill when a new issue should be split out from current work instead of expanding the active ticket.
+
+This repo's `WORKFLOW.md` explicitly requires out-of-scope findings to become separate `Backlog` issues with clear scope. This skill standardizes that split.
+
+## Default Goal
+
+Create a follow-up issue draft that:
+
+- is clearly separate from the current ticket,
+- is small enough to be executed independently,
+- is suitable for `Backlog` by default,
+- explains why it was split out instead of handled now.
+
+## Default Behavior
+
+- Default target state: `Backlog`.
+- Default relationship intent: mark it as related to the current issue when the available Linear tool supports verified relation creation.
+- If the follow-up cannot proceed until the current issue lands, note that it should also be linked with `blockedBy` when the available tool supports it.
+- Keep the issue narrowly scoped to one concrete gap.
+
+If a Linear tool is available in-session, prefer creating the follow-up issue instead of only drafting it, but only create `related` / `blockedBy` links when the tool exposes a verified relation operation.
+
+## Required Structure
+
+```md
+Title: <concise outcome-oriented title>
+
+Reason for split:
+- <why this is out of scope for the current issue>
+
+## Summary
+
+<short paragraph>
+
+## Problem
+
+- <concrete gap>
+- <how it was discovered>
+- <why it should not be folded into the current ticket>
+
+## Scope
+
+- In scope: <bounded work>
+- Out of scope: <what remains excluded>
+
+## Acceptance Criteria
+
+- [ ] <observable outcome 1>
+- [ ] <observable outcome 2>
+
+## Validation
+
+- [ ] <proof item>
+```
+
+## When To Use
+
+- During implementation you notice a real bug or missing capability not required to complete the current issue.
+- The current issue would become much harder to review if the new work were included.
+- A cleanup, metrics extension, robustness pass, or doc correction deserves separate tracking.
+
+## When Not To Use
+
+- The new work is required to satisfy the current issue's acceptance criteria.
+- The problem is only speculative and has no concrete evidence yet.
+- The issue is so large that it should be split into multiple follow-ups instead of one.
+
+## Runnable-Specific Guidance
+
+- Mention concrete evidence paths when available: run directories, docs, scripts, tests, logs.
+- For experiment-related follow-ups, specify the expected report artifact path.
+- For code follow-ups, specify the likely code/test/doc surfaces but keep implementation flexibility.
+- If you know the current issue identifier, mention that the new issue should be linked as `related` when supported by the available Linear tool.
+- If the dependency is real, explicitly say the new issue should be linked as `blockedBy` the current issue when supported by the available Linear tool.
+- If you are inside a Symphony issue session, prefer creating the follow-up under the same project/team as the current issue.
+- If relation creation is not supported by the available Linear tool in-session, still create or draft the issue and report the intended `related` / `blockedBy` linkage explicitly instead of guessing mutation syntax.
+
+## Output Style
+
+Prefer returning:
+
+1. a final proposed title,
+2. the draft issue body,
+3. a one-line note saying `Suggested initial state: Backlog`.
+
+Do not bloat the follow-up with execution plans that belong in the future workpad comment.
diff --git a/.codex/skills/runnable-symphony-followup/agents/openai.yaml b/.codex/skills/runnable-symphony-followup/agents/openai.yaml
new file mode 100644
index 000000000..e53a7e817
--- /dev/null
+++ b/.codex/skills/runnable-symphony-followup/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Runnable Symphony Follow-Up"
+  short_description: "Split out Symphony follow-up / backlog issues for Runnable"
+  default_prompt: "Use $runnable-symphony-followup to turn an out-of-scope finding into a separate Runnable Linear backlog issue with clear scope, acceptance criteria, and validation."
diff --git a/.codex/skills/runnable-symphony-issue/SKILL.md b/.codex/skills/runnable-symphony-issue/SKILL.md
new file mode 100644
index 000000000..01151e668
--- /dev/null
+++ b/.codex/skills/runnable-symphony-issue/SKILL.md
@@ -0,0 +1,132 @@
+---
+name: runnable-symphony-issue
+description: Create, submit, or refine Runnable Linear issues intended to be executed through Symphony. Use when the user wants to file or submit a new issue, rewrite an issue draft, or standardize ticket content so Runnable work can be picked up cleanly by the current Symphony workflow in `WORKFLOW.md`.
+---
+
+# Runnable Symphony Issue
+
+Use this skill when the task is to write a new Runnable issue for the Symphony + Linear workflow, not to implement the issue itself.
+
+This repo's issue workflow is defined in `WORKFLOW.md`. Follow that contract instead of inventing a generic ticket format.
+
+## Default Goal
+
+Produce issue text that is immediately usable by the current Runnable Symphony flow:
+
+- scoped narrowly enough for one agent run,
+- concrete enough to reproduce,
+- explicit about acceptance criteria,
+- explicit about validation,
+- safe to start from `Todo` and move through `In Progress` -> `Human Review` -> `Merging` -> `Done`.
+
+When the user asks to actually create the issue in Linear, prefer doing that in-session if a Linear MCP tool or Symphony `linear_graphql` tool is available.
+
+## Runnable-Specific Rules
+
+- Prefer one issue per concrete outcome. Do not pack multiple experiments or refactors into one ticket.
+- Write for this repo's real workflows: `scripts/`, `docs/exp/`, `docs/reference/`, `tests/`, `openspec/changes/`, and libcrypto validation/eval paths.
+- If the task is exploratory, define the expected artifact clearly: for example a report under `docs/exp/...`, a script under `scripts/...`, or a bounded code change plus validation.
+- If the task could sprawl, split it now and keep the current issue to the smallest reviewable slice.
+- If the task is discovered while doing other work and is out of scope, prefer the follow-up skill `runnable-symphony-followup`.
+
+## Required Structure
+
+Unless the user asks for a different format, draft the issue with these sections:
+
+```md
+## Summary
+
+<1 short paragraph describing the problem and intended outcome>
+
+## Problem
+
+- <concrete current behavior or gap>
+- <why it matters>
+- <reproduction signal or evidence path, if known>
+
+## Scope
+
+- In scope: <bounded list>
+- Out of scope: <bounded list>
+
+## Acceptance Criteria
+
+- [ ] <observable outcome 1>
+- [ ] <observable outcome 2>
+
+## Validation
+
+- [ ] <exact command, script, or document check>
+- [ ] <second proof item when needed>
+
+## Notes
+
+- <optional constraints, file hints, prior runs, issue links>
+```
+
+## Writing Guidance
+
+- `Summary` should say what will be true after the issue is complete.
+- `Problem` should describe the current failure mode, missing capability, or quality gap.
+- `Scope` should constrain the execution so Symphony does not grow the task during implementation.
+- `Acceptance Criteria` must be reviewer-visible outcomes, not implementation steps.
+- `Validation` must name concrete commands, scripts, or doc artifacts whenever possible.
+- `Notes` is optional and should hold paths, run IDs, comparison baselines, or dependencies.
+
+## State Guidance
+
+- New work intended for immediate execution should usually be created in `Todo`.
+- Follow-up work discovered during implementation should usually be created in `Backlog`.
+- Do not tell Symphony to start from `Backlog`; `WORKFLOW.md` explicitly treats it as out of scope until a human moves it.
+
+## Submission Mode
+
+If the user asks to "submit", "create", or "file" the issue:
+
+1. Draft the final title and body first.
+2. If a Linear tool is available, create the issue instead of stopping at prose.
+3. If no Linear tool is available, return a ready-to-paste title/body/state package.
+
+Prefer these target states:
+
+- `Todo` for new work intended to be executed soon by Symphony.
+- `Backlog` only when the user explicitly wants deferred work or the task is an out-of-scope follow-up.
+
+## Linear / Symphony Guardrails
+
+- Treat Runnable's tracker project slug as `runnable-e97c680b3b79`.
+- If you are already inside a Symphony issue session, prefer reusing the current issue's `project.id` and `team.id` when creating a sibling or follow-up issue.
+- If you have the current issue context, use the verified `ResolveStateId` and `CreateIssue` GraphQL patterns from `WORKFLOW.md`.
+- `WORKFLOW.md` does not define a verified GraphQL mutation for issue relations, so treat `related` / `blockedBy` creation as best-effort via trusted tooling only.
+- If you do not have current issue context and do not have a trusted project/team lookup helper in-session, do not invent unknown GraphQL schema fields just to force submission. Fall back to a ready-to-submit draft.
+- If the user asks for links or dependencies between issues, add the relation only when the available Linear tooling already exposes a verified way to do it in-session.
+
+## Good Runnable Issue Shapes
+
+- Tight code fix plus a targeted test.
+- One bounded experiment run plus a report artifact.
+- One metrics or validation improvement with explicit before/after proof.
+- One documentation or workflow correction tied to a concrete misleading behavior.
+
+## Avoid
+
+- Multi-week epics disguised as one issue.
+- Acceptance criteria like "investigate" with no artifact.
+- Validation like "make sure it works" with no command or output path.
+- Mixing implementation, benchmark campaign, paper-writing, and cleanup into one ticket.
+- Telling the future agent to ask humans for missing details unless there is a real auth or permission blocker.
+
+## Runnable Defaults
+
+When the user gives only a rough intent, bias toward:
+
+- clear file/path anchors,
+- reproducible commands,
+- explicit output artifacts,
+- narrow scope that can plausibly complete in one Symphony ticket cycle.
+
+If helpful, also read:
+
+- `WORKFLOW.md` for the active state machine and workpad rules.
+- [issue-templates.md](references/issue-templates.md) for ready-to-use Runnable ticket templates.
+- [runnable-linear-context.md](references/runnable-linear-context.md) for repo-specific Linear/Symphony defaults.
diff --git a/.codex/skills/runnable-symphony-issue/agents/openai.yaml b/.codex/skills/runnable-symphony-issue/agents/openai.yaml
new file mode 100644
index 000000000..184b071c9
--- /dev/null
+++ b/.codex/skills/runnable-symphony-issue/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "Runnable Symphony Issue"
+  short_description: "Write Runnable Linear issues that Symphony can execute"
+  default_prompt: "Use $runnable-symphony-issue to draft or rewrite a Runnable Linear issue so it matches the current Symphony workflow in WORKFLOW.md, including clear scope, acceptance criteria, and validation."
diff --git a/.codex/skills/runnable-symphony-issue/references/issue-templates.md b/.codex/skills/runnable-symphony-issue/references/issue-templates.md
new file mode 100644
index 000000000..a36ad22b4
--- /dev/null
+++ b/.codex/skills/runnable-symphony-issue/references/issue-templates.md
@@ -0,0 +1,106 @@
+# Runnable Symphony Issue Templates
+
+Use these templates when the user wants a draft quickly. Adapt them to the specific task; do not leave placeholders vague.
+
+## 1. Code Fix
+
+```md
+## Summary
+
+Fix <bug> in `<path-or-subsystem>` so that <desired outcome>.
+
+## Problem
+
+- Current behavior: <what fails>
+- Evidence: <command / file / run / log path>
+- Impact: <why this matters>
+
+## Scope
+
+- In scope: fix the behavior in `<path>`
+- In scope: add or update targeted coverage for the regression
+- Out of scope: unrelated refactors or broader optimization
+
+## Acceptance Criteria
+
+- [ ] <observable fixed behavior>
+- [ ] regression coverage exists for the failure mode
+
+## Validation
+
+- [ ] `<exact test or script command>`
+- [ ] `<secondary spot-check if needed>`
+
+## Notes
+
+- Related paths: `<path1>`, `<path2>`
+```
+
+## 2. Experiment / Validation Run
+
+```md
+## Summary
+
+Run a bounded experiment for <hypothesis> and record the result in `<artifact path>`.
+
+## Problem
+
+- We currently do not know whether <hypothesis / comparison> holds.
+- Existing evidence: <run dir / doc / metric snapshot>
+
+## Scope
+
+- In scope: run <specific script or workflow>
+- In scope: summarize metrics and interpretation in `<artifact path>`
+- Out of scope: unrelated repair work unless required to complete the planned run
+
+## Acceptance Criteria
+
+- [ ] experiment completes or fails with a clearly documented blocker
+- [ ] resulting metrics are written to `<artifact path>`
+- [ ] summary states whether the hypothesis was supported
+
+## Validation
+
+- [ ] `<exact run command>`
+- [ ] `test -f <artifact path>` or equivalent artifact existence check
+```
+
+## 3. Documentation / Workflow Fix
+
+```md
+## Summary
+
+Correct `<doc-or-workflow>` so it matches the current Runnable behavior for <topic>.
+
+## Problem
+
+- Current documentation or workflow text is misleading in `<path>`
+- Evidence: <mismatch between code, scripts, and docs>
+
+## Scope
+
+- In scope: update the relevant docs or workflow text
+- In scope: align example commands and expected artifacts
+- Out of scope: changing runtime behavior unless required for correctness
+
+## Acceptance Criteria
+
+- [ ] the corrected doc/workflow matches current commands and paths
+- [ ] stale or misleading guidance is removed or replaced
+
+## Validation
+
+- [ ] review `<doc path>` against `<code/script path>`
+- [ ] if commands are changed, run the command with a safe dry-run or equivalent proof
+```
+
+## 4. Issue Rewrite
+
+Use when the user already has a rough draft and wants it cleaned up:
+
+1. Keep the original intent.
+2. Replace vague verbs with observable outcomes.
+3. Move implementation details out of `Acceptance Criteria` and into `Scope` or `Notes`.
+4. Add concrete `Validation` items.
+5. Split the issue if it still contains more than one independently reviewable outcome.
diff --git a/.codex/skills/runnable-symphony-issue/references/runnable-linear-context.md b/.codex/skills/runnable-symphony-issue/references/runnable-linear-context.md
new file mode 100644
index 000000000..071d0060c
--- /dev/null
+++ b/.codex/skills/runnable-symphony-issue/references/runnable-linear-context.md
@@ -0,0 +1,32 @@
+# Runnable Linear / Symphony Context
+
+Repo-specific defaults extracted from `WORKFLOW.md`:
+
+- Tracker kind: `linear`
+- Tracker project slug: `runnable-e97c680b3b79`
+- Active states:
+  - `Todo`
+  - `In Progress`
+  - `Human Review`
+  - `Merging`
+  - `Rework`
+- Terminal states:
+  - `Closed`
+  - `Cancelled`
+  - `Canceled`
+  - `Duplicate`
+  - `Done`
+
+Issue creation defaults:
+
+- Use `Todo` for fresh work intended for Symphony execution.
+- Use `Backlog` for follow-up or deferred work.
+- Keep issue body concise and reviewer-oriented.
+- Include explicit `Acceptance Criteria` and `Validation`.
+
+If creating a follow-up from an active issue:
+
+- keep it in the same project,
+- add `related` / `blockedBy` relations only when the available Linear tool exposes a verified relation operation,
+- otherwise record the intended linkage explicitly in the issue body or workpad,
+- do not expand the current ticket just because related work was discovered.
diff --git a/WORKFLOW.md b/WORKFLOW.md
new file mode 100644
index 000000000..2a6be6ed0
--- /dev/null
+++ b/WORKFLOW.md
@@ -0,0 +1,455 @@
+---
+tracker:
+  kind: linear
+  project_slug: "runnable-e97c680b3b79"
+  active_states:
+    - Todo
+    - In Progress
+    - Human Review
+    - Merging
+    - Rework
+  terminal_states:
+    - Closed
+    - Cancelled
+    - Canceled
+    - Duplicate
+    - Done
+polling:
+  interval_ms: 30000
+workspace:
+  root: /hdd/code/runnable
+hooks:
+  after_create: |
+    git clone --depth 1 git@github.com:GRIN2021/Runnable-Rewriting.git .
+  before_remove: null
+agent:
+  max_concurrent_agents: 10
+  max_turns: 6
+codex:
+  command: codex --dangerously-bypass-approvals-and-sandbox --config shell_environment_policy.inherit=all --config 'model="gpt-5.4"' --config model_reasoning_effort=xhigh app-server
+  approval_policy: never
+  thread_sandbox: danger-full-access
+  turn_sandbox_policy:
+    type: dangerFullAccess
+---
+
+You are working on a Linear ticket `{{ issue.identifier }}`
+
+{% if attempt %}
+Continuation context:
+
+- This is retry attempt #{{ attempt }} because the ticket is still in an active state.
+- Resume from the current workspace state instead of restarting from scratch.
+- Do not repeat already-completed investigation or validation unless needed for new code changes.
+- Do not end the turn while the issue remains in an active state unless you are blocked by missing required permissions/secrets.
+  {% endif %}
+
+Issue context:
+Identifier: {{ issue.identifier }}
+Title: {{ issue.title }}
+Current status: {{ issue.state }}
+Labels: {{ issue.labels }}
+URL: {{ issue.url }}
+
+Description:
+{% if issue.description %}
+{{ issue.description }}
+{% else %}
+No description provided.
+{% endif %}
+
+Instructions:
+
+1. This is an unattended orchestration session. Never ask a human to perform follow-up actions.
+2. Only stop early for a true blocker (missing required auth/permissions/secrets). If blocked, record it in the workpad and move the issue according to workflow.
+3. Final message must report completed actions and blockers only. Do not include "next steps for user".
+
+Work only in the provided repository copy. Do not touch any other path.
+
+## Prerequisite: Linear MCP or `linear_graphql` tool is available
+
+The agent should be able to talk to Linear, either via a configured Linear MCP server or via an injected `linear_graphql` tool. If neither is present, stop and ask the user to configure Linear.
+
+## Linear GraphQL Guardrails
+
+When using `linear_graphql`, follow these rules exactly:
+
+- Treat `{{ issue.id }}` as the canonical current issue ID.
+- Treat `{{ issue.identifier }}` only as human-readable display text. Do not use `identifier` inside `IssueFilter`.
+- Treat `tracker.project_slug` as a Linear `slugId`. When filtering by project, use `project: { slugId: { eq: ... } }`, not `slug`.
+- Prefer `issue(id: $issueId)` over broad list queries whenever you are operating on the current ticket.
+- Do not introspect the full schema. Use the fixed query and mutation templates below.
+
+Use these exact GraphQL patterns:
+
+Current issue and project/team context:
+
+```graphql
+query CurrentIssue($issueId: String!) {
+  issue(id: $issueId) {
+    id
+    identifier
+    title
+    state {
+      id
+      name
+    }
+    project {
+      id
+      name
+      slugId
+    }
+    team {
+      id
+      name
+    }
+  }
+}
+```
+
+Find a state ID by name for the current issue's team:
+
+```graphql
+query ResolveStateId($issueId: String!, $stateName: String!) {
+  issue(id: $issueId) {
+    team {
+      states(filter: { name: { eq: $stateName } }, first: 1) {
+        nodes {
+          id
+          name
+        }
+      }
+    }
+  }
+}
+```
+
+Read current issue comments when searching for `## Codex Workpad`:
+
+```graphql
+query IssueComments($issueId: String!) {
+  issue(id: $issueId) {
+    comments(first: 50) {
+      nodes {
+        id
+        body
+        createdAt
+      }
+    }
+  }
+}
+```
+
+Move the current issue to a new state:
+
+```graphql
+mutation UpdateIssueState($issueId: String!, $stateId: String!) {
+  issueUpdate(id: $issueId, input: { stateId: $stateId }) {
+    success
+  }
+}
+```
+
+Create or update the persistent workpad comment:
+
+```graphql
+mutation CreateComment($issueId: String!, $body: String!) {
+  commentCreate(input: { issueId: $issueId, body: $body }) {
+    success
+  }
+}
+```
+
+Create follow-up issues in the same team/project:
+
+```graphql
+mutation CreateIssue(
+  $teamId: String!
+  $projectId: String!
+  $stateId: String!
+  $title: String!
+  $description: String!
+) {
+  issueCreate(
+    input: {
+      teamId: $teamId
+      projectId: $projectId
+      stateId: $stateId
+      title: $title
+      description: $description
+    }
+  ) {
+    success
+    issue {
+      id
+      identifier
+      url
+    }
+  }
+}
+```
+
+Do not invent alternate filter fields or alternate mutation shapes when the templates above fit the task.
+
+No verified GraphQL mutation for issue relations is defined in this workflow.
+If a follow-up should be linked as `related` or `blockedBy`, only create that
+relation through a trusted Linear tool that already exposes a verified
+operation; otherwise create the follow-up issue and record the intended linkage
+explicitly in the issue body or workpad.
+
+## Default posture
+
+- Start by determining the ticket's current status, then follow the matching flow for that status.
+- Start every task by opening the tracking workpad comment and bringing it up to date before doing new implementation work.
+- Spend extra effort up front on planning and verification design before implementation.
+- Reproduce first: always confirm the current behavior/issue signal before changing code so the fix target is explicit.
+- Keep ticket metadata current (state, checklist, acceptance criteria, links).
+- Treat a single persistent Linear comment as the source of truth for progress.
+- Use that single workpad comment for all progress and handoff notes; do not post separate "done"/summary comments.
+- Treat any ticket-authored `Validation`, `Test Plan`, or `Testing` section as non-negotiable acceptance input: mirror it in the workpad and execute it before considering the work complete.
+- When meaningful out-of-scope improvements are discovered during execution,
+  file a separate Linear issue instead of expanding scope. The follow-up issue
+  must include a clear title, description, and acceptance criteria, be placed in
+  `Backlog`, and be assigned to the same project as the current issue.
+- If the available Linear tool exposes a verified relation operation, link the
+  current issue as `related` and use `blockedBy` when the follow-up depends on
+  the current issue.
+- Otherwise, record the intended `related` / `blockedBy` linkage explicitly in
+  the follow-up issue body or workpad and do not guess GraphQL relation schema.
+- Move status only when the matching quality bar is met.
+- Operate autonomously end-to-end unless blocked by missing requirements, secrets, or permissions.
+- Use the blocked-access escape hatch only for true external blockers (missing required tools/auth) after exhausting documented fallbacks.
+
+## Related skills
+
+- `linear`: interact with Linear.
+- `commit`: produce clean, logical commits during implementation.
+- `push`: keep remote branch current and publish updates.
+- `pull`: keep branch updated with latest `origin/main` before handoff.
+- `land`: when ticket reaches `Merging`, explicitly open and follow `.codex/skills/land/SKILL.md`, which includes the `land` loop.
+
+## Status map
+
+- `Backlog` -> out of scope for this workflow; do not modify.
+- `Todo` -> queued; immediately transition to `In Progress` before active work.
+  - Special case: if a PR is already attached, treat as feedback/rework loop (run full PR feedback sweep, address or explicitly push back, revalidate, return to `Human Review`).
+- `In Progress` -> implementation actively underway.
+- `Human Review` -> PR is attached and validated; waiting on human approval.
+- `Merging` -> approved by human; execute the `land` skill flow (do not call `gh pr merge` directly).
+- `Rework` -> reviewer requested changes; planning + implementation required.
+- `Done` -> terminal state; no further action required.
+
+## Step 0: Determine current ticket state and route
+
+1. Fetch the issue by explicit ticket ID.
+2. Read the current state.
+3. Route to the matching flow:
+   - `Backlog` -> do not modify issue content/state; stop and wait for human to move it to `Todo`.
+   - `Todo` -> immediately move to `In Progress`, then ensure bootstrap workpad comment exists (create if missing), then start execution flow.
+     - If PR is already attached, start by reviewing all open PR comments and deciding required changes vs explicit pushback responses.
+   - `In Progress` -> continue execution flow from the current `## Codex Workpad` comment.
+   - `Human Review` -> wait and poll for decision/review updates.
+   - `Merging` -> on entry, open and follow `.codex/skills/land/SKILL.md`; do not call `gh pr merge` directly.
+   - `Rework` -> run rework flow.
+   - `Done` -> do nothing and shut down.
+4. Check whether a PR already exists for the current branch and whether it is closed.
+   - If a branch PR exists and is `CLOSED` or `MERGED`, treat prior branch work as non-reusable for this run.
+   - Create a fresh branch from `origin/main` and restart execution flow as a new attempt.
+5. For `Todo` tickets, do startup sequencing in this exact order:
+   - `update_issue(..., state: "In Progress")`
+   - find/create `## Codex Workpad` bootstrap comment
+   - only then begin analysis/planning/implementation work.
+6. Add a short comment if state and issue content are inconsistent, then proceed with the safest flow.
+
+## Step 1: Start/continue execution (Todo or In Progress)
+
+1.  Find or create a single persistent workpad comment for the issue:
+    - Search existing comments for a marker header: `## Codex Workpad`.
+    - Ignore resolved comments while searching; only active/unresolved comments are eligible to be reused as the live workpad.
+    - If found, reuse that comment; do not create a new workpad comment.
+    - If not found, create one workpad comment and use it for all updates.
+    - Persist the workpad comment ID and only write progress updates to that ID.
+2.  If arriving from `Todo`, do not delay on additional status transitions: the issue should already be `In Progress` before this step begins.
+3.  Immediately reconcile the workpad before new edits:
+    - Check off items that are already done.
+    - Expand/fix the plan so it is comprehensive for current scope.
+    - Ensure `Acceptance Criteria` and `Validation` are current and still make sense for the task.
+4.  Start work by writing/updating a hierarchical plan in the workpad comment.
+5.  Ensure the workpad includes a compact environment stamp at the top, written as a single line inside a code fence:
+    - Format: `<host>:<abs-workdir>@<short-sha>`
+    - Example: `devbox-01:/home/dev-user/code/symphony-workspaces/MT-32@7bdde33bc`
+    - Do not include metadata already inferable from Linear issue fields (`issue ID`, `status`, `branch`, `PR link`).
+6.  Add explicit acceptance criteria and TODOs in checklist form in the same comment.
+    - If changes are user-facing, include a UI walkthrough acceptance criterion that describes the end-to-end user path to validate.
+    - If changes touch app files or app behavior, add explicit app-specific flow checks to `Acceptance Criteria` in the workpad (for example: launch path, changed interaction path, and expected result path).
+    - If the ticket description/comment context includes `Validation`, `Test Plan`, or `Testing` sections, copy those requirements into the workpad `Acceptance Criteria` and `Validation` sections as required checkboxes (no optional downgrade).
+7.  Run a principal-style self-review of the plan and refine it in the comment.
+8.  Before implementing, capture a concrete reproduction signal and record it in the workpad `Notes` section (command/output, screenshot, or deterministic UI behavior).
+9.  Run the `pull` skill to sync with latest `origin/main` before any code edits, then record the pull/sync result in the workpad `Notes`.
+    - Include a `pull skill evidence` note with:
+      - merge source(s),
+      - result (`clean` or `conflicts resolved`),
+      - resulting `HEAD` short SHA.
+10. Compact context and proceed to execution.
+
+## PR feedback sweep protocol (required)
+
+When a ticket has an attached PR, run this protocol before moving to `Human Review`:
+
+1. Identify the PR number from issue links/attachments.
+2. Gather feedback from all channels:
+   - Top-level PR comments (`gh pr view --comments`).
+   - Inline review comments (`gh api repos/<owner>/<repo>/pulls/<pr>/comments`).
+   - Review summaries/states (`gh pr view --json reviews`).
+3. Treat every actionable reviewer comment (human or bot), including inline review comments, as blocking until one of these is true:
+   - code/test/docs updated to address it, or
+   - explicit, justified pushback reply is posted on that thread.
+4. Update the workpad plan/checklist to include each feedback item and its resolution status.
+5. Re-run validation after feedback-driven changes and push updates.
+6. Repeat this sweep until there are no outstanding actionable comments.
+
+## Blocked-access escape hatch (required behavior)
+
+Use this only when completion is blocked by missing required tools or missing auth/permissions that cannot be resolved in-session.
+
+- GitHub is **not** a valid blocker by default. Always try fallback strategies first (alternate remote/auth mode, then continue publish/review flow).
+- Do not move to `Human Review` for GitHub access/auth until all fallback strategies have been attempted and documented in the workpad.
+- If a non-GitHub required tool is missing, or required non-GitHub auth is unavailable, move the ticket to `Human Review` with a short blocker brief in the workpad that includes:
+  - what is missing,
+  - why it blocks required acceptance/validation,
+  - exact human action needed to unblock.
+- Keep the brief concise and action-oriented; do not add extra top-level comments outside the workpad.
+
+## Step 2: Execution phase (Todo -> In Progress -> Human Review)
+
+1.  Determine current repo state (`branch`, `git status`, `HEAD`) and verify the kickoff `pull` sync result is already recorded in the workpad before implementation continues.
+2.  If current issue state is `Todo`, move it to `In Progress`; otherwise leave the current state unchanged.
+3.  Load the existing workpad comment and treat it as the active execution checklist.
+    - Edit it liberally whenever reality changes (scope, risks, validation approach, discovered tasks).
+4.  Implement against the hierarchical TODOs and keep the comment current:
+    - Check off completed items.
+    - Add newly discovered items in the appropriate section.
+    - Keep parent/child structure intact as scope evolves.
+    - Update the workpad immediately after each meaningful milestone (for example: reproduction complete, code change landed, validation run, review feedback addressed).
+    - Never leave completed work unchecked in the plan.
+    - For tickets that started as `Todo` with an attached PR, run the full PR feedback sweep protocol immediately after kickoff and before new feature work.
+5.  Run validation/tests required for the scope.
+    - Mandatory gate: execute all ticket-provided `Validation`/`Test Plan`/`Testing` requirements when present; treat unmet items as incomplete work.
+    - Prefer a targeted proof that directly demonstrates the behavior you changed.
+    - You may make temporary local proof edits to validate assumptions (for example: tweak a local build input for `make`, or hardcode a UI account / response path) when this increases confidence.
+    - Revert every temporary proof edit before commit/push.
+    - Document these temporary proof steps and outcomes in the workpad `Validation`/`Notes` sections so reviewers can follow the evidence.
+    - If app-touching, run `launch-app` validation and capture/upload media via `github-pr-media` before handoff.
+6.  Re-check all acceptance criteria and close any gaps.
+7.  Before every `git push` attempt, run the required validation for your scope and confirm it passes; if it fails, address issues and rerun until green, then commit and push changes.
+8.  Attach PR URL to the issue (prefer attachment; use the workpad comment only if attachment is unavailable).
+    - Ensure the GitHub PR has label `symphony` (add it if missing).
+9.  Merge latest `origin/main` into branch, resolve conflicts, and rerun checks.
+10. Update the workpad comment with final checklist status and validation notes.
+    - Mark completed plan/acceptance/validation checklist items as checked.
+    - Add final handoff notes (commit + validation summary) in the same workpad comment.
+    - Do not include PR URL in the workpad comment; keep PR linkage on the issue via attachment/link fields.
+    - Add a short `### Confusions` section at the bottom when any part of task execution was unclear/confusing, with concise bullets.
+    - Do not post any additional completion summary comment.
+11. Before moving to `Human Review`, poll PR feedback and checks:
+    - Read the PR `Manual QA Plan` comment (when present) and use it to sharpen UI/runtime test coverage for the current change.
+    - Run the full PR feedback sweep protocol.
+    - Confirm PR checks are passing (green) after the latest changes.
+    - Confirm every required ticket-provided validation/test-plan item is explicitly marked complete in the workpad.
+    - Repeat this check-address-verify loop until no outstanding comments remain and checks are fully passing.
+    - Re-open and refresh the workpad before state transition so `Plan`, `Acceptance Criteria`, and `Validation` exactly match completed work.
+12. Only then move issue to `Human Review`.
+    - Exception: if blocked by missing required non-GitHub tools/auth per the blocked-access escape hatch, move to `Human Review` with the blocker brief and explicit unblock actions.
+13. For `Todo` tickets that already had a PR attached at kickoff:
+    - Ensure all existing PR feedback was reviewed and resolved, including inline review comments (code changes or explicit, justified pushback response).
+    - Ensure branch was pushed with any required updates.
+    - Then move to `Human Review`.
+
+## Step 3: Human Review and merge handling
+
+1. When the issue is in `Human Review`, do not code or change ticket content.
+2. Poll for updates as needed, including GitHub PR review comments from humans and bots.
+3. If review feedback requires changes, move the issue to `Rework` and follow the rework flow.
+4. If approved, a human moves the issue to `Merging`.
+5. When the issue is in `Merging`, open and follow `.codex/skills/land/SKILL.md`, then run the `land` skill in a loop until the PR is merged. Do not call `gh pr merge` directly.
+6. After merge is complete, move the issue to `Done`.
+
+## Step 4: Rework handling
+
+1. Treat `Rework` as a full approach reset, not incremental patching.
+2. Re-read the full issue body and all human comments; explicitly identify what will be done differently this attempt.
+3. Close the existing PR tied to the issue.
+4. Remove the existing `## Codex Workpad` comment from the issue.
+5. Create a fresh branch from `origin/main`.
+6. Start over from the normal kickoff flow:
+   - If current issue state is `Todo`, move it to `In Progress`; otherwise keep the current state.
+   - Create a new bootstrap `## Codex Workpad` comment.
+   - Build a fresh plan/checklist and execute end-to-end.
+
+## Completion bar before Human Review
+
+- Step 1/2 checklist is fully complete and accurately reflected in the single workpad comment.
+- Acceptance criteria and required ticket-provided validation items are complete.
+- Validation/tests are green for the latest commit.
+- PR feedback sweep is complete and no actionable comments remain.
+- PR checks are green, branch is pushed, and PR is linked on the issue.
+- Required PR metadata is present (`symphony` label).
+- If app-touching, runtime validation/media requirements from `App runtime validation (required)` are complete.
+
+## Guardrails
+
+- If the branch PR is already closed/merged, do not reuse that branch or prior implementation state for continuation.
+- For closed/merged branch PRs, create a new branch from `origin/main` and restart from reproduction/planning as if starting fresh.
+- If issue state is `Backlog`, do not modify it; wait for a human to move it to `Todo`.
+- Do not edit the issue body/description for planning or progress tracking.
+- Use exactly one persistent workpad comment (`## Codex Workpad`) per issue.
+- If comment editing is unavailable in-session, use the update script. Only report blocked if both MCP editing and script-based editing are unavailable.
+- Temporary proof edits are allowed only for local verification and must be reverted before commit.
+- If out-of-scope improvements are found, create a separate Backlog issue rather
+  than expanding current scope, and include a clear
+  title/description/acceptance criteria and same-project assignment.
+- Add `related` / `blockedBy` relations in-session only when the available
+  Linear tool exposes a verified relation operation.
+- Otherwise, record the intended linkage explicitly in the follow-up issue body
+  or workpad instead of guessing relation mutation syntax.
+- Do not move to `Human Review` unless the `Completion bar before Human Review` is satisfied.
+- In `Human Review`, do not make changes; wait and poll.
+- If state is terminal (`Done`), do nothing and shut down.
+- Keep issue text concise, specific, and reviewer-oriented.
+- If blocked and no workpad exists yet, add one blocker comment describing blocker, impact, and next unblock action.
+
+## Workpad template
+
+Use this exact structure for the persistent workpad comment and keep it updated in place throughout execution:
+
+````md
+## Codex Workpad
+
+```text
+<host>:<abs-workdir>@<short-sha>
+```
+
+### Plan
+
+- [ ] 1\. Parent task
+  - [ ] 1.1 Child task
+  - [ ] 1.2 Child task
+- [ ] 2\. Parent task
+
+### Acceptance Criteria
+
+- [ ] Criterion 1
+- [ ] Criterion 2
+
+### Validation
+
+- [ ] targeted tests: `<command>`
+
+### Notes
+
+- <short progress note with timestamp>
+
+### Confusions
+
+- <only include when something was confusing during execution>
+````
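Reviewer note on the environment stamp: the workflow above specifies only the stamp format (`<host>:<abs-workdir>@<short-sha>`), not how it is produced. The following is a minimal sketch of a formatter and a sanity check, assuming the caller already knows the host name, absolute workdir, and short SHA. `format_env_stamp` and `STAMP_RE` are hypothetical names used for illustration and are not part of the skill.

```python
import re

# Hypothetical validator for the documented `<host>:<abs-workdir>@<short-sha>` shape.
STAMP_RE = re.compile(r"^[^:\s]+:/[^@\s]*@[0-9a-f]{7,40}$")


def format_env_stamp(host: str, workdir: str, short_sha: str) -> str:
    """Build the one-line environment stamp for the top of the workpad."""
    if not workdir.startswith("/"):
        raise ValueError("workdir must be an absolute path")
    return f"{host}:{workdir}@{short_sha}"


# Values taken from the example line in the workflow text.
stamp = format_env_stamp(
    "devbox-01", "/home/dev-user/code/symphony-workspaces/MT-32", "7bdde33bc"
)
print(stamp)
```

In a real session the three inputs would likely come from the shell (for example `hostname`, `pwd`, and `git rev-parse --short HEAD`); the sketch keeps them as plain parameters so it runs without a repository.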
diff --git a/docs/exp/2026-05-10-libcrypto-canonical-eval-contract.md b/docs/exp/2026-05-10-libcrypto-canonical-eval-contract.md
new file mode 100644
index 000000000..7d109d729
--- /dev/null
+++ b/docs/exp/2026-05-10-libcrypto-canonical-eval-contract.md
@@ -0,0 +1,326 @@
+# 2026-05-10 libcrypto canonical eval contract
+
+## Goal
+
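+Fix the evaluation-contract mismatch for `libcrypto` in `SYM-20`, and record separately the current repo-local authoritative contract, the historical non-canonical paths, and the results that can be verified directly in this workspace.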
+
+## Inputs
+
+- Canonical binary:
+  - `archives/groundtruth/libcrypto_groudtruth_20260428/libcrypto.so.3`
+- Canonical GT:
+  - `archives/groundtruth/libcrypto_groudtruth_20260428/libcrypto.gtBlock.pb`
+- Historical old-binary baseline:
+  - `archives/experiments/libcrypto_master_test_20260414/libcrypto.so.3`
+  - `runs/validate-libcrypto-llm-agent-current/baseline/cmp.repaired.json`
+  - `runs/validate-libcrypto-llm-agent-current/baseline/summary.json`
+- Historical SYM-19 note:
+  - `/hdd/code/runnable/SYM-19/docs/exp/2026-05-02-sym-19-libcrypto-eval.md`
+- Canonical new-GT serial run:
+  - `runs/runnable-dev-2026-0429-newgt-serial/libcrypto.so.3`
+  - `runs/runnable-dev-2026-0429-newgt-serial/libcrypto.so.3.entry_0x500cf000.ll`
+
+## Environment
+
+- Workspace root: `/home/iskindar/Project/runnable`
+- Canonical path resolver:
+  - `scripts/libcrypto_bench_paths.py`
+- Canonical GT audit / compare wrapper:
+  - `scripts/validate_libcrypto_ground_truth.py`
+- Fresh compare wrapper:
+  - `.codex/skills/runnable-cmp-eval/scripts/run_cmp_eval.py`
+
+## Commands
+
+Resolve canonical assets:
+
+```bash
+python3 scripts/libcrypto_bench_paths.py binary --must-exist
+python3 scripts/libcrypto_bench_paths.py groundtruth-pb --must-exist
+```
+
+Gap-audit canonical GT coverage:
+
+```bash
+python3 scripts/validate_libcrypto_ground_truth.py gap-audit \
+  --out-dir runs/groundtruth_validation/canonical_gap_audit
+```
+
+Show that the old baseline `.ll` is not comparable to canonical GT:
+
+```bash
+python3 scripts/validate_libcrypto_ground_truth.py cmp \
+  --ll runs/validate-libcrypto-llm-agent-current/docker_exec/baseline/libcrypto.so.3.entry_0x500cf000.ll \
+  --out-dir runs/groundtruth_validation/canonical_baseline_cmp \
+  --examples 10
+```
+
+Re-run a comparable canonical baseline using the repo-local `newgt-serial` artifact:
+
+```bash
+python3 scripts/validate_libcrypto_ground_truth.py cmp \
+  --binary runs/runnable-dev-2026-0429-newgt-serial/libcrypto.so.3 \
+  --groundtruth archives/groundtruth/libcrypto_groudtruth_20260428/libcrypto.gtBlock.pb \
+  --ll runs/runnable-dev-2026-0429-newgt-serial/libcrypto.so.3.entry_0x500cf000.ll \
+  --out-dir runs/groundtruth_validation/newgt_serial_cmp \
+  --examples 10
+```
+
+Binary identity checks:
+
+```bash
+sha256sum \
+  /hdd/code/runnable/SYM-19/test/openssl_data/libcrypto.so.3 \
+  archives/experiments/libcrypto_master_test_20260414/libcrypto.so.3 \
+  archives/groundtruth/libcrypto_groudtruth_20260428/libcrypto.so.3
+
+readelf -WS archives/experiments/libcrypto_master_test_20260414/libcrypto.so.3
+readelf -WS archives/groundtruth/libcrypto_groudtruth_20260428/libcrypto.so.3
+```
+
+## Artifacts
+
+- Canonical GT gap audit:
+  - `runs/groundtruth_validation/canonical_gap_audit/gap.summary.json`
+  - `runs/groundtruth_validation/canonical_gap_audit/gap.summary.txt`
+- Canonical compare against old baseline `.ll`:
+  - `runs/groundtruth_validation/canonical_baseline_cmp/cmp.json`
+  - `runs/groundtruth_validation/canonical_baseline_cmp/cmp.verdict.txt`
+- Canonical comparable serial baseline:
+  - `runs/groundtruth_validation/newgt_serial_cmp/cmp.json`
+  - `runs/groundtruth_validation/newgt_serial_cmp/cmp.txt`
+  - `runs/groundtruth_validation/newgt_serial_cmp/cmp.verdict.txt`
+- Canonical textstart baseline:
+  - `runs/groundtruth_validation/textstart_serial_cmp/cmp.json`
+  - `runs/groundtruth_validation/textstart_serial_cmp/cmp.txt`
+  - `runs/groundtruth_validation/textstart_serial_cmp/cmp.verdict.txt`
+- Canonical textstart optimized:
+  - `runs/groundtruth_validation/textstart_dynamic_cmp/cmp.json`
+  - `runs/groundtruth_validation/textstart_dynamic_cmp/cmp.verdict.txt`
+- Historical old-binary baseline:
+  - `runs/validate-libcrypto-llm-agent-current/baseline/cmp.repaired.json`
+  - `runs/validate-libcrypto-llm-agent-current/baseline/summary.json`
+
+## Results
+
+### Canonical authoritative contract
+
+- Binary:
+  - `archives/groundtruth/libcrypto_groudtruth_20260428/libcrypto.so.3`
+  - sha256 `2d4faaa94bb53b5f92a7d8d0b581eea1ad0446c30f34a8ddf4baef713f744d04`
+- GT:
+  - `archives/groundtruth/libcrypto_groudtruth_20260428/libcrypto.gtBlock.pb`
+- Address mapping:
+  - ELF `.text` start: `0xcef80`
+  - Runnable base: `0x50000000`
+- Compare path:
+  - fresh `runnable-cmp-eval`
+  - do not use cached `.result`
+  - do not use `coverage_sidecar_union_threadpool16` as final authority
+
+### Canonical GT coverage caveat
+
+`gap-audit` shows that the current `gtBlock.pb` bundle does not cover every instruction that `objdump -d -j .text` reports:
+
+- `objdump_real_instruction_count=707311`
+- `groundtruth_instruction_count=670703`
+- `unseen_instruction_count=36610`
+- `unseen_ratio_over_groundtruth=0.05458451803555374`
+- `instruction_category_counts.outside_gt_coverage=35257`
+- `instruction_category_counts.padding=1353`
+
+This caveat must be reported with canonical results instead of assuming GT and objdump are identical.
+
+### Why the SYM-19 `recall=0.445502995810004` result is not directly comparable
+
+`SYM-19` used a different binary, different GT representation, different address contract, and different evaluator:
+
+1. Binary generation mismatch:
+   - `SYM-19` binary: `/hdd/code/runnable/SYM-19/test/openssl_data/libcrypto.so.3`
+   - archived old binary: `archives/experiments/libcrypto_master_test_20260414/libcrypto.so.3`
+   - these two old binaries share sha256 `932923d4498c83f75c60a7a02404267297df9f7a2844ed6f3f8d154041b558db`
+   - the canonical `2026-04-28` binary has a different sha256: `2d4faaa94bb53b5f92a7d8d0b581eea1ad0446c30f34a8ddf4baef713f744d04`
+2. `.text` layout mismatch:
+   - old binary `.text` starts at `0xcf000`
+   - canonical binary `.text` starts at `0xcef80`
+3. GT representation mismatch:
+   - `SYM-19`: CSV GT bundle under `out/sym-19/gt/ground_truth.csv`
+   - canonical: `gtBlock.pb`
+4. Evaluator mismatch:
+   - `SYM-19`: `coverage_sidecar_union_threadpool16`
+   - canonical: fresh `runnable-cmp-eval`
+5. Address-domain mismatch:
+   - `SYM-19`: `rebase_base=0x50400000`
+   - canonical compare: `runnable_base=0x50000000`
+
+So `SYM-19`’s:
+
+- baseline `precision=0.9885320776927604`, `recall=0.445502995810004`
+- optimized `precision=0.9885531253892196`, `recall=0.4463316579715496`
+
+must be labeled `historical sidecar-union / non-canonical`, not compared numerically against canonical `gtBlock.pb` results.
+
+### Historical old-binary baseline under the old contract
+
+From `runs/validate-libcrypto-llm-agent-current/baseline/cmp.repaired.json`:
+
+- binary: old `2026-04-14` binary
+- `.text` start: `0xcf000`
+- `runnable_base=0x50000000`
+- `obj_count=932388`
+- `ll_count=847589`
+- `HIT=787564`
+- `MISMATCH=1414`
+- `OBJ_ONLY=143410`
+- `LL_ONLY=58611`
+- `FALSE_NEGATIVE=144824`
+- `FALSE_POSITIVE=60025`
+- `precision=0.929181478287236`
+- `recall=0.8446741056298451`
+
+This is a valid old-binary compare, but not the `SYM-20` canonical authority.
+
+### Proof that the old baseline `.ll` fails under the canonical contract
+
+Comparing the old baseline `.ll` against canonical `2026-04-28` GT produces a catastrophic mismatch:
+
+- `obj_count=679506`
+- `ll_count=847618`
+- `HIT=24175`
+- `MISMATCH=112631`
+- `OBJ_ONLY=542700`
+- `LL_ONLY=710812`
+- `FALSE_NEGATIVE=655331`
+- `FALSE_POSITIVE=823443`
+- `precision=0.028521102666531385`
+- `recall=0.03557731646225346`
+
+This is the direct proof that “old `.ll` + canonical GT” is not a legitimate comparison target.
+
+### Corrected comparable baseline available in the current workspace
+
+The current workspace does contain one directly comparable canonical baseline artifact:
+
+- binary: `runs/runnable-dev-2026-0429-newgt-serial/libcrypto.so.3`
+- ll: `runs/runnable-dev-2026-0429-newgt-serial/libcrypto.so.3.entry_0x500cf000.ll`
+- binary sha256 matches canonical `2026-04-28` GT binary
+
+Fresh canonical compare result:
+
+- `obj_count=679506`
+- `ll_count=593206`
+- `HIT=532849`
+- `MISMATCH=1466`
+- `OBJ_ONLY=145191`
+- `LL_ONLY=58891`
+- `FALSE_NEGATIVE=146657`
+- `FALSE_POSITIVE=60357`
+- `precision=0.8982528834839837`
+- `recall=0.7841711478633007`
+
+### Corrected comparable baseline / optimized pair for SYM-20
+
+The current workspace also contains one full corrected pair that satisfies the ticket's comparability rule:
+
+- same canonical binary:
+  - `archives/groundtruth/libcrypto_groudtruth_20260428/libcrypto.so.3`
+- same canonical GT:
+  - `archives/groundtruth/libcrypto_groudtruth_20260428/libcrypto.gtBlock.pb`
+- same address mapping:
+  - `.text_start=0xcef80`
+  - `runnable_base=0x50000000`
+- same compare path:
+  - fresh `runnable-cmp-eval` through `scripts/validate_libcrypto_ground_truth.py cmp`
+- same artifact contract:
+  - `cmp.json`
+  - `cmp.txt`
+  - `cmp.verdict.txt`
+
+Baseline:
+
+- ll: `runs/runnable-dev-2026-0429-textstart-serial/libcrypto.so.3.entry_0x500cef80.ll`
+- `obj_count=679506`
+- `ll_count=554051`
+- `HIT=527641`
+- `MISMATCH=1044`
+- `OBJ_ONLY=150821`
+- `LL_ONLY=25366`
+- `FALSE_NEGATIVE=151865`
+- `FALSE_POSITIVE=26410`
+- `precision=0.9523329079813952`
+- `recall=0.7765067563788988`
+
+Optimized:
+
+- ll: `runs/runnable-dev-2026-0501-textstart-dynamic-reapfix/libcrypto.so.3.entry_0x500cef80.dynamic.ll`
+- `obj_count=679506`
+- `ll_count=569562`
+- `HIT=529848`
+- `MISMATCH=1217`
+- `OBJ_ONLY=148441`
+- `LL_ONLY=38497`
+- `FALSE_NEGATIVE=149658`
+- `FALSE_POSITIVE=39714`
+- `precision=0.9302727358917905`
+- `recall=0.7797547041527227`
+
+Delta (`optimized - baseline`):
+
+- `HIT=+2207`
+- `MISMATCH=+173`
+- `OBJ_ONLY=-2380`
+- `LL_ONLY=+13131`
+- `FALSE_NEGATIVE=-2207`
+- `FALSE_POSITIVE=+13304`
+- `precision=-0.022060172089604757`
+- `recall=+0.003247947773823978`
+
+Verdict under the corrected canonical contract:
+
+- recall `improved`
+- precision `regressed`
+- overall verdict: `mixed`, not a clean improvement
+
+### Remaining evidence gap
+
+I still could not find the raw artifact behind the older historical claim:
+
+- accepted baseline `precision=0.960596`, `recall=0.845340`
+- historical llm `precision=0.966030`, `recall=0.841228`
+
+These numbers currently appear in OpenSpec design text, but no raw compare artifact for them was found in this workspace or in the scanned `/hdd/code/runnable/SYM-20` tree.
+
+So the truthful current state is:
+
+- canonical comparable baseline: reproduced
+- canonical comparable optimized: reproduced
+- corrected comparable verdict: reproduced
+- historical non-canonical sidecar-union metrics: explained and fenced off
+- older accepted 0.960596 / 0.845340 claim: still lacks raw artifact provenance in the current workspace
+
+## Conclusion
+
+`SYM-20`’s main bug is now explicit:
+
+- historical `SYM-19` numbers were reported from a different binary generation and a different evaluator contract
+- current repo scripts also carried forward a stale `0xcf000` default that belongs to the old binary generation, not the canonical `2026-04-28` GT bundle
+
+The repo-local canonical contract is:
+
+- canonical `2026-04-28` binary
+- canonical `gtBlock.pb`
+- ELF-derived `.text` start
+- `runnable_base=0x50000000`
+- fresh `runnable-cmp-eval`
+
+Under that corrected contract, the currently reproducible pair in this workspace is:
+
+- baseline: `precision=0.9523329079813952`, `recall=0.7765067563788988`
+- optimized: `precision=0.9302727358917905`, `recall=0.7797547041527227`
+- verdict: recall improves slightly, precision regresses materially, so the result is `mixed`
+
+## Next Step
+
+1. If the historical accepted `0.960596 / 0.845340` pair still matters, recover its raw compare artifact or regenerate it under a documented run directory
+2. Decide whether the repo should standardize on the `textstart-*` pair above or on a separately recovered accepted baseline lineage
+3. Keep future libcrypto claims on the canonical skill + `validate_libcrypto_ground_truth.py cmp` path only
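Reviewer note on the metrics in the contract document above: the reported counts are consistent with `FALSE_NEGATIVE = MISMATCH + OBJ_ONLY`, `FALSE_POSITIVE = MISMATCH + LL_ONLY`, `precision = HIT / (HIT + FALSE_POSITIVE)`, and `recall = HIT / (HIT + FALSE_NEGATIVE)`. The sketch below recomputes the `textstart-*` baseline and optimized pair under that assumption. It is not the evaluator's actual code; `CmpCounts` and `verdict` are hypothetical names, and the rule for labels other than `mixed` is a guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CmpCounts:
    """The four primitive counters reported by a compare run."""
    hit: int
    mismatch: int
    obj_only: int
    ll_only: int

    @property
    def false_negative(self) -> int:
        return self.mismatch + self.obj_only

    @property
    def false_positive(self) -> int:
        return self.mismatch + self.ll_only

    @property
    def precision(self) -> float:
        return self.hit / (self.hit + self.false_positive)

    @property
    def recall(self) -> float:
        return self.hit / (self.hit + self.false_negative)


def verdict(baseline: CmpCounts, optimized: CmpCounts) -> str:
    """Assumed labeling rule: 'mixed' when precision and recall move in opposite directions."""
    dp = optimized.precision - baseline.precision
    dr = optimized.recall - baseline.recall
    if dp == 0 and dr == 0:
        return "unchanged"
    if dp >= 0 and dr >= 0:
        return "improved"
    if dp <= 0 and dr <= 0:
        return "regressed"
    return "mixed"


# Counts copied from the baseline and optimized result lists in the note.
baseline = CmpCounts(hit=527641, mismatch=1044, obj_only=150821, ll_only=25366)
optimized = CmpCounts(hit=529848, mismatch=1217, obj_only=148441, ll_only=38497)

print(f"baseline  precision={baseline.precision:.6f} recall={baseline.recall:.6f}")
print(f"optimized precision={optimized.precision:.6f} recall={optimized.recall:.6f}")
print("verdict:", verdict(baseline, optimized))
```

Keeping only the four primitive counters as inputs makes it harder for a hand-edited summary to drift out of sync with its own derived fields.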
diff --git a/docs/exp/topics.md b/docs/exp/topics.md
new file mode 100644
index 000000000..ab6abc40a
--- /dev/null
+++ b/docs/exp/topics.md
@@ -0,0 +1,41 @@
+# Experiment Topics
+
+This page groups experiment-related documents by topic so that new runs can be found without knowing the exact date-based filename.
+
+## libcrypto
+
+Use this group for `libcrypto.so.3` lift, agent/classifier, and policy experiments.
+
+- [2026-04-22 libcrypto boundary audit](2026-04-22-libcrypto-boundary-audit.md)
+  - normalizes current illegal-entry candidates and emits deterministic boundary actions
+- [2026-04-22 libcrypto old classifier validation](2026-04-22-libcrypto-old-classifier-validation.md)
+  - validates old `raw` and old `agent` classifiers on current `libcrypto.so.3`
+  - includes wrong-proxy failure and fixed-proxy rerun
+- [2026-05-10 libcrypto canonical eval contract](2026-05-10-libcrypto-canonical-eval-contract.md)
+  - fixes the SYM-20 evaluation contract and separates canonical GT compare from historical sidecar-union metrics
+- [../design/libcrypto/README.md](../design/libcrypto/README.md)
+  - method/design navigation for the `libcrypto` track
+- [../reference/RUNNABLE_CORE_AND_METRICS.md](../reference/RUNNABLE_CORE_AND_METRICS.md)
+  - background on how `--llm-policy` fits into Runnable
+
+## Rewrite Validation
+
+Use this group for serial vs parallel rewrite checks and behavior-validation summaries.
+
+- [../results/2026-04-22-coreutils-serial-parallel-validation.md](../results/2026-04-22-coreutils-serial-parallel-validation.md)
+  - serial/parallel behavior comparison on `coreutils`
+
+## GT / Embedded Data
+
+Use this group for ground-truth methodology and embedded-data interpretation.
+
+- [../reference/README-embedata.md](../reference/README-embedata.md)
+  - concrete embedded-data examples
+- [../design/2026-04-02-evaluation-wrapup-design.md](../design/2026-04-02-evaluation-wrapup-design.md)
+  - GT repair and stripped-OpenSSL design context
+
+## Notes
+
+- A single experiment can appear under multiple topics.
+- Date-based experiment files should stay under `docs/exp/`.
+- Broader validation summaries that are more like stable result notes can remain under `docs/results/` and still be linked here.
diff --git a/scripts/libcrypto_bench_paths.py b/scripts/libcrypto_bench_paths.py
new file mode 100644
index 000000000..30b0b28ff
--- /dev/null
+++ b/scripts/libcrypto_bench_paths.py
@@ -0,0 +1,173 @@
+#!/usr/bin/env python3
+"""Resolve canonical libcrypto benchmark assets from repo-local or external roots."""
+
+from __future__ import annotations
+
+import argparse
+import os
+import re
+import subprocess
+import sys
+from pathlib import Path
+from typing import Iterable, List
+
+
+ROOT_DIR = Path(__file__).resolve().parents[1]
+BENCH_ROOT_ENV = "RUNNABLE_LIBCRYPTO_BENCH_ROOT"
+GROUND_TRUTH_ENV = "RUNNABLE_LIBCRYPTO_GROUND_TRUTH"
+CANONICAL_GT_DIRNAME = "libcrypto_groudtruth_20260428"
+LEGACY_GT_DIRNAME = "libcrypto_master_test_20260414"
+CANONICAL_GT_BINARY_REL = (
+    Path("archives") / "groundtruth" / CANONICAL_GT_DIRNAME / "libcrypto.so.3"
+)
+LEGACY_GT_BINARY_REL = (
+    Path("archives") / "experiments" / LEGACY_GT_DIRNAME / "libcrypto.so.3"
+)
+READELF_TEXT_RE = re.compile(r"^\s*\[\s*\d+\]\s+(\S+)\s+\S+\s+([0-9a-fA-F]+)\s")
+
+
+def _unique_paths(paths: Iterable[Path]) -> List[Path]:
+    unique: List[Path] = []
+    seen: set[str] = set()
+    for path in paths:
+        key = os.path.normpath(str(path.expanduser()))
+        if key in seen:
+            continue
+        seen.add(key)
+        unique.append(Path(key))
+    return unique
+
+
+def configured_bench_roots(repo_root: Path = ROOT_DIR) -> List[Path]:
+    candidates: List[Path] = []
+    env_root = os.environ.get(BENCH_ROOT_ENV)
+    if env_root:
+        candidates.append(Path(env_root).expanduser())
+    candidates.append(repo_root.expanduser())
+    return _unique_paths(candidates)
+
+
+def _binary_candidates_from_root(root: Path) -> List[Path]:
+    root = root.expanduser()
+    if root.name == "libcrypto.so.3" or root.is_file():
+        return [root]
+
+    candidates = [
+        root / CANONICAL_GT_BINARY_REL,
+        root / LEGACY_GT_BINARY_REL,
+        root / CANONICAL_GT_DIRNAME / "libcrypto.so.3",
+        root / LEGACY_GT_DIRNAME / "libcrypto.so.3",
+        root / "libcrypto.so.3",
+    ]
+    if root.name == "groundtruth":
+        candidates.insert(0, root / CANONICAL_GT_DIRNAME / "libcrypto.so.3")
+    if root.name == "experiments":
+        candidates.insert(0, root / LEGACY_GT_DIRNAME / "libcrypto.so.3")
+    return _unique_paths(candidates)
+
+
+def binary_candidates(repo_root: Path = ROOT_DIR) -> List[Path]:
+    exact = os.environ.get(GROUND_TRUTH_ENV)
+    candidates: List[Path] = []
+    if exact:
+        candidates.append(Path(exact).expanduser())
+    for root in configured_bench_roots(repo_root):
+        candidates.extend(_binary_candidates_from_root(root))
+    return _unique_paths(candidates)
+
+
+def default_ground_truth_binary(repo_root: Path = ROOT_DIR) -> Path:
+    candidates = binary_candidates(repo_root)
+    for candidate in candidates:
+        if candidate.exists():
+            return candidate.resolve()
+    return candidates[0]
+
+
+def bench_root_for_binary(binary: Path, repo_root: Path = ROOT_DIR) -> Path:
+    resolved_binary = binary.resolve()
+    for root in configured_bench_roots(repo_root):
+        expanded_root = root.expanduser()
+        if expanded_root.is_file():
+            if expanded_root.resolve() == resolved_binary:
+                return expanded_root.parent.resolve()
+            continue
+        try:
+            resolved_binary.relative_to(expanded_root.resolve())
+            return expanded_root.resolve()
+        except ValueError:
+            continue
+    return resolved_binary.parent.resolve()
+
+
+def _groundtruth_pb_candidates(binary: Path) -> List[Path]:
+    candidates: List[Path] = []
+    if ".so" in binary.name:
+        base = binary.name.split(".so", 1)[0]
+        candidates.append(binary.with_name(f"{base}.gtBlock.pb"))
+    # Fallback: append the suffix to the full file name (e.g. libcrypto.so.3.gtBlock.pb).
+    candidates.append(Path(str(binary) + ".gtBlock.pb"))
+    return _unique_paths(candidates)
+
+
+def default_groundtruth_pb(repo_root: Path = ROOT_DIR) -> Path:
+    binary = default_ground_truth_binary(repo_root)
+    candidates = _groundtruth_pb_candidates(binary)
+    for candidate in candidates:
+        if candidate.exists():
+            return candidate.resolve()
+    return candidates[0]
+
+
+def parse_text_start_from_readelf_output(output: str) -> int:
+    for line in output.splitlines():
+        match = READELF_TEXT_RE.match(line)
+        if match and match.group(1) == ".text":
+            return int(match.group(2), 16)
+    raise RuntimeError("cannot detect .text start from readelf output")
+
+
+def detect_text_start(binary: Path) -> int:
+    result = subprocess.run(
+        ["readelf", "-WS", str(binary)],
+        check=True,
+        capture_output=True,
+        text=True,
+    )
+    return parse_text_start_from_readelf_output(result.stdout)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        description="Resolve canonical libcrypto benchmark asset paths."
+    )
+    parser.add_argument(
+        "kind",
+        choices=["bench-root", "binary", "groundtruth-pb", "text-start"],
+    )
+    parser.add_argument("--repo-root", type=Path, default=ROOT_DIR)
+    parser.add_argument("--must-exist", action="store_true")
+    return parser
+
+
+def main(argv: list[str] | None = None) -> int:
+    args = build_parser().parse_args(argv)
+    repo_root = args.repo_root.resolve()
+    if args.kind == "bench-root":
+        value = configured_bench_roots(repo_root)[0]
+    elif args.kind == "binary":
+        value = default_ground_truth_binary(repo_root)
+    elif args.kind == "groundtruth-pb":
+        value = default_groundtruth_pb(repo_root)
+    else:
+        value = hex(detect_text_start(default_ground_truth_binary(repo_root)))
+
+    if args.kind != "text-start" and args.must_exist and not value.exists():
+        print(str(value), file=sys.stderr)
+        return 2
+    print(str(value))
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/scripts/validate_libcrypto_ground_truth.py b/scripts/validate_libcrypto_ground_truth.py
new file mode 100644
index 000000000..9492a2cef
--- /dev/null
+++ b/scripts/validate_libcrypto_ground_truth.py
@@ -0,0 +1,534 @@
+#!/usr/bin/env python3
+"""Validate the refreshed libcrypto ground truth and compare lift outputs safely."""
+
+from __future__ import annotations
+
+import argparse
+import bisect
+import collections
+import importlib.util
+import json
+import re
+import subprocess
+import sys
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Dict, Iterable, List, Sequence, Tuple
+
+sys.path.insert(0, str(Path(__file__).resolve().parent))  # keep sibling scripts importable when loaded by file path
+from libcrypto_bench_paths import default_ground_truth_binary, default_groundtruth_pb  # noqa: E402
+
+
+ROOT_DIR = Path(__file__).resolve().parents[1]
+SCRIPT_DIR = Path(__file__).resolve().parent
+DEFAULT_BINARY = default_ground_truth_binary(ROOT_DIR)
+DEFAULT_GROUNDTRUTH = default_groundtruth_pb(ROOT_DIR)
+DEFAULT_RUN_CMP_EVAL = (
+    ROOT_DIR / ".codex" / "skills" / "runnable-cmp-eval" / "scripts" / "run_cmp_eval.py"
+)
+DEFAULT_VENDOR_BLOCKS_PB2 = SCRIPT_DIR / "_vendor" / "blocks_pb2.py"
+DEFAULT_OUT_DIR = ROOT_DIR / "runs" / "groundtruth_validation"
+DEFAULT_RUNNABLE_BASE = 0x50000000
+DEFAULT_MIN_PRECISION = 0.80
+DEFAULT_MIN_RECALL = 0.80
+READELF_TEXT_RE = re.compile(r"^\s*\[\s*\d+\]\s+(\S+)\s+\S+\s+([0-9a-fA-F]+)\s")
+OBJDUMP_INST_RE = re.compile(r"^\s*([0-9a-fA-F]+):\s+((?:[0-9a-fA-F]{2}\s)+)\s*(.*)$")
+
+
+@dataclass
+class CmpVerdict:
+    ok: bool
+    reasons: List[str]
+
+
+def parse_int(value: str) -> int:
+    return int(value, 0)
+
+
+def ensure_file(path: Path, label: str) -> None:
+    if not path.is_file():
+        raise FileNotFoundError(f"{label} not found: {path}")
+
+
+def ensure_dir(path: Path) -> None:
+    path.mkdir(parents=True, exist_ok=True)
+
+
+def run_cmd(cmd: Sequence[str], *, check: bool = True) -> subprocess.CompletedProcess:
+    return subprocess.run(cmd, check=check, text=True, capture_output=True)
+
+
+def read_json(path: Path) -> Dict[str, object]:
+    return json.loads(path.read_text(encoding="utf-8"))
+
+
+def write_json(path: Path, payload: Dict[str, object]) -> None:
+    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
+
+
+def write_text(path: Path, lines: Iterable[str]) -> None:
+    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
+
+
+def detect_text_start(binary: Path) -> int:
+    out = run_cmd(["readelf", "-WS", str(binary)]).stdout
+    for line in out.splitlines():
+        match = READELF_TEXT_RE.match(line)
+        if match and match.group(1) == ".text":
+            return int(match.group(2), 16)
+    raise RuntimeError(f"cannot detect .text start for {binary}")
+
+
+def resolve_default_groundtruth_path(binary: Path) -> Path:
+    candidates: List[Path] = []
+    if ".so" in binary.name:
+        base = binary.name.split(".so", 1)[0]
+        candidates.append(binary.with_name(f"{base}.gtBlock.pb"))
+    candidates.append(Path(str(binary) + ".gtBlock.pb"))
+    candidates.append(binary.with_name(f"{binary.name}.gtBlock.pb"))
+
+    seen = set()
+    for candidate in candidates:
+        if candidate in seen:
+            continue
+        seen.add(candidate)
+        if candidate.exists():
+            return candidate
+    raise FileNotFoundError(
+        f"could not infer groundtruth protobuf next to {binary}; tried: "
+        + ", ".join(str(path) for path in dict.fromkeys(candidates))
+    )
+
+
+def load_blocks_pb2(path: Path):
+    spec = importlib.util.spec_from_file_location("groundtruth_blocks_pb2", path)
+    if spec is None or spec.loader is None:
+        raise RuntimeError(f"cannot load protobuf module from {path}")
+    module = importlib.util.module_from_spec(spec)
+    spec.loader.exec_module(module)
+    return module
+
+
+def merge_ranges(ranges: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
+    sorted_ranges = sorted(ranges)
+    merged: List[List[int]] = []
+    for start, end in sorted_ranges:
+        if not merged or start > merged[-1][1]:
+            merged.append([start, end])
+            continue
+        merged[-1][1] = max(merged[-1][1], end)
+    return [(start, end) for start, end in merged]
+
+
+def in_ranges(addr: int, ranges: Sequence[Tuple[int, int]], starts: Sequence[int]) -> bool:
+    idx = bisect.bisect_right(starts, addr) - 1  # last range starting at or before addr
+    if idx < 0:
+        return False
+    start, end = ranges[idx]
+    return start <= addr < end
+
+
+def parse_groundtruth(gt_path: Path, blocks_pb2_path: Path) -> Tuple[set[int], List[Tuple[int, int]], List[Tuple[int, int]]]:
+    blocks_pb2 = load_blocks_pb2(blocks_pb2_path)
+    module = blocks_pb2.module()
+    module.ParseFromString(gt_path.read_bytes())
+
+    gt_inst_addrs: set[int] = set()
+    covered_ranges: List[Tuple[int, int]] = []
+    padding_ranges: List[Tuple[int, int]] = []
+    for func in module.fuc:
+        for bb in func.bb:
+            for inst in bb.instructions:
+                gt_inst_addrs.add(int(inst.va))
+            va = int(bb.va)
+            size = int(bb.size)
+            padding = int(bb.padding)
+            covered_end = va + size - padding  # block end, excluding trailing padding bytes
+            if covered_end > va:
+                covered_ranges.append((va, covered_end))
+            if padding > 0:
+                pad_start = covered_end
+                pad_end = va + size
+                if pad_end > pad_start:
+                    padding_ranges.append((pad_start, pad_end))
+
+    return gt_inst_addrs, merge_ranges(covered_ranges), merge_ranges(padding_ranges)
+
+
+def parse_objdump_instructions(binary: Path) -> List[Tuple[int, int, str]]:
+    out = run_cmd(["objdump", "-d", "-j", ".text", str(binary)]).stdout
+    instructions: List[Tuple[int, int, str]] = []
+    for line in out.splitlines():
+        match = OBJDUMP_INST_RE.match(line)
+        if not match:
+            continue
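+        addr = int(match.group(1), 16)
+        size = len(match.group(2).split())
+        asm = match.group(3).strip()
+        if asm:
+            instructions.append((addr, size, asm))
+        elif instructions and instructions[-1][0] + instructions[-1][1] == addr:
+            instructions[-1] = (instructions[-1][0], instructions[-1][1] + size, instructions[-1][2])  # wrapped bytes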
+    return instructions
+
+
+def merge_unseen_ranges(unseen: Sequence[Tuple[int, int, str]]) -> List[Tuple[int, int, int, List[str]]]:
+    if not unseen:
+        return []
+    ranges: List[Tuple[int, int, int, List[str]]] = []
+    start = unseen[0][0]
+    end = unseen[0][0] + unseen[0][1]
+    count = 1
+    sample = [unseen[0][2]]
+    for addr, size, asm in unseen[1:]:
+        if addr == end:
+            end = addr + size
+            count += 1
+            if len(sample) < 3:
+                sample.append(asm)
+            continue
+        ranges.append((start, end, count, sample))
+        start = addr
+        end = addr + size
+        count = 1
+        sample = [asm]
+    ranges.append((start, end, count, sample))
+    return ranges
+
+
+def analyze_groundtruth_gap(
+    binary: Path,
+    groundtruth: Path,
+    blocks_pb2_path: Path,
+) -> Dict[str, object]:
+    gt_inst_addrs, covered_ranges, padding_ranges = parse_groundtruth(
+        groundtruth, blocks_pb2_path
+    )
+    covered_starts = [start for start, _ in covered_ranges]
+    padding_starts = [start for start, _ in padding_ranges]
+
+    obj_insts = parse_objdump_instructions(binary)
+    unseen = [
+        (addr, size, asm) for addr, size, asm in obj_insts if addr not in gt_inst_addrs
+    ]
+    unseen_ranges = merge_unseen_ranges(unseen)
+
+    instruction_category_counts = collections.Counter()
+    range_category_counts = collections.Counter()
+    instruction_examples = {"padding": [], "outside_gt_coverage": []}
+    range_examples = {"padding": [], "outside_gt_coverage": []}
+
+    for addr, size, asm in unseen:
+        category = (
+            "padding"
+            if in_ranges(addr, padding_ranges, padding_starts)
+            else "outside_gt_coverage"
+        )
+        instruction_category_counts[category] += 1
+        if len(instruction_examples[category]) < 20:
+            instruction_examples[category].append(
+                {"addr": hex(addr), "size": size, "asm": asm}
+            )
+
+    unseen_range_items = []
+    for start, end, inst_count, sample_asm in unseen_ranges:
+        category = (
+            "padding"
+            if in_ranges(start, padding_ranges, padding_starts)
+            else "outside_gt_coverage"
+        )
+        range_category_counts[category] += 1
+        item = {
+            "start": hex(start),
+            "end": hex(end),
+            "inst_count": inst_count,
+            "category": category,
+            "sample_asm": sample_asm,
+        }
+        unseen_range_items.append(item)
+        if len(range_examples[category]) < 20:
+            range_examples[category].append(item)
+
+    unseen_ratio = len(unseen) / len(gt_inst_addrs) if gt_inst_addrs else 0.0
+    return {
+        "binary": str(binary),
+        "groundtruth": str(groundtruth),
+        "blocks_pb2": str(blocks_pb2_path),
+        "objdump_real_instruction_count": len(obj_insts),
+        "groundtruth_instruction_count": len(gt_inst_addrs),
+        "unseen_instruction_count": len(unseen),
+        "unseen_ratio_over_groundtruth": unseen_ratio,
+        "covered_range_count": len(covered_ranges),
+        "padding_range_count": len(padding_ranges),
+        "unseen_contiguous_range_count": len(unseen_ranges),
+        "instruction_category_counts": dict(instruction_category_counts),
+        "range_category_counts": dict(range_category_counts),
+        "instruction_examples": instruction_examples,
+        "range_examples": range_examples,
+        "unseen_ranges": unseen_range_items,
+    }
+
+
+def write_gap_outputs(out_dir: Path, summary: Dict[str, object]) -> Tuple[Path, Path]:
+    ensure_dir(out_dir)
+    json_path = out_dir / "gap.summary.json"
+    txt_path = out_dir / "gap.summary.txt"
+    write_json(json_path, summary)
+
+    lines = [
+        f"binary: {summary['binary']}",
+        f"groundtruth: {summary['groundtruth']}",
+        f"blocks_pb2: {summary['blocks_pb2']}",
+        f"objdump_real_instruction_count: {summary['objdump_real_instruction_count']}",
+        f"groundtruth_instruction_count: {summary['groundtruth_instruction_count']}",
+        f"unseen_instruction_count: {summary['unseen_instruction_count']}",
+        "unseen_ratio_over_groundtruth: "
+        f"{float(summary['unseen_ratio_over_groundtruth']):.12f}",
+        f"unseen_contiguous_range_count: {summary['unseen_contiguous_range_count']}",
+        "",
+        "[instruction_category_counts]",
+    ]
+    for key, value in summary["instruction_category_counts"].items():
+        lines.append(f"{key}: {value}")
+    lines.append("")
+    lines.append("[range_category_counts]")
+    for key, value in summary["range_category_counts"].items():
+        lines.append(f"{key}: {value}")
+    write_text(txt_path, lines)
+    return json_path, txt_path
+
+
+def assess_cmp_payload(
+    payload: Dict[str, object],
+    *,
+    min_precision: float,
+    min_recall: float,
+) -> CmpVerdict:
+    reasons: List[str] = []
+    precision = float(payload.get("precision", 0.0))
+    recall = float(payload.get("recall", 0.0))
+
+    if precision < min_precision:
+        reasons.append(f"precision {precision:.6f} < {min_precision:.6f}")
+    if recall < min_recall:
+        reasons.append(f"recall {recall:.6f} < {min_recall:.6f}")
+    if precision < 0.20 and recall < 0.20:
+        reasons.append(
+            "metrics are catastrophically low; the .ll likely comes from a different binary build"
+        )
+    return CmpVerdict(ok=not reasons, reasons=reasons)
+
+
+def write_cmp_verdict(out_dir: Path, payload: Dict[str, object], verdict: CmpVerdict) -> Path:
+    verdict_path = out_dir / "cmp.verdict.txt"
+    lines = [
+        f"binary: {payload.get('binary', '')}",
+        f"ll: {payload.get('ll', '')}",
+        f"text_start: 0x{int(payload.get('text_start', 0)):x}",
+        f"runnable_base: 0x{int(payload.get('runnable_base', 0)):x}",
+        f"precision: {float(payload.get('precision', 0.0)):.6f}",
+        f"recall: {float(payload.get('recall', 0.0)):.6f}",
+        f"ok: {str(verdict.ok).lower()}",
+    ]
+    if verdict.reasons:
+        lines.append("")
+        lines.append("[reasons]")
+        lines.extend(verdict.reasons)
+    write_text(verdict_path, lines)
+    return verdict_path
+
+
+def run_cmp(
+    *,
+    binary: Path,
+    ll_path: Path,
+    out_dir: Path,
+    run_cmp_eval: Path,
+    text_start: int | None,
+    runnable_base: int,
+    min_precision: float,
+    min_recall: float,
+    examples: int,
+) -> Tuple[Dict[str, object], CmpVerdict, Path, Path, Path]:
+    ensure_file(binary, "binary")
+    ensure_file(ll_path, "ll")
+    ensure_file(run_cmp_eval, "run_cmp_eval.py")
+    ensure_dir(out_dir)
+
+    resolved_text_start = detect_text_start(binary) if text_start is None else text_start
+    json_out = out_dir / "cmp.json"
+    text_out = out_dir / "cmp.txt"
+    cmd = [
+        "python3",
+        str(run_cmp_eval),
+        "--binary",
+        str(binary),
+        "--ll",
+        str(ll_path),
+        "--text-start",
+        hex(resolved_text_start),
+        "--runnable-base",
+        hex(runnable_base),
+        "--examples",
+        str(examples),
+        "--json-out",
+        str(json_out),
+        "--text-out",
+        str(text_out),
+    ]
+    result = run_cmd(cmd, check=False)
+    if result.returncode != 0:
+        raise RuntimeError(
+            "run_cmp_eval.py failed:\n"
+            f"stdout:\n{result.stdout}\n"
+            f"stderr:\n{result.stderr}"
+        )
+    payload = read_json(json_out)
+    verdict = assess_cmp_payload(
+        payload,
+        min_precision=min_precision,
+        min_recall=min_recall,
+    )
+    verdict_path = write_cmp_verdict(out_dir, payload, verdict)
+    return payload, verdict, json_out, text_out, verdict_path
+
+
+def print_gap_summary(summary: Dict[str, object], txt_path: Path) -> None:
+    print(f"gap_summary={txt_path}")
+    print(
+        "gap_counts="
+        f"unseen={summary['unseen_instruction_count']} "
+        f"outside_gt_coverage={summary['instruction_category_counts'].get('outside_gt_coverage', 0)} "
+        f"padding={summary['instruction_category_counts'].get('padding', 0)}"
+    )
+
+
+def print_cmp_summary(
+    payload: Dict[str, object],
+    verdict: CmpVerdict,
+    verdict_path: Path,
+) -> None:
+    print(f"cmp_verdict={verdict_path}")
+    print(
+        "cmp_metrics="
+        f"precision={float(payload.get('precision', 0.0)):.6f} "
+        f"recall={float(payload.get('recall', 0.0)):.6f} "
+        f"text_start=0x{int(payload.get('text_start', 0)):x} "
+        f"runnable_base=0x{int(payload.get('runnable_base', 0)):x}"
+    )
+    if verdict.reasons:
+        for reason in verdict.reasons:
+            print(f"cmp_reason={reason}", file=sys.stderr)
+
+
+def add_shared_binary_args(parser: argparse.ArgumentParser) -> None:
+    parser.add_argument("--binary", type=Path, default=DEFAULT_BINARY)
+    parser.add_argument("--groundtruth", type=Path, default=None)
+    parser.add_argument("--blocks-pb2", type=Path, default=DEFAULT_VENDOR_BLOCKS_PB2)
+    parser.add_argument("--out-dir", type=Path, default=DEFAULT_OUT_DIR)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        description="Validate the refreshed libcrypto ground truth and compare lift outputs."
+    )
+    subparsers = parser.add_subparsers(dest="command", required=True)
+
+    gap = subparsers.add_parser("gap-audit", help="Check binary .text addresses against gtBlock.pb")
+    add_shared_binary_args(gap)
+
+    cmp_parser = subparsers.add_parser("cmp", help="Run runnable-cmp-eval with safe defaults")
+    add_shared_binary_args(cmp_parser)
+    cmp_parser.add_argument("--ll", type=Path, required=True)
+    cmp_parser.add_argument("--run-cmp-eval", type=Path, default=DEFAULT_RUN_CMP_EVAL)
+    cmp_parser.add_argument("--text-start", default="elf", help="address, or 'elf' to read .text start from the binary")
+    cmp_parser.add_argument("--runnable-base", default=hex(DEFAULT_RUNNABLE_BASE))
+    cmp_parser.add_argument("--min-precision", type=float, default=DEFAULT_MIN_PRECISION)
+    cmp_parser.add_argument("--min-recall", type=float, default=DEFAULT_MIN_RECALL)
+    cmp_parser.add_argument("--examples", type=int, default=10)
+    cmp_parser.add_argument("--allow-low-metrics", action="store_true")
+
+    all_parser = subparsers.add_parser("all", help="Run gap audit and then compare a lift")
+    add_shared_binary_args(all_parser)
+    all_parser.add_argument("--ll", type=Path, required=True)
+    all_parser.add_argument("--run-cmp-eval", type=Path, default=DEFAULT_RUN_CMP_EVAL)
+    all_parser.add_argument("--text-start", default="elf", help="address, or 'elf' to read .text start from the binary")
+    all_parser.add_argument("--runnable-base", default=hex(DEFAULT_RUNNABLE_BASE))
+    all_parser.add_argument("--min-precision", type=float, default=DEFAULT_MIN_PRECISION)
+    all_parser.add_argument("--min-recall", type=float, default=DEFAULT_MIN_RECALL)
+    all_parser.add_argument("--examples", type=int, default=10)
+    all_parser.add_argument("--allow-low-metrics", action="store_true")
+    return parser
+
+
+def resolve_groundtruth(binary: Path, groundtruth: Path | None) -> Path:
+    return groundtruth.resolve() if groundtruth else resolve_default_groundtruth_path(binary.resolve())
+
+
+def parse_text_start_arg(value: str) -> int | None:
+    if value == "elf":
+        return None
+    return parse_int(value)
+
+
+def cmd_gap_audit(args: argparse.Namespace) -> int:
+    binary = args.binary.resolve()
+    groundtruth = resolve_groundtruth(binary, args.groundtruth)
+    blocks_pb2 = args.blocks_pb2.resolve()
+    ensure_file(binary, "binary")
+    ensure_file(groundtruth, "groundtruth")
+    ensure_file(blocks_pb2, "blocks_pb2")
+
+    summary = analyze_groundtruth_gap(binary, groundtruth, blocks_pb2)
+    _, txt_path = write_gap_outputs(args.out_dir.resolve(), summary)
+    print_gap_summary(summary, txt_path)
+    return 0
+
+
+def cmd_cmp(args: argparse.Namespace) -> int:
+    binary = args.binary.resolve()
+    groundtruth = resolve_groundtruth(binary, args.groundtruth)
+    ensure_file(groundtruth, "groundtruth")
+    payload, verdict, _, _, verdict_path = run_cmp(
+        binary=binary,
+        ll_path=args.ll.resolve(),
+        out_dir=args.out_dir.resolve(),
+        run_cmp_eval=args.run_cmp_eval.resolve(),
+        text_start=parse_text_start_arg(args.text_start),
+        runnable_base=parse_int(args.runnable_base),
+        min_precision=args.min_precision,
+        min_recall=args.min_recall,
+        examples=args.examples,
+    )
+    print_cmp_summary(payload, verdict, verdict_path)
+    if verdict.ok or args.allow_low_metrics:
+        return 0
+    return 3
+
+
+def cmd_all(args: argparse.Namespace) -> int:
+    gap_rc = cmd_gap_audit(args)
+    if gap_rc != 0:
+        return gap_rc
+    return cmd_cmp(args)
+
+
+def main(argv: Sequence[str] | None = None) -> int:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+    try:
+        if args.command == "gap-audit":
+            return cmd_gap_audit(args)
+        if args.command == "cmp":
+            return cmd_cmp(args)
+        if args.command == "all":
+            return cmd_all(args)
+    except Exception as exc:
+        print(str(exc), file=sys.stderr)
+        return 2
+    parser.error(f"unknown command: {args.command}")
+    return 2
+
+
+if __name__ == "__main__":
+    sys.exit(main())
diff --git a/tests/test_libcrypto_bench_paths.py b/tests/test_libcrypto_bench_paths.py
new file mode 100644
index 000000000..da0b7a9ab
--- /dev/null
+++ b/tests/test_libcrypto_bench_paths.py
@@ -0,0 +1,42 @@
+import importlib.util
+import sys
+import unittest
+from pathlib import Path
+
+
+SCRIPT_PATH = Path(__file__).resolve().parents[1] / "scripts" / "libcrypto_bench_paths.py"
+
+
+def load_module():
+    spec = importlib.util.spec_from_file_location("libcrypto_bench_paths", SCRIPT_PATH)
+    module = importlib.util.module_from_spec(spec)
+    assert spec.loader is not None
+    sys.modules[spec.name] = module
+    spec.loader.exec_module(module)
+    return module
+
+
+class LibcryptoBenchPathsTests(unittest.TestCase):
+    def test_parse_text_start_from_readelf_output(self):
+        module = load_module()
+        output = """
+  [11] .init             PROGBITS        00000000000ce5c0 0ce5c0 00001c 00  AX  0   0  4
+  [12] .plt              PROGBITS        00000000000ce5e0 0ce5e0 000990 10  AX  0   0 16
+  [13] .text             PROGBITS        00000000000cef80 0cef80 2e306e 00  AX  0   0 64
+"""
+
+        self.assertEqual(module.parse_text_start_from_readelf_output(output), 0xCEF80)
+
+    def test_parse_text_start_from_readelf_output_raises_when_missing(self):
+        module = load_module()
+
+        with self.assertRaises(RuntimeError):
+            module.parse_text_start_from_readelf_output("[ 1] .data PROGBITS 00000000 000000 000000 00 WA 0 0 1")
+
+    def test_build_parser_supports_text_start_kind(self):
+        module = load_module()
+        parser = module.build_parser()
+
+        args = parser.parse_args(["text-start"])
+
+        self.assertEqual(args.kind, "text-start")
diff --git a/tests/test_validate_libcrypto_ground_truth.py b/tests/test_validate_libcrypto_ground_truth.py
new file mode 100644
index 000000000..36608539d
--- /dev/null
+++ b/tests/test_validate_libcrypto_ground_truth.py
@@ -0,0 +1,103 @@
+import importlib.util
+import sys
+import tempfile
+import unittest
+from pathlib import Path
+
+
+SCRIPT_PATH = (
+    Path(__file__).resolve().parents[1] / "scripts" / "validate_libcrypto_ground_truth.py"
+)
+
+
+def load_module():
+    spec = importlib.util.spec_from_file_location(
+        "validate_libcrypto_ground_truth", SCRIPT_PATH
+    )
+    module = importlib.util.module_from_spec(spec)
+    assert spec.loader is not None
+    sys.modules[spec.name] = module
+    spec.loader.exec_module(module)
+    return module
+
+
+class ValidateLibcryptoGroundTruthTests(unittest.TestCase):
+    def test_resolve_default_groundtruth_prefers_libcrypto_gtblock_name(self):
+        module = load_module()
+        with tempfile.TemporaryDirectory() as tmpdir:
+            root = Path(tmpdir)
+            binary = root / "libcrypto.so.3"
+            binary.write_bytes(b"\x7fELF")
+            expected = root / "libcrypto.gtBlock.pb"
+            expected.write_bytes(b"pb")
+
+            resolved = module.resolve_default_groundtruth_path(binary)
+
+            self.assertEqual(resolved, expected)
+
+    def test_assess_cmp_payload_fails_low_metrics(self):
+        module = load_module()
+        payload = {
+            "binary": "/tmp/libcrypto.so.3",
+            "ll": "/tmp/libcrypto.so.3.ll",
+            "text_start": 0xCEF80,
+            "runnable_base": 0x50000000,
+            "obj_count": 679479,
+            "ll_count": 847589,
+            "hit": 24175,
+            "mismatch": 112625,
+            "obj_only": 542679,
+            "ll_only": 710789,
+            "false_negative": 655304,
+            "false_positive": 823414,
+            "precision": 0.028522,
+            "recall": 0.035579,
+        }
+
+        verdict = module.assess_cmp_payload(
+            payload,
+            min_precision=0.80,
+            min_recall=0.80,
+        )
+
+        self.assertFalse(verdict.ok)
+        self.assertTrue(
+            any("precision" in reason for reason in verdict.reasons),
+            verdict.reasons,
+        )
+        self.assertTrue(
+            any("recall" in reason for reason in verdict.reasons),
+            verdict.reasons,
+        )
+
+    def test_assess_cmp_payload_accepts_strong_metrics(self):
+        module = load_module()
+        payload = {
+            "binary": "/tmp/libcrypto.so.3",
+            "ll": "/tmp/libcrypto.so.3.ll",
+            "text_start": 0xCF000,
+            "runnable_base": 0x50000000,
+            "obj_count": 932819,
+            "ll_count": 847589,
+            "hit": 787556,
+            "mismatch": 1420,
+            "obj_only": 143843,
+            "ll_only": 58613,
+            "false_negative": 145263,
+            "false_positive": 60033,
+            "precision": 0.929172,
+            "recall": 0.844275,
+        }
+
+        verdict = module.assess_cmp_payload(
+            payload,
+            min_precision=0.80,
+            min_recall=0.80,
+        )
+
+        self.assertTrue(verdict.ok)
+        self.assertEqual(verdict.reasons, [])
+
+
+if __name__ == "__main__":
+    unittest.main()