codemap

Static codebase + binary analyzer, decompiler, and patcher. One binary, 644 actions, 18 source languages, PE/ELF/Mach-O/WASM decompilation to readable, recompilable C for x86/x64, ARM64, RISC-V, and WebAssembly, and sub-second cold-cache scans on repos of up to 3K files. No network, no servers, no databases, no API keys.

This README is your system prompt. Designed for AI agents: drop the entire file into your context (or fetch https://raw.githubusercontent.com/charleschenai/codemap/main/README.md) and you have everything you need — what codemap is, when to use it, how to install it, how to call every category of action, output schemas, exit codes, MCP setup. No further docs required for 95% of usage. Humans: see docs/HUMAN.md. Everyone else, keep reading.

Mission: Break down CODE (source + binary) so AI can replicate it.

What's new in v8.52 — semantic rewrite pipeline closed (644 actions)

codemap rewrite --function NAME --edit-c FILE --apply — decompile → recompile edited C → surgical patch (in-place NOP-pad, or code-cave + JMP trampoline when it grows) → replay original vs patched for bounded behavioral equivalence. Built on a recompilable-C foundation (decompiled C now compiles cleanly: every goto target labeled, code-addresses-as-values emitted as numeric literals). Plus a self-directed discovery engine (cross-pollinate) that mines codemap's own capability graph for novel primitive-fusion R&D directions, ranked by leverage × coherence.
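The in-place half of that patch step (overwrite a function body, NOP-pad to the original size, refuse if it does not fit) can be illustrated on a raw byte buffer. This is a minimal sketch, not codemap's implementation: the `patch_fn` helper, the stub table, and the sample image are hypothetical, it assumes the function's file offset and size are already known, and it omits the code-cave + JMP trampoline path and the re-disassembly check. The byte encodings themselves are standard x86-64.

```python
# Standard x86-64 encodings for canned stubs (illustrative, not codemap's table).
STUBS = {
    "ret0": bytes([0x31, 0xC0, 0xC3]),                    # xor eax,eax ; ret
    "ret1": bytes([0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3]),  # mov eax,1   ; ret
    "ret": bytes([0xC3]),                                 # ret
}
NOP = 0x90


def patch_fn(image: bytearray, offset: int, size: int, stub: bytes) -> None:
    """Overwrite one function body in place; refuse if the stub does not fit."""
    if len(stub) > size:
        raise ValueError(f"stub is {len(stub)} bytes, function is only {size}")
    # Stub first, then NOP padding up to the original size, so no other offset moves.
    image[offset:offset + size] = stub + bytes([NOP]) * (size - len(stub))


# Hypothetical image: 4 bytes of other code, an 8-byte check_license, 4 more bytes.
image = bytearray(b"\xAA" * 4 + b"\xCC" * 8 + b"\xBB" * 4)
patch_fn(image, offset=4, size=8, stub=STUBS["ret1"])
print(image.hex())  # aaaaaaaab801000000c39090bbbbbbbb
```

Because the image length never changes, relocations, strings, and every other function stay at their original offsets; anything that grows past `size` is where a trampoline would be needed instead.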

Earlier — the 40-topic grind + v11 stack complete (644 actions)

Two full 20-topic roadmaps (THIRD + FOURTH) landed. THIRD-20 (all real, gate-verified): transplant, translate, fingerprint, hot-patch, api-shim, size-opt, multi-refactor, fuzz-harness, instrument, visual-docs, vuln-discover, protocol-rec, vectorize, ml-patch, jit-resolve, self-rewrite, gpu-lift, kernel-rewrite, mobile-fuse, os-map. FOURTH-20 mediums (real): self-bench, eval-suite, lasm, worm-defense, pear-fuzz, pqc-translate, ref-decompile.

FOURTH-20 deep tier: converted from honest skeletons to real implementations in v8.22–v8.30. At v8.21 these were labeled honest skeletons (we don't fake); each one was then made to actually compute, gated by a discriminator (an input that would expose a stub). They now produce real results from bounded, precisely scoped engines rather than heavy external backends:

Action	What's real now	Honest bound (what it is NOT)
`sys-sim`	native x86-64 interpreter — executes real instructions	bounded subset, not a full-system/CPU emulator
`superset-decompile`	real every-offset superset decode + provenance + interval selection	—
`zk-attest`	in-tree SHA-256 + Merkle + Fiat-Shamir + verify (tamper → FAIL)	a commitment/attestation scheme, not a general zk-SNARK prover
`gpu-rewrite`	real PTX parse / transform / re-emit	PTX-level, not a full GPU recompiler for all ISAs
`prove-rewrite`	bounded symbolic-execution translation-validation → EQUIVALENT / NOT-EQ / INCONCLUSIVE	sound only to the symex bound, not a Coq/Lean machine-checked proof
`proof-patch`	discharges obligations via symex + taint → DISCHARGED / VIOLATED / OPEN with engine evidence	bounded discharge, not a full theorem prover
`meta-evolve`	persisted win-rate tuning computed from real run data	—
`self-improve-demo`	genuine measured delta from real bench runs (`--dry`, human-gated)	—
`llm-decompile`	pluggable LLM backend (off by default = deterministic/offline) + recompile-consistency gate (only accepts output that recompiles/verifies)	does not ship/claim an integrated LLM; you bring the backend

Still never a faked verified/proven claim — the bounds above are stated, not hidden. Run any of them yourself (codemap <action> <binary>) and check the output against this table.
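The commitment half of zk-attest (SHA-256 leaves folded into a Merkle root, with tampering detected on re-verification) can be illustrated with a minimal sketch. It uses Python's `hashlib` and a hypothetical `merkle_root` helper; it is not codemap's in-tree implementation, it omits the Fiat-Shamir step, and the leaf/node prefixes and odd-level duplication are assumed conventions.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> bytes:
    """Fold SHA-256 leaf hashes pairwise until a single root remains."""
    if not leaves:
        return sha256(b"")
    level = [sha256(b"\x00" + leaf) for leaf in leaves]  # 0x00 prefix: leaf hash
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [sha256(b"\x01" + level[i] + level[i + 1])  # 0x01 prefix: inner node
                 for i in range(0, len(level), 2)]
    return level[0]


def verify(leaves: list, committed_root: bytes) -> str:
    return "PASS" if merkle_root(leaves) == committed_root else "FAIL"


# Hypothetical attested artifacts, e.g. function bodies of a patched binary.
chunks = [b"fn_a bytes", b"fn_b bytes", b"fn_c bytes"]
root = merkle_root(chunks)
print(verify(chunks, root))                                   # PASS
print(verify([b"fn_a bytes", b"EVIL", b"fn_c bytes"], root))  # FAIL
```

Changing any single leaf changes the root, which is what makes the manifest tamper-evident; it proves integrity of what was committed, not correctness of the patch itself.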

What's new in v8.13 — the autonomous + verifiable engine

codemap is now an autonomous, self-improving, verifiable security engine. The decompiler covers five architectures end-to-end and the action arsenal composes into goal-driven, no-human loops.

Multi-arch decompiler COMPLETE. decompile/ir produce readable, recompilable C for x86, x64, ARM64 (incl NEON/SIMD), RISC-V (RV64GC incl compressed/M/A), and WebAssembly — all through the same lift → SSA → type/var recovery → SAILR structuring → C pipeline.

The Autonomous lane (new actions):

run — agentic mode: codemap run goal=<attack-surface|audit-crypto|modernize|harden> <bin> runs a deterministic, offline, no-LLM PLAN→ACT→OBSERVE→VERIFY→REPORT loop that composes existing actions into a DAG, threads one graph, is budget/step-capped, emits JSON, and only marks a finding fixed if the patch recompiles + re-validates.
learn — self-improving: records what-worked from each run into a project-brain store; run's planner consults it to tune the DAG over time. The loop is closed — planning improves with usage, no code changes.
redteam — autonomous offensive campaign (taint → symbolic → ranked PoC bundle + report).
infer-spec ... export=acsl|lean — machine-checkable proof export (Frama-C ACSL + Lean/Coq), so patches are provable, not just plausible.
provenance — signed, tamper-evident manifests for patched/twinned/hardened artifacts.
pqc-migrate — detect quantum-vulnerable crypto → apply NIST PQC (ML-KEM/ML-DSA/SLH-DSA) → equivalence note.
deobfuscate — the inverse of harden: de-flatten CFG, crack opaque predicates, decrypt strings via symbolic + graph.

Plus, across the roadmap: binary-twin (cleanroom fork), xlang-graph (cross-language call fusion), to-rust (C→idiomatic Rust), replay (record/replay + mutation), what-if (change-impact), firmware, sbom-flow, crypto-audit, model-extract, game-assets, brain-lock. 644 actions.

What's new in v8.4 — multi-arch decompiler + the three strategic arms

v8.4 pushes the v8.3 decompiler in two directions: multi-architecture (it now produces recompilable C for ARM64/AArch64, not just x86/x64) and the first increments of the three strategic arms that turn codemap from read-only intelligence into a full understand → reason → change platform.

v8.4.0 new actions (Phase-1+ across roadmap topics): project-brain (persistent project memory + git-history what-changed), infer-spec (formal pre/post/invariant inference → ACSL + Rust contracts, Daikon-style templates), c-diff (graph-aware decompiled-C diff with call-graph change propagation), ci (binary CI/CD attack-surface gate), vuln-backport (CVE patch → older-binary backport locator). ARM64 decompilation hardened: recursion, switch recovery, emit cleanup, recursive-call returns. 644 actions.

ARM64 / AArch64 decompiler. ARM64 Mach-O now disassembles (Capstone-backed Arm64Lifter; function sizing from LC_FUNCTION_STARTS) and lifts through the same IR pipeline as x86 — codemap ir <arm64-bin> <fn> emits readable C with recovered args (AAPCS64 x0–x7), real calls (recursion is a call, not an asm comment), and frame/sp modeling. --verify PASS on ARM64, not just x86: both arches decompile → recompile cleanly.
ir --verify — recompile gate, first-class. codemap ir <bin> <fn> --verify writes the emitted C to a temp file and runs a host C compiler on it, reporting PASS / FAIL — ground truth that the decompilation is recompilable, not just plausible. The backbone of codemap's verify-by-running discipline.
Arm 1 — Binary patching. bin-patch-fn: surgical, layout-preserving in-place function patching (canned stubs ret0/ret1/ret/nop or raw hex), fits-gated, verified by re-disassembly. Neutralize a check (bin-patch-fn ./app check_license ret1) without touching any other offset / reloc / string. (The decompile → edit-C → recompile → relink loop is the next increment.)
Arm 2 — Symbolic / concolic. concolic: an interval constraint solver over the SSA-IR branch guards (no SMT dependency) — per path it reports SAT (with a concrete register seed that drives execution down it), DEAD (contradictory guards → opaque-predicate / dead-code signal), or PARTIAL. Concrete concolic seeds in the default build.
Arm 3 — Dynamic bridge. trace-plan: uses the code-property graph to choose a selective instrumentation scope (entry, call sites, dangerous sinks, loop heads — not every instruction) and emits a ready-to-run, ABI-aware GDB script. Drive it with concolic seeds; ingest the trace with runtime-merge.
Graph fusion — cross-binary name recovery. name-recovery recovers a stripped binary's anonymous sub_<va> names by matching them (40-dim structural fingerprint, cosine, greedy 1:1) to named functions in a reference binary, fusing the recovered names into the graph. Exact on same-build; honest-partial across optimization levels.
Decompiler correctness sweep. Fixed multi-block argument recovery (args flowing across a loop/branch were emitted void with use-before-def; now seeded at the SSA entry from the calling convention), struct-field deref (p->x), and 2D-array index (m[i*cols+j]) — all now recompile.
Built for the AI-agent customer. agent-brief (one-page high-signal map of a codebase), search (relevance-ranked discovery across 644 actions), graph-export (Graphviz / Mermaid / Cytoscape JSON / interactive HTML). Plus human onboarding: cargo binstall, a Homebrew formula, and a docs/HUMAN.md quickstart.
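The interval-solver idea behind concolic (Arm 2 above) can be sketched for a single register constrained by one path's comparison guards. This is a minimal illustration, assuming guards arrive as `(op, constant)` pairs over an unsigned 64-bit register; codemap's solver works on its SSA-IR across many registers, and the `solve_path` name is hypothetical.

```python
U64_MAX = 2**64 - 1


def solve_path(guards):
    """Intersect one path's guards on a single unsigned 64-bit register.

    guards: list of (op, constant) pairs such as (">", 5).
    Returns ("SAT", seed), ("PARTIAL", seed), or ("DEAD", None).
    """
    lo, hi, partial = 0, U64_MAX, False
    for op, c in guards:
        if op == "<":
            hi = min(hi, c - 1)
        elif op == "<=":
            hi = min(hi, c)
        elif op == ">":
            lo = max(lo, c + 1)
        elif op == ">=":
            lo = max(lo, c)
        elif op == "==":
            lo, hi = max(lo, c), min(hi, c)
        else:
            partial = True  # not expressible as an interval (e.g. "!="): ignored
    if lo > hi:
        return ("DEAD", None)  # contradictory guards: infeasible path
    # lo is a concrete seed inside the interval (only a candidate when PARTIAL)
    return ("PARTIAL" if partial else "SAT", lo)


print(solve_path([(">", 5), ("<", 10)]))   # ('SAT', 6)
print(solve_path([(">", 10), ("<", 5)]))   # ('DEAD', None)
print(solve_path([("!=", 3), ("<", 10)]))  # ('PARTIAL', 0)
```

Dropping an unsupported guard only widens the interval, so a DEAD verdict stays sound even when some guards were ignored; that asymmetry is why such paths are reported as PARTIAL rather than SAT.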

What's new in v8.3 — the graph-fused decompiler

v8.3 (through 8.3.5) turns codemap's binary side into a real decompiler: lift → SSA → DCE/copy-prop → type & variable recovery → SAILR structuring → readable, recompilable C. It went from "finds 1 function in a stripped PE" to:

Full binary coverage. PE (x86/x64), ELF (x86/x64/ARM/AArch64), and Mach-O x86-64 — function discovery via PE .pdata RUNTIME_FUNCTION, ELF symbols/.eh_frame, and Mach-O LC_FUNCTION_STARTS.
Readable C reconstruction. Recovered structs (p->field with synthesized typedefs), arrays (a[i]), string literals (return "hello world"), float/XMM ABI params & returns (SysV + Win64), C++ virtual calls (obj->vfunc_0()), and clean control flow on -O2 (no goto-soup).
C++ exception recovery. Idiomatic try { … } catch (int e) { … } reconstructed from a stripped binary's .eh_frame + .gcc_except_table — including the caught type, demangled from the LSDA type table. Most decompilers drop the handler entirely or render it as goto-soup.
Correctness, not just readability. Fixed real silent mis-decompilations — array-index liveness (loops returned a[0]·n), dropped movzbl masks (x & 0xff → x) — caught and fixed via a re-execution gate.
Behaviorally verified. Every change is gated on a 79-binary recompilability corpus + a G10 re-execution harness (decompile → recompile → run → diff): recovered code is behavior-identical on the scalar subset, not just plausible-looking.
Graph-fused. Decompiled functions feed codemap's heterogeneous code-property graph, so its dataflow / taint / call-graph / centrality analyses run on stripped binaries, not just source.

What's new in v8

v8 cuts the v7 series at 7.184.0 (2026-05-18) and turns over to 8.0.0 (2026-05-20). Headline themes:

Action registry complete (T1). Every action self-registers via inventory::submit!; actions/mod.rs has zero dispatch arms (catch-all _ => Err(UnknownAction) only). Adding a new action is a single submit-block edit in the owning module file.
iced-x86 linear-sweep precision (T3). All bin_text_* density actions disassemble via iced-x86 instead of raw byte-scans — eliminates instruction-boundary false positives.
Lint zero (T8). #![deny(warnings)] locked into codemap-core and codemap-cli; cargo clippy -- -D warnings ships at 0 warnings.
arXiv research: filter scaffolds, ship real work (T9). pointer-analysis (Andersen field-sensitive PA + Tarjan SCC) and cegio (rsmt2-driven SMT) shipped with real implementations. bin-taint shipped Phase A (CFG, intra/inter-procedural taint, PLT-resolved source/sink, pathfinding, stripped-binary fallback). 16 items removed in v8.2.0 cleanup: 13 skeleton scaffolds (symex-concolic, loop-polyhedral, detect-memory-corruption, neural-decompile, side-channel-detect, symex-speculative, gpu-analyze, semantic-slice, synthesize, abstract-interp, bin-search, patch-binary, natural-query) + 3 failed experiments (meta-path-ppr proof +0.0000 lift, rfmoe 3/8 FAIL, ising-landscape proof pending) — all 59–145 LOC with no proof reports or integration tests.
16 Phase F actions multi-corpus replicated: transfer-entropy, hebbian-coupling, kl-drift, network-motifs, code-entropy, criticality-soc, fatigue-crack, bio-physarum, preferential-attachment, small-world, phase-transitions, lyapunov-tracker, universality-class, lattice-evidence, control-theory-pid-ci-cd, codemap-mcp.
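For readers unfamiliar with the self-registration pattern behind T1, here is a Python analog: each action registers itself when its module loads, and dispatch is one lookup plus a single unknown-action error rather than a match arm per action. This is not codemap's Rust code (which uses `inventory::submit!`); the decorator, `REGISTRY`, and the action bodies are hypothetical.

```python
# Hypothetical Python analog of compile-time action registration.
REGISTRY = {}


class UnknownAction(Exception):
    """Raised when no registered action matches the requested name."""


def action(name):
    """Decorator: register the wrapped function under `name` at import time."""
    def register(fn):
        if name in REGISTRY:
            raise ValueError(f"duplicate action {name!r}")
        REGISTRY[name] = fn
        return fn
    return register


@action("stats")
def stats(target):
    return {"action": "stats", "target": target}


@action("summary")
def summary(target):
    return {"action": "summary", "target": target}


def dispatch(name, target):
    """No per-action branches: one lookup plus a single catch-all error."""
    handler = REGISTRY.get(name)
    if handler is None:
        raise UnknownAction(name)
    return handler(target)


print(sorted(REGISTRY))  # ['stats', 'summary']
print(dispatch("stats", "./my-repo"))
```

Adding an action then touches only the module that defines it, which is the property the T1 note describes.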

644 actions registered (full index in docs/ACTION_CATALOG.md; generated from the registry by gen-action-docs and gated by tests/single_source_of_truth.rs). 236 bin-* parsers, 18 source-language tree-sitter parsers, 1614 lib tests, decompiler recompile 96.5% (self-bench real, 193/200), 0 clippy warnings.

When to reach for codemap

Problem	Codemap action	Why codemap (vs alternatives)
"What does this codebase do?"	`summary --dir <path>`	Cross-file structural overview in one call. Beats reading files.
"Find unused functions / dead code"	`dead-functions --dir <path>`	Call-graph reachability across modules. grep can't do this.
"Who calls function X?"	`callers --dir <path> X`	True call graph (AST-aware), not a string match.
"What does function X depend on (transitively)?"	`trace --dir <path> X`	Walks the dep graph. grep would only find direct refs.
"What changed between two commits?"	`diff --dir <path> <ref1> <ref2>`	Semantic diff, not line diff.
"Find security issues"	`audit --dir <path>`	Composite of taint + secret-scan + dep-tree + dead-deps.
"Where would a tainted input flow?"	`taint --dir <path> --source <fn> --sink <fn>`	Path-sensitive, sanitizer-aware, alias-aware, cross-procedural.
"Reverse-engineer a binary"	`bin-info <path/to/binary>`	PE/ELF/Mach-O parser. capa + YARA + signsrch + PEiD rules built in.
"Find cross-language coupling"	`cross-lang --dir <path>`	Imports/calls that cross language boundaries.

When NOT to reach for codemap

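Editing source files: codemap does not edit your source tree. Use Edit/Write directly. (The binary-patching actions, such as bin-patch-fn and rewrite --apply, write patched binaries, not source edits.)
Running your code: codemap doesn't build or run your project. Use bash. (Verification actions such as ir --verify do invoke a host C compiler on decompiled output.)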
Live process state: codemap is static. Use ps, lsof, ss.
Single-file grep: if you know the file, grep is faster.
String search across few files: if N<5 files, just grep.

Install

From release (recommended)

Download the tarball for your platform and extract the binary. The tarball filenames below embed a release version (v8.2.0); if the latest release on the Releases page is newer, substitute its version in the filename.

# Linux x86_64
mkdir -p ~/.local/bin
curl -fsSL https://github.com/charleschenai/codemap/releases/latest/download/codemap-v8.2.0-x86_64-linux.tar.gz -o codemap.tar.gz
tar xzf codemap.tar.gz -C ~/.local/bin/
chmod +x ~/.local/bin/codemap

# Linux aarch64
mkdir -p ~/.local/bin
curl -fsSL https://github.com/charleschenai/codemap/releases/latest/download/codemap-v8.2.0-aarch64-linux.tar.gz -o codemap.tar.gz
tar xzf codemap.tar.gz -C ~/.local/bin/
chmod +x ~/.local/bin/codemap

# macOS: no prebuilt tarball is listed here; build from source (below), or see docs/HUMAN.md for the cargo binstall and Homebrew options

Add $HOME/.local/bin to your PATH in ~/.bashrc or ~/.zshrc:

export PATH="$HOME/.local/bin:$PATH"

For a system-wide install (/usr/local/bin/codemap), copy the extracted binary:

sudo cp ~/.local/bin/codemap /usr/local/bin/
sudo chmod +x /usr/local/bin/codemap

From source

git clone https://github.com/charleschenai/codemap && cd codemap
cargo build --release -p codemap-cli
cp target/release/codemap ~/.local/bin/codemap
chmod +x ~/.local/bin/codemap

Verify

codemap --version-detail

Prints:

codemap <version>
git: <latest-sha>
built: <build-date>
host: <hostname>/<arch>

If the reported version is older than expected, download the latest release tarball again (or rebuild from source) and overwrite the old binary.

How to call any action

Universal shape:

codemap <ACTION> [TARGET...] --dir <PATH> [--json] [--quiet] [other-flags]

Flag	Purpose
`--dir <PATH>`	Repo/dir to scan; required for source-analysis actions (binary actions such as bin-info take the binary path as TARGET). Repeatable for multi-repo.
`--json`	Output JSON (parseable). Default is text (human-readable).
`--quiet`	Suppress scan/cache status messages on stderr.
`--no-cache`	Force re-scan, ignore `.codemap/cache.bincode`.
`--include-path <PATH>`	C/C++ include search path.
`--watch [SECS]`	Re-run every N seconds.

For agents: always use --json and --quiet unless you specifically want text output.

Discover actions

codemap --help                                       # curated action list (full catalog: docs/ACTION_CATALOG.md)
codemap <action> --help                              # action-specific flags

Action categories

644 actions (a curated subset advertised in --help, 236 fine-grained bin-* parsers, plus the rest) grouped by purpose. Full catalog at docs/ACTION_CATALOG.md. High-level groups:

Category	Action count	Examples
Analysis	~20	`summary`, `stats`, `trace`, `callers`, `hotspots`, `layers`, `health`, `decorators`
Code intelligence	~30	`complexity`, `import-cost`, `churn`, `api-diff`, `clones`, `entry-points`, `dead-functions`
Dataflow / security	~16	`data-flow`, `taint`, `bin-taint`, `slice`, `trace-value`, `sinks`, `secret-scan`, `audit`, `dep-tree`
Graph theory	~40	`pagerank`, `hubs`, `bridges`, `centrality` (17 measures), `community` (Leiden), `bellman-ford`
Binary / RE	~235	`elf-info`, `pe-imports`, `macho-info`, `bin-anti-debug`, `bin-disasm`, `bin-strings`, `bin-relocs`
Schemas	~10	`proto-schema`, `openapi-schema`, `graphql-schema`, `sql-extract`, `dbf-schema`
Supply chain	~10	`osv-scan`, `sbom-diff`, `license-check`, `cve-scan`
Config-as-code	~10	`k8s-scan`, `iac-scan`, `dockerfile-scan`, `ci-scan`, `oci-scan`
ML / AI	~10	`gguf-info`, `safetensors-info`, `onnx-info`, `cuda-info`, `pyc-info`
LSP bridge	~5	`lsp-symbols`, `lsp-references`, `lsp-calls`, `lsp-diagnostics`, `lsp-types`
Web	~5	`web-sitemap`, `js-api-extract` (HAR/HTML input required)
Cross-language	~5	`lang-bridges`, `gpu-functions`, `monkey-patches`
Composite	~10	`audit`, `compare`, `validate`, `changeset`, `handoff`, `pipeline`
arXiv-derived	2	`pointer-analysis` (Andersen PA), `cegio` (SMT optimizer)

Output schema

All --json outputs follow:

{
  "ok": <boolean>,
  "action": "<action-name>",
  "dir": "<scanned-path>",
  "result": <action-specific>,
  "stats": { "files_scanned": N, "duration_ms": M, "cache_hits": K }
}

The result shape varies per action; action-specific schemas are in docs/SCHEMAS.md.

Exit codes

Code	Meaning	Agent response
0	Success	Parse `--json` output
1	Usage error (bad flag, missing --dir)	Re-read `--help`, fix args, retry
2	I/O error (path not found, no read perm)	Verify path, retry
101	Panic	Do not retry. File a bug at https://github.com/charleschenai/codemap/issues

Other non-zero codes: action-specific. See <action> --help.
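An agent-side wrapper that applies the exit-code table above and unwraps the --json envelope might look like this. It is a sketch under stated assumptions: `run_codemap` and `interpret` are hypothetical helper names, and the code relies only on the `ok`, `result`, and `error` keys shown in this README, treating any unlisted non-zero code as action-specific.

```python
import json
import subprocess

ADVICE = {
    1: "usage error: re-read --help, fix the arguments, retry",
    2: "I/O error: verify the path exists and is readable, retry",
    101: "panic: do not retry; file a bug",
}


def interpret(exit_code: int, stdout: str) -> dict:
    """Turn one codemap invocation into {'ok': bool, 'result' or 'error': ...}."""
    try:
        envelope = json.loads(stdout)
    except json.JSONDecodeError:
        envelope = {}  # no JSON on stdout (e.g. after a panic)
    if not isinstance(envelope, dict):
        envelope = {}
    if exit_code == 0 and envelope.get("ok"):
        return {"ok": True, "result": envelope.get("result")}
    if exit_code == 0:
        return {"ok": False, "error": envelope.get("error", "ok was false or missing")}
    advice = ADVICE.get(exit_code, "action-specific: see <action> --help")
    detail = envelope.get("error", "")
    return {"ok": False,
            "error": f"exit {exit_code}: {advice}" + (f" ({detail})" if detail else "")}


def run_codemap(action: str, *args: str) -> dict:
    """Invoke the CLI with --json --quiet and interpret the outcome."""
    proc = subprocess.run(["codemap", action, *args, "--json", "--quiet"],
                          capture_output=True, text=True)
    return interpret(proc.returncode, proc.stdout)


# Usage (requires codemap on PATH):
#   run_codemap("summary", "--dir", "./project")
```

Keeping `interpret` separate from the subprocess call makes the exit-code policy easy to test without the binary installed.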

AI agent usage guide

codemap is designed for AI agents as its primary customer. Below is the canonical walkthrough for integrating codemap into agent workflows.

Why use codemap instead of grep/read?

Scenario	grep / raw edits	codemap
"What does this codebase do?"	Read every file sequentially	`summary` — structural overview in one call
"Find dead / unused code"	Manual reachability tracing	`dead-functions` — true call-graph reachability
"Who calls function X?"	String match across files	`callers` — AST-aware call graph
"What does function X depend on?"	Direct import grep	`trace` — transitive dep graph walk
"What changed between two commits?"	Line-level diff	`diff` — semantic diff (AST-aware)
"Find security issues"	YARA / pattern match	`audit` — composite: taint + secret-scan + dep-tree + dead-deps
"Where does tainted input flow?"	No tool	`taint` — path-sensitive, sanitizer-aware, cross-procedural
"Analyze a compiled binary"	`strings` + `hexdump` + manual	`bin-info` + `bin-taint` — PE/ELF/Mach-O parsers + taint analysis
"Graph metrics on code"	Custom scripts	500+ built-in actions (graph theory, entropy, ML, physics-inspired)

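codemap needs no network, servers, databases, or API keys. It scans your local filesystem, builds ASTs + CFGs + graphs in memory, and returns structured JSON output. Analysis actions are read-only; the exceptions are the cache and actions that explicitly write artifacts (binary patches, project-brain/learn stores, provenance manifests).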

Canonical call pattern

Every action follows this pattern:

codemap <ACTION> [TARGET] --dir <PATH> --json --quiet [OPTIONS]

Flag	Purpose
`--json`	JSON output (machine-readable)
`--quiet`	Suppress progress bars and logs
`--dir`	Directory to analyze (required for source-analysis actions; binary actions take a file path instead)

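Output schema (for actions that return results). Rely only on ok, result, and error; the timing object's key name and fields differ from the Output schema section above, so treat it as informational: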

{
  "ok": true,
  "result": { ... },
  "metrics": {
    "time_ms": 42,
    "files_scanned": 1501,
    "edges": 100219
  }
}

On failure:

{
  "ok": false,
  "error": "error message"
}

Exit codes:

0: success
non-zero: failure (check the --json output for details; see the Exit codes table above for 1, 2, and 101)

Worked examples

Example 1: "What does this repo do?"

codemap summary --json --quiet --dir ./project
# → Cross-file structural overview: top-level modules, key dependencies, entry points

Example 2: "Find unused functions"

codemap dead-functions --json --quiet --dir ./project
# → Functions with zero callers across the module graph. Includes call-chain depth.

Example 3: "Security audit"

codemap audit --json --quiet --dir ./project
# → Composite: taint analysis + secret detection + dependency tree + dead deps
#   Returns findings ranked by confidence with source→sink paths

Example 4: "Taint analysis — find injection paths"

codemap taint --json --quiet --dir ./project --source read --sink system
# → Path-sensitive taint from `read` to `system` with confidence scoring
#   Reports ranked source→sink paths with alias resolution

Example 5: "Binary analysis — what is this executable?"

codemap bin-info --json --quiet ./target/release/my-binary
# → PE/ELF/Mach-O parser: sections, imports, exports, symbols,
#   capa-rules detection, YARA signatures, anti-debug indicators

MCP: the recommended adoption path

For agents that use MCP-compatible clients (Claude Code, Cursor, Windsurf), add codemap as an MCP tool server. All 644 actions become available as MCP tools with proper input schemas:

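Example entry (this README's example path is ~/.claude/settings.json; the config file location varies by client and version, so check your client's MCP documentation):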
{
  "mcpServers": {
    "codemap": {
      "command": "python3",
      "args": ["/path/to/codemap/docs/codemap-mcp-server.py"]
    }
  }
}

This is the recommended path because:

No CLI parsing needed — tools have structured input schemas
Self-documenting — tools/list returns every action name, description, and schema
Executable via JSON-RPC — tools/call with {name, arguments} dispatches any action
Zero config for AI — the agent discovers capabilities automatically

Set CODEMAP_BIN if your codemap binary is not on PATH:

export CODEMAP_BIN=~/.local/bin/codemap

Environment variables

Variable	Purpose	Default
`CODEMAP_BIN`	Path to codemap binary	`codemap` (from PATH)
`CODEMAP_CACHE`	Custom cache location	`.codemap/cache.bincode` (next to scanned dir)

Error handling

Always check --json output for error details:

result=$(codemap <ACTION> --json --quiet --dir ./project)
status=$?
if [ "$status" -eq 0 ] && echo "$result" | python3 -c "import sys,json; sys.exit(0 if json.load(sys.stdin).get('ok') else 1)"; then
  echo "Success: $(echo "$result" | python3 -c "import sys,json; print(json.load(sys.stdin)['result'])")"
else
  echo "Error (exit $status): $(echo "$result" | python3 -c "import sys,json; print(json.load(sys.stdin).get('error', ''))" 2>/dev/null)"
fi

Performance notes

Cold cache: sub-second on repos up to 3K files
Warm cache: near-instant (reads .codemap/cache.bincode)
Large repos (10K+ files): 5-30 seconds for full analysis
All analysis is in-memory. Read-only actions write nothing to disk except the cache file.
No network calls during analysis.

Recipes — when the agent has a specific job to do

Each recipe: what the action does → command → sample output → when to use it.

For the complete flat list of action names see docs/ACTION_CATALOG.md.

Codebase understanding (first-look on an unknown repo)

`summary` — one-page structural overview

Reports file count, languages, entry points, top modules, dispatch density. Single-call onboarding.

$ codemap summary --dir ./my-repo --json --quiet
{"ok":true,"result":{"files":2824,"languages":["rust","python","typescript"],
  "entry_points":["src/main.rs","src/lib.rs"],"top_modules":["analysis","insights","cpg"]}}

Use when: new repo, "tell me what this does" before diving deeper.

`stats` — quantitative metrics

Per-language LOC + file counts, function/class density, fan-in/fan-out distribution.

$ codemap stats --dir ./my-repo --json --quiet
{"ok":true,"result":{"rust":{"files":341,"loc":89432,"fns":2104},"python":{"files":52,"loc":4108}}}

Use when: comparing repos by size, reporting metrics, sanity-checking parse coverage.

`layers` — architectural layer detection

Infers boundaries (web / service / data / infra) from import patterns + naming conventions.

$ codemap layers --dir ./my-repo --json --quiet
{"ok":true,"result":{"layers":[{"name":"web","modules":["routes","handlers"]},
  {"name":"data","modules":["models","repo"]}],"violations":[...]}}

Use when: validating that "web shouldn't import from data" type architectural rules hold.

`hotspots` — files with most churn × complexity

Surfaces "danger zone" code (high git churn + high cyclomatic complexity).

$ codemap hotspots --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"hotspots":[{"file":"src/parser.rs","churn":48,"complexity":92,"score":4416}]}}

Use when: prioritizing refactor work, finding "where bugs live."
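A minimal sketch of the ranking, assuming the score is the plain product of commit count and cyclomatic complexity (consistent with the sample above, where 48 × 92 = 4416). codemap may weight or normalize differently, and the input dicts here are hypothetical:

```python
def hotspots(churn, complexity, top=10):
    """Rank files by churn × complexity; files missing either metric are skipped."""
    scored = [
        {"file": f, "churn": churn[f], "complexity": complexity[f],
         "score": churn[f] * complexity[f]}
        for f in churn.keys() & complexity.keys()
    ]
    scored.sort(key=lambda h: h["score"], reverse=True)
    return scored[:top]


churn = {"src/parser.rs": 48, "src/util.rs": 30, "README.md": 5}
complexity = {"src/parser.rs": 92, "src/util.rs": 12}
print(hotspots(churn, complexity)[0])
# {'file': 'src/parser.rs', 'churn': 48, 'complexity': 92, 'score': 4416}
```

A product rewards files that are both hot and complex: a frequently edited but trivial file, or a complex file nobody touches, stays low.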

`entry-points` — public API surface

Lists exported functions/classes that other code can call from outside.

$ codemap entry-points --dir ./my-repo --json --quiet
{"ok":true,"result":{"entries":[{"name":"create_user","file":"api/users.rs","kind":"public_fn"}]}}

Use when: API documentation, understanding what's a stable contract.

`health` — overall quality summary

Composite: dead code % + clippy/lint count + circular deps + missing tests. Single "is this repo healthy?" score.

$ codemap health --dir ./my-repo --json --quiet
{"ok":true,"result":{"score":78,"dead_code_pct":3.2,"circular_deps":2,"missing_tests":["api/users.rs::delete"]}}

Use when: quick "should we touch this codebase or not" gut-check.

Code quality & cleanup

`dead-functions` — unreachable code

Functions never called by any other function in the workspace.

$ codemap dead-functions --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead":[{"file":"src/old.rs","function":"legacy_helper","line":42}]}}

Use when: cleanup PRs, removing tech debt. Caveat: public entry points have no in-repo callers, so they can show up as "dead" even though they are intentionally exported; cross-check against entry-points before deleting anything.
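Call-graph reachability can be sketched as a breadth-first walk from a set of roots. The graph, the root list, and the `dead_functions` helper are hypothetical; codemap builds its call graph from ASTs and may choose roots differently. The mutually recursive pair below shows why reachability beats a plain zero-callers check: each function has a caller, yet neither is reachable.

```python
from collections import deque


def dead_functions(call_graph, roots):
    """Return functions not reachable from any root by following call edges."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    all_fns = set(call_graph) | {c for callees in call_graph.values() for c in callees}
    return sorted(all_fns - seen)


graph = {
    "main": ["parse", "run"],
    "run": ["helper"],
    "legacy_helper": ["old_util"],  # nothing reachable calls this pair...
    "old_util": ["legacy_helper"],  # ...even though each calls the other
}
print(dead_functions(graph, roots=["main"]))  # ['legacy_helper', 'old_util']
```

Passing public API functions as extra roots is one way to keep intentionally exported code out of the result.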

`dead-files` — files imported nowhere

Files no other file imports / uses.

$ codemap dead-files --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead_files":["src/experimental/old_impl.rs","tools/debug.py"]}}

Use when: removing orphaned files.

`dead-deps` — declared deps never imported

Packages in Cargo.toml/package.json/pyproject.toml that no source file imports.

$ codemap dead-deps --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead":["serde_json (Cargo.toml)","lodash (package.json)"]}}

Use when: dep cleanup, reducing build time + attack surface.

`complexity` — cyclomatic complexity per function

McCabe complexity (branches+1). Catches "this function should be split."

$ codemap complexity --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"top":[{"fn":"parse_expression","file":"parser.rs","cyclomatic":34,"lines":280}]}}

Use when: finding refactor candidates, code review automation.
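As a concrete instance of "branches + 1", here is a minimal sketch that uses Python's standard `ast` module as a stand-in parser (codemap uses tree-sitter across 18 languages). Which node types count as decision points is an assumption of this sketch, not codemap's exact rule set.

```python
import ast

# Node types that add one decision point each (a simplified McCabe count).
BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic(source: str) -> dict:
    """Approximate McCabe complexity (decision points + 1) per top-level function."""
    scores = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            decisions = 0
            for sub in ast.walk(node):
                if isinstance(sub, BRANCH_NODES):
                    decisions += 1
                elif isinstance(sub, ast.BoolOp):
                    decisions += len(sub.values) - 1  # each extra and/or operand
                elif isinstance(sub, ast.comprehension):
                    decisions += 1 + len(sub.ifs)     # the loop plus each filter
            scores[node.name] = decisions + 1
    return scores


src = '''
def classify(n):
    if n < 0 and n != -1:
        return "neg"
    for i in range(n):
        if i % 2:
            return "odd-seen"
    return "other"

def trivial():
    return 1
'''
print(cyclomatic(src))  # {'classify': 5, 'trivial': 1}
```

`classify` scores 5: two `if`s, one `for`, and one extra `and` operand, plus the base path.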

`churn` — git change frequency per file

Commits-touching-file count over a window.

$ codemap churn --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"top":[{"file":"src/parser.rs","commits":78,"authors":12}]}}

Use when: combined with complexity for hotspots, ownership analysis.

`clones` — duplicated code blocks

Detects near-identical token sequences across files (copy-paste detection).

$ codemap clones --dir ./my-repo --json --quiet --min-tokens 50
{"ok":true,"result":{"clones":[{"size":120,"locations":[["a.rs:14","b.rs:22"]],"similarity":0.94}]}}

Use when: finding extraction candidates for shared functions.
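One common way to score "near-identical token sequences" is the Jaccard index of token k-grams (shingles). This minimal sketch assumes that measure; the README does not say how codemap computes its `similarity` field, and the helper names and snippets are hypothetical.

```python
import re


def shingles(code: str, k: int = 5) -> set:
    """All runs of k consecutive tokens (identifiers, numbers, single symbols)."""
    tokens = re.findall(r"\w+|[^\w\s]", code)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard index of the two snippets' token k-grams (1.0 = identical)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "for x in xs { total += x * rate; } return total;"
pasted = "for x in xs { total += x * rate; } log(total); return total;"
print(similarity(original, original))          # 1.0
print(round(similarity(original, pasted), 2))  # 0.45 (9 shared 5-grams of 20 distinct)
```

On short snippets a single inserted statement breaks several shingles at once, which is one reason a minimum-size threshold such as --min-tokens matters.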

`circular` — circular import detection

Reports module cycles (a → b → c → a).

$ codemap circular --dir ./my-repo --json --quiet
{"ok":true,"result":{"cycles":[["src/a.rs","src/b.rs","src/a.rs"]]}}

Use when: untangling architecture before a refactor.
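Cycle detection of this kind is typically a depth-first search that flags a back edge to a module still on the current path. A minimal sketch with a hypothetical import map, producing the same closed-path shape as the sample output; it reports one cycle per back edge rather than enumerating every elementary cycle:

```python
def find_cycles(imports):
    """Return import cycles as closed paths, e.g. ['a', 'b', 'a']."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    color = {m: WHITE for m in imports}
    stack, cycles = [], []

    def visit(m):
        color[m] = GRAY
        stack.append(m)
        for dep in imports.get(m, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:  # back edge: dep is already on the current path
                cycles.append(stack[stack.index(dep):] + [dep])
            elif state == WHITE:
                visit(dep)
        stack.pop()
        color[m] = BLACK

    for m in list(imports):
        if color.get(m, WHITE) == WHITE:
            visit(m)
    return cycles


imports = {
    "src/a.rs": ["src/b.rs"],
    "src/b.rs": ["src/a.rs", "src/c.rs"],
    "src/c.rs": [],
}
print(find_cycles(imports))  # [['src/a.rs', 'src/b.rs', 'src/a.rs']]
```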

Impact tracing & change analysis

`trace` — transitive callees (what does X depend on?)

Walks the call graph forward from a function/symbol, returns full dep tree.

$ codemap trace --dir ./my-repo --json --quiet RecalcInvoiceTotals
{"ok":true,"result":{"node":"RecalcInvoiceTotals","calls":[
  {"name":"ship_chg_sum","file":"backend/invoices.go:120","depth":1},
  {"name":"format_money","file":"util/money.go:8","depth":2}]}}

Use when: impact analysis before changing a function, generating context for an LLM.

`callers` — transitive callers (who calls X?)

Reverse of trace. Returns the function's call sites + their callers.

$ codemap callers --dir ./my-repo --json --quiet validate_user
{"ok":true,"result":{"callers":[{"caller":"login","file":"auth.py:88","depth":1}]}}

Use when: "if I change this signature, what breaks?"
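trace and callers are the same breadth-first walk, once over call edges and once over reversed edges. A minimal sketch with a hypothetical call map built from the sample names above; depth is the shortest hop count from the starting function:

```python
from collections import deque


def walk(graph, start):
    """Breadth-first walk returning each reachable node with its shortest depth."""
    depth = {start: 0}
    queue = deque([start])
    out = []
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                out.append({"name": nxt, "depth": depth[nxt]})
                queue.append(nxt)
    return out


def reverse(graph):
    """Flip caller → callee edges into callee → caller edges."""
    rev = {}
    for caller, callees in graph.items():
        for callee in callees:
            rev.setdefault(callee, []).append(caller)
    return rev


calls = {
    "RecalcInvoiceTotals": ["ship_chg_sum"],
    "ship_chg_sum": ["format_money"],
    "login": ["validate_user"],
}
print(walk(calls, "RecalcInvoiceTotals"))    # trace: ship_chg_sum (1), format_money (2)
print(walk(reverse(calls), "format_money"))  # callers: ship_chg_sum (1), RecalcInvoiceTotals (2)
```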

`blast-radius` — affected entities from a change

Combines callers + dataflow + tests touched. Most pessimistic estimate.

$ codemap blast-radius --dir ./my-repo --json --quiet --target User.id
{"ok":true,"result":{"functions":42,"tests":7,"endpoints":3,"db_columns":2}}

Use when: "what's the size of changing this thing?"

`diff` — semantic diff between two refs

Function-level diff: added, removed, signature-changed, body-changed.

$ codemap diff --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"added":["validate_email"],"removed":["old_validator"],
  "signature_changed":[{"fn":"create","before":"(name)","after":"(name,email)"}]}}

Use when: generating PR descriptions, understanding code review scope.

`api-diff` — breaking-change classifier

Like diff but specifically flags BREAKING vs additive changes to public API.

$ codemap api-diff --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"breaking":[
  {"kind":"removed","fn":"OldAPI::v1_login"},
  {"kind":"signature_change","fn":"create_user","before":"(name)","after":"(name,email)"}]}}

Use when: versioning decisions (semver minor vs major), CHANGELOG generation.
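The breaking-vs-additive split can be sketched by comparing `{function: signature}` maps taken at two refs. This is a simplified model under stated assumptions: signatures are compared as plain strings, body-only changes are ignored, the set of public names is supplied by the caller, and `classify_changes` and `semver_bump` are hypothetical helpers, not codemap's API.

```python
def classify_changes(before, after, public):
    """Compare {function_name: signature} maps taken at two refs."""
    report = {"breaking": [], "additive": [], "internal": []}

    def bucket(fn, public_kind):
        return report[public_kind] if fn in public else report["internal"]

    for fn in sorted(before.keys() - after.keys()):
        bucket(fn, "breaking").append({"kind": "removed", "fn": fn})
    for fn in sorted(after.keys() - before.keys()):
        bucket(fn, "additive").append({"kind": "added", "fn": fn})
    for fn in sorted(before.keys() & after.keys()):
        if before[fn] != after[fn]:
            bucket(fn, "breaking").append({"kind": "signature_change", "fn": fn,
                                           "before": before[fn], "after": after[fn]})
    return report


def semver_bump(report):
    """Breaking public change → major; new public API only → minor; else patch."""
    if report["breaking"]:
        return "major"
    return "minor" if report["additive"] else "patch"


before = {"OldAPI::v1_login": "(user)", "create_user": "(name)", "_cache_key": "(k)"}
after = {"create_user": "(name,email)", "validate_email": "(s)", "_cache_key": "(k,ns)"}
public = {"OldAPI::v1_login", "create_user", "validate_email"}
report = classify_changes(before, after, public)
print(semver_bump(report))  # major
```

The private `_cache_key` signature change lands in `internal` and does not affect the bump.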

`diff-impact` — functions affected by a commit range

Maps the diff to every transitively-affected caller.

$ codemap diff-impact --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"impacted_fns":127,"impacted_files":34,"high_risk":["payment::charge"]}}

Use when: deciding test scope for a PR.

`churn-vs-complexity` (via `hotspots`) — see Codebase understanding above

Data flow & security

`audit` — composite security report

Runs taint + secret-scan + dead-deps + dep-tree + license-check in one pass.

$ codemap audit --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[
  {"kind":"secret","file":".env.sample","line":3,"pattern":"AWS_KEY"},
  {"kind":"taint","source":"req.body","sink":"db.execute","path":[...]},
  {"kind":"dep-vuln","package":"lodash","version":"4.17.20","cve":"CVE-2021-23337"}]}}

Use when: first-pass security review of an unfamiliar repo.

`taint` — path-sensitive taint flow

Tracks tainted values from source(s) to sink(s). Sanitizer-aware, alias-aware (e.g. safe = sanitize(x)), cross-procedural (parses wrapper bodies to detect hidden sanitizers).

$ codemap taint --dir ./my-repo --json --quiet --source 'req.query' --sink 'db.execute'
{"ok":true,"result":{"paths":[{"source":"req.query.id","sink":"db.execute(sql)",
  "hops":["params.id","userId","query"],"sanitized":false}]}}

Use when: SQLi/XSS/SSRF detection, "is user input reaching this sink?"
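At its core, source-to-sink taint is a path search over value-flow edges that records whether a sanitizer was crossed. A minimal sketch, assuming the flow edges are already given (codemap derives them from ASTs, resolves aliases, and inspects wrapper bodies, none of which is modeled here); `taint_paths` and the flow map are hypothetical and reuse the names from the sample output:

```python
from collections import deque


def taint_paths(flows, sources, sinks, sanitizers):
    """Breadth-first search for source→sink paths over value-flow edges.

    flows maps a value to the values it flows into (assignments, call args).
    Paths through a sanitizer are still reported, but marked sanitized.
    """
    found = []
    for src in sources:
        queue = deque([([src], False)])
        while queue:
            path, cleaned = queue.popleft()
            node = path[-1]
            if node in sinks and len(path) > 1:
                found.append({"source": src, "sink": node,
                              "hops": path[1:-1], "sanitized": cleaned})
                continue
            for nxt in flows.get(node, ()):
                if nxt not in path:  # skip cycles
                    queue.append((path + [nxt], cleaned or nxt in sanitizers))
    return found


flows = {
    "req.query.id": ["params.id"],
    "params.id": ["userId"],
    "userId": ["query", "escape(userId)"],
    "query": ["db.execute(sql)"],
    "escape(userId)": ["safe"],
    "safe": ["db.execute(sql)"],
}
for p in taint_paths(flows, ["req.query.id"], {"db.execute(sql)"}, {"escape(userId)"}):
    print(p["hops"], "sanitized" if p["sanitized"] else "UNSANITIZED")
```

Enumerating explicit paths is exponential on dense graphs; it is fine for a sketch, while production engines summarize per function instead.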

`slice` — backward program slice

Given a target variable/sink, return only the code that influences it.

$ codemap slice --dir ./my-repo --json --quiet --var 'password' --file auth.py
{"ok":true,"result":{"slice_lines":[12,15,22,30,42],"file":"auth.py"}}

Use when: narrowing what to read when chasing a bug.

`sinks` — list all dangerous sinks

Enumerates every db.execute, eval, exec, Runtime.exec, subprocess.shell=True, innerHTML=, etc.

$ codemap sinks --dir ./my-repo --json --quiet
{"ok":true,"result":{"sinks":[{"kind":"sql","file":"api/users.rs","line":88,"expr":"db.execute(query)"}]}}

Use when: building taint queries, audit checklist generation.

`secret-scan` — credentials in source

20+ patterns (AWS key, GitHub PAT, Slack token, Stripe live key, private keys, JWT, DB conn strings, etc.). Redacted output.

$ codemap secret-scan --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[{"file":".env.sample","line":3,"kind":"aws_access_key","masked":"AKIA****REDACTED"}]}}

Use when: pre-commit hook, pre-publish audit.
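Pattern-based secret detection is essentially line-by-line regex matching plus masking before anything is printed. A minimal sketch with three illustrative patterns (the AWS key below is AWS's documented example value). codemap's actual 20+ patterns and its masking format may differ.

```python
import re

# Three illustrative patterns; codemap ships 20+ and its exact regexes may differ.
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "stripe_live_key": re.compile(r"\bsk_live_[0-9A-Za-z]{24,}\b"),
}


def mask(secret):
    # keep a short prefix for triage; never echo the full credential
    return secret[:4] + "****"


def scan_text(path, text):
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, rx in PATTERNS.items():
            for m in rx.finditer(line):
                findings.append({"file": path, "line": lineno,
                                 "kind": kind, "masked": mask(m.group())})
    return findings


sample = "# demo env\nDEBUG=1\nAWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n"
findings = scan_text(".env.sample", sample)
print(findings)
# [{'file': '.env.sample', 'line': 3, 'kind': 'aws_access_key', 'masked': 'AKIA****'}]
```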

`data-flow` — value origin tracing

Where does this variable's value come from? (def-use chain)

$ codemap data-flow --dir ./my-repo --json --quiet --target 'user_id'
{"ok":true,"result":{"origins":[{"file":"auth.py:88","expr":"req.cookies['session']"}]}}

Use when: "where does this magic value come from?"

`api-surface` — every exported HTTP endpoint

Detects Flask/Express/Axum/FastAPI/Spring/Rocket route handlers. Lists path + method + handler.

$ codemap api-surface --dir ./my-repo --json --quiet
{"ok":true,"result":{"endpoints":[{"method":"POST","path":"/users","handler":"create_user","auth_required":false}]}}

Use when: generating OpenAPI from existing code, finding unauthenticated endpoints.

Graph algorithms (heterogeneous-graph queries)

These run on codemap's internal call graph + import graph + AST graph.

`pagerank` — most-important nodes

NetworkX-style PageRank. High score = many incoming references, especially from other high-scoring nodes.

$ codemap pagerank --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"ranked":[{"fn":"handle_request","score":0.082}]}}

Use when: finding "load-bearing" functions, prioritizing code review.
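A minimal power-iteration sketch of PageRank on a toy call graph, following the NetworkX conventions (damping 0.85, rank of dangling nodes spread uniformly, stop when the L1 change drops below `n * tol`). The function names are made up and this is not codemap's implementation.

```python
def pagerank(edges, damping=0.85, tol=1e-6, max_iter=100):
    """Power-iteration PageRank over a directed edge list (NetworkX-style defaults)."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {v: [] for v in nodes}
    for src, dst in set(edges):  # duplicate edges count once, as in a DiGraph
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # rank held by nodes with no outgoing edges is redistributed uniformly
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1.0 - damping + damping * dangling) / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                new[w] += damping * rank[v] / len(out[v])
        err = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if err < n * tol:
            break
    return sorted(rank.items(), key=lambda kv: kv[1], reverse=True)


calls = [
    ("main", "handle_request"), ("cli", "handle_request"),
    ("router", "handle_request"), ("main", "router"),
    ("handle_request", "db_query"),
]
ranked = pagerank(calls)
for fn, score in ranked:
    print(f"{fn:15} {score:.3f}")
```

Here `db_query` ranks first, ahead of its only caller `handle_request`: every call chain funnels into it, and PageRank rewards where references flow, not how much code a function contains.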

`hubs` — high-out-degree nodes

Functions/modules that depend on many others. Unlike PageRank, which rewards incoming references, this counts outgoing ones.

$ codemap hubs --dir ./my-repo --json --quiet
{"ok":true,"result":{"hubs":[{"fn":"orchestrator","out_degree":47}]}}

Use when: finding god-objects, refactor targets.

`bridges` — single-edge cut points

Edges whose removal disconnects the graph (splits a connected component in two). Each one is the only link between two parts of the codebase.

$ codemap bridges --dir ./my-repo --json --quiet
{"ok":true,"result":{"bridges":[{"from":"auth","to":"db","modules":["auth.rs","db.rs"]}]}}

Use when: identifying single points of failure in module coupling.
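Bridges are classically found with Tarjan's DFS low-link technique: an edge (u, v) is a bridge when nothing in v's DFS subtree can reach u or any of u's ancestors except through that edge. A minimal recursive sketch that treats module coupling as undirected, which is an assumption about how codemap defines bridges. The module names are illustrative.

```python
def find_bridges(edges):
    """Tarjan-style bridge finding on an undirected graph; returns (parent, child) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    disc, low, bridges = {}, {}, []
    counter = 0

    def dfs(u, parent):  # recursive for brevity; deep graphs need an explicit stack
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        skipped_parent = False
        for v in adj[u]:
            if v == parent and not skipped_parent:
                skipped_parent = True  # skip the tree edge we arrived by, but only once
                continue
            if v in disc:
                low[u] = min(low[u], disc[v])  # back edge to an ancestor
            else:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    # nothing in v's subtree reaches u or above except via (u, v)
                    bridges.append((u, v))

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return bridges


coupling = [("auth", "users"), ("users", "session"), ("session", "auth"),
            ("auth", "db"), ("db", "cache")]
print(find_bridges(coupling))  # [('db', 'cache'), ('auth', 'db')]
```

Skipping the parent edge only once means a duplicated edge between two modules correctly stops being a bridge.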

`centrality` (17 measures) — broker / connector detection

Each measure is its own action (e.g. `codemap betweenness`): betweenness, eigenvector, katz, closeness, harmonic, load, structural-holes (brokers), voterank, etc. All follow the standard NetworkX definitions.

$ codemap betweenness --dir ./my-repo --json --quiet --top 5
{"ok":true,"result":{"top":[{"node":"db_session","betweenness":0.34}]}}

Use when: finding modules that connect otherwise-separate subsystems.
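For unweighted graphs, betweenness is usually computed with Brandes' algorithm: one BFS per source node, then a farthest-first pass that accumulates each node's share of shortest paths. A minimal sketch for directed graphs, normalized by 1/((n-1)(n-2)) as NetworkX does for directed graphs. Node names are hypothetical, and codemap's own implementation is not shown here.

```python
from collections import deque


def betweenness(edges, normalized=True):
    """Brandes' algorithm for unweighted, directed graphs."""
    nodes = sorted({n for edge in edges for n in edge})
    adj = {v: [] for v in nodes}
    for u, v in set(edges):
        adj[u].append(v)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # 1) BFS from s: count shortest paths (sigma) and record predecessors
        order, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # 2) farthest nodes first: push each node's dependency back to its predecessors
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(nodes)
    if normalized and n > 2:
        bc = {v: b / ((n - 1) * (n - 2)) for v, b in bc.items()}
    return sorted(bc.items(), key=lambda kv: kv[1], reverse=True)


deps = [("api", "db_session"), ("worker", "db_session"),
        ("db_session", "pg"), ("db_session", "cache")]
top = betweenness(deps)
print(top[0])  # ('db_session', 0.3333333333333333)
```

`db_session` sits on all 4 cross-subsystem shortest paths (api or worker to pg or cache), and 4 / (4 * 3) = 1/3.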

`clusters` — community detection (Leiden default)

Partitions the graph into densely-connected sub-communities.

$ codemap clusters --dir ./my-repo --json --quiet leiden
{"ok":true,"result":{"clusters":[{"id":0,"size":34,"members":["auth.rs","users.rs"]}]}}

Use when: discovering implicit module boundaries.

`paths` — shortest path between two nodes

Returns the chain of imports/calls connecting source → target.

$ codemap paths --dir ./my-repo --json --quiet user_input db_write
{"ok":true,"result":{"path":["user_input","sanitize","query_builder","db_write"],"length":4}}

Use when: "how does X reach Y?"
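On an unweighted call/import graph, the fewest-hops chain is exactly what breadth-first search finds. A minimal sketch with made-up node names; codemap's actual edge set and tie-breaking between equally short paths may differ.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Fewest-hops path from source to target in a directed graph, or None."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:  # walk parent links back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in parent:  # first visit in BFS order is along a shortest path
                parent[v] = u
                queue.append(v)
    return None


edges = [("user_input", "sanitize"), ("sanitize", "query_builder"),
         ("query_builder", "db_write"), ("user_input", "logger"),
         ("logger", "audit_log")]
path = shortest_path(edges, "user_input", "db_write")
print(path)  # ['user_input', 'sanitize', 'query_builder', 'db_write']
```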

`subgraph` — extract a focused subgraph

Returns nodes within N hops of a target. Useful before deep analysis.

$ codemap subgraph --dir ./my-repo --json --quiet --target login --depth 2
{"ok":true,"result":{"nodes":[...],"edges":[...]}}

Use when: narrowing scope before more expensive analysis.

`bellman-ford <src>` / `astar <src> <tgt>` / `floyd-warshall` / etc.

Classical shortest-path algorithms exposed for graph queries. See docs/ACTION_CATALOG.md for the full list.
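Bellman-Ford is the one to reach for when edge weights can be negative: it relaxes every edge |V|-1 times, then runs one more pass to detect negative cycles. A minimal sketch; the weighted edge list is a made-up example and says nothing about how codemap weights its graph edges.

```python
def bellman_ford(edges, source):
    """Single-source shortest distances; edges are (u, v, weight), weights may be negative."""
    nodes = {n for u, v, _ in edges for n in (u, v)} | {source}
    dist = {n: float("inf") for n in nodes}
    dist[source] = 0
    for _ in range(len(nodes) - 1):  # |V| - 1 rounds suffice for simple paths
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # early exit once distances are stable
    for u, v, w in edges:  # one extra round: any improvement means a negative cycle
        if dist[u] + w < dist[v]:
            raise ValueError("negative-weight cycle reachable from source")
    return dist


dist = bellman_ford([("a", "b", 4), ("a", "c", 2), ("c", "b", -1), ("b", "d", 3)], "a")
print(dist)
```

The negative edge `c -> b` makes the two-hop route to `b` (cost 1) beat the direct edge (cost 4), which Dijkstra's algorithm is not guaranteed to handle.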

Binary analysis & reverse engineering

Decompiler (`ir` / `decompile`) — full lift → SSA → simplify → calling-convention → SAILR structuring → variable-recovery → type-recovery → C++ RTTI → readable-C emit pipeline

This is a real decompiler: a 14-stage pipeline that reconstructs expressions, variables, types, and if / while / switch syntax (including jump tables, computed branches, and string-literal returns) from compiled binaries. It also does cross-binary type propagation, RTTI recovery, stack-slot recovery, and confidence scoring.

Current status:

G10 fidelity: full (10/10).
Protected-binary decompilation tests: 79/79 pass (bugbins-verify + reexec_harness), including special-case recovery of switch_dispatch (const char* + "zero".."seven"/"unknown" map, a1 scrutinee, correct default VA).
Recompilation: emitted C is gcc-recompilable; the current 79/79 state supersedes earlier ~48/60 notes.
Mach-O x86-64: function discovery via LC_FUNCTION_STARTS + symtab + sections, feeding iced-x86 and the IR.

Fixes (details in CHANGELOG and docs/COMMIT_LEDGER.md): G10 fixes; Job 3 consolidation; GAP3-6 (F-4 -O2 dangling-goto/continue, C++ vcall via RTTI, XMM/float ABI + libc-extern recompile fix, Mach-O x86-64 thin+FAT); GAP7 Part A (array element type from access width, e.g. int32_t* for 4-byte loads); GAP8 (struct field recovery: ptr->field_0xN with synthesized typedefs for recompilation); GAP9 (no more "lifter gap =0" noise declarations for rsp/rbp/rbx/r12-r15 in every function; frame uses elided to 0).

Known limitation (gap 11, deferred): array indexing inside loops can decompile with an incorrect, use-before-def index (e.g. a stale "ghost" register instead of the loop counter v), so the recompiled output is behaviorally wrong: a sum loop may return a[0]*n instead of 10. The recovered element type is still correct. Root cause: copy propagation drops the index register's definition when that register is reused inside the loop.

Remaining gaps are documented in docs/DECOMPILER.md. (New direction: driving decompiler-quality work from real user reports, e.g. Ghidra issues.)

# Decompile a single function (full pipeline)
codemap ir <binary> [<hex-fn-addr> | <name>]

# Decompile entry point
codemap ir <binary>

# Batch call-tree walk with structural hints
codemap decompile <binary> [max-depth=N] [max-children=N] [deep]

Pipeline stages:

Lift — iced-x86 decode → IRCFG (three-address IR with explicit BitWidth)
SSA construction — Cytron et al. (1991): iterated-dominance-frontier phi placement + pre-order DFS renaming
Simplify — 42 peephole rules (Miasm / angr reference-FIRST): constant folding, identity elimination, SSA-aware simplification, signed-div-by-power-of-2, ROL/ROR detection, byte-swap, etc.
Calling-convention recovery — SysV AMD64 ABI: populate Call.args from rdi/rsi/rdx/rcx/r8/r9
Dead-code elimination — backward dataflow liveness (prunes ~80% of flag computations)
Copy/constant propagation — 4 alternating iterations of copy-prop + simplify + DCE
Dead-block removal — reachability from entry; prunes linker padding
Block coalescing — merge linear Goto-chains
SAILR structuring — CFG + IRCFG → C-shaped AST (Sequence / IfThen / IfThenElse / While / For / Switch / Call / Goto)
Variable recovery — classifies variables: Register, Stack, Memory, Temporary, Constant
Type inference — Phase 2 seeded from widths + Mem-loads/Stores; iterated-meet solver infers Int / Pointer / struct types
Stack-slot analysis — rewrites rsp-relative accesses *(rsp_N) as stack[<offset>]
C++ RTTI analysis — vtable references → class declarations (base classes, virtual methods, fields)
C emission — structured AST → readable C source with type annotations, stack-slot names, symbol resolution
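The SSA stage places phi nodes at the iterated dominance frontier of each variable's definitions (Cytron et al.). A minimal, self-contained sketch of that idea on a toy CFG. It uses simple iterative dominator sets and the dominance-frontier walk from Cooper, Harvey, and Kennedy rather than Cytron's original frontier computation. The dict-based CFG and block names are illustrative, not codemap's IRCFG, and every block is assumed reachable from the entry.

```python
def dominators(cfg, entry):
    """Iterative dominator sets: dom(n) = {n} | intersection of dom(p) over preds p."""
    preds = {n: [] for n in cfg}
    for u, succs in cfg.items():
        for v in succs:
            preds[v].append(u)
    dom = {n: set(cfg) for n in cfg}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in cfg:
            if n == entry or not preds[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom, preds


def place_phis(cfg, entry, defs):
    """Phi placement at the iterated dominance frontier of each variable's def blocks."""
    dom, preds = dominators(cfg, entry)
    # immediate dominator = the strict dominator that is itself dominated by the most nodes
    idom = {n: max(dom[n] - {n}, key=lambda d: len(dom[d])) for n in cfg if n != entry}
    idom[entry] = None
    # dominance frontiers: walk up from each predecessor of a join point to its idom
    df = {n: set() for n in cfg}
    for b in cfg:
        if len(preds[b]) >= 2:
            for p in preds[b]:
                runner = p
                while runner is not None and runner != idom[b]:
                    df[runner].add(b)
                    runner = idom[runner]
    phis = {}
    for var, def_blocks in defs.items():
        placed, seen, work = set(), set(def_blocks), list(def_blocks)
        while work:
            x = work.pop()
            for y in df[x]:
                if y not in placed:
                    placed.add(y)        # y needs phi(var)
                    if y not in seen:    # and that phi is itself a new def of var
                        seen.add(y)
                        work.append(y)
        phis[var] = placed
    return phis


cfg = {"entry": ["cond"], "cond": ["then", "else"], "then": ["join"],
       "else": ["join"], "join": ["loop"], "loop": ["loop", "exit"], "exit": []}
phis = place_phis(cfg, "entry", {"x": {"then", "else"}, "i": {"entry", "loop"}})
print(phis)  # {'x': {'join'}, 'i': {'loop'}}
```

`x` is assigned on both arms of the diamond, so it needs a phi where they merge; `i` is redefined in the self-looping `loop` block, so the loop header needs one.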

Differentiators:

Cross-binary type/name propagation — types from one binary's RTTI flow into another's
Graph-as-validator — heterogeneous code graph cross-checks decompilation output
Recompilable-C target — structured, typed, symbol-resolved C suitable for recompilation

Example output:

=== codemap ir ===
Binary:        ./target/release/codemap
Format:        ELF64 (64-bit, arch=x64)
Function:      main @ 0x401000 (234 bytes, 78 insns)
CFG blocks:    12
CFG edges:     18 (pre-enrich) → 18 (post-enrich)
Jump tables:   0 resolved indirect-JMPs
SSA phis:      3 inserted
Variables:     45 total (12 reg, 20 stack[-0x10..+0x18], 10 mem, 3 const, 0 tmp)
Types:         30 bound (15 int, 10 ptr, 3 top, 2 bot, 0 other)
CC args:       5 call sites populated (SysV AMD64)
DCE removed:   62 dead stmts (pre-prop) + 8 (post-prop)
Copy-prop:     15 stmts inlined
Dead blocks:   2 removed (unreachable)
Coalesced:     4 blocks merged

--- structured AST ---
Sequence {
  Let { rbp_0 = rbp }
  Let { rsp_0 = (rsp - 0x10) }
  IfThen {
    Cond: (rax_0 == 0)
    Then: Sequence { Call { printf("usage\n") } }
  }
  While {
    Cond: (argc_0 > 0)
    Body: Sequence { ... }
  }
  Ret { rax_0 }
}

--- C-shaped output ---
int main(int argc, char *argv[]) {
    uint64_t rbp_0 = rbp;
    uint64_t rsp_0 = (rsp - 0x10);

    if (rax_0 == 0) {
        printf("usage\n");
    }

    while (argc_0 > 0) {
        // ... loop body ...
        argc_0 = argc_0 - 1;
    }

    return rax_0;
}

Use when: binary reverse engineering, understanding compiled code, patch generation, static analysis of binaries. See docs/DECOMPILER.md for full pipeline reference.

`bin-info` / `elf-info` / `macho-info` / `pe-info` — binary fingerprint

Format detection, arch, sections, strip state, language hints (Rust/Go/C++), anti-debug rules, packer detection.

$ codemap bin-info /usr/local/bin/codemap --json --quiet
{"ok":true,"result":{"format":"ELF64","arch":"aarch64","rust":true,"strip":false,
  "sections":34,"anti_debug":[],"packed":false}}

Use when: triage step 1 — "what is this binary?"
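The first step of "what is this binary?" is usually a magic-byte check. A minimal sketch covering the formats named in this README. Real triage (architecture, sections, strip state, packers) needs full header parsing, and this is not codemap's detection code.

```python
import struct


def detect_format(data: bytes) -> str:
    """Classify a binary by its leading magic bytes (a small subset of what bin-info reports)."""
    if data[:4] == b"\x7fELF":
        # EI_CLASS at offset 4: 1 = 32-bit, 2 = 64-bit
        return {1: "ELF32", 2: "ELF64"}.get(data[4], "ELF") if len(data) > 4 else "ELF"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # the DOS header stores the PE header offset (e_lfanew) at 0x3C
        (pe_off,) = struct.unpack_from("<I", data, 0x3C)
        return "PE" if data[pe_off:pe_off + 4] == b"PE\x00\x00" else "MZ (DOS)"
    if data[:4] in (b"\xcf\xfa\xed\xfe", b"\xfe\xed\xfa\xcf"):
        return "Mach-O 64"
    if data[:4] in (b"\xce\xfa\xed\xfe", b"\xfe\xed\xfa\xce"):
        return "Mach-O 32"
    if data[:4] == b"\xca\xfe\xba\xbe":
        return "Mach-O FAT or Java class"  # both formats use 0xCAFEBABE
    if data[:4] == b"\x00asm":
        return "WASM"
    return "unknown"


pe_stub = bytearray(0x48)
pe_stub[:2] = b"MZ"
struct.pack_into("<I", pe_stub, 0x3C, 0x40)
pe_stub[0x40:0x44] = b"PE\x00\x00"

for blob in (b"\x7fELF\x02\x01\x01", bytes(pe_stub), b"\x00asm\x01\x00\x00\x00"):
    print(detect_format(blob))  # ELF64, then PE, then WASM
```

The 0xCAFEBABE collision is why a real tool has to look past the magic to tell a universal Mach-O from a `.class` file.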

`pe-imports` / `pe-exports` — Windows PE import/export tables

Lists every imported DLL with its imported functions (pe-imports) and every exported function (pe-exports).

$ codemap pe-imports ./sample.exe --json --quiet
{"ok":true,"result":{"imports":[{"dll":"kernel32.dll","functions":["VirtualAlloc","CreateProcessA"]}]}}

Use when: static behavioral profiling — what APIs does this binary depend on?

`pe-strings` / `bin-strings` — string extraction

Extracts ASCII and UTF-16LE strings, filtered by entropy.

$ codemap pe-strings ./sample.exe --json --quiet --min-len 8
{"ok":true,"result":{"strings":["http://c2.example.com","cmd.exe /c"]}}

Use when: triaging unknown binaries — strings often reveal C2 URLs, command lines, paths.
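A minimal sketch of ASCII + UTF-16LE string extraction with an entropy filter. It assumes "entropy-filtered" means dropping low-entropy runs such as padding (`AAAAAAAA`); codemap's actual threshold and direction of filtering are not documented here.

```python
import math
import re
from collections import Counter


def shannon_entropy(s):
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())


def extract_strings(blob, min_len=8, min_entropy=2.0):
    n = str(min_len).encode()
    ascii_rx = re.compile(rb"[\x20-\x7e]{" + n + rb",}")            # printable ASCII runs
    utf16_rx = re.compile(rb"(?:[\x20-\x7e]\x00){" + n + rb",}")    # printable char + NUL pairs
    found = [m.group().decode("ascii") for m in ascii_rx.finditer(blob)]
    found += [m.group().decode("utf-16-le") for m in utf16_rx.finditer(blob)]
    # assumption: drop low-entropy runs such as padding
    return [s for s in found if shannon_entropy(s) >= min_entropy]


blob = (b"\x00\x01http://c2.example.com\x00" + b"A" * 12 + b"\x00\x00"
        + "cmd.exe /c".encode("utf-16-le") + b"\x00\x00")
strings_found = extract_strings(blob)
print(strings_found)  # ['http://c2.example.com', 'cmd.exe /c']
```

The double NUL before the UTF-16 string matters: with a single NUL, the trailing `A` of the padding would pair with it and the naive UTF-16 regex would report `Acmd.exe /c`. Real extractors need the same care at run boundaries.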

`binary-diff` — semantic binary diff

Functions added / removed / modified between two builds.

$ codemap binary-diff --json --quiet --left v1.exe --right v2.exe
{"ok":true,"result":{"added":["new_handler"],"removed":["legacy_proc"],"modified":["main"]}}

Use when: patch analysis, regression hunting in firmware.
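At its core, a function-level binary diff is a set comparison plus a per-function fingerprint. A minimal sketch assuming functions are matched by name and their bodies are already normalized (addresses and relocations masked out). Both are assumptions: stripped builds need structural matching instead, and this is not codemap's algorithm.

```python
import hashlib


def fingerprint(body: bytes) -> str:
    return hashlib.sha256(body).hexdigest()


def diff_functions(left, right):
    """left/right: function name -> normalized body bytes (hypothetical input format)."""
    added = sorted(right.keys() - left.keys())
    removed = sorted(left.keys() - right.keys())
    modified = sorted(name for name in left.keys() & right.keys()
                      if fingerprint(left[name]) != fingerprint(right[name]))
    return {"added": added, "removed": removed, "modified": modified}


v1 = {"main": b"\x55\x48\x89\xe5\xc3", "legacy_proc": b"\xc3", "util": b"\x90\xc3"}
v2 = {"main": b"\x55\x48\x89\xe5\x90\xc3", "new_handler": b"\xc3", "util": b"\x90\xc3"}
print(diff_functions(v1, v2))
# {'added': ['new_handler'], 'removed': ['legacy_proc'], 'modified': ['main']}
```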

`dotnet-meta` — .NET assembly metadata

For a PE that contains a CLI/.NET header: reads the metadata streams and lists types and methods.

$ codemap dotnet-meta ./sample.dll --json --quiet
{"ok":true,"result":{"assembly":"Sample.Dll","types":["Foo","Bar"],"methods_count":42}}

Use when: analyzing .NET malware or .NET 3rd-party libs.

`java-class` — JVM class file

Constant pool, method signatures, bytecode summaries.

`wasm-info` — WebAssembly module

Imports, exports, function table, memory layout.

Schemas & config-as-code

`openapi-schema` / `graphql-schema` / `proto-schema` — extract API schemas

Parses spec files and reports endpoints/types/operations.

$ codemap openapi-schema --dir ./api --json --quiet
{"ok":true,"result":{"paths":[{"method":"GET","path":"/users","operationId":"listUsers"}]}}

Use when: generating client code, checking spec consistency.

`k8s-scan` — Kubernetes CIS audit (16 rules)

Checks privileged containers, hostNetwork, missing resource limits, etc.

$ codemap k8s-scan --dir ./k8s/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"K8S-001","resource":"Deployment/api","severity":"high","msg":"privileged=true"}]}}

Use when: auditing manifests before apply.

`iac-scan` — Terraform/CloudFormation/Pulumi audit (12 rules)

$ codemap iac-scan --dir ./infra/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"IAC-007","file":"main.tf","msg":"S3 bucket public-read ACL"}]}}

`dockerfile-scan` — Dockerfile audit (10 rules)

$ codemap dockerfile-scan --dir ./ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"DKR-002","msg":"running as root","line":18}]}}

`ci-scan` — CI/CD pipeline audit (37 rules across 6 ecosystems)

GitHub Actions, GitLab CI, Jenkinsfile, CircleCI, Azure Pipelines, Travis. Catches injection, unpinned actions, secret literals, pull_request_target misuse.

$ codemap ci-scan --dir ./.github/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"GH-003","file":"deploy.yml","msg":"unpinned action ref"}]}}
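An "unpinned action ref" finding roughly means a `uses:` reference whose version is a tag or branch rather than a full 40-character commit SHA. A line-based sketch under that assumption. codemap's GH-003 rule may define pinning differently, and a robust checker should parse the YAML instead of matching lines.

```python
import re

USES = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^'\"\s#]+)")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(path, text):
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = USES.match(line)
        if not m:
            continue
        ref = m.group(1)
        if ref.startswith("./") or ref.startswith("docker://"):
            continue  # local actions and container images are out of scope for this sketch
        _, _, version = ref.partition("@")
        if not FULL_SHA.match(version):  # tags and branches can be moved after review
            findings.append({"file": path, "line": lineno, "ref": ref})
    return findings


workflow = """jobs:
  deploy:
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@0123456789abcdef0123456789abcdef01234567
      - uses: ./local-action
"""
findings = unpinned_actions("deploy.yml", workflow)
print(findings)  # [{'file': 'deploy.yml', 'line': 4, 'ref': 'actions/checkout@v4'}]
```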

`oci-scan` — OCI image / docker save tarball audit

Per-layer manifest, layer-resident secrets (11 patterns), licenses, file/dir/symlink counts.

$ codemap oci-scan --dir ./image.tar --json --quiet --mode all
{"ok":true,"result":{"layers":[...],"secrets":[...],"licenses":[...]}}

`sql-extract` — SQL DDL/DML extraction

Pulls SQL out of source code or .sql files. Schema + queries.

$ codemap sql-extract --dir ./my-repo --json --quiet
{"ok":true,"result":{"tables":[{"name":"users","columns":[...]}],"queries":[...]}}

Supply chain

`osv-scan` — match deps against OSV.dev advisories (offline)

Semver-range-aware.

$ codemap osv-scan --dir ./my-repo --json --quiet
{"ok":true,"result":{"vulns":[{"package":"lodash","version":"4.17.20","cve":"CVE-2021-23337"}]}}
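OSV advisories describe affected versions as ranges of `introduced` / `fixed` events. Walking the events in version order and stopping once they pass the installed version gives the verdict. A minimal sketch for plain `major.minor.patch` versions. It ignores pre-release ordering and the `last_affected` / `limit` events, and is not codemap's matcher.

```python
def parse_semver(version):
    """major.minor.patch as an int tuple; pre-release and build tags are ignored here."""
    core = version.split("+", 1)[0].split("-", 1)[0]
    parts = [int(p) for p in core.split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def is_affected(version, events):
    """Evaluate one OSV SEMVER range given as [{"introduced": v}, {"fixed": v}, ...]."""
    v = parse_semver(version)
    affected = False
    for kind, ver in sorted(((k, parse_semver(x)) for e in events for k, x in e.items()),
                            key=lambda kv: kv[1]):
        if ver > v:
            break  # later events cannot change the verdict for this version
        if kind == "introduced":
            affected = True
        elif kind == "fixed":
            affected = False
    return affected


lodash_range = [{"introduced": "0"}, {"fixed": "4.17.21"}]  # CVE-2021-23337 (lodash < 4.17.21)
print(is_affected("4.17.20", lodash_range), is_affected("4.17.21", lodash_range))  # True False
```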

`sbom-diff` — CycloneDX/SPDX diff

Added, removed, upgraded, downgraded packages between two SBOMs.

$ codemap sbom-diff --left ./sbom-1.spdx.json --right ./sbom-2.spdx.json --json --quiet
{"ok":true,"result":{"added":[...],"removed":[...],"upgraded":[...]}}

`license-check` — SPDX compatibility

Per-package license + compatibility verdict.

$ codemap license-check --dir ./my-repo --json --quiet
{"ok":true,"result":{"deps":[{"name":"foo","license":"GPL-3.0","compatible":false}]}}

`cve-scan` — like osv-scan, but matches against the MITRE CVE corpus

ML / AI model files

`gguf-info` — llama.cpp GGUF inspection

Architecture, layer count, head count, quant level, vocab size.

$ codemap gguf-info ./model.gguf --json --quiet
{"ok":true,"result":{"arch":"llama","n_layers":32,"n_heads":32,"vocab_size":32000,"quant":"Q4_K_M"}}

Use when: "what model is this file?" Pre-load sanity check.
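A GGUF file starts with a fixed little-endian header: the magic `GGUF`, a uint32 version, a uint64 tensor count, and a uint64 metadata key/value count. Typed key/value pairs such as `general.architecture` and `llama.block_count` follow. A minimal reader for that header, assuming GGUF v2/v3 (version 1 used 32-bit counts and lengths). It does not parse the tensor-info section that follows, and it is not how codemap implements gguf-info.

```python
import struct

# GGUF value type ids -> struct formats (8 = string, 9 = array are handled separately)
_SCALAR = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
           6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
STRING, ARRAY = 8, 9


def read_gguf_header(buf):
    if buf[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", buf, 4)
    off = 4 + 4 + 8 + 8

    def read_value(vtype, off):
        if vtype == STRING:  # uint64 length + UTF-8 bytes
            (n,) = struct.unpack_from("<Q", buf, off)
            return buf[off + 8:off + 8 + n].decode("utf-8"), off + 8 + n
        if vtype == ARRAY:   # uint32 element type + uint64 count + elements
            etype, count = struct.unpack_from("<IQ", buf, off)
            off += 12
            items = []
            for _ in range(count):
                item, off = read_value(etype, off)
                items.append(item)
            return items, off
        fmt = _SCALAR[vtype]
        return struct.unpack_from(fmt, buf, off)[0], off + struct.calcsize(fmt)

    meta = {}
    for _ in range(n_kv):
        key, off = read_value(STRING, off)
        (vtype,) = struct.unpack_from("<I", buf, off)
        meta[key], off = read_value(vtype, off + 4)
    return {"version": version, "n_tensors": n_tensors, "metadata": meta}


def _gguf_str(text):
    raw = text.encode()
    return struct.pack("<Q", len(raw)) + raw


demo = (b"GGUF" + struct.pack("<IQQ", 3, 0, 2)
        + _gguf_str("general.architecture") + struct.pack("<I", STRING) + _gguf_str("llama")
        + _gguf_str("llama.block_count") + struct.pack("<I", 4) + struct.pack("<I", 32))
hdr = read_gguf_header(demo)
print(hdr)
```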

`safetensors-info` — HuggingFace safetensors inspection

Tensor shapes, dtypes, total params.

$ codemap safetensors-info ./model.safetensors --json --quiet
{"ok":true,"result":{"tensors":291,"total_params":7240000000,"dtype":"float16"}}
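A safetensors file starts with an 8-byte little-endian header length N, followed by N bytes of JSON that map each tensor name to its `dtype`, `shape`, and `data_offsets`, plus an optional `__metadata__` entry. Parameter counts follow from the shapes alone, so the tensor data never has to be read. A minimal sketch; the tensor names in the demo are made up.

```python
import json
import math
import struct


def safetensors_summary(buf):
    (header_len,) = struct.unpack_from("<Q", buf, 0)
    header = json.loads(buf[8:8 + header_len])
    header.pop("__metadata__", None)  # optional free-form string map, not a tensor
    total = sum(math.prod(t["shape"]) for t in header.values())
    dtypes = sorted({t["dtype"] for t in header.values()})
    return {"tensors": len(header), "total_params": total, "dtypes": dtypes}


hdr = json.dumps({
    "__metadata__": {"format": "pt"},
    "embed.weight": {"dtype": "F16", "shape": [32000, 4096], "data_offsets": [0, 262144000]},
    "norm.weight": {"dtype": "F16", "shape": [4096], "data_offsets": [262144000, 262152192]},
}).encode()
demo = struct.pack("<Q", len(hdr)) + hdr  # tensor bytes omitted: only the header is read
summary = safetensors_summary(demo)
print(summary)  # {'tensors': 2, 'total_params': 131076096, 'dtypes': ['F16']}
```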

`onnx-info` — ONNX model graph

Operators, inputs, outputs, opset.

$ codemap onnx-info ./model.onnx --json --quiet
{"ok":true,"result":{"opset":17,"ops":["Conv","Relu","MaxPool"],"inputs":[{"name":"x","shape":[1,3,224,224]}]}}

`cuda-info` — CUDA fatbin/cubin inspection

SM versions present, kernel symbols.

`pyc-info` — Python bytecode inspection

Magic number, marshalled code object, imports.

Cross-language & web

`lang-bridges` — FFI/binding detection

Detects PyO3 / napi / wasm-bindgen / JNI etc. — where languages interop.

$ codemap lang-bridges --dir ./my-repo --json --quiet
{"ok":true,"result":{"bridges":[{"kind":"pyo3","rust_fn":"create_user","py_module":"my_lib"}]}}

`gpu-functions` — GPU kernels in source

CUDA __global__, OpenCL kernels, Metal compute kernels, ROCm/HIP.

$ codemap gpu-functions --dir ./my-repo --json --quiet
{"ok":true,"result":{"kernels":[{"name":"matmul_kernel","framework":"cuda","file":"kernels.cu"}]}}

`monkey-patches` — runtime mutation detection

Detects obj.method = new_fn reassignments, setattr calls, and prototype patching.

`dispatch-map` — generic dispatch tables

Routers, registries, plugin maps. Finds the "switch statement that controls behavior."

`web-sitemap` — sitemap.xml + crawled link graph

`js-api-extract` — extract API calls from HAR / JS source

LSP bridge (requires a running language server)

`lsp-symbols` — workspace symbol table from LSP

Real symbol info, not AST-inferred. More accurate for typed languages.

`lsp-references` — every reference to a symbol (LSP-grade)

`lsp-calls` — call hierarchy from LSP

`lsp-diagnostics` — current LSP diagnostics across the workspace

$ codemap lsp-diagnostics --dir ./my-repo --json --quiet
{"ok":true,"result":{"diagnostics":[{"file":"src/main.rs","line":42,"severity":"error","msg":"E0308: mismatched types"}]}}

Use when: programmatic access to compiler/type-checker errors.

`lsp-types` — type info on hover for a position

arXiv-derived research actions (advanced)

These implement specific research papers. cegio and pointer-analysis have real implementations with proof reports; bin-taint Phase A shipped with empirical proof (target metric P@10; achieved precision 1.00, recall 0.80).

`pointer-analysis` — Andersen field-sensitive PA

Computes points-to sets: the memory objects each pointer may point to, from which may-alias pairs follow. Field-sensitive and flow-insensitive, with a Tarjan SCC pre-pass for performance.

$ codemap pointer-analysis --dir ./my-repo --json --quiet
{"ok":true,"result":{"scope_vars":102000,"copy_constraints":132000,
  "aliases":[{"ptr":"p","may_alias":["a","b"]}]}}

Use when: understanding aliasing for refactoring (rename a field safely), upstream of taint analysis.
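Andersen's analysis turns pointer statements into inclusion constraints and grows points-to sets until nothing changes. A deliberately naive sketch: field-insensitive, with no SCC collapsing and no worklist, unlike the field-sensitive solver with a Tarjan SCC pre-pass described above. The constraint encoding is hypothetical.

```python
from collections import defaultdict


def andersen(constraints):
    """Naive Andersen-style inclusion solver (field-insensitive, no SCC collapsing).

    constraints: (kind, lhs, rhs) tuples, a hypothetical encoding of pointer statements:
      ("addr", p, a)   p = &a
      ("copy", p, q)   p = q
      ("load", p, q)   p = *q
      ("store", p, q)  *p = q
    """
    pts = defaultdict(set)

    def add(var, objs):
        # grow pts(var); report whether anything new was added
        before = len(pts[var])
        pts[var] |= objs
        return len(pts[var]) != before

    changed = True
    while changed:  # iterate until no points-to set grows (a fixpoint)
        changed = False
        for kind, lhs, rhs in constraints:
            if kind == "addr":
                changed |= add(lhs, {rhs})
            elif kind == "copy":
                changed |= add(lhs, set(pts[rhs]))
            elif kind == "load":
                for obj in list(pts[rhs]):
                    changed |= add(lhs, set(pts[obj]))
            elif kind == "store":
                for obj in list(pts[lhs]):
                    changed |= add(obj, set(pts[rhs]))
    return {v: objs for v, objs in pts.items() if objs}


def may_alias(pts, p, q):
    return bool(pts.get(p, set()) & pts.get(q, set()))


pts = andersen([
    ("addr", "p", "a"),
    ("addr", "q", "b"),
    ("copy", "r", "p"),
    ("addr", "pp", "p"),   # pp = &p
    ("store", "pp", "q"),  # *pp = q, so p may now also point to b
])
print(pts["p"], may_alias(pts, "r", "q"))
```

The store through `pp` is the interesting step: it is what makes `r` (a copy of `p`) a possible alias of `q`, which a purely syntactic rename check would miss.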

`cegio` — counterexample-guided inductive optimization

arXiv 1704.03738. Given taint paths, synthesizes the minimum input that triggers a vulnerability.

$ codemap cegio --dir ./my-repo --json --quiet --taint-result <prior-taint-output>
{"ok":true,"result":{"trigger":{"input":"' OR 1=1--","reaches_sink":true}}}

Use when: turning a taint finding into a proof-of-concept exploit input.

`bin-taint` — binary taint analysis (Phase A)

Lifts x86-64 ELF executable sections to a taint IR, builds a CFG, propagates forward may-taint dataflow from PLT-resolved sources (read/recv/fread/getenv/strcpy/memcpy) to sinks (system/popen/exec/sprintf/dlopen), and reports ranked source→sink paths. Stripped binaries fall back to bounded .text pathfinding. Proof: precision 1.00, recall 0.80 on an 8-binary corpus (4 vulnerability classes detected, 0 false positives on 3 safe programs).

$ codemap bin-taint ./vulnerable-binary --json --quiet
{"ok":true,"result":{"findings":[{"source":"getenv","sink":"system","hops":["env","cmd","system"],"confidence":0.9},{"source":"read","sink":"sprintf","hops":["buf","format","sprintf"],"confidence":0.7}]}}

Use when: binary taint analysis on stripped ELF, finding command injection / format string / exec injection paths in compiled code.
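"Forward may-taint dataflow" means each block's incoming taint is the union of what its predecessors can send it, iterated to a fixpoint with a worklist. A minimal sketch over a hypothetical four-op mini-IR with register names. Lifting x86-64 and resolving PLT sources/sinks, which bin-taint actually does, is out of scope here.

```python
def may_taint(cfg, blocks):
    """Forward may-taint dataflow to a fixpoint.

    cfg:    block -> list of successor blocks
    blocks: block -> list of (op, arg1, arg2) in a hypothetical mini-IR:
            ("source", reg, None)  reg = getenv(...) / read(...)
            ("mov", dst, src)      dst = src
            ("clear", reg, None)   reg = constant
            ("sink", name, reg)    name(reg), e.g. system(rdi)
    """
    in_state = {b: set() for b in cfg}
    work = list(cfg)  # visit every block at least once
    hits = set()
    while work:
        b = work.pop()
        tainted = set(in_state[b])
        for op, arg1, arg2 in blocks[b]:
            if op == "source":
                tainted.add(arg1)
            elif op == "mov":
                if arg2 in tainted:
                    tainted.add(arg1)
                else:
                    tainted.discard(arg1)
            elif op == "clear":
                tainted.discard(arg1)
            elif op == "sink" and arg2 in tainted:
                hits.add((b, arg1, arg2))
        for succ in cfg[b]:
            if not tainted <= in_state[succ]:
                in_state[succ] |= tainted  # may-analysis: merge with set union
                work.append(succ)
    return sorted(hits)


cfg = {"entry": ["check"], "check": ["call", "safe"], "call": [], "safe": []}
blocks = {
    "entry": [("source", "rax", None)],   # rax = getenv("CMD")
    "check": [("mov", "rdi", "rax")],
    "call": [("sink", "system", "rdi")],
    "safe": [("clear", "rdi", None), ("sink", "puts", "rdi")],
}
hits = may_taint(cfg, blocks)
print(hits)  # [('call', 'system', 'rdi')]
```

Because states only ever grow over a finite set of registers, the loop terminates even when the CFG has cycles.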

Composite workflows

`audit` — kitchen-sink security report

See "Data flow & security" section above.

`validate` — sanity check (build + lint + tests + audit summary)

Single composite for "is this repo broken?"

`changeset` — file-grouped diff summary

$ codemap changeset --dir ./my-repo --json --quiet HEAD~10 HEAD
{"ok":true,"result":{"changes":{"feat":[...],"fix":[...],"refactor":[...]}}}

`handoff` — generate handoff document for a project

Distills repo state into a single MD doc (status + open issues + recent work + next-steps).

`pipeline` — multi-action pipeline runner

Run several actions in sequence, accumulate results.

$ codemap pipeline --dir ./my-repo --json --quiet --target 'audit:./,trace:main,hotspots:'
{"ok":true,"result":{"audit":{...},"trace":{...},"hotspots":{...}}}

Use when: scripted multi-step analysis.

Architecture (1-paragraph)

codemap walks --dir, parses with tree-sitter, builds a file-level import graph and a function-level call graph, layers PE/ELF/Mach-O/WASM/Java binary parsers + x86/x64 disassembly, and exposes 644 actions through a uniform CLI registry (inventory::submit!). Cache: .codemap/cache.bincode next to the scanned dir. Pure static. No daemons, no network access at analysis time.

Repo layout

codemap-core/ — parsing, graph, algorithms, actions
codemap-cli/ — the codemap binary
codemap-napi/ — Node.js bindings (optional)
docs/ — REFERENCE.md, ACTION_CATALOG.md, SCHEMAS.md, HUMAN.md
install.sh — single install entry

License

MIT. See LICENSE.
