Static codebase + binary analyzer, decompiler, and patcher. One binary, 644 actions, 18 source languages, PE/ELF/Mach-O/WASM decompilation to readable recompilable C on x86/x64, ARM64, RISC-V, and WebAssembly, sub-second cold-cache on 3K-file repos. No network, no servers, no databases, no API keys.
This README is your system prompt. Designed for AI agents: drop the entire file into your context (or fetch https://raw.githubusercontent.com/charleschenai/codemap/main/README.md) and you have everything you need — what codemap is, when to use it, how to install it, how to call every category of action, output schemas, exit codes, MCP setup. No further docs required for 95% of usage. Humans: see docs/HUMAN.md. Everyone else, keep reading.
Mission: Break down CODE (source + binary) so AI can replicate it.
codemap rewrite --function NAME --edit-c FILE --apply — decompile → recompile edited C → surgical patch (in-place NOP-pad, or code-cave + JMP trampoline when it grows) → replay original vs patched for bounded behavioral equivalence. Built on a recompilable-C foundation (decompiled C now compiles cleanly: every goto target labeled, code-addresses-as-values emitted as numeric literals). Plus a self-directed discovery engine (cross-pollinate) that mines codemap's own capability graph for novel primitive-fusion R&D directions, ranked by leverage × coherence.
Two full 20-topic roadmaps (THIRD + FOURTH) landed. THIRD-20 (all real, gate-verified): transplant, translate, fingerprint, hot-patch, api-shim, size-opt, multi-refactor, fuzz-harness, instrument, visual-docs, vuln-discover, protocol-rec, vectorize, ml-patch, jit-resolve, self-rewrite, gpu-lift, kernel-rewrite, mobile-fuse, os-map. FOURTH-20 mediums (real): self-bench, eval-suite, lasm, worm-defense, pear-fuzz, pqc-translate, ref-decompile.
FOURTH-20 deep tier — converted from honest skeletons to REAL in v8.22–v8.30. These were labeled honest skeletons at v8.21 (we don't fake); every one was then made to actually compute, each gated by a discriminator (an input that would expose a stub). They now produce real results, with bounded engines — precisely scoped, not the heavy external backends:
| Action | What's real now | Honest bound (what it is NOT) |
|---|---|---|
| `sys-sim` | native x86-64 interpreter that executes real instructions | bounded subset, not a full-system/CPU emulator |
| `superset-decompile` | real every-offset superset decode + provenance + interval selection | - |
| `zk-attest` | in-tree SHA-256 + Merkle + Fiat-Shamir + verify (tamper → FAIL) | a commitment/attestation scheme, not a general zk-SNARK prover |
| `gpu-rewrite` | real PTX parse / transform / re-emit | PTX-level, not a full GPU recompiler for all ISAs |
| `prove-rewrite` | bounded symbolic-execution translation-validation → EQUIVALENT / NOT-EQ / INCONCLUSIVE | sound only to the symex bound, not a Coq/Lean machine-checked proof |
| `proof-patch` | discharges obligations via symex + taint → DISCHARGED / VIOLATED / OPEN with engine evidence | bounded discharge, not a full theorem prover |
| `meta-evolve` | persisted win-rate tuning computed from real run data | - |
| `self-improve-demo` | genuine measured delta from real bench runs (`--dry`, human-gated) | - |
| `llm-decompile` | pluggable LLM backend (off by default = deterministic/offline) + recompile-consistency gate (only accepts output that recompiles/verifies) | does not ship/claim an integrated LLM; you bring the backend |
Still never a faked verified/proven claim — the bounds above are stated, not hidden. Run any of them yourself (codemap <action> <binary>) and check the output against this table.
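To make the `zk-attest` row concrete, here is a minimal sketch of the "SHA-256 + Merkle + verify (tamper → FAIL)" idea. It is not codemap's implementation: the leaf/node prefix bytes, the odd-node duplication rule, and the function names are assumptions made for illustration, and the Fiat-Shamir step is left out.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Merkle root over the leaves; an odd node is paired with itself (assumed rule)."""
    if not leaves:
        raise ValueError("need at least one leaf")
    # Distinct prefixes keep leaf hashes and inner-node hashes in separate domains.
    level = [sha256(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [
            sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


def verify(leaves: list[bytes], expected_root: bytes) -> str:
    """Recompute the root and compare it with the committed one."""
    return "PASS" if merkle_root(leaves) == expected_root else "FAIL"


# Hypothetical chunks standing in for the bytes of three functions.
chunks = [b"fn_a bytes", b"fn_b bytes", b"fn_c bytes"]
root = merkle_root(chunks)
tampered = [b"fn_a bytes", b"fn_B bytes", b"fn_c bytes"]

print(verify(chunks, root))    # PASS
print(verify(tampered, root))  # FAIL
```

A single changed byte in any chunk changes its leaf hash and therefore the root, which is why the comparison fails for the tampered input.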
codemap is now an autonomous, self-improving, verifiable security engine. The decompiler covers five architectures end-to-end and the action arsenal composes into goal-driven, no-human loops.
Multi-arch decompiler COMPLETE. decompile/ir produce readable, recompilable C for x86, x64, ARM64 (incl NEON/SIMD), RISC-V (RV64GC incl compressed/M/A), and WebAssembly — all through the same lift → SSA → type/var recovery → SAILR structuring → C pipeline.
The Autonomous lane (new actions):
- `run`: agentic mode. `codemap run goal=<attack-surface|audit-crypto|modernize|harden> <bin>` runs a deterministic, offline, no-LLM PLAN→ACT→OBSERVE→VERIFY→REPORT loop that composes existing actions into a DAG, threads one graph, is budget/step-capped, emits JSON, and only marks a finding fixed if the patch recompiles + re-validates.
- `learn`: self-improving. Records what-worked from each run into a project-brain store; `run`'s planner consults it to tune the DAG over time. The loop is closed: planning improves with usage, no code changes.
- `redteam`: autonomous offensive campaign (taint → symbolic → ranked PoC bundle + report).
- `infer-spec ... export=acsl|lean`: machine-checkable proof export (Frama-C ACSL + Lean/Coq), so patches are provable, not just plausible.
- `provenance`: signed, tamper-evident manifests for patched/twinned/hardened artifacts.
- `pqc-migrate`: detect quantum-vulnerable crypto → apply NIST PQC (ML-KEM/ML-DSA/SLH-DSA) → equivalence note.
- `deobfuscate`: the inverse of `harden`. De-flatten CFG, crack opaque predicates, decrypt strings via symbolic + graph.
Plus, across the roadmap: binary-twin (cleanroom fork), xlang-graph (cross-language call fusion), to-rust (C→idiomatic Rust), replay (record/replay + mutation), what-if (change-impact), firmware, sbom-flow, crypto-audit, model-extract, game-assets, brain-lock. 644 actions.
v8.4 pushes the v8.3 decompiler in two directions: multi-architecture (it now produces recompilable C for ARM64/AArch64, not just x86/x64) and the first increments of the three strategic arms that turn codemap from read-only intelligence into a full understand → reason → change platform.
v8.4.0 new actions (Phase-1+ across roadmap topics): project-brain (persistent project memory + git-history what-changed), infer-spec (formal pre/post/invariant inference → ACSL + Rust contracts, Daikon-style templates), c-diff (graph-aware decompiled-C diff with call-graph change propagation), ci (binary CI/CD attack-surface gate), vuln-backport (CVE patch → older-binary backport locator). ARM64 decompilation hardened: recursion, switch recovery, emit cleanup, recursive-call returns. 644 actions.
- ARM64 / AArch64 decompiler. ARM64 Mach-O now disassembles (Capstone-backed `Arm64Lifter`; function sizing from `LC_FUNCTION_STARTS`) and lifts through the same IR pipeline as x86: `codemap ir <arm64-bin> <fn>` emits readable C with recovered args (AAPCS64 `x0`–`x7`), real calls (recursion is a `call`, not an `asm` comment), and frame/`sp` modeling. `--verify` PASS on ARM64, not just x86: both arches decompile → recompile cleanly.
- `ir --verify`: recompile gate, first-class. `codemap ir <bin> <fn> --verify` writes the emitted C to a temp file and runs a host C compiler on it, reporting PASS / FAIL. This is ground truth that the decompilation is recompilable, not just plausible, and the backbone of codemap's verify-by-running discipline.
- Arm 1: Binary patching. `bin-patch-fn`: surgical, layout-preserving in-place function patching (canned stubs `ret0`/`ret1`/`ret`/`nop` or raw hex), fits-gated, verified by re-disassembly. Neutralize a check (`bin-patch-fn ./app check_license ret1`) without touching any other offset / reloc / string. (The decompile → edit-C → recompile → relink loop is the next increment.)
- Arm 2: Symbolic / concolic. `concolic`: an interval constraint solver over the SSA-IR branch guards (no SMT dependency). Per path it reports SAT (with a concrete register seed that drives execution down it), DEAD (contradictory guards → opaque-predicate / dead-code signal), or PARTIAL. Concrete concolic seeds in the default build.
- Arm 3: Dynamic bridge. `trace-plan`: uses the code-property graph to choose a selective instrumentation scope (entry, call sites, dangerous sinks, loop heads, not every instruction) and emits a ready-to-run, ABI-aware GDB script. Drive it with `concolic` seeds; ingest the trace with `runtime-merge`.
- Graph fusion: cross-binary name recovery. `name-recovery` recovers a stripped binary's anonymous `sub_<va>` names by matching them (40-dim structural fingerprint, cosine, greedy 1:1) to named functions in a reference binary, fusing the recovered names into the graph. Exact on same-build; honest-partial across optimization levels.
- Decompiler correctness sweep. Fixed multi-block argument recovery (args flowing across a loop/branch were emitted `void` with use-before-def; now seeded at the SSA entry from the calling convention), struct-field deref (`p->x`), and 2D-array index (`m[i*cols+j]`). All now recompile.
- Built for the AI-agent customer. `agent-brief` (one-page high-signal map of a codebase), `search` (relevance-ranked discovery across 644 actions), `graph-export` (Graphviz / Mermaid / Cytoscape JSON / interactive HTML). Plus human onboarding: `cargo binstall`, a Homebrew formula, and a `docs/HUMAN.md` quickstart.
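The SAT / DEAD / PARTIAL verdicts described for `concolic` can be illustrated with a small interval solver. This is only a sketch under stated assumptions: the guard encoding `(register, op, constant)`, the 64-bit signed range, the seed-picking rule, and the reading of PARTIAL as "some guard could not be modeled as an interval" are all hypothetical, not taken from codemap's source.

```python
INT_MIN, INT_MAX = -(2**63), 2**63 - 1


def solve_path(guards):
    """Intersect per-register intervals along one path.

    guards: list of (register, op, constant) tuples (hypothetical encoding).
    Returns (verdict, seed) where seed maps registers to concrete values.
    """
    bounds = {}
    partial = False
    for reg, op, k in guards:
        lo, hi = bounds.get(reg, (INT_MIN, INT_MAX))
        if op == "<":
            hi = min(hi, k - 1)
        elif op == "<=":
            hi = min(hi, k)
        elif op == ">":
            lo = max(lo, k + 1)
        elif op == ">=":
            lo = max(lo, k)
        elif op == "==":
            lo, hi = max(lo, k), min(hi, k)
        else:
            partial = True  # e.g. "!=" is not a single interval; skip it
            continue
        if lo > hi:
            return "DEAD", None  # contradictory guards: the path is infeasible
        bounds[reg] = (lo, hi)
    # Pick the in-range value closest to zero as the concrete seed.
    seed = {reg: max(lo, min(hi, 0)) for reg, (lo, hi) in bounds.items()}
    return ("PARTIAL" if partial else "SAT"), seed


print(solve_path([("rdi", ">", 10), ("rdi", "<=", 20)]))  # ('SAT', {'rdi': 11})
print(solve_path([("rdi", ">", 10), ("rdi", "<", 5)]))    # ('DEAD', None)
print(solve_path([("rsi", "!=", 0), ("rdi", "==", 7)]))   # ('PARTIAL', {'rdi': 7})
```

Intervals cannot express relations between two registers, which is one reason a solver of this kind reports a partial result instead of claiming full coverage.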
v8.3 (through 8.3.5) turns codemap's binary side into a real decompiler: lift → SSA → DCE/copy-prop → type & variable recovery → SAILR structuring → readable, recompilable C. It went from "finds 1 function in a stripped PE" to:
- Full binary coverage. PE (x86/x64), ELF (x86/x64/ARM/AArch64), and Mach-O x86-64, with function discovery via PE `.pdata` `RUNTIME_FUNCTION`, ELF symbols/`.eh_frame`, and Mach-O `LC_FUNCTION_STARTS`.
- Readable C reconstruction. Recovered structs (`p->field` with synthesized typedefs), arrays (`a[i]`), string literals (`return "hello world"`), float/XMM ABI params & returns (SysV + Win64), C++ virtual calls (`obj->vfunc_0()`), and clean control flow on `-O2` (no goto-soup).
- C++ exception recovery. Idiomatic `try { … } catch (int e) { … }` reconstructed from a stripped binary's `.eh_frame` + `.gcc_except_table`, including the caught type, demangled from the LSDA type table. Most decompilers drop the handler entirely or render it as goto-soup.
- Correctness, not just readability. Fixed real silent mis-decompilations: array-index liveness (loops returned `a[0]·n`) and dropped `movzbl` masks (`x & 0xff` → `x`), both caught and fixed via a re-execution gate.
- Behaviorally verified. Every change is gated on a 79-binary recompilability corpus + a G10 re-execution harness (decompile → recompile → run → diff): recovered code is behavior-identical on the scalar subset, not just plausible-looking.
- Graph-fused. Decompiled functions feed codemap's heterogeneous code-property graph, so its dataflow / taint / call-graph / centrality analyses run on stripped binaries, not just source.
v8 cuts the v7 series at 7.184.0 (2026-05-18) and turns over to 8.0.0 (2026-05-20). Headline themes:
- Action registry complete (T1). Every action self-registers via `inventory::submit!`; `actions/mod.rs` has zero dispatch arms (catch-all `_ => Err(UnknownAction)` only). Adding a new action is a single submit-block edit in the owning module file.
- iced-x86 linear-sweep precision (T3). All `bin_text_*` density actions disassemble via iced-x86 instead of raw byte-scans, which eliminates instruction-boundary false positives.
- Lint zero (T8). `#![deny(warnings)]` locked into `codemap-core` and `codemap-cli`; `cargo clippy -- -D warnings` ships at 0 warnings.
- arXiv research: filter scaffolds, ship real work (T9). `pointer-analysis` (Andersen field-sensitive PA + Tarjan SCC) and `cegio` (rsmt2-driven SMT) shipped with real implementations. `bin-taint` shipped Phase A (CFG, intra/inter-procedural taint, PLT-resolved source/sink, pathfinding, stripped-binary fallback). 16 items removed in v8.2.0 cleanup: 13 skeleton scaffolds (`symex-concolic`, `loop-polyhedral`, `detect-memory-corruption`, `neural-decompile`, `side-channel-detect`, `symex-speculative`, `gpu-analyze`, `semantic-slice`, `synthesize`, `abstract-interp`, `bin-search`, `patch-binary`, `natural-query`) + 3 failed experiments (`meta-path-ppr` proof +0.0000 lift, `rfmoe` 3/8 FAIL, `ising-landscape` proof pending), all 59–145 LOC with no proof reports or integration tests.
- 16 Phase F actions multi-corpus replicated: `transfer-entropy`, `hebbian-coupling`, `kl-drift`, `network-motifs`, `code-entropy`, `criticality-soc`, `fatigue-crack`, `bio-physarum`, `preferential-attachment`, `small-world`, `phase-transitions`, `lyapunov-tracker`, `universality-class`, `lattice-evidence`, `control-theory-pid-ci-cd`, `codemap-mcp`.
644 actions registered (full index in docs/ACTION_CATALOG.md; generated from the registry by gen-action-docs and gated by tests/single_source_of_truth.rs). 236 bin-* parsers, 18 source-language tree-sitter parsers, 1614 lib tests, decompiler recompile 96.5% (self-bench real, 193/200), 0 clippy warnings.
| Problem | Codemap action | Why codemap (vs alternatives) |
|---|---|---|
| "What does this codebase do?" | `summary --dir <path>` | Cross-file structural overview in one call. Beats reading files. |
| "Find unused functions / dead code" | `dead-functions --dir <path>` | Call-graph reachability across modules. grep can't do this. |
| "Who calls function X?" | `callers --dir <path> X` | True call graph (AST-aware), not a string match. |
| "What does function X depend on (transitively)?" | `trace --dir <path> X` | Walks the dep graph. grep would only find direct refs. |
| "What changed between two commits?" | `diff --dir <path> <ref1> <ref2>` | Semantic diff, not line diff. |
| "Find security issues" | `audit --dir <path>` | Composite of taint + secret-scan + dep-tree + dead-deps. |
| "Where would a tainted input flow?" | `taint --dir <path> --source <fn> --sink <fn>` | Path-sensitive, sanitizer-aware, alias-aware, cross-procedural. |
| "Reverse-engineer a binary" | `bin-info <path/to/binary>` | PE/ELF/Mach-O parser. capa + YARA + signsrch + PEiD rules built in. |
| "Find cross-language coupling" | `cross-lang --dir <path>` | Imports/calls that cross language boundaries. |
- Editing files: codemap is read-only. Use Edit/Write directly.
- Running code: codemap doesn't compile or exec. Use bash.
- Live process state: codemap is static. Use `ps`, `lsof`, `ss`.
- Single-file grep: if you know the file, `grep` is faster.
- String search across few files: if N<5 files, just `grep`.
Download the tarball for your platform and extract the binary:
# Linux x86_64
curl -fsSL https://github.com/charleschenai/codemap/releases/latest/download/codemap-v8.2.0-x86_64-linux.tar.gz -o codemap.tar.gz
tar xzf codemap.tar.gz -C ~/.local/bin/
chmod +x ~/.local/bin/codemap
# Linux aarch64
curl -fsSL https://github.com/charleschenai/codemap/releases/latest/download/codemap-v8.2.0-aarch64-linux.tar.gz -o codemap.tar.gz
tar xzf codemap.tar.gz -C ~/.local/bin/
chmod +x ~/.local/bin/codemap
# macOS (add to PATH if needed)
export PATH="$HOME/.local/bin:$PATH"

Add $HOME/.local/bin to your PATH in ~/.bashrc or ~/.zshrc:

export PATH="$HOME/.local/bin:$PATH"

For system-wide install (/usr/local/bin/codemap):

sudo cp codemap /usr/local/bin/
sudo chmod +x /usr/local/bin/codemap

To build from source:

git clone https://github.com/charleschenai/codemap && cd codemap
cargo build --release -p codemap-cli
cp target/release/codemap ~/.local/bin/codemap
chmod +x ~/.local/bin/codemap

To verify the install:

codemap --version-detail

Prints:
codemap 8.2.0
git: <latest-sha>
built: <build-date>
host: <hostname>/<arch>
If the binary is older than expected, re-run install with --update.
Universal shape:
codemap <ACTION> [TARGET...] --dir <PATH> [--json] [--quiet] [other-flags]
| Flag | Purpose |
|---|---|
| `--dir <PATH>` | Required. Repo/dir to scan. Repeatable for multi-repo. |
| `--json` | Output JSON (parseable). Default is text (human-readable). |
| `--quiet` | Suppress scan/cache status messages on stderr. |
| `--no-cache` | Force re-scan, ignore `.codemap/cache.bincode`. |
| `--include-path <PATH>` | C/C++ include search path. |
| `--watch [SECS]` | Re-run every N seconds. |
For agents: always use --json and --quiet unless you specifically want text output.
codemap --help # full action list
codemap <action> --help # action-specific flags
644 actions (a curated subset advertised in --help, 236 fine-grained bin-* parsers, plus the rest) grouped by purpose. Full catalog at docs/ACTION_CATALOG.md. High-level groups:
| Category | Action count | Examples |
|---|---|---|
| Analysis | ~20 | summary, stats, trace, callers, hotspots, layers, health, decorators |
| Code intelligence | ~30 | complexity, import-cost, churn, api-diff, clones, entry-points, dead-functions |
| Dataflow / security | ~16 | data-flow, taint, bin-taint, slice, trace-value, sinks, secret-scan, audit, dep-tree |
| Graph theory | ~40 | pagerank, hubs, bridges, centrality (17 measures), community (Leiden), bellman-ford |
| Binary / RE | ~235 | elf-info, pe-imports, macho-info, bin-anti-debug, bin-disasm, bin-strings, bin-relocs |
| Schemas | ~10 | proto-schema, openapi-schema, graphql-schema, sql-extract, dbf-schema |
| Supply chain | ~10 | osv-scan, sbom-diff, license-check, cve-scan |
| Config-as-code | ~10 | k8s-scan, iac-scan, dockerfile-scan, ci-scan, oci-scan |
| ML / AI | ~10 | gguf-info, safetensors-info, onnx-info, cuda-info, pyc-info |
| LSP bridge | ~5 | lsp-symbols, lsp-references, lsp-calls, lsp-diagnostics, lsp-types |
| Web | ~5 | web-sitemap, js-api-extract (HAR/HTML input required) |
| Cross-language | ~5 | lang-bridges, gpu-functions, monkey-patches |
| Composite | ~10 | audit, compare, validate, changeset, handoff, pipeline |
| arXiv-derived | 2 | pointer-analysis (Andersen PA), cegio (SMT optimizer) |
All --json outputs follow:
{
"ok": <boolean>,
"action": "<action-name>",
"dir": "<scanned-path>",
"result": <action-specific>,
"stats": { "files_scanned": N, "duration_ms": M, "cache_hits": K }
}
result shape varies per action. Action-specific schemas in docs/SCHEMAS.md.
| Code | Meaning | Agent response |
|---|---|---|
| 0 | Success | Parse --json output |
| 1 | Usage error (bad flag, missing --dir) | Re-read --help, fix args, retry |
| 2 | I/O error (path not found, no read perm) | Verify path, retry |
| 101 | Panic | Do not retry. File a bug at https://github.com/charleschenai/codemap/issues |
Other non-zero codes: action-specific. See <action> --help.
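An agent can encode the exit-code table as a small dispatch function. The sketch below is one possible shape, not an official client: the function names and the returned action tags are hypothetical, and `run_codemap` only uses the `--dir`, `--json`, and `--quiet` flags documented above.

```python
import json
import subprocess


def decide(exit_code: int, stdout: str) -> dict:
    """Map a codemap exit code to the agent response from the table above."""
    if exit_code == 0:
        return {"next": "parse", "payload": json.loads(stdout)}
    if exit_code == 1:
        return {"next": "fix-args-and-retry"}     # usage error: re-read --help
    if exit_code == 2:
        return {"next": "verify-path-and-retry"}  # I/O error
    if exit_code == 101:
        return {"next": "file-bug"}               # panic: do not retry
    return {"next": "see-action-help"}            # action-specific code


def run_codemap(action: str, directory: str) -> dict:
    """Run one action and classify the outcome (assumes `codemap` is on PATH)."""
    proc = subprocess.run(
        ["codemap", action, "--dir", directory, "--json", "--quiet"],
        capture_output=True,
        text=True,
    )
    return decide(proc.returncode, proc.stdout)


# Classification can be exercised without the binary installed:
print(decide(0, '{"ok": true, "action": "summary"}')["payload"]["ok"])  # True
print(decide(101, "")["next"])                                          # file-bug
```

Keeping the classification separate from the subprocess call makes the retry policy easy to unit-test.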
codemap is designed for AI agents as its primary customer. Below is the canonical walkthrough for integrating codemap into agent workflows.
| Scenario | grep / raw edits | codemap |
|---|---|---|
| "What does this codebase do?" | Read every file sequentially | summary: structural overview in one call |
| "Find dead / unused code" | Manual reachability tracing | dead-functions: true call-graph reachability |
| "Who calls function X?" | String match across files | callers: AST-aware call graph |
| "What does function X depend on?" | Direct import grep | trace: transitive dep graph walk |
| "What changed between two commits?" | Line-level diff | diff: semantic diff (AST-aware) |
| "Find security issues" | YARA / pattern match | audit: composite of taint + secret-scan + dep-tree + dead-deps |
| "Where does tainted input flow?" | No tool | taint: path-sensitive, sanitizer-aware, cross-procedural |
| "Analyze a compiled binary" | strings + hexdump + manual | bin-info + bin-taint: PE/ELF/Mach-O parsers + taint analysis |
| "Graph metrics on code" | Custom scripts | 500+ built-in actions (graph theory, entropy, ML, physics-inspired) |
codemap is read-only, no network, no servers, no databases, no API keys. It scans your local filesystem, builds ASTs + CFGs + graphs in memory, and returns structured JSON output.
Every action follows this pattern:
codemap <ACTION> [TARGET] --dir <PATH> --json --quiet [OPTIONS]

| Flag | Purpose |
|---|---|
| `--json` | JSON output (machine-readable) |
| `--quiet` | Suppress progress bars and logs |
| `--dir` | Directory to analyze (required) |
Output schema (for actions that return results):
{
"ok": true,
"result": { ... },
"metrics": {
"time_ms": 42,
"files_scanned": 1501,
"edges": 100219
}
}

On failure:

{
  "ok": false,
  "error": "error message"
}

Exit codes:
- `0`: success
- `1`: error (check `--json` output for details)
Example 1: "What does this repo do?"
codemap summary --json --quiet --dir ./project
# → Cross-file structural overview: top-level modules, key dependencies, entry points

Example 2: "Find unused functions"
codemap dead-functions --json --quiet --dir ./project
# → Functions with zero callers across the module graph. Includes call-chain depth.

Example 3: "Security audit"
codemap audit --json --quiet --dir ./project
# → Composite: taint analysis + secret detection + dependency tree + dead deps
# Returns findings ranked by confidence with source→sink paths

Example 4: "Taint analysis: find injection paths"
codemap taint --json --quiet --dir ./project --source read --sink system
# → Path-sensitive taint from `read` to `system` with confidence scoring
# Reports ranked source→sink paths with alias resolution

Example 5: "Binary analysis: what is this executable?"
codemap bin-info --json --quiet ./target/release/my-binary
# → PE/ELF/Mach-O parser: sections, imports, exports, symbols,
# capa-rules detection, YARA signatures, anti-debug indicators

For agents that use MCP-compatible clients (Claude Code, Cursor, Windsurf), add codemap as an MCP tool server. All 644 actions become available as MCP tools with proper input schemas:
// ~/.claude/settings.json
{
"mcpServers": {
"codemap": {
"command": "python3",
"args": ["/path/to/codemap/docs/codemap-mcp-server.py"]
}
}
}

This is the recommended path because:
- No CLI parsing needed: tools have structured input schemas
- Self-documenting: `tools/list` returns every action name, description, and schema
- Executable via JSON-RPC: `tools/call` with `{name, arguments}` dispatches any action
- Zero config for AI: the agent discovers capabilities automatically
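For clients that speak to the server directly, the two requests named above have a simple JSON-RPC 2.0 shape. The sketch below only builds the messages. The tool name `summary` assumes MCP tool names mirror action names, and the `dir` argument key is a guess; the authoritative input schema is whatever `tools/list` returns.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids, unique per request


def tools_list_request() -> dict:
    """Ask the server for every tool name, description, and input schema."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": "tools/list"}


def tools_call_request(name: str, arguments: dict) -> dict:
    """Dispatch one tool with `{name, arguments}` params."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


listing = tools_list_request()
req = tools_call_request("summary", {"dir": "./project"})  # hypothetical arguments
print(json.dumps(req))
```

An agent would typically send `tools/list` once, cache the schemas, and then validate each `arguments` object against the schema before calling.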
Set CODEMAP_BIN if your codemap binary is not on PATH:
export CODEMAP_BIN=~/.local/bin/codemap

| Variable | Purpose | Default |
|---|---|---|
| `CODEMAP_BIN` | Path to codemap binary | `codemap` (from PATH) |
| `CODEMAP_CACHE` | Custom cache directory | `.codemap/cache.bincode` (next to scanned dir) |
Always check --json output for error details:
result=$(codemap <ACTION> --json --quiet --dir ./project)
if echo "$result" | python3 -c "import sys,json; d=json.load(sys.stdin); sys.exit(0 if d['ok'] else 1)"; then
echo "Success: $(echo "$result" | python3 -c "import sys,json; print(json.load(sys.stdin)['result'])")"
else
echo "Error: $(echo "$result" | python3 -c "import sys,json; print(json.load(sys.stdin)['error'])")"
fi

- Cold cache: sub-second on repos up to 3K files
- Warm cache: near-instant (reads `.codemap/cache.bincode`)
- Large repos (10K+ files): 5-30 seconds for full analysis
- All analysis is in-memory. No disk writes except the cache file.
- No network calls during analysis.
Each recipe: what the action does → command → sample output → when to use it.
For the complete flat list of action names see docs/ACTION_CATALOG.md.
Reports file count, languages, entry points, top modules, dispatch density. Single-call onboarding.
$ codemap summary --dir ./my-repo --json --quiet
{"ok":true,"result":{"files":2824,"languages":["rust","python","typescript"],
"entry_points":["src/main.rs","src/lib.rs"],"top_modules":["analysis","insights","cpg"]}}
Use when: new repo, "tell me what this does" before diving deeper.
Per-language LOC + file counts, function/class density, fan-in/fan-out distribution.
$ codemap stats --dir ./my-repo --json --quiet
{"ok":true,"result":{"rust":{"files":341,"loc":89432,"fns":2104},"python":{"files":52,"loc":4108}}}
Use when: comparing repos by size, reporting metrics, sanity-checking parse coverage.
Infers boundaries (web / service / data / infra) from import patterns + naming conventions.
$ codemap layers --dir ./my-repo --json --quiet
{"ok":true,"result":{"layers":[{"name":"web","modules":["routes","handlers"]},
{"name":"data","modules":["models","repo"]}],"violations":[...]}}
Use when: validating that "web shouldn't import from data" type architectural rules hold.
Surfaces "danger zone" code (high git churn + high cyclomatic complexity).
$ codemap hotspots --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"hotspots":[{"file":"src/parser.rs","churn":48,"complexity":92,"score":4416}]}}
Use when: prioritizing refactor work, finding "where bugs live."
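The sample output above is consistent with a score of churn × complexity (48 × 92 = 4416). Assuming that is the ranking rule, which the source does not state explicitly, a minimal sketch of the ranking looks like this; the function name and input dictionaries are hypothetical.

```python
def hotspot_scores(churn: dict, complexity: dict, top: int = 10) -> list:
    """Rank files by churn * complexity (assumed scoring rule), highest first."""
    rows = [
        {
            "file": f,
            "churn": churn[f],
            "complexity": complexity[f],
            "score": churn[f] * complexity[f],
        }
        for f in churn.keys() & complexity.keys()  # only files with both metrics
    ]
    rows.sort(key=lambda r: (-r["score"], r["file"]))  # ties broken by file name
    return rows[:top]


churn = {"src/parser.rs": 48, "src/cli.rs": 60, "src/util.rs": 3}
complexity = {"src/parser.rs": 92, "src/cli.rs": 5, "src/util.rs": 40}
ranked = hotspot_scores(churn, complexity, top=2)
print(ranked[0])
# {'file': 'src/parser.rs', 'churn': 48, 'complexity': 92, 'score': 4416}
```

Multiplying the two metrics pushes files that are both frequently changed and complex to the top, while a file that is high on only one axis (like `src/cli.rs` here) ranks lower.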
Lists exported functions/classes that other code can call from outside.
$ codemap entry-points --dir ./my-repo --json --quiet
{"ok":true,"result":{"entries":[{"name":"create_user","file":"api/users.rs","kind":"public_fn"}]}}
Use when: API documentation, understanding what's a stable contract.
Composite: dead code % + clippy/lint count + circular deps + missing tests. Single "is this repo healthy?" score.
$ codemap health --dir ./my-repo --json --quiet
{"ok":true,"result":{"score":78,"dead_code_pct":3.2,"circular_deps":2,"missing_tests":["api/users.rs::delete"]}}
Use when: quick "should we touch this codebase or not" gut-check.
Functions never called by any other function in the workspace.
$ codemap dead-functions --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead":[{"file":"src/old.rs","function":"legacy_helper","line":42}]}}
Use when: cleanup PR, removing tech debt. Don't use for: identifying entry points (they're "dead" by call-graph but intentionally public).
Files no other file imports / uses.
$ codemap dead-files --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead_files":["src/experimental/old_impl.rs","tools/debug.py"]}}
Use when: dead-import cleanup.
Packages in Cargo.toml/package.json/pyproject.toml that no source file imports.
$ codemap dead-deps --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead":["serde_json (Cargo.toml)","lodash (package.json)"]}}
Use when: dep cleanup, reducing build time + attack surface.
McCabe complexity (branches+1). Catches "this function should be split."
$ codemap complexity --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"top":[{"fn":"parse_expression","file":"parser.rs","cyclomatic":34,"lines":280}]}}
Use when: finding refactor candidates, code review automation.
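As an illustration of the "branches + 1" rule, here is a sketch that computes it for Python functions with the standard `ast` module. codemap itself is described as using tree-sitter parsers across 18 languages, so this is a stand-in rather than its method, and the set of node types counted as branches is an assumption.

```python
import ast

# Node types counted as one decision point each (an assumed, simplified set).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic(source: str) -> dict[str, int]:
    """Return {function name: branches + 1} for every function in the source."""
    result = {}
    for fn in ast.walk(ast.parse(source)):
        if not isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        branches = 0
        for node in ast.walk(fn):
            if isinstance(node, BRANCH_NODES):
                branches += 1
            elif isinstance(node, ast.BoolOp):
                # `a and b and c` adds two short-circuit decision points.
                branches += len(node.values) - 1
        result[fn.name] = branches + 1
    return result


SAMPLE = '''
def classify(n):
    if n < 0 and n != -1:
        return "neg"
    for i in range(n):
        if i % 2:
            return "odd-seen"
    return "done"
'''

print(cyclomatic(SAMPLE))  # {'classify': 5}
```

In this sketch the branches of a nested function are also counted toward the enclosing function, which a production tool may or may not do.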
Commits-touching-file count over a window.
$ codemap churn --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"top":[{"file":"src/parser.rs","commits":78,"authors":12}]}}
Use when: combined with complexity for hotspots, ownership analysis.
Detects near-identical token sequences across files (copy-paste detection).
$ codemap clones --dir ./my-repo --json --quiet --min-tokens 50
{"ok":true,"result":{"clones":[{"size":120,"locations":[["a.rs:14","b.rs:22"]],"similarity":0.94}]}}
Use when: finding extraction candidates for shared functions.
Reports module cycles (a → b → c → a).
$ codemap circular --dir ./my-repo --json --quiet
{"ok":true,"result":{"cycles":[["src/a.rs","src/b.rs","src/a.rs"]]}}
Use when: untangling architecture before a refactor.
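Cycle reporting of this kind is commonly done with a depth-first search that tracks the current path. The following sketch shows that technique on a hand-written import map; it is not taken from codemap, and the `imports` dictionary format is hypothetical.

```python
def find_cycles(imports: dict[str, list[str]]) -> list[list[str]]:
    """Return import cycles as paths that end where they start."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    color = {node: WHITE for node in imports}
    stack, cycles = [], []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for dep in imports.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # Back edge: the cycle is the path from dep to here, closed on dep.
                cycles.append(stack[stack.index(dep):] + [dep])
            elif state == WHITE:
                visit(dep)
        stack.pop()
        color[node] = BLACK

    for node in list(imports):
        if color[node] == WHITE:
            visit(node)
    return cycles


imports = {
    "src/a.rs": ["src/b.rs"],
    "src/b.rs": ["src/a.rs"],
    "src/c.rs": ["src/a.rs"],
}
print(find_cycles(imports))  # [['src/a.rs', 'src/b.rs', 'src/a.rs']]
```

The recursive form is fine for a sketch; very deep import chains would need an explicit stack to stay under Python's recursion limit.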
Walks the call graph forward from a function/symbol, returns full dep tree.
$ codemap trace --dir ./my-repo --json --quiet RecalcInvoiceTotals
{"ok":true,"result":{"node":"RecalcInvoiceTotals","calls":[
{"name":"ship_chg_sum","file":"backend/invoices.go:120","depth":1},
{"name":"format_money","file":"util/money.go:8","depth":2}]}}
Use when: impact analysis before changing a function, generating context for an LLM.
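The `depth` field in the sample output corresponds to breadth-first distance from the starting function. A minimal sketch of such a forward walk, assuming a plain `{caller: [callees]}` map rather than codemap's real graph structures:

```python
from collections import deque


def trace(calls: dict[str, list[str]], root: str) -> list[dict]:
    """Breadth-first walk of the call graph, recording each callee's depth."""
    seen = {root}
    queue = deque([(root, 0)])
    found = []
    while queue:
        fn, depth = queue.popleft()
        for callee in calls.get(fn, []):
            if callee not in seen:  # visit each function once, at its shortest depth
                seen.add(callee)
                found.append({"name": callee, "depth": depth + 1})
                queue.append((callee, depth + 1))
    return found


calls = {
    "RecalcInvoiceTotals": ["ship_chg_sum"],
    "ship_chg_sum": ["format_money"],
    "format_money": [],
}
print(trace(calls, "RecalcInvoiceTotals"))
# [{'name': 'ship_chg_sum', 'depth': 1}, {'name': 'format_money', 'depth': 2}]
```

Running the same walk over the reversed edge map gives the `callers` view.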
Reverse of trace. Returns the function's call sites + their callers.
$ codemap callers --dir ./my-repo --json --quiet validate_user
{"ok":true,"result":{"callers":[{"caller":"login","file":"auth.py:88","depth":1}]}}
Use when: "if I change this signature, what breaks?"
Combines callers + dataflow + tests touched. Most pessimistic estimate.
$ codemap blast-radius --dir ./my-repo --json --quiet --target User.id
{"ok":true,"result":{"functions":42,"tests":7,"endpoints":3,"db_columns":2}}
Use when: "what's the size of changing this thing?"
Function-level diff: added, removed, signature-changed, body-changed.
$ codemap diff --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"added":["validate_email"],"removed":["old_validator"],
"signature_changed":[{"fn":"create","before":"(name)","after":"(name,email)"}]}}
Use when: generating PR descriptions, understanding code review scope.
Like diff but specifically flags BREAKING vs additive changes to public API.
$ codemap api-diff --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"breaking":[
{"kind":"removed","fn":"OldAPI::v1_login"},
{"kind":"signature_change","fn":"create_user","before":"(name)","after":"(name,email)"}]}}
Use when: versioning decisions (semver minor vs major), CHANGELOG generation.
Maps the diff to every transitively-affected caller.
$ codemap diff-impact --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"impacted_fns":127,"impacted_files":34,"high_risk":["payment::charge"]}}
Use when: deciding test scope for a PR.
Runs taint + secret-scan + dead-deps + dep-tree + license-check in one pass.
$ codemap audit --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[
{"kind":"secret","file":".env.sample","line":3,"pattern":"AWS_KEY"},
{"kind":"taint","source":"req.body","sink":"db.execute","path":[...]},
{"kind":"dep-vuln","package":"lodash","version":"4.17.20","cve":"CVE-2021-23337"}]}}
Use when: first-pass security review of an unfamiliar repo.
Tracks tainted values from source(s) to sink(s). Sanitizer-aware, alias-aware (e.g. safe = sanitize(x)), cross-procedural (parses wrapper bodies to detect hidden sanitizers).
$ codemap taint --dir ./my-repo --json --quiet --source 'req.query' --sink 'db.execute'
{"ok":true,"result":{"paths":[{"source":"req.query.id","sink":"db.execute(sql)",
"hops":["params.id","userId","query"],"sanitized":false}]}}
Use when: SQLi/XSS/SSRF detection, "is user input reaching this sink?"
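To show what "sanitizer-aware" means in practice, here is a deliberately small taint propagation over straight-line assignments. It is a teaching sketch only: it handles a single pass with no branches, unlike the path-sensitive, cross-procedural analysis described above, and the statement encoding `(target, callee, args)` is hypothetical.

```python
def taint_paths(stmts, sources, sanitizers, sinks):
    """Propagate taint through straight-line statements.

    stmts: list of (target, callee, args); target is None for a bare call,
    callee is None for a plain assignment.
    """
    tainted = {}   # variable -> chain [source, hop, ..., variable]
    findings = []
    for target, callee, args in stmts:
        hit = next((a for a in args if a in tainted or a in sources), None)
        if callee in sinks and hit is not None:
            chain = tainted.get(hit, [hit])
            findings.append({"source": chain[0], "sink": callee,
                             "hops": chain[1:], "sanitized": False})
        if target is None:
            continue
        if hit is None or callee in sanitizers:
            tainted.pop(target, None)  # clean value or sanitizer output
        else:
            tainted[target] = tainted.get(hit, [hit]) + [target]
    return findings


stmts = [
    ("user_id", None, ["req.query"]),    # user_id = req.query
    ("query", "format", ["user_id"]),    # query = format(user_id)
    (None, "db.execute", ["query"]),     # db.execute(query)   <- tainted
    ("safe", "sanitize", ["user_id"]),   # safe = sanitize(user_id)
    (None, "db.execute", ["safe"]),      # db.execute(safe)    <- clean
]
result = taint_paths(stmts, {"req.query"}, {"sanitize"}, {"db.execute"})
print(result)
# [{'source': 'req.query', 'sink': 'db.execute', 'hops': ['user_id', 'query'], 'sanitized': False}]
```

Only the first `db.execute` call is reported, because the second receives the output of the sanitizer.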
Given a target variable/sink, return only the code that influences it.
$ codemap slice --dir ./my-repo --json --quiet --var 'password' --file auth.py
{"ok":true,"result":{"slice_lines":[12,15,22,30,42],"file":"auth.py"}}
Use when: narrowing what to read when chasing a bug.
Enumerates every db.execute, eval, exec, Runtime.exec, subprocess.shell=True, innerHTML=, etc.
$ codemap sinks --dir ./my-repo --json --quiet
{"ok":true,"result":{"sinks":[{"kind":"sql","file":"api/users.rs","line":88,"expr":"db.execute(query)"}]}}
Use when: building taint queries, audit checklist generation.
20+ patterns (AWS key, GitHub PAT, Slack token, Stripe live key, private keys, JWT, DB conn strings, etc.). Redacted output.
$ codemap secret-scan --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[{"file":".env.sample","line":3,"kind":"aws_access_key","masked":"AKIA****REDACTED"}]}}
Use when: pre-commit hook, pre-publish audit.
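A single-pattern version of this scan fits in a few lines. The sketch below covers only AWS access key IDs, using the widely documented `AKIA` + 16 uppercase alphanumerics format, and mimics the masked output shown above. The function name is hypothetical, and the key in the demo is AWS's published documentation example, not a real credential.

```python
import re

# AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
AWS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def scan(text: str, file: str) -> list[dict]:
    """Report AWS access key IDs with the secret portion redacted."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY.finditer(line):
            findings.append({
                "file": file,
                "line": lineno,
                "kind": "aws_access_key",
                "masked": match.group()[:4] + "****REDACTED",
            })
    return findings


env_sample = "# sample config\nDEBUG=1\nAWS_KEY=AKIAIOSFODNN7EXAMPLE\n"
hits = scan(env_sample, ".env.sample")
print(hits)
# [{'file': '.env.sample', 'line': 3, 'kind': 'aws_access_key', 'masked': 'AKIA****REDACTED'}]
```

Masking at report time means scan output can be pasted into tickets or logs without leaking the secret a second time.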
Where does this variable's value come from? (def-use chain)
$ codemap data-flow --dir ./my-repo --json --quiet --target 'user_id'
{"ok":true,"result":{"origins":[{"file":"auth.py:88","expr":"req.cookies['session']"}]}}
Use when: "where does this magic value come from?"
Detects Flask/Express/Axum/FastAPI/Spring/Rocket route handlers. Lists path + method + handler.
$ codemap api-surface --dir ./my-repo --json --quiet
{"ok":true,"result":{"endpoints":[{"method":"POST","path":"/users","handler":"create_user","auth_required":false}]}}
Use when: generating OpenAPI from existing code, finding unauthenticated endpoints.
These run on codemap's internal call graph + import graph + AST graph.
NetworkX-style PageRank. High score = central + many incoming refs.
$ codemap pagerank --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"ranked":[{"fn":"handle_request","score":0.082}]}}
Use when: finding "load-bearing" functions, prioritizing code review.
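For readers unfamiliar with how such scores arise, here is a plain power-iteration PageRank over a call map. It is a textbook sketch, not codemap's implementation: the damping factor 0.85 matches NetworkX's default, dangling nodes spread their rank uniformly, and the graph is made up.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iters: int = 100, tol: float = 1e-10) -> dict[str, float]:
    """Power iteration; graph maps each caller to the functions it calls."""
    nodes = sorted(set(graph) | {v for outs in graph.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by nodes with no outgoing edges is shared by everyone.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            outs = graph.get(u, [])
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


calls = {
    "main": ["handle_request"],
    "cli": ["handle_request"],
    "handle_request": ["log"],
    "log": [],
}
ranks = pagerank(calls)
for name, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Functions with many incoming calls accumulate rank from their callers, so `handle_request` scores higher than `main` or `cli`, which nothing calls.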
Functions/modules that depend on many others. Different from PageRank (which is about incoming).
$ codemap hubs --dir ./my-repo --json --quiet
{"ok":true,"result":{"hubs":[{"fn":"orchestrator","out_degree":47}]}}
Use when: finding god-objects, refactor targets.
Edges whose removal disconnects the graph. These are critical paths.
$ codemap bridges --dir ./my-repo --json --quiet
{"ok":true,"result":{"bridges":[{"from":"auth","to":"db","modules":["auth.rs","db.rs"]}]}}
Use when: identifying single points of failure in module coupling.
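Bridge edges are classically found with Tarjan's low-link DFS on an undirected view of the graph. The sketch below applies that algorithm to a toy module-coupling graph; whether codemap uses this exact algorithm is not stated in the source, and the edge list is invented.

```python
def bridges(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Tarjan's bridge-finding on an undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    disc, low, found = {}, {}, []
    timer = 0

    def dfs(u, parent):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        skipped_parent = False
        for v in adj[u]:
            if v == parent and not skipped_parent:
                skipped_parent = True  # ignore the tree edge once; parallel edges count
                continue
            if v in disc:
                low[u] = min(low[u], disc[v])
            else:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    found.append((u, v))  # no back edge bypasses (u, v)
    for node in adj:
        if node not in disc:
            dfs(node, None)
    return found


edges = [("auth", "users"), ("users", "session"), ("session", "auth"), ("auth", "db")]
print(bridges(edges))  # [('auth', 'db')]
```

The three modules in the triangle can reach each other two ways, so none of those edges is a bridge; `db` hangs off a single edge, so removing it disconnects the graph.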
Run with a specific measure: betweenness, eigenvector, katz, closeness, harmonic, load, structural-holes (brokers), voterank, etc. All NetworkX standards.
$ codemap betweenness --dir ./my-repo --json --quiet --top 5
{"ok":true,"result":{"top":[{"node":"db_session","betweenness":0.34}]}}
Use when: finding modules that connect otherwise-separate subsystems.
Partitions the graph into densely-connected sub-communities.
$ codemap clusters --dir ./my-repo --json --quiet leiden
{"ok":true,"result":{"clusters":[{"id":0,"size":34,"members":["auth.rs","users.rs"]}]}}
Use when: discovering implicit module boundaries.
Returns the chain of imports/calls connecting source → target.
$ codemap paths --dir ./my-repo --json --quiet user_input db_write
{"ok":true,"result":{"path":["user_input","sanitize","query_builder","db_write"],"length":4}}
Use when: "how does X reach Y?"
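A breadth-first search is one simple way to recover such a chain. The sketch below is an assumption about the general technique, not a description of how codemap computes paths; the adjacency map is hypothetical and reuses the node names from the sample output.

```python
from collections import deque

def shortest_path(graph, source, target):
    """Return the shortest call/import chain from source to target, or None."""
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:       # walk predecessor links back to source
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical call graph keyed by caller.
graph = {
    "user_input": ["log", "sanitize"],
    "sanitize": ["query_builder"],
    "query_builder": ["db_write"],
    "log": [],
}
print(shortest_path(graph, "user_input", "db_write"))
```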
Returns nodes within N hops of a target. Useful before deep analysis.
$ codemap subgraph --dir ./my-repo --json --quiet --target login --depth 2
{"ok":true,"result":{"nodes":[...],"edges":[...]}}
Use when: narrowing scope before more expensive analysis.
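The "within N hops" idea can be sketched as a depth-limited BFS. This version assumes hops ignore edge direction (callers and callees both count), which may not match codemap's behavior; the function names are illustrative.

```python
from collections import deque

def neighborhood(edges, target, depth):
    """Return (nodes, edges) within `depth` hops of target, ignoring direction."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue                      # do not expand past the hop limit
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    nodes = set(dist)
    kept = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, kept

# Hypothetical call edges around a login function.
call_edges = [
    ("main", "login"),
    ("login", "check_password"),
    ("check_password", "hash"),
    ("hash", "sha256"),
]
nodes, kept = neighborhood(call_edges, "login", 2)
print(sorted(nodes))
```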
Classical shortest-path algorithms exposed for graph queries. See ACTION_CATALOG.md for full list.
Decompiler (ir / decompile) — full lift → SSA → simplify → type-recovery → variable-recovery → calling-convention → SAILR structuring → C++ RTTI → readable-C emit pipeline
This is a real decompiler: a 14-stage pipeline that reconstructs expressions, variables, types, and if / while / switch syntax (including jump tables, computed branches, and string-literal returns) from compiled binaries.
Fidelity: full G10 fidelity (10/10) plus a 79/79 pass on the protected-bin decomp tests (bugbins-verify + reexec_harness), with switch_dispatch special-case recovery (const char* + "zero".."seven"/"unknown" map, a1 scrutinee, correct default VA). Emitted C is gcc-recompilable; the current 79/79 state supersedes earlier ~48/60 notes.
See CHANGELOG + docs/COMMIT_LEDGER.md for:
- G10 fixes + Job 3 consolidation
- GAP3-6: F-4 -O2 dangling-goto/continue, C++ vcall via rtti, XMM/float ABI + libc-extern recomp fix, Mach-O x86-64 thin+FAT
- GAP9: no more rsp/rbp/rbx/r12-r15 "lifter gap =0" noise decls in every fn; frame uses elided to 0
- GAP8: struct field recovery (ptr->field_0xN with synthesized typedefs for recompile)
- GAP7 Part A: array element type from access width (int32_t* for 4-byte loads)
Also included: cross-binary type propagation, RTTI, stack slots, and confidence scores. Mach-O x86-64 support covers function discovery via LC_FUNCTION_STARTS + symtab + sections, feeding iced-x86/IR.
Known limitation (gap 11, deferred): array indexing inside loops can decompile with an incorrect (use-before-def) index, e.g. a ghost register instead of the loop counter v. The recompiled output is then behaviorally wrong (a sum may return a[0]*n instead of 10), although the element type is still correct. Root cause: copy-prop drops the index register's def when the register is reused inside the loop.
Remaining gaps are documented in docs/DECOMPILER.md. (New direction: user-driven decompiler quality per Ghidra issues etc.)
# Decompile a single function (full pipeline)
codemap ir <binary> [<hex-fn-addr> | <name>]
# Decompile entry point
codemap ir <binary>
# Batch call-tree walk with structural hints
codemap decompile <binary> [max-depth=N] [max-children=N] [deep]
Pipeline stages:
- Lift — iced-x86 decode → IRCFG (three-address IR with explicit BitWidth)
- SSA construction — Cytron et al. (1991): iterated-dominance-frontier phi placement + pre-order DFS renaming
- Simplify — 42 peephole rules (Miasm / angr reference-FIRST): constant folding, identity elimination, SSA-aware simplification, signed-div-by-power-of-2, ROL/ROR detection, byte-swap, etc.
- Calling-convention recovery — SysV AMD64 ABI: populate Call.args from rdi/rsi/rdx/rcx/r8/r9
- Dead-code elimination — backward dataflow liveness (~80% flag computations pruned)
- Copy/constant propagation — 4 alternating iterations of copy-prop + simplify + DCE
- Dead-block removal — reachability from entry; prunes linker padding
- Block coalescing — merge linear Goto-chains
- SAILR structuring — CFG + IRCFG → C-shaped AST (Sequence / IfThen / IfThenElse / While / For / Switch / Call / Goto)
- Variable recovery — classifies variables: Register, Stack, Memory, Temporary, Constant
- Type inference — Phase 2 seeded from widths + Mem-loads/Stores; iterated-meet solver infers Int / Pointer / struct types
- Stack-slot analysis — rsp-relative offsets for *(rsp_N) → stack[<offset>]
- C++ RTTI analysis — vtable references → class declarations (base classes, virtual methods, fields)
- C emission — structured AST → readable C source with type annotations, stack-slot names, symbol resolution
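To give a feel for one of the stages above, here is a minimal sketch of dead-code elimination by backward liveness on straight-line three-address code. It is a simplified illustration only: the tuple-based IR, the `eliminate_dead` name, and the single-block scope are assumptions, whereas the real stage runs as a dataflow analysis over the whole CFG.

```python
def eliminate_dead(stmts, live_out):
    """Drop statements whose result is never used.

    stmts: list of (dest, op, operands); string operands are variables,
    anything else is a constant. live_out: variables needed after the block.
    """
    live = set(live_out)
    kept = []
    for dest, op, operands in reversed(stmts):
        # Calls are kept unconditionally because they may have side effects.
        if dest in live or op == "call":
            live.discard(dest)
            live.update(o for o in operands if isinstance(o, str))
            kept.append((dest, op, operands))
    return kept[::-1]

# An add followed by flag computations that nothing reads, as is typical
# after lifting x86 arithmetic.
block = [
    ("rax_1", "add", ("rdi_0", "rsi_0")),
    ("zf_1", "eq", ("rax_1", 0)),
    ("cf_1", "ult", ("rax_1", "rdi_0")),
    ("rax_2", "mul", ("rax_1", 2)),
]
pruned = eliminate_dead(block, live_out={"rax_2"})
print([dest for dest, _, _ in pruned])
```

Both flag definitions are pruned here because no later statement reads them, which is the kind of flag-computation removal the dead-code stage targets.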
Differentiators:
- Cross-binary type/name propagation — types from one binary's RTTI flow into another's
- Graph-as-validator — heterogeneous code graph cross-checks decompilation output
- Recompilable-C target — structured, typed, symbol-resolved C suitable for recompilation
Example output:
=== codemap ir ===
Binary: ./target/release/codemap
Format: ELF64 (64-bit, arch=x64)
Function: main @ 0x401000 (234 bytes, 78 insns)
CFG blocks: 12
CFG edges: 18 (pre-enrich) → 18 (post-enrich)
Jump tables: 0 resolved indirect-JMPs
SSA phis: 3 inserted
Variables: 45 total (12 reg, 20 stack[-0x10..+0x18], 10 mem, 3 const, 0 tmp)
Types: 30 bound (15 int, 10 ptr, 3 top, 2 bot, 0 other)
CC args: 5 call sites populated (SysV AMD64)
DCE removed: 62 dead stmts (pre-prop) + 8 (post-prop)
Copy-prop: 15 stmts inlined
Dead blocks: 2 removed (unreachable)
Coalesced: 4 blocks merged
--- structured AST ---
Sequence {
  Let { rbp_0 = rbp }
  Let { rsp_0 = (rsp - 0x10) }
  IfThen {
    Cond: (rax_0 == 0)
    Then: Sequence { Call { printf("usage\n") } }
  }
  While {
    Cond: (argc_0 > 0)
    Body: Sequence { ... }
  }
  Ret { rax_0 }
}
--- C-shaped output ---
int main(int argc, char *argv[]) {
    uint64_t rbp_0 = rbp;
    uint64_t rsp_0 = (rsp - 0x10);
    if (rax_0 == 0) {
        printf("usage\n");
    }
    while (argc_0 > 0) {
        // ... loop body ...
        argc_0 = argc_0 - 1;
    }
    return rax_0;
}
Use when: binary reverse engineering, understanding compiled code, patch generation, static analysis of binaries. See docs/DECOMPILER.md for full pipeline reference.
Format detection, arch, sections, strip state, language hints (Rust/Go/C++), anti-debug rules, packer detection.
$ codemap bin-info /usr/local/bin/codemap --json --quiet
{"ok":true,"result":{"format":"ELF64","arch":"aarch64","rust":true,"strip":false,
"sections":34,"anti_debug":[],"packed":false}}
Use when: triage step 1 — "what is this binary?"
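Format detection of this kind usually starts from magic bytes. The sketch below checks the well-known signatures for ELF, WebAssembly, 64-bit Mach-O, and PE; it is an illustration of the idea, not codemap's detector, and the `detect_format` name and returned labels are assumptions.

```python
import struct

def detect_format(data: bytes) -> str:
    """Classify a binary by its leading magic bytes."""
    if data[:4] == b"\x7fELF":
        # EI_CLASS (byte 4): 1 = 32-bit, 2 = 64-bit.
        return {b"\x01": "ELF32", b"\x02": "ELF64"}.get(data[4:5], "ELF")
    if data[:4] == b"\x00asm":
        return "WASM"
    if data[:4] in (b"\xcf\xfa\xed\xfe", b"\xfe\xed\xfa\xcf"):
        return "Mach-O 64"          # 0xfeedfacf in either byte order
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # e_lfanew at offset 0x3C points at the "PE\0\0" signature.
        (pe_off,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_off:pe_off + 4] == b"PE\x00\x00":
            return "PE"
    return "unknown"

# Minimal synthetic headers, just enough bytes to exercise each branch.
elf = b"\x7fELF\x02" + bytes(11)
wasm = b"\x00asm\x01\x00\x00\x00"
pe = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x40) + b"PE\x00\x00"
print(detect_format(elf), detect_format(wasm), detect_format(pe))
```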
Lists every DLL imported + every function exported.
$ codemap pe-imports ./sample.exe --json --quiet
{"ok":true,"result":{"imports":[{"dll":"kernel32.dll","functions":["VirtualAlloc","CreateProcessA"]}]}}
Use when: static behavioral profiling — what APIs does this binary depend on?
ASCII and UTF-16LE strings, entropy-filtered.
$ codemap pe-strings ./sample.exe --json --quiet --min-len 8
{"ok":true,"result":{"strings":["http://c2.example.com","cmd.exe /c"]}}
Use when: triaging unknown binaries — strings often reveal C2 URLs, command lines, paths.
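A minimal sketch of the extraction step, assuming printable runs are matched with regular expressions: one pattern for ASCII and one for UTF-16LE (each printable byte followed by a zero byte). The entropy filter is deliberately left out, and `extract_strings` is a hypothetical helper rather than codemap's code.

```python
import re

def extract_strings(data: bytes, min_len: int = 8) -> list[str]:
    """Find printable ASCII and UTF-16LE runs of at least min_len characters."""
    printable = rb"[\x20-\x7e]"
    ascii_re = re.compile(printable + b"{%d,}" % min_len)
    utf16_re = re.compile(b"(?:" + printable + rb"\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_re.finditer(data)]
    return found

# Synthetic blob: one ASCII URL, one UTF-16LE command line, one short string.
blob = (
    b"\x01\x02http://c2.example.com\x00\xff"
    + "cmd.exe /c".encode("utf-16-le")
    + b"\x00\x00short"
)
print(extract_strings(blob))
```

The `min_len` parameter plays the same role as `--min-len` in the command above: shorter runs such as `short` are dropped as likely noise.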
Functions added / removed / modified between two builds.
$ codemap binary-diff --json --quiet --left v1.exe --right v2.exe
{"ok":true,"result":{"added":["new_handler"],"removed":["legacy_proc"],"modified":["main"]}}
Use when: patch analysis, regression hunting in firmware.
For PE files that contain CLI/.NET metadata: reads the metadata streams and lists types + methods.
$ codemap dotnet-meta ./sample.dll --json --quiet
{"ok":true,"result":{"assembly":"Sample.Dll","types":["Foo","Bar"],"methods_count":42}}
Use when: analyzing .NET malware or .NET 3rd-party libs.
Constant pool, method signatures, bytecode summaries.
Imports, exports, function table, memory layout.
Parses spec files and reports endpoints/types/operations.
$ codemap openapi-schema --dir ./api --json --quiet
{"ok":true,"result":{"paths":[{"method":"GET","path":"/users","operationId":"listUsers"}]}}
Use when: generating client code, checking spec consistency.
Checks privileged containers, hostNetwork, missing resource limits, etc.
$ codemap k8s-scan --dir ./k8s/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"K8S-001","resource":"Deployment/api","severity":"high","msg":"privileged=true"}]}}
Use when: auditing manifests before apply.
$ codemap iac-scan --dir ./infra/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"IAC-007","file":"main.tf","msg":"S3 bucket public-read ACL"}]}}
$ codemap dockerfile-scan --dir ./ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"DKR-002","msg":"running as root","line":18}]}}
GitHub Actions, GitLab CI, Jenkinsfile, CircleCI, Azure Pipelines, Travis. Catches injection, unpinned actions, secret literals, pull_request_target misuse.
$ codemap ci-scan --dir ./.github/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"GH-003","file":"deploy.yml","msg":"unpinned action ref"}]}}
Per-layer manifest, layer-resident secrets (11 patterns), licenses, file/dir/symlink counts.
$ codemap oci-scan --dir ./image.tar --json --quiet --mode all
{"ok":true,"result":{"layers":[...],"secrets":[...],"licenses":[...]}}
Pulls SQL out of source code or .sql files. Schema + queries.
$ codemap sql-extract --dir ./my-repo --json --quiet
{"ok":true,"result":{"tables":[{"name":"users","columns":[...]}],"queries":[...]}}
Semver-range-aware.
$ codemap osv-scan --dir ./my-repo --json --quiet
{"ok":true,"result":{"vulns":[{"package":"lodash","version":"4.17.20","cve":"CVE-2021-23337"}]}}
Added, removed, upgraded, downgraded packages between two SBOMs.
$ codemap sbom-diff --left ./sbom-1.spdx.json --right ./sbom-2.spdx.json --json --quiet
{"ok":true,"result":{"added":[...],"removed":[...],"upgraded":[...]}}
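The four categories can be illustrated with a small comparison over two name-to-version maps. This sketch assumes the SBOMs have already been reduced to such maps and that versions are plain dotted numbers; real SPDX parsing and full semver handling (pre-release tags, build metadata) are out of scope, and the package data is invented.

```python
def parse_version(v: str) -> tuple[int, ...]:
    # Assumption: plain dotted-numeric versions such as "4.17.21".
    return tuple(int(part) for part in v.split("."))

def sbom_diff(left: dict[str, str], right: dict[str, str]) -> dict[str, list[str]]:
    """Classify packages as added, removed, upgraded, or downgraded."""
    out = {"added": [], "removed": [], "upgraded": [], "downgraded": []}
    for name in sorted(left.keys() | right.keys()):
        if name not in left:
            out["added"].append(name)
        elif name not in right:
            out["removed"].append(name)
        else:
            old, new = parse_version(left[name]), parse_version(right[name])
            if new > old:
                out["upgraded"].append(name)
            elif new < old:
                out["downgraded"].append(name)
    return out

old = {"lodash": "4.17.20", "left-pad": "1.3.0", "express": "4.18.2"}
new = {"lodash": "4.17.21", "express": "4.17.0", "zod": "3.22.4"}
diff = sbom_diff(old, new)
print(diff)
```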
Per-package license + compatibility verdict.
$ codemap license-check --dir ./my-repo --json --quiet
{"ok":true,"result":{"deps":[{"name":"foo","license":"GPL-3.0","compatible":false}]}}
Architecture, layer count, head count, quant level, vocab size.
$ codemap gguf-info ./model.gguf --json --quiet
{"ok":true,"result":{"arch":"llama","n_layers":32,"n_heads":32,"vocab_size":32000,"quant":"Q4_K_M"}}
Use when: "what model is this file?" Pre-load sanity check.
Tensor shapes, dtypes, total params.
$ codemap safetensors-info ./model.safetensors --json --quiet
{"ok":true,"result":{"tensors":291,"total_params":7240000000,"dtype":"float16"}}
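For context, a safetensors file begins with an 8-byte little-endian header length followed by a JSON header that maps tensor names to dtype, shape, and data offsets. The sketch below reads that header from an in-memory blob and derives similar summary numbers; the `summarize` helper and its output keys are assumptions, not codemap's schema.

```python
import json
import struct
from math import prod

def summarize(blob: bytes) -> dict:
    """Summarize tensor count, parameter count, and dtypes from the header."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + header_len])
    # "__metadata__" is an optional free-form entry, not a tensor.
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}
    return {
        "tensors": len(tensors),
        "total_params": sum(prod(t["shape"]) for t in tensors.values()),
        "dtypes": sorted({t["dtype"] for t in tensors.values()}),
    }

# Build a tiny synthetic file: two F16 tensors and 18 bytes of payload.
header = {
    "__metadata__": {"format": "pt"},
    "w": {"dtype": "F16", "shape": [2, 3], "data_offsets": [0, 12]},
    "b": {"dtype": "F16", "shape": [3], "data_offsets": [12, 18]},
}
header_bytes = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + bytes(18)
info = summarize(blob)
print(info)
```

Only the header is parsed, so this kind of summary never needs to load tensor data.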
Operators, inputs, outputs, opset.
$ codemap onnx-info ./model.onnx --json --quiet
{"ok":true,"result":{"opset":17,"ops":["Conv","Relu","MaxPool"],"inputs":[{"name":"x","shape":[1,3,224,224]}]}}
SM versions present, kernel symbols.
Magic number, marshalled code object, imports.
Detects PyO3 / napi / wasm-bindgen / JNI etc. — where languages interop.
$ codemap lang-bridges --dir ./my-repo --json --quiet
{"ok":true,"result":{"bridges":[{"kind":"pyo3","rust_fn":"create_user","py_module":"my_lib"}]}}
CUDA __global__, OpenCL kernels, Metal compute kernels, ROCm/HIP.
$ codemap gpu-functions --dir ./my-repo --json --quiet
{"ok":true,"result":{"kernels":[{"name":"matmul_kernel","framework":"cuda","file":"kernels.cu"}]}}
obj.method = new_fn, setattr, prototype patching.
Routers, registries, plugin maps. Finds the "switch statement that controls behavior."
Real symbol info, not AST-inferred. More accurate for typed languages.
$ codemap lsp-diagnostics --dir ./my-repo --json --quiet
{"ok":true,"result":{"diagnostics":[{"file":"src/main.rs","line":42,"severity":"error","msg":"E0308: mismatched types"}]}}
Use when: programmatic access to compiler/type-checker errors.
These implement specific research papers. cegio and pointer-analysis have real implementations with proof reports; bin-taint Phase A shipped with empirical proof (P@10 target, achieved P=1.00/R=0.80).
Computes points-to sets (which pointers can alias which memory). Field-sensitive + flow-insensitive + Tarjan SCC pre-pass for performance.
$ codemap pointer-analysis --dir ./my-repo --json --quiet
{"ok":true,"result":{"scope_vars":102000,"copy_constraints":132000,
"aliases":[{"ptr":"p","may_alias":["a","b"]}]}}
Use when: understanding aliasing for refactoring (rename a field safely), upstream of taint analysis.
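The core of a flow-insensitive points-to analysis can be sketched as an inclusion-based (Andersen-style) fixpoint over address-of and copy constraints. This toy version omits loads, stores, field sensitivity, and the SCC pre-pass mentioned above; the constraint encoding is an assumption made for the example.

```python
def points_to(addr_of, copies):
    """Solve points-to sets from `p = &x` and `dst = src` constraints."""
    pts = {}
    for p, x in addr_of:                  # p = &x  =>  x is in pts(p)
        pts.setdefault(p, set()).add(x)
    changed = True
    while changed:                        # iterate to a fixpoint
        changed = False
        for dst, src in copies:           # dst = src  =>  pts(src) subset of pts(dst)
            target = pts.setdefault(dst, set())
            before = len(target)
            target |= pts.get(src, set())
            if len(target) != before:
                changed = True
    return pts

# p = &a; q = &b; p = q; r = p
result = points_to(
    addr_of=[("p", "a"), ("q", "b")],
    copies=[("r", "p"), ("p", "q")],
)
print({k: sorted(v) for k, v in result.items()})
```

Because statement order is ignored, `r` picks up `b` even though `r = p` is listed before `p = q`; that over-approximation is what flow-insensitive means in practice.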
arXiv 1704.03738. Given taint paths, synthesizes the minimum input that triggers a vulnerability.
$ codemap cegio --dir ./my-repo --json --quiet --taint-result <prior-taint-output>
{"ok":true,"result":{"trigger":{"input":"' OR 1=1--","reaches_sink":true}}}
Use when: turning a taint finding into a proof-of-concept exploit input.
Lifts x86-64 ELF executable sections to a taint IR, builds CFG, propagates forward may-taint dataflow from PLT-resolved sources (read/recv/fread/getenv/strcpy/memcpy) to sinks (system/popen/exec/sprintf/dlopen), reports ranked source→sink paths. Stripped-binary fallback via bounded .text pathfinding. Proof: precision 1.00, recall 0.80 on 8-binary corpus (4 vuln classes detected, 0 false positives on 3 safe programs).
$ codemap bin-taint ./vulnerable-binary --json --quiet
{"ok":true,"result":{"findings":[{"source":"getenv","sink":"system","hops":["env","cmd","system"],"confidence":0.9},{"source":"read","sink":"sprintf","hops":["buf","format","sprintf"],"confidence":0.7}]}}
Use when: binary taint analysis on stripped ELF, finding command injection / format string / exec injection paths in compiled code.
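The source-to-sink idea can be illustrated on a tiny straight-line IR. This is a heavily simplified sketch: it handles one basic block instead of a CFG fixpoint, never clears taint on reassignment, and uses a made-up tuple encoding. The source and sink names are a subset of those listed above.

```python
SOURCES = {"getenv", "read", "recv", "fread"}
SINKS = {"system", "popen", "sprintf"}

def find_taint(stmts):
    """Return (source, sink) pairs for calls whose arguments are tainted.

    stmts: list of (dest, fn, args) call statements in execution order.
    """
    origin = {}      # tainted variable -> name of the source call it derives from
    findings = []
    for dest, fn, args in stmts:
        tainted_by = [origin[a] for a in args if a in origin]
        if fn in SINKS:
            findings.extend((src, fn) for src in tainted_by)
        if fn in SOURCES:
            origin[dest] = fn             # return value of a source is tainted
        elif tainted_by and dest is not None:
            origin[dest] = tainted_by[0]  # taint flows through other calls
    return findings

# env = getenv(name); cmd = concat(prefix, env); system(cmd); system(fixed)
program = [
    ("env", "getenv", ["name"]),
    ("cmd", "concat", ["prefix", "env"]),
    (None, "system", ["cmd"]),
    (None, "system", ["fixed"]),
]
print(find_taint(program))
```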
See "Data flow & security" section above.
Single composite for "is this repo broken?"
$ codemap changeset --dir ./my-repo --json --quiet HEAD~10 HEAD
{"ok":true,"result":{"changes":{"feat":[...],"fix":[...],"refactor":[...]}}}
Distills repo state into a single MD doc (status + open issues + recent work + next-steps).
Run several actions in sequence, accumulate results.
$ codemap pipeline --dir ./my-repo --json --quiet --target 'audit:./,trace:main,hotspots:'
{"ok":true,"result":{"audit":{...},"trace":{...},"hotspots":{...}}}
Use when: scripted multi-step analysis.
codemap walks --dir, parses with tree-sitter, builds a file-level import graph and a function-level call graph, layers PE/ELF/Mach-O/WASM/Java binary parsers + x86/x64 disassembly, and exposes 644 actions through a uniform CLI registry (inventory::submit!). Cache: .codemap/cache.bincode next to the scanned dir. Pure static. No daemons, no network access at analysis time.
- codemap-core/ — parsing, graph, algorithms, actions
- codemap-cli/ — the codemap binary
- codemap-napi/ — Node.js bindings (optional)
- docs/ — REFERENCE.md, ACTION_CATALOG.md, SCHEMAS.md, HUMAN.md
- install.sh — single install entry
MIT. See LICENSE.