codemap

Static codebase + binary analyzer, decompiler, and patcher. One binary, 644 actions, 18 source languages, PE/ELF/Mach-O/WASM decompilation to readable recompilable C on x86/x64, ARM64, RISC-V, and WebAssembly, sub-second cold-cache on 3K-file repos. No network, no servers, no databases, no API keys.

This README is your system prompt. Designed for AI agents: drop the entire file into your context (or fetch https://raw.githubusercontent.com/charleschenai/codemap/main/README.md) and you have everything you need — what codemap is, when to use it, how to install it, how to call every category of action, output schemas, exit codes, MCP setup. No further docs required for 95% of usage. Humans: see docs/HUMAN.md. Everyone else, keep reading.

Mission: Break down CODE (source + binary) so AI can replicate it.

What's new in v8.52 — semantic rewrite pipeline closed (644 actions)

codemap rewrite --function NAME --edit-c FILE --apply — decompile → recompile edited C → surgical patch (in-place NOP-pad, or code-cave + JMP trampoline when it grows) → replay original vs patched for bounded behavioral equivalence. Built on a recompilable-C foundation (decompiled C now compiles cleanly: every goto target labeled, code-addresses-as-values emitted as numeric literals). Plus a self-directed discovery engine (cross-pollinate) that mines codemap's own capability graph for novel primitive-fusion R&D directions, ranked by leverage × coherence.
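The choice between an in-place NOP-padded patch and a code-cave trampoline can be sketched as follows. This is a minimal illustration, not codemap's implementation: `plan_patch`, its parameters, and the returned dictionary are hypothetical. The only fixed facts are the x86 encodings: NOP is `0x90`, and a near JMP is `0xE9` followed by a little-endian rel32 measured from the end of the 5-byte instruction.

```python
import struct

NOP = b"\x90"          # x86 single-byte NOP
JMP_REL32_LEN = 5      # opcode 0xE9 + 4-byte displacement


def jmp_rel32(src: int, dst: int) -> bytes:
    """Encode `jmp dst` placed at address `src` (x86/x64 near jump)."""
    rel = dst - (src + JMP_REL32_LEN)   # displacement is relative to the next instruction
    return b"\xE9" + struct.pack("<i", rel)


def plan_patch(fn_addr: int, fn_size: int, new_code: bytes, cave_addr: int) -> dict:
    """Hypothetical planner: patch in place if the new body fits, else trampoline to a cave."""
    if len(new_code) <= fn_size:
        padded = new_code + NOP * (fn_size - len(new_code))
        return {"mode": "in-place", "writes": {fn_addr: padded}}
    if fn_size < JMP_REL32_LEN:
        raise ValueError("function too small to hold a 5-byte JMP trampoline")
    trampoline = jmp_rel32(fn_addr, cave_addr) + NOP * (fn_size - JMP_REL32_LEN)
    return {"mode": "code-cave", "writes": {fn_addr: trampoline, cave_addr: new_code}}
```

A grown function keeps its original footprint: the old bytes become a jump plus padding, and the new body lives wherever the cave is.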

Earlier — the 40-topic grind + v11 stack complete (644 actions)

Two full 20-topic roadmaps (THIRD + FOURTH) landed. THIRD-20 (all real, gate-verified): transplant, translate, fingerprint, hot-patch, api-shim, size-opt, multi-refactor, fuzz-harness, instrument, visual-docs, vuln-discover, protocol-rec, vectorize, ml-patch, jit-resolve, self-rewrite, gpu-lift, kernel-rewrite, mobile-fuse, os-map. FOURTH-20 mediums (real): self-bench, eval-suite, lasm, worm-defense, pear-fuzz, pqc-translate, ref-decompile.

FOURTH-20 deep tier — converted from honest skeletons to REAL in v8.22–v8.30. These were labeled honest skeletons at v8.21 (we don't fake); every one was then made to actually compute, each gated by a discriminator (an input that would expose a stub). They now produce real results, with bounded engines — precisely scoped, not the heavy external backends:

| Action | What's real now | Honest bound (what it is NOT) |
|---|---|---|
| sys-sim | native x86-64 interpreter; executes real instructions | bounded subset, not a full-system/CPU emulator |
| superset-decompile | real every-offset superset decode + provenance + interval selection | |
| zk-attest | in-tree SHA-256 + Merkle + Fiat-Shamir + verify (tamper → FAIL) | a commitment/attestation scheme, not a general zk-SNARK prover |
| gpu-rewrite | real PTX parse / transform / re-emit | PTX-level, not a full GPU recompiler for all ISAs |
| prove-rewrite | bounded symbolic-execution translation-validation → EQUIVALENT / NOT-EQ / INCONCLUSIVE | sound only to the symex bound, not a Coq/Lean machine-checked proof |
| proof-patch | discharges obligations via symex + taint → DISCHARGED / VIOLATED / OPEN with engine evidence | bounded discharge, not a full theorem prover |
| meta-evolve | persisted win-rate tuning computed from real run data | |
| self-improve-demo | genuine measured delta from real bench runs (--dry, human-gated) | |
| llm-decompile | pluggable LLM backend (off by default = deterministic/offline) + recompile-consistency gate (only accepts output that recompiles/verifies) | does not ship/claim an integrated LLM; you bring the backend |

Still never a faked verified/proven claim — the bounds above are stated, not hidden. Run any of them yourself (codemap <action> <binary>) and check the output against this table.
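To make the "tamper → FAIL" behavior of a SHA-256 + Merkle commitment concrete, here is a minimal sketch. It shows only the general technique; zk-attest's actual leaf layout, domain separation, and Fiat-Shamir step are not documented here, so the prefixes and function names below are assumptions.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root of a binary Merkle tree; an odd node at any level is paired with itself."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(b"\x00" + leaf) for leaf in leaves]        # assumed leaf prefix
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha256(b"\x01" + level[i] + level[i + 1])     # assumed inner-node prefix
                 for i in range(0, len(level), 2)]
    return level[0]


def verify(leaves: list[bytes], committed_root: bytes) -> str:
    """Recompute the root and compare it with the committed one."""
    return "PASS" if merkle_root(leaves) == committed_root else "FAIL"
```

Changing any single leaf changes the recomputed root, so verification against the original commitment fails.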

What's new in v8.13 — the autonomous + verifiable engine

codemap is now an autonomous, self-improving, verifiable security engine. The decompiler covers five architectures end-to-end and the action arsenal composes into goal-driven, no-human loops.

Multi-arch decompiler COMPLETE. decompile/ir produce readable, recompilable C for x86, x64, ARM64 (incl NEON/SIMD), RISC-V (RV64GC incl compressed/M/A), and WebAssembly — all through the same lift → SSA → type/var recovery → SAILR structuring → C pipeline.

The Autonomous lane (new actions):

  • run (agentic mode): codemap run goal=<attack-surface|audit-crypto|modernize|harden> <bin> runs a deterministic, offline, no-LLM PLAN→ACT→OBSERVE→VERIFY→REPORT loop that composes existing actions into a DAG, threads one graph, is budget/step-capped, emits JSON, and only marks a finding fixed if the patch recompiles + re-validates.
  • learn (self-improving): records what-worked from each run into a project-brain store; run's planner consults it to tune the DAG over time. The loop is closed: planning improves with usage, no code changes.
  • redteam — autonomous offensive campaign (taint → symbolic → ranked PoC bundle + report).
  • infer-spec ... export=acsl|lean — machine-checkable proof export (Frama-C ACSL + Lean/Coq), so patches are provable, not just plausible.
  • provenance — signed, tamper-evident manifests for patched/twinned/hardened artifacts.
  • pqc-migrate — detect quantum-vulnerable crypto → apply NIST PQC (ML-KEM/ML-DSA/SLH-DSA) → equivalence note.
  • deobfuscate — the inverse of harden: de-flatten CFG, crack opaque predicates, decrypt strings via symbolic + graph.

Plus, across the roadmap: binary-twin (cleanroom fork), xlang-graph (cross-language call fusion), to-rust (C→idiomatic Rust), replay (record/replay + mutation), what-if (change-impact), firmware, sbom-flow, crypto-audit, model-extract, game-assets, brain-lock. 644 actions.

What's new in v8.4 — multi-arch decompiler + the three strategic arms

v8.4 pushes the v8.3 decompiler in two directions: multi-architecture (it now produces recompilable C for ARM64/AArch64, not just x86/x64) and the first increments of the three strategic arms that turn codemap from read-only intelligence into a full understand → reason → change platform.

v8.4.0 new actions (Phase-1+ across roadmap topics): project-brain (persistent project memory + git-history what-changed), infer-spec (formal pre/post/invariant inference → ACSL + Rust contracts, Daikon-style templates), c-diff (graph-aware decompiled-C diff with call-graph change propagation), ci (binary CI/CD attack-surface gate), vuln-backport (CVE patch → older-binary backport locator). ARM64 decompilation hardened: recursion, switch recovery, emit cleanup, recursive-call returns. 644 actions.

  • ARM64 / AArch64 decompiler. ARM64 Mach-O now disassembles (Capstone-backed Arm64Lifter; function sizing from LC_FUNCTION_STARTS) and lifts through the same IR pipeline as x86: codemap ir <arm64-bin> <fn> emits readable C with recovered args (AAPCS64 x0–x7), real calls (recursion is a call, not an asm comment), and frame/sp modeling. --verify PASS on ARM64, not just x86: both arches decompile → recompile cleanly.
  • ir --verify — recompile gate, first-class. codemap ir <bin> <fn> --verify writes the emitted C to a temp file and runs a host C compiler on it, reporting PASS / FAIL — ground truth that the decompilation is recompilable, not just plausible. The backbone of codemap's verify-by-running discipline.
  • Arm 1 — Binary patching. bin-patch-fn: surgical, layout-preserving in-place function patching (canned stubs ret0/ret1/ret/nop or raw hex), fits-gated, verified by re-disassembly. Neutralize a check (bin-patch-fn ./app check_license ret1) without touching any other offset / reloc / string. (The decompile → edit-C → recompile → relink loop is the next increment.)
  • Arm 2 — Symbolic / concolic. concolic: an interval constraint solver over the SSA-IR branch guards (no SMT dependency) — per path it reports SAT (with a concrete register seed that drives execution down it), DEAD (contradictory guards → opaque-predicate / dead-code signal), or PARTIAL. Concrete concolic seeds in the default build.
  • Arm 3 — Dynamic bridge. trace-plan: uses the code-property graph to choose a selective instrumentation scope (entry, call sites, dangerous sinks, loop heads — not every instruction) and emits a ready-to-run, ABI-aware GDB script. Drive it with concolic seeds; ingest the trace with runtime-merge.
  • Graph fusion — cross-binary name recovery. name-recovery recovers a stripped binary's anonymous sub_<va> names by matching them (40-dim structural fingerprint, cosine, greedy 1:1) to named functions in a reference binary, fusing the recovered names into the graph. Exact on same-build; honest-partial across optimization levels.
  • Decompiler correctness sweep. Fixed multi-block argument recovery (args flowing across a loop/branch were emitted void with use-before-def; now seeded at the SSA entry from the calling convention), struct-field deref (p->x), and 2D-array index (m[i*cols+j]) — all now recompile.
  • Built for the AI-agent customer. agent-brief (one-page high-signal map of a codebase), search (relevance-ranked discovery across 644 actions), graph-export (Graphviz / Mermaid / Cytoscape JSON / interactive HTML). Plus human onboarding: cargo binstall, a Homebrew formula, and a docs/HUMAN.md quickstart.
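The SAT / DEAD / PARTIAL verdicts reported by an interval solver over branch guards can be illustrated with a small sketch. This is an assumption-laden toy, not the concolic action's code: the guard tuple format `(register, op, constant)`, the 64-bit signed range, and the seed-picking rule are all hypothetical.

```python
INT_MIN, INT_MAX = -(2**63), 2**63 - 1


def solve_path(guards):
    """guards: list of (register, op, constant) collected along one path."""
    bounds = {}            # register -> [lo, hi], both inclusive
    partial = False
    for reg, op, k in guards:
        lo, hi = bounds.setdefault(reg, [INT_MIN, INT_MAX])
        if op == "<":
            hi = min(hi, k - 1)
        elif op == "<=":
            hi = min(hi, k)
        elif op == ">":
            lo = max(lo, k + 1)
        elif op == ">=":
            lo = max(lo, k)
        elif op == "==":
            lo, hi = max(lo, k), min(hi, k)
        else:
            partial = True          # e.g. "!=" is not expressible as one interval
            continue
        if lo > hi:
            return ("DEAD", None)   # contradictory guards: no input takes this path
        bounds[reg] = [lo, hi]
    # Pick the in-range value closest to zero as a concrete seed.
    seed = {reg: min(max(0, lo), hi) for reg, (lo, hi) in bounds.items()}
    return ("PARTIAL" if partial else "SAT", seed)
```

A DEAD verdict on a branch that looks reachable in the CFG is the opaque-predicate signal mentioned above; a SAT verdict comes with a seed that can be fed to a tracer.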

What's new in v8.3 — the graph-fused decompiler

v8.3 (through 8.3.5) turns codemap's binary side into a real decompiler: lift → SSA → DCE/copy-prop → type & variable recovery → SAILR structuring → readable, recompilable C. It went from "finds 1 function in a stripped PE" to:

  • Full binary coverage. PE (x86/x64), ELF (x86/x64/ARM/AArch64), and Mach-O x86-64 — function discovery via PE .pdata RUNTIME_FUNCTION, ELF symbols/.eh_frame, and Mach-O LC_FUNCTION_STARTS.
  • Readable C reconstruction. Recovered structs (p->field with synthesized typedefs), arrays (a[i]), string literals (return "hello world"), float/XMM ABI params & returns (SysV + Win64), C++ virtual calls (obj->vfunc_0()), and clean control flow on -O2 (no goto-soup).
  • C++ exception recovery. Idiomatic try { … } catch (int e) { … } reconstructed from a stripped binary's .eh_frame + .gcc_except_table, including the caught type, demangled from the LSDA type table. Most decompilers drop the handler entirely or render it as goto-soup.
  • Correctness, not just readability. Fixed real silent mis-decompilations: array-index liveness (loops returned a[0]·n) and dropped movzbl masks (x & 0xff → x), both caught and fixed via a re-execution gate.
  • Behaviorally verified. Every change is gated on a 79-binary recompilability corpus + a G10 re-execution harness (decompile → recompile → run → diff): recovered code is behavior-identical on the scalar subset, not just plausible-looking.
  • Graph-fused. Decompiled functions feed codemap's heterogeneous code-property graph, so its dataflow / taint / call-graph / centrality analyses run on stripped binaries, not just source.

What's new in v8

v8 cuts the v7 series at 7.184.0 (2026-05-18) and turns over to 8.0.0 (2026-05-20). Headline themes:

  • Action registry complete (T1). Every action self-registers via inventory::submit!; actions/mod.rs has zero dispatch arms (catch-all _ => Err(UnknownAction) only). Adding a new action is a single submit-block edit in the owning module file.
  • iced-x86 linear-sweep precision (T3). All bin_text_* density actions disassemble via iced-x86 instead of raw byte-scans — eliminates instruction-boundary false positives.
  • Lint zero (T8). #![deny(warnings)] locked into codemap-core and codemap-cli; cargo clippy -- -D warnings ships at 0 warnings.
  • arXiv research: filter scaffolds, ship real work (T9). pointer-analysis (Andersen field-sensitive PA + Tarjan SCC) and cegio (rsmt2-driven SMT) shipped with real implementations. bin-taint shipped Phase A (CFG, intra/inter-procedural taint, PLT-resolved source/sink, pathfinding, stripped-binary fallback). 16 items removed in v8.2.0 cleanup: 13 skeleton scaffolds (symex-concolic, loop-polyhedral, detect-memory-corruption, neural-decompile, side-channel-detect, symex-speculative, gpu-analyze, semantic-slice, synthesize, abstract-interp, bin-search, patch-binary, natural-query) + 3 failed experiments (meta-path-ppr proof +0.0000 lift, rfmoe 3/8 FAIL, ising-landscape proof pending) — all 59–145 LOC with no proof reports or integration tests.
  • 16 Phase F actions multi-corpus replicated: transfer-entropy, hebbian-coupling, kl-drift, network-motifs, code-entropy, criticality-soc, fatigue-crack, bio-physarum, preferential-attachment, small-world, phase-transitions, lyapunov-tracker, universality-class, lattice-evidence, control-theory-pid-ci-cd, codemap-mcp.

644 actions registered (full index in docs/ACTION_CATALOG.md; generated from the registry by gen-action-docs and gated by tests/single_source_of_truth.rs). 236 bin-* parsers, 18 source-language tree-sitter parsers, 1614 lib tests, decompiler recompile 96.5% (self-bench real, 193/200), 0 clippy warnings.


When to reach for codemap

| Problem | Codemap action | Why codemap (vs alternatives) |
|---|---|---|
| "What does this codebase do?" | summary --dir <path> | Cross-file structural overview in one call. Beats reading files. |
| "Find unused functions / dead code" | dead-functions --dir <path> | Call-graph reachability across modules. grep can't do this. |
| "Who calls function X?" | callers --dir <path> X | True call graph (AST-aware), not a string match. |
| "What does function X depend on (transitively)?" | trace --dir <path> X | Walks the dep graph. grep would only find direct refs. |
| "What changed between two commits?" | diff --dir <path> <ref1> <ref2> | Semantic diff, not line diff. |
| "Find security issues" | audit --dir <path> | Composite of taint + secret-scan + dep-tree + dead-deps. |
| "Where would a tainted input flow?" | taint --dir <path> --source <fn> --sink <fn> | Path-sensitive, sanitizer-aware, alias-aware, cross-procedural. |
| "Reverse-engineer a binary" | bin-info <path/to/binary> | PE/ELF/Mach-O parser. capa + YARA + signsrch + PEiD rules built in. |
| "Find cross-language coupling" | cross-lang --dir <path> | Imports/calls that cross language boundaries. |

When NOT to reach for codemap

  • Editing source files: the source-analysis actions are read-only. Use Edit/Write directly.
  • Running code: codemap does not build or run your project. Use bash.
  • Live process state: codemap is static. Use ps, lsof, ss.
  • Single-file grep: if you know the file, grep is faster.
  • String search across few files: if N<5 files, just grep.

Install

From release (recommended)

Download the tarball for your platform and extract the binary:

# Linux x86_64
curl -fsSL https://github.com/charleschenai/codemap/releases/latest/download/codemap-v8.2.0-x86_64-linux.tar.gz -o codemap.tar.gz
tar xzf codemap.tar.gz -C ~/.local/bin/
chmod +x ~/.local/bin/codemap

# Linux aarch64
curl -fsSL https://github.com/charleschenai/codemap/releases/latest/download/codemap-v8.2.0-aarch64-linux.tar.gz -o codemap.tar.gz
tar xzf codemap.tar.gz -C ~/.local/bin/
chmod +x ~/.local/bin/codemap

# macOS (add to PATH if needed)
export PATH="$HOME/.local/bin:$PATH"

Add $HOME/.local/bin to your PATH in ~/.bashrc or ~/.zshrc:

export PATH="$HOME/.local/bin:$PATH"

For system-wide install (/usr/local/bin/codemap):

sudo cp codemap /usr/local/bin/
sudo chmod +x /usr/local/bin/codemap

From source

git clone https://github.com/charleschenai/codemap && cd codemap
cargo build --release -p codemap-cli
cp target/release/codemap ~/.local/bin/codemap
chmod +x ~/.local/bin/codemap

Verify

codemap --version-detail

Prints:

codemap 8.2.0
git: <latest-sha>
built: <build-date>
host: <hostname>/<arch>

If the binary is older than expected, re-run install with --update.


How to call any action

Universal shape:

codemap <ACTION> [TARGET...] --dir <PATH> [--json] [--quiet] [other-flags]

| Flag | Purpose |
|---|---|
| --dir <PATH> | Required. Repo/dir to scan. Repeatable for multi-repo. |
| --json | Output JSON (parseable). Default is text (human-readable). |
| --quiet | Suppress scan/cache status messages on stderr. |
| --no-cache | Force re-scan, ignore .codemap/cache.bincode. |
| --include-path <PATH> | C/C++ include search path. |
| --watch [SECS] | Re-run every N seconds. |

For agents: always use --json and --quiet unless you specifically want text output.
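For agents that shell out from Python, the universal shape can be wrapped once so that `--json` and `--quiet` are never forgotten. This is a sketch: `build_argv` and `run_action` are hypothetical helper names, and the flags used are only the ones listed in the table above.

```python
import json
import subprocess


def build_argv(action, targets=(), dirs=(), extra=(), binary="codemap"):
    """Assemble `codemap <ACTION> [TARGET...] --dir <PATH> --json --quiet [flags]`."""
    argv = [binary, action, *targets]
    for d in dirs:                 # --dir is repeatable for multi-repo scans
        argv += ["--dir", d]
    argv += ["--json", "--quiet", *extra]
    return argv


def run_action(action, **kwargs):
    """Run codemap and return (exit_code, parsed_json_or_None)."""
    proc = subprocess.run(build_argv(action, **kwargs), capture_output=True, text=True)
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        payload = None             # e.g. a crash that produced no JSON on stdout
    return proc.returncode, payload
```

Passing the argument vector as a list avoids shell quoting problems with targets such as `'req.query'`.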

Discover actions

codemap --help                                       # full action list
codemap <action> --help                              # action-specific flags

Action categories

644 actions (a curated subset advertised in --help, 236 fine-grained bin-* parsers, plus the rest) grouped by purpose. Full catalog at docs/ACTION_CATALOG.md. High-level groups:

| Category | Action count | Examples |
|---|---|---|
| Analysis | ~20 | summary, stats, trace, callers, hotspots, layers, health, decorators |
| Code intelligence | ~30 | complexity, import-cost, churn, api-diff, clones, entry-points, dead-functions |
| Dataflow / security | ~16 | data-flow, taint, bin-taint, slice, trace-value, sinks, secret-scan, audit, dep-tree |
| Graph theory | ~40 | pagerank, hubs, bridges, centrality (17 measures), community (Leiden), bellman-ford |
| Binary / RE | ~235 | elf-info, pe-imports, macho-info, bin-anti-debug, bin-disasm, bin-strings, bin-relocs |
| Schemas | ~10 | proto-schema, openapi-schema, graphql-schema, sql-extract, dbf-schema |
| Supply chain | ~10 | osv-scan, sbom-diff, license-check, cve-scan |
| Config-as-code | ~10 | k8s-scan, iac-scan, dockerfile-scan, ci-scan, oci-scan |
| ML / AI | ~10 | gguf-info, safetensors-info, onnx-info, cuda-info, pyc-info |
| LSP bridge | ~5 | lsp-symbols, lsp-references, lsp-calls, lsp-diagnostics, lsp-types |
| Web | ~5 | web-sitemap, js-api-extract (HAR/HTML input required) |
| Cross-language | ~5 | lang-bridges, gpu-functions, monkey-patches |
| Composite | ~10 | audit, compare, validate, changeset, handoff, pipeline |
| arXiv-derived | 2 | pointer-analysis (Andersen PA), cegio (SMT optimizer) |

Output schema

All --json outputs follow:

{
  "ok": <boolean>,
  "action": "<action-name>",
  "dir": "<scanned-path>",
  "result": <action-specific>,
  "stats": { "files_scanned": N, "duration_ms": M, "cache_hits": K }
}

result shape varies per action. Action-specific schemas in docs/SCHEMAS.md.

Exit codes

| Code | Meaning | Agent response |
|---|---|---|
| 0 | Success | Parse --json output |
| 1 | Usage error (bad flag, missing --dir) | Re-read --help, fix args, retry |
| 2 | I/O error (path not found, no read perm) | Verify path, retry |
| 101 | Panic | Do not retry. File a bug at https://github.com/charleschenai/codemap/issues |

Other non-zero codes: action-specific. See <action> --help.
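An agent can encode the exit-code table as a small dispatch function. The response labels below are hypothetical names chosen for this sketch; only the code-to-meaning mapping comes from the table.

```python
def agent_response(exit_code: int) -> str:
    """Map a codemap exit code to the next step an agent should take."""
    table = {
        0: "parse-json",              # success
        1: "fix-args-and-retry",      # usage error
        2: "verify-path-and-retry",   # I/O error
        101: "do-not-retry-file-bug", # panic
    }
    # Any other non-zero code is action-specific.
    return table.get(exit_code, "see-action-help")


def should_retry(exit_code: int) -> bool:
    """Only usage and I/O errors are worth retrying after a fix."""
    return exit_code in (1, 2)
```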

AI agent usage guide

codemap is designed for AI agents as its primary customer. Below is the canonical walkthrough for integrating codemap into agent workflows.

Why use codemap instead of grep/read?

| Scenario | grep / raw edits | codemap |
|---|---|---|
| "What does this codebase do?" | Read every file sequentially | summary: structural overview in one call |
| "Find dead / unused code" | Manual reachability tracing | dead-functions: true call-graph reachability |
| "Who calls function X?" | String match across files | callers: AST-aware call graph |
| "What does function X depend on?" | Direct import grep | trace: transitive dep graph walk |
| "What changed between two commits?" | Line-level diff | diff: semantic diff (AST-aware) |
| "Find security issues" | YARA / pattern match | audit: composite of taint + secret-scan + dep-tree + dead-deps |
| "Where does tainted input flow?" | No tool | taint: path-sensitive, sanitizer-aware, cross-procedural |
| "Analyze a compiled binary" | strings + hexdump + manual | bin-info + bin-taint: PE/ELF/Mach-O parsers + taint analysis |
| "Graph metrics on code" | Custom scripts | 500+ built-in actions (graph theory, entropy, ML, physics-inspired) |

codemap is read-only, no network, no servers, no databases, no API keys. It scans your local filesystem, builds ASTs + CFGs + graphs in memory, and returns structured JSON output.

Canonical call pattern

Every action follows this pattern:

codemap <ACTION> [TARGET] --dir <PATH> --json --quiet [OPTIONS]

| Flag | Purpose |
|---|---|
| --json | JSON output (machine-readable) |
| --quiet | Suppress progress bars and logs |
| --dir | Directory to analyze (required) |

Output schema (for actions that return results):

{
  "ok": true,
  "result": { ... },
  "metrics": {
    "time_ms": 42,
    "files_scanned": 1501,
    "edges": 100219
  }
}

On failure:

{
  "ok": false,
  "error": "error message"
}
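A small helper can turn the success and failure envelopes into a return value or an exception. This is a sketch with hypothetical names (`unwrap`, `CodemapError`). Because the schema listings in this README name the timing block both `stats` and `metrics`, the helper accepts either key rather than assuming one.

```python
import json


class CodemapError(Exception):
    """Raised when a payload reports ok=false."""


def unwrap(raw: str):
    """Return (result, timing_info) from a --json payload, or raise CodemapError."""
    doc = json.loads(raw)
    if not doc.get("ok", False):
        raise CodemapError(doc.get("error", "unknown error"))
    info = doc.get("metrics") or doc.get("stats") or {}   # accept either block name
    return doc.get("result"), info
```

Parsing once and branching on `ok` avoids re-invoking a JSON parser for every field an agent wants to read.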

Exit codes:

  • 0 — success
  • 1 — error (check --json output for details)

Worked examples

Example 1: "What does this repo do?"

codemap summary --json --quiet --dir ./project
# → Cross-file structural overview: top-level modules, key dependencies, entry points

Example 2: "Find unused functions"

codemap dead-functions --json --quiet --dir ./project
# → Functions with zero callers across the module graph. Includes call-chain depth.

Example 3: "Security audit"

codemap audit --json --quiet --dir ./project
# → Composite: taint analysis + secret detection + dependency tree + dead deps
#   Returns findings ranked by confidence with source→sink paths

Example 4: "Taint analysis — find injection paths"

codemap taint --json --quiet --dir ./project --source read --sink system
# → Path-sensitive taint from `read` to `system` with confidence scoring
#   Reports ranked source→sink paths with alias resolution

Example 5: "Binary analysis — what is this executable?"

codemap bin-info --json --quiet ./target/release/my-binary
# → PE/ELF/Mach-O parser: sections, imports, exports, symbols,
#   capa-rules detection, YARA signatures, anti-debug indicators

MCP: the recommended adoption path

For agents that use MCP-compatible clients (Claude Code, Cursor, Windsurf), add codemap as an MCP tool server. All 644 actions become available as MCP tools with proper input schemas:

// ~/.claude/settings.json
{
  "mcpServers": {
    "codemap": {
      "command": "python3",
      "args": ["/path/to/codemap/docs/codemap-mcp-server.py"]
    }
  }
}

This is the recommended path because:

  1. No CLI parsing needed — tools have structured input schemas
  2. Self-documenting: tools/list returns every action name, description, and schema
  3. Executable via JSON-RPC: tools/call with {name, arguments} dispatches any action
  4. Zero config for AI — the agent discovers capabilities automatically
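For clients that speak MCP directly, a `tools/call` request is a JSON-RPC 2.0 message whose params carry `name` and `arguments`. The sketch below only builds the message; the tool name `summary` and the argument key `dir` used in the usage note are assumptions, and the real input schema should be read from `tools/list`.

```python
import itertools
import json

_ids = itertools.count(1)   # JSON-RPC request ids must be unique per session


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request for one codemap action."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)
```

For example, `tools_call("summary", {"dir": "./project"})` yields a single line that can be written to the server's stdin.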

Set CODEMAP_BIN if your codemap binary is not on PATH:

export CODEMAP_BIN=~/.local/bin/codemap

Environment variables

| Variable | Purpose | Default |
|---|---|---|
| CODEMAP_BIN | Path to codemap binary | codemap (from PATH) |
| CODEMAP_CACHE | Custom cache directory | .codemap/cache.bincode (next to scanned dir) |

Error handling

Always check --json output for error details:

result=$(codemap <ACTION> --json --quiet --dir ./project)
if echo "$result" | python3 -c "import sys,json; d=json.load(sys.stdin); sys.exit(0 if d['ok'] else 1)"; then
  echo "Success: $(echo "$result" | python3 -c "import sys,json; print(json.load(sys.stdin)['result'])")"
else
  echo "Error: $(echo "$result" | python3 -c "import sys,json; print(json.load(sys.stdin)['error'])")"
fi

Performance notes

  • Cold cache: sub-second on repos up to 3K files
  • Warm cache: near-instant (reads .codemap/cache.bincode)
  • Large repos (10K+ files): 5-30 seconds for full analysis
  • All analysis is in-memory. No disk writes except the cache file.
  • No network calls during analysis.

Recipes — when the agent has a specific job to do

Each recipe: what the action does → command → sample output → when to use it.

For the complete flat list of action names see docs/ACTION_CATALOG.md.


Codebase understanding (first-look on an unknown repo)

summary — one-page structural overview

Reports file count, languages, entry points, top modules, dispatch density. Single-call onboarding.

$ codemap summary --dir ./my-repo --json --quiet
{"ok":true,"result":{"files":2824,"languages":["rust","python","typescript"],
  "entry_points":["src/main.rs","src/lib.rs"],"top_modules":["analysis","insights","cpg"]}}

Use when: new repo, "tell me what this does" before diving deeper.

stats — quantitative metrics

Per-language LOC + file counts, function/class density, fan-in/fan-out distribution.

$ codemap stats --dir ./my-repo --json --quiet
{"ok":true,"result":{"rust":{"files":341,"loc":89432,"fns":2104},"python":{"files":52,"loc":4108}}}

Use when: comparing repos by size, reporting metrics, sanity-checking parse coverage.

layers — architectural layer detection

Infers boundaries (web / service / data / infra) from import patterns + naming conventions.

$ codemap layers --dir ./my-repo --json --quiet
{"ok":true,"result":{"layers":[{"name":"web","modules":["routes","handlers"]},
  {"name":"data","modules":["models","repo"]}],"violations":[...]}}

Use when: validating that "web shouldn't import from data" type architectural rules hold.

hotspots — files with most churn × complexity

Surfaces "danger zone" code (high git churn + high cyclomatic complexity).

$ codemap hotspots --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"hotspots":[{"file":"src/parser.rs","churn":48,"complexity":92,"score":4416}]}}

Use when: prioritizing refactor work, finding "where bugs live."
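The sample output is consistent with a score of churn × complexity (48 × 92 = 4416). Assuming that formula, which the README implies but does not state as the exact implementation, ranking can be sketched as:

```python
def rank_hotspots(files, top=10):
    """files: iterable of dicts with 'file', 'churn', 'complexity' keys."""
    scored = [dict(f, score=f["churn"] * f["complexity"]) for f in files]
    return sorted(scored, key=lambda f: f["score"], reverse=True)[:top]
```

A product rather than a sum means a file must be both frequently changed and complex to rank highly.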

entry-points — public API surface

Lists exported functions/classes that other code can call from outside.

$ codemap entry-points --dir ./my-repo --json --quiet
{"ok":true,"result":{"entries":[{"name":"create_user","file":"api/users.rs","kind":"public_fn"}]}}

Use when: API documentation, understanding what's a stable contract.

health — overall quality summary

Composite: dead code % + clippy/lint count + circular deps + missing tests. Single "is this repo healthy?" score.

$ codemap health --dir ./my-repo --json --quiet
{"ok":true,"result":{"score":78,"dead_code_pct":3.2,"circular_deps":2,"missing_tests":["api/users.rs::delete"]}}

Use when: quick "should we touch this codebase or not" gut-check.


Code quality & cleanup

dead-functions — unreachable code

Functions never called by any other function in the workspace.

$ codemap dead-functions --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead":[{"file":"src/old.rs","function":"legacy_helper","line":42}]}}

Use when: cleanup PR, removing tech debt. Don't use for: identifying entry points (they're "dead" by call-graph but intentionally public).
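The caveat about entry points follows directly from the definition. In this minimal sketch (the call-graph dictionary format and function name are hypothetical), a function is reported when nothing else calls it, which is equally true of a forgotten helper and of `main`:

```python
def dead_functions(call_graph):
    """call_graph: {caller: [callees]}. Returns functions with no caller other than themselves."""
    called = {callee
              for caller, callees in call_graph.items()
              for callee in callees
              if callee != caller}          # self-recursion does not keep a function alive
    return sorted(fn for fn in call_graph if fn not in called)
```

Filtering the result against a list of known public entry points is therefore a sensible post-processing step.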

dead-files — files imported nowhere

Files no other file imports / uses.

$ codemap dead-files --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead_files":["src/experimental/old_impl.rs","tools/debug.py"]}}

Use when: dead-import cleanup.

dead-deps — declared deps never imported

Packages in Cargo.toml/package.json/pyproject.toml that no source file imports.

$ codemap dead-deps --dir ./my-repo --json --quiet
{"ok":true,"result":{"dead":["serde_json (Cargo.toml)","lodash (package.json)"]}}

Use when: dep cleanup, reducing build time + attack surface.

complexity — cyclomatic complexity per function

McCabe complexity (branches+1). Catches "this function should be split."

$ codemap complexity --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"top":[{"fn":"parse_expression","file":"parser.rs","cyclomatic":34,"lines":280}]}}

Use when: finding refactor candidates, code review automation.
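The "branches + 1" rule can be approximated for Python source with the standard `ast` module. This is an illustrative sketch, not codemap's tree-sitter-based implementation: the set of node types counted is an assumption, and branches inside nested functions are also attributed to the enclosing function.

```python
import ast

_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic(source: str) -> dict:
    """Map each function name in `source` to decision points + 1."""
    out = {}
    for fn in ast.walk(ast.parse(source)):
        if isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = 0
            for node in ast.walk(fn):
                if isinstance(node, _BRANCH_NODES):
                    branches += 1
                elif isinstance(node, ast.BoolOp):
                    branches += len(node.values) - 1   # each and/or adds a path
            out[fn.name] = branches + 1
    return out
```

A function containing one `for`, one `if`, and one `and` has three decision points and therefore a complexity of 4.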

churn — git change frequency per file

Commits-touching-file count over a window.

$ codemap churn --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"top":[{"file":"src/parser.rs","commits":78,"authors":12}]}}

Use when: combined with complexity for hotspots, ownership analysis.

clones — duplicated code blocks

Detects near-identical token sequences across files (copy-paste detection).

$ codemap clones --dir ./my-repo --json --quiet --min-tokens 50
{"ok":true,"result":{"clones":[{"size":120,"locations":[["a.rs:14","b.rs:22"]],"similarity":0.94}]}}

Use when: finding extraction candidates for shared functions.

circular — circular import detection

Reports module cycles (a → b → c → a).

$ codemap circular --dir ./my-repo --json --quiet
{"ok":true,"result":{"cycles":[["src/a.rs","src/b.rs","src/a.rs"]]}}

Use when: untangling architecture before a refactor.
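Cycle reporting of this kind is typically a depth-first search that records a closed path whenever it meets a module already on the current stack. The sketch below assumes a hypothetical `{module: [imports]}` dictionary and uses recursion, so very deep import chains would need an iterative variant.

```python
def find_cycles(imports):
    """imports: {module: [imported modules]}. Returns cycles as closed paths, e.g. [a, b, a]."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {m: WHITE for m in imports}
    stack, cycles = [], []

    def visit(m):
        color[m] = GRAY
        stack.append(m)
        for dep in imports.get(m, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:                 # back edge: dep is on the current path
                cycles.append(stack[stack.index(dep):] + [dep])
            elif state == WHITE:
                visit(dep)
        stack.pop()
        color[m] = BLACK

    for m in list(imports):
        if color[m] == WHITE:
            visit(m)
    return cycles
```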


Impact tracing & change analysis

trace — transitive callees (what does X depend on?)

Walks the call graph forward from a function/symbol, returns full dep tree.

$ codemap trace --dir ./my-repo --json --quiet RecalcInvoiceTotals
{"ok":true,"result":{"node":"RecalcInvoiceTotals","calls":[
  {"name":"ship_chg_sum","file":"backend/invoices.go:120","depth":1},
  {"name":"format_money","file":"util/money.go:8","depth":2}]}}

Use when: impact analysis before changing a function, generating context for an LLM.
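The depth field in the output corresponds to a breadth-first walk over callees. A minimal sketch, assuming a hypothetical `{caller: [callees]}` dictionary in which `ship_chg_sum` calls `format_money` (the sample output implies but does not show that edge):

```python
from collections import deque


def trace(call_graph, root):
    """Breadth-first walk of callees; returns [(name, depth)] excluding the root."""
    seen, order = {root}, []
    queue = deque([(root, 0)])
    while queue:
        fn, depth = queue.popleft()
        for callee in call_graph.get(fn, ()):
            if callee not in seen:          # report each function once, at its shallowest depth
                seen.add(callee)
                order.append((callee, depth + 1))
                queue.append((callee, depth + 1))
    return order
```

Running the same walk over the reversed graph gives the transitive callers instead.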

callers — transitive callers (who calls X?)

Reverse of trace. Returns the function's call sites + their callers.

$ codemap callers --dir ./my-repo --json --quiet validate_user
{"ok":true,"result":{"callers":[{"caller":"login","file":"auth.py:88","depth":1}]}}

Use when: "if I change this signature, what breaks?"

blast-radius — affected entities from a change

Combines callers + dataflow + tests touched. Most pessimistic estimate.

$ codemap blast-radius --dir ./my-repo --json --quiet --target User.id
{"ok":true,"result":{"functions":42,"tests":7,"endpoints":3,"db_columns":2}}

Use when: "what's the size of changing this thing?"

diff — semantic diff between two refs

Function-level diff: added, removed, signature-changed, body-changed.

$ codemap diff --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"added":["validate_email"],"removed":["old_validator"],
  "signature_changed":[{"fn":"create","before":"(name)","after":"(name,email)"}]}}

Use when: generating PR descriptions, understanding code review scope.

api-diff — breaking-change classifier

Like diff but specifically flags BREAKING vs additive changes to public API.

$ codemap api-diff --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"breaking":[
  {"kind":"removed","fn":"OldAPI::v1_login"},
  {"kind":"signature_change","fn":"create_user","before":"(name)","after":"(name,email)"}]}}

Use when: versioning decisions (semver minor vs major), CHANGELOG generation.

diff-impact — functions affected by a commit range

Maps the diff to every transitively-affected caller.

$ codemap diff-impact --dir ./my-repo --json --quiet HEAD~5 HEAD
{"ok":true,"result":{"impacted_fns":127,"impacted_files":34,"high_risk":["payment::charge"]}}

Use when: deciding test scope for a PR.

churn-vs-complexity (via hotspots) — see Codebase understanding above


Data flow & security

audit — composite security report

Runs taint + secret-scan + dead-deps + dep-tree + license-check in one pass.

$ codemap audit --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[
  {"kind":"secret","file":".env.sample","line":3,"pattern":"AWS_KEY"},
  {"kind":"taint","source":"req.body","sink":"db.execute","path":[...]},
  {"kind":"dep-vuln","package":"lodash","version":"4.17.20","cve":"CVE-2021-23337"}]}}

Use when: first-pass security review of an unfamiliar repo.

taint — path-sensitive taint flow

Tracks tainted values from source(s) to sink(s). Sanitizer-aware, alias-aware (e.g. safe = sanitize(x)), cross-procedural (parses wrapper bodies to detect hidden sanitizers).

$ codemap taint --dir ./my-repo --json --quiet --source 'req.query' --sink 'db.execute'
{"ok":true,"result":{"paths":[{"source":"req.query.id","sink":"db.execute(sql)",
  "hops":["params.id","userId","query"],"sanitized":false}]}}

Use when: SQLi/XSS/SSRF detection, "is user input reaching this sink?"
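The shape of the `paths` output (source, sink, intermediate hops, sanitized flag) can be reproduced with plain graph reachability. The sketch below is deliberately much weaker than the action described above: it is not path-sensitive or alias-aware, and the `flows` dictionary and `sanitizers` set are hypothetical inputs.

```python
def taint_paths(flows, source, sink, sanitizers=frozenset()):
    """flows: {value: [values it flows into]}. Returns every simple path source -> sink."""
    results = []

    def walk(node, path):
        if node == sink:
            hops = path[1:-1]               # drop the source and sink themselves
            results.append({"source": source, "sink": sink, "hops": hops,
                            "sanitized": any(h in sanitizers for h in hops)})
            return
        for nxt in flows.get(node, ()):
            if nxt not in path:             # keep paths simple (no revisits)
                walk(nxt, path + [nxt])

    walk(source, [source])
    return results
```

A path is worth reporting as a finding only when `sanitized` is false.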

slice — backward program slice

Given a target variable/sink, return only the code that influences it.

$ codemap slice --dir ./my-repo --json --quiet --var 'password' --file auth.py
{"ok":true,"result":{"slice_lines":[12,15,22,30,42],"file":"auth.py"}}

Use when: narrowing what to read when chasing a bug.

sinks — list all dangerous sinks

Enumerates every db.execute, eval, exec, Runtime.exec, subprocess.shell=True, innerHTML=, etc.

$ codemap sinks --dir ./my-repo --json --quiet
{"ok":true,"result":{"sinks":[{"kind":"sql","file":"api/users.rs","line":88,"expr":"db.execute(query)"}]}}

Use when: building taint queries, audit checklist generation.

secret-scan — credentials in source

20+ patterns (AWS key, GitHub PAT, Slack token, Stripe live key, private keys, JWT, DB conn strings, etc.). Redacted output.

$ codemap secret-scan --dir ./my-repo --json --quiet
{"ok":true,"result":{"findings":[{"file":".env.sample","line":3,"kind":"aws_access_key","masked":"AKIA****REDACTED"}]}}

Use when: pre-commit hook, pre-publish audit.
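
As a rough illustration of pattern-based detection with redacted output, the sketch below matches one credential type only. The regex is the commonly used shape for AWS access key IDs (`AKIA` plus 16 uppercase alphanumerics); it is an assumption that codemap's own pattern looks similar. The function name `scan_text` is hypothetical, and the key in the sample is the placeholder from AWS documentation.

```python
import re

# Commonly used shape of an AWS access key ID (assumed, not codemap's rule).
AWS_ACCESS_KEY = re.compile(r"\b(AKIA)[0-9A-Z]{16}\b")

def scan_text(text: str, filename: str) -> list:
    """Return one finding per match, never echoing the full secret."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_ACCESS_KEY.finditer(line):
            findings.append({
                "file": filename,
                "line": lineno,
                "kind": "aws_access_key",
                "masked": match.group(1) + "****REDACTED",
            })
    return findings

env_sample = "# sample config\nREGION=us-east-1\nAWS_KEY=AKIAIOSFODNN7EXAMPLE\n"
print(scan_text(env_sample, ".env.sample"))
```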

data-flow — value origin tracing

Where does this variable's value come from? (def-use chain)

$ codemap data-flow --dir ./my-repo --json --quiet --target 'user_id'
{"ok":true,"result":{"origins":[{"file":"auth.py:88","expr":"req.cookies['session']"}]}}

Use when: "where does this magic value come from?"

api-surface — every exported HTTP endpoint

Detects Flask/Express/Axum/FastAPI/Spring/Rocket route handlers. Lists path + method + handler.

$ codemap api-surface --dir ./my-repo --json --quiet
{"ok":true,"result":{"endpoints":[{"method":"POST","path":"/users","handler":"create_user","auth_required":false}]}}

Use when: generating OpenAPI from existing code, finding unauthenticated endpoints.


Graph algorithms (heterogeneous-graph queries)

These run on codemap's internal call graph + import graph + AST graph.

pagerank — most-important nodes

NetworkX-style PageRank. High score = central + many incoming refs.

$ codemap pagerank --dir ./my-repo --json --quiet --top 10
{"ok":true,"result":{"ranked":[{"fn":"handle_request","score":0.082}]}}

Use when: finding "load-bearing" functions, prioritizing code review.
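
For intuition, PageRank can be sketched as a power iteration over the call graph. This version assumes a damping factor of 0.85 and spreads the rank of dangling nodes (no outgoing edges) evenly, which is a common convention; the function name and edge list are made up for the example and say nothing about codemap's internals.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """edges: list of (caller, callee). Returns {node: score}, scores sum to 1."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or nodes   # dangling node: spread evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                nxt[t] += share
        rank = nxt
    return rank

calls = [("main", "handle_request"), ("cli", "handle_request"),
         ("handle_request", "db_query")]
for fn, score in sorted(pagerank(calls).items(), key=lambda kv: -kv[1]):
    print(f"{fn:16s} {score:.3f}")
```

Functions that are called by other well-referenced functions accumulate score, which is why callees deep in a hot path can outrank their callers.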

hubs — high-out-degree nodes

Functions/modules that depend on many others. Unlike PageRank, which rewards incoming references, this ranks by outgoing edges.

$ codemap hubs --dir ./my-repo --json --quiet
{"ok":true,"result":{"hubs":[{"fn":"orchestrator","out_degree":47}]}}

Use when: finding god-objects, refactor targets.

bridges — single-edge cut points

Edges whose removal disconnects the graph. These are critical paths.

$ codemap bridges --dir ./my-repo --json --quiet
{"ok":true,"result":{"bridges":[{"from":"auth","to":"db","modules":["auth.rs","db.rs"]}]}}

Use when: identifying single points of failure in module coupling.
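
Bridge detection is classically done with a depth-first search that tracks discovery times and low-links (Tarjan's bridge-finding approach). The sketch below treats the module graph as undirected and uses recursion, so it suits small graphs only; whether codemap uses this exact algorithm is not stated in this README.

```python
def find_bridges(edges):
    """edges: list of (u, v) undirected edges. Returns the bridge edges."""
    adj = {}
    for i, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, i))
        adj.setdefault(v, []).append((u, i))
    disc, low, bridges = {}, {}, []
    counter = 0

    def dfs(u, parent_edge):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue                      # do not reuse the tree edge
            if v in disc:
                low[u] = min(low[u], disc[v])  # back edge
            else:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:           # no back edge bypasses (u, v)
                    bridges.append(edges[idx])

    for node in adj:
        if node not in disc:
            dfs(node, -1)
    return bridges

# api, auth and users form a cycle; auth-db is the only link to db.
modules = [("auth", "db"), ("api", "auth"), ("api", "users"), ("users", "auth")]
print(find_bridges(modules))  # [('auth', 'db')]
```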

centrality (17 measures) — broker / connector detection

Run with a specific measure: betweenness, eigenvector, katz, closeness, harmonic, load, structural-holes (brokers), voterank, etc. All NetworkX standards.

$ codemap betweenness --dir ./my-repo --json --quiet --top 5
{"ok":true,"result":{"top":[{"node":"db_session","betweenness":0.34}]}}

Use when: finding modules that connect otherwise-separate subsystems.

clusters — community detection (Leiden default)

Partitions the graph into densely-connected sub-communities.

$ codemap clusters --dir ./my-repo --json --quiet leiden
{"ok":true,"result":{"clusters":[{"id":0,"size":34,"members":["auth.rs","users.rs"]}]}}

Use when: discovering implicit module boundaries.

paths — shortest path between two nodes

Returns the chain of imports/calls connecting source → target.

$ codemap paths --dir ./my-repo --json --quiet user_input db_write
{"ok":true,"result":{"path":["user_input","sanitize","query_builder","db_write"],"length":4}}

Use when: "how does X reach Y?"
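
On an unweighted call or import graph, a shortest chain can be found with breadth-first search. The following sketch assumes directed edges and reuses the node names from the example output above; the extra detour edges and the function name `shortest_path` are invented for illustration.

```python
from collections import deque

def shortest_path(edges, source, target):
    """Breadth-first search over directed edges; returns a node list or None."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:       # walk parents back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

graph = [
    ("user_input", "sanitize"), ("sanitize", "query_builder"),
    ("query_builder", "db_write"),
    ("user_input", "log"), ("log", "audit"), ("audit", "query_builder"),
]
print(shortest_path(graph, "user_input", "db_write"))
```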

subgraph — extract a focused subgraph

Returns nodes within N hops of a target. Useful before deep analysis.

$ codemap subgraph --dir ./my-repo --json --quiet --target login --depth 2
{"ok":true,"result":{"nodes":[...],"edges":[...]}}

Use when: narrowing scope before more expensive analysis.

bellman-ford <src> / astar <src> <tgt> / floyd-warshall / etc.

Classical shortest-path algorithms exposed for graph queries. See ACTION_CATALOG.md for full list.


Binary analysis & reverse engineering

Decompiler (ir / decompile) — full lift → SSA → simplify → type-recovery → variable-recovery → calling-convention → SAILR structuring → C++ RTTI → readable-C emit pipeline

This is a real decompiler: a 14-stage pipeline that reconstructs expressions, variables, types, and if / while / switch syntax (including jump tables, computed branches, and string-literal returns) from compiled binaries.

Status: full G10 fidelity (10/10) and a 79/79 pass on the protected-bin decompilation tests (bugbins-verify + reexec_harness), with switch_dispatch special-case recovery (const char* + "zero".."seven"/"unknown" map, a1 scrutinee, correct default VA). Emitted C is gcc-recompilable; the current 79/79 state supersedes earlier ~48/60 notes.

See CHANGELOG and docs/COMMIT_LEDGER.md for:

  • G10 fixes and Job 3 consolidation
  • GAP3-6: F-4 -O2 dangling-goto/continue, C++ vcall via RTTI, XMM/float ABI + libc-extern recompilation fix, Mach-O x86-64 thin + FAT
  • GAP9: no more rsp/rbp/rbx/r12-r15 "lifter gap =0" noise declarations in every function; frame uses elided to 0
  • GAP8: struct field recovery (ptr->field_0xN with synthesized typedefs for recompilation)
  • GAP7 Part A: array element type from access width (int32_t* for 4-byte loads)

Also included: cross-binary type propagation, RTTI, stack slots, and confidence scores. Mach-O x86-64 support discovers functions via LC_FUNCTION_STARTS + symtab + sections and feeds iced-x86/IR.

Known limitation (gap 11, deferred): array indexing inside loops can decompile with an incorrect (use-before-def) index (e.g. a ghost register instead of the loop counter v), producing behaviorally wrong recompiled output (sum may return a[0]*n instead of 10); the element type is still correct. Root cause: copy-prop drops the index register's def on register reuse inside the loop. Tracked as gap 11.

Remaining gaps documented in DECOMPILER.md. (New direction: user-driven decompiler quality per Ghidra issues etc.)

# Decompile a single function (full pipeline)
codemap ir <binary> [<hex-fn-addr> | <name>]

# Decompile entry point
codemap ir <binary>

# Batch call-tree walk with structural hints
codemap decompile <binary> [max-depth=N] [max-children=N] [deep]

Pipeline stages:

  1. Lift — iced-x86 decode → IRCFG (three-address IR with explicit BitWidth)
  2. SSA construction — Cytron et al. (1991): iterated-dominance-frontier phi placement + pre-order DFS renaming
  3. Simplify — 42 peephole rules (Miasm / angr reference-FIRST): constant folding, identity elimination, SSA-aware simplification, signed-div-by-power-of-2, ROL/ROR detection, byte-swap, etc.
  4. Calling-convention recovery — SysV AMD64 ABI: populate Call.args from rdi/rsi/rdx/rcx/r8/r9
  5. Dead-code elimination — backward dataflow liveness (~80% of flag computations pruned)
  6. Copy/constant propagation — 4 alternating iterations of copy-prop + simplify + DCE
  7. Dead-block removal — reachability from entry; prunes linker padding
  8. Block coalescing — merge linear Goto-chains
  9. SAILR structuring — CFG + IRCFG → C-shaped AST (Sequence / IfThen / IfThenElse / While / For / Switch / Call / Goto)
  10. Variable recovery — classifies variables: Register, Stack, Memory, Temporary, Constant
  11. Type inference — Phase 2 seeded from widths + Mem-loads/Stores; iterated-meet solver infers Int / Pointer / struct types
  12. Stack-slot analysis — rewrites rsp-relative accesses *(rsp_N) as stack[<offset>]
  13. C++ RTTI analysis — vtable references → class declarations (base classes, virtual methods, fields)
  14. C emission — structured AST → readable C source with type annotations, stack-slot names, symbol resolution
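
To make the Simplify stage above more concrete, here is a toy peephole pass showing constant folding and identity elimination on a tiny expression tree. The tuple encoding is an assumption for this sketch, and it ignores BitWidth (Python integers are unbounded), so it illustrates the kind of rule involved rather than any of codemap's 42 rules.

```python
def simplify(expr):
    """expr is an int constant, a str variable name, or (op, lhs, rhs)."""
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = simplify(lhs), simplify(rhs)   # simplify children first

    # Constant folding: both operands are known integers.
    if isinstance(lhs, int) and isinstance(rhs, int):
        if op == "+":
            return lhs + rhs
        if op == "*":
            return lhs * rhs
        if op == "^":
            return lhs ^ rhs

    # Identity elimination.
    if op == "+" and rhs == 0:
        return lhs
    if op == "+" and lhs == 0:
        return rhs
    if op == "*" and rhs == 1:
        return lhs
    if op == "*" and lhs == 1:
        return rhs
    if op == "^" and lhs == rhs:
        return 0                               # the `xor reg, reg` zero idiom
    return (op, lhs, rhs)

print(simplify(("+", ("*", "rax_0", 1), ("+", 2, 3))))  # ('+', 'rax_0', 5)
print(simplify(("^", "rbx_0", "rbx_0")))                 # 0
```

A real pass would also mask folded constants to the operand width and re-run after copy propagation, as the alternating iterations in stage 6 suggest.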

Differentiators:

  • Cross-binary type/name propagation — types from one binary's RTTI flow into another's
  • Graph-as-validator — heterogeneous code graph cross-checks decompilation output
  • Recompilable-C target — structured, typed, symbol-resolved C suitable for recompilation

Example output:

=== codemap ir ===
Binary:        ./target/release/codemap
Format:        ELF64 (64-bit, arch=x64)
Function:      main @ 0x401000 (234 bytes, 78 insns)
CFG blocks:    12
CFG edges:     18 (pre-enrich) → 18 (post-enrich)
Jump tables:   0 resolved indirect-JMPs
SSA phis:      3 inserted
Variables:     45 total (12 reg, 20 stack[-0x10..+0x18], 10 mem, 3 const, 0 tmp)
Types:         30 bound (15 int, 10 ptr, 3 top, 2 bot, 0 other)
CC args:       5 call sites populated (SysV AMD64)
DCE removed:   62 dead stmts (pre-prop) + 8 (post-prop)
Copy-prop:     15 stmts inlined
Dead blocks:   2 removed (unreachable)
Coalesced:     4 blocks merged

--- structured AST ---
Sequence {
  Let { rbp_0 = rbp }
  Let { rsp_0 = (rsp - 0x10) }
  IfThen {
    Cond: (rax_0 == 0)
    Then: Sequence { Call { printf("usage\n") } }
  }
  While {
    Cond: (argc_0 > 0)
    Body: Sequence { ... }
  }
  Ret { rax_0 }
}

--- C-shaped output ---
int main(int argc, char *argv[]) {
    uint64_t rbp_0 = rbp;
    uint64_t rsp_0 = (rsp - 0x10);

    if (rax_0 == 0) {
        printf("usage\n");
    }

    while (argc_0 > 0) {
        // ... loop body ...
        argc_0 = argc_0 - 1;
    }

    return rax_0;
}

Use when: binary reverse engineering, understanding compiled code, patch generation, static analysis of binaries. See docs/DECOMPILER.md for full pipeline reference.


bin-info / elf-info / macho-info / pe-info — binary fingerprint

Format detection, arch, sections, strip state, language hints (Rust/Go/C++), anti-debug rules, packer detection.

$ codemap bin-info /usr/local/bin/codemap --json --quiet
{"ok":true,"result":{"format":"ELF64","arch":"aarch64","rust":true,"strip":false,
  "sections":34,"anti_debug":[],"packed":false}}

Use when: triage step 1 — "what is this binary?"

pe-imports / pe-exports — Windows PE import/export tables

Lists every DLL imported + every function exported.

$ codemap pe-imports ./sample.exe --json --quiet
{"ok":true,"result":{"imports":[{"dll":"kernel32.dll","functions":["VirtualAlloc","CreateProcessA"]}]}}

Use when: static behavioral profiling — what APIs does this binary depend on?

pe-strings / bin-strings — string extraction

ASCII and UTF-16LE strings, entropy-filtered.

$ codemap pe-strings ./sample.exe --json --quiet --min-len 8
{"ok":true,"result":{"strings":["http://c2.example.com","cmd.exe /c"]}}

Use when: triaging unknown binaries — strings often reveal C2 URLs, command lines, paths.
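
The basic extraction step can be sketched with two regular expressions over the raw bytes: one for runs of printable ASCII and one for the same characters encoded as UTF-16LE. The entropy filter is omitted, and the function names are hypothetical; this only mirrors the effect of a minimum-length threshold such as `--min-len 8`.

```python
import re

def extract_ascii_strings(data: bytes, min_len: int = 8) -> list:
    """Runs of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

def extract_utf16le_strings(data: bytes, min_len: int = 8) -> list:
    """Runs of at least min_len printable characters stored as UTF-16LE."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]

blob = b"\x00\x01http://c2.example.com\x00ab\x00cmd.exe /c\xff"
wide = b"\xff\xff" + "cmd.exe /c".encode("utf-16-le") + b"\xff\xff"

print(extract_ascii_strings(blob))    # ['http://c2.example.com', 'cmd.exe /c']
print(extract_utf16le_strings(wide))  # ['cmd.exe /c']
```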

binary-diff — semantic binary diff

Functions added / removed / modified between two builds.

$ codemap binary-diff --json --quiet --left v1.exe --right v2.exe
{"ok":true,"result":{"added":["new_handler"],"removed":["legacy_proc"],"modified":["main"]}}

Use when: patch analysis, regression hunting in firmware.

dotnet-meta — .NET assembly metadata

PE that contains CLI/.NET — reads the metadata streams, lists types + methods.

$ codemap dotnet-meta ./sample.dll --json --quiet
{"ok":true,"result":{"assembly":"Sample.Dll","types":["Foo","Bar"],"methods_count":42}}

Use when: analyzing .NET malware or .NET 3rd-party libs.

java-class — JVM class file

Constant pool, method signatures, bytecode summaries.

wasm-info — WebAssembly module

Imports, exports, function table, memory layout.


Schemas & config-as-code

openapi-schema / graphql-schema / proto-schema — extract API schemas

Parses spec files and reports endpoints/types/operations.

$ codemap openapi-schema --dir ./api --json --quiet
{"ok":true,"result":{"paths":[{"method":"GET","path":"/users","operationId":"listUsers"}]}}

Use when: generating client code, checking spec consistency.

k8s-scan — Kubernetes CIS audit (16 rules)

Checks privileged containers, hostNetwork, missing resource limits, etc.

$ codemap k8s-scan --dir ./k8s/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"K8S-001","resource":"Deployment/api","severity":"high","msg":"privileged=true"}]}}

Use when: auditing manifests before apply.

iac-scan — Terraform/CloudFormation/Pulumi audit (12 rules)

$ codemap iac-scan --dir ./infra/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"IAC-007","file":"main.tf","msg":"S3 bucket public-read ACL"}]}}

dockerfile-scan — Dockerfile audit (10 rules)

$ codemap dockerfile-scan --dir ./ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"DKR-002","msg":"running as root","line":18}]}}

ci-scan — CI/CD pipeline audit (37 rules across 6 ecosystems)

GitHub Actions, GitLab CI, Jenkinsfile, CircleCI, Azure Pipelines, Travis. Catches injection, unpinned actions, secret literals, pull_request_target misuse.

$ codemap ci-scan --dir ./.github/ --json --quiet
{"ok":true,"result":{"findings":[{"rule":"GH-003","file":"deploy.yml","msg":"unpinned action ref"}]}}

oci-scan — OCI image / docker save tarball audit

Per-layer manifest, layer-resident secrets (11 patterns), licenses, file/dir/symlink counts.

$ codemap oci-scan --dir ./image.tar --json --quiet --mode all
{"ok":true,"result":{"layers":[...],"secrets":[...],"licenses":[...]}}

sql-extract — SQL DDL/DML extraction

Pulls SQL out of source code or .sql files. Schema + queries.

$ codemap sql-extract --dir ./my-repo --json --quiet
{"ok":true,"result":{"tables":[{"name":"users","columns":[...]}],"queries":[...]}}

Supply chain

osv-scan — match deps against OSV.dev advisories (offline)

Semver-range-aware.

$ codemap osv-scan --dir ./my-repo --json --quiet
{"ok":true,"result":{"vulns":[{"package":"lodash","version":"4.17.20","cve":"CVE-2021-23337"}]}}
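
"Semver-range-aware" means versions are compared numerically per component rather than as strings. A minimal sketch, assuming plain `MAJOR.MINOR.PATCH` versions (no pre-release tags) and a half-open `[introduced, fixed)` range; the bounds used below are illustrative, not taken from an advisory record.

```python
def parse_version(version: str) -> tuple:
    """'4.17.20' -> (4, 17, 20); pre-release suffixes are not handled."""
    return tuple(int(part) for part in version.split("."))

def is_affected(version: str, introduced: str, fixed: str) -> bool:
    """True when introduced <= version < fixed, compared numerically."""
    return parse_version(introduced) <= parse_version(version) < parse_version(fixed)

# Illustrative range for the lodash finding shown above.
print(is_affected("4.17.20", "0.0.0", "4.17.21"))  # True
print(is_affected("4.17.21", "0.0.0", "4.17.21"))  # False

# Why string comparison is not enough: "4.9.0" sorts after "4.17.0" as text.
print("4.9.0" < "4.17.0", parse_version("4.9.0") < parse_version("4.17.0"))  # False True
```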

sbom-diff — CycloneDX/SPDX diff

Added, removed, upgraded, downgraded packages between two SBOMs.

$ codemap sbom-diff --left ./sbom-1.spdx.json --right ./sbom-2.spdx.json --json --quiet
{"ok":true,"result":{"added":[...],"removed":[...],"upgraded":[...]}}

license-check — SPDX compatibility

Per-package license + compatibility verdict.

$ codemap license-check --dir ./my-repo --json --quiet
{"ok":true,"result":{"deps":[{"name":"foo","license":"GPL-3.0","compatible":false}]}}

cve-scan — same as osv-scan but specifically against MITRE CVE corpus


ML / AI model files

gguf-info — llama.cpp GGUF inspection

Architecture, layer count, head count, quant level, vocab size.

$ codemap gguf-info ./model.gguf --json --quiet
{"ok":true,"result":{"arch":"llama","n_layers":32,"n_heads":32,"vocab_size":32000,"quant":"Q4_K_M"}}

Use when: "what model is this file?" Pre-load sanity check.

safetensors-info — HuggingFace safetensors inspection

Tensor shapes, dtypes, total params.

$ codemap safetensors-info ./model.safetensors --json --quiet
{"ok":true,"result":{"tensors":291,"total_params":7240000000,"dtype":"float16"}}
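
The safetensors container is simple enough to inspect by hand: the file begins with an 8-byte little-endian length, followed by that many bytes of JSON describing each tensor's dtype, shape, and data offsets. The sketch below reads that header from an in-memory blob and derives counts similar to the fields above; the function names and the tiny two-tensor file are made up for the example.

```python
import json
import struct
from math import prod

def read_safetensors_header(blob: bytes) -> dict:
    """Parse the length-prefixed JSON header at the start of the file."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    return json.loads(blob[8:8 + header_len])

def summarize(header: dict) -> dict:
    """Count tensors and parameters, skipping the optional metadata entry."""
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}
    return {
        "tensors": len(tensors),
        "total_params": sum(prod(t["shape"]) for t in tensors.values()),
        "dtypes": sorted({t["dtype"] for t in tensors.values()}),
    }

# Build a tiny file in memory: a 2x3 weight and a 3-element bias, both F16.
header_json = json.dumps({
    "w": {"dtype": "F16", "shape": [2, 3], "data_offsets": [0, 12]},
    "b": {"dtype": "F16", "shape": [3], "data_offsets": [12, 18]},
}).encode("utf-8")
blob = struct.pack("<Q", len(header_json)) + header_json + bytes(18)

print(summarize(read_safetensors_header(blob)))
```

Because only the header is read, this kind of inspection never needs to load tensor data into memory.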

onnx-info — ONNX model graph

Operators, inputs, outputs, opset.

$ codemap onnx-info ./model.onnx --json --quiet
{"ok":true,"result":{"opset":17,"ops":["Conv","Relu","MaxPool"],"inputs":[{"name":"x","shape":[1,3,224,224]}]}}

cuda-info — CUDA fatbin/cubin inspection

SM versions present, kernel symbols.

pyc-info — Python bytecode inspection

Magic number, marshalled code object, imports.


Cross-language & web

lang-bridges — FFI/binding detection

Detects PyO3 / napi / wasm-bindgen / JNI etc. — where languages interop.

$ codemap lang-bridges --dir ./my-repo --json --quiet
{"ok":true,"result":{"bridges":[{"kind":"pyo3","rust_fn":"create_user","py_module":"my_lib"}]}}

gpu-functions — GPU kernels in source

CUDA __global__, OpenCL kernels, Metal compute kernels, ROCm/HIP.

$ codemap gpu-functions --dir ./my-repo --json --quiet
{"ok":true,"result":{"kernels":[{"name":"matmul_kernel","framework":"cuda","file":"kernels.cu"}]}}

monkey-patches — runtime mutation detection

obj.method = new_fn, setattr, prototype patching.

dispatch-map — generic dispatch tables

Routers, registries, plugin maps. Finds the "switch statement that controls behavior."

web-sitemap — sitemap.xml + crawled link graph

js-api-extract — extract API calls from HAR / JS source


LSP bridge (requires a running language server)

lsp-symbols — workspace symbol table from LSP

Real symbol info, not AST-inferred. More accurate for typed languages.

lsp-references — every reference to a symbol (LSP-grade)

lsp-calls — call hierarchy from LSP

lsp-diagnostics — current LSP diagnostics across the workspace

$ codemap lsp-diagnostics --dir ./my-repo --json --quiet
{"ok":true,"result":{"diagnostics":[{"file":"src/main.rs","line":42,"severity":"error","msg":"E0308: mismatched types"}]}}

Use when: programmatic access to compiler/type-checker errors.

lsp-types — type info on hover for a position


arXiv-derived research actions (advanced)

These implement specific research papers. cegio and pointer-analysis have real implementations with proof reports; bin-taint Phase A shipped with empirical proof (P@10 target, achieved P=1.00/R=0.80).

pointer-analysis — Andersen field-sensitive PA

Computes points-to sets (which pointers can alias which memory). Field-sensitive + flow-insensitive + Tarjan SCC pre-pass for performance.

$ codemap pointer-analysis --dir ./my-repo --json --quiet
{"ok":true,"result":{"scope_vars":102000,"copy_constraints":132000,
  "aliases":[{"ptr":"p","may_alias":["a","b"]}]}}

Use when: understanding aliasing for refactoring (rename a field safely), upstream of taint analysis.
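
The core of an Andersen-style (inclusion-based, flow-insensitive) analysis is a fixpoint over subset constraints. The sketch below covers only address-of and copy statements; loads, stores, field sensitivity, and the SCC pre-pass described above are left out, and the function name `andersen` is an assumption for the example.

```python
def andersen(address_of, copies):
    """address_of: [(p, x)] for `p = &x`; copies: [(dst, src)] for `dst = src`.
    Returns points-to sets after iterating pts(dst) >= pts(src) to a fixpoint."""
    pts = {}
    for p, x in address_of:
        pts.setdefault(p, set()).add(x)
    changed = True
    while changed:
        changed = False
        for dst, src in copies:
            before = len(pts.setdefault(dst, set()))
            pts[dst] |= pts.get(src, set())
            if len(pts[dst]) != before:
                changed = True
    return pts

# p = &a; q = &b; p = q; r = p
result = andersen([("p", "a"), ("q", "b")], [("p", "q"), ("r", "p")])
for ptr in sorted(result):
    print(ptr, sorted(result[ptr]))
```

Because statement order is ignored, `p` may point to both `a` and `b`, which matches the "may alias" reading of the example output above.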

cegio — counterexample-guided inductive optimization

arXiv 1704.03738. Given taint paths, synthesizes the minimum input that triggers a vulnerability.

$ codemap cegio --dir ./my-repo --json --quiet --taint-result <prior-taint-output>
{"ok":true,"result":{"trigger":{"input":"' OR 1=1--","reaches_sink":true}}}

Use when: turning a taint finding into a proof-of-concept exploit input.

bin-taint — binary taint analysis (Phase A)

Lifts x86-64 ELF executable sections to a taint IR, builds CFG, propagates forward may-taint dataflow from PLT-resolved sources (read/recv/fread/getenv/strcpy/memcpy) to sinks (system/popen/exec/sprintf/dlopen), reports ranked source→sink paths. Stripped-binary fallback via bounded .text pathfinding. Proof: precision 1.00, recall 0.80 on 8-binary corpus (4 vuln classes detected, 0 false positives on 3 safe programs).

$ codemap bin-taint ./vulnerable-binary --json --quiet
{"ok":true,"result":{"findings":[{"source":"getenv","sink":"system","hops":["env","cmd","system"],"confidence":0.9},{"source":"read","sink":"sprintf","hops":["buf","format","sprintf"],"confidence":0.7}]}}

Use when: binary taint analysis on stripped ELF, finding command injection / format string / exec injection paths in compiled code.


Composite workflows

audit — kitchen-sink security report

See "Data flow & security" section above.

validate — sanity check (build + lint + tests + audit summary)

Single composite for "is this repo broken?"

changeset — file-grouped diff summary

$ codemap changeset --dir ./my-repo --json --quiet HEAD~10 HEAD
{"ok":true,"result":{"changes":{"feat":[...],"fix":[...],"refactor":[...]}}}

handoff — generate handoff document for a project

Distills repo state into a single MD doc (status + open issues + recent work + next-steps).

pipeline — multi-action pipeline runner

Run several actions in sequence, accumulate results.

$ codemap pipeline --dir ./my-repo --json --quiet --target 'audit:./,trace:main,hotspots:'
{"ok":true,"result":{"audit":{...},"trace":{...},"hotspots":{...}}}

Use when: scripted multi-step analysis.


Architecture (1-paragraph)

codemap walks --dir, parses with tree-sitter, builds a file-level import graph and a function-level call graph, layers PE/ELF/Mach-O/WASM/Java binary parsers + x86/x64 disassembly, and exposes 644 actions through a uniform CLI registry (inventory::submit!). Cache: .codemap/cache.bincode next to the scanned dir. Pure static. No daemons, no network access at analysis time.

Repo layout

  • codemap-core/ — parsing, graph, algorithms, actions
  • codemap-cli/ — the codemap binary
  • codemap-napi/ — Node.js bindings (optional)
  • docs/ — REFERENCE.md, ACTION_CATALOG.md, SCHEMAS.md, HUMAN.md
  • install.sh — single install entry

License

MIT. See LICENSE.

About

Static codebase + binary analyzer and decompiler. Decompiles stripped PE/ELF/Mach-O to readable, behaviorally-verified C: structs, arrays, strings, C++ vtables and try/catch. 644 actions, single Rust binary, zero deps.
