Skip to content

maudlin/checkup

Repository files navigation

🩺 Application Checkup

CI License: MIT

A deterministic localiser of codebase-health problems β€” it tells you (or an agent) where the effort should go, before a line of code is read. A single shell entrypoint runs ~20 checks across code, dependencies, security, containers, CI and git history, and produces a machine-readable JSON stream (primary) plus a human-readable markdown report (always) β€” two signals: an overall health read and the biggest problems, ranked.

It is not a deploy gate β€” that's CI's job. Its value is front-loading, deterministically and up front, the gestalt a smart agent would otherwise spend tokens inferring ("this is Classic ASP", "there are no tests", "this module is hot, complex and bug-prone"). Cheaper, certain, reproducible, pre-token. See ADR-0009.

Four contexts, one job β€” "here's where the health problems are," never "may I ship?":

  1. Prime an AI agent β€” a deterministic "start here" before the expensive, non-deterministic agent runs.
  2. Team prioritisation β€” find the highest-leverage fix for today's pains and tomorrow's failures.
  3. Tech due diligence β€” a fast read of a product's code hygiene.
  4. Periodic safety-net sweep β€” a coarse-cadence catch of what slipped past CI.

Health is read across four pillars: maintainability (complexity, duplication, coupling, hotspots), safety/maturity (tests, coverage, mutation, docs β€” absence is a loud signal), currency & technology-viability (dependency rot, EOL runtimes, dead platforms), and correctness (does it build / pass β€” lowest weight, often unrunnable on a target you don't own).

It is tool-agnostic and portable: every check degrades gracefully when its tool is absent, and the contract documented below lets you swap the language-specific checks for your own stack's equivalents without touching the helpers, the renderer, or the report format. A grade is fine; a gate it is not.

Released under MIT (see LICENSE). Forks, ports, and ports-back welcome.


Quick start

# Run against the current project (resolves the enclosing git repo):
cd /path/to/your-project
/path/to/checkup/bin/checkup.sh

# …or scan an explicit target without cd-ing into it:
CHECKUP_TARGET=/path/to/your-project /path/to/checkup/bin/checkup.sh

Output lands under the scanned project: docs/reports/checkup-report.md (committable "latest") plus reports/parsed/*.json (machine-readable, one file per check). Add an alias (alias checkup=/path/to/checkup/bin/checkup.sh) or symlink bin/checkup.sh onto your PATH. To pin it into a project, vendor the repo (e.g. as a git submodule) and call bin/checkup.sh from an npm/make task.

Run in Docker (no host installs)

The checkup-core image bakes the cross-stack tools (gitleaks, semgrep, shellcheck, yamllint, hadolint, scc) so you can examine any repository with nothing installed but Docker β€” ideal for ad-hoc audits and due diligence:

docker build -t checkup-core .          # one-off, from this repo

# Scan a project: source mounted READ-ONLY, report written to ./checkup-out
docker run --rm \
  -v "/path/to/project:/src:ro" \
  -v "$PWD/checkup-out:/out" \
  checkup-core
# β†’ ./checkup-out/checkup-report.md  (+ parsed/*.json, by-file.json)

The source is mounted read-only β€” checkup writes nothing into it; everything goes to /out. Mount a full clone (not a shallow/exported tree) so the git-forensics checks have history.

For sensitive or due-diligence scans, run it sealed (--network none + a minimal sandbox) so a compromised tool can't exfiltrate the code β€” see SECURITY.md.

What runs in checkup-core: the cross-stack security, hygiene and forensics checks (secrets, SAST, shell/YAML/Dockerfile lint, stats, churn Γ— complexity). Language- and build-specific checks (typecheck, test, build, coverage) belong to per-stack images β€” see ROADMAP.md. On a repo without the Node toolchain they skip honestly (they don't fail or false-pass); read the cross-stack sections for the core signal.

checkup-dotnet overlay (.NET / legacy ASP)

FROM checkup-core plus the .NET SDK, Microsoft DevSkim and PMD CPD. Runs every core check, then adds four .NET / legacy-ASP passes β€” asp-classic (semgrep ruleset for Classic ASP/VBScript), devskim (source SAST, no build, reaches .NET Framework source), dotnet-vuln (dotnet list package --vulnerable, skips honestly on legacy packages.config), and duplication (PMD CPD β€” language-aware copy-paste detection for C# and other CPD languages; Classic ASP has no CPD tokeniser). New findings flow into the same report automatically (the renderer is tool-agnostic).

docker build -t checkup-core .                        # base first
docker build -f Dockerfile.dotnet -t checkup-dotnet . # overlay

docker run --rm -v "/path/to/app:/src:ro" -v "$PWD/out:/out" checkup-dotnet

The report location is controlled by CHECKUP_OUT_DIR (set to /out in the image): set it in any context to write outputs outside the scanned tree. Unset, checkup keeps the committed docs/reports/checkup-report.md convention.


Priming an agent

checkup's first-class use is front-loading an AI coding agent: run it, then hand the result to the agent as a briefing so it starts in the right place, the right way β€” before it spends a token reading code. The agent-first artefact is reports/checkup.json (a single versioned bundle; see architecture).

A prompt that turns the report into safe, prioritised action:

A checkup health report exists for this codebase. Start with reports/checkup.json
β€” the bundled signal β€” before reading source:
  β€’ overall        β€” the headline health read
  β€’ headlineAlarms β€” the loudest whole-codebase risks
  β€’ pillars        β€” health by axis (maintainability / safety / currency / correctness) + security
  β€’ focusTop       β€” the highest-risk files (hot Γ— complex Γ— bug-prone)

Use it to decide where and how to start:

1. Headline alarms first, explicitly. A leaked secret β†’ rotate & purge before
   anything else. A dead/declining platform β†’ flag and discuss; don't sink
   refactor effort into a rewrite candidate. No test safety net β†’ write
   characterisation tests before you change behaviour.
2. Let the safety/maturity pillar set your method. If tests are absent or weak,
   work in small verifiable steps and add coverage as you go β€” don't refactor blind.
3. Take the highest-leverage item from focusTop, open those files to confirm,
   and propose a short plan before changing anything.
4. Treat skipped / "no data" checks as "not assessed", not "fine" β€” state what
   you couldn't determine.

checkup tells you WHERE and HOW SAFELY to start; you read the code to decide WHAT to do.

This is a starting template β€” tailor it to your agent and stack. The same bundle drives non-agentic uses too (a human reads checkup-report.md; CI/trend consumers read the JSON). checkup is a localiser and a briefing, not a gate (ADR-0009).

For a CTO-level read β€” what is this, what's the real risk posture, what's the dominant unknown, what would de-risk it β€” see the executive-summary recipe: a portable prompt that synthesises a run into a one-page brief. Note its privacy trade-off β€” feeding source to a hosted LLM breaks the deterministic scan's deny-egress property (ADR-0008); the recipe flags it and offers a report-only mode.


Documentation

Doc What
docs/architecture.md How it works β€” contract, schema, layering
docs/build-your-own.md Run on a host, slim images, extract tools
docs/tools.md Bundled tools, versions, verification
docs/decisions/ ADRs β€” why it's built this way
ROADMAP.md + Issues What's next (milestone v0.2.0)
AGENTS.md Β· CONTRIBUTING.md Agent guidance Β· engagement model

Entrypoints

Script Purpose
bin/checkup.sh Orchestrator. Sources lib/run-tool.sh, runs every check, emits the normalised stream.
bin/checkup-dotnet.sh .NET / legacy-ASP overlay. Runs core, then appends asp-classic + devskim + dotnet-vuln + duplication.
bin/checkup-report.sh Tool-agnostic markdown renderer. Reads reports/parsed/*.json β†’ writes the report.
lib/run-tool.sh Shared helpers (run_tool, write_parsed, write_skipped, write_failed, is_valid_json, slug).
./bin/checkup.sh        # runs all checks, then renders the report

Prerequisites

Tier 1 β€” required (everything below this line is mandatory)

The substrate cannot run without these. Most are already on any modern dev box; listed for forker completeness.

Tool Why
bash (4+) Orchestrator + helpers
jq All parsed-JSON emission and the cross-tool renderer
git Required by git-hotspots + the git-smells trio
node / npm Every npm-script-driven check
POSIX find, grep, sort, awk, sed Helpers in lib/run-tool.sh and section parsers

Tier 2 β€” per-check graceful-degrade (each check skips with a documented reason if its tool is absent)

Every section in checkup.sh follows the contract: if its tool is missing (LAST_EXIT == 127), the section emits a skip parsed JSON with a human reason. No check is mandatory; missing tools never block the run.

Tool Used by Install
shellcheck shellcheck section apt install shellcheck / brew install shellcheck / static binary on GitHub releases
yamllint yamllint section pipx install yamllint (recommended) / apt install yamllint
hadolint hadolint section brew install hadolint / Linux static binary on GitHub releases (arch-mapped: x86_64/arm64)
gitleaks gitleaks section brew install gitleaks / Linux static binary on GitHub releases (arch-mapped: x64/arm64)
scc codebase-stats section brew install scc / Linux static binary on GitHub releases
madge, jscpd, knip, semgrep, stryker various npm-script-driven sections npm install (devDependencies; provided by the host project)

Configuration files (project-owned)

The substrate is config-driven for the linters that can be tuned:

  • .shellcheckrc β€” disabled rules
  • .yamllint.yml β€” line-length, truthy keywords, comment style
  • .hadolint.yaml β€” ignored rules
  • .gitleaks.toml β€” allowlist (paths, regexes, stopwords)

Without these, each tool runs on defaults β€” the substrate doesn't depend on the config files existing.


Other ecosystems

The substrate is more language-agnostic than the npm-script defaults suggest. The following all work unchanged on any stack:

  • Complexity β€” auto-routed by the detected stack (no manual swap): ESLint (complexity + sonarjs/cognitive-complexity, AST-aware via typescript-eslint) on the JS/TS slice of any repo with a resolvable ESLint config β€” whether node-dominant or a Python/Java/Go-dominant polyglot that also carries a JS/TS slice (#73); lizard (true per-function CCN across C, C++, Java, JS, Python, Ruby, Rust, Go, Swift, Kotlin, Lua, Scala, PHP, Objective-C, …) on every other stack β€” and on the non-JS slice of a polyglot repo, merged with the ESLint slice into one record; scc's heuristic as the universal fallback (e.g. Classic ASP). All emit the same per-function CSV the git-hotspots join consumes, so churn Γ— complexity works on any stack. ESLint is preferred for TS because lizard's state-machine TS parser mis-attributes class-method CCN; when ESLint can't run, that slice is reported as unmeasured rather than silently downgraded.
  • Stats β€” scc covers ~150 languages.
  • Security β€” gitleaks is content-based (not language-aware); semgrep has community rulesets for most major languages.
  • Config-lint β€” yamllint, hadolint are language-neutral.
  • Git-axis β€” git-hotspots, change-coupling, bug-fix-density, branch-hygiene are pure git; identical on every stack.
  • Contract, helpers, renderer β€” language-agnostic by design.

The language-specific work is concentrated in the ten npm-script-driven sections. Swap those for your build system's equivalents and the rest of the substrate works as-is.

Section TS / Node (default) Java / Kotlin Python Go Rust C# / .NET
build npm run build gradle build / mvn package pip install -e . go build ./... cargo build dotnet build
typecheck tsc --noEmit (compile-time) mypy / pyright (compile-time) (compile-time) (compile-time)
test vitest / jest JUnit (gradle / maven) pytest go test ./... cargo test dotnet test
lint ESLint SpotBugs + Checkstyle + PMD ruff golangci-lint clippy built-in analyzers
format:check prettier google-java-format / ktlint ruff format / black gofmt -l cargo fmt --check dotnet format
coverage vitest --coverage JaCoCo coverage.py go test -cover (native) tarpaulin / llvm-cov coverlet
unused knip (reflection-limited) vulture / ruff F841 go vet / deadcode cargo-udeps R# CLI / IDE
duplication jscpd jscpd jscpd dupl / jscpd jscpd jscpd
security:audit npm audit OWASP dep-check / Snyk pip-audit / safety govulncheck cargo-audit dotnet list package --vulnerable
mutation Stryker PIT (Pitest) mutmut / cosmic-ray go-mutesting cargo-mutants Stryker.NET
circular-deps madge jdeps (built-in) pydeps go list / staticcheck cargo-modules NDepend (commercial)

The mapping is approximate β€” many of these tools cover different surface area than their JS-ecosystem equivalents (e.g. golangci-lint wraps ~10 linters; clippy is more conservative than ESLint by default). The point is the shape is portable: parse the tool's output into the standard top[] finding shape and the rest of the substrate carries it through unchanged.


Environment variables

Variable Used by Default Purpose
CHECKUP_TARGET path resolution (checkup.sh + renderer) enclosing git repo, else $PWD Explicit project root to scan, instead of auto-detecting from the git top level. For one service in a monorepo, see Scanning a monorepo subdirectory.
CHECKUP_MODE closing verdict (checkup.sh + renderer) tailored tailored (a repo you own & tune): verdict framed for your own codebase ("where to focus next"); a low score exits non-zero as a quality signal you may act on β€” not a deploy gate. audit (a repo you don't own / due diligence): informational only, framed as "where to invest", always exits 0. checkup never gates (ADR-0009).
CHECKUP_SRC_ROOTS complexity + git-axis sections whole tree (VCS-tracked source) NARROWS the complexity + git-forensics scan to specific space-separated roots (e.g. app cmd). By default checkup assesses all VCS-tracked source (honest coverage); set this only to focus the scan or speed up a very large monorepo.
CHECKUP_FORENSIC_SINCE git-axis sections 6.months.ago git log --since window for hotspots / change-coupling / bug-fix-density. Widen (e.g. 2.years.ago) for repos with sparse recent history; an empty window degrades to skip, never a false pass.
CHECKUP_EXCLUDE source inventory (all scanners + scc/identity/stats) unset Extra space-separated fnmatch globs excluded from the whole inventory β€” complexity, duplication and the scc-based stats/identity (#109) β€” on top of the built-in generated/vendored defaults (node_modules, migrations, snapshots, *.min.*, …). Also settable as a top-level exclude: list in .checkup.yml.
CHECKUP_EXCLUDE_GENERATED source inventory (generated-marker pass) on (set 0 to disable) Drop files carrying a banner-shaped generated marker (Go // Code generated … DO NOT EDIT., comment-leader @generated, C# <auto-generated>). Default-on since v0.2.0; the dropped set is enumerated to raw/…generated and announced by a loud banner. CHECKUP_EXCLUDE_GENERATED=0 keeps the whole tree.
CHECKUP_CONCENTRATION_PCT coverage banner (single-dir concentration) 25 Threshold (% of all first-party code) at which one directory is flagged as a possible vendored tree β€” the banner names the directory and prints the exact CHECKUP_EXCLUDE snippet. Advisory only (never auto-excluded).
CHECKUP_SHELL_DIRS shellcheck section scripts .husky .githooks .claude/hooks Space-separated dirs to search for shell scripts. Missing dirs are skipped silently.
HADOLINT_DOCKERFILE hadolint section auto-detect Dockerfile* at root Override the Dockerfile filename when it is named non-conventionally.
MUTATION_TEST mutation section unset (skipped) Set to 1 to enable Stryker; opt-in because mutation testing is slow (~2 min).
RAW_DIR every section (via run_tool) reports/raw Where each section's stdout/stderr capture is written.
PARSED_DIR every section (via run_tool) reports/parsed Where each section's normalised JSON is written.

Forks adding new env-overridable knobs should follow the same <TOOL>_<NOUN> naming convention and document them here in one table.

Scanning a monorepo subdirectory

Point CHECKUP_TARGET at one service inside a larger repo and scope the source roots to it:

CHECKUP_TARGET=/path/to/monorepo/services/api \
CHECKUP_SRC_ROOTS="src" \
CHECKUP_OUT_DIR=/tmp/checkup-out \
bin/checkup.sh

Caveats:

  • Churn / coupling / bug-fix density scope correctly β€” git pathspecs are cwd-relative, so the git-forensics scans see only the subtree.
  • Paths are target-relative β€” the file-based scanners and git-forensics share one namespace (e.g. src/app.ts, not services/api/src/app.ts), so the by-file hotspot aggregate joins correctly.
  • branch-hygiene is repo-wide β€” branches can't be scoped to a subtree, so its counts cover the whole monorepo, not just the service.
  • One stack per run β€” a monorepo mixing stacks wants one run per service with the matching overlay; there's no built-in cross-service roll-up.

npm-script contract

These commands are the default Node profile (profiles/node.sh). Each is overridable per check β€” via a .checkup.yml commands: block or a CHECKUP_CMD_<NAME> environment variable β€” so adapting checkup to another stack is "set the commands", not "fork the orchestrator" (see Overrides). On a Node repo with no overrides the defaults below apply unchanged.

Every npm run <script> the orchestrator invokes must satisfy a small contract. If the host project's package.json deviates, the corresponding section may break. This table is the surface area a forker has to wire up.

npm script Section Required behaviour
typecheck typecheck Exit 0 if zero errors. Stderr/stdout should list TS errors in path:line:col form (consumed by the section parser).
test unit-tests Exit 0 if all tests pass. Vitest summary on stdout (the parser strips ANSI then regex-matches the summary line).
format:check code-quality Exit 0 if all files formatted. Non-zero with the list of unformatted files when drift exists.
lint code-quality Run ESLint with the default text formatter β€” the section parser anchors on the βœ– N problems summary line.
build build Exit 0 if production build succeeds. Output captured to reports/raw/production-build.txt.
quality:security semgrep Run Semgrep with the project ruleset. Write reports/semgrep-report.json (Semgrep's native JSON).
quality:deps circular-deps Run madge --circular --json and write to reports/madge-circular.json (the section reads from that path).
quality:duplicates duplication Run jscpd and write its report to reports/jscpd/jscpd-report.json.
quality:unused unused-code Run knip and emit findings on stdout (default text format).
test:coverage:report coverage Run vitest with coverage; coverage tooling must write coverage/coverage-summary.json (the section reads from there).

Direct (non-npm) invocations β€” no npm run indirection, but listed for completeness:

Command Section Notes
npm audit --json npm-audit Native npm audit JSON
npm outdated --json deps-freshness Native npm outdated JSON
npx eslint --rule ... --format json complexity ESLint JSON: parsed into per-function {file, line, code: "CCN-N"/"COG-N", severity, message}. Cyclomatic findings also appended to reports/complexity-full.csv in lizard-compatible columns for the git-hotspots section. On a node-dominant polyglot repo, lizard additionally measures the non-JS slice (Python/C#/Go/…) and the two are merged into one record + CSV, partitioned by extension (#68).
scc … --no-cocomo codebase-stats Default text output, parsed by the section
shellcheck -f json -x … shellcheck Native JSON
yamllint -f parsable … yamllint path:line:col: [level] message (rule) format
hadolint --no-color … hadolint Text output with Dockerfile:line code: message lines
gitleaks dir --report-format=json gitleaks Native JSON
npx stryker run mutation Stryker writes its own HTML/JSON reports under reports/mutation/

If a forker doesn't have one of these wired up, the section either skips (tool on PATH not present) or writes fail with a parse-error reason β€” never breaks the whole orchestrator.


Overrides (.checkup.yml)

checkup auto-detects your stack (detection.json) and picks the default commands above. A repo-local .checkup.yml at the scan target overrides those deliberately, with no install step β€” the agent-tailoring seam. It is consulted first; an absent file changes nothing (broad, unconfigured runs are the norm for an audit of a repo you don't own).

Copy .checkup.yml.example and prune. Keys (all optional):

stack:
  force: dotnet          # treat this as the primary stack (overrides detection)
  suppress: [node]       # treat a detected stack as absent (e.g. a tooling-only package.json)
checks:
  disable: [mutation]    # skip a project-built check (reports as skip: "disabled in .checkup.yml")
  enable:  [mutation]    # opt in to an off-by-default check
commands:
  test: "dotnet test"    # override a check's command ("" disables it); or set CHECKUP_CMD_TEST
thresholds:              # tune warn/fail banding (status only, never the score)
  complexity_ccn_warn: 10  # report functions at/above this CCN (ESLint + lizard engines)
  complexity_ccn_fail: 30  # any function at/above this β†’ the complexity record fails
  duplication_warn_pct: 3  # duplication % at/above this β†’ warn; …_fail_pct β†’ fail

It's a small YAML subset (inline lists [a, b], # comments); yq is used if present but is not required. Unknown keys or malformed input are warned about and ignored β€” never fatal, never a false pass.


The contract

Every check in checkup.sh MUST satisfy four properties. These are the invariants that make the script modular, reference-quality, LLM-consumable, and graceful-degrading. They are checked by code review, not by tooling.

1. Registry-style plugin

Adding a check is one section in checkup.sh. No edits to the markdown writer β€” it discovers new parsed JSONs automatically. The skeleton:

# N. <Human Label> β€” <ticket>
# section:    <slug>
# purpose:    <one sentence β€” what is this measuring?>
# pass_means: <what good looks like, and WHY this threshold>
# fail_means: <what bad looks like, and the suggested response>
print_section "<Human Label>"
echo "Command: <whatever the user would run by hand>"
echo ""

<SLUG>_INTENT=$(jq -n '{
    purpose:    "...",
    pass_means: "...",
    fail_means: "..."
}')

MAX_SCORE=$((MAX_SCORE + <points>))   # keep the 168-pt trend score
run_tool "<Human Label>" <tool> [args…]

if [ "$LAST_EXIT" = "127" ]; then
    write_skipped "<slug>" "<tool> not installed (install: …)"
elif [ ! -s "$LAST_RAW" ]; then
    write_skipped "<slug>" "<tool> produced no output"
else
    # …derive status, count, summary, top[] from $LAST_RAW…
    <SLUG>_TOP=$(jq -c '...' "$LAST_RAW")   # see severity vocabulary below

    if [ "$<count>" -eq 0 ]; then
        echo -e "${GREEN}βœ… Passed${NC}"
        HEALTH_SCORE=$((HEALTH_SCORE + <points>))
        STATUS="pass"; SUMMARY="..."
    elif # …warn case…
        STATUS="warn"; …
    else
        STATUS="fail"; …
    fi
    write_parsed "<slug>" "$STATUS" "$COUNT" "$SUMMARY" "$<SLUG>_TOP" "$<SLUG>_INTENT"
fi
echo ""

In practice, sections are inline (not wrapped in check_<slug>() functions) β€” extracting to functions is a worthwhile future cleanup but not required by the contract. The comment block at the top, the intent heredoc, the run_tool call, and the write_parsed/write_skipped are the load-bearing parts.

2. Documented intent

Every check declares intent: {purpose, pass_means, fail_means} so that anyone (human or LLM) reading the parsed JSON can understand what the check is for without reading the source.

The comment block at the top of the check function is the source of truth. The JSON intent field is a copy for downstream consumers. The markdown report renders the intent under each check.

3. Standardised parsed JSON

Each check writes reports/parsed/<slug>.json:

{
  "slug": "complexity",
  "status": "warn", // pass | warn | fail | skip
  "count": 77,
  "summary": "77 hotspots over CCN 10 / cognitive 15 (top 20 reported)",
  "top": [
    // optional β€” most checks populate
    {
      "file": "src/services/example-service.ts",
      "line": 347,
      "code": "CCN-37", // short tag for grouping/dedup
      "severity": "warning", // see vocabulary below
      "message": "handleRequest β€” CCN 37",
    },
  ],
  "intent": {
    "purpose": "Identify functions whose complexity makes them bug-incubators.",
    "pass_means": "No functions over CCN 10 or cognitive 15.",
    "fail_means": "CCN/cognitive > 30 should be refactored or covered with dedicated tests.",
  },
}

Status vocabulary

status meaning summary headline
pass check ran and the codebase met the threshold βœ…
warn check ran, threshold breached but non-blocking ⚠️
fail check ran, threshold breached and treated as serious ❌
skip check did not run (tool missing, prereq absent) ⏭️

Severity vocabulary (top[].severity)

Ordered by triage weight:

severity weight use for
critical 0 security CVEs, secrets, RLS gaps
error 0 hard failures (typecheck errors, test failures)
high 0 high-severity findings (semgrep ERROR, CVEs)
warning 1 non-blocking but actionable
medium 1 mid-severity findings
low 2 minor issues
style 2 formatting, style
info 3 informational

The "Top Problems" markdown aggregate uses the weight column to sort across tools.

4. Graceful degrade

Every check writes a parsed JSON, even when skipped. The report shows "what was supposed to run vs. what did". The convention:

if [ "$LAST_EXIT" = "127" ]; then
    write_skipped "$slug" "tool-name not installed (install: …)"
    return
fi

A missing tool is never a script failure. The orchestrator continues; the final report shows the gap explicitly.


Helpers (lib/run-tool.sh)

run_tool "<Label>" <cmd> [args…]

Runs <cmd>, captures stdout to $RAW_DIR/<slug>.txt, stderr to $RAW_DIR/<slug>.stderr.txt (deleted if empty). Sets globals:

global meaning
LAST_LABEL the human label passed in
LAST_SLUG computed slug β€” used for parsed filename
LAST_RAW absolute path to the stdout capture
LAST_STDERR absolute path to the stderr capture (or empty)
LAST_EXIT exit code of the tool β€” 127 if not on PATH

Always returns 0 so that set -e callers do not abort on expected non-zero exits (e.g. lint with warnings, typecheck with errors). The tool's real exit code is in $LAST_EXIT; 127 means the tool is not on $PATH.

write_parsed <slug> <status> <count> <summary> [top-json] [intent-json]

Emits the parsed JSON. top-json and intent-json are passed verbatim into jq --argjson so they must be valid JSON; defaults are [] and {}.

write_skipped <slug> <reason> [intent-json]

Sugar for write_parsed <slug> skip 0 "<reason>" [] <intent> β€” always-write discipline for the "deliberately not run" case (tool not installed, opt-in not set, prereq genuinely absent). Pass the check's intent heredoc so the report still explains why this check matters even when it didn't run; a reader can then decide whether to enable it.

write_failed <slug> <reason> [intent-json]

For the "tool ran but we can't interpret the output" case β€” missing/ malformed report file, non-zero exit without parseable diagnostic. Status is fail with empty top[]. Use this in preference to write_skipped when the tool actually executed; the distinction matters because skipped implies "deliberately bypassed" whereas this case is "we tried and got nothing usable, can't claim safety". The wrong choice silently downgrades a real failure.

is_valid_json <path>

Returns 0 if the file exists, is non-empty, and parses as JSON. Use as a guard before jq against tool output you didn't construct yourself.

slug "<Label>"

Lowercase kebab-case derivation. Stable identifier for filenames and the slug field. slug "Code Quality (Formatting + Linting)" β†’ code-quality-formatting-linting.


Output layout

reports/                        # gitignored (whole tree)
β”œβ”€β”€ raw/                        # one file per check β€” tool stdout/stderr
β”‚   β”œβ”€β”€ complexity-hotspots.txt # ESLint JSON output (one finding per line)
β”‚   β”œβ”€β”€ eslint.txt
β”‚   β”œβ”€β”€ eslint.stderr.txt       # only present when stderr was non-empty
β”‚   └── …
β”œβ”€β”€ parsed/                     # one file per check β€” normalised JSON
β”‚   β”œβ”€β”€ complexity.json
β”‚   β”œβ”€β”€ eslint.json
β”‚   └── …
β”œβ”€β”€ focus.json                  # derived "where to focus" ranking β€” files
β”‚                               #   ranked by how many health axes they land
β”‚                               #   on (hotspot / coupling / bug-fix density /
β”‚                               #   complexity), each with a per-axis "why".
β”œβ”€β”€ by-file.json                # derived cross-cut β€” files ranked by
β”‚                               #   severity-weighted finding count across
β”‚                               #   every check. Combined with the
β”‚                               #   `git-hotspots` check this completes
β”‚                               #   the Tornhill bug-hotspot triangle.
β”œβ”€β”€ checkup-report-<utc-ts>.md   # timestamped history (kept for trend)
β”œβ”€β”€ checkup-summary.json         # score + max for the trend headline
└── complexity-full.csv         # complexity findings in lizard-CSV format
                                # (consumed by git-hotspots; written by the
                                # complexity section)

docs/reports/                   # committed
└── checkup-report.md     # always = "latest", overwritten on each run

Cross-tool aggregates

The markdown writer computes three cross-tool views automatically β€” no check contributes to them directly; they emerge from the standardised parsed JSON.

Focus Areas (reports/focus.json)

The report's headline "where should this team focus first?" view. Fuses the four per-file health axes β€” git-hotspots (churn Γ— complexity), change-coupling, bug-fix-density, and complexity β€” by file, so a file landing on several axes (hot Γ— complex and coupled and bug-dense) rises to the top. Ranking is axis-count first (multi-signal concentration is the point), then a severity-weighted focus score; each row carries a one-line why (the strongest message per axis). Renderer-only, so it works on any stack whose run produced those checks, and is simply empty when none did. It is a focus signal, never a gate. Full ranking β†’ reports/focus.json; the report shows the top 10.

Top Problems

A single triage list across every check. Max 3 entries per tool (so a wide check like code-quality with 472 warnings can't drown everything else), total cap 30. Severity-weighted: critical/error/high β†’ 0, warning/medium β†’ 1, low/style β†’ 2, info β†’ 3. Sort ascending by weight.

Files with most findings (reports/by-file.json)

Joins every check's top[] by file. The ranking is severity-weighted (critical=4 … info=1) so a file with one critical finding outranks one with three info-only findings. A file appearing across multiple checks (e.g. complexity + lint + semgrep) is a likely bug hotspot β€” the spatial axis of the Tornhill triangle. The temporal axis lives in the dedicated git-hotspots check, which joins six-month commit churn against per-file max CCN. Together they cover both legs of the bug-prediction signal.

The full ranking goes to reports/by-file.json for LLM consumption; the markdown report shows the top 10.


Bring your own checks

checkup is built to be forked and adapted. The full fork-and-modify guide β€” what to keep verbatim, what to swap, a worked example, "how to tell you're done", and guidance for AI agents driving a port β€” lives in docs/build-your-own.md.


Design rationale

Why the dual stream, per-check JSON, status-vs-score, and in-band intent: see docs/architecture.md. The deeper decisions (pinning, layering, contribution model) are recorded as ADRs in docs/decisions/.

About

🩺 Application Checkup β€” a portable, tool-agnostic scanner for codebase health, quality, security & hygiene. Runs anywhere (incl. Docker), degrades gracefully, emits a normalised report.

Topics

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors