
feat(loop): gate merge safety on evidence depth#615

Merged
kunaldhongade merged 1 commit into main from feature/614-loop-evidence-gate
Jul 1, 2026

@kunaldhongade

Contributor

Closes #614

What changed

  • Replaced the loop's single merge-safe terminal state with qualified merge-safe-verified and merge-safe-shallow verdicts.
  • Added an evidence-aware safety gate that checks security score, unresolved high-severity findings, configured checks, Semgrep, coverage, and mutation evidence.
  • Surfaced Semgrep, coverage, and StrykerJS adapter evidence into loop snapshots and markdown/JSON verdict reporting.
  • Added a bounded indirect SQL construction matcher and benchmark positive/decoy coverage without replacing Semgrep as the deep security path.
  • Added opt-in loop convergence E2E coverage with a deterministic scripted agent that edits a weak test and proves re-verification converges.
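
The split between the two qualified verdicts can be pictured with a small sketch. Everything named here is hypothetical (`LoopEvidence`, `classifyVerdict`, the `not-merge-safe` fallback, and the score threshold are assumptions for illustration); the real types live in packages/harness/src/loop/types.ts and may look quite different.

```typescript
// Hypothetical evidence shape; field names are assumptions, not the PR's types.
type EvidenceStatus = "present" | "missing";

interface LoopEvidence {
  securityScore: number;          // 0-100, higher means more risk
  unresolvedHighSeverity: number; // open high-severity findings
  checksPassed: boolean;          // configured checks ran and passed
  semgrep: EvidenceStatus;
  coverage: EvidenceStatus;
  mutation: EvidenceStatus;
}

// "not-merge-safe" is a placeholder for whatever non-safe states the loop has.
type LoopVerdict = "merge-safe-verified" | "merge-safe-shallow" | "not-merge-safe";

// Assumed threshold, not taken from the PR.
const MAX_SECURITY_SCORE = 30;

function classifyVerdict(e: LoopEvidence): LoopVerdict {
  // Hard blockers: any of these means the change is not reported as safe.
  const blocked =
    e.securityScore > MAX_SECURITY_SCORE ||
    e.unresolvedHighSeverity > 0 ||
    !e.checksPassed;
  if (blocked) return "not-merge-safe";

  // Depth: only call the result "verified" when all deep evidence exists.
  const deep = [e.semgrep, e.coverage, e.mutation].every((s) => s === "present");
  return deep ? "merge-safe-verified" : "merge-safe-shallow";
}
```

Under these assumptions, a run with passing checks but no mutation evidence lands on merge-safe-shallow rather than an unqualified clean result.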

Why

codedecay loop could previously report a clean outcome from risk level, weak-test count, and passing checks alone. This makes the verdict honest about missing security, coverage, and mutation depth so users know what was actually verified.

Validation

  • pnpm lint
  • pnpm typecheck
  • pnpm test
  • pnpm build
  • pnpm eval:benchmark
  • node packages/cli/dist/index.js benchmark --format json
    • totalExpected: 22
    • totalMatched: 22
    • overallRecall: 1
    • falsePositiveRate: 0.0278
    • costUsd: 0
    • llmCalled: false
    • telemetrySent: false
  • node packages/cli/dist/index.js loop --help
  • CODEDECAY_LOOP_E2E=1 pnpm vitest run packages/cli/test/loop-e2e.test.ts
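
As a rough picture of what the opt-in convergence test exercises, here is a minimal sketch of a bounded verify/edit loop driven by a deterministic scripted agent. The names (`Snapshot`, `ScriptedAgent`, `runUntilConverged`) are hypothetical and the snapshot is reduced to a single weak-test counter; the actual loop in packages/harness/src/loop/controller.ts tracks far more state.

```typescript
// Hypothetical snapshot: only the weak-test count matters for this sketch.
interface Snapshot {
  weakTests: number;
}

// A scripted agent is a pure function, so every run behaves the same way.
type ScriptedAgent = (snapshot: Snapshot) => Snapshot;

interface LoopResult {
  converged: boolean;
  iterations: number;
  snapshot: Snapshot;
}

function runUntilConverged(
  initial: Snapshot,
  agent: ScriptedAgent,
  maxIterations: number,
): LoopResult {
  let snapshot = initial;
  for (let i = 0; i < maxIterations; i++) {
    // Re-verify first: stop as soon as no weak tests remain.
    if (snapshot.weakTests === 0) {
      return { converged: true, iterations: i, snapshot };
    }
    // Otherwise let the agent apply its scripted edit and loop again.
    snapshot = agent(snapshot);
  }
  return { converged: snapshot.weakTests === 0, iterations: maxIterations, snapshot };
}

// Scripted agent that strengthens one weak test per iteration.
const fixOneWeakTest: ScriptedAgent = (s) => ({ weakTests: Math.max(0, s.weakTests - 1) });

const result = runUntilConverged({ weakTests: 1 }, fixOneWeakTest, 3);
console.log(result.converged, result.iterations); // true 1
```

The iteration bound matters: an agent that never improves the snapshot ends the loop with `converged: false` instead of spinning forever.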

@github-actions github-actions Bot added the documentation, type: test, area: cli, area: core, area: docs, and area: harness labels Jun 30, 2026
@github-actions

github-actions Bot commented Jun 30, 2026

CodeDecay PR Check

Lead catch: Test appears to copy implementation logic — packages/cli/test/benchmark-corpus.test.ts:355

packages/cli/test/benchmark-corpus.test.ts includes logic copied from packages/cli/src/benchmark/corpus.ts; this can make tests pass without protecting real behavior.

Risk: High · Merge 100/100 · Decay 54/100 · Security 100/100

Full CodeDecay report

CodeDecay Report

Overall risk: High

| Score | Value |
| --- | --- |
| Merge risk | 100/100 |
| Decay risk | 54/100 |
| Security risk | 100/100 |

| Findings | Count |
| --- | --- |
| High | 14 |
| Medium | 14 |
| Low | 20 |

Changed Files

  • docs/loop.md modified (+13/-3)
  • packages/cli/src/benchmark/corpus.ts modified (+53/-0)
  • packages/cli/src/commands/loop.ts modified (+188/-3)
  • packages/cli/src/docs/command-docs/orchestration.ts modified (+3/-1)
  • packages/cli/src/parsers/loop.ts modified (+22/-1)
  • packages/cli/src/types/loop.ts modified (+1/-0)
  • packages/cli/test/benchmark-corpus.test.ts modified (+5/-3)
  • packages/cli/test/benchmark.test.ts modified (+14/-6)
  • packages/cli/test/loop-e2e.test.ts added (+154/-0)
  • packages/cli/test/loop.test.ts modified (+64/-3)
  • packages/core/src/types/security.ts modified (+1/-1)
  • packages/harness/src/index.ts modified (+7/-1)
  • packages/harness/src/loop/controller.ts modified (+152/-15)
  • packages/harness/src/loop/index.ts modified (+6/-2)
  • packages/harness/src/loop/render.ts modified (+29/-0)
  • packages/harness/src/loop/types.ts modified (+78/-1)
  • packages/harness/test/loop.test.ts modified (+122/-4)
  • packages/matchers/src/defaults.ts modified (+27/-9)
  • packages/matchers/src/utils.ts modified (+42/-0)
  • packages/matchers/test/matchers.test.ts modified (+46/-0)

Likely Impacted Areas

  • High API surface (api): packages/harness/src/loop/controller.ts
  • Low Documentation (docs): docs/loop.md, packages/cli/src/docs/command-docs/orchestration.ts
  • Low Source code (source): packages/cli/src/benchmark/corpus.ts, packages/cli/src/commands/loop.ts, packages/cli/src/parsers/loop.ts, packages/cli/src/types/loop.ts, packages/core/src/types/security.ts, packages/harness/src/index.ts, packages/harness/src/loop/index.ts, packages/harness/src/loop/render.ts, packages/harness/src/loop/types.ts, packages/matchers/src/defaults.ts, packages/matchers/src/utils.ts
  • Low Tests (test): packages/cli/test/benchmark-corpus.test.ts, packages/cli/test/benchmark.test.ts, packages/cli/test/loop-e2e.test.ts, packages/cli/test/loop.test.ts, packages/harness/test/loop.test.ts, packages/matchers/test/matchers.test.ts

Language And Parser Coverage

  • Source files classified: 19
  • Fully supported parser files: 19
  • Limited files: 0
  • Unsupported files: 0

Merge Risk Breakdown

  • Score: 100/100
  • Raw score before dampeners: 100/100
  • Adjusted score before severity cap: 100/100
  • Highest contributing severity: High

Top contributors:

  • +30 Api area changed (direct): packages/harness/src/loop/controller.ts touches an API area and should be reviewed for regression impact.
  • +30 Path traversal candidate (direct): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
  • +30 Path traversal candidate (direct): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
  • +30 Path traversal candidate (direct): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
  • +30 Project invariant may be impacted (direct): Memory invariant "No hidden cloud or model call" applies to this change. The OSS CLI must remain useful without telemetry, API keys, hosted services, required LLM calls, or CodeDecayCloud.

Decay Risk Breakdown

  • Score: 54/100
  • Raw score before dampeners: 100/100
  • Adjusted score before severity cap: 84/100
  • Highest contributing severity: High
  • Evidence mode: heuristic-only

Top contributors:

  • +18 High complexity in changed function (heuristic): parseLoopArgs has estimated cyclomatic complexity 23.
  • +18 High complexity in changed function (heuristic): createLoopVerdictEvidence has estimated cyclomatic complexity 30.
  • +18 High complexity in changed function (heuristic): renderLoopMarkdown has estimated cyclomatic complexity 24.
  • +10 Broad unrelated change set (heuristic): This PR changes 18 files across 1 top-level areas and 3 risk categories.
  • +10 Duplicated added logic (heuristic): A similar block of added logic appears 2 times across 2 file(s).

Dampeners:

  • -16 Heuristic-only dampener: Decay stays conservative until direct evidence exists.

Notes:

  • Heuristic-only decay is capped at 54/100 until direct evidence exists.
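
The figures in this breakdown are consistent with a dampener followed by a cap: 100 - 16 = 84, and the heuristic-only cap then brings 84 down to 54. A minimal sketch of that arithmetic, assuming this order of operations (inferred from the reported numbers, not from CodeDecay's source); `decayScore` and the constant names are hypothetical:

```typescript
type EvidenceMode = "heuristic-only" | "direct";

// Both constants are read off the report above.
const HEURISTIC_DAMPENER = 16; // "-16 Heuristic-only dampener"
const HEURISTIC_CAP = 54;      // "capped at 54/100 until direct evidence exists"

interface DecayBreakdown {
  raw: number;      // raw score before dampeners
  adjusted: number; // after dampeners, before the cap
  score: number;    // final reported score
}

function decayScore(raw: number, mode: EvidenceMode): DecayBreakdown {
  if (mode === "direct") {
    // Assumption: direct evidence skips both the dampener and the cap.
    return { raw, adjusted: raw, score: raw };
  }
  const adjusted = Math.max(0, raw - HEURISTIC_DAMPENER);
  const score = Math.min(adjusted, HEURISTIC_CAP);
  return { raw, adjusted, score };
}

console.log(decayScore(100, "heuristic-only")); // { raw: 100, adjusted: 84, score: 54 }
```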

Security Risk Breakdown

  • Score: 100/100
  • Raw score before dampeners: 100/100
  • Adjusted score before severity cap: 100/100
  • Highest contributing severity: High

Top contributors:

  • +30 Path traversal candidate (direct): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
  • +30 Path traversal candidate (direct): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
  • +30 Path traversal candidate (direct): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
  • +18 Change size (structural): Changed lines amplify review cost across 20 file(s).
  • +18 Missing auth entry-point candidate (heuristic): A public route or controller entry point changed without an obvious auth/session guard in the file. Evidence: Route/controller entry point has no obvious auth, session, token, or permission guard in the same file.

Security Matcher Coverage

  • Changed source files scanned: 13
  • Security candidates found: 9
  • Skipped files: 0

Security Candidates

  • High Missing auth entry-point candidate CWE-306 (entry-point) at packages/cli/src/benchmark/corpus.ts:1: Route/controller entry point has no obvious auth, session, token, or permission guard in the same file.
  • Medium SQL injection candidate CWE-89 (indirect) at packages/cli/src/benchmark/corpus.ts:180: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
  • Medium SQL injection candidate CWE-89 (indirect) at packages/cli/src/benchmark/corpus.ts:334: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
  • Medium SQL injection candidate CWE-89 (indirect) at packages/cli/src/benchmark/corpus.ts:354: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
  • Medium SQL injection candidate CWE-89 (indirect) at packages/cli/src/benchmark/corpus.ts:531: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
  • High Path traversal candidate CWE-22 (direct) at packages/cli/src/benchmark/corpus.ts:560: File access is built from request-controlled input.
  • High Path traversal candidate CWE-22 (direct) at packages/cli/src/benchmark/corpus.ts:576: File access is built from request-controlled input.
  • High Path traversal candidate CWE-22 (direct) at packages/cli/src/benchmark/corpus.ts:579: File access is built from request-controlled input.
  • Medium SQL injection candidate CWE-89 (indirect) at packages/matchers/src/defaults.ts:30: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
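
To make the "(indirect)" label concrete, here is a toy line-based matcher that flags SQL strings assembled from variables when no database sink is visible on the same line. It is an illustration only, not the matcher added in packages/matchers: the regular expressions, the `findIndirectSqlCandidates` name, and the line cap are all assumptions of this sketch, and a real check would still defer to Semgrep for deeper validation.

```typescript
interface Candidate {
  line: number;
  text: string;
}

// A quote followed by a recognizable SQL statement shape.
const SQL_SHAPE = /[`'"]\s*(SELECT\b.*\bFROM|INSERT\s+INTO|UPDATE\b.*\bSET|DELETE\s+FROM)\b/i;
// Template interpolation, or a string literal concatenated with an identifier.
const DYNAMIC_PART = /\$\{[^}]+\}|['"]\s*\+\s*[A-Za-z_$]/;
// A visible sink on the same line would make the candidate "direct", so skip it.
const DB_SINK = /\.(query|execute|raw)\s*\(/;
// Assumed bound on how much input the toy matcher scans.
const MAX_LINES = 2000;

function findIndirectSqlCandidates(source: string): Candidate[] {
  const found: Candidate[] = [];
  source.split("\n").slice(0, MAX_LINES).forEach((text, index) => {
    if (SQL_SHAPE.test(text) && DYNAMIC_PART.test(text) && !DB_SINK.test(text)) {
      found.push({ line: index + 1, text: text.trim() });
    }
  });
  return found;
}

const sample = [
  "const sql = `SELECT * FROM users WHERE id = ${userId}`;", // flagged
  "const label = `Select a user: ${name}`;",                 // decoy: no FROM
  "const all = 'SELECT * FROM users';",                      // static: nothing dynamic
  "db.query(`SELECT * FROM users WHERE id = ${id}`);",       // visible sink: not indirect
].join("\n");

console.log(findIndirectSqlCandidates(sample).map((c) => c.line)); // [ 1 ]
```

The decoy lines mirror the positive/decoy idea from the benchmark: a matcher like this is only useful if look-alike strings stay quiet.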

Test Evidence

  • Mode: heuristic-only
  • Sources: none
  • Changed source coverage:
    • packages/cli/src/benchmark/corpus.ts: not measured (no measurable changed lines)
    • packages/cli/src/commands/loop.ts: not measured (no measurable changed lines)
    • packages/cli/src/docs/command-docs/orchestration.ts: not measured (no measurable changed lines)
    • packages/cli/src/parsers/loop.ts: not measured (no measurable changed lines)
    • packages/cli/src/types/loop.ts: not measured (no measurable changed lines)
    • packages/core/src/types/security.ts: not measured (no measurable changed lines)
    • packages/harness/src/index.ts: not measured (no measurable changed lines)
    • packages/harness/src/loop/controller.ts: not measured (no measurable changed lines)
  • Notes:
    • No runtime coverage artifact was found. Test audit remains heuristic-only.

High Risk Findings

  • Test appears to copy implementation logic (packages/cli/test/benchmark-corpus.test.ts:355): packages/cli/test/benchmark-corpus.test.ts includes logic copied from packages/cli/src/benchmark/corpus.ts; this can make tests pass without protecting real behavior.
  • Test appears to copy implementation logic (packages/cli/test/loop-e2e.test.ts:127): packages/cli/test/loop-e2e.test.ts includes logic copied from packages/cli/src/benchmark/corpus.ts; this can make tests pass without protecting real behavior.
  • Test appears to copy implementation logic (packages/harness/test/loop.test.ts:233): packages/harness/test/loop.test.ts includes logic copied from packages/cli/src/commands/loop.ts; this can make tests pass without protecting real behavior.
  • High complexity in changed function (packages/cli/src/parsers/loop.ts:6): parseLoopArgs has estimated cyclomatic complexity 23.
  • High complexity in changed function (packages/harness/src/loop/controller.ts:211): createLoopVerdictEvidence has estimated cyclomatic complexity 30.
  • High complexity in changed function (packages/harness/src/loop/render.ts:11): renderLoopMarkdown has estimated cyclomatic complexity 24.
  • Project invariant may be impacted (packages/cli/src/benchmark/corpus.ts:77): Memory invariant "No hidden cloud or model call" applies to this change. The OSS CLI must remain useful without telemetry, API keys, hosted services, required LLM calls, or CodeDecayCloud.
  • Project invariant may be impacted (packages/cli/src/benchmark/corpus.ts:77): Memory invariant "Commands are explicit" applies to this change. CodeDecay must not run project commands unless they are configured and safety.allowCommands is true.
  • Project invariant may be impacted (docs/loop.md:33): Memory invariant "Tool evidence is separate from AI suggestions" applies to this change. Reports must not present agent/model suggestions as verified evidence unless backed by deterministic checks or command output.
  • Api area changed (packages/harness/src/loop/controller.ts:11): packages/harness/src/loop/controller.ts touches an API area and should be reviewed for regression impact.
  • Missing auth entry-point candidate (packages/cli/src/benchmark/corpus.ts:1): A public route or controller entry point changed without an obvious auth/session guard in the file. Evidence: Route/controller entry point has no obvious auth, session, token, or permission guard in the same file.
  • Path traversal candidate (packages/cli/src/benchmark/corpus.ts:560): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
  • Path traversal candidate (packages/cli/src/benchmark/corpus.ts:576): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
  • Path traversal candidate (packages/cli/src/benchmark/corpus.ts:579): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.

Medium Risk Findings

  • Broad unrelated change set: This PR changes 18 files across 1 top-level areas and 3 risk categories.
  • Duplicated added logic (packages/harness/src/index.ts:20): A similar block of added logic appears 2 times across 2 file(s).
  • Duplicated added logic (packages/harness/src/index.ts:23): A similar block of added logic appears 2 times across 2 file(s).
  • High complexity in changed function (packages/harness/src/loop/controller.ts:21): runCodeDecayLoop has estimated cyclomatic complexity 13.
  • Large changed function (packages/cli/src/parsers/loop.ts:6): parseLoopArgs spans 122 lines, which increases review and regression risk.
  • Large changed function (packages/harness/src/loop/controller.ts:21): runCodeDecayLoop spans 143 lines, which increases review and regression risk.
  • Project invariant may be impacted (docs/loop.md:33): Memory invariant "Output must be actionable" applies to this change. Redteam reports and agent bundles should say what behavior to verify, which test proof is weak or missing, and what task a coding agent should perform.
  • SQL injection candidate (packages/cli/src/benchmark/corpus.ts:180): A database query appears to include unsafe raw SQL or request-controlled input. Evidence: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
  • SQL injection candidate (packages/cli/src/benchmark/corpus.ts:334): A database query appears to include unsafe raw SQL or request-controlled input. Evidence: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
  • SQL injection candidate (packages/cli/src/benchmark/corpus.ts:354): A database query appears to include unsafe raw SQL or request-controlled input. Evidence: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
  • SQL injection candidate (packages/cli/src/benchmark/corpus.ts:531): A database query appears to include unsafe raw SQL or request-controlled input. Evidence: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
  • SQL injection candidate (packages/matchers/src/defaults.ts:30): A database query appears to include unsafe raw SQL or request-controlled input. Evidence: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
  • Large test change relative to source change (packages/cli/test/loop-e2e.test.ts:1): packages/cli/test/loop-e2e.test.ts adds 154 lines of tests for 609 source additions.
  • Large test change relative to source change (packages/harness/test/loop.test.ts:7): packages/harness/test/loop.test.ts adds 122 lines of tests for 609 source additions.

Low Risk Findings

  • Architecture note applies (packages/cli/src/benchmark/corpus.ts:77): CLI is the published surface: The public npm package is @submuxhq/codedecay and the binary is codedecay. Internal workspace packages are implementation details.
  • Docs area changed (docs/loop.md:33): docs/loop.md touches a docs area and should be reviewed for regression impact.
  • Docs area changed (packages/cli/src/docs/command-docs/orchestration.ts:130): packages/cli/src/docs/command-docs/orchestration.ts touches a docs area and should be reviewed for regression impact.
  • Source area changed (packages/cli/src/benchmark/corpus.ts:77): packages/cli/src/benchmark/corpus.ts touches a source area and should be reviewed for regression impact.
  • Source area changed (packages/cli/src/commands/loop.ts:5): packages/cli/src/commands/loop.ts touches a source area and should be reviewed for regression impact.
  • Source area changed (packages/cli/src/parsers/loop.ts:10): packages/cli/src/parsers/loop.ts touches a source area and should be reviewed for regression impact.
  • Source area changed (packages/cli/src/types/loop.ts:13): packages/cli/src/types/loop.ts touches a source area and should be reviewed for regression impact.
  • Source area changed (packages/core/src/types/security.ts:3): packages/core/src/types/security.ts touches a source area and should be reviewed for regression impact.
  • Source area changed (packages/harness/src/index.ts:7): packages/harness/src/index.ts touches a source area and should be reviewed for regression impact.
  • Source area changed (packages/harness/src/loop/index.ts:2): packages/harness/src/loop/index.ts touches a source area and should be reviewed for regression impact.
  • Source area changed (packages/harness/src/loop/render.ts:22): packages/harness/src/loop/render.ts touches a source area and should be reviewed for regression impact.
  • Source area changed (packages/harness/src/loop/types.ts:5): packages/harness/src/loop/types.ts touches a source area and should be reviewed for regression impact.
  • Source area changed (packages/matchers/src/defaults.ts:6): packages/matchers/src/defaults.ts touches a source area and should be reviewed for regression impact.
  • Source area changed (packages/matchers/src/utils.ts:145): packages/matchers/src/utils.ts touches a source area and should be reviewed for regression impact.
  • Test area changed (packages/cli/test/benchmark-corpus.test.ts:250): packages/cli/test/benchmark-corpus.test.ts touches a test area and should be reviewed for regression impact.
  • Test area changed (packages/cli/test/benchmark.test.ts:31): packages/cli/test/benchmark.test.ts touches a test area and should be reviewed for regression impact.
  • Test area changed (packages/cli/test/loop-e2e.test.ts:1): packages/cli/test/loop-e2e.test.ts touches a test area and should be reviewed for regression impact.
  • Test area changed (packages/cli/test/loop.test.ts:15): packages/cli/test/loop.test.ts touches a test area and should be reviewed for regression impact.
  • Test area changed (packages/harness/test/loop.test.ts:7): packages/harness/test/loop.test.ts touches a test area and should be reviewed for regression impact.
  • Test area changed (packages/matchers/test/matchers.test.ts:190): packages/matchers/test/matchers.test.ts touches a test area and should be reviewed for regression impact.

Recommended Checks

  • Add or run tests covering packages/cli/src/docs/command-docs/orchestration.ts
  • Add or run tests covering packages/harness/src/index.ts
  • Add or run tests covering packages/harness/src/loop/controller.ts
  • Add or run tests covering packages/harness/src/loop/index.ts
  • Add or run tests covering packages/matchers/src/utils.ts
  • Exercise packages/cli/src/benchmark/corpus.ts through its public API instead of copying its logic
  • Exercise packages/cli/src/commands/loop.ts through its public API instead of copying its logic
  • Flow check (CLI release smoke): Run built CLI smoke tests
  • Flow check (CLI release smoke): Run package dry-run
  • Flow check (CLI release smoke): Run published-package or tarball demo before release
  • Flow check (Pull request redteam review): Check weak or missing test proof
  • Flow check (Pull request redteam review): Keep deterministic evidence separate from AI suggestions
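
The "exercise through its public API instead of copying its logic" recommendations are easier to see side by side. The sketch below uses a made-up `slugify` function rather than anything from this repository; it only illustrates why a test that recomputes its expected value with the implementation's own steps offers weak protection.

```typescript
// Hypothetical function under test (not from the CodeDecay codebase).
function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "");
}

// Weak: the expected value repeats the implementation's steps, so a bug
// shared by both sides (for example a wrong regex) still passes.
function weakTest(): boolean {
  const input = "  Hello, World!  ";
  const expected = input
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "");
  return slugify(input) === expected;
}

// Stronger: call the public function and compare with a hand-written literal.
function strongTest(): boolean {
  return slugify("  Hello, World!  ") === "hello-world";
}

console.log(weakTest(), strongTest()); // true true
```

Both checks pass today, but only the second one would fail if the slug rules regressed, which is the behavior the report's finding is pointing at.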

Notes

CodeDecay is deterministic and local-first. This report was generated without telemetry, API keys, LLMs, or model calls.


Found by CodeDecay - deterministic, local-first, no telemetry.

@kunaldhongade force-pushed the feature/614-loop-evidence-gate branch from 6869a69 to 47467f4 on July 1, 2026 08:55
@kunaldhongade merged commit bd82ee7 into main on Jul 1, 2026
7 checks passed
@kunaldhongade deleted the feature/614-loop-evidence-gate branch on July 1, 2026 08:59

Labels

  • area: cli (CLI package or command behavior)
  • area: core (Core types, scoring, or rule runner)
  • area: docs (README, community files, or documentation)
  • area: harness (Agent and tool harness interfaces)
  • documentation (Improvements or additions to documentation)
  • type: test (Test coverage, fixtures, or verification improvements)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(loop): make merge-safety verdict evidence-aware

1 participant