feat(loop): gate merge safety on evidence depth by kunaldhongade · Pull Request #615 · SubmuxHQ/CodeDecay

kunaldhongade · 2026-06-30T19:14:37Z

Closes #614

What changed

Replaced the loop's single merge-safe terminal state with qualified merge-safe-verified and merge-safe-shallow verdicts.
Added an evidence-aware safety gate that checks security score, unresolved high-severity findings, configured checks, Semgrep, coverage, and mutation evidence.
Surfaced Semgrep, coverage, and StrykerJS adapter evidence into loop snapshots and markdown/JSON verdict reporting.
Added a bounded indirect SQL construction matcher, with positive and decoy cases in the benchmark corpus, without replacing Semgrep as the deep security path.
Added opt-in loop convergence E2E coverage with a deterministic scripted agent that edits a weak test and proves re-verification converges.
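
The convergence check in the last item can be pictured as a bounded verify/fix cycle. The sketch below is only an illustration of that idea, not the code in packages/cli/test/loop-e2e.test.ts: the `Snapshot`, `Verify`, and `Agent` shapes, the `runUntilConverged` helper, and the synchronous style are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the real loop snapshot carries more fields.
interface Snapshot {
  weakTests: string[];
}
type Verify = () => Snapshot;
type Agent = (snapshot: Snapshot) => void;

interface LoopOutcome {
  converged: boolean;
  iterations: number;
}

// Verify, let the agent act on what is still weak, then verify again,
// stopping after maxIterations so a non-converging agent cannot spin forever.
function runUntilConverged(verify: Verify, agent: Agent, maxIterations: number): LoopOutcome {
  for (let i = 1; i <= maxIterations; i++) {
    const snapshot = verify();
    if (snapshot.weakTests.length === 0) {
      return { converged: true, iterations: i };
    }
    agent(snapshot);
  }
  return { converged: false, iterations: maxIterations };
}

// A deterministic scripted agent: each call "strengthens" the first weak test.
const weak = new Set<string>(["loop.test.ts"]);
const outcome = runUntilConverged(
  () => ({ weakTests: Array.from(weak) }),
  (snapshot) => {
    const first = snapshot.weakTests[0];
    if (first !== undefined) weak.delete(first);
  },
  3,
);
console.log(outcome); // { converged: true, iterations: 2 }
```

Because the agent is scripted rather than model-driven, the same input always produces the same number of iterations, which is what lets a test assert on convergence.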

Why

codedecay loop could previously report a clean outcome from risk level, weak-test count, and passing checks alone. This change makes the verdict honest about missing security, coverage, and mutation depth, so users know what was actually verified.
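
As a rough illustration of the gate, the sketch below separates "baseline clean" from "deep evidence present". It is a minimal sketch under stated assumptions, not the logic in packages/harness/src/loop/controller.ts: the `LoopEvidence` shape, the `needs-work` fallback verdict, and the `MAX_SECURITY_RISK` threshold are hypothetical. Only the verdict names merge-safe-verified and merge-safe-shallow come from this PR.

```typescript
// Hypothetical evidence shape; field names are illustrative, not the PR's real types.
type DepthEvidence = "present" | "missing";

interface LoopEvidence {
  securityRiskScore: number; // 0-100, higher means riskier (assumed scale)
  unresolvedHighFindings: number; // open high-severity findings
  configuredChecksPassed: boolean; // user-configured commands all passed
  semgrep: DepthEvidence;
  coverage: DepthEvidence;
  mutation: DepthEvidence;
}

// "needs-work" is a placeholder for any non-safe outcome (assumption).
type LoopVerdict = "merge-safe-verified" | "merge-safe-shallow" | "needs-work";

interface GateResult {
  verdict: LoopVerdict;
  missingDepth: string[]; // deep-evidence sources that were absent
}

// Assumed threshold for the sketch only.
const MAX_SECURITY_RISK = 0;

function evaluateGate(e: LoopEvidence): GateResult {
  const missingDepth = (["semgrep", "coverage", "mutation"] as const).filter(
    (source) => e[source] === "missing",
  );

  const baselineClean =
    e.securityRiskScore <= MAX_SECURITY_RISK &&
    e.unresolvedHighFindings === 0 &&
    e.configuredChecksPassed;

  if (!baselineClean) {
    return { verdict: "needs-work", missingDepth };
  }
  // Clean baseline: the verdict now depends on how deep the evidence goes.
  return {
    verdict: missingDepth.length === 0 ? "merge-safe-verified" : "merge-safe-shallow",
    missingDepth,
  };
}
```

Returning `missingDepth` alongside the verdict lets a report say which evidence was absent instead of only downgrading the label.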

Validation

pnpm lint
pnpm typecheck
pnpm test
pnpm build
pnpm eval:benchmark
node packages/cli/dist/index.js benchmark --format json
- totalExpected: 22
- totalMatched: 22
- overallRecall: 1
- falsePositiveRate: 0.0278
- costUsd: 0
- llmCalled: false
- telemetrySent: false
node packages/cli/dist/index.js loop --help
CODEDECAY_LOOP_E2E=1 pnpm vitest run packages/cli/test/loop-e2e.test.ts
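
For readers checking the benchmark figures above, the two ratios can be reproduced with a few lines. This is a sketch under assumptions: recall is taken as totalMatched / totalExpected, and the false-positive rate is assumed to be flagged decoys divided by all decoys. The PR does not show the decoy counts, so `falsePositiveRate` is shown without real inputs, and the helper names are hypothetical.

```typescript
// Recall: share of expected findings that the matchers actually reported.
// An empty expectation set is treated as perfect recall (a choice made for this sketch).
function recall(totalMatched: number, totalExpected: number): number {
  return totalExpected === 0 ? 1 : totalMatched / totalExpected;
}

// False-positive rate under the assumed definition: flagged decoys / all decoys.
function falsePositiveRate(flaggedDecoys: number, totalDecoys: number): number {
  return totalDecoys === 0 ? 0 : flaggedDecoys / totalDecoys;
}

// Round to four decimal places, matching the precision of the reported rate.
function round4(value: number): number {
  return Math.round(value * 10000) / 10000;
}

const overallRecall = recall(22, 22); // totalMatched and totalExpected from this PR
console.log(overallRecall); // 1
```

Any ratio that rounds to 0.0278 (1 in 36, for instance) would be consistent with the reported falsePositiveRate; the underlying counts are not stated in this PR.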

github-actions · 2026-06-30T19:15:00Z

CodeDecay PR Check

Lead catch: Test appears to copy implementation logic — packages/cli/test/benchmark-corpus.test.ts:355

packages/cli/test/benchmark-corpus.test.ts includes logic copied from packages/cli/src/benchmark/corpus.ts; this can make tests pass without protecting real behavior.

Risk: High · Merge 100/100 · Decay 54/100 · Security 100/100

Full CodeDecay report

CodeDecay Report

Overall risk: High

Score	Value
Merge risk	100/100
Decay risk	54/100
Security risk	100/100

Findings	Count
High	14
Medium	14
Low	20

Changed Files

docs/loop.md modified (+13/-3)
packages/cli/src/benchmark/corpus.ts modified (+53/-0)
packages/cli/src/commands/loop.ts modified (+188/-3)
packages/cli/src/docs/command-docs/orchestration.ts modified (+3/-1)
packages/cli/src/parsers/loop.ts modified (+22/-1)
packages/cli/src/types/loop.ts modified (+1/-0)
packages/cli/test/benchmark-corpus.test.ts modified (+5/-3)
packages/cli/test/benchmark.test.ts modified (+14/-6)
packages/cli/test/loop-e2e.test.ts added (+154/-0)
packages/cli/test/loop.test.ts modified (+64/-3)
packages/core/src/types/security.ts modified (+1/-1)
packages/harness/src/index.ts modified (+7/-1)
packages/harness/src/loop/controller.ts modified (+152/-15)
packages/harness/src/loop/index.ts modified (+6/-2)
packages/harness/src/loop/render.ts modified (+29/-0)
packages/harness/src/loop/types.ts modified (+78/-1)
packages/harness/test/loop.test.ts modified (+122/-4)
packages/matchers/src/defaults.ts modified (+27/-9)
packages/matchers/src/utils.ts modified (+42/-0)
packages/matchers/test/matchers.test.ts modified (+46/-0)

Likely Impacted Areas

High API surface (api): packages/harness/src/loop/controller.ts
Low Documentation (docs): docs/loop.md, packages/cli/src/docs/command-docs/orchestration.ts
Low Source code (source): packages/cli/src/benchmark/corpus.ts, packages/cli/src/commands/loop.ts, packages/cli/src/parsers/loop.ts, packages/cli/src/types/loop.ts, packages/core/src/types/security.ts, packages/harness/src/index.ts, packages/harness/src/loop/index.ts, packages/harness/src/loop/render.ts, packages/harness/src/loop/types.ts, packages/matchers/src/defaults.ts, packages/matchers/src/utils.ts
Low Tests (test): packages/cli/test/benchmark-corpus.test.ts, packages/cli/test/benchmark.test.ts, packages/cli/test/loop-e2e.test.ts, packages/cli/test/loop.test.ts, packages/harness/test/loop.test.ts, packages/matchers/test/matchers.test.ts

Language And Parser Coverage

Source files classified: 19
Fully supported parser files: 19
Limited files: 0
Unsupported files: 0

Merge Risk Breakdown

Score: 100/100
Raw score before dampeners: 100/100
Adjusted score before severity cap: 100/100
Highest contributing severity: High

Top contributors:

+30 Api area changed (direct): packages/harness/src/loop/controller.ts touches a api area and should be reviewed for regression impact.
+30 Path traversal candidate (direct): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
+30 Path traversal candidate (direct): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
+30 Path traversal candidate (direct): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
+30 Project invariant may be impacted (direct): Memory invariant "No hidden cloud or model call" applies to this change. The OSS CLI must remain useful without telemetry, API keys, hosted services, required LLM calls, or CodeDecayCloud.

Decay Risk Breakdown

Score: 54/100
Raw score before dampeners: 100/100
Adjusted score before severity cap: 84/100
Highest contributing severity: High
Evidence mode: heuristic-only

Top contributors:

+18 High complexity in changed function (heuristic): parseLoopArgs has estimated cyclomatic complexity 23.
+18 High complexity in changed function (heuristic): createLoopVerdictEvidence has estimated cyclomatic complexity 30.
+18 High complexity in changed function (heuristic): renderLoopMarkdown has estimated cyclomatic complexity 24.
+10 Broad unrelated change set (heuristic): This PR changes 18 files across 1 top-level areas and 3 risk categories.
+10 Duplicated added logic (heuristic): A similar block of added logic appears 2 times across 2 file(s).

Dampeners:

-16 Heuristic-only dampener: Decay stays conservative until direct evidence exists.

Notes:

Heuristic-only decay is capped at 54/100 until direct evidence exists.
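
The arithmetic behind this breakdown (raw 100, minus a 16-point dampener gives 84, then a cap of 54) can be sketched as below. The `scoreRisk` function and its input shape are hypothetical and only mirror the numbers printed in this report; the actual scoring code is not part of this page.

```typescript
interface ScoreInput {
  contributions: number[]; // positive points from each finding
  dampeners: number[]; // negative adjustments, e.g. -16
  cap?: number; // optional ceiling, e.g. 54 for heuristic-only evidence
}

interface ScoreBreakdown {
  raw: number;
  adjusted: number;
  finalScore: number;
}

// Keep every intermediate score on the 0-100 scale used by the report.
const clamp = (value: number): number => Math.max(0, Math.min(100, value));

function scoreRisk({ contributions, dampeners, cap }: ScoreInput): ScoreBreakdown {
  const raw = clamp(contributions.reduce((sum, points) => sum + points, 0));
  const adjusted = clamp(raw + dampeners.reduce((sum, points) => sum + points, 0));
  const finalScore = cap === undefined ? adjusted : Math.min(adjusted, cap);
  return { raw, adjusted, finalScore };
}

// The report lists only the top five contributors (74 points in total), so a
// single 100 stands in here for the full, saturated contributor list.
const decay = scoreRisk({ contributions: [100], dampeners: [-16], cap: 54 });
console.log(decay); // { raw: 100, adjusted: 84, finalScore: 54 }
```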

Security Risk Breakdown

Score: 100/100
Raw score before dampeners: 100/100
Adjusted score before severity cap: 100/100
Highest contributing severity: High

Top contributors:

+30 Path traversal candidate (direct): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
+30 Path traversal candidate (direct): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
+30 Path traversal candidate (direct): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
+18 Change size (structural): Changed lines amplify review cost across 20 file(s).
+18 Missing auth entry-point candidate (heuristic): A public route or controller entry point changed without an obvious auth/session guard in the file. Evidence: Route/controller entry point has no obvious auth, session, token, or permission guard in the same file.

Security Matcher Coverage

Changed source files scanned: 13
Security candidates found: 9
Skipped files: 0

Security Candidates

High Missing auth entry-point candidate CWE-306 (entry-point) at packages/cli/src/benchmark/corpus.ts:1: Route/controller entry point has no obvious auth, session, token, or permission guard in the same file.
Medium SQL injection candidate CWE-89 (indirect) at packages/cli/src/benchmark/corpus.ts:180: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
Medium SQL injection candidate CWE-89 (indirect) at packages/cli/src/benchmark/corpus.ts:334: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
Medium SQL injection candidate CWE-89 (indirect) at packages/cli/src/benchmark/corpus.ts:354: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
Medium SQL injection candidate CWE-89 (indirect) at packages/cli/src/benchmark/corpus.ts:531: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
High Path traversal candidate CWE-22 (direct) at packages/cli/src/benchmark/corpus.ts:560: File access is built from request-controlled input.
High Path traversal candidate CWE-22 (direct) at packages/cli/src/benchmark/corpus.ts:576: File access is built from request-controlled input.
High Path traversal candidate CWE-22 (direct) at packages/cli/src/benchmark/corpus.ts:579: File access is built from request-controlled input.
Medium SQL injection candidate CWE-89 (indirect) at packages/matchers/src/defaults.ts:30: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
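
To make the "indirect" SQL candidates above concrete, here is a deliberately simplified matcher in the same spirit. It is not the implementation in packages/matchers: the regular expressions, the `MAX_LINES` bound, and the `findIndirectSqlCandidates` name are assumptions. Unlike the real matcher's description, it does not check whether the interpolated value comes from a request or function parameter.

```typescript
// SQL-looking template literal that interpolates a value, e.g. `SELECT ... ${id}`.
const TEMPLATE_INTERPOLATION =
  /`[^`]*\b(select|insert|update|delete)\b[^`]*\$\{[^}]+\}[^`]*`/i;
// SQL-looking quoted string concatenated with an identifier, e.g. "SELECT ... " + id.
const STRING_CONCAT =
  /["'][^"']*\b(select|insert|update|delete)\b[^"']*["']\s*\+\s*[A-Za-z_$][\w$.]*/i;
// A visible database call; when present, the case is left to a direct matcher (assumption).
const DB_SINK = /\.(query|execute|raw)\s*\(/;
// Bound the scan so one very large file cannot dominate runtime.
const MAX_LINES = 200;

interface SqlCandidate {
  line: number; // 1-based line number within the scanned source
  kind: "indirect";
}

function findIndirectSqlCandidates(source: string): SqlCandidate[] {
  const lines = source.split("\n").slice(0, MAX_LINES);
  if (lines.some((text) => DB_SINK.test(text))) {
    return [];
  }
  const candidates: SqlCandidate[] = [];
  lines.forEach((text, index) => {
    if (TEMPLATE_INTERPOLATION.test(text) || STRING_CONCAT.test(text)) {
      candidates.push({ line: index + 1, kind: "indirect" });
    }
  });
  return candidates;
}
```

A line-based heuristic like this is cheap but shallow, which fits the report's advice to use the Semgrep adapter for deeper validation. It would also explain why fixture strings in a benchmark corpus file can be flagged as candidates.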

Test Evidence

Mode: heuristic-only
Sources: none
Changed source coverage:
packages/cli/src/benchmark/corpus.ts: not measured (no measurable changed lines)
packages/cli/src/commands/loop.ts: not measured (no measurable changed lines)
packages/cli/src/docs/command-docs/orchestration.ts: not measured (no measurable changed lines)
packages/cli/src/parsers/loop.ts: not measured (no measurable changed lines)
packages/cli/src/types/loop.ts: not measured (no measurable changed lines)
packages/core/src/types/security.ts: not measured (no measurable changed lines)
packages/harness/src/index.ts: not measured (no measurable changed lines)
packages/harness/src/loop/controller.ts: not measured (no measurable changed lines)
Notes:
No runtime coverage artifact was found. Test audit remains heuristic-only.

High Risk Findings

Test appears to copy implementation logic (packages/cli/test/benchmark-corpus.test.ts:355): packages/cli/test/benchmark-corpus.test.ts includes logic copied from packages/cli/src/benchmark/corpus.ts; this can make tests pass without protecting real behavior.
Test appears to copy implementation logic (packages/cli/test/loop-e2e.test.ts:127): packages/cli/test/loop-e2e.test.ts includes logic copied from packages/cli/src/benchmark/corpus.ts; this can make tests pass without protecting real behavior.
Test appears to copy implementation logic (packages/harness/test/loop.test.ts:233): packages/harness/test/loop.test.ts includes logic copied from packages/cli/src/commands/loop.ts; this can make tests pass without protecting real behavior.
High complexity in changed function (packages/cli/src/parsers/loop.ts:6): parseLoopArgs has estimated cyclomatic complexity 23.
High complexity in changed function (packages/harness/src/loop/controller.ts:211): createLoopVerdictEvidence has estimated cyclomatic complexity 30.
High complexity in changed function (packages/harness/src/loop/render.ts:11): renderLoopMarkdown has estimated cyclomatic complexity 24.
Project invariant may be impacted (packages/cli/src/benchmark/corpus.ts:77): Memory invariant "No hidden cloud or model call" applies to this change. The OSS CLI must remain useful without telemetry, API keys, hosted services, required LLM calls, or CodeDecayCloud.
Project invariant may be impacted (packages/cli/src/benchmark/corpus.ts:77): Memory invariant "Commands are explicit" applies to this change. CodeDecay must not run project commands unless they are configured and safety.allowCommands is true.
Project invariant may be impacted (docs/loop.md:33): Memory invariant "Tool evidence is separate from AI suggestions" applies to this change. Reports must not present agent/model suggestions as verified evidence unless backed by deterministic checks or command output.
Api area changed (packages/harness/src/loop/controller.ts:11): packages/harness/src/loop/controller.ts touches a api area and should be reviewed for regression impact.
Missing auth entry-point candidate (packages/cli/src/benchmark/corpus.ts:1): A public route or controller entry point changed without an obvious auth/session guard in the file. Evidence: Route/controller entry point has no obvious auth, session, token, or permission guard in the same file.
Path traversal candidate (packages/cli/src/benchmark/corpus.ts:560): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
Path traversal candidate (packages/cli/src/benchmark/corpus.ts:576): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.
Path traversal candidate (packages/cli/src/benchmark/corpus.ts:579): File-system access appears to use request-controlled path input. Evidence: File access is built from request-controlled input.

Medium Risk Findings

Broad unrelated change set: This PR changes 18 files across 1 top-level areas and 3 risk categories.
Duplicated added logic (packages/harness/src/index.ts:20): A similar block of added logic appears 2 times across 2 file(s).
Duplicated added logic (packages/harness/src/index.ts:23): A similar block of added logic appears 2 times across 2 file(s).
High complexity in changed function (packages/harness/src/loop/controller.ts:21): runCodeDecayLoop has estimated cyclomatic complexity 13.
Large changed function (packages/cli/src/parsers/loop.ts:6): parseLoopArgs spans 122 lines, which increases review and regression risk.
Large changed function (packages/harness/src/loop/controller.ts:21): runCodeDecayLoop spans 143 lines, which increases review and regression risk.
Project invariant may be impacted (docs/loop.md:33): Memory invariant "Output must be actionable" applies to this change. Redteam reports and agent bundles should say what behavior to verify, which test proof is weak or missing, and what task a coding agent should perform.
SQL injection candidate (packages/cli/src/benchmark/corpus.ts:180): A database query appears to include unsafe raw SQL or request-controlled input. Evidence: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
SQL injection candidate (packages/cli/src/benchmark/corpus.ts:334): A database query appears to include unsafe raw SQL or request-controlled input. Evidence: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
SQL injection candidate (packages/cli/src/benchmark/corpus.ts:354): A database query appears to include unsafe raw SQL or request-controlled input. Evidence: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
SQL injection candidate (packages/cli/src/benchmark/corpus.ts:531): A database query appears to include unsafe raw SQL or request-controlled input. Evidence: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
SQL injection candidate (packages/matchers/src/defaults.ts:30): A database query appears to include unsafe raw SQL or request-controlled input. Evidence: Dynamic SQL is built from request or function-parameter input without a visible database sink. Use the Semgrep adapter for deeper validation.
Large test change relative to source change (packages/cli/test/loop-e2e.test.ts:1): packages/cli/test/loop-e2e.test.ts adds 154 lines of tests for 609 source additions.
Large test change relative to source change (packages/harness/test/loop.test.ts:7): packages/harness/test/loop.test.ts adds 122 lines of tests for 609 source additions.

Low Risk Findings

Architecture note applies (packages/cli/src/benchmark/corpus.ts:77): CLI is the published surface: The public npm package is @submuxhq/codedecay and the binary is codedecay. Internal workspace packages are implementation details.
Docs area changed (docs/loop.md:33): docs/loop.md touches a docs area and should be reviewed for regression impact.
Docs area changed (packages/cli/src/docs/command-docs/orchestration.ts:130): packages/cli/src/docs/command-docs/orchestration.ts touches a docs area and should be reviewed for regression impact.
Source area changed (packages/cli/src/benchmark/corpus.ts:77): packages/cli/src/benchmark/corpus.ts touches a source area and should be reviewed for regression impact.
Source area changed (packages/cli/src/commands/loop.ts:5): packages/cli/src/commands/loop.ts touches a source area and should be reviewed for regression impact.
Source area changed (packages/cli/src/parsers/loop.ts:10): packages/cli/src/parsers/loop.ts touches a source area and should be reviewed for regression impact.
Source area changed (packages/cli/src/types/loop.ts:13): packages/cli/src/types/loop.ts touches a source area and should be reviewed for regression impact.
Source area changed (packages/core/src/types/security.ts:3): packages/core/src/types/security.ts touches a source area and should be reviewed for regression impact.
Source area changed (packages/harness/src/index.ts:7): packages/harness/src/index.ts touches a source area and should be reviewed for regression impact.
Source area changed (packages/harness/src/loop/index.ts:2): packages/harness/src/loop/index.ts touches a source area and should be reviewed for regression impact.
Source area changed (packages/harness/src/loop/render.ts:22): packages/harness/src/loop/render.ts touches a source area and should be reviewed for regression impact.
Source area changed (packages/harness/src/loop/types.ts:5): packages/harness/src/loop/types.ts touches a source area and should be reviewed for regression impact.
Source area changed (packages/matchers/src/defaults.ts:6): packages/matchers/src/defaults.ts touches a source area and should be reviewed for regression impact.
Source area changed (packages/matchers/src/utils.ts:145): packages/matchers/src/utils.ts touches a source area and should be reviewed for regression impact.
Test area changed (packages/cli/test/benchmark-corpus.test.ts:250): packages/cli/test/benchmark-corpus.test.ts touches a test area and should be reviewed for regression impact.
Test area changed (packages/cli/test/benchmark.test.ts:31): packages/cli/test/benchmark.test.ts touches a test area and should be reviewed for regression impact.
Test area changed (packages/cli/test/loop-e2e.test.ts:1): packages/cli/test/loop-e2e.test.ts touches a test area and should be reviewed for regression impact.
Test area changed (packages/cli/test/loop.test.ts:15): packages/cli/test/loop.test.ts touches a test area and should be reviewed for regression impact.
Test area changed (packages/harness/test/loop.test.ts:7): packages/harness/test/loop.test.ts touches a test area and should be reviewed for regression impact.
Test area changed (packages/matchers/test/matchers.test.ts:190): packages/matchers/test/matchers.test.ts touches a test area and should be reviewed for regression impact.

Recommended Checks

Add or run tests covering packages/cli/src/docs/command-docs/orchestration.ts
Add or run tests covering packages/harness/src/index.ts
Add or run tests covering packages/harness/src/loop/controller.ts
Add or run tests covering packages/harness/src/loop/index.ts
Add or run tests covering packages/matchers/src/utils.ts
Exercise packages/cli/src/benchmark/corpus.ts through its public API instead of copying its logic
Exercise packages/cli/src/commands/loop.ts through its public API instead of copying its logic
Flow check (CLI release smoke): Run built CLI smoke tests
Flow check (CLI release smoke): Run package dry-run
Flow check (CLI release smoke): Run published-package or tarball demo before release
Flow check (Pull request redteam review): Check weak or missing test proof
Flow check (Pull request redteam review): Keep deterministic evidence separate from AI suggestions

Notes

CodeDecay is deterministic and local-first. This report was generated without telemetry, API keys, LLMs, or model calls.

Found by CodeDecay - deterministic, local-first, no telemetry.

feat(loop): gate merge safety on evidence depth

47467f4

kunaldhongade force-pushed the feature/614-loop-evidence-gate branch from 6869a69 to 47467f4 on July 1, 2026 08:55

kunaldhongade merged commit bd82ee7 into main Jul 1, 2026
7 checks passed

kunaldhongade deleted the feature/614-loop-evidence-gate branch July 1, 2026 08:59
