
feat(topology): recover code duplication per sub-package with per-child engine routing #122

Merged
maudlin merged 1 commit into main from 78-perpackage-duplication
Jun 25, 2026


@maudlin maudlin commented Jun 25, 2026


What

On an undeclared fan-out, duplication ran once over the whole tree, routed by whole-tree engine selection. This recovers it per assessment root, with the engine re-routed per child:

  • a node package (own package.json + JS/TS-dominant source — read from the one cached scc walk sliced to the subtree) runs jscpd in the package (npm, cwd = the child);
  • otherwise lizard runs over the inventory file list sliced to the subtree (inventory_paths_under, CWD-relative).
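
The routing rule in the two bullets above can be sketched as a small decision function. This is an illustration only, not the code in this PR: `choose_dup_engine` is a hypothetical name, the line counts are assumed to come from the sliced scc walk, and "JS/TS-dominant" is assumed here to mean strictly more JS/TS lines than everything else combined.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pick a duplication engine for one assessment root.
# Usage: choose_dup_engine DIR JS_TS_LINES OTHER_LINES
#   DIR          child directory (the assessment root)
#   JS_TS_LINES  JS/TS line count in the subtree (assumed input)
#   OTHER_LINES  line count of all other languages in the subtree (assumed input)
choose_dup_engine() {
  local dir=$1 js_lines=$2 other_lines=$3
  # A node package needs its own package.json AND JS/TS-dominant source.
  # "Dominant" is an assumption here: strictly more JS/TS lines than the rest.
  if [ -f "$dir/package.json" ] && [ "$js_lines" -gt "$other_lines" ]; then
    echo jscpd
  else
    echo lizard
  fi
}
```

Under these assumptions a Go-dominant child that happens to ship a package.json still falls through to lizard, which is the behaviour the fan-out smoke test under Verification describes.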

Records are namespaced via SLUG_NS (svc/duplication); the console is labelled per package (📦 Sub-package: svc). Score accumulates per package, matching the increment-1 cluster precedent.
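
As a rough illustration of the namespacing, a record slug could be derived as below. `namespaced_slug` is a hypothetical helper, not a function from this repository; it only mirrors the behaviour stated above, where an empty SLUG_NS leaves the slug untouched.

```shell
# Hypothetical helper: build a record slug from SLUG_NS and a base slug.
# With SLUG_NS empty (single package / declared workspace) the slug is unchanged.
namespaced_slug() {
  local ns=$1 slug=$2
  if [ -n "$ns" ]; then
    echo "$ns/$slug"
  else
    echo "$slug"
  fi
}

# Example:
#   namespaced_slug svc duplication   -> svc/duplication
#   namespaced_slug ""  duplication   -> duplication
```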

This is #78 Phase 2b — the duplication slice. Complexity (the eslint/lizard/scc merge) still runs whole-tree; recovering it is the next, larger slice.

Why this is safe

inventory_paths_under "." is the identity (whole inventory, TARGET-relative). For a single package or a declared workspace, the loop runs exactly once at . with SLUG_NS empty, reusing the whole-tree engine and probes with no cd, so the output is byte-identical to before (the gate).
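
The identity-at-"." property can be illustrated with a stand-in for the slicing step. This is a sketch, not the real `inventory_paths_under`: `paths_under_sketch` is a hypothetical name, it reads the inventory from stdin rather than from wherever the real helper keeps it, and it omits the extension filter that the test suite mentions.

```shell
# Hypothetical stand-in for inventory_paths_under.
# Reads TARGET-relative paths on stdin and prints those under DIR.
paths_under_sketch() {
  local dir=${1%/}
  if [ "$dir" = "." ] || [ -z "$dir" ]; then
    cat   # identity: the whole inventory, unchanged
    return
  fi
  # Keep only paths inside DIR and strip the "DIR/" prefix, so results are
  # relative to the child directory (the CWD once the caller has cd'ed into it).
  awk -v p="$dir/" 'index($0, p) == 1 { print substr($0, length(p) + 1) }'
}
```

Because the "." case passes the list through untouched, a run with a single assessment root sees exactly the file list the whole-tree code saw, which is what makes the byte-identical gate plausible.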

Verification

  • test/source-inventory.test.sh — added inventory_paths_under cases (. identity, prefix-strip CWD-relative, extension filter, identity matches inventory_paths, no-match): 40 passed. Full CI suite green locally (12 suites).
  • End-to-end fan-out smoke: a Go-dominant child (with package.json+lockfile) → lizard arm (svc/duplication); a node child with no quality:duplicates script → jscpd arm, honest skip (web/duplication); both labelled.
  • End-to-end byte-identical on a single-package Go repo (exercises the . lizard path): the duplication record + console block + all 27 parsed records identical between main and this branch (only the absolute CHECKUP_OUT_DIR path inside one git-hotspots message string differs).

Refs #78

🤖 Generated with Claude Code

…ld engine routing (#78)

On an undeclared fan-out, `duplication` ran once over the whole tree, routed by
whole-tree engine selection. Recover it per assessment root with the engine
re-routed per child: a node package (own package.json + JS/TS-dominant source,
read from the one cached scc walk sliced to the subtree) runs jscpd *in* the
package (npm); otherwise lizard runs over the inventory file list sliced to the
subtree (`inventory_paths_under`, CWD-relative). Records are namespaced via
SLUG_NS (`svc/duplication`) and the console is labelled per package.

`inventory_paths_under "."` is the identity (whole inventory, TARGET-relative)
and the loop runs once at "." with SLUG_NS empty reusing the whole-tree engine,
so a single package / declared workspace is byte-identical to before (verified
end-to-end: all parsed records unchanged on a single-package repo).

This is #78 Phase 2b, the duplication slice. Complexity (the eslint/lizard/scc
merge) still runs whole-tree — recovering it per package is the next slice.

Refs #78

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012oHR4g8pH7Ui242SRycFzw
@maudlin maudlin merged commit 8b426be into main Jun 25, 2026
6 checks passed
@maudlin maudlin deleted the 78-perpackage-duplication branch June 25, 2026 13:30
maudlin added a commit that referenced this pull request Jun 25, 2026
…ine routing (#78) (#123)

The last whole-tree measurement arm. On an undeclared fan-out, complexity ran
once over the whole tree, routed by whole-tree engine selection. Recover it per
assessment root with the FULL engine ladder re-routed per child
(route_complexity_child, a scoped mirror of the whole-tree routing): ESLint on
the JS/TS slice using the child's OWN flat config + local bin, lizard on the
non-JS slice (inventory sliced to the subtree), scc fallback (keep-set sliced).
Findings are namespaced (backend/complexity) and labelled per package.

The single git-hotspots CSV is accumulated across packages — truncated once
before the loop, appended per arm, always in TARGET-relative (namespaced) paths
(scc/lizard/eslint findings re-prefixed; the standalone-lizard CSV's file column
namespaced via awk) — so the churn × complexity join stays whole-tree. If nothing
is measured the (empty) CSV is dropped, matching the pre-#78 absent state.
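
The accumulate-and-namespace pattern described above might look roughly like the following. It is a sketch under stated assumptions: `append_namespaced_csv` is a hypothetical helper, the CSV is assumed to be headerless with the file path in column 1 and no quoted commas, and the real column layout in the repository may differ.

```shell
# Hypothetical helper: append SRC to DEST, prefixing column 1 (assumed to be
# the file path) with the package namespace so paths stay TARGET-relative.
append_namespaced_csv() {
  local ns=$1 src=$2 dest=$3
  if [ -z "$ns" ]; then
    cat "$src" >> "$dest"   # single package: paths are already TARGET-relative
  else
    awk -F, -v OFS=, -v ns="$ns" '{ $1 = ns "/" $1; print }' "$src" >> "$dest"
  fi
}

# Intended call pattern (file names are illustrative):
#   : > "$csv"                                # truncate once, before the loop
#   append_namespaced_csv svc svc.csv "$csv"  # append per arm
#   append_namespaced_csv web web.csv "$csv"
#   [ -s "$csv" ] || rm -f "$csv"             # nothing measured: drop the empty CSV
```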

Single package / declared workspace → one iteration at "." reusing the whole-tree
routing, no cd, SLUG_NS empty, $PWD == $TARGET → byte-identical to before, CSV
included (verified end-to-end on the lizard, scc and merged arms + full
parsed-set diff). A fan-out routes each child to its own engine and accumulates a
namespaced CSV.

With stats (#121), duplication (#122) and now complexity, every measurement arm
recovers per sub-package — #78 is complete.

Closes #78


Claude-Session: https://claude.ai/code/session_012oHR4g8pH7Ui242SRycFzw

Co-authored-by: Mark Ridley <210189+maudlin@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>