Skip to content

refactor(migration-tools): consolidate dag.ts BFS + VP3 analysis#360

Merged
saevarb merged 3 commits intomainfrom
migration-performance
Apr 22, 2026
Merged

refactor(migration-tools): consolidate dag.ts BFS + VP3 analysis#360
saevarb merged 3 commits intomainfrom
migration-performance

Conversation

@saevarb
Copy link
Copy Markdown
Contributor

@saevarb saevarb commented Apr 21, 2026

Migration graph: perf refactor + VP3 analysis

Summary

Refactors dag.ts to consolidate five hand-rolled BFS loops behind a single generic bfs<E> generator + a minimal Queue<T> class, and fixes a real stack-overflow bug in detectCycles. Ships the analysis documents that close (or explicitly defer) the three halves of April milestone VP3 — graph op scaling, planner cost, and on-disk contract size.

The PR is three commits:

  1. fix(migration-tools): make detectCycles iterative to avoid stack overflow
  2. refactor(migration-tools): consolidate dag.ts BFS around a generic generator
  3. docs(graph-based-migrations): graph perf + disk-sizing analysis

The benchmark harness used to produce the numbers below is not part of this PR — it lives on a companion branch (see Notes for the reviewer).

Why

  • Correctness: detectCycles's recursive DFS threw RangeError: Maximum call stack size exceeded at ~5,000 linear migration nodes under Node's default stack — well within the size of a plausible migration history.
  • Clarity: five subtly-different hand-rolled BFS loops in dag.ts (findPath, findDivergencePoint, findReachableLeaves, detectOrphans, plus a local helper in findPathWithDecision), each with its own queue, visited set, and state machine. Replacing them with one generator makes the direction of traversal a caller concern and collapses each site to a short for-of loop.
  • VP3 close-out: April milestone asked for pass/fail numbers on graph scaling, plan/diff time, and disk footprint. This PR gets those numbers on paper (and defers the planner half explicitly).

Behaviour changes

1. detectCycles terminates on any graph size

Before: recursive DFS; crashed at ~5,000 consecutive linear nodes.
After: iterative DFS with an explicit frame stack. Same three-colour algorithm, same cycle-reconstruction logic, same public signature. No stack-depth limit beyond available memory.

2. BFS ops unified behind bfs<E> + Queue<T>

Before: each call site maintained its own const queue = [start]; while (queue.length) { const x = queue.shift(); … } with Array.shift() (O(n) on V8) and site-specific visited/parent bookkeeping.

After: a generic bfs<E> generator at packages/1-framework/3-tooling/migration/src/graph-ops.ts yields { node, parent, incomingEdge } steps. Direction is expressed by the caller's neighbours closure (returns { next, edge } pairs). A tiny amortised-O(1) Queue<T> at packages/1-framework/3-tooling/migration/src/queue.ts uses a head-index cursor instead of Array.shift.

Each former BFS site becomes a few lines:

// collectNodesReachingTarget (used by findPathWithDecision)
const reached = new Set<string>();
for (const step of bfs([toHash], (n) => reverseNeighbours(graph, n))) {
  reached.add(step.node);
}
return reached;

3. findPathWithDecision is O(V + E), not O(|path| · (V + E))

Before: the alternative-neighbour loop called findPath(e.to, toHash) once per outgoing edge along the selected path, so a graph with |path|=1000 incurred 1,000 extra BFS traversals.

After: a single reverse-reachability BFS from toHash populates a Set<string>; the per-edge check becomes O(1) set membership. Structural, not cosmetic — the old code was quadratic in path length.

4. findLeaf returns string | null instead of an overloaded sentinel

Before: returned EMPTY_CONTRACT_HASH for two unrelated cases (empty graph, root-only graph). Callers had to decode that as "no target".

After: returns string | nullnull means "no target". Happy-path callers that constructed non-empty graphs add leaf! where they pass it on to findPath (8 call sites in CLI tests, all of which construct a deliberate non-empty graph).

Design decision worth flagging

The bfs<E> generator is measurably slower than the hand-rolled loops it replaces. Per-op microbenches show 1.7–3.2× regressions across the suite — generator dispatch and per-yield object allocation aren't free. The absolute numbers are tiny (every op stays below 10 ms on 10,000-node graphs) but the ratio is real.

This is an intentional trade:

  • Every operation runs once per CLI invocation (migration status / plan / apply), not in a loop.
  • The worst-case absolute costfindPathWithDecision on a merge-heavy 1k-spine graph — is under 1 ms. It was 100 ms with the pre-fix quadratic; it's 1 ms now even with generator overhead.
  • Code clarity matters more than constants for a cold-path subsystem: the 5 former BFS sites are now 3-line for-of loops over a shared primitive.

The analysis document shows the scaling curves and explains this in more detail.

Conclusions (from the analysis docs)

Graph ops — docs/planning/benchmarks/migration-graph-baseline.md

Linear in edge count across every shape. At 10k-edge graphs, no op exceeds ~5 ms. Rendering pipeline (CLI formatter + dagre layout) sits downstream and is fast in the default path because it truncates before layout. Full---graph on huge graphs is a known risk because dagre's rank assignment can degrade, but that's not a new concern introduced by this PR.

On-disk contract size — projects/graph-based-migrations/disk-sizing-investigation.md

Every migration.json embeds both fromContract and toContract in full with no compression or dedup. Measured on three real Prisma ORM schemas (215, 880, 893 models): ~7 KB per model per contract copy. At the 880-model scale a single migration is ≥ 18 MiB, and 100 migrations ≈ 2 GiB raw.

Compression experiments show gzip alone is a ~20× win (contract JSON has heavy structural repetition; compresses to ~4%). A content-addressed store + zstd with a trained dictionary gets a further 50–100× by exploiting cross-contract redundancy that gzip's 32 KiB window can't see. Memo explores three architectural options and recommends gzip-per-file as the quick win, zstd + shared dictionary as the right long-term shape.

Planner cost — docs/planning/benchmarks/migration-planner-baseline.md

Deferred. The migration planner is being reworked and unified for different targets in parallel; benchmarking the current implementation would measure code about to change. Revisit after the rework lands.

Verification

  • pnpm --filter @prisma-next/migration-tools typecheck — green
  • pnpm --filter @prisma-next/migration-tools test — 171 tests pass across 10 files (10 new bfs tests, 7 new Queue tests, 2 new detectCycles / dead-end regression tests, rest unchanged)
  • pnpm --filter @prisma-next/cli typecheck — green after propagating findLeaf's nullable return (8 test-site ! assertions in migration-apply.test.ts)
  • pnpm --filter @prisma-next/cli test — 536 tests pass

Non-goals / out of scope

  • Planner benchmark — deferred pending the in-progress planner rework.
  • Compression / storage-layout changes — the disk-sizing memo proposes options; this PR does not implement them.
  • CI bench gates — deliberately not wired up. Graph-based migrations aren't a perf-critical subsystem and CI bench flakiness would be noise.
  • findLeaf AMBIGUOUS_TARGET diagnostic path — remains quadratic-ish in divergent branch count. Only on the error path; noted in the memo but not fixed.

Follow-ups

  • Revisit the planner bench once the descriptor-planner rework lands.
  • Revisit the disk-sizing recommendation if real users hit storage friction. Gzip-per-file is a small PR; content-addressed + zstd dictionary is a week of focused work.

Notes for the reviewer

  • Reading order: start with the analysis doc you care most about; the memos link back to each other. The code changes are straightforward on their own.
  • Companion bench branch. The benchmark harness (vitest-bench suite + scaling-chart generator + PSL preprocessing tooling) lives on migration-performance-bench — same three commits as this PR plus one additional chore commit that adds packages/1-framework/3-tooling/migration/bench/. Kept off the PR deliberately (graph-based migrations aren't a perf-critical subsystem; not worth maintaining a harness in-tree). Run git checkout migration-performance-bench + pnpm bench:save + node bench/snapshots/build-chart.mjs locally to reproduce the numbers.

Summary by CodeRabbit

  • Documentation

    • Added detailed performance baselines for graph and planner components, plus a rendering-pipeline outlook.
  • Bug Fixes

    • Cycle detection made robust for very deep chains; traversal/path-finding reworked to remove redundant work and improve scalability.
  • Refactor

    • Traversal internals rewritten for deterministic, iterative BFS/DFS behavior; target-finding now yields null when no valid target exists.
  • Tests

    • Added comprehensive tests for traversal, queue behavior, and migration-graph scenarios.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 21, 2026

📝 Walkthrough

Walkthrough

Refactors DAG traversal to a shared BFS/Queue infrastructure, replaces recursive cycle detection with an iterative three-color DFS, tightens path/leaf return semantics (nullable leaf), adds BFS/Queue tests and a deep-chain cycle safety test, and adds two benchmark documentation pages for graph-layer performance.

Changes

Cohort / File(s) Summary
Documentation
docs/planning/benchmarks/migration-graph-baseline.md, docs/planning/benchmarks/migration-planner-baseline.md
Added graph-layer performance baseline doc and a deferred Postgres-planner baseline; records scope, reproduction context, measured scaling for DAG ops, hypotheses, and rendering pipeline outlook.
Queue Utility
packages/1-framework/3-tooling/migration/src/queue.ts
Added Queue<T> with head-index cursor for amortized O(1) push/shift to avoid array shift() costs.
BFS Traversal Infrastructure
packages/1-framework/3-tooling/migration/src/graph-ops.ts
Added generic bfs generator and BfsStep interface yielding parent/incoming-edge metadata, visited tracking, and optional deterministic ordering.
DAG Operations Refactor
packages/1-framework/3-tooling/migration/src/dag.ts
Replaced manual queues with bfs/Queue; added forwardNeighbours/reverseNeighbours and appendEdge; deterministic neighbor tie-breaking; optimized findPathWithDecision via reverse-BFS precomputation; replaced recursive detectCycles with iterative three-color DFS; changed findLeaf → `string
Tests — DAG / Graph Ops / Queue
packages/1-framework/3-tooling/migration/test/dag.test.ts, packages/1-framework/3-tooling/migration/test/graph-ops.test.ts, packages/1-framework/3-tooling/migration/test/queue.test.ts
Added BFS unit tests (traversal, ordering, multiple starts, cycles, early termination), Queue tests (FIFO, iterable init, reuse, error on empty shift), updated findLeaf expectation to null for empty graphs, and added a 20k-node deep-chain cycle-detection regression test.
CLI Test Adjustment
packages/1-framework/3-tooling/cli/test/commands/migration-apply.test.ts
Updated findPath call sites to use non-null assertion leaf! to satisfy stricter typing at callers.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🐇 I hopped through queues with head held high,
BFS made neat tracks beneath the sky,
No more deep stacks tumbling out of sight,
Iterative checks keep cycles light,
Cheers — the graph now hops just right! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly summarizes the main changes: refactoring BFS logic in dag.ts and adding VP3 analysis documentation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch migration-performance

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented Apr 21, 2026

Open in StackBlitz

@prisma-next/mongo-runtime

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/mongo-runtime@360

@prisma-next/family-mongo

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/family-mongo@360

@prisma-next/sql-runtime

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/sql-runtime@360

@prisma-next/family-sql

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/family-sql@360

@prisma-next/middleware-telemetry

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/middleware-telemetry@360

@prisma-next/mongo

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/mongo@360

@prisma-next/extension-paradedb

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/extension-paradedb@360

@prisma-next/extension-pgvector

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/extension-pgvector@360

@prisma-next/postgres

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/postgres@360

@prisma-next/sql-orm-client

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/sql-orm-client@360

@prisma-next/sqlite

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/sqlite@360

@prisma-next/target-mongo

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/target-mongo@360

@prisma-next/adapter-mongo

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/adapter-mongo@360

@prisma-next/driver-mongo

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/driver-mongo@360

@prisma-next/contract

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/contract@360

@prisma-next/utils

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/utils@360

@prisma-next/config

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/config@360

@prisma-next/errors

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/errors@360

@prisma-next/framework-components

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/framework-components@360

@prisma-next/operations

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/operations@360

@prisma-next/ts-render

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/ts-render@360

@prisma-next/contract-authoring

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/contract-authoring@360

@prisma-next/ids

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/ids@360

@prisma-next/psl-parser

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/psl-parser@360

@prisma-next/psl-printer

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/psl-printer@360

@prisma-next/cli

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/cli@360

@prisma-next/emitter

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/emitter@360

@prisma-next/migration-tools

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/migration-tools@360

prisma-next

npm i https://pkg.pr.new/prisma/prisma-next@360

@prisma-next/vite-plugin-contract-emit

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/vite-plugin-contract-emit@360

@prisma-next/runtime-executor

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/runtime-executor@360

@prisma-next/mongo-codec

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/mongo-codec@360

@prisma-next/mongo-contract

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/mongo-contract@360

@prisma-next/mongo-value

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/mongo-value@360

@prisma-next/mongo-contract-psl

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/mongo-contract-psl@360

@prisma-next/mongo-contract-ts

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/mongo-contract-ts@360

@prisma-next/mongo-emitter

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/mongo-emitter@360

@prisma-next/mongo-schema-ir

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/mongo-schema-ir@360

@prisma-next/mongo-query-ast

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/mongo-query-ast@360

@prisma-next/mongo-orm

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/mongo-orm@360

@prisma-next/mongo-query-builder

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/mongo-query-builder@360

@prisma-next/mongo-lowering

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/mongo-lowering@360

@prisma-next/mongo-wire

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/mongo-wire@360

@prisma-next/sql-contract

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/sql-contract@360

@prisma-next/sql-errors

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/sql-errors@360

@prisma-next/sql-operations

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/sql-operations@360

@prisma-next/sql-schema-ir

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/sql-schema-ir@360

@prisma-next/sql-contract-psl

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/sql-contract-psl@360

@prisma-next/sql-contract-ts

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/sql-contract-ts@360

@prisma-next/sql-contract-emitter

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/sql-contract-emitter@360

@prisma-next/sql-lane-query-builder

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/sql-lane-query-builder@360

@prisma-next/sql-relational-core

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/sql-relational-core@360

@prisma-next/sql-builder

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/sql-builder@360

@prisma-next/target-postgres

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/target-postgres@360

@prisma-next/target-sqlite

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/target-sqlite@360

@prisma-next/adapter-postgres

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/adapter-postgres@360

@prisma-next/adapter-sqlite

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/adapter-sqlite@360

@prisma-next/driver-postgres

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/driver-postgres@360

@prisma-next/driver-sqlite

npm i https://pkg.pr.new/prisma/prisma-next/@prisma-next/driver-sqlite@360

commit: bf0f31d

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/1-framework/3-tooling/migration/src/dag.ts (2)

237-258: ⚠️ Potential issue | 🟡 Minor

Skip common ancestors that are not reachable from fromHash.

findPath() returning null currently becomes depth 0, so an orphan node that points to multiple leaves can beat fromHash if it appears first in commonAncestors. Skip unreachable ancestors before comparing depths.

Proposed fix
   let deepest = fromHash;
   let deepestDepth = -1;
   for (const ancestor of commonAncestors) {
     const path = findPath(graph, fromHash, ancestor);
-    const depth = path ? path.length : 0;
+    if (!path) continue;
+    const depth = path.length;
     if (depth > deepestDepth) {
       deepestDepth = depth;
       deepest = ancestor;
     }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/1-framework/3-tooling/migration/src/dag.ts` around lines 237 - 258,
commonAncestors may include nodes not reachable from fromHash which currently
get treated as depth 0 and can incorrectly beat fromHash; update the loop that
computes deepest (using ancestorSets/commonAncestors) to skip any ancestor for
which findPath(graph, fromHash, ancestor) returns null (i.e., continue if no
path) before computing depth and comparing to deepestDepth so only reachable
ancestors are considered when setting deepest/deepestDepth.

284-316: ⚠️ Potential issue | 🟠 Major

Return null before treating the root as a reachable leaf.

findReachableLeaves() yields the start node, so a graph containing only EMPTY_CONTRACT_HASH returns the root as the leaf here, despite the new contract saying this should be null. Guard on the root’s outgoing edges before collecting leaves.

Proposed fix
 export function findLeaf(graph: MigrationGraph): string | null {
   if (graph.nodes.size === 0) {
     return null;
   }
 
   if (!graph.nodes.has(EMPTY_CONTRACT_HASH)) {
     throw errorNoInitialMigration([...graph.nodes]);
   }
 
+  const initialEdges = graph.forwardChain.get(EMPTY_CONTRACT_HASH) ?? [];
+  if (initialEdges.length === 0) {
+    const nonRootNodes = [...graph.nodes].filter((n) => n !== EMPTY_CONTRACT_HASH);
+    if (nonRootNodes.length > 0) {
+      throw errorNoInitialMigration([...graph.nodes]);
+    }
+    return null;
+  }
+
   const leaves = findReachableLeaves(graph, EMPTY_CONTRACT_HASH);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/1-framework/3-tooling/migration/src/dag.ts` around lines 284 - 316,
findLeaf currently treats the start node (EMPTY_CONTRACT_HASH) as a reachable
leaf because findReachableLeaves includes the root; modify findLeaf so that
before using findReachableLeaves you check whether the root actually has
outgoing edges and if not and there are no other reachable nodes return null.
Concretely, in findLeaf (using EMPTY_CONTRACT_HASH) guard on the root’s outgoing
edges (the graph structure you use to represent edges), return null if the root
has no outbound edges and no other nodes to reach, otherwise compute leaves
excluding treating the root as a leaf and proceed with the existing
ambiguity/branch logic.
🧹 Nitpick comments (1)
packages/1-framework/3-tooling/migration/src/queue.ts (1)

30-31: Avoid suppressing Biome in production code.

The bounds check already proves the access is safe; use a local checked cast instead of biome-ignore.

Proposed fix
-    // biome-ignore lint/style/noNonNullAssertion: bounds-checked on the line above
-    return this.items[this.head++]!;
+    const item = this.items[this.head] as T;
+    this.head += 1;
+    return item;

As per coding guidelines, **/*.{ts,tsx}: "Never suppress biome lints".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/1-framework/3-tooling/migration/src/queue.ts` around lines 30 - 31,
Remove the biome-ignore and the non-null assertion on the return expression;
instead assign the value to a local const and perform a checked cast before
returning. Replace the line "return this.items[this.head++]!;" with two
statements: "const item = this.items[this.head++];" followed by "return item as
T;" (referencing the same this.items and this.head symbols) so the access
remains proven safe without suppressing Biome lints.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/planning/benchmarks/migration-graph-baseline.md`:
- Line 18: The planning doc docs/planning/benchmarks/migration-graph-baseline.md
currently points readers to “See the PR body” for the benchmark harness which
will not be durable; update the document to include a permanent reproduction
pointer by adding either a stable branch or commit SHA link to the harness, an
archived snapshot (e.g., tarball or gist) or embedding the exact commands and
setup used to run the harness (dependencies, command-line invocation,
environment variables, and dataset version). Make the change inline in
migration-graph-baseline.md so the harness location and the exact reproduction
command are preserved after merge.

---

Outside diff comments:
In `@packages/1-framework/3-tooling/migration/src/dag.ts`:
- Around line 237-258: commonAncestors may include nodes not reachable from
fromHash which currently get treated as depth 0 and can incorrectly beat
fromHash; update the loop that computes deepest (using
ancestorSets/commonAncestors) to skip any ancestor for which findPath(graph,
fromHash, ancestor) returns null (i.e., continue if no path) before computing
depth and comparing to deepestDepth so only reachable ancestors are considered
when setting deepest/deepestDepth.
- Around line 284-316: findLeaf currently treats the start node
(EMPTY_CONTRACT_HASH) as a reachable leaf because findReachableLeaves includes
the root; modify findLeaf so that before using findReachableLeaves you check
whether the root actually has outgoing edges and if not and there are no other
reachable nodes return null. Concretely, in findLeaf (using EMPTY_CONTRACT_HASH)
guard on the root’s outgoing edges (the graph structure you use to represent
edges), return null if the root has no outbound edges and no other nodes to
reach, otherwise compute leaves excluding treating the root as a leaf and
proceed with the existing ambiguity/branch logic.

---

Nitpick comments:
In `@packages/1-framework/3-tooling/migration/src/queue.ts`:
- Around line 30-31: Remove the biome-ignore and the non-null assertion on the
return expression; instead assign the value to a local const and perform a
checked cast before returning. Replace the line "return
this.items[this.head++]!;" with two statements: "const item =
this.items[this.head++];" followed by "return item as T;" (referencing the same
this.items and this.head symbols) so the access remains proven safe without
suppressing Biome lints.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 6974d52a-3cd4-4631-883a-52ea8bcd5ede

📥 Commits

Reviewing files that changed from the base of the PR and between 8770d1d and d2fb168.

⛔ Files ignored due to path filters (1)
  • projects/graph-based-migrations/disk-sizing-investigation.md is excluded by !projects/**
📒 Files selected for processing (9)
  • docs/planning/benchmarks/migration-graph-baseline.md
  • docs/planning/benchmarks/migration-planner-baseline.md
  • packages/1-framework/3-tooling/cli/test/commands/migration-apply.test.ts
  • packages/1-framework/3-tooling/migration/src/dag.ts
  • packages/1-framework/3-tooling/migration/src/graph-ops.ts
  • packages/1-framework/3-tooling/migration/src/queue.ts
  • packages/1-framework/3-tooling/migration/test/dag.test.ts
  • packages/1-framework/3-tooling/migration/test/graph-ops.test.ts
  • packages/1-framework/3-tooling/migration/test/queue.test.ts

Comment thread docs/planning/benchmarks/migration-graph-baseline.md
@saevarb saevarb force-pushed the migration-performance branch from d2fb168 to 553e0a7 Compare April 22, 2026 07:09
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
packages/1-framework/3-tooling/migration/test/graph-ops.test.ts (2)

47-61: Prefer object matchers for related-field assertions.

Two spots check multiple related fields on the same step with individual expects — the repo guideline prefers toMatchObject for 2+ related values.

♻️ Proposed refactor
   it('yields the starting node with parent=null and incomingEdge=null', () => {
     const first = bfs(['A'], forward(sampleEdges)).next();
     expect(first.done).toBe(false);
     if (first.done) return;
-    expect(first.value.node).toBe('A');
-    expect(first.value.parent).toBeNull();
-    expect(first.value.incomingEdge).toBeNull();
+    expect(first.value).toMatchObject({ node: 'A', parent: null, incomingEdge: null });
   });

   it('yields parent and incomingEdge for non-start nodes', () => {
     const steps = [...bfs(['A'], forward(sampleEdges))];
     const b = steps.find((s) => s.node === 'B');
-    expect(b?.parent).toBe('A');
-    expect(b?.incomingEdge).toEqual({ from: 'A', to: 'B' });
+    expect(b).toMatchObject({ parent: 'A', incomingEdge: { from: 'A', to: 'B' } });
   });

As per coding guidelines: "Prefer object matchers (toMatchObject) over multiple individual expect().toBe() calls when checking 2 or more related values in tests".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/1-framework/3-tooling/migration/test/graph-ops.test.ts` around lines
47 - 61, Tests use multiple related expect() calls; replace them with object
matchers. In the first test (variable first from bfs(['A'],
forward(sampleEdges))) assert the returned step with a single
expect(first.value).toMatchObject({ node: 'A', parent: null, incomingEdge: null
}) instead of three separate expects; in the second test, find the step b from
[...bfs(...)] and replace the two expects on b?.parent and b?.incomingEdge with
expect(b).toMatchObject({ parent: 'A', incomingEdge: { from: 'A', to: 'B' } }).
Use the existing variables (first, b, steps) and functions (bfs, forward,
sampleEdges) to locate and update the assertions.

85-107: Test name overstates what's verified.

Title says "reaches target via shortest path when ordering is applied", but both diamond paths have equal length; the assertion only verifies that ordering influences parent selection (tie-break), not path length. Consider renaming to something like 'ordering callback determines parent selection on equal-length paths' to match the actual behavior being asserted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/1-framework/3-tooling/migration/test/graph-ops.test.ts` around lines
85 - 107, Rename the test to accurately describe what it verifies: update the
it(...) title from "reaches target via shortest path when ordering is applied"
to something like "ordering callback determines parent selection on equal-length
paths" (or similar). The test body uses preferMain, bfs, forward, steps and
asserts dStep.parent is 'C', so only the title needs changing to reflect that
the test checks tie-breaking/parent selection on equal-length diamond paths
rather than shortest-path behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/1-framework/3-tooling/migration/test/graph-ops.test.ts`:
- Around line 47-61: Tests use multiple related expect() calls; replace them
with object matchers. In the first test (variable first from bfs(['A'],
forward(sampleEdges))) assert the returned step with a single
expect(first.value).toMatchObject({ node: 'A', parent: null, incomingEdge: null
}) instead of three separate expects; in the second test, find the step b from
[...bfs(...)] and replace the two expects on b?.parent and b?.incomingEdge with
expect(b).toMatchObject({ parent: 'A', incomingEdge: { from: 'A', to: 'B' } }).
Use the existing variables (first, b, steps) and functions (bfs, forward,
sampleEdges) to locate and update the assertions.
- Around line 85-107: Rename the test to accurately describe what it verifies:
update the it(...) title from "reaches target via shortest path when ordering is
applied" to something like "ordering callback determines parent selection on
equal-length paths" (or similar). The test body uses preferMain, bfs, forward,
steps and asserts dStep.parent is 'C', so only the title needs changing to
reflect that the test checks tie-breaking/parent selection on equal-length
diamond paths rather than shortest-path behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 689ff2ff-1f3d-40e9-a2bb-2c66108d5842

📥 Commits

Reviewing files that changed from the base of the PR and between d2fb168 and 553e0a7.

⛔ Files ignored due to path filters (1)
  • projects/graph-based-migrations/disk-sizing-investigation.md is excluded by !projects/**
📒 Files selected for processing (9)
  • docs/planning/benchmarks/migration-graph-baseline.md
  • docs/planning/benchmarks/migration-planner-baseline.md
  • packages/1-framework/3-tooling/cli/test/commands/migration-apply.test.ts
  • packages/1-framework/3-tooling/migration/src/dag.ts
  • packages/1-framework/3-tooling/migration/src/graph-ops.ts
  • packages/1-framework/3-tooling/migration/src/queue.ts
  • packages/1-framework/3-tooling/migration/test/dag.test.ts
  • packages/1-framework/3-tooling/migration/test/graph-ops.test.ts
  • packages/1-framework/3-tooling/migration/test/queue.test.ts
✅ Files skipped from review due to trivial changes (4)
  • docs/planning/benchmarks/migration-planner-baseline.md
  • packages/1-framework/3-tooling/migration/test/dag.test.ts
  • packages/1-framework/3-tooling/cli/test/commands/migration-apply.test.ts
  • packages/1-framework/3-tooling/migration/test/queue.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/1-framework/3-tooling/migration/src/queue.ts
  • packages/1-framework/3-tooling/migration/src/graph-ops.ts
  • packages/1-framework/3-tooling/migration/src/dag.ts

Copy link
Copy Markdown
Contributor

@wmadden wmadden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, nice work @saevarb 👍🏻

The generic BFS implementation is a big improvement over the handrolled and repeated graph traversal that preceded it, and the generator interface is a nice touch.

My agent had some nitpicky comments which I won't waste your time with, the only one worth noting is that callers who want the path to the target node need to track it themselves, data which the BFS implementation technically maintains internally but doesn't expose.

saevarb added 3 commits April 22, 2026 16:27
…flow

The recursive DFS in detectCycles overflowed Node's default stack at around
5,000 linear migration nodes — a plausible size for long-lived projects.
Rewrite as an iterative three-colour DFS with an explicit frame stack,
preserving the cycle-reconstruction semantics.

Ships with a 20,000-node linear-chain regression test that would have
triggered `RangeError: Maximum call stack size exceeded` under the
recursive implementation.
…nerator

Prior state: five hand-rolled BFS loops in dag.ts (findPath,
findDivergencePoint, findReachableLeaves, detectOrphans, and a local
helper inside findPathWithDecision) with `Array.prototype.shift()` for
dequeue (O(n) on V8, making each loop O(V² + E)) and inconsistent
state management across sites.

This change introduces:

- `Queue<T>` (src/queue.ts): minimal amortised-O(1) FIFO with a
  head-index cursor over an array.
- `bfs<E>` (src/graph-ops.ts): a generic generator-based BFS.
  Direction is expressed by the caller's neighbour closure; each yielded
  step carries parent + incoming-edge for optional path reconstruction.
- dag.ts: every BFS site rewritten as a short for-of loop over `bfs(...)`.
  `findPathWithDecision` additionally switches from per-edge `findPath`
  re-traversal (O(|path|·(V+E))) to a single reverse-reachability BFS
  (O(V+E)).
- Assorted tidying: appendEdge helper in reconstructGraph,
  simplified findLatestMigration null-cascade, justified trailing
  assertion in findLeaf, section-commented tie-break helpers.

Performance note: the generator introduces measurable per-yield dispatch
overhead — microbench ops regress roughly 1.7–3.2× versus the old
hand-rolled loops (once those have their shift() quadratic fixed). This
is imperceptible at realistic call frequency: graph ops run once per CLI
invocation (migration status / plan / apply), never in a loop. Absolute
wall-clock stays sub-5 ms even at 10,000-node graphs. See the analysis
doc for numbers.
Ships the analysis documents that close out (or defer) April milestone
VP3's "graph scales with large contracts" question:

- docs/planning/benchmarks/migration-graph-baseline.md
  Scaling of every exported dag operation (findPath,
  findPathWithDecision, detectCycles, findReachableLeaves, etc.)
  across shape families, plus a by-inspection review of the CLI
  rendering pipeline downstream of those ops.

- docs/planning/benchmarks/migration-planner-baseline.md
  Stub marking the planner measurement as deferred — the
  descriptor-based Postgres planner is being reworked in parallel, so
  benchmarking the current implementation would measure code that is
  about to change.

- projects/graph-based-migrations/disk-sizing-investigation.md
  Real-world measurements on three 200–900 model schemas plus
  compression experiments. Conclusion: today's layout stores both
  fromContract and toContract in full in every migration manifest
  (O(20 MiB) per migration on large schemas). Gzip alone is a ~20× win;
  content-addressed store + zstd with a trained dictionary can reduce
  total disk by another order of magnitude or two.

The benchmark harness itself is not part of this PR. The analysis
documents are self-contained with the numbers captured on this branch.
@saevarb saevarb force-pushed the migration-performance branch from 553e0a7 to bf0f31d Compare April 22, 2026 14:28
@saevarb saevarb enabled auto-merge (squash) April 22, 2026 14:28
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/1-framework/3-tooling/migration/src/dag.ts (1)

293-316: ⚠️ Potential issue | 🟠 Major

Handle the root-only graph before returning the sole leaf.

findReachableLeaves(graph, EMPTY_CONTRACT_HASH) yields the start node when it has no outgoing edges, so a graph containing only EMPTY_CONTRACT_HASH reaches Line 316 and returns the root hash instead of null, contradicting the new nullable contract documented on Lines 278-280.

🐛 Proposed fix
   const leaves = findReachableLeaves(graph, EMPTY_CONTRACT_HASH);
 
+  if (leaves.length === 1 && leaves[0] === EMPTY_CONTRACT_HASH) {
+    const nonRootNodes = [...graph.nodes].filter((n) => n !== EMPTY_CONTRACT_HASH);
+    if (nonRootNodes.length > 0) {
+      throw errorNoTarget(nonRootNodes);
+    }
+    return null;
+  }
+
   if (leaves.length === 0) {
     const reachable = [...graph.nodes].filter((n) => n !== EMPTY_CONTRACT_HASH);
     if (reachable.length > 0) {
       throw errorNoTarget(reachable);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/1-framework/3-tooling/migration/src/dag.ts` around lines 293 - 316,
findReachableLeaves may return the start node (EMPTY_CONTRACT_HASH) when the
graph only contains the root with no outgoing edges, causing the function to
return the root instead of null; update the final branch in the function that
currently returns leaves[0]! to detect this "root-only" case and return null:
after computing leaves (from findReachableLeaves) but before returning the sole
leaf, check if that sole leaf equals EMPTY_CONTRACT_HASH and that there are no
other reachable nodes (e.g., filter graph.nodes or the reachable set to exclude
EMPTY_CONTRACT_HASH or check reachable.length === 0); if so return null,
otherwise return the leaf as before (preserving the existing
ambiguity/divergence checks using findDivergencePoint, findPath,
errorAmbiguousTarget, errorNoTarget).
♻️ Duplicate comments (1)
docs/planning/benchmarks/migration-graph-baseline.md (1)

18-18: ⚠️ Potential issue | 🟡 Minor

Make the benchmark harness pointer durable.

This still points to “the PR body,” which will be brittle after merge. Please include a stable branch/commit, archived harness snapshot, or the exact commands/setup used to reproduce the numbers.

As per coding guidelines, **/*.{md,ts,tsx}: "Keep docs current (READMEs, rules, links) and prefer links to canonical docs over long comments".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/planning/benchmarks/migration-graph-baseline.md` at line 18, Update the
docs/planning/benchmarks/migration-graph-baseline.md entry that currently says
"See the PR body" to include a durable pointer: add a stable branch name or
exact commit SHA for the benchmark harness (or a permalink to an archived
snapshot), and/or include the exact commands, environment, and setup steps used
to reproduce the numbers; ensure the text that referenced the PR body is
replaced with the chosen durable reference and a brief reproducibility checklist
so readers can rerun the harness without relying on the PR description.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@packages/1-framework/3-tooling/migration/src/dag.ts`:
- Around line 293-316: findReachableLeaves may return the start node
(EMPTY_CONTRACT_HASH) when the graph only contains the root with no outgoing
edges, causing the function to return the root instead of null; update the final
branch in the function that currently returns leaves[0]! to detect this
"root-only" case and return null: after computing leaves (from
findReachableLeaves) but before returning the sole leaf, check if that sole leaf
equals EMPTY_CONTRACT_HASH and that there are no other reachable nodes (e.g.,
filter graph.nodes or the reachable set to exclude EMPTY_CONTRACT_HASH or check
reachable.length === 0); if so return null, otherwise return the leaf as before
(preserving the existing ambiguity/divergence checks using findDivergencePoint,
findPath, errorAmbiguousTarget, errorNoTarget).

---

Duplicate comments:
In `@docs/planning/benchmarks/migration-graph-baseline.md`:
- Line 18: Update the docs/planning/benchmarks/migration-graph-baseline.md entry
that currently says "See the PR body" to include a durable pointer: add a stable
branch name or exact commit SHA for the benchmark harness (or a permalink to an
archived snapshot), and/or include the exact commands, environment, and setup
steps used to reproduce the numbers; ensure the text that referenced the PR body
is replaced with the chosen durable reference and a brief reproducibility
checklist so readers can rerun the harness without relying on the PR
description.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 1f6fa39a-cc37-4d97-b6bb-daf45a89b144

📥 Commits

Reviewing files that changed from the base of the PR and between 553e0a7 and bf0f31d.

⛔ Files ignored due to path filters (1)
  • projects/graph-based-migrations/disk-sizing-investigation.md is excluded by !projects/**
📒 Files selected for processing (9)
  • docs/planning/benchmarks/migration-graph-baseline.md
  • docs/planning/benchmarks/migration-planner-baseline.md
  • packages/1-framework/3-tooling/cli/test/commands/migration-apply.test.ts
  • packages/1-framework/3-tooling/migration/src/dag.ts
  • packages/1-framework/3-tooling/migration/src/graph-ops.ts
  • packages/1-framework/3-tooling/migration/src/queue.ts
  • packages/1-framework/3-tooling/migration/test/dag.test.ts
  • packages/1-framework/3-tooling/migration/test/graph-ops.test.ts
  • packages/1-framework/3-tooling/migration/test/queue.test.ts
✅ Files skipped from review due to trivial changes (4)
  • packages/1-framework/3-tooling/cli/test/commands/migration-apply.test.ts
  • docs/planning/benchmarks/migration-planner-baseline.md
  • packages/1-framework/3-tooling/migration/test/queue.test.ts
  • packages/1-framework/3-tooling/migration/src/queue.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/1-framework/3-tooling/migration/test/dag.test.ts

@saevarb saevarb merged commit 3134f51 into main Apr 22, 2026
16 checks passed
@saevarb saevarb deleted the migration-performance branch April 22, 2026 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants