refactor(migration-tools): consolidate dag.ts BFS + VP3 analysis#360
refactor(migration-tools): consolidate dag.ts BFS + VP3 analysis#360
Conversation
📝 WalkthroughWalkthroughRefactors DAG traversal to a shared BFS/Queue infrastructure, replaces recursive cycle detection with an iterative three-color DFS, tightens path/leaf return semantics (nullable leaf), adds BFS/Queue tests and a deep-chain cycle safety test, and adds two benchmark documentation pages for graph-layer performance. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
@prisma-next/mongo-runtime
@prisma-next/family-mongo
@prisma-next/sql-runtime
@prisma-next/family-sql
@prisma-next/middleware-telemetry
@prisma-next/mongo
@prisma-next/extension-paradedb
@prisma-next/extension-pgvector
@prisma-next/postgres
@prisma-next/sql-orm-client
@prisma-next/sqlite
@prisma-next/target-mongo
@prisma-next/adapter-mongo
@prisma-next/driver-mongo
@prisma-next/contract
@prisma-next/utils
@prisma-next/config
@prisma-next/errors
@prisma-next/framework-components
@prisma-next/operations
@prisma-next/ts-render
@prisma-next/contract-authoring
@prisma-next/ids
@prisma-next/psl-parser
@prisma-next/psl-printer
@prisma-next/cli
@prisma-next/emitter
@prisma-next/migration-tools
prisma-next
@prisma-next/vite-plugin-contract-emit
@prisma-next/runtime-executor
@prisma-next/mongo-codec
@prisma-next/mongo-contract
@prisma-next/mongo-value
@prisma-next/mongo-contract-psl
@prisma-next/mongo-contract-ts
@prisma-next/mongo-emitter
@prisma-next/mongo-schema-ir
@prisma-next/mongo-query-ast
@prisma-next/mongo-orm
@prisma-next/mongo-query-builder
@prisma-next/mongo-lowering
@prisma-next/mongo-wire
@prisma-next/sql-contract
@prisma-next/sql-errors
@prisma-next/sql-operations
@prisma-next/sql-schema-ir
@prisma-next/sql-contract-psl
@prisma-next/sql-contract-ts
@prisma-next/sql-contract-emitter
@prisma-next/sql-lane-query-builder
@prisma-next/sql-relational-core
@prisma-next/sql-builder
@prisma-next/target-postgres
@prisma-next/target-sqlite
@prisma-next/adapter-postgres
@prisma-next/adapter-sqlite
@prisma-next/driver-postgres
@prisma-next/driver-sqlite
commit: |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/1-framework/3-tooling/migration/src/dag.ts (2)
237-258:⚠️ Potential issue | 🟡 MinorSkip common ancestors that are not reachable from
fromHash.
findPath()returningnullcurrently becomes depth0, so an orphan node that points to multiple leaves can beatfromHashif it appears first incommonAncestors. Skip unreachable ancestors before comparing depths.Proposed fix
let deepest = fromHash; let deepestDepth = -1; for (const ancestor of commonAncestors) { const path = findPath(graph, fromHash, ancestor); - const depth = path ? path.length : 0; + if (!path) continue; + const depth = path.length; if (depth > deepestDepth) { deepestDepth = depth; deepest = ancestor; } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/1-framework/3-tooling/migration/src/dag.ts` around lines 237 - 258, commonAncestors may include nodes not reachable from fromHash which currently get treated as depth 0 and can incorrectly beat fromHash; update the loop that computes deepest (using ancestorSets/commonAncestors) to skip any ancestor for which findPath(graph, fromHash, ancestor) returns null (i.e., continue if no path) before computing depth and comparing to deepestDepth so only reachable ancestors are considered when setting deepest/deepestDepth.
284-316:⚠️ Potential issue | 🟠 MajorReturn
nullbefore treating the root as a reachable leaf.
findReachableLeaves()yields the start node, so a graph containing onlyEMPTY_CONTRACT_HASHreturns the root as the leaf here, despite the new contract saying this should benull. Guard on the root’s outgoing edges before collecting leaves.Proposed fix
export function findLeaf(graph: MigrationGraph): string | null { if (graph.nodes.size === 0) { return null; } if (!graph.nodes.has(EMPTY_CONTRACT_HASH)) { throw errorNoInitialMigration([...graph.nodes]); } + const initialEdges = graph.forwardChain.get(EMPTY_CONTRACT_HASH) ?? []; + if (initialEdges.length === 0) { + const nonRootNodes = [...graph.nodes].filter((n) => n !== EMPTY_CONTRACT_HASH); + if (nonRootNodes.length > 0) { + throw errorNoInitialMigration([...graph.nodes]); + } + return null; + } + const leaves = findReachableLeaves(graph, EMPTY_CONTRACT_HASH);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/1-framework/3-tooling/migration/src/dag.ts` around lines 284 - 316, findLeaf currently treats the start node (EMPTY_CONTRACT_HASH) as a reachable leaf because findReachableLeaves includes the root; modify findLeaf so that before using findReachableLeaves you check whether the root actually has outgoing edges and if not and there are no other reachable nodes return null. Concretely, in findLeaf (using EMPTY_CONTRACT_HASH) guard on the root’s outgoing edges (the graph structure you use to represent edges), return null if the root has no outbound edges and no other nodes to reach, otherwise compute leaves excluding treating the root as a leaf and proceed with the existing ambiguity/branch logic.
🧹 Nitpick comments (1)
packages/1-framework/3-tooling/migration/src/queue.ts (1)
30-31: Avoid suppressing Biome in production code.The bounds check already proves the access is safe; use a local checked cast instead of
biome-ignore.Proposed fix
- // biome-ignore lint/style/noNonNullAssertion: bounds-checked on the line above - return this.items[this.head++]!; + const item = this.items[this.head] as T; + this.head += 1; + return item;As per coding guidelines,
**/*.{ts,tsx}: "Never suppress biome lints".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/1-framework/3-tooling/migration/src/queue.ts` around lines 30 - 31, Remove the biome-ignore and the non-null assertion on the return expression; instead assign the value to a local const and perform a checked cast before returning. Replace the line "return this.items[this.head++]!;" with two statements: "const item = this.items[this.head++];" followed by "return item as T;" (referencing the same this.items and this.head symbols) so the access remains proven safe without suppressing Biome lints.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/planning/benchmarks/migration-graph-baseline.md`:
- Line 18: The planning doc docs/planning/benchmarks/migration-graph-baseline.md
currently points readers to “See the PR body” for the benchmark harness which
will not be durable; update the document to include a permanent reproduction
pointer by adding either a stable branch or commit SHA link to the harness, an
archived snapshot (e.g., tarball or gist) or embedding the exact commands and
setup used to run the harness (dependencies, command-line invocation,
environment variables, and dataset version). Make the change inline in
migration-graph-baseline.md so the harness location and the exact reproduction
command are preserved after merge.
---
Outside diff comments:
In `@packages/1-framework/3-tooling/migration/src/dag.ts`:
- Around line 237-258: commonAncestors may include nodes not reachable from
fromHash which currently get treated as depth 0 and can incorrectly beat
fromHash; update the loop that computes deepest (using
ancestorSets/commonAncestors) to skip any ancestor for which findPath(graph,
fromHash, ancestor) returns null (i.e., continue if no path) before computing
depth and comparing to deepestDepth so only reachable ancestors are considered
when setting deepest/deepestDepth.
- Around line 284-316: findLeaf currently treats the start node
(EMPTY_CONTRACT_HASH) as a reachable leaf because findReachableLeaves includes
the root; modify findLeaf so that before using findReachableLeaves you check
whether the root actually has outgoing edges and if not and there are no other
reachable nodes return null. Concretely, in findLeaf (using EMPTY_CONTRACT_HASH)
guard on the root’s outgoing edges (the graph structure you use to represent
edges), return null if the root has no outbound edges and no other nodes to
reach, otherwise compute leaves excluding treating the root as a leaf and
proceed with the existing ambiguity/branch logic.
---
Nitpick comments:
In `@packages/1-framework/3-tooling/migration/src/queue.ts`:
- Around line 30-31: Remove the biome-ignore and the non-null assertion on the
return expression; instead assign the value to a local const and perform a
checked cast before returning. Replace the line "return
this.items[this.head++]!;" with two statements: "const item =
this.items[this.head++];" followed by "return item as T;" (referencing the same
this.items and this.head symbols) so the access remains proven safe without
suppressing Biome lints.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 6974d52a-3cd4-4631-883a-52ea8bcd5ede
⛔ Files ignored due to path filters (1)
projects/graph-based-migrations/disk-sizing-investigation.mdis excluded by!projects/**
📒 Files selected for processing (9)
docs/planning/benchmarks/migration-graph-baseline.mddocs/planning/benchmarks/migration-planner-baseline.mdpackages/1-framework/3-tooling/cli/test/commands/migration-apply.test.tspackages/1-framework/3-tooling/migration/src/dag.tspackages/1-framework/3-tooling/migration/src/graph-ops.tspackages/1-framework/3-tooling/migration/src/queue.tspackages/1-framework/3-tooling/migration/test/dag.test.tspackages/1-framework/3-tooling/migration/test/graph-ops.test.tspackages/1-framework/3-tooling/migration/test/queue.test.ts
d2fb168 to
553e0a7
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
packages/1-framework/3-tooling/migration/test/graph-ops.test.ts (2)
47-61: Prefer object matchers for related-field assertions.Two spots check multiple related fields on the same step with individual
expects — the repo guideline preferstoMatchObjectfor 2+ related values.♻️ Proposed refactor
it('yields the starting node with parent=null and incomingEdge=null', () => { const first = bfs(['A'], forward(sampleEdges)).next(); expect(first.done).toBe(false); if (first.done) return; - expect(first.value.node).toBe('A'); - expect(first.value.parent).toBeNull(); - expect(first.value.incomingEdge).toBeNull(); + expect(first.value).toMatchObject({ node: 'A', parent: null, incomingEdge: null }); }); it('yields parent and incomingEdge for non-start nodes', () => { const steps = [...bfs(['A'], forward(sampleEdges))]; const b = steps.find((s) => s.node === 'B'); - expect(b?.parent).toBe('A'); - expect(b?.incomingEdge).toEqual({ from: 'A', to: 'B' }); + expect(b).toMatchObject({ parent: 'A', incomingEdge: { from: 'A', to: 'B' } }); });As per coding guidelines: "Prefer object matchers (toMatchObject) over multiple individual expect().toBe() calls when checking 2 or more related values in tests".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/1-framework/3-tooling/migration/test/graph-ops.test.ts` around lines 47 - 61, Tests use multiple related expect() calls; replace them with object matchers. In the first test (variable first from bfs(['A'], forward(sampleEdges))) assert the returned step with a single expect(first.value).toMatchObject({ node: 'A', parent: null, incomingEdge: null }) instead of three separate expects; in the second test, find the step b from [...bfs(...)] and replace the two expects on b?.parent and b?.incomingEdge with expect(b).toMatchObject({ parent: 'A', incomingEdge: { from: 'A', to: 'B' } }). Use the existing variables (first, b, steps) and functions (bfs, forward, sampleEdges) to locate and update the assertions.
85-107: Test name overstates what's verified.Title says "reaches target via shortest path when ordering is applied", but both diamond paths have equal length; the assertion only verifies that ordering influences parent selection (tie-break), not path length. Consider renaming to something like
'ordering callback determines parent selection on equal-length paths'to match the actual behavior being asserted.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/1-framework/3-tooling/migration/test/graph-ops.test.ts` around lines 85 - 107, Rename the test to accurately describe what it verifies: update the it(...) title from "reaches target via shortest path when ordering is applied" to something like "ordering callback determines parent selection on equal-length paths" (or similar). The test body uses preferMain, bfs, forward, steps and asserts dStep.parent is 'C', so only the title needs changing to reflect that the test checks tie-breaking/parent selection on equal-length diamond paths rather than shortest-path behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/1-framework/3-tooling/migration/test/graph-ops.test.ts`:
- Around line 47-61: Tests use multiple related expect() calls; replace them
with object matchers. In the first test (variable first from bfs(['A'],
forward(sampleEdges))) assert the returned step with a single
expect(first.value).toMatchObject({ node: 'A', parent: null, incomingEdge: null
}) instead of three separate expects; in the second test, find the step b from
[...bfs(...)] and replace the two expects on b?.parent and b?.incomingEdge with
expect(b).toMatchObject({ parent: 'A', incomingEdge: { from: 'A', to: 'B' } }).
Use the existing variables (first, b, steps) and functions (bfs, forward,
sampleEdges) to locate and update the assertions.
- Around line 85-107: Rename the test to accurately describe what it verifies:
update the it(...) title from "reaches target via shortest path when ordering is
applied" to something like "ordering callback determines parent selection on
equal-length paths" (or similar). The test body uses preferMain, bfs, forward,
steps and asserts dStep.parent is 'C', so only the title needs changing to
reflect that the test checks tie-breaking/parent selection on equal-length
diamond paths rather than shortest-path behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 689ff2ff-1f3d-40e9-a2bb-2c66108d5842
⛔ Files ignored due to path filters (1)
projects/graph-based-migrations/disk-sizing-investigation.mdis excluded by!projects/**
📒 Files selected for processing (9)
docs/planning/benchmarks/migration-graph-baseline.mddocs/planning/benchmarks/migration-planner-baseline.mdpackages/1-framework/3-tooling/cli/test/commands/migration-apply.test.tspackages/1-framework/3-tooling/migration/src/dag.tspackages/1-framework/3-tooling/migration/src/graph-ops.tspackages/1-framework/3-tooling/migration/src/queue.tspackages/1-framework/3-tooling/migration/test/dag.test.tspackages/1-framework/3-tooling/migration/test/graph-ops.test.tspackages/1-framework/3-tooling/migration/test/queue.test.ts
✅ Files skipped from review due to trivial changes (4)
- docs/planning/benchmarks/migration-planner-baseline.md
- packages/1-framework/3-tooling/migration/test/dag.test.ts
- packages/1-framework/3-tooling/cli/test/commands/migration-apply.test.ts
- packages/1-framework/3-tooling/migration/test/queue.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/1-framework/3-tooling/migration/src/queue.ts
- packages/1-framework/3-tooling/migration/src/graph-ops.ts
- packages/1-framework/3-tooling/migration/src/dag.ts
wmadden
left a comment
There was a problem hiding this comment.
Looks good, nice work @saevarb 👍🏻
The generic BFS implementation is a big improvement over the handrolled and repeated graph traversal that preceded it, and the generator interface is a nice touch.
My agent had some nitpicky comments which I won't waste your time with, the only one worth noting is that callers who want the path to the target node need to track it themselves, data which the BFS implementation technically maintains internally but doesn't expose.
…flow The recursive DFS in detectCycles overflowed Node's default stack at around 5,000 linear migration nodes — a plausible size for long-lived projects. Rewrite as an iterative three-colour DFS with an explicit frame stack, preserving the cycle-reconstruction semantics. Ships with a 20,000-node linear-chain regression test that would have triggered `RangeError: Maximum call stack size exceeded` under the recursive implementation.
…nerator Prior state: five hand-rolled BFS loops in dag.ts (findPath, findDivergencePoint, findReachableLeaves, detectOrphans, and a local helper inside findPathWithDecision) with `Array.prototype.shift()` for dequeue (O(n) on V8, making each loop O(V² + E)) and inconsistent state management across sites. This change introduces: - `Queue<T>` (src/queue.ts): minimal amortised-O(1) FIFO with a head-index cursor over an array. - `bfs<E>` (src/graph-ops.ts): a generic generator-based BFS. Direction is expressed by the caller's neighbour closure; each yielded step carries parent + incoming-edge for optional path reconstruction. - dag.ts: every BFS site rewritten as a short for-of loop over `bfs(...)`. `findPathWithDecision` additionally switches from per-edge `findPath` re-traversal (O(|path|·(V+E))) to a single reverse-reachability BFS (O(V+E)). - Assorted tidying: appendEdge helper in reconstructGraph, simplified findLatestMigration null-cascade, justified trailing assertion in findLeaf, section-commented tie-break helpers. Performance note: the generator introduces measurable per-yield dispatch overhead — microbench ops regress roughly 1.7–3.2× versus the old hand-rolled loops (once those have their shift() quadratic fixed). This is imperceptible at realistic call frequency: graph ops run once per CLI invocation (migration status / plan / apply), never in a loop. Absolute wall-clock stays sub-5 ms even at 10,000-node graphs. See the analysis doc for numbers.
Ships the analysis documents that close out (or defer) April milestone VP3's "graph scales with large contracts" question: - docs/planning/benchmarks/migration-graph-baseline.md Scaling of every exported dag operation (findPath, findPathWithDecision, detectCycles, findReachableLeaves, etc.) across shape families, plus a by-inspection review of the CLI rendering pipeline downstream of those ops. - docs/planning/benchmarks/migration-planner-baseline.md Stub marking the planner measurement as deferred — the descriptor-based Postgres planner is being reworked in parallel, so benchmarking the current implementation would measure code that is about to change. - projects/graph-based-migrations/disk-sizing-investigation.md Real-world measurements on three 200–900 model schemas plus compression experiments. Conclusion: today's layout stores both fromContract and toContract in full in every migration manifest (O(20 MiB) per migration on large schemas). Gzip alone is a ~20× win; content-addressed store + zstd with a trained dictionary can reduce total disk by another order of magnitude or two. The benchmark harness itself is not part of this PR. The analysis documents are self-contained with the numbers captured on this branch.
553e0a7 to
bf0f31d
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/1-framework/3-tooling/migration/src/dag.ts (1)
293-316:⚠️ Potential issue | 🟠 MajorHandle the root-only graph before returning the sole leaf.
findReachableLeaves(graph, EMPTY_CONTRACT_HASH)yields the start node when it has no outgoing edges, so a graph containing onlyEMPTY_CONTRACT_HASHreaches Line 316 and returns the root hash instead ofnull, contradicting the new nullable contract documented on Lines 278-280.🐛 Proposed fix
const leaves = findReachableLeaves(graph, EMPTY_CONTRACT_HASH); + if (leaves.length === 1 && leaves[0] === EMPTY_CONTRACT_HASH) { + const nonRootNodes = [...graph.nodes].filter((n) => n !== EMPTY_CONTRACT_HASH); + if (nonRootNodes.length > 0) { + throw errorNoTarget(nonRootNodes); + } + return null; + } + if (leaves.length === 0) { const reachable = [...graph.nodes].filter((n) => n !== EMPTY_CONTRACT_HASH); if (reachable.length > 0) { throw errorNoTarget(reachable);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/1-framework/3-tooling/migration/src/dag.ts` around lines 293 - 316, findReachableLeaves may return the start node (EMPTY_CONTRACT_HASH) when the graph only contains the root with no outgoing edges, causing the function to return the root instead of null; update the final branch in the function that currently returns leaves[0]! to detect this "root-only" case and return null: after computing leaves (from findReachableLeaves) but before returning the sole leaf, check if that sole leaf equals EMPTY_CONTRACT_HASH and that there are no other reachable nodes (e.g., filter graph.nodes or the reachable set to exclude EMPTY_CONTRACT_HASH or check reachable.length === 0); if so return null, otherwise return the leaf as before (preserving the existing ambiguity/divergence checks using findDivergencePoint, findPath, errorAmbiguousTarget, errorNoTarget).
♻️ Duplicate comments (1)
docs/planning/benchmarks/migration-graph-baseline.md (1)
18-18:⚠️ Potential issue | 🟡 MinorMake the benchmark harness pointer durable.
This still points to “the PR body,” which will be brittle after merge. Please include a stable branch/commit, archived harness snapshot, or the exact commands/setup used to reproduce the numbers.
As per coding guidelines,
**/*.{md,ts,tsx}: "Keep docs current (READMEs, rules, links) and prefer links to canonical docs over long comments".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/planning/benchmarks/migration-graph-baseline.md` at line 18, Update the docs/planning/benchmarks/migration-graph-baseline.md entry that currently says "See the PR body" to include a durable pointer: add a stable branch name or exact commit SHA for the benchmark harness (or a permalink to an archived snapshot), and/or include the exact commands, environment, and setup steps used to reproduce the numbers; ensure the text that referenced the PR body is replaced with the chosen durable reference and a brief reproducibility checklist so readers can rerun the harness without relying on the PR description.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/1-framework/3-tooling/migration/src/dag.ts`:
- Around line 293-316: findReachableLeaves may return the start node
(EMPTY_CONTRACT_HASH) when the graph only contains the root with no outgoing
edges, causing the function to return the root instead of null; update the final
branch in the function that currently returns leaves[0]! to detect this
"root-only" case and return null: after computing leaves (from
findReachableLeaves) but before returning the sole leaf, check if that sole leaf
equals EMPTY_CONTRACT_HASH and that there are no other reachable nodes (e.g.,
filter graph.nodes or the reachable set to exclude EMPTY_CONTRACT_HASH or check
reachable.length === 0); if so return null, otherwise return the leaf as before
(preserving the existing ambiguity/divergence checks using findDivergencePoint,
findPath, errorAmbiguousTarget, errorNoTarget).
---
Duplicate comments:
In `@docs/planning/benchmarks/migration-graph-baseline.md`:
- Line 18: Update the docs/planning/benchmarks/migration-graph-baseline.md entry
that currently says "See the PR body" to include a durable pointer: add a stable
branch name or exact commit SHA for the benchmark harness (or a permalink to an
archived snapshot), and/or include the exact commands, environment, and setup
steps used to reproduce the numbers; ensure the text that referenced the PR body
is replaced with the chosen durable reference and a brief reproducibility
checklist so readers can rerun the harness without relying on the PR
description.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 1f6fa39a-cc37-4d97-b6bb-daf45a89b144
⛔ Files ignored due to path filters (1)
projects/graph-based-migrations/disk-sizing-investigation.mdis excluded by!projects/**
📒 Files selected for processing (9)
docs/planning/benchmarks/migration-graph-baseline.mddocs/planning/benchmarks/migration-planner-baseline.mdpackages/1-framework/3-tooling/cli/test/commands/migration-apply.test.tspackages/1-framework/3-tooling/migration/src/dag.tspackages/1-framework/3-tooling/migration/src/graph-ops.tspackages/1-framework/3-tooling/migration/src/queue.tspackages/1-framework/3-tooling/migration/test/dag.test.tspackages/1-framework/3-tooling/migration/test/graph-ops.test.tspackages/1-framework/3-tooling/migration/test/queue.test.ts
✅ Files skipped from review due to trivial changes (4)
- packages/1-framework/3-tooling/cli/test/commands/migration-apply.test.ts
- docs/planning/benchmarks/migration-planner-baseline.md
- packages/1-framework/3-tooling/migration/test/queue.test.ts
- packages/1-framework/3-tooling/migration/src/queue.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/1-framework/3-tooling/migration/test/dag.test.ts
Migration graph: perf refactor + VP3 analysis
Summary
Refactors
dag.tsto consolidate five hand-rolled BFS loops behind a single genericbfs<E>generator + a minimalQueue<T>class, and fixes a real stack-overflow bug indetectCycles. Ships the analysis documents that close (or explicitly defer) the three halves of April milestone VP3 — graph op scaling, planner cost, and on-disk contract size.The PR is three commits:
fix(migration-tools): make detectCycles iterative to avoid stack overflowrefactor(migration-tools): consolidate dag.ts BFS around a generic generatordocs(graph-based-migrations): graph perf + disk-sizing analysisThe benchmark harness used to produce the numbers below is not part of this PR — it lives on a companion branch (see Notes for the reviewer).
Why
detectCycles's recursive DFS threwRangeError: Maximum call stack size exceededat ~5,000 linear migration nodes under Node's default stack — well within the size of a plausible migration history.dag.ts(findPath, findDivergencePoint, findReachableLeaves, detectOrphans, plus a local helper in findPathWithDecision), each with its own queue, visited set, and state machine. Replacing them with one generator makes the direction of traversal a caller concern and collapses each site to a short for-of loop.Behaviour changes
1.
detectCyclesterminates on any graph sizeBefore: recursive DFS; crashed at ~5,000 consecutive linear nodes.
After: iterative DFS with an explicit frame stack. Same three-colour algorithm, same cycle-reconstruction logic, same public signature. No stack-depth limit beyond available memory.
not.toThrow()— packages/1-framework/3-tooling/migration/test/dag.test.ts2. BFS ops unified behind
bfs<E>+Queue<T>Before: each call site maintained its own
const queue = [start]; while (queue.length) { const x = queue.shift(); … }withArray.shift()(O(n) on V8) and site-specific visited/parent bookkeeping.After: a generic
bfs<E>generator at packages/1-framework/3-tooling/migration/src/graph-ops.ts yields{ node, parent, incomingEdge }steps. Direction is expressed by the caller'sneighboursclosure (returns{ next, edge }pairs). A tiny amortised-O(1)Queue<T>at packages/1-framework/3-tooling/migration/src/queue.ts uses a head-index cursor instead ofArray.shift.Each former BFS site becomes a few lines:
findPathWithDecision3.
findPathWithDecisionis O(V + E), not O(|path| · (V + E))Before: the alternative-neighbour loop called
findPath(e.to, toHash)once per outgoing edge along the selected path, so a graph with |path|=1000 incurred 1,000 extra BFS traversals.After: a single reverse-reachability BFS from
toHashpopulates aSet<string>; the per-edge check becomes O(1) set membership. Structural, not cosmetic — the old code was quadratic in path length.findPathWithDecision+ newcollectNodesReachingTargethelper)4.
findLeafreturnsstring | nullinstead of an overloaded sentinelBefore: returned
EMPTY_CONTRACT_HASHfor two unrelated cases (empty graph, root-only graph). Callers had to decode that as "no target".After: returns
string | null—nullmeans "no target". Happy-path callers that constructed non-empty graphs addleaf!where they pass it on tofindPath(8 call sites in CLI tests, all of which construct a deliberate non-empty graph).Design decision worth flagging
The
bfs<E>generator is measurably slower than the hand-rolled loops it replaces. Per-op microbenches show 1.7–3.2× regressions across the suite — generator dispatch and per-yield object allocation aren't free. The absolute numbers are tiny (every op stays below 10 ms on 10,000-node graphs) but the ratio is real.This is an intentional trade:
migration status/plan/apply), not in a loop.findPathWithDecisionon a merge-heavy 1k-spine graph — is under 1 ms. It was 100 ms with the pre-fix quadratic; it's 1 ms now even with generator overhead.The analysis document shows the scaling curves and explains this in more detail.
Conclusions (from the analysis docs)
Graph ops — docs/planning/benchmarks/migration-graph-baseline.md
Linear in edge count across every shape. At 10k-edge graphs, no op exceeds ~5 ms. Rendering pipeline (CLI formatter + dagre layout) sits downstream and is fast in the default path because it truncates before layout. Full-
--graphon huge graphs is a known risk because dagre's rank assignment can degrade, but that's not a new concern introduced by this PR.On-disk contract size — projects/graph-based-migrations/disk-sizing-investigation.md
Every
migration.jsonembeds bothfromContractandtoContractin full with no compression or dedup. Measured on three real Prisma ORM schemas (215, 880, 893 models): ~7 KB per model per contract copy. At the 880-model scale a single migration is ≥ 18 MiB, and 100 migrations ≈ 2 GiB raw.Compression experiments show gzip alone is a ~20× win (contract JSON has heavy structural repetition; compresses to ~4%). A content-addressed store + zstd with a trained dictionary gets a further 50–100× by exploiting cross-contract redundancy that gzip's 32 KiB window can't see. Memo explores three architectural options and recommends gzip-per-file as the quick win, zstd + shared dictionary as the right long-term shape.
Planner cost — docs/planning/benchmarks/migration-planner-baseline.md
Deferred. The migration planner is being reworked and unified for different targets in parallel; benchmarking the current implementation would measure code about to change. Revisit after the rework lands.
Verification
pnpm --filter @prisma-next/migration-tools typecheck— greenpnpm --filter @prisma-next/migration-tools test— 171 tests pass across 10 files (10 newbfstests, 7 newQueuetests, 2 newdetectCycles/ dead-end regression tests, rest unchanged)pnpm --filter @prisma-next/cli typecheck— green after propagatingfindLeaf's nullable return (8 test-site!assertions inmigration-apply.test.ts)pnpm --filter @prisma-next/cli test— 536 tests passNon-goals / out of scope
findLeafAMBIGUOUS_TARGET diagnostic path — remains quadratic-ish in divergent branch count. Only on the error path; noted in the memo but not fixed.Follow-ups
Notes for the reviewer
migration-performance-bench— same three commits as this PR plus one additionalchorecommit that addspackages/1-framework/3-tooling/migration/bench/. Kept off the PR deliberately (graph-based migrations aren't a perf-critical subsystem; not worth maintaining a harness in-tree). Rungit checkout migration-performance-bench+pnpm bench:save+node bench/snapshots/build-chart.mjslocally to reproduce the numbers.Summary by CodeRabbit
Documentation
Bug Fixes
Refactor
Tests