Skip to content

Support subquery moves across AND/OR/NOT WHERE clauses#4051

Open
robacourt wants to merge 13 commits intomainfrom
rob/simple-subqueries-with-dnf
Open

Support subquery moves across AND/OR/NOT WHERE clauses#4051
robacourt wants to merge 13 commits intomainfrom
rob/simple-subqueries-with-dnf

Conversation

@robacourt
Copy link
Copy Markdown
Contributor

@robacourt robacourt commented Mar 24, 2026

Summary

This PR lets shapes with boolean combinations around subqueries stay live when dependency rows move. Previously, shapes that used AND, OR, or NOT around a subquery would return 409 on a move, forcing the client to throw away the shape and resync it from scratch. For large shapes that could mean a very expensive full resync; this PR avoids that by reconciling those changes in-stream.

The core of the PR is a rewrite of the subquery foundation. Move-ins are now handled as exact log splices: buffer outer-table transactions, run the move-in query, and insert its results into the shape log at the precise point where they become valid. That gives subqueries a much more reliable ordering model, removes a large class of move-handling edge cases, and gives us a stable base for richer boolean WHERE support over subqueries.

Compatibility

This PR changes the wire protocol.

The sync service from this branch is not compatible with elixir-client from before this PR. For the TanStack DB client, this requires:

  • @tanstack/db >= 0.6.2
  • @tanstack/electric-db-collection >= 0.3.0

The protocol change is needed because DNF-based subquery shapes need more than tags alone. Rows can now stay in the shape for multiple independent reasons, so the server sends real active_conditions values and position-aware move-in broadcast control messages. That lets clients update the truth value of the affected DNF positions for rows they already have locally, re-evaluate inclusion correctly, and avoid unnecessary deletes or full resyncs.

Architecture

  • compile each shape's WHERE clause into a DNF plan and use that plan consistently for routing, move planning, SQL generation, tags, and active_conditions
  • replace the previous subquery reverse-indexing approach with a per-shape SubqueryIndex that each shape consumer seeds and updates from its own dependency views, keeping the reverse index exactly in sync with consumer state
  • refactor Electric.Shapes.Consumer around explicit event handlers and ordered effects, with a single move queue and explicit buffering/splice handling
  • emit DNF-aware row metadata and position-aware move broadcasts so visibility can change incrementally instead of forcing invalidation

Secondary changes

  • update the materializer and clients to understand DNF-aware tags, active_conditions, and move broadcasts
  • expand coverage across DNF planning, routing, move ordering, resume behavior, and boolean/subquery edge cases
  • update the oracle test harness and generators to cover the new boolean/subquery behavior more accurately

@robacourt robacourt force-pushed the rob/simple-subqueries-with-dnf branch 2 times, most recently from c7da8bb to f33c781 Compare March 25, 2026 14:38
@robacourt robacourt force-pushed the rob/simple-subqueries-with-dnf branch 2 times, most recently from 6669eb8 to a1e0fc1 Compare March 26, 2026 12:38
@robacourt robacourt force-pushed the rob/simple-subqueries-with-dnf branch from 226f9b3 to 203f4fc Compare March 26, 2026 16:51
@robacourt robacourt force-pushed the rob/simple-subqueries-with-dnf branch 6 times, most recently from 67f228e to 438e6cf Compare April 13, 2026 14:22
@robacourt robacourt force-pushed the rob/simple-subqueries-with-dnf branch 3 times, most recently from 66d52fe to fb8a7d9 Compare April 15, 2026 15:40
@electric-sql electric-sql deleted a comment from github-actions Bot Apr 15, 2026
@electric-sql electric-sql deleted a comment from github-actions Bot Apr 15, 2026
@electric-sql electric-sql deleted a comment from github-actions Bot Apr 15, 2026
@electric-sql electric-sql deleted a comment from github-actions Bot Apr 15, 2026
@electric-sql electric-sql deleted a comment from netlify Bot Apr 15, 2026
@electric-sql electric-sql deleted a comment from github-actions Bot Apr 15, 2026
@electric-sql electric-sql deleted a comment from github-actions Bot Apr 15, 2026
@electric-sql electric-sql deleted a comment from codecov Bot Apr 15, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 85.41667% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.29%. Comparing base (59a96b8) to head (801d714).
⚠️ Report is 8 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...kages/elixir-client/lib/electric/client/message.ex 0.00% 7 Missing ⚠️
...s/elixir-client/lib/electric/client/tag_tracker.ex 94.87% 4 Missing ⚠️
packages/elixir-client/lib/electric/client/poll.ex 66.66% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4051      +/-   ##
==========================================
- Coverage   89.20%   86.29%   -2.92%     
==========================================
  Files          25       51      +26     
  Lines        2520     3553    +1033     
  Branches      633      638       +5     
==========================================
+ Hits         2248     3066     +818     
- Misses        270      485     +215     
  Partials        2        2              
Flag Coverage Δ
elixir 79.18% <85.41%> (?)
elixir-client 79.18% <85.41%> (?)
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start 82.83% <ø> (ø)
packages/typescript-client 94.30% <ø> (ø)
packages/y-electric 56.05% <ø> (ø)
typescript 89.20% <ø> (ø)
unit-tests 86.29% <85.41%> (-2.92%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@robacourt robacourt changed the title DRAFT: simple subqueries with DNF DRAFT: keep boolean subquery shapes live with DNF move handling Apr 18, 2026
@robacourt robacourt changed the title DRAFT: keep boolean subquery shapes live with DNF move handling Support subquery moves across AND/OR/NOT WHERE clauses Apr 18, 2026
@robacourt robacourt force-pushed the rob/simple-subqueries-with-dnf branch from 413acaf to 4208d42 Compare April 18, 2026 20:46
@robacourt robacourt marked this pull request as ready for review April 20, 2026 06:35
@robacourt robacourt requested a review from icehaunter April 20, 2026 06:35
@claude
Copy link
Copy Markdown

claude Bot commented Apr 22, 2026

Claude Code Review

Summary

This PR is a substantial architectural overhaul that adds DNF-based WHERE clause decomposition to enable subquery move-ins/outs across AND/OR/NOT compound expressions, avoiding expensive 409-forced resyncs. The core design — a functional effects layer, position-indexed DNF, SubqueryIndex ETS routing, and splice-based log insertion — is well thought out and comes with strong test coverage (~4000+ lines of new tests).

What's Working Well

  • Effects architecture: Decoupling functional event handlers from imperative side effects (Effects.execute/3) makes the code testable and the control flow explicit. Clean separation of business logic from storage/task interactions.
  • DNF position-indexed format: The expanded fixed-width format where each position maps to a specific conjunction in the original query structure elegantly supports tracking active_conditions across multiple disjuncts without per-row re-evaluation.
  • active_conditions: []nil normalization in tag_tracker.ex: The tag tracker correctly converts the default wire value [] to nil (no DNF), preventing false classification of non-subquery rows as DNF-aware. The case headers.active_conditions do [] -> nil; nil -> nil; ac -> ac end guard is the right defensive touch.
  • Test coverage: Decomposer, SQL generator, DNF plan, filter, event handler, and shape tests are all comprehensive. The property-based round-trip test for SqlGenerator is particularly valuable.
  • Bounded task supervision: Task.Supervisor.start_child (not bare Task.async) for move-in queries; @max_disjuncts 100 guard against combinatorial explosion in DNF cross-products.
  • Changeset file present: Protocol change is documented and client version requirements are called out.

Issues Found

Important (Should Fix)

1. reraise followed by dead code in query_move_in_error handler

File: packages/sync-service/lib/electric/shapes/consumer.ex:349-358

def handle_info({:query_move_in_error, error, stacktrace}, state) do
  Logger.error(...)
  reraise(error, stacktrace)
  # No-op as the raise will crash the process   ← the comment admits the problem
  stop_and_clean(state)                          ← unreachable
end

reraise crashes the GenServer, calling terminate/2 which runs terminate_writer + ShapeCleaner.handle_writer_termination. But stop_and_clean likely does additional work (e.g., marking the shape for 409 invalidation and notifying subscribers). That path is skipped on the crash route.

Suggested fix — use a controlled stop rather than a crash:

def handle_info({:query_move_in_error, error, stacktrace}, state) do
  Logger.error("Error querying move in for #{state.shape_handle}: #{Exception.format(:error, error, stacktrace)}")
  {:stop, {error, stacktrace}, state}
end

This still calls terminate/2 but is intentional (:temporary restart means the supervisor won't respawn it), avoids dead code, and gives ShapeCleaner a clean reason tuple to inspect.


2. Mixed-polarity subquery validation happens at runtime, not at shape-creation time

File: packages/sync-service/lib/electric/shapes/dnf_plan.ex:151-172

DnfPlan.build_dependency_polarities/1 returns {:error, "a subquery dependency cannot be used with both positive and negative polarity..."} when the same dependency appears both positively and negated (e.g. col IN (SELECT ...) AND NOT (col IN (SELECT ...))). This error surfaces only when DnfPlan.compile/1 is called at runtime — not during Shape.new/2 / validate_where_clause/3.

A shape with mixed polarity would be created successfully and written to storage, then fail the first time it tries to serve data. Depending on how callers handle {:error, reason} from compile/1, this could cause a silent failure or an unhandled match crash.

Suggested fix: call DnfPlan.compile/1 (or inline build_dependency_polarities) during Shape.make_new/1's validation pipeline, after build_shape_dependencies. Return {:error, reason} from shape creation instead.


3. SplicePlan.build error path needs a test

File: packages/sync-service/lib/electric/shapes/consumer/event_handler/subqueries/buffering.ex:127-168

defp splice(%{active_move: active_move} = state) do
  with {:ok, splice_plan} <- SplicePlan.build(active_move, state.shape_info) do
    ...
  end
end

If SplicePlan.build/2 returns {:error, reason}, the with expression propagates it as the result of splice/1maybe_splice/1handle_event/2. It's not clear from the code whether handle_apply_event_result/2 in consumer.ex correctly handles this case or whether it would cause a function-clause crash. No test currently exercises the SplicePlan.build failure path. Given the complexity of the splice-boundary logic, a test covering this branch would give confidence that the consumer does not silently drop the error or crash unexpectedly.


4. No linked issue

The PR description is detailed, but it does not reference any issue number. PRs should link to the issue they address so context and requirements are discoverable from the code history.


Suggestions (Nice to Have)

5. String.to_integer in event-handler hot path

File: packages/sync-service/lib/electric/shapes/consumer/event_handler/subqueries/buffering.ex:89

dep_index = subquery_ref |> List.last() |> String.to_integer()

String.to_integer/1 raises ArgumentError on non-numeric input. The value comes from an internal shape definition, so it should always be a valid integer — but a defensive Integer.parse/1 + explicit error would make failures more diagnosable than a bare ArgumentError in a hot path. (Same pattern appears in dnf_plan.ex:113.)


6. DnfPlan.compile/1 is called on every operation and never cached

The plan is deterministic for a given shape and involves AST traversal + SQL generation for all positions. For shapes with complex subquery DNFs, repeated compilation per transaction adds up. Consider storing the compiled plan in the consumer's process state (alongside shape) or as a lazy field on the ShapeInfo struct.


7. Missing telemetry for buffer overflow and splice completion

Effects.query_move_in_async emits [:electric, :subqueries, :move_in_triggered] — good. But buffer overflow ({:error, :buffer_overflow} in Buffering.handle_event) and successful splice completion (Buffering.splice/1) have no telemetry. Under load these would be invisible in dashboards. A counter for overflow drops and a duration/count for splices would help operators correlate consumer stalls with buffer saturation.


8. Changeset description could be more specific about protocol fields

The current changeset note says "changes the wire protocol" but doesn't describe what changed (new active_conditions field on change messages, new move-broadcast shape, tag format changes). Operators running rolling deployments need to know the exact incompatibility window. A brief bullet list of new/changed wire fields would help.


Issue Conformance

No linked issue is present, so there are no formal requirements to cross-reference. The PR description is thorough and accurately describes the architecture. The compatibility matrix (@tanstack/db >= 0.6.2, etc.) is explicit, which is appreciated.


Review iteration: 1 | 2026-04-22

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants