design: recorders — recorded, non-deterministic, at-least-once transformations#36909

Draft
antiguru wants to merge 29 commits into main from claude/sleepy-wright-IuxeS

antiguru commented Jun 4, 2026

Member

What

A design doc proposing recorders: a way to house transformations that must record a decision made at processing time and never recompute it — non-deterministic, side-effecting, or time-dependent logic (including UDFs). The guarantee is at-least-once application of the function and at-most-once persistence of its result (committed once via a per-commit CAS).

This is the write-side complement to the CHANGES table function (#36869, draft) and a deliberate revival of the removed Continual Tasks feature (database-issues#9694 / PR #35967), which was pulled for lack of design consensus and incompleteness, not because the model is unsound.

Docs: doc/developer/design/20260604_recorders.md (conceptual) and 20260604_recorders_implementation.md (feasibility). The model matches the WG "Recorders" design (Notion); these docs are the long-form version.

The calculus

  • a TVC (time-varying collection): a normal collection; its change is implicit.
  • a dTVC: a changelog carrying its change as data, in a time column and a diff column (ordinary, user-named).

differentiate (CHANGES) and integrate (INTEGRATE) are a carrier-preserving inverse pair, both in the input's timeline (domain A). The cross-domain, non-deterministic leg — picking the commit time T, sampling frozen values like now() — lives only in RECORD. freeze is sample-and-hold.
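
As a rough illustration of the inverse-pair claim, the following Python sketch models a TVC as a map from time to a multiset of rows and a dTVC as a list of `(time, row, diff)` tuples. The function names `differentiate` and `integrate` mirror the doc's terms, but the representation, the `as_of` parameter, and the sample data are assumptions made for this example, not the proposed SQL surface or any engine API.

```python
from collections import Counter


def differentiate(snapshots):
    """Turn {time: Counter(row -> multiplicity)} into changelog rows (time, row, diff)."""
    changelog = []
    previous = Counter()
    for time in sorted(snapshots):
        current = snapshots[time]
        for row in sorted(set(previous) | set(current)):
            diff = current[row] - previous[row]
            if diff != 0:
                changelog.append((time, row, diff))
        previous = current
    return changelog


def integrate(changelog, as_of):
    """Sum diffs per row for changes at or before `as_of`; drop rows that net to zero."""
    totals = Counter()
    for time, row, diff in changelog:
        if time <= as_of:
            totals[row] += diff
    return Counter({row: n for row, n in totals.items() if n != 0})


# Hypothetical sample data: three snapshots of one collection.
snapshots = {
    1: Counter({"a": 1}),
    2: Counter({"a": 1, "b": 2}),
    3: Counter({"b": 1}),
}
changelog = differentiate(snapshots)

# Round trip: integrating the changelog reproduces every snapshot at its own time.
assert all(integrate(changelog, t) == snapshots[t] for t in snapshots)
```

Both functions stay in the input's time domain; nothing in this sketch picks a commit time or samples a frozen value, which is the part the doc reserves for RECORD.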

The surface (no new collection kind)

-- a regular table that declares its carriers + an associated progress collection.
CREATE TABLE enriched (
  a BIGINT, b BIGINT, c BIGINT, val TEXT,
  change_ts mz_timestamp NOT NULL, change_diff bigint NOT NULL
) WITH (PROGRESS, TIMESTAMP = change_ts, DIFF = change_diff);   -- inits progress to (0,0)

-- the one new standing object; bare query like an MV, INTO like a SINK.
CREATE RECORDER enrich INTO enriched AS
  SELECT e.*, d.val
  FROM CHANGES(events AS OF AT LEAST mz_now() - INTERVAL '1 hour'
               USING TIME change_ts, DIFF change_diff) e
  JOIN dim d ON e.fk = d.key;            -- bare TVC ref ⇒ frozen lookup

-- argument-free: carriers + progress come from the table.
CREATE MATERIALIZED VIEW enriched_now AS SELECT * FROM INTEGRATE(enriched);

-- bounding is ordinary DELETE; integral-preserving = data-domain compaction.
  • The store is a regular table that declares its carriers + an associated progress collection via WITH (PROGRESS, …) (initialized to (0,0); empty progress = sealed frontier). No DELTA TABLE collection kind.
  • change_diff is data, not the persist multiplicity. CHANGES emits each change as a +1 row with the signed count in the column.
  • INTEGRATE(table) is argument-free — carriers and the progress collection come from the table; it accumulates DIFF per row, thresholding at max(0, Σ) internally (safe by construction; repeat_row not exposed). A stateful reduce, memory ∝ live output.
  • The progress collection is owned by the table, engine-written / user-read-only (the CREATE SOURCE … EXPOSE PROGRESS AS precedent); INTEGRATE reclocks through it. A RECORD writer only writes a lane. Taking a table (not arbitrary SQL) is what keeps the progress lookup unambiguous.
  • Application shaping — dedup, first-seen, upsert, top-k — lives in the RECORD body, not in INTEGRATE; first-seen/upsert bodies are self-referential (Tier 3).
  • Bounding is ordinary DELETE; the integral-preserving form is data-domain compaction — a clamp + GROUP BY/SUM reduce, not a bare UPDATE.
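
The thresholded accumulation and the "clamp + GROUP BY/SUM" compaction described in the bullets above can be sketched in Python as follows. This is a minimal model under stated assumptions: rows are `(data, change_ts, change_diff)` tuples, `horizon` is a hypothetical name for the time the clamp rounds up to, and reading "clamp" as `max(change_ts, horizon)` is an interpretation for this example rather than something the doc spells out.

```python
from collections import defaultdict


def integrate(rows):
    """Accumulate change_diff per data row, thresholding the result at max(0, sum)."""
    totals = defaultdict(int)
    for data, _change_ts, change_diff in rows:
        totals[data] += change_diff
    result = {}
    for data, total in totals.items():
        thresholded = max(0, total)
        if thresholded > 0:
            result[data] = thresholded
    return result


def compact(rows, horizon):
    """Clamp change_ts up to `horizon`, then group by (data, change_ts) and sum the diffs."""
    summed = defaultdict(int)
    for data, change_ts, change_diff in rows:
        summed[(data, max(change_ts, horizon))] += change_diff
    return [(data, ts, diff) for (data, ts), diff in sorted(summed.items()) if diff != 0]


# Hypothetical changelog rows.
changelog = [
    ("x", 1, +1),
    ("x", 2, -1),
    ("y", 2, +1),
    ("y", 3, +1),
    ("z", 4, -1),  # a stray retraction; thresholding keeps the output non-negative
]
compacted = compact(changelog, horizon=3)

# Compaction shrinks the changelog but leaves the current integral unchanged.
assert integrate(compacted) == integrate(changelog)
```

In this model a bare UPDATE that rewrote only `change_ts` would leave the cancelled `x` rows in place; it is the grouped sum that lets them disappear while the per-row totals stay the same.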

Decisions worth reviewers' attention

  • Per-table progress collection + per-writer lanes. B is the table's single shard frontier ⇒ exactly one progress collection per table; all RECORDs and INTEGRATEs share it (multiple readers consistent by construction). Multi-writer = per-writer lanes R_i: B → X_i, table A-completeness = meet_i R_i[B_upper] at read time (a merged frontier can't work). Drop the last writer → freeze or seal (open).
  • Optimizer barrier (hard invariant): the recorded table is authoritative; never recomputed from the RECORD body's inputs.
  • Freeze is typing, not a keyword — bare TVC ref in a RECORD body = frozen; must be diagnosable, never silent (EXPLAIN + plan-time NOTICE). In a body, now() (frozen wall-clock) ≠ mz_now() (domain-A event time).
  • Aging domain decided: retention defaults to wall-clock (domain B), event-age (A) opt-in. Where RETAIN HISTORY sits (table vs INTEGRATE consumer vs recorder — leaning INTEGRATE) is open.
  • GDPR erasure = DELETE + advancing since; a cascade DELETE alone is not erasure.
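
To make the read-time meet over per-writer lanes concrete, here is a small Python sketch. It assumes totally ordered integer timestamps (so the meet is a plain `min`), models each lane as a sorted list of hypothetical `(b_time, a_frontier)` bindings, and reads `R_i[B_upper]` as "the latest binding at or before `b_upper`", defaulting to 0 before the first binding. The lane names and helper functions are illustrative only.

```python
from bisect import bisect_right


def lane_lookup(lane, b_upper):
    """Return the domain-A frontier recorded by the latest binding at or before b_upper."""
    b_times = [b for b, _a in lane]
    index = bisect_right(b_times, b_upper)
    return lane[index - 1][1] if index > 0 else 0


def table_a_frontier(lanes, b_upper):
    """Meet (min, for totally ordered times) of every active lane evaluated at b_upper."""
    return min(lane_lookup(lane, b_upper) for lane in lanes.values())


# Hypothetical lanes: each binding says "at system time b, this writer was complete through a".
lanes = {
    "fast_writer": [(10, 100), (20, 250), (30, 400)],
    "slow_writer": [(10, 90), (30, 120)],
}

frontier_at_20 = table_a_frontier(lanes, 20)  # held back by slow_writer
frontier_at_30 = table_a_frontier(lanes, 30)

# Dropping the slow writer removes its lane from the meet, so the frontier jumps forward.
remaining = {name: lane for name, lane in lanes.items() if name != "slow_writer"}
frontier_after_drop = table_a_frontier(remaining, 30)
```

With no lanes left, `min` has nothing to fold over and raises; that corresponds to the freeze-or-seal question the bullet above leaves open for dropping the last writer.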

SQL-consistency pass

An agent reviewed the surface against Materialize's grammar (file:line) and prior art:

  • Carriers via a USING clause / table WITH (…), not => named args (MZ has no named-arg support; => already means map-entry). CHANGES's AS OF parses like SUBSCRIBE (query) AS OF …. Dropped the AS RECORD(...) wrapper; IN DOMAIN → WITH (TIMELINE = …).
  • Deliberate divergences: DIFF is a signed multiplicity, not an op-enum (cf. Snowflake METADATA$ACTION, Flink +I/-U, Debezium c,u,d); CHANGES knowingly collides with Snowflake's CHANGES clause.

Status

Draft, for discussion. De-risked by the CHANGES PR (#36869) and the BEGIN CONTINUAL TRANSACTION prototype. Gating dependency: the (unbuilt) OCC timestamped-write substrate — INTEGRATE and bounding DML are off that critical path. Collapsing to a regular table + progress collection removes the speculative DELTA TABLE kind (impl M2) and reshapes engine-owned compaction (M3/M5). Open agenda: co-design the CHANGES carrier spelling with #36869, the RECORDER keyword, commit-timestamp policy, RETAIN HISTORY placement, freeze-vs-seal on drop-all.

Stakeholders: @antiguru + the WG Continual Tasks folks (Aljoscha, Seth, Frank).

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H

claude added 8 commits June 4, 2026 17:48
…e transformations)

Design doc reviving the continual-tasks capability on top of the CHANGES
table function (#36869) and the OCC read-then-write commit substrate.
Presents a declarative recorded-collection surface and the imperative
BEGIN CONTINUAL TRANSACTION surface as two altitudes of one engine, with
a layered correctness ladder (exactly-once into persist / eventual via an
MV / at-least-once external). Motivated by stream-table joins, finalization,
upsert-in-compute, and non-deterministic/UDF enrichment.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Elevate unbounded output/state growth to a first-class concern: the
record-don't-recompute vs bounded-resources tension, the output models
(append+retention / keyed-upsert / writable table) and their growth
stories, and the internal reclamation invariants (retractions must
consolidate with their inserts; bounded growth <=> bounded since-lag <=>
bounded retention window; RETAIN HISTORY trades off against growth).
Prefer keyed/upsert output reusing the storage upsert machinery over the
two-shard+MV pattern whose working shard is itself the growth risk.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Reframe output growth around how data leaves: age/time-based phase-out is
automatic via the temporal-filter + lagged-read-hold mechanism (clock-driven,
self-finalizing, no re-integration, no self-reference), NOT a DELETE;
supersession is a declared key emitting the exact prior-value retraction;
arbitrary predicate eviction is explicit DELETE in the imperative surface
only. Decouples when output is produced (input-driven) from when old data is
forgotten (clock-driven). Drop the upsert_continual_feedback framing (it's an
ordinary keyed arrangement). Unbounded key spaces are fine; bound time, not keys.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…output mode

Collapse the keyed/upsert output model: upsert = top-1 (top-k), so the reduce
already emits the exact retract/insert diffs and the output just records them
(O(live keys)). No declared key, no special output mode, no per-commit
self-reference. Output side reduces to: record the body's diffs + optional
time-based retention. The real distinction from a plain top-k MV is input-side:
consume input as a listen-only changelog and persist the reduce state as the
output (rehydrated on restart), so the upstream input can be forgotten without a
rehydration snapshot.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…ential lifetime

Time-based retention is one corner of eviction, not the whole story. A
recorded row has two independent relationships to its sources: its value
(frozen at processing time) and its existence (which can stay live). The
join structure governs lifetime; a FROZEN(...) marker governs which values
are snapshotted. Compliance (erase a deleted user's records) is referential
ownership / ON DELETE CASCADE -- a physical retraction, not a read-time
filter -- costing an index on the output's liveness key. This is a cost
spectrum chosen per relationship: fully-frozen enrichment needs no stream
index; live-existence enrichment indexes the output by its liveness key. The
currency-conversion-with-compliance case needs both behaviors in one transform.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Capture the surface question: the asymmetric stream-driven join wants to be
explicit (prior art: Flink temporal/lookup join, Kafka stream-table join), but
a single keyword conflates value-freshness and existence-freshness. STREAM JOIN
is the fully-frozen floor; FROZEN() and a live-existence opt-in handle the
deviations (the compliance cascade is a capability Flink/Kafka lack). The
construct only yields non-definite results, so it is valid only inside a
recorded collection -- possibly a third altitude alongside the object and the
imperative transaction.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Lead with the three orthogonal primitives -- RECORD (definite-by-persistence),
COMMIT@T (processing-time triggered timestamped write), FROZEN (per-value
snapshot cut) -- plus CHANGES as a boxed dependency. Recast everything else as
composition: a reconstruction table (use case -> primitives + SQL), the
stream/table-duality + sample-and-hold framing, the determinism boundary with a
single lint rule (FROZEN/processing-time writes legal only inside a RECORD
output), and eviction / STREAM JOIN / surfaces / correctness ladder all
explicitly derived. Resolve the freshness default (frozen-by-default in a stream
context; live existence opt-in). Add bespoke-features-per-use-case and
FROZEN-as-join-modifier to Alternatives.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…TVC-dTVC calculus

Rename the feature to 'Recorders'. Lead with the four-operation calculus
(differentiate / integrate / record / bound) over TVC <-> dTVC, with freeze as
sample-and-hold. Concrete surface: CREATE DELTA TABLE and CREATE RECORDER with
RECORD / INTEGRATE / DELETE actions and a worked dedup example. Freeze is
typing, not a keyword (bare TVC reference = frozen; CHANGES/DELTA TABLE =
tracked); compliance cascade = a CHANGES(dim)-driven DELETE action. Add the key
evaluation rule (pre-commit reads, atomic writes at T, no intra-commit
fixpoint), the two stream-table joins (as-of-event-time definite vs
processing-time recorded) and the history/record duality, INTEGRATE monotonicity
clamp, and restart behavior. Reserve 'continual transaction' for the imperative
surface; retire 'task'/'transform'.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
antiguru changed the title from "design: continual transforms — recorded, non-deterministic, at-least-once transformations" to "design: recorders — recorded, non-deterministic, at-least-once transformations" Jun 5, 2026
claude added 3 commits June 5, 2026 12:15
…king claim

Add a feasibility section from a codebase pass against the recovered Continual
Tasks implementation: the gating dependency is the unbuilt OCC timestamped-write
substrate + the data-plane->control-plane hand-off (the storage-level atomic
multi-shard write via txn-wal commit_at already exists); the control-plane
commit path is mandatory (CTs' bespoke sink bypassed txn-wal); sharp edges
(self-reference reclocking, multi-output catalog/controller model, freeze-by-
typing, DELTA TABLE collection kind) all hit by CTs. Correct the overclaim that
the no-fixpoint rule removes ALL T-1 reclocking: the cross-commit self-read
still needs the CT step_forward/read-hold machinery.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Companion to the recorders design. Picks the control-plane-commit architecture
(Option B; the CT compute-sink path is a single-output dead end), names the
unbuilt OCC timestamped group-commit + dataflow->control-plane hand-off as the
gating dependency (storage-level commit_at/UpperMismatch already exists), and
gives a per-crate change map, a ranked risk register (commit substrate;
self-reference reclocking; one-item-N-outputs catalog model; freeze-by-typing;
DELTA TABLE collection kind; INTEGRATE compaction), what to salvage vs rebuild
from the removed CT code, a phased plan, and open questions. Main doc points to
it.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
claude added 11 commits June 5, 2026 17:38
Close the gap that the design pinned down the row mz_timestamp (data) but not
the output collection's frontier (progress). State that an output's upper
advances with the meet of its input frontiers (like an MV), independent of the
mz_timestamp data; the write-time clamp (>= upper) is what makes monotone
frontier advance sound; and idle recorders must still tick output uppers forward
(clock-driven frontier-only commits) or downstream reads stall. Add impl risk M4
noting the mechanism already exists via append_table's advance_to and must be
wired into the Phase-0 commit loop. Tie the commit-cadence open question to its
frontier face.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Close the correctness gap: a fact definite at logical time t takes effect in a
recorded output at commit time t' >= t (cannot write below upper / retroact), so
the output is bitemporal -- frontier/logical time = system time (recorded-by),
mz_timestamp column = event time. Defining property: output(T) = recorded-by-T,
not f(inputs(T)). RECORD -> DELTA TABLE is bitemporal (event-time queries
recoverable via mz_timestamp); INTEGRATE -> TVC is reclocked to T and discards
event time (use for current state). Same reclocking ingestion does, but
re-stamps an already-timestamped fact. Impl doc: optimizer/controller must not
treat a recorder output as a recomputable function of inputs.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…tem time)

Supersede the weaker 'INTEGRATE reclocks to T, non-deterministic history'
framing. Treat a recorder like source ingestion: record a durable reclock
t -> t' (event-time completeness vs system-time write frontier) and use it to
drive the integrated collection's frontier. INTEGRATE places each fact at its
event time (max(mz_timestamp, upper); logical compaction for late data) on the
input's timeline, so the integrated collection is a definite function of the
recorded DELTA TABLE + reclock -- stable history, consistent downstream
composition. Non-determinism is confined to the recorded values + the reclock;
everything downstream is definite. A proper bitemporal object (query by event
time or system time). Reuse source reclock/remap machinery. Impl doc: the RECORD
step is the only non-deterministic boundary; INTEGRATE outputs are recomputable
over the recorded data; new reclock/remap component per recorder.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…rontier/erasure caveats

Adversarial probe found a real bug: the max(mz_timestamp, upper) 'logical
compaction of late data' splits retract/insert pairs across a frontier advance
and is unsound for updates. Fix: place each fact at its event-time mz_timestamp
and advance the integrated frontier only after integrating all input through
that time -- no clamp, late data is impossible for a well-formed input frontier,
INTEGRATE is an ordinary dataflow over the DELTA TABLE. This dissolves the
replica-nondeterminism and non-monotone-reclock concerns (placement is
event-time only; t' is allowed-nondeterministic when-recorded metadata; the
reclock relates two monotone frontiers).

Also: clarify event time and system time are two timestamps on the SAME timeline
(t' >= t), so recorder outputs are lagged system-timeline collections and
cross-table joins work (no separate-timeline wall). Add the dual-frontier
requirement (time-based aging needs a system/wall-clock frontier, since the
event-time frontier stalls when input is idle; mz_now() = processing time).
Correct the self-referential prune to read the system-time frontier (the
one-tick convergence claim held only on system time). Document the
freeze-back-stamping caveat (frozen value = processing-time fact at event-time
stamp, not historical truth) and the GDPR-erasure vs stable-history tradeoff.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Revert last commit's over-correction. mz_timestamp is arbitrary data (could be
0), a distinct timeline from the table's system/write time; you cannot use one
for the other. INTEGRATE must clamp max(mz_timestamp, upper) to hold the TVC
invariant (logical compaction of below-frontier data) -- this is sound because a
same-row +1/-1 accumulates correctly regardless of times, so it is NOT the
split-retract/insert bug. The output lives on the system timeline (joinable, no
separate-timeline wall); event-time answers come from filtering the mz_timestamp
column. Time-based aging works via mz_now()=system time on the clock (no
dual-frontier needed); self-ref prune is automatically system-time. The
reclock (inherent: each row carries mz_timestamp + write time) makes the clamped
integration reproducible. Keep the GDPR-erasure-vs-history and
queryability-vs-compaction tradeoffs.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…A), not system time

Correct the timeline model precisely. Two domains: A = input timeline
(mz_timestamp lives here), B = recorder processing/system time (the DELTA TABLE's
physical write frontier). RECORD reads A, writes data into the DELTA TABLE (B),
and notes the A->B mapping (reclock). INTEGRATE reads the DELTA TABLE (B), places
output by mz_timestamp (A), and uses the reclock to drive its output frontier
back into domain A -- so an event at input-time t appears in the output at t
(round-trip A->B->A), preserving consistency (an event at t happens at t
throughout the system). The output is NOT on system time. The clamp
max(mz_timestamp, upper) remains as the safety net for arbitrary below-frontier
mz_timestamp data. Re-open the genuine domain question: aging/mz_now() in domain
A (event-age, stalls on idle input) vs domain B (wall-clock). Cite the
reclocking framework design (20210714).

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…SUBSCRIBE)

Adopt 'frontier as data': rather than a separate reclock collection, encode the
A->B mapping in-band as SUBSCRIBE-style mz_progressed rows in the DELTA TABLE, so
a DELTA TABLE is a persisted subscribe stream (data + progress). Symmetric
partner of differentiate: CHANGES turns changes into data (mz_diff), progress
markers turn the frontier into data (mz_progressed), INTEGRATE reads both back.
Key benefit: RECORD's data + reclock are one shard, so exactly-once is a single
CAS (no multi-shard atomicity for the reclock; they can't diverge); replica races
are an exactly-once concern resolved by that CAS, not a correctness problem.
Separate-reclock-collection noted as the alternative.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…el change)

Walk back over-adoption of in-band progress markers. It is a representation
choice that does not change the two-domain/reclock model, and it makes the DELTA
TABLE harder to consume (mz_progressed-style progress-marker noise). Its
single-shard-CAS benefit is marginal because the multi-output bundle already
needs a multi-shard txn, so committing a separate reclock atomically with the
data is nearly free. Present both representations neutrally (lean separate for
data cleanliness); keep the conceptual changes->data / frontier->data symmetry.
Reframe replica races as exactly-once (CAS on the recording commit), not
determinism. Add the open question.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Record the decision: the reclock is a separate, engine-owned collection (the
source-remap pattern), not data in the DELTA TABLE. Rationale: clean
argumentation (a separable mapping R; v provably = INTEGRATE(DELTA TABLE) driven
by R) and no tampering/validation (engine-owned metadata with assumed
invariants, not 'just data' in a writable table), plus independent retention.
State the DELTA TABLE = user data / reclock = control-plane bookkeeping
boundary. The reclock commits in the same multi-shard bundle txn as the
recording, so the extra shard is free. In-band mz_progressed markers moved to
Alternatives as considered-and-rejected; remove the now-decided open question.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
… RECORDER bundle)

Record the model decisions: RECORD/INTEGRATE/prune are separate,
independently-created/dropped objects over DELTA TABLEs -- NOT one atomic
RECORDER bundle. Cross-object consistency comes from reading at a common logical
time; each object commits via a frontier-gated OCC write (compute through X,
commit at X+1), which is also what makes the split-object dedup safe (the prune
sees finalized data <= X). The DELTA TABLE names its domain and owns its reclock;
INTEGRATE reclocks by the delta table's reclock (one writer commits data+reclock
together). Updated surfaces, dependencies, MVP, feasibility, open questions; and
the impl doc's architecture fork (per-object control-plane commit, no
multi-output bundle), risk register (H3 object model demoted to Med), and phased
plan. Drops the agent's gaps that the logical-time/frontier framing dissolves.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…lta tables

- INTEGRATE becomes a read operator (dual of CHANGES) usable in plain MVs, not
  an object kind; it is the typing boundary (mz_timestamp/mz_diff are data inside
  its argument, gone in its result). Fix the dedup example accordingly, and note
  the timestamp-as-data stability pitfall (subject to the compaction clamp).
- DELTA TABLEs are mutable: bounding is ordinary DELETE/UPDATE DML, defined as
  consolidating (retract at original mz_timestamp), distinct from forward age-out.
  The standing OCC pruner is deferred, keeping v1 bounding off the OCC critical path.
- The only new standing object is the RECORD writer; cadence is frontier-driven
  (COMMIT EVERY rejected as an anti-pattern).
- DELTA TABLE domain is inherited from the first RECORD writer by default, with
  IN DOMAIN as escape hatch; multiple writers per table are sound because the
  table-owned reclock recovers the merged A->B mapping; domain bound-once/immutable.
- Record data-domain compaction as a deferred future capability.
- CHANGES is an open PR (#36869), not shipped; keep CHANGES (not DIFFERENTIATE).
- Update the implementation companion to match (architecture, change map, H1/H3,
  phases, open questions).

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H

antiguru left a comment

Member Author

Review — Recorders design doc

Strong, unusually well-grounded design. The conceptual framing is the best part; my substantive concerns are concentrated in freeze-by-typing ergonomics, the multi-writer domain-A frontier, and one success-criterion vs. mechanism mismatch on compliance erasure. None are blockers for a discussion draft — they're the agenda.

Strengths

  • The calculus reduces surface area honestly. Framing stream-table join, upsert, finalization, dedup, and compliance as compositions of four operations — rather than bespoke features — is the right answer to exactly why CTs bloated and stalled.
  • Intellectual honesty about feasibility. H1 states the gating OCC commit substrate is unbuilt ("zero hits in the tree") — confirmed. H2 admits the relaxed evaluation rule removes the fixpoint but not the lagged self-read machinery.
  • Code references are accurate. Line numbers check out (group_commit at appends.rs:344, append_table at storage-controller/src/lib.rs:2082); the CT as-of special-casing the doc wants to reuse is genuinely still live in compute-client/src/as_of_selection.rs. The continual_task.rs render module is gone but the doc correctly notes it's recoverable from base commit add050bf8. The salvage list is credible.
  • The z-transform duality (D = 1−z⁻¹, I = D⁻¹, freeze = sample-and-hold) is a genuinely clarifying mental model, and makes "INTEGRATE carries no non-determinism, hence no new object kind" obvious.
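
For readers who want the duality spelled out, the identity can be written as below. This is a sketch under the assumption that a changelog is viewed as a sequence $x[n]$ over discrete ticks with $x[n] = 0$ for $n < 0$; the symbols $x$ and $y$ are introduced here for illustration and do not appear in the design doc.

```latex
\begin{aligned}
(Dx)[n] &= x[n] - x[n-1], & D &= 1 - z^{-1},\\
(Iy)[n] &= \sum_{k=0}^{n} y[k], & I &= \frac{1}{1 - z^{-1}} = \sum_{k \ge 0} z^{-k},\\
(I\,D\,x)[n] &= \sum_{k=0}^{n} \bigl(x[k] - x[k-1]\bigr) = x[n] - x[-1] = x[n].
\end{aligned}
```

The last line telescopes, which is why integrating a differentiated collection returns the original collection without introducing any choice or sampling.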

Substantive concerns / questions

1. Freeze-by-typing is the sharpest usability footgun — strengthen the diagnostics story.
The same SQL — JOIN dim d — means a maintained join in a plain MV but a silently frozen, sampled-once snapshot inside a RECORD body, with freeze as the default and tracking the opt-in. This is spooky-action-at-a-distance: identical join syntax flips semantics based on the enclosing object kind, with no marker at the join site. The plan-time taint rule protects plain MVs from accidental freeze, but nothing protects a RECORD author from accidentally freezing a reference they meant to track — and they get no error. I'd push Open Question 4 further: consider an explicit marker (even if redundant for the type checker) or strong EXPLAIN/notice output naming every frozen reference in a body. Arguably the safe (tracked) reading should be the default and freeze the explicit one.

2. The multi-writer domain-A completion frontier is underspecified.
Several RECORD writers into one delta table is marked Decided, and the table-owned reclock "recovers the precise A→B mapping." But INTEGRATE must drive v's domain-A frontier, which is a meet across all active writers' committed-through A-times — one slow/idle writer holds back the whole table's A-completeness. The doc never states this meet explicitly, nor what happens when a writer is dropped (does the A-frontier jump forward?) or stalls indefinitely. Given multi-writer is decided, the cross-writer A-frontier computation deserves specification, not just an assertion that it "works."

3. The mz_now()/aging domain (A vs B) is flagged open, but it's the most consequential unresolved semantic — resolve it early.
Whether retention mz_now() < mz_timestamp + W is event-age (domain A, stalls when input idles) or wall-clock (domain B, always advances) changes user-visible behavior dramatically and dictates which temporal-filter machinery you reuse (M3/M4). This isn't polish-phase; it gates the bounding design and the user mental model. Worth pulling forward to a Phase 0/1 decision.

4. mz_timestamp-surfaced-as-data instability may quietly undermine the headline dedup/first-seen use case.
The pitfall is honestly documented: … AS first_seen_at inherits the max(mz_timestamp, upper) clamp and can advance as since ticks forward, stable only within RETAIN HISTORY. But "first seen at" is precisely the column users will project and expect to be immutable — tying its stability to a retention knob makes a correctness-flavored property depend on an ops setting. This is more than "document it"; it warrants a louder warning / lint (gestured at in Open Q5), and the dedup example shouldn't be read as endorsing a stable first_seen_at it can't guarantee.

5. Make the "DELTA TABLE is not recomputable from inputs" optimizer barrier a stated hard invariant.
The impl doc (Open Q7) correctly says the optimizer must not treat the delta table as recomputable from the original inputs, but may treat INTEGRATE as recomputable over the delta table. This is correctness-critical — if the optimizer ever "sees through" RECORD to re-derive frozen/non-deterministic values, the determinism-boundary argument collapses. It currently reads as a passing remark; promote it to a named invariant in the conceptual doc's "determinism boundary" section, which is the load-bearing soundness argument.

6. Success-criterion vs. mechanism mismatch on compliance erasure.
Success Criteria says deleting a user "physically erases" their recorded rows. "Bounding growth" then correctly explains true erasure requires advancing since (forfeiting AS OF/replay in that range), and that a consolidating retraction at event-time alone leaves rows visible to earlier AS OF reads — "insufficient for GDPR." These are consistent only if you read the criterion as "consolidating delete + since advancement." As written it overpromises; restate the criterion with the since-advancement caveat so it can't be read as "a cascade DELETE alone is GDPR-compliant."
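
The gap between a retraction and true erasure can be seen in a toy model. The Python sketch below assumes a changelog of `(row, time, diff)` updates with a `since` frontier below which AS OF reads are rejected; the `Shard` class, its method names, and the sample row are hypothetical and are not persist's API.

```python
class Shard:
    """Toy compactable changelog: (row, time, diff) updates plus a `since` frontier."""

    def __init__(self):
        self.updates = []
        self.since = 0

    def append(self, row, time, diff):
        self.updates.append((row, time, diff))

    def read_as_of(self, time):
        """Rows with positive multiplicity as of `time`; times below `since` are unreadable."""
        if time < self.since:
            raise ValueError(f"AS OF {time} is below since {self.since}")
        totals = {}
        for row, t, diff in self.updates:
            if t <= time:
                totals[row] = totals.get(row, 0) + diff
        return {row for row, total in totals.items() if total > 0}

    def advance_since(self, new_since):
        """Forward older times to the new since and consolidate, dropping cancelled pairs."""
        self.since = max(self.since, new_since)
        merged = {}
        for row, t, diff in self.updates:
            key = (row, max(t, self.since))
            merged[key] = merged.get(key, 0) + diff
        self.updates = [(row, t, d) for (row, t), d in merged.items() if d != 0]


shard = Shard()
shard.append("alice-record", 5, +1)
shard.append("alice-record", 9, -1)   # the cascade DELETE arrives as a retraction
visible_before = shard.read_as_of(7)  # the row is still readable in history

shard.advance_since(9)                # only now is the insert/retract pair physically gone
```

After `advance_since(9)` the update list is empty and `read_as_of(7)` is rejected, which is the AS OF/replay range the review says must be forfeited for the criterion to hold.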

Minor / editorial

  • Heavy duplication between the two docs. The time-domains/reclock explanation appears nearly verbatim in both the conceptual doc ("Time domains and reclocking") and the impl doc (Open Q7) — ~40 echoed lines. The impl doc should reference rather than restate.
  • RECORDER vs RECORD writer terminology drifts (example says CREATE RECORDER … INTO, prose says "the RECORD writer", title says "Recorders"). Self-flagged as open, but worth picking one for readability even in draft.
  • Tier table: Tier 1 "exactly-once into persist" still rests on the per-commit CAS (stated elsewhere); worth noting in the table.
  • The DELETE (SELECT …) FROM enriched cascade example reads as pseudo-syntax — fine given "syntax TBD," but maybe label it inline.

Recommendation

Approve the direction for discussion — the doc is in excellent shape for driving WG consensus. Before it graduates from draft I'd want: (a) the freeze-default safety question resolved, (b) the multi-writer A-frontier specified, (c) the mz_now() domain chosen, (d) the compliance success-criterion reworded to match the mechanism. Items 2–4 are also the most likely to bite the implementation, so resolving them now de-risks Phase 1.


Generated by Claude Code

claude added 5 commits June 5, 2026 21:33
…ging domain, erasure wording

Incorporates the PR review on #36909:
- Promote 'DELTA TABLE is not recomputable from RECORD inputs' to a named
  optimizer-barrier invariant in the determinism-boundary section.
- Freeze stays the default but must be diagnosable, never silent: EXPLAIN +
  plan-time NOTICE naming every frozen reference (optional FROZEN/TRACKED marker).
- Specify the multi-writer domain-A frontier as the meet over active writers'
  reclocks (idle advances, dropped leaves the meet, stalled holds it back).
- Decide the mz_now()/aging domain: default wall-clock (B), event-age (A) opt-in;
  pulled forward to Phase 1.
- first_seen_at instability: louder warning/lint, do not treat as immutable.
- Reword the compliance success criterion: true erasure = consolidating DELETE +
  advancing since (forfeits AS OF/replay); a cascade DELETE alone is not GDPR.
- Editorial: trim the duplicated time-domains/reclock restatement in the impl
  doc's Open Q7 to a reference; note Tier 1 rests on the per-commit CAS; label the
  cascade DELETE as illustrative syntax; pin RECORDER (object) vs RECORD writer (role).

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…ata; no DELTA TABLE kind

Reworks the surface per review discussion:
- Resolve the diff fork: change_diff is DATA (not the persist multiplicity).
  CHANGES emits it as a +1 row with the signed count in the column.
- Sharpen the duality: differentiate/integrate are the carrier-preserving inverse
  pair (both domain A); the A->B move and the commit-time/freeze non-determinism
  live ONLY in RECORD. INTEGRATE is the pure, definite inverse.
- INTEGRATE becomes a SQL combinator INTEGRATE(rel, TIME=>, DIFF=>, RECLOCK=>) that
  accumulates diff per row and thresholds at max(0, sum) (safe by construction; not
  repeat_row). Stateful reduce; memory proportional to live output.
- Collapse DELTA TABLE into a regular table + an explicit, named, relation-valued,
  engine-written/user-read-only reclock object (source-remap precedent). No new
  collection kind. Kills impl M2; reshapes M3 (no -1/+1 consolidation).
- Bounding is an ordinary DELETE of changelog rows (real retraction); the
  integral-preserving form is data-domain compaction, now the named primitive.
- Columns are ordinary user-named data (no reserved mz_ names; change_ts/change_diff
  illustrative).
- Propagate through both docs: duality, surface, time domains, reclock decision,
  bounding, dependencies, MVP, feasibility, alternatives, open questions; impl
  context/architecture/change-map/risks (M2/M3/M4)/phases/open-Qs.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
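
The accumulate-and-threshold behaviour described for INTEGRATE in this commit can be sketched in a few lines. This is a model of the semantics under stated assumptions, not the engine's implementation: the changelog is a list of `(row, change_ts, change_diff)` tuples, timestamps are integers, and the function name `integrate` and the `as_of` parameter are hypothetical.

```python
from collections import defaultdict


def integrate(changelog, as_of):
    """Sum change_diff per row over entries with change_ts <= as_of.

    Thresholding at max(0, sum) means a row whose sum is zero or negative
    contributes nothing to the output, so the result is always a valid
    multiset: {row: multiplicity} with multiplicity > 0.
    """
    totals = defaultdict(int)
    for row, change_ts, change_diff in changelog:
        if change_ts <= as_of:
            totals[row] += change_diff
    return {row: total for row, total in totals.items() if total > 0}


changelog = [
    (("a", 1), 1, +1),
    (("a", 1), 2, +1),
    (("b", 2), 2, +1),
    (("b", 2), 3, -1),
    (("c", 3), 3, -1),  # net-negative: thresholded away, never a negative count
]

at_2 = integrate(changelog, as_of=2)
at_3 = integrate(changelog, as_of=3)
```

State is keyed by row, which matches the commit's note that memory is proportional to the live output rather than to the length of the changelog.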
…mpaction as a reduce

Folds in the review (and the CHANGES carrier-naming we converged on):
- CHANGES/INTEGRATE are symmetric on named carriers: CHANGES(rel, TIME=>, DIFF=>)
  produces them, INTEGRATE(rel, TIME=>, DIFF=>) consumes them; bare CHANGES(rel)
  exposes none. No reserved/implicit columns.
- Drop the per-operator RECLOCK argument. The reclock is bound 1:1 to the table
  (inferred from rel's lineage); a per-site reclock could let two INTEGRATEs of one
  table disagree. Recorded as a rejected alternative.
- One reclock per table, structurally and necessarily (B is one shard/one clock);
  all RECORDs and all INTEGRATEs of a table share it, so multiple INTEGRATEs are
  consistent by construction. The table is the unit of shared completeness.
- Multi-writer via per-writer reclock lanes R_i: B -> X_i with a read-time meet
  (no merged frontier, no inter-writer coordination); add = register lane >= upper,
  drop = lane leaves the meet (one-way), drop-all = table seals (finalization).
- Data-domain compaction promoted to the named primitive and made precise: a reduce
  (clamp change_ts := max(change_ts, t) then GROUP BY <data cols, clamped ts>,
  SUM(change_diff)), NOT a bare in-place clamp (identical clamped SourceData merges
  to persist mult 2 while the data column reads +1 — a consequence of diff-as-data).
  Reclock-free, recorder-free, idempotent/monotone; A-since distinct from B-since.
- Propagated through both docs (surface, reclock decision, multi-writer, bounding,
  alternatives, open questions; impl change-map/M3/M4/phases/Q7).

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
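
The clamp-then-reduce form of data-domain compaction from this commit can be sketched as follows. It is a minimal model assuming integer timestamps and a changelog of `(row, change_ts, change_diff)` tuples; the name `compact` is hypothetical. Zero-sum groups are kept, as a plain GROUP BY would keep them.

```python
from collections import defaultdict


def compact(changelog, t):
    """Clamp change_ts to at least t, then sum change_diff per (row, clamped ts).

    A bare in-place clamp would leave several identical rows behind; the
    reduce folds them into one row whose change_diff column carries the sum.
    """
    sums = defaultdict(int)
    for row, change_ts, change_diff in changelog:
        sums[(row, max(change_ts, t))] += change_diff
    return sorted((row, ts, diff) for (row, ts), diff in sums.items())


changelog = [
    (("a",), 1, +1),
    (("a",), 2, +1),   # clamps to the same (row, ts) as the entry above
    (("b",), 2, +1),
    (("b",), 5, -1),   # already past t, left where it is
]

once = compact(changelog, t=3)
twice = compact(once, t=3)
```

Both `("a",)` entries land on timestamp 3 and merge into a single row with `change_diff` 2, which is the case the commit flags: without the reduce, the persisted multiplicity would be 2 while the data column still read +1. Running the compaction a second time at the same `t` changes nothing, consistent with the idempotence claim.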
…RECORD wrapper

From an agent review against Materialize's grammar (file:line) and prior art:
- Carriers use a USING TIME/DIFF clause, not => named args: MZ has no named-arg
  support and => already lexes as map-entry; USING reuses a reserved keyword at
  near-zero parser cost. (=> is the cross-system convention; recorded as a possible
  future revisit, co-designed with #36869.)
- CHANGES's AS OF parses like SUBSCRIBE (query) AS OF … — attaches to the operator,
  not an inner SELECT (resolves the 'AS OF only at outermost SELECT' concern).
- Drop the AS RECORD(...) wrapper: CREATE RECORDER … INTO <table> AS <query> binds
  like MV's AS <query>; INTO matches CREATE SINK … INTO. Also dissolves the RECORD-
  verb vs composite-type 'record' overlap.
- Reclock surfaced via EXPOSE RECLOCK AS <name> (the CREATE SOURCE … EXPOSE PROGRESS
  AS precedent), engine-written / user-read-only.
- Replace IN DOMAIN with WITH (TIMELINE = …) — IN DOMAIN would overload IN and
  foreclose standard CREATE DOMAIN.
- Document deliberate divergences: DIFF is a signed multiplicity (not an op-enum like
  Snowflake METADATA$ACTION / Flink row-kinds / Debezium c,u,d); CHANGES collides
  with Snowflake's CHANGES clause (kept knowingly).
- Confirmed idiomatic as-is: INTO, RETAIN HISTORY FOR, WITH (opt = val), mz_now().
- Propagated through both docs (operations table, surface, reclock decision, open
  questions, alternatives; impl change-map + Phase 2).

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
… the recorder

- INTEGRATE(table USING TIME …, DIFF …) takes a *bare recorded table* and does the
  accumulate-and-threshold (GROUP BY non-carrier cols, SUM(diff) keeping >0)
  internally. No relation argument and no reclock argument: the reclock is looked up
  from the table by identity, so there is no lineage analysis and no ambiguity.
  Rejected arbitrary-relation INTEGRATE (no single well-defined reclock); a
  single-recorded-table relation is a possible future relaxation.
- The reclock is owned by the TABLE (auto-created when it first becomes a recording
  target; EXPOSE RECLOCK AS <name>, the source-progress precedent), not the recorder
  — fixes the mis-location where N recorders couldn't each 'own' it. A RECORDER only
  writes a per-writer lane.
- Application shaping (dedup, first-seen, upsert, top-k) moves into the RECORD body
  (which decides which deltas to record); first-seen/upsert bodies are self-
  referential (Tier 3), the raw-record cases are not. Reworked the running example
  accordingly (INTEGRATE over the table; bounding via data-domain compaction).
- Propagated through both docs: summary, operations table, surface bullets, reclock
  section, evaluation rule, bounding pitfall, open questions, alternatives; impl
  change-map, M2, Phase 2, Q7.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
claude added 2 commits June 8, 2026 07:17
…les)

Judgment: the 'emit on a-change' and 'system-time stamp' use cases are good
user-docs/teaching material but tangential to the design (the determinism boundary
and freeze are already stated and illustrated by the running example), so they are
NOT folded in. Two load-bearing facts they surfaced are now stated as rules:
- CHANGES over a long-lived input must be bounded (AS OF AT LEAST mz_now() - W +
  a temporal filter on the time carrier); promoted from an incidental example
  detail to an explicit rule.
- To stamp system/wall-clock time in a RECORD body use now() (a frozen sample,
  recorded once, as a value column), NOT mz_now() — which in the body is domain-A
  event time. Closes a real footgun for the staleness use case.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…rg-free INTEGRATE

Aligns the docs with the canonical Notion 'Recorders' page (same model; surface/
controller details reconciled):
- Carriers + progress declared on the TABLE: CREATE TABLE … WITH (PROGRESS,
  TIMESTAMP = …, DIFF = …), initialized to (0,0) (empty progress = sealed frontier).
- INTEGRATE(table) is argument-free — carriers and the progress collection come
  from the table; nothing to mismatch. (CHANGES carrier spelling still open / #36869.)
- Standardize terminology: the object is the 'progress collection' (a source's
  progress precedent); 'reclock'/'reclocking' is the operation of mapping through it.
- Controller obligations (new impl M5): writer registration = capability; progress-
  collection GC by tracking INTEGRATE consumers; RETAIN HISTORY sits on the
  INTEGRATE consumer/recorder, not the table (leaning INTEGRATE).
- Drop-last-writer reframed as open: freeze vs seal (advance to empty/top frontier).
- Framing: at-least-once application / at-most-once persistence.
Kept the doc's sharper points that Notion lacks: data-domain compaction must be a
clamp+GROUP BY/SUM reduce (a bare UPDATE is wrong), and the self-reference answer.

https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H