design: recorders — recorded, non-deterministic, at-least-once transformations#36909
antiguru wants to merge 29 commits into
…e transformations) Design doc reviving the continual-tasks capability on top of the CHANGES table function (#36869) and the OCC read-then-write commit substrate. Presents a declarative recorded-collection surface and the imperative BEGIN CONTINUAL TRANSACTION surface as two altitudes of one engine, with a layered correctness ladder (exactly-once into persist / eventual via an MV / at-least-once external). Motivated by stream-table joins, finalization, upsert-in-compute, and non-deterministic/UDF enrichment. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Elevate unbounded output/state growth to a first-class concern: the record-don't-recompute vs bounded-resources tension, the output models (append+retention / keyed-upsert / writable table) and their growth stories, and the internal reclamation invariants (retractions must consolidate with their inserts; bounded growth <=> bounded since-lag <=> bounded retention window; RETAIN HISTORY trades off against growth). Prefer keyed/upsert output reusing the storage upsert machinery over the two-shard+MV pattern whose working shard is itself the growth risk. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Reframe output growth around how data leaves: age/time-based phase-out is automatic via the temporal-filter + lagged-read-hold mechanism (clock-driven, self-finalizing, no re-integration, no self-reference), NOT a DELETE; supersession is a declared key emitting the exact prior-value retraction; arbitrary predicate eviction is explicit DELETE in the imperative surface only. Decouples when output is produced (input-driven) from when old data is forgotten (clock-driven). Drop the upsert_continual_feedback framing (it's an ordinary keyed arrangement). Unbounded key spaces are fine; bound time, not keys. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…output mode Collapse the keyed/upsert output model: upsert = top-1 (top-k), so the reduce already emits the exact retract/insert diffs and the output just records them (O(live keys)). No declared key, no special output mode, no per-commit self-reference. Output side reduces to: record the body's diffs + optional time-based retention. The real distinction from a plain top-k MV is input-side: consume input as a listen-only changelog and persist the reduce state as the output (rehydrated on restart), so the upstream input can be forgotten without a rehydration snapshot. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…ential lifetime Time-based retention is one corner of eviction, not the whole story. A recorded row has two independent relationships to its sources: its value (frozen at processing time) and its existence (which can stay live). The join structure governs lifetime; a FROZEN(...) marker governs which values are snapshotted. Compliance (erase a deleted user's records) is referential ownership / ON DELETE CASCADE -- a physical retraction, not a read-time filter -- costing an index on the output's liveness key. This is a cost spectrum chosen per relationship: fully-frozen enrichment needs no stream index; live-existence enrichment indexes the output by its liveness key. The currency-conversion-with-compliance case needs both behaviors in one transform. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Capture the surface question: the asymmetric stream-driven join wants to be explicit (prior art: Flink temporal/lookup join, Kafka stream-table join), but a single keyword conflates value-freshness and existence-freshness. STREAM JOIN is the fully-frozen floor; FROZEN() and a live-existence opt-in handle the deviations (the compliance cascade is a capability Flink/Kafka lack). The construct only yields non-definite results, so it is valid only inside a recorded collection -- possibly a third altitude alongside the object and the imperative transaction. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Lead with the three orthogonal primitives -- RECORD (definite-by-persistence), COMMIT@T (processing-time triggered timestamped write), FROZEN (per-value snapshot cut) -- plus CHANGES as a boxed dependency. Recast everything else as composition: a reconstruction table (use case -> primitives + SQL), the stream/table-duality + sample-and-hold framing, the determinism boundary with a single lint rule (FROZEN/processing-time writes legal only inside a RECORD output), and eviction / STREAM JOIN / surfaces / correctness ladder all explicitly derived. Resolve the freshness default (frozen-by-default in a stream context; live existence opt-in). Add bespoke-features-per-use-case and FROZEN-as-join-modifier to Alternatives. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…TVC-dTVC calculus Rename the feature to 'Recorders'. Lead with the four-operation calculus (differentiate / integrate / record / bound) over TVC <-> dTVC, with freeze as sample-and-hold. Concrete surface: CREATE DELTA TABLE and CREATE RECORDER with RECORD / INTEGRATE / DELETE actions and a worked dedup example. Freeze is typing, not a keyword (bare TVC reference = frozen; CHANGES/DELTA TABLE = tracked); compliance cascade = a CHANGES(dim)-driven DELETE action. Add the key evaluation rule (pre-commit reads, atomic writes at T, no intra-commit fixpoint), the two stream-table joins (as-of-event-time definite vs processing-time recorded) and the history/record duality, INTEGRATE monotonicity clamp, and restart behavior. Reserve 'continual transaction' for the imperative surface; retire 'task'/'transform'. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…king claim Add a feasibility section from a codebase pass against the recovered Continual Tasks implementation: the gating dependency is the unbuilt OCC timestamped-write substrate + the data-plane->control-plane hand-off (the storage-level atomic multi-shard write via txn-wal commit_at already exists); the control-plane commit path is mandatory (CTs' bespoke sink bypassed txn-wal); sharp edges (self-reference reclocking, multi-output catalog/controller model, freeze-by-typing, DELTA TABLE collection kind) all hit by CTs. Correct the overclaim that the no-fixpoint rule removes ALL T-1 reclocking: the cross-commit self-read still needs the CT step_forward/read-hold machinery. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Companion to the recorders design. Picks the control-plane-commit architecture (Option B; the CT compute-sink path is a single-output dead end), names the unbuilt OCC timestamped group-commit + dataflow->control-plane hand-off as the gating dependency (storage-level commit_at/UpperMismatch already exists), and gives a per-crate change map, a ranked risk register (commit substrate; self-reference reclocking; one-item-N-outputs catalog model; freeze-by-typing; DELTA TABLE collection kind; INTEGRATE compaction), what to salvage vs rebuild from the removed CT code, a phased plan, and open questions. Main doc points to it. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Close the gap that the design pinned down the row mz_timestamp (data) but not the output collection's frontier (progress). State that an output's upper advances with the meet of its input frontiers (like an MV), independent of the mz_timestamp data; the write-time clamp (>= upper) is what makes monotone frontier advance sound; and idle recorders must still tick output uppers forward (clock-driven frontier-only commits) or downstream reads stall. Add impl risk M4 noting the mechanism already exists via append_table's advance_to and must be wired into the Phase-0 commit loop. Tie the commit-cadence open question to its frontier face. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Close the correctness gap: a fact definite at logical time t takes effect in a recorded output at commit time t' >= t (cannot write below upper / retroact), so the output is bitemporal -- frontier/logical time = system time (recorded-by), mz_timestamp column = event time. Defining property: output(T) = recorded-by-T, not f(inputs(T)). RECORD -> DELTA TABLE is bitemporal (event-time queries recoverable via mz_timestamp); INTEGRATE -> TVC is reclocked to T and discards event time (use for current state). Same reclocking ingestion does, but re-stamps an already-timestamped fact. Impl doc: optimizer/controller must not treat a recorder output as a recomputable function of inputs. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…tem time) Supersede the weaker 'INTEGRATE reclocks to T, non-deterministic history' framing. Treat a recorder like source ingestion: record a durable reclock t -> t' (event-time completeness vs system-time write frontier) and use it to drive the integrated collection's frontier. INTEGRATE places each fact at its event time (max(mz_timestamp, upper); logical compaction for late data) on the input's timeline, so the integrated collection is a definite function of the recorded DELTA TABLE + reclock -- stable history, consistent downstream composition. Non-determinism is confined to the recorded values + the reclock; everything downstream is definite. A proper bitemporal object (query by event time or system time). Reuse source reclock/remap machinery. Impl doc: the RECORD step is the only non-deterministic boundary; INTEGRATE outputs are recomputable over the recorded data; new reclock/remap component per recorder. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…rontier/erasure caveats Adversarial probe found a real bug: the max(mz_timestamp, upper) 'logical compaction of late data' splits retract/insert pairs across a frontier advance and is unsound for updates. Fix: place each fact at its event-time mz_timestamp and advance the integrated frontier only after integrating all input through that time -- no clamp, late data is impossible for a well-formed input frontier, INTEGRATE is an ordinary dataflow over the DELTA TABLE. This dissolves the replica-nondeterminism and non-monotone-reclock concerns (placement is event-time only; t' is allowed-nondeterministic when-recorded metadata; the reclock relates two monotone frontiers). Also: clarify event time and system time are two timestamps on the SAME timeline (t' >= t), so recorder outputs are lagged system-timeline collections and cross-table joins work (no separate-timeline wall). Add the dual-frontier requirement (time-based aging needs a system/wall-clock frontier, since the event-time frontier stalls when input is idle; mz_now() = processing time). Correct the self-referential prune to read the system-time frontier (the one-tick convergence claim held only on system time). Document the freeze-back-stamping caveat (frozen value = processing-time fact at event-time stamp, not historical truth) and the GDPR-erasure vs stable-history tradeoff. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Revert last commit's over-correction. mz_timestamp is arbitrary data (could be 0), a distinct timeline from the table's system/write time; you cannot use one for the other. INTEGRATE must clamp max(mz_timestamp, upper) to hold the TVC invariant (logical compaction of below-frontier data) -- this is sound because a same-row +1/-1 accumulates correctly regardless of times, so it is NOT the split-retract/insert bug. The output lives on the system timeline (joinable, no separate-timeline wall); event-time answers come from filtering the mz_timestamp column. Time-based aging works via mz_now()=system time on the clock (no dual-frontier needed); self-ref prune is automatically system-time. The reclock (inherent: each row carries mz_timestamp + write time) makes the clamped integration reproducible. Keep the GDPR-erasure-vs-history and queryability-vs-compaction tradeoffs. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…A), not system time Correct the timeline model precisely. Two domains: A = input timeline (mz_timestamp lives here), B = recorder processing/system time (the DELTA TABLE's physical write frontier). RECORD reads A, writes data into the DELTA TABLE (B), and notes the A->B mapping (reclock). INTEGRATE reads the DELTA TABLE (B), places output by mz_timestamp (A), and uses the reclock to drive its output frontier back into domain A -- so an event at input-time t appears in the output at t (round-trip A->B->A), preserving consistency (an event at t happens at t throughout the system). The output is NOT on system time. The clamp max(mz_timestamp, upper) remains as the safety net for arbitrary below-frontier mz_timestamp data. Re-open the genuine domain question: aging/mz_now() in domain A (event-age, stalls on idle input) vs domain B (wall-clock). Cite the reclocking framework design (20210714). https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…SUBSCRIBE) Adopt 'frontier as data': rather than a separate reclock collection, encode the A->B mapping in-band as SUBSCRIBE-style mz_progressed rows in the DELTA TABLE, so a DELTA TABLE is a persisted subscribe stream (data + progress). Symmetric partner of differentiate: CHANGES turns changes into data (mz_diff), progress markers turn the frontier into data (mz_progressed), INTEGRATE reads both back. Key benefit: RECORD's data + reclock are one shard, so exactly-once is a single CAS (no multi-shard atomicity for the reclock; they can't diverge); replica races are an exactly-once concern resolved by that CAS, not a correctness problem. Separate-reclock-collection noted as the alternative. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…el change) Walk back over-adoption of in-band progress markers. It is a representation choice that does not change the two-domain/reclock model, and it makes the DELTA TABLE harder to consume (mz_progressed-style progress-marker noise). Its single-shard-CAS benefit is marginal because the multi-output bundle already needs a multi-shard txn, so committing a separate reclock atomically with the data is nearly free. Present both representations neutrally (lean separate for data cleanliness); keep the conceptual changes->data / frontier->data symmetry. Reframe replica races as exactly-once (CAS on the recording commit), not determinism. Add the open question. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
Record the decision: the reclock is a separate, engine-owned collection (the source-remap pattern), not data in the DELTA TABLE. Rationale: clean argumentation (a separable mapping R; v provably = INTEGRATE(DELTA TABLE) driven by R) and no tampering/validation (engine-owned metadata with assumed invariants, not 'just data' in a writable table), plus independent retention. State the DELTA TABLE = user data / reclock = control-plane bookkeeping boundary. The reclock commits in the same multi-shard bundle txn as the recording, so the extra shard is free. In-band mz_progressed markers moved to Alternatives as considered-and-rejected; remove the now-decided open question. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
… RECORDER bundle) Record the model decisions: RECORD/INTEGRATE/prune are separate, independently-created/dropped objects over DELTA TABLEs -- NOT one atomic RECORDER bundle. Cross-object consistency comes from reading at a common logical time; each object commits via a frontier-gated OCC write (compute through X, commit at X+1), which is also what makes the split-object dedup safe (the prune sees finalized data <= X). The DELTA TABLE names its domain and owns its reclock; INTEGRATE reclocks by the delta table's reclock (one writer commits data+reclock together). Updated surfaces, dependencies, MVP, feasibility, open questions; and the impl doc's architecture fork (per-object control-plane commit, no multi-output bundle), risk register (H3 object model demoted to Med), and phased plan. Drops the agent's gaps that the logical-time/frontier framing dissolves. https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…lta tables
- INTEGRATE becomes a read operator (dual of CHANGES) usable in plain MVs, not an object kind; it is the typing boundary (mz_timestamp/mz_diff are data inside its argument, gone in its result). Fix the dedup example accordingly, and note the timestamp-as-data stability pitfall (subject to the compaction clamp).
- DELTA TABLEs are mutable: bounding is ordinary DELETE/UPDATE DML, defined as consolidating (retract at original mz_timestamp), distinct from forward age-out. The standing OCC pruner is deferred, keeping v1 bounding off the OCC critical path.
- The only new standing object is the RECORD writer; cadence is frontier-driven (COMMIT EVERY rejected as an anti-pattern).
- DELTA TABLE domain is inherited from the first RECORD writer by default, with IN DOMAIN as escape hatch; multiple writers per table are sound because the table-owned reclock recovers the merged A->B mapping; domain bound-once/immutable.
- Record data-domain compaction as a deferred future capability.
- CHANGES is an open PR (#36869), not shipped; keep CHANGES (not DIFFERENTIATE).
- Update the implementation companion to match (architecture, change map, H1/H3, phases, open questions).
https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
antiguru
left a comment
Review — Recorders design doc
Strong, unusually well-grounded design. The conceptual framing is the best part; my substantive concerns are concentrated in freeze-by-typing ergonomics, the multi-writer domain-A frontier, and one success-criterion vs. mechanism mismatch on compliance erasure. None are blockers for a discussion draft — they're the agenda.
Strengths
- The calculus reduces surface area honestly. Framing stream-table join, upsert, finalization, dedup, and compliance as compositions of four operations — rather than bespoke features — is the right answer to exactly why CTs bloated and stalled.
- Intellectual honesty about feasibility. H1 states the gating OCC commit substrate is unbuilt ("zero hits in the tree") — confirmed. H2 admits the relaxed evaluation rule removes the fixpoint but not the lagged self-read machinery.
- Code references are accurate. Line numbers check out (group_commit at appends.rs:344, append_table at storage-controller/src/lib.rs:2082); the CT as-of special-casing the doc wants to reuse is genuinely still live in compute-client/src/as_of_selection.rs. The continual_task.rs render module is gone but the doc correctly notes it's recoverable from base commit add050bf8. The salvage list is credible.
- The z-transform duality (D = 1−z⁻¹, I = D⁻¹, freeze = sample-and-hold) is a genuinely clarifying mental model, and makes "INTEGRATE carries no non-determinism, hence no new object kind" obvious.
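For readers less familiar with the z-transform framing, the duality can be illustrated on a plain sequence of values. This is a minimal sketch, not the doc's machinery: the function names and the list-of-integers model are assumptions made for illustration, with D read as "subtract the previous value" and I as "running sum".

```python
from itertools import accumulate


def differentiate(xs):
    """D = 1 - z^-1: emit x[t] - x[t-1], treating the value before t=0 as 0."""
    out, prev = [], 0
    for x in xs:
        out.append(x - prev)
        prev = x
    return out


def integrate(ds):
    """I = D^-1: the running sum of the differences rebuilds the original values."""
    return list(accumulate(ds))


def sample_and_hold(xs, at):
    """Freeze: read xs once at index `at` and keep returning that sample afterwards."""
    return [xs[at]] * (len(xs) - at)


values = [3, 3, 5, 2, 2]           # a collection's value at successive times
deltas = differentiate(values)      # [3, 0, 2, -3, 0]
roundtrip = integrate(deltas)       # equals values again
held = sample_and_hold(values, 2)   # [5, 5, 5]
```

Both directions of the round trip are deterministic functions of their input, which is the intuition behind the claim that integration adds no non-determinism; only the held sample depends on when it was taken.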
Substantive concerns / questions
1. Freeze-by-typing is the sharpest usability footgun — strengthen the diagnostics story.
The same SQL — JOIN dim d — means a maintained join in a plain MV but a silently frozen, sampled-once snapshot inside a RECORD body, with freeze as the default and tracking the opt-in. This is spooky-action-at-a-distance: identical join syntax flips semantics based on the enclosing object kind, with no marker at the join site. The plan-time taint rule protects plain MVs from accidental freeze, but nothing protects a RECORD author from accidentally freezing a reference they meant to track — and they get no error. I'd push Open Question 4 further: consider an explicit marker (even if redundant for the type checker) or strong EXPLAIN/notice output naming every frozen reference in a body. Arguably the safe (tracked) reading should be the default and freeze the explicit one.
2. The multi-writer domain-A completion frontier is underspecified.
Several RECORD writers into one delta table is marked Decided, and the table-owned reclock "recovers the precise A→B mapping." But INTEGRATE must drive v's domain-A frontier, which is a meet across all active writers' committed-through A-times — one slow/idle writer holds back the whole table's A-completeness. The doc never states this meet explicitly, nor what happens when a writer is dropped (does the A-frontier jump forward?) or stalls indefinitely. Given multi-writer is decided, the cross-writer A-frontier computation deserves specification, not just an assertion that it "works."
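A minimal sketch of the meet this concern asks the doc to spell out, assuming totally ordered integer A-times so that the meet is a plain minimum. The `lanes` dictionary and the function name are hypothetical; returning `None` for the no-writer case only marks that the behavior is unspecified here, it is not a proposal.

```python
def table_a_frontier(lanes):
    """Meet (minimum) of the committed-through A-times of all active writers.

    `lanes` maps a writer id to the A-time that writer has committed through.
    """
    if not lanes:
        return None  # what the frontier does with no writers is left unspecified
    return min(lanes.values())


lanes = {"w1": 120, "w2": 95, "w3": 140}
before_drop = table_a_frontier(lanes)   # the slowest writer (w2) bounds the table

lanes.pop("w2")                         # dropping the slow writer removes it from the meet
after_drop = table_a_frontier(lanes)    # the frontier jumps forward to the next-slowest
```

The jump from `before_drop` to `after_drop` is exactly the dropped-writer question raised above: the sketch shows that the frontier moves, not whether that movement is the intended semantics.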
3. The mz_now()/aging domain (A vs B) is flagged open, but it's the most consequential unresolved semantic — resolve it early.
Whether retention mz_now() < mz_timestamp + W is event-age (domain A, stalls when input idles) or wall-clock (domain B, always advances) changes user-visible behavior dramatically and dictates which temporal-filter machinery you reuse (M3/M4). This isn't polish-phase; it gates the bounding design and the user mental model. Worth pulling forward to a Phase 0/1 decision.
4. mz_timestamp-surfaced-as-data instability may quietly undermine the headline dedup/first-seen use case.
The pitfall is honestly documented: … AS first_seen_at inherits the max(mz_timestamp, upper) clamp and can advance as since ticks forward, stable only within RETAIN HISTORY. But "first seen at" is precisely the column users will project and expect to be immutable — tying its stability to a retention knob makes a correctness-flavored property depend on an ops setting. This is more than "document it"; it warrants a louder warning / lint (gestured at in Open Q5), and the dedup example shouldn't be read as endorsing a stable first_seen_at it can't guarantee.
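The instability can be shown with a two-line model of the clamp. This is a sketch under the assumption that the surfaced value is simply the larger of the recorded timestamp and the current frontier; `surfaced_first_seen` and `frontier` are illustrative names, not identifiers from the doc.

```python
def surfaced_first_seen(recorded_ts, frontier):
    """The clamp: a timestamp below the frontier is reported as the frontier."""
    return max(recorded_ts, frontier)


recorded_ts = 100
early = surfaced_first_seen(recorded_ts, frontier=50)    # frontier still behind the row
late = surfaced_first_seen(recorded_ts, frontier=180)    # frontier has passed the row
```

The same recorded row reads as 100 before the frontier passes it and as 180 afterwards, which is why a projected first_seen_at cannot be treated as immutable once the frontier is allowed to advance past it.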
5. Make the "DELTA TABLE is not recomputable from inputs" optimizer barrier a stated hard invariant.
The impl doc (Open Q7) correctly says the optimizer must not treat the delta table as recomputable from the original inputs, but may treat INTEGRATE as recomputable over the delta table. This is correctness-critical — if the optimizer ever "sees through" RECORD to re-derive frozen/non-deterministic values, the determinism-boundary argument collapses. It currently reads as a passing remark; promote it to a named invariant in the conceptual doc's "determinism boundary" section, which is the load-bearing soundness argument.
6. Success-criterion vs. mechanism mismatch on compliance erasure.
Success Criteria says deleting a user "physically erases" their recorded rows. "Bounding growth" then correctly explains true erasure requires advancing since (forfeiting AS OF/replay in that range), and that a consolidating retraction at event-time alone leaves rows visible to earlier AS OF reads — "insufficient for GDPR." These are consistent only if you read the criterion as "consolidating delete + since advancement." As written it overpromises; restate the criterion with the since-advancement caveat so it can't be read as "a cascade DELETE alone is GDPR-compliant."
Minor / editorial
- Heavy duplication between the two docs. The time-domains/reclock explanation appears nearly verbatim in both the conceptual doc ("Time domains and reclocking") and the impl doc (Open Q7) — ~40 echoed lines. The impl doc should reference rather than restate.
- RECORDER vs RECORD writer terminology drifts (example says CREATE RECORDER … INTO, prose says "the RECORD writer", title says "Recorders"). Self-flagged as open, but worth picking one for readability even in draft.
- Tier table: Tier 1 "exactly-once into persist" still rests on the per-commit CAS (stated elsewhere); worth noting in the table.
- The DELETE (SELECT …) FROM enriched cascade example reads as pseudo-syntax; fine given "syntax TBD," but maybe label it inline.
Recommendation
Approve the direction for discussion — the doc is in excellent shape for driving WG consensus. Before it graduates from draft I'd want: (a) the freeze-default safety question resolved, (b) the multi-writer A-frontier specified, (c) the mz_now() domain chosen, (d) the compliance success-criterion reworded to match the mechanism. Items 2–4 are also the most likely to bite the implementation, so resolving them now de-risks Phase 1.
Generated by Claude Code
…ging domain, erasure wording
Incorporates the PR review on #36909:
- Promote 'DELTA TABLE is not recomputable from RECORD inputs' to a named optimizer-barrier invariant in the determinism-boundary section.
- Freeze stays the default but must be diagnosable, never silent: EXPLAIN + plan-time NOTICE naming every frozen reference (optional FROZEN/TRACKED marker).
- Specify the multi-writer domain-A frontier as the meet over active writers' reclocks (idle advances, dropped leaves the meet, stalled holds it back).
- Decide the mz_now()/aging domain: default wall-clock (B), event-age (A) opt-in; pulled forward to Phase 1.
- first_seen_at instability: louder warning/lint, do not treat as immutable.
- Reword the compliance success criterion: true erasure = consolidating DELETE + advancing since (forfeits AS OF/replay); a cascade DELETE alone is not GDPR.
- Editorial: trim the duplicated time-domains/reclock restatement in the impl doc's Open Q7 to a reference; note Tier 1 rests on the per-commit CAS; label the cascade DELETE as illustrative syntax; pin RECORDER (object) vs RECORD writer (role).
https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…ata; no DELTA TABLE kind
Reworks the surface per review discussion:
- Resolve the diff fork: change_diff is DATA (not the persist multiplicity). CHANGES emits it as a +1 row with the signed count in the column.
- Sharpen the duality: differentiate/integrate are the carrier-preserving inverse pair (both domain A); the A->B move and the commit-time/freeze non-determinism live ONLY in RECORD. INTEGRATE is the pure, definite inverse.
- INTEGRATE becomes a SQL combinator INTEGRATE(rel, TIME=>, DIFF=>, RECLOCK=>) that accumulates diff per row and thresholds at max(0, sum) (safe by construction; not repeat_row). Stateful reduce; memory proportional to live output.
- Collapse DELTA TABLE into a regular table + an explicit, named, relation-valued, engine-written/user-read-only reclock object (source-remap precedent). No new collection kind. Kills impl M2; reshapes M3 (no -1/+1 consolidation).
- Bounding is an ordinary DELETE of changelog rows (real retraction); the integral-preserving form is data-domain compaction, now the named primitive.
- Columns are ordinary user-named data (no reserved mz_ names; change_ts/change_diff illustrative).
- Propagate through both docs: duality, surface, time domains, reclock decision, bounding, dependencies, MVP, feasibility, alternatives, open questions; impl context/architecture/change-map/risks (M2/M3/M4)/phases/open-Qs.
https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…mpaction as a reduce
Folds in the review (and the CHANGES carrier-naming we converged on):
- CHANGES/INTEGRATE are symmetric on named carriers: CHANGES(rel, TIME=>, DIFF=>) produces them, INTEGRATE(rel, TIME=>, DIFF=>) consumes them; bare CHANGES(rel) exposes none. No reserved/implicit columns.
- Drop the per-operator RECLOCK argument. The reclock is bound 1:1 to the table (inferred from rel's lineage); a per-site reclock could let two INTEGRATEs of one table disagree. Recorded as a rejected alternative.
- One reclock per table, structurally and necessarily (B is one shard/one clock); all RECORDs and all INTEGRATEs of a table share it, so multiple INTEGRATEs are consistent by construction. The table is the unit of shared completeness.
- Multi-writer via per-writer reclock lanes R_i: B -> X_i with a read-time meet (no merged frontier, no inter-writer coordination); add = register lane >= upper, drop = lane leaves the meet (one-way), drop-all = table seals (finalization).
- Data-domain compaction promoted to the named primitive and made precise: a reduce (clamp change_ts := max(change_ts, t) then GROUP BY <data cols, clamped ts>, SUM(change_diff)), NOT a bare in-place clamp (identical clamped SourceData merges to persist mult 2 while the data column reads +1, a consequence of diff-as-data). Reclock-free, recorder-free, idempotent/monotone; A-since distinct from B-since.
- Propagated through both docs (surface, reclock decision, multi-writer, bounding, alternatives, open questions; impl change-map/M3/M4/phases/Q7).
https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
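The clamp-then-reduce shape of data-domain compaction described in this commit can be sketched over in-memory rows. This is an illustration only: the `(data, change_ts, change_diff)` tuple layout and the function name are assumptions, and dropping groups whose summed diff is zero is an added assumption that the commit message does not state.

```python
from collections import defaultdict


def compact(rows, t):
    """Clamp change_ts to at least t, then GROUP BY (data, clamped ts) and SUM the diff.

    rows: iterable of (data, change_ts, change_diff); data must be hashable.
    Groups whose diffs sum to zero are dropped (an assumption of this sketch).
    """
    sums = defaultdict(int)
    for data, ts, diff in rows:
        sums[(data, max(ts, t))] += diff
    return sorted((d, ts, s) for (d, ts), s in sums.items() if s != 0)


rows = [
    ("a", 1, +1), ("a", 3, -1),   # insert and retraction both fall below t and cancel
    ("c", 1, +1), ("c", 2, +1),   # two distinct changes that clamp to the same row
    ("b", 7, +1),                 # already at or above t, left untouched
]
compacted = compact(rows, t=5)    # [("b", 7, 1), ("c", 5, 2)]
again = compact(compacted, t=5)   # compacting twice changes nothing
```

The two `"c"` rows show why a bare clamp is not enough: clamping alone would leave two identical rows each carrying +1, whereas the reduce yields one row carrying the summed diff of 2.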
…RECORD wrapper
From an agent review against Materialize's grammar (file:line) and prior art:
- Carriers use a USING TIME/DIFF clause, not => named args: MZ has no named-arg support and => already lexes as map-entry; USING reuses a reserved keyword at near-zero parser cost. (=> is the cross-system convention; recorded as a possible future revisit, co-designed with #36869.)
- CHANGES's AS OF parses like SUBSCRIBE (query) AS OF …; it attaches to the operator, not an inner SELECT (resolves the 'AS OF only at outermost SELECT' concern).
- Drop the AS RECORD(...) wrapper: CREATE RECORDER … INTO <table> AS <query> binds like MV's AS <query>; INTO matches CREATE SINK … INTO. Also dissolves the RECORD-verb vs composite-type 'record' overlap.
- Reclock surfaced via EXPOSE RECLOCK AS <name> (the CREATE SOURCE … EXPOSE PROGRESS AS precedent), engine-written / user-read-only.
- Replace IN DOMAIN with WITH (TIMELINE = …); IN DOMAIN would overload IN and foreclose standard CREATE DOMAIN.
- Document deliberate divergences: DIFF is a signed multiplicity (not an op-enum like Snowflake METADATA$ACTION / Flink row-kinds / Debezium c,u,d); CHANGES collides with Snowflake's CHANGES clause (kept knowingly).
- Confirmed idiomatic as-is: INTO, RETAIN HISTORY FOR, WITH (opt = val), mz_now().
- Propagated through both docs (operations table, surface, reclock decision, open questions, alternatives; impl change-map + Phase 2).
https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
… the recorder
- INTEGRATE(table USING TIME …, DIFF …) takes a *bare recorded table* and does the accumulate-and-threshold (GROUP BY non-carrier cols, SUM(diff) keeping >0) internally. No relation argument and no reclock argument: the reclock is looked up from the table by identity, so there is no lineage analysis and no ambiguity. Rejected arbitrary-relation INTEGRATE (no single well-defined reclock); a single-recorded-table relation is a possible future relaxation.
- The reclock is owned by the TABLE (auto-created when it first becomes a recording target; EXPOSE RECLOCK AS <name>, the source-progress precedent), not the recorder; this fixes the mis-location where N recorders couldn't each 'own' it. A RECORDER only writes a per-writer lane.
- Application shaping (dedup, first-seen, upsert, top-k) moves into the RECORD body (which decides which deltas to record); first-seen/upsert bodies are self-referential (Tier 3), the raw-record cases are not. Reworked the running example accordingly (INTEGRATE over the table; bounding via data-domain compaction).
- Propagated through both docs: summary, operations table, surface bullets, reclock section, evaluation rule, bounding pitfall, open questions, alternatives; impl change-map, M2, Phase 2, Q7.
https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
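The accumulate-and-threshold step this commit moves inside INTEGRATE can be sketched as a group-and-sum over changelog rows. This is a toy model under stated assumptions: rows are `(data, change_ts, change_diff)` tuples, `as_of` stands in for the read time, and the reclock and progress lookup are ignored entirely. The function name is hypothetical.

```python
from collections import defaultdict


def integrate_table(changelog, as_of):
    """GROUP BY the non-carrier columns, SUM the diff carrier, keep groups with sum > 0.

    changelog: iterable of (data, change_ts, change_diff); only rows with
    change_ts <= as_of contribute. Returns {data: multiplicity}.
    """
    sums = defaultdict(int)
    for data, ts, diff in changelog:
        if ts <= as_of:
            sums[data] += diff
    # A negative sum is thresholded away rather than surfaced as a negative count.
    return {data: total for data, total in sums.items() if total > 0}


changelog = [
    ("x", 1, +1), ("x", 2, +1),   # x accumulates to multiplicity 2
    ("y", 1, +1), ("y", 3, -1),   # y is present at time 2, retracted by time 3
    ("z", 2, -1),                 # a retraction with no matching insert
]
at_2 = integrate_table(changelog, as_of=2)   # {"x": 2, "y": 1}
at_3 = integrate_table(changelog, as_of=3)   # {"x": 2}
```

The unmatched retraction of `"z"` never appears in the output, which is the sense in which keeping only positive sums is safe by construction.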
…les)
Judgment: the 'emit on a-change' and 'system-time stamp' use cases are good user-docs/teaching material but tangential to the design (the determinism boundary and freeze are already stated and illustrated by the running example), so they are NOT folded in. Two load-bearing facts they surfaced are now stated as rules:
- CHANGES over a long-lived input must be bounded (AS OF AT LEAST mz_now() - W + a temporal filter on the time carrier); promoted from an incidental example detail to an explicit rule.
- To stamp system/wall-clock time in a RECORD body use now() (a frozen sample, recorded once, as a value column), NOT mz_now(), which in the body is domain-A event time. Closes a real footgun for the staleness use case.
https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
…rg-free INTEGRATE
Aligns the docs with the canonical Notion 'Recorders' page (same model; surface/controller details reconciled):
- Carriers + progress declared on the TABLE: CREATE TABLE … WITH (PROGRESS, TIMESTAMP = …, DIFF = …), initialized to (0,0) (empty progress = sealed frontier).
- INTEGRATE(table) is argument-free: carriers and the progress collection come from the table; nothing to mismatch. (CHANGES carrier spelling still open / #36869.)
- Standardize terminology: the object is the 'progress collection' (a source's progress precedent); 'reclock'/'reclocking' is the operation of mapping through it.
- Controller obligations (new impl M5): writer registration = capability; progress-collection GC by tracking INTEGRATE consumers; RETAIN HISTORY sits on the INTEGRATE consumer/recorder, not the table (leaning INTEGRATE).
- Drop-last-writer reframed as open: freeze vs seal (advance to empty/top frontier).
- Framing: at-least-once application / at-most-once persistence.
Kept the doc's sharper points that Notion lacks: data-domain compaction must be a clamp+GROUP BY/SUM reduce (a bare UPDATE is wrong), and the self-reference answer.
https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
What
A design doc proposing recorders: a way to house transformations that must record a decision made at processing time and never recompute it — non-deterministic, side-effecting, or time-dependent logic (including UDFs). The guarantee is at-least-once application of the function and at-most-once persistence of its result (committed once via a per-commit CAS).
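The guarantee can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: `Shard`, `compare_and_append`, and `record` are toy names, not the design's API, and the "shard" is an in-memory list. The function is applied on every attempt (at-least-once), but the compare-and-set on the commit slot lets only one result be persisted (at-most-once).

```python
import random


class Shard:
    """Toy stand-in for a durable shard guarded by a compare-and-set on its upper."""

    def __init__(self):
        self.upper = 0   # next free commit slot
        self.log = []    # committed (slot, value) pairs

    def compare_and_append(self, expected_upper, value):
        # The write succeeds only if nobody has committed this slot yet.
        if self.upper != expected_upper:
            return False
        self.log.append((expected_upper, value))
        self.upper += 1
        return True


def record(shard, slot, fn, attempts):
    """Apply fn once per attempt; at most one result for `slot` is persisted."""
    applications = 0
    for _ in range(attempts):
        value = fn()                            # non-deterministic, may run repeatedly
        applications += 1
        shard.compare_and_append(slot, value)   # later attempts lose the CAS
    return applications


shard = Shard()
rng = random.Random(0)
calls = record(shard, 0, rng.random, attempts=3)
```

In this sketch the function runs three times, while the log holds exactly one entry: the value sampled by the first attempt. Later samples are discarded rather than recomputed into the output.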
Write-side complement to the CHANGES table function (#36869, draft); a deliberate revival of the removed Continual Tasks feature (database-issues#9694 / PR #35967), pulled for lack of design consensus and incompleteness, not because the model is unsound.
Docs: doc/developer/design/20260604_recorders.md (conceptual) and 20260604_recorders_implementation.md (feasibility). The model matches the WG "Recorders" design (Notion); these docs are the long-form version.
The calculus
differentiate (CHANGES) and integrate (INTEGRATE) are a carrier-preserving inverse pair, both in the input's timeline (domain A). The cross-domain, non-deterministic leg (picking the commit time T, sampling frozen values like now()) lives only in RECORD. freeze is sample-and-hold.
The surface (no new collection kind)
- WITH (PROGRESS, …) (initialized to (0,0); empty progress = sealed frontier). No DELTA TABLE collection kind.
- change_diff is data, not the persist multiplicity. CHANGES emits each change as a +1 row with the signed count in the column.
- INTEGRATE(table) is argument-free: carriers and the progress collection come from the table; it accumulates DIFF per row, thresholding at max(0, Σ) internally (safe by construction; repeat_row not exposed). A stateful reduce, memory ∝ live output.
- CREATE SOURCE … EXPOSE PROGRESS AS precedent); INTEGRATE reclocks through it. A RECORD writer only writes a lane. Taking a table (not arbitrary SQL) is what keeps the progress lookup unambiguous.
- RECORD body, not in INTEGRATE; first-seen/upsert bodies are self-referential (Tier 3).
- DELETE; the integral-preserving form is data-domain compaction: a clamp + GROUP BY/SUM reduce, not a bare UPDATE.
Decisions worth reviewers' attention
- B is the table's single shard frontier ⇒ exactly one progress collection per table; all RECORDs and INTEGRATEs share it (multiple readers consistent by construction). Multi-writer = per-writer lanes R_i: B → X_i, table A-completeness = meet_i R_i[B_upper] at read time (a merged frontier can't work). Drop the last writer → freeze or seal (open).
- RECORD body's inputs.
- RECORD body = frozen; must be diagnosable, never silent (EXPLAIN + plan-time NOTICE). In a body, now() (frozen wall-clock) ≠ mz_now() (domain-A event time).
- RETAIN HISTORY sits (table vs INTEGRATE consumer vs recorder; leaning INTEGRATE) is open.
- DELETE + advancing since; a cascade DELETE alone is not erasure.
SQL-consistency pass
An agent reviewed the surface against Materialize's grammar (file:line) and prior art:
- USING clause / table WITH (…), not => named args (MZ has no named-arg support; => already means map-entry). CHANGES's AS OF parses like SUBSCRIBE (query) AS OF …. Dropped the AS RECORD(...) wrapper; IN DOMAIN → WITH (TIMELINE = …).
- DIFF is a signed multiplicity, not an op-enum (cf. Snowflake METADATA$ACTION, Flink +I/-U, Debezium c,u,d); CHANGES knowingly collides with Snowflake's CHANGES clause.
Status
Draft, for discussion. De-risked by the CHANGES PR (#36869) and the BEGIN CONTINUAL TRANSACTION prototype. Gating dependency: the (unbuilt) OCC timestamped-write substrate; INTEGRATE and bounding DML are off that critical path. Collapsing to a regular table + progress collection removes the speculative DELTA TABLE kind (impl M2) and reshapes engine-owned compaction (M3/M5). Open agenda: co-design the CHANGES carrier spelling with #36869, the RECORDER keyword, commit-timestamp policy, RETAIN HISTORY placement, freeze-vs-seal on drop-all.
Stakeholders: @antiguru + the WG Continual Tasks folks (Aljoscha, Seth, Frank).
https://claude.ai/code/session_015YFH7J7PaEqkBSrAQYaq8H
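As a rough illustration of the calculus and of data-domain compaction described above, here is a toy model in Python. It is a sketch under assumptions, not the proposed implementation: the function names `changes`, `integrate`, and `compact`, the `row` and `change_ts` field names, and the reading of "clamp" as advancing old time-carrier values to a chosen bound are all hypothetical. Only `change_diff` and the `max(0, Σ)` threshold come from the description.

```python
from collections import Counter


def changes(updates):
    """Differentiate: turn (row, time, diff) updates into plain data rows.

    Each change is one row (multiplicity +1); the signed count travels in
    the change_diff column rather than in the multiplicity.
    """
    return [{"row": row, "change_ts": ts, "change_diff": diff}
            for row, ts, diff in updates]


def integrate(change_rows):
    """Integrate: sum change_diff per row and keep only positive totals."""
    totals = Counter()
    for c in change_rows:
        totals[c["row"]] += c["change_diff"]
    return {row: n for row, n in totals.items() if n > 0}


def compact(change_rows, clamp_to):
    """Clamp old timestamps up to clamp_to, then group by (row, ts) and sum."""
    sums = Counter()
    for c in change_rows:
        ts = max(c["change_ts"], clamp_to)
        sums[(c["row"], ts)] += c["change_diff"]
    return [{"row": row, "change_ts": ts, "change_diff": d}
            for (row, ts), d in sums.items() if d != 0]


log = changes([("a", 1, +1), ("a", 2, -1), ("b", 2, +1), ("b", 3, +1)])
before = integrate(log)
compacted = compact(log, clamp_to=3)
after = integrate(compacted)
```

In this example the insert and retraction of `a` cancel once their timestamps are clamped to the same value, so the compacted log shrinks from four rows to one while `integrate` returns the same result before and after.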