feat: add transaction-level table_metadata_updates#6255
feat: add transaction-level table_metadata_updates#6255fecet wants to merge 1 commit intolance-format:mainfrom
Conversation
PR Review: feat: add transaction-level table_metadata_updatesClean design overall. Two conflict-detection gaps that can cause silent metadata loss: P0:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3d708ec01c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| pub fn keys_overlap(&self, other: &Self) -> bool { | ||
| let self_keys: std::collections::HashSet<&str> = self | ||
| .update_entries | ||
| .iter() | ||
| .map(|e| e.key.as_str()) | ||
| .collect(); | ||
| other | ||
| .update_entries | ||
| .iter() | ||
| .any(|e| self_keys.contains(e.key.as_str())) |
There was a problem hiding this comment.
Treat replace=true metadata updates as conflicting
If either UpdateMap uses replace=true, apply_update_map clears the entire table metadata before inserting the listed entries, so a concurrent write to any other key is order-dependent. With the current keys_overlap() implementation, replace={"a": ...} and an incremental update to "b" are treated as compatible because the explicit key sets are disjoint, but whichever transaction rebases last will silently discard the other's metadata. This affects both transaction-level metadata updates and UpdateConfig table-metadata conflicts.
Useful? React with 👍 / 👎.
| // Check transaction-level table_metadata_updates key overlap | ||
| if let (Some(self_updates), Some(other_updates)) = ( | ||
| &self.transaction.table_metadata_updates, | ||
| &other_transaction.table_metadata_updates, | ||
| ) { | ||
| if self_updates.keys_overlap(other_updates) { | ||
| return Err(self.incompatible_conflict_err(other_transaction, other_version)); |
There was a problem hiding this comment.
Compare transaction-level metadata keys with UpdateConfig
This new conflict check only compares Transaction.table_metadata_updates against the same field on the other transaction. A concurrent UpdateConfig { table_metadata_updates: ... } still stores its table-metadata write inside operation, and check_append_txn / check_update_config_txn currently allow Append↔UpdateConfig, so two commits that update the same metadata key can both succeed and the later rebase simply wins. That leaves the new append-with-metadata path unprotected against the existing UpdateConfig path.
Useful? React with 👍 / 👎.
| //TODO: handle batch transaction merges in the future | ||
| transaction_properties: None, | ||
| table_metadata_updates: None, |
There was a problem hiding this comment.
Preserve table_metadata_updates in batch append commits
After this change, append transactions can legally carry table_metadata_updates, but execute_batch() always synthesizes the merged append transaction with table_metadata_updates: None. A commit_batch() caller that batches append+cursor/checkpoint transactions will get all of the data fragments committed while every metadata mutation is silently dropped, which breaks the new atomic append+metadata use case.
Useful? React with 👍 / 👎.
3d708ec to
5e3517e
Compare
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Hi maintainers, friendly ping — PTAL when you get a chance 🙏 @Xuanwo @wjones127 |
d5d363d to
6a59ee7
Compare
wjones127
left a comment
There was a problem hiding this comment.
This looks like an interesting way to solve this.
My thinking has been in a more comprehensive direction, described here: #5960. I could move that along and break out a story just focused on getting basic operations with metadata updates as the initial pass.
Or we could do something like this for the short term.
Because this is a format change, and any format change we need to support basically forever, we need to be careful about what we add. Take a look at that proposal I linked and let me know what you think.
There was a problem hiding this comment.
issue(blocking): all updates to the protobuf need to first be voted on by the PMC, since they are a format change. https://lance.org/community/voting/
There was a problem hiding this comment.
Hey @wjones127, thanks for the voting pointer.
After re-reading #5960 I'm fully sold on the Action direction — our
append+cursor use case here is just one UserOperation with AddFragments + UpdateTableMetadata, strictly cleaner than the field this PR introduces. Any rough timeline, and is there a piece I could pick up? UpdateTableMetadata feels like a natural first action.
One thought while #5960 is being shaped: would a non-format-changing overlay work as an interim? Reserve a prefix in transaction_properties (e.g. __table_metadata__.*) and have build_manifest() promote those entries into manifest.table_metadata — atomicity still rides on the manifest CAS, proto stays unchanged.
f5926f4 to
994f06a
Compare
994f06a to
f0847f3
Compare
Let Operation::Update commits atomically carry an UpdateMap that's merged into manifest.table_metadata alongside the row changes. Streaming writers (consumer merge_insert, broker-style fold) can record a cursor/offset slot in the same commit as the data they produce, avoiding a separate UpdateConfig round-trip. - proto: Transaction.Update.table_metadata_updates = 9 - conflict_resolver: central key-level check runs before per-op checks so a row conflict on concurrent Updates does not mask an incompatible metadata collision; also handles the cross-kind pairing with Operation::UpdateConfig - Python: LanceOperation.Update.table_metadata_updates - Java JNI: compile-compat passthrough (no Java API surface yet)
f0847f3 to
4896eab
Compare
Summary
table_metadata_updatesfrom anUpdateConfig-only field to aTransaction-level attribute, allowing any operation (e.g. Append) to atomically carry metadata updates merged into the manifest'stable_metadataUpdateMap::keys_overlap()and wire it into the conflict resolver to detect concurrent transactions writing the same metadata keysmodifies_same_metadata()did not checktable_metadata_updatesoverlap for UpdateConfig vs UpdateConfig conflicts (introduced in feat: add table metadata, support incremental updates of all metadata #4350)Motivation
transaction_propertiesare stored in separate.txnfiles, requiring O(N) reads to scan on recovery.table_metadatalives in the manifest and is loaded at dataset open time (O(1)). By allowing metadata updates alongside any operation in a single commit, consumers like persistent message queues can embed cursors atomically with data appends — no separate UpdateConfig round-trip, no scan on recovery.Design
In
build_manifest(), transaction-leveltable_metadata_updatesare applied after operation-level handling (UpdateConfig), so if both are present on the same key, the transaction-level value wins.Conflict detection: two concurrent transactions with overlapping
table_metadata_updateskeys are rejected as incompatible. Disjoint keys are commutative and always compatible.Test plan
cargo check -p lance -p pylancecompiles cleanlytest_table_metadata_key_conflict::disjoint_keys— two Appends with different metadata keys pass conflict checktest_table_metadata_key_conflict::overlapping_keys— two Appends with same metadata key rejected as IncompatibleTransaction