[WIP][SQL][DML] DSv2 Transaction Management#55278
Open
andreaschat-db wants to merge 10 commits intoapache:masterfrom
Open
[WIP][SQL][DML] DSv2 Transaction Management#55278andreaschat-db wants to merge 10 commits intoapache:masterfrom
andreaschat-db wants to merge 10 commits intoapache:masterfrom
Conversation
ed9f4af to
3f45e22
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Currently, DSv2 is lacking the required abstractions and machinery for allowing transactionality in DML operations.
First, this PR introduces the public Java interfaces that connectors need to implement. These are the following:
TransactionInfo— carries the transaction metadata.Transaction— represents a transaction. Exposes catalog(), commit(), abort(), and close().TransactionalCatalogPlugin— extends CatalogPlugin with beginTransaction(TransactionInfo) method.Second, it expands Spark to manage the Transaction lifecycle. This is done as follows:
Pre-analysis. At pre-analysis, we look at the plan for logical nodes that implement the
TransactionalWritetrait. When a plan contains such a node we initiate a transaction. Commands that do not result in execution, e.g. EXPLAIN, we should never initiate a transaction.Analysis. We create a copy of the analyzer that contains the
TransactionAwareCatalogManagerinstance. This is aCatalogManagerthat is aware of the current transaction and intercepts catalog(name) and currentCatalog lookups by returning the transaction catalog instead of the default session catalog. All catalog look ups within the query go through theTransactionAwareCatalogManager.Planning. At planning we need to propagate the transaction to the execution nodes. This is necessary so we can commit the transaction before the relation cache is refreshed. This process occurs post planning where the plan is fully formed. We walk the plan and inject the transaction callback to each
TransactionalExecnode.Execution.
V2ExistingTableWriteExecnodes invoke the transaction commit right after the write operation and before the relation cache is refreshed.Abort/close. Every QE operation, i.e. analyzed, commandExecuted, optimizedPlan, sparkPlan, executedPlan etc., is wrapped with executeWithTransactionContext closure. This ensures that if an issue occurs at any point throughout the QE, the transition is aborted and closed.
Note, since we can have multiple QueryExecution instantiations for the same operation, the abort/close operations need to be idempotent. Due the nested structure of query execution, we might end up triggering nested abort/close calls.
Future work
This PR is based on @aokolnychyi's prototype.
Why are the changes needed?
We are currently lacking the required abstractions and machinery for DSv2 connectors to implement transactions in write operations.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Introduced new suites and added tests in existing suites.
Was this patch authored or co-authored using generative AI tooling?
Claude Sonnet 4.6.