CASSANDRA-21216/CASSANDRA-21260: Clear savedBuffer and savedNextKey in BTree.FastBuilder.reset() #4746
Looks good, and great catch. Why don't we move the AbstractUpdater.reset() method up to |
Sounds good, I moved
Looks like |
Ok great, yes I see that now. But moving to a single method for resetting helps ensure we update all of the relevant extensions in future. LGTM +1 |
7224e29 to 7eb41f6
do you mind checking if |
…, causing ClassCastException and SSTable header corruption during schema disagreement

patch by Andrés Beck-Ruiz, Runtian Liu; reviewed by Zhao Yang, Benedict Elliott Smith for CASSANDRA-21216

Co-authored-by: Runtian Liu <runtian@uber.com>
7eb41f6 to 3343248
Just checked, that test was failing here when asserting that the In order to keep a consolidated reset method that works for both objects, I replaced these helper methods with |
Summary
`BTree.FastBuilder.reset()` does not clear `savedBuffer` or `savedNextKey`, allowing stale `ColumnMetadata` objects to leak when a `FastBuilder` is reused from the thread-local pool after an exception during message deserialization.

During a schema disagreement, a `READ_REQ` deserialization failure on a replica leaves a `FastBuilder` in a dirty state with `savedBuffer` and `savedNextKey` populated from the source table's `ColumnMetadata`. When the same thread reuses that `FastBuilder` for a subsequent BTree construction, the stale entries leak into the new `BTree`, with two effects:

- `ColumnMetadata` objects from the source table end up in a `Row` BTree, causing `ClassCastException: ColumnMetadata cannot be cast to Row` during mutations, reads, or flushes. This occurs on the large-message path, where messages exceeding ~64KB are deserialized on `SEPWorker` threads that also service mutation tasks.
- Stale entries in `savedBuffer` leak into a victim table's `SerializationHeader` via deletion-only mutations, writing foreign column entries into the SSTable metadata on disk. This can also occur on the small-message path via Netty event loop thread reuse, lowering the trigger threshold to tables with more than 31 columns.

More context regarding these bugs can be found in this discussion thread.
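The leak described above can be illustrated with a toy model. This is only a sketch, not Cassandra's `BTree.FastBuilder`: the class `ToyBuilder`, its overflow rule, the `CAPACITY` constant, and the `clearSavedState` flag are all hypothetical and exist only to show how a reused builder carries over parked state when `reset()` leaves `savedBuffer` and `savedNextKey` untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StaleBuilderSketch {
    /** Hypothetical stand-in for a pooled builder; NOT Cassandra's BTree.FastBuilder. */
    static final class ToyBuilder {
        static final int CAPACITY = 4;          // assumed toy leaf size
        private final Object[] buffer = new Object[CAPACITY];
        private int count;
        private Object[] savedBuffer;           // full buffer parked on overflow
        private Object savedNextKey;            // key that arrived when the buffer was full

        void add(Object key) {
            if (count == CAPACITY) {
                if (savedBuffer != null)
                    throw new IllegalStateException("toy builder supports a single overflow");
                savedBuffer = buffer.clone();   // park the full buffer
                savedNextKey = key;
                count = 0;
                return;
            }
            buffer[count++] = key;
        }

        /** With clearSavedState == false this mimics a reset that forgets the saved fields. */
        void reset(boolean clearSavedState) {
            count = 0;
            if (clearSavedState) {
                savedBuffer = null;
                savedNextKey = null;
            }
        }

        List<Object> build() {
            List<Object> out = new ArrayList<>();
            if (savedBuffer != null) {
                out.addAll(Arrays.asList(savedBuffer));
                out.add(savedNextKey);
            }
            for (int i = 0; i < count; i++)
                out.add(buffer[i]);
            reset(true);
            return out;
        }
    }

    /** Simulates: fill builder, abandon it (failed deserialization), reset, then reuse. */
    static List<Object> reuseAfterFailure(boolean clearSavedState) {
        ToyBuilder pooled = new ToyBuilder();
        for (int i = 0; i < 6; i++)
            pooled.add("column-" + i);          // overflow parks column-0..3 and column-4
        // An exception would occur here, so build() is never called on the first use.
        pooled.reset(clearSavedState);
        pooled.add("row-0");
        pooled.add("row-1");
        return pooled.build();
    }

    public static void main(String[] args) {
        // prints: incomplete reset: [column-0, column-1, column-2, column-3, column-4, row-0, row-1]
        System.out.println("incomplete reset: " + reuseAfterFailure(false));
        // prints: full reset: [row-0, row-1]
        System.out.println("full reset: " + reuseAfterFailure(true));
    }
}
```

In the toy model, the second user of the builder sees five foreign entries ahead of its own two when the saved fields survive the reset, and only its own entries when they are nulled out.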
Fix
Null out `savedBuffer` and `savedNextKey` in `FastBuilder.reset()` for both leaf and branch BTree nodes. Also add `savedNextKey = null` to `AbstractUpdater.reset()` for consistency.

Test plan
- `BTreeFastBuilderContaminationTest`:
  - `testSchemaDisagreementCorruptsPartitionViaFastBuilder`: Wide table (4200 columns) triggers large-message deserialization on `SEPWorker` threads; verifies that no `ClassCastException` occurs after schema disagreement.
  - `testSmallMessageContaminatesSSTableHeaderViaNettyEventLoop`: Small-message scenario (150 columns) triggers deserialization on the Netty event loop; verifies that no foreign columns appear in victim SSTable headers.
- `BTreeTest.testFastBuilderResetClearsSavedState`: Verifies that `FastBuilder.reset()` clears `savedBuffer`/`savedNextKey` when a builder is abandoned without calling `build()`.
- `BTreeTest` tests pass (12/12).

Patch by Andrés Beck-Ruiz, Runtian Liu, reviewed by Zhao Yang, Benedict Elliott Smith for CASSANDRA-21216, CASSANDRA-21260
Co-authored-by: Runtian Liu <runtian@uber.com>