Skip to content

CASSANDRA-21216/CASSANDRA-21260: Clear savedBuffer and savedNextKey in BTree.FastBuilder.reset()#4746

Open
andresbeckruiz wants to merge 1 commit intoapache:cassandra-4.1from
andresbeckruiz:abeckruiz/CASSANDRA-21216/cassandra-4.1
Open

CASSANDRA-21216/CASSANDRA-21260: Clear savedBuffer and savedNextKey in BTree.FastBuilder.reset()#4746
andresbeckruiz wants to merge 1 commit intoapache:cassandra-4.1from
andresbeckruiz:abeckruiz/CASSANDRA-21216/cassandra-4.1

Conversation

@andresbeckruiz
Copy link
Copy Markdown

@andresbeckruiz andresbeckruiz commented Apr 16, 2026

Summary

BTree.FastBuilder.reset() does not clear savedBuffer or savedNextKey, allowing stale ColumnMetadata objects to leak when a FastBuilder is reused from the thread-local pool after an exception during message deserialization.

During a schema disagreement, a READ_REQ deserialization failure on a replica leaves a FastBuilder in a dirty state with savedBuffer and savedNextKey populated from the source table's ColumnMetadata. When the same thread reuses that FastBuilder for a subsequent BTree construction, the stale entries leak into the new BTree, causing:

  1. ClassCastException (CASSANDRA-21216): ColumnMetadata objects from the source table end up in a Row BTree, causing ClassCastException: ColumnMetadata cannot be cast to Row during mutations, reads, or flushes. This occurs on the large-message path where messages exceeding ~64KB are deserialized on SEPWorker threads that also service mutation tasks.
  2. SSTable header corruption (CASSANDRA-21260): Stale columns from the source table's savedBuffer leak into a victim table's SerializationHeader via deletion-only mutations, writing foreign column entries into the SSTable metadata on disk. This can also occur on the small-message path via Netty event loop thread reuse, lowering the trigger threshold to tables with more than 31 columns.

More context regarding these bugs can be found in this discussion thread.

Fix

Null out savedBuffer and savedNextKey in FastBuilder.reset() for both leaf and branch BTree nodes. Also add savedNextKey = null to AbstractUpdater.reset() for consistency.

Test plan

  • JVM Dtest BTreeFastBuilderContaminationTest:
    • testSchemaDisagreementCorruptsPartitionViaFastBuilder: Wide table (4200 columns) triggers large-message deserialization on SEPWorker threads, verifies no ClassCastException occurs after schema disagreement.
    • testSmallMessageContaminatesSSTableHeaderViaNettyEventLoop: Small-message scenario (150 columns) triggers deserialization on Netty event loop, verifies no foreign columns appear in victim SSTable headers.
  • Unit test BTreeTest.testFastBuilderResetClearsSavedState: Verifies FastBuilder.reset() clears savedBuffer/savedNextKey when a builder is abandoned without calling build().
  • All existing BTreeTest tests pass (12/12).

Patch by Andrés Beck-Ruiz, Runtian Liu, reviewed by Zhao Yang, Benedict Elliott Smith for CASSANDRA-21216, CASSANDRA-21260

Co-authored-by: Runtian Liu runtian@uber.com

Copy link
Copy Markdown
Contributor

@jasonstack jasonstack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!

@belliottsmith
Copy link
Copy Markdown
Contributor

Looks good, and great catch. Why don't we move the AbstractUpdater.reset() method up to AbstractFastBuilder though? I don't think there's a reason not to share it, and we only need to get the logic right once. It looks like we also are not clearing hasRightChild in the FastBuilder version, for instance.

@andresbeckruiz
Copy link
Copy Markdown
Author

Looks good, and great catch. Why don't we move the AbstractUpdater.reset() method up to AbstractFastBuilder though? I don't think there's a reason not to share it, and we only need to get the logic right once. It looks like we also are not clearing hasRightChild in the FastBuilder version, for instance.

Sounds good, I moved AbstractUpdater.reset() up to AbstractFastBuilder along with helper methods clearLeafBuffer and clearBranchBuffer. Changes to reset now include:

  • Setting leaf().count = 0 and branch.count = 0, as the FastBuilder does not call completeBuild on the exception path
  • leaf().savedNextKey = null and branch.savedNextKey = null which were missing from AbstractUpdater.reset()
  • Adding leaf().savedBuffer[0] != null guard for consistency with the branch logic

Looks like hasRightChild was added in 7e32d92 for cassandra-6.0 and trunk, so I will make a separate patch for those branches. That patch will also exclude the JVM Dtest test as the schema disagreement condition should not occur in these versions.

@belliottsmith
Copy link
Copy Markdown
Contributor

belliottsmith commented May 7, 2026

Looks like hasRightChild was added in 7e32d92 for cassandra-6.0 and trunk, so I will make a separate patch for those branches

Ok great, yes I see that now. But moving to a single method for resetting helps ensure we update all of the relevant extensions in future.

LGTM +1

@belliottsmith belliottsmith self-requested a review May 7, 2026 08:24
@jasonstack
Copy link
Copy Markdown
Contributor

jasonstack commented May 8, 2026

do you mind checking if LongBtreeTest#testFastBuilder passed locally? seems to be a regression in the refactoring

…, causing ClassCastException and SSTable header corruption during schema disagreement

patch by Andrés Beck-Ruiz, Runtian Liu; reviewed by Zhao Yang, Benedict Elliott Smith for CASSANDRA-21216

Co-authored-by: Runtian Liu <runtian@uber.com>
@andresbeckruiz andresbeckruiz force-pushed the abeckruiz/CASSANDRA-21216/cassandra-4.1 branch from 7eb41f6 to 3343248 Compare May 8, 2026 19:37
@andresbeckruiz
Copy link
Copy Markdown
Author

do you mind checking if LongBtreeTest#testFastBuilder passed locally? seems to be a regression in the refactoring

Just checked, that test was failing here when asserting that the FastBuilder buffers are empty. Seems like the clearLeafBuffer and clearBranchBuffer optimizations assume buffer invariants (eg: first null entry indicates empty buffer) that don't hold for the FastBuilder.

In order to keep a consolidated reset method that works for both objects, I replaced these helper methods with Arrays.fill(buffer, null), matching the original FastBuilder reset behavior. This update ensures that LongBtreeTest#testFastBuilder passes, along with the other BTree unit tests. I can also revert the refactor if that is preferred so that the AbstractUpdater can keep the optimization in its separate reset method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants