HATest.testSemiSyncReplica can fail before HA slave ack is visible on master #10494

@RongtongJin

Problem

HATest.testSemiSyncReplica can be flaky with:

expected:<PUT_OK> but was:<FLUSH_SLAVE_TIMEOUT>

The test setup waits until the slave-side HA client enters TRANSFER, then immediately starts semi-sync writes. Entering TRANSFER only proves the slave has connected locally. The master-side HAConnection may not have received the slave's initial offset report yet, leaving slaveAckOffset at -1 during the first synchronous replication request.

Impact

On slower or busy CI machines, the first asyncPutMessage can race the initial slave ack report and time out even though the HA connection is otherwise healthy.

Proposed fix

Before sending messages, make the test wait for the readiness condition that semi-sync replication actually needs: the master-side HA connection is in TRANSFER, and its slaveAckOffset has caught up to the slave's current max physical offset.
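
A minimal sketch of that wait, not the actual patch. The class name HaReadiness and both methods are hypothetical, and the predicate takes plain values so the example stays self-contained. In the real test those values would presumably be read from the master-side HAConnection and the slave's message store.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper; names and signatures are illustrative assumptions. */
public final class HaReadiness {
    private HaReadiness() {}

    /**
     * Readiness condition described in the proposed fix: the master-side
     * connection is in TRANSFER and the acked offset has reached the
     * slave's current max physical offset. An ack of -1 means the slave's
     * initial offset report has not arrived yet.
     */
    public static boolean isSemiSyncReady(String masterConnState,
                                          long slaveAckOffset,
                                          long slaveMaxPhyOffset) {
        return "TRANSFER".equals(masterConnState)
            && slaveAckOffset >= 0
            && slaveAckOffset >= slaveMaxPhyOffset;
    }

    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition held, false on timeout or interrupt.
     */
    public static boolean await(BooleanSupplier condition,
                                long timeoutMillis,
                                long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

The test setup would call await with a supplier that re-reads the master-side state and offsets on every poll, and fail the setup if it returns false, so a timeout surfaces as a setup problem instead of FLUSH_SLAVE_TIMEOUT on the first write.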

Validation

Ran locally with Maven 3.9.9:

mvn -pl store -am -Dtest=HATest#testSemiSyncReplica -DskipITs -DfailIfNoTests=false test
mvn -pl store -am -Dtest=HATest -DskipITs -DfailIfNoTests=false test

The full HATest run reported Tests run: 4, Failures: 0, Errors: 0, Skipped: 1.
