Broker show alive=false in controller getSyncStateSet command #213

@drivebyer

BUG REPORT

  1. Please describe the issue you observed:
    I deployed three controllers, two brokers, and one nameserver using an operator. After ensuring all pods were ready, I executed commands on the nameserver and the controllers.

On the nameserver, I ran the following command:

[root@master0 ~]# kubectl -n mcamel-system exec -it name-service-0 -- ./mqadmin clusterList -n 127.0.0.1:9876
#Cluster Name           #Broker Name            #BID  #Addr                  #Version              #InTPS(LOAD)     #OutTPS(LOAD)  #Timer(Progress)        #PCWait(ms)  #Hour         #SPACE    #ACTIVATED
broker                  broker-0                0     192.168.137.126:10911  V5_1_4                 0.00(0,0ms)       0.00(0,0ms)  0-0(0.0w, 0.0, 0.0)               0  474775.65     0.6800          true
broker                  broker-0                2     192.168.84.199:10911   V5_1_4                 0.00(0,0ms)       0.00(0,0ms)  2-0(0.0w, 0.0, 0.0)               0  474775.65     0.6500         false

The output looked correct: both replicas of broker-0 are listed.

On the controller, I executed:

[root@master0 ~]# kubectl -n mcamel-system exec -it controller-1 -- ./mqadmin getSyncStateSet -a 127.0.0.1:9878 -c broker -b broker-0

#brokerName	broker-0
#MasterBrokerId	1
#MasterAddr	192.168.137.126:10911
#MasterEpoch	1
#SyncStateSetEpoch	1
#SyncStateSetNums	1

InSyncReplica:	ReplicaIdentity{brokerName='broker-0', brokerId=1, brokerAddress='192.168.137.126:10911', alive=true}

NotInSyncReplica:	ReplicaIdentity{brokerName='broker-0', brokerId=2, brokerAddress='192.168.84.199:10911', alive=false}

The controller reports the replica at 192.168.84.199:10911 (brokerId=2) as alive=false and lists it under NotInSyncReplica.
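For anyone trying to reproduce this, the replicas the controller considers not alive can be pulled out of the getSyncStateSet output with a small script. This is a minimal sketch that assumes the `ReplicaIdentity{...}` line format shown above; the function name `dead_replicas` and the `sample` string are made up for illustration and are not part of RocketMQ.

```python
import re
from typing import List, Tuple

# Matches one ReplicaIdentity{...} entry in the format printed above.
# Assumption: the field order and quoting are exactly as in this issue's output.
REPLICA_RE = re.compile(
    r"ReplicaIdentity\{brokerName='(?P<name>[^']*)', "
    r"brokerId=(?P<id>\d+), "
    r"brokerAddress='(?P<addr>[^']*)', "
    r"alive=(?P<alive>true|false)\}"
)


def dead_replicas(output: str) -> List[Tuple[str, int, str]]:
    """Return (brokerName, brokerId, brokerAddress) for each replica with alive=false."""
    return [
        (m["name"], int(m["id"]), m["addr"])
        for m in REPLICA_RE.finditer(output)
        if m["alive"] == "false"
    ]


# The two replica lines copied from the output in this issue.
sample = (
    "InSyncReplica:\tReplicaIdentity{brokerName='broker-0', brokerId=1, "
    "brokerAddress='192.168.137.126:10911', alive=true}\n"
    "\n"
    "NotInSyncReplica:\tReplicaIdentity{brokerName='broker-0', brokerId=2, "
    "brokerAddress='192.168.84.199:10911', alive=false}\n"
)

print(dead_replicas(sample))
```

The script reads only text, so the kubectl command output can be saved to a file and passed in as a string.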

I also found the following error, repeated every few seconds, in the log of the master broker (192.168.137.126:10911):

2024-02-29 15:50:26 ERROR AutoSwitchHAService_Executor_1 - Error happen when change SyncStateSet, broker:broker-0, masterAddress:192.168.137.126:10911, masterEpoch:1, oldSyncStateSet:[1], newSyncStateSet:[1, 2], syncStateSetEpoch:1
org.apache.rocketmq.client.exception.MQBrokerException: CODE: 2006  DESC: Rejecting alter syncStateSet request because the replicas {2} don't alive
For more information, please visit the url, https://rocketmq.apache.org/docs/bestPractice/06FAQ
	at org.apache.rocketmq.broker.out.BrokerOuterAPI.alterSyncStateSet(BrokerOuterAPI.java:1215)
	at org.apache.rocketmq.broker.controller.ReplicasManager.doReportSyncStateSetChanged(ReplicasManager.java:761)
	at org.apache.rocketmq.store.ha.autoswitch.AutoSwitchHAService.lambda$null$0(AutoSwitchHAService.java:263)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at org.apache.rocketmq.store.ha.autoswitch.AutoSwitchHAService.lambda$notifySyncStateSetChanged$1(AutoSwitchHAService.java:263)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2024-02-29 15:50:30 INFO ReplicasManager_ScheduledService_1 - Update controller leader address to controller-1.controller-svc-headless:9878
2024-02-29 15:50:31 ERROR AutoSwitchHAService_Executor_1 - Error happen when change SyncStateSet, broker:broker-0, masterAddress:192.168.137.126:10911, masterEpoch:1, oldSyncStateSet:[1], newSyncStateSet:[1, 2], syncStateSetEpoch:1
org.apache.rocketmq.client.exception.MQBrokerException: CODE: 2006  DESC: Rejecting alter syncStateSet request because the replicas {2} don't alive
For more information, please visit the url, https://rocketmq.apache.org/docs/bestPractice/06FAQ
	at org.apache.rocketmq.broker.out.BrokerOuterAPI.alterSyncStateSet(BrokerOuterAPI.java:1215)
	at org.apache.rocketmq.broker.controller.ReplicasManager.doReportSyncStateSetChanged(ReplicasManager.java:761)
	at org.apache.rocketmq.store.ha.autoswitch.AutoSwitchHAService.lambda$null$0(AutoSwitchHAService.java:263)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at org.apache.rocketmq.store.ha.autoswitch.AutoSwitchHAService.lambda$notifySyncStateSetChanged$1(AutoSwitchHAService.java:263)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
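The DESC text in the log reads like a liveness check on the controller side: the master asks to grow the SyncStateSet from [1] to [1, 2], and the request is rejected because replica 2 is not considered alive. The sketch below only models that behavior as I understand it from the message. It is not RocketMQ's controller code, and the function name and arguments are hypothetical.

```python
def check_alter_sync_state_set(new_sync_state_set, alive_broker_ids):
    """Model of the rejection seen in the log (illustrative only).

    Returns (accepted, not_alive), where not_alive lists the requested
    broker ids that are missing from the set of alive broker ids.
    """
    not_alive = sorted(set(new_sync_state_set) - set(alive_broker_ids))
    return (len(not_alive) == 0, not_alive)


# Values taken from this issue: the requested set is [1, 2], and the
# getSyncStateSet output shows only brokerId=1 with alive=true.
accepted, not_alive = check_alter_sync_state_set([1, 2], {1})
print(accepted, not_alive)  # prints: False [2]
```

If this model is right, the rejection is a consequence of the alive=false state rather than a separate problem, so the question is why the controller never sees broker 2 as alive.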
  • What did you expect to see?
    All brokers show alive=true

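  • What did you see instead?
    The controller reports brokerId=2 (192.168.84.199:10911) as alive=false, and the master's alterSyncStateSet requests are rejected with CODE: 2006.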

  2. Please tell us about your environment:
    RocketMQ 5.1.4

  3. Other information (e.g. detailed explanation, logs, related issues, suggestions how to fix, etc):
    When I deploy a single-replica controller, this issue does not occur.
