RabbitMQ cluster cannot recover one of the slaves
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Fuel for OpenStack | Invalid | Medium | Aleksandr Didenko | | |
Bug Description
{"build_id": "2014-08-
CentOS, Nova HA flat (3 controllers, 2 computes)
Steps to reproduce:
1) Check for rabbit master node in corosync
Master/Slave Set: master_
Masters: [ node-3.
Slaves: [ node-2.
2) Log in to the master node (here node-3) and block corosync traffic
iptables -I INPUT -p udp --dport 5405 -m state --state NEW,ESTABLISHED
3) Wait 10 min and unblock it
iptables -D INPUT -p udp --dport 5405 -m state --state NEW,ESTABLISHED
4) Check rabbitmqctl cluster_status on node-3: there are no nodes running
Cluster status of node 'rabbit@node-3' ...
[{nodes,
...done.
The logs show no problems from the OCF point of view (/var/log/
<30>Aug 8 16:08:43 node-3 lrmd: INFO: p_rabbitmq-server: get_monitor(): get_status() returns 0.
<30>Aug 8 16:08:43 node-3 lrmd: INFO: p_rabbitmq-server: get_monitor(): also checking if we are master.
<30>Aug 8 16:08:43 node-3 lrmd: INFO: p_rabbitmq-server: get_monitor(): master attribute is (null)
<30>Aug 8 16:08:43 node-3 lrmd: INFO: p_rabbitmq-server: get_monitor(): checking if rabbit app is running
<30>Aug 8 16:08:43 node-3 lrmd: INFO: p_rabbitmq-server: get_monitor(): preparing to update master score for node
<30>Aug 8 16:08:43 node-3 lrmd: INFO: p_rabbitmq-server: get_monitor(): comparing our uptime (0) with node-2.
<30>Aug 8 16:08:43 node-3 lrmd: INFO: p_rabbitmq-server: get_monitor(): get_monitor function ready to return 0
There is also another error:
2014-08-
5) Recheck the rabbit master node in corosync, e.g.
Master/Slave Set: master_
Masters: [ node-2.
Slaves: [ node-3.
However, the RabbitMQ cluster is broken, and the OSTF HA test [15 of 15] [failure] 'Check RabbitMQ is available' also fails.
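The block/unblock part of the steps above (2 and 3) could be scripted roughly as follows. This is only a sketch, not what was actually run: the `-j DROP` target is an assumption (the commands as quoted above carry no target), and the names `RULE_SPEC`, `iptables_cmd` and `block_then_unblock` are made up for illustration. Building both commands from one rule specification keeps the `-D` call matching the `-I` call exactly.

```python
import subprocess
import time

# Rule specification taken from steps 2 and 3 above.
# ASSUMPTION: "-j DROP" is added here; the quoted commands show no target.
RULE_SPEC = ["INPUT", "-p", "udp", "--dport", "5405",
             "-m", "state", "--state", "NEW,ESTABLISHED", "-j", "DROP"]


def iptables_cmd(action):
    """Build an iptables argv; action is '-I' (insert) or '-D' (delete)."""
    if action not in ("-I", "-D"):
        raise ValueError("action must be '-I' or '-D'")
    return ["iptables", action] + RULE_SPEC


def block_then_unblock(seconds=600, run=None, sleep=time.sleep):
    """Insert the rule, wait, then delete it even if the wait is interrupted."""
    # By default the commands are really executed (needs root on the node).
    run = run or (lambda argv: subprocess.run(argv, check=True))
    run(iptables_cmd("-I"))
    try:
        sleep(seconds)
    finally:
        run(iptables_cmd("-D"))
```

Passing `run=print` and `sleep=lambda s: None` gives a dry run that only prints the two commands.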
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Aleksandr Didenko (adidenko)
Other nodes cannot see node-3 in the cluster:
Cluster status of node 'rabbit@node-2' ...
[{nodes,[{disc,['rabbit@node-2','rabbit@node-3','rabbit@node-4']}]},
 {running_nodes,['rabbit@node-4','rabbit@node-2']},
 {partitions,[]}]
...done.
Cluster status of node 'rabbit@node-4' ...
[{nodes,[{disc,['rabbit@node-2','rabbit@node-3','rabbit@node-4']}]},
 {running_nodes,['rabbit@node-2','rabbit@node-4']},
 {partitions,[]}]
...done.
But the affected node-3 sees no running nodes at all:
Cluster status of node 'rabbit@node-3' ...
[{nodes,[{disc,['rabbit@node-2','rabbit@node-3','rabbit@node-4']}]}]
...done.
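To compare such outputs without reading them by eye, something like the sketch below could be used. It assumes the Erlang-term format shown in this report (a `{disc,[...]}` list and an optional `{running_nodes,[...]}` list); the function names are hypothetical and this is not part of the OCF script.

```python
import re


def parse_cluster_status(text):
    """Return (configured, running) node-name sets from cluster_status text."""
    def names(section):
        # Take the first [...] list that directly follows "{<section>,".
        match = re.search(r"\{" + section + r",\s*\[([^\]]*)\]", text)
        if not match:
            return set()  # section absent, e.g. no running_nodes at all
        return set(re.findall(r"'([^']+)'", match.group(1)))
    return names("disc"), names("running_nodes")


def missing_nodes(text):
    """Configured disc nodes that are not listed as running, sorted by name."""
    configured, running = parse_cluster_status(text)
    return sorted(configured - running)
```

Fed the node-2 output, `missing_nodes` returns `['rabbit@node-3']`. Fed the node-3 output, it returns all three nodes, because that output has no running_nodes section.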