Comment 24 for bug 1472230

Revision history for this message
Artem Panchenko (apanchenko-8) wrote :

@Artem,

this issue is intermittent ("floating"), and I've just hit it on a bare-metal lab after shutting down the primary controller:

 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     Masters: [ node-29.mirantis.com ]
     Slaves: [ node-28.mirantis.com node-35.mirantis.com ]
     Stopped: [ node-30.mirantis.com ]

Pacemaker says that RabbitMQ is running on node-35, but it's actually down:

root@node-35:~# ps auxfw | grep [r]abbit
rabbitmq 7332 0.0 0.0 90832 12956 ? Ss 08:58 0:03 /usr/bin/python /usr/bin/rabbit-fence.py
root@node-35:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-35' ...
Error: unable to connect to node 'rabbit@node-35': nodedown

root@node-29:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-29' ...
[{nodes,[{disc,['rabbit@node-28','rabbit@node-29','rabbit@node-35']}]},
 {running_nodes,['rabbit@node-29']},
 {cluster_name,<<"<email address hidden>">>},
 {partitions,[]}]
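For what it's worth, this kind of mismatch can be spotted automatically by comparing the `disc` node list against `running_nodes` in the `rabbitmqctl cluster_status` output. A rough sketch (the regex-based parsing and the `extract_nodes` helper are illustrative only, not part of any proposed fix):

```python
import re

def extract_nodes(status_text, key):
    """Pull the quoted node atoms out of a {key,[...]} term in
    `rabbitmqctl cluster_status` output. This is a crude regex,
    not a full Erlang-term parser."""
    m = re.search(key + r",\[(.*?)\]", status_text, re.DOTALL)
    if not m:
        return set()
    return set(re.findall(r"'([^']+)'", m.group(1)))

# Sample output matching the transcript above
SAMPLE = """
[{nodes,[{disc,['rabbit@node-28','rabbit@node-29','rabbit@node-35']}]},
 {running_nodes,['rabbit@node-29']},
 {partitions,[]}]
"""

configured = extract_nodes(SAMPLE, "disc")
running = extract_nodes(SAMPLE, "running_nodes")
# Nodes RabbitMQ itself considers down, even if Pacemaker disagrees
print(sorted(configured - running))
```

Running this against the output above reports node-28 and node-35 as down from RabbitMQ's point of view, despite Pacemaker listing them as Slaves.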

There are no issues with server resources (most of the controllers have 16+ GB RAM, 8 CPUs and SSD drives): http://paste.openstack.org/show/472715/

Also, the fix https://review.openstack.org/#/c/223548 was merged to master (8.0) only; the patch for 7.0, https://review.openstack.org/#/c/223552/, is still under review.