Mirantis OpenStack


Comment 39 for bug 1399272

Alexey Khivin (akhivin) wrote on 2015-03-18:

The reason for the behaviour shown in http://paste.openstack.org/show/145627/ is a corosync script.

Each time the primary controller shuts down, corosync tries to rebuild the RabbitMQ cluster, and as a first step it kills RabbitMQ on the other nodes (or sometimes stops the rabbit application but leaves the beam process running). As a result, the whole RabbitMQ cluster becomes unavailable for several minutes. After several experiments I saw that corosync sometimes stopped the RabbitMQ application on all nodes, and after that the whole RabbitMQ cluster became unavailable permanently (or for far too long).
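
To make the three node states described above easier to tell apart, here is a minimal sketch of how they could be classified. It is only an illustration: `NodeState`, `classify` and `cluster_available` are hypothetical names, the node names are made up, and the flags would have to be filled in from whatever checks you run on each controller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeState:
    """Observed state of RabbitMQ on one controller (hypothetical model)."""
    name: str
    beam_alive: bool    # is the Erlang VM (beam) process still running?
    app_running: bool   # is the rabbit application started inside that VM?


def classify(node: NodeState) -> str:
    """Map a node to one of the three situations described in the comment."""
    if not node.beam_alive:
        return "killed"        # RabbitMQ was killed outright
    if not node.app_running:
        return "app_stopped"   # rabbit application stopped, beam left running
    return "serving"


def cluster_available(nodes: list[NodeState]) -> bool:
    """Assumption: the cluster is usable if at least one node is serving."""
    return any(classify(n) == "serving" for n in nodes)


# The permanent outage case: the application is stopped on every node,
# although every beam process is still alive.
stuck = [NodeState(f"node-{i}", beam_alive=True, app_running=False) for i in (1, 2, 3)]
print([classify(n) for n in stuck], cluster_available(stuck))
```

The point of the sketch is that checking only for a live beam process is not enough: in the "app_stopped" case the process exists on every node, yet nothing is serving.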