rabbitmq cluster fell to pieces and didn't heal itself

Bug #1484185 reported by Anastasia Kuznetsova
This bug affects 1 person
Affects            | Status  | Importance | Assigned to          | Milestone
Mirantis OpenStack | Invalid | High       | Anastasia Kuznetsova |

Bug Description

Steps To Reproduce:
An environment built from ISO 157 (3 controllers, 1 compute) was deployed and left running overnight. After some unknown event, the RabbitMQ cluster fell apart into several pieces:

2015-08-12T15:26:41.960518+00:00 err: ERROR: p_rabbitmq-server: get_monitor(): rabbit node is running out of the cluster
2015-08-12T15:26:42.044585+00:00 err: ERROR: p_rabbitmq-server: get_monitor(): get_status() returns generic error 1
2015-08-12T15:26:42.098906+00:00 info: INFO: p_rabbitmq-server: get_monitor(): ensuring this slave does not get promoted.
2015-08-12T15:27:00.428028+00:00 info: INFO: p_rabbitmq-server: get_monitor(): CHECK LEVEL IS: 0
2015-08-12T15:27:02.419380+00:00 info: INFO: p_rabbitmq-server: get_monitor(): get_status() returns 0.
2015-08-12T15:27:02.497602+00:00 info: INFO: p_rabbitmq-server: get_monitor(): also checking if we are master.
2015-08-12T15:27:04.253156+00:00 info: INFO: p_rabbitmq-server: get_monitor(): master attribute is 1
2015-08-12T15:27:05.208975+00:00 info: INFO: p_rabbitmq-server: get_monitor(): checking if rabbit app is running
2015-08-12T15:27:05.226972+00:00 info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running. checking if we are the part of healthy cluster
2015-08-12T15:27:05.301620+00:00 info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running. looking for master on node-3.domain.tld
2015-08-12T15:27:05.374315+00:00 info: INFO: p_rabbitmq-server: get_monitor(): fetched master attribute for node-3.domain.tld. attr value is 1
2015-08-12T15:27:05.408902+00:00 info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running. looking for master on node-1.domain.tld
2015-08-12T15:27:05.514029+00:00 info: INFO: p_rabbitmq-server: get_monitor(): fetched master attribute for node-1.domain.tld. attr value is 0
2015-08-12T15:27:05.522496+00:00 info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running. master is node-1.domain.tld
2015-08-12T15:27:08.024116+00:00 info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running. looking for master on node-2.domain.tld
2015-08-12T15:27:08.075369+00:00 info: INFO: p_rabbitmq-server: get_monitor(): fetched master attribute for node-2.domain.tld. attr value is 1
2015-08-12T15:27:08.099939+00:00 err: ERROR: p_rabbitmq-server: get_monitor(): rabbit node is running out of the cluster
2015-08-12T15:27:08.220098+00:00 err: ERROR: p_rabbitmq-server: get_monitor(): get_status() returns generic error 1

and did not heal itself, so the logs of other services contain many error messages.
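In the excerpt above, get_monitor() fetches a master attribute of 1 for both node-3.domain.tld and node-2.domain.tld. When digging through the full snapshot, it can help to pull these lines out mechanically. The sketch below is a hypothetical helper (it is not part of the p_rabbitmq-server resource agent) and assumes only the "fetched master attribute for <node>. attr value is <n>" log format shown above:

```python
import re

# Hypothetical helper: matches get_monitor() lines such as
# "... fetched master attribute for node-3.domain.tld. attr value is 1"
MASTER_ATTR_RE = re.compile(
    r"fetched master attribute for (?P<node>\S+?)\. attr value is (?P<value>\d+)"
)


def master_attributes(log_lines):
    """Return {node: last seen master attribute value}, in first-seen order."""
    attrs = {}
    for line in log_lines:
        match = MASTER_ATTR_RE.search(line)
        if match:
            # A later line for the same node overwrites the earlier value.
            attrs[match.group("node")] = int(match.group("value"))
    return attrs


def nodes_claiming_master(log_lines):
    """Nodes whose most recently logged master attribute is 1."""
    return [node for node, value in master_attributes(log_lines).items() if value == 1]


if __name__ == "__main__":
    excerpt = [
        "2015-08-12T15:27:05.374315+00:00 info: INFO: p_rabbitmq-server: get_monitor(): "
        "fetched master attribute for node-3.domain.tld. attr value is 1",
        "2015-08-12T15:27:05.514029+00:00 info: INFO: p_rabbitmq-server: get_monitor(): "
        "fetched master attribute for node-1.domain.tld. attr value is 0",
        "2015-08-12T15:27:08.075369+00:00 info: INFO: p_rabbitmq-server: get_monitor(): "
        "fetched master attribute for node-2.domain.tld. attr value is 1",
    ]
    print(nodes_claiming_master(excerpt))
```

Run against the three lines from the excerpt, it lists node-3 and node-2; more than one entry in that list is a quick signal worth cross-checking against the cluster state recorded in the snapshot.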

Here is an environment snapshot (635 MB): https://drive.google.com/file/d/0BzU7h7sQOuiqTG14UG5oM2FNbmc/view?usp=sharing

Revision history for this message
Anastasia Kuznetsova (akuznetsova) wrote :
Revision history for this message
Anastasia Kuznetsova (akuznetsova) wrote :
summary: - rabbitmq cluster felt to pieces and didn't heal itself
+ rabbitmq cluster fell to pieces and didn't heal itself
description: updated
Revision history for this message
Anastasia Kuznetsova (akuznetsova) wrote :
description: updated
Changed in mos:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → MOS Oslo (mos-oslo)
milestone: none → 7.0
description: updated
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Anastasia, please try to reproduce the issue once more and, if it occurs, provide us with the environment.

Changed in mos:
status: Confirmed → Incomplete
assignee: MOS Oslo (mos-oslo) → Anastasia Kuznetsova (akuznetsova)
Revision history for this message
Anastasia Kuznetsova (akuznetsova) wrote :

The issue has not occurred for a long time, and I could not reproduce it again.

Changed in mos:
status: Incomplete → Invalid