Nova services break after rabbitmq slave restart

Bug #1384125 reported by Dmitry Nikishov
Affects: Fuel for OpenStack
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I'm using a highly customized ISO based on the 5.1 release. After a couple of hard resets of one specific RabbitMQ slave, the nova services (nova-compute and nova-network) running on the compute node stop working.

Environment:
HA/nova-network/Ubuntu/3 mongodbs + 3 controllers + 1 compute

Steps to reproduce:
1. Hard power off rabbit slave.
2. Wait and check that cluster has recovered and OpenStack (OS) is operational (try to spawn an instance).
3. Power on rabbit slave.
4. Wait for it to come online and check that OpenStack is operational.
5. Power off that slave again.
6. Wait for cluster to recover and check that it's operational.
7. Power that slave back on.
8. Wait for it to come online.

Nova services can break at either step 6 or step 8.
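
The reproduction steps above can be sketched as a small loop, which may help if the cycle has to be repeated several times. This is only an illustration under assumptions: `power_off`, `power_on` and `cluster_operational` are hypothetical callables that the tester has to supply (for example an IPMI or hypervisor call and an instance-spawn check); none of them comes from Fuel or OpenStack.

```python
import time
from typing import Callable, Optional


def wait_until(check: Callable[[], bool], timeout: float, interval: float) -> bool:
    """Poll check() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def reproduce(
    power_off: Callable[[], None],            # hypothetical: hard power off the slave
    power_on: Callable[[], None],             # hypothetical: power the slave back on
    cluster_operational: Callable[[], bool],  # hypothetical: e.g. try to spawn an instance
    cycles: int = 2,
    timeout: float = 600.0,
    interval: float = 10.0,
) -> Optional[int]:
    """Run the off/on cycle and return the step number (as numbered in the
    list above) whose check failed, or None if every check passed."""
    step = 0
    for _ in range(cycles):
        for action in (power_off, power_on):
            action()
            step += 2  # one action step plus its wait-and-check step
            if not wait_until(cluster_operational, timeout, interval):
                return step
    return None
```

With two cycles the checks correspond to steps 2, 4, 6 and 8, so a return value of 6 or 8 would match the failure described here.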

Update 1: nova-compute (and nova-network) become unresponsive. When I try to restart them, the process eventually dies. Log: http://paste.openstack.org/show/123044/

Update 2: After a full RabbitMQ restart (crm resource stop, wait, crm resource start) and a restart of nova-compute/nova-network (if they are dead), OpenStack works fine again.
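
For reference, the workaround from Update 2 could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: the Pacemaker resource name is not given in this report and must be supplied by the operator, the `settle_seconds` wait is an arbitrary placeholder, and the nova services are restarted unconditionally rather than only when dead. The crm commands are meant to run on a controller and the service restarts on the compute node, so `run` can be swapped for whatever remote-execution helper is in use.

```python
import subprocess
import time
from typing import Callable, List, Sequence

Runner = Callable[[List[str]], object]


def restart_rabbit(resource: str, settle_seconds: float = 60.0,
                   run: Runner = subprocess.check_call) -> None:
    """Full RabbitMQ restart via Pacemaker: stop, wait, start.

    `resource` is the crm resource name; it is an operator-supplied value,
    not something taken from this report.
    """
    run(["crm", "resource", "stop", resource])
    time.sleep(settle_seconds)  # assumed delay; tune for the environment
    run(["crm", "resource", "start", resource])


def restart_nova_services(services: Sequence[str] = ("nova-compute", "nova-network"),
                          run: Runner = subprocess.check_call) -> None:
    """Restart the nova services on the compute node (unconditionally)."""
    for name in services:
        run(["service", name, "restart"])
```

Passing a custom `run` also makes a dry run possible, for example `restart_nova_services(run=print)` only prints the commands.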

description: updated
Nastya Urlapova (aurlapova) wrote :

Dmitry, thanks for reporting the issue. Could you please attach a diagnostic snapshot?

Changed in fuel:
status: New → Incomplete
description: updated
Dmitry Nikishov (nikishov-da) wrote :

Nastya,

Since the snapshot is almost 1G in size (this master node has seen quite a lot of OpenStack redeployments), I've shared it with you via Google Drive.

Changed in fuel:
status: Incomplete → Invalid
Dmitry Nikishov (nikishov-da) wrote :

One of the customizations is the use of Ubuntu 14.04. It seems that it shipped an outdated rabbitmq package that couldn't handle HA failover. The bug has therefore been marked Invalid.
