Deployment fails because RabbitMQ is down on one of the nodes

Bug #1621931 reported by Dmitry Mescheryakov
This bug affects 2 people
Affects             Status         Importance  Assigned to          Milestone
Fuel for OpenStack  Fix Committed  Critical    Dmitry Mescheryakov
Mitaka              Invalid        Undecided   Unassigned

Bug Description

Version: 10.0

Steps to reproduce:
1. Deploy an environment consisting of 3 controllers.

Deployment fails with fairly high probability because RabbitMQ fails to start on one of the nodes. Example failure: https://ci.fuel-infra.org/job/master.puppet-openstack.fuel-library.pkgs.ubuntu.review_in_fuel_library/2522/

Unpack the snapshot and see fuel/var/log/remote/node-3.test.domain.local/lrmd.log. The problem is that the node was demoted but not stopped. The rabbit app was shut down during demote, but subsequent monitor calls did not detect this because we deleted the monitor with CHECK level 30. Overall, this is a problem of an outdated OCF script in our rabbitmq-server package for 10.0.
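
To illustrate the failure mode described above, here is a minimal toy model in Python. It is not the real OCF resource agent: the class RabbitNode, the function monitor, and the split of checks between a shallow monitor and the level 30 monitor are all hypothetical and only assume that the deeper monitor was the one able to notice a stopped rabbit app.

```python
from dataclasses import dataclass


@dataclass
class RabbitNode:
    """Toy model of one RabbitMQ node; not the real OCF resource agent."""
    beam_running: bool = True  # Erlang VM process is alive
    app_running: bool = True   # rabbit app is started inside the VM

    def demote(self) -> None:
        # Per the report: demote shuts down the rabbit app,
        # but the node itself is not stopped.
        self.app_running = False


def monitor(node: RabbitNode, check_level: int) -> str:
    """Return "OK" or "FAILED". What each level inspects is an assumption."""
    if not node.beam_running:
        return "FAILED"
    # Assumed: only the deep (level 30) monitor looks at the rabbit app.
    if check_level >= 30 and not node.app_running:
        return "FAILED"
    return "OK"


node = RabbitNode()
node.demote()

shallow = monitor(node, check_level=0)   # the only monitor left after the deletion
deep = monitor(node, check_level=30)     # the monitor that was deleted
print(shallow, deep)
```

In this sketch the shallow monitor keeps reporting the demoted node as healthy, so nothing triggers a restart, while the level 30 monitor would have reported a failure.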

Failure of RabbitMQ to start on one node leads to deployment failure: http://paste.openstack.org/show/570269/

Tags: area-library
tags: added: area-library
Changed in fuel:
importance: Undecided → High
assignee: nobody → Dmitry Mescheryakov (dmitrymex)
milestone: none → 10.0
status: New → In Progress
importance: High → Critical
Dmitry Mescheryakov (dmitrymex) wrote :

Should be fixed by https://review.fuel-infra.org/#/c/26188. Right now we need to wait for the CI team to enable the new package in CI.

Changed in fuel:
status: In Progress → Fix Committed
Andres Toomsalu (andres-active) wrote :

I'm having the same problem with 9.1 as well.
