RabbitMQ app in some cases may remain stopped without being noticed by the OCF monitor action

Bug #1458828 reported by Bogdan Dobrelya
This bug affects 3 people
Affects               Status          Importance   Assigned to         Milestone
Fuel for OpenStack    Fix Committed   High         Vladimir Kuklin
  5.1.x               Won't Fix       High         Denis Meltsaykin
  6.0.x               Won't Fix       High         Denis Meltsaykin

Bug Description

Steps to reproduce:

0) Deploy any HA cluster with 3 controllers.
Assume node-1 is the primary controller running the master of the rabbit multistate clone resource, and
node-2 and node-3 are running slaves of the multistate resource.

1) Move the rabbit master resource to node-2.
2) Wait for the OSTF HA tests to pass.
3) Kill node-2, which should now be the rabbit master.
4) Wait for the OSTF HA tests to pass.
5) Power on node-2.
6) Wait for node-2 to rejoin the rabbit cluster (wait for the OSTF HA tests to pass).
7) Repeat steps 1-6.
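The reproduction loop can be sketched as a small Python driver that repeats steps 1-6 and stops at the first iteration where node-2's rabbit app is found stopped. This is a minimal sketch, not the actual Fuel system test: the step names and the injected callables (`actions`, `node2_rabbit_running`) are hypothetical placeholders that a real harness would wire to Pacemaker, node power control and OSTF.

```python
from typing import Callable, Dict, Optional

# Hypothetical step names; in a real harness each would map to a callable
# wrapping Pacemaker commands, node power control or an OSTF HA run.
STEPS = [
    "move_master_to_node2",  # step 1
    "wait_ostf_ha",          # step 2
    "kill_node2",            # step 3
    "wait_ostf_ha",          # step 4
    "power_on_node2",        # step 5
    "wait_ostf_ha",          # step 6: node-2 should have rejoined
]


def reproduce(actions: Dict[str, Callable[[], None]],
              node2_rabbit_running: Callable[[], bool],
              max_iterations: int = 50) -> Optional[int]:
    """Repeat steps 1-6 until node-2's rabbit app is found stopped.

    Returns the 1-based iteration on which the problem appeared,
    or None if it did not appear within max_iterations.
    """
    for iteration in range(1, max_iterations + 1):
        for step in STEPS:
            actions[step]()
        # After step 6, node-2 is expected to be a running cluster member.
        if not node2_rabbit_running():
            return iteration
    return None
```

Returning the iteration number makes it easy to record how many cycles each reproduction attempt needed, since the failure does not show up on a fixed iteration.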

Expected:
A. After step 4, the RabbitMQ cluster assembles from the two remaining nodes, with 1 master and 1 slave, within 5 minutes.
B. After steps 2 and 6, the RabbitMQ cluster assembles from all three nodes, with 1 master and 2 slaves, within 5 minutes.
C. At least one node is always available for AMQP connections, with its queues and messages synced with the other nodes, if any are available.
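Criteria A and B can be expressed as a simple check. A minimal sketch, assuming the per-node roles of the rabbit multistate resource have already been collected (for example from Pacemaker status output) into a mapping that uses Pacemaker's `Master`/`Slave` labels; the function name and its inputs are hypothetical.

```python
from typing import Mapping

ASSEMBLY_TIMEOUT_S = 5 * 60  # "within 5 minutes" from criteria A and B


def cluster_assembled(roles: Mapping[str, str], expected_nodes: int,
                      elapsed_s: float,
                      timeout_s: float = ASSEMBLY_TIMEOUT_S) -> bool:
    """Return True if the multistate resource has exactly one master and
    expected_nodes - 1 slaves, reached within the timeout."""
    masters = [node for node, role in roles.items() if role == "Master"]
    slaves = [node for node, role in roles.items() if role == "Slave"]
    return (len(masters) == 1
            and len(slaves) == expected_nodes - 1
            and elapsed_s <= timeout_s)
```

For criterion A (after step 4, with node-2 down), `cluster_assembled({"node-1": "Master", "node-3": "Slave"}, 2, 120)` returns `True`; for criterion B, `expected_nodes` is 3.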

Actual: after step 6, node-2 has its rabbit app stopped and is not listed as a running DB node in the rabbitmqctl report output, either by the other nodes or by itself.
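To detect this symptom programmatically, one could check whether node-2 appears in the list of running nodes. A minimal sketch, assuming the Erlang-term output format that `rabbitmqctl cluster_status` prints in RabbitMQ 3.x; the sample output and node names below are illustrative, not taken from this bug's logs.

```python
import re
from typing import List

# Matches the running_nodes tuple in the Erlang-term output of
# `rabbitmqctl cluster_status` (RabbitMQ 3.x format, assumed).
_RUNNING_NODES_RE = re.compile(r"\{running_nodes,\[(.*?)\]\}", re.DOTALL)


def running_nodes(cluster_status_output: str) -> List[str]:
    """Extract node names from the running_nodes entry, or [] if absent."""
    match = _RUNNING_NODES_RE.search(cluster_status_output)
    if match is None:
        return []
    # Node names are single-quoted Erlang atoms, e.g. 'rabbit@node-1'.
    return re.findall(r"'([^']+)'", match.group(1))


def node_is_running(cluster_status_output: str, node: str) -> bool:
    return node in running_nodes(cluster_status_output)


# Illustrative output for a cluster where node-2 is a member but not running.
SAMPLE = """Cluster status of node 'rabbit@node-1' ...
[{nodes,[{disc,['rabbit@node-1','rabbit@node-2','rabbit@node-3']}]},
 {running_nodes,['rabbit@node-3','rabbit@node-1']},
 {cluster_name,<<"rabbit@node-1">>},
 {partitions,[]}]
"""
```

With this sample, `node_is_running(SAMPLE, "rabbit@node-2")` is `False` even though node-2 is still listed under `nodes`, which matches the reported state.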

Note: sometimes this can be reproduced after a few iterations; other times it may take up to 20 or 40 iterations. The occurrence looks random.

ISO info:
      build_id: 2015-05-25_20-55-26
      build_number: '466'

Bogdan Dobrelya (bogdando) wrote :

Related system tests update request: https://bugs.launchpad.net/fuel/+bug/1458830

tags: added: to-be-covered-by-tests
tags: added: ha rabbitmq
Changed in fuel:
assignee: nobody → Bogdan Dobrelya (bogdando)
status: New → In Progress
Bogdan Dobrelya (bogdando) wrote :
Changed in fuel:
milestone: none → 6.1
importance: Undecided → High
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/185692

Changed in fuel:
assignee: Bogdan Dobrelya (bogdando) → Vladimir Kuklin (vkuklin)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: master
Review: https://review.openstack.org/185530

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/185692
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=d71c3722f736f87996bf2dbda8e133b87946eca5
Submitter: Jenkins
Branch: master

commit d71c3722f736f87996bf2dbda8e133b87946eca5
Author: Vladimir Kuklin <email address hidden>
Date: Tue May 26 21:33:34 2015 +0300

    Add second monitor operation to check RabbitMQ

    This commit checks whether there is a running
    cluster of rabbitmq and if rabbitmq app is running
    on the node and exits with non-zero code if
    current node is not running rabbitmq, but should
    do so

    Change-Id: I2098405b39ade7325b94781aeb997de0937bdf4c
    Closes-bug: #1458828
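
The commit message describes the extra monitor check only in prose. The decision it describes can be modelled as below. This is a minimal sketch, not the actual fuel-library OCF script: it assumes the check returns OCF_ERR_GENERIC (the commit only says "non-zero"), and the function name and its inputs are hypothetical, with the gathering of those inputs left out.

```python
from typing import Sequence

# Standard OCF resource agent exit codes (subset used here).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def rabbit_app_check(running_cluster_nodes: Sequence[str],
                     local_app_running: bool,
                     should_run_here: bool) -> int:
    """Hypothetical model of the extra monitor check.

    running_cluster_nodes: RabbitMQ nodes that other cluster members
        report as running (empty if no cluster is up).
    local_app_running: whether the rabbit app is running on this node.
    should_run_here: whether Pacemaker expects the resource to be
        active on this node.
    """
    cluster_is_up = len(running_cluster_nodes) > 0
    if cluster_is_up and should_run_here and not local_app_running:
        # A cluster exists and this node should be part of it, but its
        # rabbit app is stopped: fail the monitor so Pacemaker recovers it.
        return OCF_ERR_GENERIC
    # All other cases are left to the regular monitor logic.
    return OCF_SUCCESS
```

Keeping this as a separate check means the regular monitor logic does not need to change; the extra check only catches the case where the node silently drops out of a cluster that is otherwise up.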

Changed in fuel:
status: In Progress → Fix Committed
Denis Meltsaykin (dmeltsaykin) wrote :

Setting this to Won't Fix for 5.1.1-updates and 6.0-updates, as such a complex change cannot be delivered within the scope of a maintenance update. Also, the possible solution of backporting the RabbitMQ OCF script is covered in detail in the Operations Guide of the official product documentation.
