RabbitMQ OCF applications check should rely on the 'kernel' module running instead of exit code

Bug #1446251 reported by Bogdan Dobrelya
Affects             Status         Importance  Assigned to             Milestone
Fuel for OpenStack  Fix Committed  Critical    Alexander Nevenchannyy
5.1.x               Fix Committed  Critical    Bogdan Dobrelya
6.0.x               Fix Committed  Critical    Bogdan Dobrelya

Bug Description

This issue was discovered at the scale lab, when rabbit nodes were running under load.

In get_status() (https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/cluster/files/ocf/rabbitmq#L852-L853), the command rabbitmqctl eval 'rabbit_misc:which_applications().' may return an empty list [] with exit code 0.
The current logic relies on the exit code only. It should at least check that the 'kernel' application is running, and otherwise report "not running".

This can happen only when the timeout specified for the stop or wait commands has been exceeded. That is common under load, which is why the impact is critical.
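
The check described above could look roughly like the following shell sketch. It is not the code from the merged fix: apps_status_ok is a hypothetical helper name, and the sketch assumes the eval output is a printed Erlang list of {App, Description, Version} tuples. The return values 0 and 7 are the standard OCF_SUCCESS and OCF_NOT_RUNNING codes.

```shell
#!/bin/sh
# Sketch only: apps_status_ok is a hypothetical helper, not the merged fix.
# $1 = exit code of the rabbitmqctl eval call, $2 = its captured output.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

apps_status_ok() {
    rc="$1"
    apps="$2"

    # A non-zero exit code already means the node could not be queried.
    [ "$rc" -eq 0 ] || return "$OCF_NOT_RUNNING"

    # Exit code 0 with an empty list ([]) must not count as running:
    # require an application tuple that starts with "{kernel,".
    if printf '%s\n' "$apps" | grep -qF '{kernel,'; then
        return "$OCF_SUCCESS"
    fi
    return "$OCF_NOT_RUNNING"
}

# Possible use inside a status function (assumes rabbitmqctl is on PATH):
#   apps=$(rabbitmqctl eval 'rabbit_misc:which_applications().' 2>&1)
#   apps_status_ok $? "$apps" || return $OCF_NOT_RUNNING
```

Keeping the decision in a function that takes the exit code and the output as arguments makes the empty-list case easy to exercise without a live RabbitMQ node.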

Tags: scale
Changed in fuel:
milestone: none → 6.1
status: New → In Progress
assignee: nobody → Bogdan Dobrelya (bogdando)
importance: Undecided → Critical
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/175457

Dina Belova (dbelova)
tags: added: scale
Changed in fuel:
assignee: Bogdan Dobrelya (bogdando) → Alexander Nevenchannyy (anevenchannyy)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/175457
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=bc4a7ec8093db81d8d1de478788825e228890b05
Submitter: Jenkins
Branch: master

commit bc4a7ec8093db81d8d1de478788825e228890b05
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Apr 20 17:31:33 2015 +0200

    Fix RabbitMQ apps eval in OCF

    W/o this fix, if there are no apps running and rabbit node is actually
    not functioning, get_status() would still report 0 considering the rabbit
    resource is running. This is an issue as it may lead to the situations
    when the resource reported OK, but in fact, the rabbit node is not a cluster
    member.

    The solution is to not rely only on which_applications() eval exit code
    and test if the kernel app is running. Otherwise consider the pacemaker
    resource is "not running" as well.

    Closes-bug: #1446251

    Change-Id: Ia2fcb18abb3d977c5fcb26bfdeac864b6834f478
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/6.0)

Fix proposed to branch: stable/6.0
Review: https://review.openstack.org/179750

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/5.1)

Fix proposed to branch: stable/5.1
Review: https://review.openstack.org/179751

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/6.0)

Reviewed: https://review.openstack.org/179750
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=4369fc9c20db94dc641d6ce8e1179a6a57306546
Submitter: Jenkins
Branch: stable/6.0

commit 4369fc9c20db94dc641d6ce8e1179a6a57306546
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Apr 20 17:31:33 2015 +0200

    Fix RabbitMQ apps eval in OCF

    (cherry picked from commit bc4a7ec8093db81d8d1de478788825e228890b05)

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/5.1)

Reviewed: https://review.openstack.org/179751
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=0237cfb11a8cd68fdf511eef9d0999cde1324584
Submitter: Jenkins
Branch: stable/5.1

commit 0237cfb11a8cd68fdf511eef9d0999cde1324584
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Apr 20 17:31:33 2015 +0200

    Fix RabbitMQ apps eval in OCF

    (cherry picked from commit bc4a7ec8093db81d8d1de478788825e228890b05)
