[library] Rabbitmq OCF script fails to handle 1 controller failure

Bug #1336777 reported by Vladimir Kuklin
This bug affects 1 person
Affects             Status         Importance  Assigned to      Milestone
Fuel for OpenStack  Fix Committed  High        Vladimir Kuklin

Bug Description

From https://review.openstack.org/#/c/93956/:

The RabbitMQ HA cluster is broken with this patch set. To reproduce:
1. Upload the patch to the master node.
2. Deploy HA | CentOS | Neutron GRE.
3. Power off the second controller.
4. The RabbitMQ master fails to start.

Tags: ha
tags: added: ha
Bogdan Dobrelya (bogdando) wrote :
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Fuel QA Team (fuel-qa)
Bogdan Dobrelya (bogdando) wrote :
Dmitry Ilyin (idv1985)
summary: - Rabbitmq OCF script fails to handle 1 controller failure
+ [library] Rabbitmq OCF script fails to handle 1 controller failure
Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Vladimir Kuklin (vkuklin)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/106865
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=33a9794bdf59aefb815137632b039c011095cfa3
Submitter: Jenkins
Branch: master

commit 33a9794bdf59aefb815137632b039c011095cfa3
Author: Vladimir Kuklin <email address hidden>
Date: Tue Jul 15 00:50:20 2014 +0400

    Refactoring of rabbitmq OCF script

    1) Store attributes in CIB instead of files
    2) Do not use ocf_run if command may fail
    3) Eliminate master_score race condition:
    set master_score to 1000 for the older nodes
    and do not forget to update their uptime value
    4) fix messed interleave/ordered settings
    5) set failure-timeout to 60 seconds to recover
    from RabbitMQ master node failure
    6) for slave nodes only run beam and
    start rabbitmq only if there is master promoted
    7) stop RMQ app on slaves in case of master demotion
    8) clean up other nodes master attribute in case
    of promotion
    9) fix exit codes for failed services start and cluster
    joining
    10) get running nodes into running_nodes variable
    11) apply timeout command to cluster_status function

    Closes-bug: #1339080
    Closes-bug: #1336777

    Change-Id: I271c6d7db4cf8fe4c9dfc7599954cb0ec8813293
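
Item 11 of the commit message applies a timeout to the cluster_status function, so that a hung status query cannot block the resource agent indefinitely. The following is a minimal sketch of that idea and not the code from the commit: `run_with_timeout` is a hypothetical helper name, and the sketch assumes the GNU coreutils `timeout` command, which exits with status 124 when the time limit is reached.

```shell
#!/bin/sh
# Hypothetical helper (not from the fuel-library commit): run a command
# under a time limit and report a timeout separately from other failures.
# Assumes GNU coreutils `timeout`, which exits with 124 on timeout.
run_with_timeout() {
    rwt_limit="$1"
    shift
    # Run the command directly so its exit code can be inspected by the caller.
    timeout "$rwt_limit" "$@"
    rwt_rc=$?
    if [ "$rwt_rc" -eq 124 ]; then
        echo "command timed out after ${rwt_limit}s: $*" >&2
    fi
    return "$rwt_rc"
}
```

Under these assumptions, a status check could be wrapped as `run_with_timeout 30 rabbitmqctl cluster_status`, with the caller treating exit code 124 as "node unresponsive" rather than as an ordinary command failure. The 30-second limit is an illustrative value only.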

Changed in fuel:
status: In Progress → Fix Committed
OSCI Robot (oscirobot) wrote :

Package corosync has been built from changeset: http://gerrit.mirantis.com/19795
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable-19795/centos
You can build an ISO with this package:
make iso EXTRA_RPM_REPOS="osci-testing,http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable-19795/centos"

Changed in fuel:
status: Fix Committed → In Progress
Changed in fuel:
status: In Progress → Fix Committed