OCF Pacemaker resources for DB/MQ clusters should run with requires=nothing

Bug #1577689 reported by Bogdan Dobrelya
This bug affects 1 person
Affects              Status         Importance  Assigned to      Milestone
Fuel for OpenStack   Fix Committed  Medium      Bogdan Dobrelya
  Mitaka             Fix Released   Medium      Bogdan Dobrelya
  Newton             Fix Committed  Medium      Bogdan Dobrelya

Bug Description

Jepsen tests [0] showed that a Galera cluster can handle network partitions on its own, without a quorum policy enforced by Pacemaker, and maintain serializable transactions in multi-master mode without consistency issues. This makes the Pacemaker quorum requirement overkill for it. The same applies to the MQ cluster, although that change should be postponed until the corresponding Jepsen tests are done.
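As a hedged illustration of why Galera does not need Pacemaker's quorum: Galera decides by itself whether a partition stays a Primary Component, using the weighted-quorum rule from the Galera Cluster documentation (each node's weight comes from the `pc.weight` provider option, default 1). The sketch below models only that rule; the function and node names are hypothetical, and details such as arbitrators and state transfers are ignored.

```python
from typing import Dict, Set


def stays_primary(weights: Dict[str, int],
                  last_primary: Set[str],
                  left_gracefully: Set[str],
                  component: Set[str]) -> bool:
    """Simplified Galera weighted-quorum check (illustration only).

    A component stays Primary when the total weight of its members is
    strictly greater than half of the weight of the last Primary Component,
    minus the weight of the members that left it gracefully.
    """
    previous = sum(weights[n] for n in last_primary)
    departed = sum(weights[n] for n in left_gracefully)
    current = sum(weights[n] for n in component)
    return (previous - departed) / 2 < current


# Hypothetical 3-node cluster, every node with the default weight of 1.
WEIGHTS = {"node-1": 1, "node-2": 1, "node-3": 1}
ALL_NODES = set(WEIGHTS)

# A 2/1 network partition: the majority side stays Primary, while the
# isolated node drops out of the Primary Component with no help from Pacemaker.
majority_ok = stays_primary(WEIGHTS, ALL_NODES, set(), {"node-1", "node-2"})
minority_ok = stays_primary(WEIGHTS, ALL_NODES, set(), {"node-3"})
```

Because the strict "more than half" comparison is evaluated by each side independently, at most one side of a split can stay Primary, which is the property an external quorum policy would otherwise have to provide.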

Note that relaxing the quorum requirement does not apply to fencing. However, Fuel configures only a quorum policy, with no fencing, so this cannot cover cases where a node becomes unresponsive: such a node would behave the same way and fail to stop its resources whether they are configured with requires=quorum or requires=nothing.

UX: this issue impacts only the availability (AV) and self-healing (recovery) of the clusters. With requires=nothing, cluster members will not be stopped by Pacemaker on quorum loss, which is expected to cause fewer self-healing issues, such as https://bugs.launchpad.net/fuel/+bug/1388779.
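A simplified model of how the `requires` resource meta attribute interacts with quorum, as described in the Pacemaker documentation: `nothing` means the resource can always run, `quorum` needs a majority of the configured nodes, and `fencing` additionally needs any failed or unknown nodes to be fenced. This is a sketch under stated assumptions, not Pacemaker's scheduler logic: it assumes `no-quorum-policy=stop` (Pacemaker's default), does not model `requires=unfencing`, and the function name is hypothetical.

```python
def may_keep_running(requires: str,
                     partition_has_quorum: bool,
                     unclean_nodes_fenced: bool) -> bool:
    """Hypothetical, simplified model of Pacemaker's `requires` meta attribute.

    Assumes no-quorum-policy=stop; ignores requires=unfencing and the many
    other inputs the real scheduler takes into account.
    """
    if requires == "nothing":
        # Never depends on quorum, so losing quorum does not stop it.
        return True
    if requires == "quorum":
        # Stopped in any partition that has lost quorum.
        return partition_has_quorum
    if requires == "fencing":
        # Quorum plus confirmation that failed/unknown nodes were fenced.
        return partition_has_quorum and unclean_nodes_fenced
    raise ValueError(f"unmodelled requires value: {requires!r}")


# A DB/MQ node isolated in a minority partition, in a quorum-only
# (no fencing) setup like Fuel's: only requires=nothing leaves it running.
ISOLATED = {value: may_keep_running(value, partition_has_quorum=False,
                                    unclean_nodes_fenced=False)
            for value in ("quorum", "nothing")}
```

In this model the only difference between the two settings appears when a partition loses quorum, which matches the scope of the change: Galera and RabbitMQ are left to handle that situation themselves instead of being stopped.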

[0] https://goo.gl/VHyIIE

Changed in fuel:
milestone: none → 10.0
importance: Undecided → High
tags: added: galera pacemaker rabbitmq tech-debt
description: updated
description: updated
description: updated
Maciej Relewicz (rlu)
Changed in fuel:
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
status: New → Confirmed
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Bogdan Dobrelya (bogdando)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/314031

Changed in fuel:
status: Confirmed → In Progress
Bogdan Dobrelya (bogdando) wrote :

How to verify:
The MySQL resource should look like this:

clone clone_p_mysqld p_mysqld \
        meta requires=nothing target-role=Started
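The check above could be automated with a small parser over the text printed by crmsh's `crm configure show` (on a controller node it could be captured with `subprocess.run(["crm", "configure", "show"], capture_output=True, text=True)`). This is a sketch that assumes the layout shown above (backslash line continuations, `key=value` tokens after the `meta` keyword); the helper name `clone_meta` is hypothetical, and quoted values containing spaces are not handled.

```python
import re
from typing import Dict


def clone_meta(config_text: str, clone_id: str) -> Dict[str, str]:
    """Return the meta attributes of `clone_id` from `crm configure show` text.

    Assumes one object per logical line, long lines continued with a trailing
    backslash, and attributes given as key=value tokens after `meta`.
    """
    # Join backslash-continued physical lines into single logical lines.
    logical = re.sub(r"\\[ \t]*\n[ \t]*", " ", config_text)
    for line in logical.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "clone" and tokens[1] == clone_id:
            meta: Dict[str, str] = {}
            in_meta = False
            for token in tokens[2:]:
                if token == "meta":
                    in_meta = True
                elif token in ("params", "op", "utilization"):
                    in_meta = False
                elif in_meta and "=" in token:
                    key, value = token.split("=", 1)
                    meta[key] = value
            return meta
    raise KeyError(f"clone {clone_id!r} not found")


# The expected MySQL clone definition from the verification step.
SAMPLE = """\
clone clone_p_mysqld p_mysqld \\
        meta requires=nothing target-role=Started
"""

meta = clone_meta(SAMPLE, "clone_p_mysqld")
if meta.get("requires") != "nothing":
    raise SystemExit(f"unexpected requires value: {meta.get('requires')!r}")
```

The same call with the RabbitMQ clone's ID would cover the MQ side; that ID is not given in this report and would have to be taken from the actual `crm configure show` output.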

Matthew Mosesohn (raytrac3r) wrote :

Why is this bug high priority?

Bogdan Dobrelya (bogdando) wrote :

The UX impact is expected to be high, although my estimate could be wrong.

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/314031
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=f55a2b98e0db2d5a54c09bbb806d53ef4b3cb794
Submitter: Jenkins
Branch: master

commit f55a2b98e0db2d5a54c09bbb806d53ef4b3cb794
Author: Bogdan Dobrelya <email address hidden>
Date: Mon May 9 11:53:35 2016 +0200

    Do not stop DB/MQ by a Pacemaker quorum

    Also configure fail modes for the DB resource
    the same as for the MQ one.

    Closes-bug: #1577689
    Closes-bug: #1572440

    Change-Id: I3e1ecf67b8bc205920ecd3157b3a422ecd9c6564
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/317934

Bogdan Dobrelya (bogdando) wrote :

Due to the high UX impact (it improves availability and recovery time for RabbitMQ/Galera nodes), I strongly advise backporting this to stable/mitaka as well.

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/317934
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=8c595c20fa155fab95d390939b50495503a0b028
Submitter: Jenkins
Branch: stable/mitaka

commit 8c595c20fa155fab95d390939b50495503a0b028
Author: Bogdan Dobrelya <email address hidden>
Date: Mon May 9 11:53:35 2016 +0200

    Do not stop DB/MQ by a Pacemaker quorum

    Also configure fail modes for the DB resource
    the same as for the MQ one.

    Closes-bug: #1577689
    Closes-bug: #1572440

    Change-Id: I3e1ecf67b8bc205920ecd3157b3a422ecd9c6564
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit f55a2b98e0db2d5a54c09bbb806d53ef4b3cb794)

Alexey Galkin (agalkin) wrote :

Verified on MOS 9.1.
