User should be able to choose RabbitMQ network partition policy

Bug #1837761 reported by Gaëtan Trellu
This bug affects 1 person
Affects        Status        Importance  Assigned to        Milestone
kolla-ansible  Fix Released  Medium      Radosław Piliszek
Train          Fix Released  Medium      Radosław Piliszek

Bug Description

By default [1], the RabbitMQ network partition policy should not be set
to autoheal, which can result in a split-brain, but to pause_minority or
even ignore.

With pause_minority, you get consistency while sacrificing availability.
The minority node(s) will pause and disconnect all clients. The clients
will reconnect to other nodes in the majority-half of the cluster and
resume normal operation.

With autoheal, you get availability while sacrificing consistency. The
cluster becomes "split-brained". The success of each RPC request is
contingent upon all participating connections involved in the request
being on the same partition as one another, which is not very likely.
So until the partition ends, the system will be in a degraded state and
most things are going to fail.

Source: [2]
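
To make the pause_minority behaviour quoted above concrete, here is a minimal Python sketch of the rule as I understand it: a node keeps running only if its side of the partition holds a strict majority of the cluster. The function name and the node names (rabbit1, rabbit2, rabbit3) are made up for illustration; this is a model of the decision, not RabbitMQ code.

```python
from typing import List, Set


def paused_nodes(partitions: List[Set[str]]) -> Set[str]:
    """Return the nodes that would pause under pause_minority.

    A partition keeps running only if it holds a strict majority
    (more than half) of all cluster nodes; every other node pauses.
    """
    total = sum(len(part) for part in partitions)
    paused: Set[str] = set()
    for part in partitions:
        # "2 * size <= total" means this side is not a strict majority.
        if 2 * len(part) <= total:
            paused |= part
    return paused


# Hypothetical three-node cluster split into {rabbit1, rabbit2} and {rabbit3}.
split = [{"rabbit1", "rabbit2"}, {"rabbit3"}]
print(sorted(paused_nodes(split)))  # prints ['rabbit3']
```

Note that under this rule an even split (for example a two-node cluster) has no majority side, so every node pauses. That is one reason an odd number of RabbitMQ nodes is usually recommended with pause_minority.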

Users should be able to choose the policy through an option in globals.yml.
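
As a rough sketch of what such an option could feed into, the Python below validates a user-supplied policy and renders the matching RabbitMQ setting. Everything here is an assumption for illustration: the function name and the `policy` parameter are hypothetical, the accepted values are limited to the three modes named in this report, and the output uses RabbitMQ's `key = value` configuration style, which may not match the format of the kolla-ansible template in [1].

```python
# The three partition handling modes discussed in this report.
VALID_POLICIES = ("ignore", "pause_minority", "autoheal")


def render_partition_handling(policy: str = "pause_minority") -> str:
    """Validate the chosen policy and render it as a config line.

    `policy` stands in for a hypothetical globals.yml option; the
    default mirrors the value proposed in this bug report.
    """
    if policy not in VALID_POLICIES:
        raise ValueError(
            f"unsupported policy {policy!r}; expected one of {VALID_POLICIES}"
        )
    return f"cluster_partition_handling = {policy}"


print(render_partition_handling())
print(render_partition_handling("autoheal"))
```

Rejecting unknown values early would keep a typo in globals.yml from producing a RabbitMQ configuration that fails only at container start.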

- [1] https://github.com/openstack/kolla-ansible/blob/stable/stein/ansible/roles/rabbitmq/templates/rabbitmq.conf.j2#L5
- [2] https://bugzilla.redhat.com/show_bug.cgi?id=1189480#c34

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/672562

Changed in kolla-ansible:
assignee: nobody → Gaëtan Trellu (goldyfruit)
status: New → In Progress
Changed in kolla-ansible:
assignee: Gaëtan Trellu (goldyfruit) → Radosław Piliszek (yoctozepto)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/686032

OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (stable/stein)

Change abandoned by Radosław Piliszek (<email address hidden>) on branch: stable/stein
Review: https://review.opendev.org/686032

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/672562
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=5b0a281d5122d1806f2e689f5b5c7f48658b41d7
Submitter: Zuul
Branch: master

commit 5b0a281d5122d1806f2e689f5b5c7f48658b41d7
Author: Gaëtan Trellu <email address hidden>
Date: Wed Jul 24 12:38:40 2019 -0400

    Set RabbitMQ cluster_partition_handling to pause_minority

    This is to avoid split-brain.

    This change also adds relevant docs that sort out the
    HA/quorum questions.

    Change-Id: I9a8c2ec4dbbd0318beb488548b2cde8f4e487dc1
    Closes-Bug: #1837761
    Co-authored-by: Radosław Piliszek <email address hidden>

Changed in kolla-ansible:
status: In Progress → Fix Released
Mark Goddard (mgoddard)
Changed in kolla-ansible:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 9.0.0.0rc1

This issue was fixed in the openstack/kolla-ansible 9.0.0.0rc1 release candidate.
