Set RabbitMQ cluster_partition_handling mode to pause_minority

Bug #1450537 reported by Alexey Khivin
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Won't Fix | Wishlist | Fuel Library (Deprecated) |
6.1.x | Won't Fix | Wishlist | Fuel Library (Deprecated) |
7.0.x | Won't Fix | Wishlist | Fuel Library (Deprecated) |
8.0.x | Won't Fix | Wishlist | Fuel Library (Deprecated) |

Bug Description

Set the RabbitMQ cluster_partition_handling mode to pause_minority instead of autoheal.

This mode should protect the cluster from synchronization issues after a split brain, when two parts of the cluster could be used independently.
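
For illustration, the requested setting can be written as a classic Erlang-term rabbitmq.config entry. The Python sketch below only renders that entry as a string. `render_partition_handling` and `VALID_MODES` are hypothetical names introduced here, not part of fuel-library, and how Fuel actually templates this file is not shown in this report.

```python
# Plain-atom values accepted by cluster_partition_handling.
VALID_MODES = {"ignore", "pause_minority", "autoheal"}


def render_partition_handling(mode: str) -> str:
    """Return a classic-format rabbitmq.config body for the given mode.

    Hypothetical helper for illustration only.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported cluster_partition_handling mode: {mode!r}")
    # Doubled braces are literal braces in an f-string.
    return f"[{{rabbit, [{{cluster_partition_handling, {mode}}}]}}]."


print(render_partition_handling("pause_minority"))
```

Rejecting unknown values early guards against spellings such as auto_heal, which is not a valid mode name.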

Alexey Khivin (akhivin)
tags: added: ha low-hanging-fruit
Revision history for this message
Alexey Khivin (akhivin) wrote :

But I should note that this feature would be more useful if the HA logic did not break and rebuild the RabbitMQ cluster.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/179171

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Alex Khivin (akhivin)
status: New → In Progress
Alexey Khivin (akhivin)
Changed in fuel:
milestone: none → 6.1
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Please describe a test case for the QA team: how would this change help to prevent permanent RabbitMQ partitions compared to the existing autoheal value?

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

I'm marking this one as a duplicate, as switching to pause_minority is part of the suggested workaround for the main bug, which is https://bugs.launchpad.net/bugs/1447619

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

As testing shows, there is no reason to switch to pause_minority. Autoheal works as well as pause_minority when it comes to partition recovery.

Changed in fuel:
status: In Progress → Won't Fix
Revision history for this message
Alexey Khivin (akhivin) wrote :

"As testing shows" is only about RabbitMQ cluster recovery, not about how clients use RabbitMQ.

For example, one controller loses its connection to the other two.
Does this mean that the clients using this RabbitMQ node lose their connection too? Is it possible that in autoheal mode one of the controllers works separately and its clients know nothing about the split brain? I think yes. In autoheal mode, clients will not try to move to another node.

I think it is more desirable to be sure that the separated node does not handle incoming connections, so that clients try to move to another node.

The main reason I want to change the cluster mode to pause_minority is to guarantee that all clients are always connected to the same part of the split cluster.
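
The scenario above can be modelled with a small sketch. It assumes the behaviour described in the RabbitMQ documentation: under pause_minority a node pauses when it sees half of the cluster or fewer, while autoheal only acts once the partition ends. `is_minority` and `serving_nodes` are hypothetical names, and the model ignores everything except which nodes keep accepting clients during the partition.

```python
def is_minority(visible: int, cluster_size: int) -> bool:
    """True when a node sees half of the cluster or fewer, itself included."""
    return visible * 2 <= cluster_size


def serving_nodes(partitions, mode):
    """Return the nodes still accepting clients while the partition lasts."""
    cluster_size = sum(len(part) for part in partitions)
    serving = set()
    for part in partitions:
        if mode == "pause_minority" and is_minority(len(part), cluster_size):
            continue  # paused nodes stop serving clients
        # autoheal does nothing until the partition ends, so both sides serve
        serving |= set(part)
    return serving


# One controller cut off from the other two.
split = [{"node-1"}, {"node-2", "node-3"}]
print(sorted(serving_nodes(split, "pause_minority")))
print(sorted(serving_nodes(split, "autoheal")))
```

In this model only node-2 and node-3 serve under pause_minority, whereas all three nodes keep serving under autoheal until the partition is resolved.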

Changed in fuel:
milestone: 6.1 → 7.0
status: Won't Fix → Confirmed
no longer affects: fuel/7.0.x
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

@Alex, there is no difference at the app level, which is Oslo and its rabbit_hosts parameter, whether the underlying AMQP cluster is using pause_minority or autoheal. When failover occurs, Oslo will have to reconnect anyway.
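
As a rough sketch of what reconnecting across a host list looks like, the function below tries each broker in order and returns the first one that accepts a connection. This is not Oslo's actual implementation. `connect_first_available` and `fake_connect` are hypothetical names, and the connection step is injected so the example runs without a broker.

```python
def connect_first_available(hosts, connect):
    """Try each host in order and return (host, connection) for the first success.

    `connect` is any callable that raises OSError when a host refuses or drops.
    """
    errors = {}
    for host in hosts:
        try:
            return host, connect(host)
        except OSError as exc:
            errors[host] = exc  # remember the failure and try the next host
    raise ConnectionError(f"no broker reachable: {errors}")


def fake_connect(host):
    """Stand-in for a real AMQP connect: node-1 behaves like a paused node."""
    if host == "node-1":
        raise ConnectionRefusedError("node is paused")
    return f"connection to {host}"


print(connect_first_available(["node-1", "node-2", "node-3"], fake_connect))
```

A client written this way only moves on when a node refuses or drops the connection, which is the behaviour the pause_minority argument in this thread depends on.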

Revision history for this message
Alexey Khivin (akhivin) wrote :

Yes, Oslo should reconnect anyway,

but I want a guarantee that if a node is gone from the cluster, it does not handle incoming connections until it has rejoined the cluster.

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

I could be wrong, but split brain by definition assumes this situation *had* happened, and there were incoming connections for each partition. Pause minority and autoheal are about partition recovery, that is, how to merge changes and which decisions the cluster logic makes in order to exit the partitioned state.

Revision history for this message
Alexey Khivin (akhivin) wrote :

pause_minority is not only about the merging strategy.

A single separated node should never handle incoming connections.
It must reset all connections so that all clients move to one partition of the cluster.

Changed in fuel:
status: Confirmed → Won't Fix
Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

Alexey Khivin's patch on review says we need a new RabbitMQ, but the version # isn't known. Moving to incomplete.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Alexey Khivin (<email address hidden>) on branch: master
Review: https://review.openstack.org/179171
Reason: I think too many things changed since the patch

Alexey Khivin (akhivin)
Changed in fuel:
status: Incomplete → Invalid
Dmitry Pyzhov (dpyzhov)
tags: added: area-library
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Alexey Khivin (akhivin) → Fuel Library Team (fuel-library)
milestone: 6.1 → 8.0
status: Invalid → Won't Fix