rabbitmq's min-masters queue locator setting is suboptimal

Bug #1789373 reported by Michele Baldessari
Affects | Status | Importance | Assigned to | Milestone
tripleo | Incomplete | High | Unassigned |

Bug Description

First reported via https://bugzilla.redhat.com/show_bug.cgi?id=1585032
...
=INFO REPORT==== 31-May-2018::17:32:39 ===
closing AMQP connection <0.6475.0> (172.17.0.19:46030 -> 172.17.0.19:5672 - cinder-volume:118015:a9c3e789-694a-42b8-9104-6f11d93e2d0e)

This is an attempt to declare the queue master for the cinder-volume.hostgroup@tripleo_iscsi.hostgroup queue on overcloud-controller-1. The reason is that we have set the queue_master_locator option in rabbitmq.config:

    {queue_master_locator, <<"min-masters">>},

So controller-0 decides that controller-1 has the fewest master queues and tries to declare the queue there.

However, at the time this is happening, controller-1 is restarting. Note the error is at 17:32:39, and then compare to the rabbit log on controller-1:

=INFO REPORT==== 31-May-2018::17:32:36 ===
Stopped RabbitMQ application

=INFO REPORT==== 31-May-2018::17:32:38 ===
Clustering with ['rabbit@overcloud-controller-0'] as disc node

=INFO REPORT==== 31-May-2018::17:32:41 ===
Starting RabbitMQ 3.6.5 on Erlang 18.3.4.7

So at 17:32:39, controller-1 has rejoined the cluster (17:32:38) but has not yet started the rabbit app (17:32:41).

This is probably a bug somewhere in the master locator code. I would expect it to verify the target node is actually up, not just clustered.
...

More often than not, rabbitmq seems to have issues with min-masters: it picks a node that is not fully up, and the operation then fails.
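
To make the failure mode concrete, here is a minimal Python sketch of the selection logic as described in the report. It is not RabbitMQ's actual locator code: the `Node` class, the `pick_min_masters` function, the `require_running` flag and the master counts are all hypothetical and chosen only for illustration. It contrasts a locator that considers every clustered node with one that also checks that the rabbit app is running.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    master_count: int   # queue masters currently on this node (made-up values below)
    clustered: bool     # node is a member of the cluster
    app_running: bool   # the rabbit application has finished starting


def pick_min_masters(nodes, require_running):
    """Return the name of the eligible node hosting the fewest queue masters.

    Hypothetical model: with require_running=False any clustered node is
    eligible; with require_running=True the rabbit app must also be up.
    """
    candidates = [
        n for n in nodes
        if n.clustered and (n.app_running or not require_running)
    ]
    if not candidates:
        raise RuntimeError("no eligible node for the queue master")
    return min(candidates, key=lambda n: n.master_count).name


# State around 17:32:39 in the logs: controller-1 has rejoined the cluster
# but has not started the rabbit app yet. The counts are assumptions.
nodes = [
    Node("overcloud-controller-0", master_count=12, clustered=True, app_running=True),
    Node("overcloud-controller-1", master_count=0, clustered=True, app_running=False),
]

naive_choice = pick_min_masters(nodes, require_running=False)
checked_choice = pick_min_masters(nodes, require_running=True)
```

Under these assumptions, the unchecked variant picks overcloud-controller-1, the node that is still restarting, while the checked variant falls back to overcloud-controller-0.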

Changed in tripleo:
status: Triaged → In Progress
Michele Baldessari (michele) wrote :
Changed in tripleo:
milestone: rocky-rc2 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
milestone: stein-3 → stein-rc1
Changed in tripleo:
milestone: stein-rc1 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
Alex Schultz (alex-schultz) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in tripleo:
assignee: Michele Baldessari (michele) → nobody
status: In Progress → New
tags: added: timeout-abandon
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.opendev.org/587064
Reason: This review is > 180 days without comment and WIP -1. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewers. For more details check policy https://specs.openstack.org/openstack/tripleo-specs/specs/policy/patch-abandonment.html

Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
wes hayutin (weshayutin)
Changed in tripleo:
status: New → Incomplete
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3