RabbitMQ must forget node during infra node recovery

Bug #1494295 reported by Mark DeVerter
This bug affects 1 person
Affects: OpenStack-Ansible
Status: Opinion
Importance: Wishlist
Assigned to: Unassigned

Bug Description

When an infra node is being recovered, manual intervention is required to get that node back into the RabbitMQ cluster. This was found when recovering infra3 after data loss on that node: after running the infrastructure playbooks, infra3 does not appear in the cluster. You must run forget_cluster_node from infra1 and then join infra3 back into the cluster from infra3. It was also found that, to clear some Neutron problems afterwards, the entire RabbitMQ cluster needs to be restarted. The playbooks should be adjusted to automate the steps below.

infra3# rabbitmqctl stop_app
infra1# rabbitmqctl forget_cluster_node rabbit@infra3
infra3# rabbitmqctl join_cluster rabbit@infra1
infra3# rabbitmqctl start_app
infra3# rabbitmqctl cluster_status

infra3# service rabbitmq-server stop
infra2# service rabbitmq-server stop
infra1# service rabbitmq-server stop
infra1# service rabbitmq-server start
infra2# service rabbitmq-server start
infra3# service rabbitmq-server start
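The manual procedure for this report could be scripted roughly as follows. This is a minimal sketch, not OpenStack-Ansible code: it assumes root SSH access to every infra host and RabbitMQ node names of the form rabbit@<host>. The helper names (run_on, rejoin_node, restart_cluster) and the DRY_RUN switch are hypothetical; with DRY_RUN=1 the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: script the manual RabbitMQ rejoin and full restart from this report.
# Assumptions (hypothetical, not OpenStack-Ansible code): root SSH access to
# every infra host, node names of the form rabbit@<host>, and a DRY_RUN=1
# switch that prints each command instead of executing it.

# Run a command on a host over SSH, or print "<host>: <command>" in dry-run mode.
run_on() {
  local host="$1"; shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${host}: $*"
  else
    ssh "root@${host}" "$@"
  fi
}

# Remove the stale membership of a recovered node via a healthy seed node,
# then join the recovered node back into the seed's cluster.
rejoin_node() {
  local recovered="$1" seed="$2"
  run_on "$recovered" rabbitmqctl stop_app
  run_on "$seed" rabbitmqctl forget_cluster_node "rabbit@${recovered}"
  run_on "$recovered" rabbitmqctl join_cluster "rabbit@${seed}"
  run_on "$recovered" rabbitmqctl start_app
  run_on "$recovered" rabbitmqctl cluster_status
}

# Stop every node in reverse order, then start them in the given order, so the
# last node stopped is the first one started.
restart_cluster() {
  local -a hosts=("$@")
  local i host
  for ((i = ${#hosts[@]} - 1; i >= 0; i--)); do
    run_on "${hosts[i]}" service rabbitmq-server stop
  done
  for host in "${hosts[@]}"; do
    run_on "$host" service rabbitmq-server start
  done
}
```

For the infra3 case, `rejoin_node infra3 infra1` followed by `restart_cluster infra1 infra2 infra3` mirrors the sequence in the report: infra1, the last node stopped, is the first node started. Error handling is deliberately left out; a real play would stop on the first failed step.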

Bjoern (bjoern-t) wrote:

AFAIK it's enough to forget the node and rerun the playbooks; during my maintenance, that alone rebuilt the missing node.
Before considering this bug, I'd like to add some requirements, such as not adding nodes while a network partition is present.
Also, this feature should be enabled by default, but operators should be able to disable it if desired (policies, etc.).
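Bjoern's two requirements (refuse to add nodes while a partition exists, and let operators switch the behaviour off) could be expressed as a pre-flight guard like the one below. This is a sketch under stated assumptions: RABBITMQ_AUTO_REJOIN is a hypothetical operator toggle that defaults to on, and the parser assumes the Erlang-term `rabbitmqctl cluster_status` output printed by RabbitMQ 3.x releases of that period (for example `{partitions,[]}`). Newer releases print a different format; in that case the guard fails closed and refuses to proceed.

```shell
#!/usr/bin/env bash
# Pre-flight guard sketch for automatic forget/rejoin.
# Assumptions: RABBITMQ_AUTO_REJOIN is a hypothetical toggle (default "true"),
# and cluster_status uses the classic Erlang-term format, e.g. {partitions,[]}.

# Succeed only when the given cluster_status text reports an empty partition list.
no_partitions() {
  local status
  # Remove all whitespace so "{partitions, []}" and "{partitions,[]}" both match.
  status="$(printf '%s' "$1" | tr -d '[:space:]')"
  [[ "$status" == *"{partitions,[]}"* ]]
}

# Decide whether an automatic forget/rejoin may run; prints the reason on refusal.
may_auto_rejoin() {
  if [ "${RABBITMQ_AUTO_REJOIN:-true}" != "true" ]; then
    echo "auto-rejoin disabled by operator" >&2
    return 1
  fi
  if ! no_partitions "$1"; then
    # Also reached when the output format is not recognised (fail closed).
    echo "network partition detected (or unknown status format); refusing to add nodes" >&2
    return 1
  fi
  return 0
}
```

Failing closed on unrecognised output is a deliberate choice here: adding a node to a cluster whose state cannot be read is the situation the requirement is meant to prevent.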

Kevin Carter (kevin-carter) wrote:

As discussed in the triage meeting, we've decided that the best approach is to create a hard stop in the role that bails out if it detects that the cluster is not in a functional state. The methods discussed today are similar to what is presently done within the galera_server role.
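A hard stop in the spirit of this decision might check that every expected node is listed in running_nodes before the role continues. This is only a sketch and does not reproduce what the galera_server role actually does: all_nodes_running is a hypothetical helper, the expected node list would presumably come from the inventory, and the parser assumes the Erlang-term `rabbitmqctl cluster_status` output of RabbitMQ 3.x (for example `{running_nodes,[rabbit@infra2,rabbit@infra1]}`).

```shell
#!/usr/bin/env bash
# Hard-stop sketch: succeed only if every expected node is in running_nodes.
# Assumes the classic Erlang-term output of `rabbitmqctl cluster_status`
# (RabbitMQ 3.x), e.g. {running_nodes,[rabbit@infra2,rabbit@infra1]}.
# all_nodes_running is a hypothetical helper, not part of any existing role.

# Usage: all_nodes_running "<cluster_status text>" <node> [<node> ...]
all_nodes_running() {
  local status="$1"; shift
  local running node
  # Drop whitespace and atom quotes so multi-line or quoted output still parses.
  status="$(printf '%s' "$status" | tr -d "[:space:]'")"
  # Pull out the comma-separated list inside {running_nodes,[...]}.
  running="$(printf '%s' "$status" | sed -n 's/.*{running_nodes,\[\([^]]*\)].*/\1/p')"
  for node in "$@"; do
    case ",${running}," in
      *",${node},"*) ;;  # node is running, keep checking
      *)
        echo "RabbitMQ node ${node} is not running; aborting" >&2
        return 1
        ;;
    esac
  done
  return 0
}
```

A role could run `all_nodes_running "$(rabbitmqctl cluster_status)" rabbit@infra1 rabbit@infra2 rabbit@infra3` and treat a non-zero exit as the signal to stop. If the output format is not recognised, the running list comes back empty and the check fails, which keeps the stop on the safe side.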

Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Wishlist
Changed in openstack-ansible:
status: Confirmed → Opinion