RabbitMQ must forget node during infra node recovery
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack-Ansible | Opinion | Wishlist | Unassigned | |
Bug Description
When an infra node is recovered, manual intervention is required to get it back into the RabbitMQ cluster. This was found when recovering infra3 after data loss on that node. After running the infrastructure playbooks, infra3 does not appear in the cluster. From infra1, you must run forget_cluster_node, then join infra3 back to the cluster from infra3 itself. We also found that the entire RabbitMQ cluster needs to be restarted afterwards to clear some neutron problems. The playbooks should be adjusted to perform the steps in the example below.
$ infra3:# rabbitmqctl stop_app
$ infra1:# rabbitmqctl forget_cluster_node rabbit@infra3
$ infra3:# rabbitmqctl join_cluster rabbit@infra1
$ infra3:# rabbitmqctl start_app
$ infra3:# rabbitmqctl cluster_status
$ infra3:# service rabbitmq-server stop
$ infra2:# service rabbitmq-server stop
$ infra1:# service rabbitmq-server stop
$ infra1:# service rabbitmq-server start
$ infra2:# service rabbitmq-server start
$ infra3:# service rabbitmq-server start
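The sequence above can be sketched as a small dry-run helper that only prints the steps in order, so they can be reviewed before anyone runs them by hand. This is a minimal sketch, not the playbook change itself: the function name `recovery_plan` and its argument layout are hypothetical, nothing is executed on any host, and node names are assumed to take the form `rabbit@<host>`.

```shell
#!/bin/sh
# Dry-run sketch: prints the recovery steps from the report, in order.
# Nothing is executed remotely; each output line is "<host>: <command>".

# recovery_plan LOST SURVIVOR [OTHER...]   (hypothetical helper)
#   LOST      node being rebuilt (infra3 in the report)
#   SURVIVOR  healthy node that forgets and re-admits it (infra1 in the report)
#   OTHER     remaining cluster members (infra2 in the report)
recovery_plan() {
    if [ "$#" -lt 2 ]; then
        echo "usage: recovery_plan LOST SURVIVOR [OTHER...]" >&2
        return 2
    fi
    lost=$1
    survivor=$2
    shift 2
    others="$*"

    # Re-join: stop the app on the lost node, forget it from a survivor,
    # then join and start it again. Assumes node names are rabbit@<host>.
    echo "$lost: rabbitmqctl stop_app"
    echo "$survivor: rabbitmqctl forget_cluster_node rabbit@$lost"
    echo "$lost: rabbitmqctl join_cluster rabbit@$survivor"
    echo "$lost: rabbitmqctl start_app"
    echo "$lost: rabbitmqctl cluster_status"

    # Full restart: stop the lost node first and the survivor last,
    # then start the services in the reverse order.
    stop_order="$lost $others $survivor"
    start_order=""
    for h in $stop_order; do
        echo "$h: service rabbitmq-server stop"
        start_order="$h $start_order"
    done
    for h in $start_order; do
        echo "$h: service rabbitmq-server start"
    done
}
```

Calling `recovery_plan infra3 infra1 infra2` prints the same eleven steps as the transcript, with the node that was stopped last (infra1) started first.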
Changed in openstack-ansible:
status: Confirmed → Opinion
AFAIK it is enough to forget the node and then run the playbooks. During my maintenance, that rebuilt the missing node.
Before considering this bug, I would like to add further requirements, such as not adding nodes while a partition is present.
Also, this feature should be enabled by default, but the operator should be able to disable it if desired (policies etc.).
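The two requirements above (no re-join during a partition, and an operator opt-out) could be expressed as a guard along the following lines. This is only a sketch under stated assumptions: `partition_state`, `may_rejoin` and the `AUTO_REJOIN` switch are hypothetical names, and the parsing assumes `rabbitmqctl cluster_status` prints the Erlang-term format containing a `{partitions,[...]}` entry, which may not hold for every RabbitMQ release.

```shell
#!/bin/sh
# Guard sketch for the automatic re-join; all names here are hypothetical.

# partition_state: reads `rabbitmqctl cluster_status` output on stdin and
# prints "none", "present" or "unknown". Assumes the Erlang-term output
# format, where a healthy cluster reports {partitions,[]}.
partition_state() {
    status=$(cat)
    case $status in
        *"{partitions,[]}"*) echo none ;;
        *"{partitions,["*)   echo present ;;
        *)                   echo unknown ;;
    esac
}

# may_rejoin: succeeds only when the operator has not opted out and the
# status read from stdin reports no partition. AUTO_REJOIN is an assumed
# switch that defaults to enabled.
may_rejoin() {
    [ "${AUTO_REJOIN:-true}" = "true" ] || return 1
    [ "$(partition_state)" = "none" ]
}
```

A playbook task or wrapper script might then run `rabbitmqctl cluster_status | may_rejoin` on a surviving node and skip the forget and join steps when it fails. An "unknown" state is treated as unsafe, so the guard fails closed.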