juju add-unit --to lxd:7 resulted in non-operational cluster
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack RabbitMQ Server Charm | Expired | Undecided | Unassigned |
Bug Description
Description
===========
A node outage reduced the rabbitmq-server units from 3/3 to 2/3.
Engineers attempted to repair the cluster, which left it non-operational.
Steps to reproduce
==================
juju add-unit --to lxd:7
# 3/4
# wait for the new unit to report "ready and clustered"
juju remove-machine --force (offline machine)
# 3/3
Expected result
===============
RabbitMQ is clustered and fully operational.
Actual result
=============
RabbitMQ became unresponsive.
# the following command hung
juju run -a rabbitmq-server 'rabbitmqctl cluster_status; rabbitmqctl list_queues -p openstack |wc -l'
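The hung command above runs two `rabbitmqctl` queries on every unit at once, so a hang does not show which unit or which query is stuck. A minimal POSIX `sh` sketch for querying one unit at a time with a deadline, assuming juju 2.x `juju run --unit` syntax (the same `juju run` command used above) and GNU coreutils `timeout`; `query_unit` is a hypothetical helper, not part of juju or the charm:

```shell
#!/bin/sh
# query_unit UNIT COMMAND [SECONDS]
# Hypothetical helper: runs COMMAND on a single juju unit
# (juju 2.x `juju run --unit` syntax), giving up after SECONDS
# (default 60) via GNU coreutils `timeout`.
# Returns the command's exit status, or 124 if the deadline expired.
query_unit() {
  q_unit=$1
  q_cmd=$2
  q_secs=${3:-60}
  q_status=0
  timeout "$q_secs" juju run --unit "$q_unit" "$q_cmd" || q_status=$?
  if [ "$q_status" -eq 124 ]; then
    # timeout(1) exits with 124 when the time limit is reached.
    echo "TIMEOUT after ${q_secs}s on $q_unit: $q_cmd" >&2
  fi
  return "$q_status"
}
```

For example, `query_unit rabbitmq-server/0 'rabbitmqctl cluster_status' 60` (unit name illustrative) reports a timeout on stderr and returns 124 if that unit does not answer within 60 seconds; repeating it per unit and per query narrows down where the hang occurs.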
Suspected cause: lost quorum, a lost leader, a split (partitioned) cluster, or some other failure scenario.
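The quorum hypothesis can be compared against the unit counts in the reproduction steps. A minimal sketch, assuming (1) the juju unit counts equal RabbitMQ cluster membership, which the report does not confirm, and (2) a strict-majority rule of the kind RabbitMQ's `pause_minority` partition handling and Raft-based quorum queues rely on; `has_majority` and `check_steps` are hypothetical helpers:

```shell
#!/bin/sh
# has_majority REACHABLE MEMBERS
# Succeeds if REACHABLE nodes form a strict majority of MEMBERS.
has_majority() {
  [ "$1" -gt $(( $2 / 2 )) ]
}

# Walk the scenario as "label members reachable" lines.
# Assumption: juju unit counts equal RabbitMQ cluster membership.
check_steps() {
  while read -r label members reachable; do
    if has_majority "$reachable" "$members"; then
      echo "$label: majority ($reachable/$members)"
    else
      echo "$label: NO majority ($reachable/$members)"
    fi
  done <<EOF
healthy 3 3
outage 3 2
after-add-unit 4 3
after-remove-machine 3 3
EOF
}
```

Under these assumptions every step keeps a strict majority, so if quorum was lost, the node counts alone would not explain it; one possibility consistent with the split-cluster suspicion is that the reachable nodes could not all see each other.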
Environment
===========
juju --version
2.6.
juju config keystone openstack-origin
cloud:
Logs
====
https:/
sosreport-
sosreport-
sosreport-
sosreport-
Unfortunately, the logs aren't available (they are locked inside Salesforce), so it's not clear whether this is a transient error (i.e. a problem with RabbitMQ) or an issue with the charm doing the wrong thing. Could you please either attach sanitized logs to the bug report or paste the relevant errors from the logs/status reports? Thanks.