Comment 5 for bug 1890759

Nobuto Murata (nobuto) wrote :

It might be totally unrelated, but in the field we are seeing more RabbitMQ-related deployment race conditions with the 21.01 charm release than with 20.10.

I think the biggest change in charm-rabbitmq-server is this commit, which introduces the "queue-master-locator" option and defaults it to "min-masters" (not the upstream default):
https://opendev.org/openstack/charm-rabbitmq-server/commit/07ec03b5d7a13aa40a2d6e2751c39ba4e5d7dedd
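For anyone not familiar with the option, here is a rough Python sketch of how the two strategies differ when a new queue is declared. This is only a simplified illustration, not RabbitMQ's actual code: the function name, node names, and queue names are made up, and real tie-breaking may differ. With "client-local" (the upstream default) the master lands on the node the declaring client is connected to; with "min-masters" it lands on the node currently hosting the fewest queue masters.

```python
from collections import Counter


def pick_master_node(strategy, nodes, queue_masters, client_node):
    """Return the node that would host a new queue's master (simplified)."""
    if strategy == "client-local":
        # Upstream default: the node the declaring client is connected to.
        return client_node
    if strategy == "min-masters":
        # The node currently hosting the fewest queue masters.
        counts = Counter(queue_masters.values())
        return min(nodes, key=lambda node: counts[node])
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical three-node cluster with existing queues and their masters.
nodes = ["rabbit@a", "rabbit@b", "rabbit@c"]
queue_masters = {"q1": "rabbit@a", "q2": "rabbit@a", "q3": "rabbit@b"}

for strategy in ("client-local", "min-masters"):
    print(strategy, pick_master_node(strategy, nodes, queue_masters, "rabbit@a"))
```

The point is that with "min-masters" a queue's master can end up on a different node than the one the client is connected to, which may matter while cluster members are still joining during deployment.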

Some services gave up trying to start, with errors like:

2021-02-19 13:42:43.254 49489 ERROR oslo_service.service oslo_messaging.exceptions.MessageDeliveryFailure: Unable to connect to AMQP server on 10.217.143.106:5672 after inf tries: Server unexpectedly closed connection

root@juju-d778c7-0-lxd-2:~# systemctl status cinder-scheduler.service
● cinder-scheduler.service - OpenStack Cinder Scheduler
   Loaded: loaded (/lib/systemd/system/cinder-scheduler.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Fri 2021-02-19 13:42:43 UTC; 10min ago
  Process: 49489 ExecStart=/etc/init.d/cinder-scheduler systemd-start (code=exited, status=0/SUCCESS)
 Main PID: 49489 (code=exited, status=0/SUCCESS)

Feb 19 13:37:02 juju-d778c7-0-lxd-2 systemd[1]: Started OpenStack Cinder Scheduler.

We will look into this further, but there seem to be some known issues with "min-masters":
* deployment race condition - https://bugs.launchpad.net/tripleo/+bug/1789373
* failover scenario - https://github.com/rabbitmq/rabbitmq-server/issues/1405