OpenStack RabbitMQ Server Charm

Comment 8 for bug 1890759

Nobuto Murata (nobuto) wrote on 2021-03-09:

Just for the record again,

> We found out the underlying bcache device for control plane including RabbitMQ wasn't set as writeback accidentally. So the whole race condition might have been caused by IO contention and starvation. So the new config and the new default value may not be the culprit here.

Even after setting bcache to writeback mode, the deployment wasn't reliable. With bionic's RabbitMQ at least, other services sometimes ended up in an error status. The following change in the charm config made it reliable in the end.

known-wait: 180
queue-master-locator: client-local

I'm not saying queue-master-locator is the culprit, but we need to keep an eye on it, especially with large-scale deployments.
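
For anyone who wants to try the same settings, here is a minimal sketch of how they might be written as a bundle overlay. The application name rabbitmq-server and the idea of keeping this in a separate overlay file are assumptions for illustration; only the two option values come from the deployment described above.

```yaml
# Hypothetical overlay file; the application name "rabbitmq-server" is assumed
# and must match the name used in your own bundle.
applications:
  rabbitmq-server:
    options:
      # Values reported above as making the deployment reliable.
      known-wait: 180
      queue-master-locator: client-local
```

An overlay like this would typically be passed to juju deploy with the --overlay option alongside the main bundle. Adjust the application name if your bundle uses a different one.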