2020-12-22 17:35:38 |
Drew Freiberger |
description |
On cs:rabbitmq-server-104, when I run juju config rabbitmq-server <key>=<value> for items that should not affect the configuration of the running cluster, juju status reports updates suggesting that cluster changes are happening, which is completely unexpected and unnecessary.
On a cluster of 3 units, running the following:
juju config rabbitmq-server queue_thresholds="[['\*', '\*', 1000, 2000]]"
Produces statuses like:
rabbitmq-server/0* active executing 0/lxd/0 10.0.0.1 5672/tcp (config-changed) Enabling queue mirroring
rabbitmq-server/1 maintenance executing 1/lxd/0 10.0.0.1 5672/tcp (config-changed) Waiting 30 seconds for operation ...
rabbitmq-server/2 maintenance executing 2/lxd/0 10.0.0.1 5672/tcp (config-changed) Waiting 60 seconds for operation ...
Here is a private link to the log from rabbitmq-server/0 (leader unit) during the above operation.
https://pastebin.canonical.com/p/cnsZTCtJxY/
It appears that every config-changed run kicks off cluster_with(), package installs (possibly updates?), and a reconfiguration of all of the related amqp client applications. Touching this much for config changes unrelated to the functioning of rabbitmq-server seems operationally dangerous.
This is further exacerbated by the potential need for higher modulo-nodes and known-wait values, which could hold machine locks hostage for a long period of time. For instance, with modulo-nodes = 6 and known-wait = 300, your third unit of rabbitmq-server could hold the host lock on an innocent config-changed for 30+ minutes because of cluster_with()'s cluster_wait() call. See related bug lp#1903771. |
On cs:rabbitmq-server-104, when I run juju config rabbitmq-server <key>=<value> for items that should not affect the configuration of the running cluster, juju status reports updates suggesting that cluster changes are happening, which is completely unexpected and unnecessary.
On a cluster of 3 units, running the following:
juju config rabbitmq-server queue_thresholds="[['\*', '\*', 1000, 2000]]"
Produces statuses like:
rabbitmq-server/0* active executing 0/lxd/0 10.0.0.1 5672/tcp (config-changed) Enabling queue mirroring
rabbitmq-server/1 maintenance executing 1/lxd/0 10.0.0.1 5672/tcp (config-changed) Waiting 30 seconds for operation ...
rabbitmq-server/2 maintenance executing 2/lxd/0 10.0.0.1 5672/tcp (config-changed) Waiting 60 seconds for operation ...
Here is a private link to the log from rabbitmq-server/0 (leader unit) during the above operation.
https://pastebin.canonical.com/p/cnsZTCtJxY/
It appears that every config-changed run kicks off cluster_with(), package installs (possibly updates?), and a reconfiguration of all of the related amqp client applications. Touching this much for config changes unrelated to the functioning of rabbitmq-server seems operationally dangerous.
This is further exacerbated by the potential need for higher modulo-nodes and known-wait values, which could hold machine locks hostage for a long period of time. For instance, with modulo-nodes = 6 and known-wait = 300, the rabbitmq-server unit matching modulo 5 could hold the host lock on an innocent config-changed for 25+ minutes because of cluster_with()'s cluster_wait() call. See related bug lp#1903771. |
|
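For reference, the 25+ minute figure can be reproduced with a rough sketch. It assumes the wait is (unit number mod modulo-nodes) multiplied by known-wait, which is consistent with the 30 and 60 second waits shown for units 1 and 2 above, but is not confirmed against the charm code here. The helper name estimated_wait_seconds is hypothetical, and the values 3 and 30 are inferred from those statuses rather than read from the charm config.

```python
def estimated_wait_seconds(unit_number, modulo_nodes, known_wait):
    """Hypothetical helper: estimated seconds a unit waits before acting.

    Assumption: wait = (unit number mod modulo-nodes) * known-wait.
    """
    return (unit_number % modulo_nodes) * known_wait


# Values inferred from the statuses in the description (assumed: 3 and 30).
default_waits = [estimated_wait_seconds(n, 3, 30) for n in (1, 2)]

# Scenario from the description: modulo-nodes = 6, known-wait = 300, unit 5.
worst_case_minutes = estimated_wait_seconds(5, 6, 300) / 60

print(default_waits)
print(worst_case_minutes)
```

Under that assumption, the unit whose number is 5 modulo 6 would sit in the wait for 1500 seconds, holding the machine lock for the whole time before any of the remaining hook work starts.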