config-changed on non-service-affecting variables (known-wait, modulo-nodes, queue_thresholds) causes unexpected queue mirroring and cluster_wait

Bug #1909031 reported by Drew Freiberger
This bug affects 1 person
Affects: OpenStack RabbitMQ Server Charm
Status: Fix Released
Importance: High
Assigned to: Liam Young

Bug Description

On cs:rabbitmq-server-104, when I run juju config rabbitmq-server <key>=<value> for options that should not affect the configuration of the running cluster, juju status reports what appear to be cluster operations. These are both unexpected and unnecessary.

On a cluster of 3 units, running the following:
juju config rabbitmq-server queue_thresholds="[['*', '*', 1000, 2000]]"

Produces statuses like:
rabbitmq-server/0* active executing 0/lxd/0 10.0.0.1 5672/tcp (config-changed) Enabling queue mirroring
rabbitmq-server/1 maintenance executing 1/lxd/0 10.0.0.1 5672/tcp (config-changed) Waiting 30 seconds for operation ...
rabbitmq-server/2 maintenance executing 2/lxd/0 10.0.0.1 5672/tcp (config-changed) Waiting 60 seconds for operation ...

Here is a private link to the log from rabbitmq-server/0 (leader unit) during the above operation.
https://pastebin.canonical.com/p/cnsZTCtJxY/

It appears that the entire config-changed hook kicks off cluster_with(), package installs (possibly updates?), and reconfiguration of all related amqp client applications. Touching this much for config changes that are unrelated to the functioning of rabbitmq-server seems operationally dangerous.

This is made worse by the potential need for higher modulo-nodes and known-wait values, which could hold machine locks hostage for a long time. For instance, with modulo-nodes = 6 and known-wait = 300, the rabbitmq-server unit matching modulo 5 could hold the host lock for 25+ minutes (5 × 300 s = 1500 s) on an innocent config-changed, because of cluster_with()'s cluster_wait() call. See related bug lp#1903771.
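A minimal sketch of the staggered wait described above, assuming each unit sleeps (unit number mod modulo-nodes) × known-wait seconds before its clustering operation. The function name modulo_wait is hypothetical and this is not the charm's actual code. With modulo-nodes = 3 and known-wait = 30 (assumed values) it reproduces the 30- and 60-second waits shown in the statuses above, and with the values from the report it gives the 25-minute worst case.

```python
def modulo_wait(unit_name: str, modulo_nodes: int, known_wait: int) -> int:
    """Seconds a unit would sleep before clustering (assumed formula, hypothetical helper)."""
    # Juju unit names look like "rabbitmq-server/5"; the suffix is the unit number.
    unit_number = int(unit_name.split("/")[1])
    return (unit_number % modulo_nodes) * known_wait


# Assumed modulo-nodes=3, known-wait=30: units 0, 1, 2 wait 0, 30 and 60 seconds.
for unit in ("rabbitmq-server/0", "rabbitmq-server/1", "rabbitmq-server/2"):
    print(unit, modulo_wait(unit, modulo_nodes=3, known_wait=30))

# Scenario from the report: modulo-nodes=6, known-wait=300.
worst = modulo_wait("rabbitmq-server/5", modulo_nodes=6, known_wait=300)
print(worst, worst / 60)  # 1500 seconds, i.e. 25.0 minutes of holding the lock
```

Because the wait grows linearly with both settings, raising either one to spread out real cluster joins also lengthens every config-changed that reaches cluster_wait().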

description: updated
Trent Lloyd (lathiat) wrote :

config_changed explicitly calls rabbit.set_all_mirroring_queues. This function always sets the policy and does not check whether that is needed. We have hit some upstream bugs suggesting that this function should not be called during turbulent periods, as it can cause some de-synchronisation, but I don't have a reference to that bug at the moment. In general, avoiding these kinds of operations when they are not needed is ideal.

config_changed then also calls cluster_changed "in case min-cluster-size has changed" and update_clients ("ensure all clients connections are up to date on upgrade").

cluster_changed then calls cluster_with. I couldn't see an obvious path for cluster_with to re-run the other code, but it does reset relations, which fires relation-changed hooks.

Alex Kavanagh (ajkavanagh) wrote :

Triaged: it sounds like good practice not to keep setting the same value if doing so may cause issues in the payload. A check-before-set may be a better solution.
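A minimal sketch of the check-before-set idea, assuming the caller can read the current setting (for example the existing mirroring policy definition) and passes in a function that performs the actual write. The name set_if_changed and the policy values in the usage note are illustrative assumptions, not the charm's code.

```python
from typing import Callable, Optional


def set_if_changed(
    current: Optional[dict],
    desired: dict,
    apply: Callable[[dict], None],
) -> bool:
    """Call `apply(desired)` only when it differs from `current` (hypothetical helper).

    `current` is None when the existing setting could not be read; in that
    case the change cannot be proven redundant, so it is applied anyway.
    Returns True if `apply` was called.
    """
    if current is not None and current == desired:
        return False  # already in the desired state: skip the cluster-wide write
    apply(desired)
    return True
```

For example, with calls = [] and policy = {"ha-mode": "all"}, set_if_changed(policy, {"ha-mode": "all"}, calls.append) returns False and leaves calls empty, while set_if_changed(None, policy, calls.append) applies the change. Treating "unknown" as "apply" keeps the old behaviour as a safe fallback.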

Changed in charm-rabbitmq-server:
importance: Undecided → High
status: New → Triaged
tags: added: good-first-bug
Liam Young (gnuoy)
Changed in charm-rabbitmq-server:
assignee: nobody → Liam Young (gnuoy)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-rabbitmq-server (master)
Changed in charm-rabbitmq-server:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-rabbitmq-server (master)

Reviewed: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/818879
Committed: https://opendev.org/openstack/charm-rabbitmq-server/commit/ccd11fdf9e801b5e3169c3482334a42101198042
Submitter: "Zuul (22348)"
Branch: master

commit ccd11fdf9e801b5e3169c3482334a42101198042
Author: Liam Young <email address hidden>
Date: Thu Nov 18 13:17:03 2021 +0000

    Check before applying plugin and perms changes

    Check that setting update is needed before applying a config
    update to the cluster. This is mainly applicable to
    rabbitmq-server > 3.8.2 which supports json output. If a
    parser is not available to extract the existing settings
    then the old behaviour of blindly applying the change
    is used.

    Closes-Bug: #1909031
    Change-Id: I9599f69cc11ea8d1a4e9d618aecdab4afe488d96
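A minimal sketch of the "parse if possible, otherwise apply blindly" behaviour the commit message describes, applied to user permissions. It assumes the JSON listing is an array of objects with user, configure, write and read keys; that shape, and the names parse_permissions and perms_need_update, are assumptions for illustration rather than the charm's implementation.

```python
import json
from typing import Dict, Optional, Tuple

Perms = Tuple[str, str, str]  # (configure, write, read) regexes


def parse_permissions(raw: str) -> Optional[Dict[str, Perms]]:
    """Map user -> (configure, write, read) from assumed JSON listing output.

    Returns None when the output cannot be parsed, e.g. an older release
    that only prints a plain-text table.
    """
    try:
        return {
            e["user"]: (e["configure"], e["write"], e["read"])
            for e in json.loads(raw)
        }
    except (ValueError, TypeError, KeyError):
        return None


def perms_need_update(raw: str, user: str, configure: str, write: str, read: str) -> bool:
    """True if the permissions should be (re)applied."""
    existing = parse_permissions(raw)
    if existing is None:
        return True  # no parser for this output: fall back to applying blindly
    return existing.get(user) != (configure, write, read)
```

Only a successful parse that shows identical values skips the write; a missing user, different values, or unparseable output all lead to the change being applied, matching the fallback described in the commit message.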

Changed in charm-rabbitmq-server:
status: In Progress → Fix Committed
Changed in charm-rabbitmq-server:
milestone: none → 22.04
Changed in charm-rabbitmq-server:
status: Fix Committed → Fix Released