OpenStack RabbitMQ Server Charm

config-changed on non-service-affecting variables (known-wait, modulo-nodes, queue_thresholds) causes unexpected queue mirroring and cluster_wait

Bug #1909031 reported by Drew Freiberger on 2020-12-22

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
OpenStack RabbitMQ Server Charm	Fix Released	High	Liam Young	OpenStack RabbitMQ Server Charm 22.04

Bug Description

On cs:rabbitmq-server-104, when I run juju config rabbitmq-server <key>=<value> for items that should not affect the configuration of the running cluster, juju status shows updates suggesting that completely unexpected and unnecessary cluster changes may be happening.

On a cluster of 3 units, running the following:
juju config rabbitmq-server queue_thresholds="[['\*', '\*', 1000, 2000]]"

Produces statuses like:
rabbitmq-server/0* active executing 0/lxd/0 10.0.0.1 5672/tcp (config-changed) Enabling queue mirroring
rabbitmq-server/1 maintenance executing 1/lxd/0 10.0.0.1 5672/tcp (config-changed) Waiting 30 seconds for operation ...
rabbitmq-server/2 maintenance executing 2/lxd/0 10.0.0.1 5672/tcp (config-changed) Waiting 60 seconds for operation ...

Here is a private link to the log from rabbitmq-server/0 (leader unit) during the above operation.
https://pastebin.canonical.com/p/cnsZTCtJxY/

It appears that config-changed as a whole kicks off cluster_with(), package installs (possibly updates?), and reconfiguration of all of the related amqp client applications. Touching this much for config changes that are unrelated to the functioning of rabbitmq-server seems operationally dangerous.

This is further exacerbated by the potential need for higher modulo-nodes and known-wait values, which could hold machine locks hostage for a long period of time. For instance, with modulo-nodes = 6 and known-wait = 300, the rabbitmq-server unit matching modulo 5 could hold the host lock on an innocent config-changed for 25+ minutes because of cluster_with()'s cluster_wait() call. See related bug lp#1903771.
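
The 25-minute figure can be reproduced with a small sketch. It assumes the delay is the unit number modulo modulo-nodes, multiplied by known-wait, which is consistent with the 30- and 60-second waits in the status output above if known-wait is 30. The function name is hypothetical and is not taken from the charm's code.

```python
def estimated_cluster_wait(unit_number: int, modulo_nodes: int, known_wait: int) -> int:
    """Return the assumed delay, in seconds, before a unit proceeds.

    Assumption: delay = (unit_number % modulo_nodes) * known_wait.
    """
    return (unit_number % modulo_nodes) * known_wait


# Values from the report: modulo-nodes = 6, known-wait = 300, unit matching modulo 5.
seconds = estimated_cluster_wait(unit_number=5, modulo_nodes=6, known_wait=300)
print(f"{seconds} s = {seconds / 60:.0f} min")  # prints "1500 s = 25 min"
```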

Drew Freiberger (afreiberger) on 2020-12-22

description:	updated

Trent Lloyd (lathiat) wrote on 2021-06-02:

config_changed explicitly calls rabbit.set_all_mirroring_queues. This function always sets the policy and does not check whether doing so is needed. We have hit some upstream bugs that suggest avoiding this call during turbulent times, as it can cause some de-sync, though I don't have a reference to that bug at the moment. In general, avoiding these types of operations when they are not needed is ideal.

config_changed then also calls cluster_changed ("in case min-cluster-size has changed") and update_clients ("ensure all clients connections are up to date on upgrade").

cluster_changed then calls cluster_with. I couldn't see an obvious path for cluster_with to re-run the other code, but it does reset relations, which will fire relation-changed hooks.

Alex Kavanagh (ajkavanagh) wrote on 2021-06-02:

Triaged: it sounds like good practice to not keep setting the same thing if it may cause issues in the payload. A check-before-set may be a better solution.
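
A minimal sketch of the check-before-set idea, assuming the current policy can be read back and compared with the desired one. Every name here is hypothetical: get_current and apply stand in for whatever the charm uses to query and set the policy, and the policy keys are illustrative values only.

```python
from typing import Callable, Dict, Optional

Policy = Dict[str, str]


def set_policy_if_needed(
    desired: Policy,
    get_current: Callable[[], Optional[Policy]],
    apply: Callable[[Policy], None],
) -> bool:
    """Apply `desired` only when it differs from the current policy.

    Returns True if the policy was applied, False if it was already in place.
    """
    if get_current() == desired:
        return False
    apply(desired)
    return True


# In-memory stand-in for the broker, used only to exercise the function.
state: Dict[str, Optional[Policy]] = {"policy": None}
applied = []


def fake_get() -> Optional[Policy]:
    return state["policy"]


def fake_apply(policy: Policy) -> None:
    applied.append(policy)
    state["policy"] = dict(policy)


ha_all = {"ha-mode": "all", "ha-sync-mode": "automatic"}  # illustrative values
first = set_policy_if_needed(ha_all, fake_get, fake_apply)   # policy is set
second = set_policy_if_needed(ha_all, fake_get, fake_apply)  # no-op
```

With this shape, a config-changed run that does not alter the mirroring settings never touches the policy a second time.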

Changed in charm-rabbitmq-server:
importance:	Undecided → High
status:	New → Triaged
tags:	added: good-first-bug

Liam Young (gnuoy) on 2021-11-18

Changed in charm-rabbitmq-server:
assignee:	nobody → Liam Young (gnuoy)

OpenStack Infra (hudson-openstack) wrote on 2021-11-23: Fix proposed to charm-rabbitmq-server (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/818879

Changed in charm-rabbitmq-server:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2022-02-08: Fix merged to charm-rabbitmq-server (master)

Reviewed: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/818879
Committed: https://opendev.org/openstack/charm-rabbitmq-server/commit/ccd11fdf9e801b5e3169c3482334a42101198042
Submitter: "Zuul (22348)"
Branch: master

commit ccd11fdf9e801b5e3169c3482334a42101198042
Author: Liam Young <email address hidden>
Date: Thu Nov 18 13:17:03 2021 +0000

    Check before applying plugin and perms changes

    Check that setting update is needed before applying a config
    update to the cluster. This is mainly applicable to
    rabbitmq-server > 3.8.2 which supports json output. If a
    parser is not available to extract the existing settings
    then the old behaviour of blindly applying the change
    is used.

    Closes-Bug: #1909031
    Change-Id: I9599f69cc11ea8d1a4e9d618aecdab4afe488d96
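
The fallback described in the commit message could look roughly like the sketch below. It is not the merged code: the function names are hypothetical, and the listing is assumed to be a JSON array of enabled plugin names, which may not match the real command output. When the listing cannot be parsed, the change is applied blindly, as before.

```python
import json
from typing import Callable, List, Optional


def parse_enabled_plugins(raw: str) -> Optional[List[str]]:
    """Return the enabled plugin names, or None if `raw` is not a JSON list of strings."""
    try:
        data = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if isinstance(data, list) and all(isinstance(item, str) for item in data):
        return data
    return None


def enable_plugin_if_needed(
    plugin: str, raw_listing: str, enable: Callable[[str], None]
) -> bool:
    """Enable `plugin` unless the listing proves it is already enabled.

    Returns True if `enable` was called.
    """
    enabled = parse_enabled_plugins(raw_listing)
    if enabled is not None and plugin in enabled:
        return False  # existing state is known and already correct
    enable(plugin)  # plugin missing, or state unknown: old blind behaviour
    return True


calls: List[str] = []
skipped = enable_plugin_if_needed("rabbitmq_management", '["rabbitmq_management"]', calls.append)
forced = enable_plugin_if_needed("rabbitmq_management", "not json output", calls.append)
```

Here the first call is skipped because the parsed listing already contains the plugin, while the second falls back to applying the change because the listing cannot be parsed.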

Changed in charm-rabbitmq-server:
status:	In Progress → Fix Committed

Alex Kavanagh (ajkavanagh) on 2022-04-14

Changed in charm-rabbitmq-server:
milestone:	none → 22.04

Alex Kavanagh (ajkavanagh) on 2022-05-10

Changed in charm-rabbitmq-server:
status:	Fix Committed → Fix Released
