sweeping all amqp related units on every hook event does not scale
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack RabbitMQ Server Charm | Fix Released | Medium | Edward Hope-Morley | | |
Bug Description
Currently, on every hook that fires, we call update_clients(), which calls amqp_changed(), which (amongst other things, and on the leader only) calls configure_amqp(), which in turn runs rabbitmqctl commands (that are slow to execute) for every unit related to the charm on the amqp relation. This of course includes the update-status hook, which fires every 15 minutes. In larger deployments, which could easily have hundreds if not thousands of related units (e.g. nova-compute), this call will take a very long time to complete.

I understand the motivation is to ensure that the correct settings are applied at all times, and that re-applying them when nothing has changed has no effect. But since executing the commands is expensive even when they are idempotent, I believe there are some obvious optimisations that could be implemented to mitigate the cost of these actions.
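One possible optimisation, sketched here purely as an illustration and not as the fix that was actually released: record a digest of the settings last applied for each related unit, and skip the slow step when the digest has not changed. The names settings_digest, configure_changed_clients, the applied store and the configure callable are all hypothetical; configure stands in for whatever rabbitmqctl-based work configure_amqp() does.

```python
import hashlib
import json


def settings_digest(settings):
    """Return a stable digest of one client's requested AMQP settings."""
    payload = json.dumps(settings, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def configure_changed_clients(clients, applied, configure):
    """Call configure() only for clients whose settings changed since last run.

    clients:   dict mapping a unit name to its requested settings (assumed shape)
    applied:   dict mapping unit names to the digest recorded on the last run;
               hypothetical store that would need to persist between hooks
    configure: callable standing in for the slow rabbitmqctl-based step
    """
    reconfigured = []
    for unit, settings in sorted(clients.items()):
        digest = settings_digest(settings)
        if applied.get(unit) == digest:
            continue  # unchanged since last run, so skip the expensive call
        configure(unit, settings)
        applied[unit] = digest
        reconfigured.append(unit)
    # Forget units that are no longer related.
    for unit in set(applied) - set(clients):
        del applied[unit]
    return reconfigured
```

The trade-off is that a cache like this only notices changes in what clients request. It would not repair settings that drifted on the broker itself, so an occasional full sweep might still be wanted.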
Changed in charm-rabbitmq-server: | |
status: | Triaged → In Progress |
assignee: | nobody → Edward Hope-Morley (hopem) |
Changed in charm-rabbitmq-server: | |
status: | Fix Committed → Fix Released |
Sorry to nitpick, but I think update-status fires every 5 minutes, which makes the problem worse. I agree that you could drop the call from the update-status hook, and probably from every other hook except leader changed and amqp-relation-changed.
I expect, however, that the update_clients() function is a workaround for race hazards or other async-like problems with rabbitmq, which we've had problems with in the past, and that the update-status hook 'fixes' them after the fact. Not sure how you would progress this.
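The suggestion to restrict the sweep to a few hooks could look roughly like the sketch below. It is only an illustration: should_sweep_clients and FULL_SWEEP_HOOKS are hypothetical names, and mapping "leader changed" to the leader-elected and leader-settings-changed hooks is an assumption, not something stated in this report.

```python
# Hooks on which a full sweep of amqp clients would still run.
FULL_SWEEP_HOOKS = frozenset({
    "amqp-relation-changed",
    "leader-elected",           # assumption: one reading of "leader changed"
    "leader-settings-changed",  # assumption: the other plausible reading
})


def should_sweep_clients(hook_name, is_leader):
    """Return True if this hook should reconfigure every related amqp unit.

    hook_name: name of the hook being executed, e.g. "update-status"
    is_leader: whether this unit is currently the leader
    """
    return is_leader and hook_name in FULL_SWEEP_HOOKS
```

With a gate like this, update-status would no longer trigger the sweep. If the sweep really is covering for race hazards, those would then go unrepaired until the next relation or leadership event.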