Activity log for bug #2054607

Date Who What changed Old value New value Message
2024-02-21 21:01:38 Patrick Lüdeke bug added bug
2024-02-21 21:02:24 Patrick Lüdeke kolla-ansible: assignee Patrick Lüdeke (vivida)
2024-02-21 21:34:14 OpenStack Infra kolla-ansible: status New In Progress
2024-02-21 21:36:02 Patrick Lüdeke description

Old value:

What happened:
Deploying RabbitMQ with Kolla-Ansible on a system with many CPU cores (in this case 256) causes the health check to produce CPU spikes, which result in a high CPU load (>15). Bug #1846467 fixed this issue for RabbitMQ Server; the changes do not affect the health check, though.
The RabbitMQ Docker health check calls `/usr/local/bin/healthcheck_rabbitmq`, which then calls `rabbitmq-diagnostics` several times. `rabbitmq-diagnostics` is not affected by RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS (see Bug #1846467) and requires RABBITMQ_CTL_ERL_ARGS instead. `rabbitmq-diagnostics` will then try to use all CPU cores without restriction, resulting in a high CPU load.
Setting RABBITMQ_CTL_ERL_ARGS="+S 2:2" in `/etc/kolla/rabbitmq/rabbitmq-env.conf` fixed the issue for us and reduced the CPU load from >15 to <1.

How to reproduce it (minimal and precise):
- Get a system with 256 cores
- Deploy Kolla-Ansible
- Open htop and wait for the health check to run

**Environment**:
* OS (e.g. from /etc/os-release): Ubuntu 22.04.3 LTS
* Kernel (e.g. `uname -a`): 5.15.0-94-generic
* Docker version if applicable (e.g. `docker version`): 25.0.3
* Kolla-Ansible version (e.g. `git head or tag or stable branch` or pip package version if using release): 2023.2
* Are you using official images from Docker Hub or self built? Dockerhub

New value:

What happened:
Deploying RabbitMQ with Kolla-Ansible on a system with many CPU cores (in this case 256) causes the health check to produce CPU spikes, which result in a high CPU load (>15). Bug #1846467 fixed this issue for RabbitMQ Server; the changes do not affect the health check, though.
The RabbitMQ Docker health check calls `/usr/local/bin/healthcheck_rabbitmq`, which then calls `rabbitmq-diagnostics` several times. `rabbitmq-diagnostics` is not affected by RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS (see Bug #1846467) and requires RABBITMQ_CTL_ERL_ARGS instead. `rabbitmq-diagnostics` will then try to use all CPU cores without restriction, resulting in a high CPU load.
Setting RABBITMQ_CTL_ERL_ARGS="+S 2:2" in `/etc/kolla/rabbitmq/rabbitmq-env.conf` fixed the issue for us and reduced the CPU load from >15 to <1.

How to reproduce it (minimal and precise):
- Get a system with 256 cores
- Deploy Kolla-Ansible
- Open htop and wait for the health check to run

**Environment**:
* OS (e.g. from /etc/os-release): Ubuntu 22.04.3 LTS
* Kernel (e.g. `uname -a`): 5.15.0-94-generic
* Docker version if applicable (e.g. `docker version`): 25.0.3
* Kolla-Ansible version (e.g. `git head or tag or stable branch` or pip package version if using release): 2023.2
* Are you using official images from Docker Hub or self built? Dockerhub

Thanks to @frenner, @NobleMajo, @netde