Comment 0 for bug 2054607

Patrick Lüdeke (vivida) wrote :

What happened:
Deploying RabbitMQ with Kolla-Ansible on a system with many CPU cores (256 in this case) makes the container health check produce CPU spikes, which result in a high CPU load (>15). Bug #1846467 fixed this issue for the RabbitMQ server, but those changes do not affect the health check.

The RabbitMQ Docker health check calls `/usr/local/bin/healthcheck_rabbitmq`, which in turn calls `rabbitmq-diagnostics` several times. `rabbitmq-diagnostics` is not affected by RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS (see Bug #1846467) and requires RABBITMQ_CTL_ERL_ARGS instead. Without that setting, `rabbitmq-diagnostics` will try to use all CPU cores without restriction, resulting in a high CPU load.

Setting RABBITMQ_CTL_ERL_ARGS="+S 2:2" in `/etc/kolla/rabbitmq/rabbitmq-env.conf` fixed the issue for us and reduced the CPU load from >15 to <1.
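
As a minimal sketch of that workaround, the line we added to `/etc/kolla/rabbitmq/rabbitmq-env.conf` looks like the following. The Erlang `+S Schedulers:SchedulersOnline` flag limits the VM to two scheduler threads, both online. The value 2 is simply what worked on our 256-core host, not a tuned recommendation. We assume the container has to be restarted to pick up the change, and that a later Kolla-Ansible reconfigure may regenerate the file and drop a manual edit.

```shell
# /etc/kolla/rabbitmq/rabbitmq-env.conf (excerpt)
# Limit the Erlang VM started by CLI tools such as rabbitmq-diagnostics
# to 2 scheduler threads, 2 of them online, instead of one per CPU core.
RABBITMQ_CTL_ERL_ARGS="+S 2:2"
```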

How to reproduce it (minimal and precise):
- Get a system with 256 cores
- Deploy Kolla-Ansible
- Open htop and wait for the health check to run

**Environment**:
* OS (e.g. from /etc/os-release): Ubuntu 22.04.3 LTS
* Kernel (e.g. `uname -a`): 5.15.0-94-generic
* Docker version if applicable (e.g. `docker version`): 25.0.3
* Kolla-Ansible version (e.g. `git head or tag or stable branch` or pip package version if using release): 2023.2
* Are you using official images from Docker Hub or self built? Docker Hub