RabbitMQ container - high CPU load on multicore systems

Bug #1846467 reported by Jan Vondra
This bug affects 3 people
Affects        Status        Importance  Assigned to  Milestone
kolla-ansible  Fix Released  Medium      Jan Vondra
Train          Fix Released  Medium      Unassigned

Bug Description

Steps to reproduce:
1. Deploy the kolla control group on a system with many CPU cores (32 virtual cores in my case).
2. Check the CPU load with docker stats.
3. The RabbitMQ container load is much higher than anticipated (~60% in my case).

Solution:
Set RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+S 1:1" in rabbitmq-env.conf, as suggested in https://github.com/helm/charts/issues/3855#issuecomment-478529100 (the Erlang scheduler thread count seems to cause the issue). With this setting the RabbitMQ container load drops to ~10%.
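
For illustration, the workaround above might look like this in rabbitmq-env.conf. This is only a sketch of the single line named in the report: the file is read as shell variable assignments, and the comments are editorial. In the Erlang `+S` flag the first number is the count of scheduler threads and the second is how many of them are online.

```shell
# rabbitmq-env.conf fragment (sketch; the value is the one from the report)
# "+S 1:1" asks the Erlang VM for one scheduler thread, with one online,
# instead of one scheduler per CPU core.
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+S 1:1"
```

The RabbitMQ container has to be restarted for the change to take effect.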

**Environment**:
* OS: Ubuntu 18.04
* Kernel: 4.15.0-47-generic #50-Ubuntu SMP Wed Mar 13 10:44:52 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
* Kolla-Ansible version: stable/stein
* Docker image Install type: binary
* Docker image distribution: debian - self built

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/686369

Changed in kolla-ansible:
assignee: nobody → Jan Vondra (janvondra)
status: New → In Progress
Changed in kolla-ansible:
importance: Undecided → Medium
milestone: none → 9.0.0
Revision history for this message
romano trampus (romano-trampus) wrote :

Same on an all-in-one deploy: the beam.smp (RabbitMQ) process was at 17-20% CPU on a single Intel i7 node. After applying the suggested solution the process is at 3%.

Revision history for this message
Keith Plant (kplant) wrote :

Same behavior on CentOS 7 - the suggested solution also seems to be successful for me.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/686369
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=9137828b979a2eb8ff1bfe3174f7c6ae8047e835
Submitter: Zuul
Branch: master

commit 9137828b979a2eb8ff1bfe3174f7c6ae8047e835
Author: Jan Vondra <email address hidden>
Date: Thu Oct 3 12:01:00 2019 +0200

    Allow passing arguments to RabbitMQ server

    Adds rabbitmq_server_additional_erl_args variable which
    is appended to RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS
    environment variable to RabbitMQ server startup script.

    This can be used to configure the schedulers.

    Docs attached.

    Change-Id: Id683c8cc6dac61354ffd94f3b460335b42136ba2
    Co-authored-by: Radosław Piliszek <email address hidden>
    Related-bug: #1846467
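
As a rough sketch of how the variable from this commit could be set, the snippet below appends it to a Kolla Ansible globals file. The use of a scratch file and the `GLOBALS_FILE` override are assumptions made so the example is safe to run; a real deployment would normally edit its own globals.yml (commonly /etc/kolla/globals.yml). The value "+S 1:1" is the one from the bug description.

```shell
# Assumption: write to a scratch file unless GLOBALS_FILE points elsewhere.
globals_file="${GLOBALS_FILE:-$(mktemp)}"

# Append the variable added by the commit above; Kolla Ansible appends its
# value to RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS for the RabbitMQ server.
printf '%s\n' 'rabbitmq_server_additional_erl_args: "+S 1:1"' >> "$globals_file"

# Show the resulting line.
grep -F 'rabbitmq_server_additional_erl_args' "$globals_file"
```

After changing the real globals file, the RabbitMQ service would presumably need a `kolla-ansible reconfigure` run before the new arguments are used.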

Mark Goddard (mgoddard)
Changed in kolla-ansible:
status: In Progress → Fix Released
milestone: 9.0.0 → none
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/723374
Committed: https://opendev.org/openstack/kolla-ansible/commit/70f6f8e4c02ed6a8687d2bd714d3fe0b9d04d84a
Submitter: "Zuul (22348)"
Branch: master

commit 70f6f8e4c02ed6a8687d2bd714d3fe0b9d04d84a
Author: John Garbutt <email address hidden>
Date: Mon Apr 27 10:59:06 2020 +0100

    Reduce RabbitMQ busy waiting, lowering CPU load

    On machines with many cores, we were seeing excessive CPU load on systems
    that were not very busy. With the following Erlang VM argument we saw
    RabbitMQ CPU usage drop from about 150% to around 20%, on a system with
    40 hyperthreads.

        +S 2:2

    By default RabbitMQ starts N schedulers where N is the number of CPU
    cores, including hyper-threaded cores. This is fine when you assume all
    your CPUs are dedicated to RabbitMQ. It's not a good idea in a typical
    Kolla Ansible setup. Here we go for two scheduler threads.
    More details can be found here:
    https://www.rabbitmq.com/runtime.html#scheduling
    and here:
    https://erlang.org/doc/man/erl.html#emulator-flags

        +sbwt none

    This stops busy waiting of the scheduler, for more details see:
    https://www.rabbitmq.com/runtime.html#busy-waiting
    Newer versions of rabbit may need additional flags:
    "+sbwt none +sbwtdcpu none +sbwtdio none"
    But this patch should be back portable to older versions of RabbitMQ
    used in Train and Stein.

    Note that information on this tuning was found by looking at data from:
    rabbitmq-diagnostics runtime_thread_stats
    More details on that can be found here:
    https://www.rabbitmq.com/runtime.html#thread-stats

    Related-Bug: #1846467

    Change-Id: Iced014acee7e590c10848e73feca166f48b622dc
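
The flags discussed in this commit message can be put together as in the sketch below. The function names and the "extended" switch are hypothetical, and the container name "rabbitmq" is an assumption; only the flag strings and the rabbitmq-diagnostics command come from the commit message.

```shell
# Compose the Erlang VM flags described in the commit message above.
# $1 = "extended" adds the extra busy-wait flags that the commit says
# newer RabbitMQ versions may need; anything else gives the base set.
rabbitmq_erl_args() {
    args="+S 2:2 +sbwt none"
    if [ "$1" = "extended" ]; then
        args="$args +sbwtdcpu none +sbwtdio none"
    fi
    printf '%s\n' "$args"
}

# Inspect scheduler thread activity in a running broker.
# Assumption: the container is named "rabbitmq"; pass another name as $1.
show_thread_stats() {
    docker exec "${1:-rabbitmq}" rabbitmq-diagnostics runtime_thread_stats
}

rabbitmq_erl_args base
rabbitmq_erl_args extended
```

Running show_thread_stats before and after changing the flags is one way to compare how much time the scheduler threads spend busy waiting.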

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/799237

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/799238

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/799239

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/799237
Committed: https://opendev.org/openstack/kolla-ansible/commit/6c9b9179de6bc6f97530e4a9a251cbc9d7ee2366
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 6c9b9179de6bc6f97530e4a9a251cbc9d7ee2366
Author: John Garbutt <email address hidden>
Date: Mon Apr 27 10:59:06 2020 +0100

    Reduce RabbitMQ busy waiting, lowering CPU load

    Related-Bug: #1846467

    Change-Id: Iced014acee7e590c10848e73feca166f48b622dc
    (cherry picked from commit 70f6f8e4c02ed6a8687d2bd714d3fe0b9d04d84a)

tags: added: in-stable-wallaby
tags: added: in-stable-victoria
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/799238
Committed: https://opendev.org/openstack/kolla-ansible/commit/21cef390e7e5e152f05c311fc5df38030c8027fa
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 21cef390e7e5e152f05c311fc5df38030c8027fa
Author: John Garbutt <email address hidden>
Date: Mon Apr 27 10:59:06 2020 +0100

    Reduce RabbitMQ busy waiting, lowering CPU load

    Related-Bug: #1846467

    Change-Id: Iced014acee7e590c10848e73feca166f48b622dc
    (cherry picked from commit 70f6f8e4c02ed6a8687d2bd714d3fe0b9d04d84a)

tags: added: in-stable-ussuri
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/799239
Committed: https://opendev.org/openstack/kolla-ansible/commit/95de850250ab5394a7d1ac4fdf2920fda3dea77c
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 95de850250ab5394a7d1ac4fdf2920fda3dea77c
Author: John Garbutt <email address hidden>
Date: Mon Apr 27 10:59:06 2020 +0100

    Reduce RabbitMQ busy waiting, lowering CPU load

    Related-Bug: #1846467

    Change-Id: Iced014acee7e590c10848e73feca166f48b622dc
    (cherry picked from commit 70f6f8e4c02ed6a8687d2bd714d3fe0b9d04d84a)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/909797
