RabbitMQ fails to start on servers with more than 42 CPUs

Bug #1768986 reported by Vern Hart
This bug affects 2 people
Affects                          Status         Importance  Assigned to  Milestone
OpenStack RabbitMQ Server Charm  Fix Committed  Medium      Unassigned
  Focal                          Fix Released   Undecided   Unassigned
  Jammy                          Fix Released   Undecided   Unassigned

Bug Description

Normally we run rabbitmq-server in a container, but I was testing RabbitMQ on a few bare-metal servers and all of them were coming up in an 'error' state:

  Unit                Workload  Agent      Machine  Public address  Ports          Message
  rabbitmq-server/0*  error     idle       0        10.110.244.97                  hook failed: "config-changed"
  rabbitmq-server/1   waiting   executing  1        10.110.244.142  5672/tcp       Waiting for all 3 peers to complete the cluster.
    filebeat/1        waiting   idle                10.110.244.142                 Waiting for: elasticsearch, logstash or kafka.
    nrpe/1            active    idle                10.110.244.142  icmp,5666/tcp  ready
  rabbitmq-server/2   error     idle       2        10.110.244.143  5672/tcp       hook failed: "cluster-relation-joined"
    filebeat/3        waiting   idle                10.110.244.143                 Waiting for: elasticsearch, logstash or kafka.
    nrpe/3            active    idle                10.110.244.143  icmp,5666/tcp  ready

After connecting to one of the units and trying to start RabbitMQ manually, I saw this error:

  # /usr/sbin/rabbitmq-server
  bad number of async threads 1920
  Usage: beam.smp [flags] [ -- [init_args] ]
  The flags are:

  -a size        suggested stack size in kilo words for threads
                 in the async-thread pool, valid range is [16-8192]
  -A number      set number of threads in async thread pool,
                 valid range is [0-1024]
  [...]

The usage text above states that 1024 is the maximum. A quick search reveals where that 1920 is being set:

  # cat /etc/rabbitmq/rabbitmq-env.conf
  ###############################################################################
  # [ WARNING ]
  # Configuration file maintained by Juju. Local changes may be overwritten.
  ###############################################################################

  RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS='+A 1920'
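To check an existing unit for this condition, one option is to pull the `+A` value out of `rabbitmq-env.conf` and compare it with the `[0-1024]` range that `beam.smp` prints. This is a minimal sketch: `async_threads_from_env` and `is_valid_async_threads` are hypothetical helpers (not part of the charm), and the parser assumes the single-quoted `RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS='...'` form shown above, with `+A` and its value as separate tokens.

```python
import re
from typing import Optional

# Upper bound for +A taken from beam.smp's usage text ("valid range is [0-1024]").
MAX_ASYNC_THREADS = 1024

# Matches an uncommented KEY='...' line; commented-out lines ("#...") are skipped.
_ERL_ARGS_RE = re.compile(r"\s*RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS='([^']*)'")


def async_threads_from_env(text: str) -> Optional[int]:
    """Return the +A value from RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS, or None if absent."""
    for line in text.splitlines():
        m = _ERL_ARGS_RE.match(line)
        if not m:
            continue
        args = m.group(1).split()
        for i, arg in enumerate(args):
            # Assumes the flag and its value are separate tokens, e.g. "+A 1920".
            if arg == "+A" and i + 1 < len(args):
                return int(args[i + 1])
    return None


def is_valid_async_threads(n: int) -> bool:
    """True if n is inside the range beam.smp accepts for +A."""
    return 0 <= n <= MAX_ASYNC_THREADS


if __name__ == "__main__":
    sample = "RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS='+A 1920'\n"
    n = async_threads_from_env(sample)
    print(n, is_valid_async_threads(n))  # 1920 False
```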

It turns out the charm derives that number from a default multiplier (24) times the number of CPUs on the server. In this case:

  # grep processor /proc/cpuinfo | wc -l
  80
  # echo $((80*24))
  1920
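The same arithmetic explains the "more than 42 CPUs" in the title: with the default multiplier of 24, the product first exceeds the 1024 limit at 43 CPUs (42 × 24 = 1008, 43 × 24 = 1032). A small Python sketch of the uncapped calculation; the names `uncapped_threads` and `max_safe_cpus` are illustrative, not the charm's.

```python
DEFAULT_MULTIPLIER = 24   # default per-CPU multiplier described above
MAX_ASYNC_THREADS = 1024  # beam.smp's upper bound for +A


def uncapped_threads(cpus: int, multiplier: int = DEFAULT_MULTIPLIER) -> int:
    """Pre-fix behaviour as described in this report: multiplier times CPU count."""
    return cpus * multiplier


# Largest CPU count whose product still fits within the limit.
max_safe_cpus = MAX_ASYNC_THREADS // DEFAULT_MULTIPLIER

print(uncapped_threads(80))  # 1920, as on the servers above
print(max_safe_cpus)         # 42
print(uncapped_threads(42))  # 1008, accepted
print(uncapped_threads(43))  # 1032, rejected by beam.smp
```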

So, although this seems rare, the charm should cap the size of the async thread pool at 1024.

Vern Hart (vern)
Changed in fuel:
assignee: nobody → Vern Hart (vhart)
Vern Hart (vern)
affects: fuel → charm-rabbitmq-server
Changed in charm-rabbitmq-server:
status: New → In Progress
Frode Nordahl (fnordahl) wrote :

It seems OpenStack Infra did not automatically post the review link, so for reference: https://review.openstack.org/#/c/566173/

Changed in charm-rabbitmq-server:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-rabbitmq-server (master)

Change abandoned by Frode Nordahl (<email address hidden>) on branch: master
Review: https://review.openstack.org/566173
Reason: Marking this review as abandoned due to lack of activity in the past 6 months.
Feel free to restore it again if you want to pick up and continue the work.

Vern Hart (vern)
Changed in charm-rabbitmq-server:
status: In Progress → Invalid
assignee: Vern Hart (vern) → nobody
Felipe Reyes (freyes)
Changed in charm-rabbitmq-server:
status: Invalid → Triaged
Changed in charm-rabbitmq-server:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-rabbitmq-server (master)

Reviewed: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/566173
Committed: https://opendev.org/openstack/charm-rabbitmq-server/commit/3c1c05ee598b2fc85ddf0f971528dde433312bff
Submitter: "Zuul (22348)"
Branch: master

commit 3c1c05ee598b2fc85ddf0f971528dde433312bff
Author: Vern Hart <email address hidden>
Date: Thu May 3 21:46:22 2018 +0000

    Enforce a maximum of 1024 async threads.

    The beam.smp process won't start if more than 1024 are configured, the
    charm could make this by default on large systems (e.g. more than 42
    CPUs). This change makes RabbitMQEnvContext.calculate_threads() never
    return more than 1024 (MAX_NUM_THREADS).

    Change-Id: I92879445210bac6ee7d96a704cdf428ca738e3b6
    Closes-Bug: #1768986
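The commit message describes the fix as making `RabbitMQEnvContext.calculate_threads()` never return more than `MAX_NUM_THREADS` (1024). The following is only a minimal standalone sketch of that idea, not the charm's actual code; the function name `calculate_threads_capped` and its `cpus` and `multiplier` parameters are assumptions for illustration.

```python
import os
from typing import Optional

MAX_NUM_THREADS = 1024   # upper bound for +A accepted by beam.smp
DEFAULT_MULTIPLIER = 24  # per-CPU multiplier described in the bug report


def calculate_threads_capped(cpus: Optional[int] = None,
                             multiplier: int = DEFAULT_MULTIPLIER) -> int:
    """Hypothetical: multiplier times CPU count, never more than MAX_NUM_THREADS."""
    if cpus is None:
        # os.cpu_count() can return None, so fall back to 1 CPU in that case.
        cpus = os.cpu_count() or 1
    return min(cpus * multiplier, MAX_NUM_THREADS)


print(calculate_threads_capped(8))   # 192
print(calculate_threads_capped(80))  # 1024 instead of 1920
```

Using `min()` keeps the per-CPU scaling unchanged on smaller machines and only alters the result on hosts where it would otherwise exceed the limit.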

Changed in charm-rabbitmq-server:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-rabbitmq-server (stable/jammy)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-rabbitmq-server (stable/focal)
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-rabbitmq-server (stable/jammy)

Reviewed: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/880172
Committed: https://opendev.org/openstack/charm-rabbitmq-server/commit/552eb3a5d212d6121a894bd36b5f58870e18b92d
Submitter: "Zuul (22348)"
Branch: stable/jammy

commit 552eb3a5d212d6121a894bd36b5f58870e18b92d
Author: Vern Hart <email address hidden>
Date: Thu May 3 21:46:22 2018 +0000

    Enforce a maximum of 1024 async threads.

    The beam.smp process won't start if more than 1024 are configured, the
    charm could make this by default on large systems (e.g. more than 42
    CPUs). This change makes RabbitMQEnvContext.calculate_threads() never
    return more than 1024 (MAX_NUM_THREADS).

    Change-Id: I92879445210bac6ee7d96a704cdf428ca738e3b6
    Closes-Bug: #1768986
    (cherry picked from commit 3c1c05ee598b2fc85ddf0f971528dde433312bff)

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-rabbitmq-server (stable/focal)

Reviewed: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/880215
Committed: https://opendev.org/openstack/charm-rabbitmq-server/commit/7d72a74b646d662bd6866063b5a74e5f910f5ad9
Submitter: "Zuul (22348)"
Branch: stable/focal

commit 7d72a74b646d662bd6866063b5a74e5f910f5ad9
Author: Vern Hart <email address hidden>
Date: Thu May 3 21:46:22 2018 +0000

    Enforce a maximum of 1024 async threads.

    The beam.smp process won't start if more than 1024 are configured, the
    charm could make this by default on large systems (e.g. more than 42
    CPUs). This change makes RabbitMQEnvContext.calculate_threads() never
    return more than 1024 (MAX_NUM_THREADS).

    Resolved Conflicts:
            tox.ini
            unit_tests/test_rabbitmq_server_relations.py

    Change-Id: I92879445210bac6ee7d96a704cdf428ca738e3b6
    Closes-Bug: #1768986
    (cherry picked from commit 3c1c05ee598b2fc85ddf0f971528dde433312bff)
