WARNING: 23 Queue(s) with insufficient members

Bug #2066188 reported by Nobuto Murata
This bug affects 1 person
Affects: OpenStack Snap
Status: Triaged
Importance: Critical
Assigned to: Unassigned

Bug Description

By following the multi-node scenario:
https://microstack.run/docs/multi-node

$ juju status -m openstack rabbitmq
Model Controller Cloud/Region Version SLA Timestamp
openstack sunbeam-controller sunbeam-microk8s/localhost 3.4.2 unsupported 16:30:09Z

SAAS Status Store URL
microceph active local admin/controller.microceph

App Version Status Scale Charm Channel Rev Address Exposed Message
rabbitmq 3.12.1 active 3 rabbitmq-k8s 3.12/stable 33 10.0.123.84 no WARNING: 23 Queue(s) with insufficient members

Unit Workload Agent Address Ports Message
rabbitmq/0* active idle 10.1.32.227 WARNING: 23 Queue(s) with insufficient members
rabbitmq/1 active idle 10.1.193.216
rabbitmq/2 active idle 10.1.186.27

The rabbitmq cluster status looks okay.

root@rabbitmq-0:/# rabbitmqctl cluster_status
Cluster status of node <email address hidden> ...
Basics

Cluster name: <email address hidden>
Total CPU cores available cluster-wide: 48

Disk Nodes

<email address hidden>
<email address hidden>
<email address hidden>

Running Nodes

<email address hidden>
<email address hidden>
<email address hidden>

Versions

<email address hidden>: RabbitMQ 3.12.1 on Erlang 25.2.3
<email address hidden>: RabbitMQ 3.12.1 on Erlang 25.2.3
<email address hidden>: RabbitMQ 3.12.1 on Erlang 25.2.3

CPU Cores

Node: <email address hidden>, available CPU cores: 16
Node: <email address hidden>, available CPU cores: 16
Node: <email address hidden>, available CPU cores: 16

Maintenance status

Node: <email address hidden>, status: not under maintenance
Node: <email address hidden>, status: not under maintenance
Node: <email address hidden>, status: not under maintenance

Alarms

(none)

Network Partitions

(none)

...

Revision history for this message
Nobuto Murata (nobuto) wrote :

https://github.com/openstack-charmers/charm-rabbitmq-k8s/blob/54d248a5741b0c19381e7058fba43c981f1f0952/src/charm.py#L891-L895

https://github.com/openstack-charmers/charm-rabbitmq-k8s/blob/54d248a5741b0c19381e7058fba43c981f1f0952/src/charm.py#L856-L864
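
For context, the linked charm code derives the application status from a per-queue membership check. Below is a rough sketch of that kind of check against the RabbitMQ management API; the endpoint usage, credentials and the threshold of 3 members are assumptions for illustration, not the charm's actual code.

# Illustrative sketch only: count quorum queues whose Raft membership is
# smaller than the expected replica count (assumed here to be 3 nodes).
import requests

API = "http://localhost:15672/api"   # management API, default port
AUTH = ("operator", "secret")        # hypothetical credentials

def queues_with_insufficient_members(min_members: int = 3) -> list[str]:
    queues = requests.get(f"{API}/queues", auth=AUTH, timeout=30).json()
    short = []
    for q in queues:
        if q.get("type") != "quorum":
            continue
        if len(q.get("members") or []) < min_members:
            short.append(f"{q['vhost']}/{q['name']}")
    return short

short = queues_with_insufficient_members()
if short:
    print(f"WARNING: {len(short)} Queue(s) with insufficient members")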

Indeed, those queues are undersized after the Sunbeam resize operation, e.g.:

- arguments:
    x-queue-type: quorum
  auto_delete: false
  consumer_capacity: 0
  consumer_utilisation: 0
  consumers: 0
  durable: true
  effective_policy_definition: {}
  exclusive: false
  garbage_collection:
    fullsweep_after: 65535
    max_heap_size: 0
    min_bin_vheap_size: 46422
    min_heap_size: 233
    minor_gcs: 5
  leader: <email address hidden>
  members:
    - <email address hidden> ########## <- only one member
  memory: 143172
  message_bytes: 0
  message_bytes_dlx: 0
  message_bytes_persistent: 0
  message_bytes_ram: 0
  message_bytes_ready: 0
  message_bytes_unacknowledged: 0
  messages: 0
  messages_details:
    rate: 0.0
  messages_dlx: 0
  messages_persistent: 0
  messages_ram: 0
  messages_ready: 0
  messages_ready_details:
    rate: 0.0
  messages_unacknowledged: 0
  messages_unacknowledged_details:
    rate: 0.0
  name: cinder-volume.cinder-ceph-0@cinder-ceph
  node: <email address hidden>
  online:
    - <email address hidden>
  open_files:
    <email address hidden>: 0
  operator_policy: None
  policy: None
  reductions: 33414
  reductions_details:
    rate: 401.0
  single_active_consumer_tag: None
  state: running
  type: quorum
  vhost: openstack

Not sure why grow_queues_onto_unit is not properly applied in this case.
https://github.com/openstack-charmers/charm-rabbitmq-k8s/blob/54d248a5741b0c19381e7058fba43c981f1f0952/src/charm.py#L402-L425
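
For reference, the effect grow_queues_onto_unit is expected to have can be reproduced by hand with rabbitmq-queues add_member. The sketch below drives that from Python; the helper name, the vhost and the example node name are illustrative, and this is not the charm's implementation.

# Illustrative sketch: add a node as a member of every quorum queue in a
# vhost that does not already include it (what a grow step should achieve).
import json
import subprocess

def grow_queues_onto_node(node: str, vhost: str = "openstack") -> None:
    out = subprocess.run(
        ["rabbitmqctl", "list_queues", "-p", vhost, "--formatter", "json",
         "name", "type", "members"],
        check=True, capture_output=True, text=True,
    ).stdout
    for q in json.loads(out):
        if q.get("type") != "quorum" or node in (q.get("members") or []):
            continue
        subprocess.run(
            ["rabbitmq-queues", "add_member", "--vhost", vhost,
             q["name"], node],
            check=True,
        )

# e.g. grow_queues_onto_node("rabbit@rabbitmq-1.rabbitmq-endpoints")  # node name illustrative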

tags: added: sunbeam-high-availability
Revision history for this message
Nobuto Murata (nobuto) wrote :

Even in the "success" case, the queues do not have sufficient members after peers-relation-changed which is supposed to call grow_queues_onto_unit. It looks like there are some race conditions not well captured in the charm.

$ juju show-status-log -m openstack rabbitmq/0 --days 1

...

21 May 2024 08:48:01Z juju-unit idle
21 May 2024 08:48:04Z juju-unit executing running peers-relation-changed hook for rabbitmq/2
21 May 2024 08:48:11Z juju-unit idle
21 May 2024 08:48:23Z juju-unit executing running peers-relation-changed hook for rabbitmq/1
21 May 2024 08:48:35Z workload active WARNING: 23 Queue(s) with insufficient members
21 May 2024 08:48:45Z workload active WARNING: 5 Queue(s) with insufficient members
21 May 2024 08:48:55Z juju-unit idle
21 May 2024 08:49:03Z juju-unit executing running config-changed hook
21 May 2024 08:49:05Z juju-unit idle
21 May 2024 08:49:15Z juju-unit executing running peers-relation-changed hook for rabbitmq/2
21 May 2024 08:49:27Z juju-unit executing running peers-relation-changed hook for rabbitmq/1
21 May 2024 08:49:39Z juju-unit idle
21 May 2024 12:14:35Z workload active
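
If the race suspected above is that peers-relation-changed fires before the joining unit is actually a running cluster node, the grow step would momentarily have nothing to grow onto. A minimal sketch of a guard for that situation follows; it assumes rabbitmqctl is reachable from the hook, and the function names are illustrative, not the charm's code.

# Illustrative sketch: only attempt to grow queues onto a peer once that
# peer is reported as a running node by cluster_status.
import json
import subprocess

def running_nodes() -> set[str]:
    out = subprocess.run(
        ["rabbitmqctl", "cluster_status", "--formatter", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return set(json.loads(out).get("running_nodes", []))

def peer_ready_for_grow(node: str) -> bool:
    # True once the node has joined and is running, so a grow step
    # (e.g. grow_queues_onto_unit) can safely target it; otherwise the
    # hook should retry on a later event such as update-status.
    return node in running_nodes()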

James Page (james-page)
Changed in snap-openstack:
status: New → Triaged
importance: Undecided → Critical
tags: added: open-2198