Masakari charm doesn't configure transport_url properly
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Masakari Charm | Fix Released | Medium | Jorge Merlino | |
Bug Description
Hi,
The Masakari charm doesn't configure transport_url properly when RabbitMQ is deployed in HA mode (cluster size greater than 1).
When we have 3 instances of Masakari in an HA cluster and 3 instances of RabbitMQ in HA mode, the config of every Masakari instance contains only 1 transport_url address (the first instance of the RabbitMQ cluster).
So in the worst case Masakari will break after the failure of the single node running that RabbitMQ server, which means the Masakari charm doesn't provide "true" HA.
I think the problem is located in the masakari.conf template:
https:/
...
[DEFAULT]
enabled_apis = masakari_api
debug = {{ options.debug }}
auth_strategy = keystone
{% if amqp.password -%}
transport_url = rabbit:
{% endif -%}
...
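For reference, oslo.messaging accepts a transport URL that lists several brokers as comma-separated `user:pass@host:port` entries before the virtual host. The following is only a minimal sketch of how such a value could be assembled; `build_transport_url` is a hypothetical helper, not the charm's actual code, and the credentials and vhost in the usage line are placeholders.

```python
from urllib.parse import quote


def build_transport_url(hosts, username, password, vhost, port=5672):
    """Build a rabbit:// transport URL that lists every cluster member.

    Hypothetical helper for illustration; not taken from the charm.
    """
    if not hosts:
        raise ValueError("at least one RabbitMQ host is required")
    # Percent-encode credentials so characters such as '@' or '/'
    # cannot break the URL structure.
    creds = f"{quote(username, safe='')}:{quote(password, safe='')}"
    # Each cluster member gets its own user:pass@host:port entry.
    netlocs = ",".join(f"{creds}@{host}:{port}" for host in hosts)
    return f"rabbit://{netlocs}/{quote(vhost, safe='')}"


# Placeholder credentials and vhost; the addresses are the three
# rabbitmq-server units from this report.
print(build_transport_url(
    ["10.35.174.14", "10.35.174.52", "10.35.174.36"],
    "masakari", "example-password", "openstack",
))
```

With a value of this shape, the client can fail over to another broker when the first one becomes unreachable.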
_______
Example of this Bug from our production environment:
1) We have 3 instances of rabbitmq-server
$ juju status rabbitmq-server
Model Controller Cloud/Region Version SLA Timestamp
openstack foundations-maas maas_cloud 2.9.16 unsupported 14:20:36Z
SAAS Status Store URL
graylog active foundations-maas admin/lma.
nagios active foundations-maas admin/lma.
prometheus active foundations-maas admin/lma.
App Version Status Scale Charm Store Channel Rev OS Message
filebeat 5.6.16 active 3 filebeat charmstore stable 33 ubuntu Filebeat ready.
landscape-client active 3 landscape-client charmstore stable 35 ubuntu System successfully registered (source version/commit 20210317-catzulnr))
logrotated active 3 logrotated charmstore stable 3 ubuntu Unit is ready.
nrpe-container active 3 nrpe charmstore stable 75 ubuntu Ready (source version/commit cs-nrpe-...)
rabbitmq-server 3.8.2 active 3 rabbitmq-server charmstore stable 114 ubuntu Unit is ready and clustered
telegraf active 3 telegraf charmstore stable 44 ubuntu Monitoring aodh/0 (source version/commit 26e531a)
Unit Workload Agent Machine Public address Ports Message
rabbitmq-server/0 active idle 0/lxd/20 10.35.174.14 5672/tcp Unit is ready and clustered
filebeat/50 active idle 10.35.174.14 Filebeat ready.
landscape-
logrotated/13 active idle 10.35.174.14 Unit is ready.
nrpe-container/37 active idle 10.35.174.14 icmp,5666/tcp Ready (source version/commit cs-nrpe-...)
telegraf/28 active idle 10.35.174.14 9103/tcp Monitoring rabbitmq-server/0 (source version/commit 26e531a)
rabbitmq-server/1* active idle 1/lxd/20 10.35.174.52 5672/tcp Unit is ready and clustered
filebeat/49 active idle 10.35.174.52 Filebeat ready.
landscape-
logrotated/33 active idle 10.35.174.52 Unit is ready.
nrpe-container/49 active idle 10.35.174.52 icmp,5666/tcp Ready (source version/commit cs-nrpe-...)
telegraf/29 active idle 10.35.174.52 9103/tcp Monitoring rabbitmq-server/1 (source version/commit 26e531a)
rabbitmq-server/2 active idle 2/lxd/20 10.35.174.36 5672/tcp Unit is ready and clustered
filebeat/48 active idle 10.35.174.36 Filebeat ready.
landscape-
logrotated/18 active idle 10.35.174.36 Unit is ready.
nrpe-container/36 active idle 10.35.174.36 icmp,5666/tcp Ready (source version/commit cs-nrpe-...)
telegraf/56 active idle 10.35.174.36 9103/tcp Monitoring rabbitmq-server/2 (source version/commit 26e531a)
Machine State DNS Inst id Series AZ Message
0 started 10.35.174.1 u0400s2enthc01 focal default Deployed
0/lxd/20 started 10.35.174.14 juju-112649-
1 started 10.35.174.2 u0400s2enthc02 focal default Deployed
1/lxd/20 started 10.35.174.52 juju-112649-
2 started 10.35.174.3 u0400s2enthc03 focal default Deployed
2/lxd/20 started 10.35.174.36 juju-112649-
2) We have 3 instances of Masakari
$ juju status masakari
Model Controller Cloud/Region Version SLA Timestamp
openstack foundations-maas maas_cloud 2.9.16 unsupported 14:22:18Z
SAAS Status Store URL
graylog active foundations-maas admin/lma.
nagios active foundations-maas admin/lma.
prometheus active foundations-maas admin/lma.
App Version Status Scale Charm Store Channel Rev OS Message
custom-
filebeat 5.6.16 active 3 filebeat charmstore stable 33 ubuntu Filebeat ready.
hacluster-masakari active 3 hacluster charmstore stable 78 ubuntu Unit is ready and clustered
landscape-client active 3 landscape-client charmstore stable 35 ubuntu System successfully registered (source version/commit 20210317-catzulnr))
logrotated active 3 logrotated charmstore stable 3 ubuntu Unit is ready.
masakari 9.0.0 active 3 masakari charmstore stable 13 ubuntu Unit is ready
masakari-
nrpe-container active 3 nrpe charmstore stable 75 ubuntu Ready (source version/commit cs-nrpe-...)
telegraf active 3 telegraf charmstore stable 44 ubuntu Monitoring aodh/0 (source version/commit 26e531a)
Unit Workload Agent Machine Public address Ports Message
masakari/0 active idle 0/lxd/12 10.35.174.26 15868/tcp Unit is ready
custom-
filebeat/62 active idle 10.35.174.26 Filebeat ready.
hacluster-
landscape-
logrotated/23 active idle 10.35.174.26 Unit is ready.
masakari-
nrpe-container/50 active idle 10.35.174.26 icmp,5666/tcp Ready (source version/commit cs-nrpe-...)
telegraf/39 active idle 10.35.174.26 9103/tcp Monitoring masakari/0 (source version/commit 26e531a)
masakari/1* active idle 1/lxd/11 10.35.174.63 15868/tcp Unit is ready
custom-
filebeat/63 active idle 10.35.174.63 Filebeat ready.
hacluster-
landscape-
logrotated/43 active idle 10.35.174.63 Unit is ready.
masakari-
nrpe-container/51 active idle 10.35.174.63 icmp,5666/tcp Ready (source version/commit cs-nrpe-...)
telegraf/40 active idle 10.35.174.63 9103/tcp Monitoring masakari/1 (source version/commit 26e531a)
masakari/2 active idle 2/lxd/11 10.35.174.45 15868/tcp Unit is ready
custom-
filebeat/64 active idle 10.35.174.45 Filebeat ready.
hacluster-
landscape-
logrotated/50 active idle 10.35.174.45 Unit is ready.
masakari-
nrpe-container/52 active idle 10.35.174.45 icmp,5666/tcp Ready (source version/commit cs-nrpe-...)
telegraf/38 active idle 10.35.174.45 9103/tcp Monitoring masakari/2 (source version/commit 26e531a)
Machine State DNS Inst id Series AZ Message
0 started 10.35.174.1 u0400s2enthc01 focal default Deployed
0/lxd/12 started 10.35.174.26 juju-112649-
1 started 10.35.174.2 u0400s2enthc02 focal default Deployed
1/lxd/11 started 10.35.174.63 juju-112649-
2 started 10.35.174.3 u0400s2enthc03 focal default Deployed
2/lxd/11 started 10.35.174.45 juju-112649-
3) If node 0 fails, Masakari won't be able to communicate with the RabbitMQ cluster, because it doesn't know about the other RabbitMQ endpoints.
$ juju ssh masakari/0 sudo cat /etc/masakari/
transport_url = rabbit:
$ juju ssh masakari/1 sudo cat /etc/masakari/
transport_url = rabbit:
$ juju ssh masakari/2 sudo cat /etc/masakari/
transport_url = rabbit:
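A quick way to confirm the symptom is to count the brokers listed in each unit's transport_url. The sketch below is a hypothetical checker, not part of the charm or of oslo.messaging; it assumes credentials are percent-encoded, so that `/` and `,` only appear as URL separators.

```python
def rabbit_hosts(transport_url):
    """Return the host:port entries listed in a rabbit:// transport URL.

    Hypothetical helper for illustration only.
    """
    scheme, sep, rest = transport_url.partition("://")
    if not sep or scheme != "rabbit":
        raise ValueError("not a rabbit:// transport URL")
    # Drop the virtual host and anything after it.
    netlocs = rest.split("/", 1)[0]
    hosts = []
    for entry in netlocs.split(","):
        # Credentials, if present, precede the last '@' in each entry.
        hosts.append(entry.rpartition("@")[2])
    return hosts


def is_ha_configured(transport_url):
    """True when more than one broker is listed."""
    return len(rabbit_hosts(transport_url)) > 1
```

For a three-node RabbitMQ cluster, `is_ha_configured` would be expected to return True for the value found in each unit's masakari.conf; with the behaviour described in this report it returns False.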
tags: added: sts
Changed in charm-masakari:
assignee: nobody → Jorge Merlino (jorge-merlino)
importance: Undecided → Medium
milestone: none → 22.04
Changed in charm-masakari:
status: Fix Committed → Fix Released
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/charm-masakari/+/819650