Masakari charm doesn't configure transport_url properly

Bug #1950331 reported by Kabanov Oleg
This bug affects 2 people
Affects: OpenStack Masakari Charm
Status: Fix Released
Importance: Medium
Assigned to: Jorge Merlino

Bug Description

Hi,

Masakari charm doesn't configure transport_url properly when RabbitMQ is deployed in HA mode (cluster size greater than 1).

When we have 3 Masakari units in an HA cluster and 3 RabbitMQ units in HA mode, every Masakari unit has only one address in the transport_url of its config (the first unit of the RabbitMQ cluster).
So in the worst case, Masakari breaks after the failure of a single node: the one running the RabbitMQ server it points at. This means the Masakari charm doesn't provide true HA.

I think the problem is located in the masakari.conf template:
https://opendev.org/openstack/charm-masakari/src/branch/master/src/templates/masakari.conf
...
[DEFAULT]
enabled_apis = masakari_api
debug = {{ options.debug }}
auth_strategy = keystone

{% if amqp.password -%}
transport_url = rabbit://masakari:{{ amqp.password }}@{{ amqp.host }}:5672/masakari
{% endif -%}
...
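For comparison, oslo.messaging accepts a transport_url that lists several brokers, separated by commas: rabbit://user:pass@host1:port1,user:pass@host2:port2/vhost. With that form, the driver has other brokers to reconnect to if one goes away. The Python below is a minimal sketch of what the template would need to render in this deployment. It is not the charm's actual fix. The following are assumptions:

- build_transport_url is a hypothetical helper.
- "secret" is a placeholder password.
- The three addresses are the rabbitmq-server units from the juju status output below.

```python
from urllib.parse import quote


def build_transport_url(user, password, hosts, vhost, port=5672):
    """Hypothetical helper: build a rabbit:// transport_url that lists every
    RabbitMQ host, using oslo.messaging's comma-separated multi-host form."""
    if not hosts:
        raise ValueError("at least one RabbitMQ host is required")
    # Percent-encode credentials so characters such as '@', ':' or '/'
    # cannot break the URL structure.
    creds = "%s:%s" % (quote(user, safe=""), quote(password, safe=""))
    # One user:password@host:port entry per broker, joined by commas.
    # (IPv6 literals would need [brackets]; not handled in this sketch.)
    brokers = ",".join("%s@%s:%d" % (creds, host, port) for host in hosts)
    return "rabbit://%s/%s" % (brokers, vhost)


if __name__ == "__main__":
    # Addresses of rabbitmq-server/0, /1 and /2 from `juju status`;
    # "secret" is a placeholder password.
    rabbit_hosts = ["10.35.174.14", "10.35.174.52", "10.35.174.36"]
    print(build_transport_url("masakari", "secret", rabbit_hosts, "masakari"))
```

Run as a script, it prints rabbit://masakari:secret@10.35.174.14:5672,masakari:secret@10.35.174.52:5672,masakari:secret@10.35.174.36:5672/masakari, so every Masakari unit would know about all three brokers.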

_________________________________________________
Example of this Bug from our production environment:

1) We have 3 instances of rabbitmq-server

$ juju status rabbitmq-server
Model      Controller        Cloud/Region  Version  SLA          Timestamp
openstack  foundations-maas  maas_cloud    2.9.16   unsupported  14:20:36Z

SAAS        Status  Store             URL
graylog     active  foundations-maas  admin/lma.graylog-beats
nagios      active  foundations-maas  admin/lma.nagios-monitors
prometheus  active  foundations-maas  admin/lma.prometheus-target

App               Version  Status  Scale  Charm             Store       Channel  Rev  OS      Message
filebeat          5.6.16   active      3  filebeat          charmstore  stable    33  ubuntu  Filebeat ready.
landscape-client           active      3  landscape-client  charmstore  stable    35  ubuntu  System successfully registered (source version/commit 20210317-catzulnr))
logrotated                 active      3  logrotated        charmstore  stable     3  ubuntu  Unit is ready.
nrpe-container             active      3  nrpe              charmstore  stable    75  ubuntu  Ready (source version/commit cs-nrpe-...)
rabbitmq-server   3.8.2    active      3  rabbitmq-server   charmstore  stable   114  ubuntu  Unit is ready and clustered
telegraf                   active      3  telegraf          charmstore  stable    44  ubuntu  Monitoring aodh/0 (source version/commit 26e531a)

Unit                   Workload  Agent  Machine   Public address  Ports          Message
rabbitmq-server/0      active    idle   0/lxd/20  10.35.174.14    5672/tcp       Unit is ready and clustered
  filebeat/50          active    idle             10.35.174.14                   Filebeat ready.
  landscape-client/45  active    idle             10.35.174.14                   System successfully registered (source version/commit 20210317-catzulnr))
  logrotated/13        active    idle             10.35.174.14                   Unit is ready.
  nrpe-container/37    active    idle             10.35.174.14    icmp,5666/tcp  Ready (source version/commit cs-nrpe-...)
  telegraf/28          active    idle             10.35.174.14    9103/tcp       Monitoring rabbitmq-server/0 (source version/commit 26e531a)
rabbitmq-server/1*     active    idle   1/lxd/20  10.35.174.52    5672/tcp       Unit is ready and clustered
  filebeat/49          active    idle             10.35.174.52                   Filebeat ready.
  landscape-client/44  active    idle             10.35.174.52                   System successfully registered (source version/commit 20210317-catzulnr))
  logrotated/33        active    idle             10.35.174.52                   Unit is ready.
  nrpe-container/49    active    idle             10.35.174.52    icmp,5666/tcp  Ready (source version/commit cs-nrpe-...)
  telegraf/29          active    idle             10.35.174.52    9103/tcp       Monitoring rabbitmq-server/1 (source version/commit 26e531a)
rabbitmq-server/2      active    idle   2/lxd/20  10.35.174.36    5672/tcp       Unit is ready and clustered
  filebeat/48          active    idle             10.35.174.36                   Filebeat ready.
  landscape-client/43  active    idle             10.35.174.36                   System successfully registered (source version/commit 20210317-catzulnr))
  logrotated/18        active    idle             10.35.174.36                   Unit is ready.
  nrpe-container/36    active    idle             10.35.174.36    icmp,5666/tcp  Ready (source version/commit cs-nrpe-...)
  telegraf/56          active    idle             10.35.174.36    9103/tcp       Monitoring rabbitmq-server/2 (source version/commit 26e531a)

Machine   State    DNS           Inst id               Series  AZ       Message
0         started  10.35.174.1   u0400s2enthc01        focal   default  Deployed
0/lxd/20  started  10.35.174.14  juju-112649-0-lxd-20  focal   default  Container started
1         started  10.35.174.2   u0400s2enthc02        focal   default  Deployed
1/lxd/20  started  10.35.174.52  juju-112649-1-lxd-20  focal   default  Container started
2         started  10.35.174.3   u0400s2enthc03        focal   default  Deployed
2/lxd/20  started  10.35.174.36  juju-112649-2-lxd-20  focal   default  Container started

2) We have 3 instances of Masakari
$ juju status masakari
Model      Controller        Cloud/Region  Version  SLA          Timestamp
openstack  foundations-maas  maas_cloud    2.9.16   unsupported  14:22:18Z

SAAS        Status  Store             URL
graylog     active  foundations-maas  admin/lma.graylog-beats
nagios      active  foundations-maas  admin/lma.nagios-monitors
prometheus  active  foundations-maas  admin/lma.prometheus-target

App                    Version  Status  Scale  Charm             Store       Channel  Rev  OS      Message
custom-policy-routing           active      3  advanced-routing  charmstore  stable     5  ubuntu  Unit is ready
filebeat               5.6.16   active      3  filebeat          charmstore  stable    33  ubuntu  Filebeat ready.
hacluster-masakari              active      3  hacluster         charmstore  stable    78  ubuntu  Unit is ready and clustered
landscape-client                active      3  landscape-client  charmstore  stable    35  ubuntu  System successfully registered (source version/commit 20210317-catzulnr))
logrotated                      active      3  logrotated        charmstore  stable     3  ubuntu  Unit is ready.
masakari               9.0.0    active      3  masakari          charmstore  stable    13  ubuntu  Unit is ready
masakari-mysql-router  8.0.26   active      3  mysql-router      charmstore  stable    11  ubuntu  Unit is ready
nrpe-container                  active      3  nrpe              charmstore  stable    75  ubuntu  Ready (source version/commit cs-nrpe-...)
telegraf                        active      3  telegraf          charmstore  stable    44  ubuntu  Monitoring aodh/0 (source version/commit 26e531a)

Unit                        Workload  Agent  Machine   Public address  Ports          Message
masakari/0                  active    idle   0/lxd/12  10.35.174.26    15868/tcp      Unit is ready
  custom-policy-routing/11  active    idle             10.35.174.26                   Unit is ready
  filebeat/62               active    idle             10.35.174.26                   Filebeat ready.
  hacluster-masakari/0      active    idle             10.35.174.26                   Unit is ready and clustered
  landscape-client/62       active    idle             10.35.174.26                   System successfully registered (source version/commit 20210317-catzulnr))
  logrotated/23             active    idle             10.35.174.26                   Unit is ready.
  masakari-mysql-router/0   active    idle             10.35.174.26                   Unit is ready
  nrpe-container/50         active    idle             10.35.174.26    icmp,5666/tcp  Ready (source version/commit cs-nrpe-...)
  telegraf/39               active    idle             10.35.174.26    9103/tcp       Monitoring masakari/0 (source version/commit 26e531a)
masakari/1*                 active    idle   1/lxd/11  10.35.174.63    15868/tcp      Unit is ready
  custom-policy-routing/30  active    idle             10.35.174.63                   Unit is ready
  filebeat/63               active    idle             10.35.174.63                   Filebeat ready.
  hacluster-masakari/1*     active    idle             10.35.174.63                   Unit is ready and clustered
  landscape-client/63       active    idle             10.35.174.63                   System successfully registered (source version/commit 20210317-catzulnr))
  logrotated/43             active    idle             10.35.174.63                   Unit is ready.
  masakari-mysql-router/1*  active    idle             10.35.174.63                   Unit is ready
  nrpe-container/51         active    idle             10.35.174.63    icmp,5666/tcp  Ready (source version/commit cs-nrpe-...)
  telegraf/40               active    idle             10.35.174.63    9103/tcp       Monitoring masakari/1 (source version/commit 26e531a)
masakari/2                  active    idle   2/lxd/11  10.35.174.45    15868/tcp      Unit is ready
  custom-policy-routing/37  active    idle             10.35.174.45                   Unit is ready
  filebeat/64               active    idle             10.35.174.45                   Filebeat ready.
  hacluster-masakari/2      active    idle             10.35.174.45                   Unit is ready and clustered
  landscape-client/61       active    idle             10.35.174.45                   System successfully registered (source version/commit 20210317-catzulnr))
  logrotated/50             active    idle             10.35.174.45                   Unit is ready.
  masakari-mysql-router/2   active    idle             10.35.174.45                   Unit is ready
  nrpe-container/52         active    idle             10.35.174.45    icmp,5666/tcp  Ready (source version/commit cs-nrpe-...)
  telegraf/38               active    idle             10.35.174.45    9103/tcp       Monitoring masakari/2 (source version/commit 26e531a)

Machine   State    DNS           Inst id               Series  AZ       Message
0         started  10.35.174.1   u0400s2enthc01        focal   default  Deployed
0/lxd/12  started  10.35.174.26  juju-112649-0-lxd-12  focal   default  Container started
1         started  10.35.174.2   u0400s2enthc02        focal   default  Deployed
1/lxd/11  started  10.35.174.63  juju-112649-1-lxd-11  focal   default  Container started
2         started  10.35.174.3   u0400s2enthc03        focal   default  Deployed
2/lxd/11  started  10.35.174.45  juju-112649-2-lxd-11  focal   default  Container started

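3) If node 0 fails, Masakari won't be able to communicate with the RabbitMQ cluster, because it doesn't know about the other RabbitMQ endpoints. Every Masakari unit's transport_url points only at rabbitmq-server/0 (10.35.174.14), which runs on machine 0: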

$ juju ssh masakari/0 sudo cat /etc/masakari/masakari.conf | grep transport_url
transport_url = rabbit://masakari:24knKMhCg39**********************@10.35.174.14:5672/masakari

$ juju ssh masakari/1 sudo cat /etc/masakari/masakari.conf | grep transport_url
transport_url = rabbit://masakari:24knKMhCg39**********************@10.35.174.14:5672/masakari

$ juju ssh masakari/2 sudo cat /etc/masakari/masakari.conf | grep transport_url
transport_url = rabbit://masakari:24knKMhCg39**********************@10.35.174.14:5672/masakari
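To check a deployment for this problem, one can count the broker hosts in each unit's transport_url. The following is a minimal Python sketch. It assumes oslo.messaging's comma-separated multi-host form (rabbit://user:pass@host1:port,user:pass@host2:port/vhost) and credentials with no unescaped '/' or ','. broker_hosts is a hypothetical helper, not part of the charm.

```python
def broker_hosts(transport_url):
    """Hypothetical helper: return "host:port" for every broker listed in a
    rabbit:// transport_url, including the comma-separated multi-host form."""
    _, sep, rest = transport_url.partition("://")
    if not sep:
        raise ValueError("not a URL: %r" % transport_url)
    # The broker list runs up to the first '/', which starts the vhost.
    netloc = rest.split("/", 1)[0]
    # Each entry may carry its own "user:password@" prefix; keep only host:port.
    return [entry.rsplit("@", 1)[-1] for entry in netloc.split(",")]


if __name__ == "__main__":
    # A line as printed by the grep above (password redacted).
    line = ("transport_url = rabbit://masakari:24knKMhCg39"
            "**********************@10.35.174.14:5672/masakari")
    hosts = broker_hosts(line.split("=", 1)[1].strip())
    print(hosts)  # ['10.35.174.14:5672']
    if len(hosts) < 2:
        print("only one RabbitMQ endpoint: no broker failover possible")
```

For all three units above, the result is the single entry 10.35.174.14:5672, which is the single point of failure this report describes.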

Arif Ali (arif-ali)
tags: added: sts
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-masakari (master)
Changed in charm-masakari:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-masakari (master)

Reviewed: https://review.opendev.org/c/openstack/charm-masakari/+/819650
Committed: https://opendev.org/openstack/charm-masakari/commit/4fe9f39585a7d034a2dc724165e5c4239e0aabd2
Submitter: "Zuul (22348)"
Branch: master

commit 4fe9f39585a7d034a2dc724165e5c4239e0aabd2
Author: Jorge Merlino <email address hidden>
Date: Mon Nov 29 11:02:54 2021 -0300

    Fix transport_url with multiple rabbit servers

    When RabbitMQ is deployed in HA mode the transport_url parameter of
    masakari must reference all RabbitMQ instances. Previously it
    referenced only one. Now uses the configuration from the base OpenStack
    layer

    Closes-Bug: #1950331
    Change-Id: Iea9d4f2484b82c22939a258e7f9faa3030a9bd1e

Changed in charm-masakari:
status: In Progress → Fix Committed
Felipe Reyes (freyes)
Changed in charm-masakari:
assignee: nobody → Jorge Merlino (jorge-merlino)
importance: Undecided → Medium
milestone: none → 22.04
Nobuto Murata (nobuto) wrote :

Great, can we get this backported?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-masakari (stable/21.10)

Fix proposed to branch: stable/21.10
Review: https://review.opendev.org/c/openstack/charm-masakari/+/824952

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-masakari (stable/21.10)

Change abandoned by "Jorge Merlino <email address hidden>" on branch: stable/21.10
Review: https://review.opendev.org/c/openstack/charm-masakari/+/824952
Reason: Moving to new charmhub branches

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-masakari (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/charm-masakari/+/837061

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-masakari (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/charm-masakari/+/837064

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-masakari (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/charm-masakari/+/837061
Committed: https://opendev.org/openstack/charm-masakari/commit/c91304997571893b8133faf02611b7e5b049a2c0
Submitter: "Zuul (22348)"
Branch: stable/xena

commit c91304997571893b8133faf02611b7e5b049a2c0
Author: Jorge Merlino <email address hidden>
Date: Mon Nov 29 11:02:54 2021 -0300

    Fix transport_url with multiple rabbit servers

    When RabbitMQ is deployed in HA mode the transport_url parameter of
    masakari must reference all RabbitMQ instances. Previously it
    referenced only one. Now uses the configuration from the base OpenStack
    layer

    Closes-Bug: #1950331
    Change-Id: Iea9d4f2484b82c22939a258e7f9faa3030a9bd1e
    (cherry picked from commit 4fe9f39585a7d034a2dc724165e5c4239e0aabd2)

tags: added: in-stable-xena
Changed in charm-masakari:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-masakari (stable/wallaby)

Change abandoned by "Jorge Merlino <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/charm-masakari/+/837064

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-masakari (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/charm-masakari/+/837064
Committed: https://opendev.org/openstack/charm-masakari/commit/cb7f5bd1f64992182c78f5298b3985a572c69deb
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit cb7f5bd1f64992182c78f5298b3985a572c69deb
Author: Jorge Merlino <email address hidden>
Date: Mon Nov 29 11:02:54 2021 -0300

    Fix transport_url with multiple rabbit servers

    When RabbitMQ is deployed in HA mode the transport_url parameter of
    masakari must reference all RabbitMQ instances. Previously it
    referenced only one. Now uses the configuration from the base OpenStack
    layer

    Closes-Bug: #1950331
    Change-Id: Iea9d4f2484b82c22939a258e7f9faa3030a9bd1e
    (cherry picked from commit 4fe9f39585a7d034a2dc724165e5c4239e0aabd2)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-masakari (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/charm-masakari/+/851050

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-masakari (stable/ussuri)

Change abandoned by "Edward Hope-Morley <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/charm-masakari/+/851050
Reason: This patch has not been updated in over 6 months so marking as abandoned. If it is still needed please update the patch and re-submit for review.
