juju show-unit is clustered, but relation-get returns null

Bug #1978566 reported by Natalia Litvinova
This bug affects 2 people
Affects                          Status   Importance  Assigned to  Milestone
Canonical Juju                   Expired  Undecided   Unassigned
OpenStack RabbitMQ Server Charm  Expired  Undecided   Unassigned

Bug Description

Hi, we're seeing RabbitMQ relations failing to complete on the current deployment:

$ juju show-unit rabbitmq-server/0 | grep rabbitmq -C 50

    related-units:
      rabbitmq-server/1:
        in-scope: true
        data:
          clustered: juju-f152b1-7-lxd-7
          coordinator: '{}'
          egress-subnets: 192.168.108.82/32
          hostname: juju-f152b1-7-lxd-7
          ingress-address: 192.168.108.82
          private-address: 192.168.108.82
          timestamp: "1655148142.2371707"
      rabbitmq-server/2:
        in-scope: true
        data:
          clustered: juju-f152b1-8-lxd-7
          cookie: VBRPFEYFNUGBRFZOBHMS
          coordinator: '{}'
          egress-subnets: 192.168.108.102/32
          hostname: juju-f152b1-8-lxd-7
          ingress-address: 192.168.108.102
          private-address: 192.168.108.102
          timestamp: "1655206877.899645"

juju show-unit reports the units as clustered, but relation-get run on the leader unit returns null for rabbitmq-server/2:
$ relation-get --format=json -r 39 clustered rabbitmq-server/2
null

Other unit outputs:
$ juju show-unit rabbitmq-server/2 | grep rabbitmq -C 50
    related-units:
      rabbitmq-server/0:
        in-scope: true
        data:
          clustered: juju-f152b1-6-lxd-7
          cookie: VBRPFEYFNUGBRFZOBHMS
          coordinator: '{}'
          egress-subnets: 192.168.108.97/32
          hostname: juju-f152b1-6-lxd-7
          ingress-address: 192.168.108.97
          private-address: 192.168.108.97
          timestamp: "1655148231.067994"
      rabbitmq-server/1:
        in-scope: true
        data:
          clustered: juju-f152b1-7-lxd-7
          coordinator: '{}'
          egress-subnets: 192.168.108.82/32
          hostname: juju-f152b1-7-lxd-7
          ingress-address: 192.168.108.82
          private-address: 192.168.108.82
          timestamp: "1655148142.2371707"

This is how it looks from the leader unit for all 3 units:
root@juju-f152b1-7-lxd-7:/var/lib/juju/agents/unit-rabbitmq-server-1/charm# relation-get --format=json -r 39 clustered rabbitmq-server/2
null
root@juju-f152b1-7-lxd-7:/var/lib/juju/agents/unit-rabbitmq-server-1/charm# relation-get --format=json -r 39 clustered rabbitmq-server/0
"juju-f152b1-6-lxd-7"
root@juju-f152b1-7-lxd-7:/var/lib/juju/agents/unit-rabbitmq-server-1/charm# relation-get --format=json -r 39 clustered rabbitmq-server/1
"juju-f152b1-7-lxd-7"

Workaround: reboot the node
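
The mismatch between the two views can be checked mechanically. The following is a minimal sketch, not part of Juju or the charm: the function name `find_relation_mismatches` and both input dictionaries are assumptions made for illustration, and the values are copied by hand from the `juju show-unit` and `relation-get --format=json` outputs in this report.

```python
import json


def find_relation_mismatches(show_unit_data, relation_get_output, key="clustered"):
    """Return units whose `key` is set in show-unit data but null via relation-get.

    show_unit_data: unit name -> relation data dict, as shown by `juju show-unit`.
    relation_get_output: unit name -> raw string printed by
        `relation-get --format=json -r <id> <key> <unit>`.
    """
    mismatches = []
    for unit, data in sorted(show_unit_data.items()):
        expected = data.get(key)
        raw = relation_get_output.get(unit)
        # relation-get prints the JSON literal null when it sees no value.
        observed = json.loads(raw) if raw is not None else None
        if expected is not None and observed is None:
            mismatches.append(unit)
    return mismatches


# Values copied from the show-unit outputs in this report.
show_unit_data = {
    "rabbitmq-server/0": {"clustered": "juju-f152b1-6-lxd-7"},
    "rabbitmq-server/1": {"clustered": "juju-f152b1-7-lxd-7"},
    "rabbitmq-server/2": {"clustered": "juju-f152b1-8-lxd-7"},
}

# Raw relation-get output as seen from the leader unit.
relation_get_output = {
    "rabbitmq-server/0": '"juju-f152b1-6-lxd-7"',
    "rabbitmq-server/1": '"juju-f152b1-7-lxd-7"',
    "rabbitmq-server/2": "null",
}

print(find_relation_mismatches(show_unit_data, relation_get_output))
```

With the data from this report, only rabbitmq-server/2 is flagged, which matches what we observe by hand.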

Thanks,
Natalia

Natalia Litvinova (natalytvinova) wrote:

Subscribing field-high as control-plane is not working as expected

Alex Kavanagh (ajkavanagh) wrote:

Hi Natalia

Please could you add some more information to the bug so that it can be triaged effectively: https://docs.openstack.org/charm-guide/latest/community/software-bug.html#essential-information

I see that you've supplied some of it in the "log files" tar file, but it would be useful to know the MAAS, Juju, and bundle information to get a better picture of the system.

We need the additional info as there are so many moving parts in an OpenStack deployment that it would be difficult to track this down without it. Thanks.

Changed in charm-rabbitmq-server:
status: New → Incomplete

Peter Jose De Sousa (pjds) wrote:

Hey @ajkavanagh, please find the info below.

Overview of your environment:

9-node OpenStack, hyperconverged

3 MAAS nodes

Juju version: 2.9.31-ubuntu-amd64
OpenStack: Yoga/Focal
MAAS: 3.1.0-10901-g.f1f8f1505

Identified areas:

The clustering method https://github.com/openstack/charm-rabbitmq-server/blob/eac35c1d9907cde3da007a4478d244168e5bc048/hooks/rabbit_utils.py#L1678 will return false even though the units appear clustered.
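
To illustrate why a single null value is enough to block the relation, here is a minimal sketch of a peer-wide check of this general shape. It is not the charm's actual code: `all_peers_clustered` and both dictionaries are hypothetical, with values taken from the outputs in the bug description.

```python
def all_peers_clustered(peer_data, key="clustered"):
    """Hypothetical check: True only if every peer has a non-empty `key`."""
    return all(data.get(key) for data in peer_data.values())


# Peer data as relation-get reports it on the leader (rabbitmq-server/1):
# the value for rabbitmq-server/2 reads as null.
leader_view = {
    "rabbitmq-server/0": {"clustered": "juju-f152b1-6-lxd-7"},
    "rabbitmq-server/2": {"clustered": None},
}

# The same peers as juju show-unit reports them.
show_unit_view = {
    "rabbitmq-server/0": {"clustered": "juju-f152b1-6-lxd-7"},
    "rabbitmq-server/2": {"clustered": "juju-f152b1-8-lxd-7"},
}

print(all_peers_clustered(leader_view))     # False
print(all_peers_clustered(show_unit_view))  # True
```

Any check that requires every peer to have the key set would fail on the leader's view while passing on the show-unit view, which would be consistent with the behaviour we see.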

Steps to reproduce:

Deploy OpenStack with RabbitMQ in a standard architecture. Observe that rabbitmq-server is active/idle, but related units are stuck in '{amqp|messaging}-relation incomplete'.

Did the appearance of the problem coincide with a change made to the cloud (configuration or topological)?

No

Has AppArmor configuration been modified on the cloud nodes?

No

Are Policy overrides in use?

No

Please find the bundle/relations status attached

Ian Booth (wallyworld) wrote:

Can we get an export of the database put on a private fileshare, e.g. via juju create-backup?
We'll need to inspect up to six or so collections to see where the data might be incomplete, and use that as a starting point to figure out what's wrong.

Peter Jose De Sousa (pjds) wrote:

Please find database dumps and sanitised dumps attached here: https://private-fileshare.canonical.com/~pjds/dsv/lp1978566/

Ian Booth (wallyworld) wrote:

I had a look at the above fileshare and saw an exported bundle YAML file there, but no database dump or export.

Changed in juju:
status: New → Incomplete

Launchpad Janitor (janitor) wrote:

[Expired for OpenStack RabbitMQ Server Charm because there has been no activity for 60 days.]

Changed in charm-rabbitmq-server:
status: Incomplete → Expired

Launchpad Janitor (janitor) wrote:

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired