[yoga] nova-cloud-controller units stuck waiting: Incomplete relations: messaging

Bug #1971451 reported by Bas de Bruijne
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Nova Cloud Controller Charm | Invalid | Undecided | Unassigned | |
| OpenStack RabbitMQ Server Charm | Fix Committed | Undecided | Unassigned | |

Bug Description

In testrun:
https://solutions.qa.canonical.com/testruns/testRun/a9e7b27f-bdb1-426a-b6d4-1f71a076d18b

The ncc units are stuck waiting:
```
nova-cloud-controller/0 waiting idle 0/lxd/7 10.246.167.175 8774/tcp,8775/tcp Incomplete relations: messaging
  filebeat/51 active idle 10.246.167.175 Filebeat ready.
  hacluster-nova-cloud-controller/2 active idle 10.246.167.175 Unit is ready and clustered
  landscape-client/51 maintenance idle 10.246.167.175 Need computer-title and juju-info to proceed
  logrotated/45 active idle 10.246.167.175 Unit is ready.
  nova-cloud-controller-mysql-router/2 active idle 10.246.167.175 Unit is ready
  nrpe/51 active idle 10.246.167.175 icmp,5666/tcp Ready
  public-policy-routing/25 active idle 10.246.167.175 Unit is ready
  telegraf/51 active idle 10.246.167.175 9103/tcp Monitoring nova-cloud-controller/0 (source version/commit cc7fa21)
```

In the logs we see:
```
var/log/juju/unit-nova-cloud-controller-1.log:2022-05-03 00:42:17 DEBUG unit.nova-cloud-controller/1.juju-log server.go:327 Skipping 10.246.169.105 password not sent which indicates unit is not ready.
var/log/juju/unit-nova-cloud-controller-1.log-2022-05-03 00:42:17 DEBUG unit.nova-cloud-controller/1.juju-log server.go:327 Skipping 10.246.169.98 password not sent which indicates unit is not ready.
var/log/juju/unit-nova-cloud-controller-1.log-2022-05-03 00:42:17 DEBUG unit.nova-cloud-controller/1.juju-log server.go:327 Skipping 10.246.169.72 password not sent which indicates unit is not ready.
```

However, the rabbitmq-server units themselves report that they are happy, and the juju-show-unit output in the crashdump shows that the relations are being rendered.
Pausing and resuming the ncc units does not help, which suggests that this is not a race condition.
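For readers decoding the "password not sent" lines: the ncc side appears to treat the messaging relation as complete only once at least one related rabbitmq-server unit has published a password. The following is a minimal sketch of that kind of check, not the charmhelpers code; the function names, the `password` key and the per-unit dict shape are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical shape: remote unit address -> settings it published on the
# amqp (messaging) relation. Not the real charmhelpers data structure.
RelationData = Dict[str, Dict[str, str]]


def ready_hosts(relation_data: RelationData) -> List[str]:
    """Return addresses of remote units that have published a password."""
    hosts = []
    for address in sorted(relation_data):
        if not relation_data[address].get("password"):
            # Same idea as the "password not sent which indicates unit is
            # not ready" debug line: skip the unit rather than fail.
            print(f"Skipping {address} password not sent which indicates unit is not ready.")
            continue
        hosts.append(address)
    return hosts


def messaging_context(relation_data: RelationData) -> Optional[Dict[str, object]]:
    """Return a context dict, or None while the relation is incomplete."""
    hosts = ready_hosts(relation_data)
    if not hosts:
        # No unit has sent a password, so there is no rabbitmq_password to
        # render and the charm keeps reporting the relation as incomplete.
        return None
    return {
        "rabbitmq_hosts": hosts,
        "rabbitmq_password": relation_data[hosts[0]]["password"],
    }


# Three related units, none of which has published a password yet.
stuck = {
    "10.246.169.105": {},
    "10.246.169.98": {},
    "10.246.169.72": {},
}
print(messaging_context(stuck))  # None
```

With all three rabbitmq-server addresses skipped, a check like this never yields a context, which is consistent with ncc staying in "waiting" even though rabbitmq-server itself looks healthy.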

Link to crashdumps etc:
https://oil-jenkins.canonical.com/artifacts/a9e7b27f-bdb1-426a-b6d4-1f71a076d18b/index.html
List of occurrences of this bug:
https://solutions.qa.canonical.com/bugs/bugs/bug/1971451

description: updated
summary: - [yoga-edge] nova-cloud-controller units stuck waiting: Incomplete relations: messaging
+ [yoga] nova-cloud-controller units stuck waiting: Incomplete relations: messaging
tags: added: cdo-qa foundations-engine
Billy Olsen (billy-olsen) wrote:

Moving this over to rabbitmq-server: on inspection of the juju status, many of the related services are waiting on an incomplete messaging relation. A quick perusal suggests this may be related to users not being created after the service is clustered.

Changed in charm-nova-cloud-controller:
status: New → Invalid
Billy Olsen (billy-olsen) wrote:

2022-05-02 21:12:16 DEBUG unit.rabbitmq-server/2.juju-log server.go:327 cluster:40: Sufficient number of peer units to form cluster 3
2022-05-02 21:12:16 INFO unit.rabbitmq-server/2.juju-log server.go:327 cluster:40: Informing peers via leaderdb to change cluster-partition-handling to pause_minority
2022-05-02 21:12:16 DEBUG juju.worker.uniter.remotestate watcher.go:615 got leader settings change for rabbitmq-server/2: ok=true
2022-05-02 21:12:16 WARNING unit.rabbitmq-server/2.cluster-relation-changed logger.go:60 Traceback (most recent call last):
2022-05-02 21:12:16 WARNING unit.rabbitmq-server/2.cluster-relation-changed logger.go:60 File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/cluster-relation-changed", line 1062, in <module>
2022-05-02 21:12:16 WARNING unit.rabbitmq-server/2.cluster-relation-changed logger.go:60 hooks.execute(sys.argv)
2022-05-02 21:12:16 WARNING unit.rabbitmq-server/2.cluster-relation-changed logger.go:60 File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/charmhelpers/core/hookenv.py", line 963, in execute
2022-05-02 21:12:16 WARNING unit.rabbitmq-server/2.cluster-relation-changed logger.go:60 self._hooks[hook_name]()
2022-05-02 21:12:16 WARNING unit.rabbitmq-server/2.cluster-relation-changed logger.go:60 File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/rabbit_utils.py", line 1474, in wrapped_f
2022-05-02 21:12:16 WARNING unit.rabbitmq-server/2.cluster-relation-changed logger.go:60 f(*args, **kwargs)
2022-05-02 21:12:16 WARNING unit.rabbitmq-server/2.cluster-relation-changed logger.go:60 File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/cluster-relation-changed", line 615, in cluster_changed
2022-05-02 21:12:16 WARNING unit.rabbitmq-server/2.cluster-relation-changed logger.go:60 rabbit.ConfigRenderer(
2022-05-02 21:12:16 WARNING unit.rabbitmq-server/2.cluster-relation-changed logger.go:60 File "/var/lib/juju/agents/unit-rabbitmq-server-2/charm/hooks/rabbit_utils.py", line 193, in __init__
2022-05-02 21:12:16 WARNING unit.rabbitmq-server/2.cluster-relation-changed logger.go:60 for config_path, data in config.items():
2022-05-02 21:12:16 WARNING unit.rabbitmq-server/2.cluster-relation-changed logger.go:60 AttributeError: 'function' object has no attribute 'items'
2022-05-02 21:12:16 ERROR juju.worker.uniter.operation runhook.go:146 hook "cluster-relation-changed" (via explicit, bespoke hook script) failed: exit status 1
2022-05-02 21:12:16 DEBUG juju.machinelock machinelock.go:202 created rotating log file "/var/log/juju/machine-lock.log" with max size 10 MB and max backups 5

OpenStack Infra (hudson-openstack) wrote: Fix proposed to charm-rabbitmq-server (master)
Changed in charm-rabbitmq-server:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix merged to charm-rabbitmq-server (master)

Reviewed: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/846233
Committed: https://opendev.org/openstack/charm-rabbitmq-server/commit/f790b2f64589de7f1fa0390e3b47116e79d086c2
Submitter: "Zuul (22348)"
Branch: master

commit f790b2f64589de7f1fa0390e3b47116e79d086c2
Author: Billy Olsen <email address hidden>
Date: Thu Jun 16 15:26:07 2022 -0700

    CONFIG_FILES is not a constant

    The CONFIG_FILES is not a constant, it's a method and when this
    is passed to the ConfigRenderer it will cause a failure because
    the items() function does not exist (since it's not a dict).

    Change-Id: Ice6cce6a736d96883eb8bc003852c2df60af7c62
    Closes-Bug: 1971451
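
The traceback and the commit message above describe a plain Python type mix-up: a function object is handed to code that iterates over `config.items()`. A minimal reproduction, assuming simplified stand-ins for the charm's `CONFIG_FILES` and `ConfigRenderer` (the class body, the `config_files` name and the config path are placeholders, not the charm's actual code):

```python
from typing import Dict


def config_files() -> Dict[str, dict]:
    """Hypothetical stand-in for the charm's CONFIG_FILES: a function that
    builds the {config path: render data} mapping on demand."""
    return {"/etc/rabbitmq/rabbitmq.conf": {"contexts": []}}


class ConfigRenderer:
    """Simplified stand-in for rabbit_utils.ConfigRenderer; like the frame
    in the traceback, its __init__ iterates over config.items()."""

    def __init__(self, config: Dict[str, dict]):
        self.config_data = {}
        for config_path, data in config.items():
            self.config_data[config_path] = data


# Passing the function object reproduces the hook failure.
try:
    ConfigRenderer(config_files)
except AttributeError as exc:
    print(f"AttributeError: {exc}")  # 'function' object has no attribute 'items'

# Calling the function first hands the renderer the dict it expects.
renderer = ConfigRenderer(config_files())
print(list(renderer.config_data))  # ['/etc/rabbitmq/rabbitmq.conf']
```

Because the exception is raised inside cluster-relation-changed, anything later in that hook never runs, which fits the earlier suspicion that users are not created after clustering.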

Changed in charm-rabbitmq-server:
status: In Progress → Fix Committed
Pedro Guimarães (pguimaraes) wrote:

I am currently facing this issue and had to upgrade rabbitmq-server to "latest/edge". Raising it as field-medium.

OpenStack Infra (hudson-openstack) wrote: Fix proposed to charm-rabbitmq-server (stable/jammy)
OpenStack Infra (hudson-openstack) wrote: Fix merged to charm-rabbitmq-server (stable/jammy)

Reviewed: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/850774
Committed: https://opendev.org/openstack/charm-rabbitmq-server/commit/e86467b57abaaa61cac45ffc852a0f7c16910b9d
Submitter: "Zuul (22348)"
Branch: stable/jammy

commit e86467b57abaaa61cac45ffc852a0f7c16910b9d
Author: Billy Olsen <email address hidden>
Date: Thu Jun 16 15:26:07 2022 -0700

    CONFIG_FILES is not a constant

    The CONFIG_FILES is not a constant, it's a method and when this
    is passed to the ConfigRenderer it will cause a failure because
    the items() function does not exist (since it's not a dict).

    Change-Id: Ice6cce6a736d96883eb8bc003852c2df60af7c62
    Closes-Bug: 1971451
    (cherry picked from commit f790b2f64589de7f1fa0390e3b47116e79d086c2)

tags: added: in-stable-jammy
Aymen Frikha (aym-frikha) wrote:

Hello, I'm also facing this issue with focal/yoga.
Is it possible to backport the fix to the focal branch?

Alex Kavanagh (ajkavanagh) wrote:

Hi Aymen, the bug fixed here is not present in the stable/focal git branch, which is published to the 3.8/stable channel on Charmhub. That is, the CONFIG_FILES constant was switched to a function between the 3.8/stable and 3.9/stable branches.

So if you are facing a similar issue, it would be best to open a new bug describing it, and please put "[stable/focal]" or "[track=3.8]" in the bug title to indicate that it is a problem with the 3.8 charm. Thanks!
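Since CONFIG_FILES changed shape between the 3.8 and 3.9 branches (a dict constant in one, a function in the other), code that has to work against both would need to normalise its input. A hedged sketch of that idea; `resolve_config`, `CONFIG_FILES_38` and `config_files_39` are hypothetical names, not anything the charm ships:

```python
from collections.abc import Mapping
from typing import Callable, Dict, Union

# Either a mapping (constant-style) or a zero-argument callable returning one.
ConfigSource = Union[Mapping, Callable[[], Mapping]]


def resolve_config(source: ConfigSource) -> Dict[str, dict]:
    """Return a plain dict whether given a mapping or a callable."""
    if isinstance(source, Mapping):
        return dict(source)
    if callable(source):
        result = source()
        if not isinstance(result, Mapping):
            raise TypeError(f"callable returned {type(result).__name__}, expected a mapping")
        return dict(result)
    raise TypeError(f"unsupported config source: {type(source).__name__}")


CONFIG_FILES_38 = {"/etc/rabbitmq/rabbitmq.conf": {}}  # constant-style


def config_files_39():  # function-style
    return {"/etc/rabbitmq/rabbitmq.conf": {}}


assert resolve_config(CONFIG_FILES_38) == resolve_config(config_files_39)
```

Fixing the one call site directly is simpler; normalising like this is only worth it when a single code path must serve both branches.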

Jeffrey Chang (modern911) wrote:

SolQA hit a similar case with OpenStack yoga/stable on Juju 3.1, with memcached on latest/candidate (38, which is also latest/stable).
testrun: https://solutions.qa.canonical.com/testruns/44492416-057f-4468-956d-5b0817bec54a
n-c-c is waiting on memcache, but the memcached unit is active.

nova-cloud-controller/0 waiting idle 0/lxd/8 10.246.166.191 8774-8775/tcp Incomplete relations: memcache

In the logs I can see:

2023-10-03 12:01:05 INFO unit.nova-cloud-controller/0.juju-log server.go:325 memcache:348: Generating template context from neutron api relation
2023-10-03 12:01:05 INFO unit.nova-cloud-controller/0.juju-log server.go:325 memcache:348: Missing required data: neutron_url neutron_plugin neutron_security_groups
2023-10-03 12:01:05 INFO unit.nova-cloud-controller/0.juju-log server.go:325 memcache:348: database relations's interface, shared-db, is related but has no units in the relation.
2023-10-03 12:01:05 INFO unit.nova-cloud-controller/0.juju-log server.go:325 memcache:348: messaging relation's interface, amqp, is related awaiting the following data from the relationship: rabbitmq_password.
2023-10-03 12:01:05 INFO unit.nova-cloud-controller/0.juju-log server.go:325 memcache:348: identity relations's interface, identity-service, is related but has no units in the relation.
2023-10-03 12:01:05 INFO unit.nova-cloud-controller/0.juju-log server.go:325 memcache:348: neutron-api relations's interface, neutron-api, is related but has no units in the relation.
2023-10-03 12:01:05 INFO unit.nova-cloud-controller/0.juju-log server.go:325 memcache:348: memcache relations's interface, memcache, is related but has no units in the relation.
2023-10-03 12:01:05 DEBUG unit.nova-cloud-controller/0.juju-log server.go:325 memcache:348: VIP HA: VIP is set 10.246.173.22 10.246.169.22
2023-10-03 12:01:06 INFO juju.worker.uniter.operation runhook.go:186 ran "memcache-relation-changed" hook (via explicit, bespoke hook script)
2023-10-03 12:01:06 DEBUG juju.worker.uniter.operation executor.go:135 committing operation "run relation-changed (348; unit: memcached/1) hook" for nova-cloud-controller/0

2023-10-03 12:01:16 WARNING unit.nova-cloud-controller/0.juju-log server.go:325 cloud-compute:381: Could not get memcache servers: 'private-address'

Please let me know if I should file a new bug.

Alex Kavanagh (ajkavanagh) wrote:

@modern911 I don't think it's the same bug; it appears to be something else entirely. I've seen a similar Juju-related issue where the private-address isn't available (yet) at the time the charm asks Juju for it, and the charm isn't able to handle that. I'm not sure whether it's the same bug, and I'm struggling to find the report.
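The warning `Could not get memcache servers: 'private-address'` has the form Python produces when a KeyError is printed, which hints (an assumption, not confirmed in this thread) that the relation data was indexed directly while private-address was still missing. A hedged sketch of a tolerant lookup; `memcache_servers`, the data shape and the example addresses are hypothetical, and 11211 is simply memcached's usual default port:

```python
from typing import Dict, List

# Hypothetical shape: remote unit name -> settings it published on the
# memcache relation.
RelationData = Dict[str, Dict[str, str]]


def memcache_servers(units: RelationData, default_port: str = "11211") -> List[str]:
    """Return host:port strings, skipping units with no private-address yet."""
    servers = []
    for unit in sorted(units):
        settings = units[unit]
        address = settings.get("private-address")
        if address is None:
            # settings["private-address"] would raise KeyError here; skipping
            # lets a later relation-changed hook fill the value in.
            continue
        servers.append(f"{address}:{settings.get('port', default_port)}")
    return servers


# How a KeyError renders as a string: just the quoted key.
print(str(KeyError("private-address")))  # 'private-address'

units = {
    "memcached/0": {"private-address": "10.0.0.10", "port": "11211"},
    "memcached/1": {"port": "11211"},  # private-address not published yet
}
print(memcache_servers(units))  # ['10.0.0.10:11211']
```

Skipping instead of raising leaves the relation reported as incomplete until Juju supplies the address, rather than aborting the hook.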

Alexander Balderson (asbalderson) wrote:

@ajkavanagh I found the private and public address bug: https://bugs.launchpad.net/juju/+bug/1924780

I've added nova-cloud-controller as an affected project.
