OpenStack Nova Cloud Controller Charm

Bug #1796923
Comment #17

Comment 17 for bug 1796923

Liam Young (gnuoy) wrote on 2018-12-11:

I think we need to be a bit careful about attributing deployment failures to this bug, as there are many reasons why a charm may not have received the full relation data from a charm it is related to. I think the latest crashdump shows yet another bug (the 3rd distinct bug attributed to this bug). Anyway, <tl;dr> The latest crashdump looks like a duplicate of Bug #1794850 </tl;dr>.

In the latest crashdump, the incomplete relations reported in juju_status.yaml show both neutron-gateways stuck with "Incomplete relations: network-service", which is consistent with them lacking information from the nova-cloud-controller. The nova-cloud-controller statuses (and those of their subordinates) show them still running ha setup hooks. As admcleod observed, the units of the nova-cloud-controller charm appear to get stuck just before 11:00am. nova-cloud-controller/0 (and its subordinates filebeat/66, hacluster-nova/2, nrpe-container/45 and telegraf/67) illustrates this point: https://pastebin.canonical.com/p/Cd4XnKwdsj/. So, why are they stuck?

Only the leader hacluster unit should be setting up hacluster configuration; however, two seem to be doing it at exactly the same time:

$ grep -E "Configuring Group.*nova" nova-cloud-controller_*/var/log/juju/unit-hacluster-nova*log
nova-cloud-controller_0/var/log/juju/unit-hacluster-nova-2.log:2018-11-15 10:55:33 DEBUG juju-log ha:64: Configuring Groups: {'grp_nova_vips': 'res_nova_eth0_vip res_nova_eth1_vip'}
nova-cloud-controller_2/var/log/juju/unit-hacluster-nova-1.log:2018-11-15 10:55:33 DEBUG juju-log ha:64: Configuring Groups: {'grp_nova_vips': 'res_nova_eth0_vip res_nova_eth1_vip'}
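
For anyone wanting to repeat this check on another crashdump, the grep above can be approximated in Python. This is only a rough sketch: the function names and the regular expression are my own, and it assumes the unit log lines have the same "timestamp LEVEL juju-log ..." shape as the two lines quoted above.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the two log lines quoted above:
# 2018-11-15 10:55:33 DEBUG juju-log ha:64: Configuring Groups: {...}
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ juju-log "
    r".*Configuring Groups.*nova"
)


def units_configuring_groups(logs):
    """Map each timestamp to the logs that configured groups at that second.

    `logs` is a mapping of log name -> iterable of log lines.
    """
    seen = defaultdict(set)
    for name, lines in logs.items():
        for line in lines:
            match = LINE_RE.match(line)
            if match:
                seen[match.group("ts")].add(name)
    return {ts: sorted(names) for ts, names in seen.items()}


def concurrent_configurers(logs):
    """Keep only the timestamps at which more than one unit was configuring."""
    return {
        ts: names
        for ts, names in units_configuring_groups(logs).items()
        if len(names) > 1
    }


if __name__ == "__main__":
    line = (
        "2018-11-15 10:55:33 DEBUG juju-log ha:64: Configuring Groups: "
        "{'grp_nova_vips': 'res_nova_eth0_vip res_nova_eth1_vip'}"
    )
    sample = {
        "unit-hacluster-nova-2.log": [line],
        "unit-hacluster-nova-1.log": [line],
    }
    print(concurrent_configurers(sample))
```

Any timestamp that comes back with more than one log name is a second in which more than one hacluster unit was configuring groups, which is the symptom shown in the grep output.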

This should not be happening, as in master this setup is gated by is_leader(). However, the deployment that the crashdump is from is using cs:hacluster-49, which is missing commit 6b6a1776. That commit switches the charm from the old and buggy 'oldest_peer' to the Juju is-leader primitive. So, I think this should be fixed with the latest charms.
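
To illustrate what "gated by is_leader()" means, here is a minimal sketch. It is not the hacluster charm's code: configure_groups and its parameters are hypothetical names of mine, and the is_leader helper below simply shells out to the Juju is-leader hook tool (a real charm would more likely import is_leader from charmhelpers.core.hookenv).

```python
import json
import subprocess


def is_leader(run=subprocess.check_output):
    """Ask Juju whether this unit is the application leader.

    Calls the `is-leader` hook tool, which is only available inside a hook
    context. `run` is injectable so the function can be exercised elsewhere.
    """
    out = run(["is-leader", "--format=json"])
    return json.loads(out.decode("utf-8"))


def configure_groups(groups, leader_check=is_leader, apply=print):
    """Hypothetical stand-in for the hacluster setup step.

    Only the leader applies the configuration; every other unit returns
    without touching the cluster.
    """
    if not leader_check():
        return False
    apply("Configuring Groups: {}".format(groups))
    return True


if __name__ == "__main__":
    groups = {"grp_nova_vips": "res_nova_eth0_vip res_nova_eth1_vip"}
    # Simulate a non-leader and a leader unit without needing a Juju hook context.
    print(configure_groups(groups, leader_check=lambda: False))
    print(configure_groups(groups, leader_check=lambda: True))
```

With a gate like this, two units would only both log "Configuring Groups" if Juju reported both as leader, which is the situation the is-leader primitive is meant to rule out.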
