I think we need to be a bit careful about attributing deployment failures to this bug, as there are many reasons why a charm may not have received the full relation data from a charm it is related to. I think the latest crashdump is yet another bug (the 3rd distinct bug attributed to this bug). Anyway, <tl;dr> The latest crashdump looks like a duplicate of Bug #1794850 </tl;dr>.
In the latest crashdump, the incomplete relations reported in juju_status.yaml show both neutron-gateways stuck with "Incomplete relations: network-service". This is consistent with them lacking information from the nova-cloud-controller. The nova-cloud-controller statuses (and those of their subordinates) show them still running ha setup hooks. As admcleod observed, the units of the nova-cloud-controller charm appear to get stuck just before 11:00am. Taking nova-cloud-controller/0 (and its subordinates filebeat/66, hacluster-nova/2, nrpe-container/45 and telegraf/67) as an example illustrates this point: https://pastebin.canonical.com/p/Cd4XnKwdsj/. So, why are they stuck?
Only the leader hacluster unit should be setting up hacluster configuration; however, two seem to be doing it at exactly the same time:
$ grep -E "Configuring Group.*nova" nova-cloud-controller_*/var/log/juju/unit-hacluster-nova*log
nova-cloud-controller_0/var/log/juju/unit-hacluster-nova-2.log:2018-11-15 10:55:33 DEBUG juju-log ha:64: Configuring Groups: {'grp_nova_vips': 'res_nova_eth0_vip res_nova_eth1_vip'}
nova-cloud-controller_2/var/log/juju/unit-hacluster-nova-1.log:2018-11-15 10:55:33 DEBUG juju-log ha:64: Configuring Groups: {'grp_nova_vips': 'res_nova_eth0_vip res_nova_eth1_vip'}
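For anyone wanting to repeat this check on another crashdump, here is a rough Python sketch of the same idea: group the grep-style lines by timestamp and report any second in which more than one unit log shows "Configuring Groups". The function name, the regex and the SAMPLE list are my own illustration (the sample lines are the two matches above); none of this is part of the charm.

```python
import re
from collections import defaultdict

# Matches grep-style output: <log path>:<date> <time> ... Configuring Groups
LINE_RE = re.compile(
    r"^(?P<path>[^:]+):(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r".*Configuring Groups"
)


def concurrent_configurers(lines):
    """Return {timestamp: sorted log paths} for timestamps seen in 2+ logs."""
    by_ts = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            by_ts[match.group("ts")].add(match.group("path"))
    # Keep only the timestamps at which more than one unit was configuring.
    return {ts: sorted(paths) for ts, paths in by_ts.items() if len(paths) > 1}


# The two matching lines from the crashdump, used here as sample input.
SAMPLE = [
    "nova-cloud-controller_0/var/log/juju/unit-hacluster-nova-2.log:"
    "2018-11-15 10:55:33 DEBUG juju-log ha:64: Configuring Groups: "
    "{'grp_nova_vips': 'res_nova_eth0_vip res_nova_eth1_vip'}",
    "nova-cloud-controller_2/var/log/juju/unit-hacluster-nova-1.log:"
    "2018-11-15 10:55:33 DEBUG juju-log ha:64: Configuring Groups: "
    "{'grp_nova_vips': 'res_nova_eth0_vip res_nova_eth1_vip'}",
]

if __name__ == "__main__":
    for ts, paths in concurrent_configurers(SAMPLE).items():
        print(ts, paths)
```

Matching on whole seconds is crude, so a hit only suggests that two units were configuring concurrently; it is a prompt to look at the logs, not proof.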
This should not be happening, since in master this setup is gated by is_leader(). However, the deployment that the crashdump comes from is using cs:hacluster-49, which is missing commit 6b6a1776. That commit updates the charm from using the old and buggy 'oldest_peer' to using the Juju is-leader primitive. So, I think this should be fixed with the latest charms.
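To illustrate what gating on leadership means here, a minimal Python sketch follows. It is not the hacluster charm's actual code: configure_groups and its apply callback are hypothetical names of mine, and is_leader below is a simplified stand-in that shells out to the Juju is-leader hook tool, which is only available inside a hook context.

```python
import json
import subprocess


def is_leader(run=subprocess.check_output):
    """Ask Juju whether this unit is the leader, via the is-leader hook tool.

    Only meaningful inside a hook context; `run` is injectable so the
    function can be exercised outside of Juju.
    """
    out = run(["is-leader", "--format=json"])
    return json.loads(out.decode("utf-8")) is True


def configure_groups(groups, leader_check=is_leader, apply=print):
    """Apply group configuration on the leader only (hypothetical helper).

    Returns True if configuration was applied, False if this unit skipped it.
    """
    if not leader_check():
        # Non-leader units must not touch the shared cluster configuration.
        return False
    for name, members in groups.items():
        apply(name, members)
    return True
```

With a gate like this, only one unit at a time is expected to reach the configuration step, which is why two units logging "Configuring Groups" in the same second points at the older 'oldest_peer' logic.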