Running Juju ensure-availability twice in a row adds extra machines
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Expired | Medium | Unassigned |
juju-core | Won't Fix | Medium | Unassigned |
Bug Description
There appears to be a race condition of some kind between juju HA bootstrap and MAAS. After a bootstrap, if you run juju ensure-availability twice in a row, extra machines are added.
It's important to note here that we only have 3 nodes in MAAS tagged with openstack-ha; they're the units we want to use for bootstrap HA.
$ juju ensure-availability --constraints "tags=openstack-ha"
maintaining machines: 0
adding machines: 1, 2
$ juju status
environment: staging-bootstack
machines:
  "0":
    agent-state: started
    agent-version: 1.20.9.1
    dns-name: apollo.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=8 mem=32768M tags=use-
    state-
  "1":
    agent-state: started
    agent-version: 1.20.9.1
    dns-name: ceco.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=8 mem=32768M tags=use-
    state-
  "2":
    agent-state: started
    agent-version: 1.20.9.1
    dns-name: altman.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=8 mem=32768M tags=use-
    state-
services: {}
$ juju ensure-availability --constraints "tags=openstack-ha"
maintaining machines: 0, 2
adding machines: 3
demoting machines 1
$ juju status
environment: staging-bootstack
machines:
  "0":
    agent-state: started
    agent-version: 1.20.9.1
    dns-name: apollo.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=8 mem=32768M tags=use-
    state-
  "1":
    agent-state: started
    agent-version: 1.20.9.1
    dns-name: ceco.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=8 mem=32768M tags=use-
    state-
  "2":
    agent-state: started
    agent-version: 1.20.9.1
    dns-name: altman.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=8 mem=32768M tags=use-
    state-
  "3":
    agent-
    409 CONFLICT (No matching node is available.)'
    instance-id: pending
    series: trusty
    state-
services: {}
$ juju ensure-availability --constraints "tags=openstack-ha"
maintaining machines: 0, 2
promoting machines 1
demoting machines 3
$ juju status
environment: staging-bootstack
machines:
  "0":
    agent-state: started
    agent-version: 1.20.9.1
    dns-name: apollo.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=8 mem=32768M tags=use-
    state-
  "1":
    agent-state: started
    agent-version: 1.20.9.1
    dns-name: ceco.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=8 mem=32768M tags=use-
    state-
  "2":
    agent-state: started
    agent-version: 1.20.9.1
    dns-name: altman.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=8 mem=32768M tags=use-
    state-
  "3":
    agent-
    409 CONFLICT (No matching node is available.)'
    instance-id: pending
    series: trusty
    state-
services: {}
$ juju ensure-availability --constraints "tags=openstack-ha"
maintaining machines: 0, 1, 2
removing machines 3
$ juju ensure-availability --constraints "tags=openstack-ha"
$ echo $?
Comparing this to a working situation, where we wait for the nodes to settle:
$ juju ensure-availability --constraints "tags=openstack-ha"
maintaining machines: 0
adding machines: 1, 2
$ juju status
environment: staging-bootstack
machines:
  "0":
    agent-state: started
    agent-version: 1.20.9.1
    dns-name: apollo.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=8 mem=32768M tags=use-
    state-
  "1":
    agent-state: started
    agent-version: 1.20.9.1
    dns-name: altman.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=8 mem=32768M tags=use-
    state-
  "2":
    agent-state: started
    agent-version: 1.20.9.1
    dns-name: ceco.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=8 mem=32768M tags=use-
    state-
services: {}
$ juju ensure-availability --constraints "tags=openstack-ha"
$ juju ensure-availability --constraints "tags=openstack-ha"
$ echo $?
0
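The "wait for the nodes to settle" step above can be sketched as a small shell guard that inspects juju status output before re-running ensure-availability. This is a sketch, not part of the report: the all_started helper name and the grep pattern are assumptions keyed to the 1.20 status fields shown in the transcripts.

```shell
#!/bin/sh
# Sketch: succeed only when no machine in the given status text reports
# a pending/error/down agent state or a pending instance-id. In practice
# the argument would come from `juju status --format yaml`.
all_started() {
  ! printf '%s\n' "$1" | grep -qE 'agent-state: (pending|error|down)|instance-id: pending'
}

settled='agent-state: started'
unsettled='instance-id: pending'

all_started "$settled" && echo "settled"          # prints "settled"
all_started "$unsettled" || echo "not settled"    # prints "not settled"
```

Re-running ensure-availability only after this guard passes avoids the add/demote churn shown above.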
This occasionally occurs while doing OpenStack deployments where there are errors, so we end up having to tidy up the machines afterwards.
I'm not exactly sure how to fix this, since I can imagine a state where a machine really is stuck in adding-vote and we do actually want to remove it.
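The manual tidy-up mentioned above can be partly scripted. A sketch, with assumptions: the pending_machines helper and its awk pattern are mine, keyed to the status fields shown in the transcripts, and the removal command named in the comment is the juju 1.x destroy-machine syntax.

```shell
#!/bin/sh
# Sketch: list machine IDs whose instance-id is still "pending" in
# `juju status --format yaml` output; each candidate could then be
# removed with e.g. `juju destroy-machine --force <id>` (juju 1.x).
pending_machines() {
  printf '%s\n' "$1" | awk '
    $1 ~ /^"[0-9]+":$/      { id = $1; gsub(/[":]/, "", id) }  # remember current machine ID
    /instance-id: pending/  { print id }                       # report it if stuck pending
  '
}

status='machines:
  "0":
    agent-state: started
  "3":
    instance-id: pending'

pending_machines "$status"   # prints: 3
```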
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
tags: added: ha
tags: added: canonical-is maas-provider
Changed in juju-core:
milestone: none → next-stable
Changed in juju-core:
milestone: 1.21 → 1.22
Changed in juju-core:
milestone: 1.22-alpha1 → 1.23
Changed in juju-core:
milestone: 1.23 → 1.24-alpha1
Changed in juju-core:
milestone: 1.24-alpha1 → 1.25.0
summary:
- Juju bootstrap HA mode with MaaS occasionally tries to create extra machines
+ Running Juju ensure-availability twice in a row adds extra machines
no longer affects: juju-core/1.24
tags: added: improvement
Changed in juju-core:
milestone: 1.25.0 → none
importance: High → Medium
Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.1.0
Changed in juju-core:
status: Triaged → Won't Fix
We won't be able to fix this for 1.23, so leaving it targeted at 1.24.