[2.6.8] juju retry-provisioning does not work on machines after receiving "suitable availability zone for machine <num> not found"

Bug #1843134 reported by Dmitrii Shcherbakov
This bug affects 3 people
Affects         Status     Importance  Assigned to  Milestone
Canonical Juju  Won't Fix  Medium      Unassigned

Bug Description

2.6.8-bionic-amd64

1) Ran `juju deploy <bundle>` for 6 machines. 5 of the 6 had constraints that could be satisfied in MAAS; for the remaining one, the matching machine in MAAS did not have the correct tag.

machines:
  "0":
    constraints: tags=compute zones=z1
  "1":
    constraints: tags=compute zones=z2
  "2":
    constraints: tags=compute zones=z3
  "3":
    constraints: tags=compute zones=z1
  "4":
    constraints: tags=compute zones=z2
  "5": # <- there was a machine in az3 but without the 'compute' tag
    constraints: tags=compute zones=z3

2) Got "suitable availability zone for machine 5 not found".

3) Added the tag to the 6th machine in MAAS.

4) Ran `juju retry-provisioning 5`.

5) Juju did not try to find a machine in MAAS again and did not start the deployment.

ubuntu@maas:~$ juju machines
Machine  State    DNS           Inst id  Series  AZ  Message
0        pending  10.232.1.78   adze     bionic  z1  Deploying: Configuring OS
0/lxd/0  pending                pending  bionic
0/lxd/1  pending                pending  bionic
0/lxd/2  pending                pending  bionic
0/lxd/3  pending                pending  bionic
0/lxd/4  pending                pending  bionic
0/lxd/5  pending                pending  bionic
0/lxd/6  pending                pending  bionic
0/lxd/7  pending                pending  bionic
1        pending  10.232.1.182  emere    bionic  z2  Deploying: Installing OS
1/lxd/0  pending                pending  bionic
1/lxd/1  pending                pending  bionic
1/lxd/2  pending                pending  bionic
1/lxd/3  pending                pending  bionic
1/lxd/4  pending                pending  bionic
1/lxd/5  pending                pending  bionic
1/lxd/6  pending                pending  bionic
1/lxd/7  pending                pending  bionic
2        pending  10.232.24.16  obambo   bionic  z3  Deploying: Configuring storage
2/lxd/0  pending                pending  bionic
2/lxd/1  pending                pending  bionic
2/lxd/2  pending                pending  bionic
2/lxd/3  pending                pending  bionic
2/lxd/4  pending                pending  bionic
2/lxd/5  pending                pending  bionic
2/lxd/6  pending                pending  bionic
3        pending  10.232.24.2   ipotane  bionic  z1  Deploying: Loading ephemeral
3/lxd/0  pending                pending  bionic
3/lxd/1  pending                pending  bionic
3/lxd/2  pending                pending  bionic
3/lxd/3  pending                pending  bionic
3/lxd/4  pending                pending  bionic
3/lxd/5  pending                pending  bionic
4        pending  10.232.24.3   kachina  bionic  z2  Deploying: Loading ephemeral
4/lxd/0  pending                pending  bionic
4/lxd/1  pending                pending  bionic
4/lxd/2  pending                pending  bionic
4/lxd/3  pending                pending  bionic
5        down                   pending  bionic      suitable availability zone for machine 5 not found

juju retry-provisioning 5 --debug
16:14:42 INFO juju.cmd supercommand.go:57 running juju [2.6.8 gc go1.10.4]
16:14:42 DEBUG juju.cmd supercommand.go:58 args: []string{"/snap/juju/8873/bin/juju", "retry-provisioning", "5", "--debug"}
16:14:42 INFO juju.juju api.go:67 connecting to API addresses: [10.232.1.60:17070]
16:14:42 DEBUG juju.api apiclient.go:1092 successfully dialed "wss://10.232.1.60:17070/model/357f3ec9-febe-42b8-80f7-8e5a24d0dd3b/api"
16:14:42 INFO juju.api apiclient.go:624 connection established to "wss://10.232.1.60:17070/model/357f3ec9-febe-42b8-80f7-8e5a24d0dd3b/api"
16:14:42 DEBUG juju.api monitor.go:35 RPC connection died
16:14:42 INFO cmd supercommand.go:502 command finished
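
Since the debug log reports neither an error nor a retry, one way to check whether `retry-provisioning` had any effect is to re-list the machines and look for any that are still in the `down` state. The following is a minimal sketch, not part of the original report: the `down_machines` helper is a hypothetical name, and it assumes the tabular layout of `juju machines` shown above, where the first column is the machine ID and the second is the state.

```shell
#!/bin/sh
# Hypothetical helper: print the IDs of machines whose State column is "down".
# Reads the tabular output of `juju machines` on stdin.
down_machines() {
    # NR > 1 skips the header row; $1 is Machine, $2 is State.
    awk 'NR > 1 && $2 == "down" { print $1 }'
}

# Sample rows copied from the listing in this report (trimmed).
sample='Machine State DNS Inst id Series AZ Message
0 pending 10.232.1.78 adze bionic z1 Deploying: Configuring OS
0/lxd/0 pending pending bionic
5 down pending bionic suitable availability zone for machine 5 not found'

printf '%s\n' "$sample" | down_machines
# Against a live model (not run here): juju machines | down_machines
```

With the sample rows this prints `5`, which matches the state reported after the retry.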

Changed in juju:
status: New → Triaged
importance: Undecided → Medium
David van der Spek (vanderspek-david) wrote :

I may have missed it in the documentation, but for the past 6 months I have not been able to figure out the correct procedure for dealing with a node that fails during deployment. The only solution I have found so far is to destroy the model and start fresh, which isn't really a solution. Using `retry-provisioning` gives the same logs as shown above; however, I am unsure whether that is what retry-provisioning is actually meant to do. The documentation does not seem to explain what exactly retry-provisioning does or in which situations it can and can't be used. I also haven't been able to find documentation on the correct procedure for dealing with nodes that fail during deployment.

Ian Booth (wallyworld) wrote :

retry-provisioning was originally written to cater for the case where a machine could not be provisioned because of a cloud API "rate limit exceeded" type of error:

https://juju.is/docs/help-additions

i.e., the provisioning would have succeeded if not for the API throttling.

If there's another type of failure, then the current best approach is to remove the machine from the Juju model and retry:

juju remove-machine <x> --force

It would be useful if retry-provisioning could be used for other "try again and it would work" types of errors.
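
Applied to the machine from the bug description, the remove-and-retry approach might look like the sketch below. This is an illustration only, not something tested against 2.6.8: the `replace_machine_cmds` helper is a hypothetical name, the constraints string is copied from the bundle in the report, and the script prints the commands rather than running them so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands for the
# remove-and-re-add workaround.
# $1: ID of the failed machine; $2: constraints for the replacement machine.
replace_machine_cmds() {
    machine_id=$1
    constraints=$2
    printf 'juju remove-machine %s --force\n' "$machine_id"
    printf 'juju add-machine --constraints "%s"\n' "$constraints"
}

# Machine 5 and its constraints, as given in the bug description.
replace_machine_cmds 5 "tags=compute zones=z3"
```

The replacement machine is given a new ID by Juju, so any units that the bundle placed on machine 5 would presumably have to be added again with `juju add-unit <application> --to <new-id>`.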

Changed in juju:
milestone: none → 3.0.0
Changed in juju:
milestone: 3.0.0 → 3.0.1
Changed in juju:
milestone: 3.0.1 → 3.0.2
Changed in juju:
milestone: 3.0.2 → 3.0.3
Juan M. Tirado (tiradojm) wrote :

This bug is really old. Reopen it if this error persists on modern juju versions.

Changed in juju:
status: Triaged → Won't Fix