Canonical Juju

[2.6.8] juju retry-provisioning does not work on machines after receiving "suitable availability zone for machine <num> not found"

Bug #1843134 reported by Dmitrii Shcherbakov on 2019-09-07

This bug affects 3 people

Affects         Status     Importance  Assigned to  Milestone
Canonical Juju  Won't Fix  Medium      Unassigned   Canonical Juju 3.0.3

Bug Description

Juju version: 2.6.8-bionic-amd64

1) Ran `juju deploy <bundle>` with six machines. The constraints of five of them could be satisfied in MAAS, but the MAAS node intended for the sixth did not have the correct tag:

machines:
  "0":
    constraints: tags=compute zones=z1
  "1":
    constraints: tags=compute zones=z2
  "2":
    constraints: tags=compute zones=z3
  "3":
    constraints: tags=compute zones=z1
  "4":
    constraints: tags=compute zones=z2
  "5": # <- MAAS had a node in zone z3, but it lacked the 'compute' tag
    constraints: tags=compute zones=z3

2) Got "suitable availability zone for machine 5 not found".

3) Added the 'compute' tag to the sixth node in MAAS.

4) Ran `juju retry-provisioning 5`.

5) Juju did not try again to allocate a node from MAAS for machine 5, so its deployment never started.

ubuntu@maas:~$ juju machines
Machine  State    DNS           Inst id  Series  AZ  Message
0        pending  10.232.1.78   adze     bionic  z1  Deploying: Configuring OS
0/lxd/0  pending                pending  bionic
0/lxd/1  pending                pending  bionic
0/lxd/2  pending                pending  bionic
0/lxd/3  pending                pending  bionic
0/lxd/4  pending                pending  bionic
0/lxd/5  pending                pending  bionic
0/lxd/6  pending                pending  bionic
0/lxd/7  pending                pending  bionic
1        pending  10.232.1.182  emere    bionic  z2  Deploying: Installing OS
1/lxd/0  pending                pending  bionic
1/lxd/1  pending                pending  bionic
1/lxd/2  pending                pending  bionic
1/lxd/3  pending                pending  bionic
1/lxd/4  pending                pending  bionic
1/lxd/5  pending                pending  bionic
1/lxd/6  pending                pending  bionic
1/lxd/7  pending                pending  bionic
2        pending  10.232.24.16  obambo   bionic  z3  Deploying: Configuring storage
2/lxd/0  pending                pending  bionic
2/lxd/1  pending                pending  bionic
2/lxd/2  pending                pending  bionic
2/lxd/3  pending                pending  bionic
2/lxd/4  pending                pending  bionic
2/lxd/5  pending                pending  bionic
2/lxd/6  pending                pending  bionic
3        pending  10.232.24.2   ipotane  bionic  z1  Deploying: Loading ephemeral
3/lxd/0  pending                pending  bionic
3/lxd/1  pending                pending  bionic
3/lxd/2  pending                pending  bionic
3/lxd/3  pending                pending  bionic
3/lxd/4  pending                pending  bionic
3/lxd/5  pending                pending  bionic
4        pending  10.232.24.3   kachina  bionic  z2  Deploying: Loading ephemeral
4/lxd/0  pending                pending  bionic
4/lxd/1  pending                pending  bionic
4/lxd/2  pending                pending  bionic
4/lxd/3  pending                pending  bionic
5        down                   pending  bionic      suitable availability zone for machine 5 not found

juju retry-provisioning 5 --debug
16:14:42 INFO juju.cmd supercommand.go:57 running juju [2.6.8 gc go1.10.4]
16:14:42 DEBUG juju.cmd supercommand.go:58 args: []string{"/snap/juju/8873/bin/juju", "retry-provisioning", "5", "--debug"}
16:14:42 INFO juju.juju api.go:67 connecting to API addresses: [10.232.1.60:17070]
16:14:42 DEBUG juju.api apiclient.go:1092 successfully dialed "wss://10.232.1.60:17070/model/357f3ec9-febe-42b8-80f7-8e5a24d0dd3b/api"
16:14:42 INFO juju.api apiclient.go:624 connection established to "wss://10.232.1.60:17070/model/357f3ec9-febe-42b8-80f7-8e5a24d0dd3b/api"
16:14:42 DEBUG juju.api monitor.go:35 RPC connection died
16:14:42 INFO cmd supercommand.go:502 command finished
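The failure in step 2 comes down to constraint matching: machine 5 asks for a node that is both in zone z3 and tagged `compute`, and according to the report the spare z3 node lacked that tag. The Python sketch below models that check with hypothetical data based on this report (the hostnames for machines 0-4 come from the `juju machines` output; `z3-spare` is an invented name for the untagged z3 node). The first-fit allocation and the simplified constraint parser are assumptions for illustration only, not how Juju or MAAS actually select nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical MAAS node: hostname, availability zone and tags."""
    hostname: str
    zone: str
    tags: set = field(default_factory=set)


def parse_constraints(spec: str) -> dict:
    """Parse 'tags=compute zones=z3' into {'tags': {'compute'}, 'zones': {'z3'}}.

    Simplified assumption: space-separated key=value pairs whose values
    are comma-separated lists.
    """
    parsed = {}
    for item in spec.split():
        key, _, value = item.partition("=")
        parsed[key] = set(value.split(","))
    return parsed


def matches(node: Node, spec: str) -> bool:
    """True if the node is in an allowed zone and carries every required tag."""
    c = parse_constraints(spec)
    zones = c.get("zones")
    return (zones is None or node.zone in zones) and c.get("tags", set()) <= node.tags


def allocate(machines: dict, nodes: list) -> dict:
    """Give each machine, in order, the first free matching node.

    Returns {machine_id: hostname or error message}.
    """
    free = list(nodes)
    result = {}
    for machine_id, spec in machines.items():
        node = next((n for n in free if matches(n, spec)), None)
        if node is None:
            result[machine_id] = (
                f"suitable availability zone for machine {machine_id} not found"
            )
        else:
            free.remove(node)  # a node can back only one machine
            result[machine_id] = node.hostname
    return result


# Constraints copied from the bundle in the bug description.
bundle = {
    "0": "tags=compute zones=z1",
    "1": "tags=compute zones=z2",
    "2": "tags=compute zones=z3",
    "3": "tags=compute zones=z1",
    "4": "tags=compute zones=z2",
    "5": "tags=compute zones=z3",
}
nodes = [
    Node("adze", "z1", {"compute"}),
    Node("emere", "z2", {"compute"}),
    Node("obambo", "z3", {"compute"}),
    Node("ipotane", "z1", {"compute"}),
    Node("kachina", "z2", {"compute"}),
    Node("z3-spare", "z3"),  # hypothetical z3 node without the 'compute' tag
]

print(allocate(bundle, nodes)["5"])
# suitable availability zone for machine 5 not found

nodes[5].tags.add("compute")  # step 3: tag the node in MAAS
print(allocate(bundle, nodes)["5"])
# z3-spare
```

In this model, re-running the allocation after tagging the node succeeds, which is what the reporter expected `juju retry-provisioning 5` to trigger; the reply further down explains that the command was designed for transient cloud API errors instead.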

Richard Harding (rharding) on 2019-09-09

Changed in juju:
status:	New → Triaged
importance:	Undecided → Medium

David van der Spek (vanderspek-david) wrote on 2020-11-18:

Not sure if I missed it in the documentation, but for the past 6 months I have not been able to figure out the correct procedure for dealing with a node that fails during deployment. The only solution I have found thus far is to destroy the model and start fresh, which isn't really a solution. Using `retry-provisioning` gives the same logs as shown above; however, I am unsure whether that is what retry-provisioning is actually meant to do. There seems to be a lack of documentation on what retry-provisioning does exactly and in which situations it can and can't be used. I also haven't been able to find documentation on the correct procedure for dealing with nodes that fail during deployment.

Ian Booth (wallyworld) wrote on 2020-11-19:

retry-provisioning was originally written to cater for the case where a machine could not be provisioned because of a cloud API "rate limit exceeded" type of error:

https://juju.is/docs/help-additions

i.e., provisioning would have succeeded if not for the API throttling.

If there's another type of failure, the current best approach is to remove the machine from the Juju model and retry:

juju remove-machine <x> --force

It would be useful if retry-provisioning could also handle other "try again and it would work" types of errors.
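The remove-and-retry workaround above can be scripted. This is a minimal Python sketch (Python 3.8+ for `shlex.join`) that only prints the commands by default; passing `dry_run=False` runs them through `subprocess` and needs the `juju` CLI on `PATH`. Treating "retry" as `juju add-machine --constraints` with the original constraints is an assumption, the constraint string is copied from machine 5 in the bundle in the bug description, and the helper names `recovery_commands` and `run` are hypothetical.

```python
import shlex
import subprocess


def recovery_commands(machine_id: str, constraints: str) -> list:
    """Build the remove-and-retry workaround as argv lists (hypothetical helper)."""
    return [
        # Drop the machine that never got a node; --force also removes any
        # units or containers placed on it.
        ["juju", "remove-machine", machine_id, "--force"],
        # Request a replacement machine with the same constraints.
        ["juju", "add-machine", "--constraints", constraints],
    ]


def run(commands: list, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first command that exits non-zero.
            subprocess.run(argv, check=True)


# Machine 5's constraints from the bundle in the bug description.
run(recovery_commands("5", "tags=compute zones=z3"))
# Prints:
# juju remove-machine 5 --force
# juju add-machine --constraints 'tags=compute zones=z3'
```

Units that were placed on the removed machine are not re-created by these commands; they would need to be added again on the new machine, for example with `juju add-unit <application> --to <new-machine-id>`.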

Changed in juju:
milestone:	none → 3.0.0

Canonical Juju QA Bot (juju-qa-bot) on 2022-10-22

Changed in juju:
milestone:	3.0.0 → 3.0.1

Canonical Juju QA Bot (juju-qa-bot) on 2022-11-15

Changed in juju:
milestone:	3.0.1 → 3.0.2

Canonical Juju QA Bot (juju-qa-bot) on 2022-11-16

Changed in juju:
milestone:	3.0.2 → 3.0.3

Juan M. Tirado (tiradojm) wrote on 2023-01-16:

This bug is really old. Reopen it if this error persists on modern Juju versions.

Changed in juju:
status:	Triaged → Won't Fix
