retry-provisioning doesn't retry failed deployments on MAAS
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
juju | Triaged | Medium | Unassigned | none
Bug Description
Using MAAS 2.1.2 (bzr 5555) and Juju 2.0.1:
I tried deploying 6 units of Ubuntu, each with an LXD container also running Ubuntu. Two of the machines failed to deploy (because of bug 1635560, but that is unimportant here; just note that the failure is transient). When I tried retry-provisioning, nothing happened.
⟫ juju status
Model    Controller  Cloud/Region  Version
default  hare        hare          2.0.1

App     Version  Status   Scale  Charm   Store       Rev  OS      Notes
ubuntu  16.04    waiting  8/12   ubuntu  jujucharms  8    ubuntu

Unit       Workload  Agent       Machine  Public address  Ports  Message
ubuntu/0   active    idle        0        10.2.0.54              ready
ubuntu/1*  active    idle        1        10.2.0.55              ready
ubuntu/2   active    idle        2        10.2.0.56              ready
ubuntu/3   active    idle        3        10.2.0.57              ready
ubuntu/4   waiting   allocating  4        10.2.0.52              waiting for machine
ubuntu/5   waiting   allocating  5        10.2.0.53              waiting for machine
ubuntu/6   active    idle        0/lxd/0  10.2.0.61              ready
ubuntu/7   active    idle        1/lxd/0  10.2.0.58              ready
ubuntu/8   active    idle        2/lxd/0  10.2.0.60              ready
ubuntu/9   active    idle        3/lxd/0  10.2.0.59              ready
ubuntu/10  waiting   allocating  4/lxd/0                         waiting for machine
ubuntu/11  waiting   allocating  5/lxd/0                         waiting for machine

Machine  State    DNS        Inst id              Series  AZ
0        started  10.2.0.54  4y3hbp               xenial  Raphael
0/lxd/0  started  10.2.0.61  juju-d0b4d0-0-lxd-0  xenial
1        started  10.2.0.55  4y3hbq               xenial  default
1/lxd/0  started  10.2.0.58  juju-d0b4d0-1-lxd-0  xenial
2        started  10.2.0.56  abnf8x               xenial  Raphael
2/lxd/0  started  10.2.0.60  juju-d0b4d0-2-lxd-0  xenial
3        started  10.2.0.57  x7nfeg               xenial  default
3/lxd/0  started  10.2.0.59  juju-d0b4d0-3-lxd-0  xenial
4        down     10.2.0.52  4y3h7x               xenial  Raphael
4/lxd/0  pending             pending              xenial
5        down     10.2.0.53  4y3h7y               xenial  default
5/lxd/0  pending             pending              xenial
⟫ juju retry-provisioning 5 --debug
18:07:46 INFO juju.cmd supercommand.go:63 running juju [2.0.1 gc go1.6.2]
18:07:46 DEBUG juju.cmd supercommand.go:64 args: []string{"juju", "retry-
18:07:46 INFO juju.juju api.go:72 connecting to API addresses: [10.2.0.51:17070]
18:07:46 INFO juju.api apiclient.go:530 dialing "wss://
18:07:47 INFO juju.api apiclient.go:466 connection established to "wss://
18:07:47 DEBUG juju.juju api.go:263 API hostnames unchanged - not resolving
18:07:47 INFO cmd supercommand.go:465 command finished
Changed in juju:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 2.1.0
Curtis Hovey (sinzui) wrote (#1):
tags: added: maas-provider retry-privisioning
Changed in juju:
importance: Critical → High
Anastasia (anastasia-macmood) wrote (#2):
Removing 2.1 milestone as we will not be addressing this issue in 2.1.
tags: added: retry-provisioning removed: retry-privisioning
Changed in juju:
milestone: 2.1-rc2 → none
Sandor Zeestraten (szeestraten) wrote (#3):
I hit this today on Juju 2.1.1 and MAAS 2.1.3.
retry-provisioning does nothing and the machine is just down/pending.
John A Meinel (jameinel) wrote (#4):
I believe the underlying issue is that MAAS has handed us an 'instance-id', which I think means that we believe we have a concrete, running instance. Which is different from failing-
John A Meinel (jameinel) wrote (#5):
I should also note that MAAS doesn't hand back 'an instance for the request you made' but always hands back an exact identifier for a specific machine. So we have to be a bit careful that 'retry-
tags: added: 4010
tags: added: cdo-qa foundation-engine
tags: added: foundations-engine removed: foundation-engine
tags: removed: foundations-engine
Dmitrii Shcherbakov (dmitriis) wrote (#6):
The same happens for tags updated after 'juju deploy'.
Retry-provisioning should re-query machine metadata when asked to, in my view. It is a manual action, and you presumably know what you are doing.
Instead, one has to remove-machine --force and add-unit again (see the sketch below).
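For concreteness, the workaround sequence looks roughly like this, using machine 5 and the ubuntu application from the status output above (a sketch; adjust the machine number and application name to your deployment):
⟫ juju remove-machine 5 --force
⟫ juju add-unit ubuntu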
tags: added: cpe-onsite
John A Meinel (jameinel) wrote (#7):
FWIW, I think the internal issue is that MAAS has already given us an instance-id, so we think the machine is provisioned. Normally for providers, 'juju retry-provisioning' probably does do some of what you want, but only when an instance hasn't yet been assigned.
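One way to confirm this from the CLI is to inspect the stuck machine directly; a minimal check, assuming machine 5 from the status output above:
⟫ juju show-machine 5
# the YAML output already carries an instance-id (4y3h7y above),
# which is why Juju treats the machine as provisioned and
# retry-provisioning becomes a no-op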
Dmitrii Shcherbakov (dmitriis) wrote (#8):
I think we just need to better define what it means to "provision".
Conceptually, I would use the following definition:
provisioning = <matching a machine by constraints & other criteria> + <successfully deploying once and installing a machine agent>
At least for MAAS this is intuitive, in my view.
If I have to reconfigure a machine, doing retry-provisioning also makes sense, but with the following logic:
1. Juju gets a machine ID;
2. the deployment fails, either automatically or via a manual action, before the machine/unit agents have started;
3. the user releases the machine in MAAS;
4. reconfigures the machine, swaps out hardware, etc.;
5. a manual retry-provisioning detects that the given ID is no longer allocated and tries to allocate a new ID.
The target idea here would be that one could write an orchestrator/
If a node is not suitable, an orchestrator would mark it as broken in MAAS and a different node would be picked, without the remove-machine --force && add-unit steps. A sketch of that flow follows.
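Under the proposed semantics, the operator-side flow might look roughly like this. This is a sketch only: $PROFILE stands for a logged-in MAAS CLI profile, 4y3h7y is the instance ID of the failed machine 5 from the status output above, and the final step describes the proposed behaviour, not what Juju currently does.
⟫ maas $PROFILE machine release 4y3h7y
⟫ maas $PROFILE machine mark-broken 4y3h7y
# ...reconfigure the machine or swap out hardware in MAAS...
⟫ juju retry-provisioning 5
# proposed: detect that 4y3h7y is no longer allocated and allocate a new node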
tags: added: canonical-bootstack
Frode Nordahl (fnordahl) wrote (#9):
This is still an issue with Juju 2.7.8 and MAAS 2.8.2.
My occurrence is a transient MAAS deployment failure (because of *reasons*), and I want Juju to retry so that I can get a working machine.
I see from the bug's discussion history that there is some disagreement about what retry-provisioning means or does. I'll add to the scale: I expected it to mean that Juju could reuse the machine slot it has in its model and either fill it with a new instance, or reach out to MAAS and do a release+deploy dance with the instance ID it already has.
Right now nothing happens and there is zero feedback to the user.
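Done by hand against MAAS, the release+deploy dance described above would look roughly like this (a sketch; $PROFILE is a placeholder for a logged-in MAAS CLI profile and $SYSTEM_ID is the instance ID Juju already holds for the machine):
⟫ maas $PROFILE machine release $SYSTEM_ID
⟫ maas $PROFILE machine deploy $SYSTEM_ID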
tags: added: ps5
Frode Nordahl (fnordahl) wrote (#10):
Typo in comment #9: the Juju version is 2.8.7.
Pete Vander Giessen (petevg) wrote (#11):
Bumping importance to Medium to accurately reflect that this is a legitimate issue, but is not in scope for the current roadmap.
(I agree that it would be very nice to fix.)
Changed in juju:
importance: High → Medium
Why can't Juju retry provisioning automatically? It knows about many cases where provisioning has failed. Juju retries hooks automatically now, and users rarely need to retry them by hand.
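For comparison, the automatic hook retries mentioned above are controlled by a model configuration key on 2.x-era controllers (shown here only as a reference point):
⟫ juju model-config automatically-retry-hooks
# defaults to true: failed hooks are retried without user intervention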