juju deploy doesn't always pick the optimal machine

Bug #1945688 reported by Simon Déziel
This bug affects 1 person
Affects         Status   Importance  Assigned to  Milestone
Canonical Juju  Triaged  Low         Unassigned

Bug Description

To cope with slow VM provisioning on MAAS, I provision many machines ahead of time (`juju add-machine`). This way I don't have to wait long when deploying units, since Juju usually picks a machine that is already "Deployed".

Sometimes, juju decides to put the newly deployed unit on a machine that is still provisioning (machine 100 here):

$ juju deploy ./lxd_ubuntu-20.04-amd64.charm
Located local charm "lxd", revision 35
Deploying "lxd" from local charm "lxd", revision 35

$ juju status
Model  Controller  Cloud/Region  Version  SLA          Timestamp
test   overlord    maas/default  2.9.14   unsupported  18:07:05Z

App           Version  Status   Scale  Charm         Store  Channel  Rev  OS      Message
https-client           active   1      https-client  local           35   ubuntu
lxd                    waiting  0/1    lxd           local           35   ubuntu  waiting for machine

Unit              Workload  Agent       Machine  Public address            Ports  Message
https-client/45*  active    idle        81       2602:fc62:b:1018:0:1:0:5
lxd/26            waiting   allocating  100      2602:fc62:b:1018:0:1:0:e         waiting for machine

Machine  State    DNS                        Inst id     Series  AZ       Message
81       started  2602:fc62:b:1018:0:1:0:5   cloud-vm09  focal   default  Deployed
83       started  2602:fc62:b:1018:0:1:0:9   cloud-vm15  focal   default  Deployed
86       started  2602:fc62:b:1018:0:1:0:7   cloud-vm05  focal   default  Deployed
89       started  2602:fc62:b:1018:0:1:0:6   cloud-vm08  focal   default  Deployed
90       started  2602:fc62:b:1018:0:1:0:f   cloud-vm02  focal   default  Deployed
91       started  2602:fc62:b:1018:0:1:0:2   cloud-vm07  focal   default  Deployed
92       started  2602:fc62:b:1018:0:1:0:3   cloud-vm10  focal   default  Deployed
94       started  2602:fc62:b:1018:0:1:0:d   cloud-vm11  focal   default  Deployed
95       started  2602:fc62:b:1018:0:1:0:12  cloud-vm14  focal   default  Deployed
96       started  2602:fc62:b:1018:0:1::     cloud-vm01  focal   default  Deployed
97       started  2602:fc62:b:1018:0:1:0:c   cloud-vm03  focal   default  Deployed
98       started  2602:fc62:b:1018:0:1:0:13  cloud-vm12  focal   default  Deployed
99       pending  2602:fc62:b:1018:0:1:0:b   cloud-vm04  focal   default  Deploying: Configuring OS
100      pending  2602:fc62:b:1018:0:1:0:e   cloud-vm06  focal   default  Deploying: Configuring OS
101      pending  2602:fc62:b:1018:0:1:0:14  cloud-vm13  focal   default  Deploying: Configuring OS

It should always pick any available machine that's already deployed, not one still deploying. It usually gets this right but not always, or maybe it's random and I am usually lucky ;)
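
Until Juju itself prefers deployed machines, one possible workaround is to choose the target machine by hand. The following is a minimal sketch, not part of Juju: it assumes the status has been obtained as JSON (for example via `juju status --format=json`) and that the field layout used below (`machines`, `juju-status`, `current`, `applications`, `units`, `machine`) matches what the controller returns. The helper name `idle_started_machines` and the sample data are hypothetical.

```python
def idle_started_machines(status):
    """Return IDs of started machines that host no units, lowest ID first.

    `status` is a dict parsed from JSON status output; the key names used
    here are an assumption about that format.
    """
    used = set()
    for app in status.get("applications", {}).values():
        for unit in (app.get("units") or {}).values():
            machine = unit.get("machine")
            if machine:
                # A placement such as "100/lxd/0" still occupies host "100".
                used.add(machine.split("/")[0])
    idle = [
        machine_id
        for machine_id, info in status.get("machines", {}).items()
        if info.get("juju-status", {}).get("current") == "started"
        and machine_id not in used
    ]
    return sorted(idle, key=int)


# Hand-written sample mirroring three machines from the report above.
sample = {
    "machines": {
        "81": {"juju-status": {"current": "started"}},
        "98": {"juju-status": {"current": "started"}},
        "100": {"juju-status": {"current": "pending"}},
    },
    "applications": {
        "https-client": {"units": {"https-client/45": {"machine": "81"}}},
    },
}
print(idle_started_machines(sample))
```

The first ID returned could then be passed to `juju deploy` with the `--to` placement option, so the unit lands on a machine that is already up.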

Additional information:

$ juju --version
2.9.15-ubuntu-amd64

Tags: lxd-cloud
Simon Déziel (sdeziel)
description: updated
Ian Booth (wallyworld) wrote :

Juju's methodology is to set up the desired model, and then work asynchronously to make reality match the model. Juju doesn't block on machines being provisioned when assigning units; it picks a machine which does not yet have anything assigned to it. That machine may well still be provisioning, but once it becomes ready, the Juju agent on the machine will install any units that were allocated to it. Large bundles, for example, benefit from this behaviour.

Selection of available machines is not deterministic, so you may well have been "lucky" previously. Note that Juju doesn't just pick any unused machine: it makes sure that the machine's memory, CPU, disk, etc. match any constraints used when deploying the app/unit.

Should Juju prefer fully provisioned machines, all other things being equal? That approach would reduce the chance of a failed deployment: a machine that is still provisioning might fail to come up, and the unit would then never be deployed. If the unit were instead placed on an unused, already provisioned machine, it would at least be running even if the other machine failed.
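
To make the rule described in this comment concrete, here is a rough sketch of the eligibility check as stated: a machine qualifies if nothing is assigned to it and it satisfies the constraints, whether or not it has finished provisioning. This is an illustration only and is not taken from the Juju source; the `Machine` type, the `eligible` function, the memory and CPU figures, and machine 102 are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    id: int
    state: str           # "started" or "pending", as shown by `juju status`
    mem_mb: int          # illustrative value, not from the report
    cpu_cores: int       # illustrative value, not from the report
    unit_count: int = 0  # units already assigned to this machine


def eligible(machines, min_mem_mb=0, min_cpu_cores=0):
    """Return unused machines that satisfy the constraints.

    Provisioning state is deliberately not checked, mirroring the
    behaviour described in the comment above.
    """
    return [
        m for m in machines
        if m.unit_count == 0
        and m.mem_mb >= min_mem_mb
        and m.cpu_cores >= min_cpu_cores
    ]


machines = [
    Machine(81, "started", 4096, 2, unit_count=1),  # already hosts a unit
    Machine(98, "started", 4096, 2),
    Machine(100, "pending", 4096, 2),               # still provisioning
    Machine(102, "started", 1024, 1),               # hypothetical small machine
]
candidates = eligible(machines, min_mem_mb=2048)
print([m.id for m in candidates])  # prints [98, 100]
```

Both 98 and 100 remain candidates, which is why a unit can end up on a machine that is still "pending".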

Simon Déziel (sdeziel) wrote : Re: [Bug 1945688] Re: juju deploy doesn't always pick the optimal machine

In my specific use case, all machines have identical specs so having Juju
favor those already provisioned but unused would be ideal. Thanks

John A Meinel (jameinel) wrote :

I don't think this is a case that we have encountered before. Most people have specific expectations and don't want to overcommit (have 10 machines allocated but not doing anything).

I don't have any problem having a preference for Machines that are clean, empty, and already 'running'. I'm not sure if the existing infrastructure makes it easy to select this. (My guess is we prefer them based on numerical order, and the ones that you provision first are the most likely to have come up first.)
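
The preference discussed here could be expressed as a sort key: machines that are already started come first, and ties fall back to numerical order. The sketch below is hypothetical and does not reflect how Juju orders machines internally (the numerical-order behaviour is itself only a guess in the comment above); `placement_order` is an invented name, and the (ID, state) pairs are a subset of the machines in the bug description.

```python
def placement_order(candidates):
    """Sort (machine_id, state) pairs: started machines first, then lowest ID."""
    # False sorts before True, so "started" entries lead the list.
    return sorted(candidates, key=lambda c: (c[1] != "started", c[0]))


# (id, state) pairs taken from the `juju status` output in the bug description
candidates = [
    (100, "pending"),
    (98, "started"),
    (99, "pending"),
    (97, "started"),
    (101, "pending"),
]
ordered = placement_order(candidates)
print(ordered[0])  # prints (97, 'started')
```

With this ordering, a pending machine such as 100 would only be chosen once no clean, started machine is left.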

Changed in juju:
importance: Undecided → Low
status: New → Triaged
Simon Déziel (sdeziel)
tags: added: lxd-cloud