juju deploy fails because no machines can be found in MaaS in a given AZ

Bug #1828076 reported by Giuseppe Petralia on 2019-05-07
This bug affects 2 people
Affects: juju
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

When trying to deploy a bundle where machines section looks like this:

'''
machines:
  "0": {constraints: tags=infra zones=AZ1, series: *series}
  "1": {constraints: tags=infra zones=AZ1, series: *series}
  "2": {constraints: tags=infra zones=AZ2, series: *series}
  "3": {constraints: tags=os-cs-vnf zones=AZ1, series: *series}
  "4": {constraints: tags=os-cs-vnf zones=AZ1, series: *series}
  "5": {constraints: tags=os-cs-vnf zones=AZ1, series: *series}
  "6": {constraints: tags=os-cs-vnf zones=AZ1, series: *series}
  "7": {constraints: tags=os-cs-vnf zones=AZ1, series: *series}
  "8": {constraints: tags=os-cs-vnf zones=AZ1, series: *series}
  "9": {constraints: tags=os-cs-vnf zones=AZ1, series: *series}
  "10": {constraints: tags=os-cs-vnf zones=AZ1, series: *series}
  "11": {constraints: tags=os-cs-vnf zones=AZ1, series: *series}
  "12": {constraints: tags=os-cs-vplus zones=AZ1, series: *series}
  "13": {constraints: tags=os-cs-vplus zones=AZ1, series: *series}
  "14": {constraints: tags=os-cs-vplus zones=AZ1, series: *series}
  "15": {constraints: tags=os-cs-vnf zones=AZ2, series: *series}
  "16": {constraints: tags=os-cs-vnf zones=AZ2, series: *series}
  "17": {constraints: tags=os-cs-vnf zones=AZ2, series: *series}
  "18": {constraints: tags=os-cs-vnf zones=AZ2, series: *series}
  "19": {constraints: tags=os-cs-vnf zones=AZ2, series: *series}
  "20": {constraints: tags=os-cs-vnf zones=AZ2, series: *series}
  "21": {constraints: tags=os-cs-vnf zones=AZ2, series: *series}
  "22": {constraints: tags=os-cs-vnf zones=AZ2, series: *series}
  "23": {constraints: tags=os-cs-vnf zones=AZ2, series: *series}
  "24": {constraints: tags=os-cs-vnf zones=AZ2, series: *series}
  "25": {constraints: tags=os-cs-vnf zones=AZ2, series: *series}
  "26": {constraints: tags=os-cs-vplus zones=AZ2, series: *series}
  "27": {constraints: tags=os-cs-vplus zones=AZ2, series: *series}
  "28": {constraints: tags=os-cs-vplus zones=AZ2, series: *series}
  "29": {constraints: tags=os-cs-vnf zones=AZ3, series: *series}
  "30": {constraints: tags=os-cs-vnf zones=AZ3, series: *series}
  "31": {constraints: tags=os-cs-vnf zones=AZ3, series: *series}
  "32": {constraints: tags=os-cs-vnf zones=AZ3, series: *series}
  "33": {constraints: tags=os-cs-vnf zones=AZ3, series: *series}
  "34": {constraints: tags=os-cs-vnf zones=AZ3, series: *series}
  "35": {constraints: tags=os-cs-vnf zones=AZ3, series: *series}
  "36": {constraints: tags=os-cs-vnf zones=AZ3, series: *series}
  "37": {constraints: tags=os-cs-vplus zones=AZ3, series: *series}
  "38": {constraints: tags=os-cs-vnf zones=AZ3, series: *series}
  "39": {constraints: tags=os-cs-vplus zones=AZ3, series: *series}
  "40": {constraints: tags=os-cs-vplus zones=AZ3, series: *series}
'''

No machines can be found in AZ1 from MAAS, and the deploy fails.
Machines in AZ2 and AZ3 are added correctly.

The deploy is run with the option --map-machines=existing
because we manually added machine id "0" using the command:

juju add-machine --constraints "tags=infra,spaces=space-mgmt,space-api,space-sta" ssh:user@IP

This machine is supposed to be in AZ1, but its availability zone is not shown correctly in juju status.

We are using:
Juju version 2.5.4-xenial-amd64
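
To make the request easier to compare against what MAAS offers, the bundle entries can be tallied per tag and zone. The following is a minimal sketch, assuming the entries keep the exact `tags=... zones=...` form shown above; the `ENTRY` pattern and `requested_counts` helper are illustrative names, not part of Juju, and only a handful of the machine entries are copied in.

```python
import re
from collections import Counter

# A few machine entries copied from the bundle above (not the full list).
BUNDLE_LINES = '''
  "0": {constraints: tags=infra zones=AZ1, series: *series}
  "1": {constraints: tags=infra zones=AZ1, series: *series}
  "2": {constraints: tags=infra zones=AZ2, series: *series}
  "3": {constraints: tags=os-cs-vnf zones=AZ1, series: *series}
  "12": {constraints: tags=os-cs-vplus zones=AZ1, series: *series}
'''

# Assumes one tag and one zone per entry, in the order used in this bundle.
ENTRY = re.compile(r'"(\d+)":\s*\{constraints:\s*tags=(\S+)\s+zones=([^,\s]+)')


def requested_counts(text):
    """Count requested machines per (tag, zone) pair."""
    counts = Counter()
    for match in ENTRY.finditer(text):
        _machine_id, tag, zone = match.groups()
        counts[(tag, zone)] += 1
    return counts


counts = requested_counts(BUNDLE_LINES)
for (tag, zone), n in sorted(counts.items()):
    print(f"{zone} {tag}: {n}")
```

Feeding the whole machines section through `requested_counts` gives the per-zone totals that the available MAAS nodes would have to cover.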

tags: added: canonical-bootstack
Tim Penhey (thumper) wrote :

There isn't enough information here to do any debugging from the Juju point of view.

Are you able to provide a list of potential machines from MAAS?

It is also somewhat unclear what the underlying problem is.

Is it that the bundle doesn't deploy?
Is it that the initial machine you have added doesn't show the right zone?
It is possible that the machine you added isn't in zone AZ1 because you didn't ask for it to be in AZ1.

Unless we are able to see all the potential machines from MAAS with their tags, zone, and spaces, there is no way we'd be able to match up the behaviour of Juju to what someone might do manually.

Changed in juju:
status: New → Incomplete

John A Meinel (jameinel) wrote :

Note that they have a manual machine as machine 0, not a machine recognized
by the provider. (I would guess it does ultimately come from the provider
but we don't have a way to know that as we don't have an instance id).

As a manually provisioned machine, we probably don't track things like AZ
or spaces correctly to line up that machine with the machines listed in the
bundle.

John
=:->


Giuseppe Petralia (peppepetra) wrote :

The list of potential machines from MAAS is the following:

$ maas maas-root nodes read zone=AZ1 | jq '.[] | "\(.system_id) \(.zone.name) \(.tag_names)"'
"axsmcd AZ1 [\"os-cs-vnf\",\"physical\"]"
"htt6st AZ1 [\"os-cs-vnf\",\"physical\"]"
"rpbttp AZ1 [\"os-cs-vnf\",\"physical\"]"
"tkba4s AZ1 [\"os-cs-vnf\",\"physical\"]"
"pcnwph AZ1 [\"os-cs-vnf\",\"physical\"]"
"k8nb7d AZ1 [\"os-cs-vnf\",\"physical\"]"
"msntby AZ1 [\"os-cs-vnf\",\"physical\"]"
"8pnr7t AZ1 [\"os-cs-vnf\",\"physical\"]"
"7skp7g AZ1 [\"os-cs-vnf\",\"physical\"]"
"kmc4bp AZ1 [\"os-cs-vnf\",\"physical\"]"
"dfdqde AZ1 [\"physical\",\"os-cs-vplus\"]"
"gcyqfr AZ1 [\"physical\",\"os-cs-vplus\"]"
"a7c7y6 AZ1 [\"physical\",\"os-cs-vplus\"]"
"7cm7sg AZ1 [\"physical\",\"infra\"]"
"hbkdwr AZ1 [\"virtual\",\"juju\"]"

$ maas maas-root nodes read zone=AZ2 | jq '.[] | "\(.system_id) \(.zone.name) \(.tag_names)"'
"xe6y6d AZ2 [\"os-cs-vnf\",\"physical\"]"
"xqcwht AZ2 [\"os-cs-vnf\",\"physical\"]"
"nn8mad AZ2 [\"os-cs-vnf\",\"physical\"]"
"y66t48 AZ2 [\"os-cs-vnf\",\"physical\"]"
"8bat3n AZ2 [\"os-cs-vnf\",\"physical\"]"
"xdxgt6 AZ2 [\"os-cs-vnf\",\"physical\"]"
"wpbm48 AZ2 [\"os-cs-vnf\",\"physical\"]"
"s7smgc AZ2 [\"os-cs-vnf\",\"physical\"]"
"d3wg3t AZ2 [\"os-cs-vnf\",\"physical\"]"
"kgfrpb AZ2 [\"os-cs-vnf\",\"physical\"]"
"m7eqwm AZ2 [\"os-cs-vnf\",\"physical\"]"
"scc768 AZ2 [\"physical\",\"os-cs-vplus\"]"
"fk83sk AZ2 [\"physical\",\"os-cs-vplus\"]"
"cx4xt6 AZ2 [\"physical\",\"os-cs-vplus\"]"
"ms8qx7 AZ2 [\"physical\",\"infra\"]"

$ maas maas-root nodes read zone=AZ3 | jq '.[] | "\(.system_id) \(.zone.name) \(.tag_names)"'
"k4kwta AZ3 [\"os-cs-vnf\",\"physical\"]"
"bqbw3p AZ3 [\"os-cs-vnf\",\"physical\"]"
"66gwcc AZ3 [\"os-cs-vnf\",\"physical\"]"
"73twnf AZ3 [\"os-cs-vnf\",\"physical\"]"
"qcxx4a AZ3 [\"os-cs-vnf\",\"physical\"]"
"me3em4 AZ3 [\"os-cs-vnf\",\"physical\"]"
"t8srpe AZ3 [\"os-cs-vnf\",\"physical\"]"
"4ccgs3 AZ3 [\"os-cs-vnf\",\"physical\"]"
"mm6wkr AZ3 [\"physical\",\"os-cs-vplus\"]"
"44xkst AZ3 [\"os-cs-vnf\",\"physical\"]"
"7w7nne AZ3 [\"physical\",\"os-cs-vplus\"]"
"6qmfbx AZ3 [\"physical\",\"os-cs-vplus\"]"

All machines were in Ready state when we hit the bug.

The problems that we faced were:
- The bundle doesn't deploy because no machines in AZ1 can be found, even though there are
  available machines satisfying the constraints.
- The initial machine doesn't show the right zone.
- Even when we explicitly ask for the first machine to be added in AZ1,
  the deploy still fails because no machines can be found in AZ1.

description: updated
John A Meinel (jameinel) wrote :

According to your initial report, you were using:
  "0": {constraints: tags=infra zones=AZ1, series: *series}
  "1": {constraints: tags=infra zones=AZ1, series: *series}
  "2": {constraints: tags=infra zones=AZ2, series: *series}

However, from what you listed from MAAS, there is

a) Only 1 node tagged "infra" in AZ1, and you have requested 2
b) No nodes in AZ2 that are tagged "infra", and you have requested one.

As mentioned earlier, if you did have a manually provisioned and added
machine, that does throw a bit of a wrench in the works if you are asking
Juju to then match up machines in the model to hypothetical machines in the
bundle.
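
This kind of cross-check can be scripted. The following is a minimal sketch, assuming the quoted jq output format from the MAAS listing in this thread; `parse_maas_line`, `available_counts` and `shortfalls` are illustrative helpers, the requested counts are typed in by hand from the three bundle entries quoted above, and only three of the listed nodes are copied in.

```python
import json
from collections import Counter

# Three lines copied from the MAAS listing in this thread (not the full list).
MAAS_LINES = [
    r'"7cm7sg AZ1 [\"physical\",\"infra\"]"',
    r'"ms8qx7 AZ2 [\"physical\",\"infra\"]"',
    r'"axsmcd AZ1 [\"os-cs-vnf\",\"physical\"]"',
]

# Requested "infra" machines, taken from bundle machines "0", "1" and "2".
REQUESTED = {("infra", "AZ1"): 2, ("infra", "AZ2"): 1}


def parse_maas_line(line):
    """Split one jq output line into (system_id, zone, tags)."""
    text = json.loads(line)  # each line is a JSON-encoded string
    system_id, zone, tags = text.split(" ", 2)
    return system_id, zone, json.loads(tags)


def available_counts(lines):
    """Count nodes per (tag, zone) pair."""
    counts = Counter()
    for line in lines:
        _system_id, zone, tags = parse_maas_line(line)
        for tag in tags:
            counts[(tag, zone)] += 1
    return counts


def shortfalls(requested, available):
    """Return {(tag, zone): (requested, available)} where supply is short."""
    return {
        key: (want, available.get(key, 0))
        for key, want in requested.items()
        if available.get(key, 0) < want
    }


result = shortfalls(REQUESTED, available_counts(MAAS_LINES))
print(result)
```

With the lines as posted, the sketch flags only the AZ1 `infra` request (2 requested, 1 listed), since `ms8qx7` is listed with the `infra` tag in AZ2. It says nothing about how Juju treats the manually added machine 0.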


Dmitrii Shcherbakov (dmitriis) wrote :

A duplicate created earlier this year: https://bugs.launchpad.net/juju/+bug/1819365

John A Meinel (jameinel) wrote :

I don't think that is the same bug. 1819365 is because *juju* is adding a
Zone placement automatically to support enforced HA (distribute units to
unique AZ so that singular failures don't take out multiple units).
This bug is that they are requesting an explicit AZ, and it is failing.


Peter Sabaini (peter-sabaini) wrote :

I'm seeing this in a similar (but not identical) environment. As in the case above, machine #0 is added via ssh before deploying the bundle:

$ juju add-machine --constraints "tags=infra,spaces=space-mgmt,space-api,space-sta" ssh:user@ipaddr
$ juju deploy --debug ./$BUNDLE0 --map-machines=existing

This results in "suitable availability zone for machine X not found" messages for all AZ1 nodes, even though there are in fact plenty of nodes available in MAAS in AZ1.

juju status: https://private-fileshare.canonical.com/~sabaini/lp1828076/juju-status.txt

Minimal test bundle: https://pastebin.canonical.com/p/rp6rTSG9mC/

> Are you able to provide a list of potential machines from MAAS?

Here:
$ maas maas-root nodes read zone=AZ1 | jq -r '.[] | "\(.system_id) \(.zone.name) \(.tag_names)"' | sort
6cpceq AZ1 ["physical","os-cs-vplus"]
8kwayw AZ1 ["physical","os-cs-vnf"]
d3cp3w AZ1 ["physical","os-cs-vnf"]
dxbn87 AZ1 ["physical","os-cs-vnf"]
eka6sd AZ1 ["physical","os-cs-vplus"]
ew8rw3 AZ1 ["physical","os-cs-vnf"]
ffbbse AZ1 ["physical","os-cs-vnf"]
gdrhba AZ1 ["physical","os-cs-vnf"]
h3pf8x AZ1 ["physical","os-cs-vnf"]
hawfsg AZ1 ["physical","os-cs-vplus"]
kgfkrc AZ1 ["physical","os-cs-vplus"]
rcmmkm AZ1 ["physical","os-cs-vnf"]
wtrfwr AZ1 ["physical","os-cs-vnf"]
x7dkdm AZ1 ["physical","os-cs-vplus"]
xqswcf AZ1 ["virtual","juju"]
yp83ae AZ1 ["physical","os-cs-vnf"]

> It is also somewhat unclear what the underlying problem is.
>
> Is it that the bundle doesn't deploy?

Right, the bundle doesn't deploy as it's missing nodes from AZ1.

The expectation would be that, upon bundle deploy, Juju recognizes the already-added machine #0 and requests the additional machines named in the bundle.

If it helps, here are TRACE logs from the controller: https://private-fileshare.canonical.com/~sabaini/lp1828076/juju-ctrl-logs.tar.gz

Changed in juju:
status: Incomplete → New
Tim Penhey (thumper) wrote :

Is this still happening? What work-arounds are being done?

Changed in juju:
status: New → Incomplete
Peter Sabaini (peter-sabaini) wrote :

We haven't had this particular use case recently (i.e. having an ssh-provided machine in a model and deploying a bundle with --map-machines=existing on top).

The workaround at the time was to add all the nodes first with add-machine (juju add-machine --series xenial zone=$zone --constraints tags=os-cs-vnf), then deploy the bundle with --map-machines=existing.
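
For illustration, the workaround can be written as a small generator of the commands to run. This is only a sketch: the per-zone counts, the bundle path `./bundle.yaml` and the `workaround_commands` helper are hypothetical, and the script prints the commands rather than executing them.

```python
import shlex


def workaround_commands(per_zone, tag, series, bundle_path):
    """Build the add-machine commands, followed by the bundle deploy."""
    commands = []
    for zone, count in sorted(per_zone.items()):
        for _ in range(count):
            commands.append([
                "juju", "add-machine",
                "--series", series,
                f"zone={zone}",
                "--constraints", f"tags={tag}",
            ])
    # Deploy last, so the bundle machines map onto the ones just added.
    commands.append(["juju", "deploy", bundle_path, "--map-machines=existing"])
    return commands


# Hypothetical counts; in practice they would come from the bundle.
for command in workaround_commands({"AZ1": 2, "AZ2": 1}, "os-cs-vnf", "xenial", "./bundle.yaml"):
    print(shlex.join(command))
```

Since --map-machines=existing matches bundle machines to model machines by id, the order in which machines are added would need to follow the numbering in the bundle.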

Changed in juju:
status: Incomplete → New
Changed in juju:
status: New → Triaged
importance: Undecided → Medium