Zone constraints cause placement to fail although there are available machines

Bug #1819365 reported by Pedro Guimarães on 2019-03-10
This bug affects 3 people
Affects  Status  Importance  Assigned to    Milestone
juju             High        Harry Pidcock

Bug Description

Juju version: 2.5.2 (candidate channel on snap package)

Here is the machine placement for this deployment: https://pastebin.canonical.com/p/TZNCrPMXvc/

The final result is: https://pastebin.canonical.com/p/pc8nfnRc62/

I can see failures such as:

22 down pending bionic No available machine matches constraints: [('agent_name', ['3df6e30c-a88d-41c1-8489-636881af71eb']), ('tags', ['landscape']), ('zone', ['AZ3'])] (resolved to "tags=landscape zone=AZ3")

Although placement for landscape machines is defined as:
  "15":
    constraints: tags=landscape

I do not define any zone constraint for those nodes, but Juju reports that it cannot find a node on which to place the landscape application.
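
To illustrate how an implicitly added zone can turn a satisfiable request into an unsatisfiable one, here is a minimal Python sketch. The inventory, the `Node` type and the `matches` helper are hypothetical and are not part of Juju or MAAS. The sketch only assumes that a zone gets appended to the user's `tags=landscape` constraint before matching, as the resolved constraint in the error above suggests.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Node:
    """A hypothetical free machine in the inventory (not a MAAS type)."""
    name: str
    zone: str
    tags: frozenset = field(default_factory=frozenset)


def matches(node: Node, tags: set, zone: Optional[str] = None) -> bool:
    """Return True if the node carries every requested tag and, when a
    zone is requested, sits in that zone."""
    if not tags <= node.tags:
        return False
    return zone is None or node.zone == zone


# Illustrative inventory (assumption): landscape-tagged nodes exist,
# but none of them is in AZ3.
inventory = [
    Node("node-a", "AZ1", frozenset({"landscape"})),
    Node("node-b", "AZ2", frozenset({"landscape"})),
    Node("node-c", "AZ3", frozenset({"compute"})),
]

# What the bundle asks for: tags only.
tag_only = [n.name for n in inventory if matches(n, {"landscape"})]

# What the error shows was requested: tags plus a zone nobody asked for.
tag_and_zone = [n.name for n in inventory if matches(n, {"landscape"}, "AZ3")]

print(tag_only)
print(tag_and_zone)
```

With this made-up inventory the tag-only request has candidates, while the request narrowed to AZ3 has none, which mirrors the shape of the reported error without claiming to reproduce its cause.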

I will add juju-crashdump as soon as I'm able to download it.

Pedro Guimarães (pguimaraes) wrote :

I've found a similar bug: https://bugs.launchpad.net/juju/+bug/1786309. However, I believe it discusses a different issue that ends in the same final error.

Juju defaults to 'least-used' HA placement: if you deploy APP -n3, we spread the units of the app across AZ1, AZ2 and AZ3, to actually get the benefit of *availability* zones and prevent a single failure from taking out multiple units.

I believe we do have code that says "try the next AZ; if we get a failure provisioning there, skip that AZ and try the next one". However, we need to be able to tell that the failure is AZ-specific, and not some other could-never-be-satisfied error.
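
The behaviour described in this comment can be sketched roughly as follows. This is not Juju's provisioner code: the exception classes, `zones_least_used_first`, `place` and the `start_instance` callback are all hypothetical names chosen for illustration. The sketch assumes that a provisioning attempt can signal whether its failure is tied to one zone or can never be satisfied.

```python
class ZoneFailure(Exception):
    """Provisioning failed for a reason tied to one zone (hypothetical)."""


class UnsatisfiableConstraints(Exception):
    """No zone could ever satisfy the request (hypothetical)."""


def zones_least_used_first(usage: dict) -> list:
    """Order zones by how many units they already host, ties by name."""
    return sorted(usage, key=lambda zone: (usage[zone], zone))


def place(usage: dict, start_instance) -> str:
    """Try each zone, least used first.

    A ZoneFailure skips to the next zone. Any other exception raised by
    start_instance (including UnsatisfiableConstraints) propagates at
    once, because retrying in another zone would not help.
    """
    tried = []
    for zone in zones_least_used_first(usage):
        try:
            start_instance(zone)
        except ZoneFailure:
            tried.append(zone)
            continue
        usage[zone] += 1
        return zone
    raise UnsatisfiableConstraints(f"no suitable zone among {tried}")


def fake_start(zone: str) -> None:
    # Assumption for the example: only AZ1 has a free matching machine.
    if zone != "AZ1":
        raise ZoneFailure(zone)


usage = {"AZ1": 1, "AZ2": 0, "AZ3": 0}
chosen = place(usage, fake_start)
print(chosen, usage)
```

The point of separating the two exception types is the one made above: a zone-specific failure should move on to the next zone, whereas a request that no zone can satisfy should fail once with a clear message rather than being reported against whichever zone happened to be tried.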

Dmitrii Shcherbakov (dmitriis) wrote :

This looks specific to the case where you also specify tags as constraints, but I managed to reproduce it using only an AZ constraint.

The first batch of tests makes it feel like tags are a problem:

# success
juju add-machine --constraints zones=AZ1
juju add-machine --constraints zones=AZ2
juju add-machine --constraints zones=AZ3

# immediate failure
juju add-machine --constraints 'tags=control-openstack zones=AZ1'
created machine 3

juju status
Model      Controller        Cloud/Region      Version  SLA          Timestamp
openstack  foundations-maas  foundations-maas  2.5.2    unsupported  01:52:35Z

Machine  State    DNS          Inst id  Series  AZ   Message
0        pending  192.0.2.159  xfyxt6   bionic  AZ1  Deploying: 'cloudinit' searching for network datasources
1        pending  192.0.2.161  qexg6r   bionic  AZ2  Deploying: /images/ubuntu/amd64/ga-18.04/bionic/daily/squashfs
2        pending  192.0.2.163  836e34   bionic  AZ3  Deploying: /images/ubuntu/amd64/ga-18.04/bionic/daily/boot-initrd
3        down                  pending  bionic       suitable availability zone for machine 3 not found

I tried many different tags, and there are definitely free nodes in the right AZ with the right tag. As soon as a tag constraint was added, juju failed immediately with the message above (no retries, and retry-provisioning doesn't work either).

2019-03-14 02:01:41 ERROR juju.provisioner provisioner_task.go:1178 cannot start instance for machine "<num>": suitable availability zone for machine <num> not found

I then did some random testing as below and managed to make juju fail when adding nodes in AZ1 with `juju add-machine --constraints 'zones=AZ1'` (see the screenshot: quite a few machines are ready in AZ1):

juju status
Model      Controller        Cloud/Region      Version  SLA          Timestamp
openstack  foundations-maas  foundations-maas  2.5.2    unsupported  02:03:48Z

Machine  State    DNS          Inst id  Series  AZ   Message
0        started  192.0.2.159  xfyxt6   bionic  AZ1  Deployed
1        started  192.0.2.161  qexg6r   bionic  AZ2  Deployed
2        started  192.0.2.163  836e34   bionic  AZ3  Deployed
3        down                  pending  bionic       suitable availability zone for machine 3 not found
4        down                  pending  bionic       suitable availability zone for machine 4 not found
5        down                  pending  bionic       suitable availability zone for machine 5 not found
6        down                  pending  bionic       suitable availability zone for machine 6 not found
7        down                  pending  bionic       suitable availability zone for machine 7 not found
8        down                  pending  bionic       suitable availability zone for machine 8 not found
9        down                  pending  bionic       suitable availability zone for machine 9 not found
10       pending  192.0.2.164  pgk84a   bionic  AZ2  Deploying: Power state queried: on
11       down                  pending  bionic       suitable availability zone for machine 11 not found
12       down                  pending  bionic       ...


tags: added: cpe-onsite
Tim Penhey (thumper) wrote :

I know that some work happened in this area. Could we get this retested with the latest stable Juju?

Changed in juju:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired
Harry Pidcock (hpidcock) on 2020-02-27
Changed in juju:
status: Expired → Fix Committed
assignee: nobody → Harry Pidcock (hpidcock)
milestone: none → 2.7.4
importance: Undecided → High
Changed in juju:
milestone: 2.7.4 → 2.7.5
Changed in juju:
status: Fix Committed → Fix Released