Zone constraints cause placement to fail although there are available machines

Bug #1819365 reported by Pedro Guimarães
This bug affects 3 people
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Harry Pidcock

Bug Description

Juju version: 2.5.2 (candidate channel on snap package)

Here is the machine placement for this deployment: https://pastebin.canonical.com/p/TZNCrPMXvc/

The final result is: https://pastebin.canonical.com/p/pc8nfnRc62/

I can see failures such as:

22 down pending bionic No available machine matches constraints: [('agent_name', ['3df6e30c-a88d-41c1-8489-636881af71eb']), ('tags', ['landscape']), ('zone', ['AZ3'])] (resolved to "tags=landscape zone=AZ3")

However, the placement for the landscape machines is defined as:
  "15":
    constraints: tags=landscape

I do not define any zone constraint for those machines, yet Juju reports that it cannot find a node to host the landscape application.

I will attach a juju-crashdump as soon as I am able to download it.

Tags: cpe-onsite
Pedro Guimarães (pguimaraes) wrote :

I've found a similar bug: https://bugs.launchpad.net/juju/+bug/1786309. However, I believe it describes a different issue that ends in the same error.

John A Meinel (jameinel) wrote : Re: [Bug 1819365] Re: Zone constraints cause placement to fail although there are available machines

Juju defaults to 'least-used' HA placement. (So if you deploy APP -n3, we
will spread the units of the app to AZ1, AZ2, AZ3 to actually get the
benefit of *availability* zones to prevent a single failure from taking out
multiple units).
I believe we do have code that says "try the next AZ, if we get a failure
provisioning there, skip that AZ and try the next one". However, we need to
understand the failure is AZ specific, and not some other
could-never-be-satisfied error.

On Sun, Mar 10, 2019 at 10:45 PM Pedro Guimarães <email address hidden>
wrote:

> I've found a similar bug: https://bugs.launchpad.net/juju/+bug/1786309.
> But I consider it is discussing a different issue, with same final
> error.

Dmitrii Shcherbakov (dmitriis) wrote :

This looks specific to the case where tags are also specified as constraints, but I managed to reproduce it using only an AZ constraint.

The first batch of tests makes it feel like tags are a problem:

# success
juju add-machine --constraints zones=AZ1
juju add-machine --constraints zones=AZ2
juju add-machine --constraints zones=AZ3

# immediate failure
juju add-machine --constraints 'tags=control-openstack zones=AZ1'
created machine 3

juju status
Model      Controller        Cloud/Region      Version  SLA          Timestamp
openstack  foundations-maas  foundations-maas  2.5.2    unsupported  01:52:35Z

Machine  State    DNS          Inst id  Series  AZ   Message
0        pending  192.0.2.159  xfyxt6   bionic  AZ1  Deploying: 'cloudinit' searching for network datasources
1        pending  192.0.2.161  qexg6r   bionic  AZ2  Deploying: /images/ubuntu/amd64/ga-18.04/bionic/daily/squashfs
2        pending  192.0.2.163  836e34   bionic  AZ3  Deploying: /images/ubuntu/amd64/ga-18.04/bionic/daily/boot-initrd
3        down                  pending  bionic       suitable availability zone for machine 3 not found

I tried many different tags, and there are definitely free nodes in the right AZ with the right tag. As soon as a tag constraint was added, Juju immediately failed with the message above (no retries, and juju retry-provisioning does not help either).

2019-03-14 02:01:41 ERROR juju.provisioner provisioner_task.go:1178 cannot start instance for machine "<num>": suitable availability zone for machine <num> not found

I then did some random testing, shown below, and managed to make Juju fail to add nodes in AZ1 with `juju add-machine --constraints 'zones=AZ1'` (see the screenshot; quite a few machines are ready in AZ1):

juju status
Model      Controller        Cloud/Region      Version  SLA          Timestamp
openstack  foundations-maas  foundations-maas  2.5.2    unsupported  02:03:48Z

Machine  State    DNS          Inst id  Series  AZ   Message
0        started  192.0.2.159  xfyxt6   bionic  AZ1  Deployed
1        started  192.0.2.161  qexg6r   bionic  AZ2  Deployed
2        started  192.0.2.163  836e34   bionic  AZ3  Deployed
3        down                  pending  bionic       suitable availability zone for machine 3 not found
4        down                  pending  bionic       suitable availability zone for machine 4 not found
5        down                  pending  bionic       suitable availability zone for machine 5 not found
6        down                  pending  bionic       suitable availability zone for machine 6 not found
7        down                  pending  bionic       suitable availability zone for machine 7 not found
8        down                  pending  bionic       suitable availability zone for machine 8 not found
9        down                  pending  bionic       suitable availability zone for machine 9 not found
10       pending  192.0.2.164  pgk84a   bionic  AZ2  Deploying: Power state queried: on
11       down                  pending  bionic       suitable availability zone for machine 11 not found
12       down                  pending  bionic       ...


tags: added: cpe-onsite
Tim Penhey (thumper) wrote :

I know that some work happened in this area. Could we get this retested with the latest stable Juju?

Changed in juju:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired
Harry Pidcock (hpidcock) wrote :
Harry Pidcock (hpidcock)
Changed in juju:
status: Expired → Fix Committed
assignee: nobody → Harry Pidcock (hpidcock)
milestone: none → 2.7.4
importance: Undecided → High
Harry Pidcock (hpidcock) wrote :
Changed in juju:
milestone: 2.7.4 → 2.7.5
Changed in juju:
status: Fix Committed → Fix Released