juju tries to acquire machines in specific zones even when no zone placement directive is specified

Bug #1706462 reported by Jason Hobbs
This bug affects 2 people
Affects         Status    Importance  Assigned to  Milestone
Canonical Juju  Triaged   Low         Unassigned
MAAS            Invalid   Undecided   Unassigned

Bug Description

This causes MAAS to create new machines in the zone requested, rather than finding a machine outside of that zone that already exists. See the comments from bug 1706196.

When you tell juju to deploy, bootstrap, or do anything else that acquires a machine, it asks MAAS for a list of zones and then asks MAAS for a machine zone by zone, with the expectation that if one zone doesn't have a machine, MAAS will say no machines are available and juju will go to the next zone and try it.

That worked fine (albeit inefficiently) until MAAS added pod support. Now, instead of saying no machines are available in that zone, MAAS will create a new machine and return it to juju.

Perhaps juju should not include a zone constraint when acquiring machines, when no zone constraint has been supplied by the user.

This is with juju 2.2.2.

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.2.3
status: New → Triaged
importance: Undecided → High
tags: added: foundations-engine
John A Meinel (jameinel) wrote :

This happens because Juju has a default policy of spreading instances across Availability Zones because that's what the "Availability" portion is supposed to mean.

As an example, deploying 3 units of an application on AWS tries to put the first unit on zone a, then the second unit on zone b, and the third in zone c, so that if any zone fails, then you still have availability for your application.

If we defaulted to just never supplying zones, then all instances would end up in the same zone, which means that a hardware failure would kill all instances.
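
As a rough illustration of the spreading policy (a sketch only, not Juju's actual distribution code; `nextZone` and the zone names are made up for the example), each new unit can be sent to whichever zone currently holds the fewest units of the application:

```go
package main

import "fmt"

// nextZone returns the zone holding the fewest units so far.
// Ties go to the zone that appears first in the list.
// It returns "" when no zones are given.
func nextZone(zones []string, unitsPerZone map[string]int) string {
	if len(zones) == 0 {
		return ""
	}
	best := zones[0]
	for _, zone := range zones[1:] {
		if unitsPerZone[zone] < unitsPerZone[best] {
			best = zone
		}
	}
	return best
}

func main() {
	zones := []string{"a", "b", "c"}
	placed := map[string]int{}
	for unit := 0; unit < 3; unit++ {
		zone := nextZone(zones, placed)
		placed[zone]++
		fmt.Printf("unit %d -> zone %s\n", unit, zone)
	}
	// unit 0 -> zone a
	// unit 1 -> zone b
	// unit 2 -> zone c
}
```

Dropping the zone from the request entirely would remove this choice from Juju and leave placement to the provider.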

It seems that MAAS is using AZ to mean both "failure domain" and "a group of machines collected together", such that while there may be domains that we should spread units across, there are other domains that should be considered 'off limits' for provisioning.

I don't think we can just "not pass a zone if the user didn't specify it", because the default path behavior for all providers that I've seen is to use a single zone for instances if you don't ask for something else.

We may need a way to flag a zone as 'off limits' for automatic provisioning, which would need a fair bit of work to track which zones exist and how usable each one is.

Another possibility would be to add a constraint to applications, where users could specify something like:
 juju deploy application --constraints "zones=a,b,c"
And then rather than listing all Availability Zones, and round-robinning across them, we would only round-robin across the explicit set that was passed.

And if it was really useful, we could potentially support negated zones, so something like:
 juju deploy application --constraints "zones=^d,^e"

(If there are no positive zones listed, then the set of valid zones is all zones minus the negated ones.)

It feels like MAAS needs something other than "Availability Zones" as a mechanism for grouping machines, though. Because AZ is something that you *should* spread across, while a collection of machines used for a specific purpose is something that you *shouldn't* use for anything but that purpose. It might be a *Zone* but it isn't an *Availability* Zone.

Changed in juju:
milestone: 2.2.3 → 2.3.0
Tim Penhey (thumper) wrote :

I agree with John's summary. Due to the significant change in behaviour, this should be a 2.3 change, not 2.2.

We should add a model config option for "deploy-constraints". This should be a string that is parsed into a constraints.Value type. This type should gain

Zones *[]string

in a similar manner to spaces. It should be able to handle both inclusive and negative values. We already have the code to handle spaces in this way, so adding zones shouldn't be too much work.

Model deploy constraints are used unless specifically overridden by constraints passed in during deploy. This would allow a model to be created in maas that says "zones=^pod" to deal with this issue.
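
The override rule can be sketched as below. The `Value` struct here is a cut-down, hypothetical stand-in for constraints.Value that carries only the two fields discussed, and `applyModelDefaults` is an invented helper, so treat this as an illustration of the proposal rather than Juju's implementation.

```go
package main

import "fmt"

// Value is a hypothetical, cut-down stand-in for constraints.Value.
// Pointer fields let nil mean "not specified", as distinct from an empty list.
type Value struct {
	Spaces *[]string
	Zones  *[]string
}

// applyModelDefaults returns the deploy-time constraints with every
// unspecified field filled in from the model-level constraints.
func applyModelDefaults(deploy, model Value) Value {
	out := deploy
	if out.Spaces == nil {
		out.Spaces = model.Spaces
	}
	if out.Zones == nil {
		out.Zones = model.Zones
	}
	return out
}

func main() {
	model := Value{Zones: &[]string{"^pod"}}

	// No zones given at deploy time: the model default applies.
	inherited := applyModelDefaults(Value{}, model)
	fmt.Println(*inherited.Zones) // [^pod]

	// Zones given at deploy time: they override the model default.
	overridden := applyModelDefaults(Value{Zones: &[]string{"a", "b"}}, model)
	fmt.Println(*overridden.Zones) // [a b]
}
```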

Changed in maas:
status: New → Invalid
tags: removed: foundations-engine
tags: added: foundations-engine
Changed in juju:
assignee: nobody → Eric Claude Jones (ecjones)
Tim Penhey (thumper)
Changed in juju:
milestone: 2.3.0 → 2.3-rc1
Changed in juju:
assignee: Eric Claude Jones (ecjones) → nobody
Changed in juju:
assignee: nobody → Eric Claude Jones (ecjones)
Ian Booth (wallyworld)
summary: juju tries to acquire machines in specific zones even when no zone
- constraint is specified
+ placement directive is specified
james beedy (jamesbeedy) wrote :

@ecjones @wallyworld feel like I'm hitting something similar on aws too http://paste.ubuntu.com/25968550/

james beedy (jamesbeedy) wrote :

@jameinel might you have 2 cents to give here?

Heather Lanigan (hmlanigan) wrote :

@jamesbeedy, there's a tiny change that went in after beta3, in edge now, that might resolve the problem you're seeing in the pastebin. What you're seeing is mostly due to the recent provisioner changes.

Tim Penhey (thumper)
Changed in juju:
milestone: 2.3-rc1 → 2.3.1
John A Meinel (jameinel) wrote : Re: [Bug 1706462] Re: juju tries to acquire machines in specific zones even when no zone placement directive is specified

If you're seeing panics in the firewaller, you should be able to increase
the debug level to TRACE and be able to see the traceback for the panic,
which would help us determine where the nil reference lies.

On Wed, Nov 15, 2017 at 8:49 PM, james beedy <email address hidden> wrote:

> @ecjones @wallyworld feel like I'm hitting something similar on aws too
> http://paste.ubuntu.com/25968550/

Changed in juju:
status: Triaged → In Progress
status: In Progress → Triaged
Changed in juju:
milestone: 2.3.1 → none
Tim Penhey (thumper)
Changed in juju:
milestone: none → 2.3.2
John A Meinel (jameinel)
Changed in juju:
milestone: 2.3.2 → none
assignee: Eric Claude Jones (ecjones) → nobody
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: High → Low
tags: added: expirebugs-bot