juju doesn't randomise AZ selection where AZs have equal machines

Bug #1933690 reported by Junien F
This bug affects 1 person
Affects         Status        Importance  Assigned to  Milestone
Canonical Juju  Fix Released  High        Ian Booth

Bug Description

Hi,

We are using Juju version 2.8.9 and deploy on an OpenStack cluster which has 3 AZs; each AZ has a single node for now. Here is the number of VMs on each node:

AZ1 node : 99
AZ2 node : 67
AZ3 node : 54

Something seems broken here.

I found the related bug 1919069, but that bug covers controllers only.

All machines are deployed from, essentially, a single Juju HA controller, in a lot of models where each model has its own OpenStack credentials and cannot see the machines in the other models.

I don't see any randomness in https://github.com/juju/juju/blob/e13c61a8c69da5ada341012f9d971f6f9e9b78e8/provider/common/availabilityzones.go, so I suspect that, for example, the first machine in a model will always go in AZ1. However, I don't know if a Juju controller will consider all the machines from all the models when inferring the "popularity" of an AZ, or if it will only consider the machines that the current model credentials can see.

Thanks

Tags: canonical-is
Ian Booth (wallyworld) wrote :

// Machines are spread across availability zones based on lowest population of
// the "available" zones, and any supplied zone constraints.

But this only applies to the machines in the current model IIANM. So Juju won't consider what's been done for other models, or indeed other controllers, or nodes created totally outside Juju itself. Within a given model, the spread should be even, though.

Junien F (axino) wrote :

But even within a model, the first machine will always go to AZ1, no?

Ian Booth (wallyworld) wrote :

Yeah, I think so; all things being equal with regard to number of machines in each AZ, I think it just picks the first one. We could randomise the choice of AZ in that case.
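
To make the tie behaviour concrete, here is a minimal Go sketch of a "lowest population, first one wins" selection. It is not the code in provider/common/availabilityzones.go; the type zoneCount and the function pickFirstLeastPopulated are hypothetical names used only for illustration.

```go
package main

// zoneCount is a hypothetical record of how many machines the current
// model has in one availability zone.
type zoneCount struct {
	Name     string
	Machines int
}

// pickFirstLeastPopulated returns the name of the first zone that has the
// lowest machine count. Because only a strictly smaller count replaces the
// current best, ties always resolve to the earliest zone in the slice.
func pickFirstLeastPopulated(zones []zoneCount) string {
	if len(zones) == 0 {
		return ""
	}
	best := zones[0]
	for _, z := range zones[1:] {
		if z.Machines < best.Machines {
			best = z
		}
	}
	return best.Name
}
```

With three empty zones listed as AZ1, AZ2, AZ3, this always returns AZ1, which matches the suspicion that the first machine of every model lands in the same AZ.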

Changed in juju:
milestone: none → 2.9-next
status: New → Triaged
importance: Undecided → Wishlist
Junien F (axino) wrote :

FYI there's no clean way to move openstack VMs to a different AZ once they're created by juju, so this is pretty bad...

Ian Booth (wallyworld) wrote :

In investigating this bug, I have found a related issue. To allocate machines across AZs, the provisioner worker maintains a list of the AZs and how many machines are in each one. But this list is reset any time the controller agent restarts, so after an upgrade, for example, all the known machine allocations are set back to 0 and Juju makes poor decisions from that point about how to allocate the machines. So this makes the issue one that needs fixing.

Changed in juju:
importance: Wishlist → High
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress
milestone: 2.9-next → 2.9.9
summary: - openstack : juju doesn't distribute units across AZs
+ juju doesn't randomise AZ selection where AZs have equal machines
John A Meinel (jameinel) wrote : Re: [Bug 1933690] Re: openstack : juju doesn't distribute units across AZs

I thought at startup we pulled through which AZs were in use from our list
of machines. If we aren't, then the provisioning code is certainly incorrect.

On Wed, Jul 7, 2021 at 9:50 PM Ian Booth <email address hidden> wrote:

> In investigating this bug, I have found a related issue. To allocate
> machines across AZs, the provisioner worker maintains a list of the AZs
> and how many machines are in each one. But, this list is reset any time
> the controller agent restarts, so after an upgrade for example, all the
> known machine allocations are set back to 0 and Juju makes poor
> decisions from that point about how to allocate the machines. So this
> makes the issue as one that needs fixing.

Ian Booth (wallyworld) wrote :

@jam it "does" but the process of doing that requires that we have first queried the running instances and that only happens later in the worker loop. It's been broken for a while.
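
The ordering problem can be illustrated with a small Go sketch: the per-zone counts have to be rebuilt from the running instances before the first allocation decision is made, otherwise every zone looks empty after a restart. This is only an illustration under assumed names (instance, countByZone), not the provisioner worker's actual code.

```go
package main

// instance is a hypothetical view of a running cloud instance and the
// availability zone it was placed in.
type instance struct {
	ID   string
	Zone string
}

// countByZone seeds every known zone with zero, then tallies the running
// instances. Instances in zones that are not in the known list are ignored.
func countByZone(zones []string, running []instance) map[string]int {
	counts := make(map[string]int, len(zones))
	for _, z := range zones {
		counts[z] = 0
	}
	for _, inst := range running {
		if _, known := counts[inst.Zone]; known {
			counts[inst.Zone]++
		}
	}
	return counts
}
```

If a zone choice is made before something like countByZone has run against the real instance list, the choice is based on all-zero counts, which is the "poor decisions" case described earlier in the thread.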

Ian Booth (wallyworld) wrote :

https://github.com/juju/juju/pull/13148

We'll pick a random AZ if there's more than one with the lowest machine count.

Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Loïc Gomez (kotodama)
tags: added: canonical-is