[VM Provisioning] If constraints: zones=<ZONE> causes Juju and MAAS to provision new VMs on same node of correct zone in disrespect of overcommit restrictions

Bug #1842896 reported by Pedro Guimarães
This bug affects 1 person
Affects          Status   Importance  Assigned to  Milestone
Canonical Juju   Invalid  Undecided   Unassigned
MAAS             Expired  High        Unassigned

Bug Description

We are testing Juju as a VM scheduler and provisioner with MAAS KVM pods.
In this testing scenario we had Juju 2.6.4 + MAAS 2.5.

We set up four nodes as KVM hosts in MAAS: three nodes in zone-1 and one node in zone-2.

If we define a VM with the following constraints:
machines:
  # KVMs
  "1":
    constraints: cores=2 mem=4G root-disk=8G spaces=oam-space,testspace zones=zone-1

Juju picks nodes from the correct zone to compose the VM, but it always picks the same node over and over again.
It schedules VMs in disregard of any overcommit limits we may have set in MAAS.


John A Meinel (jameinel) wrote : Re: [Bug 1842896] [NEW] [VM Provisioning] If constraints: zones=<ZONE> causes Juju to provision new VMs on same node of correct zone in disrespect of overcommit restrictions

Are you saying that you have other machines in different zones that would
have capacity, but you are explicitly requesting zone-1? Or is the issue
that provisioning 3 machines from zone-1 is not being spread to the second
and third MAAS nodes?
I don't believe Juju explicitly requests the KVM host; it only sets the
zone constraint. Have you tried doing a similar request without Juju?

John
=:->

On Thu, Sep 5, 2019 at 1:30 PM Pedro Guimarães <email address hidden>
wrote:

> Public bug reported:
>
> We are testing Juju as VM scheduler and provisioner with MAAS KVM pods.
> In this testing scenario we had Juju 2.6.4 + MAAS 2.6.
>
> We set 4 nodes as KVM hosts on MAAS.
> 3 nodes on zone-1 / 1 node on zone-2
>
> If we set a VM with restriction:
> machines:
>   # KVMs
>   "1":
>     constraints: cores=2 mem=4G root-disk=8G spaces=oam-space,testspace zones=zone-1
>
> Juju will pick the nodes from the right zone to compose the VM but it will
> always pick the same node over and over again.
> It will schedule VMs in disregard to any overcommit limitations that we
> may have set on MAAS.
>
> ** Affects: juju
> Importance: Undecided
> Status: New
>
> ** Affects: maas
> Importance: Undecided
> Status: New
>
> ** Also affects: maas
> Importance: Undecided
> Status: New
>

Pedro Guimarães (pguimaraes) wrote : Re: [VM Provisioning] If constraints: zones=<ZONE> causes Juju to provision new VMs on same node of correct zone in disrespect of overcommit restrictions

Hi John, I am saying that, given a zone constraint, Juju does not pick the other two machines in zone-1 to schedule its VMs. It always picks the same node, in disregard of spare resources elsewhere in the same zone.

In short, yes, I've done manual tests. That said, the Juju/MAAS interaction during VM building is something of a gray area for me.

Looking into the MAAS docs, I can only see one VM creation operation ("composition", as MAAS calls it):
POST /MAAS/api/2.0/pods/{id}/?op=compose
where {id} is the KVM host ("pod") identifier (see https://maas.io/docs/api).
This is consistent with the MAAS CLI, where I always need to specify which KVM host I want to build my VM from.
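To illustrate the point that a compose call is always addressed to one specific pod, here is a minimal Python sketch that only builds the URL and form body (it sends nothing). The form field names `cores`, `memory` (in MiB) and `storage`, and the `root:8` storage syntax, are assumptions to verify against the MAAS API docs; `compose_request` and the example host name are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlencode


def compose_request(maas_url: str, pod_id: int, cores: int,
                    memory_mib: int, storage: str) -> Tuple[str, str]:
    """Build the URL and form body for a pod compose call (no network I/O).

    Assumed field names: cores, memory (MiB), storage ("label:size").
    The caller must choose pod_id, i.e. which KVM host builds the VM.
    """
    url = f"{maas_url.rstrip('/')}/api/2.0/pods/{pod_id}/?op=compose"
    body = urlencode({"cores": cores, "memory": memory_mib, "storage": storage})
    return url, body


url, body = compose_request("http://maas.example:5240/MAAS", 3, 2, 4096, "root:8")
print(url)   # http://maas.example:5240/MAAS/api/2.0/pods/3/?op=compose
print(body)  # cores=2&memory=4096&storage=root%3A8
```

The pod ID is a required part of the path, so whoever calls this endpoint has already made the placement decision.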

Therefore, I believe Juju is selecting one node to build the VM, which would mean it is acting as the scheduler.

However, I also believe MAAS is not reporting back a failure whenever it finds that the host does not have enough resources, or that building the new VM would exceed its overcommit threshold.
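The check the report expects MAAS to perform can be sketched as follows. This is an assumption about the intended semantics, not MAAS's code: capacity is taken to be the host's physical total multiplied by an overcommit ratio (MAAS pods expose `cpu_over_commit_ratio` and `memory_over_commit_ratio` settings), and a request is accepted only if current usage plus the request stays within that limit. `PodCapacity` and `can_compose` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PodCapacity:
    """Hypothetical snapshot of one KVM host's resources."""
    total_cores: int
    total_memory_mib: int
    used_cores: int
    used_memory_mib: int
    cpu_over_commit_ratio: float = 1.0
    memory_over_commit_ratio: float = 1.0


def can_compose(pod: PodCapacity, cores: int, memory_mib: int) -> bool:
    """True if the new VM fits under the (assumed) overcommitted limits."""
    cpu_limit = pod.total_cores * pod.cpu_over_commit_ratio
    mem_limit = pod.total_memory_mib * pod.memory_over_commit_ratio
    return (pod.used_cores + cores <= cpu_limit
            and pod.used_memory_mib + memory_mib <= mem_limit)


# 8-core host with 7 cores already allocated: a 2-core VM needs 9 cores.
print(can_compose(PodCapacity(8, 16384, 7, 8192), 2, 4096))  # False
print(can_compose(PodCapacity(8, 16384, 7, 8192, cpu_over_commit_ratio=2.0), 2, 4096))  # True
```

Under this reading, a compose request that returns `False` here is the case where one would expect an error rather than a new VM.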

Can you confirm the Juju part (i.e. that it picks one node, always the same one)? What would happen if MAAS returned a failure in this case?

summary: - [VM Provisioning] If constraints: zones=<ZONE> causes Juju to provision
- new VMs on same node of correct zone in disrespect of overcommit
- restrictions
+ [VM Provisioning] If constraints: zones=<ZONE> causes Juju and MAAS to
+ provision new VMs on same node of correct zone in disrespect of
+ overcommit restrictions
Pedro Guimarães (pguimaraes) wrote :

Hi @jameinel, it was not clear to me at first, but I see what you mean now.
Indeed, a quick search for "compose" (or any similar reference) in the Juju source code returns nothing.

Correct me if I am wrong, but the MAAS provider follows this path:

Digging into the Juju code, I can see machine allocation starts at:
https://github.com/juju/juju/blob/cfda560669d10ca1883d24292e2817243c8c9928/provider/maas/environ.go#L983
Which will eventually lead to:
https://github.com/juju/juju/blob/cfda560669d10ca1883d24292e2817243c8c9928/provider/maas/environ.go#L809

That is a call into your gomaasapi library, which ends up at:
https://github.com/juju/gomaasapi/blob/65f2e261f089fd379297df942bf12193bd300825/controller.go#L570

which actually calls the machine allocation endpoint: POST /MAAS/api/2.0/machines/?op=allocate
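To make the allocate call concrete, here is a minimal Python sketch of how the constraints from this report might be translated into allocate form parameters. It is not Juju's code: the MAAS field names `cpu_count`, `mem` (MiB) and `zone` are assumed from the MAAS allocate API, binary size suffixes are assumed, taking the first listed zone is a simplification, and `spaces`/`root-disk` are deliberately left unmapped. `to_mib` and `allocate_params` are hypothetical helpers.

```python
from urllib.parse import urlencode

# Size suffixes expressed in MiB (binary multiples assumed).
_SIZE_MIB = {"M": 1, "G": 1024, "T": 1024 ** 2}


def to_mib(value: str) -> int:
    """Convert '4G' or '512M' (or a bare number of MiB) to MiB."""
    suffix = value[-1].upper()
    if suffix in _SIZE_MIB:
        return int(float(value[:-1]) * _SIZE_MIB[suffix])
    return int(value)


def allocate_params(constraints: str) -> str:
    """Translate cores, mem and zones into an allocate form body.

    spaces and root-disk are ignored in this sketch.
    """
    parsed = dict(item.split("=", 1) for item in constraints.split())
    params = {}
    if "cores" in parsed:
        params["cpu_count"] = int(parsed["cores"])
    if "mem" in parsed:
        params["mem"] = to_mib(parsed["mem"])
    if "zones" in parsed:
        # The zones constraint may list several zones; take the first here.
        params["zone"] = parsed["zones"].split(",")[0]
    return urlencode(params)


print(allocate_params(
    "cores=2 mem=4G root-disk=8G spaces=oam-space,testspace zones=zone-1"))
# cpu_count=2&mem=4096&zone=zone-1
```

Note that nothing in these parameters names a KVM host; only the zone is specified, which matches John's point that host selection happens on the MAAS side.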

Indeed, in the MAAS source code, that call boils down to logic where, if no available machine can be found, MAAS will try to compose a machine (VM) out of a pod:
https://github.com/maas/maas/blob/5fe288985249afedadf4656b595238856b13ce4d/src/maasserver/api/machines.py#L2330
Which is exactly what I was looking for.
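The symptom in this report (the same host every time, overcommit ignored) is what one would see if the fallback picked the first matching pod without a capacity check. The Python toy model below contrasts that with a capacity-aware, least-loaded choice. It is an illustration under stated assumptions, not the logic in `machines.py`; the `Pod` type, the host names and the 8-core sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Pod:
    """Hypothetical view of one KVM host (pod), for illustration only."""
    name: str
    zone: str
    total_cores: int
    used_cores: int = 0


Chooser = Callable[[List[Pod], str, int], Optional[Pod]]


def pick_first(pods: List[Pod], zone: str, cores: int) -> Optional[Pod]:
    """Naive choice: first pod in the zone; `cores` is ignored."""
    return next((p for p in pods if p.zone == zone), None)


def pick_least_loaded(pods: List[Pod], zone: str, cores: int) -> Optional[Pod]:
    """Capacity-aware choice: least-used pod in the zone that still fits."""
    fits = [p for p in pods
            if p.zone == zone and p.used_cores + cores <= p.total_cores]
    return min(fits, key=lambda p: p.used_cores, default=None)


def place(pods: List[Pod], zone: str, cores: int, count: int,
          chooser: Chooser) -> List[str]:
    """Place up to `count` VMs and return the names of the chosen pods."""
    placed = []
    for _ in range(count):
        pod = chooser(pods, zone, cores)
        if pod is None:  # no pod can take another VM
            break
        pod.used_cores += cores
        placed.append(pod.name)
    return placed


def make_pods() -> List[Pod]:
    # Mirrors the report: three hosts in zone-1, one in zone-2.
    return [Pod("node-a", "zone-1", 8), Pod("node-b", "zone-1", 8),
            Pod("node-c", "zone-1", 8), Pod("node-d", "zone-2", 8)]


print(place(make_pods(), "zone-1", 2, 5, pick_first))
# ['node-a', 'node-a', 'node-a', 'node-a', 'node-a']  (10 cores on an 8-core host)
print(place(make_pods(), "zone-1", 2, 5, pick_least_loaded))
# ['node-a', 'node-b', 'node-c', 'node-a', 'node-b']
```

Spreading by least usage is just one possible policy; any capacity-aware choice would avoid exceeding a single host's limits.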

Regarding the Juju logic, am I correct? Or is there another path for allocating/creating machines in the MAAS provider?

Changed in juju:
status: New → Invalid
description: updated
Changed in maas:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Blake Rouse (blake-rouse)
milestone: none → 2.7.0alpha1
Changed in maas:
milestone: 2.7.0b1 → 2.7.0b2
Changed in maas:
milestone: 2.7.0b2 → none
Changed in maas:
assignee: Blake Rouse (blake-rouse) → nobody
Björn Tillenius (bjornt) wrote :

Why did you put three nodes in one zone and one in another?

If you put each node in its own availability zone (AZ), things will work as you described.
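One way to apply this suggestion, assuming one MAAS zone has been created per KVM host, is to give each bundle machine a different `zones=` constraint so that each zone, and therefore each host, receives its own VMs. The Python sketch below generates such a `machines:` section round-robin. The zone names `kvm-1`, `kvm-2`, `kvm-3` and the helper `machines_section` are hypothetical; the constraint string is the one from the report.

```python
from typing import List


def machines_section(zones: List[str], count: int, base: str) -> str:
    """Return a bundle 'machines:' block that cycles through `zones`."""
    lines = ["machines:"]
    for i in range(count):
        zone = zones[i % len(zones)]  # round-robin: machine i -> zone i mod n
        lines.append(f'  "{i + 1}":')
        lines.append(f"    constraints: {base} zones={zone}")
    return "\n".join(lines)


print(machines_section(["kvm-1", "kvm-2", "kvm-3"], 4,
                       "cores=2 mem=4G root-disk=8G spaces=oam-space,testspace"))
```

This pins placement in the bundle itself, so it sidesteps host selection within a zone rather than fixing it.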

Changed in maas:
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired