juju does not properly prioritize constraints in a v4 bundle

Bug #1686887 reported by Dmitrii Shcherbakov
This bug affects 3 people
Affects         Status    Importance  Assigned to  Milestone
Canonical Juju  Triaged   Low         Unassigned
MAAS            Invalid   Medium      Unassigned

Bug Description

Scenario: 9 nodes tagged 'compute', 3 nodes tagged 'storage', and 3 nodes tagged 'neutron' (see the attached picture).

Of the 9 nodes with the 'compute' tag, 3 also carry the 'neutron' tag (mixed). None of the nodes with the 'storage' tag have any other tags.

One of the machines that should have been placed on a neutron-tagged node fails to provision: Juju picked one of the 'mixed' neutron-compute nodes for a purely compute machine. This is reproducible, not a one-off.

juju show-machine 2
...
      current: provisioning error
      message: 'cannot run instances: cannot run instance: No available machine matches
        constraints: [(''tags'', [''neutron'']), (''agent_name'', [''53a16e43-1437-4599-83ac-48dc99e6b3eb'']),
        (''interfaces'', [''zeromq-configuration:space=2;0:space=2;nrpe-external-master:space=2;cluster:space=2;neutron-plugin-api:space=2;quantum-network-service:space=2;amqp:space=2;data:space=2;ha:space=2;amqp-nova:space=2'']),
        (''zone'', [''default''])] (resolved to "interfaces=zeromq-configuration:space=2;0:space=2;nrpe-external-master:space=2;cluster:space=2;neutron-plugin-api:space=2;quantum-network-service:space=2;amqp:space=2;data:space=2;ha:space=2;amqp-nova:space=2
        tags=neutron zone=default")'
...

This is a v4 bundle (https://github.com/juju/juju-bundlelib/blob/master/jujubundlelib/validation.py#L317):

machines:
  0:
    constraints: "tags=neutron"
  1:
    constraints: "tags=neutron"
  2:
    constraints: "tags=neutron"
  3:
    constraints: "tags=compute"
  4:
    constraints: "tags=compute"
  5:
    constraints: "tags=compute"
  6:
    constraints: "tags=storage"
  7:
    constraints: "tags=storage"
  8:
    constraints: "tags=storage"
services:
...

It seems that Juju does not perform the calculations needed to avoid this situation.

I realize that there may be conflicts in tag prioritization (picking one or the other leads to an error either way), so a conflict resolution mechanism would have to be defined.

Currently, it is completely undocumented (correct me if I am wrong) which allocation priorities exist. I can, of course, delete the compute tags on the neutron nodes, but I would rather not.

Specifying tags=usetag,^donotusethistag is not a good suggestion in my view. What if I have 10 tags assigned to a single node?

Versions:

dmitriis@maas-master:~$ dpkg -l 'maas*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=========================================-=========================-=========================-========================================================================================
ii maas 2.2.0~rc2+bzr5983-0ubuntu all "Metal as a Service" is a physical cloud and IPAM
ii maas-cli 2.2.0~rc2+bzr5983-0ubuntu all MAAS client and command-line interface
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.2.0~rc2+bzr5983-0ubuntu all MAAS server common files
ii maas-dhcp 2.2.0~rc2+bzr5983-0ubuntu all MAAS DHCP server
ii maas-dns 2.2.0~rc2+bzr5983-0ubuntu all MAAS DNS server
ii maas-proxy 2.2.0~rc2+bzr5983-0ubuntu all MAAS Caching Proxy
ii maas-rack-controller 2.2.0~rc2+bzr5983-0ubuntu all Rack Controller for MAAS
ii maas-region-api 2.2.0~rc2+bzr5983-0ubuntu all Region controller API service for MAAS
ii maas-region-controller 2.2.0~rc2+bzr5983-0ubuntu all Region Controller for MAAS
un maas-region-controller-min <none> <none> (no description available)

dmitriis@maas-master:~$ juju version
2.2-beta2-xenial-amd64

Dmitrii Shcherbakov (dmitriis) wrote :
tags: added: bundles cpe
John A Meinel (jameinel) wrote :

Juju itself does not determine which machines will use which tags, nor does it do any prioritization based on them. We simply hand the information off to MAAS, which provides us with machines based on the constraints that the user passed in.

I think it is fair that if you see a request for "foo" and you have some machines with "foo" and some with "foo" and "bar", that it might be possible to give preference for the machines that only have "foo" tag. I'm not sure if this is taking it too far.
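The preference described above could be sketched roughly as follows. This is only an illustration, not how MAAS or Juju selects machines: the function name `pick_least_tagged` and the `nodes` inventory are hypothetical, and "fewest tags overall" is an assumed way of ranking candidates.

```python
def pick_least_tagged(nodes, wanted_tag):
    """Return the matching node that carries the fewest tags overall.

    nodes: dict mapping node name -> set of tags (hypothetical inventory).
    Returns None when no node carries wanted_tag.
    """
    candidates = [name for name, tags in nodes.items() if wanted_tag in tags]
    if not candidates:
        return None
    # Fewest tags first; ties are broken by name so the result is deterministic.
    return min(candidates, key=lambda name: (len(nodes[name]), name))


nodes = {
    "node-a": {"foo", "bar"},
    "node-b": {"foo"},
    "node-c": {"bar"},
}
print(pick_least_tagged(nodes, "foo"))  # node-b
```

With this rule, a request for "foo" leaves the node tagged both "foo" and "bar" free for a later "bar" request.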

I think you're reading a *lot* of user-intended semantics into tags, when really they are just a list of fields that apply to specific machines.

It may also be a factor that you're creating a bundle and assuming it is going to exactly match the hardware you have available. And we suffer a bit because
a) MAAS does the actual mapping of requested machine characteristics (like tags) to what machine to use
b) Juju only requests the machines one at a time, so there isn't anywhere that can do an "ok, you're going to need 3 of these, and 2 of these, so let's make sure not to use X so that it is available for Y".

E.g., Juju knows the set that you're asking for, but doesn't do the mapping, and only asks MAAS for machines one by one, so neither side can work out the whole set.
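A small simulation can illustrate why one-at-a-time requests fail in the reported scenario. It is a sketch under assumptions: `allocate_one_at_a_time` is a hypothetical stand-in for the allocator, the node names are invented, and "first free node carrying the tag" is an assumed selection rule, since the actual order MAAS considers nodes in is not stated here.

```python
def allocate_one_at_a_time(nodes, requests):
    """Serve each request in order with the first free node carrying the tag.

    nodes: list of (name, tags) pairs; list order stands in for whatever
    order the allocator happens to consider nodes in (an assumption).
    requests: list of tag names, one per requested machine.
    Returns a list of (requested_tag, node_name or None).
    """
    free = list(nodes)
    result = []
    for tag in requests:
        match = next((n for n in free if tag in n[1]), None)
        if match is not None:
            free.remove(match)
        result.append((tag, match[0] if match else None))
    return result


# Inventory from the scenario: 3 mixed nodes, 6 compute-only, 3 storage-only.
nodes = (
    [(f"mixed-{i}", {"compute", "neutron"}) for i in range(3)]
    + [(f"compute-{i}", {"compute"}) for i in range(6)]
    + [(f"storage-{i}", {"storage"}) for i in range(3)]
)

# One compute request happens to be served before the last neutron request.
requests = ["neutron", "neutron", "compute", "neutron"]
for tag, node in allocate_one_at_a_time(nodes, requests):
    print(tag, "->", node)
# neutron -> mixed-0
# neutron -> mixed-1
# compute -> mixed-2
# neutron -> None
```

The compute request is satisfied correctly in isolation, yet it consumes the last node that could have served the remaining neutron request.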

That said, if you have machines which *must be used for neutron* then why are you also marking those nodes as possible nodes for "compute"?

Changed in juju:
importance: Undecided → Wishlist
status: New → Incomplete
Dmitrii Shcherbakov (dmitriis) wrote :

Well, I'd say it is not strictly speaking 'must be used for neutron', rather 'may be used'. I may have x machines suitable for neutron but some of them may end up being used for something else.

The OpenStack example with neutron is an easy one, but I might come up with a different set of tags which may make it less apparent.

---

It could be done analogously to Linux routing (which picks the most specific route): pick the most specific set of tags and get machines for it first.
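One possible reading of this suggestion, sketched below, is to serve the requests that the fewest nodes can satisfy first, and to give each request the free node with the fewest tags. Everything here is hypothetical: the function `allocate_scarcest_first`, the node names, and the choice of "fewest matching nodes" as the measure of specificity are assumptions for illustration, and it requires seeing all requests up front, which the one-at-a-time flow does not provide.

```python
def allocate_scarcest_first(nodes, requests):
    """Serve the requests whose tag matches the fewest nodes first.

    nodes: dict of node name -> set of tags (hypothetical inventory).
    requests: list of tag names, one per requested machine.
    Returns a dict mapping request index -> node name or None.
    """
    def candidates(tag, pool):
        return [name for name in pool if tag in nodes[name]]

    free = set(nodes)
    # Rank requests by how many nodes in the whole inventory could serve them.
    order = sorted(range(len(requests)),
                   key=lambda i: len(candidates(requests[i], nodes)))
    assignment = {}
    for i in order:
        options = candidates(requests[i], free)
        # Among free matches, prefer the node with the fewest tags.
        options.sort(key=lambda name: (len(nodes[name]), name))
        assignment[i] = options[0] if options else None
        free.discard(assignment[i])
    return assignment


nodes = {f"mixed-{i}": {"compute", "neutron"} for i in range(3)}
nodes.update({f"compute-{i}": {"compute"} for i in range(6)})
nodes.update({f"storage-{i}": {"storage"} for i in range(3)})

requests = ["compute"] * 3 + ["neutron"] * 3 + ["storage"] * 3
assignment = allocate_scarcest_first(nodes, requests)
print([assignment[i] for i in (3, 4, 5)])  # ['mixed-0', 'mixed-1', 'mixed-2']
```

In this inventory every request gets a node, because the neutron requests claim the mixed nodes before any compute request is considered.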

Changed in juju:
status: Incomplete → Triaged
Changed in maas:
status: New → Incomplete
status: Incomplete → Triaged
importance: Undecided → Low
milestone: none → 2.3.0
importance: Low → Medium
tags: added: internal
Changed in maas:
milestone: 2.3.0 → 2.3.x
tags: added: cpe-onsite
Dmitrii Shcherbakov (dmitriis) wrote :

This is still valid with 2.3.x of both Juju and MAAS.

I have to use hacks like this to avoid the problem:

machines:
  "0":
    series: xenial
    constraints: tags=neutron
  "1":
    series: xenial
    constraints: tags=^neutron
  "2":
    series: xenial
    constraints: tags=^neutron
  "3":
    series: xenial
    constraints: tags=^neutron
  "4":
    series: xenial
    constraints: tags=^neutron
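When a node carries many tags, writing the negated form by hand becomes tedious. A constraint string in the `tags=usetag,^donotusethistag` style mentioned earlier could be generated instead. The helper below is a hypothetical sketch, not part of Juju; it assumes the caller already knows the full set of tags in use.

```python
def tag_constraint(wanted, all_tags):
    """Build a tags constraint that excludes every other known tag.

    wanted: tags the machine must have.
    all_tags: every tag in use across the nodes (supplied by the caller).
    """
    wanted = sorted(set(wanted))
    excluded = sorted(set(all_tags) - set(wanted))
    # Negated tags are written with a leading caret, as in the bundle above.
    return "tags=" + ",".join(wanted + ["^" + tag for tag in excluded])


print(tag_constraint(["compute"], ["compute", "neutron", "storage"]))
# tags=compute,^neutron,^storage
```

This only automates the workaround; it does not remove the need to keep the exclusion lists in sync with the tags actually assigned to the nodes.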

Adam Collard (adam-collard) wrote :

This bug has not seen any activity in the last 6 months, so it is being automatically closed.

If you are still experiencing this issue, please feel free to re-open.

MAAS Team

Changed in maas:
status: Triaged → Invalid
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Wishlist → Low
tags: added: expirebugs-bot