conjure-up kubernetes creates 70+ VMs on KVM managed by MAAS with funny names

Bug #1779161 reported by Adham Sabry
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: Low
Assigned to: Unassigned

Bug Description

Hi,

I have MAAS with KVM on a Server A

After running conjure-up on Server A, I end up seeing 70+ VMs with funny names.

The root cause is that conjure-up/Juju does not provide a naming convention for these VMs: they are created with blank names, so MAAS steps in and assigns them pet names from its pet-name library.

To reproduce this issue:
1. Follow the instructions in https://tutorials.ubuntu.com/tutorial/create-kvm-pods-with-maas#0
2. Commission a new machine and ensure that commissioning succeeds
3. Once both of the above succeed, go ahead with conjure-up kubernetes

The real issue this causes is that too many VMs are created, and I can't track down why, or what role each one is playing.

Stack Overflow question that demonstrates the problem in detail:
https://stackoverflow.com/questions/50970133/installed-kubernetes-on-ubuntu-and-i-see-a-lot-of-nodes-are-getting-created-in-m

Github issue reported with logs:
https://github.com/conjure-up/conjure-up/issues/1476

Revision history for this message
Cory Johns (johnsca) wrote :

Adding my context from discussions on IRC:

It sounds like the MAAS is configured with pods so that VMs can be created on demand. The conjure-up logs from the GitHub issue show 5 attempts to start a deployment, where conjure-up was able to connect to the MAAS API, but was unable to connect to an apparently existing controller:

2018-06-23 13:57:26,418 [DEBUG] conjure-up/canonical-kubernetes - maas.py:402 - Found endpoint: http://192.168.1.2:5240/MAAS/ for cloud: microcodez-01
2018-06-23 13:57:26,488 [DEBUG] conjure-up/canonical-kubernetes - events.py:52 - Setting MAASConnected at conjureup/controllers/juju/configapps/gui.py:49
2018-06-23 13:57:27,935 [DEBUG] conjure-up/canonical-kubernetes - telemetry.py:17 - Showing screen: Creating Model
...
OSError: [Errno 113] Connect call failed ('192.168.30.28', 17070)

It seems like there were other attempts to deploy CDK that are not represented in this log, so I'm not clear where the VMs were coming from, but the end result seems to have been that 70+ VMs were created with no clear indication as to what role they were serving.

One obvious point of confusion is that the mentioned tutorial never explains pre-allocating VMs from the pod, tagging them, and then targeting applications to those machines using tags. For reference, https://docs.maas.io/1.9/en/nodes-tags#using-the-tag documents the syntax of the constraint for targeting a machine via tag. That constraint would be entered in conjure-up on the Configure Application screen (e.g., https://i.imgur.com/JPyhYee.png), which is reached by clicking the Configure button next to each application on the last screen before deploying. It's not clear from that documentation, however, how you would handle an application that has multiple units, like kubernetes-worker.
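As a concrete illustration of the tag-constraint syntax described above (the tag name kubernetes-worker is hypothetical; substitute whatever tag you assigned in MAAS), the value entered in the constraints field would look like:

```shell
# Hypothetical tag name -- substitute the tag you created in MAAS.
TAG="kubernetes-worker"

# Juju constraint string that targets machines carrying that MAAS tag;
# this is what would go in the constraints field on the
# Configure Application screen in conjure-up.
CONSTRAINT="tags=${TAG}"
echo "${CONSTRAINT}"
```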

Another point of confusion is that even if constraints were not specified, I would expect that 1) MAAS would try to fulfill Juju's request for a machine from the already available pool of VMs, and 2) if it could not do so and created a new VM from the pod, Juju would then tag the VM in MAAS to indicate which Juju machine (and possibly which Juju units) were running on that VM. For instance, on AWS my instance i-04c41c1309bde47d4 got the tag juju-machine=conjure-kubernetes-core-0a9-machine-0; while I didn't see anything about the unit this time, I seem to recall in the past seeing metadata that included a list of the Juju units on the machine as well.

There also seems to have been some confusion across the board about what role each component of the toolchain serves and what level of control each gives. Perhaps we need a whitepaper giving a technical overview of all the pieces and a general idea of what role each plays in deploying CDK. (That sounded an awful lot like me volunteering myself for that whitepaper. Shoot. ;) Maybe we already have something like that and I'm not aware of it?)

Revision history for this message
Cory Johns (johnsca) wrote :

Adam Stokes pointed to https://docs.jujucharms.com/2.3/en/reference-constraints for more details on how the tag constraints can be specified, though it still doesn't clarify how multiple units would be handled.

Revision history for this message
Richard Harding (rharding) wrote :

In checking into this, the current KVM pod work in MAAS is in a beta phase, and there are ongoing efforts to flesh it out, including finishing networking support and the pieces Juju needs to properly build on top of. For now I'm marking this Wishlist: we definitely want this to work, but we need to coordinate with the feature progression in MAAS, and it's not yet ready for this type of production Kubernetes use.

Changed in juju:
status: New → Confirmed
importance: Undecided → Wishlist
Revision history for this message
Adham Sabry (atdhrhs) wrote :

@Cory: Sorry for the delay in responding here. I have checked the constraints documentation, and I don't see anything regarding the naming of the VMs.

I'm not sure what constraints should be provided in this case to help with the naming convention.

Revision history for this message
Cory Johns (johnsca) wrote :

We discussed this on IRC, but posting it here for reference and adding a bit more detail.

The purpose of setting the constraints is not to influence the name of automatically created VMs; it's to control how the applications get deployed to manually created VMs. With MAAS, the classic usage pattern is to pre-allocate VMs with specific sets of resources, which you would then tag appropriately to create pools of VMs suitable for different roles. You could then target those VMs with tag constraints in Juju / conjure-up, so that each application would be placed on a VM well aligned with the resources it needs. This would also have the side effect of preventing new VMs from being created from the pods once the pre-created and tagged VMs were exhausted; the upshot is that provisioning would fail and conjure-up would report a failed deployment, but you wouldn't end up with a ton of VMs whose allocation was unclear.
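The classic pattern described above can be sketched with the MAAS and Juju CLIs. The profile name admin, system ID abc123, and tag name k8s-worker are all placeholders for values from your own MAAS setup:

```shell
# 1. Create a tag in MAAS and attach it to a pre-allocated VM
#    (profile "admin" and system ID "abc123" are placeholders):
#      maas admin tags create name=k8s-worker
#      maas admin tag update-nodes k8s-worker add=abc123
#
# 2. Deploy with a tag constraint so Juju only requests machines
#    carrying that tag, rather than letting MAAS compose new pod VMs:
DEPLOY_CMD="juju deploy kubernetes-worker -n 3 --constraints tags=k8s-worker"

# Echoed rather than executed so the sketch runs without a live MAAS/Juju:
echo "${DEPLOY_CMD}"
```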

Aside from making it easier to manage VMs manually via MAAS, I guess pods allow for automatic VM creation by Juju. However, in that case, MAAS will create as many VMs as Juju asks for, and it sounds like there's not an easy way in MAAS to figure out what's assigned to each. Of course, you should be able to use Juju to determine this (with `juju list-models` and `juju status -m $model`) and I did think that Juju would tag the VMs via MAAS to at least give some clue as to how they're used. If MAAS supports having Juju provide the name for automatically created VMs, it would certainly be nice to have the MAAS provider in Juju support that and give them clear and useful names (presumably including the model ID and machine number).
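For reference, a minimal sketch of the Juju commands mentioned above for mapping VMs back to models and units (the model name is a placeholder taken from the output of the first command):

```shell
# List all models on the controller, then inspect one of them to see
# which machine each unit is running on. Echoed rather than executed
# so this sketch runs without a live controller; <model> is a
# placeholder for a model name from `juju models`.
INSPECT="juju status -m <model>"
echo "juju models"
echo "${INSPECT}"
```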

Revision history for this message
Cory Johns (johnsca) wrote :

I'm still not really clear on how you ended up with so many VMs. The only explanations I can think of are that multiple (11+) attempts were made to deploy CDK without it being clear, when something failed, whether resources were still being consumed, or that some strange bug in the Juju <-> MAAS pod interaction was hit.

For the former, I didn't see anything of note in the provided conjure-up logs that would indicate that happened, but I created https://github.com/conjure-up/conjure-up/issues/1482 to try to ensure that something like that doesn't happen in the future.

For the latter, the MAAS and / or Juju controller logs would be the place to check.

Changed in juju:
status: Confirmed → Triaged
Revision history for this message
Adham Sabry (atdhrhs) wrote :

@Cory: The first time I installed it, I tried out many of the features (e.g. Helm), and that's why I ended up with a lot of VMs. The next time, I installed a very minimal version with no add-ons (probably this is what you saw).

Revision history for this message
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Wishlist → Low
tags: added: expirebugs-bot