Comment 1 for bug 1779161

Cory Johns (johnsca) wrote :

Adding my context from discussions on IRC:

It sounds like MAAS is configured with pods so that VMs can be created on demand. The conjure-up logs from the GitHub issue show five attempts to start a deployment. In each, conjure-up connected to the MAAS API but could not connect to an apparently existing controller:

2018-06-23 13:57:26,418 [DEBUG] conjure-up/canonical-kubernetes - maas.py:402 - Found endpoint: http://192.168.1.2:5240/MAAS/ for cloud: microcodez-01
2018-06-23 13:57:26,488 [DEBUG] conjure-up/canonical-kubernetes - events.py:52 - Setting MAASConnected at conjureup/controllers/juju/configapps/gui.py:49
2018-06-23 13:57:27,935 [DEBUG] conjure-up/canonical-kubernetes - telemetry.py:17 - Showing screen: Creating Model
...
OSError: [Errno 113] Connect call failed ('192.168.30.28', 17070)

It seems like there were other attempts to deploy CDK that are not represented in this log, so I'm not clear where the VMs were coming from, but the end result seems to have been that 70+ VMs were created with no clear indication as to what role they were serving.
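For anyone trying to reproduce the connection failure quoted above: on Linux, Errno 113 is EHOSTUNREACH ("No route to host"), so one quick check is whether the controller's API port is reachable at all from the machine running conjure-up. The following is a minimal sketch using only the Python standard library. The helper name can_reach is made up for illustration and is not part of conjure-up or Juju.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Try a plain TCP connection; return (True, None) or (False, error).

    Hypothetical helper for illustration only. It checks network
    reachability and says nothing about whether a Juju controller
    is actually healthy behind the port.
    """
    try:
        # create_connection raises OSError (including timeouts) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        return False, exc
```

Calling can_reach("192.168.30.28", 17070) with the address and port from the log would return (False, <the OSError>) if the host is still unreachable, which would point at a stale controller entry or a routing problem rather than at MAAS itself.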

One obvious point of confusion is that the mentioned tutorial never explains how to pre-allocate VMs from the pod, tag them, and then target applications to those machines using tags. For reference, https://docs.maas.io/1.9/en/nodes-tags#using-the-tag documents the constraint syntax for targeting a machine by tag. In conjure-up, that constraint is entered on the Configure Application screen (e.g., https://i.imgur.com/JPyhYee.png), which is reached by clicking the Configure button next to each application on the last screen before deploying. However, I'm not clear from that documentation how you would handle an application that has multiple units, like kubernetes-worker.
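To make the tag constraint described above concrete, here is a minimal sketch that builds a Juju "tags=" constraint string and the matching juju deploy argument list. The tag name k8s-worker and both helper names are hypothetical. The sketch also assumes that a constraint set on an application applies to every unit's machine, so that tagging several VMs with the same tag would cover a multi-unit application; that assumption has not been verified against MAAS pods.

```python
def tag_constraint(include, exclude=()):
    """Build a constraint such as 'tags=k8s-worker,^slow'.

    Tags prefixed with '^' are ones the machine must NOT have.
    Hypothetical helper for illustration only.
    """
    parts = list(include) + ["^" + tag for tag in exclude]
    if not parts:
        raise ValueError("at least one tag is required")
    return "tags=" + ",".join(parts)


def deploy_command(application, units, include, exclude=()):
    """Return an argv list for 'juju deploy' with a tag constraint."""
    return [
        "juju", "deploy",
        "-n", str(units),  # number of units to deploy
        "--constraints", tag_constraint(include, exclude),
        application,
    ]


# 'k8s-worker' is an assumed tag name, not one taken from the bug report.
print(" ".join(deploy_command("kubernetes-worker", 3, ["k8s-worker"])))
```

In conjure-up, only the string returned by tag_constraint would be typed into the constraints field on the Configure Application screen; the full command line is shown for comparison with a plain Juju deployment.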

Another point of confusion is that, even if constraints were not specified, I would expect that 1) MAAS would try to fulfill Juju's request for a machine from the pool of already available VMs, and 2) if it could not do so and created a new VM from the pod, Juju would then tag the VM in MAAS to indicate which Juju machine, and possibly which Juju units, were running on that VM. For instance, on AWS my instance i-04c41c1309bde47d4 got the tag juju-machine=conjure-kubernetes-core-0a9-machine-0. I didn't see anything about the unit this time, but I seem to recall seeing metadata in the past that also listed which Juju units were on the machine.

There also seems to have been some confusion across the board about what role each component of the toolchain serves and what level of control each gives. Perhaps we need a whitepaper giving a technical overview of all the pieces and a general idea of what role each plays in deploying CDK. (That sounded an awful lot like me volunteering myself for that whitepaper. Shoot. ;) Maybe we already have something like that and I'm not aware of it?)