[2.5a2] KVM pod networking needs to be smarter about networks to assign by default

Bug #1789521 reported by Mike Pontillo on 2018-08-29
This bug affects 1 person
Affects: MAAS
Status: Fix Released
Assigned to: Newell Jensen

Bug Description

The current default network attachment algorithm for KVM pods regularly results in an incorrect configuration and a poor out-of-the-box experience for users. It could be greatly improved in MAAS 2.5, but this must be done carefully to avoid breaking backward compatibility.


In previous releases, when allocating from KVM pods, MAAS would prefer to attach to networks using the following algorithm:

(1) Look for a `maas` network defined in the hypervisor, and attach to that if it exists.
(2) Look for a `default` network defined in the hypervisor, and attach to that if it exists.
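For reference, the legacy preference can be sketched in a few lines of Python. This is a minimal sketch, not the actual MAAS code: `legacy_default_network` is a hypothetical name, and it assumes the input is simply the list of network names defined in the hypervisor. Note that it looks only at names, which is how it can end up choosing a network MAAS cannot PXE boot from.

```python
from typing import Iterable, Optional

# Order of preference used by earlier MAAS releases, per the steps above.
LEGACY_PREFERENCE = ("maas", "default")


def legacy_default_network(libvirt_networks: Iterable[str]) -> Optional[str]:
    """Return the libvirt network a composed VM would attach to by default.

    Hypothetical helper: only network names are considered, so nothing
    checks whether MAAS can actually serve DHCP/PXE on the chosen network.
    """
    available = set(libvirt_networks)
    for name in LEGACY_PREFERENCE:
        if name in available:
            return name
    # Neither a `maas` nor a `default` network is defined in the hypervisor.
    return None
```

For example, a fresh Ubuntu host that only has libvirt's `default` network would get `"default"`, which is exactly the case described below.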

In MAAS 2.5 with KVM pod networking, MAAS still uses this algorithm (for backward compatibility) if no `interfaces` constraint is specified. However, this often results in a less-than-ideal configuration for the composed VM; we should revisit this decision now to see if we can make it better.

Consider the following:

 - When installing libvirt on Ubuntu, you get a `default` network at install time.[1] This network is guaranteed not to work with MAAS, because it provides DHCP itself (via dnsmasq). Without MAAS-managed DHCP (and without any way to add DHCP options pointing to MAAS), there is no way for composed VMs to PXE boot from MAAS, and thus no way for the machines to be MAAS-managed.

 - Networks in libvirt can be defined as attachments to specific, non-libvirt-managed bridges. In MAAS 2.5, if the pod has a known host (and therefore MAAS knows its network model), MAAS can determine whether or not a particular network in virsh corresponds to a DHCP-enabled VLAN in MAAS.[2]

 - If the network is *managed* by libvirt but *is not* DHCP-enabled in libvirt, MAAS might be able to manage the network. The `virsh net-dumpxml` output would indicate this by the absence of a <dhcp/> XML element.[3]
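The three cases above can be told apart from `virsh net-dumpxml` output. The sketch below is illustrative only: `parse_net_dumpxml` and `LibvirtNetwork` are hypothetical names, and it assumes that a `<forward mode='bridge'/>` element together with a `<bridge name='...'/>` element identifies a network wrapping a host bridge that libvirt does not manage. Macvtap-style networks, which list `<interface>` elements instead of a `<bridge>`, are not treated as host bridges here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class LibvirtNetwork:
    name: str
    libvirt_dhcp: bool          # True if libvirt's own dnsmasq serves DHCP
    host_bridge: Optional[str]  # pre-existing host bridge, if not libvirt-managed


def parse_net_dumpxml(xml_text: str) -> LibvirtNetwork:
    """Classify one network from `virsh net-dumpxml <name>` output (sketch)."""
    root = ET.fromstring(xml_text.strip())
    name = root.findtext("name", default="")
    # libvirt places DHCP configuration under <ip><dhcp>...</dhcp></ip>;
    # its absence means libvirt is not handing out addresses on this network.
    libvirt_dhcp = root.find("./ip/dhcp") is not None
    forward = root.find("forward")
    bridge = root.find("bridge")
    host_bridge = None
    # Assumption: <forward mode='bridge'/> plus <bridge name='...'/> means
    # the network is a thin wrapper around a bridge that libvirt does not manage.
    if forward is not None and forward.get("mode") == "bridge" and bridge is not None:
        host_bridge = bridge.get("name")
    return LibvirtNetwork(name, libvirt_dhcp, host_bridge)
```

Applied to a stock `default` network (NAT with a `<dhcp>` range), this reports `libvirt_dhcp=True`, which is the case MAAS should avoid.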


Proposed algorithm:

(a) If a `maas` or `default` network is defined in libvirt, continue to prefer attaching to them in that order. However, DO NOT attach to either network by default if libvirt's DHCP is enabled on it (this is virtually guaranteed to break the operation of MAAS).

(b) If a `maas` or `default` network is selected, validate that it is attached to a bridge whose VLAN is enabled for MAAS DHCP (or DHCP relay). If no libvirt network is found to be attached to a DHCP-enabled VLAN in MAAS, continue with (c).

(c) If no MAAS-managed DHCP network was found after checking (a) and (b), prefer attachments in the following order of precedence:
 (1) A bridge interface on the pod host whose VLAN is DHCP-enabled.
 (2) A macvlan attachment on the pod host whose VLAN is DHCP-enabled.

(d) If multiple VLANs on the pod host are DHCP-enabled, prefer VLANs whose IPv4 subnets contain the largest number of free IP addresses.
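Taken together, (a) through (d) could look something like the sketch below. It is a hedged illustration under several assumptions, not MAAS's implementation: the `LibvirtNetwork` and `HostInterface` records and `choose_attachment` are hypothetical, the per-VLAN DHCP flag and free IPv4 count are assumed to come from MAAS's model of the pod host, and (d) is interpreted as a tie-breaker within each precedence level of (c) rather than an override of it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LibvirtNetwork:
    name: str
    libvirt_dhcp: bool          # libvirt's own dnsmasq serves DHCP on it
    host_bridge: Optional[str]  # pre-existing host bridge it wraps, if any


@dataclass
class HostInterface:
    name: str
    kind: str        # hypothetical encoding: "bridge", or "macvlan" for a NIC usable as a macvlan parent
    vlan_dhcp: bool  # MAAS DHCP (or DHCP relay) enabled on this interface's VLAN
    free_ipv4: int   # free addresses across the VLAN's IPv4 subnets


# Step (c): a bridge on the pod host is preferred over a macvlan attachment.
KIND_RANK = {"bridge": 0, "macvlan": 1}


def choose_attachment(networks: List[LibvirtNetwork],
                      interfaces: List[HostInterface]) -> Optional[Tuple[str, str]]:
    """Return ("network" | "bridge" | "macvlan", name), or None if nothing fits."""
    by_name = {n.name: n for n in networks}
    by_iface = {i.name: i for i in interfaces}

    # Steps (a) and (b): try `maas`, then `default`.
    for preferred in ("maas", "default"):
        net = by_name.get(preferred)
        if net is None or net.libvirt_dhcp:
            continue  # (a): libvirt DHCP would break PXE booting from MAAS
        iface = by_iface.get(net.host_bridge or "")
        if iface is not None and iface.kind == "bridge" and iface.vlan_dhcp:
            return ("network", net.name)  # (b): bridge sits on a DHCP-enabled VLAN

    # Steps (c) and (d): only DHCP-enabled VLANs qualify; rank by attachment
    # kind first, then by number of free IPv4 addresses (largest first).
    candidates = [i for i in interfaces if i.vlan_dhcp and i.kind in KIND_RANK]
    if not candidates:
        return None
    best = min(candidates, key=lambda i: (KIND_RANK[i.kind], -i.free_ipv4))
    return (best.kind, best.name)
```

For example, if libvirt's DHCP is enabled on `default` and the host has two DHCP-enabled bridges plus a DHCP-enabled NIC for macvlan, this picks the bridge with more free addresses rather than the NIC, even if the NIC's VLAN has more free addresses.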


[1]: https://paste.ubuntu.com/p/mDJYZNVVFX/
(example of libvirt-managed network)

[2]: https://paste.ubuntu.com/p/9PPJtc3QqP/
(example of a network defined in libvirt, but not managed by libvirt)

[3]: https://paste.ubuntu.com/p/HcBTYKgkNM/
(example of a network defined and managed by libvirt, but without libvirt-managed DHCP)


description: updated
Mike Pontillo (mpontillo) wrote:

I talked to Andres a little bit about this, and he had another good suggestion: if the pod host is a deployed machine in MAAS, we know where it PXE booted from. That could be a good indicator that the same bridge is valid to boot from, even in the absence of DHCP (or DHCP relay) enabled in MAAS.
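A sketch of how that hint could be derived, assuming MAAS knows which interface the deployed pod host booted from and which ports each bridge on the host contains. `bridge_for_boot_interface` and its inputs are hypothetical; the idea is only that a bridge carrying the boot interface is a plausible attachment even when MAAS DHCP is not enabled on its VLAN.

```python
from typing import Dict, List, Optional


def bridge_for_boot_interface(boot_interface: Optional[str],
                              bridge_members: Dict[str, List[str]]) -> Optional[str]:
    """Return the host bridge carrying the interface the pod host PXE booted from.

    `boot_interface` is None when the pod host is not a deployed MAAS machine,
    in which case there is no hint to offer. `bridge_members` maps each bridge
    name to the names of its member ports (hypothetical input format).
    """
    if boot_interface is None:
        return None
    if boot_interface in bridge_members:
        # The recorded boot interface is itself a bridge.
        return boot_interface
    # Sorted for a deterministic answer if the interface appears more than once.
    for bridge, members in sorted(bridge_members.items()):
        if boot_interface in members:
            return bridge
    return None
```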

Further, we could enhance external DHCP detection as a way to check whether PXE booting via MAAS is properly enabled for an external DHCP server.

I think these tasks can be handled separately from the primary fix for this issue. However, to preserve backward compatibility when external DHCP is in use, we could allow booting from the `maas` network in libvirt by default *if and only if libvirt's DHCP is not enabled*. This covers the scenario where external DHCP has already been configured. Defining a `maas` network in libvirt is a very deliberate step, so it is reasonable to infer that attaching to it is the user's intent.
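The backward-compatible rule above could be expressed as a small predicate. This is a sketch under stated assumptions: the names are hypothetical, `dhcp_bridges` is assumed to be the set of host bridges whose VLANs have MAAS DHCP or DHCP relay enabled, and the handling of networks other than `maas` follows step (b) of the proposal.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class LibvirtNetwork:
    name: str
    libvirt_dhcp: bool          # libvirt's own dnsmasq serves DHCP on it
    host_bridge: Optional[str]  # pre-existing host bridge it wraps, if any


def default_network_allowed(net: LibvirtNetwork, dhcp_bridges: Set[str]) -> bool:
    """Decide whether `net` may be used as a default attachment (sketch)."""
    if net.libvirt_dhcp:
        # libvirt's dnsmasq would answer the VM's DHCP requests instead of MAAS.
        return False
    if net.name == "maas":
        # Explicit user intent: trust the `maas` network even if DHCP is external.
        return True
    # Any other network (e.g. `default`) still needs a MAAS DHCP-enabled VLAN.
    return net.host_bridge in dhcp_bridges
```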

Changed in maas:
assignee: nobody → Newell Jensen (newell-jensen)
status: Triaged → In Progress
description: updated
description: updated
tags: added: pod track
tags: added: pods
removed: pod
description: updated
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released