MAAS cannot deploy/boot if OVS bridge is configured on a single PXE NIC
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Netplan | Fix Released | Undecided | Lukas Märdian | |
cloud-init | Fix Released | Undecided | Unassigned | |
netplan.io (Ubuntu) | Fix Released | Undecided | Lukas Märdian | |
Focal | Fix Released | Undecided | Lukas Märdian | |
Groovy | Fix Released | Undecided | Lukas Märdian | |
Bug Description
Problem description:
If we try to deploy a single-NIC machine via MAAS, configuring an Open vSwitch bridge as the primary/PXE interface, the machine will install and boot Ubuntu 20.04 but it cannot finish the whole configuration (e.g. copying of SSH keys) and cannot be accessed/controlled via MAAS. It ends up in a "Failed" state.
This is because systemd-
Steps to reproduce:
* Set up a (virtual) MAAS system, e.g. inside an LXD container using a KVM host, as described here:
https:/
* Install and set up the maas[-cli] snap from the 2.9/beta channel (instead of the deb/PPA from the Discourse post)
* Configure netplan PPA+key for testing via "Settings" -> "Package repos":
https:/
* Prepare curtin preseed in /var/snap/
=======
#cloud-config
debconf_selections:
maas: |
{{for line in str(curtin_
{{line}}
{{endfor}}
late_commands:
maas: [wget, '--no-proxy', '{{node_
90_create_user: ["curtin", "in-target", "--", "sh", "-c", "sudo useradd test -g 0 -G sudo"]
92_set_
94_cat: ["curtin", "in-target", "--", "sh", "-c", "cat /etc/passwd"]
98_cloud_init: ["curtin", "in-target", "--", "apt-get", "-y", "install", "cloud-init"]
=======
* Compose a new virtual machine via MAAS' "KVM" menu, named e.g. "test1"
* Watch it being commissioned via MAAS' "Machines" menu
* Once it's ready select your machine (e.g. "test1.maas") -> Network
* Select the single network interface (e.g. "ens4") -> Create bridge
* Choose "Bridge type: Open vSwitch (ovs)", Select "Subnet" and "IP mode", save.
* Deploy machine to Ubuntu 20.04 via "Take action" button
The machine will install the OS and boot, but it will end up in a "Failed" state inside MAAS because the network/OVS is not set up correctly. MAAS/SSH has no control over it. You can access the (broken) machine via serial console from the KVM host (i.e. the LXD container) via "virsh console test1" using the "test:test" credentials.
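For orientation, the network layout produced by the steps above corresponds roughly to the netplan configuration sketched below. This is a hand-written illustration, not the file MAAS generated on the affected machine: the bridge name "br-ex" and the use of DHCP are assumptions, and "ens4" is just the example interface name from the steps.

```yaml
# Hypothetical netplan sketch of a single-NIC OVS bridge setup.
# Names and addressing are assumptions for illustration only.
network:
  version: 2
  ethernets:
    ens4: {}              # the single PXE NIC, enslaved to the bridge
  bridges:
    br-ex:                # bridge name is an assumption
      interfaces: [ens4]
      openvswitch: {}     # marks this bridge as an Open vSwitch bridge
      dhcp4: true         # stands in for the "IP mode" chosen in MAAS
```

Because the only NIC is a member of the OVS bridge, the machine has no working network until the bridge itself is up, which is why the timing of the OVS setup during boot matters here.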
=== SRU/Focal/
[Impact]
This update contains bug-fixes and packaging improvements and we would like to make sure all of our supported customers have access to these improvements.
The notable ones are:
* Setup OVS early in network-pre.target to avoid delays (LP: #1898997)
See the changelog entry below for a full list of changes and bugs.
[Test Case]
The following development and SRU process was followed:
https:/
Netplan contains an extensive integration test suite that is run using
the SRU package for each release. This test suite's results are available here:
http://
A successful run is required before the proposed netplan package
can be let into -updates.
The netplan team will be in charge of attaching the artifacts and console
output of the appropriate run to the bug. Netplan team members will not
mark ‘verification-done’ until this has happened.
[Regression Potential]
In order to mitigate the regression potential, the results of the
aforementioned integration tests are attached to this bug.
Focal:
https:/
https:/
https:/
https:/
https:/
[Discussion]
To fully fix the MAAS/OVS problem, cloud-init needs to be updated as well. The fixes to netplan.io and cloud-init can be applied independently, though.
[Changelog]
- Setup OVS early in network-pre.target to avoid delays (LP: #1898997)
- Suggest openvswitch-switch runtime dependency
- Improve stability of autopkgtests
Related branches
- Dimitri John Ledkov: Approve
Diff: 163 lines (+118/-1)
6 files modified
debian/changelog (+12/-0)
debian/control (+2/-1)
debian/patches/0003-tests-tunnels-improve-WG-handshake-regex.patch (+21/-0)
debian/patches/0004-tests-ovs-fix-OVS-timeouts.patch (+32/-0)
debian/patches/0005-Fix-MAAS-OVS-first-boot-for-single-NIC-PXE-systems-1.patch (+48/-0)
debian/patches/series (+3/-0)
description: updated
tags: added: fr-721
Changed in netplan: status: In Progress → Fix Committed
Changed in netplan.io (Ubuntu Groovy): status: New → Fix Released
Changed in cloud-init: status: In Progress → Confirmed
I found a way to execute the netplan-OVS systemd units earlier in the boot process, which fixes the broken networking after boot on single-NIC systems.
Packages from this PPA should now enable booting of Focal images with a single OVS bridge PXE interface:
https://launchpad.net/~slyon/+archive/ubuntu/ovs
However, even though the deploy succeeds, there still seem to be some related issues with SSH not being set up correctly after the deploy.
Upstream work is being tracked here:
https://github.com/CanonicalLtd/netplan/pull/165