MAAS aggressively de-configures network interface
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
MAAS | Invalid | High | Unassigned | | |
Bug Description
We've recently upgraded the Server Certification lab from MAAS 2.6 to MAAS 2.9 (version 2.9.1 (9153-g.66318f531), to be precise), and we've encountered a new problem with MAAS 2.9. Our lab configuration has two IP blocks (10.245.128.0/22, the "external network"; and 10.1.10.0/23, the "internal network") on the same VLAN. The MAAS server has two physical NICs, one for each range of IP addresses. MAAS manages DHCP for the internal network (10.1.10.0/23), but not for the external one (10.245.128.0/22). This configuration enables us to easily move individual nodes' IP addresses from one IP range to another, and it worked fine with MAAS 2.6; however, in MAAS 2.9, a problem can arise:
If a network device on a node is configured for the external network and the node PXE-boots from that same device, MAAS 2.9 aggressively unconfigures the device: its settings disappear from the MAAS "Network" tab, and the deployed node has no network options set; the /etc/netplan/
eno2:
match:
mtu: 1500
Although it's possible to control the PXE-boot device on most servers and thus avoid this problem by ensuring the server PXE-boots from a device configured for the internal network, this isn't always 100% reliable. The server might fail over to another device if the first attempt times out or otherwise fails. Some servers have configuration options that are difficult, and perhaps impossible, to set correctly.
Reverting to the behavior of MAAS 2.6, which would PXE-boot on the internal network but do a final network configuration on the external network, is desirable.
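For anyone reproducing this, the following is a minimal sketch (not part of MAAS) for checking which of the two ranges described above a given address falls into, for example when comparing the address a node PXE-booted with against the address configured for the interface. The function name `classify` and the sample addresses are hypothetical; only the two CIDR blocks come from this report.

```python
import ipaddress

# The two IP blocks from this report; both share one VLAN.
EXTERNAL = ipaddress.ip_network("10.245.128.0/22")  # not managed by MAAS DHCP
INTERNAL = ipaddress.ip_network("10.1.10.0/23")     # MAAS-managed DHCP


def classify(address: str) -> str:
    """Return which lab network an address belongs to (hypothetical helper)."""
    ip = ipaddress.ip_address(address)
    if ip in INTERNAL:
        return "internal"
    if ip in EXTERNAL:
        return "external"
    return "unknown"


if __name__ == "__main__":
    # Sample addresses are made up for illustration.
    for addr in ("10.1.11.20", "10.245.130.5", "192.168.0.1"):
        print(addr, classify(addr))
```

A node showing an "internal" address during PXE boot and an "external" address in its configured interface settings matches the setup in which the problem appears.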
tags: | added: hwcert-server |
Changed in maas: | |
status: | New → Triaged |
importance: | Undecided → High |
I'm attaching a tarball containing several files showing the state of the MAAS server's network and the before, during, and after states of a node (polari) configured as described. (There's no "during" screenshot, since the network device went to an unconfigured state about a minute after I began the deployment, and nothing else interesting happened. The "during" text file was created after the network interface went to an unconfigured state.)