systemd-resolved failure when commissioning machine with 12 NICs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Expired | Undecided | Unassigned |
Bug Description
I've got a machine with 12 network devices that we're trying to add to MAAS. The machine successfully completes the initial enlistment, but cannot commission. On digging through the logs (attached), I noticed that the machine PXE boots and DNS is working at the beginning:
2021-10-
As you can see, the node succeeds in obtaining the scripts from the internal URL. From there it begins running the commissioning scripts and eventually brings up the remaining 11 NICs. Each NIC is successfully brought up and obtains a DHCP address from MAAS.
However, at this point, DNS stops working and commissioning fails because the node cannot talk back to the MAAS API:
2021-10-
After looking a bit deeper in the logs, I noticed that after each NIC comes up, systemd-resolved is reloaded. At the end of the sequence where the NICs are all brought up, that final reload of systemd-resolved fails:
2021-10-
2021-10-
2021-10-
2021-10-
2021-10-
My hypothesis is that the sheer number of NICs coming up at one time is causing that limit to be hit, because systemd-resolved is reloaded too many times in rapid succession.
This seems to at least get some validation in that we disconnected 4 of the ports, leaving only 8, and then commissioning immediately succeeded.
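To make the hypothesis concrete, here is a rough sketch of how a per-interval restart limit could let 8 reloads through but refuse one of 12. It is a simplified sliding-window model, not systemd's actual implementation, and the burst size, interval and reload spacing are hypothetical values chosen for illustration, not numbers taken from the logs.

```python
from collections import deque

def first_refused_restart(restart_times, burst, interval):
    """Return the index of the first restart refused by the limit, or None.

    A restart is refused when `burst` restarts have already been accepted
    within the preceding `interval` seconds (simplified sliding window).
    """
    recent = deque()
    for index, now in enumerate(restart_times):
        # Forget accepted restarts that have aged out of the window.
        while recent and now - recent[0] >= interval:
            recent.popleft()
        if len(recent) >= burst:
            return index
        recent.append(now)
    return None

# Hypothetical values, for illustration only.
HYPOTHETICAL_BURST = 10
HYPOTHETICAL_INTERVAL = 10.0   # seconds
HYPOTHETICAL_SPACING = 0.5     # seconds between NICs coming up

eight_nics = [HYPOTHETICAL_SPACING * n for n in range(8)]
twelve_nics = [HYPOTHETICAL_SPACING * n for n in range(12)]

result_eight = first_refused_restart(eight_nics, HYPOTHETICAL_BURST, HYPOTHETICAL_INTERVAL)
result_twelve = first_refused_restart(twelve_nics, HYPOTHETICAL_BURST, HYPOTHETICAL_INTERVAL)
print("8 NICs:", result_eight)
print("12 NICs:", result_twelve)
```

With these assumed numbers, every reload in the 8-NIC case is accepted, while the 12-NIC case has a reload refused partway through, which matches the behaviour we saw after unplugging 4 ports.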
Seems related to https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1939255