Deployment fails if server's EFI variable storage is full
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Invalid | Undecided | Unassigned |
curtin | Expired | Undecided | Unassigned |
Bug Description
Deployment can fail if a server's EFI variable storage is full. Unfortunately, I lack most of the relevant logs, because the error went away while I was investigating it and a subsequent deployment succeeded. However, I'm fairly confident of the cause: when efibootmgr was called to add a local-disk boot variable and/or set the boot order, it returned an error, and that error caused the deployment to fail. I don't recall the exact message, but the deployment output included a message to the effect that a call to efibootmgr had failed, and this appeared to trigger the failure. To investigate, I booted an Ubuntu Artful desktop image and ran "efibootmgr -o {a sensible boot order}", which returned:
could not set BootOrder: No space left on device
This error indicates that the system's NVRAM (where the EFI variables are stored) is out of space, which blocks any change to the BootOrder variable. On a subsequent boot, the system deployed correctly. Perhaps the firmware's normal garbage collection freed some space, or perhaps a change I made to the firmware settings cleared the problem. Either way, the exact MAAS installation logs from the failed deployment are lost.
Failing the installation when the "efibootmgr -o" command fails is an unnecessarily strict condition, IMHO: if the system booted into the MAAS installer, we already know that PXE booting works. Adding a boot entry for the local disk and adjusting the boot order to boot from the network are done so that the system can still boot if the MAAS server goes down. If these operations fail, though, it seems better to reboot anyway and, if the system comes up, call the installation a success. Ideally, the node would also be flagged with a warning that the boot order may be set incorrectly or that it might fail to boot if the MAAS server goes down, depending on which efibootmgr call failed.
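The suggestion to downgrade an efibootmgr failure from a fatal error to a warning could look roughly like the following. This is a hypothetical Python sketch, not curtin's actual code: the helper names `try_efibootmgr` and `set_boot_order` are invented for illustration, and the only real interface assumed is the `efibootmgr -o` option (a comma-separated list of boot entry numbers) mentioned in this report. The command runner is passed in as a parameter so the logic can be exercised without real EFI hardware.

```python
import logging
import subprocess
from typing import Callable, List, Optional

log = logging.getLogger("efi-boot-sketch")

# Anything with the same calling convention as subprocess.run.
Runner = Callable[..., subprocess.CompletedProcess]


def try_efibootmgr(args: List[str], run: Runner = subprocess.run) -> Optional[str]:
    """Hypothetical helper: run efibootmgr with `args`.

    Returns None on success, or the error text on failure. Failures are
    logged as warnings instead of raised, so the caller decides whether
    they are fatal.
    """
    cmd = ["efibootmgr"] + args
    try:
        result = run(cmd, capture_output=True, text=True)
    except OSError as exc:  # e.g. efibootmgr is not installed
        log.warning("could not run %s: %s", " ".join(cmd), exc)
        return str(exc)
    if result.returncode != 0:
        err = (result.stderr or "").strip() or f"exit status {result.returncode}"
        log.warning("%s failed: %s", " ".join(cmd), err)
        return err
    return None


def set_boot_order(order: List[str], run: Runner = subprocess.run) -> List[str]:
    """Hypothetical helper: set BootOrder, collecting warnings instead of aborting."""
    warnings: List[str] = []
    err = try_efibootmgr(["-o", ",".join(order)], run=run)
    if err is not None:
        # ENOSPC from efibootmgr usually points at full EFI variable storage.
        hint = " (EFI variable storage may be full)" if "No space left" in err else ""
        warnings.append(
            f"BootOrder not updated: {err}{hint}; "
            "node may not boot as expected if the MAAS server is down"
        )
    return warnings
```

A deployment step built this way would finish the install, reboot, and attach any returned warnings to the node instead of marking the deployment as failed.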
I'm attaching the /var/log/maas directory tree from the server. The node that experienced the problem is oil-prunus. Here's the MAAS package version information:
$ dpkg -l '*maas*'|cat
Desired=
| Status=
|/ Err?=(none)
||/ Name Version Architecture Description
+++-===
ii maas 2.2.2-6099-
ii maas-cert-server 0.2.30-
ii maas-cli 2.2.2-6099-
un maas-cluster-
ii maas-common 2.2.2-6099-
ii maas-dhcp 2.2.2-6099-
ii maas-dns 2.2.2-6099-
ii maas-proxy 2.2.2-6099-
ii maas-rack-
ii maas-region-api 2.2.2-6099-
ii maas-region-
un maas-region-
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-
ii python3-django-maas 2.2.2-6099-
ii python3-maas-client 2.2.2-6099-
ii python3-
Hey Rod, the installation log would be helpful, but without it there's not much we can do, I'm afraid!