Installed image can be missing necessary boot file

Bug #1853906 reported by Rod Smith on 2019-11-25
Affects | Status | Importance | Assigned to | Milestone
MAAS    |        | Undecided  | Unassigned  |
grub    | New    | Undecided  | Unassigned  |
shim    | New    | Undecided  | Unassigned  |

Bug Description

Sometimes, an installation via MAAS will fail, with the following displayed on the node's screen:

Booting local disk...
Failed to open \efi\boot\grubx64.efi - Not Found
Failed to load image \efi\boot\grubx64.efi: Not Found
start_image() returned Not Found

The boot process stops here. Bypassing network booting to boot from the hard disk succeeds. It turns out that the EFI System Partition's (ESP's) \efi\boot\grubx64.efi file (/boot/efi/EFI/BOOT/grubx64.efi) is indeed missing; SOMETHING is trying to load that file and failing. Copying the grubx64.efi and grub.cfg files from the ESP's \efi\ubuntu directory (/boot/efi/EFI/ubuntu in Ubuntu) to the ESP's \efi\boot enables the server to boot.
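
The manual workaround above can be sketched as a small script. This is only an illustration of the copy step, not a fix endorsed by MAAS, GRUB, or shim. The function name `copy_grub_to_fallback` is hypothetical, and the script assumes the ESP is mounted at the path passed in (/boot/efi on the affected Ubuntu install) with the EFI/ubuntu and EFI/BOOT directory names described above.

```python
import shutil
from pathlib import Path


def copy_grub_to_fallback(esp_root, files=("grubx64.efi", "grub.cfg")):
    """Copy GRUB files from EFI/ubuntu to EFI/BOOT if they are missing there.

    esp_root is the mount point of the EFI System Partition, for example
    /boot/efi. Returns the list of file names that were actually copied.
    """
    esp = Path(esp_root)
    src_dir = esp / "EFI" / "ubuntu"
    dst_dir = esp / "EFI" / "BOOT"
    copied = []
    for name in files:
        src = src_dir / name
        dst = dst_dir / name
        if dst.exists():
            # Never overwrite a file that is already in the fallback directory.
            continue
        if not src.is_file():
            raise FileNotFoundError(f"source file not found: {src}")
        dst_dir.mkdir(parents=True, exist_ok=True)
        # copyfile copies contents only; it does not try to copy permissions.
        shutil.copyfile(src, dst)
        copied.append(name)
    return copied


if __name__ == "__main__":
    # Assumption: the ESP is mounted at /boot/efi and the script runs as root.
    print(copy_grub_to_fallback("/boot/efi"))
```

Running it a second time copies nothing, because existing files in EFI/BOOT are left alone.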

The server in question is meitner, a Supermicro 5018R-WR. I'm trying to deploy Ubuntu 19.10 on it; I haven't yet checked to see if the same problem occurs when deploying other versions of Ubuntu. Other servers boot just fine in the absence of this file; I don't know why this is a problem for just this one server.

$ dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-======================================-============-==================================================
ii maas 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all "Metal as a Service" is a physical cloud and IPAM
ii maas-cert-server 0.4.4-0ppa1~git3ac1382~ubuntu18.04.1 all Ubuntu certification support files for MAAS server
ii maas-cli 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all MAAS client and command-line interface
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all MAAS server common files
ii maas-dhcp 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all MAAS DHCP server
un maas-dns <none> <none> (no description available)
ii maas-proxy 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all Rack Controller for MAAS
ii maas-region-api 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all Region Controller for MAAS
un maas-region-controller-min <none> <none> (no description available)
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-maas-provisioningserver <none> <none> (no description available)
ii python3-django-maas 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.6.1-7832-g17912cdc9-0ubuntu1~18.04.1 all MAAS server provisioning libraries (Python 3)

Rod Smith (rodsmith) wrote :

This problem occurs when attempting to deploy Ubuntu 18.04, too.

It appears that the ESP's \efi\boot\bootx64.efi is Shim. There's also a copy of fbx64.efi in this directory, but it looks like this Shim isn't configured to look for it, so in the absence of grubx64.efi in this directory, the boot process hangs when Shim is launched. At least, that's my hypothesis.
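
To check for the same layout on another machine, a rough diagnostic along these lines could report which of the files mentioned above are present in the ESP's EFI/BOOT directory. The function name `esp_boot_dir_report` is hypothetical, the ESP mount point is an assumption supplied by the caller, and names are compared case-insensitively because the ESP is a FAT filesystem.

```python
from pathlib import Path


def esp_boot_dir_report(esp_root,
                        names=("bootx64.efi", "fbx64.efi", "grubx64.efi")):
    """Return {file name: present?} for files in <esp_root>/EFI/BOOT.

    File names are matched case-insensitively.
    """
    boot_dir = Path(esp_root) / "EFI" / "BOOT"
    if boot_dir.is_dir():
        present = {entry.name.lower() for entry in boot_dir.iterdir()}
    else:
        present = set()
    return {name: name.lower() in present for name in names}


if __name__ == "__main__":
    # Assumption: the ESP is mounted at /boot/efi.
    for name, found in esp_boot_dir_report("/boot/efi").items():
        print(f"{name}: {'present' if found else 'MISSING'}")
```

On the affected server, the description above suggests this would show bootx64.efi and fbx64.efi as present and grubx64.efi as missing.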

Lee Trager (ltrager) wrote :

Machines managed by MAAS are always configured to boot off of the network. When an image is deployed, MAAS sends GRUB, which was loaded over the network, a configuration file[1] that searches for the local boot loader to chain boot to. As per 13.3.1.3 of the UEFI spec[2], MAAS first tries the default location for a local boot loader, \EFI\BOOT\BOOTX64.EFI. If that is not found or fails to load, a number of known vendor directories are searched and attempted. If everything fails, GRUB exits so the firmware tries the next boot device.
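
The search order described above can be modelled roughly as follows. This is a simplified sketch, not the MAAS template itself (which is a GRUB configuration file, see [1]). The function name `find_local_bootloader` is hypothetical, and the candidate list is illustrative: it contains only the default location plus the two Ubuntu loaders named in this report, not the full set of vendor directories. It also models only the "not found" case, not a loader that is found but fails to start.

```python
from pathlib import Path

# Illustrative candidate list, in search order. Assumption: the real
# template tries more vendor directories than are shown here.
CANDIDATES = (
    "EFI/BOOT/BOOTX64.EFI",    # default removable-media location
    "EFI/ubuntu/shimx64.efi",  # vendor directory, shim
    "EFI/ubuntu/grubx64.efi",  # vendor directory, GRUB
)


def find_local_bootloader(esp_root, candidates=CANDIDATES):
    """Return the first candidate that exists under esp_root, or None.

    None corresponds to the case where every attempt fails and GRUB exits,
    leaving the firmware to try the next boot device.
    """
    esp = Path(esp_root)
    for relative in candidates:
        path = esp / relative
        if path.is_file():
            return path
    return None
```

In this model the affected server would select EFI/BOOT/BOOTX64.EFI, since that file exists; the failure reported here happens afterwards, when that loader cannot find grubx64.efi next to itself.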

If I keep a close eye on the UEFI machine while it boots, I can see that \EFI\BOOT\BOOTX64.EFI is loaded but fails to chain load \EFI\BOOT\GRUBX64.EFI. On UEFI QEMU as well as on our CI machines, GRUB continues, finds /EFI/ubuntu/shimx64.efi, and booting succeeds.

Do you have Secure Boot enabled? If so, can you try deploying with it disabled? It may be causing the boot process to lock up when BOOTX64.EFI fails to load.

I'm adding GRUB and shim to this bug, as \EFI\BOOT\BOOTX64.EFI should chain load to the locally installed GRUB. I'm also not sure why GRUB isn't trying the other alternatives it's given.

[1] https://git.launchpad.net/maas/tree/src/provisioningserver/templates/uefi/config.local.amd64.template
[2] https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf

Rod Smith (rodsmith) wrote :

Secure Boot is NOT enabled on the affected machine.

So far, I've seen this on only this one server. Other machines, including others deployed from the same MAAS server, deploy and boot fine.
