Unable to load shimx64.efi using iPXE over UEFI

Bug #1789319 reported by Lee Trager on 2018-08-28
This bug affects 1 person
Affects              Importance   Assigned to
MAAS                 Undecided    Unassigned
grub2 (Ubuntu)       Undecided    Unassigned
ipxe (Ubuntu)        Undecided    Unassigned
  Bionic             Undecided    Unassigned
shim (Ubuntu)        Undecided    Unassigned

Bug Description

[Impact]
libvirt supports creating virtual machines running in UEFI mode and uses iPXE to enable network booting. When MAAS gives shimx64.efi to iPXE, as it does on all UEFI systems, the chainload to GRUB fails. If I modify MAAS to give out grubx64.efi instead of shimx64.efi, UEFI booting works.

Ideally, iPXE would be modified to chainload the shim properly; however, MAAS could also check the DHCP user-class when returning the boot file, as follows:

if option arch = 00:00 {
    # pxe
    filename "lpxelinux.0";
} elsif option arch = 00:07 and exists user-class and option user-class = "iPXE" {
    # iPXE uefi_amd64
    filename "grubx64.efi";
} elsif option arch = 00:07 {
    # uefi_amd64
    filename "bootx64.efi";
} elsif option arch = 00:09 and exists user-class and option user-class = "iPXE" {
    # iPXE uefi_amd64
    filename "grubx64.efi";
} elsif option arch = 00:09 {
    # uefi_amd64
    filename "bootx64.efi";
} elsif option arch = 00:0B {
    # uefi_arm64
    filename "grubaa64.efi";
} elsif option arch = 00:0C {
    # open-firmware_ppc64el
    filename "bootppc64.bin";
} elsif option arch = 00:0E {
    # powernv
    filename "pxelinux.0";
    option path-prefix "ppc64el/";
} elsif option arch = 00:1F {
    # s390x
    filename "boots390x.bin";
    option path-prefix "s390x/";
} else {
    # pxe
    filename "lpxelinux.0";
}
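
As a rough illustration of the branching above, the selection logic can be sketched in Python. This is a hypothetical sketch, not MAAS code: `BootFile` and `select_boot_file` are made-up names, `arch` stands for the client architecture value the config matches with `option arch`, and `user_class` for the DHCP user-class string that the config compares against "iPXE".

```python
from typing import NamedTuple, Optional


class BootFile(NamedTuple):
    """A boot file name plus the optional path-prefix option."""
    filename: str
    path_prefix: Optional[str] = None


# Client architecture codes, as matched by "option arch" in the dhcpd
# config above, mapped to the file each branch hands out.
BOOT_FILES = {
    0x00: BootFile("lpxelinux.0"),             # pxe
    0x07: BootFile("bootx64.efi"),             # uefi_amd64
    0x09: BootFile("bootx64.efi"),             # uefi_amd64
    0x0B: BootFile("grubaa64.efi"),            # uefi_arm64
    0x0C: BootFile("bootppc64.bin"),           # open-firmware_ppc64el
    0x0E: BootFile("pxelinux.0", "ppc64el/"),  # powernv
    0x1F: BootFile("boots390x.bin", "s390x/"), # s390x
}

# The amd64 UEFI codes for which iPXE is sent straight to GRUB.
IPXE_GRUB_ARCHES = {0x07, 0x09}


def select_boot_file(arch: int, user_class: Optional[str] = None) -> BootFile:
    """Hypothetical equivalent of the if/elsif chain in the config above."""
    if arch in IPXE_GRUB_ARCHES and user_class == "iPXE":
        # iPXE identified itself via the DHCP user-class: skip shim.
        return BootFile("grubx64.efi")
    # Unknown architectures fall through to the final "else" branch (pxe).
    return BOOT_FILES.get(arch, BootFile("lpxelinux.0"))


if __name__ == "__main__":
    print(select_boot_file(0x07, "iPXE").filename)  # grubx64.efi
    print(select_boot_file(0x07).filename)          # bootx64.efi
```

Only the two amd64 UEFI branches look at the user-class; every other architecture is selected on the arch code alone.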

[Test case]
Minimal test case:

Run the following command and ensure it boots (assuming an EFI system with shim and grub installed):
sudo kvm -bios /usr/share/OVMF/OVMF_CODE.fd -device virtio-net,netdev=n1 -netdev user,id=n1,tftp=/boot/efi/EFI/ubuntu,bootfile=shimx64.efi

Ideally, also verify the fix with MAAS.

[Regression potential]
I switched the ipxe-qemu packages to build in qemu mode (CONFIG=qemu), which makes iPXE use OVMF's internal network stack; some bootloaders may therefore behave differently when netbooting.

There may be other side effects too; I don't know for sure. That said, the configuration is intended specifically for qemu and is used by other distributions, so this aligns us more closely with them and reduces the chance of breaking things.

Lee Trager (ltrager) on 2018-08-28
description: updated
Steve Langasek (vorlon) wrote :

I don't understand what iPXE has to do with anything here. If you are running a virtual machine in UEFI mode, you have a full UEFI firmware implementation which directly supports dhcp netboot without any involvement of iPXE. And I am unaware of any issues with netbooting ovmf to shim->grub->grub.efi.

Lee Trager (ltrager) wrote :

When I configure the machine to network boot, iPXE is used to try to boot it via TFTP. In the virtual machine's UEFI firmware setup I can see two network boot options: PXEv4 and HTTPv4. HTTPv4 booting doesn't appear to use iPXE; however, MAAS does not currently support that method.

So, what is the failure mode here? Just not finding the binaries?

Lee Trager (ltrager) wrote :

Sorry forgot to include the failure screenshot.

I can reproduce. This looks to be something in grub that shim doesn't appear to like.

no longer affects: ipxe (Ubuntu)
affects: shim-signed (Ubuntu) → grub2 (Ubuntu)
Changed in grub2 (Ubuntu):
status: New → Confirmed
Steve Langasek (vorlon) wrote :

The error message shown by shim in the screenshot is:

 Malformed binary after Attribute Certificate Table
 datasize: ? SumOfBytesHashed: ? SecDir->Size: ?
 hashsize: ? SecDir->VirtualAddress: 0x002E6088

Something is wrong here. These '?' are the result of a printf %u format string... that shouldn't happen.

Without the correct output, it is difficult to debug further and determine whether this is a problem with shim, the UEFI network driver, the grub binary, or the TFTP server.

Opening a shim task.

Changed in shim (Ubuntu):
status: New → Confirmed
status: Confirmed → Triaged

If anything, to be honest, it's unlikely to be either shim or grub, seeing as netbooting on hardware just works; but we should have a good look at both anyway.

Lee Trager (ltrager) wrote :

One thing to add: this happens *before* any GRUB configuration is loaded. The machine requests an address over DHCP, then bootx64.efi (the shim) over TFTP, and then grubx64.efi over TFTP. Nothing else is requested.

Andres Rodriguez (andreserl) wrote :

On Tue, Aug 28, 2018 at 07:31:14PM -0000, Andres Rodriguez wrote:
> isn't this the same issue as
> https://bugs.launchpad.net/maas/+bug/1711203

No, this failure is unrelated to secureboot and appears to be specific to
the ovmf UEFI firmware implementation.

Changed in maas:
milestone: none → 2.5.0beta1
tags: added: id-5b85855515a6063ed300711d
Julian Andres Klode (juliank) wrote :

I reproduced this using:

$ kvm -bios /usr/share/OVMF/OVMF_CODE.fd -device e1000,netdev=n1 -netdev user,id=n1,tftp=/boot/efi,bootfile=/EFI/ubuntu/grubx64.efi

and then manually booting shim from the iPXE command line, after which shim tries to chainload grubx64.efi:

iPXE> boot tftp://10.0.2.2/EFI/boot/bootx64.efi
tftp://10.0.2.2/EFI/boot/bootx64.efi... ok
Fetching Netboot Image
Malformed binary after Attribute Certificate Table
datasize: 4194304 SumOfBytesHashed: 1103360 SecDir->Size: 1912
hashsize: 3089032 SecDir->VirtualAddress: 0x0010D600
Failed to load image: Invalid Parameter
start_image() returned Invalid Parameter
Could not boot: Error 0x7f048282 (http://ipxe.org/7f048282)
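
The numbers in this log can be checked by hand. Below is a Python sketch of the consistency test that, as far as we can tell from shim's source, produces the "Malformed binary after Attribute Certificate Table" message; `check_after_cert_table` is a hypothetical name and the condition is our paraphrase, not shim's code. With the values above, hashsize = datasize - SecDir->Size - SumOfBytesHashed reproduces the logged 3089032, and SecDir->VirtualAddress 0x0010D600 equals SumOfBytesHashed (1103360), so the certificate table appears to end at byte 1105272. The reported datasize, however, is 4194304, exactly 4 MiB, which suggests shim was handed the size of a buffer rather than the length of grubx64.efi.

```python
def check_after_cert_table(datasize, sum_of_bytes_hashed, secdir_size, secdir_va):
    """Sketch of the check behind shim's "Malformed binary after Attribute
    Certificate Table" error (our paraphrase, not shim's code).

    Returns (ok, hashsize), where hashsize is the number of bytes that would
    still be hashed between the hashed sections and the certificate table.
    """
    if datasize <= sum_of_bytes_hashed or secdir_size == 0:
        # Nothing left to hash, so this particular check is skipped.
        return True, 0
    hashsize = datasize - secdir_size - sum_of_bytes_hashed
    ok = (datasize - sum_of_bytes_hashed >= secdir_size
          and sum_of_bytes_hashed + hashsize == secdir_va)
    return ok, hashsize


if __name__ == "__main__":
    # Values from the iPXE session above.
    print(check_after_cert_table(4194304, 1103360, 1912, 0x0010D600))  # (False, 3089032)
    # Same image, but with datasize ending right after the certificate table.
    print(check_after_cert_table(0x0010D600 + 1912, 1103360, 1912, 0x0010D600))  # (True, 0)
```

With datasize set to the inferred end of the certificate table the check passes, which is consistent with the image itself being intact and only the reported size being wrong.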

Julian Andres Klode (juliank) wrote :

With systemd-boot instead of grub, it simply fails, pretending the file does not exist.

Changed in maas:
milestone: 2.5.0beta1 → 2.5.0beta2

I may be able to provide some information here, about iPXE. (Corrections welcome, obviously!)

iPXE can be built in a number of ways. Two of those are: (1) as a UEFI *driver* that is presented in a NIC's PCI ROM BAR (i.e., as part of a PCI expansion ROM), (2) as a UEFI *application* that can be loaded from optical media / an ISO image, or a USB flash drive, or chain-loaded over the network with the platform firmware's built-in netboot capabilities.

Independently of that dimension (i.e., build target), iPXE can be built with different features enabled (through feature test macros). One of those feature macros is "EFI_DOWNGRADE_UX". You can read about it e.g. in iPXE commit a15c0d7e868a ("[efi] Allow user experience to be downgraded", 2015-07-22).

I'll spare you the technical details; the point is that without "EFI_DOWNGRADE_UX" (that is, in the default case), iPXE sort of takes over the entire UEFI netboot process, supplanting most of the edk2 network stack (which is built into OVMF).

It is widely agreed upon that this behavior is appropriate for build type (2) above, i.e. when iPXE is launched as a UEFI *application*.

Opinions differ on whether this behavior is appropriate for build type (1), that is, when iPXE is built as a UEFI *driver*, to be included in a specific NIC's PCI ROM BAR / expansion ROM image.

In my personal opinion, which some others (but definitely not all) share, for build type (1), the "take-over" behavior of iPXE is not desirable, and iPXE should only provide the lowest level hardware driver for the NIC. (And then the rest of the UEFI network stack will come from the edk2 modules -- this is precisely the situation that Steve describes in comment#1.) In other words, for build type (1), the "EFI_DOWNGRADE_UX" feature test macro should be defined -- in my opinion. And, since iPXE commit a200ad462e69 ("[build] Add named configuration for qemu", 2015-07-22), this is easily achievable by passing the "CONFIG=qemu" macro definition to "make", when the "bin-x86_64-efi/*.efidrv" files are built.

From the pics attached earlier (e.g. comment#3), I'm nearly 100% sure that the iPXE UEFI drivers in question -- build type (1) -- had been built *without* EFI_DOWNGRADE_UX. While more recent iPXE commit 3376fa520b0c ("[efi] Implement the EFI_PXE_BASE_CODE_PROTOCOL", 2015-09-02) suggests this should function OK, I would still suggest rebuilding the iPXE UEFI drivers (the PCI oproms) with "CONFIG=qemu", and retrying.

"CONFIG=qemu" is certainly what we use in RHEL and Fedora, in the ipxe-roms-qemu package. It's easy to refer to the Fedora spec file:

https://koji.fedoraproject.org/koji/packageinfo?packageID=13673

One more addition: from comment #2, I see that the NIC device model is virtio. For a virtio NIC, you don't even need that "lowest level hardware driver" to come from iPXE -- OVMF has a built-in (= native to edk2) driver for virtio-net. It's located under OvmfPkg/VirtioNetDxe. This means that, if you want to, you can netboot entirely without iPXE. (Note I'm not saying you "should" boot without iPXE, just that you "can", *if* you want to.) This is how:

As explained in OvmfPkg/README (see "Network Support"), if you use a virtio NIC,...


Julian Andres Klode (juliank) wrote :

I could successfully chainload grub from shim by using virtio-net and disabling iPXE. Minimal reproducer:

sudo kvm -bios /usr/share/OVMF/OVMF_CODE.fd -device virtio-net,netdev=n1 -netdev user,id=n1,tftp=/boot/efi/EFI/ubuntu,bootfile=shimx64.efi -global virtio-net-pci.romfile=""

If I remove the last option (which disables iPXE), it fails to boot again.
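
The failing and working invocations differ only in the ROM override. A small Python helper (the name `ovmf_netboot_cmd` is hypothetical) that builds the argument list for either variant makes the difference explicit; it only assembles the command and does not run it.

```python
import shlex
from typing import List


def ovmf_netboot_cmd(tftp_root: str, bootfile: str,
                     disable_ipxe: bool = False) -> List[str]:
    """Build the kvm argument list used in this bug's reproducer."""
    argv = [
        "kvm",
        "-bios", "/usr/share/OVMF/OVMF_CODE.fd",
        "-device", "virtio-net,netdev=n1",
        # QEMU's user-mode network provides DHCP and a TFTP server here.
        "-netdev", f"user,id=n1,tftp={tftp_root},bootfile={bootfile}",
    ]
    if disable_ipxe:
        # An empty romfile stops QEMU from attaching the iPXE option ROM to the
        # virtio-net device, so OVMF's own virtio-net driver is used instead.
        argv += ["-global", "virtio-net-pci.romfile="]
    return argv


if __name__ == "__main__":
    cmd = ovmf_netboot_cmd("/boot/efi/EFI/ubuntu", "shimx64.efi", disable_ipxe=True)
    print(shlex.join(["sudo", *cmd]))
```

shlex.join needs Python 3.8 or later; the empty `romfile=""` from the shell command becomes the bare argument `virtio-net-pci.romfile=` in the list.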

Building iPXE with CONFIG=qemu seems to be a fairly complicated affair. The build system is a mess.

Changed in ipxe (Ubuntu):
status: New → Triaged
Julian Andres Klode (juliank) wrote :

Thanks Laszlo, ipxe built with CONFIG=qemu successfully makes shim load grub.

Changed in shim (Ubuntu):
status: Triaged → Invalid
Changed in grub2 (Ubuntu):
status: Confirmed → Invalid
Changed in ipxe (Ubuntu):
status: Triaged → In Progress
Julian Andres Klode (juliank) wrote :

I just uploaded -0ubuntu4 (and, earlier, -0ubuntu3 to fix an FTBFS), which enables CONFIG=qemu for our qemu ROMs. QEMU should now work fine, but I think the grub/efi binaries and the CD-ROM images would still fail to load grub via shim. I'm not sure there's much that can be done there.

Changed in ipxe (Ubuntu):
status: In Progress → Fix Committed
Changed in ipxe (Ubuntu Bionic):
status: New → In Progress
description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ipxe - 1.0.0+git-20180124.fbe8c52d-0ubuntu4

---------------
ipxe (1.0.0+git-20180124.fbe8c52d-0ubuntu4) cosmic; urgency=medium

  * Build ROMs for QEMU with CONFIG=qemu (LP: #1789319)

 -- Julian Andres Klode <email address hidden> Mon, 10 Sep 2018 14:56:17 +0200

Changed in ipxe (Ubuntu):
status: Fix Committed → Fix Released
Phillip Susi (psusi) wrote :

Isn't that just a workaround? There is still an underlying bug in the iPXE network driver isn't there?

Julian Andres Klode (juliank) wrote :

Yes, I think there's a bug in the iPXE network driver, but it now essentially affects only bare-metal users. If that is a problem for anyone in practice, I'd suggest opening a new bug for it (or perhaps going directly upstream), so that we can use this one to track the fix for MAAS VMs.

Lee Trager (ltrager) wrote :

Isn't iPXE what implements TFTP PXE boot? When I tried Julian's command, kvm attempted UEFI HTTP boot, which isn't currently implemented in MAAS and is blocked by the lack of grub support (LP: #1787630).

Julian Andres Klode (juliank) wrote :

Ugh. Let's try to make this clear. Nothing here does http boot. There are three scenarios, all do PXE TFTP booting:

(1) iPXE network stack replaces firmware one - caused this bug
(2) iPXE without replacing firmware's network stack (CONFIG=qemu) - this works fine
(3) No iPXE, use OVMF's native implementation of PXE booting on virtio-net

(3) only works on virtio, (1, 2) also work on other cards supported by qemu.
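
Julian's three scenarios amount to a small decision table. The Python sketch below encodes it; `NetbootPath`, `netboot_path` and its parameters are hypothetical names, and the outcomes noted in the comments are the ones reported in this bug, not a general guarantee.

```python
from enum import Enum


class NetbootPath(Enum):
    IPXE_FULL_STACK = 1  # (1) iPXE replaces the firmware network stack: hits this bug
    IPXE_DOWNGRADED = 2  # (2) iPXE built with CONFIG=qemu: works
    OVMF_NATIVE = 3      # (3) no iPXE, OVMF's native virtio-net support: works


def netboot_path(nic_model: str, ipxe_rom_loaded: bool,
                 ipxe_config_qemu: bool) -> NetbootPath:
    """Classify a QEMU/OVMF TFTP netboot setup into scenario 1, 2 or 3."""
    if not ipxe_rom_loaded:
        # Per the list above, scenario (3) only works on virtio.
        if nic_model != "virtio-net":
            raise ValueError("without iPXE, only virtio-net can netboot here")
        return NetbootPath.OVMF_NATIVE
    if ipxe_config_qemu:
        return NetbootPath.IPXE_DOWNGRADED
    return NetbootPath.IPXE_FULL_STACK
```

Scenarios (1) and (2) use the same NIC models and the same command line; only the way the iPXE ROM was built differs.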

Hello Lee, or anyone else affected,

Accepted ipxe into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ipxe/1.0.0+git-20180124.fbe8c52d-0ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in ipxe (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
Julian Andres Klode (juliank) wrote :

Verified that -ubuntu2 fails and -ubuntu2.1 from proposed works by running the command specified in the test case in a bionic system.

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Lee Trager (ltrager) wrote :

I've verified that the updated ipxe package fixes virsh UEFI deployments with MAAS. I tested commissioning, as well as deploying both Ubuntu 18.04 and CentOS 7.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ipxe - 1.0.0+git-20180124.fbe8c52d-0ubuntu2.1

---------------
ipxe (1.0.0+git-20180124.fbe8c52d-0ubuntu2.1) bionic; urgency=medium

  * Build ROMs for QEMU with CONFIG=qemu (LP: #1789319)

 -- Julian Andres Klode <email address hidden> Mon, 10 Sep 2018 14:56:17 +0200

Changed in ipxe (Ubuntu Bionic):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for ipxe has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Changed in maas:
status: Triaged → Invalid