Chainbooting from grub over the network to local shim breaks chain of trust

Bug #1865515 reported by Rod Smith on 2020-03-02
This bug affects 2 people
Affects                 Status   Importance   Assigned to   Milestone
MAAS                             Undecided    Unassigned
grub2 (Ubuntu)          (status tracked in Groovy)
  Focal                          Undecided    Unassigned
  Groovy                         Undecided    Unassigned
shim-signed (Ubuntu)    (status tracked in Groovy)
  Focal                          Undecided    Unassigned
  Groovy                         Undecided    Unassigned

Bug Description

MAAS (2.4.2 and 2.6.2) cannot deploy to a server with Secure Boot active. This appears to be a regression of bug #1711203; the symptoms are identical. Namely:

1) The system can begin deployment fine.
2) After deployment is complete except for the final reboot, the
   system will reboot.
3) GRUB appears briefly on the screen.
4) The system console briefly displays the message:
   Bootloader has not verified loaded image
   System is compromised. halting.
5) The node powers off.
6) Eventually MAAS times out on the deployment and declares
   that it's failed.

I've verified this on three MAAS servers and one node each (jehan, a Quanta QuantaGrid D52B-1U in 18T; capella, a Supermicro SYS-6028U-TR4+ in 1SS; and brennan, an Intel NUC DC53427HYE on my home network).

Two of the MAAS servers are running MAAS 2.6.2-7841-ga10625be3-0ubuntu1~18.04.1; the third is on 2.4.2-7034-g2f5deb8b8-0ubuntu1.

Lee Trager (ltrager) wrote :

What operating system are you trying to deploy? Can you deploy Ubuntu 18.04?

Changed in maas:
status: New → Incomplete
Woodrow Shen (woodrow-shen) wrote :

Hi, I ran the same experiment as described in the bug. Here is my summary:

Test environment:
MAAS version: 2.7.0 (8232-g.6e1dba4ab-0ubuntu1~18.04.1), installed by following this page (https://maas.io/docs/install-from-packages)

Dell Latitude 3510 (09ED) laptop with secure boot enabled

1. Deploying 18.04 from MAAS => Got the same error.
2. Deploying 20.04 from MAAS => Got the same error.

Rod Smith (rodsmith) wrote :

I, too, used Ubuntu 18.04 and 20.04 in my tests.

Changed in maas:
status: Incomplete → Confirmed
Changed in maas:
milestone: none → 2.8.0b2
Alberto Donato (ack) on 2020-04-24
Changed in maas:
milestone: 2.8.0b2 → 2.8.0rc1
Rod Smith (rodsmith) wrote :

A deployment to brennan a few days ago (via MAAS 2.4.2) succeeded, so I've done some re-testing today. Brennan continues to deploy successfully with Secure Boot active; however, systems in both 18T (I re-tested jehan and also tested feebas, a Cisco UCS C220 M4) and 1SS (I tested kies, a Supermicro SYS-6018U-TR4+) failed. Both 18T and 1SS have MAAS 2.6.2 servers.

Rod Smith (rodsmith) wrote :

Oh, most of the tests in #4 used Ubuntu 20.04; however, the failed jehan deployment used 18.04.

Alberto Donato (ack) on 2020-05-01
Changed in maas:
milestone: 2.8.0b3 → 2.8.0rc1
Alberto Donato (ack) on 2020-05-11
Changed in maas:
milestone: 2.8.0b4 → 2.8.0rc1
Lee Trager (ltrager) wrote :

This isn't a bug in MAAS; it's a bug in shim/grub. MAAS gets its bootloaders from the public stream at images.maas.io, which is generated by lp:maas-images. lp:maas-images pulls the bootloaders out of the archive; it's currently set to pull them from bionic.

Secure Boot works in the ephemeral environment; it fails when trying to local boot into the deployed environment. When an x86_64 UEFI machine local boots with MAAS, it boots over the network and downloads bootx64.efi (shim), which downloads grubx64.efi and this grub.cfg[1]. The grub from over the network finds /boot/efi/ubuntu/shimx64.efi on the local filesystem and chainboots to it. Somehow the chain of trust breaks here, causing the system to halt.

Booting local disk...
Failed to open \efi\boot\grubx64.efi - Not Found
Failed to load image \efi\boot\grubx64.efi: Not Found
start_image() returned Not Found
EFI stub: UEFI Secure Boot is enabled.
Bootloader has not verified loaded image.
System is compromised. halting.

I tried using the shim and grub from Focal but I still get the same problem.

[1] https://git.launchpad.net/maas/tree/src/provisioningserver/templates/uefi/config.local.amd64.template

Changed in grub (Ubuntu):
status: New → Confirmed
Changed in shim-signed (Ubuntu):
status: New → Confirmed
summary: - MAAS can't deploy to a server with Secure Boot active
+ Chainbooting from grub over the network to local shim breaks chain of
+ trust
Lee Trager (ltrager) wrote :

For LXD Pods[1] we had to disable secure boot to get around this issue. When this bug is fixed we should reenable secure boot for LXD Pods.

[1] https://git.launchpad.net/maas/tree/src/provisioningserver/drivers/pod/lxd.py#n515

tags: added: rls-bb-incoming
tags: added: rls-ff-incoming
Jeff Lane (bladernr) wrote :

FWIW, a partner is also hitting this in the field when trying to do Secure Boot installs which break 100% of the time for them.

They have noted, however, that installing from ISO works and can successfully install and boot on a secure boot enabled server. They've only tested Focal ISOs at this time, but this tells me that there's some difference between what MAAS images are getting or have gotten, and what the ISO has or is doing during install.

tags: added: blocks-hwcert-server
Lee Trager (ltrager) wrote :

@Jeff: MAAS uses the same bits as the ISO. What differs is how local booting happens with MAAS vs. with the ISO. When installed with the ISO, the local boot process is UEFI Firmware -> Shim (from disk) -> GRUB (from disk) -> Boot local kernel. When installed with MAAS, the local boot process is UEFI Firmware -> Shim (from network) -> GRUB (from network) -> Shim (from disk) -> GRUB (from disk) -> Boot local kernel. The chain of trust breaks when switching from GRUB (from network) to Shim (from disk). I suspect, but haven't verified, that this may be due to the shim not being signed with a key GRUB has.

On Tue, May 19, 2020 at 04:15:47PM -0000, Lee Trager wrote:
> I suspect but haven't verified that this may be due to the shim
> not being signed with a key GRUB has.

GRUB embeds no keys, it calls out to shim for verification of signatures.

It would be helpful if someone could verify whether the boot chain is
stopping at the second shim, or at the second grub.

Lee Trager (ltrager) wrote :

Based on the MAAS logs the halt happens after the remote shim, grub, and grub.cfg have been loaded. I didn't see anything in the console to show grub running but it may have been cleared before I could see it.

Console output:

Booting local disk...
Failed to open \efi\boot\grubx64.efi - Not Found
Failed to load image \efi\boot\grubx64.efi: Not Found
start_image() returned Not Found

Bootloader has not verified loaded image.
System is compromised. halting.

rackd.log

2020-05-19 20:54:04 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.0.0.117
2020-05-19 20:54:04 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.0.0.117
2020-05-19 20:54:05 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/command.lst requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/fs.lst requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/crypto.lst requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/terminal.lst requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-00:16:3e:49:52:7b requested by 10.0.0.117
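
For anyone comparing captures from several nodes, here is a minimal sketch that reduces rackd.log to the ordered list of files each client fetched over TFTP. It assumes the log lines keep exactly the format shown above; the function name and the choice to collapse consecutive repeated requests are illustrative and not part of MAAS.

```python
import re
from collections import defaultdict

# Matches rackd.log TFTP lines of the form shown in the excerpt above, e.g.
#   2020-05-19 20:54:05 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.0.0.117
TFTP_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"provisioningserver\.rackdservices\.tftp: \[info\] "
    r"(?P<path>\S+) requested by (?P<client>\S+)$"
)


def tftp_requests(log_text):
    """Return {client_ip: [paths in log order]}, collapsing consecutive repeats."""
    requests = defaultdict(list)
    for line in log_text.splitlines():
        match = TFTP_LINE.match(line.strip())
        if not match:
            continue  # not a TFTP request line
        paths = requests[match["client"]]
        if not paths or paths[-1] != match["path"]:
            paths.append(match["path"])
    return dict(requests)


sample = """\
2020-05-19 20:54:04 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.0.0.117
2020-05-19 20:54:04 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.0.0.117
2020-05-19 20:54:05 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg requested by 10.0.0.117
"""
print(tftp_requests(sample))
```

A client whose list ends at grub.cfg, as here, got as far as the network GRUB reading its configuration; the log says nothing about what happens after the chainload to disk.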

You can reproduce this pretty easily with MAAS 2.8 and LXD Pods.

1. Install MAAS 2.8
2. Add an LXD Pod
3. Compose a machine in the LXD Pod and let it commission
4. Reenable secure boot in the LXD virtual machine
   lxc config edit <vm name>
   Delete the line 'security.secureboot: "false"'
5. Attempt to deploy Ubuntu

Steve Langasek (vorlon) wrote :

On Tue, May 19, 2020 at 08:59:46PM -0000, Lee Trager wrote:
> Based on the MAAS logs the halt happens after the remote shim, grub, and
> grub.cfg have been loaded. I didn't see anything in the console to show
> grub running but it may have been cleared before I could see it.

> Console output:

> Booting local disk...
> Failed to open \efi\boot\grubx64.efi - Not Found
> Failed to load image \efi\boot\grubx64.efi: Not Found
> start_image() returned Not Found

> Bootloader has not verified loaded image.
> System is compromised. halting.

Doesn't this output show that it has successfully chained to the local shim,
since it's shim that is loading \efi\boot\grubx64.efi and those messages are
from shim?

What I don't currently understand is why this should behave any differently
with or without SecureBoot enabled; that will need digging into. But the
specific error "Not found" certainly implies there is a difference in the
path resolution when secureboot is on.

Lee Trager (ltrager) wrote :

MAAS doesn't know for sure what operating system is deployed locally. When booting locally, MAAS sends a grub.cfg[1] which searches for the shim or local bootloader. MAAS first tries \efi\boot\bootx64.efi, as that is the default location per the UEFI spec. Most operating systems, including Ubuntu, put a bootloader there. The shim loaded from that location fails to find grub next to it, because Ubuntu only stores grub in \efi\ubuntu\grubx64.efi. The two failure messages are from that. The config then tries to load \efi\ubuntu\shimx64.efi, which succeeds but is unable to verify either \efi\ubuntu\shimx64.efi or \efi\ubuntu\grubx64.efi.

[1] https://git.launchpad.net/maas/tree/src/provisioningserver/templates/uefi/config.local.amd64.template
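
The path resolution described above can be modelled in a few lines. This is only a sketch of the lookup order as explained in this comment, not the GRUB template itself: the candidate list is limited to the two paths named here (the real template may try more), the function names are made up, and signature verification is not modelled at all.

```python
# Candidate loaders in the order described in the comment (assumption: the
# real MAAS template may list additional paths).
CANDIDATES = [
    r"\efi\boot\bootx64.efi",
    r"\efi\ubuntu\shimx64.efi",
]


def sibling_grub(shim_path):
    """Path of the grubx64.efi that a shim would look for in its own directory."""
    directory, _, _ = shim_path.rpartition("\\")
    return directory + "\\grubx64.efi"


def trace_local_boot(esp_files, candidates=CANDIDATES):
    """Return (loader, outcome) pairs for each candidate tried, stopping at the first success."""
    present = {path.lower() for path in esp_files}  # FAT lookups are case-insensitive
    events = []
    for shim in candidates:
        if shim.lower() not in present:
            events.append((shim, "loader missing"))
            continue
        grub = sibling_grub(shim)
        if grub.lower() in present:
            events.append((shim, "found " + grub))
            break
        events.append((shim, "Not Found: " + grub))
    return events


# ESP layout as described for Ubuntu: a loader in \EFI\BOOT, grub only in \EFI\ubuntu.
ubuntu_esp = [
    r"\EFI\BOOT\BOOTX64.EFI",
    r"\EFI\ubuntu\shimx64.efi",
    r"\EFI\ubuntu\grubx64.efi",
]
for loader, outcome in trace_local_boot(ubuntu_esp):
    print(loader, "->", outcome)
```

With that layout the first candidate produces the "Not Found" for \efi\boot\grubx64.efi and the second one locates its grub, which matches the console output; the halt itself comes later and is outside this model.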

Lee Trager (ltrager) wrote :

I tried modifying the MAAS local boot grub.cfg to directly chainboot \efi\ubuntu\shimx64.efi. This gets rid of the failed to open/failed to load errors. Local grub appears to load but halts saying the system is compromised when it tries to boot the local kernel.

Dimitri John Ledkov (xnox) wrote :

Can I have access to said MAAS environment and those machines?

Dimitri John Ledkov (xnox) wrote :

Please provide remote artifacts
Please provide local artifacts
Please provide reproducer steps
Please provide details how local artifacts were installed
Please provide list of certs trusted by the node's firmware
Please provide access to MAAS with a secureboot on & off target nodes

Changed in shim-signed (Ubuntu):
status: Confirmed → Incomplete
Changed in grub (Ubuntu):
status: Confirmed → Incomplete
Rod Smith (rodsmith) wrote :

Unfortunately, capella in 1SS is not currently accessible by our team. You can test on jehan in 18T, though; I'm sending you an e-mail with details.

I don't know what you mean by "remote artifacts" and "local artifacts." To reproduce the problem, simply enable Secure Boot and attempt to deploy the server; it will fail as described in the initial bug report.

Julian Andres Klode (juliank) wrote :

Well, quite simply we'd like a minimal test case without involving maas, so that we can test this in a sensible way. This is also important for SRUs, as we need to test them before releasing.

Julian Andres Klode (juliank) wrote :

Consider that we might need to upgrade grub on the MAAS server, and need to test this on bionic, focal, groovy on both maas server and deployed server sides.

e.g. we might need to test deploying a groovy server from a bionic MAAS, and vice versa, and other combinations of this.

Rod Smith (rodsmith) wrote :

I've managed to create a procedure that duplicates this problem without the involvement of MAAS, except for one file pulled from MAAS. The procedure is awkward, but it reproduces the problem. Here's the procedure:

1) Ensure that Secure Boot is enabled.
2) Install Ubuntu. (I used 20.04 LTS server.)
3) Retrieve grubx64.efi from a MAAS server
   (/var/lib/maas/boot-resources/current/grubx64.efi). I'm appending
   a copy of the file I used to this bug report.
4) sudo mkdir /boot/efi/EFI/foo
5) sudo cp /boot/efi/EFI/ubuntu/shimx64.efi /boot/efi/EFI/foo/
6) Copy the grubx64.efi retrieved from step #3 to /boot/efi/EFI/foo.
7) sudo efibootmgr -c -l \\EFI\\foo\\shimx64.efi -L "Secondary GRUB"
8) Reboot. A grub> prompt should appear, from shimx64.efi in the EFI/foo
   directory on the ESP.
9) Type "set root='(hd0,gpt1)'"
10) Type "chainloader /EFI/ubuntu/shimx64.efi"
11) Type "boot". The messages noted in the initial bug report should
    appear and the system should halt.

Note that some disk references may need to be adjusted on some systems -- (hd0,gpt1) is the ESP, and the efibootmgr command assumes the ESP is /dev/sda1 from within Ubuntu.

Interestingly, substituting grubx64.efi for shimx64.efi in step #10 results in a successful boot, which may be a simple workaround from within MAAS -- if MAAS's configuration is changed to bypass the second shimx64.efi, it may work better.
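
Step 1 of the procedure can be checked from the installed system before rebooting. A minimal sketch, assuming a Linux system with efivarfs mounted at the usual /sys/firmware/efi/efivars location; the function names are illustrative. (mokutil --sb-state reports the same thing from the command line.)

```python
from pathlib import Path

# SecureBoot lives under the standard EFI global-variable GUID; efivarfs names
# each file <VariableName>-<GUID>.
SECURE_BOOT_VAR = Path(
    "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def parse_secure_boot(raw):
    """efivarfs prefixes the data with a 4-byte attribute word; the value is one byte."""
    if len(raw) < 5:
        raise ValueError("unexpected SecureBoot variable length: %d" % len(raw))
    return raw[4] == 1


def secure_boot_enabled(var_path=SECURE_BOOT_VAR):
    """Return True or False, or None when the variable cannot be read (e.g. non-UEFI boot)."""
    try:
        return parse_secure_boot(var_path.read_bytes())
    except OSError:
        return None


if __name__ == "__main__":
    print("Secure Boot enabled:", secure_boot_enabled())
```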

Julian Andres Klode (juliank) wrote :

I don't see a netboot in there, am I missing something?

Julian Andres Klode (juliank) wrote :

The grubx64.efi from #3 is probably a grubnetx64.efi?

Rod Smith (rodsmith) wrote :

As I said, the EFI/foo/grubx64.efi is taken from MAAS. It's presumably netboot-enabled, but can't seem to find its config file, hence the need for the manual entry in steps 9-11. Note that I'm not a MAAS developer, so my understanding of its internals is limited.

Julian Andres Klode (juliank) wrote :

I'm just wondering where MAAS is getting the grubx64.efi from. I assume/hope it's the grubnetx64.efi binary built by the grub package.

Because the bug might be in there.

Anyway, this should be enough to investigate further, and it sounds somewhat familiar too.

Changed in shim-signed (Ubuntu):
status: Incomplete → Confirmed
Changed in grub (Ubuntu):
status: Incomplete → Confirmed
Changed in shim-signed (Ubuntu):
status: Confirmed → Triaged
Changed in grub (Ubuntu):
status: Confirmed → Triaged
Lee Trager (ltrager) wrote :

The MAAS environment I've been using to reproduce this is virtual. I have MAAS running in an LXD container connected to an LXD Pod. To recreate this environment you'll have to install MAAS 2.8, install python-pylxd from GitHub (if using the Debian packages), and apply this[1] patch to reenable secure boot. After MAAS is set up, you'll need to configure LXD to accept remote connections so that it can be added as a MAAS Pod.

This bug should be reproducible using LXD:

1. Download GRUB and the shim. MAAS gets both from Bionic; you can download them directly here[2]
2. Set up a TFTP server to provide them
3. Add grub.cfg from MAAS[3]
4. Set up DHCP - Example dhcpd.conf from MAAS[4]
5. Create LXD VM
6. Modify LXD VM to boot from over the network
7. See boot failure

[1] http://paste.ubuntu.com/p/gjXhVTDgRv/
[2] https://images.maas.io/ephemeral-v3/daily/bootloaders/uefi/amd64/
[3] https://git.launchpad.net/maas/tree/src/provisioningserver/templates/uefi/config.local.amd64.template
[4] http://paste.ubuntu.com/p/RMRxYkDrNG/

Lee Trager (ltrager) wrote :

All bootloader files are pulled from the archive and provided on images.maas.io by lp:maas-images. bootloaders.yaml describes what files are pulled from what packages.

https://git.launchpad.net/maas-images/tree/conf/bootloaders.yaml

Alberto Donato (ack) on 2020-06-04
Changed in maas:
milestone: 2.8.0rc1 → 2.8.0
Alberto Donato (ack) on 2020-06-11
Changed in maas:
milestone: 2.8.0rc3 → 2.8.0
Steve Langasek (vorlon) on 2020-06-11
affects: grub (Ubuntu) → grub2 (Ubuntu)
Brian Murray (brian-murray) wrote :

Can you elaborate on this step? "6. Modify LXD VM to boot from over the network"

tags: added: id-5ee24d297b5c2a5aa43fda04
Lee Trager (ltrager) wrote :

By default, an LXD VM boots from the disk first. However, you can change the boot order by adding "boot.priority" to your devices. The device with the highest number boots first.

LXD devices config for booting off the boot disk.
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
  root:
    path: /
    pool: default
    size: "8000000000"
    type: disk

LXD devices config for booting off the network first.
devices:
  eth0:
    boot.priority: "1"
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
  root:
    boot.priority: "0"
    path: /
    pool: default
    size: "8000000000"
    type: disk
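
To make the ordering rule concrete, here is a small sketch that sorts a devices mapping the way the rule above describes, highest boot.priority first. It operates on a plain dict mirroring the YAML above, not on LXD itself; treating a missing boot.priority as 0 is an assumption made for the sketch and does not model LXD's own default ordering.

```python
def boot_order(devices):
    """Device names sorted by boot.priority, highest first.

    Assumption for this sketch only: a device without boot.priority counts as 0.
    """
    return sorted(
        devices,
        key=lambda name: int(devices[name].get("boot.priority", 0)),
        reverse=True,
    )


# Mirrors the "booting off the network first" config above.
network_first = {
    "eth0": {
        "boot.priority": "1",
        "name": "eth0",
        "nictype": "bridged",
        "parent": "br0",
        "type": "nic",
    },
    "root": {
        "boot.priority": "0",
        "path": "/",
        "pool": "default",
        "size": "8000000000",
        "type": "disk",
    },
}
print(boot_order(network_first))  # ['eth0', 'root']
```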

Alberto Donato (ack) on 2020-06-23
Changed in maas:
milestone: 2.8.0 → 2.9.0b1
tags: added: maas-grub
tags: removed: rls-bb-incoming rls-ff-incoming