Chainbooting from grub over the network to local shim breaks chain of trust

Bug #1865515 reported by Rod Smith
This bug affects 4 people
Affects                 Status        Importance  Assigned to   Milestone
MAAS                    Fix Released  High        Unassigned
OEM Priority Project    Confirmed     High        ethan.hsieh
shim                    New           Unknown
grub2 (Ubuntu)          Fix Released  Undecided   Unassigned
  Focal                 Fix Released  Undecided   Unassigned
  Groovy                Fix Released  Undecided   Unassigned
shim-signed (Ubuntu)    Triaged       Undecided   Unassigned
  Focal                 Triaged       Undecided   Unassigned
  Groovy                Triaged       Undecided   Unassigned

Bug Description

[Impact]

 * UEFI GRUB currently doesn't support exiting with an unsuccessful exit code. That means a booted GRUB cannot determine that it should not be booting, exit, remove the installed shim protocol, and ask the firmware to boot the next BootOrder entry. Without this support, a livecd grub.cfg cannot perform "boot from local harddrive", and a GRUB booted over the network cannot exit to continue the regular boot off the harddrive while preserving SecureBoot.

[Test Case]

 * On a regular Ubuntu install, with UEFI and SecureBoot on, upgrade to new grub2 from proposed.
 * Insert any Ubuntu installation CD as cdrom or usb-stick.
 * Add a new UEFI boot entry for the CD or the usb-stick using efibootmgr, or by using your firmware settings (sudo systemctl reboot --firmware-setup)
 * Make sure the regular Ubuntu install is the first in the BootOrder, followed by the cdrom/usb-stick.
 * Start regular boot, interrupt it with Esc, and enter the grub shell by pressing 'c'
 * Check that the new version of grub is running by doing
 * echo "${package_version}"
 * Next type `exit 1`
 * The current boot should reset and the boot off the installation media should proceed
 * The grub menu options will look different
 * Complete the boot, observe that one ended up in the livecd / installer environment and that secureboot is on by checking the output of `bootctl`.
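
A minimal Python sketch for the BootOrder step above, assuming `efibootmgr` is run without `-v` and prints the usual BootCurrent / BootOrder / BootNNNN lines. The sample output, the "Ubuntu installer USB" label, and the helper names are made up for illustration. It only reports which entry follows the currently booted one in BootOrder, which is the entry the firmware would be expected to try after `exit 1`.

```python
import re

# Hypothetical efibootmgr output, used only for illustration.
SAMPLE = """\
BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,0002
Boot0001* ubuntu
Boot0002* Ubuntu installer USB
"""

ENTRY = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


def parse_efibootmgr(output):
    """Parse efibootmgr text into current entry, boot order and labels."""
    result = {"current": None, "order": [], "entries": {}}
    for line in output.splitlines():
        line = line.rstrip()
        if line.startswith("BootCurrent:"):
            result["current"] = line.split(":", 1)[1].strip().upper()
        elif line.startswith("BootOrder:"):
            ids = line.split(":", 1)[1]
            result["order"] = [i.strip().upper() for i in ids.split(",") if i.strip()]
        else:
            match = ENTRY.match(line)
            if match:
                # Some versions append a tab and the device path; keep the label only.
                label = match.group(3).split("\t", 1)[0].strip()
                result["entries"][match.group(1).upper()] = {
                    "active": match.group(2) == "*",
                    "label": label,
                }
    return result


def next_boot_label(parsed):
    """Label of the entry after the current one in BootOrder, or None."""
    order, current = parsed["order"], parsed["current"]
    if current not in order:
        return None
    index = order.index(current) + 1
    if index >= len(order):
        return None
    entry = parsed["entries"].get(order[index])
    return entry["label"] if entry else None


if __name__ == "__main__":
    print(next_boot_label(parse_efibootmgr(SAMPLE)))
```

On a real system the text would come from running `efibootmgr` (as root) rather than from SAMPLE.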

[Where problems could occur]

 * The `exit` command of GRUB has changed to accept optional arguments, which are no-ops on all platforms except UEFI, as that is the only one that supports passing a return status. However, some might attempt to use this on non-UEFI platforms in vain. Previously the exit command accepted no arguments. One might start to rely on this functionality while using mismatched GRUBs; for example, this is not available in Debian or upstream, but is starting to be available in Ubuntu and has been available in Fedora/CentOS for a while now. No regular boot flows use the `exit` command to boot.

[Other Info]

 * Original bug report:

MAAS (2.4.2 and 2.6.2) cannot deploy to a server with Secure Boot active. This appears to be a regression of bug #1711203; the symptoms are identical. Namely:

1) The system can begin deployment fine.
2) After deployment is complete except for the final reboot, the
   system will reboot.
3) GRUB appears briefly on the screen.
4) The system console briefly displays the message:
   Bootloader has not verified loaded image
   System is compromised. halting.
5) The node powers off.
6) Eventually MAAS times out on the deployment and declares
   that it's failed.

I've verified this on three MAAS servers and one node each (jehan, a Quanta QuantaGrid D52B-1U in 18T; capella, a Supermicro SYS-6028U-TR4+ in 1SS, and brennan, an Intel NUC DC53427HYE on my home network).

Two of the MAAS servers are running MAAS 2.6.2-7841-ga10625be3-0ubuntu1~18.04.1; the third is on 2.4.2-7034-g2f5deb8b8-0ubuntu1.


Lee Trager (ltrager) wrote :

What operating system are you trying to deploy? Can you deploy Ubuntu 18.04?

Changed in maas:
status: New → Incomplete
Woodrow Shen (woodrow-shen) wrote :

Hi, I ran the same experiment as described in the bug. Here is my summary:

Test environment:
MAAS version: 2.7.0 (8232-g.6e1dba4ab-0ubuntu1~18.04.1), installed by following this page (https://maas.io/docs/install-from-packages)

Dell Latitude 3510 (09ED) laptop with secure boot enabled

1. Deploying 18.04 from MAAS => Got the same error.
2. Deploying 20.04 from MAAS => Got the same error.

Rod Smith (rodsmith) wrote :

I, too, used Ubuntu 18.04 and 20.04 in my tests.

Changed in maas:
status: Incomplete → Confirmed
Changed in maas:
milestone: none → 2.8.0b2
Alberto Donato (ack)
Changed in maas:
milestone: 2.8.0b2 → 2.8.0rc1
Rod Smith (rodsmith) wrote :

A deployment to brennan a few days ago (via MAAS 2.4.2) succeeded, so I've done some re-testing today. Brennan continues to deploy successfully with Secure Boot active; however, systems in both 18T (I re-tested jehan and also tested feebas, a Cisco UCS C220 M4) and 1SS (I tested kies, a Supermicro SYS-6018U-TR4+) failed. Both 18T and 1SS have MAAS 2.6.2 servers.

Rod Smith (rodsmith) wrote :

Oh, most of the tests in #4 used Ubuntu 20.04; however, the failed jehan deployment used 18.04.

Alberto Donato (ack)
Changed in maas:
milestone: 2.8.0b3 → 2.8.0rc1
Alberto Donato (ack)
Changed in maas:
milestone: 2.8.0b4 → 2.8.0rc1
Lee Trager (ltrager) wrote :

This isn't a bug with MAAS, it's a bug with shim/grub. MAAS gets its bootloaders from the public stream at images.maas.io, which is generated by lp:maas-images. lp:maas-images pulls the bootloaders out of the archive; it's currently set to pull them from bionic.

Secure boot works in the ephemeral environment; it fails when trying to local boot into the deployed environment. When an x86_64 UEFI machine local boots with MAAS, it boots over the network and downloads bootx64.efi (shim), which downloads grubx64.efi and this grub.cfg[1]. The grub from over the network finds /boot/efi/ubuntu/shimx64.efi on the local filesystem and chainboots to it. Somehow the chain of trust breaks here, causing the system to halt.

Booting local disk...
Failed to open \efi\boot\grubx64.efi - Not Found
Failed to load image \efi\boot\grubx64.efi: Not Found
start_image() returned Not Found
EFI stub: UEFI Secure Boot is enabled.
Bootloader has not verified loaded image.
System is compromised. halting.

I tried using the shim and grub from Focal but I still get the same problem.

[1] https://git.launchpad.net/maas/tree/src/provisioningserver/templates/uefi/config.local.amd64.template

Changed in grub (Ubuntu):
status: New → Confirmed
Changed in shim-signed (Ubuntu):
status: New → Confirmed
summary: - MAAS can't deploy to a server with Secure Boot active
+ Chainbooting from grub over the network to local shim breaks chain of
+ trust
Lee Trager (ltrager) wrote :

For LXD Pods[1] we had to disable secure boot to get around this issue. When this bug is fixed we should reenable secure boot for LXD Pods.

[1] https://git.launchpad.net/maas/tree/src/provisioningserver/drivers/pod/lxd.py#n515

tags: added: rls-bb-incoming
tags: added: rls-ff-incoming
Jeff Lane  (bladernr) wrote :

FWIW, a partner is also hitting this in the field when trying to do Secure Boot installs which break 100% of the time for them.

They have noted, however, that installing from ISO works and can successfully install and boot on a secure boot enabled server. They've only tested Focal ISOs at this time, but this tells me that there's some difference between what MAAS images are getting or have gotten, and what the ISO has or is doing during install.

tags: added: blocks-hwcert-server
Lee Trager (ltrager) wrote :

@Jeff MAAS uses the same bits as the ISO. What is different is how local booting happens with MAAS vs. with the ISO. When installed with the ISO, the local boot process is UEFI Firmware -> Shim (from disk) -> GRUB (from disk) -> Boot local kernel. When installed with MAAS, the local boot process is UEFI Firmware -> Shim (from network) -> GRUB (from network) -> Shim (from disk) -> GRUB (from disk) -> Boot local kernel. The chain of trust breaks when going from GRUB (from network) to Shim (from disk). I suspect, but haven't verified, that this may be due to the shim not being signed with a key GRUB has.

Steve Langasek (vorlon) wrote : Re: [Bug 1865515] Re: Chainbooting from grub over the network to local shim breaks chain of trust

On Tue, May 19, 2020 at 04:15:47PM -0000, Lee Trager wrote:
> I suspect but haven't verified that this may be due to the shim
> not being signed with a key GRUB has.

GRUB embeds no keys, it calls out to shim for verification of signatures.

It would be helpful if someone could verify whether the boot chain is
stopping at the second shim, or at the second grub.

Lee Trager (ltrager) wrote :

Based on the MAAS logs the halt happens after the remote shim, grub, and grub.cfg have been loaded. I didn't see anything in the console to show grub running but it may have been cleared before I could see it.

Console output:

Booting local disk...
Failed to open \efi\boot\grubx64.efi - Not Found
Failed to load image \efi\boot\grubx64.efi: Not Found
start_image() returned Not Found

Bootloader has not verified loaded image.
System is compromised. halting.

rackd.log

2020-05-19 20:54:04 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.0.0.117
2020-05-19 20:54:04 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.0.0.117
2020-05-19 20:54:05 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/command.lst requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/fs.lst requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/crypto.lst requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/x86_64-efi/terminal.lst requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-00:16:3e:49:52:7b requested by 10.0.0.117
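
A small Python sketch for reading the rackd.log excerpt above, assuming every TFTP line has exactly the "<timestamp> provisioningserver.rackdservices.tftp: [info] <path> requested by <client>" shape shown here. The function name is hypothetical. It lists the requested files in order and drops immediate repeats, which makes it easy to see that the last thing fetched over the network is the per-MAC grub.cfg. Anything loaded from the local disk afterwards never shows up in this log.

```python
import re

TFTP_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"provisioningserver\.rackdservices\.tftp: \[info\] "
    r"(?P<path>\S+) requested by (?P<client>\S+)$"
)

# Subset of the log lines quoted in this comment.
LOG = """\
2020-05-19 20:54:04 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.0.0.117
2020-05-19 20:54:04 provisioningserver.rackdservices.tftp: [info] bootx64.efi requested by 10.0.0.117
2020-05-19 20:54:05 provisioningserver.rackdservices.tftp: [info] grubx64.efi requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg requested by 10.0.0.117
2020-05-19 20:54:06 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-00:16:3e:49:52:7b requested by 10.0.0.117
"""


def tftp_requests(log_text, client=None):
    """Return requested paths in log order, skipping immediate repeats."""
    paths = []
    for line in log_text.splitlines():
        match = TFTP_LINE.match(line.strip())
        if match is None:
            continue  # not a TFTP request line
        if client is not None and match.group("client") != client:
            continue
        path = match.group("path")
        if paths and paths[-1] == path:
            continue
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in tftp_requests(LOG, client="10.0.0.117"):
        print(path)
```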

You can reproduce this pretty easily with MAAS 2.8 and LXD Pods.

1. Install MAAS 2.8
2. Add an LXD Pod
3. Compose a machine in the LXD Pod and let it commission
4. Reenable secure boot in the LXD virtual machine
   lxc config edit <vm name>
   Delete the line 'security.secureboot: "false"'
5. Attempt to deploy Ubuntu

Steve Langasek (vorlon) wrote :

On Tue, May 19, 2020 at 08:59:46PM -0000, Lee Trager wrote:
> Based on the MAAS logs the halt happens after the remote shim, grub, and
> grub.cfg have been loaded. I didn't see anything in the console to show
> grub running but it may have been cleared before I could see it.

> Console output:

> Booting local disk...
> Failed to open \efi\boot\grubx64.efi - Not Found
> Failed to load image \efi\boot\grubx64.efi: Not Found
> start_image() returned Not Found

> Bootloader has not verified loaded image.
> System is compromised. halting.

Doesn't this output show that it has successfully chained to the local shim,
since it's shim that is loading \efi\boot\grubx64.efi and those messages are
from shim?

What I don't currently understand is why this should behave any differently
with or without SecureBoot enabled; that will need digging into. But the
specific error "Not found" certainly implies there is a difference in the
path resolution when secureboot is on.

Lee Trager (ltrager) wrote :

MAAS doesn't know for sure what operating system is deployed locally. When booting locally MAAS sends a grub.cfg[1] which searches for the shim or local bootloader. MAAS first tries \efi\boot\bootx64.efi as that is the default location as per the UEFI spec. Most operating systems including Ubuntu put a bootloader there. The shim fails to find grub as Ubuntu only stores grub in \efi\ubuntu\grubx64.efi. The two failure messages are from that. The config then tries to load \efi\ubuntu\shimx64.efi which succeeds but is unable to verify either \efi\ubuntu\shimx64.efi or \efi\ubuntu\grubx64.efi.

[1] https://git.launchpad.net/maas/tree/src/provisioningserver/templates/uefi/config.local.amd64.template

Lee Trager (ltrager) wrote :

I tried modifying the MAAS local boot grub.cfg to directly chainboot \efi\ubuntu\shimx64.efi. This gets rid of the failed to open/failed to load errors. Local grub appears to load but halts saying the system is compromised when it tries to boot the local kernel.

Dimitri John Ledkov (xnox) wrote :

Can I have access to said MAAS environment and those machines?

Dimitri John Ledkov (xnox) wrote :

Please provide remote artifacts
Please provide local artifacts
Please provide reproducer steps
Please provide details how local artifacts were installed
Please provide list of certs trusted by the node's firmware
Please provide access to MAAS with a secureboot on & off target nodes

Changed in shim-signed (Ubuntu):
status: Confirmed → Incomplete
Changed in grub (Ubuntu):
status: Confirmed → Incomplete
Rod Smith (rodsmith) wrote :

Unfortunately, capella in 1SS is not currently accessible by our team. You can test on jehan in 18T, though; I'm sending you an e-mail with details.

I don't know what you mean by "remote artifacts" and "local artifacts." To reproduce the problem, simply enable Secure Boot and attempt to deploy the server; it will fail as described in the initial bug report.

Julian Andres Klode (juliank) wrote :

Well, quite simply we'd like a minimal test case without involving maas, so that we can test this in a sensible way. This is also important for SRUs, as we need to test them before releasing.

Julian Andres Klode (juliank) wrote :

Consider that we might need to upgrade grub on the MAAS server, and need to test this on bionic, focal, groovy on both maas server and deployed server sides.

e.g. we might need to test deploying a groovy server from a bionic MAAS, and vice versa, and other combinations of this.

Rod Smith (rodsmith) wrote :

I've managed to create a procedure that duplicates this problem without the involvement of MAAS, except for one file pulled from MAAS. The procedure is awkward, but it reproduces the problem. Here's the procedure:

1) Ensure that Secure Boot is enabled.
2) Install Ubuntu. (I used 20.04 LTS server.)
3) Retrieve grubx64.efi from a MAAS server
   (/var/lib/maas/boot-resources/current/grubx64.efi). I'm appending
   a copy of the file I used to this bug report.
4) sudo mkdir /boot/efi/EFI/foo
5) sudo cp /boot/efi/EFI/ubuntu/shimx64.efi /boot/efi/EFI/foo/
6) Copy the grubx64.efi retrieved from step #3 to /boot/efi/EFI/foo.
7) sudo efibootmgr -c -l \\EFI\\foo\\shimx64.efi -L "Secondary GRUB"
8) Reboot. A grub> prompt should appear, from shimx64.efi in the EFI/foo
   directory on the ESP.
9) Type "set root='(hd0,gpt1)'"
10) Type "chainloader /EFI/ubuntu/shimx64.efi"
11) Type "boot". The messages noted in the initial bug report should
    appear and the system should halt.

Note that some disk references may need to be adjusted on some systems -- (hd0,gpt1) is the ESP, and the efibootmgr command assumes the ESP is /dev/sda1 from within Ubuntu.

Interestingly, substituting grubx64.efi for shimx64.efi in step #10 results in a successful boot, which may be a simple workaround from within MAAS -- if MAAS's configuration is changed to bypass the second shimx64.efi, it may work better.

Julian Andres Klode (juliank) wrote :

I don't see a netboot in there, am I missing something?

Julian Andres Klode (juliank) wrote :

The grubx64.efi from #3 is probably a grubnetx64.efi?

Rod Smith (rodsmith) wrote :

As I said, the EFI/foo/grubx64.efi is taken from MAAS. It's presumably netboot-enabled, but can't seem to find its config file, hence the need for the manual entry in steps 9-11. Note that I'm not a MAAS developer, so my understanding of its internals is limited.

Julian Andres Klode (juliank) wrote :

I'm just wondering where Maas is getting the grubx64.efi from, I assume/hope it's the grubnetx64.efi binary built by the grub package.

Because the bug might be in there.

Anyway, this should be enough to investigate further and sounds somewhat familiar too

Changed in shim-signed (Ubuntu):
status: Incomplete → Confirmed
Changed in grub (Ubuntu):
status: Incomplete → Confirmed
Changed in shim-signed (Ubuntu):
status: Confirmed → Triaged
Changed in grub (Ubuntu):
status: Confirmed → Triaged
Lee Trager (ltrager) wrote :

The MAAS environment I've been using to reproduce this is virtual. I have MAAS running in an LXD container connected to an LXD Pod. To recreate this environment you'll have to install MAAS 2.8, python-pylxd from github (if using the Debian packages), and apply this[1] patch to reenable secure boot. After MAAS is set up you'll need to configure LXD to accept remote connections to be able to add it as a MAAS Pod.

This bug should be reproducible using LXD:

1. Download GRUB and the shim. MAAS gets both from Bionic; you can download them directly here[2]
2. Set up a TFTP server to provide them
3. Add grub.cfg from MAAS[3]
4. Set up DHCP - Example dhcpd.conf from MAAS[4]
5. Create LXD VM
6. Modify LXD VM to boot from over the network
7. See boot failure

[1] http://paste.ubuntu.com/p/gjXhVTDgRv/
[2] https://images.maas.io/ephemeral-v3/daily/bootloaders/uefi/amd64/
[3] https://git.launchpad.net/maas/tree/src/provisioningserver/templates/uefi/config.local.amd64.template
[4] http://paste.ubuntu.com/p/RMRxYkDrNG/

Lee Trager (ltrager) wrote :

All bootloader files are pulled from the archive and provided on images.maas.io by lp:maas-images. bootloaders.yaml describes what files are pulled from what packages.

https://git.launchpad.net/maas-images/tree/conf/bootloaders.yaml

Alberto Donato (ack)
Changed in maas:
milestone: 2.8.0rc1 → 2.8.0
Alberto Donato (ack)
Changed in maas:
milestone: 2.8.0rc3 → 2.8.0
Steve Langasek (vorlon)
affects: grub (Ubuntu) → grub2 (Ubuntu)
Brian Murray (brian-murray) wrote :

Can you elaborate on this step? "6. Modify LXD VM to boot from over the network"

tags: added: id-5ee24d297b5c2a5aa43fda04
Lee Trager (ltrager) wrote :

By default an LXD VM boots from the disk first. However you can change the boot order by adding "boot.priority" to your devices. The device with the highest number boots first.

LXD devices config for booting off the boot disk.
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
  root:
    path: /
    pool: default
    size: "8000000000"
    type: disk

LXD devices config for booting off the network first.
devices:
  eth0:
    boot.priority: "1"
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
  root:
    boot.priority: "0"
    path: /
    pool: default
    size: "8000000000"
    type: disk
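
To make the "highest number boots first" rule concrete, here is a minimal Python sketch. It models only the rule as stated in this comment; how LXD treats devices that have no boot.priority key is not modelled, so such devices are simply left out. The function name is hypothetical, and the dict mirrors the network-first devices config above.

```python
def boot_order(devices):
    """Device names sorted by descending boot.priority.

    Devices without a boot.priority key are omitted, because this sketch
    does not model whatever default ordering LXD applies to them.
    """
    prioritized = [
        (int(config["boot.priority"]), name)
        for name, config in devices.items()
        if "boot.priority" in config
    ]
    # Highest number first; sort() is stable, so ties keep their config order.
    prioritized.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in prioritized]


# Mirrors the "booting off the network first" devices config above.
NETWORK_FIRST = {
    "eth0": {
        "boot.priority": "1",
        "name": "eth0",
        "nictype": "bridged",
        "parent": "br0",
        "type": "nic",
    },
    "root": {
        "boot.priority": "0",
        "path": "/",
        "pool": "default",
        "size": "8000000000",
        "type": "disk",
    },
}

if __name__ == "__main__":
    print(boot_order(NETWORK_FIRST))
```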

Alberto Donato (ack)
Changed in maas:
milestone: 2.8.0 → 2.9.0b1
tags: added: maas-grub
tags: removed: rls-bb-incoming rls-ff-incoming
Lee Trager (ltrager)
Changed in maas:
milestone: 2.9.0b1 → 2.9.0b2
Julian Andres Klode (juliank) wrote :

I could reproduce this by modifying my shim netboot testing script in bug 1862171 to add a local hard disk to the VM, and then run chainloader (hd0,gpt1)/efi/ubuntu/shimx64.efi, and boot; which successfully loads grub, but kernel then reports it is compromised.

Dimitri John Ledkov (xnox) wrote :

So what is the order of boot?

(FW) -> grubnet -> shim (local) -> grub (local) ? I don't think that would work, given that grubnet doesn't know how to validate shim, without shim protocol installed.

I thought the chain must be (FW) -> shim -> grubnet -> shim (local) -> grub (local). But I'm not sure how to netboot remote shim.

Steve Langasek (vorlon) wrote :

On Thu, Sep 10, 2020 at 01:59:14PM -0000, Dimitri John Ledkov wrote:
> So what is the order of boot?

> (FW) -> grubnet -> shim (local) -> grub (local) ? I don't think that
> would work, given that grubnet doesn't know how to validate shim,
> without shim protocol installed.

> I thought the chain must be (FW) -> shim -> grubnet -> shim (local) ->
> grub (local). But I'm not sure how to netboot remote shim.

That *is* what is being done in MAAS. And Julian reports success booting to
the local shim+grub, so he must be doing the same.

Julian Andres Klode (juliank) wrote :

The problem seems to be caused by having two shims, as we get the "Bootloader has not verified loaded image" messages when loading this way, but if we instead chainload the local grub directly, things work.

So it seems like local shim does not correctly uninstall and replace remote shim protocol or something like that.

Lee Trager (ltrager) wrote :

MAAS tries to do

(FW) -> shim(net) -> grub(net) -> shim(local) -> grub(local)

When grub(net) runs, MAAS sends it this[1] config, which searches for the local bootloader as we don't know where it is. It prefers chainloading the shim but will fall back on grub if that isn't found.

The reason we chainload the local shim is because we need to support secure boot for multiple operating systems. My understanding of the shim is that it only stores the keys from the OS vendor that provides it, not multiple vendors. MAAS officially supports Ubuntu, CentOS, RHEL, Windows, and VMware. Users have gotten other operating systems to work as well and there has been talk of adding SUSE support.

Secure boot must work for every operating system MAAS supports, not just Ubuntu.

[1] https://git.launchpad.net/maas/tree/src/provisioningserver/templates/uefi/config.local.amd64.template

Steve Langasek (vorlon) wrote :

On Thu, Sep 10, 2020 at 05:23:14PM -0000, Lee Trager wrote:
> Secure boot must work for every operating system MAAS supports, not just
> Ubuntu.

Chainloading to shim instead of directly to grub is mandatory /even/ for
Ubuntu because it is not guaranteed over time that the shim in the MAAS
stream and the shim on disk from different versions of Ubuntu have the same
security policies.

Julian Andres Klode (juliank) wrote :

Further analysis today suggests the issue is that shim never uninstalls the old shim protocols, and then things get weird. Patching shim to call to the parent shim to uninstall itself, rather than falsely attempting to uninstall it ourselves, makes it work, but it's just a hack so far.

We can patch this properly I suppose by introducing a new shim protocol that can be used to uninstall shims, but this is obviously a problem, as you'll need updated shims on both the maas server and the client.

So, while I think we understand the issue better, I'm afraid this looks to be a long term issue that needs fixes in all other distros you want to load as well, and agreement with upstream on how to solve.

Changed in shim:
status: Unknown → New
Julian Andres Klode (juliank) wrote :

As a workaround, MAAS may for the time being, chainload EFI/ubuntu/grubx64.efi when the goal is to securely boot Ubuntu systems. Secure boot for other systems will require work on those other systems, once we have the patches available upstream, and is hopefully ready by 21.04, but other distros may not have picked it up by then.

Julian Andres Klode (juliank) wrote :

I'll try to implement one last hack, and maybe that works, and is acceptable enough that everyone picks it up this year. We'll see. It's a bit unfortunate the issue is in the second shim failing to unload the first shim, as otherwise, we'd only need a fixed maas shim :/

Lee Trager (ltrager)
Changed in maas:
milestone: 2.9.0b2 → 2.9.0b3
milestone: 2.9.0b3 → 2.9.0b4
Changed in maas:
status: Confirmed → Triaged
importance: Undecided → High
tags: added: fr-24
Lee Trager (ltrager)
Changed in maas:
milestone: 2.9.0b4 → 2.9.0b7
Changed in maas:
milestone: 2.9.0b7 → 2.9.x
Jeff Lane  (bladernr) wrote :

Has there been any movement on this? We get asked about Secure Boot in MAAS periodically by various hardware partners.

Julian Andres Klode (juliank) wrote :

There is no movement and there will be no movement for some more months at least.

Julian Andres Klode (juliank) wrote :

This is not a problem we can fix on our side. It is an upstream shim problem. Every distribution you want to boot will need a fixed shim, and it's not a priority right now, given that we have a moratorium on new shims being signed.

We have the ability to secure boot Ubuntu already, by not chainloading to shim, but directly to grub, which we agreed on doing a few months ago.

Rex Tsai (chihchun)
Changed in oem-priority:
assignee: nobody → ethan.hsieh (ethan.hsieh)
importance: Undecided → Critical
Rex Tsai (chihchun)
tags: added: oem-priority
Changed in grub2 (Ubuntu):
status: Triaged → Fix Released
Changed in grub2 (Ubuntu Focal):
status: New → Triaged
Changed in shim-signed (Ubuntu):
status: Triaged → Invalid
Changed in shim-signed (Ubuntu Focal):
status: New → Invalid
Changed in shim-signed (Ubuntu Groovy):
status: Triaged → Invalid
Dimitri John Ledkov (xnox) wrote :

I can use grub from hirsute, to boot into Ubuntu's grub, then execute `exit 1` to fallback to the next BootOrder bootentry and boot into centos8 with Secureboot on.

Meaning the chain of events is Ubuntu's Shim => Ubuntu's grub => exit 1 => Centos Shim => Centos Grub => complete boot, and bootctl still reports that secureboot is on & dmesg/kernel too.
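
As a complement to checking `bootctl`, here is a minimal Python sketch that reads the firmware's SecureBoot variable through efivarfs. It assumes efivarfs is mounted at /sys/firmware/efi/efivars and that the file holds 4 attribute bytes followed by a 1-byte value; the function names are hypothetical. It reports only the firmware variable, so it does not replace the bootctl and kernel log checks described above.

```python
from pathlib import Path

# SecureBoot variable under the EFI global variable GUID, as exposed by efivarfs.
SECURE_BOOT_VAR = Path(
    "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def secure_boot_from_bytes(raw):
    """Decode efivarfs content: 4 attribute bytes, then the 1-byte value."""
    if len(raw) < 5:
        raise ValueError("unexpected SecureBoot variable length")
    return raw[4] == 1


def secure_boot_enabled(path=SECURE_BOOT_VAR):
    """True/False from the variable, or None if it is absent (e.g. legacy boot)."""
    try:
        return secure_boot_from_bytes(path.read_bytes())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print(secure_boot_enabled())
```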

This will need the new grub and changes to how MAAS does the "boot from local drive" menu entry.

See https://launchpad.net/ubuntu/+source/grub2/2.04-1ubuntu37

The file that maas streams use from https://images.maas.io/ephemeral-v3/stable/bootloaders/uefi/amd64/20201123.0/grub2-signed.tar.xz is this one http://archive.ubuntu.com/ubuntu/dists/hirsute/main/uefi/grub2-amd64/2.04-1ubuntu37/grubnetx64.efi.signed

This is what needs to be deployed on the Maas provisioning side.

Then, in MAAS, the boot-from-local-drive menuentry should change, i.e. https://github.com/maas/maas/blob/master/src/provisioningserver/templates/uefi/config.local.amd64.template

should be "just"

---8<---
set default="0"
set timeout=0

menuentry 'Local' {
    echo 'Booting local disk...'
    exit 1
}
---8<---

And then assuming that provisioning / curtin sets up correct bootorder entries _or_ a removable media path is autodetected by the device firmware, things should "just work".

I note that maas streams use grubnetx64.efi.signed from bionic-updates, and this change is currently only in hirsute.

Changed in oem-priority:
status: New → Confirmed
Lee Trager (ltrager) wrote :

Using "exit 1" to chainboot breaks if there is no UEFI boot entry. MAAS currently has two known bugs where this is the case. There may be more; we need to test all operating systems MAAS supports.

LP:1906379 - Ubuntu is removing the UEFI boot entry during shutdown for CentOS.
LP:1910600 - MAAS does not create a UEFI boot entry for VMware ESXi 6.7

If we apply the patch in #41 we will be breaking existing deployments. There is no way for MAAS to fix this; the user will have to manually log in and configure the system.

config.local.amd64.template originally chainbooted to the operating system based on the operating system name. We had to change this because MAAS has no way to know what operating system a custom image is or how to handle when RHEL changed its UEFI path. For example is custom/myimage Ubuntu, CentOS, Windows or VMware?

I'm hesitant to use this fix as we will likely be breaking existing deployments.

Dimitri John Ledkov (xnox) wrote :

There is no other way to deploy with secureboot, without modifying the deployed OS shim.

Thus either we fix above two bugs as well, or those targets will not have secureboot and continue to use chainloading.

Requirements for secureboot are:
* create valid uefi boot entries
* do not delete them
* use exit 1 to boot local disk

Dimitri John Ledkov (xnox) wrote :

I've now verified the proposed `exit 1` menu entry, and i can successfully boot via maas in secureboot mode.

Lee Trager (ltrager) wrote :

We're in a bit of a bind here, even if we do fix LP:1906379 and LP:1910600 there is no way for MAAS to fix existing deployments. Meaning unilaterally using "exit 1" will break existing deployments which users will have to manually fix.

Is it at all possible to create a grub.cfg which can detect whether a local UEFI boot entry is next? If so, we could use "exit 1" if there is an entry and fall back to the existing grub.cfg if there isn't.

description: updated
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Rod, or anyone else affected,

Accepted grub2 into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2/2.04-1ubuntu35.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in grub2 (Ubuntu Groovy):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-groovy
Łukasz Zemczak (sil2100) wrote :

Hello Rod, or anyone else affected,

Accepted grub2 into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2/2.04-1ubuntu26.8 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in grub2 (Ubuntu Focal):
status: Triaged → Fix Committed
tags: added: verification-needed-focal
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (grub2/2.04-1ubuntu26.8)

All autopkgtests for the newly accepted grub2 (2.04-1ubuntu26.8) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

ubuntu-image/1.10+20.04ubuntu1 (s390x, ppc64el, amd64, armhf, arm64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#grub2

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (grub2/2.04-1ubuntu35.2)

All autopkgtests for the newly accepted grub2 (2.04-1ubuntu35.2) for groovy have finished running.
The following regressions have been reported in tests triggered by the package:

zsys/unknown (amd64)
grubzfs-testsuite/unknown (amd64)
ubiquity/unknown (amd64)
grml2usb/unknown (amd64)
ubuntu-image/unknown (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/groovy/update_excuses.html#grub2

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Dimitri John Ledkov (xnox) wrote :

1.155.2+2.04-1ubuntu35.2 and 1.142.10+2.04-1ubuntu26.8 were sideloaded onto MAAS to deploy Ubuntu Focal with secureboot on as part of https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1911439 verification.

tags: added: verification-done verification-done-focal verification-done-groovy
removed: verification-needed verification-needed-focal verification-needed-groovy
Rod Smith (rodsmith) wrote :

I may have installed it incorrectly, but what I've got is not working. It boots fine with Secure Boot disabled, but with Secure Boot enabled it shuts down with the same message about a compromised system as the stock GRUB. What I did to test:

1) Installed grub-efi-amd64-signed (version
   1.142.10+2.04-1ubuntu26.8)
2) Copied /usr/lib/grub/x86_64-efi-signed/grubnetx64.efi.signed to
   /var/lib/maas/boot-resources/snapshot-20210121-195440/bootloader/uefi/amd64/grubx64.efi
3) Booted a test node with Secure Boot disabled; it was
   fine.
4) Enabled Secure Boot and tried again; it failed.
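Steps 1 and 2 above can be sketched as a small shell script. This is only a sketch, not a MAAS-supported procedure: `run` and `sideload_grub` are hypothetical helper names, and the snapshot directory is passed as an argument because it differs per MAAS installation. By default the commands are only printed.

```shell
#!/bin/sh
# Hypothetical helper: print commands by default; run them only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper mirroring steps 1 and 2 above.
# $1: package version, $2: MAAS boot-resources snapshot directory.
sideload_grub() {
    version="$1"
    snapshot="$2"
    run apt-get install -y "grub-efi-amd64-signed=${version}"
    run cp /usr/lib/grub/x86_64-efi-signed/grubnetx64.efi.signed \
        "${snapshot}/bootloader/uefi/amd64/grubx64.efi"
}
```

Calling `sideload_grub 1.142.10+2.04-1ubuntu26.8 /var/lib/maas/boot-resources/snapshot-20210121-195440` prints the two commands; setting `DRY_RUN=0` and running as root would execute them.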

If I've got the wrong package, I can try again. Please advise.

Julian Andres Klode (juliank) wrote :

Rod, yes, the chain loading will still fail. The solution that works, as Dimitri pointed out, is to `exit 1`, which these SRUs backport to stable releases. Did you read the test case?

Shim chain loading can't be fixed in grub, and we can't touch the shims right now. Upstream also does not really consider chain loading to another shim a proper use case; these places should use `exit 1`, or set BootNext and reboot, instead.
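The BootNext alternative could look roughly like the following when run from a booted OS (for example an ephemeral environment) using `efibootmgr`. This is a sketch under assumptions: `find_boot_entry` and `boot_next_and_reboot` are hypothetical helper names, and the label passed in (such as `ubuntu`) is an assumption about how the local install is named in the firmware's boot entries.

```shell
#!/bin/sh
# Hypothetical helper: read `efibootmgr` output on stdin and print the
# four-hex-digit number of the first entry whose label equals $1.
find_boot_entry() {
    awk -v label="$1" '
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            num = substr($0, 5, 4)
            rest = substr($0, 11)      # text after "BootXXXX* "
            sub(/\t.*/, "", rest)      # drop the device path, if printed
            if (rest == label) { print num; exit }
        }'
}

# Hypothetical helper: set BootNext to the labelled entry, then reboot.
# Needs root and a UEFI-booted system; it is not called automatically.
boot_next_and_reboot() {
    num="$(efibootmgr | find_boot_entry "$1")"
    [ -n "$num" ] || { echo "no boot entry labelled '$1'" >&2; return 1; }
    efibootmgr --bootnext "$num" && systemctl reboot
}
```

Because BootNext applies to the next boot only, the firmware starts the local shim itself and the regular BootOrder is left unchanged.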

Rod Smith (rodsmith) wrote :

I'm the original bug reporter, and in that context (of MAAS deployments), this fix does nothing helpful -- it literally does not change the original observed problem in any way. Thus, I'm marking this verification-failed-focal. (I've not tested under other Ubuntu versions.)

tags: added: verification-failed-focal
Dimitri John Ledkov (xnox) wrote :

@rodsmith

That's not helpful, as it now blocks the grub2 release staged for the point release, even though this bug report was used to introduce `exit 1` support, which the update now provides for focal.

Chainloading will not work, and will never be possible to fix. Thus yes, the MAAS bug report remains open, and we will have to modify MAAS to figure out a secureboot deployment without the use of chainloading.

tags: removed: verification-failed-focal
Julian Andres Klode (juliank) wrote :

There is a plan for fixing shim upstream, so it is possible. But the way it's being planned, I'm not sure we'll see it this year.

Changed in shim-signed (Ubuntu):
status: Invalid → Triaged
Changed in shim-signed (Ubuntu Focal):
status: Invalid → Triaged
Changed in shim-signed (Ubuntu Groovy):
status: Invalid → Triaged
Łukasz Zemczak (sil2100) wrote :

Ok, so I think we should release the grub2 parts even though they do not fix the ultimate problem (since as mentioned, this is unfixable via grub). Let's leave the shim task open to indicate that this is still something that will be addressed.
Thanks!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.04-1ubuntu35.2

---------------
grub2 (2.04-1ubuntu35.2) groovy; urgency=medium

  * debian/patches/grub-install-backup-and-restore.patch: Fix-up the patch
    to correctly initialyze the names of the modules to restore. LP:
    #1907085
  * rhboot-f34-make-exit-take-a-return-code.patch,
    rhboot-f34-dont-use-int-for-efi-status.patch: allow grub to exit
    non-zero under EFI, this should allow falling back to the next
    BootOrder BootEntry. LP: #1865515
  * rhboot-f34-tcp-add-window-scaling-support.patch: speed up netboot
    transfer speed. LP: #1911439
  * rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet-2.patch:
    add support for link layer addresses of up to 32-bytes. LP: #1911439
  * rhboot-f34-make-pmtimer-tsc-calibration-fast.patch:
    speed up calibration time, especially when booting VMs. LP: #1911439
  * minilzo: built using the distribution's minilzo. LP: #1911440

 -- Dimitri John Ledkov <email address hidden> Thu, 14 Jan 2021 12:30:56 +0000

Changed in grub2 (Ubuntu Groovy):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for grub2 has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.04-1ubuntu26.8

---------------
grub2 (2.04-1ubuntu26.8) focal; urgency=medium

  * debian/patches/grub-install-backup-and-restore.patch: Fix-up the patch
    to correctly initialyze the names of the modules to restore. LP:
    #1907085
  * rhboot-f34-make-exit-take-a-return-code.patch,
    rhboot-f34-dont-use-int-for-efi-status.patch: allow grub to exit
    non-zero under EFI, this should allow falling back to the next
    BootOrder BootEntry. LP: #1865515
  * rhboot-f34-tcp-add-window-scaling-support.patch: speed up netboot
    transfer speed. LP: #1911439
  * rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet.patch,
    ubuntu-fixup-rhboot-f34-support-non-ethernet-2.patch:
    add support for link layer addresses of up to 32-bytes. LP: #1911439
  * rhboot-f34-make-pmtimer-tsc-calibration-fast.patch:
    speed up calibration time, especially when booting VMs. LP: #1911439
  * minilzo: built using the distribution's minilzo. LP: #1911440

 -- Dimitri John Ledkov <email address hidden> Wed, 13 Jan 2021 14:12:38 +0000

Changed in grub2 (Ubuntu Focal):
status: Fix Committed → Fix Released
David van der Spek (vanderspek-david) wrote :

I believe grub2 - 2.04-1ubuntu26.8 is causing an issue with deploying a server using MAAS 2.9.1. See https://bugs.launchpad.net/grub/+bug/1898550.

Changed in maas:
milestone: 2.9.2 → 2.9.x
Dimitri John Ledkov (xnox) wrote :

shim-signed is won't fix.

but separately, we are working on making maas use grub from focal to make "secureboot deployment with maas just work"

Changed in shim-signed (Ubuntu):
status: Triaged → Won't Fix
Changed in shim-signed (Ubuntu Groovy):
status: Triaged → Won't Fix
Changed in shim-signed (Ubuntu Focal):
status: Triaged → Won't Fix
Steve Langasek (vorlon) wrote :

> shim-signed is won't fix.

No, it absolutely is not.

The behavior of shim is WRONG and MUST be fixed; we've previously identified that this affects not only cross-distro chainloading but also chainloading from the removable disk shim to \EFI\ubuntu\shimx64.efi via fallback.efi on non-TPM-enabled hardware. And per Julian, there has been acknowledgement from upstream that the behavior here needs to be fixed.

Changed in shim-signed (Ubuntu):
status: Won't Fix → Triaged
Changed in shim-signed (Ubuntu Focal):
status: Won't Fix → Triaged
Changed in shim-signed (Ubuntu Groovy):
status: Won't Fix → Triaged
Lee Trager (ltrager) wrote :

I was told that the latest SHIM/GRUB from Hirsute may have fixed this issue. It looks like chain loading from network GRUB to local GRUB works but the VM turns off when it tries to start the kernel. Attached is GRUB output with debug="all"

My test setup is as follows:
grub over the network and local: 2.04-1ubuntu45
shim over the network and local: 1.46+15.4-0ubuntu1
OS: Ubuntu 21.04
Machine: LXD VM 4.0.5
qemu command: /snap/lxd/19647/bin/qemu-system-x86_64 -S -name lxd-vm -uuid f5fec2be-21c7-4ffb-847f-f88589c06686 -daemonize -cpu host -nographic -serial chardev:console -nodefaults -no-reboot -no-user-config -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny -readconfig /var/snap/lxd/common/lxd/logs/maas_lxd-vm/qemu.conf -pidfile /var/snap/lxd/common/lxd/logs/maas_lxd-vm/qemu.pid -D /var/snap/lxd/common/lxd/logs/maas_lxd-vm/qemu.log -chroot /var/snap/lxd/common/lxd/virtual-machines/maas_lxd-vm -smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas lxd

Lee Trager (ltrager) wrote :

I'm using the default LXD settings, attached is qemu.conf.

Dimitri John Ledkov (xnox) wrote :

ltrager - the secure-boot.log ends with

"EFI stub: UEFI Secure Boot is enabled."

Which is the first message from the loaded linux kernel. At this point grub & shim have all succeeded, no?

Dimitri John Ledkov (xnox) wrote :

Can you please try to have the installed machine booting without 'quiet' and with 'earlyprintk=efi' ?
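One way to derive such a kernel command line is sketched below. `debug_cmdline` is a hypothetical helper name; it takes the current value of `GRUB_CMDLINE_LINUX_DEFAULT` (assuming the installed system uses the usual /etc/default/grub setup), drops `quiet`, and appends `earlyprintk=efi`.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a kernel command line for boot debugging.
# $1: the existing command line, e.g. the value of GRUB_CMDLINE_LINUX_DEFAULT.
debug_cmdline() {
    printf '%s\n' "$1" | awk '{
        out = ""
        # Keep every option except "quiet" and any earlier earlyprintk=efi.
        for (i = 1; i <= NF; i++)
            if ($i != "quiet" && $i != "earlyprintk=efi")
                out = out $i " "
        print out "earlyprintk=efi"
    }'
}
```

The result would then be written back to `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub, followed by `sudo update-grub`, before rebooting the deployed machine.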

Dimitri John Ledkov (xnox) wrote :

I am concerned about the -no-reboot option, and what constitutes a "reboot".

This is because chainloader, exit 1, and booting linux can all look like reboots, depending on what qemu is monitoring/trapping: each of them starts a brand new EFI application via a LoadImage2 call.

Lee Trager (ltrager) wrote :

I just tried retesting with grub-efi-amd64-signed_1.169+2.04-1ubuntu45_amd64.deb and shim-signed_1.47+15.4-0ubuntu2_amd64.deb from Impish. I made sure those versions of GRUB and the shim were served by MAAS and that they were both used in the local deployed environment. During testing I added 'set debug="all"' to grub.cfg coming from MAAS as well as the local grub.cfg. I also made sure 'earlyprintk=efi' was passed to the kernel in the local grub.cfg.

I don't think the kernel is ever being loaded. The last message I get is "EFI stub: UEFI Secure Boot is enabled." The system then hangs and never proceeds.

Julian Andres Klode (juliank) wrote :

That _is_ the kernel, well, its EFI stub.

Lee Trager (ltrager) wrote :

Upon further testing I was able to confirm Dimitri's suspicion that the -no-reboot option in LXD is causing the SecureBoot failures. I have reported this to LXD[1] and will work to resolve that separately. I am able to get SecureBoot working when using libvirt with signed OVMF using grub-efi-amd64-signed_1.169+2.04-1ubuntu45_amd64.deb and shim-signed_1.47+15.4-0ubuntu2_amd64.deb from Impish. I was able to deploy both 20.04 and 21.04 with SecureBoot enabled as verified by mokutil --sb-state.

Our current policy in MAAS is to only add bootloaders from an LTS in main to the stream. Is there any ETA for when the shim and grub will be backported to Focal?

[1] https://github.com/lxc/lxd/issues/8770
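The `mokutil --sb-state` verification mentioned above can be scripted on a deployed machine along these lines. `secure_boot_enabled` is a hypothetical helper name; the sketch assumes the usual `SecureBoot enabled` / `SecureBoot disabled` first line of mokutil's output.

```shell
#!/bin/sh
# Hypothetical helper: read `mokutil --sb-state` output on stdin and
# succeed only if it reports Secure Boot as enabled.
secure_boot_enabled() {
    grep -q '^SecureBoot enabled'
}

# Example use on a deployed machine (not executed here):
#   mokutil --sb-state | secure_boot_enabled && echo "Secure Boot is on"
```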

Yuan-Chen Cheng (ycheng-twn) wrote :

lower priority in oem-priority since no activity

Changed in oem-priority:
importance: Critical → High
Jerzy Husakowski (jhusakowski) wrote :

Let's re-test with the updated shim & bootloaders from Jammy.

Changed in maas:
milestone: 2.9.x → 3.3.0
Changed in maas:
milestone: 3.3.0 → 3.4.0
Changed in maas:
milestone: 3.4.0 → 3.5.0
Jeff Hillman (jhillman) wrote :

Subscribed field-high. This is an ongoing issue for a customer that is about to do a large-scale deployment of Secure Boot for non-Ubuntu operating systems (RHEL / ESXi / Windows), and it affects all three of them in the same way.

Julian Andres Klode (juliank) wrote :

To the best of our knowledge this issue has been worked around in shim 15.4-0ubuntu1

shim (15.4-0ubuntu1) hirsute; urgency=medium

  [ Dimitri John Ledkov ]
  * deiban/rules: start using DISABLE_EBS_PROTECTION=1 to allow
    chainloading shim to shim, and shim to kernel.efi.

so it should no longer be affecting anyone.

But then recent comments shifted to an entirely different topic, where the kernel was loaded successfully and then failed to boot. I don't think it is meaningful to continue the discussion in this bug, because we're now talking about vastly different things.

If you/the customers still see issues, I'd advise filing a new bug.

The advice about using exit instead of chainloading is still valid, and increasingly so: MAAS's approach of chainloading is inappropriate because it breaks all the measurements of TPM, TDX, etc., and so becomes increasingly less useful.
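As a rough illustration of the exit-based approach, a network boot server could hand out a grub.cfg entry like the one produced below instead of a `chainloader` entry. `emit_local_boot_entry` and the menu title are hypothetical; `exit 1` only returns a status to the firmware with a grub that carries the exit-code patches (2.04-1ubuntu26.8 on focal, 2.04-1ubuntu35.2 on groovy, or later).

```shell
#!/bin/sh
# Hypothetical helper: print a grub.cfg menu entry that hands control back
# to the firmware instead of chainloading the local shim.
emit_local_boot_entry() {
    cat <<'EOF'
menuentry 'Boot from local disk' {
    # A non-zero exit lets the firmware move on to the next BootOrder entry.
    exit 1
}
EOF
}
```

Since the firmware itself then loads the next boot entry, no second shim is started from within grub.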

Changed in maas:
status: Triaged → Fix Committed
Changed in maas:
milestone: 3.5.0 → 3.5.0-beta1
status: Fix Committed → Fix Released