25.04 beta TPMFDE: first boot failure

Bug #2104316 reported by Dan Bungert
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Gadget snap for Personal Computers using Intel or AMD processors | New | Undecided | Unassigned |
edk2 (Ubuntu) | Fix Released | Undecided | Mate Kukri |
grub2 (Ubuntu) | New | Undecided | Unassigned |
systemd (Ubuntu) | New | Undecided | Unassigned |

Bug Description

25.04 beta hybrid TPMFDE: first boot failure

Using virt-manager, creating a VM, adjusting the firmware for UEFI (.ms), and adding a TPM (default settings), the resulting system appears to install but fails on first boot.

The screen shows TianoCore along with

BdsDxe: loading Boot0003...
BdsDxe: starting Boot0003...

If I repeat this test with Ubuntu 24.04.2, the system boots as expected, showing this before continuing to the desktop:

BdsDxe: loading Boot0003...
BdsDxe: starting Boot0003...
/EndEntire
/EndEntire

On 24.04.2, if I hit escape during the /EndEntire bit, I can see the Grub menu offering the "Run Ubuntu Core" option, which never seems to work on the 25.04 beta install.

Dan Bungert (dbungert) wrote :
Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/2104316

tags: added: iso-testing
Michael Hudson-Doyle (mwhudson) wrote :

In one of those very headbanging things, it works for me.

mwhudson@orcrist:~/isos/plucky$ sha256sum plucky-desktop-amd64.iso
3bdf7673f7af4589b7f0e934ccc113f9549e3f498208c4b83c7d0f53ac5d65f6 plucky-desktop-amd64.iso

Install started with

virt-install --disk size=45 --connect qemu:///system --os-variant ubuntu24.04 \
  --name ubuntu-25.04 --memory 8192 --cdrom plucky-desktop-amd64.iso \
  --features smm.state=on \
  --boot loader=/usr/share/OVMF/OVMF_CODE_4M.secboot.fd,loader.readonly=yes,loader.type=pflash,nvram.template=/usr/share/OVMF/OVMF_VARS_4M.ms.fd,loader_secure=yes \
  --tpm backend.type=emulator,backend.version=2.0,model=tpm-tis

So I don't know what is going on.

Alessandro Astone (aleasto) wrote :

Can reproduce the same failure in QEMU/KVM with https://cdimage.ubuntu.com/daily-live/20250326.6/plucky-desktop-amd64.iso
Here's my full QEMU/KVM config:

<domain type="kvm">
  <name>ubuntu25.04-TPM</name>
  <uuid>a61958b6-2418-4e85-b25a-d8c3f17f7ee4</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://ubuntu.com/ubuntu/25.04"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">8192000</memory>
  <currentMemory unit="KiB">8192000</currentMemory>
  <vcpu placement="static">8</vcpu>
  <os firmware="efi">
    <type arch="x86_64" machine="pc-q35-9.2">hvm</type>
    <firmware>
      <feature enabled="yes" name="enrolled-keys"/>
      <feature enabled="yes" name="secure-boot"/>
    </firmware>
    <loader readonly="yes" secure="yes" type="pflash" format="raw">/usr/share/OVMF/OVMF_CODE_4M.ms.fd</loader>
    <nvram template="/usr/share/OVMF/OVMF_VARS_4M.ms.fd" templateFormat="raw" format="raw">/home/aleasto/.config/libvirt/qemu/nvram/ubuntu25.04-TPM_VARS.fd</nvram>
    <boot dev="hd"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmport state="off"/>
    <smm state="on"/>
  </features>
  <cpu mode="host-passthrough" check="none" migratable="on"/>
  <clock offset="utc">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" discard="unmap"/>
      <source file="/home/aleasto/.local/share/libvirt/images/ubuntu25.04-TPM.qcow2"/>
      <target dev="vda" bus="virtio"/>
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <target dev="sda" bus="sata"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x12"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
    </controller>
    <controller type="pci" index="4" m...


Tim Andersson (andersson123) wrote :

This also affects this testcase:
https://iso.qa.ubuntu.com/qatracker/milestones/466/builds/327173/testcases/1771/results

 Install UEFI SecureBoot nVidia in Ubuntu Desktop amd64 in Plucky Daily

Tim Andersson (andersson123) wrote :
Skia (hyask)
Changed in ubuntu:
milestone: none → ubuntu-25.04
Alessandro Astone (aleasto) wrote :

The screenshot you posted shows that UEFI is already trying to boot from the ISO. Is it possible you simply have the system configured to always boot from cdrom?

I'm unfamiliar with virt-install; does it leave the CD connected after install?

I'm also puzzled at the attempt to test NVIDIA in a VM. Is that supposed to use PCI passthrough to reach some nvidia card?

Sorry if this is silly.

Tim Andersson (andersson123) wrote :

Removed my comments: there was an issue with the command-line args I used for virt-install, so my testing for that specific test case is void. This only affects TPM FDE :)

Dan Bungert (dbungert) wrote :

> I'm unfamiliar with virt-install; does it leave the CD connected after install?

I actually tested with both virt-manager and a raw qemu command line, and both show the same problem; virt-manager is just easier to explain. virt-manager automatically removes the install ISO after the install boot, so I'm quite certain it's not trying to boot the cdrom there. I have additionally confirmed this on the VM that still won't complete first boot, and verified that no cdrom is attached.

Alessandro Astone (aleasto) wrote :

Sorry Dan, my comments were addressing Tim's.

I can reproduce your very same failure.

Chris Peterson (cpete) wrote :

I am speculating this is a VM firmware issue. I ran the test on a Noble host system and was unable to recreate the issue. I then used the OVMF firmware from the latest version in Plucky and was able to recreate the failure on first boot behavior.

For reference:
- Working install: ovmf 2024.02-2ubuntu0.1
- Failing install: ovmf 2025.02-3ubuntu1

Chris Peterson (cpete) wrote :

There appears to be some error text on the serial terminal

Chris Peterson (cpete) wrote :

Dan and I bisected this to ovmf 2025.02-1 but we're not sure why yet.

To be explicit: We started with 2024.02-2ubuntu0.1 and worked our way up the publishing history. 2024.11-5 is the last release of edk2/ovmf with working firmware. 2025.02-1, 2025.02-2, 2025.02-3, and 2025.02-3ubuntu1 all fail.

https://launchpad.net/ubuntu/+source/edk2/2025.02-1
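
When many builds separate a known-good and a known-bad version, a binary search over the ordered publishing history keeps the number of install-and-first-boot cycles logarithmic. A minimal Python sketch, assuming the versions are listed oldest to newest, the oldest boots, the newest fails, and the regression persists once introduced. `boots_ok` is a hypothetical stand-in for "install a TPMFDE VM with this OVMF and try first boot"; here it simply encodes the results reported above, and the list holds only the versions named in this comment, not the full edk2 publishing history.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], boots_ok: Callable[[str], bool]) -> str:
    """Binary-search an oldest-to-newest list for the first failing version.

    Assumes versions[0] is known good, versions[-1] is known bad, and that
    once the regression appears every later version also fails.
    """
    good, bad = 0, len(versions) - 1  # invariant: good index boots, bad index fails
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots_ok(versions[mid]):
            good = mid
        else:
            bad = mid
    return versions[bad]


# Only the versions named in this comment, not the full edk2 publishing history.
tested = [
    "2024.02-2ubuntu0.1",
    "2024.11-5",
    "2025.02-1",
    "2025.02-2",
    "2025.02-3",
    "2025.02-3ubuntu1",
]


def boots_ok(version: str) -> bool:
    # Hypothetical stand-in for a real install-and-boot test; it just
    # encodes the results reported in this thread.
    return not version.startswith("2025.02")


print(first_bad(tested, boots_ok))  # prints 2025.02-1
```

With the full publishing history in the list, each probe halves the remaining range instead of stepping one upload at a time.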

dann frazier (dannf) wrote :

This is likely https://github.com/tianocore/edk2/issues/10883#issue-2938078412

Which is very likely due to a bug in grub (or shim?). I'm looking to disable the mem attribute protocol by default in the non-Secure Boot images in Debian, but I'd prefer to leave it on in the Secure Boot ones and let users override it with a -fw_cfg switch.

Mate Kukri (mkukri) wrote :

I wouldn't discount an edk2 regression here; I've seen another one recently too.

Will try to look into this today.

Mate Kukri (mkukri) wrote :

Maybe the presence of the mem attr protocol triggers edk2 to set NX on allocated buffers even for explicitly non-NX images...

We have an optional NX compat chain available in the shim package (that's off by default) and all this was tested a while ago, but at that point the only AMD64 firmware with mem attribute protocol was project mu.

Mate Kukri (mkukri) wrote :

Also keep in mind that the /EndEntire message was simply removed from GRUB a while back, so that not showing up means nothing by itself.

Mate Kukri (mkukri)
tags: added: foundations-todo
Changed in edk2 (Ubuntu):
assignee: nobody → Mate Kukri (mkukri)
Dan Bungert (dbungert) wrote :

> This is likely https://github.com/tianocore/edk2/issues/10883#issue-2938078412

Thanks Dann!

I did a ppa build with that commit reverted, then a fresh install, and indeed the 25.04 system boots. PPA with that revert can be found at https://launchpad.net/~dbungert/+archive/ubuntu/proposed-amd64/. I don't know enough about grub or edk2 to know if uploading that revert makes sense so I don't plan to do so, I only sought to confirm dannf's comment.

Chris Peterson (cpete) wrote :

I can confirm Dan's test build worked for me too.

Also digging through the link Dann provided I found this, which seems to match the errors I posted above: https://edk2.groups.io/g/devel/topic/110601533#msg120986

Since this should (has to?) work with Secure Boot, does this point to finding the bug in grub/shim?

Michael Hudson-Doyle (mwhudson) wrote : Re: [Bug 2104316] Re: 25.04 beta TPMFDE: first boot failure

What a mess. What is different about hybrid / core boot that triggers this but isn't impacting regular Ubuntu? I mean, the installer boots, and that's using the same shim and grub (or at least it's certainly supposed to!!)

dann frazier (dannf) wrote :

Might be worth running qemu under gdb to see. It could also be in the UEFI stub I suppose. I left some comments in edk2:debian/rules to show how to do a DEBUG build, which should let you see where different objects are getting loaded:
  https://salsa.debian.org/qemu-team/edk2/-/commit/419f172aa9d7d4332facb0f6f627f25c802fefb7

I've uploaded 2025.02-5 to sid w/ support for the -fw_cfg flag to uninstall the mem attribute protocol in Secure Boot mode, if that's OK for your use case:
  https://salsa.debian.org/qemu-team/edk2/-/commit/5256dc095ae1c5a3c6df219a2eef71391a365dc1

Dan Bungert (dbungert) wrote :

https://launchpad.net/~dbungert/+archive/ubuntu/lp-2104316 contains an edk2 build with patch OvmfPkg-X64-add-opt-org.tianocore-UninstallMemAttrPr.patch, as seen in DannF's commit on salsa at https://salsa.debian.org/qemu-team/edk2/-/commit/5256dc095ae1c5a3c6df219a2eef71391a365dc1

I can confirm that the TPMFDE install and first boot works fine with the resulting firmware from that edk2 build.

Mate Kukri (mkukri) wrote :

So, an update: I'm 99% sure this is a bad interaction between the systemd UKI stub and the memory attribute protocol...

Mate Kukri (mkukri) wrote :

Two possible causes:
- UKIs don't work with the mem attribute protocol + Secure Boot enabled. This is likely due to systemd's borked hooking of the SEC ARCH 2 protocol pointer, which I assume is in protected memory...
- GRUB hits a page fault when running chainloader on kernel.efi, or linux + boot on kernel.efi. The exact cause is unclear right now, but it's probably because systemd-stub incorrectly marks itself as NX_COMPAT despite not being so.

Both of these things need to be investigated.

I think we should get rid of the mem attribute protocol in ubuntu's edk2 until these are fixed.
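
The second cause hinges on whether the image GRUB loads claims NX_COMPAT in its PE header. A minimal Python sketch for reading that flag, assuming a well-formed PE/COFF file: it follows `e_lfanew` (offset 0x3C) to the PE signature, skips the 20-byte COFF header, and reads DllCharacteristics at offset 70 of the optional header, which is the same offset in PE32 and PE32+. The constant 0x0100 is IMAGE_DLLCHARACTERISTICS_NX_COMPAT from the PE/COFF specification; the function names are illustrative. It only reports whether an image claims NX compatibility, not whether that claim is accurate.

```python
import struct

# IMAGE_DLLCHARACTERISTICS_NX_COMPAT, as defined in the PE/COFF specification.
IMAGE_DLLCHARACTERISTICS_NX_COMPAT = 0x0100


def has_nx_compat(image: bytes) -> bool:
    """Return True if the PE image's DllCharacteristics field sets NX_COMPAT."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ header")
    # e_lfanew at offset 0x3C gives the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional_header = pe_offset + 4 + 20
    # DllCharacteristics sits at offset 70 in both PE32 and PE32+ optional headers.
    (dll_characteristics,) = struct.unpack_from("<H", image, optional_header + 70)
    return bool(dll_characteristics & IMAGE_DLLCHARACTERISTICS_NX_COMPAT)


def file_has_nx_compat(path: str) -> bool:
    """Read a PE file such as kernel.efi from disk and check its NX_COMPAT bit."""
    with open(path, "rb") as f:
        return has_nx_compat(f.read())
```

For example, `file_has_nx_compat("kernel.efi")` on an image extracted from the kernel snap would show whether the stub-built UKI advertises NX_COMPAT.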

Changed in grub2 (Ubuntu):
assignee: nobody → Mate Kukri (mkukri)
Changed in systemd (Ubuntu):
assignee: nobody → Mate Kukri (mkukri)
Mate Kukri (mkukri) wrote :

The GRUB issue is that peimage tries to write relocation addends to sections after it has already set them read-only...

But I suspect fixing that will make the GRUB problem the same as the firmware direct boot problem, so we nicely caught two bugs here.
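
A toy Python model of the ordering problem in peimage (not GRUB's actual code; every name here is hypothetical): relocation addends must be written while sections are still writable, and read-only permissions applied only afterwards. A write to a read-only section raises `ReadOnlyFault`, standing in for the page fault the firmware's memory protections would trigger.

```python
class ReadOnlyFault(Exception):
    """Stands in for the page fault raised when writing a read-only page."""


class Section:
    def __init__(self, name: str, data: bytes, final_readonly: bool):
        self.name = name
        self.data = bytearray(data)
        self.final_readonly = final_readonly  # permission the image asks for
        self.readonly = False                 # current permission in this model

    def add_u64(self, offset: int, delta: int) -> None:
        """Apply a 64-bit relocation: add delta to the value at offset."""
        if self.readonly:
            raise ReadOnlyFault(f"write to read-only section {self.name}")
        value = int.from_bytes(self.data[offset:offset + 8], "little")
        value = (value + delta) % 2**64
        self.data[offset:offset + 8] = value.to_bytes(8, "little")


def load(sections, relocations, delta: int, protect_first: bool) -> None:
    """Relocate and protect sections; protect_first=True mimics the buggy order."""
    def protect() -> None:
        for section in sections:
            section.readonly = section.final_readonly

    if protect_first:
        protect()  # bug: sections become read-only before relocation
    for section, offset in relocations:
        section.add_u64(offset, delta)
    if not protect_first:
        protect()  # fix: tighten permissions only after relocation


rdata = Section(".rdata", (0x1000).to_bytes(8, "little"), final_readonly=True)
load([rdata], [(rdata, 0)], delta=0x400000, protect_first=False)
print(hex(int.from_bytes(rdata.data, "little")))  # prints 0x401000
```

Calling `load` with `protect_first=True` on a fresh section raises `ReadOnlyFault` instead, which is the shape of the failure described above.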

Changed in edk2 (Ubuntu):
status: New → Invalid
Michael Hudson-Doyle (mwhudson) wrote :

> - UKIs don't work with mem attribute protocol + secure boot enabled, this is likely due to systemd's borked hooking of the SEC ARCH 2 protocol pointer, which i assume is in protected memory...

This makes sense because other systems boot after all. Is this "borked hooking" only in newer systemds? (because Noble TPM FDE installs apparently work with plucky ovmf)

> I think we should get rid of the mem attribute protocol in ubuntu's edk2 until these are fixed.

You marked the edk2 task as invalid, but this sounds like you think we should make a change to edk2? (even if it's not a bug in edk2, per se)

Mate Kukri (mkukri) wrote :

> This makes sense because other systems boot after all. Is this "borked hooking" only in newer systemds? (because Noble TPM FDE installs apparently work with plucky ovmf)

Hmm, I am not sure. I'll look into this in more detail as part of fixing this bug properly after the edk2 workaround is done.

> You marked the edk2 task as invalid, but this sounds like you think we should make a change to edk2? (even if it's not a bug in edk2, per se)

Yes, I will upload edk2 to work around it today. I marked it as Invalid because I don't consider this an edk2 bug per se.

Dan Bungert (dbungert) wrote :

I retested based on edk2_2025.02-3ubuntu2.dsc in unapproved, and indeed it installs and boots the TPMFDE vm fine. Thanks for the upload.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package edk2 - 2025.02-3ubuntu2

---------------
edk2 (2025.02-3ubuntu2) plucky; urgency=medium

  * Uninstall memory attribute protocol in all images, workaround for (LP: #2104316)

 -- Mate Kukri <email address hidden> Fri, 04 Apr 2025 18:44:53 +0100

Changed in edk2 (Ubuntu):
status: Invalid → Fix Released
Utkarsh Gupta (utkarsh)
Changed in edk2 (Ubuntu):
milestone: none → ubuntu-25.04
Changed in ubuntu:
milestone: ubuntu-25.04 → none
no longer affects: ubuntu
Utkarsh Gupta (utkarsh)
no longer affects: grub2 (Ubuntu)
no longer affects: systemd (Ubuntu)
Mate Kukri (mkukri) wrote :

This still needs a grub2 upload and a systemd stub change to be fixed properly, so I wouldn't remove those tasks; the edk2 change is only a workaround.
