yakkety: backport (or rebase to) fix eliminating a double-close in shim

Bug #1624096 reported by Jason Gerard DeRose on 2016-09-15
This bug affects 8 people
Affects          Importance   Assigned to
shim (Ubuntu)    High         Mathieu Trudel-Lapierre
Xenial           Undecided    Unassigned

Bug Description

Sometime after August 25th (or so), something changed in the Yakkety ISOs that makes them no longer boot under QEMU in UEFI mode. However, the ISOs still work fine on the physical UEFI hardware I've tested (3 different systems). I'm not sure about other VM solutions like VirtualBox, as I haven't tested under anything other than QEMU. But under QEMU, UEFI mode installs are definitely broken.

You get stuck in the OVMF firmware with the following text on the screen (see attached screenshot):

Boot Failed. EFI Floppy
Boot Failed. EFI Floppy 1

Thus far I've only tested with a Xenial host, so I'm not sure whether this problem exists with a Yakkety host + Yakkety guest.

This problem also doesn't seem to be the result of any changes in QEMU (and related) in Xenial. With a Xenial host, you can still do UEFI mode installs fine under QEMU when the guest is using the 16.04.1 ISOs, and likewise when the guest is using the latest Xenial daily (16.04.2 WIP) ISOs. So the problem seems to be only when using a Yakkety guest in UEFI mode.

Note this problem affects both Yakkety desktop and server ISOs (when installing under QEMU in UEFI mode).

Finally, on the off chance it might be helpful to anyone who comes across this bug report, I wrote a blog post a while back on how to use QEMU in UEFI mode on a Xenial (or newer) host:

http://blog.system76.com/post/139138591598/howto-qemu-w-ubuntu-xenial-host-uefi-guest

Thanks!

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in debian-installer (Ubuntu):
status: New → Confirmed
Laszlo Ersek (Red Hat) (lersek) wrote :

Add all three of the following options to your QEMU command line:

    -debugcon file:debug.log \
    -global isa-debugcon.iobase=0x402 \
    -serial stdio

In the OVMF debug log, you will see that your boot loader is launched:

    [Bds]Booting UEFI QEMU DVD-ROM QM00003
    FatDiskIo: Cache Page OutBound occurred!
    FSOpen: Open '\EFI\BOOT\BOOTX64.EFI' Success
    [Bds] DevicePath expand: PciRoot(0x0)/Pci(0x1,0x1)/Ata(Secondary,Master,0x0) -> PciRoot(0x0)/Pci(0x1,0x1)/Ata(Secondary,Master,0x0)/CDROM(0x1,0xC83AD,0x11C0)/\EFI\BOOT\BOOTX64.EFI
    InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 6C4D040
    Loading driver at 0x00006486000 EntryPoint=0x000064A3000
    InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 700E318

And on the serial console, you will get the register dump for the crash:

    !!!! X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID - 00000000 !!!!
    RIP - AFAFAFAFAFAFAFAF, CS - 0000000000000038, RFLAGS - 0000000000000206
    ExceptionData - 0000000000000000
    RAX - AFAFAFAFAFAFAFAF, RCX - 00000000070176A0, RDX - 00000000070176A0
    RBX - 0000000006C4D018, RSP - 0000000007AFBA28, RBP - 0000000007AFBAE0
    RSI - 0000000006534D9A, RDI - 0000000006485FBA
    R8 - 0000000000000000, R9 - 0000000000000000, R10 - 0000000000000020
    R11 - 00000000067E7180, R12 - 0000000000000000, R13 - 0000000006F883E8
    R14 - 0000000006F883F0, R15 - 0000000007B1E9D0
    DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
    GS - 0000000000000030, SS - 0000000000000030
    CR0 - 0000000080000033, CR2 - 0000000000000000, CR3 - 0000000007A9A000
    CR4 - 0000000000000668, CR8 - 0000000000000000
    DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
    DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
    GDTR - 0000000007A88698 0000000000000047, LDTR - 0000000000000000
    IDTR - 0000000007442018 0000000000000FFF, TR - 0000000000000000
    FXSAVE_STATE - 0000000007AFB680

The pattern AFAFAFAFAFAFAFAF is used to fill memory that's being freed, for debugging purposes. So, your BOOTX64.EFI application dances fandango on core.
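
To spot this signature in a captured serial log, one option is to scan the register dump for the fill pattern. The Python sketch below does that for two lines copied from the dump above. The function name `poisoned_registers` is made up for illustration, and the regular expression assumes only the `NAME - 16-hex-digit value` layout seen in this log.

```python
import re

# Fill pattern that, per the comment above, marks freed memory in this firmware build.
POISON = "AFAFAFAFAFAFAFAF"

# Matches "NAME - VALUE" pairs where VALUE is exactly 16 uppercase hex digits.
REGISTER = re.compile(r"(\w+) - ([0-9A-F]{16})\b")


def poisoned_registers(dump):
    """Return the names of registers whose value equals the fill pattern."""
    return [name for name, value in REGISTER.findall(dump) if value == POISON]


# Two lines copied from the register dump in this report.
SAMPLE = """\
RIP - AFAFAFAFAFAFAFAF, CS - 0000000000000038, RFLAGS - 0000000000000206
RAX - AFAFAFAFAFAFAFAF, RCX - 00000000070176A0, RDX - 00000000070176A0
"""

print(poisoned_registers(SAMPLE))  # prints ['RIP', 'RAX']
```

A poisoned RIP suggests that execution jumped through a pointer read from memory that had already been freed.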

Jason Gerard DeRose (jderose) wrote :

@Laszlo - thanks for the debugging tips! I'm getting the same result that you are.

I found a possibly related opensuse bug:
https://lists.opensuse.org/opensuse-bugs/2016-01/msg00965.html

Laszlo Ersek (Red Hat) (lersek) wrote :

I found the error in shim. It is a double-close (on error handling) of the root directory of the filesystem. I'll submit a patch soon.

Jason Gerard DeRose (jderose) wrote :

@Laszlo - awesome, thanks! Do you happen to know what version introduced the bug?

Laszlo Ersek (Red Hat) (lersek) wrote :

Actually, I don't need to write any new patches, upstream shim has the problem fixed already:

    commit 7052e75307553edc8f04eb529b0d37844fbcc30b
    Author: Benjamin Antin <email address hidden>
    Date: Mon Jul 18 12:28:12 2016 -0700

        Don't close file twice in should_use_fallback error path

        When fallback.efi is not present, the should_use_fallback error path
        attempts to close a file that has already been closed, resulting in a
        hang. This issue only affects certain systems.

        This is a regression from version 0.8 and was introduced by commit
        4794822.

        Signed-off-by: Benjamin Antin <email address hidden>

You guys just need to rebase Yakkety's shim package on top of an upstream git commit that comes after 7052e7530755. (Alternatively, you can also backport 7052e7530755, but I guess Yakkety's release schedule might allow another rebase at this point.) You are currently based on 14a5905, which does not include the fix.
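
The failure mode described in the commit message can be modelled with a toy example. This is only a sketch of the pattern (an error branch that closes a handle, followed by shared cleanup that closes it again), not shim's actual code. `Handle`, `buggy_probe`, `fixed_probe` and `DoubleClose` are hypothetical names, and the model raises an exception where the reports above show a crash.

```python
class DoubleClose(Exception):
    """Raised by the toy model when close() is called on a released handle."""


class Handle:
    """Toy stand-in for a firmware file handle; not the real EFI_FILE_PROTOCOL."""

    def __init__(self, name):
        self.name = name
        self.released = False

    def close(self):
        if self.released:
            # The model detects the misuse; the reports above show a crash instead.
            raise DoubleClose(self.name)
        self.released = True


def buggy_probe(root, fallback_present):
    """The error branch closes root, then the shared cleanup closes it again."""
    try:
        if not fallback_present:
            root.close()  # first close, on the error branch
            return False
        return True
    finally:
        root.close()  # shared cleanup: a second close when fallback is absent


def fixed_probe(root, fallback_present):
    """Every path closes root exactly once, in the shared cleanup."""
    try:
        return fallback_present
    finally:
        root.close()


print(fixed_probe(Handle("root"), False))  # prints False
print(buggy_probe(Handle("root"), True))   # prints True
try:
    buggy_probe(Handle("root"), False)
except DoubleClose as exc:
    print("double close of", exc)          # prints double close of root
```

In this model the buggy path only misbehaves when the fallback file is absent, which matches the condition named in the commit message.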

affects: debian-installer (Ubuntu) → shim (Ubuntu)
summary: - yakkety: desktop and server ISOs wont boot under QEMU in UEFI mode
+ yakkety: backport (or rebase to) fix eliminating a double-close in shim
Laszlo Ersek (Red Hat) (lersek) wrote :

@Jason -- according to the upstream fix (7052e7530755) that Yakkety is currently missing, the upstream regression comes from upstream commit 4794822.

That commit (i.e., the regression) is between the 0.9 release and 14a5905. According to <http://changelogs.ubuntu.com/changelogs/pool/main/s/shim/shim_0.9+1465500757.14a5905-0ubuntu1/changelog>, the previous ubuntu shim version was "0.8-0ubuntu2", while the most recent one is "0.9+1465500757.14a5905-0ubuntu1" (including the regression but not its fix). So, in Ubuntu, it was the latest shim rebase (dated "Tue, 26 Jul 2016 16:48:32 -0400") that introduced the bug.

Triaging, this is my problem.

In my defense, I don't think the regression was known at the point I took that snapshot :)

Changed in shim (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → High
assignee: nobody → Mathieu Trudel-Lapierre (cyphermox)
Changed in debian-cd (Ubuntu):
status: New → Triaged
importance: Undecided → High
assignee: nobody → Mathieu Trudel-Lapierre (cyphermox)
Changed in grub2 (Ubuntu):
assignee: nobody → Mathieu Trudel-Lapierre (cyphermox)
importance: Undecided → Medium

Given that it will take a bit of time to get a new shim signed, we'll also need to ship fallback.efi on the CD (and it makes sense to do this anyway), and on disk in general. I've added the tasks for debian-cd and grub2 to do so.

Jason Gerard DeRose (jderose) wrote :

@Mathieu - I've been doing some quick experiments with fallback.efi on the ISO, and I'm not sure that alone will fix things.

The problem is that when fallback.efi is present, the installer isn't launching. Instead, OVMF tries to PXE boot (which in my test environment fails because I don't have the needed DHCP/TFTP setup), then OVMF falls back to Shell>

I tried adding /EFI/BOOT/fallback.efi to both the latest Yakkety daily ISO and the 16.04.1 ISO. In both cases, the installer doesn't boot, I end up at Shell>

So although fallback.efi can work around the X64 Exception in the Yakkety version of shim, it still doesn't give you a bootable installer. If there's something obvious I'm missing, please let me know!

Also, do you have any idea why this faulty shim code path is taken when running under QEMU + OVMF, but does not seem to be taken when running on physical hardware?


@Jason, there are two separate topics in your question.

First, controlling the boot order from the QEMU command line (i.e., filtering and/or reordering the persistent UEFI boot options that (a) exist from earlier in the varstore, plus (b) OVMF's platform BDS regenerates at every boot).

For this, you have to use the

    -device XXXX,bootindex=N

property, which in turn necessitates the modern, separate notation for backend/frontend.

For example, for network devices you have to spell out

    -netdev XXXX,id=netdev0,... \
    -device virtio-net-pci,netdev=netdev0,bootindex=2

For disks, for example with the virtio-blk-pci frontend, it requires

    -drive if=none,id=drive0,file=ZZZ,... \
    -device virtio-blk-pci,drive=drive0,bootindex=1

The various shorthands like "-net nic", "-hda", "-drive if=virtio" don't allow you to specify the bootindex=N property, and therefore are unsuitable for OVMF. (At least if you want to control the boot order from the QEMU command line.)

So, in this specific case, assuming you have one QCOW2 system disk (created with qemu-img) that you want to install Ubuntu to, plus the installer ISO you want to install from, I would recommend:

    -drive if=pflash,readonly,format=raw,file=PATH_TO_OVMF_CODE_FD \
    -drive if=pflash,format=raw,file=PATH_TO_PRIVATE_VARSTORE \
    \
    -debugcon file:ovmf.debug.log \
    -global isa-debugcon.iobase=0x402 \
    \
    -chardev stdio,signal=off,mux=on,id=char0 \
    -mon chardev=char0,mode=readline,default \
    -serial chardev:char0 \
    \
    -device virtio-scsi-pci,id=scsi0 \
    \
    -drive id=sysdisk,if=none,format=qcow2,discard=on,cache=writeback,file=... \
    -device scsi-hd,drive=sysdisk,bus=scsi0.0,bootindex=1 \
    \
    -drive id=installer,if=none,format=raw,file=... \
    -device scsi-cd,drive=installer,bus=scsi0.0,bootindex=2 \

This will (a) capture the OVMF log; (b) give you access to both the QEMU monitor and the guest's serial console -- switch between them with [C-a c]; (c) create a virtio-scsi disk and CD-ROM for the guest, with the (target) system disk and the installer ISO, respectively; (d) assign bootindex=1 to the system disk, and bootindex=2 to the installer ISO.

The upshot is that when you first boot the VM, the installer ISO will be launched (because the system disk is still empty), but after installation, the VM will boot off of the system disk.
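
For scripted test runs, the option list above can be assembled programmatically. The sketch below is one way to do that in Python. The options are copied from the comment as written and have not been checked against any particular QEMU release. The function name `qemu_args`, the example file names, and the `qemu-system-x86_64` binary name are assumptions that do not come from this report.

```python
import shlex


def qemu_args(ovmf_code, varstore, system_disk, installer_iso):
    """Return the option list from the comment above with the paths filled in."""
    return [
        # Firmware code (read-only) and a private variable store.
        "-drive", f"if=pflash,readonly,format=raw,file={ovmf_code}",
        "-drive", f"if=pflash,format=raw,file={varstore}",
        # Capture the OVMF debug log.
        "-debugcon", "file:ovmf.debug.log",
        "-global", "isa-debugcon.iobase=0x402",
        # Multiplex the QEMU monitor and the guest serial console on stdio.
        "-chardev", "stdio,signal=off,mux=on,id=char0",
        "-mon", "chardev=char0,mode=readline,default",
        "-serial", "chardev:char0",
        # virtio-scsi controller carrying the system disk and the installer.
        "-device", "virtio-scsi-pci,id=scsi0",
        "-drive", "id=sysdisk,if=none,format=qcow2,discard=on,"
                  f"cache=writeback,file={system_disk}",
        "-device", "scsi-hd,drive=sysdisk,bus=scsi0.0,bootindex=1",
        "-drive", f"id=installer,if=none,format=raw,file={installer_iso}",
        "-device", "scsi-cd,drive=installer,bus=scsi0.0,bootindex=2",
    ]


# Hypothetical file names, for illustration only.
args = qemu_args("OVMF_CODE.fd", "my_vars.fd", "system.qcow2", "installer.iso")
print(shlex.join(["qemu-system-x86_64", *args]))
```

Returning a list rather than a single string means the same arguments can be passed to `subprocess.run` without shell quoting concerns.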

If there is a (QEMU default, or manually configured) virtual NIC in the VM as well, then PXE boot will *not* be attempted. The reason is that you assign a bootindex to at least one device, but no bootindex is assigned to the NIC. This will cause OVMF to filter out any UEFI boot options (created manually or automatically) that would refer to the NIC.
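
The filtering rule just described can be written down as a small model. This is a simplified sketch of the behaviour as stated in this comment, not OVMF's implementation. The function name `surviving_boot_options` and the device names are hypothetical, and the branch for the case where no device has a bootindex is an assumption (the model simply leaves the list untouched).

```python
def surviving_boot_options(devices):
    """devices maps a device name to its bootindex, or None if unset.

    Returns the device names whose boot options survive, in boot order.
    """
    indexed = {name: idx for name, idx in devices.items() if idx is not None}
    if not indexed:
        # Assumption: with no bootindex anywhere, nothing is filtered.
        return list(devices)
    # At least one bootindex is set: devices without one are dropped,
    # and the rest are ordered by ascending bootindex.
    return sorted(indexed, key=indexed.get)


print(surviving_boot_options({"nic": None, "installer": 2, "sysdisk": 1}))
# prints ['sysdisk', 'installer']
```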

If the yakkety installer still doesn't boot with the above command line snippet (*and* with the shim bug fixed or worked around), then I'd say the installer ISO is malformed in some other way.

The second topic is why the shim bug doesn't hit hard on some physical systems. For this, consider how EFI_FILE_PROTOCOL.Close() works -- it releases the entire container structure that contains EFI_FILE_PROTOCOL. When you call FileProtocol->Close() next, using ...


Also, I should mention in passing -- again -- that Launchpad is completely retarded for truncating comments in the full bug view. It doesn't offer any option to see both the full bug and full comments. How stupid is that?! Are people who take the time to explain things in detail really considered "verbose"? Do their comments really deserve to be abbreviated in the full bug view? "Your comment is too long, so users who care about it should click another link, and *replace* the full bug view with a sole comment".

Screw you Launchpad.

Jason Gerard DeRose (jderose) wrote :

@Laszlo - thank you very much for the detailed explanation!

Sounds like your tips, plus /EFI/BOOT/fallback.efi being present on the ISO, should be enough for me to work-around this issue. I'll let you know how it goes.

Thanks again!

Jason Gerard DeRose (jderose) wrote :

@Laszlo - darn, no luck. Following your recommendations, PXE booting is no longer attempted, but I still end up at Shell> and the installer doesn't launch.

I'm attaching the isolated test script I'm using ATM. If I try it with the latest Yakkety desktop daily ISO, I hit the above hardware exception (as expected because of the issue in `shim`). However, if I try it with the same ISO modified to include /EFI/BOOT/fallback.efi, it goes directly to Shell> rather than launching the installer.

Please let me know if you spot any goofs in my script or can think of anything else to try.

(Note: my test script doesn't have any -net devices at all, but my image mastering tools do, so that's why I know that PXE booting wasn't being tried any more.)

@Mathieu - under the assumption that there is more to this than just the issue in `shim`, or at least that the presence of fallback.efi can't fully work-around it, do you have any suggestions as to where I should go looking for other things that have changed between the Yakkety and Xenial ISOs, things that might be interacting with the `shim` bug in odd ways?

Also, I should make it clear why this bug is critical to System76: our image mastering tools use QEMU + OVMF to create our UEFI images, so this is something we absolutely need to find some solution for in order to ship 16.10. Because (most likely) we'll need 16.10 to initially ship Kaby Lake :D

Jason Gerard DeRose (jderose) wrote :

Oops, forgot to attach my test script :P

@Jason -- your script looks alright to me. Can you attach the OVMF debug log captured with the script? (Although, if the debug mask configured at build time in the DSC files doesn't enable the DEBUG_VERBOSE bit, I won't see everything in the log that I would like to see.)

More importantly, can you upload your test ISO image (with the shim bug fixed, or worked around) somewhere? If you don't want to expose the URL publicly, feel free to send it to me in a private email, or in a private Launchpad message. (I vaguely recall that such a thing exists.)

As Laszlo mentioned, this can affect other systems than QEMU. I definitely can't boot the ISO on my thinkpad when shim debugging is enabled.

Then, as discussed, fallback.efi shouldn't be on the ISO. It's clearly not going to work due to the way shim is designed. Given that, we don't need a debian-cd task to install fallback...

I've been working on preparing the shim update since yesterday. If necessary, I can get you a working shim for testing in a custom remastered CD image.

Changed in grub2 (Ubuntu):
status: New → Triaged
Changed in debian-cd (Ubuntu):
status: Triaged → Invalid
Changed in grub2 (Ubuntu):
importance: Medium → High
Changed in shim (Ubuntu):
status: Triaged → In Progress
Changed in grub2 (Ubuntu):
status: Triaged → In Progress
Jason Gerard DeRose (jderose) wrote :

@Mathieu - yeah, if you can get me a custom ISO with an updated shim package (doesn't need to be signed, I'm not using secure boot), then I'll give it thorough testing under QEMU and on all the UEFI hardware I have access to (6 different laptops, 3 different desktops).

Thanks!

Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/1624096

tags: added: iso-testing
pereze (pereze) on 2016-10-02
Changed in shim (Ubuntu):
status: In Progress → Fix Released
Colin Watson (cjwatson) on 2016-10-02
Changed in shim (Ubuntu):
status: Fix Released → In Progress
sierdzio (sierdzio) wrote :

Apparently this affects me as well, with Kubuntu 16.10 (beta2 and all subsequent daily builds), on a self-assembled PC (no QEMU, no laptop). In case it helps, I'm using an Asus Z170P motherboard.

Jason Gerard DeRose (jderose) wrote :

@Mathieu - I was on vacation last week, so I wasn't in the loop on IRC.

What's the current status of this? Does it seem feasible that the fixed shim package can be signed (and FFE'd) in time for 16.10? Or are we already at the point where reverting to the shim 0.8-0ubuntu2 package from Xenial (with whatever needed version trickery) is the only realistic hope for fixing this?

If there's anything I can do to help, please don't hesitate to ask!

We're still waiting for shim to be signed by Microsoft. I don't expect issues with a FFE for the new shim, since it fixes some important bugs. If it doesn't make it though, we can provide the new shim as a stable release update.

Given that we're very close to release however, it seems like it's time to do a revert for now.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package shim - 0.9+1465500757.14a5905.is.0.8-0ubuntu2

---------------
shim (0.9+1465500757.14a5905.is.0.8-0ubuntu2) wily; urgency=medium

  * Revert to shim 0.8 for now; which at least doesn't crash if fallback.efi
    is absent. (LP: #1624096)
    - This effectively reverts shim to 0.8-0ubuntu2.

 -- Mathieu Trudel-Lapierre <email address hidden> Mon, 03 Oct 2016 14:32:28 -0400

Changed in shim (Ubuntu):
status: In Progress → Fix Released
Jason Gerard DeRose (jderose) wrote :

Hmmm, today's yakkety-desktop-amd64.iso (sha1:494bc027be3d29c494eb17d057dcc51cdfc6f50b) is seemingly still using the broken shim package?

I'm guessing there's something special about how the shim package gets onto the ISO as it doesn't seem to be listed in yakkety-desktop-amd64.manifest?

But for whatever reason, I'm still getting the same exception when debugging with `-serial stdio`:

!!!! X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID - 00000000 !!!!
RIP - 000000007E64D5BA, CS - 0000000000000038, RFLAGS - 0000000000010202
ExceptionData - 0000000000000000
RAX - AFAFAFAFAFAFAFAF, RCX - 000000007F1C5820, RDX - 000000007F1C5820
RBX - 000000007F132198, RSP - 000000007FB1BA40, RBP - 000000007FB1BAF0
RSI - 000000007E6DBD9A, RDI - 000000007E62CFBA
R8 - 0000000000000004, R9 - 0000000000000000, R10 - 0000000000000020
R11 - 0000000000000002, R12 - 000000007EEB34B8, R13 - 000000007EEB34C0
R14 - 000000007FB33620, R15 - 000000007EDD6018
DS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030
GS - 0000000000000030, SS - 0000000000000030
CR0 - 0000000080000033, CR2 - 0000000000000000, CR3 - 000000007FABA000
CR4 - 0000000000000668, CR8 - 0000000000000000
DR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3 - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000007FAA8698 0000000000000047, LDTR - 0000000000000000
IDTR - 000000007F5E4018 0000000000000FFF, TR - 0000000000000000
FXSAVE_STATE - 000000007FB1B6A0
!!!! Find PE image (No PDB) (ImageBase=000000007E62D000, EntryPoint=000000007E64A000) !!!!

Jason Gerard DeRose (jderose) wrote :

Okay, the 20161005.1 ISOs seem to have done the trick. Tested the desktop and server ISOs under QEMU+OVMF, plus tested the desktop ISO on a slew of UEFI hardware. No issues encountered shim-wise.

I'll test the server ISO on UEFI hardware shortly, but there are a few other things I need to finish up first.

Big thanks to everyone who helped on this!

Jason Gerard DeRose (jderose) wrote :

And I sanity checked the server ISO on the same slew of UEFI hardware... no issues found.

Turns out we didn't need grub2 for this case since we reverted to the "old" shim.

Zesty now has the new shim and we'll proceed with the SRUs shortly.

Changed in grub2 (Ubuntu):
status: In Progress → Invalid
no longer affects: grub2 (Ubuntu)
no longer affects: debian-cd (Ubuntu)

Hello Jason, or anyone else affected,

Accepted shim into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/shim/0.9+1474479173.6c180c6-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in shim (Ubuntu Xenial):
status: New → Fix Committed
tags: added: verification-needed
Jason Gerard DeRose (jderose) wrote :

Steve,

I'm not sure whether it was a truly representative test, but it works fine with latest Xenial daily desktop ISO under QEMU + OVMF, and these dailies do have xenial-proposed enabled.

(I'm testing the 20161111 ISO, sha1sum 0ed4db8dad7142837ce9175e2b9617c4dd93a326.)

I recall that d-i needed to be rebuilt for a new shim to be properly represented in an ISO... has this happened yet?

Thanks!

On Fri, Nov 11, 2016 at 07:24:13PM -0000, Jason Gerard DeRose wrote:
> I recall that d-i needed to be rebuilt for a new shim to be properly
> represented in an ISO... has this happened yet?

It has not. OTOH this is the same exact binary that is currently in the
zesty release, so it should be possible to test a daily image there
(provided we get a d-i rebuild in zesty).

The latest zesty d-i image (20101020ubuntu487), built on 2016-11-07, should include the right shim already.

Steve Langasek (vorlon) on 2017-03-24
Changed in shim (Ubuntu Xenial):
status: Fix Committed → Fix Released

An upload of shim-signed to trusty-proposed has been rejected from the upload queue for the following reason: "needs adjusted versioned dep on grub2-common; drop ref to LP: #1624096 from changelog".
