Comment 7 for bug 1882671

Laszlo Ersek (Red Hat) (lersek) wrote : Re: qemu-system-x86_64 (ver 4.2) stuck at boot with OVMF bios

Hi Vlad,

the ipxe-qemu package in Ubuntu (1.0.0+git-20190109.133f4c4-0ubuntu3) is
built with DOWNLOAD_PROTO_HTTPS enabled (in "src/config/general.h").
According to the Ubuntu changelog, this is a new feature added in
"1.0.0+git-20190109.133f4c4-0ubuntu1".

With DOWNLOAD_PROTO_HTTPS enabled, I can reproduce the issue locally,
with iPXE built from source at git commit 133f4c4 (which you report the
issue for), and also at current iPXE master (9ee70fb95bc2).

The issue does not reproduce (with DOWNLOAD_PROTO_HTTPS enabled) at
commit fbe8c52d. This suggests the problem should be bisectable.

If I disable DOWNLOAD_PROTO_HTTPS, then the problem goes away even at
133f4c4 (i.e., the issue is masked).

I tested with current edk2 master (14c7ed8b51f6).

Viewed at 133f4c4:

The DOWNLOAD_PROTO_HTTPS feature test macro seems to result in iPXE
attempting to gather entropy. (Likely for setting up TLS connections.)
For entropy gathering, iPXE seems to use an EFI timer, and to measure
jitter across one timer tick. In this, iPXE plays some tricks with the
UEFI TPL (Task Priority Level).

In general, iPXE seems to want to run at TPL_CALLBACK most of the time,
to hold off timer event callbacks in most code locations, and drops down
to TPL_APPLICATION only when it actively wants a timer callback (for the
jitter collection, see above).

When the iPXE driver is launched, the StartImage() UEFI boot service
takes note of the current TPL. It is TPL_APPLICATION (value 4). Then
iPXE seems to perform the above trickery with TPL_CALLBACK & entropy
collection. Finally, after installing EfiDriverBindingProtocol and
EfiComponentName2Protocol, the iPXE driver exits (as expected from a
UEFI driver model driver -- the entry point function is only supposed to
perform some setup steps & install some protocol interfaces). At this
point, StartImage() verifies that the TPL has been restored to the value
it had before the driver was launched.

Unfortunately, something about the TPL manipulations in iPXE is
unbalanced, because I see the following TPL changes:

- raise: APPLICATION (4) -> CALLBACK (8)
- raise: CALLBACK (8) -> NOTIFY (16)
- raise: NOTIFY (16) -> NOTIFY (16)
- restore: NOTIFY (16) -> NOTIFY (16)
- restore: NOTIFY (16) -> CALLBACK (8)

Note that the final "restore: CALLBACK (8) -> APPLICATION (4)"
transition never happens before iPXE exits. This is what StartImage()
catches and reports with the failed ASSERT().
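
To make the imbalance concrete, here is a small standalone C model that
replays the five transitions listed above and compares the resulting TPL
with the one saved before the entry point ran. This is only a sketch: the
struct, the function names and the balance check are hypothetical
illustrations, not edk2 or iPXE code. Only the TPL values and the order
of transitions come from the trace.

```c
#include <assert.h>
#include <stddef.h>

/* TPL values as they appear in the trace above. */
enum { TPL_APPLICATION = 4, TPL_CALLBACK = 8, TPL_NOTIFY = 16 };

/* One logged transition (hypothetical record type for this model). */
struct tpl_event {
    int      is_raise;  /* 1 = raise, 0 = restore */
    unsigned from;
    unsigned to;
};

/* The five transitions observed while the iPXE entry point ran. */
static const struct tpl_event observed_trace[] = {
    { 1, TPL_APPLICATION, TPL_CALLBACK },
    { 1, TPL_CALLBACK,    TPL_NOTIFY   },
    { 1, TPL_NOTIFY,      TPL_NOTIFY   },
    { 0, TPL_NOTIFY,      TPL_NOTIFY   },
    { 0, TPL_NOTIFY,      TPL_CALLBACK },
};

/* Replay a trace starting at `start`; return the TPL in effect afterwards. */
unsigned replay_tpl_trace(unsigned start, const struct tpl_event *ev, size_t n)
{
    unsigned cur = start;
    for (size_t i = 0; i < n; i++) {
        /* Each transition must begin where the previous one ended. */
        assert(ev[i].from == cur);
        /* A raise never lowers the TPL; a restore never raises it. */
        if (ev[i].is_raise)
            assert(ev[i].to >= ev[i].from);
        else
            assert(ev[i].to <= ev[i].from);
        cur = ev[i].to;
    }
    return cur;
}

/* TPL left behind by the observed trace, starting from TPL_APPLICATION. */
unsigned observed_final_tpl(void)
{
    size_t n = sizeof observed_trace / sizeof observed_trace[0];
    return replay_tpl_trace(TPL_APPLICATION, observed_trace, n);
}

/* Model of the comparison made when the entry point returns. */
int tpl_is_balanced(unsigned saved_tpl, unsigned final_tpl)
{
    return saved_tpl == final_tpl;
}
```

In this model, observed_final_tpl() returns 8 (CALLBACK), so
tpl_is_balanced(TPL_APPLICATION, observed_final_tpl()) is false. A sixth
"restore: CALLBACK (8) -> APPLICATION (4)" entry would make it true.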

So, as I mentioned, the problem is bisectable. Here's the bisection log:

> git bisect start
> # bad: [9ee70fb95bc266885ff88be228b044a2bb226eeb] [efi] Attempt to
> # connect our driver directly if ConnectController fails
> git bisect bad 9ee70fb95bc266885ff88be228b044a2bb226eeb
> # bad: [133f4c47baef6002b2ccb4904a035cda2303c6e5] [build] Handle
> # R_X86_64_PLT32 from binutils 2.31
> git bisect bad 133f4c47baef6002b2ccb4904a035cda2303c6e5
> # good: [fbe8c52d0d9cdb3d6f5fe8be8edab54618becc1f] [ena] Fix spurious
> # uninitialised variable warning on older versions of gcc
> git bisect good fbe8c52d0d9cdb3d6f5fe8be8edab54618becc1f
> # bad: [bc85368cdd311fe68ffcf251e7e8e90c14f8a9dc] [librm] Ensure that
> # inline code symbols are unique
> git bisect bad bc85368cdd311fe68ffcf251e7e8e90c14f8a9dc
> # bad: [0778418e29ea16fc897fc5b6e497054f5ba86ebd] [golan] Do not
> # assume all devices are identical
> git bisect bad 0778418e29ea16fc897fc5b6e497054f5ba86ebd
> # good: [f672a27b34220865b403df519593f382859559e0] [efi] Raise TPL
> # within EFI_USB_IO_PROTOCOL entry points
> git bisect good f672a27b34220865b403df519593f382859559e0
> # bad: [d8c500b7945e57023dde5bd0be2b0e40963315d9] [efi] Drop to
> # TPL_APPLICATION when gathering entropy
> git bisect bad d8c500b7945e57023dde5bd0be2b0e40963315d9
> # good: [c84f9d67272beaed98f98bf308471df16340a3be] [iscsi] Parse IPv6
> # address in root path
> git bisect good c84f9d67272beaed98f98bf308471df16340a3be
> # first bad commit: [d8c500b7945e57023dde5bd0be2b0e40963315d9] [efi]
> # Drop to TPL_APPLICATION when gathering entropy

The bisection fingers d8c500b7945e ("[efi] Drop to TPL_APPLICATION when
gathering entropy", 2018-03-12) as the first bad commit.

Feel free to report this problem on the upstream iPXE mailing list.

Regarding Ubuntu downstream, you should be able to work around this
issue by #undef-ing DOWNLOAD_PROTO_HTTPS again, in
"src/config/general.h" -- *minimally* in the CONFIG=qemu build(s). That
is, in the ipxe-qemu subpackage.
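
As a rough sketch, the workaround amounts to a one-line change in
"src/config/general.h". I'm assuming here that the Ubuntu package enables
the feature with a plain #define on that macro; the exact surrounding
lines and comment in the packaged file may differ.

```c
/* src/config/general.h (excerpt; surrounding lines are assumed) */

/* Was: #define DOWNLOAD_PROTO_HTTPS */
#undef DOWNLOAD_PROTO_HTTPS	/* HTTPS download protocol disabled */
```

After rebuilding the CONFIG=qemu images with this change, the HTTPS code
should no longer be compiled in. In my testing above, disabling the macro
made the problem go away.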

That's because in a CONFIG=qemu build, you totally don't need (or even
*use*) the iPXE HTTPS infrastructure (the entropy gathering that trips
the ASSERT seems spurious to me, with CONFIG=qemu). With CONFIG=qemu,
iPXE provides the UEFI SNP (Simple Network Protocol) interface on top of
the e1000 NIC, and the crypto stuff (if any) is done by the platform
firmware (edk2 / OVMF).