Deployments fail when Secure Boot enabled

Bug #1711203 reported by Rod Smith
This bug affects 7 people
Affects               Status        Importance  Assigned to              Milestone
MAAS                  Fix Released  High        Andres Rodriguez
  2.3                 Fix Released  High        Andres Rodriguez
OEM Priority Project  New           Undecided   Unassigned
curtin                Invalid       Undecided   Unassigned
dellserver            Fix Released  Undecided   Unassigned
maas-images           Fix Released  Critical    Lee Trager
shim (Ubuntu)         Fix Released  High        Mathieu Trudel-Lapierre
  Bionic              In Progress   High        Mathieu Trudel-Lapierre

Bug Description

I've recently encountered a problem with deploying nodes on which Secure Boot is enabled. The symptoms are:

1. The node enlists and commissions fine
2. The node boots and begins deploying fine
3. After deployment completes, the node reboots
4. When booting at this point, after showing a few routine messages,
   including a GRUB menu, the node displays the following text on its
   screen:

error: invalid video mode specification `text'.
Booting in blind mode
Bootloader has not verified loaded image
System is compromised. halting

Disabling Secure Boot on the node enables it to boot. If this is done quickly enough, deployment will succeed.

I've encountered this problem on two systems managed by two MAAS servers: An Intel NUC DC53247HYE and a Cisco UCS C-240 M4 (VIC). One MAAS server is running 2.2.2 (6099-g8751f91-0ubuntu1~16.04.1) and the other is running 2.2.1 (6078-g2a6d96e-0ubuntu1~16.04.1). I'm attaching log files from the first server to this bug report. The affected node is brennan on that server.

Further observations:

* Once booted, I see that there's no kernel with a .efi.signed extension on
  the hard disk. Installing such a kernel does NOT fix the problem;
  however, it may be necessary to install such a kernel for a proper fix.
* If I force a boot directly through the Shim and GRUB installed on the
  hard disk, the system boots correctly, even with Secure Boot enabled.
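To check the first observation on a deployed node, a quick sketch like the following lists the signed kernel images in /boot. It assumes Ubuntu's linux-signed-image packages install kernels named vmlinuz-<version>.efi.signed next to the unsigned vmlinuz-<version>; the function names are illustrative only.

```python
import os
from typing import Iterable, List


def filter_signed(names: Iterable[str]) -> List[str]:
    """Keep only kernel images carrying the .efi.signed suffix."""
    return sorted(
        n for n in names if n.startswith("vmlinuz-") and n.endswith(".efi.signed")
    )


def signed_kernels(boot_dir: str = "/boot") -> List[str]:
    """List the signed kernels present in boot_dir."""
    return filter_signed(os.listdir(boot_dir))


if __name__ == "__main__":
    found = signed_kernels()
    print("Signed kernels:", ", ".join(found) if found else "none found")
```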

I found a copy of the error message in Shim source code, and reports of this message on Fedora as early as 2014:

* https://github.com/rhboot/shim/blob/master/replacements.c
* https://ask.fedoraproject.org/en/question/39126/bootloader-has-not-verified-loaded-image/

It looks to me as if the Shim that MAAS uses for the post-deployment boot has been updated/changed to include this strict verification that the kernel is honoring Secure Boot rules; but the Shim installed to the hard disk, and used during enlistment and commissioning, does not perform this check. OTOH, I can't find any evidence of separate Shim binaries on the MAAS server.

MAAS version information from one server:

$ dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-====================================-============-==================================================
ii maas 2.2.2-6099-g8751f91-0ubuntu1~16.04.1 all "Metal as a Service" is a physical cloud and IPAM
ii maas-cert-server 0.2.30-0~76~ubuntu16.04.1 all Ubuntu certification support files for MAAS server
ii maas-cli 2.2.2-6099-g8751f91-0ubuntu1~16.04.1 all MAAS client and command-line interface
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.2.2-6099-g8751f91-0ubuntu1~16.04.1 all MAAS server common files
ii maas-dhcp 2.2.2-6099-g8751f91-0ubuntu1~16.04.1 all MAAS DHCP server
ii maas-dns 2.2.2-6099-g8751f91-0ubuntu1~16.04.1 all MAAS DNS server
ii maas-proxy 2.2.2-6099-g8751f91-0ubuntu1~16.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.2.2-6099-g8751f91-0ubuntu1~16.04.1 all Rack Controller for MAAS
ii maas-region-api 2.2.2-6099-g8751f91-0ubuntu1~16.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.2.2-6099-g8751f91-0ubuntu1~16.04.1 all Region Controller for MAAS
un maas-region-controller-min <none> <none> (no description available)
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-maas-provisioningserver <none> <none> (no description available)
ii python3-django-maas 2.2.2-6099-g8751f91-0ubuntu1~16.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.2.2-6099-g8751f91-0ubuntu1~16.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.2.2-6099-g8751f91-0ubuntu1~16.04.1 all MAAS server provisioning libraries (Python 3)

Rod Smith (rodsmith) wrote :
Jeff Lane  (bladernr) wrote :

Setting to Confirmed. This has also been reported by a customer in the field (an OEM partner), and I was investigating it for them; Rod has now experienced it as well.

Changed in maas:
status: New → Confirmed
Changed in maas:
status: Confirmed → Incomplete
Andres Rodriguez (andreserl) wrote :

Please attach the curtin configuration and the installation log:

1. Curtin config: maas <user> machine get-curtin-config <system_id>
2. Install log in 2.2+: maas <user> node-script-result download <system-id> current-installation > install.log

David Britton (dpb)
Changed in curtin:
status: New → Incomplete
Rod Smith (rodsmith) wrote :
Rod Smith (rodsmith) wrote :
Rod Smith (rodsmith) wrote :

Note that these logs reflect installation with Secure Boot OFF, at least part of the time. If you need a log of a FAILED deployment with Secure Boot on, please tell me.

Ryan Harper (raharper) wrote :

Can we please get some clarity on what curtin needs to do? We've had two bugs in this area:

https://bugs.launchpad.net/maas/+bug/1680917 (bootorder fixed in curtin)

and

https://bugs.launchpad.net/maas/+bug/1687729 (closed with no update, suspected firmware issue)

It's not clear how this bug differs from those.

I'm not sure which machines in the logs to look at, so I'm walking through a few:

brennan:
  installs grub-efi-amd64-signed, shim, shim-signed
  during grub install, it reports that it can't access efivars
  I don't recall if that's fatal. Nothing for *curtin* here; if
  anything, maybe a bug against grub or the other Secure Boot packages' post-inst scripts?

Aug 16 18:07:29 brennan cloud-init[2576]: Setting up grub-efi-amd64-signed (1.66.12+2.02~beta2-36ubuntu3.12) ...
Aug 16 18:07:29 brennan cloud-init[2576]: Setting up mokutil (0.3.0-0ubuntu3) ...
Aug 16 18:07:29 brennan cloud-init[2576]: Setting up sbsigntool (0.6-0ubuntu10.1) ...
Aug 16 18:07:29 brennan cloud-init[2576]: Setting up secureboot-db (1.1) ...
Aug 16 18:07:29 brennan cloud-init[2576]: Can't access efivars filesystem at /sys/firmware/efi/efivars, aborting
Aug 16 18:07:29 brennan cloud-init[2576]: Setting up shim (0.9+1474479173.6c180c6-1ubuntu1) ...
Aug 16 18:07:29 brennan cloud-init[2576]: Setting up shim-signed (1.32~16.04.1+0.9+1474479173.6c180c6-1ubuntu1) ...
Aug 16 18:07:29 brennan cloud-init[2576]: No DKMS packages installed: not changing Secure Boot validation state.
Aug 16 18:07:29 brennan cloud-init[2576]: Processing triggers for libc-bin (2.23-0ubuntu9) ...
Aug 16 18:07:35 brennan cloud-init[2576]: BootCurrent: 0004
Aug 16 18:07:35 brennan cloud-init[2576]: Timeout: 1 seconds
Aug 16 18:07:35 brennan cloud-init[2576]: BootOrder: 0004,0006,0007,0005,0003,0002,0001,0000
Aug 16 18:07:35 brennan cloud-init[2576]: Boot0000* rEFInd Boot Manager
Aug 16 18:07:35 brennan cloud-init[2576]: Boot0001* UEFI C400-MTFDDAT064MAM MSA1727018J
Aug 16 18:07:35 brennan cloud-init[2576]: Boot0002* UEFI PXEv4 (MAC:ECA86BFF33AD)
Aug 16 18:07:35 brennan cloud-init[2576]: Boot0003* UEFI PXEv6 (MAC:ECA86BFF33AD)
Aug 16 18:07:35 brennan cloud-init[2576]: Boot0004* UEFI : LAN : IP4 Intel(R) 82579LM Gigabit Network Connection
Aug 16 18:07:35 brennan cloud-init[2576]: Boot0005* rEFInd Boot Manager
Aug 16 18:07:35 brennan cloud-init[2576]: Boot0006* ubuntu
Aug 16 18:07:35 brennan cloud-init[2576]: Boot0007* UEFI : LAN : IP6 Intel(R) 82579LM Gigabit Network Connection
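The efibootmgr listing above is easier to read when BootOrder is resolved to entry labels. This sketch does that for raw efibootmgr output or for log lines like the ones above (syslog prefix included); the regexes and function names are illustrative and not part of curtin.

```python
import re
from typing import Dict, List, Tuple

CURRENT_RE = re.compile(r"BootCurrent:\s*([0-9A-Fa-f]{4})")
ORDER_RE = re.compile(r"BootOrder:\s*([0-9A-Fa-f,]+)")
ENTRY_RE = re.compile(r"Boot([0-9A-Fa-f]{4})\*?\s+(.*)$")


def parse_efibootmgr(text: str) -> Tuple[str, List[str], Dict[str, str]]:
    """Return (BootCurrent, BootOrder, {entry number: label})."""
    current = ""
    order: List[str] = []
    labels: Dict[str, str] = {}
    for line in text.splitlines():
        m = CURRENT_RE.search(line)
        if m:
            current = m.group(1)
            continue
        m = ORDER_RE.search(line)
        if m:
            order = m.group(1).split(",")
            continue
        m = ENTRY_RE.search(line)
        if m:
            labels[m.group(1)] = m.group(2).strip()
    return current, order, labels


def boot_sequence(text: str) -> List[str]:
    """Boot entries in BootOrder, first entry first, with their labels."""
    _, order, labels = parse_efibootmgr(text)
    return [f"{n} {labels.get(n, '?')}" for n in order]
```

Applied to brennan's log, the first entry is 0004 (the IPv4 network boot, also BootCurrent), with 0006 "ubuntu" second, which is consistent with the node network-booting from MAAS before handing off to the disk.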

kzanol:
  shows the same error w.r.t. not being able to access efivars:
Jun 13 13:57:38 kzanol cloud-init[2268]: Setting up grub-efi-amd64-signed (1.66.9+2.02~beta2-36ubuntu3.9) ...
Jun 13 13:57:38 kzanol cloud-init[2268]: Setting up mokutil (0.3.0-0ubuntu3) ...
Jun 13 13:57:38 kzanol cloud-init[2268]: Setting up sbsigntool (0.6-0ubuntu10.1) ...
Jun 13 13:57:38 kzanol cloud-init[2268]: Setting up secureboot-db (1.1) ...
Jun 13 13:57:38 kzanol cloud-init[2268]: Can't access efivars filesystem at /sys/firmware/efi/efivars, aborting
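Both logs contain "Can't access efivars filesystem at /sys/firmware/efi/efivars". One way to see whether that path is available in the environment running the package scripts is to look for an efivarfs entry in /proc/mounts, as in this sketch (helper names are hypothetical). A missing entry can mean the system was not booted via UEFI at all, or just that efivarfs isn't mounted there.

```python
import os

EFIVARS = "/sys/firmware/efi/efivars"


def efivarfs_mounted(mounts_text: str, mountpoint: str = EFIVARS) -> bool:
    """True if /proc/mounts-style text lists an efivarfs mount at mountpoint."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == mountpoint and fields[2] == "efivarfs":
            return True
    return False


def describe() -> str:
    if not os.path.isdir("/sys/firmware/efi"):
        return "not booted via UEFI"
    with open("/proc/mounts") as f:
        if efivarfs_mounted(f.read()):
            return "efivarfs mounted"
    return "UEFI boot, but efivarfs is not mounted"


if __name__ == "__main__":
    print(describe())
```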

Rod Smith (rodsmith) wrote :

Ryan, neither of the bugs you reference is a duplicate of this one. This bug is new. Both the nodes I've tested have been successfully deployed with Secure Boot active in the past. In fact, brennan had been successfully deployed with Secure Boot but then failed to boot from that very deployment some days or weeks later (I'm not sure how long it had been since I last booted it), which suggests to me that the Shim/GRUB provided by the MAAS server when PXE-booting had changed in that period.

Bug #1680917 is about what happens when the MAAS server becomes unavailable and a node tries to boot. This problem would exist with or without Secure Boot being active.

Bug #1687729 is more similar to this new bug, but that problem does NOT affect all systems. My read on that bug report is that it was an incompatibility between our Shim and the Secure Boot implementation in some computers. The bug I'm reporting now appears to be a problem caused by Shim refusing to allow a GRUB that doesn't check the validity of a loaded kernel to boot.

In some sense, if my analysis is correct, the problem is caused by Shim "tightening the screws" on Secure Boot policy; however, those changes are done for a reason (to improve security), so the solution should be to ensure that the GRUB versions MAAS and curtin deploy perform the checks that Shim wants, and that the kernels we install are signed. AFAIK, we have all the required pieces in the standard Ubuntu toolset, but clearly, a deployed system does not have signed kernels. As my tests show, though, that doesn't seem to be enough; it LOOKS LIKE the GRUB that MAAS is using does not enforce Secure Boot checks on the kernels it loads. This used to be the case for Ubuntu until (IIRC) 16.04, but our more recent GRUB binaries do perform such checks. As noted in my original report, though, I couldn't find the exact binary that's to blame. This calls into question at least some of my analysis, so take the above with a grain of salt -- but I might just not know where MAAS tucks away all its boot loader files, so I may have missed the file.

In the logs from the MAAS server I've provided, you can ignore kzanol; that system does not support Secure Boot. Brennan is the machine I used for testing, and it exhibits the problem. (The other computer is on another MAAS server with dozens of deployed nodes, so its log files would be VERY cluttered by comparison.) I recall noticing warnings about an inability to access efivars filesystems in the past, but AFAIK they are not correlated with any problem. In fact, this problem manifests before the Linux kernel is even loaded: what this bug reports is that the kernel won't load, after which the node shuts down.

Ryan Harper (raharper) wrote :

Rod,

Thanks for the follow-up.

> In some sense, if my analysis is correct, the problem is caused by Shim
> "tightening the screws" on Secure Boot policy; however, those changes
> are done for a reason (to improve security), so the solution should be
> to ensure that the GRUB versions MAAS and curtin deploy perform the
> checks that Shim wants, and that the kernels we install are signed.

Curtin/MAAS will install the linux-image-generic kernel for the specific release unless otherwise specified by MAAS in its kernel config mapping.

If there is a specific kernel package that *should* be selected instead of the linux-image-generic kernel, then MAAS/curtin need to know:

1) what is that package name
2) how to know when to use (1) instead of linux-image-generic

A quick search of apt-cache shows

linux-signed-image-< >

That appears to be what we'd want to use in the Secure Boot path. In one of the other bugs I believe I asked how curtin or MAAS can detect whether a platform is configured for Secure Boot, but I didn't see a definitive answer.

Steve Langasek (vorlon) wrote :

> Curtin/MAAS will install the linux-image-generic kernel for the specific
> release unless otherwise specified by MAAS in their kernel config mapping.

It should be noted that there are changes pending to the kernel packaging, such that installing linux-image *always* gives you the UEFI-signed vmlinuz, and you don't have to worry about having two copies of vmlinuz (signed and unsigned) installed to /boot.

However, in the meantime it would be best if curtin always used linux-signed-image-generic when installing on any UEFI system, so that the system isn't rendered unbootable if the user enables SecureBoot post-install. This is the current behavior of ubiquity and d-i.
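Steve's suggestion amounts to a one-line policy. A minimal sketch, assuming that the presence of /sys/firmware/efi in the environment doing the install means the node booted via UEFI (the kernel only creates that directory on EFI boots); the function name is hypothetical, and this is not how curtin or MAAS actually select kernels.

```python
import os


def kernel_package(efi_sysfs: str = "/sys/firmware/efi") -> str:
    """Pick the kernel metapackage per the suggested policy.

    UEFI systems get the signed kernel, so Secure Boot can be enabled
    after installation without leaving the system unbootable."""
    if os.path.isdir(efi_sysfs):
        return "linux-signed-image-generic"
    return "linux-image-generic"


if __name__ == "__main__":
    print(kernel_package())
```

Steve later recommends preferring the signed package unconditionally, which would reduce this to always returning linux-signed-image-generic.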

None of this explains the behavior of an unsigned kernel failing to boot post-install. The current boot process is:
 - boot shim
 - verify signature on grub and boot it
 - if kernel signature verifies, boot it
 - if kernel signature does not verify, call ExitBootServices() and boot it anyway

This *will* be changing, and in preparation for this we have begun building and signing a separate grub.efi image which enforces kernel signatures: http://archive.ubuntu.com/ubuntu/dists/artful/main/uefi/grub2-amd64/current/gsbx64.efi.signed
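The current and planned policies differ only in what grub does when a kernel signature fails to verify. The toy model below (names invented; this is not shim or grub code) encodes the four steps above, with enforce_kernel_sig=True standing in for the signature-enforcing gsbx64.efi build. Under the current policy an unsigned kernel still boots, which is why the policy alone doesn't explain the failure reported here.

```python
from enum import Enum


class Outcome(Enum):
    BOOT_VERIFIED = "kernel verified and booted"
    BOOT_UNVERIFIED = "ExitBootServices() called, kernel booted anyway"
    REFUSED = "boot refused"


def boot_chain(grub_sig_ok: bool, kernel_sig_ok: bool, enforce_kernel_sig: bool) -> Outcome:
    # Steps 1-2: shim verifies grub's signature before starting it.
    if not grub_sig_ok:
        return Outcome.REFUSED
    # Step 3: a kernel with a valid signature boots normally.
    if kernel_sig_ok:
        return Outcome.BOOT_VERIFIED
    # Step 4 is where the current (lenient) and planned (enforcing) grub differ.
    if enforce_kernel_sig:
        return Outcome.REFUSED
    return Outcome.BOOT_UNVERIFIED


if __name__ == "__main__":
    for enforce in (False, True):
        print(f"enforce={enforce}: unsigned kernel ->", boot_chain(True, False, enforce).value)
```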

However, we don't even package this in grub-efi-amd64-signed, it's only available for download from the archive. If maas or curtin are pulling this binary in at install time, instead of using the grubx64.efi.signed from the package, that's definitely a bug.

And if nothing is pulling gsbx64.efi.signed, then the bug is somewhere else but I'm not sure where. It's worth checking whether this problem mysteriously resolves once linux-signed is being pulled in; if it does, then it's possible we have a bug in grub (enforcing signature when it's not supposed to) or simply a bug in firmware.

Rod Smith (rodsmith) wrote :

I agree with Steve that installing linux-signed-image-generic as a default is best at the moment. Such kernels should boot whether or not Secure Boot is enabled, and AFAIK they'll even boot on BIOS-based computers (but I've not checked that, and there may be dependency issues on such systems). There is the caveat that this will increase the space used in /boot, which could cause out-of-space errors if /boot is a separate partition that's too small (see bug #1465050). I recommend 500 MB as a MINIMUM size for /boot these days.
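A quick way to compare an existing /boot against that recommendation is sketched below. It reads "500 MB" as 500 MiB (an assumption), and the helper name is hypothetical. If /boot is not a separate partition, shutil.disk_usage() reports the filesystem that contains it.

```python
import shutil

# Suggested minimum size for /boot, interpreted here as 500 MiB.
MIN_BOOT_BYTES = 500 * 1024 * 1024


def boot_big_enough(path: str = "/boot", minimum: int = MIN_BOOT_BYTES) -> bool:
    """True if the filesystem holding `path` is at least `minimum` bytes in total."""
    return shutil.disk_usage(path).total >= minimum


if __name__ == "__main__":
    usage = shutil.disk_usage("/boot")
    verdict = "OK" if boot_big_enough() else "below the suggested minimum"
    print(f"/boot: {usage.total // 2**20} MiB total, {usage.free // 2**20} MiB free, {verdict}")
```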

If you want to detect whether Secure Boot is enabled, you can check the file /sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c. Its first four bytes hold the variable's attributes; if the fifth byte (the variable's value) is 0x01, then Secure Boot is enabled. For instance:

$ hexdump /sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c
0000000 0006 0000 0001

Secure Boot is enabled on this system.
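The same check can be scripted, for example in an installer that wants to know whether Secure Boot is active. In this sketch (helper names are hypothetical), the bytes 06 00 00 00 01 are the ones hexdump prints above as the little-endian 16-bit words 0006 0000 0001: a 4-byte attribute mask followed by the one-byte value.

```python
SECUREBOOT_VAR = (
    "/sys/firmware/efi/efivars/"
    "SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def secure_boot_from_bytes(raw: bytes) -> bool:
    """Interpret the raw contents of the SecureBoot efivarfs file.

    efivarfs prefixes every variable with its 4-byte attribute mask, so the
    one-byte SecureBoot value sits at offset 4 (the fifth byte)."""
    return len(raw) >= 5 and raw[4] == 1


def secure_boot_enabled(path: str = SECUREBOOT_VAR) -> bool:
    try:
        with open(path, "rb") as f:
            return secure_boot_from_bytes(f.read())
    except FileNotFoundError:
        # No variable: not booted via UEFI, or efivarfs is not mounted.
        return False


if __name__ == "__main__":
    print("Secure Boot enabled" if secure_boot_enabled() else "Secure Boot not enabled")
```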

As Steve notes, though, the user might enable Secure Boot post-deployment, and if a signed kernel is not installed, the boot would then fail.

One more point: In the past, nodes deployed via MAAS have booted by MAAS's PXE server delivering GRUB to the node, and that GRUB then loading the GRUB on the hard disk. If this hasn't changed, it could be that Shim or GRUB is getting confused by this sequence. (Shim unloads itself after a handoff from one EFI program to another.)

If it would help for debugging this, I can give whoever needs it access to one of the affected systems.

Mathieu Trudel-Lapierre (cyphermox) wrote :

I don't think this requires gsbx64.efi at all, it looks to me like a different issue; but one that we'd likely hit in the future anyway when enforcing signatures on kernels.

So, how this works is that a shim retrieved over TFTP loads a grub retrieved over TFTP, which is given a config telling it to chainload more stuff (in the "boot from disk" case, which is what fails here).

Since booting straight from disk works with SB enabled, we can infer that the on-disk portion of the chain is valid and working correctly; otherwise you would see validation failures when trying that.

I can only work with the assumption that all the bits in the chain are signed either by a Microsoft key that is known by the firmware (for shim), or by the Canonical key (which itself is known by shim) in the case of grub. Everything is signed, so there's no reason for things to fail validation -- it has to be that something isn't validating signatures.

Now, that points to a grub bug, but I'm not sure how it fails here -- by my read, you'd have grub go through validating even chainloaded images for their key by asking shim to validate them. This can fail, but then we'll need the output of that boot process with:

set debug="chain,secureboot"

You set this in grub.cfg.
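The setting can be prepended to whichever grub.cfg is in play. Which file to edit (the one MAAS serves over the network or the one on disk) isn't specified here, so this sketch leaves that to the caller; the helper name is hypothetical.

```python
import sys

DEBUG_LINE = 'set debug="chain,secureboot"'


def with_grub_debug(cfg: str) -> str:
    """Return cfg with GRUB's chain/secureboot debug output enabled.

    The setting goes first so it is already active when any menu entry
    runs chainloader; applying it twice changes nothing."""
    if DEBUG_LINE in cfg.splitlines():
        return cfg
    return DEBUG_LINE + "\n" + cfg


if __name__ == "__main__":
    sys.stdout.write(with_grub_debug(sys.stdin.read()))
```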

We don't do multiple builds of shim -- there's only one shim that exists in the archive, so it's not likely to be the culprit, and grub already manages to validate things successfully and chainload them correctly in the context of UEFI Secure Boot. This was in fact a recently fixed bug in grub that caused it to fail to chainload Windows in UEFI.

One further thing to try would be to grab grubnetx64.efi from the archive and use it to replace the grubx64.efi file in MAAS. That would establish whether the issue is a regression in grub:

http://archive.ubuntu.com/ubuntu/dists/xenial-updates/main/uefi/grub2-amd64/2.02~beta2-36ubuntu3.11/grubnetx64.efi.signed

Rod Smith (rodsmith) wrote :

To get this in the bug report: Replacing /var/lib/maas/boot-resources/current/bootloader/uefi/amd64/grubx64.efi with the grubnetx64.efi.signed file specified by Mathieu results in a successful boot. My understanding is that this is NOT good news. :(
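That swap can be done reversibly. A sketch, assuming grubnetx64.efi.signed has already been downloaded; the helper names and the .orig backup convention are hypothetical, and MAAS may replace the file again the next time it imports boot resources.

```python
import shutil
from pathlib import Path

MAAS_GRUB = Path(
    "/var/lib/maas/boot-resources/current/bootloader/uefi/amd64/grubx64.efi"
)


def backup_path(target: Path) -> Path:
    """Where the original bootloader is kept, e.g. grubx64.efi.orig."""
    return target.with_name(target.name + ".orig")


def swap_bootloader(replacement: Path, target: Path = MAAS_GRUB) -> Path:
    """Copy replacement over target, keeping the first original as a backup."""
    backup = backup_path(target)
    if not backup.exists():
        shutil.copy2(target, backup)  # never overwrite the pristine copy
    shutil.copy2(replacement, target)
    return backup


def restore_bootloader(target: Path = MAAS_GRUB) -> None:
    """Undo swap_bootloader()."""
    shutil.copy2(backup_path(target), target)
```

Run as root, e.g. swap_bootloader(Path("grubnetx64.efi.signed")), then restore_bootloader() once testing is done.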

Andres Rodriguez (andreserl) wrote :

Since 2.02~beta2-36ubuntu3.11 works but .12 (which is the latest in Xenial updates) doesn't, this seems to confirm the regression in grub. Marking invalid for MAAS and curtin!

MAAS will automatically pick up a fixed grub once it is in the archive.

Changed in curtin:
status: Incomplete → Invalid
Changed in maas:
status: Incomplete → Invalid
Narinder Gupta (narindergupta) wrote :

I am facing a similar issue at a Dell site, and all Dell servers are exhibiting this behavior when Secure Boot is enabled.

Jeff Lane  (bladernr) wrote :

Set the Grub2 task to High to grab attention (and because it's at least a High, if not Critical, bug). My gut says this should be critical as it's blocking the deployment of systems from multiple vendors in multiple datacenter and lab environments anytime SecureBoot is enabled.

Changed in grub2 (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Narinder Gupta (narindergupta) wrote :

Any updates on this issue?

Steve Langasek (vorlon) wrote :

If maas+curtin are not installing the signed variant of the linux-image package on UEFI systems, this is not invalid for maas+curtin - when we rev the grub secureboot policy (ETA January), these systems will be unbootable BY DESIGN. Regardless of whether this configuration has tickled a regression in grub, this MUST be fixed.

Changed in maas:
status: Invalid → Confirmed
Changed in grub2 (Ubuntu):
status: Confirmed → Won't Fix
Ryan Harper (raharper) wrote :

No one in this thread has answered how MAAS or curtin knows that it should install the -signed version of linux-image.

Once that knowledge is passed on, we can work out whether curtin can detect that, or whether MAAS can and should specify which kernel package to use.

Steve Langasek (vorlon) wrote :

On Thu, Nov 16, 2017 at 09:53:18PM -0000, Ryan Harper wrote:
> No one in this thread has answered how MAAS or curtin
> knows that it should install the -signed version of linux-image.

It should *unconditionally* prefer the -signed version of linux-image.

Changed in maas-images:
status: New → Confirmed
importance: Undecided → High
importance: High → Critical
Changed in maas:
importance: Undecided → Critical
milestone: none → 2.3.0
Changed in maas-images:
status: Confirmed → In Progress
Rod Smith (rodsmith) wrote :

To be clear, although installing the signed kernel package is necessary, a failure to do this is NOT the source of this bug, which seems to relate to how Shim and/or GRUB handle the MAAS boot path, which involves Shim and GRUB being PXE-booted and then chainloaded to (Shim and?) GRUB on the hard disk. I am available for testing of proposed fixes; I have one system with Secure Boot available on my home network and sporadic access to others in 1SS (from OIL; we can transfer them over to the certification network from time to time).

Changed in maas-images:
assignee: nobody → Lee Trager (ltrager)
Andres Rodriguez (andreserl) wrote :

We have a test stream that uses the signed Linux kernel instead of the unsigned one for x86. Can you please test it from this stream:

http://162.213.35.187/proposed/streams/v1/index.json

Rod Smith (rodsmith) wrote :

Andres, I've downloaded that file, but I have no idea where to put it. I can't find a file called index.json on my MAAS server.

Andres Rodriguez (andreserl) wrote :

@Rod,

Can you retry this URL as a different images source:

http://162.213.35.187/proposed/streams/v1/index.json

Rod Smith (rodsmith) wrote :

I've tried this and the problem persists. Note that MAAS *IS* installing the signed kernel, which is necessary but insufficient for a fix; the problem seems to be that Shim/GRUB is becoming confused by the handoff from the PXE-boot version of GRUB to the GRUB stored on the hard disk. If my analysis is correct, this will require either:

* Changes to Shim/GRUB so that it works in this configuration. This used to
  be the case, but the Shim/GRUB configuration has been tightening
  security, which introduced this bug as a side effect.
* A change in the way MAAS/curtin configures the PXE-booted GRUB so that it
  boots the system directly, without chainloading to GRUB on the hard disk.
  Note that this approach to a solution used to be used on ARM64 EFI
  systems, but that created a (now-fixed) bug #1582070. Thus, if this
  approach is used, care will have to be taken to not cause a regression on
  that bug.

Changed in maas:
status: Confirmed → Invalid
Rod Smith (rodsmith) wrote :

Here's the install log, cut-and-pasted from the MAAS web UI, for the latest installation. Note that after the node shut down, I restarted it and disabled Secure Boot to get it to complete.

Andres Rodriguez (andreserl) wrote :

So to clarify, the MAAS PXE config searches for and chainloads /efi/ubuntu/shimx64.efi. It seems the issue here is with the shim. As per Rod's comments:

"Changes to Shim/GRUB so that it works in this configuration. This used to be the case, but the Shim/GRUB configuration has been tightening security, which introduced this bug as a side effect."

Ryan Harper (raharper) wrote :

Should we re-open the grub2 task then? or add a shim task?

Lee Trager (ltrager)
Changed in maas-images:
status: In Progress → Fix Committed
Lee Trager (ltrager) wrote :

I've updated lp:maas-images to produce new images using the linux-signed kernel on AMD64. New images are produced when http://cloud-images.ubuntu.com/daily/ adds new images so it may take a few days for signed kernels to appear in the stream. Unsupported releases are no longer updated so we'll have to manually regenerate them if we want signed kernels.

The stream also contains all bootloaders, including the shim. Once a new shim-signed package is released to Xenial, the stream will automatically ingest the update. Let me know if we want to test an updated bootloader; I can produce a new proposed stream.

Rod Smith (rodsmith) wrote :

I'd just like to emphasize that, although a change to always install the linux-signed kernel on AMD64 systems is necessary to fix this bug, it's not sufficient to fix the bug. As noted in my comment #25 (and elsewhere), another change is also required -- either a change to Shim or GRUB (I don't know which) or a change to how MAAS handles the boot process (to have the PXE-booted GRUB read the configuration file from the hard disk rather than chainload to GRUB on the hard disk; or perhaps a change to the way the handoff is done, if some tweak could bypass the bug).

As before, I remain able and willing to test potential fixes.

Ryan Harper (raharper) wrote :

Reviewing @slangasek's notes

> It's worth checking whether this problem
> mysteriously resolves once linux-signed is being pulled in; if it does,
> then it's possible we have a bug in grub (enforcing signature when it's
> not supposed to) or simply a bug in firmware.

It would appear that, despite the change to linux-signed, there is still a bug. In that light, can we get next steps on debugging grub, firmware, or whatever else is needed to push this along?

Andres Rodriguez (andreserl) wrote :

As per Rod's comments, I'm re-opening the grub task.

Changed in maas-images:
status: Fix Committed → Fix Released
Changed in grub2 (Ubuntu):
status: Won't Fix → New
Steve Langasek (vorlon) wrote :

From Andres, the grub.cfg used for chainloading to local disk is:

set default="0"
set timeout=0

menuentry 'Local' {
    echo 'Booting local disk...'
    search --set=root --file /efi/ubuntu/shimx64.efi
    chainloader /efi/ubuntu/shimx64.efi
}

It should be possible to recreate an environment outside of maas for reproducing this (UEFI VM configured with SB on, netboot w/ shim+grub, chainload to disk via the above .cfg).

Mathieu Trudel-Lapierre (cyphermox) wrote :

Yes, it's absolutely possible to recreate the environment for testing this without MAAS -- there's nothing all that special to it, chainloading *any* image should work and maintain a Secure Boot-verified chain provided all the links in the chain validate images.

This looks to be pretty clearly a bug in chainloader's validation of images, it used to work, but only because it wasn't actually verifying much of it in the first place.

Changed in grub2 (Ubuntu):
status: New → In Progress
assignee: nobody → Mathieu Trudel-Lapierre (cyphermox)
Lee Trager (ltrager) wrote :

While reading through #1730493 and #1437024 I noticed both had various UEFI bootloader issues fixed by switching to the Artful version of grub and the shim. I've updated http://162.213.35.187/proposed/streams/v1/index.json to use boot loaders from Artful in case anyone wants to test.

Mathieu Trudel-Lapierre (cyphermox) wrote :

That's not going to change anything -- grub is doing exactly what it should: ask shim to validate the image it tries to chainload; and the image *does* validate successfully. The chain of trust is technically preserved, but shim doesn't manage to make sense of things, and refuses to continue loading.

This is a "bug" in shim, in that it's not a use case that was anticipated. Shim makes sense of the shim->fallback->shim->grub case because in that case things do go through the steps of calling load_image() and start_image() in firmware.

It also seems to me like a bug in grub because we ought to be loading things in such a way that shim would be able to make sense of it -- currently, that's not quite the case because some relocations and other image mangling needs to happen. I have an idea of a hack to fix this, but I think the "right" fix would be in shim.

What happens is that given that load_image() isn't called directly, when the second shim runs it doesn't uninstall the protocols and we end up validating against the first loaded shim when we try to verify the kernel's signature. This is effectively a variation on an issue that was fixed in shim for the fallback EFI binary.
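A deliberately simplified toy model of that explanation is sketched below. All class and function names are invented and this is not shim's code; "shim_lock" is borrowed from the name of shim's verification protocol. Beyond what is stated above, it assumes the halt comes from the on-disk shim's ExitBootServices() check, because that shim never sees the kernel being verified: the request went to the first (TFTP) shim instead.

```python
from dataclasses import dataclass
from typing import Dict

HALT = "Bootloader has not verified loaded image. System is compromised. halting."


@dataclass
class Shim:
    """Toy stand-in for one running copy of shim."""
    name: str
    saw_verification: bool = False

    def verify(self, image: str) -> bool:
        # Every binary here is properly signed, so verification succeeds;
        # what matters is which shim records that it happened.
        self.saw_verification = True
        return True

    def exit_boot_services(self) -> str:
        # Runs when the kernel calls ExitBootServices().
        return "boot continues" if self.saw_verification else HALT


def start_shim(protocols: Dict[str, Shim], shim: Shim, via_load_image: bool) -> None:
    """In this model, only a firmware LoadImage()/StartImage() path lets a
    later shim take over the protocol from an earlier one."""
    if via_load_image or "shim_lock" not in protocols:
        protocols["shim_lock"] = shim


def simulate(second_shim_via_load_image: bool) -> str:
    protocols: Dict[str, Shim] = {}
    start_shim(protocols, Shim("shim from TFTP"), via_load_image=True)
    disk_shim = Shim("shim from disk")
    start_shim(protocols, disk_shim, via_load_image=second_shim_via_load_image)
    protocols["shim_lock"].verify("vmlinuz")  # on-disk grub checks the kernel
    return disk_shim.exit_boot_services()


if __name__ == "__main__":
    print("grub chainloads shim:", simulate(False))
    print("firmware loads shim: ", simulate(True))
```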

In the meantime, there's also a valid workaround: you should be able to chainload *grub* rather than shim from the disk, and thus maintain the chain of trust for Secure Boot:

menuentry 'Local' {
 echo 'Booting local disk...'
 search --set=root --file /efi/ubuntu/grubx64.efi
 chainloader /efi/ubuntu/grubx64.efi
}

Rod Smith (rodsmith) wrote :

Lee, I tried http://162.213.35.187/proposed/streams/v1/index.json earlier, in response to Andres' suggestion, and that stream did not help. (See comments #24 and #25.) If you think that stream has changed since I did my testing on November 27, I'm happy to try again; but if not, it doesn't help.

Andres Rodriguez (andreserl) wrote :

@Rod,

Any chance you can test the workaround from comment #36? You will need to manually modify this file:

/usr/lib/python3/dist-packages/provisioningserver/templates/uefi/config.local.amd64.template

And then restart maas-regiond & maas-rackd.
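A sketch of that manual edit, assuming the template contains the same literal /efi/ubuntu/shimx64.efi paths as the grub.cfg shown earlier in this thread (the real template may use variables instead). The helper names are hypothetical, and a MAAS package upgrade will probably overwrite the change.

```python
from pathlib import Path

TEMPLATE = Path(
    "/usr/lib/python3/dist-packages/provisioningserver/templates/uefi/"
    "config.local.amd64.template"
)
SHIM = "/efi/ubuntu/shimx64.efi"
GRUB = "/efi/ubuntu/grubx64.efi"


def chainload_grub(text: str) -> str:
    """Point both the search and the chainloader lines at grub instead of shim."""
    return text.replace(SHIM, GRUB)


def patch_template(path: Path = TEMPLATE) -> bool:
    """Apply the workaround in place, saving a .orig copy first.

    Returns False if there was nothing to change."""
    original = path.read_text()
    patched = chainload_grub(original)
    if patched == original:
        return False
    path.with_name(path.name + ".orig").write_text(original)
    path.write_text(patched)
    # Afterwards restart the services, e.g.:
    #   sudo systemctl restart maas-regiond maas-rackd
    return True
```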

Thanks!

Revision history for this message
Rod Smith (rodsmith) wrote :

Andres,

I've checked that, and it does *NOT* fix the problem; the system fails to boot after a deployment in exactly the same way it did before.

Revision history for this message
Jochen Wezel (jwezel) wrote :

I also face this issue with nodes running on Hyper-V 2016 and enabled Secure Boot (Microsoft UEFI cert.).

My node (with Ubuntu 17.10 deployed) shows the following warning:
---
Bootloader has not verified loaded image.
System is compromised. halting.
---
After a few seconds, the node powers off.

I'm currently using MAAS version: 2.3.0 (6434-gd354690-0ubuntu1~17.10.1)

Revision history for this message
Jeff Lane  (bladernr) wrote :

Hi Mathieu,

Any update on this? I'm also getting reports of this same issue from one of the hardware partners, who is unable to deploy nodes and perform cert testing while Secure Boot is enabled.

tags: added: blocks-hwcert-server
tags: added: id-5a28802797729aedf99dcd37
Revision history for this message
Jeff Lane  (bladernr) wrote :

Just as an update, this is still an issue with Grub in Bionic...

Revision history for this message
Jeff Lane  (bladernr) wrote :

and Xenial

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

I have provided a workaround in comment #36, has this not been applied? Landing a fix for this is going to take time, as it depends on a full roundtrip of getting shim prepared, tested, and signed by Microsoft.

Revision history for this message
Rod Smith (rodsmith) wrote :

Mathieu, the workaround of chainloading GRUB rather than shim that you suggested in comment #36 does not work; see my comment #39.

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

I'm at a loss to explain that. This works quite well in my netboot testing when I remove MAAS from the equation. You *are* meant to be able to chainload grub from another grub; and the reason why grub can't chainload shim is that you then get the wrong set of shim protocols to properly validate the next binary. This will need more testing; I will need to know what hardware this is and what exactly is the content of the grub configs.

Revision history for this message
Andres Rodriguez (andreserl) wrote :

So I've enabled Secure Boot on my Intel NUCs and have *not* used the workaround in #36, and the machines deployed just fine (that is, they PXE boot off MAAS and are told to load the shim). The same happens when using the workaround in #36.

That said, the interesting bit is that I remember testing these machines with Secure Boot enabled while they had the unsigned kernel, and they didn't deploy. With the signed kernel, they started deploying.

So I would like to test on a machine other than a NUC and see what differs.

Revision history for this message
Jeff Lane  (bladernr) wrote :

MAAS version: 2.3.0 (6434-gd354690-0ubuntu1~16.04.1)
This is my observation on a Lenovo RS140 with workaround enabled from comment #36:
Also, to be sure it's not something we've injected, I am using the default curtin_userdata, NOT our customized cert one.

1: edit: /usr/lib/python3/dist-packages/provisioningserver/templates/uefi/config.local.amd64.template
2: sudo service maas-regiond restart
3: sudo service maas-rackd restart
4: Enable Secure Boot on server
5: Re-Commission node in MAAS
5.1: re-commission successful
6: Deploy Bionic
6.1: Bionic fails. The ephemeral environment boots and deployment proceeds. On reboot, the node PXE boots, gets the boot loader from MAAS, and proceeds to boot locally. This is where it fails, with this on screen:

Booting local disk...
error: no such device: /efi/ubuntu/grubx64.efi.
error: File not found.

Press any key to continue...

Failed to boot both default and fallback entries.

Press any key to continue.

I retried this with Xenial and got the same failure to boot on the initial reboot.

This is what I have in the template per comments #36 and #38 above:
bladernr@critical-maas:/usr/lib/python3/dist-packages/provisioningserver/templates/uefi$ cat config.local.amd64.template
set default="0"
set timeout=0

menuentry 'Local' {
    echo 'Booting local disk...'
    {{if kernel_params.osystem == "windows"}}
    search --set=root --file /efi/Microsoft/Boot/bootmgfw.efi
    chainloader /efi/Microsoft/Boot/bootmgfw.efi
    {{else}}
    search --set=root --file /efi/ubuntu/grubx64.efi
    chainloader /efi/ubuntu/grubx64.efi
    {{endif}}
}
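MAAS resolves the `{{if}}`/`{{else}}` branch when it renders this template for a node. As a sketch of what a non-Windows node ends up being served once the workaround is in place, here is the same branch in plain Python. `render_local_boot` is a hypothetical helper, not MAAS code; it only reproduces the text of the template above.

```python
def render_local_boot(osystem):
    """Hypothetical helper: plain-Python version of the template's if/else branch."""
    if osystem == "windows":
        loader = "/efi/Microsoft/Boot/bootmgfw.efi"
    else:
        # Workaround from comment #36: chainload the disk's GRUB, not its shim.
        loader = "/efi/ubuntu/grubx64.efi"
    lines = [
        'set default="0"',
        "set timeout=0",
        "",
        "menuentry 'Local' {",
        "    echo 'Booting local disk...'",
        "    search --set=root --file " + loader,
        "    chainloader " + loader,
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_local_boot("ubuntu"))
```

Both `search` and `chainloader` use the same path, so if `search` cannot find the file on any device (as in the "no such device" error above), `chainloader` fails too.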

Revision history for this message
Jeff Lane  (bladernr) wrote :

Now, at this point, I'm stuck unbooted on the initial post-deployment reboot. So I reset the node by hand (poked the reset button) and disabled SecureBoot in the config and rebooted it again.

This time, the node booted, pxe booted, got the edict to boot local, and successfully booted locally.

If I do not take this step to disable secure boot during this post-deployment reboot cycle, the system fails to boot and eventually is marked as "Failed Deployment" once MAAS times out waiting for an update.

By manually intervening here, MAAS gets the proper message from the node and marks the deployment as successful (sets the node to the Deployed state).

Revision history for this message
Andres Rodriguez (andreserl) wrote :

Ok, so I've tested the workaround on a Supermicro system provided by the cert team, and this is my evaluation:

1. Without the workaround in #36, the machine fails to deploy (i.e., booting via the shim fails and the machine powers off).

2. With the workaround in #36, the machine deploys successfully.

I'm making this change in MAAS as a working workaround.

Changed in maas:
status: Invalid → Triaged
importance: Critical → High
assignee: nobody → Andres Rodriguez (andreserl)
Changed in maas:
status: Triaged → In Progress
Revision history for this message
Jeff Lane  (bladernr) wrote :

Can we please verify that with one of the original failing systems (Cisco UCS C-240 M4) as well?

That Supermicro system works, but my Lenovo fails even with the workaround (comments #48 and #49).

Unless I somehow mangled the workaround (see comment #48) and should re-try with slightly different changes in that efi template.

Revision history for this message
Rod Smith (rodsmith) wrote :

The workaround in #36 is now working for me on my home network, too. Perhaps when I tested it in December (comment #39) I had different software versions; or maybe I didn't correctly reproduce the changes in comment #36.

I did a diff on what you posted in #48, Jeff, and it exactly matches what I'm using, and what Andres put on weavile, so I don't think your result is caused by an error in your configuration file.

Revision history for this message
Steve Langasek (vorlon) wrote :

On Thu, Feb 22, 2018 at 08:45:17PM -0000, Jeff Lane wrote:
> Can we please verify that with one of the original failing systems
> (Cisco UCS C-240 M4) as well?

> Because that supermicro system works, my Lenovo fails even with the
> workaround (comments #48 and #49).

Is /efi/ubuntu/grubx64.efi on your EFI System Partition definitely the
Canonical-signed image from grub-efi-amd64-signed?

Which version of Ubuntu's grub are you booting via pxe?

If you re-enable SecureBoot and configure this system to boot directly from
local disk instead of booting pxe first and chainloading, does it boot
successfully?

Steve Langasek (vorlon)
affects: grub2 (Ubuntu) → shim (Ubuntu)
Revision history for this message
Jeff Lane  (bladernr) wrote :

> Is /efi/ubuntu/grubx64.efi on your EFI System Partition definitely the
> Canonical-signed image from grub-efi-amd64-signed?

I presume so? dpkg says it is:

ubuntu@xwing:/boot/efi/EFI/ubuntu$ dpkg -S grubx64.efi
grub-efi-amd64-signed: /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed

That's the only thing that provides the file (that I can tell).

> Which version of Ubuntu's grub are you booting via pxe?

ubuntu@xwing:/boot/efi/EFI/ubuntu$ dpkg -l |grep grub|awk '{print $2": "$3}'
grub-common: 2.02~beta2-36ubuntu3.16
grub-efi-amd64: 2.02~beta2-36ubuntu3.16
grub-efi-amd64-bin: 2.02~beta2-36ubuntu3.16
grub-efi-amd64-signed: 1.66.16+2.02~beta2-36ubuntu3.16
grub-pc: 2.02~beta2-36ubuntu3.16
grub-pc-bin: 2.02~beta2-36ubuntu3.16
grub2-common: 2.02~beta2-36ubuntu3.16

That is what is installed on the node.
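This listing also answers a narrower question: whether the signed GRUB was built from the same grub2 upload as the unsigned packages. A minimal sketch, assuming (as the listing suggests) that the signed package's version has the form `<signing revision>+<grub2 version>`. `parse_versions` and `signed_matches_unsigned` are hypothetical names.

```python
LISTING = """\
grub-common: 2.02~beta2-36ubuntu3.16
grub-efi-amd64: 2.02~beta2-36ubuntu3.16
grub-efi-amd64-bin: 2.02~beta2-36ubuntu3.16
grub-efi-amd64-signed: 1.66.16+2.02~beta2-36ubuntu3.16
grub-pc: 2.02~beta2-36ubuntu3.16
grub-pc-bin: 2.02~beta2-36ubuntu3.16
grub2-common: 2.02~beta2-36ubuntu3.16
"""


def parse_versions(listing):
    """Turn 'package: version' lines (the awk output above) into a dict."""
    versions = {}
    for line in listing.strip().splitlines():
        name, sep, version = line.partition(": ")
        if sep:
            versions[name.strip()] = version.strip()
    return versions


def signed_matches_unsigned(versions,
                            signed="grub-efi-amd64-signed",
                            unsigned="grub-efi-amd64-bin"):
    # Assumption: the signed version is "<signing revision>+<grub2 version>".
    _, sep, built_from = versions[signed].partition("+")
    return bool(sep) and built_from == versions[unsigned]


if __name__ == "__main__":
    print(signed_matches_unsigned(parse_versions(LISTING)))  # True
```

A matching version only shows the packages are consistent; it does not prove the file on the ESP is the packaged one.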

> If you re-enable SecureBoot and configure this system to boot directly from
> local disk instead of booting pxe first and chainloading, does it boot
> successfully?

So I re-enabled SecureBoot and removed all NICs from the boot order. I added in the HDD (since this is an EFI boot, the HDD is an entry called "Ubuntu" under "OTHER" in the boot order)

This fails to boot, I get an error from the system:

Error 1962: No operating system found. Boot sequence will automatically repeat.

Because I have no NICs listed in the boot order, this just churns as it keeps retrying the HDD entry.

So next, I went back and disabled SecureBoot once more. It immediately booted straight from the HDD.

I also just tried a USB install with Secure Boot enabled. I was able to install bionic from USB, but it too fails to boot with the same error.

To be fair at this point, given that this does work elsewhere, I'm suspicious that this is possibly an issue with my server.

That said, I'd like to see this verified on that Cisco C240 system as an extra data point.

Revision history for this message
Steve Langasek (vorlon) wrote :

On Thu, Feb 22, 2018 at 11:06:51PM -0000, Jeff Lane wrote:
> > Is /efi/ubuntu/grubx64.efi on your EFI System Partition definitely the
> > Canonical-signed image from grub-efi-amd64-signed?

> I presume so? dpkg says it is:

> ubuntu@xwing:/boot/efi/EFI/ubuntu$ dpkg -S grubx64.efi
> grub-efi-amd64-signed: /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed

That doesn't establish that /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed
and /boot/efi/EFI/ubuntu/grubx64.efi match. Can you please verify that they
do?
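The point here is that `dpkg -S` only reports which installed package ships a file matching that name; it says nothing about the copy on the ESP. A minimal sketch of the byte-for-byte comparison in Python. The helper names `sha256_of` and `same_file_contents` are hypothetical; the standard library's `filecmp.cmp(a, b, shallow=False)` would also do the job.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(a, b):
    """Cheap size check first, then compare SHA-256 digests."""
    a, b = Path(a), Path(b)
    if a.stat().st_size != b.stat().st_size:
        return False
    return sha256_of(a) == sha256_of(b)


# Usage on the node (reading the ESP may require root):
# same_file_contents("/boot/efi/EFI/ubuntu/grubx64.efi",
#                    "/usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed")
```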

> > Which version of Ubuntu's grub are you booting via pxe?

> ubuntu@xwing:/boot/efi/EFI/ubuntu$ dpkg -l |grep grub|awk '{print $2": "$3}'
> grub-common: 2.02~beta2-36ubuntu3.16
> grub-efi-amd64: 2.02~beta2-36ubuntu3.16
> grub-efi-amd64-bin: 2.02~beta2-36ubuntu3.16
> grub-efi-amd64-signed: 1.66.16+2.02~beta2-36ubuntu3.16
> grub-pc: 2.02~beta2-36ubuntu3.16
> grub-pc-bin: 2.02~beta2-36ubuntu3.16
> grub2-common: 2.02~beta2-36ubuntu3.16

> That is what is installed on the node.

Sorry, I was asking about the other end of this: what version of
grubnetx64.efi is being served by maas?

(But it is also good to confirm what version of grub is installed on the
node's disk.)

> So I re-enabled SecureBoot and removed all NICs from the boot order. I
> added in the HDD (since this is an EFI boot, the HDD is an entry called
> "Ubuntu" under "OTHER" in the boot order)

> This fails to boot, I get an error from the system:

> Error 1962: No operating system found. Boot sequence will automatically
> repeat.

> Because I have no NICs listed in the boot order, this just churns as it
> keeps retrying the HDD entry.

> So next, I went back and disabled SecureBoot once more. It immediately
> booted straight from the HDD.

> I also just tried a USB install with Secure Boot enabled. I was able to
> install bionic from USB, but it too fails to boot with the same error.

> To be fair at this point, given that this does work elsewhere, I'm
> suspicious that this is possibly an issue with my server.

Agreed. Something is wrong with the boot configuration of this node, which
is independent of the question of whether we have a viable workaround for
the netboot chainloading bug.

Revision history for this message
Andres Rodriguez (andreserl) wrote :

This brings up a good point. What I didn't test, and will do tomorrow, is
what happens if I kill MAAS and let the same system boot from disk. I
wonder if it will boot.


Revision history for this message
Jeff Lane  (bladernr) wrote :

On Thu, Feb 22, 2018 at 6:28 PM, Steve Langasek
<email address hidden> wrote:
> On Thu, Feb 22, 2018 at 11:06:51PM -0000, Jeff Lane wrote:
>> > Is /efi/ubuntu/grubx64.efi on your EFI System Partition definitely the
>> > Canonical-signed image from grub-efi-amd64-signed?
>
>> I presume so? dpkg says it is:
>
>> ubuntu@xwing:/boot/efi/EFI/ubuntu$ dpkg -S grubx64.efi
>> grub-efi-amd64-signed: /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed
>
> That doesn't establish that /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed
> and /boot/efi/EFI/ubuntu/grubx64.efi match. Can you please verify that they
> do?

Doh!... indeed.
ubuntu@xwing:~$ md5sum /boot/efi/EFI/ubuntu/grubx64.efi /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed
474a3900382e54c2129626683f12f3b5 /boot/efi/EFI/ubuntu/grubx64.efi
474a3900382e54c2129626683f12f3b5 /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed
ubuntu@xwing:~$ diff -s /boot/efi/EFI/ubuntu/grubx64.efi /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed
Files /boot/efi/EFI/ubuntu/grubx64.efi and /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed are identical

>> > Which version of Ubuntu's grub are you booting via pxe?
>
>> ubuntu@xwing:/boot/efi/EFI/ubuntu$ dpkg -l |grep grub|awk '{print $2": "$3}'
>> grub-common: 2.02~beta2-36ubuntu3.16
>> grub-efi-amd64: 2.02~beta2-36ubuntu3.16
>> grub-efi-amd64-bin: 2.02~beta2-36ubuntu3.16
>> grub-efi-amd64-signed: 1.66.16+2.02~beta2-36ubuntu3.16
>> grub-pc: 2.02~beta2-36ubuntu3.16
>> grub-pc-bin: 2.02~beta2-36ubuntu3.16
>> grub2-common: 2.02~beta2-36ubuntu3.16
>
>> That is what is installed on the node.
>
> Sorry, I was asking about the other end of this: what version of
> grubnetx64.efi is being served by maas?

I have no idea. Andres?

As far as I can tell, it's serving up a copy of grubx64.efi out of
/var/lib/maas/boot-resources/current

which has files dated Feb 5.

bladernr@critical-maas:/var/lib/maas/boot-resources/current/bootloader/uefi/amd64$ ll
total 2328
drwxr-xr-x 2 maas maas 4096 Feb 22 17:34 ./
drwxr-xr-x 4 maas maas 4096 Feb 22 17:34 ../
-rw-r--r-- 2 maas maas 1196736 Feb 5 07:29 bootx64.efi
-rw-r--r-- 2 maas maas 1173368 Feb 5 07:29 grubx64.efi

That all comes from maas.io.

I presume it's one of these?

http://images.maas.io/ephemeral-v3/daily/streams/v1/com.ubuntu.maas:daily:1:bootloader-download.json


Revision history for this message
Andres Rodriguez (andreserl) wrote :

On Thu, Feb 22, 2018 at 7:55 PM, Jeff Lane <email address hidden>
wrote:

> > Sorry, I was asking about the other end of this: what version of
> > grubnetx64.efi is being served by maas?
>
> I have no idea. Andres?
>
> As far as I can tell, it's serving up a copy of grubx64.efi out of
> /var/lib/maas/boot-resources/current
>
> which has files dated Feb 5.

> bladernr@critical-maas:/var/lib/maas/boot-resources/current/bootloader/uefi/amd64$ ll
> total 2328
> drwxr-xr-x 2 maas maas 4096 Feb 22 17:34 ./
> drwxr-xr-x 4 maas maas 4096 Feb 22 17:34 ../
> -rw-r--r-- 2 maas maas 1196736 Feb 5 07:29 bootx64.efi
> -rw-r--r-- 2 maas maas 1173368 Feb 5 07:29 grubx64.efi
>
> That all comes from maas.io.
>
> I presume its one of these?
>
> http://images.maas.io/ephemeral-v3/daily/streams/v1/com.ubuntu.maas:daily:1:bootloader-download.json

Whichever is the latest version in -updates at the time the streams were
created.

But yes, the latest version on the bootloader stream.


Revision history for this message
Andres Rodriguez (andreserl) wrote :

FWIW, I did a bit of extra testing. I killed MAAS's rackd (which provides PXE), rebooted the machine, and saw:

1. It attempted to PXE boot multiple times (like a lot)
2. It eventually gave up and booted from disk

So it successfully booted into the deployed OS.

I noticed that the curtin installation reported the boot order, and it seems that (1) above was caused by the following:

BootCurrent: 0006
Timeout: 1 seconds
BootOrder: 0006,0000,0004,0003,0008,0007,0005,0009,000A
Boot0000* ubuntu
Boot0003* UEFI: Intel(R) I350 Gigabit Network Connection
Boot0004* UEFI: IP4 Intel(R) I350 Gigabit Network Connection
Boot0005* UEFI: Intel(R) I350 Gigabit Network Connection
Boot0006* UEFI: IP4 Intel(R) I350 Gigabit Network Connection
Boot0007* UEFI: Intel(R) 82599 10 Gigabit Dual Port Network Connection
Boot0008* UEFI: IP4 Intel(R) 82599 10 Gigabit Dual Port Network Connection
Boot0009* UEFI: Intel(R) 82599 10 Gigabit Dual Port Network Connection
Boot000A* UEFI: IP4 Intel(R) 82599 10 Gigabit Dual Port Network Connection
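The BootOrder above puts Boot0006, an IP4 network entry, ahead of Boot0000 (ubuntu), so the firmware tries the network first on every boot. A small sketch that parses efibootmgr-style output like the listing above and reports what is tried before the `ubuntu` entry. The parser assumes the plain (non-verbose) format shown here, and `parse_efibootmgr` and `tried_before` are hypothetical names.

```python
import re

EFIBOOTMGR_OUTPUT = """\
BootCurrent: 0006
Timeout: 1 seconds
BootOrder: 0006,0000,0004,0003,0008,0007,0005,0009,000A
Boot0000* ubuntu
Boot0003* UEFI: Intel(R) I350 Gigabit Network Connection
Boot0004* UEFI: IP4 Intel(R) I350 Gigabit Network Connection
Boot0005* UEFI: Intel(R) I350 Gigabit Network Connection
Boot0006* UEFI: IP4 Intel(R) I350 Gigabit Network Connection
Boot0007* UEFI: Intel(R) 82599 10 Gigabit Dual Port Network Connection
Boot0008* UEFI: IP4 Intel(R) 82599 10 Gigabit Dual Port Network Connection
Boot0009* UEFI: Intel(R) 82599 10 Gigabit Dual Port Network Connection
Boot000A* UEFI: IP4 Intel(R) 82599 10 Gigabit Dual Port Network Connection
"""

# "BootXXXX" with four hex digits, an optional "*" (active flag), then the label.
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


def parse_efibootmgr(output):
    """Return (boot order as a list of entry numbers, {number: label})."""
    order, entries = [], {}
    for line in output.splitlines():
        if line.startswith("BootOrder:"):
            order = [n.strip().upper() for n in line.split(":", 1)[1].split(",")]
            continue
        match = ENTRY_RE.match(line)
        if match:
            entries[match.group(1).upper()] = match.group(3).strip()
    return order, entries


def tried_before(order, entries, label="ubuntu"):
    """Labels of the entries the firmware tries before the first `label` entry."""
    tried = []
    for num in order:
        name = entries.get(num, "<missing Boot%s>" % num)
        if name == label:
            break
        tried.append(name)
    return tried


if __name__ == "__main__":
    order, entries = parse_efibootmgr(EFIBOOTMGR_OUTPUT)
    print(tried_before(order, entries))
    # ['UEFI: IP4 Intel(R) I350 Gigabit Network Connection']
```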

Revision history for this message
Steve Langasek (vorlon) wrote :

On Fri, Feb 23, 2018 at 01:13:42AM -0000, Andres Rodriguez wrote:
> > bladernr@critical-maas:/var/lib/maas/boot-resources/current/bootloader/uefi/amd64$ ll
> > total 2328
> > drwxr-xr-x 2 maas maas 4096 Feb 22 17:34 ./
> > drwxr-xr-x 4 maas maas 4096 Feb 22 17:34 ../
> > -rw-r--r-- 2 maas maas 1196736 Feb 5 07:29 bootx64.efi
> > -rw-r--r-- 2 maas maas 1173368 Feb 5 07:29 grubx64.efi

> > That all comes from maas.io.

> > I presume its one of these?

> > http://images.maas.io/ephemeral-v3/daily/streams/v1/com.ubuntu.maas:daily:1:bootloader-download.json

> Whichever is the latest version in -updates at the time the streams were
> created.

> But yes, the latest version on the bootloader stream.

This matches the filesize of the grubnetx64.efi.signed from
grub2 2.02~beta2-36ubuntu3.16 - so it looks like this is up-to-date.

Revision history for this message
Rod Smith (rodsmith) wrote :

Jeff,

The Cisco C-240 M4 (boldore) that originally produced this bug seems to have been returned to OIL, so I can't test with it, at least not quickly; however, I did just run a test with feebas, a Cisco C220 M4. Using the workaround in post #36, I was able to deploy Ubuntu 16.04 and boot it with Secure Boot enabled, and I verified that SB was enabled on the deployed system.

Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
milestone: 2.3.0 → none
Michael Reed (mreed8855)
Changed in dellserver:
status: New → Fix Released
Revision history for this message
Steve Langasek (vorlon) wrote :

The SRU of shim 15+ has been rolled back from bionic-updates while we investigate this issue.

tags: added: regression-update
Revision history for this message
Steve Langasek (vorlon) wrote :

Sorry, commenting on the wrong bug - this bug is obviously older than the most recent SRU-induced problem.

tags: removed: regression-update
Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

Shim 15+ includes the fix for this chainloading trick; you should now be able to chainload from:

tftp shim -> tftp grub -> disk shim -> disk grub

That shim 15+ version is in cosmic for now, pending more investigation into the relocation bug that was identified in bionic.
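The rule stated here can be written down as a small check over a boot chain. This is a sketch only: the stage labels are hypothetical strings, and because the thread doesn't say which shim in the chain has to carry the fix, a single `shim_version` integer stands in for it. `grub_chainloads_old_shim` is a hypothetical name.

```python
def grub_chainloads_old_shim(chain, shim_version):
    """Return True if a GRUB stage hands off to a shim stage and that shim
    predates the fix that landed in shim 15.

    `chain` is a list of stage labels such as "tftp grub" or "disk shim";
    `shim_version` is one integer standing in for the shim in use (assumption).
    """
    for current, following in zip(chain, chain[1:]):
        if current.endswith("grub") and following.endswith("shim"):
            return shim_version < 15
    return False


if __name__ == "__main__":
    full_chain = ["tftp shim", "tftp grub", "disk shim", "disk grub"]
    workaround = ["tftp shim", "tftp grub", "disk grub"]  # comment #36
    print(grub_chainloads_old_shim(full_chain, 15))   # False
    print(grub_chainloads_old_shim(full_chain, 13))   # True (illustrative older version)
    print(grub_chainloads_old_shim(workaround, 13))   # False
```

The workaround chain never has GRUB loading a shim, which is why it sidesteps the problem regardless of the shim version.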

Changed in shim (Ubuntu):
status: In Progress → Fix Released
Changed in shim (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Mathieu Trudel-Lapierre (cyphermox)
Revision history for this message
Woodrow Shen (woodrow-shen) wrote :

Hi,

I know this bug has been fixed for a while, but here are my findings, which may indicate a regression:

Test environment:
MAAS version: 2.5.0 (7442-gdf68e30a5-0ubuntu1~18.04.1)

1. Dell G3 3590 laptop with secure boot enabled

Deploying 18.04 from MAAS => Got the same error as bug described.
Deploying 19.10 from MAAS => Got the same error as bug described.

2. Shuttle Inc. DH270 with secure boot enabled

Deploying 18.04 from MAAS => Got the same error as bug described.
Deploying 19.10 from MAAS => Got the same error as bug described.

The screenshot I attached shows that Secure Boot is enabled on the machine, but it still shows shim's message.

Another phenomenon: when chainloading GRUB from the local disk, the GRUB provided by MAAS looks for grubx64.efi under /efi/boot/ instead of /efi/ubuntu/, and it reports "not found" for that path because grubx64.efi doesn't actually exist under /efi/boot/. I'm not sure whether this behaviour is expected.

Any comment?
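Both this /efi/boot lookup and the earlier `error: no such device: /efi/ubuntu/grubx64.efi` come down to which path actually exists on the ESP. A minimal sketch for checking that from the deployed system, assuming the ESP is mounted at /boot/efi as in the earlier output. `find_on_esp` is a hypothetical helper. It matches each path component case-insensitively, on the assumption (consistent with the working configs in this thread, which use `/efi/ubuntu` for a directory stored as `EFI/ubuntu`) that GRUB's FAT lookups ignore case.

```python
import os


def find_on_esp(esp_root, efi_path):
    """Resolve a GRUB-style path such as /efi/ubuntu/grubx64.efi under a
    mounted ESP, matching each component case-insensitively.

    Returns the real on-disk path, or None if any component is missing.
    """
    current = esp_root
    for part in efi_path.strip("/").split("/"):
        try:
            names = os.listdir(current)
        except (FileNotFoundError, NotADirectoryError):
            return None
        match = next((n for n in names if n.lower() == part.lower()), None)
        if match is None:
            return None
        current = os.path.join(current, match)
    return current


if __name__ == "__main__":
    for path in ("/efi/ubuntu/grubx64.efi", "/efi/boot/grubx64.efi",
                 "/efi/boot/bootx64.efi"):
        print(path, "->", find_on_esp("/boot/efi", path))
```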

Rex Tsai (chihchun)
tags: added: oem-priority