Comment 8 for bug 1711203

Rod Smith (rodsmith) wrote:

Ryan, neither of the bugs you reference is a duplicate of this one; this is a new bug. Both nodes I've tested have previously been deployed successfully with Secure Boot active. In fact, brennan was deployed successfully with Secure Boot but then failed to boot from that very deployment some days or weeks later (I'm not sure how long it had been since I last booted it). That suggests to me that the Shim/GRUB the MAAS server provides for PXE booting changed during that period.

Bug #1680917 is about what happens when a node tries to boot while the MAAS server is unavailable. That problem would occur whether or not Secure Boot is active.

Bug #1687729 is closer to this new bug, but that problem does NOT affect all systems. My reading of that report is that it involved an incompatibility between our Shim and the Secure Boot implementation in some computers. The bug I'm reporting now appears to be caused by Shim refusing to boot a GRUB that doesn't verify the kernels it loads.

In some sense, if my analysis is correct, the problem is caused by Shim "tightening the screws" on Secure Boot policy. However, those changes are made for a reason (to improve security), so the solution should be to ensure that the GRUB versions MAAS and curtin deploy perform the checks Shim wants, and that the kernels we install are signed. AFAIK, we have all the required pieces in the standard Ubuntu toolset, but clearly, a deployed system does not have signed kernels. As my tests show, though, that doesn't seem to be enough: it LOOKS LIKE the GRUB that MAAS uses does not enforce Secure Boot checks on the kernels it loads. Ubuntu's GRUB behaved that way until (IIRC) 16.04, but our more recent GRUB binaries do perform such checks. As noted in my original report, though, I couldn't find the exact binary that's to blame. That casts doubt on at least some of my analysis, so take the above with a grain of salt; then again, I may simply not know where MAAS tucks away all its boot loader files, so I may have missed the file.
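One way to check the "are the installed kernels signed?" half of this on a deployed node is to look for an embedded Authenticode signature in the EFI binaries themselves. What follows is a minimal sketch, not a MAAS, curtin, or Shim tool; the function names `certificate_table` and `has_embedded_signature` are hypothetical. It assumes the file is a PE/COFF image, as EFI-stub kernels (e.g. `/boot/vmlinuz-*` on amd64) and the Shim/GRUB `.efi` binaries are, and it only reports whether a certificate table is present. It does NOT check that the signature is valid or chains to a trusted key; that is what Shim and GRUB actually enforce, and `sbverify` from sbsigntool is the usual tool for inspecting that.

```python
import struct

# Index of the certificate table (IMAGE_DIRECTORY_ENTRY_SECURITY) in the
# PE optional header's data directory array.
SECURITY_DIR_INDEX = 4


def certificate_table(data: bytes) -> tuple:
    """Return (file_offset, size) of the PE certificate table, or (0, 0) if absent."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE/COFF image (missing MZ header)")
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature.
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    opt = pe_off + 24  # 4-byte signature + 20-byte COFF file header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:    # PE32
        count_off, dirs_off = opt + 92, opt + 96
    elif magic == 0x20B:  # PE32+ (64-bit EFI binaries use this)
        count_off, dirs_off = opt + 108, opt + 112
    else:
        raise ValueError(f"unknown optional-header magic {magic:#x}")
    (num_dirs,) = struct.unpack_from("<I", data, count_off)
    if num_dirs <= SECURITY_DIR_INDEX:
        return (0, 0)
    # Each directory entry is 8 bytes: (address, size). For the certificate
    # table the "address" is a raw file offset, not an RVA.
    return struct.unpack_from("<II", data, dirs_off + 8 * SECURITY_DIR_INDEX)


def has_embedded_signature(data: bytes) -> bool:
    """True if the image carries a non-empty certificate table that fits in the file."""
    offset, size = certificate_table(data)
    return size > 0 and offset + size <= len(data)
```

Usage would be something like `with open(path, "rb") as f: print(has_embedded_signature(f.read()))` for each kernel under `/boot`. A `False` result for the deployed kernels would be consistent with the "no signed kernels" observation above; a `True` result would only mean a signature is embedded, not that Shim will accept it.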

In the logs from the MAAS server I've provided, you can ignore kzanol; that system does not support Secure Boot. Brennan is the machine I used for testing, and it exhibits the problem. (The other computer is on another MAAS server with dozens of deployed nodes, so its log files would be VERY cluttered by comparison.) I recall noticing warnings about an inability to access the efivars filesystem in the past, but AFAIK these are not correlated with any problem. In any case, this problem manifests before the Linux kernel is loaded: the failure reported in this bug is that the kernel won't load, and the node then shuts down.