merlin boards set to boot from disk after MAAS deploy
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
curtin | Expired | Low | Unassigned | | |
Bug Description
curtin 18.1-632-
We have some AMI X-Gene 2 "merlin" boards in CI using MAAS, which use UEFI (Tianocore) firmware. These systems used to work with MAAS but were out of commission (no pun intended) for a while due to a kernel bug. When that was resolved and we brought them back online, we found that MAAS was no longer working reliably. It turns out that after an initial deployment, these boards no longer PXE boot; instead they boot from the "ubuntu" boot entry left by the previous deployment.
I believe the root cause is the following: when these systems have EFI boot entries for both PXE and ubuntu, the firmware sets "BootCurrent" to the "ubuntu" entry, even if we really booted from PXE[*]. What I gleaned from LP: #1789650 is that curtin will rejuggle the boot entries so that the "BootCurrent" entry is first, followed by "ubuntu". Given this seemingly clear firmware bug, that reordering buries the PXE entry behind the on-disk entry. I'm assuming that when these systems worked before, curtin was still calling grub-install w/ --no-nvram, so no ubuntu entry was created.
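To illustrate the failure mode, here is a minimal sketch of the reordering as I understand it. This is not curtin's actual code; the function name and the entry numbers are hypothetical, and the logic only models the "BootCurrent first, then ubuntu" behavior described above.

```python
def reorder_boot_entries(boot_order, boot_current, ubuntu_entry):
    """Return a new BootOrder: BootCurrent first, then the ubuntu entry,
    then whatever else was already in the list (original order kept)."""
    head = [boot_current]
    if ubuntu_entry != boot_current:
        head.append(ubuntu_entry)
    tail = [entry for entry in boot_order if entry not in head]
    return head + tail


# Hypothetical entry numbers: 0000 = ubuntu (on disk), 0001 = PXE.
# Well-behaved firmware: we PXE booted, BootCurrent says 0001, PXE stays first.
healthy = reorder_boot_entries(["0001", "0000"], "0001", "0000")

# merlin firmware: we PXE booted, but BootCurrent claims 0000 (ubuntu).
merlin = reorder_boot_entries(["0001", "0000"], "0000", "0000")

print(healthy)  # ['0001', '0000']
print(merlin)   # ['0000', '0001']
```

In the second case the on-disk entry ends up ahead of PXE, which matches what we see after the first deployment.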
I don't believe there will ever be a firmware fix for these systems, so we'd probably need some workaround in curtin to proceed. One thought that comes to mind is to revert to the old --no-nvram parameter if we find that BootCurrent points to an on-disk entry. We'd lose the ability to boot when the MAAS server is down, but that seems like a fair trade-off.
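A rough sketch of what that check could look like, assuming we parse `efibootmgr -v` output and treat any entry whose device path contains `HD(` as on-disk. The helper names, the `HD(` heuristic, and the sample output are my assumptions for illustration, not existing curtin code or real output from these boards.

```python
import re


def boot_current_is_on_disk(efibootmgr_output):
    """Return True if BootCurrent names an entry with a hard-drive device path."""
    current = re.search(r"^BootCurrent:\s*([0-9A-Fa-f]{4})",
                        efibootmgr_output, re.MULTILINE)
    if current is None:
        # Firmware did not report BootCurrent at all.
        return False
    entry = re.search(r"^Boot{}\*?\s+(.*)$".format(current.group(1)),
                      efibootmgr_output, re.MULTILINE | re.IGNORECASE)
    # Assumption: "HD(" in the device path means the entry lives on disk.
    return entry is not None and "HD(" in entry.group(1)


def grub_install_nvram_args(efibootmgr_output):
    """Extra grub-install arguments: skip NVRAM updates for suspect firmware."""
    return ["--no-nvram"] if boot_current_is_on_disk(efibootmgr_output) else []


# Made-up sample in the style of `efibootmgr -v`; GUID and MAC are placeholders.
SAMPLE = """\
BootCurrent: 0000
BootOrder: 0001,0000
Boot0000* ubuntu\tHD(1,GPT,00000000-0000-0000-0000-000000000000,0x800,0x100000)/File(\\EFI\\ubuntu\\grubaa64.efi)
Boot0001* UEFI: PXE\tMAC(000000000000,1)/IPv4(0.0.0.0)
"""

print(grub_install_nvram_args(SAMPLE))
print(grub_install_nvram_args(SAMPLE.replace("BootCurrent: 0000",
                                             "BootCurrent: 0001")))
```

The downside of this heuristic is that it would also fire on a machine that genuinely booted from disk during deployment, so it may need to be limited to installs that are known to have been network booted.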
[*] I'm not sure if firmware is incorrectly setting BootCurrent - it could just not be setting it at all. In my testing, ubuntu is always entry "0000" and BootCurrent is always "0000".
Is there a way to revert to the old behavior (update_nvram=false) using a preseed? If so, what would that look like?