Comment 55 for bug 1789650

Ryan Harper (raharper) wrote :

> Yes, #1, #3, and #4 were successful from the perspective of forcing a PXE boot; however, putting the ubuntu entry at the end of the boot list likely breaks the reason for creating the ubuntu entry at all (namely, enabling the node to boot if the MAAS server is down), since chances are there's an entry to boot to the firmware setup utility, EFI shell, or something else ahead of it that would prevent an automated boot when the ubuntu entry is last in the list.

Falling back when MAAS is down is a good point; I hadn't thought of that.

> Given lack of a BootCurrent variable, I think that pushing the ubuntu entry to the second position is the easiest compromise.

Yes

> A more complex approach would be to move that entry beyond at least the first PXE-boot item in the Boot#### entries. This would be trickier to program and would require identifying PXE-boot items, which might not be 100% reliable. FWIW, there's code to identify PXE-boot items in the efi-pxeboot script in Checkbox. (See https://code.launchpad.net/~checkbox-dev/plainbox-provider-checkbox/+git/plainbox-provider-checkbox/+ref/master; and specifically, lines 99-100 of https://git.launchpad.net/plainbox-provider-checkbox/tree/bin/efi-pxeboot.py. The code looks for the strings "Network", "PXE", "NIC", "Ethernet", "IP4", or "IP6" in the description field. So far and AFAIK, those strings have correctly identified every network-boot option we've encountered in certification, but there's no guarantee that the next server released won't use something else.)

That seems reasonable enough. Again, this is a path that doesn't
currently boot, so we can only make it better.
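For illustration, the description check quoted above might be sketched
like this. The marker strings are the ones listed in the quoted comment;
the function name and the case-sensitive substring match are my
assumptions, not a copy of what efi-pxeboot actually does.

```python
# Marker strings as listed in the quoted comment. Matching them as
# case-sensitive substrings is an assumption made for this sketch.
NETWORK_MARKERS = ("Network", "PXE", "NIC", "Ethernet", "IP4", "IP6")


def looks_like_network_boot(description):
    """Return True if a boot entry description suggests network boot.

    Hypothetical helper: 'description' is the human-readable label of
    a Boot#### entry, e.g. as shown by efibootmgr.
    """
    return any(marker in description for marker in NETWORK_MARKERS)
```

As noted in the quote, this is only a heuristic; firmware that labels
its network entries differently would slip through.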

> Even if you did this, moving "ubuntu" after the first PXE-boot entry might not work, because the system might boot from a later one; and moving it after the last one might not work, because there might be some intervening non-functional entry (boot to firmware setup, etc.). Maybe keep moving the ubuntu entry down until you hit a non-PXE entry (or the end of the list)? I don't think there's a perfect solution without BootCurrent -- but moving it to the second entry, or beyond the first network-boot option if you think it's worth writing the extra code, would be better than what we've got now.

Right; I'm happy to iterate on this until we're working reliably on
the machines you've got that demonstrate the failure.

As for the potential boot failure of a Network/PXE entry before
getting to Ubuntu (in the no-MAAS scenario), I don't think there's
much curtin can do about that, since we've no way of knowing which
of those entries work and which do not.

I think it's reasonable to move Ubuntu after the first network
entry; the logic being that any Network entry past the first one
is not likely to be the one used to PXE boot a machine, as waiting
for more than one PXE failure before successfully booting the next
is likely a misconfiguration rather than a designed fallback
scenario.

Not sure if it's worth it, but we could make it configurable whether
curtin places only the first PXE/Net entry, or all of them, ahead of
the new entry. Then the curtin config could be tweaked on a
per-machine basis.

So, in summary, I think we have this:
If there is no BootCurrent, curtin will reorder the menu like this:

1. The first (or all) PXE/Network boot entries
2. The newly installed entry
3. The other items in the boot order that are not in [1, 2]

If we cannot find any PXE/Network entry, we'll put the newly
installed entry in the second slot keeping whatever was in the
first spot prior to the install.
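A rough sketch of that reordering, to make the proposal concrete. This
is not curtin code: the function name, the 'all_network' switch, the
data shapes (a list of boot numbers plus a dict of descriptions), and
the description-matching rule are all assumptions for illustration.

```python
# Marker strings from the Checkbox discussion above; case-sensitive
# substring matching is an assumption of this sketch.
NETWORK_MARKERS = ("Network", "PXE", "NIC", "Ethernet", "IP4", "IP6")


def reorder_without_bootcurrent(order, descriptions, new_entry,
                                all_network=False):
    """Return a new boot order for the no-BootCurrent case.

    order        -- current BootOrder as a list of entry numbers
    descriptions -- dict mapping entry number to its description
    new_entry    -- number of the newly installed entry
    all_network  -- hypothetical switch: put every network entry
                    (not just the first) ahead of the new entry
    """
    rest = [num for num in order if num != new_entry]
    network = [num for num in rest
               if any(m in descriptions.get(num, "")
                      for m in NETWORK_MARKERS)]
    if not network:
        # No PXE/Network entry found: keep whatever was first and
        # put the new entry in the second slot.
        return rest[:1] + [new_entry] + rest[1:]
    lead = network if all_network else network[:1]
    others = [num for num in rest if num not in lead]
    return lead + [new_entry] + others
```

With an order of ubuntu, EFI Shell, PXE IP4, PXE IP6, the default
would yield PXE IP4, ubuntu, EFI Shell, PXE IP6.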

WDYT?