grub2 upgrade doesn't preserve current boot order.

Bug #1714090 reported by Andres Rodriguez
This bug affects 4 people
Affects         Status     Importance  Assigned to  Milestone
grub2 (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

This is a follow-up to the discussion in #1642298; I'm filing this bug to keep a record.

On a fresh Ubuntu install, grub2 overwrites the NVRAM and updates the boot order to make Ubuntu the first boot device. This is a problem when PXE had been set as the first boot device, because grub2 overwrites that setting.

However, every time the grub2 package is upgraded, it overwrites the NVRAM and overrides the boot order again. For example, consider a system on which ubuntu is first in the boot order.

efibootmgr -v
BootCurrent: 0000
Timeout: 2 seconds
BootOrder: 0000,0001
Boot0000* ubuntu
Boot0001 PCI LAN

But the administrator changed Network to be the first in the boot order.

efibootmgr -v
BootCurrent: 0001
Timeout: 2 seconds
BootOrder: 0001,0000
Boot0001* PCI LAN
Boot0000 ubuntu

After a package upgrade, grub will overwrite the NVRAM and change this back to:

efibootmgr -v
BootCurrent: 0000
Timeout: 2 seconds
BootOrder: 0000,0001
Boot0000* ubuntu
Boot0001 PCI LAN
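To check for this regression before and after an upgrade, the relevant fields can be pulled out of the efibootmgr output. The following is a minimal Python sketch, assuming the plain layout shown above (one BootXXXX line per entry; with -v a tab separates the label from the device path). The function names are hypothetical, and other efibootmgr versions may format their output differently.

```python
import re

# Matches entry lines such as "Boot0000* ubuntu" or "Boot0001 PCI LAN".
ENTRY = re.compile(r"Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)")


def parse_efibootmgr(text):
    """Split `efibootmgr -v` output into BootCurrent, BootOrder and labels."""
    boot_current = None
    boot_order = []
    labels = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("BootCurrent:"):
            boot_current = line.split(":", 1)[1].strip()
        elif line.startswith("BootOrder:"):
            ids = line.split(":", 1)[1].split(",")
            boot_order = [i.strip() for i in ids if i.strip()]
        else:
            m = ENTRY.match(line)
            if m:
                # With -v the label is followed by a tab and the device path.
                labels[m.group(1)] = m.group(3).split("\t")[0].strip()
    return boot_current, boot_order, labels


def first_boot_label(text):
    """Label of the entry the firmware will try first, or None."""
    _, order, labels = parse_efibootmgr(text)
    return labels.get(order[0]) if order else None


before_upgrade = """BootCurrent: 0001
Timeout: 2 seconds
BootOrder: 0001,0000
Boot0001* PCI LAN
Boot0000 ubuntu"""

after_upgrade = """BootCurrent: 0000
Timeout: 2 seconds
BootOrder: 0000,0001
Boot0000* ubuntu
Boot0001 PCI LAN"""

print(first_boot_label(before_upgrade))  # PCI LAN
print(first_boot_label(after_upgrade))   # ubuntu
```

Comparing first_boot_label() on output saved before and after the upgrade is enough to flag that the administrator's choice (PCI LAN first) was overridden.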

Revision history for this message
Andres Rodriguez (andreserl) wrote :

So the scenario we are primarily concerned about is the upgrade.

Specifically, if 'ubuntu' is already in the boot order (and it is not the first entry), a grub2 upgrade shouldn't overwrite the current BootOrder (e.g. because we deliberately configured the machine to boot from the network first).

So on package upgrades, it should preserve the current boot order.

summary: - Upgrading grub2 shouldn't overwrite EFI boot order
+ grub2 upgrade doesn't preserve current boot order.
Revision history for this message
Blake Rouse (blake-rouse) wrote :

This is a comment I made on https://bugs.launchpad.net/maas/+bug/1642298; I was asked to direct it here.

I think this is more of a GRUB issue overall than a MAAS issue. True, it affects MAAS, and we can use debconf selections to work around it, but for the overall quality of Ubuntu I do not believe that is the proper fix.

I will give an example without MAAS.

1. First, the user installs Ubuntu on a partition on their local disk; EFI is updated so Ubuntu can boot.
2. Second, the user installs Windows on another partition; EFI is updated so Windows can boot, and Windows is now first.
3. The user reboots into Ubuntu and runs apt-get; grub is updated, changing the boot order so that Ubuntu now boots first.
4. The user reboots the machine and Ubuntu boots, but the user expected Windows to boot.

Overall, this is a bad experience for the user.

I think the grub code should be smart about this:

First, check whether the grub.efi loader already has an efibootmgr entry. If it does not, add it and set it to boot first. If it does, record its current position in the boot order, update the loader, and then restore the entry to its previous position in the boot order.

That change would fix this for any user that uses Ubuntu as well as MAAS users.
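The proposal amounts to a small rule for computing the new BootOrder. Below is a minimal Python sketch of that rule, not code from grub-install; the helper name is hypothetical, and it assumes the caller already knows which entry (if any) pointed at the loader before it was re-registered, and which ID the entry has afterwards (re-creating an entry with efibootmgr can yield a new BootXXXX ID).

```python
def plan_boot_order(boot_order, existing_id, new_id):
    """Proposed BootOrder after (re)registering Ubuntu's loader.

    boot_order:  BootOrder as it was before the loader was re-registered
    existing_id: ID of the entry that already pointed at the loader, or None
    new_id:      ID of the loader's entry after registration
    """
    if existing_id is None or existing_id not in boot_order:
        # Loader was not registered yet: add it and make it boot first.
        return [new_id] + [b for b in boot_order if b != new_id]
    if existing_id == new_id:
        # Entry was updated in place: keep the order exactly as it was.
        return list(boot_order)
    # Entry was recreated under a new ID: drop any stray copy of the new ID
    # and give it the old entry's slot, so its position is preserved.
    order = [b for b in boot_order if b != new_id]
    order[order.index(existing_id)] = new_id
    return order


# Windows (0002) was made first after Ubuntu (0000) was installed.
print(plan_boot_order(["0002", "0000"], "0000", "0000"))  # ['0002', '0000']
# Ubuntu's entry is recreated as 0003; it keeps 0000's second place.
print(plan_boot_order(["0002", "0000", "0001"], "0000", "0003"))
```

The old entry itself would still need to be deleted separately; the sketch only decides where the loader sits in BootOrder.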

Revision history for this message
Rod Smith (rodsmith) wrote :

I'm copying my comment #49 from bug #1642298 here:

Blake, your proposal makes sense on the surface; however, there are cases where it would cause problems. For instance, suppose that, outside of a MAAS environment, somebody installs Ubuntu, then installs Windows in a dual-boot configuration, then re-installs Ubuntu because Windows grabbed the boot process and the user couldn't figure out how to boot into Ubuntu. In this case, Ubuntu/GRUB would not then gain control of the boot process, which is what the user was hoping would happen, and (I THINK) what would happen today. Of course, re-installing Ubuntu was overkill in this scenario, and a little knowledge would go a long way to resolving the problem in an easier way; but I've seen posts on user forums from new users who do things like this. This isn't to say that your suggestion is a bad one; but implementing it would create some new problems of its own. They might be smaller than the ones we've got now, but they should be considered.

Three more points, should Blake's proposal be implemented:

First, and most importantly, the initial installation of GRUB in a MAAS environment would get it wrong, since as outlined, the proposal would give boot control to the local hard disk, which is exactly the problem we want to avoid. In such an environment, you'd need to install GRUB and ensure that it comes AFTER the PXE-boot option, otherwise the initial problem (MAAS losing control of nodes' boot process) would exist. Thus, either the GRUB package would need to take a cue from MAAS to leave the current top boot option in control (that is, install GRUB as the second or later boot option) or it would need enough smarts to figure this out itself. Given the wide variation in the way PXE-boot options appear in efibootmgr output, the former is likely to be more reliable than the latter.

Second, there's a potential implementation pitfall: There might be stale/invalid NVRAM entries that point to GRUB on non-existent devices. This could happen when MAAS redeploys a node, since the partition table will be wiped and new partitions created, but the NVRAM-based boot entries will be untouched. (Analogous things can happen in local/manual installations, too, of course.) The new EFI System Partition (ESP) will have a new GUID, which won't match the old one for the original installation. Thus, if the check for a reference to grubx64.efi doesn't include the GUID value (at a minimum; there are other identifying features, too), it might think the existing entry is valid, when in fact it's not. (Note that some, but not all, EFIs wipe invalid boot entries, so some computers might not exhibit this problem, but others will.)

Third, on systems that boot with Secure Boot active, the NVRAM entry will normally point to shimx64.efi, not grubx64.efi. In fact, this is usually the case even when Secure Boot is NOT active, or is unavailable; but with Secure Boot out of the picture, either binary should work to boot the computer.
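Points two and three both concern how an existing entry is recognized as "ours". The following is a minimal Python sketch, assuming the `efibootmgr -v` device-path form HD(partition,GPT,<partition GUID>,start,size)/File(\EFI\ubuntu\...). It matches on the partition GUID and accepts either shimx64.efi or grubx64.efi. The GUID of the current ESP is assumed to come from elsewhere (for example `lsblk -no PARTUUID` on the ESP device). The helper name and the loader set are illustrative, and real device paths vary between firmware and efibootmgr versions.

```python
import re

# Loaders an Ubuntu entry may point at (shim when Secure Boot is in use),
# compared case-insensitively because FAT paths are case-insensitive.
UBUNTU_LOADERS = {r"\efi\ubuntu\shimx64.efi", r"\efi\ubuntu\grubx64.efi"}

# A GPT hard-disk device path followed by a file path, e.g.
# HD(1,GPT,<partition GUID>,0x800,0x100000)/File(\EFI\ubuntu\shimx64.efi)
HD_FILE = re.compile(
    r"HD\(\d+,GPT,(?P<guid>[0-9A-Fa-f-]{36}),[^)]*\)/File\((?P<path>[^)]*)\)"
)


def entry_matches_esp(device_path, esp_partuuid):
    """True only if the entry names an Ubuntu loader on *this* ESP."""
    m = HD_FILE.search(device_path)
    if m is None:
        return False  # e.g. a PXE/network entry, or an unfamiliar format
    same_partition = m.group("guid").lower() == esp_partuuid.lower()
    known_loader = m.group("path").lower() in UBUNTU_LOADERS
    return same_partition and known_loader
```

A stale entry left over from a previous deployment fails the GUID comparison even though its file name still looks right, which is exactly the case where a file-name-only check would be fooled.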

Revision history for this message
Steve Langasek (vorlon) wrote :

In principle, we have all the necessary information to:

 - distinguish between a new install of the grub package and an upgrade
 - distinguish between a user-directed grub-install request and a maintainer-script-driven request
 - detect whether any of the currently configured boot options match what grub wants to create

So I don't see a design reason why we need to reorder the boot sequence as part of the grub package upgrade. If our boot entry has gone missing, ok, assume that's not intended and put it back (and put it back as the first boot option). But if the boot entry is still there, and we're obviously *running* package upgrades so we got into Ubuntu somehow, we can assume we don't need to reorder the boot entries.
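The decision described here can be written as a small predicate. This is a minimal Python sketch under stated assumptions: the three inputs are hypothetical flags the maintainer scripts would have to compute, and keeping today's behaviour (put our entry first) for fresh installs and explicit grub-install runs is an assumption; the comment above only says those cases can be distinguished.

```python
def should_reorder(fresh_install, user_requested, entry_present):
    """Decide whether grub should move its entry to the front of BootOrder.

    fresh_install:  the package is being installed, not upgraded
    user_requested: grub-install was run by an administrator, not a
                    maintainer script
    entry_present:  an NVRAM entry already matches what grub would create
    """
    if fresh_install or user_requested:
        return True   # assumed: keep current behaviour, put our entry first
    if not entry_present:
        return True   # entry went missing: recreate it as the first option
    # Upgrade with a valid entry: we booted into Ubuntu somehow, so leave
    # the existing BootOrder alone.
    return False
```

The upgrade case from the bug description (ubuntu present but not first) falls through to the final branch, so BootOrder is left untouched.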

Revision history for this message
Rod Smith (rodsmith) wrote :

Steve, there have been two problems that are feeding into this bug report, both on MAAS installations to EFI-based nodes, which must normally PXE-boot so that the MAAS server can maintain control of its nodes:

* After GRUB package updates, the GRUB package would add GRUB to the start
  of the boot order, thus removing MAAS's ability to control the node.
  (MAAS had been preventing the GRUB package from setting the boot order
  during the initial installation, but MAAS loses control of this detail
  when software is subsequently updated.) This would sometimes happen
  immediately after an installation, but other times it would take weeks or
  months before a new GRUB package would become available. Dann Frazier
  produced a patch to fix this, as noted in bug #1642298 (although I don't
  think his fix is actually linked to that bug report). Dann's fix worked
  by setting a debconf variable to prevent GRUB updates from making changes
  to the boot order.
* After Dann's fix was in place, it was noted that, because there was no
  GRUB entry in the boot order, if the MAAS server became inaccessible,
  nodes would become unbootable. I believe a bug was filed for this, but I
  don't happen to have a reference. In fixing this second bug, changes
  were made that caused a regression on bug #1642298, as noted in comments
  #21, #23, and later to that bug report.

I don't see a way to address the second issue without reordering the NVRAM-based boot order. AFAIK, efibootmgr always adds a new entry as the first item, so either you'll have no GRUB entry at all (as in Dann's initial fix to bug #1642298), which leaves the second issue unaddressed, or, when the GRUB package creates a GRUB entry as the first one in NVRAM, you'll have to reorder the boot entries afterward. Of course, leaving that second problem (nodes not booting if the MAAS server becomes inaccessible) unaddressed is another option, although perhaps not to whoever encountered it.
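The reordering needed in the second branch is mechanical: keep whatever was first before (the PXE entry on a MAAS node), then the newly created GRUB entry, then everything else. Below is a minimal Python sketch, assuming BootOrder has been read both before and after the entry was created. The helper names are hypothetical; the resulting string is in the comma-separated hex form that efibootmgr's -o/--bootorder option accepts.

```python
def demote_new_entry(order_before, order_after):
    """Keep the previously first entry (e.g. PXE) on top of BootOrder.

    order_before: BootOrder before the GRUB entry was created
    order_after:  BootOrder after creation, with the new entry placed first
    """
    new = [b for b in order_after if b not in order_before]
    old = [b for b in order_after if b in order_before]
    if not old:
        return list(order_after)  # nothing to protect: leave it as created
    # Old first entry, then the new GRUB entry/entries, then the rest.
    return old[:1] + new + old[1:]


def bootorder_arg(order):
    """Comma-separated hex IDs, the format efibootmgr -o expects."""
    return ",".join(order)


# PXE (0001) was first; efibootmgr just created GRUB as 0002 at the front.
fixed = demote_new_entry(["0001", "0000"], ["0002", "0001", "0000"])
print(bootorder_arg(fixed))  # 0001,0002,0000
```

The result would then be applied with something like `efibootmgr -o 0001,0002,0000`. The node still PXE-boots first, but a GRUB entry exists as a fallback if the MAAS server is unreachable.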

Note also that when I say "GRUB entry," the entry may actually point to shimx64.efi. I don't think I've looked into what shim's packaging does; there might or might not be a parallel problem there. Dann traced the initial problem to the GRUB package, and that's where his fix was applied.

This all "just works" on conventional BIOS-based systems because they've got a much simpler boot configuration that isn't changed from the node itself, but that can be changed by MAAS, at least on servers that use IPMI. (Those IPMI features don't work on EFI-based computers, so they aren't helpful in resolving this problem, which applies only to EFI-based computers.)

Revision history for this message
Phillip Susi (psusi) wrote :

If you are PXE booting, then why do you have grub-efi installed? If you aren't using it, remove it.

Revision history for this message
Rod Smith (rodsmith) wrote :

Phillip, after installation, MAAS configures nodes to PXE-boot, but the GRUB delivered over the network chainloads a locally installed GRUB on the hard disk. The PXE-boot part is a requirement of the MAAS environment. (If systems booted straight from the hard disk, the MAAS server could not redeploy them.) Rethinking the design so that only one GRUB is loaded (via PXE) might help with this problem, but would leave nodes unable to boot whenever the MAAS server becomes unavailable.

tags: added: id-59a726ab19e4a300b45a8a4d
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in grub2 (Ubuntu):
status: New → Confirmed