GRUB package upgrades overwrite NVRAM, causing the MAAS boot order to be overwritten.

Bug #1642298 reported by Rod Smith on 2016-11-16
This bug affects 3 people
Affects                  Importance  Assigned to
MAAS                     High        Andres Rodriguez
  2.2                    High        Unassigned
curtin                   Medium      Unassigned
grub2 (Ubuntu)           Undecided   dann frazier
  Trusty                 Undecided   Unassigned
  Xenial                 Undecided   Unassigned
  Yakkety                Undecided   Unassigned
grub2-signed (Ubuntu)    Undecided   Unassigned
  Trusty                 Undecided   Unassigned
  Xenial                 Undecided   Unassigned
  Yakkety                Undecided   Unassigned

Bug Description

[Impact]
Typically when you install Ubuntu on an EFI system, it installs a new default EFI boot entry that makes the system reboot directly into the OS. During MAAS installs, curtin is careful to disable that behavior. MAAS requires the default boot entry to remain PXE, so that it can direct the system to boot from disk or network as necessary. curtin does this by passing --no-nvram to grub-install when installing the bootloader.

*Update*: newer curtin releases actually allow the creation of a new boot entry, but update the boot menu to make PXE the default. That change is orthogonal to this bug.

***However***, this doesn't stop a new default boot entry from being added after deploy. If the user installs a grub package update or manually runs 'grub-install', booting from disk will become the default, and MAAS will lose control of the system.

[Proposed Solution (er... glorified workaround)]
The GRUB package in zesty now has support for setting the --no-nvram flag *persistently*. This is implemented via a debconf template (grub2/update_nvram). If curtin sets this flag to "false" during install, post-deploy grub updates will also pass the --no-nvram flag when running grub-install.

This isn't a perfect solution - users can still call grub-install manually and omit this flag.
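
As a rough illustration of the persistent setting described above, the preseed could be expressed as a single debconf selection line. This is a minimal sketch, not curtin's actual implementation: the helper name nvram_preseed_line is hypothetical, and the owning package name passed to it (grub-efi-amd64 below) is an assumption. Only the grub2/update_nvram template name comes from this report.

```shell
#!/bin/sh
# Print a debconf selection line that turns off NVRAM updates.
# $1: owning package name (an assumption, e.g. grub-efi-amd64)
nvram_preseed_line() {
    # debconf-set-selections format: <owner> <question> <type> <value>
    printf '%s grub2/update_nvram boolean false\n' "$1"
}

# Hypothetical usage inside the installed target (requires root):
#   nvram_preseed_line "grub-efi-$(dpkg --print-architecture)" | debconf-set-selections
```

Keeping the line generation separate from debconf-set-selections makes it possible to inspect the exact selection before it is applied.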

[Test Case]
 - MAAS deploy an EFI system.
 - After deploy, log in and run 'sudo apt --reinstall install grub-efi-$(dpkg --print-architecture)'.
 - Reboot and observe that the system does not PXE boot.

[Regression Risk]
 - The GRUB implementation does not change the defaults of the package. The user would need to opt in by setting grub2/update_nvram=false. This option is also only presented to users who specifically request a low debconf priority (e.g. expert mode installs).
 - XXX curtin risk XXX

Rod Smith (rodsmith) wrote :
Rod Smith (rodsmith) wrote :

Some more information:

I've tried with another computer on another MAAS 2.0 server. What I discovered:

- The node was already deployed and running when I started. It had the
  "ubuntu" entry set as the default in "efibootmgr" output, suggesting
  that when it was last deployed (about a month ago?), the bug existed.
- I redeployed, and it worked as expected.

It's starting to look as if either the bug may have existed for a while but been corrected recently or it's rather inconsistent in the conditions that cause it to appear.

Larry Michel (lmic) wrote :

This is impacting onboarding. We won't be able to test these servers, since this requires a manual step to work around the issue before every deployment. Our builds are automated, so they will fail to deploy.

tags: added: oil
Changed in maas:
status: New → Invalid
Ryan Harper (raharper) wrote :

Hi can you attach your curtin config and the install output?

maas 2.0
maas <session> machine get-curtin-config <system-id>

# On the node details page in the installation output section at the bottom of the page

Ryan Harper (raharper) on 2016-11-17
Changed in curtin:
importance: Undecided → Medium
status: New → Incomplete
Rod Smith (rodsmith) wrote :

Here's the output of "maas admin machine get-curtin-config node-71680a80-0be9-11e6-9e07-0023aeff4e6f" on wildorange, the one UEFI system permanently in place in certification's 1SS environment.

I've just redeployed it, and it exhibits the bug. It's never had this problem before.

Rod Smith (rodsmith) wrote :

Here's the installation output from my latest deployment of wildorange. Note that we use a custom /etc/maas/preseeds/curtin_userdata file to install the certification suite, which is rather big, so there's lots of extra software installation (and a few other things) in there.

On Thu, Nov 17, 2016 at 5:20 PM, Rod Smith <email address hidden> wrote:

> Here's the output of "maas admin machine get-curtin-config node-
> 71680a80-0be9-11e6-9e07-0023aeff4e6f" on wildorange, the one UEFI system
> permanently in place in certification's 1SS environment.
>

And do we know that this machine's set to PXE by default?

>
> I've just redeployed it, and it exhibits the bug. It's never had this
> problem before.
>

Curtin's calling grub the same way it has since Feb 2016,

 grub-install $target $efi_dir \
                --bootloader-id=ubuntu --recheck $no_nvram

target=x86_64-efi
efi_dir=--efi-directory=/boot/efi
no_nvram="--no-nvram"
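
To make the effect of the no_nvram variable concrete, here is a minimal sketch of how such a command line could be assembled. It is not curtin's helper: the function name build_grub_install_cmd and its true/false argument are hypothetical, and the --target= spelling follows the traced command that appears later in this report. The function only prints the command, so it can be inspected without touching the bootloader.

```shell
#!/bin/sh
# Print the grub-install command line for a UEFI install.
# $1: "true" to let grub-install write NVRAM; anything else adds --no-nvram.
build_grub_install_cmd() {
    target="x86_64-efi"
    efi_dir="--efi-directory=/boot/efi"
    no_nvram="--no-nvram"
    if [ "$1" = "true" ]; then
        no_nvram=""
    fi
    # ${no_nvram:+...} drops the argument entirely when the variable is empty
    echo grub-install "--target=$target" "$efi_dir" \
        --bootloader-id=ubuntu --recheck ${no_nvram:+"$no_nvram"}
}

# Example: build_grub_install_cmd false
```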

Let me see about putting up a curtin with shell execution tracing so we can
see the grub-install command running.

> ** Attachment added: "maas admin machine get-curtin-config
> node-71680a80-0be9-11e6-9e07-0023aeff4e6f"
> https://bugs.launchpad.net/maas/+bug/1642298/+attachment/
> 4778873/+files/wildorange-curtin-config.txt

I've uploaded curtin - 0.1.0~bzr432-0ubuntu1 to ppa:raharper/bugfixes which includes execution tracing during grub install, specifically dumping out efibootmgr output before and after the grub-install command. This should help see where (if any) change to the boot order is happening.

(Note: this is output from a VM install in which we explicitly request NVRAM updates.)

[ 124.997847] cloud-init[1431]: curtin uefi: installing grub-efi-amd64 to: /boot/efi
[ 125.001034] cloud-init[1431]: + echo before grub-install efiboot settings
[ 125.004153] cloud-init[1431]: before grub-install efiboot settings
[ 125.006996] cloud-init[1431]: + efibootmgr
[ 125.007626] cloud-init[1431]: Timeout: 0 seconds
[ 125.009512] cloud-init[1431]: BootOrder: 0000
[ 125.010146] cloud-init[1431]: Boot0000* UiApp
[ 125.012019] cloud-init[1431]: + dpkg-reconfigure grub-efi-amd64

<snip grub-install --help parsing output>

[ 125.604192] cloud-init[1431]: + grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=ubuntu --recheck
[ 125.606094] cloud-init[1431]: Installing for x86_64-efi platform.
[ 125.996485] cloud-init[1431]: Installation finished. No error reported.
[ 125.998933] cloud-init[1431]: + echo after grub-install efiboot settings
[ 126.000287] cloud-init[1431]: after grub-install efiboot settings
[ 126.001402] cloud-init[1431]: + efibootmgr
[ 126.003282] cloud-init[1431]: Timeout: 0 seconds
[ 126.003947] cloud-init[1431]: BootOrder: 0001,0000
[ 126.004926] cloud-init[1431]: Boot0000* UiApp
[ 126.005721] cloud-init[1431]: Boot0001* ubuntu
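
The before/after comparison in this trace can be automated by extracting the label of the first BootOrder entry from efibootmgr output. The following is a sketch only: default_boot_label is a hypothetical helper, and it assumes the output layout shown in the trace (a BootOrder: line plus BootXXXX* label lines, optionally followed by a tab and a device path).

```shell
#!/bin/sh
# Read efibootmgr output on stdin and print the label of the entry
# that comes first in BootOrder (the default boot entry).
default_boot_label() {
    awk '
        /^BootOrder:/ { split($2, order, ","); first = order[1] }
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            num = substr($1, 5, 4)                     # four hex digits after "Boot"
            label = $0
            sub(/^Boot[0-9A-Fa-f]+\*? +/, "", label)   # strip "BootXXXX* "
            sub(/\t.*/, "", label)                     # strip any device path
            labels[num] = label
        }
        END { if (first in labels) print labels[first] }
    '
}

# Hypothetical usage on a deployed node: efibootmgr | default_boot_label
```

Run against the "after" state in the trace, this reports the ubuntu entry as the default, which is the condition this bug is about.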

summary: - MAAS UEFI install sets computer to boot from hard disk
+ UEFI Xenial install sets computer to boot from hard disk (doesn't happen
+ with trusty).

I was able to recreate with Trusty.

summary: - UEFI Xenial install sets computer to boot from hard disk (doesn't happen
- with trusty).
+ UEFI Xenial install sets computer to boot from hard disk
dann frazier (dannf) wrote :

@Larry: Are you able to provide a console log using Ryan's debug curtin package?

I wonder if GRUB is getting upgraded post-install, thus causing a second grub-install command to run in the postinst. This is a known issue w/ MAAS on EFI systems - though it usually only bites when you upgrade GRUB after deploy. I haven't seen it occur *during* deploy. If this is the case, a patch like the following would allow curtin to preseed GRUB to not make changes to the EFI boot manager during upgrades.

Rod Smith (rodsmith) wrote :

I'm afraid that the problem has disappeared for me. It had been occurring quite reliably on our MAAS server and our one EFI system in 1SS, but it just stopped happening late last week. Following Larry's comment in an e-mail thread that he had no problem with a Trusty deployment, I tried that, but (probably coincidentally), the problem disappeared then and has not recurred with subsequent Xenial deployments, either. This is frustrating, of course, since we need to be able to reliably reproduce the problem to fix it.

That said, it occurred to me that there is a "nuclear option" that SHOULD eliminate the problem altogether: Pass "noefi" to the kernel as a default option, at least on EFI-based computers. This kernel option disables access to the EFI variable store, which should make it impossible for Ubuntu tools to adjust the boot order.

The down side, of course, is that this will also make it impossible to take advantage of EFI-specific features that might be desirable -- it won't be possible to check the boot order, check whether Secure Boot is active, use fwupdate to update firmware, etc. I have tested it on deployments of both an EFI-based and a BIOS-only computer (via the "Global Kernel Parameters" field at http://localhost/MAAS/settings/ on the MAAS server) and it does seem to disable efibootmgr as expected on the EFI system, while causing no obvious problems on the BIOS system. Because the problem has disappeared for me, I can't confirm that it will definitely solve the problem.
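
When testing this workaround, it helps to confirm that the option actually reached the kernel. A minimal sketch, assuming the command line is passed in as a string (on a running system it would come from /proc/cmdline); cmdline_has_noefi is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if the given kernel command line contains the standalone
# "noefi" option (and not merely a longer word that starts with it).
cmdline_has_noefi() {
    case " $1 " in
        *" noefi "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical usage on a running system:
#   cmdline_has_noefi "$(cat /proc/cmdline)" && echo "noefi is set"
```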

There are at least three ways this option might be employed:

* On ephemerals -- Unless the ephemerals used for enlistment, commissioning,
  and deployment require access to EFI variables, passing "noefi" to their
  kernels by default makes sense; this practice should prevent accidentally
  setting EFI variables inappropriately during deployment.
* As a default on installed systems -- MAAS could set this option as a
  default on all deployed systems. Because of the down sides just mentioned,
  I'm reluctant to advocate for this, but it may deserve consideration.
* On an installation-by-installation basis -- Individual administrators
  can set this option, as I did in my tests, if they run into this bug.
  This would basically be an individual workaround for this bug until
  a better resolution is reached.

One question I have is whether the kernel options set via the "Global Kernel Parameters" field in the web UI apply to the ephemerals. If they don't, and if the problem is occurring late in deployment (but before the system reboots into the installed system), then setting "noefi" in that field will not be a useful workaround for this bug.

Blake Rouse (blake-rouse) wrote :

Global parameters are always applied: during enlistment, commissioning, deployment, and disk erasing. They are also installed to the deployed OS, so booting the installed kernel gets the same parameters.

dann frazier (dannf) wrote :

@Rod: noefi would have the benefit of preventing the OS from making changes to the boot manager entries, but I agree that it is a pretty "nuclear" change since it would disable all runtime services (such as the RTC). I think we could look into some lighter weight options, e.g. somehow making the efivars subdir read-only, or hiding it under another mount. I'd really hoped there was a way we could do this from EFI itself (e.g. unsetting the NV flag for those vars) - that would have the benefit of working w/ other operating systems - but I haven't found an obvious way to do that yet.

Rod Smith (rodsmith) wrote :

I've tested, as best I can, Dann's hypothesis that a post-install GRUB update is causing the problems. Dann provided a version-bump GRUB in his PPA:

https://launchpad.net/~dannf/+archive/ubuntu/test

After deploying normally (and with no problems), I updated with that PPA, and the update caused the EFI boot variable to be set to boot from disk rather than from network. I also tried using a custom /etc/maas/preseeds/curtin_userdata file (attached), which caused the system to update GRUB as part of the installation. This also caused the EFI boot variable to be set to boot from GRUB on disk rather than PXE-booting. (Note that my custom curtin_userdata file is a minor hack of the file we use in server certification and so includes some cruft that's irrelevant to this bug report.)

Pushing forward on this hypothesis, the question becomes how to prevent GRUB's installation scripts from messing with the EFI firmware variables. This may require coordinating with GRUB's packagers or finding some other workaround.

dann frazier (dannf) wrote :

Though I think we really do need an OS-agnostic solution long-term, the short(er)-term option I like the best is to introduce the grub2/update_nvram preseed so that curtin can configure grub to not install a boot entry in a more persistent way that survives grub package upgrades. I discussed this w/ Ryan at a sprint last month, and he seemed ok with it from the curtin side. Assuming that is still the case, I'll prepare a PR for the Grub side and see if I can get that preseed option landed.

Changed in curtin:
status: Incomplete → Confirmed
Changed in grub2 (Ubuntu):
status: New → Confirmed
assignee: nobody → dann frazier (dannf)
dann frazier (dannf) wrote :

Pull request for grub2:

The following changes since commit fffdd1085e34858f21ba823105b655726db04aba:

  Drop build-dependency on libxen-dev, unnecessary now that upstream has taken a copy of the necessary public headers. (2016-11-05 15:45:15 +0000)

are available in the git repository at:

  git://git.launchpad.net/~dannf/grub2 lp1642298

for you to fetch changes up to dabbd8feacb41ec5f8fc587c4c10ae3a62266fdc:

  Add grub2/update_nvram template to allow users to disable NVRAM updates during package upgrades (LP: #1642298). (2017-01-19 15:12:59 -0700)

----------------------------------------------------------------
dann frazier (1):
      Add grub2/update_nvram template to allow users to disable NVRAM updates during package upgrades (LP: #1642298).

 debian/changelog | 5 +++++
 debian/config.in | 5 +++++
 debian/postinst.in | 13 +++++++++++--
 debian/templates.in | 10 ++++++++++
 4 files changed, 31 insertions(+), 2 deletions(-)

tags: added: patch
dann frazier (dannf) wrote :

The grub2 changes are in zesty as of 2.02~beta3-4ubuntu1. Could I ask the curtin developers to take a look at implementing the necessary changes there? Specifically, using the new grub2/update_nvram preseed instead of manually passing --no-nvram to grub-install during install.

Changed in grub2 (Ubuntu):
status: Confirmed → Fix Released
Changed in grub2 (Ubuntu Trusty):
status: New → Triaged
Changed in grub2 (Ubuntu Yakkety):
status: New → Triaged
Changed in grub2 (Ubuntu Xenial):
status: New → Triaged

On Thu, Feb 9, 2017 at 4:31 PM, dann frazier <email address hidden>
wrote:

> The grub2 changes are in zesty as of 2.02~beta3-4ubuntu1. Could I ask
> the curtin developers to take a look at implementing the necessary
> changes there? Specifically, using the new grub2/update_nvram preseed
> instead of manually passing --no-nvram to grub-install during install.
>

Can you expand on what changes are needed? Only on certain versions of
grub2 can we skip passing the flag IIF it was going to pass it?

All arches? arm-only?

dann frazier (dannf) wrote :

On Feb 9, 2017 16:11, "Ryan Harper" <email address hidden> wrote:

> On Thu, Feb 9, 2017 at 4:31 PM, dann frazier <email address hidden>
> wrote:
>
> > The grub2 changes are in zesty as of 2.02~beta3-4ubuntu1. Could I ask
> > the curtin developers to take a look at implementing the necessary
> > changes there? Specifically, using the new grub2/update_nvram preseed
> > instead of manually passing --no-nvram to grub-install during install.
> >
>
> Can you expand on what changes are needed? Only on certain versions of
> grub2 can we skip passing the flag IIF it was going to pass it?

Good questions.

My thought is, do the preseed all the time. It won't change the behavior of
older GRUB packages, but won't hurt either. It might make sense to also
continue to do the manual grub-install --no-nvram every time. That will
also be safe with any version of GRUB.

> All arches? arm-only?

It only has an effect on EFI and power systems. I know we want it on all
EFI (x86 or ARM). I don't know enough about power to say for sure, but
seems like it would also benefit. Would need to find a power/MAAS person to
confirm.

   -dann


Ryan Harper (raharper) wrote :

On Thu, Feb 9, 2017 at 9:23 PM, dann frazier <email address hidden>
wrote:

> On Feb 9, 2017 16:11, "Ryan Harper" <email address hidden> wrote:
>
> On Thu, Feb 9, 2017 at 4:31 PM, dann frazier <email address hidden>
> wrote:
>
> > The grub2 changes are in zesty as of 2.02~beta3-4ubuntu1. Could I ask
> > the curtin developers to take a look at implementing the necessary
> > changes there? Specifically, using the new grub2/update_nvram preseed
> > instead of manually passing --no-nvram to grub-install during install.
> >
>
> Can you expand on what changes are needed? Only on certain versions of
> grub2 can we skip passing the flag IIF it was going to pass it?
>
>
> Good questions.
>
> My thought is, do the preseed all the time. It won't change the behavior of
>

What does this preseed look like?

> older GRUB packages, but won't hurt either. It might make sense to also
>

I'd like to avoid doing a dpkg-reconfigure unless it's going to do something;
otherwise we're just wasting install time for platforms and versions that
don't need it.

> continue to do the manual grub-install --no-nvram every time. That will
> also be safe with any version of GRUB.
>

Yes.

>
> All arches? arm-only?
>
>
> It only has an effect on EFI and power systems. I know we want it on all
> EFI (x86 or ARM). I don't know enough about power to say for sure, but
> seems like it would also benefit. Would need to find a power/MAAS person to
> confirm.
>

Pointer to the change of behavior? That might help explain.

We can help review and guide an MP with changes to curtin but without
one of the systems (or a way to recreate the scenario) it's not easy for
us to understand the changes needed to curtin.

Happy to discuss on IRC as well, #curtin on freenode.

> -dann
>

dann frazier (dannf) on 2017-02-10
description: updated
description: updated

I noticed that a GRUB SRU is about to hit -updates that will trigger this bug on existing systems. Though too late to avoid that, I did want to recheck on the status of this bug on the curtin side.

I think this is related and should address things.

http://bazaar.launchpad.net/~curtin-dev/curtin/trunk/revision/503

This curtin is available in the daily PPA, and is in process for SRU'ing

https://launchpad.net/~curtin-dev/+archive/ubuntu/daily

If that's not applicable, I'm not aware of any curtin changes to address
the issue.

On Wed, Jun 14, 2017 at 2:13 PM, dann frazier <email address hidden>
wrote:

> I noticed that a GRUB SRU is about to hit -updates that will trigger
> this bug on existing systems. Though too late to avoid that, I did want
> to recheck on the status of this bug on the curtin side.

I'm seeing a regression on this bug, as of 14 June 2017. If it's curtin-common on the MAAS server that's relevant, here's the version information:

$ dpkg -s curtin-common | grep Version
Version: 0.1.0~bzr470-0ubuntu1~16.04.1

The suggestion of creating the GRUB/Ubuntu entry but putting it lower in the boot list than the PXE-boot option sounds fine to me, with the caveat that some EFIs do pretty weird (buggy) things, so I suspect that SOME systems will break. There's no way of knowing how common such problems will be, or indeed if any really WILL break, without trying that suggested fix.

Cool, yeah - that looks like a more complete solution.

On Wed, Jun 14, 2017 at 2:35 PM, Ryan Harper <email address hidden> wrote:
> I think this is related and should address things.
>
> http://bazaar.launchpad.net/~curtin-dev/curtin/trunk/revision/503
>
> This curtin is available in the daily PPA, and is in process for SRU'ing
>
> https://launchpad.net/~curtin-dev/+archive/ubuntu/daily
>
>
> If that's not applicable, I'm not aware of any curtin changes to address
> the issue.
>
>
> On Wed, Jun 14, 2017 at 2:13 PM, dann frazier <email address hidden>
> wrote:
>
>> I noticed that a GRUB SRU is about to hit -updates that will trigger
>> this bug on existing systems. Though too late to avoid that, I did want
>> to recheck on the status of this bug on the curtin side.

dann frazier (dannf) wrote :

On Wed, Jun 14, 2017 at 3:13 PM, dann frazier
<email address hidden> wrote:
> Cool, yeah - that looks like a more complete solution.

I've tested the latest curtin daily, and found that it does not
address this problem.
While http://bazaar.launchpad.net/~curtin-dev/curtin/trunk/revision/503
seems like a good *install-time* improvement, it does not prevent a
subsequent GRUB package upgrade from making Ubuntu the default boot
entry.

We still need a runtime solution, such as having curtin set
grub2/update_nvram to False.

dann frazier (dannf) on 2017-06-30
description: updated

I think having MAAS tell curtin to set the grub2/update_nvram to False, works fine. Since the path to the EFI loader should not change this is acceptable.

Having grub adjust the EFI vars because of an upgrade is really bad in MAAS case because a following re-deploy will no longer work.

Phillip Susi (psusi) wrote :

If you are booting from PXE, then why install grub-efi at all?

David Britton (davidpbritton) wrote :

Some Testing Scenarios and concerns before we agree that this change should be made in curtin:

1) After this change, after curtin boots first time, is EFI boot order a)pxe b)local disk

2) This doesn't appear scoped to grub2-efi, but modifies grub2/update_nvram, what effect does this have on other bios/firmware environments that rely on grub?

3) Does a new kernel install work correctly?

On Fri, Jul 7, 2017 at 2:44 PM, David Britton
<email address hidden> wrote:
> Some Testing Scenarios and concerns before we agree that this change
> should be made in curtin:

I'm not sure if you're asking about the design, or test results on a
given implementation, so I'll answer with respect to the design I've
proposed.

> 1) After this change, after curtin boots first time, is EFI boot order
> a)pxe b)local disk

This proposal does not involve changing the install-time EFI boot order at all.

As I understand it, curtin now installs a new boot entry for the
target OS - that is, it no longer passes --no-nvram to grub-install.
But, prior to reboot, it restores the previous default boot entry
(PXE). This proposal would not change this. grub2/update_nvram does
not change the behavior of grub-install called directly - just when
called by the grub maintainer scripts. Of course, this should be
verified when we have a testable implementation.

> 2) This doesn't appear scoped to grub2-efi, but modifies
> grub2/update_nvram, what effect does this have on other bios/firmware
> environments that rely on grub?

The impacted archs are described here:
https://bugs.launchpad.net/maas/+bug/1642298/comments/19

In short, EFI-based and POWER systems will be impacted. AFAICT, this
change is also beneficial for POWER, for the same reasons, and it
should certainly be verified there before merging.

> 3) Does a new kernel install work correctly?

The value of grub2/update_nvram is orthogonal to a new kernel install.
A new kernel install does not trigger a call to grub-install, it just
updates the configuration of the existing grub installation
(update-grub).

  -dann

dann,

I think you are correct that 'grub2/update_nvram' is still not set by curtin and an upgrade of grub could break the machine.

On Fri, Jun 30, 2017 at 3:06 PM, Blake Rouse <email address hidden>
wrote:

> I think having MAAS tell curtin to set the grub2/update_nvram to False,
> works fine. Since the path to the EFI loader should not change this is
> acceptable.
>
> Having grub adjust the EFI vars because of an upgrade is really bad in
> MAAS case because a following re-deploy will no longer work.
>

We currently have a config key:

grub:
  update_nvram: <boolean: default False>

Which tells curtin whether or not to pass the --no-nvram flag to
grub-install

We could also set the grub2/update_nvram debconf value to match this config.
Will that achieve the desired goal?
Is there a use-case for a separate value used during grub-install than from
what one would put in the debconf value?

dann frazier (dannf) wrote :

On Mon, Jul 31, 2017 at 12:07 PM, Ryan Harper
<email address hidden> wrote:
> On Fri, Jun 30, 2017 at 3:06 PM, Blake Rouse <email address hidden>
> wrote:
>
>> I think having MAAS tell curtin to set the grub2/update_nvram to False,
>> works fine. Since the path to the EFI loader should not change this is
>> acceptable.
>>
>> Having grub adjust the EFI vars because of an upgrade is really bad in
>> MAAS case because a following re-deploy will no longer work.
>>
>
> We currently have a config key:
>
> grub:
> update_nvram: <boolean: default False>
>
> Which tells curtin whether or not to pass the --no-nvram flag to
> grub-install
>
> We could also set the grub2/update_nvram debconf value to match this config.
> Will that achieve the desired goal?

If grub2/update_nvram is always false, then yes. But if it were set to
true, that would regress this issue.

> Is there a use-case for a separate value used during grub-install than from
> what one would put in the debconf value?

AIUI, curtin now *does* update NVRAM at install, but then reorders the
boot entries after the fact to keep PXE booting as the default. I'm
not sure if this is related to the config key or not. Even so, we
would still want update_nvram to be set to false after install, to
avoid grub updates from overriding the PXE boot default.

  -dann

Ryan Harper (raharper) wrote :

On Tue, Aug 1, 2017 at 2:39 PM, dann frazier <email address hidden>
wrote:

> On Mon, Jul 31, 2017 at 12:07 PM, Ryan Harper
> <email address hidden> wrote:
> > On Fri, Jun 30, 2017 at 3:06 PM, Blake Rouse <email address hidden>
> > wrote:
> >
> >> I think having MAAS tell curtin to set the grub2/update_nvram to False,
> >> works fine. Since the path to the EFI loader should not change this is
> >> acceptable.
> >>
> >> Having grub adjust the EFI vars because of an upgrade is really bad in
> >> MAAS case because a following re-deploy will no longer work.
> >>
> >
> > We currently have a config key:
> >
> > grub:
> > update_nvram: <boolean: default False>
> >
> > Which tells curtin whether or not to pass the --no-nvram flag to
> > grub-install
> >
> > We could also set the grub2/update_nvram debconf value to match this
> > config.
> > Will that achieve the desired goal?
>
> If grub2/update_nvram is always false, then yes. But if it were set to
> true, that would regress this issue.
>

OK, curtin defaults to false, and MAAS is what drives that setting. I
don't know if/when they set that value to True.

It sounds like grub2 should always have that value when it's installed,
then? Is that the case with new installs?
Are we just attempting to plug upgrades of images which have an older grub2?

>
> > Is there a use-case for a separate value used during grub-install than
> > from what one would put in the debconf value?
>
> AIUI, curtin now *does* update NVRAM at install, but then reorders the
>

Only if MAAS sets update_nvram to true; otherwise we pass the --no-nvram flag.

> boot entries after the fact to keep PXE booting as the default. I'm
> not sure if this is related to the config key or not. Even so, we
> would still want update_nvram to be set to false after install, to
> avoid grub updates from overriding the PXE boot default.
>

The reordering of the UEFI menu is not related to the update_nvram flag;
that always happens on UEFI installs now, primarily to allow for booting
nodes when MAAS is offline (no PXE).

>
> -dann
>
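
The relationship described above between the curtin update_nvram key and the --no-nvram flag can be sketched roughly as follows. This is a hypothetical illustration, not curtin's actual code: the function name grub_install_args and the target parameter are invented for the example, and the default of False follows the description quoted in this exchange. Only the --no-nvram flag itself comes from the thread.

```python
def grub_install_args(update_nvram=False, target="x86_64-efi"):
    """Build a grub-install argument list (hypothetical helper).

    update_nvram mirrors the curtin config key discussed in this thread.
    """
    args = ["grub-install", "--target=" + target]
    if not update_nvram:
        # Leave the firmware's boot entries (NVRAM) untouched.
        args.append("--no-nvram")
    return args


if __name__ == "__main__":
    print(" ".join(grub_install_args(update_nvram=False)))
    print(" ".join(grub_install_args(update_nvram=True)))
```

Note that this only covers the install-time call; it says nothing about what a later grub package upgrade does, which is the gap this bug is about.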

update_nvram is now the default for curtin, and re-ordering is also the default for UEFI installs. I don't think setting grub2/update_nvram in debconf should be a default. To me, MAAS should tell curtin to set that debconf value, as that is overriding a package default.

On Wed, Aug 2, 2017 at 8:12 AM, Blake Rouse <email address hidden>
wrote:

> update_nvram is now default for curtin and re-ordering is also default
>

Does MAAS send update_nvram as True or False by default?

> for UEFI installs. I don't think setting grub2/update_nvram in debconf
> should be a default. To me MAAS should tell curtin to set that debconf
> as that is overriding a package default.
>

OK, do we want to re-use the update_nvram curtin config to *ALSO* set the
debconf value? Or do we need to be able to send/set those values
independently?


Blake Rouse (blake-rouse) wrote :

On Wed, Aug 2, 2017 at 9:49 AM, Ryan Harper <email address hidden>
wrote:

> On Wed, Aug 2, 2017 at 8:12 AM, Blake Rouse <email address hidden>
> wrote:
>
> > update_nvram is now default for curtin and re-ordering is also default
> >
>
> MAAS send update_nvram as True ? or False by default?
>

MAAS sends nothing. The default is True, so MAAS uses the default.

>
>
> > for UEFI installs. I don't think setting grub2/update_nvram in debconf
> > should be a default. To me MAAS should tell curtin to set that debconf
> > as that is overriding a package default.
> >
>
> OK, do we want to re-use the update_nvram curtin config to *ALSO* set the
> debconf value? Or do we need to be able to send/set those values
> independently?
>

You could re-use it, but that would mean that by default curtin would always
set that for an installed system. Having curtin update the NVRAM on install
by default makes sense, even when using subiquity. But overriding the
packaging of grub on the installed system is more of a MAAS-specific case to
me, so I would think it should be its own setting, one that MAAS must
explicitly set to True.


Targeting for MAAS 2.3.0 since it sounds like MAAS changes are required.

Changed in maas:
status: Invalid → Triaged
importance: Undecided → High
milestone: none → 2.3.0
Andres Rodriguez (andreserl) wrote :

Hey guys.

Sorry for joining the party late. But let me see if I understand the full extent of the problem here.

0. A machine PXE boots off MAAS, installs Ubuntu in it, grub gets installed and configured without updating the NVRAM.

1. The machine reboots, PXE's, boots from local disk, and finishes installation (an unnecessary step).

2. An upgrade of <grub2 package> overwrites NVRAM to boot from disk.

3. A reboot boots from disk directly and not from network.

4. If we re-deploy, the machine is told to PXE boot via the BMC.

5. If we re-deploy the machine, the machine PXE boots off MAAS again and installs. The machine reboots and boots off the disk.

Now, the above only becomes a real problem when the machine does not support changing the boot order remotely via IPMI or another power management mechanism.

 - That, however, is a certification problem. MAAS expects to be able to control the boot order of the machine. If this is not possible, it is a certification challenge.

That said, regardless of being able to set the boot order or not, I see a completely different problem:

1. The administrator configured their BIOS with a specific boot order.
2. The administrator installs Ubuntu, expecting their BIOS boot order configuration to remain (somewhat) unchanged (especially when it is set to boot from network first).
3. The software (grub2) overrides the BIOS' boot order to boot from disk. This overrides the fact that the administrator set the BIOS boot order to PXE first.

So, based on this, the REAL bug that needs to be fixed is that grub needs to somewhat respect the boot order set in the BIOS. For example:

1. If BIOS Boot order before Ubuntu install is: 1. PXE, 2. HDD1, 3. HDD2
2. And we install Ubuntu on HDD2, then grub needs to /somewhat/ preserve the boot order and make it so: 1. PXE, 2. HDD2, 3. HDD1.

To conclude:
1. The real bug is in grub, and this needs to be addressed to properly preserve the boot order in a sane way if PXE/Network boot is set as the first boot type on the BIOS boot order.

2. If we want to make workarounds in the meantime, I think it is acceptable to say that curtin should tell grub to disable NVRAM updating by default, and only enable it if the user wishes so, so we can respect what was set in the firmware.

Either way, IMHO, the real solution is to fix grub.
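
The PXE / HDD1 / HDD2 example above can be written down as a small sketch. It models the boot order as a plain list of labels and assumes the first entry is the one the administrator wants kept first (PXE in the example). The function name preserve_first_entry is hypothetical; this is not how grub or efibootmgr is implemented.

```python
def preserve_first_entry(boot_order, installed):
    """Keep the current first boot entry first and place `installed` right after it.

    boot_order: list of entry labels, highest priority first.
    installed:  label of the device the OS was just installed on.
    """
    if not boot_order:
        return [installed]
    first = boot_order[0]
    # Everything after the first entry, minus the installed device itself.
    rest = [entry for entry in boot_order[1:] if entry != installed]
    if first == installed:
        return [first] + rest
    return [first, installed] + rest


if __name__ == "__main__":
    print(preserve_first_entry(["PXE", "HDD1", "HDD2"], "HDD2"))
```

A real implementation would also have to decide when the first entry really is a network boot entry, which this sketch simply assumes.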

Changed in maas:
status: Triaged → Incomplete
importance: High → Undecided
Rod Smith (rodsmith) wrote :

Andres, fundamentally the problem is that EFI provides no means to control the boot order via IPMI; it's set via non-IPMI firmware mechanisms or via the efibootmgr tool in Ubuntu. That is, point #4 in your first list is incorrect, and everything falls apart after that. When a GRUB package uses efibootmgr to set GRUB as the default boot option, the configuration to PXE-boot first is overridden and MAAS loses control of the computer's boot process. Although I, the original reporter, am on the server certification team, this isn't a certification issue per se; it affects ANYBODY who uses MAAS to regularly redeploy nodes. Larry Michel, then on the OIL team, noted that it affects OIL in comment #3. Sooner or later, customers will be affected by this, too.

You can view this as a problem with EFI and/or how IPMI interacts with EFI if you like, but that won't resolve the problem. Like it or not, the industry is moving to EFI, and my understanding is that the big players in this realm aren't interested in providing a way for IPMI to set the boot order on EFI-based systems. Even if they did, at this late date there'd still be a large installed base of computers that wouldn't work with any mechanism that might be developed.

What we (Canonical) CAN control is how and when we use efibootmgr to adjust the boot order. This is PARTLY a GRUB packaging issue, but we need some way to tell that package whether or not to set itself as the default boot option. Blake, Ryan, and Dann are discussing ways to use debconf variables to tell the GRUB package what to do, and if I'm understanding correctly, MAAS will have to set such a variable in a preseed file to produce results that keep nodes controllable by MAAS.

Of course, once a node is deployed, its owner could log in and mess things up. There's not much we can do about that, but at least we can ensure that our own tools and procedures don't mess things up.

Andres Rodriguez (andreserl) wrote :

Hi Rod,

Judging by your message in #2, unless I'm missing something, you would not have been able to re-deploy the machine because efibootmgr had set the boot order to the disk first. When you re-deployed the machine, MAAS told that machine that "on the next boot, PXE boot".
"
- The node was already deployed and running when I started. It had the
  "ubuntu" entry set as the default in "efibootmgr" output, suggesting
  that when it was last deployed (about a month ago?), the bug existed.
- I redeployed, and it worked as expected.
"

As far as IPMI, you do have a way to ensure that the next boot, boots from PXE. Example EFI boot config for freeipmi-tools:

Section Chassis_Boot_Flags
 Boot_Flags_Persistent Yes
 BIOS_Boot_Type EFI
 Boot_Device PXE
EndSection

And exactly, we as Canonical can control how and when we use efibootmgr to adjust the boot order. The fact that we are, by default, completely eliminating the boot order that has been set on the BIOS by the administrator, is IMHO, the wrong approach. If the administrator has, by default, set PXE as the first in the boot order, then grub should not be overriding such configuration. There is a reason why the administrator has set the boot order to PXE first and Ubuntu shouldn't be changing that.

While I understand you guys have been discussing the solution, whatever the solution may be, that doesn't change the fact that both Blake and I agree that grub shouldn't be changing the boot order by default. The fact that it is currently doing so, overriding BIOS settings, and that we want to work around it, is a completely different matter. Also, it doesn't mean that curtin shouldn't be setting this by default. IMHO, as I already expressed, curtin should tell grub not to update the NVRAM, so that curtin maintains what it does today.

Today, after installation, curtin uses efibootmgr *AND* ensures that PXE is first on the boot order, and the disk is second as a fallback. As such, curtin needs to ensure, by default that grub does not overwrite the efibootmgr configuration that curtin already made.

On Wed, Aug 30, 2017 at 1:44 PM, Andres Rodriguez
<email address hidden> wrote:
> Hi Rod,
>
> Judging by your message in #2, unless I'm missing something, you would have not been able to re-deploy the machine because efibootmgr had set the boot order to the disk first. When you re-deployed the machine, MAAS told that machine that "on the next boot, PXE boot".
> "
> - The node was already deployed and running when I started. It had the
> "ubuntu" entry set as the default in "efibootmgr" output, suggesting
> that when it was last deployed (about a month ago?), the bug existed.
> - I redeployed, and it worked as expected.
> "
>
> As far as IPMI, you do have a way to ensure that the next boot, boots
> from PXE. Example EFI boot config for freeipmi-tools:
>
> Section Chassis_Boot_Flags
> Boot_Flags_Persistent Yes
> BIOS_Boot_Type EFI
> Boot_Device PXE
> EndSection

hey Andres,

The BIOS_Boot_Type and Boot_Device fields are mutually exclusive. If a
system supports both EFI and Legacy mode, BIOS_Boot_Type allows you to
choose between them. Boot_Device only works when booting in Legacy
mode - See section 28.13 of the IPMI Spec. It has no effect in EFI
mode for any implementation I've seen.

 - dann

In ipmitool 1.8.17+

$ ipmitool -I lanplus -U xxx -P xxx -H X.X.X.X chassis bootdev pxe options=persistent,efiboot
Set Boot Device to pxe

$ ipmitool -I lanplus -U xxx -P xxx -H X.X.X.X chassis bootparam get 5
Boot parameter version: 1
Boot parameter 5 is valid/unlocked
Boot parameter data: a004020000
 Boot Flags :
   - Boot Flag Valid
   - Options apply to only next boot
   - BIOS EFI boot
   - Boot Device Selector : Force PXE
   - Console Redirection control : System Default
   - Lock Out Sleep Button
   - BIOS verbosity : Request console redirection be enabled
   - BIOS Mux Control Override : BIOS uses recommended setting of the mux at the end of POST

David Britton (davidpbritton) wrote :

14:34 rharper │ http://curtin.readthedocs.io/en/latest/topics/config.html#debconf-selections
14:34 rharper │ we already have a generic debconf-selections in curtin, so I believe maas could just send the grub value needed to
                  │ control the grub2 behavior
14:35 dpb │ rharper: ooooh, beautiful
14:35 roaksoax │ rharper: cool

From IRC above. I believe with this, the MAAS task should be considered as new.

summary: - UEFI Xenial install sets computer to boot from hard disk
+ Grub upgrades overwrites NVRAM cuasing MAAS/curtin boot order to be
+ overwriten
Changed in maas:
status: Incomplete → New

The workaround is to pass a debconf configuration option to the grub2 package so that it does not update the NVRAM on package upgrade.
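
A minimal sketch of what such a configuration might look like, built as a Python dict in the shape of curtin's debconf_selections config key mentioned on IRC above. The helper name grub_nvram_selection, the key name "grub2", and the owner field "debconf" are assumptions made for this example (the owner mirrors the debconf-set-selections command used in the SRU verification later in this report); this is not the actual MAAS change.

```python
def grub_nvram_selection(update_nvram, owner="debconf"):
    """Return one debconf-set-selections line for grub2/update_nvram.

    Hypothetical helper; `owner` is the first field of the selection line.
    """
    value = "true" if update_nvram else "false"
    return "%s grub2/update_nvram boolean %s" % (owner, value)


# Fragment of a curtin config, expressed as a dict. The inner key name
# ("grub2") is arbitrary; each value is a block of selection lines.
curtin_config = {
    "debconf_selections": {
        "grub2": grub_nvram_selection(False) + "\n",
    },
}

if __name__ == "__main__":
    print(curtin_config["debconf_selections"]["grub2"], end="")
```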

Changed in maas:
assignee: nobody → Andres Rodriguez (andreserl)
status: New → Confirmed
importance: Undecided → High
summary: - Grub upgrades overwrites NVRAM cuasing MAAS/curtin boot order to be
- overwriten
+ Grub package upgrades overwrites NVRAM, causing MAAS boot order to be
+ overwritten.
Rod Smith (rodsmith) wrote :

Andres, it's perfectly possible to redeploy a system after it's been set to boot from the local disk by either adjusting the boot order on the node or by deleting GRUB from the local disk. I don't recall which of those I did in the procedure noted in my comment #2, but it was likely one of those things; I simply didn't mention it; probably I thought it was obvious that I was working around the problem.

I'm not familiar with freeipmi-tools; however, I've just tested the following with ipmitool, and neither forced a server to PXE-boot:

ipmitool -H 10.20.30.13 -I lanplus -U user -P pass chassis bootparam set force_pxe
ipmitool -H 10.20.30.13 -I lanplus -U pass -P pass chassis bootdev pxe

Both times, ipmitool reported that the node was set to PXE-boot, but in both cases, the boot order reported by efibootmgr did not change, and when I rebooted, the node booted from the first item on that list (that is, not PXE-booting). I verified the boot order both by watching the boot process on the terminal and noting no PXE-boot attempt and by verifying via efibootmgr, once the system was up, that the BootCurrent variable was set to the "ubuntu" entry, not to a PXE-boot entry. I did this testing on betelnut, a Dell PowerEdge T110 in Lexington.

As to what GRUB should be doing on installation, that's different for MAAS vs. a non-network install (say, your laptop). EFI is explicitly designed to support multiple boot loaders on a disk, so an OS that installs in a non-PXE-boot way *MUST* register itself with the firmware, and in practice, OSes normally set their boot loaders as the default. Thus, changes to the boot order on OS installation are normal and expected behavior in the EFI world. MAAS is the special case here; if a MAAS-deployed OS should continue to boot with the help of the MAAS server, then the OS should either not register itself or register itself but ensure that the PXE-boot option(s) come earlier in the boot list than the local boot loader.

Note that, back in February, Dann Frazier submitted a change to GRUB and/or curtin that fixed this bug by causing the behavior you describe as optimal -- that is, the GRUB package no longer touched the EFI boot order when it was installed from the network, and it kept a debconf variable to ensure that future GRUB updates didn't cause problems (those updates, really, are the problem). (I don't see Dann's changes linked from this bug report, but maybe I'm missing something.) That, however, caused nodes to fail to boot if the MAAS server became inaccessible, since then there was no fallback to boot from GRUB on the hard disk. I believe somebody filed a bug on that, but I don't have a reference. Dann noted in comment #21 to this bug report that a GRUB SRU was imminent that would likely cause a regression on THIS bug, and in comment #23, I confirmed that this was the case.

Rod Smith (rodsmith) wrote :

Ooh, lots of activity while I typed my last comment.

This may be beating a dead horse, but concerning comment #42, Andres, I tried that command and got similar output:

$ ipmitool -H 10.20.30.13 -I lanplus -U user -P pass chassis bootdev pxe options=persistent,efiboot
Set Boot Device to pxe
$ ipmitool -H 10.20.30.13 -I lanplus -U user -P pass chassis bootparam get 5
Boot parameter version: 1
Boot parameter 5 is valid/unlocked
Boot parameter data: a004000000
 Boot Flags :
   - Boot Flag Valid
   - Options apply to only next boot
   - BIOS EFI boot
   - Boot Device Selector : Force PXE
   - Console Redirection control : System Default
   - BIOS verbosity : Console redirection occurs per BIOS configuration setting (default)
   - BIOS Mux Control Override : BIOS uses recommended setting of the mux at the end of POST

On the node, efibootmgr showed an unchanged boot order, and it continued to boot from the hard disk without PXE-booting. What's more, on reboot I noticed that ipmitool claimed the system would boot in legacy mode:

$ ipmitool -H 10.20.30.13 -I lanplus -U user -P pass chassis bootparam get 5
Boot parameter version: 1
Boot parameter 5 is valid/unlocked
Boot parameter data: 0000000000
 Boot Flags :
   - Boot Flag Invalid
   - Options apply to only next boot
   - BIOS PC Compatible (legacy) boot
   - Boot Device Selector : No override
   - Console Redirection control : System Default
   - BIOS verbosity : Console redirection occurs per BIOS configuration setting (default)
   - BIOS Mux Control Override : BIOS uses recommended setting of the mux at the end of POST

(Note the "BIOS PC Compatible (legacy) boot" line.) On reboot, the system booted in EFI mode. I see similar things on other nodes that are configured to boot in EFI mode. Thus, you really can't trust what IPMI tells you (or what you tell it) about boot mode or boot device on an EFI-based computer.
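
For readers comparing the "Boot parameter data" strings above, the first two bytes can be decoded by hand. The sketch below assumes the boot flags layout of IPMI boot parameter 5 as I understand it (byte 1: bit 7 valid, bit 6 persistent, bit 5 EFI boot type; byte 2: bits 5..2 boot device selector); treat that layout, the partial device table, and the function name decode_boot_flags as assumptions to check against the IPMI specification.

```python
# Partial table of boot device selector values (assumed from the IPMI spec).
BOOT_DEVICES = {
    0b0000: "No override",
    0b0001: "Force PXE",
    0b0010: "Force boot from default hard drive",
    0b0101: "Force boot from default CD/DVD",
    0b0110: "Force boot into BIOS setup",
}


def decode_boot_flags(hex_data):
    """Decode the first two bytes of a boot flags data string such as 'a004000000'."""
    data = bytes.fromhex(hex_data)
    if len(data) < 2:
        raise ValueError("expected at least two bytes of boot flag data")
    selector = (data[1] >> 2) & 0x0F
    return {
        "valid": bool(data[0] & 0x80),       # bit 7: boot flags valid
        "persistent": bool(data[0] & 0x40),  # bit 6: persistent (else next boot only)
        "efi": bool(data[0] & 0x20),         # bit 5: EFI (else legacy) boot type
        "device": BOOT_DEVICES.get(selector, "selector 0x%x" % selector),
    }


if __name__ == "__main__":
    for sample in ("a004000000", "0000000000"):
        print(sample, decode_boot_flags(sample))
```

Under that layout, a004000000 has the valid and EFI bits set but not the persistent bit, which is consistent with the "Options apply to only next boot" line even though options=persistent was requested.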

Blake Rouse (blake-rouse) wrote :

I think this is more of a GRUB issue overall than a MAAS issue directly. True, it affects MAAS, and we can use the debconf selections to work around it, but for the overall quality of Ubuntu I do not believe this is the proper fix.

I will give an example without MAAS.

1. First the user installs Ubuntu on a partition on their local disk, EFI is updated so Ubuntu can boot.
2. Second, the user installs Windows on another partition. EFI is updated so Windows can boot, and it boots first.
3. The user reboots into Ubuntu and runs apt-get; grub updates, changing the boot order so that Ubuntu now boots first.
4. The user reboots their machine and Ubuntu boots, but the user expected Windows to boot.

Overall this is a bad experience for the user.

I think the grub code should be smart about this:

First, check whether the grub.efi loader already exists in efibootmgr. If it does not exist, add the loader and set it to boot first. If it does exist, record its current place in the boot order, update the loader, and restore it to its previous position in the boot order.

That change would fix this for any user that uses Ubuntu as well as MAAS users.
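
The proposed policy can be sketched on a simplified model where the boot order is just a list of entry labels. The function name reinstall_loader is hypothetical, and the "re-registering moves the entry to the front" step is an assumption standing in for what the package does today; this is not GRUB code.

```python
def reinstall_loader(boot_order, loader):
    """Return the boot order after (re)installing `loader` under the proposed policy."""
    if loader not in boot_order:
        # Fresh install: register the loader and make it boot first.
        return [loader] + list(boot_order)
    # Existing install: remember where the loader currently sits.
    position = boot_order.index(loader)
    # Re-registering the loader is assumed to move it to the front...
    updated = [loader] + [entry for entry in boot_order if entry != loader]
    # ...so put it back in its previous position afterwards.
    updated.remove(loader)
    updated.insert(position, loader)
    return updated


if __name__ == "__main__":
    print(reinstall_loader(["Windows", "ubuntu"], "ubuntu"))
    print(reinstall_loader(["Windows"], "ubuntu"))
```

With this policy, the Windows-first order in the example above survives a grub upgrade, and a PXE-first order would survive it as well.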

David Britton (davidpbritton) wrote :

Hi Blake --

Could you please redirect that discussion to:

https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1714090? Which has been opened to discuss making grub2 UX better on Ubuntu.

Rod Smith (rodsmith) wrote :

Blake, your proposal makes sense on the surface; however, there are cases where it would cause problems. For instance, suppose that, outside of a MAAS environment, somebody installs Ubuntu, then installs Windows in a dual-boot configuration, then re-installs Ubuntu because Windows grabbed the boot process and the user couldn't figure out how to boot into Ubuntu. In this case, Ubuntu/GRUB would not then gain control of the boot process, which is what the user was hoping would happen, and (I THINK) what would happen today. Of course, re-installing Ubuntu was overkill in this scenario, and a little knowledge would go a long way to resolving the problem in an easier way; but I've seen posts on user forums from new users who do things like this. This isn't to say that your suggestion is a bad one; but implementing it would create some new problems of its own. They might be smaller than the ones we've got now, but they should be considered.

Three more points, should Blake's proposal be implemented:

First, and most importantly, the initial installation of GRUB in a MAAS environment would get it wrong, since as outlined, the proposal would give boot control to the local hard disk, which is exactly the problem we want to avoid. In such an environment, you'd need to install GRUB and ensure that it comes AFTER the PXE-boot option, otherwise the initial problem (MAAS losing control of nodes' boot process) would exist. Thus, either the GRUB package would need to take a cue from MAAS to leave the current top boot option in control (that is, install GRUB as the second or later boot option) or it would need enough smarts to figure this out itself. Given the wide variation in the way PXE-boot options appear in efibootmgr output, the former is likely to be more reliable than the latter.

Second, there's a potential implementation pitfall: There might be stale/invalid NVRAM entries that point to GRUB on non-existent devices. This could happen when MAAS redeploys a node, since the partition table will be wiped and new partitions created, but the NVRAM-based boot entries will be untouched. (Analogous things can happen in local/manual installations, too, of course.) The new EFI System Partition (ESP) will have a new GUID, which won't match the old one for the original installation. Thus, if the check for a reference to grubx64.efi doesn't include the GUID value (at a minimum; there are other identifying features, too), it might think the existing entry is valid, when in fact it's not. (Note that some, but not all, EFIs wipe invalid boot entries, so some computers might not exhibit this problem, but others will.)

Third, on systems that boot with Secure Boot active, the NVRAM entry will normally point to shimx64.efi, not grubx64.efi. In fact, this is usually the case even when Secure Boot is NOT active, or is unavailable; but with Secure Boot out of the picture, either binary should work to boot the computer.
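
The second and third points suggest that any "does an entry already exist?" check should compare both the ESP's partition GUID and the loader file name, and should accept shimx64.efi as well as grubx64.efi. A rough sketch follows. It assumes lines in the form commonly printed by efibootmgr -v (an HD(...) device path followed by File(...)); the regular expression, the function names, and the sample GUID in the usage lines are illustrative assumptions, and real firmware output varies.

```python
import re

# Assumed shape of one "efibootmgr -v" line:
#   Boot0000* ubuntu<TAB>HD(1,GPT,<guid>,0x800,0x100000)/File(\EFI\ubuntu\shimx64.efi)
ENTRY_RE = re.compile(
    r"^Boot(?P<num>[0-9A-Fa-f]{4})\*?\s+(?P<label>.*?)\s+"
    r"HD\(\d+,GPT,(?P<guid>[0-9A-Fa-f-]{36}),[^)]*\)"
    r"/File\((?P<path>[^)]*)\)"
)


def parse_entry(line):
    """Return a dict with num, label, guid and path, or None if the line does not match."""
    match = ENTRY_RE.match(line)
    return match.groupdict() if match else None


def is_current_loader(line, esp_guid, loader_names=("shimx64.efi", "grubx64.efi")):
    """True only if the entry points at a known loader on the ESP with `esp_guid`."""
    entry = parse_entry(line)
    if entry is None:
        return False
    filename = entry["path"].replace("\\", "/").rsplit("/", 1)[-1].lower()
    return entry["guid"].lower() == esp_guid.lower() and filename in loader_names


if __name__ == "__main__":
    guid = "11111111-2222-3333-4444-555555555555"  # made-up GUID
    line = "Boot0000* ubuntu\t" + r"HD(1,GPT,%s,0x800,0x100000)/File(\EFI\ubuntu\shimx64.efi)" % guid
    print(is_current_loader(line, guid))
    print(is_current_loader(line, "99999999-2222-3333-4444-555555555555"))
```

An entry left over from a previous deployment would carry the old ESP's GUID and therefore fail the comparison, which is the stale-entry case described in the second point.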

Blake Rouse (blake-rouse) wrote :

Rod,

I moved my comment to the other bug. But to respond to your comments.

I think your point about #1 is something that should be solved during installation. There is no reason the Ubuntu installation cannot be smart enough to re-order the boot order. I think on fresh install that makes perfect sense, but on an upgrade of an existing package this should not occur.

#2 I agree the bootloader would need to be a direct match. GUID and path.

#3 Yeah, using the shimx64 is what you want. My suggestion was more about how it should work overall. I was being overly specific with grubx64.efi.

Changed in maas:
status: Confirmed → In Progress
Changed in maas:
status: In Progress → Fix Committed
dann frazier (dannf) on 2017-09-14
Changed in grub2 (Ubuntu Yakkety):
status: Triaged → Won't Fix
Changed in maas:
milestone: 2.3.0 → 2.3.0alpha3
Changed in maas:
status: Fix Committed → Fix Released
dann frazier (dannf) wrote :

Thanks Andres! I've verified that this is working as designed when deploying zesty. I've prepared an SRU for GRUB in xenial and verified that it also behaves correctly, so I'll go ahead and upload that.

Po-Hsu Lin (cypressyew) wrote :
Phillip Susi (psusi) wrote :

So I guess nobody saw my question before. If you are booting with PXE, then why don't you just not install grub-efi in the first place?

dann frazier (dannf) wrote :

@psusi because the only thing we boot from PXE is a GRUB that chainloads to the GRUB on disk. We need the GRUB on disk to boot the kernel/ramdisk/cmdline installed by the OS.

Brian Murray (brian-murray) wrote :

The upload purporting to fix this bug is blocked on the verification of the existing grub2 SRU for bug 1716424.

Hello Rod, or anyone else affected,

Accepted grub2 into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2/2.02~beta2-36ubuntu3.14 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in grub2 (Ubuntu Xenial):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-xenial
Brian Murray (brian-murray) wrote :

Hello Rod, or anyone else affected,

Accepted grub2-signed into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2-signed/1.66.14 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in grub2-signed (Ubuntu Xenial):
status: New → Fix Committed
Changed in grub2-signed (Ubuntu):
status: New → Fix Released
Changed in grub2-signed (Ubuntu Yakkety):
status: New → Won't Fix
dann frazier (dannf) wrote :

SRU verification[*]:

ubuntu@dawes:~$ sudo efibootmgr | grep ubuntu
ubuntu@dawes:~$ sudo debconf-show grub-efi-arm64 | grep update_nvram
  grub2/update_nvram: true
ubuntu@dawes:~$ sudo dpkg-reconfigure -pcritical grub-efi-arm64
Installing for arm64-efi platform.
Installation finished. No error reported.
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-4.4.0-97-generic
Found initrd image: /boot/initrd.img-4.4.0-97-generic
Adding boot menu entry for EFI firmware configuration
done
ubuntu@dawes:~$ sudo efibootmgr | grep ubuntu
Boot0000* ubuntu
ubuntu@dawes:~$ sudo efibootmgr -B -b 0000 > /dev/null

ubuntu@dawes:~$ echo debconf grub2/update_nvram boolean false | sudo debconf-set-selections
ubuntu@dawes:~$ sudo debconf-show grub-efi-arm64 | grep update_nvram
* grub2/update_nvram: false
ubuntu@dawes:~$ sudo efibootmgr | grep ubuntu
ubuntu@dawes:~$ sudo dpkg-reconfigure -pcritical grub-efi-arm64
Installing for arm64-efi platform.
Installation finished. No error reported.
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-4.4.0-97-generic
Found initrd image: /boot/initrd.img-4.4.0-97-generic
Adding boot menu entry for EFI firmware configuration
done
ubuntu@dawes:~$ sudo efibootmgr | grep ubuntu
ubuntu@dawes:~$

[*] Note: the arm64/amd64 builds are still stuck in "Unapproved" - I did this verification by pulling the builds from LP directly
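
If the check above needs to be scripted, the debconf-show output can be parsed with a few lines of Python. This is only a sketch: the function name parse_update_nvram is hypothetical, and the line formats it accepts are the two shown in the transcript above (with and without the leading "*").

```python
def parse_update_nvram(debconf_show_output):
    """Return True/False for grub2/update_nvram, or None if the key is absent."""
    for line in debconf_show_output.splitlines():
        # Lines look like "  grub2/update_nvram: true" or "* grub2/update_nvram: false".
        name, sep, value = line.lstrip("* ").partition(":")
        if sep and name.strip() == "grub2/update_nvram":
            return value.strip() == "true"
    return None


if __name__ == "__main__":
    print(parse_update_nvram("* grub2/update_nvram: false"))
    print(parse_update_nvram("  grub2/update_nvram: true"))
```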

tags: added: verification-done verification-done-xenial
removed: verification-needed verification-needed-xenial