quick-boot-lvm.patch caused regression - menu always appear if root is on Btrfs

Bug #1815002 reported by RussianNeuroMancer
This bug affects 21 people
Affects                                      Status   Importance  Assigned to         Milestone
Default settings and artwork for Baltix OS   Triaged  Medium      Mantas Kriaučiūnas
grub2 (Baltix)                               Triaged  Medium      Unassigned
grub2 (Ubuntu)                               Opinion  Undecided   Unassigned

Bug Description

Comments about this regression:

https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1800722/comments/12
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1800722/comments/13

In my case the menu started to appear on every boot on various devices, from small tablets to workstations. It first happened on a laptop (https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/736743/comments/95), and since then more devices have become affected. I am not 100% sure, but it seems that at least one unsuccessful boot is required to reproduce this issue; I may be mistaken, though, and the issue may affect all Btrfs users.

tags: added: regression-update
Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

Additional information: all affected devices boot via grub-efi.

Revision history for this message
Steve Langasek (vorlon) wrote :

If you are booting via UEFI and your /boot is on btrfs then grub has detected that /boot/grub/env is not writable from within grub and therefore the boot menu is shown with a delay. As discussed in bug #1814403, this is by design.

Changed in grub2 (Ubuntu):
status: New → Invalid
Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

> As discussed in bug #1814403, this is by design.

Disagree. Mathieu mentioned in Comment #31 of bug #1814403:

> When your system doesn't boot correctly through the kernel however, it would likely not show the menu without this fix -- you'd have no way to switch back to a working kernel, because the menu can be hard to reach at all in these setups.

This is not the case for UEFI with / on btrfs (including /boot): I can't remember any issues with accessing the GRUB menu in eight years (since Ubuntu started supporting installation on btrfs). So, if my understanding is correct, quick-boot-lvm.patch fixes nothing for this combination while introducing an unnecessary timeout on systems that never needed the quick-boot-lvm.patch workaround.

Mathieu mentioned in Comment #31 of bug #1814403:

> If you still feel your system is showing the menu and timeout at every boot unnecessarily, or if you feel the default timeout is too long, please file a separate bug with the information specific to your system and particular case so we can look into it -- the particularities of the system will be important to include in such a bug report.

Which is what I just did. A bunch of tablets, 2-in-1s, embedded boards, etc. now take at least twice as long to boot, with no way to skip the menu because they lack a keyboard. Such behaviour is hardly necessary or expected, as there were no issues with accessing the GRUB2 menu before when a keyboard was connected.

So, please reconsider your resolution on this bug, due to two points:

1. Switching back to previous kernels worked before for UEFI with / on btrfs, without quick-boot-lvm.patch.

2. Tablets, 2-in-1s and embedded boards with UEFI and / on btrfs are now stuck for 30 seconds on every boot with no way to skip the menu.

Changed in grub2 (Ubuntu):
status: Invalid → New
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in grub2 (Ubuntu):
status: New → Confirmed
Revision history for this message
Gannet (ken20001) wrote :

I am booting from MBR/GPT and Grub menu also appears all the time with 30 sec. timeout. Definitely some kind of ridiculous bug.

Revision history for this message
m4t (m4t) wrote :

Also affected by this with BTRFS filesystem on EFI. The patch which introduced the behavior change was titled quick-boot-lvm.patch, not slow-boot-btrfs.patch ;-)

Revision history for this message
Steve Langasek (vorlon) wrote :

Tested on bionic:
mkfs.btrfs /dev/vda2
mount /dev/vda2 /mnt
cp -a /boot/* /mnt/
umount /mnt
mount /dev/vda2 /mnt
grub-install vda
update-grub
reboot

- Boot to the grub prompt and hit 'c'

grub> save_env timeout_style
error: sparse file not allowed
grub>

grub does not support writes to btrfs; therefore Ubuntu does not have recordfail handling; therefore we cannot default to a timeout of zero on UEFI systems when grub is installed to btrfs.

You are welcome to adjust the timeouts in /etc/default/grub, but the grub package's behavior here is deliberate.
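
The timeout adjustment mentioned above could be scripted roughly as follows. This is a minimal sketch, not something from the grub package: `set_grub_default` is a hypothetical helper name, and the value 5 is only an example. It assumes the file uses plain `KEY=VALUE` lines, as /etc/default/grub does.

```shell
#!/bin/sh
# Hypothetical helper (not part of any grub package): set KEY=VALUE in a
# grub defaults file, replacing an existing assignment or appending one.
set_grub_default() {
    key=$1 value=$2 file=$3
    if grep -q "^${key}=" "$file"; then
        # Rewrite the existing assignment via a temporary file.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        # Key not present yet: append it.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Possible usage (as root), followed by regenerating grub.cfg:
#   set_grub_default GRUB_RECORDFAIL_TIMEOUT 5 /etc/default/grub
#   update-grub
```

A small non-zero value keeps a short window for reaching the menu, which is the trade-off discussed later in this thread.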

Changed in grub2 (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

> therefore we cannot default to a timeout of zero on UEFI systems when grub is installed to btrfs.

How was that not the case for eight years?

Revision history for this message
Steve Langasek (vorlon) wrote :

The bug went unnoticed because UEFI plus /boot on btrfs is a corner case that is not widely used by the Ubuntu developers. Nevertheless, this configuration does NOT reliably allow the user to reach the boot menu with the default timeout of 0. It is more important to be reliable by default than to be fast by default.

Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

> Nevertheless, this configuration does NOT reliably allow the user to reach the boot menu with the default timeout of 0.

Could you please clarify under which circumstances this configuration does not reliably allow the user to reach the boot menu? I am asking because I have never seen a situation where this menu wasn't reachable, on dozens or hundreds of different devices with / on btrfs, from 7-inch tablets to two-socket servers.

Gannet (ken20001)
Changed in grub2 (Ubuntu):
status: Invalid → Confirmed
Revision history for this message
Gannet (ken20001) wrote :

Hey, every time I reboot my remote servers I have to wait 30 seconds longer to connect to them. This is not normal and never happened before. It should be optional: whoever needs the menu at each boot can turn it on, but it should not be the default behaviour.

Steve Langasek (vorlon)
Changed in grub2 (Ubuntu):
status: Confirmed → Opinion
Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 1815002] Re: quick-boot-lvm.patch caused regression - menu always appear if root is on Btrfs

On February 15, 2019 4:08:36 AM PST, RussianNeuroMancer <email address hidden> wrote:
>> Nevertheless, this configuration does NOT reliably allow the user to
>reach the boot menu with the default timeout of 0.
>
>Could you please clarify under which circumstances this configuration
>does not reliably allow user reach boot menu? I asking because I never
>seen situation where this menu wasn't reachable on dozens/hundreds
>different devices with / on btrfs, from 7 inch tablets to two-socket
>servers.

Under UEFI, there is no way for grub to detect a modifier key being held down; instead of holding shift at boot to get to the menu, you have to press the shift key at the right moment.

If you have a boot timeout of 0, that means grub waits 0 seconds for you to press the shift key before booting.

That means the window in which you can press the shift key to get to the boot menu is 0 seconds.

If you miss the window, your only option is to reboot and try again.

If you reboot before the boot finishes, then grub will show you the boot menu by default with a timeout.

But if /boot is on btrfs, then grub has no way to record that a boot was attempted, which means it will not know that you rebooted before the boot finished. And you will never reliably get the boot menu.

--
Steve Langasek

Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

Steve, with all respect, can you answer the question at the end of this message if possible? This is a critical question, because if you don't have an answer, that means quick-boot-lvm.patch introduced a regression for affected devices without a keyboard for no good reason. You said:

> Nevertheless, this configuration does NOT reliably allow the user to reach the boot menu with the default timeout of 0.

However, as far as I can see, this is not the case in practice. Which is why I asked in Comment #10: could you please clarify under which circumstances this configuration does not reliably allow the user to reach the boot menu?

Changed in grub2 (Ubuntu):
status: Opinion → Confirmed
Steve Langasek (vorlon)
Changed in grub2 (Ubuntu):
status: Confirmed → Opinion
Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

> Under UEFI, there is no way for grub to detect a modifier key being held down; instead of holding shift at boot to get to the menu, you have to press the shift key at the right moment.
> And you will never reliably get the boot menu.

Pressing Esc when the GRUB2 gray screen appears is a reliable way to get into the menu; I just tested this on three devices, around fifty times in total. Somehow it works with version 2.02+dfsg1-5ubuntu8 and no longer works with 2.02+dfsg1-5ubuntu8.2. Why is that?

> If you have a boot timeout of 0, that means grub waits 0 seconds for you to press the shift key before booting.
> That means the window in which you can press the shift key to get to the boot menu is 0 seconds.

If I read the systemd-analyze report right, version 2.02+dfsg1-5ubuntu8 waits three seconds for user input, even if the grub config contains:
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_RECORDFAIL_TIMEOUT=0

Meanwhile 2.02+dfsg1-5ubuntu8.2 doesn't wait for user input and indeed behaves as you described. Where does this difference between 2.02+dfsg1-5ubuntu8 and 2.02+dfsg1-5ubuntu8.2 come from?

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

I'm not convinced the keypress is 100% reliable. It does get better all the time, but there are still systems on which it's just not possible to get it working reliably, and sometimes not at all - you'll press consistently "too late" or "too early" and GRUB won't notice, so you won't get the menu.

AIUI, the issue came from EFI timers being odd, but I can't qualify that: TSC calibration isn't done using the EFI timer first -- we first look at pmtimer and PIT before we use the EFI timers.

As for / on btrfs, I agree it's a side-effect of the check_writable() call, but it does seem to still be "up to date" at first glance: it's *supposed* to be affected the same way: if you're on a system on which the keypress doesn't get you the menu (this is system-specific), *and* you're set up with /boot on btrfs (which doesn't have write support as far as I can tell), then a failing boot won't get you the menu automatically. So... three machines is good, but if they're all the same kind (all Dell, all AMI, etc.) then it might not be a good sample.

I think the point is that we need a better idea of exactly which systems are affected, to try and tell how many people it really affects. Let's try to get that now.

Steve, do we have a good list of exactly which systems we've seen the LVM issues on?

Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

> but if they're all the same kind (all Dell, all AMI, etc.) then it might not be a good sample

All three are different: HP Elite x2 1013 G3, Dell Venue 8 Pro 5855 and Acer Aspire Switch Alpha 12 SA5-271. Besides these three, again, there is no such issue on other devices, as mentioned in Comment #10 (including embedded and server boards).

Maybe it's possible to load grubenv from the ESP? That would make recordfail usable for all UEFI users regardless of filesystem.

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

Yes, it's likely possible. It's actually something we've been discussing, just need to figure out how to do it.

Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

Is this discussion on a public mailing list or in some other bug report?

Revision history for this message
Felipe Castillo (fcastillo.ec) wrote :

Any news on this issue? The suggested workaround, modifying the timeout, is not really a workaround. It will indeed shorten the boot time, but in case of any errors or interruption to booting, I won't be able to see the GRUB menu on the next boot.
This is a real regression: for years (as many people mentioned here) I was able to boot without delay, and if there were errors, I did see the GRUB menu. This patch just messed everything up.

I'm not using Btrfs at all, I have my /boot on an ext4-LVM partition, and my /boot/efi on a vfat (non-LVM) partition

Revision history for this message
Steve Langasek (vorlon) wrote :

On Thu, Mar 07, 2019 at 05:04:51PM -0000, Felipe Castillo wrote:

> Any news on this issue? The suggested workaround, which is to modify the
> timeout is not even a workaround. It will indeed shorten the boot time,
> but in case of any errors or interruption to booting, I won't be able to
> see the GRUB menu on next boot.

> This is such a regression, for years (as many people mentioned here) I was
> able to boot without delay, and if there were errors, I did see the GRUB
> menu.

The only way this could be true is if your grub root is configured on a
filesystem that grub knows how to write to; and to the best of our knowledge
this is not supported on top of LVM, as described in /etc/grub.d/00_header.
To test whether this is the case, you can do the following from the grub
prompt:

set var_test=value
save_env var_test

You should get an error message:

error: diskfilter writes are not supported

If you find a system where the above command succeeds, but you are seeing
the 30 second boot delay, then that is a bug and we should fix it.

But if the above save_env command fails, then there is no possible way that
you saw the GRUB menu on error on that system with that filesystem
configuration, because there is no possible way for GRUB to detect that
there was an error.

RussianNeuroMancer's argument is different; he argues that using Esc to
access the GRUB menu during UEFI boot DOES work reliably and therefore we
should default to booting fast. So far, I am unable to confirm that it's
reliable; but at best, it would be dependent on the user detecting flickers
of the screen to know when the right moment is to press Esc, which is still
bad UX, especially for the case in which there was actually an error on the
previous boot and the user struggles to figure out how to get to the menu.

The right solution is for us to make sure we always have writable space
where we can store grubenv between boots, so that we don't have to choose
between two suboptimal user experiences.
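
The writability question can also be approximated from the running system before rebooting into the GRUB prompt. The sketch below is an assumption-laden illustration, not the logic GRUB or the Ubuntu package actually uses: `grub_can_write` is a hypothetical name, and the only cases it marks as not writable are the ones reported in this thread (btrfs, and LVM via diskfilter). Anything else is reported as "maybe", since only the `save_env` test at the GRUB prompt is conclusive.

```shell
#!/bin/sh
# Sketch: guess whether GRUB could save grubenv on a given storage setup.
# Known-bad cases are taken from this thread only:
#   btrfs            -> "error: sparse file not allowed"
#   lvm / diskfilter -> "error: diskfilter writes are not supported"
# Arguments: $1 = filesystem name
#            $2 = space-separated abstraction names (may be empty)
grub_can_write() {
    fs=$1 abstractions=$2
    for a in $abstractions; do
        case "$a" in
            lvm|diskfilter) echo no; return ;;
        esac
    done
    case "$fs" in
        btrfs) echo no ;;
        *)     echo maybe ;;   # not ruled out here; verify with save_env
    esac
}

# Possible usage on an installed system (assumes grub-probe is installed
# and that /boot/grub is the GRUB directory):
#   grub_can_write "$(grub-probe --target=fs /boot/grub)" \
#                  "$(grub-probe --target=abstraction /boot/grub)"
```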

Revision history for this message
Felipe Castillo (fcastillo.ec) wrote :

@Steve Thanks so much for your post. Based on this, would you recommend having a separate, non-LVM partition just for /boot? If this is the preferred way, how big should I make it? I've heard recommendations of 150MB; others say 250MB.

Also, should I repurpose my EFI partition and convert it into a /boot one (given that EFI is inside /boot)? Or do I still need both partitions? Does /boot have any special requirements? I know that the EFI partition had to be at the beginning of the disk and has to be FAT32; is the same true for /boot?

The reason I still think this is a bug is that the default installation of Ubuntu 18.10 and 18.04, when LVM is selected, does not create a /boot partition, only an EFI one. So everybody using LVM is going to see this 30-second delay.

Instead of trying to fix the patch that introduced the problem, we should fix the default partitioning when using LVM on Ubuntu. Maybe this bug should focus on that, so we don't see this problem on Ubuntu 19.04.

Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

> RussianNeuroMancer's argument is different; he argues that using Esc to access the GRUB menu during UEFI boot DOES work reliably and therefore we should default to booting fast.

Just in case, I want to note, that it works with 2.02+dfsg1-5ubuntu8 but not with 2.02+dfsg1-5ubuntu8.2. Could you please clarify where this difference between 2.02+dfsg1-5ubuntu8 and 2.02+dfsg1-5ubuntu8.2 came from? More information in Comment #14.

> The right solution is for us to make sure we always have writable space where we can store grubenv between boots, so that we don't have to choose between two suboptimal user experiences.

What is the outcome of the discussion mentioned by Mathieu in Comment #17?

> Instead of trying to fix the patch that introduced the problem. We should fix the default partitioning when using LVM on Ubuntu, maybe this bug should focus on that, so we don't see this problem on Ubuntu 19.04

Or store grubenv on the ESP as per Comment #17. I hope for this one, as it seems like the less disruptive change for end users (no need to repartition) and developers (no need to change the default partitioning).

Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

Steve, Mathieu, any news on this issue?

Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

> > Maybe it's possible to load grubenv from esp partition? It will make recordfail usable for all UEFI users regardless of FS.

> Yes, it's likely possible. It's actually something we've been discussing, just need to figure out how to do it.

Is there progress on this?

Changed in grub2 (Baltix):
milestone: none → baltix-18.04
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Mantas Kriaučiūnas (mantas) wrote :

As a workaround I've added the line GRUB_RECORDFAIL_TIMEOUT=0 to /etc/default/grub

Changed in baltix-default-settings:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Mantas Kriaučiūnas (mantas)
Revision history for this message
anymouse (anymouse) wrote :

Hi, my system is affected as well (installed ubuntu as the only system using LVM), so I would like to ask whether my understanding of the problem as follows is correct.
I would also like to know whether bug reports have been filed regarding the 5)a) and 5)b) points, or if there has been any other progress on this bug.

1) Normally, intended behavior of Grub is that if boot of an operating system is unsuccessful, the 'recordfail' variable is set, which causes Grub timeout during the next boot to be GRUB_RECORDFAIL_TIMEOUT instead of GRUB_TIMEOUT, to give the user enough time to boot the recovery environment.

2) Grub normally writes this 'recordfail' variable to the operating system partition, but since Grub lacks the ability to write to BTRFS or any LVM partition, Grub wouldn't be able to set the 'recordfail' variable in the case of unsuccessful boot. This could potentially leave the user needing to boot to recovery, but unable to do it, when GRUB_TIMEOUT is set to 0.

3) Currently implemented workaround for 2) is for the 'recordfail' variable to be set permanently, if Grub is unable to write the variable to disk. This causes Grub to ignore GRUB_TIMEOUT every time and wait for 30 seconds instead, which is the default value of GRUB_RECORDFAIL_TIMEOUT.

4) This happens by default on every EFI system where user chooses to install Ubuntu with LVM partitioning (my case) or (I'm not sure about this one) if user chooses to use BTRFS partition during the install.

5) Possible solutions would be one of the following:

  a) Ubuntu installer should create partitions in such a way that leaves a small separate partition that would be writable by Grub.

  b) Implement ability to store environment variables on the EFI System Partition (ESP) which is FAT32 and already is created by the installer. This has been mentioned in #16 and #17.

  c) Implement functionality to write to LVM, BTRFS, and other possibly affected filesystems. This is unlikely to happen because it is complex and arguably unnecessary.

6) Possible workarounds are:

  a) In /etc/default/grub set the variable GRUB_RECORDFAIL_TIMEOUT to a small but non-zero number of seconds. Setting it to zero is strongly discouraged as it could leave the system unable to boot to recovery or other entries after a failed boot, although RussianNeuroMancer describes in #14 that they're able to reliably enter Grub even if this is set to 0.

  b) Manually repartition the system in such a way that Grub's recordfail functionality works. I don't know what exactly is needed for this, though, but I would be tremendously grateful if anyone does this and shares how to do it.
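
Points 1) to 3) above can be condensed into a small decision function. This is only a sketch of the behaviour as described in this thread, not the code that Ubuntu's grub.cfg actually contains; `effective_timeout` is a hypothetical name, and the defaults (0 for GRUB_TIMEOUT, 30 for GRUB_RECORDFAIL_TIMEOUT) are the values mentioned in the comments here.

```shell
#!/bin/sh
# Sketch of the timeout choice described in points 1) to 3).
# Illustrative only; not the real grub.cfg logic.
#   $1 = 1 if GRUB can write grubenv, 0 otherwise
#   $2 = 1 if the previous boot was recorded as failed, 0 otherwise
effective_timeout() {
    env_writable=$1 recordfail=$2
    timeout=${GRUB_TIMEOUT:-0}
    recordfail_timeout=${GRUB_RECORDFAIL_TIMEOUT:-30}
    # If grubenv cannot be written, a failed boot can never be recorded,
    # so every boot is treated as if recordfail were set.
    if [ "$env_writable" -eq 0 ] || [ "$recordfail" -eq 1 ]; then
        echo "$recordfail_timeout"
    else
        echo "$timeout"
    fi
}

# Example: root on LVM or btrfs (grubenv not writable), last boot fine:
#   effective_timeout 0 0
```

Under this model, lowering GRUB_RECORDFAIL_TIMEOUT (workaround 6a) shortens every boot on affected systems, because the first branch is always taken there.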

Revision history for this message
Trideep Roy (tr1r0y) wrote :

Hi Steve,
I am seeing the same 30s timeout on "Elementary OS 7" which uses the following partitioning scheme for a standard install with disk encryption.

$ lsblk -o NAME,PARTTYPENAME,FSTYPE,MOUNTPOINTS
NAME            PARTTYPENAME     FSTYPE      MOUNTPOINTS
nvme0n1
├─nvme0n1p1     EFI System       vfat        /boot/efi
├─nvme0n1p2     Linux filesystem ext4        /boot
└─nvme0n1p3     Linux filesystem crypto_LUKS
  └─cryptdata                    LVM2_member
    ├─data-root                  ext4        /
    └─data-swap                  swap        [SWAP]
Although the encrypted partition uses LVM, the /boot/efi and /boot partitions are ESP (vfat) and ext4 respectively, and GRUB should be able to write to them. I verified that GRUB is able to write from the GRUB command line using the following commands:
grub> list_env
recordfail=1
grub> set recordfail=0
grub> save_env recordfail
grub> list_env
recordfail=0
grub> reboot
On rebooting after setting "recordfail=0" GRUB no longer shows the menu. But on the next boot after restarting the system using "sudo shutdown -r 0" GRUB again shows the menu. I confirmed that "recordfail" was reset to 1.

$ sudo grub-editenv /boot/efi/EFI/ubuntu/grub/grubenv list
recordfail=1
So it seems like GRUB is forcefully always setting recordfail to 1 on every boot even though GRUB should be able to write to the environment block on /boot and /boot/efi.

Here is my system information.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Elementary
Description: elementary OS 7 Horus
Release: 7
Codename: horus

$ uname -a
Linux t6-laptop 6.2.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jul 13 16:27:29 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
Julian Andres Klode (juliank) wrote :

@tr1r0y grub always sets recordfail=1, and the OS when booted unsets it with grub-common.service. You likely have a broken grubenv file and hence grub-common.service fails to run.
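
To check the state described here from the booted system, something like the following could be used. It is a rough sketch: `recordfail_state` is a hypothetical helper, and the grubenv path in the usage comment is an assumption that varies between installs (the comment above used /boot/efi/EFI/ubuntu/grub/grubenv).

```shell
#!/bin/sh
# Sketch: report whether a grubenv listing has recordfail set.
# Reads "name=value" lines on stdin, in the format printed by
# `grub-editenv <file> list`.
recordfail_state() {
    if grep -q '^recordfail=1$'; then
        echo set
    else
        echo clear
    fi
}

# Possible usage on a booted system (path is an assumption):
#   grub-editenv /boot/grub/grubenv list | recordfail_state
#   systemctl status grub-common.service   # did the reset step run?
```

If the state is still "set" after a successful boot, the service status output is the next place to look.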

Revision history for this message
bombela (bombela) wrote (last edit ):

Same bug here. EFI+LVM with ubuntu.

One NVME/SSD. With one EFI partition and one LVM PV partition.
Within the LVM, one VG, with one / LV and one swap LV.

For now I work around it with GRUB_RECORDFAIL_TIMEOUT=1.

EDIT: I should clarify that to me the bug is the surprise of hitting this 30s timeout without any indication that it is a fail safe. While in my config I have GRUB_TIMEOUT=1. Maybe GRUB could report that it switched to failsafe. Maybe the default /etc/default/grub should contain a wording about it. Anything, but a surprise 30s grub wait.
