Grub2 2.06 has upstream bug that results in Non-booting with ZFS after snapshot of bpool.

Bug #2051999 reported by Mike Ferreira
This bug affects 16 people
Affects                    Status      Importance   Assigned to   Milestone
grub2 (Ubuntu)             Confirmed   Undecided    Unassigned
grub2-unsigned (Ubuntu)    Confirmed   Undecided    Unassigned
zfs-linux (Ubuntu)         Confirmed   Undecided    Unassigned

Bug Description

There is an upstream Bug with Grub2: if you create snapshots of bpool, the result is a non-booting System. The upstream report is here:
https://savannah.gnu.org/bugs/index.php?64297

Multiple Ubuntu 22.04.3 Users Affected:
https://ubuntuforums.org/showthread.php?t=2494397&highlight=zfs+grub+bug
https://ubuntuforums.org/showthread.php?t=2494957

Brought up as an issue at OpenZFS:
https://github.com/openzfs/zfs/issues/13873

If you look at this comment (https://github.com/openzfs/zfs/issues/13873#issuecomment-1892911836), it was found that GNU Savannah released a fix for it in Grub2 2.12, here:
https://git.savannah.gnu.org/cgit/grub.git/log/grub-core/fs/zfs/zfs.c

Ubuntu Jammy 22.04.3 is Grub2 2.06. We need to backport this patch to Grub2 2.06 so that Users are not caught out by this bug on our currently supported LTS Release.

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: grub-efi-amd64 2.06-2ubuntu14.4
ProcVersionSignature: Ubuntu 6.2.0-39.40~22.04.1-generic 6.2.16
Uname: Linux 6.2.0-39-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: unknown
CurrentDesktop: GNOME
Date: Thu Feb 1 16:40:28 2024
InstallationDate: Installed on 2021-09-23 (861 days ago)
InstallationMedia: Ubuntu 20.04.3 LTS "Focal Fossa" - Release amd64 (20210819)
SourcePackage: grub2-unsigned
UpgradeStatus: Upgraded to jammy on 2022-08-17 (533 days ago)

Revision history for this message
Mike Ferreira (mafoelffen) wrote :
Revision history for this message
Rick S (1fallen) wrote :

It also happens on 24.04 all flavors.

We have seen a few users in the forums now with this bug.

Revision history for this message
Mike Ferreira (mafoelffen) wrote :

I see Noble as Grub2 2.12 package version 2.12~rc1-12ubuntu4 amd64...

This patch was 2 weeks ago. The ink is not dry. It may not have hit us with 2.12 yet either?

Revision history for this message
Mike Ferreira (mafoelffen) wrote (last edit ):

No: The date in the current package version's change log is: Fri, 08 Dec 2023 09:22:22

Nothing here includes that patch yet.

I'm wondering if it also affects 20.04? IDK, Though we have not had any Users report any problems with this in 20.04.

Revision history for this message
Mate Kukri (mkukri) wrote :

Hi Mike,

Do you happen to know what changes to GRUB the "patch 2 weeks ago" refers to? I am not seeing any ZFS related changes in the GRUB tree 2 weeks ago, and that Savannah ticket is also many months old.

Nonetheless, Noble is getting a GRUB 2.12 release very soon (which is based on a lot newer upstream than 2.12~rc1), which should include any remotely recent ZFS changes. And according to the OpenZFS issue page linked from Savannah, also resolves this particular issue. Supported stable releases are also planned to get backports of GRUB 2.12 sometime after NN gets released.

I wonder if there is any scenario where users can experience breakage without manually enabling bpool snapshotting? Because I feel like we need some sort of solid justification to have a cherry-pick SRU of 2.06 with the ZFS commits right now.

Revision history for this message
Bill Jennings (copperheadbill) wrote :

Affected 22.04 after installing zfs-auto-snapshot.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in grub2 (Ubuntu):
status: New → Confirmed
Changed in grub2-unsigned (Ubuntu):
status: New → Confirmed
Revision history for this message
Rick S (1fallen) wrote :

Hi Mate, the below sounds encouraging
"Nonetheless, Noble is getting a GRUB 2.12 release very soon (which is based on a lot newer upstream than 2.12~rc1), which should include any remotely recent ZFS changes. And according to the OpenZFS issue page linked from Savannah, also resolves this particular issue. Supported stable releases are also planned to get backports of GRUB 2.12 sometime after NN gets released."

I'll report any good news when it hits

Revision history for this message
Rick S (1fallen) wrote :

Dang I forgot to mention for NN 24.04....sorry.

Revision history for this message
Rick S (1fallen) wrote :

@Mate: "I wonder if there is any scenario where users can experience breakage without manually enabling bpool snapshotting? Because I feel like we need some sort of solid justification to have a cherry-pick SRU of 2.06 with the ZFS commits right now."

That's the only trigger ATM, this is after a manual bpool snapshot, resulting in a "No kernel found"

Revision history for this message
Rick S (1fallen) wrote :

Auto snapshots are still good:
   **zfs list -r -t snapshot -o name,creation bpool
NAME CREATION
bpool/BOOT/ubuntu_l7y21m@autozsys_bk6cdb Fri Feb 2 10:41 2024
bpool/BOOT/ubuntu_l7y21m@autozsys_sin22d Sat Feb 3 17:22 2024
bpool/BOOT/ubuntu_l7y21m@autozsys_41kebx Sat Feb 3 17:23 2024

Grub Version
   **apt policy grub2
grub2:
  Installed: (none)
  Candidate: 2.12~rc1-12ubuntu4
  Version table:
     2.12~rc1-12ubuntu4 500
        500 http://us.archive.ubuntu.com/ubuntu noble/universe amd64 Packages

With any manual snapshot, the kernel is lost >> "you need to load the kernel first" and no boot.

Revision history for this message
Mike Ferreira (mafoelffen) wrote :

Also affects Mantic...

Revision history for this message
Mike Ferreira (mafoelffen) wrote (last edit ):

I'm working on trying various differing work-arounds with what is there natively.

I tried building the newest git of Grub2 2.12. It fails building at a point which it says was fixed for the error I am getting, but it still fails on the make at the same point. (asorti not defined)

I'm also working on another theory, recreating bpool with differing options, on the theory that some of the zpool options set at the creation of bpool are causing Grub2 to fail with those added challenges.

I'll keep you up-to date how those go. Trying to balance my time between "real life", helping the upstream Intel Graphics Support Engineers with drivers issues with Ubuntu, and helping Users with other issues. ...All while trying to find underlying information answers for this issue.

I know in the ZFS install scripts, when I was tracing the process of that, for bpool, it says compatibility grub2, and features all (or something to that effect)... but some features being enabled are causing this problem after snapshots are being taken on bpool.

This is why I am pursuing this and testing what is going on with that... If that is true, and if the new Grub2 patches fail, then there is a much bigger problem there with far more reaching implications on what has been installed as ZFS (to correct it). ...And for changing ZFS feature settings in the installer for Noble, BEFORE its release.

I think if that is true, then a new compatibility option named ubuntu24.04 can just be set to set the allowed options of the correct disabled feature options, then nothing would have to change in the installer script for Noble. That would take care of all that for new installs. That compatibility option would need to be added to ubuntu-desktop-installer image, then to the ZFS installer script as "option compatibility=grub2,ubuntu24.04"...

What gets me is that the "feature" or "ability" this is failing on, being able to snapshot bpool (the failure point we need to protect from), was covered by Zsys, and is why Canonical came up with that in the first place. It worked in 20.04 LTS... There was not a problem there, with that installed. We were able to do that in that release. Marketing-wise, it put Ubuntu ahead of things with that "need" covered well.

Zsys was removed from being a default install (no mention why), but I still see commits to it. The need to make snapshots of the boot related files is still there. I wish I knew more on the why Zsys fell from favor. I thought basically, besides some minor needed config and customization changes, it was a great idea to do what it was intended to do.

Some users have gone around this problem by using ZFS Boot Menu (ZBM). But that is not a solution. Rather, it removes the conditions, through what they have to do in the restructure to make it work. For it, bpool cannot exist. There cannot be a separate /boot. /boot has to be inside the root pool, and not inside its own ZFS dataset... If the restructure is done this way, then those conditions do not exist. That is why it works. Not that ZBM gets around the problem itself. If you create a dataset for /boot itself, to make snapshots of boot related files, then ZBM does not work, even bef...


Revision history for this message
Mike Ferreira (mafoelffen) wrote :

Forgot to mention... Those "@feature" changes, that are suspected as being the problems when enabled... unfortunately cannot be done on an already created bpool. Those can only be set at pool "creation time".

So I'm working on a script to recreate bpool with those specific option settings from the existing bpool, offline from an installer LiveUSB environment.

If I try it from online, it will not export the bpool. It just says "busy". Using an additional USB flash drive to temporarily hold the backup from bpool to destroy and recreate the bpool.

That is the path I'm investigating and testing currently. That currently is, in the absence of a Grub2 2.12 that works with those conditions.

If the new Grub2 2.12 does work with the older option setting, then a lot of things will be resolved, and a work-around will not be needed to fix existing installs nor new options set in ubuntu-desktop-installer.

I installed just Grub 2.12~rc1 to 22.04.3 with the associated long line of depends. It wasn't pretty. And it did not correct the problem. That was not an answer to recommend.

But at this point in time, because Grub 2.12 is not building and is vaporware, we don't have enough here to test and make informed decisions.

Revision history for this message
Mate Kukri (mkukri) wrote (last edit ):

@mafoelffen You can find builds of GRUB 2.12 for Ubuntu in this PPA: https://launchpad.net/~ubuntu-uefi-team/+archive/ubuntu/build

NOTE that the stuff in that PPA isn't signed, so you need to disable UEFI secure boot to test it easily.

Revision history for this message
Rick S (1fallen) wrote :

I've added the PPA for Noble just now, will report after a manual bpool snapshot

     **The following packages will be upgraded:
  automake autopoint ayatana-indicator-common bind9-dnsutils bind9-host bind9-libs cpio
  gcc-14-base gcc-14-base:i386 gettext gettext-base grub-common grub-efi-amd64-bin
  grub-efi-amd64-signed grub-pc grub-pc-bin grub2-common language-pack-gnome-en
  libasan8 libatomic1 libatomic1:i386 libcc1-0 libgcc-s1 libgcc-s1:i386 libgfortran5
  libgnutls30 libgomp1 libhwasan0 libitm1 liblsan0 libneon27 libp11-kit0 libpcap0.8
  libquadmath0 libstdc++6 libstdc++6:i386 libtsan2 libubsan1 linux-firmware
  openssh-client p11-kit p11-kit-modules python3-mako ufw
44 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Revision history for this message
Rick S (1fallen) wrote :

It worked:

The grub from the PPA solved the issue for Noble
Code:

apt policy grub-common
grub-common:
  Installed: 2.12-1ubuntu1~ppa1
  Candidate: 2.12-1ubuntu1~ppa1
  Version table:
 *** 2.12-1ubuntu1~ppa1 500
        500 https://ppa.launchpadcontent.net/ubuntu-uefi-team/build/ubuntu noble/main amd64 Packages
        100 /var/lib/dpkg/status
     2.12~rc1-12ubuntu4 500
        500 http://us.archive.ubuntu.com/ubuntu noble/main amd64 Packages

zfs list -r -t snapshot -o name,creation bpool
NAME CREATION
bpool/BOOT/ubuntu_l7y21m@autozsys_bk6cdb Fri Feb 2 10:41 2024
bpool/BOOT/ubuntu_l7y21m@autozsys_sin22d Sat Feb 3 17:22 2024
bpool/BOOT/ubuntu_l7y21m@autozsys_41kebx Sat Feb 3 17:23 2024
bpool/BOOT/ubuntu_l7y21m@autozsys_25r5is Sun Feb 4 13:55 2024
bpool/BOOT/ubuntu_l7y21m@autozsys_1n6o7d Sun Feb 4 14:07 2024
bpool/BOOT/ubuntu_l7y21m@2-4-2024test Sun Feb 4 14:09 2024

As seen by the bottom "bpool/BOOT/ubuntu_l7y21m@2-4-2024test Sun Feb 4 14:09 2024"

Revision history for this message
Mike Ferreira (mafoelffen) wrote (last edit ):

Since it is also built for Mantic, I would expect it works for Mantic Also. (Will confirm later.)

I spun up a Jammy ZFS image. Changed the sources list to Noble. Added the PPA. Installed packages 'grub2 & shim'... Let it pull in the depends it needed. Change the sources.list back to jammy. Did some snapshots of bpool. Reconfigured grub, as it would do in an update... It said that was successful, so at that point was hopeful. Had fingers crossed.

Rebooted, failed. Booted straight to BIOS. Shutdown > Cold boot > Failed. Booted straight to BIOS.

So Grub2 2.12 works for the problems with Mantic and Noble, but not as a fix to Jammy 22.04.

That sort of goes back to those last 4 patches (Grub2 related to ZFS) working to fix this problem for Noble & Mantic, but (somehow) it needs to be backported to fix Jammy. I'm thinking that might be possible by either backporting to Grub2 2.06, or finding a way to make Grub2 2.12 work with Jammy.

Dang.

Revision history for this message
Richard Laager (rlaager) wrote :

Any chance this test needs a re-run of “grub-install”, not just “update-grub” (as you would get from a reconfigure)?

Revision history for this message
Mike Ferreira (mafoelffen) wrote (last edit ):

Wahoo! Found the magic combination.

chrooted into the broken jammy system.

Add a Noble sources list.

Add the ppa. Edit the added /etc/apt/sources.list.d/ubuntu-uefi-team-ubuntu-build-jammy.list's active line to:
>>>
deb [trusted=yes] https://ppa.launchpadcontent.net/ubuntu-uefi-team/build/ubuntu/ noble main
>>>
Update the apt cache

Install grub-efi-amd64 & grub-efi-amd64-signed from the ppa.

Reconfigure Grub2 and update the initramfs images.

Change the sources list back to the original jammy sources.list

There were still existing snapshots there.

Exit the chroot. Umount the mounts. Export the pools. Reboot.

Booted fine.

Fixed. Successful. Booting from Grub2 2.12.

Writing up the work-around to fix it. I can post it on my ZFS -Fixes/Work-Arounds GitHub Repo, and on the thread in the Forum on this problem... Or I can post it here, and refer to this bug report again.

Maybe here is best. That way it is a known work-around for broken affected systems, until we can figure out the next step.

The next step would be, how do we get Grub2 2.12 into Jammy through the updates channel? I would say it first needs to go to jammy proposed, then tested that it works through an update process.

Then respin the installer ISO or include it in the next point release.. Since there is a fix, we don't need to create new victims.

Just thinking out loud.

Revision history for this message
Mike Ferreira (mafoelffen) wrote :

@rlaager ---

When I said reconfigure, I did:
>>>
sudo grub-install --target=x86_64-efi --efi-directory=/boot/efi \
    --bootloader-id=ubuntu --recheck --no-floppy
>>>
So yes, it was that.

Revision history for this message
Mike Ferreira (mafoelffen) wrote (last edit ):

The command log for the work-around for a broken jammy install:
>>>
sudo su -
zpool export -a
zpool import -N -R /mnt rpool
zpool import -N -R /mnt bpool
UUID=$(zfs list | awk '/^bpool\/BOOT\/ubuntu_/ {print $1}' | sed 's/bpool\/BOOT\/ubuntu_//g')
zfs mount rpool/ROOT/ubuntu_$UUID
zfs mount bpool/BOOT/ubuntu_$UUID
zfs mount -a
mount --make-private --rbind /dev /mnt/dev
mount --make-private --rbind /proc /mnt/proc
mount --make-private --rbind /sys /mnt/sys
mount --make-private --rbind /run /mnt/run
chroot /mnt /bin/bash --login
mount -a

# Make backup of the original Jammy sources.list
cp /etc/apt/sources.list /etc/apt/sources.list.jammy

# Create a new Noble sources.list
sudo nano /etc/apt/sources.list.noble

# Fill with these contents:
deb http://us.archive.ubuntu.com/ubuntu/ noble main restricted universe multiverse
deb http://us.archive.ubuntu.com/ubuntu/ noble-updates main restricted universe multiverse
deb http://us.archive.ubuntu.com/ubuntu/ noble-backports main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu noble-security main restricted universe multiverse
# Save and exit
cp /etc/apt/sources.list.noble /etc/apt/sources.list
apt update

# Add the Grub2 2.12 ppa
add-apt-repository ppa:ubuntu-uefi-team/build

# Modify the sources.list line of the PPA to work with Jammy
nano /etc/apt/sources.list.d/ubuntu-uefi-team-ubuntu-build-jammy.list

# Change the active line to:
deb [trusted=yes] https://ppa.launchpadcontent.net/ubuntu-uefi-team/build/ubuntu/ noble main
# Save and exit nano
apt update

# Install the Grub2 packages
apt install grub-efi-amd64 grub-efi-amd64-signed
# This will pull in some needed depends of Grub2 2.12~rc1 from the Noble Repo's

# Reinstall/configure grub
grub-install --target=x86_64-efi --efi-directory=/boot/efi \
    --bootloader-id=ubuntu --recheck --no-floppy

# Update the initramfs images
update-initramfs -c -k all

# Change the repo sources back to Jammy
cp /etc/apt/sources.list.jammy /etc/apt/sources.list
apt update

# Test: (Skip if snapshots already exist...)
zfs snapshot bpool/BOOT/ubuntu_2nlhsy@20230204a
zfs snapshot bpool/BOOT/ubuntu_2nlhsy@20230204b
zfs snapshot bpool/BOOT/ubuntu_2nlhsy@20230204c

zfs list -t snapshot

# Output:
#NAME USED AVAIL REFER MOUNTPOINT
#bpool/BOOT/ubuntu_2nlhsy@20230204a 0B - 298M -
#bpool/BOOT/ubuntu_2nlhsy@20230204b 0B - 298M -
#bpool/BOOT/ubuntu_2nlhsy@20230204c 0B - 298M -

# Exit Gracefully:
exit
mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | \
    xargs -i{} umount -lf {}
zpool export -a

reboot
>>>
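
A note on the UUID= line in the log above: it assumes there is exactly one bpool/BOOT/ubuntu_* dataset. If a system has more than one, $UUID ends up holding several values and the two "zfs mount" lines misbehave. Below is a minimal sketch of a stricter version. The function name bpool_suffix is made up for illustration, and it only parses text, so nothing is imported or mounted by it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption, not from the log above).
# Reads dataset names on stdin, one per line, as printed by:
#   zfs list -H -o name -r bpool/BOOT
# Prints the <suffix> of bpool/BOOT/ubuntu_<suffix>, or fails if there
# is not exactly one such dataset.
bpool_suffix() {
    local matches count
    # Keep only names of the exact form bpool/BOOT/ubuntu_<suffix>;
    # nested datasets and snapshots are skipped.
    matches=$(sed -n 's|^bpool/BOOT/ubuntu_\([^/@]*\)$|\1|p')
    # Count non-empty lines; "|| true" keeps a zero count from being an error.
    count=$(printf '%s\n' "$matches" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one bpool/BOOT/ubuntu_* dataset, found $count" >&2
        return 1
    fi
    printf '%s\n' "$matches"
}
```

In the log it would replace the awk/sed pipeline, along the lines of: UUID=$(zfs list -H -o name -r bpool/BOOT | bpool_suffix) || exit 1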

Revision history for this message
Mike Ferreira (mafoelffen) wrote (last edit ):

So yes, confirmed.

Grub2 2.12 with those patches fixes this issue for Jammy, Mantic and Noble. We need to apply the fix to all these releases. (Please.)

Revision history for this message
Mate Kukri (mkukri) wrote (last edit ):

GRUB 2.12 isn't going into the updates channel anywhere at this point. (It's going in proposed then release on Noble soon however)

The question is, are we going to backport the fix to 2.06 and push that to the updates channel, or can this wait for the 2.12 SRU sometime after April?

It's good to confirm that this works btw, but I wouldn't recommend installing the "noble" series from that PPA on anything but noble, it can and will break things.

Revision history for this message
Mike Ferreira (mafoelffen) wrote (last edit ):

@mkukri ---
>>>
The question is, are we going to backport the fix to 2.06 and push that to the updates channel, or can this wait for the 2.12 SRU sometimes after April?
>>>
What does that mean?

Option #1 -- The cherry picked Grub2 2.12 ZFS related commit code gets applied to Grub2 2.06 as a patch?

Option #2 -- After April, when Grub2 2.12 gets published and released as a "stable release update," that it also gets released to Jammy and Mantic?

But you said it has the potential to break Jammy? I'm not seeing that in my tests yet, but I believe you.

****
Here is how I see the logic of this, but it means nothing, as I am not in the decision process, but:

For option 1, that would take time anyways, and the Noble April Release is less than 2 months. So option #2 would be about the same wait with a whole lot less work involved by everyone.

***
Until then, I have three working work-arounds for this non-booting error:

Work-around #1-- The above work-around, which you say may break some things.

Work-around #2-- I do have another work-around that involves booting from an Installer LiveUSB. Back up the content of bpool to a backup directory "somewhere" (size needed is less than 2GB; I used another USB Flash drive), and save the old UUID that was used in the old dataset name to a file. Destroy the old bpool pool. Create the new bpool with these explicit creation options:
>>>
zpool create \
    -o ashift=12 \
    -o autotrim=on \
    -o cachefile=/etc/zfs/zpool.cache \
    -o feature@async_destroy=enabled \
    -o feature@empty_bpobj=active \
    -o feature@lz4_compress=active \
    -o feature@multi_vdev_crash_dump=disabled \
    -o feature@spacemap_histogram=active \
    -o feature@enabled_txg=active \
    -o feature@hole_birth=active \
    -o feature@extensible_dataset=disabled \
    -o feature@embedded_data=active \
    -o feature@bookmarks=disabled \
    -o feature@filesystem_limits=disabled \
    -o feature@large_blocks=disabled \
    -o feature@large_dnode=disabled \
    -o feature@sha512=disabled \
    -o feature@skein=disabled \
    -o feature@edonr=disabled \
    -o feature@userobj_accounting=disabled \
    -o feature@encryption=disabled \
    -o feature@project_quota=disabled \
    -o feature@device_removal=disabled \
    -o feature@obsolete_counts=disabled \
    -o feature@zpool_checkpoint=disabled \
    -o feature@spacemap_v2=disabled \
    -o feature@allocation_classes=disabled \
    -o feature@resilver_defer=disabled \
    -o feature@bookmark_v2=disabled \
    -o feature@redaction_bookmarks=disabled \
    -o feature@redacted_datasets=disabled \
    -o feature@bookmark_written=disabled \
    -o feature@log_spacemap=disabled \
    -o feature@livelist=disabled \
    -o feature@device_rebuild=disabled \
    -o feature@zstd_compress=disabled \
    -o feature@draid=disabled \
    -o feature@zilsaxattr=disabled \
    -o feature@head_errlog=disabled \
    -o feature@blake3=disabled \
    -o feature@block_cloning=disabled \
    -o feature@vdev_zaps_v2=disabled \
    -o compatibility=grub2,ubuntu-22.04 \
    -O devices=off \
    -O acltype=posixacl \
    -O xattr=sa \
    -O compression=lz4 \
    -O normalization=formD \
    -O re...


Revision history for this message
Rick S (1fallen) wrote :

Mike and I have posted Disclaimers to the users on Jammy **Warning of potential breakage with the added PPA

Mike has also included some very straight forward work-around's with The *Warning* to users on 22.04 until the fix is pushed to the users.

Revision history for this message
Mate Kukri (mkukri) wrote :

1. Doing 2.12 backports to Jammy will happen, but it's not an overnight process. The delta from 2.06 to 2.12 is large, even though it might work on your machine and mine. We are hoping that the 2.12 release to Debian testing now, the upload to Debian -backports later, and the release to Noble in April will give us more time to see the effect in the field.

Some minor breakage on old and odd machines will happen, I know that, as we have changed the entire kernel loading mechanism going from 2.06 -> 2.12, but hopefully it can be kept as minimal as possible. I am already in the process of integrating patches for issues reported in reaction to the Debian testing roll-out.

2. The PPA breaking Jammy has a very different cause, separate from the reasons for not doing 2.12 backports right now: the wrong build OS was used. Basically:

If you only install grub-efi-* from the PPA it's less likely to break, but the problem is `grub*-common` is linked against user-space shlibs from Noble which can and will be newer than Jammy's, and ABI backwards compatibility isn't exactly always a guarantee.

The correct solution for Jammy would be to upload the same package with a slightly different version number and `jammy` as the series in d/changelog to a PPA, and then have
"deb [trusted=yes] https://ppa.launchpadcontent.net/your_user_here/ppa_here/ubuntu/ jammy main" in sources.

If you want a GRUB 2.12 build for Jammy right now, I can upload that to a PPA in my namespace.

Revision history for this message
Mate Kukri (mkukri) wrote :

Update: Instead of my name space, I have uploaded a 2.12 version to this PPA: https://launchpad.net/~ubuntu-uefi-team/+archive/ubuntu/backports-build

If it builds correctly, it can be used on Jammy with less fear by adding:
"deb [trusted=yes] https://ppa.launchpadcontent.net/ubuntu-uefi-team/backports-build/ubuntu/ jammy main"

The caveats of this being non-validated development software, and of it not being compatible with Secure Boot, are still there, but at least it should avoid the shlibs breakage. (Please note I have only uploaded grub2-unsigned, which only includes grub-efi-*, but that's all you need to test EFI changes).

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zfs-linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Julian Andres Klode (juliank) wrote :

Oh but please don't use `[trusted=yes]`, just add the repository with add-apt-repository ppa:ubuntu-uefi-team/backports-build.

Revision history for this message
Ryan C. Underwood (nemesis-icequake) wrote :

If you're using zfs-auto-snapshot, you can tell it to ignore bpool by setting the com.sun:auto-snapshot property to false on the bpool dataset:

# zfs set com.sun:auto-snapshot=false bpool
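
To double check that the setting took on every dataset under bpool, something like the sketch below can help. The property is inherited by child datasets unless one of them overrides it locally, so it is worth listing what is left. The function name auto_snapshot_not_disabled is made up for illustration; it only filters the text that "zfs get" prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of zfs-auto-snapshot).
# Reads tab-separated "name<TAB>value" lines on stdin, as printed by:
#   zfs get -H -o name,value -r -t filesystem com.sun:auto-snapshot bpool
# and prints every dataset whose value is anything other than "false"
# (an unset property shows up as "-").
auto_snapshot_not_disabled() {
    awk -F '\t' '$2 != "false" { print $1 }'
}
```

Usage would be: zfs get -H -o name,value -r -t filesystem com.sun:auto-snapshot bpool | auto_snapshot_not_disabled
Empty output means every filesystem under bpool reports "false".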

Revision history for this message
Mike Ferreira (mafoelffen) wrote (last edit ):

It worked to use that PPA without using the sources.list changes to point to Noble for Mantic and Noble... but Jammy still needs the source list from Noble to be able to pull in the needed depends. The needed depends versions are not in the jammy Repo.

The command log for the updated work-around using the "Backports build PPA"
>>>
sudo su -
zpool export -a
zpool import -N -R /mnt rpool
zpool import -N -R /mnt bpool
UUID=$(zfs list | awk '/^bpool\/BOOT\/ubuntu_/ {print $1}' | sed 's/bpool\/BOOT\/ubuntu_//g')
zfs mount rpool/ROOT/ubuntu_$UUID
zfs mount bpool/BOOT/ubuntu_$UUID
zfs mount -a
mount --make-private --rbind /dev /mnt/dev
mount --make-private --rbind /proc /mnt/proc
mount --make-private --rbind /sys /mnt/sys
mount --make-private --rbind /run /mnt/run
chroot /mnt /bin/bash --login
mount -a

# Make backup of the original Jammy sources.list
cp /etc/apt/sources.list /etc/apt/sources.list.jammy

# Create a new Noble sources.list
sudo nano /etc/apt/sources.list.noble

# Fill with these contents:
deb http://us.archive.ubuntu.com/ubuntu/ noble main restricted universe multiverse
deb http://us.archive.ubuntu.com/ubuntu/ noble-updates main restricted universe multiverse
deb http://us.archive.ubuntu.com/ubuntu/ noble-backports main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu noble-security main restricted universe multiverse
# Save and exit
cp /etc/apt/sources.list.noble /etc/apt/sources.list
apt update

# Add the "Backports build PPA" for Grub2 2.12
add-apt-repository ppa:ubuntu-uefi-team/backports-build
apt update

# If, for some reason, it gets an error saying there is no Release file, then do this
# Modify the sources.list line of the PPA to work with Jammy
nano /etc/apt/sources.list.d/ubuntu-uefi-team-ubuntu-build-jammy.list

# Change the active line to:
deb [trusted=yes] https://ppa.launchpadcontent.net/ubuntu-uefi-team/build/ubuntu/ noble main
# Save and exit nano
apt update

# Install the Grub2 packages
apt install grub-efi-amd64 grub-efi-amd64-signed

# Reinstall/configure grub
grub-install --target=x86_64-efi --efi-directory=/boot/efi \
    --bootloader-id=ubuntu --recheck --no-floppy

# Update the initramfs images
update-initramfs -c -k all

# Change the repo sources back to Jammy
cp /etc/apt/sources.list.jammy /etc/apt/sources.list
apt update

# Test: (Skip if snapshots already exist...)
zfs snapshot bpool/BOOT/ubuntu_2nlhsy@20230204a
zfs snapshot bpool/BOOT/ubuntu_2nlhsy@20230204b
zfs snapshot bpool/BOOT/ubuntu_2nlhsy@20230204c

zfs list -t snapshot

# Output:
#NAME USED AVAIL REFER MOUNTPOINT
#bpool/BOOT/ubuntu_2nlhsy@20230204a 0B - 298M -
#bpool/BOOT/ubuntu_2nlhsy@20230204b 0B - 298M -
#bpool/BOOT/ubuntu_2nlhsy@20230204c 0B - 298M -

# Exit Gracefully:
exit
mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | \
    xargs -i{} umount -lf {}
zpool export -a

reboot
>>>

Revision history for this message
Mike Ferreira (mafoelffen) wrote :

In the next few days, when I get some "spare" time... I will spin up a few test Jammy VM's to capture what depends it has to pull in from other repo's, so that the depends can be ID'ed and maybe those packages can be added to that Jammy package in this new PPA.

Otherwise, as I just found out with a User, it put him into broken packages depends hell.

Doing that while possibly bringing in those depends would simplify that... as just add the PPA and go with it.

Revision history for this message
Nils Herde (hernil) wrote :

I've had two systems taken down by what seems to be this bug now in the last few weeks. This, together with the mismatched kernel and userspace version of the ZFS tooling is really, really bad.

I do get that a ZFS root and the HWE kernel together is a rather esoteric setup, but it's all available out-of-the-box to set up from the installer and repos.

At this point any roll out of the 6.5 kernel should not be offered to users with a ZFS root and some sort of official documentation for how to recover should probably be posted somewhere.

Revision history for this message
Warren Prince (wprince) wrote :

I'm having the same problem but with mantic. I get to the grub-install line which causes an "unknown filesystem" error. Do you have any suggestions? Thx.

Revision history for this message
Ofloo (ofloo) wrote :

Remove the snapshots from bpool and it should boot again. At least, that's what I understand is causing the issue.
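
A rough sketch of how that could be done carefully, assuming bpool is imported (on a non-booting system that means from a LiveUSB, as in the command logs earlier in this report). The function name plan_bpool_snapshot_cleanup is made up for illustration. It only prints dry-run commands ("zfs destroy -n -v" reports what would be destroyed without doing it), so nothing is removed until you drop the -n yourself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads snapshot names on stdin, one per line,
# as printed by:
#   zfs list -H -t snapshot -o name -r bpool
# and prints one dry-run destroy command per bpool snapshot.
# Names that are not bpool snapshots (e.g. rpool ones) are ignored.
plan_bpool_snapshot_cleanup() {
    local snap
    while IFS= read -r snap; do
        case "$snap" in
            bpool@*|bpool/*@*)
                printf 'zfs destroy -n -v %s\n' "$snap"
                ;;
        esac
    done
}
```

Usage would be: zfs list -H -t snapshot -o name -r bpool | plan_bpool_snapshot_cleanup
Review the printed list, then rerun the lines you agree with, without -n.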

Revision history for this message
Danny (dannyp777) wrote :

Just thought I would add the salient findings from Bug #2041739.

* update-grub/grub-probe fails when the `feature@extensible_dataset` flag is enabled on bpool
* this happens when the bpool is created without the `-o compatibility=grub2,ubuntu-22.04` compatibility flags and a subsequent snapshot is taken.
* a successful work around is to backup bpool and recreate it using the `-o compatibility=grub2,ubuntu-22.04` flags
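
For anyone wanting to check their own pool against these findings, here is a minimal sketch. The function name classify_feature_state is made up for illustration; it just labels the three states ZFS reports for a feature flag. Whether a given state actually breaks grub-probe on your system is only what this report suggests, not something the snippet can prove.

```shell
#!/usr/bin/env bash
# Hypothetical helper: labels the value printed by
#   zpool get -H -o value feature@extensible_dataset bpool
# ZFS reports a feature flag as disabled, enabled, or active.
classify_feature_state() {
    case "$1" in
        disabled) echo "disabled: feature cannot be used on this pool" ;;
        enabled)  echo "enabled: allowed, but not yet in use on disk" ;;
        active)   echo "active: in use on disk" ;;
        *)        echo "unknown state: $1" >&2; return 1 ;;
    esac
}
```

Usage would be: classify_feature_state "$(zpool get -H -o value feature@extensible_dataset bpool)"
Per the findings above, "disabled" is what a bpool created with the grub2 compatibility flags is expected to show.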

Further thoughts:
* apparently the problem is still happening with fresh Ubuntu Noble Numbat installations, which implies the NN installation process is not creating bpool with the correct compatibility flags
* problem should be fixed in grub, but work-around compatibility flags works as long as they are used consistently
* I am not sure how our findings relate to the discussion around kernel/module versions. I am using the latest version of Ubuntu Mantic with kernel version 6.5.0-28-generic successfully. I also have the linux-generic-hwe-22.04-edge package installed on my system.

Revision history for this message
Danny (dannyp777) wrote :

Ignore what I said about Noble Numbat, it was actually a fresh Mantic installation that creates the bpool with incorrect compatibility flags.

Revision history for this message
Mike Ferreira (mafoelffen) wrote (last edit ):

Those compatibility flags mirror the pool create options I noted above... Doing a snapshot of bpool and Send'ing it 'somewhere' (even to a flash-drive), destroying the original bpool, recreating the new bpool, then Receive'ing the snapshot back to restore it, does fix that with less than 2GB of storage.

That is just one work-around. I've started (manually) making bpool's 3GB on ZFS-On-Root installs, to allow more room for my snapshots...

Upgrading Grub2 to 2.12, from 2.12rc and previous, fixes the problem also. There are 3-4 work-arounds listed in this report that work.

-----------------------------------------------------------------------
@wprince: You contacted me via email asking for assistance on your company's production email server being affected by this bug. Hopefully you see this. Your email address you emailed me with gets a server error where the DNS record is missing to that server. Cannot reply back unless you give me some alternate contact info. tried. Sorry. Please try again. Left you a phone message.

Revision history for this message
Mike Ferreira (mafoelffen) wrote :

@mate ---

I checked the changelogs of the current Grub2 packages for Jammy.

I don't see patches for this issue yet. The last updates for Grub2 packages for Jammy were in April, and were for ARM, and a CVE security update by you. Because that was "Security", I understand that those took precedence.

But it is now 2 months since then, with no updates/patches.

You said 2.12 was patched, but current Grub2 in Jammy is still 2.06-2ubuntu7.2 amd64...

Any expected timeline for Jammy?

Revision history for this message
Mate Kukri (mkukri) wrote : Re: [Bug 2051999] Re: Grub2 2.06 has upstream bug that results in Non-booting with ZFS after snapshot of bpool.

Hi Mike,

Backporting and SRU-ing GRUB 2.12 to Jammy is part of the roadmap for
the Ubuntu 24.10 cycle. I would like to not make a more specific public
commitment right now, but it's inside the realm of possibility that it
will happen a lot sooner than October 2024.

Mate

On Sat, May 18, 2024 at 6:40 PM Mike Ferreira
<email address hidden> wrote:
>
> @mate ---
>
> I checked the changelogs of the current Grub2 packages for Jammy.
>
> I don't see patches for this issue yet. The last updates for Grub2
> packages for Jammy were in April, and were for ARM, and a CVE security
> update by you. Because that was "Security", I understand that those took
> precedence.
>
> But it is now 2 months since then, with no updates/patches.
>
> You said 2.12 was patched, but current Grub2 in Jammy is still
> 2.06-2ubuntu7.2 amd64...
>
> Any expected timeline for Jammy?
