grub stuck on loading kernel, fails to ls zfs and swap partitions

Bug #1867542 reported by Andreas Hasenack on 2020-03-15
This bug affects 1 person
Affects: grub2 (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

I did a fresh install on a test laptop of 20.04 a while back, and after today's update, it no longer boots. Today's update included 2.31-0ubuntu6, but other 20.04 machines of mine also applied that and didn't fail, so I can't really be sure it's the cause.

I also have zsys installed, and I noticed there are many snapshots of my datasets, and grub has a new menu entry about history.

All that being said, I'm still troubleshooting and so far these are the facts:

- partition layout:
Device        Start       End   Sectors   Size Type
/dev/sda1      2048   1050623   1048576   512M EFI System
/dev/sda2   1050624   5244927   4194304     2G Linux swap
/dev/sda3   5244928   9439231   4194304     2G Solaris boot
/dev/sda4   9439232 937703054 928263823 442.6G Solaris root

- grub hangs when doing an ls on (hd0,gpt2), which is swap
- grub hangs when doing an ls on (hd0,gpt4).
- zfs-info (from grub's command line) on gpt3 is happy. It identifies it as bpool
- zfs-info on gpt4 complains it has unsupported features, but does not hang
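
To make the mapping between the GRUB device names and the fdisk rows above explicit, here is a minimal Python sketch. It assumes that GRUB's hd0 is the same disk as /dev/sda and that GPT partition numbers match the /dev/sdaN suffixes, which is what the report implies but does not state. The function name grub_names is made up for illustration.

```python
# Partition rows copied from the fdisk output in the report.
FDISK_ROWS = """\
/dev/sda1 2048 1050623 1048576 512M EFI System
/dev/sda2 1050624 5244927 4194304 2G Linux swap
/dev/sda3 5244928 9439231 4194304 2G Solaris boot
/dev/sda4 9439232 937703054 928263823 442.6G Solaris root
"""


def grub_names(rows: str, disk: str = "hd0") -> dict[str, str]:
    """Map GRUB names such as (hd0,gpt2) to the fdisk Type column.

    Assumption: `disk` is the GRUB name of /dev/sda, and GPT partition
    numbers equal the numeric suffix of the Linux device name.
    """
    mapping = {}
    for line in rows.splitlines():
        # Five whitespace-separated fields, then the free-form type name.
        device, start, end, sectors, _size, ptype = line.split(maxsplit=5)
        # Sanity check: the sector count must equal end - start + 1.
        assert int(end) - int(start) + 1 == int(sectors)
        number = device.removeprefix("/dev/sda")
        mapping[f"({disk},gpt{number})"] = ptype
    return mapping


if __name__ == "__main__":
    for name, ptype in grub_names(FDISK_ROWS).items():
        print(name, ptype)
```

Under those assumptions, the two partitions where ls hangs are the swap partition and the "Solaris root" partition, while the "Solaris boot" partition (bpool) is the one zfs-info reads cleanly.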

When booting the system, it hangs right after I select a menu entry in grub, regardless of which one. Even the history ones from zfs hang, although I didn't try all. It looks just like the hang on the simple ls command.

This is grub2 2.04-1ubuntu22, and zsys 0.4.1. I'll see if I can run collect from inside the mounted system in a chroot from a rescue image, and attach info to this log.
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu20
Architecture: amd64
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2020-02-29 (15 days ago)
InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Alpha amd64 (20200228)
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: grub2-common 2.04-1ubuntu22
PackageArchitecture: amd64
ProcVersionSignature: Ubuntu 5.4.0-14.17-generic 5.4.18
Tags: focal
Uname: Linux 5.4.0-14-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True

Andreas Hasenack (ahasenack) wrote :

photo showing the hang, and where it works

apport information

description: updated
tags: added: apport-collected focal

summary: - grub stuck on loading kernel, fails to ls zfs partition
+ grub stuck on loading kernel, fails to ls zfs and swap partitions
Andreas Hasenack (ahasenack) wrote :

If I select advanced, then safe mode, all I see is

Loading Linux 5.4.0-14-generic ...
<cursor>

I left it like that for about 30 minutes and nothing changed. Ctrl-Alt-Del doesn't work either, nor does the SysRq reboot.

Andreas Hasenack (ahasenack) wrote :

Another important bit of information is that rpool (NOT bpool) has zfs encryption enabled. This was working just fine: I got a prompt for the password during boot (graphical even). But now the kernel doesn't even load.

Andreas Hasenack (ahasenack) wrote :

I just ran dist-upgrade, fetched the latest focal (not proposed) updates, ran update-grub and grub-install, and got the same result.

tags: added: champagne
Andreas Hasenack (ahasenack) wrote :

I removed all the zsys snapshots (there were about 17), and then it booted fine. grub must have some trouble when there are many zfs snapshots. I first tried removing only the last one and rebooting, and it still wouldn't boot, but in that attempt I didn't run update-grub, update the initramfs, or reinstall grub.

Note that update-grub was listing the kernels from all those snapshots.
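
For anyone wanting to check how many snapshots have piled up on each dataset, here is a rough Python sketch. It only parses text in the format printed by `zfs list -H -t snapshot -o name` (one `dataset@snapshot` name per line). The dataset and snapshot names in SAMPLE are invented placeholders, not taken from the affected machine, and the function name is hypothetical.

```python
from collections import Counter


def snapshots_per_dataset(zfs_list_output: str) -> Counter:
    """Count snapshots per dataset from `zfs list -H -t snapshot -o name` text."""
    counts = Counter()
    for line in zfs_list_output.splitlines():
        name = line.strip()
        if "@" not in name:
            continue  # skip blank lines and anything that is not a snapshot
        dataset, _snapshot = name.split("@", 1)
        counts[dataset] += 1
    return counts


# Hypothetical sample input; real names on a zsys system will differ.
SAMPLE = """\
bpool/BOOT/ubuntu_example@autozsys_aaa
bpool/BOOT/ubuntu_example@autozsys_bbb
rpool/ROOT/ubuntu_example@autozsys_aaa
"""

if __name__ == "__main__":
    for dataset, count in snapshots_per_dataset(SAMPLE).most_common():
        print(count, dataset)
```

On a live system the input could come from `subprocess.run(["zfs", "list", "-H", "-t", "snapshot", "-o", "name"], capture_output=True, text=True).stdout`. This only counts snapshots; it says nothing about how many of them grub can handle before it hangs.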

Steve Langasek (vorlon) wrote :

There is definitely a bug in grub here if it's hanging, but it seems like zsys should be doing a better job of garbage collecting old snapshots instead of letting this list grow to the point that it breaks grub.

Didier Roche (didrocks) wrote :

The ZSys part of the issue (garbage collection not being aggressive enough when reaching the bpool or rpool size limit) is handled in bug #1876334.

I’ll just add a reference there and remove the ZSys task instead of marking this as a duplicate, so that the foundation team can handle the grub side of it.

no longer affects: zsys (Ubuntu)
tags: added: rls-gg-notfixing
removed: champagne