grub-probe fails with failed to get canonical path of rpool with ZFS

Bug #1874304 reported by Kevin Menard
This bug affects 5 people
Affects: zsys (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

I've been running Ubuntu 20.04 daily builds for about a month. Within the past several days, something broke with grub where it is no longer able to probe my ZFS datasets in order to build up its menu. This had been working fine, aside from the issue #1867007 I previously filed, so I believe this is a regression. As a result of it, I'm unable to update my kernel.

I noticed it when trying to perform a dist-upgrade, but it's reproducible by running `update-grub` on its own:

❯ sudo update-grub
[sudo] password for nirvdrum:
/usr/sbin/grub-probe: error: failed to get canonical path of `rpool/ROOT/ubuntu_bp7ow2'.

❯ zsysctl list
ID ZSys Last Used
-- ---- ---------
rpool/ROOT/ubuntu_bp7ow2 true current

I realize that's not much information to go off of. Please let me know what other diagnostic information you would need.
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu27
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: MATE
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2020-03-12 (46 days ago)
InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Alpha amd64 (20200309)
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: zsys 0.4.5
PackageArchitecture: amd64
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/usr/bin/fish
ProcKernelCmdLine: BOOT_IMAGE=/BOOT/ubuntu_bp7ow2@/vmlinuz-5.4.0-18-generic root=ZFS=rpool/ROOT/ubuntu_bp7ow2 ro quiet splash
ProcVersionSignature: Ubuntu 5.4.0-18.22-generic 5.4.24
RelatedPackageVersions:
 zfs-initramfs 0.8.3-1ubuntu12
 zfsutils-linux 0.8.3-1ubuntu12
Tags: focal
Uname: Linux 5.4.0-18-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip docker lpadmin lxd plugdev sambashare sudo
ZFSImportedPools:
 NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
 rpool 920G 458G 462G - - 21% 49% 1.00x DEGRADED -
ZFSListcache-bpool:
 bpool /boot off on on off on off on off - none
 bpool/BOOT none off on on off on off on off - none
 bpool/BOOT/ubuntu_bp7ow2 /boot on on on off on off on off - none
ZSYSDump: Error: command ['zsysctl', 'service', 'dump'] failed with exit code 1: level=error msg="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
_MarkForUpload: True

Kevin Menard (nirvdrum)
description: updated
Didier Roche-Tolomelli (didrocks) wrote :

Hey,

It's a little hard to debug because you opened the bug without the ubuntu-bug tool, which collects a lot of information (you can run apport-collect to attach more of it).

Some lines of thought to get this going:
- do you mind checking that zfs-initramfs and zfsutils-linux are both installed?
- in general, can you list the state of all zfs packages? (dpkg -l "*zfs*")

This error usually means grub was not able to load the zfs module (we should probably fail earlier rather than letting grub itself fail). However, you do have the module loaded, since zsysctl is returning something, which is puzzling…

Kevin Menard (nirvdrum) wrote :

Sorry about not using apport. I wasn't sure I was picking the correct package and didn't want to send over a bunch of useless logs. But, if there's something in particular you'd like, I'm happy to collect it.

As for the package listing:

❯ dpkg -l "*zfs*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=================-===============-============-==========================================================
un libguestfs-zfs <none> <none> (no description available)
un libzfs2 <none> <none> (no description available)
ii libzfs2linux 0.8.3-1ubuntu12 amd64 OpenZFS filesystem library for Linux
un zfs <none> <none> (no description available)
ii zfs-auto-snapshot 1.2.4-2 all ZFS automatic snapshot service
un zfs-dkms <none> <none> (no description available)
un zfs-dracut <none> <none> (no description available)
un zfs-fuse <none> <none> (no description available)
ii zfs-initramfs 0.8.3-1ubuntu12 amd64 OpenZFS root filesystem capabilities for Linux - initramfs
un zfs-modules <none> <none> (no description available)
ii zfs-zed 0.8.3-1ubuntu12 amd64 OpenZFS Event Daemon
un zfsutils <none> <none> (no description available)
ii zfsutils-linux 0.8.3-1ubuntu12 amd64 command-line tools to manage OpenZFS filesystems

tags: added: zfs
Jean-Baptiste Lallement (jibel) wrote :

In order to understand which call fails, could you edit the file /usr/sbin/grub-mkconfig and add 'set -x' just under 'set -e', as follows:

========================
#! /bin/sh
set -e
set -x

# Generate grub.cfg by inspecting /boot contents.
[...]
========================

Then run update-grub with:
$ sudo update-grub 2>&1|tee /tmp/grub.log

and attach the resulting file /tmp/grub.log

Jean-Baptiste Lallement (jibel) wrote :

Could you also run the following command:

$ apport-collect 1874304

It'll attach logs that'll help us troubleshoot this issue.

Thanks.

affects: grub2 (Ubuntu) → zsys (Ubuntu)
Changed in zsys (Ubuntu):
status: New → Incomplete
Kevin Menard (nirvdrum) wrote :

❯ cat /tmp/grub.log

+ prefix=/usr
+ exec_prefix=/usr
+ datarootdir=/usr/share
+ prefix=/usr
+ exec_prefix=/usr
+ sbindir=/usr/sbin
+ bindir=/usr/bin
+ sysconfdir=/etc
+ PACKAGE_NAME=GRUB
+ PACKAGE_VERSION=2.04-1ubuntu26
+ host_os=linux-gnu
+ datadir=/usr/share
+ [ x = x ]
+ pkgdatadir=/usr/share/grub
+ export pkgdatadir
+ grub_cfg=
+ grub_mkconfig_dir=/etc/grub.d
+ basename /usr/sbin/grub-mkconfig
+ self=grub-mkconfig
+ grub_probe=/usr/sbin/grub-probe
+ grub_file=/usr/bin/grub-file
+ grub_editenv=/usr/bin/grub-editenv
+ grub_script_check=/usr/bin/grub-script-check
+ export TEXTDOMAIN=grub
+ export TEXTDOMAINDIR=/usr/share/locale
+ . /usr/share/grub/grub-mkconfig_lib
+ prefix=/usr
+ exec_prefix=/usr
+ datarootdir=/usr/share
+ datadir=/usr/share
+ bindir=/usr/bin
+ sbindir=/usr/sbin
+ [ x/usr/share/grub = x ]
+ test x/usr/sbin/grub-probe = x
+ test x/usr/bin/grub-file = x
+ test x = x
+ grub_mkrelpath=/usr/bin/grub-mkrelpath
+ which gettext
+ :
+ grub_tab=
+ test 2 -gt 0
+ option=-o
+ shift
+ argument -o /boot/grub/grub.cfg
+ opt=-o
+ shift
+ test 1 -eq 0
+ echo /boot/grub/grub.cfg
+ grub_cfg=/boot/grub/grub.cfg
+ shift
+ test 0 -gt 0
+ fgrep -qs ${GRUB_PREFIX}/video.lst /etc/grub.d/00_header
+ [ x = x ]
+ id -u
+ EUID=0
+ [ 0 != 0 ]
+ set /usr/sbin/grub-probe dummy
+ test -f /usr/sbin/grub-probe
+ :
+ /usr/sbin/grub-probe --target=device /
/usr/sbin/grub-probe: error: failed to get canonical path of `rpool/ROOT/ubuntu_bp7ow2'.
+ GRUB_DEVICE=
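
In a trace like the one above, the failing call is the last traced command before the error line. As a minimal sketch, assuming the log uses the `set -x` format shown here (traced commands prefixed with `+ `), the helper below pulls that command out of the log. The name `last_cmd_before_error` is hypothetical and not part of grub.

```shell
# Hypothetical helper: print the last `set -x` trace line that precedes
# the first line containing "error:" in a grub-mkconfig trace log.
last_cmd_before_error() {
    awk '
        /error:/ { print last; exit }   # stop at the first error line
        /^\+ /   { last = $0 }          # remember the most recent traced command
    ' "$1"
}

# Usage:
#   last_cmd_before_error /tmp/grub.log
```

Run against the log above, this should point at the `grub-probe --target=device /` call, which can then be re-run by hand.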

tags: added: apport-collected focal
description: updated
Kevin Menard (nirvdrum) wrote : apport information (one attachment per comment)

- Dependencies.txt
- Grub.cfg.txt
- Mounts.txt
- MountsGenerated.txt
- ProcCpuinfoMinimal.txt
- SystemdDefaultUnitsState.txt
- SystemdFailedUnits.txt
- ZFSDatasets.txt
- ZFSListcache-rpool.txt
- ZFSModules.txt
- ZFSMounts.txt
- ZFSPoolCache.gz
- ZFSPoolsStatus.txt
- ZSYSJournal.txt

Changed in zsys (Ubuntu):
status: Incomplete → Confirmed
Jean-Baptiste Lallement (jibel) wrote :

Thanks for the logs

The error happens in grub-mkconfig itself, much earlier than any zfs-specific grub script.

One of the main issues on your system that could lead to this error is the corrupted rpool:

 NAME STATE READ WRITE CKSUM
 rpool DEGRADED 0 0 0
   nvme0n1p4 DEGRADED 0 0 32 too many errors

You should start by fixing it and see whether that resolves the problem.

Then, bpool is not mounted.
Something is also taking automated snapshots, which duplicates the automated snapshot feature of zsys.
docker + zfs creates a lot of datasets, which may lead to timeouts in zsys.

Please start by fixing rpool and see if it improves the situation.
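
To see at a glance which pools need attention before re-running update-grub, the output of `zpool list -H -o name,health` can be filtered. This is a minimal sketch, assuming the tab-separated output that `-H` produces; `unhealthy_pools` is a hypothetical helper name.

```shell
# Hypothetical helper: read "name<TAB>health" lines on stdin
# (the format produced by `zpool list -H -o name,health`) and
# print the name of every pool whose health is not ONLINE.
unhealthy_pools() {
    awk -F '\t' '$2 != "ONLINE" { print $1 }'
}

# Usage on a live system:
#   zpool list -H -o name,health | unhealthy_pools
```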

Changed in zsys (Ubuntu):
status: Confirmed → Incomplete
Kevin Menard (nirvdrum) wrote :

I'm unfortunately not going to be able to repair the pool. Since grub-mkconfig was blocking all apt package installation, I simply added "exit 0" to the top of the script to get things working. But this was a bad time to discover my backups weren't working. That's my fault for not verifying them. I used to have a manually created ZFS root, built following the ZoL documentation, and replicated it to a FreeNAS system (FreeBSD 11.3). This time I let Ubuntu create the root, and apparently it enabled feature flags that FreeBSD doesn't support, so my replication ceased working.

So it looks like I'm just going to have to recreate the pool. I'll leave it up to you whether to close this issue. I do think it'd be nice if a failing grub-mkconfig didn't leave apt in a state where nothing can be added or removed until grub-mkconfig runs successfully. It may very well be the case that fixing the pool requires fetching another software package.

Launchpad Janitor (janitor) wrote :

[Expired for zsys (Ubuntu) because there has been no activity for 60 days.]

Changed in zsys (Ubuntu):
status: Incomplete → Expired
Ajeet S (e5081) wrote :

This is the status of rpool. The error persists even after adding the missing files:
$ sudo zpool status rpool -v
pool: rpool
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: scrub repaired 0B in 0 days 00:05:52 with 5 errors on Sun Dec 10 18:49:50 2023
config:

NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
7b079781-02d8-b74f-a11f-4171bcb642d1 DEGRADED 16 0 5 too many errors

errors: Permanent errors have been detected in the following files:

rpool/ROOT/ubuntu_o0g4qe@autozsys_od3hsc:/usr/lib/x86_64-linux-gnu/libsamba-errors.so.1
rpool/ROOT/ubuntu_o0g4qe@autozsys_od3hsc:/usr/lib/x86_64-linux-gnu/libsmbconf.so.0
rpool/ROOT/ubuntu_o0g4qe@autozsys_od3hsc:/usr/lib/x86_64-linux-gnu/samba/libndr-samba4.so.0

Najib Muhammad (najib632) wrote :

I found a way to fix it:
$ sudo zpool status rpool -v

or

$ ZPOOL_VDEV_NAME_PATH=1 zpool status

Then copy the device name listed under "rpool" in the config section and pass it to zpool clear:

$ sudo zpool clear rpool /dev/disk/by-partuuid/{disk-partuuid}

In your case it should be:

$ sudo zpool clear rpool /dev/disk/by-partuuid/7b079781-02d8-b74f-a11f-4171bcb642d1
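
The lookup step above can be sketched as a small filter over `zpool status` output. This is only an illustration, assuming the column layout shown earlier in this report (NAME STATE READ WRITE CKSUM); `degraded_vdevs` is a hypothetical helper name. Note that `zpool clear` resets the device error counters; it does not by itself restore files listed under permanent errors.

```shell
# Hypothetical helper: read `zpool status <pool>` output on stdin and print
# the vdevs listed as DEGRADED, skipping the pool's own summary row.
# NF >= 5 keeps only config-table rows (NAME STATE READ WRITE CKSUM),
# which excludes the two-field "state: DEGRADED" header line.
degraded_vdevs() {
    awk -v pool="$1" '$2 == "DEGRADED" && $1 != pool && NF >= 5 { print $1 }'
}

# Print (do not run) one clear command per degraded vdev:
#   sudo zpool status rpool | degraded_vdevs rpool |
#       while read -r dev; do echo "sudo zpool clear rpool $dev"; done
```

Printing the command instead of running it leaves a chance to check the device name before touching the pool.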
