zfs-initramfs: zfs import rpool fails - too early

Bug #1905160 reported by Jens Elkner
This bug affects 1 person
Affects             Status  Importance  Assigned to  Milestone
zfs-linux (Ubuntu)  New     Undecided   Unassigned

Bug Description

/usr/share/initramfs-tools/scripts/zfs::import_pool() tries to import the root pool (in my case named rpool) before the related devices are available, i.e. before the SCSI scan has finished.

I use the following Q&D workaround to fix it:
--- /usr/share/initramfs-tools/scripts/zfs.orig 2020-08-18 11:10:41.000000000 +0200
+++ /usr/share/initramfs-tools/scripts/zfs 2020-11-22 09:03:10.199997698 +0100
@@ -217,12 +217,21 @@
 import_pool()
 {
    local pool="$1"
-    local dirs dir
+    local dirs dir T=30

    # Verify that the pool isn't already imported
    # Make as sure as we can to not require '-f' to import.
    "${ZPOOL}" get name,guid -o value -H 2>/dev/null | grep -Fxq "$pool" && return 0

+    # Make sure, that at least rpool devices are available. Otherwise
+    # zfs import may fail for "no" reason.
+    while [ ! -e /dev/chassis/SYS/DOM0p2 ]; do
+        sleep 1
+        let T-=1
+        [ $T -lt 0 ] && break
+    done
+    /bin/udevadm settle
+
    # For backwards compatibility, make sure that ZPOOL_IMPORT_PATH is set
    # to something we can use later with the real import(s). We want to
    # make sure we find all by* dirs, BUT by-vdev should be first (if it

###

/dev/chassis/SYS/DOM0p2 is one device of the rpool 3-way mirror [the others are /dev/chassis/SYS/DOM1p2 and /dev/nvme0n1p2]. The DOM* devices are SM SuperDOMs sitting on the */ata[56]/* bus, but they get handled as SCSI devices as well. Unfortunately, things like 'udevadm settle', 'udevadm settle --exit-if-exists=/dev/chassis/SYS/DOM0p2', or 'udevadm trigger --verbose --type=devices --subsystem-match=scsi_disk' did not fix the problem. I am not sure whether there is a way to say "wait until the SCSI subsystem [scan] has finished".
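
The patch above waits for only one of the three mirror devices. A slightly more general variant of the same polling idea could wait for all of them. This is only a minimal sketch, not part of the shipped zfs script: the function name wait_for_devices and its argument order are made up for illustration, and it assumes a POSIX sh with 'sleep' available, as in the initramfs.

```shell
#!/bin/sh
# Hypothetical helper (not part of zfs-initramfs):
#   wait_for_devices TIMEOUT PATH...
# Polls once per second until every PATH exists.
# Returns 0 when all paths exist, 1 if TIMEOUT seconds pass first.
# Note: plain POSIX sh has no 'local', so these variables are global.
wait_for_devices()
{
    timeout="$1"
    shift
    while :; do
        missing=""
        for dev in "$@"; do
            [ -e "$dev" ] || missing="$missing $dev"
        done
        # Everything is present: done.
        [ -z "$missing" ] && return 0
        # Time budget used up: give up.
        [ "$timeout" -le 0 ] && break
        sleep 1
        timeout=$((timeout - 1))
    done
    echo "wait_for_devices: still missing:$missing" >&2
    return 1
}
```

On this machine the call would look like 'wait_for_devices 30 /dev/chassis/SYS/DOM0p2 /dev/chassis/SYS/DOM1p2 /dev/nvme0n1p2'. Like the patch, it only papers over the timing problem: the device paths are hard-coded and the import is still attempted after the timeout.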

To get a better idea I'll attach 2 files: xxxlong-fail.out shows the debug output when booting the vanilla focal, xxxlong-patched.out shows the debug output with the patch above applied.

BTW: Because it takes Ubuntu ~ 29 min to boot when the 2nd pool with 44 disks is ZoL formatted, I disabled the OPROM usage of the related SAS HBA in the UEFI BIOS (so that the disks cannot be seen), which brings it back to ~ 1:45 min until the kernel actually starts. I guess that is why Linux needs a little more time to initialize the HBA, and why 'zfs import ...' starts too early. However, a system boot time of 30+ min is unacceptable, so letting the BIOS initialize the HBA is not an option. I guess that's a Linux EFI bootloader problem: Solaris did not have any problems on this machine, even with the HBA OPROM enabled ...

lotuspsychje (lotuspsychje) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. Please execute the following command only once, as it will automatically gather debugging information, in a terminal:
apport-collect 1905160

When reporting bugs in the future please use apport by using 'ubuntu-bug' and the name of the package affected. You can learn more about this functionality at https://wiki.ubuntu.com/ReportingBugs.

Jens Elkner (jelmd) wrote :

There is no apport* installed on the machine.
