grub fails zfs root filesystem detection due to failed checksum

Bug #1635115 reported by Rob Starkey
This bug affects 1 person
Affects              Status     Importance  Assigned to  Milestone
grub2 (Ubuntu)       Confirmed  Undecided   Unassigned
zfs-linux (Ubuntu)   Won't Fix  Undecided   Unassigned

Bug Description

<email address hidden>:~# lsb_release -rd
Description: Ubuntu 16.04.1 LTS
Release: 16.04

<email address hidden>:~# apt-cache policy grub2-common
grub2-common:
  Installed: 2.02~beta2-36ubuntu3.2
  Candidate: 2.02~beta2-36ubuntu3.2
  Version table:
 *** 2.02~beta2-36ubuntu3.2 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     2.02~beta2-36ubuntu3 500
        500 http://us.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

I followed https://github.com/zfsonlinux/zfs/wiki/Ubuntu-16.04-Root-on-ZFS to make a ZFS native root disk system.

Installation went fine and the system works like a champ... but after a while it's time to run apt-get update; apt-get upgrade to update my kernel. The upgrade fails because GRUB can't detect its root filesystem:

<email address hidden>:~# grub-probe /
grub-probe: error: unknown filesystem.

Strange; this used to work. Adding -vv, I see:

grub-core/kern/fs.c:56: Detecting zfs...
grub-core/osdep/hostdisk.c:415: opening the device `/dev/sda1' in open_device()
grub-core/fs/zfs/zfs.c:1192: label ok 0
grub-core/osdep/hostdisk.c:394: reusing open device `/dev/sda1'
grub-core/fs/zfs/zfs.c:1007: check 2 passed
grub-core/fs/zfs/zfs.c:1018: check 3 passed
grub-core/fs/zfs/zfs.c:1025: check 4 passed
grub-core/fs/zfs/zfs.c:1035: check 6 passed
grub-core/fs/zfs/zfs.c:1043: check 7 passed
grub-core/fs/zfs/zfs.c:1054: check 8 passed
grub-core/fs/zfs/zfs.c:1064: check 9 passed
grub-core/fs/zfs/zfs.c:1086: check 11 passed
grub-core/fs/zfs/zfs.c:1112: check 10 passed
grub-core/fs/zfs/zfs.c:1128: str=com.delphix:hole_birth
grub-core/fs/zfs/zfs.c:1128: str=com.delphix:embedded_data
grub-core/fs/zfs/zfs.c:1137: check 12 passed (feature flags)
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 2048/2048
grub-core/fs/zfs/zfs.c:1898: endian = -1
grub-core/fs/zfs/zfs.c:595: dva=8, b00258
grub-core/osdep/hostdisk.c:394: reusing open device `/dev/sda1'
grub-core/fs/zfs/zfs.c:442: checksum fletcher4 verification failed
grub-core/fs/zfs/zfs.c:447: actual checksum 00000059d95adad1 00006322fba304b3 00430f321ad7f439 21e873b51c15e3bc
grub-core/fs/zfs/zfs.c:452: expected checksum 000000035ca143fb 0000062fc3a48ff8 0005b49e0d96b07e 03841abf1fed586a
grub-core/fs/zfs/zfs.c:1919: incorrect checksum
grub-core/kern/fs.c:78: zfs detection failed.
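The failure above comes from GRUB's fletcher4 verification: the block it read hashes to the "actual" value, while the pointer that referenced it records the "expected" value. To make that comparison concrete, here is a minimal Python sketch of a ZFS-style Fletcher-4 checksum, assuming the block is processed as little-endian 32-bit words with four running sums that wrap modulo 2^64. The names `fletcher4` and `verify` are hypothetical; this is an illustration, not GRUB's or ZFS's actual code.

```python
import struct

MASK64 = (1 << 64) - 1  # the four accumulators wrap modulo 2**64


def fletcher4(data: bytes) -> tuple:
    """Sketch of a Fletcher-4 checksum over little-endian 32-bit words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64  # plain sum of words
        b = (b + a) & MASK64     # sum of running sums
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)


def verify(data: bytes, expected: tuple) -> bool:
    """Hypothetical helper: the comparison that fails with 'verification failed'."""
    return fletcher4(data) == tuple(expected)
```

If even one of the four 64-bit words differs, as all four do in the log above, the block GRUB read is not the block that the checksum was computed over, and detection is abandoned.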

Now here's the fun part. Rebooting the system will fix the checksum issue for a short amount of time. Here is what it looks like immediately after a reboot:

<email address hidden>:~# grub-probe /
zfs

grub-core/kern/fs.c:56: Detecting zfs...
grub-core/osdep/hostdisk.c:415: opening the device `/dev/sda1' in open_device()
grub-core/fs/zfs/zfs.c:1192: label ok 0
grub-core/osdep/hostdisk.c:394: reusing open device `/dev/sda1'
grub-core/fs/zfs/zfs.c:1007: check 2 passed
grub-core/fs/zfs/zfs.c:1018: check 3 passed
grub-core/fs/zfs/zfs.c:1025: check 4 passed
grub-core/fs/zfs/zfs.c:1035: check 6 passed
grub-core/fs/zfs/zfs.c:1043: check 7 passed
grub-core/fs/zfs/zfs.c:1054: check 8 passed
grub-core/fs/zfs/zfs.c:1064: check 9 passed
grub-core/fs/zfs/zfs.c:1086: check 11 passed
grub-core/fs/zfs/zfs.c:1112: check 10 passed
grub-core/fs/zfs/zfs.c:1128: str=com.delphix:hole_birth
grub-core/fs/zfs/zfs.c:1128: str=com.delphix:embedded_data
grub-core/fs/zfs/zfs.c:1137: check 12 passed (feature flags)
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 2048/2048
grub-core/fs/zfs/zfs.c:1898: endian = -1
grub-core/fs/zfs/zfs.c:595: dva=8, c001c0
grub-core/osdep/hostdisk.c:394: reusing open device `/dev/sda1'
grub-core/fs/zfs/zfs.c:2678: endian = -1, blkid=0
grub-core/fs/zfs/zfs.c:2020: endian = -1
grub-core/fs/zfs/zfs.c:2051: endian = -1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = -1
grub-core/fs/zfs/zfs.c:595: dva=8, c001b8
grub-core/osdep/hostdisk.c:394: reusing open device `/dev/sda1'
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2046: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, c001a0
grub-core/fs/zfs/zfs.c:2682: alive
grub-core/fs/zfs/zfs.c:2493: looking for 'features_for_read'
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2046: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, 200
grub-core/osdep/hostdisk.c:394: reusing open device `/dev/sda1'
grub-core/fs/zfs/zfs.c:2503: zap read
grub-core/fs/zfs/zfs.c:2516: fat zap
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2046: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, bb0c48
grub-core/osdep/hostdisk.c:394: reusing open device `/dev/sda1'
grub-core/fs/zfs/zfs.c:2276: fzap: length 18
grub-core/fs/zfs/zfs.c:2520: returned 0
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2046: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 512/512
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, 70
grub-core/osdep/hostdisk.c:394: reusing open device `/dev/sda1'
grub-core/fs/zfs/zfs.c:2109: zap: name = org.illumos:lz4_compress, value = 1, cd = 0
grub-core/fs/zfs/zfs.c:2109: zap: name = com.delphix:hole_birth, value = 1, cd = 0
grub-core/fs/zfs/zfs.c:2109: zap: name = com.delphix:extensible_dataset, value = 0, cd = 0
grub-core/fs/zfs/zfs.c:2109: zap: name = com.delphix:embedded_data, value = 1, cd = 0
grub-core/fs/zfs/zfs.c:2109: zap: name = org.open-zfs:large_blocks, value = 0, cd = 0
grub-core/fs/zfs/zfs.c:2109: zap: name = , value = 0, cd = 0
grub-core/fs/zfs/zfs.c:2109: zap: name = , value = 0, cd = 0
grub-core/fs/zfs/zfs.c:3250: alive
grub-core/fs/zfs/zfs.c:3062: endian = 1
grub-core/fs/zfs/zfs.c:2678: endian = 1, blkid=0
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2051: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, c001b8
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2046: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, c001a0
grub-core/fs/zfs/zfs.c:2682: alive
grub-core/fs/zfs/zfs.c:3069: alive
grub-core/fs/zfs/zfs.c:2493: looking for 'root_dataset'
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2046: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, 200
grub-core/fs/zfs/zfs.c:2503: zap read
grub-core/fs/zfs/zfs.c:2516: fat zap
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2046: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, bb0c48
grub-core/fs/zfs/zfs.c:2276: fzap: length 13
grub-core/fs/zfs/zfs.c:2520: returned 0
grub-core/fs/zfs/zfs.c:3075: alive
grub-core/fs/zfs/zfs.c:3081: alive
grub-core/fs/zfs/zfs.c:3259: alive
grub-core/fs/zfs/zfs.c:3263: endian = 0
grub-core/fs/zfs/zfs.c:3272: endian = 1
grub-core/fs/zfs/zfs.c:3127: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 2048/2048
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, 60
grub-core/fs/zfs/zfs.c:3352: endian = 1
grub-core/fs/zfs/zfs.c:3127: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 2048/2048
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, 60
grub-core/fs/zfs/zfs.c:2678: endian = 1, blkid=0
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2051: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, 50
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2051: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, 48
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2051: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, 40
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2051: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, 38
grub-core/osdep/hostdisk.c:394: reusing open device `/dev/sda1'
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2051: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, 30
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2051: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, 28
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2046: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 16384/4096
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, 20
grub-core/fs/zfs/zfs.c:2682: alive
grub-core/fs/zfs/zfs.c:2493: looking for 'ROOT'
grub-core/fs/zfs/zfs.c:2020: endian = 1
grub-core/fs/zfs/zfs.c:2046: endian = 1
grub-core/fs/zfs/zfs.c:1875: zio_read: E 0: size 512/512
grub-core/fs/zfs/zfs.c:1898: endian = 1
grub-core/fs/zfs/zfs.c:595: dva=8, 0
grub-core/fs/zfs/zfs.c:2503: zap read
grub-core/fs/zfs/zfs.c:2507: micro zap
grub-core/fs/zfs/zfs.c:2510: returned 0
zfs
grub-core/kern/disk.c:295: Closing `hostdisk//dev/sda'.
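Comparing the failing and working traces by hand is tedious; one visible difference is that the first `dva=` value differs between the two runs (b00258 versus c001c0). The Python sketch below is a hypothetical helper, assuming the full `grub-probe -vv /` output (stdout and stderr) was saved to a text file. It pulls out whether ZFS detection succeeded, the `dva=` values in order, and any actual/expected checksum words. The success test is only a heuristic based on the messages shown in this report.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Patterns match the debug lines printed in this report's traces.
DVA_RE = re.compile(r"dva=([0-9a-f]+), ([0-9a-f]+)")
CHECKSUM_RE = re.compile(
    r"(actual|expected) checksum ([0-9a-f]{16}(?: [0-9a-f]{16}){3})"
)


@dataclass
class ProbeSummary:
    detected: bool
    dvas: List[str] = field(default_factory=list)  # second value after 'dva='
    actual: Optional[List[int]] = None
    expected: Optional[List[int]] = None


def summarize(log: str) -> ProbeSummary:
    """Heuristic summary of a saved 'grub-probe -vv' trace."""
    detected = "Detecting zfs" in log and "zfs detection failed" not in log
    summary = ProbeSummary(detected=detected)
    for line in log.splitlines():
        m = DVA_RE.search(line)
        if m:
            summary.dvas.append(m.group(2))
        m = CHECKSUM_RE.search(line)
        if m:
            words = [int(w, 16) for w in m.group(2).split()]
            setattr(summary, m.group(1), words)  # sets .actual or .expected
    return summary
```

Running `summarize` on a trace saved right after boot and on one saved later would show which block location was read first and, on failure, the mismatching checksum words side by side.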

I see this issue on many of the servers I've set up this way.

Any help would be appreciated.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: grub2-common 2.02~beta2-36ubuntu3.2
ProcVersionSignature: Ubuntu 4.4.0-43.63-generic 4.4.21
Uname: Linux 4.4.0-43-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
Date: Thu Oct 20 04:36:35 2016
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: grub2
UpgradeStatus: No upgrade log present (probably fresh install)

Tags: grub zfs
Rob Starkey (rstarkey-m) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in grub2 (Ubuntu):
status: New → Confirmed
Timo Aaltonen (tjaalton) wrote :

My server (ZFS on root) fails to boot because of that checksum error after the latest updates. How can I recover from that?

Timo Aaltonen (tjaalton) wrote :

Running 'ls (hd0,1)/' from the GRUB rescue prompt reports "unknown filesystem".

Richard Laager (rlaager) wrote :

In case it's related, can you confirm the kernel version you are using now, and the kernel version from before the upgrade?

Timo Aaltonen (tjaalton) wrote :

Mine was (still) running 4.4.0-22 and was upgraded to the latest (4.4.0-45), which failed to boot.

Timo Aaltonen (tjaalton) wrote :

This could be related; nothing else stands out in the update log I still had in my terminal scrollback:

Setting up grub-legacy-ec2 (0.7.8-1-g3705bb5-0ubuntu1~16.04.3) ...
Searching for GRUB installation directory ... found: /boot/grub
Cannot determine root device. Assuming /dev/hda1
This error is probably caused by an invalid /etc/fstab
Searching for default file ... found: /boot/grub/default
Testing for an existing GRUB menu.lst file ... found: /boot/grub/menu.lst
Searching for splash image ... none found, skipping ...
Found kernel: /boot/vmlinuz-4.4.0-36-generic
Found kernel: /boot/vmlinuz-4.4.0-24-generic
Found kernel: /boot/vmlinuz-4.4.0-22-generic
Found kernel: /boot/vmlinuz-4.4.0-45-generic
Found kernel: /boot/vmlinuz-4.4.0-42-generic
Found kernel: /boot/vmlinuz-4.4.0-38-generic
Found kernel: /boot/vmlinuz-4.4.0-36-generic
Found kernel: /boot/vmlinuz-4.4.0-24-generic
Found kernel: /boot/vmlinuz-4.4.0-22-generic
Replacing config file /run/grub/menu.lst with new version
Updating /boot/grub/menu.lst ... done

Maybe my bug isn't a duplicate of the original one?

Timo Aaltonen (tjaalton) wrote :

When attempting to rescue it from a live session, I get this:

grub-probe /
error: failed to get canonical path of '/dev/ata-WDC_....-part2'

That looks odd and wrong; shouldn't the path be under /dev/disk/by-id or similar?

Rob Starkey (rstarkey-m) wrote :

Hey Timo,

I don't think your issue is related to the one I raised when I opened this bug. In my case GRUB isn't having a device path issue; it's having checksum issues.

Rob Starkey (rstarkey-m) wrote :

Kernel versions were 4.4.0-42 upgrading to 4.4.0-45.

The checksum mismatch happens with every kernel version, so I don't think it's kernel-specific. It looks like a GRUB or ZoL (ZFS on Linux) issue.

Timo Aaltonen (tjaalton) wrote :

Well, I get the checksum error when GRUB loads, so I can't even boot. I'll try to get help on IRC.

Timo Aaltonen (tjaalton) wrote :

Reinstalling GRUB on the disks fixed my issue, so it was caused by something other than what this bug is about. Sorry about the noise.

Colin Ian King (colin-king) wrote :

Just to add a note that this feature is not currently supported by Ubuntu.

Changed in zfs-linux (Ubuntu):
status: New → Won't Fix
Marcos Alano (mhalano) wrote :

I'm having the same problem with Ubuntu 20.04 after updating my kernel.
