grub-probe for zfs assumes all devices prefix with /dev, ignoring /dev/disk/...

Bug #1527727 reported by Chad Miller on 2015-12-18
128
This bug affects 18 people
Affects Status Importance Assigned to Milestone
grub
Unknown
Unknown
grub2 (Ubuntu)
Medium
Chad Miller
Xenial
Medium
Mathieu Trudel-Lapierre
grub2-signed (Ubuntu)
Medium
Unassigned
Xenial
Medium
Mathieu Trudel-Lapierre
zfs-linux (Ubuntu)
Medium
Colin Ian King
Xenial
Medium
Unassigned

Bug Description

[Impact]
Installs over ZFS where a ZFS disk is expected to be used as a root device.

[Test case]
- Run update-grub on a system with a ZFS root filesystem.

[Regression Potential]
Installs relying on the current broken behavior to avoid listing other operating systems in grub menu may find that new entries are added.

---

update-grub runs /usr/sbin/grub-probe

Without libzfslinux support compiled in, /usr/sbin/grub-probe runs ["zpool", "status", poolname] to find out ZFS info.

zpool responds with device names as used at (I think!) pool creation time. Often, this is /dev/disk/by-id/... names, without the path.

grub-probe then parses the output, and takes the names of devices, and if they do not start with a "/", it prepends "/dev/".

It then tests the existence of the path name of the device. it fails.

grub-probe then returns something like

/usr/sbin/grub-probe: error: failed to get canonical path of `/dev/ata-ST31000333AS_99999999-part1'.

The actual path is of course /dev/disk/by-id/ST31000333AS_99999999-part1

It can prepend smarter than "/dev" or it can understand ZFS natively, to fix the problem.

Chad Miller (cmiller) on 2015-12-18
description: updated
description: updated
Chad Miller (cmiller) on 2015-12-18
description: updated
description: updated
Chad Miller (cmiller) wrote :

Bug confirmed here:

https://github.com/zfsonlinux/grub/issues/5

and step 2 of

https://github.com/zfsonlinux/pkg-zfs/wiki/HOWTO-install-Ubuntu-14.04---15.04-to-a-Native-ZFS-Root-Filesystem

which advocates some craziness with udev to copy /dev/disk/by-id/ names to /dev/ .

Changed in grub2 (Ubuntu):
status: New → Confirmed
Chad Miller (cmiller) wrote :

Rationale for the order is that it should go in most unique to least unique order. Use device model and serial , then uuid, then human names, then assigned numbers.

Changed in grub2 (Ubuntu):
importance: Undecided → Medium
Chad Miller (cmiller) wrote :

A better change. Commented, manage memory, changelog better.

Chad Miller (cmiller) wrote :

The correct fix for this is to link in libzfslinux-dev to get smarter zfs polling, instead of parsing output of "zpool" command.

Changed in grub2 (Ubuntu):
assignee: nobody → Chad Miller (cmiller)
Chad Miller (cmiller) on 2015-12-27
description: updated
Chad Miller (cmiller) wrote :

License for libzfs is incompatible with grub. Also, main vs universe.

Chad Miller (cmiller) wrote :

So, back to patch in #4.

Chad Miller (cmiller) wrote :

Here's the short summary:

libzfs has a license that (most people agree) does not let grub link against it.

Linking against it is assumed by grub upstream to be the right way. But they have a fallback way that involves running a zfs utils command to list its devices.

The devices the zfs utils program prints are not full device names but things like "hda1" or "ata-ST12346124623-ajqwers". Those are names that were scanned when you fabricated and imported the zfs device pool. There's no way to get exact paths.

Grub assumes they're all device names that are directly within /dev/ directory, and that's often wrong if the user used /dev/by-id/ paths to set up zfs pools. ZFS-knowledgeable people are *really* likely to have used unambiguous device names.

So, this patch in #4 takes the professed device file name and tests whether a name exists before printing it. It goes from most unique names to least unique names. It's incredibly unlikely to collide. bus-model-serial names are unlike uuid names, and unlike sdx names. It should be safe in all cases.

Colin Ian King (colin-king) wrote :

Chad, OK, sounds good to me. Do you want me to SRU this for wily? or are we looking at just for Xenial?

Chad Miller (cmiller) wrote :

Colin, I don't think many people expect last release to be capable of ZFS root installation, so I'm happy with ignoring it in 15.10.

Have you sent this to grub upstream? Ideally we should sent it there for inclusion, then we can upload the fix to Debian and sync it to Ubuntu.

Chad Miller (cmiller) wrote :

As noted on IRC, upstream doesn't want this patch. "Use libzfs! We can't anticipate all device names, so we don't want to add any more. Or, change zpool to emit full names."

Chad Miller (cmiller) wrote :
Hajo Möller (dasjoe) wrote :

As the pool name is known we could run:
zdb -C poolname
to get the pool's configuration, which includes the path to all members of all (available) vdevs.

To get the configuration of a known, exported pool we could use:
zdb -C -e poolname
This will work as long as the pool is not imported and its devices can be found in /dev/.

Sample output follows:
root@sbooblehat:~/zfs# zdb -C sbooblehat-rpool

MOS Configuration:
        version: 5000
        name: 'sbooblehat-rpool'
        state: 0
        txg: 3550165
        pool_guid: 13767226917234919597
        errata: 0
        hostid: 2831164860
        hostname: 'sbooblehat'
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 13767226917234919597
            children[0]:
                type: 'disk'
                id: 0
                guid: 2058733885599967477
                path: '/dev/disk/by-id/ata-Samsung_SSD_850_EVO_120GB_S21UNSAG310692M-part1'
                whole_disk: 1
                metaslab_array: 35
                metaslab_shift: 29
                ashift: 13
                asize: 90011336704
                is_log: 0
                DTL: 59
                create_txg: 4
        features_for_read:

root@sbooblehat:~/zfs# zdb -C -e sbooblehat-rpool
zdb: can't open 'sbooblehat-rpool': File exists

root@sbooblehat:~/zfs# zdb -C -e TEMP
zdb: can't open 'TEMP': No such file or directory

root@sbooblehat:~/zfs# zdb -C -e -p . TEMP

MOS Configuration:
        version: 5000
        name: 'TEMP'
        state: 1
        txg: 8
        pool_guid: 6410361307144069801
        errata: 0
        hostid: 2831164860
        hostname: 'sbooblehat'
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 6410361307144069801
            create_txg: 4
            children[0]:
                type: 'file'
                id: 0
                guid: 15915301152600074830
                path: '/root/zfs/A'
                metaslab_array: 34
                metaslab_shift: 24
                ashift: 9
                asize: 100139008
                is_log: 0
                create_txg: 4
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data

Chad Miller (cmiller) wrote :

dasjoe, that's not bad, but I don't know enough of "zdb" to propose it.

Chad Miller (cmiller) wrote :
Chad Miller (cmiller) wrote :

What it looks like without this patch:

Setting up grub-efi-amd64 (2.02~beta2-36) ...
Installing for x86_64-efi platform.
grub-install: error: failed to get canonical path of `/dev/ata-ST31000333AS_XXXXXXXX-part1'.
Failed: grub-install --target=x86_64-efi --force-extra-removable
WARNING: Bootloader is not properly installed, system may not be bootable
/usr/sbin/grub-probe: error: failed to get canonical path of `/dev/ata-ST31000333AS_XXXXXXXX-part1'.
dpkg: error processing package grub-efi-amd64 (--configure):
 subprocess installed post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of linux-image-extra-4.4.0-4-generic:
 linux-image-extra-4.4.0-4-generic depends on linux-image-4.4.0-4-generic; however:
  Package linux-image-4.4.0-4-generic is not configured yet.

Martin Pitt (pitti) wrote :

Just prepending /dev to the device name is obviously wrong indeed, but slapping random other prefixes onto it does not make it better really. This hardcodes udev rules and kernel/driver behaviour, which is always going to be brittle or incomplete -- people might set up their own device namings, or use a funny device driver. For example, in this patch /dev/mapper/ is missing.

So this is a hack which I wouldn't like to see in an LTS release -- let's please fix the ZFS CLI tools by either

  - show the full path by default. It's more useful for human users, and the current behaviour of showing only the basename of a device node is completely useless for tools that parse the output

 - add a new option --machine-readable or similar which shows full paths, if not changing the default output format is important

Alternatively it might be possible to pry this information out of /sys somewhere?

Marlin Cremers (marlinc) wrote :

The idea behind the short name is readability and simplicity. The recommended way to set up a zpool is to use partitions using device labels that allow you to identify devices in the chassis. Adding the full device name would add a lot of unnecessary clutter in the UI.

I kind of support the idea for a machine-readable flag but it might be better a improve zdb rather than the zpool command.

Chad Miller (cmiller) on 2016-02-16
Changed in zfs-linux (Ubuntu):
status: New → Confirmed
assignee: nobody → Chad Miller (cmiller)
status: Confirmed → In Progress
Changed in grub2 (Ubuntu):
status: Confirmed → In Progress
Chad Miller (cmiller) wrote :

Okay, so making grub ask for zfs details using a new, nonstandard command-line parameter is bad because I can't depend on zfs version Foo in grub, and I don't want to Conflict on older versions, and I don't want zfs zpool status to fail on an unknown parameter.

So, this makes zfs zpool peek at the environment for an optional variable "WITH_FULL_PATHS" and not truncate the emitted path names if it is set. This change is compatible with everyone.

Hajo Möller (dasjoe) wrote :

Chad, as of now Ubuntu's shipped ZFS comes with a custom udev rule to generate the required links under /dev/, so this patch may not be required.

See https://bugs.launchpad.net/ubuntu/+source/zfs-initramfs/+bug/1530953/comments/28 for details.

Colin Ian King (colin-king) wrote :

Chad, can you check if comment #24 is correct, and if so, we can close this bug

Changed in zfs-linux (Ubuntu):
assignee: Chad Miller (cmiller) → Colin Ian King (colin-king)
importance: Undecided → Medium

Technically I'd consider it more of a workaround. The actual issue in GRUB
hasn't been resolved.
On 22 Feb 2016 12:16 p.m., "Colin Ian King" <email address hidden>
wrote:

> Chad, can you check if comment #24 is correct, and if so, we can close
> this bug
>
> ** Changed in: zfs-linux (Ubuntu)
> Assignee: Chad Miller (cmiller) => Colin Ian King (colin-king)
>
> ** Changed in: zfs-linux (Ubuntu)
> Importance: Undecided => Medium
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1527727
>
> Title:
> grub-probe for zfs assumes all devices prefix with /dev, ignoring
> /dev/disk/...
>
> Status in grub:
> Unknown
> Status in grub2 package in Ubuntu:
> In Progress
> Status in zfs-linux package in Ubuntu:
> In Progress
>
> Bug description:
> update-grub runs /usr/sbin/grub-probe
>
> Without libzfslinux support compiled in, /usr/sbin/grub-probe runs
> ["zpool", "status", poolname] to find out ZFS info.
>
> zpool responds with device names as used at (I think!) pool creation
> time. Often, this is /dev/disk/by-id/... names, without the path.
>
> grub-probe then parses the output, and takes the names of devices, and
> if they do not start with a "/", it prepends "/dev/".
>
> It then tests the existence of the path name of the device. it fails.
>
> grub-probe then returns something like
>
> /usr/sbin/grub-probe: error: failed to get canonical path of `/dev
> /ata-ST31000333AS_99999999-part1'.
>
> The actual path is of course /dev/disk/by-
> id/ST31000333AS_99999999-part1
>
> It can prepend smarter than "/dev" or it can understand ZFS natively,
> to fix the problem.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/grub/+bug/1527727/+subscriptions
>

Richard Laager (rlaager) wrote :

This is related:
https://github.com/zfsonlinux/zfs/pull/4343

If that was merged, then GRUB could be patched to use -L and/or -p.

Marlin Cremers (marlinc) wrote :

That'd be great! I hope both ZoL and Ubuntu are able to pull that in in time for 16.04.

Richard Laager (rlaager) wrote :

The upstream change was merged. I propose the following:

1) Update zfs-linux in Xenial with the patch: https://github.com/zfsonlinux/zfs/commit/d2f3e292dccab23e47ade3c67677a10f353b9e85

2) Patch grub2 in Xenial to setenv("ZPOOL_VDEV_NAME_PATH", "YES")

3) Remove the udev rules from zfs-initramfs in Xenial.

I'll attach debdiffs which accomplish this. They could use a second set of eyes, as I made these changes quickly. Rather than edit the patch of the ZFS commit to make it apply, I backported additional man page fixes.

Richard Laager (rlaager) wrote :
Chad Miller (cmiller) wrote :

Thanks, Richard. Okay, so first thing to change is grub.

After changing grub, at any time we can change zfs-linux to simultaneously, drop the symlink udev rule, accept the environment variable to change output, and Conflict on grub older than the one that starts emitting the new environment variable.

Richard Laager (rlaager) wrote :

Is this going to make the Xenial release?

Clint Silvester (5-clint) wrote :

Looks like the needed changes could go into the next release. https://github.com/zfsonlinux/zfs/issues/4414

Hajo Möller (dasjoe) wrote :

It is my understanding that upgrading to a new upstream ZFS version would require a FFe.

I have asked ZFS on Linux's release manager Ned Bass whether he plans to tag a point release which includes the relevant commits soon.

Hajo Möller (dasjoe) wrote :

Ned replied to my mail already, he plans to release 0.6.5.6 soon:

"I hadn't planned to, but it seems worthwhile to get this into 16.04 if it fixes grub-probe.
I'll get a 0.6.5.6 release out in the next few days that includes these patches."

Colin Ian King (colin-king) wrote :

Any idea when 0.6.5.6 will be out? I'll sync up with that as soon as it lands as a Feature Freeze Exception.

Colin Ian King (colin-king) wrote :

Thanks, I'm keenly aware of this and I'm working on it :-)

Changed in zfs-linux (Ubuntu):
status: In Progress → Fix Released

Uploaded to yakkety:

grub2 (2.02~beta2-36ubuntu5) yakkety; urgency=medium

  * debian/patches/zpool_full_device_name.patch: Signal to zpool that
    it should emit full names of constituent devices.

Changed in grub2 (Ubuntu):
status: In Progress → Fix Released
description: updated
Changed in grub2 (Ubuntu Xenial):
importance: Undecided → Medium
assignee: nobody → Mathieu Trudel-Lapierre (cyphermox)
status: New → In Progress
Richard Laager (rlaager) wrote :

I confirmed the package in yakkety works (on Xenial). That is, I can successfully run update-grub without having the special-case "/dev/disk"-style symlink in /dev.

Is the next step to upload this to xenial-proposed?

This doesn't really depend on any extra steps -- I've just uploaded grub2 and grub2-signed to xenial-proposed queue, so as soon as it's reviewed by a member of the SRU team it will be available in proposed.

From there, we'll need people to test this carefully to make sure nothing is broken and that it's fulfilling its intended purposed (ie. fixing the issue), after that we'll be able to make it available to the general public in -updates.

Changed in grub2-signed (Ubuntu):
status: New → Fix Released
Changed in grub2-signed (Ubuntu Xenial):
status: New → In Progress
assignee: nobody → Mathieu Trudel-Lapierre (cyphermox)
importance: Undecided → Medium
Changed in grub2-signed (Ubuntu):
importance: Undecided → Medium
Changed in zfs-linux (Ubuntu Xenial):
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zfs-linux (Ubuntu Xenial):
status: New → Confirmed
Richard Laager (rlaager) wrote :

cyphermox: Any progress? It's been three months since your last comment where it was close to ready.

Richard Laager (rlaager) wrote :

[My previous comment was incorrect. I had the wrong file in the wrong package.]

Colin, with grub2 in yakkety being patched, we should be ready for zfs-initramfs in yakkety to drop /lib/udev/rules.d/60-zpool.rules and "Conflicts: grub2 << 2.02~beta2-36ubuntu5".

Timo Aaltonen (tjaalton) wrote :

cyphermox: Those uploads seem to have been dropped in favor of some other changes, so this needs to be reuploaded

Currently blocked on a grub SRU in xenial-proposed that isn't completely verified.

Jens Elkner (jelmd) wrote :

Just encountered the same problem with xenial:

grub-probe: error: failed to get canonical path of `/dev/HDD0p2'
See also http://savannah.gnu.org/bugs/?50896

IMHO "zpool status -P $pool" should be used, everything else is a kludge, which will fail sooner or later.

Chad Miller (cmiller) wrote :

Jens, as far as I know, you shouldn't have gotten an error. Please file a new bug report, and include version numbers of 'grub2' and 'zfs-linux'.

Jens Elkner (jelmd) wrote :

Done - see 1687664.

Scott Moser (smoser) wrote :

I've marked this 'Fix Released' in xenial as Ubuntu's package in xenial at 0.6.5.6-0ubuntu3 which has the fix at [1] as mentioned in comment 29. Further, my experience with our zfs work in curtin indicates that it is fixed.

--
https://github.com/zfsonlinux/zfs/commit/d2f3e292dccab23e47ade3c67677a10f353b9e85

Changed in zfs-linux (Ubuntu Xenial):
status: Confirmed → Fix Released
Scott Moser (smoser) wrote :

I've uploaded just now grub2 (2.02~beta2-36ubuntu3.17) and
grub2-signed (1.66.17) to the xenial SRU queue.

There are currently versions in xenial-proposed for these packages with
fixes for bug 722950 and bug 1708245.

Hopefully after those uploads clear xenial-proposed the fix for this bug will
get into xenial-proposed.

I've also put a PPA up with the changes at
 https://launchpad.net/~smoser/+archive/ubuntu/lp1527727
which I hope will have builds some time today.

Hello Chad, or anyone else affected,

Accepted grub2 into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2/2.02~beta2-36ubuntu3.17 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in grub2 (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-xenial
Changed in grub2-signed (Ubuntu Xenial):
status: In Progress → Fix Committed
Łukasz Zemczak (sil2100) wrote :

Hello Chad, or anyone else affected,

Accepted grub2-signed into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2-signed/1.66.17 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

As a part of the Stable Release Updates quality process a search for Launchpad bug reports using the version of grub2 from xenial-proposed was performed and bug 1750732 was found. Please investigate this bug report to ensure that a regression will not be created by this SRU. In the event that this is not a regression remove the "verification-failed-xenial" tag from this bug report and add the tag "bot-stop-nagging" to bug 1750732 (not this bug). Thanks!

tags: added: verification-failed-xenial
Scott Moser (smoser) wrote :

I've removed the verification-failed-xenial tag.
The problem reported in bug 1750732 may be related to a grub upgrade, but
surely is not related to this specific grub upgrade.

Please see the changes
 http://launchpadlibrarian.net/354651271/grub2_2.02~beta2-36ubuntu3.16_2.02~beta2-36ubuntu3.17.diff.gz
for justification.

The only change made there is a 'setenv' call.

tags: removed: verification-failed-xenial
Scott Moser (smoser) wrote :

Hi.
I've confirmed this fix by using curtin's "vmtest" test harness.

I've applied the attached patch to curtin at 82622f129b.
The patch does:
a.) remove the work arounds we had in place to make zfs root work on xenial.
b.) enable proposed

Then I've run an install of a system that does a installation of zfs on root.
   ./tools/jenkins-runner examples/tests/zfsroot.yaml:tests/vmtests/test_zfsroot.py:XenialGATestZfsRoot

I'll attach the boot and install logs of that system also.

Scott Moser (smoser) wrote :
Scott Moser (smoser) wrote :
Scott Moser (smoser) wrote :

From the install log you can see that we install grub2 at 2.02~beta2-36ubuntu3.17.
   ..
   Unpacking grub2-common (2.02~beta2-36ubuntu3.17)
   ..

'dpkg-reconfigure grub-pc' and 'grub-install /dev/vda' are run

And also:

  run-parts: executing /etc/kernel/postinst.d/zz-update-grub 4.4.0-116-generic /boot/vmlinuz-4.4.0-116-generic
  Generating grub configuration file ...
  Found linux image: /boot/vmlinuz-4.4.0-9025-euclid
  Found initrd image: /boot/initrd.img-4.4.0-9025-euclid
  Found linux image: /boot/vmlinuz-4.4.0-116-generic
  Found initrd image: /boot/initrd.img-4.4.0-116-generic
    /run/lvm/lvmetad.socket: connect failed: No such file or directory

Then during boot we see

[ 0.000000] Command line: BOOT_IMAGE=/ROOT/zfsroot@/boot/vmlinuz-4.4.0-9025-euclid root=ZFS=rpool/ROOT/zfsroot ro console=ttyS0

So we have successfully booted from root on a zfs system.

tags: added: verification-done verification-done-xenial
removed: verification-needed verification-needed-xenial

The verification of the Stable Release Update for grub2 has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2 - 2.02~beta2-36ubuntu3.17

---------------
grub2 (2.02~beta2-36ubuntu3.17) xenial; urgency=medium

  * Signal to zpool that it should emit full names of constituent devices.
    (LP: #1527727)

 -- Scott Moser <email address hidden> Wed, 24 Jan 2018 16:21:35 -0500

Changed in grub2 (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2-signed - 1.66.17

---------------
grub2-signed (1.66.17) xenial; urgency=medium

  * Rebuild against grub2 2.02~beta2-36ubuntu3.17. (LP: #1527727)

 -- Scott Moser <email address hidden> Thu, 25 Jan 2018 10:05:01 -0500

Changed in grub2-signed (Ubuntu Xenial):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.