grub-probe for zfs assumes all devices prefix with /dev, ignoring /dev/disk/...

Bug #1527727 reported by Chad Miller on 2015-12-18
108
This bug affects 15 people
Affects Status Importance Assigned to Milestone
grub
Unknown
Unknown
grub2 (Ubuntu)
Medium
Chad Miller
Xenial
Medium
Mathieu Trudel-Lapierre
grub2-signed (Ubuntu)
Medium
Unassigned
Xenial
Medium
Mathieu Trudel-Lapierre
zfs-linux (Ubuntu)
Medium
Colin Ian King
Xenial
Medium
Unassigned

Bug Description

[Impact]
Installs over ZFS where a ZFS disk is expected to be used as a root device.

[Test case]
- Run update-grub on a system with a ZFS root filesystem.

[Regression Potential]
Installs relying on the current broken behavior to avoid listing other operating systems in grub menu may find that new entries are added.

---

update-grub runs /usr/sbin/grub-probe

Without libzfslinux support compiled in, /usr/sbin/grub-probe runs ["zpool", "status", poolname] to find out ZFS info.

zpool responds with device names as used at (I think!) pool creation time. Often, this is /dev/disk/by-id/... names, without the path.

grub-probe then parses the output, and takes the names of devices, and if they do not start with a "/", it prepends "/dev/".

It then tests the existence of the path name of the device. it fails.

grub-probe then returns something like

/usr/sbin/grub-probe: error: failed to get canonical path of `/dev/ata-ST31000333AS_99999999-part1'.

The actual path is of course /dev/disk/by-id/ST31000333AS_99999999-part1

It can prepend smarter than "/dev" or it can understand ZFS natively, to fix the problem.

Chad Miller (cmiller) on 2015-12-18
description: updated
description: updated
Chad Miller (cmiller) on 2015-12-18
description: updated
description: updated
Chad Miller (cmiller) wrote :

Bug confirmed here:

https://github.com/zfsonlinux/grub/issues/5

and step 2 of

https://github.com/zfsonlinux/pkg-zfs/wiki/HOWTO-install-Ubuntu-14.04---15.04-to-a-Native-ZFS-Root-Filesystem

which advocates some craziness with udev to copy /dev/disk/by-id/ names to /dev/ .

Changed in grub2 (Ubuntu):
status: New → Confirmed
Chad Miller (cmiller) wrote :

Rationale for the order is that it should go in most unique to least unique order. Use device model and serial , then uuid, then human names, then assigned numbers.

Changed in grub2 (Ubuntu):
importance: Undecided → Medium
Chad Miller (cmiller) wrote :

A better change. Commented, manage memory, changelog better.

Chad Miller (cmiller) wrote :

The correct fix for this is to link in libzfslinux-dev to get smarter zfs polling, instead of parsing output of "zpool" command.

Changed in grub2 (Ubuntu):
assignee: nobody → Chad Miller (cmiller)
Chad Miller (cmiller) on 2015-12-27
description: updated
Chad Miller (cmiller) wrote :

License for libzfs is incompatible with grub. Also, main vs universe.

Chad Miller (cmiller) wrote :

So, back to patch in #4.

Chad Miller (cmiller) wrote :

Here's the short summary:

libzfs has a license that (most people agree) does not let grub link against it.

Linking against it is assumed by grub upstream to be the right way. But they have a fallback way that involves running a zfs utils command to list its devices.

The devices the zfs utils program prints are not full device names but things like "hda1" or "ata-ST12346124623-ajqwers". Those are names that were scanned when you fabricated and imported the zfs device pool. There's no way to get exact paths.

Grub assumes they're all device names that are directly within /dev/ directory, and that's often wrong if the user used /dev/by-id/ paths to set up zfs pools. ZFS-knowledgeable people are *really* likely to have used unambiguous device names.

So, this patch in #4 takes the professed device file name and tests whether a name exists before printing it. It goes from most unique names to least unique names. It's incredibly unlikely to collide. bus-model-serial names are unlike uuid names, and unlike sdx names. It should be safe in all cases.

Colin Ian King (colin-king) wrote :

Chad, OK, sounds good to me. Do you want me to SRU this for wily? or are we looking at just for Xenial?

Chad Miller (cmiller) wrote :

Colin, I don't think many people expect last release to be capable of ZFS root installation, so I'm happy with ignoring it in 15.10.

Have you sent this to grub upstream? Ideally we should sent it there for inclusion, then we can upload the fix to Debian and sync it to Ubuntu.

Chad Miller (cmiller) wrote :

As noted on IRC, upstream doesn't want this patch. "Use libzfs! We can't anticipate all device names, so we don't want to add any more. Or, change zpool to emit full names."

Chad Miller (cmiller) wrote :
Hajo Möller (dasjoe) wrote :

As the pool name is known we could run:
zdb -C poolname
to get the pool's configuration, which includes the path to all members of all (available) vdevs.

To get the configuration of a known, exported pool we could use:
zdb -C -e poolname
This will work as long as the pool is not imported and its devices can be found in /dev/.

Sample output follows:
root@sbooblehat:~/zfs# zdb -C sbooblehat-rpool

MOS Configuration:
        version: 5000
        name: 'sbooblehat-rpool'
        state: 0
        txg: 3550165
        pool_guid: 13767226917234919597
        errata: 0
        hostid: 2831164860
        hostname: 'sbooblehat'
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 13767226917234919597
            children[0]:
                type: 'disk'
                id: 0
                guid: 2058733885599967477
                path: '/dev/disk/by-id/ata-Samsung_SSD_850_EVO_120GB_S21UNSAG310692M-part1'
                whole_disk: 1
                metaslab_array: 35
                metaslab_shift: 29
                ashift: 13
                asize: 90011336704
                is_log: 0
                DTL: 59
                create_txg: 4
        features_for_read:

root@sbooblehat:~/zfs# zdb -C -e sbooblehat-rpool
zdb: can't open 'sbooblehat-rpool': File exists

root@sbooblehat:~/zfs# zdb -C -e TEMP
zdb: can't open 'TEMP': No such file or directory

root@sbooblehat:~/zfs# zdb -C -e -p . TEMP

MOS Configuration:
        version: 5000
        name: 'TEMP'
        state: 1
        txg: 8
        pool_guid: 6410361307144069801
        errata: 0
        hostid: 2831164860
        hostname: 'sbooblehat'
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 6410361307144069801
            create_txg: 4
            children[0]:
                type: 'file'
                id: 0
                guid: 15915301152600074830
                path: '/root/zfs/A'
                metaslab_array: 34
                metaslab_shift: 24
                ashift: 9
                asize: 100139008
                is_log: 0
                create_txg: 4
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data

Chad Miller (cmiller) wrote :

dasjoe, that's not bad, but I don't know enough of "zdb" to propose it.

Chad Miller (cmiller) wrote :
Chad Miller (cmiller) wrote :

What it looks like without this patch:

Setting up grub-efi-amd64 (2.02~beta2-36) ...
Installing for x86_64-efi platform.
grub-install: error: failed to get canonical path of `/dev/ata-ST31000333AS_XXXXXXXX-part1'.
Failed: grub-install --target=x86_64-efi --force-extra-removable
WARNING: Bootloader is not properly installed, system may not be bootable
/usr/sbin/grub-probe: error: failed to get canonical path of `/dev/ata-ST31000333AS_XXXXXXXX-part1'.
dpkg: error processing package grub-efi-amd64 (--configure):
 subprocess installed post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of linux-image-extra-4.4.0-4-generic:
 linux-image-extra-4.4.0-4-generic depends on linux-image-4.4.0-4-generic; however:
  Package linux-image-4.4.0-4-generic is not configured yet.

Martin Pitt (pitti) wrote :

Just prepending /dev to the device name is obviously wrong indeed, but slapping random other prefixes onto it does not make it better really. This hardcodes udev rules and kernel/driver behaviour, which is always going to be brittle or incomplete -- people might set up their own device namings, or use a funny device driver. For example, in this patch /dev/mapper/ is missing.

So this is a hack which I wouldn't like to see in an LTS release -- let's please fix the ZFS CLI tools by either

  - show the full path by default. It's more useful for human users, and the current behaviour of showing only the basename of a device node is completely useless for tools that parse the output

 - add a new option --machine-readable or similar which shows full paths, if not changing the default output format is important

Alternatively it might be possible to pry this information out of /sys somewhere?

Marlin Cremers (marlinc) wrote :

The idea behind the short name is readability and simplicity. The recommended way to set up a zpool is to use partitions using device labels that allow you to identify devices in the chassis. Adding the full device name would add a lot of unnecessary clutter in the UI.

I kind of support the idea for a machine-readable flag but it might be better a improve zdb rather than the zpool command.

Chad Miller (cmiller) on 2016-02-16
Changed in zfs-linux (Ubuntu):
status: New → Confirmed
assignee: nobody → Chad Miller (cmiller)
status: Confirmed → In Progress
Changed in grub2 (Ubuntu):
status: Confirmed → In Progress
Chad Miller (cmiller) wrote :

Okay, so making grub ask for zfs details using a new, nonstandard command-line parameter is bad because I can't depend on zfs version Foo in grub, and I don't want to Conflict on older versions, and I don't want zfs zpool status to fail on an unknown parameter.

So, this makes zfs zpool peek at the environment for an optional variable "WITH_FULL_PATHS" and not truncate the emitted path names if it is set. This change is compatible with everyone.

Hajo Möller (dasjoe) wrote :

Chad, as of now Ubuntu's shipped ZFS comes with a custom udev rule to generate the required links under /dev/, so this patch may not be required.

See https://bugs.launchpad.net/ubuntu/+source/zfs-initramfs/+bug/1530953/comments/28 for details.

Colin Ian King (colin-king) wrote :

Chad, can you check if comment #24 is correct, and if so, we can close this bug

Changed in zfs-linux (Ubuntu):
assignee: Chad Miller (cmiller) → Colin Ian King (colin-king)
importance: Undecided → Medium

Technically I'd consider it more of a workaround. The actual issue in GRUB
hasn't been resolved.
On 22 Feb 2016 12:16 p.m., "Colin Ian King" <email address hidden>
wrote:

> Chad, can you check if comment #24 is correct, and if so, we can close
> this bug
>
> ** Changed in: zfs-linux (Ubuntu)
> Assignee: Chad Miller (cmiller) => Colin Ian King (colin-king)
>
> ** Changed in: zfs-linux (Ubuntu)
> Importance: Undecided => Medium
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1527727
>
> Title:
> grub-probe for zfs assumes all devices prefix with /dev, ignoring
> /dev/disk/...
>
> Status in grub:
> Unknown
> Status in grub2 package in Ubuntu:
> In Progress
> Status in zfs-linux package in Ubuntu:
> In Progress
>
> Bug description:
> update-grub runs /usr/sbin/grub-probe
>
> Without libzfslinux support compiled in, /usr/sbin/grub-probe runs
> ["zpool", "status", poolname] to find out ZFS info.
>
> zpool responds with device names as used at (I think!) pool creation
> time. Often, this is /dev/disk/by-id/... names, without the path.
>
> grub-probe then parses the output, and takes the names of devices, and
> if they do not start with a "/", it prepends "/dev/".
>
> It then tests the existence of the path name of the device. it fails.
>
> grub-probe then returns something like
>
> /usr/sbin/grub-probe: error: failed to get canonical path of `/dev
> /ata-ST31000333AS_99999999-part1'.
>
> The actual path is of course /dev/disk/by-
> id/ST31000333AS_99999999-part1
>
> It can prepend smarter than "/dev" or it can understand ZFS natively,
> to fix the problem.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/grub/+bug/1527727/+subscriptions
>

Richard Laager (rlaager) wrote :

This is related:
https://github.com/zfsonlinux/zfs/pull/4343

If that was merged, then GRUB could be patched to use -L and/or -p.

Marlin Cremers (marlinc) wrote :

That'd be great! I hope both ZoL and Ubuntu are able to pull that in in time for 16.04.

Richard Laager (rlaager) wrote :

The upstream change was merged. I propose the following:

1) Update zfs-linux in Xenial with the patch: https://github.com/zfsonlinux/zfs/commit/d2f3e292dccab23e47ade3c67677a10f353b9e85

2) Patch grub2 in Xenial to setenv("ZPOOL_VDEV_NAME_PATH", "YES")

3) Remove the udev rules from zfs-initramfs in Xenial.

I'll attach debdiffs which accomplish this. They could use a second set of eyes, as I made these changes quickly. Rather than edit the patch of the ZFS commit to make it apply, I backported additional man page fixes.

Richard Laager (rlaager) wrote :
Chad Miller (cmiller) wrote :

Thanks, Richard. Okay, so first thing to change is grub.

After changing grub, at any time we can change zfs-linux to simultaneously, drop the symlink udev rule, accept the environment variable to change output, and Conflict on grub older than the one that starts emitting the new environment variable.

Richard Laager (rlaager) wrote :

Is this going to make the Xenial release?

Clint Silvester (5-clint) wrote :

Looks like the needed changes could go into the next release. https://github.com/zfsonlinux/zfs/issues/4414

Hajo Möller (dasjoe) wrote :

It is my understanding that upgrading to a new upstream ZFS version would require a FFe.

I have asked ZFS on Linux's release manager Ned Bass whether he plans to tag a point release which includes the relevant commits soon.

Hajo Möller (dasjoe) wrote :

Ned replied to my mail already, he plans to release 0.6.5.6 soon:

"I hadn't planned to, but it seems worthwhile to get this into 16.04 if it fixes grub-probe.
I'll get a 0.6.5.6 release out in the next few days that includes these patches."

Colin Ian King (colin-king) wrote :

Any idea when 0.6.5.6 will be out? I'll sync up with that as soon as it lands as a Feature Freeze Exception.

Colin Ian King (colin-king) wrote :

Thanks, I'm keenly aware of this and I'm working on it :-)

Changed in zfs-linux (Ubuntu):
status: In Progress → Fix Released

Uploaded to yakkety:

grub2 (2.02~beta2-36ubuntu5) yakkety; urgency=medium

  * debian/patches/zpool_full_device_name.patch: Signal to zpool that
    it should emit full names of constituent devices.

Changed in grub2 (Ubuntu):
status: In Progress → Fix Released
description: updated
Changed in grub2 (Ubuntu Xenial):
importance: Undecided → Medium
assignee: nobody → Mathieu Trudel-Lapierre (cyphermox)
status: New → In Progress
Richard Laager (rlaager) wrote :

I confirmed the package in yakkety works (on Xenial). That is, I can successfully run update-grub without having the special-case "/dev/disk"-style symlink in /dev.

Is the next step to upload this to xenial-proposed?

This doesn't really depend on any extra steps -- I've just uploaded grub2 and grub2-signed to xenial-proposed queue, so as soon as it's reviewed by a member of the SRU team it will be available in proposed.

From there, we'll need people to test this carefully to make sure nothing is broken and that it's fulfilling its intended purposed (ie. fixing the issue), after that we'll be able to make it available to the general public in -updates.

Changed in grub2-signed (Ubuntu):
status: New → Fix Released
Changed in grub2-signed (Ubuntu Xenial):
status: New → In Progress
assignee: nobody → Mathieu Trudel-Lapierre (cyphermox)
importance: Undecided → Medium
Changed in grub2-signed (Ubuntu):
importance: Undecided → Medium
Changed in zfs-linux (Ubuntu Xenial):
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zfs-linux (Ubuntu Xenial):
status: New → Confirmed
Richard Laager (rlaager) wrote :

cyphermox: Any progress? It's been three months since your last comment where it was close to ready.

Richard Laager (rlaager) wrote :

[My previous comment was incorrect. I had the wrong file in the wrong package.]

Colin, with grub2 in yakkety being patched, we should be ready for zfs-initramfs in yakkety to drop /lib/udev/rules.d/60-zpool.rules and "Conflicts: grub2 << 2.02~beta2-36ubuntu5".

Timo Aaltonen (tjaalton) wrote :

cyphermox: Those uploads seem to have been dropped in favor of some other changes, so this needs to be reuploaded

Currently blocked on a grub SRU in xenial-proposed that isn't completely verified.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.