Boot from a unique, stable, multipath-dependent symlink

Bug #1429327 reported by bugproxy on 2015-03-07
This bug affects 1 person
Affects                    Importance  Assigned to
grub2 (Ubuntu)             High        Unassigned
multipath-tools (Ubuntu)   Medium      Mathieu Trudel-Lapierre

Bug Description

-- Problem Description --
We tried to install Ubuntu 15.04 (build 20150218) on our system, which uses ibmveth for the network and bluefin (lpfc) for the disks.
The installation completed and asked to continue and reboot, but the system then dropped to the initramfs shell:
(initramfs) cat /proc/modules
usbhid 63247 0 - Live 0xd000000003520000
hid 128631 1 usbhid, Live 0xd0000000034a0000
lpfc 778399 1 - Live 0xd000000002e30000
dm_multipath 25603 0 - Live 0xd000000002cf0000
scsi_dh 10388 1 dm_multipath, Live 0xd000000002c10000
scsi_transport_fc 71871 1 lpfc, Live 0xd000000002890000
(initramfs) cat /proc/cmdline
BOOT_IMAGE=/vmlinux-3.18.0-13-generic root=/dev/mapper/creeklp1--vg-root ro splash quiet
It looks like it cannot find the vg-root device to boot from.

  +------------------------- [!!] Partition disks -------------------------+
  |                                                                        |
  | Note that all data on the disk you select will be erased, but not      |
  | before you have confirmed that you really want to make the changes.    |
  |                                                                        |
  | Select disk to partition:                                              |
  |                                                                        |
  | Multipath mpath0 (WWID 36005076307ffc7b00000000000000717) - 16.1       |
  | Multipath mpath1 (WWID 36005076307ffc7b00000000000000718) - 16.1       |
  | Multipath mpath2 (WWID 36005076307ffc7b00000000000000719) - 16.1       |
  | Multipath mpath3 (WWID 36005076307ffc7b0000000000000071a) - 16.1       |
  | Multipath mpath4 (WWID 36005076307ffc7b0000000000000071b) - 64.4       |
  | SCSI1 (0,0,0) (sda) - 16.1 GB IBM 2107900                              |
  | SCSI1 (0,5,1) (sdaa) - 16.1 GB IBM 2107900                             |
  | SCSI1 (0,5,2) (sdab) - 16.1 GB IBM 2107900                             |
  | SCSI1 (0,5,3) (sdac) - 16.1 GB IBM 2107900                             |
  | SCSI1 (0,5,4) (sdad) - 64.4 GB IBM 2107900                             |
  |                                                                        |
  | <Go Back>                                                              |
  |                                                                        |
  +------------------------------------------------------------------------+

conflicting device node '/dev/mapper/mpath4p1' found, link to '/dev/dm-5' will not be created
conflicting device node '/dev/mapper/mpath4p2' found, link to '/dev/dm-6' will not be created
conflicting device node '/dev/mapper/mpath4p3' found, link to '/dev/dm-7' will not be created

  +------------------------- [!] Partition disks --------------------------+
  |                                                                        |
  | You may use the whole volume group for guided partitioning, or part    |
  | of it. If you use only part of it, or if you add more disks later,     |
  | then you will be able to grow logical volumes later using the LVM      |
  | tools, so using a smaller part of the volume group at installation     |
  | time may offer more flexibility.                                       |
  |                                                                        |
  | The minimum size of the selected partitioning recipe is 596.0 MB (or   |
  | 0%); please note that the packages you choose to install may require   |
  | more space than this. The maximum available size is 64.2 GB.           |
  |                                                                        |
  | Hint: "max" can be used as a shortcut to specify the maximum size, or  |
  | enter a percentage (e.g. "20%") to use that percentage of the maximum  |
  | size.                                                                  |
  |                                                                        |
  | 64.2 GB______________________________________________________________  |
  |                                                                        |
  | <Go Back> <Continue>                                                   |
  |                                                                        |
  +------------------------------------------------------------------------+

  +------------------------- [!!] Partition disks -------------------------+
  |                                                                        |
  | If you continue, the changes listed below will be written to the       |
  | disks. Otherwise, you will be able to make further changes manually.   |
  |                                                                        |
  | WARNING: This will destroy all data on any partitions you have         |
  | removed as well as on the partitions that are going to be formatted.   |
  |                                                                        |
  | The partition tables of the following devices are changed:             |
  | LVM VG creeklp1-vg, LV root                                            |
  | LVM VG creeklp1-vg, LV swap_1                                          |
  | Multipath mpath4 (WWID 36005076307ffc7b0000000000000071b)              |
  | SCSI1 (0,5,4) (sdad)                                                   |
  | SCSI1 (0,0,4) (sde)                                                    |
  | SCSI1 (0,1,4) (sdj)                                                    |
  | SCSI1 (0,2,4) (sdo)                                                    |
  | SCSI1 (0,3,4) (sdt)                                                    |
  |                                                                        |
  | <Yes> <No>                                                             |
  |                                                                        |
  +------------------------------------------------------------------------+

  +--------------------- [!!] Finish the installation ---------------------+
  |                                                                        |
  | Installation complete                                                  |
  | Installation is complete, so it is time to boot into your new system.  |
  | Make sure to remove the installation media (CD-ROM, floppies), so      |
  | that you boot into the new system rather than restarting the           |
  | installation.                                                          |
  |                                                                        |
  | <Go Back> <Continue>                                                   |
  |                                                                        |
  +------------------------------------------------------------------------+

creeklp2 is now booting. I made a few changes to the system to accomplish this.

First I got the machine to boot by doing the following from the initramfs:

1. modprobe scsi_dh_alua
2. multipath -v0
3. exit

Once the machine was booted, I did the following:

1. Added scsi_dh_alua to /etc/initramfs-tools/modules
2. Ran update-initramfs to update the initramfs with this change
3. Set GRUB_DISABLE_LINUX_UUID=true in /etc/default/grub
4. Rebuilt grub.cfg using grub-mkconfig

I'm still getting a lot of udev errors while booting, because udev spawns a call to multipath for each /dev/sd device on the system.

I decided to change to not use UUID when I saw in the initramfs that the /dev/disk/by-uuid symlink was pointing to one of the paths of the multipath device rather than the multipath device itself.

It seems there may be some udev configuration issues here when booting from a multipath device.

Can we get someone from Canonical to take a look at this?

Related bugs:
 * bug 1551937: lvm and multipath and xenial not happy together
 * bug 1432062: multipath-tools-boot: support booting without user_friendly_names on devices with spaces in identifiers

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-122037 severity-critical targetmilestone-inin---

Default Comment by Bridge

bugproxy (bugproxy) wrote : syslog.gz

Default Comment by Bridge

Luciano Chavez (lnx1138) on 2015-03-07
affects: ubuntu → debian-installer (Ubuntu)
bugproxy (bugproxy) on 2015-03-07
tags: added: targetmilestone-inin1504
removed: targetmilestone-inin---
Changed in debian-installer (Ubuntu):
status: New → Confirmed
assignee: nobody → Taco Screen team (taco-screen-team)

This seems to have something to do with hardware, or delay between 2 multipath commands.
Still checking.

Please disregard this for now.
I'll get more time on it tomorrow.

------- Comment From <email address hidden> 2015-03-13 20:06 EDT-------
Problem identified.

The 'multipath' command is run *before* the disks over FC show up.
It must wait for them to settle.

There's a wait for SCSI devices already, but it is not enough in this case.
It must wait for the SCSI over FC devices to settle,
i.e., the lpfc (or other FC) module is loaded, the devices are scanned, and the corresponding SCSI disks show up in /dev.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-03-13 21:28 EDT-------
Hi Canonical,

Could you please follow up on the best way to wait for the SCSI disks over FC to settle?

The place for that is probably in the initramfs's local-top/multipath script, similarly to waiting for the SCSI devices to settle (scsi scan wait module).

Thanks!

FYI

This problem becomes more evident/likely if the fix/work-around for bug 1431650 is applied (i.e., no udev time-out for multipath devices).

It seems that the time-out that arises from that bug gives the SCSI scan time to complete, and by the time the last 'multipath' command runs (local-top/multipath), the SCSI disks from FC storage are already detected and in place.

Nice comment on a lvm-devel thread:

"""
The removal of scsi_wait_scan doesn't actually mean any of the bus scans are
done when you wait on systemd-udev-settle, so making that change won't
really solve the issue you need. If I'm understanding it right, I think
in a world without scsi-wait-scan, you're likely going to need to enable
lvmetad and do incremental activation to have thigns work properly.
"""
 http://www.redhat.com/archives/lvm-devel/2012-September/msg00014.html

To illustrate the problem more clearly:

> The 'multipath' command is run *before* the disks over FC show up.
> It must wait for them to settle.

> There's a wait for SCSI devices already, but it is not enough in this case.
(it actually doesn't work because scsi_wait_scan was removed from the kernel years ago.)

 ...
 Loading, please wait...
 starting version 219
 ...
 Begin: Running /scripts/local-top ...
 Begin: Loading multipath modules ...
 ...
 Begin: Waiting for scsi storage ... done.
 Begin: Discovering multipaths ...
 ...
 [ 4.088476] scsi host2: Emulex LPe12000 PCIe Fibre Channel Adapter on PCI bus 50 device 01 irq 506
 ...
 [ 7.472269] scsi 2:0:0:1: Direct-Access IBM 2810XIV 10.2 PQ: 0 ANSI: 5
 [ 7.473301] sd 2:0:0:1: Attached scsi generic sg2 type 0
 [ 7.473542] sd 2:0:0:1: [sdb] 67108864 512-byte logical blocks: (34.3 GB/32.0 GiB)
 ...

In this example, sdb is not considered by the multipath command (Discovering multipaths ...).

Another problem arising from this: a root filesystem that should be mounted on a multipath device is actually mounted on one of its single-path devices.

Explanation: the root filesystem is specified by-UUID (root=UUID=...), and the /dev/disk/by-uuid symlink will point to the last detected single-path device w/ the matching UUID.
Had the disks been detected before the multipath command ran, the symlink would be updated to point to the multipath device (higher link priority, in kpartx udev rules).

And illustrating the previous point (root filesystem on single-path device):

 (initramfs) cat /proc/cmdline
 BOOT_IMAGE=/boot/vmlinux-3.19.0-8-generic root=UUID=3809b2f8-dcb0-4a6a-945a-885e384b463e ro break=post-multipath

 (initramfs) ls -l /dev/disk/by-uuid
 ...
 lrwxrwxrwx 1 10 3809b2f8-dcb0-4a6a-945a-885e384b463e -> ../../sda2
 ...
 (initramfs) multipath -v0
 ...
 (initramfs) ls -l /dev/disk/by-uuid
 ...
 lrwxrwxrwx 1 10 3809b2f8-dcb0-4a6a-945a-885e384b463e -> ../../dm-2
 ...
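
The check above can be repeated for any symlink by resolving it and looking at the name of the target. What follows is only a sketch: the helper name link_target_kind is made up for illustration, and it assumes that 'readlink -f' is available in the environment where it runs.

```shell
#!/bin/sh
# Hypothetical helper (not part of multipath-tools): report whether a
# symlink such as /dev/disk/by-uuid/<UUID> resolves to a device-mapper
# node (dm-N) or to something else, e.g. a single path like sda2.
link_target_kind() {
    link=$1
    # Nothing to classify if the link (or its target) does not exist.
    [ -e "$link" ] || return 1
    # readlink -f follows the symlink chain to the final target.
    target=$(readlink -f "$link") || return 1
    case "${target##*/}" in
        dm-[0-9]*) echo "dm" ;;
        *)         echo "other" ;;
    esac
}

# Example (on a real system):
#   link_target_kind /dev/disk/by-uuid/<UUID>
```

A result of "other" for the root filesystem UUID on a multipath system is the symptom described above: the symlink still points to one of the single-path devices.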

Constraints:
1) Can't count on udev rules to run 'multipath' (see bug 1431650)

Ideas:
0) sleep when there's no scsi_wait_scan module. (below)
1) make sure related SCSI modules are loaded before scsi_complete_async_scans() (drivers/scsi/scsi_scan.c) is called.
2) loop waiting for the number of SCSI disks to settle (maybe check /sys, /dev/, udev info, or something).
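
Idea 2 could look roughly like the sketch below. Everything in it is an assumption made for illustration (the function names, the defaults, and the choice of counting sd* entries under /sys/block); it is not code from multipath-tools or initramfs-tools.

```shell
#!/bin/sh
# Hypothetical sketch of idea 2: poll until the number of sd* entries
# stops changing. Names and defaults are assumptions.

count_sd_disks() {
    # $1: directory to inspect (assumed: /sys/block on a real system)
    n=0
    for d in "$1"/sd*; do
        # The glob stays literal when nothing matches, hence the -e test.
        [ -e "$d" ] && n=$((n + 1))
    done
    echo "$n"
}

wait_for_sd_settle() {
    dir=${1:-/sys/block}     # where whole-disk entries show up
    stable_needed=${2:-3}    # consecutive unchanged polls required
    max_tries=${3:-30}       # give up after this many polls
    interval=${4:-1}         # seconds between polls
    last=-1
    stable=0
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        cur=$(count_sd_disks "$dir")
        # Only a non-zero count that matches the previous poll is "stable".
        if [ "$cur" -eq "$last" ] && [ "$cur" -gt 0 ]; then
            stable=$((stable + 1))
        else
            stable=0
        fi
        [ "$stable" -ge "$stable_needed" ] && return 0
        last=$cur
        tries=$((tries + 1))
        sleep "$interval"
    done
    return 1
}
```

Like the sleep, this is a heuristic: a disk that shows up after the count has looked stable for the chosen number of polls is still missed.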

Ugly hack for now.. certainly something more elaborate is possible.

multipath-tools:/debian/initramfs/local-top:

  # Sync waiting for storage.
  verbose && log_begin_msg "Waiting for scsi storage"
 -{ rmmod scsi_wait_scan ; modprobe scsi_wait_scan ; rmmod scsi_wait_scan ; } >/dev/null 2>&1
 +# Ugly hack: sleep when scsi_wait_scan is not available (e.g., newer kernels).
 +{ rmmod scsi_wait_scan ; modprobe scsi_wait_scan || sleep 20; rmmod scsi_wait_scan ; } >/dev/null 2>&1
  verbose && log_end_msg

Testing on initramfs (break=top):

 (initramfs) modprobe scsi_wait_scan
 (initramfs) echo $?
 1
 (initramfs) sed 's:modprobe scsi_wait_scan:& || sleep 30:' -i /scripts/local-top/multipath

 (initramfs) exit

 works fine (i.e., as intended :P).

Steve Langasek (vorlon) on 2015-03-17
Changed in debian-installer (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Mathieu Trudel-Lapierre (mathieu-tl)
Steve Langasek (vorlon) wrote :

Architecturally speaking, I think it's a bug that installing to multipath results in the boot config pointing to a UUID for the root device. In all other cases, we use identifiers for all filesystems (root or otherwise) which are "guaranteed" to be both stable and unique. (For LVM, this is the LV path and not the UUID since UUIDs are not unique in the face of snapshots; otherwise we use the UUID.)

Given that we *know* that the UUID is not unique in the multipath scenario, and know this at install time, I think it's wrong for us to configure the system to reference filesystems via this non-unique identifier. I would argue instead that:

 - multipath-tools should (if it doesn't already) create a symlink for the device which includes the UUID, but is only ever created once multipath is initialized
 - the fstab and bootloader should be configured to refer to this symlink, not the non-unique UUID

This should address a number of issues with the initramfs, including allowing event-driven assembly of the multipath devices instead of the current blocking script approach.

Hi Steve,

> Given that we *know* that the UUID is not unique in the multipath scenario,
> and know this at install time, I think it's wrong for us to configure the system
> to reference filesystems via this non-unique identifier.

I see your point.
Although I'm not sure it's technically 'wrong' to do that, given that we're relying on a udev feature to handle this (which, as an underlying mechanism, is supposed to work, and actually does).
I guess it's a udev design decision to use the symlink priority (link_priority) for this (i.e., the by-uuid symlink to a partition on a multipath disk is assigned a higher priority in the kpartx udev rules, and thus that one is the 'active' symlink until the multipath disk disappears).

> - multipath-tools should (if it doesn't already) create a symlink for the device which includes the UUID, but is only ever created once multipath is initialized

Something close to that is already done, but not really w/ a UUID; it uses the device name and partition number instead:
    /dev/disk/by-id/dm-uuid-part2-mpath-0QEMU<...>HARDDISK<...>serial_one
Maybe a dm-uuid-part-mpath-<UUID> should address what you mean?

> This should address a number of issues with the initramfs, including
> allowing event-driven assembly of the multipath devices
> instead of the current blocking script approach.

Could you please clarify? I'm not sure I got it.

AFAICT, event-driven assembly of multipath devices requires calling 'multipath' from a udev ruleset, which is no longer possible w/ recent systemd-udev (bug 1431650).

And the real problem in this bug (for which the by-UUID problem is a consequence) is waiting for the SCSI scan to complete, which can't be guaranteed w/ the current scripts, in my understanding.

When the root fs is on a multipath device backed by an asynchronously scanned SCSI device (e.g., fibre channel), it's hard to tell when all the events for the root fs multipath device have completed (i.e., udevadm settle is not enough).

Even the 'sleep' work-around doesn't guarantee anything; it just helps. Consider, say, a SCSI scan that takes longer than the sleep threshold on the disk holding the root fs.

This is an interesting problem.. :)

There may be another (simpler) way here, that is to use the wait-for-root or the failure hooks called in pre_mountroot() (local/mount).

This should be combined w/ vorlon's catch of using a different UUID symlink for a multipath device, that should be changed on /etc/fstab / kernel cmdline, so we wait for the /right/ symlink to appear.

If the multipath init.d script runs multipath command later, before mounting the other/non-root filesystems (which I think it does), it would be OK.

Possibly good news.
See bug 1431650 comment 10.

Back to Steve's point (which now I understand more clearly, and looks totally right..) of event-driven multipath assembly.

> - multipath-tools should (if it doesn't already) create a symlink for the device which includes the UUID,
> but is only ever created once multipath is initialized
> - the fstab and bootloader should be configured to refer to this symlink, not the non-unique UUID

> This should address a number of issues with the initramfs,
> including allowing event-driven assembly of the multipath devices instead of the current blocking script approach.

Some observations..

In the context of relying on event-based multipath discovery.. waiting for the
root filesystem device.. and (not?) booting from an underlying device if it were
supposed to be a multipath device.
----------------------------------

I was wondering whether we would need a 'udevadm settle' after wait-for-root
in order to make sure that any /dev/disk/by-uuid symlink (or any other symlink)
pointed to the multipath device at that time (i.e., it could point to an underlying
device if udev rules were still being processed and multipath hadn't run yet).

The answer is 'no', because wait-for-root only receives events from the udev
monitor after all udev rules have been processed.

wait-for-root is OK because it uses "udev" as source identifier in this call:
 udev_monitor = udev_monitor_new_from_netlink (udev, "udev");

    udev_monitor_new_from_netlink ()

 [...]
 Valid sources identifiers are "udev" and "kernel".
 [...]
 The "udev" events are sent out after udev has finished its event processing,
 all rules have been processed, and needed device nodes are created.

  https://www.kernel.org/pub/linux/utils/kernel/hotplug/libudev/libudev-udev-monitor.html

So, it only receives the event that the waited-for /dev/disk/by-uuid symlink
exists after all udev rules are processed, so it implies that the multipath
command called from udev rules finished, and _if successful_, that the symlink
points to the multipath device rather than the underlying device (the former has
greater link_priority).

In summary, it's not possible to guarantee that the multipath command is always
successful; therefore, it is reasonable to depend on a symlink that won't exist
if that command fails (as pointed out by Steve);
for example: /dev/disk/by-id/dm-uuid-partX-mpath-WWID

I'm writing a snippet for /etc/grub.d/10_linux to use that idea, and another
one for the installer to modify /target/etc/fstab to a similar intent, so that
the by-id links from multipath-only devices are used rather than by-uuid of
any disk (non-unique UUID).

BTW, the event-driven multipath assembly can now work again, with the patch in bug 1431650.

------- Comment From <email address hidden> 2015-03-27 16:43 EDT-------
I still see the error message about the multipath link during installation, but the installation completes fine and the system boots normally from the multipath disk with today's build, 20150327. This is fixed. Thanks

root@creeklp2:~# uname -a
Linux creeklp2 3.19.0-10-generic #10-Ubuntu SMP Mon Mar 23 16:18:35 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

root@creeklp2:~# cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/mapper/mpath0-part2 / ext4 errors=remount-ro 0 1
/dev/mapper/mpath0-part3 none swap sw 0 0

Hi Mathieu and Steve,

Sorry for the delay on this.
Here's the patch addressing most of Steve's directions (comment #13).
I'll submit the remaining changes shortly.

Patch description:

> - multipath-tools should (if it doesn't already) create a symlink for the device which includes the UUID, but is only ever created once multipath is initialized

Now kpartx.rules creates the /dev/disk/by-uuid/multipath-<UUID> symlink for that.

Theoretically (I read somewhere), the symlinks in 'by-uuid' should be *UUID-based*, and not strictly *only the UUID string*.
So, either a 'multipath-' prefix or '-multipath' suffix would do it.
I opted for a prefix so as not to confuse scripts that use expressions like "/dev/disk/by-uuid/[[:uuid-chars:]]*" (which don't strictly match the entire string) into thinking that the '-multipath' suffix is part of the UUID.

> - the fstab and bootloader should be configured to refer to this symlink, not the non-unique UUID

I'll submit the installer changes for the fstab shortly.

After some thinking and attempts, I opted not to require changes to the bootloader, but rather to keep it contained in multipath-tools-boot.

Ideally, it'd be nice to install new scripts (wrappers?) to /etc/grub.d with multipath-tools-boot, but it's not easy to wrap/re-use 10_linux and 30_os-prober currently.
Anyway, it requires just a simple change to /etc/grub.d/{10_linux,30_os-prober}, but would still require a change to non-multipath stuff.

I thought this would be a simpler option, contained within multipath-tools:
Change the ROOT parameter in the initramfs from '/dev/disk/by-uuid/<UUID>' to '/dev/disk/by-uuid/multipath-<UUID>', thus relying on the new symlink; short and simple.
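
To make the ROOT rewrite concrete, here is a minimal sketch of the mapping. It is not the code from the attached debdiff; the function name multipath_root is hypothetical, and it only handles the '/dev/disk/by-uuid/<UUID>' form mentioned above.

```shell
#!/bin/sh
# Hypothetical sketch (not the debdiff code): map a by-uuid root path to
# the multipath-only symlink, leaving every other form of ROOT untouched.
multipath_root() {
    case "$1" in
        /dev/disk/by-uuid/multipath-*)
            # Already rewritten; keep the function idempotent.
            echo "$1" ;;
        /dev/disk/by-uuid/*)
            echo "/dev/disk/by-uuid/multipath-${1#/dev/disk/by-uuid/}" ;;
        *)
            # LVM paths, /dev/mapper names, etc. are not by-uuid links.
            echo "$1" ;;
    esac
}

# Example:
#   ROOT=$(multipath_root "$ROOT")
```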

I added a failure_hook, so if wait-for-root can't find the root device (due to this change or *any other* multipath problem), the user gets a descriptive message explaining that, and a shell.
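
The kind of check such a hook performs can be sketched as follows. This is an illustration only, not the hook from the patch: the function name, the return codes and the message wording are all assumptions.

```shell
#!/bin/sh
# Hypothetical check in the spirit of the failure hook: tell apart
# "multipath symlink present", "only the plain UUID symlink present"
# and "no symlink at all".
check_multipath_root() {
    dir=$1      # assumed: /dev/disk/by-uuid on a real system
    uuid=$2
    if [ -e "$dir/multipath-$uuid" ]; then
        return 0
    fi
    if [ -e "$dir/$uuid" ]; then
        echo "WARNING: root device has a normal UUID symlink," >&2
        echo "but no multipath-UUID symlink." >&2
        return 2
    fi
    echo "WARNING: no symlink found for UUID $uuid" >&2
    return 1
}
```

Return code 2 corresponds to the case where the disks were found but multipath did not assemble them, which is when running 'multipath -v0' by hand from the shell is worth trying.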

> This should address a number of issues with the initramfs, including allowing event-driven assembly of the multipath devices instead of the current blocking script approach.

I'll submit the event-driven assembly changes shortly too.
That's optional (and builds on-top of this change) - things already work really fine with the patch as it is.

The test-cases performed with the patch:

Install the modified multipath-tools' binary packages; the initramfs is updated with the changes.

Reboot. The system booted successfully; good.
Notice:
- unmodified kernel cmdline
- root filesystem mounted from the 'by-uuid/multipath-<UUID>' symlink
- it points to a device-mapper virtual block device
- the symlink is unique (no other devices claim it).

 $ cat /proc/cmdline
 BOOT_IMAGE=/boot/vmlinux-3.19.0-9-generic root=UUID=7fec6766-3166-4395-982a-555ecce92667 ro

 $ df /
 Filesystem 1K-blocks Used Available Use% Mounted on
 /dev/disk/by-uuid/multipath-7fec6766-3166-4395-982a-555ecce92667 31483800 2374984 27486444 8% /

 $ udevadm info -q path /dev/disk/by-uuid/multipath-7fec6766-3166-4395-982a-555ecce92667
 /devices/virtual/block/dm-4

 $ ls -1 /run/udev/links/*by-uuid*multipath-7fec6766-3166-4395-982a-555ecce92667
 b252:4

Verify the failure hook.
Reboot with 'break=top,post-multipath'.

 Reduce the wait-for-root timeout for debugging purposes.
 (initramfs) sed 's:\(ARCHDELAY\)=[0-9]\+:\1=10:' -i /scripts/local
 (initramfs) exit

 Check the expected root device:
 (initramfs) cat /proc/cmdline
 BOOT_IMAGE=/boot/vmlinux-3.19.0-9-generic root=UUID=7fec6766-3166-4395-982a-555ecce92667 ro break=top,post-multipath

 Initially, things are alright:

 (initramfs) ls -l /dev/disk/by-uuid/multipath-7fec6766-3166-4395-982a-555ecce92667
 lrwxrwxrwx 1 10 /dev/disk/by-uuid/multipath-7fec6766-3166-4395-982a-555ecce92667 -> ../../dm-3

 Now, make things wrong:

 (initramfs) multipath -F
 (initramfs) ls -l /dev/disk/by-uuid/multipath-7fec6766-3166-4395-982a-555ecce92667
 ls: /dev/disk/by-uuid/multipath-7fec6766-3166-4395-982a-555ecce92667: No such file or directory

 (initramfs) exit

 ~10 seconds for wait-for-root timeout...

 The informative message is displayed; good; the failure hook is OK.
  ...
  WARNING: the root device is supposed to be a multipath device,
  but it doesn't look like one: it doesn't have a multipath-UUID
  symlink, but it does have a non-multipath (normal) UUID symlink.
  ...

 If you don't try to fix anything and exit, the good old 'root device not found' message appears, as usual.
  ...
  Gave up waiting for root device. Common problems:
  ...
  ALERT! /dev/disk/by-uuid/multipath-7fec6766-3166-4395-982a-555ecce92667 does not exist. Dropping to a shell!
  ....

Verify the failure hook again, but now let's fix things:

 (repeat steps above)
 ...
 WARNING: the root device is supposed to be a multipath device,
 but it doesn't look like one: it doesn't have a multipath-UUID
 symlink, but it does have a non-multipath (normal) UUID symlink.
 ...

 (initramfs) multipath -v0
 (initramfs) exit

 The initramfs continued on, and the system booted successfully:

 $ cat /proc/cmdline
 BOOT_IMAGE=/boot/vmlinux-3.19.0-9-generic root=UUID=7fec6766-3166-4395-982a-555ecce92667 ro break=top,post-multipath

 $ df /
 Filesystem 1K-blocks Used Available Use% Mounted on
 /dev/disk/by-uuid/multipath-7fec6766-3166-4395-982a-555ecce92667 31483800 2375308 27486120 8...

The attachment "multipath-tools_root-uuid-multipath.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Chris J Arges (arges) on 2015-04-02
affects: debian-installer (Ubuntu) → multipath-tools (Ubuntu)
summary: - ISST-LTE: system drops to initramfs after install on multipath disk
+ Boot from an unique, stable, multipath-dependent symlink

Now that the vivid release is done, let's revisit these patches. I'll do some testing on my end, but looking briefly at them, they make sense.

What other installer changes (since you mention them in your comment #20) are necessary?

Changed in multipath-tools (Ubuntu):
status: Confirmed → In Progress
importance: Undecided → Medium

Default Comment by Bridge

Default Comment by Bridge

Default Comment by Bridge

Hi Mathieu,

(I have no idea what the 3 attachments above are related to. :-)

This is the patch missing for the installer.
I did some testing and noticed some problems w/ it, which made me reconsider how (and whether) we should really base this on a UUID= field. Let me describe the issue.

IIRC, if this one is applied, and one or more partitions on multipath disks are used (so there are some UUID=multipath-<uuid> on /etc/fstab), the init jobs for the local filesystems wouldn't finish.
I couldn't investigate further at the time, but I guess the problem is that some pieces don't assume they should pick a symlink from /dev/disk/by-uuid/, and instead search for the UUID field in the filesystem itself (which obviously doesn't contain the 'multipath-' prefix).

So, maybe better ways are either to move back to explicitly using /dev/disk/by-id/ (and then 'multipath-' prefix would be OK, as it's not a filesystem field, AFAICT), or use different approaches -- I believe upstream multipath-tools relies on different things, which are probably worth checking.

I guess that for W, the most interesting thing is to consider moving to upstream multipath-tools. I know there's a Debian-base to think of (and I see they still modify the sources to use mpath[0-9]+ rather than mpath[a-z]+, which creates some difficulties for some backporting..).. but at some time, either Ubuntu or Debian should make the move to have better multipath support, and by looking at the number of fixes made after 0.5.0, I think git HEAD is a very good place to sit on nowadays (besides it -- VPD page 0x80 as optional -- is important for IBM IPR controllers..).

This seems like something that can spawn interesting things to talk about :)

I'm working on curtin support for multipath, and after a very long set of things, I believe I'm down to this bug, and am seeing it reproducibly using trusty+hwe-v on ppc64el (note, I've never seen it in trusty+hwe-u).

I believe that your re-setting of ROOT= guarantees failure of boot if you install multipath-tools-boot on a system that does not have multipath. That seems less than ideal. It basically changes "can" to "will" in bug 1463046 .

We are currently working around bug 1432062 by using user_friendly_names.

Scott Moser (smoser) wrote :

well, i lied. I've seen it on hwe-u also now.

Scott Moser (smoser) wrote :

It seems there is a grub component of this bug also, to make grub correctly identify the 'abstraction' of multipath and automatically set GRUB_DEVICE=/dev/disk/by-uuid/multipath-UUID rather than UUID=

Hi Scott,

> I believe that your re-setting of ROOT= guarantees failure of boot if you
> install multipath-tools-boot on a system that does not have multipath.

Good to see more opinions on this topic.
Well, I'm not positive the change to root=UUID=multipath- (done via ROOT=) would be well accepted by boot userspace, and I imagine something better can be done. (comment #28)

I see you chased some multipath stuff on the bug trail you mentioned.
In case your mind has been nurturing more ideas, it would be cool to ideate on the topic/implementation :)

Scott Moser (smoser) wrote :

Yeah, its been "fun".

I'm not sure what you mean by "accepted". as in "accepted upstream" or as in "the initramfs would respect your change to ROOT". initramfs will respect your change to root. Your path *will* work, but will break anyone that doesn't have their root device on multipath.

For curtin, we *almost* had it before landing on this bug yesterday.
I realized that since in curtin we were writing /etc/multipath.conf and /etc/multipath/bindings , that we actually definitively declare a stable path name for boot. the work-in-progress branch at https://code.launchpad.net/~smoser/curtin/grub-mpath does work, and boots with root=/dev/mapper/mpath0-<partition>X . At least for our case that is a stable path. Admittedly we're not doing raid or lvm on top of the multipath devices.

One thing I realized yesterday is that whatever solution we come up with for root, we also have to use for other mount points on multipath devices. If /usr was on a separate partition on a multipath device, then /etc/fstab for that device also needs the reliable multipath entry. For example, normally we'd see something like:
  UUID=715c6da7-c111-4842-868b-7778623ead7c / ext4 errors=remount-ro 0 1
  UUID=e1cfa4bb-4c8a-4fc3-9b35-49a8ef6ae975 /home ext4 defaults 0 2
but we will need:
  UUID=multipath-715c6da7-c111-4842-868b-7778623ead7c / ...
  UUID=multipath-e1cfa4bb-4c8a-4fc3-9b35-49a8ef6ae975 /home ...

or whatever else we're using for mounting. the point is that this isn't a "root device only" problem.
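
As a rough illustration of the fstab side, the filter below rewrites only the entries whose UUID is known to live on a multipath device. It uses the UUID=multipath-<uuid> form from the example above purely for illustration (the thread has not settled on the final identifier), and the function name and its argument convention are assumptions.

```shell
#!/bin/sh
# Hypothetical filter: read an fstab on stdin and prefix the UUID of
# selected entries with "multipath-"; all other lines pass through as-is.
rewrite_fstab_uuids() {
    # $1: space-separated UUIDs known to be on multipath devices
    awk -v uuids="$1" '
        BEGIN {
            n = split(uuids, a, " ")
            for (i = 1; i <= n; i++) mp[a[i]] = 1
        }
        /^UUID=/ {
            u = substr($1, 6)          # text after "UUID="
            if (u in mp) sub(/^UUID=/, "UUID=multipath-")
        }
        { print }
    '
}

# Example:
#   rewrite_fstab_uuids "715c6da7-c111-4842-868b-7778623ead7c" < /etc/fstab
```

Whatever identifier ends up being used, the rewrite has to cover every mount point on a multipath device, not just the root entry.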

Scott Moser (smoser) wrote :

another comment: the path that i was pursuing, and almost have in place, was to have /etc/multipath/bindings with consistent names and then boot with root=/dev/mapper/mpathX[-partY].

Mauricio's suggestion of booting with root=multipath-UUID is nicer in the event of a disk being replaced, as my solution would result in failure to boot if you replaced the physical disk with another with identical data (ie, you noticed it was going bad).

So, i do like the idea of multipath-<uuid> except for the guaranteed boot failure if you install that package and do not have multipath.

@smoser

> Yeah, its been "fun".

Hahah, I do understand the quotes.
Well, it's the way that turned out to be more fruitful to put it. :)

>> Well, I'm not positive the change to root=UUID=multipath- (done via ROOT=)
>> would be well accepted by boot userspace,

> I'm not sure what you mean by "accepted". [snip]

Not a good word for that context.
I meant: something in boot userspace (init scripts & the command they run, et al) didn't like it / failed with that root= parameter in some tests (comment #28)

> Your path *will* work, but will break anyone that doesn't have their root device on multipath.

Yes.
One of the things I couldn't work out clearly is how to detect when multipath is really needed for the rootfs.
(that's the reason I added a helpful message about multipath-tools-boot in one of the patches for the initramfs script)

I guess one of curtin's bugs mentions checking for more than 1 device w/ a given WWID..
but Murphy might eventually ensure a condition where all but one paths are down when booting/detecting that (some sort of "degraded SAN" condition, is the term I read somewhere).
In a related aspect, newer multipath-tools introduced "find_multipaths" that does something similar, and for the 1-path case, it relies on the wwids/bindings file to see if that path was known previously.

An option is to use that on initramfs, but also implies the initramfs wwids/bindings file should be always in sync w/ that in the rootfs (which is not always intuitive for users.. regenerate the initramfs on SAN/topology changes).

> One thing I realized yesterday is that whatever solution we come up with for root,
> we also have to use for other mount points on multipath devices.

IIRC the patch for d-i (partman-base.. fstab-something) does that, and I believe that (+ the UUID=multipath-<uuid>) is what introduced the failure I mentioned in comment #28.

> So, i do like the idea of multipath-<uuid> except for the guaranteed boot failure if
> you install that package and do not have multipath.

I'd suggest not to proceed with that route until that boot error is understood.
I had the impression something really expected a UUID after the UUID= parameter, not interpreting it as a symlink name, but as something to search for in the filesystems' UUID fields in the partitions' data; but I didn't investigate it further at the time.

To rule that out, I wondered about using /root/by-id/multipath-uuid-<uuid>, for example -- or anything that doesn't mean anything more than a symlink, that could hit the UUID interpretation bug I suspected.

Steve Langasek (vorlon) on 2015-06-22
summary: - Boot from an unique, stable, multipath-dependent symlink
+ Boot from a unique, stable, multipath-dependent symlink
Steve Langasek (vorlon) wrote :

> To rule that out, I wondered about using /root/by-id/multipath-uuid-<uuid>,
> for example -- or anything that doesn't mean anything more than a symlink,
> that could hit the UUID interpretation bug I suspected.

Yes; I don't think we should be overloading the behavior of UUID= since that is meant to be a uuid. I think it's preferable to point to /dev/disk/by-id/multipath-<uuid> (assuming that's the link that we're creating).

Scott Moser (smoser) on 2016-03-02
description: updated
Steve Langasek (vorlon) wrote :

Scott was right when he wrote:

> It seems there is a grub component of this bug also, to make grub
> correctly identify the 'abstraction' of multipath and automatically
> set GRUB_DEVICE=/dev/disk/by-uuid/multipath-UUID rather than
> UUID=

I understand that curtin has been working around grub's behavior by manually setting GRUB_DISABLE_LINUX_UUID=true; this is a bad solution. grub-probe / update-grub should themselves detect that the root target is a multipath disk and automatically ensure the correct device name is used instead of the UUID.
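
One possible basis for such detection is the device-mapper UUID of the root device, which for multipath maps and their partitions carries an 'mpath-' marker (compare the dm-uuid-part2-mpath-... symlink quoted earlier in this bug). The sketch below is not grub code; the function name is hypothetical, and reading the value from /sys/block/dm-N/dm/uuid is only one way to obtain it.

```shell
#!/bin/sh
# Hypothetical helper: decide from a device-mapper UUID string whether
# the device is a multipath map ("mpath-<WWID>") or a partition on one
# ("partN-mpath-<WWID>"). Other DM devices (e.g. "LVM-...") do not match.
is_multipath_dm_uuid() {
    case "$1" in
        mpath-*|part[0-9]*-mpath-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (on a real system):
#   is_multipath_dm_uuid "$(cat /sys/block/dm-2/dm/uuid)" && echo multipath
```

Note that this only covers a filesystem placed directly on the multipath device or one of its partitions; with LVM on top of multipath the check would have to walk down to the underlying devices.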

Changed in grub2 (Ubuntu):
importance: Undecided → High
status: New → Triaged
tags: added: id-5c00556f000dd8799c9361aa