ISST-LTE: root mpath device unavailable after installation

Bug #1526984 reported by bugproxy
Affects                    Status         Importance   Assigned to               Milestone
multipath-tools (Ubuntu)   Fix Released   Critical     Mathieu Trudel-Lapierre
Trusty                     Fix Released   Critical     Mathieu Trudel-Lapierre

Bug Description

[Impact]
This affects users of multipath systems where devices may be slow to be detected (see below).

On some systems, disks may be slow to be detected for a number of reasons: slow disks, spin delay, a large number of devices to scan, etc. If the disk containing the root filesystem has not been detected/scanned by the time the multipath initramfs scripts run, the system gives up and fails to load the rootfs, which results in a failure to boot (the user is dropped into a busybox shell in the initramfs).

[Test case]
Boot the system.

[Regression Potential]
Given that this changes the load order for the system, it may cause some delay in booting due to the extra time taken to load all maps for the rootfs to become available through multipath. Also, crashes in multipathd may now affect early boot.

---

Canonical FYI:

We appear to have a regression in the current 14.04 environment back to
the behavior originally outlined in Comment #10 of LP Bug 1429327.

---Problem---

A multipath install of 14.04.4 fails to create the install mpath
device on subsequent boots. While the root filesystem ends up being
mounted on top of one of the path devices, other partitions on that
multipath device will not be automatically found.

---Additional details---

For one example host, the install disk was set up with a root and
swap partition:

Dec 14 21:37:55 disk-detect: create: mpath0 (35000c5007655a10f) undef IBM ,ST300MP0064
Dec 14 21:37:55 disk-detect: size=279G features='0' hwhandler='0' wp=undef
Dec 14 21:37:55 disk-detect: |-+- policy='round-robin 0' prio=1 status=undef
Dec 14 21:37:55 disk-detect: | `- 0:0:7:0 sdc 8:32 undef ready running
Dec 14 21:37:55 disk-detect: `-+- policy='round-robin 0' prio=1 status=undef
Dec 14 21:37:55 disk-detect: `- 1:0:3:0 sdd 8:48 undef ready running

The resulting fstab entries after install were:

/dev/mapper/mpath0-part2 / ext4 errors=remount-ro 0 1
/dev/mapper/mpath0-part3 none swap sw 0 0

In post-install boots, the multipath command appears to be executed
prior to any disks being discovered:

Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... [ 0.612955] ipr: IBM Power RAID SCSI Device Driver version: 2.6.1 (March 12, 2015)
[ 0.612985] ipr 0000:80:00.0: Found IOA with IRQ: 0
[ 0.613478] ipr 0000:80:00.0: ibm,query-pe-dma-windows(53) 800000 8000000 20000164 returned 0
Begin: Running /scripts/local-top ... [ 0.613727] ipr 0000:80:00.0: ibm,create-pe-dma-window(54) 800000 8000000 20000164 10 21 returned 0 (liobn = 0x70000164 starting addr = 8000000 0)
Begin: Loading multipath modules ... [ 0.618790] device-mapper: multipath: version 1.9.0 loaded
Success: loaded module dm-multipath.
Failure: failed to load module dm-emc.
done.
Begin: Waiting for scsi storage ... done.
Begin: Discovering multipaths ... [ 0.623429] be2net 0002:01:00.0: be2net version is 10.6.0.2
[ 0.623587] be2net 0002:01:00.0: enabling device (0000 -> 0002)
done.
[ 0.636785] ipr 0000:80:00.0: Using 64-bit direct DMA at offset 800000000000000
[ 0.637343] be2net 0002:01:00.0: PCIe error reporting enabled
[ 0.641435] ipr 0000:80:00.0: Received IRQ : 20
[ 0.641522] ipr 0000:80:00.0: Request for 2 MSIXs succeeded.
[ 0.645713] ipr 0000:80:00.0: Starting IOA initialization sequence.
[ 0.645719] scsi host0: IBM 0 Storage Adapter
[ 0.645741] ipr 0000:80:00.0: Starting IOA initialization sequence.
[ 0.645833] ipr 0001:a0:00.0: Found IOA with IRQ: 0
[ 0.646221] ipr 0000:80:00.0: Adapter firmware version: 13510C00
[ 0.647673] ipr 0000:80:00.0: IOA initialized.
[ 0.648009] ipr 0001:a0:00.0: Received IRQ : 448
[ 0.648041] ipr 0001:a0:00.0: Request for 2 MSIXs succeeded.
[ 0.651764] ipr 0001:a0:00.0: Starting IOA initialization sequence.
[ 0.651769] scsi host1: IBM 0 Storage Adapter
[ 0.651795] ipr 0001:a0:00.0: Starting IOA initialization sequence.
[ 0.652262] ipr 0001:a0:00.0: Adapter firmware version: 13510C00
[ 0.653702] ipr 0001:a0:00.0: IOA initialized.
[ 0.660800] scsi 0:3:0:0: No Device IBM 57B1001SISIOA 0150 PQ: 0 ANSI: 0
[ 0.660811] scsi 0:3:0:0: Resource path: 0/FE
[ 0.672518] scsi 0:0:0:0: Direct-Access IBM ST300MP0064 7D0E PQ: 0 ANSI: 6
[ 0.672530] scsi 0:0:0:0: Resource path: 0/00-0E-08
...
(first device in mpath0 seen at this point):
[ 0.710988] scsi 1:0:3:0: Direct-Access IBM ST300MP0064 7D0E PQ: 0 ANSI: 6
[ 0.710994] scsi 1:0:3:0: Resource path: 1/00-0E-0A

The system ends up mounting the root filesystem from /dev/sdc2,
and the post-mount run of multipath errors out creating mpath0:

[ 10.056286] device-mapper: table: 252:0: multipath: error getting device
[ 10.082644] device-mapper: table: 252:0: multipath: error getting device
[ 10.192614] device-mapper: table: 252:2: multipath: error getting device
[ 10.222609] device-mapper: table: 252:2: multipath: error getting device
[ 10.663458] device-mapper: table: 252:2: multipath: error getting device
[ 10.682690] device-mapper: table: 252:2: multipath: error getting device
[ 10.796743] device-mapper: table: 252:2: multipath: error getting device

Since no other filesystems depend on mpath0 for this example config,
the system is able to boot, but lacks swap due to the missing swap
device.

bugproxy (bugproxy) wrote : sosreport from host nulp1

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-133937 severity-high targetmilestone-inin14044
bugproxy (bugproxy) wrote : Console log from post-install boot on host nulp1

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1526984/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
bugproxy (bugproxy)
tags: removed: bot-comment bugnameltc-133937 severity-high
Kevin W. Rudd (kevinr)
affects: ubuntu → multipath-tools (Ubuntu)
bugproxy (bugproxy)
tags: added: bugnameltc-133937 severity-critical
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2015-12-28 15:19 EDT-------
*** Bug 134385 has been marked as a duplicate of this bug. ***

bugproxy (bugproxy) wrote : outputs of multipath -v3

Default Comment by Bridge

bugproxy (bugproxy) wrote : log file #1 from installer

Default Comment by Bridge

bugproxy (bugproxy) wrote : log file #2 from installer

Default Comment by Bridge

bugproxy (bugproxy) wrote : Support multipathd in the initramfs for async device discovery

------- Comment on attachment From <email address hidden> 2016-01-11 07:58 EDT-------

Hi Canonical @mathieu-tl,

This patch implements multipathd in the initramfs, and is reported to resolve the problem
(tested by developer and test team for more than 10 reboots each).

It's on top of the patch in LP 1432062 comment 10 (requirement, since multipathd -B might create devmaps w/ spaces in names).

Is it still in the window for 14.04.4?
Thanks!

Steve Langasek (vorlon)
Changed in multipath-tools (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Mathieu Trudel-Lapierre (mathieu-tl)
importance: Undecided → Critical
Changed in multipath-tools (Ubuntu Trusty):
importance: Undecided → Critical
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
milestone: none → ubuntu-14.04.4
bugproxy (bugproxy) wrote : Console log from post-install boot on host nulp1

Default Comment by Bridge

Changed in multipath-tools (Ubuntu):
status: New → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package multipath-tools - 0.5.0-7ubuntu10

---------------
multipath-tools (0.5.0-7ubuntu10) xenial; urgency=medium

  * debian/patches/0052-readonly-bindings_multipathd.patch,
    debian/patches/0053-readonly-bindings_multipathd_prod.patch: support -B to
    allow multipathd to handle cases where the bindings file is read-only.
    (LP: #1526984)
  * debian/initramfs/hooks: install multipathd and required directories.
  * debian/initramfs/local-bottom, debian/rules: install local-bottom for
    initramfs.
  * debian/initramfs/local-premount: reload all maps to make sure they're
    indeed loaded and ready before we end premount.
  * debian/initramfs/local-top: run multipathd rather than a one-off call to
    multipath so that new paths can be correctly added as detected while we're
    still in the initramfs.
  * debian/initramfs/local-bottom: remember to stop multipathd.
  * debian/patches/git-kpartx-support-spaces-in-dev-names-b407050a.patch: deal
    with spaces in device names in kpartx too (LP: #1432062)

 -- Mathieu Trudel-Lapierre <email address hidden> Wed, 27 Jan 2016 10:42:51 -0500

Changed in multipath-tools (Ubuntu):
status: In Progress → Fix Released
description: updated
Changed in multipath-tools (Ubuntu Trusty):
status: New → Incomplete
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-02-02 17:47 EDT-------
*** Bug 136463 has been marked as a duplicate of this bug. ***

Mauricio Faria de Oliveira (mfo) wrote :

Hi @mathieu-tl,

Mathieu Trudel-Lapierre (mathieu-tl) on 2016-02-01
> description: updated
> Changed in multipath-tools (Ubuntu Trusty):
> status: New → Incomplete

Can you clarify the status 'incomplete' for this bug?
Is that indeed an information request for submitter (where, please) or just for some other change you'll do on the description field?

Thanks,

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-04 11:54 EDT-------
Is multipath-tools - 0.5.0-7ubuntu10 going to be available in the repository anytime soon? We want to verify the problem is fixed before completing our testing for 14.04.4, which we will be done with early next week...

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-04 12:13 EDT-------
Hi Mikhail,

(In reply to comment #60)
> Is multipath-tools - 0.5.0-7ubuntu10 going to be available in the repository
> anytime soon? We want to verify the problem is fixed before completing our
> testing for 14.04.4, which we will be done with early next week...

That version is for 16.04 - it is already fix released/available in the repository.

The version for 14.04.4 (something like 0.4.9-3ubuntu7.<something>) is not yet fix released/available.

There's no defined date for when it will be made available - but it is in the list of multipath fixes that Canonical's team is aware of and working on. Perhaps there will be a single update w/ more than 1 bug fixed.

There's likely a number of tasks on their side as well, and this one is prioritized as Critical, so things should happen as best as possible.

------- Comment From <email address hidden> 2016-02-04 12:17 EDT-------
Got it, thanks! We'll see if the problem is gone on 16.04 next week.

Brian Murray (brian-murray) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted multipath-tools into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu7.8 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in multipath-tools (Ubuntu Trusty):
status: Incomplete → Fix Committed
tags: added: verification-needed
Stuart Hopkins (stu-g) wrote :

Mathieu,

As noted in LP:1538775, version 0.4.9-3ubuntu7.8 still isn't resolving the issue for me. While my system does start (I'm not dropping to a rescue shell), the multipath daemon seems to be executed prior to the disks being discovered/udev starting, and so it boots using sda for root instead of mpath. Running 'multipath -v5' post-boot shows that it wants to reconfigure pathing but cannot, as the disks are in use (understandably).

I've attached the dmesg log, which shows multipathd starting, then the disks being detected, then udev starting.

Mathieu Trudel-Lapierre (cyphermox) wrote :

Hum, that's kind of weird. You should see disks being picked up by multipath if they should, even if they are detected *after* multipath scripts run (because multipathd...). They wouldn't be detected before multipath runs, because then they would most likely be remapped anyway (unless something else claims them).

Could you file a new bug for this, including the output of multipath -v4 taken from the initramfs? You may stop the boot process by adding "break=pre-multipath" to the kernel command line, which breaks boot just before the multipath scripts. At that point, you'll need to modprobe dm_multipath; and then you should be able to run multipath -v4. If you exit from there, boot should complete (hopefully successfully, since there will have been some extra delay).
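
The debugging steps above can be sketched as a small helper to run at the initramfs prompt. This is only a sketch: the function name collect_multipath_debug and the default log path are assumptions, not something shipped in multipath-tools. Only break=pre-multipath, modprobe dm_multipath and multipath -v4 come from the comment above.

```shell
# Hypothetical helper, meant to be pasted at the initramfs shell reached by
# booting with break=pre-multipath on the kernel command line.
collect_multipath_debug() {
    # Where to store the log; the default path is an assumption.
    out="${1:-/tmp/multipath-v4.log}"

    # The multipath scripts have not run yet, so load the module by hand.
    modprobe dm_multipath || return 1

    # Capture the verbose discovery output (stdout and stderr) in one file.
    multipath -v4 > "$out" 2>&1

    # Print the log location so it can be copied off or attached to a bug.
    echo "$out"
}
```

After running collect_multipath_debug, exit the shell to let boot continue. Files written inside the initramfs are likely to be lost once the real root is mounted, so copy the log somewhere persistent first.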

Mauricio Faria de Oliveira (mfo) wrote :

Hi @stu-g,

Thanks for your report.
I think the problem you're experiencing (handled in LP 1538775) is one we've been discussing/trying to get around for some time now.
Since scsi_wait_scan was removed, there's no way to really wait /on the SCSI scan/.
For that objective, udevadm settle won't help, as the SCSI scan itself does not generate any udev events; only the disks found in the process do.
So, if your environment hits that intermediate spot (i.e., the SCSI scan only starts to add the disks to the kernel /after/ udevadm settle finished w/out more events to process), then you'd hit the problem nonetheless.

Unfortunately, we have not come up with anything yet for that kind of problem. It's been a head scratcher for some time, as mentioned.

For this particular bug, the reporter verified the patches submitted (which implement multipathd in the initramfs) and it does handle some async discoveries nicely.
So, unfortunately it doesn't fix one other scenario (which seems hard to fix), but that doesn't make the fix wrong or any less useful for the scenario it does address.
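
To make the race concrete: udevadm settle only drains events that are already queued, so a script that needs one particular disk can only be sure of it by waiting for that disk itself. Below is a minimal sketch of such a bounded wait, assuming the device node path is known in advance (which the multipath initramfs scripts generally cannot assume for every path). The helper name wait_for_node and its default timeout are hypothetical; this is not part of the fix for this bug.

```shell
# Hypothetical helper: poll for a device node until it exists or a timeout
# (in seconds) expires. Returns 0 if the node appeared, 1 on timeout.
wait_for_node() {
    node="$1"
    timeout="${2:-30}"   # default timeout is an assumption
    waited=0

    while [ ! -e "$node" ]; do
        # Give up once the budget is spent.
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, wait_for_node /dev/sda 10 would return as soon as /dev/sda exists, or fail after roughly ten seconds. Unlike udevadm settle, it does not depend on an event having been queued before it starts waiting.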

Mauricio Faria de Oliveira (mfo) wrote :

I verified this works correctly on 2 LPARs, including the original LPAR from the bug report. (Thanks, Michail)
Marking as verification-done.

Installed the version from -proposed:

 root@pinelp2:~# dpkg -l | grep 0.4.9-3ubuntu7.8
 ii kpartx 0.4.9-3ubuntu7.8 ppc64el create device mappings for partitions
 ii kpartx-boot 0.4.9-3ubuntu7.8 all Provides kpartx during boot
 ii multipath-tools 0.4.9-3ubuntu7.8 ppc64el maintain multipath block device access
 ii multipath-tools-boot 0.4.9-3ubuntu7.8 all Support booting from multipath devices

 root@pinelp3:~# dpkg -l | grep 0.4.9-3ubuntu7.8
 ii kpartx 0.4.9-3ubuntu7.8 ppc64el create device mappings for partitions
 ii kpartx-boot 0.4.9-3ubuntu7.8 all Provides kpartx during boot
 ii multipath-tools 0.4.9-3ubuntu7.8 ppc64el maintain multipath block device access
 ii multipath-tools-boot 0.4.9-3ubuntu7.8 all Support booting from multipath devices

Notice the 2 LPARs (pinelp2 and pinelp3) booted from an individual path.
On pinelp2, it was required to manually include the rootfs device's WWID (below).

 root@pinelp2:~# mount | grep ' / '
 /dev/sdq2 on / type ext4 (rw,errors=remount-ro)

 root@pinelp3:~# mount | grep ' / '
 /dev/sdg2 on / type ext4 (rw,errors=remount-ro)

Make sure the rootfs device is listed in /etc/multipath/wwids in the initramfs:

- This is required on pinelp2:

 root@pinelp2:~# multipath -c /dev/sdq
 /dev/sdq is not a valid multipath device path

 root@pinelp2:~# multipath -d -v3 | grep 'sdq: uid ='
 Feb 05 11:04:14 | sdq: uid = 36005076304ffc4410000000000000161 (callout)

 root@pinelp2:~# multipath -d -v3 | grep 'sdq: uid =' | awk '{ print "/" $8 "/" }' >> /etc/multipath/wwids

 root@pinelp2:~# multipath -c /dev/sdq
 /dev/sdq is a valid multipath device path

 root@pinelp2:~# update-initramfs -u
 update-initramfs: Generating /boot/initrd.img-4.2.0-27-generic

- Not required on pinelp3:

 root@pinelp3:~# multipath -c /dev/sdg
 /dev/sdg is a valid multipath device path

 Already in WWIDs, so this is likely a timing problem in the boot (works some of the time).
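
The WWID step used on pinelp2 can be wrapped so that it is safe to repeat. This is a sketch under assumptions: the helper names wwid_entry_for and add_wwid_once are hypothetical, and the field positions assume the multipath -d -v3 line format shown above ("Feb 05 11:04:14 | sdq: uid = <wwid> (callout)"), which may differ on other multipath-tools versions.

```shell
# Print the wwids-file entry ("/WWID/") for one path device, reading
# `multipath -d -v3` output on stdin. Field numbers match the format
# "Mon DD HH:MM:SS | sdX: uid = WWID (callout)".
wwid_entry_for() {
    dev="$1"
    awk -v dev="$dev" \
        '$5 == dev ":" && $6 == "uid" && $7 == "=" { print "/" $8 "/"; exit }'
}

# Append an entry to the wwids file only if that exact line is not there yet,
# so repeated runs do not create duplicates.
add_wwid_once() {
    entry="$1"
    file="${2:-/etc/multipath/wwids}"
    grep -qxF "$entry" "$file" 2>/dev/null || printf '%s\n' "$entry" >> "$file"
}
```

A possible usage, mirroring the transcript: entry=$(multipath -d -v3 | wwid_entry_for sdq), then add_wwid_once "$entry", followed by update-initramfs -u so the initramfs copy of the file is refreshed.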

Test it on both LPARs.

 root@pinelp2:~# reboot
 <...>

 root@pinelp3:~# reboot
 <...>

The 2 LPARs booted successfully from a multipath device:

 root@pinelp2:~# mount | grep ' / '
 /dev/mapper/mpath3-part2 on / type ext4 (rw,errors=remount-ro)

 root@pinelp3:~# mount | grep ' / '
 /dev/mapper/mpath4-part2 on / type ext4 (rw,errors=remount-ro)
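
The before/after check used in this verification (is / mounted from a single path such as /dev/sdq2, or from a /dev/mapper device?) can be scripted. The sketch below assumes the mount output format shown above; the helper names are hypothetical. Note that a /dev/mapper/ prefix only shows that the root is a device-mapper device, which could also be LVM or dm-crypt, so treat it as a quick check rather than proof of multipath.

```shell
# Print the device mounted on /, reading `mount` output on stdin
# (lines of the form "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)").
root_device() {
    awk '$2 == "on" && $3 == "/" { print $1; exit }'
}

# Succeed (exit 0) if the root device lives under /dev/mapper/, fail otherwise.
root_is_mapper() {
    case "$(root_device)" in
        /dev/mapper/*) return 0 ;;
        *)             return 1 ;;
    esac
}
```

For example, mount | root_is_mapper || echo "root is on a single path" would flag the pre-fix state seen on pinelp2 and pinelp3.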

For details: console log for pinelp2:

 Notice:
 - The scsi disks are added /after/ multipathd is started.
 - There *IS* a wait of some seconds at this point.
   So, the udevadm settle is indeed working.

 Loading, please wait...
 [ 0.750797] systemd-udevd[206]: starting version 204
 Begin: Loading essential drivers ... done.
 Begin: Running /scripts/init-premount ... done.
 Begin: Mounting root file system ... Begin: Running /scr...

tags: added: verification-done
removed: verification-needed
Mathieu Trudel-Lapierre (cyphermox) wrote :

Well, to be honest there isn't anything we can do in multipath-tools or udev if the scans aren't generating events. The best option then is to look at the driver, if it permits some control of how the scans happen, and if not, why.

bugproxy (bugproxy) wrote : sosreport from host nulp1

Default Comment by Bridge

Brian Murray (brian-murray) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted multipath-tools into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu7.9 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: removed: verification-done
tags: added: verification-needed
Mathieu Trudel-Lapierre (cyphermox) wrote :

This was already verified successfully before; the additional update does not need reverification, only checking the regression in bug 1543430.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package multipath-tools - 0.4.9-3ubuntu7.9

---------------
multipath-tools (0.4.9-3ubuntu7.9) trusty; urgency=medium

  * debian/patches/kpartx-support-device-names-with-spaces.patch: fix loopback
    files unmapping. (LP: #1543430)

multipath-tools (0.4.9-3ubuntu7.8) trusty; urgency=medium

  * debian/patches/kpartx-support-device-names-with-spaces.patch: deal with
    spaces in device names in kpartx too (LP: #1432062)
  * debian/initramfs/local-premount: wait for udev to settle before mounting
    so the by-uuid/ symlinks have a chance to be updated by udev rules.
    (LP: #1503286)
  * Allow device detection all through the initramfs: run multipathd instead
    of only scanning once for devices, so those that come up slower can still
    be used as a root device (LP: #1526984):
    - debian/patches/0050-readonly-bindings_prefix.patch,
      debian/patches/0051-readonly-bindings_multipath.patch,
      debian/patches/0052-readonly-bindings_multipathd.patch,
      debian/patches/0053-readonly-bindings_multipathd_prod.patch: support -B
      to allow multipathd to handle cases where the bindings file is read-only.
    - debian/initramfs/hooks: install multipathd and required directories.
    - debian/initramfs/local-premount: also reload all maps to make sure
      they're ready before we mount.
    - debian/initramfs/local-top: run multipathd rather than a one-off call to
      multipath so that new paths can be correctly added as detected while
      we're still in the initramfs.
    - debian/initramfs/local-bottom: remember to stop multipathd.
    - debian/initramfs/local-bottom, debian/rules: install local-bottom for
      initramfs.
  * debian/patches/lp1496210_add_IBM_XIV_defaults.patch: add support (default
    config values) for the IBM 2810XIV storage system. (LP: #1496210)
  * debian/patches/0054-kpartx-update-option.patch: run kpartx -u rather than
    kpartx -a, so as to remove old partition entries if the partition table
    has changed. (LP: #1473903)
  * debian/patches/multipath_enable_sync_support_1b8082c8.patch,
    debian/patches/kpartx_rely_on_udev_dev_creation_9a632fff.patch: synchronize
    udev, device-mapper and multipath, and let udev deal with creating device
    nodes and symlinks. (LP: #1486370)
  * debian/initramfs/local-top: drop scsi_wait_scan stanza, that module is no
    longer available. (LP: #1538775)

 -- Mathieu Trudel-Lapierre <email address hidden> Tue, 09 Feb 2016 16:03:10 -0500

Changed in multipath-tools (Ubuntu Trusty):
status: Fix Committed → Fix Released
Adam Conrad (adconrad) wrote : Update Released

The verification of the Stable Release Update for multipath-tools has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

bugproxy (bugproxy) wrote : dmesg

Default Comment by Bridge
