ISST-LTE: Boot of Ubuntu15.10 lpar fails: "mounting /dev/sdn2 on /root failed: Device or resource busy" [multipath]

Bug #1503286 reported by bugproxy on 2015-10-06
This bug affects 1 person
Affects                     Importance   Assigned to
multipath-tools (Ubuntu)    Critical     Steve Langasek
  Trusty                    Critical     Mathieu Trudel-Lapierre
  Vivid                     Critical     Mathieu Trudel-Lapierre
  Wily                      Critical     Steve Langasek

Bug Description

[Impact]
Systems with disks that have long spin-up times or otherwise take a while to be detected may be affected by a failure to boot due to the drives underlying multipath devices not being available.

[Test case]
This issue is difficult to reproduce.
 - Boot a system with the boot device on multipath.
This may be limited to POWER LPARs. See description below.

[Regression Potential]
The fix introduces a new initramfs script that waits for udev to settle, with a timeout of about two minutes (121 seconds). Users may therefore notice a boot delay of up to about two minutes if devices take that long or longer to be brought up or detected by udev.

---

== Comment: #0 - Manjunatha H R <email address hidden> - 2015-09-25 11:05:36 ==
Booting an Ubuntu 15.10 LPAR fails and control drops to the initramfs shell.

uname -a
--------------
Linux (none) 4.2.0-10-generic #12-Ubuntu SMP Tue Sep 15 19:46:04 UTC 2015 ppc64le GNU/Linux

Boot log:
-------------
  Booting a command list

Loading Linux 4.2.0-10-generic ...
Loading initial ramdisk ...
OF stdout device is: /vdevice/vty@30000000
Preparing to boot Linux version 4.2.0-10-generic (buildd@fisher04) (gcc version 5.2.1 20150911 (Ubuntu 5.2.1-17ubuntu4) ) #12-Ubuntu SMP Tue Sep 15 19:46:04 UTC 2015 (Ubuntu 4.2.0-10.12-generic 4.2.0)
Detected machine type: 0000000000000101
Max number of cores passed to firmware: 256 (NR_CPUS = 2048)
Calling ibm,client-architecture-support... done
command line: BOOT_IMAGE=/boot/vmlinux-4.2.0-10-generic root=UUID=822dd709-5b69-45a9-aba5-63cb55768ffb ro splash quiet topology_updates=off
memory layout at init:
  memory_limit : 0000000000000000 (16 MB aligned)
  alloc_bottom : 000000000bf80000
  alloc_top : 0000000010000000
  alloc_top_hi : 0000000010000000
  rmo_top : 0000000010000000
  ram_top : 0000000010000000
found display : /pci@80000002000002c/display@0, opening... done
instantiating rtas at 0x000000000eb60000... done
prom_hold_cpus: skipped
copying OF device tree...
Building dt strings...
Building dt structure...
Device tree strings 0x000000000bf90000 -> 0x000000000bf91965
Device tree struct 0x000000000bfa0000 -> 0x000000000bfe0000
Quiescing Open Firmware ...
Booting Linux via __start() ...
 -> smp_release_cpus()
spinning_secondaries = 199
 <- smp_release_cpus()
 <- setup_system()
[ 2.868103] [drm:radeon_device_init [radeon]] *ERROR* Unable to find PCI I/O BAR
[ 3.074553] [drm:radeon_atombios_init [radeon]] *ERROR* Unable to find PCI I/O BAR; using MMIO for ATOM IIO
[ 5.060785] lpfc 0002:90:00.0: 0:1303 Link Up Event x1 received Data: x1 x0 x80 x0 x0 x0 0
Scanning for Btrfs filesystems
fsck from util-linux 2.26.2
/dev/sdn2 is in use.
e2fsck: Cannot continue, aborting.

fsck exited with status code 8
[ 36.233086] rport-0:0-9: blocked FC remote port time out: removing rport
mount: mounting /dev/sdn2 on /root failed: Device or resource busy
Target filesystem doesn't have requested /sbin/init.
mount: mounting /dev on /root/dev failed: No such file or directory
No init found. Try passing init= bootarg.

BusyBox v1.22.1 (Ubuntu 1:1.22.0-15ubuntu1) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)
-------------------------

This LPAR has multipath disks, and the boot disk is on a multipath device.
Boot succeeds only when fsck checks the boot disk via a device-mapper node: /dev/dm-N or /dev/mapper/mpathX.

Boot Pass scenarios:
----------------------------
1. Boot passed when fsck tried scanning "/dev/mapper/mpathb"
fsck from util-linux 2.26.2
/dev/mapper/mpathb-part2: clean, 81802/3139584 files, 1040598/12558080 blocks

2. Boot passed when fsck tried scanning "/dev/dm-3"
Scanning for Btrfs filesystems
fsck from util-linux 2.26.2
/dev/dm-3: clean, 81802/3139584 files, 1040605/12558080 blocks

Boot fails whenever fsck is called on an individual path (/dev/sdX).

Boot fail scenario: Boot failed when fsck was called on "/dev/sdn"
-------------------------
Scanning for Btrfs filesystems
fsck from util-linux 2.26.2
/dev/sdn2 is in use.
e2fsck: Cannot continue, aborting.

fsck exited with status code 8
[ 36.108653] rport-0:0-9: blocked FC remote port time out: removing rport
mount: mounting /dev/sdn2 on /root failed: Device or resource busy
Target filesystem doesn't have requested /sbin/init.

mount: mounting /dev on /root/dev failed: No such file or directory
No init found. Try passing init= bootarg.

BusyBox v1.22.1 (Ubuntu 1:1.22.0-15ubuntu1) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)
-------------------------

Contact info:
----------------
Manju (<email address hidden>) A.P. (<email address hidden>)

== Comment: #12 - Mauricio Faria De Oliveira <email address hidden> - 2015-10-02 13:51:29 ==
Hi Manju and Alton,

I could not reproduce this bug in 2 attempts.
The LPAR booted successfully, using the root=UUID= parameter.

By looking at this message from the description:
> mount: mounting /dev/sdn2 on /root failed: Device or resource busy

This most likely happened because the multipath udev rules failed to update the /dev/disk/by-uuid/<uuid> symlink from /dev/sdn to /dev/dm-X, although multipathing the path itself was successful (so the path became locked/in use).

If you can reproduce it again, please leave the LPAR in the failing state (in the initramfs), reopen this bug and ping me.
I'd be happy to debug it.

Thanks!

== Comment: #15 - Mauricio Faria De Oliveira <email address hidden> - 2015-10-05 19:59:56 ==
This is probably a race between the resolve_device() call in mountroot() and the multipath discovery triggered by udev rules.

If resolve_device() runs before the root device is multipathed, $ROOT is set to an individual path (e.g., /dev/sdf2) rather than its multipath device (e.g., /dev/mapper/mpathb-part2), because the /dev/disk/by-uuid/<UUID> symlink is not updated yet.
The multipath discovery finishes after $ROOT is set, so the individual path becomes locked, and afterwards the root mount will be attempted on it -- this fails.
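
One way to confirm this state from a shell is to look at the sysfs "holders" directory of the disk behind $ROOT: once a path has been claimed by a multipath map, a dm-N entry appears there. The following is a minimal sketch, not part of any package; the function name dm_holders and the SYSFS_ROOT override are made up for illustration.

```shell
#!/bin/sh
# Sketch: list device-mapper devices holding a given SCSI disk.
# SYSFS_ROOT is a hypothetical override so the function can be tried
# against a fake directory tree; on a real system it stays at /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

dm_holders() {
    # $1: kernel name of a disk or partition, e.g. sdf or sdf2
    dev="$1"
    for disk in "$SYSFS_ROOT"/block/*; do
        [ -d "$disk" ] || continue
        # Match the disk itself, or the disk that contains the partition.
        if [ "${disk##*/}" = "$dev" ] || [ -d "$disk/$dev" ]; then
            for holder in "$disk"/holders/dm-*; do
                # An unmatched glob stays literal, so test for existence.
                if [ -e "$holder" ]; then
                    echo "${holder##*/}"
                fi
            done
        fi
    done
}
```

A non-empty result for the device named in $ROOT (for example "dm_holders sdf2") would suggest that the path is already owned by a map, so fsck and mount on it are expected to report it as busy.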

The LPAR is now patched w/ a test fix that is supposed to ensure resolve_device() only starts after udev rules are finished.

Can you try to recreate the issue, please? Thanks!

== Comment: #16 - Mauricio Faria De Oliveira <email address hidden> - 2015-10-05 20:05:37 ==
Console messages

 (!) Note: the local-premount messages (Running ... & done.) occur around SCSI device scan/discovery time.

 ...
 Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Loading multipath modules ... [ 5.113397] device-mapper: multipath: version 1.9.0 loaded
 Success: loaded module dm-multipath.
 Failure: failed to load module dm-emc.
 done.
 Begin: Discovering multipaths ... done.
 done.
 Begin: Running /scripts/local-premount ... [ 5.187071] scsi 0:0:0:0: Direct-Access IBM 2107900 .850 PQ: 0 ANSI: 5
 [ 5.195814] sd 0:0:0:0: Attached scsi generic sg0 type 0
 [ 5.203289] sd 0:0:0:0: [sda] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
 ...
 [ 5.878046] sd 0:0:3:3: [sdp] Attached SCSI disk
 ...
 <10-20 SCSI disks via FC>
 ...
 [ 5.923577] device-mapper: multipath round-robin: version 1.0.0 loaded
 ...
 done.
 ...

If resolve_device() runs before the multipath udev rules
(the rules multipath the root device and update the /dev/disk/by-uuid symlink of $ROOT)
this happens:

 Begin: Checking root file system ... fsck from util-linux 2.26.2
 /dev/sdf2 is in use.
 e2fsck: Cannot continue, aborting.

 fsck exited with status code 8
 done.
 Warning: File system check failed but did not detect errors

 mount: mounting /dev/sdf2 on /root failed: Device or resource busy
 done.
 Target filesystem doesn't have requested /sbin/init.
 Begin: Running /scripts/local-bottom ... done.
 Begin: Running /scripts/init-bottom ...
 ...
 mount: mounting /dev on /root/dev failed: No such file or directory
 done.
 No init found. Try passing init= bootarg.
 ...
 (initramfs)

So, /dev/sdf2 is in use ... and hits Device or resource busy.
This comes from $ROOT.
However, the root=UUID= symlink points to the multipath device:

 (initramfs) echo $ROOT
 /dev/sdf2

 (initramfs) cat /proc/cmdline
 BOOT_IMAGE=/boot/vmlinux-4.2.0-12-generic root=UUID=44bd8a6e-8613-431a-9335-879d8cf5d0e4 ro

 (initramfs) ls -l /dev/disk/by-uuid/44bd8a6e-8613-431a-9335-879d8cf5d0e4
 lrwxrwxrwx 1 11 /dev/disk/by-uuid/44bd8a6e-8613-431a-9335-879d8cf5d0e4 -> ../../dm-19

 (initramfs) ls -l /dev/sdf
 brw------- 1 8, 80 /dev/sdf

 (initramfs) dmsetup table | grep 8:80
 mpathb: 0 104857600 multipath 0 0 1 1 round-robin 0 4 1 8:80 1 8:16 1 8:144 1 8:208 1
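
The 8:80 in the grep above is the major:minor pair that ls -l printed for /dev/sdf, and the other pairs in the mpathb table line can be mapped back to sd names the same way. As a rough sketch, assuming the conventional Linux numbering for SCSI disk major 8 (16 minors per disk, minor 0 of each group being the whole disk), a hypothetical helper could look like this:

```shell
#!/bin/sh
# Sketch: translate a minor number under SCSI disk major 8 into an sd name.
# sd_name is a hypothetical helper, not an existing tool.
sd_name() {
    minor="$1"
    # Major 8 only covers minors 0..255, i.e. sda through sdp.
    if [ "$minor" -lt 0 ] || [ "$minor" -gt 255 ]; then
        echo "minor out of range for major 8: $minor" >&2
        return 1
    fi
    index=$((minor / 16))   # which disk (0 = sda)
    part=$((minor % 16))    # 0 = whole disk, otherwise partition number
    letter=$(printf 'abcdefghijklmnop' | cut -c "$((index + 1))")
    if [ "$part" -eq 0 ]; then
        echo "sd$letter"
    else
        echo "sd$letter$part"
    fi
}

# The four paths of mpathb from the dmsetup output:
# sd_name 80    # prints sdf
# sd_name 16    # prints sdb
# sd_name 144   # prints sdj
# sd_name 208   # prints sdn
```

Under that assumption, /dev/sdf is one of the four paths of mpathb, which is consistent with /dev/sdf2 being reported as in use.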

It's probably because resolve_device() was racing w/ the multipath discoveries from udev rules.
resolve_device() finished before the /dev/disk/by-uuid/ symlink was updated by multipath discovery.
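
The manual comparison above (echo $ROOT versus ls -l on the by-uuid symlink) can be wrapped in a small check. This is only a sketch for use in a debugging session; check_root_resolution is a made-up name, and it assumes readlink -f is available, as it is in BusyBox and GNU coreutils.

```shell
#!/bin/sh
# Sketch: compare the device stored in $ROOT with the current target of
# the /dev/disk/by-uuid/<UUID> symlink.
# check_root_resolution is a hypothetical helper name.
check_root_resolution() {
    root_dev="$1"   # value of $ROOT, e.g. /dev/sdf2
    link="$2"       # e.g. /dev/disk/by-uuid/<UUID>
    want=$(readlink -f "$link") || return 2
    have=$(readlink -f "$root_dev") || return 2
    if [ "$have" = "$want" ]; then
        echo "ok: $root_dev matches $link"
    else
        echo "stale: $root_dev was resolved earlier; $link now points to $want"
        return 1
    fi
}
```

With the values shown above, a call such as check_root_resolution /dev/sdf2 on the by-uuid link would be expected to take the "stale" branch, because the link now resolves to /dev/dm-19.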

Code:

initramfs :: /init

 log_begin_msg "Mounting root file system"
 ...
 mountroot
 log_end_msg

 (!) Note: message "Mounting root fs" and call to mountroot()

initramfs :: /scripts/local

 mountroot()
 {
  local_mount_root
 }

 local_mount_root()
 {
  ...
  local_premount

  ROOT=$(resolve_device "$ROOT")
  ...
 }

 (!) Note: local_premount() is the last call before resolve_device() is called

 local_premount()
 {
  ...
   [ "$quiet" != "y" ] && log_begin_msg "Running /scripts/local-premount"
   run_scripts /scripts/local-premount
   [ "$quiet" != "y" ] && log_end_msg
  ...
 }

So we're testing a call to 'udevadm settle' in the /scripts/local-premount/multipath script.

== Comment: #17 - Mauricio Faria De Oliveira <email address hidden> - 2015-10-05 20:08:26 ==

== Comment: #18 - Manjunatha H R <email address hidden> - 2015-10-06 06:05:30 ==
Thank you Mauricio for a quick fix!!
Lpar is booting up properly without seeing device or resource busy errors.

== Comment: #20 - Mauricio Faria De Oliveira <email address hidden> - 2015-10-06 09:04:50 ==
Confirmed w/ Manju the # of tests.

10:00:41 AM: Manjunatha H R: Hi Mauricio, I tried around 10 boots..
10:01:07 AM: Manjunatha H R: all times it booted up..

Sounds good.

I'll be sending a patch/mirroring.

bugproxy (bugproxy) wrote : dmesg

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-131064 severity-critical targetmilestone-inin1510

Gary Gaydos (gmgaydos) on 2015-10-06
affects: ubuntu → initramfs-tools (Ubuntu)

This is a patch for multipath-tools-boot to install a /scripts/local-premount/multipath script (based on the btrfs script).
It just runs "udevadm settle --timeout 121" (timeout value taken from udev initramfs scripts).

This is supposed to ensure that by the time local_mount_root() calls resolve_device(), the multipath udev rules are finished, thus the /dev/disk/by-uuid/ symlinks are updated/pointing to multipath devices, and $ROOT will be set to a multipath device (vs. an individual path).
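
For readers who want to try the idea in isolation, the wait step can be sketched as a function. This is not the packaged script: the UDEVADM and SETTLE_TIMEOUT variables are hypothetical knobs added so the function can be exercised outside an initramfs, and a real initramfs-tools script would additionally carry the usual prereqs stanza and source /scripts/functions for logging.

```shell
#!/bin/sh
# Sketch: the "wait for udev" step of a local-premount script.
# UDEVADM and SETTLE_TIMEOUT are hypothetical overrides for illustration.
UDEVADM="${UDEVADM:-udevadm}"
SETTLE_TIMEOUT="${SETTLE_TIMEOUT:-121}"

wait_for_multipath_udev() {
    # Block until the udev event queue is empty or the timeout expires.
    # A timeout or error is deliberately ignored so that boot continues
    # and the later mount attempt decides success or failure.
    "$UDEVADM" settle --timeout="$SETTLE_TIMEOUT" || true
}
```

Ignoring the exit status mirrors the trade-off described in the regression potential: the worst case is a delay of roughly the timeout, not a new boot failure introduced by the wait itself.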

affects: initramfs-tools (Ubuntu) → multipath-tools (Ubuntu)

Verifying the initramfs scripts in the binary package:

$ dpkg-deb -c ../multipath-tools-boot_0.5.0-7ubuntu6premountsettle1_all.deb | grep scripts
drwxr-xr-x root/root 0 2015-10-06 16:08 ./usr/share/initramfs-tools/scripts/
drwxr-xr-x root/root 0 2015-10-06 16:08 ./usr/share/initramfs-tools/scripts/local-top/
-rwxr-xr-x root/root 1272 2015-10-06 16:08 ./usr/share/initramfs-tools/scripts/local-top/multipath
drwxr-xr-x root/root 0 2015-10-06 16:08 ./usr/share/initramfs-tools/scripts/local-premount/
-rwxr-xr-x root/root 335 2015-10-06 16:08 ./usr/share/initramfs-tools/scripts/local-premount/multipath
drwxr-xr-x root/root 0 2015-10-06 16:08 ./usr/share/initramfs-tools/scripts/init-top/
-rwxr-xr-x root/root 624 2015-10-06 16:08 ./usr/share/initramfs-tools/scripts/init-top/multipath
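
The same kind of check can be repeated on a generated initramfs image after installing the package. The filter below is a sketch with a made-up function name; it assumes the listing has one path per line, which is how lsinitramfs (from initramfs-tools) and dpkg-deb -c present their output.

```shell
#!/bin/sh
# Sketch: keep only the initramfs multipath script paths from a file
# listing read on stdin. list_multipath_scripts is a hypothetical name.
list_multipath_scripts() {
    # Matches .../scripts/<stage>/multipath at the end of a line,
    # e.g. scripts/local-premount/multipath; directories are skipped.
    grep -E '(^|/)scripts/[^/]+/multipath$'
}

# Possible usage on an installed system (requires lsinitramfs):
#   lsinitramfs "/boot/initrd.img-$(uname -r)" | list_multipath_scripts
```

If local-premount/multipath is missing from the result, the image was probably built before the updated package was installed and may need to be regenerated.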

The attachment "multipath-tools_premount-udev-settle.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Changed in multipath-tools (Ubuntu):
assignee: nobody → Taco Screen team (taco-screen-team)

------- Comment From <email address hidden> 2015-10-12 20:37 EDT-------
This patch did not make it into today's build.
I tried to reinstall pearlp5 and still hit the initramfs issue.

This is blocking debug of 130680

Thierry FAUCK (thierry-j) wrote :

Patch has been provided

Changed in multipath-tools (Ubuntu):
status: New → In Progress
Steve Langasek (vorlon) on 2015-10-20
Changed in multipath-tools (Ubuntu Vivid):
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
status: New → Triaged
Changed in multipath-tools (Ubuntu Wily):
assignee: Taco Screen team (taco-screen-team) → Steve Langasek (vorlon)
Changed in multipath-tools (Ubuntu Vivid):
importance: Undecided → Critical
Changed in multipath-tools (Ubuntu Trusty):
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package multipath-tools - 0.5.0-7ubuntu7

---------------
multipath-tools (0.5.0-7ubuntu7) wily; urgency=medium

  [ Mauricio Faria de Oliveira ]
  * debian/initramfs/local-premount: wait for udev to settle before the call
    to resolve_device() in local_mount_root(), so the by-uuid/ symlinks have
    a chance to be updated by the multipath udev rules (LP: #1503286).

 -- Steve Langasek <email address hidden> Mon, 19 Oct 2015 22:56:58 -0700

Changed in multipath-tools (Ubuntu Wily):
status: In Progress → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2015-10-27 09:53 EDT-------
Issue is fixed with new build, which has multipath-tools - 0.5.0-7ubuntu7:

root@pearlp5:~# uname -r
4.2.0-16-generic
root@pearlp5:~# dpkg -l |grep -i multipath-tools
ii multipath-tools 0.5.0-7ubuntu7 ppc64el maintain multipath block device access
ii multipath-tools-boot 0.5.0-7ubuntu7 all Support booting from multipath devices

Thanks,
Manju

Changed in multipath-tools (Ubuntu):
importance: Undecided → Critical
Changed in multipath-tools (Ubuntu Wily):
importance: Undecided → Critical
Changed in multipath-tools (Ubuntu Trusty):
status: Triaged → In Progress
description: updated

Hello bugproxy, or anyone else affected,

Accepted multipath-tools into vivid-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu12.15.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in multipath-tools (Ubuntu Vivid):
status: Triaged → Fix Committed
tags: added: verification-needed
Brian Murray (brian-murray) wrote :

Hello bugproxy, or anyone else affected,

Accepted multipath-tools into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu7.8 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in multipath-tools (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-done
removed: verification-needed

Verified on 14.04.
Marking verification-done.

As mentioned in the description test-case, the issue is hard to reproduce.
Trying to force it to happen, I modified the updated package local-premount/multipath
script to remove/rescan the SCSI devices in the background right before the
udevadm settle command, and could not reproduce the failure.
The system booted successfully.

More assurance is given since the patch is the same as that verified by the tester on the original environment (comment #12).

 . /scripts/functions

+multipath -F
+
+for sd_delete in /sys/block/sd*/device/delete; do
+ echo 1 > $sd_delete
+done
+
+for host_scan in /sys/class/scsi_host/host*/scan; do
+ echo '- - -' > $host_scan
+done &
+
 if [ -x /sbin/multipathd ]
 then
         [ "$quiet" != "y" ] && log_begin_msg "Waiting for udev to settle (multipath)"
         udevadm settle --timeout=121 || true
         [ "$quiet" != "y" ] && log_end_msg
 fi

 exit 0

------- Comment From <email address hidden> 2016-02-05 13:35 EDT-------
More evidence on this one.
This wasn't hit with the testing on 2 LPARs for LP #1526984.

Hello bugproxy, or anyone else affected,

Accepted multipath-tools into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/multipath-tools/0.4.9-3ubuntu7.9 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: removed: verification-done
tags: added: verification-needed

This was already verified successfully before; the additional update does not need reverification, only checking the regression in bug 1543430.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package multipath-tools - 0.4.9-3ubuntu7.9

---------------
multipath-tools (0.4.9-3ubuntu7.9) trusty; urgency=medium

  * debian/patches/kpartx-support-device-names-with-spaces.patch: fix loopback
    files unmapping. (LP: #1543430)

multipath-tools (0.4.9-3ubuntu7.8) trusty; urgency=medium

  * debian/patches/kpartx-support-device-names-with-spaces.patch: deal with
    spaces in device names in kpartx too (LP: #1432062)
  * debian/initramfs/local-premount: wait for udev to settle before mounting
    so the by-uuid/ symlinks have a chance to be updated by udev rules.
    (LP: #1503286)
  * Allow device detection all through the initramfs: run multipathd instead
    of only scanning once for devices, so those that come up slower can still
    be used as a root device (LP: #1526984):
    - debian/patches/0050-readonly-bindings_prefix.patch,
      debian/patches/0051-readonly-bindings_multipath.patch,
      debian/patches/0052-readonly-bindings_multipathd.patch,
      debian/patches/0053-readonly-bindings_multipathd_prod.patch: support -B
      to allow multipathd to handle cases where the bindings file is read-only.
    - debian/initramfs/hooks: install multipathd and required directories.
    - debian/initramfs/local-premount: also reload all maps to make sure
      they're ready before we mount.
    - debian/initramfs/local-top: run multipathd rather than a one-off call to
      multipath so that new paths can be correctly added as detected while
      we're still in the initramfs.
    - debian/initramfs/local-bottom: remember to stop multipathd.
    - debian/initramfs/local-bottom, debian/rules: install local-bottom for
      initramfs.
  * debian/patches/lp1496210_add_IBM_XIV_defaults.patch: add support (default
    config values) for the IBM 2810XIV storage system. (LP: #1496210)
  * debian/patches/0054-kpartx-update-option.patch: run kpartx -u rather than
    kpartx -a, so as to remove old partition entries if the partition table
    has changed. (LP: #1473903)
  * debian/patches/multipath_enable_sync_support_1b8082c8.patch,
    debian/patches/kpartx_rely_on_udev_dev_creation_9a632fff.patch: synchronize
    udev, device-mapper and multipath, and let udev deal with creating device
    nodes and symlinks. (LP: #1486370)
  * debian/initramfs/local-top: drop scsi_wait_scan stanza, that module is no
    longer available. (LP: #1538775)

 -- Mathieu Trudel-Lapierre <email address hidden> Tue, 09 Feb 2016 16:03:10 -0500

Changed in multipath-tools (Ubuntu Trusty):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for multipath-tools has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
