lvm and multipath and xenial not happy together

Bug #1551937 reported by Scott Moser
This bug affects 4 people
Affects          Status        Importance  Assigned to  Milestone
curtin           Fix Released  High        Unassigned
curtin (Ubuntu)  Fix Released  High        Unassigned
  Trusty         Fix Released  High        Unassigned
  Xenial         Fix Released  High        Unassigned

Bug Description

[Impact]

 * MaaS deployments to systems configured with multipath cannot
   install Xenial releases due to a change in how multipath-tools
   handles its friendly names. On older releases (multipath-tools < 0.5.0)
   multipath-tools expects that the device names in the bindings file
   will include spaces and parses the file with that expectation.
   However, on newer releases (multipath-tools >= 0.5.0)
   multipath-tools uses spaces to separate fields in the bindings file
   and fails if the device name includes spaces.

   Curtin will detect the version of multipath-tools to be used in the
   target OS and adjust how it generates device names for the bindings
   file accordingly.
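
The version check described above can be sketched roughly as follows. This is a minimal illustration, not curtin's actual code: the helper names `upstream_version` and `should_replace_whitespace` are hypothetical, and the only fact taken from this report is the 0.5.0 threshold.

```python
import re


def upstream_version(pkg_version):
    """Return the leading dotted-numeric part of a package version as a
    3-tuple, ignoring any epoch and any Debian/Ubuntu revision suffix."""
    match = re.match(r"(?:\d+:)?(\d+(?:\.\d+)*)", pkg_version)
    if match is None:
        raise ValueError("unrecognised version: %r" % pkg_version)
    parts = tuple(int(part) for part in match.group(1).split("."))
    # Pad short versions such as "0.5" so tuple comparison behaves.
    return parts + (0,) * (3 - len(parts))


def should_replace_whitespace(multipath_tools_version):
    """True when the target's multipath-tools is >= 0.5.0, i.e. when the
    bindings file must not contain spaces inside the wwid (assumption
    taken from the [Impact] text above)."""
    return upstream_version(multipath_tools_version) >= (0, 5, 0)
```

For example, a made-up version string such as "0.4.9-1" would select the old behaviour (spaces kept), while "0.5.0-1" would select the new one (whitespace replaced).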

[Test Case]

 * Install the proposed curtin package and deploy a custom storage
   configuration against a Power8 or similarly configured multipath
   system and select Xenial as the target OS.

  PASS: The multipath configured machine will successfully install
  both Xenial and Trusty.

  FAIL: The multipath configured machine will fail to install Xenial
  but will successfully install Trusty.

[Regression Potential]

 * May impact users of systems with multipath storage configurations.

[Original Description]

tried deploy of xenial with curtin on a powerNV system. the result was failure to mount the root, ending like this:
Begin: Running /scripts/local-block ... lvmetad is not active yet, using direct activation during sysinit
  Volume group "mpath0" not found
  Cannot process volume group mpath0
done.
Begin: Running /scripts/local-block ... lvmetad is not active yet, using direct activation during sysinit
  Volume group "mpath0" not found
  Cannot process volume group mpath0
done.
done.
Gave up waiting for root device. Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/mapper/mpath0-part2 does not exist. Dropping to a shell!

Related bugs:
 * bug 1429327: Boot from a unique, stable, multipath-dependent symlink
 * bug 1432062: multipath-tools-boot: support booting without user_friendly_names on devices with spaces in identifiers
 * bug 1552319: xenial kernel boot slow/timeout on power8 powerNV

$ dpkg-query --show | egrep '(maas|curtin)'
curtin-common 0.1.0~bzr359-0ubuntu1
maas 1.9.1+bzr4541-0ubuntu1~trusty1
maas-cli 1.9.1+bzr4541-0ubuntu1~trusty1
maas-cluster-controller 1.9.1+bzr4541-0ubuntu1~trusty1
maas-common 1.9.1+bzr4541-0ubuntu1~trusty1
maas-dhcp 1.9.1+bzr4541-0ubuntu1~trusty1
maas-dns 1.9.1+bzr4541-0ubuntu1~trusty1
maas-provision 2.2.2-0ubuntu4
maas-provision-common 2.2.2-0ubuntu4
maas-proxy 1.9.1+bzr4541-0ubuntu1~trusty1
maas-region-controller 1.9.1+bzr4541-0ubuntu1~trusty1
maas-region-controller-min 1.9.1+bzr4541-0ubuntu1~trusty1
python-curtin 0.1.0~bzr359-0ubuntu1
python-django-maas 1.9.1+bzr4541-0ubuntu1~trusty1
python-maas-client 1.9.1+bzr4541-0ubuntu1~trusty1
python-maas-provision 2.2.2-0ubuntu4
python-maas-provisioningserver 1.9.1+bzr4541-0ubuntu1~trusty1

Related branches

Revision history for this message
Scott Moser (smoser) wrote :

attaching output of
 maas ubuntu node get-curtin-config node-11c03686-9d7f-11e4-91da-d4bed9a84493

Revision history for this message
Scott Moser (smoser) wrote :

Seems like this might be relevant.

(initramfs) cat /etc/multipath/bindings
# This file was created by curtin while installing the system.
mpath0 1IBM IPR-0 5EC29C0000000080
# End of content generated by curtin.
# Everything below is maintained by multipath subsystem.
mpatha 1IBM_IPR-0_5EC29C0000000080
mpathb 1IBM_IPR-0_5EC29C0000000060
mpathc 1IBM_IPR-0_5EC29C0000000040
mpathd 1IBM_IPR-0_5EC29C0000000020
mpathe 1IBM_IPR-0_5EC29C00000000C0
mpathf 1IBM_IPR-0_5EC29C00000000A0

note the one written by curtin has spaces in it, but the others do not.
I'm pretty sure that when trusty is deployed they will have the spaces.

Scott Moser (smoser)
Changed in curtin:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Scott Moser (smoser) wrote :

related bug https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1432062

also note comment in curthooks

        # Without user_friendly_names option enabled system fails to boot
        # if any of the disks has spaces in its name. Package multipath-tools
        # has bug opened for this issue (LP: 1432062) but it was not fixed yet.

that bug is now reported as fixed in trusty.

Revision history for this message
Scott Moser (smoser) wrote :

so, verified xenial installs with simply this diff:

=== modified file 'curtin/commands/curthooks.py'
--- curtin/commands/curthooks.py 2016-02-22 21:10:40 +0000
+++ curtin/commands/curthooks.py 2016-03-02 01:32:21 +0000
@@ -562,7 +562,7 @@
     if mpbindings or not os.path.isfile(multipath_bind_path):
         # we do assume that get_devices_for_mp()[0] is /
         target_dev = block.get_devices_for_mp(target)[0]
-        wwid = block.get_scsi_wwid(target_dev)
+        wwid = block.get_scsi_wwid(target_dev, replace_whitespace=True)
         blockdev, partno = block.get_blockdev_for_partition(target_dev)

         mpname = "mpath0"

I'll try trusty now.

Revision history for this message
Scott Moser (smoser) wrote :

and, yeah. as expected. trusty fails if i use replace_whitespace=True.

it ends up with
(initramfs) cat /etc/multipath/bindings
# This file was created by curtin while installing the system.
mpath0 1IBM_IPR-0_5EC29C0000000080
# End of content generated by curtin.
# Everything below is maintained by multipath subsystem.

Revision history for this message
Scott Moser (smoser) wrote :

So, the gist of this problem:
1.) Curtin runs in the install environment, which does not have multipath kernel modules or user space utility.
 even if it did, it would not be guaranteed that it was the same version as the target's multipath
2.) curtin generates a /etc/multipath/bindings containing the root device as 'mpath0' (name doesn't matter) and then sets system to boot with 'root=/dev/mapper/mpath0' and enables user_friendly_names so that the binding gets read.
3.) /etc/multipath/bindings is line formatted with space delimited fields. curtin uses '/lib/udev/scsi_id --whitelisted --device=/dev/sda' to read the wwid.
4.) some wwid have spaces in them (this system does), thus scsi_id takes the '--replace-whitespace' flag.
5.) xenial 'multipath -r' renders and expects that /etc/multipath/bindings has whitespace removed (scsi_id with --replace-whitespace)
6.) trusty 'multipath -r' renders and expects that /etc/multipath/bindings has whitespace present (scsi_id without --replace-whitespace)
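
As a rough illustration of points 4 to 6, the two forms of the wwid can be modelled in a few lines of Python. This is only a sketch: `replace_whitespace` approximates what the scsi_id flag appears to do, judging from the bindings files quoted in this report, and both function names are hypothetical rather than curtin's API.

```python
import re


def replace_whitespace(wwid):
    """Strip the ends and turn each run of whitespace into one underscore.

    Assumption: this mirrors the effect of scsi_id's whitespace
    replacement as seen in the bindings files quoted in this bug.
    """
    return re.sub(r"\s+", "_", wwid.strip())


def binding_line(alias, wwid, replace=False):
    """Render one '<alias> <wwid>' line for /etc/multipath/bindings.

    replace=True gives the form xenial expects, replace=False the form
    trusty expects.
    """
    if replace:
        wwid = replace_whitespace(wwid)
    return "%s %s" % (alias, wwid)
```

With the wwid from this system, binding_line("mpath0", "1IBM     IPR-0   5EC29C0000000080", replace=True) yields the underscore form, and the same call with replace=False leaves the spaces in place.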

Two last things
a.) one thing that might have to be fixed in curtin is that we only generate for the root device and let multipath handle the others at first boot. We've solved bug 1429327 for the root device, but do not do that for others. We can solve this by using /dev/mapper/mpath-X in /etc/fstab for each volume we reference.
b.) it seems that 5 and 6 above are an upgrade problem unless something is going to translate that file on upgrade from trusty to xenial.

description: updated
Revision history for this message
Scott Moser (smoser) wrote :

For reference, here is the /etc/multipath/bindings rendered by multipath -r in:
xenial:
  mpath0 1IBM_IPR-0_5EC29C0000000080
  mpath1 1IBM_IPR-0_5EC29C0000000060
  mpath2 1IBM_IPR-0_5EC29C0000000040
  mpath3 1IBM_IPR-0_5EC29C0000000020
  mpath4 1IBM_IPR-0_5EC29C00000000C0
  mpath5 1IBM_IPR-0_5EC29C00000000A0

trusty:
  mpath0 1IBM IPR-0 5EC29C0000000080
  mpath1 1IBM IPR-0 5EC29C0000000060
  mpath2 1IBM IPR-0 5EC29C0000000040
  mpath3 1IBM IPR-0 5EC29C0000000020
  mpath4 1IBM IPR-0 5EC29C00000000C0
  mpath5 1IBM IPR-0 5EC29C00000000A0

For clarity, on trusty:
$ sed 's, ,.,g' /etc/multipath/bindings
mpath0.1IBM.....IPR-0...5EC29C0000000080
mpath1.1IBM.....IPR-0...5EC29C0000000060
mpath2.1IBM.....IPR-0...5EC29C0000000040
mpath3.1IBM.....IPR-0...5EC29C0000000020
mpath4.1IBM.....IPR-0...5EC29C00000000C0
mpath5.1IBM.....IPR-0...5EC29C00000000A0

I.e., there are multiple spaces (not tabs) in these wwid names.
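
To make the consequence concrete, here is a small sketch of two ways a reader of the bindings file could split a line. Neither function is taken from multipath-tools; they are hypothetical stand-ins for "wwid may contain spaces" versus "every space separates fields".

```python
def split_binding(line):
    """Split at the first run of whitespace only: (alias, wwid).

    Any spaces inside the wwid are preserved.
    """
    alias, wwid = line.strip().split(None, 1)
    return alias, wwid


def split_binding_strict(line):
    """Treat every run of whitespace as a separator; require two fields."""
    fields = line.split()
    if len(fields) != 2:
        raise ValueError("expected 2 fields, got %d: %r" % (len(fields), line))
    return fields[0], fields[1]
```

The strict variant accepts the xenial-style line but raises ValueError on the trusty-style line, because the wwid with embedded spaces splits into three fields after the alias.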

Revision history for this message
Scott Moser (smoser) wrote :

cyphermox points out that my 'b' is fixed in upgrade (postinst) of multipath-tools

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Hi @smoser,

On the LVM/multipath topic, at boot time (the issues here seem to be only at install time) we've recently introduced a fix for Xenial, which may make it into Trusty, in LP #1540401 -- just in case you hit something similar in other tests.

Revision history for this message
Scott Moser (smoser) wrote :

The attached branch (https://launchpad.net/bugs/1551937) does work for me, allowing me to install the system using maas 1.9 with release of trusty or xenial.

Revision history for this message
Scott Moser (smoser) wrote :

So that attached branch gets you to be able to deploy xenial or trusty (i did not test wily).
xenial then shows bug 1552319 on the hardware I have.

description: updated
Ryan Harper (raharper)
tags: added: curtin-sru
Changed in curtin:
status: Confirmed → In Progress
Ryan Harper (raharper)
Changed in curtin:
status: In Progress → Fix Committed
Ryan Harper (raharper)
description: updated
Revision history for this message
Martin Pitt (pitti) wrote : Please test proposed package

Hello Scott, or anyone else affected,

Accepted curtin into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr399-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in curtin (Ubuntu):
status: New → Fix Released
Changed in curtin (Ubuntu Xenial):
status: New → Fix Committed
tags: added: verification-needed
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Scott, or anyone else affected,

Accepted curtin into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr399-0ubuntu1~14.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in curtin (Ubuntu Trusty):
status: New → Fix Committed
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Verified this and works just fine. Marking verification-done.

Mathew Hodson (mhodson)
Changed in curtin (Ubuntu):
importance: Undecided → High
Changed in curtin (Ubuntu Trusty):
importance: Undecided → High
Changed in curtin (Ubuntu Xenial):
importance: Undecided → High
Revision history for this message
Felipe Reyes (freyes) wrote :

Andres, it seems you forgot to change the tag to verification-done.

tags: added: verification-done
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr399-0ubuntu1~16.04.1

---------------
curtin (0.1.0~bzr399-0ubuntu1~16.04.1) xenial-proposed; urgency=medium

  * debian/new-upstream-snapshot: fix for specifying revision.
  * SRU current curtin
    - curtin/net: fix inet value for subnets, don't add interface attributes
      to alias (LP: #1588547)
    - improve net-meta network configuration (LP: #1592149)
    - reporting: set webhook handler level to DEBUG, no filtering
      (LP: #1590846)
    - tests/vmtests: add yakkety, remove vivid
    - curtin/net: use post-up for interface alias, resolve 120 second time out
      on Trusty when using interface aliases
    - vmtest: provide info on images used
    - fix multipath configuration and add multipath tests (LP: #1551937)
    - tools/launch and tools/xkvm: whitespace cleanup and bash -x
    - tools/launch: boot by root=LABEL=cloudimg-rootfs
    - Initial vmtest power8 support and TestSimple test.

 -- Ryan Harper <email address hidden> Tue, 12 Jul 2016 11:29:30 -0500

Changed in curtin (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for curtin has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr399-0ubuntu1~14.04.1

---------------
curtin (0.1.0~bzr399-0ubuntu1~14.04.1) trusty-proposed; urgency=medium

  * SRU current curtin
    - curtin/net: fix inet value for subnets, don't add interface attributes
      to alias (LP: #1588547)
    - improve net-meta network configuration (LP: #1592149)
    - reporting: set webhook handler level to DEBUG, no filtering
      (LP: #1590846)
    - tests/vmtests: add yakkety, remove vivid
    - curtin/net: use post-up for interface alias, resolve 120 second time out
      on Trusty when using interface aliases
    - vmtest: provide info on images used
    - fix multipath configuration and add multipath tests (LP: #1551937)
    - tools/launch and tools/xkvm: whitespace cleanup and bash -x
    - tools/launch: boot by root=LABEL=cloudimg-rootfs
    - Initial vmtest power8 support and TestSimple test.

curtin (0.1.0~bzr389-0ubuntu1~14.04.1) trusty-proposed; urgency=medium

  * New upstream snapshot.
    * Detect and remove legacy /etc/network/interfaces.d/eth0.cfg from
      target (LP: #1577872)

curtin (0.1.0~bzr387-0ubuntu1~14.04.1) trusty-proposed; urgency=medium

  * sru current curtin (LP: #1577872)
  * debian/new-upstream-snapshot, debian/README.source: add
    new-upstream-snapshot and mention it in README.source
  * debian/control: drop python from curtin-common Depends.
     remove unnecessary Depends on util-linux as it is essential.
     python3-curtin, python-curtin: drop unnecessary 'curl' from Depends.
     python3-curtin, python-curtin: list oauthlib and yaml Depends
  * debian/control: add bcache-tools to curtin Depends.
  * New upstream snapshot.
    - fix timestamp not being updated in reported events
    - mdadm: resolve mdadm/bcache and trusty+hwe issues
    - fix support for 4k disks
    - emit source /etc/network/interfaces.d/*.cfg in
      rendered /etc/network/interfaces
    - net: introduce 'control' field to network configuration to allow
      for declaring manual controlled interfaces
    - disable cloud-init networking as curtin is the source of network config
    - block: wipe_volume improvements
    - reporter: enhance reporting events to include levels and
      improve usefulness of messages
    - network: add bonding tests and cleanup newline rendering
    - block: fix partition path issue with nvme devices
    - fix logic error in kernel installation
    - block: add debug regarding raid modules being missing on mdadm create
    - add s390x support to curtin and vmtest
    - support build on xenial where python3 pyflakes is split out
    - fix uefi install path on nvme devices
    - numerous unit tests and vmtests improvements. Add running
      of pylint for static checking.
    - Add bond parsing & improved source, source-directory parsing
      of /etc/network/interfaces.
    - move global dns-* options under auto lo in /etc/network/interfaces
    - partitioning: limited support for odd ordering of partition numbers
    - change use of mkfs.fat to mkfs.vfat and add dependency.
    - block-meta: use removable devices if no non-removable devices are
      found [Robert Clark]
    - Improve 'curtin mkfs' and move mkfs...

Changed in curtin (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Scott Moser (smoser) wrote : Fixed in Curtin 17.1

This bug is believed to be fixed in curtin in 17.1. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released