lvcreate fails with duplicate paths to physical volume

Bug #1869075 reported by Frank Heimes on 2020-03-25
This bug affects 1 person
Affects                   Status   Importance   Assigned to             Milestone
Ubuntu on IBM z Systems            High         Canonical Server Team
curtin                             Critical     Ryan Harper
subiquity                          Undecided    Unassigned

Bug Description

A subiquity installation was done on an s390x LPAR using a zFCP disk on which a LUKS installation had previously been done. When I then did an LVM installation (this time without LUKS), I hit this error:

 curtin: Installation failed with exception: Unexpected error while running command.
 Command: ['curtin', 'block-meta', 'simple']
 Exit code: 3
 Reason: -
 Stdout: start: cmd-install/stage-partitioning/builtin/cmd-block-meta: curtin command block-meta
         get_path_to_storage_volume for volume disk-sda
         Processing serial 0x6005076306ffd6b60000000000002603 via udev to 0x6005076306ffd6b60000000000002603
         lookup_disks found: ['wwn-0x6005076306ffd6b60000000000002603', 'wwn-0x6005076306ffd6b60000000000002603-part1', 'wwn-0x6005076306ffd6b60000000000002603-part2']
         lookup_disks realpath(wwn-0x6005076306ffd6b60000000000002603)=/dev/sda
         Running command ['udevadm', 'info', '--query=property', '--export', '/dev/sda'] with allowed return codes [0] (capture=True)
         Processing serial 36005076306ffd6b60000000000002603 via udev to 36005076306ffd6b60000000000002603
         lookup_disks found: ['scsi-36005076306ffd6b60000000000002603', 'scsi-36005076306ffd6b60000000000002603-part1', 'scsi-36005076306ffd6b60000000000002603-part2']
         lookup_disks realpath(scsi-36005076306ffd6b60000000000002603)=/dev/sda
         Running command ['udevadm', 'info', '--query=property', '--export', '/d

I've attached the full steps of the installation (some of them are workarounds for a VLAN network issue).

And I'll also attach the full /var/log and /var/crash.

Frank Heimes (fheimes) wrote :
Frank Heimes (fheimes) wrote :
Changed in ubuntu-z-systems:
importance: Undecided → High
assignee: nobody → Canonical Server Team (canonical-server)
Ryan Harper (raharper) wrote :

We create the VG on /dev/sda2:

Running command ['vgcreate', '--force', '--zero=y', '--yes', 'ubuntu-vg', '/dev/sda2'] with allowed return codes [0] (capture=True)
 Running command ['pvscan'] with allowed return codes [0] (capture=True)
 Running command ['vgscan', '--mknodes'] with allowed return codes [0] (capture=True)
 finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: SUCCESS: configuring lvm_volgroup: lvm_volgroup-0
 start: cmd-install/stage-partitioning/builtin/cmd-block-meta: configuring lvm_partition: lvm_partition-0

Running command ['lvcreate', 'ubuntu-vg', '--name', 'ubuntu-lv', '--zero=y', '--wipesignatures=y', '--size', '4294967296.0B'] with allowed return codes [0] (capture=False)
   WARNING: Not using device /dev/sdb2 for PV los2ZN-fWXS-JNWB-WPJC-vUhM-2K46-6f5AAU.
   WARNING: Not using device /dev/sdc2 for PV los2ZN-fWXS-JNWB-WPJC-vUhM-2K46-6f5AAU.
   WARNING: Not using device /dev/sdd2 for PV los2ZN-fWXS-JNWB-WPJC-vUhM-2K46-6f5AAU.
   WARNING: PV los2ZN-fWXS-JNWB-WPJC-vUhM-2K46-6f5AAU prefers device /dev/sda2 because device name matches previous.
   WARNING: PV los2ZN-fWXS-JNWB-WPJC-vUhM-2K46-6f5AAU prefers device /dev/sda2 because device name matches previous.
   WARNING: PV los2ZN-fWXS-JNWB-WPJC-vUhM-2K46-6f5AAU prefers device /dev/sda2 because device name matches previous.
   Cannot update volume group ubuntu-vg with duplicate PV devices.
 An error occured handling 'lvm_partition-0': ProcessExecutionError - Unexpected error while running command.
 Command: ['lvcreate', 'ubuntu-vg', '--name', 'ubuntu-lv', '--zero=y', '--wipesignatures=y', '--size', '4294967296'

LVM then looks at the other *disks* on the system and, because of multipath, sees duplicate paths to the same PV.

It looks like curtin will need to dynamically enable lvm filters to ignore these:

--config 'devices { filter = [ "a|/dev/sda2$|", "r|.*|" ] }'
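As an illustration only, a Python sketch of building such an override for the `lvcreate` call from the log. `lvm_accept_only_config` and `lvcreate_argv` are hypothetical helpers, not curtin functions, and the sketch assumes the accepted paths contain no `|` or regex metacharacters. The final `"r|.*|"` is what actually hides /dev/sdb2 and the other paths: LVM accepts a device whose names match no pattern, so an accept-only filter changes nothing.

```python
def lvm_accept_only_config(paths):
    """Return an LVM --config string that accepts only `paths`.

    Each path becomes an anchored accept ("a") rule; the final "r|.*|"
    rejects every other device name LVM would otherwise scan.
    Assumes the paths contain no '|' and no regex metacharacters.
    """
    rules = []
    for path in paths:
        if "|" in path:
            raise ValueError("path may not contain '|': %r" % path)
        rules.append('"a|^%s$|"' % path)
    rules.append('"r|.*|"')
    return "devices { filter = [ %s ] }" % ", ".join(rules)


def lvcreate_argv(vg, lv, size, pv_paths):
    """Hypothetical helper: the lvcreate call from the log, plus --config."""
    return ["lvcreate", "--config", lvm_accept_only_config(pv_paths),
            vg, "--name", lv, "--zero=y", "--wipesignatures=y",
            "--size", size]


if __name__ == "__main__":
    print(lvcreate_argv("ubuntu-vg", "ubuntu-lv", "4294967296B", ["/dev/sda2"]))
```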

Changed in curtin:
importance: Undecided → High
status: New → Triaged
summary: - subiquity/curtin block probe error with zfcp disks
+ lvcreate fails with duplicate paths to physical volume
Frank Heimes (fheimes) on 2020-03-26
Changed in ubuntu-z-systems:
status: New → Triaged
Ryan Harper (raharper) wrote :

Hrm, do we know if this LVM over multipath worked before?

In my recreation, I don't see how this can work without curtin changes. We currently do not write out an lvm.conf that restricts scans to multipath devices only. With local changes I can reproduce a failure similar to this one; the install succeeds, but the subsequent boot fails with root=/dev/mapper/ubuntu-vg-ubuntu-lv: when the initramfs attempts to bring the VG/LV online, it again complains about the duplicate paths...

I suspect this has never worked before, but I'd like confirmation.

Curtin will need to emit an lvm.conf with a filter for multipath-only devices.
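A rough Python sketch of rendering such a multipath-only `devices` section. The function name is hypothetical, and the default accept pattern assumes maps use friendly names under /dev/mapper/mpath*; with `user_friendly_names` disabled, or with non-multipath PVs on the same system, the accept list would need extending. Given the boot failure described above, the target's initramfs would presumably need to pick up the same filter.

```python
def render_multipath_lvm_filter(accept_regexes=("^/dev/mapper/mpath",)):
    """Render an lvm.conf 'devices' section that accepts multipath maps and
    rejects everything else (including the sdX path devices behind them).

    Rules are evaluated in order, so the catch-all "r|.*|" goes last.
    """
    rules = ['"a|%s|"' % regex for regex in accept_regexes]
    rules.append('"r|.*|"')
    return "devices {\n    filter = [ %s ]\n}\n" % ", ".join(rules)


if __name__ == "__main__":
    print(render_multipath_lvm_filter(), end="")
```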

Dimitri John Ledkov (xnox) wrote : Re: [Bug 1869075] Re: lvcreate fails with duplicate paths to physical volume

It works with d-i installs...

Frank Heimes (fheimes) wrote :

+1
I have used it with d-i many times, but this is the first time I tried it with subiquity.

Frank Heimes (fheimes) on 2020-04-01
tags: added: req4focal
Ryan Harper (raharper) wrote :

Thanks for confirming.

I'm mostly through the initial work to support lvmroot over multipath. I'm verifying that the change won't regress the existing multipath scenarios we test, adding an lvmroot-over-multipath scenario, and then I need to check that the changes also work on previous LTS releases.

I hope to have a branch up for review later today, or tomorrow.

Changed in curtin:
importance: High → Critical
status: Triaged → In Progress
assignee: nobody → Ryan Harper (raharper)
Frank Heimes (fheimes) on 2020-04-02
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Ryan Harper (raharper) wrote :

I've got things working on Bionic -> Focal for normal multipath and LVM over multipath. I'm currently fixing unittests to verify that dnames work for multipath "partitions": the dname rule we generate for partitions doesn't work for multipath, because those partitions are DEVTYPE=disk and multipath is *lovely* and *prepends* part%n to the DM_UUID of the parent disk. The other fix being worked on is ensuring that /etc/fstab entries reference the dm-uuid, and that the root= parameter uses /dev/mapper/mpathX names.
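To make the DM_UUID point concrete, a small Python sketch that tells a multipath map from one of its partitions by UUID alone, since both report DEVTYPE=disk. The `part<N>-mpath-<wwid>` shape follows the description above; the helper names are hypothetical, and this is not the rule curtin generates.

```python
import re

# DM_UUID shapes, per the description above:
#   multipath map:        mpath-<wwid>
#   partition on the map: part<N>-mpath-<wwid>
_MPATH_PART = re.compile(r"^part(\d+)-(mpath-.+)$")


def classify_dm_uuid(dm_uuid):
    """Return ("disk", uuid, None), ("partition", parent_uuid, number),
    or None if the UUID does not belong to a multipath device."""
    match = _MPATH_PART.match(dm_uuid)
    if match:
        return ("partition", match.group(2), int(match.group(1)))
    if dm_uuid.startswith("mpath-"):
        return ("disk", dm_uuid, None)
    return None


def partition_dm_uuid(parent_uuid, number):
    """DM_UUID a dname rule would need to match for partition `number`."""
    return "part%d-%s" % (number, parent_uuid)
```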

Ryan Harper (raharper) wrote :

Still working through verifying that there are no regressions after the changes; I hope to have a merge proposal up by Monday EOB.

Andrew Cloke (andrew-cloke) wrote :

@raharper were you able to make your merge proposal by Monday EOB as you had hoped?

Michael Hudson-Doyle (mwhudson) wrote :
Joshua Powers (powersj) wrote :

The link above is correct: the initial review is done, there is some cleanup to do, and then a final review is what's left.

Joshua Powers (powersj) wrote :

Currently going through a second round of review on the merge request.

Ryan Harper (raharper) wrote :

Final integration tests caught a few more scenarios that needed fixing:

1) We wrote /etc/multipath/bindings files with the target mpath device (partition) instead of just the disk name (mpatha, wwid); we now copy this autogenerated file from the ephemeral environment into the target system for releases newer than Bionic.

2) Fix an error in the multipath filter that was generated for LVM; the refactor done during review regressed the value.

3) Fix an issue in the CentOS initramfs configuration: the previously untested path of adding both the multipath and LVM modules.

I'll update the merge proposal with these fixes for another review and kick off an overnight integration test to verify that nothing else has broken with these changes.
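For item 1, the bindings file is a plain list of `alias wwid` lines with `#` comments. A minimal Python sketch for reading it and flagging suspicious entries; `parse_bindings` and `partition_like_aliases` are hypothetical names, and the `-part` check is a heuristic that assumes a wrongly written entry would carry a partition-style alias.

```python
def parse_bindings(text):
    """Parse /etc/multipath/bindings content into an {alias: wwid} dict.

    Blank lines and '#' comments are skipped; every other line must be
    'alias wwid'.
    """
    bindings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError("malformed bindings line: %r" % raw)
        alias, wwid = fields
        bindings[alias] = wwid
    return bindings


def partition_like_aliases(bindings):
    """Heuristic: aliases such as 'mpatha-part2' name a partition map,
    not a whole multipath disk, and should not be in the bindings file."""
    return sorted(alias for alias in bindings if "-part" in alias)
```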

Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit dc215b14 to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=dc215b14

Changed in curtin:
status: In Progress → Fix Committed
Frank Heimes (fheimes) on 2020-04-16
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Frank Heimes (fheimes) wrote :

Well, this worked with 20.04.1+git19.3bd1382b when I used the defaults, which are 1GB /boot and 4GB / (root).

But the next time I installed on the same system, the UI was messed up (I saw multiple entries, the list was too long I think, but it kind of shrank again when I navigated to 'done'; difficult to describe). By the way, I used a remote shell for subiquity this time, so the UI issue is unrelated to the HMC tasks 'OS Messages' or 'Integrated ASCII Console'.
I changed root to the maximum size, which was 62.996G (unfortunately one cannot just type 'max');
then the system seemed to install fine, but didn't come up anymore: it ended up in busybox.

I can see that the zfcp devices themselves are there:
lszdev | grep yes
zfcp-host 0.0.e000 yes yes
zfcp-host 0.0.e100 yes no
zfcp-lun 0.0.e000:0x50050763060b16b6:0x4026400000000000 yes no sdc sg2
zfcp-lun 0.0.e000:0x50050763060b16b6:0x4026400100000000 yes no sdd sg3
zfcp-lun 0.0.e000:0x50050763061b16b6:0x4026400000000000 yes yes sda sg0
zfcp-lun 0.0.e000:0x50050763061b16b6:0x4026400100000000 yes no sdb sg1
zfcp-lun 0.0.e100:0x50050763060b16b6:0x4026400000000000 yes no sde sg4
zfcp-lun 0.0.e100:0x50050763060b16b6:0x4026400100000000 yes no sdf sg5
zfcp-lun 0.0.e100:0x50050763061b16b6:0x4026400000000000 yes no sdg sg6
zfcp-lun 0.0.e100:0x50050763061b16b6:0x4026400100000000 yes no sdh sg7
qeth 0.0.c000:0.0.c001:0.0.c002 yes no encc000
(though I wonder why only two entries have persistence set to 'yes').

I've attached the full boot log; unfortunately that's all I have (I kept the system in that state ...).

Ryan Harper (raharper) wrote :

root=UUID=3b98f6a2-7d41-47df-b828-ba62bb9b04

That doesn't look right for a curtin s390x install; we should be specifying a /dev/disk/by-path value.

We will need the debug tarball to sort out what happened.

Frank Heimes (fheimes) wrote :

So I decided to redo the install and copy off the logs before rebooting.
This time I updated the installer to 20.04.1+git30.51e235d4 (I'm not sure whether I did that before).
I again ended up with an installer error, but a slightly different one.
It looks like a curtin error, but also LVM-related.
Please note that I did an installation on a zFCP multipath disk, just selected LVM, and only increased the root disk from 4GB to the max (62.966GB).
(The same layout was probably already on the disk, since I reused the system ...)

I've attached the steps for a better understanding of what I did.
And also /var/log and /var/crash (two crashes are listed, since I tried two times).

Frank Heimes (fheimes) wrote :
Ryan Harper (raharper) wrote :

Thanks. That helps. The storage config has changed slightly, so curtin will need to adapt to this as well:

Previously we had a single wipe: superblock-recursive on the type:disk entry, nothing on the partitions.

Now, the config puts wipe on the disks/parts that are used:

  - {ptable: gpt, serial: 36005076306ffd6b60000000000002601, wwn: '0x6005076306ffd6b60000000000002601',
    multipath: '[orphan]', path: /dev/sdb, wipe: superblock, preserve: false, name: '',
    grub_device: false, type: disk, id: disk-sdb}
  - {device: disk-sdb, size: 1073741824, wipe: superblock, flag: '', number: 1, preserve: false,
    type: partition, id: partition-0}
  - {fstype: ext4, volume: partition-0, preserve: false, type: format, id: format-0}
  - {device: disk-sdb, size: 67643637760, wipe: superblock, flag: '', number: 2, preserve: false,
    type: partition, id: partition-1}

Curtin then trips over trying to wipe "/dev/sdb2", which is a multipath partition (we should be wiping /dev/dm-X instead).

Let's fix up this scenario as well.

Ryan Harper (raharper) wrote :

The 'path' key in the disk entries makes curtin resolve the path to the disk differently; since sdb does have a holder (the multipath layer), we attempt to clear and wipe its partitions, but since they are path members, we cannot write to them directly.

Now that I can recreate this, I can start working on a fix. I think I'll need to convert more of our vmtests to use the path key (which is unstable for us in VMs, but stable in practice for subiquity). The config subiquity emits is something to discuss here, I think.

Michael Hudson-Doyle (mwhudson) wrote :

subiquity only emits path in the disk config to avoid needing logic that emits it only when there is no serial; I assumed that if serial was there, curtin ignored the path entirely. If that's not true, we can change things.

Ryan Harper (raharper) wrote :

It's not the primary issue; we do prefer the serial or wwn keys. Curtin used these keys to find a multipath *member*, which then exposed a bug in which device was returned:

get_path_to_storage_volume for volume disk-sdb({'ptable': 'gpt', 'serial': '36005076306ffd6b60000000000002601', 'wwn': '0x6005076306ffd6b60000000000002601', 'multipath': '[orphan]', 'path': '/dev/sdb', 'wipe': 'superblock', 'preserve': False, 'name': '', 'grub_device': False, 'type': 'disk', 'id': 'disk-sdb'})
Processing serial 0x6005076306ffd6b60000000000002601 via udev to 0x6005076306ffd6b60000000000002601
lookup_disks found: ['wwn-0x6005076306ffd6b60000000000002601', 'wwn-0x6005076306ffd6b60000000000002601-part2', 'wwn-0x6005076306ffd6b60000000000002601-part1']
Running command ['udevadm', 'info', '--query=property', '--export', '/dev/sdb']

This returned /dev/sdb

While this is correct, we cannot access or wipe the device directly since multipath is running.

I've currently got a patch that detects whether the device is a multipath member and, if so, returns the /dev/mapper/mpathX value instead, which resolves to dm-X, so curtin then does the right thing.
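The detection described above might look roughly like this Python sketch. It assumes a path member such as sdb lists its device-mapper map under `/sys/class/block/sdb/holders/`, and that the map's `dm/uuid` starts with `mpath-` while `dm/name` holds the friendly name. `multipath_map_for` is a hypothetical name, not the actual patch, and `sys_root` exists only so the sketch can be tested against a fake sysfs tree.

```python
import os


def _read_attr(path):
    with open(path) as f:
        return f.read().strip()


def multipath_map_for(devname, sys_root="/sys"):
    """If `devname` (e.g. 'sdb') is held by a multipath map, return
    '/dev/mapper/<map name>'; otherwise return None."""
    holders = os.path.join(sys_root, "class", "block", devname, "holders")
    if not os.path.isdir(holders):
        return None
    for holder in sorted(os.listdir(holders)):
        dm_dir = os.path.join(sys_root, "class", "block", holder, "dm")
        try:
            uuid = _read_attr(os.path.join(dm_dir, "uuid"))
            name = _read_attr(os.path.join(dm_dir, "name"))
        except OSError:
            continue  # not a device-mapper holder, or it vanished meanwhile
        if uuid.startswith("mpath-"):
            return "/dev/mapper/" + name
    return None
```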

Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/1869075

tags: added: iso-testing
Dimitri John Ledkov (xnox) wrote :

But somewhere we had code to filter out multipath devices and only present a path in the config.

Now that curtin is improved, we should just be passing the /dev/mapper/mpatha name, right?

Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit 43db2823 to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=43db2823

Ryan Harper (raharper) wrote :

@Dimitri

I'd like to push that out a bit so we can design a proper multipath type in curtin.

I wonder how best to identify the multipath group: dm-uuid-mpath-<>? dm-name-<friendly-name>? Or the underlying SCSI wwn or serial (which is what we do now, and which only points to the path device)?

Would we entertain udev rule changes to remove links to path members while multipath is assembled? I generally don't like that the host can "see" the path members when it can't actually use them directly.

Frank Heimes (fheimes) wrote :

Just fyi, I think I still see this with the 20.04.2 installer version.
Attaching /var/log and /var/crash...

Ryan Harper (raharper) wrote :

We *really* need to get the curtin commit hash into curtin/__init__.py or curtin/version.py:_PACKAGED_VERSION so we know exactly what curtin is running since we're picking by hash.

That said, this looks strange to me; I didn't think subiquity set that recently...

'/dev/sda', 'wipe': 'superblock-recursive'

Looking at the wiping of sda: we now check whether the device to be wiped is a multipath member and skip it if so. sda is a member, yet the output does not show that curtin ran the multipath -c command before wiping ... it *looks* to me like this build did not include the fix for this scenario.
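`multipath -c <device>` exits 0 when multipath decides the device should be a path of a multipath map. A Python sketch of using it to drop path members from a wipe list; the function names are hypothetical, the `run` parameter is only there for testing, and treating a missing `multipath` binary as "not a member" is an assumption.

```python
import subprocess


def is_multipath_member(devpath, run=subprocess.run):
    """True if `multipath -c devpath` exits 0. A missing multipath
    binary is treated as "not a member" (an assumption of this sketch)."""
    try:
        result = run(["multipath", "-c", devpath],
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        return False
    return result.returncode == 0


def wipe_candidates(devpaths, run=subprocess.run):
    """Skip path members; they have to be wiped through their /dev/mapper map."""
    return [dev for dev in devpaths if not is_multipath_member(dev, run=run)]
```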

Ryan Harper (raharper) wrote :

So, the most recent crash file, 21040202.tgz has this info

SnapChannel:
SnapRevision: 1735
SnapUpdated: False
SnapVersion: 20.04.1+git44.670c9c83
SourcePackage: subiquity

When I check the store now, the most recent snap is 1747:

% snap info subiquity
name: subiquity
summary: Ubuntu installer
publisher: Canonical✓
store-url: https://snapcraft.io/subiquity
contact: https://bugs.launchpad.net/subiquity
license: AGPL-3.0
description: |
  The Ubuntu server installer
snap-id: ba2aj8guta0zSRlT3QM5aJNAUXPlBtf9
channels:
  latest/stable: 20.03.2 2020-03-20 (1569) 54MB classic
  latest/candidate: 20.03.3 2020-03-27 (1581) 58MB classic
  latest/beta: ↑
  latest/edge: 20.04.2 2020-04-21 (1747) 58MB classic

I downloaded 1747, mounted the squashfs and confirmed that 1747
has all of the recent curtin fixes included.

Can you retest with 1747?

Frank Heimes (fheimes) wrote :

I used the ISO from April 20th, with which I was initially on 20.04.1+git44.670c9c83, and updated the installer to 20.04.2 (according to the auto-update screen).
Afterwards, 'snap list' showed me:
snap list | grep subiquity
subiquity 20.04.2 1748 latest/stable/… canonical* classic
(I recorded that by accident for a different bug)

I wondered whether I had accidentally uploaded a wrong (older) tgz, but no: it has the right name in date format <German date: 21st of April ...>, and the files inside are stamped April 21st.

Since there is no newer daily-live image (April 20th is still the one pending), I can only retry with more or less the same file(s) I have ...

Frank Heimes (fheimes) wrote :

So I downloaded the ISO again (it was still the one from April 20th), restarted the install, and made sure that I was on the latest level (20.04.2, revision 1748 for me).
And I ran into the issue again.

I've attached the steps that I followed in the txt file.

Please also see the attached log / crash files...

Frank Heimes (fheimes) wrote :
Ryan Harper (raharper) wrote :

As we discussed on IRC, this recent crash is UI-related (there's no curtin install failure included).

Ryan Harper (raharper) wrote :

With some effort and a UI workaround, the new error is this:

finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: curtin command block-meta
Traceback (most recent call last):
  File "/snap/subiquity/1748/lib/python3.6/site-packages/curtin/commands/main.py", line 202, in main
    ret = args.func(args)
  File "/snap/subiquity/1748/lib/python3.6/site-packages/curtin/log.py", line 97, in wrapper
    return log_time("TIMED %s: " % msg, func, *args, **kwargs)
  File "/snap/subiquity/1748/lib/python3.6/site-packages/curtin/log.py", line 79, in log_time
    return func(*args, **kwargs)
  File "/snap/subiquity/1748/lib/python3.6/site-packages/curtin/commands/block_meta.py", line 96, in block_meta
    meta_clear(devices, state.get('report_stack_prefix', ''))
  File "/snap/subiquity/1748/lib/python3.6/site-packages/curtin/commands/block_meta.py", line 1823, in meta_clear
    clear_holders.clear_holders(devices)
  File "/snap/subiquity/1748/lib/python3.6/site-packages/curtin/block/clear_holders.py", line 610, in clear_holders
    holder_trees = [gen_holders_tree(path) for path in base_paths]
  File "/snap/subiquity/1748/lib/python3.6/site-packages/curtin/block/clear_holders.py", line 610, in <listcomp>
    holder_trees = [gen_holders_tree(path) for path in base_paths]
  File "/snap/subiquity/1748/lib/python3.6/site-packages/curtin/block/clear_holders.py", line 433, in gen_holders_tree
    device = block.sys_block_path(device)
  File "/snap/subiquity/1748/lib/python3.6/site-packages/curtin/block/__init__.py", line 146, in sys_block_path
    (parent, partnum) = get_blockdev_for_partition(devname, strict=strict)
  File "/snap/subiquity/1748/lib/python3.6/site-packages/curtin/block/__init__.py", line 394, in get_blockdev_for_partition
    raise OSError("%s had no syspath (%s)" % (devpath, syspath))
OSError: /dev/mapper/mpatha had no syspath (/sys/class/block/mpatha)
/dev/mapper/mpatha had no syspath (/sys/class/block/mpatha)
curtin: Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'simple']

Ryan Harper (raharper) wrote :

So, it looks like we somehow lost the race with multipath and the mpatha device *went away*.

On a live s390x system with multipath where mpatha is present, we run gen_holders_tree on '/dev/mapper/mpatha' (which points to dm-0), and then on a non-existent mpathz.

rharper@s1lp6:~/work/git/curtin$ PYTHONPATH=`pwd` sudo python3
Python 3.7.5 (default, Nov 20 2019, 09:21:52)
[GCC 9.2.1 20191008] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from curtin.block import clear_holders
>>> clear_holders.gen_holders_tree('/dev/mapper/mpatha')
{'device': '/sys/class/block/dm-0', 'dev_type': 'disk', 'name': 'dm-0', 'holders': [{'device': '/sys/class/block/dm-1', 'dev_type': 'disk', 'name': 'dm-1', 'holders': [{'device': '/sys/class/block/dm-12', 'dev_type': 'disk', 'name': 'dm-12', 'holders': []}]}]}

Now on one that does not exist

>>> clear_holders.gen_holders_tree('/dev/mapper/mpathz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/srv/rharper/curtin/curtin/block/clear_holders.py", line 464, in gen_holders_tree
    device = block.sys_block_path(device)
  File "/srv/rharper/curtin/curtin/block/__init__.py", line 141, in sys_block_path
    (parent, partnum) = get_blockdev_for_partition(devname, strict=strict)
  File "/srv/rharper/curtin/curtin/block/__init__.py", line 322, in get_blockdev_for_partition
    raise OSError("%s had no syspath (%s)" % (devpath, syspath))
OSError: /dev/mapper/mpathz had no syspath (/sys/class/block/mpathz)

So, I don't know why mpatha is now gone... but I'm not sure what
curtin can do about this.
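Judging from the error text, the sysfs lookup is keyed on the name the /dev path resolves to, so it fails whenever /dev/mapper/mpatha is missing or is not a symlink to a dm-N node. The Python sketch below mirrors that behaviour and adds a fallback through the device number under /sys/dev/block, which still works if the /dev entry is a real block node. It is an illustration, not curtin's `sys_block_path`; `sysfs_block_path` and its `sys_root` parameter are hypothetical.

```python
import os
import stat


def sysfs_block_path(devpath, sys_root="/sys"):
    """Map a /dev path to its sysfs directory (illustrative sketch).

    First follow symlinks (/dev/mapper/mpatha -> /dev/dm-0) and try
    <sys_root>/class/block/<basename>. If that is missing but the /dev entry
    is a block special file, fall back to <sys_root>/dev/block/<major>:<minor>,
    which does not depend on the node's name.
    """
    real = os.path.realpath(devpath)
    by_name = os.path.join(sys_root, "class", "block", os.path.basename(real))
    if os.path.exists(by_name):
        return by_name
    try:
        st = os.stat(real)
    except FileNotFoundError:
        raise OSError("%s had no syspath (%s)" % (devpath, by_name)) from None
    if stat.S_ISBLK(st.st_mode):
        by_devnum = os.path.join(sys_root, "dev", "block", "%d:%d" % (
            os.major(st.st_rdev), os.minor(st.st_rdev)))
        if os.path.exists(by_devnum):
            return by_devnum
    raise OSError("%s had no syspath (%s)" % (devpath, by_name))
```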

Ryan Harper (raharper) wrote :

Frank saw another variant of this with /dev/mapper/mpatha-part1, which fails in the same way if the on-disk symlink isn't present.

>>> clear_holders.gen_holders_tree('/dev/mapper/mpatha-part1')
{'device': '/sys/class/block/dm-1', 'dev_type': 'disk', 'name': 'dm-1', 'holders': [{'device': '/sys/class/block/dm-12', 'dev_type': 'disk', 'name': 'dm-12', 'holders': []}]}

>>> clear_holders.gen_holders_tree('/dev/mapper/mpathz-part1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/srv/rharper/curtin/curtin/block/clear_holders.py", line 464, in gen_holders_tree
    device = block.sys_block_path(device)
  File "/srv/rharper/curtin/curtin/block/__init__.py", line 141, in sys_block_path
    (parent, partnum) = get_blockdev_for_partition(devname, strict=strict)
  File "/srv/rharper/curtin/curtin/block/__init__.py", line 322, in get_blockdev_for_partition
    raise OSError("%s had no syspath (%s)" % (devpath, syspath))
OSError: /dev/mapper/mpathz-part1 had no syspath (/sys/class/block/mpathz-part1)

Michael Hudson-Doyle (mwhudson) wrote :

Do we need to do an install with udevadm monitor in another terminal or
udev logging jacked way up or something?

Ryan Harper (raharper) wrote :

Maybe; I suspect I need instructions for running the installer remotely, like Frank is doing, so I can recreate this during my daytime.

I'm going to try one more simulation where I manually nuke /dev/mapper symlinks before curtin runs to see if I can create the same tracebacks to confirm that's what we're seeing.

Things that would normally remove them are:

multipath -f mpatha # flush the mapping
dmsetup remove mpatha # delete it from the table

I don't think either of these is happening, so let me keep digging.

Ryan Harper (raharper) wrote :

OK, I believe I've recreated the issue.

Curtin recently changed to clear any device that has wipe: enabled. In this recent test, wipe was set on both the disk and the partition, so curtin generated this list of block devices to clear:

/dev/mapper/mpatha, /dev/mapper/mpatha-part1

The problem (and it's not easy to see in the error path) is that /dev/mapper/mpatha-part1 *does not* exist yet, because we haven't created it; it's not an existing partition ...

I've recreated this in our multipath scenario by setting wipe: superblock on those partitions.

The fix on the curtin side is to filter out devices that do not exist from the clear-holders list of things to clear.
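A Python sketch of that filter (the function name is hypothetical); it also returns the skipped paths so they can be logged rather than silently dropped:

```python
import os


def split_existing(devpaths):
    """Partition devpaths into (present, missing), preserving order.

    Devices the storage config has not created yet, such as a partition
    that will only be made later, land in `missing` and are not handed
    to the holder-tree code.
    """
    present, missing = [], []
    for dev in devpaths:
        (present if os.path.exists(dev) else missing).append(dev)
    return present, missing
```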

In the process, I uncovered a few other issues:

1) block_meta.get_path_to_storage_volume would iterate through all disk search keys (wwn, serial, device_id, path) even if we found the disk with a previous key; we now break out of the loop as soon as we've found a disk

2) Sometimes we call kpartx too soon after running parted to create a partition. We fix this by calling udevadm settle to ensure the events have been processed before we update the kernel's device map with the partition.
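Both fixes can be sketched in Python. This is illustrative only: `find_disk`, the injected `lookup` callback and `partition_commands` are hypothetical names, and since the exact parted and kpartx invocations are not given above, `kpartx -a` (add partition mappings) is an assumption.

```python
SEARCH_KEYS = ("wwn", "serial", "device_id", "path")


def find_disk(volume, lookup):
    """Return the device for a disk entry, stopping at the first key that
    resolves. `lookup(key, value)` returns a device path or None."""
    for key in SEARCH_KEYS:
        value = volume.get(key)
        if not value:
            continue
        found = lookup(key, value)
        if found:
            return found  # do not let a later key override this hit
    raise ValueError("no disk found for volume %r" % volume.get("id"))


def partition_commands(disk, parted_args):
    """Ordering from item 2: create the partition, let udev finish
    processing the events, then add the partition mappings with kpartx."""
    return [
        ["parted", "--script", disk] + list(parted_args),
        ["udevadm", "settle"],
        ["kpartx", "-a", disk],
    ]
```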

I should have an MP up in a bit.

Ryan Harper (raharper) wrote :
Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit 6ddbf74d to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=6ddbf74d

Frank Heimes (fheimes) wrote :

I saw 20.04.3 (1763) today; not sure whether there is something new in it.
Anyway, I just retried on z/VM and got this (I guess I retried too early):

 Loading kernel module bcache via modprobe
 Running command ['modprobe', '--use-blacklist', 'bcache'] with allowed return codes [0] (capture=False)
 Loading kernel module zfs via modprobe
 Running command ['modprobe', '--use-blacklist', 'zfs'] with allowed return codes [0] (capture=False)
 zfs filesystem is not supported in this environment
 Generating device storage trees for path(s): ['/dev/mapper/mpatha']
 finish: cmd-install/stage-partitioning/builtin/cmd-block-meta/clear-holders: FAIL: removing previous storage devices
 TIMED BLOCK_META: 1.745
 finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: curtin command block-meta
 Traceback (most recent call last):
   File "/snap/subiquity/1763/lib/python3.6/site-packages/curtin/commands/main.py", line 202, in main
     ret = args.func(args)
   File "/snap/subiquity/1763/lib/python3.6/site-packages/curtin/log.py", line 97, in wrapper
     return log_time("TIMED %s: " % msg, func, *args, **kwargs)
   File "/snap/subiquity/1763/lib/python3.6/site-packages/curtin/log.py", line 79, in log_time
     return func(*args, **kwargs)
   File "/snap/subiquity/1763/lib/python3.6/site-packages/curtin/commands/block_meta.py", line 96, in block_meta
     meta_clear(devices, state.get('report_stack_prefix', ''))
   File "/snap/subiquity/1763/lib/python3.6/site-packages/curtin/commands/block_meta.py", line 1828, in meta_clear
     clear_holders.clear_holders(devices)
   File "/snap/subiquity/1763/lib/python3.6/site-packages/curtin/block/clear_holders.py", line 611, in clear_holders
     holder_trees = [gen_holders_tree(path) for path in base_paths]
   File "/snap/subiquity/1763/lib/python3.6/site-packages/curtin/block/clear_holders.py", line 611, in <listcomp>
     holder_trees = [gen_holders_tree(path) for path in base_paths]
   File "/snap/subiquity/1763/lib/python3.6/site-packages/curtin/block/clear_holders.py", line 433, in gen_holders_tree
     device = block.sys_block_path(device)
   File "/snap/subiquity/1763/lib/python3.6/site-packages/curtin/block/__init__.py", line 148, in sys_block_path
     (parent, partnum) = get_blockdev_for_partition(devname, strict=strict)
   File "/snap/subiquity/1763/lib/python3.6/site-packages/curtin/block/__init__.py", line 396, in get_blockdev_for_partition
     raise OSError("%s had no syspath (%s)" % (devpath, syspath))
 OSError: /dev/mapper/mpatha had no syspath (/sys/class/block/mpatha)
 /dev/mapper/mpatha had no syspath (/sys/class/block/mpatha)
 curtin: Installation failed with exception: Unexpected error while running command.
 Command: ['curtin', 'block-meta', 'simple']
 Exit code: 3
 Reason: -

Revision history for this message
Frank Heimes (fheimes) wrote :

Some good news: an LPAR installation with zFCP/multipath disks worked for me, using the disks as they were (that is, without manually wiping them or anything).

The LPAR is much more powerful than the z/VM guest I used before (in #43).

Revision history for this message
Frank Heimes (fheimes) wrote :

By the way, I updated the installer there to 20.04.3 (the latest at the time of writing).

Revision history for this message
Frank Heimes (fheimes) wrote :

I have to admit that this was a one-of-a-kind lucky LPAR install.
I wanted to test something else on the LPAR, and the installation failed in a similar way as in #43.
It now fails reliably with that error (6 times in a row).
I guess a direct look at the system could shed some more light on it?

Revision history for this message
Ryan Harper (raharper) wrote :
Revision history for this message
Ryan Harper (raharper) wrote :

Looking at the system, we found that:

/dev/mapper/mpatha-part1 was a *block* device, not a symbolic link to a dm-X device.

Looking around turned up this old issue:

https://bugzilla.redhat.com/show_bug.cgi?id=869253

which mentions that if "udev is slow", the multipath library may create a block device itself.
We may want to look at pulling that change into our multipath-tools. In any event,
curtin can check whether this block device is present and remove it; when kpartx then runs, it will create the symbolic link.
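The check-and-remove workaround described above could look roughly like this in Python. This is a minimal sketch, not curtin's actual code; the function names are hypothetical. It relies on `os.lstat`, which does not follow symlinks, so a healthy udev-created link reports as a symlink while a node created directly by the multipath library reports as a block device. Actually removing a node under /dev/mapper of course requires root.

```python
import os
import stat


def is_stale_mapper_node(path):
    """Return True if `path` is itself a block device node instead of a symlink.

    os.lstat() does not follow symlinks: a udev-created
    /dev/mapper/<name> -> ../dm-N link reports S_ISLNK, while a node
    created directly (e.g. by libmultipath when udev is slow) reports S_ISBLK.
    """
    try:
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return False
    return stat.S_ISBLK(mode)


def remove_stale_mapper_node(path):
    """Hypothetical helper: delete a stale block node so that udev/kpartx can
    recreate the expected symlink. Returns True if something was removed."""
    if is_stale_mapper_node(path):
        os.unlink(path)
        return True
    return False
```

Healthy symlinks and missing paths are left alone; only a real block node at the mapper path is deleted.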

Revision history for this message
Frank Heimes (fheimes) wrote :

Unfortunately, curtin still has an issue:

Found invalid device mapper mp path: /dev/mapper/mpatha, removing
del_file: removed /dev/mapper/mpatha
multipath: regenerating symlink for /dev/mapper/mpatha (/dev/dm-0)
Running command ['udevadm', 'trigger', '--subsystem-match=block', '--action=add', '/sys/class/block/dm-0'] with allowed return codes [0] (capture=False)
Running command ['udevadm', 'settle', '--exit-if-exists=/dev/mapper/mpatha'] with allowed return codes [0] (capture=False)
TIMED udevadm_settle(exists='/dev/mapper/mpatha'): 0.293
Failed to regenerate udev symlink /dev/mapper/mpatha
Running command ['pvscan', '--config', 'devices{ filter = [ "a|/dev/mapper/mpath.*|", "r|.*|" ] }'] with allowed return codes [0] (capture=True)
Running command ['vgscan', '--mknodes', '--config', 'devices{ filter = [ "a|/dev/mapper/mpath.*|", "r|.*|" ] }'] with allowed return codes [0] (capture=True)
Running command ['vgchange', '--activate=y', '--config', 'devices{ filter = [ "a|/dev/mapper/mpath.*|", "r|.*|" ] }'] with allowed return codes [0] (capture=True)
Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
TIMED udevadm_settle(): 0.028
Loading kernel module bcache via modprobe
Running command ['modprobe', '--use-blacklist', 'bcache'] with allowed return codes [0] (capture=False)
Loading kernel module zfs via modprobe
Running command ['modprobe', '--use-blacklist', 'zfs'] with allowed return codes [0] (capture=False)
zfs filesystem is not supported in this environment
Generating device storage trees for path(s): ['/dev/mapper/mpatha', '/dev/mapper/mpatha-part1']
finish: cmd-install/stage-partitioning/builtin/cmd-block-meta/clear-holders: FAIL: removing previous storage devices
TIMED BLOCK_META: 2.258
finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: curtin command block-meta
Traceback (most recent call last):
  File "/snap/subiquity/1773/lib/python3.6/site-packages/curtin/commands/main.py", line 202, in main
    ret = args.func(args)
  File "/snap/subiquity/1773/lib/python3.6/site-packages/curtin/log.py", line 97, in wrapper
    return log_time("TIMED %s: " % msg, func, *args, **kwargs)
  File "/snap/subiquity/1773/lib/python3.6/site-packages/curtin/log.py", line 79, in log_time
    return func(*args, **kwargs)
  File "/snap/subiquity/1773/lib/python3.6/site-packages/curtin/commands/block_meta.py", line 96, in block_meta
    meta_clear(devices, state.get('report_stack_prefix', ''))
  File "/snap/subiquity/1773/lib/python3.6/site-packages/curtin/commands/block_meta.py", line 1832, in meta_clear
    clear_holders.clear_holders(devices)
  File "/snap/subiquity/1773/lib/python3.6/site-packages/curtin/block/clear_holders.py", line 611, in clear_holders
    holder_trees = [gen_holders_tree(path) for path in base_paths]

Attaching /var/log and /var/crash ...
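The `pvscan`/`vgscan`/`vgchange` calls in the log pass an LVM device filter, `[ "a|/dev/mapper/mpath.*|", "r|.*|" ]`, that accepts multipath maps and rejects everything else. The intent is presumably that LVM sees each physical volume once, through its multipath map, rather than once per underlying SCSI path, which is the duplicate-path situation in this bug's title. The following rough Python model shows how such a filter is evaluated. It assumes the lvm.conf semantics (entries tried in order, the first matching regex decides, `a` accepts, `r` rejects, a device matching nothing is accepted, the character after `a`/`r` is the delimiter). It is an illustration, not LVM's implementation.

```python
import re


def parse_filter(patterns):
    """Turn LVM-style entries like 'a|/dev/mapper/mpath.*|' into
    (accept, compiled_regex) pairs."""
    rules = []
    for p in patterns:
        action, delim = p[0], p[1]
        if action not in ("a", "r"):
            raise ValueError("filter entry must start with 'a' or 'r': %r" % p)
        end = p.rindex(delim)  # closing delimiter
        if end <= 1:
            raise ValueError("unterminated filter entry: %r" % p)
        rules.append((action == "a", re.compile(p[2:end])))
    return rules


def device_accepted(path, rules):
    """First matching rule wins; a device that matches no rule is accepted.
    Matching is unanchored (re.search), so use ^/$ for exact paths."""
    for accept, regex in rules:
        if regex.search(path):
            return accept
    return True


# The filter string used by curtin in the log above.
CURTIN_FILTER = ["a|/dev/mapper/mpath.*|", "r|.*|"]
```

With `rules = parse_filter(CURTIN_FILTER)`, `/dev/mapper/mpatha` and its `-part1` map are accepted, while the individual paths such as `/dev/sda` fall through to the catch-all `r|.*|` and are rejected.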

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

So the blockdev -> symlink conversion did happen:
 Found invalid device mapper mp path: /dev/mapper/mpatha, removing
 del_file: removed /dev/mapper/mpatha
 multipath: regenerating symlink for /dev/mapper/mpatha (/dev/dm-0)
 Running command ['udevadm', 'trigger', '--subsystem-match=block', '--action=add', '/sys/class/block/dm-0'] with allowed return codes [0] (capture=False)
 Running command ['udevadm', 'settle', '--exit-if-exists=/dev/mapper/mpatha'] with allowed return codes [0] (capture=False)
 TIMED udevadm_settle(exists='/dev/mapper/mpatha'): 0.293
 Failed to regenerate udev symlink /dev/mapper/mpatha
 Running command ['pvscan', '--config', 'devices{ filter = [ "a|/dev/mapper/mpath.*|", "r|.*|" ] }'] with allowed return codes [0] (capture=True)
 Running command ['vgscan', '--mknodes', '--config', 'devices{ filter = [ "a|/dev/mapper/mpath.*|", "r|.*|" ] }'] with allowed return codes [0] (capture=True)
 Running command ['vgchange', '--activate=y', '--config', 'devices{ filter = [ "a|/dev/mapper/mpath.*|", "r|.*|" ] }'] with allowed return codes [0] (capture=True)
 Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
 TIMED udevadm_settle(): 0.028
 Loading kernel module bcache via modprobe
 Running command ['modprobe', '--use-blacklist', 'bcache'] with allowed return codes [0] (capture=False)
 Loading kernel module zfs via modprobe
 Running command ['modprobe', '--use-blacklist', 'zfs'] with allowed return codes [0] (capture=False)
 zfs filesystem is not supported in this environment
 Generating device storage trees for path(s): ['/dev/mapper/mpatha', '/dev/mapper/mpatha-part1']
 finish: cmd-install/stage-partitioning/builtin/cmd-block-meta/clear-holders: FAIL: removing previous storage devices
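The regeneration step in this log amounts to three actions. First, re-send an `add` uevent for `dm-0` so udev recreates its `/dev/mapper` link. Second, wait with `udevadm settle --exit-if-exists`. Third, verify the result. The Python sketch below uses only the `udevadm` arguments visible in the log. The function names and the injectable `run` parameter (so the command construction can be exercised without root) are hypothetical. Note that `settle --exit-if-exists` only tells you that *something* exists at the path. A block node that multipathd recreated would satisfy it too, which is presumably how settle can succeed and still be followed by "Failed to regenerate udev symlink".

```python
import os
import subprocess


def regen_commands(dm_name, mapper_path):
    """Build the udevadm calls seen in the log for one dm device."""
    return [
        ["udevadm", "trigger", "--subsystem-match=block", "--action=add",
         "/sys/class/block/%s" % dm_name],
        ["udevadm", "settle", "--exit-if-exists=%s" % mapper_path],
    ]


def regenerate_mapper_symlink(dm_name, mapper_path, run=subprocess.check_call):
    """Hypothetical helper: replay the uevent, wait, then confirm that
    mapper_path is a symlink resolving to the dm node. True on success."""
    for cmd in regen_commands(dm_name, mapper_path):
        run(cmd)
    # settle may return as soon as anything exists at mapper_path,
    # including a raw block node, so check the entry type explicitly.
    if not os.path.islink(mapper_path):
        return False
    return os.path.basename(os.path.realpath(mapper_path)) == dm_name
```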

However, now it doesn't like the next thing:

           File "/snap/subiquity/1773/lib/python3.6/site-packages/curtin/block/__init__.py", line 396, in get_blockdev_for_partition
             raise OSError("%s had no syspath (%s)" % (devpath, syspath))
         OSError: /dev/mapper/mpatha had no syspath (/sys/class/block/mpatha)
         /dev/mapper/mpatha had no syspath (/sys/class/block/mpatha)

And I'm not sure how or why that is.
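One plausible explanation, consistent with the error text: the lookup appears to take the basename of the resolved device path and then check `/sys/class/block/<name>`. For a udev symlink `/dev/mapper/mpatha -> ../dm-0` that yields `dm-0`, which exists in sysfs. If `mpatha` has become a real block node again, `realpath` leaves the path unchanged, the name becomes `mpatha`, and `/sys/class/block/mpatha` does not exist. The sketch below models that name-based derivation. As an alternative, it also shows a lookup through the kernel's `/sys/dev/block/<major>:<minor>` links, built from `os.stat().st_rdev`, which does not care whether the entry is a symlink or a node. Both are illustrations under these assumptions, not curtin's code, and the function names are hypothetical.

```python
import os


def sysfs_name_from_path(devpath):
    """Name-based lookup: basename of the fully resolved path.
    A udev symlink resolves to 'dm-N'; a raw block node stays 'mpatha'."""
    return os.path.basename(os.path.realpath(devpath))


def class_block_path(devpath):
    """Where a name-based lookup would expect the sysfs entry."""
    return os.path.join("/sys/class/block", sysfs_name_from_path(devpath))


def dev_block_path(devpath):
    """Number-based lookup: os.stat follows symlinks, so a link and the
    node it points to report the same device number in st_rdev."""
    rdev = os.stat(devpath).st_rdev
    return "/sys/dev/block/%d:%d" % (os.major(rdev), os.minor(rdev))
```

Presumably a stray node created for the same map carries the dm device's major:minor, in which case the number-based path would still find the right sysfs entry where the name-based one fails.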

Revision history for this message
Ryan Harper (raharper) wrote :

I was afraid this might happen; we fix it up and multipathd just goes and writes the file as a block device again underneath us.

We only have a workaround; the real issue needs to be root-caused and fixed.

Revision history for this message
Ryan Harper (raharper) wrote :

I've filed:

https://bugs.launchpad.net/ubuntu/+source/multipath-tools/+bug/1874501

to track the issue of multipath mapper files not being symlinks.

I *think* we can close this bug against subiquity/curtin, since everything works when the mapper files are created by udev (as symlinks).

Revision history for this message
Ryan Harper (raharper) wrote :

I'm marking the subiquity task invalid; the changes were made in curtin. The outstanding issue we see in this scenario is in multipath-tools itself. We can reopen this task if we find that subiquity needs to do something different.

Changed in subiquity:
status: New → Invalid
Changed in ubuntu-z-systems:
status: Fix Committed → In Progress
Changed in curtin:
status: Fix Committed → In Progress
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

The reasons the req4focal tag was added have been resolved.

Focal has shipped.

The newly discovered bugs in multipath-tools are not architecture-specific, and they can be handled elsewhere.

This bug is now too long, and the majority of the issues reported in it have been resolved in the GA image.

Closing.

Changed in ubuntu-z-systems:
status: In Progress → Fix Released
Revision history for this message
Dimitri John Ledkov (xnox) wrote :
Changed in curtin:
status: In Progress → Fix Released
Revision history for this message
Frank Heimes (fheimes) wrote :

+1
The focus is now shifted to other tickets ...
