Subiquity installer fails on cleaning multipath device (timeout)

Bug #1860778 reported by Frank Heimes
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu on IBM z Systems
Fix Released
Critical
Unassigned
subiquity
Fix Released
Undecided
Unassigned
curtin (Ubuntu)
Fix Released
Critical
Unassigned

Bug Description

The issue described here has a relationship to LP 1857042 "curtin mishandles removing dm-multipath partition removal" - after testing a fix for 1857042, subiquity fails in one of the next steps at 'cleaning' / wiping the device.

A snippet from the logs show the following:

         get_blockdev_sector_size: (log=512, phys=512)
         Checking if /dev/dm-0 is a swap device
         Found swap magic: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
         Running command ['multipath', '-c', '/dev/dm-0'] with allowed return codes [0] (capture=True)
         wiping superblock on /dev/dm-0
         wiping /dev/dm-0 attempt 1/4
         wiping 1M on /dev/dm-0 at offsets [0, -1048576]
         successfully wiped device /dev/dm-0 on attempt 1/4
         Running command ['udevadm', 'info', '--query=property', '/dev/dm-0'] with allowed return codes [0] (capture=True)
         Running command ['udevadm', 'info', '--query=property', '/dev/dm-0'] with allowed return codes [0] (capture=True)
         Running command ['multipathd', 'show', 'maps', 'raw', 'format', "name=%n multipath='%w' sysfs='%d' paths='%N'"] with allowed return codes [0] (capture=True)
         Extracted multipath maps fields: {'name': 'mpatha', 'multipath': '36005076306ffd6b60000000000002602', 'sysfs': 'dm-0', 'paths': '4'}
         multipath: removing multipath map: mpatha
         Running command ['multipath', '-f', 'mpatha'] with allowed return codes [0, 1] (capture=False)
         Jan 23 14:16:55 | mpatha: map in use
         Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
         TIMED udevadm_settle(): 0.035
         Running command ['multipath', '-f', 'mpatha'] with allowed return codes [0, 1] (capture=False)
         Jan 23 14:16:55 | failed to remove multipath map mpatha
         Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
         TIMED udevadm_settle(): 0.077
         Running command ['multipath', '-f', 'mpatha'] with allowed return codes [0, 1] (capture=False)
         Jan 23 14:16:55 | failed to remove multipath map mpatha
         Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
         TIMED udevadm_settle(): 0.006
         Running command ['multipath', '-f', 'mpatha'] with allowed return codes [0, 1] (capture=False)
         Jan 23 14:16:55 | failed to remove multipath map mpatha
         Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
         TIMED udevadm_settle(): 0.024
         Running command ['multipath', '-f', 'mpatha'] with allowed return codes [0, 1] (capture=False)
         Jan 23 14:16:55 | failed to remove multipath map mpatha
         Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
         TIMED udevadm_settle(): 0.023
         Running command ['multipath', '-f', 'mpatha'] with allowed return codes [0, 1] (capture=False)
         Jan 23 14:16:55 | failed to remove multipath map mpatha
         Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
         TIMED udevadm_settle(): 0.023
         Running command ['multipath', '-f', 'mpatha'] with allowed return codes [0, 1] (capture=False)
         Jan 23 14:16:55 | failed to remove multipath map mpatha
         Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
         TIMED udevadm_settle(): 0.023
         Running command ['multipath', '-f', 'mpatha'] with allowed return codes [0, 1] (capture=False)
         Jan 23 14:16:56 | failed to remove multipath map mpatha
         Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
         TIMED udevadm_settle(): 0.016
         Running command ['multipath', '-f', 'mpatha'] with allowed return codes [0, 1] (capture=False)
         Jan 23 14:16:56 | failed to remove multipath map mpatha
         Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
         TIMED udevadm_settle(): 0.015
         Running command ['multipath', '-f', 'mpatha'] with allowed return codes [0, 1] (capture=False)
         Jan 23 14:16:56 | failed to remove multipath map mpatha
         Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
         TIMED udevadm_settle(): 0.013
         waiting for /dev/mapper/mpatha to be removed
         sleeping 1
         sleeping 3
         sleeping 5
         sleeping 7
         finish: cmd-install/stage-partitioning/builtin/cmd-block-meta/clear-holders: FAIL: removing previous storage devices
         TIMED BLOCK_META: 19.016
         finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: curtin command block-meta
         Traceback (most recent call last):
           File "/snap/subiquity/1429/lib/python3.6/site-packages/curtin/commands/main.py", line 202, in main
             ret = args.func(args)
           File "/snap/subiquity/1429/lib/python3.6/site-packages/curtin/log.py", line 97, in wrapper
             return log_time("TIMED %s: " % msg, func, *args, **kwargs)
           File "/snap/subiquity/1429/lib/python3.6/site-packages/curtin/log.py", line 79, in log_time
             return func(*args, **kwargs)
           File "/snap/subiquity/1429/lib/python3.6/site-packages/curtin/commands/block_meta.py", line 79, in block_meta
             meta_clear(devices, state.get('report_stack_prefix', ''))
           File "/snap/subiquity/1429/lib/python3.6/site-packages/curtin/commands/block_meta.py", line 1662, in meta_clear
             clear_holders.clear_holders(devices)
           File "/snap/subiquity/1429/lib/python3.6/site-packages/curtin/block/clear_holders.py", line 639, in clear_holders
             shutdown_function(dev_info['device'])
           File "/snap/subiquity/1429/lib/python3.6/site-packages/curtin/block/clear_holders.py", line 333, in wipe_superblock
             multipath.remove_map(mp_id)
           File "/snap/subiquity/1429/lib/python3.6/site-packages/curtin/block/multipath.py", line 102, in remove_map
             util.wait_for_removal(devpath)
           File "/snap/subiquity/1429/lib/python3.6/site-packages/curtin/util.py", line 296, in wait_for_removal
             raise OSError('Timeout exceeded for removal of %s', path)
         OSError: [Errno Timeout exceeded for removal of %s] /dev/mapper/mpatha
         [Errno Timeout exceeded for removal of %s] /dev/mapper/mpatha

Also trying to wipe/clean the device manually from the shell does not (always) succeed, since the device is reported to be in use, but lsof does not show any candidates:

root@ubuntu-server:/# wipefs /dev/mapper/mpatha
DEVICE OFFSET TYPE UUID LABEL
mpatha 0x200 gpt
mpatha 0xffffffe00 gpt
mpatha 0x1fe PMBR
root@ubuntu-server:/# wipefs -a /dev/mapper/mpatha
wipefs: error: /dev/mapper/mpatha: probing initialization failed: Device or reso
urce busy
root@ubuntu-server:/# lsof /dev/mapper/mpatha
root@ubuntu-server:/# multipath -ll
mpatha (36005076306ffd6b60000000000002602) dm-0 IBM,2107900
size=64G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 0:0:1:1073889318 sdb 8:16 active ready running
  |- 0:0:0:1073889318 sda 8:0 active ready running
  |- 1:0:0:1073889318 sdc 8:32 active ready running
  `- 1:0:1:1073889318 sdd 8:48 active ready running

root@ubuntu-server:/# multipath -f mpatha-part1
root@ubuntu-server:/# lsof /dev/mapper/mpatha-part1
root@ubuntu-server:/# lsof /dev/mapper/mpatha
root@ubuntu-server:/# wipefs /dev/mapper/mpatha-part1
root@ubuntu-server:/# wipefs /dev/mapper/mpatha
DEVICE OFFSET TYPE UUID LABEL
mpatha 0x200 gpt
mpatha 0xffffffe00 gpt
mpatha 0x1fe PMBR
root@ubuntu-server:/# wipefs -a /dev/mapper/mpatha
wipefs: error: /dev/mapper/mpatha: probing initialization failed: Device or reso
urce busy
root@ubuntu-server:/#

Revision history for this message
Frank Heimes (fheimes) wrote :
Revision history for this message
Frank Heimes (fheimes) wrote :

logs from another try

Changed in ubuntu-z-systems:
importance: Undecided → Critical
Revision history for this message
Frank Heimes (fheimes) wrote :

List of running processes

Revision history for this message
Ryan Harper (raharper) wrote :

Hi Frank,

I believe we've fixed this by:

1) Issuing a dmsetup remove on the 'mpatha' dm device

   dmsetup remove --force --retry /dev/dm-0

2) Instructing multipath -f to retry the flush

   multipath -v3 -R3 -f mpatha

I believe the current edge/case snap has this fix in it, if not xnox should be able to respin one with the fix included. Please test if that resolves the issue.

Changed in curtin (Ubuntu):
importance: Undecided → Critical
status: New → Incomplete
Revision history for this message
Frank Heimes (fheimes) wrote :

Hey Ryan - ok, will test soon.
Do you have a version number that is known to have that included?
Just for me making sure that I really test on a fixed version ...
I'll start with using latest/edge/case ...

Revision history for this message
Frank Heimes (fheimes) wrote :

That seems to be fine now with 19.12.2+git47.e510f574 from latest/edge/case.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Ryan, please merge your branch of fix/mpath-partition-wiping into curtin proper, thanks.

Revision history for this message
Ryan Harper (raharper) wrote :
Changed in curtin (Ubuntu):
status: Incomplete → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Fix Committed
Revision history for this message
Frank Heimes (fheimes) wrote :

I can confirm that it is fixed with subiquity 20.03.
I installed focal on an zfcp multipath system where before a LUKS installation was running.
While selecting 'whole disk' the installer was able to wipe out the old scheme and replaced it by an all-in-on part setup.

Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
Revision history for this message
Joshua Powers (powersj) wrote :

Marking fix released on other tasks, so this doesn't show up in the report.

Changed in subiquity:
status: New → Fix Released
Changed in curtin (Ubuntu):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.