clear-holders doesn't wait for md device to stop
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
curtin | Fix Released | High | Unassigned |
Bug Description
Initially in clear-holders we saw a failure[1] when calling mdadm_remove on an already-stopped device, caused by how the mdadm shutdown handler in curtin's clear-holders is written.
Although mdadm --remove is documented in many places as part of tearing down an array, examination of the mdadm code in Trusty (and beyond) shows that it is explicitly for ejecting one of the raid members; it is not needed as part of releasing raid resources.
Removing the calls to mdadm_remove avoided the failure when calling remove on the array; however, it exposed the fact that clear-holders didn't wait until the md resources were released.
While working on that piece, we found that Xenial and newer kernels exhibit some resource leaking which leaves entries in sysfs present (LP:1682456); this requires a kernel fix but does not block any functionality. Curtin can continue to create a raid device and complete an install.
Until the kernel portion is fixed, curtin will monitor whether a raid device has been released by examining the output of /proc/mdstat.
The triggering storage config[2] is being added to curtin's vmtests to help catch any regressions here and, once the kernel fix is complete, to validate that we can switch to watching sysfs entries instead of /proc/mdstat.
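The /proc/mdstat monitoring described above could look roughly like the sketch below. It is not curtin's actual implementation: the function names (md_present, wait_for_md_release, read_proc_mdstat), the retry count, and the delay are all hypothetical, chosen only to illustrate polling until the array no longer has an entry in /proc/mdstat.

```python
import re
import time


def md_present(mdstat_text, md_name):
    """Return True if md_name (e.g. 'md1') has an entry in mdstat text.

    Array entries in /proc/mdstat start at column 0, for example:
        md1 : active raid1 vdb3[0] vdc3[1]
    """
    pattern = r"^%s\s*:" % re.escape(md_name)
    return re.search(pattern, mdstat_text, re.MULTILINE) is not None


def read_proc_mdstat(path="/proc/mdstat"):
    """Read the current contents of /proc/mdstat."""
    with open(path) as fp:
        return fp.read()


def wait_for_md_release(md_name, read_mdstat=read_proc_mdstat,
                        retries=60, delay=0.5, sleep=time.sleep):
    """Poll until md_name is no longer listed; raise OSError on timeout.

    retries and delay are assumed values, not taken from curtin.
    """
    for _ in range(retries):
        if not md_present(read_mdstat(), md_name):
            return True
        sleep(delay)
    raise OSError("%s still listed in /proc/mdstat" % md_name)
```

In this sketch a shutdown handler would call wait_for_md_release('md1') after mdadm --stop returns and before touching the member devices. The reader and sleep function are passed in as parameters so the loop can be unit-tested without a real array.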
1.
shutdown running on holder type: 'raid' syspath: '/sys/class/
path_to_kname input: '/sys/devices/
kname_to_path input: 'md1' output: '/dev/md1'
using mdadm.mdadm_stop on dev: /dev/md1
mdadm stopping: /dev/md1
Running command ['mdadm', '--stop', '/dev/md1'] with allowed return codes [0] (shell=False, capture=True)
mdadm stop:
mdadm: stopped /dev/md1
mdadm removing: /dev/md1
Running command ['mdadm', '--remove', '/dev/md1'] with allowed return codes [0] (shell=False, capture=True)
finish: cmd-install/
finish: cmd-install/
Traceback (most recent call last):
File "/curtin/
ret = args.func(args)
File "curtin/
meta_
File "curtin/
clear_
File "curtin/
shutdown_
File "curtin/
mdadm.
File "curtin/
rcs=[0], capture=True)
File "curtin/util.py", line 174, in subp
return _subp(*args, **kwargs)
File "curtin/util.py", line 122, in _subp
cmd=args)
ProcessExecutio
Command: ['mdadm', '--remove', '/dev/md1']
Exit code: 1
Reason: -
Stdout: ''
Stderr: u'mdadm: error opening /dev/md1: No such file or directory\n'
Unexpected error while running command.
Command: ['mdadm', '--remove', '/dev/md1']
Exit code: 1
Reason: -
Stdout: ''
Stderr: u'mdadm: error opening /dev/md1: No such file or directory\n'
builtin command failed
finish: cmd-install/
builtin took 2.579 seconds
stage_partitioning took 2.579 seconds
finish: cmd-install/
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'custom']
Exit code: 3
2.
storage:
  config:
  - grub_device: true
    id: sda
    name: sda
    ptable: msdos
    type: disk
    wipe: superblock
    path: /dev/vdb
    name: main_disk
  - id: sdb
    name: sdb
    ptable: gpt
    type: disk
    wipe: superblock
    path: /dev/vdc
    name: second_disk
  - device: sda
    flag: boot
    id: sda-part1
    name: sda-part1
    number: 1
    offset: 4194304B
    size: 511705088B
    type: partition
    uuid: fc7ab24c-
    wipe: superblock
  - device: sda
    id: sda-part2
    name: sda-part2
    number: 2
    size: 2G
    type: partition
    uuid: 47c97eae-
    wipe: superblock
  - device: sda
    id: sda-part3
    name: sda-part3
    number: 3
    size: 2G
    type: partition
    uuid: e3202633-
    wipe: superblock
  - device: sdb
    flag: boot
    id: sdb-part1
    name: sdb-part1
    number: 1
    offset: 4194304B
    size: 511705088B
    type: partition
    uuid: 86326392-
    wipe: superblock
  - device: sdb
    id: sdb-part2
    name: sdb-part2
    number: 2
    size: 2G
    type: partition
    uuid: a33a83dd-
    wipe: superblock
  - devices:
    - sda-part2
    - sdb-part2
    id: md0
    name: md0
    raidlevel: 1
    spare_devices: []
    type: raid
  - device: sdb
    id: sdb-part3
    name: sdb-part3
    number: 3
    size: 2G
    type: partition
    uuid: 27e29758-
    wipe: superblock
  - devices:
    - sda-part3
    - sdb-part3
    id: md1
    name: md1
    raidlevel: 1
    spare_devices: []
    type: raid
  - fstype: fat32
    id: sda-part1_format
    label: efi
    type: format
    uuid: b3d50fc7-
    volume: sda-part1
  - fstype: fat32
    id: sdb-part1_format
    label: efi
    type: format
    uuid: c604cbb1-
    volume: sdb-part1
  - fstype: ext4
    id: md0_format
    label: ''
    type: format
    uuid: 76a315b7-
    volume: md0
  - fstype: ext4
    id: md1_format
    label: ''
    type: format
    uuid: 48dceca6-
    volume: md1
  - device: md0_format
    id: md0_mount
    options: ''
    path: /
    type: mount
  - device: sda-part1_format
    id: sda-part1_mount
    options: ''
    path: /boot/efi
    type: mount
  - device: md1_format
    id: md1_mount
    options: ''
    path: /var
    type: mount
  version: 1
Related branches
- Server Team CI bot: Approve (continuous-integration)
- Scott Moser (community): Approve
- Chad Smith: Approve
- Joshua Powers (community): Approve
Diff: 490 lines (+379/-6), 6 files modified
curtin/block/clear_holders.py (+23/-1)
curtin/block/mdadm.py (+27/-1)
examples/tests/mirrorboot-uefi.yaml (+138/-0)
tests/unittests/test_block_mdadm.py (+94/-1)
tests/unittests/test_clear_holders.py (+57/-3)
tests/vmtests/test_mdadm_bcache.py (+40/-0)
description: updated
Changed in curtin:
importance: Undecided → High
status: New → Fix Committed
tags: added: 4010
This bug is believed to be fixed in curtin in 17.1. If this is still a problem for you, please make a comment and set the state back to New.
Thank you.