clear-holders doesn't wait for md device to stop

Bug #1682584 reported by Ryan Harper
This bug affects 1 person
Affects: curtin | Status: Fix Released | Importance: High | Assigned to: Unassigned

Bug Description

Initially we saw an issue[1] in clear-holders when it called mdadm_remove on an already-stopped device; the failure comes from how the mdadm shutdown handler in curtin's clear-holders is written.

Although many places document using it when tearing down an array, examination of the mdadm code in Trusty (and later) shows that mdadm --remove is explicitly for ejecting one of the RAID members; it is not needed to release RAID resources.
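Following that reasoning, tearing down an array only needs `mdadm --stop`. The sketch below is illustrative rather than curtin's code: `run_cmd` and `shutdown_md_sketch` are hypothetical names, and the injectable `run` argument exists only so the issued command can be inspected without touching a real array.

```python
import subprocess


def run_cmd(cmd):
    """Run a command, raising CalledProcessError on a non-zero exit (Python 3.7+)."""
    subprocess.run(cmd, check=True, capture_output=True)


def shutdown_md_sketch(devpath, run=run_cmd):
    """Hypothetical md shutdown handler: stop the array and nothing more.

    mdadm --remove is meant for ejecting a member device from an array, so
    it is not issued here. After --stop the /dev/mdX node may already be
    gone, which is what made the earlier --remove call fail with
    'error opening /dev/md1: No such file or directory'.
    """
    cmd = ["mdadm", "--stop", devpath]
    run(cmd)
    return cmd
```

Note that this still returns as soon as mdadm exits; it does not wait for the kernel to release the array.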

Removing the call to mdadm_remove avoided the failure when calling remove on the stopped array; however, it exposed the fact that clear-holders did not wait until the md resources were released.

While working on that piece, we found that Xenial and newer kernels leak some resources, leaving entries present in sysfs (LP:1682456). This requires a kernel fix but does not block any functionality: curtin can still create a RAID device and complete an install.

Until the kernel portion is fixed, curtin will monitor whether a raid device has been released by examining the output of /proc/mdstat.
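A minimal sketch of that polling approach, assuming an array counts as released once its `mdN :` line disappears from /proc/mdstat. The helper names (`read_mdstat`, `md_in_mdstat`, `wait_for_md_release`), the 30-second timeout and the 0.5-second interval are hypothetical choices for illustration, not curtin's actual implementation.

```python
import re
import time


def read_mdstat(path="/proc/mdstat"):
    """Return the current contents of /proc/mdstat."""
    with open(path) as f:
        return f.read()


def md_in_mdstat(kname, mdstat):
    """True if an array line such as 'md1 : active raid1 ...' is present."""
    # Anchor at line start and require ':' so 'md1' does not match 'md10'.
    pattern = re.compile(r"^" + re.escape(kname) + r"\s*:", re.MULTILINE)
    return pattern.search(mdstat) is not None


def wait_for_md_release(kname, read=read_mdstat, timeout=30.0, interval=0.5,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll /proc/mdstat until kname disappears; return False on timeout."""
    deadline = clock() + timeout
    while True:
        if not md_in_mdstat(kname, read()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `read`, `sleep` and `clock` parameters are injectable so the loop can be exercised without a real array; a caller would run it right after `mdadm --stop` and treat a `False` result as a failed shutdown.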

The triggering storage config[2] is being added to curtin's vmtests to help catch any regressions here and, once the kernel fix is complete, to validate that we can switch to watching sysfs entries instead of /proc/mdstat.
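Once the kernel leak is fixed, the release check could look at sysfs instead. This is a hypothetical sketch: `md_in_sysfs` is an illustrative name, and the assumption that /sys/class/block/<kname> disappears when the array is released only holds on kernels carrying the fix for LP:1682456.

```python
import os


def md_in_sysfs(kname, sysfs_root="/sys/class/block"):
    """True while a block device entry such as /sys/class/block/md1 exists.

    On kernels affected by LP:1682456 this entry can linger after the array
    is stopped, so this check is only reliable once the kernel fix lands.
    """
    return os.path.exists(os.path.join(sysfs_root, kname))
```

Like the /proc/mdstat check, this predicate would be polled with a timeout rather than checked once.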

1.
shutdown running on holder type: 'raid' syspath: '/sys/class/block/md1'
path_to_kname input: '/sys/devices/virtual/block/md1' output: 'md1'
kname_to_path input: 'md1' output: '/dev/md1'
using mdadm.mdadm_stop on dev: /dev/md1
mdadm stopping: /dev/md1
Running command ['mdadm', '--stop', '/dev/md1'] with allowed return codes [0] (shell=False, capture=True)
mdadm stop:

mdadm: stopped /dev/md1

mdadm removing: /dev/md1
Running command ['mdadm', '--remove', '/dev/md1'] with allowed return codes [0] (shell=False, capture=True)
finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: failed: removing previous storage devices
finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: failed: curtin command block-meta
Traceback (most recent call last):
  File "/curtin/curtin/commands/main.py", line 211, in main
    ret = args.func(args)
  File "curtin/commands/block_meta.py", line 62, in block_meta
    meta_custom(args)
  File "curtin/commands/block_meta.py", line 1041, in meta_custom
    clear_holders.clear_holders(disk_paths)
  File "curtin/block/clear_holders.py", line 379, in clear_holders
    shutdown_function(dev_info['device'])
  File "curtin/block/clear_holders.py", line 146, in shutdown_mdadm
    mdadm.mdadm_remove(blockdev)
  File "curtin/block/mdadm.py", line 269, in mdadm_remove
    rcs=[0], capture=True)
  File "curtin/util.py", line 174, in subp
    return _subp(*args, **kwargs)
  File "curtin/util.py", line 122, in _subp
    cmd=args)
ProcessExecutionError: Unexpected error while running command.
Command: ['mdadm', '--remove', '/dev/md1']
Exit code: 1
Reason: -
Stdout: ''
Stderr: u'mdadm: error opening /dev/md1: No such file or directory\n'
Unexpected error while running command.
Command: ['mdadm', '--remove', '/dev/md1']
Exit code: 1
Reason: -
Stdout: ''
Stderr: u'mdadm: error opening /dev/md1: No such file or directory\n'
builtin command failed
finish: cmd-install/stage-partitioning/builtin: FAIL: failed: running 'curtin block-meta custom'
builtin took 2.579 seconds
stage_partitioning took 2.579 seconds
finish: cmd-install/stage-partitioning: FAIL: failed: configuring storage
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'custom']
Exit code: 3

2.
storage:
  config:
  - grub_device: true
    id: sda
    name: sda
    ptable: msdos
    type: disk
    wipe: superblock
    path: /dev/vdb
    name: main_disk
  - id: sdb
    name: sdb
    ptable: gpt
    type: disk
    wipe: superblock
    path: /dev/vdc
    name: second_disk
  - device: sda
    flag: boot
    id: sda-part1
    name: sda-part1
    number: 1
    offset: 4194304B
    size: 511705088B
    type: partition
    uuid: fc7ab24c-b6bf-460f-8446-d3ac362c0625
    wipe: superblock
  - device: sda
    id: sda-part2
    name: sda-part2
    number: 2
    size: 2G
    type: partition
    uuid: 47c97eae-f35d-473f-8f3d-d64161d571f1
    wipe: superblock
  - device: sda
    id: sda-part3
    name: sda-part3
    number: 3
    size: 2G
    type: partition
    uuid: e3202633-841c-4936-a520-b18d1f7938ea
    wipe: superblock
  - device: sdb
    flag: boot
    id: sdb-part1
    name: sdb-part1
    number: 1
    offset: 4194304B
    size: 511705088B
    type: partition
    uuid: 86326392-3706-4124-87c6-2992acfa31cc
    wipe: superblock
  - device: sdb
    id: sdb-part2
    name: sdb-part2
    number: 2
    size: 2G
    type: partition
    uuid: a33a83dd-d1bf-4940-bf3e-6d931de85dbc
    wipe: superblock
  - devices:
    - sda-part2
    - sdb-part2
    id: md0
    name: md0
    raidlevel: 1
    spare_devices: []
    type: raid
  - device: sdb
    id: sdb-part3
    name: sdb-part3
    number: 3
    size: 2G
    type: partition
    uuid: 27e29758-fdcf-4c6a-8578-c92f907a8a9d
    wipe: superblock
  - devices:
    - sda-part3
    - sdb-part3
    id: md1
    name: md1
    raidlevel: 1
    spare_devices: []
    type: raid
  - fstype: fat32
    id: sda-part1_format
    label: efi
    type: format
    uuid: b3d50fc7-2f9e-4d1a-9e24-28985e4c560b
    volume: sda-part1
  - fstype: fat32
    id: sdb-part1_format
    label: efi
    type: format
    uuid: c604cbb1-2ee1-4575-9489-d38a60fa0cf2
    volume: sdb-part1
  - fstype: ext4
    id: md0_format
    label: ''
    type: format
    uuid: 76a315b7-2979-436c-b156-9ae64a565a59
    volume: md0
  - fstype: ext4
    id: md1_format
    label: ''
    type: format
    uuid: 48dceca6-a9f9-4c7b-bfd3-7f3a0faa4ecc
    volume: md1
  - device: md0_format
    id: md0_mount
    options: ''
    path: /
    type: mount
  - device: sda-part1_format
    id: sda-part1_mount
    options: ''
    path: /boot/efi
    type: mount
  - device: md1_format
    id: md1_mount
    options: ''
    path: /var
    type: mount
  version: 1


Ryan Harper (raharper)
description: updated
Ryan Harper (raharper)
Changed in curtin:
importance: Undecided → High
status: New → Fix Committed
tags: added: 4010
Scott Moser (smoser) wrote: Fixed in Curtin 17.1

This bug is believed to be fixed in curtin in 17.1. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released