The logs indicate that we could not stop the raid array after numerous retries.
Feb 20 08:41:24 infra1 cloud-init[2863]: Running command ['mdadm', '--detail', '--scan', '-v'] with allowed return codes [0, 1] (capture=True)
Feb 20 08:41:24 infra1 cloud-init[2863]: mdadm detail scan after assemble:
Feb 20 08:41:24 infra1 cloud-init[2863]: ARRAY /dev/md/infra1:0 level=raid5 num-devices=3 metadata=1.2 spares=1 name=infra1:0 UUID=9ab31ec9:f764d630:e27e2ca5:6a083092
Feb 20 08:41:24 infra1 cloud-init[2863]: devices=/dev/sdb,/dev/sdc,/dev/sdd
That's the RAID5 array
Here's the storage tree layout:
Feb 20 08:41:26 infra1 cloud-init[2863]: Current device storage tree:
Feb 20 08:41:26 infra1 cloud-init[2863]: sda
Feb 20 08:41:26 infra1 cloud-init[2863]: sdb
Feb 20 08:41:26 infra1 cloud-init[2863]: `-- md127
Feb 20 08:41:26 infra1 cloud-init[2863]: sdc
Feb 20 08:41:26 infra1 cloud-init[2863]: `-- md127
Feb 20 08:41:26 infra1 cloud-init[2863]: sdd
Feb 20 08:41:26 infra1 cloud-init[2863]: `-- md127
Feb 20 08:41:26 infra1 cloud-init[2863]: sde
Feb 20 08:41:26 infra1 cloud-init[2863]: sdf
Feb 20 08:41:26 infra1 cloud-init[2863]: shutdown running on holder type: 'raid' syspath: '/sys/class/block/md127'
And curtin tries up to 60 times to stop the raid:
Feb 20 08:41:26 infra1 cloud-init[2863]: mdadm: stop on /dev/md127 attempt 0
Feb 20 08:41:26 infra1 cloud-init[2863]: /sys/class/block/md127/md/sync_action/sync_max = idle
Feb 20 08:41:26 infra1 cloud-init[2863]: /sys/class/block/md127/md/sync_max/sync_max = max
Feb 20 08:41:26 infra1 cloud-init[2863]: mdadm: setting array sync_{min,max}=0
Feb 20 08:41:26 infra1 kernel: [ 86.124825] bcache: register_bcache() error opening /dev/md127: device already registered
Feb 20 08:41:26 infra1 cloud-init[2863]: Running command ['mdadm', '--manage', '--stop', '/dev/md127'] with allowed return codes [0] (capture=True)
Feb 20 08:41:26 infra1 cloud-init[2863]: mdadm stop failed, retrying
Feb 20 08:41:26 infra1 cloud-init[2863]: /proc/mdstat:
Feb 20 08:41:26 infra1 cloud-init[2863]: Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
Feb 20 08:41:26 infra1 cloud-init[2863]: md127 : active (auto-read-only) raid5 sdd[3] sdc[1] sdb[0]
Feb 20 08:41:26 infra1 cloud-init[2863]: 1953262592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
Feb 20 08:41:26 infra1 cloud-init[2863]: bitmap: 0/8 pages [0KB], 65536KB chunk
Feb 20 08:41:26 infra1 cloud-init[2863]: unused devices: <none>
Feb 20 08:41:26 infra1 cloud-init[2863]: mdadm: stop failed, retrying in 0.2 seconds
...
Feb 20 08:41:40 infra1 cloud-init[2863]: mdadm: stop on /dev/md127 attempt 59
Feb 20 08:41:40 infra1 cloud-init[2863]: /sys/class/block/md127/md/sync_action/sync_max = idle
Feb 20 08:41:40 infra1 cloud-init[2863]: /sys/class/block/md127/md/sync_max/sync_max = 0
Feb 20 08:41:40 infra1 cloud-init[2863]: Running command ['mdadm', '--manage', '--stop', '/dev/md127'] with allowed return codes [0] (capture=True)
Feb 20 08:41:40 infra1 cloud-init[2863]: mdadm stop failed, retrying
Feb 20 08:41:40 infra1 cloud-init[2863]: /proc/mdstat:
Feb 20 08:41:40 infra1 cloud-init[2863]: Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
Feb 20 08:41:40 infra1 cloud-init[2863]: md127 : active (auto-read-only) raid5 sdd[3] sdc[1] sdb[0]
Feb 20 08:41:40 infra1 cloud-init[2863]: 1953262592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
Feb 20 08:41:40 infra1 cloud-init[2863]: bitmap: 0/8 pages [0KB], 65536KB chunk
Feb 20 08:41:40 infra1 cloud-init[2863]: unused devices: <none>
Feb 20 08:41:40 infra1 cloud-init[2863]: mdadm: stop failed, retrying in 0.2 seconds
The interesting bit is the bcache error; it appears to be related to the array. But I don't know how at this time.
Is this reproducible with wiped disks? or Only on redeploy? You filed https://bugs.launchpad.net/curtin/+bug/1750580 which I've not yet figured out if bcache supports using raid devices. If bcache over raid is not supported, then this is fallout from that lack of support.
The logs indicate that we could not stop the raid array after numerous retries.
Feb 20 08:41:24 infra1 cloud-init[2863]: Running command ['mdadm', '--detail', '--scan', '-v'] with allowed return codes [0, 1] (capture=True) f764d630: e27e2ca5: 6a083092 /dev/sdb, /dev/sdc, /dev/sdd
Feb 20 08:41:24 infra1 cloud-init[2863]: mdadm detail scan after assemble:
Feb 20 08:41:24 infra1 cloud-init[2863]: ARRAY /dev/md/infra1:0 level=raid5 num-devices=3 metadata=1.2 spares=1 name=infra1:0 UUID=9ab31ec9:
Feb 20 08:41:24 infra1 cloud-init[2863]: devices=
That's the RAID5 array
Here's the storage tree layout:
Feb 20 08:41:26 infra1 cloud-init[2863]: Current device storage tree: block/md127'
Feb 20 08:41:26 infra1 cloud-init[2863]: sda
Feb 20 08:41:26 infra1 cloud-init[2863]: sdb
Feb 20 08:41:26 infra1 cloud-init[2863]: `-- md127
Feb 20 08:41:26 infra1 cloud-init[2863]: sdc
Feb 20 08:41:26 infra1 cloud-init[2863]: `-- md127
Feb 20 08:41:26 infra1 cloud-init[2863]: sdd
Feb 20 08:41:26 infra1 cloud-init[2863]: `-- md127
Feb 20 08:41:26 infra1 cloud-init[2863]: sde
Feb 20 08:41:26 infra1 cloud-init[2863]: sdf
Feb 20 08:41:26 infra1 cloud-init[2863]: shutdown running on holder type: 'raid' syspath: '/sys/class/
And curtin tries up to 60 times to stop the raid:
Feb 20 08:41:26 infra1 cloud-init[2863]: mdadm: stop on /dev/md127 attempt 0 block/md127/ md/sync_ action/ sync_max = idle block/md127/ md/sync_ max/sync_ max = max
Feb 20 08:41:26 infra1 cloud-init[2863]: /sys/class/
Feb 20 08:41:26 infra1 cloud-init[2863]: /sys/class/
Feb 20 08:41:26 infra1 cloud-init[2863]: mdadm: setting array sync_{min,max}=0
Feb 20 08:41:26 infra1 kernel: [ 86.124825] bcache: register_bcache() error opening /dev/md127: device already registered
Feb 20 08:41:26 infra1 cloud-init[2863]: Running command ['mdadm', '--manage', '--stop', '/dev/md127'] with allowed return codes [0] (capture=True)
Feb 20 08:41:26 infra1 cloud-init[2863]: mdadm stop failed, retrying
Feb 20 08:41:26 infra1 cloud-init[2863]: /proc/mdstat:
Feb 20 08:41:26 infra1 cloud-init[2863]: Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
Feb 20 08:41:26 infra1 cloud-init[2863]: md127 : active (auto-read-only) raid5 sdd[3] sdc[1] sdb[0]
Feb 20 08:41:26 infra1 cloud-init[2863]: 1953262592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
Feb 20 08:41:26 infra1 cloud-init[2863]: bitmap: 0/8 pages [0KB], 65536KB chunk
Feb 20 08:41:26 infra1 cloud-init[2863]: unused devices: <none>
Feb 20 08:41:26 infra1 cloud-init[2863]: mdadm: stop failed, retrying in 0.2 seconds
...
Feb 20 08:41:40 infra1 cloud-init[2863]: mdadm: stop on /dev/md127 attempt 59 block/md127/ md/sync_ action/ sync_max = idle block/md127/ md/sync_ max/sync_ max = 0
Feb 20 08:41:40 infra1 cloud-init[2863]: /sys/class/
Feb 20 08:41:40 infra1 cloud-init[2863]: /sys/class/
Feb 20 08:41:40 infra1 cloud-init[2863]: Running command ['mdadm', '--manage', '--stop', '/dev/md127'] with allowed return codes [0] (capture=True)
Feb 20 08:41:40 infra1 cloud-init[2863]: mdadm stop failed, retrying
Feb 20 08:41:40 infra1 cloud-init[2863]: /proc/mdstat:
Feb 20 08:41:40 infra1 cloud-init[2863]: Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
Feb 20 08:41:40 infra1 cloud-init[2863]: md127 : active (auto-read-only) raid5 sdd[3] sdc[1] sdb[0]
Feb 20 08:41:40 infra1 cloud-init[2863]: 1953262592 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
Feb 20 08:41:40 infra1 cloud-init[2863]: bitmap: 0/8 pages [0KB], 65536KB chunk
Feb 20 08:41:40 infra1 cloud-init[2863]: unused devices: <none>
Feb 20 08:41:40 infra1 cloud-init[2863]: mdadm: stop failed, retrying in 0.2 seconds
The interesting bit is the bcache error; it appears to be related to the array. But I don't know how at this time.
Is this reproducible with wiped disks? or Only on redeploy? You filed https:/ /bugs.launchpad .net/curtin/ +bug/1750580 which I've not yet figured out if bcache supports using raid devices. If bcache over raid is not supported, then this is fallout from that lack of support.