I've seen this failure in another scenario, tracked in this private bug (LP: #1815018), so I'm copying in the portion that's relevant here.

--

I now believe the failure you saw in the 1-in-30 case is related to the time it takes to flush the cache device. Curtin currently finds a bcache's cacheset device and stops that first. In your ceph deployment scenario, each cache device has 3 backing devices being cached, which may contain a large amount of dirty data that needs to be flushed; this is where the longer timeout that was initially mentioned in LP: #1796292 helped.

Curtin will now do the following when stopping bcache devices:

1. Wipe the bcache device contents
2. Extract the cacheset uuid
3. Extract the backing device
4. Detach the bcache device from the cacheset
5. Stop the bcacheN device
6. Wait for the sysfs paths bcacheN, bcacheN/bcache and backing/bcache to go away
7. Check how many other backing devices are attached to cset_uuid; if zero, stop the cset

Notably, at step 4 we monitor the bcache device's state and wait until the cacheset is no longer attached. At step 7, we only remove a cache device once all of the devices it was caching have been stopped.
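To make the two waits concrete, here is a rough sketch of the step 4 polling and the step 7 decision. This is not curtin's actual code: the function names, the injected read_state callable and the 0.2s poll interval are made up for illustration. The 300s timeout and the 'no cache' state string are taken from the log output in this comment.

```python
import time


def wait_for_detach(read_state, timeout=300.0, interval=0.2,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll read_state() until it reports 'no cache' (step 4).

    read_state is a hypothetical callable returning the backing device's
    current state string, e.g. 'dirty' or 'no cache'.
    Returns the number of checks performed; raises TimeoutError if the
    device is still attached once `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    checks = 0
    while True:
        state = read_state()
        checks += 1
        if state == "no cache":
            return checks
        if clock() >= deadline:
            raise TimeoutError("still %r after %ss" % (state, timeout))
        sleep(interval)


def should_stop_cacheset(attached_backing_devices):
    """Step 7: stop the cacheset only when nothing is attached to it."""
    return len(attached_backing_devices) == 0
```

Passing clock and sleep in as parameters is only there so the loop can be exercised without real delays; a caller would normally leave the defaults and supply a read_state that reads the device's state from sysfs.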
[ 83.467348] cloud-init[1081]: shutdown running on holder type: 'bcache' syspath: '/sys/class/block/bcache0'
[ 83.470549] cloud-init[1081]: Wiping superblock on bcache device: /sys/class/block/bcache0
[ 83.472919] cloud-init[1081]: wiping superblock on /dev/bcache0
[ 83.474801] cloud-init[1081]: wiping /dev/bcache0 attempt 1/4
[ 83.477077] cloud-init[1081]: wiping 1M on /dev/bcache0 at offsets [0, -1048576]
[ 83.479757] cloud-init[1081]: successfully wiped device /dev/bcache0 on attempt 1/4
[ 83.481915] cloud-init[1081]: os.path.exists on blockdevs:
[ 83.484922] cloud-init[1081]: [('/sys/class/block/bcache0/bcache', True), ('/sys/class/block/vda/vda1/bcache', True)]
[ 83.489019] cloud-init[1081]: bcache: detaching /sys/class/block/bcache0 from cacheset 7d2a9905-bd60-4db1-a8c8-12ca4ac90d45
[ 83.492648] cloud-init[1081]: /sys/class/block/bcache0 waiting up to 300s for cacheset to detach
[ 83.496051] cloud-init[1081]: /sys/class/block/bcache0 cset detach check=0 state='dirty' dirty_data='1.9M'
[ 85.372738] cloud-init[1081]: /sys/class/block/bcache0 cset detach check=1 state='no cache' dirty_data='0.0k'
[ 85.373121] cloud-init[1081]: /sys/class/block/bcache0 successfully detached from cacheset 7d2a9905-bd60-4db1-a8c8-12ca4ac90d45
[ 85.377747] cloud-init[1081]: stopping bcache backing device at: /sys/class/block/bcache0/bcache
[ 85.379303] cloud-init[1081]: waiting for /sys/class/block/bcache0 to be removed
[ 85.380359] cloud-init[1081]: sleeping 0.2
[ 85.569818] cloud-init[1081]: /sys/class/block/bcache0 has been removed
[ 85.571160] cloud-init[1081]: waiting for /sys/class/block/bcache0/bcache to be removed
[ 85.571727] cloud-init[1081]: /sys/class/block/bcache0/bcache has been removed
[ 85.576889] cloud-init[1081]: waiting for /sys/class/block/vda/vda1/bcache to be removed
[ 85.580122] cloud-init[1081]: /sys/class/block/vda/vda1/bcache has been removed
[ 85.584239] cloud-init[1081]: Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
[ 85.598546] cloud-init[1081]: TIMED udevadm_settle(): 0.025
[ 85.603519] cloud-init[1081]: bcache backing device stopped: /sys/class/block/bcache0/bcache
[ 85.611148] cloud-init[1081]: /sys/class/block/bcache0 was attached to cacheset 7d2a9905-bd60-4db1-a8c8-12ca4ac90d45, checking for other members
[ 85.626467] cloud-init[1081]: delaying removal of cacheset 7d2a9905-bd60-4db1-a8c8-12ca4ac90d45, still caching other devices: