Comment 0 for bug 1844543

Jason Hobbs (jason-hobbs) wrote :

Notes from a conversation with Ryan.

me:
I've been able to reproduce a bcache device removal timeout during maas/curtin installation on 4.15.0-62-generic #69-Ubuntu

I've left the machine up so we can investigate

Ryan, I imported your ssh key from launchpad, ubuntu@10.244.41.14 - can you have a look?

dmesg: http://paste.ubuntu.com/p/hWvSF5VnzS/

cloud-init-output.log http://paste.ubuntu.com/p/dv6bcnNW4Z/

Ryan:
I don't see any obvious oopses or tracebacks from the kernel. I'm not sure what else I should look for.

Looking at the shutdown tree, dm-0 sits on top of the bcache device, so it looks like curtin should have
stopped the lvm device on top of bcache0

 Shutdown Plan:
        {'level': 2, 'device': '/sys/class/block/bcache1', 'dev_type': 'bcache'}
        {'level': 2, 'device': '/sys/class/block/dm-0', 'dev_type': 'lvm'}
        {'level': 1, 'device': '/sys/class/block/bcache0', 'dev_type': 'bcache'}

Let me see if I can recreate this structure in our vmtest. The way the bcache shutdown works, we stop the cache of a bcache block device; in this scenario there's a relation between bcache1 and bcache0, which share a cacheset.

OK, I can recreate this issue in a vmtest with the config included in the logs (with a change to replicate the LVM over bcache), and it's unrelated to the bcache kernel changes. The issue is the LVM on top of bcache0 (from ceph in the QA case), where bcache0 shares a cacheset with bcache1.
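
For illustration only, here is a minimal sketch of the ordering the shutdown plan implies: entries with a higher level are stopped first, so the lvm device (dm-0) goes away before the bcache0 device underneath it. The plan entries are copied from the log above. The helper names order_plan and holders_stopped_first and the holders mapping are hypothetical and are not curtin's actual code.

```python
# Plan entries copied from the "Shutdown Plan" in the log above.
shutdown_plan = [
    {'level': 2, 'device': '/sys/class/block/bcache1', 'dev_type': 'bcache'},
    {'level': 2, 'device': '/sys/class/block/dm-0', 'dev_type': 'lvm'},
    {'level': 1, 'device': '/sys/class/block/bcache0', 'dev_type': 'bcache'},
]

# Assumption for this sketch: dm-0 (lvm) is held on top of bcache0,
# as described in the conversation.
holders = {'/sys/class/block/bcache0': ['/sys/class/block/dm-0']}


def order_plan(plan):
    """Return the plan sorted highest level first (stable for equal levels)."""
    return sorted(plan, key=lambda entry: entry['level'], reverse=True)


def holders_stopped_first(plan, holders):
    """Check that every holder appears in the plan before the device it sits on."""
    position = {entry['device']: i for i, entry in enumerate(plan)}
    return all(
        position[holder] < position[device]
        for device, held_by in holders.items()
        for holder in held_by
        if holder in position and device in position
    )


ordered = order_plan(shutdown_plan)
for entry in ordered:
    print(entry['level'], entry['dev_type'], entry['device'])
print('holders stopped first:', holders_stopped_first(ordered, holders))
```

The check only covers ordering; it does not model the shared cacheset between bcache0 and bcache1, which is the part that needs handling in curtin.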