timeout removing bcache device when lvm is over bcache

Bug #1844543 reported by Jason Hobbs on 2019-09-18
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Chad Smith
curtin (Ubuntu)

Bug Description

Notes from a conversation with Ryan.

I've been able to reproduce a bcache device removal timeout during a MAAS/curtin installation on 4.15.0-62-generic #69-Ubuntu.

I've left the machine up so we can investigate.

Ryan, I imported your ssh key from launchpad, ubuntu@ - can you have a look?

dmesg: http://paste.ubuntu.com/p/hWvSF5VnzS/

cloud-init-output.log http://paste.ubuntu.com/p/dv6bcnNW4Z/

I don't see any obvious oopses or tracebacks from the kernel. I'm not sure what else I should look for.

Looking at the shutdown tree, dm-0 sits over the bcache device, so it looks like curtin should have stopped the LVM device on top of bcache0.

 Shutdown Plan:
        {'level': 2, 'device': '/sys/class/block/bcache1', 'dev_type': 'bcache'}
        {'level': 2, 'device': '/sys/class/block/dm-0', 'dev_type': 'lvm'}
        {'level': 1, 'device': '/sys/class/block/bcache0', 'dev_type': 'bcache'}
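To make the ordering in the plan above easier to read, here is a minimal sketch that sorts the same three entries so that higher-level devices (those stacked on top of others) come first. The helper names `shutdown_order` and `names_in_order` are hypothetical and exist only for illustration; this is not curtin's actual implementation, just one way to read the plan under the assumption that a higher `level` means "stop earlier".

```python
import os

# Entries copied verbatim from the shutdown plan in this report.
shutdown_plan = [
    {'level': 2, 'device': '/sys/class/block/bcache1', 'dev_type': 'bcache'},
    {'level': 2, 'device': '/sys/class/block/dm-0', 'dev_type': 'lvm'},
    {'level': 1, 'device': '/sys/class/block/bcache0', 'dev_type': 'bcache'},
]


def shutdown_order(plan):
    """Hypothetical helper: highest level first.

    sorted() is stable, so entries with the same level keep their
    original relative order.
    """
    return sorted(plan, key=lambda entry: entry['level'], reverse=True)


def names_in_order(plan):
    """Return just the kernel device names in shutdown order."""
    return [os.path.basename(e['device']) for e in shutdown_order(plan)]


if __name__ == '__main__':
    for name in names_in_order(shutdown_plan):
        print(name)
```

Under that reading, dm-0 (the LVM volume) is handled before bcache0, which it sits on. The ordering alone says nothing about bcache1, which is at the same level as dm-0 but shares a cacheset with bcache0.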

Let me see if I can recreate this structure in our vmtest. The bcache shutdown works by stopping the cache of a bcache block device, and in this scenario bcache1 and bcache0 are related because they share a cacheset.

OK, I can recreate this issue in a vmtest with the config included in the logs (changed to replicate the LVM over bcache), and it's unrelated to the bcache kernel changes. The issue is the LVM over bcache0 (from ceph in the QA case), which shares a cacheset with bcache1.
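As a rough illustration of the shared-cacheset relation described above, the sketch below groups bcache devices by the cache set they are attached to. It assumes that each bcache device's sysfs directory contains a `bcache/cache` symlink pointing at its cache set when one is attached; the function names `cacheset_of` and `group_by_cacheset` are hypothetical and are not taken from curtin.

```python
import os
from collections import defaultdict


def cacheset_of(device_sysfs_path):
    """Return the resolved cache set path for a bcache device, or None.

    Assumption: <device>/bcache/cache is a symlink to the cache set
    directory whenever the device is attached to one.
    """
    link = os.path.join(device_sysfs_path, 'bcache', 'cache')
    if not os.path.exists(link):
        return None
    return os.path.realpath(link)


def group_by_cacheset(device_sysfs_paths):
    """Map each cache set path to the device names attached to it."""
    groups = defaultdict(list)
    for path in device_sysfs_paths:
        cset = cacheset_of(path)
        if cset is not None:
            groups[cset].append(os.path.basename(path))
    return dict(groups)


if __name__ == '__main__':
    devices = ['/sys/class/block/bcache0', '/sys/class/block/bcache1']
    for cset, names in group_by_cacheset(devices).items():
        print(cset, names)
```

A group with more than one member would indicate devices like bcache0 and bcache1 here, where stopping the cache for one device also affects the other.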

curtin config: http://paste.ubuntu.com/p/zmTBb2Bnxs/

tags: added: cdo-qa foundations-engine
Jason Hobbs (jason-hobbs) wrote :

curtin error logs

description: updated
Chad Smith (chad.smith) on 2019-09-19
Changed in curtin:
status: New → Confirmed
assignee: nobody → Chad Smith (chad.smith)
Ryan Harper (raharper) on 2019-10-07
Changed in curtin:
importance: Undecided → High

This bug is fixed with commit e174e1cd to curtin on branch master.
To view that commit see the following URL:

Changed in curtin:
status: Confirmed → Fix Committed

This bug is believed to be fixed in curtin in version 19.3. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released
Nobuto Murata (nobuto) wrote :

When can we expect this to be SRUed?

Ryan Harper (raharper) wrote :

We plan to start a curtin SRU next week.

Changed in curtin (Ubuntu):
status: New → Triaged
Jason Hobbs (jason-hobbs) wrote :

20:37 < rharper> powersj: jhobbs: re: curtin SRU; I had planned to SRU in Nov, but the fix that landed at the time was not complete; I was still able to recreate the failure. We have a more complete fix that's passing all of the vmtest scenarios with bcache; that's landed, so likely SRU will start at the start of 2020.

Jason Hobbs (jason-hobbs) wrote :

I tested with 19.3-787-gb022ed4-0ubuntu1+228~trunk~ubuntu18.04.1 and it can repeatedly install just fine - no issues with the setup described above.
