Comment 7 for bug 2054672

DUFOUR Olivier (odufourc) wrote :

Hello

Thank you for your help so far.
I've run more tests in my lab with the daily build of Curtin (22.1-1153-gfc39d744-0ubuntu1+318~trunk~ubuntu22.04.1)

After a more in-depth analysis, I've noticed 3 different scenarios where installations with bcache can fail with Curtin :

1) If the servers are released without having their disks cleaned
--> curtin-logs-without-disk-erasing.tar
Problem : Curtin seems to fail to stop mdadm because bcache is on top. As a wild guess, Curtin might need to stop bcache first and then mdadm in order to progress any further.

2) If the servers are released with only quick disk erasing and then redeployed
(This is a common scenario with hard drives, since the vast majority of them lack the secure erase feature that SSDs have, and using MAAS to fully erase all the data hard drives can literally take multiple days to complete)
--> curtin-logs-after-quick-disk-erase.tar
Problem : Partly related to the first issue. The MAAS quick erase method doesn't seem to be thorough enough to remove all the signatures on the disks, such as the bcache one.

3) When using a commissioning script (manual-clean-disks.sh) to compensate for MAAS quick erase not being thorough enough, and then redeploying, the race condition can happen (initial subject of this bug report).
I believe it might be fixed, since I cannot reproduce it in my lab, but I would need to test in the customer's environment to confirm definitively.
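
To illustrate the ordering I am guessing at in scenario 1, here is a minimal Python sketch. It is not Curtin's actual code, only an assumption about what "stop bcache first and then mdadm" could look like as a general rule: stop every device that sits on top of another one before stopping the device underneath. The function name `teardown_order`, the `holders` mapping and the device names are all hypothetical. The mapping mimics what /sys/block/<dev>/holders exposes on Linux.

```python
def teardown_order(holders):
    """Order devices so that each holder comes before the device it sits on.

    `holders` maps a device name to the list of devices stacked on top of it
    (hypothetical input, shaped like the contents of /sys/block/<dev>/holders).
    """
    order = []
    visited = set()

    def visit(dev):
        if dev in visited:
            return
        visited.add(dev)
        # Anything stacked on top of this device has to be stopped first.
        for holder in sorted(holders.get(dev, [])):
            visit(holder)
        order.append(dev)

    for dev in sorted(holders):
        visit(dev)
    return order


# Hypothetical layout: two disks in an md RAID array, with bcache on top of it.
example = {
    "sda": ["md0"],
    "sdb": ["md0"],
    "md0": ["bcache0"],
    "bcache0": [],
}
print(teardown_order(example))  # ['bcache0', 'md0', 'sda', 'sdb']
```

With this layout, bcache0 would be stopped before md0, and md0 before the underlying disks, which is the order I suspect is needed in the first scenario.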