Quick erasing disks doesn't properly clean some special filesystems

Bug #2057782 reported by DUFOUR Olivier
This bug affects 1 person
Affects | Status  | Importance | Assigned to | Milestone
MAAS    | Triaged | Medium     | Unassigned  |

Bug Description

This bug report relates to servers using:
* bcache [1]
* LVM [2]

#
# Environment :
#
* MAAS 3.3 and 3.4
* Ubuntu 22.04
* deployment / commissioning OS : 20.04 and 22.04

#
# Problem :
#
When using special block layers such as bcache or LVM, Curtin may fail or refuse to perform the installation if it detects partial traces of such a layer on the disks. We use bcache fairly frequently in our deployments, since it is part of our main recommendations for Ceph on hard drives.

For many deployments, we need to resort to a "quick erase" of the disks when we release a server, before trying to redeploy it.
As of today, quick erase does the following :
* on each disk
 * run wipefs -a -f /dev/<disk> (will just wipe the GPT partition table)
 * write 2MB of data at the beginning of the disk
 * write 2MB of data at the end of the disk

This is clearly insufficient because the "wipefs" tool does not work recursively (see [3]): any filesystem signature on a partition, for example /dev/sda1 or /dev/sda2, is still present and may be detected when Curtin recreates the same GPT partition layout at exactly the same position.
Worse, only the beginning and end of the disk are currently cleaned, so any partition in the middle of the disk may cause problems later on.
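
To illustrate the point above, here is a minimal simulation, not MAAS code. It models a disk as an in-memory byte array and only reproduces the "zero the first and last 2MB" step. The 16 MiB disk size, the partition offsets, the `FAKE-FS-MAGIC` signature, and the reading of "2MB" as 2 MiB are all assumptions made for the example.

```python
MIB = 1024 * 1024
WIPE_SPAN = 2 * MIB  # assumption: "2MB" in the report is read as 2 MiB


def quick_erase(disk: bytearray, span: int = WIPE_SPAN) -> None:
    """Zero the first and last `span` bytes of the simulated disk."""
    span = min(span, len(disk))
    disk[:span] = bytes(span)
    disk[len(disk) - span:] = bytes(span)


def demo() -> dict:
    """Return, per simulated partition, whether its signature survived."""
    disk = bytearray(16 * MIB)   # hypothetical 16 MiB disk
    magic = b"FAKE-FS-MAGIC"     # hypothetical filesystem signature
    # Hypothetical partition start offsets: one near the start of the
    # disk, one in the middle.
    offsets = {"part1": 1 * MIB, "part2": 8 * MIB}
    for start in offsets.values():
        disk[start:start + len(magic)] = magic

    quick_erase(disk)

    return {
        name: bytes(disk[start:start + len(magic)]) == magic
        for name, start in offsets.items()
    }


if __name__ == "__main__":
    print(demo())  # {'part1': False, 'part2': True}
```

In this sketch the signature that happens to sit inside the first 2 MiB is erased, while the one in the middle of the disk survives and would be found again once the same partition layout is recreated.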

I must add that "quick erase" is an important feature, used in specific cases we encounter frequently:
* servers with many hard drives: secure erase is not available, and zeroing the drives on a server with around 12 disks can take more than a week to complete!
* disks behind a hardware RAID controller: they don't expose the secure erase capability, which also leads to zeroing the drives, even if they are SSDs.

#
# Current workaround :
#
* use a custom commissioning script to perform the cleaning manually instead of using the MAAS quick erase feature
* manually recommission the servers with the script before any deployment

#
# Proposed solution :
#
Have quick erase do the following:
* Stop any bcache detected by the kernel
* Stop any LVM detected by the kernel
* Clean up any filesystem on top of any mdadm array
* when cleaning disks individually
 * first clean up each partition with wipefs to remove any filesystem signature
 * reuse the current behaviour:
   * run a final wipefs on the disk to remove the GPT partition table
   * wipe the 2MB at the beginning and end of the disk
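
The ordering proposed above can be sketched as a dry-run plan builder. This is an illustration under stated assumptions, not the actual patch: the function name `quick_erase_plan` and its parameters are hypothetical, device discovery is left to the caller, and stopping bcache by writing `1` to `/sys/fs/bcache/<uuid>/stop`, deactivating LVM with `vgchange -an`, and stopping arrays with `mdadm --stop` are the usual mechanisms for those layers rather than something this report specifies. The function only returns commands and never runs them.

```python
from typing import Dict, List


def quick_erase_plan(
    disks: Dict[str, List[str]],
    bcache_sets: List[str],
    volume_groups: List[str],
    md_arrays: List[str],
) -> List[List[str]]:
    """Build the ordered list of commands; nothing is executed here.

    disks maps a disk name to its partition names, for example
    {"vda": ["vda1", "vda2"]}. All arguments are supplied by the caller.
    """
    plan: List[List[str]] = []

    # 1. Stop any bcache cache set registered with the kernel.
    for uuid in bcache_sets:
        plan.append(["sh", "-c", f"echo 1 > /sys/fs/bcache/{uuid}/stop"])

    # 2. Deactivate any LVM volume group.
    for vg in volume_groups:
        plan.append(["vgchange", "-an", vg])

    # 3. Wipe the filesystem on top of each md array, then stop the array.
    for md in md_arrays:
        plan.append(["wipefs", "-a", "-f", f"/dev/{md}"])
        plan.append(["mdadm", "--stop", f"/dev/{md}"])

    # 4. Per disk: wipe every partition first, then the disk itself.
    for disk, partitions in disks.items():
        for part in partitions:
            plan.append(["wipefs", "-a", "-f", f"/dev/{part}"])
        plan.append(["wipefs", "-a", "-f", f"/dev/{disk}"])
        # The existing 2MB zeroing at both ends of the disk would follow
        # here; it is omitted from this sketch.

    return plan


if __name__ == "__main__":
    for cmd in quick_erase_plan(
        {"vda": ["vda1", "vda2"]}, ["<cset-uuid>"], ["vg0"], ["md126"]
    ):
        print(" ".join(cmd))
```

Wiping partitions before the whole disk matters because the partition device nodes are only reachable while the partition table still exists.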

[1] https://bugs.launchpad.net/maas/+bug/2054672
[2] https://bugs.launchpad.net/maas/+bug/2016351
[3] https://github.com/util-linux/util-linux/issues/1682

Bill Wear (billwear)
Changed in maas:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 3.6.0
DUFOUR Olivier (odufourc) wrote :

I've made a patch and tested it in my lab, where it successfully cleans up machines with bcache, mdadm, LVM, or a combination of them on top of multiple disks. The previous behaviour left a lot of leftovers on the drives. The patched version still runs extremely quickly (it takes a few seconds to complete on a server in my lab).

Example output from the cleaning script:
vda, vdb, vdc to be wiped.
/sys/fs/bcache/3427372a-636a-408c-b4cb-f65c185e4022 : bcache detected
stopping bcache in /sys/fs/bcache/3427372a-636a-408c-b4cb-f65c185e4022
cleaning filesystem above raid md126
raid md126: filesystem successfully quickly wiped.
mdadm: stopped md126
raid md126: successfully deactivated.
cleaning filesystem above raid md127
raid md127: filesystem successfully quickly wiped.
mdadm: stopped md127
raid md127: successfully deactivated.
vda: starting quick wipe.
vda1: partition was wiped successfully
vda2: partition was wiped successfully
vda3: partition was wiped successfully
vda: successfully quickly wiped.
vdb: starting quick wipe.
vdb1: partition was wiped successfully
vdb2: partition was wiped successfully
vdb3: partition was wiped successfully
vdb: successfully quickly wiped.
vdc: starting quick wipe.
vdc1: partition was wiped successfully
vdc2: partition was wiped successfully
vdc: successfully quickly wiped.
All disks have been successfully wiped.
