Deploying a server with bcache on top of HDD and mdadm can frequently fail

Bug #2054672 reported by DUFOUR Olivier
Affects    Status         Importance  Assigned to          Milestone
MAAS       Fix Released   Critical    Alexsander de Souza
  3.3      Fix Committed  Undecided   Unassigned
  3.4      Fix Released   Undecided   Unassigned
curtin     Fix Committed  Undecided   Alexsander de Souza

Bug Description

Environment:
* MAAS 3.3 and 3.4
* Ubuntu 22.04
* Deployment / commissioning OS: 20.04 and 22.04
* Servers to deploy have slow drives such as HDDs

When deploying a server that uses bcache as the device for its rootfs, especially on top of software RAID (mdadm) and with slow drives such as hard drives, the storage configuration step of the Ubuntu installation fails quite frequently.

#
# Reproducer :
#
It is possible to recreate the slow-drive environment with libvirt using the following setup:
1) Create 6 or more VMs with (see the script "create-slow-vms.sh" for the exact commands, and the sketch after this list):
 * 3 vCPUs
 * 4 GB of RAM
 * 3 disks :
   * 1 x 10 GB fast disk, used as the bcache cache
   * 2 x 30 GB disks with limited IOPS (150 IOPS, 30 MB/s top speed)

2) Configure the following disk topology (see reproducer-storage-config.png):
 * /dev/vda --> 2 partitions
   - 1GB for md0
   - 29GB for md1
 * /dev/vdb --> 2 partitions
   - 1GB for md0
   - 29GB for md1
 * /dev/md0 --> ext4 for /boot
 * /dev/vdc (fast drive) --> bcache0 cache set
 * /dev/md1 --> bcache0 backend storage
 * /dev/bcache0 --> ext4 for /

3) Deploy Ubuntu 22.04 to all VMs
--> some of the VMs will fail with the same Curtin error

4) (Optional) Releasing the servers without erasing the drives and redeploying them right away seems to greatly increase the likelihood of the deployment failing.
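
For reference, below is a minimal sketch of the kind of commands create-slow-vms.sh runs; the attached script is the authoritative version, the VM and pool names are illustrative, and throttling via virsh blkdeviotune is just one way to obtain the 150 IOPS / 30 MB/s limits:

#!/bin/bash
# Illustrative sketch only -- see the attached create-slow-vms.sh for the real commands.
set -e

VM="slow-node1"                       # hypothetical VM name
POOL="/var/lib/libvirt/images"

# Two 30 GB disks to be throttled and one fast 10 GB disk (future bcache cache).
qemu-img create -f qcow2 "$POOL/$VM-slow1.qcow2" 30G
qemu-img create -f qcow2 "$POOL/$VM-slow2.qcow2" 30G
qemu-img create -f qcow2 "$POOL/$VM-fast.qcow2" 10G

# 3 vCPUs, 4 GB of RAM, PXE boot so MAAS can enlist/commission the machine.
virt-install --name "$VM" --vcpus 3 --memory 4096 \
  --disk path="$POOL/$VM-slow1.qcow2",bus=virtio \
  --disk path="$POOL/$VM-slow2.qcow2",bus=virtio \
  --disk path="$POOL/$VM-fast.qcow2",bus=virtio \
  --network network=default,model=virtio \
  --pxe --os-variant ubuntu22.04 --noautoconsole

# Throttle the two backing disks (vda, vdb) to emulate slow HDDs.
for dev in vda vdb; do
  virsh blkdeviotune "$VM" "$dev" \
    --total-iops-sec 150 --total-bytes-sec $((30*1024*1024)) --live --config
done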

#
# logs
#
I'm attaching some more logs to the bug report:
* quick-summary-logs.txt --> some logs from baremetal servers on the customer's hardware.
* reproducer-installation-output.txt --> full installation output from a failing deployment in my reproducer test.

# theory
At first glance, it looks like a race condition, because when reusing the same server and retrying the Ubuntu deployment, it may work fine.
It is probably triggered because the hard drives are already busy with mdadm syncing the disks, and become even slower when changes such as creating a bcache backing device are requested, at which point curtin hits the race condition and fails.
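
As a side note, the ongoing resync is easy to confirm on an affected node while curtin is running; these are standard md interfaces, shown here only as a generic pointer:

cat /proc/mdstat                       # overall resync progress and speed
cat /sys/block/md1/md/sync_action      # e.g. "resync" or "idle"
cat /sys/block/md1/md/sync_completed   # progress in sectors, e.g. "<done> / <total>"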

On a large deployment such as OpenStack, this makes the installation process cumbersome, as one or more servers may randomly fail to deploy.

Looking at the logs of the installation output from MAAS, curtin seems to fail to confirm the backend storage device.
# main differences
## working
2024-02-06T10:09:43+00:00 server-node3 cloud-init[2701]: check just created bcache /dev/md1 if it is registered, try=2
2024-02-06T10:09:43+00:00 server-node3 cloud-init[2701]: Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
2024-02-06T10:09:43+00:00 server-node3 cloud-init[2701]: TIMED udevadm_settle(): 0.018
2024-02-06T10:09:43+00:00 server-node3 cloud-init[2701]: Found bcache dev /dev/md1 at expected path /sys/class/block/md1/bcache
2024-02-06T10:09:43+00:00 server-node3 cloud-init[2701]: validating bcache backing device '/dev/md1' from sys_path '/sys/class/block/md1/bcache'
2024-02-06T10:09:43+00:00 server-node3 cloud-init[2701]: bcache device /sys/class/block/md1/bcache using bcache kname: bcache6
2024-02-06T10:09:44+00:00 server-node3 cloud-init[2701]: bcache device /sys/class/block/md1/bcache has slaves: ['md1']

## non-working
2024-02-06T10:09:52+00:00 server-node1 cloud-init[2698]: check just created bcache /dev/md1 if it is registered, try=2
2024-02-06T10:09:52+00:00 server-node1 cloud-init[2698]: Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
2024-02-06T10:09:52+00:00 server-node1 cloud-init[2698]: TIMED udevadm_settle(): 0.019
2024-02-06T10:09:52+00:00 server-node1 cloud-init[2698]: Found bcache dev /dev/md1 at expected path /sys/class/block/md1/bcache
2024-02-06T10:09:52+00:00 server-node1 cloud-init[2698]: validating bcache backing device '/dev/md1' from sys_path '/sys/class/block/md1/bcache'
2024-02-06T10:09:52+00:00 server-node1 cloud-init[2698]: bcache dev /dev/md1 at path /sys/class/block/md1/bcache successfully registered on attempt 2/60
2024-02-06T10:09:52+00:00 server-node1 cloud-init[2698]: devname '/dev/md1' had holders: []
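
For reference, the state these log lines describe can be checked by hand from sysfs on a node stuck at this step (a generic sketch, not taken from curtin itself; device names match the reproducer above):

udevadm settle                    # same settle curtin performs first
ls -d /sys/class/block/md1/bcache # present once /dev/md1 is registered as a backing device
ls /sys/block/bcache0/slaves      # lower devices of the assembled bcache device, should list md1
ls /sys/class/block/md1/holders   # should list the bcache device; empty in the failing case above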


Revision history for this message
DUFOUR Olivier (odufourc) wrote :

Subscribed ~Field High

It greatly penalises an ongoing deployment with a customer relying on bcache with hard drives.

Changed in maas:
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Alexsander de Souza (alexsander-souza)
milestone: none → 3.5.0
Changed in curtin:
assignee: nobody → Alexsander de Souza (alexsander-souza)
Changed in maas:
status: Triaged → In Progress
Changed in curtin:
status: New → Fix Committed
Changed in maas:
status: In Progress → Fix Committed
Revision history for this message
Alexsander de Souza (alexsander-souza) wrote :

Curtin updated to 23.1.1-1099-g585dd3a9-0ubuntu1~ubuntu22.04.1

Revision history for this message
DUFOUR Olivier (odufourc) wrote :

Hello

Thank you for your help so far.
I've run more tests in my lab with the daily build of Curtin (22.1-1153-gfc39d744-0ubuntu1+318~trunk~ubuntu22.04.1).

After a more in-depth analysis, I've noticed 3 different scenarios where installations with bcache can fail with Curtin:

1) If the servers are released without having their disks cleaned
--> curtin-logs-without-disk-erasing.tar
Problem: Curtin seems to fail to stop mdadm because bcache is on top of it; as a wild guess, curtin might need to stop bcache first and then mdadm in order to progress any further.

2) If the servers are released with only a quick disk erase and then redeployed
(This is a common scenario with hard drives, since the vast majority of them don't support secure erase like SSDs do, and using MAAS to fully erase hard drives can literally take multiple days to complete.)
--> curtin-logs-after-quick-disk-erase.tar
Problem: Partly related to the first issue, MAAS' quick erase method doesn't seem to be thorough enough to remove all the partition signatures, such as bcache, from the disks.

3) When using a commissioning script (manual-clean-disks.sh) to compensate for MAAS' quick erase not being thorough enough, the race condition can still happen after redeploying (the initial subject of this bug report).
I believe it might be fixed, since I cannot reproduce it in my lab, but I would need to test in the customer's environment to confirm definitively.
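
For the record, the cleaning the commissioning script performs is roughly the following; this is only a sketch with illustrative device names, and the attached manual-clean-disks.sh is the reference:

#!/bin/bash
# Illustrative sketch -- see the attached manual-clean-disks.sh for the real script.
# Assumes any md arrays or bcache devices using the disks were already stopped.
set -e

for disk in /dev/sda /dev/sdb /dev/sdc; do
  # Clear RAID superblocks on the disk and any of its partitions.
  mdadm --zero-superblock --force "$disk"* 2>/dev/null || true

  # Remove all known signatures (bcache, mdraid, filesystem, GPT/MBR).
  wipefs --all --force "$disk"

  # Belt and braces: zero the first few MiB, which cover the bcache superblock (4 KiB offset).
  dd if=/dev/zero of="$disk" bs=1M count=8 conv=fsync
done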

Revision history for this message
Alexsander de Souza (alexsander-souza) wrote :

We are discussing this greedy behaviour of bcache in https://bugs.launchpad.net/maas/+bug/1887558

Revision history for this message
DUFOUR Olivier (odufourc) wrote :

At least for issue #3, with the custom disk cleaning script, I've confirmed that the deployment is more reliable.

Do we have any idea of the timing for the first fix to be included in MAAS' snap?

Changed in maas:
milestone: 3.5.0 → 3.5.0-beta1
status: Fix Committed → Fix Released