Mdadm slow RAID6 & RAID10 resync

Bug #1940207 reported by Sergiu
This bug affects 1 person
Affects          Status   Importance   Assigned to   Milestone
mdadm (Ubuntu)   New      Undecided    Unassigned

Bug Description

I have a RAID 10 array made of 24 x Micron 9300 Pro 15.36TB drives. I initialized the array with the following command:

mdadm --create /dev/md0 --raid-devices=24 --chunk=32 --level=raid10 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1 /dev/nvme10n1 /dev/nvme11n1 /dev/nvme12n1 /dev/nvme13n1 /dev/nvme14n1 /dev/nvme15n1 /dev/nvme16n1 /dev/nvme17n1 /dev/nvme18n1 /dev/nvme19n1 /dev/nvme20n1 /dev/nvme21n1 /dev/nvme22n1 /dev/nvme23n1

I found that the array resyncs at a total of 1.3GB/s, and there does not appear to be any way to speed it up. If the same drives are built as 12 RAID1 arrays combined in a RAID0 configuration, each RAID1 group resyncs at ~3.2GB/s, for a total throughput of about 38.4GB/s, which is almost 30 times faster. However, in this configuration random IOPS performance appears to be far less stable than in standard RAID10, which negates the resync advantage. Since mdadm offers a predefined RAID10 level, it should offer the same resync behavior as RAID 1+0, but it does not, due to the lack of parallelization in a plain resync. The dev.raid.speed_limit_max parameter was raised to 5000000 during this exercise.
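For anyone trying to reproduce the comparison, the nested RAID 1+0 layout described above could be built roughly as follows. This is only a sketch: the report does not say how the drives were paired or how the arrays were named, so the adjacent-drive pairing, the /dev/md1 to /dev/md12 mirror names, the /dev/md100 stripe name, and the helper nested_raid10_cmds are all assumptions, and chunk size is left at the mdadm default. The script is a dry run that only prints the commands.

```shell
#!/bin/sh
# Dry run only: every command is printed, nothing is executed.
# Assumptions (not from the report): adjacent drives are paired,
# mirrors are named /dev/md1../dev/md12, the stripe is /dev/md100.
nested_raid10_cmds() {
    pairs=$1
    members=""
    p=0
    while [ "$p" -lt "$pairs" ]; do
        a=$((2 * p))          # first drive of the mirror
        b=$((2 * p + 1))      # second drive of the mirror
        md="/dev/md$((p + 1))"
        echo "mdadm --create $md --level=raid1 --raid-devices=2 /dev/nvme${a}n1 /dev/nvme${b}n1"
        members="$members $md"
        p=$((p + 1))
    done
    # Stripe across all mirrors.
    echo "mdadm --create /dev/md100 --level=raid0 --raid-devices=$pairs$members"
}

# Raise the resync ceiling to the value used in the report.
echo "sysctl -w dev.raid.speed_limit_max=5000000"
nested_raid10_cmds 12
```

With this layout each of the 12 mirrors resyncs independently, which is where the 12 x ~3.2GB/s = ~38.4GB/s aggregate figure comes from.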

The same resync speed is also observed with RAID6. There, however, setting group_thread_cnt to 12 increases the resync speed from 1.3GB/s to about 10.5GB/s, which is still far from the theoretical speed of 72GB/s. Note that increasing group_thread_cnt above 12 does not scale further; on the contrary, throughput starts decreasing.
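The group_thread_cnt scaling can be checked with a small sweep along these lines. It is a sketch, not the exact procedure used for the numbers above: sweep_group_threads is a hypothetical helper, the settle time is an arbitrary choice, and it assumes the array exposes group_thread_cnt and sync_speed under /sys/block/md0/md (sync_speed reads "none" when no resync is running).

```shell
#!/bin/sh
# Sketch: try several group_thread_cnt values and print the resync rate
# the kernel reports for each one.
# Usage: sweep_group_threads <md sysfs dir> <settle seconds> <count>...
sweep_group_threads() {
    md_dir=$1
    settle_secs=$2
    shift 2
    for n in "$@"; do
        # Apply the thread count (needs root on a real array).
        echo "$n" > "$md_dir/group_thread_cnt" || return 1
        # Give the reported rate time to reflect the new setting.
        sleep "$settle_secs"
        echo "$n $(cat "$md_dir/sync_speed")"
    done
}

# Example, as root, while a RAID6 resync is running:
# sweep_group_threads /sys/block/md0/md 60 0 4 8 12 16
```

Each output line is a thread count followed by the rate read from sync_speed, so the point where throughput stops improving is visible directly.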

Note that the server has more than enough CPU power (2 x 64 cores) and all SSDs are directly attached NVMe, so there is no bus sharing of any kind. This was confirmed by running fio on all devices concurrently and observing the maximum theoretical speed.
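A concurrent read test of this kind might look like the following. The exact fio parameters used are not given in this report, so the job options here (1M sequential reads, libaio, queue depth 32, 30 second runtime) and the helper fio_all_devices_cmd are illustrative assumptions. The script only prints the command line, and --readonly is included so that fio refuses to write to the drives if the printed command is run.

```shell
#!/bin/sh
# Dry run: build one fio invocation with one read job per NVMe device
# and print it. Job parameters are illustrative, not from the report.
fio_all_devices_cmd() {
    count=$1
    cmd="fio --readonly"
    i=0
    while [ "$i" -lt "$count" ]; do
        # Options after each --name apply to that job only, so they are
        # repeated for every device.
        cmd="$cmd --name=nvme$i --filename=/dev/nvme${i}n1 --rw=read --bs=1M --direct=1 --ioengine=libaio --iodepth=32 --runtime=30 --time_based"
        i=$((i + 1))
    done
    echo "$cmd"
}

fio_all_devices_cmd 24
```

All jobs in a single fio invocation start together, so the summed bandwidth shows whether the drives can reach their combined rated speed at the same time.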

Sergiu (sergiuhlihor) wrote :

Forgot to mention the most important details: this was tested on Ubuntu Server 20.04 with kernel 5.4 (standard install) and then also with 5.11. The behavior was the same on both.
