mdadm RAID10 arrays cannot be rebuilt, will not use available spare drives
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Linux | Fix Released | Unknown | | |
| Debian | Fix Released | Unknown | | |
| linux (Ubuntu) | Fix Released | Medium | Stefan Bader | |
| Intrepid | Fix Released | Undecided | Unassigned | |
| Jaunty | Fix Released | Medium | Stefan Bader | |
| mdadm (Ubuntu) | Invalid | Undecided | Unassigned | |
| Intrepid | Invalid | Undecided | Unassigned | |
| Jaunty | Invalid | Undecided | Unassigned | |
Bug Description
Binary package hint: mdadm
Rebuilding a degraded RAID1 or RAID5 array is no problem; rebuilding a degraded RAID10 array fails.
Here is the basic problem: one drive fails, leaving the RAID10 array clean/degraded. I add a hot spare. The drive shows up as a hot spare, yet the auto-rebuild does not start. This only happens with RAID10 arrays. I tried removing the spare drive, deleting its superblock, and re-adding it; it still fails. Doing the same thing with RAID1/RAID5, the auto-rebuild starts on its own.
This is using Ubuntu 8.10 and mdadm 2.6.7-3ubuntu7.
Here is an example:
bexamous@
mdadm: hot removed /dev/sdb4
bexamous@
bexamous@
mdadm: added /dev/sdb4
bexamous@
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid0 sdb2[0] sdd2[2] sdc2[1]
733767168 blocks 512k chunks
md3 : active raid5 sdb3[0] sdc3[1] sdd3[6] sdh3[5] sdg3[4] sdf3[3] sde3[2]
4300824576 blocks level 5, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
md4 : active raid10 sdb4[7](S) sdc4[1] sdh4[6] sdg4[5] sdf4[4] sde4[3] sdd4[2]
53781280 blocks 32K chunks 2 near-copies [7/6] [_UUUUUU]
unused devices: <none>
bexamous@
Here is the array's --detail output:
/dev/md4:
Version : 00.90
Creation Time : Mon Oct 13 02:01:54 2008
Raid Level : raid10
Array Size : 53781280 (51.29 GiB 55.07 GB)
Used Dev Size : 15366080 (14.65 GiB 15.73 GB)
Raid Devices : 7
Total Devices : 7
Preferred Minor : 4
Persistence : Superblock is persistent
Update Time : Thu Oct 16 11:52:39 2008
State : clean, degraded
Active Devices : 6
Working Devices : 7
Failed Devices : 0
Spare Devices : 1
Layout : near=2, far=1
Chunk Size : 32K
UUID : c16f9559:
Events : 0.32
Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 36 1 active sync /dev/sdc4
2 8 52 2 active sync /dev/sdd4
3 8 68 3 active sync /dev/sde4
4 8 84 4 active sync /dev/sdf4
5 8 100 5 active sync /dev/sdg4
6 8 116 6 active sync /dev/sdh4
7 8 20 - spare /dev/sdb4
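A degraded array like the one above can be spotted automatically by scanning /proc/mdstat for a `_` in the status bitmap (e.g. `[_UUUUUU]`). The following is a small hypothetical helper, not part of mdadm itself; the function name and sample-file convention are mine:

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose mdstat status
# bitmap (e.g. [UU_U]) shows a missing member. Takes an optional path to an
# mdstat-format file, defaulting to the live /proc/mdstat.
check_degraded() {
    awk '
        /^md/ { dev = $1 }               # remember the current array name
        match($0, /\[[U_]+\]/) {         # the per-member bitmap, e.g. [UU_U]
            s = substr($0, RSTART, RLENGTH)
            if (index(s, "_")) print dev # "_" means a missing/failed member
        }
    ' "${1:-/proc/mdstat}"
}

# Usage (Linux only):
#   check_degraded            # inspect the running system
#   check_degraded saved.txt  # inspect a saved copy of /proc/mdstat
```

Against the mdstat output above, this would report only `md4`, since `md2` and `md3` show full bitmaps.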
Here is another example: the rebuild fails with RAID10, then succeeds when the same devices are rebuilt as RAID1:
root@nine:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md9 : active raid10 loop4[3] loop3[2] loop2[1] loop1[0]
40832 blocks 64K chunks 2 near-copies [4/4] [UUUU]
unused devices: <none>
root@nine:~# mdadm /dev/md9 -f /dev/loop3
mdadm: set /dev/loop3 faulty in /dev/md9
root@nine:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md9 : active raid10 loop4[3] loop3[4](F) loop2[1] loop1[0]
40832 blocks 64K chunks 2 near-copies [4/3] [UU_U]
root@nine:~# mdadm /dev/md9 --remove /dev/loop3
mdadm: hot removed /dev/loop3
root@nine:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md9 : active raid10 loop4[3] loop2[1] loop1[0]
40832 blocks 64K chunks 2 near-copies [4/3] [UU_U]
root@nine:~# mdadm --zero-superblock /dev/loop3
root@nine:~# mdadm /dev/md9 --add /dev/loop3
mdadm: added /dev/loop3
root@nine:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md9 : active raid10 loop3[4](S) loop4[3] loop2[1] loop1[0]
40832 blocks 64K chunks 2 near-copies [4/3] [UU_U]
root@nine:~#
root@nine:~# mdadm -S /dev/md9
mdadm: stopped /dev/md9
root@nine:~# mdadm --zero-superblock /dev/loop1
root@nine:~# mdadm --zero-superblock /dev/loop2
root@nine:~# mdadm --zero-superblock /dev/loop3
root@nine:~# mdadm --zero-superblock /dev/loop4
root@nine:~# mdadm -C --level=raid1 -n 4 /dev/md9 /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4
mdadm: array /dev/md9 started.
root@nine:~# cat /proc/mdstat
md9 : active raid1 loop4[3] loop3[2] loop2[1] loop1[0]
20416 blocks [4/4] [UUUU]
root@nine:~# mdadm /dev/md9 -f /dev/loop3
mdadm: set /dev/loop3 faulty in /dev/md9
root@nine:~# mdadm /dev/md9 --remove /dev/loop3
mdadm: hot removed /dev/loop3
root@nine:~# mdadm /dev/md9 --zero-superblock /dev/loop3
root@nine:~# mdadm /dev/md9 --add /dev/loop3
mdadm: added /dev/loop3
root@nine:~# cat /proc/mdstat
md9 : active raid1 loop3[2] loop4[3] loop2[1] loop1[0]
20416 blocks [4/4] [UUUU]
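The loop-device reproduction above can be condensed into one script. This is a sketch, not a supported tool: the backing-file paths, loop-device numbers, and array size are illustrative, and because it creates /dev/md9 and wipes superblocks it is destructive, so it is gated behind an explicit MD_REPRO=yes opt-in and requires root.

```shell
#!/bin/sh
# Sketch of the RAID10 rebuild reproduction from this report.
# Destructive (creates /dev/md9, zeroes superblocks); opt in explicitly.
run_repro() {
    set -e
    command -v mdadm >/dev/null

    # Back four loop devices with small scratch files (paths illustrative).
    for i in 1 2 3 4; do
        dd if=/dev/zero of="/tmp/md-repro$i" bs=1k count=20480 2>/dev/null
        losetup "/dev/loop$i" "/tmp/md-repro$i"
    done

    # Create a 4-member near-2 RAID10, then fail, remove, wipe and
    # re-add one member, mirroring the transcript above.
    mdadm -C /dev/md9 --level=raid10 --layout=n2 --raid-devices=4 \
        /dev/loop1 /dev/loop2 /dev/loop3 /dev/loop4
    mdadm /dev/md9 --fail /dev/loop3
    mdadm /dev/md9 --remove /dev/loop3
    mdadm --zero-superblock /dev/loop3
    mdadm /dev/md9 --add /dev/loop3

    # On an affected kernel the re-added device stays "(S)" (spare) and the
    # array remains [4/3]; on a fixed kernel recovery starts immediately.
    cat /proc/mdstat
}

if [ "${MD_REPRO:-no}" = yes ]; then
    run_repro
else
    echo "set MD_REPRO=yes to run (destructive; needs root and mdadm)"
fi
```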
Changed in linux:
  assignee: nobody → stefan-bader-canonical
  importance: Undecided → Medium
  status: New → In Progress
Changed in linux:
  status: Unknown → Fix Released
Changed in mdadm:
  status: New → Invalid
Changed in linux:
  milestone: none → intrepid-updates
  milestone: intrepid-updates → none
Changed in debian:
  status: Unknown → Fix Released
I'm also experiencing this issue on intrepid/amd64, and it appears to have been noticed upstream as well (Debian Bug #495580).
This is a pretty severe bug for anyone using RAID10, as it means that arrays can't be rebuilt and recovered after a drive failure.