mdadm software RAID fails to start up on reboot
| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| mdadm (Ubuntu) | New | Undecided | Unassigned | |
Bug Description
Binary package hint: mdadm
I have two RAID5 arrays set up through mdadm: md0 contains only IDE disks and md1 contains only SATA disks.
When I set up both RAIDs for the first time, the resync starts just fine, but when I restart the server the boot takes 5 minutes longer than usual and both RAIDs are gone: md0 has only 3 of its 6 disks and is therefore stopped, and md1 is non-existent.
The kern.log repeats this message some ten thousand times: "md: array md0 already has disks!"
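The exact create commands aren't recorded here; something along these lines, using the member devices that show up in the second log below, produces arrays of this shape:

mdadm --create /dev/md0 --level=5 --raid-devices=6 /dev/sd[g-j]1 /dev/sdl1 /dev/sdm1
mdadm --create /dev/md1 --level=5 --raid-devices=8 /dev/sd[a-f]1 /dev/sdn1 /dev/sdo1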
log content:
Feb 2 15:51:52 foo kernel: [ 676.666987] sd 0:0:1:0: [sdb] 398297088 512-byte hardware sectors (203928 MB)
Feb 2 15:51:52 foo kernel: [ 676.667022] sd 0:0:1:0: [sdb] Write Protect is off
Feb 2 15:51:52 foo kernel: [ 676.667027] sd 0:0:1:0: [sdb] Mode Sense: 00 3a 00 00
Feb 2 15:51:52 foo kernel: [ 676.667058] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Feb 2 15:51:52 foo kernel: [ 676.667064] sdb: sdb1
Feb 2 15:51:52 foo kernel: [ 676.754604] md: bind<sdm>
Feb 2 15:51:52 foo kernel: [ 676.754824] md: bind<sda>
Feb 2 15:51:52 foo kernel: [ 676.755173] md: bind<sdd>
Feb 2 15:51:52 foo kernel: [ 676.755403] md: bind<sdl>
Feb 2 15:51:53 foo kernel: [ 676.904290] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 676.911736] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 676.919289] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 676.926809] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 676.934386] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 676.941893] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 676.949414] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 676.956935] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 676.964457] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 676.971934] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 676.979412] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 676.986932] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 676.994599] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 677.002147] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 677.009677] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 677.017163] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 677.024679] md: array md0 already has disks!
Feb 2 15:51:53 foo kernel: [ 677.032153] md: array md0 already has disks!
When I had only set up md1, the same message was repeated, but with md1 instead.
Then I got a tip about reconfiguring mdadm, so I ran:
"dpkg-reconfigure mdadm"
and restarted. Both RAIDs then came up just fine, and they did so on three subsequent reboots as well; no problems anymore.
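As far as I can tell, the part of "dpkg-reconfigure mdadm" that matters here is that it rewrites the mdadm configuration and regenerates the initramfs, so early boot assembles the arrays from the recorded ARRAY definitions. A roughly equivalent manual sequence on Ubuntu (standard paths, not something taken from this report) would be:

mdadm --detail --scan >> /etc/mdadm/mdadm.conf   # record ARRAY lines with the arrays' UUIDs
update-initramfs -u                              # rebuild the initramfs so early boot uses them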
The kern.log now says:
Feb 2 18:15:49 foo kernel: [ 62.279977] md: bind<sdb1>
Feb 2 18:15:49 foo kernel: [ 62.280190] md: bind<sdc1>
Feb 2 18:15:49 foo kernel: [ 62.280393] md: bind<sdd1>
Feb 2 18:15:49 foo kernel: [ 62.280594] md: bind<sde1>
Feb 2 18:15:49 foo kernel: [ 62.280792] md: bind<sdf1>
Feb 2 18:15:49 foo kernel: [ 62.281018] md: bind<sdn1>
Feb 2 18:15:49 foo kernel: [ 62.281223] md: bind<sdo1>
Feb 2 18:15:49 foo kernel: [ 62.281426] md: bind<sda1>
Feb 2 18:15:49 foo kernel: [ 62.786545] raid5: device sda1 operational as raid disk 0
Feb 2 18:15:49 foo kernel: [ 62.786552] raid5: device sdn1 operational as raid disk 6
Feb 2 18:15:49 foo kernel: [ 62.786557] raid5: device sdf1 operational as raid disk 5
Feb 2 18:15:49 foo kernel: [ 62.786562] raid5: device sde1 operational as raid disk 4
Feb 2 18:15:49 foo kernel: [ 62.786566] raid5: device sdd1 operational as raid disk 3
Feb 2 18:15:49 foo kernel: [ 62.786571] raid5: device sdc1 operational as raid disk 2
Feb 2 18:15:49 foo kernel: [ 62.786576] raid5: device sdb1 operational as raid disk 1
Feb 2 18:15:49 foo kernel: [ 62.787622] raid5: allocated 8368kB for md1
Feb 2 18:15:49 foo kernel: [ 62.787628] raid5: raid level 5 set md1 active with 7 out of 8 devices, algorithm 2
Feb 2 18:15:49 foo kernel: [ 62.787677] RAID5 conf printout:
Feb 2 18:15:49 foo kernel: [ 62.787681] --- rd:8 wd:7
Feb 2 18:15:49 foo kernel: [ 62.787685] disk 0, o:1, dev:sda1
Feb 2 18:15:49 foo kernel: [ 62.787689] disk 1, o:1, dev:sdb1
Feb 2 18:15:49 foo kernel: [ 62.787693] disk 2, o:1, dev:sdc1
Feb 2 18:15:49 foo kernel: [ 62.787697] disk 3, o:1, dev:sdd1
Feb 2 18:15:49 foo kernel: [ 62.787701] disk 4, o:1, dev:sde1
Feb 2 18:15:49 foo kernel: [ 62.787705] disk 5, o:1, dev:sdf1
Feb 2 18:15:49 foo kernel: [ 62.787709] disk 6, o:1, dev:sdn1
Feb 2 18:15:49 foo kernel: [ 62.787768] RAID5 conf printout:
Feb 2 18:15:49 foo kernel: [ 62.787772] --- rd:8 wd:7
Feb 2 18:15:49 foo kernel: [ 62.787776] disk 0, o:1, dev:sda1
Feb 2 18:15:49 foo kernel: [ 62.787779] disk 1, o:1, dev:sdb1
Feb 2 18:15:49 foo kernel: [ 62.787783] disk 2, o:1, dev:sdc1
Feb 2 18:15:49 foo kernel: [ 62.787787] disk 3, o:1, dev:sdd1
Feb 2 18:15:49 foo kernel: [ 62.787790] disk 4, o:1, dev:sde1
Feb 2 18:15:49 foo kernel: [ 62.787794] disk 5, o:1, dev:sdf1
Feb 2 18:15:49 foo kernel: [ 62.787798] disk 6, o:1, dev:sdn1
Feb 2 18:15:49 foo kernel: [ 62.787801] disk 7, o:1, dev:sdo1
Feb 2 18:15:49 foo kernel: [ 62.787834] md: recovery of RAID array md1
Feb 2 18:15:49 foo kernel: [ 62.787839] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Feb 2 18:15:49 foo kernel: [ 62.787845] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Feb 2 18:15:49 foo kernel: [ 62.787858] md: using 128k window, over a total of 390708736 blocks.
Feb 2 18:15:49 foo kernel: [ 62.847931] md: bind<sdh1>
Feb 2 18:15:49 foo kernel: [ 62.848250] md: bind<sdi1>
Feb 2 18:15:49 foo kernel: [ 62.848659] md: bind<sdj1>
Feb 2 18:15:49 foo kernel: [ 62.851114] md: bind<sdl1>
Feb 2 18:15:49 foo kernel: [ 62.851379] md: bind<sdm1>
Feb 2 18:15:49 foo kernel: [ 62.852498] md: bind<sdg1>
Feb 2 18:15:49 foo kernel: [ 63.029749] raid5: device sdg1 operational as raid disk 0
Feb 2 18:15:49 foo kernel: [ 63.029756] raid5: device sdl1 operational as raid disk 4
Feb 2 18:15:49 foo kernel: [ 63.029760] raid5: device sdj1 operational as raid disk 3
Feb 2 18:15:49 foo kernel: [ 63.029765] raid5: device sdi1 operational as raid disk 2
Feb 2 18:15:49 foo kernel: [ 63.029770] raid5: device sdh1 operational as raid disk 1
Feb 2 18:15:49 foo kernel: [ 63.031587] raid5: allocated 6286kB for md0
Feb 2 18:15:49 foo kernel: [ 63.031593] raid5: raid level 5 set md0 active with 5 out of 6 devices, algorithm 2
Feb 2 18:15:49 foo kernel: [ 63.031679] RAID5 conf printout:
Feb 2 18:15:49 foo kernel: [ 63.031682] --- rd:6 wd:5
Feb 2 18:15:49 foo kernel: [ 63.031686] disk 0, o:1, dev:sdg1
Feb 2 18:15:49 foo kernel: [ 63.031690] disk 1, o:1, dev:sdh1
Feb 2 18:15:49 foo kernel: [ 63.031693] disk 2, o:1, dev:sdi1
Feb 2 18:15:49 foo kernel: [ 63.031697] disk 3, o:1, dev:sdj1
Feb 2 18:15:49 foo kernel: [ 63.031700] disk 4, o:1, dev:sdl1
Feb 2 18:15:49 foo kernel: [ 63.031752] RAID5 conf printout:
Feb 2 18:15:49 foo kernel: [ 63.031755] --- rd:6 wd:5
Feb 2 18:15:49 foo kernel: [ 63.031758] disk 0, o:1, dev:sdg1
Feb 2 18:15:49 foo kernel: [ 63.031762] disk 1, o:1, dev:sdh1
Feb 2 18:15:49 foo kernel: [ 63.031766] disk 2, o:1, dev:sdi1
Feb 2 18:15:49 foo kernel: [ 63.031769] disk 3, o:1, dev:sdj1
Feb 2 18:15:49 foo kernel: [ 63.031773] disk 4, o:1, dev:sdl1
Feb 2 18:15:49 foo kernel: [ 63.031776] disk 5, o:1, dev:sdm1
Feb 2 18:15:49 foo kernel: [ 63.031818] md: recovery of RAID array md0
Feb 2 18:15:49 foo kernel: [ 63.031823] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Feb 2 18:15:49 foo kernel: [ 63.031829] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Feb 2 18:15:49 foo kernel: [ 63.031841] md: using 128k window, over a total of 195358336 blocks.
root@foo:/var/log# uname -a
Linux foo 2.6.22-14-server #1 SMP Tue Dec 18 08:31:40 UTC 2007 i686 GNU/Linux
I just experienced this problem.
Through the "md: array md0 already has disks!" error. Through the spam, if you wait 60s (or whatever the timeout is), you will be dropped to the initramfs shell. You can't see it though through the "already has disks" error 100 times per second. But, I typed 'mdadm --stop /dev/md0' and this stopped the spam! Surprisingly, /dev/md0 existed, and was fine! (though degraded, but I expected that because one drive had failed)
To repeat, I ran 'mdadm --stop /dev/md0' and *after* I ran that, /dev/md0 still existed and was fine.
So, it appears that mdadm is trying to create /dev/md0 *twice* and obviously fails the second time (repeatedly).
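For anyone else who ends up at that (initramfs) prompt, the sequence is essentially the following; the --stop is what I actually ran, the rest is the usual follow-up and may not even be needed if the array is already up:

mdadm --stop /dev/md0      # stops the duplicate assembly attempt and the log spam
cat /proc/mdstat           # check the state; here md0 was still assembled, just degraded
mdadm --assemble --scan    # only if an array is still missing: assemble from superblocks
exit                       # leave the initramfs shell and continue booting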