mdadm + dm-raid: overrides previous devices due to good homehost
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
mdadm (Ubuntu) | Invalid | Undecided | Unassigned |
Bug Description
Binary package hint: mdadm
System
------
OS: Ubuntu 7.04 Feisty Fawn
Kernel: 2.6.20-16-server (latest update as of bug report)
Software:
- ii dmsetup 1.02.08-1ubuntu10 The Linux Kernel Device Mapper userspace lib
- ii mdadm 2.5.6-7ubuntu5 tool to administer Linux MD arrays (software
- ii cryptsetup 1.0.4+svn26-
- ii lvm-common 1.5.20ubuntu12 The Logical Volume Manager for Linux (common
- ii lvm2 2.02.06-2ubuntu9 The Linux Logical Volume Manager
Hardware Configuration:
- Motherboard: ASUS M2N4-SLI ACPI BIOS Revision 0301 (output from dmidecode)
- Storage Controllers:
- 00:06.0 IDE interface: nVidia Corporation CK804 IDE (output from lspci -vv) [ONBOARD CONTROLLER = libata/ide_disk]
- 2 x IDE interfaces (1 drive online)
- 00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (output from lspci -vv) [ONBOARD CONTROLLER = sata_nv]
- 4 x SATA II interfaces (4 drives online)
- 04:00.0 SCSI storage controller: Triones Technologies, Inc. Unknown device 2300 (output from lspci -vv) [HPT ROCKETRAID 2310 PCI RAID CONTROLLER = rr2310_00]
- 4 x SATA II interfaces (1 drive online)
Problem
-------
1. Default OS installed to the IDE drive (root filesystem and swap, using the default Ubuntu configuration including LVM2); encountered no problems.
2. apt-get update && apt-get upgrade (including kernel). [reboot]
3. Downloaded, compiled and installed module for HPT RR2310 drivers from: http://
4. Created a software RAID5 set as per the general guidelines at: http://
Software RAID-5 (mdadm, 4 disks) => device-mapper crypto pseudo-device (dm-crypt/LUKS) => Logical Volume Manager (LVM2) => Extended 3 filesystem (ext3, i.e. ext2 + journaling), to be mounted at /crypto. [reboot] (A command sketch for steps 4-7 follows this list.)
5. Tested working configuration from boot-up. (md1 device active/
6. Used mdadm to grow array to 5 devices, and allowed time to rebuild. Monitored /proc/mdstat.
7. Made additional size available to system by supplying grow/resizing options to cryptsetup, LVM2 and resize2fs.
8. Mounted the md1 device, supplied the cryptsetup LUKS passphrase; everything 100% functional for 12 days.
9. Rebooted server. md1 device not active. Rebooted again. Same.
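For reference, a minimal command sketch of steps 4-7 above (reconstructed from the description; the device names, the crypto mapping name 'md1_crypt', the volume group 'vg_crypto', the logical volume 'lv_crypto' and the sizes are my assumptions, not values taken from this report):

# Step 4: software RAID-5 => dm-crypt (LUKS) => LVM2 => ext3, mounted at /crypto
mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
cryptsetup luksFormat /dev/md1
cryptsetup luksOpen /dev/md1 md1_crypt
pvcreate /dev/mapper/md1_crypt
vgcreate vg_crypto /dev/mapper/md1_crypt
lvcreate -n lv_crypto -L 1100G vg_crypto          # hypothetical size
mkfs.ext3 /dev/vg_crypto/lv_crypto
mkdir -p /crypto && mount /dev/vg_crypto/lv_crypto /crypto
# Steps 6-7: grow the array to 5 devices, then propagate the new size up the stack
mdadm --add /dev/md1 /dev/sde1
mdadm --grow /dev/md1 --raid-devices=5            # reshape; monitor progress in /proc/mdstat
cryptsetup resize md1_crypt
pvresize /dev/mapper/md1_crypt
lvextend -L +370G /dev/vg_crypto/lv_crypto        # hypothetical size
resize2fs /dev/vg_crypto/lv_crypto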
Troubleshooting
---------------
- dm-raid is used for both sata_nv and rr2310_00 controller support.
- I am inclined to believe this is caused by some sort of ordering issue (as soon as I brought 'disk5' online it worked, but after rebooting it didn't come back up automatically); see the assembly sketch after this list.
- It is worth reiterating that the RAID set spans two (2) dm-raid controllers.
- If I bring the server up with all disks plugged in (just as it was before the problem started occurring), including 'disk5' (/dev/sde1, which sits on the 2nd controller, sata_nv), the software RAID array (/dev/md1) fails to become active after boot-up and /dev/sde1 is the only device visible in /proc/mdstat.
- If I bring the server up without 'disk5' online, the software RAID array (/dev/md1) fails to become active after boot-up and /dev/sda1 to /dev/sdd1 are visible in /proc/mdstat (not /dev/sde1).
- If I hot-plug 'disk5' after bringing the server up without it online, then all the other drives previously visible in /proc/mdstat are removed from the array and replaced by the single sde1 ('disk5') device.
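One way to test the ordering hypothesis (a sketch of my own, not something shown in this report; it assumes the five members enumerate as /dev/sda1 through /dev/sde1) is to stop the half-assembled array and re-assemble it with an explicit device list instead of relying on the DEVICE-partitions scan:

mdadm --stop /dev/md1
mdadm --assemble /dev/md1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
cat /proc/mdstat    # all five members should appear if ordering alone is the problem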
Logs
----
The logs show the following when booting up without 'disk5':
Aug 29 07:18:14 FeistyFawn kernel: [ 38.907396] md: md1 stopped.
Aug 29 07:18:14 FeistyFawn kernel: [ 38.967943] md: bind<sdb1>
Aug 29 07:18:14 FeistyFawn kernel: [ 38.968030] md: bind<sdc1>
Aug 29 07:18:14 FeistyFawn kernel: [ 38.968107] md: bind<sdd1>
Aug 29 07:18:14 FeistyFawn kernel: [ 38.968183] md: bind<sda1>
Aug 29 07:18:14 FeistyFawn kernel: [ 38.989506] md: md1 stopped.
Aug 29 07:18:14 FeistyFawn kernel: [ 38.989515] md: unbind<sda1>
Aug 29 07:18:14 FeistyFawn kernel: [ 38.989523] md: export_rdev(sda1)
Aug 29 07:18:14 FeistyFawn kernel: [ 38.989541] md: unbind<sdd1>
Aug 29 07:18:14 FeistyFawn kernel: [ 38.989544] md: export_rdev(sdd1)
Aug 29 07:18:14 FeistyFawn kernel: [ 38.989552] md: unbind<sdc1>
Aug 29 07:18:14 FeistyFawn kernel: [ 38.989556] md: export_rdev(sdc1)
Aug 29 07:18:14 FeistyFawn kernel: [ 38.989563] md: unbind<sdb1>
Aug 29 07:18:14 FeistyFawn kernel: [ 38.989567] md: export_rdev(sdb1)
Aug 29 07:18:14 FeistyFawn kernel: [ 38.997016] md: bind<sdb1>
Aug 29 07:18:14 FeistyFawn kernel: [ 38.997103] md: bind<sdc1>
Aug 29 07:18:14 FeistyFawn kernel: [ 38.997179] md: bind<sdd1>
Aug 29 07:18:14 FeistyFawn kernel: [ 38.997255] md: bind<sda1>
root@
Personalities : [raid6] [raid5] [raid4]
md1 : inactive sda1[0](S) sdd1[3](S) sdc1[2](S) sdb1[1](S)
1562352896 blocks
unused devices: <none>
Then, after hot-plugging 'disk5' post-boot, the kernel generates constant "md: array md1 already has disks!" messages:
Aug 29 07:19:06 FeistyFawn kernel: [ 436.437594] md: array md1 already has disks!
Aug 29 07:19:06 FeistyFawn kernel: [ 436.440356] md: array md1 already has disks!
Aug 29 07:19:06 FeistyFawn kernel: [ 436.443079] md: array md1 already has disks!
Aug 29 07:19:06 FeistyFawn kernel: [ 436.445802] md: array md1 already has disks!
Aug 29 07:19:06 FeistyFawn kernel: [ 436.455818] md: array md1 already has disks!
Until I shut down mdadm:
root@FeistyFawn
* Stopping MD array md1...
...done.
root@
Aug 29 07:20:02 FeistyFawn kernel: [ 492.298524] md: md1 stopped.
Aug 29 07:20:02 FeistyFawn kernel: [ 492.298533] md: unbind<sda1>
Aug 29 07:20:02 FeistyFawn kernel: [ 492.298541] md: export_rdev(sda1)
Aug 29 07:20:02 FeistyFawn kernel: [ 492.298563] md: unbind<sdd1>
Aug 29 07:20:02 FeistyFawn kernel: [ 492.298566] md: export_rdev(sdd1)
Aug 29 07:20:02 FeistyFawn kernel: [ 492.298579] md: unbind<sdc1>
Aug 29 07:20:02 FeistyFawn kernel: [ 492.298584] md: export_rdev(sdc1)
Aug 29 07:20:02 FeistyFawn kernel: [ 492.298595] md: unbind<sdb1>
Aug 29 07:20:02 FeistyFawn kernel: [ 492.298599] md: export_rdev(sdb1)
Aug 29 07:20:02 FeistyFawn kernel: [ 492.314312] md: bind<sde1>
root@
Personalities : [raid6] [raid5] [raid4]
md1 : inactive sde1[4](S)
390588224 blocks
unused devices: <none>
It appears mdadm-raid is still running! Shut it down again:
root@
root@
Personalities : [raid6] [raid5] [raid4]
unused devices: <none>
Aug 29 07:27:31 FeistyFawn kernel: [ 940.500060] md: md1 stopped.
Aug 29 07:27:31 FeistyFawn kernel: [ 940.500069] md: unbind<sde1>
Aug 29 07:27:31 FeistyFawn kernel: [ 940.500078] md: export_rdev(sde1)
Now try to re-assemble the array (again):
root@
mdadm: /dev/sde1 overrides previous devices due to good homehost
mdadm: /dev/md1 assembled from 1 drive - not enough to start the array.
root@
Personalities : [raid6] [raid5] [raid4]
md1 : inactive sde1[4](S)
390588224 blocks
unused devices: <none>
Aug 29 07:30:39 FeistyFawn kernel: [ 1128.919022] md: md1 stopped.
Aug 29 07:30:39 FeistyFawn kernel: [ 1128.934855] md: bind<sde1>
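The "overrides previous devices due to good homehost" message above says that mdadm prefers /dev/sde1 over the members it had already found when deciding what belongs to md1. One experiment (my assumption only, not something attempted above, and assuming the installed mdadm supports --update=homehost for these version-0.90 superblocks) would be to re-assemble with an explicit device list while rewriting the homehost recorded in each member, so that no single member wins that comparison:

mdadm --stop /dev/md1
mdadm --assemble /dev/md1 --update=homehost /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1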
System Health
-------------
The disks themselves (including their superblocks) appear healthy:
# mdadm -E /dev/md1 /dev/sd?1
mdadm: No md superblock detected on /dev/md1.
/dev/sda1:
Magic : a92b4efc
Version : 00.90.00
UUID : e38231e0:
Creation Time : Tue Aug 7 06:55:52 2007
Raid Level : raid5
Device Size : 390588224 (372.49 GiB 399.96 GB)
Array Size : 1562352896 (1489.98 GiB 1599.85 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 1
Update Time : Tue Aug 29 22:09:57 2007
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : 72143494 - correct
Events : 0.259758
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 1 0 active sync /dev/sda1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
4 4 8 81 4 active sync /dev/.static/
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.00
UUID : e38231e0:
Creation Time : Tue Aug 7 06:55:52 2007
Raid Level : raid5
Device Size : 390588224 (372.49 GiB 399.96 GB)
Array Size : 1562352896 (1489.98 GiB 1599.85 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 1
Update Time : Tue Aug 29 22:09:57 2007
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : 721434a6 - correct
Events : 0.259758
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 17 1 active sync /dev/sdb1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
4 4 8 81 4 active sync /dev/.static/
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : e38231e0:
Creation Time : Tue Aug 7 06:55:52 2007
Raid Level : raid5
Device Size : 390588224 (372.49 GiB 399.96 GB)
Array Size : 1562352896 (1489.98 GiB 1599.85 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 1
Update Time : Tue Aug 29 22:09:57 2007
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : 721434b8 - correct
Events : 0.259758
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 33 2 active sync /dev/sdc1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
4 4 8 81 4 active sync /dev/.static/
/dev/sdd1:
Magic : a92b4efc
Version : 00.90.00
UUID : e38231e0:
Creation Time : Tue Aug 7 06:55:52 2007
Raid Level : raid5
Device Size : 390588224 (372.49 GiB 399.96 GB)
Array Size : 1562352896 (1489.98 GiB 1599.85 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 1
Update Time : Tue Aug 29 22:09:57 2007
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : 721434ca - correct
Events : 0.259758
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 49 3 active sync /dev/sdd1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
4 4 8 81 4 active sync /dev/.static/
/dev/sde1:
Magic : a92b4efc
Version : 00.90.00
UUID : e38231e0:
Creation Time : Tue Aug 7 06:55:52 2007
Raid Level : raid5
Device Size : 390588224 (372.49 GiB 399.96 GB)
Array Size : 1562352896 (1489.98 GiB 1599.85 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 1
Update Time : Tue Aug 29 22:09:57 2007
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : d3605c4f - correct
Events : 0.259758
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 4 8 81 4 active sync /dev/.static/
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 8 33 2 active sync /dev/sdc1
3 3 8 49 3 active sync /dev/sdd1
4 4 8 81 4 active sync /dev/.static/
Regards, hope this helps!
Also worth mentioning (though not explicitly stated above): this is a default configuration, including mdadm.conf:
root@FeistyFawn:~# cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#
# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions
# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes
# automatically tag new arrays as belonging to the local system
HOMEHOST <system>
# instruct the monitoring daemon where to send mail alerts
MAILADDR root
# definitions of existing MD arrays
# This file was auto-generated on Tue, 07 Aug 2007 06:51:48 +1000
# by mkconf $Id: mkconf 261 2006-11-09 13:32:35Z madduck $
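Since the auto-generated mdadm.conf above contains no ARRAY definition, one possible mitigation to experiment with (my assumption, not something from this report) is to pin the array explicitly by UUID so that assembly at boot no longer depends on the homehost comparison alone:

mdadm --examine --scan >> /etc/mdadm/mdadm.conf   # appends an ARRAY line with md1's UUID read from the superblocks
update-initramfs -u                               # rebuild the initramfs so early boot picks up the new definition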