boot hangs with missing RAID device

Bug #123888 reported by Qumefox on 2007-07-04
This bug report is a duplicate of: Bug #120375: cannot boot raid1 with only one disk.
Affects Status Importance Assigned to Milestone
mdadm (Ubuntu)
Undecided
Unassigned

Bug Description

Binary package hint: mdadm

To be honest, I'm not sure whether this is an mdadm problem or an initramfs problem.

Here's what I have. This particular box has three 500 GB SATA2 drives partitioned as follows:

sda : 190 MB swap, rest of disk RAID5 (/)
sdb : 30 MB RAID1 (/boot), 160 MB swap, rest of disk RAID5 (/)
sdc : 30 MB RAID1 (/boot), 160 MB swap, rest of disk RAID5 (/)
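
As a sanity check on this layout, here is a rough Python sketch of which arrays should still be able to run (degraded) with any one disk unplugged. It only encodes the usual minimum-member rules (RAID1 can run on a single member, RAID5 on all but one). The dictionary and function names are made up for illustration, and md0/md1 are the array names as they appear in /proc/mdstat on this box.

```python
# Which disks contribute a member to each array, per the layout above.
ARRAYS = {
    "md0": {"level": "raid1", "disks": {"sdb", "sdc"}},         # /boot
    "md1": {"level": "raid5", "disks": {"sda", "sdb", "sdc"}},  # /
}

def min_members(level, total):
    """Fewest members with which an array of this level can still run."""
    if level == "raid1":
        return 1
    if level == "raid5":
        return total - 1
    raise ValueError(f"unhandled RAID level: {level}")

def survives(array, missing_disk):
    """True if the array can still run (possibly degraded) without missing_disk."""
    remaining = array["disks"] - {missing_disk}
    return len(remaining) >= min_members(array["level"], len(array["disks"]))

# Report, for each disk that could be unplugged, whether each array can run.
for disk in ("sda", "sdb", "sdc"):
    print(disk, {name: survives(a, disk) for name, a in ARRAYS.items()})
```

By these rules, losing any single disk should leave both arrays startable, which is why a hang at boot is not what I would expect.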

The box is also running Feisty, with all the updates available as of today (7-3-07).

I probably wouldn't have ever noticed this behavior if I wasn't paranoid about losing data.

The box runs fine as long as all devices are present. However, before I started using it, I wanted to simulate a flat-out dead drive, so I shut down, unhooked one of the three drives from the controller, and restarted.

What I get then is the boot hanging for about 3 minutes, after which I get dumped to the initramfs prompt with lots of mount failures. When I checked /proc/mdstat, it showed both md0 (RAID1) and md1 (RAID5) as inactive and missing their respective devices on the drive I'd unhooked.
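
For anyone wanting to check the same thing on a running system, here is a minimal Python sketch that pulls the array state and member list out of /proc/mdstat-style text. The sample text is invented to resemble the situation described (it is not a capture from this box, and the partition numbers are guesses), and the function name is hypothetical.

```python
import re

# Invented sample resembling /proc/mdstat with one disk unplugged.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : inactive sda2[0] sdc3[2]
      975000000 blocks

md0 : inactive sdc1[1]
      30000 blocks

unused devices: <none>
"""

# Matches lines such as "md0 : active raid1 sdb1[0] sdc1[1]".
ARRAY_LINE = re.compile(r"^(md\d+) : (active|inactive)\s*(.*)$")

def parse_mdstat(text):
    """Return {array_name: {"state": ..., "members": [...]}} from mdstat text."""
    arrays = {}
    for line in text.splitlines():
        m = ARRAY_LINE.match(line)
        if not m:
            continue
        name, state, rest = m.groups()
        # Member devices look like "sdb1[0]"; keep only the device name.
        members = re.findall(r"(\w+)\[\d+\]", rest)
        arrays[name] = {"state": state, "members": members}
    return arrays

for name, info in sorted(parse_mdstat(SAMPLE_MDSTAT).items()):
    print(name, info["state"], info["members"])
```

On a real system the input would come from reading /proc/mdstat instead of the sample string.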

Here's what I've tried so far to get around this.
In /usr/share/initramfs-tools/scripts/local-top/mdadm, in the line ${MD_DEGRADED_ARGS:= --no-degraded]", I got rid of the --no-degraded, since, from my understanding of the script, it would always get passed and would stop the arrays from being started if there were failed devices.

After I rebuilt the initramfs and rebooted, the box would boot normally whether or not a drive was missing.
However, this caused a different problem. The RAID5 array would always be missing a device on boot and start degraded, even when all drives were present. I could manually add the partition back to the array, and it would resync, but I certainly don't want to have to do that on every boot.

I've also read the bug about the race condition, where the arrays are assembled before all devices have been detected. So I tried adding a 'sleep 10' after the line log_begin_msg "Mounting root file system..." in /usr/share/initramfs-tools/init, but it didn't have any effect other than the machine taking 10 seconds longer to boot. The RAID5 array still came up missing a device.
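
An alternative to a fixed sleep would be to poll for the expected member devices and give up after a timeout. The sketch below only illustrates that logic in Python. The initramfs itself runs shell scripts, so this is not a drop-in fix, and I have no evidence it would cure the problem here. The function name and default values are assumptions.

```python
import os
import time

def wait_for_devices(paths, timeout=30.0, interval=0.5):
    """Poll until every path exists or the timeout elapses.

    Returns the list of paths that are still missing (empty on success).
    """
    deadline = time.monotonic() + timeout
    missing = [p for p in paths if not os.path.exists(p)]
    while missing and time.monotonic() < deadline:
        time.sleep(interval)
        # Re-check only the paths that had not shown up yet.
        missing = [p for p in missing if not os.path.exists(p)]
    return missing

# Example call (device paths are placeholders, not taken from this box):
#   still_missing = wait_for_devices(["/dev/sda2", "/dev/sdb3", "/dev/sdc3"])
#   An empty list means all members appeared before the timeout.
```

The point of returning the missing paths, rather than just a boolean, is that the caller can then decide whether enough members are present to start the array degraded.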

I've also verified that all the RAID5 partitions have the same UUID.
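
For reference, a check like that can be scripted by comparing the UUID line from `mdadm --examine` on each member. This is a small Python sketch that works on the text output only. The sample outputs are invented (placeholder UUIDs and partition names), and the function names are hypothetical.

```python
import re

# Matches the "UUID : ..." line (or "Array UUID : ..." with newer metadata).
UUID_LINE = re.compile(r"^\s*(?:Array )?UUID : (\S+)", re.MULTILINE)

def array_uuid(examine_output):
    """Extract the array UUID from `mdadm --examine` text, or None if absent."""
    m = UUID_LINE.search(examine_output)
    return m.group(1) if m else None

def uuids_match(outputs):
    """outputs maps device -> examine text; True if all report the same UUID."""
    uuids = {array_uuid(text) for text in outputs.values()}
    return len(uuids) == 1 and None not in uuids

# Invented sample text standing in for real `mdadm --examine` output.
SAMPLE = {
    dev: f"{dev}:\n          Magic : a92b4efc\n"
         "           UUID : 11111111:22222222:33333333:44444444\n"
    for dev in ("/dev/sda2", "/dev/sdb3", "/dev/sdc3")
}

print(uuids_match(SAMPLE))
```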

Let me know what logs I can provide to help with this, as the problem is super easy for me to duplicate.

I just want this box to be usable, even if a drive dies.
