No complete reboot with sw-raid when one disk gets faulty

Bug #125561 reported by gerbra on 2007-07-12
This bug report is a duplicate of:  Bug #120375: cannot boot raid1 with only one disk.
Affects: Ubuntu
Importance: Undecided
Assigned to: Brian Murray

Bug Description

System: Edubuntu 7.04, kernel 2.6.20-16-generic, x86_64

The server has 4 SCSI disks, set up as a combination of software RAID1 and RAID10:
/boot and swap are RAID1,
/ and /home are RAID10.
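
For reference, a layout like this can be inspected with the commands below; the md numbers and which array holds which mount point are assumptions for illustration, not taken from this report:

    cat /proc/mdstat                 # overview of all md arrays and their member state
    mdadm --detail /dev/md0          # e.g. the RAID1 holding /boot
    mdadm --detail /dev/md2          # e.g. the RAID10 holding /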

I simulated a total failure of one disk.
After that, the system does not boot correctly with 3 of the 4 devices.
I get stuck in busybox/initramfs. From there I was able to reassemble the md arrays
with mdadm.
But the server does not boot on its own with only 3 disks, and that should not be the case.
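
A minimal sketch of that kind of manual recovery from the busybox shell, assuming the standard mdadm is present in the initramfs (whether exiting the shell actually resumes the boot can vary):

    mdadm --assemble --scan --run    # --run starts arrays even with a missing member
    cat /proc/mdstat                 # check that the arrays came up with 3 of 4 disks
    exit                             # leaving the shell usually lets the boot continue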

I have put a script with mdadm -A -s into the various script sections of the
initramfs and generated new initrd images from it. I also put the raid1, raid10
and md/SCSI modules into these initrds.
As a result, I can see during boot of the initrd that the md devices come up with 3 of the 4 devices.
But mounting the root fs and starting init is still not possible; I only end up in busybox.
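
A sketch of what such a script could look like, assuming an initramfs-tools local-top hook; the file name, module list and --run option are assumptions, not taken verbatim from this report:

    #!/bin/sh
    # Hypothetical /etc/initramfs-tools/scripts/local-top/mdassemble,
    # picked up after rebuilding the initrd with "update-initramfs -u".
    PREREQ=""
    prereqs() { echo "$PREREQ"; }
    case "$1" in
        prereqs) prereqs; exit 0;;
    esac

    # Load the RAID personalities, then assemble all known arrays,
    # accepting degraded ones (--run).
    modprobe md_mod
    modprobe raid1
    modprobe raid10
    mdadm -A -s --run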

Before busybox I see the following in the tty logs (sorry, only notes on paper, I am not at the server right now):
Trying to resume from /dev/md0
Then a modprobe call, but only the usage text of modprobe.
Then a message that /etc/fstab could not be read.
Then that /proc and /sys could not be mounted.
Then: Target filesystem doesn't have /sbin/init
Then I end up in busybox.

I have tested the same scenario on another distribution in a virtual machine, and there the system
boots after a "disk fault" with only 3 of the 4 devices.

Sorry for my bad English.
I can provide more information if needed.

Thank you.

Brian Murray (brian-murray) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. You reported this bug a while ago and there hasn't been any activity recently. We were wondering if this is still an issue for you? Thanks in advance.

gerbra (gerhard-brauer) wrote :

Thanks for your answer; yes, I probably should have given this report a higher priority.

We have since updated this server to Edubuntu 7.10. I cannot say for certain whether the scenario above is gone after the update, because the server is in production. I could/will do a test, maybe on the weekend in week 50.
But I assume that our RAID problem is still "alive".

At boot time of the server (it is not always running) we sometimes have degraded SW RAID arrays.
After the update to 7.10 it is better, but out of roughly 10 boots we still get about 2 degraded arrays.
The hard disks are 100% OK; I assume this may be a timing problem at boot.
The degraded /dev/md device changes randomly, and so does the /dev/sdXY device within the array.
I have tested with the rootdelay parameter, and also with some sleeps in the initrd scripts.
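
For context, the rootdelay test is usually wired in via the kernel command line; a minimal sketch assuming GRUB legacy as shipped with 7.04/7.10, with the kernel version, root device and delay being guesses for illustration:

    # Excerpt from /boot/grub/menu.lst (GRUB legacy); the kernel version,
    # /dev/md2 and the 15-second delay are assumptions, not from this report.
    kernel /vmlinuz-2.6.22-14-generic root=/dev/md2 ro rootdelay=15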

It's a little annoying to always have to do a mdadm /dev/mdX -a /dev/sdXY, watch the mdstat output and wait for the next false alarm ;-)
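
That manual recovery looks roughly like this, with placeholder device names as in the comment above:

    mdadm /dev/md0 -a /dev/sda1      # re-add the member that was dropped
    watch cat /proc/mdstat           # follow the resync until it completes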

Alexander Dietrich (adietrich) wrote :

Hi, this is a duplicate of bug #120375, which contains a potential fix in one of the last comments by Ken.

Unfortunately, bug #120375 is marked as "not in Ubuntu", so it doesn't get much attention, I'm afraid.
