Comment 26 for bug 120375

Revision history for this message
Peter Haight (peterh-sapros) wrote :

I haven't messed around with the problem in Gutsy, but what Daniel said above about the difference between Fiesty and Gusty is not correct. Fiesty was also launching mdadm as devices were discovered from udev and that is exactly the problem. Both are pretty much the same, just the code has moved around some. Unfortunately the box I fixed this on is now in production and I haven't had the chance to setup another test one to port my fix to Gutsy, so I'll explain what's going on, and maybe someone else can fix it. The version of mdadm doesn't have anything to do with this problem. This problem is entirely due to the Ubuntu startup scripts.

What Ubuntu is doing is as each device gets discovered by udev it runs:

mdadm --assemble --scan --no-degraded

If your RAID is made up of say sda1 and sdb1, then when 'sda1' is discovered by Linux, udev runs 'mdadm --assemble --scan --no-degraded'. Mdadm tries to build a RAID device using just 'sda1' because that's the only drive discovered so far. It fails because of the '--no-degraded' flag which tells it to not assemble the RAID unless all of the devices are present. If it didn't include the '--no-degraded' flag, it would assemble the RAID in degraded mode. This would be bad because at this point we don't know if 'sdb1' is missing or it just hasn't been discovered by udev yet.

So, then Linux chugs along and finds 'sdb1', so it calls 'mdadm --assemble --scan --no-degraded' again. This time both parts of the RAID (sda1 and sdb1) are available, so the command succeeds and the RAID device gets assembled.

This all works great if all the RAID devices are working, but since it allways runs mdadm with the '--no-degraded' option, it won't assemble the RAID device if say 'sda1' is broken or missing.

My solution was to wait until mounting root failed due to a timeout and then try 'mdadm --assemble --scan' without '--no-degraded' to see if we can assemble a degraded RAID device. Hopefully by the time the root mount has timed out, Linux has discovered all of the disks that it can. This works on my Fiesty box, but as I said above, stuff got moved around for Gutsy and I haven't had a chance to build another box to try it out and fix Gutsy. Also I think my script didn't take into account the scenario where the RAID device isn't root.