Comment 28 for bug 942106

Doug Jones (djsdl) wrote:

This does NOT fix this issue for me.

My system still boots up with some RAID arrays not running. Every single time.

This system has six SATA drives on two controllers. It contains seven RAID arrays, a mix of RAID 1, RAID 10, and RAID 6; all are listed in fstab. Some use 0.90.0 metadata and some use 1.2 metadata. The root filesystem is not on a RAID array (at least not any more; I got tired of that REAL fast), but everything else (including /boot and all swap) is on RAID. BOOT_DEGRADED is set. All partitions are GPT. Not using LUKS or LVM. All drives are 2TB, from various manufacturers, and I suspect some have 512B physical sectors and some have 4KB sectors. This is an AMD64 system with 8GB RAM.

This system has had about four different versions of Ubuntu on it, and has had multiple RAID arrays on it from the beginning. (This is why some of the arrays still use 0.90.0 metadata.) RAID worked fine until the system was upgraded to Oneiric early in 2012 (no, the problem did not start with Precise).

I have carefully tested the system every time an updated kernel or mdadm has appeared since the problem started with Oneiric. The behavior has gradually improved over the last several months. This latest version of mdadm (3.2.5) did not result in significant improvement; I have rebooted four times since installing it and the behavior is consistent.

When the problem first started, on Oneiric, I had the root file system on RAID. This was unpleasant. I stopped using the system for a while, as I had another one running Maverick.

When I noticed some discussion of possibly related bugs on the Linux RAID list (I've been lurking there for years), I decided to test the system some more. By then Precise was out, so I upgraded. That did not help. Eventually I backed up all data onto another system and did a clean install of Precise on a non-RAID partition, which made the system tolerable. I left /boot on a RAID1 array (on all six drives), but that does not prevent the system from booting even if /boot fails to start during Ubuntu startup (I assume because GRUB can find /boot even if Ubuntu later can't).

I started taking detailed notes in May (seven cramped pages so far). I have rebooted 23 times since then. On every boot, exactly two arrays failed to start. Which two varied from boot to boot; it could be any of the arrays, with no apparent correlation with metadata type or RAID level.

This mdadm 3.2.5 is the first time I have resorted to a forced upgrade from -proposed; before, I always just waited for a regular update. The most significant improvements came with earlier regular updates. It has been a while since I had to wait for a degraded array to resync, manually re-attach a component (usually a spare) that had become detached, or drop to the command line to zero a superblock before reattaching a component. It has been a while since an array containing swap failed to start.
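For anyone hitting the same symptoms, the manual recovery steps I describe above look roughly like this (device and array names here are made-up examples, not my actual layout; all of this needs root):

```shell
# Try to re-attach a component that became detached from its array
# (example names: array /dev/md3, component /dev/sdc2)
mdadm /dev/md3 --re-add /dev/sdc2

# If mdadm refuses because of stale metadata on the component,
# wipe its superblock and add it back as a fresh device instead.
# WARNING: --zero-superblock destroys the RAID metadata on that
# partition, so be certain you have the right device.
mdadm --zero-superblock /dev/sdc2
mdadm /dev/md3 --add /dev/sdc2

# Watch the resync progress
cat /proc/mdstat
```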

This issue has now become little more than an annoyance. I can now boot, wait for the first array to fail to start, hit S (to skip mounting), wait for the second, hit S, wait for the login screen, log in, wait for the Unity desktop, start Disk Utility, manually start the two arrays that didn't start, then check all the other arrays to see if anything else has happened. The whole process takes about five minutes. But I am still annoyed.
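Instead of clicking through Disk Utility, the same "start the arrays that didn't come up" step can be done from a terminal; something like this (array name is a hypothetical example):

```shell
# See which arrays are running and which are listed as inactive
cat /proc/mdstat

# Start an array that was partially assembled but not run
# (example name /dev/md5)
mdadm --run /dev/md5

# Or assemble anything known to mdadm.conf that isn't running yet
mdadm --assemble --scan
```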

If you want to replicate this behavior consistently, get yourself seven arrays.