Comment 22 for bug 29858

Vince (tucsonclimber) wrote:

I was affected by this bug after a two-step upgrade from 9.04 (Jaunty) to 10.04 (Lucid) — Jaunty to Karmic to Lucid — with an active root snapshot. I found that the 2.6.31-22 kernel worked consistently, while all later kernels I tried (2.6.32-22 and 2.6.32-25) failed in one of two ways:

The most common failure was a drop to the initramfs shell after a bad-block error while mounting the root filesystem. I was never able to recover from this, even with manual mounts (which always succeeded), mount moves, and chroot.

The less common failure was a "cannot find /dev/mapper/rootvg-rootlv" error from the wait-for-root procedure. In these cases I could simply type "exit" and the failed mount would be retried, successfully.

In both cases, the wait-for-root call would take the full 30 seconds (or longer with rootdelay); it was NOT detecting the volume before the timeout in any case.

After much experimentation with rootfstype, rootdelay, etc., I finally removed the snapshot of the root volume that I had allocated prior to the upgrade, and I have booted successfully ever since.

Please note: there was a snapshot of the root LV that I had created prior to the upgrade (for safety), but I was mounting the base LV, not the snapshot.

I have since recreated a snapshot of the root volume, with no boot problems since.
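For reference, the remove-then-recreate sequence looked roughly like the following sketch. The volume group and LV names (rootvg, rootlv) match the /dev/mapper path above; the snapshot name and size are assumptions for illustration:

```shell
# Remove the stale pre-upgrade snapshot of the root LV (run from a
# rescue environment or an already-booted system). The snapshot name
# "rootsnap" is an assumption.
sudo lvremove /dev/rootvg/rootsnap

# After confirming clean boots, recreate a fresh snapshot of the base
# root LV. The 2G COW size is an assumption; pick one to suit.
sudo lvcreate --snapshot --size 2G --name rootsnap /dev/rootvg/rootlv

# Verify: both the base LV and the (mostly empty) snapshot should be
# listed, with the snapshot's Data% near zero.
sudo lvs rootvg
```

Note that the freshly created snapshot is nearly empty, which matters for the timing theory below: the original snapshot that failed was roughly 25% full.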

I would conclude from this that there is a timing problem between the registration of the volumes with device-mapper and the point at which the volumes are actually usable. In some cases the /dev/mapper links were not created within the timeout; in others the links were created, but reading the superblock returned garbage. I assume the differences between kernel versions come down to this timing, or perhaps to additional threading that introduced a race condition. Further, the fact that the roughly 25%-full snapshot (the original) failed while the mostly empty snapshot succeeded suggests the timing problem scales with the number of changed pages in the snapshot. And the fact that adjusting rootdelay never affected the problem (except to delay its appearance) suggests the race condition is somehow tied to a lock held by wait-for-root.

I did not encounter any problems with the upgrade of the LVM configurations or packages between Jaunty and Lucid as described by others; the initramfs configurations were all correct (with the possible exception of wait-for-root NEVER completing before the timeout).