Degraded boot fails when using encrypted RAID1 with LVM

Bug #659899 reported by Luigi Messina
This bug affects 3 people
Affects: debian-installer (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Binary package hint: debian-installer

This is probably similar to, or a duplicate of, bug #577239: https://bugs.launchpad.net/ubuntu/+source/cryptsetup/+bug/577239

I'm testing Ubuntu 10.10 Server under VirtualBox following this test case:
http://testcases.qa.ubuntu.com/Install/ServerRAID1

Installation, reboot and upgrade go fine, but when testing boot with a degraded array, the boot fails with:

"-r ALERT! /dev/disk/by-uuid/[....] does not exist. Dropping to a shell"

and drops to the initramfs BusyBox prompt.

To reproduce this bug, create a virtual machine for an Ubuntu guest with two 10 GB SATA disks.

Boot with the 10.10 Server i386 ISO image.
Partition as follows:

2 × 500 MB partitions (one on each disk) used as physical volumes for RAID1, forming device md0
2 partitions using all the remaining space (one on each disk) used as physical volumes for RAID1, forming device md1

md0 mounted as /boot formatted with ext2
md1 as physical volume for encryption
md1_crypt as physical volume for LVM

3 LVM logical volumes, for /, /home and swap

Complete the installation with defaults, install GRUB to the MBR of both disks, and reboot.
The system boots correctly.
Run apt-get update && apt-get upgrade, then reboot; the system boots correctly.
Power off. Remove one of the two virtual disks from the VM (it doesn't matter which one) and reboot: boot fails with the message above.
Power off. Re-add the disk to the VM and power on. The system boots normally.
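
For reference, the same layout built by hand looks roughly like this (a sketch; device names and logical volume sizes are my assumptions, since the installer drives all of this through its own partitioner):

# RAID1 arrays over matching partitions on both disks
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2

# /boot sits directly on md0, unencrypted, so GRUB can read it
mkfs.ext2 /dev/md0

# md1 is encrypted; the opened mapping becomes the LVM physical volume
cryptsetup luksFormat /dev/md1
cryptsetup luksOpen /dev/md1 md1_crypt

# one volume group carrying logical volumes for /, /home and swap
pvcreate /dev/mapper/md1_crypt
vgcreate vg0 /dev/mapper/md1_crypt
lvcreate -L 5G -n root vg0
lvcreate -L 3G -n home vg0
lvcreate -L 1G -n swap vg0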

debian-installer version: 20100211ubuntu29
mdadm version: 2.6.7.1-1ubuntu16
cryptsetup version: 2:1.1.2-1ubuntu1

Revision history for this message
Peter Stolt (stormare) wrote:

A similar problem hit me when I installed Ubuntu Server 10.10 and wanted to confirm that each disk was capable of booting alone, in order to simulate a disk failure "2 years from now" (previous installations always had problems with the MBR being written only to /dev/sda).

I do not use LVM on the boot device, so that differs from the original bug report.

To confirm the bug, I also reproduced this on two different computers with different hardware architectures.

During the installation of Ubuntu Server 10.10, I used the installer partitioner to create the following setup:

md0 = /dev/sda1 , /dev/sdb1
md1 = /dev/sda2 , /dev/sdb2
md2 = /dev/sda3 , /dev/sdb3

cryptsetup with LUKS, set up like this:
md0 => md0_crypt
md1 => md1_crypt

fstab:
md0_crypt => / (ext4)
md1_crypt => swap
md2 => /boot (ext4)
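
Spelled out as configuration, that mapping would look roughly like this (a sketch; the UUID placeholders are mine, not values from the report):

# /etc/crypttab
md0_crypt  UUID=<uuid-of-md0>  none  luks
md1_crypt  UUID=<uuid-of-md1>  none  luks

# /etc/fstab
/dev/mapper/md0_crypt  /      ext4  errors=remount-ro  0  1
/dev/mapper/md1_crypt  none   swap  sw                 0  0
UUID=<uuid-of-md2>     /boot  ext4  defaults           0  2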

After installation, powering off and physically removing EITHER of the two disks (/dev/sda or /dev/sdb) makes the bootup fail with very cryptic error messages in the boot text.

Classic "printf debugging" (with echo in the bash scripts), I conclude that the bug is when the raid1 arrays are assembled. It fails due being degraded EVEN THOUGH I selected the option to boot even if degraded during the installation.

By modifying the initrd image with this "ugly workaround", I was able to circumvent the boot bug:

# keep a backup of the original image, then work on a copy in a scratch dir
mkdir /root/initrd-temp
cd /root/initrd-temp/
cp /boot/initrd.img-2.6.35-28-generic /boot/initrd.img-2.6.35-28-generic.orig
cp /boot/initrd.img-2.6.35-28-generic .

# the initrd is a gzipped cpio archive; extract it into the scratch dir
gzip -d < initrd.img-2.6.35-28-generic | cpio --extract --verbose --make-directories --no-absolute-filenames
rm initrd.img-2.6.35-28-generic

################################
vi scripts/init-premount/mdadm
  # added this line at the end of the script, just before the exit 0 line
  mdadm --assemble --scan --run
################################

# repack the tree into a new image and install it
find . | cpio -H newc --create --verbose | gzip -9 > initrd.img-2.6.35-28-generic
mv initrd.img-2.6.35-28-generic /boot/
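
A less fragile variant of the same idea (a sketch using the standard initramfs-tools hook mechanism, not something from this report; ordering relative to the cryptroot script may need attention) is to ship the extra assemble call as a local-top script, so it survives initramfs rebuilds on kernel updates:

cat > /etc/initramfs-tools/scripts/local-top/degraded-raid <<'EOF'
#!/bin/sh
# force assembly of (possibly degraded) arrays before the root device lookup
PREREQ=""
prereqs() { echo "$PREREQ"; }
case "$1" in
prereqs) prereqs; exit 0 ;;
esac
mdadm --assemble --scan --run
EOF
chmod +x /etc/initramfs-tools/scripts/local-top/degraded-raid
update-initramfs -u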

During the debugging, I also found out that no arguments are passed to the mdadm script ($1 has no value). Thus the "mountfail" case at the bottom of the script can never be triggered:

case $1 in
# get pre-requisites
prereqs)
        prereqs
        exit 0
        ;;
mountfail)
        mountroot_fail
        exit 0
        ;;
esac

Thus, the code in the mountroot_fail function is never activated, regardless of whether the installer choice to boot degraded was set to true:
if [ "$BOOT_DEGRADED" = "true" ]; then
                        echo "Attempting to start the RAID in degraded mode..."
                        if mdadm --assemble --scan --run; then
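
The dead branch is easy to demonstrate in isolation (a minimal sketch, not the actual script): with an empty $1, neither case arm matches and the recovery code is simply skipped.

#!/bin/sh
# called with no argument, as observed above: "$1" is empty
case "$1" in
prereqs)
        echo "would list prerequisites"
        exit 0
        ;;
mountfail)
        echo "would run: mdadm --assemble --scan --run"
        exit 0
        ;;
esac
# execution falls through to here, so the degraded-mode recovery never runs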

Revision history for this message
Peter Stolt (stormare) wrote:

Short summary ...

... the UUID and other "cryptic error messages" are consequences of the RAID1 devices not being assembled. The corresponding UUIDs are thus never created, and the result is failure in the later boot stages.

An example:
scripts/local-top/cryptroot, line 272:
if /sbin/cryptsetup isLuks $cryptsource > /dev/null 2>&1; then

This fails since isLuks returns false: $cryptsource is the md1 array, which was never assembled and does not exist at all. cryptsetup then falls back to its default of asking for a passphrase, which is guaranteed to fail.
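
This chain is straightforward to verify from the initramfs shell (a sketch; md1 is the array name from this comment):

ls -l /dev/md1                     # missing: the array was never assembled
/sbin/cryptsetup isLuks /dev/md1   # fails for the same reason
echo $?                            # non-zero exit status
mdadm --assemble --scan --run      # after this, isLuks succeeds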

Another example:
cryptsetup: evms_activate is not available
This was the first error text I got, and it is a pretty cryptic (no pun intended!) way of reporting that the UUID device does not exist.

Revision history for this message
Launchpad Janitor (janitor) wrote:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in debian-installer (Ubuntu):
status: New → Confirmed
Revision history for this message
dienteperro (dienteperro1207) wrote:

Reproduced in a VMware virtual machine running Zentyal 3.3 (based on Ubuntu 12.04 LTS), also with two virtual disks in RAID1: three partitions per disk as physical volumes for RAID, one array used as /boot and the other two as physical volumes for encryption, used as swap and /. GRUB was installed on both disks, and booting in degraded mode was confirmed as enabled. After everything was set up OK, I rebooted with one of the disks removed (tried both, with the same result), got the message "cryptsetup: evms_activate is not available", and after a while got the initramfs BusyBox prompt.
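
For anyone double-checking that the installer's degraded-boot answer actually took effect, the setting lands in the mdadm initramfs configuration on Ubuntu systems of this era (a sketch based on the mdadm packaging, not on this report):

# the debconf answer is stored here and read by the initramfs mdadm script
cat /etc/initramfs-tools/conf.d/mdadm
# BOOT_DEGRADED=true

# change the answer and rebuild the initramfs
dpkg-reconfigure mdadm
update-initramfs -u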

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote:

This is a somewhat old LP bug, but for completeness it is worth mentioning that some patches were recently merged into initramfs-tools and cryptsetup that allow a good experience booting with a LUKS-encrypted rootfs on top of a degraded RAID1 array; for details, please check: https://bugs.launchpad.net/ubuntu/+source/cryptsetup/+bug/1879980

Cheers,

Guilherme

Changed in debian-installer (Ubuntu):
status: Confirmed → Fix Released
