Cannot boot degraded RAID1 array with LUKS partition

Bug #1196693 reported by Boi Sletterink
This bug affects 5 people
Affects: initramfs-tools (Ubuntu) / Status: Confirmed / Importance: Undecided / Assigned to: Unassigned
Affects: initramfs-tools-ubuntu-core (Ubuntu) / Status: Invalid / Importance: Critical / Assigned to: Bert

Bug Description

When I pull a disk out of my 12.04.2 RAID1 setup, which contains a LUKS container inside an md device, the system won't boot. Plugging the second disk back in made it boot again, but I wanted to replace my disks, and if a disk is actually broken you don't have that option...

Debugging the initramfs boot sequence seems to indicate that the crypto handling is done before the degraded-array handling, rendering the BOOT_DEGRADED flag ineffective.

I've looked at other bugs (#1077650 #1003309 #728435 #106215) but I think it's a different problem.

Situation

I've got an LVM-in-LUKS-in-RAID1 setup, with a separate RAIDed boot partition.

# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md126 : active raid1 sda2[0] sdb2[2]
      523968 blocks super 1.2 [2/2] [UU]

md127 : active raid1 sda1[0] sdb1[2]
      976106048 blocks super 1.2 [2/2] [UU]

unused devices: <none>

md127 contains a LUKS container, called ugh2_lvm.
ugh2_lvm contains an LVM with a volume group called ugh2_vg.
ugh2_vg contains LVs called "root" (the root filesystem) and "swap".

# mount | grep /dev/m
/dev/mapper/ugh2_vg-root on / type ext4 (rw,relatime)
/dev/md126 on /boot type ext4 (rw)

# cat crypttab
ugh2_lvm UUID=69ade3d3-817d-42ee-991b-ebf86e9fe685 none luks

# grep 'DEGRADED=' /etc/initramfs-tools/conf.d/mdadm
BOOT_DEGRADED=true

Symptoms

Booting seems to hang with the message "evms_activate is not available". I'm not using EVMS, so the message is not really indicative of the problem. You may eventually get dropped to a shell (3 minutes? I saw a 180-second time-out in the scripts somewhere), but that took too long for me to wait for.

Diagnosis

Interrupting the boot process with break=premount let me take a look at the situation. It turns out the degraded arrays were assembled but left inactive; the BOOT_DEGRADED handling that activates degraded arrays lives in scripts/local-premount/mdadm. However, it never gets the chance to run before the boot scripts try to open the LUKS device with the configured UUID, because that is done by scripts/local-top/cryptroot, and "*-top" scripts run before "*-premount" scripts.
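
For illustration, this is roughly what the manual recovery looks like from the break=premount shell, using the device and mapping names from this report (the exact commands are my reconstruction, not something taken from the original boot log):

(initramfs) cat /proc/mdstat        # md126/md127 show up, but as inactive
(initramfs) mdadm --run /dev/md127  # force-start the degraded array by hand
(initramfs) exit                    # resume booting; cryptroot should now find the LUKS UUID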

Workaround / solution

I made it work again by linking /usr/share/initramfs-tools/scripts/local-premount/mdadm -> /etc/initramfs-tools/scripts/local-top/mdadm, then rebuilding my initramfs (update-initramfs -u).

It seems to work well. Not sure if it's the best or even a clean approach.
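
For reference, a sketch of the commands behind this workaround (I'm reading the arrow as "a link in /etc/initramfs-tools/scripts/local-top pointing at the stock local-premount mdadm script", so that the degraded-array handling also runs before cryptroot):

# ln -s /usr/share/initramfs-tools/scripts/local-premount/mdadm \
        /etc/initramfs-tools/scripts/local-top/mdadm
# update-initramfs -u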

Boi Sletterink (boisletterink) wrote :

Forgot to mention that booting from a degraded array did work when I installed the system (12.04 RC2). That was slightly different - in that setup, I created the setup degraded. I added the other disk afterwards.

Boi Sletterink (boisletterink) wrote :

Changed the affected package to the one that I think actually contains the bug.

affects: mdadm (Ubuntu) → initramfs-tools (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in initramfs-tools (Ubuntu):
status: New → Confirmed
Anatoli (anatoli) wrote :

Boi, check a similar solution here: https://bugs.launchpad.net/ubuntu/+source/cryptsetup/+bug/251164 (comment 30).

Nathan Rennie-Waldock (nathan-renniewaldock) wrote :

I'm currently testing this in a 14.04.1 virtual machine.
After disconnecting one HDD, boot stays on "Waiting for encrypted source device...", then drops to a shell (I tried a few times before looking into it).
/proc/mdstat shows the array is assembled but has not been started. Manually starting it (with `mdadm -R`) was successful, so I rebooted to try again... it booted fine degraded and has done so on every boot since (the output below is from after booting degraded).
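
For reference, the manual start from that shell would be something along these lines (the array name is taken from the output below; the exact invocation is my reconstruction):

(initramfs) mdadm -R /dev/md0   # run the assembled-but-inactive array even though it is degraded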

My setup is:
2 HDDs. Each contains 1 partition which is part of a RAID1 array (md0).
md0 is an LVM physical volume with a VG called "system".
"system" contains 2 LVs: boot (unencrypted) and root (LUKS).

# mount | grep /dev/m
/dev/mapper/system-root_crypt on / type xfs (rw)
/dev/mapper/system-boot on /boot type ext4 (rw)

# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Mon Jan 5 20:56:33 2015
     Raid Level : raid1
     Array Size : 20952960 (19.98 GiB 21.46 GB)
  Used Dev Size : 20952960 (19.98 GiB 21.46 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Wed Jan 7 16:28:18 2015
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : ubuntu-raid1:0 (local to host ubuntu-raid1)
           UUID : 66eefd8b:ad7f449d:73d180a0:cbcabb44
         Events : 187

    Number   Major   Minor   RaidDevice  State
       0       8       1          0      active sync   /dev/sda1
       1       0       0          1      removed

tags: added: trusty
tags: added: utopic
Changed in initramfs-tools (Ubuntu):
importance: Undecided → Critical
Bert (bertdieltjens)
affects: initramfs-tools (Ubuntu) → initramfs-tools-ubuntu-core (Ubuntu)
Changed in initramfs-tools-ubuntu-core (Ubuntu):
assignee: nobody → Bert (bertdieltjens)
Oliver Grawert (ogra) wrote :

Not a bug in Ubuntu Snappy; marking as Invalid for initramfs-tools-ubuntu-core.

Changed in initramfs-tools-ubuntu-core (Ubuntu):
status: Confirmed → Invalid
xor (xor) wrote :

Ubuntu 18.04 is still affected, reproduced on a clean test installation.
Please update the bug tracker entry to reflect this.

I would be really happy if this could be fixed; it's been 5 years and this breaks using RAID with dm-crypt :(

Steps to reproduce:

- Install via network installer, create the following partition layout manually:
{sda1, sdb1} -> md RAID1 -> btrfs -> /boot
{sda2, sdb2} -> md RAID1 -> dm-crypt -> btrfs -> /

- After the system is installed and confirmed as working, shut down and remove sdb

- Boot will now hang at "Begin: Waiting for encrypted source device ...". That will time out eventually and drop to an initramfs shell, complaining that the disk doesn't exist.

Changed in initramfs-tools (Ubuntu):
status: New → Confirmed