initrd can't mount root luks partition correctly on raid1

Bug #298906 reported by razboinik
This bug affects 1 person
Affects              Status         Importance   Assigned to   Milestone
cryptsetup (Ubuntu)  Fix Released   Medium       Unassigned
mdadm (Ubuntu)       Fix Released   Medium       Unassigned

Bug Description

I created a RAID1 array for my root partition, encrypted it with LUKS, updated the initrd, and rebooted, and the boot failed. It took me a long time to debug, but I eventually noticed that in the file
scripts/local-top/cryptroot: line 236

if /sbin/cryptsetup isLuks $cryptsource > /dev/null 2>&1; then

the check failed and treated my partition as if it weren't LUKS. I simply commented that check out (and the else branch), updated the initrd, and my partition booted fine.

I'm not sure whether the cryptsetup LUKS identification fails only on RAID or on a normal partition as well, but either way it is a bug.

Also, I can never enter my passphrase successfully on the first attempt (from what I see in the script it must be a timeout issue): /dev/md1 is not ready when I am prompted for the password, so there should be a check for this.
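
As an illustration of the kind of check that seems to be missing, here is only a sketch: the variable $cryptsource is taken from the script quoted above, while the 10-second budget is an arbitrary assumption. The idea is to wait for the device node before running the LUKS test:

  # Sketch only: wait up to ~10 seconds for the RAID device node to appear
  # before testing it for a LUKS header.
  slumber=10
  while [ ! -e "$cryptsource" ] && [ "$slumber" -gt 0 ]; do
      sleep 1
      slumber=$(( slumber - 1 ))
  done

  if /sbin/cryptsetup isLuks "$cryptsource" > /dev/null 2>&1; then
      echo "LUKS header found on $cryptsource"
  else
      echo "$cryptsource does not look like a LUKS device" >&2
  fi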

Revision history for this message
razboinik (razboinik) wrote :

It fails on the first attempt regardless.
dmraid only sets up the RAID after the first attempt, so it is not a problem with cryptsetup itself but with the init script.

Revision history for this message
Daniel Hahler (blueyed) wrote :

Which version of Ubuntu are you using, and especially which version of cryptsetup ("dpkg -s cryptsetup | grep Version:")?

Changed in cryptsetup:
importance: Undecided → Medium
status: New → Incomplete
Revision history for this message
razboinik (razboinik) wrote :

Intrepid 8.10
Cryptsetup Version: 2:1.0.6-6ubuntu2.1

Revision history for this message
razboinik (razboinik) wrote :

I forgot to mention that I had updated cryptsetup, but this error was present with the original Intrepid version as well.
I noticed that md0 gets initialized before cryptsetup is called, but my root fs is on md1.

Revision history for this message
Mattias Evensson (mevensson) wrote :

I can confirm this bug. I have my root fs on LVM on a LUKS encrypted md1. Same version of Ubuntu and Cryptsetup as razboinik.

The mounting usually succeeds after entering the password three or four times. The first attempt always fails because "/sbin/cryptsetup isLuks" fails to detect that the device is LUKS. The second and third failures are usually due to an unknown fstype.

If I wait some time before entering the password the second time, it succeeds.

Revision history for this message
Mattias Evensson (mevensson) wrote :

I noticed that the LUKS detection is outside the retry loop, which puzzled me a bit. Then I noticed that I have two entries in /conf/conf.d/cryptroot, one for the root fs and one for swap, and both are on the same LVM.

So my previous comment wasn't entirely correct: it always succeeds on the fourth attempt, never on the third.

> If I wait some time before entering the password the second time, it succeeds.

That's also incorrect. The LUKS detection must have succeeded the time I waited longer, which made the second attempt succeed.

Changing the script to force LUKS isn't enough to make it work for me. If I don't wait some time before entering the password, I still get failures. So I guess the correct solution is to add a delay until the RAID device is ready, but I have no idea how to do that.
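
One possible way to implement such a delay (only a sketch, assuming the array is /dev/md1 and that /proc/mdstat is available in the initramfs; the array name and timeout are assumptions) would be to poll until the array shows up as active before the passphrase prompt:

  # Sketch only: give md1 up to ~10 seconds to become active before
  # prompting for the passphrase.
  tries=10
  while [ "$tries" -gt 0 ]; do
      if grep -q '^md1 : active' /proc/mdstat 2>/dev/null; then
          break
      fi
      sleep 1
      tries=$(( tries - 1 ))
  done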

Revision history for this message
ddom (dominikus-nold) wrote :

I can confirm this bug as well. It first appeared with my 2.6.27-11 kernel (x86_64, SMP) running Ubuntu 8.10.

The strange thing is that it had been running fine until a few days ago. My system runs two SATA disks in RAID1 mode, so md0 gets started correctly, but md1 is stopped (as stated by a kernel message) before /scripts/local-top/cryptroot runs. After spending some time debugging where the problem occurs, I noticed that adding a delay didn't really fix it here, although it did fix it on another system running a RAID1 setup on a single IDE disk. Before, I had to enter the passphrase exactly 4 times; after adding a loop (checking for LUKS encryption 3 times) the IDE system works.

The SATA system still fails, so I added a function that greps the output of modprobe -l and loads missing modules such as raid1 if they are not found in that output. Then - and this was the most important part I noticed - I ran mdadm --assemble --scan once and added a few seconds of sleep (3 in my case). That causes md1 to start up even though it had been stopped earlier by some other process I haven't identified yet.

I am attaching my patch here for information and for easier understanding. It may not be the "best" way to work around this problem, but it works for now - tested on 2.6.27-11 on x86_64 with SMP (AMD Turion X2 Ultra) and on 2.6.27-12 on x86 without SMP (VIA Nehemiah). I had also played around a bit beforehand, so there may be additional changes that are not relevant (e.g. I tried to change the way the key is passed to cryptsetup in order to identify the point of failure).
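
The attached patch is not reproduced here; the following is only a rough sketch of the approach described above (the module name, the helper name, and the 3-second sleep are taken from the description or assumed, not copied from the actual patch):

  # Rough sketch of the workaround described above, not the attached patch.
  load_missing_raid_modules() {
      for mod in raid1; do
          # modprobe -l lists the modules it knows about; load the module if
          # it is available (modprobe is a no-op if it is already loaded).
          if modprobe -l "$mod" 2>/dev/null | grep -q "$mod"; then
              modprobe "$mod"
          fi
      done
  }

  load_missing_raid_modules
  # Assemble all arrays found by the scan, then give md1 a moment to come
  # up before cryptroot continues.
  /sbin/mdadm --assemble --scan
  sleep 3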

Another side note: since boot started failing on my 2-disk SATA system, the kernel has shown messages like "s-ata-1: soft reset failed" (s-ata 1, 2, 3 and 4). I noticed this when I temporarily disabled usplash in the cryptroot script for debugging purposes.

It would help a great deal if someone could identify the problem and whether it might be related to an IDE/SATA controller problem when resetting the device and then stopping the RAID device (even though it had been started before) just before cryptroot runs, or to anything related to the mdadm daemon.

Revision history for this message
exactt (giesbert) wrote :

@ddom: check out https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285392 regarding the "s-ata-1: soft reset failed" (s-ata 1, 2, 3 and 4) messages.

Changed in cryptsetup (Ubuntu):
assignee: nobody → Reinhard Tartler (siretart)
status: Incomplete → Confirmed
assignee: Reinhard Tartler (siretart) → nobody
Changed in mdadm (Ubuntu):
importance: Undecided → Medium
Revision history for this message
Reinhard Tartler (siretart) wrote :

Just triaging this bug.

I don't have a test system here, but this bug reads as pretty valid. According to the changelog, it doesn't look like much has changed in Karmic, though.

Revision history for this message
Reinhard Tartler (siretart) wrote :

Please also note that bug #324997 indicates that there are some systems that do have a working root on crypt on RAID1. This supports my guess that this is a race condition.

Revision history for this message
Mattias Evensson (mevensson) wrote :

I can only speak for myself, but I don't have this problem anymore. I think it was solved when I upgraded to Jaunty; I'm currently running the Karmic beta.

I still have the same setup with the root fs on LVM on a LUKS encrypted md1.

I checked the commit log in the bazaar branch and found this:
revno: 56
tags: 2:1.0.6-7ubuntu6
author: Steve Langasek <email address hidden>
committer: Bazaar Package Importer <email address hidden>
branch nick: jaunty
timestamp: Sat 2009-03-07 13:39:14 -0800
message:
  debian/initramfs/cryptroot-script: we don't require vol_id to understand
  the encrypted device, but we should check the device is fully up first
  before continuing by calling udevadm settle. LP: #291752.

It added a call to udevadm settle before the retry loop, which might explain why it works now.
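
For context, that change roughly amounts to the following (paraphrased as an illustration, not the literal diff from the package):

  # Block until udev has finished processing pending events, so device
  # nodes such as /dev/md1 exist before the LUKS check and retry loop.
  udevadm settle

  # ... the existing passphrase retry loop follows here ...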

Revision history for this message
Reinhard Tartler (siretart) wrote :

Sounds plausible.

Does anybody still have problems with booting from root on crypt on RAID1?

Changed in cryptsetup (Ubuntu):
status: Confirmed → Incomplete
Changed in mdadm (Ubuntu):
status: New → Incomplete
Revision history for this message
ceg (ceg) wrote :

Setting to dupe/fixed, since expired Incomplete bugs are cluttering the default bug listing...

ceg (ceg)
Changed in cryptsetup (Ubuntu):
status: Incomplete → Fix Released
Changed in mdadm (Ubuntu):
status: Incomplete → Fix Released