boot impossible due to missing initramfs failure hook / event driven initramfs

Bug #251164 reported by ceg on 2008-07-23
This bug affects 8 people
Affects              Importance  Assigned to
cryptsetup (Ubuntu)  Medium      Unassigned
mdadm (Ubuntu)       Undecided   Unassigned

Bug Description

The cryptsetup package needs proper integration into the initramfs failure hooks.

The system must not assume a particular layered setup or sequence of appearance of (hotplug/udev/crypt/md/degraded-md/lvm) devices.

It must be able to boot with any combination of raid, lvm and crypt that the debian-installer can produce.

Description of solution in comment #15

ceg (ceg) wrote :

This applies to both initramfs and init.d scripts.

(root and home raids with devices on external disks take longer)

Sleep intervals of 0.1 seconds are probably ok.

description: updated

I think this is fixed in intrepid. Could you please check?

ceg (ceg) wrote :

Hello Reinhard,
are you referring to udev settling?
One of my usb disks is so slow in initializing that it won't even trigger a udev event until 8 seconds after udev has been started in initramfs.

Is the intrepid initramfs-tools/scripts/local-top/cryptroot viewable somewhere on the web?

no, I'm referring to the loop that waits until the device appears
--
Gruesse/greetings,
Reinhard Tartler, KeyID 945348A4

Ok, a timeout loop looking for (slower) source devices sounds just like what I was missing / suffering from in hardy. I wish there were a webcvs or something; it would make it easier to compare/check. Maybe there is and I don't know. http://packages.ubuntu.com seems unavailable.

You may change status if you wish, I'll take your word, thank you.

All the best.

ceg (ceg) wrote :

Ah, packages server is back up.

I see the whole rootdelay loop has been copied from the local script and also the dropping to the console when ${cryptsource} is missing.

But the recent local script will execute additional failure hooks. For example degraded arrays will only get started by these.

When cryptroot drops to a console as it does now, it will basically prevent the local script from running any failure hooks. A system with a degraded array that contains the cryptroot, for example, won't come up. The array will only be run degraded after the rootdelay in local has timed out.
(Relevant is Bug 120375 and the updated mdadm packages from
https://launchpad.net/~kirkland/+archive)

It should be safe to just remove the loop that drops to the console in init-top/cryptroot and exit to local instead. That would mean another rootdelay loop will have to time out for sure, though, before failure hooks might eventually help out. (And if, for example, the array can be started degraded, init-top/cryptsetup will still need to provide a failure hook, or it won't be run again to open the root device.)

But here is a suggestion for further improvement:

Have the local script call cryptroot (or better an "eventhook", registered on the first run just like the failure hooks) from within the rootdelay loop in the local script.

The eventhook of local-top/cryptroot should be called (once) if ${cryptsource} becomes available.

The eventhook would only be needed for things like cryptsetup that may not be set up by udev rules in a coldplug driven boot process.
Instead, they would set a trigger within the single rootdelay loop in local.

(Or would it be possible to generate special udev rules during initramfs generation that can recognize non-luks partitions, and have a fully coldpluggable system?)

ceg (ceg) wrote :

Just checked that the same applies to encrypted non-root filesystems, too (/etc/init.d/cryptdisks*). If, for example, external disks are slow, cryptdisks fails.

Since md and lvm devices are set up by udev: init.d/cryptdisks-early and cryptdisks (or just cryptdisks.functions) need to contain a timeout loop checking for their source device before failing.
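
A minimal sketch of such a timeout loop. The helper name `wait_for_source` is hypothetical and not part of cryptsetup, and the sketch assumes a `sleep` that accepts fractional seconds (GNU coreutils does; some busybox builds may not). It polls at the 0.1 second interval suggested earlier in this report and counts polling ticks rather than wall-clock time.

```shell
# Hypothetical helper, not part of cryptsetup: poll for a source device
# until it exists or the timeout (in whole seconds) has been used up.
wait_for_source() {
    source_dev=$1
    timeout_s=${2:-30}
    ticks=$((timeout_s * 10))      # ten 0.1 s polls per second
    while [ "$ticks" -gt 0 ]; do
        [ -e "$source_dev" ] && return 0
        sleep 0.1
        ticks=$((ticks - 1))
    done
    [ -e "$source_dev" ]           # final check decides the exit status
}
```

A caller could then write something like `wait_for_source /dev/md0 30 || echo "source device missing"` before giving up. The `-e` test only checks that the path exists; a real script would probably want `-b` to insist on a block device.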

Or better: the cryptsetup could be done by udev rules. (Maybe udev rules that are created on the fly by the boot script with the help of crypttab and a raw signature of the device?)

description: updated

ceg,

If initramfs-tools were managed in bzr, there would be a web-viewable way of browsing the source. But it's not. The best I can suggest is to start at:
 * https://launchpad.net/ubuntu/+source/initramfs-tools

Click on the release for Intrepid, as the case may be:
 * https://launchpad.net/ubuntu/intrepid/+source/initramfs-tools/0.92bubuntu10

Pull the source with:
 * dget https://launchpad.net/ubuntu/intrepid/+source/initramfs-tools/0.92bubuntu10/+files/initramfs-tools_0.92bubuntu10.dsc

Extract the sources with:
 * dpkg-source -x initramfs-tools_0.92bubuntu10.dsc

:-Dustin

Dustin Kirkland  (kirkland) wrote :

Reinhard-

As discussed in IRC, I do believe that utilizing the failure hooks in the initramfs would be the best approach.

In the initramfs-tools sources, see:
 * scripts/functions, add_mountroot_fail_hook(), and try_failure_hooks(), and the inline comment just above try_failure_hooks().

For an example of how to create/use a failure hook, pull the mdadm sources and see:
 * debian/initramfs/init-premount

Note that the failure hook will exit "0" if it thinks that it did something to positively affect the situation, and "1" otherwise.

Finally, the bit that brings this all together is back in the initramfs-tools sources, in scripts/local, see:
 * if root_missing "${ROOT}" && ! try_failure_hooks; then ....
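
The control flow described above can be modeled roughly as follows. This is only a sketch: the three function names come from the comment above, but the bodies are simplified stand-ins written for illustration and are not the real initramfs-tools implementations. `demo_md_hook`, `DEMO_ROOT` and `boot_step` are hypothetical.

```shell
# Simplified stand-ins for the initramfs-tools functions named above.
# These are NOT the real implementations, only a model of the control flow.
FAIL_HOOKS=""

add_mountroot_fail_hook() {
    # Stand-in: register a hook by shell function name.
    # The real function's mechanics may differ.
    FAIL_HOOKS="$FAIL_HOOKS $1"
}

try_failure_hooks() {
    # Return 0 as soon as one hook reports that it improved the situation.
    for hook in $FAIL_HOOKS; do
        "$hook" && return 0
    done
    return 1
}

root_missing() {
    [ ! -e "$1" ]
}

# Hypothetical hook: pretend to start a degraded array, which makes the
# root device appear. Exit 0 = "did something", 1 = "could not help".
demo_md_hook() {
    [ -n "$DEMO_ROOT" ] || return 1
    : > "$DEMO_ROOT"
}

boot_step() {
    if root_missing "$1" && ! try_failure_hooks; then
        echo "panic: root device $1 not found"
        return 1
    fi
    echo "continuing with $1"
}
```

With no hook registered, `boot_step /some/missing/path` ends in the panic branch. After `add_mountroot_fail_hook demo_md_hook` and setting `DEMO_ROOT` to that path, the hook makes the path appear and the boot continues.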

Hopefully that helps get this a little further along? Reinhard- ping me again on IRC if you need some more help ;-)

:-Dustin

Reinhard Tartler (siretart) wrote :

So based on comment #6, this thread is now discussing a completely different bug. I'm updating the title and bug description now.

description: updated
Changed in cryptsetup:
assignee: nobody → kirkland
status: New → Triaged
Changed in cryptsetup (Ubuntu):
importance: Undecided → Medium

Reinhard-

Thanks for the update of the bug title, etc. As for assigning it to me, I appreciate the vote of confidence, but I don't think I'm going to have the cycles to work on this any time soon. Sorry.

Thanks,
:-Dustin

Changed in cryptsetup (Ubuntu):
assignee: Dustin Kirkland (kirkland) → nobody

present in 9.10

description: updated
summary: - proper integration in failiure initramfs hooks
+ boot errors due to missing initramfs failure hook integration
summary: - boot errors due to missing initramfs failure hook integration
+ boot impossible due to missing initramfs failure hook integration
ceg (ceg) wrote :

Current state of ubuntu systems with md raid: https://wiki.ubuntu.com/ReliableRaid

ceg (ceg) wrote :

What about using the event driven upstart mechanisms within initramfs, too? (LP: #491463)

(We need something that can handle any hotplugging sequence and stacking of lvm, raid, crypt, ...)

ceg (ceg) wrote :

copying this conclusion from #531240 as it rather belongs here

----
As far as I can see, cryptsetup in initramfs is not called on the event that a crypt device appears. It seems cryptsetup in initramfs is currently rather linear and script driven: the cryptsetup script has its own while loop waiting for $cryptsource after all other "local top" scripts. "Failure hooks" have been introduced in initramfs and mdadm, but I don't see one for cryptsetup. And I have doubts that such a two step design can cope with the general case of devices depending on others.

The simple case: rootfs on lvm on crypt on raid:

0) The md0 raid (sda,sdb) got degraded during power down,
1) udev/mdadm does not start the array,
2) crypt on raid does not come up for 2.5 minutes until the ROOTDELAY timeout, and init-top fails (putting aside this bug, that cryptsetup is wrongly opening a member device),
3) the mdadm failure hook runs the array degraded,
4) boot currently fails nevertheless (#531240), but could be made to work with a cryptsetup failure hook for this case.

If however the rootfs is actually located on md1, assembled from md0 (the internal disks) and sdc, then with the failure hook design there is no further timeout or failure hook available after md0 is started degraded to bring the rootfs up if sdc (the external backup disk) is not connected.

(This is not far fetched, because only stacking raids this way allows taking advantage of write intent bitmaps for (backup) disks that are not connected all the time, sdc in this case.)

So to handle the general case with unforeseen combinations, I think in initramfs:

- cryptsetup udev rules should be supplied in the initramfs as well (new_crypt_device event, restricted to the rootfs dependency devices).
- The initramfs should have just one ROOTDELAY waiting loop in its script (or a faster loading upstart/mountall binary?), started upon initramfs_start, that is paused while cryptsetup is prompting for a passphrase (prompt_start/stop events).
- Package mdadm needs to supply a MIN_COMPLETION_WAIT value and the dependency tree of arrays for the root device on mkinitramfs.
- During boot when time_elapsed == MIN_COMPLETION_WAIT (raid_start_degrated event):
   - If a next level in the dependency tree exists and the remaining rootdelay timer is lower than MIN_COMPLETION_WAIT, the rootdelay_timer is increased by MIN_COMPLETION_WAIT.
   - The degraded arrays of the current dependency level are started degraded.
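
The timer extension rule in the last item can be sketched as a small function. The function name `extend_rootdelay` and its argument order are hypothetical; the rule itself is the one stated in the list (extend only if a further dependency level exists and the remaining time is below MIN_COMPLETION_WAIT). All times are whole seconds.

```shell
# Hypothetical helper: print the rootdelay that remains after applying the
# extension rule at the moment a dependency level is started degraded.
#   $1 remaining rootdelay   $2 MIN_COMPLETION_WAIT
#   $3 current level (1-based)   $4 total number of levels
extend_rootdelay() {
    remaining=$1
    min_wait=$2
    level=$3
    levels=$4
    # Only extend if there is a next level that still needs time to complete.
    if [ "$level" -lt "$levels" ] && [ "$remaining" -lt "$min_wait" ]; then
        remaining=$((remaining + min_wait))
    fi
    echo "$remaining"
}
```

For the md1-on-md0 example above (two levels), `extend_rootdelay 5 20 1 2` prints 25, so md1 still gets a full completion wait after md0 is started degraded, while `extend_rootdelay 5 20 2 2` leaves the timer at 5 because no further level exists.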

ceg (ceg) on 2010-03-09
summary: - boot impossible due to missing initramfs failure hook integration
+ boot impossible due to missing initramfs failure hook / event driven
+ initramfs
ceg (ceg) wrote :

Scott, I put you on here because from your upstart/mountall experience you may well spot flaws and benefits in the design.
https://bugs.launchpad.net/ubuntu/+source/cryptsetup/+bug/251164/comments/15

ceg (ceg) on 2010-03-29
Changed in mdadm (Ubuntu):
status: New → Confirmed
ceg (ceg) on 2010-03-30
description: updated
Phillip Susi (psusi) wrote :

Cryptsetup has to be done via script rather than udev rule since it has to prompt for input, and udev rules can't do that.

I think just two things need to be done to fix this:

1) cryptsetup needs a failure hook
2) try_failure_hooks needs to continue trying all failure hooks rather than return as soon as one succeeds

This would allow the failure hooks to activate one or more degraded raid arrays, and finally cryptsetup to unlock the volume.
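
A sketch of what point 2 could look like, assuming hooks follow the convention described earlier in this report (exit 0 if the hook took corrective action, 1 otherwise). The function `try_all_failure_hooks`, the pass limit, and the two demo hooks are hypothetical and only model the behaviour; they are not taken from initramfs-tools.

```shell
# Hypothetical variant of try_failure_hooks: keep sweeping over all hooks
# until a full pass makes no progress (or a pass limit is reached, so a
# hook that always claims success cannot loop forever).
try_all_failure_hooks() {
    helped=no
    again=yes
    passes=0
    while [ "$again" = yes ] && [ "$passes" -lt 10 ]; do
        again=no
        passes=$((passes + 1))
        for hook in "$@"; do
            if "$hook"; then
                helped=yes
                again=yes
            fi
        done
    done
    [ "$helped" = yes ]
}

# Demo hooks: the crypt device can only be unlocked once the array
# underneath it is running.
MD_RUNNING=no
CRYPT_OPEN=no

demo_crypt_hook() {
    [ "$MD_RUNNING" = yes ] && [ "$CRYPT_OPEN" = no ] || return 1
    CRYPT_OPEN=yes
}

demo_raid_hook() {
    [ "$MD_RUNNING" = no ] || return 1
    MD_RUNNING=yes
}
```

Calling `try_all_failure_hooks demo_crypt_hook demo_raid_hook` works even though the hooks are listed in the "wrong" order: the first pass starts the array, the second pass opens the crypt device, and the third pass finds nothing left to do.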

ceg (ceg) wrote :

The udev rules don't need to prompt; the cryptsetup that gets called will prompt. Actually, these things work quite well in the normal system. It seems preferable to me to adjust/improve the regular tools to be usable in initramfs as well, rather than trying to script up and maintain (!) another inherently limited thing.

Failure hooks, for example, will be called in a specific order (not necessarily matching the setup). If you are looping, this should time out with a message; then if the user plugs in the missing disk it won't come up automatically (and if it does, possibly only with a large delay, as all unrelated loops need to time out first), etc. Thus, it's better to go event driven, and then why not favor a proven tool?

Reasonable timeouts (see above and https://wiki.ubuntu.com/ReliableRaid) are another reason to go event driven: there should not be several waiting loops stacked up and possibly blocking each other.

Bug #491463 "support upstart within an initramfs"
https://launchpad.net/~csurbhi/+archive/natty-initramfs

ceg (ceg) wrote :

> Since udev already provides an event driven framework in the initramfs, why add another one?

Hmm, if you would like to realize event driven init scripts, I believe you may be able to rework the scripts from doing linear pre...post things to just call a watchdog script that mostly sleeps and checks how things are going on. And then have separate task scripts that get only called by udev events, or the watchdog.

Phillip Susi (psusi) wrote :

If the udev rule calls cryptsetup, then it would be trying to prompt, and so it can't do that.

What invokes cryptsetup in the normal system? Don't you have to do it by hand right now if you hot plug a luks disk?

The order doesn't have to match the setup as long as they keep being called as long as any one indicates that it managed to take some corrective action.

AFAICS, there is only one wait, and then it goes into failure hook mode.

ceg (ceg) wrote :

I believe the initramfs only sets up the rootfs; other partitions (/home) are set up afterwards. If I remember correctly, cryptsetup is called by udev rules. In any case, that is the way it has to be, event driven, to pick up (/home) devices as they appear without polling loops and sleep delays.

When I looked, I think bootwait, mdadm and cryptsetup were all looping and sleeping in initramfs independently, in whatever order they get called. Therefore I suggested the watchdog timers I mentioned above and in the wiki.

Phillip Susi (psusi) wrote :

mdadm is event driven via udev. The special part in the initramfs is the failure hook so that a degraded array will only be activated after a timeout. I'm not sure if we want to auto degrade arrays post boot, but I suppose if we did, we could move that logic from the failure hook to the udev rule.

What confuses me is cryptsetup. I see that there is an upstart job that appears to try and prompt for the password and unlock the device, but as an upstart task, it has no stdin/out, so that prompt can't work. After the system goes multi user you can no longer do a console password prompt. If you want an interactive prompt at that point, you will need something that can interact with the X desktop, and anything that does that obviously isn't going in the initramfs.

On Mon, May 14, 2012 at 02:15:10PM -0000, Phillip Susi wrote:
> mdadm is event driven via udev. The special part in the initramfs is
> the failure hook so that a degraded array will only be activated after a
> timeout. I'm not sure if we want to auto degrade arrays post boot, but
> I suppose if we did, we could move that logic from the failure hook to
> the udev rule.

It can't be done in a udev rule. Udev rules have a short timeout of their
own, after which the outstanding process will be killed. The handling has
to be passed off to something else - such as an upstart job.

> What confuses me is cryptsetup. I see that there is an upstart job that
> appears to try and prompt for the password and unlock the device, but as
> an upstart task, it has no stdin/out, so that prompt can't work.

The upstart job talks to plymouth. The initramfs script *also* talks to
plymouth.

> After the system goes multi user you can no longer do a console password
> prompt. If you want an interactive prompt at that point, you will need
> something that can interact with the X desktop, and anything that does
> that obviously isn't going in the initramfs.

The client-side logic is the same; what we're missing is a way to let
cryptsetup do plymouth interaction with something on the desktop after X is
started.

Phillip Susi (psusi) wrote :

Can't the udev rule fork and let the child wait for the timeout, then activate the array degraded?

I think the source of my confusion is that upstart is blurring the line between udev, which was designed to do event driven processing in response to hardware detection, and conventional sysvinit, which was designed to run things at startup. Doing this sort of thing with an upstart job makes it dependent on upstart, which means we won't be able to use systemd or sysvinit as alternatives (which I think makes it a deal breaker for Debian, doesn't it?).

If possible, it would be better to find a way for this processing to be kicked off via udev without depending on upstart, either for later system init, or especially in the initramfs, where adding it would have significant overhead.

ceg (ceg) wrote :

Looking at the timeouts suggested in comment #15, I think they may actually be realizable with modular scripts. A base rootdelay script and separate mdadm and cryptsetup scripts (that get called by their udev rules) can halt/extend the regular rootdelay (exported variable? named pipe?) when waiting for user input or for a new level of raid degrading timeout. (As pipes block, reading from pipes may work well for event based processing.)
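
A rough sketch of the named pipe variant, covering only the event side and not the countdown itself. The event names (`extend`, `pause`, `resume`), the variables and both function names are hypothetical; nothing like this exists in initramfs-tools as far as this report shows.

```shell
# Hypothetical event protocol for a single rootdelay timer:
#   "extend N"  add N seconds (a new raid level needs time to complete)
#   "pause"     stop counting (cryptsetup is prompting for a passphrase)
#   "resume"    continue counting
REMAINING=30
PAUSED=no

handle_event() {
    case "$1" in
        extend) REMAINING=$((REMAINING + ${2:-0})) ;;
        pause)  PAUSED=yes ;;
        resume) PAUSED=no ;;
    esac
}

# Read events from a named pipe until all writers have closed it.
# Opening the pipe blocks until a writer shows up, so no polling is needed.
drain_events() {
    while read -r event arg; do
        handle_event "$event" "$arg"
    done < "$1"
}
```

The mdadm and cryptsetup scripts would then write lines such as `extend 20` or `pause` into the pipe (created with `mkfifo`), and the base rootdelay script would call `drain_events` on it and consult `REMAINING` and `PAUSED` while counting down.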

Phillip, your idea of handling udev events by modular event handlers (that may again trigger udev events, if I interpret you correctly) may also be what appears in part of the OpenRC discussion.
On init in Debian http://lists.debian.org/debian-devel/2012/03/msg00452.html
RFC: OpenRC as Init http://lists.debian.org/debian-devel/2012/04/msg00547.html

Teemu Toivola (vergo) wrote :

Commenting here since bug #324997 has been marked as a duplicate and the behaviour appears to have changed since the original report. With Ubuntu 12.04, booting with a degraded raid (due to a missing disk) causes the boot process to stop at "Begin: Waiting for encrypted source device... ...". There's a timeout of around 180 seconds, after which a busybox prompt becomes available. It's possible to continue from that prompt by running "mdadm --run /dev/mdX" followed by ctrl-d. That causes cryptsetup to find the device, prompt for a passphrase and continue the boot procedure.

I noticed that this problem occurs only when a disk is missing from the raid. Setting one disk as failed and then booting with all disks but still in degraded state will result in some warnings during the boot process but there will be no waiting for timeouts or busybox prompts.

Another suggestion I found in several places and ended up testing was to replace the UUID in /etc/crypttab with the device itself (/dev/mdX). That results in no timeout and cryptsetup asking for the passphrase but it will not accept the correct passphrase as long as any disk from the raid is missing. There's no busybox prompt available either resulting in a system that can't boot.

ceg (ceg) wrote :

I am sorry to say I found the same as Teemu on 12.04.1.

no longer affects: cryptsetup

I confirm the behavior described by Teemu Toivola (vergo) on Ubuntu Server 12.10. After removing one drive (no matter which one) from the RAID1 I got:
cryptsetup evms_activate is not available
Begin: Waiting for encrypted source device.
If the RAID1 is intact (with both drives connected), the initramfs correctly asks for the password.

François Marier (fmarier) wrote :

My problem (encrypted RAID1 drive refusing to boot when degraded) was fixed by adding a new initramfs boot script to start the RAID array before cryptsetup runs:

http://feeding.cloud.geek.nz/posts/the-perils-of-raid-and-full-disk-encryption-on-ubuntu/

That's on 12.04.3.

Anatoli (anatoli) wrote :

Actually, the solution proposed by François Marier should work, but only if you have one RAID array and this array is called md0.

A completely flexible and at the same time simple solution is to use the function mountroot_fail located at /usr/share/initramfs-tools/scripts/mdadm-functions, which correctly detects the bootdegraded options passed via a kernel param or in the mdadm.conf and knows how to activate all the arrays in the system. This is how degraded arrays are treated when there is no cryptsetup (which interferes in the middle of the mdadm handling scripts).
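
To illustrate the kernel parameter part only: a sketch of how a `bootdegraded=` option can be picked out of the kernel command line. This does not reproduce `mountroot_fail`; the helper name `bootdegraded_value` and the default of `false` are assumptions made for this example.

```shell
# Hypothetical helper: print the value of bootdegraded= found in a kernel
# command line string, or the given default if the option is absent.
bootdegraded_value() {
    cmdline=$1
    value=${2:-false}
    # Unquoted on purpose: split the command line into words.
    for word in $cmdline; do
        case "$word" in
            bootdegraded=*) value=${word#bootdegraded=} ;;
        esac
    done
    echo "$value"
}
```

On a running system the string would come from `/proc/cmdline`, for example `bootdegraded_value "$(cat /proc/cmdline)"`. Which values count as "enabled" is up to the real script; `true` is the value assumed here.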

So, just run this command (adjust the line numbers if your version differs):

echo -e '205a206,212
> \t\t. /scripts/mdadm-functions
> \t\t. /scripts/functions
> \t\tmountroot_fail
> \t\techo "mountroot_fail returned $?\\n"
> \tfi
>
> \tif [ ! -e $cryptsource ]; then' | patch -n /usr/share/initramfs-tools/scripts/local-top/cryptroot

and then

update-initramfs -u -k all

And your system will boot with degraded arrays.

If not yet done, you'll need to change the identifier of the EFI partition from its disk ID to its device name, so the system can mount it regardless of which disk it boots from, e.g.
/dev/sda1 /boot/efi

Anatoli (anatoli) wrote :

There is a similar solution mentioned here: https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1196693, which proposes to link /usr/share/initramfs-tools/scripts/local-premount/mdadm -> /etc/initramfs-tools/scripts/local-top/mdadm. Looks like it's the simplest, but I'm not sure it doesn't break something else.

Forgot to mention in the previous post: the EFI partition disk change should be made in /etc/fstab.

Mechanix (mechanix) wrote :

I can also confirm this bug on 14.04. I first noticed it on Ubuntu 8.04.2. Take a look here:

https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/324997

and here:

http://feeding.cloud.geek.nz/posts/the-perils-of-raid-and-full-disk-encryption-on-ubuntu/
