booting from raid in degraded mode ends in endless loop

Bug #1077650 reported by Bernd Schubert
This bug affects 21 people
Affects: mdadm (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Dimitri John Ledkov
Milestone: (none)

Bug Description

It's basically the same as reported here:

http://efreedom.com/Question/6-103895/Can-Boot-Degraded-Mdadm-Array

So I just installed a new system, which is supposed to get an additional disk later on. For now I created md RAID 1 devices with one disk missing. To get Ubuntu to boot at all without complaining about the missing disk, I already added "bootdegraded=yes" to the kernel command line. And now it ends in an endless loop of:

unused devices: <none>
Attempting to start the RAID in degraded mode...
mdadm: CREATE group disk not found
Started the RAID in degraded mode.
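
For context, a minimal sketch of the setup described above, assuming a simple two-member RAID 1 where the second disk is deliberately left out (device names are illustrative, not taken from this report):

# create a RAID 1 with the second member intentionally missing
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 missing

# then boot with the kernel parameter bootdegraded=yes, e.g. by appending it to
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running update-grub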

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: initramfs-tools 0.103ubuntu0.2
ProcVersionSignature: Ubuntu 3.5.0-17.28-generic 3.5.5
Uname: Linux 3.5.0-17-generic x86_64
ApportVersion: 2.6.1-0ubuntu3
Architecture: amd64
Date: Sun Nov 11 16:26:55 2012
PackageArchitecture: all
ProcEnviron:
 LANGUAGE=en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: initramfs-tools
UpgradeStatus: Upgraded to quantal on 2012-01-08 (308 days ago)

Revision history for this message
Bernd Schubert (aakef) wrote :
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in initramfs-tools (Ubuntu):
status: New → Confirmed
Revision history for this message
cmcginty (casey-mcginty) wrote :

This feature is 4 years old; I can't believe it is still broken. I hit the same issue just now. Why is it so hard to boot a system in degraded mode? It should be a simple variable check to determine whether to keep booting or not. Holy cow.

Revision history for this message
Dominik Zalewski (dominikz) wrote :

I was using this feature successfully with 11.04 alternate. I recently decided to upgrade to 12.04 LTS, and it was frustrating to see that this is not working. I then checked 12.10 and it's also not working.

There was some other bug report

https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/728435

which stated that the problem was in the mdadm package (not initramfs-tools) and that it had been fixed. But apparently this is not the case.

Revision history for this message
Dominik Zalewski (dominikz) wrote :

I just checked that it got broken between the 11.10 and 12.04 releases. So if someone is able to check which versions of mdadm and initramfs-tools these two releases ship, maybe that leads to the solution.

One thing I noticed from a user perspective is that the 11.10 (alternate) installer asked me a question:

  do you want degraded RAID array to start

In 12.04 and 12.10 the installer stopped asking this question. I'm not sure if that's relevant.

Revision history for this message
Francois du Plessis (fdup) wrote :

This is probably a duplicate of
"raid1 boot degraded mode fails " Bug #728435

Revision history for this message
Bernd Schubert (aakef) wrote :

I'm not so sure about that: without "bootdegraded=yes" it drops me to the shell, and I only need to exit from that shell to continue the boot - the md RAID is already assembled by then.
I really don't see a reason why the boot script should assemble differently with or without "bootdegraded=yes". If anything, it should only check for failed devices when "bootdegraded=no". And I also really think the whole boot-to-shell approach should be abandoned entirely in favour of a GUI pop-up.
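
A minimal sketch of what this manual workaround looks like at the (initramfs) busybox prompt, assuming the arrays have already been assembled as described:

# at the (initramfs) busybox prompt
cat /proc/mdstat    # verify the md arrays are assembled (degraded)
exit                # leave the shell; the boot continues with the degraded array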

Steve Langasek (vorlon)
affects: initramfs-tools (Ubuntu) → mdadm (Ubuntu)
Revision history for this message
Denis Manente (denis-manente) wrote :

It also affects Ubuntu Server 13.04 64-bit with a RAID 5 device.
I think this bug should be classified as 'High' importance because it prevents server usage until the RAID device is completely rebuilt.
In my configuration with 3 x 3 TB disks in RAID 5, the complete rebuild can take 8 hours at standard speed.
But even a few hours is a totally unacceptable period in a production environment.

Revision history for this message
Stefan Tauner (stefanct) wrote :

This is *not* a duplicate (of bug #728435), as Bernd wrote.
I am not entirely sure, because in my case I am also suffering from another crypto-related bug, but I think lvm2 might be involved too:

mountroot_fail() in /usr/share/initramfs-tools/scripts/init-premount/lvm2 never returns 0, although scanning for new LVs might actually fix things and should restart the search for the root device (see the sketch after this comment). But as said, I am not sure... initramfs-tools is pretty complicated regarding the interoperability of the various scripts/packages.

Anyway, this needs way more attention... RAID should be used to increase reliability, not downtime :/
Best (as always) would probably be to check whether Debian is also affected and fix it there first.
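
As referenced above, a hypothetical sketch of what such a hook could look like; this is not the shipped lvm2 script, just an illustration of a mountroot_fail() that returns 0 when LVM activation succeeds, so that the search for the root device is retried:

mountroot_fail()
{
    # Hypothetical sketch: try to activate all volume groups and report
    # success (0) so init re-checks for the root device.
    message "Activating LVM volume groups..."
    if lvm vgchange -ay; then
        return 0
    fi
    return 1
}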

Revision history for this message
Brian Morton (rokclimb15) wrote :

I think LVM2 is related here, as Stefan suggests. I have a 2-disk RAID 1 with LVM: 1 PV (the RAID 1), 1 VG, 2 LVs (one for root, one for swap). I experience the same problem as Bernd.

I'm going to attempt the same setup without LVM and see if it works properly.

Revision history for this message
Brian Morton (rokclimb15) wrote :

Rebuilt my server with no LVM, just a 2-drive RAID 1 for root and another for swap, with an ext4 filesystem. Failed each disk individually and rebooted with no problems, then resynced. I think LVM certainly plays a role in this bug.
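
For reference, a sketch of the kind of fail/resync test described here, assuming a two-member RAID 1 at /dev/md0 (device names are illustrative):

# mark one member failed and remove it, then reboot to test the degraded boot
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1

# after a successful degraded boot, re-add the member and watch it resync
mdadm /dev/md0 --add /dev/sdb1
cat /proc/mdstat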

Revision history for this message
Thiago Martins (martinx) wrote :

So, brand new Ubuntu 14.04, and it is impossible to boot it in degraded mode...

---
I just installed a server with:

/dev/sda1 = /dev/md0 = /boot
/dev/sda2 = /dev/md1 = LVM VG: vg01 = LVM LV: swap / LVM LV: root (ext4)

/dev/sdb not present.

Installation finishes without any problem, but the system doesn't boot.

--

If I do not use LVM, like this:

/dev/sda1 = /dev/md0 = /
/dev/sda2 = SWAP

/dev/sdb not present.

Ubuntu also doesn't boot in degraded mode...

I also tried this:

/dev/sda1 = /dev/md0 = /boot
/dev/sda2 = /dev/md1 = /
/dev/sda3 = SWAP

/dev/sdb not present.

Doesn't boot either.

---

This problem has nothing to do with LVM on top of RAID 1...

After adding /dev/sdb and syncing the RAID 1, the system boots okay.

Come on guys, I'm an "Ubuntu Evangelist", but problems like this are a shame. Sorry, I don't want to be rude... Sad, but true.

Regards,
Thiago

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Brian Morton (rokclimb15) & Thiago Martins (martinx), please open new bugs about your issues.

We have automatic testing of RAID installations and degraded-boot testing. However, I do not believe we test actually performing the installation degraded. I will check our test scenarios and test the conditions described above when I get time later this week.

Changed in mdadm (Ubuntu):
assignee: nobody → Dimitri John Ledkov (xnox)
Revision history for this message
Thiago Martins (martinx) wrote :

Okay, I'll do it.

Revision history for this message
Brian Morton (rokclimb15) wrote :

The only reason I did not file a new bug is that it was a system I was sending off for production use, and I stuck with the non-LVM RAID for reliability. I could attempt reproduction on another system, but I'm pretty sure it will happen on any system with the steps I described.

Note that I didn't attempt to install on a degraded RAID. It was already installed. I was just testing the ability of the system to boot degraded. It fails with LVM2.

Revision history for this message
Bernd Schubert (aakef) wrote :

Today a disk did not come up in the morning, and I just noticed that the bug is still not fixed in mdadm 3.2.5-5ubuntu4.1.
In fact, it got worse: "bootdegraded=yes" is now the default (Ubuntu 14.04) and cannot be disabled anymore, so the system stays in an endless loop of "mdadm: CREATE group disk not found" messages. The only way to rescue the system was to boot a rescue system. As I didn't have a spare disk, I had to get the system to boot in degraded mode.
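
For reference, a minimal sketch of the kind of recovery that works from a rescue/live system in this situation, assuming RAID 1 with LVM on top as in the original description (the VG/LV names are illustrative):

# from a rescue shell: start the arrays even though they are degraded
mdadm --assemble --scan --run

# activate the volume groups sitting on top of the md devices, then mount root
vgchange -ay
mount /dev/mapper/vg01-root /mnt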

Below are some diagnostics. Please note that I'm not at all familiar with how the Ubuntu initramfs scripts are assembled from their pieces.

Diagnostic 1) In /usr/share/initramfs-tools/scripts/mdadm-functions

I disabled (commented out) the incremental if-branch ("if mdadm --incremental --run --scan; then"), so that only the assemble mdadm command runs. After re-creating the initramfs and rebooting, the "mdadm: CREATE group disk not found" message was only shown *once*; it then complained that it couldn't find the root partition and dropped to the busybox shell. MUCH BETTER!
Investigating in the shell, I noticed that the md devices had been assembled in degraded mode. Also, running "mdadm --assemble --scan --run" brought up the same disk group message. So it seems to be a bug in mdadm to show this message and to return an error code.
After running "vgchange -ay" I could leave the shell and continue to boot.

Diagnostic 2) I now changed several things as we needed this system to boot up automatically

2.1) I made mountroot_fail *always* execute 'vgchange -ay':

mountroot_fail()
{
    mount_root_res=1
    message "Incrementally starting RAID arrays..."
    if mdadm --incremental --run --scan; then
        message "Incrementally started RAID arrays."
        mount_root_res=0
    else
        if mdadm --assemble --scan --run; then
            message "Assembled and started RAID arrays."
            mount_root_res=0
        else
            message "Could not start RAID arrays in degraded mode."
        fi
    fi

    # note: anyone doing this should probably change it to 'vgchange -ay || true'
    vgchange -ay

    return $mount_root_res
}

2.2) /usr/share/initramfs-tools/scripts/init-premount/mdadm

The mountfail case now exits with 0, not with the exit code of mountroot_fail:

case $1 in
# get pre-requisites
prereqs)
        prereqs
        exit 0
        ;;
mountfail)
        mountroot_fail
        exit 0
        ;;
esac

. /scripts/functions

I think that is all I changed, and the system now boots up in degraded mode like a charm.
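
For completeness, any edits under /usr/share/initramfs-tools only take effect after the initramfs is rebuilt (as noted in Diagnostic 1), for example:

# rebuild the initramfs so the modified scripts are picked up
update-initramfs -u
# or, for all installed kernels:
# update-initramfs -u -k all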

Revision history for this message
Mark Thornton (mthornton-2) wrote :