Boot fails with degraded mdadm raid

Bug #1635049 reported by Grant Slater on 2016-10-19
This bug affects 25 people
Affects | Status | Importance | Assigned to | Milestone
mdadm (Debian) | Fix Released | Unknown | |
mdadm (Ubuntu) | | High | Dimitri John Ledkov |
Xenial | | High | Dimitri John Ledkov |

Bug Description

[Impact]

 * Systems fail to boot when mdadm arrays are in certain states (for example, degraded), requiring manual recovery or array assembly

 * Backport of boot logic from yakkety

[Test Case]

 * Install a system with RAID1 on two hard drives and boot the system with the array in sync.
 * Shut down.
 * Disconnect one of the drives and boot; the array is now unexpectedly degraded.
 * The boot should complete.
 * Shut down and boot again, this time expecting the degraded state.
 * The boot should complete.
 * Shut down, reconnect the disconnected drive, and boot again.
 * The boot should complete, the device should be added back to the array, and the array should resync, leaving the system with the array in sync, just as at the beginning of the test case.
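
One way to confirm the array state after each boot in the test case is to read /proc/mdstat. The following is only a minimal sketch, not part of the fix: it assumes the usual mdstat layout in which each array's status line carries an "[expected/active]" member count, and the function name `degraded_arrays` and the sample text are made up for illustration.

```python
import re


def degraded_arrays(mdstat_text):
    """Return names of md arrays with fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # Array header lines look like: "md0 : active raid1 sdb1[1] sda1[0]"
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The following status line contains "[expected/active]", e.g. "[2/1] [U_]"
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                degraded.append(current)
            current = None
    return degraded


# Illustrative sample only; real output comes from /proc/mdstat.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sda1[0]
      1046528 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sdb2[1] sda2[0]
      20954112 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a real system the text would be read with `open("/proc/mdstat").read()`; an empty result after the final step would indicate that every array is back in sync.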

[Regression Potential]

 * Systems may continue to fail to boot degraded.

[Other Info]

 * Original report

mdadm does not attempt to start partial md devices (incremental assembly) in the initramfs, so the system can drop to the initramfs prompt instead of booting if the root filesystem is on md.

http://askubuntu.com/questions/789953/how-to-enable-degraded-raid1-boot-in-16-04lts

Fixed in debian mdadm 3.4-2: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=784070

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mdadm (Ubuntu):
status: New → Confirmed
tags: added: regression-release xenial
Brian Murray (brian-murray) wrote :

This is fixed in 16.10 and Zesty, but I'm leaving the task open so the bug doesn't disappear from some LP searches.

Changed in mdadm (Ubuntu Xenial):
assignee: nobody → Dimitri John Ledkov (xnox)
importance: Undecided → High
status: New → Triaged
Changed in mdadm (Ubuntu):
status: Confirmed → Triaged
Changed in mdadm (Debian):
status: Unknown → Fix Released
Changed in mdadm (Ubuntu):
importance: Undecided → High
assignee: nobody → Dimitri John Ledkov (xnox)
Brian Murray (brian-murray) wrote :

Dimitri - Do you have any plans to get this fixed in Xenial?

Changed in mdadm (Ubuntu):
milestone: none → ubuntu-17.02
chrone (chrone81) wrote :

Will there be any backport patch for Ubuntu 16.04.2?

Just tested this yesterday: Ubuntu 16.04.2 with the latest updates still could not boot from the secondary drive.

My test was done on Linux mdadm RAID1 with LVM. (/dev/md0 for /boot xfs, and /dev/md1 for LVM with /root xfs and swap).

description: updated
description: updated
Dimitri John Ledkov (xnox) wrote :

Proposing the following fix:

mdadm (3.3-2ubuntu7.2) xenial; urgency=medium

  * Backport initramfs changes from 3.4-4, to improve reliability of
    booting with degraded arrays. LP: #1635049

  * debian/initramfs/hook:
    - Fix UUID= grep for configured RAIDs to be case insensitive.
    - Drop CREATE stanzas from mkconf and don't include them in the
    initramfs. The generated defaults, are the compiled-in defaults. And
    the current one generates warnings when running mdadm in the
    initramfs, as there is no passwd|group files to resolve root/disk
    uid/gid.
  * debian/initramfs/script.local-block|script.local-bottom:
    - Use local-block integration scripts, in favor of root-fail hooks to
    activate incomplete arrays.
  * debian/initramfs/init-premount|mdadm-functions:
    - Drop, no longer in use.

 -- Dimitri John Ledkov <email address hidden> Mon, 20 Feb 2017 10:57:43 +0000
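
The "case insensitive UUID= grep" item in the changelog above can be illustrated with a short sketch. The real hook is a shell script and its exact logic is not shown in this report; the Python below only demonstrates the idea of matching a configured array UUID regardless of letter case. The function names and the sample configuration are hypothetical, and comment handling is simplified to skipping lines that begin with "#".

```python
import re


def configured_uuids(conf_text):
    """Collect UUID= values from mdadm.conf-style text, normalised to lower case."""
    uuids = set()
    for line in conf_text.splitlines():
        # Simplification: ignore lines whose first non-blank character is '#'.
        if line.lstrip().startswith("#"):
            continue
        # Match the keyword itself case-insensitively (UUID=, uuid=, Uuid=).
        for match in re.finditer(r"\buuid=(\S+)", line, flags=re.IGNORECASE):
            uuids.add(match.group(1).lower())
    return uuids


def is_configured(conf_text, uuid):
    """Return True if the given array UUID appears in the config, ignoring case."""
    return uuid.lower() in configured_uuids(conf_text)


# Hypothetical configuration text, for illustration only.
SAMPLE_CONF = """\
# definitions of existing MD arrays
ARRAY /dev/md0 metadata=1.2 UUID=ABCDEF12:3456789A:BCDEF012:3456789a
"""

if __name__ == "__main__":
    print(is_configured(SAMPLE_CONF, "abcdef12:3456789a:bcdef012:3456789a"))
```

A case-sensitive comparison would treat the upper-case UUID in the configuration and the lower-case one passed in as different arrays, which is the kind of mismatch the changelog entry describes fixing.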

It is available from Bileto:
https://bileto.ubuntu.com/#/ticket/2500

Published in this ephemeral PPA:
https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2500

Changed in mdadm (Ubuntu):
status: Triaged → Fix Released
Changed in mdadm (Ubuntu Xenial):
status: Triaged → In Progress
milestone: none → xenial-updates
Dragan S. (dragan-s) wrote :

PPA located at:
https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2500

Fixes my degraded raid boot issues. System boots up fine after this PPA is applied.

Łukasz Zemczak (sil2100) wrote :

I accepted the package from the queue to -proposed, but since it was a sync I couldn't do it through the proper tooling (I think I need to learn how to do it properly for syncs). Anyway, this means that there was no auto-release message. Please install the package from xenial-proposed, test it, and mark the bug as verification-done as with any other SRU bug.

Thank you!

tags: added: verification-needed
Changed in mdadm (Ubuntu Xenial):
status: In Progress → Fix Committed
Adam Blomberg (paradox606) wrote :

Hi Lukasz and Dragan, I just received confirmation from a customer that the proposed package also fixed the issue on their IBM POWER environment.

This was using the ci-train-ppa-service PPA, however, so I'll check again using the package from the proper proposed repository.

Dragan S. (dragan-s) on 2017-02-23
tags: added: verification-done
removed: verification-needed
Brian Murray (brian-murray) wrote :

Was the "check again using the proper proposed repository package" made?

tags: added: verification-needed
removed: verification-done

I can do that. However, I am not sure that is necessary, since this is a
sync from the Bileto PPA and the binaries are therefore identical.

Dimitri John Ledkov (xnox) wrote :

Retested without the PPA, using only 3.3-2ubuntu7.2 from xenial-proposed.

tags: added: verification-done
removed: verification-needed
Adam Blomberg (paradox606) wrote :

I have also validated the proposed package on my reproducer sandbox, and it worked correctly.
Still awaiting feedback from the customer with the ppc64le system; as soon as I hear back I will let you know.

-Adam

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mdadm - 3.3-2ubuntu7.2

---------------
mdadm (3.3-2ubuntu7.2) xenial; urgency=medium

  * Backport initramfs changes from 3.4-4, to improve reliability of
    booting with degraded arrays. LP: #1635049

  * debian/initramfs/hook:
    - Fix UUID= grep for configured RAIDs to be case insensitive.
    - Drop CREATE stanzas from mkconf and don't include them in the
    initramfs. The generated defaults, are the compiled-in defaults. And
    the current one generates warnings when running mdadm in the
    initramfs, as there is no passwd|group files to resolve root/disk
    uid/gid.
  * debian/initramfs/script.local-block|script.local-bottom:
    - Use local-block integration scripts, in favor of root-fail hooks to
    activate incomplete arrays.
  * debian/initramfs/init-premount|mdadm-functions:
    - Drop, no longer in use.

 -- Dimitri John Ledkov <email address hidden> Mon, 20 Feb 2017 10:57:43 +0000

Changed in mdadm (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for mdadm has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Eero (eero+launchpad) wrote :

I just made a fresh install from ubuntu-16.04.2-server-amd64.iso, updated everything, and tried booting without one disk. It failed. See the attachment for my RAID configuration.

https://imgur.com/a/RApJS

On 5 April 2017 at 07:28, Eero <email address hidden> wrote:
> I just made a fresh install from ubuntu-16.04.2-server-amd64.iso,
> updated everything, and tested to boot without one disk. It failed. See
> the attachment for my RAID configuration.
>
> https://imgur.com/a/RApJS
>

Please open a new bug report, instead of piling onto an unrelated report.

And your boot is waiting for you to unlock the encrypted volume...
only after which the volume groups will be detected.

I do not see anything degraded in your case at all.

Note that your test case is completely different from this bug report, as it
also involves an encrypted volume.

Regards,

Dimitri.

Eero (eero+launchpad) wrote :

> And your boot is waiting for you to unlock the encrypted volume...
> only after which the volume groups will be detected.

Why do you lie? You didn't even look at what I reported?

The first screenshot clearly shows that the boot fails before password is even asked.
The second screenshot shows that the password is asked when I attached the drive again.

I've submitted a bug report regarding this to Ubuntu and Debian, but nobody seems to care. It's 2017 and RAID 1 doesn't work...

https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1680448
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=859691

Dimitri John Ledkov (xnox) wrote :

On 10 April 2017 at 16:57, Eero <email address hidden> wrote:
>> And your boot is waiting for you to unlock the encrypted volume...
>> only after which the volume groups will be detected.
>
> Why do you lie? You didn't even look at what I reported?
>

File a new bug report with text logs.... not photographs / screenshots.

> The first screenshot clearly shows that the boot fails before password is even asked.
> The second screenshot shows that the password is asked when I attached the drive again.
>
> I've submitted a bug report regarding this to Ubuntu and Debian, but
> nobody seems to care. It's 2017 and RAID 1 doesn't work...
>
> https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1680448
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=859691

--
Regards,

Dimitri.

Eero (eero+launchpad) wrote :

> File a new bug report with text logs.... not photographs / screenshots.

And this is a perfect example why Ubuntu is a piece of garbage. Someone reports a serious issue with Ubuntu, but the whole bug gets dismissed, because the reports are in a "wrong" format.

How do you even get early boot logs out of Ubuntu when the boot fails? You didn't even mention that.

Is there some way to contact professionals at Canonical instead of these amateur wise asses on this platform?

Eero (eero+launchpad) wrote :

From Ubuntu's own documentation https://wiki.ubuntu.com/DebuggingKernelBoot:

> If you are unable to capture a log file, a digital photo will work just as well.

Hopefully someone competent will see these messages at some point.

This is still broken in Xenial with mdadm 3.3-2ubuntu7.2

It seems that this is still somehow broken with Xubuntu 17.04

I just tried mdadm within Xubuntu 17.04 in Virtualbox using the following setup:
1 virtual disk containing the OS.
2 virtual disks running as a RAID 1.
Whenever I disconnect one of the RAID disks, Xubuntu cannot boot.

I saved the script shown at
https://askubuntu.com/questions/789953/how-to-enable-degraded-raid1-boot-in-16-04lts
as the file /usr/share/initramfs-tools/scripts/local-top/mdadm.
After that, Xubuntu boots properly with the degraded RAID.

Is this the intended behaviour?
Thanks

Eero (eero+launchpad) wrote :

Is this bug report abandoned?

And why is the bug assigned to Dimitri John Ledkov? He is obviously incompetent. In comment number 18 he even claimed that photographs and screenshots aren't acceptable in bug reports even though Ubuntu's website clearly states otherwise.

Kevin Lyda (lyda) wrote :

Hey there kids, this bug still appears to be relevant for 16.04. My /dev/sda died today and I'm prepping to replace the disk. I note that the answer here https://askubuntu.com/a/798213/185653 notes the missing file and it's still missing.

I haven't tried a reboot as I'm waiting for the monthly check to complete on the other array but I've added in that script and rebuilt my initramfs and installed grub on sdb. Will let folks know how I get on.

The tone and comments in this bug from some reporters and others is... poor. I would have hoped for better.

Kevin Lyda (lyda) wrote :

This did not work, but I'm not clear why. It might have been me messing up the grub install.
