Booting from a degraded array could be improved

Bug #125471 reported by Reinhard Tartler on 2007-07-12
This bug report is a duplicate of: Bug #120375: cannot boot raid1 with only one disk.
Affects: mdadm (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

Binary package hint: mdadm

From a discussion on ubuntu-devel:

> Scott James Remnant <email address hidden> writes:
> > * md activation:
> > - We now have a single udev rule for both the real system and the
> > initramfs, since doing things differently there will only result in bugs
> > and confusion.
> > - This rule runs "mdadm --assemble --scan --no-degraded", automatically
> > activating any non-degraded device as their components are detected.
>
> Does this mean that booting from a degraded array is no longer possible?
>
> Suppose you have your root filesystem on an array, and one harddrive
> dies. Will the system boot, or will the boot process end in a BusyBox
> shell?
>

The boot process will hang for three minutes and leave you at a BusyBox
shell. This could be improved so that the problem is detected earlier
and the option of getting to the shell (and with suggested commands)
appears earlier.

Once in the shell, you can force the raid to run; this will update the
metadata so that the array can be run degraded automatically in future.

On Thursday 12 July 2007, Reinhard Tartler wrote:
> The boot process will hang for three minutes and leave you at a BusyBox
> shell. This could be improved so that the problem is detected earlier
> and the option of getting to the shell (and with suggested commands)
> appears earlier.
>
> Once in the shell, you can force the raid to run; this will update the
> metadata so that the array can be run degraded automatically in future.

This is no good for those who have only remote access to a machine. For my
fileserver I would need to carry it out from under the stairs and connect a
monitor up etc. Not something I would want to do.

Best Regards

Jools
--
IT Consultant
Oxford Inspire
t: 01235 519446 m: 07966 577498
<email address hidden>

Scott James Remnant <email address hidden> writes:

>> Reported as https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/125471
>>
> This is not a regression though, right? We found nothing that made an
> attempt to force a degraded array to be used at boot.

No, I'd expect it to be triaged as a wishlist bug for enhancement of the
package. I can imagine that especially the Ubuntu server people would be
interested in solving it.

> What suggestion would you make for it?

Basically the solution you suggested: detect root on a degraded array,
and drop to BusyBox with a short explanation of how to boot in degraded
mode.

> Note that removing --no-degraded doesn't work, since that's required to
> actually assemble the array. Without it, it would be always assembled
> degraded since block devices don't turn up at once. We'd need to
> examine the array status to see whether all components have been
> detected, and some marked as discarded.

Okay, so the detection would have to use some timeout value for
assembling the raid after finding/detecting the first drive.
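
The timeout idea above could be sketched roughly as follows. This is only an illustration and not code from mdadm or initramfs-tools: the helper name wait_for_device and the 30 second default are assumptions made up for this example.

```shell
# Hypothetical helper: poll until a device path exists or a timeout expires.
# $1: path to wait for (e.g. /dev/md0)
# $2: timeout in seconds (default 30, an arbitrary value for this sketch)
wait_for_device() {
    dev=$1
    remaining=${2:-30}
    while [ ! -e "$dev" ]; do
        # Give up once the timeout has been used up.
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Possible use in a boot script (sketch only):
#   wait_for_device /dev/md0 30 || echo "md0 did not appear; array may be degraded"
```

As the next quoted reply points out, a timeout alone cannot tell a removed drive from one that is merely slow to show up, so the timeout value is a trade-off rather than a fix.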

> This won't work if a drive is removed; we can't tell the difference
> between that and a drive that just hasn't been inserted or detected yet.

It seems to me that this should be configurable then. The default could
be to abort booting if the array is not complete, with instructions on how
to boot in degraded mode, and to add documentation to the ubuntu-server
guide on how to make the server boot without interaction in degraded mode
if a disk is missing.

The local admin would want to be notified (by email or other means)
about this incident, then.
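
A minimal sketch of what such a configurable default might look like. The setting name BOOT_DEGRADED and the function degraded_policy are assumptions invented for this illustration; nothing in this report says that is how the option is or will be implemented.

```shell
# BOOT_DEGRADED is a made-up setting name for this sketch.
# Prints the action to take when the root array is found incomplete.
degraded_policy() {
    case "${BOOT_DEGRADED:-false}" in
        true|yes|1) echo "run-degraded" ;;
        # Default: stop and let the admin decide at the shell.
        *)          echo "drop-to-shell" ;;
    esac
}

# Sketch of use in a boot script:
#   if [ "$(degraded_policy)" = "run-degraded" ]; then
#       mdadm --run /dev/md0
#   else
#       echo "Root array incomplete. To boot degraded: mdadm --run /dev/md0"
#   fi
```

Notifying the admin is left out of the sketch; it would have to happen once the system is up and able to send mail.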

--
Gruesse/greetings,
Reinhard Tartler, KeyID 945348A4

For the benefit of those stumbling on this bug looking for a work-around, just doing "mdadm --run /dev/md0" (assuming the array in question is /dev/md0) should be sufficient to fail the missing disk and make the array run in degraded mode. You then reboot, and the system should come up.
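
A rough sketch of how that work-around might look at the BusyBox prompt, assuming the array is /dev/md0 as above. The helper degraded_arrays is made up for this example; it only reads status text in /proc/mdstat format and lists arrays that are inactive or whose member map (for example "[U_]") shows a missing disk. What /proc/mdstat actually shows at the prompt may differ between releases.

```shell
# Hypothetical helper: list md arrays that look inactive or degraded.
# $1: file in /proc/mdstat format (defaults to /proc/mdstat)
degraded_arrays() {
    awk '
        # Array header line, e.g. "md0 : active raid1 sda1[0]"
        /^md[0-9]+ :/ {
            name = $1
            if ($3 == "inactive") { print "/dev/" name; name = "" }
            next
        }
        # Status line, e.g. "1048512 blocks [2/1] [U_]"; "_" marks a missing member
        name != "" && /\[[U_]*_[U_]*\]/ { print "/dev/" name; name = "" }
    ' "${1:-/proc/mdstat}"
}

# At the prompt (typed by hand, not run automatically here):
#   degraded_arrays          # see which arrays need attention
#   mdadm --run /dev/md0     # start the array with the disks that are present
#   reboot
```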

Natasha Polovski (mrnoahparker) wrote :

I ran into this same problem. It's going to prevent me from using Ubuntu on servers where I don't have ready access to the console. If it cannot boot unattended in degraded mode, that means my server is down, even though it has an array to use.

In case it's of value, I also encountered this issue when my boot partition was not a RAID array, but other partitions on that same disk were part of an array. It still dropped me to a BusyBox prompt when the array was degraded, even though the boot partition was fine.

It'd be great to have this configurable -- let the system boot with the array degraded.

I'm new here -- how does one check the status of a fix for this bug? Is there a way to vote for its priority? Thanks everyone!

Nick Barcet (nijaba) wrote :

ivoks is having the same problem. Trying to explain over the phone to a customer what to do in that case is no fun.

Changed in mdadm:
importance: Undecided → Medium
status: New → Confirmed

I've been testing both Ubuntu Server and Debian in VMs to decide which to install on a production server.
One of the tests I ran was exactly this: using mirroring for all of the system's partitions, including the OS.
I repeated exactly the same steps for both OSes about three times each.
After each installation, I shut down and removed one of the disks.
Debian booted successfully every time. I reconnected the disk, resynchronized the disks, and everything went OK.
Ubuntu Server failed every time, taking me to a BusyBox shell. I could connect the disk back and boot it up without doing anything else.
Because I had installed Ubuntu Server many more than three times before trying Debian, I have further information about this behavior.
Twice, after reconnecting the disk, I could see the disks resynchronizing, but at about 16% or so I got a kernel panic due to an inability to synchronize the disks in RAID1.

About the VMs:
Running latest VMware Server.
Ubuntu Server 7.10 - i386
Debian 40r3-i386

Hope it is of some use while pursuing a definite solution for this issue.

Best regards,
Fabiano Antunes (mactimes)
<email address hidden>

Peter Funk (pf-artcom-gmbh) wrote :

Natasha Polovski wrote on 2007-12-16:
> It's going to prevent me from using Ubuntu on servers where I don't have ready access to the console.
> If it cannot boot unattended in degraded mode, that means my server is down, even though it has an array to use.

I support this, and it is a very important issue to us.

In Gutsy we applied a dirty patch to the script
    /usr/share/initramfs-tools/scripts/local
by adding a conditional
      # now without the --no-degraded as in /etc/udev/rules.d/85-mdadm.rules
       /sbin/mdadm --assemble --scan
and then reran update-initramfs -k all
to solve the problem temporarily before rollout.

Of course the dirty patch above gets overwritten (and was overwritten)
by the new initramfs-tools package 0.85eubuntu36, which is installed by
an upgrade to 8.04 Hardy. :-(
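
For readers wondering what a conditional of that kind might look like, here is a sketch under stated assumptions. It is not the actual patch: the function name, the check on the root device path, and the MDADM override variable are all made up for illustration. Only the mdadm invocation itself is taken from the comment above.

```shell
# Sketch only, not the actual patch.
# $1: root device path, e.g. /dev/md0 (assumed to be known to the caller)
# MDADM: hypothetical override for the mdadm binary, handy for dry runs
assemble_degraded_if_missing() {
    root_dev=$1
    if [ ! -e "$root_dev" ]; then
        # Root device did not appear: assemble again, this time without
        # --no-degraded, so an incomplete array is started anyway.
        "${MDADM:-/sbin/mdadm}" --assemble --scan
    fi
}

# Dry run, printing the arguments instead of calling mdadm:
#   MDADM=echo assemble_degraded_if_missing /dev/md0
```

Any change of this kind made directly under /usr/share/initramfs-tools has the same drawback described above: a package upgrade replaces the file.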

I am seeking a cleaner solution to this very important problem.
This is a real show stopper:
A system which doesn't boot unattended if one of the mirrored
hard disks is removed due to a failure doesn't make any sense at all.
Why bother adding a second mirror disk and using RAID1, if it doesn't
work without the second disk?

FWIW, we discussed this at UDS yesterday and hope to add a
configurable option for Intrepid, such that a system administrator can
specify whether or not to boot a system on a degraded RAID device.

:-Dustin

Alexander Dietrich (adietrich) wrote :

Bug #120375 contains a potential fix in one of the last comments by Ken.
