Bug #120375 “cannot boot raid1 with only one disk” : Bugs : mdadm package : Ubuntu

Revision history for this message

peterh (peter-holik) wrote on 2007-06-14:

#1

Sorry folks, being tooo fast.

This is not working if all discs are present, now i understand the include of MD_DEGRADED_ARGS

my new workaround is to add a bootmenue entry in grub

title Ubuntu, kernel 2.6.20-16-generic (raid defect)
root (hd0,1)
kernel /boot/vmlinuz-2.6.20-16-generic root=/dev/md1 ro raid_degraded
initrd /boot/initrd.img-2.6.20-16-generic

and /etc/initramfs-tools/scripts/init-premount/raid_degraded

#!/bin/sh

set -eu

PREREQ="udev"

prereqs()
{
echo "$PREREQ"
}

case ${1:-} in
  prereqs)
    prereqs
    exit 0
    ;;
  *)
    . /scripts/functions
    ;;
esac

if [ -e /scripts/local-top/md ]; then
log_warning_msg "old md initialisation script found, getting out of its way..."
exit 1
fi

MDADM=$(command -v mdadm)
[ -x $MDADM ] || exit 0

if grep raid_degraded /proc/cmdline 2>/dev/null; then
echo "MD_DEGRADED_ARGS=' '" >> /conf/md.conf
fi

exit 0

now if a disc is defect and i will to start only with one disc i choose raid defect from bootmenu

Revision history for this message

Peter Haight (peterh-sapros) wrote on 2007-08-18:

#2

Patch to boot after timeout using degraded RAID drives. Edit (1.1 KiB, text/plain)

I've got a different way to fix this. After reading Bug #75681, it became clear why they have MD_DEGRADED_ARGS in there. They have that because /scripts/local-top/mdadm gets called every time a device marked raid gets called, but they only want mdadm to build the array once all the devices have come up.

So I've stuck a line in the startup script that tries to mount root and if mounting root times out, then we try to run mdadm again, but this time we let it try and run with degraded disks. This way it will still startup automatically in the presence of RAID failures.

Revision history for this message

Peter Haight (peterh-sapros) wrote on 2007-08-18:

#3

So, there is something wrong with that patch. Actually it seems to be working great, but when I disconnect a drive to fail it, it boots up immediately instead of trying mdadm after the timeout. So I'm guessing that the mdadm script is getting called without the from-udev parameter somewhere else. But it is working in some sense because the machine boots nicely with one of the RAID drives disconnected, or with both of them properly setup. So there might be some race problem with this patch.

Revision history for this message

brunus (reg-paolobrunello) wrote on 2007-11-15:

#4

Hey Peter, when I try to apply your patch I get questioned where the /scripts/local file is located. I coulnd't find it. Could you please specify it better? thanks ,

brunus

Revision history for this message

peterh (peter-holik) wrote on 2007-11-15: Re: [Bug 120375] Re: cannot boot raid1 with only one disk

#5

> Hey Peter, when I try to apply your patch I get questioned where the
> /scripts/local file is located. I coulnd't find it. Could you please
> specify it better? thanks ,

this patch is not from me - ist is from the other Peter in this bugreport

cu Peter

Revision history for this message

brunus (reg-paolobrunello) wrote on 2007-11-16:

#6

Hey Peter Haight,
brunus again. I can't find any mdadm file in /usr/share/initramfs-tools/scripts/local-top so the patch just hangs. I'm using Edubuntu 7.10 btw: is this patch just for Feisty?

thanks,

Paolo

Revision history for this message

Peter Haight (peterh-sapros) wrote on 2007-11-16:

#7

I've only tried it on fiesty. I don't have a gutsy machine handy. You have the 'mdadm' package installed, right? If so, then they must have moved stuff around. You could try this:

find / -type f -print0 | xargs -0 grep MD_DEGRADED_ARGS

That will search the whole file system for the a file with MD_DEGRADED_ARGS in it, but if they've moved stuff around, the patch probably won't work anyway.

Revision history for this message

Daniel Pittman (daniel-rimspace) wrote on 2007-11-16:

#8

Peter Haight <email address hidden> writes:

> I've only tried it on fiesty. I don't have a gutsy machine handy. You
> have the 'mdadm' package installed, right? If so, then they must have
> moved stuff around. You could try this:

The model for starting mdadm disks has changed substantially in gutsy;
it is now driven from a udev rule, building the devices as disks are
discovered.

Regards,
Daniel
--
Daniel Pittman <email address hidden> Phone: 03 9621 2377
Level 4, 10 Queen St, Melbourne Web: http://www.cyber.com.au
Cybersource: Australia's Leading Linux and Open Source Solutions Company

Revision history for this message

brunus (reg-paolobrunello) wrote on 2007-11-19:

#9

Thanks for the information Daniel,
but it's still unclear to me whether this problem has been solved yet.

brunus

Revision history for this message

Fumihito YOSHIDA (hito) wrote on 2007-12-03:

#10

Dear brunus,

The problem is not solved in 7.04/7.10.

You have to use "raid_degraded" grub entry with /etc/initramfs-tools/scripts/init-premount/raid_degraded way.
(and update your initramfs, execute "sudo update-initramfs -u")

And, I test with 7.10, mdadm_2.6.2-1ubuntu2_<arch>.deb(7.10's) *does not work* in this way.
When you use 7.10, please down-grade mdadm package.

Use 7.04's mdadm package: "mdadm_2.5.6-7ubuntu5_<arch>.deb".

Revision history for this message

Davias (davias) wrote on 2007-12-14:

#11

Dear Fumihito,
Thank you for your help in this very annoying bug with ubuntu (so long for the "new&improved" version)...

I'm running 7.10, tried the above procedure and realized (as you stated) that mdadm_2.6.2-1ubuntu2 DOES NOT work on 7.10 with grub trick.

3 questions:

1) How to degrade to mdadm_2.5.6-7ubuntu5_<arch>.deb as you suggest? In Synaptic packet manager the option "force" is not available...

2) I'm a little bit scared of running a previous version of mdadm on my RAID1 created with 2.6.2: is it safe?

3) Since this is just a workaround and not a solution (the system should automatically start without user selecting a grub option), shall I wait for a new mdadm?

Thanks in advance for yours and everybody's help

Revision history for this message

Fumihito YOSHIDA (hito) wrote on 2007-12-14:

#12

Dear Davias,

1)
I suppose that you use 386, when you use another one(e.g. amd64),
please use your arch's pkg.

- Please download from http://archive.ubuntu.com/ubuntu/pool/main/m/mdadm/
$ wget http://archive.ubuntu.com/ubuntu/pool/main/m/mdadm/mdadm_2.5.6-7ubuntu5_i386.deb

- Install with dpkg command.(in your gnome-terminal)
$ sudo dpkg -i mdadm_2.5.6-7ubuntu5_i386.deb
In this case, we cannot depend on Synaptic.
Downloading and dpkg exec is not a well-mannered procedure, but useful.

- And, "hold" this package. This command is important. If you not set "hold",
update-manager will upgrade mdadm package...(and break your effort).
$ sudo dpkg-hold mdadm

2)
I cannot assert about your concern, but I tested in some case,
system was working well.
If you have worry, re-run mdadm settings with 2.5.6-7ubuntu5.

3)
hmm... This is a difficult quesition.
Old version of mdadm is working well, but I cant understood any reason...
So I cant tell you how you had better do it.

Revision history for this message

Davias (davias) wrote on 2007-12-14:

#13

Dear Fumihito,
first of all, thanks for your fast reply!

1) No, ?No, I'm on amd64. I just went to the archive ubuntu site you suggested and found out that the following exists:

mdadm-udeb_2.6.3+200709292116+4450e59-3ubuntu1_amd64.udeb 13-Dec-2007 01:04 76K
mdadm-udeb_2.6.3+200709292116+4450e59-3ubuntu1_i386.udeb 13-Dec-2007 00:04 73K
mdadm-udeb_2.6.3+200709292116+4450e59-3ubuntu1_powerpc.udeb 13-Dec-2007 01:04 85K
mdadm-udeb_2.6.3+200709292116+4450e59-3ubuntu1_sparc.udeb 13-Dec-2007 01:04 91K

...it seems a fresh new 2.6.3 version of mdadm - maybe it cures the bug ?!?

2) First time I deal with mdadm RAID, but from experiences on other OS with SW RAID, I learned (the hard way) that it is safe not to mess around with driver version different than the ones that created the arrays. But if I have no alternatives... I will try.

3) I tried to find details of this new 2.6.3, but found none. Good common sense makes me think that the mantainer od mdadm was aware of the bug and solved it in this new version, making the all "scripting from grub menu" solution unnecessary...

Suggestions?

Revision history for this message

Davias (davias) wrote on 2007-12-15:

#14

Searching, I found out that latest release is 2.6.4 - but no option in ubuntu repository (yet?).

ANNOUNCE: mdadm 2.6.4 - A tool for managing Soft RAID under Linux
From: Neil Brown <email address hidden>
To: <email address hidden>
Subject: ANNOUNCE: mdadm 2.6.4 - A tool for managing Soft RAID under Linux
Date: Fri, 19 Oct 2007 16:06:29 +1000
Message-ID: <email address hidden>
Archive-link: Article, Thread

I am pleased to announce the availability of
mdadm version 2.6.4

It is available at the usual places:
http://www.cse.unsw.edu.au/~neilb/source/mdadm/

Does some of the following changes applies to our bug?

Changes Prior to 2.6.4 release
    - Make "--create --auto=mdp" work for non-standard device names.
    - Fix restarting of a 'reshape' if it was stopped in the middle.
    - Fix a segfault when using v1 superblock.
    - Make --write-mostly effective when re-adding a device to an array.
    - Various minor fixes

Changes Prior to 2.6.3 release
    - allow --write-behind to be set for --grow.
    - When adding new disk to an array, don't reserve so much bitmap
        space that the disk cannot store the required data. (Needed when
1.x array was created with older mdadm).
    - When adding a drive that was a little too small, we did not get
the correct error message.
    - Make sure that if --assemble find an array in the critical region
of a reshape, and cannot find the critical data to restart the
reshape, it gives an error message.
    - Fix segfault with '--detail --export' and non-persistent
        superblocks.
    - Various manpage updates.
    - Improved 'raid4' support (--assemble, --monitor)
    - Option parsing fixes w.r.t -a
    - Interpret "--assemble --metadata=1" to allow any version 1.x
metadata, and be more specific in the "metadata=" message printed
with --examine --brief
    - Fix spare migration in --monitor.

Changes Prior to 2.6.2 release
    - --fail detached and --remove faulty can be used to fail and
remove devices that are no longer physically present.
    - --export option for --detail or present information in a format
that can be processed by udev.
    - fix internal bitmap allocation problems with v1.1, v1.2 metadata.
    - --help now goes to stdout so you can direct it to a pager.
    - Various manpage updates.
    - Make "--grow --add" for linear arrays really work.
    - --auto-detect to trigger in-kernel autodetect.
    - Make return code for "--detail --test" more reliable. Missing
devices as well as failed devices cause an error.

Searching, I found out that latest release is 2.6.4 - but no option in ubuntu repository (yet?).

ANNOUNCE: mdadm 2.6.4 - A tool for managing Soft RAID under Linux
From: 	 	Neil Brown <neilb@suse.de>
To: 	 	linux-raid@vger.kernel.org
Subject: 	 	ANNOUNCE: mdadm 2.6.4 - A tool for managing Soft RAID under Linux
Date: 	 	Fri, 19 Oct 2007 16:06:29 +1000
Message-ID: 	 	<18200.18789.190714.387367@notabene.brown>
Archive-link: 	 	Article, Thread

I am pleased to announce the availability of
   mdadm version 2.6.4

It is available at the usual places:
   http://www.cse.unsw.edu.au/~neilb/source/mdadm/

Does some of the following changes applies to our bug?

Changes Prior to 2.6.4 release
    -   Make "--create --auto=mdp" work for non-standard device names.
    -   Fix restarting of a 'reshape' if it was stopped in the middle.
    -   Fix a segfault when using v1 superblock.
    -   Make --write-mostly effective when re-adding a device to an array.
    -   Various minor fixes

Changes Prior to 2.6.3 release
    -   allow --write-behind to be set for --grow.
    -   When adding new disk to an array, don't reserve so much bitmap
        space that the disk cannot store the required data. (Needed when
	1.x array was created with older mdadm).
    -   When adding a drive that was a little too small, we did not get
	the correct error message.
    -   Make sure that if --assemble find an array in the critical region
	of a reshape, and cannot find the critical data to restart the
	reshape, it gives an error message.
    -   Fix segfault with '--detail --export' and non-persistent
        superblocks.
    -   Various manpage updates.
    -   Improved 'raid4' support (--assemble, --monitor)
    -   Option parsing fixes w.r.t -a
    -   Interpret "--assemble --metadata=1" to allow any version 1.x
	metadata, and be more specific in the "metadata=" message printed
	with --examine --brief
    -   Fix spare migration in --monitor.

Changes Prior to 2.6.2 release
    -   --fail detached and --remove faulty can be used to fail and
	remove devices that are no longer physically present.
    -   --export option for --detail or present information in a format
	that can be processed by udev.
    -   fix internal bitmap allocation problems with v1.1, v1.2 metadata.
    -   --help now goes to stdout so you can direct it to a pager.
    -   Various manpage updates.
    -   Make "--grow --add" for linear arrays really work.
    -   --auto-detect to trigger in-kernel autodetect.
    -   Make return code for "--detail --test" more reliable.  Missing
	devices as well as failed devices cause an error.

Revision history for this message

brunus (reg-paolobrunello) wrote on 2007-12-15:

#15

Davias,
have you tried any of the 2 releases?

brunus

Revision history for this message

Davias (davias) wrote on 2007-12-16:

#16

Dear brunus,
no: I have not. I was thinking about using Fumihito procedure with 2.5.6, but then I discovered that 2.6.3 existed in ubuntu repository (although Synaptic packet manager does not find any update over 2.6.2) and was thinking about installing that. Then I discovered 2.6.4 is the latest mdadm version as shown above. I downloaded the source... and stopped.

I do not have enough knowledge to compile & install something as critical as driver for RAID (my RAID with data on it...); is not that I'm scared of it, but is my production machine and cannot risk data failure or restore time.

Also I'm not convinced of the results. If this is to just result in a "so-so" procedure like selecting "foulty drive" from grub menu... I'll wait for a cleaner solution - like stable and safe mdadm driver release that will make my RAID1 start with only 1 disk WITHOUT me having to do anything, like it should be.

Thanks all for your precious thinking.

Dave out

Revision history for this message

dbendlin (diego-bendlin-hotmail) wrote on 2007-12-17:

#17

Well,

I went down to the root and installed Debian etch and tried mdadm and it worked just ok.

Regards,

Diego Bendlin

Revision history for this message

Davias (davias) wrote on 2007-12-17:

#18

Meaning what, exactly?

You replaced ubuntu with debian?

Revision history for this message

dbendlin (diego-bendlin-hotmail) wrote on 2007-12-18:

#19

Davias,

What id did is I replaced an ubuntu server 7.10 with a debian etch installation where I setup raid 1 using two sata disks, and after disconnecting one of the raid members System still was able to boot.

AFAIK ubuntu is debian's child, so I needed to try if the parent reproduced the error too, just to give you some more information to finally fix this issue on ubuntu 8.04 maybe.

Kind Regards,

Diego Bendlin

Revision history for this message

Davias (davias) wrote on 2007-12-18:

#20

Dear Diego,
thank you for clarifying matters, I'm glad that it works for you.

So now you got RAID1 as it is supposed to run. Just by changing OS...
Is it that difficult to get it to work on Ubuntu?!?
I mean, it is a serious bug - not dependent on the SW component but on OS version - and no solution?
I thought Ubuntu was one of the most supported distributions... We have to wait for another release?

Revision history for this message

dbendlin (diego-bendlin-hotmail) wrote on 2007-12-19:

#21

Well,

Im not a linux guru, soi I don't really understand why this is happening on Ubuntu (I find the same error since 7.04), I guess this could be issued a a bug fix to the current release but that will depend on the ubuntu development team.

In my opinion Ubuntu is great for desktop usage, I use it as dayly desktop and development workstation, I really like Ubuntu as desktop, haven't found a competitor thats is just that easy to install and setup, I know other distros are also great but normally you need a lot of expertise and configuration time to make it work as Ubuntu does out of the box.

As for the Ubuntu server release I must agree that this is a serious bug, for now I wouldn't install an ubuntu server if it needs to work over software raid (mdadm), after spending 2 weeks trying to make Ubuntu server work I finally switched to debian etch for my server installation.

Kind Regards,

Diego Bendlin

Revision history for this message

Jan Krupa (plnt) wrote on 2007-12-25:

#22

Hi,

The problem can be workarounded by issuing this command on the BusyBox shell when Ubuntu is missing one of the RAID disks:

/sbin/mdadm --assemble --scan

+ reboot

It will remove the missing disk from RAID 1 and allow Ubuntu to boot in degraded mode next time.

I think the root cause of the problem is in forcing mdadm not to start in degraded mode by "--no-degraded" parameter in /etc/udev/rules.d/85-mdadm.rules. If you remove "--no-degraded" parameter from mdadm in /etc/udev/rules.d/85-mdadm.rules and rerun "sudo update-initramfs -u", Ubuntu doesn't refuse to boot even if one of the disks is missing (after this change, no workarounds are needed). The problem is that it starts in degraded mode in some cases even if both disks are present.

Tested on Ubuntu/Gutsy.

I appreciate any comments.

Thanks,
Jan

Revision history for this message

Ken (ksemple) wrote on 2007-12-28:

#23

Hi,

I have been following this post for a couple of weeks and trying to solve this problem as a background job for around a month. So far I have tried many things to overcome this problem and the comment by Plnt seemed to be the most promising.

I tried issuing the --assemble --scan from the BusyBox shell as suggested with no luck, the system still wouldn't boot even though I was able to activate my degraded RAID sets in BusyBox.

I found the same problem when modifying the udev rules. Often the arrays would start degraded even when all the disks were available. I think the solution to this problem lies in modifying the udev rules, maybe we could add some code after the --no-degraded start attempt to start the arrays degraded if they haven't already started.

In my view this is a major problem, there is no point using a RAID1 root disk if you can't boot from a single disk if it's mirror fails.

Cheers,
Ken

Revision history for this message

Jan Krupa (plnt) wrote on 2007-12-28:

#24

Hi Ken,

I wasn't able to boot from degraded array after running "/sbin/mdadm --assemble --scan" in few cases when I had other disks in my computer. If I disconneted the disks and attached just the working one, system booted without a problem in degraded mode (after running the command mentioned above). I think the reason is that mdadm scans for any RAID devices by their signatures on the disk (because there is no /etc/raidtab accessible) and maybe it finds the singatures in different order each time.

There is also "--run" parameter in mdadm which can help assembling RAID in degraded mode.

Sorry for non-detailed description but I currently don't have the computer with Ubuntu+RAID1 physically with me so I can't do the tests.

Jan

Revision history for this message

Davias (davias) wrote on 2007-12-28:

#25

Dear Plnt & Ken,
thanks for providing & trying solutions to this "major problem" as it looks to me too.

But have any of you tried mdadm version 2.6.4, that I found around, as I stated a few posts up?

Regards,
Dave

Revision history for this message

Peter Haight (peterh-sapros) wrote on 2007-12-28:

#26

I haven't messed around with the problem in Gutsy, but what Daniel said above about the difference between Fiesty and Gusty is not correct. Fiesty was also launching mdadm as devices were discovered from udev and that is exactly the problem. Both are pretty much the same, just the code has moved around some. Unfortunately the box I fixed this on is now in production and I haven't had the chance to setup another test one to port my fix to Gutsy, so I'll explain what's going on, and maybe someone else can fix it. The version of mdadm doesn't have anything to do with this problem. This problem is entirely due to the Ubuntu startup scripts.

What Ubuntu is doing is as each device gets discovered by udev it runs:

mdadm --assemble --scan --no-degraded

If your RAID is made up of say sda1 and sdb1, then when 'sda1' is discovered by Linux, udev runs 'mdadm --assemble --scan --no-degraded'. Mdadm tries to build a RAID device using just 'sda1' because that's the only drive discovered so far. It fails because of the '--no-degraded' flag which tells it to not assemble the RAID unless all of the devices are present. If it didn't include the '--no-degraded' flag, it would assemble the RAID in degraded mode. This would be bad because at this point we don't know if 'sdb1' is missing or it just hasn't been discovered by udev yet.

So, then Linux chugs along and finds 'sdb1', so it calls 'mdadm --assemble --scan --no-degraded' again. This time both parts of the RAID (sda1 and sdb1) are available, so the command succeeds and the RAID device gets assembled.

This all works great if all the RAID devices are working, but since it allways runs mdadm with the '--no-degraded' option, it won't assemble the RAID device if say 'sda1' is broken or missing.

My solution was to wait until mounting root failed due to a timeout and then try 'mdadm --assemble --scan' without '--no-degraded' to see if we can assemble a degraded RAID device. Hopefully by the time the root mount has timed out, Linux has discovered all of the disks that it can. This works on my Fiesty box, but as I said above, stuff got moved around for Gutsy and I haven't had a chance to build another box to try it out and fix Gutsy. Also I think my script didn't take into account the scenario where the RAID device isn't root.

I haven't messed around with the problem in Gutsy, but what Daniel said above about the difference between Fiesty and Gusty is not correct. Fiesty was also launching mdadm as devices were discovered from udev and that is exactly the problem. Both are pretty much the same, just the code has moved around some. Unfortunately the box I fixed this on is now in production and I haven't had the chance to setup another test one to port my fix to Gutsy, so I'll explain what's going on, and maybe someone else can fix it. The version of mdadm doesn't have anything to do with this problem. This problem is entirely due to the Ubuntu startup scripts.

What Ubuntu is doing is as each device gets discovered by udev it runs:

mdadm --assemble --scan --no-degraded

If your RAID is made up of say sda1 and sdb1, then when 'sda1' is discovered by Linux, udev runs 'mdadm --assemble --scan --no-degraded'. Mdadm tries to build a RAID device using just 'sda1' because that's the only drive discovered so far. It fails because of the '--no-degraded' flag which tells it to not assemble the RAID unless all of the devices are present. If it didn't include the '--no-degraded' flag, it would assemble the RAID in degraded mode. This would be bad because at this point we don't know if 'sdb1' is missing or it just hasn't been discovered by udev yet.

So, then Linux chugs along and finds 'sdb1', so it calls 'mdadm --assemble --scan --no-degraded' again. This time both parts of the RAID (sda1 and sdb1) are available, so the command succeeds and the RAID device gets assembled.

This all works great if all the RAID devices are working, but since it allways runs mdadm with the '--no-degraded' option, it won't assemble the RAID device if say 'sda1' is broken or missing.

My solution was to wait until mounting root failed due to a timeout and then try 'mdadm --assemble --scan' without '--no-degraded' to see if we can assemble a degraded RAID device. Hopefully by the time the root mount has timed out, Linux has discovered all of the disks that it can. This works on my Fiesty box, but as I said above, stuff got moved around for Gutsy and I haven't had a chance to build another box to try it out and fix Gutsy. Also I think my script didn't take into account the scenario where the RAID device isn't root.

Revision history for this message

Ken (ksemple) wrote on 2007-12-28:

#27

Hi,

Plnt, I am currently re-syncing my RAID set and will then try again with my other drives disconnected. I tried removing my other disks at one stage, but can't recall whether I tried your suggestion in the BusyBox shell at the same time.

Davias, No I haven't tried 2.6.4. Wherever possible I try to use supported Ubuntu packages. This ensures that I have a simple support and upgrade path, and makes management of my machines considerably easier.

Peter; I agree, the problem isn't mdadm, it's the udev scripts (another reason I didn't pursue the mdadm version option). I am new to this, and only in the last week or so have been researching how udev works. How do you detect that mounting root has failed, and how do you hold off running mdadm until this point?

Thanks,
Ken

Revision history for this message

Peter Haight (peterh-sapros) wrote on 2007-12-28:

#28

I've forgotten about how this works exactly, but if you take a look in:

/usr/share/initramfs-tools/scripts/local

If you look for the comment
# We've given up, but we'll let the user fix matters if they canomment:

The bit inside the while loop with the panic is the part that gets executed if there is a timeout trying to mount root. Here's what I put in my Fiesty version right before that comment.

if [ ! -e "${ROOT}" ]; then
# Try mdadm in degraded mode in case some drive has failed.
/scripts/local-top/mdadm
fi

This doesn't work anymore because of the changes to Gutsy. You could just try putting 'mdadm --assemble --scan' there, but that probably won't work. Everything is a little tricky in these scripts because they run before root is mounted, so stuff doesn't always work as you would expect.

Also, you can't just modify these scripts. After you change them, you have to use 'mkinitramfs' to generate the image that contains these scripts that is used during the boot up. I'd put in instructions, but I've forgotten how to do it myself.

Revision history for this message

Ken (ksemple) wrote on 2007-12-28:

#29

Thanks, I'll give it a go. This is still a background task so it may take me a few days.

I discovered that you can't just edit these scripts a couple of weeks ago when I did some edits and they didn't take. It was Plnt's post which told me the how to rebuild the image (sudo update-initramfs -u). This helped me make some progress with udev.

I'll let you know how I go.

Cheers,
Ken

Revision history for this message

Ken (ksemple) wrote on 2007-12-29:

#30

Plnt, My RAID set finished re-syncing and I tried your suggestion. I removed all drives except one /dev/sda, the first of my RAID set. When I rebooted and was presented with the BusyBox shell I entered "/sbin/mdadm --assemble --scan" and rebooted. Still no luck!

I will persist with modifying the udev scripts as suggested by Peter.

Thanks,
Ken

Revision history for this message

Ken (ksemple) wrote on 2007-12-29:

#31

Hi,

Thanks everybody for your help. I have now fixed this on my machine using code similar to that suggested by Peter Haight.

Edit "/usr/share/initramfs-tools/scripts/local" and find the following comment "# We've given up, but we'll let the user fix matters if they can".

Just before this comment add the following code:

# The following code was added to allow degraded RAID arrays to start
if [ ! -e "${ROOT}" ] || ! /lib/udev/vol_id "${ROOT}" >/dev/null 2>&1; then
  # Try mdadm and allow degraded arrays to start in case a drive has failed
  log_begin_msg "Attempting to start RAID arrays and allow degraded arrays"
   /sbin/mdadm --assemble --scan
  log_end_msg
fi

Peter's suggestion of just using [ ! -e "${ROOT}" ] as the condition test didn't work, so I used the condition test from the "if" block above this code and it worked fine.

To rebuilt the boot image use "sudo update-initramfs -u" as suggested by Plnt. This script calls the "mkinitramfs" script mentioned by Peter and is easier to use as you don't have to supply the image name and other options.

I have tested this a couple of times, with and without my other drives plugged in without any problems. Just make sure you have a cron item setup to run "mdadm --monitor --oneshot" to ensure the System Administrator gets an email when an array is running degraded.

This has worked on my machine and I think this is a sound solution. Please let me know if it solves your problems also.

This bug has the title "Bug #120375 is not in Ubuntu", does this mean that it is not considered to be an Ubuntu bug? I believe it is due to the fact that involves the way the startup scripts are configured (it is definitely not a bug in mdadm). How do we get this escalated to an Ubuntu bug so that it will be solved in future releases?

Good luck,
Ken

Revision history for this message

Johannes Mockenhaupt (mockenh-deactivatedaccount) wrote on 2007-12-30:

#32

Ken,

Thanks for the patch. I've followed your instructions and tested booting with both physical discs and booting with one disc detached. The second test failed, the system would just stop like it did without the patch. Unfortunately I know next to nothing about udev, initramfs and friends so I can't do much by myself other than test.

Has anybody else tried Ken's solution?

Joe

Revision history for this message

Peter Haight (peterh-sapros) wrote on 2007-12-30:

#33

Did you wait 3 minutes on the test with one disc detached? I think that by default, there is a three minute timeout before it gets to the place where Ken's patch is.

Revision history for this message

Ken (ksemple) wrote on 2007-12-30:

#34

Johannes,

I agree with Peter, there is a 180 second delay in the code in an "if" block just before the suggested location for the patch. The comment above this delay says it is to ensure there is enough time for an external device to start should the root file system be on an external device. I changed this 180 second delay to 30 seconds for my testing.

I think it would be reasonable to put the mdadm patch before this delay also. The delay is OK if you are watching the system boot as it will grab your attention and remind you that you may have a problem.

Cheers,
Ken

Revision history for this message

Johannes Mockenhaupt (mockenh-deactivatedaccount) wrote on 2007-12-30:

#35

I thought I waited long enough - I had read that comment about the 180s delay - but I didn't. I just tested it again and booting with a detached disc continues after 3 minutes. Even mail notification worked right away :-) After re-attaching the second drive I was dropped into the BusyBox shell. Just restarting "fixed" that and the machine started and is just resyncing. I think that may be another problem on my machine that's unrelated with this bug. Thanks Ken and Peter for the help!

Joe

Revision history for this message

dbendlin (diego-bendlin-hotmail) wrote on 2007-12-31:

#36

Download full text (5.6 KiB)

Hello Guys,

Reading all of your post's helped me a lot understanding linux a little be more, thanks for sharing your knowledge.

As I stated a few post above I've tried mdadm on Debian and it works the just fine, so I compared the scripts in /usr/share/initramfs-tools/scripts and found out they are not so diferent from the ones in ubuntu but still there's a diference I wanned to share with you.

When you setup mdadm on debian, a mdadm file is created in /usr/share/initramfs-tools/scripts/local-top,(I'll paste its contend below). The scripts in this folder get called from the local script file (The one you guys suggest to patch). And if you compare the debian version of the local scrip, with ubuntu's version you'll find out its pretty similar. So I guess this could be a better solution since for example you wont have to wait 180 secs, and you don't include "intrusive" code in the local script.

Here goes the debian version of the local script file
[code]
# Local filesystem mounting -*- shell-script -*-

# Parameter: Where to mount the filesystem
mountroot ()
{
[ "$quiet" != "y" ] && log_begin_msg "Running /scripts/local-top"
run_scripts /scripts/local-top
[ "$quiet" != "y" ] && log_end_msg

# If the root device hasn't shown up yet, give it a little while
# to deal with removable devices
if [ ! -e "${ROOT}" ]; then
log_begin_msg "Waiting for root file system..."

  # Default delay is 180s
  if [ -z "${ROOTDELAY}" ]; then
   slumber=180
  else
   slumber=${ROOTDELAY}
  fi
  if [ -x /sbin/usplash_write ]; then
   /sbin/usplash_write "TIMEOUT ${slumber}" || true
  fi

  slumber=$(( ${slumber} * 10 ))
  while [ ${slumber} -gt 0 ] && [ ! -e "${ROOT}" ]; do
   /bin/sleep 0.1
   slumber=$(( ${slumber} - 1 ))
  done

  if [ ${slumber} -gt 0 ]; then
   log_end_msg 0
  else
   log_end_msg 1 || true
  fi
  if [ -x /sbin/usplash_write ]; then
   /sbin/usplash_write "TIMEOUT 15" || true
  fi
fi

# We've given up, but we'll let the user fix matters if they can
while [ ! -e "${ROOT}" ]; do
  echo " Check root= bootarg cat /proc/cmdline"
  echo " or missing modules, devices: cat /proc/modules ls /dev"
  panic "ALERT! ${ROOT} does not exist. Dropping to a shell!"
done

# Get the root filesystem type if not set
if [ -z "${ROOTFSTYPE}" ]; then
  eval $(fstype < ${ROOT})
else
  FSTYPE=${ROOTFSTYPE}
fi
if [ "$FSTYPE" = "unknown" ] && [ -x /lib/udev/vol_id ]; then
  FSTYPE=$(/lib/udev/vol_id -t ${ROOT})
  [ -z "$FSTYPE" ] && FSTYPE="unknown"
fi

[ "$quiet" != "y" ] && log_begin_msg "Running /scripts/local-premount"
run_scripts /scripts/local-premount
[ "$quiet" != "y" ] && log_end_msg

if [ ${readonly} = y ]; then
roflag=-r
else
roflag=-w
fi

# FIXME This has no error checking
modprobe -q ${FSTYPE}

# FIXME This has no error checking
# Mount root
mount ${roflag} -t ${FSTYPE} ${ROOTFLAGS} ${ROOT} ${rootmnt}

[ "$quiet" != "y" ] && log_begin_msg "Running /scripts/local-bottom"
run_scripts /scripts/local-bottom
[ "$quiet" != "y" ] && log_end_msg
}
[/code]

And here is the contend of the mdadm script file located in /usr/share/initramfs-tools/local-top folder
[code]
#!/bin/sh
#
# Copyright © 2006 Martin F. Krafft <madduck@debian...

Hello Guys,

Reading all of your post's helped me a lot understanding linux a little be more, thanks for sharing your knowledge.

As I stated a few post above I've tried mdadm on Debian and it works the just fine, so I compared the scripts in /usr/share/initramfs-tools/scripts and found out they are not so diferent from the ones in ubuntu but still there's a diference I wanned to share with you.

When you setup mdadm on debian, a mdadm file is created in /usr/share/initramfs-tools/scripts/local-top,(I'll paste its contend below). The scripts in this folder get called from the local script file (The one you guys suggest to patch). And if you compare the debian version of the local scrip, with ubuntu's version you'll find out its pretty similar. So I guess this could be a better solution since for example you wont have to wait 180 secs, and you don't include "intrusive" code in the local script.

Here goes the debian version of the local script file
[code]
# Local filesystem mounting			-*- shell-script -*-

# Parameter: Where to mount the filesystem
mountroot ()
{
	[ "$quiet" != "y" ] && log_begin_msg "Running /scripts/local-top"
	run_scripts /scripts/local-top
	[ "$quiet" != "y" ] && log_end_msg

# If the root device hasn't shown up yet, give it a little while
	# to deal with removable devices
	if [ ! -e "${ROOT}" ]; then
		log_begin_msg "Waiting for root file system..."

# Default delay is 180s
		if [ -z "${ROOTDELAY}" ]; then
			slumber=180
		else
			slumber=${ROOTDELAY}
		fi
		if [ -x /sbin/usplash_write ]; then
			/sbin/usplash_write "TIMEOUT ${slumber}" || true
		fi

slumber=$(( ${slumber} * 10 ))
		while [ ${slumber} -gt 0 ] && [ ! -e "${ROOT}" ]; do
			/bin/sleep 0.1
			slumber=$(( ${slumber} - 1 ))
		done

if [ ${slumber} -gt 0 ]; then
			log_end_msg 0
		else
			log_end_msg 1 || true
		fi
		if [ -x /sbin/usplash_write ]; then
			/sbin/usplash_write "TIMEOUT 15" || true
		fi
	fi

# We've given up, but we'll let the user fix matters if they can
	while [ ! -e "${ROOT}" ]; do
		echo "	Check root= bootarg cat /proc/cmdline"
		echo "	or missing modules, devices: cat /proc/modules ls /dev"
		panic "ALERT!  ${ROOT} does not exist.  Dropping to a shell!"
	done

# Get the root filesystem type if not set
	if [ -z "${ROOTFSTYPE}" ]; then
		eval $(fstype < ${ROOT})
	else
		FSTYPE=${ROOTFSTYPE}
	fi
	if [ "$FSTYPE" = "unknown" ] && [ -x /lib/udev/vol_id ]; then
		FSTYPE=$(/lib/udev/vol_id -t ${ROOT})
		[ -z "$FSTYPE" ] && FSTYPE="unknown"
	fi

[ "$quiet" != "y" ] && log_begin_msg "Running /scripts/local-premount"
	run_scripts /scripts/local-premount
	[ "$quiet" != "y" ] && log_end_msg

if [ ${readonly} = y ]; then
		roflag=-r
	else
		roflag=-w
	fi

# FIXME This has no error checking
	modprobe -q ${FSTYPE}

# FIXME This has no error checking
	# Mount root
	mount ${roflag} -t ${FSTYPE} ${ROOTFLAGS} ${ROOT} ${rootmnt}

[ "$quiet" != "y" ] && log_begin_msg "Running /scripts/local-bottom"
	run_scripts /scripts/local-bottom
	[ "$quiet" != "y" ] && log_end_msg
}
[/code]

And here is the contend of the mdadm script file located in /usr/share/initramfs-tools/local-top folder
[code]
#!/bin/sh
#
# Copyright © 2006 Martin F. Krafft <madduck@debian.org>
# based on the scripts in the initramfs-tools package.
# released under the terms of the Artistic Licence.
#
# $Id: script.local-top 290 2006-12-19 08:18:50Z madduck $
#

set -eu

PREREQ="udev_helper"

prereqs()
{
	echo "$PREREQ"
}

case ${1:-} in
  prereqs)
    prereqs
    exit 0
    ;;
esac

. /scripts/functions

if [ -e /scripts/local-top/md ]; then
  log_warning_msg "old md initialisation script found, getting out of its way..."
  exit 1
fi

MDADM=/sbin/mdadm
[ -x "$MDADM" ] || exit 0

verbose()
{
  case "$quiet" in y*|Y*|1|t*|T*)
    return 1;;
  *)
    return 0;;
  esac
}

MD_DEVS=all
MD_MODULES='linear multipath raid0 raid1 raid456 raid5 raid6 raid10'
[ -s /conf/md.conf ] && . /conf/md.conf

verbose && log_begin_msg Loading MD modules
for module in ${MD_MODULES:-}; do
  if modprobe --syslog "$module"; then
    verbose && log_success_msg "loaded module ${module}."
  else
    log_failure_msg "failed to load module ${module}."
  fi
done
log_end_msg

if [ ! -f /proc/mdstat ]; then
  verbose && panic "cannot initialise MD subsystem (/proc/mdstat missing)"
  exit 1
fi

# handle /dev/md/X nodes
mkdir --parent /dev/md

CONFIG=/etc/mdadm/mdadm.conf
# in case the hook failed to install a configuration file, this is our last
# attempt... the "emergency procedure"... <drumroll>
if [ ! -e $CONFIG ]; then
  log_warning_msg "missing mdadm.conf file, trying to create one..."
  mkdir -p ${CONFIG%/*}
  echo DEVICE partitions > $CONFIG
  $MDADM --examine --scan >> $CONFIG
  if [ -s $CONFIG ]; then
    verbose && log_success_msg "mdadm.conf created."
  else
    verbose && log_failure_msg "could not create mdadm.conf, the boot will likely fail."
  fi
  MD_DEVS=all
fi

if [ "$MD_DEVS" = all ]; then
  
  verbose && log_begin_msg "Assembling all MD arrays"
  extra_args=''
  [ -n "$MD_HOMEHOST" ] && \
    extra_args="--homehost='$MD_HOMEHOST' --auto-update-homehost"
  if $MDADM --assemble --scan --run --auto=yes $extra_args; then
    verbose && log_success_msg "assembled all arrays."
  else
    log_failure_msg "failed to assemble all arrays."
  fi
  verbose && log_end_msg

elif [ "$MD_DEVS" != none ]; then
  for dev in $MD_DEVS; do

verbose && log_begin_msg "Assembling MD array $dev"
    if $MDADM --assemble --run --auto=yes $dev; then
      verbose && log_success_msg "started $dev"
    else
      log_failure_msg "failed to start $dev"
    fi
    verbose && log_end_msg

done
fi

exit 0
[/code]

Hope this works, didn't have time nor PC to try this, but maybe someone could give it a try :)

Kind Regards,

Diego Bendlin

Revision history for this message

dbendlin (diego-bendlin-hotmail) wrote on 2007-12-31:

#37

Sorry for my last long post,

The idea is to leave the local script file as its shipped by default, and only add the mdadm script file to /usr/share/initramfs-tools/local-top directory.

Finally don't forget to rebuilt the boot image issuing "sudo update-initramfs -u" as suggested by Plnt.

Kind Regards,

Diego Bendlin

Revision history for this message

Ken (ksemple) wrote on 2008-01-01:

#38

I have had a bit of a look into what Diego has found.

Took me a while, nearly gave up until I found this script: "/usr/share/initramfs-tools/hooks/mdadm". This script looks be a modified version of the one Diego has found on his Debian machine. It has the identical header and some similar code.

I would be hesitant to add the Debian script to the local-top folder. If you wanted a simple file copy solution and don't want to wait the 180seconds on the rare occasions you boot with a degraded array, create a script file with my suggested code in it and place is in the "/usr/share/initramfs-tools/scripts/init-top" folder and it will be called from the "local" script before the time delay.

Ken

Revision history for this message

dbendlin (diego-bendlin-hotmail) wrote on 2008-01-02:

#39

Guys,

I think this issue could have many work arounds as it has been proved here.
In my opinion the idea behind sharing our experiences and efforts is to improve Ubuntu so that a future version will handle this automatically for the user. "Linux for human beings", remember?

Analyzing my Debian machines I have noticed that not only mdadm scripts exist in the local-top folder, but others like lvm for example, so I guess Debian installer handles this based on the user configuration made at install time, following this direction I think a better way to deal with this mayor issue will be to improve Ubuntu's installer so it can copy a template file to local-top folder based on the user input at installation time.

Finally I hope Ubuntu developers get to read this topic in order to address this issue for the upcoming 8.04 version of Ubuntu with is near =)

Kind Regards,

Diego Bendlin

Revision history for this message

Tomas (tvinar-gmail) wrote on 2008-01-11:

#40

I have run into the same problem where my disk upgrade path included booting from temprorarily degraded array.

I have tried to update /etc/mdadm/mdadm.conf with the new file system UUIDs
(using mdadm --detail --scan and replacing the corresponding line in /etc/mdadm/mdadm.conf)
and after that update-initrd

Now the system seems to be booting without any problems.

I have also used -e 0.90 paramater in mdadm when assembling the degraded array (to create an older version of superblock that is recognized by the kernel), though I am not sure whether this had anything to do with the outcome.

Revision history for this message

Betz Stefan (encbladexp) wrote on 2008-01-16:

#41

An easy way is to edit /etc/udev/rules.d/85-mdadm.rules, an Change the "--no-degraded" to "-R".
After that "sudo update-initramfs -u -k all" an all works like assumed.

Greetings.

Revision history for this message

Ken (ksemple) wrote on 2008-01-16:

#42

I have tried replacing the "--no-degraded" option with "-R" in the past and found that sometimes the RAID sets will sometimes load degraded even when all of the drives are available. You then have to reassemble your RAID sets manually.

Cheers,
Ken

Revision history for this message

Bill Smith (bsmith1051) wrote on 2008-03-07:

#43

Can someone please post a simplified how-to for this? I've posted the following request for help on the forums but no one has responded,
http://ubuntuforums.org/showthread.php?t=716398

Revision history for this message

Bill Smith (bsmith1051) wrote on 2008-03-25:

#44

Still having this same problem on a new Hardy beta install, using 2 drives in a RAID-1 mirror. Either drive alone will just boot to a Busybox prompt after a several minute timeout.

Revision history for this message

Ken (ksemple) wrote on 2008-03-30:

#45

I have just had a problem with my Ubuntu 7.1 machine not booting again. Both disks of the RAID1 set (and the spare) seemed to be fine, but the system wouldn't boot. I just ended up at the Busybox prompt. I was able to get it going again using the following commands:
"mdadm --assemble /dev/md0 --super-minor=0 -f /dev/sda1 /dev/sdb1 /dev/sdc1"
"mdadm --assemble /dev/md1 --super-minor=1 -f /dev/sda2 /dev/sdb2 /dev/sdc2"

I have another RAID set on this machine /dev/md2 and it was fine.

A little research helped me discover a possible reason. I used the following commands to create the mdadm.conf file:
        "sudo echo "DEVICE partitions" > /etc/mdadm/mdadm.conf"
        "sudo mdadm --detail --scan >> /etc/mdadm/mdadm.conf"
It may have been the first command which created the problem I had as I had modified some partitions just before I had the boot problem. I have changed my mdadm.conf to that created by the following command:
        "mdadm --detail --scan --verbose > /etc/mdadm.conf"
Time will tell whether this solves the problem long term.

Bill, have a look at my post of 29 Dec 07. I think that tells you all you need to know. Having a look your other post, it seems that you have done everything else you need to do. I have put together a complete set of instructions for creating the bootable RAID set. I can post these if you need them.

Cheers,
Ken

Revision history for this message

G2 La Gixa (g2mula) wrote on 2008-03-31:

#46

" I have put together a complete set of instructions for creating the bootable RAID set. I can post these if you need them."

please post... would really appreciate...

Revision history for this message

Ken (ksemple) wrote on 2008-03-31:

#47

Download full text (3.9 KiB)

These instructions are notes I made for myself. They don't include the intricate details. This was done using Ubuntu 7.1 from the alternate disk.

    * Boot from the alternate disk
    * Select 'Install in text mode' install
    * Choose no on the detect keyboard layout question and enter this manually
    * Choose 'Manual' partioning
    * Create the partitions for the RAID sets (including the spare). Create two partitions on the drives, one for the system and the other for the Swap. The size of the Swap partition should be around twice the amount of RAM the machine has.
          o When creating the partitions select the use as: "physical volume for RAID" and select bootable for the system partitions.
    * Select 'Configure Software RAID' to setup the RAID sets.
          o Select 'Create MD Device'
          o RAID1
          o Number of active devices: 2
          o Number of spare devices: 1
          o Select the 2 active devices
          o Select the 1 spare device
          o Repeat for the Swap disk
          o Select Finish
    * Now the RAID sets have been created we need to partition them.
          o For the boot disk select use as 'Ext3 journaling file system' with '/' as the mount point and give the disk a label if desired.
          o For the Swap disk select use as 'swap area'
    * Select 'Finish partitioning and write changes to disk'
    * Allow the system to install and reboot.
    * Login and in a terminal window enter 'cat /proc/mdstat' to display the status of the RAID sets. The RAID sets should be sync'ing. Enter 'watch cat /proc/mdstat' to monitor the progress of the sync. You may continue to work whilst the RAID sets are sync'ing even though things will be a little slow.
    * Check that the RAID sets have been loaded correctly. Sometimes the spare doesn't get added correctly.
          o use 'sudo mdadm --query --detail /dev/md0' to see the detail on the status of an array (in this case /dev/md0).
          o use 'sudo mdadm --add /dev/md0 /dev/sdc2' to add a drive (in this case partition /dev/sdc2 is added to /dev/md0).
    * Need to ensure that either drive will boot should the other fail. Boot from the UBUNTU install disk and perform the following actions:
          o When booting from the install CD select 'Rescue a broken system'
          o Continue through the prompts until the screen 'Device to use as a root file system' is displayed.
          o Press '<alt>F2' to switch to a second console screen and press Enter to activate it.
          o Mount the md0 RAID device and use chroot and grub to install the bootloader onto both sda and sdb using the following commands:
                + mount /dev/md0 /mnt
                + chroot /mnt
                + grub
                + device (hd0) /dev/sda
                + root (hd0,0)
                + setup (hd0)
                + device (hd1) /dev/sdb
                + root (hd1,0)
                + setup (hd1)
                + quit
          o Reboot the system using '<alt>F1' to switch back to the initial console, escape and select 'Abort the instatllation'.

Note that if the spare drive is ever needed, the above actions will need to be performed again...

These instructions are notes I made for myself. They don't include the intricate details. This was done using Ubuntu 7.1 from the alternate disk.

* Boot from the alternate disk
    * Select 'Install in text mode' install
    * Choose no on the detect keyboard layout question and enter this manually
    * Choose 'Manual' partioning
    * Create the partitions for the RAID sets (including the spare). Create two partitions on the drives, one for the system and the other for the Swap. The size of the Swap partition should be around twice the amount of RAM the machine has.
          o When creating the partitions select the use as: "physical volume for RAID" and select bootable for the system partitions. 
    * Select 'Configure Software RAID' to setup the RAID sets.
          o Select 'Create MD Device'
          o RAID1
          o Number of active devices: 2
          o Number of spare devices: 1
          o Select the 2 active devices
          o Select the 1 spare device
          o Repeat for the Swap disk
          o Select Finish 
    * Now the RAID sets have been created we need to partition them.
          o For the boot disk select use as 'Ext3 journaling file system' with '/' as the mount point and give the disk a label if desired.
          o For the Swap disk select use as 'swap area' 
    * Select 'Finish partitioning and write changes to disk'
    * Allow the system to install and reboot.
    * Login and in a terminal window enter 'cat /proc/mdstat' to display the status of the RAID sets. The RAID sets should be sync'ing. Enter 'watch cat /proc/mdstat' to monitor the progress of the sync. You may continue to work whilst the RAID sets are sync'ing even though things will be a little slow.
    * Check that the RAID sets have been loaded correctly. Sometimes the spare doesn't get added correctly.
          o use 'sudo mdadm --query --detail /dev/md0' to see the detail on the status of an array (in this case /dev/md0).
          o use 'sudo mdadm --add /dev/md0 /dev/sdc2' to add a drive (in this case partition /dev/sdc2 is added to /dev/md0). 
    * Need to ensure that either drive will boot should the other fail. Boot from the UBUNTU install disk and perform the following actions:
          o When booting from the install CD select 'Rescue a broken system'
          o Continue through the prompts until the screen 'Device to use as a root file system' is displayed.
          o Press '<alt>F2' to switch to a second console screen and press Enter to activate it.
          o Mount the md0 RAID device and use chroot and grub to install the bootloader onto both sda and sdb using the following commands:
                + mount /dev/md0 /mnt
                + chroot /mnt
                + grub
                + device (hd0) /dev/sda
                + root (hd0,0)
                + setup (hd0)
                + device (hd1) /dev/sdb
                + root (hd1,0)
                + setup (hd1)
                + quit 
          o Reboot the system using '<alt>F1' to switch back to the initial console, escape and select 'Abort the instatllation'.

Note that if the spare drive is ever needed, the above actions will need to be performed again to load the boot system onto this spare drive.

* Check the status of the RAID sets once the system boots up again.

To ensure that the system will boot with a degraded RAID array follow these instructions:

Edit "/usr/share/initramfs-tools/scripts/local" and find the following comment "# We've given up, but we'll let the user fix matters if they can".

Just before this comment add the following code:

# The following code was added to allow degraded RAID arrays to start
	if [ ! -e "${ROOT}" ] || ! /lib/udev/vol_id "${ROOT}" >/dev/null 2>&1; then
		# Try mdadm and allow degraded arrays to start in case a drive has failed
		log_begin_msg "Attempting to start RAID arrays and allow degraded arrays"
			/sbin/mdadm --assemble --scan
		log_end_msg
	fi

Rebuild the boot image using "sudo update-initramfs -u"

Hope this helps,
Ken

Revision history for this message

brunus (reg-paolobrunello) wrote on 2008-04-12:

#49

Hi Ken,
thank you very much for your precious effort! It did worked for me on a Edubuntu 7.10, which I assume to be as an Ubuntu 7.10 server.

BUT:
1. as I tried to enter the shell in rescue mode, the chroot command was not recognized by busybox telling me that /bin/sh was not found
So I gave up installing grub that way, I did it while up and running as explained here: http://users.piuha.net/martti/comp/ubuntu/en/raid.html
Or you can use the Super Grub Disk. http://www.supergrubdisk.org/
2. I had to wait for the boot sequence to go in timeout. Just to let know people to wait long enough before be driven in despair.

thanks again!

brunus

Jens (jens.timmerman) on 2008-04-29

Changed in mdadm:
status:	New → Confirmed

Revision history for this message

Alexander Dietrich (adietrich) wrote on 2008-05-23:

#50

Many thanks to everyone working on this! I tried Ken's solution and it worked with either disk attached. I have a lot more confidence putting my server into "production" now. :)

Revision history for this message

Alexander Dietrich (adietrich) wrote on 2008-05-26:

#51

Just a thought: should the package of this bug be changed to initramfs-tools?

Revision history for this message

Pete Hardman (peter-ssbg) wrote on 2008-06-03:

#52

The fix suggested Ken on 2007-12-29 worked fine for me on 7.10 on a Dell PE840, but the selfsame patch doesn't work now I've upgraded to 8.04 (yes, I have re-applied the patch!). Boot drops through to Busybox after (my reduced) 10 sec timeout, but the good news is that I can continue booting with 'mdadm --assemble --scan' and then Ctrl+D. So I suppose that the test line is not working as expected, on my machine at least. Since this appears to be an initramfs-tools thing I've taken the liberty of adding that to the 'affects' in the hope that someone in the development team can get around to fixing this. Unfortunately it seems to be up to the project manager to say whether this bug is 'important' . That possibly depends on whether he is running a RAID array and has had a drive fail ;-)

Pete

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-06-03:

#53

You are probably correct, that some changes may be needed to initramfs-tools.

I will be working on this for Intrepid. See:
* Blueprint: https://blueprints.edge.launchpad.net/ubuntu/+spec/boot-degraded-raid
* Specification: https://wiki.ubuntu.com/BootDegradedRaid

:-Dustin

Changed in initramfs-tools:
assignee:	nobody → kirkland
status:	New → Confirmed
Changed in mdadm:
assignee:	nobody → kirkland
importance:	Undecided → Medium
milestone:	none → ubuntu-8.10

Revision history for this message

Alexander Dietrich (adietrich) wrote on 2008-06-04:

#54

Excellent, thanks!

Revision history for this message

Bill Smith (bsmith1051) wrote on 2008-06-04:

#55

didn't a previous poster say that this works fine in the current Debian? So maybe the place to start is to identify what was changed from the 'base' Debian scripts, and why.

Revision history for this message

Davias (davias) wrote on 2008-06-26:

#56

So, just to summarize, we are stuck to ubuntu 7.10?

I would like to upgrade to 8.04 for many reasons, but to risk to find myself with the same problem as before is unacceptable.

Do you guys think situation will improve by the time of 8.10 ?

Do you also find the following procedure suitable:

1) unplug 1 drive from RAID 1
2) update ubuntu from 7.10 to 8.04 with only one drive in the array
3) verify that everything works; if yes...
4) add the drive as spare and reassemble
5) No, upgrade is not good; then...
6) plug back the old drive and start the machine with that and reassemble the other as spare...

Help? Suggestions?

TIA

Revision history for this message

ceg (ceg) wrote on 2008-06-30:

#57

According to the man page the mdadm in 8.04 supports the --incremental mode for use in hotplug setups.
(like the udev rules driven assembly in ubuntu)

https://wiki.ubuntu.com/HotplugRaid

Revision history for this message

ceg (ceg) wrote on 2008-07-02:

#58

I added notes on how to adjust the ubuntu hotplug setup to cope with raid devices at:

https://wiki.ubuntu.com/BootDegradedRaid

See under "Implementation".

Revision history for this message

Doug Minderhout (doug-minderhout) wrote on 2008-07-04:

#59

I have been fighting installing a RAID setup with 8.04 + GRUB for about 20 hours. My old setup used a CF card mounted as /boot but I can't get that to work with some newer hardware and with grub and 8.04. This worked fine under 6.x and 7.x... So I give up and set up a md RAID 1 partition as ext2 as /boot at the start of my disk, I get grub installed on both drives no problem and now I hit this. It took a long time to figure out what was going on as I thought it was hanging because the scripts take so long to time out... it needs to say something like "waiting for root device to spin up" if it hits the part of the script looking for a removable root device so you can quickly identify what is going on.

I find it amazing that it has been going for a full year. The issue I have is that I am using RAID and redundant disks for my boot/root partition so that I can be hardware agnostic and not have to worry about getting the system up and going if I think my hardware RAID card is flaking out. If I can't boot the thing with a failed disk without having to google up the commands to get the array going it's just about worthless.

If someone participating in this bug has a good description of how to get it to work automatically with 8.04 I would really appreciate it if they could post either the modified initramfs scripts or whatever it takes to get it working.

Revision history for this message

Hopey (esperanza-glass) wrote on 2008-07-10:

#60

This was indeed very unpleasant bug to discover after spending money to raid disks, new server hardware. Now that the server is up and running, doing it all again with Debian doesn't feel that great option either. I hope I get it booting correctly with some instructions given here. Otherwise I guess I'm forced to change distribution or then feel that it was money wasted building raid. :(

Revision history for this message

Alexander Dietrich (adietrich) wrote on 2008-07-10:

#61

Patch for initramfs-tools as suggested by Ken on 2008-03-31 Edit (682 bytes, text/x-diff)

I have created a patch for the initramfs-tools script as suggested by Ken in this comment:

https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/120375/comments/47

In order to apply it (as root), "cd /usr/share/initramfs-tools/scripts", "patch < initramfs-mdadm-assemble-scan.patch", then "update-initramfs -u".

With an initrd patched in this manner, I was able to disconnect either of my harddrives and still boot my server without manual intervention. The patch does not address the generous wait for the missing harddrive, so give it some time.

Revision history for this message

mspIggy (intoitall) wrote on 2008-07-14:

#62

i have been doing battle with this for 10 days on and off on a new server ...

before i put this on the live line i need this to work -- it does not --

8.04 server edition i just wiped my install, started fresh, again...

after the install finished i added root user changed to su

cd /usr/share/initramfs-tools/scripts
and ran
patch < initramfs-mdadm-assemble-scan.patch

fails - says file does not exist ???

last time i did patch line by line - that also did not work for me... this patch was not up yet

now what - i am about to go to freeBSD i do not want tu use debian
i installed no apps but ssh

pmrfco@service12:/$ su
Password:
root@*****:/# cd /usr/share/initramfs-tools/scripts
root@*****:/usr/share/initramfs-tools/scripts# patch < initramfs-mdadm-assemble-scan.patch
bash: initramfs-mdadm-assemble-scan.patch: No such file or directory
root@*****:/usr/share/initramfs-tools/scripts# dir
casper-premount init-premount local-bottom nfs nfs-top
functions init-top local-premount nfs-bottom
init-bottom local local-top nfs-premount
root@*****:/usr/share/initramfs-tools/scripts#

Revision history for this message

Alexander Dietrich (adietrich) wrote on 2008-07-14:

#63

Correct my if I'm wrong, but it looks like the patch file is not actually in "/usr/share/initramfs-tools/scripts"? If you downloaded the file to any other location, you need to specify the full path when executing "patch".

Revision history for this message

ar (ar-renovabis) wrote on 2008-07-15:

#64

I successfully installed raid 1 on an Ubuntu Desktop 8.04 64-bit machine according to Martiis intructions:

http://users.piuha.net/martti/comp/ubuntu/en/raid.html

The only thing missing in Martiis instructions was

sudo update-initramfs -u (thx to Alexander)

after inserting Kens lines

# The following code was added to allow degraded RAID arrays to start
if [ ! -e "${ROOT}" ] || ! /lib/udev/vol_id "${ROOT}" >/dev/null 2>&1; then
  # Try mdadm and allow degraded arrays to start in case a drive has failed
  log_begin_msg "Attempting to start RAID arrays and allow degraded arrays"
   /sbin/mdadm --assemble --scan
  log_end_msg
fi

into /usr/share/initramfs-tools/scripts/local script just before the comment "# We've given up, but we'll let the user fix matters if they can" as described in Kens posting.

Revision history for this message

wally.nl (ubuntu-general-failure) wrote on 2008-07-15:

#65

This also affects raid5 (seems logical) but the patch doesn't seem to work for me when also using lvm. My test setup:

  sda1 100Mb /boot
  sda2 raid
  sdb1 100Mb (manual copy of /boot)
  sdb2 raid
  sdc1 100Mb (manual copy of /boot)
  sdc2 raid

/dev/md0 is a raid5 using sda2/sdb2/sdc2, on that is a volume group 'vg' with 2 logical volumes 'lv0' which is / and lv1 which is swap.

when removing a disk (lets say disk 3/sdc) I see the patch kick in and start the degraded raid5 with 2 disks but from there it's downhill:

Found volume group "vg" using metadata type lvm2
ALERT! /dev/mapper/vg-lv0 does not exist. Dropping to a shell!

but /dev/mapper/vg-lv0 and /dev/mapper/vg-lv1 DO seem to exist. Re-adding the drive and rebooting works fine, is this another 'feature' of the /usr/share/initramfs-tools/scripts/local script ?

Revision history for this message

Rudolf Adamkovic (salutis) wrote on 2008-07-22:

#66

I recommend run '/sbin/mdadm --assemble --scan' command _before_ waiting for root device because it's faster. Second thing I discovered is call of 'sync', 'udevtrigger' and 'udevsettle' commands before and after 'mdadm'. This configuration works well also with root filesystem on LVM which on top of RAID array.

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-07-25:

#67

I have a pair of debdiffs which will be posted in the following two messages. One for mdadm, and a second for initramfs-tools.

The mdadm patch supports an optional kernel parameter, which can be any of:
* bootdegraded
* bootdegraded=true
* bootdegraded=yes
* bootdegrade=1

If a RAID device containing the root filesystem fails to start, the mdadm failure hooks will eventually run. When they do, the kernel command line will be checked for a valid 'bootdegraded' flag. If it is set affirmatively, it will run:
* mdadm --assemble --scan --run
which will attempt to start the array in a degraded mode.

I have also changed the printed error message to direct the user to use "bootdegraded=true".

----

For the failure hooks to execute before the init scripts bail out, I have a patch to initramfs-tools.

Here, I split out the mountroot-fail.hooks code to its own independently callable function, call_failure_hooks(). And the panic() function conditionally invokes call_failure_hooks().

Furthermore, the local script invokes the call_failure_hooks() function *before* it gives up on finding the root device.

I have packages available for testing in my PPA.
* https://launchpad.net/~kirkland/+archive

:-Dustin

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-07-25:

#68

mdadm.120375.debdiff Edit (1.7 KiB, text/plain)

mdadm patch.

:-Dustin

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-07-25:

#69

initramfs-tools.120375.debdiff Edit (2.7 KiB, text/plain)

initramfs-tools patch.

:-Dustin

Changed in mdadm:
status:	Confirmed → Triaged

Revision history for this message

RpR (tom-lecluse) wrote on 2008-07-25:

#70

How can I apply them?

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-07-25:

#71

initramfs-tools.120375.debdiff Edit (5.6 KiB, text/plain)

This is an improved patch for initramfs-tools, with considerable assistance from Kees Cook. Thanks Kees!

I have an updated package in my PPA:
* https://launchpad.net/~kirkland/+archive

:-Dustin

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-07-25:

#72

mdadm.120375.debdiff Edit (1.9 KiB, text/plain)

And this mdadm patch supercedes the previous one. It uses the new initramfs-tools interface for adding the fail hook.

Updated package in my PPA:
* https://launchpad.net/~kirkland/+archive

:-Dustin

Kees Cook (kees) on 2008-07-25

Changed in initramfs-tools:
status:	Confirmed → In Progress
Changed in mdadm:
status:	Triaged → In Progress
Changed in initramfs-tools:
assignee:	nobody → kirkland
status:	New → In Progress
assignee:	kirkland → nobody
status:	In Progress → Confirmed
importance:	Undecided → Medium
milestone:	none → ubuntu-8.10

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-07-25:

#73

Testing instructions for Intrepid are available in a wiki page for a specification covering this bug:
* https://wiki.ubuntu.com/BootDegradedRaid#head-a5a91db34505d4a047fd7f30e44ac2020da369a6

:-Dustin

Revision history for this message

ceg (ceg) wrote on 2008-07-26:

#74

Hi, some feedback about the patches.

Does it matter if call_failure_hooks is allways called?

"mdadm --run --scan" will start all arrays degraded not only the one needed for the rootfs (and its lvm and crypt respectively).
It will start for example a partial array from a removable disk that was attatched and hasn't been forgotten when the computer was turned off. Remember starting an array degraded means it will also start degraded next time. To prevent accidential degration and sideeffect just start the required arrays with "mdadm --run /dev/mdX" or even better by UUID.

This means of course we need to know the required array in initramfs.

Another point are required non-root filesystems, like a /home array that needs to startdegraded.

We need a way to configure non-root raid devices to startdegraded anyway, so this configuration could as well be used (hooked into initramfs like the cryptroot script does) for the root device. (The bootdegraded kernel parameter might not be neccesary then.)

A regular init.d script (like the former mdadm-raid or mdadm-degreade) can handle the non-root startdegraded devices.

Instead of fixed sleep 5 in init-premount/mdadm, you could introduce a check & timeout loop after call_failure_hooks in the local script. (So we can resume booting ASAP.) Maybe just copy the slumber while loop from above the call_failure_hooks.

The error output could be more informative:
Instead of "Check rootdelay= (did the system wait long enough?)"
-->"If rootdelay=${VARIABLE} is not long enough, customize kernel parameter."

I have updated the comments on
https://wiki.ubuntu.com/BootDegradedRaid#head-59fd70893e835c25f3c7a40741e1b24ad6066a64

Revision history for this message

ceg (ceg) wrote on 2008-07-26:

#75

After

# We've given up,
A while loop is calling panic. An if statement might look better.

Revision history for this message

Kees Cook (kees) wrote on 2008-07-26:

#76

On Sat, Jul 26, 2008 at 01:15:08AM -0000, ceg wrote:
> # We've given up,
> A while loop is calling panic. An if statement might look better.

No, the while loop is correct -- it lets the user attempt to fix the
system repeatedly and on failures, it will drop to a shell again.

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-07-26:

#77

On Fri, Jul 25, 2008 at 8:08 PM, ceg <email address hidden> wrote:
> Instead of fixed sleep 5 in init-premount/mdadm, you could introduce a check & timeout loop
> after call_failure_hooks in the local script. (So we can resume booting ASAP.) Maybe just copy
> the slumber while loop from above the call_failure_hooks.

I removed the "sleep 5" in a later patch. Please make sure your
comments are with respect to the latest patch attached to this bug.

:-Dustin

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-07-26:

#78

On Fri, Jul 25, 2008 at 8:08 PM, ceg <email address hidden> wrote:
> "mdadm --run --scan" will start all arrays degraded not only the one needed for the rootfs (and
> its lvm and crypt respectively).
> It will start for example a partial array from a removable disk that was attatched and hasn't been
> forgotten when the computer was turned off. Remember starting an array degraded means it will
> also start degraded next time. To prevent accidential degration and sideeffect just start the
> required arrays with "mdadm --run /dev/mdX" or even better by UUID.
...
> Another point are required non-root filesystems, like a /home array that
> needs to startdegraded.
...
> We need a way to configure non-root raid devices to startdegraded
> anyway, so this configuration could as well be used (hooked into
> initramfs like the cryptroot script does) for the root device. (The
> bootdegraded kernel parameter might not be neccesary then.)

I agree that my patch might start more degraded raids than someone may
necessarily want, but the current bug and the current issue at handle
is about providing a mechanism to configurably allow booting if a RAID
is missing a disk. This bug already has 73+ comments, and has been
around for over a year. Let's fix at least the immediate problem at
hand.

What you're talking about is going to be more complicated, and can be
handled at a later time, by subsequent patches. How about opening a
separate bug for independently allowing/disallowing each array to be
started in degraded mode?

> This means of course we need to know the required array in initramfs.

Well, that part is doable for the root filesystem at least. It's
${ROOT} within the initramfs scripts.

:-Dustin

Revision history for this message

ceg (ceg) wrote on 2008-07-26:

#79

Download full text (4.2 KiB)

Thank you for your comments and I'm sorry me slowly commenting overlapped with your updated patches. (Of course you just worked to fast ;-), and fixed things beforehand, perfect. :)

It's good to see that happen, and I've seen the merrit of the while loop in intrepid now. It looks much better than in hardy.

Yes, lets fix things avoiding immeadiate pittfalls.
Lets think crucial things like initramfs through while we are at it. I'll try to shake the ends over here that together we'll get those bugs thoroughly out of our puppy.

>> This means of course we need to know the required array in initramfs.

>Well, that part is doable for the root filesystem at least. It's
${ROOT} within the initramfs scripts.

${ROOT} may be the md device that needs to be run degraded in the corner case of a non-partitionable, non-crypted, non-lvm, md device.

>What you're talking about is going to be more complicated, and can be handled at a later time, by subsequent patches.

Yes, initramfs setup doesn't seem to be easy to me. I'd like to make much later work unneccesary.

> How about opening a separate bug for independently allowing/disallowing each array to be started in degraded mode?

It's more than that. For example the --incremental option is neccessary, because there is this nasty Bug #244792 waiting, (For that reason I have gathered all that in the BootDegradedRaid comments.)

---

I tested the changes and a crypted root does not yet come up degraded.

The init-premount/mdadm script is responsible to set up arrays. (the local script depends on it) With udev+mdadm init-premount/mdadm doesn't have to do more then register its mountfail hook and degrade arrays if they are not available after a while, if requested.

init-premount/mdadm needs to degrade the array before local-top/cryptroot.

I see two changes for package mdadm.
-> in init-premount/mdadm:mountroot_fail() run the array degraded imediately (implemented by ppa9). This will start degraded arrays that are made of several _crypt devices, after the rootdelay.
-> Regular degraded arrays should be run degraded directly in init-premount/mdadm, if a "while array not active" slumber loop timed out. (not yet implemented in ppa9)

Here is why:
With the second change local-top/cryptroot can find its source device.

Cryptsetup is not triggered by udev (and can not be, because non-luks partitions can not trigger events), its run by local-top/cryptroot. It doesn't make much sence to enter local-top/cryptroot with an inactive array, only to let it fail and then wait rootdelay to see that mounting the rootfs also fails and then try to recover with another rootdelay.

The rootdelay is there to wait for removable root devices. But encrypted root devices won't show up.

(Its another bug 251164, but serves the understanding:
local-top/cryptroot fails even with full array present when the disk has not triggerd an udev event. This is because cryptroot does not have its own checking loop, just as mdadm does not have its own checking loop yet. Only mounting root has its checking loop. Mixing one into another is not such a good idea because dependencies get lost and no while condition is adequate for all waiting invo...

Thank you for your comments and I'm sorry me slowly commenting overlapped with your updated patches. (Of course you just worked to fast ;-), and fixed things beforehand, perfect. :)

It's good to see that happen, and I've seen the merrit of the while loop in intrepid now. It looks much better than in hardy.

Yes, lets fix things avoiding immeadiate pittfalls.
Lets think crucial things like initramfs through while we are at it. I'll try to shake the ends over here that together we'll get those bugs thoroughly out of our puppy.

>> This means of course we need to know the required array in initramfs.

>Well, that part is doable for the root filesystem at least. It's
 ${ROOT} within the initramfs scripts.

${ROOT} may be the md device that needs to be run degraded in the corner case of a non-partitionable, non-crypted, non-lvm, md device.

>What you're talking about is going to be more complicated, and can be handled at a later time, by subsequent patches.

Yes, initramfs setup doesn't seem to be easy to me. I'd like to make much later work unneccesary.

> How about opening a separate bug for independently allowing/disallowing each array to be started in degraded mode?

It's more than that. For example the --incremental option is neccessary, because there is this nasty Bug #244792 waiting, (For that reason I have gathered all that in the BootDegradedRaid comments.)

---

I tested the changes and a crypted root does not yet come up degraded.

The init-premount/mdadm script is responsible to set up arrays. (the local script depends on it) With udev+mdadm init-premount/mdadm doesn't have to do more then register its mountfail hook and degrade arrays if they are not available after a while, if requested.

init-premount/mdadm needs to degrade the array before local-top/cryptroot.

I see two changes for package mdadm.
-> in init-premount/mdadm:mountroot_fail() run the array degraded imediately (implemented by ppa9). This will start degraded arrays that are made of several _crypt devices, after the rootdelay.
-> Regular degraded arrays should be run degraded directly in init-premount/mdadm, if a "while array not active" slumber loop timed out. (not yet implemented in ppa9)

Here is why:
With the second change local-top/cryptroot can find its source device.

Cryptsetup is not triggered by udev (and can not be, because non-luks partitions can not trigger events), its run by local-top/cryptroot. It doesn't make much sence to enter local-top/cryptroot with an inactive array, only to let it fail and then wait rootdelay to see that mounting the rootfs also fails and then try to recover with another rootdelay.

The rootdelay is there to wait for removable root devices. But encrypted root devices won't show up.

(Its another bug 251164, but serves the understanding:
 local-top/cryptroot  fails even with full array present when the disk has not triggerd an udev event. This is because cryptroot does not have its own checking loop, just as mdadm does not have its own checking loop yet. Only mounting root has its checking loop. Mixing one into another is not such a good idea because dependencies get lost and no while condition is adequate for all waiting involved.)

If a missing encrypted array member shows up 1 second into the rootdelay, cryptroot can't set up the root device before the failure hooks kick in, the root device won't show up for sure.
(And maybe we don't want the cryptsetup to be part of a loop with unlimited trys.)
When instad mdadm waits for the raid device to become active, local-top/cryptroot will start to set the root device up a second later and rootdelay would exit immediately.

On my testing system (8.04) local-top/cryptroot is not run at all after the array is degraded, and the root mounting fails. (cryptroot has no failure hooks) Ok, cryptsetup is a different package producing the issue at hand, but we are not yet done with package mdadm.

With manual cryptsetup and mounting of the rootfilesystem, the system still did not come up degraded. The /home raid isn't set up. Of course this part of "cannot boot raid1 with only one disk"  can be dealt with in /etc/init.d/mdadm-raid with a "while array inactive" delay loop.

---
When the confusion sets in, orchestred or not, recall.
Do stay on top. Fix the system, not the numberd bugs.
Orientation is your mastership.

Revision history for this message

Doug Minderhout (doug-minderhout) wrote on 2008-07-27:

#80

From my perspective, what needs to happen is that the issue of booting from a degraded root on kernel software raid needs a simple, easy to implement fix for those who run fixed configuration machines, I.E. servers and the like.

The provided patches seem to accomplish this goal with a boot parameter that an individual that administers/manages servers would be able to apply. Furthermore, it would be reasonable to me that the server distribution have that boot parameter set as a default to minimize the number of surprises.

Someone running a server system is unlikely to be using external storage that is not for backup of some sort or some type of more reliable hardware configuration (SAS, SCSI etc...) and can be considered on their own recognizance for doing weird things like making a raid stripe of firewire drives or something bizarre like that.

The issue of a desktop system booting with missing raid elements is another issue. The same workaround should be available for those who choose to run their root on software raid disks with the aforementioned caveats. It would of course be quite nice to have some sort of iterative y/n/r prompt sort of setup for filesystems listed in /etc/fstab so that a marginally technical user could interactively make a decision about continuing to bring the system up with degraded disks or a more technical user could set the kernel parameter, again at their own risk.

This discussion, while absolutely necessary, is somewhat political and outside the scope of this bug and belongs in another bug if for no other reason than to close the bug and stabilize the distro for those who are running business functions on it.

I am using Alexanders patch and will switch over to Dustin's as I have time. Thank you both for your contributions.

Revision history for this message

ceg (ceg) wrote on 2008-07-27:

#81

I think it can be understood that using "mdadm --assemble --scan" instead of doing selective "mdadm --run <device>" calls on all devices ${ROOT} depends on could be a reasonable and quick fix, if the information is not available somwhere allready. Full support here.

My interest lies in getting the algorithm right, so that it will work out for most if not all possible ways of udev,raid,crypt,lvm, etc. stacked configs. And we won't hit a whole waterfall of subsequent bugs.

Also configuring selective arrays to be run or not run degraded might be somthing that can be postponed. The list of md devices that are needed to fully bring up a server or desktop system consists of the md devices on which the devices listed in fstab depend on.

When a disk fails in a running system, it doesn't stop by default, and probably should't. This is the reason for the mdadm notification functionality. So defaulting to bootdegraded=yes seems reasonable, once the patches are tested to work with enough configurations, and will be safe if no other md devices than the ones neccessary to set up the fstab are touched.

https://wiki.ubuntu.com/BootDegradedRaid updated

Revision history for this message

ceg (ceg) wrote on 2008-07-28:

#82

I just noticed that with slowly-appearing drives (with md devices on it) the md device will not exist on the regular run of init-premount/mdadm. But since introducing another waiting loop into the coldplug driven boot does not make much sence it'll be preferable to just do degration with failure hooks after a full rootdelay.

That means that with a degraded md array two wainting loops will have to time out. (the one in local-top/cryptroot and the rootwait in local) But this is now an issue of local and local-top/cryptroot.

So the remaining list for package mdadm is down to the array degration of md devices on which the fstab depends in /etc/init.d/mdadm.

BTW /usr/share/initramfs-tools/hooks/cryptroot contains code to determine cryptdevices that the rootdevice depends on. Similar approach may work to find md devices the rootdevice depends on. It may help to identify the particular arrays to degrade and solve this bug in a way that does not introduce new problems.

Revision history for this message

tamoihl (tamoihl1) wrote on 2008-07-28:

#83

unsubscribe <email address hidden>

Revision history for this message

Launchpad Janitor (janitor) wrote on 2008-07-28:

#84

This bug was fixed in the package initramfs-tools - 0.92bubuntu7

---------------
initramfs-tools (0.92bubuntu7) intrepid; urgency=low

  * scripts/functions:
    - add_mountroot_fail_hook(): set up symlinks to hooks in tmp directory
    - try_failure_hooks(): new function that executes each fail hook in
      lexographic order, exiting successfully if a hook succeeds
    - panic(): remove the fail hook calling logic, now provided by
      try_failure_hooks()
  * scripts/local:
    - get_fstype(): FSTYPE must be initialized to "unknown" in case the fstype
      call does not set it
    - root_missing(): new function to provide the oft-called set of checks on
      the ROOT device
    - mountroot(): use root_missing(), and try the failure hooks before giving
      up
  * Fixes required for (LP: #120375), lots of help from Kees Cook, thanks!

-- Dustin Kirkland <email address hidden> Fri, 25 Jul 2008 17:34:21 -0500

Changed in initramfs-tools:
status:	In Progress → Fix Released

Revision history for this message

Launchpad Janitor (janitor) wrote on 2008-07-28:

#85

This bug was fixed in the package mdadm - 2.6.7-3ubuntu2

---------------
mdadm (2.6.7-3ubuntu2) intrepid; urgency=low

  [ Dustin Kirkland ]
  * debian/initramfs/init-premount: If mounting of root fails, allow the user
    to optionally boot in degraded mode (LP: #120375).

[ Kees Cook ]
* debian/initramfs/init-premount: silently scan for failed arrays.

-- Kees Cook <email address hidden> Mon, 28 Jul 2008 13:03:47 -0700

Changed in mdadm:
status:	In Progress → Fix Released

Revision history for this message

RpR (tom-lecluse) wrote on 2008-07-28:

#86

How can you install the solution.
Tried a sudo apt-get -y update && sudo apt-get -y upgrade

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-07-28:

#87

On Mon, Jul 28, 2008 at 3:41 PM, RpR <email address hidden> wrote:
> How can you install the solution.
> Tried a sudo apt-get -y update && sudo apt-get -y upgrade

This has been fixed in the development tree of Ubuntu Intrepid for
further testing. It's not available for Hardy (or anything prior to
Intrepid, rather) at this time.

If you would like to test this in Ubuntu Intrepid, follow the
instructions on this wiki page:
* https://wiki.ubuntu.com/BootDegradedRaid#head-140b6fbe658cc11fc2710fa57b826dbf374abb69
--
:-Dustin

Revision history for this message

ceg (ceg) wrote on 2008-07-29:

#88

4 of the 5 bugs taged as duplicates use a separate home array,
the fix will not let such systems boot degraded.

WARNING:
The mdadm - 2.6.7-3ubuntu2fix uses "mdadm --assemble --scan" on a system based on udev hotplugging.

It will start all partly attatched arrays in degraded mode.

When any members of the arrays that got degraded comes available (very likely for arrys that got started degraded only as a sideeffect) you risk Bug #244792 corrupting your data.
(As long as ubuntu still uses --assemble --scan --no-degraded in the udev rule.)

Revision history for this message

Ross Becker (ross-becker) wrote on 2008-08-10:

#89

One additional comment on this; this bug will also prevent a system from booting if the array is reshaping (as opposed to missing a disk)

Revision history for this message

rihad (rihad) wrote on 2008-08-10: unsubscribe

#90

unsubscribe

Revision history for this message

Michael Kofler (michael-kofler) wrote on 2008-08-12:

#91

when using RAID *and* LVM on Ubuntu 8.04 (and with / on LVM), I had to add two more lines to
/usr/share/initramfs-tools/scripts/local to give LVM time to get up

...
fi

# The following code was added to allow degraded RAID arrays to start
if [ ! -e "${ROOT}" ] || ! /lib/udev/vol_id "${ROOT}" >/dev/null 2>&1; then
  # Try mdadm and allow degraded arrays to start
  # in case a drive has failed
  log_begin_msg "Attempting to start degraded RAID arrays"
    /sbin/mdadm --assemble --scan

    # give LVM some time to initialize!
    /sbin/udevadm trigger
    /sbin/udevadm settle

log_end_msg
fi

# We've given up, but we'll let the user fix matters if they can
...

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-08-12:

#92

Michael-

Is there any chance you can test the functionality on Intrepid?

Also, would you please open a new bug, and reference this one? With 91+ comments, these bugs tend to spiral out of control and cover way too many independent though possibly related problems.

:-Dustin

Revision history for this message

Ace Suares (acesuares) wrote on 2008-08-14:

#93

Comment 91 (https://bugs.launchpad.net/ubuntu/+bug/120375/comments/91) worked for me.
I didn't need the extra lvm stuff but it can't hurt.

I was booting from /dev/sdb1, which was the 'failed' drive from /dev/md0.

In /etc/fstab, I had /dev/md0 als /, and it shows up as such in 'df'.

But /dev/md0 was not really mounted as root; is was /dev/sdb1 altough it *says* it's /dev/md0. This is a bug I think.

So I made the changes to the initramfs script, and did update-initramfs -u. BUT THAT HAPPENS ON /dev/sdb1 !
So I had to do this:

mount -t ext3 /dev/md0 /mnt
cd /mnt
chroot .
# Change the script in /usr/share/initramfs-tools/scripts/local !!!!! the previous change was on /dev/sdb1 !!!
update-initramfs -u
exit # from the chroot
reboot

This took me ten hours or more (linux-haters !)

Doesn't anyone think it's *really bad* that all 8.04 installs fail horribly when the root raid-array degrades !? Those machines won't boot at all when that happens !

And it's not going to be fixed in 8.04 ? How can this be !?

Revision history for this message

RpR (tom-lecluse) wrote on 2008-08-14:

#94

This is a reason why a lot of system administrators that I know went back to debian which doesn't have this behavior.
For a LTS version it should be fixed.

Revision history for this message

Ace Suares (acesuares) wrote on 2008-08-14: Re: [Bug 120375] Re: cannot boot raid1 with only one disk

#95

On Thursday 14 August 2008, RpR wrote:
> This is a reason why a lot of system administrators that I know went
> back to debian which doesn't have this behavior. For a LTS version it
> should be fixed.

It's extremely weird that the Debian distro doesn't have this bug (or
behaviour) and Ubuntu has. But dapper didn't have it; so switching from
dapper to hardy burns you -without- knowing it. Because your md's have
been working great for years, and they where up when you upgraded.

Now then, when one of your root disks fails, you feel safe - you get an
email and go to the datacenter and try to reboot. Bummer. And I dare you
to be able to bring it up again - it won't be easy unless you brought
some instructions!

The community brouhgt forward a possible solution within this thread -
it's not that hard to fix, is it ?

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-08-18:

#96

mdadm.boot_degraded_debconf.debdiff Edit (27.1 KiB, text/plain)

The attached patch adds a debconf question, prompting the user as to the desired behavior on a newly degraded RAID...

"To boot, or not to boot, that is the question."

The priority is currently set to medium, with the default being BOOT_DEGRADED=false, which is the conservative/traditional behavior.

Perhaps that priority could be raised to high, such that the user would actually be prompted with this question in the installer. I leave that for another discussion, and another bug, please.

:-Dustin

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-08-18:

#97

mdadm.boot_degraded_debconf.debdiff Edit (27.2 KiB, text/plain)

Updated patch. Only difference from the previous is that I added a line to the change log about the debian/po/*.po files changed.

:-Dustin

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-08-18:

#98

This bug is considerably overloaded.

Thus, I have split the mdadm/debconf/boot_degrade bug/patch to:
* Bug #259127

:-Dustin

Revision history for this message

ceg (ceg) wrote on 2008-08-18:

#99

Dustin, you could have open a separate bug with your debconf patch to track your work and at the same time keep the comunity updated and able to give you feedback in a more organized manner.

> the default being BOOT_DEGRADED=false, which is the conservative/traditional behavior.

Only some may consider this "conservative" behaviour a broken behavior, when a system on a "redundant array of independent disks" will degrade just fine when running, but won't even come up when booting.

The reason for all this restrictivity with starting arrays comes from those start up scripts that use(ed) "mdadm --assemble --scan" to start arrays. Those run whatever (partially) connected arrays they can get hold of (in degraded mode).

IMHO the right thing for start up scripts to do is to only start arrays that are needed to set up the root device and fstab, and degrade only them after a timeout.
(/usr/share/initramfs-tools/hooks/cryptroot contains code to determine devices that the root device depends on.)

Hotplugging can start any arrays afterwards, if it is completely attatched.

The homehost "feature" is one suboptimal attempt to restrict array assembly. Same with the restriction with DEVICE or ARRAY definitions in mdadm.conf. Such restrictions add extra configuration burdens and should not be necessary with start up scripts that just correctly honor the root device and fstab information.

In fact the homehost, and ARRAY restrictions prevent the hotplugging from beeing any better than manual configuration. Arrays still have to be configured in mdadm.conf (Bug #252345) .

Revision history for this message

ceg (ceg) wrote on 2008-08-18:

#100

Wow, Dustin we again wrote or comments almost simultaneously. Thank you for separating the issue.

Revision history for this message

Ace Suares (acesuares) wrote on 2008-08-18:

#101

>
> Only some may consider this "conservative" behaviour a broken behavior,
> when a system on a "redundant array of independent disks" will degrade
> just fine when running, but won't even come up when booting.

Even the more when you have upgraded from dapper LTS to hardy LTS and you
*think* you are all peachy. Big surprise the first time when the root
array degrades. Very unacceptable by all means.

Thanks for adressing this tough. But is it going to be worked into an
upgrade for the system or do we have to manually patch ? And if it's an
upgrade, will I be suffciently warned now I have patched a workaround
into the iniramfs scripts ?

Dang !

Revision history for this message

ceg (ceg) wrote on 2008-08-18:

#102

I have separated out the issue of booting with degraded non-root arrays into Bug# 259145.

As a user merely helping to gather info I can't say if updates to hardy (ubuntu 8.04) will be made available. Dustin?

Revision history for this message

Ross Becker (ross-becker) wrote on 2008-08-18:

#103

One more comment on this; I can understand the "conservative" behaviour, but when an array was degraded, the boot process halted with NO MESSAGE indicating what was wrong. Please whatever you do, even in the case of maintaining "conservative" behavior, emit a message telling the user why you're not continuing the boot process.

Revision history for this message

Ace Suares (acesuares) wrote on 2008-08-18:

#104

On Monday 18 August 2008, Ross Becker wrote:
> One more comment on this; I can understand the "conservative"
> behaviour, but when an array was degraded, the boot process halted with
> NO MESSAGE indicating what was wrong. Please whatever you do, even in
> the case of maintaining "conservative" behavior, emit a message telling
> the user why you're not continuing the boot process.

I totally agree.

The system seems to hang at a certain point, and when you wait long enough
(180s as far as I understand from this discussion) and then drops you
into initramfs.

I tried 999 different ways of modifying grub, fstab, mdadm.conf becasue I
didn't know WHY the system couldn't boot.

The fact was that I very well KNEW the array was degraded, too. A warning
message would have helped tremendously.

Also, since there is no message and no indication of what was wrong, I had
a very hard time finding this thread. I almost exploded when I found the
thread, saw it's age, and all the possible workarounds, and the fact that
it is not a problem in Debian.

This behaviour really and very strongly diminishes my trust in Ubuntu as a
system for my servers.

Just a question: have you any idea how many systems are affected? The
popularity contest says:
mdadm 25199 5566 19467 156 10

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-08-19:

#105

Screenshot.png Edit (13.6 KiB, image/png; name=Screenshot.png)

The new behavior present in Intrepid, following my series of patches,
results in the following message being printed to the console after 30
seconds of waiting for a complete array...

| There appears to be one or more degraded RAID devices, and your root device
| may depend on the RAID devices being online. One or more of the
following RAID
| devices are degraded:
| md0 : inactive sda1[0](S)
| 1020032 blocks

[See attached screenshot]

You can configure your RAID to boot even if degraded by running
`dpkg-reconfigure mdadm`, or, you can simply edit the file:
/etc/initramfs-tools/conf.d/mdadm by hand, and set BOOT_DEGRADED=true.
They have the same effect.

Additionally, you can set or override this on the kernel boot line by
entering the grub menu and using any of the following:
* bootdegraded
* bootdegraded=1, bootdegraded=true, bootdegraded=on, bootdegraded=yes

If BOOT_DEGRADED=true, it will continue to boot automatically in the
event of a degraded RAID event.

If those are set to false, the following prompt will appear, and wait
15 seconds for you to hit 'y' or 'n', at which point it will default
to 'n':

| ** WARNING: The root filesystem was found on a degraded RAID array! **
|
| The system may have suffered a hardware fault, such as a disk drive failure.
| You may attempt to start the system anyway, or stop now and attempt manual
| recovery operations.
|
| If you choose to boot the degraded RAID, the system may start normally, but
| performance may be degraded, and a further hardware fault could result in
| permanent data loss.
|
| If you abort now, you will be provided with a recovery shell.
| Do you wish to boot the degraded RAID? [y/N]:

[See attached screenshot]

This interactive selection is a one time deal. It does not affect the
configuration as written to file.

These changes involve too much code (grub, mdadm, initramfs-tools),
too many new 'features', and are not security-critical enough to be
_automatically_ published to Hardy LTS.

That said, we do have documented process for providing Stable Release
Updates, and Backports to previous releases. Please follow the
procedures specified in:
* https://wiki.ubuntu.com/StableReleaseUpdates
* https://help.ubuntu.com/community/UbuntuBackports

This bug, "[Bug 120375] Re: cannot boot raid1 with only one disk", is
solved, and I'm un-subscribing from it.

If you have separate or additional issues, please open new bugs.

Thanks,
:-Dustin

The new behavior present in Intrepid, following my series of patches,
results in the following message being printed to the console after 30
seconds of waiting for a complete array...

|   There appears to be one or more degraded RAID devices, and your root device
|   may depend on the RAID devices being online. One or more of the
following RAID
|   devices are degraded:
|   md0 : inactive sda1[0](S)
|            1020032 blocks

[See attached screenshot]

You can configure your RAID to boot even if degraded by running
`dpkg-reconfigure mdadm`, or, you can simply edit the file:
/etc/initramfs-tools/conf.d/mdadm by hand, and set BOOT_DEGRADED=true.
 They have the same effect.

Additionally, you can set or override this on the kernel boot line by
entering the grub menu and using any of the following:
 * bootdegraded
 * bootdegraded=1, bootdegraded=true, bootdegraded=on, bootdegraded=yes

If BOOT_DEGRADED=true, it will continue to boot automatically in the
event of a degraded RAID event.

If those are set to false, the following prompt will appear, and wait
15 seconds for you to hit 'y' or 'n', at which point it will default
to 'n':

|   ** WARNING: The root filesystem was found on a degraded RAID array! **
|
|   The system may have suffered a hardware fault, such as a disk drive failure.
|   You may attempt to start the system anyway, or stop now and attempt manual
|   recovery operations.
|
|   If you choose to boot the degraded RAID, the system may start normally, but
|   performance may be degraded, and a further hardware fault could result in
|   permanent data loss.
|
|   If you abort now, you will be provided with a recovery shell.
|   Do you wish to boot the degraded RAID? [y/N]:

[See attached screenshot]

This interactive selection is a one time deal.  It does not affect the
configuration as written to file.

These changes involve too much code (grub, mdadm, initramfs-tools),
too many new 'features', and are not security-critical enough to be
_automatically_ published to Hardy LTS.

That said, we do have documented process for providing Stable Release
Updates, and Backports to previous releases.  Please follow the
procedures specified in:
 * https://wiki.ubuntu.com/StableReleaseUpdates
 * https://help.ubuntu.com/community/UbuntuBackports

This bug, "[Bug 120375] Re: cannot boot raid1 with only one disk", is
solved, and I'm un-subscribing from it.

If you have separate or additional issues, please open new bugs.

Thanks,
:-Dustin

Revision history for this message

Ace Suares (acesuares) wrote on 2008-08-19:

#106

On Tuesday 19 August 2008, Dustin Kirkland wrote:
> The new behavior present in Intrepid, following my series of patches,
> results in the following message being printed to the console after 30
> seconds of waiting for a complete array...
>
> | There appears to be one or more degraded RAID devices, and your
> | root device may depend on the RAID devices being online. One or more
> | of the
>
> following RAID
>
> | devices are degraded:
> | md0 : inactive sda1[0](S)
> | 1020032 blocks
>
> [See attached screenshot]
>
> You can configure your RAID to boot even if degraded by running
> `dpkg-reconfigure mdadm`, or, you can simply edit the file:
> /etc/initramfs-tools/conf.d/mdadm by hand, and set BOOT_DEGRADED=true.
> They have the same effect.
>
> Additionally, you can set or override this on the kernel boot line by
> entering the grub menu and using any of the following:
> * bootdegraded
> * bootdegraded=1, bootdegraded=true, bootdegraded=on, bootdegraded=yes
>
> If BOOT_DEGRADED=true, it will continue to boot automatically in the
> event of a degraded RAID event.
>
> If those are set to false, the following prompt will appear, and wait
> 15 seconds for you to hit 'y' or 'n', at which point it will default
>
> to 'n':
> | ** WARNING: The root filesystem was found on a degraded RAID array!
> | **
> |
> | The system may have suffered a hardware fault, such as a disk drive
> | failure. You may attempt to start the system anyway, or stop now and
> | attempt manual recovery operations.
> |
> | If you choose to boot the degraded RAID, the system may start
> | normally, but performance may be degraded, and a further hardware
> | fault could result in permanent data loss.
> |
> | If you abort now, you will be provided with a recovery shell.
> | Do you wish to boot the degraded RAID? [y/N]:
>
> [See attached screenshot]
>
> This interactive selection is a one time deal. It does not affect the
> configuration as written to file.
>
> These changes involve too much code (grub, mdadm, initramfs-tools),
> too many new 'features', and are not security-critical enough to be
> _automatically_ published to Hardy LTS.
>
> That said, we do have documented process for providing Stable Release
> Updates, and Backports to previous releases. Please follow the
> procedures specified in:
> * https://wiki.ubuntu.com/StableReleaseUpdates
> * https://help.ubuntu.com/community/UbuntuBackports
>
> This bug, "[Bug 120375] Re: cannot boot raid1 with only one disk", is
> solved, and I'm un-subscribing from it.
>
> If you have separate or additional issues, please open new bugs.
>
> Thanks,
>
> :-Dustin
>
> ** Attachment added: "Screenshot.png"
> http://launchpadlibrarian.net/16926424/Screenshot.png

Good work.

On Tuesday 19 August 2008, Dustin Kirkland wrote:
> The new behavior present in Intrepid, following my series of patches,
> results in the following message being printed to the console after 30
> seconds of waiting for a complete array...
>
> |   There appears to be one or more degraded RAID devices, and your
> | root device may depend on the RAID devices being online. One or more
> | of the
>
> following RAID
>
> |   devices are degraded:
> |   md0 : inactive sda1[0](S)
> |            1020032 blocks
>
> [See attached screenshot]
>
> You can configure your RAID to boot even if degraded by running
> `dpkg-reconfigure mdadm`, or, you can simply edit the file:
> /etc/initramfs-tools/conf.d/mdadm by hand, and set BOOT_DEGRADED=true.
>  They have the same effect.
>
> Additionally, you can set or override this on the kernel boot line by
> entering the grub menu and using any of the following:
>  * bootdegraded
>  * bootdegraded=1, bootdegraded=true, bootdegraded=on, bootdegraded=yes
>
> If BOOT_DEGRADED=true, it will continue to boot automatically in the
> event of a degraded RAID event.
>
> If those are set to false, the following prompt will appear, and wait
> 15 seconds for you to hit 'y' or 'n', at which point it will default
>
> to 'n':
> |   ** WARNING: The root filesystem was found on a degraded RAID array!
> | **
> |
> |   The system may have suffered a hardware fault, such as a disk drive
> | failure. You may attempt to start the system anyway, or stop now and
> | attempt manual recovery operations.
> |
> |   If you choose to boot the degraded RAID, the system may start
> | normally, but performance may be degraded, and a further hardware
> | fault could result in permanent data loss.
> |
> |   If you abort now, you will be provided with a recovery shell.
> |   Do you wish to boot the degraded RAID? [y/N]:
>
> [See attached screenshot]
>
> This interactive selection is a one time deal.  It does not affect the
> configuration as written to file.
>
> These changes involve too much code (grub, mdadm, initramfs-tools),
> too many new 'features', and are not security-critical enough to be
> _automatically_ published to Hardy LTS.
>
> That said, we do have documented process for providing Stable Release
> Updates, and Backports to previous releases.  Please follow the
> procedures specified in:
>  * https://wiki.ubuntu.com/StableReleaseUpdates
>  * https://help.ubuntu.com/community/UbuntuBackports
>
> This bug, "[Bug 120375] Re: cannot boot raid1 with only one disk", is
> solved, and I'm un-subscribing from it.
>
> If you have separate or additional issues, please open new bugs.
>
> Thanks,
>
> :-Dustin
>
> ** Attachment added: "Screenshot.png"
>    http://launchpadlibrarian.net/16926424/Screenshot.png

Good work.

Ace Suares (acesuares) on 2008-08-19

description:

updated

Revision history for this message

tricky1 (tricky1) wrote on 2008-08-23:

#107

In my new installations of Alpha 4 this did not work.

Revision history for this message

miguel (miguel-scantec) wrote on 2008-09-01:

#109

I've experimented this bug on a new system I sold.. After a few days working, one of the disks just died for my bad luck. I lost a Lot of hours trying to fix it, resulting downtime to our costumer.. I couldn't believe when I saw the panic I went to was the result of a bug not fixed on this so called LTS release.. I would call Hardy a Long Time Supported Testing Release.. This "bug" has been here more than 1 year.. I just dumped hardy out of my way and went back to Debian Etch. No more Hard(y) and strange works for me! "It's extremely weird that the Debian distro doesn't have this bug (or behaviour) and Ubuntu has." Not that weird.. I just can't imagine Debian simply put out there a final release with a serious bug like this one..

Revision history for this message

no!chance (ralf-fehlau) wrote on 2008-09-04:

#110

Full ack to miguel. Why is somebody using a raid1 on his system. Does he want to have trouble, if a disk fails, or does he want a running system and to be informed about a hardware failue? Ubuntu raid is useless! The "conservative" mode is useless! If i had a 4 disk RAID1 system or a RAID5 with a spare disk and ONE fails. Is it useful to stop the boot process???

I have the same issue on my new system. First, I had one disk and decided to upgrade to software raid. With the second HD, I created a degraded raid1, copied the contents of the first disk to the second, and wanted to add the first disk to the raid. Ubuntu drops me into the shell. :-( And in spite of booting with a live CD and adding the first disk to the raid, the system refused to boot.

Because it is a new system without any data on it, I will do a new installation with debian or suse. For my home server, I will see.

Revision history for this message

no!chance (ralf-fehlau) wrote on 2008-09-04:

#111

Another thing to mention: Which systems are using raid? .... Right! ... Server! Such systems are usually maintained remotely and rebooted through a ssh connection. This message an the question is very very useful. :-(

> If you abort now, you will be provided with a recovery shell.
> Do you wish to boot the degraded RAID? [y/N]:

The last you will see from this server is ....

"rebooting now"

Revision history for this message

Stas Sușcov (sushkov) wrote on 2008-10-09:

#112

Can somebody explain, is this bug fixed in hardy packages?

A lot of comments, and not a single clear report!!!

If it is not fixed... Is there a patch for local script in initramfs-tools and mdadm or is there any rebuilt package with the fixes?

Currently in Hardy i got:
~$ apt-cache policy mdadm
mdadm:
  Installed: 2.6.3+200709292116+4450e59-3ubuntu3
  Candidate: 2.6.3+200709292116+4450e59-3ubuntu3
  Version table:
*** 2.6.3+200709292116+4450e59-3ubuntu3 0
        500 http://ro.archive.ubuntu.com hardy/main Packages
        100 /var/lib/dpkg/status
stas@baikonur:~$ apt-cache policy initramfs-tools
initramfs-tools:
  Installed: 0.85eubuntu39.2
  Candidate: 0.85eubuntu39.2
  Version table:
*** 0.85eubuntu39.2 0
        500 http://ro.archive.ubuntu.com hardy-updates/main Packages
        100 /var/lib/dpkg/status
     0.85eubuntu36 0
        500 http://ro.archive.ubuntu.com hardy/main Packages
~$ apt-cache policy mdadm
mdadm:
  Installed: 2.6.3+200709292116+4450e59-3ubuntu3
  Candidate: 2.6.3+200709292116+4450e59-3ubuntu3
  Version table:
*** 2.6.3+200709292116+4450e59-3ubuntu3 0
        500 http://ro.archive.ubuntu.com hardy/main Packages
        100 /var/lib/dpkg/status

Thank you in advance!

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-10-09:

#113

This is fixed in Intrepid, not in Hardy.

:-Dustin

Revision history for this message

Ace Suares (acesuares) wrote on 2008-10-09:

#114

On Thursday 09 October 2008, Stanislav Sushkov wrote:
> Can somebody explain, is this bug fixed in hardy packages?
>
> A lot of comments, and not a single clear report!!!
>
> If it is not fixed... Is there a patch for local script in initramfs-
> tools and mdadm or is there any rebuilt package with the fixes?
>
> Currently in Hardy i got:
> ~$ apt-cache policy mdadm
> mdadm:
> Installed: 2.6.3+200709292116+4450e59-3ubuntu3
> Candidate: 2.6.3+200709292116+4450e59-3ubuntu3
> Version table:
> *** 2.6.3+200709292116+4450e59-3ubuntu3 0
> 500 http://ro.archive.ubuntu.com hardy/main Packages
> 100 /var/lib/dpkg/status
> stas@baikonur:~$ apt-cache policy initramfs-tools
> initramfs-tools:
> Installed: 0.85eubuntu39.2
> Candidate: 0.85eubuntu39.2
> Version table:
> *** 0.85eubuntu39.2 0
> 500 http://ro.archive.ubuntu.com hardy-updates/main Packages
> 100 /var/lib/dpkg/status
> 0.85eubuntu36 0
> 500 http://ro.archive.ubuntu.com hardy/main Packages
> ~$ apt-cache policy mdadm
> mdadm:
> Installed: 2.6.3+200709292116+4450e59-3ubuntu3
> Candidate: 2.6.3+200709292116+4450e59-3ubuntu3
> Version table:
> *** 2.6.3+200709292116+4450e59-3ubuntu3 0
> 500 http://ro.archive.ubuntu.com hardy/main Packages
> 100 /var/lib/dpkg/status
>
> Thank you in advance!

It's not fixed in hardy LTS, which is very strange...

ace

Revision history for this message

Stas Sușcov (sushkov) wrote on 2008-10-09:

#115

Can you point to a wiki page or a comment in this thread where I'll find
a solution for hardy?

Or I should install Intrepid packages?

Thank you.

On Thu, 2008-10-09 at 15:14 +0000, Dustin Kirkland wrote:
> This is fixed in Intrepid, not in Hardy.
>
> :-Dustin
>
--
() Campania Panglicii în ASCII
/\ http://stas.nerd.ro/ascii/

Revision history for this message

RpR (tom-lecluse) wrote on 2008-10-09:

#116

Dustin I love the work that you've put in to this but I need to stand by Ace Suares and for a LTS version it should be fixed.
It prevents me to use ubuntu for my servers which uses software raid. The ones with hardware raid could use ubuntu.

But sticking with debian for the moment because of this.
If hardy wasn't a LTS version I would understand when you would just say upgrade to ...

Revision history for this message

Stas Sușcov (sushkov) wrote on 2008-10-09:

#117

Just post a file with patch which works for hardy, and that's all folks :)

Revision history for this message

Ace Suares (acesuares) wrote on 2008-10-09:

#118

On Thursday 09 October 2008, Stanislav Sushkov wrote:
> Just post a file with patch which works for hardy, and that's all folks
>
> :)

No it's not.

There is a procedure for that, so it will be updated automatically. That
procedure I followed but at some point in the procedure, powers
of 'normal' people are insufficient. We have to find an overlord who will
sponsor the process and move it forward. Even the developer who made the
patch for Ibex, can not move this forward on his own.

Where are the Power Puff Girls when you need them ?

Revision history for this message

Stas Sușcov (sushkov) wrote on 2008-10-09:

#119

You mean this procedure?

It didn't work for me.
I mean I patched my "local" manually, but it broke my init image
in /boot after update-initramfs...

Maybe I the patch really works, but no one from this thread reported it
as a solution...

On Thu, 2008-10-09 at 20:37 +0000, Ace Suares wrote:
> On Thursday 09 October 2008, Stanislav Sushkov wrote:
> > Just post a file with patch which works for hardy, and that's all folks
> >
> > :)
>
> No it's not.
>
> There is a procedure for that, so it will be updated automatically. That
> procedure I followed but at some point in the procedure, powers
> of 'normal' people are insufficient. We have to find an overlord who will
> sponsor the process and move it forward. Even the developer who made the
> patch for Ibex, can not move this forward on his own.
>
> Where are the Power Puff Girls when you need them ?
>
--
() Campania Panglicii în ASCII
/\ http://stas.nerd.ro/ascii/

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-10-09:

#120

Fixing Hardy would mean, at the very least:
1) porting the patches for:
   * mdadm
   * initramfs-tools
   * grub
   * grub-installer
2) Rebuilding the installation ISO's.
3) Obsessively regression testing the new install media.

After Intrepid releases on October 30, 2008, I will spend a few cycles
considering the port to Hardy 8.04.2. No guarantees, but rest assured
that I am a Canonical/Ubuntu Server developer, who runs Hardy+RAID on
my own systems, and have plenty of motivation to see this fixed.

If anyone wants to volunteer to do (1), that would help move things
along. And I certainly hope at the very least some of you are willing
to help with (3).

:-Dustin

p.s. Please understand that your sarcasm is completely unappreciated.

Revision history for this message

Stas Sușcov (sushkov) wrote on 2008-10-09:

#121

Dustin,
what about those packages for ibex. If I update my hardy with those, so
I risk serious troubles?

Or it is not possible cause of difference between kernels?

Someone did this before?

On Thu, 2008-10-09 at 21:21 +0000, Dustin Kirkland wrote:
> Dustin
--
() Campania Panglicii în ASCII
/\ http://stas.nerd.ro/ascii/

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-10-09:

#122

Stanislav-

First, the Intrepid packages won't work, due to toolchain (glibc,
klibc) differences. I just tested out of curiosity in a virtual
machine.

Second, under no circumstances would I recommend this as an acceptable
thing to do. If you are beholden to running Hardy, I presume that's
because Hardy is supported as an LTS. Once you start changing the
Hardy LTS packages and replacing them with Intrepid packages, you're
no longer in a supported configuration. Especially when we're talking
about things that are so fundamental to booting your system.

If you would be willing to upgrade these four key packages to
Intrepid, I'd say you would be much better served upgrading to
Intrepid across the board.

:-Dustin

Revision history for this message

Ace Suares (acesuares) wrote on 2008-10-09:

#123

Dustin,

I am glad you may be spending some time on this bug.

You mention rebuilding ISO's. But why can't it just be an upgrade to the existing installations?
I mean, on an existing system, all we need to do is upgrade ?

Also, I am not being sarcastic at all, when I say that I can not understand that there needs to be so much regression testing for the patch that removes the bug, since to introduce the bug was possible at all. The bug is not present in Debian, and was not present in Dapper.

Anyway, I am going to install the workaround on all my affected machines because I cannot wait that long. (But then I will get some trouble when an update finally comes around..).

And I am not going to advice using Ubuntu for servers that use software raid anymore. I am really disappointed in the way this is going. I am happy it will be fixed in Ibex tough. Just keep smiling...

I am also unsubscribing from this bug. I feel that I am becoming unconstructive.

Revision history for this message

Ross Becker (ross-becker) wrote on 2008-10-09:

#124

Dustin,
I've been doing sysadmin work for 15 years. I chose to try out Ubuntu
for a home RAID server project, and loaded up Hardy as it was an LTS
edition. In my first day working with Ubuntu, I ran into this bug, a bug
where the version of mdadm (pretty well out of date) on Hardy was unable to
resume a RAID reshape operation, and the ext2resize tools incorrectly detect
the RAID stride. All of these bugs are BASIC functionality of the storage
management tools which should have never made it through any sort of QA.

I reported all of them as bugs, and this is the ONLY one which has even
recieved a developer response after 2 months.

For a Long Term Support edition, that's shameful. Not only that, but a bug
which in a COMMON situation can prevent boot and your response as to whether
a fix will be backported is "I'll spend a few cycles considering backporting
it"

Your lack of understanding for someone's sarcasm is completely unjustified,
and the level of developer/bugfix support I'm seeing for Ubuntu is
pathetic. With this level of support, using Ubuntu for any sort of a
corporate server application would be a really poor decision.

Revision history for this message

brunus (reg-paolobrunello) wrote on 2008-10-10:

#125

Hello,
I second RpR post word by word: it's sincerely hard to accept that such a serious bug is, not present until 7.04, is still open 14 months and 2 distros after, one of them being LTS. And this is even more true for the - arguably called - server edition: it's like putting on the market a TIR truck that has no double tyres and doing it twice.

Dustin,
could you please explain how do you run Hardy+RAID: it is still not clear to me after reading the whole thread, sorry.

Thanks,

brunus

Revision history for this message

Steve Langasek (vorlon) wrote on 2008-10-10:

#126

Unsubscribing ubuntu-sru. Please do not subscribe the SRU team to bugs that don't actually include proposed fixes to previous releases. a) this is not the documented procedure for SRU fixes, b) the SRU team has other things to do that actually benefit Ubuntu users, instead of following a "bug report" that consists of lambasting Ubuntu for a bug that the relevant developers have already agreed should receive attention.

With regard to the last, I've accepted the nomination for hardy based on Dustin's statement that he'll work on this for 8.04.2.

Ross, as for the other bugs you mentioned: the ext2resize package is not part of Ubuntu main, which means it's not part of the set of software supported by Canonical. It's a third-party tool that ext3 upstream has repeatedly recommended against using. The package from Debian is included in the universe repository in the hope that it's useful to someone, but there is no one at all in Ubuntu tending that package - there's no reason to think it was subjected to any QA for it to fail. If you believe this problem makes the package unusable, then I'm happy to escalate bug #256669 and have the ext2resize package removed from subsequent Ubuntu releases.

Changed in initramfs-tools:
status:	New → Confirmed
Changed in mdadm:
importance:	Undecided → Medium
status:	New → Confirmed
Changed in initramfs-tools:
importance:	Undecided → Medium

Revision history for this message

agent 8131 (agent-8131) wrote on 2008-10-10:

#127

Download full text (4.0 KiB)

I think it's time for some tough love. No one would be taking the time to comment on this if they didn't want to see Ubuntu Server be a better product. I personally feel this is a significant issue for because it demonstrates Canonical's interest in supporting an LTS release and seriousness about becoming a presence in the server market. I know it can be difficult when people are lambasting a product you've put a lot of time into, believe me, I got more than my fair share of that this week. However, sometimes you have to step back and realize that your product quality has been lower than many people expect and you have to either step up or risk losing more customers. Make no mistake, this bug has lost a lot of sysadmins, some of whom had to fight hard to get Ubuntu onto servers in their workplaces in the first place. I was one of them, and I know a few more personally. I pitched that it made sense to have Ubuntu on the Server because of the benefits of the LTS release, including the longer support time and the large community that contributes to Ubuntu, therefore leading to more bugs being found and resolved. However, I doubt I will be able to propose Ubuntu again until version 10.04. If this bug were to be resolved in 8.04.2 I might at least start pitching it next year, barring any other bugs of this level of severity.

To respond to Steve Langasek, while I understand that a lot of these emails are not terribly useful, this bug is exactly what the SRU team is supposed to be addressing. There have been many proposed fixes in this thread and 8 patches uploaded. Do any of them work correctly? Well that is the question, but it's inaccurate to state that this bug does not contain proposed fixes. Furthermore this fits the SRU criteria of being a high impact bug representing a severe regression from earlier versions and one which also may lead to a loss of user data. When the average user is confronted with the initramfs shell during a drive failure I suspect they have the potential to do serious damage to their file systems in an attempt to fix the problem.

I don't feel it's possible for me to understate the severity of this bug and how badly sysadmins are going to react when they encounter it or read about it. It is certainly not the kind of bug one can dismiss in an LTS release if LTS is to say anything about quality, and hence suggestions to upgrade to Intrepid, while acceptable to a home user building a server, are not going to be acceptable in the workplace. If this is a market segment that Ubuntu Server is catered to than this issue needs to be addressed. If on the other hand Ubuntu Server is meant merely for enthusiasts with their home file servers than the solution should be to make sure that goal is clearly articulated.

To keep us focused on the work at hand and to avail myself of the opportunity that having this number of people working for fix this bug represents I'll say that I've tried a number of solutions on this page but none have been satisfactory. I tried changing the udev rule as suggested above (see Plnt 2007-12-25) but got the same results that have been reported: I can get the system to boot any time the ...

I think it's time for some tough love.  No one would be taking the time to comment on this if they didn't want to see Ubuntu Server be a better product.  I personally feel this is a significant issue for because it demonstrates Canonical's interest in supporting an LTS release and seriousness about becoming a presence in the server market.  I know it can be difficult when people are lambasting a product you've put a lot of time into, believe me, I got more than my fair share of that this week.  However, sometimes you have to step back and realize that your product quality has been lower than many people expect and you have to either step up or risk losing more customers.  Make no mistake, this bug has lost a lot of sysadmins, some of whom had to fight hard to get Ubuntu onto servers in their workplaces in the first place.  I was one of them, and I know a few more personally.  I pitched that it made sense to have Ubuntu on the Server because of the benefits of the LTS release, including the longer support time and the large community that contributes to Ubuntu, therefore leading to more bugs being found and resolved.  However, I doubt I will be able to propose Ubuntu again until version 10.04.  If this bug were to be resolved in 8.04.2 I might at least start pitching it next year, barring any other bugs of this level of severity.

To respond to Steve Langasek, while I understand that a lot of these emails are not terribly useful, this bug is exactly what the SRU team is supposed to be addressing.  There have been many proposed fixes in this thread and 8 patches uploaded.  Do any of them work correctly?  Well that is the question, but it's inaccurate to state that this bug does not contain proposed fixes.  Furthermore this fits the SRU criteria of being a high impact bug representing a severe regression from earlier versions and one which also may lead to a loss of user data.  When the average user is confronted with the initramfs shell during a drive failure I suspect they have the potential to do serious damage to their file systems in an attempt to fix the problem.

I don't feel it's possible for me to understate the severity of this bug and how badly sysadmins are going to react when they encounter it or read about it.  It is certainly not the kind of bug one can dismiss in an LTS release if LTS is to say anything about quality, and hence suggestions to upgrade to Intrepid, while acceptable to a home user building a server, are not going to be acceptable in the workplace.  If this is a market segment that Ubuntu Server is catered to than this issue needs to be addressed.  If on the other hand Ubuntu Server is meant merely for enthusiasts with their home file servers than the solution should be to make sure that goal is clearly articulated.

To keep us focused on the work at hand and to avail myself of the opportunity that having this number of people working for fix this bug represents I'll say that I've tried a number of solutions on this page but none have been satisfactory.  I tried changing the udev rule as suggested above (see Plnt 2007-12-25)  but got the same results that have been reported: I can get the system to boot any time the disk is degraded but at the expense that it boots degraded sometimes even when the disks are fine.  This is the setting on my remaining Ubuntu servers (which will be migrated to Debian in the near future).  This option is suboptimal but better than the default behavior.  I would say that the ideal (and expected) behavior is for the system to boot in the case of a degraded array by default, with an option to turn that off if the sysadmin chooses.  For those with Ubuntu servers, assuming this never gets fixed, I think an initramfs  script (in /etc/initramfs-tools/scripts/)  that anyone could copy onto their server might be ideal.  Alternatively patching the mdadm udev script to try and start with --no-degraded and on failure use --run might also be an option.  I have a server I may be able to user for testing but if anyone has a script to start with that would be great.

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-10-10:

#128

I'm un-subscribing from this bug as well.

Anyone who believes that berating the developer who has (finally)
fixed this bug in the current release of Ubuntu, and offered to fix it
in the LTS release, is constructive desperately needs to re-read the
Ubuntu Code of Conduct.
* http://www.ubuntu.com/community/conduct

If you feel that this issue is urgently blocking your organization
from adopting Ubuntu LTS at an enterprise level, I remind you that
contracting commercial support from Canonical or one of the numerous
Ubuntu partners around the globe is always an option.
* http://www.ubuntu.com/support/paid
* http://webapps.ubuntu.com/partners/

Dustin

Changed in initramfs-tools:
assignee:	kirkland → nobody
Changed in mdadm:
assignee:	kirkland → nobody

Revision history for this message

Stas Sușcov (sushkov) wrote on 2008-10-10:

#129

Dustin, Steve, thank you for your support.
I'll be waiting for the updates on my hardy, and pray to not get in a
situation where this bug will show up ugly on my servers.

I remain subscribed, as I believe guys will make their job done soon.

Good luck!

On Fri, 2008-10-10 at 06:01 +0000, Dustin Kirkland wrote:
> I'm un-subscribing from this bug as well.
>
> Anyone who believes that berating the developer who has (finally)
> fixed this bug in the current release of Ubuntu, and offered to fix it
> in the LTS release, is constructive desperately needs to re-read the
> Ubuntu Code of Conduct.
> * http://www.ubuntu.com/community/conduct
>
> If you feel that this issue is urgently blocking your organization
> from adopting Ubuntu LTS at an enterprise level, I remind you that
> contracting commercial support from Canonical or one of the numerous
> Ubuntu partners around the globe is always an option.
> * http://www.ubuntu.com/support/paid
> * http://webapps.ubuntu.com/partners/
>
> Dustin
>
> ** Changed in: initramfs-tools (Ubuntu)
> Assignee: Dustin Kirkland (kirkland) => (unassigned)
>
> ** Changed in: mdadm (Ubuntu)
> Assignee: Dustin Kirkland (kirkland) => (unassigned)
>
--
() Campania Panglicii în ASCII
/\ http://stas.nerd.ro/ascii/

Revision history for this message

agent 8131 (agent-8131) wrote on 2008-10-10:

#130

No one should be taking any of this personally or feel berated. People should feel free to be honest about their experiences without fear that doing so will drive away people that could be of help in resolving the issues. I've been keeping my mouth shut on this bug for a couple of months and I apologize for being long winded when I finally took the time to post my experiences.

Dustin Kirkland, you did not offer to fix this problem in the LTS release. You stated that you would consider it. And that's great, but without any commitment to fix the problem in 8.04 it would be nice to actually be able to offer up a simple solution to the people that come across this page when trying to resolve this bug.

I will not unsubscribe from this bug because I know that people will be bit by it again, come to me for help, and I wish to have a good solution for them when they do. I re-read the Code of Conduct and note that being collaborative seems to be at odds with disengaging when one finds the process unpleasant. I certainly respect the right of people to do so but I feel it's a loss to actually making progress on this bug. Granted, I've only been subscribed for 2 months; if I had been subscribed for a year I acknowledge I might feel differently.

Since people seem to feel that this hasn't been a friendly exchange I will be happy to buy a drink of choice to:
1) Anyone that comes up with a good solution to this bug for Ubuntu 8.04 and 8.04.1.
2) Anyone who works to get this bug fixed in 8.04.2.
3) Dustin Kirkland and anyone else who makes sure this bug never appears in 8.10.

Revision history for this message

Stas Sușcov (sushkov) wrote on 2008-10-10:

#131

Subscribing to what agent 8131 said.
Now there are two drinks to get for the guy who helps us!

On Fri, 2008-10-10 at 07:13 +0000, agent 8131 wrote:
> agent 8131
--
() Campania Panglicii în ASCII
/\ http://stas.nerd.ro/ascii/

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-10-29:

#132

Now that the Intrepid development cycle has wound down, I was finally able to circle back to this.

For those of you interested in seeing this backported to Hardy, see Bug #290885.

I have test packages available in my PPA--standard PPA disclaimers apply. Please refer to Bug #290885 for test instructions.

:-Dustin

Revision history for this message

Nick Barcet (nijaba) wrote on 2008-11-04:

#133

Has anyone tested the packages in Dustin's PPA? (Are the beers coming?)

It would be really nice know before we place those in -proposed to follow the proper SRU process for hardy.

Dustin Kirkland  (kirkland) on 2008-11-04

Changed in initramfs-tools:
assignee:	nobody → kirkland
status:	Confirmed → In Progress
Changed in mdadm:
assignee:	nobody → kirkland
milestone:	none → ubuntu-8.04.2
status:	Confirmed → In Progress
Changed in initramfs-tools:
milestone:	none → ubuntu-8.04.2
Changed in grub:
assignee:	nobody → kirkland
importance:	Undecided → Medium
milestone:	none → ubuntu-8.04.2
status:	New → In Progress
status:	New → Fix Released

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2008-11-04:

#134

Thanks, Nick.

Please respond with any test results in Bug #290885.

Bug #120375 is hereby reserved for wailing, moaning, fussing, cursing, complaining, lamenting, murmuring, regretting, repining, bewailing, deploring, weeping, mourning, protesting, charging, accusing, disapproving, grumbling, fretting, whining, peeving, quarreling, resenting, dissenting, discontenting, malcontenting, bellyaching, and non-constructive criticisms :-)

:-Dustin

Dustin Kirkland  (kirkland) on 2008-12-15

Changed in grub:
status:	In Progress → Fix Released
Changed in initramfs-tools:
status:	In Progress → Fix Released
Changed in mdadm:
status:	In Progress → Fix Released

Revision history for this message

Tapani Rantakokko (trantako) wrote on 2009-01-27:

#135

It seems that the issue is now fixed in Intrepid, and also backported to Hardy 8.04.2, which was released a few days ago. However, it is unclear to me if I need to reinstall from 8.04.2 distribution media, or can I fix an existing Hardy installation via software updates?

Revision history for this message

Dustin Kirkland  (kirkland) wrote on 2009-01-28:

#136

You should not need to reinstall.

To solve this, you need to:
a) live upgrade all packages (specifically, you need to upgrade grub,
mdadm, and initramfs-tools)
b) install grub to the raid (for instance, if /dev/md0 provides your
/boot directory, you can do "grub-install /dev/md0")

Cheers,
:-Dustin

Revision history for this message

Tapani Rantakokko (trantako) wrote on 2009-02-02:

#137

Download full text (3.7 KiB)

Dustin, thank you for your quick answer and tips.

It took me a while to test it, as I have an encrypted RAID 1 array with LVM, and things are not that straightforward with that setup.

So far I have been using one of the tricks described in this thread earlier (ie. edit /etc/udev/rules.d/85-mdadm.rules: change the "--no-degraded" to "-R", after that "sudo update-initramfs -u -k all"). It allows me to boot with only one drive, but has that annoying side effect where one of the partitions often starts in degraded mode, even if both drives are in fact present and working.

I wanted to get rid of that problem, so I did this:
- return 85-mdadm.rules as it used to be, ie. --no-degraded
- sudo update-initramfs -u -k all
- cat /proc/mdstat and check that all drives are online and sync'ed
- upgrade all packages
- re-install grub to both drives

These are my test results:
1. Restart computer with both disks
-> everything works OK

2. Restart computer with only one disk
-> Keeps asking "Enter password to unlock the disk (md1_crypt):" even though I write the correct password

3. Restart computer again with both disks
-> everything works OK

So, first it seemed that the fix does not work at all, as Ubuntu starts only when both disks are present. Then I made some more tests:

4. Restart computer with only one disk
-> Keeps asking "Enter password to unlock the disk (md1_crypt):" even though I write the correct password
-> Now press CTRL+ALT+F1, and see these messages:
Starting up ...
Loading, please wait...
Setting up cryptographic volume md1_crypt (based on /dev/md1)
cryptsetup: cryptsetup failed, bad password or options?
cryptsetup: cryptsetup failed, bad password or options?
-> After waiting some minutes, I got dropped into the busybox
-> Something seems to be going wrong with encryption

5. Restart computer with only one disk, without "quiet splash" boot parameters in /boot/grub/menu.lst
-> Got these messages:
Command failed: Not a block device
cryptsetup: cryptsetup failed, bad password or options?
... other stuff ...
Command failed: Not a block device
cryptsetup: cryptsetup failed, bad password or options?
Command failed: Not a block device
cryptsetup: cryptsetup failed, bad password or options?
cryptsetup: maximum number of tries exceeded
Done.
Begin: Waiting for root file system... ...
-> After waiting some minutes, I get the question whether I want to start the system with degraded setup. However, it does not matter what I answer, as the system cannot start since the encryption has already given up trying. I don't know what it was trying to read as a password, because I did not type anything.

6. Restart computer with only one disk, with "quiet splash bootdegraded=true" boot parameters in /boot/grub/menu.lst
-> Keeps asking "Enter password to unlock the disk (md1_crypt):" even though I write the correct password
-> Now press CTRL+ALT+F1, and see these messages:
Starting up ...
Loading, please wait...
Setting up cryptographic volume md1_crypt (based on /dev/md1)
cryptsetup: cryptsetup failed, bad password or options?

Summary:
The fix does not seem to work, in case you have encrypted your RAID disks. To be more specific: after a long wait it does a...