cannot boot raid1 with only one disk

Bug #120375 reported by peterh
This bug affects 2 people
Affects                    Status        Importance  Assigned to  Milestone
initramfs-tools            Fix Released  Undecided   Unassigned
grub (Ubuntu)              Fix Released  Undecided   Unassigned
  Hardy                    Fix Released  Medium      Unassigned
initramfs-tools (Ubuntu)   Fix Released  Medium      Unassigned
  Hardy                    Fix Released  Medium      Unassigned
mdadm (Ubuntu)             Fix Released  Medium      Unassigned
  Hardy                    Fix Released  Medium      Unassigned

Bug Description

The impact of the bug on users: systems with root on a RAID array will not be able to boot if the array is degraded. Users affected by this will encounter an unusable system after a reboot (say, after a kernel upgrade).

Justification for backporting the fix to the stable release: Hardy is an LTS edition. It is expected that people will continue to use this version and not upgrade to Intrepid. People who have upgraded from Dapper LTS to Hardy LTS are affected too.

TEST CASE
Build a clean system with root on RAID, with Ubuntu 8.04 LTS. Degrade the root array. Reboot and shiver.
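For example, a degraded root array can be produced either by pulling one member disk with the machine powered off, or by failing and removing it in software first (device names below are only an illustration; substitute your own array and partition):

 # assuming root is on /dev/md0 built from sda1 and sdb1
 sudo mdadm /dev/md0 --fail /dev/sdb1
 sudo mdadm /dev/md0 --remove /dev/sdb1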

A discussion of the regression potential of the patch and how users could get inadvertently affected:
- the only users that could be affected, as far as we know, are the ones that have already applied a workaround and altered their system files accordingly.

Binary package hint: mdadm

If I unplug one HD from my RAID1 array I cannot boot successfully, because the RAID is only started if all disks are available, due to:

: "${MD_DEGRADED_ARGS:= --no-degraded}" in /usr/share/initramfs-tools/scripts/local-top/mdadm

my workaround is:

/etc/initramfs-tools/hooks/startdegradedraid

#!/bin/sh
#
# Copyright © 2006 Martin F. Krafft <email address hidden>
# based on the scripts in the initramfs-tools package.
# released under the terms of the Artistic Licence.
#
# $Id: hook 281 2006-12-08 08:14:44Z madduck $
#

set -eu

PREREQ="udev"

prereqs()
{
        echo "$PREREQ"
}

case ${1:-} in
  prereqs)
    prereqs
    exit 0
    ;;
esac

MDADM=$(command -v mdadm 2>/dev/null) || :
[ -x "$MDADM" ] || exit 0

DESTCONFIG="$DESTDIR/conf/md.conf"

echo "MD_DEGRADED_ARGS=' '" >> "$DESTCONFIG"

exit 0

Revision history for this message
peterh (peter-holik) wrote :

Sorry folks, I was too fast.

This does not work if all discs are present; now I understand why MD_DEGRADED_ARGS is included.

my new workaround is to add a boot menu entry in grub

title Ubuntu, kernel 2.6.20-16-generic (raid defect)
root (hd0,1)
kernel /boot/vmlinuz-2.6.20-16-generic root=/dev/md1 ro raid_degraded
initrd /boot/initrd.img-2.6.20-16-generic

and /etc/initramfs-tools/scripts/init-premount/raid_degraded

#!/bin/sh

set -eu

PREREQ="udev"

prereqs()
{
        echo "$PREREQ"
}

case ${1:-} in
  prereqs)
    prereqs
    exit 0
    ;;
  *)
    . /scripts/functions
    ;;
esac

if [ -e /scripts/local-top/md ]; then
  log_warning_msg "old md initialisation script found, getting out of its way..."
  exit 1
fi

MDADM=$(command -v mdadm 2>/dev/null) || :
[ -x "$MDADM" ] || exit 0

if grep -q raid_degraded /proc/cmdline 2>/dev/null; then
  echo "MD_DEGRADED_ARGS=' '" >> /conf/md.conf
fi

exit 0

Now if a disk is defective and I want to start with only one disk, I choose "raid defect" from the boot menu.

Revision history for this message
Peter Haight (peterh-sapros) wrote :

I've got a different way to fix this. After reading Bug #75681, it became clear why they have MD_DEGRADED_ARGS in there. They have that because /scripts/local-top/mdadm gets called every time a device marked as RAID shows up, but they only want mdadm to build the array once all the devices have come up.

So I've stuck a line in the startup script that tries to mount root and if mounting root times out, then we try to run mdadm again, but this time we let it try and run with degraded disks. This way it will still startup automatically in the presence of RAID failures.
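The idea, roughly, is a fallback like the following inside the initramfs "local" script, after the root-wait loop has timed out (this is only a sketch of the approach; the actual patch is attached to the bug):

 if [ ! -e "${ROOT}" ]; then
         # root never appeared; by now udev should have found every disk
         # that is going to show up, so allow a degraded assembly
         /sbin/mdadm --assemble --scan
 fi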

Revision history for this message
Peter Haight (peterh-sapros) wrote :

So, there is something wrong with that patch. Actually it seems to be working great, but when I disconnect a drive to fail it, it boots up immediately instead of trying mdadm after the timeout. So I'm guessing that the mdadm script is getting called without the from-udev parameter somewhere else. But it is working in some sense because the machine boots nicely with one of the RAID drives disconnected, or with both of them properly setup. So there might be some race problem with this patch.

Revision history for this message
brunus (reg-paolobrunello) wrote :

Hey Peter, when I try to apply your patch I get asked where the /scripts/local file is located. I couldn't find it. Could you please specify it better? Thanks,

brunus

Revision history for this message
peterh (peter-holik) wrote : Re: [Bug 120375] Re: cannot boot raid1 with only one disk

> Hey Peter, when I try to apply your patch I get questioned where the
> /scripts/local file is located. I coulnd't find it. Could you please
> specify it better? thanks ,

this patch is not from me - it is from the other Peter in this bug report

cu Peter

Revision history for this message
brunus (reg-paolobrunello) wrote :

Hey Peter Haight,
brunus again. I can't find any mdadm file in /usr/share/initramfs-tools/scripts/local-top so the patch just hangs. I'm using Edubuntu 7.10 btw: is this patch just for Feisty?

thanks,

Paolo

Revision history for this message
Peter Haight (peterh-sapros) wrote :

I've only tried it on Feisty. I don't have a Gutsy machine handy. You have the 'mdadm' package installed, right? If so, then they must have moved stuff around. You could try this:

find / -type f -print0 | xargs -0 grep MD_DEGRADED_ARGS

That will search the whole file system for a file with MD_DEGRADED_ARGS in it, but if they've moved stuff around, the patch probably won't work anyway.

Revision history for this message
Daniel Pittman (daniel-rimspace) wrote :

Peter Haight <email address hidden> writes:

> I've only tried it on fiesty. I don't have a gutsy machine handy. You
> have the 'mdadm' package installed, right? If so, then they must have
> moved stuff around. You could try this:

The model for starting mdadm disks has changed substantially in gutsy;
it is now driven from a udev rule, building the devices as disks are
discovered.

Regards,
        Daniel
--
Daniel Pittman <email address hidden> Phone: 03 9621 2377
Level 4, 10 Queen St, Melbourne Web: http://www.cyber.com.au
Cybersource: Australia's Leading Linux and Open Source Solutions Company

Revision history for this message
brunus (reg-paolobrunello) wrote :

Thanks for the information Daniel,
but it's still unclear to me whether this problem has been solved yet.

brunus

Revision history for this message
Fumihito YOSHIDA (hito) wrote :

Dear brunus,

The problem is not solved in 7.04/7.10.

You have to use "raid_degraded" grub entry with /etc/initramfs-tools/scripts/init-premount/raid_degraded way.
(and update your initramfs, execute "sudo update-initramfs -u")

And, I test with 7.10, mdadm_2.6.2-1ubuntu2_<arch>.deb(7.10's) *does not work* in this way.
When you use 7.10, please down-grade mdadm package.

Use 7.04's mdadm package: "mdadm_2.5.6-7ubuntu5_<arch>.deb".

Revision history for this message
Davias (davias) wrote :

Dear Fumihito,
Thank you for your help with this very annoying bug in Ubuntu (so much for the "new & improved" version)...

I'm running 7.10, tried the above procedure and realized (as you stated) that mdadm_2.6.2-1ubuntu2 DOES NOT work on 7.10 with the grub trick.

3 questions:

1) How do I downgrade to mdadm_2.5.6-7ubuntu5_<arch>.deb as you suggest? In the Synaptic package manager the "force" option is not available...

2) I'm a little bit scared of running a previous version of mdadm on my RAID1 created with 2.6.2: is it safe?

3) Since this is just a workaround and not a solution (the system should automatically start without user selecting a grub option), shall I wait for a new mdadm?

Thanks in advance for yours and everybody's help

Revision history for this message
Fumihito YOSHIDA (hito) wrote :

Dear Davias,

1)
I assume that you use i386; if you use another architecture (e.g. amd64),
please use your arch's package.

- Please download from http://archive.ubuntu.com/ubuntu/pool/main/m/mdadm/
 $ wget http://archive.ubuntu.com/ubuntu/pool/main/m/mdadm/mdadm_2.5.6-7ubuntu5_i386.deb

- Install with the dpkg command (in your gnome-terminal):
 $ sudo dpkg -i mdadm_2.5.6-7ubuntu5_i386.deb
 In this case, we cannot depend on Synaptic.
 Downloading and running dpkg by hand is not a well-mannered procedure, but it is useful.

- And, "hold" this package. This command is important. If you do not set "hold",
 update-manager will upgrade the mdadm package... (and break your effort).
 $ sudo dpkg-hold mdadm

2)
I cannot say for certain about your concern, but I tested it in some cases and
the system was working well.
If you are worried, re-run your mdadm settings with 2.5.6-7ubuntu5.

3)
hmm... This is a difficult question.
The old version of mdadm works well, but I can't understand the reason...
So I can't tell you what you had better do.

Revision history for this message
Davias (davias) wrote :

Dear Fumihito,
first of all, thanks for your fast reply!

1) No, I'm on amd64. I just went to the Ubuntu archive site you suggested and found out that the following exists:

mdadm-udeb_2.6.3+200709292116+4450e59-3ubuntu1_amd64.udeb 13-Dec-2007 01:04 76K
mdadm-udeb_2.6.3+200709292116+4450e59-3ubuntu1_i386.udeb 13-Dec-2007 00:04 73K
mdadm-udeb_2.6.3+200709292116+4450e59-3ubuntu1_powerpc.udeb 13-Dec-2007 01:04 85K
mdadm-udeb_2.6.3+200709292116+4450e59-3ubuntu1_sparc.udeb 13-Dec-2007 01:04 91K

...it seems a fresh new 2.6.3 version of mdadm - maybe it cures the bug ?!?

2) This is the first time I've dealt with mdadm RAID, but from experience on other OSes with SW RAID, I learned (the hard way) that it is safer not to mess around with a driver version different from the one that created the arrays. But if I have no alternatives... I will try.

3) I tried to find details of this new 2.6.3, but found none. Good common sense makes me think that the maintainer of mdadm was aware of the bug and solved it in this new version, making the whole "scripting from grub menu" solution unnecessary...

Suggestions?

Revision history for this message
Davias (davias) wrote :

Searching, I found out that latest release is 2.6.4 - but no option in ubuntu repository (yet?).

ANNOUNCE: mdadm 2.6.4 - A tool for managing Soft RAID under Linux
From: Neil Brown <email address hidden>
To: <email address hidden>
Subject: ANNOUNCE: mdadm 2.6.4 - A tool for managing Soft RAID under Linux
Date: Fri, 19 Oct 2007 16:06:29 +1000
Message-ID: <email address hidden>
Archive-link: Article, Thread

I am pleased to announce the availability of
   mdadm version 2.6.4

It is available at the usual places:
   http://www.cse.unsw.edu.au/~neilb/source/mdadm/

Do any of the following changes apply to our bug?

Changes Prior to 2.6.4 release
    - Make "--create --auto=mdp" work for non-standard device names.
    - Fix restarting of a 'reshape' if it was stopped in the middle.
    - Fix a segfault when using v1 superblock.
    - Make --write-mostly effective when re-adding a device to an array.
    - Various minor fixes

Changes Prior to 2.6.3 release
    - allow --write-behind to be set for --grow.
    - When adding new disk to an array, don't reserve so much bitmap
      space that the disk cannot store the required data. (Needed when
      1.x array was created with older mdadm).
    - When adding a drive that was a little too small, we did not get
      the correct error message.
    - Make sure that if --assemble find an array in the critical region
      of a reshape, and cannot find the critical data to restart the
      reshape, it gives an error message.
    - Fix segfault with '--detail --export' and non-persistent
      superblocks.
    - Various manpage updates.
    - Improved 'raid4' support (--assemble, --monitor)
    - Option parsing fixes w.r.t -a
    - Interpret "--assemble --metadata=1" to allow any version 1.x
      metadata, and be more specific in the "metadata=" message printed
      with --examine --brief
    - Fix spare migration in --monitor.

Changes Prior to 2.6.2 release
    - --fail detached and --remove faulty can be used to fail and
      remove devices that are no longer physically present.
    - --export option for --detail or present information in a format
      that can be processed by udev.
    - fix internal bitmap allocation problems with v1.1, v1.2 metadata.
    - --help now goes to stdout so you can direct it to a pager.
    - Various manpage updates.
    - Make "--grow --add" for linear arrays really work.
    - --auto-detect to trigger in-kernel autodetect.
    - Make return code for "--detail --test" more reliable. Missing
      devices as well as failed devices cause an error.

Revision history for this message
brunus (reg-paolobrunello) wrote :

Davias,
have you tried any of the 2 releases?

brunus

Revision history for this message
Davias (davias) wrote :

Dear brunus,
no, I have not. I was thinking about using Fumihito's procedure with 2.5.6, but then I discovered that 2.6.3 exists in the Ubuntu repository (although the Synaptic package manager does not find any update beyond 2.6.2) and was thinking about installing that. Then I discovered that 2.6.4 is the latest mdadm version, as shown above. I downloaded the source... and stopped.

I do not have enough knowledge to compile & install something as critical as the driver for my RAID (my RAID with data on it...); it's not that I'm scared of it, but this is my production machine and I cannot risk data loss or restore time.

Also, I'm not convinced of the results. If this just leads to another "so-so" procedure like selecting "faulty drive" from the grub menu... I'll wait for a cleaner solution - a stable and safe mdadm release that will make my RAID1 start with only 1 disk WITHOUT me having to do anything, like it should.

Thanks all for your precious thinking.

Dave out

Revision history for this message
dbendlin (diego-bendlin-hotmail) wrote :

Well,

I went back to the roots and installed Debian etch, tried mdadm, and it worked just fine.

Regards,

Diego Bendlin

Revision history for this message
Davias (davias) wrote :

Meaning what, exactly?

You replaced ubuntu with debian?

Revision history for this message
dbendlin (diego-bendlin-hotmail) wrote :

Davias,

What I did is replace an Ubuntu Server 7.10 installation with a Debian etch installation where I set up RAID 1 using two SATA disks, and after disconnecting one of the RAID members the system was still able to boot.

AFAIK Ubuntu is Debian's child, so I wanted to see whether the parent reproduced the error too, just to give you some more information to finally fix this issue in Ubuntu, maybe for 8.04.

Kind Regards,

Diego Bendlin

Revision history for this message
Davias (davias) wrote :

Dear Diego,
thank you for clarifying matters, I'm glad that it works for you.

So now you've got RAID1 running as it is supposed to. Just by changing OS...
Is it that difficult to get it to work on Ubuntu?!?
I mean, it is a serious bug - dependent not on the SW component but on the OS version - and no solution?
I thought Ubuntu was one of the most supported distributions... Do we have to wait for another release?

Revision history for this message
dbendlin (diego-bendlin-hotmail) wrote :

Well,

I'm not a Linux guru, so I don't really understand why this is happening on Ubuntu (I have seen the same error since 7.04). I guess this could be issued as a bug fix to the current release, but that will depend on the Ubuntu development team.

In my opinion Ubuntu is great for desktop usage. I use it as my daily desktop and development workstation, and I really like Ubuntu as a desktop; I haven't found a competitor that is just as easy to install and set up. I know other distros are also great, but normally you need a lot of expertise and configuration time to make them work the way Ubuntu does out of the box.

As for the Ubuntu server release, I must agree that this is a serious bug. For now I wouldn't install an Ubuntu server if it needs to work on top of software RAID (mdadm); after spending 2 weeks trying to make Ubuntu Server work, I finally switched to Debian etch for my server installation.

Kind Regards,

Diego Bendlin

Revision history for this message
Jan Krupa (plnt) wrote :

Hi,

The problem can be worked around by issuing this command in the BusyBox shell when Ubuntu is missing one of the RAID disks:

/sbin/mdadm --assemble --scan

+ reboot

It will remove the missing disk from RAID 1 and allow Ubuntu to boot in degraded mode next time.

I think the root cause of the problem is that mdadm is forced not to start in degraded mode by the "--no-degraded" parameter in /etc/udev/rules.d/85-mdadm.rules. If you remove the "--no-degraded" parameter from the mdadm call in /etc/udev/rules.d/85-mdadm.rules and rerun "sudo update-initramfs -u", Ubuntu doesn't refuse to boot even if one of the disks is missing (after this change, no workarounds are needed). The problem is that it then starts in degraded mode in some cases even if both disks are present.
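In practice, the workaround Jan describes amounts to something like the following (back up the rules file first; the sed expression assumes the option appears literally as " --no-degraded" in the rule):

 sudo cp /etc/udev/rules.d/85-mdadm.rules /etc/udev/rules.d/85-mdadm.rules.orig
 sudo sed -i 's/ --no-degraded//' /etc/udev/rules.d/85-mdadm.rules
 sudo update-initramfs -u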

Tested on Ubuntu/Gutsy.

I appreciate any comments.

Thanks,
Jan

Revision history for this message
Ken (ksemple) wrote :

Hi,

I have been following this post for a couple of weeks and trying to solve this problem as a background job for around a month. So far I have tried many things to overcome this problem and the comment by Plnt seemed to be the most promising.

I tried issuing the --assemble --scan from the BusyBox shell as suggested with no luck, the system still wouldn't boot even though I was able to activate my degraded RAID sets in BusyBox.

I found the same problem when modifying the udev rules. Often the arrays would start degraded even when all the disks were available. I think the solution to this problem lies in modifying the udev rules, maybe we could add some code after the --no-degraded start attempt to start the arrays degraded if they haven't already started.

In my view this is a major problem; there is no point using a RAID1 root disk if you can't boot from a single disk when its mirror fails.

Cheers,
Ken

Revision history for this message
Jan Krupa (plnt) wrote :

Hi Ken,

I wasn't able to boot from the degraded array after running "/sbin/mdadm --assemble --scan" in a few cases when I had other disks in my computer. If I disconnected the other disks and attached just the working one, the system booted without a problem in degraded mode (after running the command mentioned above). I think the reason is that mdadm scans for any RAID devices by their signatures on the disk (because there is no /etc/raidtab accessible) and maybe it finds the signatures in a different order each time.

There is also "--run" parameter in mdadm which can help assembling RAID in degraded mode.

Sorry for non-detailed description but I currently don't have the computer with Ubuntu+RAID1 physically with me so I can't do the tests.

Jan

Revision history for this message
Davias (davias) wrote :

Dear Plnt & Ken,
thanks for providing & trying solutions to this "major problem" as it looks to me too.

But have any of you tried mdadm version 2.6.4, that I found around, as I stated a few posts up?

Regards,
Dave

Revision history for this message
Peter Haight (peterh-sapros) wrote :

I haven't messed around with the problem in Gutsy, but what Daniel said above about the difference between Feisty and Gutsy is not correct. Feisty was also launching mdadm as devices were discovered from udev, and that is exactly the problem. Both are pretty much the same; just the code has moved around some. Unfortunately the box I fixed this on is now in production and I haven't had the chance to set up another test one to port my fix to Gutsy, so I'll explain what's going on, and maybe someone else can fix it. The version of mdadm doesn't have anything to do with this problem. This problem is entirely due to the Ubuntu startup scripts.

What Ubuntu is doing is as each device gets discovered by udev it runs:

mdadm --assemble --scan --no-degraded

If your RAID is made up of say sda1 and sdb1, then when 'sda1' is discovered by Linux, udev runs 'mdadm --assemble --scan --no-degraded'. Mdadm tries to build a RAID device using just 'sda1' because that's the only drive discovered so far. It fails because of the '--no-degraded' flag which tells it to not assemble the RAID unless all of the devices are present. If it didn't include the '--no-degraded' flag, it would assemble the RAID in degraded mode. This would be bad because at this point we don't know if 'sdb1' is missing or it just hasn't been discovered by udev yet.

So, then Linux chugs along and finds 'sdb1', so it calls 'mdadm --assemble --scan --no-degraded' again. This time both parts of the RAID (sda1 and sdb1) are available, so the command succeeds and the RAID device gets assembled.
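For reference, the rule driving this (shipped in /etc/udev/rules.d/85-mdadm.rules, as Jan mentioned above) boils down to something of this shape; the udev match keys shown here are illustrative, not a verbatim copy of the Gutsy rule:

 SUBSYSTEM=="block", ENV{ID_FS_TYPE}=="linux_raid*", \
         RUN+="/sbin/mdadm --assemble --scan --no-degraded"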

This all works great if all the RAID devices are working, but since it always runs mdadm with the '--no-degraded' option, it won't assemble the RAID device if, say, 'sda1' is broken or missing.

My solution was to wait until mounting root failed due to a timeout and then try 'mdadm --assemble --scan' without '--no-degraded' to see if we can assemble a degraded RAID device. Hopefully by the time the root mount has timed out, Linux has discovered all of the disks that it can. This works on my Feisty box, but as I said above, stuff got moved around for Gutsy and I haven't had a chance to build another box to try it out and fix Gutsy. Also, I think my script didn't take into account the scenario where the RAID device isn't root.

Revision history for this message
Ken (ksemple) wrote :

Hi,

Plnt, I am currently re-syncing my RAID set and will then try again with my other drives disconnected. I tried removing my other disks at one stage, but can't recall whether I tried your suggestion in the BusyBox shell at the same time.

Davias, No I haven't tried 2.6.4. Wherever possible I try to use supported Ubuntu packages. This ensures that I have a simple support and upgrade path, and makes management of my machines considerably easier.

Peter; I agree, the problem isn't mdadm, it's the udev scripts (another reason I didn't pursue the mdadm version option). I am new to this, and only in the last week or so have been researching how udev works. How do you detect that mounting root has failed, and how do you hold off running mdadm until this point?

Thanks,
Ken

Revision history for this message
Peter Haight (peterh-sapros) wrote :

I've forgotten about how this works exactly, but if you take a look in:

/usr/share/initramfs-tools/scripts/local

If you look for the comment:
# We've given up, but we'll let the user fix matters if they can

The bit inside the while loop with the panic is the part that gets executed if there is a timeout trying to mount root. Here's what I put in my Feisty version right before that comment.

if [ ! -e "${ROOT}" ]; then
        # Try mdadm in degraded mode in case some drive has failed.
        /scripts/local-top/mdadm
fi

This doesn't work anymore because of the changes to Gutsy. You could just try putting 'mdadm --assemble --scan' there, but that probably won't work. Everything is a little tricky in these scripts because they run before root is mounted, so stuff doesn't always work as you would expect.

Also, you can't just modify these scripts. After you change them, you have to use 'mkinitramfs' to generate the image that contains these scripts that is used during the boot up. I'd put in instructions, but I've forgotten how to do it myself.
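On Ubuntu the easier route is the update-initramfs wrapper around mkinitramfs; running it once after editing the scripts regenerates the image for the currently running kernel:

 # picks up edits under /usr/share/initramfs-tools and /etc/initramfs-tools
 # and rebuilds the matching /boot/initrd.img
 sudo update-initramfs -u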

Revision history for this message
Ken (ksemple) wrote :

Thanks, I'll give it a go. This is still a background task so it may take me a few days.

I discovered that you can't just edit these scripts a couple of weeks ago, when I made some edits and they didn't take. It was Plnt's post which told me how to rebuild the image (sudo update-initramfs -u). This helped me make some progress with udev.

I'll let you know how I go.

Cheers,
Ken

Revision history for this message
Ken (ksemple) wrote :

Plnt, My RAID set finished re-syncing and I tried your suggestion. I removed all drives except one /dev/sda, the first of my RAID set. When I rebooted and was presented with the BusyBox shell I entered "/sbin/mdadm --assemble --scan" and rebooted. Still no luck!

I will persist with modifying the udev scripts as suggested by Peter.

Thanks,
Ken

Revision history for this message
Ken (ksemple) wrote :

Hi,

Thanks everybody for your help. I have now fixed this on my machine using code similar to that suggested by Peter Haight.

Edit "/usr/share/initramfs-tools/scripts/local" and find the following comment "# We've given up, but we'll let the user fix matters if they can".

Just before this comment add the following code:

# The following code was added to allow degraded RAID arrays to start
if [ ! -e "${ROOT}" ] || ! /lib/udev/vol_id "${ROOT}" >/dev/null 2>&1; then
        # Try mdadm and allow degraded arrays to start in case a drive has failed
        log_begin_msg "Attempting to start RAID arrays and allow degraded arrays"
        /sbin/mdadm --assemble --scan
        log_end_msg
fi

Peter's suggestion of just using [ ! -e "${ROOT}" ] as the condition test didn't work, so I used the condition test from the "if" block above this code and it worked fine.

To rebuild the boot image use "sudo update-initramfs -u" as suggested by Plnt. This script calls the "mkinitramfs" script mentioned by Peter and is easier to use, as you don't have to supply the image name and other options.

I have tested this a couple of times, with and without my other drives plugged in, without any problems. Just make sure you have a cron job set up to run "mdadm --monitor --oneshot" so that the system administrator gets an email when an array is running degraded.
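For instance, a crontab entry along these lines would do it (the schedule and file name are only an illustration; --scan makes mdadm check every array listed in mdadm.conf):

 # /etc/cron.d/mdadm-degraded-check: check all arrays hourly and mail root
 0 * * * *  root  /sbin/mdadm --monitor --scan --oneshot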

This has worked on my machine and I think this is a sound solution. Please let me know if it solves your problems also.

This bug has the title "Bug #120375 is not in Ubuntu"; does this mean that it is not considered to be an Ubuntu bug? I believe it is one, since it involves the way the startup scripts are configured (it is definitely not a bug in mdadm). How do we get this escalated to an Ubuntu bug so that it will be solved in future releases?

Good luck,
Ken

Revision history for this message
Johannes Mockenhaupt (mockenh-deactivatedaccount) wrote :

Ken,

Thanks for the patch. I've followed your instructions and tested booting with both physical discs and booting with one disc detached. The second test failed, the system would just stop like it did without the patch. Unfortunately I know next to nothing about udev, initramfs and friends so I can't do much by myself other than test.

Has anybody else tried Ken's solution?

Joe

Revision history for this message
Peter Haight (peterh-sapros) wrote :

Did you wait 3 minutes on the test with one disc detached? I think that by default, there is a three minute timeout before it gets to the place where Ken's patch is.

Revision history for this message
Ken (ksemple) wrote :

Johannes,

I agree with Peter; there is a 180-second delay in the code in an "if" block just before the suggested location for the patch. The comment above this delay says it is to ensure there is enough time for an external device to start, should the root file system be on an external device. I changed this 180-second delay to 30 seconds for my testing.

I think it would be reasonable to put the mdadm patch before this delay as well. The delay is OK if you are watching the system boot, as it will grab your attention and remind you that you may have a problem.
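Note that the delay comes from the ROOTDELAY variable in that script, so it can also be shortened per boot without editing the code by appending rootdelay= to the kernel line in /boot/grub/menu.lst (kernel version and root device below are copied from the earlier menu.lst example; adjust to your own system):

 kernel /boot/vmlinuz-2.6.20-16-generic root=/dev/md1 ro rootdelay=30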

Cheers,
Ken

Revision history for this message
Johannes Mockenhaupt (mockenh-deactivatedaccount) wrote :

I thought I had waited long enough - I had read that comment about the 180s delay - but I hadn't. I just tested it again and booting with a detached disc continues after 3 minutes. Even mail notification worked right away :-) After re-attaching the second drive I was dropped into the BusyBox shell. Just restarting "fixed" that and the machine started and is now resyncing. I think that may be another problem on my machine that's unrelated to this bug. Thanks Ken and Peter for the help!

Joe

Revision history for this message
dbendlin (diego-bendlin-hotmail) wrote :
Download full text (5.6 KiB)

Hello Guys,

Reading all of your posts helped me understand Linux a little bit more; thanks for sharing your knowledge.

As I stated a few posts above, I've tried mdadm on Debian and it works just fine, so I compared the scripts in /usr/share/initramfs-tools/scripts and found out they are not so different from the ones in Ubuntu, but there is still a difference I wanted to share with you.

When you set up mdadm on Debian, an mdadm file is created in /usr/share/initramfs-tools/scripts/local-top (I'll paste its content below). The scripts in this folder get called from the local script file (the one you guys suggest patching). And if you compare the Debian version of the local script with Ubuntu's version, you'll find it is pretty similar. So I guess this could be a better solution, since for example you won't have to wait 180 secs and you don't include "intrusive" code in the local script.

Here goes the debian version of the local script file
[code]
# Local filesystem mounting -*- shell-script -*-

# Parameter: Where to mount the filesystem
mountroot ()
{
        [ "$quiet" != "y" ] && log_begin_msg "Running /scripts/local-top"
        run_scripts /scripts/local-top
        [ "$quiet" != "y" ] && log_end_msg

        # If the root device hasn't shown up yet, give it a little while
        # to deal with removable devices
        if [ ! -e "${ROOT}" ]; then
                log_begin_msg "Waiting for root file system..."

                # Default delay is 180s
                if [ -z "${ROOTDELAY}" ]; then
                        slumber=180
                else
                        slumber=${ROOTDELAY}
                fi
                if [ -x /sbin/usplash_write ]; then
                        /sbin/usplash_write "TIMEOUT ${slumber}" || true
                fi

                slumber=$(( ${slumber} * 10 ))
                while [ ${slumber} -gt 0 ] && [ ! -e "${ROOT}" ]; do
                        /bin/sleep 0.1
                        slumber=$(( ${slumber} - 1 ))
                done

                if [ ${slumber} -gt 0 ]; then
                        log_end_msg 0
                else
                        log_end_msg 1 || true
                fi
                if [ -x /sbin/usplash_write ]; then
                        /sbin/usplash_write "TIMEOUT 15" || true
                fi
        fi

        # We've given up, but we'll let the user fix matters if they can
        while [ ! -e "${ROOT}" ]; do
                echo "  Check root= bootarg cat /proc/cmdline"
                echo "  or missing modules, devices: cat /proc/modules ls /dev"
                panic "ALERT! ${ROOT} does not exist. Dropping to a shell!"
        done

        # Get the root filesystem type if not set
        if [ -z "${ROOTFSTYPE}" ]; then
                eval $(fstype < ${ROOT})
        else
                FSTYPE=${ROOTFSTYPE}
        fi
        if [ "$FSTYPE" = "unknown" ] && [ -x /lib/udev/vol_id ]; then
                FSTYPE=$(/lib/udev/vol_id -t ${ROOT})
                [ -z "$FSTYPE" ] && FSTYPE="unknown"
        fi

        [ "$quiet" != "y" ] && log_begin_msg "Running /scripts/local-premount"
        run_scripts /scripts/local-premount
        [ "$quiet" != "y" ] && log_end_msg

        if [ ${readonly} = y ]; then
                roflag=-r
        else
                roflag=-w
        fi

        # FIXME This has no error checking
        modprobe -q ${FSTYPE}

        # FIXME This has no error checking
        # Mount root
        mount ${roflag} -t ${FSTYPE} ${ROOTFLAGS} ${ROOT} ${rootmnt}

        [ "$quiet" != "y" ] && log_begin_msg "Running /scripts/local-bottom"
        run_scripts /scripts/local-bottom
        [ "$quiet" != "y" ] && log_end_msg
}
[/code]

And here is the content of the mdadm script file located in the /usr/share/initramfs-tools/scripts/local-top folder
[code]
#!/bin/sh
#
# Copyright © 2006 Martin F. Krafft <madduck@debian...

Read more...

Revision history for this message
dbendlin (diego-bendlin-hotmail) wrote :

Sorry for my last long post,

The idea is to leave the local script file as it's shipped by default, and only add the mdadm script file to the /usr/share/initramfs-tools/scripts/local-top directory.

Finally, don't forget to rebuild the boot image by issuing "sudo update-initramfs -u" as suggested by Plnt.

Kind Regards,

Diego Bendlin

Revision history for this message
Ken (ksemple) wrote :

I have had a bit of a look into what Diego has found.

It took me a while, and I nearly gave up until I found this script: "/usr/share/initramfs-tools/hooks/mdadm". This script looks to be a modified version of the one Diego found on his Debian machine. It has the identical header and some similar code.

I would be hesitant to add the Debian script to the local-top folder. If you want a simple file-copy solution and don't want to wait the 180 seconds on the rare occasions you boot with a degraded array, create a script file with my suggested code in it and place it in the "/usr/share/initramfs-tools/scripts/init-top" folder, and it will be called from the "local" script before the time delay.

Ken

Revision history for this message
dbendlin (diego-bendlin-hotmail) wrote :

Guys,

I think this issue can have many workarounds, as has been proven here.
In my opinion, the idea behind sharing our experiences and efforts is to improve Ubuntu so that a future version will handle this automatically for the user. "Linux for human beings", remember?

Analyzing my Debian machines, I have noticed that not only mdadm scripts exist in the local-top folder, but others like lvm as well, so I guess the Debian installer handles this based on the configuration the user chooses at install time. Following this direction, I think a better way to deal with this major issue would be to improve Ubuntu's installer so it can copy a template file to the local-top folder based on the user's input at installation time.

Finally, I hope the Ubuntu developers get to read this topic in order to address this issue for the upcoming 8.04 version of Ubuntu, which is near =)

Kind Regards,

Diego Bendlin

Revision history for this message
Tomas (tvinar-gmail) wrote :

I have run into the same problem, where my disk upgrade path included booting from a temporarily degraded array.

I have tried updating /etc/mdadm/mdadm.conf with the new file system UUIDs
(using mdadm --detail --scan and replacing the corresponding line in /etc/mdadm/mdadm.conf)
and after that running update-initramfs.

Now the system seems to be booting without any problems.

I have also used the -e 0.90 parameter in mdadm when assembling the degraded array (to create an older version of the superblock that is recognized by the kernel), though I am not sure whether this had anything to do with the outcome.
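Roughly, the mdadm.conf refresh described above looks like this (back up the file first; the redirect appends new ARRAY lines, so delete the stale ones by hand afterwards):

 sudo cp /etc/mdadm/mdadm.conf /etc/mdadm/mdadm.conf.orig
 sudo sh -c 'mdadm --detail --scan >> /etc/mdadm/mdadm.conf'
 # edit /etc/mdadm/mdadm.conf, remove the old ARRAY lines, then:
 sudo update-initramfs -u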

Jens (jens.timmerman)
Changed in mdadm:
status: New → Confirmed
Changed in initramfs-tools:
assignee: nobody → kirkland
status: New → Confirmed
Changed in mdadm:
assignee: nobody → kirkland
importance: Undecided → Medium
milestone: none → ubuntu-8.10
Changed in mdadm:
status: Confirmed → Triaged
Kees Cook (kees)
Changed in initramfs-tools:
status: Confirmed → In Progress
Changed in mdadm:
status: Triaged → In Progress
Changed in initramfs-tools:
assignee: nobody → kirkland
status: New → In Progress
assignee: kirkland → nobody
status: In Progress → Confirmed
importance: Undecided → Medium
milestone: none → ubuntu-8.10
Changed in initramfs-tools:
status: In Progress → Fix Released
Changed in mdadm:
status: In Progress → Fix Released
Ace Suares (acesuares)
description: updated
(69 comments hidden)
Revision history for this message
no!chance (ralf-fehlau) wrote :

Full ack to Miguel. Why is somebody using RAID1 on his system? Does he want to have trouble if a disk fails, or does he want a running system and to be informed about a hardware failure? Ubuntu RAID is useless! The "conservative" mode is useless! If I had a 4-disk RAID1 system or a RAID5 with a spare disk and ONE fails, is it useful to stop the boot process???

I have the same issue on my new system. First, I had one disk and decided to upgrade to software RAID. With the second HD, I created a degraded RAID1, copied the contents of the first disk to the second, and wanted to add the first disk to the RAID. Ubuntu drops me into the shell. :-( And in spite of booting with a live CD and adding the first disk to the RAID, the system refused to boot.

Because it is a new system without any data on it, I will do a new installation with debian or suse. For my home server, I will see.

Revision history for this message
no!chance (ralf-fehlau) wrote :

Another thing to mention: which systems are using RAID? .... Right! ... Servers! Such systems are usually maintained remotely and rebooted through an ssh connection. This message and the question below are very, very useful. :-(

> If you abort now, you will be provided with a recovery shell.
> Do you wish to boot the degraded RAID? [y/N]:

The last you will see from this server is ....

"rebooting now"

Revision history for this message
Stas Sușcov (sushkov) wrote :

Can somebody explain: is this bug fixed in the Hardy packages?

A lot of comments, and not a single clear report!!!

If it is not fixed... Is there a patch for the local script in initramfs-tools and mdadm, or is there any rebuilt package with the fixes?

Currently in Hardy i got:
~$ apt-cache policy mdadm
mdadm:
  Installed: 2.6.3+200709292116+4450e59-3ubuntu3
  Candidate: 2.6.3+200709292116+4450e59-3ubuntu3
  Version table:
 *** 2.6.3+200709292116+4450e59-3ubuntu3 0
        500 http://ro.archive.ubuntu.com hardy/main Packages
        100 /var/lib/dpkg/status
stas@baikonur:~$ apt-cache policy initramfs-tools
initramfs-tools:
  Installed: 0.85eubuntu39.2
  Candidate: 0.85eubuntu39.2
  Version table:
 *** 0.85eubuntu39.2 0
        500 http://ro.archive.ubuntu.com hardy-updates/main Packages
        100 /var/lib/dpkg/status
     0.85eubuntu36 0
        500 http://ro.archive.ubuntu.com hardy/main Packages
~$ apt-cache policy mdadm
mdadm:
  Installed: 2.6.3+200709292116+4450e59-3ubuntu3
  Candidate: 2.6.3+200709292116+4450e59-3ubuntu3
  Version table:
 *** 2.6.3+200709292116+4450e59-3ubuntu3 0
        500 http://ro.archive.ubuntu.com hardy/main Packages
        100 /var/lib/dpkg/status

Thank you in advance!

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

This is fixed in Intrepid, not in Hardy.

:-Dustin

Revision history for this message
Ace Suares (acesuares) wrote :

On Thursday 09 October 2008, Stanislav Sushkov wrote:
> Can somebody explain, is this bug fixed in hardy packages?
>
> A lot of comments, and not a single clear report!!!
>
> If it is not fixed... Is there a patch for local script in initramfs-
> tools and mdadm or is there any rebuilt package with the fixes?
>
> Currently in Hardy i got:
> ~$ apt-cache policy mdadm
> mdadm:
> Installed: 2.6.3+200709292116+4450e59-3ubuntu3
> Candidate: 2.6.3+200709292116+4450e59-3ubuntu3
> Version table:
> *** 2.6.3+200709292116+4450e59-3ubuntu3 0
> 500 http://ro.archive.ubuntu.com hardy/main Packages
> 100 /var/lib/dpkg/status
> stas@baikonur:~$ apt-cache policy initramfs-tools
> initramfs-tools:
> Installed: 0.85eubuntu39.2
> Candidate: 0.85eubuntu39.2
> Version table:
> *** 0.85eubuntu39.2 0
> 500 http://ro.archive.ubuntu.com hardy-updates/main Packages
> 100 /var/lib/dpkg/status
> 0.85eubuntu36 0
> 500 http://ro.archive.ubuntu.com hardy/main Packages
> ~$ apt-cache policy mdadm
> mdadm:
> Installed: 2.6.3+200709292116+4450e59-3ubuntu3
> Candidate: 2.6.3+200709292116+4450e59-3ubuntu3
> Version table:
> *** 2.6.3+200709292116+4450e59-3ubuntu3 0
> 500 http://ro.archive.ubuntu.com hardy/main Packages
> 100 /var/lib/dpkg/status
>
> Thank you in advance!

It's not fixed in hardy LTS, which is very strange...

ace

Revision history for this message
Stas Sușcov (sushkov) wrote :

Can you point to a wiki page or a comment in this thread where I'll find
a solution for hardy?

Or I should install Intrepid packages?

Thank you.

On Thu, 2008-10-09 at 15:14 +0000, Dustin Kirkland wrote:
> This is fixed in Intrepid, not in Hardy.
>
> :-Dustin
>
--
() Campania Panglicii în ASCII
/\ http://stas.nerd.ro/ascii/

Revision history for this message
RpR (tom-lecluse) wrote :

Dustin, I love the work that you've put into this, but I have to stand by Ace Suares: for an LTS version it should be fixed.
It prevents me from using Ubuntu for my servers which use software RAID. The ones with hardware RAID could use Ubuntu.

But I'm sticking with Debian for the moment because of this.
If Hardy weren't an LTS version I would understand if you just said upgrade to ...

Revision history for this message
Stas Sușcov (sushkov) wrote :

Just post a file with patch which works for hardy, and that's all folks :)

Revision history for this message
Ace Suares (acesuares) wrote :

On Thursday 09 October 2008, Stanislav Sushkov wrote:
> Just post a file with patch which works for hardy, and that's all folks
>
> :)

No it's not.

There is a procedure for that, so it will be updated automatically. I
followed that procedure, but at some point in it the powers of 'normal'
people are insufficient. We have to find an overlord who will sponsor the
process and move it forward. Even the developer who made the patch for
Ibex cannot move this forward on his own.

Where are the Power Puff Girls when you need them ?

Revision history for this message
Stas Sușcov (sushkov) wrote :

You mean this procedure?

It didn't work for me.
I mean I patched my "local" manually, but it broke my init image
in /boot after update-initramfs...

Maybe the patch really does work, but no one in this thread has reported it
as a solution...

On Thu, 2008-10-09 at 20:37 +0000, Ace Suares wrote:
> On Thursday 09 October 2008, Stanislav Sushkov wrote:
> > Just post a file with patch which works for hardy, and that's all folks
> >
> > :)
>
> No it's not.
>
> There is a procedure for that, so it will be updated automatically. That
> procedure I followed but at some point in the procedure, powers
> of 'normal' people are insufficient. We have to find an overlord who will
> sponsor the process and move it forward. Even the developer who made the
> patch for Ibex, can not move this forward on his own.
>
> Where are the Power Puff Girls when you need them ?
>
--
() Campania Panglicii în ASCII
/\ http://stas.nerd.ro/ascii/

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Fixing Hardy would mean, at the very least:
 1) porting the patches for:
   * mdadm
   * initramfs-tools
   * grub
   * grub-installer
 2) Rebuilding the installation ISO's.
 3) Obsessively regression testing the new install media.

After Intrepid releases on October 30, 2008, I will spend a few cycles
considering the port to Hardy 8.04.2. No guarantees, but rest assured
that I am a Canonical/Ubuntu Server developer, who runs Hardy+RAID on
my own systems, and have plenty of motivation to see this fixed.

If anyone wants to volunteer to do (1), that would help move things
along. And I certainly hope at the very least some of you are willing
to help with (3).

:-Dustin

p.s. Please understand that your sarcasm is completely unappreciated.

Revision history for this message
Stas Sușcov (sushkov) wrote :

Dustin,
what about those packages for ibex. If I update my hardy with those, so
I risk serious troubles?

Or it is not possible cause of difference between kernels?

Someone did this before?

On Thu, 2008-10-09 at 21:21 +0000, Dustin Kirkland wrote:
> Dustin
--
() Campania Panglicii în ASCII
/\ http://stas.nerd.ro/ascii/

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Stanislav-

First, the Intrepid packages won't work, due to toolchain (glibc,
klibc) differences. I just tested out of curiosity in a virtual
machine.

Second, under no circumstances would I recommend this as an acceptable
thing to do. If you are beholden to running Hardy, I presume that's
because Hardy is supported as an LTS. Once you start changing the
Hardy LTS packages and replacing them with Intrepid packages, you're
no longer in a supported configuration. Especially when we're talking
about things that are so fundamental to booting your system.

If you would be willing to upgrade these four key packages to
Intrepid, I'd say you would be much better served upgrading to
Intrepid across the board.

:-Dustin

Revision history for this message
Ace Suares (acesuares) wrote :

Dustin,

I am glad you may be spending some time on this bug.

You mention rebuilding ISOs. But why can't it just be an upgrade to the existing installations?
I mean, on an existing system, all we need to do is upgrade?

Also, I am not being sarcastic at all when I say that I cannot understand why there needs to be so much regression testing for the patch that removes the bug, given that introducing the bug was possible in the first place. The bug is not present in Debian, and was not present in Dapper.

Anyway, I am going to install the workaround on all my affected machines because I cannot wait that long. (But then I will run into some trouble when an update finally comes around...)

And I am not going to advise using Ubuntu for servers that use software RAID anymore. I am really disappointed in the way this is going. I am happy it will be fixed in Ibex though. Just keep smiling...

I am also unsubscribing from this bug. I feel that I am becoming unconstructive.

Revision history for this message
Ross Becker (ross-becker) wrote :

Dustin,
I've been doing sysadmin work for 15 years. I chose to try out Ubuntu
for a home RAID server project, and loaded up Hardy as it was an LTS
edition. In my first day working with Ubuntu, I ran into this bug, a bug
where the version of mdadm on Hardy (pretty well out of date) was unable
to resume a RAID reshape operation, and a bug where the ext2resize tools
incorrectly detect the RAID stride. All of these bugs are in BASIC
functionality of the storage management tools and should never have made
it through any sort of QA.

I reported all of them as bugs, and this is the ONLY one which has even
received a developer response after 2 months.

For a Long Term Support edition, that's shameful. Not only that, but for
a bug which in a COMMON situation can prevent boot, your response as to
whether a fix will be backported is "I'll spend a few cycles considering
backporting it".

Your lack of understanding for someone's sarcasm is completely unjustified,
and the level of developer/bugfix support I'm seeing for Ubuntu is
pathetic. With this level of support, using Ubuntu for any sort of
corporate server application would be a really poor decision.

Revision history for this message
brunus (reg-paolobrunello) wrote :

Hello,
I second RpR's post word by word: it's sincerely hard to accept that such a serious bug, not present until 7.04, is still open 14 months and 2 releases later, one of them being an LTS. And this is even more true for the - arguably so called - server edition: it's like putting a TIR truck with no double tyres on the market, and doing it twice.

Dustin,
could you please explain how you run Hardy+RAID: it is still not clear to me after reading the whole thread, sorry.

Thanks,

brunus

Revision history for this message
Steve Langasek (vorlon) wrote :

Unsubscribing ubuntu-sru. Please do not subscribe the SRU team to bugs that don't actually include proposed fixes to previous releases. a) this is not the documented procedure for SRU fixes, b) the SRU team has other things to do that actually benefit Ubuntu users, instead of following a "bug report" that consists of lambasting Ubuntu for a bug that the relevant developers have already agreed should receive attention.

With regard to the last, I've accepted the nomination for hardy based on Dustin's statement that he'll work on this for 8.04.2.

Ross, as for the other bugs you mentioned: the ext2resize package is not part of Ubuntu main, which means it's not part of the set of software supported by Canonical. It's a third-party tool that ext3 upstream has repeatedly recommended against using. The package from Debian is included in the universe repository in the hope that it's useful to someone, but there is no one at all in Ubuntu tending that package - there's no reason to think it was subjected to any QA for it to fail. If you believe this problem makes the package unusable, then I'm happy to escalate bug #256669 and have the ext2resize package removed from subsequent Ubuntu releases.

Changed in initramfs-tools:
status: New → Confirmed
Changed in mdadm:
importance: Undecided → Medium
status: New → Confirmed
Changed in initramfs-tools:
importance: Undecided → Medium
Revision history for this message
agent 8131 (agent-8131) wrote :
Download full text (4.0 KiB)

I think it's time for some tough love. No one would be taking the time to comment on this if they didn't want to see Ubuntu Server be a better product. I personally feel this is a significant issue because it demonstrates Canonical's interest in supporting an LTS release and seriousness about becoming a presence in the server market. I know it can be difficult when people are lambasting a product you've put a lot of time into, believe me, I got more than my fair share of that this week. However, sometimes you have to step back and realize that your product quality has been lower than many people expect and you have to either step up or risk losing more customers. Make no mistake, this bug has lost a lot of sysadmins, some of whom had to fight hard to get Ubuntu onto servers in their workplaces in the first place. I was one of them, and I know a few more personally. I pitched that it made sense to have Ubuntu on the Server because of the benefits of the LTS release, including the longer support time and the large community that contributes to Ubuntu, therefore leading to more bugs being found and resolved. However, I doubt I will be able to propose Ubuntu again until version 10.04. If this bug were to be resolved in 8.04.2 I might at least start pitching it next year, barring any other bugs of this level of severity.

To respond to Steve Langasek, while I understand that a lot of these emails are not terribly useful, this bug is exactly what the SRU team is supposed to be addressing. There have been many proposed fixes in this thread and 8 patches uploaded. Do any of them work correctly? Well that is the question, but it's inaccurate to state that this bug does not contain proposed fixes. Furthermore this fits the SRU criteria of being a high impact bug representing a severe regression from earlier versions and one which also may lead to a loss of user data. When the average user is confronted with the initramfs shell during a drive failure I suspect they have the potential to do serious damage to their file systems in an attempt to fix the problem.

I don't feel it's possible for me to overstate the severity of this bug and how badly sysadmins are going to react when they encounter it or read about it. It is certainly not the kind of bug one can dismiss in an LTS release if LTS is to say anything about quality, and hence suggestions to upgrade to Intrepid, while acceptable to a home user building a server, are not going to be acceptable in the workplace. If this is a market segment that Ubuntu Server caters to, then this issue needs to be addressed. If on the other hand Ubuntu Server is meant merely for enthusiasts with their home file servers, then the solution should be to make sure that goal is clearly articulated.

To keep us focused on the work at hand and to avail myself of the opportunity that having this number of people working to fix this bug represents, I'll say that I've tried a number of solutions on this page but none have been satisfactory. I tried changing the udev rule as suggested above (see Plnt 2007-12-25) but got the same results that have been reported: I can get the system to boot any time the ...

Read more...

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

I'm un-subscribing from this bug as well.

Anyone who believes that berating the developer who has (finally)
fixed this bug in the current release of Ubuntu, and offered to fix it
in the LTS release, is constructive desperately needs to re-read the
Ubuntu Code of Conduct.
 * http://www.ubuntu.com/community/conduct

If you feel that this issue is urgently blocking your organization
from adopting Ubuntu LTS at an enterprise level, I remind you that
contracting commercial support from Canonical or one of the numerous
Ubuntu partners around the globe is always an option.
 * http://www.ubuntu.com/support/paid
 * http://webapps.ubuntu.com/partners/

Dustin

Changed in initramfs-tools:
assignee: kirkland → nobody
Changed in mdadm:
assignee: kirkland → nobody
Revision history for this message
Stas Sușcov (sushkov) wrote :

Dustin, Steve, thank you for your support.
I'll be waiting for the updates on my Hardy, and praying not to get into a
situation where this bug shows up ugly on my servers.

I remain subscribed, as I believe you guys will get the job done soon.

Good luck!

On Fri, 2008-10-10 at 06:01 +0000, Dustin Kirkland wrote:
> I'm un-subscribing from this bug as well.
>
> Anyone who believes that berating the developer who has (finally)
> fixed this bug in the current release of Ubuntu, and offered to fix it
> in the LTS release, is constructive desperately needs to re-read the
> Ubuntu Code of Conduct.
> * http://www.ubuntu.com/community/conduct
>
> If you feel that this issue is urgently blocking your organization
> from adopting Ubuntu LTS at an enterprise level, I remind you that
> contracting commercial support from Canonical or one of the numerous
> Ubuntu partners around the globe is always an option.
> * http://www.ubuntu.com/support/paid
> * http://webapps.ubuntu.com/partners/
>
> Dustin
>
> ** Changed in: initramfs-tools (Ubuntu)
> Assignee: Dustin Kirkland (kirkland) => (unassigned)
>
> ** Changed in: mdadm (Ubuntu)
> Assignee: Dustin Kirkland (kirkland) => (unassigned)
>
--
() Campania Panglicii în ASCII
/\ http://stas.nerd.ro/ascii/

Revision history for this message
agent 8131 (agent-8131) wrote :

No one should be taking any of this personally or feel berated. People should feel free to be honest about their experiences without fear that doing so will drive away people that could be of help in resolving the issues. I've been keeping my mouth shut on this bug for a couple of months and I apologize for being long winded when I finally took the time to post my experiences.

Dustin Kirkland, you did not offer to fix this problem in the LTS release. You stated that you would consider it. And that's great, but without any commitment to fix the problem in 8.04 it would be nice to actually be able to offer up a simple solution to the people that come across this page when trying to resolve this bug.

I will not unsubscribe from this bug because I know that people will be bit by it again, come to me for help, and I wish to have a good solution for them when they do. I re-read the Code of Conduct and note that being collaborative seems to be at odds with disengaging when one finds the process unpleasant. I certainly respect the right of people to do so but I feel it's a loss to actually making progress on this bug. Granted, I've only been subscribed for 2 months; if I had been subscribed for a year I acknowledge I might feel differently.

Since people seem to feel that this hasn't been a friendly exchange I will be happy to buy a drink of choice to:
1) Anyone that comes up with a good solution to this bug for Ubuntu 8.04 and 8.04.1.
2) Anyone who works to get this bug fixed in 8.04.2.
3) Dustin Kirkland and anyone else who makes sure this bug never appears in 8.10.

Revision history for this message
Stas Sușcov (sushkov) wrote :

Subscribing to what agent 8131 said.
Now there are two drinks to get for the guy who helps us!

On Fri, 2008-10-10 at 07:13 +0000, agent 8131 wrote:
> agent 8131
--
() Campania Panglicii în ASCII
/\ http://stas.nerd.ro/ascii/

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Now that the Intrepid development cycle has wound down, I was finally able to circle back to this.

For those of you interested in seeing this backported to Hardy, see Bug #290885.

I have test packages available in my PPA--standard PPA disclaimers apply. Please refer to Bug #290885 for test instructions.

:-Dustin

Revision history for this message
Nick Barcet (nijaba) wrote :

Has anyone tested the packages in Dustin's PPA? (Are the beers coming?)

It would be really nice to know before we place those in -proposed to follow the proper SRU process for hardy.

Changed in initramfs-tools:
assignee: nobody → kirkland
status: Confirmed → In Progress
Changed in mdadm:
assignee: nobody → kirkland
milestone: none → ubuntu-8.04.2
status: Confirmed → In Progress
Changed in initramfs-tools:
milestone: none → ubuntu-8.04.2
Changed in grub:
assignee: nobody → kirkland
importance: Undecided → Medium
milestone: none → ubuntu-8.04.2
status: New → In Progress
status: New → Fix Released
Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Thanks, Nick.

Please respond with any test results in Bug #290885.

Bug #120375 is hereby reserved for wailing, moaning, fussing, cursing, complaining, lamenting, murmuring, regretting, repining, bewailing, deploring, weeping, mourning, protesting, charging, accusing, disapproving, grumbling, fretting, whining, peeving, quarreling, resenting, dissenting, discontenting, malcontenting, bellyaching, and non-constructive criticisms :-)

:-Dustin

Changed in grub:
status: In Progress → Fix Released
Changed in initramfs-tools:
status: In Progress → Fix Released
Changed in mdadm:
status: In Progress → Fix Released
Revision history for this message
Tapani Rantakokko (trantako) wrote :

It seems that the issue is now fixed in Intrepid, and also backported to Hardy 8.04.2, which was released a few days ago. However, it is unclear to me whether I need to reinstall from the 8.04.2 distribution media, or whether I can fix an existing Hardy installation via software updates.

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

You should not need to reinstall.

To solve this, you need to:
 a) live upgrade all packages (specifically, you need to upgrade grub, mdadm, and initramfs-tools)
 b) install grub to the RAID (for instance, if /dev/md0 provides your /boot directory, you can do "grub-install /dev/md0"), as in the sketch below
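
For example (a minimal sketch; /dev/md0 as the array holding /boot and the use of apt-get are assumptions, adjust them to your own layout):

sudo apt-get update
sudo apt-get install grub mdadm initramfs-tools   # pulls in the fixed versions of the three packages
sudo grub-install /dev/md0                        # re-install grub to the array that provides /boot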

Cheers,
:-Dustin

Revision history for this message
Tapani Rantakokko (trantako) wrote :

Dustin, thank you for your quick answer and tips.

It took me a while to test it, as I have an encrypted RAID 1 array with LVM, and things are not that straightforward with that setup.

So far I have been using one of the tricks described earlier in this thread (i.e. edit /etc/udev/rules.d/85-mdadm.rules to change "--no-degraded" to "-R", then run "sudo update-initramfs -u -k all"). It allows me to boot with only one drive, but has the annoying side effect that one of the partitions often starts in degraded mode, even when both drives are in fact present and working.

I wanted to get rid of that problem, so I did this (roughly as sketched below):
- revert 85-mdadm.rules to how it used to be, i.e. --no-degraded
- sudo update-initramfs -u -k all
- cat /proc/mdstat and check that all drives are online and in sync
- upgrade all packages
- re-install grub to both drives
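
Roughly the same sequence as commands (a sketch; the drive names /dev/sda and /dev/sdb are assumptions, use whatever disks make up your array):

# 1) edit /etc/udev/rules.d/85-mdadm.rules and change "-R" back to "--no-degraded"
sudo nano /etc/udev/rules.d/85-mdadm.rules
# 2) rebuild the initramfs so early boot uses the reverted rule
sudo update-initramfs -u -k all
# 3) confirm every array is online and in sync before going further
cat /proc/mdstat
# 4) upgrade to the fixed packages, then put grub back on both disks
sudo apt-get update && sudo apt-get upgrade
sudo grub-install /dev/sda
sudo grub-install /dev/sdb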

These are my test results:
1. Restart computer with both disks
-> everything works OK

2. Restart computer with only one disk
-> Keeps asking "Enter password to unlock the disk (md1_crypt):" even though I write the correct password

3. Restart computer again with both disks
-> everything works OK

So at first it seemed that the fix does not work at all, as Ubuntu starts only when both disks are present. Then I ran some more tests:

4. Restart computer with only one disk
-> Keeps asking "Enter password to unlock the disk (md1_crypt):" even though I write the correct password
-> Now press CTRL+ALT+F1, and see these messages:
Starting up ...
Loading, please wait...
Setting up cryptographic volume md1_crypt (based on /dev/md1)
cryptsetup: cryptsetup failed, bad password or options?
cryptsetup: cryptsetup failed, bad password or options?
-> After waiting some minutes, I got dropped into the busybox
-> Something seems to be going wrong with encryption

5. Restart computer with only one disk, without "quiet splash" boot parameters in /boot/grub/menu.lst
-> Got these messages:
Command failed: Not a block device
cryptsetup: cryptsetup failed, bad password or options?
... other stuff ...
Command failed: Not a block device
cryptsetup: cryptsetup failed, bad password or options?
Command failed: Not a block device
cryptsetup: cryptsetup failed, bad password or options?
cryptsetup: maximum number of tries exceeded
Done.
Begin: Waiting for root file system... ...
-> After waiting a few minutes, I get the question of whether I want to start the system with a degraded array. However, it does not matter what I answer, as the system cannot start: the encryption has already given up trying. I don't know what it was trying to read as a password, because I did not type anything.

6. Restart computer with only one disk, with "quiet splash bootdegraded=true" boot parameters in /boot/grub/menu.lst
-> Keeps asking "Enter password to unlock the disk (md1_crypt):" even though I write the correct password
-> Now press CTRL+ALT+F1, and see these messages:
Starting up ...
Loading, please wait...
Setting up cryptographic volume md1_crypt (based on /dev/md1)
cryptsetup: cryptsetup failed, bad password or options?

Summary:
The fix does not seem to work if you have encrypted your RAID disks. To be more specific: after a long wait it does a...

Read more...

Revision history for this message
Tapani Rantakokko (trantako) wrote :

Ok, I'm answering myself: there is a workaround for getting it to work with LUKS encryption. You can run "sudo dpkg-reconfigure mdadm" and enable automatic startup with a degraded RAID array if you want, or watch the screen and be quick enough to answer "Yes" when asked whether to start degraded. Nevertheless, you need to wait again until you're dropped to BusyBox. Then do this:

# to enter the passphrase. md1 and the md1_crypt are the same values
# you had to put in /target/etc/crypttab at the end of the install
cryptsetup luksOpen /dev/md1 md1_crypt

# (type your LUKS password, as requested)

# continue to boot!
<hit CTRL+D>

I found the instructions from here: http://ubuntuforums.org/archive/index.php/t-524513.html

Now, if only someone could give a hint on how to make this automatic, so that there would be no need to type anything. Waiting a few minutes is OK, though.

Nevertheless, I'm pretty happy now that I can keep the "--no-degraded" parameter in 85-mdadm.rules, yet still get the system up in case a disk fails. In the rare case of an actual disk failure, typing a one-liner can be tolerated. Thank you to everyone who has worked on this issue and helped get it solved in Hardy.

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

Tapani-

I very much appreciate the detail with which you have constructed your report, as well as your follow-up, which provides a hint as to how one might fix this issue. Thank you! But you are describing a different issue, which is deeper and more involved.

Please, please, please open a new bug report against mdadm ;-)

:-Dustin

Revision history for this message
Tapani Rantakokko (trantako) wrote :

Degraded RAID 1 array with encryption, Bug #324997.

Revision history for this message
kilroy (channelsconf) wrote :

Is this bug still present? I've installed a fresh 8.04.2 amd64 in VirtualBox 2.1.2 with three SATA hard drives. When I switch off a drive, the kernel loads, but at the end the RAID arrays are stopped and the system is unusable.

root@ubuntu:~# dpkg-reconfigure mdadm
"Do you want to boot your system if your RAID becomes degraded?" -> "YES"

root@ubuntu:~# uname -a
Linux ubuntu 2.6.24-23-server #1 SMP Mon Jan 26 01:36:05 UTC 2009 x86_64 GNU/Linux

root@ubuntu:~# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=8.04
DISTRIB_CODENAME=hardy
DISTRIB_DESCRIPTION="Ubuntu 8.04.2"

root@ubuntu:~# dpkg -l mdadm grub initramfs-tools
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Installed/Config-f/Unpacked/Failed-cfg/Half-inst/t-aWait/T-pend
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name Version Description
+++-===========================-===========================-======================================================================
ii grub 0.97-29ubuntu21.1 GRand Unified Bootloader
ii initramfs-tools 0.85eubuntu39.3 tools for generating an initramfs
ii mdadm 2.6.3+200709292116+4450e59- tool to administer Linux MD arrays (software RAID)
root@ubuntu:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sda5[0] sdc5[2] sdb5[1]
      5124480 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md2 : active raid5 sda2[0] sdc2[2] sdb2[1]
      995840 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md0 : active raid1 sda1[0] sdc1[2] sdb1[1]
      80192 blocks [3/3] [UUU]

root@ubuntu:~# mount | grep ^/
/dev/md1 on / type ext3 (rw,relatime,errors=remount-ro)
/sys on /sys type sysfs (rw,noexec,nosuid,nodev)
/dev/md0 on /boot type ext2 (rw,relatime)

This would be a real show stopper for the 8.04 LTS...

Revision history for this message
Tobias McNulty (tmcnulty1982) wrote :

Yeah, isn't 8.04 an LTS edition?

My 8.04 Desktop machine suffers from this problem. I think I got it fixed by adding the command to assemble the arrays in degraded mode to the initramfs 'local' script (roughly as sketched below), but finding out what needed to be done was a real PITA and I'm sure others will continue to hit this issue.
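
For reference, a minimal sketch of that kind of hook (the file name, the PREREQ value and the use of "mdadm --assemble --scan --run" are assumptions, not the exact change described above):

#!/bin/sh
# /etc/initramfs-tools/scripts/local-top/force-degraded-raid (hypothetical name)
set -eu

PREREQ="mdadm"

prereqs()
{
        echo "$PREREQ"
}

case ${1:-} in
  prereqs)
    prereqs
    exit 0
    ;;
esac

# Try to start any arrays that did not come up with all of their members,
# so a degraded RAID1 root can still be mounted.
mdadm --assemble --scan --run || true

exit 0

Afterwards the initramfs has to be regenerated ("sudo update-initramfs -u") so the hook is actually included.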

Can't we get a fix in 8.04?

Revision history for this message
Dustin Kirkland  (kirkland) wrote :

This bug has been fixed, and backported to hardy-updates.

:-Dustin

Changed in initramfs-tools:
status: Confirmed → Fix Released
Changed in grub (Ubuntu Hardy):
assignee: Dustin Kirkland (kirkland) → nobody
Changed in initramfs-tools (Ubuntu Hardy):
assignee: Dustin Kirkland (kirkland) → nobody
Changed in mdadm (Ubuntu Hardy):
assignee: Dustin Kirkland (kirkland) → nobody
Revision history for this message
Hosed (liveonaware) wrote :

Hi, I have Ubuntu 9.04 with two 36 GB Seagate Cheetah drives in a RAID 1 (mirroring) setup. When I boot Ubuntu 9.04 with one of the hard disks unplugged, it fails to boot and drops me to an initramfs shell. What's the deal with that? Does it mean that RAID 1 is NOT WORKING with Ubuntu 9.04?

Any help is greatly appreciated. Thanks.

Revision history for this message
Emanuele Olivetti (emanuele-relativita) wrote :

Hosed wrote:
> Hi, I have Ubuntu 9.04. I have Two (2) 36gb Seagate Cheetah with Raid 1
> - Mirroring Setup. When I boot to Ubuntu 9.04 with One (1) of the hard
> disks unplugged, it fails to boot, it drops me to a shell with
> initramfs. What's the deal with that? Does it mean that Raid1 is NOT
> WORKING with Ubuntu 9.04?
>
> Any help is greatly appreciated. Thanks.
>

Hi,

I had a similar issue on Friday: a disk failed on a RAID 1 Ubuntu 9.10 system. After shutdown I removed the failed disk and added a new one. While booting I got an initramfs shell, as in your case.

In my case, the message on screen said that the system had asked whether to boot with a degraded array (y/N), and that I had not answered the question quickly enough, so after the timeout it dropped me to the initramfs shell. To be honest, I never saw that question, because the splash screen covered it.

So I just rebooted, pressed 'y' a few times once the splash screen appeared, and then the degraded boot started as expected. After that I had to manually partition the new disk (I have two different RAID 1 arrays, one for /boot and one for /) and add each partition to its array:

sfdisk -d /dev/sda | sfdisk /dev/sdb # partition sdb as sda
mdadm /dev/md0 -a /dev/sdb1 # add sdb1 to the raid1 system md0 (/boot)
mdadm /dev/md1 -a /dev/sdb2 # add sdb2 to the raid1 system md1 (/)

WARNING: this worked for my setup, so don't take these as general instructions!
If you have a single mirrored array, then you may only need to add sdb1.

After this the new disk sdb started syncing as expected.
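
A quick way to follow the rebuild (not from the original comment, just a common check):

watch -n 5 cat /proc/mdstat   # the resync progress shows up as a percentage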

HTH,

Emanuele

Revision history for this message
Hosed (liveonaware) wrote :

Thank you Emanuele. I haven't actually tried what you're suggesting yet, but I will definitely do it if I have time, or if my hard disk crashes for real. I have a question: I subscribed to this bug, but when I look at my ACCOUNT -> BUGS -> LIST SUBSCRIBED BUGS, it's not there, and I really want to be updated on this! Is this a bug? Help! :)

Revision history for this message
Francis Mak (francis-franfran) wrote :

My server was on 8.04 LTS, with a RAID 1 setup of two mirrored hard disks.
I had tested it: if I unplugged one of the disks, the RAID 1 still worked fine.

Sadly, after upgrading my system to 10.04.1 LTS, if I unplug one of the disks and boot, the RAID 1 becomes inactive.

#mdadm --detail /dev/md0
mdadm: md device /dev/md0 does not appear to be active.

# cat /proc/mdstat
md0 : inactive sdb1[0](S)

I have removed some of the messages to save space here. In the above case, I unplugged sdc and the RAID 1 is not working anymore; it treated sdb as a spare disk.

If I plug sdc back in, the RAID 1 comes back to normal. It doesn't make any sense.
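
For an array stuck in that inactive state, one common manual recovery looks like this (a hedged sketch; /dev/md0 and /dev/sdb1 are taken from the output above, and whether it applies to this exact setup is an assumption):

# stop the half-assembled array, then re-assemble it and allow it to run degraded
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 --run /dev/sdb1
cat /proc/mdstat   # md0 should now be active with one of its two members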

Honestly, I am not a real expert in system administration. I set up the software RAID simply to protect my data in case of a hard disk failure. Now I need to pray for both disks to keep working in order to keep this RAID 1 alive... really frustrating.

May I know what I need to do to fix this problem?

Thank you very much!

Revision history for this message
François Marier (fmarier) wrote :

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1003309 and https://bugs.launchpad.net/ubuntu/+source/cryptsetup/+bug/251164 had the info I needed to fix my problems with encrypted RAID1 not booting in degraded mode.

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

This is a very old LP bug, but for completeness it is worth mentioning here that some patches were recently merged into initramfs-tools and cryptsetup which allow a good experience when booting a LUKS-encrypted rootfs on top of a degraded RAID1 array; for details, please check: https://bugs.launchpad.net/ubuntu/+source/cryptsetup/+bug/1879980

Cheers,

Guilherme

Displaying the first 40 and the last 40 of 149 comments.