no block devices found after an upgrade from 8.10 to 9.04 on a soft RAID1 system

Bug #358054 reported by Manson Thomas on 2009-04-08
This bug affects 4 people
Affects: linux (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

I've decided to upgrade to kubuntu 9.04(64bit) from kubuntu 8.10(64bit) on the following system :

P5N32-E Sli plus motherboard
C2D 6250
4GB of RAM
2x 250GB SATA, RAID1 software for linux (sdc, sdd)
2x150GB Raptor SATA Raid0 for windows (fake raid from motherboard) (sda, sdb)

I set up my Kubuntu with the alternate CD, using software RAID1.

I ran the "update-manager -d" command, as instructed.

Nothing to report on the upgrade (it just asked me what to do with the vim conf file). The system was running well before the upgrade (and had already been upgraded from 8.04 to 8.10).

On reboot I have this message:

no block devices found (4 times)
Gave up waiting for root device.
ALERT! /dev/md3 does not exist. dropping to a shell!

I tried rebooting with the fake RAID for Windows disabled (as suggested in troubleshooting). No change.

In busybox (which I get after the errors), the dmesg output shows messages like these:

sdd : sdd1 sdd2 <<6>attempt to access beyond end of device
sda: rw=0 want=586067137, limit=293046768
Buffer I/O error on device sda1, logical block 293033536
(but messages of this kind also appear on the GParted live CD)

In ls /dev, I can see that the sdcX and sddX partitions are there.

I have no clue how to bring my system back... (I'm searching)...

> I tried rebooting with the fake RAID for Windows disabled (as suggested in troubleshooting). No change.
(disabled in the BIOS)
Note that GParted still saw the hard drives, so maybe I have to retry with the drives unplugged.

with

mdadm --assemble --scan

I get the md 0,1,2,3 of my system

So I tried to chroot and reinstall grub:

mkdir /mnt

mount -t ext3 /dev/md3 /mnt
mount -o bind /dev /mnt/dev
mount -t proc none /mnt/proc
mount -t sysfs none /mnt/sys
mount -t ext2 /dev/md0 /mnt/boot
mount -t ext3 /dev/md2 /mnt/var
chroot /mnt /bin/bash

grub
grub>root (hd0,0)
grub>setup (hd0)
grub>quit

but no change... :'(

Also, on setup (hd0), grub says that it didn't find stage1 (but found the others).

Also, with mdadm --assemble --scan,

the first two lines I get before I see my mdX devices created are:

mdadm: CREATE user root not found
mdadm: CREATE group disk not found

I don't know whether that's a problem or not...

I've succeeded in booting my system as before the upgrade (with KDE...).

At the busybox console, I typed:

mdadm -As (--assemble --scan)

so that it creates the /dev/mdX devices.

Then I typed exit and it resumed the boot process, which succeeded since the RAID devices were up.

This doesn't fix the problem (on reboot it's still the same issue), but at least I have my system back...

Now I need to find out what to do so that it boots normally without falling back to busybox.

Also, there must be an issue in the upgrade process that makes my Kubuntu box lose the RAID config at boot time.
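
For anyone landing here, the manual recovery described in this report can be sketched as a small shell helper. This is only a sketch of the steps reported above, not an official fix: the `run` and `assemble_and_resume` names are made up for illustration, and the helper only prints the commands unless RUN=1 is set.

```shell
# Hypothetical helper: print each command by default, execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Steps typed at the initramfs (busybox) prompt in this report.
assemble_and_resume() {
    run mdadm --assemble --scan   # same as "mdadm -As": creates the /dev/mdX devices
    run cat /proc/mdstat          # confirm the arrays are active before resuming
}

assemble_and_resume
# Once the arrays are up, type "exit" at the prompt to resume the boot.
```

The dry-run default is a deliberate choice so the sequence can be read and checked before anything touches the disks.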

kede (kede) wrote :

I confirm this.
Mainboard is Gigabyte GA-P35-DS4, Raid disabled, ahci enabled.

2 Sata Drives:
Boot partition raid 1 (md0)
Root partition raid 0 (md1)

I ran jaunty for some weeks. After yesterday's update, it drops to busybox because /dev/md1 can't be found.
In /dev there are no devices. No sda, sdb, md0, md1...

Booting an already installed vanilla kernel (2.6.29-020629-generic) still works, I guess there is a problem with the 2.6.28-11-generic kernel.

Michael Nixon (zipplet-zipplet) wrote :

This bug happened to me also when upgrading 8.10 to 9.04 (with do-release-upgrade). Definitely a buggy kernel.

Asus motherboard (not sure of the exact model; I can get it if needed) with RAID hardware disabled.
2 SATA hard disks as /dev/sdb and /dev/sdc
Fails to boot due to a failure to start the RAID.

Issues:
1) /dev/md0 vanished - it seems the new kernel wants to use /dev/md/0
2) Can't reconstruct the array because of a "Device or resource busy" error adding the second disk. There were no issues with 8.10. This prevents one from simply reconstructing the array with --assemble --scan and editing fstab.

Temporary workaround:
When boot fails and drops you into a shell, do this (device names will differ for you):
losetup /dev/loop1 /dev/sdb
losetup /dev/loop2 /dev/sdc
mdadm --assemble /dev/md/0 /dev/sdb /dev/sdc

If mdadm only starts one drive you may need to add the second:
mdadm --manage /dev/md/0 --add <missing drive name>

Then type 'exit' and watch your system boot. You may need to edit fstab to reference /dev/md/0 instead of /dev/md0
This allows one to use the box as intended but those commands have to be entered EVERY BOOT!
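
If the arrays do show up as /dev/md/N rather than /dev/mdN, the fstab edit mentioned above could be previewed with a filter like the one below. It is a minimal sketch: the function name `md_to_md_slash` is hypothetical, it only rewrites text read from stdin, and it assumes device names of the form /dev/md followed by digits.

```shell
# Hypothetical filter: rewrite /dev/mdN references to /dev/md/N.
# Names that already use the /dev/md/N form are left untouched.
md_to_md_slash() {
    sed 's#/dev/md\([0-9][0-9]*\)#/dev/md/\1#g'
}

# Preview only; nothing is written back to /etc/fstab.
printf '%s\n' '/dev/md0 /boot ext2 defaults 0 2' | md_to_md_slash
# prints: /dev/md/0 /boot ext2 defaults 0 2
```

Running it as `md_to_md_slash < /etc/fstab` shows the proposed result without modifying the file.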

Michael Nixon (zipplet-zipplet) wrote :

!!!!!! Sorry!!!!!!
 I posted a bad workaround - it should read:

Temporary workaround:
When boot fails and drops you into a shell, do this (device names will differ for you):
losetup /dev/loop1 /dev/sdb1
losetup /dev/loop2 /dev/sdc1
mdadm --assemble /dev/md/0 /dev/loop1 /dev/loop2

And the missing drive name should be the loop device that didn't assemble into the array.

Somehow the loopback driver is able to get a hold of the partition, whereas mdadm is not.
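
One way to look into the "Device or resource busy" error is to ask sysfs which kernel device already holds the partition. The sketch below assumes the usual sysfs layout (/sys/block/<disk>/<partition>/holders); the `holders_of` name is hypothetical, and the optional second argument only exists so the function can be pointed at a different sysfs root.

```shell
# Hypothetical helper: list the devices that currently hold a disk or partition.
# $1 = device name without /dev/ (e.g. sdb1), $2 = sysfs root (default /sys).
holders_of() {
    dev=$1
    sys_root=${2:-/sys}
    for h in "$sys_root"/block/"$dev"/holders/* \
             "$sys_root"/block/*/"$dev"/holders/*; do
        # Unmatched globs stay literal, so skip anything that does not exist.
        [ -e "$h" ] && echo "${h##*/}"
    done
    return 0
}

holders_of sdb1   # device name taken from the workaround above; yours will differ
```

An entry such as dm-0 would suggest that device-mapper has claimed the partition, while md0 would suggest it is already part of an md array.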

Michael Nixon (zipplet-zipplet) wrote :

More news:

I rebooted my system and looked in the grub menu (hit ESC) at the prompt. It listed 2 kernels for ubuntu 9.04.
2.6.28
2.6.27

I chose the second one and got dropped into the same recovery shell. But this time, mdadm was able to build my array WITHOUT loopback workarounds! It does however still use /dev/md/* rather than /dev/md*

I'd put my money on 2.6.28 being broken somehow?

We're having exactly the same problem on a headless server.
Temporary workaround: use an older kernel. Searching the net for other solutions...

kede (kede) wrote :

For me it works again with the ubuntu default kernel 2.6.28-11-generic.
I changed nothing, just did an upgrade.

The disks are available with /dev/md0 and /dev/md1.

For me it works again too: with the latest kernel update it works.

I can still see the "no block devices found" message four times if I switch back to the first tty, but it boots normally.

Doesn't work with vmlinuz-2.6.28-11-server.

Mark Doliner (thekingant) wrote :

This is still happening for me even with 2.6.28-11. I get dropped to a shell each time I boot. I type "mdadm -As ; exit" and it activates my raid arrays and continues booting normally. Is this supposed to happen automatically somewhere as part of the boot process? Is there a config file I should check?

0815 (christian-skala) wrote :

Have the same issue. Tried all workarounds I found on the web. It's really frustrating that I can boot Ubuntu 9.04 2.6.28-11 ONLY by typing "mdadm --assemble --scan; exit" in the busybox.

If anybody out there finds a solution please post!

Thanx

Outdooralex (alex-outdooralex) wrote :

Same for me with 2.6.28-11

no block devices found
/dev/md0 does not exist

best workaround from 0815: "mdadm --assemble --scan; exit"

please post solutions, thanks!

I confirm this bug.

I am using 2.6.28-11-generic on Jaunty.

I have a "/boot" on software RAID1 and the remaining FS on a software RAID0.
I use a PackardBell Laptop with 2 HDD.

The workaround is OK, but I have to type it each boot.

Please advise.

Outdooralex (alex-outdooralex) wrote :

Mine disappeared after a while - as I just set up this machine, I installed some more packages and edited some configuration-files...

have no idea what caused it to go away and what caused it to appear though.

kede (kede) wrote :

After the problem occurred on my machine, I installed the ubuntu-package of the vanilla-kernel
(see https://wiki.ubuntu.com/KernelMainlineBuilds)
It was Kernel v2.6.29.
After a few weeks, it worked with the ubuntu kernel, too.... strange.

Maybe a workaround (or even a solution) is to install a vanilla kernel...

I have _one_ idea that could explain why it went away, but I'm not claiming anything: on an old working install, you install 2.6.28 while running the old kernel. This runs "initramfs" under the old, current kernel. When you then try to boot the new one, it breaks.

Then, after booting with the workaround and upgrading your system once, "initramfs" has been run under the new, now-current kernel, so everything is OK.

I won't install a vanilla kernel on this box. ;-)

Another, better solution is to edit the /etc/mdadm/mdadm.conf file and add your ARRAY definitions:
root@serv02:~# cat /etc/mdadm/mdadm.conf | grep ARRAY
ARRAY /dev/md1 devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md2 devices=/dev/sda2,/dev/sdb2

Then just rebuild the initramfs of the kernel you want to run; the easiest way is to reinstall linux-image using apt(itude).
This solved the problem on our machine.
But it still doesn't explain why the array is not autodetected...
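
A hedged sketch of that approach: instead of typing the ARRAY lines by hand, `mdadm --detail --scan` prints one for each running array, and `update-initramfs` regenerates the initramfs. The `count_array_lines` helper is hypothetical; it only counts the ARRAY lines in a config file so you can see whether any are defined at all.

```shell
# Hypothetical helper: count ARRAY definitions in an mdadm config file.
# $1 = path to the config file (default /etc/mdadm/mdadm.conf).
count_array_lines() {
    conf=${1:-/etc/mdadm/mdadm.conf}
    [ -r "$conf" ] || { echo 0; return 0; }
    # grep -c prints 0 but exits non-zero when nothing matches.
    grep -c '^[[:space:]]*ARRAY[[:space:]]' "$conf" || true
}

count_array_lines

# If the count is 0 while the arrays are assembled, these commands (run as root)
# would append the definitions and rebuild the initramfs for all installed kernels:
#   mdadm --detail --scan >> /etc/mdadm/mdadm.conf
#   update-initramfs -u -k all
```

The commands that modify the system are left as comments on purpose; review the output of `mdadm --detail --scan` before appending it.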

Mark Doliner (thekingant) wrote :

My /etc/mdadm/mdadm.conf file already listed my arrays (although using UUID=blah instead of devices=blah). I didn't have linux-image installed. I tried installing it but it doesn't seem to do much (I guess because it's one of those virtual package type things?). But I did this and it fixed my problem:

sudo dpkg-reconfigure linux-image-2.6.28-11-generic

It surprised me that that fixed it, because I had run "/usr/sbin/update-initramfs -u -k all" and that did NOT fix it. Oh well.
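
To see whether a rebuilt initramfs actually carries the RAID configuration, its file listing can be searched. This sketch assumes a gzip-compressed cpio initramfs and assumes that the mdadm hook stores the config as etc/mdadm/mdadm.conf inside the image; both are assumptions, and the `listing_has_mdadm_conf` helper is hypothetical.

```shell
# Hypothetical helper: read an initramfs file listing on stdin and report
# whether it includes an mdadm.conf.
listing_has_mdadm_conf() {
    if grep -q 'etc/mdadm/mdadm\.conf$'; then
        echo "mdadm.conf present"
    else
        echo "mdadm.conf missing"
    fi
}

# Example with the kernel version mentioned above (uncomment to run):
#   zcat /boot/initrd.img-2.6.28-11-generic | cpio -t 2>/dev/null | listing_has_mdadm_conf
```

Comparing the result before and after each rebuild command might help explain why one of them worked and the other did not.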

gadLinux (gad-aguilardelgado) wrote :

For me this does not solve anything.

I custom compiled a new kernel and this also does not solve the problem.

I have not tried the mdadm solution, because that is a different softraid solution. I want to use the on-BIOS solution: dmrad.

gadLinux (gad-aguilardelgado) wrote :

I'm sorry, I misspelled the solution. The correct spelling is dmraid.

This needs a solution, because the whole system is unusable if you have all your disks in a RAID-1 configuration.

Having one spare disk saved my life!

gadLinux (gad-aguilardelgado) wrote :

I upgraded to ALPHA (Karmic) and the solution is not there...

Do not upgrade!

I had to put back the old 2.6.28 kernel image to make the system work a little bit.

It seems that when mixing the 2.6.28 kernel's initrd with kernel 2.6.30, it gets confused in such a way that the system can be made to work.

But this is a complete mess, as the reboot procedure is now very tricky for me:

1.- It hangs on boot, and I have to use Ctrl+Alt+Del.
2.- Curiously, it does not reboot but gives me a root terminal.
3.- From there I can remount / (rw).
4.- Run telinit 1
5.- Run telinit 3

And then the system boots.

This is horrible.

I don't know what causes the "no block devices found" message, but it's something related to having all the disks connected to the SATA interface and having the BIOS run the SATA ports as RAID ports.

I have exactly the same system running perfectly, but it boots from an IDE disk. So what seems to be unsupported is booting from a RAID 1 disk.

I hope someone takes this bug, because it really makes the system go crazy.

Thanks

tamasko (tamasko) wrote :

Hi!

I also had the same problem with a Gigabyte ga73pvm-s2h rev 1.0 mobo previously working flawlessly with intrepid.

Now the comments above have been very helpful in understanding the problem, and thank you for your expertise, but they actually didn't concretely solve my problem.

I didn't have md* devices, but sda1 and sdb1 which were in raid previously. Then I noticed that in grub, I had root defined as /dev/mapper/nvidia_cibiiiaa1 so I just replaced that with /dev/sda1.

It boots correctly now, and I guess I just have to rebuild another RAID1 array. However, I am not even sure if I should do that, because RAID only caused me headaches so far (I have a RAID0 and RAID5 array on my windows PC, 2 RAID5 on my fileserver and one RAID1 on my webserver, so I am not totally new to the concept :)

I hope this helps,

Thanks!

Annoyed Ape (jarkoian) wrote :

I have this problem as well, and as the above poster commented, I do not have any md array partitions. (The drives might have been a RAID at one point, but they were cleaned and grub was reinstalled) I upgraded from 8.10 to 9.10... via a quick stop at 9.04. The bug appeared in 9.04 and only by booting an old kernel 2.6.27-14-server was I able to scrape-along and get the machine up.

My boot drives are on the motherboard and are identified as sde and sdf. The software raid a.k.a. "fake RAID" card handles my storage array of 4 drives... listed as sda through sdd. If it is worth mentioning, all drives are IDE and not SATA... so somewhat confused as to why the BIOS presents them to ubuntu as "s" drives in unix. (I would have thought they would have been "h" drives?)

The problem seems to be kernel related, but maybe the auto-upgrade messed up/confused grub? I vaguely remember having to tell it to boot from hd(x:y) where x, y weren't the standard 0,0.

Annoyed Ape (jarkoian) wrote :

Ok, found my solution... but probably not for the folks that are actually using a RAID configuration. Essentially whenever a drive has been in a RAID configuration at one point in its life, the controller card writes "meta data" to the last sectors of the drive. These sectors are beyond the user-data space, but can be read by the controller and/or the controller-driver... and apparently the new kernels!

I had a former RAID 1 (mirror) configuration on two identical drives. Even though there was no md0, md1 (RAID partitions) on either drive, both drives were present in the system via a /dev/mapper device. The /dev/mapper is a virtual device that maps the physical drives to the fake raid driver card(s). The prefix of this device, in my case "pdc", just tells the system it is a "Promise IDE" card and to use the correct promise fake raid drivers. The rest of the device's name is just some ungodly unique identifier for the set. In order to mount the drives, linux likes to know the unique identifier of the drives and doesn't really care as to their physical partition (/dev/sde, /dev/sdf, etc.). The unique identifier Linux uses is a big hexadecimal code that is referred to as the UUID. (Universally Unique IDentifier)

Well, it turned out that I had two old RAID drives with the same UUID. Since they were mapped via the pdc "fake raid drivers" to the system they should present themselves as a single UUID because they are a mirror. (i.e. Linux doesn't care which physical drive gets the data, the mapper handles that... all it cares is that /dev/sde and /dev/sdf partitions are mounted via the mapper.) In the old configuration, even though both drives had the same UUID, only one of them had data and was mounted, the other was empty and unmounted. The old system didn't seem to care that there was another drive with the same UUID because it never attempted to mount it. The new kernel seems to read the raid meta data and incorrectly identifies the disk as part of a RAID set. It then tries to mount it, but of course there are no RAID md partitions so it can't... hence enter the wonderful world of busybox.

It looks like the new kernel must have changed the way it identified if a drive is in a RAID set? Maybe before it looked for the md partitions of Linux RAID types and now it looks for the existence of meta data or a /dev/mapper/pdcxxxxx device? Either way, during boot it would time out because it couldn't initialize the RAID array... well because there wasn't one.

Here is how I fixed my system:

List the RAID devices in your system:
sudo dmraid -r

Verify you don't really have an array: (no Linux RAID md partitions)
sudo fdisk -l
(that's an "el" above)

Remove meta data from the old RAID drives: (be really sure you don't have a RAID, otherwise say bye bye to your data)
sudo dmraid -r -E

create a new partition if required on the empty (non boot/root disk, sdf in my case):
sudo fdisk /dev/sdf
(partition as required, in my case one ext3 partition as sdf1)

"format the new partition" (in my case ext3)
sudo mke2fs -j /dev/sdf1

check out the UUIDs to make sure they are unique:
sudo blkid

if they aren't, change them: (you may have to reboot here...

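
The "make sure the UUIDs are unique" step above can be sketched as a small filter over blkid-style output. The `dup_uuids` name is hypothetical, and the filter assumes each line carries the identifier as UUID="..." after the device name.

```shell
# Hypothetical helper: read blkid-style output on stdin and print any UUID
# that appears on more than one device.
dup_uuids() {
    sed -n 's/.* UUID="\([^"]*\)".*/\1/p' | sort | uniq -d
}

# Typical use (as root):
#   blkid | dup_uuids
```

No output means no duplicates were found. Note that members of a genuine md array are expected to share a UUID, so a hit only points to a problem on disks that are not supposed to be in an array.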

PaFuLong (pafulong) wrote :

There is a quite clean fix for this here:

http://ubuntuforums.org/showthread.php?t=1026461#8

Seems like dmraid is loaded at the wrong time or in the wrong way
and the script loading it was actually not made executable
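
A quick way to check for a boot script that was not made executable is sketched below. The `non_executable_in` helper is hypothetical, and the directory used in the example is only an assumption about where the dmraid boot script lives; the linked thread has the actual details.

```shell
# Hypothetical helper: list regular files in a directory that lack the execute bit.
# $1 = directory to check.
non_executable_in() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        [ -x "$f" ] || echo "$f"
    done
    return 0
}

# The directory is an assumption; adjust it to wherever the dmraid script is installed.
non_executable_in /usr/share/initramfs-tools/scripts/local-top
```

Any file it prints could then be fixed with chmod +x, followed by rebuilding the initramfs.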

Cheers

affects: ubuntu → linux (Ubuntu)
tags: added: kj-triage
Jeremy Foshee (jeremyfoshee) wrote :

Hi Manson,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 358054

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
Changed in linux (Ubuntu):
status: New → Incomplete
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired
gadLinux (gad-aguilardelgado) wrote :

This was caused by incorrect metadata in the RAID.

The solution was to delete the old metadata, and everything worked again.

Hi,

  the situation was fixed by the publication of kernel 2.6.28.11 (mid-april of 2009).

Regards,
Thomas.
