mdadm cannot assemble array as cannot open drive with O_EXCL

Bug #27037 reported by Ian Oliver on 2005-12-13
This bug affects 5 people
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

Further to discussions (well, monologue!) on the forums here - http://ubuntuforums.org/showthread.php?p=563255&posted=1

I've had two raid arrays working on my Breezy machine for several months.

/dev/md0 is a raid 5 built from /dev/sd[a-d]
/dev/md1 is a raid 0 built from /dev/sd[e-h]

I rebooted the server so I could change the power leads around and /dev/md1 won't assemble - it
says no drives. I have recently done a dist-upgrade and don't recall if I rebooted afterwards.
/dev/md0 is working fine.

---------------------------------------------------
Here is what I get if I try manually -
sudo mdadm --assemble /dev/md1 /dev/sd[e-h]
mdadm: cannot open device /dev/sde: Device or resource busy
mdadm: /dev/sde has no superblock - assembly aborted

---------------------------------------------------
But --examine is happy with all the drives that make up the array
sudo mdadm --examine /dev/sde
/dev/sde:
Magic : a92b4efc
Version : 00.90.01
UUID : 22522c98:40ff7e71:c16d6be5:d6401d24
Creation Time : Fri May 20 13:56:01 2005
Raid Level : raid0
Device Size : 293036096 (279.46 GiB 300.07 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 1

Update Time : Fri May 20 13:56:01 2005
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 3abea2a9 - correct
Events : 0.1

Chunk Size : 64K

Number Major Minor RaidDevice State
this 3 8 112 3 active sync /dev/sdh

0 0 8 64 0 active sync /dev/sde
1 1 8 80 1 active sync /dev/sdf
2 2 8 96 2 active sync /dev/sdg
3 3 8 112 3 active sync /dev/sdh

The array definitely doesn't exist as this shows -
cat /proc/mdstat
Personalities : [raid5]
md0 : active raid5 sda[3] sdb[2] sdd[1] sdc[0]
732595392 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

---------------------------------------------------
More info from using strace

open("/dev/sde", O_RDONLY|O_EXCL) = -1 EBUSY (Device or resource busy)
write(2, "mdadm: cannot open device /dev/s"..., 60mdadm: cannot open device /dev/sde: Device or
resource busy
) = 60
write(2, "mdadm: /dev/sde has wrong uuid.\n", 32mdadm: /dev/sde has wrong uuid.
) = 32

It looks like the exclusive open to sde is failing. I tried using lsof to see what else had sde
open but can't see anything.
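
lsof only reports files opened by user-space processes, so a claim made inside the kernel (for example by md or device-mapper) would not appear in its output. A minimal sketch of another place to look, assuming a kernel that exposes /sys/block/<disk>/holders (this may not apply to a kernel as old as the one in this report). The function name list_holders and its optional sysfs-root argument are made up for illustration:

```shell
#!/bin/sh
# list_holders DISK [SYSFS_ROOT]
# Print the names of kernel devices (e.g. md1, dm-0) that sysfs reports as
# holding DISK. SYSFS_ROOT defaults to /sys and exists only to ease testing.
list_holders() {
    disk=$1
    root=${2:-/sys}
    dir="$root/block/$disk/holders"
    # No holders directory: old kernel or unknown disk; print nothing.
    [ -d "$dir" ] || return 0
    for entry in "$dir"/*; do
        # An empty directory leaves the glob unexpanded; skip that case.
        [ -e "$entry" ] || continue
        basename "$entry"
    done
}
```

For example, `list_holders sde` prints nothing when sysfs lists no holder for the disk. For a partition, the corresponding directory would be /sys/block/sde/sde1/holders, which this sketch does not search.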

All ideas welcome, but I'm really worried that I might do something to get my /dev/md0 array
into the same state, and this is required 24x7.

Note that this machine has been upgraded from Hoary, and I've also renumbered the mdadm-raid
script in /etc/rcS.d as with it in the default place it was running before hotplug.

I'm reporting this as a kernel-package bug as I really don't know where else to put it!

Thanks

Ian

Ben Collins (ben-collins) wrote :

Can you run (may have to install lsof pkg) "sudo lsof | grep sde" and see if
anything has the device open?

Ian Oliver (hvy4-idbo) wrote :

(In reply to comment #1)
> Can you run (may have to install lsof pkg) "sudo lsof | grep sde" and see if
> anything has the device open?

I have run lsof (but this was kind of buried in my message) and there is nothing there.

this produces no output
ioliver@tera:~ $ sudo lsof | grep -i sde
but this also produces no output
ioliver@tera:~ $ sudo lsof | grep -i sda

Regards

Ian

Ian Oliver (hvy4-idbo) wrote :

Oh, and here is my uname output to confirm my kernel version.
ioliver@tera:~ $ uname -a
Linux tera 2.6.12-9-386 #1 Mon Oct 10 13:14:36 BST 2005 i686 GNU/Linux

I'm trying to build up the confidence to reboot with the older kernel, but am really scared in case my /dev/md0 gets into the same state. Without any work-around or real idea of what's wrong, I feel like I'm walking on eggshells.

Ian

Ian Oliver (hvy4-idbo) wrote :

You're going to like this!

ioliver@tera:~ $ sudo losetup /dev/loop/0 /dev/sde
ioliver@tera:~ $ sudo losetup /dev/loop1 /dev/sdf
ioliver@tera:~ $ sudo losetup /dev/loop2 /dev/sdg
ioliver@tera:~ $ sudo losetup /dev/loop3 /dev/sdh

ioliver@tera:~ $ sudo mdadm --assemble /dev/md1 /dev/loop/0 /dev/loop1 /dev/loop2 /dev/loop3
mdadm: /dev/md1 has been started with 4 drives.

So, if I "hide" the drive behind a loop back, then mdadm is perfectly happy with it and will assemble the
array. Remember, my other four drives, on an identical controller, didn't require this treatment!
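
The loop workaround above can be sketched as a dry run that only prints the commands, so they can be checked before anything is attached. This assumes /dev/loop0 onward are unused (`losetup -a` lists the ones already in use) and reuses the array name /dev/md1 from this report. The function name loop_plan is hypothetical:

```shell
#!/bin/sh
# loop_plan DISK...
# Print (do not run) one losetup command per disk, pairing each with
# /dev/loop0, /dev/loop1, ... and then the matching mdadm --assemble line.
# The array name /dev/md1 is taken from the report above.
loop_plan() {
    n=0
    loops=""
    for disk in "$@"; do
        echo "losetup /dev/loop$n $disk"
        loops="$loops /dev/loop$n"
        n=$((n + 1))
    done
    echo "mdadm --assemble /dev/md1$loops"
}
```

Running `loop_plan /dev/sde /dev/sdf /dev/sdg /dev/sdh` prints four losetup lines followed by one assemble line. Printing rather than executing keeps the sketch harmless on a machine where the loop devices are already taken.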

Now that I have a work-around, I'm happier to try rebooting with different kernels/patches etc.

Regards

Ian

Ian Oliver (hvy4-idbo) wrote :

Is there yet any indication of where the problem might be? Is it as deep as libata, as I'm on that list?

Everything is still working with the loop on all four drives, but it can't be very efficient!

Ian

Ben Collins (ben-collins) wrote :

This is more of a block layer issue. I believe that it is related to some VFS hacks. We have a patch that is supposed to address this issue (I believe it is present in the stock kernel), but it seems like it is not working.

Sebastian Goth (seezer) wrote :

Got the same problem - can't assemble because of busy devices.

mdadm --examine /dev/hde1
tells me
mdadm: No super block found on /dev/hde1 (Expected magic a92b4efc, got 00000000)

The raid was created on Gentoo one and a half years ago and migrated to Breezy beta without a problem.
Just kept the mdadm.conf and was happy.
Here on Dapper the mdadm init scripts fail.

With the default kernel of Kubuntu Flight 5 (2.6.15-18-386) I even got a segfault for
mdadm --detail --scan
but after updating to 2.6.15-19-686 it worked normally again - with no result.

I got it up and running by just using --build:
mdadm --build /dev/md0 --level=0 --raid-devices=2 --auto /dev/hd[eg]1

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this linux-source-2.6.15 kernel bug to the new "linux" package. We appreciate your patience and understanding as we make this transition. Also, if you would be interested in testing the upcoming Intrepid Ibex 8.10 release, it is available at http://www.ubuntu.com/testing . Please let us know your results. Thanks!

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Pelle (per-anders-andersson) wrote :

I have a similar problem, now running the 2.6.27 kernel.

Ian Oliver's nice workaround solves it, but there is a spooky md0 array that I cannot get rid of.

>mdadm --examine --brief --scan --config=partitions
ARRAY /dev/md0 level=raid0 num-devices=2 UUID=7d97e292: ... /* /dev/sd[c-d] - won't go away */
ARRAY /dev/md1 level=raid0 num-devices=4 UUID=84d9862b: ... /* /dev/sd[e-h]1 - working fine */
ARRAY /dev/md0 level=raid0 num-devices=2 UUID=279762b4: ... /* /dev/sd[c-d]1 - now loop[1-2] */

The history is:
/dev/sdc & /dev/sdd were set up first, then removed and replaced by /dev/sdc1 & /dev/sdd1.
That worked fine for a long time until the next reboot. Then the first array reappeared
and could not be removed. And it became a horrible mess.

Many Thanks for the workaround! - but I hope for a cleaner solution.

Ian Oliver (hvy4-idbo) wrote :

What have you got in /etc/mdadm/mdadm.conf? Is the ARRAY definition for the old array in there? If so, comment it out and reboot (or unmount everything and restart mdadm, but reboot is maybe easier!)

It might also be worth removing the entry that creates the array on your loop devices and assemble it again from scratch. Hopefully it will work, which would get 2.6.27 off the hook.

(By the way, I'm sure you know this already, but raid0 shouldn't be used for any data you don't have copies of elsewhere, particularly with four drives. A few days ago, I had a drive go wrong in a four-drive raid 0 and it meant instant loss of all data. However, my array was just for backups, so not an issue, which is why I felt OK going for raid 0.)

Ian

Pelle (per-anders-andersson) wrote :

I have tried several things in /etc/mdadm/mdadm.conf without success.
And I think that the only way out of this is to rewrite the superblocks.
But if that cannot be done with the mdadm tool I am not willing to try,
since I have no backup at this moment!

One strange thing is that no fs type is set to 0xFD (Linux raid autodetect)
but the partitions are nevertheless detected and then md0 is stopped.
I have drivers/md.c compiled in and am running Debian.

Ian Ward (ian-excess) wrote :

I can confirm I have encountered this bug on two separate machines. One running Ubuntu 6.06 (kernel 2.6.16) and one running Debian 4.0 (kernel 2.6.18).

The loopback workaround works for me as well.

Pelle (per-anders-andersson) wrote :

I got rid of my problems after compiling the kernel without LVM
That is:
# CONFIG_BLK_DEV_DM is not set

Mikael Frykholm (mikael) wrote :

I spent a couple of hours on this as well. I'm running with the loop workaround now.
I am running 2.6.27-9-server x86_64.

Feel free to ask me to test stuff.

Running 2.6.27-7-server x86_64 on Intrepid and had the same problem. Loop workaround works for me.

kaefert (kaefert) wrote :

Just ran into the same problem on Ubuntu 9.04 x64 Server. I had to adapt the workaround a little so that it worked for me

kaefert@Blechserver:~$ uname -a
Linux Blechserver 2.6.28-11-server #42-Ubuntu SMP Fri Apr 17 02:45:36 UTC 2009 x86_64 GNU/Linux

kaefert@Blechserver:~$ sudo losetup /dev/loop0 /dev/sdc1
kaefert@Blechserver:~$ sudo losetup /dev/loop1 /dev/sdd1
kaefert@Blechserver:~$ sudo losetup /dev/loop2 /dev/sde1
kaefert@Blechserver:~$ sudo losetup /dev/loop3 /dev/sdf1
kaefert@Blechserver:~$ sudo mdadm --assemble /dev/md1 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
mdadm: /dev/md1 has been started with 4 drives.

Ian Oliver (hvy4-idbo) wrote :

This bug is now over three years old so has been ignored for many versions of Ubuntu. Yes, there's a work-around, but it's a pretty nasty thing to have to do on a production server.

I agree. Do we know when this Bug emerges?
I created this array within another machine running Ubuntu 8.10, and
didn't have any problems like this then.

2009/5/16 Ian Oliver <email address hidden>:
> This bug is now over three years old so has been ignored for many
> versions of Ubuntu. Yes, there's a work-around, but it's a pretty nasty
> thing to have to do on a production server.

Ian Oliver (hvy4-idbo) wrote :

No idea, I'm afraid. It went away for me some time ago when I upgraded Ubuntu, but clearly it's still deep down in there and keeps leaping up and biting someone.

Sadly, when it does occur, no-one is really able to keep trying things to localise the issue - they just want their raid array back and their heart rate back in check.

Ian

Pelle (per-anders-andersson) wrote :

As I understand it, the RAID support and the Device Mapper support do not work well together.
My problems have been gone since disabling CONFIG_BLK_DEV_DM in the kernel config.

What does your /boot/config-xxx say?

kaefert (kaefert) wrote :

Hi Pelle.

You can find the contents of my "/boot/config-2.6.28-11-server" file @
http://pastebin.com/m4ed33d4a

2009/5/17 Pelle <email address hidden>:
> As I understand it, the RAID support and the Device Mapper support do not work well together.
> My problems have been gone since disabling CONFIG_BLK_DEV_DM in the kernel config.
>
> What does your /boot/config-xxx say?

Charles Galpin (cgalpin) wrote :

I have the same problem and the loop device work around doesn't work for me (same error)

[root@xen ~]# cat /etc/redhat-release
Fedora Core release 6 (Zod)
[root@xen ~]# uname -a
Linux xen.galpin.net 2.6.20-1.3002.fc6xen #1 SMP Mon Aug 13 14:21:21 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

Yes, this is getting long in the tooth - if I can get the array back up I'll back up my data and give the box a fresh install!

Dustyn Marks (dustynmarks) wrote :

I, as many others, have the same problem. And honestly this bug is starting to make me wonder whether or not Ubuntu is going to make a stable production server. [EDIT: I found a solution to my problem at the bottom]

I have "good" news for those who have been hoping to pinpoint where/how this bug occurs. I've been able to successfully reproduce this error many times using the same technique. Read below:

Test System
Ubuntu 9.04 server (Linux 2.6.28-11-server #42-Ubuntu SMP Fri Apr 17 02:45:36 UTC 2009 x86_64 GNU/Linux)
1x 500 GB SATA drive (boot only) - sda
4x Western Digital WD400BB (sdb, sdc, sdd, sde)

First I created a mirrored raid md0 with disks sdb1 and sdc1, which is working fantastically even now. Here is how I did it:
1. Booted a Live CD (Ubuntu 8.10 amd64)
2. Used gparted to partition/format each drive to ext3 and set the raid flag
3. Rebooted to ubuntu server
4. created the array: sudo mdadm --create /dev/md0 --level=mirror --raid-devices=2 /dev/sdb1 /dev/sdc1
5. allow raid to build on boot: sudo mdadm -Es | grep md0 >> /etc/mdadm/mdadm.conf
6. added the next line to /etc/fstab to allow raid to mount on boot: /dev/md0 /media/raida ext3 auto,user,rw,exec 0 0

Then I made another mirrored raid md1 with disks sdd1 and sde1, which is the one I am having problems with.
1. created the array: sudo mdadm --create /dev/md1 --level=mirror --raid-devices=2 /dev/sdd1 /dev/sde1
2. It was created fine, so I let it rebuild, which took 20 minutes. After the rebuild, it reported that disk sdd had failed.
3. powered off the machine, swapped the failed hard drive for a replacement [which was previously formatted ext3 and raid flag set] (notice I did not add anything to mdadm.conf or fstab)
4. powered the machine back on.
5. Tried to recreate the array, and it failed with:

administrator@testserver:~$ sudo mdadm --create /dev/md1 --level=mirror --raid-devices=2 /dev/sdd1 /dev/sde1
mdadm: /dev/sdd1 appears to contain an ext2fs file system
    size=39078080K mtime=Wed Dec 31 18:00:00 1969
mdadm: Cannot open /dev/sde1: Device or resource busy
mdadm: create aborted

sdd1 is the replacement drive, and sde1 is from the original raid.

It is a good thing this is only a test server, and nothing with important data on it. It is virtually pointless if one cannot rebuild a raid1 when the only disk that survives is the one that can't be used! It is really pathetic that this bug has been living for 3+ years and no FIX has been discovered for it. I do not want to try any dirty workarounds for the simple reason that eventually my test server will be put into production. I am just testing out the system, and I'm not liking what I'm seeing.

Here are some more outputs.
administrator@testserver:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md_d1 : active raid1 sde1[1]
      39078016 blocks [2/1] [_U]

md0 : active raid1 sdc1[1] sdb1[0]
      39078016 blocks [2/2] [UU]

unused devices: <none>

HEY! I GOT IT TO WORK!!

I want to apologize now for writing this post as a "stream of thought," but I just got it working. I wasn't going to post this entire comment, but I figured since I got here using Google, maybe someon...


vk (vartan-kurtcuoglu) wrote :

@Dustyn Marks: What do you mean by you got it to work? Did, all of a sudden, cat /proc/mdstat indicate an active raid or did you do something to make that happen?

In any case, I was hit with this bug as well and cannot afford to use software raid on Ubuntu as long as this bug exists. Is there any indication on when/if this will be resolved?

John Cosgrove (kobyov) wrote :

I was also testing raid5, using Ubuntu 9.04 x64 Alternate, and hit this problem after rebooting. After reading post #24, I checked /proc/mdstat and it indicated that there was an inactive array md_d0 which had one of my drives listed. After stopping this array, the problem went away and the raid would mount.

Dustyn Marks (dustynmarks) wrote :

@vk: cat /proc/mdstat indicated a different array - one that I did not build. It must have been built automatically (md_d1). Upon deleting this "other" array (md_d1), I was able to build a new array. It seems as if mdadm tried to build a broken array from one of the disks that still had the previous raid settings, which would explain why that one drive was unable to be used.

Hope that helps
-Dusty
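
To find which array has claimed a "busy" disk, as happened with md_d1 and sde1 above, the member lists in /proc/mdstat can be searched. A rough sketch, assuming the mdstat layout shown in this thread (array name, a colon, state words, then members such as sde1[1]). The function name arrays_using is made up for illustration:

```shell
#!/bin/sh
# arrays_using MEMBER [FILE]
# Print every array in FILE (default /proc/mdstat) that lists MEMBER,
# e.g. sde1, among its component devices.
arrays_using() {
    awk -v dev="$1" '
        $2 == ":" && $1 != "Personalities" {
            for (i = 3; i <= NF; i++) {
                member = $i
                sub(/\[.*/, "", member)   # "sde1[1](S)" -> "sde1"
                if (member == dev) print $1
            }
        }' "${2:-/proc/mdstat}"
}
```

With the mdstat output quoted earlier in this comment thread, `arrays_using sde1` would name md_d1, which is the array that had to be stopped before sde1 could be used again.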

vk (vartan-kurtcuoglu) wrote :

Thanks, Dusty, that helps.

- Vartan

Andrew K (andrew-koebbe) wrote :

Ok. I know this issue is really old. But I just wanted to say thanks to everyone for getting some solutions.

I built my second (key I think) mirrored array last night and moved a bunch of stuff over. After rebooting, the second (new) raid was gone and I thought I had lost it all.

The loop worked for me but now that I look at mdstat, I see the md_d1 raid sitting out there.

So after reading this thread, this seems to only be an issue on the second array.

Am I going to have to delete this md_d1 array and build the correct array every time I reboot?

Andrew K (andrew-koebbe) wrote :

Hmm... ok follow up. It seems that after i ran:

sudo mdadm -Es | grep md1 >> /etc/mdadm/mdadm.conf

and rebooted, the raid came back on its own. md0 was already in mdadm.conf. Is it that simple?

Pelle (per-anders-andersson) wrote :

Hej Andrew,

I have had no trouble with my last two arrays on a basically Debian /etc/rcS.d setup,
even with the CONFIG_BLK_DEV_DM flag turned on while compiling the latest "stable" kernel.
(I now have the flag turned off for paranoid safety reasons, though; I do not need it on.)

Why don't you (who are still wrestling with this problem and solving it with a delaying soft switch)
write to Ben Collins, who was the first one to answer questions about bug 27037 in this forum?
He began in 2005, on the same day that we here in Sweden, in Holland and in her hometown in
Italy have a grand party for Santa Lucia. As I remember it, he also wrote in the kernel bug
forum asking Linus and the other guys who are supposed to have some overview of this.

Maybe you have found something new so we can get to the root of this evil and mark it as solved.
I saw that you joined Ubuntu some months before me (but couldn't find your first postings).
I began writing only on this page because Google showed that this was where this shit was most
written about. If you are still doing experiments, you can write directly to me. I bought 2x1.5TB
WD drives, which both broke down within 2 weeks. I saved some money and now have 2 similar Seagate
drives under heavy testing (checking S.M.A.R.T. messages from them). So if you write to me (and not
here every 3 min) and tell me more in detail about your setup, maybe I have time and interest to
try to duplicate the strange behavior you just described.

Pelle (per-anders-andersson) wrote :

Like to report another bug concerning the formatting of what i just wrote.

Jeremy Foshee (jeremyfoshee) wrote :

Unassigned from Ben Collins.

If anyone in this bug who is still experiencing the same problem as the original reporter, in Karmic or (preferably) Lucid, could run apport-collect -p linux 27037,
it would enable the team to narrow down the issue using the logs.

Thanks in advance.

-JFo

Changed in linux (Ubuntu):
assignee: Ben Collins (ben-collins) → nobody
status: Confirmed → Incomplete

Mark Knowles (markknowles) wrote :

I just experienced this bug with Ubuntu Lucid Alpha 2.

Version: mdadm 2.6.7.1-1ubuntu15

I have 3 RAID 6 arrays: /dev/md0, /dev/md1 and /dev/md2

/dev/md0 was created by the installer and was ok. md1 and md2 were created after installation.

At boot, my /proc/mdstat looked as follows:

md_d2 : inactive sdi2[3](S)
      871895680 blocks

md_d1 : inactive sdf1[4](S)
      1465047552 blocks

md0 : active raid6 sdb1[0] sdh1[2] sdi1[3] sdg1[1] sdj1[4]
      314592576 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]

I then proceeded to try and assemble the missing RAID arrays with the following command:

mdadm --assemble --scan

After that, md1 and md2 were visible, but degraded:

Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md2 : active raid6 sdb2[0] sdj2[4] sdh2[2] sdg2[1]
      2615687040 blocks level 6, 64k chunk, algorithm 2 [5/4] [UUU_U]

md1 : active raid6 sda1[0] sde1[3] sdd1[2] sdc1[1]
      4395142656 blocks level 6, 64k chunk, algorithm 2 [5/4] [UUUU_]

md_d2 : inactive sdi2[3](S)
      871895680 blocks

md_d1 : inactive sdf1[4](S)
      1465047552 blocks

md0 : active raid6 sdb1[0] sdh1[2] sdi1[3] sdg1[1] sdj1[4]
      314592576 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]

Following some of the comments, I recovered without rebuilding the arrays with the following commands:

1. Stop the degraded arrays and the strange md_d* arrays:
mdadm -S /dev/md1
mdadm -S /dev/md2
mdadm -S /dev/md_d1
mdadm -S /dev/md_d2

2. I checked to see that my mdstat looked clean:

cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md0 : active raid6 sdb1[0] sdh1[2] sdi1[3] sdg1[1] sdj1[4]
      314592576 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>

Yep, just md0, so that's all good.

3. I tried to restart md1 and md2:

mdadm --assemble --scan

4. Check mdstat

cat /proc/mdstat

Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
md1 : active raid6 sda1[0] sdf1[4] sde1[3] sdd1[2] sdc1[1]
      4395142656 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]

md2 : active raid6 sdb2[0] sdj2[4] sdi2[3] sdh2[2] sdg2[1]
      2615687040 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]

md0 : active raid6 sdb1[0] sdh1[2] sdi1[3] sdg1[1] sdj1[4]
      314592576 blocks level 6, 64k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>

5. Make sure mdadm.conf is correct.

I previously only had an entry for md0 in mdadm.conf, so I needed to add definitions for md1 and md2:

mdadm -Es | grep md1 >> /etc/mdadm/mdadm.conf
mdadm -Es | grep md2 >> /etc/mdadm/mdadm.conf

6. For good measure I updated my initrd in case the mdadm conf is stored there:
update-initramfs -u -k all

All seems good :) Thanks for all the comments, it helped me fix a 7TB array quickly (after having a heart attack).
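
The mdadm.conf step above appends whatever `mdadm -Es | grep md1` prints, which would add duplicates if repeated and would also match names such as md10. A hedged sketch of a more careful append, keyed on the UUID= token of each ARRAY line. It assumes the scan output has first been saved to a file (for example with `mdadm -Es > scan.txt` as root); the function name append_new_arrays and both file arguments are hypothetical:

```shell
#!/bin/sh
# append_new_arrays SCAN_FILE CONF_FILE
# Append to CONF_FILE each ARRAY line from SCAN_FILE whose UUID= token is
# not already present in CONF_FILE. Lines without a UUID= token are skipped.
append_new_arrays() {
    scan=$1
    conf=$2
    while IFS= read -r line; do
        case $line in
            ARRAY\ *) ;;          # only ARRAY definitions are of interest
            *) continue ;;
        esac
        uuid=""
        for word in $line; do
            case $word in
                UUID=*) uuid=$word ;;
            esac
        done
        [ -n "$uuid" ] || continue
        # Append only when this UUID is not yet mentioned in the config.
        grep -qF -- "$uuid" "$conf" || printf '%s\n' "$line" >> "$conf"
    done < "$scan"
}
```

Running it twice leaves the config unchanged the second time. Note also that in `sudo mdadm -Es | grep md1 >> /etc/mdadm/mdadm.conf` the redirection is performed by the calling shell rather than by sudo, so that form only works from a shell that can already write the file.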

Andrew K (andrew-koebbe) wrote :

Thanks, Mark, for relaying your experience.

I'd venture that Mark's experience confirms my theory that the problem stems from the additional configuration lines not making it in to /etc/mdadm/mdadm.conf.

When are those lines supposed to be automatically entered in to that file?

Mark Knowles (markknowles) wrote :

I have to agree. I set up the two failing RAID devices *after* I installed the system and did not define the devices in mdadm.conf. I would have thought that the kernel would find the arrays regardless of the mdadm.conf though.

Either way, the problem is fixed after fixing mdadm.conf and the problem does not come back after a reboot, so I'm happy.

Pelle (per-anders-andersson) wrote :

The kernel doesn't access any of /etc/mdadm/mdadm.conf, /etc/mdadm.conf or /etc/default/mdadm nowadays.
What worries me is that nobody seems to have cared about this package the last 2 years.

Pelle (per-anders-andersson) wrote :

Since i can not edit my recent post, here i go again.
Have you seen what strange raid-things grub2 contains?

Boyd Waters (waters-boyd) wrote :

Cannot access RAID5 array upon reboot.

# uname -a
Linux bwaters-desktop 2.6.32-16-generic #25-Ubuntu SMP Tue Mar 9 16:33:12 UTC 2010 x86_64 GNU/Linux

1. manually assemble the array (named vla) as root
# mdadm --assemble --scan --auto=md

2. mount the array (it contains a large ext4 filesystem)
# mount /dev/md/vla /mnt/vla

This large filesystem is shared by AFPD from the netatalk package.

3. reboot the computer from the GNOME "power symbol" menu (at the top right corner)

When the system comes back up, log in as user, open a terminal, sudo -i to become root.
Attempt to re-start the array yields ("/dev/sdc no superblock found") and the array is not started.

WORK-AROUND -- there is a spurious "array" that seems to be running after reboot. I stopped that "ghost array", and was able to get my array back into operation.

$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md_d127 : inactive sdf[3](S)
      1953514452 blocks super 1.1

unused devices: <none>

$ sudo -i
[sudo] password for bwaters:

# mdadm --stop /dev/md_d127
mdadm: stopped /dev/md_d127

# mdadm --assemble --scan --auto=md
mdadm: /dev/md/vla has been started with 6 drives.
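
Several reports in this thread describe the same pattern: a stray array named md_d* holds a member disk until it is stopped. A small sketch that prints, without running, a stop command for each such array, on the assumption (drawn only from these reports) that every md_d* entry is unwanted; check the list before running anything. The function name stop_plan is made up:

```shell
#!/bin/sh
# stop_plan [FILE]
# Print (do not run) an "mdadm --stop" command for every array in FILE
# (default /proc/mdstat) whose name starts with md_d.
stop_plan() {
    awk '$2 == ":" && $1 ~ /^md_d/ { print "mdadm --stop /dev/" $1 }' \
        "${1:-/proc/mdstat}"
}
```

For the mdstat output shown above, this would print the same `mdadm --stop /dev/md_d127` command that was run by hand.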

Andrew K (andrew-koebbe) wrote :

Boyd, do you have an entry for all of your arrays in /etc/mdadm/mdadm.conf?

It should look like the output of:
sudo mdadm -Es

ceg (ceg) wrote :

The last couple of comments sound like it's bug #252345

The following will recreate a static mdadm.conf (possible workaround) but is not a fix to the issue (dysfunctional hotplugging):

# /usr/share/mdadm/mkconf force-generate /etc/mdadm/mdadm.conf
# update-initramfs -k all -u

Mat Ludlam (matludlam) wrote :

My problem was slightly different in that the "busy device" would change every time that I re-booted.

Removing the fake MD device allowed me to assemble correctly and I am now working. Thanks for the help.

I have seen posts elsewhere regarding having a Fake RAID controller in the machine also causing problems. Not sure if this is affecting others here too.

AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 9.10
Ec2AMI: ami-55739e3c
Ec2AMIManifest: ubuntu-images-us/ubuntu-karmic-9.10-amd64-server-20100121.manifest.xml
Ec2AvailabilityZone: us-east-1c
Ec2InstanceType: m1.xlarge
Ec2Kernel: aki-fd15f694
Ec2Ramdisk: ari-7b739e12
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1:
Package: linux (not installed)
ProcCmdLine: root=/dev/sda1 ro 4
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, no user)
 LANG=en_US.UTF-8
ProcVersionSignature: Ubuntu 2.6.31-302.7-ec2
Tags: ec2-images
Uname: Linux 2.6.31-302-ec2 x86_64
UserGroups:

Changed in linux (Ubuntu):
status: Incomplete → New
tags: added: apport-collected

Jeffrey Baker (jwbaker) wrote :

@Jeremy Foshee I ran my apport-collect on a Karmic server above. I had the same workaround as one of the above commenters. I had a useless, incomplete md_d0 listed in mdstat which had claimed three of the four devices in my RAID. I stopped md_d0 and assembled the RAID again with success.

$ cat /proc/mdstat
Personalities : [raid0]
md_d0 : inactive sde[3](S) sdc[1](S) sdd[2](S)
      1321098048 blocks

unused devices: <none>
# mdadm --assemble /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde
mdadm: /dev/md0 has been started with 4 drives.
# cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 sdb[0] sde[3] sdd[2] sdc[1]
      1761460224 blocks 1024k chunks

unused devices: <none>

tags: added: karmic

Jeremy Foshee (jeremyfoshee) wrote :

Hi Ian,

Please be sure to confirm this issue exists with the latest development release of Ubuntu. ISO CD images are available from http://cdimage.ubuntu.com/daily-live/current/ . If the issue remains, please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 27037

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete

Jeffrey Baker (jwbaker) wrote :

There is really no reason to believe this bug does _not_ exist in the current release, since it has existed for years in past releases and has not been specifically addressed. Setting this bug to "incomplete" repeatedly is how we managed to have this longstanding bug for so long. It's a form of institutionalized buck-passing that really shouldn't be encouraged.

Please do not set this bug to "incomplete" unless you have a specific reason to believe there is a fix that needs testing. As we do not have a specific means of reproducing this bug, which occurs randomly, asking users to test Lucid or to test the upstream is not really a reasonable request, and in any case the existence of the bug cannot be confirmed or denied by a short test.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed

Jeremy Foshee (jeremyfoshee) wrote :

Jeffrey,
   As it is clear you do not understand the way the Ubuntu Kernel Team works, I'd like to explain a little bit. The Ubuntu Kernel Team regularly pulls stable updates from the upstream kernel. As such, quite a number of bugs do not get active work in Launchpad but do get resolved due to the upstream bugs filed against the same type of issue. So you see, it is imperative that I ask for updates from bug reporters. This requires moving a bug to the Incomplete state so that I can track which of the over 6,000 open bugs I have requested information on. Additionally, if you are not the original bug reporter, it is preferable that you open your own bug to address the possibility that your hardware has an impact on the way we address your particular issue since it is entirely possible that a fix for Ian doesn't resolve your issue.

I hope this helps. I'm resetting this bug to Incomplete pending a response to the inquiry that I put forth to Ian.

Thanks!

~JFo

Changed in linux (Ubuntu):
status: Confirmed → Incomplete

Ian Oliver (hvy4-idbo) wrote :

I'm the original reporter but I haven't personally experienced this bug since completely switching hardware and moving to Hardy, so I can't provide any more information. Sorry about that, but after six years things were bound to have changed.

I guess people need to open their own bugs, and I'm not sure whether then marking this as a dupe makes sense. Hopefully your bugs won't be open quite as long ...

Ian

Alvin (alvind) wrote :

There is no comment about this bug affecting Lucid, so I'll confirm that now. I encountered it on a RAID-0 array and used the mentioned workaround. (stopping and reassembling the array.)

Jeremy Foshee (jeremyfoshee) wrote :

Alvin,
  Would you mind opening a new bug for me? Our policy is to close bugs when the original reporter is unable to replicate the conditions of the bug, whether through resolution due to updates or changes in their local environment.

Ian,
   Thanks for following up on this. I'll mark it as Fixed for you since you have not seen this in a while. Those of you filing new bugs for your issues, please feel free to notify me of the new bug numbers via this bug if you like so that I can look at them.

Thanks!

~JFo

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Jeffrey Baker (jwbaker) wrote :

HAHAHA fix released? Two people just reproduced the bug in the most recent and second most recent distro within the last 24 hours!

Ian Oliver (hvy4-idbo) wrote :

Hey, this would make a good Dilbert.

Dilbert: Because Wally couldn't fix that critical bug, the customer gave up after a few years and switched to a competitor's product.
PHB: Mark the bug as fixed; we've got targets to meet.

Ian

Alvin (alvind) wrote :

I would open a new bug, but I already reassembled the array (RAID0, without data loss), so I can no longer give relevant output. Of course, I could use more or less the same description as this one.

The array was broken after the second last reboot. I have no intention of trying to reproduce this by rebooting a lot, because I need that server active. (Rebooting the last 3 Ubuntu releases has proven to be a non-trivial task, so I try to avoid that.)

Jeremy, where is it written that separate bugs should be filed for the same issue if the original reporter is unable to reproduce? This bug is probably not the place to discuss this, but isn't it likely that some reporters change version/distro if they encounter a bug that is critical for them? Some reporters are normal users and not beta testers!

Pelle (per-anders-andersson) wrote :

I suggest that we try to identify the bug before discussing its status.

I have not addressed the bug for a long time, but I saw that we can nowadays read how things are supposed to work in
./linux-2.6.33.4/Documentation/md.txt

Alvin (alvind) wrote :

I just rebooted my server and ran into this, so I filed bug 599135

Serge van Ginderachter (svg) wrote :

I'm experiencing the same problem on a host with lucid. I even tried reinstalling hardy (thinking it was a problem with the partition creation - gparted bug) but the result was the same.

What I'm experiencing is pretty well described in

http://serverfault.com/questions/209379/what-tells-initramfs-or-the-ubuntu-server-boot-process-how-to-assemble-raid-array

In my case, mdadm -Es /dev/sda returns the superblock info that belongs to md2 (= /dev/sda3 + /dev/sdb3)
Because of this, it seems (AFAICS) the initial assembly for md2 is created by combining sda3 with sdb (the whole disk). sdb is then marked in use, and the system cannot start md0 and md1 (/boot and / here).

Adding the script mentioned at the end of the serverfault article seems to do the trick for me.

HTH,

Serge

Serge van Ginderachter (svg) wrote :

It seems that still didn't fix it permanently.

At a certain point things got really messed up.
Normally I have sda1+sdb1=md0 sda2+sdb2=md1 and sda3+sdb3=md2 on said system.
At one point md2 got somehow created from sda and sdb, md0 and md1 were then created from md2p1 and md2p2 (!!).
And this thing booted... (md0 is /boot, md1 is /)

I re-upgraded from hardy to lucid in the mean time, which didn't fix anything either.

Since the problem consistently seems related to mdadm detecting a superblock on the full disk device instead of on the partitions, I added the specific partitions to mdadm.conf:

/etc/mdadm/mdadm.conf:
DEVICE /dev/sda1 /dev/sda2 /dev/sda3 /dev/sdb1 /dev/sdb2 /dev/sdb3

This seems to consistently solve the issue in my case now.
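
To see whether a whole-disk device answers with a superblock that really belongs to one of its partitions, the devices can be probed one by one. The following is only a sketch: it assumes that `mdadm --examine` exits non-zero when it finds no superblock (worth verifying on the mdadm version in use), and `devices_with_superblock` and the `EXAMINE` override are hypothetical names introduced here so the loop can be tried without root.

```shell
# Print each given device on which the examine command reports an md superblock.
# EXAMINE is a hypothetical hook; by default the real `mdadm --examine` is used.
devices_with_superblock() {
    for dev in "$@"; do
        # Assumption: a zero exit status means a superblock was found.
        if ${EXAMINE:-mdadm --examine} "$dev" >/dev/null 2>&1; then
            printf '%s\n' "$dev"
        fi
    done
}

# Typical use (needs root), with the devices from the comment above:
#   devices_with_superblock /dev/sda /dev/sda3 /dev/sdb /dev/sdb3
```

If both /dev/sda and /dev/sda3 are printed, the whole disk is being seen as an array member, which matches the symptom described. One possible cause, not confirmed in this thread, is 0.90 metadata: it stores the superblock near the end of the device, so a last partition that runs to the end of the disk can make the disk itself appear to carry the same superblock.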

Dirk T (miriup) wrote :

Ok, I just had the same issue - or at least a similar one. Unfortunately the box has some other problems and I don't have network with me.

It affected a RAID0 as well and I'm running 2.6.36. I was rebooting into 2.6.38 and after that reboot my RAID array was gone. I have to admit that array has a partially faulty disk (hence me working on it).

The symptom was that /proc/mdstat reported an inactive device md4 in state (S). The other physical devices participating in this array did not show up in /proc/mdstat. When trying to --assemble /dev/md4 it would complain that the one physical device listed in the incomplete md4 was reported "in use". The event counters were equal on all four participating partitions and the "partner" partitions were properly listed, i.e. `mdadm --examine /dev/sd{a,b,c,d}4` spit out all correct information. I'm in the process of downsizing this RAID0 (hence the kernel upgrade). This device had seen a `mdadm --grow --array-size` before and a `mdadm --grow --size` was attempted, but failed in the past. More details can be found in my "log" here: http://miriup.de/index.php?option=com_content&view=article&id=83:wading-through-my-own-raid-and-lvm-mess&catid=8:linux&Itemid=25&lang=en

I did `mdadm --stop /dev/md4` followed by `mdadm --assemble /dev/md4` which fixed the problem.
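
The stop-and-reassemble workaround can be sketched as follows. This is a rough outline rather than a tested recipe: `inactive_arrays` is a hypothetical helper that only reads mdstat-formatted text, and md4 is simply the array name from the comment above.

```shell
# List md arrays that mdstat-formatted text on stdin reports as inactive.
# A line such as "md4 : inactive sda4[0](S)" yields "md4".
inactive_arrays() {
    awk '$1 ~ /^md/ && $2 == ":" && $3 == "inactive" { print $1 }'
}

# Typical use (needs root):
#   inactive_arrays < /proc/mdstat
#   mdadm --stop /dev/md4        # release the member device held by the stale array
#   mdadm --assemble /dev/md4    # members are looked up via mdadm.conf
```

Running `mdadm --assemble` with only the array name relies on an entry for that array in mdadm.conf; without one, the member devices have to be listed explicitly.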

ctirpak (chris-tirpak) wrote :

I just ran into this and found a solution that actually makes some sense at least for full volumes:

In some kernel release ... presumably around when this cropped up, the default device id naming changed from /dev/sdXX to /dev/xvdXXX - it seems both are still supported but that mdadm is looking for the new convention. Changing my device line cured all of my problems instantly.

in mdadm.conf

Instead of (or sda1, sda2 etc)
DEVICE /dev/sdh[1-8]

try (xvda1, xvda2 etc):
DEVICE /dev/xvdh[1-8]

you can see my discovery process here:
http://ubuntuforums.org/showthread.php?t=1468064&page=5

take a look in /dev and you will probably see an xvdXXX entry that corresponds to every sdXX entry ... not sure if it will help in the partition scenarios
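
The renaming described above can be tried out with a small helper before editing mdadm.conf. This is a minimal sketch: `to_xvd` is a hypothetical name, and the comment does not say what kind of machine was involved. xvd* names are typically what Xen guests use for virtual disks, so it is an assumption that this applies mainly to virtualised hosts.

```shell
# Rewrite /dev/sd* device names to the /dev/xvd* form; other text passes through.
to_xvd() {
    printf '%s\n' "$1" | sed 's|/dev/sd|/dev/xvd|g'
}

# Example: derive the alternative DEVICE line, then check which nodes exist.
#   to_xvd 'DEVICE /dev/sdh[1-8]'
#   ls -l /dev/sdh* /dev/xvdh* 2>/dev/null
```

Checking /dev first shows which of the two naming schemes is actually present, so the DEVICE line is only changed when the xvd nodes exist.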

Michael Shigorin (shigorin) wrote :

This "ubuntu feature" has even earned a special note at linux-raid wiki: https://raid.wiki.kernel.org/index.php/RAID_setup#Saving_your_RAID_configuration
