long bootup, dmesg full of md: array md1 already has disks!

Bug #139802 reported by Brian
Affects: mdadm (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Running a Feisty Server install.

About a week ago I updated my Ubuntu server (which I believe installed a new kernel) but hadn't yet rebooted. Yesterday we had a power failure and it took ages for my server to start up. I left it overnight, so I don't know how long it took.

There wasn't much on the screen, but there was a lot of disk activity, and this morning it was up and running, except for my most recently added RAID-1 array.

I checked dmesg and found this nasty oom-killer invocation caused by mdadm.

[2000 lines of md: array md1 already has disks!]
[31746.101310] md: array md1 already has disks!
[31832.551736] md: array md1 already has disks!
[31919.273831] md: array md1 already has disks!
[31931.074529] mdadm invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
[31931.074556] [<c015ae75>] out_of_memory+0x175/0x1b0
[31931.074570] [<c015c84c>] __alloc_pages+0x2bc/0x310
[31931.074580] [<c015dfcc>] __do_page_cache_readahead+0x10c/0x250
[31931.074585] [<c01e45ce>] blk_remove_plug+0x2e/0x70
[31931.074593] [<c02f4442>] io_schedule+0x22/0x30
[31931.074600] [<c02f471b>] __wait_on_bit_lock+0x5b/0x70
[31931.074604] [<c0157650>] sync_page+0x0/0x40
[31931.074612] [<c0157633>] __lock_page+0x73/0x80
[31931.074619] [<c015a094>] filemap_nopage+0x2a4/0x380
[31931.074626] [<c0164bb4>] __handle_mm_fault+0x204/0xe50
[31931.074644] [<c0176345>] nameidata_to_filp+0x35/0x40
[31931.074653] [<c017639b>] do_filp_open+0x4b/0x60
[31931.074662] [<c02f7415>] do_page_fault+0x125/0x6b0
[31931.074670] [<c017646c>] do_sys_open+0xbc/0xe0
[31931.074676] [<c02f72f0>] do_page_fault+0x0/0x6b0
[31931.074679] [<c02f5ad4>] error_code+0x7c/0x84
[31931.074689] =======================
[31931.074690] Mem-info:
[31931.074692] DMA per-cpu:
[31931.074695] CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
[31931.074697] Normal per-cpu:
[31931.074699] CPU 0: Hot: hi: 186, btch: 31 usd: 170 Cold: hi: 62, btch: 15 usd: 14
[31931.074704] Active:59839 inactive:59901 dirty:0 writeback:0 unstable:0 free:1204 slab:5748 mapped:11 pagetables:993
[31931.074708] DMA free:2056kB min:88kB low:108kB high:132kB active:5244kB inactive:5260kB present:16256kB pages_scanned:17660 all_unreclaimable? yes
[31931.074711] lowmem_reserve[]: 0 492 492
[31931.074716] Normal free:2760kB min:2792kB low:3488kB high:4188kB active:234112kB inactive:234344kB present:503876kB pages_scanned:702548 all_unreclaimable? yes
[31931.074719] lowmem_reserve[]: 0 0 0
[31931.074722] DMA: 0*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2056kB
[31931.074729] Normal: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2760kB
[31931.074737] Swap cache: add 377699, delete 377699, find 27/48, race 0+0
[31931.074739] Free swap = 0kB
[31931.074740] Total swap = 1510068kB
[31931.074742] Free swap: 0kB
[31931.080067] 131056 pages of RAM
[31931.080069] 0 pages of HIGHMEM
[31931.080071] 2100 reserved pages
[31931.080072] 28 pages shared
[31931.080073] 0 pages swap cached
[31931.080075] 0 pages dirty
[31931.080076] 0 pages writeback
[31931.080077] 11 pages mapped
[31931.080079] 5748 pages slab
[31931.080080] 993 pages pagetables
[31931.080092] Out of memory: kill process 5473 (S25mdadm-raid) score 7853 or a child
[31931.080109] Killed process 5479 (mdadm)

md1 is a RAID array I recently added to the system. I did not update mdadm.conf or do anything other than create the array with:
mdadm --create /dev/md1 --level=raid1 --raid-devices=2 /dev/sd[bd]1

Here's what mdstat looked like:
cat /proc/mdstat
Personalities : [raid1]
md1 : inactive sdb1[0](S)
244195904 blocks

md0 : active raid1 sda1[0] sdc1[1]
312568576 blocks [2/2] [UU]

unused devices: <none>

I first tried:
mdadm --assemble --scan
but, since I hadn't yet stopped md1, that just produced more of those repetitive errors:
md: array md1 already has disks!

So I stopped md1 and started it again with:
mdadm --assemble --scan
and it started but with only one member, sdd1.
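
Roughly, the stop/reassemble sequence was the following (a sketch reconstructed from the description above; the exact --stop invocation is an approximation):

# stop the half-assembled array, then let mdadm re-scan for members
mdadm --stop /dev/md1
mdadm --assemble --scan
cat /proc/mdstat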

After a bit more troubleshooting (successfully creating and running a new md2 with sdb1), I stopped md1 (and md2), restarted md1 with --assemble --scan, and added sdb1 with mdadm --add /dev/md1 /dev/sdb1.

The array rebuilt successfully, but I'm hesitant to reboot.
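
To watch the rebuild before deciding whether to reboot, the standard checks are enough (nothing here is specific to this bug):

# resync progress
cat /proc/mdstat
# member state for the affected array
mdadm --detail /dev/md1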

In addition, the filesystem on md1 was xfs (not sure if that's important).

I have since learned about this:
/usr/share/mdadm/mkconf >/etc/mdadm/mdadm.conf
update-initramfs -k all -u

I have done this, and it may make the next boot uneventful.
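
As a sanity check before rebooting, the regenerated config can be compared against the running arrays (paths are the Ubuntu defaults):

# the ARRAY lines should now include both md0 and md1
grep '^ARRAY' /etc/mdadm/mdadm.conf
# what the running arrays report, in the same format
mdadm --detail --scan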

I can't exclude some sort of hardware failure as a result of the power failure, but I thought I'd post this in case it makes sense to someone.

Brian (brian2004) wrote :

Followup: I restarted the server without reproducing this error. The array had rebuilt successfully; I don't know what would happen if the array had been degraded. There could be something here, but I can't reproduce it.

netslayer (netslayer007) wrote :

I think I figured out why we both have this bug; please read
https://bugs.launchpad.net/ubuntu/+bug/140854

Giuseppe Dia (giusedia) wrote :

I have this sort of bug too.
https://bugs.launchpad.net/ubuntu/+bug/140854 wasn't useful, and I still get a lot of "md: array md1 already has disks!" messages.
If you need more detail, I can paste the relevant output; just tell me what you need.

Alan Jenkins (aj504) wrote :

I have a similar problem. I usually can't boot, and I get these floods of "already has disks" messages. I've found it goes away if I explicitly describe my array in mdadm.conf (listing the devices in the array) and rebuild the initramfs.
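
Concretely, the entry is of roughly this shape (the UUID and device names below are placeholders, not my real ones), followed by an initramfs rebuild:

# in /etc/mdadm/mdadm.conf:
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=00000000:00000000:00000000:00000000 devices=/dev/sdb1,/dev/sdd1
# then rebuild the initramfs:
update-initramfs -u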

I'm intrigued by this OOM. I get something that looks similar if I boot with mem=128M. I get several OOMs until the kernel runs out of processes to kill (init is last) and then panics.

Maybe limiting the system to 128M just puts it under too much stress, but I wonder whether the loop that generates this message and consumes too much memory is inside the kernel.

Alan Jenkins (aj504) wrote :

Ah no, it is mdadm that's buggy.

The "array md1 already has disks!" infinite loop is fixed in the latest upstream version (2.6.7). If you have git access, have a look at commit "mdadm-2.6.6-1-g1c203a4". The log says it fixes "autoassemble for stacked arrays", but it looks like this bug also affects non-stacked arrays in some circumstances.

At least on Hardy Heron, Ubuntu doesn't patch mdadm at all, so one should be able to just install mdadm from source. Obviously, you need to take great care if you use MD for your root filesystem.

There are also a worrying number of "fix segfault" commits. Personally, I think I'll be much happier after manually upgrading mdadm to version 2.6.7.
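
For anyone else trying this, the upstream build is just a plain Makefile, roughly as below (the tarball name and its location on kernel.org are from memory, so double-check them):

# fetch the 2.6.7 tarball from the mdadm area on kernel.org, then:
tar xzf mdadm-2.6.7.tar.gz
cd mdadm-2.6.7
make
# installs over the packaged binary; take extra care if / lives on MD
sudo make install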

Daniel T Chen (crimsun)
Changed in mdadm:
status: New → Fix Released