Ubuntu
mdadm package

long bootup, dmesg full of md: array md1 already has disks!

Bug #139802 reported by Brian on 2007-09-15

Affects		Status	Importance	Assigned to	Milestone
	mdadm (Ubuntu)	Fix Released	Undecided	Unassigned

Bug Description

Running a Fiesty Server install.

About a week ago I updated my Ubuntu server (which I believe installed a new kernel) but hadn't yet rebooted. Yesterday we had a power failure and it took ages for my server to startup. I left it overnight so I don't know how long it took.

There wasn't much on the screen, but there was alot of disk activity and this morning it was up and running, except for my most recently added RAID-1 array.

I checked DMESG and found this nasty oom-killer because of mdadm.

[2000 lines of md: array md1 already has disks!]
[31746.101310] md: array md1 already has disks!
[31832.551736] md: array md1 already has disks!
[31919.273831] md: array md1 already has disks!
[31931.074529] mdadm invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
[31931.074556] [<c015ae75>] out_of_memory+0x175/0x1b0
[31931.074570] [<c015c84c>] __alloc_pages+0x2bc/0x310
[31931.074580] [<c015dfcc>] __do_page_cache_readahead+0x10c/0x250
[31931.074585] [<c01e45ce>] blk_remove_plug+0x2e/0x70
[31931.074593] [<c02f4442>] io_schedule+0x22/0x30
[31931.074600] [<c02f471b>] __wait_on_bit_lock+0x5b/0x70
[31931.074604] [<c0157650>] sync_page+0x0/0x40
[31931.074612] [<c0157633>] __lock_page+0x73/0x80
[31931.074619] [<c015a094>] filemap_nopage+0x2a4/0x380
[31931.074626] [<c0164bb4>] __handle_mm_fault+0x204/0xe50
[31931.074644] [<c0176345>] nameidata_to_filp+0x35/0x40
[31931.074653] [<c017639b>] do_filp_open+0x4b/0x60
[31931.074662] [<c02f7415>] do_page_fault+0x125/0x6b0
[31931.074670] [<c017646c>] do_sys_open+0xbc/0xe0
[31931.074676] [<c02f72f0>] do_page_fault+0x0/0x6b0
[31931.074679] [<c02f5ad4>] error_code+0x7c/0x84
[31931.074689] =======================
[31931.074690] Mem-info:
[31931.074692] DMA per-cpu:
[31931.074695] CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
[31931.074697] Normal per-cpu:
[31931.074699] CPU 0: Hot: hi: 186, btch: 31 usd: 170 Cold: hi: 62, btch: 15 usd: 14
[31931.074704] Active:59839 inactive:59901 dirty:0 writeback:0 unstable:0 free:1204 slab:5748 mapped:11 pagetables:993
[31931.074708] DMA free:2056kB min:88kB low:108kB high:132kB active:5244kB inactive:5260kB present:16256kB pages_scanned:17660 all_unreclaimable? yes
[31931.074711] lowmem_reserve[]: 0 492 492
[31931.074716] Normal free:2760kB min:2792kB low:3488kB high:4188kB active:234112kB inactive:234344kB present:503876kB pages_scanned:702548 all_unreclaimable
? yes
[31931.074719] lowmem_reserve[]: 0 0 0
[31931.074722] DMA: 0*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2056kB
[31931.074729] Normal: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2760kB
[31931.074737] Swap cache: add 377699, delete 377699, find 27/48, race 0+0
[31931.074739] Free swap = 0kB
[31931.074740] Total swap = 1510068kB
[31931.074742] Free swap: 0kB
[31931.080067] 131056 pages of RAM
[31931.080069] 0 pages of HIGHMEM
[31931.080071] 2100 reserved pages
[31931.080072] 28 pages shared
[31931.080073] 0 pages swap cached
[31931.080075] 0 pages dirty
[31931.080076] 0 pages writeback
[31931.080077] 11 pages mapped
[31931.080079] 5748 pages slab
[31931.080080] 993 pages pagetables
[31931.080092] Out of memory: kill process 5473 (S25mdadm-raid) score 7853 or a child
[31931.080109] Killed process 5479 (mdadm)

md1 is a raid array I recently added to the system. I did not update mdadm.conf or do anything other than create the array with a mdadm --create /dev/md1 --level=raid1 --raid-devices=2 /dev/sd[bd]1

Here's what mdstat looked like:
cat /proc/mdstat
Personalities : [raid1]
md1 : inactive sdb1[0](S)
244195904 blocks

md0 : active raid1 sda1[0] sdc1[1]
312568576 blocks [2/2] [UU]

unused devices: <none>

I first tried:
mdadm --assemble --scan but that just produced more of those repetitive errors:
md: array md1 already has disks!

as I hadn't yet stopped md1.

So I stopped md1 and started it again with:
mdadm --assemble --scan
and it started but with only one member, sdd1.

after a bit more trouble shooting (starting and running a new md2 with sdb1 successfully) I stopped md1 (and md2) restarted md1 with --assemble --scan and added sdb1 with mdadm --add /dev/md1 /dev/sdb1.

The array successfully rebuilt but I'm hesitant to reboot.

In addition, the filesystem on md1 was xfs (not sure if that's important).

I have since learned about this:
/usr/share/mdadm/mkconf >/etc/mdadm/mdadm.conf
update-initramfs -k all -u

which I have done and this may make the next boot uneventful.

I can't exclude some sort of hardware failure as a result of the power failure, but I thought I'd post this incase it makes sense to someone.