If an array is given a name, a strange inactive md device appears upon reboot instead of the one created

Bug #576147 reported by Ypthor
This bug affects 8 people
Affects: mdadm (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: mdadm

I installed Ubuntu 10.04 on a software RAID1 array (throughout this description I'm talking about software RAID) with the alternate installer. It boots, it works, all's well.

Now I wanted to create some new RAID arrays, and I experienced the following:

If I create an array with the 1.2 superblock AND give it a name:
"# mdadm --create /dev/md1 --metadata=1.2 --name=TEST --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1" -> the array is created.
"# mdadm -Es | grep md1 >> /etc/mdadm/mdadm.conf" -> the array gets the name "TEST" and the device name "/dev/md/TEST", and upon reboot it doesn't start.
Instead, an inactive array called md_d127 appears (further arrays are named md_d126, md_d125), grabbing one of the component devices in /proc/mdstat.
Trying to assemble the created array with "# mdadm --assemble --name=TEST" (or --assemble /dev/md/TEST) results in a degraded array with only the other component in it.

Stopping the md_d127 array and then assembling the TEST array usually works, but last time I tried it failed.
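For reference, the recovery sequence I mean is roughly the following (TEST and md_d127 being the names from my setup, and assuming the ARRAY line is already in mdadm.conf):

  mdadm --stop /dev/md_d127
  mdadm --assemble /dev/md/TEST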

The following device files are present in /dev (md0 has the system on it):

# ls /dev/md*
/dev/md0 /dev/md_TEST1 /dev/md_TEST3
/dev/md_d127 /dev/md_TEST2 /dev/md_TEST4

/dev/md:
TEST TEST1 TEST2 TEST3 TEST4

If I create a default RAID1 (with a v0.9 superblock; I've tried naming that one too, but I guess naming isn't available there), or one with a 1.2 superblock but no name given, they all get assembled upon reboot.
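For comparison, the variants that do come up fine after reboot are created roughly like this (md2/md3 and sdX1/sdY1 are placeholders, not my actual devices):

  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdX1 /dev/sdY1
  mdadm --create /dev/md3 --metadata=1.2 --level=1 --raid-devices=2 /dev/sdX1 /dev/sdY1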

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: mdadm 2.6.7.1-1ubuntu15
ProcVersionSignature: Ubuntu 2.6.32-21.32-generic 2.6.32.11+drm33.2
Uname: Linux 2.6.32-21-generic x86_64
Architecture: amd64
Date: Thu May 6 01:45:07 2010
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100427.1)
MDadmExamine.dev.sda: Error: command ['/sbin/mdadm', '-E', '/dev/sda'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda.
MDadmExamine.dev.sda3: Error: command ['/sbin/mdadm', '-E', '/dev/sda3'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda3.
MDadmExamine.dev.sda4: Error: command ['/sbin/mdadm', '-E', '/dev/sda4'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda4.
MDadmExamine.dev.sdb: Error: command ['/sbin/mdadm', '-E', '/dev/sdb'] failed with exit code 1: mdadm: No md superblock detected on /dev/sdb.
MDadmExamine.dev.sdb3: Error: command ['/sbin/mdadm', '-E', '/dev/sdb3'] failed with exit code 1: mdadm: No md superblock detected on /dev/sdb3.
MDadmExamine.dev.sdb4: Error: command ['/sbin/mdadm', '-E', '/dev/sdb4'] failed with exit code 1: mdadm: No md superblock detected on /dev/sdb4.
MachineType: Gigabyte Technology Co., Ltd. GA-MA78GPM-DS2H
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-21-generic root=UUID=88e1b6ce-dd96-4f24-b899-48bca4256fc7 ro quiet splash
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: mdadm
dmi.bios.date: 04/14/2009
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F4
dmi.board.name: GA-MA78GPM-DS2H
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF4:bd04/14/2009:svnGigabyteTechnologyCo.,Ltd.:pnGA-MA78GPM-DS2H:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnGA-MA78GPM-DS2H:rvr:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: GA-MA78GPM-DS2H
dmi.sys.vendor: Gigabyte Technology Co., Ltd.
etc.blkid.tab: Error: [Errno 2] No such file or directory: '/etc/blkid.tab'

Revision history for this message
M. O. (marcusoverhagen) wrote :

I also see this /dev/md_d127 array after reboot. It usually consists of 1 to 3 devices that belong to the real RAID array.

I usually run

mdadm --stop /dev/md_d127
mdadm --assemble --no-degraded /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1

to get the real array running. Annoying!

This md_d127 gets created even if /etc/mdadm/mdadm.conf is empty.
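One thing that might be worth checking (just a guess on my part) is whether the initramfs still carries an old copy of the config:

  zcat /boot/initrd.img-$(uname -r) | cpio -it | grep mdadm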

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdb1[0] sdc1[8] sde1[9] sdf1[5] sdg1[4] sdh1[3] sdi1[10] sda1[1]
      13674582656 blocks super 1.2 level 5, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]
      bitmap: 2/466 pages [8KB], 2048KB chunk

# mdadm -Es
ARRAY /dev/md/Apophis level=raid5 metadata=1.2 num-devices=8 UUID=2a54ea57:ce057f40:128e12a9:354e7dea name=:Apophis

Revision history for this message
ceg (ceg) wrote :

Might this be Bug #469574 or Bug #532960? They have workarounds.

Revision history for this message
Ypthor (ypthor) wrote :

Similar, but not the same.

In my case, an error occurs IF the array has been named, and I don't see this mentioned in those two bugs.

I've tried using the exact same array, with the only difference being whether it is named or not: if it's named it fails, but when it's not, it works flawlessly. I've been using it ever since without error.

Could you please explain what the workaround for those two bugs does?

description: updated
Revision history for this message
ceg (ceg) wrote :

Bug #469574: The workaround generates an mdadm.conf describing the active arrays and updates the initramfs for all kernels, so they will contain the new file.
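Concretely, the two steps of that workaround are roughly (run as root, with the arrays already assembled):

  /usr/share/mdadm/mkconf force-generate /etc/mdadm/mdadm.conf
  update-initramfs -k all -u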

Revision history for this message
Ypthor (ypthor) wrote :

I suppose the lines generated by

#mkconf force-generate

would be the same as those generated by

#mdadm -Es

so the difference would be the updated initramfs, right?

If so, then updating it with the conf file I created should also work.
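A quick way to compare the two would presumably be something like this (assuming mkconf prints to stdout when no output file is given; I haven't checked):

  /usr/share/mdadm/mkconf > /tmp/mkconf.out
  mdadm -Es > /tmp/es.out
  diff /tmp/mkconf.out /tmp/es.out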

I'll try these when I find some drives, as the ones I used before are in use.

Do you know of a way to create/modify the name of an already created array?

BTW, does Launchpad have markup in posts?
I'll try below.
<code>#this -is /code</code>
[code]#so -is /this[/code]

Revision history for this message
Minoc (wolenetz) wrote :

I can verify the bug described above, and that the workaround of Bug #469574 does NOT work.

My system is 10.04 LTS on an AMD 86x64 box:
Linux logopolis 2.6.32-22-generic #36-Ubuntu SMP Thu Jun 3 19:31:57 UTC 2010 x86_64 GNU/Linux

Steps to reproduce:
* Initialized 8 new 1.5T drive partitions: each drive got primary partition 3 configured as type fd, using the entire 1.5T of space.

* Verified all working drives via smartctl tests

* Created a new raid w/ the command:
sudo mdadm --create /dev/md0 --level=6 --verbose -c 128 --name=raid_disk_2 --raid-devices=8 --metadata 1.2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sdg3 /dev/sdh3 /dev/sdi3 /dev/sdj3

* Created an ext4 fs on the raid, and added about 2.7T of data to the raid (test data...)

* Added the ARRAY to /etc/mdadm/mdadm.conf manually by adding the line returned by mdadm -Es:
ARRAY /dev/md/raid_disk_2 level=raid6 metadata=1.2 num-devices=8 UUID=b5402c2f:02bd34b0:fe8277f3:2925a050 name=logopolis:raid_disk_2

* Manually started and stopped the raid

* Rebooted, and got the phantom md_d127 raid.

* Verified that multiple reboots consistently cause one or more of the 8 drive partitions (seemingly at random) to be added to this md_d127 fake raid.

* Verified that after stopping md_d127, mdadm -As starts the proper raid.

* Verified that the workaround from Bug #469574 does NOT fix the problem, as this did not help:
# start your arrays manually
# /usr/share/mdadm/mkconf force-generate /etc/mdadm/mdadm.conf
# update-initramfs -k all -u

* Finally, verified that manually removing the name part of the ARRAY definition in mdadm.conf does not help either, even after re-running update-initramfs -k all -u

* The only fix I know works is to add an init script which stops the phantom raid and then assembles the correct one (rough sketch below).
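A minimal sketch of such a script, assuming the phantom array is always md_d127 and that mdadm.conf lists the real array (the script name is made up, and a real init script would also need LSB headers and error handling):

  #!/bin/sh
  # stop-phantom-md: stop the bogus array, then assemble the configured arrays
  mdadm --stop /dev/md_d127 2>/dev/null
  mdadm --assemble --scan --no-degraded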

Interesting sidenote: in researching this problem, I found that the Fedora people might have seen a similar sort of problem; see http://osdir.com/ml/linux-raid/2009-04/msg00358.html. However, I have not verified this; it may be a red herring.

Revision history for this message
Minoc (wolenetz) wrote :

Addendum: I am on a 64-bit Intel box; I accidentally stated above that I was on an AMD box.

Revision history for this message
Minoc (wolenetz) wrote :

Possible workaround (or a way to undo the name assignment on the raid without losing data).

If I reset the name (from raid_disk_2), it now auto-assembles properly on reboot.

Steps I used to undo the --name setting from when I built the raid, without data loss or resync:

* unmounted any mounted fs on the raid, and stopped the raid
* changed the /dev/md/raid_disk_2 entry in mdadm.conf to /dev/md5, and removed the name= part
* reset the super-minor: mdadm -A -U super-minor /dev/md5
* stop the raid again: mdadm --stop /dev/md5
* reset the name: mdadm -A -U name /dev/md5
* reset the mdadm.conf: /usr/share/mdadm/mkconf force-generate /etc/mdadm/mdadm.conf
* reset the various initramfs images: update-initramfs -k all -u

Now /dev/md5 is fully assembled upon boot.
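To double-check that the name really is gone from the superblocks, something like the following should do (sda3 is just one of the member partitions from my create command above):

  mdadm --detail /dev/md5 | grep -i name
  mdadm --examine /dev/sda3 | grep -i name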

Conclusion: I will not mess with the name field when creating raids for now. It may be due to my misunderstanding of the purpose of the name field (apparently it only works if it is a number, the same as the super-minor), or there is a bug somewhere.

Revision history for this message
Brian Quinion (n-launchpad-brian-quinion-co-uk) wrote :

I've had this problem also. It seems to relate to a broken MD definition line created by mkconf.

  sudo /usr/share/mdadm/mkconf force-generate /etc/mdadm/mdadm.conf

creates a line that looks like this:

  ARRAY /dev/md/array24 level=raid1 metadata=1.2 num-devices=2 UUID=a5ce2c19:4ab6c886:d226463a:f01fec16 name=:array24

while

  sudo mdadm --detail --scan

generates:

  ARRAY /dev/md0 level=raid1 num-devices=2 metadata=01.02 name=:array24 UUID=a5ce2c19:4ab6c886:d226463a:f01fec16

If I replace the line in /etc/mdadm/mdadm.conf with the above and then run

  sudo update-initramfs -k all -u

the problem is resolved.
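In other words, the sequence that worked for me was roughly (only the ARRAY line is swapped; the rest of mdadm.conf stays as mkconf generated it):

  sudo mdadm --detail --scan
  (copy the resulting ARRAY line over the broken one in /etc/mdadm/mdadm.conf)
  sudo update-initramfs -k all -u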

Revision history for this message
Oliver Breuer (oliver-breuer) wrote :

Same problem here with a strange /dev/md_d127 (AMD64, Ubuntu 10.10). I also used a name ("raid1"). It's of type raid1 with two devices. The md device is partitioned. Metadata version is 1.2.

Unsetting the name as described by Marcus Overhagen solved the problem. Actually, the name was not removed but automatically replaced with "64". Not all of the steps described there were necessary: only the lines with "... --update=name ...", the ".../mkconf..." and the "update-initramfs..." were needed (I executed the first --update=name command in the emergency shell while booting the system, see below).

But there is something else wrong, which might be the real cause. The following steps were done before resetting the name:

After deactivating (commenting out) the lines in /lib/udev/rules.d/85-mdadm.rules (this prevents the array from being automatically incrementally assembled at boot time), the boot stopped and threw me into a shell (the root filesystem resides on the raid).

In the shell I tried to reproduce the commands that are normally executed automatically by udev (via 85-mdadm.rules):
mdadm --incremental /dev/sda1
mdadm --incremental /dev/sdb1

The first command succeeded. The second one didn't; it failed with a message similar to (I cannot remember the exact name of the file): "cannot create /dev/md/d1, file exists"

It looks like the mdadm --incremental command tries to create the device file twice, which of course doesn't succeed the second time. This leaves a half-built array lying around.

After resetting the name, the --incremental command works as expected.
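For completeness, verifying this amounts to repeating the incremental assembly by hand and looking at the result (sda1/sdb1 as above):

  mdadm --incremental /dev/sda1
  mdadm --incremental /dev/sdb1
  cat /proc/mdstat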

The bug is quite severe, I think, because it can lead to an unbootable system (at least without manual intervention). Probably the easiest way to mitigate the problem would be a big warning in the mdadm man page that non-numeric names can lead to problems (and probably there are even more restrictions, such as the allowed number range).

Changed in mdadm (Ubuntu):
status: New → Confirmed