[->UUIDudev] initramfs/mdrun doesn't honor preferred minor when starting RAID volumes

Bug #49914 reported by Jeff Balderson
Affects: mdadm (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

On a Sun e420 during the install, I created my volumes as follows:

/dev/sda1 /boot
/dev/md0 /home (/dev/sd[ab]8)
/dev/md1 / (/dev/sd[ab]2)
/dev/md2 /usr (/dev/sd[ab]4)
/dev/md3 swap (/dev/sd[ab]5)
/dev/md4 /tmp (/dev/sd[ab]6)
/dev/md5 /var (/dev/sd[ab]7)

and completed the install. Upon reboot, my RAID volumes were started as:

/dev/sda1 /boot
/dev/md0 /
/dev/md1 /usr
/dev/md2 swap
/dev/md3 /tmp
/dev/md4 /var
/dev/md5 /home

apparently started in order of discovery (/dev/sda1 through /dev/sda8), not honoring the preferred minor or /etc/mdadm.conf, and rendering my system unbootable until I did some surgery.

After the surgery, I patched up to date (including kernel 2.6.15-25) and rebooted without incident.

At this point, the raid volumes are:

/dev/md0 /boot (/dev/sd[ab]1)
/dev/md1 / (/dev/sd[ab]2)
/dev/md2 /usr (/dev/sd[ab]4)
/dev/md3 swap (/dev/sd[ab]5)
/dev/md4 /tmp (/dev/sd[ab]6)
/dev/md5 /var (/dev/sd[ab]7)
/dev/md6 /home (/dev/sd[ab]8)

I then created two RAID volumes:

mdadm -C -l5 -c64 -n6 -x0 /dev/md11 /dev/sd[cdefgh]
mdadm -C -l5 -c64 -n6 -x0 /dev/md12 /dev/sd[ijklmn]

As you can see, my RAID volume components do have the preferred minor listed correctly prior to rebooting (the array is still building here):

root@vali:~# mdadm -E /dev/sdc (d-h report similar)
/dev/sdc:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 71063e4f:f3a0c78b:12a4584b:a8cd9402
  Creation Time : Thu Jun 15 15:21:36 2006
     Raid Level : raid5
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 11

    Update Time : Thu Jun 15 15:28:42 2006
          State : clean
 Active Devices : 5
Working Devices : 6
 Failed Devices : 1
  Spare Devices : 1
       Checksum : 54875fd8 - correct
         Events : 0.48

         Layout : left-symmetric
     Chunk Size : 64K

      Number Major Minor RaidDevice State
this 0 8 32 0 active sync /dev/sdc

   0 0 8 32 0 active sync /dev/sdc
   1 1 8 48 1 active sync /dev/sdd
   2 2 8 64 2 active sync /dev/sde
   3 3 8 80 3 active sync /dev/sdf
   4 4 8 96 4 active sync /dev/sdg
   5 5 0 0 5 faulty removed
   6 6 8 112 6 spare /dev/sdh

My mdadm.conf is set correctly:

root@vali:~# cat /etc/mdadm/mdadm.conf
DEVICE partitions
DEVICE /dev/sd[cdefghijklmn]
ARRAY /dev/md11 level=raid5 num-devices=6 UUID=71063e4f:f3a0c78b:12a4584b:a8cd9402
ARRAY /dev/md12 level=raid5 num-devices=6 UUID=456e8cd0:0f23591b:14a0ff9f:1a302d54
ARRAY /dev/md6 level=raid1 num-devices=2 UUID=4b33d5c5:80846d59:dba11e6d:814823f3
ARRAY /dev/md5 level=raid1 num-devices=2 UUID=76f34ac9:d74a2d9c:d0fc0f95:eab326d2
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=0eed0b47:c6e81eea:3ed1c7a6:3ed2a756
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=1d626217:4d20944a:5dbbcb0d:dd7c6e3d
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=102303be:19a3252d:48a3f79e:33f16ce1
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=30eedd12:b5b69786:97b18df5:7efabcbf
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=9b28e9d5:944316d7:f0aacc8b:d5d82b98

And yet when I reboot /dev/md11 is started as /dev/md7 and /dev/md12 is started as /dev/md8.

root@vali:~# cat /proc/mdstat
Personalities : [raid1] [raid5]
md8 : active raid5 sdi[0] sdn[6] sdm[4] sdl[3] sdk[2] sdj[1]
      177832000 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUUUU_]
      [>....................] recovery = 0.0% (28416/35566400) finish=561.1min speed=10520K/sec

md7 : active raid5 sdc[0] sdh[6] sdg[4] sdf[3] sde[2] sdd[1]
      177832000 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUUUU_]
      [>....................] recovery = 0.0% (29184/35566400) finish=567.9min speed=10420K/sec

md6 : active raid1 sda8[0] sdb8[1]
      6474112 blocks [2/2] [UU]

md5 : active raid1 sda7[0] sdb7[1]
      14651200 blocks [2/2] [UU]

md4 : active raid1 sda6[0] sdb6[1]
      995904 blocks [2/2] [UU]

md3 : active raid1 sda5[0] sdb5[1]
      7815552 blocks [2/2] [UU]

md2 : active raid1 sda4[0] sdb4[1]
      4996096 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      497920 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      120384 blocks [2/2] [UU]

unused devices: <none>

You'll notice that the preferred minor is set correctly:

root@vali:~# mdadm -E /dev/sdc
/dev/sdc:
          Magic : a92b4efc
        Version : 00.90.03
           UUID : 71063e4f:f3a0c78b:12a4584b:a8cd9402
  Creation Time : Thu Jun 15 15:21:36 2006
     Raid Level : raid5
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 11

    Update Time : Thu Jun 15 15:28:42 2006
          State : clean
 Active Devices : 5
Working Devices : 6
 Failed Devices : 1
  Spare Devices : 1
       Checksum : 54875fd8 - correct
         Events : 0.48

         Layout : left-symmetric
     Chunk Size : 64K

      Number Major Minor RaidDevice State
this 0 8 32 0 active sync /dev/sdc

   0 0 8 32 0 active sync /dev/sdc
   1 1 8 48 1 active sync /dev/sdd
   2 2 8 64 2 active sync /dev/sde
   3 3 8 80 3 active sync /dev/sdf
   4 4 8 96 4 active sync /dev/sdg
   5 5 0 0 5 faulty removed
   6 6 8 112 6 spare /dev/sdh

The preferred minor is available in the initramfs, so there's no reason it shouldn't be used when restarting the arrays (for i in /dev/hd* /dev/sd*; do mdadm -E $i; done, etc.).
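
To sketch what I mean (this is only an illustration of the idea, not the actual initramfs code; the parsing of the "UUID" and "Preferred Minor" lines matches the mdadm -E output above, and /tmp/md-map is just a scratch file I made up):

# first pass: record each member's array UUID and the preferred minor it carries
> /tmp/md-map
for dev in /dev/hd* /dev/sd*; do
    [ -b "$dev" ] || continue
    info=$(mdadm -E "$dev" 2>/dev/null) || continue
    minor=$(echo "$info" | awk '/Preferred Minor/ {print $4}')
    uuid=$(echo "$info" | awk '/ UUID / {print $3}')
    echo "$minor $uuid $dev" >> /tmp/md-map
done
# second pass: one assemble per array, under the node named by its preferred minor
awk '{print $1, $2}' /tmp/md-map | sort -u | while read minor uuid; do
    members=$(awk -v u="$uuid" '$2 == u {print $3}' /tmp/md-map)
    mdadm --assemble "/dev/md$minor" --uuid="$uuid" $members
done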

I'm fairly certain that I've done similar non-linear md layouts in the past on Ubuntu (Hoary/Breezy), and they worked without problems, although I can't say that I've done any this way in Dapper.

And before someone reads this and says I'm asking for trouble: it *is* possible to safely do a RAID1 /boot on Sparc, but that's a separate issue.

Revision history for this message
Jeff Balderson (jbalders) wrote :

This may be an mdadm/mdrun problem, but I haven't had a chance to investigate whether it's the local-top/md script that's causing it, or mdrun itself.

Revision history for this message
Oliver Brakmann (obrakmann) wrote :

Just for the heck of it, you might want to try my patched md script and see if it works better for you; see bug #48756. Please note that in the attached script there's a typo on the last mdadm line: it really should read '--assemble' where it says '-assemble'. Also, the output doesn't get redirected, but that's not critical.
It'd be nice to hear if it works out for you :-)

Revision history for this message
gmlion (gm-l) wrote :

How is the situation in Ubuntu 6.10? Is the bug still present?

Changed in mdadm:
assignee: nobody → gm-l
status: Unconfirmed → Needs Info
Revision history for this message
Jeff Balderson (jbalders) wrote :

Yes, the bug is still present in Edgy. I haven't tried upgrading to Feisty yet.

I created a raid array:

mdadm --create -l5 -n 14 /dev/md10 /dev/sd[bcdfghijklqrst]1

Added it to /etc/mdadm/mdadm.conf:

root@sunflower:/# cat /etc/mdadm/mdadm.conf
DEVICE partitions
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=75fa8aad:ebad7171:fab782c2:128dd001
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=4962386b:4220a6b0:52362aed:e8177a98
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=211c8f5b:d8854afd:33626b06:0fce6c77
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=a389ea68:820de8ac:50f8d391:6a1e5bfc
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=4d4021dc:5a1d718e:c72beddc:a40274b6
ARRAY /dev/md5 level=raid1 num-devices=2 UUID=aaeb5e70:58768c4f:b218283e:86a35d4c
ARRAY /dev/md10 level=raid5 num-devices=14 UUID=b68f200f:c99aa697:ec26162e:580f6e07

Rebuilt my initramfs (not sure if this is actually necessary or not):
update-initramfs -k all -u
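
(In case it helps anyone reproducing this: one way to check that mdadm.conf actually landed in the image, assuming the initramfs is a gzip'd cpio archive as on Dapper/Edgy, is something like the following.)

zcat /boot/initrd.img-$(uname -r) | cpio -t 2>/dev/null | grep mdadm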

Added it to /etc/fstab:

root@sunflower:/# fgrep export /etc/fstab
UUID=378b2a0d-66f6-4228-9772-6edc387b2dad /export ext3 defaults 0 2

But it still gets started as /dev/md6:
root@sunflower:/# cat /proc/mdstat
Personalities : [raid1] [raid5] [raid4]
md6 : active raid5 sdb1[0] sdt1[13] sds1[12] sdr1[11] sdq1[10] sdl1[9] sdk1[8] sdj1[7] sdi1[6] sdh1[5] sdg1[4] sdf1[3] sdd1[2] sdc1[1]
      462172672 blocks level 5, 64k chunk, algorithm 2 [14/14] [UUUUUUUUUUUUUU]
... (md0-md5 pruned for brevity)

While everything DOES get started and mounted correctly using the UUIDs, if I set a RAID volume to start up as /dev/md10, I believe it should stay at /dev/md10 after rebooting, even if the UUIDs are used to assemble it.
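
(To illustrate what I mean: assembling by hand with the UUID and member list from above does pin the name. This is a sketch only, since /export would have to be unmounted and the misnamed array stopped first.)

umount /export
mdadm --stop /dev/md6
mdadm --assemble /dev/md10 --uuid=b68f200f:c99aa697:ec26162e:580f6e07 /dev/sd[bcdfghijklqrst]1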

gmlion (gm-l)
Changed in mdadm:
status: Needs Info → Confirmed
assignee: gm-l → nobody
Revision history for this message
ceg (ceg) wrote :

You can't rely on device numbering, and it is a seriously flawed design for mdadm to set up arrays according to unreliable superblock information (device "minor" numbers, labels, hostnames). Trying to fix that unreliability by limiting array assembly through mdadm.conf (PARTITIONS, ARRAY, HOMEHOST lines) makes it even worse: setup tools and admins are now forced to create mdadm.conf files, which leads to wrongly, partly, or never assembled arrays.

The only thing mdadm can and should rely on when assembling is the (highly probable) uniqueness of UUIDs, not on admins, tools, or install scripts setting up mdadm.conf.
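
(Purely as an illustration of what UUID/udev-based assembly means, not the actual Ubuntu rule: a udev rule along these lines hands every newly appearing RAID member to mdadm's incremental mode, which matches members by their superblock UUID rather than by a config file. It assumes udev has already imported ID_FS_TYPE via blkid.)

SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"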

Does someone have the permissions to set this to Won't Fix in order to unclutter the bug list? (Other mdadm bugs cover the root cause and the fix of using UUIDs.)

ceg (ceg)
summary: - initramfs/mdrun doesn't honor preferred minor when starting RAID volumes
+ [->UUIDudev] initramfs/mdrun doesn't honor preferred minor when starting
+ RAID volumes
Revision history for this message
Phillip Susi (psusi) wrote :

As long as you have a properly configured mdadm.conf and an up-to-date initramfs, the minor should be honored. Is this still affecting anyone these days?
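
(For anyone landing here, a minimal sketch of that setup; appending to the existing config is illustrative only, so back it up and prune duplicate ARRAY lines first.)

mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u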

Changed in mdadm (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for mdadm (Ubuntu) because there has been no activity for 60 days.]

Changed in mdadm (Ubuntu):
status: Incomplete → Expired