Ubuntu

[->UUIDudev] mdadm.conf w/o ARRAY lines but udev/mdadm not assembling arrays. (boot & hotplug fails)

Reported by Christian Roessner on 2007-08-30
This bug affects 22 people
Affects: mdadm (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

Binary package hint: mdadm

Hi,

I could not boot from my /dev/md1 -> /dev/vg01/lv_root partition, because the initrd could not assemble the md0 and md1 devices. After reading mdadm.conf, I added these two lines to my config:

# definitions of existing MD arrays
ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md1 devices=/dev/sda5,/dev/sdb5

and updated the initramfs. After this, the system started normally. Before figuring this out, I had used dpkg-reconfigure mdadm. So that seems to produce a non-working config, which is copied into the initrd.

Kind regards

Christian

-> However, on hotplug systems mdadm.conf should not have to contain any references to specific DEVICEs, ARRAYs or a HOMEHOST.

-> We need to switch to a pure and secure UUID-based raid assembly. The boot up scripts already support and use UUIDs, but mdadm needs to be convinced (-> comment #42)

I had the same problem when converting from one disk to a RAID-1 setup.

I added a break=mount and tried to start the arrays with "mdadm -A -s", but that didn't work.

The default configuration has a line "DEVICE partitions" in /etc/mdadm/mdadm.conf. This apparently doesn't work.

(All my partitions are listed in /proc/partitions and the major/minor numbers match up to those in /dev)

Adding the ARRAY lines to mdadm.conf and regenerating the initrd image worked for me.

I'm using gutsy 7.10 x86_64 with Linux image 2.6.22-14-generic (all updates installed).

Subscribing to bug

I had a similar problem when attempting to move my root filesystem onto a RAID-1 configuration.

mdadm wouldn't assemble the array while the initramfs was mounted, but would when the normal root filesystem was mounted.

My mdadm.conf looks to be similar to those described, with a "DEVICE partitions" line.

However, after a bit of research, I think that my problem was with the "HOMEHOST <system>" line.

mdadm uses HOMEHOST to determine which partitions to automatically assemble into arrays. When HOMEHOST is set to '<system>' mdadm uses the hostname of the system.

Unfortunately, the initramfs environment doesn't set a hostname (i.e. the hostname inside the initramfs is '(none)'), which doesn't match the homehost recorded on the array's members, so mdadm does not automatically assemble any arrays.
(The first 64 bits of the SHA1 hash of the homehost appear to be the last 64 bits of the array's UUID.)
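That relationship is easy to eyeball from a shell. This is purely an illustration of the hashing idea described above; whether mdadm computes it exactly this way is the commenter's conjecture, and "myhostname" is a placeholder:

```shell
# Print the first 64 bits (16 hex digits) of SHA1 over a hostname
# string; compare against the tail of the UUID shown by `mdadm -E`.
printf '%s' "myhostname" | sha1sum | cut -c1-16
```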

Explicitly naming the arrays using ARRAY lines in mdadm.conf causes mdadm to assemble those arrays without consulting the homehost. Alternatively, I set HOMEHOST in mdadm.conf to match the hostname of the system.

I don't know enough about things to say what a general solution is, but hopefully this explains the inconsistent behavior between the initramfs image and the normal root filesystem.
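In mdadm.conf terms, the two workarounds described above look roughly like this (the UUID and hostname are placeholders, not values from this report):

```
# Workaround A: name the array explicitly, bypassing the homehost check
ARRAY /dev/md0 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx

# Workaround B: pin HOMEHOST to the real hostname instead of <system>
HOMEHOST myhostname
```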

Tobias McNulty (tmcnulty1982) wrote :

Hey, I set the HOMEHOST line in my mdadm.conf and it worked like a charm. Thanks so much for posting this. BTW this seems to be a duplicate of https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/226484

When can we get a fix for this released? This is pretty unacceptable behavior, and there are all sorts of theories circulating out there as to why it doesn't work (it took me a couple of hours to get to this page).

After the update to 9.04 I had the same problems but could build the md devices in the emergency shell and resumed booting with:

# mdadm --assemble --scan;exit

Today I found the fix. I removed the duplicated definitions of the arrays in mdadm.conf with the output of:

# mdadm --detail --scan

This added the metadata version information (0.90, if you are wondering).

The other change that I made was editing the DEVICE line from:

DEVICE partitions

to:

DEVICE /dev/sda[1-4] /dev/sdb[1-4]

The last step was to rebuild the initrd with the new config file.

# update-initramfs -u

After that, my grandparents' system boots again without them having to enter a cryptic line on every start-up.
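The steps above can be sketched as a small script; it is run here against a working copy of the file so nothing is touched until you are sure (the mdadm and update-initramfs calls are commented out because they need root and real arrays):

```shell
# Work on a copy of the config; fall back to a dummy file when the
# real one is absent (e.g. on a machine without mdadm installed).
cp /etc/mdadm/mdadm.conf mdadm.conf.work 2>/dev/null || \
    printf 'DEVICE partitions\nARRAY /dev/md0 stale\nARRAY /dev/md0 stale-dup\n' > mdadm.conf.work

# Drop the duplicated/stale ARRAY definitions...
sed -i '/^ARRAY/d' mdadm.conf.work

# ...then re-append freshly scanned ones and rebuild the initramfs:
# mdadm --detail --scan >> mdadm.conf.work
# sudo cp mdadm.conf.work /etc/mdadm/mdadm.conf && sudo update-initramfs -u
```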

Some questions that still remain:

1. Why does the RAID break on every Ubuntu version upgrade, as it has for about three years now?
2. Why did mdadm.conf contain duplicated array and device lines?
3. Why can't mdadm report a broken configuration in a way that doesn't cost me a week or two to fix the problem?

Regards, Dominik

Davias (davias) wrote :

Thank you for providing a solution - I printed the page and started to update from 8.10 to 9.04 on AMD64 with /, /home and swap on md0, md1 & md2 on RAID1, plus a RAID0 on md3 using two spare partitions on my 2 SATA disks.

But... The upgrade just went as smooth as silk! The system booted just fine. For now I just had to reconfigure VMWare server 2.

Not so for md3, the RAID0. The RAID monitor reports md_d3 (instead of calling it md3): inactive sdb4[0](S)

Any help? TIA

Roy Jamison (xteejx) wrote :

Marking this bug as Triaged as there should be enough debugging information here for a developer to begin working on it. Please provide them with any information they need if requested. High importance set - problems with disk controller(s).
If anyone has Jaunty and this problem, would you run
apport-collect 136252
Thank you.

Changed in mdadm (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
status: Confirmed → Triaged

After updating to 9.04, I experienced a similar problem.

I have three md devices: md0 (/dev/sd{a,b}1, swap), md1 (/dev/sd{a,b}2, root),
and md4 (/dev/sd{a,b}3, home). On boot, mdadm would incorrectly detect that
/dev/sda was an md device and create it. Upon inspection (using mdadm -E),
/dev/sda, /dev/sdb, /dev/sda3, and /dev/sdb3 all had the same super block.
Zeroing the /dev/sda and /dev/sdb ones resulted in also zeroing the /dev/sda3
and /dev/sdb3 ones (which suggests that the /dev/sda3 superblock was being
detected as the /dev/sda one).

I changed my /etc/mdadm/mdadm.conf to have

  DEVICE /dev/sd??*

rather than

  DEVICE partitions

and the system booted fine.

While searching for this problem, I found that some people reported that
mdadm --incremental would not create their arrays properly whereas
mdadm -As worked fine. I extracted the initrd (kernel 2.6.28-11-generic)
and found that this is indeed how the arrays are being built. I didn't check
whether 8.10 uses --incremental or --assemble, but perhaps this is a starting
point for further investigation of this issue.

ceg (ceg) wrote :

Current state of ubuntu systems with md raid: https://wiki.ubuntu.com/ReliableRaid

Roy Jamison (xteejx) wrote :

Returning to Incomplete.
Can someone please provide the requested information and let us know if this is a problem in the latest Ubuntu release (Karmic), and if so, please run "apport-collect 136252" without quotes and let it pull in the required debugging information.
Thank you.

Changed in mdadm (Ubuntu):
status: Triaged → Incomplete
Patrick (oc3an) wrote :

Confirming that the problem still exists in Karmic.

If /etc/mdadm/mdadm.conf does not contain an ARRAY line, the system will not boot from an md array.

If I run mdadm --auto-detect from the recovery console md0 is created without needing the ARRAY line.

Patrick (oc3an) wrote :

After doing a bit of digging it looks like the arrays are built using the --incremental option to mdadm, specifically with following UDEV rule:

SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
 RUN+="/sbin/mdadm --incremental $env{DEVNAME}"

Here is my not-working mdadm.conf:

------------------------------------------------------------------------------------------------
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR <email address hidden>

# definitions of existing MD arrays

# This file was auto-generated on Wed, 27 Jan 2010 15:02:41 +1100
# by mkconf $Id$
------------------------------------------------------------------------------------------------

If I add a line:
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=00.90 UUID=blah:blah:blah:blah

The system will boot fine, which suggests that we are breaking one of the rules that mdadm requires for --incremental to work (or mdadm has a bug).

My first thoughts are:
1. Is /proc/partitions valid at the point where this udev rule runs?
2. Does the homehost match, i.e. is gethostname() working when this rule runs?

This is about where my knowledge is exhausted.

-Patrick

Roy Jamison (xteejx) wrote :

Thank you for updating this. Can you please run the apport-collect command as detailed in my last comment. This will give us most, if not all the debugging information the developers will need, although what you have provided so far has been great. Thank you.

summary: - [gutsy] mdadm, initramfs missing ARRAY lines
+ [karmic] mdadm, initramfs missing ARRAY lines

I will run it now.

Be aware that I'm doing this from a working system, with the mdadm.conf file modified so that I can boot.

Architecture: amd64
DistroRelease: Ubuntu 9.10
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
MDadmExamine.dev.sda:
 Error: command ['/sbin/mdadm', '-E', '/dev/sda'] failed with exit code 1: mdadm: metadata format 00.90 unknown, ignored.
 mdadm: cannot open /dev/sda: Permission denied
MDadmExamine.dev.sda1:
 Error: command ['/sbin/mdadm', '-E', '/dev/sda1'] failed with exit code 1: mdadm: metadata format 00.90 unknown, ignored.
 mdadm: cannot open /dev/sda1: Permission denied
MDadmExamine.dev.sda2:
 Error: command ['/sbin/mdadm', '-E', '/dev/sda2'] failed with exit code 1: mdadm: metadata format 00.90 unknown, ignored.
 mdadm: cannot open /dev/sda2: Permission denied
MDadmExamine.dev.sdb:
 Error: command ['/sbin/mdadm', '-E', '/dev/sdb'] failed with exit code 1: mdadm: metadata format 00.90 unknown, ignored.
 mdadm: cannot open /dev/sdb: Permission denied
MDadmExamine.dev.sdb1:
 Error: command ['/sbin/mdadm', '-E', '/dev/sdb1'] failed with exit code 1: mdadm: metadata format 00.90 unknown, ignored.
 mdadm: cannot open /dev/sdb1: Permission denied
MDadmExamine.dev.sdb2:
 Error: command ['/sbin/mdadm', '-E', '/dev/sdb2'] failed with exit code 1: mdadm: metadata format 00.90 unknown, ignored.
 mdadm: cannot open /dev/sdb2: Permission denied
MDadmExamine.dev.sdc:
 Error: command ['/sbin/mdadm', '-E', '/dev/sdc'] failed with exit code 1: mdadm: metadata format 00.90 unknown, ignored.
 mdadm: cannot open /dev/sdc: Permission denied
MDadmExamine.dev.sdd:
 Error: command ['/sbin/mdadm', '-E', '/dev/sdd'] failed with exit code 1: mdadm: metadata format 00.90 unknown, ignored.
 mdadm: cannot open /dev/sdd: Permission denied
MachineType: Dell Inc. Precision WorkStation 390
NonfreeKernelModules: nvidia
Package: mdadm 2.6.7.1-1ubuntu13
PackageArchitecture: amd64
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-17-generic root=UUID=78b8e8f7-0073-42b5-8a4c-039ea88ee8c0 ro quiet splash
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, user)
 LANG=en_AU.UTF-8
ProcMDstat:
 Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
 md0 : active raid1 sda2[0] sdb2[1]
       288840576 blocks [2/2] [UU]

 unused devices: <none>
ProcVersionSignature: Ubuntu 2.6.31-17.54-generic
Uname: Linux 2.6.31-17-generic x86_64
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
dmi.bios.date: 07/28/2007
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.4.0
dmi.board.name: 0DN075
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 7
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.4.0:bd07/28/2007:svnDellInc.:pnPrecisionWorkStation390:pvr:rvnDellInc.:rn0DN075:rvr:cvnDellInc.:ct7:cvr:
dmi.product.name: Precision WorkStation 390
dmi.sys.vendor: Dell Inc.
etc.blkid.tab: Error: [Errno 2] No such file or directory: '/etc/blkid.tab'

Patrick (oc3an) wrote : Lspci.txt
Patrick (oc3an) wrote : Lsusb.txt
Patrick (oc3an) wrote : UdevDb.txt
Patrick (oc3an) wrote : UdevLog.txt
Changed in mdadm (Ubuntu):
status: Incomplete → New
tags: added: apport-collected

Thank you for doing that for us. This bug can now be upgraded to the Triaged status, and should now have enough information for a developer to begin work. Thank you again.

Changed in mdadm (Ubuntu):
status: New → Triaged
ceg (ceg) wrote :

I think on hotplug systems mdadm.conf should generally not contain any specific ARRAY references; maybe it should explicitly mention "any", like this:

DEVICE <any>
HOMEHOST <any>
ARRAY <any>

This whole business of locking down array assembly (homehost, ARRAY) may just be due to the historical (suboptimal) mdadm design of assembling raids by the major/minor numbers saved in the superblocks. Those should always be considered dynamic (deprecated). With --assemble --uuid, mdadm takes a UUID and matches that same unique UUID on all member devices.

Of course we should refrain from running arrays that are incomplete (to avoid letting subsequent writes desync them), unless we are required to recover data from a specific failed array and --run it manually or from a startup script.

ceg (ceg) on 2010-03-09
summary: - [karmic] mdadm, initramfs missing ARRAY lines
+ [karmic] mdadm.conf w/o ARRAY lines but udev/mdadm not assembling
+ arrays.

The problem appears to be with the --incremental flag: mdadm is unable to create the map file for the arrays already created and gets confused. A quick fix is to add this to the mdadm hook in the initramfs:

# Create runtime directory for incremental map
mkdir -p ${DESTDIR}/var/run/mdadm

My machine still doesn't activate lvm properly though, but that would be a different issue =)

ceg (ceg) wrote :

Nice you found that out, Sami!

Are your arrays tagged with your hostname? From Bug #252345, mdadm --incremental seems to block members with a non-matching homehost tag and does not seem to have an option to disable that. (It only offers the --auto-update-homehost option as a workaround, with the possible side effect of breaking other systems.)

ceg (ceg) wrote :

@Sami, with that directory created in initramfs, your setup is now working without any mdadm.conf ARRAY lines, right?

ceg (ceg) wrote :

@Sami (newly subscribed): Some previous comments relate to you.

ceg (ceg) wrote :

Adopting from #226484 (now a duplicate of this one):

This is a bug with mdadm --incremental not doing hotplug, because it looks for permission to do so in mdadm.conf.

On hotplug systems mdadm.conf should not have to contain any specific references; to make things clear, and for backwards compatibility, maybe it could contain explicit "any" statements like this:

DEVICE <any>
HOMEHOST <any>
ARRAY <any>

This whole business of locking down array assembly (homehost, ARRAY) may just be due to the historical (suboptimal) mdadm design of assembling raids by the major/minor numbers saved in the superblocks. Those should always be considered dynamic (deprecated as a reference). With --assemble --uuid, mdadm takes a UUID and matches that same unique UUID to determine member devices; this is appropriate as default behaviour.

Of course we should refrain from running arrays that are incomplete (to avoid letting subsequent writes desync them), unless it is required to recover data or boot from a specific failed array. For that case we should use --run <specific device> manually or in a startup script.

ceg (ceg) on 2010-03-27
description: updated
summary: [karmic] mdadm.conf w/o ARRAY lines but udev/mdadm not assembling
- arrays.
+ arrays. (boot fails)

I think I've set the homehost in the arrays to null, and no modifications have been made to the mdadm configuration other than the modification in the initramfs hook. The array is now built normally during boot.

ceg (ceg) wrote :

Could you maybe post the mdadm.conf and try what happens if you (temporarily) change the homehost of the array members to something different, to make sure which problem got solved?

- mdadm.conf "DEVICE partitions" has mdadm consider any partition available to the system.
- If an mdadm.conf ARRAY <identification> line is present, it disables homehost checking (said Jason), but it also disables assembly of all other arrays.
- Changing the homehost to "null" makes the homehost check in the initramfs pass.
- Creating ${DESTDIR}/var/run/mdadm makes ... Sami?

I think we are missing an option to disable homehost checking with --incremental, or is the failure only a side effect of mdadm not being able to create the device map?

Sami Haahtinen (ressu) wrote :

Here is the mdadm.conf from my system, it is unmodified. I did run --assemble from initramfs once with --auto-update-homehost which apparently set the homehost to '<system>'.

ceg (ceg) wrote :

Thanks for clarifying Sami.

I had seen that missing dir before, and have now filed the separate Bug #550131; could you add/confirm info there about the error you encountered with /var/run/mdadm missing?

ceg (ceg) wrote :

We are missing pure and secure UUID-only incremental assembly (i.e. allowing "DEVICE partitions", "ARRAY <any>" and "HOMEHOST <any>").

One that would create unique disk/by-uuid nodes and by-label symlinks (the latter might get numbered on conflicts). This scheme need not even expose any of the generally arbitrary md* device enumeration to userspace/filesystem, just like the device mapper under /dev/mapper. (Yes, this disregards the unreliable and fluctuating major/minor numbers, as well as the hostnames replicated on superblocks around the world.) Each raid member showing up is added to the system, either creating a new UUID node or being (incrementally) added to the matching UUID node.

How to test/implement this concept, now?

Maybe let /lib/udev/rules.d/85-mdadm.rules dynamically update mdadm.conf prior to calling mdadm --incremental.

# grep --invert-match ARRAY /etc/mdadm/mdadm.conf > /etc/mdadm/mdadm.conf-stripped
# cp /etc/mdadm/mdadm.conf-stripped /etc/mdadm/mdadm.conf
# echo "ARRAY uuid=${<uuid-variable-from-udev>}" >> /etc/mdadm/mdadm.conf

This way mdadm --incremental always sees an mdadm.conf containing an "ARRAY uuid=<uuid-of-new-device>" line and should always assemble, but never based on wrongly matching minors, labels or hostnames in superblocks.

This should currently still create /dev/md* devices, but after switching to use UUIDs in boot scripts etc. this should not lead to false matches.
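A testable sketch of that rewrite, operating on a local copy of mdadm.conf rather than the real one (the all-zero UUID is a stand-in for the udev-provided variable in the comment above):

```shell
# Start from a local copy (or a dummy file if none exists).
cp /etc/mdadm/mdadm.conf mdadm.conf.local 2>/dev/null || \
    printf 'DEVICE partitions\nARRAY /dev/md9 old-reference\n' > mdadm.conf.local

# Strip every ARRAY line, then append one for the newly plugged device.
grep --invert-match '^ARRAY' mdadm.conf.local > mdadm.conf.stripped
mv mdadm.conf.stripped mdadm.conf.local
echo "ARRAY uuid=00000000:00000000:00000000:00000000" >> mdadm.conf.local

# mdadm --incremental would now see exactly one ARRAY line: the UUID
# of the device that just appeared (the count printed here is 1).
grep -c '^ARRAY' mdadm.conf.local
```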

ceg (ceg) on 2010-03-28
description: updated
ceg (ceg) wrote :

Hey, it's also a solution to Bug #158918 (improved there).

ceg (ceg) on 2010-03-28
summary: - [karmic] mdadm.conf w/o ARRAY lines but udev/mdadm not assembling
+ [->UUIDudev] mdadm.conf w/o ARRAY lines but udev/mdadm not assembling
arrays. (boot fails)
description: updated
ceg (ceg) on 2010-03-30
summary: [->UUIDudev] mdadm.conf w/o ARRAY lines but udev/mdadm not assembling
- arrays. (boot fails)
+ arrays. (boot & hotplug fails)

Dear Ubuntu developer,

I have finished assessing the mdadm bugs and found the four
(distribution-related) bugs in Ubuntu's mdadm setup which mostly render
it unusable.

All four reports contain proposed solutions, now.
Mostly they're just simple adjustments to the mdadm commands used in the
management scripts.
Two are already set to a higher priority.
Two are harder to identify as such without a higher priority setting.

Ok, the first two (with a bugload of other reports) can be solved by
using commands that implement UUID/udev-based raid assembly.

High/Triaged
* Bug #158918 [->UUIDudev] installing mdadm (or outdated mdadm.conf)
breaks bootup

Medium/Triaged
Bug #136252 [->UUIDudev] mdadm.conf w/o ARRAY lines but udev/mdadm not
assembling arrays. (boot & hotplug fails)

Then, a simple mkdir is needed in mdadm's initramfs hook.
* Bug #550131 initramfs missing /var/run/mdadm/ dir (losing state,
race, misconfig)

Finally, the initramfs must only deal with degrading the rootfs
dependencies.
* Bug #497186 initramfs' init-premount degrades *all* arrays (not just
those required to boot)

Hopefully I could help out with sorting through the issues.

--
You'd probably want to get all the fixes from upstream into ubuntu main
anyway.
Bug #495370 Please upgrade to 3.1.x for lucid

Surbhi Palande (csurbhi) wrote :

Call for testing mdadm 2.7.1 autoassembly.

For Ubuntu releases to date, the mdadm package will stay at 2.7.1; Natty, however, will have mdadm 3.4.1. This document is intended to test the mdadm fixes for 2.7.1. Here is the rough procedure that needs to be followed:

Testing auto-assembly of your md array when your rootfs lies on it:
1) Install the mdadm package and initramfs package kept at: https://edge.launchpad.net/~csurbhi/+archive/mdadm-autoassembly
2) Run /usr/share/mdadm/mkconf and ensure that your /etc/mdadm/mdadm.conf has the array definition.
a) Save your original initramfs in /boot itself as, say, /boot/initrd-old.img.
b) Then run update-initramfs -c -k <your-kernel-version>. Store this initramfs as /boot/initrd-new.img. We shall use this initramfs as a safety net: if you cannot boot with the auto-assembly fixes, you should not end up in a foot-in-mouth situation. Through grub's edit menu you can then fall back to this safety net by editing the kernel line to initrd=initrd-new.img (or, if that does not work for some random reason, resort to your older initrd=initrd-old.img). This way you can be sure that you can still boot your precious system.
c) Now comment out or remove the ARRAY definitions from your /etc/mdadm/mdadm.conf and once again run the same "update-initramfs -c -k <your-kernel-version>" to generate a brand-new initramfs.
3) Run mdadm --detail --scan and note the UUIDs of the array. Note the hostname stored in your array. Does it not match your real hostname? Then we can fix that at the initramfs prompt, which you will inevitably land at if you try auto-assembly. Also note the device components that form the root md device. Keep this paper for cross-checking when you reboot.
4) Reboot.
5) If you are at the initramfs prompt, here are the things you should first check:
a) ls /bin/hostname /etc/hostname - are these files present?
b) Run "hostname". Does this show the hostname your system is intended to have? Is it the same as the contents of /etc/hostname?
c) ls /var/run - is this directory present?
If you answer yes to all three questions, then things are so far so good. Now run the following command:
mdadm --assemble -U uuid /dev/<md-name> <dev-components-listed-here>
The mdadm --detail --scan that you ran previously should have given you the component names if you don't know them right now. Hopefully you have them listed on your paper.
E.g. in my case I ran:
mdadm --assemble -U uuid /dev/md0 /dev/sda1 /dev/sdb1
Again run mdadm --detail --scan <md-device> and verify that the UUIDs are indeed updated and that the hostname reflects what is stored in /etc/hostname. You can now press Ctrl+D and you should come back to the root prompt. However, you still need to test auto-assembly of your root md device. To do that, simply reboot; this time you should not see the initramfs at all and should land gently at your root prompt as expected. If you do not reach the rootfs prompt this way or with this initramfs then, as mentioned earlier, please fall back to your saved initrd images through grub. Skip the further steps in this case. Update the launchpad bugs, saying you could not get to the root prompt with manua...


Changed in mdadm (Ubuntu):
assignee: nobody → Surbhi Palande (csurbhi)
status: Triaged → Confirmed
status: Confirmed → In Progress
Surbhi Palande (csurbhi) wrote :

Hi all, thanks a lot for your insightful comments and for making Ubuntu better. I have made the fixes, mostly based on these comments. The patches are based on the source code in Neil Brown's git repository, plus a few initramfs fixes applicable to Ubuntu (again based on the bugs on Launchpad). Do let me know the output of the test PPAs. At present these are only for Maverick; I shall upload ones for Lucid soon enough! Thanks again!

Sami Haahtinen (ressu) wrote :

I just confirmed that the bug is gone for me with the packages from the PPA.

Surbhi Palande (csurbhi) wrote :

Adding a few patches needed for mdadm auto-assembly to work; they are in the PPA requested for testing.

Surbhi Palande (csurbhi) wrote :
Surbhi Palande (csurbhi) wrote :
Surbhi Palande (csurbhi) wrote :
Surbhi Palande (csurbhi) wrote :
Surbhi Palande (csurbhi) wrote :

1) The ppa requested for testing consists of these patches. They are needed for the proper working of mdadm auto assembly.
2) Also for the mdadm autoassembly to work properly, the following needs sponsorship for maverick and lucid:
https://code.launchpad.net/~csurbhi/+junk/initramfs.mdadm.fixes.

Please do consider merging these patches for maverick and lucid. Thanks!

tags: added: patch
The Loeki (the-loeki) wrote :

And there you go.

+1 for fix on my up-to-date 64-bit Maverick Server :)

I'm loving it. I'd been rebuilding my 6*2TB RAID6 for three days, figuring something had gone wrong a couple of times.

Then I found the detection issue, and from there this bug was easy to find.

FWIW, I've got an array created using:
mdadm --create /dev/md20 --level=6 --raid-devices=6 -c 256 -e 1.2 --name=SATAs /dev/sd[cdefgh]

The infamous md_d127 device kept showing up as well.

doing an
mdadm --stop /dev/md_d127
mdadm --assemble -v /dev/md/SATAs /dev/sd[cdefgh]
then properly reinitialized the array.

The only thing that would work was manually adding
ARRAY /dev/md20 level=raid6 num-devices=6 metadata=01.02 name=ubuntu:SATAs UUID=10b2a833:0fac6b94:3224150e:0aeefa84 devices=/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdh

to /etc/mdadm/mdadm.conf. A named array (/dev/md/SATAs) will not work.

Your mdadm/initramfs correctly detects & activates the array without anything in /etc/mdadm/mdadm.conf, albeit as /dev/md127, NOT as /dev/md/SATAs

As a fortunate side-note, yours also correctly reports the metadata superblock version as 1.2, as opposed to 01.02 in the other, leading to errors.

ceg (ceg) wrote :

@Surbhi

With Natty (mdadm capable of hotplugging) the hostname should probably not be exposed in the initramfs (/boot may stay unencrypted). Are you applying that patch in 11.04?

Surbhi Palande (csurbhi) wrote :

@ceg,
yes, the hostname part is no longer needed for the newer mdadm. We don't need any of the patches for Natty; they only apply to older mdadm versions.

Surbhi Palande (csurbhi) wrote :

Debdiff of all patches fixing bugs for:
*) autoassembly of raid arrays
As good side effects, this will also:
*) choose the correct device name (also necessary for correct auto assembly).
*) report the appropriate meta data information.

Please do consider this for mdadm-2.6.7.x. The debdiff is created for Maverick and also applies to Lucid.
Thanks!

Changed in mdadm (Ubuntu):
assignee: Surbhi Palande (csurbhi) → nobody
Ben Bucksch (benbucksch) wrote :

Again I ran into this. System was an Ubuntu 8.10 with an 8-disk md array plus separate system disk. Given that 8.10 is not supported anymore, I needed to upgrade to 10.04. All I did was change sources.list to lucid and apt-get dist-upgrade, reboot, and I am left without my array.
* mdadm.conf is the default, no ARRAY line
* mdadm --auto-detect does nothing
* mdadm --assemble /dev/md0 complains
Worse,
* Adding ARRAY /dev/md0 devices=/dev/sda,/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf,/dev/sdg,/dev/sdh to mdadm.conf and doing mdadm --assemble /dev/md0 says that /dev/sdb has no superblock - which is freaky, given that it worked fine before the upgrade.

So, this is still breaking systems, even when upgrading to the latest 10.04 LTS.

Davias (davias) wrote :

Well, what you described is not the standard upgrade procedure for Ubuntu. I upgraded from 8.10 following Ubuntu's directions, going 8.10 -> 9.04; reboot -> 9.10; reboot -> 10.04 LTS, with my system on RAID1 and GRUB on both disks. That was a year ago and it is still working.
Hope this is of some help.

JC (s-launchpad-jc) wrote :

I just ran into this problem with homehost vs. hostname in initramfs on my up-to-date 10.04. It's good to know that a fix is imminent.

BTW, bug #330399 is another duplicate.

Steve Langasek (vorlon) wrote :

Surbhi,

Sorry to be so long in reviewing this. Some questions about the proposed SRU diff:

  * While rebuilding the mapfile (mdadm -Ir), if appropriate name is not found
    in /dev/md, look for a name in mdadm.conf or the metadata.

There's no bug number for this change. What issue is this fixing? This also doesn't appear to be in mdadm 3.1.4 - is the issue fixed differently in later releases?

  * Allow an empty name rather than "unknown" for an array

What issue is this fixing (no bug number)? Is there any chance of regression from this change?

  * Identify md0 correctly - fixed typo (LP: #532960)

The SRU diff is 900 lines long... can you be more specific about which line is the typo fix for this bug? :)

  * Resolve issues like mdadm -Ss; mdadm: unable to open /dev/md/r1: "No such
    file or directory"
  * Report the correct superblock version.
  * Correct the logic for partitions in md devices. Use /sys/dev links to map
    major/minor to devnum in sysfs
  * Changed the open_mddev_devnum() to create_mddev_devnum() - as its really
    creating something in /dev. Also renamed for porting few changes from
    mdadm-3.4.1 to mdadm-2.6.7.x
  * If two devices are added via -I, mdadm can get badly confused. Fixed this.

Again, not clear which changes map to which changelog entries, making it very difficult to review this. Maybe a bzr branch with individual commits would make this easier to understand?

  * For autoassembly to work properly the initramfs should set the hostname.
    Copy the hostname binary and /etc/hostname in the initramfs so as to set
    the hostname at boot time. (LP: #136252)

It's surprising that initramfs-tools' init calls hostname, but doesn't take any steps to make it available in the initramfs. But I think this is still the right place to do the copy since only when mdadm is used do we need it in the initramfs.

  * At installation time hostname will not be set before the array is
    created. Thus the uuid written on the root array created at Ubuntu
    installation time will never correspond to that of the hostname. Due to
    this auto assembly will never work. Add support to use the machine name
    when the hostname is unspecified example at installation time.
    (LP: #136252) (LP: #532960) (LP: #330399)

None of the referenced bugs appear to discuss this happening at installation time. Can you explain why this would happen at install time? Isn't setting the host name one of the first things done in the installer? If that's not being done correctly, is there a bug to track fixing this issue in the installer, too? (Of course, we still need mdadm to cope with any systems that were already installed this way.)

  * Fix the error " /dev/MKDEV not found"

From what I see, this isn't an error, just a warning. I don't think this is appropriate to include in an SRU.

Changed in mdadm (Ubuntu Lucid):
status: New → In Progress
Changed in mdadm (Ubuntu Maverick):
status: New → In Progress
ceg (ceg) wrote :

> It's surprising that initramfs-tools' init calls hostname, but doesn't take any steps to make it available in the initramfs.

I guess it's quite sane not to have the hostname leak into the initramfs. The initramfs stays universal for booting different root filesystems. And if a root filesystem is encrypted, having the hostname in the initramfs might be even less desirable.

The new mdadm should be able to incrementally assemble any plugged-in arrays securely without needing any hostname checks on the member devices, relying on UUIDs instead to prevent wrong assemblies. So this hostname insertion should just be a temporary workaround, and go away with an update to a more recent mdadm version.

Phillip Susi (psusi) wrote :

It looks like you can specify that incremental assembly should recognize arrays without caring about homehost or having an ARRAY line with the AUTO line.
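For reference, the mdadm.conf(5) man page in mdadm 3.x documents this AUTO keyword; a minimal sketch of the idea (untested here):

```
# Incrementally assemble any recognized array, regardless of homehost:
AUTO +all
```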

Changed in mdadm (Ubuntu):
assignee: nobody → Dmitrijs Ledkovs (dmitrij.ledkov)
no longer affects: mdadm (Ubuntu Lucid)
no longer affects: mdadm (Ubuntu Maverick)
Changed in mdadm (Ubuntu):
status: In Progress → Confirmed
assignee: Dmitrijs Ledkovs (dmitrij.ledkov) → nobody

I can also confirm that this bug is still present on Precise.

The crashes started appearing after I first did a release upgrade. I reinstalled the box and the issue still appears. I will try the tips here regarding mdadm.conf.

On a related note, I find it interesting that my only RAID device becomes /dev/md127 instead of /dev/md0, as it has always been.

root@dom0:/home/thu# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04 LTS"

root@dom0:/home/thu# cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays

# This file was auto-generated on Sat, 28 Jul 2012 01:05:22 +0200
# by mkconf $Id$

root@dom0:/home/thu# mdadm --detail /dev/md127
/dev/md127:
        Version : 0.90
  Creation Time : Sat Jan 24 14:16:52 2009
     Raid Level : raid6
     Array Size : 13186245888 (12575.38 GiB 13502.72 GB)
  Used Dev Size : 1465138432 (1397.26 GiB 1500.30 GB)
   Raid Devices : 11
  Total Devices : 12
Preferred Minor : 127
    Persistence : Superblock is persistent

    Update Time : Sat Jul 28 13:31:06 2012
          State : clean, degraded, recovering
 Active Devices : 10
Working Devices : 11
 Failed Devices : 1
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 128K

 Rebuild Status : 85% complete

           UUID : 1f095377:1c62a5f4:352a6ad4:582f9bd3
         Events : 0.608892
