[->UUIDudev] installing mdadm (or outdated mdadm.conf) breaks bootup

Bug #158918 reported by PaulSchulz on 2007-10-31
This bug affects 7 people
Affects Status Importance Assigned to Milestone
mdadm (Ubuntu)
High
Unassigned
Nominated for Lucid by ceg

Bug Description

Original Report (also confirmed with 9.10 and 10.04 beta1):

On a freshly installed ubuntu-7.10-alternate, with latest apt-get update.

When the 'mdadm' package is installed, the system fails to boot successfully, and ends up at the initrd '(busybox)' prompt.

Hardware: DELL 1950 - 1RU Server
HDD: SAS

To get the server booting again you need to revert to the old initramfs:

- Boot with ubuntu-7.10-alternate, and go through install steps up to 'partitioning'.
- ALT-F2 to start other shell
- 'fdisk -l' to see details of available drives.
- mkdir /mnt/disk
- mount -t ext3 /dev/sdb1 /mnt/disk
- cd /mnt/disk/boot
- mv initrd-<version>.img initrd-<version>.img-new
- cp initrd-<version>.img.bak initrd-<version>.img
- sync
- reboot

---
Diagnosis:

-> This is mdadm setting up arrays according to unreliable superblock information (device "minor" numbers, labels, hostnames) combined with the idea of fixing the unreliability by limiting array assembly with information from mdadm.conf (PARTITIONS, ARRAY, HOMEHOST lines) which just reassigns the unsolvable conflict handling problem to setup tools, admins and installers. It forces them to create mdadm.conf files. And of course they fail.

In cases where old superblocks are found on the disks during mdadm install, they are added to ARRAY definitions (that really shouldn't need to be there at all) in mdadm.conf, and copied over into the initramfs. During next boot the system can not assemble these (incomplete) arrays.

Cure:

Systematically prevent conflicts from arising instead of relying on mdadm.conf maintenance. -> Do not depend on mdadm.conf definitions, but use UUID-based array assembly as described in comment #33.
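
To see whether a given mdadm.conf carries ARRAY definitions for arrays that no longer exist, it can help to list the UUIDs it names. The following is a minimal sketch, not part of mdadm: `list_array_uuids` is a hypothetical helper that only parses text, and it assumes ARRAY lines carry a `UUID=` (or `uuid=`) token.

```shell
# Hypothetical helper: print the UUID of every ARRAY line in an
# mdadm.conf-style file, one per line. Commented-out lines are skipped.
list_array_uuids() {
    # $1: path to the config file to inspect
    grep -i '^[[:space:]]*ARRAY' "$1" |
        tr ' \t' '\n\n' |
        sed -n 's/^[Uu][Uu][Ii][Dd]=//p'
}

# Possible use (run as root; the /tmp path is only an example):
#   list_array_uuids /etc/mdadm/mdadm.conf
#   mdadm --examine --scan > /tmp/found.conf
#   list_array_uuids /tmp/found.conf
```

Any UUID that shows up in mdadm.conf but belongs to an array you do not intend to assemble is a candidate for the stale definitions described above.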

PaulSchulz (paulschulz) wrote :

 Version: mdadm 2.6.2-1ubuntu2

PaulSchulz (paulschulz) wrote :

Here is a diff between the original initrd.img-2.6.22-14-server and the one installed after mdadm was installed (and the fix above was applied).

Roy Jamison (xteejx) wrote :

Thank you for reporting this bug to Ubuntu. Unfortunately, 7.10 has reached EOL.
Please see this document for currently supported Ubuntu releases:
https://wiki.ubuntu.com/Releases
You can follow instructions at https://help.ubuntu.com/community/EOLUpgrades to upgrade to Hardy or later.
Please feel free to report any other bugs you may find.

Changed in mdadm (Ubuntu):
status: New → Invalid
PaulSchulz (paulschulz) wrote :

Thank you for this automated response.

I will be testing to see whether this bug is still present in the Karmic Pre-Release.

Roy Jamison (xteejx) wrote :

It wasn't an automated response ;) but thank you.

Craig Rae (crae) wrote :

I'm seeing the same thing on a fresh install of the 32-bit release of 9.10.

Hardware: Dell T3500 (Xeon E5520, 6GB DDR3, nVidia Quadro 470, 2x Seagate LP 1.5TB)

Steps to reproduce:

1. Install Ubuntu 9.10 to a single disk partition (doesn't appear to matter which).
2. Boot to the new install.
3. Install mdadm (sudo apt-get install mdadm). Choose "none" at the email config prompt during setup.
4. Once mdadm is installed, reboot the system.
5. During boot, the initial splash screen will come up and the display will quickly go black.

At this point, pressing any key will show a busybox prompt from the middle of initramfs. A failure above will indicate a

I can work around this by making a copy of the initramfs before installing mdadm, but that will be a showstopper the next time anything wants to touch the ramdisk image, so I'll have to go hack to see if pulling the RAID modules fixes anything.

The worst part is that in this case, I don't even care about initially supporting RAID. This system is going to boot from a simple single-disk partition and later mount a RAID1 array for data.

I can provide a diff of the ramdisk images later.

Roy Jamison (xteejx) wrote :

Thank you Craig. Can you run "apport-collect 158918" and allow it full permission to add debugging information to Launchpad. Thank you.

Changed in mdadm (Ubuntu):
status: Invalid → Incomplete

Architecture: i386
DistroRelease: Ubuntu 9.10
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release i386 (20091028.5)
MDadmExamine.dev.sda1: Error: command ['/sbin/mdadm', '-E', '/dev/sda1'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda1.
MDadmExamine.dev.sdb1: Error: command ['/sbin/mdadm', '-E', '/dev/sdb1'] failed with exit code 1: mdadm: No md superblock detected on /dev/sdb1.
MDadmExamine.dev.sdc: Error: command ['/sbin/mdadm', '-E', '/dev/sdc'] failed with exit code 1: mdadm: No md superblock detected on /dev/sdc.
MDadmExamine.dev.sdc1: Error: command ['/sbin/mdadm', '-E', '/dev/sdc1'] failed with exit code 1: mdadm: No md superblock detected on /dev/sdc1.
MDadmExamine.dev.sdd: Error: command ['/sbin/mdadm', '-E', '/dev/sdd'] failed with exit code 1: mdadm: cannot open /dev/sdd: No medium found
MDadmExamine.dev.sde: Error: command ['/sbin/mdadm', '-E', '/dev/sde'] failed with exit code 1: mdadm: cannot open /dev/sde: No medium found
MDadmExamine.dev.sdf: Error: command ['/sbin/mdadm', '-E', '/dev/sdf'] failed with exit code 1: mdadm: cannot open /dev/sdf: No medium found
MDadmExamine.dev.sdg: Error: command ['/sbin/mdadm', '-E', '/dev/sdg'] failed with exit code 1: mdadm: cannot open /dev/sdg: No medium found
MachineType: Dell Inc. Precision WorkStation T3500
Package: mdadm 2.6.7.1-1ubuntu13
PackageArchitecture: i386
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-14-generic root=UUID=43fd31e7-392c-4c92-a29a-f8c4fe5990b0 ro quiet splash
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, no user)
 LANG=en_US.UTF-8
ProcMDstat:
 Personalities :
 md0 : inactive sda[2](S)
       1465138496 blocks

 unused devices: <none>
ProcVersionSignature: Ubuntu 2.6.31-14.48-generic
Uname: Linux 2.6.31-14-generic i686
UserGroups:

XsessionErrors:
 (gnome-settings-daemon:1861): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (gnome-settings-daemon:1861): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (nautilus:1916): Eel-CRITICAL **: eel_preferences_get_boolean: assertion `preferences_is_initialized ()' failed
 (polkit-gnome-authentication-agent-1:1947): GLib-CRITICAL **: g_once_init_leave: assertion `initialization_value != 0' failed
dmi.bios.date: 09/09/2009
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A03
dmi.board.name: 0XPDFK
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.type: 7
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA03:bd09/09/2009:svnDellInc.:pnPrecisionWorkStationT3500:pvr:rvnDellInc.:rn0XPDFK:rvrA01:cvnDellInc.:ct7:cvr:
dmi.product.name: Precision WorkStation T3500
dmi.sys.vendor: Dell Inc.
etc.blkid.tab: Error: [Errno 2] No such file or directory: '/etc/blkid.tab'
initrd.files:

Craig Rae (crae) wrote : Lspci.txt
Craig Rae (crae) wrote : Lsusb.txt
Craig Rae (crae) wrote : UdevDb.txt
Craig Rae (crae) wrote : UdevLog.txt
Changed in mdadm (Ubuntu):
status: Incomplete → New
tags: added: apport-collected

Done (obviously). Note that this was after reverting back to the original initrd and rerunning grub-install to make it stick - dunno if that matters much to you.

Oh, and I have one other data point for you - I was unable to reproduce this problem on a Dell Poweredge T100 (Xeon E3110, ICH9R). Following the steps in post #6, the system boots as you'd expect.

Is there anything else I can grab that might help?

Roy Jamison (xteejx) wrote :

That's brilliant! Thank you. Since this bug has enough information provided for a developer to begin work, I'm going to mark it as Triaged and let them handle it from here, and good luck :)

Changed in mdadm (Ubuntu):
importance: Undecided → High
status: New → Triaged
Roy Jamison (xteejx) wrote :

If there are duplicates of this, please mark them as such, as I don't know much about mdadm or RAID, although debugging information is here. Thank you.

Craig Rae (crae) wrote :

Teej - you're welcome. Just holler if there's anything else I can gather to help out.

As a final anecdote, I was able to work around the problem for this particular system by hacking on mdadm's hook in initramfs. Because I don't need to have the system boot from a RAID array, I was able to just ignore whatever it is that the mdadm hook is trying to do by stuffing an "exit 0" at the head of the script.

In case it helps anyone else in a similar situation, this is what I came up with (both to restore a buggered system and to tweak things to avoid the problem for now). The following instructions assume for the sake of brevity that you've already opened a terminal and are pretending to be root with "sudo bash" or similar (prepend "sudo" to all the command-line stuff otherwise).

1. Boot with the 9.10 live CD.

2. Mount the relevant partition. In this case, I've been messing with /dev/sdb1, but whatever floats your board:

> mount /dev/sdb1 /mnt

3. To make life easier for a bit, chroot to the mounted drive:

> chroot /mnt

4. Modify the mdadm hook in initramfs so that it has no effect. Open /usr/share/initramfs-tools/hooks/mdadm and insert "exit 0" on line 8.

4a: To ensure I wouldn't get bitten again until I wanted to, I made the mdadm hook read-only and made it immutable. This may cause a future upgrade of mdadm to fail - you could lock down the version of mdadm as well with apt or synaptic if you so wished.

> chmod a-w /usr/share/initramfs-tools/hooks/mdadm
> chattr +i /usr/share/initramfs-tools/hooks/mdadm

5. Rebuild the relevant initrd to your kernel. I haven't updated yet, so I'm still on 2.6.31-14-generic at this time - change this to match whatever kernel you're currently running:

> update-initramfs -k 2.6.31-14-generic -u

6. Bail from the chroot and update grub to pick up the new ramdisk image. You'll need to supply the mount point for your partition and the root device name for that partition. Mine were /mnt and /dev/sdb respectively:

> exit
> grub-install --root-directory=/mnt /dev/sdb

Assuming there were no errors, you can now reboot. The system should come up as normal, and mdadm will be usable. Also, any future updates to the ramdisk image won't result in the changes being blown away, which is why I didn't just take a copy of the ramdisk image and restore that.

Again, note that this procedure is *only* applicable to systems that don't boot from a RAID array, but are interested in using one after the kernel is started.
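
For anyone who would rather script step 4 than edit the hook by hand, here is a rough sketch. `insert_exit0` is a hypothetical helper, and line 8 is simply the position that worked for the hook shipped with this mdadm version; check your own copy of the hook before relying on that number.

```shell
# Hypothetical helper: insert "exit 0" so that it becomes line N of a script.
# If the file has fewer than N lines, nothing is inserted.
insert_exit0() {
    hook=$1   # path to the hook script
    line=$2   # line number at which "exit 0" should appear
    tmp=$(mktemp) || return 1
    awk -v n="$line" 'NR == n { print "exit 0" } { print }' "$hook" > "$tmp" ||
        { rm -f "$tmp"; return 1; }
    # Overwrite the contents in place so mode and ownership are kept.
    cat "$tmp" > "$hook"
    rm -f "$tmp"
}

# Possible use inside the chroot from step 3:
#   insert_exit0 /usr/share/initramfs-tools/hooks/mdadm 8
```

Run it before making the hook immutable in step 4a, since the file can no longer be rewritten afterwards.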

Craig Rae (crae) wrote :

OK, I have an answer to why this particular system ran into problems but a different one didn't. I noticed it in the stuff that apport dumped to this ticket.

Unbeknownst to me, these two disks were previously part of a 4-disk RAID5 array, built from /dev/sda - /dev/sdd. /dev/sda and /dev/sdb already had a superblock associated with them.

I partitioned both with a 20GB root partition up front so that the end-user will be able to flop between successive Ubuntu releases, and the remaining 1.4whatever TB of each was partitioned as type "fd" to be RAIDed together for data. The initial OS installation went just fine.

However, it looks like when mdadm was installed, it saw the superblocks for /dev/sda and sdb as well as for /dev/sda2 and /dev/sdb2 and set things up accordingly. An initial ramdisk was created that knew about both arrays, and the failure was apparently in it trying to start the broken RAID5 array.

Blowing away the superblocks on /dev/sda and sdb solved this particular problem - mdadm now installs without causing any further problems.
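
Roughly, the cleanup looks like the sketch below. `zero_stale_superblock` is a hypothetical wrapper; `--examine` and `--zero-superblock` are the mdadm options involved, and the `MDADM` variable is only there so the commands can be previewed without touching a disk. This is destructive: make sure the device you name is not a member of an array you still want.

```shell
# Set MDADM="echo mdadm" for a dry run that only prints the command.
MDADM=${MDADM:-mdadm}

# Hypothetical wrapper: wipe the md superblock on a device, but only if
# mdadm actually finds one there.
zero_stale_superblock() {
    dev=$1   # e.g. /dev/sda
    # --examine exits non-zero when no md superblock is detected.
    if $MDADM --examine "$dev" >/dev/null 2>&1; then
        $MDADM --zero-superblock "$dev"
    else
        echo "no md superblock on $dev, nothing to do"
    fi
}

# Possible use (as root):
#   zero_stale_superblock /dev/sda
#   zero_stale_superblock /dev/sdb
```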

However, what I don't get is why a broken array (with no mount point specified) would result in an unbootable system. The rootfs in this case is on a single drive's partition and is not part of any array.

As to whether this is the cause of anyone else's problem, I don't know. I reckon it's pretty unlikely, actually - we have quite a few disks lying around that have been used in 4-disk rackmount servers that have been pulled at one point or another, which is probably a lot less likely to have happened in someone's home setup.

ceg (ceg) wrote :

> what I don't get is why a broken array (with no mount point specified) would result in an unbootable system.

This is again the glorious mdadm thing of setting up arrays according to unreliable superblock information (device "minor" numbers, labels, hostnames) combined with the idea of fixing the unreliability by limiting array assembly with mdadm.conf (PARTITIONS, ARRAY, HOMEHOST lines) and thus forcing setup tools and admins to create mdadm.conf files, leading to exactly the same problems.

The only thing mdadm can and should rely on when assembling is the high probability of uniqueness of UUIDs (not on admins or tools or install scripts to set up mdadm.conf). Bug #136252

summary: - Installing mdadm package breaks bootup.
+ Installing mdadm package breaks bootup (with old superblocks around).

The solution is a reliable UUID-based raid assembly mechanism.

One that does not depend on ARRAY definitions in mdadm.conf. At the same time, it must never create devices named according to the legacy 'standard' non-unique and non-persistent naming scheme (/dev/md0 etc.), which causes much havoc.

Without direct support for this in mdadm, I think I have finally figured out a way to realize this with current mdadm features:

A) Hotplug Action
/lib/udev/rules.d/85-mdadm.rules needs to take care of mdadm.conf when a new raid member appears, and do three things:

1) grep --invert-match ARRAY mdadm.conf > /var/run/mdadm/mdadm-udev-event-<eventID>.conf
(prior existence of /var/run/mdadm/ is required by mdadm --incremental anyway, see Bug #550131)

2) echo "ARRAY uuid=${<uuid-variable-from-udev>} name=${<uuid-variable-from-udev>}-md # ARRAY lines are dynamically rewritten on udev events" >> /var/run/mdadm/mdadm-udev-event-<eventID>.conf

3) mdadm --incremental --config=/var/run/mdadm/mdadm-udev-event-<eventID>.conf $env{DEVNAME}
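
Steps 1 and 2 could be sketched as a small shell function like the one below. This is an illustration only: `write_event_conf` is a hypothetical name, how the array UUID is obtained from udev is left open (as it is above), and the `name=<uuid>-md` scheme is taken literally from step 2.

```shell
# Hypothetical helper: build a per-event config that keeps everything from
# the base mdadm.conf except ARRAY lines, then adds a single ARRAY line for
# the array the newly appeared member belongs to.
write_event_conf() {
    base_conf=$1    # e.g. /etc/mdadm/mdadm.conf
    array_uuid=$2   # UUID of the array, as supplied by udev
    out_conf=$3     # e.g. /var/run/mdadm/mdadm-udev-event-<eventID>.conf
    [ -r "$base_conf" ] || return 1
    # Step 1: drop all lines containing ARRAY. grep exits 1 when nothing is
    # left to print, which is not an error here.
    grep --invert-match ARRAY "$base_conf" > "$out_conf" || true
    # Step 2: add the one ARRAY definition for this event.
    echo "ARRAY uuid=${array_uuid} name=${array_uuid}-md" >> "$out_conf"
}

# Step 3 would then be run by the udev rule, for example:
#   mdadm --incremental --config="$out_conf" "$DEVNAME"
```

Writing one file per event keeps concurrent udev events from overwriting each other's ARRAY line.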

B) Degrading Action
For arrays required during the boot process a watchdog like script or program like mountall needs to --run arrays degraded if they haven't come up after a timeout. (See https://wiki.ubuntu.com/ReliableRaid for more details.) But as mdadm --run /dev/<md-device> is not supported to start *only a specific* array degraded (i.e. to start only the rootfs degraded after a timeout in initramfs #251646) use this:

mdadm --remove <incomplete-md-device> <arbitrary-member-device-of-incomplete-array>
mdadm --incremental --run <arbitrary-member-device-of-incomplete-array>

PS: With this, to enable partitionable arrays just change the CREATE line in /etc/mdadm/mdadm.conf to contain auto=part.
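
The degrading action (B) could be wrapped roughly as follows. This is a sketch under assumptions: `array_is_inactive` and `run_degraded` are hypothetical names, the "inactive" check relies on the /proc/mdstat line format shown in the apport output earlier in this report, and the timeout logic of the watchdog itself is not shown.

```shell
# Both can be overridden for testing, e.g. MDADM="echo mdadm" for a dry run.
MDADM=${MDADM:-mdadm}
MDSTAT=${MDSTAT:-/proc/mdstat}

# True if the named array (e.g. md0) is listed as inactive in $MDSTAT.
array_is_inactive() {
    grep -q "^$1 : inactive" "$MDSTAT"
}

# Start one specific incomplete array degraded, using the two commands above.
run_degraded() {
    md_dev=$1   # incomplete md device, e.g. /dev/md0
    member=$2   # any member device of that array, e.g. /dev/sda
    if array_is_inactive "${md_dev##*/}"; then
        $MDADM --remove "$md_dev" "$member" &&
            $MDADM --incremental --run "$member"
    fi
}

# A watchdog would call this once its timeout has expired, for example:
#   run_degraded /dev/md0 /dev/sda
```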

summary: - Installing mdadm package breaks bootup (with old superblocks around).
+ Installing mdadm or outdated mdadm.conf breaks bootup
ceg (ceg) on 2010-03-28
description: updated
summary: - Installing mdadm or outdated mdadm.conf breaks bootup
+ Installing mdadm (or outdated mdadm.conf) breaks bootup
ceg (ceg) on 2010-03-28
summary: - Installing mdadm (or outdated mdadm.conf) breaks bootup
+ [->UUIDudev] installing mdadm (or outdated mdadm.conf) breaks bootup
ceg (ceg) wrote :

A note for /etc/mdadm/mdadm.conf:

# ARRAY definitions you make here will be honored only when executing mdadm manually.
# The mdadm calls executed on udev events do use this file replacing all lines containing "ARRAY".

ceg (ceg) on 2010-03-30
description: updated
ceg (ceg) wrote :

Current mdadm versions support AUTO lines in mdadm.conf, that may supersede dynamic mdadm.conf rewriting.

ceg (ceg) on 2010-06-09
description: updated
Grant P. (kevorkian) wrote :

I had problems with the release version of Lucid Desktop on AMD64. A clean install is brought up to date and rebooted. The mdadm package is installed and the system rebooted. The system becomes unbootable in the same manner. Disks had been previously used in nv-raids and maybe md-raid (cannot confirm). Single disk install with new partition table on disk. No raid arrays or partitions present anywhere on the system. System was rebooted immediately after installing mdadm.

Tried to repeat this on a "factory fresh" disk, however it installs and reboots without problems.

tombert (tombert.live) wrote :

Does affect me as well - here are the steps to reproduce:
Fresh install of Ubuntu Desktop 10.10, performing all online updates - reboot works fine. Then install mdadm, and the system does not boot any more:

mount: mounting /dev/disk/by-uuid/.... on /root
failed: Device or resource busy
...
...
No init found. Try passing init=bootarg.

See also:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/591696

Roy Helge Rasmussen (reon) wrote :

This bug has bugged me for a while now. It started in 10.10 and is still there in 11.04.

Fresh install of Natty, 11.04 AMD64. All updates performed. -> Reboot works fine.

Install MDADM -> reboot fails

mount: mounting /dev/disk/by-uuid/.... on /root
failed: Device or resource busy
...
...
No init found. Try passing init=bootarg.

Remove MDADM (boot on live-CD) -> reboot works fine

System has 7 disks. /dev/sda as boot disk. /dev/sd[bcdefg] are used as raid members.

Error occurs both when a raid is properly set up, and when all raid members have the superblock zeroed. Error occurs both when mdadm.conf is correct or blank, or missing.

Craig Yoshioka (craigyk) wrote :

I run into the same exact issue described by Roy.

This is still a problem in 12.04.1 Server AMD64.
OS is installed on a single drive and the machine also contains 6 drives that I want to configure as RAID0. Installed and configured OS fine, installed mdadm and now can't boot.
The 6 drives destined for the array are brand new so have never been part of a RAID array. The OS drive was previously part of a RAID array, but obviously was formatted during installation.

ceg (ceg) wrote :

You might want to use the original debian on your server instead.
