Regression: mdadm drops to initramfs shell on unregistered array

Bug #917520 reported by Sergei Ianovich on 2012-01-17
This bug affects 7 people
Affects: mdadm (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

We run a production server with software RAID5. After an upgrade from lucid to oneiric (clean install), the server didn't boot.

This file [1] is the source of the regression. It makes the system drop to an initramfs shell on boot when a non-critical RAID array fails. At the moment, that array is not even registered in /etc/mdadm/mdadm.conf, yet it prevents the system from booting.
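The decision rule described above can be restated as a small model. This Python sketch is not the shell script in [1]; it only encodes the behaviour as reported, and the names `ArrayState`, `global_policy` and the returned action strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArrayState:
    """Hypothetical snapshot of one md array when the premount script runs."""
    name: str
    complete: bool       # every member device is present and active
    in_mdadm_conf: bool  # listed on an ARRAY line in /etc/mdadm/mdadm.conf


def global_policy(arrays: list[ArrayState], boot_degraded: bool) -> str:
    """Reported rule: one incomplete array, registered or not, stops the boot
    unless the single global BOOT_DEGRADED switch is set."""
    if all(a.complete for a in arrays):
        return "boot"
    # in_mdadm_conf is never consulted here, which is exactly the complaint.
    return "boot-degraded" if boot_degraded else "initramfs-shell"


arrays = [
    ArrayState("md0", complete=True, in_mdadm_conf=True),    # root
    ArrayState("md2", complete=False, in_mdadm_conf=False),  # unregistered cache
]
print(global_policy(arrays, boot_degraded=False))  # initramfs-shell
```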

The file assumes that the failure of any single array should prevent the system from booting. I think this assumption is too strong. RAID0 is good for swap and cache storage. Hot-swapping a failed drive at the cost of losing the cache is perfectly legitimate. A failed swap array may require a cold reboot, but it is still possible.

In my case, some drives are attached to a backplane and others are connected to the motherboard. The latter are detected within a second, while the former take a dozen seconds to initialize. The root device becomes ready in between; then [1] kicks in and forcibly tries to start, in a degraded state, every array that is not yet ready. The system drops to an initramfs shell.
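The race can be illustrated with a toy timeline. In this Python sketch the timings (about 1 s for motherboard disks, about 12 s for backplane disks) are illustrative values taken from the wording above, not measurements; the device-to-array layout, the assumption that md0 holds root, and the function `arrays_incomplete_at` are all hypothetical.

```python
def arrays_incomplete_at(check_time: float,
                         member_ready: dict[str, float],
                         array_members: dict[str, list[str]]) -> list[str]:
    """Return, sorted, the arrays still missing a member at check_time.

    member_ready maps each device to the time (seconds after power-on) at
    which the kernel sees it; array_members maps each array to its devices.
    """
    return sorted(
        array
        for array, members in array_members.items()
        if any(member_ready[dev] > check_time for dev in members)
    )


# Illustrative timings only: motherboard disks after ~1 s, backplane disks after ~12 s.
member_ready = {"sda": 1.0, "sdb": 1.0, "sdc": 12.0, "sdd": 12.0, "sde": 12.0}
array_members = {"md0": ["sda", "sdb"], "md1": ["sdc", "sdd", "sde"]}

# md0 (assumed to hold root) is usable at ~1 s, so a check at 2 s finds md1 half-built.
print(arrays_incomplete_at(2.0, member_ready, array_members))   # ['md1']
print(arrays_incomplete_at(15.0, member_ready, array_members))  # []
```

Any check that runs between the two arrival times sees md1 as incomplete, even though nothing is actually wrong with it.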

The complete solution would be to provide BOOT_DEGRADED on a per-array basis. For now, please remove this file or provide an option to disable it.
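A per-array BOOT_DEGRADED could behave roughly as follows. This is a minimal Python sketch of the proposal, not an existing mdadm or initramfs-tools feature; `MdArray`, its `boot_degraded` field and `per_array_policy` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MdArray:
    name: str
    complete: bool
    boot_degraded: bool  # hypothetical per-array BOOT_DEGRADED setting


def per_array_policy(arrays: list[MdArray]) -> tuple[str, list[str]]:
    """Return the boot action plus the arrays it concerns: the blocking
    arrays for 'initramfs-shell', or the arrays started degraded for 'boot'.

    Only an incomplete array whose own flag is off can stop the boot.
    """
    blocking = [a.name for a in arrays if not a.complete and not a.boot_degraded]
    if blocking:
        return "initramfs-shell", blocking
    return "boot", [a.name for a in arrays if not a.complete]


root = MdArray("md0", complete=True, boot_degraded=False)
cache = MdArray("md2", complete=False, boot_degraded=True)
print(per_array_policy([root, cache]))  # ('boot', ['md2'])
```

With this shape, a root array can stay strict while a cache or swap array is allowed to come up degraded.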

1. /usr/share/initramfs-tools/scripts/local-premount/mdadm

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: mdadm 3.1.4-1+8efb9d1ubuntu6
ProcVersionSignature: Ubuntu 3.0.0-14.23-server 3.0.9
Uname: Linux 3.0.0-14-server x86_64
ApportVersion: 1.23-0ubuntu4
Architecture: amd64
CurrentDmesg:
 [ 297.542335] init: tty1 main process ended, respawning
 [ 310.400047] usb 5-2: USB disconnect, device number 2
Date: Tue Jan 17 10:45:48 2012
InstallationMedia: Ubuntu-Server 11.10 "Oneiric Ocelot" - Release amd64 (20111011)
MDadmExamine.dev.sda: Error: command ['/sbin/mdadm', '-E', '/dev/sda'] failed with exit code 1: mdadm: cannot open /dev/sda: Permission denied
MDadmExamine.dev.sda1: Error: command ['/sbin/mdadm', '-E', '/dev/sda1'] failed with exit code 1: mdadm: cannot open /dev/sda1: Permission denied
MDadmExamine.dev.sda2: Error: command ['/sbin/mdadm', '-E', '/dev/sda2'] failed with exit code 1: mdadm: cannot open /dev/sda2: Permission denied
MDadmExamine.dev.sda3: Error: command ['/sbin/mdadm', '-E', '/dev/sda3'] failed with exit code 1: mdadm: cannot open /dev/sda3: Permission denied
MDadmExamine.dev.sdb: Error: command ['/sbin/mdadm', '-E', '/dev/sdb'] failed with exit code 1: mdadm: cannot open /dev/sdb: Permission denied
MDadmExamine.dev.sdb1: Error: command ['/sbin/mdadm', '-E', '/dev/sdb1'] failed with exit code 1: mdadm: cannot open /dev/sdb1: Permission denied
MDadmExamine.dev.sdb2: Error: command ['/sbin/mdadm', '-E', '/dev/sdb2'] failed with exit code 1: mdadm: cannot open /dev/sdb2: Permission denied
MDadmExamine.dev.sdb3: Error: command ['/sbin/mdadm', '-E', '/dev/sdb3'] failed with exit code 1: mdadm: cannot open /dev/sdb3: Permission denied
MDadmExamine.dev.sdc: Error: command ['/sbin/mdadm', '-E', '/dev/sdc'] failed with exit code 1: mdadm: cannot open /dev/sdc: Permission denied
MDadmExamine.dev.sdc1: Error: command ['/sbin/mdadm', '-E', '/dev/sdc1'] failed with exit code 1: mdadm: cannot open /dev/sdc1: Permission denied
MDadmExamine.dev.sdc2: Error: command ['/sbin/mdadm', '-E', '/dev/sdc2'] failed with exit code 1: mdadm: cannot open /dev/sdc2: Permission denied
MDadmExamine.dev.sdc3: Error: command ['/sbin/mdadm', '-E', '/dev/sdc3'] failed with exit code 1: mdadm: cannot open /dev/sdc3: Permission denied
MDadmExamine.dev.sdd: Error: command ['/sbin/mdadm', '-E', '/dev/sdd'] failed with exit code 1: mdadm: cannot open /dev/sdd: Permission denied
MDadmExamine.dev.sdd1: Error: command ['/sbin/mdadm', '-E', '/dev/sdd1'] failed with exit code 1: mdadm: cannot open /dev/sdd1: Permission denied
MDadmExamine.dev.sdd2: Error: command ['/sbin/mdadm', '-E', '/dev/sdd2'] failed with exit code 1: mdadm: cannot open /dev/sdd2: Permission denied
MDadmExamine.dev.sdd3: Error: command ['/sbin/mdadm', '-E', '/dev/sdd3'] failed with exit code 1: mdadm: cannot open /dev/sdd3: Permission denied
MDadmExamine.dev.sde: Error: command ['/sbin/mdadm', '-E', '/dev/sde'] failed with exit code 1: mdadm: cannot open /dev/sde: Permission denied
MDadmExamine.dev.sde1: Error: command ['/sbin/mdadm', '-E', '/dev/sde1'] failed with exit code 1: mdadm: cannot open /dev/sde1: Permission denied
MDadmExamine.dev.sde2: Error: command ['/sbin/mdadm', '-E', '/dev/sde2'] failed with exit code 1: mdadm: cannot open /dev/sde2: Permission denied
MDadmExamine.dev.sde3: Error: command ['/sbin/mdadm', '-E', '/dev/sde3'] failed with exit code 1: mdadm: cannot open /dev/sde3: Permission denied
MDadmExamine.dev.sdf: Error: command ['/sbin/mdadm', '-E', '/dev/sdf'] failed with exit code 1: mdadm: cannot open /dev/sdf: Permission denied
MDadmExamine.dev.sdf1: Error: command ['/sbin/mdadm', '-E', '/dev/sdf1'] failed with exit code 1: mdadm: cannot open /dev/sdf1: Permission denied
MDadmExamine.dev.sdf2: Error: command ['/sbin/mdadm', '-E', '/dev/sdf2'] failed with exit code 1: mdadm: cannot open /dev/sdf2: Permission denied
MDadmExamine.dev.sdf3: Error: command ['/sbin/mdadm', '-E', '/dev/sdf3'] failed with exit code 1: mdadm: cannot open /dev/sdf3: Permission denied
MDadmExamine.dev.sdg: Error: command ['/sbin/mdadm', '-E', '/dev/sdg'] failed with exit code 1: mdadm: cannot open /dev/sdg: Permission denied
MDadmExamine.dev.sdg1: Error: command ['/sbin/mdadm', '-E', '/dev/sdg1'] failed with exit code 1: mdadm: cannot open /dev/sdg1: Permission denied
MDadmExamine.dev.sdg2: Error: command ['/sbin/mdadm', '-E', '/dev/sdg2'] failed with exit code 1: mdadm: cannot open /dev/sdg2: Permission denied
MDadmExamine.dev.sdg3: Error: command ['/sbin/mdadm', '-E', '/dev/sdg3'] failed with exit code 1: mdadm: cannot open /dev/sdg3: Permission denied
MachineType: Supermicro X7DCL
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.0.0-14-server root=UUID=92036d1d-e7c5-4249-a721-ae72f6ca3a65 ro rootdelay=30
SourcePackage: mdadm
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 01/12/2009
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 1.1a
dmi.board.name: X7DCL
dmi.board.vendor: Supermicro
dmi.board.version: PCB Version
dmi.chassis.type: 1
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 0123456789
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr1.1a:bd01/12/2009:svnSupermicro:pnX7DCL:pvr0123456789:rvnSupermicro:rnX7DCL:rvrPCBVersion:cvnSupermicro:ct1:cvr0123456789:
dmi.product.name: X7DCL
dmi.product.version: 0123456789
dmi.sys.vendor: Supermicro

Sergei Ianovich (ynvich-gmail) wrote :
Clint Byrum (clint-fewbar) wrote :

Sergey, thanks for the bug report. I understand that this seems too harsh, but we are being perhaps overly careful not to boot up a system with a broken array unless the user explicitly allows it.

Answering the debconf question to boot degraded will solve this issue for you, as it trades the security of never booting automatically while degraded for the convenience of always having the system boot up.
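For reference, a sketch of how the setting might be checked. It assumes that the debconf answer is stored as `BOOT_DEGRADED=true` in /etc/initramfs-tools/conf.d/mdadm and that a `bootdegraded=true` kernel parameter has the same effect; both are assumptions to verify against the script shipped with your release. The function `boot_degraded_enabled` is hypothetical and treats only the literal value `true` as enabled.

```python
def boot_degraded_enabled(conf_text: str, cmdline: str) -> bool:
    """Return True if either the initramfs conf file or the kernel command
    line asks for degraded boot.

    conf_text: shell-style KEY=value text (assumed to be
               /etc/initramfs-tools/conf.d/mdadm).
    cmdline:   the contents of /proc/cmdline.
    """
    from_conf = False
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "BOOT_DEGRADED":
            # Later assignments win, as they would when the file is sourced.
            from_conf = value.strip().strip("\"'") == "true"
    from_cmdline = any(
        token.split("=", 1)[1] == "true"
        for token in cmdline.split()
        if token.startswith("bootdegraded=")
    )
    return from_conf or from_cmdline


print(boot_degraded_enabled("BOOT_DEGRADED=true\n", "ro rootdelay=30"))  # True
```

With the kernel command line from this report (`rootdelay=30`, no such parameter), only the conf file could switch it on.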

Also, I believe this is a duplicate of bug #872220.

nanog (sorenimpey) wrote :

This bug is not a duplicate of #872220.

I run 6 large RAID5/6 arrays. After upgrading from hardy to precise, all of them were mistakenly detected as degraded by the init script. I can absolutely confirm that none of these drives were actually degraded. Because the init script hits a race condition and drops to busybox, this bug prevents my servers from being restarted remotely. This is a major regression.

The cause of and solution to this bug are detailed in this post:

http://ubuntuforums.org/showpost.php?p=11388915&postcount=18

Multiple users confirm that they are affected by this regression.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mdadm (Ubuntu):
status: New → Confirmed
nanog (sorenimpey) wrote :

Just to be clear: the switch to udev (instead of the Debian init script) caused the problem. udev does not wait for the RAID array to settle and mistakenly marks it as degraded.
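If this diagnosis is right, the problem is judging an array before udev has delivered all of its members. One way to avoid that is to re-read /proc/mdstat until no array is still `inactive` (or a timeout expires) before deciding anything. This Python sketch is not what the Ubuntu scripts do. The parser handles only the common line shapes (`mdN : active|inactive ...` and the `[n/m]` member counts), and `parse_mdstat` and `wait_for_arrays`, with its injectable `clock`/`sleep` parameters, are hypothetical.

```python
import re
import time
from typing import Callable, Optional

_HEADER = re.compile(r"^(md\S+)\s*:\s*(active|inactive)\b")
_COUNTS = re.compile(r"\[(\d+)/(\d+)\]")


def parse_mdstat(text: str) -> dict[str, str]:
    """Map each md device in /proc/mdstat text to 'clean', 'degraded' or 'inactive'."""
    states: dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        header = _HEADER.match(line)
        if header:
            current = header.group(1)
            states[current] = "clean" if header.group(2) == "active" else "inactive"
            continue
        counts = _COUNTS.search(line)
        # "[3/2]" means 3 member slots, 2 working: the array runs degraded.
        if current is not None and counts and states[current] == "clean":
            if int(counts.group(2)) < int(counts.group(1)):
                states[current] = "degraded"
    return states


def wait_for_arrays(read_mdstat: Callable[[], str], timeout: float,
                    interval: float = 0.5,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> dict[str, str]:
    """Re-read mdstat until no array is inactive or timeout seconds pass,
    then return the last parsed state."""
    deadline = clock() + timeout
    while True:
        states = parse_mdstat(read_mdstat())
        if "inactive" not in states.values() or clock() >= deadline:
            return states
        sleep(interval)


early = "md1 : inactive sdc1[0](S)\n      976759808 blocks super 1.2\n"
later = ("md1 : active raid5 sde1[2] sdd1[1] sdc1[0]\n"
         "      1953519872 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]\n")
snapshots = iter([early, later])
print(wait_for_arrays(lambda: next(snapshots), timeout=30, sleep=lambda s: None))
# {'md1': 'clean'}
```

Only an array that is still incomplete after the timeout would then be a candidate for a forced degraded start.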

nanog (sorenimpey) wrote :

Please unmark this bug as a duplicate. It is not a duplicate of https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/872220.

Phillip Susi (psusi) wrote :

Why do you think it is not a duplicate?

paul fox (pgf-launchpad) wrote :

i can't tell from the initial description whether or not there was a truly degraded array on his system. perhaps not, in which case the fixes described in #942106 would help.

but since the topic of this bug very precisely describes my current issue, and my issue is definitely not a duplicate, i'm commenting here.

i'm running oneiric.

my root fs is _not_ a RAID disk.

i have a single mirrored RAID pair, which is _not_ mentioned in mdadm.conf.

this pair is _always_ degraded, intentionally so. so it's clearly important for my system to boot with it in a degraded state.

[rationale, for the curious: the disk stores my BackupPC backup pool, which is extremely difficult (impossible, really) to copy (for offsite purposes) with standard rsync/cp tools, due to its huge numbers of hardlinks. it could be copied using dd, but that would involve taking the disk offline for long periods. instead, i do my backups to one half of a degraded RAID. when i want a copy for offsite storage i add the missing disk, sync it (which all happens with the pair online), then (the next day, when syncing is done) briefly shut down BackupPC, unmount the RAID pair, manually fail the offsite disk, remove it, and remount the now-degraded pair.]
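The rotation described above maps onto a short, fixed sequence of commands. This Python sketch only builds the argument lists and runs nothing. The `mdadm` manage-mode options (`--add`, `--fail`, `--remove`) and `mdadm --wait` are standard, but the device names, the mount point, the `backuppc` service name, and the use of `--wait` in place of "the next day" are assumptions, and `offsite_rotation` is a hypothetical helper.

```python
def offsite_rotation(md: str, offsite_dev: str, mountpoint: str) -> list[list[str]]:
    """Return, in order, the commands for one offsite copy; nothing is run."""
    return [
        ["mdadm", md, "--add", offsite_dev],     # rejoin the offsite disk; resync runs online
        ["mdadm", "--wait", md],                 # block until the resync has finished
        ["service", "backuppc", "stop"],         # service name is an assumption
        ["umount", mountpoint],
        ["mdadm", md, "--fail", offsite_dev],    # mark the offsite disk faulty...
        ["mdadm", md, "--remove", offsite_dev],  # ...then detach it from the array
        ["mount", mountpoint],                   # assumes an /etc/fstab entry
        ["service", "backuppc", "start"],
    ]


for argv in offsite_rotation("/dev/md0", "/dev/sdc1", "/var/lib/backuppc"):
    print(" ".join(argv))
```

Returning argument lists rather than running them keeps the sketch safe to read and test; each list could be passed to `subprocess.run(argv, check=True)` once the names are adapted.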

clearly, setting BOOT_DEGRADED will fix (i hope) my problem -- i only have one RAID, and maintain it in a somewhat unusual way.

but i've been thinking of using RAID for my root disk soon. if i do, i may well not want to set BOOT_DEGRADED. so i'll be stuck.

here's the critical part of the original description: "The complete solution will be to provide BOOT_DEGRADED on a per array basis." and, as a corollary, booting shouldn't be prevented when degradation is detected in an array that's not even configured.
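The corollary requires telling configured arrays apart from unconfigured ones. Here is a simplified Python sketch, assuming that arrays are matched by the `UUID=` value on `ARRAY` lines, that a line beginning with whitespace continues the previous line, and that a word starting with `#` begins a comment (as mdadm.conf(5) describes). It ignores the other identification tags (`devices=`, `name=`, the device path). `configured_array_uuids` and `may_block_boot` are hypothetical helpers, and the UUIDs in the example are made up.

```python
def configured_array_uuids(conf_text: str) -> set[str]:
    """Collect the UUID= values of ARRAY lines in mdadm.conf text."""
    logical: list[list[str]] = []
    for raw in conf_text.splitlines():
        words: list[str] = []
        for word in raw.split():
            if word.startswith("#"):
                break  # a word starting with '#' begins a comment
            words.append(word)
        if not words:
            continue
        if raw[0] in " \t" and logical:
            logical[-1].extend(words)  # continuation of the previous line
        else:
            logical.append(words)
    uuids: set[str] = set()
    for words in logical:
        if words[0] == "ARRAY":
            for word in words[1:]:
                if word.lower().startswith("uuid="):
                    uuids.add(word.split("=", 1)[1].lower())
    return uuids


def may_block_boot(array_uuid: str, conf_text: str) -> bool:
    """Proposed rule: only arrays listed in mdadm.conf may stop the boot."""
    return array_uuid.lower() in configured_array_uuids(conf_text)


conf = ("DEVICE partitions\n"
        "ARRAY /dev/md0 metadata=1.2\n"
        "   UUID=3aaa0122:29827cfa:5331ad66:ca767371 name=srv:0\n")
print(may_block_boot("00000000:00000000:00000000:00000001", conf))  # False
```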

Dimitri John Ledkov (xnox) wrote :

@paul fox (pgf-launchpad)

I think that, for your unusual setup, you should write a udev rule that overrides the mdadm.udev rule and does the right thing for your special case, until improvements are made to unconditionally start incomplete non-rootfs RAIDs.
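This suggestion relies on udev's override mechanism. As described in udev(7), rules files are read from /lib/udev/rules.d, /run/udev/rules.d and /etc/udev/rules.d. A file in /etc replaces a same-named file in /run or /lib, and all surviving files are processed together in lexical order of their names. The Python sketch below models only that file selection and does not parse any rules. The file name `85-mdadm.rules` in the example is an assumption about what the mdadm package installs, and `effective_rules` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Highest priority first, following udev(7): /etc beats /run, which beats /lib.
RULE_DIRS = ["/etc/udev/rules.d", "/run/udev/rules.d", "/lib/udev/rules.d"]


def effective_rules(files: list[str]) -> list[str]:
    """Given full paths of candidate files, return the .rules files udev
    would actually read, in processing order."""
    chosen: dict[str, str] = {}
    for path in files:
        p = PurePosixPath(path)
        if p.suffix != ".rules" or str(p.parent) not in RULE_DIRS:
            continue
        rank = RULE_DIRS.index(str(p.parent))
        current = chosen.get(p.name)
        if current is None or rank < RULE_DIRS.index(str(PurePosixPath(current).parent)):
            chosen[p.name] = path
    # Surviving files are processed together, sorted by file name alone.
    return [chosen[name] for name in sorted(chosen)]


print(effective_rules([
    "/lib/udev/rules.d/85-mdadm.rules",
    "/etc/udev/rules.d/85-mdadm.rules",  # local copy with the special case
    "/lib/udev/rules.d/60-persistent-storage.rules",
]))
```

A local copy in /etc with the same name therefore replaces the packaged rule entirely, so it must carry over everything the original rule did, not just the special case.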

Mitch Claborn (mitch-news) wrote :

This also doesn't seem like a duplicate to me.

I built a brand-new server with a RAID5 array. The array was in perfect shape, but the system would not boot and dropped to the initramfs shell. http://ubuntuforums.org/showpost.php?p=11388915&postcount=18 fixed it for me.
