Race condition at system-boot: md-RAID not always ready in time

Bug #610107 reported by Arno Wagner on 2010-07-26
This bug affects 5 people
Affects: udev (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: udev

Almost every time I start the system, I get error messages concerning my md-RAID devices.

Example:
"udevd-work[77]: inotify_add_watch(6, /dev/md1, 10) failed: No such file or directory"

Sometimes only one of my several RAIDs is affected, sometimes several at once.
If the error hits the RAID holding my root filesystem, the system won't boot and drops to a shell.
If only data drives are affected, the boot finishes normally and all RAID devices are up by then.
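The symptom is a classic ordering race: the root filesystem mount is attempted before mdadm has finished assembling the array. As a sketch (not a fix from this report), one defensive option is a wait loop in an initramfs script that polls for the device node; the device name and timeout below are illustrative assumptions.

```shell
# Poll until the given device node exists as a block device, up to a
# timeout in seconds.  /dev/md1 and the limits are assumptions.
wait_for_md() {
    dev="$1"
    tries="${2:-10}"
    while [ "$tries" -gt 0 ]; do
        [ -b "$dev" ] && return 0   # block device exists: array is up
        sleep 1
        tries=$((tries - 1))
    done
    return 1                        # gave up: device never appeared
}

wait_for_md /dev/md1 2 || echo "md device still missing" >&2
```

A cruder workaround with the same effect is booting with the `rootdelay=` kernel parameter to give assembly more time before the root mount.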

There are no notable entries in the logs.

The problem occurs on all three Lucid installations I have made so far. (One was an upgrade from Karmic, the next a fresh install, both on real hardware. The third, from which this report is filed, is a fresh installation in VirtualBox.)

/etc/mdadm/mdadm.conf:
# definitions of existing MD arrays
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=0.90 UUID=cd601f2c:76c2a84f:2d20de61:3cd29610
ARRAY /dev/md1 level=raid1 num-devices=2 metadata=0.90 UUID=5057c6bb:a8f652b9:2d20de61:3cd29610
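Since the initramfs carries its own copy of mdadm.conf, a mismatch between the ARRAY lines above and the real superblock UUIDs can produce similar boot-time failures. A hedged way to rule that out (the commented commands modify the system and are general practice, not steps from this report):

```shell
# Print ARRAY definitions derived from the currently running arrays and
# compare them with /etc/mdadm/mdadm.conf.  Guarded so it is a no-op on
# a machine without mdadm installed.
if command -v mdadm >/dev/null 2>&1; then
    mdadm --detail --scan
    # If the output disagrees with /etc/mdadm/mdadm.conf, update the file
    # and refresh the copy embedded in the initramfs (run as root):
    #   mdadm --detail --scan >> /etc/mdadm/mdadm.conf
    #   update-initramfs -u
fi
```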

blkid:
/dev/sda1: UUID="712d6c10-d9a5-4471-83b8-1e1f2749f817" TYPE="ext4"
/dev/sda5: UUID="696c8371-1612-4654-ac21-a7b08b35c950" TYPE="swap"
/dev/sdb1: UUID="cd601f2c-76c2-a84f-2d20-de613cd29610" TYPE="linux_raid_member"
/dev/sdc1: UUID="cd601f2c-76c2-a84f-2d20-de613cd29610" TYPE="linux_raid_member"
/dev/sdd1: UUID="5057c6bb-a8f6-52b9-2d20-de613cd29610" TYPE="linux_raid_member"
/dev/sde1: UUID="5057c6bb-a8f6-52b9-2d20-de613cd29610" TYPE="linux_raid_member"
/dev/md1: UUID="tM2vUv-zY1H-i1LW-wlDb-3qit-23uE-33fxP6" TYPE="LVM2_member"
/dev/md0: UUID="5MJTwi-LIkA-oxV4-328f-nqD6-nev6-fFzRHD" TYPE="LVM2_member"
/dev/mapper/vg1-test: UUID="de6e51c6-4cce-4810-9cb9-47d1a730ca6d" TYPE="jfs"

Ubuntu-Release:
Description: Ubuntu 10.04.1 LTS
Release: 10.04

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: udev 151-12
ProcVersionSignature: Ubuntu 2.6.32-24.38-generic 2.6.32.15+drm33.5
Uname: Linux 2.6.32-24-generic i686
Architecture: i386
CustomUdevRuleFiles: 70-xorg-vboxmouse.rules 60-vboxadd.rules
Date: Mon Jul 26 16:18:20 2010
InstallationMedia: Ubuntu 10.04 "Lucid Lynx" - Beta i386 (20100318)
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: innotek GmbH VirtualBox
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-24-generic root=UUID=712d6c10-d9a5-4471-83b8-1e1f2749f817 ro quiet splash
ProcEnviron:
 LANG=de_DE.utf8
 SHELL=/bin/bash
SourcePackage: udev
dmi.bios.date: 12/01/2006
dmi.bios.vendor: innotek GmbH
dmi.bios.version: VirtualBox
dmi.modalias: dmi:bvninnotekGmbH:bvrVirtualBox:bd12/01/2006:svninnotekGmbH:pnVirtualBox:pvr1.2:
dmi.product.name: VirtualBox
dmi.product.version: 1.2
dmi.sys.vendor: innotek GmbH

Changed in udev (Ubuntu):
status: New → Confirmed
Gernot Hillier (gernot-hillier) wrote:

We also sporadically see somewhat similar issues here on a number of machines running 10.04.1. In our case we only have a data RAID, which is not needed to mount the root partition, and on some boots it stays only half-assembled.

Currently, we think it's caused by udevd being killed in the middle of its operation, see #613273.

If "mdadm --incremental" is interrupted at the wrong moment, it seems to cause a variety of strange problems: from a leftover /dev/.tmp.md.8:xx that makes mdadm bail out with "Strange error loading metadata for /dev/md0" on all further operations until reboot, to completely damaged data structures in the kernel, with wrong device numbers, half-busy devices, and the like.
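When a non-root array is left in such a half-assembled state, the usual manual recovery is to tear the device down and assemble it again, so that mdadm re-reads the on-disk metadata instead of the stale in-kernel state. The device name below is illustrative; this is a recovery sketch, not a fix for the underlying interruption.

```shell
# Recovery sketch for a half-assembled data array (not the root array).
# Guarded so it does nothing on a machine without /dev/md0.
if [ -b /dev/md0 ]; then
    mdadm --stop /dev/md0          # release the stale in-kernel device
    mdadm --assemble /dev/md0      # reassemble from mdadm.conf/superblocks
fi
# For a RAID1, both members should then show as active, i.e. "[UU]":
cat /proc/mdstat 2>/dev/null || true
```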
