vol_id doesn't guarantee that mdadm/lvm/etc. have actually finished with the device, so only mitigates and does not solve mountroot race

Bug #244926 reported by Etienne Goyer on 2008-07-02
Affects                    Importance   Assigned to
initramfs-tools (Ubuntu)   Undecided    Scott James Remnant (Canonical)
Hardy                      Undecided    Unassigned
Intrepid                   Undecided    Scott James Remnant (Canonical)

Bug Description

Under certain circumstances, boot may fail and stop at an initramfs shell prompt with an error such as:

     mount: Mounting /dev/md0 on /root failed: device or resource busy

This could happen, for example, when the root file system is on a software RAID 1 volume. The problem could be resolved by adding a "sleep 10" or "udevadm settle" to /usr/share/initramfs-tools/init, before the mountroot function is called.

Discussing the problem with Scott, it appears that the race condition could be better avoided by letting udev settle after each call to vol_id in /usr/share/initramfs-tools/scripts/local instead. A package in his PPA implements such a change: https://launchpad.net/~scott/+archive.
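
As a rough illustration of that workaround, the sketch below waits for udev's event queue to drain and then runs whatever command follows. The settle_then wrapper and the UDEVADM variable are hypothetical names used only for this example; in the real init script the settle call would simply sit on the line before mountroot, and the initramfs would presumably need regenerating (for example with update-initramfs -u) for the change to take effect.

```shell
# Hypothetical wrapper, not part of initramfs-tools: it only illustrates
# "let udev settle, then mount root". UDEVADM is an assumed override hook
# so the sketch can be exercised outside an initramfs.
UDEVADM="${UDEVADM:-/sbin/udevadm}"

settle_then() {
    # Block until udev has processed its pending events. The exit status is
    # ignored on purpose: like "sleep 10", this is a best-effort delay.
    "${UDEVADM}" settle
    # Run the command passed in, e.g. mountroot.
    "$@"
}
```

In the init script this would correspond to calling settle_then mountroot in place of the bare mountroot call.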

== SRU ==

Impact: Boot failure can happen intermittently. The failure mode is very unfriendly (being dumped in an initramfs shell). The cause is non-apparent and hard to find.

Development: This problem has been fixed in intrepid, specifically in version 0.92bubuntu2. However, the changelog does not mention LP: #244926.

Patch for the current version in hardy: http://launchpadlibrarian.net/15761888/initramfs-tools_0.85eubuntu36_0.85eubuntu39.2.diff.gz

TEST CASE: I was unable to reproduce the problem reliably. The problem was reported to me (EtienneG) by a third party. This third party confirmed that the fix provided by Scott makes booting reliable.

Regression potential: None that I can see.

The initramfs-tools fixes, as packaged in Scott's PPA, have been reported to fix the problem on a system where the boot failed intermittently (believed to be related to having the root filesystem on a software RAID 1 volume).

Following a discussion with Scott on IRC, I would like to ask for an SRU in hardy for this bug.

Scott: for the benefit of the SRU team, can you confirm that 0.92bubuntu2 actually fixes this bug in intrepid, and mark this bug task "Fix Released" for intrepid? Thanks!

description: updated
Martin Pitt (pitti) wrote :

I subscribed Scott, so that he has a chance to answer. Please clarify what the patch is (the description just refers to the entire package diff.gz) and the intrepid status. Based on the description, the change sounds pretty straightforward and safe, thus ok for SRU.

Changed in initramfs-tools:
status: New → Incomplete
assignee: nobody → scott

diff -Nru /tmp/eAxeH4D1AR/initramfs-tools-0.85eubuntu36/scripts/local /tmp/QfAijt3GjD/initramfs-tools-0.85eubuntu39.2/scripts/local
--- initramfs-tools-0.85eubuntu36/scripts/local 2008-04-09 15:18:14.000000000 +0100
+++ initramfs-tools-0.85eubuntu39.2/scripts/local 2008-07-02 16:44:28.000000000 +0100
@@ -9,7 +9,7 @@

  # If the root device hasn't shown up yet, give it a little while
  # to deal with removable devices
- if [ ! -e "${ROOT}" ] || ! /lib/udev/vol_id "${ROOT}" >/dev/null 2>&1; then
+ if [ ! -e "${ROOT}" ] || ! /lib/udev/vol_id "${ROOT}" >/dev/null 2>&1 || ! /sbin/udevadm settle; then
   log_begin_msg "Waiting for root file system..."

   # Default delay is 180s
@@ -23,7 +23,7 @@
   fi

   slumber=$(( ${slumber} * 10 ))
- while [ ! -e "${ROOT}" ] || ! /lib/udev/vol_id "${ROOT}" >/dev/null 2>&1; do
+ while [ ! -e "${ROOT}" ] || ! /lib/udev/vol_id "${ROOT}" >/dev/null 2>&1 || ! /sbin/udevadm settle; do
    /bin/sleep 0.1
    slumber=$(( ${slumber} - 1 ))
    [ ${slumber} -gt 0 ] || break
@@ -40,7 +40,7 @@
  fi

  # We've given up, but we'll let the user fix matters if they can
- while [ ! -e "${ROOT}" ] || ! /lib/udev/vol_id "${ROOT}" >/dev/null 2>&1; do
+ while [ ! -e "${ROOT}" ] || ! /lib/udev/vol_id "${ROOT}" >/dev/null 2>&1 || ! /sbin/udevadm settle; do
   echo " Check root= bootarg cat /proc/cmdline"
   echo " or missing modules, devices: cat /proc/modules ls /dev"
   panic -r "ALERT! ${ROOT} does not exist. Dropping to a shell!"

Changed in initramfs-tools:
status: Incomplete → Fix Released

I pasted the relevant bit of the diff above. It is actually one of those new LP package version-to-version diffs, though it appears to have picked a larger diff than it needed to.

Fundamentally we call udevadm settle after vol_id to guarantee that udev has actually finished with the block device. This means we check:

  1) the device exists, i.e. udev has created it
  2) vol_id returns valid data for the filesystem, i.e. any LVM or mdadm volume has been activated
  3) udev has settled, i.e. any binaries run by udev have exited and there are no locks
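
The three checks above can be sketched as a single shell function. This is only an illustration, not code from the package: the root_ready name and the VOL_ID and UDEVADM override variables are hypothetical, and the default paths are the ones used in the patch.

```shell
# Hypothetical helper, not part of initramfs-tools. VOL_ID and UDEVADM are
# assumed override hooks; their defaults are the paths used in the patch.
VOL_ID="${VOL_ID:-/lib/udev/vol_id}"
UDEVADM="${UDEVADM:-/sbin/udevadm}"

root_ready() {
    dev="$1"
    # 1) The device node exists, i.e. udev has created it.
    [ -e "${dev}" ] || return 1
    # 2) vol_id can read valid filesystem data from it.
    "${VOL_ID}" "${dev}" >/dev/null 2>&1 || return 1
    # 3) udev has settled, so nothing it spawned still holds the device.
    "${UDEVADM}" settle || return 1
    return 0
}
```

With such a helper, each of the patched conditions would read as a negated root_ready "${ROOT}" call; the patch itself keeps the three tests inline.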

Steve Langasek (vorlon) wrote :

Accepted into -proposed, please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in initramfs-tools:
status: New → Fix Committed
Pascal DANEK (hunal) wrote :

I confirm that this patch has fixed our intermittent boot issue.

Martin Pitt (pitti) wrote :

Copied to hardy-updates.

Changed in initramfs-tools:
status: Fix Committed → Fix Released