[natty] boot hangs / udev vgchange deadlock in nested vgs?

Bug #789930 reported by Jens Wilke
This bug affects 4 people
Affects: udev (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: udev

After upgrading a server system from lucid to natty, the system no longer boots.
The boot stalls before mounting the root filesystem.

After a while the following messages appear:

udevd[138]: worker [168] exit
udevd[138]: worker [168] unexpectedly returned with status 0x0100
udevd[138]: worker [168] failed while handling '/devices/pci0000:00/0000:00:03.0/0000:03:00.0/host0/target0:2:0/0:2:0:0/block/sda/sda1'
udevd[138]: seq 1664 done with -32
udevd[138]: worker [168] cleaned up
udevd[138]: worker [167] exit
udevd[138]: worker [167] unexpectedly returned with status 0x0100
udevd[138]: worker [167] failed while handling '/devices/pci0000:00/0000:00:03.0/0000:03:00.0/host0/target0:2:1/0:2:1:0/block/sdb/sdb1'
udevd[138]: seq 1661 done with -32
udevd[138]: worker [167] cleaned up
udevd[138]: worker [173] exit
udevd[138]: worker [173] unexpectedly returned with status 0x0100
udevd[138]: worker [173] failed while handling '/devices/virtual/block/dm-5'
udevd[138]: seq 1770 done with -32
udevd[138]: worker [173] cleaned up

Jens Wilke (jw-launchpad) wrote:

The special thing about this system is that it is used as a virtual server host, and its logical volumes contain volume groups for the guest systems. After debugging with udev debug output enabled, I found this relevant message:

. . .
udevd-work[173]: creating link '/dev/SAS/vb-flat-fast' to '/dev/dm-5'
udevd-work[173]: creating symlink '/dev/SAS/vb-flat-fast' to '../dm-5'
udevd-work[173]: created db file '/dev/.udev/data/b251:5' for '/devices/virtual/block/dm-5'
udevd-work[173]: 'watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'' started

After this the system stalls.

1. The LVs contained in the system VG are not fully processed yet; the LV containing the root filesystem is not yet available.
2. The LV above is the first one that contains another VG, which triggers another vgchange.

So the problem seems to be a deadlock when volume groups are activated recursively.
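
For illustration, a minimal sketch of this kind of layout (SAS and vb-flat-fast appear in the log above; the guest VG/LV names are made up): an LV in the host VG is itself used as a PV for a guest VG, so activating the host VG creates new block devices that again match the LVM udev rule.

---snip---
# Sketch only: nested LVM layout assumed from the log above.
pvcreate /dev/sda1
vgcreate SAS /dev/sda1                  # host VG
lvcreate -L 20G -n vb-flat-fast SAS     # LV handed to a guest
pvcreate /dev/SAS/vb-flat-fast          # the LV itself becomes a PV ...
vgcreate guestvg /dev/SAS/vb-flat-fast  # ... for a nested guest VG (hypothetical name)
lvcreate -L 10G -n guestroot guestvg    # hypothetical guest LV
---snip---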

I fixed the problem by overriding the default 85-lvm2.rules udev rule with /etc/udev/rules.d/85-lvm2.rules:

---snip---
# override default 85-lvm2.rules and don't run vgchange recursively on LV devices (ENV{DM_LV_NAME}=="")
# 2011-05-29;jw
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="lvm*|LVM*", ENV{DM_LV_NAME}=="" \
        RUN+="watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'"
---snip---

And "update-initramfs -u".

It must be noted that the same system boots fine with Lucid without this modification.

Kotusev (kotusev) wrote:

Seems to be the same problem here:
After upgrading to natty, about once in every ~10 reboots the system hangs while the initrd tries to bring up LVM.

I have no rules in /etc/udev/rules.d in my initrd.
I have no nested LVs and even have this filter in my lvm.conf: filter = [ "a|^/dev/sd.*|", "r|.*|" ]

ps shows this:
  244 0 4124 S /lib/udev/watershed sh -c /sbin/lvm vgscan; /sbin/lv
  245 0 4412 S sh -c /sbin/lvm vgscan; /sbin/lvm vgchange -a y
  247 0 4124 S /lib/udev/watershed sh -c /sbin/lvm vgscan; /sbin/lv
  248 0 34996 S < /sbin/lvm vgchange -a y

No LV is in the active state, so the initrd cannot mount the root filesystem.
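
For what it's worth, a sketch of how to check that once the boot drops to the (initramfs) shell:

---snip---
# Sketch: inspect LV activation state from the busybox initramfs shell.
lvm lvscan            # shows ACTIVE / inactive per LV
lvm vgchange -a y     # manually activating may let the boot continue after `exit`
---snip---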

Changed in udev (Ubuntu):
status: New → Confirmed
Wessel Dankers (wsl) wrote:

I think this happens because the lvm2 commands are called from inside udev, while lvm itself also waits for udev to complete. This creates an obvious deadlock.

/lib/udev/rules.d/85-lvm2.rules should be changed to read:

SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="lvm*|LVM*", \
 RUN+="watershed sh -c '/sbin/lvm vgscan --noudevsync; /sbin/lvm vgchange --noudevsync -a y'"

cheers

Wessel Dankers (wsl) wrote:

I was a little overzealous with the application of --noudevsync there; only vgchange accepts that option.

SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="lvm*|LVM*", \
 RUN+="watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange --noudevsync -a y'"
