udev and lvm2 hang at boot

Bug #906358 reported by Filip Granö
This bug affects 18 people
Affects         Status     Importance  Assigned to  Milestone
linux (Ubuntu)  Invalid    High        Unassigned
udev (Ubuntu)   Confirmed  Undecided   Unassigned

Bug Description

I'm experiencing a really long wait when booting a freshly installed Xubuntu 11.10 with only updates and mdadm + lvm2 installed. I have managed to narrow it down somewhat but am still looking for a fix.

After disabling splash and quiet from grub I noticed it's waiting at /scripts/init-bottom/udev in initrd and continues after 61 seconds.
Not surprisingly, I found a "udevadm control --timeout 61 --exit" line in there. But why does it fail so badly that it waits for the timeout before exiting?

After quite a bit of googling I found this:
http://us.generation-nt.com/answer/problem-lvm-gets-stuck-during-booting-due-recent-uevent-change-help-205241751.html
Ari Savolainen writes:
An init script (/scripts/init-bottom/udev in initrd) issues command
"udevadm control --timeouta --exit".
At the same time udevd is executing "/sbin/lvm vgchange -a y" (from
/lib/udev/rules.d/85-lvm2.rules) that calls ioctl to resume a logical
volume. After that lvm gets stuck forever. Booting continues after the
61 second timeout.

Milan Broz writes:
If you call vgchange or even vgscan from udev rule, it is completely wrong.
This is not lvm upstream udev rule btw.

This makes me slightly worried: does this cause any problems other than an annoyingly slow boot? Is there any way to fix this? Is this a known issue?

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in udev (Ubuntu):
status: New → Confirmed
Dave Gilbert (ubuntu-treblig) wrote :

Hi Filip,
  That sounds like it's the same as what I used to have in bug 625395

David (dogge2k-deactivatedaccount) wrote :

It seems the problem is due to udevadm called from the initramfs (/usr/share/initramfs-tools/scripts/init-bottom/udev). On my system udevadm runs for 61 seconds and times out (see bootchart). It only occurs with kernel 3.0.0-14, older kernels are working fine. Reducing the timeout to 20 seconds will result in an unusable system. I get an error message that the lvm partition is not found.

Christian Weiske (cweiske) wrote :

Disabling the rule and running `update-initramfs -u` afterwards leads to a non-bootable system :)

Davide (darkenergyreactor) wrote :

Hi, I am experiencing the same bug on some of my machines with kernel 3.0.0-14.
I found you can easily reproduce the issue on a VirtualBox VM by installing a minimal system on a logical volume.
The system will hang for 61 sec at next reboot if you use lvcreate to create additional logical volumes.

David (dogge2k-deactivatedaccount) wrote :

I did some more research on this problem and it's udevd itself. Activating the debug output gives me the following message after 60 seconds:

timeout, giving up waiting for workers to finish

This message comes from udevd.c:1626 when epoll_wait times out. After this message udevd will immediately exit without waiting for the other worker threads. I'm not sure why the kernel doesn't signal udevd.
I've also tested this with the recent kernel 3.0.0-15 from oneiric-proposed, with the same result.
Maybe we should raise this bug to critical, because Ubuntu is currently not really usable on an LVM partition.

Peter Matulis (petermatulis) wrote :

@dogge2k

Can you confirm that this happens with any type of LVM configuration? Desktop Edition and/or Server Edition?

I wouldn't say that a one-minute pause translates to "unusable".

Peter Matulis (petermatulis) wrote :

I just did a fresh install of 11.10 Desktop (KVM guest) with the root filesystem on a logical volume (and /boot not), and there is no pause at all. It is running the 3.0.0-14 generic kernel.

Mikael Rapp (micke-rapp) wrote :

I experience the same issue on Ubuntu Server (11.10), but only after a few hours of fiddling with lxc and brctl.

My system freezes completely (not a 60-second pause), but I guess that is due to the fact that my / is on LVM.
The only way for me to access the system is to start it in recovery mode -> remount -> resume.

Dave Gilbert (ubuntu-treblig) wrote :

Mikael: It sounds like you have a different problem there - this bug number is just for bugs that occur during boot, and give a shortish pause to the boot process, but it then carries on.

Peter: It doesn't happen with any lvm config - I'm running LVM and I don't see it at the moment, but have in the past.

Peter Matulis (petermatulis) wrote :

This is not an obvious problem. I'm changing the status to 'Incomplete' until we can narrow down the problem some more.

1. Let's try the recipe given by darkenergyreactor. He stated that it is reliably reproducible if you manually create logical volumes (with lvcreate) post-install.

2. For those people that have been affected by this, does the above sound familiar?

Changed in udev (Ubuntu):
status: Confirmed → Incomplete
David (dogge2k-deactivatedaccount) wrote :

@petermatulis

I couldn't reproduce this problem with a fresh installation (desktop edition, amd64), but the problem still exists on my other system. I'm not sure if there are other dependencies which might trigger this problem, but it seems I'm not the only one who has encountered it.

Christian Weiske (cweiske) wrote :

The machine I have problems with has this partition setup:

$ LC_ALL=C fdisk -l /dev/sda

Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000b8523

   Device Boot      Start         End       Blocks   Id  System
/dev/sda1   *          63     1048638       524288   83  Linux
/dev/sda3       126881370  2930272064   1401695347+   5  Extended
/dev/sda5       126881433   246870854     59994711   83  Linux
/dev/sda6       246870918   254871224      4000153+  82  Linux swap / Solaris
/dev/sda7       254871288  2930272064   1337700388+  83  Linux

sda7 is the partition on which I have my LVM setup:

$ lvmdiskscan |grep LVM
  /dev/sda7 [ 1,25 TiB] LVM physical volume
  0 LVM physical volume whole disks
  1 LVM physical volume

LVM setup:

$ lvm vgs
  VG    #PV #LV #SN Attr   VSize VFree
  Daten   1   2   0 wz--n- 1,25t 575,73g
$ lvm lvs
  LV    VG    Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  games Daten -wi-ao 400,00g
  homes Daten -wi-ao 300,00g

Mounts:
/dev/mapper/Daten-homes on /home type ext4 (rw,commit=0)
/dev/mapper/Daten-games on /home/spiele type ext4 (rw,commit=0)

Peter Matulis (petermatulis) wrote :

@Christian

Did you create the logical volume post-install?

Provide all information on how you set up your machine outside of the installer itself.

Everybody please read my comment #13 so I don't have to keep repeating myself. Thanks.

Christian Weiske (cweiske) wrote :

I'm working on reproducing it but am not there yet; I wanted to share the setup in the meantime.

Michael Kofler (michael-kofler) wrote :

Same problem: first an Ubuntu 11.10 installation into a 'normal' partition, all fine.

Later I set up a Logical Volume (actually using CentOS, which is also installed on the machine). I installed lvm2 in Ubuntu and added the path to the LV in /etc/fstab. The Logical Volume is available after boot, no error messages, but a 60 sec boot lag, which is also visible in dmesg:

...
[ 1.607767] scsi6 : pata_marvell
[ 1.607831] scsi7 : pata_marvell
[ 1.607859] ata7: PATA max UDMA/100 cmd 0xc040 ctl 0xc030 bmdma 0xc000 irq 19
[ 1.607861] ata8: PATA max UDMA/133 cmd 0xc020 ctl 0xc010 bmdma 0xc008 irq 19
[ 1.636055] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
[ 1.707674] usb 2-1: new high speed USB device number 2 using ehci_hcd
[ 1.811657] firewire_core: created device fw0: GUID 001e8c0000507f90, S400
[ 1.839547] Refined TSC clocksource calibration: 3311.144 MHz.
[ 1.839553] Switching to clocksource tsc
[ 1.852211] hub 2-1:1.0: USB hub found
[ 1.852422] hub 2-1:1.0: 8 ports detected
[ 2.123415] usb 2-1.3: new low speed USB device number 3 using ehci_hcd
[ 2.291274] usb 2-1.4: new low speed USB device number 4 using ehci_hcd
[ 2.475120] usb 2-1.5: new high speed USB device number 5 using ehci_hcd
[ 2.576201] hub 2-1.5:1.0: USB hub found
[ 2.576284] hub 2-1.5:1.0: 3 ports detected
[ 2.846809] usb 2-1.5.2: new low speed USB device number 6 using ehci_hcd
[ 62.062792] udevd[333]: starting version 173
[ 62.084227] lp: driver loaded but no devices found
[ 62.085550] coretemp coretemp.0: TjMax is 98 C.
[ 62.085556] coretemp coretemp.0: TjMax is 98 C.
[ 62.085561] coretemp coretemp.0: TjMax is 98 C.
[ 62.085567] coretemp coretemp.0: TjMax is 98 C.
[ 62.167185] wmi: Mapper loaded
...

My computer has a really fast SSD, so it is not a slow external (net) disk which causes the delay.
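For anyone who wants to check their own log for the same stall, here is a minimal sketch that scans dmesg-style output for jumps between consecutive timestamps. The function name find_boot_gaps and the 10-second default threshold are illustrative choices, not part of any existing tool.

```shell
# find_boot_gaps [SECONDS]: read dmesg-style text on stdin and report every
# line whose timestamp is more than SECONDS (default 10) after the previous one.
# Hypothetical helper, written only to illustrate the idea.
find_boot_gaps() {
    threshold="${1:-10}"
    awk -v limit="$threshold" '
        # Only consider lines that start with a "[ seconds.micros]" timestamp.
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
            if (seen && ts - prev > limit) {
                printf "gap of %.1f s before: %s\n", ts - prev, $0
            }
            prev = ts
            seen = 1
        }
    '
}

# Usage:
#   dmesg | find_boot_gaps 10
```

Fed the excerpt above, it should flag the "udevd[333]: starting version 173" line, which follows a gap of about 59.2 seconds.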

Michael Kofler (michael-kofler) wrote :

The boot delay even happens with all LVs removed from /etc/fstab.

apt-get remove lvm2
update-initramfs -c -k all

finally helped (but of course no more LVM now).

Christian Weiske (cweiske) wrote :

I can reproduce the issue using VirtualBox 4.1.2 on 64-bit Ubuntu, with an 8 GiB disk:

1. Install a fresh system: Ubuntu 11.10 desktop amd64.
2. During installation: manual partitioning, 4 GiB ext4 for /, no other partition.
3. After installation: run gparted and create a new partition on the rest of the disk (4 GiB) without a file system (uninitialized).
4. Run system-config-lvm and initialize partition 2 (the newly created one).
4.1 Add lvmdisk1, 2 GiB, ext4.
4.2 Add lvmdisk2, 1 GiB, ext4.
5. Reboot.
6. Wait 60 seconds.

Peter Matulis (petermatulis) wrote :

@Christian

Excellent.

And if you unmount the partition, remove the volume (man lvremove), and reboot?

Christian Weiske (cweiske) wrote :

Hm. While the 60s delay was there, it is not anymore on the 2nd boot. Need to investigate further.

Christian Weiske (cweiske) wrote :

The delay is there again... wtf?

Peter Matulis (petermatulis) wrote :

Right. I was conducting tests (rebooting) on a server installation. I have a roughly 80% rate of occurrence. I will post detailed test description and results tomorrow morning (EST).

Peter Matulis (petermatulis) wrote :

Reproduced in a KVM guest (11.10 Server amd64)

Post-install:

sudo apt-get install lvm2
sudo pvcreate /dev/vdb
sudo vgcreate data /dev/vdb
sudo lvcreate --extents 100%FREE --name flies data
sudo reboot
<wait 60 seconds>

With 3.0.0-14 kernel, it didn't happen every time (yes = wait problem):

14 / 16 reboots - yes

When booting with the 3.0.0-12 kernel (choosing it in the GRUB menu):

5 / 5 reboots - no

While running the -14 kernel, I removed the volume:

sudo lvremove data/flies

5 / 5 reboots - no

Changed in udev (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Similar to bug 631795

tags: added: kernel-da-key kernel-key oneiric regression-update
Changed in linux (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
importance: Medium → High
tags: added: precise
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.2.0-9.16)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-9.16
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for folks affected by this bug to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds .

Please test the latest v3.2 kernel[1]. Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag (only that one tag; please leave the other tags). This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-precise/

tags: added: needs-upstream-testing
removed: kernel-request-3.2.0-9.16
Peter Matulis (petermatulis) wrote :

This problem also manifests itself in the Precise daily (Jan 15) exactly as described in comment #25, including the very small chance of booting without the wait.

However, when LVM is used in the installer there is never a problem, even after adding a volume post-install. I tested this scenario on both 11.10 and 12.04 (Jan 15 daily).

Joseph Salisbury (jsalisbury) wrote :

Another possible dup: bug 902491

Joseph Salisbury (jsalisbury) wrote :

Possible workaround in bug 802626

It consists of adding a --noudevsync parameter to the vgchange command in /lib/udev/rules.d/85-lvm2.rules, then regenerating the initramfs with update-initramfs -u.
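A rough sketch of how that edit could be scripted follows. It assumes the rules file invokes vgchange by that literal name; the actual contents of 85-lvm2.rules are not quoted in this report, so inspect the file before and after. The function name add_noudevsync and the .orig backup suffix are made up for illustration.

```shell
# add_noudevsync FILE: add --noudevsync after the first "vgchange" on each
# line of FILE that does not already mention the option.
# Hypothetical helper; assumes the rule calls vgchange by that literal name.
add_noudevsync() {
    rules="$1"
    # Keep the first backup only, so a second run cannot overwrite it.
    [ -e "$rules.orig" ] || cp -p "$rules" "$rules.orig" || return 1
    # Lines that already contain --noudevsync are skipped, so reruns are harmless.
    sed '/--noudevsync/!s/vgchange/vgchange --noudevsync/' "$rules" > "$rules.tmp" || return 1
    # Write back in place to keep the original owner and mode.
    cat "$rules.tmp" > "$rules" && rm -f "$rules.tmp"
}

# Usage (as root), followed by the initramfs rebuild from the workaround:
#   add_noudevsync /lib/udev/rules.d/85-lvm2.rules
#   update-initramfs -u
```

If the result is not what you expect, the untouched copy is still available next to the rules file under the .orig suffix.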

tags: added: bot-stop-nagging
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Christian Weiske (cweiske) wrote :

The workaround makes boot finish in 20 seconds! Yay. No more waiting.

tags: removed: kernel-key
David (dogge2k-deactivatedaccount) wrote :

The workaround in bug 802626 works for me too.

Toon Verstraelen (toon-verstraelen) wrote :

The workaround in bug 802626 works for me too. Thanks.

Herton R. Krzesinski (herton) wrote :

So far this is the same as bug 802626, so I'm marking this one as a duplicate of it. Also, this is a userspace synchronization problem, not the kernel's fault.

Besides using --noudevsync, another solution is patching udev to not ignore relevant lvm notifications from the kernel (events with DM_COOKIE set) on exit: https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/802626/comments/53

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Michael (auslands-kv) wrote :

Hello

I have the same or a very similar problem with my current Ubuntu 12.04 installation. See attached bootchart. My system has an lvm vg and the rootfs is in the vg.

I have tried the --noudevsync option as described above, but no change.

Maybe this is a different problem. I don't know. Can anybody tell from the bootchart where the problem is?

elatllat (elatllat) wrote :

Ubuntu 18.04 (2018) says hi to 11.10 (2011) and still has an LVM timeout on boot.

Brad Figg (brad-figg)
tags: added: cscc