
udev and lvm2 hang at boot

Reported by Filip Granö on 2011-12-19
This bug affects 17 people
Affects          Status      Importance   Assigned to
linux (Ubuntu)   Invalid     High         Unassigned
udev (Ubuntu)    Confirmed   Undecided    Unassigned

Bug Description

I'm experiencing a really long wait when booting a freshly installed Xubuntu 11.10 with only updates and mdadm + lvm2 installed. I have managed to narrow it down somewhat but am still looking for a fix.

After disabling splash and quiet in GRUB, I noticed the boot waits at /scripts/init-bottom/udev in the initrd and continues after 61 seconds.
Not surprisingly, I found a "udevadm control --timeout 61 --exit" line in there. Why does it fail so badly that it waits for the full timeout before exiting?
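
For anyone who wants to look at that script, one way to unpack the initrd and read it (assuming a gzip-compressed initramfs; the image name varies per kernel):

    mkdir /tmp/initrd && cd /tmp/initrd
    zcat /boot/initrd.img-$(uname -r) | cpio -id
    cat scripts/init-bottom/udev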

After quite a bit of googling I found this:
http://us.generation-nt.com/answer/problem-lvm-gets-stuck-during-booting-due-recent-uevent-change-help-205241751.html
Ari Savolainen writes:
An init script (/scripts/init-bottom/udev in initrd) issues the command
"udevadm control --timeout 61 --exit".
At the same time udevd is executing "/sbin/lvm vgchange -a y" (from
/lib/udev/rules.d/85-lvm2.rules), which calls ioctl to resume a logical
volume. After that, lvm gets stuck forever. Booting continues after the
61-second timeout.

Milan Broz writes:
If you call vgchange or even vgscan from a udev rule, it is completely wrong.
This is not the upstream lvm udev rule, btw.
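
To check whether a system ships this Ubuntu-specific rule, something like the following works (path per the quote above):

    grep -n vgchange /lib/udev/rules.d/85-lvm2.rules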

This makes me slightly worried: does this cause any problems other than an annoyingly slow boot? Is there any way to fix this? Is this a known issue?

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in udev (Ubuntu):
status: New → Confirmed
Dave Gilbert (ubuntu-treblig) wrote :

Hi Filip,
  That sounds like it's the same as what I used to have in bug 625395

David (dogge2k) wrote :

It seems the problem is due to udevadm being called from the initramfs (/usr/share/initramfs-tools/scripts/init-bottom/udev). On my system udevadm runs for 61 seconds and times out (see bootchart). It only occurs with kernel 3.0.0-14; older kernels work fine. Reducing the timeout to 20 seconds results in an unusable system: I get an error message that the LVM partition is not found.

Christian Weiske (cweiske) wrote :

Disabling the rule and running `update-initramfs -u` afterwards leads to a non-bootable system :)

Davide (darkenergyreactor) wrote :

Hi, I am experiencing the same bug on some of my machines with kernel 3.0.0-14.
I found you can easily reproduce the issue in a VirtualBox VM by installing a minimal system on a logical volume.
The system will hang for 61 seconds at the next reboot if you use lvcreate to create additional logical volumes.
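
For example (a sketch; the volume group name "vg0" is illustrative):

    # after installing a minimal system onto an LV, add one more LV and reboot
    sudo lvcreate -L 1G -n extra vg0
    sudo reboot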

David (dogge2k) wrote :

I did some more research on this problem, and the problem is in udevd itself. Activating the debug output gives me the following message after 60 seconds:

timeout, giving up waiting for workers to finish

This message comes from udevd.c:1626 when epoll_wait times out. After this message udevd immediately exits without waiting for the other worker threads. I'm not sure why the kernel doesn't signal udevd.
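
(For reference, the debug output above can be raised for the running daemon with:

    sudo udevadm control --log-priority=debug

Getting the same out of the initramfs copy of udevd requires the log level to be set before the daemon starts; the exact mechanism on 11.10 is not verified here.)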
I've also tested this with the recent kernel 3.0.0-15 from oneiric-proposed, with the same result.
Maybe we should raise this bug to critical, because Ubuntu is currently not really usable on an LVM partition.

Peter Matulis (petermatulis) wrote :

@dogge2k

Can you confirm that this happens with any type of LVM configuration? Desktop Edition and/or Server Edition?

I wouldn't say that a one-minute pause translates to "unusable".

Peter Matulis (petermatulis) wrote :

I just did a fresh install of 11.10 Desktop (KVM guest) with the root filesystem on a logical volume (and /boot not on one), and there is no pause at all. It is running the 3.0.0-14 generic kernel.

Mikael Rapp (micke-rapp) wrote :

I experience the same issue on Ubuntu Server (11.10), but only after a few hours of fiddling with lxc and brctl.

My system completely freezes (not a 60-second pause), but I guess that is due to the fact that my / is on the LVM.
The only way for me to access the system is to start it in recovery mode -> remount -> resume.

Dave Gilbert (ubuntu-treblig) wrote :

Mikael: It sounds like you have a different problem there. This bug number is just for issues that occur during boot and cause a shortish pause, after which booting carries on.

Peter: It doesn't happen with every LVM config. I'm running LVM and don't see it at the moment, but I have in the past.

Peter Matulis (petermatulis) wrote :

This is not an obvious problem. I'm changing the status to 'Incomplete' until we can narrow down the problem some more.

1. Let's try the recipe given by darkenergyreactor. He stated that it is reliably reproducible if you manually create logical volumes (with lvcreate) post-install.

2. For those people that have been affected by this, does the above sound familiar?

Changed in udev (Ubuntu):
status: Confirmed → Incomplete
David (dogge2k) wrote :

@petermatulis

I couldn't reproduce this problem with a fresh installation (desktop edition, amd64), but the problem still exists on my other system. I'm not sure if there are other dependencies which might trigger this problem, but it seems I'm not the only one who encounters it.

Christian Weiske (cweiske) wrote :

The machine I have problems with has this partition setup:

$ LC_ALL=C fdisk -l /dev/sda

Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000b8523

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63     1048638      524288   83  Linux
/dev/sda3       126881370  2930272064  1401695347+   5  Extended
/dev/sda5       126881433   246870854    59994711   83  Linux
/dev/sda6       246870918   254871224     4000153+  82  Linux swap / Solaris
/dev/sda7       254871288  2930272064  1337700388+  83  Linux

sda7 is the partition on which I have my LVM setup:

$ lvmdiskscan |grep LVM
  /dev/sda7 [ 1,25 TiB] LVM physical volume
  0 LVM physical volume whole disks
  1 LVM physical volume

LVM setup:

$ lvm vgs
  VG    #PV #LV #SN Attr   VSize VFree
  Daten   1   2   0 wz--n- 1,25t 575,73g
$ lvm lvs
  LV    VG    Attr   LSize   Origin Snap%  Move Log Copy% Convert
  games Daten -wi-ao 400,00g
  homes Daten -wi-ao 300,00g

Mounts:
/dev/mapper/Daten-homes on /home type ext4 (rw,commit=0)
/dev/mapper/Daten-games on /home/spiele type ext4 (rw,commit=0)

Peter Matulis (petermatulis) wrote :

@Christian

Did you create the logical volume post-install?

Provide all information on how you set up your machine outside of the installer itself.

Everybody please read my comment #13 so I don't have to keep repeating myself. Thanks.

Christian Weiske (cweiske) wrote :

I'm working on reproducing it and am not there yet, but I already wanted to share the setup.

Same problem: first an Ubuntu 11.10 installation into a 'normal' partition, all fine.

Later I set up a logical volume (actually from CentOS, which is also installed on the machine). I installed lvm2 in Ubuntu and added the path to the LV to /etc/fstab. The logical volume is available after boot, with no error messages, but there is a 60-second boot lag, which is also visible in dmesg:

...
[ 1.607767] scsi6 : pata_marvell
[ 1.607831] scsi7 : pata_marvell
[ 1.607859] ata7: PATA max UDMA/100 cmd 0xc040 ctl 0xc030 bmdma 0xc000 irq 19
[ 1.607861] ata8: PATA max UDMA/133 cmd 0xc020 ctl 0xc010 bmdma 0xc008 irq 19
[ 1.636055] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
[ 1.707674] usb 2-1: new high speed USB device number 2 using ehci_hcd
[ 1.811657] firewire_core: created device fw0: GUID 001e8c0000507f90, S400
[ 1.839547] Refined TSC clocksource calibration: 3311.144 MHz.
[ 1.839553] Switching to clocksource tsc
[ 1.852211] hub 2-1:1.0: USB hub found
[ 1.852422] hub 2-1:1.0: 8 ports detected
[ 2.123415] usb 2-1.3: new low speed USB device number 3 using ehci_hcd
[ 2.291274] usb 2-1.4: new low speed USB device number 4 using ehci_hcd
[ 2.475120] usb 2-1.5: new high speed USB device number 5 using ehci_hcd
[ 2.576201] hub 2-1.5:1.0: USB hub found
[ 2.576284] hub 2-1.5:1.0: 3 ports detected
[ 2.846809] usb 2-1.5.2: new low speed USB device number 6 using ehci_hcd
[ 62.062792] udevd[333]: starting version 173
[ 62.084227] lp: driver loaded but no devices found
[ 62.085550] coretemp coretemp.0: TjMax is 98 C.
[ 62.085556] coretemp coretemp.0: TjMax is 98 C.
[ 62.085561] coretemp coretemp.0: TjMax is 98 C.
[ 62.085567] coretemp coretemp.0: TjMax is 98 C.
[ 62.167185] wmi: Mapper loaded
...

My computer has a really fast SSD, so the delay is not caused by a slow external (network) disk.

The boot delay even happens with all LVs removed from /etc/fstab.

apt-get remove lvm2
update-initramfs -c -k all

finally helped (but of course no more LVM now).

Christian Weiske (cweiske) wrote :

I can reproduce the issue using VirtualBox 4.1.2 on Ubuntu 64-bit, with an 8 GiB disk:

1. install a fresh system: Ubuntu 11.10 desktop amd64
2. during installation: manual partitioning, 4 GiB ext4 for /, no other partition
3. after installation: run gparted, create a new partition on the rest of the disk (4 GiB) without a file system (uninitialized)
4. run system-config-lvm: initialize partition 2 (the newly created one)
4.1 add lvmdisk1, 2 GiB, ext4
4.2 add lvmdisk2, 1 GiB, ext4
5. reboot
6. wait 60 seconds

Peter Matulis (petermatulis) wrote :

@Christian

Excellent.

And if you unmount the partition, remove the volume (man lvremove), and reboot?

Christian Weiske (cweiske) wrote :

Hm. While the 60-second delay was there on the first boot, it is gone on the second. Need to investigate further.

Christian Weiske (cweiske) wrote :

The delay is there again... wtf?

Peter Matulis (petermatulis) wrote :

Right. I was conducting tests (rebooting) on a server installation. I see roughly an 80% rate of occurrence. I will post a detailed test description and results tomorrow morning (EST).

Peter Matulis (petermatulis) wrote :

Reproduced in a KVM guest (11.10 Server amd64)

Post-install:

sudo apt-get install lvm2
sudo pvcreate /dev/vdb
sudo vgcreate data /dev/vdb
sudo lvcreate --extents 100%FREE --name flies data
sudo reboot
<wait 60 seconds>

With the 3.0.0-14 kernel, it didn't happen every time (yes = wait problem):

14 / 16 reboots - yes

When booting with the 3.0.0-12 kernel (choosing it in the GRUB menu):

5 / 5 reboots - no

While running the -14 kernel, I removed the volume:

sudo lvremove data/flies

5 / 5 reboots - no

Changed in udev (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Similar to bug 631795

tags: added: kernel-da-key kernel-key oneiric regression-update
Changed in linux (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
importance: Medium → High
tags: added: precise

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle, it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

Thank you for your help; we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-9.16
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for folks affected by this bug to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds.

Please test the latest v3.2 kernel[1]. Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag (only that one tag; please leave the other tags). This can be done by clicking on the yellow pencil icon next to the tags located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-precise/
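
A typical mainline test install looks like this (the .deb file name below is a placeholder; take the real name from [1]):

    wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-precise/linux-image-<version>_amd64.deb
    sudo dpkg -i linux-image-<version>_amd64.deb
    sudo reboot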

tags: added: needs-upstream-testing
removed: kernel-request-3.2.0-9.16
Peter Matulis (petermatulis) wrote :

This problem also manifests itself in the Precise daily (Jan 15), exactly as described in comment #25, including the small chance of booting without the wait.

However, when LVM is used in the installer there is never a problem, even after adding a volume post-install. I tested this scenario on both 11.10 and 12.04 (Jan 15 daily).

Joseph Salisbury (jsalisbury) wrote :

Another possible dup: bug 902491

Joseph Salisbury (jsalisbury) wrote :

Possible workaround in bug 802626

It consists of adding a --noudevsync parameter to the vgchange command in /lib/udev/rules.d/85-lvm2.rules, then regenerating the initramfs with update-initramfs -u.
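
Sketched out (the exact rule text varies between releases, so treat this as illustrative rather than a literal diff; adding --noudevsync is the only actual change):

    # /lib/udev/rules.d/85-lvm2.rules
    # before: ... RUN+="... /sbin/lvm vgchange -a y ..."
    # after:  ... RUN+="... /sbin/lvm vgchange -a y --noudevsync ..."
    sudo update-initramfs -u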

tags: added: bot-stop-nagging
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Christian Weiske (cweiske) wrote :

The workaround makes boot finish in 20 seconds! Yay. No more waiting.

tags: removed: kernel-key
David (dogge2k) wrote :

The workaround in bug 802626 works for me too. Thanks.

Herton R. Krzesinski (herton) wrote :

So far this looks the same as bug 802626, so I'm marking this one as a duplicate of it. Also, this is a userspace synchronization problem, not the kernel's fault.

Besides using --noudevsync, another solution is patching udev to not ignore relevant lvm notifications from the kernel (events with DM_COOKIE set) on exit: https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/802626/comments/53
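
(To watch the DM_COOKIE variable on device-mapper uevents yourself, something like the following works on an otherwise unused test LV; --environment is the udev 173 spelling, newer releases call it --property. data/flies is the example volume from comment #25:)

    sudo udevadm monitor --kernel --environment &
    sudo lvchange -a n data/flies && sudo lvchange -a y data/flies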

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Michael (auslands-kv) wrote :

Hello

I have the same or a very similar problem with my current Ubuntu 12.04 installation. See attached bootchart. My system has an LVM VG, and the root filesystem is in the VG.

I have tried the --noudevsync option as described above, but no change.

Maybe this is a different problem, I don't know. Can anybody read from the bootchart where the problem is?
