Ubuntu

vgchange may deadlock in initramfs when VG present that's not used for rootfs

Reported by Sébastien Bernard on 2011-06-27
This bug affects 64 people
Affects            Importance  Assigned to
lvm2 (Ubuntu)      High        Steve Langasek
  Oneiric          High        Steve Langasek
  Precise          High        Steve Langasek
udev (Ubuntu)      High        Unassigned
  Oneiric          High        Unassigned
  Precise          High        Unassigned

Bug Description

The system is now unable to boot.
I had to boot the previous kernel (3.0.0).
The symptom is that the boot freezes. When debugging, running vgscan works fine,
but vgchange -a y just hangs, and I have to reboot the system afterwards.

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: lvm2 2.02.66-4ubuntu2
ProcVersionSignature: Ubuntu 3.0-0.1-generic 3.0.0-rc2
Uname: Linux 3.0-0-generic x86_64
Architecture: amd64
Date: Mon Jun 27 18:15:46 2011
ProcEnviron:
 LANGUAGE=fr_FR:en
 PATH=(custom, user)
 LANG=fr_FR.utf8
 LC_MESSAGES=fr_FR.UTF-8
 SHELL=/bin/bash
SourcePackage: lvm2
UpgradeStatus: No upgrade log present (probably fresh install)

Sébastien Bernard (sbernard) wrote :

The hang vanished after 3.0.0-3 and came back with 3.0.0-5.

Sébastien Bernard (sbernard) wrote :

The symptom is that vgchange -a y hangs and becomes unkillable.

Here is a "me too" posting.

My system:
* three harddisks
  * WD 750GB
  * WD 150GB (raptor)
  * OCZ Vertex3 120GB
* one volume group on the WD 750GB HDD
* one LV within VG to be mounted at boot (via fstab)
* root filesystem is on the Vertex3 SSD. Maybe there is a timing issue, the SSD is quite fast.

Symptoms on my host:
* Only happens occasionally on boot; mostly after a cold-start
* my system boots after a delay of about 60s, but my root partition is still mounted read-only
* aside from root, none of the other partitions in fstab is mounted
* X11 does not start (because of read-only mount)
* I can login, but /home is not mounted
* like with the original reporter, "vgchange -a y" hangs. I found it to hang in "semtimedop".

Changed in lvm2 (Ubuntu):
status: New → Confirmed

I managed to obtain a logfile from the lvm command when it hangs:

libdm-deptree.c:941 Resuming phenom-data2 (253:0)
libdm-common.c:1153 Udev cookie 0xd4d0511 (semid 0) created
libdm-common.c:1166 Udev cookie 0xd4d0511 (semid 0) incremented
libdm-common.c:1054 Udev cookie 0xd4d0511 (semid 0) incremented
libdm-common.c:1226 Udev cookie 0xd4d0511 (semid 0) assigned to dm_task type 5 with flags 0x0
ioctl/libdm-iface.c:1821 dm resume (253:0) NF [16384]
libdm-common.c:828 phenom-data2: Stacking NODE_READ_AHEAD 256 (flags=1)
libdm-common.c:1081 Udev cookie 0xd4d0511 (semid 0) decremented
libdm-common.c:1276 Udev cookie 0xd4d0511 (semid 0): Waiting for zero

Obviously, lvm waits for udev to decrement the device-mapper task semaphore. A strange observation is that when looking at "ps ax" the PID(s) of udevd (437,829,839) suggest that udevd was started after the lvm tool (pid 355). Maybe there is a small time window when udevd is not able to perform the semaphore decrementation.
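
The cookie lines in the log above can be tallied mechanically. What follows is a minimal Python sketch, not part of lvm2. It assumes the count starts at zero and that every "incremented" or "decremented" message changes it by exactly one, which is an assumption about libdm's logging rather than something the log states. The function name `tally_cookies` is made up for this illustration.

```python
import re

# Cookie-related lines copied from the log above.
LOG = """\
libdm-common.c:1153 Udev cookie 0xd4d0511 (semid 0) created
libdm-common.c:1166 Udev cookie 0xd4d0511 (semid 0) incremented
libdm-common.c:1054 Udev cookie 0xd4d0511 (semid 0) incremented
libdm-common.c:1081 Udev cookie 0xd4d0511 (semid 0) decremented
libdm-common.c:1276 Udev cookie 0xd4d0511 (semid 0): Waiting for zero
"""

COOKIE_LINE = re.compile(r"Udev cookie (0x[0-9a-f]+) \(semid \d+\):? (.+)$")


def tally_cookies(log_text):
    """Return {cookie: net count} from 'incremented'/'decremented' messages.

    Assumption: the count starts at 0 and each message moves it by one.
    """
    counts = {}
    for line in log_text.splitlines():
        match = COOKIE_LINE.search(line)
        if not match:
            continue
        cookie, action = match.groups()
        counts.setdefault(cookie, 0)
        if action == "incremented":
            counts[cookie] += 1
        elif action == "decremented":
            counts[cookie] -= 1
    return counts


pending = tally_cookies(LOG)
```

Under these assumptions the cookie is left with a non-zero count when lvm starts waiting, so something else (udev) still has to decrement it.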

In /boot/initrd.img-3.0.0-8-generic/scripts/init-bottom/udev I see these two lines:

# Stop udevd, we'll miss a few events while we run init, but we catch up
udevadm control --exit

Just a wild speculation, because I haven't yet dug into the interactions between kernel and udevd, but the semaphore decrement event might be lost when transitioning from the initrd-udevd to the rootfs-udevd.

In my opinion, there is a fundamental problem with the interworking between udev and lvm during the init phase. Looking at scripts and source code I found nothing which prevents the following from happening:

1) kernel boots with initial ramdisk
2) udevd of initial ramdisk is started by initrd.img/scripts/init-top/udev
3) udevd handles the initial events of the block devices (partitions) produced by "udevadm trigger --action=add ..." invoked by "initrd.img/scripts/init-top/udev". Please note that this is executed in the background using "&".
4) /lib/udev/rules.d/85-lvm2.rules intercepts add events for LVM partitions and issues "watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'"
5) "lvm vgchange -a y" sends a LV resume request to the device-mapper in the kernel and waits for completion notification by udevd via semaphore
6) initrd.img/scripts/init-bottom/udev gets executed which invokes "udevadm control --exit". After this point, the initrd-udevd is not handling events anymore.
7) kernel delivers the mapper-task-completion-event to udevd which does not handle events anymore, and therefore does not decrement the semaphore
8) having already waited 60 seconds (event_timeout) for "watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'" to complete, udevd just gives up and continues termination
9) according to udev-173/udev/udevd.c:1631 the remaining queued events (including the semaphore decrement) are discarded by means of "event_queue_cleanup".
10) after termination of udevd "udevadm control --exit" also terminates and the boot process continues

In the end, the rootfs-udevd never sees the semaphore decrement event, so "lvm vgchange -a y" still hangs. Some other bugs like #631795 #797226 #581566 might be related to this timing problem.
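
The sequence above can be condensed into a toy model. This is a minimal Python sketch under stated assumptions, not udev or lvm code: the event strings, the function name `simulate_boot` and the single-semaphore model are all invented for illustration, and `queue.clear()` merely stands in for the `event_queue_cleanup` call mentioned in step 9.

```python
from collections import deque


def simulate_boot(events_before_exit):
    """Return True if vgchange would be released, False if it would hang.

    events_before_exit: how many queued events the initramfs udevd handles
    before 'udevadm control --exit' makes it stop (hypothetical knob).
    """
    semaphore = 1  # set by vgchange before it starts waiting for zero
    queue = deque(["add sda1", "add sdb1", "change dm-0 (DM_COOKIE)"])

    # initramfs udevd handles events until the exit request arrives
    for _ in range(min(events_before_exit, len(queue))):
        event = queue.popleft()
        if "DM_COOKIE" in event:
            semaphore -= 1  # stands in for 'dmsetup udevcomplete $DM_COOKIE'

    # on exit, whatever is still queued is thrown away
    queue.clear()
    return semaphore == 0


released_when_late_exit = simulate_boot(3)   # cookie event handled in time
released_when_early_exit = simulate_boot(2)  # cookie event discarded
```

In this model the outcome depends only on whether the cookie event is handled before the exit request, which matches the occasional, timing-dependent nature of the hang described in this report.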

Also adding udev to the bug. Please set to invalid if e.g. lvm is required to wait for the logical volume setup to finish in initrd.img/scripts/init-premount/lvm2.

Steve Langasek (vorlon) wrote :

Eduard, thank you very much for your detailed analysis of this issue! It appears that this may be the same problem as bug #818177 which we're working on fixing for Ubuntu 11.10.

> Also adding udev to the bug. Please set to invalid if e.g. lvm is required to wait for
> the logical volume setup to finish in initrd.img/scripts/init-premount/lvm2.

A wait there is insufficient because another vgchange may be issued *after* that script ends. We need to make sure that when udev terminates, it doesn't leave vgchange hanging.

Changed in lvm2 (Ubuntu):
importance: Undecided → High
status: Confirmed → Triaged
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in udev (Ubuntu):
status: New → Confirmed
Steve Langasek (vorlon) on 2011-10-06
Changed in lvm2 (Ubuntu Oneiric):
assignee: nobody → Steve Langasek (vorlon)
Steve Langasek (vorlon) on 2011-10-06
Changed in udev (Ubuntu Oneiric):
importance: Undecided → High
summary: - boot hangs at initrd
+ vgchange may deadlock in initramfs when VG present that's not used for
+ rootfs
Serge Hallyn (serge-hallyn) wrote :

The fix to the udev --exit bug did not fix this bug. After uninstalling my daemonizing watershed package, here is some debug info of the hung vgchange -a y task. It is hung in dm_udev_wait() in semop on a semaphore which is stuck at 1.

I'm not sure what other task could have set that value (and exited/died). Will try (not right now) a run with udev -D.

The "udevadm control --exit" fix was not supposed to fix this bug. I still have the (occasional) 60 seconds hang at boot. It is just that '/dev' is now moved to rootfs, enabling the filesystems to be mounted rw.

The very same lvm process which waits for the semaphore is the process which set it to 1. It waits for udevd to process the following 55-dm.rules rule:

ENV{DM_COOKIE}=="?*", RUN+="/sbin/dmsetup udevcomplete $env{DM_COOKIE}"

But since udevd does not accept events after "udevadm control --exit", and any event that arrived before the netlink socket was closed is discarded, 'vgchange -a y' waits forever.

Reading the documentation of dmsetup, I found the udevcomplete_all option. This might be a means to finish hanging 'vgchange -a y' processes, and could be called from within an init-bottom script of the initramfs.

Serge Hallyn (serge-hallyn) wrote :

AIUI udevcomplete_all will clean up any hanging vgchange -a y processes, but it won't solve the problem. udev is supposed to, in response to the lvs created by vgchange, call 55-dm again, and thereby create the /dev/{vg}/{lv} links (before calling dmsetup udevcomplete ${DM_COOKIE} to un-hang vgchange). Those links won't get created.

(I've tested this. In a case where /dev/schroots/kvm-1 did not get created and vgchange was hanging, I ran dmsetup udevcomplete ${DM_COOKIE}, using the cookie shown by ipcs -a. vgchange happily went away, but /dev/schroots/kvm-1 did not show up.)
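
The difference between the two paths can be shown with a toy model. This is a minimal Python sketch, assuming that normal event processing both creates the symlink and completes the cookie, while a manual udevcomplete only completes the cookie. The class and method names are hypothetical and nothing here calls dmsetup or udev.

```python
from dataclasses import dataclass, field


@dataclass
class BootState:
    semaphore: int = 1                        # vgchange waits for this to reach 0
    links: set = field(default_factory=set)   # symlinks present under /dev

    def process_dm_event(self, link):
        """Model of udev running the dm rules for a cookie-carrying event."""
        self.links.add(link)   # symlink creation
        self.semaphore -= 1    # then 'dmsetup udevcomplete'

    def manual_udevcomplete(self):
        """Model of completing the cookie by hand, with no rules run."""
        self.semaphore -= 1


normal = BootState()
normal.process_dm_event("/dev/schroots/kvm-1")

forced = BootState()
forced.manual_udevcomplete()
```

In both cases the waiter is released, but only the first path leaves the link behind, which is the result observed in the test described above.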

Serge Hallyn (serge-hallyn) wrote :

Some logs of good and bad boots. Interestingly, vgchange actually complains during the good boot that udev did not create the /dev/schroots/kvm-1 link.

So for /dev/dm-* nodes that already exist, the symlinks are not created, even with 'udevadm trigger --action=add' invoked by /etc/init/udevtrigger.conf while the rootfs-udevd is running.

We could try to make initramfs-udevd process the queued events by means of removing the
 event_queue_cleanup(udev, EVENT_QUEUED);
from the main loop. This would not eliminate the potential of hanging, but in theory it should at least avoid events being lost.

Steve Langasek (vorlon) wrote :

> This would not eliminate the potential of hanging, but in theory it
> should at least avoid events being lost.

Well, I don't think there's any point in doing this (and slowing down the boot) if it won't even reliably eliminate the risk of a hang.

Serge Hallyn (serge-hallyn) wrote :

It seems pretty clear what is happening here: udev's 85-lvm rule fires off vgchange (through watershed); which activates lv, then sends udev a DM_COOKIE with the address of a semaphore, then waits for that semaphore to drop to zero; udev fires off 55-dm, which calls dmsetup udevcomplete ${DM_COOKIE}, which clears the semaphore; udev sometimes exits before calling dmsetup udevcomplete, so vgchange never exits, but initramfs' udev waits for it to exit (until finally being killed after timeout).

Daemonizing watershed worked because it allowed udev to exit while vgchange continues to wait. I *believe* the rootfs udev then always continued to call dmsetup udevcomplete to let vgchange exit, though it is possible that there were cases of vgchange hanging which I simply never noticed.

The other possible solution would be to have vgchange not wait for the semaphore. Note that theoretically this could cause errors to be hidden (when vgchange should have hung but didn't), except that vgchange sticking around is easy to not notice if nothing else goes wrong. However, I've tried having the vgchange -a y in initramfs use --ignoremonitoring, and several boots hung completely very early on, while others succeeded. It's possible I did something else wrong.

A third solution *could* be for udev to not clear queues and workers so long as there are non-idle workers. The code to do so should be pretty easy (I have a sample patch but haven't tested it).

Wessel Dankers (wsl) wrote :

I think this happens because the lvm2 commands are called from inside udev, while lvm itself also waits for udev to complete. This creates an obvious deadlock.

/lib/udev/rules.d/85-lvm2.rules should be changed to read:

SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="lvm*|LVM*", \
 RUN+="watershed sh -c '/sbin/lvm vgscan --noudevsync; /sbin/lvm vgchange --noudevsync -a y'"

cheers

Serge Hallyn (serge-hallyn) wrote :

@Wessel, could you describe the deadlock you are talking about?

The hang I see is not really a deadlock - it can't happen at random, but only if udev is killed before it can acknowledge the DM_COOKIE. Preventing udev's exit from blocking on the vgchange completing also addressed that one. But perhaps there is another potential deadlock, which could happen even with a simple 'pvcreate' long after exiting initramfs?

The comments in libdevmapper.h certainly seem to imply that it is better to pass flags along in a DM_COOKIE, although 55-dm.rules appears to handle the case of no cookie. Can you explain if there is any real downside to using --noudevsync?

Wessel Dankers (wsl) wrote :

What I'm seeing is a vgchange process that hangs until udev kills it. When it is killed it has only initialized a few LV's and most symlinks in /dev/<vgname> are missing. Perhaps this is a different bug, but the symptoms are exactly those from the original bug report.

Also, I'm not preventing udev's exit from blocking on the vgchange completing but rather the reverse: I'm preventing vgchange's completing to block on udev.

From what I understand, vgchange waits for udev to finish processing the new device nodes before returning to the user so that after the command completes, the user can be sure that the nodes are created. Only in this case udev *is* the user, so it doesn't make sense to wait, and in fact creates a deadlock.

Quoting Wessel Dankers (<email address hidden>):
> What I'm seeing is a vgchange process that hangs until udev kills it.
> When it is killed it has only initialized a few LV's and most symlinks
> in /dev/<vgname> are missing. Perhaps this is a different bug, but the
> symptoms are exactly those from the original bug report.
>
> Also, I'm not preventing udev's exit from blocking on the vgchange
> completing but rather the reverse: I'm preventing vgchange's completing
> to block on udev.

Right, with that I was referring to my earlier solution of daemonizing
watershed. But,

> From what I understand, vgchange waits for udev to finish processing the
> new device nodes before returning to the user so that after the command
> completes, the user can be sure that the nodes are created. Only in this
> case udev *is* the user, so it doesn't make sense to wait, and in fact
> creates a deadlock.

If that is the only point of the sync, then your solution is the best
one.

Thanks for your input, Wessel.

Wessel Dankers (wsl) wrote :

I was a little overzealous with the application of --noudevsync btw, only vgchange accepts that option. Proper version:

SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="lvm*|LVM*", \
 RUN+="watershed sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange --noudevsync -a y'"

Well, in some sense udev _is_ the user, because it has to wait for the events which create the /dev/mapper/vg-lv and /dev/vg/lv nodes. This resembles the problem which causes udevcomplete_all in my comment #11 not to work reliably. The hang is gone, but there is still the problem of some /dev/mapper nodes missing (see comment #12).

On the other hand, with the current behavior of udev waiting is pointless, because udevd immediately aborts event processing, and afterwards waits for vgchange to complete.

Wessel Dankers (wsl) wrote :

vgchange will create the device mapper table entries before exiting, so that part should be taken care of. After it's finished, udev will create the device nodes and symlinks as it gets the information about the new dm nodes from the kernel.

From the vgchange manpage:

       --noudevsync
              Disable udev synchronisation. The process will not wait for
              notification from udev. It will continue irrespective of any
              possible udev processing in the background. You should only use
              this if udev is not running or has rules that ignore the devices
              LVM2 creates.

Note that the above snippet doesn't say anything about udev waiting on vgchange, just the reverse. And I agree udev is the "user" here, as I already noted in #19. ;)

All failure modes I've seen (missing device nodes, 60 second boot times, etc) were variations on either udev killing vgchange after a timeout, udev getting killed before it could kill vgchange, handover issues between the initramfs udev instance and the main udev instance with vgchange interrupted or still deadlocked, etc. All of them were solved by removing the vgchange deadlock.

The only remaining issue I see is that udevcomplete_all can't function reliably in this scenario (as you say) because the events that result from vgchange's actions are handled asynchronously. It might be that vgchange passes some udev cookies to solve that issue, but that seems unrelated to the above option.

Uxorious (uxorious) wrote :

Has there been any progress with this?
I am seeing this problem 100% with an ESXi install using 3.0.0-12-server with all the latest updates applied.

Another possible cause of problems is that, due to the use of
watershed, DM_COOKIE is not passed to the actual invocation of
vgchange, or vgchange is run fewer times than expected.

watershed considers two identical command lines with different
environments coalescable. So if you run
  watershed vgchange -a y &
  DM_COOKIE=A watershed vgchange -a y &
  DM_COOKIE=B watershed vgchange -a y &
vgchange may see any subset (by DM_COOKIE setting) of the calls.
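
The concern can be illustrated with a toy coalescer. This is a minimal Python sketch, not watershed's implementation: it assumes all three requests arrive before any run starts and that the first requester's environment is the one used. Which invocation actually runs in real watershed depends on timing, so "first wins" is purely an assumption of this model, and `coalesce` is a made-up name.

```python
def coalesce(requests):
    """Collapse requests that share a command line, ignoring the environment.

    requests: list of (argv, env) tuples in arrival order.
    Returns the list of (argv, env) pairs that would actually be executed.
    """
    executed = {}
    for argv, env in requests:
        key = tuple(argv)              # the environment is NOT part of the key
        executed.setdefault(key, env)  # assumption: first requester wins
    return [(list(key), env) for key, env in executed.items()]


requests = [
    (["vgchange", "-a", "y"], {}),
    (["vgchange", "-a", "y"], {"DM_COOKIE": "A"}),
    (["vgchange", "-a", "y"], {"DM_COOKIE": "B"}),
]
runs = coalesce(requests)
seen_cookies = {env["DM_COOKIE"] for _, env in runs if "DM_COOKIE" in env}
```

In this model only one vgchange runs and neither cookie A nor cookie B reaches it. Putting the cookie into the key would avoid that at the cost of one run per distinct cookie.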

I don't know the innards of the lvm/udev plumbing well enough to know
if this is going to be a problem, but from the descriptions in this
bug it seems it might be.

Of course simply removing the use of watershed, or manually putting
the cookie in the watershed command-id (or on the watershed command
line, for the same effect), will with the current udev rules result in
lots of serialised invocations of vgscan.

Ian.

Serge Hallyn (serge-hallyn) wrote :

I'm trying out this debdiff on my laptop. Mr statistics-averse says: "so far so good".

The attachment "debdiff making lvm in udev rules not wait on udev" of this bug report has been identified as being a patch in the form of a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are member of the ubuntu-sponsors please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Serge Hallyn (serge-hallyn) wrote :

(Note, the debdiff above is just Wessel's suggestion from comment #21. It'll need to be targeted at precise and give due credit if it is deemed correct)

Kotusev (kotusev) wrote :

It seems this patch cures the issue in my case.

Endre Karlson (endre-karlson) wrote :

Is this bug fixed or still an issue? I'm having serious problems with this, while using a linux-image 2.6.32.x works totally fine.

Biji (biji) wrote :

I'm having this problem too.
My laptop has / on sda2, and /home and other partitions in LVM.
With the latest updates, boot is very slow: it hangs for 1 minute after init-bottom is printed, then boots normally.

Serge Hallyn (serge-hallyn) wrote :

I've not had the problem since starting to use the linked bzr tree.

Jack D (jdonohue654-ubuntu) wrote :

I ran across this bug this morning after installing 11.10. I can say the patch above worked for me as well.

Sergio Callegari (callegar) wrote :

I am experiencing this hang on a Dell E6500 where I am using LVM. It started after Ubuntu pushed the 3.0.0-14 kernel; the -13 kernel did not have the issue.

But I do have the root filesystem on LVM. So the problem seems to be there even if rootfs is on a logical LVM volume.

The --noudevsync in udev lvm2 rules in /lib/udev, followed by an update to the initramfs seems to remove the hang.

Please also take a look at Bug #902491 and see if it needs to be marked as a duplicate of this bug.

Sergio Callegari (callegar) wrote :

BTW, is it OK and safe to keep the --noudevsync in the udev lvm2 rule?

Sergio Callegari (callegar) wrote :

Just a short note to mention that the slow boot also affects 3.2RC4 from the mainline ppa

janevert (j-e-van-grootheest) wrote :

A me too.
I have /var and /home on two lvm partitions. /home only needs one of the two partitions, /var was recently extended and now needs both. It fails in most boots, more than 9 out of 10.

The comment from #34 also applies for me, no problem with the 3.0.0-13 kernel.

The thing I can add is that /var never gets activated. So I'd say that at least one vgchange is not executed. If I manually vgscan and vgchange and then exit the shell, the boot continues just fine.

Serge Hallyn (serge-hallyn) wrote :

janevert - are you saying that with the --noudevsync you get one vgchange not executed at all (until you force it by hand)?

janevert (j-e-van-grootheest) wrote :

Serge, I've got the stock lvm2 package. So if that --noudevsync is in the stock package, then yes, it usually fails.
The lvm2 is 2.02.66-4ubuntu3.

janevert (j-e-van-grootheest) wrote :

Had some time over x-mas. Adding --noudevsync fixes it for me. 5 consecutive reboots all worked.

Seth Jennings (spartacus06) wrote :

I also am experiencing the 60 second delay on startup with -14 and not with -13. According to the Ubuntu to Mainline map, 3.0.0-13.22 maps to 3.0.6 and 3.0.0-14.23 maps to 3.0.9. I bisected the commits over that range and determined that reverting the following commit corrects this issue:

7b59e3e29e1a28ad40892dd2115175e2702f1153 kobj_uevent: Ignore if some listeners c
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 70af0a7..ad72a03 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -282,7 +282,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_ac
                                                            kobj_bcast_filter,
                                                            kobj);
                        /* ENOBUFS should be handled in userspace */
-                       if (retval == -ENOBUFS)
+                       if (retval == -ENOBUFS || retval == -ESRCH)
                                retval = 0;
                } else
                        retval = -ENOMEM;

This comment on related defect 818177 references the lkml.org post where the patch was submitted.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/818177/comments/67

It seems like there might be a history of this bug before -13, but this commit definitely exposed it more.
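
For readers who do not read kernel diffs, the one-line change above can be restated as follows. This is a minimal Python restatement, not kernel code; the function name `uevent_retval` and the boolean switch are hypothetical. It only shows which broadcast return values are turned into success before and after the commit, and says nothing about why that makes the hang more frequent.

```python
import errno


def uevent_retval(broadcast_retval, treat_esrch_as_ok):
    """Mirror the return-value handling around the netlink broadcast.

    treat_esrch_as_ok=False models the code before the commit,
    treat_esrch_as_ok=True models the code after it.
    """
    ignored = {-errno.ENOBUFS}       # already ignored before the commit
    if treat_esrch_as_ok:
        ignored.add(-errno.ESRCH)    # added by the commit
    return 0 if broadcast_retval in ignored else broadcast_retval


before = uevent_retval(-errno.ESRCH, treat_esrch_as_ok=False)
after = uevent_retval(-errno.ESRCH, treat_esrch_as_ok=True)
```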

The fix for 818177 is already released for Precise but is still just committed for Oneiric.

James Keber (james-keber) wrote :

I have a VM running fully-updated precise (as of today precise-generic 3.2.0-9 - presumably including the fix for #818177?), with the entire file-system in lvm, which requires the "--noudevsync" fix to prevent a 60 second boot-delay.

I also have...
- Beige-box running oneiric-server with entire fs in an lvm displaying random unbootableness.
- Notebook running oneiric-generic with the entire fs in lvm displaying a 60 second boot-delay (bug #902491).

Both issues alleviated with the "--noudevsync" fix.

summary: - vgchange may deadlock in initramfs when VG present that's not used for
- rootfs
+ vgchange may deadlock in initramfs when VG present

Changed title to reflect the larger scope of this bug (it definitely also impacts situations where root fs /is/ in lvm).

Steve Langasek (vorlon) wrote :

James,

I don't believe that's true. If the rootfs is in the same VG, there should be no possibility of udev being stopped in the initramfs prior to the dependent event making its way through the system, because without those events the rootfs can't be mounted at all. And certainly, root-on-LVM has been 100% reliable for me here. I think you should file a separate bug for the issue you're seeing.

summary: - vgchange may deadlock in initramfs when VG present
+ vgchange may deadlock in initramfs when VG present that's not used for
+ rootfs
James Keber (james-keber) wrote :

Steve,
Several corresponding root-in-lvm lvm-udev timeout/deadlock bugs have been filed (including bug #902491, bug #912876, bug #909805), but all - at least superficially - appear to be a result of the same udev/lvm deadlock as this bug, and all are fixed (or at least ameliorated) by the lvm "--noudevsync" work-around. Admittedly this could just be masking the true cause...

Obviously I defer to you; could you please glance-over bug #902491 and authoritatively decide if it (and the others) needs to be a separate bug (given the similarity of cause and of solution)?

Sergio Callegari (callegar) wrote :

As mentioned, the long delay at boot is definitely experienced by users, including me, who have their root fs on LVM. The --noudevsync workaround fixes the issue that apparently started with the update to 3.0.0-14. Personally, I suspect that some problem was present even before 3.0.0-14 but occurred so rarely that one did not notice it (e.g. 1 slow boot every 100). With 3.0.0-14 the hang at boot became systematic.

As the original reporter of Bug #902491, I really hope that it can get some attention (e.g., currently this bug has a 'high' importance and that one a 'medium' one and, most importantly, this thread seems to host a technical discussion). Note that all those on 902491 see that the --noudevsync workaround fixes the issue. Also note that following a (possibly wrong) hint on my side 902491 was closed as a duplicate of the current bug.

For the time being, following the comment by Steve I am re-opening 902491 by removing the duplicate status. Please let me know if it is more appropriate to leave 902491 closed and open a new "vgchange may deadlock in initramfs when VG present that's used for rootfs".

Joseph Salisbury (jsalisbury) wrote :

Possible dups: bug 906358 and bug 631795

tags: added: kernel-key
tags: added: kernel-da-key
Phillip Susi (psusi) wrote :

It sounds like there are two separate bugs then:

1) lvm waits for udev, which waits for lvm -> circular dependency deadlock

2) watershed eats the DM_COOKIE causing the semaphore to not be released

Steffen Neumann (sneumann) wrote :

Hi,

#596554 could be a possible duplicate of this bug.

yours,
Steffen

Patch suggested by Wessel in comment #21 completely solved this problem for me (without patch issue was reproducible almost in 100% of system boots).

tags: removed: kernel-key
tags: added: kernel-key

As yet another person affected with this bug, I would like to add that the '--noudevsync workaround' does not really remove the timeout, it drastically decreases it. On my system (Acer 4820TG) the timeout of 16..25 seconds still exists (although not the original 80 seconds I've seen with 3.0.0-14/15).

Jeremy Kerr (jk-ozlabs) wrote :

I'm seeing the same issue on Precise, with a non-root LVM disk: a 60-second delay in the boot process, while scripts/init-bottom/udev is waiting for the `udevadm control --exit` process to finish, which is in turn waiting for the `watershed vgscan; vgchange` RUN command.

When run under watershed, the vgchange process hangs forever, waiting for a semop() syscall to return.

If I remove the watershed wrapper from the 85-lvm2 udev rule, then the vgchange process completes successfully, and boot returns to normal.

Herton R. Krzesinski (herton) wrote :

I also reproduced the issue here, after being pointed out that some people started seeing frequently this with the 3.0.0-14 kernel stable update in Oneiric.

After debugging, I reached the same conclusions as posted here. It's a "deadlock" between vgchange and udev exiting: after the resume dm ioctl, vgchange keeps waiting for a semaphore to be "unlocked" (reach zero). That happens when the kernel sends the DM_COOKIE from the ioctl back to userspace (udev) and udev runs dmsetup udevcomplete for the same cookie, which drops the semaphore count and lets vgchange go on. But if udev is exiting in the initramfs before the kernel cookie event is sent, it ignores any later kernel events, or only accepts the firmware loading events (if you are running udev from updates, where Andy fixed the firmware request problem with udev exiting).

This dm_cookie stuff is the udev synchronization process in lvm/dm, and that's why disabling it makes the problem not happen anymore, as it doesn't rely anymore on DM_COOKIE event returned from the kernel, as already stated.

I think watershed makes a difference only in changing the timing of when things run, I wasn't able to see any problem with it.

Perhaps running vgchange with --noudevsync only inside the initramfs would be an acceptable workaround, if no synchronization is needed in any of the initramfs cases (which seems to be so). Applying it system-wide doesn't make sense, as udev is always managing device nodes, and I expect we would have problems with node management or with other users.

But maybe it's safer to just make udev process the remaining DM_COOKIE events from the kernel, as we already do with udev from updates in oneiric, where we already have a special case for timely events (events with TIMEOUT set, i.e. firmware loading). I'm proposing the following patch as a solution; it works well here so far, with no need to disable udev synchronization anymore.
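
The idea can be sketched in a few lines. This is a minimal Python model of the approach described above and not the patch itself: the function name `events_kept_on_exit`, the dictionary event format and the boolean switch are all invented for illustration, and the real change to udevd may be structured quite differently.

```python
def events_kept_on_exit(queue, keep_dm_cookie_events):
    """Return the queued events still processed after an exit request.

    queue: list of dicts holding each event's environment.
    keep_dm_cookie_events=False models discarding everything on exit;
    True models special-casing events that carry a DM_COOKIE.
    """
    if not keep_dm_cookie_events:
        return []
    return [event for event in queue if "DM_COOKIE" in event]


queue = [
    {"DEVNAME": "sdb1", "ACTION": "add"},
    {"DEVNAME": "dm-0", "ACTION": "change", "DM_COOKIE": "0xd4d0511"},
]
old_behaviour = events_kept_on_exit(queue, keep_dm_cookie_events=False)
new_behaviour = events_kept_on_exit(queue, keep_dm_cookie_events=True)
```

With the special case, the one event that vgchange is waiting on still gets handled, so udev synchronization can stay enabled.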

Herton R. Krzesinski (herton) wrote :

The patch applies on top of udev 173-0ubuntu4.1 (and introduces a new symbol, which must be taken care when building the udev package).

Sergei Ianovich (ynvich-gmail) wrote :

@Herton: Great job. Patch from comment 53 fixes the problem for me.

@Steve: Please ship ASAP. I can help with testing, if need be.

jpritikin73 (joshua-peeredit) wrote :

My system is also affected. I applied the patch in comment #53 and all is well again. Thanks!

Rémi Rérolle (remi.rerolle) wrote :

I recently started experiencing the "stale lvm vgchange process" issue. After applying the patch from #53, it's no longer the case: no more timeouts during boot, no more stale process.

Note that I have a single VG composed of a single PV, and several LVs in the VG, one of which being my rootfs.

This fixes the issue for me.

tags: added: precise
Kees Cook (kees) wrote :

So, I spent some time looking at herton's fix, and I think it's correct. While using --noudevsync looks tempting, it does trigger some new races since now both udev and lvm are trying to do things like renaming device nodes. Watershed isn't a problem -- DM_COOKIE is passed separately via dmsetup rules (grep for DM_COOKIE in /lib/udev/rules.d). So, we need to not drop the DM_COOKIE events, which is what this patch does. I'll get this tested shortly and then we can get this uploaded to precise and backported to oneiric.

Kees Cook (kees) wrote :

udev test package for precise here: https://launchpad.net/~kees/+archive/ppa/+packages

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package udev - 175-0ubuntu6

---------------
udev (175-0ubuntu6) precise; urgency=low

  * Add debian/patches/avoid-exit-deadlock-for-dm_cookie.patch,
    debian/libudev0.symbols: do not exit across a pending DM_COOKIE
    event to avoid vgchange deadlocks, thanks to Herton R. Krzesinski
    (LP: #802626).
 -- Kees Cook <email address hidden> Sun, 04 Mar 2012 18:21:15 -0800

Changed in udev (Ubuntu Precise):
status: Confirmed → Fix Released
Steve Langasek (vorlon) wrote :

my understanding is that the udev upload comprises a complete fix, and no further changes are needed to lvm2.

Changed in lvm2 (Ubuntu Precise):
status: Triaged → Invalid
Changed in lvm2 (Ubuntu Oneiric):
status: Triaged → Invalid
Changed in udev (Ubuntu Oneiric):
status: Confirmed → Triaged
Marc Gariépy (mgariepy) wrote :

hello,

Can we expect to have a package showing up in oneiric-proposed?

Thanks

Marc

On Wed, Mar 07, 2012 at 05:54:09PM -0000, Marc Gariépy wrote:
> Can we expect to have a package showing up in oneiric-proposed?

Sorry, thought I'd uploaded this already. It's in the queue now, waiting
for SRU team approval.

Hello Sébastien, or anyone else affected,

Accepted into oneiric-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in udev (Ubuntu Oneiric):
status: Triaged → Fix Committed
FP (fabrice-pardo) wrote :

I no longer have this tedious 60-second wait at
    udevadm control --timeout=121 --exit
in
    /scripts/init-bottom/udev
after today's oneiric-proposed update on a Dell Latitude E6520 using lvm2. Kind thanks.

tags: added: verification-done
Changed in udev (Ubuntu Oneiric):
status: Fix Committed → Fix Released
Steve Langasek (vorlon) on 2012-03-09
Changed in udev (Ubuntu Oneiric):
status: Fix Released → Fix Committed

Tested (from oneiric-proposed) in four machines suffering from the problem (one which wouldn't boot at all and where I'd used the --noudevsync hack before, three which hang for an extra minute until watershed timeout): problem gone, all boot perfectly (and quickly) now. No adverse effects observed. Very nice!

Danny Yates (mail4danny) wrote :

Can anyone give me an idea of when I can expect this fix to be available for Oneiric without using proposed?

Thanks.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package udev - 173-0ubuntu4.2

---------------
udev (173-0ubuntu4.2) oneiric-proposed; urgency=low

  * Add debian/patches/avoid-exit-deadlock-for-dm_cookie.patch,
    debian/libudev0.symbols: do not exit across a pending DM_COOKIE
    event to avoid vgchange deadlocks, thanks to Herton R. Krzesinski
    (LP: #802626).
 -- Steve Langasek <email address hidden> Mon, 05 Mar 2012 11:01:09 -0800

Changed in udev (Ubuntu Oneiric):
status: Fix Committed → Fix Released