[feisty] failures when creating snapshots "in use: not deactivating"

Bug #84672 reported by Kees Cook on 2007-02-12
Affects: lvm2 (Ubuntu)
Importance: High
Assigned to: Scott James Remnant (Canonical)

Bug Description

Binary package hint: lvm2

When trying to open snapshots with schroot, I've been getting a lot of warnings. Sometimes it works, sometimes it fails, but usually I see warnings. This is new behavior since the recent updates. I'm unclear if this is associated with bug #38409, and haven't had a chance to debug it yet. I just wanted to get this reported so I wouldn't forget about it. :)

  /dev/mapper/lvm2|systemvg|alternate-edgy-i386: open failed: No such device or address
  Attempt to close device '/dev/mapper/lvm2|systemvg|alternate-edgy-i386' which is not open.
  /dev/mapper/lvm2_systemvg_feisty-660017ce-d6dd-46ef-b69d-1319b14d8198: open failed: No such device or address
  Attempt to close device '/dev/mapper/lvm2_systemvg_feisty-660017ce-d6dd-46ef-b69d-1319b14d8198' which is not open.
  LV systemvg/feisty-ad4625e9-046b-4b3e-8ebd-22a72b7ae28b in use: not deactivating
  Couldn't deactivate new snapshot.

This happens frequently when attempting to create a snapshot:

# lvcreate -s -L4G /dev/systemvg/edgy_chroot -n edgy-mongoose2
  LV systemvg/edgy-mongoose2 in use: not deactivating
  Couldn't deactivate new snapshot.

And then evms_activate runs like crazy, followed by lvm, hammering my drives:

root 15334 0.0 0.0 10304 648 ? S< 17:41 0:00 /lib/udev/watershed /sbin/evms_activate
root 15335 0.0 0.0 10300 636 ? S< 17:41 0:00 /lib/udev/watershed /sbin/evms_activate
root 15354 1.8 0.1 82364 4528 ? D<l 17:41 0:00 /sbin/evms_activate

...later...

root 15416 0.0 0.0 10304 644 ? S< 17:41 0:00 /lib/udev/watershed /sbin/lvm vgck
root 15418 0.0 0.0 13312 1576 ? D< 17:41 0:00 /sbin/lvm vgck
root 15422 0.0 0.0 10304 640 ? S< 17:41 0:00 /lib/udev/watershed /sbin/lvm vgck

I'm assuming some new race condition has been introduced.

Also, it doesn't seem sensible to run vgck every time an lvm operation happens. This is pretty disk intensive.

Kees Cook (kees) wrote :

Tracked this down a bit more. There is some race condition with lv_activate. If I load up with IO, I can win the race. (I assume I'm either slowing down the vgck from udev, or slowing down the lvcreate itself...)

Here's a failure:

# lvcreate -s -L1G -n test1 /dev/systemvg/edgy_chroot-i386
  LV systemvg/test1 in use: not deactivating
  Couldn't deactivate new snapshot.

And if I nudge the disks, I can get a success:

# (ls -lR /var >/dev/null) &
# lvcreate -s -L1G -n test1 /dev/systemvg/edgy_chroot-i386
  Logical volume "test1" created

Putting a "sleep(2)" at the start of lv_activate also "solved" it, but doing retries within lv_activate just ends up hanging the command. :(
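Since the failure is intermittent rather than deterministic, one crude interim workaround (an editorial sketch, not something proposed in this report) is to retry the command at the shell level instead of patching lv_deactivate:

```shell
# Sketch only: retry an intermittently failing command a few times.
# The retry() helper and its parameters are assumptions for illustration;
# the lvcreate invocation below is the one from this report (run as root).
retry() {
    tries=$1; delay=$2; shift 2
    n=0
    until "$@"; do
        n=$((n + 1))
        if [ "$n" -ge "$tries" ]; then
            return 1    # give up after $tries attempts
        fi
        sleep "$delay"  # let udev finish whatever has the LV open
    done
    return 0
}

# Example usage:
# retry 5 2 lvcreate -s -L1G -n test1 /dev/systemvg/edgy_chroot-i386
```

This sidesteps the hang Kees saw with retries inside lv_deactivate, at the cost of re-running the whole command.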

Kees Cook (kees) wrote :

Sorry, that should be "lv_deactivate". Here's my brain-dead work-around, just to help illustrate.

Kees Cook (kees) wrote :

siretart is seeing this too.

Changed in lvm2:
importance: Undecided → High
status: Unconfirmed → Confirmed

Reinhard Tartler (siretart) wrote :

right, though I'm not sure if this isn't bug #38409 after all...

A common cause of that error is udev running programs that open and
access the device while lvm2 is trying to remove it. Fix the udev
rules so there's no attempt to open those devices.

Long-term fix needs a new userspace notification mechanism: udev will
notify lvm2 when it's finished whatever it's doing, and lvm2 will wait
for that notification before proceeding.

Also udev must totally ignore 'add' events from device-mapper devices
and act instead on the 'change' events we added upstream recently.
['add' event is triggered in the kernel when the device number is
reserved and the device is not yet usable - there are intrinsic races;
'change' is triggered when the device is ready for use incl. each time
its constitution changes and udev ought to reprocess it.]

Alasdair
--
<email address hidden>
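The add/change split Alasdair describes could look roughly like the following in udev rules syntax. This is a hypothetical sketch, not the actual Ubuntu or upstream rules; `ignore_device` and `vol_id` are udev facilities of that era, and the exact matches are assumptions:

```
# Sketch only, not the shipped rules.
# Skip raw 'add' events for device-mapper nodes: at that point the
# device number is reserved but the device is not yet usable.
KERNEL=="dm-*", ACTION=="add", OPTIONS+="ignore_device"

# Act on 'change' events instead, once the device is ready for use.
KERNEL=="dm-*", ACTION=="change", IMPORT{program}="vol_id --export $tempnode"
```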

Craig Box (craig.box) wrote :

For anyone finding this bug, the fix in bug #38409 (restoring the edgy workaround) makes the immediate problem go away.

However, the udev rules have changed since that bug - the workaround simply tells udev to ignore any devices starting with dm-, whereas the correct fix lies with feisty's udev rules specifically for LVM in 65-persistent-storage.rules and 70-lvm.rules.

The problem is that the udev rules we have on "add" are the ones that create the device with the right name.

So removing those rules would mean there was nothing in /dev/mapper until a table was loaded, which would break lots of things.

I have a confirmed fix for this and am working on tidying it up.

Changed in lvm2:
assignee: nobody → keybuk
status: Confirmed → In Progress

Packages for testing are available here:

   http://people.ubuntu.com/~scott/packages

Please build all of the sources, and install the binaries from them. This should hopefully correct the problem.

We believe that this problem has been corrected by a series of uploads today. Please update to ensure you have the following package versions:

    dmsetup, libdevmapper1.02 - 1.02.08-1ubuntu6
    lvm-common - 1.5.20ubuntu12
    lvm2 - 2.02.06-2ubuntu9
    mdadm - 2.5.6-7ubuntu5 (not applicable unless you're also using mdadm)
    udev, volumeid, libvolume-id0 - 108-0ubuntu1

The problem was caused by a number of ordering issues and race conditions relating to when lvm and mdadm were called, and how those interacted to ensure the devices were created and their contents examined.

In particular, due to a typo, both udev and device-mapper were attempting to create the /dev/mapper device node. This has been corrected so that only udev does this, and device-mapper waits for it, as was originally intended.

Note that this event-based sequence is substantially different from Debian, so any bugs filed there will not be relevant to helping solve problems in Ubuntu.

This should now work correctly. If it does not, I would ask that you do not re-open this bug, and instead file a new bug on lvm2 for your exact problem, even if someone else has already filed one, with verbose details about your setup and how you cause the error.

Changed in lvm2:
status: In Progress → Fix Released
Darik Horn (dajhorn) wrote :

I've got one computer that still expresses this bug with the latest Feisty updates. On this computer, there is a 25% chance that the snapshot creation will fail.

Package revisions on the affected computer are:

  dmsetup 2:1.02.08-1ubuntu10
  libdevmapper1.02 2:1.02.08-1ubuntu10
  lvm-common 1.5.20ubuntu12
  lvm2 2.02.06-2ubuntu9
  mdadm 2.5.6-7ubuntu5
  udev 108-0ubuntu4
  volumeid 108-0ubuntu4
  libvolume-id0 108-0ubuntu4
  linux-image-2.6.20-16-generic 2.6.20-16.31

Attached are the dmesg, fstab, and verbose lvcreate output.
