[SRU] lvremove often fails on first attempt when removing snapshot

Bug #1223576 reported by Adam Gandelman
This bug affects 2 people
Affects               Status        Importance  Assigned to
lvm2 (Ubuntu)         Fix Released  High        Unassigned
lvm2 (Ubuntu Raring)  Won't Fix     High        Dimitri John Ledkov
lvm2 (Ubuntu Saucy)   Fix Released  High        Unassigned

Bug Description

[SRU Justification]

[Impact]

* Removing LVM snapshots is unreliable on Raring on some hardware.
* Tools that abstract LVM actions will likely fail if they expect these actions
  to function correctly and reliably. Specifically, OpenStack Cinder requires
  reliability here.

[Test Case]

The following test case seems to trigger the bug on two classes of hardware I've used, but not on a virtual machine:

sudo apt-get install lvm2
# A free block device
BDEV="sdb"

# Create original LVM volume.
pvcreate /dev/$BDEV
vgcreate testing-vg /dev/$BDEV
lvcreate -L 1024M -n testing testing-vg

# Run a snapshot create/delete cycle and observe frequent failures.
lvcreate -L 1024M --name testing-snapshot --snapshot /dev/testing-vg/testing
lvremove -f /dev/testing-vg/testing-snapshot
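
To quantify the failure rate, the create/delete cycle can be run in a loop. A minimal sketch (not part of the original report), assuming the volume group and origin LV created above:

# Sketch: repeat the cycle and count first-attempt lvremove failures.
FAILURES=0
for i in $(seq 1 20); do
  lvcreate -L 1024M --name testing-snapshot --snapshot /dev/testing-vg/testing
  if ! lvremove -f /dev/testing-vg/testing-snapshot; then
    # First attempt failed; a second attempt typically succeeds (see below).
    FAILURES=$((FAILURES + 1))
    lvremove -f /dev/testing-vg/testing-snapshot
  fi
done
echo "first-attempt failures: $FAILURES/20"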

Example failure:

$ lvcreate -L 1024M --name testing-snapshot --snapshot /dev/testing-vg/testing
  Logical volume "testing-snapshot" created
$ lvremove -f /dev/testing-vg/testing-snapshot
  Unable to deactivate open testing--vg-testing-real (252:2)
  Failed to resume testing.

[Solution]

Recent updates to the LVM and device-mapper udev rules in Debian and in Saucy appear to resolve the issue; I no longer see it after applying those changes from lvm2 2.02.98-6ubuntu1 (patch attached).
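
A quick way to check whether a given machine already carries the fix is to compare the installed package version against 2.02.98-6ubuntu1 (a sketch; the version strings are the ones reported in this bug):

# 2.02.98-1ubuntu5 is affected; 2.02.98-6ubuntu1 carries the updated rules.
dpkg-query -W -f='${Version}\n' lvm2
if dpkg --compare-versions "$(dpkg-query -W -f='${Version}' lvm2)" ge 2.02.98-6ubuntu1; then
  echo "updated udev rules present"
else
  echo "affected version"
fi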

-------

# lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu Saucy Salamander (development branch)
Release: 13.10
Codename: saucy
# uname -r
3.11.0-4-generic
# dpkg -l | grep lvm2
ii lvm2 2.02.98-1ubuntu5 amd64 Linux Logical Volume Manager

Experiencing this on Saucy across a number of boxes, all with similar hardware. Unable to reproduce in a virtual machine using Saucy cloud images.

# BDEV="sdb"
# apt-get install -y lvm2
# umount /mnt
# pvcreate /dev/$BDEV
# vgcreate testing-vg /dev/$BDEV
# lvcreate -L 1024M -n testing testing-vg
# lvcreate -L 1024M --name testing-snapshot --snapshot /dev/testing-vg/testing
# lvremove -f /dev/testing-vg/testing-snapshot
  Unable to deactivate open testing--vg-testing-real (252:2)
  Failed to resume testing.
# lvremove -f /dev/testing-vg/testing-snapshot
  Logical volume "testing-snapshot" successfully removed
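
Since the second lvremove attempt reliably succeeds here, a retry loop is a plausible interim workaround for callers such as Cinder. A sketch (my suggestion, not something from the report):

# Retry lvremove a few times, as the second attempt succeeds
# on affected systems.
remove_snapshot() {
  local lv="$1" try
  for try in 1 2 3; do
    lvremove -f "$lv" && return 0
    sleep 1
  done
  return 1
}
remove_snapshot /dev/testing-vg/testing-snapshot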

Comparing output of 'dmsetup info' before and after failed attempts:

 - The failed attempt removes the -cow device (testing--vg-testing--snapshot-cow), but the -real and -snapshot devices remain.
 - Before the failure, the Open Count of testing--vg-testing-real is 2 (which, AFAIK, is expected). After the failure, the device remains but its Open Count drops to 0.
 - After the subsequent successful lvremove of the snapshot, the -real device still persists. On unaffected systems, the dm table after snapshot removal is identical to the pre-snapshot state: a single device (testing--vg-testing).
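
The comparison can be captured mechanically; a sketch using dmsetup's column output (one line per dm device), which makes the diff easy to read:

# Snapshot the dm table before and after the lvremove attempt, then diff.
dmsetup info -c > /tmp/dm-before.txt
lvremove -f /dev/testing-vg/testing-snapshot || true
dmsetup info -c > /tmp/dm-after.txt
diff -u /tmp/dm-before.txt /tmp/dm-after.txt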

# dmsetup info
Name: testing--vg-testing-real
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 2
Event number: 0
Major, minor: 252, 2
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3wOGeFtca62anbqn236E0g20fYFhVRXEw-real

Name: testing--vg-testing--snapshot
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 1
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3qlotuThv4tJMHJH7w0CN1B6kEcnNYkdv

Name: testing--vg-testing--snapshot-cow
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 252, 3
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3qlotuThv4tJMHJH7w0CN1B6kEcnNYkdv-cow

Name: testing--vg-testing
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 0
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3wOGeFtca62anbqn236E0g20fYFhVRXEw

# lvremove -f /dev/testing-vg/testing-snapshot
  Unable to deactivate open testing--vg-testing-real (252:2)
  Failed to resume testing.

# dmsetup info
Name: testing--vg-testing-real
State: ACTIVE
Read Ahead: 0
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 2
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3wOGeFtca62anbqn236E0g20fYFhVRXEw-real

Name: testing--vg-testing--snapshot
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 1
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3qlotuThv4tJMHJH7w0CN1B6kEcnNYkdv

Name: testing--vg-testing
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 0
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3wOGeFtca62anbqn236E0g20fYFhVRXEw

# lvremove -f /dev/testing-vg/testing-snapshot
  Logical volume "testing-snapshot" successfully removed

# dmsetup info
Name: testing--vg-testing-real
State: ACTIVE
Read Ahead: 0
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 2
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3wOGeFtca62anbqn236E0g20fYFhVRXEw-real

Name: testing--vg-testing
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 0
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3wOGeFtca62anbqn236E0g20fYFhVRXEw

Output of 'udevadm monitor' with comments for context (also attached):

# Original LV creation
KERNEL[3890.419526] add /devices/virtual/bdi/252:0 (bdi)
KERNEL[3890.419572] add /devices/virtual/block/dm-0 (block)
KERNEL[3890.419903] change /devices/virtual/block/dm-0 (block)
UDEV [3890.420177] add /devices/virtual/bdi/252:0 (bdi)
UDEV [3890.420594] add /devices/virtual/block/dm-0 (block)
UDEV [3890.439210] change /devices/virtual/block/dm-0 (block)
KERNEL[3890.440430] change /devices/virtual/block/dm-0 (block)
UDEV [3890.458613] change /devices/virtual/block/dm-0 (block)
KERNEL[3890.493110] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)
UDEV [3890.553035] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)

# Snapshot creation
KERNEL[3915.481512] add /devices/virtual/bdi/252:1 (bdi)
KERNEL[3915.481542] add /devices/virtual/block/dm-1 (block)
KERNEL[3915.481814] change /devices/virtual/block/dm-1 (block)
UDEV [3915.482069] add /devices/virtual/bdi/252:1 (bdi)
UDEV [3915.482468] add /devices/virtual/block/dm-1 (block)
UDEV [3915.503721] change /devices/virtual/block/dm-1 (block)
KERNEL[3915.504157] change /devices/virtual/block/dm-1 (block)
KERNEL[3915.548206] add /devices/virtual/bdi/252:2 (bdi)
KERNEL[3915.548288] add /devices/virtual/block/dm-2 (block)
UDEV [3915.548503] add /devices/virtual/bdi/252:2 (bdi)
KERNEL[3915.548787] add /devices/virtual/bdi/252:3 (bdi)
KERNEL[3915.548882] add /devices/virtual/block/dm-3 (block)
UDEV [3915.549093] add /devices/virtual/block/dm-2 (block)
KERNEL[3915.549173] change /devices/virtual/block/dm-3 (block)
UDEV [3915.549240] add /devices/virtual/bdi/252:3 (bdi)
UDEV [3915.549710] add /devices/virtual/block/dm-3 (block)
UDEV [3915.553716] change /devices/virtual/block/dm-3 (block)
KERNEL[3915.563685] change /devices/virtual/block/dm-2 (block)
KERNEL[3915.563823] change /devices/virtual/block/dm-1 (block)
KERNEL[3915.563915] change /devices/virtual/block/dm-0 (block)
UDEV [3915.567734] change /devices/virtual/block/dm-2 (block)
UDEV [3915.586136] change /devices/virtual/block/dm-1 (block)
UDEV [3915.603960] change /devices/virtual/block/dm-0 (block)
UDEV [3915.611925] change /devices/virtual/block/dm-1 (block)
KERNEL[3915.625968] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)
UDEV [3915.672978] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)

# Failed snapshot delete
KERNEL[3937.161806] change /devices/virtual/block/dm-3 (block)
KERNEL[3937.161839] change /devices/virtual/block/dm-2 (block)
KERNEL[3937.161929] change /devices/virtual/block/dm-1 (block)
KERNEL[3937.162154] remove /devices/virtual/block/dm-3 (block)
KERNEL[3937.162305] remove /devices/virtual/bdi/252:3 (bdi)
KERNEL[3937.162375] remove /devices/virtual/block/dm-3 (block)
UDEV [3937.162842] remove /devices/virtual/bdi/252:3 (bdi)
UDEV [3937.169283] change /devices/virtual/block/dm-3 (block)
UDEV [3937.172103] remove /devices/virtual/block/dm-3 (block)
UDEV [3937.172217] remove /devices/virtual/block/dm-3 (block)
KERNEL[3937.176945] change /devices/virtual/block/dm-0 (block)
KERNEL[3937.177255] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)
UDEV [3937.220478] change /devices/virtual/block/dm-2 (block)
UDEV [3937.221039] change /devices/virtual/block/dm-1 (block)
UDEV [3937.222593] change /devices/virtual/block/dm-0 (block)
UDEV [3937.297968] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)

# Successful snapshot delete
KERNEL[3953.366547] remove /devices/virtual/block/dm-1 (block)
KERNEL[3953.366713] remove /devices/virtual/bdi/252:1 (bdi)
KERNEL[3953.366805] remove /devices/virtual/block/dm-1 (block)
UDEV [3953.367431] remove /devices/virtual/bdi/252:1 (bdi)
UDEV [3953.369726] remove /devices/virtual/block/dm-1 (block)
UDEV [3953.369849] remove /devices/virtual/block/dm-1 (block)
KERNEL[3953.439220] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)
UDEV [3953.510027] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)
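
For reference, a capture like the above can be taken by running the monitor in the background while cycling the snapshot (a sketch, not necessarily the exact invocation used for this report):

# Log kernel and udev events to a file while reproducing.
udevadm monitor > /tmp/udev-events.log 2>&1 &
MONITOR_PID=$!
lvcreate -L 1024M --name testing-snapshot --snapshot /dev/testing-vg/testing
lvremove -f /dev/testing-vg/testing-snapshot
kill $MONITOR_PID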

Revision history for this message
Adam Gandelman (gandelman-a) wrote :

Using the same hardware, this does not seem to affect 12.04, but on 13.04 the failure occurs 50% of the time. Every other create + delete cycle completes without error.

Revision history for this message
Brian Murray (brian-murray) wrote :

Did you mean on 13.10 in your last comment?

Revision history for this message
Adam Gandelman (gandelman-a) wrote :

No, using the same hardware I've tested on Precise, Raring and Saucy. Precise does not seem to be affected at all; Raring/13.04 is affected about 50% of the time (every other test cycle); Saucy/13.10 hits an error on the first attempt 100% of the time. Sorry for the confusion.

Revision history for this message
Adam Gandelman (gandelman-a) wrote :

This appears to be resolved in Debian using lvm2 2.02.98-6. There have been a number of updates to the udev rules since our 13.10 version (2.02.98-1). Syncing these changes out to my test hardware fixes this issue as well as another total LVM deadlock I started hitting while testing some new OpenStack patches. I will merge 2.02.98-6, test and file a FFE.
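
For anyone wanting to see which files those rule updates touch, the udev rules shipped by the lvm2 source package can be listed on an installed system (a sketch; that the dm rules live in the dmsetup binary package is my assumption about the packaging split):

# List udev rules installed by the lvm2 and dmsetup packages.
dpkg -L lvm2 | grep rules.d
dpkg -L dmsetup | grep rules.d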

Changed in lvm2 (Ubuntu):
importance: Undecided → High
Revision history for this message
Adam Gandelman (gandelman-a) wrote :

Testing the changes to udev rules from lvm2 2.02.98-6 on raring 13.04 fixes the intermittent issue there, as well.

Revision history for this message
Adam Gandelman (gandelman-a) wrote :

This was fixed in 13.10 with lvm2 (2.02.98-6ubuntu1).

Changed in lvm2 (Ubuntu Saucy):
status: New → Fix Released
Changed in lvm2 (Ubuntu Raring):
importance: Undecided → High
Revision history for this message
Adam Gandelman (gandelman-a) wrote :
description: updated
summary: - lvremove always fails on first attempt when removing snapshot
+ [SRU] lvremove often fails on first attempt when removing snapshot
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Looks good to me. Will test & upload into raring-proposed sometime this week.

Changed in lvm2 (Ubuntu Raring):
assignee: nobody → Dmitrijs Ledkovs (xnox)
status: New → Confirmed
Revision history for this message
Adam Gandelman (gandelman-a) wrote :

Dmitrijs, any update on pushing the raring fix to proposed?

Revision history for this message
Sebastian Unger (sebunger44) wrote :

Any idea whether this will be fixed in raring?

Revision history for this message
Dimitri John Ledkov (xnox) wrote : Re: [Bug 1223576] Re: [SRU] lvremove often fails on first attempt when removing snapshot

On 24 January 2014 11:40, Sebastian Unger <email address hidden> wrote:
> Any idea whether this will be fixed in raring?
>

No, as it's less than 7 days until Raring End Of Life.
https://lists.ubuntu.com/archives/ubuntu-announce/2014-January/000178.html

Please upgrade to 13.10 (Saucy) release.

--
Regards,

Dimitri.

Revision history for this message
Rolf Leggewie (r0lf) wrote :

Raring has reached the end of its life and is no longer receiving any updates. Marking the Raring task for this ticket as "Won't Fix".

Changed in lvm2 (Ubuntu Raring):
status: Confirmed → Won't Fix