[SRU] lvremove often fails on first attempt when removing snapshot

Bug #1223576 reported by Adam Gandelman on 2013-09-10
This bug affects 2 people
Affects               Status        Importance  Assigned to
lvm2 (Ubuntu)         Fix Released  High        Unassigned
lvm2 (Ubuntu Raring)  Won't Fix     High        Dimitri John Ledkov
lvm2 (Ubuntu Saucy)   Fix Released  High        Unassigned

Bug Description

[SRU Justification]

[Impact]

* Removing LVM snapshots is unreliable on Raring on some hardware.
* Tools that abstract LVM actions will likely fail if they expect these actions
  to function correctly and reliably. OpenStack Cinder in particular requires
  reliability here; a sketch of the retry logic such tools end up carrying
  follows this list.
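
As an illustration, a minimal retry wrapper of the kind such tooling ends up carrying might look like this (a hypothetical shell sketch, not OpenStack Cinder's actual code; the device path and retry count are assumptions):

# Hypothetical workaround: retry lvremove, since the first attempt
# may fail transiently on affected systems.
LV_PATH="/dev/testing-vg/testing-snapshot"
for attempt in 1 2 3; do
    lvremove -f "$LV_PATH" && break
    echo "lvremove failed (attempt $attempt), retrying..." >&2
    sleep 1
done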

[Test Case]

The following test case seems to trigger the bug on the two classes of hardware I've tested, but not in a virtual machine:

sudo apt-get install lvm2
# A free block device
BDEV="sdb"

# Create original LVM volume.
pvcreate /dev/$BDEV
vgcreate testing-vg /dev/$BDEV
lvcreate -L 1024M -n testing testing-vg

# Run a snapshot create/delete cycle and observe frequent failures.
lvcreate -L 1024M --name testing-snapshot --snapshot /dev/testing-vg/testing
lvremove -f /dev/testing-vg/testing-snapshot
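
To observe the failure rate, the cycle can be looped; a sketch (the iteration count is arbitrary, and the second lvremove only cleans up after a failed first attempt):

for i in $(seq 1 10); do
    lvcreate -L 1024M --name testing-snapshot --snapshot /dev/testing-vg/testing
    lvremove -f /dev/testing-vg/testing-snapshot || echo "cycle $i: first lvremove failed"
    # Clean up if the first attempt failed; harmless when it succeeded.
    lvremove -f /dev/testing-vg/testing-snapshot 2>/dev/null
done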

Example failure:

$ lvcreate -L 1024M --name testing-snapshot --snapshot /dev/testing-vg/testing
  Logical volume "testing-snapshot" created
$ lvremove -f /dev/testing-vg/testing-snapshot
  Unable to deactivate open testing--vg-testing-real (252:2)
  Failed to resume testing.

[Solution]

Recent updates in Debian and Saucy to the LVM and device-mapper udev rules seem to alleviate the issue; I no longer see it after applying those changes from lvm2 2.02.98-6ubuntu1 (patch attached).
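
To check whether a system already carries the updated rules, something like this can be used (rule file names are an assumption based on the Debian/Ubuntu packaging):

# Sketch: list installed dm/lvm udev rules and the lvm2 package version.
ls /lib/udev/rules.d/ | grep -E 'dm|lvm'
dpkg -l lvm2 | grep '^ii'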

-------

# lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu Saucy Salamander (development branch)
Release: 13.10
Codename: saucy
# uname -r
3.11.0-4-generic
# dpkg -l | grep lvm2
ii lvm2 2.02.98-1ubuntu5 amd64 Linux Logical Volume Manager

Experiencing this on Saucy across a number of boxes, all with similar hardware. Unable to reproduce in a virtual machine using Saucy cloud images.

# BDEV="sdb"
# apt-get install -y lvm2
# umount /mnt
# pvcreate /dev/$BDEV
# vgcreate testing-vg /dev/$BDEV
# lvcreate -L 1024M -n testing testing-vg
# lvcreate -L 1024M --name testing-snapshot --snapshot /dev/testing-vg/testing
# lvremove -f /dev/testing-vg/testing-snapshot
  Unable to deactivate open testing--vg-testing-real (252:2)
  Failed to resume testing.
# lvremove -f /dev/testing-vg/testing-snapshot
  Logical volume "testing-snapshot" successfully removed

Comparing output of 'dmsetup info' before and after failed attempts:

 - The failed attempt removes the -cow device (testing--vg-testing--snapshot-cow), but the -real and -snapshot devices remain.
 - Before the failure, the open count of testing--vg-testing-real is 2 (which AFAIK is to be expected). After the failure, the device remains but with an open count of 0.
 - After a successful lvremove of the snapshot, the -real device persists. On unaffected systems, the dm table after snapshot removal is the same as it was before the snapshot was created: a single device (testing--vg-testing). A sketch for capturing this comparison follows.
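
The states below were compared by hand; one way to capture them is sketched here (dmsetup's -c columns output keeps the diff readable):

# Sketch: record the dm table before and after a create/delete cycle.
dmsetup info -c > dm-before.txt
lvcreate -L 1024M --name testing-snapshot --snapshot /dev/testing-vg/testing
lvremove -f /dev/testing-vg/testing-snapshot
dmsetup info -c > dm-after.txt
diff dm-before.txt dm-after.txt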

# dmsetup info
Name: testing--vg-testing-real
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 2
Event number: 0
Major, minor: 252, 2
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3wOGeFtca62anbqn236E0g20fYFhVRXEw-real

Name: testing--vg-testing--snapshot
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 1
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3qlotuThv4tJMHJH7w0CN1B6kEcnNYkdv

Name: testing--vg-testing--snapshot-cow
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 252, 3
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3qlotuThv4tJMHJH7w0CN1B6kEcnNYkdv-cow

Name: testing--vg-testing
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 0
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3wOGeFtca62anbqn236E0g20fYFhVRXEw

# lvremove -f /dev/testing-vg/testing-snapshot
  Unable to deactivate open testing--vg-testing-real (252:2)
  Failed to resume testing.

# dmsetup info
Name: testing--vg-testing-real
State: ACTIVE
Read Ahead: 0
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 2
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3wOGeFtca62anbqn236E0g20fYFhVRXEw-real

Name: testing--vg-testing--snapshot
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 1
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3qlotuThv4tJMHJH7w0CN1B6kEcnNYkdv

Name: testing--vg-testing
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 0
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3wOGeFtca62anbqn236E0g20fYFhVRXEw

# lvremove -f /dev/testing-vg/testing-snapshot
  Logical volume "testing-snapshot" successfully removed

# dmsetup info
Name: testing--vg-testing-real
State: ACTIVE
Read Ahead: 0
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 2
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3wOGeFtca62anbqn236E0g20fYFhVRXEw-real

Name: testing--vg-testing
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 252, 0
Number of targets: 1
UUID: LVM-cYTNHi8q1D1wA2wl0OMDmpgLBzUBsdm3wOGeFtca62anbqn236E0g20fYFhVRXEw
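
Since the leftover -real device shows an open count of 0 at this point, it can apparently be cleared by hand (a workaround sketch, not part of the reported procedure; confirm the open count first):

# Workaround sketch: drop the stale -real node left behind after removal.
dmsetup info testing--vg-testing-real    # confirm "Open count: 0"
dmsetup remove testing--vg-testing-real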

Output of 'udevadm monitor' with comments for context (also attached):

# Original LV creation
KERNEL[3890.419526] add /devices/virtual/bdi/252:0 (bdi)
KERNEL[3890.419572] add /devices/virtual/block/dm-0 (block)
KERNEL[3890.419903] change /devices/virtual/block/dm-0 (block)
UDEV [3890.420177] add /devices/virtual/bdi/252:0 (bdi)
UDEV [3890.420594] add /devices/virtual/block/dm-0 (block)
UDEV [3890.439210] change /devices/virtual/block/dm-0 (block)
KERNEL[3890.440430] change /devices/virtual/block/dm-0 (block)
UDEV [3890.458613] change /devices/virtual/block/dm-0 (block)
KERNEL[3890.493110] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)
UDEV [3890.553035] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)

# Snapshot creation
KERNEL[3915.481512] add /devices/virtual/bdi/252:1 (bdi)
KERNEL[3915.481542] add /devices/virtual/block/dm-1 (block)
KERNEL[3915.481814] change /devices/virtual/block/dm-1 (block)
UDEV [3915.482069] add /devices/virtual/bdi/252:1 (bdi)
UDEV [3915.482468] add /devices/virtual/block/dm-1 (block)
UDEV [3915.503721] change /devices/virtual/block/dm-1 (block)
KERNEL[3915.504157] change /devices/virtual/block/dm-1 (block)
KERNEL[3915.548206] add /devices/virtual/bdi/252:2 (bdi)
KERNEL[3915.548288] add /devices/virtual/block/dm-2 (block)
UDEV [3915.548503] add /devices/virtual/bdi/252:2 (bdi)
KERNEL[3915.548787] add /devices/virtual/bdi/252:3 (bdi)
KERNEL[3915.548882] add /devices/virtual/block/dm-3 (block)
UDEV [3915.549093] add /devices/virtual/block/dm-2 (block)
KERNEL[3915.549173] change /devices/virtual/block/dm-3 (block)
UDEV [3915.549240] add /devices/virtual/bdi/252:3 (bdi)
UDEV [3915.549710] add /devices/virtual/block/dm-3 (block)
UDEV [3915.553716] change /devices/virtual/block/dm-3 (block)
KERNEL[3915.563685] change /devices/virtual/block/dm-2 (block)
KERNEL[3915.563823] change /devices/virtual/block/dm-1 (block)
KERNEL[3915.563915] change /devices/virtual/block/dm-0 (block)
UDEV [3915.567734] change /devices/virtual/block/dm-2 (block)
UDEV [3915.586136] change /devices/virtual/block/dm-1 (block)
UDEV [3915.603960] change /devices/virtual/block/dm-0 (block)
UDEV [3915.611925] change /devices/virtual/block/dm-1 (block)
KERNEL[3915.625968] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)
UDEV [3915.672978] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)

# Failed snapshot delete
KERNEL[3937.161806] change /devices/virtual/block/dm-3 (block)
KERNEL[3937.161839] change /devices/virtual/block/dm-2 (block)
KERNEL[3937.161929] change /devices/virtual/block/dm-1 (block)
KERNEL[3937.162154] remove /devices/virtual/block/dm-3 (block)
KERNEL[3937.162305] remove /devices/virtual/bdi/252:3 (bdi)
KERNEL[3937.162375] remove /devices/virtual/block/dm-3 (block)
UDEV [3937.162842] remove /devices/virtual/bdi/252:3 (bdi)
UDEV [3937.169283] change /devices/virtual/block/dm-3 (block)
UDEV [3937.172103] remove /devices/virtual/block/dm-3 (block)
UDEV [3937.172217] remove /devices/virtual/block/dm-3 (block)
KERNEL[3937.176945] change /devices/virtual/block/dm-0 (block)
KERNEL[3937.177255] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)
UDEV [3937.220478] change /devices/virtual/block/dm-2 (block)
UDEV [3937.221039] change /devices/virtual/block/dm-1 (block)
UDEV [3937.222593] change /devices/virtual/block/dm-0 (block)
UDEV [3937.297968] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)

# Successful snapshot delete
KERNEL[3953.366547] remove /devices/virtual/block/dm-1 (block)
KERNEL[3953.366713] remove /devices/virtual/bdi/252:1 (bdi)
KERNEL[3953.366805] remove /devices/virtual/block/dm-1 (block)
UDEV [3953.367431] remove /devices/virtual/bdi/252:1 (bdi)
UDEV [3953.369726] remove /devices/virtual/block/dm-1 (block)
UDEV [3953.369849] remove /devices/virtual/block/dm-1 (block)
KERNEL[3953.439220] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)
UDEV [3953.510027] change /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/host2/port-2:1/end_device-2:1/target2:0:1/2:0:1:0/block/sdb (block)
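
For reference, an event stream like the one above can be captured alongside the reproducer with something like the following (the log redirection and backgrounding are assumptions):

# Sketch: record kernel and udev events while running the test.
udevadm monitor --kernel --udev > udev-monitor.log 2>&1 &
MONITOR_PID=$!
# ... run the lvcreate/lvremove cycle here ...
kill $MONITOR_PID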

Adam Gandelman (gandelman-a) wrote :

Using the same hardware, this does not seem to affect 12.04, but on 13.04 the failure occurs 50% of the time: every other create + delete cycle completes without error.

Brian Murray (brian-murray) wrote :

Did you mean on 13.10 in your last comment?

Adam Gandelman (gandelman-a) wrote :

No, using the same hardware I've tested Precise, Raring and Saucy. Precise does not seem to be affected at all, Raring/13.04 is affected about 50% of the time (every other test cycle), and Saucy/13.10 hits the error on the first attempt 100% of the time. Sorry for the confusion.

Adam Gandelman (gandelman-a) wrote :

This appears to be resolved in Debian using lvm2 2.02.98-6. There have been a number of updates to the udev rules since our 13.10 version (2.02.98-1). Syncing these changes out to my test hardware fixes this issue as well as another total LVM deadlock I started hitting while testing some new OpenStack patches. I will merge 2.02.98-6, test and file a FFE.

Changed in lvm2 (Ubuntu):
importance: Undecided → High
Adam Gandelman (gandelman-a) wrote :

Testing the changes to udev rules from lvm2 2.02.98-6 on raring 13.04 fixes the intermittent issue there, as well.

Adam Gandelman (gandelman-a) wrote :

This was fixed in 13.10 with lvm2 (2.02.98-6ubuntu1).

Changed in lvm2 (Ubuntu Saucy):
status: New → Fix Released
Changed in lvm2 (Ubuntu Raring):
importance: Undecided → High
summary: - lvremove always fails on first attempt when removing snapshot
+ [SRU] lvremove often fails on first attempt when removing snapshot
Dimitri John Ledkov (xnox) wrote :

Looks good to me. Will test & upload into raring-proposed sometime this week.

Changed in lvm2 (Ubuntu Raring):
assignee: nobody → Dmitrijs Ledkovs (xnox)
status: New → Confirmed
Adam Gandelman (gandelman-a) wrote :

Dmitrijs, any update on pushing the raring fix to proposed?

Sebastian Unger (sebunger44) wrote :

Any idea whether this will be fixed in raring?

On 24 January 2014 11:40, Sebastian Unger <email address hidden> wrote:
> Any idea whether this will be fixed in raring?

No, as it's less than 7 days until Raring End Of Life.
https://lists.ubuntu.com/archives/ubuntu-announce/2014-January/000178.html

Please upgrade to 13.10 (Saucy) release.

--
Regards,

Dimitri.

Rolf Leggewie (r0lf) wrote :

Raring has reached the end of its life and is no longer receiving any updates. Marking the raring task for this ticket as "Won't Fix".

Changed in lvm2 (Ubuntu Raring):
status: Confirmed → Won't Fix