udev rules make it impossible to deactivate lvm volume group with vgchange -an

Bug #1088081 reported by Richard Hansen on 2012-12-08
This bug affects 18 people
Affects: lvm2 (Ubuntu) | Importance: High | Assigned to: Dimitri John Ledkov
Affects: udev (Ubuntu) | Importance: Undecided | Assigned to: Unassigned

Bug Description

Running 'vgchange -a n volume_group_name' generates udev events that are matched by /lib/udev/rules.d/85-lvm2.rules, causing it to run 'vgchange -a y'. This defeats the initial 'vgchange -a n' and makes it impossible to:
  * run 'cryptsetup luksClose' on the underlying encrypted device
  * safely remove a removable drive containing an LVM physical volume
  * safely use dd to copy the LVM partition to another device (LVM might be writing data)

I turned up logging and it looks like the following happened to the instance I was watching:
  1. 'vgchange -a n vgname' caused LVM to close /dev/dm-5 (the LUKS dm device holding the LVM physical volume)
  2. udev logged this: device /dev/dm-5 closed, synthesising 'change'
  3. a new 'change' 'block' udev event was enqueued for the LUKS dm device
  4. udev started processing the new 'change' event
  5. the event matched /lib/udev/rules.d/60-persistent-storage-dm.rules:16 so blkid was run and set ID_FS_TYPE=LVM2_member
  6. all of the conditions for /lib/udev/rules.d/85-lvm2.rules matched, so 'vgchange -a y' was run
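
The event chain above can be watched as it happens. What follows is a minimal sketch, not part of the original observation. It assumes that 'udevadm monitor --udev --property' prints each event as a run of KEY=value lines terminated by a blank line. The helper name lvm_change_events is made up for illustration.

```shell
# Hypothetical helper: reads 'udevadm monitor --udev --property' output on
# stdin and prints the DEVNAME of every 'change' event whose device was
# tagged by blkid as an LVM physical volume (ID_FS_TYPE=LVM2_member).
lvm_change_events() {
    awk -F= '
        # A blank line ends one event: report it if it matches, then reset.
        /^$/ {
            if (act == "change" && fs == "LVM2_member") print dev
            act = fs = dev = ""
            next
        }
        $1 == "ACTION"     { act = $2 }
        $1 == "DEVNAME"    { dev = $2 }
        $1 == "ID_FS_TYPE" { fs  = $2 }
        # Handle a final event that is not followed by a blank line.
        END { if (act == "change" && fs == "LVM2_member") print dev }
    '
}
```

To use it, run 'udevadm monitor --udev --property --subsystem-match=block | lvm_change_events' as root in one terminal and 'vgchange -a n vgname' in another. Any device name printed is one that received the synthesized 'change' event described in steps 2 to 5.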

Brainstorming ideas for fixing this:
  * I'm not sure why udev synthesizes a change event when a dm device is closed, but disabling that bit of code should fix this bug (and probably cause many worse bugs).
  * Maybe the lvm udev rule should condition on ACTION=="add" instead of ACTION=="add|change". I just tried this and unfortunately it doesn't work -- unlocking a LUKS device causes two back-to-back udev events: one 'add' event that only appears to match /lib/udev/rules.d/50-udev-default.rules:67 and a second 'change' event that matches much more.

Other info:
$ lsb_release -rdc
Description: Ubuntu 12.10
Release: 12.10
Codename: quantal

lvm2 version: 2.02.95-4ubuntu1
udev version: 175-0ubuntu13

Richard Hansen (rhansen) on 2012-12-08
affects: udev (Ubuntu) → lvm2 (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in lvm2 (Ubuntu):
status: New → Confirmed
Changed in udev (Ubuntu):
status: New → Confirmed
Dr_Jekyll (borsi222) wrote :

Hello everyone,
I made the same observations on 12.10 Server. If there's anything I can do to test or debug this, I'd love to.

Thanks,
Chris

Dr_Jekyll (borsi222) wrote :

Hello,
Quick follow-up: I confirmed that Ubuntu 12.04 works as expected; there it is possible to make a logical volume unavailable with lvchange -an <LV>. I tested this with the same LV as on 12.10.

Thanks,
Chris

Richard Hansen (rhansen) wrote :

Tagging with regression-release because this works in 12.04. Thanks to @Dr_Jekyll for testing!

tags: added: regression-release
Dr_Jekyll (borsi222) wrote :

Hi,
it's been a month now, can we do anything to "promote" this bug? Even if it won't be fixed in 12.10, can we somehow make sure it receives some attention for 13.04?

Thanks,
Christoph

Martin Bruns (martin-konahina) wrote :

Hi,

I would also like to get it fixed soon.

Nevertheless I found at least a workaround which works for my systems. The workaround is as follows.

sudo umount /mnt # assume the drive in question is mounted under /mnt
sudo service udev stop
sudo lvchange -a n <LV-name>
sudo cryptsetup luksClose <LUKS-devicename>
sudo service udev start

Kind Regards
Martin
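
One risk with the sequence above is that udev stays stopped if a middle step fails. Below is a minimal sketch of the same workaround as a shell function that restarts udev in every case. The function name close_encrypted_lv and its three arguments are hypothetical. It assumes it is run as root (so no sudo) on a system where 'service udev stop' works as in the comment above.

```shell
# Hypothetical wrapper around the workaround. Arguments are placeholders:
#   $1  mount point of the filesystem on the LV (e.g. /mnt)
#   $2  logical volume to deactivate (e.g. vgname/lvname)
#   $3  LUKS mapping name to close
close_encrypted_lv() {
    mountpoint=$1
    lv=$2
    luks=$3

    umount "$mountpoint" || return 1
    # Stop udev so it cannot react with 'vgchange -a y'.
    service udev stop || return 1

    status=0
    lvchange -a n "$lv" || status=$?
    if [ "$status" -eq 0 ]; then
        cryptsetup luksClose "$luks" || status=$?
    fi

    # Restart udev whether or not the steps above succeeded.
    service udev start || status=$?
    return "$status"
}
```

Example call: close_encrypted_lv /mnt vgname/lvname luksname. A non-zero return value means one of the steps failed, but udev has been asked to start again either way.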

Dr_Jekyll (borsi222) wrote :

Hi,
Thanks Martin, that works for me as well. Still, a "real" fix would be great.

Have a good week,
Christoph

Pavel A (pavela) wrote :

Same problem here. Another workaround is to use 'sudo dmsetup remove /dev/dm-X' instead of 'sudo lvchange -a n vg/lv'.
As @Dr_Jekyll has stated, Ubuntu 12.04 is not affected.

Tom Fields (udzelem) wrote :

Hi,

thank you Pavel, the tip with dmsetup remove works fine (using /dev/mapper/vg_.....).

Nasty bug though.

Dimitri John Ledkov (xnox) wrote :

Interesting. Precise -> Quantal saw a very big lvm2 update. There were changes done to udev handling and maybe some of it needs to be reverted. It's interesting that dmsetup remove works, and maybe lvchange should act more like dmsetup does.

Changed in lvm2 (Ubuntu):
assignee: nobody → Dmitrijs Ledkovs (xnox)
importance: Undecided → High

It looks like Precise is also affected in combination with the LVM resource agent from the pacemaker cluster manager (package: resource-agents). In a simple test setup (LVM on top of a DRBD device), I was not able to reliably migrate the volume group from one node to the other until I disabled the udev rule "85-lvm2.rules".

I am not sure, though, if in this special context this behaviour should be considered as a bug.

This also stops vgimportclone (which I used to import LVM VGs from virtual machines) from working correctly, eg:

# lvcreate -n testsnap -L 10G --snapshot vmdata/sirius-test-nz-db
  Logical volume "testsnap" created

# kpartx -va /dev/mapper/vmdata-testsnap
add map vmdata-testsnap1 (252:238): 0 1024000 linear /dev/mapper/vmdata-testsnap 2048
add map vmdata-testsnap2 (252:239): 0 61886464 linear /dev/mapper/vmdata-testsnap 1026048

# pvs /dev/mapper/vmdata-testsnap2
  PV VG Fmt Attr PSize PFree
  /dev/mapper/vmdata-testsnap2 vg_siriusstnzdb lvm2 a-- 29.51g 0

# vgs
  VG #PV #LV #SN Attr VSize VFree
  vg_siriusstnzdb 1 2 0 wz--n- 29.51g 0
  vmdata 1 95 43 wz--n- 3.27t 2.11t

# vgimportclone -n testsnap-200313 /dev/mapper/vmdata-testsnap2
  WARNING: Activation disabled. No device-mapper interaction will be attempted.
  Physical volume "/tmp/snap.zGhp1FKk/vgimport0" changed
  1 physical volume changed / 0 physical volumes not changed
  WARNING: Activation disabled. No device-mapper interaction will be attempted.
  Volume group "vg_siriusstnzdb" successfully changed
  Volume group "vg_siriusstnzdb" successfully renamed to "testsnap-200313"
  Reading all physical volumes. This may take a while...
  Found volume group "testsnap-200313" using metadata type lvm2
  Found volume group "vmdata" using metadata type lvm2

-------------- At this point there should be no "vg_siriusstnzdb" devices, because LVM doesn't know about them. ----------
# dmsetup ls|grep vg_siriusstnzdb
vg_siriusstnzdb-lv_swap (252:241)
vg_siriusstnzdb-lv_root (252:240)

Obviously I can work around this by listing the volumes and then removing $ORIGVG-$VOLUME with dmsetup, but without knowing of this bug, you end up having to reboot, because you don't know what's using your devices.
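
The manual cleanup described above could look roughly like the following sketch. The helper name dm_names_for_vg is made up for illustration. It only filters 'dmsetup ls' output and relies on LVM's convention of doubling any hyphen inside a VG or LV name when it builds the device-mapper name.

```shell
# Hypothetical helper: reads 'dmsetup ls' output on stdin and prints the
# device-mapper names that belong to the volume group named in $1.
dm_names_for_vg() {
    # LVM writes "vg-lv" and doubles hyphens inside names: my-vg -> my--vg.
    prefix=$(printf '%s' "$1" | sed 's/-/--/g')-
    # Keep names starting with the prefix, but skip those where the next
    # character is another hyphen (that is a different VG, e.g. foo-bar).
    awk -v p="$prefix" '
        index($1, p) == 1 && substr($1, length(p) + 1, 1) != "-" { print $1 }
    '
}
```

Run 'dmsetup ls | dm_names_for_vg vg_siriusstnzdb' first and check that the list contains only the leftover devices. Each name can then be passed to 'dmsetup remove' as root.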

Tuomas Heino (iheino+ub) on 2013-06-15
tags: added: quantal raring
Dane Fichter (dane-fichter) wrote :

Has this ever truly been resolved? I believe I am seeing this same bug with 14.04 and failing to resolve it even with the workarounds listed here. After unstack I'm left with a logical volume which I cannot delete or deactivate.

Josip Rodin (joy) wrote :

I also see this race condition happen often when I open up and manipulate nested LVM devices.

For example, an LV is partitioned and one of the partitions inside is a PV. A kpartx -a is run to activate the partitions, vgimportclone is run to rename the VG inside that nested PV, and, as MatthewParslow said above, the old nested VG device nodes are always left behind.

After cleaning those up manually, a sequence of two commands is run, vgchange -a n on the renamed VG and kpartx -d on the top-level LV. It usually works (it wins the race with udev vgchange -a y), but sometimes it doesn't.
