Race condition with dmsetup causes 'map already present' messages

Bug #585027 reported by Nigel Jones
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
multipath-tools (Ubuntu)
Medium
Unassigned
Trusty
Medium
Mathieu Trudel-Lapierre

Bug Description

Binary package hint: multipath-tools

Environment:

njones@fantail:~$ lsb_release -rd
Description: Ubuntu 10.04 LTS
Release: 10.04
njones@fantail:~$ apt-cache policy multipath-tools
multipath-tools:
  Installed: 0.4.8-14ubuntu4
  Candidate: 0.4.8-14ubuntu4
  Version table:
 *** 0.4.8-14ubuntu4 0
        500 http://nz.archive.ubuntu.com/ubuntu/ lucid/main Packages
        100 /var/lib/dpkg/status

Issue:

While reproducing https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/571093 both myself and Dustin Kirkland encountered an apparent race condition in multipath and dmsetup.

We were attempting variants of 'while true; do sudo multipath -F; sudo multipath -v4 ; done' to create a set of udev add/remove events and noticed that shortly after starting, no output appeared within the 'udevadm monitor' command.

We observed that the output from a: multipath -v4 was similar to:

root@fantail:~# multipath -v4
Discover device /sys/block/ram0
ram0: device node name blacklisted
Discover device /sys/block/ram1
ram1: device node name blacklisted
Discover device /sys/block/ram2
ram2: device node name blacklisted
Discover device /sys/block/ram3
ram3: device node name blacklisted
Discover device /sys/block/ram4
ram4: device node name blacklisted
Discover device /sys/block/ram5
ram5: device node name blacklisted
Discover device /sys/block/ram6
ram6: device node name blacklisted
Discover device /sys/block/ram7
ram7: device node name blacklisted
Discover device /sys/block/ram8
ram8: device node name blacklisted
Discover device /sys/block/ram9
ram9: device node name blacklisted
Discover device /sys/block/ram10
ram10: device node name blacklisted
Discover device /sys/block/ram11
ram11: device node name blacklisted
Discover device /sys/block/ram12
ram12: device node name blacklisted
Discover device /sys/block/ram13
ram13: device node name blacklisted
Discover device /sys/block/ram14
ram14: device node name blacklisted
Discover device /sys/block/ram15
ram15: device node name blacklisted
Discover device /sys/block/loop0
loop0: device node name blacklisted
Discover device /sys/block/loop1
loop1: device node name blacklisted
Discover device /sys/block/loop2
loop2: device node name blacklisted
Discover device /sys/block/loop3
loop3: device node name blacklisted
Discover device /sys/block/loop4
loop4: device node name blacklisted
Discover device /sys/block/loop5
loop5: device node name blacklisted
Discover device /sys/block/loop6
loop6: device node name blacklisted
Discover device /sys/block/loop7
loop7: device node name blacklisted
Discover device /sys/block/sda
sda: not found in pathvec
sda: mask = 0x1f
sda: dev_t = 8:0
sda: size = 625142448
sda: subsystem = scsi
sda: vendor = ATA
sda: product = WDC WD3200AAKS-0
sda: rev = 12.0
sda: h:b:t:l = 0:0:0:0
sda: serial = WD-WCAS70000070
sda: getprio = NULL (internal default)
sda: prio = 1
sda: getuid = /lib/udev/scsi_id -g -u -d /dev/%n (config file default)
sda: uid = 1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070 (callout)
Discover device /sys/block/sr0
sr0: device node name blacklisted
Discover device /sys/block/sdb
sdb: not found in pathvec
sdb: mask = 0x1f
sdb: dev_t = 8:16
sdb: size = 15679488
sdb: subsystem = scsi
sdb: vendor = Kingston
sdb: product = DataTraveler G2
sdb: rev = PMAP
sdb: h:b:t:l = 6:0:0:0
sdb: serial =
sdb: getprio = NULL (internal default)
sdb: prio = 1
sdb: getuid = /lib/udev/scsi_id -g -u -d /dev/%n (config file default)
error calling out /lib/udev/scsi_id -g -u -d /dev/sdb
Discover device /sys/block/sdc
sdc: not found in pathvec
sdc: mask = 0x1f
sdc: dev_t = 8:32
sdc: size = 41943040
sdc: subsystem = scsi
sdc: vendor = IET
sdc: product = VIRTUAL-DISK
sdc: rev = 0
sdc: h:b:t:l = 7:0:0:0
sdc: serial =
sdc: getprio = NULL (internal default)
sdc: prio = 1
sdc: getuid = /lib/udev/scsi_id -g -u -d /dev/%n (config file default)
error calling out /lib/udev/scsi_id -g -u -d /dev/sdc
Discover device /sys/block/dm-0
dm-0: device node name blacklisted
Discovery status 0
===== paths list =====
uuid hcil dev dev_t pri dm_st chk_st
1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070 0:0:0:0 sda 8:0 1 [undef][undef
                                           6:0:0:0 sdb 8:16 1 [undef][undef
                                           7:0:0:0 sdc 8:32 1 [undef][undef
sda: ownership set to 1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070
sda: not found in pathvec
sda: mask = 0xc
sda: path checker = directio (config file default)
directio: called for 800
directio: called in synchronous mode
directio: starting new request
directio: async io getevents returns 1 (errno=No such file or directory)
directio: io finished 4096/0
sda: state = 2
sda: prio = 1
1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070: pgfailover = -1 (internal default)
1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070: pgpolicy = failover (internal default)
1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070: selector = round-robin 0 (internal default)
1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070: features = 0 (internal default)
1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070: hwhandler = 0 (internal default)
1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070: rr_weight = 1 (internal default)
1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070: minio = 1000 (config file default)
1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070: no_path_retry = NONE (internal default)
pg_timeout = NONE (internal default)
1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070: set ACT_CREATE (map does not exist)
1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070: map already present
1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070: domap (0) failure for create/reload map
1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070: remove multipath map
sda: orphaned

And a device was left behind in /dev/mapper:

njones@fantail:~$ ll /dev/mapper/
total 0
drwxr-xr-x 2 root root 80 2010-05-25 04:25 ./
drwxr-xr-x 18 root root 4080 2010-05-25 04:25 ../
brw-rw---- 1 root disk 252, 0 2010-05-25 04:25 1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070
crw-rw---- 1 root root 10, 59 2010-05-22 10:35 control

$ sudo dmsetup ls
1ATA_WDC_WD3200AAKS-00TMA0_WD-WCAS70000070 (252, 0)

We had to perform 'dmsetup remove <id>' to clear the 'map already present' message.

While I admit that this is a very specific case, it is still a race condition and does warrant a bug report.

Nigel Jones (dev-nigelj)
description: updated
Revision history for this message
Chuck Short (zulcss) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. Please answer these questions:
1. Is this reproducible?
2. If so, what specific steps should we take to recreate this bug? Be as detailed as possible.
This will help us to find and resolve the problem.

Changed in multipath-tools (Ubuntu):
status: New → Incomplete
Revision history for this message
Nigel Jones (dev-nigelj) wrote :

Yes, as stated in my original comment, it is reproducible (myself and Dustin Kirkland), with variants of "while true; do sudo multipath -F; sudo multipath -v4 ; done".

All that is required is a device that multipath will create a /dev/mapper entry for (use_friendly_names or not) and remove with -F (as exampled in my original comment, a bog standard Hard Disk drive will be fit for the purpose.

Changed in multipath-tools (Ubuntu):
status: Incomplete → New
Revision history for this message
Chuck Short (zulcss) wrote :

Thanks for the information.

chuck

Changed in multipath-tools (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Phillip Susi (psusi) wrote :

I have been looking for a race condition that sometimes causes problems with dmraid devices. Can you be more specific about the nature of this race?

Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

In examining the ioctl interface to device mapper (kernel), I have not found any architecture that
assures that when a device mapper device has been marked for deletion, that this information is
relayed back to userspace so it can block on this deletion. An example will clarify things.

In SD we have a state machine the details the teardown of a SCSI device. Deletion is "latched" in
two FSM states, SDEV_CANCEL and SDEV_DEL. The first state tells everyone that this device is
on it's way out and to stop trying to use it, DEL is the final state where callbacks start the
kobj destruction process. In this model, it's easy to determine that a device is being deleted
by examining the state variable in sysfs. It's of little consequence here as SD devices are not
deterministic, the same name need not be used, if sdd was being deleted, and another hotplug
create event was generated, SD would hand you sde. Deterministic names are left to udev.

In the device mapper case, the WWID is a deterministic and unique name, there is no substitute.
Once the device has been marked for deletion, there's no readily apparent way to verify that it has in
fact been marked for deletion. dev_status is the closest we can get, and the flags it provides
are not clear enough to determine that a device is being deleted. Therefore, the mechanism
does not exist to even poll for the destruction of a dm name, as we can't even determine that
the dm name is being destroyed. We couldn't wait for deletion if we wanted to.

http://lxr.linux.no/linux+v3.4.4/drivers/md/dm-ioctl.c#L656

So this really isn't a bug, it's an example of DM's current design. If we want to address the issue
then we'll have to propose an enhanced tear down protocol, and then update any device mapper
consumers to use this protocol. There might be a way to infer a deletion state from a combination
of flags, but that will take a concentrated effort to determine. This issue is a synthetic fault at best,
there's no real impact to the user.

Changed in multipath-tools (Ubuntu):
status: Triaged → Invalid
Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

Should really be won't fix but I apparently can't set it.

Revision history for this message
Alasdair G. Kergon (agk2) wrote : Re: [Bug 585027] Re: Race condition with dmsetup causes 'map already present' messages

I don't follow this. Your title mentions 'dmsetup' but your description only
talks about multipathd. What is the problem here? One of the multipath
commands in your loop failing to wait until the associated udev action has
completed before exiting? Or some dmsetup command you are running in parallel
that interferes?

Revision history for this message
Alasdair G. Kergon (agk2) wrote :

On Mon, Jul 09, 2012 at 03:29:11PM -0000, Peter Petrakis wrote:
> In examining the ioctl interface to device mapper (kernel), I have not found any architecture that
> assures that when a device mapper device has been marked for deletion, that this information is
> relayed back to userspace so it can block on this deletion. An example will clarify things.

If this is the same process that issued the deletion that needs to wait
till the device has gone, then the udevcookie mechanism is meant to
provide that already. (Multipathd is supposed to be using that
mechanism, upstream at least.)

If it's a different process, then I consider it's userspace's job to
co-ordinate the sequencing of ioctls that conflict with each other.

Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

@Alasdair

It's buried in the description:

"We were attempting variants of 'while true; do sudo multipath -F; sudo multipath -v4 ; done' to create a set of udev add/remove events and noticed that shortly after starting, no output appeared within the 'udevadm monitor' command."

So it appears that multipath is sending a DM ioctl to create a map with the same wwid,
that "should" have been completely removed at the exit of 'multipath -F'. There's
no dmsetup command executing out of band unless there's some udev rule misbehaving.

The udev rules managing dmsetup are /lib/udev/rules.d/55-dm.rules

I'm not familiar with the cookie feature you mentioned, but we appear to have some of the
capability here. I haven't compared it to upstream yet.

"55-dm.rules"

# Decode udev control flags and set environment variables appropriately.
# These flags are encoded in DM_COOKIE variable that was introduced in
# kernel version 2.6.31. Therefore, we can use this feature with
# kernels >= 2.6.31 only.
ENV{DM_COOKIE}=="?*", IMPORT{program}="/sbin/dmsetup udevflags $env{DM_COOKIE}"
ENV{DM_COOKIE}=="?*", RUN+="/sbin/dmsetup udevcomplete $env{DM_COOKIE}"

and compared to our packaged source in precise:

<12:29:34>multipath-tools-0.4.9$ grep -R cookie .
./kpartx/devmapper.c:dm_simplecmd (int task, const char *name, int no_flush, uint32_t *cookie) {
./kpartx/devmapper.c: if (udev_wait_flag && !dm_task_set_cookie(dmt, cookie, 0))
./kpartx/devmapper.c: mode_t mode, uid_t uid, gid_t gid, uint32_t *cookie) {
./kpartx/devmapper.c: if (task == DM_DEVICE_CREATE && !dm_task_set_cookie(dmt, cookie, 0))
./kpartx/kpartx.c: uint32_t cookie = 0;
./kpartx/kpartx.c: 0, &cookie)) {
./kpartx/kpartx.c: buf.st_gid, &cookie)) {
./kpartx/kpartx.c: 1, &cookie)) {
./kpartx/kpartx.c: &cookie);
./kpartx/kpartx.c: &cookie);
./kpartx/kpartx.c: dm_udev_wait(cookie);
./libmultipath/config.h: uint32_t cookie;
./libmultipath/devmapper.c: if (udev_wait_flag && !dm_task_set_cookie(dmt, &conf->cookie, 0))
./libmultipath/devmapper.c: !dm_task_set_cookie(dmt, &conf->cookie, 0))
./libmultipath/devmapper.c: if (!dm_task_set_cookie(dmt, &conf->cookie, 0))
./multipath/main.c: dm_udev_wait(conf->cookie);

This cookie mechanism is news to me, I'll take a closer look, thanks.

Revision history for this message
Alasdair G. Kergon (agk2) wrote :

On Mon, Jul 09, 2012 at 04:33:25PM -0000, Peter Petrakis wrote:
> I'm not familiar with the cookie feature you mentioned, but we appear to have some of the
> capability here. I haven't compared it to upstream yet.

The code issuing the removal ioctl specfies a cookie (a semaphore with a
positive value). It then waits until the semaphore drops to 0.

The last thing the udev rules do after they've updated /dev is decrement
the semaphore, which wakes up the original code.

There are libdevmapper calls and dmsetup commands to update/query cookie state.

udev 'watch' rules can run out-of-band and need careful configuration to
avoid problems.

If libvirt is issuing dm ioctls or dmsetup commands itself then it might
also need to be udevcookie-aware - not something that's arisen in any
discussions yet - there is some default udevcookie handling built into
dmsetup commands.

I was assured the upstream multipath udevcookie support was complete,
but I've not verified that personally - perhaps we have a situation
that's overlooked here.

Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

So when this bug was first reported we were using multipath 0.4.8, it doesn't appear
to have this cookie mechanism. 0.4.9 does (oneiric and higher); I have been
unable to reproduce the issue with 0.4.9. Going back through the logs...

"""
From 8d63b33d0996e141a2451df552b062b908db15bc Mon Sep 17 00:00:00 2001
From: Benjamin Marzinski <email address hidden>
Date: Mon, 17 May 2010 14:03:38 -0500
Subject: [PATCH] multipath: add udev sync support.

device-mapper in now able to synchronize operations through udev. This patch
allows multipath and kpartx to make use of this feature. If kpartx is run with
"-s", it waits for the partitions to be created before returning. Multipath
will now always wait for the devices to be created before returning.

This feature requires dm_task_set_cookie() which was finalized in device-mapper
version 1.2.38
"""
Appears to be where cookie support was initially added. I'll retest with 0.4.8 to
verify that the fault still exists (probably) and take it from there.

Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

Retested on precise base and downgraded to multipath-tools 0.4.8-14ubuntu4
on generic 3.2.0.27.29. Ran unit test for several hours, no fault found.

Intend to downgrade to lucid kernel and retest, hopefully it pops then, save
me the trouble of installing lucid somewhere.

Revision history for this message
Peter Petrakis (peter-petrakis) wrote :

Using the lucid kernel, I was able to reproduce, though it took a lot longer than
original report implied, easily over an hour, probably more than 3 I wasn't watching
it the entire time.

multipath-tools 0.4.8-14ubuntu4.10.04.2
kernel 2.6.32-41-generic

udevadm monitor simply stops responding.

I cannot remove the dm maps, they say "busy/in use". They
are not mounted or used by anyone.

Using the same kernel and multipath 0.4.9-3ubuntu5 I was
unable to reproduce the issue. So this older kernel helps expose the
problem.

Changed in multipath-tools (Ubuntu):
status: Invalid → Confirmed
Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

Closing as Fix Released; we fixed this with multipath-tools (0.4.9-3ubuntu11) in vivid.

Nominating a fix for Trusty too which we should SRU.

Changed in multipath-tools (Ubuntu):
status: Confirmed → Fix Released
Changed in multipath-tools (Ubuntu Trusty):
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers