Serialization problem of udev events with DM_COOKIE set
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
udev (Ubuntu) | Fix Released | Medium | Herton R. Krzesinski | | |
Oneiric | Fix Released | Medium | Unassigned | | |
Precise | Fix Released | Medium | Unassigned | | |
Bug Description
Alex Lyakas reports:
Hi Herton,
we are doing initial testing of our software on stock ubuntu-precise
3.2.0-23 - 36. It looks like we have identified a problem with the udevd
code. Here are some details:
We believe the problem was introduced by a fix to
https:/
introduced code like this:
in event_queue_
/* run all events with a timeout set immediately, or in the case
* it's a dm_cookie event being processed */
if (udev_device_
udev_
event_
return 0;
}
Basically, what this fix does is the following: if the udev event has a
DM_COOKIE, the event is dispatched immediately. Previously, this event was going
through the event_queue_
in-flight events for the same device (or parent, or child etc.) by
calling is_devpath_busy(). With the fix, this check doesn’t happen and the
event is dispatched immediately.
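The difference between the two dispatch paths can be sketched roughly as follows. This is a simplified model and not udevd's actual code: `struct event`, `devpath_busy()`, `can_run_now()` and the `cookie_bypass` flag are hypothetical names made up for illustration, and the busy check here compares exact devpaths only, whereas the report says the real check also covers parent and child devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified event record; not udevd's real structure. */
struct event {
    unsigned long long seqnum;  /* kernel sequence number */
    const char *devpath;        /* device path the event refers to */
    bool has_dm_cookie;         /* event carries a DM_COOKIE property */
};

/* The queue holds every event that has not finished yet (queued or running).
 * Returns true if an earlier unfinished event targets the same devpath. */
static bool devpath_busy(const struct event *queue, size_t n,
                         const struct event *ev)
{
    for (size_t i = 0; i < n; i++) {
        if (queue[i].seqnum < ev->seqnum &&
            strcmp(queue[i].devpath, ev->devpath) == 0)
            return true;
    }
    return false;
}

/* Decide whether `ev` may be handed to a worker right now.
 * cookie_bypass = true models the behaviour described in the report,
 * cookie_bypass = false models fully serialized per-devpath handling. */
static bool can_run_now(const struct event *queue, size_t n,
                        const struct event *ev, bool cookie_bypass)
{
    assert(ev != NULL);
    if (cookie_bypass && ev->has_dm_cookie)
        return true;                    /* skips the busy check entirely */
    return !devpath_busy(queue, n, ev); /* waits for earlier events */
}
```

With an unfinished "add" event and a later "change" event carrying DM_COOKIE for the same devpath, `can_run_now()` lets the "change" event through when `cookie_bypass` is true and holds it back when it is false.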
What we see is that when a new device-mapper device is created in the
system, an “add” event is fired, then a “change” event comes in (that has
a DM_COOKIE). As a result of the fix, the “change” processing is not
delayed until the “add” processing completes, so both events are handled
concurrently by two different udev workers. This causes several different
unwanted effects that we observed.
One such effect is in the attached log: dm-16 device is being created.
“add” and “change” events are fired:
May 24 20:26:37 vc-00-00-A-dev udevd[5859]: seq 100141 queued, 'add' 'block'
May 24 20:26:37 vc-00-00-A-dev udevd[5859]: seq 100142 queued, 'change' 'block'
May 24 20:26:37 vc-00-00-A-dev udevd[26633]: seq 100142 running
They run in parallel and “change” processing starts first. Then “add”
processing starts:
May 24 20:26:37 vc-00-00-A-dev udevd[26544]: seq 100141 running
“change” processing creates symbolic links
May 24 20:26:37 vc-00-00-A-dev udevd[26633]: creating link '/dev/disk
/by-id/
May 24 20:26:37 vc-00-00-A-dev udevd[26633]: creating symlink '/dev/disk
/by-id/
May 24 20:26:37 vc-00-00-A-dev udevd[26633]: creating link
'/dev/mapper/
May 24 20:26:37 vc-00-00-A-dev udevd[26633]: creating symlink
'/dev/mapper/
but then “add” processing removes them:
May 24 20:26:37 vc-00-00-A-dev udevd[26544]: update old name, '/dev/disk
/by-id/
'/devices/
May 24 20:26:37 vc-00-00-A-dev udevd[26544]: no reference left, remove
'/dev/disk/
May 24 20:26:37 vc-00-00-A-dev udevd[26544]: update old name,
'/dev/mapper/
'/devices/
May 24 20:26:37 vc-00-00-A-dev udevd[26544]: no reference left, remove
'/dev/mapper/
As a result, there is no /dev/mapper/XXX symbolic link.
We have also seen some other bad effects of this parallel processing
like:
# bad symbolic link is created in /dev/disk/by-path (this happens in
“add” processing, when there is no DM_NAME property):
ll /dev/disk/by-path:
lrwxrwxrwx 1 root root 10 May 23 21:47 dm-name- -> ../../dm-4
lrwxrwxrwx 1 root root 10 May 23 21:45 dm-name-ioerror -> ../../dm-0
lrwxrwxrwx 1 root root 10 May 23 21:47 dm-name-ppart-100 -> ../../dm-5
lrwxrwxrwx 1 root root 10 May 23 21:47 dm-name-ppart-103 -> ../../dm-3
lrwxrwxrwx 1 root root 10 May 23 21:47 dm-name-ppart-105 -> ../../dm-2
lrwxrwxrwx 1 root root 10 May 23 21:47 dm-name-ppart-108 -> ../../dm-1
lrwxrwxrwx 1 root root 10 May 23 21:47 dm-name-ppart-109 -> ../../dm-6
# devnode is created in /dev/mapper, instead of symbolic link:
root@vc-
total 0
drwxr-xr-x 2 root root 140 May 23 22:01 ./
drwxr-xr-x 14 root root 9900 May 23 22:01 ../
lrwxrwxrwx 1 root root 7 May 23 22:01 blabla -> ../dm-1
crw------- 1 root root 10, 236 May 23 19:09 control
lrwxrwxrwx 1 root root 7 May 23 21:53 ioerror -> ../dm-0
lrwxrwxrwx 1 root root 7 May 23 22:01 ppart-4927362 -> ../dm-2
brw-rw---- 1 root disk 252, 3 May 23 22:01 ppart-4927363 // This is a
devnode, not symlink!
and some others.
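A stress test could detect the two symptoms listed above with checks along these lines. This is only a sketch: `is_symlink()` and `is_nameless_dm_link()` are hypothetical helper names, and which paths or directory entries to pass in is left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* True if `path` exists and is itself a symbolic link.
 * lstat() does not follow the link, so a block device node that was
 * created in place of the expected symlink makes this return false. */
static bool is_symlink(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (lstat(path, &st) != 0)
        return false;               /* missing entry or other error */
    return S_ISLNK(st.st_mode);
}

/* True if a directory entry is the bare "dm-name-" prefix with no
 * device name after it, as in the bad link shown in the listing. */
static bool is_nameless_dm_link(const char *entry)
{
    assert(entry != NULL);
    return strcmp(entry, "dm-name-") == 0;
}
```

A test loop would call `is_symlink()` on the expected /dev/mapper entry after each device creation, and `is_nameless_dm_link()` on each entry of the directory that holds the dm-name- links.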
We see this issue only when doing stress testing of the udev system,
like creating and deleting devices, and then creating them again and
deleting again etc.
To resolve the issue, we disabled the code that handles the DM_COOKIE in
udevd. Then we stopped seeing issues.
All in all, our understanding is that two udev events for the same
devpath should not be executing concurrently, which this fix violates
for Device-Mapper devices. Also, this problem cannot be easily addressed
by playing with udev rules, because the flaw is in the udevd code
itself.
Can you please share your view on the problems we see?
Thanks,
Alex.
Related branches
- Martin Pitt: Approve
Diff: 202 lines (+64/-36), 3 files modified
debian/changelog (+8/-0)
debian/patches/avoid-exit-deadlock-for-dm_cookie.patch (+45/-30)
udev/udevd.c (+11/-6)
- Ubuntu branches: Pending requested
Diff: 202 lines (+64/-36), 3 files modified
debian/changelog (+8/-0)
debian/patches/avoid-exit-deadlock-for-dm_cookie.patch (+45/-30)
udev/udevd.c (+11/-6)
tags: | added: patch |
Changed in udev (Ubuntu Oneiric): | |
status: | New → Triaged |
importance: | Undecided → Medium |
Changed in udev (Ubuntu Precise): | |
status: | New → Triaged |
importance: | Undecided → Medium |
Indeed there is a serialization issue regarding the DM_COOKIE events. I was able to reproduce it with the attached test case, which must be run as root.
Simply leave the test case running; it stress-tests the creation and removal of a dm device. On the buggy udev, it will fail at some point, usually leaving a /dev/disk/by-id/dm-name- link that lacks the device name. On a good udev, the test case will run indefinitely without errors.
So far the problem only happens when stress testing device creation and removal, so it is unlikely to be an issue in practice, where devices are usually not removed.