khubd/usbhid deadlock(?) creates processes in state D
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) |
Confirmed
|
Medium
|
Unassigned |
Bug Description
(Note that the attached logs are not a machine exhibiting the issue)
I've observed this issue specifically on Dell's PowerEdge R410 with its DRAC. I manage quite a few other machines with DRACs and haven't seen this issue so it may be limited to the R410. I've seen cases of this both with the precise and trusty kernel.
Basically, what happens is that things that read any info about the DRAC's usbhid device will block in an uninterruptible state. A quick way to do this is with "cat /sys/devices/
This is also associated with the following from the kernel (first from a precise machine):
[197852.595820] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
[197857.716899] usb 5-2: device descriptor read/64, error -71
[197857.944966] usb 5-2: device descriptor read/64, error -71
[197858.165020] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
[197858.285090] usb 5-2: device descriptor read/64, error -71
[197858.513108] usb 5-2: device descriptor read/64, error -71
[197858.733154] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
[197859.145214] usb 5-2: device not accepting address 2, error -71
[197859.261234] usb 5-2: reset full-speed USB device number 2 using uhci_hcd
[197859.669322] usb 5-2: device not accepting address 2, error -71
[197859.675357] usb 5-2: USB disconnect, device number 2
[198047.530714] INFO: task khubd:203 blocked for more than 120 seconds.
[198047.537147] Not tainted 3.13.0-74-generic #118~precise1-
[198047.543691] "echo 0 > /proc/sys/
[198047.551792] khubd D ffff880803fe18a0 0 203 2 0x00000000
[198047.551803] ffff8810036b1c18 0000000000000046 ffff88102fc53180 ffff8810036b1fd8
[198047.551809] 0000000000013180 0000000000013180 ffff881003f7c800 ffff880804634800
[198047.551814] ffff8810036b1c38 ffff8810020908e8 ffff8810020908ec 00000000ffffffff
[198047.551819] Call Trace:
[198047.551828] [<ffffffff81764
[198047.551833] [<ffffffff81764
[198047.551837] [<ffffffff81766
[198047.551845] [<ffffffff8156e
[198047.551848] [<ffffffff81766
[198047.551852] [<ffffffff81565
[198047.551857] [<ffffffff8156f
[198047.551860] [<ffffffff81568
[198047.551864] [<ffffffff81562
[198047.551868] [<ffffffff81569
[198047.551871] [<ffffffff81764
[198047.551875] [<ffffffff81569
[198047.551881] [<ffffffff810b0
[198047.551885] [<ffffffff81569
[198047.551890] [<ffffffff8108f
[198047.551893] [<ffffffff8108f
[198047.551898] [<ffffffff81771
[198047.551902] [<ffffffff8108f
[198047.551908] INFO: task kworker/9:1:246 blocked for more than 120 seconds.
[198047.558892] Not tainted 3.13.0-74-generic #118~precise1-
[198047.565491] "echo 0 > /proc/sys/
[198047.573524] kworker/9:1 D ffff880803fe18a0 0 246 2 0x00000000
[198047.573547] Workqueue: events hid_reset [usbhid]
[198047.573550] ffff8808011f9788 0000000000000046 ffff88080fc93180 ffff8808011f9fd8
[198047.573555] 0000000000013180 0000000000013180 ffff881003f94800 ffff8808011f0000
[198047.573559] ffff8808011f9798 ffff8808011f98d8 7fffffffffffffff 7fffffffffffffff
[198047.573564] Call Trace:
[198047.573579] [<ffffffff81764
[198047.573582] [<ffffffff81763
[198047.573587] [<ffffffff810a4
[198047.573591] [<ffffffff81765
[198047.573596] [<ffffffff8109f
[198047.573602] [<ffffffff81089
[198047.573605] [<ffffffff81085
[198047.573609] [<ffffffff81089
[198047.573623] [<ffffffff81089
[198047.573632] [<ffffffffa00a6
[198047.573640] [<ffffffffa00a8
[198047.573650] [<ffffffffa0036
[198047.573659] [<ffffffff815b3
[198047.573663] [<ffffffff815b9
[198047.573670] [<ffffffff815b9
[198047.573674] [<ffffffff815b5
[198047.573678] [<ffffffff815b5
[198047.573687] [<ffffffffa0036
[198047.573695] [<ffffffffa0033
[198047.573708] [<ffffffffa0033
[198047.573720] [<ffffffff814b7
[198047.573724] [<ffffffff814b7
[198047.573728] [<ffffffff814b7
[198047.573736] [<ffffffff814b4
[198047.573744] [<ffffffffa0032
[198047.573754] [<ffffffffa00a6
[198047.573758] [<ffffffff81573
[198047.573764] [<ffffffff814b7
[198047.573768] [<ffffffff814b7
[198047.573772] [<ffffffff81573
[198047.573780] [<ffffffffa00a8
[198047.573785] [<ffffffff81573
[198047.573788] [<ffffffff81573
[198047.573792] [<ffffffff81573
[198047.573796] [<ffffffff81567
[198047.573804] [<ffffffffa00a7
[198047.573808] [<ffffffff81087
[198047.573811] [<ffffffff81088
[198047.573815] [<ffffffff81088
[198047.573819] [<ffffffff8108f
[198047.573823] [<ffffffff8108f
[198047.573827] [<ffffffff81771
[198047.573830] [<ffffffff8108f
And these from a trusty machine (which includes info from one of the stuck processes too):
[1367979.659164] SysRq : Show Blocked State
[1367979.663114] task PC stack pid father
[1367979.669300] khubd D ffff88100054f160 0 203 2 0x00000000
[1367979.676574] ffff88100352fc18 0000000000000046 ffff88080fc93180 ffff88100352ffd8
[1367979.684216] 0000000000013180 0000000000013180 ffff8808049f4800 ffff8808043f4800
[1367979.691855] ffff88100352fc38 ffff881002e100e8 ffff881002e100ec 00000000ffffffff
[1367979.699495] Call Trace:
[1367979.702134] [<ffffffff81764
[1367979.707288] [<ffffffff81764
[1367979.713831] [<ffffffff81766
[1367979.720288] [<ffffffff8156e
[1367979.726049] [<ffffffff81766
[1367979.731376] [<ffffffff81565
[1367979.737138] [<ffffffff8156f
[1367979.743160] [<ffffffff81568
[1367979.749703] [<ffffffff81562
[1367979.755724] [<ffffffff81569
[1367979.761223] [<ffffffff81764
[1367979.766897] [<ffffffff81569
[1367979.772313] [<ffffffff810b0
[1367979.778160] [<ffffffff81569
[1367979.784942] [<ffffffff8108f
[1367979.790008] [<ffffffff8108f
[1367979.796378] [<ffffffff81771
[1367979.801965] [<ffffffff8108f
[1367979.808345] kworker/8:1 D ffff88100054f160 0 329 2 0x00000000
[1367979.815632] Workqueue: events hid_reset [usbhid]
[1367979.820445] ffff881002625788 0000000000000046 ffff88102fc93180 ffff881002625fd8
[1367979.828088] 0000000000013180 0000000000013180 ffff8808049f3000 ffff881002639800
[1367979.835726] ffff881002625798 ffff8810026258d8 7fffffffffffffff 7fffffffffffffff
[1367979.843367] Call Trace:
[1367979.845999] [<ffffffff81764
[1367979.851151] [<ffffffff81763
[1367979.857174] [<ffffffff810aa
[1367979.863196] [<ffffffff81765
[1367979.869392] [<ffffffff8109f
[1367979.875414] [<ffffffff81089
[1367979.880741] [<ffffffff81085
[1367979.886415] [<ffffffff81089
[1367979.892609] [<ffffffff81089
[1367979.898461] [<ffffffffa0072
[1367979.905801] [<ffffffffa0074
[1367979.912176] [<ffffffffa0042
[1367979.918374] [<ffffffff815b3
[1367979.924397] [<ffffffff815b9
[1367979.929983] [<ffffffff815b9
[1367979.935832] [<ffffffff815b5
[1367979.942547] [<ffffffff815b5
[1367979.949007] [<ffffffffa0042
[1367979.955638] [<ffffffffa003f
[1367979.961837] [<ffffffffa003f
[1367979.968298] [<ffffffff814b7
[1367979.974752] [<ffffffff814b7
[1367979.981095] [<ffffffff814b7
[1367979.987202] [<ffffffff814b4
[1367979.992707] [<ffffffffa003e
[1367979.999253] [<ffffffffa0072
[1367980.005968] [<ffffffff81573
[1367980.012252] [<ffffffff814b7
[1367980.018710] [<ffffffff814b7
[1367980.024992] [<ffffffff81573
[1367980.031885] [<ffffffffa0074
[1367980.038603] [<ffffffff81573
[1367980.044971] [<ffffffff81573
[1367980.052121] [<ffffffff81573
[1367980.059981] [<ffffffff81567
[1367980.066004] [<ffffffffa0073
[1367980.072199] [<ffffffff81087
[1367980.078221] [<ffffffff81088
[1367980.083982] [<ffffffff81088
[1367980.090699] [<ffffffff8108f
[1367980.095765] [<ffffffff8108f
[1367980.102132] [<ffffffff81771
[1367980.107719] [<ffffffff8108f
[1367980.114098] python2.7 D ffff88100054f160 0 18303 1 0x00000004
[1367980.121369] ffff881001497d08 0000000000000082 ffff88080fc53180 ffff881001497fd8
[1367980.129007] 0000000000013180 0000000000013180 ffff881003f86000 ffff881001688000
[1367980.136648] 0000881000000024 ffff881002e100e8 ffff881002e100ec 00000000ffffffff
[1367980.144286] Call Trace:
[1367980.146921] [<ffffffff81764
[1367980.152073] [<ffffffff81764
[1367980.158615] [<ffffffff81766
[1367980.165069] [<ffffffff81766
[1367980.170397] [<ffffffff81576
[1367980.176331] [<ffffffff814b4
[1367980.181918] [<ffffffff81766
[1367980.187418] [<ffffffff811ee
[1367980.193180] [<ffffffff81242
[1367980.199027] [<ffffffff811ef
[1367980.204267] [<ffffffff811cb
[1367980.209506] [<ffffffff811cb
[1367980.214657] [<ffffffff81771
lsusb for the device in question:
$ sudo lsusb -v -d 0624:0248
Bus 005 Device 002: ID 0624:0248 Avocent Corp. Virtual Hub
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 1.10
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x0624 Avocent Corp.
idProduct 0x0248 Virtual Hub
bcdDevice 0.00
iManufacturer 1 Avocent
iProduct 2 USB Composite Device-0
iSerial 3 20080519
bNumConfigura
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 59
bNumInterfaces 2
bConfigurat
iConfiguration 0
bmAttributes 0xc0
Self Powered
MaxPower 2mA
Interface Descriptor:
bLength 9
bDescript
bInterfac
bAlternat
bNumEndpoints 1
bInterfac
bInterfac
bInterfac
iInterface 4 Keyboard
HID Device Descriptor:
bLength 9
bcdHID 1.00
Report Descriptors:
** UNAVAILABLE **
Endpoint Descriptor:
bLength 7
Transfer Type Interrupt
Synch Type None
Usage Type Data
bInterval 8
Interface Descriptor:
bLength 9
bDescript
bInterfac
bAlternat
bNumEndpoints 1
bInterfac
bInterfac
bInterfac
iInterface 5 Mouse
HID Device Descriptor:
bLength 9
bcdHID 1.00
Report Descriptors:
** UNAVAILABLE **
Endpoint Descriptor:
bLength 7
Transfer Type Interrupt
Synch Type None
Usage Type Data
bInterval 8
Device Status: 0x0000
(Bus Powered)
And now for my purely speculative analysis of what might be going on after digging through the code...
There are two devices interacting here, the usbhid that is the keyboard for the DRAC and the hub that it's connected to.
As a part of the usbhid initialization (usbhid_probe) it creates a tasklet for hid_reset (I have no idea why, but perhaps it needs to reset the underlying hid/usb devices to start working) which unloads and reloads all the parts of the hidusb driver (usb, hid, input, evdev). As a part of the unload for the usbhid it calls hid_cancel_
I haven't been able to confirm this, but I think that if the hub closes it can close the devices associated with it, which gives us a fun race condition where if we start the probe for the usbhid device and then the underlying hub gets removed it will have no instances open, which causes usbhid to call hid_cancel_
I'm currently working around the issue by adding a udev rule to prevent the device from being used by setting authorized=0 for it. This obviously prevents the DRAC from actually working, though.
tags: | added: patch |
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1557172
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.