evdev mutex inconsistent lock state
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
| Linaro Android |
Fix Released
|
High
|
Tixy (Jon Medhurst) | |
| linaro-landing-team-arm |
Fix Released
|
High
|
Tixy (Jon Medhurst) |
Bug Description
I've been fairly regularly seeing "inconsistent lock state" backtraces when running Android on vexpress TC2. See attached log.
This looks like a problem with how the evdev driver uses a mutex.
Tixy (Jon Medhurst) (tixy) wrote : | #1 |
Changed in linaro-landing-team-arm: | |
assignee: | nobody → Tixy (Jon Medhurst) (tixy) |
importance: | Undecided → High |
status: | New → Triaged |
Changed in linaro-landing-team-arm: | |
status: | Triaged → In Progress |
Changed in linaro-android: | |
milestone: | none → 12.09 |
Tixy (Jon Medhurst) (tixy) wrote : | #2 |
Changed in linaro-android: | |
importance: | Undecided → High |
assignee: | nobody → Tixy (Jon Medhurst) (tixy) |
Botao (botao-sun) wrote : | #3 |
For vexpress A9 board, observed on Linaro Jelly Bean image:
Please refer to attachment to get full boot log.
John Stultz (jstultz) wrote : | #4 |
Per Tixy's debugging, I've generated a patch to use irqsave/restore when accessing the events_lock.
This seems to me to be the right fix, but I've sent it to the PM maintainer for comment before wider submission.
If this is right, we will need to make sure this gets backported to AOSPs common.
Changed in linaro-landing-team-arm: | |
milestone: | none → 2012.09 |
John Stultz (jstultz) wrote : | #5 |
Patch sent to lkml and pushed out via linaro-
Once the patch is merged upstream, I'll send it on to AOSP.
Tixy (Jon Medhurst) (tixy) wrote : | #6 |
Patch "wakeup: Use irqsave/irqrestore for events_lock" applied to http://
Changed in linaro-landing-team-arm: | |
status: | In Progress → Fix Committed |
Amit Khare (amit-khare) wrote : | #7 |
Observed with android vexpress JB build
https:/
LOGS:
[ 263.224205] CPU0
[ 263.231518] ----
[ 263.238827] lock(&(
[ 263.253461] <Interrupt>
[ 263.261294] lock(&(
[ 263.276449]
[ 263.276449] *** DEADLOCK ***
[ 263.276449]
[ 263.294192] 2 locks held by InputReader/2014:
[ 263.307237] #0: (&evdev-
[ 263.330005] #1: (&(&client-
[ 263.357450]
[ 263.357450] stack backtrace:
[ 263.370533] [<c0015058>] (unwind_
[ 263.397929] [<c04fce78>] (print_
[ 263.424276] [<c0071c70>] (mark_lock+
[ 263.450361] [<c007468c>] (mark_held_
[ 263.480096] [<c00747fc>] (trace_
[ 263.510875] [<c05052e4>] (_raw_spin_
[ 263.538529] [<c03aaff8>] (evdev_
[ 263.564352] [<c00f4678>] (do_vfs_
[ 263.589133] [<c00f4b64>] (sys_ioctl+
[ 362.131513] init: sys_prop: permission denied uid:1003 name:service.
[ 476.986679] mmcblk0: error -5 transferring data, sector 617064, nr 24, cmd response 0x900, card status 0xb00
[ 477.016346] mmcblk0: retrying using single block transfer
[ 484.922577] mmcblk0: error -5 transferring data, sector 1979938, nr 30, cmd response 0x900, card status 0xc00
[ 484.952583] mmcblk0: retrying using single block transfer
[ 486.646280] mmcblk0: error -5 transferring data, sector 624360, nr 8, cmd response 0x900, card status 0xb00
[ 486.675675] mmcblk0: retrying using single block transfer
John Stultz (jstultz) wrote : | #8 |
Amit: Has build37 included the fix above?
Amit Khare (amit-khare) wrote : | #9 |
John: Looking at the logs it seems the fix is not included with this build. I see similar out put on build 37 as in comment #3 by Botao
John Stultz (jstultz) wrote : | #10 |
Colin Cross just pulled this fix into AOSP for Android's offical 3.4 tree.
Amit Khare (amit-khare) wrote : | #11 |
Observed with android vexpress JB build
https:/
I dont see the DEADLOCK issue now but lot of sdcard related issues are there.
LOGS:
[ 21.851704] mmcblk0: retrying using single block transfer
[ 22.574259] init: cannot find '/system/bin/rild', disabling 'ril-daemon'
[ 22.686519] usb 1-1.2: new full-speed USB device number 47 using isp1760
[ 22.706855] mmcblk0: error -5 transferring data, sector 782368, nr 24, cmd response 0x900, card status 0xb00
[ 22.736382] mmcblk0: retrying using single block transfer
[ 22.786593] usb 1-1.2: device descriptor read/64, error -32
[ 22.986556] usb 1-1.2: device descriptor read/64, error -32
[ 23.003532] mmcblk0: error -5 transferring data, sector 754920, nr 24, cmd response 0x900, card status 0xb00
[ 23.033079] mmcblk0: retrying using single block transfer
[ 23.186641] usb 1-1.2: new full-speed USB device number 48 using isp1760
[ 23.207079] mmcblk0: error -5 transferring data, sector 751520, nr 24, cmd response 0x900, card status 0xb00
[ 23.236649] mmcblk0: retrying using single block transfer
[ 23.326558] usb 1-1.2: device descriptor read/64, error -32
Changed in linaro-android: | |
status: | New → Fix Committed |
Botao (botao-sun) wrote : | #12 |
@Amit, those SD card errors are expected. This bug is used for tracking that "DEADLOCK" issue and now it has been solved.
Changed in linaro-android: | |
status: | Fix Committed → Fix Released |
Changed in linaro-landing-team-arm: | |
status: | Fix Committed → Fix Released |
This bug is caused by the Android patch called " Input: evdev - Add ioctl to block suspend while event queue is not empty."
This is broken because evdev_enable_ suspend_ block() has
spin_lock_ irq(&client- >buffer_ lock); init(&client- >wake_lock, WAKE_LOCK_SUSPEND, client->name); irq(&client- >buffer_ lock);
wake_lock_
[...]
spin_unlock_
which means that wake_lock_init() is called with interrupts disabled. However wake_lock_init() calls wakeup_source_add() which itself uses spinlocks like...
spin_lock_ irq(&events_ lock); irq(&events_ lock);
[...]
spin_unlock_
which assumes that it is called with interrupts enabled. As a consequence, the spin_unlock_irq() above unconditionally enables irqs which means the evdev code which called it is now holding the client->buffer_lock with interrupts enabled, opening up the opportunity for deadlock to occur if an interrupt happens for an event which then calls evdev_pass_event() and then waits on the same spinlock.