OSK becomes unusable as taps are delayed by 1-2 seconds and vibration doesn't occur any more (although gestures continue to work)

Bug #1620553 reported by Andrea Bernabei
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Canonical System Image
Fix Released
Critical
Pat McGowan
platform-api (Ubuntu)
In Progress
High
Thomas Voß
usensord (Ubuntu)
Fix Committed
High
Zhang Enwei

Bug Description

Krillin, rc-proposed/bq_aquaris.en r422

TEST CASE:
1. Create to alarms that'll trigger at the same time
2. Wait until they go off
3. Dismiss the alarms
4. Verify that there is still haptic feedback and OSK is still responsive

ACTUAL RESULT
Step 4 fails.

UPDATE: see comment #5, it turns out it's the haptics plugin doing SYNC dbus calls and blocking the UI thread when the dbus service does not reply or the replies come with a big delay

Description:
All at once taps have become incredibly delayed, by 1 or 2 secs.
Gestures still work ok, no delay there.
Swiping the greeter -> no delay. Tapping numbers to input code on the greeter --> 2 secs delay. Even though both are parts of unity8. So it doesn't seem to only be a problem of unity8 clients, but also unity8 itself.
I have no idea why horizontal/vertical swipes would be unaffected, though.

The virtual keyboard is also completely unusable because of the huge delay that each tap has.

Also noticed that the vibration is gone, taps don't trigger vibration anymore.

Webview also seemed to be unaffected by the delays (although I'm not entirely sure, the bug is now gone)

I also restarted unity8 and unity8-dash with Mir input logging enabled. That showed that the touch events were being delivered as expected, no delay.
restart unity8 MIR_CLIENT_INPUT_RECEIVER_REPORT=log and
restart unity8-dash MIR_CLIENT_INPUT_RECEIVER_REPORT=log

Restarting lightdm (that forces the restart of unity-system-compositor) fixed the issue.

Additional info: I had Mir touchspots visualization enabled, which are known to cause more stuttering, but I think they're unlikely to be the cause of this bug, I had them enabled for 2 weeks and haven't noticed any problem like this before.

This is the log from a tap on an icon in the shell:
[2016-09-06 10:52:51.876400] <DEBUG> input-receiver: Received event:touch_event(when=54593678129000 (6.030520ms ago), from=3, touch = {{id=0, action=down, tool=finger, x=426.211, y=292.695, pressure=0.85098, major=19.963, minor=0, size=19.963}, modifiers=1)
[2016-09-06 10:52:51.953821] <DEBUG> input-receiver: Received event:touch_event(when=54593751953000 (9.709366ms ago), from=3, touch = {{id=0, action=change, tool=finger, x=426.211, y=292.695, pressure=0.843137, major=19.963, minor=0, size=19.963}, modifiers=1)
[2016-09-06 10:52:51.954343] <DEBUG> input-receiver: Received event:touch_event(when=54593761256000 (0.958751ms ago), from=3, touch = {{id=0, action=up, tool=finger, x=426.211, y=292.695, pressure=0.843137, major=19.963, minor=0, size=19.963}, modifiers=1)

SEE VIDEO ATTACHMENT BELOW

Related branches

Revision history for this message
Andrea Bernabei (faenil) wrote :

I think I know that it could be!
I think it's the vibration handler!

After inputting the passcode in the greeter, unity8 prints the following lines *after* the 2 secs delay:
org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

And as noted in the bug description, one of the sympthoms was that vibration stopped working.

I think what happened is that the vibrator handler crashed, and on every tap that requires vibration the system is waiting some timeout for the vibrator to reply to the DBus call before timing out.

Those calls are probably synchronous, hence blocking the UI until a reply is received (do we really have to block the UI to vibrate?). But since the other end is probably dead, the phone stalls until the NoReply answer is received, and only then the UI thread gets back in control again.

Revision history for this message
Andrea Bernabei (faenil) wrote :
description: updated
Michał Sawicz (saviq)
affects: unity8 (Ubuntu) → ubuntu-ui-toolkit (Ubuntu)
Revision history for this message
Andrea Bernabei (faenil) wrote :
description: updated
no longer affects: ubuntu-ui-toolkit (Ubuntu)
Revision history for this message
Cris Dywan (kalikiana) wrote :

The new plugin indeed seems to be synchronous - in the UI Toolkit we merely use HapticsEffect.play(), so nothing to do on that side.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in platform-api (Ubuntu):
status: New → Confirmed
Cris Dywan (kalikiana)
Changed in platform-api (Ubuntu):
importance: Undecided → Critical
Andrea Bernabei (faenil)
summary: - [Krillin] Taps are delayed by 1-2seconds, although gestures are not.
- Keyboard is thus unusable
+ [Krillin] Haptics engine uses sync dbus calls. Was: "Taps are delayed by
+ 1-2seconds, although gestures are not. Keyboard is thus unusable"
summary: - [Krillin] Haptics engine uses sync dbus calls. Was: "Taps are delayed by
+ Haptics engine uses sync dbus calls. Was: "Taps are delayed by
1-2seconds, although gestures are not. Keyboard is thus unusable"
Changed in platform-api (Ubuntu):
importance: Critical → High
assignee: nobody → Thomas Voß (thomas-voss)
Revision history for this message
Pat McGowan (pat-mcgowan) wrote : Re: Haptics engine uses sync dbus calls. Was: "Taps are delayed by 1-2seconds, although gestures are not. Keyboard is thus unusable"

This change to usensord now checks the app permissions to access the sensor on every tap.
http://bazaar.launchpad.net/~phablet-team/usensord/trunk/revision/26
This should use something like an lru cache with the dbus sender id

tags: added: regression-proposed
Changed in usensord (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

@andrea I am not experiencing anything close to a 1 sec delay, not really noticing any delay at all
In the description you mention I had Mir touchspots visualization enabled, can you test without that

Changed in canonical-devices-system-image:
status: New → Incomplete
assignee: nobody → Pat McGowan (pat-mcgowan)
Changed in canonical-devices-system-image:
importance: Undecided → Medium
tags: removed: regression-proposed
Revision history for this message
Andrea Bernabei (faenil) wrote :

@Pat this is a bug I accidentally managed to reproduce, not a constant behaviour.

I don't have a way to reproduce it yet. We "just" have a to find a way for the haptics engine dbus call to block, in my case I'm not sure what made that happen, the dbus call was blocking and then returning NoReply.

I tried killing usensord to reproduce the bug, but that didn't help, dbus just errored instantly with a blabla was not provided by any .service files".

tags: added: performance
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Just tidying up. It's usually better to keep the bug title describing what users see. Especially if your diagnosis is wrong... that title then remains correct.

summary: - Haptics engine uses sync dbus calls. Was: "Taps are delayed by
- 1-2seconds, although gestures are not. Keyboard is thus unusable"
+ Haptics engine uses sync dbus calls. OSK becomes unusable as taps are
+ delayed by 1-2 seconds and vibration doesn't occur any more (although
+ gestures continue to work)
Changed in platform-api (Ubuntu):
status: Confirmed → In Progress
Changed in canonical-devices-system-image:
status: Incomplete → In Progress
summary: - Haptics engine uses sync dbus calls. OSK becomes unusable as taps are
- delayed by 1-2 seconds and vibration doesn't occur any more (although
- gestures continue to work)
+ OSK becomes unusable as taps are delayed by 1-2 seconds and vibration
+ doesn't occur any more (although gestures continue to work)
Revision history for this message
Andrea Bernabei (faenil) wrote :

I'd think it'd be useful to also add the actual problem, so we can look it up later when needed. But I don't mind

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

Reproduced a few times, need to root cause it

Changed in canonical-devices-system-image:
importance: Medium → Critical
milestone: none → 13
tags: added: regression-proposed
Revision history for this message
Ken VanDine (ken-vandine) wrote :

My turbo was affected by this yesterday and reboots didn't seem to fix it. I'm guessing usensord was wedged very quickly after startup, I was receiving lots of SMS messages around the time it happened.

Debugging this morning showed the last process that called usensord was telephony-service-indicator, which handles the notifications for SMS/MMS. The log ended with the initial call from telephony-service-indicator, so it appears usensord never replied to that request and was blocked.

After restarting usensord with upstart in my user session, it started responding like normal. I'm trying to reproduce it again to determine if telephony-service-indicator is the trigger.

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

fwiw I bombarded usensord with direct dbus requests simultaneously and could not reproduce the issue

#!/bin/bash
i=$1
while [ $i -gt 0 ]; do
    dbus-send --session --print-reply --type=method_call --dest='com.canonical.usensord' /com/canonical/usensord/haptic com.canonical.usensord.haptic.Vibrate uint32:500
    ((i--))
    sleep $2
done

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

Verified it does not trigger a deadlock on stable using the approach of two simultaneous alarms

Revision history for this message
Jonas G. Drange (jonas-drange) wrote :

One of the threads deadlocks inside the haptic handler (which is 100 lines of go, called every time the user e.g. presses the OSK), on a call to dbus (messageBus.Call()). However, dbus seems to function fine, this looks like an internal failure in usensord.

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

It seems to be deadlocked after calling messagebus.Call
@enwei can you take a look, set two alarms in the clock app for the same minute to reproduce it

Changed in usensord (Ubuntu):
assignee: nobody → Zhang Enwei (zhangew401)
Revision history for this message
Andrea Bernabei (faenil) wrote :

I had a quick look at this (I have ZERO Go knowledge though), the code seems to hang at http://bazaar.launchpad.net/~go-dbus/go-dbus/v1/view/136/dbus.go#L303
i.e. waiting for a reply on replyChan

gdb showed that all threads had an infinite chain of futexsleep() except one, that was waiting on epollwait(), and above epollwait() that had a chain of runtime.netpoll().

Hope that helps

tags: added: lt-blocker
description: updated
Zhang Enwei (zhangew401)
Changed in usensord (Ubuntu):
status: Confirmed → In Progress
Revision history for this message
Zhang Enwei (zhangew401) wrote :

Thanks Pat telling me how to reproduce it.
I have reproduced it. I am trying to figuring out the reason and at the same time I wrote to James for help.

Revision history for this message
Andrea Bernabei (faenil) wrote :

Started from Pat's script and found an easier way to reproduce it (note the final ampersand, which is the trick):

cat ./bombard.sh
#!/bin/bash
i=$1
while [ $i -gt 0 ]; do
    dbus-send --session --print-reply --type=method_call --dest='com.canonical.usensord' /com/canonical/usensord/haptic com.canonical.usensord.haptic.Vibrate uint32:100 &
    ((i--))
done

On my MX4 Arale "./bombard.sh 4" is enough to get usensord stuck

Revision history for this message
Zhang Enwei (zhangew401) wrote :

Thanks Andrea. I am using a similar way with that of yours, only four commands executed simultaneously.

#!/bin/bash
dbus-send --session --print-reply --type=method_call --dest='com.canonical.usensord' /com/canonical/usensord/haptic com.canonical.usensord.haptic.Vibrate uint32:100 &
dbus-send --session --print-reply --type=method_call --dest='com.canonical.usensord' /com/canonical/usensord/haptic com.canonical.usensord.haptic.Vibrate uint32:100 &
dbus-send --session --print-reply --type=method_call --dest='com.canonical.usensord' /com/canonical/usensord/haptic com.canonical.usensord.haptic.Vibrate uint32:100 &
dbus-send --session --print-reply --type=method_call --dest='com.canonical.usensord' /com/canonical/usensord/haptic com.canonical.usensord.haptic.Vibrate uint32:100 &

I think this could simulate the operation in clock/alarm
I found the rule is that two or more senders(thread or process) calls the dbus function at the same time.
It freezes at below code,
http://bazaar.launchpad.net/~phablet-team/usensord/trunk/view/head:/haptic/haptic.go#L146

Revision history for this message
Andrea Bernabei (faenil) wrote :

Here is the stracktrace of all the goroutines running at http://bazaar.launchpad.net/~go-dbus/go-dbus/v1/view/136/dbus.go#L302

i.e. I printed the stacktrace of all the goroutines right before the "reply := <- replyChan"

(disclaimer: I have 0 Go knowledge so take my words here with a pinch of salt, the method I used comes from http://stackoverflow.com/a/19094539 )

Revision history for this message
Andrea Bernabei (faenil) wrote :

Hi Zhang, more precisely, it seems at least one of the routines gets stuck at waiting for a reply on replyChan

i.e. line 303 (see above)

Revision history for this message
Andrea Bernabei (faenil) wrote :

another stacktrace, this time dispatchMessage does not appear in the list of the calls, just fyi.

It could be that it had already completed or had not started yet, not sure. Just thought I'd share this trace as well

Revision history for this message
Andrea Bernabei (faenil) wrote :

Additional info:

I built usensord on my laptop using
"go run -race usensord.go"
and that did not show any race condition after usensord stalled.

I also tried "go run -race test.go" using the code from https://golang.org/doc/articles/race_detector.html#Introduction
so I think the race detector setup is working, it's just there isn't any race condition in usensord or go-dbus

My laptop has
go version go1.6.2 linux/amd64

Revision history for this message
Zhang Enwei (zhangew401) wrote :

Landing:https://requests.ci-train.ubuntu.com/#/ticket/1930
I verified vivid debian package on below images by creating 6 simultaneous alarms that will ring one minute later.
current build number: 827
device name: krillin
channel: ubuntu-touch/rc-proposed/bq-aquaris.en-proposed
last update: 2016-09-09 23:42:32
version version: 827
version ubuntu: 20160907
version device: 20160606-ab415b2
version custom: 20160831-991-38-18

Revision history for this message
Andrea Bernabei (faenil) wrote :

And the race detector correctly showed the data race in test.go, just not in usensord

Changed in usensord (Ubuntu):
status: In Progress → Fix Committed
Changed in canonical-devices-system-image:
status: In Progress → Fix Committed
Changed in canonical-devices-system-image:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.