OSK becomes unusable as taps are delayed by 1-2 seconds and vibration doesn't occur any more (although gestures continue to work)

Bug #1620553 reported by Andrea Bernabei on 2016-09-06
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Canonical System Image
Critical
Pat McGowan
platform-api (Ubuntu)
High
Thomas Voß
usensord (Ubuntu)
High
Zhang Enwei

Bug Description

Krillin, rc-proposed/bq_aquaris.en r422

TEST CASE:
1. Create to alarms that'll trigger at the same time
2. Wait until they go off
3. Dismiss the alarms
4. Verify that there is still haptic feedback and OSK is still responsive

ACTUAL RESULT
Step 4 fails.

UPDATE: see comment #5, it turns out it's the haptics plugin doing SYNC dbus calls and blocking the UI thread when the dbus service does not reply or the replies come with a big delay

Description:
All at once taps have become incredibly delayed, by 1 or 2 secs.
Gestures still work ok, no delay there.
Swiping the greeter -> no delay. Tapping numbers to input code on the greeter --> 2 secs delay. Even though both are parts of unity8. So it doesn't seem to only be a problem of unity8 clients, but also unity8 itself.
I have no idea why horizontal/vertical swipes would be unaffected, though.

The virtual keyboard is also completely unusable because of the huge delay that each tap has.

Also noticed that the vibration is gone, taps don't trigger vibration anymore.

Webview also seemed to be unaffected by the delays (although I'm not entirely sure, the bug is now gone)

I also restarted unity8 and unity8-dash with Mir input logging enabled. That showed that the touch events were being delivered as expected, no delay.
restart unity8 MIR_CLIENT_INPUT_RECEIVER_REPORT=log and
restart unity8-dash MIR_CLIENT_INPUT_RECEIVER_REPORT=log

Restarting lightdm (that forces the restart of unity-system-compositor) fixed the issue.

Additional info: I had Mir touchspots visualization enabled, which are known to cause more stuttering, but I think they're unlikely to be the cause of this bug, I had them enabled for 2 weeks and haven't noticed any problem like this before.

This is the log from a tap on an icon in the shell:
[2016-09-06 10:52:51.876400] <DEBUG> input-receiver: Received event:touch_event(when=54593678129000 (6.030520ms ago), from=3, touch = {{id=0, action=down, tool=finger, x=426.211, y=292.695, pressure=0.85098, major=19.963, minor=0, size=19.963}, modifiers=1)
[2016-09-06 10:52:51.953821] <DEBUG> input-receiver: Received event:touch_event(when=54593751953000 (9.709366ms ago), from=3, touch = {{id=0, action=change, tool=finger, x=426.211, y=292.695, pressure=0.843137, major=19.963, minor=0, size=19.963}, modifiers=1)
[2016-09-06 10:52:51.954343] <DEBUG> input-receiver: Received event:touch_event(when=54593761256000 (0.958751ms ago), from=3, touch = {{id=0, action=up, tool=finger, x=426.211, y=292.695, pressure=0.843137, major=19.963, minor=0, size=19.963}, modifiers=1)

SEE VIDEO ATTACHMENT BELOW

Related branches

Andrea Bernabei (faenil) wrote :

I think I know that it could be!
I think it's the vibration handler!

After inputting the passcode in the greeter, unity8 prints the following lines *after* the 2 secs delay:
org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

And as noted in the bug description, one of the sympthoms was that vibration stopped working.

I think what happened is that the vibrator handler crashed, and on every tap that requires vibration the system is waiting some timeout for the vibrator to reply to the DBus call before timing out.

Those calls are probably synchronous, hence blocking the UI until a reply is received (do we really have to block the UI to vibrate?). But since the other end is probably dead, the phone stalls until the NoReply answer is received, and only then the UI thread gets back in control again.

Andrea Bernabei (faenil) wrote :
description: updated
Michał Sawicz (saviq) on 2016-09-06
affects: unity8 (Ubuntu) → ubuntu-ui-toolkit (Ubuntu)
Andrea Bernabei (faenil) wrote :
description: updated
no longer affects: ubuntu-ui-toolkit (Ubuntu)
Christian Dywan (kalikiana) wrote :

The new plugin indeed seems to be synchronous - in the UI Toolkit we merely use HapticsEffect.play(), so nothing to do on that side.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in platform-api (Ubuntu):
status: New → Confirmed
Changed in platform-api (Ubuntu):
importance: Undecided → Critical
Andrea Bernabei (faenil) on 2016-09-06
summary: - [Krillin] Taps are delayed by 1-2seconds, although gestures are not.
- Keyboard is thus unusable
+ [Krillin] Haptics engine uses sync dbus calls. Was: "Taps are delayed by
+ 1-2seconds, although gestures are not. Keyboard is thus unusable"
summary: - [Krillin] Haptics engine uses sync dbus calls. Was: "Taps are delayed by
+ Haptics engine uses sync dbus calls. Was: "Taps are delayed by
1-2seconds, although gestures are not. Keyboard is thus unusable"
Changed in platform-api (Ubuntu):
importance: Critical → High
assignee: nobody → Thomas Voß (thomas-voss)

This change to usensord now checks the app permissions to access the sensor on every tap.
http://bazaar.launchpad.net/~phablet-team/usensord/trunk/revision/26
This should use something like an lru cache with the dbus sender id

tags: added: regression-proposed
Changed in usensord (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Pat McGowan (pat-mcgowan) wrote :

@andrea I am not experiencing anything close to a 1 sec delay, not really noticing any delay at all
In the description you mention I had Mir touchspots visualization enabled, can you test without that

Changed in canonical-devices-system-image:
status: New → Incomplete
assignee: nobody → Pat McGowan (pat-mcgowan)
Changed in canonical-devices-system-image:
importance: Undecided → Medium
tags: removed: regression-proposed
Andrea Bernabei (faenil) wrote :

@Pat this is a bug I accidentally managed to reproduce, not a constant behaviour.

I don't have a way to reproduce it yet. We "just" have a to find a way for the haptics engine dbus call to block, in my case I'm not sure what made that happen, the dbus call was blocking and then returning NoReply.

I tried killing usensord to reproduce the bug, but that didn't help, dbus just errored instantly with a blabla was not provided by any .service files".

tags: added: performance
Daniel van Vugt (vanvugt) wrote :

Just tidying up. It's usually better to keep the bug title describing what users see. Especially if your diagnosis is wrong... that title then remains correct.

summary: - Haptics engine uses sync dbus calls. Was: "Taps are delayed by
- 1-2seconds, although gestures are not. Keyboard is thus unusable"
+ Haptics engine uses sync dbus calls. OSK becomes unusable as taps are
+ delayed by 1-2 seconds and vibration doesn't occur any more (although
+ gestures continue to work)
Changed in platform-api (Ubuntu):
status: Confirmed → In Progress
Changed in canonical-devices-system-image:
status: Incomplete → In Progress
summary: - Haptics engine uses sync dbus calls. OSK becomes unusable as taps are
- delayed by 1-2 seconds and vibration doesn't occur any more (although
- gestures continue to work)
+ OSK becomes unusable as taps are delayed by 1-2 seconds and vibration
+ doesn't occur any more (although gestures continue to work)
Andrea Bernabei (faenil) wrote :

I'd think it'd be useful to also add the actual problem, so we can look it up later when needed. But I don't mind

Pat McGowan (pat-mcgowan) wrote :

Reproduced a few times, need to root cause it

Changed in canonical-devices-system-image:
importance: Medium → Critical
milestone: none → 13
tags: added: regression-proposed
Ken VanDine (ken-vandine) wrote :

My turbo was affected by this yesterday and reboots didn't seem to fix it. I'm guessing usensord was wedged very quickly after startup, I was receiving lots of SMS messages around the time it happened.

Debugging this morning showed the last process that called usensord was telephony-service-indicator, which handles the notifications for SMS/MMS. The log ended with the initial call from telephony-service-indicator, so it appears usensord never replied to that request and was blocked.

After restarting usensord with upstart in my user session, it started responding like normal. I'm trying to reproduce it again to determine if telephony-service-indicator is the trigger.

Pat McGowan (pat-mcgowan) wrote :

fwiw I bombarded usensord with direct dbus requests simultaneously and could not reproduce the issue

#!/bin/bash
i=$1
while [ $i -gt 0 ]; do
    dbus-send --session --print-reply --type=method_call --dest='com.canonical.usensord' /com/canonical/usensord/haptic com.canonical.usensord.haptic.Vibrate uint32:500
    ((i--))
    sleep $2
done

Pat McGowan (pat-mcgowan) wrote :

Verified it does not trigger a deadlock on stable using the approach of two simultaneous alarms

Jonas G. Drange (jonas-drange) wrote :

One of the threads deadlocks inside the haptic handler (which is 100 lines of go, called every time the user e.g. presses the OSK), on a call to dbus (messageBus.Call()). However, dbus seems to function fine, this looks like an internal failure in usensord.

Pat McGowan (pat-mcgowan) wrote :

It seems to be deadlocked after calling messagebus.Call
@enwei can you take a look, set two alarms in the clock app for the same minute to reproduce it

Changed in usensord (Ubuntu):
assignee: nobody → Zhang Enwei (zhangew401)
Andrea Bernabei (faenil) wrote :

I had a quick look at this (I have ZERO Go knowledge though), the code seems to hang at http://bazaar.launchpad.net/~go-dbus/go-dbus/v1/view/136/dbus.go#L303
i.e. waiting for a reply on replyChan

gdb showed that all threads had an infinite chain of futexsleep() except one, that was waiting on epollwait(), and above epollwait() that had a chain of runtime.netpoll().

Hope that helps

tags: added: lt-blocker
description: updated
Zhang Enwei (zhangew401) on 2016-09-09
Changed in usensord (Ubuntu):
status: Confirmed → In Progress
Zhang Enwei (zhangew401) wrote :

Thanks Pat telling me how to reproduce it.
I have reproduced it. I am trying to figuring out the reason and at the same time I wrote to James for help.

Andrea Bernabei (faenil) wrote :

Started from Pat's script and found an easier way to reproduce it (note the final ampersand, which is the trick):

cat ./bombard.sh
#!/bin/bash
i=$1
while [ $i -gt 0 ]; do
    dbus-send --session --print-reply --type=method_call --dest='com.canonical.usensord' /com/canonical/usensord/haptic com.canonical.usensord.haptic.Vibrate uint32:100 &
    ((i--))
done

On my MX4 Arale "./bombard.sh 4" is enough to get usensord stuck

Zhang Enwei (zhangew401) wrote :

Thanks Andrea. I am using a similar way with that of yours, only four commands executed simultaneously.

#!/bin/bash
dbus-send --session --print-reply --type=method_call --dest='com.canonical.usensord' /com/canonical/usensord/haptic com.canonical.usensord.haptic.Vibrate uint32:100 &
dbus-send --session --print-reply --type=method_call --dest='com.canonical.usensord' /com/canonical/usensord/haptic com.canonical.usensord.haptic.Vibrate uint32:100 &
dbus-send --session --print-reply --type=method_call --dest='com.canonical.usensord' /com/canonical/usensord/haptic com.canonical.usensord.haptic.Vibrate uint32:100 &
dbus-send --session --print-reply --type=method_call --dest='com.canonical.usensord' /com/canonical/usensord/haptic com.canonical.usensord.haptic.Vibrate uint32:100 &

I think this could simulate the operation in clock/alarm
I found the rule is that two or more senders(thread or process) calls the dbus function at the same time.
It freezes at below code,
http://bazaar.launchpad.net/~phablet-team/usensord/trunk/view/head:/haptic/haptic.go#L146

Andrea Bernabei (faenil) wrote :

Here is the stracktrace of all the goroutines running at http://bazaar.launchpad.net/~go-dbus/go-dbus/v1/view/136/dbus.go#L302

i.e. I printed the stacktrace of all the goroutines right before the "reply := <- replyChan"

(disclaimer: I have 0 Go knowledge so take my words here with a pinch of salt, the method I used comes from http://stackoverflow.com/a/19094539 )

Andrea Bernabei (faenil) wrote :

Hi Zhang, more precisely, it seems at least one of the routines gets stuck at waiting for a reply on replyChan

i.e. line 303 (see above)

Andrea Bernabei (faenil) wrote :

another stacktrace, this time dispatchMessage does not appear in the list of the calls, just fyi.

It could be that it had already completed or had not started yet, not sure. Just thought I'd share this trace as well

Andrea Bernabei (faenil) wrote :

Additional info:

I built usensord on my laptop using
"go run -race usensord.go"
and that did not show any race condition after usensord stalled.

I also tried "go run -race test.go" using the code from https://golang.org/doc/articles/race_detector.html#Introduction
so I think the race detector setup is working, it's just there isn't any race condition in usensord or go-dbus

My laptop has
go version go1.6.2 linux/amd64

Zhang Enwei (zhangew401) wrote :

Landing:https://requests.ci-train.ubuntu.com/#/ticket/1930
I verified vivid debian package on below images by creating 6 simultaneous alarms that will ring one minute later.
current build number: 827
device name: krillin
channel: ubuntu-touch/rc-proposed/bq-aquaris.en-proposed
last update: 2016-09-09 23:42:32
version version: 827
version ubuntu: 20160907
version device: 20160606-ab415b2
version custom: 20160831-991-38-18

Andrea Bernabei (faenil) wrote :

And the race detector correctly showed the data race in test.go, just not in usensord

Changed in usensord (Ubuntu):
status: In Progress → Fix Committed
Changed in canonical-devices-system-image:
status: In Progress → Fix Committed
Changed in canonical-devices-system-image:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers