Slow/hanging performance

Bug #1480844 reported by David Planella on 2015-08-03
52
This bug affects 10 people
Affects Status Importance Assigned to Milestone
Canonical System Image
Undecided
Unassigned

Bug Description

Reported on the ubuntu-phone mailing list by user Chris Go. While I can confirm this behaviour, I don't have a way to reproduce it. A few things I have noticed:

- Using krillin, stable channel, OTA-5
- It seems to happen when the phone has been on for a while
- A reboot is a workaround to fix it, until next time it happens
- While noticeable slowness in registering the digits you press on the lock screen is a symptom of this issue, sometimes the digits are slow to register, but after a while the phone becomes responsive again.
- Until a reboot, the phone effectively becomes unusable. It feels like a process is taking 100% of the CPU, but I've not managed to launch a terminal to confirm yet. Generally, it happens in a day-to-day situation on the street where you need to use the phone as opposed to being able to debug it.

Original post:
Since about a month I got those situations where my phone becomes very very slow and unresponsive.

When I press power button to unlock I try to type my unlock password, but nothing happens and after a few seconds entire password pops in (however incorrect), I try to type again - I got vibration as a feedback, but no dots representing each number form the pin.

I manage to unlock the phone, but still - phone does not react to any gesture. Even when it finally starts reacting (about 30 seconds) - those reactions are very slow.

I normally try to keep dialling app open all the time (in background) so I don't have to wait for it to load each time I want to use it. But even when I manage to switch to it - again it takes time for the phone to react to my touches.

I cannot figure when does it happen. Usually in the morning after couple hours of inactivity, but there is hardly any rule to it.

David Planella (dpm) on 2015-08-03
description: updated
Davide Alberelli (dadexix86) wrote :

One way that I have to reproduce it (because it happens always to me) is being connected to wi-fi (I am at home), then move to an area with no connectivity (down the stairs, no wi-fi or mobile connection), finally move to an area with only mobile connection.

At this point I believe that something starts using a lot of CPU and the phone hangs but I cannot investigate what it is, because when it goes back to being responsive everything is fine.

In my case I had a photograph as background, and reverting to use the default ubuntu theme seemed to improve things quite a bit.

JkB (joergberroth) wrote :

Exactly what davide wrote for me.

Michael Terry (mterry) wrote :

This has been bugging me. Here are some facts, followed by speculation:

= Facts =

- I have a mako, using latest stable channel image (at time of this writing, 22 -- OTA5 in other words).

- A reboot will fix it for some amount of time.

- I can reproduce this fairly reliably. Especially when walking around my neighborhood (a busy urban area). For example, I watched it start happening this morning when leaving the subway and entering my neighborhood.

- I've attached a screenshot of 'top' when this happened to me once. It shows dbus-daemon at 100% cpu. Other processes occasionally flair up briefly, but settle down shortly. dbus-daemon is constant though.

- I've looked at top other instances of this bug, and it looks the same. So I think it's reliably dbus-daemon that is the 100% cpu culprit.

- It really takes a toll on wifi detection. I walked from my apartment to a coffee shop. And hit this bug on the way. Once in the coffee shop, I wanted to make sure I was using the shop's wifi. I opened the network indicator, and boy was it having a hard time. It was showing me still connected to my home network. And slowly started updating with wifi networks along the way to the shop. In chunks of updates, not smoothly. It *eventually* got to the shop's wifi, but only after at least a minute I recall. A real minute, not a fake 10s, it-felt-like-forever minute.

= Speculation =

- I think this is switching-network related (or maybe just a bunch of wifi networks coming in and out of scan range?). When I left the subway into my neighborhood, I went from 3G->wifi (for a nearby shop I've connected to before). When I walk in my neighborhood, I'm both connecting to temporary wifi (for shops I pass that I've been to before), and just scanning a bunch of networks (for shops that I've not connected to before, but line the street).

- This may be confirmation bias, or just the daemon trying to keep up, but it feels like NetworkManager spikes CPU in 'top' more than other processes like unity8. Nothing comes close to dbus-daemon, but still.

Changed in canonical-devices-system-image:
status: New → Confirmed
Michael Terry (mterry) wrote :

Whoops, here's the same screenshot rotated.

Michael Terry (mterry) wrote :

OK, I decided to try to get some data. I enabled full DBus debugging [1] and started the Terminal app with dbus-monitor --system running.

I walked from my 3rd-floor apartment to the front door, with Terminal open, watching top in a second tab. The attached log (dmon2.log) is what resulted. dbus-daemon spiked around 90% as I got to the bottom. Then settled. So not the constant spiking that we've seen in this bug. But something.

I will also attach a second file (dmon3.log) that is me walking from my front door down the street, to be away from any vestige of my wifi. dbus-daemon spiked again for a solid chunk of time. Eventually it finished, around the same time that the network-indicator finally updated to notice that we weren't on wifi anymore.

Unfortunately, dbus-daemon doesn't do timestamps by default. So I made another walk out and back again with the --profile argument. Which isn't as verbose, but provides timestamps. That's attached as dmon5.log.

I haven't done much analysis of these yet. But looks like a lot of cpu-device-online signals from upstart, wireless properties changing from NetworkManager, and attempts to upload apport crash files. Will dig into it, but wanted to provide raw data first.

[1] https://wiki.ubuntu.com/DebuggingDBus

Michael Terry (mterry) wrote :
Michael Terry (mterry) wrote :
Michael Terry (mterry) wrote :

More testing!

So when dbus-daemon spikes, other spiking programs (at ~30% CPU) I've seen are NetworkManager, posclientd, and initctl (at various times).

But I did note that initctl stayed around longer than the others. That coupled with the frequent ping-ponging of cpu-device-online and cpu-device-offline was making me wonder if maybe we weren't letting the CPU properly sleep because of all the DBus noise we created when noticing that the CPU was trying to sleep (or at least were DoSing the DBus channel with such notices).

I'm not 100% on that theory, since I thought DBus was smart about sending signals only when there was a listener (though my use of dbus-monitor might count there and be giving us non-realistic results).

But anyway. On that theory, I tried commenting out [1] the bit in upstart where we send DBus signals for each upstart signal. I'm not sure anyone uses that functionality. Most code uses upstart directly.

With that, my spikes were much shorter. Only saw dbus-daemon go to 75%, instead of 100%. And initctl never spiked. (NetworkManager and posclientd did still.) The network indicator gave quicker feedback about switching between wifi/3G, presumably because it wasn't blocked on DBus as much.

I'm not entirely convinced that these short-ish (~30s) spikes I've been testing with are the same cause as the longer-term spikes this bug originated with (and I've seen before). Needs more testing. I'll try to use my patched phone around town more.

And even if this patch does fix it, it's a big hammer with unknown side effects. A more elegant solution is desirable.

[1] Upstart patch:

=== modified file 'init/control.c'
--- init/control.c 2014-08-14 11:19:43 +0000
+++ init/control.c 2015-08-24 20:33:12 +0000
@@ -1284,7 +1284,7 @@
 control_notify_event_emitted (Event *event)
 {
        nih_assert (event != NULL);
-
+/*
        control_init ();

        NIH_LIST_FOREACH (control_conns, iter) {
@@ -1294,6 +1294,7 @@
                NIH_ZERO (control_emit_event_emitted (conn, DBUS_PATH_UPSTART,
                                                            event->name, event->
        }
+*/
 }

 /**

Michael Terry (mterry) wrote :

Did some more testing today. I don't think my patch is helping as much as I thought. I still got the slowdown for significant periods of time. But I'm more convinced that it's the interaction between NetworkManager and dbus-daemon than ever.

My phone got really slow leaving my wifi zone. The network indicator hadn't yet updated to 3G though. 'top' indicated dbus-daemon at 100% again. 'dbus-monitor --system' showed driblets of NetworkManager signals coming through (not rapid fire, so it's not like dbus-daemon is simply operating at max capacity, it's actually in some weird state). But once the indicator showed 3G, my phone immediately was back to normal. (Until I started moving around again and hitting more wifi areas.)

So I definitely think NetworkManager is wigging dbus-daemon out. But I'm not yet sure in which package the bug lies.

My testing with my patch applied (yesterday and today) was done on rc-proposed + custom-built-upstart:
current build number: 95
device name: mako
channel: ubuntu-touch/rc-proposed/bq-aquaris.en
last update: 1970-01-01 09:51:34
version version: 95
version ubuntu: 20150821
version device: 20150819
version custom: mako-1.1

Pat McGowan (pat-mcgowan) wrote :

I am marking this a dupe of bug #1480877 as both bugs have valuable data and analysis. Lets collect more data there

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers