Occassional severe battery drain

Bug #1183065 reported by Sam Bull
38
This bug affects 4 people
Affects Status Importance Assigned to Milestone
phone-app
Invalid
Undecided
Unassigned
touch-preview-images
Fix Released
High
Mathieu Trudel-Lapierre
qmenumodel (Ubuntu)
Fix Released
Undecided
Unassigned
unity8 (Ubuntu)
Fix Released
Undecided
Nick Dedekind

Bug Description

Occasionally the battery is drained really quickly, something like 10% an hour when doing nothing, when it normally uses about 2-3% an hour.

From what I can tell, this seems to be something to do with wifi. It seems to happen after losing a wifi connection (without disconnecting). This might happen from walking out of the range of the wifi connection, or from the connection being switched off (such as a hotspot being switched off) while the phone is connected.

I have not noticed any indication of when this problem occurs, the only thing I can tell is when a couple of hours later my battery is dead or dying.

Using Nexus 4.

Steps to reproduce (hopefully):
 - Set up an Android phone as a hotspot.
 - Connect your Ubuntu phone using Wifi to the Android hotspot.
 - Disable the hotspot on the Android device (without disconnecting on Ubuntu). (Might also need to make sure it's not going to reconnect to another access point in the area)
 - Compare battery level in an hour or two.

Tags: dogfood

Related branches

Sam Bull (dreamsorcerer)
description: updated
Sam Bull (dreamsorcerer)
summary: - Severe battery drain
+ Occassional severe battery drain
Revision history for this message
Ricky Chan (djghosie) wrote :

I have similar synptom, I hope this logcat helps.

http://pastebin.com/5KYZ4Tmt

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

Please add the contents of /var/log/syslog at the time this happens

Changed in touch-preview-images:
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
importance: Undecided → High
Revision history for this message
Ricky Chan (djghosie) wrote :

  I've trialed some replication method for the heating up / battery
  drain issue. Having checked in terminal, the two services are taking
  up most of the CPU usage:

  1. Maliit-server (~35%)
  2. qml-phone-shell (~35%)

  In order to reproduce the battery drain, I tried the followings:

  1. Let the phone to connect to a registered wifi connection
  2. Move to an area where there's no wifi network which is registered by the phone.
  3. Then slowly but surely the phone will heat up.

  Seems like only way to stop the qml-phone-shell from consuming the CPU is to have the phone plugged to a computer.

  It seems like when the current connection is lost, the phone will try frantically to find another connection.

  It's  happening to many previous builds. I'm currently on build #154
  using Mako.

tags: added: dogfood
Michał Sawicz (saviq)
Changed in unity:
status: New → Invalid
Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

Please add the contents of /var/log/syslog.

I'm not sure how to explain it yet -- NetworkManager itself shouldn't be at cause, since otherwise we'd run into the same battery issues on desktop images. This needs further investigation.

Revision history for this message
Rick Spencer (rick-rickspencer3) wrote :

Here's what I did:

1. I rebooted the phone at my house and let it connect to my wireless network
2. I went outside walking until the wireless disconnected
3. I turned on cellular data via the terminal app
4. I opened the facebook app
5. I walked a little
6. I turned off cellular data via the terminal app
7. I ran top and observed the high cpu utilization
8. I returned home and let the NM reconnect to my wireless network

I attach the resultant syslog

Revision history for this message
Alexander Sack (asac) wrote :

from quick look at log it seems there are a bunch of things looping.

Think its worth if power folks as well as rild/wifi and NM folks take a look and figure whats going on here (CCed a few)...

to me it looks a bit like we loop in suspend resume, but...

Revision history for this message
Seth Forshee (sforshee) wrote :

A couple of comments from the log:

* powerd tries to suspend the device, but often suspend is being blocked by smd_sns_dsps. I suspect this has something to do with sensors. Note that on the nexus 4 the driver for the sensors is completely in userspace, so it isn't tied into the kernel suspend/resume mechanism. We may need to get the sensor code listening for powerd state change notifications to quiesce the hardware before we suspend.

* There are a lot of wakeups from suspend. The logs don't always tell us the source of the wakeup, but there are a lot of "wakeup wake lock: smd_sns_dsps" messages which tell that this lock is taken very shortly after resuming. Also note the "suspend: too many immediate wakeups, back off" messages, which indicate that the device is frequently going into suspends for very short durations of time.

Revision history for this message
Seth Forshee (sforshee) wrote :

I see similar issues with smd_sns_dsps in the logs on my nexus 4 but am not seeing the exceptionally high battery drain described by this bug. So I don't know that it's behind the specific problem described here, but I do note that the nexus 4 does not last nearly as long on a charge as does the galaxy nexus when left sitting unused on my desk.

Revision history for this message
Alexander Sack (asac) wrote :

one question on the wake ups: could we improve logging of wake ups so we know better what is causing what?

Revision history for this message
Seth Forshee (sforshee) wrote : Re: [Bug 1183065] Re: Occassional severe battery drain

On Mon, Jun 10, 2013 at 01:35:21PM -0000, Alexander Sack wrote:
> one question on the wake ups: could we improve logging of wake ups so we
> know better what is causing what?

Possibly. That would be a kernel-level effort that probably depends on
very low-level details of the SoC architecture and possibly even the
board design, and thus will have to be repeated for each device we
support. We may also need access to the developer manuals for the SoCs,
and I don't know which are publicly available, if any.

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

I've been doing more investigation specifically on wifi: I can't confirm that this is what causes the battery drain, but it does look like wpasupplicant gets scan requests every few seconds.

I'm looking to see what causes this, if there is anything that needs to be done to increase the delay of these scans, to see if it improves the situation. It certainly at least causes fairly frequent wakeups for the wcnss driver, so it could be a factor in battery usage.

Revision history for this message
Tony Espy (awe) wrote :

I dealt with a similar issue with a MID ( Mobile Internet Device ) while working on Moblin v1.

At the time, what I discovered was that scanning wasn't consuming that much power, the fact that the radio was enabled at all was the culprit. So even if you're doing passive scanning, the fact that the radio is on at all is the problem.

When associated, the device can use Wi-Fi PS mode, which by default makes the radio periodically cycle into a low power mode. In this MID project, I had actually proto-typed a NM patch that made NM disable the radio between scan periods ( ~2min ).

I haven't looked at the core NM scanning logic in quite some time, so this may already be happening, but I thought it was still worth mentioning.

Has anyone benchmarked this on a Nexus 4 running vanilla JB or Cyanogenmod? I know my personal Nexus Galaxy burns through battery like there's no tomorrow when Wi-Fi is enabled. I haven't tried to compare with & without association, but I bet I'd see the same results.

Also, has anyone looked at the Wi-Fi driver to see whether or not it utilizes kernel wake_locks?

Revision history for this message
Seth Forshee (sforshee) wrote :

On Mon, Jun 10, 2013 at 10:00:43PM -0000, Tony Espy wrote:
> Also, has anyone looked at the Wi-Fi driver to see whether or not it
> utilizes kernel wake_locks?

There's definitely wake lock use in the driver. However the logs don't
show that these locks are ever preventing suspend, so they don't seem to
be causing problems in this scenario.

Revision history for this message
Tony Espy (awe) wrote :

So much for my theory in comment #1. Apparently the Wi-Fi power management on Android is much better than I expected.

Last night I activated Wi-Fi on my Nexus Galaxy (maguro) running JB last night, and with all other apps killed and on 2G, I barely saw a dent in the battery capacity. At the end of ~24 hours, it'd lost only ~30%.

Also, I missed the part in the initial description of disabling the AP ( or moving outside the AP's effective range ), so I did re-associated this morning, and then disabled from the AP. After two hours, this doesn't really seem to be having any more significant effect on battery usage either.

I've rebuilt a version of CM10.1 and will do some testing with wpa_supplicant's debug logging enabled ( and verbose ).

We've discussed that Android may be using device-specific driver libraries which may in turn use private ioctls. Hopefully I can verify use of these libraries via wpa_supplicant debug logging.

Revision history for this message
Michał Sawicz (saviq) wrote :

This doesn't feel like a unity8 bug, but please correct me if I'm wrong.

Revision history for this message
Colin Ian King (colin-king) wrote :

It may be helpful to gather some data to see what's going on. We may be able to get enough insight using powerstat, install it on the device from my PPA with:

sudo add-apt-repository ppa:colin-king/white
sudo apt-get update && sudo apt-get install powerstat

and run it with:

sudo powerstat -d 0 -s > powerstat.log &

let it run for the duration of your testing and kill it with SIGINT,

sudo kill -SIGINT `pidof powerstat`

and attach powerstat.log to the bug report for analysis.

Revision history for this message
Tony Espy (awe) wrote :

Changed status to Confirmed, as none other than our fearless leader @rick-rickspencer3 confirmed this early on...

Changed in touch-preview-images:
status: New → Confirmed
Revision history for this message
Michael Zanetti (mzanetti) wrote :

This is 100% reproducable for me on the Nexus 4 (strangely not so much on the Galaxy Nexus). At latest after 3 hours of uptime unity8 goes crazy with around 70% CPU usage.

Here's the strace output of unity8 while this happens.

read(3, "\2\0\0\0\0\0\0\0", 16) = 8
futex(0x2c481c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2c4818, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x2c4800, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x2c4644, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x2c481c, FUTEX_WAIT_PRIVATE, 1469943, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x2c4800, FUTEX_WAKE_PRIVATE, 1) = 0
clock_gettime(CLOCK_MONOTONIC, {15457, 541909444}) = 0
clock_gettime(CLOCK_MONOTONIC, {15457, 542214648}) = 0
poll([{fd=3, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}, {fd=34, events=POLLIN}, {fd=35, events=POLLIN}, {fd=45, events=POLLIN}, {fd=43, events=POLLIN}], 7, -1) = 1 ([{fd=3, revents=POLLIN}])
read(3, "\2\0\0\0\0\0\0\0", 16) = 8
futex(0x2c481c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x2c4818, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x2c4800, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x2c4644, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x2c4800, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x2c4818, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x2c481c, FUTEX_WAIT_PRIVATE, 1469947, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x2c4800, FUTEX_WAKE_PRIVATE, 1) = 0
clock_gettime(CLOCK_MONOTONIC, {15457, 564097757}) = 0
clock_gettime(CLOCK_MONOTONIC, {15457, 564494521}) = 0
poll([{fd=3, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}, {fd=34, events=POLLIN}, {fd=35, events=POLLIN}, {fd=45, events=POLLIN}, {fd=43, events=POLLIN}], 7, -1) = 1 ([{fd=3, revents=POLLIN}])

This block, here listed twice, is repeated over and over in full speed. 2 minutes of stracing created 16M of output repeating this block.

Looks like unity tries to read form a file descriptor (3) which is not available any more.
lrwx------ 1 phablet phablet 64 Jul 28 10:27 /proc/690/fd/3 -> anon_inode:[eventfd]

Attaching gdb and interrupting it does not give any thing useful.

Any other hints how to debug this further?

Revision history for this message
Michael Zanetti (mzanetti) wrote :

Yep, this happens when chewie-network-menu-server crashes. You see the network indicators contents flicker and regenerate. After that chewie-network-menu-server's PID is increased, which means it was respawned due to a crash and starting from there on, unity8 spins the CPU.

It's actually not that hard to reproduce, just delete all you network configs and freshly connect to a network. Chances are pretty high it directly crashes, or, if not, disconnect and reconnect, select some other network while it tries to connect to one etc. It will crash eventually.

So I think two issues here:
a) chewie-network-menu-server shouldn't crash
b) unity8 shouldn't go wild if the backend service crashes

Stephen M. Webb (bregma)
no longer affects: unity/8.0
no longer affects: unity
Michał Sawicz (saviq)
Changed in unity8:
status: New → In Progress
assignee: nobody → Nick Dedekind (nick-dedekind)
Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:unity8 at revision 173, scheduled for release in unity8, milestone Unknown

Changed in unity8:
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package unity8 - 7.81.3+13.10.20130808-0ubuntu1

---------------
unity8 (7.81.3+13.10.20130808-0ubuntu1) saucy; urgency=low

  [ Michał Sawicz ]
  * Re-enable battery slider test.

  [ Nick Dedekind ]
  * Workaround for non-deletion of indicator page menu items. (LP:
    #1183065, #1206991)

  [ Albert Astals ]
  * LVPWH: Fix regression handling the sticky top section culling r166
    introduced the regression, this fixes it+tests.
  * Implement+test the maximizeVisibleArea function Tries to show as
    much possible of an index that is already shown on screen Will be
    used for the dash category expansion.

  [ Ubuntu daily release ]
  * Automatic snapshot from revision 174
 -- Ubuntu daily release <email address hidden> Thu, 08 Aug 2013 15:23:17 +0000

Changed in unity8 (Ubuntu):
status: New → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qmenumodel - 0.2.7+13.10.20130827.3-0ubuntu1

---------------
qmenumodel (0.2.7+13.10.20130827.3-0ubuntu1) saucy; urgency=low

  [ Nick Dedekind ]
  * UnityMenuModel::changeState now uses current state parameter type as
    a base for parameter GVariant conversion.
  * Adds qt event spawning to direct glib main loop callbacks. (LP:
    #1183065, #1206991)

  [ Ubuntu daily release ]
  * Automatic snapshot from revision 78
 -- Ubuntu daily release <email address hidden> Tue, 27 Aug 2013 14:08:51 +0000

Changed in qmenumodel (Ubuntu):
status: New → Fix Released
Michał Sawicz (saviq)
Changed in unity8:
status: Fix Committed → Fix Released
Sam Bull (dreamsorcerer)
Changed in touch-preview-images:
status: Confirmed → Fix Released
Sam Bull (dreamsorcerer)
Changed in phone-app:
status: New → Invalid
Michał Sawicz (saviq)
Changed in unity8 (Ubuntu):
assignee: nobody → Nick Dedekind (nick-dedekind)
no longer affects: unity8
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.