BQ E4.5 OTA-13: repowerd turn_off() display and never turn_on() again

Bug #1632573 reported by Matthias Apitz on 2016-10-12
34
This bug affects 7 people
Affects Status Importance Assigned to Milestone
Canonical Device Images
Undecided
Unassigned
Canonical System Image
Critical
Unassigned
repowerd
High
Alexandros Frantzis

Bug Description

I have had the device completely black, but functional:
- one could SSH into it
- it was ringing on incoming call
- one could control the audio volume with the buttons
- green LED showed missed call
But all this with black display.

Long press on power-button did not helped. It may be that it
put the Restart/Power-odd/Cancel menu on screen, but if so it
was not visible.

It seems in addition that the /userdata slice had no space left on device
because it was impossible to save 'ps ax > ps.txt' in phablet's HOME.
After re-boot, a df showed 3 GB free.

I have below the interesting lines of the log /var/log/repowerd.log; it
shows that

1. at 2016-10-09T18:04:19 UnityDisplayPowerControl: turn_off()
2. never turned on again, not even on call at 2016-10-09T18:21:24
3. at 2016-10-09T20:41:46 restart of repowerd[1020]: main: Starting repowerd 2016.08.3
   due to device reboot via 'sudo reboot' from SSH session

(log file deleted; will attach it gzip'ed)

Alexandros Frantzis (afrantzis) wrote :

Thanks for reporting.

From the logs the problem seems to be that under some circumstances we can deadlock the dbus processing thread when suspending and waking up.

Changed in repowerd:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Alexandros Frantzis (afrantzis)
Alexandros Frantzis (afrantzis) wrote :

> From the logs the problem seems to be that under some circumstances we can deadlock the
> dbus processing thread when suspending and waking up.

To be more precise, we can deadlock the UnityScreenService DBus processing thread.

Alexandros Frantzis (afrantzis) wrote :

Could you please attach /var/log/syslog? It could provide more hints about the exact cause of this problem.

Matthias Apitz (gubu) wrote :

/var/log/repowerd.log

description: updated
Changed in repowerd:
status: In Progress → Triaged
Matthias Apitz (gubu) wrote :

/var/log/syslog can not be provided due to roll-over (which for sure is a bug too not having gzip'ed copies of older logs);

Alexandros Frantzis (afrantzis) wrote :

> /var/log/syslog can not be provided due to roll-over (which for sure is a bug too not having gzip'ed copies of older logs);

That's unfortunate and indeed a problem. I have filed a bug about the log rotation, along with branch with a solution (see [1]). Of course, the log rotation fix doesn't help us at this point.

Unfortunately I haven't been able to reproduce the repowerd issue locally yet, which makes investigation difficult. If you come across this problem again please attach the new /var/log/repowerd.log and /var/log/syslog.

[1] https://bugs.launchpad.net/ubuntu/+source/lxc-android-config/+bug/1633108

Matthias Apitz (gubu) wrote :

I'm attaching the full file repowerd.log.1.gz, maybe this gives some more information about what happened before the reboot at the above mentioned time;

Matthias Apitz (gubu) wrote :

It seems that the problem is reproduceable. My colleague writes:

Steps before the black screen:

- activated hotspot feature in BQ device while still was connected
  via SSH Wifi into the device; SSH session was killed by this;
- rebooted the netbook (an Acer C720 Chromebook, running FreeBSD)
- surfing Internet from the C720 via BQ hotspot
- deactivated hotspot

Maybe she activated/deactivate hotspot in BQ more than once. The event occured
around 21:41. The /var/log syslog was then spammed until end with always the
same lines:

Oct 13 21:41:33 ubuntu-phablet kernel: [30865.060738]*********wlanP2P_early_suspend************
Oct 13 21:41:33 ubuntu-phablet kernel: [30865.060766]*********p2pEarlySuspend************
Oct 13 21:41:33 ubuntu-phablet kernel: [30865.060784]ip is not avaliable.

She says in addition that /userdata file system was 100% full, after
reboot again normal (some 3 GB free). How is this possible?

I will attach syslog.gz and repowerd.log.gz

Matthias Apitz (gubu) wrote :
Matthias Apitz (gubu) wrote :

additional information after chat with OP about the file size of syslog: it seems that the turning of the syslog file at size of 20 MByte does not work in this situation; she transfered the attached file by SCP to her netbook and when this ended, the syslog file in the BQ was already grown to 500 MByte; this explains why "no space left" in has in addition the risk that such a device is not booting anymore!!!

Matthias Apitz (gubu) wrote :

OP just said, that she was able to reproduce it again right now;

Thanks Matthias, the syslog helps a lot.

A first assessment of the situation:

1. Phone tries to suspend while in hotspot mode ("wlanP2P")
2. Suspend goes into an infinite loop in the kernel, which has two effects:
  a. blocks repowerd, since repowerd waits for suspend to succeed
  b. spams syslog with messages and fills the partition. Logrotate kicks in once per hour,
     so chances are it's not able to rotate before the partition is already full. It seems
     the logs are also rotated during boot, that's why it gets cleared.

So, at it's core this seems to be a kernel issue, and repowerd blocking is just a side effect.

I will add the "devices" project to this bug, so the kernel (or other relevant) team can get involved.

Matthias Apitz (gubu) wrote :

I'm a bit surprised because I have my BQ E4.5, as well with OTA-13, 18/24 hours as hotspot to Internet and it *never* suspends; it shoots down the display but is always alive; maybe the diff is that I run on the netbook which uses the BQ as AP every 60 secs fetchmail to pull down the mails from my provider and this disables the suspend; I will switch this off and check with tcpdump if all IP is silent to the AP;

> I'm a bit surprised because I have my BQ E4.5, as well with OTA-13, 18/24 hours as hotspot to Internet and it *never* suspends;

This is correct. The 'connectivity-service' acquires a suspend blocker when in hotspot mode.

Looking at the relevant part of the log (a bit before the spam starts) in more detail, what seems to be happening is:

1. Hotspot mode is activated (@21:29:59)
2. Suspend blocker acquired "dbus_requestSysState(:1.80,connectivity-service,1) => 1248" (@21:29:30)

3. Hotspot mode is deactivated (@21:39:30)
4. Suspend blocker released "dbus_clearSysState(:1.80,1248)" (@21:39:31)

5. Normal wifi connection is activated (@21:39:32)

6. Suspend triggered (@21:41:33)
7. For some reason early suspend is called for the wlanP2P module and fails (Does the kernel think we are still in hotspot mode?)
8. goto 7

There are also a few:

wpasupplicant main process (NNNN) killed by SEGV signal

messages, which may or may not be related to the issue at hand.

Matthias Apitz (gubu) wrote :

one observation about this: it seems that the OP switched-off hotspot-mode which let the Wifi active and this in turn is re-connecting to the AP SSID Wehlan;

when I end hotspot-mode in my BQ, I always use "switch-off Wifi" (which switches-off the hotspot too) because I do not want that the BQ connects to an unknown or unwanted AP;

Matthias, I have been able to reproduce this with the following steps (wait a couple of seconds between the steps):

1. Enable hotspot mode
2. Disabled hotspot mode
3. Press/release power button to turn off screen
4. Press/release power button to turn on screen

You may need to repeat the sequence a few times before this fails. The failure actually occurs in step (3) (e.g. that's the point where syslog starts to get spammed), but it's at point (4) that the user notices because they can't turn on the screen, due to repowerd being blocked waiting in vain for suspend to happen.

I have found it makes no difference if usb is plugged in and there is an active ssh connection (e.g. with the phablet-shell tool).

El día Friday, October 14, 2016 a las 01:23:11PM -0000, Alexandros Frantzis escribió:

> Matthias, I have been able to reproduce this with the following steps
> (wait a couple of seconds between the steps):
>
> 1. Enable hotspot mode
> 2. Disabled hotspot mode
> 3. Press/release power button to turn off screen
> 4. Press/release power button to turn on screen
>
> You may need to repeat the sequence a few times before this fails. The
> failure actually occurs in step (3) (e.g. that's the point where syslog
> starts to get spammed), but it's at point (4) that the user notices
> because they can't turn on the screen, due to repowerd being blocked
> waiting in vain for suspend to happen.
>
> I have found it makes no difference if usb is plugged in and there is an
> active ssh connection (e.g. with the phablet-shell tool).

Shouldn't we send a big warning to the list?

 matthias

--
Matthias Apitz, ✉ <email address hidden>, ⌂ http://www.unixarea.de/ ☎ +49-176-38902045
Einheitsfeier? Nein, danke! Kann ich bitte mein Land wiederhaben und am 7. Oktober feiern.

Pat McGowan (pat-mcgowan) wrote :

Is this reproduced on ota13 or proposed?

Changed in canonical-devices-system-image:
importance: Undecided → Critical
milestone: none → 14
status: New → Confirmed
tags: added: regression-release
Changed in canonical-devices-system-image:
assignee: nobody → John McAleely (john.mcaleely)
Pat McGowan (pat-mcgowan) wrote :

sorry I see the title says 13

Changed in canonical-devices-system-image:
milestone: 14 → backlog

On Friday, 4 November 2016 18:18:59 CET, Pat McGowan
<email address hidden> wrote:
> ** Changed in: canonical-devices-system-image
> Milestone: 14 => backlog
>

Why this important bug is moved to backlog?

--
Sent from my Ubuntu phone
http://www.unixarea.de/

Changed in canonical-devices-system-image:
assignee: John McAleely (john.mcaleely) → nobody
Robie Basak (racb) wrote :

This just happened to me on OTA-15. rsyslogd was using 100% CPU. dmesg is full of:

[286425.216330] *********wlanP2P_early_suspend************
[286425.216339] *********p2pEarlySuspend************
[286425.216349] ip is not avaliable.
[286425.216355] *********wlanP2P_early_suspend************
[286425.216366] *********p2pEarlySuspend************
[286425.216376] ip is not avaliable.
[286425.216385] *********wlanP2P_early_suspend************
[286425.216394] *********p2pEarlySuspend************
[286425.216401] ip is not avaliable.

/var/log/syslog contains this and a bunch of truncated lines like this:

Apr 13 13:25:37 ubuntu-phablet kernel: [2864
Apr 13 13:25:37 ubuntu-phablet kernel: [28647
Apr 13 13:25:37 ubuntu-phablet kernel: [28647
Apr 13 13:25:37 ubuntu-phablet kernel: [2864
Apr 13 13:25:37 ubuntu-phablet kernel: [28647
Apr 13 13:25:37 ubuntu-phablet kernel: [2864
Apr 13 13:25:37 ubuntu-phablet kernel: [286474
Apr 13 13:25:37 ubuntu-phablet kernel: [28647
Apr 13 13:25:37 ubuntu-phablet kernel: [2864
Apr 13 13:25:37 ubuntu-phablet kernel: [2864
Apr 13 13:25:37 ubuntu-phablet kernel: [28647
Apr 13 13:25:37 ubuntu-phablet kernel: [28647

I had used the hotspot earlier today. I thought I'd turned it off before putting my phone away, but I may have forgotten.

I heard a notification sound. The screen was blank (but on I think - faint background visible) and the power button unresponsive.

I have ssh enabled and managed to get in that way to find the logs as above. I rebooted ("sudo reboot") and saw the notification: "Low on disk space" reporting 0%.

"sudo du -hs /var/log" says 1.5G and /var/log/syslog is 1.5G so I assume the log flooding caused the secondary problem.

I did:

sudo -s
service rsyslog stop
>/var/log/syslog
service rsyslog start

to fix the problem (after the reboot).

A catch here is that with 100% CPU burn users may not notice the problem before the battery dies.

Robie Basak (racb) wrote :

This is on a bq Aquaris 4.5.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers