[webapp-container] when using multiple webapps they crash randomly, if there is only one app remaining, this one starts being replaced when new ones start

Bug #1303676 reported by Alan Pope 🍺🐧🐱 🦄 on 2014-04-07
This bug affects 2 people
Affects                  Importance  Assigned to
Oxide                    Undecided   Unassigned
unity-mir                Undecided   Unassigned
webbrowser-app (Ubuntu)  Critical    Unassigned

Bug Description

I have a bunch of webapps I use regularly which use the webapp-container. After a short period they nearly all die, leaving one running.

To reproduce. Install BBC News, BBC Sport, Guardian, Untapped, giffgaff, OpenStreetMap (not OSMTouch) (yes, all my apps [not trying to bump my stats] which use webapp-container)

Reboot device
Start each one in turn, wait for it to load then switch back to app lens and start the next.
I started:- BBC News, BBC Sport, The Guardian, OpenStreetMap, Untapped, giffgaff
At this point I have 6 running:- http://popey.com/~alan/phablet/device-2014-04-07-102106.png
One of them died: http://popey.com/~alan/phablet/device-2014-04-07-102114.png
(in this case OpenStreetMap)
I then started G+
Now more died:- http://popey.com/~alan/phablet/device-2014-04-07-102450.png

So I started BBC News, BBC Sport, The Guardian, OpenStreetMap, Untapped, giffgaff, G+ and I'm left with G+, Giffgaff, The Guardian.

Time passes... I'm left with giffgaff and The Guardian now. http://popey.com/~alan/phablet/device-2014-04-07-102930.png

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: webbrowser-app 0.23+14.04.20140403-0ubuntu1
Uname: Linux 3.4.0-5-mako armv7l
ApportVersion: 2.14.1-0ubuntu1
Architecture: armhf
Date: Mon Apr 7 10:04:11 2014
InstallationDate: Installed on 2014-04-07 (0 days ago)
InstallationMedia: Ubuntu Trusty Tahr (development branch) - armhf (20140407)
SourcePackage: webbrowser-app
UpgradeStatus: No upgrade log present (probably fresh install)

Oliver Grawert (ogra) on 2014-04-07
Changed in webbrowser-app (Ubuntu):
status: New → Confirmed
description: updated
Oliver Grawert (ogra) wrote :

for me this scenario ends in only being able to open one webapp which then gets replaced by the new one i open.

summary: - webapps seem to randomly crash after a while
+ [webapp-container] webapps seem to randomly crash after a while
Didier Roche (didrocks) on 2014-04-07
Changed in webbrowser-app (Ubuntu):
importance: Undecided → High
importance: High → Critical

do you have logs or crash reports to share?

Oliver Grawert (ogra) wrote :

i can give you logs for my apps, but there are no crash files at all ...

Oliver Grawert (ogra) wrote :

just had these apps above open ... attached all the logs after i was only left with an empty and grey G+ one

Didier Roche (didrocks) on 2014-04-07
summary: - [webapp-container] webapps seem to randomly crash after a while
+ [webapp-container] webapps seem to randomly be replaced after a while
+ with another webapp
summary: [webapp-container] webapps seem to randomly be replaced after a while
- with another webapp
+ with another webapp or die
Oliver Grawert (ogra) on 2014-04-07
summary: - [webapp-container] webapps seem to randomly be replaced after a while
- with another webapp or die
+ [webapp-container] when using multiple webapps they crash randomly, if
+ there is only one app remaining, this one starts being replaced when new
+ ones start

I just tried again, and when I hit the 4th app they all start failing so I now have one.
However, in the process list I still see them, they just disappear from unity.

phablet 4922 0.0 0.8 232980 15232 ? T 13:13 0:00 webapp-container --enable-back-forward --webappUrlPatterns=https?://*.bbc.co.uk/news*,https?://*.bbc.com/news* http://m.bbc.com/news
phablet 5109 0.0 0.8 232976 15232 ? T 13:14 0:00 webapp-container --enable-back-forward --webappUrlPatterns=https?://*.bbc.co.uk/sport*,https?://*.bbc.com/sport http://m.bbc.com/sport
phablet 5308 0.1 0.8 232972 15240 ? T 13:14 0:00 webapp-container --enable-back-forward --webappUrlPatterns=https?://giffgaff.com/* http://giffgaff.com/?m=1
phablet 5483 0.0 0.8 232988 15244 ? T 13:14 0:00 webapp-container --enable-back-forward --webappUrlPatterns=https?://*.untappd.com/* http://m.untappd.com/
phablet 5692 0.1 0.8 232976 15248 ? T 13:15 0:00 webapp-container --enable-back-forward --webappUrlPatterns=https://plus.google.*/*,https://accounts.google.*/* https://plus.google.com
phablet 5898 2.8 3.5 530856 67328 ? Tsl 13:16 0:04 webapp-container --enable-back-forward --webappUrlPatterns=https?://*.bbc.co.uk/news*,https?://*.bbc.com/news* http://m.bbc.com/news
phablet 5937 0.1 0.8 232976 15228 ? T 13:16 0:00 webapp-container --enable-back-forward --webappUrlPatterns=https?://*.bbc.co.uk/news*,https?://*.bbc.com/news* http://m.bbc.com/news

phablet@ubuntu-phablet:~$ ps aux | grep oxide
phablet 4923 0.0 0.0 1568 288 ? T 13:13 0:00 /usr/lib/arm-linux-gnueabihf/oxide-qt/chrome-sandbox /usr/lib/arm-linux-gnueabihf/oxide-qt/oxide-renderer --type=zygote
phablet 4924 0.0 0.5 95384 9824 ? T 13:13 0:00 /usr/lib/arm-linux-gnueabihf/oxide-qt/oxide-renderer --type=zygote
phablet 4929 0.0 0.1 104600 2916 ? T 13:13 0:00 /usr/lib/arm-linux-gnueabihf/oxide-qt/oxide-renderer --type=zygote
phablet 4980 0.9 2.1 193196 40196 ? Tl 13:13 0:03 /usr/lib/arm-linux-gnueabihf/oxide-qt/oxide-renderer --type=renderer --disable-accelerated-video-decode --disable-delegated-renderer --enable-threaded-compositing --lang=en-US --enable-threaded-compositing --channel=4886.4.1760984022
phablet 5110 0.0 0.0 1568 288 ? T 13:14 0:00 /usr/lib/arm-linux-gnueabihf/oxide-qt/chrome-sandbox /usr/lib/arm-linux-gnueabihf/oxide-qt/oxide-renderer --type=zygote
phablet 5111 0.0 0.5 95384 9824 ? T 13:14 0:00 /usr/lib/arm-linux-gnueabihf/oxide-qt/oxide-renderer --type=zygote
phablet 5115 0.0 0.1 104600 2916 ? T 13:14 0:00 /usr/lib/arm-linux-gnueabihf/oxide-qt/oxide-renderer --type=zygote
phablet 5158 0.8 1.9 207340 37220 ? Tl 13:14 0:02 /usr/lib/arm-linux-gnueabihf/oxide-qt/oxide-renderer --type=renderer --disable-accelerated-video-decode --disable-delegated-renderer --enable-threaded-compositing --lang=en-US --enable-threaded-compositing --channel=5074.4.17772142
phablet 5309 0.0 0.0 1568 288 ? T 13:14 0:00 /usr/lib/arm-linux-gnueabihf/oxide-qt/chrome-sandbox /usr/lib/arm-linux-gnueabihf/oxide...


Olivier Tilloy (osomon) wrote :

@Oliver: can you confirm what Alan is seeing, that the apps are actually running, but not visible in the apps scope in unity?

Oliver Grawert (ogra) wrote :

i can confirm that there are a bunch of webapp-container processes in my processlist even though there is only one app on screen ...
i also notice that there are as many zombie processes of:
[oxide-renderer] <defunct>

David Barth (dbarth) wrote :

The apps are not crashing. I can confirm while still testing on #281 that all of the webapp-container and oxide-renderer processes are present and alive (or paused) for all of the 10 webapps I have started. At some random point they stop being displayed on the 'Apps' list in Unity.

Changed in webbrowser-app (Ubuntu):
status: Confirmed → Invalid
Michał Sawicz (saviq) on 2014-04-08
affects: unity8 → unity-mir
Changed in unity-mir:
status: New → Triaged
importance: Undecided → High
Oliver Grawert (ogra) wrote :

i see the following in syslog ... which indicates that OOM kicked in (i had 6 apps open of which 4 died over time):

root@ubuntu-phablet:~# grep "send sigkill" /var/log/syslog
Apr 8 13:14:58 ubuntu-phablet kernel: [ 1369.711399] send sigkill to 3839 (webapp-containe), adj 798, size 23128
Apr 8 13:18:40 ubuntu-phablet kernel: [ 1591.685121] send sigkill to 2577 (webapp-containe), adj 798, size 25359
Apr 8 13:19:59 ubuntu-phablet kernel: [ 1670.907858] send sigkill to 3306 (webapp-containe), adj 798, size 21919
Apr 8 13:21:42 ubuntu-phablet kernel: [ 1773.813672] send sigkill to 3602 (webapp-containe), adj 798, size 21629
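
To line these kill events up against the process list, the syslog lines can be parsed into structured records. The following is only a sketch: the regular expression is derived from the four lines quoted above, and the function and field names are made up for illustration. Note that the kernel truncates the command name to 15 characters, which is why `webapp-container` shows up as `webapp-containe`.

```python
import re

# Pattern inferred from the "send sigkill" lines quoted in this comment.
LMK_RE = re.compile(
    r"send sigkill to (?P<pid>\d+) \((?P<comm>[^)]*)\), "
    r"adj (?P<adj>\d+), size (?P<size>\d+)"
)


def parse_lmk_kills(lines):
    """Return one dict per 'send sigkill' line found in lines."""
    kills = []
    for line in lines:
        match = LMK_RE.search(line)
        if match:
            kills.append({
                "pid": int(match["pid"]),
                "comm": match["comm"],
                "adj": int(match["adj"]),
                "size": int(match["size"]),
            })
    return kills


sample = [
    "Apr 8 13:14:58 ubuntu-phablet kernel: [ 1369.711399] send sigkill to 3839 (webapp-containe), adj 798, size 23128",
    "Apr 8 13:18:40 ubuntu-phablet kernel: [ 1591.685121] send sigkill to 2577 (webapp-containe), adj 798, size 25359",
]
for kill in parse_lmk_kills(sample):
    print(kill["pid"], kill["comm"], kill["adj"], kill["size"])
```

Comparing the extracted pids with `ps` output taken before the kill shows whether the victim was the container process itself or one of its children.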

Oliver Grawert (ogra) wrote :

interestingly the processlist still has the apps ... so i assume only some subprocess died

 2679 ? T 0:00 webapp-container --enable-back-forward --webappUrlPatterns=https://plus.google.*/*,https://accounts.google.*/* https://plus.google.com
 3330 ? T 0:00 webapp-container --enable-back-forward --webappUrlPatterns=https?://heise.de/* http://m.heise.de/
 3467 ? Tsl 2:07 webapp-container --enable-back-forward --webappUrlPatterns=https?://golem.de/* http://m.golem.de/
 3497 ? T 0:00 webapp-container --enable-back-forward --webappUrlPatterns=https?://golem.de/* http://m.golem.de/
 3863 ? T 0:00 webapp-container --enable-back-forward --webappUrlPatterns=https?://sueddeutsche.de/* http://m.sueddeutsche.de/

Olivier Tilloy (osomon) wrote :

@Oliver: the kill signal was probably sent to the renderer process, the one that would be consuming too much memory, which left the browser/container executable running, but without a renderer.

Michał Sawicz (saviq) wrote :

On second thought, there seems to be something fishy:

$ initctl status application-click APP_ID=net.launchpad.click-webapps.googleplus_googleplus_6
application-click (net.launchpad.click-webapps.googleplus_googleplus_6) start/running, process 8105
phablet@ubuntu-phablet:~$ ps aux | grep google
phablet 8105 9.5 3.3 507900 63768 ? Ssl 13:27 0:08 webapp-container --enable-back-forward --webappUrlPatterns=https://plus.google.*/*,https://accounts.google.*/* https://plus.google.com
phablet 8129 0.1 0.8 234240 15252 ? S 13:27 0:00 webapp-container --enable-back-forward --webappUrlPatterns=https://plus.google.*/*,https://accounts.google.*/* https://plus.google.com

And anyway, if the renderer goes away, shouldn't the app itself go away, too?

Chris Coulson (chrisccoulson) wrote :

That looks expected. Chromium forks the browser process to use it as a sandbox IPC process, which is great for Chromium (because it forks it at startup and before any threads are created) but sucks for us (because by the time it forks, we already have a QML app and other gunk).

I'm trying to reimplement this at the moment as there's another bug anyway (because threads are never forked, there's a potential deadlock in the child process if fork is called whilst another thread holds a lock in something like malloc)

David Barth (dbarth) on 2014-04-08
tags: added: webapps-blocker
tags: added: webapps-hotlist
removed: webapps-blocker
Changed in webbrowser-app (Ubuntu):
status: Invalid → Confirmed
David Barth (dbarth) wrote :

Re-opening on webbrowser-app (ie container) as it seems we're using more memory than before, and the OOM killer removes webapps more quickly.

Some observations though:
- the situation gets worse when mixing old and new webapps, because we have both the oxide runtime in memory and the old qtwebkit one: openstreetmap in particular taxes memory quite heavily and is a sure way to get other webapps killed (keep in mind for testing)
- the OOM killer does not succeed in killing webapps properly: the first webapp-container is the one killed, but leaves the rest of the processes in the group
- this leads to duplicate app instances, and further confuses unity/appmgr which then fails at killing webapps as well

Chris Coulson (chrisccoulson) wrote :

For the specific sandbox IPC helper process issue, see bug 1304648

kevin gunn (kgunn72) on 2014-04-09
Changed in unity-mir:
assignee: nobody → Daniel d'Andrada (dandrader)
Jamie Strandboge (jdstrand) wrote :

The oxide portion of this is bug #1304648.

Changed in oxide:
status: New → Fix Released
tags: added: qa-touch-blocker
Daniel d'Andrada (dandrader) wrote :

Why is it a unity-mir bug?

Michał Sawicz (saviq) on 2014-04-10
Changed in unity-mir:
status: Triaged → Incomplete
assignee: Daniel d'Andrada (dandrader) → nobody
importance: High → Undecided
Oliver Grawert (ogra) wrote :

with the latest changes in image 286 i can now start 6 webapps and their thumbnails stay in the UI ...

switching through them for a while with the right edge gesture they start dying and respawning (which gets the app selector completely out of order and you most of the time do not end up with the app you selected in front of you)

the respawning does not seem to kill all processes belonging to the app though:
right after start i get:
root@ubuntu-phablet:~# ps ax|grep -c oxide
46
root@ubuntu-phablet:~# ps ax|grep -c webapp
6

after switching through apps for 5 min and have them die and re-order constantly in the app switcher it looks like:
ogra@styx:~/Devel/branches/unity8$ adb shell ps ax|grep -c oxide
95
ogra@styx:~/Devel/branches/unity8$ adb shell ps ax|grep -c webapp
2

during this all 6 apps have proper thumbnails in the switcher as well as in the applications scope

the phone eventually runs out of ram and starts getting unresponsive after about 198 dangling oxide processes have been started ...

http://paste.ubuntu.com/7230191/ has a processtree snapshot taken a few minutes after a fresh reboot

David Barth (dbarth) wrote :

After the call with ricmm, and with ogra's process tree in front of us:
proposed solution for now: have oxide kill itself if the parent process dies
alternative: figure out why the process group is apparently not formed
if both webapp-container and oxide were in the same process group, then the system would kill all of the processes at the same time. Which is not happening apparently.

Oliver Grawert (ogra) wrote :

one other thing i noticed is that if you close a webapp via the hud close action, all attached oxide processes get killed properly.

Chris Coulson (chrisccoulson) wrote :

All processes we spawn quit when their IPC links to the browser are closed (ie, the browser process being killed). This works ok in my testing here. The process tree in comment 25 shows that all of our child processes have been *stopped* (by somebody sending them SIGSTOP). I don't see how I can make oxide subprocesses kill themselves when they aren't running (and they already have this functionality)
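
The point about stopped processes can be illustrated with a small, self-contained sketch. It is not oxide code: the child below is a hypothetical stand-in that exits when its pipe to the parent reaches end-of-file, which plays the role of the IPC link. While the child is stopped it cannot notice that the pipe was closed; it only exits once it receives SIGCONT.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for a helper process: it exits as soon as its
# pipe to the parent reaches end-of-file.
child = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdin.read()"],
    stdin=subprocess.PIPE,
)

# Suspend the child, then wait until the kernel reports it as stopped.
os.kill(child.pid, signal.SIGSTOP)
_, status = os.waitpid(child.pid, os.WUNTRACED)
stopped = os.WIFSTOPPED(status)

# Close the "IPC link". A running child would exit now; a stopped one cannot.
child.stdin.close()
time.sleep(0.5)
still_running_while_stopped = child.poll() is None

# Only after SIGCONT does the child see end-of-file and exit.
os.kill(child.pid, signal.SIGCONT)
exit_code = child.wait(timeout=10)

print(stopped, still_running_while_stopped, exit_code)
```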

Jamie Strandboge (jdstrand) wrote :

To clearly state the issue:
1. the leader of the process group for the app using oxide is sent STOP by the Unity appmgr (application lifecycle). Because the app is started via upstart, the appmgr sends SIGSTOP to -<pid>, ie the process group, so the process group leader and all children (ie, the oxide processes) are stopped. This all works correctly as Chris stated
2. low memory killer (android) kills a pid (kernel OOM could also do this), possibly (often?) the process group leader
3. since the process group leader is gone, the appmgr does not resume the oxide child processes, leaving them hanging around

Two things seem to be happening here
1. OOM is called more often with webapp-container using oxide than with qtwebkit
2. appmgr is not cleaning up the killed processes

For '1', alex-abreu stated that for compatibility with 13.10 webapps in the store, webapp-container/webbrowser-app loads *both* qtwebkit and oxide and detects which to use at runtime. This is likely the cause for the OOM

For '2', appmgr needs to be a little smarter and notice that if the leader is killed, the other processes in the process group need to also be killed. There are a couple of ways to do this, the merits of which are being discussed.
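
One possible shape of such a cleanup, sketched under assumptions: this is not appmgr or upstart-app-launch code, and the inline `LEADER` script is a hypothetical stand-in for a webapp-container that forks one helper. The sketch starts the leader in its own session, stops the whole group, kills only the leader (as the low memory killer would), and then signals the remaining members through the process group id.

```python
import os
import signal
import subprocess
import sys

# Hypothetical app: the leader forks one helper, reports its pid, then idles.
LEADER = (
    "import subprocess, sys, time\n"
    "c = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(60)'])\n"
    "print(c.pid, flush=True)\n"
    "time.sleep(60)\n"
)


def exists(pid):
    """Return True if a process with this pid still exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
        return True
    except ProcessLookupError:
        return False


# start_new_session=True calls setsid() in the child, so the leader gets
# its own session and process group.
leader = subprocess.Popen(
    [sys.executable, "-c", LEADER],
    stdout=subprocess.PIPE, text=True, start_new_session=True,
)
helper_pid = int(leader.stdout.readline())
pgid = os.getpgid(leader.pid)

os.killpg(pgid, signal.SIGSTOP)      # lifecycle suspend of the whole group
os.kill(leader.pid, signal.SIGKILL)  # only the leader is killed
leader.wait()
leader.stdout.close()

left_behind = exists(helper_pid)     # the stopped helper is still there

# Cleanup: signal every remaining member of the group, not just one pid.
os.killpg(pgid, signal.SIGKILL)
print(left_behind)
```

SIGKILL is delivered to stopped processes as well, so no SIGCONT is needed first in this variant; a gentler cleanup would send SIGTERM followed by SIGCONT.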

Changed in unity-mir:
status: Incomplete → Confirmed
Jamie Strandboge (jdstrand) wrote :

"2. appmgr is not cleaning up the killed processes" should have read:

2. appmgr is not cleaning up all the stopped processes when a process in the process group is killed

Chris Coulson (chrisccoulson) wrote :

Following on from Jamie's comments, I'm just going to braindump what else we discovered is happening yesterday:

1) appmgr sends SIGSTOP to the process group for the webapp-container
2) The low memory killer comes along and kills the process group leader
3) appmgr considers the app closed, and never continues the remaining processes in their process group, leaving them around forever.

Normally when a process group becomes orphaned and it contains processes that are stopped, the kernel automatically sends SIGHUP and then SIGCONT to all remaining processes in that group. This is sufficient to ensure all remaining processes in the group are cleaned up, and I've verified this is the case on the desktop and the device when launching webbrowser-app manually from a terminal.

However, applications started by Upstart get their own session (thanks to Tyler for spotting this), with the session leader being webbrowser-app (or whatever app you're launching). The process group of the session leader is already orphaned by definition*, and so when the group (and session) leader is killed, the kernel will never send SIGHUP and SIGCONT to the remaining stopped processes.

*A process group is orphaned if the parent of every member is either within the process group or outside of the session.
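
The definition in the footnote can be written down as a small check over a process snapshot. This is a sketch with made-up pids, not kernel code; the `Proc` tuple and `is_orphaned` helper are hypothetical names, and a parent that is missing from the snapshot is assumed to be outside the session.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    pgid: int
    sid: int


def is_orphaned(pgid, procs):
    """True if every member's parent is in the group or outside the session."""
    by_pid = {p.pid: p for p in procs}
    for member in (p for p in procs if p.pgid == pgid):
        parent = by_pid.get(member.ppid)
        if parent is None:
            continue  # assumption: unknown parent counts as outside the session
        if parent.pgid == pgid or parent.sid != member.sid:
            continue
        return False
    return True


# Launched from a terminal: the shell (pid 100) is in the same session but a
# different process group, so the app's group is not orphaned.
from_terminal = [
    Proc(1, 0, 1, 1),
    Proc(100, 1, 100, 100),   # shell, session leader
    Proc(200, 100, 200, 100), # browser, group leader
    Proc(201, 200, 200, 100), # renderer
]

# Launched by Upstart with setsid(): the browser is the session leader and
# its parent is in another session, so the group is orphaned from the start.
from_upstart = [
    Proc(1, 0, 1, 1),
    Proc(200, 1, 200, 200),   # browser, session and group leader
    Proc(201, 200, 200, 200), # renderer
]

print(is_orphaned(200, from_terminal), is_orphaned(200, from_upstart))
```

In the terminal case the group only becomes orphaned when the browser dies, and that transition is what makes the kernel send SIGHUP and SIGCONT. In the Upstart case there is no transition, so no signals are sent.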

Tyler Hicks (tyhicks) wrote :

As Chris pointed out in comment #31, upstart's use of setsid() means that, by default, all processes of a launched application are in the same session and process group. This process group is orphaned from the start so the kernel does not send SIGHUP/SIGCONT signals when, for example, webbrowser-app dies.

If we wanted applications to manage this themselves (I don't think we do, but...), the main process of an application could leave itself in the process group created by upstart. As the main application process forks off new processes, it could place those processes in a different process group. If/when the main process dies (such as the low memory killer sending it a SIGKILL), then the kernel recognizes that the process group containing the children processes will be orphaned and it will send SIGHUP/SIGCONT to all processes in the process group.

It would make the application author's jobs quite a bit more difficult, but I also see how they could benefit from leaving all smart/essential processes in the main process group and placing all dumb worker processes in the other process group.
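
A minimal sketch of that idea, assuming a hypothetical `spawn_worker` helper and using sleeping Python processes as placeholder workers: the main process stays in the group it was started in, and every worker is moved into a separate, shared process group with `setpgid()` before it execs.

```python
import os
import subprocess
import sys


def spawn_worker(argv, pgid=0):
    """Start argv in process group pgid (0 means a new group led by the worker)."""
    # preexec_fn runs in the child between fork and exec.
    return subprocess.Popen(argv, preexec_fn=lambda: os.setpgid(0, pgid))


sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
first = spawn_worker(sleeper)                   # creates the worker group
second = spawn_worker(sleeper, pgid=first.pid)  # joins the same group

main_group = os.getpgid(0)
worker_groups = (os.getpgid(first.pid), os.getpgid(second.pid))

for worker in (first, second):
    worker.kill()
    worker.wait()

print(worker_groups == (first.pid, first.pid), main_group != first.pid)
```

With this layout the workers' parent is in the same session but in a different group, so the worker group is not orphaned until the main process dies.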

Oliver Grawert (ogra) wrote :

sooo ...
using a combination of:
https://code.launchpad.net/~abreu-alexandre/webbrowser-app/better-control-webengine-lib-loaded/+merge/215280

and the hack from:
https://code.launchpad.net/~ted/upstart-app-launch/process-group-kill/+merge/215475

gets the issue solved, but ... the webapp (re-)start time is way too slow, that means a killed app that i pick from the right edge app switcher will actually bring the surface of the app underneath in front of you while the webapp starts anew ... once the app is ready (about 5sec) it takes over the screen with the actual content ...

this is indeed not related to this bug but our app lifecycle handling in general (it needs to compensate for this by showing a screenshot of the app or some such instead of showing the app that is underneath in the right edge switcher)

if i restrict myself to not use more than 5 webapps i see apps never getting killed (even with some additional native apps running)

David Barth (dbarth) on 2014-04-14
Changed in unity-mir:
status: Confirmed → Fix Released
Changed in webbrowser-app (Ubuntu):
status: Confirmed → Fix Released
Oliver Grawert (ogra) wrote :

definitely not fixed in unity-mir yet ... this will need serious reworking of the lifecycle management. I don't know why but i can not change anything on this bug anymore, david, can you set unity-mir back to confirmed ?
(we worked around the issue with a hack in upstart-app-launch and did not fix it properly in unity-mir)
