unity8 leaks file descriptors cause unresponsive ui but power button works display
| Affects | Importance | Assigned to |
|---|---|---|
| Canonical System Image | High | kevin gunn |
| qtmir (Ubuntu) | High | Daniel d'Andrada |
| ubuntu-app-launch (Ubuntu) | Critical | Ted Gould |
Bug Description
unity8 leaks file descriptors in various cases, causing the process to eventually fail (hang and/or restart). These leaks were first discovered during investigations of https:/
The fds are leaked in various scenarios, including:
* opening/closing apps
* switching between apps
* locking the screen
Related branches
- Charles Kerr (community): Approve on 2016-04-25
- PS Jenkins bot: Approve (continuous-integration) on 2015-12-07
Diff: 193 lines (+84/-7), 6 files modified:
debian/libubuntu-app-launch2.symbols (+1/-0)
helpers-shared.c (+11/-2)
libubuntu-app-launch/ubuntu-app-launch.c (+5/-5)
libubuntu-app-launch/ubuntu-app-launch.h (+12/-0)
tools/CMakeLists.txt (+9/-0)
tools/ubuntu-app-list-pids.c (+46/-0)
- PS Jenkins bot: Pending (continuous-integration) requested 2015-09-16
- Indicator Applet Developers: Pending requested 2015-09-16
Diff: 57 lines (+12/-3) (has conflicts), 2 files modified:
helpers-shared.c (+11/-2)
libubuntu-app-launch/ubuntu-app-launch.h (+1/-1)
- PS Jenkins bot: Approve (continuous-integration) on 2015-09-16
- Indicator Applet Developers: Pending requested 2015-09-16
Diff: 57 lines (+12/-3), 2 files modified:
helpers-shared.c (+11/-2)
libubuntu-app-launch/ubuntu-app-launch.h (+1/-1)
- Daniel d'Andrada (community): Disapprove on 2015-09-25
- PS Jenkins bot: Needs Fixing (continuous-integration) on 2015-09-25
Diff: 33 lines (+9/-7), 1 file modified:
src/modules/Unity/Application/mirsurfaceitem.cpp (+9/-7)
- Gerry Boland: Approve on 2015-09-29
- PS Jenkins bot: Approve (continuous-integration) on 2015-09-28
Diff: 96 lines (+20/-5), 4 files modified:
src/modules/Unity/Application/mirsurface.h (+1/-0)
src/modules/Unity/Application/mirsurfaceinterface.h (+2/-0)
src/modules/Unity/Application/mirsurfaceitem.cpp (+16/-5)
tests/modules/common/fake_mirsurface.h (+1/-0)
| Alexandros Frantzis (afrantzis) wrote : | #1 |
| Alexandros Frantzis (afrantzis) wrote : | #2 |
As mentioned in the description, one way to leak fds is just opening and closing an app. It leaks an fd of type 'anon_inode:
Sample output from a run (omitting the lsof and /proc/*/fd information) is:
#### Starting at 2015-09-
# Before fds: 119
# During fds: 131
# After fds: 120
#### Starting at 2015-09-
# Before fds: 120
# During fds: 133
# After fds: 121
#### Starting at 2015-09-
# Before fds: 121
# During fds: 134
# After fds: 122
#### Starting at 2015-09-
# Before fds: 122
# During fds: 135
# After fds: 123
...
You can clearly see unity8 leaking one fd every time the app connects and disconnects.
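The script that produced these numbers isn't reproduced here. As a rough sketch of its counting step only (the function name `count_open_fds` is hypothetical), the per-process fd count is the number of entries in `/proc/<pid>/fd`, the same figure `ls /proc/<pid>/fd | wc -l` gives. Sampling it before launching an app, while the app runs, and after it exits yields the before/during/after triple; a net +1 per cycle is the leak.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count the open file descriptors of process `pid` by
 * listing /proc/<pid>/fd (Linux-specific). Pass 0 for the calling process.
 * Reading another user's process needs suitable permissions.
 * Returns -1 if the directory cannot be opened. */
int count_open_fds(pid_t pid)
{
    char path[64];
    int self = (pid == 0 || pid == getpid());
    if (self)
        snprintf(path, sizeof path, "/proc/self/fd");
    else
        snprintf(path, sizeof path, "/proc/%d/fd", (int)pid);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    /* When scanning ourselves, the fd opendir() is using shows up too. */
    int own_fd = self ? dirfd(dir) : -1;
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')          /* skip "." and ".." */
            continue;
        if (atoi(entry->d_name) == own_fd)    /* skip our own directory fd */
            continue;
        count++;
    }
    closedir(dir);
    return count;
}
```

Skipping the directory fd only matters when a process inspects itself; when watching unity8 from another process, that fd belongs to the watcher, not to unity8.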
| Albert Astals Cid (aacid) wrote : | #3 |
I don't see any eventfd() calls in unity8; Mir has at least two.
| Albert Astals Cid (aacid) wrote : | #4 |
According to alf_, it can't be Mir, since those eventfd calls are stored in a Mir::Fd that takes care of closing them.
| Changed in canonical-devices-system-image: | |
| assignee: | nobody → kevin gunn (kgunn72) |
| status: | New → Confirmed |
| importance: | Undecided → Critical |
| milestone: | none → ww40-2015 |
| tags: | added: hotfix |
| Albert Astals Cid (aacid) wrote : | #5 |
Investigating; it seems pretty clear Mir is not to blame here.
| Changed in mir (Ubuntu): | |
| status: | New → Invalid |
| Gerry Boland (gerboland) wrote : | #6 |
Watching the eventfd2 & close calls to see which one leaks:
sudo strace -p `pidof unity8` 2>&1 | grep -P "(eventfd|close)"
On app open I get:
eventfd2(0, O_NONBLOCK|
eventfd2(0, O_NONBLOCK|
close(117) = 0
eventfd2(0, O_NONBLOCK|
close(116) = 0
on app close:
eventfd2(0, O_NONBLOCK|
close(109) = 0
eventfd2(0, O_NONBLOCK|
close(109) = 0
eventfd2(0, O_NONBLOCK|
close(109) = 0
eventfd2(0, O_NONBLOCK|
close(109) = 0
I saw no close call for FD 118 - the first one.
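The return values in the strace lines quoted above are cut off. Assuming full strace lines of the usual form `eventfd2(0, O_NONBLOCK|O_CLOEXEC) = 118` and `close(118) = 0`, the manual matching done here can be automated with a small replay tracker. This is a hypothetical sketch (`fd_tracker`, `tracker_feed`, `tracker_leaked` and the `MAX_FD` bound are all illustrative names, not part of any tool):

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_FD 4096   /* illustrative upper bound on fd numbers tracked */

/* open[fd] is set when eventfd2() returns fd and cleared on a successful
 * close(fd), so whatever is still set after replaying a trace was never closed. */
typedef struct {
    bool open[MAX_FD];
} fd_tracker;

/* Feed one strace output line (without a pid prefix, i.e. no -f). */
void tracker_feed(fd_tracker *t, const char *line)
{
    int fd, rc;
    const char *eq = strrchr(line, '=');   /* the " = <retval>" part */
    if (eq == NULL)
        return;                            /* unfinished/resumed lines are ignored */

    if (strncmp(line, "eventfd2(", 9) == 0) {
        /* A failed call returns -1, which the range check rejects. */
        if (sscanf(eq + 1, "%d", &fd) == 1 && fd >= 0 && fd < MAX_FD)
            t->open[fd] = true;
    } else if (strncmp(line, "close(", 6) == 0) {
        if (sscanf(line + 6, "%d", &fd) == 1 && sscanf(eq + 1, "%d", &rc) == 1 &&
            rc == 0 && fd >= 0 && fd < MAX_FD)
            t->open[fd] = false;
    }
}

/* Copy up to `max` still-open eventfds into `out`; return how many. */
int tracker_leaked(const fd_tracker *t, int *out, int max)
{
    int n = 0;
    for (int fd = 0; fd < MAX_FD && n < max; fd++)
        if (t->open[fd])
            out[n++] = fd;
    return n;
}
```

Replaying the app-open sequence above (eventfd2 returning 118, 117 and 116, with 117 and 116 later closed) leaves exactly fd 118 flagged. Reused numbers such as the repeated 109 on app close are handled naturally, because each close clears the slot again.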
In gdb the first eventfd call on app launch has this backtrace:
Breakpoint 1, 0xb614ae36 in eventfd () from /lib/arm-
(gdb) bt
#0 0xffffffff in eventfd () at /lib/arm-
#1 0xffffffff in g_wakeup_new () at /build/
#2 0xffffffff in g_main_context_new () at /build/
#3 0xffffffff in cgroup_
#4 0xffffffff in pids_for_appid (appid=
#5 0xffffffff in ubuntu_
#6 0xffffffff in qtmir::
#7 0xffffffff in qtmir::
#8 0xffffffff in qtmir::
#9 0xffffffff in QMetaCallEvent:
#10 0xffffffff in QMetaCallEvent:
#11 0xffffffff in QObject:
#12 0xffffffff in QCoreApplicatio
#13 0xffffffff in QCoreApplicatio
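The backtrace shows the eventfd being created by `g_main_context_new()` (through `g_wakeup_new()`) underneath `pids_for_appid`. In GLib, each GMainContext owns a wakeup object that is backed by an eventfd on Linux and is only released when the context's last reference is dropped with `g_main_context_unref()`. So if a private context is created per query and never unreffed, each query permanently costs one eventfd. The branch's actual diff isn't reproduced here; the following is a hypothetical, GLib-free model of that pattern (`loop_context`, `query_pids_leaky`, and `query_pids_fixed` are illustrative names) contrasting the leaky and the fixed shape:

```c
#include <stdlib.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for a main-loop context: like GLib's GMainContext,
 * it owns an eventfd used to wake up its loop. */
typedef struct {
    int wakeup_fd;
} loop_context;

loop_context *loop_context_new(void)
{
    loop_context *ctx = malloc(sizeof *ctx);
    if (ctx == NULL)
        return NULL;
    ctx->wakeup_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (ctx->wakeup_fd < 0) {
        free(ctx);
        return NULL;
    }
    return ctx;
}

void loop_context_free(loop_context *ctx)
{
    if (ctx == NULL)
        return;
    close(ctx->wakeup_fd);   /* this is what releases the eventfd */
    free(ctx);
}

/* Leaky shape: a private context per query that is never released. */
int query_pids_leaky(void)
{
    loop_context *ctx = loop_context_new();
    if (ctx == NULL)
        return -1;
    /* ... run the query on ctx ... */
    return 0;                /* BUG: ctx and its eventfd are leaked */
}

/* Fixed shape: release the context on every exit path. */
int query_pids_fixed(void)
{
    loop_context *ctx = loop_context_new();
    if (ctx == NULL)
        return -1;
    /* ... run the query on ctx ... */
    loop_context_free(ctx);
    return 0;
}
```

Because the kernel always hands out the lowest free fd number, a leak like this shows up as the fd numbers creeping upward over time, matching the one-fd-per-cycle growth in comment #2.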
| Changed in ubuntu-app-launch (Ubuntu): | |
| assignee: | nobody → Ted Gould (ted) |
| importance: | Undecided → Critical |
| status: | New → Confirmed |
| no longer affects: | mir (Ubuntu) |
| no longer affects: | unity8 (Ubuntu) |
| Alexandros Frantzis (afrantzis) wrote : | #7 |
From Gerry's backtrace it is clear that the fd leak we get when opening/closing apps can be traced back to u-a-l.
However, Jean-Baptiste found a few other scenarios (see first comment) that lead to leaked fds. Have we confirmed that all the leaks in these scenarios can be traced back to this u-a-l leak? If not, we need to ensure this is reflected in the bug status (by keeping unity8 as affected), or opening a new bug report.
| summary: | unity8 leaks file descriptors → unity8 leaks file descriptors cause unresponsive ui but power button works display |
| kevin gunn (kgunn72) wrote : | #8 |
Had a question earlier today about whether this also happens on stable (OTA-6); I just confirmed that it does.
| Changed in ubuntu-app-launch (Ubuntu): | |
| status: | Confirmed → In Progress |
| Albert Astals Cid (aacid) wrote : | #9 |
> Have we confirmed that all the leaks in these scenarios can be traced back to this u-a-l leak?
No, we have not. It'll be easier once this one has been fixed, but given that we don't use eventfd or GLib extensively, there's a high chance this is not in unity8 itself.
> If not, we need to ensure this is reflected in the bug status (by keeping unity8 as affected), or opening a new bug report.
I guess this is a matter of taste, feel free to re-add it if you prefer.
| Alexandros Frantzis (afrantzis) wrote : | #10 |
>> If not, we need to ensure this is reflected in the bug status (by keeping unity8 as affected), or opening a new bug report.
> I guess this is a matter of taste, feel free to re-add it if you prefer.
I just want to make sure we track this properly, not close it when the u-a-l part is fixed (unless, of course, u-a-l turns out to be the source of all our leak instances). I guess tracking it in "Canonical System Image" should be enough for this.
| Albert Astals Cid (aacid) wrote : | #11 |
I can confirm that Ted's branch seems to fix all the fd leaking for me. A second source would be cool.
It's a bit unfortunate that we need ~8 fds per app though; maybe we need to optimize that at some point in the future.
| Albert Astals Cid (aacid) wrote : | #12 |
s/second source/second confirmation
| Ted Gould (ted) wrote : Re: [Bug 1495871] Re: unity8 leaks file descriptors cause unresponsive ui but power button works display | #13 |
On Wed, 2015-09-16 at 15:22 +0000, Albert Astals Cid wrote:
> It's a bit unfortunate we need ~8 fd per app though, may we need to
> optimize that at some point in the future.
It will go down a bunch with the switch to systemd as we can request on
the system bus instead of building a custom dbus peer-to-peer connection
with CGManager.
| kevin gunn (kgunn72) wrote : | #14 |
Using Ted's branch
https:/
which is conveniently available in silo 60,
I tested with the script above to open/close apps, and the number of open fds stayed at a steady state.
However, I then modified the script to not open/close apps and just print the number of open fds of the unity8 process.
I then manually opened as many apps as were available on the basic image.
I watched the ubuntu-app-watch and ubuntu-app-list output to make sure apps were being closed due to memory pressure. I then used the spread to bring up the very "last" (front) app, i.e. the one all the way to the right in the spread, so that I was cycling through all the apps available in the spread.
When I started with no apps open, it reported 135 open fds. After opening all the apps fresh (no toggling yet) it was at ~200+, and with toggling the number of fds grew to 480. I stopped toggling and let it reach a steady state (screen blanked); it remained at 480. Then I swiped away every app in the spread, which reduced the number of open fds to 255. So that's 120 fds orphaned (255 - 135). I'm not sure whether these are associated with the screenshots in the spread or with u-a-l somehow.
| kevin gunn (kgunn72) wrote : | #15 |
Just out of curiosity, I set up phonesim and sent text messages, since that's when this seems to happen the most.
The fd count for unity8 would jump up by one but eventually drop back down.
I also watched the fd count for message+ & indicator-
| kevin gunn (kgunn72) wrote : | #16 |
OK, I watched pulseaudio and media-hub-server using phonesim:
pulseaudio was fine
media-hub-server was leaking fds: 2 per text message alert and per phone call.
| Changed in media-hub (Ubuntu): | |
| importance: | Undecided → Critical |
| Jean-Baptiste Lallement (jibel) wrote : | #17 |
If it's useful to anyone, here is a small script to track the number of fds used per process. By default it takes a sample every 5 minutes. Then you can diff the logs to find which process is suspicious.
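The attached script itself isn't shown here. As a hypothetical sketch of the "diff the logs" step (the `fd_sample` record and `growing_processes` function are illustrative), two samples of (pid, fd count) pairs taken some time apart can be compared to list the processes whose count grew by at least a threshold:

```c
#include <stddef.h>
#include <sys/types.h>

/* One entry of a hypothetical per-process sample: pid and its fd count. */
typedef struct {
    pid_t pid;
    int fds;
} fd_sample;

/* Copy into `out` the pids whose fd count grew by at least `threshold`
 * between the two samples. Processes present in only one sample (started
 * or exited in between) are skipped. Returns how many pids were written,
 * at most `max`. */
size_t growing_processes(const fd_sample *before, size_t n_before,
                         const fd_sample *after, size_t n_after,
                         int threshold, pid_t *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < n_after && n < max; i++) {
        for (size_t j = 0; j < n_before; j++) {
            if (before[j].pid != after[i].pid)
                continue;
            if (after[i].fds - before[j].fds >= threshold)
                out[n++] = after[i].pid;
            break;   /* pid matched; stop searching the old sample */
        }
    }
    return n;
}
```

Matching on pid alone assumes no pid was reused between samples; a restarted unity8 gets a new pid and is simply dropped from the comparison, which is why long sampling intervals are best paired with the process name as a sanity check.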
| Albert Astals Cid (aacid) wrote : | #18 |
The leaks in unity8 when the memory pressure is high can be easily reproduced by:
* Be on dash
* Open an app
* Switch to the dash
* Use kill -9 to kill the app
We're leaking an anon_inode:dmabuf fd each time that happens.
| kevin gunn (kgunn72) wrote : | #19 |
Removing media-hub; spawning bug 1496894.
| no longer affects: | media-hub (Ubuntu) |
| Albert Astals Cid (aacid) wrote : | #20 |
The anon_inode:dmabuf fd I mentioned in #18 is no longer leaked if we make ApplicationScre
I'm now investigating whether it's qtmir, unity8 or Qt, or whether it's not an fd leak at all and just Qt's image cache being used.
| Changed in canonical-devices-system-image: | |
| status: | Confirmed → In Progress |
| Albert Astals Cid (aacid) wrote : | #21 |
Adding qtmir, since I can stop the anon_inode:dmabuf fd from leaking by making qtmir not free the session early when the app is killed.
| Albert Astals Cid (aacid) wrote : | #22 |
FWIW the ubuntu-app-launch fix has landed in the vivid image
| Changed in ubuntu-app-launch (Ubuntu): | |
| status: | In Progress → Fix Released |
| Changed in qtmir (Ubuntu): | |
| importance: | Undecided → Critical |
| Albert Astals Cid (aacid) wrote : | #23 |
I've found two ways (which are effectively the same) to work around the fd leak caused by killing an app (comment #18):
* removing surfaceItem.surface = null; in unity8 SurfaceContaine
* removing the 3 deleteLater in qtmir session.cpp
Both have the same effect: qtmir's session and mirsurface aren't freed when the application is killed, only when the user actually swipes it down from the side spread. With that, the fd leak is gone.
It has two problems:
* We delay freeing memory resources, and since apps are most commonly killed by the OOM killer, this is not good.
* If you try to restart the app without swiping it down from the side spread, unity8 crashes because the mir surface item has bad data. I think this could be fixed, but the first problem seems serious enough to disqualify these solutions.
The workarounds seem to imply that something is wrong in the early free for killed apps.
I've read the code and to my non-domain-
Valgrind could not find any memory or fd leak either.
Maybe someone from the mir/qtmir fields can have a look at the early free code for killed apps and see if they see anything wrong?
| kevin gunn (kgunn72) wrote : | #24 |
Unfortunately Gerry is probably the best expert, but he's out; assigning to Daniel in the hope he can make some progress.
| Changed in qtmir (Ubuntu): | |
| assignee: | nobody → Daniel van Vugt (vanvugt) |
| Daniel van Vugt (vanvugt) wrote : | #25 |
Using comment #18 on arale, I find roughly three fds are leaked at a time:
> lrwx------ 1 phablet phablet 64 Sep 22 07:49 135 -> anon_inode:dmabuf
> lrwx------ 1 phablet phablet 64 Sep 22 07:49 138 -> /dev/pvrsrvkm
> lrwx------ 1 phablet phablet 64 Sep 22 07:49 139 -> /dev/pvrsrvkm
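Since the leaked fds are recognisable by their symlink targets (`anon_inode:dmabuf`, `/dev/pvrsrvkm`), it can help to count only fds of one kind instead of the total. A hypothetical sketch (`count_fds_matching` is an illustrative name) that reads each `/proc/<pid>/fd/<n>` link with `readlink()` and counts targets starting with a given prefix; note the exact target string for dmabuf fds can vary between kernel versions:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Count fds of process `pid` (0 = calling process) whose /proc/<pid>/fd/<n>
 * symlink target starts with `prefix`, e.g. "anon_inode:dmabuf" or
 * "/dev/pvrsrvkm". Returns -1 if /proc/<pid>/fd cannot be opened. */
int count_fds_matching(pid_t pid, const char *prefix)
{
    char dirpath[64];
    if (pid == 0)
        snprintf(dirpath, sizeof dirpath, "/proc/self/fd");
    else
        snprintf(dirpath, sizeof dirpath, "/proc/%d/fd", (int)pid);

    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int count = 0;
    size_t plen = strlen(prefix);
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.')              /* skip "." and ".." */
            continue;
        char linkpath[512], target[256];
        snprintf(linkpath, sizeof linkpath, "%s/%s", dirpath, entry->d_name);
        ssize_t len = readlink(linkpath, target, sizeof target - 1);
        if (len < 0)
            continue;                             /* fd closed while scanning */
        target[len] = '\0';                       /* readlink() doesn't NUL-terminate */
        if (strncmp(target, prefix, plen) == 0)
            count++;
    }
    closedir(dir);
    return count;
}
```

Sampling `count_fds_matching(unity8_pid, "anon_inode:dmabuf")` around each kill in the comment #18 recipe would isolate the one-per-kill growth from unrelated fd churn.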
| Daniel van Vugt (vanvugt) wrote : | #26 |
Apparently we fixed a very similar bug on desktop two years ago --> bug 1185183
| Daniel van Vugt (vanvugt) wrote : | #27 |
Invalid for qtmir, but it does seem to be a Mir issue... See bug 1498361.
| Changed in qtmir (Ubuntu): | |
| status: | New → Invalid |
| Daniel van Vugt (vanvugt) wrote : | #28 |
Actually there might still be a qtmir-specific leak but we need to fix the leaks seen in plain Mir too --> bug 1498361
| Changed in qtmir (Ubuntu): | |
| status: | Invalid → New |
| Daniel van Vugt (vanvugt) wrote : | #29 |
Considering bug 1498816, and the new-ish bufferstream logic in Mir's SessionMediator, I'm suspicious that Mir might just be indefinitely bloated, never releasing some resources until shutdown. It's a bit scary that the bufferstream logic is passing around raw pointers without any clear ownership. We may well be holding some for too long. Which is exactly the kind of resource that would be seen as "anon_inode:
| Alan Griffiths (alan-griffiths) wrote : | #30 |
Mir does associate resources with a client socket that are not released until one of two things happen:
1. The client explicitly disconnects
2. An operation (like the outstanding read attempt) on the socket reports an error
AFAIK there is no *guarantee* that the latter happens in a timely manner when a client disconnects abruptly.
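Mir's real connection handling is C++ and isn't shown here. To illustrate the second release path in a hypothetical C sketch (`client` and `reap_if_disconnected` are illustrative names): when a client dies, even via `kill -9`, the kernel closes its end of the socket, but the server only notices when it next polls or reads its own end and gets EOF or an error. Until that happens, the per-client resources stay allocated, which is why there is no timeliness guarantee.

```c
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical per-client record: the connection plus one resource the
 * server holds for that client (a plain fd standing in for buffers etc.). */
typedef struct {
    int sock;          /* server's end of the client connection */
    int resource_fd;   /* -1 once released */
} client;

/* Non-blocking check of one client. Resources are released only once a read
 * on the connection reports EOF (0) or a hard error; if nobody calls this,
 * they stay allocated indefinitely. Returns true if the client was torn down. */
bool reap_if_disconnected(client *c)
{
    struct pollfd pfd = { .fd = c->sock, .events = POLLIN };
    if (poll(&pfd, 1, 0) <= 0)
        return false;                /* nothing pending yet */

    char buf[256];
    ssize_t n = read(c->sock, buf, sizeof buf);
    if (n > 0)
        return false;                /* ordinary traffic; a full server would dispatch it */
    if (n < 0 && errno == EINTR)
        return false;                /* interrupted; check again later */

    /* n == 0 (EOF) or a hard error such as ECONNRESET: tear down. */
    close(c->resource_fd);
    c->resource_fd = -1;
    close(c->sock);
    c->sock = -1;
    return true;
}
```

If the client wrote data just before dying, the first read returns that data and the EOF is only seen on the following call, another way the release can lag behind the disconnect.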
| Changed in qtmir (Ubuntu): | |
| status: | New → In Progress |
| affects: | qtmir (Ubuntu) → qtmir |
| Daniel van Vugt (vanvugt) wrote : | #31 |
Note that tools like valgrind (memcheck) are unlikely to find the problem. Because C++ and RAII... the offending resources get released on shutdown and never actually leaked. So instead of leaks, you have to hunt for "bloat" and use a memory/resource profiler.
I got distracted for a long time finding lots of similar but unrelated erratic bloat in Mir demo servers on Android. But I think those are specific to Mir demos and not Unity8. It seems plausible that Unity8 avoids some Mir logic that's bloaty in its own right.
Next stop: Rebuild a qtmir development environment and investigate the snapshot stuff. In theory if the snapshot code is leaking a buffer then that would explain the anon_inode:dmabuf handle. However it's not just the snapshot code but also bug 1498816 and how that affects SessionMediator that's concerning.
| Daniel van Vugt (vanvugt) wrote : | #32 |
Handy tip:
You can see the call stack of (many but not all) fds unity8 has open while it's running. Just start unity8 under valgrind with --track-fds=yes. Then you can get the live list of fds with their backtraces:
vgdb v.info open_fds
Annoyingly this doesn't show the dmabuf ones we're looking for.
| Daniel van Vugt (vanvugt) wrote : | #33 |
Dropped severity because the bug requires significant time to occur, if at all.
| Changed in qtmir: | |
| importance: | Critical → High |
| kevin gunn (kgunn72) wrote : | #34 |
Just noting, I agree with the drop in priority, since we have a fix for u-a-l and that was the original culprit.
We do need to eventually get this nailed, so please keep it as a front-burner activity.
| Changed in canonical-devices-system-image: | |
| importance: | Critical → High |
| Daniel van Vugt (vanvugt) wrote : | #35 |
aacid:
Can you please test the attached proposed branches? I'm wondering if I'm imaging things when it seems fixed.
| Daniel van Vugt (vanvugt) wrote : | #36 |
*imagining*
| Albert Astals Cid (aacid) wrote : | #37 |
I can confirm that the fd leaking on killing apps like explained in comment #18 also seems to stop for me when using https:/
| Daniel van Vugt (vanvugt) wrote : | #38 |
Awesomtastic
| Changed in qtmir: | |
| assignee: | Daniel van Vugt (vanvugt) → Daniel d'Andrada (dandrader) |
| assignee: | Daniel d'Andrada (dandrader) → Daniel van Vugt (vanvugt) |
| assignee: | Daniel van Vugt (vanvugt) → Daniel d'Andrada (dandrader) |
| Changed in qtmir (Ubuntu): | |
| status: | New → Triaged |
| importance: | Undecided → High |
| Launchpad Janitor (janitor) wrote : | #39 |
This bug was fixed in the package qtmir - 0.4.6+15.
---------------
qtmir (0.4.6+
[ CI Train Bot ]
* New rebuild forced.
[ Daniel d'Andrada ]
* MirSurfaceItem: texture must be manipulated only from the scene
graph thread (LP: #1499388, #1495871)
[ Gerry Boland ]
* Add "Closing" state to Application, use it to distinguish user-
induced close from app-induced close. Don't clear QML cache if user-
induced. (LP: #1500372)
-- Michał Sawicz <email address hidden> Wed, 30 Sep 2015 10:08:37 +0000
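The first changelog entry states the rule that the texture must be manipulated only from the scene graph thread. One common way to honor such a rule is to let any thread hand an object over for release, but perform the actual destruction only on the owning (render) thread. The following is a hypothetical C sketch of that pattern (`release_queue`, `release_later`, and `drain_releases` are illustrative names), not qtmir's fix, which is C++/Qt:

```c
#include <pthread.h>
#include <stdlib.h>

/* Objects that may become unused on any thread but must only be destroyed
 * on the render thread are queued here; the render thread drains the queue
 * at a safe point in its frame. */
typedef struct release_node {
    void *object;
    struct release_node *next;
} release_node;

typedef struct {
    pthread_mutex_t lock;
    release_node *head;
} release_queue;

#define RELEASE_QUEUE_INIT { PTHREAD_MUTEX_INITIALIZER, NULL }

/* Safe to call from any thread. Returns 0 on success, -1 if out of memory. */
int release_later(release_queue *q, void *object)
{
    release_node *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->object = object;
    pthread_mutex_lock(&q->lock);
    node->next = q->head;
    q->head = node;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Call only from the owning (render) thread: destroys every queued object
 * with `destroy` and returns how many were released. */
int drain_releases(release_queue *q, void (*destroy)(void *))
{
    pthread_mutex_lock(&q->lock);
    release_node *list = q->head;    /* detach the whole list under the lock */
    q->head = NULL;
    pthread_mutex_unlock(&q->lock);

    int n = 0;
    while (list != NULL) {
        release_node *next = list->next;
        destroy(list->object);       /* runs outside the lock, on this thread */
        free(list);
        list = next;
        n++;
    }
    return n;
}
```

Detaching the list under the lock and destroying outside it keeps the critical section short, so other threads are never blocked while GPU resources are being freed.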
| Changed in qtmir (Ubuntu): | |
| status: | Triaged → Fix Released |
| Changed in qtmir: | |
| status: | In Progress → Fix Released |
| Changed in canonical-devices-system-image: | |
| status: | In Progress → Fix Committed |
| Changed in canonical-devices-system-image: | |
| status: | Fix Committed → Fix Released |
| Changed in qtmir (Ubuntu): | |
| assignee: | nobody → Daniel d'Andrada (dandrader) |
| no longer affects: | qtmir |

For completeness, here are Jean-Baptiste's findings:
"I reproduce the issue by following the "fd leak" lead with the following test case:
1. Open 2 apps
2. Switch from the app scope to app1
3. Switch from app1 to the app scope
4. Switch from the app scope to app2
5. Lock the screen (optional, but it leaks even more fds)
At the same time monitor the number of file descriptors used by unity8:
$ watch -n1 "ls /proc/$(pidof unity8)/fdinfo/| wc -l "
After a while unity8 runs out of fds and the system hangs; a moment later unity8 restarts."