uevent spam causes libdbus client code in session upstart to consume massive amounts of memory on Ubuntu Touch
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Touch Landing Plan |
Undecided
|
Unassigned | |||
Unity |
Confirmed
|
Undecided
|
Unassigned | ||
upstart |
Undecided
|
James Hunt | |||
linux (Ubuntu) |
Critical
|
Unassigned | |||
Saucy |
Critical
|
Unassigned | |||
systemd (Ubuntu) |
Undecided
|
Dimitri John Ledkov | |||
Saucy |
Undecided
|
Dimitri John Ledkov | |||
unity (Ubuntu) |
Undecided
|
Unassigned | |||
Saucy |
Undecided
|
Unassigned | |||
upstart (Ubuntu) |
Critical
|
James Hunt | |||
Saucy |
Critical
|
James Hunt |
Bug Description
using ubuntu touch image 82 i see the session init consume about 10MB per minute as long as the screen is on with Mir.
running the same session with surfaceflinger only consumes 1MB per minute.
in both cases the system starts to swap heavily at some point, making the UI unresponsive.
http://
http://
apparently the leak only occurs while the screen is on, it seems to be permanently there but in the case of surfaceflinger it hits less hard.
Related branches
- James Hunt: Disapprove on 2013-10-08
-
Diff: 18 lines (+8/-0)1 file modifiedextra/upstart-udev-bridge.c (+8/-0)
- PS Jenkins bot (community): Needs Fixing (continuous-integration) on 2013-10-10
- Thomas Voß (community): Approve on 2013-10-10
-
Diff: 88 lines (+20/-25)2 files modifiedplugins/Unity/Indicators/indicatorsmanager.cpp (+20/-24)
plugins/Unity/Indicators/indicatorsmanager.h (+0/-1)
- Steve Langasek: Disapprove on 2013-10-23
-
Diff: 1044 lines (+730/-34)13 files modifiedChangeLog (+48/-1)
init/control.c (+45/-12)
init/control.h (+5/-1)
init/environ.c (+4/-0)
init/job_class.c (+56/-6)
init/job_process.c (+1/-1)
init/main.c (+12/-5)
init/tests/test_control.c (+2/-2)
init/tests/test_environ.c (+457/-0)
test/test_util_common.c (+11/-1)
util/initctl.c (+2/-3)
util/man/initctl.8 (+2/-2)
util/tests/test_initctl.c (+85/-0)
- Dimitri John Ledkov: Approve on 2013-11-04
-
Diff: 817 lines (+477/-46)9 files modifiedChangeLog (+39/-8)
dbus/com.ubuntu.Upstart.xml (+4/-0)
init/control.c (+111/-18)
init/control.h (+12/-1)
init/main.c (+22/-7)
init/tests/test_control.c (+2/-2)
util/initctl.c (+53/-4)
util/man/initctl.8 (+23/-6)
util/tests/test_initctl.c (+211/-0)
Changed in upstart (Ubuntu): | |
status: | New → Confirmed |
Oliver Grawert (ogra) wrote : | #2 |
after capturing the above i tried to upgrade to upstart 1.10-0ubuntu6 from the archive which exposes the same behavior.
Oliver Grawert (ogra) wrote : | #3 |
i tried downgrading upstart version by version until 1.9.1-0ubuntu5 (which was the first one with user sessions enabled) and it seems all versions expose this behavior.
Jean-Baptiste Lallement (jibel) wrote : | #4 |
I confirm this behaviour with build 82 on mako.
To illustrate it I made the following test:
Start and stop dialer-app every 6s (reasonable delay to give time to the application to start and stop)
The application is started with: start application APP_ID=dialer-app
and stop with a kill.
I captured RSS measured by ps after each stop and made 2 runs with Mir and 1 with SF. The result is attached.
I shows that RSS usage of init with Mir is much higher the SF and growth is exponential while it is linear with Mir.
As a side note, the number of samples with Mir is limited to less than 50 because the phone stops responding at this moment while it continues to run fine with SF. This is reproduceable.
tags: | added: qa-manual-testing qa-touch rls-s-incoming |
Jean-Baptiste Lallement (jibel) wrote : | #5 |
replace:
I shows that RSS usage of init with Mir is much higher the SF and growth is exponential while it is linear with Mir.
by
It shows that RSS usage of init with Mir is much higher than SF and growth is exponential while it is linear with SF.
Oliver Grawert (ogra) wrote : | #6 |
the growth in itself is an issue though, even SF will hit swap at some point, just later ...
James Hunt (jamesodhunt) wrote : | #7 |
I don't have a maguro device so debugging is going to have to be by proxy :)
A few questions:
1) Does this only occur when the system is read-only?
2) Are logs growning fast in ~/.cache/upstart/ ?
3) What is the screen doing when it is on? Is it enough to run "powerd-cli display on" to see the issue?
4) Is the issue seen on any other hardware?
5) How do you toggle between Mir and SF?
Please could you:
a) Run the Session Init in debug mode by adding '--debug' to 'init --user' in /usr/bin/
b) attach ~/.xsession-errors
c) Get an strace of the Session Init.
Sounds like this issue may be related to kernel bug 1234743.
James Hunt (jamesodhunt) wrote : | #9 |
Answering own question (5): https:/
Jean-Baptiste Lallement (jibel) wrote : Re: [Bug 1235649] Re: session upstart leaks massive amounts of memory on Ubuntu Touch | #10 |
Le 07/10/2013 11:12, James Hunt a écrit :
> I don't have a maguro device so debugging is going to have to be by
> proxy :)
>
> A few questions:
>
> 1) Does this only occur when the system is read-only?
No. init leaks the same amount of memory when the file system is RO or RW.
> 2) Are logs growning fast in ~/.cache/upstart/ ?
No. The biggest log file is application-
application I used for the test) and it is around 700kB. The size is
approximately the same with Mir or SF and in RO or RW mode. (ls -lSr
/home/phablet/
> 3) What is the screen doing when it is on? Is it enough to run "powerd-cli display on" to see the issue?
In my test, I was launching/stopping dialer app, so the screen was
switching between the dialer and the application scope.
Running powerd-cli or not doesn't make a difference (excepted that I
have to tap the screen regularly to keep the display alive without
powerd-cli :))
> 4) Is the issue seen on any other hardware?
Reproduced on maguro and mako
> Please could you:
>
> a) Run the Session Init in debug mode by adding '--debug' to 'init --user' in /usr/bin/
> b) attach ~/.xsession-errors
> c) Get an strace of the Session Init.
In progress
--
Jean-Baptiste
irc: jibel
Jean-Baptiste Lallement (jibel) wrote : Re: session upstart leaks massive amounts of memory on Ubuntu Touch | #11 |
/home/phablet/
Jean-Baptiste Lallement (jibel) wrote : | #12 |
strace of session init
Jean-Baptiste Lallement (jibel) wrote : | #13 |
btw, .xession-errors shows 5 run of dialer, 1 run of shorts and 1 run of calculator, and strace shows 1 run of dialer.
James Hunt (jamesodhunt) wrote : | #14 |
jibel - thanks. It does indeed look like bug 1234743 as the trace shows a lot of the following events:
:sys:cpu-
:sys:cpu-
:sys:power_
This would explain the Session consuming a lot of CPU since those events are generated by the kernel, picked up by the upstart-udev-bridge which injects them into the system init event space. These events are then noticed by the upstart-
Could you run that again as "strace -fFttv -s 1024 ..." so we can see some timestamps?
Also, please attach /var/log/udev and the output of 'ls -al /proc/$
Adding kernel since as this issue only affects particular hardward, it is highly unlikely to be an Upstart issue.
Changed in upstart (Ubuntu Saucy): | |
status: | Confirmed → Incomplete |
James Hunt (jamesodhunt) wrote : | #15 |
Also, please could you run http://
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1235649
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
Jean-Baptiste Lallement (jibel) wrote : Re: session upstart leaks massive amounts of memory on Ubuntu Touch | #17 |
more verbose strace during a start/stop of dialer-app
Jean-Baptiste Lallement (jibel) wrote : | #18 |
output of get_state.sh after dialer-app has been stopped.
Jean-Baptiste Lallement (jibel) wrote : | #19 |
/var/log/udev
Jean-Baptiste Lallement (jibel) wrote : | #20 |
output of ls -al /proc/1467/fd/
Changed in upstart (Ubuntu Saucy): | |
status: | Incomplete → Confirmed |
Changed in linux (Ubuntu Saucy): | |
status: | Incomplete → Confirmed |
tags: | added: kernel-bot-stop-nagging |
Changed in linux (Ubuntu Saucy): | |
importance: | Undecided → Critical |
Oliver Grawert (ogra) wrote : | #21 |
i can definitely make it stop consuming more ram by stopping the upstart-
this works on both devices.
the ram does not get freed but the consumption does not keep rising either anymore then.
James Hunt (jamesodhunt) wrote : | #22 |
Apparently, the high cpu load stop when the upstart-
However, what is not explained is why the Session Init retains the memory it consumes once the upstart-
Please would someone check to see if the issue is recreatable on grouper?
James Hunt (jamesodhunt) wrote : | #23 |
@jibel - that get_state.out appears to have either been run as root, or atleast outside the session init environment. Please could you re-run as user phablet?
Also, please could you attach /var/log/udev ?
Jean-Baptiste Lallement (jibel) wrote : | #24 |
output of get_state.sh as user phablet
James Hunt (jamesodhunt) wrote : | #25 |
Can someone try running "/sbin/initctl notify-
James Hunt (jamesodhunt) wrote : | #26 |
From info provided by jibel...
Forcing a log flush (#25) doesn't resolve the high memory usage (as we'd expect, but worth ruling out).
The state file in #24 looks sane to me; forcing a re-exec of the Session Init does cause the memory to fall to reasonable levels. This almost feels like a compiler bug where __attribute__ ((cleanup())) is failing for some reason.
James Hunt (jamesodhunt) wrote : | #27 |
To summarise: there are 2 problems here:
1) The kernel spamming userspace with udev messages on mako+maguro (grouper doesn't appear to be affected but not independently verified yet).
2) Upstart not releasing memory once the kernel has finished spamming userspace (artificially forced by running "stop upstart-
Jean-Baptiste Lallement (jibel) wrote : | #28 |
output of "initctl list" as user phablet.
James Hunt (jamesodhunt) wrote : | #29 |
Could someone run the "make check" tests for the libnih package on a mako/maguro device?
James Hunt (jamesodhunt) wrote : | #30 |
I can no recreate the issue with memory not being freed by running: http://
This is recreatable on i386 desktop images so although problem (1) in #27 is h/w specific, problem (2) is not.
James Hunt (jamesodhunt) wrote : | #31 |
Attempting to run the script in #30 against a session init as below *outside* of a the standard gui environment with no .conf files does not trigger the issue:
$ mkdir -p /tmp/conf /tmp/log
$ init --debug --no-startup-event --confdir /tmp/conf --logdir /tmp/log --user
Same outcome running like this:
$ init --debug --no-startup-event --user
So, Event objects themselves are not the issue. More likely a subtle ref-counting issue related to one of the other object types.
tags: | added: kernel-key |
tags: | removed: rls-s-incoming |
Changed in upstart (Ubuntu Saucy): | |
assignee: | nobody → James Hunt (jamesodhunt) |
James Hunt (jamesodhunt) wrote : | #32 |
Valgrinding the session init shows that most of the memory is correctly freed at quiesce time.
James Hunt (jamesodhunt) wrote : | #33 |
Valgrind run as:
valgrind -v -v -v -v --show-
summary: |
- session upstart leaks massive amounts of memory on Ubuntu Touch + uevent spam causes session upstart to consume massive amounts of memory + on Ubuntu Touch |
Changed in linux (Ubuntu Saucy): | |
status: | Confirmed → Incomplete |
tags: |
added: kernel-da-key removed: kernel-key |
Dimitri John Ledkov (xnox) wrote : Re: uevent spam causes session upstart to consume massive amounts of memory on Ubuntu Touch | #34 |
After discussion on irc between ogra, pitti and I:
workaround #1 - blacklist that uevent in upstart-
workaround #2 - blacklist that uevent in systemd-udev (via socket filter) if not needed by Mir.
The first one is proposed in a branch already, and that still leaves 5% cpu usage by systemd-udev constantly whilst the screen is on.
James Hunt (jamesodhunt) wrote : | #35 |
The problem is triggered by init/control.
James Hunt (jamesodhunt) wrote : | #36 |
The working theory at this stage is that init/control.
James Hunt (jamesodhunt) wrote : | #37 |
init/control.
James Hunt (jamesodhunt) wrote : | #38 |
To be clear, adding a call to dbus_connection
James Hunt (jamesodhunt) wrote : | #39 |
Persistent D-Bus connections are as expected (1 / bridge and 1 for unity-panel-
Ephemeral D-Bus connections are being handled correctly and removed from the control_conns list.
James Hunt (jamesodhunt) wrote : | #40 |
Still unclear why memory is being cached in Upstart address space, but I'm starting to think this might be a libdbus issue.
What is triggering the memory rise though is unity-panel-
Oliver Grawert (ogra) wrote : | #41 |
note that touch images do not use unity-panel-service anywhere
James Hunt (jamesodhunt) wrote : | #42 |
Right - recall that I am testing on i386 as I don't have mako/maguro :)
However, note that on Touch, the upstart code in unity8 (plugins/
Is it possible to somehow disable the loading of /usr/lib/
James Hunt (jamesodhunt) wrote : | #43 |
Re #41, interesting that unity-panel-service *is* installed on Touch image though :)
Michał Sawicz (saviq) wrote : | #44 |
Sure:
=== modified file 'Panel/Panel.qml'
--- Panel/Panel.qml 2013-09-24 09:00:55 +0000
+++ Panel/Panel.qml 2013-10-09 14:14:02 +0000
@@ -71,7 +71,7 @@
}
}
- Indicators {
+ Showable {
id: indicatorsMenu
@@ -79,10 +79,10 @@
y: panelBackground.y
width: root.indicators
shown: false
- hintValue: __panelMinusSep
- panelHeight: __panelMinusSep
- openedHeight: parent.height + (pinnedMode ? 0 : root.panelHeight)
- pinnedMode: !fullscreenMode
+// hintValue: __panelMinusSep
+// panelHeight: __panelMinusSep
+// openedHeight: parent.height + (pinnedMode ? 0 : root.panelHeight)
+// pinnedMode: !fullscreenMode
property real unitProgress: (height - panelHeight) / (openedHeight - panelHeight)
}
=== modified file 'Shell.qml'
--- Shell.qml 2013-10-08 19:11:11 +0000
+++ Shell.qml 2013-10-09 14:13:59 +0000
@@ -568,7 +568,7 @@ ...
Michał Sawicz (saviq) wrote : | #45 |
The above can be applied straight on device, in /usr/share/unity8/.
Steve Langasek (vorlon) wrote : | #46 |
Workaround is in place in upstart package to reduce the flood of events that are causing this rapid memory growth, and we believe a fix to the unity upstart client will fix the memory leak entirely; but upstart needs to be more robust against misbehaving clients. Opening a task against upstart upstream.
Launchpad Janitor (janitor) wrote : | #47 |
This bug was fixed in the package upstart - 1.10-0ubuntu7
---------------
upstart (1.10-0ubuntu7) saucy; urgency=low
* Ignore emitting specific udev event spam on maguro, as a workaround
for LP: #1235649.
-- Dmitrijs Ledkovs <email address hidden> Wed, 09 Oct 2013 13:52:10 +0100
Changed in upstart (Ubuntu Saucy): | |
status: | Confirmed → Fix Released |
Changed in linux (Ubuntu Saucy): | |
status: | Incomplete → Invalid |
James Hunt (jamesodhunt) wrote : | #48 |
This problem can now be demonstrated with a single connection to upstart that makes the following calls:
dbus_connection
nih_dbus_proxy_new ()
The problem is not seen if the app instead calls:
- nih_dbus_connect (upstart_session, disconnect_handler)
- nih_dbus_proxy_new ()
- nih_main_loop()
Or,
- nih_dbus_connect (upstart_session, NULL)
- nih_dbus_proxy_new ()
- nih_main_loop()
Specifically, the problem appears to be that when an app calls dbus_connection
James Hunt (jamesodhunt) wrote : | #49 |
The problem is not observed when the only connections to upstart are the standard upstart bridges (since they use nih_dbus_
For reference, here is nih_dbus_setup() which includes one or more calls that unity seemingly should be calling itself:
James Hunt (jamesodhunt) wrote : | #50 |
Note that disabling libIndicatorsQml.so with #44 did not resolve the problem. However, I am confident this is because there were other connections to the Session Init (probably resulting from libupstart-
This does need to be confirmed though.
James Hunt (jamesodhunt) wrote : | #51 |
C program to trigger this bug. Run it in an environment with UPSTART_SESSION set (such as a desktop/Touch environment), then run the about-to-be attached spam.sh script.
James Hunt (jamesodhunt) wrote : | #52 |
Run as:
$ spam.sh <session_init_pid>
James Hunt (jamesodhunt) wrote : | #53 |
Program attached in #51 demonstrates the incorrect calling behaviour. To demonstrate the correct behaviour change the #if0/else/endif such that rather than calling dbus_connection
nih_dbus_connect(): No memory growth.
dbus_connection
summary: |
- uevent spam causes session upstart to consume massive amounts of memory - on Ubuntu Touch + uevent spam causes libdbus client code in session upstart to consume + massive amounts of memory on Ubuntu Touch |
James Hunt (jamesodhunt) wrote : | #54 |
Note that the following call on the server side does not resolve the issue:
while (dbus_connectio
James Hunt (jamesodhunt) wrote : | #55 |
Neither does calling the following help:
dbus_
Steve Langasek (vorlon) wrote : | #56 |
I've tested on a mako device here, and the memory usage is not all due to a misbehaved client. I've tracked down all the clients of the session init; unity8 is the only long-lived client other than the bridges, and if I kill unity8 and let it respawn, upstart gains back some of the memory, but *not all*: if I feed events into the session init, bringing its memory usage up to 93.3MB, and then kill unity8 (and, in fact,if I go through one-by-one and kill *all* the processes related to the session), upstart frees *some* memory, but its total usage remains at 88MB - compared to 5MB when it starts out.
Another thing I've found out, though, is that if I force a re-exec of the session init with 'telinit u', the memory usage drops down... and never rises again, even if I spam it with more events. This suggests a possible workaround of just re-execing the session init immediately after startup. But that probably requires landing a fix for bug #1238078 first.
Steve Langasek (vorlon) wrote : | #57 |
I've also checked this memory usage with valgrind; valgrind doesn't think there's a leak, at the end of a valgrind session there are only 211,206 bytes "possibly lost" - nowhere near the many MB that we're seeing.
Steve Langasek (vorlon) wrote : | #58 |
So the reason the memory usage wasn't growing after re-exec was because unity8 didn't automatically reconnect when the dbus connection dropped. If I kill and restart unity8, the problem reappears.
So the memory usage is indeed associated with the dbus client, even though disconnecting the client doesn't free the memory, and valgrind doesn't consider the memory to have been lost.
James Hunt (jamesodhunt) wrote : | #59 |
Just got a much more interesting valgrind run. Juicy excerpt:
==9366== 147,560 bytes in 1,190 blocks are still reachable in loss record 187 of 188
==9366== at 0x402B965: calloc (in /usr/lib/
==9366== by 0x408E730: dbus_malloc0 (dbus-memory.c:572)
==9366== by 0x407983B: dbus_message_
==9366== by 0x4079D0F: dbus_message_
==9366== by 0x80835F5: control_
==9366== by 0x806BC6C: event_finished (event.c:501)
==9366== by 0x806B26F: event_poll (event.c:259)
==9366== by 0x4044B63: nih_main_loop (main.c:609)
==9366== by 0x804D507: main (main.c:688)
==9366==
==9366== 228,394,752 bytes in 2,386 blocks are still reachable in loss record 188 of 188
==9366== at 0x402BB88: realloc (in /usr/lib/
==9366== by 0x408E778: dbus_realloc (dbus-memory.c:678)
==9366== by 0x408F08C: reallocate_
==9366== by 0x408F12B: set_length (dbus-string.c:390)
==9366== by 0x408F1A9: open_gap (dbus-string.c:411)
==9366== by 0x408FA67: copy (dbus-string.
==9366== by 0x408FCCE: _dbus_string_
==9366== by 0x408DCF8: marshal_
==9366== by 0x408DD95: marshal_string (dbus-marshal-
==9366== by 0x408DF60: _dbus_marshal_
==9366== by 0x4075CD7: _dbus_type_
==9366== by 0x40768BB: _dbus_type_
==9366==
==9366== LEAK SUMMARY:
==9366== definitely lost: 28 bytes in 1 blocks
==9366== indirectly lost: 24 bytes in 1 blocks
==9366== possibly lost: 12,907 bytes in 290 blocks
==9366== still reachable: 228,607,834 bytes in 3,732 blocks
==9366== suppressed: 0 bytes in 0 blocks
Steve Langasek (vorlon) wrote : | #60 |
James, how was this valgrind trace produced? The stacktrace doesn't show anything above the low-level calls in dbus, making it difficult to pinpoint the actual source of the problem.
James Hunt (jamesodhunt) wrote : | #61 |
Hi Steve,
The trace in #59 was produced by:
- Running Upstart via valgrind which upstart linked against debug builds of nih and dbus.
- Running a single instance of the test program (#51).
- Running spam.sh (#52).
- Waiting until init was consuming ~200MB, then sending SIGTERM to the init/valgrind process.
James Hunt (jamesodhunt) wrote : | #62 |
Another valgrind log taken using the same precedure as above. However, this time I waited until the init process was consuming over 2GB of RAM. Top showed:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15329 james 20 0 2463m 2,0g 12m R 100,0 26,1 6:51.59 memcheck-x86-li
Once again, the biggest offender by a huge margin is dbus:
==15329== 1,153,679,208 bytes in 12,025 blocks are still reachable in loss record 208 of 208
==15329== at 0x402BB88: realloc (in /usr/lib/
==15329== by 0x408E778: dbus_realloc (dbus-memory.c:678)
==15329== by 0x408F08C: reallocate_
==15329== by 0x408F12B: set_length (dbus-string.c:390)
==15329== by 0x408F1A9: open_gap (dbus-string.c:411)
==15329== by 0x408FA67: copy (dbus-string.
==15329== by 0x408FCCE: _dbus_string_
==15329== by 0x408DCF8: marshal_
==15329== by 0x408DD95: marshal_string (dbus-marshal-
==15329== by 0x408DF60: _dbus_marshal_
==15329== by 0x4075CD7: _dbus_type_
==15329== by 0x40768BB: _dbus_type_
==15329==
==15329== LEAK SUMMARY:
==15329== definitely lost: 28 bytes in 1 blocks
==15329== indirectly lost: 24 bytes in 1 blocks
==15329== possibly lost: 133,265 bytes in 315 blocks
==15329== still reachable: 1,154,753,166 bytes in 18,194 blocks
==15329== suppressed: 0 bytes in 0 blocks
Dimitri John Ledkov (xnox) wrote : | #63 |
systemd task assigned to me:
- make it possible for systemd-udevd to ignore kernel events from devices tagged "ignore", using a kernel netlink filter to completely avoid wakeups of systemd-udevd.
Changed in systemd (Ubuntu Saucy): | |
status: | New → In Progress |
assignee: | nobody → Dmitrijs Ledkovs (xnox) |
milestone: | none → ubuntu-13.10 |
Changed in systemd (Ubuntu Saucy): | |
status: | In Progress → Invalid |
Steve Langasek (vorlon) wrote : Re: [Bug 1235649] Re: uevent spam causes libdbus client code in session upstart to consume massive amounts of memory on Ubuntu Touch | #65 |
On Tue, Oct 15, 2013 at 08:19:53AM -0000, James Hunt wrote:
> Once again, the biggest offender by a huge margin is dbus:
> ==15329== 1,153,679,208 bytes in 12,025 blocks are still reachable in loss record 208 of 208
> ==15329== at 0x402BB88: realloc (in /usr/lib/
> ==15329== by 0x408E778: dbus_realloc (dbus-memory.c:678)
> ==15329== by 0x408F08C: reallocate_
> ==15329== by 0x408F12B: set_length (dbus-string.c:390)
> ==15329== by 0x408F1A9: open_gap (dbus-string.c:411)
> ==15329== by 0x408FA67: copy (dbus-string.
> ==15329== by 0x408FCCE: _dbus_string_
> ==15329== by 0x408DCF8: marshal_
> ==15329== by 0x408DD95: marshal_string (dbus-marshal-
> ==15329== by 0x408DF60: _dbus_marshal_
> ==15329== by 0x4075CD7: _dbus_type_
> ==15329== by 0x40768BB: _dbus_type_
Yet this backtrace is still deep in the dbus internals and shows us nothing
about who the actual caller is that's causing the leak. Can you please run
this with --num-callers=20 so we can see why these strings are actually
being allocated?
James Hunt (jamesodhunt) wrote : | #64 |
I can now recreate the memory bloat using a basic NIH D-Bus server program and associated client program.
1) Run the server as:
$ ./test_
2) Run the client as:
$ ./test_
3) Run the updated spammer as:
$ ./spam_
4) Run top and watch the memory used by test_nih_
The problem seems to be caused by the call to notify_clients() - this sends a signal to every connected client every time the spam_dbus_server.sh sends its data to the server. Commenting out the call to notify_clients() stop the memory bloating issue.
The function echo_handled() that is called by notify_clients() is analogous to control_
Again, running valgrind on the server shows the same stack traces as above:
==30015== 91,750,175 bytes in 7,617 blocks are still reachable in loss record 164 of 164
==30015== at 0x402BB88: realloc (in /usr/lib/
==30015== by 0x408E778: dbus_realloc (dbus-memory.c:678)
==30015== by 0x408F08C: reallocate_
==30015== by 0x408F12B: set_length (dbus-string.c:390)
==30015== by 0x408F1A9: open_gap (dbus-string.c:411)
==30015== by 0x408FA67: copy (dbus-string.
==30015== by 0x408FCCE: _dbus_string_
==30015== by 0x408DCF8: marshal_
==30015== by 0x408DD95: marshal_string (dbus-marshal-
==30015== by 0x408DF60: _dbus_marshal_
==30015== by 0x4075CD7: _dbus_type_
==30015== by 0x40768BB: _dbus_type_
James Hunt (jamesodhunt) wrote : | #66 |
James Hunt (jamesodhunt) wrote : | #67 |
James Hunt (jamesodhunt) wrote : | #68 |
Valgrind log from test_nih_
James Hunt (jamesodhunt) wrote : | #69 |
FWICS D-Bus is behaving as designed: since the client has not called dbus_connection
The key part of the D-Bus code that explains this is dbus_connection
-------
/* Now we need to run an iteration to hopefully just write the messages
* out immediately, and otherwise get them queued up
*/
_dbus_
/* If stuff is still queued up, be sure we wake up the main loop */
if (connection-
_dbus_
-------
As shown, the messages (signals in the case of Upstart) are added to the client connection queue, but since the client called
dbus_connection
From the upstart perspective, a fix is to call dbus_connection
client still has unprocessed messages (patch attached).
From the D-Bus perspective, it would be useful to introduce a new public API:
dbus_bool_t
dbus_connection
This would allow Upstart and other D-Bus server applications to detect erroneous client connections.
James Hunt (jamesodhunt) wrote : | #70 |
TBC, using the patch on #69 results in no memory bloat.
tags: | added: patch |
Steve Langasek (vorlon) wrote : | #71 |
After discussing this with Ted, I think the right answer architecturally is to move these clients to talking to the session bus, where filtering of messages will be done for us automatically, and have the session init register on the session bus, rather than deploying an unreliable change like having upstart only keep one message at a time in the queue for a client. This will require a few changes in upstart, including working out the correct way to get the session init to register with the session bus after dbus has been spawned (presumably we want to reuse SIGUSR1 for this, and maybe get the session bus address out of the global environment?)
James Hunt (jamesodhunt) wrote : | #72 |
We already have bug 1203595 for having the Session Init connect to the D-Bus session bus.
Changed in upstart (Ubuntu): | |
status: | Fix Released → Incomplete |
status: | Incomplete → In Progress |
tags: | added: needs-ap-test |
Launchpad Janitor (janitor) wrote : | #73 |
Status changed to 'Confirmed' because the bug affects multiple users.
Changed in unity (Ubuntu Saucy): | |
status: | New → Confirmed |
Changed in unity (Ubuntu): | |
status: | New → Confirmed |
Changed in upstart: | |
assignee: | nobody → James Hunt (jamesodhunt) |
status: | New → Fix Committed |
Launchpad Janitor (janitor) wrote : | #75 |
This bug was fixed in the package upstart - 1.11-0ubuntu1
---------------
upstart (1.11-0ubuntu1) trusty; urgency=low
[ Dmitrijs Ledkovs ]
* Run as much test-suites as possible, before exiting with failure.
[ James Hunt ]
* New upstream release.
- the global environment is now serialized on re-exec. LP: #1238078.
- fixes support for upstart-file-bridge detecting directory creation.
LP: #1221466.
- Fixes upstart user session to not override the user's configured
umask. LP: #1240686.
- Makes the session init available on the session dbus.
LP: #1203595, #1235649.
[ Steve Langasek ]
* Document the authoritative Vcs branch in debian/control.
* Update Standards-Version to 3.9.5, no changes required.
-- James Hunt <email address hidden> Fri, 15 Nov 2013 13:16:56 +0000
Changed in upstart (Ubuntu): | |
status: | In Progress → Fix Released |
Changed in landing-plan: | |
status: | New → Fix Released |
Changed in unity: | |
status: | New → Confirmed |
Changed in upstart: | |
status: | Fix Committed → Fix Released |
status: | Fix Released → Confirmed |
Changed in upstart: | |
status: | Confirmed → Fix Released |
Changed in unity (Ubuntu Saucy): | |
status: | Confirmed → Invalid |
Ower (ower86-g) wrote : | #76 |
same problem.
Linux 3.14.0-generic
ubuntu 14.04
kdm
init --user use 1.8GB memory in 9 days
both of the above was observed on maguro btw.