onboard memory usage/cpu increase and lockup

Bug #1317436 reported by Zen25000
Affects                 Status    Importance  Assigned to
Onboard                 Invalid   Undecided   Unassigned
at-spi2-core (Ubuntu)   New       Undecided   Unassigned

Bug Description

I am running Onboard in Mageia 5 (Cauldron), and recently I have been seeing a total lock-up of Onboard after the machine has been running for over 24 hours.

The tray icon is there but does nothing. At this point Onboard is using 25% CPU and 880 MB of RAM.

I am attaching the memory usage analysis for onboard (after the lockup) in the hope that it will be of use.

In the meantime I will try to monitor memory usage over a day to see if this increase is gradual or happens suddenly.

Revision history for this message
Zen25000 (zen25000) wrote :
Revision history for this message
marmuta (marmuta) wrote :

Thanks for reporting this. I've been tracking down memory leaks not long ago and found two so far.

One that leaks for every update of the word suggestions is fixed in trunk:
https://bazaar.launchpad.net/~onboard/onboard/trunk/revision/1803

The other one appears to be caused by libatspi/libdbus. Heap usage still counts against Onboard, but I can reproduce the leak with a trivial C sample, meaning I can't fix it from Onboard. The effects of this leak can be quite massive under the right circumstances: simply using the Gtk3 version of the synaptic package manager quickly adds hundreds of MB, up to GBs, to Onboard, but that seemed to be the only such occasion so far. I haven't seen significant heap increases during regular usage.

If you monitor memory usage, please apply revision 1803 first. Then it would be interesting to see whether a specific application triggers additional leaks or whether there is a more or less consistent increase across all usage.

I'm not sure about the lockup. A crash at the right spot can incapacitate Onboard, but it would usually still show some useless reactions, not lock up completely. If you can, maybe keep Onboard running in a terminal and check whether there are Python tracebacks when it happens again.

Revision history for this message
Francesco Fumanti (frafu) wrote :

I just uploaded revision 1805 of trunk into our Snapshots PPA. It should be available as soon as launchpad has finished with building it.
https://launchpad.net/~onboard/+archive/snapshots

Revision history for this message
Zen25000 (zen25000) wrote :

I rebuilt Onboard with the patch applied and ran it while the machine was idle for about 16 hours (overnight until 4 pm the following day).
On returning to the machine, clicking the Onboard tray icon failed to open it, as before.

During the test there was no direct use of Onboard at all, and the machine was idle except for the regular system cron jobs plus two others: one running a speed test against a server hourly, the other checking memory for this test. It was also running hexchat, connected to freenode.

The memory usage increased steadily with time and appears to be still increasing after the lock-up.
Unfortunately I did not run it in a terminal so have no output for that occasion.

I then forcibly killed the process and restarted it (in a terminal) just before the 16:30 cron job, to get an idea of the memory used at start, and followed this with a full breakdown (seconds later).

See notes in attached test results.

The terminal output is interesting at the bottom.

I will re-run with the terminal output piped to a file until it locks again.

Revision history for this message
Zen25000 (zen25000) wrote :

Following on from the previous post - I let this run overnight again and this morning it had locked up.

The terminal output file was last updated at 02.46 which coincides with memory use of 278MB.

There was nothing but the same repeating message in the terminal output, and nothing was added on clicking the tray icon.

journalctl -f showed nothing either.

As can be seen from the attached, the memory use continues increasing after the lockup.

Maybe you could give me some ideas on how to test/debug further?

Revision history for this message
marmuta (marmuta) wrote :

I see, we'd need to figure out what makes the heap grow. I'd first try to narrow it down by turning off features in preferences. Start with turning off word-suggestions, auto-capitalization and auto-show, in that order, restart Onboard and watch memory usage. Not too long, just long enough so you can tell memory usage is rising consistently.

Valgrind would give more detailed insights, but it's not an easy beast to handle. I'll have more time to look into the leak tomorrow, but if you want to try it, here is something to get started.

1) Install debug symbols for Onboard's dependencies; I'd start with everything at-spi and libdbus.

2) Rebuild Onboard without optimization:
cd onboard
CFLAGS="-g -O0" ./setup.py build --force

3) Copy the attached suppression file somewhere; I'll assume it's in Onboard's project directory. It will cut down on the size of valgrind.log.

4) Run ./onboard in valgrind. Do this in an xterm to avoid a feedback loop where word suggestions are updated due to Onboard's own console output. Warning: Onboard will run at <1/10 of its original speed.
xterm
cd onboard
valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --suppressions=python3.supp --log-file=valgrind.log python3 -E -tt -u ./onboard

5) Wait... not necessarily overnight, but long enough for the leak to stand out due to its size. valgrind.log will report tons of perceived problems; only very few are interesting for us.

6) Check for prominent leaks in valgrind.log. That file will be too large to post. I'm no expert on valgrind usage, but what I do is go to the end and scroll backwards until I find an entry for a large amount of lost memory in a high number of blocks. If you know Onboard grew 10 MB since the start, that's about the size I would look for first.

7) If you find suspicious entries, post them here, including the call stack below them. Make sure the call stacks' topmost entries are not filled with question marks but show function and file names. If there are question marks, you'll have to install more debug symbols and restart from 4).
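That backwards search through valgrind.log can also be done mechanically. A shell sketch (assuming memcheck's usual summary-line format; note the byte counts contain thousands separators, which must be stripped before numeric sorting):

```shell
#!/bin/sh
# Sketch: list the ten largest "definitely lost" records in valgrind.log,
# with their line numbers, so you can jump straight to their call stacks.
# Assumes memcheck's usual summary line, e.g.:
#   ==1234== 1,890,176 bytes in 118,136 blocks are definitely lost in loss record 5 of 9
# grep -n prefixes each hit with "LINE:"; tr strips the thousands
# separators so that sort can order field 2 (the byte count) numerically.
grep -n "blocks are definitely lost" valgrind.log \
    | tr -d ',' \
    | sort -k2 -rn \
    | head -n 10
```

Each output line then starts with the line number in valgrind.log where that record (and its call stack) begins.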

The "AT-SPI: Error in GetItems" warning used to be benign. I believe it is caused by accessibles getting lost, which happens all the time when their widgets are destroyed. Having this message periodically every 2 seconds is weird, though. It might be related to the way you track memory usage for this bug. Perhaps run all diagnostics and do all Onboard launches from xterms; xterm doesn't have accessibility support and is less prone to interfere with the testing.

Revision history for this message
marmuta (marmuta) wrote :

So, I spent the day on Mageia :). Results are inconclusive: I did see a tendency toward rising memory usage, but in some six hours RSS only crept up from ~40 MB to ~65 MB. A second, shorter run had it fall from ~55 to ~45 at some point too; that could be Python's garbage collection, but I'm not sure.

I did install hexchat and had it running, logged into some busy channels on freenode, but I doubt that had any effect on Onboard. There is no sign of any AT-SPI activity coming from it.

I could imagine konsole having an effect, in particular if you are running frequent periodic jobs in it. I don't remember seeing this before, but it now actually sends object:text-changed events for every text output. This is the message I found to leak memory in a simple C sample when working with synaptic. I haven't figured out yet where to file that bug report.

BTW., konsole seems closer to supporting auto-show and word-suggestions in Onboard now. With some tweaking I got suggestions on a new empty tab. The reported role is still invalid, though, should be ATSPI_ROLE_TERMINAL instead of some random number, and changed text positions seem off after the first command, so Onboard sees the wrong text.

I'll attach the script I hacked up to monitor memory usage, maybe that comes in handy.

Revision history for this message
Zen25000 (zen25000) wrote :

Hi,
I did some testing in a different Cauldron installation yesterday and today, after fully updating it beforehand.
In this system, over an hour or so, I was only seeing the rise to around 75 MB that you are seeing; however, I then ran Mozilla Thunderbird to check my mail and it immediately jumped up by 100 MB.

I then came back to my regular Cauldron, again updated it, and ran it for some time without running TB; it settled at around 75 MB while I was using hexchat, dolphin and konsole and doing a lot of package building. I then opened Thunderbird, and it shot up to over 650 MB as soon as mail downloads started.

I closed TB and it went up another 50 MB, then stayed at around 700 MB for a few hours until I ran TB again, when it jumped to 880 MB. Onboard is still working.

I think the crash may have been down to python3, as the python script I am using to check the memory was also misbehaving until I updated (there was a python3 update).

Regarding your memory script - I can't get it to run, as here "pidof onboard" always returns nothing when onboard is running. I am using mem.py, called in a bash script loop with a 5-second delay, which works well.
http://gnutoolbox.com/download/linux/scripts/mem.py
#!/bin/bash
# Usage: chkmem <process_name> <full_path_to_log_file>
[[ $# -lt 2 ]] && { echo " Usage # chkmem <process_name> <full_path_to_log_file>"; exit 1; }
while true; do
    mem_used=$(/mydata/bin/mem.py | grep "$1" | sed "s/$1//" | tr -s ' ')
    echo "$(date +%d/%m/%Y-%H:%M) $mem_used" >> "$2"
    sleep 5
done
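Incidentally, the pidof failure is expected: Onboard runs as a python3 interpreter process, so its process name is python3, not onboard. A minimal sketch of a monitor that works around this by matching the full command line instead (assumptions: Linux /proc; the first match wins; sample once per invocation and wrap it in a shell loop like the one above for periodic logging):

```python
#!/usr/bin/env python3
# Sketch: log memory of a process found by a command-line substring, since
# "pidof onboard" finds nothing (the process is the python3 interpreter).
# Assumptions: Linux /proc; first matching process wins.
import os
import sys
import time

def find_pid(pattern):
    """First PID (other than our own) whose /proc/<pid>/cmdline contains pattern."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == os.getpid():
            continue
        try:
            with open("/proc/%s/cmdline" % entry, "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode("utf-8", "replace")
        except OSError:
            continue  # process exited or is unreadable
        if pattern in cmdline:
            return int(entry)
    return None

def mem_kb(pid):
    """Return VmSize and VmRSS (in kB) from /proc/<pid>/status."""
    fields = {}
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                fields[key] = int(rest.split()[0])  # value is "<n> kB"
    return fields

if __name__ == "__main__":
    pattern = sys.argv[1] if len(sys.argv) > 1 else "onboard"
    stamp = time.strftime("%d/%m/%Y-%H:%M")
    pid = find_pid(pattern)
    if pid is None:
        print("%s process '%s' not found" % (stamp, pattern))
    else:
        m = mem_kb(pid)
        print("%s %s(%d) size=%d rss=%d"
              % (stamp, pattern, pid, m["VmSize"], m["VmRSS"]))
```

Run e.g. as "./memlog.py onboard >> onboard-mem.log" from the shell loop above instead of mem.py.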

I spent hours with valgrind, but could not find a way to check the memory used by onboard while it was running in valgrind. :\
However I will try again maybe tomorrow and will run Thunderbird during the test.

I will also repeat the overnight long test that previously caused the lock-up in this system.

Revision history for this message
marmuta (marmuta) wrote :

Yes, I can reproduce heap usage shooting up with Thunderbird, and it happens in the updated Mageia 5 as well as in Ubuntu 14.04.
2014-16-13 23:16:33 onboard(4353) size=721720 rss=53280 shared=17224 text=3408 data=329952 heap=21340
2014-16-13 23:16:53 onboard(4353) size=721720 rss=53280 shared=17224 text=3408 data=329952 heap=21340
2014-17-13 23:17:13 onboard(4353) size=721720 rss=53280 shared=17224 text=3408 data=329952 heap=21340
2014-17-13 23:17:33 onboard(4353) size=775072 rss=106184 shared=17224 text=3408 data=383304 heap=74688
2014-17-13 23:17:53 onboard(4353) size=776432 rss=107812 shared=17228 text=3408 data=384664 heap=76048
2014-18-13 23:18:13 onboard(4353) size=776432 rss=107848 shared=17224 text=3408 data=384664 heap=76048

I created a newsgroup account in Thunderbird and, with Onboard running, entered a search term in the list of available newsgroups. That was enough to triple the size of the heap, as above. It might be that filling list views with lots of rows is key for the memory leak to become noticeable. The same thing happens with synaptic.

I did manage to finish a valgrind run on Mageia. The instructions above mostly worked, but I changed python3.supp to the one provided by the valgrind package there. From the valgrind.log it appears that most of the large left-over allocations come from libatspi and libdbus, again similar to what I saw with synaptic.

There are still question marks in valgrind.log, I couldn't figure out where to get some of the debug symbols from. I have the lib64atspi-devel package installed and tried
debuginfo-install lib64atspi0
but got
No package named at-spi2-core-debuginfo

Revision history for this message
marmuta (marmuta) wrote :

Thanks for your script. Mine had to be run as 'logmem python3' before; I've fixed it to take the process name, so 'logmem onboard' works now. Yours is more accurate for overall size, but I thought I needed slightly more fine-grained info, heap size in particular.

Revision history for this message
marmuta (marmuta) wrote :

Orca is affected too, same procedure with Thunderbird filtering newsgroups as described in #9.

$ logmem orca
2014-05-14 00:52:08 orca(7036) size=530724 rss=36684 shared=12348 text=3408 data=318000 heap=11116
2014-05-14 00:52:28 orca(7036) size=531236 rss=36824 shared=12388 text=3408 data=318512 heap=11116
2014-05-14 00:52:48 orca(7036) size=531236 rss=36824 shared=12388 text=3408 data=318512 heap=11116
2014-05-14 00:53:08 orca(7036) size=531236 rss=36824 shared=12388 text=3408 data=318512 heap=11116
2014-05-14 00:53:28 orca(7036) size=540008 rss=45928 shared=12420 text=3408 data=327284 heap=18096
2014-05-14 00:53:48 orca(7036) size=631040 rss=136296 shared=12424 text=3408 data=418316 heap=106816
2014-05-14 00:54:09 orca(7036) size=631040 rss=136296 shared=12424 text=3408 data=418316 heap=106816
2014-05-14 00:54:29 orca(7036) size=631296 rss=136608 shared=12424 text=3408 data=418572 heap=106816

Onboard at the same time:

$ logmem onboard
2014-05-14 00:51:25 process 'onboard' not found
2014-05-14 00:51:45 onboard(6998) size=721792 rss=53184 shared=17196 text=3408 data=330024 heap=21416
2014-05-14 00:52:05 onboard(6998) size=721792 rss=53124 shared=17128 text=3408 data=330024 heap=21416
2014-05-14 00:52:25 onboard(6998) size=721792 rss=53176 shared=17148 text=3408 data=330024 heap=21416
2014-05-14 00:52:45 onboard(6998) size=721792 rss=53176 shared=17148 text=3408 data=330024 heap=21416
2014-05-14 00:53:05 onboard(6998) size=721792 rss=53176 shared=17148 text=3408 data=330024 heap=21416
2014-05-14 00:53:25 onboard(6998) size=721792 rss=53176 shared=17148 text=3408 data=330024 heap=21416
2014-05-14 00:53:46 onboard(6998) size=820936 rss=151388 shared=17172 text=3408 data=429168 heap=117480
2014-05-14 00:54:06 onboard(6998) size=820936 rss=151388 shared=17172 text=3408 data=429168 heap=117480
2014-05-14 00:54:26 onboard(6998) size=820936 rss=151440 shared=17224 text=3408 data=429168 heap=117480

The heap growth is almost the same:
106816-11116 = 95700
117480-21416 = 96064
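That comparison can be pulled out of the logs mechanically; a small sketch, assuming the heap=<kB> field format of the logmem lines above:

```python
# Sketch: heap growth between the first and last sample of a logmem-style
# log. Assumes each line carries a "heap=<kilobytes>" field as shown above.
import re

def heap_growth(lines):
    """Last heap= value minus first heap= value, in kB."""
    heaps = [int(m.group(1))
             for m in (re.search(r"heap=(\d+)", ln) for ln in lines)
             if m is not None]
    return heaps[-1] - heaps[0]

# First and last samples from the logs above:
orca = ["00:52:08 orca(7036) heap=11116",
        "00:54:29 orca(7036) heap=106816"]
onboard = ["00:51:45 onboard(6998) heap=21416",
           "00:54:26 onboard(6998) heap=117480"]
print(heap_growth(orca), heap_growth(onboard))  # 95700 96064
```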

Revision history for this message
Zen25000 (zen25000) wrote :

Re: at-spi2-core-debuginfo
You need to enable core-debug repo :)
Either use mcc GUI or this should do it:
# urpmi.addmedia "Core Release Debug" http://distrib-coffee.ipsl.jussieu.fr/pub/linux/Mageia/distrib/cauldron/x86_64/media/debug/core/release
Then:
# urpmi at-spi2-core-debuginfo
or what you tried before should also work, but I never used it.

marmuta (marmuta)
Changed in onboard:
status: New → Invalid
Revision history for this message
Zen25000 (zen25000) wrote :

Not sure if this is useful - a short run: I opened and closed TB, which downloaded 25 emails, then stopped Onboard:

==11274== 1,890,176 bytes in 118,136 blocks are definitely lost in loss record 17,326 of 17,338
==11274== at 0x4C2863A: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==11274== by 0x1070577E: set_length (dbus-string.c:352)
==11274== by 0x10706054: _dbus_string_append_len (dbus-string.c:903)
==11274== by 0x106F6153: dbus_message_iter_get_signature (dbus-message.c:2157)
==11274== by 0x104C620C: _atspi_dbus_set_interfaces (atspi-misc.c:1364)
==11274== by 0x104C6FC3: add_accessible_from_iter (atspi-misc.c:460)
==11274== by 0x104C7622: process_deferred_messages.part.14 (atspi-misc.c:681)
==11274== by 0x104C7798: process_deferred_messages_callback (atspi-misc.c:748)
==11274== by 0x953CABC: g_main_context_dispatch (gmain.c:3064)
==11274== by 0x953CE27: g_main_context_iterate.isra.24 (gmain.c:3734)
==11274== by 0x953D0E9: g_main_loop_run (gmain.c:3928)
==11274== by 0xB7D77F4: gtk_main (in /usr/lib64/libgtk-3.so.0.1200.1)

Revision history for this message
marmuta (marmuta) wrote :

> Re: at-spi2-core-debuginfo
> You need to enable core-debug repo :)
Thank you :) added to my notes

I'm not sure about #13 either. Most of these allocations seem to happen in libdbus; this one might be part of something larger that got lost. I have a valgrind.log small enough to attach as a whole, from a new C sample that reproduces the leak. Maybe that helps, we'll see. I've opened an upstream bug report and attached everything there.
https://bugzilla.gnome.org/show_bug.cgi?id=730152

Revision history for this message
Zen25000 (zen25000) wrote :

So, it seems to me that there are two separate issues here.

1. Related to Thunderbird that you have replicated and reported.

2. The steady leak over time that finally locks up Onboard as per original report.

Regarding 2 - I left onboard running last night after stopping all of the following:
httpd
crond
firefox
thunderbird
gnote
hexchat

gwenview was open.

This morning Onboard was again unresponsive and eventually had to be killed. Attached is a graph created by a modified version of your logmem script (using gnuplot).

Revision history for this message
marmuta (marmuta) wrote :

OK, I thought from comment #8 that the lockup issue was gone. Have you been able to run Onboard in a terminal (xterm preferred) during the latest lockup?

That graph is certainly interesting; the rise is >30 MB/h, at least ten times as much as I see here on either Mageia or Ubuntu. My working theory is still that 1) and 2) have the same cause, i.e. the leak I reported is happening all the time, not just in Thunderbird and synaptic. It's just more visible there due to those applications' tendency toward high-frequency bursts of at-spi events. See
https://bugs.launchpad.net/ubuntu/+source/synaptic/+bug/1244474

If that is the case, then something must be continuously emitting ten times as many events on your system as on your test install or on either of my systems. Periodic terminal output in konsole or gnome-terminal could do that. Are there any terminal jobs left that are running all the time?

I don't want to rule out Onboard itself leaking; it wouldn't be the first time. I'd even prefer that, as it would be far easier for me to fix, but the evidence doesn't seem to indicate this is happening. Also, there are many places that could leak, but a lot fewer without interaction. Off the top of my head, the only long-term idle activities are handling incoming at-spi events and attempts to auto-save learned word suggestions (which should do nothing when idle).

Things you can try:
- Run the leaktest1 sample from the gnome-bug report and monitor its memory usage. Does it show a linear increase too?

- Stop Onboard from listening to any at-spi events and monitor memory usage. Does it still rise?
gsettings set org.onboard.typing-assistance.word-suggestions enabled false
gsettings set org.onboard.typing-assistance auto-capitalization false
gsettings set org.onboard.typing-assistance auto-correction false
gsettings set org.onboard.auto-show enabled false

- Turn off anything on your system that periodically generates text in GUI applications, in particular in konsole and gnome-terminal (xterm is fine), and monitor memory usage.
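To make "does it still rise?" less subjective, here is a small sketch that fits a slope to the samples (assuming logmem-style rss=<kB> fields; the slope is per sample, so multiply by your sampling rate to get kB/h):

```python
# Sketch: estimate memory growth per sample from logmem-style lines via a
# least-squares slope; a clearly positive slope means "still rising".
import re

def rss_slope(lines):
    """Least-squares slope of rss= values (kB) over sample index."""
    ys = [int(m.group(1))
          for m in (re.search(r"rss=(\d+)", ln) for ln in lines)
          if m is not None]
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

flat = ["rss=100", "rss=100", "rss=100"]
rising = ["rss=100", "rss=200", "rss=300"]
print(rss_slope(flat), rss_slope(rising))  # 0.0 100.0
```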

Revision history for this message
Zen25000 (zen25000) wrote :

> - Run the leaktest1 sample from the gnome-bug report and monitor its memory usage. Does it show a linear increase too?

I've been short on time, but I managed to run leaktest1 on my main system. The leakage seems to be related to mouse/keyboard use to some degree. Attached is an annotated graph from my regular user account.
Under a new user account, the leakage is minimal. Overnight graph attached below.

Revision history for this message
Zen25000 (zen25000) wrote :