snapd memory size grows huge over time

Bug #1642068 reported by Colin Ian King
This bug affects 1 person
Affects: snapd (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

I observed that installing and removing snaps seemed to make snapd grow in size, so to test whether this was true I repeatedly installed and removed a small package over the course of a few hours and measured the memory footprint with smemstat.

Attached is the smemstat log, with samples taken every 60 seconds. As one can see, the USS, PSS and RSS sizes grow steadily over time to more than 90MB. This suggests that either data is being cached too aggressively, there is a memory leak somewhere, or garbage collection is not occurring as expected.

Although my test case is rather artificial, it does show that there is an issue with snapd's memory footprint on a long-running system where snaps are installed and removed. This is concerning for small embedded IoT devices.

Notes:

USS = Unique Set Size
PSS = Proportional Set Size
RSS = Resident Set Size
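
For reference, the same figures can be sampled without smemstat by summing per-mapping fields from /proc/<pid>/smaps. A minimal Go sketch (the 60-second interval matches the sampling above; reading another process's smaps needs root):

    // memsample.go - a rough stand-in for smemstat: compute USS/PSS/RSS
    // for a PID by summing per-mapping fields from /proc/<pid>/smaps
    // (values are reported by the kernel in kB).
    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strconv"
        "strings"
        "time"
    )

    func sample(pid string) (uss, pss, rss int64, err error) {
        f, err := os.Open("/proc/" + pid + "/smaps")
        if err != nil {
            return 0, 0, 0, err
        }
        defer f.Close()

        sc := bufio.NewScanner(f)
        for sc.Scan() {
            fields := strings.Fields(sc.Text())
            if len(fields) < 2 {
                continue
            }
            kb, _ := strconv.ParseInt(fields[1], 10, 64)
            switch fields[0] {
            case "Private_Clean:", "Private_Dirty:":
                uss += kb // USS: pages private to this process
            case "Pss:":
                pss += kb // PSS: private pages + share of shared pages
            case "Rss:":
                rss += kb // RSS: everything resident
            }
        }
        return uss, pss, rss, sc.Err()
    }

    func main() {
        if len(os.Args) != 2 {
            fmt.Fprintln(os.Stderr, "usage: memsample <pid>")
            os.Exit(1)
        }
        for {
            uss, pss, rss, err := sample(os.Args[1])
            if err != nil {
                fmt.Fprintln(os.Stderr, err)
                os.Exit(1)
            }
            fmt.Printf("USS=%dK PSS=%dK RSS=%dK\n", uss, pss, rss)
            time.Sleep(60 * time.Second)
        }
    }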

Attached: smemstat log, plus data in a LibreOffice spreadsheet with a graph showing the memory size growth.

Revision history for this message
Colin Ian King (colin-king) wrote :
Revision history for this message
Colin Ian King (colin-king) wrote :
Changed in snapd (Ubuntu):
importance: Undecided → High
Revision history for this message
Oliver Grawert (ogra) wrote :

Could this be related to bug 1636847?

Revision history for this message
Colin Ian King (colin-king) wrote :

I don't think so; this is just snapd itself growing as it eats up heap.

Revision history for this message
Colin Ian King (colin-king) wrote :

valgrind is finding ~288 bytes being leaked per thread creation:

==12683== Thread 1:
==12683== 288 bytes in 1 blocks are possibly lost in loss record 1 of 7
==12683== at 0x4C2EB45: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12683== by 0x40139AA: allocate_dtv (dl-tls.c:322)
==12683== by 0x40139AA: _dl_allocate_tls (dl-tls.c:539)
==12683== by 0x4E412C9: allocate_stack (allocatestack.c:586)
==12683== by 0x4E412C9: pthread_create@@GLIBC_2.2.5 (pthread_create.c:539)
==12683== by 0xA89731: _cgo_sys_thread_start (in /usr/lib/snapd/snapd)
==12683== by 0x5EB33C: ??? (in /usr/lib/snapd/snapd)
==12683== by 0xC820000A7F: ???
==12683== by 0xC84CBF: ??? (in /usr/lib/snapd/snapd)

...but I suspect there are other memory consumers somewhere else too, as this can't explain the megabytes of leakage I'm seeing.

Revision history for this message
Colin Ian King (colin-king) wrote :

gdb is showing that Go is expanding the arena via anonymous mmaps, e.g. 128MB, 64MB, etc.:

__mmap (addr=0x0, len=134217728, prot=0, flags=16418, fd=-1, offset=0)
__mmap (addr=0x7ff774000000, len=67108864, prot=0, flags=16418, fd=-1, offset=0)
etc..

so something is causing Go to chomp up a large amount of heap.
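
Worth noting: these big anonymous mmaps (prot=0) are the Go runtime reserving heap arenas up front, so mapped size alone doesn't prove live data is growing. A minimal sketch of instrumentation that would separate reserved from live heap, assuming one can add it to the daemon:

    // heapstats.go - log the Go runtime's view of its heap so that arena
    // reservation (HeapSys) can be told apart from live allocations
    // (HeapAlloc).
    package main

    import (
        "log"
        "runtime"
        "time"
    )

    func main() {
        var m runtime.MemStats
        for range time.Tick(60 * time.Second) {
            runtime.ReadMemStats(&m)
            log.Printf("HeapAlloc=%dK HeapSys=%dK HeapIdle=%dK HeapReleased=%dK NumGC=%d",
                m.HeapAlloc/1024, m.HeapSys/1024, m.HeapIdle/1024,
                m.HeapReleased/1024, m.NumGC)
        }
    }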

Revision history for this message
Colin Ian King (colin-king) wrote :

And running health-check on snapd shows the same kind of arena expansions with mmap:

Memory Change via mmap() and munmap():
  PID Process mmaps munmaps Change (K) Rate (K/Sec)
 21627 snapd 36 1 87124 1452.06 (growing fast)
 21637 snapd 35 1 86992 1449.86 (growing fast)
 21236 snapd 1 1 65536 1092.26 (growing fast)
 21860 snapd 41 1 38971 649.53 (growing fast)
 21665 snapd 41 1 38971 649.53 (growing fast)
 21598 snapd 41 1 38971 649.53 (growing fast)
 21664 snapd 23 2 19995 333.26 (growing fast)
 21707 snapd 12 1 12138 202.31 (growing moderately fast)
 20838 snapd 3 0 9248 154.13 (growing moderately fast)
 20843 snapd 1 0 8196 136.60 (growing moderately fast)

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

If you install/remove in a loop, this will create changes which are kept in memory until they are pruned. Pruning happens after three days for successful ones, and after a week for unsuccessful ones (which are aborted, and later pruned). Eventually we'll need to move to a model where this data is kept on disk with proper transactional control around it, but for now this is a cheap way for us to get going.
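
Not the actual snapd code, but a minimal sketch of the time-based pruning model just described (the Change type here is illustrative):

    // prune.go - illustrative time-based pruning: successful changes are
    // dropped after three days, unsuccessful ones after a week.
    package main

    import (
        "fmt"
        "time"
    )

    type Change struct {
        ID      string
        Ready   bool // finished, successfully or not
        Failed  bool
        ReadyAt time.Time
    }

    const (
        pruneWait = 3 * 24 * time.Hour // successful changes
        abortWait = 7 * 24 * time.Hour // unsuccessful changes
    )

    // prune returns the changes that survive the time-based policy.
    func prune(changes []*Change, now time.Time) []*Change {
        kept := changes[:0]
        for _, c := range changes {
            wait := pruneWait
            if c.Failed {
                wait = abortWait
            }
            if c.Ready && now.Sub(c.ReadyAt) > wait {
                continue // expired: drop from in-memory state
            }
            kept = append(kept, c)
        }
        return kept
    }

    func main() {
        now := time.Now()
        changes := []*Change{
            {ID: "ok-old", Ready: true, ReadyAt: now.Add(-4 * 24 * time.Hour)},
            {ID: "failed-old", Ready: true, Failed: true, ReadyAt: now.Add(-4 * 24 * time.Hour)},
            {ID: "running"},
        }
        for _, c := range prune(changes, now) {
            fmt.Println("kept:", c.ID) // prints failed-old and running
        }
    }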

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Btw, there's a "snap changes" command with which you can see which changes currently exist.

Revision history for this message
Colin Ian King (colin-king) wrote :

Pruning based on a time interval should perhaps be complemented with a memory threshold check too. On small devices we need to be sure we constrain the memory footprint, especially on devices without swap, otherwise processes will be OOM-killed.
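
To illustrate the idea, a sketch of such a complementary check; the 20MB budget and the pruneOldChanges hook are hypothetical, not snapd API:

    // memguard.go - sketch of a memory-threshold prune trigger to
    // complement time-based pruning on small devices. heapBudget and
    // pruneOldChanges() are hypothetical placeholders.
    package main

    import (
        "log"
        "runtime"
        "time"
    )

    const heapBudget = 20 << 20 // e.g. a 20MB budget on a small IoT device

    func pruneOldChanges() {
        // placeholder: drop the oldest completed changes here
    }

    func main() {
        var m runtime.MemStats
        for range time.Tick(time.Minute) {
            runtime.ReadMemStats(&m)
            if m.HeapAlloc > heapBudget {
                log.Printf("heap %dK over %dK budget, pruning early",
                    m.HeapAlloc/1024, heapBudget/1024)
                pruneOldChanges()
            }
        }
    }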

Revision history for this message
Colin Ian King (colin-king) wrote :

Well, until it's fixed, does that mean that unsuccessful transactions will cause snapd to eat memory? If so, that seems like a DoS vector to me.

Revision history for this message
John Lenton (chipaca) wrote :

Out of curiosity, how many changes per second were you doing?


Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

As mentioned above, unsuccessful changes are aborted after a few days, and pruned thereafter.

Revision history for this message
Colin Ian King (colin-king) wrote :

I still think a daemon that can gobble up a load of memory through misuse needs fixing. Maybe I'm old school in believing that a small-footprint daemon that functions correctly even when abused is the correct behaviour.

Revision history for this message
Colin Ian King (colin-king) wrote :

@John Lenton: about 3 per sec

Revision history for this message
John Lenton (chipaca) wrote :

Unless I'm misreading something, it's growing at about 2MB every 60 sec. 180 changes should use a *lot* less than 2MB. So there probably is something else going on...

Revision history for this message
Mark Shuttleworth (sabdfl) wrote :

Is pruning expensive? Could we set a bound on the number of unsuccessful
transactions in the system and prune when that is reached? So the number
is bounded in time (3 days) and in length?

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

@Colin, sure, it definitely needs to be improved. As I said above, this is just a cheap way to get us going, and the proper fix is transactional storage on disk.

@Mark, Yes, we can prune based on number of changes as well. It's not expensive. That's a good trick to get us going farther.

@John, 2MB / (10 tasks * 180 changes) is about 1.1kB per task, which is not unrealistic I think. Also worth noting it's not just plain data... there will be a small amount of additional garbage created per entry when processing it.

Revision history for this message
Colin Ian King (colin-king) wrote :

Sure. The other issue was the potential leak of ~288 bytes per thread creation at the C/Go interface layer, as mentioned in comment #5. I suspect that's an issue that needs fixing in the Go glue layer.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Did you dig into that problem? I don't see the potential leak looking at the function. All of the arguments other than *ThreadStart are allocated on the stack, and ts is freed in threadentry, which is the function provided as the start_routine parameter of pthread_create.

What is the actual issue reported?

Revision history for this message
Colin Ian King (colin-king) wrote :

@gustavo, I was just reporting some incidental data that I observed with valgrind, as I thought it might be pertinent. It was simply an observation that made me dig a bit deeper, since it didn't explain the larger heap allocations. I just wanted to be thorough and report my findings, as it may be useful if it's a leak issue that could affect other threaded Go applications.

Revision history for this message
John Lenton (chipaca) wrote :

I've put a branch up that drops a few things so that building snapd without cgo is possible. Could you take it from https://github.com/chipaca/snappy/tree/no-c, build it, and repeat the memory tests?

You can either build it entirely by hand with CGO_ENABLED=0 set, to force it to build a static binary, or build it normally (using dpkg-buildpackage) and then force the resolver to use the pure Go one with GODEBUG=netdns=go.

Thanks!

Revision history for this message
John Lenton (chipaca) wrote :

(if you'd rather I put the binary somewhere for you, do let me know)

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

@Colin, that's what I understood previously, but thanks for confirming it. The question was whether you had dug further to see whether that warning made any sense, or whether you had any further data confirming the leak to be real. I'll take your response as a no, which is fine.

Revision history for this message
Colin Ian King (colin-king) wrote :

@Gustavo, sorry for the confusion, my response is no.

@John, building this on my Zesty box:

$ dpkg-buildpackage
...
debian/rules:67: recipe for target 'override_dh_auto_build' failed
make[1]: *** [override_dh_auto_build] Error 1
make[1]: Leaving directory '/home/king/tmp/snappy'
debian/rules:50: recipe for target 'build' failed
make: *** [build] Error 2
dpkg-buildpackage: error: debian/rules build gave error exit status 2

perhaps you could supply a .deb for me

Revision history for this message
John Lenton (chipaca) wrote :
Revision history for this message
John Lenton (chipaca) wrote :

attached

Revision history for this message
John Lenton (chipaca) wrote :

(that one has a static binary; the only caveat is that, because it's master, it'll bump your state.json to the latest format, which might then not let you go back to whatever you were using before)

Revision history for this message
Colin Ian King (colin-king) wrote :

@John, I'm not seeing memory leaks on the thread create paths now. I guess that was expected.

Revision history for this message
John Lenton (chipaca) wrote :

Could you try again with this one? This one is dynamically linked but has an Environment line in /lib/systemd/system/snapd.service to force it to use the Go resolver.

And, lastly, if you can remove that line from the service file and try again, then it should use the cgo resolver.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

Once more, unless we have evidence showing otherwise, there were no leaks before either. Let's please stop pretending otherwise.

Revision history for this message
Colin Ian King (colin-king) wrote :

@Gustavo, OK, I'll defer to your wishes. I'm just not 100% sure how valgrind could spot a leak that isn't there.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

@Colin, Sorry if it sounds different, but this is not a wish of mine. It's just about respecting the data we have. Not even Valgrind itself is reporting this as a leak. It's saying "block possibly lost". Unless we have either very basic empirical evidence of a leak (e.g. offending code alone in a loop grows the process larger), or someone that disagrees with the assessment I've made above reading the code, we shouldn't regard this as a leak.

Revision history for this message
John Lenton (chipaca) wrote :

For example, doing `snap find` in a loop grows snapd. I struggle to not call that a leak.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

If it never stops growing, it's obviously a leak.. somewhere! If it stabilizes after a while, then it's a side effect of Go being a non-deterministic garbage collected language. Again, data.

If the goal is testing this particular code path, it'd be best to have a tiny test case exercising it, rather than doing something like "snap find".
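
Something along these lines, where findPath is a placeholder for the suspect code path rather than the real "snap find" implementation:

    // growth.go - skeleton for the kind of tiny test case suggested above:
    // exercise one suspect code path in a loop and compare live heap
    // before and after. findPath() is a placeholder.
    package main

    import (
        "fmt"
        "runtime"
    )

    func findPath() {
        _ = make([]byte, 512) // placeholder: call the suspect code path here
    }

    // liveHeap forces a GC so HeapAlloc approximates live data only.
    func liveHeap() uint64 {
        runtime.GC()
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        return m.HeapAlloc
    }

    func main() {
        before := liveHeap()
        for i := 0; i < 100000; i++ {
            findPath()
        }
        after := liveHeap()
        fmt.Printf("live heap before=%dK after=%dK delta=%+dK\n",
            before/1024, after/1024, (int64(after)-int64(before))/1024)
    }

If the delta keeps climbing as the iteration count goes up, something is holding references; if it flattens out, it was just GC slack.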

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

On 18 November 2016 at 02:26, Colin Ian King <email address hidden>
wrote:

> @gustavo, I was just reporting some incidental data that I observed with
> valgrind as I thought it may be pertinent.

FWIW, I wouldn't in general expect valgrind to be useful with a go program.
The Go runtime manages its heap in a totally different way to a C program.

Revision history for this message
babak (bk-ravandi) wrote :

I am having the same issue. The /usr/lib/snapd/snapd memory usage grows very large and is never released. Here is what my application does:

1) at runtime, attach volumes to the Ubuntu server using OpenStack cinder
2) run an FIO test on each volume every 30 seconds
3) randomly detach volumes

snapd grows so large that I get "[Errno 12] Cannot allocate memory" when trying to run any application (even pkill), so the only option is rebooting the system. Even when I stop my application, the memory isn't released unless I reboot.

I am using FIO version: fio-2.2.10

Below is my Ubuntu version (it's 32-bit):
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

ubuntu@vm-1:~$ uname -a
Linux vm-1 4.4.0-47-generic #68-Ubuntu SMP Wed Oct 26 19:39:59 UTC 2016 i686 i686 i686 GNU/Linux

Michael Vogt (mvo)
Changed in snapd (Ubuntu):
status: New → In Progress
status: In Progress → Triaged
Revision history for this message
Michael Vogt (mvo) wrote :

The latest snapd (2.20) no longer uses cgo, so the risk of memory leaks from that path is greatly reduced.

There is still the problem that changes accumulate over time and that can eat quite a bit of memory. The following branch addresses this: https://github.com/snapcore/snapd/pull/2545

With the above branch I get something like this: http://paste.ubuntu.com/23728089/ (while running "while true; do sudo snap remove hello; sudo snap install hello; done").

Revision history for this message
Michael Vogt (mvo) wrote :

When running snapd with the following branches:
- https://github.com/snapcore/snapd/pull/2546
- https://github.com/snapcore/snapd/pull/2545
under:
 `while true; do sudo smemstat -p snapd ; sleep 60; done`
and with constantly installing/removing a snap:
 `while true; do sudo snap remove hello; sudo snap install hello; done`
it is going into a steady state at some point:
  http://paste.ubuntu.com/23732666/
where on my system the total RSS is around ~50MB but no longer growing.

The number of changes goes up to ~350 and is then purged down to 100
again after the 10min purge interval.

So it looks like once the changes limiter lands this should be fixed.
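
For the record, the shape of such a changes limiter, with the trigger and floor taken from the numbers above (a sketch, not the code in the PRs):

    // limiter.go - sketch of a count-bounded changes limiter: once
    // completed changes exceed a high-water mark (~350 observed above),
    // purge the oldest back down to a floor of 100.
    package main

    import (
        "fmt"
        "sort"
        "time"
    )

    type Change struct {
        ID      int
        Ready   bool
        ReadyAt time.Time
    }

    const (
        maxChanges = 350 // purge trigger
        minChanges = 100 // floor after a purge
    )

    func limitChanges(changes []*Change) []*Change {
        if len(changes) <= maxChanges {
            return changes
        }
        // Sort oldest-finished first so those are dropped first.
        sort.Slice(changes, func(i, j int) bool {
            return changes[i].ReadyAt.Before(changes[j].ReadyAt)
        })
        drop := len(changes) - minChanges
        kept := changes[:0]
        for _, c := range changes {
            if drop > 0 && c.Ready {
                drop--
                continue // purge this completed change
            }
            kept = append(kept, c)
        }
        return kept
    }

    func main() {
        var cs []*Change
        for i := 0; i < 400; i++ {
            cs = append(cs, &Change{ID: i, Ready: true,
                ReadyAt: time.Now().Add(time.Duration(i) * time.Second)})
        }
        fmt.Println("kept:", len(limitChanges(cs))) // kept: 100
    }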

Revision history for this message
Michael Vogt (mvo) wrote :

The fix for this has landed: we now have an upper bound on the number of changes we keep.

Changed in snapd (Ubuntu):
status: Triaged → Fix Committed
Changed in snapd (Ubuntu):
status: Fix Committed → Fix Released