memleak in 2.38+ ?

Bug #1822738 reported by Michael Vogt on 2019-04-02
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
snapd
Undecided
Unassigned
snapd (Ubuntu)
Critical
Unassigned

Bug Description

We might have a memleak in snapd 2.38+ on install - this could be related to the move from go-1.6 to go-1.10. This was reported by a customer. The data points we have so far:

- happens on refresh for the customer
- on systems with low amounts of memory (256M) this can lead to install/revert failures because e.g. unsquashfs will fail to extract the snap.yaml
- the customer is using a plugin that talks to the socket regularly, maybe we leak there

I did some initial testing that refreshes/installs a bunch of times (20 in this test) to see how that affects the memory. Once for 2.37.4 once for 2.38-git. One thing to keep in mind is that the changes data will requires a small
amount of memory. So a small growth per install is normal (that the data that
snap changes will display). We prune those after pruneMaxChanges is hit or
after 24h.

The data:

$ snap version
snap 2.37.4+18.04.1
snapd 2.37.4+18.04.1
series 16
ubuntu 18.04
kernel 4.15.0-46-generic
$ sudo systemctl restart snapd ; sudo snap install --dangerous /tmp/hello_20.snap; ps -C snapd -ocmd,rsz,vsz; for i in $(seq 20); do sudo snap install --dangerous /tmp/hello_20.snap ; done ; ps -C snapd -ocmd,rsz,vsz
hello 2.10 installed
CMD RSZ VSZ
/usr/lib/snapd/snapd 31580 1221524
hello 2.10 installed
...
hello 2.10 installed
CMD RSZ VSZ
/usr/lib/snapd/snapd 33952 1297088

and then with:
$ snap version
snap 2.38+git1216.13ed1b8~ubuntu16.04.1
snapd 2.38+git1216.13ed1b8~ubuntu16.04.1
series 16
ubuntu 18.04
kernel 4.15.0-46-generic
$ sudo systemctl restart snapd ; sudo snap install --dangerous /tmp/hello_20.snap; ps -C snapd -ocmd,rsz,vsz; for i in $(seq 20); do sudo snap install --dangerous /tmp/hello_20.snap ; done ; ps -C snapd -ocmd,rsz,vsz
hello 2.10 installed
CMD RSZ VSZ
/usr/lib/snapd/snapd 35832 1226080
hello 2.10 installed
...
hello 2.10 installed
CMD RSZ VSZ
/usr/lib/snapd/snapd 36760 1300068

So this looks like the RSZ size of snapd itself is bigger but it seems like we indeed grow.

Samuele Pedroni (pedronis) wrote :

We need to factor that we keep changes around until pruning, not saying there's no leak but that we need to take that into account when searching for it.

Michael Vogt (mvo) on 2019-04-02
description: updated
Samuele Pedroni (pedronis) wrote :

This is a good idea from John:

http://paste.ubuntu.com/p/yZkPWJ3FmH/

might also be 0,0,1

Samuele Pedroni (pedronis) wrote :

Another idea if we suspect go1.10 changes, is to try repeated refreshes with forced prune in between

with snapd compiled with 1.6 , 1.10 and 1.12 perhapse ?

Samuele Pedroni (pedronis) wrote :

The other option is to try the same with a recompiled snapd from 2.37 (if the issue is in our own code)

The binaries themselves are quite large. Using current master (a057a16c6), and go 1.11.5, snapd is ~20MB, snap is ~15MB in size. I think it's fair to assume that whenever snapd or snap runs, most of it ends up in the memory. So calling 'snap refresh' would take up (worst case) 35MB to start with.

As I understand, the situation is blocking the refresh of some snaps. Wonder if running a simple curl like this would help:

 sudo curl -X POST --unix-socket /run/snapd.socket \
     http://localhost/v2/snaps/core \
     -d '{"action":"refresh"}'

Did some tinkering in my system with snapd and tweaking some Golang runtime options (GOGC specifically).

Right after starting:
$ ps -C snapd -ocmd,rsz,vsz,pid
CMD RSZ VSZ PID
/usr/lib/snapd/snapd 18968 3811024 6923

Running install hello && remove hello in a loop makes the RSS grow. At 23 iterations, it grew from 18100kB to 22352kB, ~180kB per iteration. The state file grew fro ~50kB, to ~140kB. Pruning the state, the state file goes back to ~50kB, RSS stays unchanged (which is expected, I wouldn't expect the allocator to give back the memory). Restarting the loop, it took 25 iterations to see RSS growing again.

This would suggest that what's taking the memory is related to the state.

The state file grows at ~4kB per install/remove loop iteration.

Repeated the same with 2.37:

$ ps -C snapd -ocmd,rsz,vsz,pid
CMD RSZ VSZ PID
/usr/lib/snapd/snapd 17940 2630004 31818

After 23 loops, RSS was 20532. After pruning, took 20 iterations to start growing again.

Feels like we should hook up proper tracing facilities.

Samuele Pedroni (pedronis) wrote :

you probably want 2.38 and not current master because master now puts things in timings as well

John Lenton (chipaca) wrote :

What architecture are they seeing the OOMs on?
Have they enabled swap?
What's the output of `free`?

taw

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in snapd (Ubuntu):
status: New → Confirmed

I've looked as some /proc/meminfo dumps I was provided with.

Dump were made at these time points:
1. after boot
2. successful core refresh from 2.38 to 2.37.4
3. successful core refresh from 2.37.4 to 2.38
4. failed refresh of other snap, failed when mounting the snap with: fork/exec /usr/bin/unsquashfs: cannot allocate memory

This is a list of values that varied between dumps (with the exception of LowTotal and CmaTotal):
                     (1) (2) (3) (4)

MemFree: 4536 kB 11068 kB 23088 kB 52928 kB
MemAvailable: 163016 kB 163572 kB 163900 kB 155532 kB
Buffers: 29036 kB 30404 kB 15008 kB 1992 kB
Cached: 131148 kB 123692 kB 128144 kB 104288 kB
Active: 110720 kB 109724 kB 102740 kB 129208 kB
Inactive: 80724 kB 75892 kB 71808 kB 16128 kB
Active(anon): 33848 kB 34136 kB 33984 kB 41660 kB
Inactive(anon): 3160 kB 3160 kB 3160 kB 3160 kB
Active(file): 76872 kB 75588 kB 68756 kB 87548 kB
Inactive(file): 77564 kB 72732 kB 68648 kB 12968 kB
LowTotal: 243168 kB 243168 kB 243168 kB 243168 kB
LowFree: 4536 kB 11068 kB 23088 kB 52928 kB
AnonPages: 31304 kB 31568 kB 31416 kB 39092 kB
Mapped: 54896 kB 55140 kB 52052 kB 18556 kB
Slab: 23968 kB 23340 kB 22424 kB 21780 kB
SReclaimable: 11112 kB 11252 kB 10476 kB 9156 kB
SUnreclaim: 12856 kB 12088 kB 11948 kB 12624 kB
KernelStack: 1032 kB 1000 kB 992 kB 1008 kB
PageTables: 1064 kB 1044 kB 1044 kB 1060 kB
Committed_AS: 489144 kB 339420 kB 487848 kB 512848 kB
CmaTotal: 65536 kB 65536 kB 65536 kB 65536 kB
CmaFree: 2128 kB 208 kB 15752 kB 33700 kB

Interesting observations:
- MemAvailable stays high, consistent with top output recordings I was provided with previously
- CmaFree drops, that's the memory available to the contiguous memory allocation, IIRC that's for DMA transfers and such
- Slab unreclaimable (SUnreclaim) rises

Even though MemAvailable is high, launching new processes fails. CMA free falling is intersting, I would expect weird things to happen when it drops too low.

I think it would be worth collecting the output of:
- sudo cat /proc/slabinfo
- sudo slabtop -o

In the kernel source code, in tools/vm, there is a tool `slabinfo` which can process the data from /proc/slabinfo in a useful manner. If possible, I think it would be useful to rebuild the tool for given architecture and collect the output of:
- slabinfo
- slabinfo -T

Another angle that may be interesting to explore, just for the sake of making sure we check for the obvious things, is to collect this:
- cat /proc/sys/kernel/pid_max
- ps -efT | wc -l
- dmesg

Michael Vogt (mvo) on 2019-04-09
Changed in snapd (Ubuntu):
status: Confirmed → In Progress
importance: Undecided → Critical

When investigating on the affected 32bit ARM system, we have observed that Go runtime would make a request to allocate system memory for it's arena allocator, using a size that was questionable. Even most trivial binary would request 150MB, using mmap(.., PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE..). At the same time, the system would report plenty of available memory, because although the request was satisfied, only few pages were eventually mapped, because the actual use of the mapped region (as reported by pmap -x) was no more than a few MB.

I was able to craft a program that would issue mmap() with large sizes, use only a piece of mapped area and eventually break other programs. This would suggest we would exhaust overcommit limits globally.

In the end we managed to reproduce the problem on the affected system using a trivial piece of Go code. The breaking change was enabling -buildmode=pie (like we do for snapd). The program would fail in runtime initialization with `fatal error: runtime: out of memory`. Relevant strace output:

brk(NULL) = 0x7f6b0000
mmap2(0x7f700000, 807411712, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f700000
mmap2(0x7f900000, 150536192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
fatal error: runtime: out of memory

When a PIE binary was loaded on the system, auxv showed relatively high entry address (0x7f.....). At the same time, brk() would also return high address values (seen in strace output above). Go runtime probes the start address of brk and calculates the size of the arenas. We suspect, that the sizes are miscalculated, resulting in a large mmap() request, thus contributing to the problem.

We did a rundown of a different Go versions. We've verified 1.7 and 1.10 to have the problem, 1.11 was seemingly fixed.

I've bisected Go versions between 1.10beta2 and 1.11beta1, found this commit https://github.com/golang/go/commit/c0392d2e7fbdcd38aafb959e94daf6bbafe2e4e9 to be the first one when the problem is 'fixed' (or not triggered on that particular kernel + userland combination).

The observed mmap() calls request a sane amount of memory, 262kB instead of 150MB like the broken versions did.

Proposed tenative fix in snapd is to disable PIE builds. Relevant PR: https://github.com/snapcore/snapd/pull/6700

On Tue, Apr 09, 2019 at 07:58:20AM -0000, Maciej Borzecki wrote:
> Proposed tenative fix in snapd is to disable PIE builds. Relevant PR:
> https://github.com/snapcore/snapd/pull/6700

I dislike this change.

While ASLR is not particularly strong on 32 bit platforms, it is
significantly more useful when used with 64 bit platforms. Go itself may
be relatively safe from the common mistakes that make ASLR really useful
in C, but loading C libraries does happen.

Furthermore I'm not confident it will actually help. The problem appears
to be that a specific size was requested at a specific location and
MAP_FIXED prevents the kernel from finding another region where the
requested allocation could fit:

mmap2(0x7f900000, 150536192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot
allocate memory)

Removing the MAP_FIXED would probably allow this allocation to continue,
albeit at a different location, and would also avoid the caveat given
in the mmap(2) manpage:

              Furthermore, this option is extremely hazardous (when
              used on its own), because it forcibly removes
              preexisting mappings, making it easy for a
              multithreaded process to corrupt its own address
              space.

Thanks

Zygmunt Krynicki (zyga) wrote :

The following pull request was merged into master in relation to this bug https://github.com/snapcore/snapd/pull/6706

I'm marking it as patch was released since.

Changed in snapd:
status: New → Fix Released
Changed in snapd (Ubuntu):
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers