Bug #1822738 “memleak in 2.38+ ?” : Bugs : snapd package : Ubuntu

Revision history for this message

Samuele Pedroni (pedronis) wrote on 2019-04-02:

#1

We need to factor that we keep changes around until pruning, not saying there's no leak but that we need to take that into account when searching for it.

Michael Vogt (mvo) on 2019-04-02

description:

updated

Revision history for this message

Samuele Pedroni (pedronis) wrote on 2019-04-02:

#2

This is a good idea from John:

http://paste.ubuntu.com/p/yZkPWJ3FmH/

might also be 0,0,1

Revision history for this message

Samuele Pedroni (pedronis) wrote on 2019-04-02:

#3

Another idea if we suspect go1.10 changes, is to try repeated refreshes with forced prune in between

with snapd compiled with 1.6 , 1.10 and 1.12 perhapse ?

Revision history for this message

Samuele Pedroni (pedronis) wrote on 2019-04-02:

#4

The other option is to try the same with a recompiled snapd from 2.37 (if the issue is in our own code)

Revision history for this message

Maciej Borzecki (maciek-borzecki) wrote on 2019-04-02:

#5

The binaries themselves are quite large. Using current master (a057a16c6), and go 1.11.5, snapd is ~20MB, snap is ~15MB in size. I think it's fair to assume that whenever snapd or snap runs, most of it ends up in the memory. So calling 'snap refresh' would take up (worst case) 35MB to start with.

As I understand, the situation is blocking the refresh of some snaps. Wonder if running a simple curl like this would help:

sudo curl -X POST --unix-socket /run/snapd.socket \
http://localhost/v2/snaps/core \
-d '{"action":"refresh"}'

Did some tinkering in my system with snapd and tweaking some Golang runtime options (GOGC specifically).

Right after starting:
$ ps -C snapd -ocmd,rsz,vsz,pid
CMD RSZ VSZ PID
/usr/lib/snapd/snapd 18968 3811024 6923

Running install hello && remove hello in a loop makes the RSS grow. At 23 iterations, it grew from 18100kB to 22352kB, ~180kB per iteration. The state file grew fro ~50kB, to ~140kB. Pruning the state, the state file goes back to ~50kB, RSS stays unchanged (which is expected, I wouldn't expect the allocator to give back the memory). Restarting the loop, it took 25 iterations to see RSS growing again.

This would suggest that what's taking the memory is related to the state.

The state file grows at ~4kB per install/remove loop iteration.

Revision history for this message

Maciej Borzecki (maciek-borzecki) wrote on 2019-04-02:

#6

Repeated the same with 2.37:

$ ps -C snapd -ocmd,rsz,vsz,pid
CMD RSZ VSZ PID
/usr/lib/snapd/snapd 17940 2630004 31818

After 23 loops, RSS was 20532. After pruning, took 20 iterations to start growing again.

Feels like we should hook up proper tracing facilities.

Revision history for this message

Samuele Pedroni (pedronis) wrote on 2019-04-02:

#7

you probably want 2.38 and not current master because master now puts things in timings as well

Revision history for this message

John Lenton (chipaca) wrote on 2019-04-02:

#8

What architecture are they seeing the OOMs on?
Have they enabled swap?
What's the output of `free`?

taw

Revision history for this message

Launchpad Janitor (janitor) wrote on 2019-04-03:

#9

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in snapd (Ubuntu):
status:	New → Confirmed

Revision history for this message

Maciej Borzecki (maciek-borzecki) wrote on 2019-04-05:

#10

I've looked as some /proc/meminfo dumps I was provided with.

Dump were made at these time points:
1. after boot
2. successful core refresh from 2.38 to 2.37.4
3. successful core refresh from 2.37.4 to 2.38
4. failed refresh of other snap, failed when mounting the snap with: fork/exec /usr/bin/unsquashfs: cannot allocate memory

This is a list of values that varied between dumps (with the exception of LowTotal and CmaTotal):
(1) (2) (3) (4)

MemFree: 4536 kB 11068 kB 23088 kB 52928 kB
MemAvailable: 163016 kB 163572 kB 163900 kB 155532 kB
Buffers: 29036 kB 30404 kB 15008 kB 1992 kB
Cached: 131148 kB 123692 kB 128144 kB 104288 kB
Active: 110720 kB 109724 kB 102740 kB 129208 kB
Inactive: 80724 kB 75892 kB 71808 kB 16128 kB
Active(anon): 33848 kB 34136 kB 33984 kB 41660 kB
Inactive(anon): 3160 kB 3160 kB 3160 kB 3160 kB
Active(file): 76872 kB 75588 kB 68756 kB 87548 kB
Inactive(file): 77564 kB 72732 kB 68648 kB 12968 kB
LowTotal: 243168 kB 243168 kB 243168 kB 243168 kB
LowFree: 4536 kB 11068 kB 23088 kB 52928 kB
AnonPages: 31304 kB 31568 kB 31416 kB 39092 kB
Mapped: 54896 kB 55140 kB 52052 kB 18556 kB
Slab: 23968 kB 23340 kB 22424 kB 21780 kB
SReclaimable: 11112 kB 11252 kB 10476 kB 9156 kB
SUnreclaim: 12856 kB 12088 kB 11948 kB 12624 kB
KernelStack: 1032 kB 1000 kB 992 kB 1008 kB
PageTables: 1064 kB 1044 kB 1044 kB 1060 kB
Committed_AS: 489144 kB 339420 kB 487848 kB 512848 kB
CmaTotal: 65536 kB 65536 kB 65536 kB 65536 kB
CmaFree: 2128 kB 208 kB 15752 kB 33700 kB

Interesting observations:
- MemAvailable stays high, consistent with top output recordings I was provided with previously
- CmaFree drops, that's the memory available to the contiguous memory allocation, IIRC that's for DMA transfers and such
- Slab unreclaimable (SUnreclaim) rises

Even though MemAvailable is high, launching new processes fails. CMA free falling is intersting, I would expect weird things to happen when it drops too low.

I think it would be worth collecting the output of:
- sudo cat /proc/slabinfo
- sudo slabtop -o

In the kernel source code, in tools/vm, there is a tool `slabinfo` which can process the data from /proc/slabinfo in a useful manner. If possible, I think it would be useful to rebuild the tool for given architecture and collect the output of:
- slabinfo
- slabinfo -T

Another angle that may be interesting to explore, just for the sake of making sure we check for the obvious things, is to collect this:
- cat /proc/sys/kernel/pid_max
- ps -efT | wc -l
- dmesg

I've looked as some /proc/meminfo dumps I was provided with.

Dump were made at these time points:
1. after boot
2. successful core refresh from 2.38 to 2.37.4
3. successful core refresh from 2.37.4 to 2.38
4. failed refresh of other snap, failed when mounting the snap with: fork/exec /usr/bin/unsquashfs: cannot allocate memory

This is a list of values that varied between dumps (with the exception of LowTotal and CmaTotal):
                     (1)          (2)          (3)           (4)

MemFree:            4536 kB     11068 kB     23088 kB      52928 kB
MemAvailable:     163016 kB    163572 kB    163900 kB     155532 kB
Buffers:           29036 kB     30404 kB     15008 kB       1992 kB
Cached:           131148 kB    123692 kB    128144 kB     104288 kB
Active:           110720 kB    109724 kB    102740 kB     129208 kB
Inactive:          80724 kB     75892 kB     71808 kB      16128 kB
Active(anon):      33848 kB     34136 kB     33984 kB      41660 kB
Inactive(anon):     3160 kB      3160 kB      3160 kB       3160 kB
Active(file):      76872 kB     75588 kB     68756 kB      87548 kB
Inactive(file):    77564 kB     72732 kB     68648 kB      12968 kB
LowTotal:         243168 kB    243168 kB    243168 kB     243168 kB
LowFree:            4536 kB     11068 kB     23088 kB      52928 kB
AnonPages:         31304 kB     31568 kB     31416 kB      39092 kB
Mapped:            54896 kB     55140 kB     52052 kB      18556 kB
Slab:              23968 kB     23340 kB     22424 kB      21780 kB
SReclaimable:      11112 kB     11252 kB     10476 kB       9156 kB
SUnreclaim:        12856 kB     12088 kB     11948 kB      12624 kB
KernelStack:        1032 kB      1000 kB       992 kB       1008 kB
PageTables:         1064 kB      1044 kB      1044 kB       1060 kB
Committed_AS:     489144 kB    339420 kB    487848 kB     512848 kB
CmaTotal:          65536 kB     65536 kB     65536 kB      65536 kB
CmaFree:            2128 kB       208 kB     15752 kB      33700 kB

Interesting observations:
- MemAvailable stays high, consistent with top output recordings I was provided with previously
- CmaFree drops, that's the memory available to the contiguous memory allocation, IIRC that's for DMA transfers and such
- Slab unreclaimable (SUnreclaim) rises

Even though MemAvailable is high, launching new processes fails. CMA free falling is intersting, I would expect weird things to happen when it drops too low.

I think it would be worth collecting the output of:
- sudo cat /proc/slabinfo
- sudo slabtop -o

In the kernel source code, in tools/vm, there is a tool `slabinfo` which can process the data from /proc/slabinfo in a useful manner. If possible, I think it would be useful to rebuild the tool for given architecture and collect the output of:
- slabinfo
- slabinfo -T

Another angle that may be interesting to explore, just for the sake of making sure we check for the obvious things, is to collect this:
- cat /proc/sys/kernel/pid_max
- ps -efT | wc -l
- dmesg

Michael Vogt (mvo) on 2019-04-09

Changed in snapd (Ubuntu):
status:	Confirmed → In Progress
importance:	Undecided → Critical

Revision history for this message

Maciej Borzecki (maciek-borzecki) wrote on 2019-04-09:

#11

When investigating on the affected 32bit ARM system, we have observed that Go runtime would make a request to allocate system memory for it's arena allocator, using a size that was questionable. Even most trivial binary would request 150MB, using mmap(.., PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE..). At the same time, the system would report plenty of available memory, because although the request was satisfied, only few pages were eventually mapped, because the actual use of the mapped region (as reported by pmap -x) was no more than a few MB.

I was able to craft a program that would issue mmap() with large sizes, use only a piece of mapped area and eventually break other programs. This would suggest we would exhaust overcommit limits globally.

In the end we managed to reproduce the problem on the affected system using a trivial piece of Go code. The breaking change was enabling -buildmode=pie (like we do for snapd). The program would fail in runtime initialization with `fatal error: runtime: out of memory`. Relevant strace output:

brk(NULL) = 0x7f6b0000
mmap2(0x7f700000, 807411712, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f700000
mmap2(0x7f900000, 150536192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
fatal error: runtime: out of memory

When a PIE binary was loaded on the system, auxv showed relatively high entry address (0x7f.....). At the same time, brk() would also return high address values (seen in strace output above). Go runtime probes the start address of brk and calculates the size of the arenas. We suspect, that the sizes are miscalculated, resulting in a large mmap() request, thus contributing to the problem.

We did a rundown of a different Go versions. We've verified 1.7 and 1.10 to have the problem, 1.11 was seemingly fixed.

I've bisected Go versions between 1.10beta2 and 1.11beta1, found this commit https://github.com/golang/go/commit/c0392d2e7fbdcd38aafb959e94daf6bbafe2e4e9 to be the first one when the problem is 'fixed' (or not triggered on that particular kernel + userland combination).

The observed mmap() calls request a sane amount of memory, 262kB instead of 150MB like the broken versions did.

When investigating on the affected 32bit ARM system, we have observed that Go runtime would make a request to allocate system memory for it's arena allocator, using a size that was questionable. Even most trivial binary would request 150MB, using mmap(.., PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE..). At the same time, the system would report plenty of available memory, because although the request was satisfied, only few pages were eventually mapped, because the actual use of the mapped region (as reported by pmap -x) was no more than a few MB.

I was able to craft a program that would issue mmap() with large sizes, use only a piece of mapped area and eventually break other programs. This would suggest we would exhaust overcommit limits globally.

In the end we managed to reproduce the problem on the affected system using a trivial piece of Go code. The breaking change was enabling -buildmode=pie (like we do for snapd). The program would fail in runtime initialization with `fatal error: runtime: out of memory`. Relevant strace output:

brk(NULL)                               = 0x7f6b0000
mmap2(0x7f700000, 807411712, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f700000
mmap2(0x7f900000, 150536192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
fatal error: runtime: out of memory

When a PIE binary was loaded on the system, auxv showed relatively high entry address (0x7f.....). At the same time, brk() would also return high address values (seen in strace output above). Go runtime probes the start address of brk and calculates the size of the arenas. We suspect, that the sizes are miscalculated, resulting in a large mmap() request, thus contributing to the problem.

We did a rundown of a different Go versions. We've verified 1.7 and 1.10 to have the problem, 1.11 was seemingly fixed.

I've bisected Go versions between 1.10beta2 and 1.11beta1, found this commit https://github.com/golang/go/commit/c0392d2e7fbdcd38aafb959e94daf6bbafe2e4e9 to be the first one when the problem is 'fixed' (or not triggered on that particular kernel + userland combination).

The observed mmap() calls request a sane amount of memory, 262kB instead of 150MB like the broken versions did.

Revision history for this message

Maciej Borzecki (maciek-borzecki) wrote on 2019-04-09:

#12

Proposed tenative fix in snapd is to disable PIE builds. Relevant PR: https://github.com/snapcore/snapd/pull/6700

Revision history for this message

Seth Arnold (seth-arnold) wrote on 2019-04-10: Re: [Bug 1822738] Re: memleak in 2.38+ ?

#13

On Tue, Apr 09, 2019 at 07:58:20AM -0000, Maciej Borzecki wrote:
> Proposed tenative fix in snapd is to disable PIE builds. Relevant PR:
> https://github.com/snapcore/snapd/pull/6700

I dislike this change.

While ASLR is not particularly strong on 32 bit platforms, it is
significantly more useful when used with 64 bit platforms. Go itself may
be relatively safe from the common mistakes that make ASLR really useful
in C, but loading C libraries does happen.

Furthermore I'm not confident it will actually help. The problem appears
to be that a specific size was requested at a specific location and
MAP_FIXED prevents the kernel from finding another region where the
requested allocation could fit:

mmap2(0x7f900000, 150536192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot
allocate memory)

Removing the MAP_FIXED would probably allow this allocation to continue,
albeit at a different location, and would also avoid the caveat given
in the mmap(2) manpage:

              Furthermore, this option is extremely hazardous (when
              used on its own), because it forcibly removes
              preexisting mappings, making it easy for a
              multithreaded process to corrupt its own address
              space.

Thanks

Revision history for this message

Zygmunt Krynicki (zyga) wrote on 2019-10-29:

#14

The following pull request was merged into master in relation to this bug https://github.com/snapcore/snapd/pull/6706

I'm marking it as patch was released since.

Changed in snapd:
status:	New → Fix Released
Changed in snapd (Ubuntu):
status:	In Progress → Fix Released

Ubuntu
snapd package

memleak in 2.38+ ?

Bug Description

Other bug subscribers

Remote bug watches

Affects		Status	Importance	Assigned to	Milestone
	snapd	Fix Released	Undecided	Unassigned
	snapd (Ubuntu)	Fix Released	Critical	Unassigned

Ubuntusnapd package

memleak in 2.38+ ?

Bug Description

Other bug subscribers

Remote bug watches

Ubuntu
snapd package