High resource usage and possible memory leak 1.24.5

Bug #1519473 reported by Jorge Niedbalski
This bug report is a duplicate of: Bug #1587644: jujud and mongo cpu/ram usage spike.
This bug affects 6 people
Affects: juju-core
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

[Environment]

Description: Ubuntu 14.04.3 LTS
Juju-core: 1.24.5

[Description]

This is an OpenStack installation with 34 machines, all of them running agent-version 1.24.5.1. After
the service is restarted, virtual memory usage is around 1G, and it then increases over time, which is clearly
a memory leak somewhere.

$ grep jujud ps
root 14529 71.0 82.3 4853644 3252872 ? Ssl 04:36 171:21 /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug
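
The growth over time described above can be recorded rather than read off single snapshots. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/status is readable; the function names, sample count, and interval are illustrative and not part of juju.

```python
import re
import time


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def sample_rss(pid, samples=5, interval_s=60.0):
    """Yield (elapsed seconds, resident set size in kB) for a running process."""
    start = time.monotonic()
    for i in range(samples):
        # The kernel exposes per-process memory counters in this file.
        with open(f"/proc/{pid}/status") as f:
            rss = parse_vm_rss_kb(f.read())
        yield time.monotonic() - start, rss
        if i < samples - 1:
            time.sleep(interval_s)
```

Iterating over `sample_rss(14529)` (the jujud PID from the ps line above) and logging each pair would give a series whose steady rise after a restart supports the leak hypothesis.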

-- top --
top - 13:07:37 up 3:01, 1 user, load average: 36.30, 33.79, 32.95
Tasks: 112 total, 3 running, 109 sleeping, 0 stopped, 0 zombie
%Cpu(s): 69.1 us, 17.7 sy, 0.0 ni, 5.7 id, 2.4 wa, 0.0 hi, 5.1 si, 0.0 st
KiB Mem: 7982264 total, 7860304 used, 121960 free, 4404 buffers
KiB Swap: 1985532 total, 151688 used, 1833844 free. 299900 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 1110 root 20 0 8191312 6.810g 8108 R 188.0 89.5 380:04.17 jujud
 1107 root 20 0 5427900 193928 12548 S 150.4 2.4 197:31.79 mongod
 1577 syslog 20 0 794796 13656 1264 S 37.6 0.2 61:42.08 rsyslogd

-- process map ---

00400000-010c8000 r-xp 00000000 fd:01 525132 /var/lib/juju/tools/1.24.5.1-trusty-amd64/jujud
010c8000-03fa1000 r--p 00cc8000 fd:01 525132 /var/lib/juju/tools/1.24.5.1-trusty-amd64/jujud
03fa1000-03fd2000 rw-p 03ba1000 fd:01 525132 /var/lib/juju/tools/1.24.5.1-trusty-amd64/jujud
03fd2000-03ff8000 rw-p 00000000 00:00 0
058b5000-058d6000 rw-p 00000000 00:00 0 [heap]
c000000000-c000d68000 rw-p 00000000 00:00 0
c1f530c000-c3bcf30000 rw-p 00000000 00:00 0
7fdbacff0000-7fdbb24ef000 rw-p 00000000 00:00 0
7fdbb24ef000-7fdbb24f0000 ---p 00000000 00:00 0
7fdbb24f0000-7fdbb2cf0000 rw-p 00000000 00:00 0 [stack:8484]
7fdbb385a000-7fdbb431a000 rw-p 00000000 00:00 0
7fdbb431a000-7fdbb431b000 ---p 00000000 00:00 0
7fdbb431b000-7fdbb705b000 rw-p 00000000 00:00 0 [stack:7440]
7fdbb7077000-7fdbb91b7000 rw-p 00000000 00:00 0
7fdbb91df000-7fdbbacdf000 rw-p 00000000 00:00 0
7fdbbacfa000-7fdbbbf1a000 rw-p 00000000 00:00 0
7fdbbbf22000-7fdbbc9e2000 rw-p 00000000 00:00 0
7fdbbc9fd000-7fdbbd17d000 rw-p 00000000 00:00 0
7fdbbd18f000-7fdbbd74f000 rw-p 00000000 00:00 0
7fdbbd766000-7fdbbec06000 rw-p 00000000 00:00 0
7fdbbec20000-7fdbc0000000 rw-p 00000000 00:00 0
7fdbc0000000-7fdbc0021000 rw-p 00000000 00:00 0
7fdbc0021000-7fdbc4000000 ---p 00000000 00:00 0
7fdbc400d000-7fdbc654d000 rw-p 00000000 00:00 0
7fdbc655e000-7fdbc693e000 rw-p 00000000 00:00 0
7fdbc693e000-7fdbc693f000 ---p 00000000 00:00 0
7fdbc693f000-7fdbc77ff000 rw-p 00000000 00:00 0 [stack:1988]
7fdbc77ff000-7fdbc7800000 ---p 00000000 00:00 0
7fdbc7800000-7fdbc8000000 rw-p 00000000 00:00 0 [stack:1956]
7fdbc8000000-7fdbc8021000 rw-p 00000000 00:00 0
7fdbc8021000-7fdbcc000000 ---p 00000000 00:00 0
7fdbcc018000-7fdbccff8000 rw-p 00000000 00:00 0
7fdbcd00e000-7fdbcebae000 rw-p 00000000 00:00 0
7fdbcebae000-7fdbcebc5000 r-xp 00000000 fd:01 526895 /lib/x86_64-linux-gnu/libresolv-2.19.so
7fdbcebc5000-7fdbcedc5000 ---p 00017000 fd:01 526895 /lib/x86_64-linux-gnu/libresolv-2.19.so
7fdbcedc5000-7fdbcedc6000 r--p 00017000 fd:01 526895 /lib/x86_64-linux-gnu/libresolv-2.19.so
7fdbcedc6000-7fdbcedc7000 rw-p 00018000 fd:01 526895 /lib/x86_64-linux-gnu/libresolv-2.19.so
7fdbcedc7000-7fdbcedc9000 rw-p 00000000 00:00 0
7fdbcedc9000-7fdbcedce000 r-xp 00000000 fd:01 533585 /lib/x86_64-linux-gnu/libnss_dns-2.19.so
7fdbcedce000-7fdbcefcd000 ---p 00005000 fd:01 533585 /lib/x86_64-linux-gnu/libnss_dns-2.19.so
7fdbcefcd000-7fdbcefce000 r--p 00004000 fd:01 533585 /lib/x86_64-linux-gnu/libnss_dns-2.19.so
7fdbcefce000-7fdbcefcf000 rw-p 00005000 fd:01 533585 /lib/x86_64-linux-gnu/libnss_dns-2.19.so
7fdbcefcf000-7fdbcf1cf000 rw-p 00000000 00:00 0
7fdbcf1cf000-7fdbcf1da000 r-xp 00000000 fd:01 533603 /lib/x86_64-linux-gnu/libnss_nis-2.19.so
7fdbcf1da000-7fdbcf3d9000 ---p 0000b000 fd:01 533603 /lib/x86_64-linux-gnu/libnss_nis-2.19.so
7fdbcf3d9000-7fdbcf3da000 r--p 0000a000 fd:01 533603 /lib/x86_64-linux-gnu/libnss_nis-2.19.so
7fdbcf3da000-7fdbcf3db000 rw-p 0000b000 fd:01 533603 /lib/x86_64-linux-gnu/libnss_nis-2.19.so
7fdbcf3db000-7fdbcf3f2000 r-xp 00000000 fd:01 533592 /lib/x86_64-linux-gnu/libnsl-2.19.so
7fdbcf3f2000-7fdbcf5f1000 ---p 00017000 fd:01 533592 /lib/x86_64-linux-gnu/libnsl-2.19.so
7fdbcf5f1000-7fdbcf5f2000 r--p 00016000 fd:01 533592 /lib/x86_64-linux-gnu/libnsl-2.19.so
7fdbcf5f2000-7fdbcf5f3000 rw-p 00017000 fd:01 533592 /lib/x86_64-linux-gnu/libnsl-2.19.so
7fdbcf5f3000-7fdbcf5f5000 rw-p 00000000 00:00 0
7fdbcf5f5000-7fdbcf5fe000 r-xp 00000000 fd:01 533591 /lib/x86_64-linux-gnu/libnss_compat-2.19.so
7fdbcf5fe000-7fdbcf7fd000 ---p 00009000 fd:01 533591 /lib/x86_64-linux-gnu/libnss_compat-2.19.so
7fdbcf7fd000-7fdbcf7fe000 r--p 00008000 fd:01 533591 /lib/x86_64-linux-gnu/libnss_compat-2.19.so
7fdbcf7fe000-7fdbcf7ff000 rw-p 00009000 fd:01 533591 /lib/x86_64-linux-gnu/libnss_compat-2.19.so
7fdbcf7ff000-7fdbcf800000 ---p 00000000 00:00 0
7fdbcf800000-7fdbd0000000 rw-p 00000000 00:00 0 [stack:1554]
7fdbd0000000-7fdbd0021000 rw-p 00000000 00:00 0
7fdbd0021000-7fdbd4000000 ---p 00000000 00:00 0
7fdbd4000000-7fdbd4021000 rw-p 00000000 00:00 0
7fdbd4021000-7fdbd8000000 ---p 00000000 00:00 0
7fdbd8000000-7fdbd8021000 rw-p 00000000 00:00 0
7fdbd8021000-7fdbdc000000 ---p 00000000 00:00 0
7fdbdc000000-7fdbdc021000 rw-p 00000000 00:00 0
7fdbdc021000-7fdbe0000000 ---p 00000000 00:00 0
7fdbe0000000-7fdbe0021000 rw-p 00000000 00:00 0
7fdbe0021000-7fdbe4000000 ---p 00000000 00:00 0
7fdbe4000000-7fdbe4021000 rw-p 00000000 00:00 0
7fdbe4021000-7fdbe8000000 ---p 00000000 00:00 0
7fdbe8009000-7fdbe8269000 rw-p 00000000 00:00 0
7fdbe826d000-7fdbe832d000 rw-p 00000000 00:00 0
7fdbe832d000-7fdbe832e000 ---p 00000000 00:00 0
7fdbe832e000-7fdbe8b2e000 rw-p 00000000 00:00 0 [stack:1547]
7fdbe8b2e000-7fdbe8b2f000 ---p 00000000 00:00 0
7fdbe8b2f000-7fdbe95cf000 rw-p 00000000 00:00 0 [stack:1545]
7fdbe95cf000-7fdbe95d0000 ---p 00000000 00:00 0
7fdbe95d0000-7fdbe9dd0000 rw-p 00000000 00:00 0 [stack:1520]
7fdbe9dd0000-7fdbe9dd1000 ---p 00000000 00:00 0
7fdbe9dd1000-7fdbea5f1000 rw-p 00000000 00:00 0 [stack:1519]
7fdbea5f1000-7fdbea5f2000 ---p 00000000 00:00 0
7fdbea5f2000-7fdbeadf2000 rw-p 00000000 00:00 0 [stack:1354]
7fdbeadf2000-7fdbeadfd000 r-xp 00000000 fd:01 533583 /lib/x86_64-linux-gnu/libnss_files-2.19.so
7fdbeadfd000-7fdbeaffc000 ---p 0000b000 fd:01 533583 /lib/x86_64-linux-gnu/libnss_files-2.19.so
7fdbeaffc000-7fdbeaffd000 r--p 0000a000 fd:01 533583 /lib/x86_64-linux-gnu/libnss_files-2.19.so
7fdbeaffd000-7fdbeaffe000 rw-p 0000b000 fd:01 533583 /lib/x86_64-linux-gnu/libnss_files-2.19.so
7fdbeaffe000-7fdbeafff000 ---p 00000000 00:00 0
7fdbeafff000-7fdbeb7ff000 rw-p 00000000 00:00 0 [stack:1353]
7fdbeb7ff000-7fdbeb800000 ---p 00000000 00:00 0
7fdbeb800000-7fdbec000000 rw-p 00000000 00:00 0 [stack:1320]
7fdbec000000-7fdbec021000 rw-p 00000000 00:00 0
7fdbec021000-7fdbf0000000 ---p 00000000 00:00 0
7fdbf0018000-7fdbf0058000 rw-p 00000000 00:00 0
7fdbf0058000-7fdbf0059000 ---p 00000000 00:00 0
7fdbf0059000-7fdbf0959000 rw-p 00000000 00:00 0 [stack:1319]
7fdbf0959000-7fdbf095a000 ---p 00000000 00:00 0
7fdbf095a000-7fdbf115a000 rw-p 00000000 00:00 0
7fdbf115a000-7fdbf115b000 ---p 00000000 00:00 0
7fdbf115b000-7fdbf195b000 rw-p 00000000 00:00 0 [stack:1241]
7fdbf195b000-7fdbf195c000 ---p 00000000 00:00 0
7fdbf195c000-7fdbf215c000 rw-p 00000000 00:00 0 [stack:1233]
7fdbf215c000-7fdbf2317000 r-xp 00000000 fd:01 533597 /lib/x86_64-linux-gnu/libc-2.19.so
7fdbf2317000-7fdbf2516000 ---p 001bb000 fd:01 533597 /lib/x86_64-linux-gnu/libc-2.19.so
7fdbf2516000-7fdbf251a000 r--p 001ba000 fd:01 533597 /lib/x86_64-linux-gnu/libc-2.19.so
7fdbf251a000-7fdbf251c000 rw-p 001be000 fd:01 533597 /lib/x86_64-linux-gnu/libc-2.19.so
7fdbf251c000-7fdbf2521000 rw-p 00000000 00:00 0
7fdbf2521000-7fdbf253a000 r-xp 00000000 fd:01 533598 /lib/x86_64-linux-gnu/libpthread-2.19.so
7fdbf253a000-7fdbf2739000 ---p 00019000 fd:01 533598 /lib/x86_64-linux-gnu/libpthread-2.19.so
7fdbf2739000-7fdbf273a000 r--p 00018000 fd:01 533598 /lib/x86_64-linux-gnu/libpthread-2.19.so
7fdbf273a000-7fdbf273b000 rw-p 00019000 fd:01 533598 /lib/x86_64-linux-gnu/libpthread-2.19.so
7fdbf273b000-7fdbf273f000 rw-p 00000000 00:00 0
7fdbf273f000-7fdbf2762000 r-xp 00000000 fd:01 533594 /lib/x86_64-linux-gnu/ld-2.19.so
7fdbf2765000-7fdbf2905000 rw-p 00000000 00:00 0 [stack:1242]
7fdbf2915000-7fdbf2958000 rw-p 00000000 00:00 0
7fdbf295f000-7fdbf2961000 rw-p 00000000 00:00 0
7fdbf2961000-7fdbf2962000 r--p 00022000 fd:01 533594 /lib/x86_64-linux-gnu/ld-2.19.so
7fdbf2962000-7fdbf2963000 rw-p 00023000 fd:01 533594 /lib/x86_64-linux-gnu/ld-2.19.so
7fdbf2963000-7fdbf2964000 rw-p 00000000 00:00 0
7fff4df0a000-7fff4df2b000 rw-p 00000000 00:00 0 [stack]
7fff4dffe000-7fff4e000000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
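
A process map like the one above is easier to read once the region sizes are totalled. The sketch below groups /proc/<pid>/maps-style lines into rough categories; the category names are my own, and the totals are mapped address space, not resident memory. In this dump, the single anonymous mapping c1f530c000-c3bcf30000 alone spans roughly 7.1 GiB.

```python
from collections import defaultdict


def summarize_maps(text):
    """Return total mapped bytes per category for /proc/<pid>/maps-style text."""
    totals = defaultdict(int)
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5 or "-" not in fields[0]:
            continue  # not a mapping line
        try:
            start, end = (int(x, 16) for x in fields[0].split("-"))
        except ValueError:
            continue  # first field is not a hex address range
        perms = fields[1]
        path = fields[5].strip() if len(fields) > 5 else ""
        if perms.startswith("---"):
            category = "guard/reserved"  # no access: guard pages, reserved arenas
        elif path.startswith("[stack"):
            category = "stack"
        elif path.startswith("["):
            category = path  # e.g. [heap], [vdso], [vsyscall]
        elif path:
            category = "file-backed"
        else:
            category = "anonymous"
        totals[category] += end - start
    return dict(totals)
```

Calling `summarize_maps(open("/proc/1110/maps").read())` on the affected host (PID taken from the top output) would show which category dominates; comparing two summaries taken some minutes apart shows which one is growing.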

--- Attachments

1) Strace output of the jujud process
2) Memory dump of the jujud process.

Tags: sts
Jorge Niedbalski (niedbalski) wrote :
tags: added: sts
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.25.3
Jorge Niedbalski (niedbalski) wrote :
Felipe Reyes (freyes) wrote :

I have an environment using juju 1.20.11 where we are seeing similar behaviour. This environment uses 3 state servers; details about the connections are at http://pastebin.ubuntu.com/13629263/

I'm also attaching machine-0.log

Changed in juju-core:
milestone: 1.25.3 → 1.25.4
Junien F (axino) wrote :

Hi,

I think I'm running into the same bug with juju version 1.24.0 on a GCE environment, probably as a fallout from a charm upgrade.

The machine 0 jujud, when started, quickly (ballpark figure: 15 min) uses all the memory of the server, and then does nothing. All client commands then start failing (e.g. "juju status").

machine-0.log is at https://pastebin.canonical.com/148046/ (cut from the start of the agent)
juju status (which works if you run it briefly after restarting machine-0 jujud) is at https://pastebin.canonical.com/148047/ - as you can see, each unit is in a "hook failed" status.

And finally a pmap of the machine-0 jujud process : https://pastebin.canonical.com/148033/

If you need more data, we should be able to keep the environment for a few days, but we'll need to redeploy after that.

Thanks !

Changed in juju-core:
milestone: 1.25.4 → 1.25.5
Changed in juju-core:
assignee: nobody → Dave Cheney (dave-cheney)
Haw Loeung (hloeung) wrote :

We might be seeing this in PS4.5 as well. There is not much in all-machines.log other than "leadership manager stopped" spam, as per this pastebin:

https://pastebin.canonical.com/148810/

(starts at line #2)

Then a combination of the leadership and tomb dying:

https://pastebin.canonical.com/148811/

It recovers after we bounce both jujud-machine-0 and juju-db.

Haw Loeung (hloeung) wrote :

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28238 root 20 0 2950520 1.380g 14708 R 262.2 8.8 56:46.95 jujud
 4757 root 20 0 14.358g 1.445g 1.208g S 259.9 9.2 100569:44 mongod

Output from 'top' when it happened. mongod had a virtual memory size of 14G.

Cheryl Jennings (cherylj) wrote :

@hloeung - The "leadership manager stopped" error is likely a dup of bug #1539656. It is triggered by losing connections to mongo.

Dave Cheney (dave-cheney) wrote :

https://github.com/juju/juju/pull/4259 adds a profiling facility to jujud, giving us a toehold to figure out what is going on here.

Changed in juju-core:
milestone: 1.25.5 → 1.25.6
Alexis Bruemmer (alexis-bruemmer) wrote :

The next step in diagnosing this issue requires running the pprof tool Dave has instrumented. Details on running the pprof tool with jujud can be found here:

https://github.com/juju/juju/wiki/pprof-facility

@dave-cheney can you please provide some more details on what information would be useful for diagnosis?

Dave Cheney (dave-cheney) wrote :

Please capture a heap profile[1] and a goroutine profile[2], as described in the wiki, from every process that is experiencing high resource usage.

1. https://github.com/juju/juju/wiki/pprof-facility#heap-profile
2. https://github.com/juju/juju/wiki/pprof-facility#goroutine-profile

Once we have that data I will analyse it.

JuanJo Ciarlante (jjo) wrote :

Hit this w/1.25.5 also (on one state server in an HA deploy),
FYI kill -HUP made it behave.

Mario Splivalo (mariosplivalo) wrote :

Not sure how related this is, but:

One of the customers had a very large juju database, with a disk footprint of around 15-17GB. The txns collection was using around 9GB. This was juju 1.20.

mongod on that deployment was using around 10GB of memory, and so was jujud. This behavior was observed on all state servers (they are running juju-ha).

Once we purged unreferenced transactions from the txns collection (and reduced the size of the txns-queue fields in various documents) and ran repairDatabase (to reduce the disk footprint; the database shrank to about 2.6GB), both jujud and mongod are using around 300-500 megabytes of RAM.

Mind you, this is on juju v1.20 - these bugs should not be present in 1.25, but it might be wise to check the size of the database, and particularly the txns collection.

One can use the 'unofficial' mgopurge tool to clean up the juju database: https://github.com/mjs/mgopurge

Of course, make sure you have a backup of the database prior to running the tool!

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.25.6 → 1.25.7
Changed in juju-core:
assignee: Dave Cheney (dave-cheney) → nobody