Juju 2.1.1: Mongo Out Of Memory on bootstrap machine 0

Bug #1655169 reported by Chad Smith
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Unassigned
Milestone: 2.1-beta5

Bug Description

Update:
   - Validated comparable OOMs (though at a lower frequency) on Juju 2.1beta4.
   - Logging verbosity is turned up on the Juju controller in all failure cases, so the logs collection grows much faster than it would with the default configuration.
      -- "logging-config: <root>=DEBUG;juju.apiserver=TRACE"

When using Juju 2.1beta3 to deploy a Newton OpenStack HA cloud, I've run into out-of-memory errors where the kernel kills off mongod every 8-15 minutes.

Mongo quickly climbs to > 4 GB in memory on my 16 GB bootstrap node. The node becomes completely unresponsive to ssh. I can see the mongo service timing out and causing errors with other juju units as update-status hooks are run.

This cloud deployment is a 6-node cluster; each node has 16 GB of memory. Each node has around 7 LXCs configured running various OpenStack applications, with lots of hooks firing for each highly-available application.

The bootstrap node also shares resources with 7 other LXCs running on that machine, each running various OpenStack services.

Because of the OOM issues, juju doesn't respond to status or ssh commands, many service endpoints time out, and service status updates are lost, causing update-status hooks to fail.

Note: Initially, the region deployment succeeded and I was able to bootstrap on the deployed cloud. I left the cloud untouched for 3 days and came back to see it completely unresponsive.

Revision history for this message
Chad Smith (chad.smith) wrote :

Juju status of the failed OpenStack HA deployment.

Changed in landscape:
milestone: none → 16.12
Chad Smith (chad.smith)
description: updated
Revision history for this message
Chad Smith (chad.smith) wrote : Re: Juju 2.1beta3: Mongo Out Of Memory on bootstrap machine 0

All juju logs from each unit in the deployment. base-machine/0 will have the out-of-memory errors related to the bootstrap node killing mongod.

summary: - Juju 2.1beta3: Mongo OOM on bootstrap machine 0
+ Juju 2.1beta3: Mongo Out Of Memory on bootstrap machine 0
Revision history for this message
Chad Smith (chad.smith) wrote :

We might want to watch: https://bugs.launchpad.net/juju/+bug/1653558 as well

Revision history for this message
Chad Smith (chad.smith) wrote :

All juju 2.1beta3 logs from OpenStack HA region deployment. base-machine/0 == bootstrap node
https://private-fileshare.canonical.com/~csmith/my.tgz

Revision history for this message
Adam Collard (adam-collard) wrote :

Chad, the Juju team think they fixed this in beta4; can you retry with that?

Changed in juju:
status: New → Incomplete
Revision history for this message
Chad Smith (chad.smith) wrote :

Additional data points:

 1. Tried an HA OpenStack 6-node deployment (Juju 2.1beta4) that used a new model on an external 2.1beta3 controller. On the bootstrap node of the 2.1beta3 controller, mongod climbed from 1.3 GB -> 1.7 GB over two hours of idle update-status hooks. This leads me to believe that mongod will continue to grow if left alone over a 24-hour period.

 2. Deployment #2: Allowed Landscape to bootstrap a 2.1beta4 controller and use the bootstrap node and the controller model for the 6-node HA deployment. At the end of the HA deployment, mongod claims 1.4 GB of memory. Will update with the mongod memory footprint for these two deployments after they get a few hours of idle "bake time".

Revision history for this message
Chad Smith (chad.smith) wrote :

Followup on my 6 node HA OpenStack clusters:

  1. External Juju 2.1beta3 controller with a separate model for the cloud deployment: mongod has continued to grow in memory for my idle 6-node cloud:
   - bootstrap node is not involved in the deployment
   - 9 hours since cloud deployment: mongod memory footprint 1.3 GB -> 2.1 GB

  2. Internal (bootstrapped by Landscape) Juju 2.1beta4 controller, using the :controller model (and the bootstrap node as one of the 6 machines) for the deployment:
   - bootstrap node is one of the 6 machines in the cloud (has 7 LXCs with OpenStack services on it)
   - 7 hours since cloud deployment: mongod memory footprint 1.3 GB -> 2.7 GB
   - juju status and juju ssh 0 are no longer responsive; swap space is 100% allocated

Revision history for this message
Chad Smith (chad.smith) wrote :

I should mention that I have not yet seen an OOM event on either deployment. Will update tomorrow.

Chad Smith (chad.smith)
Changed in landscape:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Chad Smith (chad.smith) wrote :

Update on existing 6 node deployments:
 1. External 2.1beta3 controller (beta4 deployment in a separate model)
    mongod has grown to 2.6GiB from 2.1GiB in the last 19 hours

 2. Internal 2.1beta4 controller with services placed on bootstrap machine 0 in the :controller model:
    - mongod has grown to 4.6GiB from 2.7GiB in the last 19 hours
    - no OOM events killing mongo yet, and memory is nearly at capacity
              total        used        free      shared  buff/cache   available
Mem:            15G         14G        234M        431M        1.0G        359M
Swap:          4.0G        4.0G          0B

We wrote a script to sort mongo collections and databases by storage size. The most significant mongo collections are Juju-log related.

juju.statuseshistory: 25.03 MiB (0%)
blobstore.blobstore.chunks: 33.65 MiB (1%)
local.oplog.rs: 1.00 GiB (35%)
logs.logs: 1.76 GiB (61%)

--- database size
admin 951.00 B
presence 737.90 kiB
blobstore 33.65 MiB
juju 45.33 MiB
local 1.00 GiB
logs 1.76 GiB

The simple script: http://paste.ubuntu.com/23783766/
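
For anyone who can't reach that paste, here is a minimal sketch of the same idea using pymongo. It is a reconstruction, not the actual script, and the connection URI is a placeholder: Juju's mongod needs its own credentials and TLS options.

#!/usr/bin/env python
"""Sketch: list mongo collections and databases sorted by storage size."""
from pymongo import MongoClient


def human(nbytes):
    """Format a byte count as a human-readable string."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if nbytes < 1024.0:
            return "%.2f %s" % (nbytes, unit)
        nbytes /= 1024.0
    return "%.2f PiB" % nbytes


client = MongoClient("mongodb://localhost:27017")  # placeholder URI

coll_sizes = {}  # (db, collection) -> bytes on disk
db_sizes = {}    # db -> bytes on disk

for db_name in client.list_database_names():
    db = client[db_name]
    db_sizes[db_name] = db.command("dbStats")["storageSize"]
    for coll_name in db.list_collection_names():
        coll_sizes[(db_name, coll_name)] = db.command("collStats", coll_name)["storageSize"]

total = sum(coll_sizes.values()) or 1
for (db_name, coll_name), size in sorted(coll_sizes.items(), key=lambda kv: kv[1]):
    print("%s.%s: %s (%d%%)" % (db_name, coll_name, human(size), 100 * size // total))

print("--- database size")
for db_name, size in sorted(db_sizes.items(), key=lambda kv: kv[1]):
    print("%s %s" % (db_name, human(size)))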

I'm wondering if Landscape using the :controller bootstrap machine 0 is what leads to the eventual OOM, because the WiredTiger cache grabs 50% of the machine's total memory.

Since Juju doesn't pass the --wiredTigerCacheSizeGB option to mongod, mongo assumes machine 0's physical memory is all its own, and the WiredTiger cache will quickly grow to 50% of the physical memory on the system [1].

Since Landscape deploys LXCs to machine 0 in the controller model, available memory depends on how many applications run in those LXCs, so mongo claiming 50% of physical memory quickly saturates swap space as mongo gets too big. (A sketch of how the cache ceiling could be capped follows the references below.)

References:
[1] From https://docs.mongodb.com/manual/faq/storage/
"""
Starting in 3.4, the WiredTiger internal cache, by default, will use the larger of either:

50% of RAM minus 1 GB, or
256 MB.

NOTE
The storage.wiredTiger.engineConfig.cacheSizeGB limits the size of the WiredTiger internal cache. The operating system will use the available free memory for filesystem cache, which allows the compressed MongoDB data files to stay in memory. In addition, the operating system will use any free RAM to buffer file system blocks and file system cache.

To accommodate the additional consumers of RAM, you may have to decrease WiredTiger internal cache size.
"""

Changed in juju:
status: Incomplete → New
Revision history for this message
Chad Smith (chad.smith) wrote :

Marking back as New for comment from the Juju team. I have a deployment up that I believe I can give access to if more debugging is needed. We have a cron running every 30 minutes to grab mongo- and ps_mem-related information.
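
For reference, the collector is roughly the following shape; the log path and the ps invocation are illustrative, not the exact job (the real one also runs ps_mem and the collection-size script above).

#!/usr/bin/env python
"""Sketch of the periodic collector (run from cron every 30 minutes)."""
import subprocess
import time

LOGFILE = "/var/log/mongo-mem.log"  # hypothetical path


def mongod_rss_kib():
    # Resident set size of every mongod process, in KiB (procps ps syntax).
    out = subprocess.check_output(["ps", "-o", "rss=", "-C", "mongod"])
    return sum(int(field) for field in out.split())


with open(LOGFILE, "a") as log:
    log.write("%s mongod_rss_kib=%d\n"
              % (time.strftime("%Y-%m-%dT%H:%M:%S"), mongod_rss_kib()))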

Revision history for this message
Chad Smith (chad.smith) wrote :

Just got my first OOM on this juju2.1beta4 system:
Jan 11 23:27:24 bierstadt kernel: [102792.536095] Out of memory: Kill process 5423 (mongod) score 256 or sacrifice child

summary: - Juju 2.1beta3: Mongo Out Of Memory on bootstrap machine 0
+ Juju 2.1beta4: Mongo Out Of Memory on bootstrap machine 0
Revision history for this message
Chad Smith (chad.smith) wrote : Re: Juju 2.1beta4: Mongo Out Of Memory on bootstrap machine 0

I'm not certain whether Juju can cap mongo's (or WiredTiger's) memory growth, or whether Juju can enforce smaller in-memory working sets so mongo doesn't keep around as much data.
It seems this problem is exacerbated by two conditions:
  1. having multiple active lxcs on the bootstrap machine
  2. having an HA OpenStack deployment (with high-frequency leader election logging)

For background: Landscape uses the bootstrap machine for the deployment to avoid burning a full machine on just Juju's controller when the controller's only role is to support a single region.

If Landscape had a config knob in Juju to tune down the mongo memory caps, then Landscape could knowingly set a cap during bootstrap and/or LXC placement events. Landscape knows that the additional LXCs will impact available memory on the bootstrap node, so it could limit mongo (WiredTiger) growth appropriately.

Chad Smith (chad.smith)
description: updated
Revision history for this message
Chad Smith (chad.smith) wrote :

Validated that the WiredTiger default cache setting is in fact ~8 GB, via mongo's db.serverStatus() command linked in comment #10.

Looked at wiredTiger: { "cache" : { "maximum bytes configured" : 8589934592 } }, which is ~8 GiB on the bootstrap node. This will be too high a value if the bootstrap node has any memory-hungry LXCs on it.
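
The same check can be scripted; here is a minimal pymongo sketch that compares the configured cache ceiling against the machine's physical RAM (the connection URI is a placeholder, since Juju's mongod requires auth and TLS).

"""Sketch: compare WiredTiger's cache ceiling to the machine's physical RAM."""
import os

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI

cache = client.admin.command("serverStatus")["wiredTiger"]["cache"]
cache_max = cache["maximum bytes configured"]  # 8589934592 (~8 GiB) observed above
total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

print("wiredTiger cache ceiling: %.1f GiB" % (cache_max / 2.0 ** 30))
print("machine physical memory:  %.1f GiB" % (total_ram / 2.0 ** 30))
if cache_max >= total_ram / 2:
    print("cache ceiling is at least half of RAM; co-located LXCs will feel it")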

Revision history for this message
Tim Penhey (thumper) wrote :

There was a bug we found earlier in the week that may well have contributed to this. The guys in the OIL CI lab are testing.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

Marking as Incomplete until we hear back from OIL. It's possible that this bug has been fixed as per comment #14.

Changed in juju:
status: New → Incomplete
tags: added: cdo-qa-blocker
tags: added: landscape
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Can someone please link to this "other bug" and to the PR that fixed it? And can someone from OIL please make a comment here as to what they saw, what they are testing, and the results when they come? So that we have a point of contact.

This is quite expensive and time-consuming to test, and we don't want to be asked to just "test this new version and see if it still happens" based on vague statements.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

juju 2.1beta4 controller on an idle cloud.

mongod %MEM taken from ps output.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

juju 2.1beta4 controller on an idle cloud.

mongod RSS taken from ps output.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

juju 2.1beta4 controller on an idle cloud.

mongod VSZ taken from ps output.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

juju 2.1beta4 controller on an idle cloud.

overall controller RAM and swap usage as measured by Landscape.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

The above graphs are from a controller created a few days ago. The cloud deployed using that controller is idle.

This controller does not have the TRACE logging issue, i.e., its logging is set to "<root>=WARNING;unit=DEBUG".

Chad's script for mongo stats outputs this at the end:
(...)
juju.txns: 5.57 MiB (0%)
juju.txns.log: 9.54 MiB (0%)
blobstore.blobstore.chunks: 34.16 MiB (2%)
juju.statuseshistory: 79.36 MiB (5%)
logs.logs: 262.03 MiB (18%)
local.oplog.rs: 1.01 GiB (72%)

--- database size
admin 951.00 B
presence 1.48 MiB
blobstore 34.16 MiB
juju 97.65 MiB
logs 262.03 MiB
local 1.01 GiB

The memory usage growth is much tamer, but it looks like it's still happening.

BTW, uptime:

ubuntu@born:~$ uptime
 13:41:30 up 2 days, 18:10, 1 user, load average: 1.24, 1.46, 1.55

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Andreas,

The PR with the fix is against 2.1: https://github.com/juju/juju/pull/6789

The bug that we've referred to is: https://bugs.launchpad.net/juju/+bug/1649719

Revision history for this message
Chad Smith (chad.smith) wrote :

Thanks, Anastasia, for the updates here; we'll grab RC1 when it comes out to continue the memory-growth investigation.
I've attached juju status for Landscape's 6-node HA deployment for reference.

Changed in landscape:
milestone: 16.12 → 17.01
Changed in juju:
status: Incomplete → Fix Committed
milestone: none → 2.1-beta5
Curtis Hovey (sinzui)
Changed in juju:
importance: Undecided → High
Revision history for this message
Tim Penhey (thumper) wrote :

Hi there,

There was a very bad memory leak that was identified in the beta3 and beta4 releases of 2.1. This was causing rampant memory usage. This particular leak was identified and fixed in early January, but I am still investigating other leaks in the 2.1 branch.

Some of the continued chatter about leaks is on bug 1649719.

Revision history for this message
Andreas Hasenack (ahasenack) wrote : Re: [Bug 1655169] Re: Juju 2.1beta4: Mongo Out Of Memory on bootstrap machine 0

Thanks for the update, Tim!

Revision history for this message
Gareth Woolridge (moon127) wrote : Re: Juju 2.1beta4: Mongo Out Of Memory on bootstrap machine 0

Sinzui pointed me to these bugs after we observed issues on 2.0.2 in our HA controller for deploying models into Prodstack.

https://grafana.admin.canonical.com/dashboard/snapshot/EiekDtFVtH6VJzO1ANUKFehtROypKdIq?from=1485370617479&to=1485525311195

We have seen the occasional need to restart a single instance due to an OOM, but unfortunately today 2 of 3 went down near-simultaneously, which took us a while to recover from and to clean up lost agents in the models once juju status etc. started returning again.

Per sinzui, we shouldn't expect a fix in 2.0.3; rather, the fix will land in 2.1.

Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released
Chad Smith (chad.smith)
Changed in landscape:
milestone: 17.01 → 17.02
Chad Smith (chad.smith)
Changed in landscape:
milestone: 17.02 → 17.03
Chad Smith (chad.smith)
summary: - Juju 2.1beta4: Mongo Out Of Memory on bootstrap machine 0
+ Juju 2.1.1: Mongo Out Of Memory on bootstrap machine 0
Revision history for this message
Chad Smith (chad.smith) wrote :

Adding an update after setting up a long running cloud with 2.1.1:

   - This cloud is a 6-node high-availability configuration: KVM, Open vSwitch, Ceph object storage, and Ceph block storage.
   - Additional memory pressure on the controller machine: 7 LXDs on machine 0 running various OpenStack services.
   - Additional mongodb log pressure due to a debug Juju log setting in Landscape:
            logging-config (model setting): <root>=DEBUG;juju.apiserver=TRACE;unit=DEBUG

   - The deployment has OOM-killed mongod on the controller machine twice in 7 days, at a rate of about once every 2.5 days.
   - The controller machine has 16 GB of memory, so WiredTiger will grab up to 8 GB for its cache.
   - mongod is currently grabbing 6.6 GB of memory (see attached ps_mem.log).
   - The mongo logs database accounts for around 3.3 GB of that memory consumption (see attached mongodb.log).

While we don't expect customers to set Landscape's debug-juju feature flag, this shows that customers will be unable to tune Juju logging configs and still expect to keep their Autopilot-deployed cloud running.

Revision history for this message
Chad Smith (chad.smith) wrote :
Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Chad Smith (chad.smith),

We are aware of residual, minor memory leaks in 2.1. The workaround for these is to restart jujud occasionally via a cron job. The majority of the improvements did go into 2.1.x, hence the Fix Released status on this report.

These are being fixed in Juju 2.2. The fix requires a dependent websocket library update and will not be backported to previous Juju versions. We are tracking the progress of this work through bug #1649719.

If you are experiencing further memory leaks with 2.2.x, please file a separate bug rather than morphing this one.

David Britton (dpb)
Changed in landscape:
status: Triaged → Fix Committed
no longer affects: landscape