Activity log for bug #1655169

Date Who What changed Old value → New value Message
2017-01-09 22:35:23 Chad Smith bug added bug
2017-01-09 22:36:27 Chad Smith bug task added landscape
2017-01-09 22:36:32 Chad Smith landscape: milestone 16.12
2017-01-09 22:40:55 Chad Smith attachment added status.out https://bugs.launchpad.net/landscape/+bug/1655169/+attachment/4802133/+files/status.out
2017-01-09 22:42:23 Chad Smith description
  Old value:
    When using Juju 2.1beta3 to deploy a Newton OpenStack HA cloud, I've run into out-of-memory errors where the kernel kills off mongod every 8-15 minutes. Mongo quickly climbs to > 4GB of memory on my 16GB bootstrap node. The node becomes completely unresponsive to ssh. I can see the mongo service timing out and causing errors with other juju units as update-status hooks are run. This cloud deployment is a 6-node cluster; each node has 16GB of memory and around 7 LXCs configured running various OpenStack applications, with lots of hooks firing for each highly available application. The bootstrap node also shares resources with 7 other LXCs on that machine, each running various OpenStack services. Because of the OOM issues, juju doesn't respond to status or ssh commands, many service endpoints time out, and service status updates are lost, causing update-status hooks to fail.
  New value: the same text, with this note appended:
    Note: Initially, the region deployment succeeded and I was able to bootstrap on the deployed cloud. I left the cloud untouched for 3 days and came back to see it completely unresponsive.
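  The > 4GB climb described above is visible from outside Mongo by sampling the daemon's resident set size, in the spirit of the mongod-rss.png and mongod-%MEM.png graphs attached below. A minimal sampler sketch, assuming Linux /proc, Python 3, and a single mongod process (the polling interval and output format are arbitrary choices, not anything specified in the bug):

      # Sample mongod's resident set size once a minute (illustrative sketch).
      # pidof exits nonzero if mongod is not running, which raises here.
      import subprocess
      import time

      def mongod_rss_kib():
          pid = subprocess.check_output(["pidof", "mongod"]).split()[0].decode()
          with open("/proc/{}/status".format(pid)) as f:
              for line in f:
                  if line.startswith("VmRSS:"):
                      return int(line.split()[1])  # /proc reports this in kB
          return 0

      while True:
          print(time.strftime("%Y-%m-%d %H:%M:%S"),
                "mongod RSS: {:.1f} MiB".format(mongod_rss_kib() / 1024))
          time.sleep(60)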
2017-01-09 22:53:54 Chad Smith summary Juju 2.1beta3: Mongo OOM on bootstrap machine 0 → Juju 2.1beta3: Mongo Out Of Memory on bootstrap machine 0
2017-01-10 10:12:48 Adam Collard juju: status New → Incomplete
2017-01-11 15:44:02 Chad Smith landscape: status New → Triaged
2017-01-11 15:44:05 Chad Smith landscape: importance Undecided → High
2017-01-11 22:51:06 Chad Smith juju: status Incomplete → New
2017-01-11 22:55:52 Chad Smith attachment added ps_mem and Mongo database stats https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4803099/+files/mongo_stats
2017-01-12 16:21:23 Francis Ginther summary Juju 2.1beta3: Mongo Out Of Memory on bootstrap machine 0 → Juju 2.1beta4: Mongo Out Of Memory on bootstrap machine 0
2017-01-12 17:41:56 Chad Smith description
  Old value: the description from the 2017-01-09 22:42:23 entry above, with this update appended:
    Update:
    - Validated comparable OOMs (though at lower frequency) on juju 2.1beta4.
    - Logging verbosity is turned up on the juju controller in all failure cases, so the logs collection grows much faster than it would under the default configuration:
      "logging-config: <root>=DEBUG;juju.apiserver=TRACE"
  New value: the same description without the "Update:" block.
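  The logging-config value quoted in that update is Juju's semicolon-separated list of <module>=<LEVEL> pairs, here raising the root logger to DEBUG and the API server to TRACE, which is what inflated the logs collection. A small Python sketch of how such a string decomposes; this is an illustration, not Juju's actual parser (which lives in the loggo library and also validates level names):

      # Decompose a Juju logging-config string into {module: level} pairs.
      def parse_logging_config(spec):
          levels = {}
          for entry in spec.split(";"):
              module, _, level = entry.strip().partition("=")
              if module:
                  levels[module] = level.upper()
          return levels

      print(parse_logging_config("<root>=DEBUG;juju.apiserver=TRACE"))
      # {'<root>': 'DEBUG', 'juju.apiserver': 'TRACE'}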
2017-01-13 01:11:17 Anastasia juju: status New → Incomplete
2017-01-13 09:07:33 Adam Collard tags cdo-qa-blocker
2017-01-13 09:07:48 Adam Collard tags cdo-qa-blocker → cdo-qa-blocker landscape
2017-01-13 13:36:07 Andreas Hasenack attachment added mongod-%MEM.png https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4803860/+files/mongod-%25MEM.png
2017-01-13 13:36:28 Andreas Hasenack attachment added mongod-rss.png https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4803861/+files/mongod-rss.png
2017-01-13 13:36:47 Andreas Hasenack attachment added mongod-vsz.png https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4803862/+files/mongod-vsz.png
2017-01-13 13:38:05 Andreas Hasenack attachment added overall-controller-ram-swap.png https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4803863/+files/overall-controller-ram-swap.png
2017-01-17 16:53:07 Chad Smith attachment added status.out https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4805401/+files/status.out
2017-01-18 11:51:57 Andreas Hasenack landscape: milestone 16.12 → 17.01
2017-01-20 00:35:42 Anastasia juju: status Incomplete → Fix Committed
2017-01-20 00:35:46 Anastasia juju: milestone 2.1-beta5
2017-01-23 19:06:47 Curtis Hovey juju: importance Undecided → High
2017-01-26 15:47:06 James Troup bug added subscriber The Canonical Sysadmins
2017-02-03 17:22:08 Curtis Hovey juju: status Fix Committed → Fix Released
2017-02-10 20:54:54 Chad Smith landscape: milestone 17.01 → 17.02
2017-03-16 15:20:19 Chad Smith landscape: milestone 17.02 → 17.03
2017-03-21 21:18:24 Chad Smith summary Juju 2.1beta4: Mongo Out Of Memory on bootstrap machine 0 → Juju 2.1.1: Mongo Out Of Memory on bootstrap machine 0
2017-03-21 21:30:11 Chad Smith attachment added ps_mem.py output showing mongodb memory and swap consumption https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4841961/+files/ps_mem.txt
2017-03-21 21:31:07 Chad Smith attachment added mongodb database usage showing logs at 3.2 Gb and mongo configuration options https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4841971/+files/mongodb.log
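  The 3.2 GB logs figure in the attachment above is the kind of number MongoDB's collStats command reports. A hedged pymongo sketch for pulling per-collection sizes; the URI, database name, and absent TLS/credentials here are simplifying assumptions (a real Juju controller's mongod listens on port 37017, requires TLS, and uses the credentials found in the machine agent's configuration):

      # Print per-collection storage sizes for a controller's MongoDB
      # (illustrative sketch; connection details are assumptions).
      from pymongo import MongoClient

      client = MongoClient("mongodb://localhost:37017/")
      db = client["juju"]
      for name in db.list_collection_names():
          stats = db.command("collstats", name)  # per-collection statistics
          print("{:<40} {:>10.1f} MiB".format(name, stats["size"] / 1048576))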
2017-03-28 18:01:11 David Britton landscape: status Triaged → Fix Committed
2017-03-28 18:02:30 David Britton bug task deleted landscape