2017-01-09 22:35:23 |
Chad Smith |
bug |
|
|
added bug |
2017-01-09 22:36:27 |
Chad Smith |
bug task added |
|
landscape |
|
2017-01-09 22:36:32 |
Chad Smith |
landscape: milestone |
|
16.12 |
|
2017-01-09 22:40:55 |
Chad Smith |
attachment added |
|
status.out https://bugs.launchpad.net/landscape/+bug/1655169/+attachment/4802133/+files/status.out |
|
2017-01-09 22:42:23 |
Chad Smith |
description |
When using Juju 2.1beta3 to deploy a Newton OpenStack HA cloud, I've run into out-of-memory errors where the kernel kills mongod every 8 - 15 minutes.
Mongo quickly climbs to > 4 GB of memory on my 16 GB bootstrap node. The node becomes completely unresponsive to ssh. I can see the mongo service timing out and causing errors with other juju units as update-status hooks are run.
This cloud deployment is a 6-node cluster; each node has 16 GB of memory. Each node has around 7 lxcs configured, running various OpenStack applications. Lots of hooks fire for each highly-available application.
The bootstrap node also shares resources with 7 other lxcs on that machine, each running various OpenStack services.
Because of the OOM issues, juju doesn't respond to status or ssh commands, many service endpoints time out, and service status updates are lost, causing update-status hooks to fail. |
When using Juju 2.1beta3 to deploy a Newton OpenStack HA cloud, I've run into out-of-memory errors where the kernel kills mongod every 8 - 15 minutes.
Mongo quickly climbs to > 4 GB of memory on my 16 GB bootstrap node. The node becomes completely unresponsive to ssh. I can see the mongo service timing out and causing errors with other juju units as update-status hooks are run.
This cloud deployment is a 6-node cluster; each node has 16 GB of memory. Each node has around 7 lxcs configured, running various OpenStack applications. Lots of hooks fire for each highly-available application.
The bootstrap node also shares resources with 7 other lxcs on that machine, each running various OpenStack services.
Because of the OOM issues, juju doesn't respond to status or ssh commands, many service endpoints time out, and service status updates are lost, causing update-status hooks to fail.
Note: Initially, the region deployment succeeded and I was able to bootstrap on the deployed cloud. I left the cloud untouched for 3 days and came back to see it completely unresponsive. |
|
2017-01-09 22:53:54 |
Chad Smith |
summary |
Juju 2.1beta3: Mongo OOM on bootstrap machine 0 |
Juju 2.1beta3: Mongo Out Of Memory on bootstrap machine 0 |
|
2017-01-10 10:12:48 |
Adam Collard |
juju: status |
New |
Incomplete |
|
2017-01-11 15:44:02 |
Chad Smith |
landscape: status |
New |
Triaged |
|
2017-01-11 15:44:05 |
Chad Smith |
landscape: importance |
Undecided |
High |
|
2017-01-11 22:51:06 |
Chad Smith |
juju: status |
Incomplete |
New |
|
2017-01-11 22:55:52 |
Chad Smith |
attachment added |
|
ps_mem and Mongo database stats https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4803099/+files/mongo_stats |
|
2017-01-12 16:21:23 |
Francis Ginther |
summary |
Juju 2.1beta3: Mongo Out Of Memory on bootstrap machine 0 |
Juju 2.1beta4: Mongo Out Of Memory on bootstrap machine 0 |
|
2017-01-12 17:41:56 |
Chad Smith |
description |
When using Juju 2.1beta3 to deploy a Newton OpenStack HA cloud, I've run into out-of-memory errors where the kernel kills mongod every 8 - 15 minutes.
Mongo quickly climbs to > 4 GB of memory on my 16 GB bootstrap node. The node becomes completely unresponsive to ssh. I can see the mongo service timing out and causing errors with other juju units as update-status hooks are run.
This cloud deployment is a 6-node cluster; each node has 16 GB of memory. Each node has around 7 lxcs configured, running various OpenStack applications. Lots of hooks fire for each highly-available application.
The bootstrap node also shares resources with 7 other lxcs on that machine, each running various OpenStack services.
Because of the OOM issues, juju doesn't respond to status or ssh commands, many service endpoints time out, and service status updates are lost, causing update-status hooks to fail.
Note: Initially, the region deployment succeeded and I was able to bootstrap on the deployed cloud. I left the cloud untouched for 3 days and came back to see it completely unresponsive. |
Update:
- Validated comparable OOMs (though at lower frequency) on Juju 2.1beta4.
- Logging verbosity is turned up on the juju controller in all failure cases, so the logs collection grows much faster than it would under the default configuration.
-- "logging-config: <root>=DEBUG;juju.apiserver=TRACE"
When using Juju 2.1beta3 to deploy a Newton OpenStack HA cloud, I've run into out-of-memory errors where the kernel kills mongod every 8 - 15 minutes.
Mongo quickly climbs to > 4 GB of memory on my 16 GB bootstrap node. The node becomes completely unresponsive to ssh. I can see the mongo service timing out and causing errors with other juju units as update-status hooks are run.
This cloud deployment is a 6-node cluster; each node has 16 GB of memory. Each node has around 7 lxcs configured, running various OpenStack applications. Lots of hooks fire for each highly-available application.
The bootstrap node also shares resources with 7 other lxcs on that machine, each running various OpenStack services.
Because of the OOM issues, juju doesn't respond to status or ssh commands, many service endpoints time out, and service status updates are lost, causing update-status hooks to fail.
Note: Initially, the region deployment succeeded and I was able to bootstrap on the deployed cloud. I left the cloud untouched for 3 days and came back to see it completely unresponsive. |
|
2017-01-13 01:11:17 |
Anastasia |
juju: status |
New |
Incomplete |
|
2017-01-13 09:07:33 |
Adam Collard |
tags |
|
cdo-qa-blocker |
|
2017-01-13 09:07:48 |
Adam Collard |
tags |
cdo-qa-blocker |
cdo-qa-blocker landscape |
|
2017-01-13 13:36:07 |
Andreas Hasenack |
attachment added |
|
mongod-%MEM.png https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4803860/+files/mongod-%25MEM.png |
|
2017-01-13 13:36:28 |
Andreas Hasenack |
attachment added |
|
mongod-rss.png https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4803861/+files/mongod-rss.png |
|
2017-01-13 13:36:47 |
Andreas Hasenack |
attachment added |
|
mongod-vsz.png https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4803862/+files/mongod-vsz.png |
|
2017-01-13 13:38:05 |
Andreas Hasenack |
attachment added |
|
overall-controller-ram-swap.png https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4803863/+files/overall-controller-ram-swap.png |
|
2017-01-17 16:53:07 |
Chad Smith |
attachment added |
|
status.out https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4805401/+files/status.out |
|
2017-01-18 11:51:57 |
Andreas Hasenack |
landscape: milestone |
16.12 |
17.01 |
|
2017-01-20 00:35:42 |
Anastasia |
juju: status |
Incomplete |
Fix Committed |
|
2017-01-20 00:35:46 |
Anastasia |
juju: milestone |
|
2.1-beta5 |
|
2017-01-23 19:06:47 |
Curtis Hovey |
juju: importance |
Undecided |
High |
|
2017-01-26 15:47:06 |
James Troup |
bug |
|
|
added subscriber The Canonical Sysadmins |
2017-02-03 17:22:08 |
Curtis Hovey |
juju: status |
Fix Committed |
Fix Released |
|
2017-02-10 20:54:54 |
Chad Smith |
landscape: milestone |
17.01 |
17.02 |
|
2017-03-16 15:20:19 |
Chad Smith |
landscape: milestone |
17.02 |
17.03 |
|
2017-03-21 21:18:24 |
Chad Smith |
summary |
Juju 2.1beta4: Mongo Out Of Memory on bootstrap machine 0 |
Juju 2.1.1: Mongo Out Of Memory on bootstrap machine 0 |
|
2017-03-21 21:30:11 |
Chad Smith |
attachment added |
|
ps_mem.py output showing mongodb memory and swap consumption https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4841961/+files/ps_mem.txt |
|
2017-03-21 21:31:07 |
Chad Smith |
attachment added |
|
mongodb database usage showing logs at 3.2 Gb and mongo configuration options https://bugs.launchpad.net/juju/+bug/1655169/+attachment/4841971/+files/mongodb.log |
|
2017-03-28 18:01:11 |
David Britton |
landscape: status |
Triaged |
Fix Committed |
|
2017-03-28 18:02:30 |
David Britton |
bug task deleted |
landscape |
|
|