Comment 0 for bug 1799360

Joel Sing (jsing) wrote :

In our production environment, when an HA Juju controller starts up we see memory usage (RSS) of around 2GB. Memory usage then grows with the number of clients connected to the controller, at roughly 0.8MB per client based on rough observations. With 11,000 client connections, the controller jujud uses around 11GB of memory (2GB + 11,000 × 0.8MB ≈ 10.8GB). Add a 4GB mongod, some additional jujuds for subordinates and other operational processes, and a 16GB host runs out of memory, resulting in the controller jujud being OOM killed. Once killed, the controllers rarely recover by themselves due to the client DDoS issue (lp#1793245).
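As a rough back-of-the-envelope check, the figures above fit a simple linear model: a fixed base RSS plus a per-client cost. This is a sketch under stated assumptions: memory is taken to grow linearly with connected clients, 1GB is treated as 1000MB for simplicity, and the helper name `estimateRSSMB` is hypothetical, not part of Juju.

```go
package main

import "fmt"

// Rough observed figures from this report (not measured constants).
const (
	baseRSSMB      = 2000.0 // ~2GB RSS for an HA controller jujud at startup
	perClientRSSMB = 0.8    // ~0.8MB additional RSS per connected client
	mongodMB       = 4000.0 // ~4GB mongod on the same host
)

// estimateRSSMB is a hypothetical helper: a linear model of the
// controller jujud's RSS (in MB) for a given number of client connections.
func estimateRSSMB(clients int) float64 {
	return baseRSSMB + perClientRSSMB*float64(clients)
}

func main() {
	jujud := estimateRSSMB(11000)
	fmt.Printf("controller jujud with 11,000 clients: ~%.1f GB\n", jujud/1000)
	// Remaining headroom on a 16GB host once mongod is included, before
	// subordinate jujuds and other operational processes are counted.
	fmt.Printf("headroom on a 16GB host: ~%.1f GB\n", (16000-jujud-mongodMB)/1000)
}
```

With these numbers the controller jujud alone accounts for roughly 10.8GB, leaving only about 1.2GB on a 16GB host for everything other than jujud and mongod.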

Each controller should have a hard upper bound on the number of client connections it will accept (as far as I'm aware, this does not currently exist). This limit probably needs to be determined dynamically based on the system resources available, e.g. (host memory - 5GB (for mongod et al) - 2GB (for base jujud)) / 1MB ≈ 1,000 for an 8GB host, 9,000 for a 16GB host, etc. Bottom line: it is far better to reject connections and have clients connect to another controller (or fail to connect entirely) than to push a jujud to the point of being OOM killed.
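A minimal sketch of the proposal, assuming host memory is already known in MB (e.g. read by the caller from the OS) and using the reservations suggested above with 1GB treated as 1000MB. The names `maxClients`, `connLimiter`, `tryAcquire` and `errTooManyClients` are hypothetical and do not correspond to existing Juju code. The limiter uses a buffered channel as a counting semaphore with a non-blocking send, so a connection beyond the limit is rejected immediately instead of being queued.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical reservations, taken from the figures suggested in this report.
const (
	reservedOtherMB = 5000 // mongod et al (~5GB)
	reservedBaseMB  = 2000 // base controller jujud (~2GB)
	perClientMB     = 1    // ~1MB budget per client (observed ~0.8MB, rounded up)
)

// maxClients derives a connection limit from total host memory in MB.
// It returns 0 when the reservations alone exceed the host's memory.
func maxClients(hostMemMB int) int {
	avail := hostMemMB - reservedOtherMB - reservedBaseMB
	if avail <= 0 {
		return 0
	}
	return avail / perClientMB
}

var errTooManyClients = errors.New("connection limit reached, try another controller")

// connLimiter caps concurrent client connections; the channel's capacity is the limit.
type connLimiter struct{ slots chan struct{} }

func newConnLimiter(limit int) *connLimiter {
	return &connLimiter{slots: make(chan struct{}, limit)}
}

// tryAcquire reserves a slot without blocking and fails immediately when
// all slots are taken, so the caller can reject the client.
func (l *connLimiter) tryAcquire() error {
	select {
	case l.slots <- struct{}{}:
		return nil
	default:
		return errTooManyClients
	}
}

// release frees a slot when a client disconnects.
func (l *connLimiter) release() { <-l.slots }

func main() {
	for _, gb := range []int{8, 16} {
		fmt.Printf("%dGB host: limit of %d clients\n", gb, maxClients(gb*1000))
	}

	l := newConnLimiter(2)
	fmt.Println(l.tryAcquire(), l.tryAcquire(), l.tryAcquire())
}
```

In a real server, `tryAcquire` would be called right after accepting a connection, closing it on error so the client can fail over to another controller, with `release` deferred until that client disconnects. This reproduces the 1,000 and 9,000 figures above for 8GB and 16GB hosts.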