Juju controllers should hard limit client connections

Bug #1799360 reported by Joel Sing
This bug affects 3 people
Affects: Canonical Juju
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

In our production environment, an HA Juju controller uses around 2GB of memory (RSS) at startup. Memory usage then grows with the number of clients connected to the controller, by around 0.8MB per client based on rough observations. With 11,000 client connections, the controller jujud therefore uses around 11GB of memory (2GB + 11,000 * 0.8MB =~ 10.8GB). Add a 4GB mongod, some additional jujuds for subordinates and other operational processes, and a host with 16GB runs out of memory. The controller jujud is then OOM killed, and once killed it rarely recovers by itself because of the client DDoS issue (LP: #1793245).
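
The arithmetic above can be checked with a rough Go sketch. The constants are the approximate figures quoted in this report (decimal megabytes), and the names `estimateJujudMB`, `baseJujudMB` and so on are made up for illustration; none of this is Juju code.

```go
package main

import "fmt"

// Rough figures from the report, in decimal units. These are
// observations, not measured constants.
const (
	baseJujudMB = 2000  // RSS of a freshly started controller jujud
	perClientKB = 800   // ~0.8MB observed per connected client
	mongodMB    = 4000  // mongod on the same host
	hostMB      = 16000 // total host memory
)

// estimateJujudMB returns the approximate controller jujud RSS, in MB,
// for the given number of connected clients.
func estimateJujudMB(clients int) int {
	return baseJujudMB + clients*perClientKB/1000
}

func main() {
	jujud := estimateJujudMB(11000)
	fmt.Println("jujud MB:", jujud) // jujud MB: 10800
	// What remains for subordinate jujuds and other processes.
	fmt.Println("left for everything else MB:", hostMB-mongodMB-jujud) // left for everything else MB: 1200
}
```

With only about 1.2GB left for the subordinate jujuds and everything else on the host, a small amount of further growth is enough to trigger the OOM killer.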

Each controller should have a hard upper bound on the number of client connections it will accept (as far as I'm aware, this does not currently exist). The limit probably needs to be determined dynamically from the system resources available, e.g. (host memory - 5GB (for mongod et al) - 2GB (for base jujud)) / 1MB per client, which gives roughly 1,000 connections for an 8GB host, 9,000 for a 16GB host, and so on. Bottom line: it is far better to reject connections and have clients connect to another controller (or fail to connect entirely) than to push a jujud to the point of being OOM killed.
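
To make the proposal concrete, here is a minimal Go sketch of the limit formula together with a non-blocking gate that refuses connections beyond the limit. It is only an illustration under the assumptions stated in this report (5GB reserved, 2GB base jujud, 1MB budgeted per client); `maxClientConnections` and `connGate` are hypothetical names, not existing Juju APIs, and the sketch uses binary units, so the results come out slightly above the rounded figures quoted.

```go
package main

import "fmt"

const (
	MiB = 1 << 20
	GiB = 1 << 30

	// Assumed budgets taken from the bug description.
	reservedForOthers = 5 * GiB // mongod and other processes
	baseJujud         = 2 * GiB // controller jujud at startup
	perClient         = 1 * MiB // ~0.8MB observed, rounded up
)

// maxClientConnections returns how many client connections fit in the
// memory left after the fixed overheads. It returns 0 when the host is
// too small to leave any headroom.
func maxClientConnections(hostMemory uint64) uint64 {
	overhead := uint64(reservedForOthers + baseJujud)
	if hostMemory <= overhead {
		return 0
	}
	return (hostMemory - overhead) / perClient
}

// connGate is a hypothetical hard cap on concurrent connections,
// implemented as a counting semaphore on a buffered channel.
type connGate struct{ slots chan struct{} }

func newConnGate(limit uint64) *connGate {
	return &connGate{slots: make(chan struct{}, limit)}
}

// tryAcquire reports whether a new connection may be accepted.
// It never blocks: when the gate is full the caller should reject.
func (g *connGate) tryAcquire() bool {
	select {
	case g.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees the slot held by a connection that has closed.
func (g *connGate) release() {
	select {
	case <-g.slots:
	default: // nothing held; ignore
	}
}

func main() {
	for _, gib := range []uint64{8, 16} {
		fmt.Printf("%d GiB host: limit %d\n", gib, maxClientConnections(gib*GiB))
	}
	// prints "8 GiB host: limit 1024" and "16 GiB host: limit 9216"

	gate := newConnGate(2)
	fmt.Println(gate.tryAcquire(), gate.tryAcquire(), gate.tryAcquire()) // true true false
}
```

In an accept loop, the server would call `tryAcquire` for each new connection, close the connection immediately when it returns false, and call `release` when an accepted connection ends. Rejecting at that point keeps memory bounded regardless of why clients are reconnecting.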

Haw Loeung (hloeung)
description: updated
Changed in juju:
status: New → Triaged
importance: Undecided → High
Paul Gear (paulgear)
tags: added: canonical-is
Tim Penhey (thumper) wrote :

I have a feeling that this could be more than just connection load. Is it possible to gather a few heap reports for a loaded controller over a few hours?

Alexandre Gomes (alejdg) wrote :

@thumper here are the reports.

Stuart Bishop (stub) wrote :

This bug is specifically about putting defenses in place so that whole classes of bugs and attacks don't take down the controllers, rather than diagnosing a particular trigger.

Tim Penhey (thumper) wrote :

@stub I agree, and we are putting things into place, but I'm also just checking that we aren't leaking.

Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: High → Low
tags: added: expirebugs-bot