Comment 0 for bug 1799363

Joel Sing (jsing) wrote :

On our production Juju HA controllers, we have repeatedly seen situations where controller-to-self and controller-to-controller connections cannot be established because client connections overload the controller, resulting in extreme load and file descriptor exhaustion (EMFILE). This is largely due to the current client behaviour, which dials all controllers in a fixed order with a very small delay between connections (lp#1793245). Typically, manual intervention and firewalling are required before the controllers recover and/or an upgrade completes.

The Juju HA controllers should protect themselves from bad client behaviour, either by splitting the controller-to-self and controller-to-controller communication off onto a separate API listener (that non-controllers cannot reach/use), or by reserving a pool of API connections specifically for this purpose.
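
To illustrate the second option, here is a minimal sketch of a connection limiter that holds back a few slots for controllers. It is not Juju code: Limiter, NewLimiter, Acquire and ErrTooManyConns are hypothetical names, and it assumes the caller already knows whether a connection comes from a controller and that total is at least as large as reserved.

```go
package main

import "errors"

// ErrTooManyConns is returned when no slot is available for the caller.
// (Hypothetical name, for illustration only.)
var ErrTooManyConns = errors.New("too many API connections")

// Limiter admits connections against a fixed total while holding back
// a number of slots that only controllers may use.
type Limiter struct {
	general  chan struct{} // slots any client may take
	reserved chan struct{} // slots only controllers may take
}

// NewLimiter allows total concurrent connections, of which reserved are
// set aside for controllers. Assumes total >= reserved >= 0.
func NewLimiter(total, reserved int) *Limiter {
	return &Limiter{
		general:  make(chan struct{}, total-reserved),
		reserved: make(chan struct{}, reserved),
	}
}

// Acquire claims a slot without blocking and returns a function that
// releases it. Controllers try the general pool first and fall back to
// the reserved pool; other clients can only use the general pool.
func (l *Limiter) Acquire(isController bool) (release func(), err error) {
	select {
	case l.general <- struct{}{}:
		return func() { <-l.general }, nil
	default:
	}
	if isController {
		select {
		case l.reserved <- struct{}{}:
			return func() { <-l.reserved }, nil
		default:
		}
	}
	return nil, ErrTooManyConns
}
```

With NewLimiter(3, 1), ordinary clients can hold at most two connections at once, so a controller can still obtain the third even when clients have filled the general pool. Rejecting excess clients immediately, rather than queueing them, is a deliberate choice in this sketch so that rejected connections can be closed and their file descriptors freed.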

See further discussion in lp#1793245 and at https://discourse.jujucharms.com/t/stable-controller-startup-under-heavy-agent-load/296