Juju controllers need to be able to communicate regardless of client connections

Bug #1799363 reported by Joel Sing
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
Canonical Juju | Fix Released | High | Christian Muirhead |

Bug Description

On our production Juju HA controllers, we have repeatedly seen controller-to-self and controller-to-controller communication fail to establish because client connections overload the controller, resulting in extreme load and file descriptor exhaustion (EMFILE). This is largely due to the current client behaviour, which dials all controllers in a fixed order with a very small delay between connections (LP: #1793245). Getting the controllers to recover, or an upgrade to complete, typically requires manual intervention and firewalling.

The Juju HA controllers should protect themselves from bad client behaviour, either by splitting the controller-to-self and controller-to-controller communication off onto a separate API listener (that non-controllers cannot reach/use), or by reserving a pool of API connections specifically for this purpose.
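To illustrate the second option (a reserved pool of API connections), here is a minimal sketch in Go. It is not taken from the Juju codebase: `connLimiter`, `newConnLimiter`, `Acquire`, `Release`, and `ErrLimitReached` are hypothetical names, and it assumes the server can already tell whether an incoming connection comes from a controller (for example, after authentication).

```go
package main

import (
	"errors"
	"sync"
)

// ErrLimitReached is a hypothetical error returned when no connection
// slot is available for the caller's class.
var ErrLimitReached = errors.New("connection limit reached")

// connLimiter is a hypothetical admission gate. It allows at most
// `total` concurrent connections, and keeps `reserved` of those slots
// out of reach of ordinary clients so controllers can always connect.
type connLimiter struct {
	mu          sync.Mutex
	total       int // maximum concurrent connections overall
	reserved    int // slots that only controllers may use
	clients     int // current client connections
	controllers int // current controller connections
}

func newConnLimiter(total, reserved int) *connLimiter {
	if reserved > total {
		reserved = total
	}
	return &connLimiter{total: total, reserved: reserved}
}

// Acquire claims a slot. Clients are capped at total-reserved, so a
// flood of client connections can never consume the reserved slots.
func (l *connLimiter) Acquire(isController bool) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.clients+l.controllers >= l.total {
		return ErrLimitReached
	}
	if !isController && l.clients >= l.total-l.reserved {
		return ErrLimitReached
	}
	if isController {
		l.controllers++
	} else {
		l.clients++
	}
	return nil
}

// Release returns a slot previously claimed with Acquire.
func (l *connLimiter) Release(isController bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if isController && l.controllers > 0 {
		l.controllers--
	} else if !isController && l.clients > 0 {
		l.clients--
	}
}
```

In this sketch an API server would call `Acquire` when accepting a connection and `Release` when it closes. Rejecting early, before any expensive work is done, is what would keep a flood of client dials from starving controller traffic of file descriptors. The separate-listener option avoids the need to classify connections at all, at the cost of an extra port to manage.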

See further discussion in LP: #1793245 and at https://discourse.jujucharms.com/t/stable-controller-startup-under-heavy-agent-load/296

Tags: canonical-is
Haw Loeung (hloeung)
description: updated
Changed in juju:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Tim Penhey (thumper)
status: Triaged → In Progress
milestone: none → 2.5-beta1
Tim Penhey (thumper)
Changed in juju:
assignee: Tim Penhey (thumper) → Christian Muirhead (2-xtian)
Paul Gear (paulgear)
tags: added: canonical-is
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released