Activity log for bug #1799363

Date Who What changed Old value New value Message
2018-10-23 07:29:24 Joel Sing bug added bug
2018-10-23 08:23:08 Haw Loeung bug added subscriber The Canonical Sysadmins
2018-10-23 08:23:24 Haw Loeung description
Old value: On our production Juju HA controllers, we've seen repeated situations where the controller-to-self and controller-to-controller communication cannot be established due to client connections overloading the controller, resulting in extreme load and out of file descriptors (EMFILE). This is largely due to the current client behaviour, which dials all controllers in a fixed order and with a very small delay between connections (lp#1793245). Typically this requires manual intervention and firewalling in order to get the controllers to recover and/or an upgrade to complete. The Juju HA controllers should protect themselves from bad client behaviour - either by splitting the controller-to-self and controller-to-controller communication off onto a separate API listener (that non-controllers cannot reach/use), or reserving a pool of API connections specifically for this purpose. See further discussion in lp#1793245 and at https://discourse.jujucharms.com/t/stable-controller-startup-under-heavy-agent-load/296
New value: On our production Juju HA controllers, we've seen repeated situations where the controller-to-self and controller-to-controller communication cannot be established due to client connections overloading the controller, resulting in extreme load and out of file descriptors (EMFILE). This is largely due to the current client behaviour, which dials all controllers in a fixed order and with a very small delay between connections (LP: #1793245). Typically this requires manual intervention and firewalling in order to get the controllers to recover and/or an upgrade to complete. The Juju HA controllers should protect themselves from bad client behaviour - either by splitting the controller-to-self and controller-to-controller communication off onto a separate API listener (that non-controllers cannot reach/use), or reserving a pool of API connections specifically for this purpose.
See further discussion in LP: #1793245 and at https://discourse.jujucharms.com/t/stable-controller-startup-under-heavy-agent-load/296
2018-10-23 08:24:19 Haw Loeung bug added subscriber Haw Loeung
2018-10-23 15:46:25 Richard Harding juju: status New Triaged
2018-10-23 15:46:27 Richard Harding juju: importance Undecided High
2018-10-23 15:46:35 Richard Harding juju: assignee Tim Penhey (thumper)
2018-10-23 15:46:38 Richard Harding juju: status Triaged In Progress
2018-10-23 15:46:42 Richard Harding juju: milestone 2.5-beta1
2018-10-23 19:29:38 Tim Penhey juju: assignee Tim Penhey (thumper) Christian Muirhead (2-xtian)
2018-10-24 01:36:33 Paul Gear tags canonical-is
2018-11-12 21:50:35 Richard Harding juju: status In Progress Fix Committed
2019-03-22 01:35:57 Anastasia juju: status Fix Committed Fix Released