Juju clients need to randomise controller IPs and backoff appropriately
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | High | John A Meinel |
2.4 | Fix Released | High | John A Meinel |
Bug Description
Hi,
This has happened a couple of times now. I know there's an open bug about large numbers of MongoDB connections (LP:1786258), but I'm not sure there's one for agents DoS'ing the controller.
Today, the Juju machine agent was OOM-killed on one of the controllers (ubuntu/1). This caused a storm in which a bunch of agents were all connecting to that single controller (which also happens to be the juju-db/MongoDB primary, so this may be related in that it is busier than the others).
Controllers are:
| ubuntu/0 10.25.2.109
| ubuntu/1 10.25.2.111
| ubuntu/2 10.25.2.110
The one getting hammered was ubuntu/1. The workaround was to firewall off client connections on ubuntu/1.
I'm not sure what magic happens and how a client picks which controller to talk to. Does it try to connect to all three and complete the handshake with whichever responds first? Or does it pick from a list, always starting with the first entry?
For the Juju 2 controller environment, .local/
| api-endpoints: ['10.25.
summary: Juju agents DoS'ing the server → Juju agents DoS'ing the controller
description: updated
Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.5-beta1
summary: Juju agents DoS'ing the controller → Juju clients need to randomise controller IPs and backoff appropriately
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Oh yeah, the other time we saw this happen was when upgrading the controllers from 2.3.4 to 2.4.3.