juju should tune kernel socket options on apiserver
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | High | Horacio Durán |
2.1 | Fix Released | High | Ian Booth |
Bug Description
In large deployments (400+ servers, 1700+ juju agents), the default TCP sysctl settings are quickly overwhelmed, which prevents 'dblogpruner' from connecting to mongod and trimming the logs collection as scheduled.
When this happens, the logs collection can grow very rapidly, resulting in high CPU, RAM, and I/O usage by mongod. As a result, 'juju status' takes upwards of 5 minutes to return and we see repeated 'i/o timeout' errors in the juju controller logs.
After experiencing this in our environment, we manually removed documents from the logs collection using mongoshell and tuned some tcp sysctls (net.core.
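The list of sysctls in the sentence above is cut off in this report, so the following is only a sketch of how the current values of such settings could be inspected on a controller before changing anything. The two keys used here (net.core.somaxconn and net.ipv4.tcp_max_syn_backlog) are real Linux sysctls related to listen backlogs, but they are assumptions chosen for illustration and are not necessarily the ones the reporter tuned. The helper names are hypothetical.

```python
from pathlib import Path

# Illustrative keys only (assumption): the report's own list is truncated.
EXAMPLE_KEYS = ["net.core.somaxconn", "net.ipv4.tcp_max_syn_backlog"]


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys.

    This simple split does not handle keys whose components themselves
    contain dots (for example some interface names).
    """
    return Path(root).joinpath(*key.split("."))


def read_sysctl(key, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except OSError:
        return None


for key in EXAMPLE_KEYS:
    print(key, "=", read_sysctl(key))
```

On a non-Linux host, or where a key does not exist, the value is printed as None rather than raising an error.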
machine-0.log
--------------
ERROR exited "dblogpruner": worker "dblogpruner" exited: failed to prune logs by time: read tcp 10.1.100.
ERROR exited "dblogpruner": worker "dblogpruner" exited: failed to prune logs by time: read tcp 10.1.100.
ERROR exited "dblogpruner": worker "dblogpruner" exited: failed to prune logs by time: read tcp 10.1.100.
ERROR failed to write status history: read tcp 127.0.0.
kern.log
---------
TCP: request_sock_TCP: Possible SYN flooding on port 37017. Sending cookies. Check SNMP counters.
TCP: request_sock_TCP: Possible SYN flooding on port 17070. Sending cookies. Check SNMP counters.
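To see which listening ports are affected, the kernel warnings can be tallied per port. The sketch below assumes the message format shown in the kern.log excerpt above; the function name is hypothetical, and the sample lines are the two from this report. Ports 37017 and 17070 are the defaults for mongod and the API server on a Juju controller.

```python
import re
from collections import Counter

# Matches the kernel warning format shown in the kern.log excerpt above.
SYN_FLOOD_RE = re.compile(r"Possible SYN flooding on port (\d+)")


def count_syn_flood_warnings(lines):
    """Return a Counter mapping port number to number of warnings seen."""
    counts = Counter()
    for line in lines:
        match = SYN_FLOOD_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "TCP: request_sock_TCP: Possible SYN flooding on port 37017. Sending cookies. Check SNMP counters.",
    "TCP: request_sock_TCP: Possible SYN flooding on port 17070. Sending cookies. Check SNMP counters.",
]

counts = count_syn_flood_warnings(sample)
for port, n in sorted(counts.items()):
    print(port, n)
```

In practice the lines would come from reading the kern.log file rather than a hard-coded list; the path to that file varies by system and is left to the reader.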
mongodb logs collection (33GB):
-------
33500684288 Jan 11 22:43 collection-
Changed in juju:
milestone: 2.1-rc1 → 2.2.0-alpha1
Changed in juju:
assignee: nobody → Horacio Durán (hduran-8)
status: Triaged → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
juju version:
2.0.0-xenial-amd64