/var/spool/rsyslog grows without bound
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
juju-core | Fix Released | High | Andrew Wilkins | |
1.22 | Fix Released | Critical | Andrew Wilkins | |
1.23 | Fix Released | High | Andrew Wilkins | |
1.24 | Fix Released | High | Andrew Wilkins | |
Bug Description
We are currently using rsyslog with Disk-Assisted Memory Queues:
ActionQueueFil
However, we are not passing a value for
ActionQueueMax
This means that the rsyslog disk-assisted queue can grow without bound.
On one production site we currently have a 12GB disk queue which is causing us to run out of disk space (which then makes mongo go crazy).
This problem is probably exacerbated in HA mode, because we then end up with a forward rule (and associated queue) for each other API server, so instead of just 1 disk-assisted queue we end up with 3 or even 5 of them.
I'm not sure whether we are forwarding our messages correctly: it is possible that we forward every message we get, including the ones sent to us by other servers, which would mean we over-broadcast by a factor of at least N, if not N^2.
But at the very least we should bound our QueueSize to be no larger than our maximum all-machines.log size.
Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
milestone: none → 1.25.0
description: updated
Changed in juju-core:
status: In Progress → Fix Committed
Changed in juju-core:
status: Fix Committed → Fix Released
In current trunk, https://github.com/juju/juju/blob/master/utils/syslog/config.go#L75 says that we rotate all-machines.log when it grows beyond 512MB, so it is more than safe enough to set ActionQueueSize to 512MB. That is probably still too big, but at least it is bounded.