/var/spool/rsyslog grows without bound

Bug #1453801 reported by John A Meinel
This bug affects 2 people
Affects     Status        Importance  Assigned to     Milestone
juju-core   Fix Released  High        Andrew Wilkins
1.22        Fix Released  Critical    Andrew Wilkins
1.23        Fix Released  High        Andrew Wilkins
1.24        Fix Released  High        Andrew Wilkins

Bug Description

We are currently using rsyslog with Disk-Assisted Memory Queues:
 ActionQueueFileName machine-X_X

However, we are not passing a value for
 ActionQueueMaxDiskSpace

This means that the size of the rsyslog disk queue is unbounded.
On one production site we currently have a 12GB disk queue, which is causing us to run out of disk space (which then makes mongo go crazy).

This problem is probably exacerbated in HA mode, because we then end up with a forward rule (and associated queue) for each of the other API servers. Instead of just 1 disk-assisted queue, we end up with 3 or even 5 of them.

I'm not sure whether we are forwarding our messages correctly. It is possible that we forward every message we get, including the ones sent to us by other servers, which would mean we over-broadcast by a factor of at least N, if not N^2.

But at the very least we should bound our QueueSize to be no larger than our maximum all-machines.log size.
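
For illustration, here is a minimal sketch of what a bounded forwarding stanza could look like, written as a small Python helper that renders rsyslog legacy-format directives (which are written with a leading `$`). The function name, the target address and the `512m` default are assumptions made for this example, not juju's actual template; only ActionQueueFileName and ActionQueueMaxDiskSpace are taken from this report.

```python
def render_forward_rule(queue_name, target, max_disk_space="512m"):
    """Render a legacy-format rsyslog forwarding stanza with a bounded disk queue.

    queue_name     -- spool file prefix, e.g. "machine-X_X" as in this report
    target         -- hypothetical "host:port" of the other API server
    max_disk_space -- cap for the on-disk part of the queue (assumed value)
    """
    lines = [
        # Asynchronous in-memory queue; setting a file name below makes it disk-assisted.
        "$ActionQueueType LinkedList",
        "$ActionQueueFileName {}".format(queue_name),
        # The directive this bug is about: without it the spool can grow without bound.
        "$ActionQueueMaxDiskSpace {}".format(max_disk_space),
        # Queue directives apply to the next action, so the forward rule comes last.
        "*.* @@{}".format(target),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_forward_rule("machine-X_X", "192.0.2.10:6514"))
```

Each forward rule would need its own stanza like this, since the queue directives configure only the action that follows them.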

Tags: stakeholder
John A Meinel (jameinel) wrote :

In current trunk: https://github.com/juju/juju/blob/master/utils/syslog/config.go#L75 says that we will rotate all-machines.log when it grows beyond 512MB, thus it is more than safe enough to set ActionQueueSize to 512MB. That is probably still too big, but at least it is bounded.
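
As a rough worked example of what such a bound would buy us, the sketch below multiplies a per-queue cap by the number of forwarding queues. It assumes the cap applies to each queue separately and uses the queue counts mentioned in the description; the function name is made up for illustration.

```python
ROTATE_LIMIT_MB = 512  # all-machines.log rotation threshold quoted above


def worst_case_spool_mb(queues, per_queue_mb=ROTATE_LIMIT_MB):
    """Upper bound on spool usage if every queue fills up to its cap."""
    return queues * per_queue_mb


if __name__ == "__main__":
    for queues in (1, 3, 5):
        print("{} queue(s): at most {} MB".format(queues, worst_case_spool_mb(queues)))
```

So even with 5 queues the spool would stay in the low gigabytes, instead of the 12GB seen in production.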

Andrew Wilkins (axwalk) wrote :

According to http://www.rsyslog.com/doc/queues.html (under "Limiting the Queue Size"), we should be setting "$<object>QueueMaxDiskSpace", and not ActionQueueSize.

Andrew Wilkins (axwalk) wrote :

I'm not sure how to repro this, so I haven't been able to verify my fix beyond ensuring that the rsyslog config is updated after upgrading. I tried creating an HA env, creating a giant machine-0.log and all-machines.log on machine 0, and then disabling rsyslog on the 2nd and 3rd state servers; this didn't cause the spool to grow by more than a couple of MB.
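
For anyone trying to reproduce this, a small helper like the following could be used to watch the spool size while the other state servers are down. This is only a sketch: the function name is hypothetical, and the path is the one from the bug title.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A queue file may be removed between listing and stat; skip it.
                pass
    return total


if __name__ == "__main__":
    size_mb = dir_size_bytes("/var/spool/rsyslog") / (1024.0 * 1024.0)
    print("spool size: {:.1f} MB".format(size_mb))
```

Reading /var/spool/rsyslog usually requires root, so this would need to be run with sudo on the state server.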

I'll mark Fix Committed once I've merged, but would be good if someone could provide steps to repro.

Andrew Wilkins (axwalk) wrote :

It seems that long log lines are silently discarded. When I created the "giant all-machines.log", I was writing log lines that were ~1MB long. John provided a snippet of Python to create log lines in all-machines.log:

    python -c "import syslog; syslog.openlog('juju-test'); syslog.syslog(syslog.LOG_WARNING, 'test this out')"

(which can be tweaked to generate longer lines, and so on)
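
One possible way to tweak it is sketched below: the one-liner expanded into a script that emits many messages of a chosen length. The helper names and the default values are assumptions for illustration; the line length is kept well under the ~1MB that got discarded above.

```python
import syslog


def build_message(length, prefix="test this out "):
    """Return a message of exactly `length` characters, padded with 'x'."""
    if length <= len(prefix):
        return prefix[:length]
    return prefix + "x" * (length - len(prefix))


def emit(count, length):
    """Send `count` warning-level messages of `length` characters to syslog."""
    syslog.openlog("juju-test")
    message = build_message(length)
    for _ in range(count):
        syslog.syslog(syslog.LOG_WARNING, message)
    syslog.closelog()


if __name__ == "__main__":
    # Roughly 10 MB of payload in 1 KB lines.
    emit(10000, 1024)
```

Raising the count rather than the line length is probably the safer way to generate volume, given that very long lines appear to be dropped.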

Andrew Wilkins (axwalk)
Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Andrew Wilkins (axwalk)
milestone: none → 1.25.0
John A Meinel (jameinel)
description: updated
Andrew Wilkins (axwalk)
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju-core:
status: Fix Committed → Fix Released