Comment 29 for bug 1852502

John A Meinel (jameinel) wrote:

I don't think this is something that makes sense for Juju to do by default, as the goal of a backup is to avoid interrupting your normal flow.
We likely also need to understand why a given model is generating that many logs in the first place. It sounds like something is erroring and spinning, and that error is just being ignored.

There are also some rate-limiting tools for logs that could be tweaked here, though I don't think they are exposed in a clean fashion.
Inside agent.conf there is a 'values:' section where you can set:
 LOGSINK_DBLOGGER_BUFFER_SIZE: 1000
 LOGSINK_DBLOGGER_FLUSH_INTERVAL: 2s
 LOGSINK_RATELIMIT_BURST: 1000
 LOGSINK_RATELIMIT_REFILL: 1ms

(those are the defaults).

The interesting ones are probably BURST and REFILL.
The general design is a token bucket: every log message must grab a token to be allowed through, and the bucket is refilled at one token per time period.
So in the default config, any given agent can log up to 1000 messages as fast as it wants, but tokens come back only one per 1ms. That isn't much rate limiting (it essentially means an agent can stream 1000 log messages per second indefinitely).
Keeping the burst fixed but slowing the refill to 2ms or even 10ms (which caps sustained throughput at 500 or 100 messages per second, respectively) could be interesting.
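
To make the burst/refill semantics concrete, here is a small standalone Go sketch using the golang.org/x/time/rate package. This illustrates the same token-bucket idea; it is not Juju's actual implementation:

    package main

    import (
        "fmt"
        "time"

        "golang.org/x/time/rate"
    )

    func main() {
        // REFILL=1ms, BURST=1000: one token comes back every
        // millisecond, and the bucket holds at most 1000 tokens.
        limiter := rate.NewLimiter(rate.Every(time.Millisecond), 1000)

        // An agent trying to log 1500 messages at once gets roughly
        // the first 1000 through (the burst); the rest are dropped.
        allowed, dropped := 0, 0
        for i := 0; i < 1500; i++ {
            if limiter.Allow() {
                allowed++
            } else {
                dropped++
            }
        }
        fmt.Printf("allowed=%d dropped=%d\n", allowed, dropped)
    }

Dropping the refill to 10ms in that sketch would leave the same initial burst but cap the sustained rate at 100 messages per second.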
There are a few other things:

a) *We* don't have a great feel for what an appropriate stream of log messages is, one that doesn't lose things you care about but does avoid overload. So working with us to find reasonable defaults would help.

b) We really should expose these as `juju controller-config` settings rather than hiding them in agent.conf; a hypothetical invocation is sketched after this list. (The code that handles rate limiting predates our support for controller-config.)

c) We *might* also want total rate limiting, or per-model rate limiting (those get trickier because you have to share the rate-limiting bucket between threads); a rough sketch of the per-model case follows below.
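
For (b), if these were exposed as controller config, setting them would presumably look something like the lines below. To be clear, these key names are hypothetical; they do not exist in controller-config today:

    juju controller-config logsink-rate-limit-burst=1000
    juju controller-config logsink-rate-limit-refill=10ms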
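
For (c), the per-model case mostly amounts to sharing one bucket among all the agents logging into a model. A rough sketch, assuming buckets keyed by model UUID (rate.Limiter is already safe for concurrent use, so the only extra locking needed is around the map):

    package main

    import (
        "sync"
        "time"

        "golang.org/x/time/rate"
    )

    // modelLimiters shares one token bucket per model, so every agent
    // logging into the same model draws from the same budget.
    type modelLimiters struct {
        mu       sync.Mutex
        limiters map[string]*rate.Limiter // keyed by model UUID
    }

    func newModelLimiters() *modelLimiters {
        return &modelLimiters{limiters: make(map[string]*rate.Limiter)}
    }

    // allow reports whether one more log message from the given model
    // fits under that model's rate limit.
    func (m *modelLimiters) allow(modelUUID string) bool {
        m.mu.Lock()
        l, ok := m.limiters[modelUUID]
        if !ok {
            // Same shape as the per-agent defaults: 1000 burst, 1ms refill.
            l = rate.NewLimiter(rate.Every(time.Millisecond), 1000)
            m.limiters[modelUUID] = l
        }
        m.mu.Unlock()
        // Allow is itself goroutine-safe, so it sits outside the map lock.
        return l.Allow()
    }

    func main() {
        ml := newModelLimiters()
        _ = ml.allow("some-model-uuid") // placeholder UUID
    }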