Telegraf causes regular peaks of high load average

Bug #1905537 reported by Przemyslaw Hausman
This bug affects 2 people
Affects: Telegraf Charm
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

On a fresh, hyperconverged OpenStack deployment I noticed regular peaks of high load average.

Before: see before.png attached.

Every 105 minutes, the load average jumps from around 5 to ~100, and it takes around 30 minutes to return to the original value.

This led me to a bug report [1] and an investigation of regular high load [2].
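As a rough cross-check of the 105-minute period, here is a minimal sketch of the arithmetic behind the explanation in [2]. It assumes that the Linux kernel samples the load average every 5*HZ + 1 ticks (slightly more than 5 seconds) and that the affected host runs with HZ=250; neither value was verified on this machine, and the function name is made up for illustration.

```python
def load_sampling_beat_seconds(hz: int) -> int:
    """Seconds until the kernel's load sampling realigns with a 5 s grid.

    Assumption (not verified on the affected host): the load average is
    sampled every 5*hz + 1 ticks, i.e. one tick later than every 5 s.
    """
    ticks_per_sample = 5 * hz + 1  # assumed sampling period, in ticks
    # Each sample slips by one tick (1/hz s) against the 5 s grid, so it
    # takes 5*hz samples to slip by a full 5 s and line up again.
    samples_per_cycle = 5 * hz
    return samples_per_cycle * ticks_per_sample // hz


for hz in (250, 1000):
    seconds = load_sampling_beat_seconds(hz)
    print(f"HZ={hz}: {seconds} s = {seconds / 60:.2f} min")
```

Under these assumptions, HZ=250 gives 6255 s (about 104 minutes), which is close to the period observed here, and HZ=1000 gives 25005 s (about 6.9 hours), which is close to the 7 hours reported in [2].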

So I changed telegraf's collection_jitter and flush_jitter from the default of 0s to 5s. This removed the periodic peaks of high load average. At the same time, it increased the overall load average (from ~5 to ~8).
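To illustrate why jitter helps, here is a small simulation sketch. It is not telegraf code: it assumes 17 agents (the 16 LXD containers plus the bare metal node) that all start a collection at the same instant, and it models jitter as a uniformly random delay between 0 and the jitter value. The function name, the 0.1 s window and the seed are arbitrary choices for illustration.

```python
import random


def peak_simultaneous_collections(agents: int, jitter_s: float,
                                  window_s: float = 0.1, seed: int = 0) -> int:
    """Largest number of agents that start collecting within window_s seconds.

    Every agent is due at t=0 and delays its start by a random amount
    in [0, jitter_s] (a simplified model of a jitter setting).
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    starts = sorted(rng.uniform(0.0, jitter_s) for _ in range(agents))
    peak = 0
    for i, start in enumerate(starts):
        # Count agents that start no later than window_s after this one.
        in_window = sum(1 for other in starts[i:] if other - start <= window_s)
        peak = max(peak, in_window)
    return peak


print("jitter 0s:", peak_simultaneous_collections(agents=17, jitter_s=0.0))
print("jitter 5s:", peak_simultaneous_collections(agents=17, jitter_s=5.0))
```

With no jitter, all 17 agents land in the same window. With 5s of jitter the starts are spread out, so far fewer of them coincide with any single load average sample.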

I introduced the telegraf config update at 15:30.

After: see after.png attached (more detailed graph: telegraf-jitter.png)

Note that the affected machine hosts 16 LXD containers running OpenStack control plane services and other supporting applications. When I removed telegraf from all LXD containers and left it on the bare metal node only, the effect was much less prominent; see the attached telegraf-on-bare-metal-only.png. For this test I left collection_jitter and flush_jitter at the default of 0s.

I would suggest changing the charm defaults to values that do not cause this effect, or at least adding a highly visible note to the charm's README that describes the behaviour and the recommended workaround, so that charm users are aware of the problem and know how to mitigate it.

1. https://github.com/influxdata/telegraf/issues/3465
2. https://blog.avast.com/investigation-of-regular-high-load-on-unused-machines-every-7-hours

Xav Paice (xavpaice)
Changed in charm-telegraf:
status: New → Fix Released
milestone: none → 21.01