logrotation of /var/log/libvirt/qemu is misconfigured for clouds

Bug #1460197 reported by James Troup
This bug affects 8 people
Affects                                 Status    Importance  Assigned to  Milestone
OpenStack Nova Compute Charm            Triaged   High        Unassigned
nova-compute (Juju Charms Collection)   Invalid   High        Unassigned

Bug Description

By default libvirt stores a per instance log file in
/var/log/libvirt/qemu.

It also ships with a logrotate.d file to rotate these, but that file
specifies 'minsize 100k', which IMO the vast majority of instances in
the vast majority of OpenStack workloads will never get near (e.g. none
of ours is much larger than 8k).

The end result of this (for us at least) is that these log files are
never rotated out and accumulate without bound. One of our internal
OpenStack clusters has well over a million of them across only a
relatively small number of compute nodes.
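
To see how far a given compute node is from the 'minsize 100k' threshold, a rough shell sketch along these lines could be used. The function name qemu_log_stats is hypothetical and not part of libvirt or the charm; it only relies on find and wc.

```shell
#!/bin/sh
# Hypothetical helper: summarize the per-instance logs in a directory.
# Usage: qemu_log_stats [DIR]   (DIR defaults to /var/log/libvirt/qemu)
qemu_log_stats() {
    dir="${1:-/var/log/libvirt/qemu}"
    # Count every regular file, including rotated copies such as foo.log.1.
    total=$(find "$dir" -type f | wc -l)
    # -size +100k matches files larger than 100 KiB, i.e. the only ones
    # that could satisfy the 'minsize 100k' condition.
    big=$(find "$dir" -type f -name '*.log' -size +100k | wc -l)
    # $(( )) strips any padding that wc may add around the number.
    echo "total=$((total)) over_100k=$((big))"
}
```

Calling qemu_log_stats with no argument inspects the default directory; a node where over_100k stays at 0 while total keeps climbing matches the situation described above.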

I think the nova-compute charm or package should override (or
supplement) the libvirt logrotate configuration with something that
rotates these files out based solely on age and regardless of size.

Tags: openstack
James Page (james-page)
Changed in nova-compute (Juju Charms Collection):
status: New → Triaged
importance: Undecided → High
milestone: none → 15.07
James Page (james-page)
Changed in nova-compute (Juju Charms Collection):
milestone: 15.07 → 15.10
Arne Wiebalck (arne-wiebalck) wrote :

The 'minsize 100k' setting may not be the only problem.

I suspect that even when rotated, empty log files of deleted instances would stay behind forever: they would be rotated indefinitely and are indistinguishable from the files of still-existing instances that just don't write anything. 'notifempty' doesn't help either, since then they would never be deleted: options like 'maxage' are only applied when the files are rotated.

Another issue may be distinguishing deleted instances from shut-down instances. I guess logrotate can't tell these apart, while you probably want to keep the logs of the latter and get rid of those of the former.
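
logrotate has no notion of instance state, but libvirt does. As a sketch, assuming the output of `virsh list --all --name` (every defined domain) and `virsh list --name` (active domains only) has been saved to two files, a log's instance could be classified like this. The function name classify_instance and the file arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: classify an instance by comparing two name lists.
# Usage: classify_instance NAME ALL_FILE ACTIVE_FILE
#   ALL_FILE    one domain name per line, e.g. from: virsh list --all --name
#   ACTIVE_FILE one domain name per line, e.g. from: virsh list --name
classify_instance() {
    name="$1"
    all_file="$2"
    active_file="$3"
    # grep -x matches whole lines, -F treats the name as a fixed string.
    if grep -qxF -- "$name" "$active_file"; then
        echo active
    elif grep -qxF -- "$name" "$all_file"; then
        # Still defined in libvirt but not active: a shut-down instance.
        echo inactive
    else
        # Unknown to libvirt: the instance has been deleted.
        echo deleted
    fi
}
```

Only logs whose instance comes back as deleted would then be candidates for removal.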

Arne Wiebalck (arne-wiebalck) wrote :

In order to

- keep the logs of deleted instances for N days
- not rotate/delete the logs of shutdown instances
- not rotate (and then lose) the last messages logged by running instances

how about:

-->
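/var/log/libvirt/qemu/*.log {
        weekly
        missingok
        rotate 4
        compress
        delaycompress
        copytruncate
        notifempty
        # lastaction: build a "! -name <domain>.log*" exclusion for every
        # domain libvirt still knows about, then remove the remaining files
        # last modified more than 30 days ago ("+30", not exactly 30).
        lastaction
                keep=`virsh list --all|grep instance|awk '{print "! -name "$2".log*"}'`
                find /var/log/libvirt/qemu/ -type f -mtime +30 $keep -exec rm {} \;
        endscript
}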
<--

The main changes to the current default are
- replace 'minsize 100k' by 'notifempty': this is mostly to avoid rotating empty logs, while still ensuring that the 'lastaction' script is run whenever there is actually a change (lastaction is *only* run by logrotate when some rotation happened);
- add a lastaction script: this is meant to preserve the logs of all instances that still exist and to expire the logs of deleted instances after 30 days.

There is at least one issue with this approach: logrotate will need 4 weeks before the first logs are deleted, so an initial cleanup may be needed for compute nodes that have piled up logs.
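
Such an initial cleanup could be sketched as below. This is only an illustration, not something shipped by the charm or by libvirt: the function name purge_old_orphans is hypothetical, and it assumes the list of still-defined domains was saved beforehand, e.g. with `virsh list --all --name > /tmp/domains`. It only prints what it would remove unless --delete is passed.

```shell
#!/bin/sh
# Hypothetical one-off cleanup for nodes that already piled up logs.
# Usage: purge_old_orphans DIR DOMAINS_FILE DAYS [--delete]
#   DOMAINS_FILE one domain name per line (e.g. virsh list --all --name)
purge_old_orphans() {
    dir="$1"
    domains="$2"
    days="$3"
    mode="${4:-}"
    # -mtime +N selects files last modified more than N days ago.
    find "$dir" -type f -name '*.log*' -mtime +"$days" |
    while IFS= read -r f; do
        base=${f##*/}          # strip the directory part
        name=${base%%.log*}    # instance-0001.log.2.gz -> instance-0001
        # Keep the logs of every domain libvirt still knows about.
        if grep -qxF -- "$name" "$domains"; then
            continue
        fi
        if [ "$mode" = "--delete" ]; then
            rm -f -- "$f"
        else
            printf 'would remove %s\n' "$f"
        fi
    done
}
```

Running it first without --delete gives a chance to review the list before anything is removed.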

tags: added: openstack
Changed in nova-compute (Juju Charms Collection):
milestone: 15.10 → 16.04
Edward Hope-Morley (hopem) wrote :

See comments #50 & #51 in bug 832507:

"Patches are ready to solve this entirely in the libvirt layer one & for all. It'll be fixed with libvirt 1.3.3

https://www.redhat.com/archives/libvir-list/2016-February/msg01449.html"

James Page (james-page)
Changed in nova-compute (Juju Charms Collection):
milestone: 16.04 → 16.07
Liam Young (gnuoy)
Changed in nova-compute (Juju Charms Collection):
milestone: 16.07 → 16.10
James Page (james-page)
Changed in nova-compute (Juju Charms Collection):
milestone: 16.10 → 17.01
James Page (james-page) wrote :

The comments in #3 relate to a different problem in nova with regard to console logs, not the libvirt logs that this bug is about.

James Page (james-page)
Changed in charm-nova-compute:
importance: Undecided → High
status: New → Triaged
Changed in nova-compute (Juju Charms Collection):
status: Triaged → Invalid
Edward Hope-Morley (hopem) wrote :

@jamespage re your comment #4: this LP does still somewhat relate to the bug I mentioned in #3, since nova now has the ability [1] to leverage virtlogd for actions like getConsoleOutput. IIUC libvirt will log to /var/log/libvirt/qemu/ by default, yet for this type of log we would never want logrotate to run, since rotation is managed by virtlogd itself.

[1] https://blueprints.launchpad.net/nova/+spec/libvirt-virtlogd

Edward Hope-Morley (hopem) wrote :

Sorry, I meant to paste this link, since it is the actual spec that was implemented for Ocata:

https://specs.openstack.org/openstack/nova-specs/specs/ocata/implemented/libvirt-virtlogd.html

James Page (james-page)
Changed in charm-nova-compute:
milestone: none → 19.04
David Ames (thedac)
Changed in charm-nova-compute:
milestone: 19.04 → 19.07
David Ames (thedac)
Changed in charm-nova-compute:
milestone: 19.07 → 19.10
David Ames (thedac)
Changed in charm-nova-compute:
milestone: 19.10 → 20.01
James Page (james-page)
Changed in charm-nova-compute:
milestone: 20.01 → 20.05
David Ames (thedac)
Changed in charm-nova-compute:
milestone: 20.05 → 20.08
James Page (james-page)
Changed in charm-nova-compute:
milestone: 20.08 → none
Trent Lloyd (lathiat) wrote :

This log information has historically been quite valuable for support. If we do modify this configuration, we should consider keeping a reasonable history, well beyond 7 days and going back at least a number of months.

Felipe Alencastro (falencastro) wrote :

This became a problem during the LMA -> COS transition: we have grafana-agents consuming more than 3000% CPU because /var/log/libvirt/qemu/ holds more than 120k files.

I believe this wasn't an issue with LMA because filebeat was set to scan a single level inside /var/log, whereas grafana-agent scans multiple levels.

The lastaction workaround in #2 would probably work, but unfortunately charm-logrotated only allows overriding 'rotate', 'interval' and 'size'. The ability to supply a custom template for a particular file would be nice.
