be more deterministic in log collection in /var/log and /etc/

Bug #1723182 reported by wes hayutin on 2017-10-12
This bug affects 2 people

Bug Description

Building off of what the puppet team [1] has done re: log collection, let's propose a list of files and directories to collect in logs for /etc and /var/log.


Ben Nemec (bnemec) wrote :

So what if we started tar.gz'ing /etc like we used to? That would eliminate the "uploading hundreds of small files" problem, and I feel like in general the logs are looked at more than the config files, so it would be a minor inconvenience to occasionally have to download and extract them instead of having them immediately available. By gzipping them locally we would also have less data to upload to the log server in the first place.
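For illustration, packing a whole directory into one archive could look roughly like the sketch below. It is a minimal example using Python's standard tarfile module; the function name archive_directory and the paths in the usage note are assumptions, not part of the actual collection tooling.

```python
import tarfile
from pathlib import Path


def archive_directory(src_dir, archive_path):
    """Pack src_dir into one gzip-compressed tarball and return the archive path.

    Hypothetical helper: archive_path should live outside src_dir so the
    archive does not try to include itself.
    """
    src = Path(src_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname makes the paths inside the archive start at the
        # directory name (e.g. "etc/...") instead of the absolute path.
        tar.add(str(src), arcname=src.name)
    return Path(archive_path)
```

Something like archive_directory("/etc", "/tmp/etc.tar.gz") would then produce a single file to upload, at the cost of having to download and extract it to read anything.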

David Moreau Simard (dmsimard) wrote :

@Ben, doing .tar.gz archives resolves the problem of size and number of files. However, it makes for a poor developer UX because you then need to download the archive, extract it locally, and then browse to the file you're interested in instead of just using your browser.

Ideally, we should only be recovering what we need. That means going from a blacklist approach [1] (i.e., recover /etc completely BUT exclude these dirs, or recover /var/log completely BUT exclude these dirs) to a whitelist approach [2] (recover /etc/nova, /etc/glance, /var/log/nova, /var/log/glance, etc.).
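To make the difference between the two approaches concrete, here is a small sketch. The function names and the excluded directories in BLACKLIST_EXCLUDES are hypothetical and chosen only for illustration; the whitelist entries are the ones named above.

```python
from pathlib import PurePosixPath

# Whitelist entries taken from the examples in this comment.
WHITELIST = ["/etc/nova", "/etc/glance", "/var/log/nova", "/var/log/glance"]

# Blacklist approach: collect whole roots, minus some excluded dirs.
BLACKLIST_ROOTS = ["/etc", "/var/log"]
BLACKLIST_EXCLUDES = ["/etc/selinux", "/var/log/journal"]  # hypothetical


def is_under(path, root):
    """True if path is root itself or lives somewhere below root."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


def collect_blacklist(paths, roots, excludes):
    """Keep everything under the roots except what is explicitly excluded."""
    return [
        p for p in paths
        if any(is_under(p, r) for r in roots)
        and not any(is_under(p, e) for e in excludes)
    ]


def collect_whitelist(paths, allowed):
    """Keep only what is explicitly listed."""
    return [p for p in paths if any(is_under(p, a) for a in allowed)]
```

With the blacklist, any new file that shows up under /etc or /var/log is collected until someone excludes it; with the whitelist, nothing new is collected until someone adds it to the list.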

We're not going to get the list of files and directories right the first time, but that's okay.
We can iterate and improve this list as we move along.

Only recovering what we are interested in will make more efficient use of our resources. There will be fewer files, they will be smaller, and this will have a positive impact on log collection performance (moving files around, compressing them, uploading them, etc.). Doing this also lets us keep individual gzipped files, which keeps developers in their browsers when browsing logs.
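As a rough sketch of what "individual gzipped files" means in practice, the snippet below compresses each collected file on its own while preserving the directory layout. The function name gzip_files_individually and the example paths are assumptions for illustration, not the existing collection code.

```python
import gzip
import shutil
from pathlib import Path


def gzip_files_individually(src_dir, dest_dir):
    """Write one .gz file per regular file in src_dir, mirroring its layout.

    Hypothetical helper: dest_dir should be outside src_dir. Returns the
    list of compressed files that were written.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    written = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue  # skip directories and anything that is not a file
        target = dest / path.relative_to(src)
        target = target.with_name(target.name + ".gz")
        target.parent.mkdir(parents=True, exist_ok=True)
        with open(path, "rb") as fin, gzip.open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(target)
    return written
```

For example, gzip_files_individually("/var/log/nova", "/tmp/collected/var/log/nova") would leave one compressed file per log, so each one can still be linked to and opened on its own.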


Excluding log files from collection will seriously affect the investigation of failures. Unlike with configuration files, an error can show up in any log file, and any log file could be helpful, especially in tripleo deployments.
I don't think we need the same policy for /var/log as we have for /etc, where there are a lot of services that are not even active. Every log file in /var/log means that the service actually ran during the deployment and could have affected it.

BTW, before moving to oooq we had both: all files flat in directories, and tar.gz files of each host's log dirs, like undercloud.tar.gz, overcloud-controller.tar.gz, etc.

Change abandoned by wes hayutin (<email address hidden>) on branch: master

Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
Ronelle Landy (rlandy) wrote :

Wes, does this review cover it?
Be more prescriptive in log collection list out the files and directories we collect for /var/log and /etc/

Merged today

Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
status: Triaged → Fix Released