Bug #1382515 “Logrotate incorrect configuration: hourly logrotat...” : Bugs : Fuel for OpenStack

Nastya Urlapova (aurlapova) on 2014-10-17

Changed in fuel:
milestone:	none → 6.0
Changed in mos:
milestone:	none → 6.0
Changed in fuel:
importance:	Undecided → High
Changed in mos:
importance:	Undecided → High

Matthew Mosesohn (raytrac3r) on 2014-10-17

Changed in fuel:
assignee:	nobody → Fuel Library Team (fuel-library)

Bogdan Dobrelya (bogdando) on 2014-10-17

Changed in mos:
status:	New → Invalid
Changed in fuel:
status:	New → Invalid

Tomasz 'Zen' Napierala (tzn) on 2014-10-17

Changed in mos:
importance:	High → Medium

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2014-10-17:

#1

By design, 20-fuel-*.conf entries *must* repeat the ones for 10-fuel-*.conf.
That is needed for two separate schedule plans for the same set of the logs - one daily based and one hourly based.
The error reported in bug is non fatal and does not prevent logs from being rotated and could be ignored

Nastya Urlapova (aurlapova) on 2014-10-17

Changed in fuel:
importance:	High → Medium

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-04:

#2

https://bugs.launchpad.net/fuel/+bug/1378327/comments/12

When running logrotate manually with this command
logrotate -f /etc/logrotate.conf
we can observe this error
error: 20-fuel-docker.conf:3 duplicate log entry for /var/log/kern.log

To avoid this:
add to /etc/logrotate.conf before(this is important) 'include /etc/logrotate.d' directive this line:
tabooext + .ignoredaily

After that
mv /etc/logrotate.d/20-fuel-docker.conf /etc/logrotate.d/20-fuel-docker.conf.ignoredaily

Now logrotate will ignore file with hourly log rotation options /etc/logrotate.d/20-fuel-docker.conf.ignoredaily, and we get rid of "duplicate log entry" message.

I think this modification should be added both to fuel master node and to rsyslog container.

Changed in fuel:
status:	Invalid → Confirmed

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-07:

#3

Lowered to low as this issue does not affect neither the deployment nor the operations

Changed in mos:
importance:	Medium → Low
Changed in fuel:
importance:	Medium → Low
milestone:	6.0 → 7.0
tags:	added: low-hanging-fruit

Bartłomiej Piotrowski (bpiotrowski) on 2015-04-07

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Bartlomiej Piotrowski (bpiotrowski)

Fuel Devops McRobotson (fuel-devops-robot) on 2015-04-09

no longer affects:	mos
Changed in fuel:
milestone:	7.0 → 6.1
status:	Confirmed → New

Vladimir Kuklin (vkuklin) on 2015-04-09

Changed in fuel:
status:	New → Won't Fix

Alexander Bozhenko (alexbozhenko) on 2015-04-12

summary:

- Logrotate incorrect configuration
+ Logrotate incorrect configuration: hourly logrotate is broken

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-12: Re: Logrotate incorrect configuration: hourly logrotate is broken

#4

Download full text (3.2 KiB)

We have different suffixes for rotated files in hourly and daily logrotate configs.
We have dateext config in the /etc/logrotate.conf, but we do not have that directive in /etc/logrotate.d/20-fuel-docker.conf
Since we are running
/usr/sbin/logrotate /etc/logrotate.d/20-fuel*.conf
every hour directly in cron, it leads to the situation, when we have rotated files with both suffixes:
-rw-r--r-- 1 root root 105M Apr 28 14:29 /var/log/docker-logs/dump.log
-rw-r--r-- 1 root root 106M Apr 28 13:29 /var/log/docker-logs/dump.log.1.gz
-rw-r--r-- 1 root root 211M Apr 20 01:06 /var/log/docker-logs/dump.log-20150428.gz
-rw-r--r-- 1 root root 106M Apr 28 12:29 /var/log/docker-logs/dump.log.2.gz
-rw-r--r-- 1 root root 106M Apr 28 11:30 /var/log/docker-logs/dump.log.3.gz
-rw-r--r-- 1 root root 106M Apr 28 10:29 /var/log/docker-logs/dump.log.4.gz

Despite I can't reproduce it in my lab, I'm sure that with this configuration hourly logrotate is be broken.
Proof:
By design we shouldn't have any files older that one hour, and bigger than 100MB, right?
Execute the command below on any snapshot from long-running environment.
find /path/to/snapshot/from/long_running_environment -type f | xargs du | sort -n
You will find a lot offiles >100MB and in the same time older that 1 hour.
I can provide such snapshots, if you need.

What is the reason to have two parallel logrotate configs for daily and hourly, 10-fuel-docker.conf and 20-fuel-docker.conf?
I don't think it is suggested use.

From man logrotate:
       hourly
              Log files are rotated every hour. Note that usually logrotate is configured to be run by cron
              daily. You have to change this configuration and run logrotate hourly to be able to really
              rotate logs hourly.

So, can we do the following?
1) Remove separate daily config:
rm /etc/logrotate.d/10-fuel-docker.conf

2) Remove executing of logrotate.d/20-fuel-docker.conf directly:
rm /etc/cron.hourly/logrotate

3) Move normal logrotate configuration to cron. hourly, as suggested in man
mv /etc/cron.daily/logrotate /etc/cron.hourly/

4) Change size to maxsize and add weekly to /etc/logrotate.d/20-fuel-docker.conf, so it will contain the following:
# rotate IF (log is older than one week AND bigger than 30M in the same time)
weekly
minsize 30M
# OR it is bigger than 100M, without regard the age
maxsize 100M

5) Add to /etc/logrotate.conf dateformat suffix with unix time in the end.
dateformat -%Y%m%d-%s
So, rotated files will look like:
dump.log-20150412-1428814861.gz
Or, remove dateext from /etc/logrotate.conf, and rotated files will look like:
dump.log.1.gz
This addition is necessary to successfully run hourly logrotation.
Without it we will have errors like:
"destination /var/log/verybig.201409.gz already exists, skipping rotation"

This config is simple and easier to manage.
" weekly (4 rotations)
empty log files are not rotated, only log files >= 31457280 bytes are rotated, log files >= 104857600 are
rotated earlier, old logs are removed"

Also, with this configuration logrotate -f /etc/logrotate.conf will work without errors.

Same should be applied to all nodes too.
I think all versions starting from this...

We have different suffixes for rotated files in hourly and daily logrotate configs.
We have dateext config in the /etc/logrotate.conf, but we do not have that directive in /etc/logrotate.d/20-fuel-docker.conf
Since we are running
 /usr/sbin/logrotate /etc/logrotate.d/20-fuel*.conf
 every hour directly in cron, it leads to the  situation, when we have rotated files with both suffixes:
-rw-r--r-- 1 root root 105M Apr 28 14:29 /var/log/docker-logs/dump.log
-rw-r--r-- 1 root root 106M Apr 28 13:29 /var/log/docker-logs/dump.log.1.gz
-rw-r--r-- 1 root root 211M Apr 20 01:06 /var/log/docker-logs/dump.log-20150428.gz
-rw-r--r-- 1 root root 106M Apr 28 12:29 /var/log/docker-logs/dump.log.2.gz
-rw-r--r-- 1 root root 106M Apr 28 11:30 /var/log/docker-logs/dump.log.3.gz
-rw-r--r-- 1 root root 106M Apr 28 10:29 /var/log/docker-logs/dump.log.4.gz

Despite I can't reproduce it in my lab, I'm sure that with this configuration hourly logrotate is be broken.
Proof:
By design we shouldn't have any files older that one hour, and bigger than 100MB, right?
Execute the command below on any snapshot from long-running environment.
find /path/to/snapshot/from/long_running_environment -type f | xargs du | sort -n
You will find a lot offiles >100MB and in the same time older that 1 hour.
I can provide such snapshots, if you need.

What is the reason to have two parallel logrotate configs for daily and hourly, 10-fuel-docker.conf and 20-fuel-docker.conf?
I don't think it is suggested use.

From man logrotate:
       hourly 
              Log files are rotated every hour. Note that usually logrotate is configured to be run by  cron
              daily.  You  have  to  change this configuration and run logrotate hourly to be able to really
              rotate logs hourly.

So, can we do the following?
1) Remove separate daily config:
rm /etc/logrotate.d/10-fuel-docker.conf

2) Remove executing of logrotate.d/20-fuel-docker.conf directly:
rm /etc/cron.hourly/logrotate

3) Move normal logrotate configuration to cron. hourly, as suggested in man
mv /etc/cron.daily/logrotate /etc/cron.hourly/

4) Change size to maxsize and add weekly to /etc/logrotate.d/20-fuel-docker.conf, so it will contain the following:
# rotate IF (log is older than one week AND bigger than 30M in the same time)
weekly
minsize 30M
# OR it is bigger than 100M, without regard the age
maxsize 100M

5) Add to /etc/logrotate.conf dateformat suffix with unix time in the end.
dateformat -%Y%m%d-%s
So, rotated files will look like:
dump.log-20150412-1428814861.gz
Or, remove dateext from /etc/logrotate.conf, and rotated files will look like:
dump.log.1.gz
This addition is necessary to successfully run hourly logrotation. 
Without it we will have errors like:
"destination /var/log/verybig.201409.gz already exists, skipping rotation"

This config is simple and easier to manage. 
" weekly (4 rotations)
empty log files are not rotated, only log files >= 31457280 bytes are rotated, log files >= 104857600 are
 rotated earlier, old logs are removed"

Also, with this configuration logrotate -f /etc/logrotate.conf will work without errors.

Same should be applied to all nodes too.
I think all versions starting from this commit are affected
https://github.com/stackforge/fuel-library/commit/84cd2624ec4292b9877c29e5be6f20946f8ef53a

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-12:

#5

delorian.sh Edit (759 bytes, text/x-sh)

Attached little script that can be used for tests.
It is destructive, so use only on lab environment that you are going to destory.
It is loop, that rewinds time, add 105M to /var/log/docker-logs/dump.log, and wait for logrotation cron job to execute on first minute of hour.

If you want to test weekly rotations, use
date -s "+10 days"

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-13:

#6

note, we use size 300M, so it is expected that there would be many files >100M not rotated. They are waiting to became 300M

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-13:

#7

There are too many suggested changes to be addressed within a single bug scope, please separate them and submit separate bugs, or submit a logging improvements bp with a spec to pass review process, as it suggests too much changes and a major redesign for logging

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-13:

#8

I agree we should use only 1 rotation file instead of two and replace the 'size' directive to 'maxsize', which would allow to combine both 'weekly' and 'maxsize' directives in one template.

Alexander Bozhenko (alexbozhenko) on 2015-04-13

tags:

added: customer-found logrotate

Revision history for this message

Raul Flores (raul-flores11) wrote on 2015-04-13:

#9

Checked my Fuel 6.0 master and it seems logrotate is configured for 100M

[root@fuel ~]# grep -Ri 300 /etc/logrotate.d/*
[root@fuel ~]# grep -Ri 100 /etc/logrotate.d/*
/etc/logrotate.d/20-fuel-docker.conf: # also rotate if 100M size have exceeded, should be size > minsize
/etc/logrotate.d/20-fuel-docker.conf: size 100M

Revision history for this message

Victor Denisov (vdenisov) wrote on 2015-04-13:

#10

Bogdan,

The description of the problem is pretty straight forward.
If you think it makes sense to split this ticket due to development needs - please do,
because we don't see how exactly it should be subdivided.

Please consider raising the priority as it is a customer found defect and causes a lot of container corruptions in
their environments.

Thanks,
Victor.

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-14:

#11

In addition to comment #9, to confirm that hourly logrotate is not working:
find fuel-snapshot-2015-04-11_00-50-26 -type f | xargs du | sort -n
552192 fuel-snapshot-2015-04-11_00-50-26/10.43.0.10/var/log/docker-logs/nailgun/api.log
2873112 fuel-snapshot-2015-04-11_00-50-26/10.43.0.10/var/log/docker-logs/remote/node-20.domain.tld/kernel.log

find fuel-snapshot-2015-04-11_22-30-40 -type f | xargs du | sort -n
160460 fuel-snapshot-2015-04-11_22-30-40/10.240.16.150/var/log/docker-logs/nginx/error_nailgun.log.1
190796 fuel-snapshot-2015-04-11_22-30-40/10.240.16.150/var/log/docker-logs/remote/node-1.dtw.os.vdc.local/neutron-server.log
219136 fuel-snapshot-2015-04-11_22-30-40/10.240.16.150/var/log/docker-logs/nginx/access_nailgun.log
300560 fuel-snapshot-2015-04-11_22-30-40/10.240.16.150/var/log/docker-logs/nginx/access_nailgun.log.1
344556 fuel-snapshot-2015-04-11_22-30-40/node-1.domain.local/var/log/ceph/ceph-mon.node-1.log

And to confirm that file is older that one hour:
[alex@localhost tmp] $ head -n1 fuel-snapshot-2015-04-11_22-30-40/10.240.16.150/var/log/docker-logs/nginx/access_nailgun.log
172.17.42.1 - - [04/Feb/2015:20:25:13 +0000] "GET /api/nodes/allocation/stats?_=1423019698392 HTTP/1.1" 502 575 "http://10.240.16.150:8000/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.91 Safari/537.36"
[alex@localhost tmp] $ tail -n1 fuel-snapshot-2015-04-11_22-30-40/10.240.16.150/var/log/docker-logs/nginx/access_nailgun.log
172.17.42.1 - - [11/Apr/2015:22:32:19 +0100] "GET /api/notifications?_=1428787811780 HTTP/1.1" 200 3435 "http://10.240.16.150:8000/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36"

Just pick any long-running env's snapshot and execute the same command.

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-14:

#12

In the given output, what is the du -h human readable file sizes?

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-14:

#13

Please also on affected master node provide an output for ls -ld /var/log/remote/* | awk '{ print $11 }' | sort | uniq -d

There is known issue with broken log rotation due to wrong symlinks created for some ".bak" directories https://bugs.launchpad.net/fuel/+bug/1428150

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-15:

#14

find fuel-snapshot-2015-04-11_00-50-26 -type f | xargs du -h | sort -h
37M fuel-snapshot-2015-04-11_00-50-26/10.43.0.10/var/log/docker-logs/nailgun/app.log
102M fuel-snapshot-2015-04-11_00-50-26/10.43.0.10/var/log/docker-logs/remote/node-35.domain.tld/kernel.log.1
114M fuel-snapshot-2015-04-11_00-50-26/10.43.0.10/var/log/docker-logs/remote/node-20.domain.tld/kernel.log.1
169M fuel-snapshot-2015-04-11_00-50-26/10.43.0.10/var/log/docker-logs/remote/node-35.domain.tld/kernel.log
540M fuel-snapshot-2015-04-11_00-50-26/10.43.0.10/var/log/docker-logs/nailgun/api.log
2.8G fuel-snapshot-2015-04-11_00-50-26/10.43.0.10/var/log/docker-logs/remote/node-20.domain.tld/kernel.log

find fuel-snapshot-2015-04-11_22-30-40 -type f | xargs du -h | sort -h
...
214M fuel-snapshot-2015-04-11_22-30-40/10.240.16.150/var/log/docker-logs/nginx/access_nailgun.log
294M fuel-snapshot-2015-04-11_22-30-40/10.240.16.150/var/log/docker-logs/nginx/access_nailgun.log.1
337M fuel-snapshot-2015-04-11_22-30-40/node-1.domain.local/var/log/ceph/ceph-mon.node-1.log

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-15:

#15

Bogdan,
in one of the snapshots above .bak directories are present.
What I saw on customer's environment is exactly the bug you mentioned:
https://bugs.launchpad.net/fuel/+bug/1428150

>> Please also on affected master node provide an output for...
I will post when I get it from the customer.

I propose to increase priority of this to high, to prevent shipping of the release with this bug.

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-15:

#16

The bug https://bugs.launchpad.net/fuel/+bug/1428150 is already high. And this has main impact on broken state of log rotation.

Please also note, what we should not put rotation into /etc/cron.hourly, we should rather use a separate cron job running, for example, each 10 minutes, otherwise the bug https://bugs.launchpad.net/bugs/1440925 would arise when debug is ON and there are many nodes producing logs

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-15:

#17

This bug has impact on rotation too, described in my comment, #4. It is not related to .bak dirs.

"We have dateext config in the /etc/logrotate.conf, but we do not have that directive in /etc/logrotate.d/20-fuel-docker.conf
Since we are running
/usr/sbin/logrotate /etc/logrotate.d/20-fuel*.conf
every hour directly in cron, it leads to the situation, when we have rotated files with both suffixes:
-rw-r--r-- 1 root root 105M Apr 28 14:29 /var/log/docker-logs/dump.log
-rw-r--r-- 1 root root 106M Apr 28 13:29 /var/log/docker-logs/dump.log.1.gz
-rw-r--r-- 1 root root 211M Apr 20 01:06 /var/log/docker-logs/dump.log-20150428.gz
-rw-r--r-- 1 root root 106M Apr 28 12:29 /var/log/docker-logs/dump.log.2.gz
-rw-r--r-- 1 root root 106M Apr 28 11:30 /var/log/docker-logs/dump.log.3.gz
-rw-r--r-- 1 root root 106M Apr 28 10:29 /var/log/docker-logs/dump.log.4.gz
"

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-15:

#18

I raised this to high as there are many logrotate related bugs, for examle https://bugs.launchpad.net/fuel/+bug/1440925, which impactgs scale lab and large deployments.

So we should work on optimizing the logrotate configuration, as it was suggested at #4 at #8, for the 6.1 release as well

summary:

- Logrotate incorrect configuration: hourly logrotate is broken
+ Logrotate incorrect configuration: hourly logrotate is badly configured

Bartłomiej Piotrowski (bpiotrowski) on 2015-04-15

Changed in fuel:
status:	Won't Fix → Confirmed
importance:	Low → High

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-15:

#19

Let me summarize the scope of changes. I suggest us to address the following improvements (should be done both for *fuel-docker* and *fuel* rotation templates, which are for master node and Openstack nodes):
1) tabooext + .ignoredaily from #2 for overlapping files
2) single rotation config file with weekly and maxsize / minsize parameters instead of two files - points (1) and (4) from comment #4.
As I mentioned, points (2), (3) should not be addressed due to https://bugs.launchpad.net/fuel/+bug/1440925 related concerns. Otherwise we should:
3) separate */3 cron job (every 20 minutes) which calls /etc/cron.hourly/logrotate script directly (just rename it to /etc/cron.d/fuel-logrotate).
4) Use dateformat suffix, see (5) from #4
5) make maxsize value to depend on the number of hiera(nodes) size - we want less maxsize value for more nodes

no longer affects:	fuel/7.0.x
Changed in fuel:
status:	Confirmed → Triaged

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-15:

#20

If we are having
2) single rotation config
we don't need
1) tabooext + .ignoredaily

3) I think it is safier to have only one cronjob that is running logrotate, so if we want to run it every to minutes, we should run logrotate /etc/logrotate.conf, without running it hourly.

In 5), will we dynamically change maxsize when number of nodes is changing?

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-16:

#21

We should not modify the standard logrotate schedule for /etc/logrotate.conf. This might impact host OS in an unexpected way.

Yes, puppet apply should automatically update logrotate config for nodes. But it it not clear how to deal with this at master node.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-04-20: Fix proposed to fuel-library (master)

#22

Fix proposed to branch: master
Review: https://review.openstack.org/175271

Changed in fuel:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) on 2015-04-20

Changed in fuel:
assignee:	Bartlomiej Piotrowski (bpiotrowski) → Bogdan Dobrelya (bogdando)

Bartłomiej Piotrowski (bpiotrowski) on 2015-04-20

Changed in fuel:
assignee:	Bogdan Dobrelya (bogdando) → Bartlomiej Piotrowski (bpiotrowski)

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-21:

#23

Let's do calculations:
There are around 80 files in fuel master node in /var/log/remote/fqdn/
Let's consider WORST case scenario, speed of writing to _every_ file is 100MB/hour.
Compressed 100mb log weights 4MB.
So with 4 rotations we will have 100*80 + (4*80)*4 = 9280 MB for each node.
So, for 200 nodes you need to have 2TB in /var/log/

>> We should not modify the standard logrotate schedule for /etc/logrotate.conf. This might impact host OS in an unexpected way.

As for me what is really dangerous and not supposed it to have 2 logrtate schedules.
They will meet each other every day dayly and 15-minutes schedules, and try to write to /var/lib/logrotate.status at the same time.
If we really need to run logrotate every 15 min we need to remove script from /etc/cron.daily/logrotate and run it with our schedule in /etc/cron.d, but that should be the only one logrotate run.

But do we really need to run it every 15 min?
100MB of logs is ~ 765153 lines of text. Question to mos-scale:
Have you seen speed more than 100MB per hour, even in debug mode?
I think one hour would be enough. In that case just:
mv /etc/cron.daily/logrotate /etc/cron.hourly

And this way is adviced in man. Let me quote again:
hourly Log files are rotated every hour. Note that usually logrotate is configured to be run by cron
daily. You have to change this configuration and run logrotate hourly to be able to really
rotate logs hourly.

Revision history for this message

Bartłomiej Piotrowski (bpiotrowski) wrote on 2015-04-21:

#24

There will be no files overlap, so I don't see any issue with running logrotate both daily and every 15 minutes.

15 minutes schedule give us clear conscience in case of sudden increase of logs.

The man page doesn't recommend to use hourly rotation, just describes steps required to achieve it.

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-21:

#25

@Alexandr, the calculations you provided should be done as pre-deploy planning. And I see nothing strange here, just make sure you will provide 2Tb for master node's /var/log

Revision history for this message

Tomasz 'Zen' Napierala (tzn) wrote on 2015-04-21:

#26

So we need this info for deployment guide, is it there?

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-21:

#27

If it is not there, when we have a docs bug to be submitted and addressed separately

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-21:

#28

Related docs bug https://bugs.launchpad.net/fuel/+bug/1446668

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-22:

#29

@Bartlomiej
>>>There will be no files overlap, so I don't see any issue with running logrotate both daily and every 15 minutes.
Once a day daily logrotate will start, and 15-minutes rotation will start at the same time.
We need to use tabbotext in /etc/logrotate.cong to prevent it.

>>>15 minutes schedule give us clear conscience in case of sudden increase of logs.
15 is magic number, why it is better then 5, or 150? We need to research maximum speed of logs writing under high load when debug log is enabled, and choose rotation frequency according to it.
If we have maxsize 100MB, 15 mins rotation and 100MB per 15 mins speed, our logs will gone in one hour(4 rotations).
Is it really what we need?
What if rotation will last longer than 15 mins?

>>>The man page doesn't recommend to use hourly rotation, just describes steps required to achieve it.
Having one hourly rotation is easier to manage and understand than daily + 15-minutes rotation.

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-22:

#30

Also, need to modify this part in the docs:
http://docs.mirantis.com/openstack/fuel/fuel-6.0/operations.html?highlight=logrotate#fuel-master-node-log-rotation-template
when the change will be implemented.

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-22:

#31

@Alexander, thank you for valuable feedback and good concerns!

>Once a day daily logrotate will start, and 15-minutes rotation will start at the same time. We need to use tabbotext in /etc/logrotate.cong to prevent it.

Could you please clarify the exact changes we have to make in the logrotate configuration?
Do you tink we should also use a lock file in order to prevent overlapping with itself of our custom rotation job scheduled to be run every 15 minutes? I believe we already have such lock files for the nailgun agent, so we could reuse this approach here as well.

>15 minutes schedule give us clear conscience in case of sudden increase of logs. 15 is magic number, why it is better then 5, or 150? We need to research maximum speed of logs writing under high load when debug log is enabled, and choose rotation frequency according to it.

What exactly do you suggest we should fix here? You can never predict this, right? Configuration management cannot predict this and cannot update this as we don't use puppet agents.

>If we have maxsize 100MB, 15 mins rotation and 100MB per 15 mins speed, our logs will gone in one hour(4 rotations).
Is it really what we need?

What do you suggest? That is why originally there were two different files so we could configure two different "keep" policies for size based and time based templates.

> What if rotation will last longer than 15 mins?

Please see my comment about lock file. Perhaps we should implement it inside the job which calls logrotate and exit if it is already running and there is a lockfile exist. Easy thing.

>Having one hourly rotation is easier to manage and understand than daily + 15-minutes rotation.

I agree, but as I mentioned, one hour delay is too high and logs could got overwhelmed, see related bug https://bugs.launchpad.net/fuel/+bug/1440925

Revision history for this message

Bartłomiej Piotrowski (bpiotrowski) wrote on 2015-04-22:

#32

We can increase the number of logs to keep, albeit it will affect how much space you need to allocate just for logs.

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-22:

#33

>>>Could you please clarify the exact changes we have to make in the logrotate configuration?
Do you tink we should also use a lock file in order to prevent overlapping with itself of our custom rotation job scheduled to be run every 15 minutes? I believe we already have such lock files for the nailgun agent, so we could reuse this approach here as well.
Add to /etc/logrotate.conf
tabooext + .ignoredaily
and
mv /etc/logrotate.d/fuel.conf /etc/logrotate.d/fuel.conf.ignoredaily
This will lead to daily logrotate job, that run all configs from /etc/logrotate.d will ignore fuel.conf.ignoredaily file, that we run on separate schedule.

>>>Do you tink we should also use a lock file in order to prevent overlapping with itself of our custom rotation job scheduled to be run every 15 minutes? I believe we already have such lock files for the nailgun agent, so we could reuse this approach here as well.
Probably, but in that case we should check if old logrotate job is hung for several hours, and kill it in that case.

>>>What exactly do you suggest we should fix here? You can never predict this, right? Configuration management cannot predict this and cannot update this as we don't use puppet agents.
I suggest to choose the values based on research.
On scale lab with debug logging turned max speed that I saw so far is 239M per hour in neutron-server.log
So, I agree that hourly won't be enough.

>>>What do you suggest? That is why originally there were two different files so we could configure two different "keep" policies for size based and time based templates.
I think there are only two possible options: either you allocate more free space, or your logs will be discarded more frequently. Two different policies is will not work if both hourly and daily schedules will use the same suffix: Logrotate will search for old files by pattern, and remove those who is older than "rotate" count.

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-22:

#34

Bogdan, another question:
We have same 10-fuel-docker.conf and 20-fuel-docker.conf in rsyslog docker container.
Do we really need them there? Aren't all logs rotated by master node?

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-22:

#35

On 200 node fuel scale lab logrotate lasted for 23 minutes:
tail /var/log/cron:
Apr 22 22:01:02 fuel run-parts(/etc/cron.hourly)[9423]: starting logrotate
Apr 22 22:10:01 fuel CROND[10453]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Apr 22 22:20:01 fuel CROND[11500]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Apr 22 22:23:08 fuel run-parts(/etc/cron.hourly)[11847]: finished logrotate

UPD: new speed record is 253M per 30 mins:
ls -lah /var/log/docker-logs/remote/10.20.1.127/neutron-server.log;
head -n1 /var/log/docker-logs/remote/10.20.1.127/neutron-server.log |cut -d':' -f1,2,3,4;
tail -n1 /var/log/docker-logs/remote/10.20.1.127/neutron-server.log | cut -d':' -f1,2,3,4
-rw-r----- 1 root adm 253M Apr 22 22:33 /var/log/docker-logs/remote/10.20.1.127/neutron-server.log
2015-04-22T22:01:04.943260+00:00 debug
2015-04-22T22:33:32.294769+00:00 info

So, if debug logging is turned on user needs _a_lot_of space on fuel master node.

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-22:

#36

Quick and dirty test if logrotate work:
Find biggest file, and check first line and last line:
biggest_file=$(find /var/log/ -type f | xargs du -h | sort -h | tail -n 1 | cut -f2);ls -lah $biggest_file ;head -n1 $biggest_file | head -c35; echo "";tail -n1 $biggest_file | head -c35; echo"";
If it is older than you rotation schedule, and if it is bigger then maxsize, than logrotate is not working correctly.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-04-22: Fix merged to fuel-library (master)

#37

Reviewed: https://review.openstack.org/175271
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=eb3e29caec4c2979d22e46a470610e8f6b8ac751
Submitter: Jenkins
Branch: master

commit eb3e29caec4c2979d22e46a470610e8f6b8ac751
Author: Bartłomiej Piotrowski <email address hidden>
Date: Fri Apr 17 15:27:29 2015 +0200

Rework logrotate configuration

    - instead of rotating logs daily and weekly, run the script every 15
      minutes
    - merge weekly and daily configuration files
    - rotate logs basing on two criteria: age (a week; only if file is
      bigger than 10M on master or 5MB on remote nodes) and size (rotate the file
      always if its size exceedes 100M on master or 20M on remote nodes, regardless
      its age)
    - use dateformat to make rotated logs' filenames more readable;
      as logrotate supports only year, month, day and unix epoch, use them
      all
    - fix a lot of lint complaints, including aligments, curly braces and
      proper quoting; give logrotate related files more intuitive names
    - dump.log has been renamed to shotgun.log (LP#1318516)
    - check if there is no logrotate instances running before executing
      cronjob

DocImpact: ops guide

    Change-Id: Ie7040ebcdc1a3329ebb7507807e5c27f0c87b484
    Closes-Bug: 1382515
    Closes-Bug: 1446790

Changed in fuel:
status:	In Progress → Fix Committed

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-23:

#38

And another thing:
On the nodes we have packages, that put their files to /etc/logrotate.d
rpm -qf /etc/logrotate.d/openstack-nova
openstack-nova-common-2014.2-fuel6.0.mira19.noarch

And we will have another copy of rotation pattern in 10-fuel.conf.
So, need to do the same on the nodes:
Add to /etc/logrotate.conf
dateformat -%Y%m%d-%s
tabooext + .ignoredaily
and rename /etc/logrotate.d/10-fuel.conf
to make daily logrotate not to read that config.

>>Do you think we should also use a lock file in order to prevent overlapping with itself..
We can't predict on what minute daily rotation will start, since it is managed by anacron, and it has random delay for jobs start minute. So it is not possible modify schedule of our logrotate schedule not to meet daily.
So, lock file or pgrep solution probably is the only one.
Although this means we may skip 15-minutes rotation if logrotate daily is already running.

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-23:

#39

Also, to prevent this
https://bugs.launchpad.net/ubuntu/+source/logrotate/+bug/1278193
on ubuntu need to add this line:
su root syslog
to /etc/logrotate.d/10-fuel.conf

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-23:

#40

Also, on scale lab, where 14.04 is deployed, this dir exists:
root@node-1:/etc/logrotate.d# ls -1 /var/log/upstart/
cinder-api.log
cinder-backup.log
cinder-scheduler.log
cinder-volume.log
glance-api.log
glance-registry.log
heat-api-cfn.log
heat-api-cloudwatch.log
heat-api.log
keystone.log
neutron-server.log
nova-api.log
nova-cert.log
nova-conductor.log
nova-consoleauth.log
nova-novncproxy.log
nova-scheduler.log
systemd-logind.log

These files are huge and not covered by any rotation patter. Who is familiar with upstart and knows how to get rid of them?
https://ask.openstack.org/en/question/28968/logs-exist-in-both-varlognova-and-varlogupstart/

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-23:

#41

Looks like there is option in upstart that can help us get rid of /var/log/upstart/
https://wiki.ubuntu.com/PrecisePangolin/ReleaseNotes/TechnicalOverviewUpstart#console_log

Revision history for this message

Bartłomiej Piotrowski (bpiotrowski) wrote on 2015-04-23:

#42

Thanks Alexander. I'll open a separate issue for issues you mentioned. We certainly need to fix these.

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-23:

#43

Thank you for you collaboration, Bartlomiej.

I made changes similar to what I described and left it overnight on 200 node scale lab. Let's see tomorrow the results.

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-23:

#44

Thank you for the test case in #36, here is a system tests bug for this https://bugs.launchpad.net/fuel/+bug/1447488

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-23:

#45

>Bogdan, another question:
>We have same 10-fuel-docker.conf and 20-fuel-docker.conf in rsyslog docker container.
>Do we really need them there? Aren't all logs rotated by master node?

We don't have a cron running inside this container, so we don't need them as well

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-24:

#46

UPD about #40 and #41
There is
/etc/logrotate.d/upstart:
/var/log/upstart/*.log {
        daily
        missingok
        rotate 7
        compress
        notifempty
        nocreate
}

But it is daily rotation, so we still need to turn off loggingin /var/log/upstart/ for openstack components.

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-24:

#47

UPD from 200 node scale lab(where debug logging is turned on).
From /var/log/cron:
Apr 22 22:01:02 fuel run-parts(/etc/cron.hourly)[9423]: starting logrotate
Apr 22 22:23:08 fuel run-parts(/etc/cron.hourly)[11847]: finished logrotate
Apr 22 23:01:01 fuel run-parts(/etc/cron.hourly)[15818]: starting logrotate
Apr 22 23:45:52 fuel run-parts(/etc/cron.hourly)[20496]: finished logrotate
Apr 23 00:01:01 fuel run-parts(/etc/cron.hourly)[22010]: starting logrotate
Apr 23 00:26:06 fuel run-parts(/etc/cron.hourly)[24608]: finished logrotate

In bad case logrotate lasted for 45 minutes. What he is doing all the time is checking, rotating and compressing(what is giving significant load on disk) 4818 files in /var/log/remote.

So 15minutes schedule will introduce more problems then it is solving: saving space.
I think we should go back to 1h schedule, and just add notes in documentation that with debug logging enabled you need to have a lot of free space.

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-24:

#48

Also, my calculations in #23 was wrong.
Not every file has equal speed of growing: almost all files grow slow, but several of them grow very fast.
These a 10 biggest dir every hour from scale lab.
http://paste.openstack.org/show/205480/
Note that 3 biggest are controllers, and they no bigger than 2.1GB.
Computes are less than 400MB.

So, let's say 3GB for logs per node on fuel master is ought to be enough for anybody.

(probably, ceilometer and mongo are more verbose and are writing their logs faster. It is need to be tested, but I'm not allowed to change scale lab deploy)

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-29:

#49

Is anybody still working on the bug?
Was the patch verified on real installation?
What about concerns in comments #29 - #48?

Revision history for this message

Bogdan Dobrelya (bogdando) wrote on 2015-04-29:

#50

@Alexander, bugs in "Fix commited" state are normally getting verified by the Fuel QA team, eventually.

Revision history for this message

Bartłomiej Piotrowski (bpiotrowski) wrote on 2015-04-29:

#51

Alexander, I have submitted new issue with summary of problems you have found.

Please take a look at https://bugs.launchpad.net/fuel/+bug/1450039 and add anything I could miss.

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-29:

#52

@Bogdan, is this activity tracked? What if they forgot to verify it, and we will ship change, that can be not working?

Revision history for this message

Alexander Bozhenko (alexbozhenko) wrote on 2015-04-30:

#53

I want to add that "Fix Committed" status of the bug may confuse customers.
We are referring to the bug in zendesk tickets, and if customer see "Fix Committed" he may try to apply the patch by himself.
But the patch is not in ready state yet.
Please focus on finishing this improvement. We continue to have flow of tickets with filled up disks.
We want to send preventive technical bulletin when 100% working solution will come.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-05-07: Related fix proposed to fuel-library (master)

#54

Related fix proposed to branch: master
Review: https://review.openstack.org/180892

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-05-07: Related fix merged to fuel-library (master)

#55

Reviewed: https://review.openstack.org/180892
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=86947bb9df42b7f3bd8625be456ffe2dafdc50ee
Submitter: Jenkins
Branch: master

commit 86947bb9df42b7f3bd8625be456ffe2dafdc50ee
Author: Bartłomiej Piotrowski <email address hidden>
Date: Thu May 7 08:03:50 2015 +0200

Polish logrotate configuration

    * move custom logrotate scripts from anacron to openstack module
    * use tabooext to ensure there is no overlap between Fuel's cronjob and
      the daily one
    * run logrotate twice an hour
    * make sure that logs are saved as root:syslog due to bug in Ubuntu
      package[1]
    * fix path to logrotate configuration file in Fuel's logrotate wrapper

[1] https://bugs.launchpad.net/ubuntu/+source/logrotate/+bug/1278193

    Change-Id: Id161e8074c7fe80d69eb831db74b844094f327e7
    Closes-Bug: 1450039
    Closes-Bug: 1450425
    Related-Bug: 1382515

Revision history for this message

Vladimir Khlyunev (vkhlyunev) wrote on 2015-05-13:

#56

build_id: 2015-05-08_11-08-49
build_number: '395'

The "/usr/bin/fuel-logrotate" script is not working. I assume that this bug comes from the previous commit.

[root@nailgun ~]# dd if=/dev/urandom of=/var/log/fuelmenu.log bs=1024K count=150
150+0 records in
150+0 records out
157286400 bytes (157 MB) copied, 24.5792 s, 6.4 MB/s
[root@nailgun ~]# ls -ahl /var/log/fuelmenu*
-rw-r--r-- 1 root root 150M May 13 06:32 /var/log/fuelmenu.log
[root@nailgun ~]# /usr/bin/fuel-logrotate # instant exit, no actions
[root@nailgun ~]# echo $?
127
[root@nailgun ~]# ls -ahl /var/log/fuelmenu*
-rw-r--r-- 1 root root 150M May 13 06:32 /var/log/fuelmenu.log

The log file was not rotated. Then I run logrotate manually:

[root@nailgun ~]# /usr/sbin/logrotate /etc/logrotate.d/fuel.nodaily
[root@nailgun ~]# echo $?
0
[root@nailgun ~]# ls -ahl /var/log/fuelmenu*
-rw-r--r-- 1 root root 0 May 13 06:34 /var/log/fuelmenu.log
-rw-r--r-- 1 root root 151M May 13 06:32 /var/log/fuelmenu.log.1.gz

And the file was archived as expected

Vladimir Khlyunev (vkhlyunev) on 2015-05-13

Changed in fuel:
status:	Fix Committed → Confirmed

Bartłomiej Piotrowski (bpiotrowski) on 2015-05-13

Changed in fuel:
status:	Confirmed → In Progress

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-05-13: Fix proposed to fuel-library (master)

#57

Fix proposed to branch: master
Review: https://review.openstack.org/182579

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-05-13: Fix merged to fuel-library (master)

#58

Reviewed: https://review.openstack.org/182579
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=c8702d93371e138e8ed362cf25e959ce960d79f3
Submitter: Jenkins
Branch: master

commit c8702d93371e138e8ed362cf25e959ce960d79f3
Author: Bartłomiej Piotrowski <email address hidden>
Date: Wed May 13 10:54:20 2015 +0200

Pass -x to pgrep to ensure strict pattern matching

    Without the -x flag, 'pgrep logrotate' matches both 'logrotate' and
    'fuel-logrotate', causing the script to exit too soon to do the actual
    log rotation.

Change-Id: Ie836a4aa65e4a4c19c2ad7540a0fb0572d14f846
Closes-Bug: 1382515

Changed in fuel:
status:	In Progress → Fix Committed

Sergey Novikov (snovikov) on 2015-06-01

tags:

added: on-verification

Revision history for this message

Sergey Novikov (snovikov) wrote on 2015-06-02:

#59

Verified on fuel-6.1-478-2015-05-28_20-55-26.iso.

Steps to verify:
    1. Deploy cluster
    2. Run OSTF
    3. Check that dump.log had been renamed to shotgun.log
    4. dd if=/dev/urandom of=/var/log/fuelmenu.log bs=1024K count=110
    5. Run fuel-logrotate script
    6. Check that logs were rotated
    7. dd if=/dev/urandom of=/var/log/fuelmenu.log bs=1024K count=110
    8. Wait
    9. Check that logs were rotated

tags:	removed: on-verification
Changed in fuel:
status:	Fix Committed → Fix Released

Bartłomiej Piotrowski (bpiotrowski) on 2015-07-07

no longer affects:

fuel/6.0.x

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-07-07: Related fix proposed to fuel-library (stable/6.0)

#60

Related fix proposed to branch: stable/6.0
Review: https://review.openstack.org/199026

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-07-07: Related fix merged to fuel-library (stable/6.0)

#61

Reviewed: https://review.openstack.org/199026
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=cf13078c4fe492ea13a6698b8ed765bc1a2cdb82
Submitter: Jenkins
Branch: stable/6.0

commit cf13078c4fe492ea13a6698b8ed765bc1a2cdb82
Author: Bartłomiej Piotrowski <email address hidden>
Date: Tue Jul 7 11:12:53 2015 +0200

Backport logrotate changes from 6.1

    * use tabooext to ensure no overlap between Fuel's cron job and the
      system one
    * run logrotate twice an hour
    * merge weekly and daily configuration files
    * rotate logs basing on two criteria: age and size
    * use dateformat to make rotated logs' filename more readable
    * enable delayed compression to avoid race condition between
      fuel-logrotate and regular logrotate
    * combine CentOS and Ubuntu logrotate wrappers to a single file

    Related-Bug: 1382515
    Related-Bug: 1450039
    Related-Bug: 1461400
    Closes-Bug: 1472179
    Change-Id: I58caf3c93943af818a03e07d534ff8c42a63d2cb

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2016-01-05: Related fix proposed to fuel-library (stable/5.1)

#62

Related fix proposed to branch: stable/5.1
Review: https://review.openstack.org/263639

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2017-10-25: Change abandoned on fuel-library (stable/5.1)

#63

Change abandoned by Tony Breeds (<email address hidden>) on branch: stable/5.1
Review: https://review.openstack.org/263639
Reason: This branch (stable/5.1) is at End Of Life

Fuel for OpenStack

Logrotate incorrect configuration: hourly logrotate is badly configured

Bug Description

Duplicates of this bug

Other bug subscribers

Bug attachments

Remote bug watches