fuel_logrotate destroys logging for docker containers

Bug #1606209 reported by Nadezhda Kabanova
This bug report is a duplicate of: Bug #1597352: varlog full on fuel / docker.
This bug affects 1 person
Affects             Status  Importance  Assigned to      Milestone
Fuel for OpenStack  New     Undecided   Unassigned
8.0.x               New     Undecided   MOS Maintenance
Mitaka              New     Undecided   Unassigned
Newton              New     Undecided   Unassigned

Bug Description

Detailed bug description:
Currently there are two sets of logrotate rules that manage log rotation for the Docker containers.
/etc/logrotate.d/fuel.nodaily on the host OS includes the following patterns:
"/var/log/nailgun-agent.log"
"/var/log/nailgun/*.log"
"/var/log/nginx/*.log"
"/var/log/ostf*.log"
"/var/log/rabbitmq/*"
 and the rules are:
  copytruncate
  compress
  missingok
  notifempty
  dateformat -%Y%m%d-%s
  rotate 4
  weekly
  minsize 10M
  maxsize 100M

Inside the containers there is a separate logrotate config for each container. As an example, here is the config for nginx:
# cat /etc/logrotate.d/nginx
    create 0644 nginx nginx
    daily
    rotate 10
    missingok
    notifempty
    compress
    sharedscripts
    postrotate
        /bin/kill -USR1 `cat /run/nginx.pid 2>/dev/null` 2>/dev/null || true
    endscript
}

As a result, nginx fails to write to the proper log. It keeps writing to a file that has already been rotated and deleted, and it holds file descriptors for files that no longer exist. This is not a big problem as long as these files are small. A customer hit the problem when du /var/log showed 14 GB used while df /var/log showed 430 GB used, and the largest nginx log held more than 400 GB.
As a workaround, we can just restart the affected container.
I think the solution could be simply to remove from /etc/logrotate.d/fuel.nodaily all patterns that cover container logs which are also managed by the logrotate configs inside the containers.
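
The suggested fix can be sketched as follows. This is only an illustration under stated assumptions: drop_pattern is a hypothetical helper, the sample file stands in for /etc/logrotate.d/fuel.nodaily, and it assumes each pattern sits alone on its own line in double quotes, as listed in this report. Only the nginx pattern is removed, because nginx is the only container config shown here; other patterns may overlap as well.

```shell
#!/bin/sh
# Hypothetical helper: print a logrotate config with one quoted pattern line removed.
# Assumes one double-quoted pattern per line, as the pattern list in this report shows.
drop_pattern() {
    config=$1
    pattern=$2
    # -x: match the whole line, -F: treat the pattern as a fixed string, -v: keep the rest
    grep -vxF -- "\"$pattern\"" "$config"
}

# Sample standing in for /etc/logrotate.d/fuel.nodaily (patterns copied from this report).
sample=$(mktemp)
cat > "$sample" <<'EOF'
"/var/log/nailgun-agent.log"
"/var/log/nailgun/*.log"
"/var/log/nginx/*.log"
"/var/log/ostf*.log"
"/var/log/rabbitmq/*"
EOF

# Remove the pattern that is also rotated by the logrotate config inside the nginx container.
trimmed=$(drop_pattern "$sample" "/var/log/nginx/*.log")
printf '%s\n' "$trimmed"
rm -f "$sample"
```

Running the helper against a copy first, and reviewing the result before replacing the real file, seems safer than editing the host config in place.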

Expected result:
[root@fuel ~]# find /proc/*/fd -ls | grep '(deleted)' | sort -u
[root@fuel ~]#

[root@fuel ~]# ps -ef | grep nginx
root 25034 22535 0 10:03 pts/11 00:00:00 grep nginx
root 25129 1222 0 Jul22 ? 00:00:02 docker start -a fuel-core-7.0-nginx
root 25654 25118 0 Jul22 pts/8 00:00:00 nginx: master process nginx -g daemon off;
496 25655 25654 0 Jul22 pts/8 00:00:10 nginx: worker process

[root@fuel ~]# ls -l /proc/25654/fd (some output omitted)
total 0
l-wx------ 1 root root 64 Jul 25 09:56 4 -> /var/log/nginx/error.log
l-wx------ 1 root root 64 Jul 25 09:56 5 -> /var/log/nginx/access.log
l-wx------ 1 root root 64 Jul 25 09:56 6 -> /var/log/nginx/access_repo.log
l-wx------ 1 root root 64 Jul 25 09:56 7 -> /var/log/nginx/error_repo.log
l-wx------ 1 root root 64 Jul 25 09:56 8 -> /var/log/nginx/access_nailgun.log
l-wx------ 1 root root 64 Jul 25 09:56 9 -> /var/log/nginx/error_nailgun.log
From the nginx container shell:
[root@fuel /]# ls -l /var/log/nginx/ (some output omitted)
-rw-r--r-- 1 root root 34895 Jul 23 03:14 access_nailgun.log
-rw-r--r-- 1 root root 8951 Jul 22 18:06 access_nailgun.log.1
-rw-r--r-- 1 root root 38951 Jul 15 16:18 access_nailgun.log.2.gz
-rw-r--r-- 1 root root 12626 Jul 23 03:13 access_nailgun.log.3.gz
-rw-r--r-- 1 root root 7823 Jul 23 03:14 error_nailgun.log
-rw-r--r-- 1 root root 6743 Jul 22 18:06 error_nailgun.log.1
-rw-r--r-- 1 root root 9440 Jul 15 16:18 error_nailgun.log.2.gz
-rw-r--r-- 1 root root 18770 Jul 23 03:13 error_nailgun.log.3.gz

Actual result:
[root@fuel ~]# find /proc/*/fd -ls | grep '(deleted)' | sort -u
2046143 0 l-wx------ 1 root root 64 Jul 25 09:56 /proc/25654/fd/8 -> /var/log/nginx/access_nailgun.log-20160723\ (deleted)
2046144 0 l-wx------ 1 root root 64 Jul 25 09:56 /proc/25654/fd/9 -> /var/log/nginx/error_nailgun.log-20160723\ (deleted)
2046155 0 l-wx------ 1 496 nginx 64 Jul 25 09:56 /proc/25655/fd/8 -> /var/log/nginx/access_nailgun.log-20160723\ (deleted)
2046156 0 l-wx------ 1 496 nginx 64 Jul 25 09:56 /proc/25655/fd/9 -> /var/log/nginx/error_nailgun.log-20160723\ (deleted)

[root@fuel ~]# ps -ef | grep nginx
root 25034 22535 0 10:03 pts/11 00:00:00 grep nginx
root 25129 1222 0 Jul22 ? 00:00:02 docker start -a fuel-core-7.0-nginx
root 25654 25118 0 Jul22 pts/8 00:00:00 nginx: master process nginx -g daemon off;
496 25655 25654 0 Jul22 pts/8 00:00:10 nginx: worker process

[root@fuel ~]# ls -l /proc/25654/fd (some output omitted)
total 0
l-wx------ 1 root root 64 Jul 25 09:56 4 -> /var/log/nginx/error.log
l-wx------ 1 root root 64 Jul 25 09:56 5 -> /var/log/nginx/access.log
l-wx------ 1 root root 64 Jul 25 09:56 6 -> /var/log/nginx/access_repo.log
l-wx------ 1 root root 64 Jul 25 09:56 7 -> /var/log/nginx/error_repo.log
l-wx------ 1 root root 64 Jul 25 09:56 8 -> /var/log/nginx/access_nailgun.log-20160723 (deleted)
l-wx------ 1 root root 64 Jul 25 09:56 9 -> /var/log/nginx/error_nailgun.log-20160723 (deleted)
From the nginx container shell:
[root@fuel /]# ls -l /var/log/nginx/ (some output omitted)
-rw-r--r-- 1 root root 0 Jul 23 03:14 access_nailgun.log
-rw-r--r-- 1 root root 0 Jul 22 18:06 access_nailgun.log-20160715
-rw-r--r-- 1 root root 38951 Jul 15 16:18 access_nailgun.log-20160715.gz
-rw-r--r-- 1 root root 12626 Jul 23 03:13 access_nailgun.log-20160723.gz
-rw-r--r-- 1 root root 0 Jul 23 03:14 error_nailgun.log
-rw-r--r-- 1 root root 0 Jul 22 18:06 error_nailgun.log-20160715
-rw-r--r-- 1 root root 9440 Jul 15 16:18 error_nailgun.log-20160715.gz
-rw-r--r-- 1 root root 18770 Jul 23 03:13 error_nailgun.log-20160723.gz
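
The state shown in the actual result can be reproduced in miniature on any Linux host. The sketch below is not the Fuel setup itself: file descriptor 3 of the shell stands in for nginx, the temporary directory stands in for /var/log/nginx, and it assumes a rename-based rotation followed by removal of the renamed file. Which of the two logrotate runs performs each step on a real Fuel master is not established here.

```shell
#!/bin/sh
# Linux-only sketch: reproduce a "(deleted)" descriptor like the ones in the find output.
tmpdir=$(mktemp -d)
log="$tmpdir/access_nailgun.log"

exec 3>>"$log"                 # long-lived writer, standing in for nginx
echo "before rotation" >&3

mv "$log" "$log-20160723"      # rotation renames the live log...
: > "$log"                     # ...and creates a fresh, empty one
rm "$log-20160723"             # the renamed copy is then removed

echo "after rotation" >&3      # the writer never reopened, so this goes to the deleted inode

held=$(readlink "/proc/$$/fd/3")   # where descriptor 3 points now
new_size=$(wc -c < "$log")         # size of the fresh log that nobody writes to
echo "fd 3 -> $held"
echo "new log size: $new_size bytes"

exec 3>&-                      # only closing the descriptor releases the deleted file
rm -rf "$tmpdir"
```

The fresh log stays empty while the writer keeps filling a file that is no longer visible in the directory, which matches the zero-length access_nailgun.log and error_nailgun.log in the listing.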

Workaround:
Restarting the Docker container fixes the issue temporarily.

Impact:
The Fuel master can run out of disk space.
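
To estimate how much space is at stake, the sizes of deleted files that are still held open can be summed from /proc. This is a rough sketch, not part of Fuel: deleted_log_bytes is a hypothetical helper, it relies on Linux /proc and GNU stat, and it only sees processes the calling user may inspect, so it should be run as root on the master node.

```shell
#!/bin/sh
# Hypothetical helper: total bytes held by deleted-but-open files under a path prefix.
deleted_log_bytes() {
    prefix=$1
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            "$prefix"*" (deleted)")
                # device:inode identifies the file, so the same deleted file held by
                # several processes (nginx master and worker) is counted only once
                stat -L -c '%d:%i %s' "$fd" 2>/dev/null
                ;;
        esac
    done | sort -u | awk '{ sum += $2 } END { printf "%.0f\n", sum }'
}

# Example: bytes still pinned by deleted files under /var/log/nginx/
deleted_log_bytes /var/log/nginx/
```

A large value here would account for a gap between du and df such as the one described in this report, since du only counts files that are still linked in the directory tree.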

Description of the environment:
MOS8

Changed in fuel:
milestone: none → 8.0-updates
assignee: nobody → MOS Maintenance (mos-maintenance)
tags: added: area-library