Processes continue to hold open file descriptor for old log when log file is renamed or deleted

Bug #1437444 reported by Lars Kellogg-Stedman
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
oslo.log
Fix Released
Wishlist
Marian Horban

Bug Description

For OpenStack services that spawn multiple processes (such as nova-api or cinder-volume), some of those processes will hold open file descriptors to renamed or deleted log files. The latter case will prevent the filesystem from recovering space from a deleted file until after the process exits.

For example, nova-api has four processes holding open
/var/log/nova/nova-api.log:

    # ps -e -o pid,cmd |
      grep nova-api |
      awk '/python/ {print $1}' |
      xargs -iPID ls -l /proc/PID/fd/3
    l-wx------. 1 nova nova 64 Mar 27 17:56 /proc/3485/fd/3 -> /var/log/nova/nova-api.log
    l-wx------. 1 nova nova 64 Mar 27 17:56 /proc/3490/fd/3 -> /var/log/nova/nova-api.log
    l-wx------. 1 nova nova 64 Mar 27 17:56 /proc/3492/fd/3 -> /var/log/nova/nova-api.log
    l-wx------. 1 nova nova 64 Mar 27 17:56 /proc/3501/fd/3 -> /var/log/nova/nova-api.log

If I delete this file:

    # rm /var/log/nova/nova-api.log

The processes will continue to hold that fd open until a write occurs:

    l-wx------. 1 nova nova 64 Mar 27 17:56 /proc/3485/fd/3 -> /var/log/nova/nova-api.log (deleted)
    l-wx------. 1 nova nova 64 Mar 27 17:56 /proc/3490/fd/3 -> /var/log/nova/nova-api.log (deleted)
    l-wx------. 1 nova nova 64 Mar 27 17:56 /proc/3492/fd/3 -> /var/log/nova/nova-api.log (deleted)
    l-wx------. 1 nova nova 64 Mar 27 17:56 /proc/3501/fd/3 -> /var/log/nova/nova-api.log (deleted)

After a write operation, the process making the write will recognize
that the file has gone missing and will open a new output file, but
other processes will continue to hold open the file descriptor until
(a) they exit or (b) they write a log message:

    l-wx------. 1 nova nova 64 Mar 27 17:56 /proc/3485/fd/3 -> /var/log/nova/nova-api.log (deleted)
    l-wx------. 1 nova nova 64 Mar 27 17:56 /proc/3490/fd/3 -> /var/log/nova/nova-api.log (deleted)
    l-wx------. 1 nova nova 64 Mar 27 17:56 /proc/3492/fd/3 -> /var/log/nova/nova-api.log
    l-wx------. 1 nova nova 64 Mar 27 17:56 /proc/3501/fd/3 -> /var/log/nova/nova-api.log (deleted)

This could lead to problems in the event that someone is trying to
delete log files in order to free up space. There are several ways to
solve this, the simplest of which is to clearly document this behavior
and recommend the use of something like logrotates "copytruncate"
option for handling log rotation. Is that the best option?

At a glance, this affects at least nova, cinder, neutron, and glance.

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

So i just tested this, for example when i deleted the file and ran a nova list, i see this

stack@dims-ubuntu:~/devstack$ ps -e -o pid,cmd | grep nova-api | awk '/python/ {print $1}' | xargs -iPID ls -l /proc/PID/fd/3
l-wx------ 1 stack stack 64 Apr 9 12:29 /proc/5321/fd/3 -> /opt/stack/logs/nova-api.log (deleted)
l-wx------ 1 stack stack 64 Apr 9 12:29 /proc/5326/fd/3 -> /opt/stack/logs/nova-api.log (deleted)
l-wx------ 1 stack stack 64 Apr 9 12:32 /proc/5327/fd/3 -> /opt/stack/logs/nova-api.log (deleted)
l-wx------ 1 stack stack 64 Apr 9 12:32 /proc/5328/fd/3 -> /opt/stack/logs/nova-api.log (deleted)
l-wx------ 1 stack stack 64 Apr 9 12:32 /proc/5329/fd/3 -> /opt/stack/logs/nova-api.log (deleted)
l-wx------ 1 stack stack 64 Apr 9 12:32 /proc/5331/fd/3 -> /opt/stack/logs/nova-api.log (deleted)
l-wx------ 1 stack stack 64 Apr 9 12:32 /proc/5332/fd/3 -> /opt/stack/logs/nova-api.log
l-wx------ 1 stack stack 64 Apr 9 12:32 /proc/5333/fd/3 -> /opt/stack/logs/nova-api.log (deleted)
l-wx------ 1 stack stack 64 Apr 9 12:32 /proc/5334/fd/3 -> /opt/stack/logs/nova-api.log (deleted)
l-wx------ 1 stack stack 64 Apr 9 12:32 /proc/5341/fd/3 -> /opt/stack/logs/nova-api.log (deleted)
l-wx------ 1 stack stack 64 Apr 9 12:32 /proc/5342/fd/3 -> /opt/stack/logs/nova-api.log (deleted)
l-wx------ 1 stack stack 64 Apr 9 12:32 /proc/5343/fd/3 -> /opt/stack/logs/nova-api.log (deleted)
l-wx------ 1 stack stack 64 Apr 9 12:32 /proc/5344/fd/3 -> /opt/stack/logs/nova-api.log (deleted)

Which basically means that one of the nova-api processes got triggered and the WatchedFileHandler did its job fine which is
to make sure it opened up a new file. i tried several other nova commands and a few more switched over to not show the "deleted" message. So the WatchedFileHandler is getting triggered when there is something to log in a specific process and at that time it detects that the file has changed and will close the file and open a fresh fd.

With latest nova trunk, we have a new feature "kill -SIGHUP 5321" where 5321 is the parent process of all the nova-api processes and you will see all the subprocesses switch immediately to the new fds.

*IF* we want to enforce a rule where the process will switch over immediately then we need to implement a file watcher thingy possibly using pyinotify:
https://github.com/seb-m/pyinotify

thanks,
dims

Changed in oslo.log:
importance: Undecided → Wishlist
status: New → Confirmed
Revision history for this message
Lars Kellogg-Stedman (larsks) wrote :

> So the WatchedFileHandler is getting triggered when there is something to log in a specific process

RIght, that's what I said in the original description:

> but other processes will continue to hold open the file descriptor until (a) they exit or (b) they write a log message:

Having SIGHUP handling as you've described seems like a reasonable solution. Hopefully that will start to crop up in other services as well.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.log (master)

Fix proposed to branch: master
Review: https://review.openstack.org/208366

Changed in oslo.log:
assignee: nobody → Marian Horban (mhorban)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.log (master)

Reviewed: https://review.openstack.org/208366
Committed: https://git.openstack.org/cgit/openstack/oslo.log/commit/?id=a834e6a10c433d20934de01c2c4cfe38e386968a
Submitter: Jenkins
Branch: master

commit a834e6a10c433d20934de01c2c4cfe38e386968a
Author: Marian Horban <email address hidden>
Date: Fri Aug 28 13:15:58 2015 -0400

    FastWatchedFileHandler class was added

    If one program opens file(in our case log file) and this file is moved
    or removed by other program, the first program still works with opened
    file descriptor.
    If we use WatchedFileHandler it allows to recreate file only after
    first emitting of WatchedFileHandler. It means that deleted files still
    consumes disk space.
    Class FastWatchedFileHandler is designed to observe file system. If log
    file is moved or removed FastWatchedFileHandler instantaneously closes
    old file descriptor and creates new file with specified name.
    Config option watch-log-file was added. If this option is set
    FastWatchedFileHandler handler will be used.
    Option watch-log-file brings effect for Linux platform only.

    DocImpact
    Closes-Bug: #1437444
    Change-Id: Ie70c70ad2d8cf198a8fd72488a43fb1bd71c57be

Changed in oslo.log:
status: In Progress → Fix Committed
Changed in oslo.log:
milestone: none → 1.12.0
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.