Nailgun doesn't work if /var/log partition is full

Bug #1736098 reported by Alexander Rubtsov
This bug affects 1 person
Affects            | Status  | Importance | Assigned to      | Milestone
-------------------+---------+------------+------------------+----------
Fuel for OpenStack | Invalid | Undecided  | MOS Maintenance  |
Mitaka             | Invalid | High       | Denis Meltsaykin |

Bug Description

--- Environment ---
MOS: 9.2

--- Steps to reproduce ---
1. Fill up the /var/log/ partition on the Fuel master node:
[root@fuel ~]# df | grep "os-varlog"
/dev/mapper/os-varlog 48200412 48184028 0 100% /var/log

2. Reboot the Fuel master node:
[root@fuel ~]# reboot

3. Run a command that interacts with Nailgun:
[root@fuel ~]# fuel node

--- Actual result ---
500 Server Error: Internal Server Error for url: http://10.20.17.2:8000/api/v1/version (Internal Server Error)

The following messages are in /var/log/nailgun/app.log:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 872, in emit
    stream.write(fs % msg)
IOError: [Errno 28] No space left on device
Logged from file logger.py, line 129

--- Expected result ---
The command completes successfully:
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---+----------+------------------+---------+------------+-------------------+------------+---------------+--------+---------
12 | ready | Untitled (4b:04) | 3 | 10.20.17.4 | 52:54:00:38:4b:04 | compute | | 1 | 3
10 | ready | Untitled (4b:03) | 3 | 10.20.17.6 | 52:54:00:38:4b:03 | controller | | 1 | 3
11 | ready | Untitled (4b:01) | 3 | 10.20.17.5 | 52:54:00:38:4b:01 | compute | | 1 | 3
13 | ready | Untitled (4b:02) | 3 | 10.20.17.7 | 52:1d:b1:8d:e3:46 | compute | | 1 | 3
 9 | discover | Untitled (4b:05) | | 10.20.17.7 | 52:54:00:38:4b:05 | | | |

[root@fuel ~]# echo $?
0

Alexander Rubtsov (arubtsov) wrote :

sla1 for 9.0-updates

tags: added: customer-found sla1
Changed in fuel:
assignee: nobody → Oleksiy Molchanov (omolchanov)
assignee: Oleksiy Molchanov (omolchanov) → MOS Maintenance (mos-maintenance)
Oleksiy Molchanov (omolchanov) wrote :

Alexander, the log directory must not be allowed to fill up. It should be cleaned up with logrotate or manually.

Changed in fuel:
status: New → Invalid
Denis Meltsaykin (dmeltsaykin) wrote :

The Python logger writes directly to files under /var/log; if the partition is full, those writes fail. This is not a bug but normal behavior. The logging could be refactored to use syslog/UDP instead, but that does not fit the maintenance policy and would take extra effort to re-test everything.

Alexander Rubtsov (arubtsov) wrote :

Denis,

In this particular case, Nailgun keeps running after the partition becomes full; it fails only after the node is rebooted.

Is there any way to make the logging module handle such I/O errors gracefully, so that the application itself keeps working even though its log is not being written?
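For illustration, here is a minimal sketch of what "ignoring the fact that the log is not being written" could look like at the handler level. It is not taken from Nailgun. In the standard library, `StreamHandler.emit` already catches write errors and passes them to `Handler.handleError`, which by default prints a traceback to stderr; the sketch overrides that hook to count dropped records instead. `FullDiskStream`, `QuietStreamHandler`, and `demo_quiet_logging` are hypothetical names, and the full partition is simulated by a stream that always raises ENOSPC. The code targets current Python 3.

```python
import errno
import logging


class FullDiskStream(object):
    """Hypothetical stand-in for a log file on a full partition."""

    def write(self, data):
        # Same error number as in the traceback from the report (Errno 28).
        raise IOError(errno.ENOSPC, "No space left on device")

    def flush(self):
        pass


class QuietStreamHandler(logging.StreamHandler):
    """Hypothetical handler that counts failed writes instead of reporting them."""

    def __init__(self, stream=None):
        logging.StreamHandler.__init__(self, stream)
        self.dropped = 0

    def handleError(self, record):
        # Called by emit() when formatting or writing the record fails.
        self.dropped += 1


def demo_quiet_logging():
    logger = logging.getLogger("quiet_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo away from the root logger

    handler = QuietStreamHandler(FullDiskStream())
    logger.addHandler(handler)

    # Neither call raises, even though every write fails.
    logger.info("first")
    logger.info("second")

    logger.removeHandler(handler)
    return handler.dropped
```

Calling `demo_quiet_logging()` returns the number of records that could not be written. A coarser switch is the module-level flag `logging.raiseExceptions = False`, which silences the default stderr report for all handlers. Neither approach helps if the failure happens elsewhere, for example when the log file cannot be opened at startup.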

Denis Meltsaykin (dmeltsaykin) wrote :

Alexander, we cannot change the behavior of logging in Python, and wrapping every logging call in a try/except block makes no sense. Oleksiy will check whether syslog/UDP logging can be implemented in Nailgun. I'm not sure it's an easy task, though; it would also require reconfiguring rsyslog and logrotate.

Denis Meltsaykin (dmeltsaykin) wrote :

Alexander, I investigated this in depth and concluded that there is no bug in Nailgun. Here is my explanation. The logging system in Nailgun is designed to use several output handlers. One of them is WatchedFileHandler from the Python standard library, which closes and re-opens the log file if it has changed between writes; it is mostly meant to be used in conjunction with logrotate. It is used because of the huge volume of API-call logging when the logging level is set to `debug`, and because of the very frequent log rotation in Fuel's debug configuration. There should be no problem as long as logrotate is working. On the other hand, exhaustion of the log partition is an exceptional situation and should be treated accordingly. Ignoring logger errors violates the 'fail-fast' principle (https://en.wikipedia.org/wiki/Fail-fast), which is common in software development nowadays.
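The re-open behavior of WatchedFileHandler described above can be sketched as follows. This is an illustration only, not Nailgun's actual logging setup: the logger name, the file names, and the function `demo_watched_rotation` are hypothetical, and the `os.rename` call merely stands in for what logrotate does when it moves a log file aside. The sketch assumes a POSIX system, where the handler detects the change by comparing the device and inode of the path.

```python
import logging
import logging.handlers
import os
import tempfile


def demo_watched_rotation():
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "app.log")
    rotated = path + ".1"

    logger = logging.getLogger("watched_demo")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the demo away from the root logger

    handler = logging.handlers.WatchedFileHandler(path)
    logger.addHandler(handler)

    logger.info("before rotation")

    # Simulate logrotate moving the file aside while the process keeps running.
    os.rename(path, rotated)

    # Before writing, the handler notices that the path no longer refers to
    # the file it has open, closes the old stream and opens a fresh file.
    logger.info("after rotation")

    handler.close()
    logger.removeHandler(handler)

    with open(rotated) as f:
        old_contents = f.read()
    with open(path) as f:
        new_contents = f.read()
    return old_contents, new_contents
```

The first message ends up in the rotated file and the second in a newly created `app.log`, without any signal or restart of the logging process.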

Using a syslog/UDP handler is not an option either: it would require refactoring the whole logging process in Nailgun, which is not acceptable for a stable release.
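For readers unfamiliar with the syslog/UDP alternative mentioned here, a rough sketch using the standard library's `logging.handlers.SysLogHandler` is shown below. It does not reflect any actual or planned Nailgun code. The `nailgun-demo` tag, the logger name, and the function `demo_udp_syslog` are made up for illustration, and a throwaway local UDP socket plays the role that rsyslog would play on a real Fuel master node.

```python
import logging
import logging.handlers
import socket


def demo_udp_syslog():
    # Throwaway UDP listener standing in for rsyslog (assumption for the demo).
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # let the OS pick a free port
    receiver.settimeout(5.0)
    port = receiver.getsockname()[1]

    # With a (host, port) address, SysLogHandler sends UDP datagrams by default.
    handler = logging.handlers.SysLogHandler(address=("127.0.0.1", port))
    handler.setFormatter(logging.Formatter("nailgun-demo: %(message)s"))

    logger = logging.getLogger("syslog_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)

    logger.info("hello over UDP")
    data, _addr = receiver.recvfrom(4096)

    logger.removeHandler(handler)
    handler.close()
    receiver.close()
    return data
```

Because the application only sends datagrams, it never writes to /var/log itself; whether records are stored or dropped becomes the syslog daemon's concern. As the comment notes, adopting this in a real deployment would also mean changing rsyslog and logrotate configuration.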

Therefore I'm closing this as Invalid.

P.S. I can give you a patch that ignores exceptions from the logger, but it will never be merged into our repos.
