logging fail to recover on rsyslog restart

Bug #1076466 reported by Tom Hancock on 2012-11-08
22
This bug affects 4 people
Affects Status Importance Assigned to Milestone
oslo.log
Medium
Tom Hancock

Bug Description

If syslog logging is enabled, then if rsyslog is restarted, is possible to break logging
such that it will continually issue tracebacks and not log.

This seems to happen in applications which use eventlet, and has been reproduced in
glance and in swift.

The problem seems to be this code in /usr/lib/python2.7/logging/handlers.py
in SyslogHander:emit()

       try:
            if self.unixsocket:
                try:
                    self.socket.send(msg)
                except socket.error, f:
                    self._connect_unixsocket(self.address)
                    self.socket.send(msg)
            elif self.socktype == socket.SOCK_DGRAM:
                self.socket.sendto(msg, self.address)
            else:
                self.socket.sendall(msg)
        except (KeyboardInterrupt, SystemExit):
            raise
        except Exception, e:
            self.handleError(record)

When rsyslog restarts, socket.send() raises a RuntimeError which falls into
the outer loop and doesn't reconnect to rsyslog

Here's a reproducer program (not meant to be pretty, or the smallest possible code)

import logging
import logging.config
import logging.handlers

import eventlet
from eventlet.green import socket, ssl
import eventlet.greenio

def _setup_logging_from_conf(product_name):
    log_root = logging.getLogger(product_name)
    facility = logging.handlers.SysLogHandler.LOG_LOCAL2
    syslog = logging.handlers.SysLogHandler(address='/dev/log',
                                                facility=facility)
    log_root.addHandler(syslog)
    log_root.setLevel(logging.DEBUG)

def func(num1,num2):
    thelogger=logging.getLogger('hello')
    while 1:
        for i in range(0,30):
            thelogger.error('a1')
        eventlet.sleep(5)

_setup_logging_from_conf('hello')
eventlet.patcher.monkey_patch(all=False, socket=True)
pool = eventlet.greenpool.GreenPool(1024)
nums = []
for i in range(0,30):
    nums.append((i,i+1))
while 1:
    for job in pool.starmap(func,nums):
        pass

Run this in one window and restart rsyslog in another window and you'll see repeated exceptions.
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/handlers.py", line 805, in emit
    self.socket.send(msg)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 296, in send
    timeout_exc=socket.timeout("timed out"))
  File "/usr/lib/python2.7/dist-packages/eventlet/hubs/__init__.py", line 119, in trampoline
    listener = hub.add(hub.WRITE, fileno, current.switch)
  File "/usr/lib/python2.7/dist-packages/eventlet/hubs/epolls.py", line 46, in add
    listener = BaseHub.add(self, evtype, fileno, cb)
  File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 126, in add
    evtype, fileno, evtype))
RuntimeError: Second simultaneous write on fileno 5 detected. Unless you really know what you're doing, make sure that only one greenthread can write any particular socket. Consider using a pools.Pool. If you do know what you're doing and want to disable this error, call eventlet.debug.hub_multiple_reader_prevention(False)
.....

I'm not sure what the upstream means to address should be, but overriding SysLogHandler.emit()
with a change to catch RuntimeError looks like it could be a workaround?

Mark McLoughlin (markmc) on 2012-11-13
affects: openstack-common → oslo
Mark McLoughlin (markmc) wrote :

If there was a workaround patch to add to oslo, we could consider that but otherwise it sounds like a python bug to me and should be addressed there

Changed in oslo:
status: New → Incomplete

Fix proposed to branch: master
Review: https://review.openstack.org/18383

Changed in oslo:
status: Incomplete → In Progress
Mark McLoughlin (markmc) on 2013-01-12
Changed in oslo:
importance: Undecided → Medium
assignee: nobody → Tom Hancock (tom-hancock)
Mark McLoughlin (markmc) wrote :

Honestly, I think this is an eventlet bug plain and simple

The url describes the issue well: https://lists.secondlife.com/pipermail/eventletdev/2012-September/001095.html

I'm really not keen on working around this in OpenStack while leaving the issue lingering in eventlet

Have you tried getting it fixed in eventlet?

Changed in oslo:
status: In Progress → Incomplete
affects: oslo-incubator → oslo.log
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers