Swift daemons die when syslog stops running

Bug #1094230 reported by Apollon Oikonomopoulos
This bug affects 1 person
Affects                            Status    Importance   Assigned to   Milestone
OpenStack Object Storage (swift)   Expired   Undecided    Unassigned

Bug Description

We noticed that on a machine running swift 1.7.5, the replicator, updater and auditor processes die as a side effect of rsyslogd crashing. This only happens if the swift processes are started while rsyslog is running (so swift.common.utils.get_logger returns a logger with a syslog handler) and then subsequently rsyslog is stopped.

Running the object-replicator in the foreground and stopping rsyslog while it is running, we get:
------------------------------------------------------------------
Starting object-replicator...(/etc/swift/object-server.conf)
object-replicator Starting object replicator in daemon mode.
object-replicator Starting object replication pass.
object-replicator 775/775 (100.00%) partitions replicated in 4.76s (162.68/sec, 0s remaining)
object-replicator 70578 suffixes checked - 0.00% hashed, 0.00% synced
object-replicator Partition times: max 0.0290s, min 0.0040s, med 0.0047s
object-replicator Object replication complete. (0.08 minutes)
object-replicator Starting object replication pass.
object-replicator 775/775 (100.00%) partitions replicated in 4.72s (164.15/sec, 0s remaining)
object-replicator 70578 suffixes checked - 0.00% hashed, 0.00% synced
object-replicator Partition times: max 0.0241s, min 0.0041s, med 0.0047s
object-replicator Object replication complete. (0.08 minutes)
Error in sys.excepthook:

Original exception was:

------------------------------------------------------------------
It seems that run-time exceptions occurring during calls to the logging system are not handled gracefully. Furthermore, the catch-all sys.excepthook (overridden in swift.common.utils.capture_stdio) uses the system logger to log the unhandled exception. In this case, however, the logger does not work, so a further exception is raised from within the excepthook, causing the daemon to exit.
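The failure chain described above (an excepthook that logs through a logger which itself raises) can be reproduced in isolation. This is a minimal sketch written for Python 3, not the Python 2 that Swift 1.7.5 ran on. DeadSyslogHandler, logging_excepthook and run_hook are hypothetical names; the stdlib handlers normally route emit() errors through handleError(), so a handler that lets the error escape only approximates what happened in the reported setup.

```python
import logging


class DeadSyslogHandler(logging.Handler):
    """Hypothetical stand-in for a syslog handler whose daemon has gone away.

    The stdlib handlers normally catch errors in emit() and pass them to
    handleError(); this one lets the error escape, to model the case where
    the logging call itself raises.
    """

    def emit(self, record):
        raise OSError("syslog socket is gone")


logger = logging.getLogger("excepthook-sketch")
logger.propagate = False  # keep the root logger's handlers out of the picture
logger.addHandler(DeadSyslogHandler())


def logging_excepthook(exc_type, exc_value, exc_tb):
    # Same shape as the hook described above: every uncaught exception is
    # reported through the logger, and nothing guards the logging call.
    logger.critical("UNCAUGHT EXCEPTION", exc_info=(exc_type, exc_value, exc_tb))


def run_hook(hook, exc):
    """Call `hook` the way the interpreter would for an uncaught `exc` and
    return whatever exception the hook itself raised (or None)."""
    try:
        hook(type(exc), exc, exc.__traceback__)
    except Exception as hook_error:
        return hook_error
    return None


if __name__ == "__main__":
    print("hook raised:", repr(run_hook(logging_excepthook, RuntimeError("boom"))))
```

When the interpreter itself invokes a hook that raises like this, it reports "Error in sys.excepthook" and then the original exception, which matches the shape of the output above.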

Kun Huang (academicgareth) wrote :

I think we should make a small change in capture_stdio:
if syslog works:
    sys.excepthook = lambda *exc_info: \
        logger.critical(_('UNCAUGHT EXCEPTION'), exc_info=exc_info)
else:
    print xxxxx
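A variant of the "if syslog works ... else print" idea is to try the logger first and fall back only if the logging call fails. Syslog can disappear after any up-front check, which is exactly the scenario in this report. This is a minimal Python 3 sketch. make_safe_excepthook is a hypothetical helper, not Swift's capture_stdio code, and falling back to sys.__stderr__ by default is an assumption.

```python
import io
import logging
import sys
import traceback


def make_safe_excepthook(logger, fallback_stream=None):
    """Hypothetical helper (not Swift's code): build an excepthook that logs
    uncaught exceptions via `logger`, but prints the traceback on
    `fallback_stream` (default: the interpreter's original stderr) if the
    logging call itself fails."""

    def hook(exc_type, exc_value, exc_tb):
        try:
            logger.critical("UNCAUGHT EXCEPTION",
                            exc_info=(exc_type, exc_value, exc_tb))
        except Exception:
            # Logging is broken (e.g. syslog went away). Don't raise a second
            # exception from inside the hook; write the traceback directly.
            stream = fallback_stream if fallback_stream is not None else sys.__stderr__
            if stream is not None:
                traceback.print_exception(exc_type, exc_value, exc_tb, file=stream)

    return hook


if __name__ == "__main__":
    broken = logging.getLogger("safe-hook-demo")
    broken.propagate = False
    # logging.Handler's default emit() raises NotImplementedError, which is a
    # cheap way to get a logger whose every call fails.
    broken.addHandler(logging.Handler())

    buf = io.StringIO()
    hook = make_safe_excepthook(broken, fallback_stream=buf)
    hook(RuntimeError, RuntimeError("boom"), None)
    print(buf.getvalue(), end="")
```

In a daemon this would be installed with sys.excepthook = make_safe_excepthook(logger). Whatever the fallback stream points to after daemonization (possibly /dev/null) decides whether the traceback is actually visible, but the hook no longer raises.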

Filippo Giunchedi (filippo) wrote :

I couldn't reproduce this in Icehouse, btw: restarting rsyslog didn't affect the daemons.

clayg (clay-gerrard) wrote :

IIRC, this was cleaned up in Python logging's SysLogHandler in Python 2.7, but it probably still exists in Python 2.6.

I guess we could change it to Incomplete until someone can confirm the issue still affects Python 2.6, and then we can try to figure out how to prioritize it?

Changed in swift:
status: New → Incomplete
Samuel Merritt (torgomatic) wrote :

I vote for prioritizing it low enough that we don't start on it until after April 2015, at which point we can close it out as only affecting unsupported Python versions.

I mean, Lucid is supported through April 2015 and then it's done, so the two remaining Ubuntu LTSes will be Precise and Trusty, both of which have Python 2.7. RHEL 7 has Python 2.7 too, so that'll be it. Plus, the Python team has said that 2.6 is EOL, so no more updates for anything at all, ever.

I guess if someone wants to take a crack at fixing it in Swift for 2.6 I'm not going to stop them, but it's super-duper-low-priority to me.

Filippo Giunchedi (filippo) wrote :

Ah, indeed: we're running on Precise with Python 2.7 as the default. +1 to waiting until Lucid's LTS ends and closing it then.

Filippo Giunchedi (filippo) wrote :

FWIW, I said this seemed fixed, but it isn't: we've seen a recurrence today (cf. https://wikitech.wikimedia.org/wiki/Incident_documentation/20140910-swift-syslog).

Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Object Storage (swift) because there has been no activity for 60 days.]

Changed in swift:
status: Incomplete → Expired