Cannot shut down ReactorNotRunning

Bug #717205 reported by John A Meinel on 2011-02-11
This bug affects 1 person
Affects           Status  Importance  Assigned to  Milestone
Launchpad itself          Critical    Unassigned
Twisted           New     Unknown

Bug Description

There seems to be a bug in Twisted 10.2. If something bad happens, the process can get into a state where it has not actually stopped, yet both SIGTERM and SIGINT fail to shut it down because Twisted reports "Can't stop reactor that isn't running".

I wasn't very worried about it, but it just happened in production, so I'm escalating the severity. The traceback looks something like this:
2011-02-11 11:21:17+0000 [-] Received SIGTERM, shutting down.
2011-02-11 11:21:17+0000 [-] Unhandled Error
        Traceback (most recent call last):
          File "/srv/bazaar.launchpad.net/production/launchpad-rev-12351/eggs/Twisted-10.2.0_4395fix_1-py2.6-linux-x86_64.egg/twisted/application/app.py", line 390, in startReactor
            self.config, oldstdout, oldstderr, self.profiler, reactor)
          File "/srv/bazaar.launchpad.net/production/launchpad-rev-12351/eggs/Twisted-10.2.0_4395fix_1-py2.6-linux-x86_64.egg/twisted/application/app.py", line 311, in runReactorWithLogging
            reactor.run()
          File "/srv/bazaar.launchpad.net/production/launchpad-rev-12351/eggs/Twisted-10.2.0_4395fix_1-py2.6-linux-x86_64.egg/twisted/internet/base.py", line 1158, in run
            self.mainLoop()
          File "/srv/bazaar.launchpad.net/production/launchpad-rev-12351/eggs/Twisted-10.2.0_4395fix_1-py2.6-linux-x86_64.egg/twisted/internet/base.py", line 1167, in mainLoop
            self.runUntilCurrent()
        --- <exception caught here> ---
          File "/srv/bazaar.launchpad.net/production/launchpad-rev-12351/eggs/Twisted-10.2.0_4395fix_1-py2.6-linux-x86_64.egg/twisted/internet/base.py", line 762, in runUntilCurrent
            f(*a, **kw)
          File "/srv/bazaar.launchpad.net/production/launchpad-rev-12351/eggs/Twisted-10.2.0_4395fix_1-py2.6-linux-x86_64.egg/twisted/internet/base.py", line 570, in stop
            "Can't stop reactor that isn't running.")
        twisted.internet.error.ReactorNotRunning: Can't stop reactor that isn't running.

This is causing lots of OOPS reports, such as OOPS-1868SMPSSH1000.

Jonathan Lange (jml) on 2011-02-11
security vulnerability: yes → no
visibility: private → public
Andrew Bennetts (spiv) wrote :

<jml> just found out about this issue affecting us: https://bugs.launchpad.net/launchpad/+bug/717205
<exarkun> Unless there's more details, I don't think that's new in 10.2
<exarkun> You could always have a before shutdown trigger that returns a Deferred that doesn't fire as soon as you'd like
<exarkun> And re-sending a shutdown signal while waiting for that would always do something wacky
<exarkun> But! Certainly it would be nice to do something better.

In my experience with this error, that's exactly the cause.
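
The sequence exarkun describes can be illustrated with a toy model. This is a sketch only, not Twisted's code: ToyReactor, its attributes and demo() are hypothetical names, and it assumes the reactor keeps a single "stopped" flag that the first stop() sets while the process stays alive until every before-shutdown trigger has finished.

```python
class ReactorNotRunning(Exception):
    """Stand-in for twisted.internet.error.ReactorNotRunning."""


class ToyReactor:
    """Toy model of the shutdown state; an assumption, not Twisted's real code."""

    def __init__(self):
        self.stopped = False        # set by the first stop() call
        self.pending_triggers = 0   # before-shutdown triggers still waiting
        self.exited = False         # True once the main loop would return

    def add_before_shutdown_trigger(self):
        # Models a trigger that returned a Deferred which has not fired yet.
        self.pending_triggers += 1

    def stop(self):
        if self.stopped:
            raise ReactorNotRunning("Can't stop reactor that isn't running.")
        self.stopped = True
        self._maybe_exit()

    def trigger_finished(self):
        # Models the trigger's Deferred finally firing.
        self.pending_triggers -= 1
        self._maybe_exit()

    def _maybe_exit(self):
        if self.stopped and self.pending_triggers == 0:
            self.exited = True

    def on_signal(self, name):
        # The shutdown signal handler simply asks the reactor to stop.
        try:
            self.stop()
            return "%s: shutting down" % name
        except ReactorNotRunning as exc:
            return "%s: Unhandled Error: %s" % (name, exc)


def demo():
    reactor = ToyReactor()
    reactor.add_before_shutdown_trigger()
    first = reactor.on_signal("SIGTERM")    # starts shutdown, then waits
    second = reactor.on_signal("SIGINT")    # arrives while still waiting
    still_running = not reactor.exited      # process is alive the whole time
    reactor.trigger_finished()              # only now can shutdown complete
    return first, second, still_running, reactor.exited
```

In this model the second signal does nothing useful: it only produces the error, and the process exits whenever the outstanding trigger completes.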

So the obvious question is: what would “something better” be? Some possibilities:
 * a subsequent SIGTERM/SIGINT forces shutdown to continue without waiting for the unfired Deferred(s).
 * a subsequent SIGTERM/SIGINT just logs a simple warning “SIGFOO received but shutdown already in progress”
 * a subsequent SIGTERM/SIGINT logs a warning and some details about what it is waiting on (ideally identifying the trigger(s) involved and even e.g. the outstanding connections or whatever is involved)
 * provide an API on the reactor that controls whether shutdown-signal-received-during-blocked-shutdown warns, forces shutdown to proceed, or whatever, so that service authors can choose which behaviour they want.

I'm sure there are others. I'm not sure which is best for Launchpad's use case(s), or best in general.

A workaround for Launchpad's cases may be to explicitly override SIGTERM/INT from the before-shutdown trigger we register, but it seems fairly clear to me that Twisted should offer better facilities here. The relevant upstream bug appears to be <http://twistedmatrix.com/trac/ticket/4406>.
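
A minimal sketch of that workaround, using only the standard signal and logging modules. The names warn_on_repeat_signal, override_shutdown_signals and before_shutdown_trigger are hypothetical, slow_cleanup stands for whatever the existing trigger already does, and the sketch assumes that replacing the handlers with signal.signal() from inside the trigger is enough to keep later signals away from the reactor's stop().

```python
import logging
import signal

log = logging.getLogger("shutdown")

SHUTDOWN_SIGNALS = (signal.SIGTERM, signal.SIGINT)


def warn_on_repeat_signal(signum, frame):
    # Log instead of asking the reactor to stop a second time.
    log.warning("Signal %d received but shutdown already in progress", signum)


def override_shutdown_signals(handler=warn_on_repeat_signal):
    """Replace the SIGTERM/SIGINT handlers and return the previous ones."""
    previous = {}
    for signum in SHUTDOWN_SIGNALS:
        previous[signum] = signal.signal(signum, handler)
    return previous


def before_shutdown_trigger(slow_cleanup):
    """Hypothetical trigger body: swap the handlers, then start the cleanup."""
    override_shutdown_signals()
    # slow_cleanup() would return the Deferred that shutdown waits on.
    return slow_cleanup()


# Registration with the reactor would look roughly like this:
# reactor.addSystemEventTrigger(
#     "before", "shutdown", before_shutdown_trigger, slow_cleanup)
```

This only hides the symptom at the application level. A service that needs a second signal to force an exit would have to install a different handler.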

Gavin Panella (allenap) on 2011-02-14
Changed in launchpad:
status: Confirmed → Triaged
Jonathan Lange (jml) on 2011-02-14
tags: added: twisted
Changed in twisted:
status: Unknown → New
tags: added: oops
description: updated
Ursula Junque (ursinha) on 2011-02-23
Changed in launchpad:
importance: High → Critical
William Grant (wgrant) on 2012-10-22
tags: added: codehosting-ssh