Proxy server sometimes deadlocks while logging client disconnect

Bug #1895739 reported by Tim Burke
This bug affects 3 people
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I still haven't found a reliable way to reproduce this, but now and then I see my proxy-server hang indefinitely. Fortunately, if I kill the worker PID and let the parent spin up a new one, my clients are robust enough that everything starts moving again, but it's still an annoying, manual fix. Using https://github.com/swiftstack/python-stack-xray/blob/master/python-stack-xray, I can get a stack trace that looks like http://paste.openstack.org/show/797139/

The trouble is the double call into current_thread(), which takes us down into enumerate(): the _active_limbo_lock is not re-entrant, so a thread that already holds it and ends up in enumerate() a second time deadlocks waiting for itself. I'm still not entirely sure where the fault lies, though:

* Maybe we need to make sure our app iters get closed out promptly so they never get randomly GC'ed while the lock is held.
* Maybe we need to just avoid logging in `except GeneratorExit` (and maybe `finally`?) clauses -- though that's a sizeable loss of functionality.
* Maybe eventlet needs to avoid looping over all threads in current_thread() -- CPython's implementation doesn't do that.
* Maybe eventlet needs to special-case this particular lock and swap it out with an RLock.
* Maybe CPython needs to use a re-entrant lock.
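
As a rough illustration of the first option in the list above, here is a minimal sketch of closing an app iter deterministically instead of leaving it to the garbage collector. The names `app_iter`, `send_one_chunk`, and `events` are hypothetical stand-ins, not Swift code, and the list append stands in for the logging call that runs in the `except GeneratorExit` clause.

```python
import contextlib

# Stand-in for a log: records what the GeneratorExit handler did.
events = []


def app_iter():
    """Hypothetical response iterator that reacts to a client disconnect."""
    try:
        yield b"chunk-1"
        yield b"chunk-2"
    except GeneratorExit:
        # In the bug report, a logging call at this point is what
        # re-enters current_thread(). Here we only record the event.
        events.append("client disconnected")
        raise


def send_one_chunk(body_iter):
    """Read one chunk, then close the iterator before returning."""
    # contextlib.closing() calls body_iter.close() on exit, so the
    # GeneratorExit handler runs here, at a known point, rather than
    # whenever the generator happens to be garbage collected.
    with contextlib.closing(body_iter) as it:
        return next(it)


chunk = send_one_chunk(app_iter())
print(chunk, events)
```

With the explicit close, the handler runs at a point the caller controls, which is the property this option relies on; whether every app iter in the proxy can be closed this promptly is a separate question.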

Simplest fix might be for us to swap the lock out for a PipeMutex in eventlet_monkey_patch().
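
To show why a re-entrant replacement would help, here is a minimal standard-library sketch. It assumes `threading.Lock` as a stand-in for the non-re-entrant lock and `threading.RLock` as a stand-in for a re-entrant replacement; it does not use PipeMutex or touch `_active_limbo_lock`, and `reacquire_succeeds` is a hypothetical helper. A timeout is used so the non-re-entrant case returns instead of hanging.

```python
import threading


def reacquire_succeeds(lock, timeout=0.1):
    """Return True if the thread already holding `lock` can take it again."""
    with lock:
        # Simulates a nested call (for example, logging triggered while
        # the lock is held) that needs the same lock a second time.
        got_it = lock.acquire(timeout=timeout)
        if got_it:
            lock.release()
        return got_it


# A plain Lock cannot be re-acquired by its holder; without the timeout
# this call would block forever, which is the hang described above.
plain = reacquire_succeeds(threading.Lock())

# An RLock tracks its owner and lets the same thread acquire it again.
reentrant = reacquire_succeeds(threading.RLock())

print(plain, reentrant)
```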

Tim Burke (1-tim-z) wrote :

Confirmed by DHE on py36: http://paste.openstack.org/show/798034/

Changed in swift:
status: New → Confirmed
Tim Burke (1-tim-z) wrote :

Also seen on py39 (while running unit tests, no less!): http://paste.openstack.org/show/799765/

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.27.0

This issue was fixed in the openstack/swift 2.27.0 release.

Tim Burke (1-tim-z)
Changed in swift:
status: In Progress → Fix Released