Proxy server sometimes deadlocks while logging client disconnect

Bug #1895739 reported by Tim Burke on 2020-09-15
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I still haven't found a reliable way to reproduce this, but now and then I see my proxy-server hang indefinitely. Fortunately, if I kill the worker PID and let the parent spin up a new one, my clients are robust enough that everything starts moving again, but it's still an annoying, manual fix. Using https://github.com/swiftstack/python-stack-xray/blob/master/python-stack-xray, I can get a stack trace out of the hung worker that looks like http://paste.openstack.org/show/797139/

The trouble is the double call into current_thread(), which takes us down into enumerate(): the _active_limbo_lock is not re-entrant, so a thread that already holds it and then tries to take it again deadlocks waiting for itself. I'm still not entirely sure where the fault lies, though:
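To make the "deadlocks waiting for itself" part concrete, here is a minimal sketch using plain stdlib locks rather than eventlet's patched ones. The helper name `can_reacquire` is made up for illustration, and the timeout only stands in for what would be an indefinite hang in the real worker.

```python
import threading


def can_reacquire(lock):
    """Return True if the thread already holding `lock` can take it again."""
    with lock:
        # In the proxy there is no timeout, so a False here would be a hang.
        got_it = lock.acquire(timeout=0.1)
        if got_it:
            lock.release()
        return got_it


plain = can_reacquire(threading.Lock())       # non-re-entrant
reentrant = can_reacquire(threading.RLock())  # re-entrant

print("Lock re-acquired by holder: ", plain)
print("RLock re-acquired by holder:", reentrant)
```

The plain lock refuses the second acquire from its own holder, while the RLock allows it, which is why several of the options below involve a re-entrant lock.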

* Maybe we need to make sure our app iters get closed out promptly so they never get randomly GC'ed while the lock is held.
* Maybe we need to just avoid logging in `except GeneratorExit` (and maybe `finally`?) clauses -- though that's a sizeable loss of functionality.
* Maybe eventlet needs to avoid looping over all threads in current_thread() -- CPython's implementation doesn't do that.
* Maybe eventlet needs to special-case this particular lock and swap it out with an RLock.
* Maybe CPython needs to use a re-entrant lock.
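The first two options can be sketched roughly as follows. This is an illustration only, not Swift or eventlet code: `lock` is a stand-in for _active_limbo_lock, `log()` is a stand-in for a logging call that ends up needing that lock, and `app_iter()` is a hypothetical app iter. It also relies on CPython finalizing a generator as soon as its last reference goes away, whereas in the real hang the collection happens at some arbitrary point.

```python
import threading

lock = threading.Lock()  # stand-in for the non-re-entrant lock
events = []


def log(msg):
    # Stand-in for a logging call that needs the lock. The timeout replaces
    # what would be an indefinite hang in the real worker.
    if lock.acquire(timeout=0.1):
        try:
            events.append(msg)
        finally:
            lock.release()
    else:
        events.append("would deadlock: " + msg)


def app_iter():
    try:
        yield b"chunk"
    except GeneratorExit:
        log("client disconnected")
        raise


# Case 1: the iterator is finalized while the lock is already held.
it = app_iter()
next(it)
with lock:
    del it  # finalizer throws GeneratorExit into the generator right here

# Case 2: the iterator is closed promptly, before anyone takes the lock.
it = app_iter()
next(it)
it.close()  # the handler runs now, with the lock free
with lock:
    del it  # generator already finished, so nothing runs here

print(events)
```

In the first case the `except GeneratorExit` handler runs on the thread that is already holding the lock, so the logging call cannot get it. Closing the iterator promptly moves the handler to a point where the lock is free.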

Simplest fix might be for us to swap the lock out for a PipeMutex in eventlet_monkey_patch().

Tim Burke (1-tim-z) wrote:

Confirmed by DHE on py36: http://paste.openstack.org/show/798034/

Changed in swift:
status: New → Confirmed