codebrowse deadlocks on logging lock
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Fix Released | High | Michael Hudson-Doyle |
Bug Description
We kill threads that take longer than a minute or so by sending them a SystemExit exception. Unfortunately, threading.RLock isn't safe against asynchronous exceptions, and it's possible for a thread to be killed while holding the lock that the logging module uses to ensure that logging output doesn't get jumbled. After that happens, any thread that tries to log (i.e., all of them) will block forever.
Although the unsafe window in which an exception can cause this is pretty small, the deadlock seems to be happening surprisingly often.
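For illustration, here is a minimal sketch of the failure mode, not the actual codebrowse code. It assumes CPython and assumes the kill is delivered with `ctypes.pythonapi.PyThreadState_SetAsyncExc`; the report does not say which mechanism is used. The names `kill_thread` and `demo_orphaned_lock` are made up for this example, a plain `threading.RLock` stands in for the logging module's lock, and the unsafe window is deliberately stretched with a sleep loop so the kill lands inside it every time.

```python
import ctypes
import threading
import time


def kill_thread(thread):
    """Ask CPython to raise SystemExit asynchronously in `thread`.

    Hypothetical helper: the real kill mechanism is an assumption here.
    """
    ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(SystemExit))


def demo_orphaned_lock():
    """Kill a thread while it holds an RLock, then try to take the lock."""
    lock = threading.RLock()      # stand-in for the logging module's lock
    acquired = threading.Event()

    def worker():
        lock.acquire()
        acquired.set()
        # Exaggerated unsafe window: the lock is held, but no try/finally
        # protects it yet. In real code this gap is only a few bytecodes.
        while True:
            time.sleep(0.01)

    # daemon=True so the process can still exit if the kill is not delivered
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    acquired.wait()

    kill_thread(t)                # SystemExit arrives while the lock is held
    t.join(timeout=5)

    # The owner is gone, but nothing ever released the lock, so any other
    # thread that needs it would block forever without a timeout.
    got_lock = lock.acquire(timeout=0.5)
    return t.is_alive(), got_lock


if __name__ == "__main__":
    alive, got_lock = demo_orphaned_lock()
    print("worker alive:", alive, "| lock acquired by main thread:", got_lock)
```

In this sketch the worker thread exits, yet the main thread still cannot acquire the lock within the timeout, which is the state every logging thread would end up in.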
I don't really know what the fix is. I guess figuring out why we have requests taking so long that we have to kill the threads processing them would be ideal, but possibly a little ambitious. We need to do something, though.
r51 of ~launchpad-pqm/launchpad-loggerhead/devel