Comment 6 for bug 669296

Robert Collins (lifeless) wrote: Re: [Bug 669296] Re: lpnet11 "critical timeout" to nagios, non responsive

So, let's see:
Thread 6 is in /srv/launchpad.net/production/launchpad-rev-11793/eggs/zope.sendmail-3.7.1-py2.6.egg/zope/sendmail/queue.py
(155): run
- I thought we had disabled the zope mail queue because it crashed appservers?
- waiting on lock=0x2e67e50 (aka the GIL) - see the sketch of such a queue thread below
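
For context, roughly what a queue-processor thread like that does (a hypothetical simplification, not the actual zope.sendmail code - the spool handling, mailer API and stop event here are all made up for illustration):
"
import os
import threading
import time

class QueueProcessorThread(threading.Thread):
    """Hypothetical sketch of a zope.sendmail-style queue thread: poll a
    spool directory, hand each queued message to a mailer, sleep, repeat
    until asked to stop."""

    def __init__(self, queue_dir, mailer, interval=3.0):
        threading.Thread.__init__(self)
        self.queue_dir = queue_dir
        self.mailer = mailer          # made-up mailer object, for illustration
        self.interval = interval
        self._stop = threading.Event()

    def run(self):
        while not self._stop.is_set():
            for name in os.listdir(self.queue_dir):
                path = os.path.join(self.queue_dir, name)
                self.mailer.send_file(path)   # hypothetical API
                os.unlink(path)
            # time.sleep releases the GIL; re-acquiring it afterwards is
            # exactly where a thread like this shows up blocked in a dump.
            time.sleep(self.interval)

    def stop(self):
        self._stop.set()
"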

Thread 5 has crashed - the entire thread has gone boom: it's written
the error to a file and is trying to obtain the GIL to return to Python
code. Note that it's in 'excepthook' here.
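
Aside on the excepthook bit: the default hook just writes the traceback to sys.stderr, and that is Python code, so a dying thread still has to re-take the GIL to get through it. A hook that logs to a file instead has the same shape - the following is purely illustrative, not what Launchpad actually installs:
"
import sys
import traceback

def logging_excepthook(exc_type, exc_value, exc_tb):
    # Hypothetical: append the failure to a shared log file, much as the
    # default hook writes to sys.stderr. Two crashing threads funnelling
    # through the same hook end up pointing at the same file object.
    with open('/var/log/appserver-errors.log', 'a') as log:
        traceback.print_exception(exc_type, exc_value, exc_tb, file=log)

sys.excepthook = logging_excepthook
"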

Thread 4 is in a __del__ handler and we've lost the bottom of the
thread - it may even be approximately empty. It's waiting on the GIL,
and it was handling
"
/srv/launchpad.net/production/launchpad-rev-11793/eggs/storm-0.18-py2.6-linux-x86_64.egg/storm/database.py
(245): close
/srv/launchpad.net/production/launchpad-rev-11793/eggs/storm-0.18-py2.6-linux-x86_64.egg/storm/database.py
(188): __del__
"

call_function is the basic trampoline, AIUI, so it could be any of the
close method's lines - but WAG: the postgresql line.
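
That close-in-__del__ pattern looks roughly like this (a simplified sketch in the spirit of storm's Connection, not the real database.py):
"
class Connection(object):
    """Simplified sketch: a wrapper whose __del__ closes the underlying
    DB-API connection, in the spirit of storm's database.py."""

    def __init__(self, raw_connection):
        self._raw_connection = raw_connection

    def close(self):
        if self._raw_connection is not None:
            # Dropping into the DB driver's C code goes through the
            # call_function trampoline - the likely "postgresql line".
            self._raw_connection.close()
            self._raw_connection = None

    def __del__(self):
        # Runs whenever the wrapper is collected - including during
        # interpreter shutdown, when much of the runtime is already
        # torn down and the GIL has to be re-taken to run this.
        self.close()
"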

Thread 3 has also blown up, and has been writing to the *exact same file*
as Thread 5: '0x2b119cb631e0'. It's also blown up the entire stack to
the excepthook default handler.

Thread 2 is deleting itself and is waiting for the thread HEAD_LOCK
(the interpreter's head_mutex) to do so. It holds the GIL.

Thread 1 is in a __del__ handler, had been inserting into the request
timeline the fact that the database connection is being closed, and is
waiting for the GIL.

Thread 2 is the reason other threads are not shutting down. Now, why
is thread 2 waiting on the head_mutex?

This looks to me like the slow shutdown bug spm filed a few weeks
back, FWIW. The big blowups in the threads happen when the interpreter
is shutting down, and the hang is due to a deadlock around the
head_mutex.
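
To make the shape of that concrete - this is an analogy in pure Python, not the real GIL/head_mutex C code - a two-lock ordering deadlock looks like:
"
import threading
import time

gil_like = threading.Lock()    # stands in for the GIL
head_like = threading.Lock()   # stands in for the interpreter's head_mutex

def exiting_thread():
    # Analogue of Thread 2: running Python code (so it "holds the GIL"),
    # then takes the head lock to unlink itself from the thread list.
    with gil_like:
        time.sleep(0.1)
        with head_like:
            pass

def other_thread():
    # Analogue of whatever took the head lock first and now needs the
    # "GIL" back to finish (say, to run an excepthook or a __del__).
    with head_like:
        time.sleep(0.1)
        with gil_like:
            pass

a = threading.Thread(target=exiting_thread)
b = threading.Thread(target=other_thread)
a.daemon = b.daemon = True
a.start(); b.start()
a.join(2.0); b.join(2.0)
print("still deadlocked: %s" % (a.is_alive() and b.is_alive()))
"
The analogy stops there: which thread (if any) is actually sitting on the head_mutex while waiting for the GIL in the real process is exactly the open question above.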