Comment 4 for bug 669296

Gary Poster (gary) wrote : Re: lpnet11 "critical timeout" to nagios, non responsive

See also bug 669776.

Thread 2 in this pastebin and thread 3 in the other bug's pastebin both seem to be the "odd men out". The one in the other bug is not as obviously all about the GIL, but both of the "unusual" threads are calling PyThreadState_DeleteCurrent. Here's a pithy description of what that call is about: http://code.activestate.com/lists/python-list/269073/. Here are the good bits:

"""
PyThreadState_DeleteCurrent() takes no argument, and deletes the current
thread state. You must hold the GIL while calling it, and the thread
calling it loses its tstate, and loses the GIL, and must never call anything
in the Python C API ever again (unless it acquires another tstate first, but
that's an intended use case). As that description hints <wink>, it was
introduced to plug a Python shutdown race.

A detailed explanation of "the bug" can be found in the comments here:

    http://www.python.org/sf/225673
"""

That's from 2002, but still seems like a good pointer.

Lucid is using Python 2.6.5 locally, and 2.6.6 has been released, but nothing in the 2.6.6 NEWS file seems pertinent to thread issues: http://python.org/download/releases/2.6.6/NEWS.txt

I'll ask Michael Hudson, Barry Warsaw, and Robert to take a look in case they can provide some insight. The other thing to look at is what changed shortly before this incident: we've been running Python 2.6 for a few weeks now, so the hang may well have been triggered by something other than the Python upgrade itself.

If another incident like this happens soon, this bug will push everything else to the back of the queue. I'd rather not drop everything until that happens, though.