Comment 16 for bug 711225

Barry Warsaw (barry) wrote :

Well, this is definitely going to be a fun one. Here's some status:

I know that the exception is getting raised in PyInt_AsLong(). See intobject.c:161 in the Python 2.7 source. I had to rebuild Python 2.7 with an extended message string in each of the relevant TypeErrors, to tell me exactly which one of about a dozen was getting tripped. This was fun because I also had to disable all the tests in the package build because they just took so darn long.

I tried various combinations of Python level changes to subprocess.py, but I do not have a workaround at the Python level. E.g. things like this still broke:

   active = _active[:]
or
   active = list(_active)

and so on. The copies themselves all worked fine, but as soon as you iterated over the new list with a for statement, boom! I even tried hardcoding things like _active[0:len(_active)] and _active[0:0], but those still fail as soon as you begin iterating in the for loop.

I tried to change /usr/share/jockey/jockey-backend to use python-dbg, but too many 64-bit libraries are missing, so that doesn't work. I'm going to build a 32-bit VM to see if I can get attached to a debug version of Python and further inspect the object getting passed to PyInt_AsLong() right at the failure point.

This is almost certainly a bug in Python, but searching the upstream tracker hasn't yielded anything directly related, though there are several issues (mostly closed) about _cleanup() and its problems in the face of threads. It may be fruitful to work around it by explicitly closing the popen objects rather than letting gc and __del__ put them in _active, but I'll have to look at the jockey code more closely to know that. Still, I'd like to spend a bit more time investigating the issue at the CPython layer before moving on to workarounds or hacks.
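
A rough sketch of what "explicitly closing the popen objects" could look like, assuming the caller can afford to wait for the child. The run_and_reap() helper is a hypothetical name, not anything from jockey. The idea is that communicate() waits for the child and sets returncode, so the object has already been reaped by the time it is garbage collected.

```python
import subprocess
import sys


def run_and_reap(argv):
    """Run a child process and reap it explicitly (hypothetical helper)."""
    proc = subprocess.Popen(argv,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    # communicate() reads both pipes to EOF and waits for the child to exit,
    # so returncode is set before the Popen object goes out of scope.
    out, err = proc.communicate()
    return proc.returncode, out, err


# Usage: run a trivial child with the same interpreter.
returncode, out, err = run_and_reap([sys.executable, "-c", "print('ok')"])
```

Whether this fits depends on how jockey uses its child processes; it would not help for children that are deliberately left running in the background.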

OTOH, it seems unlikely that this is some kind of race condition since it's 100% reproducible using Martin's recipe (thanks for that!). Anyway, that's it for now.