Using multiprocessing module crashes parallel iPython

Bug #517358 reported by Justin MacCallum
This bug affects 1 person
Affects: IPython
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have a parallel scientific code that runs on clusters. It currently uses my own (badly designed) parallel communication engine, and I'm trying to transition to IPython's TaskClient interface. One part of my code uses the subprocess module to wrap a call to an external program. Unfortunately, this seems to be causing problems. I'm running OS X 10.6.2 with Python 2.6.4 and IPython 0.10, as supplied by MacPorts. The following code snippet reproduces the problem.

-----
#!/usr/bin/env python

from IPython.kernel import client

tc = client.TaskClient()

@tc.parallel()
def remote_test(input):
    import subprocess
    # I'm obviously not wrapping ls, but I have the same problem with the real binary I'm trying to call
    subprocess.check_call('ls')
    return input

work = range(100)

results = remote_test(work)
-----

The output of this program is:

-----
/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-macosx-10.6-i386.egg/twisted/python/filepath.py:12: DeprecationWarning: the sha module is deprecated; use the hashlib module instead
 import sha
/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/IPython/kernel/clientconnector.py:43: DeprecationWarning: Importing class Tub directly from 'foolscap' is deprecated since Foolscap 0.4.3. Please import foolscap.api.Tub instead
 self.tub = Tub()
/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/IPython/kernel/taskfc.py:79: DeprecationWarning: Importing class Referenceable1 directly from 'foolscap' is deprecated since Foolscap 0.4.3. Please import foolscap.api.Referenceable instead
 class FCTaskControllerFromTaskController(Referenceable):
Traceback (most recent call last):
 File "./test.py", line 15, in <module>
   results = remote_test(work)
 File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/IPython/kernel/parallelfunction.py", line 104, in call_function
   return self.mapper.map(self.func, *sequences)
 File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/IPython/kernel/mapper.py", line 230, in map
   task_results = [self.task_controller.get_task_result(tid) for tid in task_ids]
 File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/IPython/kernel/taskclient.py", line 93, in get_task_result
   taskid, block)
 File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/IPython/kernel/twistedutil.py", line 72, in blockingCallFromThread
   return twisted.internet.threads.blockingCallFromThread(reactor, f, *a, **kw)
 File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-macosx-10.6-i386.egg/twisted/internet/threads.py", line 114, in blockingCallFromThread
   result.raiseException()
 File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Twisted-8.2.0-py2.6-macosx-10.6-i386.egg/twisted/python/failure.py", line 326, in raiseException
   raise self.type, self.value, self.tb
OSError: [Errno 4] Interrupted system call
-----

The ipcontroller log file looks like:

-----
2010-02-04 12:42:36-0800 [-] Running task 82 on worker 1
2010-02-04 12:42:36-0800 [Negotiation,2,192.168.0.194] Task completed: 82
2010-02-04 12:42:36-0800 [Negotiation,2,192.168.0.194] distributing Tasks
2010-02-04 12:42:36-0800 [Negotiation,2,192.168.0.194] Running task 83 on worker 1
2010-02-04 12:42:36-0800 [Negotiation,2,192.168.0.194] Task 83 failed on worker 1
2010-02-04 12:42:36-0800 [-] distributing Tasks
2010-02-04 12:42:36-0800 [-] Running task 84 on worker 0
2010-02-04 12:42:36-0800 [Negotiation,0,192.168.0.194] Task 84 failed on worker 0
2010-02-04 12:42:37-0800 [-] distributing Tasks
2010-02-04 12:42:37-0800 [-] Running task 85 on worker 1
2010-02-04 12:42:37-0800 [Negotiation,2,192.168.0.194] Task 85 failed on worker 1
2010-02-04 12:42:37-0800 [-] distributing Tasks
2010-02-04 12:42:37-0800 [-] Running task 86 on worker 0
2010-02-04 12:42:37-0800 [Negotiation,0,192.168.0.194] Task 86 failed on worker 0
2010-02-04 12:42:38-0800 [-] distributing Tasks
2010-02-04 12:42:38-0800 [-] Running task 87 on worker 1
-----

Notice that a few of the tasks actually complete, while the majority fail with the strange "Interrupted system call" error (EINTR, errno 4).
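EINTR means a blocking system call (here most likely the `waitpid()` inside `Popen.wait()`, which `check_call` uses) was interrupted by a signal before it finished. A common workaround on Python 2 is to retry the interrupted call. The sketch below assumes the interruption happens during the wait, not during the spawn, and retries only `proc.wait()` so the external program is never launched twice. `retry_on_eintr` and `call_with_eintr_retry` are hypothetical helper names, not part of IPython or the standard library. On Python 3.5 and later, PEP 475 makes the standard library retry EINTR automatically, so the loop only matters on older interpreters, where EINTR may also surface from other blocking calls inside `Popen` that this sketch does not cover.

```python
import errno
import subprocess


def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs), retrying for as long as it fails with EINTR."""
    while True:
        try:
            return func(*args, **kwargs)
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise  # a real error: propagate it unchanged
            # Interrupted by a signal before completing: just try again.


def call_with_eintr_retry(cmd):
    """Hypothetical replacement for subprocess.check_call(cmd).

    Only the wait is retried, so the child process is started exactly once.
    """
    proc = subprocess.Popen(cmd)
    returncode = retry_on_eintr(proc.wait)
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)
    return returncode
```

Inside `remote_test`, the call `subprocess.check_call('ls')` would then become `call_with_eintr_retry('ls')`.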

Justin MacCallum (justin-maccallum) wrote :

After some research, I believe this issue arises because of Twisted, which doesn't play nicely with multiprocessing or popen.
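If the culprit is a SIGCHLD handler installed without the SA_RESTART flag (an assumption; the thread only says Twisted installs its own signal handlers), the standard library exposes a pure-Python knob for it: `signal.siginterrupt(signal.SIGCHLD, False)` asks the OS to restart system calls interrupted by SIGCHLD instead of failing them with EINTR. The sketch below installs a placeholder handler, standing in for whatever the framework installs, and then sets that flag. It assumes a POSIX system; whether this is safe to apply inside a running Twisted reactor has not been verified here. `on_sigchld` and `install_restartable_sigchld_handler` are hypothetical names.

```python
import signal
import subprocess
import sys


def on_sigchld(signum, frame):
    """Placeholder for a framework's SIGCHLD handler (hypothetical)."""
    # A real handler would reap the framework's own child processes here.


def install_restartable_sigchld_handler(handler):
    """Install handler for SIGCHLD with restartable system calls (POSIX only).

    Returns the previous disposition so the caller can restore it.
    """
    previous = signal.signal(signal.SIGCHLD, handler)
    # signal.signal() makes the signal interrupt system calls again, so the
    # restart flag has to be set afterwards, not before.
    signal.siginterrupt(signal.SIGCHLD, False)
    if previous is None:  # previous handler was not installed from Python
        previous = signal.SIG_DFL  # fall back to the default disposition
    return previous


if __name__ == "__main__":
    old = install_restartable_sigchld_handler(on_sigchld)
    try:
        # Stand-in for the external binary from the bug report.
        subprocess.check_call([sys.executable, "-c", "pass"])
    finally:
        signal.signal(signal.SIGCHLD, old)
```

Order matters: per the `signal` module documentation, installing a handler with `signal.signal()` resets the signal to interruptible, so `siginterrupt` has to be called again whenever anything re-installs the handler.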

Brian Granger (ellisonbg) wrote : Re: [Bug 517358] [NEW] Using multiprocessing module crashes parallel iPython

Are you using multiprocessing or just subprocess?

Brian

On Thu, Feb 4, 2010 at 5:14 PM, Justin MacCallum
<email address hidden> wrote:
> ...

Justin MacCallum (justin-maccallum) wrote :

Sigh... sorry, that was a typo; I'm just using subprocess.

I still believe my comment is valid, though: I think this is a long-standing problem with Twisted.

Brian Granger (ellisonbg) wrote : Re: [Bug 517358] Re: Using multiprocessing module crashes parallel iPython

Yes, my feeling is that subprocess won't work reliably in any process
that uses Twisted, as Twisted installs its own signal handlers. I will
look into this, though.

Cheers,

Brian

On Thu, Feb 4, 2010 at 6:28 PM, Justin MacCallum
<email address hidden> wrote:
> Sigh... sorry. It was a typo, I'm just using subprocess.
>
> I still believe my comment is valid though; I think this is a long
> standing problem with Twisted.
>
> --
> Using multiprocessing module crashes parallel iPython
> https://bugs.launchpad.net/bugs/517358
> You received this bug notification because you are a member of IPython
> Developers, which is subscribed to IPython.
>
> Status in IPython - Enhanced Interactive Python: New

Justin MacCallum (justin-maccallum) wrote : Re: [Bug 517358] Re: Using multiprocessing module crashes parallel iPython

Hi Brian,

I agree that the problem lies in how Twisted deals with signal handling. However, this has been a known problem with Twisted for a very long time, and unfortunately fixing it does not look like much of a priority. The usual response is that you should do things the Twisted way and use its API to spawn processes. The problem comes for projects like IPython that, ideally, shouldn't expose any of Twisted at all. Perhaps you would have more traction with the Twisted developers, given that you have a well-known project that would benefit from fixing this bug. As I understand it, the solution is to write a simple C extension module that fixes signal handling; no one has done it yet.

I should also point out that these problems don't exist if the calls to subprocess occur in a separate thread spawned by deferToThread. I haven't had time to look at the source, but how hard would it be to set up the engines so that remote calls are run using deferToThread?

Regards,
Justin
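The deferToThread suggestion amounts to running the blocking subprocess call on a worker thread instead of the thread running the event loop. Because Twisted may not be installed, the sketch below uses the standard library's `concurrent.futures` (Python 3.2+, or the `futures` backport on Python 2) as a rough analogue of `twisted.internet.threads.deferToThread`: the caller receives a `Future` instead of a `Deferred`. Whether this actually avoids EINTR depends on which thread the operating system delivers SIGCHLD to, which is exactly what Brian questions in his reply. `run_in_worker_thread` is a hypothetical name.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_in_worker_thread(executor, cmd):
    """Start subprocess.check_call(cmd) on a pool thread and return a Future.

    Rough stdlib analogue of Twisted's deferToThread(subprocess.check_call, cmd).
    """
    return executor.submit(subprocess.check_call, cmd)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Stand-in for the external binary; the main thread stays free meanwhile.
        futures = [run_in_worker_thread(pool, [sys.executable, "-c", "pass"])
                   for _ in range(4)]
        # result() re-raises CalledProcessError (or OSError) from the worker.
        print([f.result() for f in futures])
```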

On Feb 4, 2010, at 6:37 PM, Brian Granger wrote:

> Yes, my feeling is that subprocess won't work reliably in any twisted
> using process as it installs its own signal handlers. I will look
> into this though.
>
> Cheers,
>
> Brian
>
> On Thu, Feb 4, 2010 at 6:28 PM, Justin MacCallum
> <email address hidden> wrote:
>> Sigh... sorry. It was a typo, I'm just using subprocess.
>>
>> I still believe my comment is valid though; I think this is a long
>> standing problem with Twisted.

Brian Granger (ellisonbg) wrote : Re: [Bug 517358] Re: Using multiprocessing module crashes parallel iPython

> I agree that the problem lies in how Twisted deals with signal handling.
> However, this has been a known problem with Twisted for a very long
> time. Unfortunately, it does not look like fixing it is much of a
> priority. The usual response is that you should do things the Twisted
> way and use their API to spawn a process. The problem comes for projects
> like iPython that, ideally, shouldn't be exposing any of Twisted at all.
> Perhaps you might have more traction with the Twisted folks, given that
> you have a well known project that would benefit from fixing this bug.
> As I understand it, the solution is to write a simple C extension module
> that fixes signal handling - just no one has done it yet.

I think your assessment is correct. When you use Twisted, you have to buy
into the entire universe. That is problematic, and we may end up moving
away from Twisted.

> I should also point out that these problems don't exist if the calls to
> subprocess occur in a separate thread spawned by deferToThread. I
> haven't had time to look at the source, but how hard would it be to
> set up the engines so that remote calls are run using deferToThread?

I don't see how deferToThread helps, as the signal handlers are process-wide.
Also, I don't think we can use deferToThread in our case, for other reasons.

Cheers,

Brian
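The point that signal handlers are process-wide can be checked directly in Python: a handler installed by the main thread is what every thread sees, and CPython refuses to install handlers from any thread other than the main one. The sketch below uses SIGUSR1 (POSIX only) instead of SIGCHLD so that it does not interfere with child processes; `handler` and `inspect_from_thread` are hypothetical names. It shows where handlers live, not which thread a given signal happens to interrupt.

```python
import signal
import threading


def handler(signum, frame):
    """Placeholder handler installed by the main thread."""


def inspect_from_thread(signum):
    """From a worker thread, read signum's handler and try to replace it."""
    seen = {}

    def worker():
        # Reading the disposition is allowed from any thread.
        seen["handler"] = signal.getsignal(signum)
        try:
            signal.signal(signum, signal.SIG_DFL)
        except ValueError as exc:
            # CPython only lets the main thread install signal handlers.
            seen["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return seen


if __name__ == "__main__":
    previous = signal.signal(signal.SIGUSR1, handler)
    try:
        seen = inspect_from_thread(signal.SIGUSR1)
        print(seen["handler"] is handler, repr(seen.get("error")))
    finally:
        signal.signal(signal.SIGUSR1, previous)
```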


Changed in ipython:
status: New → Confirmed