Iterator version of that returns results as they become available

Bug #520691 reported by Jon Olav Vik on 2010-02-11
This bug affects 1 person
Affects Status Importance Assigned to Milestone

Bug Description


I have lots of trivially parallel computations and will store the results to
disk. Output order does not matter.

The function or TaskClient.parallel() decorator is convenient
and load-balanced, but blocks until all the results are in. What I would like
instead is a map()-like iterator that returns each result as it becomes
available; similarly, an iparallel() decorator that returns an iterator. Then I
could do:

from IPython.kernel import client
tc = client.TaskClient()
# Tasks that take an varying amount of time
import numpy as np
import time
@tc.iparallel() # <-- nifty feature to be written
def f(x):
    return -x
N = 1000
for result in f(range(N)):
    print result # or save to file, or plot a data point

By flushing the output from time to time, I could then watch results take shape
as they get computed.

I tried digging into the source code for, but was overwhelmed
by the layers and intricacies of Twisted. Any pointers in the right direction
would be highly appreciated.

Response by Brian Granger:
Yes, the code is pretty complicated right now and definitely uses
Twisted heavily.

The place to start looking is in:


In particular look at the code for barrier. I would have to think
about it more, but we really want a barrier method that has a mode
that will return a list of task ids that are complete at the time
barrier is called, rather than waiting for all tasks to be finished.
Might not be too difficult to implement this.

Changed in ipython:
status: New → Confirmed
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers