propose change of API for IPython.kernel.task.Task's depend function

Bug #289561 reported by yichun
Affects: IPython
Status: Won't Fix
Importance: Undecided
Assigned to: Brian Granger

Bug Description

Currently IPython.kernel.task.Task can have a depend function that is called with the engine's property dictionary as its only argument.

Whether or not a task is runnable on an engine depends on (1) the prerequisites of the task itself (which can be described by the user creating the Task object), and (2) the status/properties of the engine at the time scheduling happens. The current API addresses the 2nd condition well, but not the 1st, because the depend function is called with only the properties dictionary:

in line 708 in src/IPython/kernel/task.py:
cando = t.check_depend(w.properties)

in line 252 in src/IPython/kernel/task.py:
return self.depend(properties)

Can the signature of the depend function (called in line 252) be changed to

return self.depend(self, properties)

in class BaseTask, instead of simply depend(properties), so that both (1) and (2) are addressed?

If this is approved, the user will be able to organize and construct complex dependency networks of jobs with the new depend API, which opens up many potentially novel use cases.
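
For illustration, a minimal sketch of what the proposed call could look like (not the shipped implementation; the prereqs attribute is hypothetical):

class BaseTask(object):
    def __init__(self, depend=None):
        self.depend = depend

    def check_depend(self, properties):
        if self.depend is not None:
            # proposed: pass the task itself along with the engine properties,
            # so the depend function can consult attributes set at task creation
            return self.depend(self, properties)
        return True

def my_depend(task, properties):
    # task.prereqs would be an attribute the user attaches when creating the task
    return all(properties.get(p) for p in getattr(task, 'prereqs', []))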

Tags: kernel
yichun (yichun-wei)
description: updated
description: updated
Revision history for this message
Brian Granger (ellisonbg) wrote : Re: [Bug 289561] [NEW] propose change of API for IPython.kernel.task.Task's depend function

Can you give a concrete example of how this could be used? I am more
than willing to make this change if there is a really good reason - I
just want to make sure that there isn't another way of accomplishing
it.

Thanks for the feedback

On Sun, Oct 26, 2008 at 10:47 AM, yichun <email address hidden> wrote:
> Public bug reported:
>
> Currently IPython.kernel.task.Task can have a depend function that is
> called with the only argument as the property dictionary of the engine.
>
> Since whether or not a task is runnable on an engine depends on (1) the
> prerequisite of the task itself (that can be described by the user
> create the Task object); (2) the status/properties of the engine at the
> time the scheduling happens. I see the current API address the 2nd
> condition well, but not the 1st condition by calling depend function
> with the properties dictionary:
>
> in line 708 in src/IPython/kernel/task.py:
> cando = t.check_depend(w.properties)
>
> in line 252 in src/IPython/kernel/task.py:
> return self.depend(properties)
>
> Can the signature of the depend function (called in line 252) be changed
> to
>
> return self.depend(self, properties)
>
> in class BaseTask, instead of simply depend(properties), so both (1) and
> (2) be addressed?
>
> If this is approved, the user will be able to organize and construct
> complex dependency networks of jobs with this new depend API and opens
> up a lot of possibility of potentially novel use cases.
>
> ** Affects: ipython
> Importance: Undecided
> Status: New
>
> ** Description changed:
>
> Currently IPython.kernel.task.Task can have a depend function that is
> called with the only argument as the property dictionary of the engine.
>
> - Since whether or not an task is runnable on an engine depends on (1) the
> + Since whether or not a task is runnable on an engine depends on (1) the
> prerequisite of the task itself (that can be described by the user
> create the Task object); (2) the status/properties of the engine at the
> time the scheduling happens. I see the current API address the 2nd
> condition well, but not the 1st condition by calling depend function
> with the properties dictionary:
>
> in line 708 in src/IPython/kernel/task.py:
> cando = t.check_depend(w.properties)
>
> in line 252 in src/IPython/kernel/task.py:
> return self.depend(properties)
>
> Can the signature of the depend function (called in line 252) be changed
> to
>
> return depend(self, properties)
>
> in class BaseTask, instead of simply depend(properties), so both (1) and
> (2) be addressed?
>
> If this is approved, the user will be able to organize and construct
> complex dependency networks of jobs with this new depend API and opens
> up a lot of possibility of potentially novel use cases.
>
> ** Description changed:
>
> Currently IPython.kernel.task.Task can have a depend function that is
> called with the only argument as the property dictionary of the engine.
>
> Since whether or not a task is runnable on an engine depends on (1) the
> prerequisite of the task itself (that can be described by the user
> create the Task object); (2) the status/properties of the engine at the
> time th...

Revision history for this message
yichun (yichun-wei) wrote :

Thanks for the comment, Brian. After quite some thought, I still cannot find a way to accomplish a quite simple thing under the current API:

I want to code a "depend" function in which another variable, "prereqs" (other than the engine properties dictionary), is referenced. This variable contains what the Task should look for in the engine's property dictionary. Its value can only be determined at run time, when the Task is created, say:

prereqs = ['mpi', name_of_result_of_the_last_task]

def has_prop(props):
    return all([props.get(p) for p in prereqs])

(i) A closure will not help because closures cannot be pushed to engines. (This is the error I met when trying to accomplish this via a closure.)
(ii) Pushing the variable "prereqs" to engines is also troublesome, because one would like one's tasks to utilize newly available engines without having to keep watching exactly which engine comes up new and/or which ones are already capable of running the next task. Plus, what a Task should look for in the property dictionary is information about the Task, not a property of the engine, so such information is better associated with the Task object.

I am happy to learn a way to accomplish this with the current API; if that is the case, this proposal should surely go moot.
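
For reference, the closure-based attempt described in (i) looks roughly like this (a sketch of what fails, not working code):

def make_depend(prereqs):
    def has_prop(props):
        # prereqs is captured from the enclosing scope -- this is the closure
        return all(props.get(p) for p in prereqs)
    return has_prop

# task.depend = make_depend(['mpi', name_of_result_of_the_last_task])
# fails when the task is serialized, because the kernel cannot pickle closure functions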

Revision history for this message
Min Ragan-Kelley (minrk) wrote : Re: [Bug 289561] Re: propose change of API for IPython.kernel.task.Task's depend function

I'm not sure I understand what kind of situation there would be where the
names of dependencies would be variable.
When you say previous task, do you mean the previously submitted task, or
the most recently completed task? If it is the previously submitted task,
this should be easy, by just having tasks write their taskID to the
properties of the engine - since the taskID is determined at the time of
task submission, you know it right away, and could do something like the
following:

def depend_on_ID(taskID):
    return lambda props: taskID in props.get("task_ids",[])

previous = tc.run(taskA)
taskB.depend = depend_on_ID(previous)
tc.run(taskB)

It would be harder to get it to depend on the previously completed task, and
this information is not even in the Task, it is only in the TaskController -
so a solution to that is not clear to me.

If neither of these is actually the case you mean, could you clarify why you
would want the dependencies to be variable?

Thanks for the input,
-Min RK

On Mon, Oct 27, 2008 at 1:45 PM, yichun <email address hidden> wrote:

> Thanks for the comment, Brain. After quite some thought, I still cannot
> find a way to accomplish under the current API a quite simple thing:
>
> I want to code a "depend" function in which another variable "prereqs"
> (other than the engine properties dictionary) is referenced. This
> variable contains what the Task should look for in the engine's property
> dictionary. Its value can only be determined in run time when the Task
> is created, say
>
> prereqs = ['mpi', name_of_result_of_the_last_task]
>
> def has_prop(props):
> return all([props.get(p) for p in prereqs])
>
> (i) A closure will not help because they cannot be pushed to engines. (This
> is the error I met when trying to accomplish this via a closure.
> (ii) Pushing the variable "prereqs" to engines is also troublesome because
> one would like his tasks to utilize newly available engines without having
> to keep watching exactly which engine comes up new and/or which ones are
> already capable of running the next task. Plus, what a Task should look for
> in the property dictionary is an information of the Task, not a property of
> the engine, so such information is better associated with the Task object.
>
> I am happy to learn a way to accomplish this with the current API, if
> that is the case, this proposal should surely go moot.
>
> --
> propose change of API for IPython.kernel.task.Task's depend function
> https://bugs.launchpad.net/bugs/289561
> You received this bug notification because you are a member of IPython
> Developers, which is subscribed to IPython.
>
> Status in IPython - Enhanced Interactive Python: New
>
> Bug description:
> Currently IPython.kernel.task.Task can have a depend function that is
> called with the only argument as the property dictionary of the engine.
>
> Since whether or not a task is runnable on an engine depends on (1) the
> prerequisite of the task itself (that can be described by the user create
> the Task object); (2) the status/properties of the engine at the time the
> scheduling happens. I see the current API address the 2nd condition well,
> but not the 1st condition by c...

Revision history for this message
yichun (yichun-wei) wrote :

> When you say previous task, do you mean the previously submitted task, or
> the most recently completed task? If it is the previously submitted task,

Ultimately one would like to depend on the most recently completed task; could the depend function check some pre-negotiated labels in the engine's namespace to know that the previous task (as a dependency) has finished?

But that is a more advanced scenario. My situation is really as simple as what you've answered here:

> this should be easy, by just having tasks write their taskID to the
> properties of the engine - since the taskID is determined at the time of
> task submission, you know it right away, and could do something like the
> following:
>
> def depend_on_ID(taskID):
> return lambda props: taskID in props.get("task_ids",[])
>
> previous = tc.run(taskA)
> taskB.depend = depend_on_ID(previous)
> tc.run(taskB)

Are you sure this will work? depend_on_ID(previous) returns a closure function, doesn't it? I tried to write a depend function like this, but got an error saying "cannot pickle closure functions". I might have made some other errors that broke my code, so let me try again....

Thanks for the prompt reply, Min!

-yichun

Revision history for this message
yichun (yichun-wei) wrote :

> Are you sure this will work? depend_on_ID(previous) returns a closure function,
> won't it? I tried to write depend function like this, but got error saying "cannot
> pickle closure functions". I might have made some other errors to break my
> code, so let me try again....

I am still getting errors:
"Sorry, cannot pickle code objects with closures"

Revision history for this message
Min Ragan-Kelley (minrk) wrote :

Right - I forgot about the closure issue. There should be a way to get the
values of variables through, but perhaps adding a task.dependencies variable
would be useful. This is actually how the first implementation of
dependencies worked, before we figured out how to pickle functions - but
flexible implementation was cumbersome. I thought we could do everything we
used to with the current model, but I may be wrong, as getting around the
closures is not obvious to me at the moment.

Brian, is there a way to get the following to work with the current
implementation, avoiding closures?
>>> t = big_task(N)
>>> footprint = determine_footprint_of_big_task(N)
>>> def dep(p):
...     return p['RAM'] >= footprint
>>> t.depend = dep

Adding a list of keys to check would solve part of the problem, but we want
to be able to perform basic logic, not just check whether keys exist
(e.g. memory >= task_size). A function is the easiest way to do this,
but I am blanking on how to get variables over the network.
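
One standard way around the closure limit (a sketch, not something the current task API provides) is to carry the value on a small callable object defined at module level:

class FootprintDep(object):
    def __init__(self, footprint):
        self.footprint = footprint

    def __call__(self, props):
        return props.get('RAM', 0) >= self.footprint

# t.depend = FootprintDep(footprint)
# Instances of a top-level class pickle cleanly, but the defining module still
# has to be importable wherever the dependency check is actually evaluated.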

The answer to following the previously completed task is not apparent to me
yet, nor is a case in which it would be useful. I cannot really think of a
use case where it is important to submit a task to the most recently
completed worker when you have no information about what the most recently
completed task may have been. For this, we would have to allow the task to
perform logic on the TaskController, which we would like to avoid
if possible, so a useful situation would have to be presented that cannot be
implemented otherwise before we allow that coupling.

-MinRK

On Mon, Oct 27, 2008 at 4:37 PM, yichun <email address hidden> wrote:

> > Are you sure this will work? depend_on_ID(previous) returns a closure
> function,
> > won't it? I tried to write depend function like this, but got error
> saying "cannot
> > pickle closure functions". I might have made some other errors to break
> my
> > code, so let me try again....
>
> I am still getting errors:
> "Sorry, cannot pickle code objects with closures"
>
> --
> propose change of API for IPython.kernel.task.Task's depend function
> https://bugs.launchpad.net/bugs/289561
> You received this bug notification because you are a member of IPython
> Developers, which is subscribed to IPython.
>
> Status in IPython - Enhanced Interactive Python: New
>
> Bug description:
> Currently IPython.kernel.task.Task can have a depend function that is
> called with the only argument as the property dictionary of the engine.
>
> Since whether or not a task is runnable on an engine depends on (1) the
> prerequisite of the task itself (that can be described by the user create
> the Task object); (2) the status/properties of the engine at the time the
> scheduling happens. I see the current API address the 2nd condition well,
> but not the 1st condition by calling depend function with the properties
> dictionary:
>
> in line 708 in src/IPython/kernel/task.py:
> cando = t.check_depend(w.properties)
>
> in line 252 in src/IPython/kernel/task.py:
> return self.depend(properties)
>
> Can the signature of the depend function (called in line 252) be changed to
>
> return self.depend(self, properties)
>
...

Revision history for this message
Brian Granger (ellisonbg) wrote :

Sorry about the silence. I am getting caught up with email.

A few points:

* We definitely want to minimize the logic that the controller does.
Eventually we might even want to move the checking of dependencies
out of the controller to the engines themselves. The benefit of this
is that you then don't have to sync the properties of the engines back
to the controller.

* I still don't have a really clear idea of the use case. Let's say
you have two tasks, t1 and t2, and the first has finished running
(assume 1 engine). What do you want to be checked before t2 is run?
Before we talk about implementation, let's make sure we all have a
clear idea of what we are trying to implement.

Brian

On Mon, Oct 27, 2008 at 5:34 PM, Min Ragan-Kelley <email address hidden> wrote:
> Right - I forgot about the closure issue. There should be a way to get the
> values of variables through, but perhaps adding a task.dependencies variable
> would be useful. This is actually how the first implementation of
> dependencies worked, before we figured out how to pickle functions - but
> flexible implementation was cumbersome. I thought we could do everything we
> used to with the current model, but I may be wrong as getting around the
> closers is not obvious to me at the moment.
>
> Brian, is there a way to get the following to work with the current
> implementation, avoiding closures?
>>>> t = big_task(N)
>>>> footprint = determine_footprint_of_big_task(N)
>>>> def dep(p):
> return p['RAM'] >= footprint
>>>> task.depend = dep
>
> adding a list of keys to check would solve part of the problem, but we want
> to be able to perform basic logic, not just checking whether keys exist
> (i.e. memory >= task_size), and a function is the easiest way to do this,
> but I am blanking on how to get variables over the network.
>
> The answer to following the previously completed task is not apparent to me
> yet, nor is a case in which it would be useful. I cannot really think of a
> usage case where it is important to submit a task to the most recent
> completed worker when you have no information about what the most recently
> completed task may have been. For this, we would have to allow the task to
> be able to perform logic on the TaskController, which we would like to avoid
> if possible, so a useful situation would have to be presented that cannot be
> implemented otherwise before we allow that coupling.
>
> -MinRK
>
> On Mon, Oct 27, 2008 at 4:37 PM, yichun <email address hidden> wrote:
>
>> > Are you sure this will work? depend_on_ID(previous) returns a closure
>> function,
>> > won't it? I tried to write depend function like this, but got error
>> saying "cannot
>> > pickle closure functions". I might have made some other errors to break
>> my
>> > code, so let me try again....
>>
>> I am still getting errors:
>> "Sorry, cannot pickle code objects with closures"
>>
>> --
>> propose change of API for IPython.kernel.task.Task's depend function
>> https://bugs.launchpad.net/bugs/289561
>> You received this bug notification because you are a member of IPython
>> Developers, which is subscribed to IPython.
>>
>> Status in IPython - Enhanced Interacti...

Revision history for this message
yichun (yichun-wei) wrote :

On Thu, Oct 30, 2008 at 2:27 PM, Brian Granger <email address hidden> wrote:
> Sorry about the silence. I am getting caught up with email.
>
> A few points:
>
> * We definitely want to minimize the logic that the controller does.
> Eventually with might even want to move the checking of dependencies
> out of the controller to the engines themselves. The benefit of this
> is that you then don't have to sync the properties of the engines back
> to the controller.

+1 from me on this.

>
> * I still don't have a really clear idea of the usage case. Let's say
> you have two tasks, t1, t2. Let's say the first has finished running
> (assume 1 engine), what do you want to be checked before t2 is run?
> Before we talk about implementation, let's make sure we all have a
> clear idea of what we are trying to implement.

For an imaginary example, t1 might end up with 2 possible results:
"dead" or "live". If t1 resulted in "dead", then t2 should not run,
and another task t1' that is a modified version of t1 might be tried
until t1 results in "live"; at that point t2 can be run on that engine
to continue the evolution... There can be a chain of such processing
tasks t3, t4....

That being said, currently the biggest headache for me is not those
fancy dependencies, but that there is no clear way to accomplish
even the simplest task dependency in which the depend function refers
to a variable that can only be decided when the task is created, as shown
in Min's example. The key lies in the fact that the depend function
has no way to refer to the properties of the task itself.

-yichun

>
> Brian
>
>
> On Mon, Oct 27, 2008 at 5:34 PM, Min Ragan-Kelley <email address hidden> wrote:
>> Right - I forgot about the closure issue. There should be a way to get the
>> values of variables through, but perhaps adding a task.dependencies variable
>> would be useful. This is actually how the first implementation of
>> dependencies worked, before we figured out how to pickle functions - but
>> flexible implementation was cumbersome. I thought we could do everything we
>> used to with the current model, but I may be wrong as getting around the
>> closers is not obvious to me at the moment.
>>
>> Brian, is there a way to get the following to work with the current
>> implementation, avoiding closures?
>>>>> t = big_task(N)
>>>>> footprint = determine_footprint_of_big_task(N)
>>>>> def dep(p):
>> return p['RAM'] >= footprint
>>>>> task.depend = dep
>>
>> adding a list of keys to check would solve part of the problem, but we want
>> to be able to perform basic logic, not just checking whether keys exist
>> (i.e. memory >= task_size), and a function is the easiest way to do this,
>> but I am blanking on how to get variables over the network.
>>
>> The answer to following the previously completed task is not apparent to me
>> yet, nor is a case in which it would be useful. I cannot really think of a
>> usage case where it is important to submit a task to the most recent
>> completed worker when you have no information about what the most recently
>> completed task may have been. For this, we would have to allow the task to
>> be able to perform logic on the Tas...

Revision history for this message
Min Ragan-Kelley (minrk) wrote :

On Fri, Oct 31, 2008 at 8:54 PM, yichun <email address hidden> wrote:

> On Thu, Oct 30, 2008 at 2:27 PM, Brian Granger <email address hidden>
> wrote:
> > Sorry about the silence. I am getting caught up with email.
> >
> > A few points:
> >
> > * We definitely want to minimize the logic that the controller does.
> > Eventually with might even want to move the checking of dependencies
> > out of the controller to the engines themselves. The benefit of this
> > is that you then don't have to sync the properties of the engines back
> > to the controller.
>
> +1 from me on this.
>

If we want to move the logic to the engine, there are two logical models:

add a method for engines:

engine.check_depend(callable_dep_func): returns True if dep_func returns True,
and False if it returns False or raises an exception.
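
A rough sketch of what that engine-side method could look like (hypothetical, not implemented):

def check_depend(self, dep_func):
    try:
        return bool(dep_func(self.properties))
    except Exception:
        # any error in the user's dependency function counts as "cannot run here"
        return False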

>
> >
> > * I still don't have a really clear idea of the usage case. Let's say
> > you have two tasks, t1, t2. Let's say the first has finished running
> > (assume 1 engine), what do you want to be checked before t2 is run?
> > Before we talk about implementation, let's make sure we all have a
> > clear idea of what we are trying to implement.
>
> For an imaginary example, t1 might end up with 2 possible result:
> "dead" or "live", if t1 resulted in "dead", then t2 should not run,
> and another task t1' that's a modified version of t1 might be tried
> till t1 results in "live", at that time a t2 can be run on that engine
> to continue the evolution... There can be a chain of such processing
> tasks t3, t4....
>
> That being said, currently the biggest headache for me is not about
> those fancy dependencies, but that there is no clear way to accomplish
> even the simplest task dependency where the depend function referring
> to a variable that can only be decided when task is created, as shown
> in Min's example. The key lies in the fact that the depend function
> has no way to refer to the properties of the task itself.
>
> -yichun
>

The logical solution to this (no matter where it is executed - on the engine
or controller) is that the depend function takes two arguments rather than
just one:

depend(properties, data)

where properties is the engine.properties object, and data is a dict
(namespace) of data to allow more complicated logic (which would cover the
discussed cases). This can be implemented either by adding a Task.dep_data attribute,
or by having DependencyFunction be a class that stores this data dict as
well as the real callable.
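
A sketch of that two-argument form (Task.dep_data and the property names are assumptions, not an existing API):

def needs_phase1(props, data):
    # props comes from the engine, data travels with the task
    return props.get('phase1') == 'live' and props.get('RAM', 0) >= data['footprint']

# task.depend = needs_phase1
# task.dep_data = {'footprint': footprint}   # fixed when the task is created
# the scheduler would then call task.depend(engine_properties, task.dep_data)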

Now, this would still not allow for a dependency function to depend on data
in the _Controller_, such as the last completed task. However, the proposed
case of a chain can be implemented with the current system:
t1 writes its result to the properties dict:
properties["phase1"] = "dead" | "live"
Now, if you want t1' to run if t1 gives "dead", then you should actually
have t1 raise an exception if it is dead, and have t1' as t1's
recoveryTask. This will run t1' if t1 raises an exception. You can have an
arbitrary chain of such things - if t1' fails, t1'' runs, until either
running out of primes or succeeding. On success, each t1 would write the
phase1="live" property, which is a dependency of t2, and t2 will run.

Revision history for this message
Brian Granger (ellisonbg) wrote :

Yichun,

The StringTask and MapTask objects can be subclassed now:

class MyStringTask(StringTask):

    def __init__(self, expression, pull=None, push=None,
            clear_before=False, clear_after=False, retries=0,
            recovery_task=None, depend=None, depend_data=None):
        self.depend_data = depend_data
        StringTask.__init__(self, expression, pull, push, clear_before,
            clear_after, retries, recovery_task, depend)

    def check_depend(self, properties):
        if self.depend is not None:
            return self.depend(properties, self.depend_data)
        else:
            return True
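
A hypothetical usage of the subclass (untested sketch; the prerequisite names are made up):

def has_props(properties, depend_data):
    return all(properties.get(k) for k in depend_data)

t = MyStringTask("import os; m = os.uname()[1]", pull=['m'],
                 depend=has_props, depend_data=['mpi', 'phase1'])
# tc.run(t)   # the controller calls t.check_depend(engine_properties) before scheduling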

Could you try this subclass (I haven't run it so it may have a few
bugs) to see if this accomplishes what you want? Once you have a
simple example of how you want to use this, could you post it here?

But, I am not sure that the property system is the best way to build
hierarchies of dependent tasks. That seems like a different issue
that we may want to think about separately. Thoughts, Min?

Thanks

Brian

On Sat, Nov 1, 2008 at 12:25 AM, Min Ragan-Kelley <email address hidden> wrote:
> On Fri, Oct 31, 2008 at 8:54 PM, yichun <email address hidden> wrote:
>
>> On Thu, Oct 30, 2008 at 2:27 PM, Brian Granger <email address hidden>
>> wrote:
>> > Sorry about the silence. I am getting caught up with email.
>> >
>> > A few points:
>> >
>> > * We definitely want to minimize the logic that the controller does.
>> > Eventually with might even want to move the checking of dependencies
>> > out of the controller to the engines themselves. The benefit of this
>> > is that you then don't have to sync the properties of the engines back
>> > to the controller.
>>
>> +1 from me on this.
>>
>
> If we want to move the logic to the engine, there are two logical
> models:
>
> add a method for engines:
>
> engine.check_depend(callable_dep_func):
> returns True if dep_func returns True, False if it returns False or raises
> exception.
>
>
>>
>> >
>> > * I still don't have a really clear idea of the usage case. Let's say
>> > you have two tasks, t1, t2. Let's say the first has finished running
>> > (assume 1 engine), what do you want to be checked before t2 is run?
>> > Before we talk about implementation, let's make sure we all have a
>> > clear idea of what we are trying to implement.
>>
>> For an imaginary example, t1 might end up with 2 possible result:
>> "dead" or "live", if t1 resulted in "dead", then t2 should not run,
>> and another task t1' that's a modified version of t1 might be tried
>> till t1 results in "live", at that time a t2 can be run on that engine
>> to continue the evolution... There can be a chain of such processing
>> tasks t3, t4....
>>
>> That being said, currently the biggest headache for me is not about
>> those fancy dependencies, but that there is no clear way to accomplish
>> even the simplest task dependency where the depend function referring
>> to a variable that can only be decided when task is created, as shown
>> in Min's example. The key lies in the fact that the depend function
>> has no way to refer to the properties of the task itself.
>>
>> -yichun
>>
>
> The logical solution to this (no matter where it is e...

Revision history for this message
yichun (yichun-wei) wrote :

On Fri, Oct 31, 2008 at 11:25 PM, Min Ragan-Kelley <email address hidden> wrote:
>
> Now, this would still not allow for a dependency function to depend on data
> in the _Controller_, such as the last completed task. However, the proposed
> case of a chain can be implemented with the current system:
> t1 writes it's result to the properties dict:
> properties["phase1"] = "dead" | "live"
> Now, if you want t1' to run if t1 gives "dead", then you should actually
> have t1 raise an exception if it is dead, and have t1' as t1's
> recoveryTask. This will run t1' if t1 raises an exception. You can have an
> arbitrary chain of such things - if t1' fails, t1'' runs, until either
> running out of primes, or success. On success, each t1 would write the
> phase1="live" property, which is a dependency of t2 and t2 will run.
>
> -MinRK
>

If I understand it right, currently the recovery_task for a Task does not
necessarily run on the same engine where the Task ran and failed. The
TaskController simply adds recovery_task as a new task to the queue, which
is distributed by its .distributeTasks() method. (Brian, is this the
desired behavior of recovery_task?) This makes it very hard to utilize
recovery_task in this scenario. But you are right, a Task should always be
able to check what is in the engine's namespace to determine whether or
not it can run.

-yichun

Revision history for this message
yichun (yichun-wei) wrote :

Thanks Brian, this should really work even in the old IPython1, which
I am still using. I will report back what I get and post an example once
I get around to testing it. -yichun

On Sat, Nov 1, 2008 at 1:51 PM, Brian Granger <email address hidden> wrote:
> Yichun,
>
> The StringTask and MapTask objects can be subclassed now:
>
> class MyStringTask(StringTask):
>
> def __init__(self, expression, pull=None, push=None,
> clear_before=False, clear_after=False, retries=0,
> recovery_task=None, depend=None, depend_data=None):
> self.depend_data = depend_data
> StringTask.__init__(expression,pull,push,clear_before,clear_after,retries,
> recovery_task,depend)
>
> def check_depend(self, properties):
> if self.depend is not None:
> return self.depend(properties, self.depend_data)
> else:
> return True
>
> Could you try this subclass (I haven't run it so it may have a few
> bugs) to see it this accomplishes what you want? Once you have a
> simple example of how you want to use this, could you post it here.
>
> But, I am not sure that the property system is the best way to build
> hierarchies of dependent tasks. That seems like a different issues
> that we may want to think about separately. Thoughts Min?
>
> Thanks
>
> Brian
>
>
> On Sat, Nov 1, 2008 at 12:25 AM, Min Ragan-Kelley <email address hidden> wrote:
>> On Fri, Oct 31, 2008 at 8:54 PM, yichun <email address hidden> wrote:
>>
>>> On Thu, Oct 30, 2008 at 2:27 PM, Brian Granger <email address hidden>
>>> wrote:
>>> > Sorry about the silence. I am getting caught up with email.
>>> >
>>> > A few points:
>>> >
>>> > * We definitely want to minimize the logic that the controller does.
>>> > Eventually with might even want to move the checking of dependencies
>>> > out of the controller to the engines themselves. The benefit of this
>>> > is that you then don't have to sync the properties of the engines back
>>> > to the controller.
>>>
>>> +1 from me on this.
>>>
>>
>> If we want to move the logic to the engine, there are two logical
>> models:
>>
>> add a method for engines:
>>
>> engine.check_depend(callable_dep_func):
>> returns True if dep_func returns True, False if it returns False or raises
>> exception.
>>
>>
>>>
>>> >
>>> > * I still don't have a really clear idea of the usage case. Let's say
>>> > you have two tasks, t1, t2. Let's say the first has finished running
>>> > (assume 1 engine), what do you want to be checked before t2 is run?
>>> > Before we talk about implementation, let's make sure we all have a
>>> > clear idea of what we are trying to implement.
>>>
>>> For an imaginary example, t1 might end up with 2 possible result:
>>> "dead" or "live", if t1 resulted in "dead", then t2 should not run,
>>> and another task t1' that's a modified version of t1 might be tried
>>> till t1 results in "live", at that time a t2 can be run on that engine
>>> to continue the evolution... There can be a chain of such processing
>>> tasks t3, t4....
>>>
>>> That being said, currently the biggest headache for me is not about
>>> those fancy dependencies, but that there is no clear way to accomplish
>>> even the s...

Revision history for this message
Min Ragan-Kelley (minrk) wrote :

I fail to see why you would want to run the recovery task on the same engine
as the failure. My understanding of the scenario was that you have a sequence
of tasks to run, which would only need to continue staying on a single
engine once a version of task1 succeeded *somewhere*. This should do that.

The model for the recovery_task is 'if t doesn't work, try something else',
and something else includes a potentially different engine, so the general
behavior of the recovery_task should certainly not be to run on the same
engine. Under normal usage, tasks fail when something has gone wrong, and
it could be that the engine is responsible, or that some task code
segfaulted on that machine, and took out the engine - in which case it would
certainly not want to try to resubmit to that machine. If you want
t1,t1',etc. to all run on one machine, then I would actually suggest that
they be a single task, because it doesn't really make sense to implement:
t1s = [t1, t1', t1'', ...]
success = False
while t1s and not success:
    try:
        run(t1s[0])       # pseudocode: execute the current variant
    except:
        t1s.pop(0)        # that variant failed; move on to the next one
    else:
        success = True

with the task interface unless you want the try-except to include the engine
as the possible cause for failure - which is exactly what the recovery_task
gets you.

If there is a case to be made for the additional functionality of a task to
be run on an engine in the event of that same engine's failure to complete a
task (I can see failure logging or something), then we can consider adding
that functionality, but that is certainly not the appropriate behavior of
the current model.

-MinRK

On Wed, Nov 5, 2008 at 5:13 PM, yichun <email address hidden> wrote:

> Thanks Brian, this should really work even in the old IPython1, which
> I am still using. Will report back what I get and post an example once
> I get around to test it. -yichun
>
> On Sat, Nov 1, 2008 at 1:51 PM, Brian Granger <email address hidden> wrote:
> > Yichun,
> >
> > The StringTask and MapTask objects can be subclassed now:
> >
> > class MyStringTask(StringTask):
> >
> > def __init__(self, expression, pull=None, push=None,
> > clear_before=False, clear_after=False, retries=0,
> > recovery_task=None, depend=None, depend_data=None):
> > self.depend_data = depend_data
> >
> StringTask.__init__(expression,pull,push,clear_before,clear_after,retries,
> > recovery_task,depend)
> >
> > def check_depend(self, properties):
> > if self.depend is not None:
> > return self.depend(properties, self.depend_data)
> > else:
> > return True
> >
> > Could you try this subclass (I haven't run it so it may have a few
> > bugs) to see it this accomplishes what you want? Once you have a
> > simple example of how you want to use this, could you post it here.
> >
> > But, I am not sure that the property system is the best way to build
> > hierarchies of dependent tasks. That seems like a different issues
> > that we may want to think about separately. Thoughts Min?
> >
> > Thanks
> >
> > Brian
> >
> >
> > On Sat, Nov 1, 2008 at 12:25 AM, Min Ragan-Kelley <email address hidden>
> wrote:
> >> On Fri, Oct 31, 2008 at 8:54 PM, yichun <yichun.we...

Revision history for this message
yichun (yichun-wei) wrote :

I am getting errors when doing TaskController.run(task):

/mnt/disks/tided0/ywei/nrn/ballstickoo/experiments/2ndsyn/test_task.py
in <module>()
      8
      9 a = MyTask("import os; m=os.uname()[1]",pull=['m',])
---> 10 res = tc.run(a)
     11
     12

/mnt/disks/tided0/ywei/src/ipython1-dev/ipython1/kernel/taskclient.pyc
in run(self, task)
     99 The `Task` object to run
    100 """
--> 101 return blockingCallFromThread(self.task_controller.run, task)
    102
    103 def get_task_result(self, taskid, block=False):

/mnt/disks/tided0/ywei/src/ipython1-dev/ipython1/kernel/twistedutil.pyc
in blockingCallFromThread(f, *a, **kw)
     67 @raise: any error raised during the callback chain.
     68 """
---> 69 return
twisted.internet.threads.blockingCallFromThread(reactor, f, *a, **kw)
     70
     71 else:

/lnc/ywei/usr/lib/python2.5/site-packages/Twisted-8.1.0-py2.5-linux-i686.egg/twisted/internet/threads.pyc
in blockingCallFromThread(reactor, f, *a, **kw)
     81 result = queue.get()
     82 if isinstance(result, failure.Failure):
---> 83 result.raiseException()
     84 return result
     85

/lnc/ywei/usr/lib/python2.5/site-packages/Twisted-8.1.0-py2.5-linux-i686.egg/twisted/python/failure.pyc
in raiseException(self)
    317 information if available.
    318 """
--> 319 raise self.type, self.value, self.tb
    320
    321

UnpickleableError: Cannot pickle <type 'str'> objects
WARNING: Failure executing file: <test_task.py>

test_task.py:
----------------------------------------------------------------------
from ipython1.kernel import client
from asynparallel2 import MyTask

rc = client.get_multiengine_client()
tc = client.get_task_client()

a = MyTask("import os; m=os.uname()[1]",pull=['m',])
res = tc.run(a)
---------------------------------------------------------------------

On Sat, Nov 1, 2008 at 1:51 PM, Brian Granger <email address hidden> wrote:
> Yichun,
>
> The StringTask and MapTask objects can be subclassed now:
>
> class MyStringTask(StringTask):
>
> def __init__(self, expression, pull=None, push=None,
> clear_before=False, clear_after=False, retries=0,
> recovery_task=None, depend=None, depend_data=None):
> self.depend_data = depend_data
> StringTask.__init__(expression,pull,push,clear_before,clear_after,retries,
> recovery_task,depend)
>
> def check_depend(self, properties):
> if self.depend is not None:
> return self.depend(properties, self.depend_data)
> else:
> return True
>
> Could you try this subclass (I haven't run it so it may have a few
> bugs) to see it this accomplishes what you want? Once you have a
> simple example of how you want to use this, could you post it here.
>
> But, I am not sure that the property system is the best way to build
> hierarchies of dependent tasks. That seems like a different issues
> that we may want to think about separately. Thoughts Min?
>
> Thanks
>
> Brian
>
>
> On Sat, Nov 1, 2008 at 12:25 AM, Min Ragan-Kelley <email address hidden> wrote:
>> On Fri, Oct 31, 2008 at 8:54 PM, yichun <yichun....

Revision history for this message
Brian Granger (ellisonbg) wrote :

Yichun,

I am pretty sure that my subclass won't work with the old IPython1.
After we merged IPython1 into IPython (but before the 0.9.1
release) we refactored the task stuff significantly. Is there a
reason you haven't upgraded to the 0.9.1 release of IPython?

Brian

On Wed, Nov 5, 2008 at 5:13 PM, yichun <email address hidden> wrote:
> Thanks Brian, this should really work even in the old IPython1, which
> I am still using. Will report back what I get and post an example once
> I get around to test it. -yichun
>
> On Sat, Nov 1, 2008 at 1:51 PM, Brian Granger <email address hidden> wrote:
>> Yichun,
>>
>> The StringTask and MapTask objects can be subclassed now:
>>
>> class MyStringTask(StringTask):
>>
>> def __init__(self, expression, pull=None, push=None,
>> clear_before=False, clear_after=False, retries=0,
>> recovery_task=None, depend=None, depend_data=None):
>> self.depend_data = depend_data
>> StringTask.__init__(expression,pull,push,clear_before,clear_after,retries,
>> recovery_task,depend)
>>
>> def check_depend(self, properties):
>> if self.depend is not None:
>> return self.depend(properties, self.depend_data)
>> else:
>> return True
>>
>> Could you try this subclass (I haven't run it so it may have a few
>> bugs) to see it this accomplishes what you want? Once you have a
>> simple example of how you want to use this, could you post it here.
>>
>> But, I am not sure that the property system is the best way to build
>> hierarchies of dependent tasks. That seems like a different issues
>> that we may want to think about separately. Thoughts Min?
>>
>> Thanks
>>
>> Brian
>>
>>
>> On Sat, Nov 1, 2008 at 12:25 AM, Min Ragan-Kelley <email address hidden> wrote:
>>> On Fri, Oct 31, 2008 at 8:54 PM, yichun <email address hidden> wrote:
>>>
>>>> On Thu, Oct 30, 2008 at 2:27 PM, Brian Granger <email address hidden>
>>>> wrote:
>>>> > Sorry about the silence. I am getting caught up with email.
>>>> >
>>>> > A few points:
>>>> >
>>>> > * We definitely want to minimize the logic that the controller does.
>>>> > Eventually with might even want to move the checking of dependencies
>>>> > out of the controller to the engines themselves. The benefit of this
>>>> > is that you then don't have to sync the properties of the engines back
>>>> > to the controller.
>>>>
>>>> +1 from me on this.
>>>>
>>>
>>> If we want to move the logic to the engine, there are two logical
>>> models:
>>>
>>> add a method for engines:
>>>
>>> engine.check_depend(callable_dep_func):
>>> returns True if dep_func returns True, False if it returns False or raises
>>> exception.
>>>
>>>
>>>>
>>>> >
>>>> > * I still don't have a really clear idea of the usage case. Let's say
>>>> > you have two tasks, t1, t2. Let's say the first has finished running
>>>> > (assume 1 engine), what do you want to be checked before t2 is run?
>>>> > Before we talk about implementation, let's make sure we all have a
>>>> > clear idea of what we are trying to implement.
>>>>
>>>> For an imaginary example, t1 might end up with 2 possible result:
>>>> "dead" or "live", if t1 resulted in "dead", then t2...

Revision history for this message
yichun (yichun-wei) wrote :

On Wed, Nov 5, 2008 at 6:16 PM, Min Ragan-Kelley <email address hidden> wrote:
> I fail to see why you would want to run the recovery task on the same engine
> as a failure. My understanding of the scenario was that you have a sequence
> of tasks to run, which would only need to continue staying on a single
> engine once a version of task1 succeeded *somewhere*. This should do that.
>
>
> The model for the recovery_task is 'if t doesn't work, try something else',
> and something else includes a potentially different engine, so the general
> behavior of the recovery_task should certainly not be to run on the same
> engine. Under normal usage, tasks fail when something has gone wrong, and
> it could be that the engine is responsible, or that some task code
> segfaulted on that machine, and took out the engine - in which case it would
> certainly not want to try to resubmit to that machine. If you want

or the engine is not ready. In this case one would want to run something on
the same engine that failed, preparing it to be ready. This happens when new
engines become available/connected and the user cannot interactively prepare
the engine via MultiEngineClient, say, when not operating interactively.

> t1,t1',etc. to all run on one machine, then I would actually suggest that
> they be a single task, because it doesn't really make sense to implement:
> t1s = [t1,t1',t1''...]
> success = False
> while t1s and success:
> try:
> do t1s[0]
> except:
> t1s.pop(0)
> else:
> success = True
>
> with the task interface unless you want the try-except to include the engine
> as the possible cause for failure - which is exactly what the recovery_task
> gets you.
>
> If there is a case to be made for the additional functionality of a task to
> be run on an engine in the event of that same engine's failure to complete a
> task (I can see failure logging or something), then we can consider adding
> that functionality, but that is certainly not the appropriate behavior of
> the current model.

I agree that's recovery_task's purpose, though it doesn't cope with the
following use case: in your t1s list, say with only 2 tasks [t1, t2], suppose
one needs to run different versions of t2 on every engine that had already
run t1 successfully, but for various reasons we do not want to make
(t1+t2) one task (t1 is time consuming, etc.). Instead we directly run t2,
and if it fails, we would like to run t1 on the engines that failed the
first run of t2.

>
> -MinRK
>
> On Wed, Nov 5, 2008 at 5:13 PM, yichun <email address hidden> wrote:
>
>> Thanks Brian, this should really work even in the old IPython1, which
>> I am still using. Will report back what I get and post an example once
>> I get around to test it. -yichun
>>
>> On Sat, Nov 1, 2008 at 1:51 PM, Brian Granger <email address hidden> wrote:
>> > Yichun,
>> >
>> > The StringTask and MapTask objects can be subclassed now:
>> >
>> > class MyStringTask(StringTask):
>> >
>> > def __init__(self, expression, pull=None, push=None,
>> > clear_before=False, clear_after=False, retries=0,
>> > recovery_task=None, depend=None, depend_data=None):
>> > self.depend...

Revision history for this message
yichun (yichun-wei) wrote :

Oh, I'll try 0.9.1. I still use the ipcluster script to start a cluster;
that's the reason I'm still dwelling on old code. What is the recommended
way to start a cluster in 0.9.1?

Because I am sharing computing resources with others using PBS, I try
to submit engines as batch jobs to the PBS server, but since the engines
become available at different times and I would like to be able to utilize
those newly available engines with IPython's task system, I ran into
problems with job dependencies...

I'll give 0.9.1 a try some time later..

On Wed, Nov 5, 2008 at 6:38 PM, Brian Granger <email address hidden> wrote:
> Yichun,
>
> I am pretty sure that my subclass won't work with the old IPython1.
> After we merged IPython1 into IPython (but before the the 0.9.1
> release) we refactored the task stuff significantly. Is there a
> reason you haven't upgraded to the 0.9.1 release of IPython?

Revision history for this message
Min Ragan-Kelley (minrk) wrote :

I am a bit confused about the case you are describing. My understanding of
the first proposal was:
t1s = [t1,t1',t1''...]
and t2s = [t2,t2',t2''...]

where the t1 list is walked as there are failures, and when there are
successes, then the t2 list is followed on engines that have completed a
version of t1 and so on, trying a different version of tn until it succeeds,
then moving on to t(n+1) when a version of tn succeeds. I believe this
absolutely can be implemented in the way I proposed with the current system.

Now, in your second proposal, you seem to want to run another version of t1
when some version of t2 fails after the initial t1 succeeded? I still
think this can be implemented using the properties and dependencies as a
sort of State Machine, but it would get sticky.

I can see a desire for a cleanup_task (or some other name), which is to be
attempted on an engine in the case of its failure (unless that failure
includes losing the engine). This would be an _additional_ method,
orthogonal to the current recovery_task.

The two models to me are summarized as the following:
recovery_task: "if I failed, try something _else_"
cleanup_task: "if I failed, try to fix it"

And I can certainly imagine cases where both would be desirable.

As for handling incoming engines not being initialized, it makes more sense
to me to use the "ipengine -s <initscript.py>" to ensure that engines are
initialized when they connect than for tasks to assume that the engine has
not been initialized in the event of a failure. The tasks should have
initialization as part of their dependencies - the point of dependencies is
to not run on an engine that is unprepared for the task. The case you seem
to have proposed is analogous to:
try:
    task()
except:
    prepare_for_task()

where using dependencies would be:

if prepared_for_task:
    try:
        task()
    except:
        recover_from_real_failure()

Also, the top of your task chain (t1''''') could include initialization, and
that, too, would solve the new engines problem.
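
For example, a dependency can fold the initialization status into its check (a sketch; it assumes the init script, or a prior setup task, sets properties['initialized']):

def dep(props):
    return props.get('initialized', False) and props.get('mpi', False)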

-MinRK

On Wed, Nov 5, 2008 at 7:28 PM, yichun <email address hidden> wrote:

> oh. I'll try 0.9.1. I still use ipcluster script to start a cluster.
> that's the reason still dwelling on old code. What is the recommended
> way to start a cluster in 0.9.1?
>
> Because I am sharing computing resource with others using PBS, I try
> to submit engines as batch jobs to pbs server, but since the engines
> are available at different time and I would like to be able to utilize
> those newly available engines with ipython's task system, I ran into
> problems of job dependencies...
>
> I'll give 0.9.1 a try some time later..
>
> On Wed, Nov 5, 2008 at 6:38 PM, Brian Granger <email address hidden> wrote:
> > Yichun,
> >
> > I am pretty sure that my subclass won't work with the old IPython1.
> > After we merged IPython1 into IPython (but before the the 0.9.1
> > release) we refactored the task stuff significantly. Is there a
> > reason you haven't upgraded to the 0.9.1 release of IPython?
>
> --
> propose change of API for IPython.kernel.task.Task's depend function
> https://bugs.launchpad.net/bugs/289561
> You received this bug notification because you are a me...

Revision history for this message
yichun (yichun-wei) wrote :

On Thu, Nov 6, 2008 at 10:43 PM, Min Ragan-Kelley <email address hidden> wrote:
> I am a bit confused about the case you are describing. My understanding of
> the first proposal was:
> t1s = [t1,t1',t1''...]
> and t2s = [t2,t2',t2''...]
>
> where the t1 list is walked as there are failures, and when there are
> successes, then the t2 list is followed on engines that have completed a
> version of t1 and so on, trying a different version of tn until it succeeds,
> then moving on to t(n+1) when a version of tn succeeds. I believe this
> absolutely can be implemented in the way I proposed with the current system.
>
> Now, in your second proposal, you seem to want to run another version of t1
> where some version of t2 fails after the initial t1 succeeded? I still
> think this can be implemented using the properties and dependencies as a
> sort of State Machine, but it would get sticky.

You are right, Min. Sorry for mixing two different conditions in my previous
discussions.

>
> I can see a desire for a cleanup_task (or some other name), which is to be
> attempted on an engine in the case of its failure (unless that failure
> includes losing the engine). This would be an _additional_ method,
> orthogonal to the current recovery_task.
>
> The two models to me are summarized as the following:
> recovery_task: "if I failed, try something _else_"
> cleanup_task: "if I failed, try to fix it"
>
> And I can certainly imagine cases where both would be desirable.
>
> As for handling incoming engines not being initialized, it makes more sense
> to me to use the "ipengine -s <initscript.py>" to ensure that engines are
> initialized when they connect than for tasks to assume that the engine has
> not been initialized in the event of a failure. The tasks should have
> initialization as part of their dependencies - the point of dependencies is
> to not run on an engine that is unprepared for the task. The case you seem
> to have proposed is analagous to:
> try:
> task()
> except:
> prepare_for_task()
> where using dependencies would be:
> if prepared_for_task:
> try:
> task()
> except:
> recover_from_real_failure()
>
> Also, the top of your task chain (t1''''') could include initialization, and
> that, too, would solve the new engines problem.

Well, "ipengine -s init.py" doesn't work for me. But you are
absolutely right in that I should rewrite my tasks so they have such
try:...except... structure. I will try that out after I upgrade to
0.9.1, quite some code that needs change.

-yichun

>
> -MinRK
>
> On Wed, Nov 5, 2008 at 7:28 PM, yichun <email address hidden> wrote:
>
>> oh. I'll try 0.9.1. I still use ipcluster script to start a cluster.
>> that's the reason still dwelling on old code. What is the recommended
>> way to start a cluster in 0.9.1?
>>
>> Because I am sharing computing resource with others using PBS, I try
>> to submit engines as batch jobs to pbs server, but since the engines
>> are available at different time and I would like to be able to utilize
>> those newly available engines with ipython's task system, I ran into
>> problems of job dependencies...
>>
>> I'll give 0.9.1 a try some time later..
>>
>> On Wed, Nov 5, 200...

Revision history for this message
yichun (yichun-wei) wrote :

Brian, what is the reason for ipcluster in 0.9.1 being broken? It looks
like a simple fix removing obsolete ipengine option args makes it work.
But I might be missing something obvious... -yichun

On Wed, Nov 5, 2008 at 6:38 PM, Brian Granger <email address hidden> wrote:
> Yichun,
>
> I am pretty sure that my subclass won't work with the old IPython1.
> After we merged IPython1 into IPython (but before the the 0.9.1
> release) we refactored the task stuff significantly. Is there a
> reason you haven't upgraded to the 0.9.1 release of IPython?
>
> Brian
>
> On Wed, Nov 5, 2008 at 5:13 PM, yichun <email address hidden> wrote:
>> Thanks Brian, this should really work even in the old IPython1, which
>> I am still using. Will report back what I get and post an example once
>> I get around to test it. -yichun
>>
>> On Sat, Nov 1, 2008 at 1:51 PM, Brian Granger <email address hidden> wrote:
>>> Yichun,
>>>
>>> The StringTask and MapTask objects can be subclassed now:
>>>
>>> class MyStringTask(StringTask):
>>>
>>> def __init__(self, expression, pull=None, push=None,
>>> clear_before=False, clear_after=False, retries=0,
>>> recovery_task=None, depend=None, depend_data=None):
>>> self.depend_data = depend_data
>>> StringTask.__init__(expression,pull,push,clear_before,clear_after,retries,
>>> recovery_task,depend)
>>>
>>> def check_depend(self, properties):
>>> if self.depend is not None:
>>> return self.depend(properties, self.depend_data)
>>> else:
>>> return True
>>>
>>> Could you try this subclass (I haven't run it so it may have a few
>>> bugs) to see it this accomplishes what you want? Once you have a
>>> simple example of how you want to use this, could you post it here.
>>>
>>> But, I am not sure that the property system is the best way to build
>>> hierarchies of dependent tasks. That seems like a different issues
>>> that we may want to think about separately. Thoughts Min?
>>>
>>> Thanks
>>>
>>> Brian
>>>
>>>
>>> On Sat, Nov 1, 2008 at 12:25 AM, Min Ragan-Kelley <email address hidden> wrote:
>>>> On Fri, Oct 31, 2008 at 8:54 PM, yichun <email address hidden> wrote:
>>>>
>>>>> On Thu, Oct 30, 2008 at 2:27 PM, Brian Granger <email address hidden>
>>>>> wrote:
>>>>> > Sorry about the silence. I am getting caught up with email.
>>>>> >
>>>>> > A few points:
>>>>> >
>>>>> > * We definitely want to minimize the logic that the controller does.
>>>>> > Eventually with might even want to move the checking of dependencies
>>>>> > out of the controller to the engines themselves. The benefit of this
>>>>> > is that you then don't have to sync the properties of the engines back
>>>>> > to the controller.
>>>>>
>>>>> +1 from me on this.
>>>>>
>>>>
>>>> If we want to move the logic to the engine, there are two logical
>>>> models:
>>>>
>>>> add a method for engines:
>>>>
>>>> engine.check_depend(callable_dep_func):
>>>> returns True if dep_func returns True, False if it returns False or raises
>>>> exception.
>>>>
>>>>
>>>>>
>>>>> >
>>>>> > * I still don't have a really clear idea of the usage case. Let's say
>>>>> > you have two tasks, t1, t2. Let's say the first ha...

Revision history for this message
yichun (yichun-wei) wrote :

On Sat, Nov 8, 2008 at 5:28 PM, Yichun Wei <email address hidden> wrote:
>> As for handling incoming engines not being initialized, it makes more sense
>> to me to use the "ipengine -s <initscript.py>" to ensure that engines are
>> initialized when they connect than for tasks to assume that the engine has
>> not been initialized in the event of a failure. The tasks should have
>> initialization as part of their dependencies - the point of dependencies is
>> to not run on an engine that is unprepared for the task. The case you seem
>> to have proposed is analagous to:
>> try:
>> task()
>> except:
>> prepare_for_task()
>> where using dependencies would be:
>> if prepared_for_task:
>> try:
>> task()
>> except:
>> recover_from_real_failure()
>>
>> Also, the top of your task chain (t1''''') could include initialization, and
>> that, too, would solve the new engines problem.
>
> Well, "ipengine -s init.py" doesn't work for me. But you are
> absolutely right in that I should rewrite my tasks so they have such
> try:...except... structure. I will try that out after I upgrade to
> 0.9.1, quite some code that needs change.

After having a brief look at 0.9.1, I realized that what I need is the
MapTask class. May I ask how you would prepare engines so that one can
submit MapTasks without worrying about engines not being ready? Rewriting
the tasks with this try/except pattern could hardly work in that scenario.
I hope this "ipengine -s init.py" method will really work, but on a first
try it doesn't seem to work for me. Any suggestion is highly
appreciated.

-yichun
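
To illustrate the dependency-first approach quoted above, here is a small,
purely illustrative sketch of a depend function that refuses engines which
have not advertised an 'initialized' flag. How that flag gets set on the
engine (an init script, an explicit property push, etc.) is assumed here and
not shown.

def engine_ready(properties):
    # Only run on engines whose properties say they are initialized.
    return properties.get('initialized', False)

# A task created with depend=engine_ready would then only be scheduled on
# prepared engines, instead of relying on try/except recovery inside the
# task itself.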

>
> -yichun
>
>>
>> -MinRK
>>
>> On Wed, Nov 5, 2008 at 7:28 PM, yichun <email address hidden> wrote:
>>
>>> Oh, I'll try 0.9.1. I still use the ipcluster script to start a cluster;
>>> that's the reason I'm still dwelling on old code. What is the recommended
>>> way to start a cluster in 0.9.1?
>>>
>>> Because I am sharing computing resources with others using PBS, I try
>>> to submit engines as batch jobs to the PBS server, but since the engines
>>> become available at different times and I would like to be able to utilize
>>> those newly available engines with IPython's task system, I ran into
>>> problems with job dependencies...
>>>
>>> I'll give 0.9.1 a try sometime later...
>>>
>>> On Wed, Nov 5, 2008 at 6:38 PM, Brian Granger <email address hidden> wrote:
>>> > Yichun,
>>> >
>>> > I am pretty sure that my subclass won't work with the old IPython1.
>>> > After we merged IPython1 into IPython (but before the 0.9.1
>>> > release) we refactored the task stuff significantly. Is there a
>>> > reason you haven't upgraded to the 0.9.1 release of IPython?
>>>
>>> --
>>> propose change of API for IPython.kernel.task.Task's depend function
>>> https://bugs.launchpad.net/bugs/289561
>>> You received this bug notification because you are a member of IPython
>>> Developers, which is subscribed to IPython.
>>>
>>> Status in IPython - Enhanced Interactive Python: New
>>>
>>> Bug description:
>>> Currently IPython.kernel.task.Task can have a depend function that is
>>> called with the only argument as the property dictionary of the engine.
>>>
>>> Since whether or not a task is runnable on an engine depends on (1) the
>>...


Revision history for this message
yichun (yichun-wei) wrote :
Download full text (17.8 KiB)

Brian,

I am reporting back after having a look at 0.9.1.

Issue 1: the ipcluster.py script doesn't look like it is broken in a deep way.
Issue 2: If I am to add the previously discussed depend(properties,
depend_data) method to, say, my own renamed version of the tc.map
function, do I have to subclass MapTask and SynchronousTaskMapper and
then write a tc.map function that accepts an extra depend= argument?
This looks a bit more troublesome than necessary; I might give it a
try later next week.
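
As an illustration of what Issue 2 is aiming at, here is a minimal sketch of
a MapTask subclass that carries per-task depend_data, modeled on the
MyStringTask example quoted earlier in this thread. The MapTask constructor
arguments and the data in the usage example are assumptions made for the
sketch, not verified against 0.9.1.

from IPython.kernel.task import MapTask

class MyMapTask(MapTask):

    def __init__(self, function, args=None, kwargs=None, depend=None,
                 depend_data=None, **task_kwargs):
        # depend_data is extra, task-specific state for the dependency check.
        self.depend_data = depend_data
        MapTask.__init__(self, function, args=args, kwargs=kwargs,
                         depend=depend, **task_kwargs)

    def check_depend(self, properties):
        # Hand the depend function both the engine properties and the
        # task's own data.
        if self.depend is not None:
            return self.depend(properties, self.depend_data)
        return True

def chunk_ready(properties, depend_data):
    # Only run on engines that already hold the chunk this task needs.
    return depend_data['chunk_id'] in properties.get('loaded_chunks', ())

def square(x):
    return x ** 2

t = MyMapTask(square, args=(3,), depend=chunk_ready,
              depend_data={'chunk_id': 7})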

For my own engine initialization problem, I started to like the simple
method Min proposed: to have engines run a script when they come up,
before they register with the controller, say by launching the
engines with "ipengine -s initscript.py". However, I did not see an
apparent way to manage it. Maybe again I am missing something here?

-yichun

On Wed, Nov 5, 2008 at 6:38 PM, Brian Granger <email address hidden> wrote:
> Yichun,
>
> I am pretty sure that my subclass won't work with the old IPython1.
>> After we merged IPython1 into IPython (but before the 0.9.1
> release) we refactored the task stuff significantly. Is there a
> reason you haven't upgraded to the 0.9.1 release of IPython?
>
> Brian
>
> On Wed, Nov 5, 2008 at 5:13 PM, yichun <email address hidden> wrote:
>> Thanks Brian, this should really work even in the old IPython1, which
>> I am still using. Will report back what I get and post an example once
>> I get around to test it. -yichun
>>
>> On Sat, Nov 1, 2008 at 1:51 PM, Brian Granger <email address hidden> wrote:
>>> Yichun,
>>>
>>> The StringTask and MapTask objects can be subclassed now:
>>>
>>> class MyStringTask(StringTask):
>>>
>>>     def __init__(self, expression, pull=None, push=None,
>>>                  clear_before=False, clear_after=False, retries=0,
>>>                  recovery_task=None, depend=None, depend_data=None):
>>>         self.depend_data = depend_data
>>>         StringTask.__init__(self, expression, pull, push, clear_before,
>>>                             clear_after, retries, recovery_task, depend)
>>>
>>>     def check_depend(self, properties):
>>>         if self.depend is not None:
>>>             return self.depend(properties, self.depend_data)
>>>         else:
>>>             return True
>>>
>>> Could you try this subclass (I haven't run it so it may have a few
>>> bugs) to see if this accomplishes what you want? Once you have a
>>> simple example of how you want to use this, could you post it here?
>>>
>>> But, I am not sure that the property system is the best way to build
>>> hierarchies of dependent tasks. That seems like a different issue
>>> that we may want to think about separately. Thoughts Min?
>>>
>>> Thanks
>>>
>>> Brian
>>>
>>>
>>> On Sat, Nov 1, 2008 at 12:25 AM, Min Ragan-Kelley <email address hidden> wrote:
>>>> On Fri, Oct 31, 2008 at 8:54 PM, yichun <email address hidden> wrote:
>>>>
>>>>> On Thu, Oct 30, 2008 at 2:27 PM, Brian Granger <email address hidden>
>>>>> wrote:
>>>>> > Sorry about the silence. I am getting caught up with email.
>>>>> >
>>>>> > A few points:
>>>>> >
>>>>> > * We definitely want to minimize the logic that the controller does.
>>>>> > Eventually we might even want to move the checking of...

Revision history for this message
Brian Granger (ellisonbg) wrote :
Download full text (20.6 KiB)

> Issue 1: the ipcluster.py script doesn't look like it is broken in a deep way.

There is a new version of ipcluster that we are about to release. If
you want to play with it, it is in my branch on Launchpad:

https://code.launchpad.net/~ellisonbg/ipython/trunk-dev

This fixes a number of bugs in the old version and also adds new
capabilities. Docs are not yet written though. Hopefully in the next
week.

> Issue 2: If I am to add the previously discussed depend(properties,
> depend_data) method to, say, my own renamed version of the tc.map
> function, do I have to subclass MapTask and SynchronousTaskMapper and
> then write a tc.map function that accepts an extra depend= argument?
> This looks a bit more troublesome than necessary; I might give it a
> try later next week.

Yes, the current API for tc.map is not set up to use a custom MapTask
subclass. I am super busy right now, but feel free to play with these
things and let us know if you come up with something that works and
fits your needs. However, I think we might need to rethink some of
the design of how properties are used for tests. I guess the main
thing that I don't see is how we can introduce task dependencies in a
clean way. My feeling is that using properties is *not* the way to
go.
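
To make the point above concrete, here is a rough sketch of a hand-rolled
replacement for tc.map that submits custom task objects directly. MyMapTask
refers to the kind of subclass sketched earlier, and the TaskClient methods
used (run, barrier, get_task_result) follow my reading of the 0.9-series
task client; treat this as an outline rather than tested code.

def custom_map(tc, func, sequence, depend=None, depend_data=None):
    # Submit one task per element instead of going through tc.map.
    task_ids = []
    for item in sequence:
        task = MyMapTask(func, args=(item,),
                         depend=depend, depend_data=depend_data)
        task_ids.append(tc.run(task))
    tc.barrier(task_ids)  # wait for all submitted tasks to finish
    # Each entry is a task result object; pull the actual values out per
    # the TaskClient documentation.
    return [tc.get_task_result(tid, block=True) for tid in task_ids]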

> For my own engine initialization problem, I started to like the simple
> method Min proposed: to have engines run a script when they come up,
> before they register with the controller, say by launching the
> engines with "ipengine -s initscript.py". However, I did not see an
> apparent way to manage it. Maybe again I am missing something here?

Can you submit a new bug report about this particular issue? It
should work, but it doesn't! Also, this will be easy to fix!
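
For context, the intended usage under discussion is roughly the following.
The contents of initscript.py are only an illustrative guess at what
per-engine initialization might look like; nothing here is taken from the
thread beyond the "ipengine -s" invocation itself.

# initscript.py -- run once on each engine at startup via:
#
#     ipengine -s initscript.py
#
# The idea is that imports and data loading happen before any task is
# scheduled on the engine, so tasks can assume an initialized engine
# rather than detecting failures and recovering.

import os
import numpy as np

DATA_DIR = os.environ.get('MY_DATA_DIR', '/tmp')  # placeholder location
lookup_table = np.arange(1000)                    # placeholder shared data

print('engine initialized, data dir: %s' % DATA_DIR)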

Cheers,

Brian

> -yichun
>
> On Wed, Nov 5, 2008 at 6:38 PM, Brian Granger <email address hidden> wrote:
>> Yichun,
>>
>> I am pretty sure that my subclass won't work with the old IPython1.
>> After we merged IPython1 into IPython (but before the 0.9.1
>> release) we refactored the task stuff significantly. Is there a
>> reason you haven't upgraded to the 0.9.1 release of IPython?
>>
>> Brian
>>
>> On Wed, Nov 5, 2008 at 5:13 PM, yichun <email address hidden> wrote:
>>> Thanks Brian, this should really work even in the old IPython1, which
>>> I am still using. Will report back what I get and post an example once
>>> I get around to test it. -yichun
>>>
>>> On Sat, Nov 1, 2008 at 1:51 PM, Brian Granger <email address hidden> wrote:
>>>> Yichun,
>>>>
>>>> The StringTask and MapTask objects can be subclassed now:
>>>>
>>>> class MyStringTask(StringTask):
>>>>
>>>>     def __init__(self, expression, pull=None, push=None,
>>>>                  clear_before=False, clear_after=False, retries=0,
>>>>                  recovery_task=None, depend=None, depend_data=None):
>>>>         self.depend_data = depend_data
>>>>         StringTask.__init__(self, expression, pull, push, clear_before,
>>>>                             clear_after, retries, recovery_task, depend)
>>>>
>>>>     def check_depend(self, properties):
>>>>         if self.depend is not None:
>>>>             return self.depend(properties, self.depend_data)
>>>>         else:
>>>>             return True
>>>>...

Revision history for this message
Brian Granger (ellisonbg) wrote :

We are going to change the task dependency API in a completely different way than this.

Changed in ipython:
assignee: nobody → ellisonbg
status: New → Won't Fix