Refactor the task dependencies system

Bug #361419 reported by Brian Granger
Status: In Progress
Assigned to: Brian Granger

Bug Description

Currently the task dependency system does a lot of work on the controller. This makes the system difficult to use and can cause performance problems for the controller. It also leads to user code being run on the controller, which we want to avoid. To fix these issues, we will look into refactoring the dependency system in the following manner:

1. We will create an exception, TaskRejectError or something similar, that users should raise in the task code if an engine does not have the required deps.

2. The user will specify the retries argument so that the task is rerun.

3. We will also add a new keyword argument to our task objects that tells which engines the task can be run on. We will also modify the scheduler to use this information.
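
The three points above could fit together roughly as follows. This is only a sketch under assumptions: the Task class, the engines keyword, the run_task helper, and the way an engine is modelled as a set of module names are all hypothetical and are not the existing IPython API.

```python
class TaskRejectError(Exception):
    """Raised by task code when the engine lacks a required dependency."""


class Task:
    """Hypothetical task object (not IPython's actual class)."""

    def __init__(self, func, retries=0, engines=None):
        self.func = func
        self.retries = retries   # how many times the task may be rerun
        self.engines = engines   # engine ids the task may run on; None means any


def needs_numpy(engine_modules):
    # Task code checks its own deps on the engine, not on the controller.
    if "numpy" not in engine_modules:
        raise TaskRejectError("numpy is not available on this engine")
    return "ok"


def run_task(task, engines):
    """Toy scheduler loop. engines maps engine id -> set of module names."""
    candidates = [e for e in engines
                  if task.engines is None or e in task.engines]
    attempts = 0
    for engine_id in candidates:
        if attempts > task.retries:
            break                # first run plus the allowed retries are used up
        attempts += 1
        try:
            return engine_id, task.func(engines[engine_id])
        except TaskRejectError:
            continue             # engine rejected the task; try the next one
    raise RuntimeError("no eligible engine accepted the task")
```

With engines = {0: {"os"}, 1: {"numpy"}}, a task created with retries=1 is rejected by engine 0 and then runs on engine 1, while a task created with engines=[1] goes straight to engine 1 and needs no retry.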

The only issue that needs to be worked out is whether or not we want the scheduler to distinguish between a task that fails with TaskRejectError and one that fails for another reason. If we don't distinguish, tasks that truly do fail could be retried a large number of times.
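
One possible way to make that distinction, sketched under assumptions: a rejection moves the task to the next engine without consuming a retry, while any other exception counts against the retries budget. The schedule function and its signature are hypothetical and only illustrate the idea.

```python
class TaskRejectError(Exception):
    """Raised by task code when the engine lacks a required dependency."""


def schedule(func, engine_ids, retries=0):
    """Try func on each engine in turn (hypothetical helper).

    A TaskRejectError is not counted as a failure. Any other exception
    is a real failure and is re-raised once the retries budget is spent.
    """
    failures = 0
    for engine_id in engine_ids:
        try:
            return func(engine_id)
        except TaskRejectError:
            continue             # rejection: move on, budget untouched
        except Exception:
            failures += 1
            if failures > retries:
                raise            # a task that truly fails stops early
    raise RuntimeError("no engine completed the task")
```

The trade-off in this sketch is that a task rejected everywhere still visits every engine once, but a task with a real bug is run at most retries + 1 times.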

Changed in ipython:
assignee: nobody → ellisonbg
importance: Undecided → Medium
status: New → Confirmed
Changed in ipython:
status: Confirmed → In Progress
Brian Granger (ellisonbg) wrote :

We are actually going to implement taskid-based task dependencies to allow full DAG-based scheduling.
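
To illustrate what taskid-based dependencies make possible, here is a minimal sketch that is not the planned implementation. It assumes each task is a callable keyed by a taskid and that dependencies are given as a mapping from taskid to the taskids it waits on; run_dag and both of its arguments are hypothetical names. It uses graphlib.TopologicalSorter from the Python standard library (3.9 or later) and runs everything in one process rather than on engines.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, depends_on):
    """Run tasks in dependency order (hypothetical helper).

    tasks:      taskid -> callable taking a dict of dependency results
    depends_on: taskid -> iterable of taskids that must finish first
    """
    graph = {tid: depends_on.get(tid, ()) for tid in tasks}
    results = {}
    # static_order() yields each taskid only after all of its
    # dependencies, and raises graphlib.CycleError on a cycle.
    for tid in TopologicalSorter(graph).static_order():
        inputs = {dep: results[dep] for dep in graph[tid]}
        results[tid] = tasks[tid](inputs)
    return results
```

A real scheduler would dispatch every task whose dependencies are finished to a free engine instead of running them one at a time, but the ordering constraint is the same.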
