Use the "parallelize" distribution on the classical calculator

Bug #1240046 reported by Michele Simionato
This bug affects 1 person
Affects: OpenQuake Engine
Status: Fix Released
Importance: High
Assigned to: Michele Simionato

Bug Description

The classical calculator, like all the calculators except the event based one, is based on a custom distribution mechanism built on top of kombu, which implements a task pool. This approach has a lot of known disadvantages:

1. it is not the standard celery distribution mechanism
2. it is cumbersome to use: in particular it is very easy to forget to update the progress counter correctly, resulting in a calculator that hangs (this happened several times when developing new calculators)
3. it is terrible at error reporting: errors are only logged, and a supervisor is needed just to parse the log files and kill the computation if an error occurs; in Jenkins and the engine-server the supervisor is missing, so an error in a task hangs the whole computation
4. the distribution is tightly coupled with a custom progress mechanism which is complex and requires Redis as a dependency, so it is hard to port to Windows, which is an issue in view of the engine-lite project
5. the distribution does not limit the number of tasks generated, so it is pretty easy to use a block_size that is too small, generate a lot of tasks and end up with a calculation that does nothing except pass messages around with rabbitmq.
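
Point 5 can be illustrated with a small sketch. It is not taken from the engine: `split_in_blocks` and `max_tasks` are hypothetical names, and the only assumption is that capping the number of blocks, rather than fixing the block size, keeps the task count bounded whatever the input size is.

```python
import math


def split_in_blocks(items, max_tasks):
    """Split items into at most max_tasks contiguous blocks.

    Hypothetical helper, not the engine's API: the block size is derived
    from the input length, so a large input cannot produce a flood of tasks.
    """
    items = list(items)
    if not items:
        return []
    block_size = math.ceil(len(items) / max_tasks)
    return [items[i:i + block_size] for i in range(0, len(items), block_size)]


# 10 items with a cap of 4 tasks give blocks of size 3, 3, 3 and 1
blocks = split_in_blocks(range(10), max_tasks=4)
```

With a fixed block_size of 1 the same input would generate 10 tasks; with the cap, a million items still generate no more than `max_tasks` tasks.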

The event based calculator has a distribution mechanism (the "parallelize" mechanism) with none of these issues, so it makes sense to use it for all calculators.
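
As a rough illustration of what such a mechanism avoids, here is a minimal sketch using only the Python standard library. It is not the engine's "parallelize" implementation: `parallelize_sketch` is a hypothetical name, and a thread pool stands in for celery. The point is that results come back through futures, so there is no progress counter to forget, and an exception in a task is re-raised in the caller instead of being left in a log file.

```python
from concurrent.futures import ThreadPoolExecutor


def parallelize_sketch(task, arglist, max_workers=4):
    """Run task(*args) for each args tuple in arglist and return the results.

    Hypothetical stand-in, not the engine's code: completion is tracked by
    the futures themselves, and a failing task raises in the caller.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(task, *args) for args in arglist]
        # result() re-raises any exception raised inside the task
        return [fut.result() for fut in futures]


def square(x):
    return x * x


results = parallelize_sketch(square, [(1,), (2,), (3,)])
```

Because the failure surfaces as an ordinary exception, no supervisor process is needed to notice it and stop the computation.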

Changed in oq-engine:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Michele Simionato (michele-simionato)
milestone: none → 1.0.1
Michele Simionato (michele-simionato) wrote :
Changed in oq-engine:
status: In Progress → Fix Committed
Michele Simionato (michele-simionato) wrote :

The approach was extended to all calculators in https://github.com/gem/oq-engine/pull/1311

Changed in oq-engine:
status: Fix Committed → Fix Released