Jobs are not stopped by a celery kill

Bug #1292606 reported by Daniele Viganò
This bug affects 1 person
Affects: OpenQuake Engine
Status: Fix Released
Importance: Critical
Assigned to: Michele Simionato
Milestone: (none)

Bug Description

When you kill celery on the workers, the job on the master doesn't exit and keeps running, waiting for data. This behavior probably depends on the switch to iter_native(), which no longer polls the workers (the old iterator did).

Changed in oq-engine:
importance: Medium → Critical
status: New → Confirmed
Changed in oq-engine:
status: Confirmed → In Progress
Daniele Viganò (daniele-vigano) wrote :

We found two solutions:

- Modify celery to add a callback to drain_events() in backends/amqp.py [iter_native() -> get_many() -> drain_events()]
- Create a supervisor as a forked thread/process that monitors the workers' status
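
To make the second option more concrete, here is a minimal sketch of a supervisor thread, using only the standard library. It is not the code from the engine: `WorkerSupervisor`, `workers_alive` and `on_failure` are hypothetical names. With celery, `workers_alive` could for instance wrap `app.control.inspect().ping()`, but that wiring is an assumption and is left out so the example runs on its own.

```python
import threading


class WorkerSupervisor(threading.Thread):
    """Hypothetical supervisor: polls a liveness check in the background."""

    def __init__(self, workers_alive, on_failure, interval=1.0):
        super().__init__(daemon=True)
        self.workers_alive = workers_alive  # callable: True while workers respond
        self.on_failure = on_failure        # callable invoked once when they stop
        self.interval = interval            # seconds between two checks
        self._stop_event = threading.Event()

    def run(self):
        # Event.wait() returns False on timeout and True once stop() is called,
        # so the loop performs one check per interval until it is stopped.
        while not self._stop_event.wait(self.interval):
            if not self.workers_alive():
                self.on_failure()
                return

    def stop(self):
        """Ask the supervisor to exit without reporting a failure."""
        self._stop_event.set()


if __name__ == "__main__":
    # Simulated workers that answer twice and then disappear.
    answers = iter([True, True, False])
    workers_gone = threading.Event()
    supervisor = WorkerSupervisor(lambda: next(answers), workers_gone.set,
                                  interval=0.1)
    supervisor.start()
    workers_gone.wait(5.0)
    print("workers gone:", workers_gone.is_set())
```

In a real job, `on_failure` would abort the calculation instead of setting a flag, and the master would call `stop()` once all results have arrived.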

Daniele Viganò (daniele-vigano) wrote :

The above patches are just a mock-up of solution 1) "Modify celery to add a callback to drain_events() in backends/amqp.py [iter_native() -> get_many() -> drain_events()]"
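
The idea behind solution 1 can be illustrated without touching celery. The sketch below is not the attached patch and does not use celery's AMQP backend: a plain `queue.Queue` stands in for the result channel, and `drain_results`, `workers_alive` and `WorkersGone` are hypothetical names. It only shows the pattern of calling a callback each time the drain loop times out with no data.

```python
import queue


class WorkersGone(Exception):
    """Raised when the liveness callback reports that no worker is left."""


def drain_results(results, expected, workers_alive, timeout=1.0):
    """Yield `expected` items from `results`, checking liveness when idle."""
    received = 0
    while received < expected:
        try:
            item = results.get(timeout=timeout)
        except queue.Empty:
            # Nothing arrived within `timeout`: check whether any worker
            # is still around to send the remaining results.
            if not workers_alive():
                raise WorkersGone(f"{expected - received} results still pending")
            continue
        received += 1
        yield item


if __name__ == "__main__":
    results = queue.Queue()
    results.put("task-1")
    results.put("task-2")
    try:
        # Three results are expected but only two were produced, and the
        # callback reports that the workers are gone.
        for item in drain_results(results, 3, lambda: False, timeout=0.1):
            print("got", item)
    except WorkersGone as exc:
        print("aborting:", exc)
```

Without the callback, the loop would keep waiting for the third result forever, which is the behavior described in this bug.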

Changed in oq-engine:
status: In Progress → Fix Committed
Michele Simionato (michele-simionato) wrote :

Solution two implemented here: https://github.com/gem/oq-engine/pull/1403

Changed in oq-engine:
status: Fix Committed → Fix Released