Job revocation (CTRL-C) also stops other jobs

Bug #1244648 reported by Daniele Viganò
This bug affects 1 person
Affects: OpenQuake Engine
Status: Fix Released
Importance: Critical
Assigned to: Michele Simionato

Bug Description

When CTRL-C is pressed, the OpenQuake master sends a revoke command to all the workers. This triggers an automatic restart of celery, which causes other jobs running on the same workers to fail.

Daniele Viganò (daniele-vigano) wrote :

We are probably using 'terminate=True', possibly with the wrong signal.
http://docs.celeryproject.org/en/2.4/userguide/workers.html#revoking-tasks

Changed in oq-engine:
importance: Undecided → High
status: New → Confirmed
Michele Simionato (michele-simionato) wrote :

Yes, we are using terminate=True, see ./openquake/engine/supervising/supervisor.py

Daniele Viganò (daniele-vigano) wrote :

As the default signal sent is TERM, we are soft-killing the celery processes. This breaks communication between RabbitMQ/AMQP and celery, causing a calculation failure.
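
For illustration, the difference between a plain revoke and a terminating one can be sketched as below. This is only a sketch, not the code in supervisor.py and not the fix that was later committed. The helper revoke_job_tasks and the RecordingControl stand-in are hypothetical names; the only assumption is a Celery-style control object exposing revoke(task_id, terminate=..., signal=...), and the exact spelling of the signal name ("TERM" vs "SIGTERM") may differ between Celery versions.

```python
def revoke_job_tasks(control, task_ids, terminate=False, signal="TERM"):
    """Revoke only the given task ids through a Celery-style control object.

    terminate=False: workers skip the task if it has not started yet;
                     a task that is already running is left alone.
    terminate=True:  the worker process executing the task is also
                     sent `signal`, which is the behaviour discussed here.
    """
    revoked = []
    for task_id in task_ids:
        if terminate:
            control.revoke(task_id, terminate=True, signal=signal)
        else:
            control.revoke(task_id)
        revoked.append(task_id)
    return revoked


class RecordingControl:
    """Hypothetical stand-in for a real control object.

    It records the calls instead of broadcasting them to workers,
    so the sketch can be run without a broker.
    """

    def __init__(self):
        self.calls = []

    def revoke(self, task_id, terminate=False, signal="TERM"):
        self.calls.append((task_id, terminate, signal))


# Revoke the tasks of one job without signalling any worker process.
control = RecordingControl()
revoke_job_tasks(control, ["task-a", "task-b"])
print(control.calls)
```

With a real Celery application the same helper would presumably be given the application's control object in place of RecordingControl. Keeping terminate=False by default means a CTRL-C on one job never sends a signal to a worker process that may also be serving other jobs.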

Changed in oq-engine:
importance: High → Critical
assignee: nobody → Michele Simionato (michele-simionato)
milestone: none → 1.0.1
Michele Simionato (michele-simionato) wrote :

Fixed in https://github.com/gem/oq-engine/pull/1303 (but we would need a better solution)

Changed in oq-engine:
status: Confirmed → Fix Committed
Changed in oq-engine:
status: Fix Committed → Fix Released