Supervisors must detect and document failed OQ jobs

Bug #809231 reported by Muharem Hrnjadovic
This bug affects 1 person
Affects: OpenQuake (deprecated)
Status: Fix Released
Importance: High
Assigned to: Gabriele Favalessa

Bug Description

This is the responsibility of the OQ job's supervisor:
When the machines (workers) calculating a job emit log records that report errors or failures, the job must be terminated and marked as failed in the PostgreSQL database.
Brief and detailed error messages must also be stored in the database so that the various front-ends can display them to end users.
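A minimal sketch of the detect-and-record part of this behaviour, assuming the worker log records arrive as standard Python `logging.LogRecord` objects and using an in-memory SQLite database as a stand-in for PostgreSQL. The `oq_job` table, its columns and the `supervise` function are hypothetical and are not OpenQuake's actual schema or code; terminating the job itself is left to a separate cleanup step.

```python
import logging
import sqlite3

# Hypothetical table standing in for the real OpenQuake job table in PostgreSQL.
SCHEMA = """
CREATE TABLE oq_job (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    error_summary TEXT,
    error_detail TEXT
)
"""


def supervise(job_id, records, conn):
    """Mark the job as failed if any worker record is ERROR or worse.

    Returns True if the job was marked as failed, False otherwise.
    """
    failures = [r for r in records if r.levelno >= logging.ERROR]
    if not failures:
        return False
    # Brief message (first failure) for list views in the front-ends.
    summary = failures[0].getMessage()
    # Detailed text (every failure, one per line) for detail views.
    detail = "\n".join(
        f"{r.name}: {r.levelname}: {r.getMessage()}" for r in failures
    )
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "UPDATE oq_job SET status = ?, error_summary = ?, error_detail = ? "
            "WHERE id = ?",
            ("failed", summary, detail, job_id),
        )
    return True
```

Given one INFO record and one ERROR record such as `"site %d: out of memory"` with args `(42,)`, `supervise` stores `failed` as the status and `site 42: out of memory` as the brief error, while a job whose records contain no errors is left untouched.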

tags: added: database error-feedback
summary: - A job must be marked as failed upon seeing log records with
- errors/failures
+ Failed/crashed OQ jobs must have their status and error info in the
+ postgres db updated
Changed in openquake:
status: New → Confirmed
importance: Undecided → High
milestone: none → 0.4.2
description: updated
tags: added: job-supervision user-interface
removed: error-feedback
summary: - Failed/crashed OQ jobs must have their status and error info in the
- postgres db updated
+ Supervisors must detect and document failed/crashed OQ jobs
description: updated
summary: - Supervisors must detect and document failed/crashed OQ jobs
+ Supervisors must detect and document failedOQ jobs
summary: - Supervisors must detect and document failedOQ jobs
+ Supervisors must detect and document failed OQ jobs
description: updated
Muharem Hrnjadovic (al-maisan) wrote:

Terminate the failed job and clean up after it (revoke pending Celery tasks, trigger Redis garbage collection).
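The cleanup step could be sketched as follows. The `cleanup_failed_job` function, the injected `revoke` callable and the `job:<id>:` key prefix are all hypothetical: in a Celery deployment `revoke` might wrap `app.control.revoke(task_id, terminate=True)`, and a plain dict stands in for the Redis keyspace here so the sketch runs without a server.

```python
from typing import Callable, Iterable, MutableMapping


def cleanup_failed_job(
    job_id: int,
    pending_task_ids: Iterable[str],
    revoke: Callable[[str], None],
    kvs: MutableMapping[str, bytes],
) -> int:
    """Revoke a failed job's pending tasks and delete its cached data.

    Returns the number of keys removed from the key-value store.
    """
    # 1. Stop work that has not run yet (hypothetically via Celery's revoke).
    for task_id in pending_task_ids:
        revoke(task_id)
    # 2. Garbage-collect the job's intermediate results. The trailing colon
    #    keeps job 7 from matching keys belonging to job 70.
    prefix = f"job:{job_id}:"
    stale = [key for key in kvs if key.startswith(prefix)]  # copy before deleting
    for key in stale:
        del kvs[key]
    return len(stale)
```

Called with job 7, task IDs `["t1", "t2"]` and a store holding two `job:7:` keys and one `job:8:` key, it revokes both tasks, returns 2 and leaves only the job 8 key. Against a real Redis server the key loop would instead use an incremental scan (for example redis-py's `scan_iter(match=...)`) rather than listing the whole keyspace.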

John Tarter (toh2)
Changed in openquake:
milestone: 0.4.2 → 0.4.3
Changed in openquake:
assignee: nobody → Gabriele Favalessa (favalex)
Changed in openquake:
status: Confirmed → In Progress
Gabriele Favalessa (favalex) wrote:
Changed in openquake:
status: In Progress → Fix Committed
Changed in openquake:
status: Fix Committed → Fix Released