- With the addition of the STARTED state, I think it is extremely unlikely to get PENDING tasks lost. Let's focus on lost STARTED tasks then.
- ClickPackageUpload.scan_task and .review_task are plain string ids that can be used to retrieve the task via:
- celery.result.AsyncResult(task_id) from the results backend (DB as per django-celery-results), or
- celery_app.control.inspect() from running workers.
- As queues/workers are partitioned by SCA release, workers only have a partial view of running tasks. Thus, tasks have to be retrieved from the results backend.
- Cron job to run on the active leader node every 1h.
- Get all STARTED scan/review tasks from the results backend that are older than the task timeout (20 minutes), but no older than e.g. 24h to prevent infinite loops (need to update django-celery-results to the latest version, which added the date_created field to the TaskResult model [0]).
- The task status can be updated to some custom state such as 'LOST' for easier tracing; this seems to be safe and won't affect celery internals as the task *is* effectively lost.
- Fire a new task of the same type with the same args/kwargs as the retrieved task.
- Update the scan_task/review_task field in the respective upload.
- Profit.
Thanks all for your feedback.
Makes sense?
[0] https://github.com/celery/django-celery-results/pull/111