PCJ race between process-job-source.py and celery can generate OOPS
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Launchpad itself | | Critical | Unassigned | |
Bug Description
I got OOPS-7d0f700be19191e98139cdab67a81ea7, which is:
InvalidTransi
Traceback (most recent call last):
Module lazr.jobrunner.
self.
Module lp.services.
super(
Module lazr.jobrunner.
job.
Module lp.services.
self.
Module lp.services.
raise InvalidTransiti
InvalidTransition: Transition from Running to Running is invalid.
<oops-message-0>: {'target_
This was because the job had been picked up by celery at almost exactly the same time:
[2014-04-30 09:23:13,769: DEBUG3/
[2014-04-30 09:23:13,881: INFO/PoolWorker-3] Running <PlainPackageCo
2014-04-30 09:23:13 DEBUG Trying to acquire lease for job in state Waiting
2014-04-30 09:23:13 INFO Running <PlainPackageCo
2014-04-30 09:23:14 INFO Job resulted in OOPS: OOPS-7d0f700be19191e98139cdab67a81ea7
So this is harmless in that the copy happened anyway, but Critical by Launchpad bug policy since it shouldn't generate an OOPS.
I thought the point of acquiring a lease for the job was that it couldn't be picked up by another job runner. Does celery not honour that?
Changed in launchpad: status: New → Triaged

Colin Watson (cjwatson) wrote (#1):

I think the problem may be in lazr.jobrunner. RunJob.run does indeed do a job.acquireLease(), but it doesn't commit the transaction at that point (unlike JobRunner.runAll) so other processes won't see it.
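
To illustrate the visibility problem described above, here is a minimal sketch. It is not Launchpad or lazr.jobrunner code: it assumes a hypothetical `job` table with a `lease_holder` column, and uses SQLite from the Python standard library as a stand-in for the real database. The helper names (`acquire_lease`, `lease_holder`, `demo`) are made up for this example. It only shows that a lease written inside an uncommitted transaction is invisible to a second connection until the first one commits.

```python
import os
import sqlite3
import tempfile


def lease_holder(conn, job_id):
    """Return the lease holder for a job as seen by this connection."""
    rows = conn.execute(
        "SELECT lease_holder FROM job WHERE id = ?", (job_id,)
    ).fetchall()
    return rows[0][0]


def acquire_lease(conn, job_id, holder, commit):
    """Take the lease if it is free; commit only if asked to.

    Hypothetical helper: with commit=False the UPDATE stays inside this
    connection's open transaction, which is the situation the comment
    above attributes to RunJob.run.
    """
    cur = conn.execute(
        "UPDATE job SET lease_holder = ? "
        "WHERE id = ? AND lease_holder IS NULL",
        (holder, job_id),
    )
    if commit:
        conn.commit()
    return cur.rowcount == 1


def demo():
    # Two connections to the same file stand in for the two job runners.
    path = os.path.join(tempfile.mkdtemp(), "jobs.db")
    runner_a = sqlite3.connect(path)
    runner_b = sqlite3.connect(path)

    # Hypothetical schema, for illustration only.
    runner_a.execute(
        "CREATE TABLE job (id INTEGER PRIMARY KEY, lease_holder TEXT)"
    )
    runner_a.execute("INSERT INTO job (id, lease_holder) VALUES (1, NULL)")
    runner_a.commit()

    # Runner A takes the lease but does not commit.
    acquire_lease(runner_a, 1, "celery", commit=False)
    seen_before_commit = lease_holder(runner_b, 1)

    # Once A commits, B can see the lease.
    runner_a.commit()
    seen_after_commit = lease_holder(runner_b, 1)

    runner_a.close()
    runner_b.close()
    return seen_before_commit, seen_after_commit
```

Calling `demo()` should return `(None, "celery")`: the second connection sees no lease until the first one commits. If this analysis is right, committing straight after `job.acquireLease()` (as `JobRunner.runAll` reportedly does) would be what makes the lease effective against other runners.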