Comment 0 for bug 1397445

Berend Ozceri (berend-p) wrote : Premature NOOP wake-up for epoch jobs

Workers that have notified the job server of their intention to go to sleep via the PRE_SLEEP command are woken up when a new epoch job is added to the queue, rather than when the job becomes runnable (that is, when it reaches its scheduled epoch). On an otherwise quiet queue, this means an epoch job that's scheduled to run N seconds from "now", with 1 <= N < 60, may get stuck in the queue for up to 60 seconds.

This occurs because when the worker is woken up via the NOOP and issues a GRAB_JOB command, there are no eligible jobs (the job is in the queue, but its epoch is at least 1 second in the future), so it receives a NO_JOB response and promptly goes back to sleep. When the epoch time of the job arrives, there's no communication from the server to the worker to wake it up, so the worker continues sleeping until its "sleep timeout" expires, at which point it issues a GRAB_JOB and finds the runnable job.
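
To make the timing concrete, here is a small Python model of the sequence described above. It is only a sketch, not gearmand code: the function name pickup_time and its parameters are made up for illustration, and the 60-second default simply mirrors the sleep timeout mentioned in this report.

```python
def pickup_time(added_at, epoch, sleep_timeout=60):
    """Return the time at which a sleeping worker finally receives an epoch job.

    Hypothetical model: the only NOOP is sent when the job is added, and the
    worker otherwise wakes up only when its own sleep timeout expires.
    """
    now = added_at  # the NOOP arrives when the job is queued
    while True:
        # The worker is awake and issues GRAB_JOB.
        if now >= epoch:
            return now  # the job is runnable, so the worker gets it
        # NO_JOB: the worker goes back to sleep for the full timeout,
        # because nothing wakes it when the epoch arrives.
        now += sleep_timeout


# A job added at t=0 with an epoch 5 seconds out is picked up at t=60.
print(pickup_time(added_at=0, epoch=5))
```

In this model a job whose epoch is anywhere from 1 to 59 seconds after it was added is always picked up at the 60-second mark, which matches the delay seen on quiet queues.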

It feels like in addition to sending NOOPs to workers that are sleeping at the time jobs are added, the same has to be done when epochs of otherwise ineligible jobs are reached.
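
One way the server side of that could look, purely as a sketch: keep the epochs of not-yet-runnable jobs in a min-heap and, whenever the earliest one is reached, send NOOPs to sleeping workers. The class EpochWakeups and its method names are hypothetical and do not correspond to anything in the gearmand source; the sketch says nothing about how the real server tracks jobs or timers.

```python
import heapq


class EpochWakeups:
    """Tracks epochs of queued jobs that are not runnable yet (hypothetical)."""

    def __init__(self):
        self._epochs = []  # min-heap of pending epoch timestamps

    def schedule(self, epoch):
        """Remember that a job becomes runnable at the given epoch."""
        heapq.heappush(self._epochs, epoch)

    def due(self, now):
        """Pop every epoch that has been reached and return how many there were.

        A non-zero result is the point at which the server would send NOOPs
        to sleeping workers.
        """
        count = 0
        while self._epochs and self._epochs[0] <= now:
            heapq.heappop(self._epochs)
            count += 1
        return count

    def next_deadline(self):
        """Earliest pending epoch, or None if nothing is scheduled."""
        return self._epochs[0] if self._epochs else None


wakeups = EpochWakeups()
wakeups.schedule(105)
wakeups.schedule(130)
print(wakeups.due(100))   # nothing has reached its epoch yet
print(wakeups.due(105))   # one job just became runnable: wake the workers
```

A server event loop could use next_deadline() as the expiry of its next timer, so the wake-up happens when the epoch arrives instead of relying on the workers' own sleep timeouts.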