Premature/missing NOOP wake-up for epoch jobs

Bug #1397445 reported by Berend Ozceri on 2014-11-28
Affects: Gearman
Importance: Undecided
Assigned to: Unassigned

Bug Description

Workers that have told the job server they are going to sleep (via the PRE_SLEEP command) are woken up when a new epoch job is added to the queue, rather than when the job becomes runnable (that is, when it reaches its scheduled epoch). On an otherwise quiet queue, this means an epoch job that is scheduled to run N seconds from "now", with 1 <= N < 60, may sit in the queue for up to 60 seconds.

This occurs because when the worker is woken up via the NOOP and issues a GRAB_JOB command, there are no eligible jobs (the job is in the queue, but its epoch is at least 1 second in the future), so the worker receives a NO_JOB response and promptly goes back to sleep. When the job's epoch arrives, the server sends nothing to the worker to wake it up, so the worker keeps sleeping until its "sleep timeout" expires, at which point it issues a GRAB_JOB and finds the runnable job.
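To make the timing concrete, here is a minimal sketch of the sequence described above. It is a toy model, not gearmand code: `Pickup`, `simulate_pickup`, and the whole-second clock are assumptions made up for illustration, and the worker is assumed to re-issue GRAB_JOB only when its sleep timeout expires.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not gearmand code. All times are whole seconds.
struct Pickup {
  std::int64_t reported;  // when the worker grabs the job under the behavior in this report
  std::int64_t expected;  // when it would grab the job if a NOOP were also sent at the epoch
};

Pickup simulate_pickup(std::int64_t added_at, std::int64_t epoch,
                       std::int64_t sleep_timeout)
{
  assert(sleep_timeout > 0);

  // Job is already runnable when added: the NOOP sent at add time is enough.
  if (epoch <= added_at) {
    return Pickup{added_at, added_at};
  }

  // NOOP at add time -> GRAB_JOB -> NO_JOB -> PRE_SLEEP. After that the
  // worker only polls again each time its sleep timeout expires.
  std::int64_t now = added_at;
  while (now < epoch) {
    now += sleep_timeout;
  }
  return Pickup{now, epoch};
}
```

With a 60-second sleep timeout, a job added at t=0 with an epoch of t=5 is picked up at t=60 in this model, 55 seconds late.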

It feels like in addition to sending NOOPs to workers that are sleeping at the time jobs are added, the same has to be done when epochs of otherwise ineligible jobs are reached.
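One way the server could know when to send that second NOOP is to track the earliest epoch among the jobs that are not yet runnable and arm a timer for it. The helper below is only a sketch of that idea under my own assumptions: `seconds_until_next_epoch` is a hypothetical name, epochs are plain Unix timestamps in seconds, and nothing here reflects how gearmand actually stores its queue.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of gearmand.
// Returns the number of seconds until the earliest not-yet-runnable job
// reaches its epoch, or -1 if every queued job is already runnable
// (or the queue is empty), in which case no timer is needed.
std::int64_t seconds_until_next_epoch(const std::vector<std::int64_t>& epochs,
                                      std::int64_t now)
{
  std::int64_t best = -1;
  for (std::int64_t epoch : epochs) {
    if (epoch <= now) {
      continue;  // already runnable; the NOOP at add time covered it
    }
    const std::int64_t wait = epoch - now;
    if (best < 0 || wait < best) {
      best = wait;
    }
  }
  return best;
}
```

When such a timer fires, the server would send NOOPs to sleeping workers for that function, the same way it does when a job is added.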

Berend Ozceri (berend-p) on 2014-11-28
description: updated
summary: - Premature NOOP wake-up for epoch jobs
+ Premature/missing NOOP wake-up for epoch jobs
chjgcn (chjgcn) wrote :

You can use the patch file from comment #6 in
        https://bugs.launchpad.net/gearmand/+bug/1339730

yunfei (233602551-t) wrote :

Hi, chjgcn. I read some of the code in your patch and found a small problem.

In libgearman-server/job.cc, in the function gearman_server_job_queue, there is this line:

" while (worker != job->function->worker_list && (worker_wakeup == 0 || worker_wakeup < noop_sent));"

Are you sure the condition "worker_wakeup < noop_sent" is right?

Maybe it should be "noop_sent < worker_wakeup". Please confirm, thanks!

chjgcn (chjgcn) wrote :

Hi, yunfei, you are right! I made a mistake there. When I moved 'worker_wakeup' to the left-hand side, I forgot to change '<' to '>'.
Thank you!
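
For anyone reading along, the effect of the corrected condition can be sketched like this. It is a simplified stand-in, not the patch itself: `noops_to_send` is a hypothetical function, the circular worker list is replaced by a counter, every worker is assumed to be asleep, and worker_wakeup == 0 is assumed to mean "no limit".

```cpp
#include <cstdint>

// Hypothetical stand-in for the do/while loop in gearman_server_job_queue.
// Counts how many NOOPs would be sent with the corrected condition
// "worker_wakeup > noop_sent".
std::uint32_t noops_to_send(std::uint32_t sleeping_workers,
                            std::uint32_t worker_wakeup)
{
  if (sleeping_workers == 0) {
    return 0;  // nobody to wake
  }

  std::uint32_t noop_sent = 0;
  std::uint32_t index = 0;  // stands in for walking the worker list
  do {
    ++noop_sent;  // stands in for queueing a NOOP to one sleeping worker
    ++index;
  } while (index != sleeping_workers &&
           (worker_wakeup == 0 || worker_wakeup > noop_sent));

  return noop_sent;
}
```

With the original '<', the loop in this sketch would stop after the first NOOP whenever worker_wakeup is non-zero, because worker_wakeup < noop_sent is false while noop_sent is still 1.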
