gearmand locks up after processing large replay queue

Bug #1381817 reported by Mark Elrod on 2014-10-15
This bug affects 1 person

Bug Description

We are using Gearman 1.1.12 and, while testing large queues, ran into a problem when restarting the server. While the server replays the queue, it appears to accept connections but not service requests. The connected workers eventually time out, but the server still believes it has handed jobs to them, so everything grinds to a halt. Odder still, gearadmin has sometimes shown more jobs allocated to workers than there are workers.
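
One way to spot the last symptom is to scan the output of `gearadmin --status` for functions whose running-job count exceeds their worker count. The sketch below assumes the usual status layout of one tab-separated line per function (name, queued jobs, running jobs, available workers) ending with a line containing a single "."; the function name `find_overallocated` and the sample data are made up for illustration and are not taken from this report.

```python
def find_overallocated(status_text):
    """Return (function, running, workers) for rows where running > workers."""
    suspicious = []
    for line in status_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "." terminator line.
        if not line or line == ".":
            continue
        parts = line.split("\t")
        if len(parts) != 4:
            continue  # not a status row in the assumed layout
        name, _total, running, workers = parts
        try:
            running_n, workers_n = int(running), int(workers)
        except ValueError:
            continue  # non-numeric counts, ignore the row
        if running_n > workers_n:
            suspicious.append((name, running_n, workers_n))
    return suspicious


# Hypothetical sample data, not real output from the affected server.
sample = "resize_image\t5000\t12\t8\nsend_email\t40\t2\t4\n.\n"
print(find_overallocated(sample))
```

Any row this returns shows the server counting more jobs as running than it has workers to run them, which matches the inconsistent state described above.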

We have found that moving the queue replay code in gearmand_run from the end to the beginning avoids the lockup, because the server does not accept connections until the replay is done. The cost is that no other work can be submitted until the replay finishes, so this is not ideal.
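
To make the ordering difference concrete, here is a toy model of the two startup sequences. It is not gearmand's code: the function `startup` and its event strings are invented for illustration, and the model only assumes what is described above, namely that the stock server starts accepting connections before the replay and that the workaround reverses that order.

```python
def startup(persisted_jobs, replay_first):
    """Return the ordered startup events of a toy job server."""
    events = []

    def replay():
        # Reload every persisted job into the in-memory queue.
        for job in persisted_jobs:
            events.append(f"replay {job}")

    def listen():
        # From this point on, clients and workers can connect.
        events.append("accept connections")

    if replay_first:
        # Workaround order: nothing can connect until replay is done.
        replay()
        listen()
    else:
        # Order as described for the stock server: connections are
        # accepted first, then sit unserviced while the replay runs.
        listen()
        replay()
    events.append("service requests")
    return events


print(startup(["job-1", "job-2"], replay_first=False))
print(startup(["job-1", "job-2"], replay_first=True))
```

In the first ordering, the gap between "accept connections" and "service requests" grows with the size of the queue, which is the window in which workers time out. The second ordering closes that window but delays all new submissions by the same amount.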
