gearmand locks up after processing large replay queue
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Gearman | New | Undecided | Unassigned | |
Bug Description
We are using Gearman 1.1.12 and, while testing large queues, encountered a problem when restarting the server. While gearmand replays the queue, it appears to accept connections but not service requests. The connected workers eventually time out, yet the server still believes it has handed jobs out to those workers, so everything grinds to a halt. Stranger still, gearadmin has sometimes shown more jobs allocated to workers than there are workers.
We have found that moving the queue replay code in gearmand_run from the end to the beginning remedies the lockup, since the server then does not accept connections until the replay is done. The cost is that no other work can be submitted until the replay finishes, so this is not ideal.
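To make the ordering change concrete, here is a minimal toy model in Python of the two startup orders. It is only a sketch and not gearmand's actual code: `ToyServer`, `listen`, `replay`, and `startup_trace` are hypothetical names chosen for illustration, and the model only records whether clients could already connect while the replay runs.

```python
from collections import deque


class ToyServer:
    """Toy model of a job server; NOT gearmand's real code or API."""

    def __init__(self, persisted_jobs):
        self.persisted = list(persisted_jobs)  # jobs saved before the restart
        self.queue = deque()                   # in-memory job queue
        self.listening = False                 # True once clients can connect
        self.events = []                       # startup trace for inspection

    def listen(self):
        # Hypothetical stand-in for opening the listening socket.
        self.listening = True
        self.events.append("listen")

    def replay(self):
        # Record whether clients could already connect during the replay.
        self.events.append("replay(listening=%s)" % self.listening)
        self.queue.extend(self.persisted)
        self.persisted.clear()

    def run_replay_last(self):
        # Order described in this report as the problematic one.
        self.listen()
        self.replay()

    def run_replay_first(self):
        # Order used by the workaround described in this report.
        self.replay()
        self.listen()


def startup_trace(replay_first, jobs=("a", "b", "c")):
    """Run one startup order and return the recorded event trace."""
    server = ToyServer(jobs)
    if replay_first:
        server.run_replay_first()
    else:
        server.run_replay_last()
    return server.events
```

Comparing `startup_trace(False)` with `startup_trace(True)` shows the trade-off: with the replay first, no worker can connect (and later time out) during the replay, but no client can submit new work during that window either.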