Gearman Jobserver Failover

Bug #585054 reported by Felix Gorodishter on 2010-05-24
This bug affects 5 people
Assigned to: Clint Byrum

Bug Description

It appears that currently libgearman does not allow for proper failover to utilize a second jobserver.

I run two Gearman job servers, and if I take the first one down, jobs simply fail with the error:
   gearman_connection_flush:could not connect

(For what it's worth, I use the gearman PHP extension, currently on version 0.7.0, to interface with gearman)

It appears the error is coming from line 488 of libgearman/connection.c where the connection->addrinfo_next is NULL.

description: updated
Clint Byrum (clint-fewbar) wrote :

I've seen this effect too, and I think the failover logic is slightly off in some instances, but not all.

Changed in gearmand:
status: New → Confirmed
Clint Byrum (clint-fewbar) wrote :

Merge proposal submitted. I think this is due to the connection code checking for POLLOUT before POLLERR, so it wrongly assumes that the socket has connected when poll() comes back with both POLLOUT and POLLERR set in revents.
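
To illustrate the ordering issue described above, here is a minimal sketch in C of how the revents from poll() might be classified for a socket with a non-blocking connect() in progress. This is not the actual libgearman patch; the enum and the function name classify_connect_revents are hypothetical names made up for this example. The only point it shows is that the error bits are tested before POLLOUT.

```c
#include <poll.h>

/* Hypothetical result codes for this sketch; these are not libgearman names. */
enum connect_state {
    CONNECT_PENDING, /* nothing reported yet, keep polling */
    CONNECT_OK,      /* socket is writable and no error bit is set */
    CONNECT_FAILED   /* connect failed, caller should try elsewhere */
};

/* Classify poll() revents for a socket with a non-blocking connect() in flight.
 * Error bits are tested first because a failed connect can report POLLOUT
 * together with POLLERR; testing POLLOUT first would misread that as success. */
enum connect_state classify_connect_revents(short revents)
{
    if (revents & (POLLERR | POLLHUP | POLLNVAL))
        return CONNECT_FAILED;
    if (revents & POLLOUT)
        return CONNECT_OK;
    return CONNECT_PENDING;
}
```

With a check like this, a caller that gets CONNECT_FAILED can close the socket and move on to the next address or job server instead of trying to flush to a connection that never came up.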

Changed in gearmand:
assignee: nobody → Clint Byrum (clint-fewbar)
Brian Aker (brianaker) wrote :

Clint's fix is in trunk.

Changed in gearmand:
status: Confirmed → Fix Committed
Brian Aker (brianaker) on 2011-02-22
Changed in gearmand:
status: Fix Committed → Fix Released