Gearman Server and Client Libraries

Client connection handling improvements

Reported by Keyur on 2013-01-11
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Gearman
Medium
Brian Aker

Bug Description

There's 2 scenarios that this bug relates to:

1) a client adds 2 or more servers (gearmand-1 and gearmand-2 in that order). It sends a job over to gearmand-1 and all is well. gearmand-1 then dies. The client tries to insert a new job insertions and it fails with LOST_CONNECTION. The client tries again and gets a COULD_NOT_CONNECT, even though gearmand-2 is up. In fact, the job is sent to gearmand-2 but the client never reads the server's response. The server though will merrily process that job.

2) This is similar. A client adds 2 or more servers (gearmand-1 and gearmand-2 in that order). It sends a job over to gearmand-1 and all is well. gearmand-1 then dies. The client tries to insert 2 new jobs. If #1 is not fixed, both will fail. Then gearmand-1 comes back. Now when a new job is tried to be inserted to gearmand-1, the client will go into a poll() and timeout. This is even though the server actually responds correctly and has inserted the job into its queue.

The test case for both is attached.

Patch 1 fixes scenario #1. We do not return an error code from gearman_flush_all(). Instead let the errors bubble up in gearman_wait().

Patch 2 fixes scenario #2. The created_id in the connection and task get out of sync when there are errors. Instead of figuring out all the places to update to keep them in sync, I thought it simpler to reset the connection id's when it is closed.

Keyur (keyurdg) wrote :
Keyur (keyurdg) wrote :

Patch 1

Keyur (keyurdg) wrote :

Patch 2

Brian Aker (brianaker) on 2013-01-11
Changed in gearmand:
assignee: nobody → Brian Aker (brianaker)
Keyur (keyurdg) wrote :

Patches are based off of v1.0.2

Brian Aker (brianaker) on 2013-01-23
Changed in gearmand:
status: New → In Progress
importance: Undecided → Medium
Brian Aker (brianaker) wrote :

BTW sorry about the slow progress on merging this. I pulled in the test case first so that I could spend some time looking for different angles on how not testing the return could go wrong.

At this point I am comfortable enough with this, so I will go on and merge it.

Thanks for your patience.

Brian Aker (brianaker) on 2013-01-24
Changed in gearmand:
status: In Progress → Fix Committed
milestone: none → 1.0.3
Brian Aker (brianaker) on 2013-08-02
Changed in gearmand:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers