libmemcached autoeject/conn retry problem
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
libmemcached | Fix Released | Medium | Brian Aker |
Bug Description
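libmemcached's auto-eject and retry support seems broken at the moment. I found other reports of the same issue: https://github.com/lericson/pylibmc/issues/32 and a patch from someone that does almost the same thing: http://code.google.com/p/python-libmemcached/source/browse/trunk/patches/fail_over.patch.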
My test setup for this was 3 memcached services running on the same box on different ports. The client was using ketama, remove_failed_servers, a failure_limit of 2, and a retry_timeout of 10.
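Basically, if I shut down one of the memcached services, the client would keep trying to set to it and return MEMCACHED_SERVER_MARKED_DEAD. What I mean is that it didn't try reconnecting to it; it just tried setting the key to a server it thought was live but was actually marked dead internally.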
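I traced the problem a bit, and it turns out the retry timeout in this case is never set. Also, the code repeatedly just calls set_last_disconnected_host() and never gets to the switch gate at the end of memcached_connect(), where it would try reconnecting.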
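The attached patch solved both issues for me. Now, if I turn off a server, the client drops it from the list for the retry timeout wait period after first having returned MEMCACHED_SERVER_MARKED_DEAD once. It also cleanly reconnects after the timeout if I start the memcached service up again.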
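To make the intended behavior concrete, here is a minimal sketch of the eject-and-retry logic described above. It is not libmemcached's code and not the attached patch: the names server_state, srv_result, and try_server are hypothetical, the `reachable` flag stands in for a real connection attempt, and the constants simply mirror the failure_limit of 2 and retry_timeout of 10 from the test setup.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical illustration only; none of these names exist in libmemcached. */
typedef struct {
    unsigned failures;  /* consecutive failed connection attempts */
    time_t next_retry;  /* 0 = not ejected, otherwise earliest reconnect time */
} server_state;

typedef enum {
    SRV_OK,             /* connected, request may proceed */
    SRV_CONNECT_FAILED, /* failed, but still under the failure limit */
    SRV_MARKED_DEAD,    /* failure limit just reached, reported once */
    SRV_EJECTED         /* inside the retry wait period, pick another server */
} srv_result;

enum { FAILURE_LIMIT = 2, RETRY_TIMEOUT = 10 }; /* values from the test setup */

/* Decide what happens to one request aimed at server `s` at time `now`.
 * `reachable` is an assumption standing in for the result of connect(). */
srv_result try_server(server_state *s, time_t now, bool reachable)
{
    assert(s != NULL);

    /* While the retry timer is running, leave the server out of rotation. */
    if (s->next_retry != 0 && now < s->next_retry)
        return SRV_EJECTED;

    if (reachable) {
        /* Clean reconnect once the wait period is over. */
        s->failures = 0;
        s->next_retry = 0;
        return SRV_OK;
    }

    s->failures++;
    if (s->failures >= FAILURE_LIMIT) {
        /* The step the report says was missing: arm the retry timer. */
        s->next_retry = now + RETRY_TIMEOUT;
        return SRV_MARKED_DEAD;
    }
    return SRV_CONNECT_FAILED;
}
```

With these values, two consecutive failures arm the timer and report the server as dead once; every call during the following 10 seconds returns SRV_EJECTED without touching the server, and the first call at or after the deadline attempts a reconnect. If the retry timestamp is never set, as described in the report, the first check can never distinguish "still waiting" from "time to reconnect".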
Changed in libmemcached:
status: New → In Progress
status: In Progress → Fix Committed
Changed in libmemcached:
status: Fix Committed → Fix Released
Hi!
Have you tried running the test suite once you have applied your patch?
The auto-eject support was a patch from Twitter, and its behavior is not well defined, but there are test cases in place to make sure that, to some degree, it continues to work. In the latest version of libmemcached we have memcached_create("--REMOVE-FAILED-SERVERS=3"), which has better-defined behavior (if you are using the older behavior-set system, this would be MEMCACHED_BEHAVIOR_REMOVE_FAILED_SERVERS).
I am on vacation this week, so my responses will be a bit slow. I'll toss the patch into the regression system later on and see what it shows. At a glance, I don't think you want the additional if on last_disconnected_host, since that would set up a situation where one host gets rejected but thereafter no other host can be rejected.
Cheers,
-Brian
On May 4, 2011, at 11:03 PM, Hannu Valtonen wrote:
> ** Patch added: "libmemcached_ha.diff"
> https://bugs.launchpad.net/bugs/777672/+attachment/2114022/+files/libmemcached_ha.diff
>
> --
> You received this bug notification because you are subscribed to
> libmemcached.
> https://bugs.launchpad.net/bugs/777672
>
> Title:
> libmemcached autoeject/conn retry problem
>
> Status in libmemcached - A C and C++ client library for memcached:
> New
>
> Bug description:
> libmemcached's auto eject + retry support seems broken at the moment.
> I found other reports of the same issue:
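> https://github.com/lericson/pylibmc/issues/32 and a patch from
> someone that does almost the same thing:
> http://code.google.com/p/python-libmemcached/source/browse/trunk/patches/fail_over.patch.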
>
> My test setup for this was 3 memcached services running on the same box on different ports. The client was using ketama,
> remove_failed_servers, failure_limit of 2, and retry_timeout of 10.
>
> Basically if I shut down one of the memcached services, the client
> would always keep on trying to set to it and return
> MEMCACHED_SERVER_MARKED_DEAD; What I mean is it didn't try
> reconnecting to it, it just tried setting the key to a server it
> thought was live but was actually marked dead internally.
>
> Traced the problem a bit and turns out the retry timeout in this case
> is never set. Also the code repeatedly simply calls
> set_last_disconnected_host() and never gets to the switch gate at the
> end of memcached_connect() so it'd try reconnecting.
>
> Attached patch got both issues solved for me and now in case I turn
> off a server it drops it from the list for the retry timeout wait
> period after first having returned MEMCACHED_SERVER_MARKED_DEAD once.
> It also cleanly reconnects after the timeout if I start the memcached
> service up again.