Comment 2 for bug 777672

Brian Aker (brianaker) wrote : Re: [Bug 777672] Re: libmemcached autoeject/conn retry problem

Hi!

Have you tried running the test suite once you have applied your patch?

The auto-eject support was a patch from Twitter, and its behavior is not well defined, but there are test cases in place to make sure that it continues to work to some degree. In the latest version of libmemcached we have memcached_create("--REMOVE-FAILED-SERVERS=3"), which has better-defined behavior (if you are using the older behavior-set system, this would be MEMCACHED_BEHAVIOR_REMOVE_FAILED_SERVERS).

I am on vacation this week, so my responses will be a bit slow. I'll toss the patch into the regression system later on and see what it shows. At a glance, I don't think you want the additional if on last_disconnected_host, since that would set up a situation where one host gets rejected but thereafter no other host can be rejected.

Cheers,
 -Brian

On May 4, 2011, at 11:03 PM, Hannu Valtonen wrote:

> ** Patch added: "libmemcached_ha.diff"
> https://bugs.launchpad.net/bugs/777672/+attachment/2114022/+files/libmemcached_ha.diff
>
> Title:
> libmemcached autoeject/conn retry problem
>
> Status in libmemcached - A C and C++ client library for memcached:
> New
>
> Bug description:
> libmemcached's auto eject + retry support seems broken at the moment.
> I found other reports of the same issue:
> https://github.com/lericson/pylibmc/issues/32 and a patch from
> someone that does almost the same thing:
> http://code.google.com/p/python-libmemcached/source/browse/trunk/patches/fail_over.patch
>
> My test setup for this was 3 memcached services running on the same
> box on different ports. The client was using ketama,
> remove_failed_servers, failure_limit of 2, and retry_timeout of 10.
>
> Basically, if I shut down one of the memcached services, the client
> would always keep on trying to set to it and return
> MEMCACHED_SERVER_MARKED_DEAD. What I mean is that it didn't try
> reconnecting to it; it just kept trying to set the key on a server it
> thought was live but that was actually marked dead internally.
>
> I traced the problem a bit, and it turns out the retry timeout is
> never set in this case. Also, the code simply calls
> set_last_disconnected_host() repeatedly and never reaches the switch
> gate at the end of memcached_connect(), where it would try
> reconnecting.
>
> The attached patch solved both issues for me: now, if I turn off a
> server, the client drops it from the list for the retry timeout
> period, after first having returned MEMCACHED_SERVER_MARKED_DEAD once.
> It also cleanly reconnects after the timeout if I start the memcached
> service up again.
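
For readers following along, the eject-then-retry behavior described in the report can be sketched as a small state model. This is only an illustration under stated assumptions, not libmemcached's code and not the attached patch: the struct `server_state` and the functions `record_failure`, `is_ejected`, and `record_success` are hypothetical names, and the constants simply mirror the reporter's failure_limit of 2 and retry_timeout of 10.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-server bookkeeping; NOT libmemcached's actual struct. */
typedef struct {
    unsigned failure_count; /* consecutive failed operations */
    time_t next_retry;      /* earliest time a reconnect may be attempted */
} server_state;

/* Values taken from the reporter's test setup. */
enum { FAILURE_LIMIT = 2, RETRY_TIMEOUT = 10 };

/* Count a failure; once the limit is hit, (re)arm the retry timeout.
 * The report says the real code never set this timeout. */
void record_failure(server_state *s, time_t now)
{
    s->failure_count++;
    if (s->failure_count >= FAILURE_LIMIT) {
        s->next_retry = now + RETRY_TIMEOUT;
    }
}

/* A server is skipped only while its retry timeout is still pending. */
bool is_ejected(const server_state *s, time_t now)
{
    return s->failure_count >= FAILURE_LIMIT && now < s->next_retry;
}

/* A successful reconnect puts the server back into rotation. */
void record_success(server_state *s)
{
    s->failure_count = 0;
    s->next_retry = 0;
}
```

In this model, two failures at times 100 and 101 eject the server until time 111. From then on `is_ejected` returns false, so a reconnect can be attempted; a success clears the counters, while a further failure re-arms the timeout.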