Comment 11 for bug 777672

Brian Aker (brianaker) wrote : Re: [Bug 777672] libmemcached autoeject/conn retry problem

Hi!

The big problem with fixing this is the following:

1) We don't have a defined behavior. What should the behavior be, and how should it fail?

2) We need some test cases to double check the behavior.

If we have the above, we can define the solution. That has always been the problem with auto-eject: the original patch from Twitter was not complete.
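As a rough illustration of what "a defined behavior plus test cases" could look like, here is a small Python model. It is not libmemcached code, and `Server`, `Result` and `try_server` are hypothetical names. It encodes the behavior the reporter describes in the quoted bug, using the reporter's failure_limit of 2 and retry_timeout of 10. A dead server returns MEMCACHED_SERVER_MARKED_DEAD once, is skipped for the retry window, and is reconnected when the window ends if it is back up. Time is passed in explicitly so the checks are deterministic.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Result(Enum):
    # Hypothetical outcomes; MARKED_DEAD stands in for MEMCACHED_SERVER_MARKED_DEAD.
    SUCCESS = auto()
    CONNECTION_FAILURE = auto()
    MARKED_DEAD = auto()
    SKIPPED = auto()  # server is ejected; the caller should pick another server


@dataclass
class Server:
    up: bool = True          # is the memcached process actually running?
    failures: int = 0        # consecutive failed connection attempts
    ejected: bool = False    # currently removed from the server list?
    next_retry: float = 0.0  # time at which an ejected server may be retried


def try_server(server: Server, now: float, failure_limit: int = 2,
               retry_timeout: float = 10.0) -> Result:
    """Model one request attempt against `server` at time `now` (seconds)."""
    if server.ejected:
        if now < server.next_retry:
            return Result.SKIPPED       # still inside the retry window
        server.ejected = False          # window elapsed: allow a reconnect attempt
    if server.up:
        server.failures = 0             # a successful connect resets the count
        return Result.SUCCESS
    server.failures += 1
    if server.failures >= failure_limit:
        server.ejected = True
        # Per the report, the equivalent retry timestamp was never set.
        server.next_retry = now + retry_timeout
        return Result.MARKED_DEAD       # reported once, then the server is skipped
    return Result.CONNECTION_FAILURE


# Test cases for the scenario in the bug report: shut a server down, then restart it.
s = Server()
assert try_server(s, 0) is Result.SUCCESS
s.up = False                                      # shut the memcached service down
assert try_server(s, 1) is Result.CONNECTION_FAILURE
assert try_server(s, 2) is Result.MARKED_DEAD     # ejected until t = 12
assert try_server(s, 5) is Result.SKIPPED
s.up = True                                       # start the service up again
assert try_server(s, 11) is Result.SKIPPED        # window not over yet
assert try_server(s, 12) is Result.SUCCESS        # cleanly reconnects
```

One design choice here is an assumption, not something the thread settles: the failure count is not reset when the window expires. A server that is still down at retry time is therefore re-ejected after a single failed attempt instead of needing failure_limit fresh failures.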

Cheers,
 -Brian

On Jul 30, 2011, at 9:45 AM, Matt Reiferson wrote:

> Hi,
>
> Just wanted to throw my hat into the ring here and see if I can lend a
> hand in helping test or shepherd this patch through into a release.
>
> We're seeing similar issues around the use of this setting.
>
> Thanks,
>
> Matt
>
> --
> https://bugs.launchpad.net/bugs/777672
>
> Title:
> libmemcached autoeject/conn retry problem
>
> Status in libmemcached - A C and C++ client library for memcached:
> New
>
> Bug description:
> libmemcached's auto eject + retry support seems broken at the moment.
> I found other reports of the same issue:
> https://github.com/lericson/pylibmc/issues/32, and a patch from
> someone that does almost the same thing:
> http://code.google.com/p/python-libmemcached/source/browse/trunk/patches/fail_over.patch.
>
> My test setup was three memcached services running on the same box on
> different ports. The client was using ketama, remove_failed_servers, a
> failure_limit of 2, and a retry_timeout of 10.
>
> Basically, if I shut down one of the memcached services, the client
> would keep trying to set keys on it and keep returning
> MEMCACHED_SERVER_MARKED_DEAD. In other words, it didn't try to
> reconnect; it just kept setting keys on a server it treated as live
> even though that server was marked dead internally.
>
> I traced the problem a bit, and it turns out the retry timeout is
> never set in this case. Also, the code repeatedly just calls
> set_last_disconnected_host() and never reaches the switch at the end
> of memcached_connect() that would make it try to reconnect.
>
> The attached patch solves both issues for me. Now, if I turn off a
> server, the client returns MEMCACHED_SERVER_MARKED_DEAD once and then
> drops the server from the list for the retry_timeout period. It also
> cleanly reconnects after the timeout if I start the memcached service
> up again.