Thanks, let me take a look at this. At first glance it looks good. I need to check whether there is any logical reason why address_info_next should not be reset like this.

On Sep 7, 2011, at 7:39 AM, David Matthew Bond wrote:

> The retry is failing in the libmemcached 0.51 release.
> The retry is also failing with revision 953 of the branch downloaded with bzr branch lp:libmemcached (built on SUSE Linux 11 SP1).
> The problem is reproducible.
> 
> Scenario:
> Single Memcached Server (running on localhost).
> Application gets a memcached connection (we use
> memcached_set : SUCCESS
>>> Stop the memcached server.
> memcached_set: UNKNOWN READ FAILURE
> memcached_set: CONNECTION FAILURE
> memcached_set: SERVER IS MARKED DEAD
> memcached_set: SERVER IS MARKED DEAD
> memcached_set: SERVER IS MARKED DEAD
>>> Next Retry is due.
> memcached_set: CONNECTION FAILURE
> memcached_set: SERVER IS MARKED DEAD
> memcached_set: SERVER IS MARKED DEAD
>>> memcached server is started.
> memcached_set: SERVER IS MARKED DEAD
>>> Next Retry is due.
> memcached_set: CONNECTION FAILURE
> memcached_set: SERVER IS MARKED DEAD
> memcached_set: SERVER IS MARKED DEAD
> etc.
> 
> Reason: in connect.cc, the network_connect method loops over all the
> address_info objects. However, ptr->address_info_next is always NULL
> after the reconnect attempt that produced the CONNECTION FAILURE
> error: that attempt iterated through all the available address_info
> objects and advanced address_info_next to NULL, and it never gets reset.
> 
>  /* Create the socket */
>  while (ptr->address_info_next && ptr->fd == INVALID_SOCKET)
> 
> To solve the problem, ptr->address_info_next needs to be reset to the first address_info:
>  ptr->address_info_next= ptr->address_info;
>  ptr->state= MEMCACHED_SERVER_STATE_ADDRINFO;
> 
> Then it works fine again.
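
For anyone following along, here is a minimal sketch of the cursor problem described above. It is a toy model, not the libmemcached code: the struct names, try_connect(), network_connect_sketch(), and the reset_cursor flag are all made up for illustration, and -1 stands in for INVALID_SOCKET. It only shows why a cursor that has been walked to NULL makes every later attempt skip the loop unless it is rewound to the head of the list.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the library's types; NOT the real libmemcached structs. */
struct addr {
    const char *name;
    struct addr *next;
};

struct server {
    struct addr *address_info;      /* head of the resolved address list */
    struct addr *address_info_next; /* cursor consumed by the connect loop */
    int fd;                         /* -1 plays the role of INVALID_SOCKET */
};

/* Pretend connect(): yields a fake descriptor only while the server is "up". */
static int try_connect(const struct addr *a, int server_up) {
    (void)a;
    return server_up ? 3 : -1;
}

/*
 * Same shape as the quoted loop: walk the cursor until a socket opens.
 * reset_cursor models the proposed fix (rewind to the first address).
 * Returns 1 if a connection was made, 0 otherwise.
 */
static int network_connect_sketch(struct server *s, int server_up, int reset_cursor) {
    if (reset_cursor) {
        s->address_info_next = s->address_info;
    }
    while (s->address_info_next && s->fd == -1) {
        s->fd = try_connect(s->address_info_next, server_up);
        s->address_info_next = s->address_info_next->next;
    }
    return s->fd != -1;
}
```

In this model, a first call with the server down exhausts the list and leaves the cursor at NULL. A second call with the server back up but without the reset never enters the loop and still fails; the same call with the reset connects on the first address.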
> 
> Attached is a diff as a suggested patch. However, I only first looked at the libmemcached code yesterday, so someone with more experience may know a much better way to fix this.
> Also attached are two test programs that reproduce this: one using a memcached pool and one without.
> 
> Matt
> 
> 
> ** Patch added: "libmemcached_Branched953revision.diff"
>   https://bugs.launchpad.net/libmemcached/+bug/777672/+attachment/2367449/+files/libmemcached_Branched953revision.diff
> 
> -- 
> You received this bug notification because you are a bug assignee.
> https://bugs.launchpad.net/bugs/777672
> 
> Title:
>  libmemcached autoeject/conn retry problem
> 
> Status in libmemcached - A C and C++ client library for memcached:
>  Fix Committed
> 
> Bug description:
>  libmemcached's auto eject + retry support seems broken at the moment.
>  I found other reports of the same issue:
>  https://github.com/lericson/pylibmc/issues/32 and a patch from
>  someone that does almost the same thing:
>  http://code.google.com/p/python-libmemcached/source/browse/trunk/patches/fail_over.patch
> 
>  My test setup for this was 3 memcached services running on the same box on different ports. The client was using ketama,
>  remove_failed_servers, a failure_limit of 2, and a retry_timeout of 10.
> 
>  Basically, if I shut down one of the memcached services, the client
>  would always keep on trying to set to it and return
>  MEMCACHED_SERVER_MARKED_DEAD. What I mean is that it didn't try
>  reconnecting to it; it just tried setting the key to a server it
>  thought was live but was actually marked dead internally.
> 
>  I traced the problem a bit, and it turns out the retry timeout in this
>  case is never set. Also, the code simply calls
>  set_last_disconnected_host() repeatedly and never gets to the switch
>  at the end of memcached_connect(), where it would try reconnecting.
> 
>  Attached patch got both issues solved for me and now in case I turn
>  off a server it drops it from the list for the retry timeout wait
>  period after first having returned MEMCACHED_SERVER_MARKED_DEAD once.
>  It also cleanly reconnects after the timeout if I start the memcached
>  service up again.