libmemcached

Comment 11 for bug 928696

Brian Aker (brianaker) wrote on 2013-07-13: Re: [Bug 928696] Re: incorrect handling of server restart

Thanks, I will look at it.

On Jul 12, 2013, at 13:01, Don MacAskill <928696@bugs.launchpad.net> wrote:

> As noted, this is *not* fixed. We're testing nathanael-foy's fix right
> now, with promising results.
>
> See: https://github.com/onethumb/libmemcached/pull/1
>
> --
> You received this bug notification because you are a bug assignee.
> https://bugs.launchpad.net/bugs/928696
>
> Title:
> incorrect handling of server restart
>
> Status in libmemcached - A C and C++ client library for memcached:
> Fix Released
>
> Bug description:
> libmemcached version: 1.0.3, 1.0.4
>
> We have the following setup:
> - Backend: Membase/Couchbase with several SASL buckets.
> - libmemcached-based client: C++, multi-threaded, using memcached_pool, a single server record, the binary protocol with SASL authentication, and TCP sockets.
>
> The client dispatches every get/set/delete request to a separate thread
> (a thread pool is used). On receiving a task, a thread obtains a
> connection from the memcached_pool associated with the backend bucket
> specified in the request, makes the memcached_* calls needed to process
> the request, returns the connection to the memcached_pool, and waits for
> the next request.
>
> Everything works until the server is restarted.
> After the restart, the next request gets a CONNECTION_FAILURE result
> (which is expected). Then the problem appears: every subsequent
> memcached_set call issued less than memcached_st::retry_timeout after
> the previous one gets a WRITE_FAILURE result. Hours of debugging in gdb
> revealed that each such request "renews" the
> memcached_server_write_instance_st::next_retry field. Nothing goes wrong
> if requests arrive less often than once per memcached_st::retry_timeout:
> once memcached_server_write_instance_st::next_retry "expires", everything
> returns to normal.
>
> Some info on the failing requests:
> memcached_set() returns "WRITE FAILURE" (5)
> error stack(root): {
>     "(166484608) SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY, host: 192.168.65.3:11211 -> libmemcached/storage.cc:180",
>     "(166484608) SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY, host: 192.168.65.3:11211 -> libmemcached/connect.cc:614"
> }
> error stack(server-0): {
>     "(166484608) SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY, host: 192.168.65.3:11211 -> libmemcached/storage.cc:180",
>     "(166484608) SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY, host: 192.168.65.3:11211 -> libmemcached/connect.cc:614"
> }
>
>
> A naive attempt (made without a full understanding of the libmemcached
> code base) was to add the following check at the very beginning of
> void memcached_quit_server(memcached_server_st *ptr, bool io_death):
> --------------------------------------------
>   // Leave a server that is already waiting out its retry timeout untouched.
>   if (ptr->state == MEMCACHED_SERVER_STATE_IN_TIMEOUT)
>     return;
> --------------------------------------------
> This DID help with the described problem, but it also broke other
> functionality.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/libmemcached/+bug/928696/+subscriptions
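
The per-request dispatch pattern in the bug description (a worker thread borrows a handle from memcached_pool, makes its memcached_* calls, and hands the handle back) can be sketched in plain C++ as below. This is a hypothetical stand-in, not libmemcached code: ConnectionPool, Connection, Lease and dispatch are invented names, and pop()/push() only play the roles of libmemcachedutil's memcached_pool_pop()/memcached_pool_push(). No network I/O happens; incrementing a counter stands in for the memcached_* calls.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for one client handle (a memcached_st* in the report).
struct Connection {
  int requests_served = 0;
};

// Minimal blocking pool; pop()/push() play the roles of
// memcached_pool_pop(pool, true, &rc) and memcached_pool_push(pool, memc).
class ConnectionPool {
 public:
  explicit ConnectionPool(std::size_t size) : storage_(size) {
    for (Connection& c : storage_) free_.push_back(&c);
  }

  // Blocks until a handle is free.
  Connection* pop() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !free_.empty(); });
    Connection* c = free_.back();
    free_.pop_back();
    return c;
  }

  // Returns a handle and wakes one waiting thread.
  void push(Connection* c) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      free_.push_back(c);
    }
    cv_.notify_one();
  }

  int total_served() {
    std::lock_guard<std::mutex> lock(mu_);
    int total = 0;
    for (const Connection& c : storage_) total += c.requests_served;
    return total;
  }

 private:
  std::vector<Connection> storage_;  // owns the handles; never resized
  std::vector<Connection*> free_;    // handles not currently lent out
  std::mutex mu_;
  std::condition_variable cv_;
};

// Borrows a handle for one request and gives it back in the destructor,
// so early returns on error paths cannot leak it.
class Lease {
 public:
  explicit Lease(ConnectionPool& pool) : pool_(pool), conn_(pool.pop()) {}
  ~Lease() { pool_.push(conn_); }
  Lease(const Lease&) = delete;
  Lease& operator=(const Lease&) = delete;
  Connection* operator->() const { return conn_; }

 private:
  ConnectionPool& pool_;
  Connection* conn_;
};

// Each worker handles requests one at a time: borrow, "process", return.
int dispatch(std::size_t pool_size, int workers, int requests_per_worker) {
  assert(pool_size > 0);  // an empty pool would block every worker forever
  ConnectionPool pool(pool_size);
  std::vector<std::thread> threads;
  for (int w = 0; w < workers; ++w) {
    threads.emplace_back([&pool, requests_per_worker] {
      for (int i = 0; i < requests_per_worker; ++i) {
        Lease conn(pool);
        conn->requests_served++;  // placeholder for the memcached_* calls
      }
    });
  }
  for (std::thread& t : threads) t.join();
  return pool.total_served();
}
```

Returning the handle from a destructor means a request that bails out early on an error such as WRITE_FAILURE still gives its handle back, so the pool cannot slowly drain.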
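
The renewal effect described in the report can be reproduced with a small, self-contained model. Everything in it is an assumption made for illustration: ServerModel, quit_server, set_request and failures_after_restart are hypothetical names, time is an integer count of seconds, and the only behaviour modeled is the one the reporter observed (a request rejected while the server is in timeout passes through the quit path again, which pushes next_retry forward). It is not libmemcached's code and not nathanael-foy's fix; the guard flag mirrors only the early-return check quoted in the report.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model of one server record; the names echo
// the bug report, but this is not libmemcached's memcached_server_st.
enum class ServerState { Connected, InTimeout };

struct ServerModel {
  ServerState state = ServerState::Connected;
  long next_retry = 0;  // earliest time (seconds) a reconnect may be tried
};

// Models memcached_quit_server(ptr, io_death = true): mark the server failed
// and schedule the next retry. `guard` adds the early return from the report.
void quit_server(ServerModel& s, long now, long retry_timeout, bool guard) {
  if (guard && s.state == ServerState::InTimeout) return;
  s.state = ServerState::InTimeout;
  s.next_retry = now + retry_timeout;
}

// One memcached_set() against a server that is already back up after its
// restart. Returns true on success.
bool set_request(ServerModel& s, long now, long retry_timeout, bool guard) {
  if (s.state == ServerState::InTimeout) {
    if (now < s.next_retry) {
      // "SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY": the request
      // fails, and (as reported) the failure path tears the server down
      // again, which pushes next_retry forward unless guarded.
      quit_server(s, now, retry_timeout, guard);
      return false;
    }
    s.state = ServerState::Connected;  // retry window reached: reconnect works
  }
  return true;
}

// After a CONNECTION_FAILURE at t = 0, sends requests at t = interval,
// 2 * interval, ... up to `horizon` and returns how many of them fail.
int failures_after_restart(long retry_timeout, long interval, long horizon,
                           bool guard) {
  assert(interval > 0);  // a non-positive interval would never advance time
  ServerModel s;
  quit_server(s, 0, retry_timeout, guard);  // the CONNECTION_FAILURE at t = 0
  int failures = 0;
  for (long t = interval; t <= horizon; t += interval)
    if (!set_request(s, t, retry_timeout, guard)) ++failures;
  return failures;
}
```

With a retry timeout of 2 seconds, failures_after_restart(2, 1, 20, false) returns 20: every request fails, because each rejection moves next_retry one second past the next request. failures_after_restart(2, 3, 20, false) returns 0, matching the report's note that slower traffic recovers, and failures_after_restart(2, 1, 20, true) returns 1, because the guard leaves next_retry alone. The model says nothing about the other functionality the reporter found broken by the guard.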