incorrect handling of server restart
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
libmemcached | Fix Released | Undecided | Brian Aker |
Bug Description
libmemcached version: 1.0.3, 1.0.4
We have the following setup:
- backend (Membase/Couchbase: several SASL buckets)
- libmemcached-based client (C++, multi-threaded, uses memcached_pool, single server record, binary protocol with SASL authentication, TCP sockets)
The client dispatches every single get/set/del request to a distinct thread (a thread pool is used). Upon receiving a task, a thread obtains a connection from the memcached_pool tied to the backend bucket specified in the request, performs the memcached_* calls needed to process the request, returns the connection to the memcached_pool, and waits for the next request.
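For illustration, a minimal sketch of that per-request pattern against the libmemcached pooling API; the function name handle_set_request and the key/value arguments are hypothetical, while the memcached_pool_* and memcached_set calls are the documented 1.0.x API:
-------
#include <string.h>
#include <stdio.h>
#include <libmemcached/memcached.h>
#include <libmemcached/memcached_pool.h>

/* Executed by a worker thread for one request; "bucket_pool" is the
   memcached_pool_st bound to the bucket named in the request. */
static void handle_set_request(memcached_pool_st *bucket_pool,
                               const char *key,
                               const char *value, size_t value_len)
{
  memcached_return_t rc;

  /* Block until a connection is available in the pool. */
  memcached_st *memc= memcached_pool_pop(bucket_pool, true, &rc);
  if (memc == NULL)
  {
    fprintf(stderr, "pool pop failed: %s\n", memcached_strerror(NULL, rc));
    return;
  }

  rc= memcached_set(memc, key, strlen(key), value, value_len,
                    (time_t)0, (uint32_t)0);
  if (rc != MEMCACHED_SUCCESS)
  {
    fprintf(stderr, "set failed: %s\n", memcached_last_error_message(memc));
  }

  /* Return the connection to the pool for the next request. */
  memcached_pool_push(bucket_pool, memc);
}
-------
The pool itself would be created once per bucket, e.g. from a template memcached_st that already has the binary protocol enabled and memcached_set_sasl_auth_data() configured.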
Everything works fine until the server is restarted.
After the server restarts, the next request gets a CONNECTION_FAILURE result (which is OK). And then comes the problem: all following memcached_set calls (performed with an in-between time interval shorter than the retry timeout, i.e. while the server is still marked as failed) fail as well.
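For reference, the window during which a failed server is skipped is governed by these behaviors (the values below are only examples; the report does not state which were used):
-------
/* Illustrative values. RETRY_TIMEOUT is the number of seconds a failed
   server stays disabled; SERVER_FAILURE_LIMIT is how many consecutive
   failures mark it as failed in the first place. */
memcached_behavior_set(memc, MEMCACHED_BEHAVIOR_RETRY_TIMEOUT, 2);
memcached_behavior_set(memc, MEMCACHED_BEHAVIOR_SERVER_FAILURE_LIMIT, 5);
-------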
Some info on the failing requests:
memcached_set() returns "WRITE FAILURE" (5)
error stack(root): {
"(166484608) SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY, host: 192.168.65.3:11211 -> libmemcached/
"(166484608) SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY, host: 192.168.65.3:11211 -> libmemcached/
}
error stack(server-0): {
"(166484608) SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY, host: 192.168.65.3:11211 -> libmemcached/
"(166484608) SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY, host: 192.168.65.3:11211 -> libmemcached/
}
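The strings above are what the public error API reports per connection; a sketch of how they can be pulled out (memc, key, value, and value_len are whatever the request handler has in scope):
-------
memcached_return_t rc= memcached_set(memc, key, strlen(key),
                                     value, value_len, 0, 0);
if (memcached_failed(rc))
{
  /* rc is MEMCACHED_WRITE_FAILURE (5); the stored last error carries the
     "SERVER HAS FAILED AND IS DISABLED UNTIL TIMED RETRY" detail. */
  fprintf(stderr, "%s: %s\n",
          memcached_strerror(memc, rc),
          memcached_last_error_message(memc));
}
-------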
We made a naive (in terms of overall libmemcached code awareness) attempt to add the following check at the very beginning of <void memcached_...>:
-------
if (ptr->state == MEMCACHED_...)
  return;
-------
This DID help with the described problem, but at the same time it broke other functionality.
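For what it's worth, a guess at the full form of that check; MEMCACHED_SERVER_STATE_IN_TIMEOUT is our assumption for the truncated enum value (it is the server state that matches the "DISABLED UNTIL TIMED RETRY" error), and the enclosing function is likewise not confirmed by the report:
-------
/* ASSUMPTION: reconstructed guess, not the report's exact patch.
   Bail out early while the server is sitting in its timed-retry window. */
if (ptr->state == MEMCACHED_SERVER_STATE_IN_TIMEOUT)
{
  return;
}
-------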
Changed in libmemcached:
status: Incomplete → Fix Released
Thanks for using the phrase "code awareness"; it greatly increases the odds that one of us will spend more time on this :)
Let me see if we can simulate this behavior in a test case.
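A sketch of what such a reproduction could look like (our construction, not the project's actual test case; it assumes a local server on 127.0.0.1:11211 that you restart by hand, and it skips the SASL setup of the original report for brevity):
-------
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <libmemcached/memcached.h>

int main(void)
{
  const char *config= "--SERVER=127.0.0.1:11211 --BINARY-PROTOCOL";
  memcached_st *memc= memcached(config, strlen(config));
  if (memc == NULL)
    return 1;

  /* Keep the retry window longer than our probe interval so that, with the
     bug present, every probe lands inside "DISABLED UNTIL TIMED RETRY". */
  memcached_behavior_set(memc, MEMCACHED_BEHAVIOR_RETRY_TIMEOUT, 10);

  memcached_return_t rc= memcached_set(memc, "k", 1, "v", 1, 0, 0);
  printf("before restart: %s\n", memcached_strerror(memc, rc));

  puts("restart the memcached server now, then press Enter");
  getchar();

  /* The first probe should report CONNECTION_FAILURE; with the bug, the
     following probes keep reporting WRITE FAILURE instead of recovering. */
  for (int i= 0; i < 15; i++)
  {
    rc= memcached_set(memc, "k", 1, "v", 1, 0, 0);
    printf("attempt %2d: %s (%s)\n", i,
           memcached_strerror(memc, rc),
           memcached_last_error_message(memc));
    sleep(1);
  }

  memcached_free(memc);
  return 0;
}
-------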