[patch included] libmemcached exhibits odd behaviour on TCP connection close
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
libmemcached | In Progress | Medium | Brian Aker | | |
Bug Description
I created a libmemcached client connection to a memcached server over TCP. This connection was long-lived and infrequently used. At some point, my memcached server was restarted, which caused it to close the libmemcached connection with a TCP FIN, which was ACKed.
A short while later (when the memcached server was back up and running), I tried to send a request over the libmemcached client connection. libmemcached tried to use the previous connection that had been closed with a FIN, which resulted in a TCP RST from the other end. This failure caused libmemcached to set a backoff timeout on the connection. As a result, even though the TCP FIN had arrived several minutes earlier, the memcached server was running again, and the reconnection attempt succeeded, libmemcached wouldn't let me make any more requests on that connection for a short while.
I also noticed that the behaviour varied depending on what type of request I was making - if I tried to store data in memcached, the error code was "MEMCACHED_
The attached patch (against the v1.0 trunk, though I was using libmemcached 1.0.10 when I saw this behaviour) fixes both these issues:
* for TCP connections, a failure to send data will never set a backoff timeout on the server - only a failure to connect will do that. This protects against the case where a remote server drops the connection, but this is not noticed for a short while, during which the server recovers. The request to send data will still fail, but if the request is retried, libmemcached will successfully re-establish the connection and send the data rather than rejecting the request due to backoff logic. (The behaviour for UDP is unchanged.)
* for get/gets requests, if there is no MEMCACHED_SUCCESS result and one or more servers returned a MEMCACHED_
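To make the first point concrete, here is a minimal sketch of the backoff policy described above. It is a standalone model, not code from the patch or from libmemcached: `server_model`, `note_failure` and `may_attempt` are hypothetical names invented for illustration, and the assumption that a UDP send failure still arms the backoff timer is my reading of "unchanged".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical model of one server entry; NOT libmemcached's real structures. */
typedef enum { TRANSPORT_TCP, TRANSPORT_UDP } transport_t;
typedef enum { FAIL_CONNECT, FAIL_SEND } failure_t;

typedef struct {
    transport_t transport;
    bool connected;
    time_t next_retry;    /* 0 means no backoff is in force */
    time_t retry_timeout; /* backoff length in seconds */
} server_model;

/* Record a failure. Returns true if a backoff timeout was armed. */
static bool note_failure(server_model *s, failure_t kind, time_t now)
{
    assert(s != NULL);
    s->connected = false; /* the socket is unusable either way */

    /* TCP send failure: probably a stale socket (peer sent FIN earlier).
     * Do not arm backoff, so the next request can simply reconnect. */
    if (s->transport == TRANSPORT_TCP && kind == FAIL_SEND) {
        return false;
    }

    /* Connect failure, or any UDP failure (assumed unchanged): back off. */
    s->next_retry = now + s->retry_timeout;
    return true;
}

/* May a request try this server (reconnecting if needed) at time `now`? */
static bool may_attempt(const server_model *s, time_t now)
{
    assert(s != NULL);
    return s->next_retry == 0 || now >= s->next_retry;
}
```

In this model a TCP send failure leaves `may_attempt` returning true, so a retried request reconnects immediately, while a failed connect still blocks further attempts until `retry_timeout` seconds have passed.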
I'm afraid my editor has stripped trailing spaces in a couple of places - hopefully that's not too confusing.
I have permission from my employer to contribute this back under libmemcached's 3-clause BSD license.
Changed in libmemcached: | |
importance: | Undecided → Medium |
assignee: | nobody → Brian Aker (brianaker) |
status: | New → In Progress |
https://bugs.launchpad.net/libmemcached/+bug/1251482 seems to be fixing a similar issue, but I think my fix has the advantage that if a server is genuinely down (e.g. the attempt to reconnect over TCP fails), backoff logic will still kick in and protect it, rather than being bypassed entirely.