[patch included] libmemcached exhibits odd behaviour on TCP connection close

Bug #1270651 reported by Rob Day on 2014-01-19
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
libmemcached
Medium
Brian Aker

Bug Description

I created a libmemcached client connection to a memcached server over TCP. This connection was long-lived and infrequently used. At some point, my memcached server was restarted, which caused it to close the libmemcached connection with a TCP FIN, which was ACKed.

A short while later (when the memcached server was back up and running), I tried to send a request over the libmemcached client connection. libmemcached tried to use the previous connection that had been closed with a FIN, which resulted in a TCP RST from the other end. This failure caused libmemcached to set a backoff timeout on the connection, so that even though the TCP FIN had arrived several minutes ago, the memcached server was running again and the reconnection attempt succeeded, libmemcached wouldn't let me make any more requests on that connection for a short while.

I also noticed that the behaviour varied depending on what type of request I was making - if I tried to store data in memcached, the error code was "MEMCACHED_CONNECTION_FAILED" (which was useful as I could simply notice that and retry the request) but when retrieving data from memcached the error code was "MEMCACHED_NOT_FOUND" which was much less useful.

The attached patch (against the v1.0 trunk, though I was using libmemcached 1.0.10 when I saw this behaviour) fixes both these issues:
* for TCP connections, an failure to send data will never set a backoff timeout on the server - only a failure to connect will do that. This protects against the case where a remote server drops the connection, but this is not noticed for a short while, during which the server recovers. The request to send data will still fail, but if the request is retried, libmemcached will successfully re-establish the connection and send the data rather than rejecting the request due to backoff logic. (The behaviour for UDP is unchanged.)

* for get/gets requests, if there is no MEMCACHED_SUCCESS result and one or more servers returned a MEMCACHED_CONNECTION_FAILED error, the return code will be MEMCACHED_CONNECTION_FAILED rather than MEMCACHED_NOT_FOUND. This lets the client make informed decisions about whether to retry.

I'm afraid my editor has stripped trailing spaces in a couple of places - hopefully that's not too confusing.

I have permission from my employer to contribute this back under libmemcached's 3-clause BSD license.

Rob Day (rkd-u) wrote :
Rob Day (rkd-u) wrote :

https://bugs.launchpad.net/libmemcached/+bug/1251482 seems to be fixing a similar issue, but I think my fix has the advantage that if a server is genuinely down (e.g. the attempt to reconnect over TCP fails), backoff logic will still kick in and protect it, rather than being bypassed entirely.

Download full text (3.2 KiB)

Investigating

On Jan 19, 2014, at 3:14 PM, Rob Day <email address hidden> wrote:

> https://bugs.launchpad.net/libmemcached/+bug/1251482 seems to be fixing
> a similar issue, but I think my fix has the advantage that if a server
> is genuinely down (e.g. the attempt to reconnect over TCP fails),
> backoff logic will still kick in and protect it, rather than being
> bypassed entirely.
>
> --
> You received this bug notification because you are subscribed to
> libmemcached.
> https://bugs.launchpad.net/bugs/1270651
>
> Title:
> [patch included] libmemcached exhibits odd behaviour on TCP connection
> close
>
> Status in libmemcached - A C and C++ client library for memcached:
> New
>
> Bug description:
> I created a libmemcached client connection to a memcached server over
> TCP. This connection was long-lived and infrequently used. At some
> point, my memcached server was restarted, which caused it to close the
> libmemcached connection with a TCP FIN, which was ACKed.
>
> A short while later (when the memcached server was back up and
> running), I tried to send a request over the libmemcached client
> connection. libmemcached tried to use the previous connection that had
> been closed with a FIN, which resulted in a TCP RST from the other
> end. This failure caused libmemcached to set a backoff timeout on the
> connection, so that even though the TCP FIN had arrived several
> minutes ago, the memcached server was running again and the
> reconnection attempt succeeded, libmemcached wouldn't let me make any
> more requests on that connection for a short while.
>
> I also noticed that the behaviour varied depending on what type of
> request I was making - if I tried to store data in memcached, the
> error code was "MEMCACHED_CONNECTION_FAILED" (which was useful as I
> could simply notice that and retry the request) but when retrieving
> data from memcached the error code was "MEMCACHED_NOT_FOUND" which was
> much less useful.
>
> The attached patch (against the v1.0 trunk, though I was using libmemcached 1.0.10 when I saw this behaviour) fixes both these issues:
> * for TCP connections, an failure to send data will never set a backoff timeout on the server - only a failure to connect will do that. This protects against the case where a remote server drops the connection, but this is not noticed for a short while, during which the server recovers. The request to send data will still fail, but if the request is retried, libmemcached will successfully re-establish the connection and send the data rather than rejecting the request due to backoff logic. (The behaviour for UDP is unchanged.)
>
> * for get/gets requests, if there is no MEMCACHED_SUCCESS result and
> one or more servers returned a MEMCACHED_CONNECTION_FAILED error, the
> return code will be MEMCACHED_CONNECTION_FAILED rather than
> MEMCACHED_NOT_FOUND. This lets the client make informed decisions
> about whether to retry.
>
> I'm afraid my editor has stripped trailing spaces in a couple of
> places - hopefully that's not too confusing.
>
> I have permission from my employer to contribute this back under
> libmemcached's 3-clause BSD license.
>
> To ...

Read more...

Brian Aker (brianaker) on 2014-02-01
Changed in libmemcached:
importance: Undecided → Medium
assignee: nobody → Brian Aker (brianaker)
status: New → In Progress
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers