connect timeout does not work (library ignores the error)

Bug #778777 reported by Andrew Skalski
This bug affects 2 people
Affects        Status         Importance   Assigned to   Milestone
libmemcached   Fix Released   Medium       Brian Aker

Bug Description

Connect timeouts are not working (libmemcached 0.49, CentOS 5 / Ubuntu 11.04). Here's strace output with connect-timeout and poll-timeout both set to 50 milliseconds:

socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(3, {sa_family=AF_INET, sin_port=htons(11211), sin_addr=inet_addr("192.168.1.222")}, 16) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=3, events=POLLOUT}], 1, 50) = 0 (Timeout)
sendto(3, "get test \r\n", 11, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x156a478, 8196, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLOUT}], 1, 50) = 0 (Timeout)
sendto(3, "get test \r\n", 11, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x156a478, 8196, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLOUT}], 1, 50) = 0 (Timeout)
sendto(3, "get test \r\n", 11, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(3, 0x156a478, 8196, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLOUT}], 1, 50) = 0 (Timeout)
... repeating forever ...

Also, I noticed in the strace that an extra space character " " is being added to the "get" command before the CRLF.

The attached code uses a hard-coded IP address of "192.168.1.222", which should be changed to a dead IP if it is not already.
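
For comparison, here is a minimal sketch of the behaviour the trace above lacks: once poll() times out on a socket whose connect() is still in progress, the error goes back to the caller instead of the code moving on to sendto(). This uses plain POSIX calls and is not libmemcached code; the helper names connect_with_timeout and close_and_fail are hypothetical, and restarting the full timeout after EINTR is a simplification.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Close fd, restore the error code and report failure. */
static int close_and_fail(int fd, int err)
{
    close(fd);
    errno = err;
    return -1;
}

/* Hypothetical helper (not part of libmemcached).
 * Returns a connected, non-blocking IPv4 TCP socket, or -1 with errno set.
 * errno is ETIMEDOUT if the connection did not complete within timeout_ms. */
int connect_with_timeout(const char *ip, unsigned short port, int timeout_ms)
{
    assert(ip != NULL);

    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1) {
        errno = EINVAL;                 /* not a dotted-quad IPv4 address */
        return -1;
    }

    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return close_and_fail(fd, errno);

    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == 0)
        return fd;                      /* connected immediately */
    if (errno != EINPROGRESS)
        return close_and_fail(fd, errno);

    /* Wait for the handshake to finish; retry only if a signal interrupts. */
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int ready;
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready == 0)                     /* the step missing in the trace */
        return close_and_fail(fd, ETIMEDOUT);
    if (ready < 0)
        return close_and_fail(fd, errno);

    /* POLLOUT alone does not mean success; read the pending socket error. */
    int err = 0;
    socklen_t errlen = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
        return close_and_fail(fd, errno);
    if (err != 0)
        return close_and_fail(fd, err);

    return fd;
}
```

Called as connect_with_timeout("192.168.1.222", 11211, 50) against a dead address, a helper like this would be expected to fail with ETIMEDOUT after roughly 50 ms rather than fall through to the send path.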

Andrew Skalski (askalski) wrote :

Attachment added: "timeout-test.c"

Brian Aker (brianaker) wrote : Re: [Bug 778777] Re: connect timeout does not work (library ignores the error)

The additional space is there to hide a particularly old and annoying bug in memcached (which has not existed for a while).

On May 6, 2011, at 11:45 AM, Andrew Skalski wrote:

> ** Attachment added: "timeout-test.c"
> https://bugs.launchpad.net/bugs/778777/+attachment/2117301/+files/timeout-test.c

Brian Aker (brianaker) wrote :

The timeout does fire, but the library catches the error and retries the operation instead of returning it to the caller.

Changed in libmemcached:
importance: Undecided → Medium
assignee: nobody → Brian Aker (brianaker)
status: New → In Progress
Brian Aker (brianaker)
Changed in libmemcached:
status: In Progress → Fix Committed
Brian Aker (brianaker)
Changed in libmemcached:
status: Fix Committed → Fix Released

Helder Martins (heldergaray) wrote :

Does anyone have a patch for this? I'm having this problem in a production release, and upgrading to a newer version is not an option.

Brian Aker (brianaker) wrote :

Have you looked at the most recent release?

Cheers,
 -Brian

On Aug 23, 2012, at 10:01 AM, Helder Martins <email address hidden> wrote:

> Anyone has a patch for this? I'm having this problem in a production
> release, and increasing the version is not an option.

Helder Martins (heldergaray) wrote :

Yes, I'm trying to find the exact commit that fixes this issue, but with no success so far. I'm using libmemcached-0.49 in a system that is already in production, and I don't think upgrading libmemcached is an option. I'm looking for a patch that fixes this and can be applied to 0.49. Judging from the date, I found revision 945 (http://bazaar.launchpad.net/~tangent-org/libmemcached/trunk/revision/945#libmemcached/connect.cc), and my guess is that the diff to connect.cc fixes this, but I'm not sure. I also imagine that this revision has several other branches merged into it. Could you point me to a branch/commit/diff that fixes this issue?
