Connect syscall hangs in non-blocking mode
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
libmemcached | Fix Released | Undecided | Unassigned |
Bug Description
In 0.44, I can confirm that if the connection is set to non-blocking, the connect() call will hang. This seems to be related to the fix that was introduced to solve this bug:
https:/
It seems that the patch fixes blocking mode but breaks non-blocking mode.
The relevant lines in connect.c
23 if (ptr->root-
24 timeout= -1;
If non-blocking is true, the timeout is set to -1 (no timeout), so the poll() call after connect() blocks until the socket descriptor used in connect() generates an event. This snippet of the strace output captures that moment:
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
setsockopt(3, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(3, {sa_family=AF_INET, sin_port=
poll([{fd=3, events=POLLOUT}], 1, -1
Please correct me if I am wrong, but we should be able to set a timeout for the connect() call in non-blocking mode as well.
As a possible fix, I wonder if it would make sense to apply connect_timeout in both blocking and non-blocking mode?
Thanks
Changed in libmemcached: status: New → Fix Committed
Changed in libmemcached: status: Fix Committed → Fix Released
Related bug: https://bugs.launchpad.net/libmemcached/+bug/583031
There are two places in the library where this code appears:
1) in connect.c:23-24 (connect_poll()), where it tries to connect to the remote socket;
2) in io.c:60-61 (io_wait()), where it waits for a blocked socket to become ready while transmitting data.
I'd like to point out that the library _always_ sets O_NONBLOCK on the socket, regardless of the behavior set by the user. The NO_BLOCK behavior only sets a linger structure on the socket for graceful shutdown. With this in mind, the aforementioned code fragments are there to *simulate* blocking behavior by setting infinite poll timeouts on socket operations in non-blocking mode.
But in the current version (0.44), those two code fragments actually do _opposite_ things: the connect() part sets an infinite poll timeout for non-blocking mode, while the transmitting part sets an infinite poll timeout for blocking mode, which is weird.
To get the intended behavior, as I see it, both code fragments should set infinite timeouts for blocking mode, to simulate completely blocking socket calls on a non-blocking socket setup. But this approach is also flawed, because in this 'pseudo-blocking' mode a call could then wait indefinitely.
With all that considered, I recommend removing blocking mode completely and using reasonably large user-set timeouts to achieve the same results.
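One way to picture this recommendation is a single helper that both the connect path and the I/O wait path call to obtain their poll() timeout, so the two sites cannot disagree and never pass -1. Everything in this sketch is hypothetical: struct wait_settings, its fields, enum wait_kind and the 5000 ms fallback are assumptions for illustration, not libmemcached names or defaults.

```c
/* Hypothetical settings; the field names are illustrative only. */
struct wait_settings {
    int connect_timeout_ms;   /* user-set limit for connect() */
    int io_timeout_ms;        /* user-set limit for send/receive waits */
};

enum wait_kind { WAIT_CONNECT, WAIT_IO };

/* Assumed fallback used when the user supplied no positive timeout. */
#define FALLBACK_TIMEOUT_MS 5000

/* Return the timeout to hand to poll(). The result is always a positive,
 * finite number of milliseconds, so no call site can wait forever. */
static int poll_timeout_ms(const struct wait_settings *s, enum wait_kind kind)
{
    int t = (kind == WAIT_CONNECT) ? s->connect_timeout_ms
                                   : s->io_timeout_ms;
    return (t > 0) ? t : FALLBACK_TIMEOUT_MS;
}
```

A user who wants near-blocking behavior would simply configure a large value, and a negative or zero setting falls back to the assumed default rather than turning into an infinite wait.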