This is weird. I was able to reproduce the problem; the proxy server accumulates a bunch of open filehandles for dead sockets.
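(If you want to watch the handles pile up without strace, counting socket fds under /proc works too. This helper is just my throwaway diagnostic, Linux-only, not anything Swift ships:)

```python
import os

def count_socket_fds(pid):
    # Count fds under /proc/<pid>/fd whose link target is a socket.
    # Run it periodically against the proxy server process; a count
    # that only ever grows under load matches the leak described here.
    fd_dir = "/proc/%d/fd" % pid
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd went away while we were looking
        if target.startswith("socket:"):
            count += 1
    return count
```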
I can see with strace that the sockets in question are used for client <--> proxy communication. We're not leaking connections to the storage backends. Also, it looks like someone is trying to clean them up, but is just bad at it. Check this out:
# we have a GET request from the client; the socket is fd 220
accept(4, {sa_family=AF_INET, sin_port=htons(38280), sin_addr=inet_addr("127.0.0.1")}, [16]) = 220
fcntl(220, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(220, F_SETFL, O_RDWR|O_NONBLOCK) = 0
fcntl(220, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(220, F_SETFL, O_RDWR|O_NONBLOCK) = 0
sendto(3, "<139>proxy-server: STDERR: (2834"..., 65, 0, NULL, 0) = 65
accept(4, 0x7fffa0e778c0, [16]) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(220, "GET /v1/AUTH_test/test/largefile"..., 65536, 0, NULL, NULL) = 160
# ... removed stuff talking to memcached + storage backends
# then we try to send something and learn that it's shut down, so SIGPIPE
sendto(220, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 0, NULL, 0) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=28347, si_uid=1000} ---
# stupidly, we try to send it again and get another SIGPIPE
sendto(220, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 0, NULL, 0) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=28347, si_uid=1000} ---
# attempted cleanup...
shutdown(220, SHUT_RDWR) = -1 ENOTCONN (Transport endpoint is not connected)
# and now we completely forget about fd 220 and it's never mentioned again. there's the leak.
poll([{fd=4, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 13) = 1 ([{fd=4, revents=POLLIN}])
# ...thousands more lines not mentioning fd 220
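(The pattern we'd want somewhere on this path is "shutdown() may fail, but close() must still run". Here's a sketch of that pattern; close_safely is my own helper, not Swift or eventlet code:)

```python
import errno
import socket

def close_safely(sock):
    # shutdown() can legitimately fail with ENOTCONN when the peer has
    # already torn the connection down -- exactly what the strace above
    # shows for fd 220. That failure must not skip the close() call
    # that actually releases the fd.
    try:
        sock.shutdown(socket.SHUT_RDWR)
    except OSError as e:
        if e.errno != errno.ENOTCONN:
            raise
    sock.close()
```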
What's weird here is that eventlet is doing this at least somewhat intentionally. Here's an annotated eventlet.wsgi.HttpProtocol.finish():
def finish(self):
    try:
        # this tries to flush any buffers; this is probably what
        # causes the second sendto()/SIGPIPE pair, but I have not
        # verified that.
        BaseHTTPServer.BaseHTTPRequestHandler.finish(self)
    except socket.error as e:
        # Broken pipe, connection reset by peer
        if support.get_errno(e) not in BROKEN_SOCK:
            raise

    # This is responsible for the shutdown call. It executes every time;
    # the try/except above doesn't let any exceptions out that would
    # exit this method early.
    greenio.shutdown_safe(self.connection)

    # Here's the fun part: this method call gets executed, but it
    # doesn't make any syscalls. Something is causing this to be a no-op,
    # but I don't know what it is.
    self.connection.close()
Looks like it might be a bug in eventlet. I'm not convinced this bug is fixable from within Swift.