Connection between client and proxy service does not close
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Fix Released | High | drax |
Bug Description
In an object GET operation, if the object service crashes, the client connection is not closed at the eventlet layer. Clients that have not set a timeout for reading chunks will hang indefinitely.
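On the client side, the indefinite hang can be avoided with a per-chunk read timeout. A minimal sketch of that idea using a raw socket (the function name and parameters are illustrative, not part of Swift or this bug report):

```python
import socket

def read_chunks(sock, chunk_timeout=30.0, chunk_size=65536):
    """Yield response chunks, waiting at most chunk_timeout seconds for
    each one: if the proxy goes silent (for example because the backend
    died and the connection was never closed), socket.timeout is raised
    instead of the read hanging forever."""
    sock.settimeout(chunk_timeout)
    while True:
        chunk = sock.recv(chunk_size)  # raises socket.timeout when silent
        if not chunk:                  # b'' means the server closed cleanly
            return
        yield chunk
```

A client would connect, send its GET request, and then iterate over `read_chunks(...)` instead of calling `recv` with no timeout.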
The iterator below is used to send data to the client:
================
def _make_app_iter(self, req, source, node):
    """
    Returns an iterator over the contents of the source (via its read
    func). There is also quite a bit of cleanup to ensure garbage
    collection works and the underlying socket of the source is closed.

    :param req: incoming request object
    :param source: The httplib.Response object this iterator should read
    :param node: The node the source is reading from, for logging purposes.
    """
    # NOTE: this is not the complete code, only the area that can create
    # the issue.
    # NOTE: if the object service sends an empty chunk, this iterator
    # finishes without any exception and exits gracefully, even though
    # the complete body of the object was not read.
    if not chunk:
================
Please note that if the object service crashes while transferring the body of the object, this iterator will not raise any exception; it simply finishes, and the eventlet.wsgi layer above will not close the connection.
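This failure mode can be reproduced in isolation: a generator that treats an empty chunk as end-of-stream terminates without any exception, so its consumer cannot tell a truncated body from a complete one. A minimal sketch (the names are illustrative, not Swift's actual code):

```python
def app_iter(source_chunks):
    """Mimics the iterator's failure mode: yields whatever the backend
    produced and exits cleanly on the first empty chunk, even when the
    backend died mid-body and produced fewer bytes than Content-Length."""
    for chunk in source_chunks:
        if not chunk:   # a backend crash looks just like a clean EOF here
            return
        yield chunk

content_length = 10
# Backend "crashed" after 4 bytes: the iterator still finishes cleanly,
# leaving no exception for the layers above to react to.
body = b"".join(app_iter([b"ab", b"cd", b""]))
assert len(body) < content_length
```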
Below is the function that is called in a loop by the HTTP server; the function is overridden in eventlet.
================
def handle_one_request(self):
    if self.server.
    # NOTE: the client is still expecting chunks from the proxy service,
    # but as the object service has crashed and the iterator finished
    # cleanly, the proxy cannot send any more data. The readline below
    # will now hang indefinitely.
    try:
        self.raw_requestline = self.rfile.readline()
        if not self.raw_requestline:
            return
================
In a normal object GET operation, once the client has received all the data from the proxy service, it closes the connection from its end and the "readline" call returns an empty string immediately. Because raw_requestline is empty, close_connection is set to true and the request loop finishes at the BaseRequestHandler level:
================
class HTTPServer(...):

    def handle(self):
        """Handle multiple requests if necessary."""
        while not self.close_connection:
            self.handle_one_request()

class BaseRequestHandler:

    def __init__(self, request, client_address, server):
        self.server = server
        try:
================
This issue occurs whenever the iterator finishes gracefully without checking that the body transfer is complete.
I have the fix below to resolve this issue in the "_make_app_iter" function.
================
  def _make_app_iter(self, req, source, node):
+     content_length = source.length or 0
      ...
          if not chunk:
+             if (content_length - bytes_read_
+                 self.app.
+                 raise
      ...
      except GeneratorExit:
          if not req.environ.
+             if (content_length - bytes_read_
+                 self.app.
+                 raise
================
Whenever the generator exits, it checks whether the requested content length was fully transferred; if it was not, an exception is raised.
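The idea behind the fix can be expressed as a stand-alone wrapper. This is a sketch of the approach, not the actual Swift patch; `content_length` and `bytes_read` are illustrative names:

```python
def guarded_app_iter(source_chunks, content_length):
    """Yield chunks while counting bytes sent; if the source ends (or
    the generator is closed) before content_length bytes have been
    transferred, raise instead of finishing cleanly, so the layer above
    closes the client connection instead of reusing it."""
    bytes_read = 0
    try:
        for chunk in source_chunks:
            if not chunk:            # backend EOF (possibly a crash)
                break
            bytes_read += len(chunk)
            yield chunk
        if content_length - bytes_read > 0:
            raise Exception("truncated body: sent %d of %d bytes"
                            % (bytes_read, content_length))
    except GeneratorExit:
        if content_length - bytes_read > 0:
            raise                    # propagate, connection must not be reused

# Complete transfer: iterates normally.
assert b"".join(guarded_app_iter([b"ab", b"cd"], 4)) == b"abcd"

# Truncated transfer: raises instead of exiting gracefully.
try:
    b"".join(guarded_app_iter([b"ab", b""], 4))
    raised = False
except Exception:
    raised = True
assert raised
```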
description: updated
description: updated
Changed in swift:
  assignee: nobody → drax (devesh-gupta)
Changed in swift:
  status: In Progress → Confirmed
Ok, I'd say there's something going on here - it's probably terrible.
First I uploaded a 100MB object into a replicated storage policy:
$ swift stat test !$
swift stat test big.test
       Account: AUTH_test
     Container: test
        Object: big.test
  Content Type: application/octet-stream
Content Length: 104857600
 Last Modified: Tue, 12 Apr 2016 18:28:47 GMT
          ETag: 2f282b84e7e608d5852449ed940bfc51
    Meta Mtime: 1460485719.000000
 Accept-Ranges: bytes
   X-Timestamp: 1460485726.32074
    X-Trans-Id: tx155cddfa10d443f6a4018-00570d3e7a
Then I started to download it - *slowly* [1]
Then I sang that song about dem bones while looking at my script connected to the proxy and the proxy connected to the object server:
$ sudo netstat -pt
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 localhost:45703 localhost:44453 ESTABLISHED 2265/python
tcp 0 0 localhost:59865 localhost:http-alt TIME_WAIT -
tcp 6117196 0 localhost:51880 localhost:6040 ESTABLISHED 2524/python
tcp 0 0 localhost:59077 localhost:43993 ESTABLISHED 2264/python
tcp 0 0 saio:ssh 10.0.2.2:50402 ESTABLISHED 2420/sshd: vagrant
tcp 0 0 saio:ssh 10.0.2.2:50398 ESTABLISHED 2273/sshd: vagrant
tcp 0 0 localhost:39449 localhost:6022 TIME_WAIT -
tcp 0 3971935 localhost:http-alt localhost:59867 ESTABLISHED 2524/python
tcp 0 0 localhost:43993 localhost:59077 ESTABLISHED 2264/python
tcp 0 0 localhost:56920 localhost:46044 ESTABLISHED 2263/python
tcp 3172362 0 localhost:59867 localhost:http-alt ESTABLISHED 2553/python
tcp 0 0 localhost:44453 localhost:45703 ESTABLISHED 2265/python
tcp 0 0 localhost:46044 localhost:56920 ESTABLISHED 2263/python
tcp 0 0 localhost:11211 localhost:39119 ESTABLISHED 914/memcached
tcp 0 4097781 localhost:6040 localhost:51880 ESTABLISHED 2263/python
tcp 0 0 localhost:55820 localhost:6021 TIME_WAIT -
tcp 0 0 localhost:39119 localhost:11211 ESTABLISHED 2524/python
I had previously looked at the pids for my object servers, so I knew that 2263, with all the bytes in the Send-Q, was the object server apparently servicing this request.
I killed it.
$ kill -9 2263
$ sudo netstat -pt
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 localhost:45703 localhost:44453 ESTABLISHED 2265/python
tcp 0 0 localhost:59865 localhost:http-alt TIME_WAIT -
tcp 6113698 0 localhost:51880 localhost:6040 ESTABLISHED 2524/pytho...