Connection between client and proxy service does not close

Bug #1568650 reported by drax on 2016-04-11
This bug affects 3 people
Affects: OpenStack Object Storage (swift)
Importance: High
Assigned to: drax

Bug Description

In an object GET operation, if the object service crashes, the client connection is not closed at the eventlet layer. Clients that have not set a timeout for chunks will hang indefinitely.

The iterator below is used to send data to the client:

================
   def _make_app_iter(self, req, node, source):
        """
        Returns an iterator over the contents of the source (via its read
        func). There is also quite a bit of cleanup to ensure garbage
        collection works and the underlying socket of the source is closed.

        :param req: incoming request object
        :param source: The httplib.Response object this iterator should read
                       from.
        :param node: The node the source is reading from, for logging purposes.
        """
        # NOTE: I am not pasting the complete code, only the areas that can create issues.
        # NOTE: if the object service sends an empty chunk, this iterator finishes without
        # any exception; it exits gracefully even though the complete body of the object
        # was not read.
                if not chunk:
                    break
                with ChunkWriteTimeout(self.app.client_timeout, GenericTimeout):
                    yield chunk
================

Please note that if the object service crashes partway through transferring the body of the object, this iterator will not raise any exception; it will simply finish, and the eventlet.wsgi layer above will not close the connection.
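The failure mode can be reproduced in miniature: a generator that treats an empty read as end-of-body finishes cleanly even when the backend died mid-transfer. This is a toy sketch, not Swift code:

```python
import io

def make_app_iter(source, chunk_size=4):
    # mirror of the pattern in the bug: stop on empty chunk, no length check
    while True:
        chunk = source.read(chunk_size)
        if not chunk:   # EOF from a crashed backend is indistinguishable
            break       # from a normal end-of-body: the generator ends cleanly
        yield chunk

# backend "crashed" after sending 12 of an advertised 20 bytes
crashed_source = io.BytesIO(b"0123456789ab")
body = b"".join(make_app_iter(crashed_source))
print(len(body))  # 12 bytes arrive, yet no exception is raised
```

Because the generator exits without error, the WSGI server above it has no signal that the response body is short.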

Below is the function which is called in a loop in HTTPServer.py; the function is overridden in the eventlet.wsgi.HttpProtocol class:

================
    def handle_one_request(self):
        if self.server.max_http_version:
            self.protocol_version = self.server.max_http_version

        # NOTE: The client is still expecting chunks from the proxy service,
        # but as the object service has crashed and the iterator finished
        # cleanly, no more data can be sent to the client. This readline()
        # call will now hang indefinitely.
        try:
            self.raw_requestline = self.rfile.readline(self.server.url_length_limit)
        # (except clauses elided)

        if not self.raw_requestline:
            self.close_connection = 1
            return
================

In a normal object GET operation, whenever the client has received all the data from the proxy service, it closes the connection from its end and the "readline" call returns an empty string immediately. As raw_requestline is empty, close_connection is set to true and the request loop finishes at the BaseRequestHandler level.

================
class BaseHTTPRequestHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        """Handle multiple requests if necessary."""
        self.close_connection = 1

        self.handle_one_request()
        while not self.close_connection:
            self.handle_one_request()

class BaseRequestHandler:
    def __init__(self, request, client_address, server):
        self.request = request
        self.client_address = client_address
        self.server = server
        try:
            self.setup()
            self.handle()
            self.finish()
================
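The loop semantics quoted above can be sketched as a runnable toy (the names are illustrative; the real loop lives in Python 2's BaseHTTPServer and is driven by eventlet):

```python
class ToyHandler(object):
    """Toy model of the BaseHTTPRequestHandler keep-alive loop."""
    def __init__(self, request_lines):
        self._lines = iter(request_lines)
        self.handled = 0
        self.close_connection = 1

    def handle_one_request(self):
        raw_requestline = next(self._lines, '')  # '' models readline() EOF
        if not raw_requestline:
            self.close_connection = 1  # client closed the socket: stop looping
            return
        self.handled += 1
        self.close_connection = 0      # keep-alive: loop once more

    def handle(self):
        self.close_connection = 1
        self.handle_one_request()
        while not self.close_connection:
            self.handle_one_request()

h = ToyHandler(['GET /a HTTP/1.1', 'GET /b HTTP/1.1'])
h.handle()
print(h.handled)  # 2: the loop exits only once the "socket" returns ''
```

In the bug scenario the client never closes its end, so the real readline() neither returns a request line nor an empty string; it simply blocks, which is the hang described above.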

This issue occurs whenever the iterator finishes gracefully without checking that the body transfer completed.
I have the fix below to resolve this issue in the "_make_app_iter" function.

================
     def _make_app_iter(self, req, node, source):
+        content_length = source.length or 0
         bytes_read_from_source = 0

                 if not chunk:
+                    if (content_length - bytes_read_from_source) > 0:
+                        self.app.logger.error(
+                            _('Incomplete bytes : %s' % bytes_read_from_source))
+                        raise

             except GeneratorExit:
                 if not req.environ.get('swift.non_client_disconnect'):
                     self.app.logger.warn(_('Client disconnected on read'))
+                if (content_length - bytes_read_from_source) > 0:
+                    self.app.logger.error(
+                        _('Incomplete bytes : %s' % bytes_read_from_source))
+                    raise

================
Whenever the generator exits, it checks whether the requested content length was fully transferred; if not, an exception is raised.
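A runnable sketch of the idea behind the patch, simplified and with hypothetical names (the real change belongs in _make_app_iter):

```python
import io

def make_app_iter(source, content_length, chunk_size=4):
    # sketch of the proposed fix: track bytes read and raise on a short read
    bytes_read_from_source = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            if content_length - bytes_read_from_source > 0:
                # premature EOF: surface an error instead of ending cleanly,
                # so the WSGI layer tears down the client connection
                raise IOError('Incomplete bytes : %s' % bytes_read_from_source)
            break
        bytes_read_from_source += len(chunk)
        yield chunk

crashed_source = io.BytesIO(b"0123456789ab")  # backend sent 12 of 20 bytes
try:
    b"".join(make_app_iter(crashed_source, content_length=20))
    result = 'clean finish'
except IOError as exc:
    result = str(exc)
print(result)  # Incomplete bytes : 12
```

With the length check in place, the premature EOF becomes an exception that propagates up to the WSGI server, which can then close the client socket instead of waiting for another request.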

drax (devesh-gupta) on 2016-04-11
description: updated
clayg (clay-gerrard) wrote :

Ok, I'd say there's something going on here - it's probably terrible.

First I uploaded a 100MB object into a replicated storage policy:

$ swift stat test !$
swift stat test big.test
       Account: AUTH_test
     Container: test
        Object: big.test
  Content Type: application/octet-stream
Content Length: 104857600
 Last Modified: Tue, 12 Apr 2016 18:28:47 GMT
          ETag: 2f282b84e7e608d5852449ed940bfc51
    Meta Mtime: 1460485719.000000
 Accept-Ranges: bytes
   X-Timestamp: 1460485726.32074
    X-Trans-Id: tx155cddfa10d443f6a4018-00570d3e7a

Then I started to download it - *slowly* [1]

Then I sang that song about dem bones while looking at my script connected to the proxy and the proxy connected to the object server:

$ sudo netstat -pt
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 localhost:45703 localhost:44453 ESTABLISHED 2265/python
tcp 0 0 localhost:59865 localhost:http-alt TIME_WAIT -
tcp 6117196 0 localhost:51880 localhost:6040 ESTABLISHED 2524/python
tcp 0 0 localhost:59077 localhost:43993 ESTABLISHED 2264/python
tcp 0 0 saio:ssh 10.0.2.2:50402 ESTABLISHED 2420/sshd: vagrant
tcp 0 0 saio:ssh 10.0.2.2:50398 ESTABLISHED 2273/sshd: vagrant
tcp 0 0 localhost:39449 localhost:6022 TIME_WAIT -
tcp 0 3971935 localhost:http-alt localhost:59867 ESTABLISHED 2524/python
tcp 0 0 localhost:43993 localhost:59077 ESTABLISHED 2264/python
tcp 0 0 localhost:56920 localhost:46044 ESTABLISHED 2263/python
tcp 3172362 0 localhost:59867 localhost:http-alt ESTABLISHED 2553/python
tcp 0 0 localhost:44453 localhost:45703 ESTABLISHED 2265/python
tcp 0 0 localhost:46044 localhost:56920 ESTABLISHED 2263/python
tcp 0 0 localhost:11211 localhost:39119 ESTABLISHED 914/memcached
tcp 0 4097781 localhost:6040 localhost:51880 ESTABLISHED 2263/python
tcp 0 0 localhost:55820 localhost:6021 TIME_WAIT -
tcp 0 0 localhost:39119 localhost:11211 ESTABLISHED 2524/python

I had previously looked at the pids for my object servers' so I knew that 2263 with all the bytes in the Send-Q was the object server apparently servicing this request.

I killed it.

$ kill -9 2263
$ sudo netstat -pt
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 localhost:45703 localhost:44453 ESTABLISHED 2265/python
tcp 0 0 localhost:59865 localhost:http-alt TIME_WAIT -
tcp 6113698 0 localhost:51880 localhost:6040 ESTABLISHED 2524/pytho...

Changed in swift:
importance: Undecided → High
status: New → Confirmed
drax (devesh-gupta) wrote :

clayg,

As per my analysis of the swift, eventlet, and HTTPServer code, this problem is occurring due to the HTTPServer looping mechanism.

As already mentioned in the description, the loop will finish only when "self.close_connection" is set to 1 by the overridden function in wsgi.py.

In the situation mentioned above, the iterator which is sending data to the client exits gracefully when the object service crashes.
Because this is a graceful exit, handle_one_request completes without setting self.close_connection to 1, so the loop above calls handle_one_request again. Since this function expects the client to send data first (a new request line), it blocks, while the client is in turn waiting for data to be sent.

I have already mentioned the fix in the description, but I am not sure whether it will be required in some other places as well.

drax (devesh-gupta) on 2016-04-20
Changed in swift:
assignee: nobody → drax (devesh-gupta)

Fix proposed to branch: master
Review: https://review.openstack.org/323756

Changed in swift:
status: Confirmed → In Progress
clayg (clay-gerrard) wrote :

So the problem is just that as the proxy is reading from the backend object node, once it gets to the end it doesn't really know that the EOF coming up out of the socket means the object-server closed the connection prematurely. I guess the ResumingGetter stuff only works if the failure from the backend is initiated by the proxy (i.e. a timeout); if the socket just dies, the proxy thinks that's the end of the object (even if it's < content-length).

here's a download of an object using my slow_download script where I kill -9'd the servicing object server:

Sep 1 21:56:24 saio proxy-server: 127.0.0.1 127.0.0.1 01/Sep/2016/21/56/24 GET /v1/AUTH_test/test/delete.me HTTP/1.0 200 - python-swiftclient-3.0.1.dev72 AUTH_tk4f571562b... - 88739429 - tx527790b89a084b478e5e7-0057c8a388 - 127.9797 - - 1472766856.030810118 1472766984.010467052 0

And here's the same download using swift command line client:

Sep 1 21:58:48 saio proxy-server: 127.0.0.1 127.0.0.1 01/Sep/2016/21/58/48 GET /v1/AUTH_test/test/delete.me HTTP/1.0 200 - python-swiftclient-3.0.1.dev72 AUTH_tk4f571562b... - 183797100 - tx4b89ab1148bf495ca4f99-0057c8a496 - 2.7687 - - 1472767126.060199022 1472767128.828860998 0

It's possible this all gets easier to fix now that we're tracking backend bytes [2]

1. https://gist.github.com/clayg/2458a4a7e1451c75fbc5a63fcae11635
2. https://review.openstack.org/#/c/363316/

clayg (clay-gerrard) on 2016-11-30
Changed in swift:
status: In Progress → Confirmed
clayg (clay-gerrard) wrote :

probably related to lp bug #1572719

Reviewed: https://review.openstack.org/575876
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=4a0afa9fea60bf2d1943b59a1f4ab49cb89e682a
Submitter: Zuul
Branch: master

commit 4a0afa9fea60bf2d1943b59a1f4ab49cb89e682a
Author: Samuel Merritt <email address hidden>
Date: Fri Jun 15 16:31:25 2018 -0700

    Enforce Content-Length in catch_errors

    If a WSGI application produces the header "Content-Length: <N>" but
    does not produce exactly N bytes of response, then that is an error
    and an exception should be thrown so that the WSGI server can take the
    correct action (close the TCP connection for HTTP <= 1.1, something
    else for HTTP 2.0).

    As part of this, I also fixed a bug in DLOs where a HEAD response
    might have a body. The way it works is this:

     * user makes HEAD request for DLO manifest

     * DLO middleware makes GET request for container

     * authorize callback (e.g. from tempurl) replies 401 for container
       GET; response has a nonempty body (it's a GET response; that's
       fine)

     * DLO notes that response is non-2xx, returns it as-is

     * client gets response with nonempty body to a HEAD request

    The fix there was simple; if the original request method was HEAD,
    clear out the response body.

    Change-Id: I74d8c13eba2a4917b5a116875b51a781b33a7abf
    Related-Bug: 1568650

Reviewed: https://review.openstack.org/575254
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=0e81ffd1e1481a73146fce17f61f2ab9e01eb684
Submitter: Zuul
Branch: master

commit 0e81ffd1e1481a73146fce17f61f2ab9e01eb684
Author: Samuel Merritt <email address hidden>
Date: Wed Jun 13 14:28:28 2018 -0700

    Fix socket leak on object-server death

    Consider a client that's downloading a large replicated object of size
    N bytes. If the object server process dies (e.g. with a segfault)
    partway through the download, the proxy will have read fewer than N
    bytes, and then read(sockfd) will start returning 0 bytes. At this
    point, the proxy believes the object download is complete, and so the
    WSGI server waits for a new request to come in. Meanwhile, the client
    is waiting for the rest of their bytes. Until the client times out,
    that socket will be held open.

    The fix is to look at the Content-Length and Content-Range headers in
    the response from the object server, then retry with another object
    server in case the original GET is truncated. This way, the client
    gets all the bytes they should.

    Note that ResumingGetter already had retry logic for when an
    object-server is slow to send bytes -- this extends it to also cover
    unexpected disconnects.

    Change-Id: Iab1e07706193ddc86832fd2cff0d7c2cb6d79ad9
    Related-Change: I74d8c13eba2a4917b5a116875b51a781b33a7abf
    Closes-Bug: 1568650
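The retry described in the commit message can be sketched with a small helper (illustrative names only; Swift's actual logic lives in ResumingGetter and inspects the backend's Content-Length/Content-Range headers):

```python
def resume_range_header(bytes_already_sent, content_length):
    """Build a Range header to fetch only the missing tail of a truncated GET.

    Hypothetical helper showing the approach: if the backend disconnected
    after sending fewer bytes than advertised, retry against another object
    server for just the remainder."""
    if bytes_already_sent >= content_length:
        return None  # nothing missing, no retry needed
    return 'bytes=%d-%d' % (bytes_already_sent, content_length - 1)

hdr = resume_range_header(12, 20)
print(hdr)  # bytes=12-19
```

This way a mid-stream disconnect is handled the same as a slow backend: the proxy resumes from another replica and the client still receives all N bytes.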

Changed in swift:
status: Confirmed → Fix Released

This issue was fixed in the openstack/swift 2.21.0 release.

Reviewed: https://review.opendev.org/690644
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=e3739920c4db97f255ab973cf125f0091bb15428
Submitter: Zuul
Branch: stable/rocky

commit e3739920c4db97f255ab973cf125f0091bb15428
Author: Samuel Merritt <email address hidden>
Date: Wed Jun 13 14:28:28 2018 -0700

    Fix socket leak on object-server death

    Consider a client that's downloading a large replicated object of size
    N bytes. If the object server process dies (e.g. with a segfault)
    partway through the download, the proxy will have read fewer than N
    bytes, and then read(sockfd) will start returning 0 bytes. At this
    point, the proxy believes the object download is complete, and so the
    WSGI server waits for a new request to come in. Meanwhile, the client
    is waiting for the rest of their bytes. Until the client times out,
    that socket will be held open.

    The fix is to look at the Content-Length and Content-Range headers in
    the response from the object server, then retry with another object
    server in case the original GET is truncated. This way, the client
    gets all the bytes they should.

    Note that ResumingGetter already had retry logic for when an
    object-server is slow to send bytes -- this extends it to also cover
    unexpected disconnects.

    Change-Id: Iab1e07706193ddc86832fd2cff0d7c2cb6d79ad9
    Related-Change: I74d8c13eba2a4917b5a116875b51a781b33a7abf
    Closes-Bug: 1568650
    (cherry picked from commit 0e81ffd1e1481a73146fce17f61f2ab9e01eb684)

tags: added: in-stable-rocky