DB replicators try to re-use connections on timeouts

Bug #1968224 reported by Tim Burke
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

We've seen container-replicator logging something like

 ERROR reading HTTP response from {'device': 'd8271', ...}: Timeout (10.0s)

then a short time (<100ms) later,

 ERROR reading HTTP response from {'device': 'd8271', ...}:
 Traceback (most recent call last):
   File ".../swift/common/db_replicator.py", line 172, in replicate
     {'Content-Type': 'application/json'})
   File ".../eventlet/green/http/client.py", line 1310, in request
     self._send_request(method, url, body, headers, encode_chunked)
   File ".../eventlet/green/http/client.py", line 1345, in _send_request
     self.putrequest(method, url, **skips)
   File ".../swift/common/bufferedhttp.py", line 228, in putrequest
     skip_accept_encoding)
   File ".../eventlet/green/http/client.py", line 1169, in putrequest
     raise CannotSendRequest(self.__state)
 eventlet.green.http.client.CannotSendRequest: Request-sent

The trouble seems to be the exception handling in our ReplConnection.replicate() helper at https://github.com/openstack/swift/blob/2.29.1/swift/common/db_replicator.py#L169-L179: if there's a timeout while waiting on getresponse(), we just carry on, leaving the connection in the Request-sent state, and let the replicator (try to) continue using it.

We should force the socket to close and re-initialize the connection.
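
The failure mode described above can be reproduced outside Swift with the standard library's http.client, which has the same connection state machine as the eventlet green module in the traceback. This is only a sketch under assumptions: the listener that never replies, the 0.2s timeout, and the function name reproduce_cannot_send_request are all made up for illustration and are not taken from the Swift code.

```python
import http.client
import socket


def reproduce_cannot_send_request():
    """Return the error text seen when a timed-out connection is re-used."""
    # Hypothetical peer: it listens but never accepts or answers, so the
    # client's request is sent fine and the read of the response times out.
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(5)
    port = listener.getsockname()[1]

    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=0.2)
    try:
        conn.request("GET", "/")
        try:
            conn.getresponse()
        except socket.timeout:
            # The response never arrived; the connection object still
            # believes a request is outstanding.
            pass
        try:
            # Re-using the same connection without closing it first.
            conn.request("GET", "/")
        except http.client.CannotSendRequest as err:
            return str(err)
        return None
    finally:
        conn.close()
        listener.close()


if __name__ == "__main__":
    print(reproduce_cannot_send_request())
```

The second request() fails before anything is written to the socket, because the connection only returns to its idle state once a response has been read or close() has been called.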

OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/swift/+/837038

Changed in swift:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.opendev.org/c/openstack/swift/+/837038
Committed: https://opendev.org/openstack/swift/commit/f6f474e429af11ea5566f408d5e69e86c2cec977
Submitter: "Zuul (22348)"
Branch: master

commit f6f474e429af11ea5566f408d5e69e86c2cec977
Author: Tim Burke <email address hidden>
Date: Thu Apr 7 14:54:16 2022 -0700

    db: Close ReplConnection sockets on errors/timeouts

    This could happen when there was a timeout running _sync_shard_ranges()
    in _choose_replication_mode() -- syncing of shard ranges failed, but we
    still want to attempt to replicate. If we close the socket, the
    connection will automatically spin up a new one the next time we call
    request() instead of raising CannotSendRequest.

    Change-Id: I242351078e26213f43c1ccc0fed534b64aa29ab6
    Closes-Bug: #1968224
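
The approach in the commit message can be sketched roughly as follows. This is not the merged patch: ClosingReplConnection is a hypothetical stand-in built on the standard library's http.client.HTTPConnection rather than Swift's BufferedHTTPConnection, and it relies on the socket timeout instead of an eventlet Timeout. It only illustrates the idea of closing the socket whenever a request fails so that the next request() opens a fresh connection.

```python
import http.client
import json


class ClosingReplConnection(http.client.HTTPConnection):
    """Hypothetical helper; not Swift's actual ReplConnection."""

    def replicate(self, *args):
        """Send a REPLICATE request; return the response, or None on error."""
        try:
            body = json.dumps(args)
            self.request("REPLICATE", "/", body,
                         {"Content-Type": "application/json"})
            response = self.getresponse()
            response.data = response.read()
            return response
        except Exception:
            # Includes socket timeouts. Closing drops the socket and resets
            # the connection state to idle, so a later request() reconnects
            # instead of raising CannotSendRequest.
            self.close()
            return None
```

A caller that gets None back can retry or move on to the next step using the same object, since the next request() transparently reconnects.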

Changed in swift:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.30.0

This issue was fixed in the openstack/swift 2.30.0 release.
