OpenStack Object Storage (swift)

DB replicators try to re-use connections after timeouts

Bug #1968224 reported by Tim Burke on 2022-04-07

Affects	Status	Importance	Assigned to	Milestone
OpenStack Object Storage (swift)	Fix Released	Undecided	Unassigned

Bug Description

We've seen container-replicator logging something like

ERROR reading HTTP response from {'device': 'd8271', ...}: Timeout (10.0s)

then a short time (<100ms) later,

ERROR reading HTTP response from {'device': 'd8271', ...}:
Traceback (most recent call last):
   File ".../swift/common/db_replicator.py", line 172, in replicate
     {'Content-Type': 'application/json'})
   File ".../eventlet/green/http/client.py", line 1310, in request
     self._send_request(method, url, body, headers, encode_chunked)
   File ".../eventlet/green/http/client.py", line 1345, in _send_request
     self.putrequest(method, url, **skips)
   File ".../swift/common/bufferedhttp.py", line 228, in putrequest
     skip_accept_encoding)
   File ".../eventlet/green/http/client.py", line 1169, in putrequest
     raise CannotSendRequest(self.__state)
eventlet.green.http.client.CannotSendRequest: Request-sent

The trouble seems to be the exception handling in our ReplConnection.replicate() helper at https://github.com/openstack/swift/blob/2.29.1/swift/common/db_replicator.py#L169-L179 -- if there's a timeout while waiting on getresponse(), we just log it and carry on, leaving the connection in the Request-sent state. The replicator then tries to keep using that connection, and the next request() raises CannotSendRequest.

We should force the socket to close and re-initialize the connection.
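
The failure mode described above can be reproduced with the standard library alone. The following is a minimal sketch, not Swift code: `SilentPeerConnection` and `reproduce` are hypothetical names, the silent peer is simulated with `socket.socketpair()`, and a plain socket timeout stands in for the eventlet Timeout seen in production.

```python
import http.client
import socket


class SilentPeerConnection(http.client.HTTPConnection):
    """Hypothetical stand-in: the peer accepts bytes but never replies."""

    def connect(self):
        # socketpair() plays the role of a TCP connection to a hung server.
        # Keeping a reference to the peer end stops it from being closed.
        self.sock, self.peer = socket.socketpair()
        self.sock.settimeout(self.timeout)


def reproduce():
    """Issue two requests on one connection and record what happens."""
    conn = SilentPeerConnection("localhost", timeout=0.1)
    events = []
    for _ in range(2):
        try:
            conn.request("GET", "/")
            conn.getresponse()
            events.append("ok")
        except socket.timeout:
            # The request was sent, but no response arrived in time.
            # The connection object still considers a request outstanding.
            events.append("timeout")
        except http.client.CannotSendRequest as err:
            # Raised by putrequest() because the state is not idle.
            events.append("CannotSendRequest: %s" % err)
    return events
```

Under these assumptions, `reproduce()` should return `["timeout", "CannotSendRequest: Request-sent"]`, mirroring the pair of log lines in the report: the second request never reaches the wire because the connection was not reset after the first timed out.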

OpenStack Infra (hudson-openstack) wrote on 2022-04-07: Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/swift/+/837038

Changed in swift:
status:	New → In Progress

OpenStack Infra (hudson-openstack) wrote on 2022-06-21: Fix merged to swift (master)

Reviewed: https://review.opendev.org/c/openstack/swift/+/837038
Committed: https://opendev.org/openstack/swift/commit/f6f474e429af11ea5566f408d5e69e86c2cec977
Submitter: "Zuul (22348)"
Branch: master

commit f6f474e429af11ea5566f408d5e69e86c2cec977
Author: Tim Burke <email address hidden>
Date: Thu Apr 7 14:54:16 2022 -0700

db: Close ReplConnection sockets on errors/timeouts

    This could happen when there was a timeout running _sync_shard_ranges()
    in _choose_replication_mode() -- syncing of shard ranges failed, but we
    still want to attempt to replicate. If we close the socket, the
    connection will automatically spin up a new one the next time we call
    request() instead of raising CannotSendRequest.

Change-Id: I242351078e26213f43c1ccc0fed534b64aa29ab6
Closes-Bug: #1968224
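
The behaviour the commit message relies on (closing the socket so that the next request() opens a fresh one) can be illustrated outside Swift. This is a rough sketch under stated assumptions, not the merged change: `replicate_once` and `SilentPeerConnection` are hypothetical names, the request path and body are placeholders, and the caught exception types are the stdlib ones rather than Swift's.

```python
import http.client
import socket


class SilentPeerConnection(http.client.HTTPConnection):
    """Hypothetical test double: the peer accepts bytes but never replies."""

    connects = 0  # counts how many times a new socket was opened

    def connect(self):
        self.sock, self.peer = socket.socketpair()
        self.sock.settimeout(self.timeout)
        self.connects += 1


def replicate_once(conn, body=b"[]"):
    """Send one request; return (status, data), or None after a reset."""
    try:
        conn.request("REPLICATE", "/", body,
                     {"Content-Type": "application/json"})
        response = conn.getresponse()
        return response.status, response.read()
    except (http.client.HTTPException, OSError):
        # close() drops the socket and returns the connection to its idle
        # state, so a later request() reconnects instead of raising
        # CannotSendRequest.
        conn.close()
        return None
```

With `conn = SilentPeerConnection("localhost", timeout=0.1)`, two consecutive calls to `replicate_once(conn)` should each time out and return `None`, and `conn.connects` should end up at 2: the second call reconnected rather than failing on the stale state.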

Changed in swift:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2022-08-24: Fix included in openstack/swift 2.30.0

This issue was fixed in the openstack/swift 2.30.0 release.
