Avoid calling sync for a job where suffixes list is empty

Bug #1886782 reported by Bhaskar Singhal
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Fix Released
Undecided
Unassigned

Bug Description

In swift/obj/replicator.py:

While replicating a partition, sometimes we end up calling sync[1] with an empty suffixes list. This causes the replication for that partition to fail[2], even though it shouldn't even replicate.

We should check for suffixes list being empty, just like we do at [3].

Also, if sync[1] fails, do we still want to send REPLICATE to remote?

[1] https://github.com/openstack/swift/blob/master/swift/obj/replicator.py#L691
[2] https://github.com/openstack/swift/blob/master/swift/obj/replicator.py#L457
[3] https://github.com/openstack/swift/blob/master/swift/obj/replicator.py#L678

Revision history for this message
Tim Burke (1-tim-z) wrote :

Sounds a lot like https://bugs.launchpad.net/swift/+bug/1862645; do you think they might be the same issue?

The last time I looked at that bug, I was having a hard time reproducing the issue -- I guess it's a concurrency bug, where we aren't tolerant of a write coming in between the _get_hashes() at the start of the function [1] and the one after receiving a REPLICATE response [2]?

I think the REPLICATE-after-failed-sync is fine -- we didn't get *everything* synced like we wanted, but we may have moved *something*. We need to tell the remote those suffixes are dirty, or they'll have the wrong hash cached in their hashes.pkl, right?

[1] https://github.com/openstack/swift/blob/2.25.0/swift/obj/replicator.py#L627-L632
[2] https://github.com/openstack/swift/blob/2.25.0/swift/obj/replicator.py#L681-L684

Revision history for this message
Bhaskar Singhal (bhaskarsinghal) wrote :

Thanks Tim.

Yes, https://bugs.launchpad.net/swift/+bug/1862645 looks same.

I agree, for a replicate after failed sync should be fine where atleast few suffixes were synced and we want the remote to re-hash.

Regarding root causing, I am not been able to pinpoint the exact issue. But, yes it could be a concurrency bug.

For now I think we can fix this by making the following changes:

diff --git a/swift/obj/replicator.py b/swift/obj/replicator.py
index e3634bb8f..230b2fc79 100644
--- a/swift/obj/replicator.py
+++ b/swift/obj/replicator.py
@@ -687,6 +687,9 @@ class ObjectReplicator(Daemon):
                     suffixes = [suffix for suffix in local_hash if
                                 local_hash[suffix] !=
                                 remote_hash.get(suffix, -1)]
+ if not suffixes:
+ stats.hashmatch += 1
+ continue
                     stats.rsync += 1
                     success, _junk = self.sync(node, job, suffixes)
                     with Timeout(self.http_timeout):

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.opendev.org/741538
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=907942eb4743e5a4213b1f84fd57a58386594bb6
Submitter: Zuul
Branch: master

commit 907942eb4743e5a4213b1f84fd57a58386594bb6
Author: Tim Burke <email address hidden>
Date: Thu Jul 16 13:04:00 2020 -0700

    Stop syncing empty suffixes list

    Change-Id: I918ab4ccbf4d081b26f4117937410cdad1caf8d3
    Closes-Bug: #1862645
    Closes-Bug: #1886782

Changed in swift:
status: New → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/743279

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (stable/ussuri)

Reviewed: https://review.opendev.org/743279
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=0bdbbc4a655fb4f74cd087636fc136c0f9d57754
Submitter: Zuul
Branch: stable/ussuri

commit 0bdbbc4a655fb4f74cd087636fc136c0f9d57754
Author: Tim Burke <email address hidden>
Date: Thu Jul 16 13:04:00 2020 -0700

    Stop syncing empty suffixes list

    Change-Id: I918ab4ccbf4d081b26f4117937410cdad1caf8d3
    Closes-Bug: #1862645
    Closes-Bug: #1886782
    (cherry picked from commit 907942eb4743e5a4213b1f84fd57a58386594bb6)

tags: added: in-stable-ussuri
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/743614

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (stable/train)

Reviewed: https://review.opendev.org/743614
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=69225ada606c00e5a68d82c297c7c5c100bf33ab
Submitter: Zuul
Branch: stable/train

commit 69225ada606c00e5a68d82c297c7c5c100bf33ab
Author: Tim Burke <email address hidden>
Date: Thu Jul 16 13:04:00 2020 -0700

    Stop syncing empty suffixes list

    Change-Id: I918ab4ccbf4d081b26f4117937410cdad1caf8d3
    Closes-Bug: #1862645
    Closes-Bug: #1886782
    (cherry picked from commit 907942eb4743e5a4213b1f84fd57a58386594bb6)

tags: added: in-stable-train
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.