Comment 3 for bug 1897177

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.opendev.org/754242
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=8c0a1abf744a11b5c289239e3ac830786a9de4e9
Submitter: Zuul
Branch: master

commit 8c0a1abf744a11b5c289239e3ac830786a9de4e9
Author: Romain LE DISEZ <email address hidden>
Date: Thu Sep 24 20:36:36 2020 -0400

    Fix a race condition in case of cross-replication

    In a situation where two nodes do not have the same version of a ring
    and they both think the other node is the primary node of a partition,
    a race condition can lead to the loss of some of the objects of the
    partition.

    The following sequence leads to the loss of some of the objects:

      1. A gets and reloads the new ring
      2. A starts to replicate/revert the partition P to node B
      3. B (with the old ring) starts to replicate/revert the (partial)
         partition P to node A
         => replication should be fast as all objects are already on node A
      4. B finishes replication of (partial) partition P to node A
      5. B removes the (partial) partition P after replication succeeded
      6. A finishes replication of partition P to node B
      7. A removes the partition P
      8. B gets and reloads the new ring

    All data transferred between steps 2 and 5 will be lost, as it is no
    longer on node B and it is also removed from node A.

    This commit makes the replicator/reconstructor hold a replication_lock
    on partition P so that the remote node cannot start an opposite
    replication.

    Change-Id: I29acc1302a75ed52c935f42485f775cd41648e4d
    Closes-Bug: #1897177
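
The sequence described in the commit message can be replayed with a small, deterministic toy model. This is only an illustrative sketch, not Swift's code: the `Node` class, the `push` and `simulate` helpers, the object names, and the boolean `locked` flag (a stand-in for the replication_lock on partition P) are all hypothetical. It assumes a push is refused whenever the receiving node currently holds the lock on P.

```python
class Node:
    """Toy storage node holding the objects of a single partition P."""

    def __init__(self, name, objects=()):
        self.name = name
        self.objects = set(objects)
        # Hypothetical stand-in for the replication_lock on partition P.
        self.locked = False


def push(src, dst, objs, use_lock):
    """Copy objs from src to dst. Return False if dst refuses the push."""
    if use_lock and dst.locked:
        # dst is itself replicating P, so the opposite replication is refused.
        return False
    dst.objects |= set(objs) & src.objects
    return True


def simulate(use_lock):
    """Replay steps 1-8 and return the objects that survive."""
    a = Node("A", {"o1", "o2", "o3", "o4"})
    b = Node("B")

    # Steps 1-2: A has the new ring and starts reverting P to B.
    # With the fix, A holds the lock on P for the whole job.
    a.locked = use_lock
    push(a, b, {"o1", "o2"}, use_lock)  # only part of P has arrived so far

    # Steps 3-5: B (old ring) reverts its partial P to A and removes its
    # local copy only if that replication succeeded.
    b.locked = use_lock
    if push(b, a, b.objects, use_lock):
        b.objects.clear()
    b.locked = False

    # Step 6: A finishes replicating the rest of P to B.
    push(a, b, {"o3", "o4"}, use_lock)

    # Step 7: A removes P and releases the lock.
    a.objects.clear()
    a.locked = False

    # Step 8: B reloads the new ring; what the two nodes hold now is all
    # that is left of P.
    return a.objects | b.objects


if __name__ == "__main__":
    print("without lock:", sorted(simulate(use_lock=False)))
    print("with lock:   ", sorted(simulate(use_lock=True)))
```

In this model, the run without the lock loses exactly the objects transferred before B removed its partial copy, while the run with the lock keeps all four objects because B's opposite replication is refused and B never deletes anything.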