SSYNC: Race condition in replication/reconstruction can lead to loss of datafile
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Object Storage (swift) | Fix Released | High | Unassigned | |
Bug Description
We discovered that, after a rebalance, some replicas or fragments were missing from the cluster.
We enabled debug logging and added extra logging around all calls to rmdir/rmtree.
Here is an example of a fragment that disappeared during a rebalance.
partition: 222495
from: 172.16.
to: 172.16.
object: 222495/
OLD 2020-09-24T15:18:16 obj-reconstructor Ring change detected. Aborting current reconstruction pass.
NEW 2020-09-24T15:18:37 obj-server 172.16.0.38 - - [24/Sep/
NEW 2020-09-24T15:20:32 obj-server - - - [24/Sep/
OLD 2020-09-24T15:22:09 obj-server 172.16.0.126 - - [24/Sep/
OLD 2020-09-24T15:22:09 obj-server 172.16.0.126 - - [24/Sep/
NEW 2020-09-24T15:22:10 obj-reconstructor rmdir(/
NEW 2020-09-24T15:24:13 obj-reconstructor Ring change detected. Aborting current reconstruction pass.
OLD 2020-09-24T15:24:45 obj-reconstructor rmdir(/
NEW 2020-09-24T15:24:45 obj-server 172.16.0.38 - - [24/Sep/
In this cluster, the distribution of a new ring can take up to 30 minutes.
The extract from the log shows that, while the old primary is reverting the partition to the new primary, the new primary still has the old ring so it also tries to revert it (to the old primary).
This leads:
- the new primary to delete the fragments of the partition it already received from the old primary, because the old primary is in sync
- the old primary to delete all its fragments, because the new primary confirmed that the PUT succeeded
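The interleaving described above can be reproduced with a toy model. This is only an illustrative sketch, not Swift code: `Node`, `revert` and `cleanup` are hypothetical names, and the fragment name is made up. It assumes that a revert pushes whatever the destination is missing, then treats everything the source holds as confirmed in sync, and that cleanup deletes the confirmed fragments locally.

```python
class Node:
    """Toy storage node holding the fragments of a single partition."""

    def __init__(self, name, fragments=()):
        self.name = name
        self.fragments = set(fragments)


def revert(src, dst):
    """Push fragments missing on dst, return what src considers in sync."""
    missing = src.fragments - dst.fragments
    dst.fragments |= missing
    # Everything src holds is now present on dst, so src may delete it later.
    return set(src.fragments)


def cleanup(node, confirmed):
    """Delete local fragments that were confirmed on the remote node."""
    node.fragments -= confirmed


# The old primary has the new ring; the new primary still has the old ring.
old_primary = Node("old", {"frag-1"})
new_primary = Node("new")

# 1. Old primary reverts the partition to the new primary (PUT succeeds).
confirmed_by_old = revert(old_primary, new_primary)
# 2. New primary, using the stale ring, reverts the same partition back.
#    Nothing needs to be sent: the old primary is already in sync.
confirmed_by_new = revert(new_primary, old_primary)
# 3. Each side removes what it believes is safely stored on the other.
cleanup(new_primary, confirmed_by_new)
cleanup(old_primary, confirmed_by_old)

print(old_primary.fragments, new_primary.fragments)
```

In this model both nodes end up empty, because each cleanup relies on a sync confirmation that the other side's cleanup then invalidates.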
summary:
- Race condition in replication/reconstruction can lead to loss of
+ SSYNC: Race condition in replication/reconstruction can lead to loss of datafile
I'm thinking that the replicator/reconstructor should lock the partition so that an incoming SSYNC would fail. That would avoid this race condition.
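A minimal sketch of that idea, assuming a POSIX system and an advisory lock file inside the partition directory. The names `partition_lock`, `PartitionLocked`, `incoming_ssync` and the `.lock` file are hypothetical and chosen for illustration; this is not how Swift implements it.

```python
import contextlib
import fcntl
import os
import tempfile


class PartitionLocked(Exception):
    """Raised when another worker already holds the partition lock."""


@contextlib.contextmanager
def partition_lock(partition_dir):
    """Hold an exclusive, non-blocking advisory lock on a partition."""
    os.makedirs(partition_dir, exist_ok=True)
    lock_path = os.path.join(partition_dir, ".lock")  # hypothetical name
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR)
    try:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise PartitionLocked(partition_dir)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def incoming_ssync(partition_dir):
    """Stand-in for the receiving side: refuse if the partition is locked."""
    try:
        with partition_lock(partition_dir):
            return "accepted"
    except PartitionLocked:
        return "rejected"


part = os.path.join(tempfile.mkdtemp(), "222495")
with partition_lock(part):  # the reconstructor is reverting this partition
    during = incoming_ssync(part)
after = incoming_ssync(part)
print(during, after)
```

With such a lock, the node that is busy reverting a partition would refuse a concurrent incoming SSYNC for it, so the sender would not get the confirmation it needs before deleting its own copy.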