Objects are not deleted from additional handoff when primary disk fails

Bug #1690791 reported by Pavel Kvasnička
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

Some partitions on a hard drive were corrupted, which caused I/O errors on mkdir when rsync tried to replicate objects. This leads to repeated rsyncs and can cause objects to come back "from the grave" when the disk is not replaced within the reclaim age.

Description of state
--------------------

Probably while the network between zones was disconnected, a data file was created on the 5th handoff, server "11" (I opened the ring file and saw that, in the handoff list for this partition, four other handoffs come first and then the server that keeps repeating the object replication). We have replicas = 4, 2 regions with 2 zones in each, and one failing disk drive (I expect the failed drive was in the same region as server 11, the 5th handoff, but I'm not sure now).

The file was deleted, but not from server 11. Then, on every replicator cycle, the object was rsynced to all primaries (except the failing one) and deleted again, because there was a tombstone. After the reclaim age had passed, the object ended up placed on the primaries.

I write "an object", but in fact there are a few hundred objects and thousands of tombstones.
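
For context, the "from the grave" mechanism is roughly this: a DELETE is stored as a <timestamp>.ts tombstone, and once the tombstone is older than reclaim_age the replicators may remove it, after which nothing covers a stale data file still sitting on a far handoff. A simplified sketch (the real logic lives in swift's diskfile code; the one-week value is only the default reclaim_age assumed here):

    import os
    import time

    RECLAIM_AGE = 7 * 24 * 60 * 60  # default reclaim_age, one week

    def tombstone_is_reclaimable(path, now=None):
        """True if a <timestamp>.ts tombstone is old enough to be removed.

        Once the tombstone is gone, a stale copy of the object that is
        still stuck on a handoff (e.g. because rsync to the primary kept
        failing with I/O errors) can be replicated back to the primaries.
        """
        now = now if now is not None else time.time()
        timestamp = float(os.path.basename(path).split('.ts')[0])
        return now - timestamp > RECLAIM_AGE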

Suggestions
-----------

Add new metrics (a sketch of how they could be gathered is below this list):
 - age of the oldest partition that is placed on a handoff
 - count of "handoff objects" waiting to be deleted
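
A rough sketch of how these could be gathered on a storage node; the paths, device id and directory layout (/srv/node/<dev>/objects/<partition>) are assumptions for the example, and it counts handoff partitions rather than individual objects:

    import os
    import time
    from swift.common.ring import Ring

    def handoff_partition_stats(objects_dir='/srv/node/sda/objects',
                                ring_path='/etc/swift/object.ring.gz',
                                local_device_id=0):
        """Age of the oldest handoff partition and how many there are.

        A locally held partition is treated as "on a handoff" when this
        device is not among its primary nodes in the ring.
        """
        ring = Ring(ring_path)
        oldest_age = 0
        handoff_count = 0
        for name in os.listdir(objects_dir):
            if not name.isdigit():
                continue
            part = int(name)
            primary_ids = {node['id'] for node in ring.get_part_nodes(part)}
            if local_device_id in primary_ids:
                continue  # a primary partition, not a handoff
            handoff_count += 1
            mtime = os.path.getmtime(os.path.join(objects_dir, name))
            oldest_age = max(oldest_age, time.time() - mtime)
        return oldest_age, handoff_count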

Fix it: place objects on the best handoffs. When the first handoffs are used, tombstones and object updates will be placed there too. "First" means taking the necessary count of handoffs, from the beginning of the handoff list, to satisfy the replica count; a sketch of this selection follows.
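
In ring terms the idea could look like the sketch below: take only as many handoffs as there are unreachable primaries, always from the front of get_more_nodes(), so a later tombstone lands on the same handoff devices as the data did. The helper name and the numbers in the example are made up for illustration; only the Ring calls are actual Swift API:

    from itertools import islice
    from swift.common.ring import Ring

    def best_handoffs(ring, part, unreachable_primaries):
        """Return the first N handoff nodes for a partition.

        Taking handoffs from the beginning of the handoff list means a
        PUT and a later DELETE that both miss the same primaries end up
        on the same handoff devices, so the tombstone covers the data.
        """
        return list(islice(ring.get_more_nodes(part), unreachable_primaries))

    # Example (paths and partition number are made up):
    # ring = Ring('/etc/swift/object.ring.gz')
    # nodes = best_handoffs(ring, 12345, 1)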

Revision history for this message
clayg (clay-gerrard) wrote :
Changed in swift:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
clayg (clay-gerrard) wrote :

I think this mostly boils down to us continuing to make more and more conscious decisions about the difference in how we want to handle handoff partitions vs. mis-placed partitions. A mis-placed partition (part moved from rebalance, non-primary & non-handoff) perhaps should not *immediately* move to handoff if a primary rejects it for e.g. concurrency... but the longer a part sits misplaced the more chance there is for reclaim age related dark data issues.

The more analytics we have on partitions (handoff vs. misplaced, last updated, size & density), the better decisions we'll be able to make!
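
One possible shape for that kind of analytics, classifying each locally held partition against the ring; this is only a sketch: the handoff depth cut-off is arbitrary, and "misplaced" here simply means neither primary nor among the first few handoffs:

    from itertools import islice
    from swift.common.ring import Ring

    HANDOFF_DEPTH = 8  # arbitrary cut-off for this sketch

    def classify_partition(ring, part, local_device_id):
        """Label a locally held partition: primary, handoff or misplaced."""
        if any(n['id'] == local_device_id for n in ring.get_part_nodes(part)):
            return 'primary'
        early_handoffs = islice(ring.get_more_nodes(part), HANDOFF_DEPTH)
        if any(n['id'] == local_device_id for n in early_handoffs):
            return 'handoff'
        return 'misplaced'  # e.g. left behind by a rebalance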

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (master)

Change abandoned by Pavel Kvasnička (<email address hidden>) on branch: master
Review: https://review.openstack.org/470440
Reason: This issue should be solved by monitoring the rsync log (for something like "I/O error" on a drive ...). When we focus on those errors, we don't need this patch.

I agree with Clay: we don't need to increase the load in every case where some replicas are not accessible, so I am closing this "fix".

I'm leaving the bug open because the monitoring can be useful for other incidents.
