Objects are not deleted from additional handoff when primary disk fails

Bug #1690791 reported by Pavel Kvasnička
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

Some partitions on a hard drive were corrupted, which caused I/O errors on mkdir when rsync tried to replicate objects. This leads to repeated rsyncs and can cause objects to come back "from the grave" when the disk is not replaced within the reclaim age.

Description of state
--------------------

Probably while the network between zones was disconnected, a data file was created on the 5th handoff, server "11" (I opened the ring file and saw that, in the handoff list for this partition, four other handoffs come first and then the server that keeps repeating the object replication). We have replicas = 4, 2 regions with 2 zones in each, and one failing disk drive (I expect the failed drive was in the same region as server 11, the 5th handoff, but I'm not sure now).

The file was deleted, but not from server 11. Then, on every replicator cycle, the object was rsynced to all primaries (except the failing one) and deleted again, because there was a tombstone. After the reclaim age had passed, the object ended up placed on the primaries.

I write "an object", but in fact there are a few hundred objects and thousands of tombstones.
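
For context, the "from the grave" mechanism is roughly this: a DELETE is stored as a <timestamp>.ts tombstone, and once the tombstone is older than reclaim_age the replicators may remove it, after which nothing covers a stale data file still sitting on a far handoff. A simplified sketch (the real logic lives in swift's diskfile code; the one-week value is only the default reclaim_age assumed here):

    import os
    import time

    RECLAIM_AGE = 7 * 24 * 60 * 60  # default reclaim_age, one week

    def tombstone_is_reclaimable(path, now=None):
        """True if a <timestamp>.ts tombstone is old enough to be removed.

        Once the tombstone is gone, a stale copy of the object that is
        still stuck on a handoff (e.g. because rsync to the primary kept
        failing with I/O errors) can be replicated back to the primaries.
        """
        now = now if now is not None else time.time()
        timestamp = float(os.path.basename(path).split('.ts')[0])
        return now - timestamp > RECLAIM_AGE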

Suggestions
-----------

Add new metrics (a sketch of how they could be gathered is below this list):
 - age of the oldest partition that is placed on a handoff
 - count of "handoff objects" waiting to be deleted
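
A rough sketch of how these could be gathered on a storage node; the paths, device id and directory layout (/srv/node/<dev>/objects/<partition>) are assumptions for the example, and it counts handoff partitions rather than individual objects:

    import os
    import time
    from swift.common.ring import Ring

    def handoff_partition_stats(objects_dir='/srv/node/sda/objects',
                                ring_path='/etc/swift/object.ring.gz',
                                local_device_id=0):
        """Age of the oldest handoff partition and how many there are.

        A locally held partition is treated as "on a handoff" when this
        device is not among its primary nodes in the ring.
        """
        ring = Ring(ring_path)
        oldest_age = 0
        handoff_count = 0
        for name in os.listdir(objects_dir):
            if not name.isdigit():
                continue
            part = int(name)
            primary_ids = {node['id'] for node in ring.get_part_nodes(part)}
            if local_device_id in primary_ids:
                continue  # a primary partition, not a handoff
            handoff_count += 1
            mtime = os.path.getmtime(os.path.join(objects_dir, name))
            oldest_age = max(oldest_age, time.time() - mtime)
        return oldest_age, handoff_count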

Fix it: place objects on the best handoffs. When the first handoffs are used, tombstones and object updates will be placed there too. "First" means taking the necessary count of handoffs, from the beginning of the handoff list, to satisfy the replica count; a sketch of this selection follows.
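
In ring terms the idea could look like the sketch below: take only as many handoffs as there are unreachable primaries, always from the front of get_more_nodes(), so a later tombstone lands on the same handoff devices as the data did. The helper name and the numbers in the example are made up for illustration; only the Ring calls are actual Swift API:

    from itertools import islice
    from swift.common.ring import Ring

    def best_handoffs(ring, part, unreachable_primaries):
        """Return the first N handoff nodes for a partition.

        Taking handoffs from the beginning of the handoff list means a
        PUT and a later DELETE that both miss the same primaries end up
        on the same handoff devices, so the tombstone covers the data.
        """
        return list(islice(ring.get_more_nodes(part), unreachable_primaries))

    # Example (paths and partition number are made up):
    # ring = Ring('/etc/swift/object.ring.gz')
    # nodes = best_handoffs(ring, 12345, 1)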

Revision history for this message
clayg (clay-gerrard) wrote :
Changed in swift:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
clayg (clay-gerrard) wrote :

I think this mostly boils down to us continuing to make more and more conscious decisions about the difference in how we want to handle handoff partitions vs. mis-placed partitions. A mis-placed partition (part moved from rebalance, non-primary & non-handoff) perhaps should not *immediately* move to handoff if a primary rejects it for e.g. concurrency... but the longer a part sits misplaced the more chance there is for reclaim age related dark data issues.

The more analytics we have on partitions (handoff vs. misplaced, last updated, size & density), the better decisions we'll be able to make!
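
One possible shape for that kind of analytics, classifying each locally held partition against the ring; this is only a sketch: the handoff depth cut-off is arbitrary, and "misplaced" here simply means neither primary nor among the first few handoffs:

    from itertools import islice
    from swift.common.ring import Ring

    HANDOFF_DEPTH = 8  # arbitrary cut-off for this sketch

    def classify_partition(ring, part, local_device_id):
        """Label a locally held partition: primary, handoff or misplaced."""
        if any(n['id'] == local_device_id for n in ring.get_part_nodes(part)):
            return 'primary'
        early_handoffs = islice(ring.get_more_nodes(part), HANDOFF_DEPTH)
        if any(n['id'] == local_device_id for n in early_handoffs):
            return 'handoff'
        return 'misplaced'  # e.g. left behind by a rebalance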

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (master)

Change abandoned by Pavel Kvasnička (<email address hidden>) on branch: master
Review: https://review.openstack.org/470440
Reason: This issue should be solved by monitoring the rsync log (for something like "I/O error" on a drive ...). When we focus on those errors, we don't need this patch.

I agree with Clay: we don't need to increase the load in every case where some replicas are not accessible, so I am closing this "fix".

I'm leaving the bug open because the monitoring can be useful for other incidents.
