object-expirer logs un-needed traceback on expected 404/412 until marker reclaims

Bug #1688558 reported by Phillip Couto on 2017-05-05
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Medium
Unassigned

Bug Description

In my current swift deployment we are seeing a large amount of errors from the object-expirer.

Here is an example from our logs:
object-expirer: Exception while deleting object 1493251104 1493688825-AUTH_esp/tokens/5bc64136-ffaa-51e0-7fb4-a40f1bcd7762
Unexpected response: 412 Precondition
Failed:
#012 Traceback (most recent call last):
#012 File "/usr/lib/python2.7/dist-packages/swift/obj/expirer.py", line 261, in delete_object#012 self.delete_actual_object(actual_obj, timestamp)
#012 File "/usr/lib/python2.7/dist-packages/swift/obj/expirer.py", line 305, in delete_actual_object
#012 (2,))
#012 File "/usr/lib/python2.7/dist-packages/swift/common/internal_client.py", line 195, in make_request
#012 _('Unexpected response: %s') % resp.status, resp)
#012 UnexpectedResponse: Unexpected response: 412 Precondition Failed (txn: txfde64949aff04c7da14d9-00590c7957)

Now I have reviewed the source code and see that this is because the x-delete-at for the object is not matching the x-if-delete-at header. If this is not a bug then what could the cause be for the values to be out of sync?

clayg (clay-gerrard) wrote :

I think when an object gets re-uploaded it's possible the old expirer entry doesn't get reaped.

1) Is the object *supposed* to be expired - can you check on all primary on-disk locations using swift-get-nodes or similar?

2) Does the expirer keep retrying or is the .expiring_objects marker cleaned up

It's possible the traceback is benign and everything is working as expected except for the annoying log message.

Phillip Couto (z-phillip) wrote :

I am very sorry for taking so long to respond. I will get the information you are asking for shortly.

Phillip Couto (z-phillip) wrote :

Ok when using swift-get-nodes it returns the urls to HEAD the individual nodes that should be storing the object.

I tested all the urls and each server even the handoffs are reporting 404. This is expected as the object is long over due to be expired at this point.

I also checked on disk and the object does have a tombstone on the storage nodes. Now when I use swift-object-info on the file on disk there is no metadata so I am not sure what the original X-Delete value was.

The errors eventually go away after some days probably the reclaim age which is left at the default currently.

So in this case is there something that can be done to prevent this sprawl of errors? Like if the DELETE request from the object-expirer happens on an object that is already deleted instead of returning a 412 can't it be a 404 and then the object-expirer can clean up yhr .expiring_objects marker?

clayg (clay-gerrard) wrote :

It could be left over entries from before the cut over to using sharded .expiring_object containers [1] - were you using swift before 2.1.0?

Someone at the summit (BOS '17) was asking about making the .expiring_objects container sharding tunable - if we add that it might make orphaned expiry entries (and the associated log noise) more common.

The 412 handling seems related to the 404 handling [2] - it does seem after a reclaim age the whole process gives up - but logs nosily the whole time.

https://github.com/openstack/swift/blob/6e893e228840bc42cfd13546245438832bc2bb46/swift/obj/expirer.py#L265

Not sure what the ideal logging pattern would be - certainly we don't need a traceback for a specific and well understood condition. I'd argue we need no more than a debug message ever pass if at all - and maybe only a warning when we finally reap the expirer marker?

Keeping the traceback on other unknown conditions would be nice - we definitely *can not loose* the blanket exception handling since that method is spawn_n'd.

1. https://review.openstack.org/#/c/113394/
2. https://review.openstack.org/#/c/326903/

Changed in swift:
status: New → Confirmed
importance: Undecided → Medium
tags: added: low-hanging-fruit
summary: - object-expirer errors due to 412 error code
+ object-expirer logs un-needed traceback on expected 404/412 until marker
+ reclaims
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers