object-expirer logs un-needed traceback on expected 404/412 until marker reclaims

Bug #1688558 reported by Phillip Couto
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Confirmed
Medium
Unassigned

Bug Description

In my current swift deployment we are seeing a large amount of errors from the object-expirer.

Here is an example from our logs:
object-expirer: Exception while deleting object 1493251104 1493688825-AUTH_esp/tokens/5bc64136-ffaa-51e0-7fb4-a40f1bcd7762
Unexpected response: 412 Precondition
Failed:
#012 Traceback (most recent call last):
#012 File "/usr/lib/python2.7/dist-packages/swift/obj/expirer.py", line 261, in delete_object#012 self.delete_actual_object(actual_obj, timestamp)
#012 File "/usr/lib/python2.7/dist-packages/swift/obj/expirer.py", line 305, in delete_actual_object
#012 (2,))
#012 File "/usr/lib/python2.7/dist-packages/swift/common/internal_client.py", line 195, in make_request
#012 _('Unexpected response: %s') % resp.status, resp)
#012 UnexpectedResponse: Unexpected response: 412 Precondition Failed (txn: txfde64949aff04c7da14d9-00590c7957)

Now I have reviewed the source code and see that this is because the x-delete-at for the object is not matching the x-if-delete-at header. If this is not a bug then what could the cause be for the values to be out of sync?

Revision history for this message
clayg (clay-gerrard) wrote :

I think when an object gets re-uploaded it's possible the old expirer entry doesn't get reaped.

1) Is the object *supposed* to be expired - can you check on all primary on-disk locations using swift-get-nodes or similar?

2) Does the expirer keep retrying or is the .expiring_objects marker cleaned up

It's possible the traceback is benign and everything is working as expected except for the annoying log message.

Revision history for this message
Phillip Couto (z-phillip) wrote :

I am very sorry for taking so long to respond. I will get the information you are asking for shortly.

Revision history for this message
Phillip Couto (z-phillip) wrote :

Ok when using swift-get-nodes it returns the urls to HEAD the individual nodes that should be storing the object.

I tested all the urls and each server even the handoffs are reporting 404. This is expected as the object is long over due to be expired at this point.

I also checked on disk and the object does have a tombstone on the storage nodes. Now when I use swift-object-info on the file on disk there is no metadata so I am not sure what the original X-Delete value was.

The errors eventually go away after some days probably the reclaim age which is left at the default currently.

So in this case is there something that can be done to prevent this sprawl of errors? Like if the DELETE request from the object-expirer happens on an object that is already deleted instead of returning a 412 can't it be a 404 and then the object-expirer can clean up yhr .expiring_objects marker?

Revision history for this message
clayg (clay-gerrard) wrote :

It could be left over entries from before the cut over to using sharded .expiring_object containers [1] - were you using swift before 2.1.0?

Someone at the summit (BOS '17) was asking about making the .expiring_objects container sharding tunable - if we add that it might make orphaned expiry entries (and the associated log noise) more common.

The 412 handling seems related to the 404 handling [2] - it does seem after a reclaim age the whole process gives up - but logs nosily the whole time.

https://github.com/openstack/swift/blob/6e893e228840bc42cfd13546245438832bc2bb46/swift/obj/expirer.py#L265

Not sure what the ideal logging pattern would be - certainly we don't need a traceback for a specific and well understood condition. I'd argue we need no more than a debug message ever pass if at all - and maybe only a warning when we finally reap the expirer marker?

Keeping the traceback on other unknown conditions would be nice - we definitely *can not loose* the blanket exception handling since that method is spawn_n'd.

1. https://review.openstack.org/#/c/113394/
2. https://review.openstack.org/#/c/326903/

Changed in swift:
status: New → Confirmed
importance: Undecided → Medium
tags: added: low-hanging-fruit
summary: - object-expirer errors due to 412 error code
+ object-expirer logs un-needed traceback on expected 404/412 until marker
+ reclaims
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/496534
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=cf1a1e89bbca50e285e99d31209c6eac6c697083
Submitter: Zuul
Branch: master

commit cf1a1e89bbca50e285e99d31209c6eac6c697083
Author: Tim Burke <email address hidden>
Date: Wed Aug 23 07:25:09 2017 +0000

    expirer: unexpected responses don't warrant tracebacks

    If you want more information, you need to go check out the *other* node.

    Maybe this should be further refined to only log at debug for specific
    statuses like 404 and 412?

    Partial-Bug: 1688558
    Related-Bug: 1455221
    Change-Id: Ieefd8841154faba40dcf2a03abc5f056bdccd54f

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/s3api)

Fix proposed to branch: feature/s3api
Review: https://review.openstack.org/535623

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/s3api)
Download full text (33.3 KiB)

Reviewed: https://review.openstack.org/535623
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=271b80d0f51078719de35bf6f75b7e06ac3e5b91
Submitter: Zuul
Branch: feature/s3api

commit 88eea33ccd1875af811b59d15df55e2bffa27f77
Author: Clay Gerrard <email address hidden>
Date: Thu Jan 11 13:36:09 2018 -0800

    Recenter builder test expectation around random variance

    ... in order to make the test pass with more seeds and fail less
    frequently in the gate.

    Change-Id: I059e80af87fd33a3b6c0731fbad62e035215eca5

commit d924fa759967b7cdca0d91f21112725f6099a254
Author: Samuel Merritt <email address hidden>
Date: Tue Jan 16 22:19:09 2018 -0800

    Remove old post-as-copy leftovers from tests.

    Since commit 1e79f828, we don't need to test with post_as_copy=True
    any more since we haven't got post_as_copy at all.

    Change-Id: I9c96ce0b812d877bbe11bdb50eb160d6ffa5933d

commit dfa0c4e604fb931d232395599bd0e7b0f11441ee
Author: Alistair Coles <email address hidden>
Date: Wed Jan 17 12:04:45 2018 +0000

    Preserve expiring object behaviour with old proxy-server

    The related change [1] causes expiring object records to no longer be
    created if the X-Delete-At-Container header is not sent to the object
    server, but old proxies prior to [2] (i.e. releases prior to 1.9.0)
    did not send this header.

    The goal of [1] can be alternatively achieved by making expiring
    object record creation be conditional on the X-Delete-At-Host header.

    [1] Related-Change: I20fc2f42f590fda995814a2fa7ba86019f9fddc1
    [2] Related-Change: Id0873a3f2198ce285fe0b0c777738eff38bc2438

    Change-Id: Ia0081693f01631d3f2a59612308683e939ced76a

commit d707fc7b6d0ceb4556dddfc258c5de8c4baff05c
Author: Clay Gerrard <email address hidden>
Date: Tue Jan 16 16:30:13 2018 -0800

    DRY out tests until the stone bleeds

    Can we go deeper!?

    Change-Id: Ibd3b06542aa1bfcbcb71cc98e6bb21a6a67c12f4

commit ba8f1b1c3786df4e79fc3f9e4747d7cfb9072b6f
Author: Alistair Coles <email address hidden>
Date: Wed Jan 17 15:25:33 2018 +0000

    Fix intermittent unit test failure

    test_check_delete_headers_removes_delete_after was
    failing intermittently due to rounding of float time
    values.

    Change-Id: Ia126ad6988f387bbd2d1f5ddff0a56d457a1fc9b
    Closes-Bug: #1743804

commit e747f94313f315fdf8d8fc01fb0c5aac60c33897
Author: Kota Tsuyuzaki <email address hidden>
Date: Wed Dec 27 14:37:29 2017 +0900

    Fix InternalClient to drain response body if the request fails

    If we don't drain the body, the proxy logging in the internal client
    pipeline will log 499 client disconnect instead of actual error response
    code.

    For error responses, we try to do the most helpful thing using swob's
    closing and caching response body attribute. For non-error responses
    which are returned to the client, we endeavour to keep the app_iter
    intact and unconsumed, trusting expecting the caller to do the right
    thing is the only reasonable interface. We must cleanly close any WSGI
    app_iter which we do not return to the client rega...

tags: added: in-feature-s3api
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/deep)

Fix proposed to branch: feature/deep
Review: https://review.openstack.org/535990

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/deep)
Download full text (32.4 KiB)

Reviewed: https://review.openstack.org/535990
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=3122895118111b2b11f5ef9d0b3410b337624b1b
Submitter: Zuul
Branch: feature/deep

commit ddb13aa5eab03b6993887eb02260b4bc0b256922
Author: vxlinux <email address hidden>
Date: Sat Jan 20 17:23:35 2018 +0800

    Remove redundant blank space in README.rst

    Change-Id: If347476e3b9185921ff174d3f8170a1c4d0622e8

commit 12f874534925b52f9d1c91580794eb9e5e9a4589
Author: vxlinux <email address hidden>
Date: Fri Jan 19 16:54:26 2018 +0800

    Add Docstrings to validate_replicas_by_tier

    New common functions should have Docstrings

    Change-Id: Icbb3cdf38509fd6d034cbb2271786559780a7b68

commit d2034cd7b6946829a7d95c4d2c71d4322f80e855
Author: Clay Gerrard <email address hidden>
Date: Tue Jan 16 17:03:38 2018 -0800

    Keep object-updater stats logging consistent

    If we're going to encapsulate the stats tracking it seems reasonable if
    we ever add any more metrics we can reduce the number of places we need
    to update log messages.

    Change-Id: I187cf6cfec1e0a9138b709fa298e1991aa809ec4

commit cd2c73fd955317a3f40758cef45ee48bef8fbc79
Author: Tim Burke <email address hidden>
Date: Tue Jan 16 01:07:35 2018 +0000

    internal_client: Don't retry when we expect the same reponse

    This boils down to 404, 412, or 416; or 409 when we provided an
    X-Timestamp.

    This means, among other things, that the expirer won't issue 3 DELETEs
    every cycle for every stale work item.

    Related-Change: Icd63c80c73f864d2561e745c3154fbfda02bd0cc
    Change-Id: Ie5f2d3824e040bbc76d511a54d1316c4c2503732

commit 222df9185782f59ffdc96c3534afaa2fb1361235
Author: chengebj5238 <email address hidden>
Date: Thu Jan 18 17:03:11 2018 +0800

    Modify redirection URL and broken URL

    Change-Id: I9a04cb2fbe61e1fbd8185ab2fac9abbcea4d55cc

commit d1656e334959e09d13eea98c2696e58c77e4ab91
Author: Tim Burke <email address hidden>
Date: Fri Jan 12 13:17:45 2018 -0800

    slo: Send ETag header in 206 responses

    Why weren't we doing that before?? The etag should be the same as for
    GET/HEAD, and by sending it, we can assure resuming clients that they're
    downlading the same object even if they didn't include an If-Match
    header.

    Change-Id: I4ccbd1ae3a909ecb4606ef18211d1b868f5cad86
    Related-Change: Ic11662eb5c7176fbf422a6fc87a569928d6f85a1

commit 88eea33ccd1875af811b59d15df55e2bffa27f77
Author: Clay Gerrard <email address hidden>
Date: Thu Jan 11 13:36:09 2018 -0800

    Recenter builder test expectation around random variance

    ... in order to make the test pass with more seeds and fail less
    frequently in the gate.

    Change-Id: I059e80af87fd33a3b6c0731fbad62e035215eca5

commit f64c00b00aa8df31a937448917421891904abdc8
Author: Samuel Merritt <email address hidden>
Date: Fri Jan 12 07:17:18 2018 -0800

    Improve object-updater's stats logging

    The object updater has five different stats, but its logging only told
    you two of them (successes and failures), and it only told you after
    finishing all the async_pendings for a device. If y...

tags: added: in-feature-deep
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.