race with container recreate

Bug #1292784 reported by clayg
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Fix Released
Undecided
Unassigned

Bug Description

So I always hated the way that the status field worked on our db's - but when looking at it I thought I spotted a race, and now I'm pretty sure it's there. Can someone check out this test case (redbo, put on your concurrency hat).

Basically if you PUT on a container if the file on disk already exists it will update the put_timestamp, if not it goes into initialize which *may* raise AlreadyExists - but in the second case we don't update put_timestamp.

Replication was fixing it eventually...

Revision history for this message
clayg (clay-gerrard) wrote :
Revision history for this message
Samuel Merritt (torgomatic) wrote :

I mean, we could try updating the put_timestamp after catching AlreadyExists, but that doesn't guarantee the container exists after a PUT. If the deleter has a later X-Timestamp but gets to disk first, then even if we do update put_timestamp, the DB will still be considered deleted.

Revision history for this message
clayg (clay-gerrard) wrote :

Yes, totally agree, thanks for the confirmation - that would totally fix it.

It looks weird tho (doing the same thing in the except block as the else block)?

clayg (clay-gerrard)
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/81104
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=2e8bc44c9d0586813587a5711f8e857a31393326
Submitter: Jenkins
Branch: master

commit 2e8bc44c9d0586813587a5711f8e857a31393326
Author: Clay Gerrard <email address hidden>
Date: Mon Mar 17 16:00:46 2014 -0700

    Fix race on container recreate

    There was a path on container recreate that would sometimes allow db to get
    reinitialized without updating put_timestamp. Replication would of course fix
    it up, but that node would think it's database was deleted till then desipite
    just ok'ing a request with a newer X-Timestamp than the deleted_timestamp on
    disk.

    Change-Id: I8b98afb2aac2e433b6ecb5c421ba0d778cef42fa
    Closes-Bug: #1292784

Changed in swift:
status: New → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/ec)

Fix proposed to branch: feature/ec
Review: https://review.openstack.org/82165

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/ec)
Download full text (6.4 KiB)

Reviewed: https://review.openstack.org/82165
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=cbf8a01032ee36bc48a39b18f11c1a04d95596a4
Submitter: Jenkins
Branch: feature/ec

commit 2e6325f35174cef6cdc0545901d268c5b1d94575
Author: Peter Portante <email address hidden>
Date: Wed Mar 19 16:19:45 2014 -0400

    Enable object versions for SAIO by default

    Change-Id: I4a20cacd6c8eb4d82a1f0b59a6b26340bc57184a

commit 7829ca07d7b21cd16efcc6144c4b727240e9e9b2
Author: Samuel Merritt <email address hidden>
Date: Tue Mar 18 12:50:51 2014 -0700

    Remove some debugging prints from tests

    Change-Id: I7116aa75ea5c8e1ef85a799a6e1ddf0d6edffb4f

commit 6514a485d5f32923d2c7870f790e047643ae08df
Author: Clay Gerrard <email address hidden>
Date: Mon Mar 17 20:58:42 2014 -0700

    move cors-test-page to literal include

    This makes it so test-cors.html is a real file in doc/source so it's easy for
    those in the know to jump in there with a `python -m SimpleHTTPServer` and
    point their webbrowser to `http://localhost:8000/test-cors.html`.

    The example html and javascript still appear in the docs in their entirety
    using the Sphinx literal include directive.

    Change-Id: Ia0ba36df6c58795e3764fa53b7f585dcc1b3be07

commit 01410c685080c9d700278fc430cc004fa112d783
Author: Clay Gerrard <email address hidden>
Date: Mon Mar 17 20:18:42 2014 -0700

    Make backend container requests use the same X-Timestamp

    The proxy.controllers.base's generate_request_headers will set an X-Timestamp
    header for you if it didn't get populated by additional kwarg or the
    transfer_headers method. This works fine if you only call it once per
    request, but because of how proxy.controllers.obj and
    proxy.controllers.container fill in the backend update header chains in
    _backend_requests we need multiple independent copies and call the base
    controllers generate_request_headers once of each backend request - which left
    the ContainerController sending down different X-Timestamp values
    (microseconds apart) for PUT and DELETE.

    The ObjectController skirts the issue entirely because it always preloads a
    X-Timestamp on the req used to generate backend headers, and it allows it to
    be copied over via transfer_headers by including 'x-timestamp' in it's
    pass_through_headers attribute.

    Because the container-replicator is already does merge_timestamps the
    differences would always eventaully even out and there is no consistency bug,
    but this seems cleaner since they put_timestamp being stored on the three
    replicas during a container PUT were all coming from the same client request.

    Since both PUT and DELETE were effected, and the ContainerController doesn't
    need to allow X-Timestamp to pass_through like the ObjectController does for
    container-sync, it seemed cleanest to fix the issue in _backend_requests via
    the additional kwarg to generate_request_headers.

    There's a driveby fix for FakeLogger and update to the proxy_server's
    ContainerController tests.

    Change-Id: Idbdf1204da33f8fb356ae35961dbdc931b228...

Read more...

Thierry Carrez (ttx)
Changed in swift:
milestone: none → 1.13.1-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in swift:
milestone: 1.13.1-rc1 → 1.13.1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.