when min_part_hours is zero composite rebalance may move same partition in multiple cobuilders

Bug #1714274 reported by Alistair Coles on 2017-08-31
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
High
Alistair Coles

Bug Description

CompositeRingBuilder.rebalance() attempts to prevent the same partition moving in multiple component builders by checking all component builders _can_part_move(). It also makes sure that all component builders _last_part_moves arrays are up to date by calling *all* component builder _update_last_part_moves() during *each* component rebalance. Each call to _update_last_part_moves() resets the component builder's _part_moved_bitmap. When the builder has zero min_part_hours the _part_moved_bitmap is the only state that indicates if a component builder has moved a part during the current composite rebalance. Consequently, components with zero min_part_hours do not currently prevent their co-builders from moving the same partitions.[1]

The proposd patch will include a test that illustrates the bug.

[1] It might be argued that zero min_part_hours suggests it is ok for multiple co-builders to move the same partition - "it is fine to move this part again in a subsequent rebalance with no delay". However, IMO that is not the design goal of composite rebalance, where each component rebalance is not a "subsequent rebalance". Users may have other means of preventing subsequent rebalance of single or composite rings occurring too often, which makes the use of zero min_part_hours simply a convenience. Composite rebalance must enforce single partition movement regardless of min_part_hours.

Alistair Coles (alistair-coles) wrote :
Changed in swift:
status: New → In Progress
assignee: nobody → Alistair Coles (alistair-coles)
Changed in swift:
importance: Undecided → High

Reviewed: https://review.openstack.org/499634
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=348bd83b7eb638d3309ff6313d9a6501c540fa24
Submitter: Zuul
Branch: master

commit 348bd83b7eb638d3309ff6313d9a6501c540fa24
Author: Alistair Coles <email address hidden>
Date: Wed Aug 30 15:29:14 2017 +0100

    Respect co-builder partition moves when min_part_hours is zero

    Repeated calls to each co-builder's _update_last_part_moves() are
    unnecessary and have the unfortunate side effect of resetting the
    _last_part_moved bitmap. When a component builder has zero
    min_part_hours this results in it not preventing its co-builders from
    moving parts that it has already moved.

    This patch changes the CompositeRingBuilder to call each component
    builder _update_last_part_moves() *once* before rebalancing any
    component builder. CooperativeRingBuilder's no longer forward calls to
    their _update_last_part_moves() method. Each component's
    _last_part_moved bitmap is therefore preserved until for the duration
    of the composite rebalance.

    The initialisation of the RingBuilder _last_part_moves array is moved
    to the RingBuilder __init__ method, so that calls to
    _update_last_part_moves() are effective even when rebalance() has
    never been called on that builder. Otherwise, during a composite
    rebalance, a component that has not previously been rebalanced will
    not have its _last_part_moves_epoch updated during rebalance and as a
    result may report erroneous min_part_seconds_left after its first
    rebalance.

    Related-Change: I1b30cb3d776be441346a4131007d2487a5440a81
    Closes-Bug: #1714274
    Change-Id: Ib165cf974c865d47c2d9e8f7b3641971d2e9f404

Changed in swift:
status: In Progress → Fix Released
Download full text (17.0 KiB)

Reviewed: https://review.openstack.org/517718
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=5c564e53966b74d6d7589c3505dd33816b3c19f1
Submitter: Zuul
Branch: feature/deep

commit 77a8a4455d4a2bd0a9b9146a01acfe0d93fe7550
Author: Tim Burke <email address hidden>
Date: Tue Oct 3 00:11:07 2017 +0000

    Let clients request heartbeats during SLO PUTs

    An SLO PUT requires that we HEAD every referenced object; as a result, it
    can be a very time-intensive operation. This makes it difficult as a
    client to differentiate between a proxy-server that's still doing work and
    one that's crashed but left the socket open.

    Now, clients can opt-in to receiving heartbeats during long-running PUTs
    by including the query parameter

        heartbeat=on

    With heartbeating turned on, the proxy will start its response immediately
    with 202 Accepted then send a single whitespace character periodically
    until the request completes. At that point, a final summary chunk will be
    sent which includes a "Response Status" key indicating success or failure
    and (if successful) an "Etag" key indicating the Etag of the resulting SLO.

    This mechanism is very similar to the way bulk extractions and deletions
    work, and even the way SLO behaves for ?multipart-manifest=delete requests.

    Note that this is opt-in: this prevents us from sending the 202 response
    to existing clients that may mis-interpret it as an immediate indication
    of success.

    Co-Authored-By: Alistair Coles <email address hidden>
    Related-Bug: 1718811
    Change-Id: I65cee5f629c87364e188aa05a06d563c3849c8f3

commit 92705bb36b4a771a757977d11875e7b929c41741
Author: David Rabel <email address hidden>
Date: Thu Nov 2 12:38:41 2017 +0100

    Fix indent in overview_policies.rst

    Change-Id: I7f070956d8b996db798837392adfca4483067aea

commit feee3998408e5ed03563c317ad9506ead92083a6
Author: Clay Gerrard <email address hidden>
Date: Fri Sep 1 14:15:45 2017 -0700

    Use check_drive consistently

    We added check_drive to the account/container servers to unify how all
    the storage wsgi servers treat device dirs/mounts. Thus pushes that
    unification down into the consistency engine.

    Drive-by:
     * use FakeLogger less
     * clean up some repeititon in probe utility for device re-"mounting"

    Related-Change-Id: I3362a6ebff423016bb367b4b6b322bb41ae08764
    Change-Id: I941ffbc568ebfa5964d49964dc20c382a5e2ec2a

commit 29e9ae1cc5b74dce205643c101601555c06b67dc
Author: Tim Burke <email address hidden>
Date: Thu Oct 19 18:53:04 2017 +0000

    Make xml responses less insane

    Looking at bulk extractions:

       $ tar c *.py | curl http://saio:8090/v1/AUTH_test/c?extract-archive=tar \
          -H accept:application/xml -T -
       <?xml version="1.0" encoding="UTF-8"?>
       <delete>
       <number_files_created>2</number_files_created>
       <response_body></response_body>
       <response_status>201 Created</response_status>
       <errors>
       </errors>
       </delete>

    Or SLO upload failures:

       $ curl http://saio:8090/v1/AUTH_...

tags: added: in-feature-deep
Download full text (8.2 KiB)

Reviewed: https://review.openstack.org/517838
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=76917dfb1dcc509ca9fa181639f37c50ade0434a
Submitter: Zuul
Branch: feature/s3api

commit 77a8a4455d4a2bd0a9b9146a01acfe0d93fe7550
Author: Tim Burke <email address hidden>
Date: Tue Oct 3 00:11:07 2017 +0000

    Let clients request heartbeats during SLO PUTs

    An SLO PUT requires that we HEAD every referenced object; as a result, it
    can be a very time-intensive operation. This makes it difficult as a
    client to differentiate between a proxy-server that's still doing work and
    one that's crashed but left the socket open.

    Now, clients can opt-in to receiving heartbeats during long-running PUTs
    by including the query parameter

        heartbeat=on

    With heartbeating turned on, the proxy will start its response immediately
    with 202 Accepted then send a single whitespace character periodically
    until the request completes. At that point, a final summary chunk will be
    sent which includes a "Response Status" key indicating success or failure
    and (if successful) an "Etag" key indicating the Etag of the resulting SLO.

    This mechanism is very similar to the way bulk extractions and deletions
    work, and even the way SLO behaves for ?multipart-manifest=delete requests.

    Note that this is opt-in: this prevents us from sending the 202 response
    to existing clients that may mis-interpret it as an immediate indication
    of success.

    Co-Authored-By: Alistair Coles <email address hidden>
    Related-Bug: 1718811
    Change-Id: I65cee5f629c87364e188aa05a06d563c3849c8f3

commit 92705bb36b4a771a757977d11875e7b929c41741
Author: David Rabel <email address hidden>
Date: Thu Nov 2 12:38:41 2017 +0100

    Fix indent in overview_policies.rst

    Change-Id: I7f070956d8b996db798837392adfca4483067aea

commit feee3998408e5ed03563c317ad9506ead92083a6
Author: Clay Gerrard <email address hidden>
Date: Fri Sep 1 14:15:45 2017 -0700

    Use check_drive consistently

    We added check_drive to the account/container servers to unify how all
    the storage wsgi servers treat device dirs/mounts. Thus pushes that
    unification down into the consistency engine.

    Drive-by:
     * use FakeLogger less
     * clean up some repeititon in probe utility for device re-"mounting"

    Related-Change-Id: I3362a6ebff423016bb367b4b6b322bb41ae08764
    Change-Id: I941ffbc568ebfa5964d49964dc20c382a5e2ec2a

commit 29e9ae1cc5b74dce205643c101601555c06b67dc
Author: Tim Burke <email address hidden>
Date: Thu Oct 19 18:53:04 2017 +0000

    Make xml responses less insane

    Looking at bulk extractions:

       $ tar c *.py | curl http://saio:8090/v1/AUTH_test/c?extract-archive=tar \
          -H accept:application/xml -T -
       <?xml version="1.0" encoding="UTF-8"?>
       <delete>
       <number_files_created>2</number_files_created>
       <response_body></response_body>
       <response_status>201 Created</response_status>
       <errors>
       </errors>
       </delete>

    Or SLO upload failures:

       $ curl http://saio:8090/v1/AUTH...

Read more...

tags: added: in-feature-s3api

This issue was fixed in the openstack/swift 2.16.0 release.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers