Ring refuses to save even when 100% parts move

Bug #1697543 reported by clayg on 2017-06-12
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Medium
Unassigned

Bug Description

If you're gradually adding weight while adding devices in multiple zones in a EC ring with multiple replicas in each zone, it's possible the rings preference to move replicas of parts that need to disperse to the new zone *first* might leave some new capacity in the original zone waiting for enough frags to move before it can start taking the part-replicas it wants.

duplication example script is attached

Work arounds are to either use "rebalance -f" multiple times until enough frags get assigned to the other zone that the balance changed detection code will start working again.

Or to change weights before you rebalance until the over-assignment in the standing zone becomes more apparent in the balance.

Fix would be to just look at dispersion or changed_parts coming out of rebalance (in addition to delta balance) before we design if the rebalance is worth saving.

clayg (clay-gerrard) wrote :

With enough replicas and a failed device it's easier to see that we should look at delta_dispersion in addition to delta_balance:

https://gist.github.com/clayg/b0d0d41a382e70356bb58a1ee94d1b73

With the failed device on the server that's desperately trying to shed parts, and enough replicas -
 balance will not change significantly from one invocation to the next while rebalance is busy fixing dispersion...

We should expect that as a desire-able behavior and use delta_dispersion to get over the hump:

ubuntu@saio:/vagrant/.scratch/rings/tata$ swift-ring-builder stuck.builder rebalance
Cowardly refusing to save rebalance as it did not change at least 1%.
ubuntu@saio:/vagrant/.scratch/rings/tata$ swift-ring-builder stuck.builder |head
stuck.builder, build version 63, id a5b9fbd213bb4c20ab60eff2a2bb3a75
256 partitions, 13.000000 replicas, 1 regions, 1 zones, 52 devices, 100.00 balance, 100.00 dispersion
...
ubuntu@saio:/vagrant/.scratch/rings/tata$ swift-ring-builder stuck.builder rebalance
Cowardly refusing to save rebalance as it did not change at least 1%.
ubuntu@saio:/vagrant/.scratch/rings/tata$ swift-ring-builder stuck.builder rebalance -f
Reassigned 256 (100.00%) partitions. Balance is now 100.00. Dispersion is now 0.00
-------------------------------------------------------------------------------
NOTE: Balance of 100.00 indicates you should push this
      ring, wait at least 0 hours, and rebalance/repush.
-------------------------------------------------------------------------------
ubuntu@saio:/vagrant/.scratch/rings/tata$ swift-ring-builder stuck.builder rebalance
Reassigned 255 (99.61%) partitions. Balance is now 1.56. Dispersion is now 0.00

Notice the delta_dispersion when "cowardly refusing to save rebalance" is *HUGE*

Changed in swift:
importance: Low → Medium
clayg (clay-gerrard) wrote :

I was pretty sure this changed would solve this

https://review.openstack.org/#/c/479012/

But maybe there's something else going on? I'd like to know what...

clayg (clay-gerrard) wrote :

So if you have a sufficient number of replicas to move you can have 100 dispersion even after moving the maximum whole replicanth

If you go from 6 replicanths in z1 and 1 in z2 to 5 replicanths in z1 and 2 in z2 the dispersion metric should probably represent some kind of improvement - but I'm not sure how exactly... and we'd still need to related change.

Reviewed: https://review.openstack.org/479012
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=aa82d2cba82209f1bf3944c6d2a67965af5a1540
Submitter: Zuul
Branch: master

commit aa82d2cba82209f1bf3944c6d2a67965af5a1540
Author: Samuel Merritt <email address hidden>
Date: Thu Jun 29 10:23:38 2017 -0700

    Save ring builder if dispersion changes

    There are cases where a rebalance improves dispersion, but doesn't
    improve balance. This is because the balance of a ring builder is
    taken to be the balance of its least-balanced device, so if there's a
    device that has no partitions, wants some, but can't get them, then
    we'll never save the ring builder even if every other device in the
    ring got better.

    We can detect this situation by looking at the dispersion number; if it
    changes, then the rebalance needs to be saved in order to continue to
    make progress.

    Partial-Bug: #1697543

    Change-Id: Ie239b958fc7e0547ffda2bebf61546bd4ef3d829

Download full text (16.1 KiB)

Reviewed: https://review.openstack.org/530733
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=e2f780492487a31d4f8111d53cbde6c02d9d3237
Submitter: Zuul
Branch: feature/deep

commit 61fe6aae81d00597c777a64ac337a8dfb990f0c2
Author: Tim Burke <email address hidden>
Date: Tue Aug 22 22:40:58 2017 +0000

    Better mock out OSErrors in test_replicator before raising them

    Also, provide a return value for resp.read() so we hit a
    pickle error instead of a type error.

    Change-Id: I56141eee63ad1ceb2edf807432fa2516fabb15a6

commit 0bdec4661b5609ca1bf813a7ccd514e5d444b07f
Author: Kazuhiro MIYAHARA <email address hidden>
Date: Mon Dec 25 09:13:17 2017 +0000

    Skip symlink + vw functional tests if symlink is not enabled

    Functional tests for symlink and versioned writes run and result in
    falure even if symlink is not enabled.

    This patch fixes the functional tests to run only if both of
    symlink and versioned writes are enabled.

    Change-Id: I5ffd0b6436e56a805784baf5ceb722effdf74884

commit 17e6950aa08101b5f3bec0f2f9c32cfd5f51fa36
Author: Kazuhiro MIYAHARA <email address hidden>
Date: Fri Dec 22 02:18:09 2017 +0000

    Fix manpage docs' daemon names

    In current manpage docs, some of daemon names for concurrency
    explanation is wrong.

    This patch fixes the daemon names.

    Change-Id: I2a505c9590ee3a3a7e37e8d949a10db36206faec

commit af2c2a6eb54d848eefc2d0a1b619e0b86eed2eb5
Author: Samuel Merritt <email address hidden>
Date: Thu Dec 21 10:43:39 2017 -0800

    Fix sometimes-flaky container name functional test.

    You've got two test classes: TestContainer and TestContainerUTF8. They
    each try to create the same set of containers with names of varying
    lengths to make sure the container-name length limit is being honored.

    Also, each test class tries to clean up pre-existing data in its
    setUpClass method. If TestContainerUTF8 fails to delete a contaienr
    that TestContainer made, then its testContainerNameLimit method will
    fail because the container PUT response has status 202 instead of 201,
    which is because the container still existed from the prior test.

    I've made the test consider both 201 and 202 as success. For purposes
    of testing the maximum container name length, any 2xx is fine.

    Change-Id: I7b343a8ed0d12537659c051ddf29226cefa78a8f

commit 609c757e698ff7893e1b1a0e32d088ad9d05ad95
Author: Clay Gerrard <email address hidden>
Date: Tue Dec 12 21:39:54 2017 -0800

    functest for symlink + versioned writes

    Co-Author: Alistair Coles <email address hidden>

    Related-Change-Id: I838ed71bacb3e33916db8dd42c7880d5bb9f8e18
    Change-Id: I0ccff1eafcfb3fdbdda9faf55a44c45b834e723a

commit bdd4eb6936b0e25aff5357bde876309ee5b032ec
Author: Andreas Jaeger <email address hidden>
Date: Wed Dec 20 07:14:03 2017 +0100

    Install liberasurecode-devel for CentOS 7

    Since I747c2b8754effbc6ec82af3bf7543fd9599a6c14 we do not install
    the RDO package repository anymore and thus liberasurecode-devel
    cannot be installed.

    For CentOS 7, remove liberasure...

tags: added: in-feature-deep

Reviewed: https://review.openstack.org/528155
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=7013e70ca67891e94664e9eca70925b61ee8f689
Submitter: Zuul
Branch: master

commit 7013e70ca67891e94664e9eca70925b61ee8f689
Author: Clay Gerrard <email address hidden>
Date: Thu Dec 14 20:03:24 2017 -0800

    Represent dispersion worse than one replicanth

    With a sufficiently undispersed ring it's possible to move an entire
    replicas worth of parts and yet the value of dispersion may not get any
    better (even though in reality dispersion has dramatically improved).
    The problem is dispersion will currently only represent up to one whole
    replica worth of parts being undispersed.

    However with EC rings it's possible for more than one whole replicas
    worth of partitions to be undispersed, in these cases the builder will
    require multiple rebalance operations to fully disperse replicas - but
    the dispersion value should improve with every rebalance.

    N.B. with this change it's possible for rings with a bad dispersion
    value to measure as having a significantly smaller dispersion value
    after a rebalance (even though they may not have had their dispersion
    change) because the total amount of bad dispersion we can measure has
    been increased but we're normalizing within a similar range.

    Closes-Bug: #1697543

    Change-Id: Ifefff0260deac0c3e8b369a1e158686c89936686

Changed in swift:
status: Confirmed → Fix Released
Download full text (33.3 KiB)

Reviewed: https://review.openstack.org/535623
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=271b80d0f51078719de35bf6f75b7e06ac3e5b91
Submitter: Zuul
Branch: feature/s3api

commit 88eea33ccd1875af811b59d15df55e2bffa27f77
Author: Clay Gerrard <email address hidden>
Date: Thu Jan 11 13:36:09 2018 -0800

    Recenter builder test expectation around random variance

    ... in order to make the test pass with more seeds and fail less
    frequently in the gate.

    Change-Id: I059e80af87fd33a3b6c0731fbad62e035215eca5

commit d924fa759967b7cdca0d91f21112725f6099a254
Author: Samuel Merritt <email address hidden>
Date: Tue Jan 16 22:19:09 2018 -0800

    Remove old post-as-copy leftovers from tests.

    Since commit 1e79f828, we don't need to test with post_as_copy=True
    any more since we haven't got post_as_copy at all.

    Change-Id: I9c96ce0b812d877bbe11bdb50eb160d6ffa5933d

commit dfa0c4e604fb931d232395599bd0e7b0f11441ee
Author: Alistair Coles <email address hidden>
Date: Wed Jan 17 12:04:45 2018 +0000

    Preserve expiring object behaviour with old proxy-server

    The related change [1] causes expiring object records to no longer be
    created if the X-Delete-At-Container header is not sent to the object
    server, but old proxies prior to [2] (i.e. releases prior to 1.9.0)
    did not send this header.

    The goal of [1] can be alternatively achieved by making expiring
    object record creation be conditional on the X-Delete-At-Host header.

    [1] Related-Change: I20fc2f42f590fda995814a2fa7ba86019f9fddc1
    [2] Related-Change: Id0873a3f2198ce285fe0b0c777738eff38bc2438

    Change-Id: Ia0081693f01631d3f2a59612308683e939ced76a

commit d707fc7b6d0ceb4556dddfc258c5de8c4baff05c
Author: Clay Gerrard <email address hidden>
Date: Tue Jan 16 16:30:13 2018 -0800

    DRY out tests until the stone bleeds

    Can we go deeper!?

    Change-Id: Ibd3b06542aa1bfcbcb71cc98e6bb21a6a67c12f4

commit ba8f1b1c3786df4e79fc3f9e4747d7cfb9072b6f
Author: Alistair Coles <email address hidden>
Date: Wed Jan 17 15:25:33 2018 +0000

    Fix intermittent unit test failure

    test_check_delete_headers_removes_delete_after was
    failing intermittently due to rounding of float time
    values.

    Change-Id: Ia126ad6988f387bbd2d1f5ddff0a56d457a1fc9b
    Closes-Bug: #1743804

commit e747f94313f315fdf8d8fc01fb0c5aac60c33897
Author: Kota Tsuyuzaki <email address hidden>
Date: Wed Dec 27 14:37:29 2017 +0900

    Fix InternalClient to drain response body if the request fails

    If we don't drain the body, the proxy logging in the internal client
    pipeline will log 499 client disconnect instead of actual error response
    code.

    For error responses, we try to do the most helpful thing using swob's
    closing and caching response body attribute. For non-error responses
    which are returned to the client, we endeavour to keep the app_iter
    intact and unconsumed, trusting expecting the caller to do the right
    thing is the only reasonable interface. We must cleanly close any WSGI
    app_iter which we do not return to the client rega...

tags: added: in-feature-s3api
Download full text (32.4 KiB)

Reviewed: https://review.openstack.org/535990
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=3122895118111b2b11f5ef9d0b3410b337624b1b
Submitter: Zuul
Branch: feature/deep

commit ddb13aa5eab03b6993887eb02260b4bc0b256922
Author: vxlinux <email address hidden>
Date: Sat Jan 20 17:23:35 2018 +0800

    Remove redundant blank space in README.rst

    Change-Id: If347476e3b9185921ff174d3f8170a1c4d0622e8

commit 12f874534925b52f9d1c91580794eb9e5e9a4589
Author: vxlinux <email address hidden>
Date: Fri Jan 19 16:54:26 2018 +0800

    Add Docstrings to validate_replicas_by_tier

    New common functions should have Docstrings

    Change-Id: Icbb3cdf38509fd6d034cbb2271786559780a7b68

commit d2034cd7b6946829a7d95c4d2c71d4322f80e855
Author: Clay Gerrard <email address hidden>
Date: Tue Jan 16 17:03:38 2018 -0800

    Keep object-updater stats logging consistent

    If we're going to encapsulate the stats tracking it seems reasonable if
    we ever add any more metrics we can reduce the number of places we need
    to update log messages.

    Change-Id: I187cf6cfec1e0a9138b709fa298e1991aa809ec4

commit cd2c73fd955317a3f40758cef45ee48bef8fbc79
Author: Tim Burke <email address hidden>
Date: Tue Jan 16 01:07:35 2018 +0000

    internal_client: Don't retry when we expect the same reponse

    This boils down to 404, 412, or 416; or 409 when we provided an
    X-Timestamp.

    This means, among other things, that the expirer won't issue 3 DELETEs
    every cycle for every stale work item.

    Related-Change: Icd63c80c73f864d2561e745c3154fbfda02bd0cc
    Change-Id: Ie5f2d3824e040bbc76d511a54d1316c4c2503732

commit 222df9185782f59ffdc96c3534afaa2fb1361235
Author: chengebj5238 <email address hidden>
Date: Thu Jan 18 17:03:11 2018 +0800

    Modify redirection URL and broken URL

    Change-Id: I9a04cb2fbe61e1fbd8185ab2fac9abbcea4d55cc

commit d1656e334959e09d13eea98c2696e58c77e4ab91
Author: Tim Burke <email address hidden>
Date: Fri Jan 12 13:17:45 2018 -0800

    slo: Send ETag header in 206 responses

    Why weren't we doing that before?? The etag should be the same as for
    GET/HEAD, and by sending it, we can assure resuming clients that they're
    downlading the same object even if they didn't include an If-Match
    header.

    Change-Id: I4ccbd1ae3a909ecb4606ef18211d1b868f5cad86
    Related-Change: Ic11662eb5c7176fbf422a6fc87a569928d6f85a1

commit 88eea33ccd1875af811b59d15df55e2bffa27f77
Author: Clay Gerrard <email address hidden>
Date: Thu Jan 11 13:36:09 2018 -0800

    Recenter builder test expectation around random variance

    ... in order to make the test pass with more seeds and fail less
    frequently in the gate.

    Change-Id: I059e80af87fd33a3b6c0731fbad62e035215eca5

commit f64c00b00aa8df31a937448917421891904abdc8
Author: Samuel Merritt <email address hidden>
Date: Fri Jan 12 07:17:18 2018 -0800

    Improve object-updater's stats logging

    The object updater has five different stats, but its logging only told
    you two of them (successes and failures), and it only told you after
    finishing all the async_pendings for a device. If y...

This issue was fixed in the openstack/swift 2.17.0 release.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers