some rings won't rebalance
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Object Storage (swift) | Fix Released | Undecided | Unassigned | |
Bug Description
The problem is documented here:
I saw it again recently and it looks like this:
https:/
I kicked up some better rebalance --debug logging to make it easier to see:
https:/
Basically you see a bunch of this:
DEBUG: Gathered 5814/0 from dev r3z5-127.
DEBUG: Gathered 5855/1 from dev r3z5-127.
DEBUG: Gathered 5916/0 from dev r3z5-127.
DEBUG: Gathered 5956/1 from dev r3z5-127.
DEBUG: Gathered 5962/2 from dev r3z5-127.
DEBUG: Gathered 5971/0 from dev r3z5-127.
DEBUG: Gathered 5983/0 from dev r3z5-127.
DEBUG: Gathered 5987/0 from dev r3z5-127.
DEBUG: Gathered 5988/1 from dev r3z5-127.
DEBUG: Gathered 6029/1 from dev r3z5-127.
DEBUG: Gathered 6041/0 from dev r3z5-127.
Followed by this:
DEBUG: Placed 5586/2 onto dev r3z5-127.
DEBUG: Placed 5597/0 onto dev r3z5-127.
DEBUG: Placed 5752/0 onto dev r3z5-127.
DEBUG: Placed 4970/0 onto dev r3z5-127.
DEBUG: Placed 5916/0 onto dev r3z5-127.
DEBUG: Placed 5472/0 onto dev r3z5-127.
DEBUG: Placed 5358/0 onto dev r3z5-127.
DEBUG: Placed 5962/2 onto dev r3z5-127.
DEBUG: Placed 5056/0 onto dev r3z5-127.
DEBUG: Placed 5235/2 onto dev r3z5-127.
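For what it's worth, you can see the churn by tallying the debug lines. Here's a quick throwaway script - it assumes the exact `Gathered P/R from dev D.` / `Placed P/R onto dev D.` format shown above, which may vary between swift releases:

```python
import re
from collections import Counter

# Throwaway tally of the rebalance --debug output quoted above. Assumes
# the exact "Gathered P/R from dev D." / "Placed P/R onto dev D." line
# format; treat this as a sketch, not a supported interface.
GATHERED = re.compile(r'DEBUG: Gathered (\d+)/(\d+) from dev (\S+)\.')
PLACED = re.compile(r'DEBUG: Placed (\d+)/(\d+) onto dev (\S+)\.')

def net_moves(log_lines):
    """Return {dev: placed - gathered}, so 0 means pure churn."""
    net = Counter()
    for line in log_lines:
        m = GATHERED.match(line)
        if m:
            net[m.group(3)] -= 1
            continue
        m = PLACED.match(line)
        if m:
            net[m.group(3)] += 1
    return net
```

Run over the excerpt above it comes out at roughly zero for r3z5 - note partitions like 5916/0 and 5962/2 get gathered from r3z5 and then placed right back onto it.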
but meanwhile this isn't the zone that needs the parts!
Tier  Parts     %  Max      0      1  2  3
r1z1  49152  0.00    1  16384  49152  0  0
r1z2  49152  0.00    1  16384  49152  0  0
r3z5  49321  0.00    1  16215  49321  0  0
r3z6  48983  0.00    1  16553  48983  0  0
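To put numbers on "this isn't the zone that needs the parts": with 3 replicas over 4 zones, each zone wants 3/4 = 0.75 replicanths. Assuming a part power of 16 (2^16 = 65536 partitions - consistent with the 49152/16384 counts above), that's 49152 part-replicas per zone. A quick sanity check:

```python
# Back-of-the-envelope check of the table above, assuming part_power=16
# (2**16 = 65536 partitions) - consistent with the per-zone counts shown.
REPLICAS = 3
PARTS = 2 ** 16

# Part-replica count per zone, read off the second column above.
assigned = {'r1z1': 49152, 'r1z2': 49152, 'r3z5': 49321, 'r3z6': 48983}

ideal = PARTS * REPLICAS / len(assigned)   # 49152.0 = 0.75 replicanths/zone
for zone, count in sorted(assigned.items()):
    print('%s %+d' % (zone, count - ideal))
# r1z1 +0, r1z2 +0, r3z5 +169, r3z6 -169: r3z5 owes r3z6 ~169 parts.
```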
IIRC the issue is that by weight we want to pull parts out of r3z5, and we tend to grab ones that are over-represented in r3 (two replicas there). When we go to set one down we see both regions have one copy (we're holding the third), but r3z6 is hungry, so we head back into r3 - only we can't put it on r3z6 (it already holds a replica), so we land back on r3z5 rather than putting extra parts in r1.
The problem is in the implementation phase, not the planning phase - we know each server should hold ~0.75 replicanths - but we can't seem to notice that the parts we want to move from r3z5 need to swap places with other parts in r3z6. I had some idea about overloading gather so that we pick up some extra % of parts... but I think there are only a few hundred out of ~50K parts that *could* move from r3z5 to r3z6, and we only find a few every time we rebalance - yet we'd have to move thousands of parts to get them all. NOT GREAT!
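To make the "land back on r3z5" loop concrete, here's a toy model of the tier-walking placement described above - emphatically not swift's actual RingBuilder code, just the decision it sketches, with made-up names:

```python
# A toy model of the placement decision described above. NOT swift's
# actual RingBuilder internals; the names and structure are made up.

def pick_zone(held_zones, zone_hunger, zones_by_region):
    """Place one gathered replica: head for the region containing the
    hungriest zone, then take the hungriest zone there that doesn't
    already hold a replica (at most one replica per zone)."""
    region = max(zones_by_region,
                 key=lambda r: max(zone_hunger[z] for z in zones_by_region[r]))
    candidates = [z for z in zones_by_region[region] if z not in held_zones]
    return max(candidates, key=zone_hunger.get)

# The bug's situation: we gathered the r3z5 copy of a part whose other two
# replicas sit in r1z1 and r3z6; r3z6 is 169 parts hungry, r3z5 is 169 over.
zones_by_region = {'r1': ['r1z1', 'r1z2'], 'r3': ['r3z5', 'r3z6']}
zone_hunger = {'r1z1': 0, 'r1z2': 0, 'r3z5': -169, 'r3z6': 169}
print(pick_zone({'r1z1', 'r3z6'}, zone_hunger, zones_by_region))  # -> r3z5
```

The swap we'd actually want - trade this part's r3z5 replica for some other part's r3z6 replica - never enters the picture, because gather and place each look at one part at a time.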
Reviewed: https://review.openstack.org/503152
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=23219664564d1b5a7ba02bbf8309ec699ab7a4cb
Submitter: Jenkins
Branch: master
commit 23219664564d1b5a7ba02bbf8309ec699ab7a4cb
Author: Kota Tsuyuzaki <email address hidden>
Date: Fri Jun 30 02:03:48 2017 -0700
Accept a trade-off of dispersion for balance
... but only if we *have* to!
During the initial gather for balance we prefer to avoid replicas on
over-weight devices that are already under-represented in any of their
tiers (i.e. if a zone has to have at least one replica, but may have as
many as two, don't take the only one). Instead, by going for replicas
on over-weight devices that are at the limits of their dispersion, we
hope to have a better than even chance of finding a better place for
them during placement!
This normally works out - and especially so for rings which can both
disperse and balance. But for existing rings where we'd have to
sacrifice dispersion to improve balance, the optimistic gather ends up
refusing to trade dispersion for balance - and instead gets stuck
without solving either!
You should always be able to solve for *either* dispersion or balance.
But if you can't solve for *both*, we now bail out of the optimistic
gather much more quickly and instead just focus on improving balance.
With this change, the ring can get into balanced (and un-dispersed)
states much more quickly!
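In other words, the shape of the fix is roughly "try the dispersion-preserving gather first, and if it stalls while balance is still bad, gather for balance alone". A sketch under that reading - hypothetical method names, not swift's real API:

```python
# Rough sketch of the strategy the commit message describes, with
# hypothetical method names - not swift's actual RingBuilder API.
def rebalance_pass(builder):
    # Optimistic gather: only pull replicas off over-weight devices when
    # doing so can't hurt dispersion.
    gathered = builder.gather_for_balance(preserve_dispersion=True)
    if not gathered and builder.balance_is_still_bad():
        # Stuck: balance can only improve by trading away some dispersion.
        # Accept the trade-off and gather for balance alone.
        gathered = builder.gather_for_balance(preserve_dispersion=False)
    builder.place(gathered)
```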
Change-Id: I17ac627f94f64211afaccad15596a9fcab2fada2
Change-Id: Ie6e2d116b65938edac29efa6171e2470bb3e8e12
Related-
Closes-Bug: 1699636
Closes-Bug: 1701472