swift-ring-builder rebalance - endless loop

Bug #1642538 reported by Falk Reimann on 2016-11-17
This bug affects 3 people
Affects                                           Importance   Assigned to
OpenStack Object Storage (swift)                  Undecided    Tim Burke
Ubuntu Cloud Archive (status tracked in Ocata)
  Mitaka                                          High         Unassigned
  Newton                                          High         Unassigned
  Ocata                                           High         Unassigned
swift (Ubuntu) (status tracked in Zesty)
  Xenial                                          High         Unassigned
  Yakkety                                         High         Unassigned
  Zesty                                           High         Unassigned

Bug Description

There is an issue with swift-ring-builder and rebalancing under certain conditions beginning in Mitaka.

Steps to reproduce:
Create a ring with a part power of 12 and add 3 servers with 11 devices each, placing 2 servers in zone 1 and 1 server in zone 2. After all devices have been added as described, swift-ring-builder <ring>.builder rebalance does not return and appears to be stuck in an endless loop.

The problem starts with the 11th device; with 10 devices per server, rebalancing still works. It does not matter whether you add 10 devices from each of the 3 servers, rebalance, and then add the 11th device from each server afterwards, or add all devices at once.

This behaviour does not exist in Liberty; it starts with Mitaka.

For convenience, here are the commands to reproduce:

swift-ring-builder account.builder create 12 3 1

swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.60 --port 6002 --replication-ip 10.46.15.60 --replication-port 6002 --device swift-01 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.60 --port 6002 --replication-ip 10.46.15.60 --replication-port 6002 --device swift-02 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.60 --port 6002 --replication-ip 10.46.15.60 --replication-port 6002 --device swift-03 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.60 --port 6002 --replication-ip 10.46.15.60 --replication-port 6002 --device swift-04 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.60 --port 6002 --replication-ip 10.46.15.60 --replication-port 6002 --device swift-05 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.60 --port 6002 --replication-ip 10.46.15.60 --replication-port 6002 --device swift-06 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.60 --port 6002 --replication-ip 10.46.15.60 --replication-port 6002 --device swift-07 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.60 --port 6002 --replication-ip 10.46.15.60 --replication-port 6002 --device swift-08 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.60 --port 6002 --replication-ip 10.46.15.60 --replication-port 6002 --device swift-09 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.60 --port 6002 --replication-ip 10.46.15.60 --replication-port 6002 --device swift-10 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.60 --port 6002 --replication-ip 10.46.15.60 --replication-port 6002 --device swift-11 --weight 100.00

swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.61 --port 6002 --replication-ip 10.46.15.61 --replication-port 6002 --device swift-01 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.61 --port 6002 --replication-ip 10.46.15.61 --replication-port 6002 --device swift-02 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.61 --port 6002 --replication-ip 10.46.15.61 --replication-port 6002 --device swift-03 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.61 --port 6002 --replication-ip 10.46.15.61 --replication-port 6002 --device swift-04 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.61 --port 6002 --replication-ip 10.46.15.61 --replication-port 6002 --device swift-05 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.61 --port 6002 --replication-ip 10.46.15.61 --replication-port 6002 --device swift-06 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.61 --port 6002 --replication-ip 10.46.15.61 --replication-port 6002 --device swift-07 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.61 --port 6002 --replication-ip 10.46.15.61 --replication-port 6002 --device swift-08 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.61 --port 6002 --replication-ip 10.46.15.61 --replication-port 6002 --device swift-09 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.61 --port 6002 --replication-ip 10.46.15.61 --replication-port 6002 --device swift-10 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 1 --ip 10.46.15.61 --port 6002 --replication-ip 10.46.15.61 --replication-port 6002 --device swift-11 --weight 100.00

swift-ring-builder account.builder add --region 1 --zone 2 --ip 10.46.15.62 --port 6002 --replication-ip 10.46.15.62 --replication-port 6002 --device swift-01 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 2 --ip 10.46.15.62 --port 6002 --replication-ip 10.46.15.62 --replication-port 6002 --device swift-02 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 2 --ip 10.46.15.62 --port 6002 --replication-ip 10.46.15.62 --replication-port 6002 --device swift-03 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 2 --ip 10.46.15.62 --port 6002 --replication-ip 10.46.15.62 --replication-port 6002 --device swift-04 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 2 --ip 10.46.15.62 --port 6002 --replication-ip 10.46.15.62 --replication-port 6002 --device swift-05 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 2 --ip 10.46.15.62 --port 6002 --replication-ip 10.46.15.62 --replication-port 6002 --device swift-06 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 2 --ip 10.46.15.62 --port 6002 --replication-ip 10.46.15.62 --replication-port 6002 --device swift-07 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 2 --ip 10.46.15.62 --port 6002 --replication-ip 10.46.15.62 --replication-port 6002 --device swift-08 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 2 --ip 10.46.15.62 --port 6002 --replication-ip 10.46.15.62 --replication-port 6002 --device swift-09 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 2 --ip 10.46.15.62 --port 6002 --replication-ip 10.46.15.62 --replication-port 6002 --device swift-10 --weight 100.00
swift-ring-builder account.builder add --region 1 --zone 2 --ip 10.46.15.62 --port 6002 --replication-ip 10.46.15.62 --replication-port 6002 --device swift-11 --weight 100.00

swift-ring-builder account.builder rebalance
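
The command list above is highly repetitive, so it can also be generated. The following is only a sketch: `build_commands` and its parameters are hypothetical names introduced here, and the script merely prints the same commands shown above (same IPs, zones, ports and weights) rather than running them.

```python
# Hypothetical helper that prints the reproduction commands listed above.
# It does not invoke swift-ring-builder itself.

# (ip, zone) for each server: two servers in zone 1, one server in zone 2.
SERVERS = [
    ("10.46.15.60", 1),
    ("10.46.15.61", 1),
    ("10.46.15.62", 2),
]


def build_commands(builder="account.builder", devices_per_server=11):
    """Return the create/add/rebalance command lines as a list of strings."""
    commands = [f"swift-ring-builder {builder} create 12 3 1"]
    for ip, zone in SERVERS:
        for n in range(1, devices_per_server + 1):
            commands.append(
                f"swift-ring-builder {builder} add --region 1 --zone {zone} "
                f"--ip {ip} --port 6002 --replication-ip {ip} "
                f"--replication-port 6002 --device swift-{n:02d} --weight 100.00"
            )
    commands.append(f"swift-ring-builder {builder} rebalance")
    return commands


if __name__ == "__main__":
    print("\n".join(build_commands()))
```

Passing devices_per_server=10 produces the variant that, according to the description, still rebalances without hanging.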

Tim Burke (1-tim-z) wrote :

FWIW, running with --debug, I see something like https://gist.github.com/tipabu/9687531671aa9a193ddd51151c511267

I wonder if maybe the "requires -3.05311331772e-16 overload" is indicating some numerical analysis troubles?
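
As a generic illustration of the kind of numerical trouble speculated about here (this is not Swift's overload computation, and `clamp_tiny_negative` is a hypothetical helper), floating-point arithmetic can leave a tiny negative residue where exact arithmetic gives zero:

```python
from fractions import Fraction

# In exact arithmetic 0.3 - (0.1 + 0.2) is zero, but binary floating point
# leaves a residue on the order of 1e-17, and here it is negative.
float_residue = 0.3 - (0.1 + 0.2)
exact_residue = Fraction(3, 10) - (Fraction(1, 10) + Fraction(2, 10))


def clamp_tiny_negative(value, tolerance=1e-9):
    """Treat a value that is negative only by rounding noise as zero.

    Hypothetical helper for illustration; the tolerance is an assumption.
    """
    return 0.0 if -tolerance < value < 0 else value


if __name__ == "__main__":
    print(float_residue)                       # a tiny negative float
    print(exact_residue)                       # exactly 0
    print(clamp_tiny_negative(float_residue))  # 0.0
```

A value such as -3.05311331772e-16 has the same magnitude as this kind of rounding residue, which is what makes it look suspicious.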

Changed in swift:
status: New → Confirmed
Tim Burke (1-tim-z) wrote :

With a little more debug logging in place, I tracked it down to https://github.com/openstack/swift/blob/2.10.0/swift/common/ring/builder.py#L806. Apparently we can get parts to go negative, which causes the loop five lines later, on `while parts`. Should be an easy fix, though.
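
To see why a negative count hangs a `while parts` loop, here is a minimal sketch. It is not the code in builder.py: `place_parts`, `place_parts_guarded` and the `max_steps` safety cap are hypothetical and exist only so the demonstration terminates.

```python
def place_parts(parts, max_steps=1000):
    """Count down with a truthiness test, as in a `while parts` loop.

    Returns (steps_taken, finished). Any non-zero integer is truthy,
    so a negative starting value never reaches the falsy value 0.
    """
    steps = 0
    while parts:
        if steps >= max_steps:
            return steps, False  # safety cap; a real loop would spin forever
        parts -= 1
        steps += 1
    return steps, True


def place_parts_guarded(parts, max_steps=1000):
    """Same countdown, but with an explicit `> 0` comparison."""
    steps = 0
    while parts > 0:
        if steps >= max_steps:
            return steps, False
        parts -= 1
        steps += 1
    return steps, True


if __name__ == "__main__":
    print(place_parts(3))           # (3, True)
    print(place_parts(-1))          # (1000, False): only the cap stopped it
    print(place_parts_guarded(-1))  # (0, True)
```

The explicit comparison is just one defensive option. It would stop the loop but would leave the over-assigned count in place, so preventing the value from going negative in the first place addresses the cause rather than the symptom.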

Changed in swift:
assignee: nobody → Tim Burke (1-tim-z)

Fix proposed to branch: master
Review: https://review.openstack.org/399237

Changed in swift:
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/399237
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=2e7a7347fc58676fbaabce3d87a15866796d32e4
Submitter: Jenkins
Branch: master

commit 2e7a7347fc58676fbaabce3d87a15866796d32e4
Author: Tim Burke <email address hidden>
Date: Thu Nov 17 13:02:06 2016 -0800

    Avoid infinite loop while placing parts

    Previously, we could over-assign how many parts should be in a tier.
    This would cause the local `parts` variable to go negative, which meant
    that our `while parts` loop would never terminate.

    Change-Id: Id7e7889742ca37cf1a9c0d55fba78d967e90e8d0
    Closes-Bug: 1642538

Changed in swift:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/399722
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=a419b397dbc03e3a06d708631eef7e800e98503a
Submitter: Jenkins
Branch: stable/newton

commit a419b397dbc03e3a06d708631eef7e800e98503a
Author: Tim Burke <email address hidden>
Date: Thu Nov 17 13:02:06 2016 -0800

    Avoid infinite loop while placing parts

    Previously, we could over-assign how many parts should be in a tier.
    This would cause the local `parts` variable to go negative, which meant
    that our `while parts` loop would never terminate.

    Change-Id: Id7e7889742ca37cf1a9c0d55fba78d967e90e8d0
    Closes-Bug: 1642538
    (cherry picked from commit 2e7a7347fc58676fbaabce3d87a15866796d32e4)

tags: added: in-stable-newton

Reviewed: https://review.openstack.org/400985
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=0c3f8f87104af8717115c5badffd243dbaa1c430
Submitter: Jenkins
Branch: feature/hummingbird

commit 2d25fe6ad3573b2a06b6b3e5e66493d7b0c55693
Author: Tim Burke <email address hidden>
Date: Mon Jul 25 15:06:23 2016 -0700

    Reduce backend requests for SLO If-Match / HEAD requests

    ... by storing SLO Etag and size in sysmeta.

    Previously, we had to GET the manifest for every HEAD or conditional
    request to an SLO. Worse, since SLO PUTs require that we HEAD every
    segment, we'd GET all included sub-SLO manifests. This was necessary so
    we could recompute the large object's Etag and content-length.

    Since we already know both of those during PUT, we'll now store it in
    object sysmeta. This allows us to:

     * satisfy HEAD requests based purely off of the manifest's HEAD
       response, and
     * perform the If-(None-)Match comparison on the object server, without
       any additional subrequests.

    Note that the large object content-length can't just be parsed from
    content-type -- with fast-POST enabled, the content-type coming out of
    the object-server won't necessarily include swift_bytes.

    Also note that we must still fall back to GETting the manifest if the
    sysmeta headers were not found. Otherwise, we'd break existing large
    objects.

    Change-Id: Ia6ad32354105515560b005cea750aa64a88c96f9

commit ae7dddd801e28217d7dc46bd45cd6b621f29340c
Author: Ondřej Nový <email address hidden>
Date: Mon Nov 21 22:13:11 2016 +0100

    Added comment for "user" option in drive-audit config

    Change-Id: I24362826bee85ac3304e9b63504c9465da673014

commit c3e1d847f4b9d6cc6212aae4dc1b1e6dff45fb40
Author: Thiago da Silva <email address hidden>
Date: Thu Nov 17 17:17:00 2016 -0500

    breaking down tests.py into smaller pieces

    tests.py is currently at ~5500 lines of code, it's
    time to break it down into smaller files.

    I started with an easy middleware set of tests
    (i.e., versioned writes, ~600 lines of code ) so I can get
    some feedback. There are more complicated tests that cover
    multiple middlewares for example, it is not so clear where
    those should go.

    Change-Id: I2aa6c18ee5b68d0aae73cc6add8cac6fbf7f33da
    Signed-off-by: Thiago da Silva <email address hidden>

commit 5d7a3a4172f0f11ab870252eec784cf24b247dea
Author: Ondřej Nový <email address hidden>
Date: Sat Nov 19 23:24:30 2016 +0100

    Removed "in-process-" from func env tox name

    This shorten shebang in infra, because we are hitting 128 bytes limit.

    Change-Id: I02477d81b836df71780942189d37d616944c4dce

commit 9ea340256996a03c8c744201297b47a0e91fe65b
Author: Kota Tsuyuzaki <email address hidden>
Date: Fri Nov 18 01:50:11 2016 -0800

    Don't overwrite built-in 'id'

    This is a follow up for https://review.openstack.org/#/c/399237

    'id' is assigned as a builtin function so that we should not use 'id'
    for the local variable name.

    Change-Id: Ic27460d49e68f6cd50bda1d5b3810e01ccb07a37

commit bf...

tags: added: in-feature-hummingbird
Changed in swift (Ubuntu Xenial):
status: New → Triaged
Changed in swift (Ubuntu Yakkety):
status: New → Triaged
Changed in swift (Ubuntu Zesty):
importance: Undecided → High
status: New → Triaged
Changed in swift (Ubuntu Yakkety):
importance: Undecided → High
Changed in swift (Ubuntu Xenial):
importance: Undecided → High

Reviewed: https://review.openstack.org/399723
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=17f6a6989d09d3d7522efcedf747d773d11d2535
Submitter: Jenkins
Branch: stable/mitaka

commit 17f6a6989d09d3d7522efcedf747d773d11d2535
Author: Tim Burke <email address hidden>
Date: Thu Nov 17 13:02:06 2016 -0800

    Avoid infinite loop while placing parts

    Previously, we could over-assign how many parts should be in a tier.
    This would cause the local `parts` variable to go negative, which meant
    that our `while parts` loop would never terminate.

    Change-Id: Id7e7889742ca37cf1a9c0d55fba78d967e90e8d0
    Closes-Bug: 1642538
    (cherry picked from commit 2e7a7347fc58676fbaabce3d87a15866796d32e4)

Corey Bryant (corey.bryant) wrote :

This has been uploaded to zesty, as well as the yakkety and xenial review queues where packages are awaiting SRU team review.

Corey Bryant (corey.bryant) wrote :

SRU information for Ubuntu

[Description]
See bug description above.

[Testcase]
See bug description above.

[Regression Potential]
Minimal as patches have been cherry-picked from upstream branches.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package swift - 2.11.0-0ubuntu2

---------------
swift (2.11.0-0ubuntu2) zesty; urgency=medium

  * d/p/avoid-infinite-loop-while-placing-parts.patch: Cherry-picked from
    upstream master branch to avoid infinite loop while placing parts
    (LP: #1642538).

 -- Corey Bryant <email address hidden> Fri, 09 Dec 2016 09:05:26 -0500

Changed in swift (Ubuntu Zesty):
status: Triaged → Fix Released

This issue was fixed in the openstack/swift 2.7.1 release.

This issue was fixed in the openstack/swift 2.10.1 release.

This issue was fixed in the openstack/swift 2.12.0 release.

Hello Falk, or anyone else affected,

Accepted swift into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/swift/2.10.0-0ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in swift (Ubuntu Yakkety):
status: Triaged → Fix Committed
tags: added: verification-needed
Changed in swift (Ubuntu Xenial):
status: Triaged → Fix Committed
Brian Murray (brian-murray) wrote :

Hello Falk, or anyone else affected,

Accepted swift into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/swift/2.7.0-0ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Corey Bryant (corey.bryant) wrote :

Regression tested successfully on xenial-proposed and yakkety-proposed.

Ryan Beisner (1chb1n) on 2017-01-05
Changed in cloud-archive:
status: Triaged → Fix Committed
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package swift - 2.7.0-0ubuntu2.1

---------------
swift (2.7.0-0ubuntu2.1) xenial; urgency=medium

  * d/p/avoid-infinite-loop-while-placing-parts.patch: Cherry-picked from
    upstream stable/mitaka branch to avoid infinite loop while placing parts
    (LP: #1642538).

 -- Corey Bryant <email address hidden> Fri, 09 Dec 2016 10:40:09 -0500

Changed in swift (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for swift has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package swift - 2.10.0-0ubuntu1.1

---------------
swift (2.10.0-0ubuntu1.1) yakkety; urgency=medium

  * d/p/avoid-infinite-loop-while-placing-parts.patch: Cherry-picked from
    upstream stable/newton branch to avoid infinite loop while placing parts
    (LP: #1642538).

 -- Corey Bryant <email address hidden> Fri, 09 Dec 2016 10:08:37 -0500

Changed in swift (Ubuntu Yakkety):
status: Fix Committed → Fix Released

Hello Falk, or anyone else affected,

Accepted swift into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:mitaka-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-mitaka-needed to verification-mitaka-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-mitaka-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-mitaka-needed