builder file out of balance but cannot be rebalanced, have 0 weight drives with 1 partition

Bug #1400083 reported by Caleb Tennis
This bug affects 1 person

Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: High
Assigned to: Samuel Merritt

Bug Description

In the attached builder file, we have a lot of drives that have been given 0 weight, however some of the drives still have a lone partition assigned to them. The balance is 999, but an attempt to rebalance fails saying "Cowardly refusing to save rebalance as it did not change at least 1%."
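For context, here is a hedged sketch of why a single partition on a zero-weight drive pins the reported balance near 999.99 (the function name and signature are illustrative, not Swift's actual internals; the 999.99 cap mirrors the value visible in the rebalance output below):

```python
# Hypothetical sketch of how a ring builder derives the balance figure:
# a device's balance is its percentage deviation from the partition
# count its weight entitles it to.
MAX_BALANCE = 999.99  # cap used when the deviation is effectively infinite

def device_balance(assigned_parts, weight, parts_per_unit_weight):
    """Percentage over/under the desired partition count for one device."""
    desired = weight * parts_per_unit_weight
    if desired == 0:
        # A zero-weight device wants 0 partitions; any partition still
        # assigned to it is infinitely over target, so the balance is
        # pinned at the cap.
        return MAX_BALANCE if assigned_parts > 0 else 0.0
    return 100.0 * (assigned_parts / desired - 1)

# One lone partition on a zero-weight drive dominates the ring's
# reported (worst-device) balance:
print(device_balance(1, 0.0, 10.0))    # 999.99
print(device_balance(99, 10.0, 10.0))  # about -1.0: a normal small deviation
```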

Revision history for this message
Caleb Tennis (ctennis) wrote :
Revision history for this message
Samuel Merritt (torgomatic) wrote :

I tried it, and swift-ring-builder definitely updated the builder file (the md5sum changed).

swift@saio:~/rb$ swift-ring-builder account-1.builder rebalance
Reassigned 35 (0.05%) partitions. Balance is now 999.99.
-------------------------------------------------------------------------------
NOTE: Balance of 999.99 indicates you should push this
      ring, wait at least 1 hours, and rebalance/repush.
-------------------------------------------------------------------------------

Maybe it had to do with min_part_hours? What happens if you try again now (it's been more than one hour since this bug was filed)?

Changed in swift:
status: New → Incomplete
Revision history for this message
Caleb Tennis (ctennis) wrote :

Same issue. In this case we're retrying it every 65 minutes expecting it to improve (it does not).

This is 2.2.0 btw.

Revision history for this message
Caleb Tennis (ctennis) wrote :

Actually, on a completely different machine I was able to get it to rebalance, but the change isn't anything helpful (it didn't move the lone partitions anywhere, instead just shifting some of the 4095s to 4096 and vice versa).

65536 partitions, 3.000000 replicas, 1 regions, 3 zones, 419 devices, 999.99 balance
 The minimum number of hours before a partition can be reassigned is 1
 Devices: id region zone ip address port replication ip replication port name weight partitions balance meta
@@ -6,7 +6,7 @@
              1 1 3 10.149.30.7 6002 10.149.30.7 6005 d1 3000.59 4096 -0.00
              2 1 3 10.149.30.7 6002 10.149.30.7 6005 d2 3000.59 4096 -0.00
              3 1 3 10.149.30.7 6002 10.149.30.7 6005 d3 0.00 0 0.00
- 4 1 3 10.149.30.7 6002 10.149.30.7 6005 d4 3000.59 4095 -0.02
+ 4 1 3 10.149.30.7 6002 10.149.30.7 6005 d4 3000.59 4096 -0.00
              5 1 3 10.149.30.7 6002 10.149.30.7 6005 d5 3000.59 4095 -0.02
              6 1 3 10.149.30.7 6002 10.149.30.7 6005 d6 0.00 0 0.00
              7 1 3 10.149.30.7 6002 10.149.30.7 6005 d7 0.00 0 0.00
@@ -69,7 +69,7 @@
             65 1 2 10.149.30.8 6002 10.149.30.8 6005 d65 0.00 0 0.00
             66 1 1 10.149.30.17 6002 10.149.30.17 6005 d66 0.00 0 0.00
             67 1 2 10.149.30.8 6002 10.149.30.8 6005 d67 0.00 0 0.00
- 68 1 2 10.149.30.9 6002 10.149.30.9 6005 d68 3000.59 4095 -0.02
+ 68 1 2 10.149.30.9 6002 10.149.30.9 6005 d68 3000.59 4096 -0.00
             69 1 2 10.149.30.9 6002 10.149.30.9 6005 d69 3000.59 4096 -0.00
             70 1 2 10.149.30.9 6002 10.149.30.9 6005 d70 3000.59 4096 -0.00
             71 1 2 10.149.30.9 6002 10.149.30.9 6005 d71 3000.59 4095 -0.02
@@ -103,8 +103,8 @@
             99 1 2 10.149.30.9 6002 10.149.30.9 6005 d99 0.00 0 0.00
            100 1 2 10.149.30.9 6002 10.149.30.9 6005 d100 0.00 0 0.00
            101 1 2 10.149.30.9 6002 10.149.30.9 6005 d101 0.00 0 0.00
- 102 1 2 10.149.30.11 6002 10.149.30.11 6005 d102 3000.59 4096 -0.00
- 103 1 2 10.149.30.11 6002 10.149.30.11 6005 d103 3000.59 4096 -0.00
+ 102 1 2 10.149.30.11 6002 10.149...


Revision history for this message
Caleb Tennis (ctennis) wrote :

The original notion of being unable to rebalance is a red herring; it looks like that was against an older version of the ring builder, and it works now with 2.2.0. However, the point still stands that the changed partitions are not the ones that should be clearing up.

Changed in swift:
status: Incomplete → Confirmed
importance: Undecided → High
assignee: nobody → Samuel Merritt (torgomatic)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/140879

Changed in swift:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/140879
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=1880351f1a862ae434ab23701535628f6f9258e1
Submitter: Jenkins
Branch: master

commit 1880351f1a862ae434ab23701535628f6f9258e1
Author: Samuel Merritt <email address hidden>
Date: Wed Dec 10 15:59:21 2014 -0800

    Only move too-close-together replicas when they can spread out.

    Imagine a 3-zone ring, and consider a partition in that ring with
    replicas placed as follows:

    * replica 0 is on device A (zone 2)
    * replica 1 is on device B (zone 1)
    * replica 2 is on device C (zone 2)

    Further, imagine that there are zero parts_wanted in all of zone 3;
    that is, zone 3 is completely full. However, zones 1 and 2 each have
    at least one parts_wanted on at least one device.

    When the ring builder goes to gather replicas to move, it gathers
    replica 0 because there are three zones available, but the replicas
    are only in two of them. Then, it places replica 0 in zone 1 or 2
    somewhere because those are the only zones with parts_wanted. Notice
    that this does *not* do anything to spread the partition out better.

    Then, on the next rebalance, replica 0 gets picked up and moved
    (again) but doesn't improve its placement (again).

    If your builder has min_part_hours > 0 (and it should), then replicas
    1 and 2 cannot move at all. A coworker observed the bug because a
    customer had such a partition, and its replica 2 was on a zero-weight
    device. He thought it odd that a zero-weight device should still have
    one partition on it despite the ring having been rebalanced dozens of
    times.

    Even if you don't have zero-weight devices, having a bunch of
    partitions trade places on each rebalance isn't particularly good.

    Note that this only happens with an unbalanceable ring; if the ring
    *can* balance, the gathered partitions will swap places, but they will
    get spread across more zones, so they won't get gathered up again on
    the next rebalance.

    Change-Id: I8f44f032caac25c44778a497dedf23f5cb61b6bb
    Closes-Bug: 1400083
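The gathering condition described in the commit message can be sketched as follows (hypothetical function and argument names, not Swift's actual internals): a replica whose partition is under-dispersed should only be gathered if some zone *outside* the zones it already occupies has room, otherwise it just bounces between the zones it is already in.

```python
# Illustrative sketch of the fix: only gather an under-dispersed replica
# when moving it could actually improve dispersion.

def should_gather_for_dispersion(replica_zones, parts_wanted_by_zone):
    """Gather a replica only if moving it could spread the partition out."""
    occupied = set(replica_zones)
    # The partition is under-dispersed if two replicas share a zone.
    undispersed = len(occupied) < len(replica_zones)
    if not undispersed:
        return False
    # Moving helps only if a zone the partition does NOT yet occupy has
    # parts_wanted; otherwise the replica would trade places pointlessly
    # on every rebalance.
    return any(wanted > 0
               for zone, wanted in parts_wanted_by_zone.items()
               if zone not in occupied)

# The scenario from the commit message: replicas in zones {2, 1, 2},
# zone 3 completely full (0 parts_wanted), zones 1 and 2 with room:
print(should_gather_for_dispersion([2, 1, 2], {1: 5, 2: 3, 3: 0}))  # False
# Same placement, but zone 3 has room, so gathering can help:
print(should_gather_for_dispersion([2, 1, 2], {1: 0, 2: 0, 3: 4}))  # True
```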

Changed in swift:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/ec)

Fix proposed to branch: feature/ec
Review: https://review.openstack.org/148983

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/ec)

Reviewed: https://review.openstack.org/148983
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=ef59cde83f176df9e36064f70142c9d8e81318fe
Submitter: Jenkins
Branch: feature/ec

commit 6f21504ccc9046e3f0b4db88f78297a00030dd3d
Author: Kota Tsuyuzaki <email address hidden>
Date: Tue Jan 13 05:34:37 2015 -0800

    Fix missing content length of Response

    This patch fixes swob.Response to set missing content
    length correctly.

    When a child class of swob.Response is initialized with both "body"
    and "headers" arguments, where "headers" includes a content length,
    swob.Response might lose the actual content length generated from
    the body, because "headers" will overwrite the content-length
    property after the body assignment.

    This causes a mismatch between the headers' content length and the
    actual body length. It would mainly affect third-party middleware
    that builds its own response as follows:

    req = swob.Request.blank('/')
    req.method = 'HEAD'
    resp = req.get_response(app)
    return HTTPOk(body='Ok', headers=resp.headers)

    This patch changes the order of headers updating and then fixes
    __init__() to set the correct content length.

    Change-Id: Icd8b7cbfe6bbe2c7965175969af299a5eb7a74ef
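A minimal sketch of the ordering fix described above (hypothetical class, not swob itself): the headers dict is applied *before* the body assignment, so the Content-Length computed from the actual body wins over any stale value carried in the reused headers.

```python
# Sketch of the fixed initialization order in a Response-like class.

class Response:
    def __init__(self, body=None, headers=None):
        self.headers = {}
        if headers:
            # A stale Content-Length from a reused headers dict (e.g.
            # copied from a HEAD response) lands here first...
            self.headers.update(headers)
        if body is not None:
            # ...and is then overwritten by the length of the real body,
            # so the header always describes the bytes actually sent.
            self.headers['Content-Length'] = str(len(body))
        self.body = body

# Reusing headers whose Content-Length described a different body,
# with a new two-byte body:
resp = Response(body='Ok', headers={'Content-Length': '12345'})
print(resp.headers['Content-Length'])  # '2'
```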

commit b434be452ead0625728afedfe01bac1c30629d30
Author: Donagh McCabe <email address hidden>
Date: Thu Jan 8 14:52:32 2015 +0000

    Use TCP_NODELAY on outgoing connections

    On a loopback device (e.g., when proxy-server and object-server are on
    the same node), PUTs in the range 64-200K may experience a delay due to
    the effect of Nagle's algorithm interacting with the loopback MTU of 64K.

    This effect has been directly seen by Mark Seger and Rick Jones on a
    proxy-server to object-server PUT. However, you could expect to see a
    similar effect on replication via ssync if the object being replicated
    is on a different drive on the same node.

    A prior change [1] related to Nagle set TCP_NODELAY on responses. This
    change sets it on all outgoing connections.

    [1] I11f86df1f56fba1c6ab6084dc1f580c395f072dc

    Change-Id: Ife8885a42b289a5eb4ac7e4698f8889858bc8b7e
    Closes-bug: 1408622
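What the patch enables can be shown with the standard library alone (the helper name is illustrative; Swift sets the option inside its own connection code): TCP_NODELAY disables Nagle's algorithm so small writes go out immediately instead of waiting to be coalesced.

```python
# Standard-library illustration of setting TCP_NODELAY on an outgoing
# connection, as the patch does for Swift's proxy-to-backend sockets.
import socket

def connect_nodelay(host, port):
    sock = socket.create_connection((host, port))
    # Without TCP_NODELAY, a small request body can sit in the kernel
    # buffer waiting for an ACK of the previous segment (Nagle), which
    # is the delay observed on 64-200K PUTs over loopback.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```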

commit b5586427e503ee22c0b20b109cad83e166ed3fd8
Author: Pete Zaitcev <email address hidden>
Date: Sat Jan 10 17:14:46 2015 -0700

    Drop redundant index output

    The output of format_device() now includes index as the first "dX"
    element, for example d1r1z2-127.0.0.1:6200R127.0.0.1:6200/db_"".

    Change-Id: Ib5f8e3a692fddbe8b1f4994787b2883130e9536f

commit c65bc49e099928801b80dce399d6098f7e10e137
Author: Pete Zaitcev <email address hidden>
Date: Sat Jan 10 08:20:25 2015 -0700

    Mark the --region as mandatory

    We used to permit omitting the region in the old parameter syntax,
    although we now throw a warning if it's missing. In the new parameter
    syntax, --region is mandatory. It's enforced by build_dev_from_opts in
    swift/common/ring/utils.py.

    On the other hand, --replication-ip, --replication-port, and --meta
    are not obligatory.

    Change-Id: Ia70228f2c99595501271765286431f68e82e800b
...

Thierry Carrez (ttx)
Changed in swift:
milestone: none → 2.2.2
status: Fix Committed → Fix Released