builder file out of balance but cannot be rebalanced, have 0 weight drives with 1 partition

Bug #1400083 reported by Caleb Tennis
This bug affects 1 person

Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: High
Assigned to: Samuel Merritt

Bug Description

In the attached builder file, we have a lot of drives that have been given 0 weight, however some of the drives still have a lone partition assigned to them. The balance is 999, but an attempt to rebalance fails saying "Cowardly refusing to save rebalance as it did not change at least 1%."
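For context, here is a hedged sketch of why a single partition on a zero-weight drive pins the reported balance near 999.99 (the function name and signature are illustrative, not Swift's actual internals; the 999.99 cap mirrors the value visible in the rebalance output below):

```python
# Hypothetical sketch of how a ring builder derives the balance figure:
# a device's balance is its percentage deviation from the partition
# count its weight entitles it to.
MAX_BALANCE = 999.99  # cap used when the deviation is effectively infinite

def device_balance(assigned_parts, weight, parts_per_unit_weight):
    """Percentage over/under the desired partition count for one device."""
    desired = weight * parts_per_unit_weight
    if desired == 0:
        # A zero-weight device wants 0 partitions; any partition still
        # assigned to it is infinitely over target, so the balance is
        # pinned at the cap.
        return MAX_BALANCE if assigned_parts > 0 else 0.0
    return 100.0 * (assigned_parts / desired - 1)

# One lone partition on a zero-weight drive dominates the ring's
# reported (worst-device) balance:
print(device_balance(1, 0.0, 10.0))    # 999.99
print(device_balance(99, 10.0, 10.0))  # about -1.0: a normal small deviation
```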

Revision history for this message
Caleb Tennis (ctennis) wrote :
Revision history for this message
Samuel Merritt (torgomatic) wrote :

I tried it, and swift-ring-builder definitely updated the builder file (the md5sum changed).

swift@saio:~/rb$ swift-ring-builder account-1.builder rebalance
Reassigned 35 (0.05%) partitions. Balance is now 999.99.
-------------------------------------------------------------------------------
NOTE: Balance of 999.99 indicates you should push this
      ring, wait at least 1 hours, and rebalance/repush.
-------------------------------------------------------------------------------

Maybe it had to do with min_part_hours? What happens if you try again now (it's been more than one hour since this bug was filed)?

Changed in swift:
status: New → Incomplete
Revision history for this message
Caleb Tennis (ctennis) wrote :

Same issue. In this case we're retrying it every 65 minutes expecting it to improve (it does not).

This is 2.2.0 btw.

Revision history for this message
Caleb Tennis (ctennis) wrote :

Actually, on a completely different machine I was able to get it to rebalance, but the change isn't anything helpful (it didn't move the lone partitions anywhere, instead just shifting some of the 4095s to 4096 and vice versa).

65536 partitions, 3.000000 replicas, 1 regions, 3 zones, 419 devices, 999.99 balance
 The minimum number of hours before a partition can be reassigned is 1
 Devices: id region zone ip address port replication ip replication port name weight partitions balance meta
@@ -6,7 +6,7 @@
              1 1 3 10.149.30.7 6002 10.149.30.7 6005 d1 3000.59 4096 -0.00
              2 1 3 10.149.30.7 6002 10.149.30.7 6005 d2 3000.59 4096 -0.00
              3 1 3 10.149.30.7 6002 10.149.30.7 6005 d3 0.00 0 0.00
- 4 1 3 10.149.30.7 6002 10.149.30.7 6005 d4 3000.59 4095 -0.02
+ 4 1 3 10.149.30.7 6002 10.149.30.7 6005 d4 3000.59 4096 -0.00
              5 1 3 10.149.30.7 6002 10.149.30.7 6005 d5 3000.59 4095 -0.02
              6 1 3 10.149.30.7 6002 10.149.30.7 6005 d6 0.00 0 0.00
              7 1 3 10.149.30.7 6002 10.149.30.7 6005 d7 0.00 0 0.00
@@ -69,7 +69,7 @@
             65 1 2 10.149.30.8 6002 10.149.30.8 6005 d65 0.00 0 0.00
             66 1 1 10.149.30.17 6002 10.149.30.17 6005 d66 0.00 0 0.00
             67 1 2 10.149.30.8 6002 10.149.30.8 6005 d67 0.00 0 0.00
- 68 1 2 10.149.30.9 6002 10.149.30.9 6005 d68 3000.59 4095 -0.02
+ 68 1 2 10.149.30.9 6002 10.149.30.9 6005 d68 3000.59 4096 -0.00
             69 1 2 10.149.30.9 6002 10.149.30.9 6005 d69 3000.59 4096 -0.00
             70 1 2 10.149.30.9 6002 10.149.30.9 6005 d70 3000.59 4096 -0.00
             71 1 2 10.149.30.9 6002 10.149.30.9 6005 d71 3000.59 4095 -0.02
@@ -103,8 +103,8 @@
             99 1 2 10.149.30.9 6002 10.149.30.9 6005 d99 0.00 0 0.00
            100 1 2 10.149.30.9 6002 10.149.30.9 6005 d100 0.00 0 0.00
            101 1 2 10.149.30.9 6002 10.149.30.9 6005 d101 0.00 0 0.00
- 102 1 2 10.149.30.11 6002 10.149.30.11 6005 d102 3000.59 4096 -0.00
- 103 1 2 10.149.30.11 6002 10.149.30.11 6005 d103 3000.59 4096 -0.00
+ 102 1 2 10.149.30.11 6002 10.149...


Revision history for this message
Caleb Tennis (ctennis) wrote :

The original notion of being unable to rebalance is a red herring; it looks like that was against an older version of the ring builder, and it works now with 2.2.0. However, the point still stands that the changed partitions are not the ones that should be clearing up.

Changed in swift:
status: Incomplete → Confirmed
importance: Undecided → High
assignee: nobody → Samuel Merritt (torgomatic)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/140879

Changed in swift:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/140879
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=1880351f1a862ae434ab23701535628f6f9258e1
Submitter: Jenkins
Branch: master

commit 1880351f1a862ae434ab23701535628f6f9258e1
Author: Samuel Merritt <email address hidden>
Date: Wed Dec 10 15:59:21 2014 -0800

    Only move too-close-together replicas when they can spread out.

    Imagine a 3-zone ring, and consider a partition in that ring with
    replicas placed as follows:

    * replica 0 is on device A (zone 2)
    * replica 1 is on device B (zone 1)
    * replica 2 is on device C (zone 2)

    Further, imagine that there are zero parts_wanted in all of zone 3;
    that is, zone 3 is completely full. However, zones 1 and 2 each have
    at least one parts_wanted on at least one device.

    When the ring builder goes to gather replicas to move, it gathers
    replica 0 because there are three zones available, but the replicas
    are only in two of them. Then, it places replica 0 in zone 1 or 2
    somewhere because those are the only zones with parts_wanted. Notice
    that this does *not* do anything to spread the partition out better.

    Then, on the next rebalance, replica 0 gets picked up and moved
    (again) but doesn't improve its placement (again).

    If your builder has min_part_hours > 0 (and it should), then replicas
    1 and 2 cannot move at all. A coworker observed the bug because a
    customer had such a partition, and its replica 2 was on a zero-weight
    device. He thought it odd that a zero-weight device should still have
    one partition on it despite the ring having been rebalanced dozens of
    times.

    Even if you don't have zero-weight devices, having a bunch of
    partitions trade places on each rebalance isn't particularly good.

    Note that this only happens with an unbalanceable ring; if the ring
    *can* balance, the gathered partitions will swap places, but they will
    get spread across more zones, so they won't get gathered up again on
    the next rebalance.

    Change-Id: I8f44f032caac25c44778a497dedf23f5cb61b6bb
    Closes-Bug: 1400083
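The gathering condition described in the commit message can be sketched as follows (hypothetical function and argument names, not Swift's actual internals): a replica whose partition is under-dispersed should only be gathered if some zone *outside* the zones it already occupies has room, otherwise it just bounces between the zones it is already in.

```python
# Illustrative sketch of the fix: only gather an under-dispersed replica
# when moving it could actually improve dispersion.

def should_gather_for_dispersion(replica_zones, parts_wanted_by_zone):
    """Gather a replica only if moving it could spread the partition out."""
    occupied = set(replica_zones)
    # The partition is under-dispersed if two replicas share a zone.
    undispersed = len(occupied) < len(replica_zones)
    if not undispersed:
        return False
    # Moving helps only if a zone the partition does NOT yet occupy has
    # parts_wanted; otherwise the replica would trade places pointlessly
    # on every rebalance.
    return any(wanted > 0
               for zone, wanted in parts_wanted_by_zone.items()
               if zone not in occupied)

# The scenario from the commit message: replicas in zones {2, 1, 2},
# zone 3 completely full (0 parts_wanted), zones 1 and 2 with room:
print(should_gather_for_dispersion([2, 1, 2], {1: 5, 2: 3, 3: 0}))  # False
# Same placement, but zone 3 has room, so gathering can help:
print(should_gather_for_dispersion([2, 1, 2], {1: 0, 2: 0, 3: 4}))  # True
```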

Changed in swift:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/ec)

Fix proposed to branch: feature/ec
Review: https://review.openstack.org/148983

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/ec)

Reviewed: https://review.openstack.org/148983
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=ef59cde83f176df9e36064f70142c9d8e81318fe
Submitter: Jenkins
Branch: feature/ec

commit 6f21504ccc9046e3f0b4db88f78297a00030dd3d
Author: Kota Tsuyuzaki <email address hidden>
Date: Tue Jan 13 05:34:37 2015 -0800

    Fix missing content length of Response

    This patch fixes swob.Response to set missing content
    length correctly.

    When a child class of swob.Response is initialized with both "body"
    and "headers" arguments, where "headers" includes a content length,
    swob.Response might lose the actual content length generated from
    the body, because "headers" will overwrite the content-length
    property after the body assignment.

    This causes a mismatch between the headers' content length and the
    actual body length. It would mainly affect third-party middleware
    that builds its own response as follows:

    req = swob.Request.blank('/')
    req.method = 'HEAD'
    resp = req.get_response(app)
    return HTTPOk(body='Ok', headers=resp.headers)

    This patch changes the order of headers updating and then fixes
    __init__() to set the correct content length.

    Change-Id: Icd8b7cbfe6bbe2c7965175969af299a5eb7a74ef
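A minimal sketch of the ordering fix described above (hypothetical class, not swob itself): the headers dict is applied *before* the body assignment, so the Content-Length computed from the actual body wins over any stale value carried in the reused headers.

```python
# Sketch of the fixed initialization order in a Response-like class.

class Response:
    def __init__(self, body=None, headers=None):
        self.headers = {}
        if headers:
            # A stale Content-Length from a reused headers dict (e.g.
            # copied from a HEAD response) lands here first...
            self.headers.update(headers)
        if body is not None:
            # ...and is then overwritten by the length of the real body,
            # so the header always describes the bytes actually sent.
            self.headers['Content-Length'] = str(len(body))
        self.body = body

# Reusing headers whose Content-Length described a different body,
# with a new two-byte body:
resp = Response(body='Ok', headers={'Content-Length': '12345'})
print(resp.headers['Content-Length'])  # '2'
```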

commit b434be452ead0625728afedfe01bac1c30629d30
Author: Donagh McCabe <email address hidden>
Date: Thu Jan 8 14:52:32 2015 +0000

    Use TCP_NODELAY on outgoing connections

    On a loopback device (e.g., when proxy-server and object-server are on
    the same node), PUTs in the range 64-200K may experience a delay due to
    the effect of Nagle's algorithm interacting with the loopback MTU of 64K.

    This effect has been directly seen by Mark Seger and Rick Jones on a
    proxy-server to object-server PUT. However, you could expect to see a
    similar effect on replication via ssync if the object being replicated
    is on a different drive on the same node.

    A prior change [1] related to Nagle set TCP_NODELAY on responses. This
    change sets it on all outgoing connections.

    [1] I11f86df1f56fba1c6ab6084dc1f580c395f072dc

    Change-Id: Ife8885a42b289a5eb4ac7e4698f8889858bc8b7e
    Closes-bug: 1408622
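What the patch enables can be shown with the standard library alone (the helper name is illustrative; Swift sets the option inside its own connection code): TCP_NODELAY disables Nagle's algorithm so small writes go out immediately instead of waiting to be coalesced.

```python
# Standard-library illustration of setting TCP_NODELAY on an outgoing
# connection, as the patch does for Swift's proxy-to-backend sockets.
import socket

def connect_nodelay(host, port):
    sock = socket.create_connection((host, port))
    # Without TCP_NODELAY, a small request body can sit in the kernel
    # buffer waiting for an ACK of the previous segment (Nagle), which
    # is the delay observed on 64-200K PUTs over loopback.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```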

commit b5586427e503ee22c0b20b109cad83e166ed3fd8
Author: Pete Zaitcev <email address hidden>
Date: Sat Jan 10 17:14:46 2015 -0700

    Drop redundant index output

    The output of format_device() now includes index as the first "dX"
    element, for example d1r1z2-127.0.0.1:6200R127.0.0.1:6200/db_"".

    Change-Id: Ib5f8e3a692fddbe8b1f4994787b2883130e9536f

commit c65bc49e099928801b80dce399d6098f7e10e137
Author: Pete Zaitcev <email address hidden>
Date: Sat Jan 10 08:20:25 2015 -0700

    Mark the --region as mandatory

    We used to permit omitting the region in the old parameter syntax,
    although we now throw a warning if it's missing. In the new parameter
    syntax, --region is mandatory. It's enforced by build_dev_from_opts in
    swift/common/ring/utils.py.

    On the other hand, --replication-ip, --replication-port, and --meta
    are not obligatory.

    Change-Id: Ia70228f2c99595501271765286431f68e82e800b
...

Thierry Carrez (ttx)
Changed in swift:
milestone: none → 2.2.2
status: Fix Committed → Fix Released