swift-ringbuilder rebalance moves 100% partitions when adding a new node to a new region

Bug #1367826 reported by Florent Flament
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Christian Schwede
Milestone: 2.2.0

Bug Description

When adding a new node to a new region of an existing Swift cluster,
100% of the partitions are moved. By contrast, when a node is added to
the current region, only the required partitions are moved.

Patch https://review.openstack.org/#/c/115441/ allows moving
partitions to a new region progressively. The drawback is that a lot
of useless traffic is generated inside the initial region.

Starting from a 5-node Swift cluster, when I add a new device with
weight 1000 to the current region 'r1', 18.75% of the partitions are
moved to this new device (as expected).
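The expected 18.75% follows directly from the weights; here is a quick sanity check (a back-of-the-envelope sketch, assuming the weights shown in the builder output below):

```python
# Five existing devices of weight 3000, one new device of weight 1000.
partitions = 262144
replicas = 3
total_weight = 5 * 3000 + 1000            # 16000
replica_slots = partitions * replicas     # 786432 replica assignments
# Share of replica slots the new device deserves by weight:
new_device_parts = replica_slots * 1000 // total_weight
moved_fraction = new_device_parts / partitions
print(new_device_parts)    # 49152
print(moved_fraction)      # 0.1875, i.e. 18.75%
```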

  $ swift-ring-builder object.builder
  object.builder, build version 5
  262144 partitions, 3.000000 replicas, 1 regions, 1 zones, 5 devices, 0.00 balance
  The minimum number of hours before a partition can be reassigned is 0
  Devices: id region zone ip address port replication ip replication port name weight partitions balance meta
               0 1 1 192.168.100.10 6000 192.168.100.10 6000 d1 3000.00 157287 0.00
               1 1 1 192.168.100.11 6000 192.168.100.11 6000 d1 3000.00 157286 -0.00
               2 1 1 192.168.100.12 6000 192.168.100.12 6000 d1 3000.00 157286 -0.00
               3 1 1 192.168.100.13 6000 192.168.100.13 6000 d1 3000.00 157287 0.00
               4 1 1 192.168.100.14 6000 192.168.100.14 6000 d1 3000.00 157286 -0.00
  $ swift-ring-builder object.builder add r1z1-192.168.100.15:6000/d1 1000
  Device d5r1z1-192.168.100.15:6000R192.168.100.15:6000/d1_"" with 1000.0 weight got id 5
  $ swift-ring-builder object.builder rebalance
  Reassigned 49152 (18.75%) partitions. Balance is now 0.00.
  $ swift-ring-builder object.builder
  object.builder, build version 7
  262144 partitions, 3.000000 replicas, 1 regions, 1 zones, 6 devices, 0.00 balance
  The minimum number of hours before a partition can be reassigned is 0
  Devices: id region zone ip address port replication ip replication port name weight partitions balance meta
               0 1 1 192.168.100.10 6000 192.168.100.10 6000 d1 3000.00 147456 0.00
               1 1 1 192.168.100.11 6000 192.168.100.11 6000 d1 3000.00 147456 0.00
               2 1 1 192.168.100.12 6000 192.168.100.12 6000 d1 3000.00 147456 0.00
               3 1 1 192.168.100.13 6000 192.168.100.13 6000 d1 3000.00 147456 0.00
               4 1 1 192.168.100.14 6000 192.168.100.14 6000 d1 3000.00 147456 0.00
               5 1 1 192.168.100.15 6000 192.168.100.15 6000 d1 1000.00 49152 0.00
  $

Now I start from the same builder as before, with my 5-node
cluster. When I add a new device with weight 1000 to a new region
'r2', 100% of the partitions (262144) are moved, whereas 18.75% were
expected, as when adding the node to region 'r1'. While it is true
that only 49152 partitions end up in the new region, many partitions
seem to have been moved uselessly between nodes of region 'r1'. This
would most likely generate heavy traffic on a running cluster.

  $ swift-ring-builder object.builder
  object.builder, build version 5
  262144 partitions, 3.000000 replicas, 1 regions, 1 zones, 5 devices, 0.00 balance
  The minimum number of hours before a partition can be reassigned is 0
  Devices: id region zone ip address port replication ip replication port name weight partitions balance meta
               0 1 1 192.168.100.10 6000 192.168.100.10 6000 d1 3000.00 157287 0.00
               1 1 1 192.168.100.11 6000 192.168.100.11 6000 d1 3000.00 157286 -0.00
               2 1 1 192.168.100.12 6000 192.168.100.12 6000 d1 3000.00 157286 -0.00
               3 1 1 192.168.100.13 6000 192.168.100.13 6000 d1 3000.00 157287 0.00
               4 1 1 192.168.100.14 6000 192.168.100.14 6000 d1 3000.00 157286 -0.00
  $ swift-ring-builder object.builder add r2z1-192.168.100.15:6000/d1 1000
  Device d5r2z1-192.168.100.15:6000R192.168.100.15:6000/d1_"" with 1000.0 weight got id 5
  $ swift-ring-builder object.builder rebalance
  Reassigned 262144 (100.00%) partitions. Balance is now 0.00.
  $ swift-ring-builder object.builder
  object.builder, build version 7
  262144 partitions, 3.000000 replicas, 2 regions, 2 zones, 6 devices, 0.00 balance
  The minimum number of hours before a partition can be reassigned is 0
  Devices: id region zone ip address port replication ip replication port name weight partitions balance meta
               0 1 1 192.168.100.10 6000 192.168.100.10 6000 d1 3000.00 147456 0.00
               1 1 1 192.168.100.11 6000 192.168.100.11 6000 d1 3000.00 147456 0.00
               2 1 1 192.168.100.12 6000 192.168.100.12 6000 d1 3000.00 147456 0.00
               3 1 1 192.168.100.13 6000 192.168.100.13 6000 d1 3000.00 147456 0.00
               4 1 1 192.168.100.14 6000 192.168.100.14 6000 d1 3000.00 147456 0.00
               5 2 1 192.168.100.15 6000 192.168.100.15 6000 d1 1000.00 49152 0.00
  $

The initial cluster was built with the following script:

  #!/bin/bash
  swift-ring-builder object.builder create 18 3 0

  IPs="192.168.100.10 192.168.100.11 192.168.100.12 192.168.100.13 192.168.100.14"

  for ip in $IPs
  do
      swift-ring-builder object.builder add r1z1-${ip}:6000/d1 3000
  done

  swift-ring-builder object.builder
  swift-ring-builder object.builder rebalance

Changed in swift:
assignee: nobody → Christian Schwede (cschwede)
status: New → In Progress
Revision history for this message
Christian Schwede (cschwede) wrote :

Florent, thanks for your extensive test.

I just had a quick look into this, and currently I think only the displayed percentage is wrong. The actual number of moved partitions from one device to another is quite low; I attached a small shell script that diffs the assigned partitions after each rebalance, and these numbers are very low (as expected).

Will have a deeper look into this tomorrow.
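The comparison described above can be sketched roughly as follows (a hypothetical stand-in for the attached script, which is not reproduced here; the table shapes mirror the ring's replica-to-partition-to-device assignment table):

```python
def count_moved_replicas(before, after):
    """Count replica assignments whose device actually changed.

    before/after: one list per replica, mapping partition index ->
    device id, captured before and after a rebalance.
    """
    moved = 0
    for old_row, new_row in zip(before, after):
        for old_dev, new_dev in zip(old_row, new_row):
            if old_dev != new_dev:
                moved += 1
    return moved

# Four partitions, three replicas; only partition 2 of replica 0 moved.
before = [[0, 1, 0, 1], [2, 3, 2, 3], [4, 4, 4, 4]]
after  = [[0, 1, 5, 1], [2, 3, 2, 3], [4, 4, 4, 4]]
print(count_moved_replicas(before, after))  # 1
```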

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/120713

Revision history for this message
Florent Flament (florentflament) wrote :

Hi Christian,
Thank you very much for your feedback.

I did some more testing based on the `sim.sh` script that you
provided. I uploaded an updated version, `sim2.sh`, which allows you
to specify the number of devices in the initial region. The partition
count is computed based on this number.

My findings are the following:

* You are right, the reassigned-partition count displayed by the
  `rebalance` command is wrong.

* In the case of a 3-device initial region, the number of partitions
  actually moved is almost optimal: 16 partitions are assigned to the
  new node, with a total of 17 partitions moved (3.3% of partitions).

* However, as the device count in the initial region grows, so does
  the number of partitions moved. In the following example, with a
  10-device initial region, 61 partitions were added to the new node,
  while a total of 1806 partitions were moved (88.2% of the
  partitions).

What do you think about that?

Best regards,
Florent Flament

Details
-------

$ sh sim2.sh 3
Devices count in the initial region: 3
Partitions count is: 512
+ swift-ring-builder test.builder add r2z0-127.0.0.1:6000/sdb1 3
Device d3r2z0-127.0.0.1:6000R127.0.0.1:6000/sdb1_"" with 3.0 weight got id 3
+ swift-ring-builder test.builder rebalance
Reassigned 512 (100.00%) partitions. Balance is now 5.21.
-------------------------------------------------------------------------------
NOTE: Balance of 5.21 indicates you should push this
      ring, wait at least 0 hours, and rebalance/repush.
-------------------------------------------------------------------------------
Partitions changes inside initial region: 18
Partitions count for new device: 16
Total partitions moved: 17 (3.3203125%)

$ sh sim2.sh 10
Devices count in the initial region: 10
Partitions count is: 2048
+ swift-ring-builder test.builder add r2z0-127.0.0.1:6000/sdb1 10
Device d10r2z0-127.0.0.1:6000R127.0.0.1:6000/sdb1_"" with 10.0 weight got id 10
+ swift-ring-builder test.builder rebalance
Reassigned 2048 (100.00%) partitions. Balance is now 0.28.
Partitions changes inside initial region: 3551
Partitions count for new device: 61
Total partitions moved: 1806 (88.18359375%)

Revision history for this message
Christian Schwede (cschwede) wrote :

Hello Florent,

you're right - there is still a lot of partition movement in the initial region, depending on the size of the cluster (it gets worse with bigger clusters).

I reopened a former abandoned patch and updated it: https://review.openstack.org/#/c/105666/

Best,

Christian

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/121422

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (master)

Change abandoned by Christian Schwede (<email address hidden>) on branch: master
Review: https://review.openstack.org/105666
Reason: Abandoning; https://review.openstack.org/#/c/121422/ is the better solution IMO.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/120713
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=09bdc87cbc1e7bc1918f9b5094bec266b6761d75
Submitter: Jenkins
Branch: master

commit 09bdc87cbc1e7bc1918f9b5094bec266b6761d75
Author: Christian Schwede <email address hidden>
Date: Thu Sep 11 08:01:51 2014 +0000

    Return correct number of changed partitions

    When a ring is rebalanced the number of changed partitions is counted.
    Before this patch partitions might be rebalanced, but actually no data
    is moved - for example, when a partition is assigned to the same device
    as before. This results in a wrong number of reassigned partitions that
    is shown to the user.

    This patch remembers the partition allocation before the rebalance, and
    compares it to the new allocation after a rebalance. Only partitions
    that are stored on a different device than before are counted.

    Partial-Bug: 1367826
    Also-By: Florent Flament <email address hidden>
    Change-Id: Iacfd514df3af351791f9191cef78cff1b3e2645f
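The idea of this fix can be illustrated with a toy version (illustrative only, not Swift's actual implementation): a partition counts as reassigned only if the set of devices holding its replicas changed, so replica swaps that move no data are ignored.

```python
def changed_partitions(old_table, new_table):
    """old_table/new_table: one list per replica, partition -> device id.
    A partition counts as changed only if its device set differs."""
    num_parts = len(old_table[0])
    changed = 0
    for part in range(num_parts):
        old_devs = {row[part] for row in old_table}
        new_devs = {row[part] for row in new_table}
        if old_devs != new_devs:
            changed += 1
    return changed

# Partition 0: replicas swapped between devices 1 and 2 -> no data moves.
# Partition 1: device 3 replaced by device 9 -> data moves.
old = [[1, 3], [2, 4]]
new = [[2, 9], [1, 4]]
print(changed_partitions(old, new))  # 1
```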

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/ec)

Fix proposed to branch: feature/ec
Review: https://review.openstack.org/124503

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/ec)

Reviewed: https://review.openstack.org/124503
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=bcaa00f25f3e8bd4123645256c535d0efe8a6733
Submitter: Jenkins
Branch: feature/ec

commit 15fbf9fe7cf33ed4b56569078400a2ba070b6bce
Author: paul luse <email address hidden>
Date: Thu Sep 11 06:55:45 2014 -0700

    Add container_count to policy_stat table

    Start tracking the container count per policy including reporting
    it in account HEAD and supporting installations where the DB
    existed before the updated schema.

    Migration is triggered by the account auditor; if the database is
    un-migrated it will continue to report policy_stats without the per
    policy container_count keys.

    Closes-Bug: #1367514
    Change-Id: I07331cea177e19b3df303609a4ac510765a19162

commit d10462e8704e7f5efe03c4024b541a3780348615
Author: Darrell Bishop <email address hidden>
Date: Tue Sep 23 09:11:44 2014 -0700

    Fix profile tests to clean up its tempdirs.

    Change-Id: I363651046cf414e14f15affd834043aabd5427c0

commit b68258a322cb004151b84584d00b3c76ee01bc29
Author: Mahati Chamarthy <email address hidden>
Date: Fri Sep 5 01:42:17 2014 +0530

    Added instructions to create a label or UUID to the XFS volume and mount using it.

    Change-Id: Idcaf16a278d6c34770af9b1f17d69bdd94fb86b7

commit 4d23a0fcf5faa6339a1a58fcbdab8687a6c88feb
Author: Samuel Merritt <email address hidden>
Date: Thu Aug 28 09:39:38 2014 -0800

    Reject overly-taxing ranged-GET requests

    RFC 7233 says that servers MAY reject egregious range-GET requests
    such as requests with hundreds of ranges, requests with non-ascending
    ranges, and so on.

    Such requests are fairly hard for Swift to process. Consider a Range
    header that asks for the first byte of every 10th MiB in a 4 GiB
    object, but in some random order. That'll cause a lot of seeks on the
    object server, but the corresponding response body is quite small in
    comparison to the workload.

    This commit makes Swift reject, with a 416 response, any ranged GET
    request with more than fifty ranges, more than three overlapping
    ranges, or more than eight non-increasing ranges.

    This is a necessary prerequisite for supporting multi-range GETs on
    large objects. Otherwise, a malicious user could construct a Range
    header with hundreds of byte ranges where each individual byterange
    requires the proxy to contact a different object server. If seeking
    all over a disk is bad, connecting all over the cluster is way worse.

    DocImpact

    Change-Id: I4dcedcaae6c3deada06a0223479e611094d57234

commit 5efdab6055bc99638e4e1388bef685b19682025d
Author: OpenStack Proposal Bot <email address hidden>
Date: Mon Sep 22 06:07:56 2014 +0000

    Imported Translations from Transifex

    Change-Id: Ibd8882766a87c6d77e786f7635b1290391e43f10

commit 4faf1702706b289521329243e5802cf86e54c7f7
Author: Lorcan <email address hidden>
Date: Tue Sep 2 18:12:18 2014 +0100

    Add "--no-overlap" option to swift-dispersion populate

    This change allows the user to use a "--no-overlap" parameter w...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/121422
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=20e9ad538b6ba90674a75310d9cc184162ba9398
Submitter: Jenkins
Branch: master

commit 20e9ad538b6ba90674a75310d9cc184162ba9398
Author: Christian Schwede <email address hidden>
Date: Sun Sep 14 20:48:32 2014 +0000

    Limit partition movement when adding a new tier

    When adding a new tier (region, zone, node, device) to an existing,
    already balanced ring all existing partitions in the existing tiers of
    the same level are gathered for reassigning, even when there is
    not enough space in the new tier. This will create a lot of unnecessary
    replication traffic in the backend network.

    For example, when only one region exists in the ring and a new region is
    added, all existing parts are selected to reassign, even when the new
    region has a total weight of 0. Same for zones, nodes and devices.

    This patch limits the number of partitions that are chosen to reassign
    by checking for devices on other tiers that are asking for more
    partitions.

    Failed devices are not considered when applying the limit.

    Co-Authored By: Florent Flament <email address hidden>
    Change-Id: I6178452e47492da4677a8ffe4fb24917b5968cd9
    Closes-Bug: 1367826
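A toy sketch of the limiting idea (names are illustrative, not Swift's internals): only gather as many partitions for reassignment as devices in other tiers are actually asking for.

```python
def partitions_to_gather(parts_wanted):
    """parts_wanted: device id -> how many more partitions that device
    wants (negative means it already has too many). Only positive
    demand should trigger gathering partitions from existing devices."""
    return sum(max(0, wanted) for wanted in parts_wanted.values())

# One small device in a new region asks for 49152 partitions; the old
# devices want none. At most 49152 partitions should be gathered,
# instead of all 262144 as before the fix.
demand = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 49152}
print(partitions_to_gather(demand))  # 49152
```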

Changed in swift:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in swift:
milestone: none → 2.2.0-rc1
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/ec)

Fix proposed to branch: feature/ec
Review: https://review.openstack.org/126595

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/ec)

Reviewed: https://review.openstack.org/126595
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=06800cbe446ce4c937a57b69517b55c3bba9b6e1
Submitter: Jenkins
Branch: feature/ec

commit 7528f2b22169e90fe8ddd19b7ef7d46ecff5d231
Author: Christian Schwede <email address hidden>
Date: Mon Oct 6 10:01:03 2014 +0000

    Fix minor typo

    Fixes minor typo in one method and adds missing parameter in other
    method. Only checked swift/container/reconciler.py for now.

    Change-Id: I5c648010f09b6e4b1fb0380bc97b266e680602f8

commit 94fd95ba30c72fbcb03367aaa8da407a408948d5
Author: OpenStack Proposal Bot <email address hidden>
Date: Sat Oct 4 06:07:47 2014 +0000

    Imported Translations from Transifex

    Change-Id: I31b5e6b0f2922150902e1bfa52144302ee0c7a8e

commit d6a827792619f3343af07fc2519f4253fbdc67f7
Author: John Dickinson <email address hidden>
Date: Fri Oct 3 10:17:00 2014 -0400

    updated AUTHORS and CHANGELOG for 2.2.0

    Change-Id: I6c0bc1570f6a48439de5a029a86f1b582f30f8a6

commit 5b2c27a5874c2b5b0a333e4955b03544f6a8119f
Author: Richard (Rick) Hawkins <email address hidden>
Date: Wed Oct 1 09:37:47 2014 -0400

    Fix metadata overall limits bug

    Currently metadata limits are checked on a per request basis. If
    multiple requests are sent within the per request limits, it is
    possible to exceed the overall limits. This patch adds an overall
    metadata check to ensure that multiple requests to add metadata to
    an account/container will check overall limits before adding
    the additional metadata.

    Change-Id: Ib9401a4ee05a9cb737939541bd9b84e8dc239c70
    Closes-Bug: 1365350

commit 301a96f664d58b4ccad8e3cbf5d5a889cc76790f
Author: Jay S. Bryant <email address hidden>
Date: Tue Sep 30 15:08:59 2014 -0500

    Ensure sys.exit called in fork_child after exception

    Currently, the fork_child() function in auditor.py does not
    handle the case where run_audit() encounters an exception
    properly.

    A simple case is where the /srv directory is set
    with permissions such that the 'swift' user cannot access it.
    Such a situation causes a os.listdir() to return an OSError
    exception. When this happens the fork_child() process does not
    run to completion and sys.exit() is not executed. The process
    that was forked off continues to run as a result. Execution goes
    back up to the audit_loop function which restarts the whole process. The
    end result is an increasing number of processes on the system
    until the parent is terminated. This can quickly exhaust the
    process descriptors on a system.

    This change wraps run_audit() in a try block and adds an
    exception handler that prints what exception was encountered.
    The sys.exit() was moved to a finally: block so that it will
    always be run, avoiding the creation of zombies.

    Change-Id: I89d7cd27112445893852e62df857c3d5262c27b3
    Closes-bug: 1375348

commit 6d49cc3092168de6d22378557b2c37ea4063beeb
Author: Samuel Merritt <email address hidden>
Date: Thu Oct 2 17:14:58 2014 -0400

    Fix ring-builder crash.

    If you adjust ...

Thierry Carrez (ttx)
Changed in swift:
milestone: 2.2.0-rc1 → 2.2.0