Let admins add a region without melting their cluster

Bug #1360706 reported by OpenStack Infra
This bug affects 1 person
Affects            Status   Importance  Assigned to  Milestone
openstack-manuals  Invalid  Medium      Unassigned

Bug Description

https://review.openstack.org/116394
commit 6d77c379bd8f51e520896967cdcfc9add3d934a4
Author: Samuel Merritt <email address hidden>
Date: Tue Aug 19 16:44:56 2014 -0700

    Let admins add a region without melting their cluster

    Prior to this commit, swift-ring-builder would place partitions on
    devices by first going for maximal dispersion and breaking ties with
    device weight. This commit flips the order so that device weight
    trumps dispersion.

    Note: if your ring can be balanced, you won't see a behavior
    change. It's only when device weights and maximal-dispersion come into
    conflict that this commit changes anything.

    Example: a cluster with two regions. Region 1 has a combined weight of
    1000, while region 2 has a combined weight of only 400. The ring has 3
    replicas and 2^16 partitions.

    Prior to this commit, the balance would look like so:

      Region 1: 2 * 2^16 partitions
      Region 2: 2^16 partitions

    After this commit, the balance will be:

      Region 1: 10/14 * 3 * 2^16 partitions (more than before)
      Region 2: 4/14 * 3 * 2^16 partitions (fewer than before)

    One consequence of this is that some partitions will not have a
    replica in region 2, since it's not big enough to hold all of them.

    This way, a cluster operator can add a new region to a single-region
    cluster in a gradual fashion so as not to destroy their WAN link with
    replication traffic. As device weights are increased in the second
    region, more replicas will shift over to it. Once its weight is half
    that of the first region's, every partition will have a replica there.
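
The arithmetic in the example above can be checked with a short sketch. This is not swift-ring-builder code: the function name and the dict-of-weights input are made up for illustration, and it assumes replicas are split across regions purely in proportion to weight, which is the behavior the commit message describes for rings where weight and dispersion conflict.

```python
from fractions import Fraction


def replicas_per_partition(region_weights, replicas):
    """Average number of replicas of each partition that land in each region.

    Hypothetical helper: assumes a purely weight-proportional split and
    integer region weights.
    """
    total_weight = sum(region_weights.values())
    return {
        region: Fraction(weight, total_weight) * replicas
        for region, weight in region_weights.items()
    }


# The example from the commit message: weights 1000 and 400, 3 replicas.
small = replicas_per_partition({"region1": 1000, "region2": 400}, replicas=3)
print(small["region2"])   # 6/7, less than 1, so some partitions skip region 2

# Once region 2 reaches half the weight of region 1:
half = replicas_per_partition({"region1": 1000, "region2": 500}, replicas=3)
print(half["region2"])    # 1, so every partition can have a replica there
```

Multiplying these per-partition averages by 2^16 gives the per-region totals for the ring in the example.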

    DocImpact

    Change-Id: I945abcc4a2917bb12be554b640f7507dd23cd0da

commit 8fecf490fe3e1befaa5e3c1a09b28bda50a90b47
Author: Samuel Merritt <email address hidden>
Date: Tue Aug 19 14:50:44 2014 -0700

    Respect device weights when adding replicas

    Previously, any new (partition, replica) pairs created by adding
    replicas were spread evenly across the whole ring. Now it respects
    device weights, so more (partition, replica) pairs are placed on
    higher-weight devices.

    Note that this only affects partition assignment *when the replica
    count is increased*. Normal adding of disks without changing the
    replica count is, and was, just fine.

    The cause was that the devices' parts_wanted values weren't being
    updated when the replica count went up. Since adding replicas makes
    more things to assign, each device's desired share should have gone
    up, but it didn't. This appears to be a simple oversight on the part
    of the original author (me).
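
To make the parts_wanted point concrete, here is a minimal sketch of how a device's desired share grows with the replica count. It is an illustration under assumptions, not the ring builder's actual code: the device dicts, the `parts_wanted` function, and the sample numbers are all invented here, and the formula assumed is "weight-proportional share of parts * replicas, minus what the device already holds".

```python
def parts_wanted(devices, parts, replicas):
    """Return {device id: how many more (partition, replica) pairs it wants}.

    Hypothetical helper: each device wants a weight-proportional share of
    parts * replicas, less the pairs it already holds.
    """
    total_weight = sum(dev["weight"] for dev in devices)
    wanted = {}
    for dev in devices:
        desired = int(parts * replicas * dev["weight"] / total_weight)
        wanted[dev["id"]] = desired - dev["parts"]
    return wanted


# Two devices, balanced for 2 replicas of 1024 partitions.
devices = [
    {"id": 0, "weight": 100, "parts": 512},
    {"id": 1, "weight": 300, "parts": 1536},
]

print(parts_wanted(devices, parts=1024, replicas=2))  # {0: 0, 1: 0}
# Raising the replica count creates 1024 new pairs; recomputing the desired
# share sends three quarters of them to the heavier device.
print(parts_wanted(devices, parts=1024, replicas=3))  # {0: 256, 1: 768}
```

If the values were left at their 2-replica levels, both devices would report wanting nothing, which matches the symptom described above: the new pairs had no weight-based preference to follow.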

    Change-Id: Idab14be90fab243c1077a584396a9981a4bd8638

commit 8526a071909f32835ca04d4114a77c486f936e29
Author: paul luse <email address hidden>
Date: Tue Aug 19 07:10:06 2014 -0700

    Fix sporadic false failure in xprofile unit test code (master)

    Same fix as is going through on feature/ec, fixed it there first
    as it was happening often and no point in waiting until EC is
    complete before getting it over to master...

    It appears that the middleware failure seen lately on feature/ec
    is caused by a hardcoded PID in the test code itself, which can
    make a profile file exist when the test does not expect it to.

    Test code used a PID of 135 and based on how get_logfiles()
    is written, any real PID that starts with 135 will cause a
    false failure in test_call(). This can be seen via inspection
    and confirmed in logfiles where all captured assertions show
    a profile filename beginning with 135. Tried getting smarter
    about choosing a fake PID (int) but then decided it was 100%
    safe to use 'ABC' for this test since that'll never show up!
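
The failure mode can be reproduced with a small sketch. The real get_logfiles() is not shown in this report, so the function below is a hypothetical stand-in that assumes profile files are matched by a PID prefix, as the description implies; the filename is also invented.

```python
def find_profile_logs(filenames, pid):
    """Hypothetical stand-in for a prefix-based lookup of profile files."""
    prefix = str(pid)
    return [name for name in filenames if name.startswith(prefix)]


# A profile file left behind by a real process whose PID happens to be 1358.
leftover = ["1358.dump"]

# The fake PID 135 matches it by prefix, so the test sees an unexpected file.
print(find_profile_logs(leftover, 135))    # ['1358.dump']

# A non-numeric fake PID can never be the prefix of a real, numeric PID.
print(find_profile_logs(leftover, "ABC"))  # []
```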

    Change-Id: I958f1525c2727b3fb3f533242fa509fa8e59926c

commit 75a329c7f54fc3b06e11dea1b395d5e552ce0230
Author: John Dickinson <email address hidden>
Date: Sun Aug 17 21:08:57 2014 -0700

    add the account management config values to swift info

    added account_autocreate and allow_account_management to
    the /info endpoint

    Change-Id: I4b239c9cefb728c3c93bf75cad065c72edf2fc0a

commit 0abd2cba035f9f9ab0970708ce5187bfb52462e5
Author: David Goetz <email address hidden>
Date: Mon Aug 11 15:08:18 2014 -0700

    Shard expiring object container

    All the expiring objects for a given X-Delete-At are funnelled into the
    same expiring object container, which can act as a bottleneck.

    Change-Id: I288a177a7ae3e213c727a2a81fa76d4ef9cf7eb3

commit a0e0014159c0e12a10cc452b92e86f99196e77bb
Author: David Goetz <email address hidden>
Date: Mon Aug 11 09:43:13 2014 -0700

    Sleep for longer at a time in lock_path.

    When lock_path is called and the lock stays contended for the whole
    10 seconds, flock is called 1000 times. With this patch, the short
    0.01-second sleep is used only for the first 1% of the total lock
    time; after that, each sleep lasts 1% of the total lock time.
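
The effect on the number of flock calls can be estimated with a small simulation. This is a sketch of the sleep schedule only, not the lock_path implementation: the function name is invented, time is counted in whole milliseconds to avoid float drift, and the exact point where the longer sleep kicks in is an assumption.

```python
def count_flock_attempts(timeout_ms, short_sleep_ms=10, slow_down=True):
    """Count failed non-blocking flock tries over a fully contended timeout.

    Hypothetical simulation: no file is locked, only the sleeps are added up.
    """
    slowdown_at_ms = timeout_ms // 100                    # first 1% of the timeout
    slow_sleep_ms = max(timeout_ms // 100, short_sleep_ms)
    slept_ms = 0
    attempts = 0
    while slept_ms < timeout_ms:
        attempts += 1                                     # one failed flock try
        if slow_down and slept_ms >= slowdown_at_ms:
            slept_ms += slow_sleep_ms                     # 1% of the timeout
        else:
            slept_ms += short_sleep_ms                    # the short 0.01 s sleep
    return attempts


print(count_flock_attempts(10_000, slow_down=False))  # 1000, the old behavior
print(count_flock_attempts(10_000))                   # 109 with the slowdown
```

Under these assumptions a lock that stays contended for the full 10 seconds costs roughly a tenth as many flock calls, while a lock that frees up quickly is still picked up within about 0.01 seconds.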

    Change-Id: Ibed6bdb49bddcdb868742c41f86d2482a7edfd29

commit d8b3e16c038eac0f3b691ceca88b9bf7a0b631e1
Author: Alistair Coles <email address hidden>
Date: Mon Aug 4 17:22:08 2014 +0100

    Update tempurl docstring with methods config option

    Adds a description of the methods config option to the
    docstring.

    Also fixes description of swift.source value and a
    couple of other typos.

    Change-Id: If3489087df020536ff663ffe4e249c300ea2d506

commit e114a628dcaf59e2e4649dff10916b78fead4ba9
Author: Clay Gerrard <email address hidden>
Date: Tue Jul 1 11:23:48 2014 -0700

    Remove fake _get_part and use the real thing

    The get_part method is fast and stable given a consistent hash path
    suffix/prefix, so there's no absolute requirement for the fake
    implementation other than convenience. OTOH, removing the fake
    implementation and fixing the tests that were relying on it should make
    it easier to write better tests going forward and harder to hide bugs
    that don't show up when using the fakes.

    There may be some overhead when writing new tests that use the ring if
    you're making assertions on partitions or paths, but with a part power
    of zero it's normally trivially obvious when a 1 needs to be a 0 or vice
    versa. Or you can just drop the assertions around the parts you were
    faking anyway.

    Change-Id: I8bfc388a04eff6491038991cdfd7686c9d961545

commit ba38ba5df54b8eb24a371f5ffd19a44a5b7eeed7
Author: Christian Schwede <email address hidden>
Date: Fri Jun 13 10:33:03 2014 +0000

    Fix object auditor recon and logging

    1. Nothing is logged until at least one audit takes longer than
       log_time. If the audit runtime never exceeds this value (3600
       seconds by default), nothing is logged and the recon entry is never
       updated. This happens especially on very fast disks with low usage
       and/or when only a few disks are audited (for example, using the
       --devices parameter on the command line).

       This patch changes this to log and update the recon cache entry
       at least one time after the first device audit.

    2. If device_dirs is set, the recon entry will be deleted after all
       devices are audited.

    Change-Id: Ifa504d21389b3a5f7eaf914b19d6e26543dac121

Tags: swift
Tom Fifield (fifieldt)
Changed in openstack-manuals:
milestone: none → juno
status: New → Triaged
importance: Undecided → Medium
Tom Fifield (fifieldt)
Changed in openstack-manuals:
status: Triaged → Invalid
milestone: juno → none