Comment 9 for bug 1452431

OpenStack Infra (hudson-openstack) wrote : Related fix merged to swift (master)

Reviewed: https://review.openstack.org/222799
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=5070869ac0e6a2d577dd4054ffbcbffd06db3c5b
Submitter: Jenkins
Branch: master

commit 5070869ac0e6a2d577dd4054ffbcbffd06db3c5b
Author: Clay Gerrard <email address hidden>
Date: Fri Sep 11 16:24:52 2015 -0700

    Validate against duplicate device part replica assignment

    We should never assign multiple replicas of the same partition to the
    same device - our on-disk layout can only support a single replica of a
    given part on a single device. So we now validate against that and
    raise a loud warning if this terrible state is ever observed after a
    rebalance.

    Unfortunately there are currently a couple of not-necessarily-uncommon
    scenarios which will trigger this state:

     1. If we have fewer devices than replicas
     2. If a server's or zone's aggregate device weight makes it the most
        appropriate candidate for multiple replicas and you're a bit unlucky

    Fixing #1 would be easy - we should just not allow that state anymore.
    Really we never did - if you have a 3 replica ring with one device, you
    have one replica; everything that went through iter_nodes would
    de-dupe. We should just insist that you explicitly acknowledge your
    replica count with set_replicas.
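
    As a rough illustration of the pigeonhole argument behind #1 (a
    hypothetical helper for exposition, not anything in RingBuilder):

        def effective_replicas(replica_count, device_count):
            # with fewer devices than replicas some device must hold
            # more than one replica of some part, and the on-disk
            # layout collapses those to a single copy - so the real
            # replica count is bounded by the device count
            return min(replica_count, device_count)

        # e.g. a "3 replica" ring with one device really has 1 replica;
        # the point is to make the operator acknowledge that explicitly
        # (via set_replicas) instead of pretending otherwise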

    I have been lost in the abyss for days searching for a general solution
    to #2. I'm sure it exists, but I will not have wrestled it into
    submission by RC1. In the meantime we can eliminate a great deal of the
    luck required simply by refusing to place more than one replica of a
    part on a device in assign_parts.
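
    A minimal sketch of that refusal (simplified pseudocode in the spirit
    of assign_parts, not the real implementation):

        def pick_dev_for_replica(part, candidate_devs, assignment):
            # assignment[part] is the set of dev ids already holding a
            # replica of this part; skip those candidates so we never
            # put two replicas of one part on the same device
            used = assignment[part]
            for dev in candidate_devs:  # best candidates first
                if dev['id'] not in used:
                    used.add(dev['id'])
                    return dev
            # nothing left but already-used devices - the caller backs
            # off and warns rather than silently duplicating
            return None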

    The meat of the change is a small update to the .validate method in
    RingBuilder. It basically unrolls a pre-existing (part, replica) loop
    so that all the replicas of a part come out in order, which lets us
    build up, part by part, the set of dev_ids to which all the replicas
    of a given part are assigned.

    If we observe any duplicates - we raise a warning.
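
    Conceptually the duplicate check looks something like this (a
    simplified sketch, not the actual .validate code; it assumes the
    builder's internal replica2part2dev table indexed [replica][part]):

        import warnings

        def check_duplicate_part_replicas(replica2part2dev):
            replica_count = len(replica2part2dev)
            part_count = len(replica2part2dev[0]) if replica_count else 0
            for part in range(part_count):
                devs_for_part = set()
                for replica in range(replica_count):
                    dev_id = replica2part2dev[replica][part]
                    if dev_id in devs_for_part:
                        warnings.warn('part %d has multiple replicas '
                                      'on dev %d' % (part, dev_id))
                    devs_for_part.add(dev_id)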

    To clean the cobwebs out of the rest of the corner cases we delay
    get_required_overload from kicking in until we achieve dispersion, and
    we add a small check when selecting a device subtier to see whether it
    is already being used - picking any other device in the tier works out
    much better. If no other devices are available in the tier, we raise a
    warning. A more elegant or optimized solution may exist.
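
    The overload gating is roughly this shape (a sketch of the intent,
    assuming a dispersion measure on the builder where 0 means fully
    dispersed; not the actual rebalance code):

        def effective_overload(builder, max_dispersion=0.0):
            # hold back the extra overload until the ring has actually
            # achieved dispersion, so the overload logic can't help
            # push multiple replicas of a part onto one device
            if builder.dispersion > max_dispersion:
                return 0.0
            return builder.get_required_overload()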

    Many unit tests did not meet criterion #1, but the fix was
    straightforward once the pigeonhole check identified them.

    However, many more tests were affected by #2 - but again the fix was
    simply to add more devices. The fantasy that all failure domains
    contain at least as many devices as replicas is prevalent in both our
    ring placement algorithm and its tests. These tests were trying to
    demonstrate some complex characteristics of our ring placement
    algorithm, and I believe we just got a bit too carried away trying to
    find the simplest possible example to demonstrate the desirable trait.
    A better example looks more like a real ring - with many devices in
    each server and many servers in each zone - more devices make the
    tests better. As much as possible I've tried to maintain the original
    intent of the tests - when adding devices I've either spread the
    weight out amongst them or added proportional weights to the other
    tiers.

    I added an example straw-man test to validate that three devices with
    different weights in three different zones won't blow up. Once we can
    do that without raising warnings or assigning duplicate device part
    replicas, we can add more - and more importantly change the warnings
    to errors, because we would much prefer to not do that #$%^ anymore.
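
    A first test along those lines might look roughly like this (a sketch
    using the public RingBuilder API; the device parameters are just
    illustrative):

        from swift.common.ring import RingBuilder

        def test_three_zones_three_weights_no_duplicates():
            # part_power=8, replicas=3, min_part_hours=1
            rb = RingBuilder(8, 3, 1)
            for i, weight in enumerate((100, 75, 50)):
                rb.add_dev({'id': i, 'region': 0, 'zone': i,
                            'weight': weight, 'ip': '127.0.0.1',
                            'port': 6000 + i, 'device': 'sda'})
            rb.rebalance()
            # validate() should not warn about any part having more
            # than one replica on the same device
            rb.validate()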

    Co-Authored-By: Kota Tsuyuzaki <email address hidden>
    Related-Bug: #1452431
    Change-Id: I592d5b611188670ae842fe3d030aa3b340ac36f9