min_part_hours of 0 is not helpful - fix it

Bug #1586167 reported by clayg
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Medium
Assigned to: Cheng Li
Milestone: (none)

Bug Description

As more people move to monitoring partition placement in their clusters independently to control when it's "safe" to rebalance a ring, min_part_hours is starting to create more and more surprising behaviors (e.g. lp bug #1558754).

Unfortunately a min_part_hours of 0 is a non-starter because of its tendency to move more than one replica of a part in the same rebalance [1] - which is *related* to the constraint of min_part_hours but somewhat orthogonal. Even if a part hasn't moved in >24 hours it's still never sane to move more than one part-replica in the same rebalance cycle. No matter how small your min_part_hours, it should still require at least N rebalances to move N part-replicas of the same part.

I wouldn't necessarily *recommend* running your builders with min-part-hours of 0 - but if you're fully automating dispersion-report-style partition placement monitoring and tying that into automated ring rebalancing, it's starting to look more attractive to take full control of how frequently part-replicas are allowed to move, separately from min_part_hours. Essentially, as soon as parts are on the right nodes - if you have pending changes to the ring (changed weights on devices you're gradually adding/removing, removed failed devices or entire nodes, etc.) - it's best to allow those changes to proceed as soon as the replication from the previous rebalance is "finished". Worse still, if a topology change coupled with device failures causes new capacity to receive parts that are not optimal for dispersion (because the best parts are locked down by min_part_hours), you may end up in a placement hole that's tedious for the current rebalance algorithm to dig out of (see all the "part_swapping" tests in test.unit.common.ring.test_builder).

It's possible to run with a *non-zero* min_part_hours and use pretend_min_part_hours_passed before each rebalance to achieve essentially this behavior, but it's less obvious why that's better than just "fixing" min_part_hours 0.
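
For illustration, that workaround looks roughly like this against the RingBuilder API (a sketch only; the object.builder path is just an example, and it assumes external monitoring has already confirmed replication from the previous rebalance is finished):

```
# Sketch: force a rebalance once external monitoring says it's safe,
# regardless of how recently parts last moved.
from swift.common.ring import RingBuilder

builder = RingBuilder.load('object.builder')
builder.pretend_min_part_hours_passed()  # reset the per-part move timers
builder.rebalance()
builder.save('object.builder')
```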

Discuss.

1. Here's an obvious example:

#!/bin/bash
rm test.builder || true
# part_power=8, replicas=3, min_part_hours=0
swift-ring-builder test.builder create 8 3 0
swift-ring-builder test.builder add r1z1-127.0.0.1:6000/d1 1.0
swift-ring-builder test.builder add r1z1-127.0.0.1:6000/d2 1.0
swift-ring-builder test.builder add r1z1-127.0.0.1:6000/d3 1.0
swift-ring-builder test.builder rebalance
# double the capacity on a second node, then rebalance again immediately
swift-ring-builder test.builder add r1z1-127.0.0.2:6000/d4 1.0
swift-ring-builder test.builder add r1z1-127.0.0.2:6000/d5 1.0
swift-ring-builder test.builder add r1z1-127.0.0.2:6000/d6 1.0
swift-ring-builder test.builder rebalance
if [ $? -eq 0 ]; then
    echo "Should not have reassigned >100% of part-replicas!"
    exit 1
fi

Revision history for this message
Cheng Li (shcli) wrote :

Hi clayg,

I can think of two solutions to this problem. I prefer the first one, but I don't know whether it will cause other problems.

Proposal 1.
One hour is very long, but one *second* is not. Could we change the unit of elapsed time to seconds instead of hours?

For example, this code:
https://github.com/openstack/swift/blob/master/swift/common/ring/builder.py#L847-L858
could be replaced with the following lines:
```
        elapsed_seconds = int(time() - self._last_part_moves_epoch)
        if elapsed_seconds <= 0:
            return
        for part in range(self.parts):
            last_plus_elapsed = self._last_part_moves[part] + elapsed_seconds
            if last_plus_elapsed < 0xff * 3600:
                self._last_part_moves[part] = last_plus_elapsed
            else:
                self._last_part_moves[part] = 0xff * 3600
        self._last_part_moves_epoch = int(time())
```

and
```
self._last_part_moves[part] < self.min_part_hours
```
would be replaced by
```
self._last_part_moves[part] <= self.min_part_hours * 3600
```

Proposal 2.
Add an array to track whether a partition has already been moved during the current rebalance (a combined sketch follows these fragments).

```
self._last_part_moves[part] = 0
```

will be replaced with:

```
self._last_part_moves[part] = 0
self._part_moved[part] = True
```

```
if self._last_part_moves[part] < self.min_part_hours
```

would be replaced with:
```
if self._last_part_moves[part] < self.min_part_hours or self._part_moved[part]
```
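
Pulling those fragments together, here's a minimal self-contained sketch of the idea (the class and method names are illustrative, not the actual builder code):

```
class MoveGate(object):
    """Illustrative sketch of Proposal 2, not actual swift code."""
    def __init__(self, parts, min_part_hours):
        self.parts = parts
        self.min_part_hours = min_part_hours
        self._last_part_moves = bytearray([0xff] * parts)  # hours since last move
        self._part_moved = [False] * parts  # reset at the start of every rebalance

    def start_rebalance(self):
        self._part_moved = [False] * self.parts

    def can_move(self, part):
        # movable only if min_part_hours has passed *and* no replica of this
        # part has already moved in the current rebalance
        return (self._last_part_moves[part] >= self.min_part_hours
                and not self._part_moved[part])

    def record_move(self, part):
        self._last_part_moves[part] = 0
        self._part_moved[part] = True
```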

Changed in swift:
assignee: nobody → Cheng Li (shcli)
Revision history for this message
clayg (clay-gerrard) wrote :

I'm not sure exactly how the first proposal works; the trick seems to be adding granularity to _last_part_moves - which is currently just a cheap byte array.

The second proposal seems quite reasonable to me and was indeed how I was thinking the problem would be approached. We'd only need to keep it around during the rebalance; depending on how many parts move in a rebalance, a 2 ** part_power bitmap might be more attractive than a huge Python dictionary of booleans. This would change the behavior of rings with min_part_hours = 0, but for the better, as I reason it.

Revision history for this message
Cheng Li (shcli) wrote :

Yes, the second one is clearer. And the idea of a bitmap is great.

Revision history for this message
Cheng Li (shcli) wrote :

I have searched the current swift code but didn't find any bitmap-related code.
Does swift/openstack have a standard for using bitmaps?

Revision history for this message
clayg (clay-gerrard) wrote :

... not to my knowledge? Maybe just "do it"?

parts_moved = 0

# set part
parts_moved |= 1 << part

# test part
parts_moved & 1 << part

I was poking at it with sys.getsizeof(parts_moved), comparing parts_moved[random.randint(0, 2 ** 16)] = True (a dict) vs parts_moved |= 1 << random.randint(0, 2 ** 16) (a big int), and even after only a few hundred parts I think we might be better off with the big num than the dict?

... maybe there's something better.
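
Roughly, that comparison looks like this (a sketch; the part power and the number of moved parts are arbitrary, and getsizeof() on a dict doesn't count its keys and values, so it understates the dict's real footprint):

```
import random
import sys

part_power = 16
moved_dict = {}   # dict-of-booleans approach
moved_bits = 0    # big-int bitmap approach

for _ in range(500):  # "a few hundred parts"
    part = random.randint(0, 2 ** part_power - 1)
    moved_dict[part] = True
    moved_bits |= 1 << part

print('dict: %d bytes, big int: %d bytes' % (
    sys.getsizeof(moved_dict), sys.getsizeof(moved_bits)))
```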

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/346475

Changed in swift:
status: New → In Progress
clayg (clay-gerrard)
Changed in swift:
importance: Undecided → Medium
Revision history for this message
Matthew Oliver (matt-0) wrote :

I remember playing with a bitmap idea back in sharding POC #1, where I'd track which container shards existed at certain shard powers. I don't know how useful it would be, but here is a link to the basic Bitmap class I created for it. It was never fully tested though, and unlike just a number, it used a string - really only a hex string (each character represents 4 partitions). I don't know why I didn't just go one step further and use ints and bit shifting (you'd have to ask past Matt) - so you could say this is somewhere in between the 2 solutions.

But either way, seeing as POC #1 is a no-go, maybe the Bitmap class that was never used could be fixed, tuned and improved: https://github.com/matthewoliver/swift/blob/sharding/swift/common/utils.py#L3324

Matt

Revision history for this message
Cheng Li (shcli) wrote :

Thanks, Matt. In the patch I used bytes directly, because I didn't need many bitmap features, just set and get methods.
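
For readers following along, a bytes-backed "moved" bitmap with just set and get looks something like this (an illustrative sketch along the lines described above, not the exact patch code - see the review for the real implementation):

```
class PartMovedBitmap(object):
    """Sketch of a one-bit-per-partition "moved" map backed by a bytearray."""
    def __init__(self, part_power):
        # one bit per partition, packed eight to a byte
        self.bitmap = bytearray(max(2 ** part_power // 8, 1))

    def set(self, part):
        self.bitmap[part // 8] |= 128 >> (part % 8)

    def get(self, part):
        return bool(self.bitmap[part // 8] & (128 >> (part % 8)))

# e.g.
# bm = PartMovedBitmap(8)
# bm.set(42)
# bm.get(42)  # True
```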

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/346475
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=ce26e78902a8c2feff5706a07750e34e9a94536e
Submitter: Jenkins
Branch: master

commit ce26e78902a8c2feff5706a07750e34e9a94536e
Author: cheng <email address hidden>
Date: Sun Jul 24 01:10:36 2016 +0000

    For any part, only one replica can move in a rebalance

    With a min_part_hours of zero, it's possible to move more than one
    replica of the same part in a single rebalance.

    This change in behavior only affects min_part_hours zero rings, which
    are understood to be uncommon in production, mostly because of this
    very specific and strange behavior of min_part_hours zero rings.

    With this change, no matter how small your min_part_hours, it will
    always require at least N rebalances to move N part-replicas of the
    same part.

    To supplement the existing persisted _last_part_moves structure that
    enforces min_part_hours, this change adds a _part_moved_bitmap that
    exists only for the life of the rebalance, to track when rebalance
    moves a part in order to prevent other replicas of the same part
    from being moved in the same rebalance.

    Add authors: Clay Gerrard, <email address hidden>
                 Christian Schwede, <email address hidden>

    Closes-bug: #1586167

    Change-Id: Ia1629abd5ce6e1b3acc2e94f818ed8223eed993a

Changed in swift:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/419816

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.13.0

This issue was fixed in the openstack/swift 2.13.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (stable/newton)

Reviewed: https://review.openstack.org/419816
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=e5dd050113646a93a0fe7fb1aed4f1cafdc9139f
Submitter: Jenkins
Branch: stable/newton

commit e5dd050113646a93a0fe7fb1aed4f1cafdc9139f
Author: cheng <email address hidden>
Date: Sun Jul 24 01:10:36 2016 +0000

    For any part, only one replica can move in a rebalance

    With a min_part_hours of zero, it's possible to move more than one
    replica of the same part in a single rebalance.

    This change in behavior only affects min_part_hours zero rings, which
    are understood to be uncommon in production, mostly because of this
    very specific and strange behavior of min_part_hours zero rings.

    With this change, no matter how small your min_part_hours, it will
    always require at least N rebalances to move N part-replicas of the
    same part.

    To supplement the existing persisted _last_part_moves structure that
    enforces min_part_hours, this change adds a _part_moved_bitmap that
    exists only for the life of the rebalance, to track when rebalance
    moves a part in order to prevent other replicas of the same part
    from being moved in the same rebalance.

    Add authors: Clay Gerrard, <email address hidden>
                 Christian Schwede, <email address hidden>

    Closes-bug: #1586167

    Change-Id: Ia1629abd5ce6e1b3acc2e94f818ed8223eed993a
    (cherry picked from commit ce26e78)

tags: added: in-stable-newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.10.2

This issue was fixed in the openstack/swift 2.10.2 release.
