some rings won't rebalance
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Object Storage (swift) | Fix Released | Undecided | Unassigned | |
Bug Description
The problem is documented here:
I saw it again recently and it looks like this:
https:/
I kicked up some better rebalance --debug logging to make it easier to see:
https:/
Basically you see a bunch of this:
DEBUG: Gathered 5814/0 from dev r3z5-127.
DEBUG: Gathered 5855/1 from dev r3z5-127.
DEBUG: Gathered 5916/0 from dev r3z5-127.
DEBUG: Gathered 5956/1 from dev r3z5-127.
DEBUG: Gathered 5962/2 from dev r3z5-127.
DEBUG: Gathered 5971/0 from dev r3z5-127.
DEBUG: Gathered 5983/0 from dev r3z5-127.
DEBUG: Gathered 5987/0 from dev r3z5-127.
DEBUG: Gathered 5988/1 from dev r3z5-127.
DEBUG: Gathered 6029/1 from dev r3z5-127.
DEBUG: Gathered 6041/0 from dev r3z5-127.
Followed by this:
DEBUG: Placed 5586/2 onto dev r3z5-127.
DEBUG: Placed 5597/0 onto dev r3z5-127.
DEBUG: Placed 5752/0 onto dev r3z5-127.
DEBUG: Placed 4970/0 onto dev r3z5-127.
DEBUG: Placed 5916/0 onto dev r3z5-127.
DEBUG: Placed 5472/0 onto dev r3z5-127.
DEBUG: Placed 5358/0 onto dev r3z5-127.
DEBUG: Placed 5962/2 onto dev r3z5-127.
DEBUG: Placed 5056/0 onto dev r3z5-127.
DEBUG: Placed 5235/2 onto dev r3z5-127.
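For what it's worth, you can see the churn by tallying the debug lines. Here's a quick throwaway script - it assumes the exact `Gathered P/R from dev D.` / `Placed P/R onto dev D.` format shown above, which may vary between swift releases:

```python
import re
from collections import Counter

# Throwaway tally of the rebalance --debug output quoted above. Assumes
# the exact "Gathered P/R from dev D." / "Placed P/R onto dev D." line
# format; treat this as a sketch, not a supported interface.
GATHERED = re.compile(r'DEBUG: Gathered (\d+)/(\d+) from dev (\S+)\.')
PLACED = re.compile(r'DEBUG: Placed (\d+)/(\d+) onto dev (\S+)\.')

def net_moves(log_lines):
    """Return {dev: placed - gathered}, so 0 means pure churn."""
    net = Counter()
    for line in log_lines:
        m = GATHERED.match(line)
        if m:
            net[m.group(3)] -= 1
            continue
        m = PLACED.match(line)
        if m:
            net[m.group(3)] += 1
    return net
```

Run over the excerpt above it comes out at roughly zero for r3z5 - note partitions like 5916/0 and 5962/2 get gathered from r3z5 and then placed right back onto it.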
but meanwhile this isn't the zone that needs the parts!
Tier  Parts     %  Max      0      1  2  3
r1z1  49152  0.00    1  16384  49152  0  0
r1z2  49152  0.00    1  16384  49152  0  0
r3z5  49321  0.00    1  16215  49321  0  0
r3z6  48983  0.00    1  16553  48983  0  0
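To put numbers on "this isn't the zone that needs the parts": with 3 replicas over 4 zones, each zone wants 3/4 = 0.75 replicanths. Assuming a part power of 16 (2^16 = 65536 partitions - consistent with the 49152/16384 counts above), that's 49152 part-replicas per zone. A quick sanity check:

```python
# Back-of-the-envelope check of the table above, assuming part_power=16
# (2**16 = 65536 partitions) - consistent with the per-zone counts shown.
REPLICAS = 3
PARTS = 2 ** 16

# Part-replica count per zone, read off the second column above.
assigned = {'r1z1': 49152, 'r1z2': 49152, 'r3z5': 49321, 'r3z6': 48983}

ideal = PARTS * REPLICAS / len(assigned)   # 49152.0 = 0.75 replicanths/zone
for zone, count in sorted(assigned.items()):
    print('%s %+d' % (zone, count - ideal))
# r1z1 +0, r1z2 +0, r3z5 +169, r3z6 -169: r3z5 owes r3z6 ~169 parts.
```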
IIRC the issue is that by weight we want to pull parts out of r3z5, and we tend to grab ones that are over-represented in r3 (two replicas there). When we go to set one down we see both regions have one copy (we're holding the third), but r3z6 is hungry, so we head back into r3 - only we can't put it on r3z6 (it already holds a replica), so we land back on r3z5 rather than putting extra parts in r1.
The problem is in the implementation phase, not the planning phase - we know each server should hold ~0.75 replicanths - but we can't seem to notice that the parts we want to move from r3z5 need to swap places with other parts in r3z6. I had some idea about overloading gather so that we pick up some extra % of parts... but I think there are only a few hundred out of ~50K parts that *could* move from r3z5 to r3z6, and we only find a few every time we rebalance - yet we'd have to move thousands of parts to get them all. NOT GREAT!
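To make the "land back on r3z5" loop concrete, here's a toy model of the tier-walking placement described above - emphatically not swift's actual RingBuilder code, just the decision it sketches, with made-up names:

```python
# A toy model of the placement decision described above. NOT swift's
# actual RingBuilder internals; the names and structure are made up.

def pick_zone(held_zones, zone_hunger, zones_by_region):
    """Place one gathered replica: head for the region containing the
    hungriest zone, then take the hungriest zone there that doesn't
    already hold a replica (at most one replica per zone)."""
    region = max(zones_by_region,
                 key=lambda r: max(zone_hunger[z] for z in zones_by_region[r]))
    candidates = [z for z in zones_by_region[region] if z not in held_zones]
    return max(candidates, key=zone_hunger.get)

# The bug's situation: we gathered the r3z5 copy of a part whose other two
# replicas sit in r1z1 and r3z6; r3z6 is 169 parts hungry, r3z5 is 169 over.
zones_by_region = {'r1': ['r1z1', 'r1z2'], 'r3': ['r3z5', 'r3z6']}
zone_hunger = {'r1z1': 0, 'r1z2': 0, 'r3z5': -169, 'r3z6': 169}
print(pick_zone({'r1z1', 'r3z6'}, zone_hunger, zones_by_region))  # -> r3z5
```

The swap we'd actually want - trade this part's r3z5 replica for some other part's r3z6 replica - never enters the picture, because gather and place each look at one part at a time.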
Reviewed: https://review.openstack.org/503152
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=23219664564d1b5a7ba02bbf8309ec699ab7a4cb
Submitter: Jenkins
Branch: master
commit 23219664564d1b5a7ba02bbf8309ec699ab7a4cb
Author: Kota Tsuyuzaki <email address hidden>
Date: Fri Jun 30 02:03:48 2017 -0700
Accept a trade-off of dispersion for balance
... but only if we *have* to!
During the initial gather for balance we prefer to avoid replicas on
over-weight devices that are already under-represented in any of their
tiers (i.e. if a zone has to have at least one replica, but may have as
many as two, don't take the only one). Instead, by going for replicas
on over-weight devices that are at the limits of their dispersion, we
hope to have a better than even chance of finding a better place for
them during placement!
This normally works out - and especially so for rings which can both
disperse and balance. But for existing rings where we'd have to
sacrifice dispersion to improve balance, the optimistic gather ends up
refusing to trade dispersion for balance - and instead gets stuck
without solving either!
You should always be able to solve for *either* dispersion or balance.
But if you can't solve for *both*, we now bail out of the optimistic
gather much more quickly and instead just focus on improving balance.
With this change, the ring can get into balanced (and un-dispersed)
states much more quickly!
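In other words, the shape of the fix is roughly "try the dispersion-preserving gather first, and if it stalls while balance is still bad, gather for balance alone". A sketch under that reading - hypothetical method names, not swift's real API:

```python
# Rough sketch of the strategy the commit message describes, with
# hypothetical method names - not swift's actual RingBuilder API.
def rebalance_pass(builder):
    # Optimistic gather: only pull replicas off over-weight devices when
    # doing so can't hurt dispersion.
    gathered = builder.gather_for_balance(preserve_dispersion=True)
    if not gathered and builder.balance_is_still_bad():
        # Stuck: balance can only improve by trading away some dispersion.
        # Accept the trade-off and gather for balance alone.
        gathered = builder.gather_for_balance(preserve_dispersion=False)
    builder.place(gathered)
```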
Change-Id: I17ac627f94f64211afaccad15596a9fcab2fada2
Change-Id: Ie6e2d116b65938edac29efa6171e2470bb3e8e12
Related-
Closes-Bug: 1699636
Closes-Bug: 1701472