Comment 2 for bug 1697543

clayg (clay-gerrard) wrote:

With enough replicas and a failed device, it's easier to see that we should look at delta_dispersion in addition to delta_balance:

https://gist.github.com/clayg/b0d0d41a382e70356bb58a1ee94d1b73

With the failed device on the server that's desperately trying to shed parts, and enough replicas, balance will not change significantly from one invocation to the next while rebalance is busy fixing dispersion...
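
Concretely, that means letting a big enough delta_dispersion count the same way delta_balance does in the rebalance command's "did it change at least 1%" check. A minimal sketch of that logic (should_save_rebalance and MIN_DELTA are names I'm making up here for illustration, not the actual swift-ring-builder code):

# Minimal sketch of the proposed save check -- illustrative names and
# thresholds, not the actual swift-ring-builder implementation.

MIN_DELTA = 1.0  # the existing "at least 1%" threshold


def should_save_rebalance(last_balance, balance,
                          last_dispersion, dispersion,
                          devs_changed=False):
    # Today the save check effectively only looks at delta_balance, so
    # a ring that is busy fixing dispersion (balance stuck at 100.00
    # while parts move off the failed device) gets "Cowardly refusing
    # to save".  Letting delta_dispersion count too gets us over the
    # hump.
    delta_balance = abs(last_balance - balance)
    delta_dispersion = abs(last_dispersion - dispersion)
    return (devs_changed
            or delta_balance >= MIN_DELTA
            or delta_dispersion >= MIN_DELTA)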

We should accept that as desirable behavior and use delta_dispersion to get over the hump:

ubuntu@saio:/vagrant/.scratch/rings/tata$ swift-ring-builder stuck.builder rebalance
Cowardly refusing to save rebalance as it did not change at least 1%.
ubuntu@saio:/vagrant/.scratch/rings/tata$ swift-ring-builder stuck.builder |head
stuck.builder, build version 63, id a5b9fbd213bb4c20ab60eff2a2bb3a75
256 partitions, 13.000000 replicas, 1 regions, 1 zones, 52 devices, 100.00 balance, 100.00 dispersion
...
ubuntu@saio:/vagrant/.scratch/rings/tata$ swift-ring-builder stuck.builder rebalance
Cowardly refusing to save rebalance as it did not change at least 1%.
ubuntu@saio:/vagrant/.scratch/rings/tata$ swift-ring-builder stuck.builder rebalance -f
Reassigned 256 (100.00%) partitions. Balance is now 100.00. Dispersion is now 0.00
-------------------------------------------------------------------------------
NOTE: Balance of 100.00 indicates you should push this
      ring, wait at least 0 hours, and rebalance/repush.
-------------------------------------------------------------------------------
ubuntu@saio:/vagrant/.scratch/rings/tata$ swift-ring-builder stuck.builder rebalance
Reassigned 255 (99.61%) partitions. Balance is now 1.56. Dispersion is now 0.00

Notice that the delta_dispersion at the point where the builder is "cowardly refusing to save rebalance" is *HUGE*: dispersion goes from 100.00 to 0.00 as soon as the rebalance is forced through.
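
For reference, you can measure both deltas straight off the builder file without saving anything. A rough sketch, assuming the RingBuilder API (RingBuilder.load, get_balance, rebalance, and the dispersion attribute):

# Rough sketch: measure delta_balance and delta_dispersion across a
# rebalance in memory, without writing the builder back out.
from swift.common.ring.builder import RingBuilder

builder = RingBuilder.load('stuck.builder')
last_balance = builder.get_balance()
last_dispersion = builder.dispersion  # None on a never-rebalanced builder

builder.rebalance()

delta_balance = abs(last_balance - builder.get_balance())
delta_dispersion = abs((last_dispersion or 0) - builder.dispersion)
print('delta_balance=%.2f delta_dispersion=%.2f' %
      (delta_balance, delta_dispersion))
# Against stuck.builder above that should show delta_balance ~0.00 and
# delta_dispersion 100.00.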