swift-ringbuilder rebalance moves 100% partitions when adding a new node to a new region
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Fix Released | Undecided | Christian Schwede | 2.2.0
Bug Description
When adding a new node to a new region of an existing Swift cluster,
the number of partitions moved is still 100%. On the other hand, when
the node is added to the current region, only the required partitions
are moved.
Patch https:/ moves partitions to a new region progressively. The
drawback is that a lot of useless traffic is generated inside the
initial region.
Starting from a 5-node Swift cluster, when I add a new device to the
current 'r1' region with weight 1000, 18.75% of partitions are moved
to this new device (as expected).
$ swift-ring-builder object.builder
object.builder, build version 5
262144 partitions, 3.000000 replicas, 1 regions, 1 zones, 5 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 0
Devices: id region zone ip address port replication ip replication port name weight partitions balance meta
0 1 1 192.168.100.10 6000 192.168.100.10 6000 d1 3000.00 157287 0.00
1 1 1 192.168.100.11 6000 192.168.100.11 6000 d1 3000.00 157286 -0.00
2 1 1 192.168.100.12 6000 192.168.100.12 6000 d1 3000.00 157286 -0.00
3 1 1 192.168.100.13 6000 192.168.100.13 6000 d1 3000.00 157287 0.00
4 1 1 192.168.100.14 6000 192.168.100.14 6000 d1 3000.00 157286 -0.00
$ swift-ring-builder object.builder add r1z1-192.
Device d5r1z1-
$ swift-ring-builder object.builder rebalance
Reassigned 49152 (18.75%) partitions. Balance is now 0.00.
$ swift-ring-builder object.builder
object.builder, build version 7
262144 partitions, 3.000000 replicas, 1 regions, 1 zones, 6 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 0
Devices: id region zone ip address port replication ip replication port name weight partitions balance meta
0 1 1 192.168.100.10 6000 192.168.100.10 6000 d1 3000.00 147456 0.00
1 1 1 192.168.100.11 6000 192.168.100.11 6000 d1 3000.00 147456 0.00
2 1 1 192.168.100.12 6000 192.168.100.12 6000 d1 3000.00 147456 0.00
3 1 1 192.168.100.13 6000 192.168.100.13 6000 d1 3000.00 147456 0.00
4 1 1 192.168.100.14 6000 192.168.100.14 6000 d1 3000.00 147456 0.00
5 1 1 192.168.100.15 6000 192.168.100.15 6000 d1 1000.00 49152 0.00
$
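The 18.75% figure in the transcript above is exactly what the weight ratio predicts. A quick sanity check in plain Python (no Swift needed; all numbers are taken from the transcript):

```python
# Ring parameters from the transcript above.
partitions = 262144              # 2**18 partitions
replicas = 3
old_weight = 5 * 3000.0          # five devices at weight 3000
new_weight = 1000.0              # the newly added device

# The new device should receive its weight's share of all replica
# assignments (partitions * replicas of them in total).
total = old_weight + new_weight
share = new_weight / total                         # 0.0625
moved_assignments = share * partitions * replicas  # 49152.0

# swift-ring-builder reports movement as a fraction of partitions,
# so 49152 reassignments out of 262144 partitions is 18.75%.
print(int(moved_assignments))                  # 49152
print(100.0 * moved_assignments / partitions)  # 18.75
```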
Now I start from the same 5-node builder as before. When I add a new
device to a new 'r2' region with weight 1000, 100% of partitions (all
262144) are reported as moved, while only 18.75% were expected, as
when adding the node to the 'r1' region. While it is true that only
49152 partitions end up in the new region, many partitions seem to
have been moved uselessly between nodes of the 'r1' region. This would
most likely generate heavy traffic on a running cluster.
$ swift-ring-builder object.builder
object.builder, build version 5
262144 partitions, 3.000000 replicas, 1 regions, 1 zones, 5 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 0
Devices: id region zone ip address port replication ip replication port name weight partitions balance meta
0 1 1 192.168.100.10 6000 192.168.100.10 6000 d1 3000.00 157287 0.00
1 1 1 192.168.100.11 6000 192.168.100.11 6000 d1 3000.00 157286 -0.00
2 1 1 192.168.100.12 6000 192.168.100.12 6000 d1 3000.00 157286 -0.00
3 1 1 192.168.100.13 6000 192.168.100.13 6000 d1 3000.00 157287 0.00
4 1 1 192.168.100.14 6000 192.168.100.14 6000 d1 3000.00 157286 -0.00
$ swift-ring-builder object.builder add r2z1-192.
Device d5r2z1-
$ swift-ring-builder object.builder rebalance
Reassigned 262144 (100.00%) partitions. Balance is now 0.00.
$ swift-ring-builder object.builder
object.builder, build version 7
262144 partitions, 3.000000 replicas, 2 regions, 2 zones, 6 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 0
Devices: id region zone ip address port replication ip replication port name weight partitions balance meta
0 1 1 192.168.100.10 6000 192.168.100.10 6000 d1 3000.00 147456 0.00
1 1 1 192.168.100.11 6000 192.168.100.11 6000 d1 3000.00 147456 0.00
2 1 1 192.168.100.12 6000 192.168.100.12 6000 d1 3000.00 147456 0.00
3 1 1 192.168.100.13 6000 192.168.100.13 6000 d1 3000.00 147456 0.00
4 1 1 192.168.100.14 6000 192.168.100.14 6000 d1 3000.00 147456 0.00
5 2 1 192.168.100.15 6000 192.168.100.15 6000 d1 1000.00 49152 0.00
$
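Note that the device table itself already suggests far less than 100% movement: summing the partitions each 'r1' device lost gives exactly the 49152 assignments the new region-2 device gained. A quick check of the table's numbers in plain Python (this only bounds the *net* movement, of course; partitions could still have been swapped among the 'r1' devices without changing these per-device totals):

```python
# Per-device partition counts, copied from the two tables above.
before = [157287, 157286, 157286, 157287, 157286]  # devices 0-4, before rebalance
after  = [147456, 147456, 147456, 147456, 147456]  # devices 0-4, after rebalance

# Total assignments the region-1 devices gave up.
lost = sum(b - a for b, a in zip(before, after))
print(lost)  # 49152 -- exactly what the new region-2 device gained
```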
Initial cluster was built with the following script:
#!/bin/bash
swift-
IPs="
for ip in $IPs
do
swift-
done
swift-
swift-
Changed in swift:
assignee: nobody → Christian Schwede (cschwede)
status: New → In Progress
Changed in swift:
milestone: none → 2.2.0-rc1
status: Fix Committed → Fix Released
Changed in swift:
milestone: 2.2.0-rc1 → 2.2.0
Florent, thanks for your extensive test.
I just had a quick look into this, and currently I think only the displayed percentage is wrong. The actual number of partitions moved from one device to another is quite low. I attached a small shell script that diffs the assigned partitions after each rebalance, and those numbers are very low (as expected).
Will have a deeper look into this tomorrow.
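The attached shell script is not reproduced here, but the idea behind it can be sketched in plain Python: compare the partition-to-device assignment before and after a rebalance and count how many replica assignments actually changed. The toy data below is made up for illustration; a real check would read the builder's internal replica-to-partition-to-device table via the Swift Python API.

```python
def count_moved(before, after):
    """Count replica assignments whose device id differs between two rings.

    `before` and `after` are lists of replica rows; each row maps a
    partition index to a device id (mirroring Swift's replica2part2dev
    layout, though the data here is synthetic).
    """
    moved = 0
    for old_row, new_row in zip(before, after):
        for old_dev, new_dev in zip(old_row, new_row):
            if old_dev != new_dev:
                moved += 1
    return moved

# Toy ring: 8 partitions, 3 replicas, devices 0-4.
before = [
    [0, 1, 2, 3, 4, 0, 1, 2],
    [1, 2, 3, 4, 0, 1, 2, 3],
    [2, 3, 4, 0, 1, 2, 3, 4],
]
# After a "rebalance" only two assignments really moved to new device 5.
after = [row[:] for row in before]
after[0][0] = 5
after[1][4] = 5
print(count_moved(before, after))  # 2
```

If the builder reported 100% movement while a diff like this finds only a handful of changed assignments, it is the displayed percentage, not the actual data movement, that is wrong -- which matches the observation above.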