swift-ring-builder not distributing partitions evenly between zones
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Invalid | Undecided | Unassigned |
Bug Description
There seems to be a different behavior in the swift-ring-builder between swift versions 1.10.0 and 2.2.0 which leads to multiple copies of an object being stored in the same zone.
Using the swift-ring-builder utility in swift version 1.10.0, I could use the following commands to build an object ring for a single node with 2 replicas, 2 zones, and 3 equally weighted devices and see the partitions distributed evenly between the 2 zones. Using the same commands after updating to swift version 2.2.0, I see the partitions distributed evenly among the 3 devices, but not distributed evenly between the 2 zones, which seems to break a fundamental concept of swift.
Here is the script that executes the commands:
#!/bin/bash
REPLICA_COUNT=2
swift-
swift-
swift-
swift-
swift-
swift-
With version 1.10.0, I see the following output:
WARNING: No region specified for z1-172.
Device d0r1z1-
Device d1r1z2-
Device d2r1z2-
Reassigned 32768 (100.00%) partitions. Balance is now 50.00.
---
NOTE: Balance of 50.00 indicates you should push this
ring, wait at least 1 hours, and rebalance/repush.
---
The minimum number of hours before a partition can be reassigned is now set to 1
And with version 1.10.0, I see the following builder file created:
$ swift-ring-builder object.builder
object.builder, build version 3
32768 partitions, 2.000000 replicas, 1 regions, 2 zones, 3 devices, 50.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices: id region zone ip address port replication ip replication port name weight partitions balance meta
0 1 1 172.1.1.25 6000 172.1.1.25 6000 d1 99.00 32768 50.00
1 1 2 172.1.1.25 6000 172.1.1.25 6000 d2 99.00 16384 -25.00
2 1 2 172.1.1.25 6000 172.1.1.25 6000 d3 99.00 16384 -25.00
With version 2.2.0, I see the following output:
WARNING: No region specified for z1-172.
Device d0r1z1-
WARNING: No region specified for z2-172.
Device d1r1z2-
WARNING: No region specified for z2-172.
Device d2r1z2-
Reassigned 32768 (100.00%) partitions. Balance is now 0.00.
The minimum number of hours before a partition can be reassigned is now set to 1
And with version 2.2.0, I see the following builder file created:
$ swift-ring-builder object.builder
object.builder, build version 3
32768 partitions, 2.000000 replicas, 1 regions, 2 zones, 3 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices: id region zone ip address port replication ip replication port name weight partitions balance meta
0 1 1 172.1.1.25 6000 172.1.1.25 6000 d1 99.00 21846 0.00
1 1 2 172.1.1.25 6000 172.1.1.25 6000 d2 99.00 21845 -0.00
2 1 2 172.1.1.25 6000 172.1.1.25 6000 d3 99.00 21845 -0.00
Can anyone tell me what has changed and whether or not there is additional configuration that is required with the updated code to resolve this?
Thanks in advance.
Thanks for your bug report!
Indeed there is a change in the calculation:
Swift now also takes the device weight into account when assigning partitions. Your zone 1 has a total weight of 99, but zone 2 has a total weight of 198 - thus more partitions are assigned to zone 2.
If you increase the weight of device d1 (id 0, the only device in zone 1) to 198, or lower the weights of devices d2 and d3 to 49.5 each, and then rebalance your ring, you'll see that zone 1 has the same number of partitions assigned as zone 2.
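To make the arithmetic concrete, here is a minimal sketch that computes the weight-proportional share of partition replicas per device and per zone. It is plain arithmetic, not the ring builder's actual placement algorithm; the function name `weight_shares` and the dict layout are made up for illustration, and the device names, zones, and weights are copied from the builder output in the report.

```python
def weight_shares(devices, part_count, replicas):
    """Return (per_device, per_zone) weight-proportional replica counts.

    Hypothetical helper for illustration only; it does not reproduce
    swift-ring-builder's real placement logic or its integer rounding.
    """
    total_assignments = part_count * replicas
    total_weight = sum(d["weight"] for d in devices)
    per_device = {
        d["name"]: total_assignments * d["weight"] / total_weight
        for d in devices
    }
    per_zone = {}
    for d in devices:
        per_zone[d["zone"]] = per_zone.get(d["zone"], 0.0) + per_device[d["name"]]
    return per_device, per_zone


# Devices as listed in the report: one device in zone 1, two in zone 2.
reported = [
    {"name": "d1", "zone": 1, "weight": 99.0},
    {"name": "d2", "zone": 2, "weight": 99.0},
    {"name": "d3", "zone": 2, "weight": 99.0},
]
# Same ring with the zone 1 device's weight doubled, as suggested above.
adjusted = [dict(d, weight=198.0) if d["name"] == "d1" else d for d in reported]

for label, devs in (("reported", reported), ("adjusted", adjusted)):
    per_device, per_zone = weight_shares(devs, part_count=32768, replicas=2)
    print(label, "per device:", per_device)
    print(label, "per zone:  ", per_zone)
```

With the reported weights, each device's share comes out at about 21845.3, which lines up with the 21846/21845/21845 split in the 2.2.0 output, and zone 2 gets twice the share of zone 1. With d1 at weight 198, both zones come out at 32768.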
This example is also a bit of a special case because it has two zones and two replicas. Suppose you added a third zone, also with a single device of weight 99: you would then end up with one replica in zone 2 and the other replica in zone 1 or zone 3.
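The same reasoning can be expressed as an average number of replicas per zone. The sketch below assumes replicas are spread purely in proportion to total zone weight; `replicas_per_zone` is a hypothetical helper, not part of Swift, and the zone weights are the totals discussed in this report.

```python
def replicas_per_zone(zone_weights, replicas):
    """Average replicas of each partition per zone, proportional to weight.

    Illustrative arithmetic only; the real ring builder also applies
    dispersion rules that this sketch ignores.
    """
    total_weight = sum(zone_weights.values())
    return {zone: replicas * w / total_weight for zone, w in zone_weights.items()}


# Two zones as in the report: zone 1 has weight 99, zone 2 has 2 x 99.
two_zones = {1: 99.0, 2: 198.0}
# Hypothetical third zone with a single device of weight 99.
three_zones = {1: 99.0, 2: 198.0, 3: 99.0}

print(replicas_per_zone(two_zones, replicas=2))
print(replicas_per_zone(three_zones, replicas=2))  # {1: 0.5, 2: 1.0, 3: 0.5}
```

In the two-zone case, zone 2's weight-proportional share is more than one replica per partition, which is consistent with some partitions having both copies in zone 2. In the three-zone case, zone 2's share is exactly one replica and zones 1 and 3 split the other.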
I have started working on a patch that will raise a warning when there is a problem like the one in this case, and I will update the docs as well.