rebalance crashes after removing many devices
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Fix Released | Undecided | Samuel Merritt |
Bug Description
You can get RingBuilder.
If more than one replica of a partition has to move due to devices being deleted, rebalance() only moves one, leaving the ring in an invalid state, and then validate() complains about it.
You can make it happen for an N-replica ring by removing more than 1/N of the devices; however, there's a chance it'll happen any time more than one device is removed between rebalances.
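The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not swift's RingBuilder code: the function names (`build_assignment`, `toy_rebalance`, `orphaned_replicas`) and the random placement are hypothetical, and the `move_all` flag simply switches between "move one replica per partition" and "move every replica that sits on a removed device".

```python
import random


def build_assignment(parts, replicas, dev_ids, seed=0):
    """Hypothetical helper: place each partition's replicas on distinct devices."""
    rng = random.Random(seed)
    return [rng.sample(dev_ids, replicas) for _ in range(parts)]


def toy_rebalance(assignment, removed, remaining, move_all):
    """Reassign replicas that live on removed devices.

    With move_all=False this mimics the reported behaviour: at most one
    replica per partition is moved, even if several need to move.
    """
    for replica_devs in assignment:
        for i, dev in enumerate(replica_devs):
            if dev not in removed:
                continue
            # Pick a surviving device that does not already hold this partition.
            candidates = [d for d in remaining if d not in replica_devs]
            replica_devs[i] = candidates[0]
            if not move_all:
                break  # the toy "bug": stop after the first move


def orphaned_replicas(assignment, removed):
    """Count replicas still assigned to devices that no longer exist."""
    return sum(dev in removed for devs in assignment for dev in devs)


if __name__ == "__main__":
    devs = [0, 1, 2, 3, 4, 5]
    removed, remaining = {3, 4, 5}, [0, 1, 2]  # more than 1/3 of the devices

    buggy = build_assignment(256, 3, devs)
    toy_rebalance(buggy, removed, remaining, move_all=False)
    print("orphaned after one-move rebalance:", orphaned_replicas(buggy, removed))

    fixed = build_assignment(256, 3, devs)
    toy_rebalance(fixed, removed, remaining, move_all=True)
    print("orphaned after move-all rebalance:", orphaned_replicas(fixed, removed))
```

In this model, any partition with two or more replicas on removed devices keeps at least one orphaned replica after the one-move pass, which is the kind of invalid state a validation step would then reject.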
Example reproduction:
vagrant@
vagrant@
Device z1-10.1.
vagrant@
Device z2-10.1.
vagrant@
Device z3-10.1.
vagrant@
Device z1-10.1.
vagrant@
Device z2-10.1.
vagrant@
Device z3-10.1.
vagrant@
Reassigned 256 (100.00%) partitions. Balance is now 0.00.
vagrant@
d3z1-10.
vagrant@
d4z2-10.
vagrant@
d5z3-10.
vagrant@
-------
An error has occurred during ring validation. Common
causes of failure are rings that are empty or do not
have enough devices to accommodate the replica count.
Original exception message:
All partitions are not double accounted for: 600 != 768
-------
This occurs on tag 1.4.6 as well as master (1a7453f).
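For readers puzzling over the numbers in the error message: 768 is consistent with the 256 partitions reported by the rebalance output times 3 replicas (the replica count is an assumption here, inferred from the three zones in the reproduction). The following sketch shows the kind of accounting check that would produce such a message. It is a hypothetical stand-in, not the actual validate() implementation, and `check_accounting` and its parameters are made up for illustration.

```python
def check_accounting(parts_per_device, partitions, replicas):
    """Hypothetical check: total replica assignments across devices must
    equal partitions * replicas."""
    assigned = sum(parts_per_device.values())
    expected = partitions * replicas
    if assigned != expected:
        raise ValueError(
            "All partitions are not double accounted for: %d != %d"
            % (assigned, expected)
        )
    return assigned


if __name__ == "__main__":
    # Three surviving devices that together hold only 600 replica assignments.
    try:
        check_accounting({"d0": 200, "d1": 200, "d2": 200}, 256, 3)
    except ValueError as err:
        print(err)
```

Under that reading, the gap of 168 would correspond to replica assignments that were never moved off the removed devices.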
Changed in swift:
milestone: none → 1.4.7
status: Fix Committed → Fix Released
Fix proposed to branch: master
Review: https://review.openstack.org/4710