index out of range on ring rebalance

Bug #845952 reported by David Kranz
This bug affects 2 people
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Mark Gius

Bug Description

I was experimenting with removing a disk from the ring and got this error. The ring is unrealistically small, and I am not sure whether this could ever bite a real configuration, but I am reporting it just in case. Here is the transcript of the calls that create the ring, add the disks, and then remove one of them (using the latest ppa:swift-core/release):

swift-ring-builder ring-files/account.builder create 10 3 0
swift-ring-builder ring-files/container.builder create 10 3 0
swift-ring-builder ring-files/object.builder create 10 3 0
swift-ring-builder ring-files/account.builder add z1-172.18.0.102:6002/sdb1 100
Device z1-172.18.0.102:6002/sdb1_"" with 100.0 weight got id 0
swift-ring-builder ring-files/container.builder add z1-172.18.0.102:6001/sdb1 100
Device z1-172.18.0.102:6001/sdb1_"" with 100.0 weight got id 0
swift-ring-builder ring-files/object.builder add z1-172.18.0.102:6000/sdb1 100
Device z1-172.18.0.102:6000/sdb1_"" with 100.0 weight got id 0
swift-ring-builder ring-files/account.builder add z2-172.18.0.102:6002/sdc1 100
Device z2-172.18.0.102:6002/sdc1_"" with 100.0 weight got id 1
swift-ring-builder ring-files/container.builder add z2-172.18.0.102:6001/sdc1 100
Device z2-172.18.0.102:6001/sdc1_"" with 100.0 weight got id 1
swift-ring-builder ring-files/object.builder add z2-172.18.0.102:6000/sdc1 100
Device z2-172.18.0.102:6000/sdc1_"" with 100.0 weight got id 1
swift-ring-builder ring-files/account.builder add z3-172.18.0.102:6002/sdd1 100
Device z3-172.18.0.102:6002/sdd1_"" with 100.0 weight got id 2
swift-ring-builder ring-files/container.builder add z3-172.18.0.102:6001/sdd1 100
Device z3-172.18.0.102:6001/sdd1_"" with 100.0 weight got id 2
swift-ring-builder ring-files/object.builder add z3-172.18.0.102:6000/sdd1 100
Device z3-172.18.0.102:6000/sdd1_"" with 100.0 weight got id 2
swift-ring-builder ring-files/account.builder add z4-172.18.0.103:6002/sdb1 100
Device z4-172.18.0.103:6002/sdb1_"" with 100.0 weight got id 3
swift-ring-builder ring-files/container.builder add z4-172.18.0.103:6001/sdb1 100
Device z4-172.18.0.103:6001/sdb1_"" with 100.0 weight got id 3
swift-ring-builder ring-files/object.builder add z4-172.18.0.103:6000/sdb1 100
Device z4-172.18.0.103:6000/sdb1_"" with 100.0 weight got id 3
swift-ring-builder ring-files/account.builder add z5-172.18.0.103:6002/sdc1 100
Device z5-172.18.0.103:6002/sdc1_"" with 100.0 weight got id 4
swift-ring-builder ring-files/container.builder add z5-172.18.0.103:6001/sdc1 100
Device z5-172.18.0.103:6001/sdc1_"" with 100.0 weight got id 4
swift-ring-builder ring-files/object.builder add z5-172.18.0.103:6000/sdc1 100
Device z5-172.18.0.103:6000/sdc1_"" with 100.0 weight got id 4
swift-ring-builder ring-files/account.builder add z1-172.18.0.103:6002/sdd1 100
Device z1-172.18.0.103:6002/sdd1_"" with 100.0 weight got id 5
swift-ring-builder ring-files/container.builder add z1-172.18.0.103:6001/sdd1 100
Device z1-172.18.0.103:6001/sdd1_"" with 100.0 weight got id 5
swift-ring-builder ring-files/object.builder add z1-172.18.0.103:6000/sdd1 100
Device z1-172.18.0.103:6000/sdd1_"" with 100.0 weight got id 5
swift-ring-builder ring-files/account.builder add z2-172.18.0.104:6002/sdb1 100
Device z2-172.18.0.104:6002/sdb1_"" with 100.0 weight got id 6
swift-ring-builder ring-files/container.builder add z2-172.18.0.104:6001/sdb1 100
Device z2-172.18.0.104:6001/sdb1_"" with 100.0 weight got id 6
swift-ring-builder ring-files/object.builder add z2-172.18.0.104:6000/sdb1 100
Device z2-172.18.0.104:6000/sdb1_"" with 100.0 weight got id 6
swift-ring-builder ring-files/account.builder add z3-172.18.0.104:6002/sdc1 100
Device z3-172.18.0.104:6002/sdc1_"" with 100.0 weight got id 7
swift-ring-builder ring-files/container.builder add z3-172.18.0.104:6001/sdc1 100
Device z3-172.18.0.104:6001/sdc1_"" with 100.0 weight got id 7
swift-ring-builder ring-files/object.builder add z3-172.18.0.104:6000/sdc1 100
Device z3-172.18.0.104:6000/sdc1_"" with 100.0 weight got id 7
swift-ring-builder ring-files/account.builder add z4-172.18.0.104:6002/sdd1 100
Device z4-172.18.0.104:6002/sdd1_"" with 100.0 weight got id 8
swift-ring-builder ring-files/container.builder add z4-172.18.0.104:6001/sdd1 100
Device z4-172.18.0.104:6001/sdd1_"" with 100.0 weight got id 8
swift-ring-builder ring-files/object.builder add z4-172.18.0.104:6000/sdd1 100
Device z4-172.18.0.104:6000/sdd1_"" with 100.0 weight got id 8
swift-ring-builder ring-files/account.builder rebalance
Reassigned 1024 (100.00%) partitions. Balance is now 0.20.
swift-ring-builder ring-files/container.builder rebalance
Reassigned 1024 (100.00%) partitions. Balance is now 0.20.
swift-ring-builder ring-files/object.builder rebalance
Reassigned 1024 (100.00%) partitions. Balance is now 0.20.
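The "Balance is now 0.20" figure can be checked by hand. This is a minimal sketch that assumes two things not stated in the report: that balance is the largest percentage deviation of any device's partition count from its weight-proportional share, and that the 1024 × 3 partition replicas are spread as evenly as possible over the nine equal-weight devices. The names (`balance`, `counts`, and so on) are illustrative, not taken from swift-ring-builder.

```python
# Hand check of "Reassigned 1024 ... Balance is now 0.20" (illustrative names).
PART_POWER = 10           # "create 10 3 0": part power 10
REPLICAS = 3              # 3 replicas
WEIGHTS = [100.0] * 9     # nine devices, all added with weight 100

parts = 2 ** PART_POWER   # 1024 partitions
total = parts * REPLICAS  # 3072 partition-replica assignments

# Assumption: assignments are spread as evenly as possible across devices.
base, extra = divmod(total, len(WEIGHTS))
counts = [base + 1] * extra + [base] * (len(WEIGHTS) - extra)


def balance(counts, weights, total):
    """Largest percent deviation of any device from its weight-proportional share."""
    per_weight = total / sum(weights)
    return max(abs(100.0 * c / (w * per_weight) - 100.0)
               for c, w in zip(counts, weights))


b = balance(counts, WEIGHTS, total)
print(f"{parts} partitions, counts {sorted(set(counts))}, balance {b:.2f}")
```

Each device's share is 3072 / 9 ≈ 341.33 partitions; three devices end up with 342, and (342 / 341.33 − 1) × 100 ≈ 0.195, which prints as 0.20.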

swift-ring-builder ring-files/account.builder remove 172.18.0.104/sdd1
d8z4-172.18.0.104:6002/sdd1_"" marked for removal and will be removed next rebalance.
swift-ring-builder ring-files/container.builder remove 172.18.0.104/sdd1
d8z4-172.18.0.104:6001/sdd1_"" marked for removal and will be removed next rebalance.
swift-ring-builder ring-files/object.builder remove 172.18.0.104/sdd1
d8z4-172.18.0.104:6000/sdd1_"" marked for removal and will be removed next rebalance.
swift-ring-builder ring-files/account.builder rebalance
Traceback (most recent call last):
  File "/usr/bin/swift-ring-builder", line 5, in <module>
    pkg_resources.run_script('swift==1.4.3', 'swift-ring-builder')
  File "build/bdist.linux-i686/egg/pkg_resources.py", line 489, in run_script
  File "build/bdist.linux-i686/egg/pkg_resources.py", line 1207, in run_script
  File "/usr/lib/python2.6/site-packages/swift-1.4.3-py2.6.egg/EGG-INFO/scripts/swift-ring-builder", line 651, in <module>
    Commands.__dict__.get(command, Commands.unknown)()
  File "/usr/lib/python2.6/site-packages/swift-1.4.3-py2.6.egg/EGG-INFO/scripts/swift-ring-builder", line 512, in rebalance
    parts, balance = builder.rebalance()
  File "/usr/lib/python2.6/site-packages/swift-1.4.3-py2.6.egg/swift/common/ring/builder.py", line 250, in rebalance
    reassign_parts = self._gather_reassign_parts()
  File "/usr/lib/python2.6/site-packages/swift-1.4.3-py2.6.egg/swift/common/ring/builder.py", line 462, in _gather_reassign_parts
    dev = self.devs[part2dev[part]]
IndexError: list index out of range

Mark Gius (markgius) wrote :

Here is another reproduction script:

swift-ring-builder account.builder create 6 3 0  # part_power=6, replicas=3, min_part_hours=0 (allows rapid rebalancing)

# add the minimum number of nodes (one per replica)
swift-ring-builder account.builder add z1-172.17.3.1:8080/sdb 100.0
swift-ring-builder account.builder add z2-172.17.3.2:8080/sdb 100.0
swift-ring-builder account.builder add z3-172.17.3.3:8080/sdb 100.0

swift-ring-builder account.builder rebalance
swift-ring-builder account.builder

# double size of cluster
swift-ring-builder account.builder add z1-172.17.3.4:8080/sdb 100.0
swift-ring-builder account.builder add z2-172.17.3.5:8080/sdb 100.0
swift-ring-builder account.builder add z3-172.17.3.6:8080/sdb 100.0

swift-ring-builder account.builder rebalance
swift-ring-builder account.builder

# a node has died and will not be replaced soon
swift-ring-builder account.builder remove d5  # d5 = device id 5 (z3-172.17.3.6)

# this command will crash
swift-ring-builder account.builder rebalance

Mark Gius (markgius) wrote :

I wrote a unit test for the same sequence of operations, and it passes (unit test patch attached). That suggests the problem lies in swift-ring-builder itself, possibly a load/save error.

Mark Gius (markgius)
Changed in swift:
assignee: nobody → Mark Gius (markgius)
status: New → In Progress
Mark Gius (markgius) wrote :
Openstack Gerrit (openstack-gerrit) wrote : A change has been merged to openstack/swift

Reviewed: https://review.openstack.org/522
Committed: http://github.com/openstack/swift/commit/c0315a89dfd13bcccbc8ddc5d57adb9afd085afc
Submitter: Jenkins
Branch: master

 status fixcommitted
 done

commit c0315a89dfd13bcccbc8ddc5d57adb9afd085afc
Author: Mark Gius <email address hidden>
Date: Wed Sep 21 13:17:50 2011 -0700

    Fix for bug 845952

    Devices scheduled to be removed are assigned a device of 65535. When
    looking for parts to reassign from heavy nodes, these parts need to be
    skipped.

    Includes review suggestions

    Change-Id: I61f40c36509bf998834c123b0f80117ca6def3ff
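The commit message describes a sentinel: partitions whose device is being removed are given device 65535 (0xFFFF), and the later pass that pulls partitions off heavy devices must skip them, otherwise it indexes the device list with 65535. The sketch below is a simplified, single-replica model, not a copy of Swift's builder.py; `gather`, `scenario`, and `NONE_DEV` are hypothetical names. It also assumes that the heavy-device pass skips partitions moved within the last `min_part_hours` hours, which would explain why both reproductions (created with `min_part_hours` 0) reach `self.devs[65535]`.

```python
# Simplified model of the partition-gathering step behind this bug.
# Hypothetical names; NOT a copy of swift/common/ring/builder.py.

NONE_DEV = 0xFFFF  # 65535: "this partition has no device yet"


def gather(devs, part2dev, last_part_moves, removed_ids, min_part_hours,
           skip_sentinel):
    """Return the partitions queued for reassignment.

    devs: list of dicts; a negative 'parts_wanted' means overweight
    part2dev: list mapping partition to an index into devs (one replica)
    last_part_moves: hours since each partition last moved
    removed_ids: indexes of devices marked for removal
    skip_sentinel: True models the fix, False the original behavior
    """
    reassign = []

    # Pass 1: partitions on devices being removed get the sentinel.
    for part, dev_id in enumerate(part2dev):
        if dev_id in removed_ids:
            part2dev[part] = NONE_DEV
            last_part_moves[part] = 0
            reassign.append(part)

    # Pass 2: pull partitions off overweight (heavy) devices.
    for part in range(len(part2dev)):
        if last_part_moves[part] < min_part_hours:
            continue  # moved too recently; never true when min_part_hours=0
        if skip_sentinel and part2dev[part] == NONE_DEV:
            continue  # the fix: already queued in pass 1
        dev = devs[part2dev[part]]  # devs[65535] raises IndexError
        if dev['parts_wanted'] < 0:
            part2dev[part] = NONE_DEV
            last_part_moves[part] = 0
            dev['parts_wanted'] += 1
            reassign.append(part)
    return reassign


def scenario():
    """Three devices, four partitions; device 1 is heavy, device 2 is removed."""
    devs = [{'parts_wanted': 0}, {'parts_wanted': -1}, {'parts_wanted': 0}]
    part2dev = [0, 1, 2, 1]
    last_part_moves = [5, 5, 5, 5]
    return devs, part2dev, last_part_moves


if __name__ == '__main__':
    for min_hours, fixed in ((0, False), (1, False), (0, True)):
        devs, part2dev, moves = scenario()
        try:
            print(min_hours, fixed,
                  gather(devs, part2dev, moves, {2}, min_hours, fixed))
        except IndexError as exc:
            print(min_hours, fixed, 'IndexError:', exc)
```

In this model, `min_part_hours=0` without the skip raises `IndexError: list index out of range`, matching the traceback, while either `min_part_hours=1` or the sentinel check returns `[2, 1]`.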

Changed in swift:
status: In Progress → Fix Committed
Changed in swift:
milestone: none → 1.4.4
Thierry Carrez (ttx)
Changed in swift:
status: Fix Committed → Fix Released