ok, so this *is* a duplicate of https://bugs.launchpad.net/swift/+bug/1452431
$ python validate_ec_ring_parts.py object-1.builder
set([]) 14
ERROR: all parts have 13 device ids for replicas instead of 14
^ pretty crappy, but explains the index error
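For anyone following along, here is a rough sketch of the kind of check that produces an error like the one above. It is not the validate_ec_ring_parts.py script itself (its source isn't pasted here); it assumes the assignment is available as a plain list of per-replica rows of device ids, and `count_devs_per_part` and `find_short_parts` are hypothetical names made up for the illustration.

```python
def count_devs_per_part(replica2part2dev, part_count):
    """Return a list whose entry p is the number of device ids assigned to partition p."""
    counts = [0] * part_count
    for row in replica2part2dev:
        # Rows are allowed to be shorter than part_count in this sketch.
        for part, dev_id in enumerate(row):
            if dev_id is not None:
                counts[part] += 1
    return counts


def find_short_parts(replica2part2dev, part_count, expected_replicas):
    """Map partition -> actual device count for every partition that is off."""
    counts = count_devs_per_part(replica2part2dev, part_count)
    return {part: n for part, n in enumerate(counts) if n != expected_replicas}


# Toy data (assumption): 4 partitions, 14 replicas expected, only 13 rows present.
rows = [[(replica + part) % 24 for part in range(4)] for replica in range(13)]
short = find_short_parts(rows, part_count=4, expected_replicas=14)
for part, n in sorted(short.items()):
    print("part %d has %d device ids instead of 14" % (part, n))
```

With the toy data every partition comes back with 13 device ids, which is the same shape of failure as the ERROR line above.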
I instrumented the reconstructor's _get_part_jobs method
# diff reconstructor.py reconstructor.py.bak
726d725
< print partition
729d727
< print node
752d749
< print fi, len(part_nodes)
and so it's happening basically like I thought:
object-reconstructor: STDOUT: 955
object-reconstructor: STDOUT: {'index': 0, 'replication_port': 6003, 'weight': 8.589934592, 'zone': 1, 'ip': '172.30.3.45', 'region': 1, 'id': 18, 'replication_ip': '172.30.3.45', 'device': 'd18', 'port': 6000}
object-reconstructor: STDOUT: {'index': 1, 'replication_port': 6003, 'weight': 8.589934592, 'zone': 1, 'ip': '172.30.3.43', 'region': 1, 'id': 5, 'replication_ip': '172.30.3.43', 'device': 'd5', 'port': 6000}
object-reconstructor: STDOUT: {'index': 2, 'replication_port': 6003, 'weight': 8.589934592, 'zone': 1, 'ip': '172.30.3.44', 'region': 1, 'id': 11, 'replication_ip': '172.30.3.44', 'device': 'd11', 'port': 6000}
object-reconstructor: STDOUT: {'index': 3, 'replication_port': 6003, 'weight': 8.589934592, 'zone': 1, 'ip': '172.30.3.45', 'region': 1, 'id': 16, 'replication_ip': '172.30.3.45', 'device': 'd16', 'port': 6000}
object-reconstructor: STDOUT: {'index': 4, 'replication_port': 6003, 'weight': 8.589934592, 'zone': 1, 'ip': '172.30.3.44', 'region': 1, 'id': 13, 'replication_ip': '172.30.3.44', 'device': 'd13', 'port': 6000}
object-reconstructor: STDOUT: {'index': 5, 'replication_port': 6003, 'weight': 8.589934592, 'zone': 1, 'ip': '172.30.3.43', 'region': 1, 'id': 6, 'replication_ip': '172.30.3.43', 'device': 'd6', 'port': 6000}
object-reconstructor: STDOUT: {'index': 6, 'replication_port': 6003, 'weight': 8.589934592, 'zone': 1, 'ip': '172.30.3.44', 'region': 1, 'id': 9, 'replication_ip': '172.30.3.44', 'device': 'd9', 'port': 6000}
object-reconstructor: STDOUT: {'index': 7, 'replication_port': 6003, 'weight': 8.589934592, 'zone': 1, 'ip': '172.30.3.45', 'region': 1, 'id': 20, 'replication_ip': '172.30.3.45', 'device': 'd20', 'port': 6000}
object-reconstructor: STDOUT: {'index': 8, 'replication_port': 6003, 'weight': 8.589934592, 'zone': 1, 'ip': '172.30.3.43', 'region': 1, 'id': 7, 'replication_ip': '172.30.3.43', 'device': 'd7', 'port': 6000}
object-reconstructor: STDOUT: {'index': 9, 'replication_port': 6003, 'weight': 8.589934592, 'zone': 1, 'ip': '172.30.3.44', 'region': 1, 'id': 10, 'replication_ip': '172.30.3.44', 'device': 'd10', 'port': 6000}
object-reconstructor: STDOUT: {'index': 10, 'replication_port': 6003, 'weight': 8.589934592, 'zone': 1, 'ip': '172.30.3.45', 'region': 1, 'id': 21, 'replication_ip': '172.30.3.45', 'device': 'd21', 'port': 6000}
object-reconstructor: STDOUT: {'index': 11, 'replication_port': 6003, 'weight': 8.589934592, 'zone': 1, 'ip': '172.30.3.45', 'region': 1, 'id': 23, 'replication_ip': '172.30.3.45', 'device': 'd23', 'port': 6000}
object-reconstructor: STDOUT: {'index': 12, 'replication_port': 6003, 'weight': 8.589934592, 'zone': 1, 'ip': '172.30.3.44', 'region': 1, 'id': 8, 'replication_ip': '172.30.3.44', 'device': 'd8', 'port': 6000}
object-reconstructor: STDOUT: 13
object-reconstructor: STDOUT: 13
object-reconstructor: Exception in top-levelreconstruction loop: #012Traceback (most recent call last):#012 File "/usr/local/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 908, in reconstruct#012 jobs = self.build_reconstruction_jobs(part_info)#012 File "/usr/local/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 867, in build_reconstruction_jobs#012 jobs = self._get_part_jobs(**part_info)#012 File "/usr/local/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 757, in _get_part_jobs#012 sync_to=[part_nodes[fi]],#012IndexError: list index out of range
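The two `13` values in the log are `fi` and `len(part_nodes)`, so the lookup on line 757 is indexing position 13 of a 13-element list. A tiny reproduction with plain lists, assuming nothing about the reconstructor beyond the `part_nodes[fi]` expression in the traceback (`pick_sync_target` is a hypothetical stand-in, not Swift code):

```python
def pick_sync_target(part_nodes, frag_index):
    """Mimic the failing lookup: take the node at position frag_index."""
    return part_nodes[frag_index]


# 13 primaries (indexes 0..12), as printed by the instrumented reconstructor.
part_nodes = [{'index': i} for i in range(13)]
fi = 13  # the fragment index printed just before the traceback

try:
    pick_sync_target(part_nodes, fi)
    raised = False
except IndexError as err:
    raised = True
    print("IndexError:", err)
```

Any fragment index from 0 to 12 resolves fine; only the index that would belong to the missing 14th primary blows up.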
here's what the ring tools have to say about it
# swift-get-nodes -p 955 /etc/swift/object-1.ring.gz
Account None
Container None
Object None
Partition 955
Hash None
Server:Port Device 172.30.3.45:6000 d18
Server:Port Device 172.30.3.43:6000 d5
Server:Port Device 172.30.3.44:6000 d11
Server:Port Device 172.30.3.45:6000 d16
Server:Port Device 172.30.3.44:6000 d13
Server:Port Device 172.30.3.43:6000 d6
Server:Port Device 172.30.3.44:6000 d9
Server:Port Device 172.30.3.45:6000 d20
Server:Port Device 172.30.3.43:6000 d7
Server:Port Device 172.30.3.44:6000 d10
Server:Port Device 172.30.3.45:6000 d21
Server:Port Device 172.30.3.45:6000 d23
Server:Port Device 172.30.3.44:6000 d8
Server:Port Device 172.30.3.44:6000 d12 [Handoff]
Server:Port Device 172.30.3.45:6000 d19 [Handoff]
Server:Port Device 172.30.3.44:6000 d14 [Handoff]
Server:Port Device 172.30.3.45:6000 d17 [Handoff]
Server:Port Device 172.30.3.44:6000 d15 [Handoff]
Server:Port Device 172.30.3.45:6000 d22 [Handoff]
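To count primaries versus handoffs without eyeballing the list, something like the following works on the text as pasted above. It is a sketch that assumes each node line has the form `Server:Port Device <ip:port> <device>` with an optional trailing `[Handoff]`; `split_primaries_handoffs` is a hypothetical helper, not part of the Swift tools.

```python
def split_primaries_handoffs(output):
    """Split pasted swift-get-nodes node lines into primary and handoff device names."""
    primaries, handoffs = [], []
    for line in output.splitlines():
        fields = line.split()
        # Skip anything that is not a node line (Account/Partition/Hash headers etc).
        if len(fields) < 4 or fields[0] != 'Server:Port':
            continue
        device = fields[3]
        if fields[-1] == '[Handoff]':
            handoffs.append(device)
        else:
            primaries.append(device)
    return primaries, handoffs


sample = """\
Partition 955
Hash None
Server:Port Device 172.30.3.45:6000 d18
Server:Port Device 172.30.3.43:6000 d5
Server:Port Device 172.30.3.44:6000 d11
Server:Port Device 172.30.3.45:6000 d16
Server:Port Device 172.30.3.44:6000 d13
Server:Port Device 172.30.3.43:6000 d6
Server:Port Device 172.30.3.44:6000 d9
Server:Port Device 172.30.3.45:6000 d20
Server:Port Device 172.30.3.43:6000 d7
Server:Port Device 172.30.3.44:6000 d10
Server:Port Device 172.30.3.45:6000 d21
Server:Port Device 172.30.3.45:6000 d23
Server:Port Device 172.30.3.44:6000 d8
Server:Port Device 172.30.3.44:6000 d12 [Handoff]
Server:Port Device 172.30.3.45:6000 d19 [Handoff]
Server:Port Device 172.30.3.44:6000 d14 [Handoff]
Server:Port Device 172.30.3.45:6000 d17 [Handoff]
Server:Port Device 172.30.3.44:6000 d15 [Handoff]
Server:Port Device 172.30.3.45:6000 d22 [Handoff]
"""

primaries, handoffs = split_primaries_handoffs(sample)
print(len(primaries), "primaries,", len(handoffs), "handoffs")
```

On this output it reports 13 primaries, with d12 showing up as the first handoff, which lines up with the validation error.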