=== Resolution ===
As the data was only test data, I deleted the old ring, which had been created with a part power of 19, and re-created it with a part power of 18, but I must have forgotten to clean up the contents of the data disks.
This invalid configuration resulted in an IndexError (shown below) in the reconstructor logs when it encountered a part that was invalid for the current ring.
This state was hard to diagnose because of lp bug #1583798.
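A minimal sketch of how leftover partition directories from the old ring could be spotted, assuming the usual on-disk layout where each partition is a directory named with its integer partition number under something like /srv/node/<device>/objects-1/. The helper name `find_out_of_range_partitions` is hypothetical and not part of Swift. A ring with part power N only has partitions 0 to 2**N - 1, so any directory numbered 2**18 or higher cannot belong to a part power 18 ring:

```python
import os


def find_out_of_range_partitions(objects_dir, part_power):
    """Return partition numbers found on disk that cannot exist in a ring
    built with the given part power (valid range: 0 .. 2**part_power - 1).

    Hypothetical helper for illustration; not part of Swift.
    """
    max_parts = 2 ** part_power
    stale = []
    for name in os.listdir(objects_dir):
        path = os.path.join(objects_dir, name)
        # Partition directories are named with plain integers; skip
        # anything else (for example auditor_status_*.json files).
        if not (name.isdigit() and os.path.isdir(path)):
            continue
        if int(name) >= max_parts:
            stale.append(int(name))
    return sorted(stale)
```

Note that this only catches partitions whose number is out of range. Directories numbered below 2**18 that were written under the old ring look the same as valid ones, so wiping the test data entirely is probably still the safer cleanup.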
=== Original Description ===
I'm using the packaged version of swift 2.7 under Ubuntu 16.04. I have an erasure coded ring (9+3) declared, and I get errors reported by only one of my four ACO servers. All four servers seem to be installed the same way, but there must be either a different configuration somewhere, or something wrong with the erasure coding storage specific to this node. I'm not quite sure if this is a bug.
Here are the logs from the "buggy" server:
May 17 16:40:39 STACO1 object-reconstructor: Starting object reconstruction pass.
May 17 16:40:46 STACO1 object-reconstructor: Exception in top-levelreconstruction loop: #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 912, in reconstruct#012 jobs = self.build_reconstruction_jobs(part_info)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 871, in build_reconstruction_jobs#012 jobs = self._get_part_jobs(**part_info)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 761, in _get_part_jobs#012 sync_to=[part_nodes[fi]],#012IndexError: list index out of range
May 17 16:40:46 STACO1 object-reconstructor: 184/14061 (1.31%) partitions of 1/7 (14.29%) devices reconstructed in 7.32s (25.14/sec, 1h remaining)
May 17 16:40:46 STACO1 object-reconstructor: Object reconstruction complete. (0.12 minutes)
May 17 16:41:16 STACO1 object-reconstructor: Starting object reconstruction pass.
May 17 16:41:24 STACO1 object-reconstructor: Exception in top-levelreconstruction loop: #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 912, in reconstruct#012 jobs = self.build_reconstruction_jobs(part_info)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 871, in build_reconstruction_jobs#012 jobs = self._get_part_jobs(**part_info)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 761, in _get_part_jobs#012 sync_to=[part_nodes[fi]],#012IndexError: list index out of range
May 17 16:41:24 STACO1 object-reconstructor: 184/14061 (1.31%) partitions of 1/7 (14.29%) devices reconstructed in 7.69s (23.92/sec, 1h remaining)
May 17 16:41:24 STACO1 object-reconstructor: Object reconstruction complete. (0.13 minutes)
May 17 16:41:54 STACO1 object-reconstructor: Starting object reconstruction pass.
May 17 16:42:01 STACO1 object-reconstructor: Exception in top-levelreconstruction loop: #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 912, in reconstruct#012 jobs = self.build_reconstruction_jobs(part_info)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 871, in build_reconstruction_jobs#012 jobs = self._get_part_jobs(**part_info)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 761, in _get_part_jobs#012 sync_to=[part_nodes[fi]],#012IndexError: list index out of range
May 17 16:42:01 STACO1 object-reconstructor: 184/14061 (1.31%) partitions of 1/7 (14.29%) devices reconstructed in 7.36s (25.00/sec, 1h remaining)
May 17 16:42:01 STACO1 object-reconstructor: Object reconstruction complete. (0.12 minutes)
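The tracebacks above are hard to read because syslog has escaped each newline as `#012` (octal 012 is the newline character). A small sketch for turning such a line back into a readable traceback; the function name `unescape_syslog` is made up for this example, and the sample is a shortened version of the log line above:

```python
def unescape_syslog(line):
    """Turn '#012' escapes (octal 012 = newline) back into real newlines."""
    return line.replace("#012", "\n")


# Shortened sample taken from the STACO1 log lines above.
sample = (
    "Exception in top-levelreconstruction loop: "
    "#012Traceback (most recent call last):"
    "#012 jobs = self._get_part_jobs(**part_info)"
    "#012 sync_to=[part_nodes[fi]],"
    "#012IndexError: list index out of range"
)

print(unescape_syslog(sample))
```

Only the `#012` sequence is replaced here, on purpose, so that other `#` characters in the logs (such as `policy#1` or `frag#11`) are left untouched.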
During the same period, here is the kind of log output I get on the other nodes:
May 17 16:13:46 STACO2 object-reconstructor: Starting object reconstruction pass.
May 17 16:18:47 STACO2 object-reconstructor: 8696/13827 (62.89%) partitions of 1/7 (14.29%) devices reconstructed in 300.00s (28.99/sec, 50m remaining)
May 17 16:18:47 STACO2 object-reconstructor: 2 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:18:47 STACO2 object-reconstructor: Partition times: max 0.6359s, min 0.0004s, med 0.0064s
May 17 16:23:47 STACO2 object-reconstructor: 17318/27401 (63.20%) partitions of 2/7 (28.57%) devices reconstructed in 600.01s (28.86/sec, 45m remaining)
May 17 16:23:47 STACO2 object-reconstructor: 2 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:23:47 STACO2 object-reconstructor: Partition times: max 0.7612s, min 0.0003s, med 0.0064s
May 17 16:28:47 STACO2 object-reconstructor: 25998/27401 (94.88%) partitions of 2/7 (28.57%) devices reconstructed in 900.02s (28.89/sec, 40m remaining)
May 17 16:28:47 STACO2 object-reconstructor: 2 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:28:47 STACO2 object-reconstructor: Partition times: max 0.7612s, min 0.0003s, med 0.0064s
May 17 16:29:35 STACO2 object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd02/objects-1/auditor_status_ZBF.json'
May 17 16:29:45 STACO2 object-reconstructor: Unable to get enough responses (1/9) to reconstruct 10.10.2.51:6000/s01z1ecd01/200914/AUTH_Joomeo/20160131_O0/test1 policy#1 frag#11 with ETag 31d00d842caf20bbfac92617eed068f2
May 17 16:29:45 STACO2 object-reconstructor: Unable to get enough responses (1/9) to reconstruct 10.10.2.51:6000/s01z1ecd01/200914/AUTH_Joomeo/20160131_O0/test1 policy#1 frag#11 with ETag 31d00d842caf20bbfac92617eed068f2
May 17 16:33:47 STACO2 object-reconstructor: 34593/41065 (84.24%) partitions of 3/7 (42.86%) devices reconstructed in 1200.03s (28.83/sec, 35m remaining)
May 17 16:33:47 STACO2 object-reconstructor: 4 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:33:47 STACO2 object-reconstructor: Partition times: max 0.7612s, min 0.0003s, med 0.0064s
May 17 16:37:27 STACO2 object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd03/objects-1/auditor_status_ZBF.json'
May 17 16:38:47 STACO2 object-reconstructor: 43208/54699 (78.99%) partitions of 4/7 (57.14%) devices reconstructed in 1500.04s (28.80/sec, 30m remaining)
May 17 16:38:47 STACO2 object-reconstructor: 4 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:38:47 STACO2 object-reconstructor: Partition times: max 0.9936s, min 0.0003s, med 0.0064s
May 17 16:43:47 STACO2 object-reconstructor: 51811/54699 (94.72%) partitions of 4/7 (57.14%) devices reconstructed in 1800.05s (28.78/sec, 25m remaining)
May 17 16:43:47 STACO2 object-reconstructor: 4 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:43:47 STACO2 object-reconstructor: Partition times: max 0.9936s, min 0.0003s, med 0.0064s
May 17 16:45:30 STACO2 object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd04/objects-1/auditor_status_ALL.json'
May 17 16:45:30 STACO2 object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd04/objects-1/auditor_status_ZBF.json'
May 17 16:48:27 STACO2 object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd05/objects-1/auditor_status_ZBF.json'
May 17 16:48:47 STACO2 object-reconstructor: 60335/68363 (88.26%) partitions of 5/7 (71.43%) devices reconstructed in 2100.07s (28.73/sec, 20m remaining)
May 17 16:48:47 STACO2 object-reconstructor: 5 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:48:47 STACO2 object-reconstructor: Partition times: max 1.3764s, min 0.0003s, med 0.0064s
May 17 16:53:47 STACO2 object-reconstructor: 68943/81920 (84.16%) partitions of 6/7 (85.71%) devices reconstructed in 2400.08s (28.73/sec, 15m remaining)
May 17 16:53:47 STACO2 object-reconstructor: 5 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:53:47 STACO2 object-reconstructor: Partition times: max 1.3764s, min 0.0003s, med 0.0064s
May 17 16:58:47 STACO2 object-reconstructor: 77564/81920 (94.68%) partitions of 6/7 (85.71%) devices reconstructed in 2700.10s (28.73/sec, 10m remaining)
May 17 16:58:47 STACO2 object-reconstructor: 5 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:58:47 STACO2 object-reconstructor: Partition times: max 1.3764s, min 0.0003s, med 0.0064s
May 17 17:01:18 STACO2 object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd06/objects-1/auditor_status_ZBF.json'
May 17 17:03:47 STACO2 object-reconstructor: 86263/92783 (92.97%) partitions of 7/7 (100.00%) devices reconstructed in 3000.11s (28.75/sec, 3m remaining)
May 17 17:03:47 STACO2 object-reconstructor: 5 suffixes checked - 0.00% hashed, 100.00% synced
May 17 17:03:47 STACO2 object-reconstructor: Partition times: max 1.3764s, min 0.0003s, med 0.0064s
May 17 17:07:29 STACO2 object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd07/objects-1/auditor_status_ZBF.json'
May 17 17:07:29 STACO2 object-reconstructor: 92783/92783 (100.00%) partitions of 7/7 (100.00%) devices reconstructed in 3222.37s (28.79/sec, 0s remaining)
May 17 17:07:29 STACO2 object-reconstructor: 5 suffixes checked - 0.00% hashed, 100.00% synced
May 17 17:07:29 STACO2 object-reconstructor: Partition times: max 1.3764s, min 0.0003s, med 0.0064s
May 17 17:07:29 STACO2 object-reconstructor: Object reconstruction complete. (53.71 minutes)
Note that there appears to be a problem with an object named test1: Unable to get enough responses (1/9) to reconstruct 10.10.2.51:6000/s01z1ecd01/200914/AUTH_Joomeo/20160131_O0/test1 policy#1 frag#11 with ETag 31d00d842caf20bbfac92617eed068f2
The disk s01z1ecd01 is located on the buggy server STACO1. However, I can neither access nor delete the container 20160131_O0:
> DELETE /v1/AUTH_Joomeo/20160131_O0 HTTP/1.1
> Host: 10.10.1.50:8888
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 204 No Content
Also, I'm not sure what this type of message means: object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd07/objects-1/auditor_status_ZBF.json'
Thank you for reporting this issue - I don't believe I've seen it before.
It looks like some sort of issue with the devices in the ring or with partition placement. Are you running at least Swift 2.6.0? I think it's also possible to slip an invalid ring in under running reconstructors [1]; you might check for errors when you restart them. Can you validate that all nodes have the same rings (comparing md5 sums is normally fine, or check with recon)? Can you also provide the output of `swift-ring-builder <name-of-ec-ring>.builder`?
[1] lp bug #1534572
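The md5 comparison suggested above can be sketched as follows. This is only an illustration: `file_md5` is a made-up helper, and the ring path in the usage note (/etc/swift/object-1.ring.gz) is an assumption about where the EC policy's ring file lives on each node.

```python
import hashlib
import sys


def file_md5(path, chunk_size=65536):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print one "digest  path" line per ring file given on the command line.
    for ring_path in sys.argv[1:]:
        print("%s  %s" % (file_md5(ring_path), ring_path))
```

Running this on every node against the same ring file (for example /etc/swift/object-1.ring.gz, if that is where the ring is installed) and comparing the digests should show whether one node is using a different ring from the others.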