object-reconstructor IndexError when it encounters parts from another ring

Bug #1583144 reported by Admin6
Affects: OpenStack Object Storage (swift)
Status: Confirmed
Importance: Low
Assigned to: Unassigned

Bug Description

=== Resolution ===

As the data was only test data, I deleted the old ring, which had been created with a part power of 19, and re-created it with a part power of 18, BUT I must have forgotten to clean up the content of the data disks.

This invalid configuration resulted in an IndexError (shown below) in the reconstructor logs when it encountered a part that was invalid for the current ring.

This state was hard to diagnose because of lp bug #1583798
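
For context, a minimal sketch of the failure mode (illustrative only, assuming the ring API of the Swift 2.7 era and the partition numbers reported below): Ring.get_part_nodes() returns an empty list for a partition number the ring does not contain, and the reconstructor's _get_part_jobs() then indexes into it:

# Minimal sketch, illustrative only (Swift ~2.7 ring API assumed): a partition
# directory left over from the old part-power-19 ring maps to a partition
# number the new part-power-18 ring does not contain.
from swift.common.ring import Ring

ring = Ring('/etc/swift', ring_name='object-1')   # 2**18 = 262144 partitions

stale_part = 457293    # example: partition dir left over from the old ring
frag_index = 11        # fragment index parsed from a .data filename

part_nodes = ring.get_part_nodes(stale_part)
print(len(part_nodes))               # 0 -- the partition is not in this ring
sync_to = [part_nodes[frag_index]]   # IndexError: list index out of range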

=== Original Description ===

I'm using the swift 2.7 packaged version under Ubuntu 16.04. I have an erasure-coded ring (9+3) declared, and I get errors reported by only one of my four ACO servers. All four servers seem to be installed the same, but there must be either a different configuration somewhere or something wrong with the erasure-coding storage specific to this node. I'm not quite sure if this is a bug.

Here are the "buggy" server logs.

May 17 16:40:39 STACO1 object-reconstructor: Starting object reconstruction pass.
May 17 16:40:46 STACO1 object-reconstructor: Exception in top-levelreconstruction loop: #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 912, in reconstruct#012 jobs = self.build_reconstruction_jobs(part_info)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 871, in build_reconstruction_jobs#012 jobs = self._get_part_jobs(**part_info)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 761, in _get_part_jobs#012 sync_to=[part_nodes[fi]],#012IndexError: list index out of range
May 17 16:40:46 STACO1 object-reconstructor: 184/14061 (1.31%) partitions of 1/7 (14.29%) devices reconstructed in 7.32s (25.14/sec, 1h remaining)
May 17 16:40:46 STACO1 object-reconstructor: Object reconstruction complete. (0.12 minutes)
May 17 16:41:16 STACO1 object-reconstructor: Starting object reconstruction pass.
May 17 16:41:24 STACO1 object-reconstructor: Exception in top-levelreconstruction loop: #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 912, in reconstruct#012 jobs = self.build_reconstruction_jobs(part_info)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 871, in build_reconstruction_jobs#012 jobs = self._get_part_jobs(**part_info)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 761, in _get_part_jobs#012 sync_to=[part_nodes[fi]],#012IndexError: list index out of range
May 17 16:41:24 STACO1 object-reconstructor: 184/14061 (1.31%) partitions of 1/7 (14.29%) devices reconstructed in 7.69s (23.92/sec, 1h remaining)
May 17 16:41:24 STACO1 object-reconstructor: Object reconstruction complete. (0.13 minutes)
May 17 16:41:54 STACO1 object-reconstructor: Starting object reconstruction pass.
May 17 16:42:01 STACO1 object-reconstructor: Exception in top-levelreconstruction loop: #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 912, in reconstruct#012 jobs = self.build_reconstruction_jobs(part_info)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 871, in build_reconstruction_jobs#012 jobs = self._get_part_jobs(**part_info)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/reconstructor.py", line 761, in _get_part_jobs#012 sync_to=[part_nodes[fi]],#012IndexError: list index out of range
May 17 16:42:01 STACO1 object-reconstructor: 184/14061 (1.31%) partitions of 1/7 (14.29%) devices reconstructed in 7.36s (25.00/sec, 1h remaining)
May 17 16:42:01 STACO1 object-reconstructor: Object reconstruction complete. (0.12 minutes)

During the same time, here are the kinds of logs I have on the other nodes:

May 17 16:13:46 STACO2 object-reconstructor: Starting object reconstruction pass.
May 17 16:18:47 STACO2 object-reconstructor: 8696/13827 (62.89%) partitions of 1/7 (14.29%) devices reconstructed in 300.00s (28.99/sec, 50m remaining)
May 17 16:18:47 STACO2 object-reconstructor: 2 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:18:47 STACO2 object-reconstructor: Partition times: max 0.6359s, min 0.0004s, med 0.0064s
May 17 16:23:47 STACO2 object-reconstructor: 17318/27401 (63.20%) partitions of 2/7 (28.57%) devices reconstructed in 600.01s (28.86/sec, 45m remaining)
May 17 16:23:47 STACO2 object-reconstructor: 2 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:23:47 STACO2 object-reconstructor: Partition times: max 0.7612s, min 0.0003s, med 0.0064s
May 17 16:28:47 STACO2 object-reconstructor: 25998/27401 (94.88%) partitions of 2/7 (28.57%) devices reconstructed in 900.02s (28.89/sec, 40m remaining)
May 17 16:28:47 STACO2 object-reconstructor: 2 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:28:47 STACO2 object-reconstructor: Partition times: max 0.7612s, min 0.0003s, med 0.0064s
May 17 16:29:35 STACO2 object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd02/objects-1/auditor_status_ZBF.json'
May 17 16:29:45 STACO2 object-reconstructor: Unable to get enough responses (1/9) to reconstruct 10.10.2.51:6000/s01z1ecd01/200914/AUTH_Joomeo/20160131_O0/test1 policy#1 frag#11 with ETag 31d00d842caf20bbfac92617eed068f2
May 17 16:29:45 STACO2 object-reconstructor: Unable to get enough responses (1/9) to reconstruct 10.10.2.51:6000/s01z1ecd01/200914/AUTH_Joomeo/20160131_O0/test1 policy#1 frag#11 with ETag 31d00d842caf20bbfac92617eed068f2
May 17 16:33:47 STACO2 object-reconstructor: 34593/41065 (84.24%) partitions of 3/7 (42.86%) devices reconstructed in 1200.03s (28.83/sec, 35m remaining)
May 17 16:33:47 STACO2 object-reconstructor: 4 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:33:47 STACO2 object-reconstructor: Partition times: max 0.7612s, min 0.0003s, med 0.0064s
May 17 16:37:27 STACO2 object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd03/objects-1/auditor_status_ZBF.json'
May 17 16:38:47 STACO2 object-reconstructor: 43208/54699 (78.99%) partitions of 4/7 (57.14%) devices reconstructed in 1500.04s (28.80/sec, 30m remaining)
May 17 16:38:47 STACO2 object-reconstructor: 4 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:38:47 STACO2 object-reconstructor: Partition times: max 0.9936s, min 0.0003s, med 0.0064s
May 17 16:43:47 STACO2 object-reconstructor: 51811/54699 (94.72%) partitions of 4/7 (57.14%) devices reconstructed in 1800.05s (28.78/sec, 25m remaining)
May 17 16:43:47 STACO2 object-reconstructor: 4 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:43:47 STACO2 object-reconstructor: Partition times: max 0.9936s, min 0.0003s, med 0.0064s
May 17 16:45:30 STACO2 object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd04/objects-1/auditor_status_ALL.json'
May 17 16:45:30 STACO2 object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd04/objects-1/auditor_status_ZBF.json'
May 17 16:48:27 STACO2 object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd05/objects-1/auditor_status_ZBF.json'
May 17 16:48:47 STACO2 object-reconstructor: 60335/68363 (88.26%) partitions of 5/7 (71.43%) devices reconstructed in 2100.07s (28.73/sec, 20m remaining)
May 17 16:48:47 STACO2 object-reconstructor: 5 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:48:47 STACO2 object-reconstructor: Partition times: max 1.3764s, min 0.0003s, med 0.0064s
May 17 16:53:47 STACO2 object-reconstructor: 68943/81920 (84.16%) partitions of 6/7 (85.71%) devices reconstructed in 2400.08s (28.73/sec, 15m remaining)
May 17 16:53:47 STACO2 object-reconstructor: 5 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:53:47 STACO2 object-reconstructor: Partition times: max 1.3764s, min 0.0003s, med 0.0064s
May 17 16:58:47 STACO2 object-reconstructor: 77564/81920 (94.68%) partitions of 6/7 (85.71%) devices reconstructed in 2700.10s (28.73/sec, 10m remaining)
May 17 16:58:47 STACO2 object-reconstructor: 5 suffixes checked - 0.00% hashed, 100.00% synced
May 17 16:58:47 STACO2 object-reconstructor: Partition times: max 1.3764s, min 0.0003s, med 0.0064s
May 17 17:01:18 STACO2 object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd06/objects-1/auditor_status_ZBF.json'
May 17 17:03:47 STACO2 object-reconstructor: 86263/92783 (92.97%) partitions of 7/7 (100.00%) devices reconstructed in 3000.11s (28.75/sec, 3m remaining)
May 17 17:03:47 STACO2 object-reconstructor: 5 suffixes checked - 0.00% hashed, 100.00% synced
May 17 17:03:47 STACO2 object-reconstructor: Partition times: max 1.3764s, min 0.0003s, med 0.0064s
May 17 17:07:29 STACO2 object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd07/objects-1/auditor_status_ZBF.json'
May 17 17:07:29 STACO2 object-reconstructor: 92783/92783 (100.00%) partitions of 7/7 (100.00%) devices reconstructed in 3222.37s (28.79/sec, 0s remaining)
May 17 17:07:29 STACO2 object-reconstructor: 5 suffixes checked - 0.00% hashed, 100.00% synced
May 17 17:07:29 STACO2 object-reconstructor: Partition times: max 1.3764s, min 0.0003s, med 0.0064s
May 17 17:07:29 STACO2 object-reconstructor: Object reconstruction complete. (53.71 minutes)

Note that it looks like there is a problem with an object test1: Unable to get enough responses (1/9) to reconstruct 10.10.2.51:6000/s01z1ecd01/200914/AUTH_Joomeo/20160131_O0/test1 policy#1 frag#11 with ETag 31d00d842caf20bbfac92617eed068f2

The disk s01z1ecd01 is located on the buggy server STACO1. However, I can neither access nor delete the container 20160131_O0:

> DELETE /v1/AUTH_Joomeo/20160131_O0 HTTP/1.1
> Host: 10.10.1.50:8888
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 204 No Content

Also, I'm not sure what these kinds of messages mean: object-reconstructor: Unexpected entity in data dir: u'/srv/node/s02z2ecd07/objects-1/auditor_status_ZBF.json'

Revision history for this message
clayg (clay-gerrard) wrote :

Thank you for reporting this issue - I don't believe I've seen it before.

Looks like some sort of issue with the devices in the ring or with partition placement - are you running at least Swift 2.6.0? I think it's also possible to slip in an invalid ring under running reconstructors [1] - you might check for errors if you try to restart them. Can you validate that all nodes have the same rings - comparing md5sums is normally fine, or check with swift-recon. Can you provide the output of `swift-ring-builder <name-of-ec-ring>.builder`?

1. lp bug #1534572

Revision history for this message
Admin6 (jmbonnefond) wrote :

I use swift 2.7. The rings are the same on all hosts, and all swift configuration files are the same as well. I have monitoring that alerts me if any difference is detected in any file under the /etc/swift directory.

root@STACO1:~# swift-recon --md5
===============================================================================
--> Starting reconnaissance on 4 hosts
===============================================================================
[2016-05-19 07:28:35] Checking ring md5sums
4/4 hosts matched, 0 error[s] while checking hosts.
===============================================================================
[2016-05-19 07:28:35] Checking swift.conf md5sum
4/4 hosts matched, 0 error[s] while checking hosts.
===============================================================================

root@STACO1:/var/log# md5sum /etc/swift/object-1.builder /etc/swift/object-1.ring.gz
57070932e2c2f4a0006a36cc210ba49e /etc/swift/object-1.builder
e53d6f72660bbadc0fcec832b34818b0 /etc/swift/object-1.ring.gz

root@STACO2:~# md5sum /etc/swift/object-1.builder /etc/swift/object-1.ring.gz
57070932e2c2f4a0006a36cc210ba49e /etc/swift/object-1.builder
e53d6f72660bbadc0fcec832b34818b0 /etc/swift/object-1.ring.gz

The definition of the ring in /etc/swift/swift.conf is:

[storage-policy:1]
name = EC
policy_type = erasure_coding
ec_type = jerasure_rs_vand
ec_num_data_fragments = 9
ec_num_parity_fragments = 3
ec_object_segment_size = 1048576

Here is the current ring status:

root@STACO1:/var/log# swift-ring-builder /etc/swift/object-1.builder
/etc/swift/object-1.builder, build version 44
262144 partitions, 12.000000 replicas, 1 regions, 4 zones, 27 devices, 0.49 balance, 42.95 dispersion
The minimum number of hours before a partition can be reassigned is 1 (0:00:00 remaining)
The overload factor is 0.00% (0.000000)
Ring file /etc/swift/object-1.ring.gz is up-to-date
Devices: id region zone ip address port replication ip replication port name weight partitions balance flags meta
             0 1 1 10.10.1.51 6000 10.10.2.51 6000 s01z1ecd01 4000.00 119234 0.07
             1 1 1 10.10.1.51 6000 10.10.2.51 6000 s01z1ecd02 4000.00 119233 0.06
             2 1 1 10.10.1.51 6000 10.10.2.51 6000 s01z1ecd03 4000.00 119233 0.06
             3 1 1 10.10.1.51 6000 10.10.2.51 6000 s01z1ecd04 4000.00 119233 0.06
             4 1 1 10.10.1.51 6000 10.10.2.51 6000 s01z1ecd05 4000.00 119233 0.06
             5 1 1 10.10.1.51 6000 10.10.2.51 6000 s01z1ecd06 4000.00 119232 0.06
             6 1 2 10.10.1.52 6000 10.10.2.52 6000 s02z2ecd01 4000.00 119232 0.06
             7 1 2 10.10.1.52 6000 10.10.2.52 6000 s02z2ecd02 4000.00 119231 0.06
             8 1 2 10.10.1.52 6000 10.10.2.52 6000 s02z2ecd03 4000.00 119230 0.06
             9 1 2 10.10.1.52 6000 10.10.2.52 ...

Revision history for this message
clayg (clay-gerrard) wrote :

Good that all the rings are the same - nice work. Replica count and policy definition seem good.

The dispersion is high; you can run `swift-ring-builder /etc/swift/object-1.builder dispersion -v` to see a breakout of the part-replica placement - probably those servers with the extra disks have 4 part-replicas for some parts instead of the desired 3 part-replicas. A 9-3 is going to be tough to balance with 4 servers - you may have to bump up overload to get dispersion down.

But I don't believe dispersion is causing the problem - the rings should be fine - so I don't really know what's going on. Can you instrument the reconstructor with some additional logging? [1]

The liberasurecode warnings are benign - there's an upstream bug [2]

1. https://gist.github.com/clayg/7a8612c1d6f7ee9c4b4fa9c1fe10cfb3
2. https://bitbucket.org/tsg-/liberasurecode/issues/23/spurious-messages-in-syslog

Revision history for this message
Admin6 (jmbonnefond) wrote :

Ok, dispersion is clearer to me now, and I understand why this value is high. I'll try to add at least a fifth ACO server soon :-)

Here is, just in case, the output of swift-ring-builder /etc/swift/object-1.builder dispersion -v:

root@STACO1:/# swift-ring-builder /etc/swift/object-1.builder dispersion -v
Dispersion is 42.949677, Balance is 0.493145, Overload is 0.00%
Required overload is 10.000000%
Worst tier is 15.285873 (r1z2)
----------------------------------------------------------------------------------------------------------------------------------------------
Tier Parts % Max 0 1 2 3 4 5 6 7 8 9 10 11 12
----------------------------------------------------------------------------------------------------------------------------------------------
r1 262144 0.00 12 0 0 0 0 0 0 0 0 0 0 0 0 262144
r1z1 262144 15.25 3 0 2522 11109 208529 39984 0 0 0 0 0 0 0 0
r1z1-10.10.1.51 262144 15.25 3 0 2522 11109 208529 39984 0 0 0 0 0 0 0 0
r1z1-10.10.1.51/s01z1ecd01 119234 0.00 1 142910 119234 0 0 0 0 0 0 0 0 0 0 0
r1z1-10.10.1.51/s01z1ecd02 119233 0.00 1 142911 119233 0 0 0 0 0 0 0 0 0 0 0
r1z1-10.10.1.51/s01z1ecd03 119233 0.00 1 142911 119233 0 0 0 0 0 0 0 0 0 0 0
r1z1-10.10.1.51/s01z1ecd04 119233 0.00 1 142911 119233 0 0 0 0 0 0 0 0 0 0 0
r1z1-10.10.1.51/s01z1ecd05 119233 0.00 1 142911 119233 0 0 0 0 0 0 0 0 0 0 0
r1z1-10.10.1.51/s01z1ecd06 119232 0.00 1 142912 119232 0 0 0 0 0 0 0 0 0 0 0
r1z1-10.10.1.51/s01z1ecd07 94865 0.00 1 167279 94865 0 0 0 0 0 0 0 0 0 0 0
r1z2 262144 15.29 3 0 2488 11264 208321 40071 0 0 0 0 0 0 0 0
r1z2-10.10.1.52 262144 15.29 3 0 2488 11264 208321 40071 0 0 0 0 0 0 0 0
r1z2-10.10.1.52/s02z2ecd01 119232 0.00 1 142912 119232 0 0 0 0 0 0 0 0 0 0 0
r1z2-10.10.1.52/s02z2ecd02 119231 0.00 1 142913 119231 0 0 0 0 0 0 0 0 0 0 0
r1z2-10.10.1.52/s02z2ecd03 119230 0.00 1 142914 119230 0 0 0 0 0 0 0 0 0 0 0
r1z2-10.10.1.52/s02z2ecd04 119230 0.00 1 142914 119230 0 0 0 0 0 0 0 0 0 0 0
r1z2-10.10.1.52/s02z2ecd05 119230 0.00 1 14291...

Revision history for this message
clayg (clay-gerrard) wrote :

WTF? That file is in the wrong part dir?

root@STACO1:/# swift-object-info /srv/node/s01z1ecd01/objects-1/457293/7d3/df49a292d04fb00c11e667b7f44ae7d3/1454355106.18830#11.data

The partition there is "457293" - which doesn't exist in your ring:

root@STACO1:/var/log# swift-ring-builder /etc/swift/object-1.builder
/etc/swift/object-1.builder, build version 44
262144 partitions, 12.000000 replicas, 1 regions, 4 zones, 27 devices, 0.49 balance, 42.95 dispersion

^ you only have 262K parts, so part # 457K doesn't make sense and the ring returns the empty list of nodes [1]

If you look at the output from `swift-object-info` you can see that name actually hashes to a different partition:

root@STACO1:/# swift-object-info /srv/node/s01z1ecd01/objects-1/457293/7d3/df49a292d04fb00c11e667b7f44ae7d3/1454355106.18830#11.data
Path: /AUTH_Joomeo/20160201_O0/test/4/2075.mp4
  Account: AUTH_Joomeo
<snip>
ETag: 2a469c3245cf1416f77746c3e8f1d70f (valid)
Content-Length: 6862 (valid)
Partition 228646
Hash df49a292d04fb00c11e667b7f44ae7d3
<snip>

The correct partition for that hash in your cluster is "228646"

So try to `mv 457293 228646` and see if that helps - I'll file a separate bug to make ring.get_part_nodes raise a more specific error if the partition isn't in the ring.

... but I'm not sure why/how that object ended up in the wrong directory on disk? Do you have *other* partition dirs that are "larger" than "262144"!?

1. https://gist.github.com/clayg/9aef468776d410a52e37bb40b9745590
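
As an illustration, the same check can be done programmatically - a sketch assuming the Swift ring API and the object path shown in the swift-object-info output above; it must be run on a node with the cluster's /etc/swift/swift.conf so the hash path prefix and suffix match:

# Sketch only: confirm which partition an object name hashes to in the
# current ring; account/container/object are taken from the "Path:" line
# of the swift-object-info output above.
from swift.common.ring import Ring

ring = Ring('/etc/swift', ring_name='object-1')
part, nodes = ring.get_nodes('AUTH_Joomeo', '20160201_O0', 'test/4/2075.mp4')
print(part)                           # expected: 228646
print([n['device'] for n in nodes])   # devices that should hold the fragments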

Revision history for this message
Admin6 (jmbonnefond) wrote :

Gotcha, I think you've pinpointed it. My fault (of course)!

When I first saw errors of the type "RingValidationWarning: The partition 1968 has been assigned to duplicate devices", I thought this could be related to a partition power that was too big (or too small).

As the data was only test data, I deleted the old ring, which had been created with a part power of 19, and re-created it with a part power of 18, BUT I must have forgotten to clean up the content of the data disks.

So this object must be part of the previous ring. That's why it seems to be a valid object but has a partition number higher than the max part number.

Sorry :-/

Fortunately, the previous test ring wasn't used a lot. Looking at one of my disks, out of a pool of about 14000 partition dirs, only about 40 directories have a number higher than 262144.
I could probably walk all my disks and simply remove all directories with a part number greater than 262144.
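
For illustration, a rough, untested sketch of such a cleanup pass (assuming the Swift ring API and one of the device paths mentioned above) could look like this:

# Rough, untested sketch: list partition directories under an EC policy
# datadir whose number falls outside the current ring (>= partition_count),
# i.e. directories that can only belong to the old part-power-19 ring.
import os
from swift.common.ring import Ring

ring = Ring('/etc/swift', ring_name='object-1')
max_part = ring.partition_count                # 262144 for a part power of 18

datadir = '/srv/node/s01z1ecd01/objects-1'     # example device from above
for entry in sorted(os.listdir(datadir)):
    if not entry.isdigit():
        continue                               # skip auditor_status_*.json
    if int(entry) >= max_part:
        print('stale partition dir: ' + os.path.join(datadir, entry))
        # review, then remove by hand once you are sure nothing needs it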

However, I must also have about the same number of objects with a part number less than 262144 that belong to the old ring and are unknown to the new one. Do you think they could mess up my current ring in any way?

Revision history for this message
clayg (clay-gerrard) wrote :

OIC, yeah you should have wiped the disks before deploying new rings. This sort of thing happens in test/dev.

Now, you're looking at a dark data problem. If you clean up those parts so they don't disrupt replication, things should be able to move forward. The remaining data in the overlapping parts won't *hurt* anything - but there's a good chance that data is not useful and might not get cleaned up.

It really depends on your situation. You could either write a script to troll the objects datadirs and validate the objects are accessible and listed in their containers, then decide what to do with the data that doesn't match. Or you could wipe the whole thing and start over (reloading any data that you want in the cluster from their sources).
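
A rough sketch of the first approach (illustrative only, assuming Swift ~2.7's diskfile and ring APIs, not a tested tool): read each .data file's object name from its xattrs, recompute its partition against the current ring, and flag files whose on-disk partition directory no longer matches:

# Illustrative sketch only (Swift ~2.7 APIs assumed): flag .data files whose
# on-disk partition directory no longer matches where the current ring would
# place them -- likely leftovers from the old ring (dark data).
import os
from swift.common.ring import Ring
from swift.obj.diskfile import read_metadata

ring = Ring('/etc/swift', ring_name='object-1')
datadir = '/srv/node/s01z1ecd01/objects-1'     # example device from above

for part in os.listdir(datadir):
    if not part.isdigit():
        continue
    for root, _dirs, files in os.walk(os.path.join(datadir, part)):
        for fname in files:
            if not fname.endswith('.data'):
                continue
            meta = read_metadata(os.path.join(root, fname))
            # meta['name'] looks like /AUTH_account/container/object
            _, account, container, obj = meta['name'].split('/', 3)
            good_part, _nodes = ring.get_nodes(account, container, obj)
            if int(part) != good_part:
                print('misplaced: %s (in part %s, belongs in part %d)' % (
                    os.path.join(root, fname), part, good_part))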

GL!

summary: - object-reconstructor index out of range
+ object-reconstructor IndexError when it encounters parts from another
+ ring
description: updated
Changed in swift:
importance: Undecided → Low
status: New → Confirmed