reconstructor tries to reconstruct a deleted object from an orphan fragment

Bug #1655608 reported by Charles Hsu
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Medium
Assigned to: Alistair Coles

Bug Description

I saw the log lines below in my reconstructor logs; the reconstructor tries to reconstruct a deleted object from an orphan fragment.

```
root@dev21:/home/vagrant# /opt/ss/bin/swift-object-reconstructor /etc/swift/object-server/2.conf -v -o
object-reconstructor: Running object reconstructor in script mode.
object-reconstructor: Unable to get enough responses (1/4) to reconstruct 192.168.190.222:6003/d2/21425/AUTH_demo/ec_container/rc.local policy#2 frag#5 with ETag b9549b2b2ca3627d9f9b108a84db42b2
object-reconstructor: Unable to get enough responses (1/4) to reconstruct 192.168.190.222:6003/d2/21425/AUTH_demo/ec_container/rc.local policy#2 frag#5 with ETag b9549b2b2ca3627d9f9b108a84db42b2
object-reconstructor: 6/6 (100.00%) partitions of 3/3 (100.00%) devices reconstructed in 0.47s (12.89/sec, 0s remaining)
object-reconstructor: 2 suffixes checked - 0.00% hashed, 100.00% synced
object-reconstructor: Partition times: max 0.1264s, min 0.0158s, med 0.0293s
object-reconstructor: Object reconstruction complete (once). (0.01 minutes)
object-reconstructor: Exited
```

* Swift 2.9.0
* EC ring: 4+2
* Object reclaim age: 60 sec (shortened to make the issue easy to reproduce)

```
root@dev21:/home/vagrant# cat check_rc_local.sh
curl -g -I -XHEAD "http://192.168.190.21:6007/d23/21425/AUTH_demo/ec_container/rc.local" -H "X-Backend-Storage-Policy-Index: 2"
curl -g -I -XHEAD "http://192.168.190.21:6000/d5/21425/AUTH_demo/ec_container/rc.local" -H "X-Backend-Storage-Policy-Index: 2"
curl -g -I -XHEAD "http://192.168.190.221:6006/d10/21425/AUTH_demo/ec_container/rc.local" -H "X-Backend-Storage-Policy-Index: 2"
curl -g -I -XHEAD "http://192.168.190.221:6007/d11/21425/AUTH_demo/ec_container/rc.local" -H "X-Backend-Storage-Policy-Index: 2"
curl -g -I -XHEAD "http://192.168.190.222:6008/d3/21425/AUTH_demo/ec_container/rc.local" -H "X-Backend-Storage-Policy-Index: 2"
curl -g -I -XHEAD "http://192.168.190.222:6007/d2/21425/AUTH_demo/ec_container/rc.local" -H "X-Backend-Storage-Policy-Index: 2"
curl -g -I -XHEAD "http://192.168.190.21:6006/d15/21425/AUTH_demo/ec_container/rc.local" -H "X-Backend-Storage-Policy-Index: 2" # [Handoff]
curl -g -I -XHEAD "http://192.168.190.221:6008/d16/21425/AUTH_demo/ec_container/rc.local" -H "X-Backend-Storage-Policy-Index: 2" # [Handoff]
curl -g -I -XHEAD "http://192.168.190.221:6000/d9/21425/AUTH_demo/ec_container/rc.local" -H "X-Backend-Storage-Policy-Index: 2" # [Handoff]
```

1) upload a file `swift upload ec_container rc.local`
2) unmount one of the EC drives
3) upload a new file to the same object name `swift upload ec_container rc.local.update --object-name rc.local`
4) mount all drives back
5) delete the object `swift delete ec_container rc.local`
6) run the curl script above to check

```
root@dev21:/home/vagrant# ./check_rc_local.sh
HTTP/1.1 404 Not Found
X-Backend-Timestamp: 1484127910.69963
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT

HTTP/1.1 404 Not Found
X-Backend-Timestamp: 1484127910.69963
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT

HTTP/1.1 404 Not Found
X-Backend-Timestamp: 1484127910.69963
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT

HTTP/1.1 404 Not Found
X-Backend-Timestamp: 1484127910.69963
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT

HTTP/1.1 404 Not Found
X-Backend-Timestamp: 1484127910.69963
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT

HTTP/1.1 404 Not Found
X-Backend-Timestamp: 1484127910.69963
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT

HTTP/1.1 200 OK
Content-Length: 139
X-Backend-Timestamp: 1484127827.20242
X-Object-Sysmeta-Ec-Frag-Index: 1
Content-Type: application/octet-stream
X-Object-Sysmeta-Ec-Content-Length: 234
X-Object-Sysmeta-Ec-Etag: b9549b2b2ca3627d9f9b108a84db42b2
Last-Modified: Wed, 11 Jan 2017 09:43:48 GMT
Etag: "f66f68a63f4339c5ff3d40bbaf2cc6b8"
X-Timestamp: 1484127827.20242
X-Object-Sysmeta-Ec-Scheme: isa_l_rs_vand 4+2
X-Object-Meta-Mtime: 1483506217.000000
X-Object-Sysmeta-Ec-Segment-Size: 1048576
Date: Wed, 11 Jan 2017 09:46:27 GMT

HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT

HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT
```

Run the object-reconstructor and it just overwrites the folder and ignores the tombstone (.ts) files (because of the reclaim age?).

```
root@dev21:/home/vagrant# /opt/ss/bin/swift-object-reconstructor /etc/swift/object-server/2.conf -v -o
object-reconstructor: Running object reconstructor in script mode.
object-reconstructor: 7/7 (100.00%) partitions of 3/3 (100.00%) devices reconstructed in 0.63s (11.04/sec, 0s remaining)
object-reconstructor: Object reconstruction complete (once). (0.01 minutes)
object-reconstructor: Exited

root@dev21:/home/vagrant# ./check_rc_local.sh
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT

HTTP/1.1 200 OK
Content-Length: 139
X-Backend-Timestamp: 1484127827.20242
X-Object-Sysmeta-Ec-Frag-Index: 1
Content-Type: application/octet-stream
X-Object-Sysmeta-Ec-Content-Length: 234
X-Object-Sysmeta-Ec-Etag: b9549b2b2ca3627d9f9b108a84db42b2
Last-Modified: Wed, 11 Jan 2017 09:43:48 GMT
Etag: "f66f68a63f4339c5ff3d40bbaf2cc6b8"
X-Timestamp: 1484127827.20242
X-Object-Sysmeta-Ec-Scheme: isa_l_rs_vand 4+2
X-Object-Meta-Mtime: 1483506217.000000
X-Object-Sysmeta-Ec-Segment-Size: 1048576
Date: Wed, 11 Jan 2017 09:46:48 GMT

HTTP/1.1 404 Not Found
X-Backend-Timestamp: 1484127910.69963
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT

HTTP/1.1 404 Not Found
X-Backend-Timestamp: 1484127910.69963
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT

HTTP/1.1 404 Not Found
X-Backend-Timestamp: 1484127910.69963
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT

HTTP/1.1 404 Not Found
X-Backend-Timestamp: 1484127910.69963
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT

HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT

HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT

HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT
```

The partition folder has a newer timestamp than the other .ts files, but the orphan fragment keeps its original timestamp in its filename.

```
root@dev21:/home/vagrant# ls -lah ${DEVICE:-/srv/node*}/d5/objects-2/21425/9cf/53b1446e285406fcd195cab50dc969cf
total 8.0K
drwxr-xr-x 2 swift swift 67 Jan 11 09:46 .
drwxr-xr-x 3 swift swift 45 Jan 11 09:46 ..
-rw------- 1 swift swift 139 Jan 11 09:46 1484127827.20242#1.data
-rw-r--r-- 1 swift swift 0 Jan 11 09:46 1484127827.20242.durable

root@dev21:/home/vagrant# date -d @1484127827.20242
Wed Jan 11 09:43:47 UTC 2017

HTTP/1.1 404 Not Found
X-Backend-Timestamp: 1484127910.69963
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT

root@dev21:/home/vagrant# date -d @1484127910.69963
Wed Jan 11 09:45:10 UTC 2017
```
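
The tombstone was presumably reaped because it had aged past the (deliberately short) 60-second reclaim age before the orphan fragment was pushed around. A minimal sketch of that check, using the timestamps from this report (illustrative only, not Swift's actual code):

```python
# Illustrative reclaim-age check: a tombstone (.ts) whose X-Timestamp is
# older than reclaim_age is eligible for removal during suffix cleanup.
import time

RECLAIM_AGE = 60  # seconds, as configured for this reproduction

def is_reclaimable(tombstone_timestamp, now=None):
    """Return True if a tombstone with this timestamp may be reaped."""
    now = time.time() if now is None else now
    return (now - float(tombstone_timestamp)) > RECLAIM_AGE

# The DELETE landed at 1484127910.69963 (09:45:10); by the time the checks
# ran at 09:46:48 (1484128008) it was ~98s old, so the tombstones were
# reaped while the orphan .data file survived.
print(is_reclaimable('1484127910.69963', now=1484128008))  # True
```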

After that, the reconstructor tries to reconstruct the deleted object from the orphan fragment.

```
root@dev21:/home/vagrant# /opt/ss/bin/swift-object-reconstructor /etc/swift/object-server/2.conf -v -o
object-reconstructor: Running object reconstructor in script mode.
object-reconstructor: Unable to get enough responses (1/4) to reconstruct 192.168.190.222:6003/d2/21425/AUTH_demo/ec_container/rc.local policy#2 frag#5 with ETag b9549b2b2ca3627d9f9b108a84db42b2
object-reconstructor: Unable to get enough responses (1/4) to reconstruct 192.168.190.222:6003/d2/21425/AUTH_demo/ec_container/rc.local policy#2 frag#5 with ETag b9549b2b2ca3627d9f9b108a84db42b2
object-reconstructor: 6/6 (100.00%) partitions of 3/3 (100.00%) devices reconstructed in 0.47s (12.89/sec, 0s remaining)
object-reconstructor: 2 suffixes checked - 0.00% hashed, 100.00% synced
object-reconstructor: Partition times: max 0.1264s, min 0.0158s, med 0.0293s
object-reconstructor: Object reconstruction complete (once). (0.01 minutes)
object-reconstructor: Exited
```

Tags: ec
clayg (clay-gerrard)
Changed in swift:
status: New → Confirmed
Revision history for this message
clayg (clay-gerrard) wrote :

I knew about this. I remember chatting with acoles about it (in #openstack-swift on Freenode?) while working on https://review.openstack.org/#/c/385609/ and the other related bugs. However, I don't remember filing a bug for this and can't find anything, so thanks!

This is sort of the EC equivalent of dark data - except instead of the whole object it's just a piece - and instead of happily and silently repopulating the object on all nodes, you get annoying messages in your logs FOREVER.

The workaround is twofold:

1) don't reintroduce nodes that have been out longer than the reclaim age - because that creates dark data

2) use a time machine so you never ran EC policies with swift < 2.11 - there are since-fixed reconstructor bugs that could prevent *any* progress, which leads to out-of-date parts/suffixes (it was effectively as if the reconstructors had been off for months on end, and then when you upgrade it's entirely possible that tombstones on handoff nodes get reaped instead of clearing out these orphaned frags)

The fix is not obvious to me :\

I'm also not sure on the priority.

IIRC the "Unable to get enough responses" message just causes the reconstructor to move on to the next hash in the suffix without disrupting the ssync protocol. If that triage is incorrect, it's probably HIGH or CRITICAL until we can find a workaround. We need to make sure the reconstructor can make *other* progress even if these frags are unprocessable.

As long as the reconstructor is otherwise making progress, I think we can leave it at MEDIUM or LOW priority until we grow some more brain cells... once a cluster is fully upgraded and rebalanced, it seems manageable that you could even just extract the offending object names from the logs and script an audit by hand. Blasting out tombstones over those names with a superadmin/reseller token would totally clean them up.
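
That "extract offending object names from the logs" step might look something like the sketch below (purely illustrative; the regex, the stdin-based interface, and the output format are assumptions, not an existing Swift tool):

```python
# Hypothetical helper: pull account/container/object names out of the
# reconstructor's "Unable to get enough responses" messages so they can be
# fed into a scripted audit or bulk DELETE.
import re
import sys

ORPHAN_RE = re.compile(
    r'Unable to get enough responses \(\d+/\d+\) to reconstruct '
    r'\S+?/\S+?/\d+/(?P<acct>[^/\s]+)/(?P<cont>[^/\s]+)/(?P<obj>\S+) '
    r'policy#\d+')

def extract_orphans(log_lines):
    """Yield unique (account, container, object) tuples from log lines."""
    seen = set()
    for line in log_lines:
        match = ORPHAN_RE.search(line)
        if match:
            key = (match.group('acct'), match.group('cont'),
                   match.group('obj'))
            if key not in seen:
                seen.add(key)
                yield key

if __name__ == '__main__':
    for acct, cont, obj in extract_orphans(sys.stdin):
        # these paths could then be DELETEd with a reseller/superadmin token
        print('/v1/%s/%s/%s' % (acct, cont, obj))
```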

Revision history for this message
Charles Hsu (charles0126) wrote :

For (2), I can reproduce the issue in 2.11.0, so maybe it can only be avoided via (1).

clayg (clay-gerrard)
Changed in swift:
importance: Undecided → Medium
Revision history for this message
Romain LE DISEZ (rledisez) wrote :

I would love (2) to be merged, because I did (1) ;)

We dealt with that with a simple rule (note: our EC policy is 12 data / 3 parity): if during a SYNC reconstruction we get 404_errors_from_fragments > ndata, we assume it's dark data and we quarantine the local fragment.

It's a pretty simple patch. The threshold might need to be more generic, because in some configurations it would be dangerous (e.g. 3 data / 6 parity).
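
A minimal sketch of that rule (illustrative only, assuming the decision is made from the per-fragment GET status codes; this is not the actual patch):

```python
# Romain's rule, roughly: if more peers report 404 for this timestamp than
# there are data fragments in the policy, assume the local fragment is dark
# data and quarantine it.
def should_quarantine(peer_statuses, ndata):
    """peer_statuses: HTTP status codes from fragment GETs to the other
    primary nodes; ndata: number of data fragments in the EC policy."""
    not_found = sum(1 for status in peer_statuses if status == 404)
    return not_found > ndata

# With the 12+3 policy above there are 14 peers; 13 of them answering 404
# is more than ndata=12, so the lone local fragment is treated as an orphan.
print(should_quarantine([404] * 13 + [200], ndata=12))  # True
```

As noted above, the same fixed rule is riskier for wide-parity layouts such as 3+6, where 404s from more than ndata peers can occur with far less evidence that the data is really gone.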

Revision history for this message
clayg (clay-gerrard) wrote :

!? although (2) *would* fix a lot of problems - no one has a time machine?

I think my sarcasm was lost.

To work around/avoid this issue today you have to (1) operationally avoid re-introducing hardware that's been out longer than a reclaim age. Unfortunately, you are also probably stuck with this issue if you've been following along since the early versions of EC (2).

Because we try to have some safeguards in place to avoid dark data, I was able to pull off a manual cleanup by feeding some parsed logs into a bulk delete - and I consider the issue mostly solved.

At the design summit in ATL (2/2017) it was pointed out that the I/O consumption pattern of re-introduced dark data in a replicated policy is very different from EC. The noisy messages in the logs and the continual attempts to rebuild data that can't be rebuilt can be more annoying than just having wasted bytes on disk, depending on your use-case.

The best idea to date is to aggressively search [1] for potential copies of un-reconstructable fragments older than a reclaim age, and quarantine the fragments if there are too many 404s.

1. very aggressively, easily request_node_counts in the 100's

Revision history for this message
clayg (clay-gerrard) wrote :

Ideally we can consider this effort as further justification to address lp bug #1655608

Revision history for this message
clayg (clay-gerrard) wrote :

The best idea for now is to add some code around where we log that message: go check the handoffs as well, and quarantine if the number of 200s is under some threshold - maybe 1/2 the data count or 2x the parity count. If you have TOO many missing fragments, quarantining is safe enough - assume the fragment was dark data instead of hoping that "someday" the cluster will find enough fragments to rebuild the object to full health.
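
One possible reading of that suggestion, sketched very roughly (the threshold choice and function shape are assumptions, not merged code; it differs from the earlier 404-count rule in that it also polls handoffs and counts successes):

```python
# Hypothetical sketch: when the "Unable to get enough responses" path is
# hit, also poll handoff nodes, and quarantine the local fragment only if
# the number of 200s stays under a conservative threshold.
def quarantine_decision(primary_statuses, handoff_statuses, ndata, nparity):
    ok = sum(1 for status in primary_statuses + handoff_statuses
             if status == 200)
    # "maybe 1/2 data or 2x parity" - take the smaller (more conservative)
    # of the two candidate thresholds here
    threshold = min(ndata // 2, 2 * nparity)
    return ok < threshold

# 4+2 policy from this report: every other primary and handoff answers 404,
# so only the orphan itself exists and quarantining it is safe.
print(quarantine_decision([404] * 5, [404] * 6, ndata=4, nparity=2))  # True
```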

Revision history for this message
Anton (a.porabkovich) wrote :

This problem is very relevant for us.
How can we delete the orphan fragments of an already-deleted object from the console?

Revision history for this message
clayg (clay-gerrard) wrote :

if the container is still around you can just issue a normal delete request

otherwise... maybe something like this:

https://gist.github.com/clayg/a01965d36ce0205a2b90364430802b62

Revision history for this message
Anton (a.porabkovich) wrote :

Thanks, but it does not work:
[root@swift bin]# swift-object-info /srv/node/object4/cold0/objects/264/66c/210b9a25e7c9edde5b63e5ca21d0566c/1544188438.18925#12#d.data
Path: /AUTH_ee951409602249bc8d1f2425da8b9635/public/1489/4570836-kdg-_052jpg.jpg
  Account: AUTH_ee951409602249bc8d1f2425da8b9635
  Container: public
  Object: 1489/4570836-kdg-_052jpg.jpg
  Object hash: 210b9a25e7c9edde5b63e5ca21d0566c
Content-Type: image/jpeg
Timestamp: 2018-12-07T13:13:58.189250 (1544188438.18925)
System Metadata:
  X-Object-Sysmeta-Ec-Content-Length: 2883383
  X-Object-Sysmeta-Ec-Etag: 263b0b368ce0278090cd38e77d51778c
  X-Object-Sysmeta-Ec-Frag-Index: 12
  X-Object-Sysmeta-Ec-Scheme: liberasurecode_rs_vand 10+4
  X-Object-Sysmeta-Ec-Segment-Size: 1048576
Transient System Metadata:
  No metadata found
User Metadata:
  No metadata found
Other Metadata:
  No metadata found
ETag: 5f82fe6c740ae5fe0f4392e0618ffeae (valid)
Content-Length: 288580 (valid)
Partition 264
Hash 210b9a25e7c9edde5b63e5ca21d0566c

[root@swift bin]# kill-obj account="AUTH_ee951409602249bc8d1f2425da8b9635" container="public" object="1489/4570836-kdg-_052jpg.jpg"
('params', 'account=AUTH_ee951409602249bc8d1f2425da8b9635', 'container=public', 'object=1489/4570836-kdg-_052jpg.jpg')
('info ', 83, [{'index': 0, u'replication_port': 6201, u'weight': 100.0, u'zone': 3, u'ip': u'10.0.1.200', u'region': 1, u'id': 0, u'replication_ip': u'10.0.1.200', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 1, u'replication_port': 6201, u'weight': 100.0, u'zone': 4, u'ip': u'10.0.1.203', u'region': 1, u'id': 3, u'replication_ip': u'10.0.1.203', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 2, u'replication_port': 6201, u'weight': 100.0, u'zone': 2, u'ip': u'10.0.1.201', u'region': 1, u'id': 1, u'replication_ip': u'10.0.1.201', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 3, u'replication_port': 6201, u'weight': 100.0, u'zone': 1, u'ip': u'10.0.1.202', u'region': 1, u'id': 2, u'replication_ip': u'10.0.1.202', u'meta': u'', u'device': u'md0', u'port': 6201}])
Hasta la vista, baby.

[root@swift bin]# swift-object-info /srv/node/object4/cold0/objects/264/66c/210b9a25e7c9edde5b63e5ca21d0566c/1544188438.18925#12#d.data
Path: /AUTH_ee951409602249bc8d1f2425da8b9635/public/1489/4570836-kdg-_052jpg.jpg
  Account: AUTH_ee951409602249bc8d1f2425da8b9635
  Container: public
  Object: 1489/4570836-kdg-_052jpg.jpg
  Object hash: 210b9a25e7c9edde5b63e5ca21d0566c
Content-Type: image/jpeg
Timestamp: 2018-12-07T13:13:58.189250 (1544188438.18925)
System Metadata:
  X-Object-Sysmeta-Ec-Content-Length: 2883383
  X-Object-Sysmeta-Ec-Etag: 263b0b368ce0278090cd38e77d51778c
  X-Object-Sysmeta-Ec-Frag-Index: 12
  X-Object-Sysmeta-Ec-Scheme: liberasurecode_rs_vand 10+4
  X-Object-Sysmeta-Ec-Segment-Size: 1048576
Transient System Metadata:
  No metadata found
User Metadata:
  No metadata found
Other Metadata:
  No metadata found
ETag: 5f82fe6c740ae5fe0f4392e0618ffeae (valid)
Content-Length: 288580 (valid)
Partition 264
Hash 210b9a25e7c9edde5b63e5ca21d0566c

Revision history for this message
Anton (a.porabkovich) wrote :

```
Oct 9 05:37:00 proxy4 object-server: 10.0.1.203 - - [09/Oct/2019:05:37:00 +0000] "GET /cold3/1168/AUTH_ee951409602249bc8d1f2425da8b9635/public/1578/thumbs/thumb_1358327210-img_1458jpg.jpg" 200 11920 "-" "-" "obj-reconstructor 49470" 0.0007 "-" 50176 0
Oct 9 05:37:01 proxy4 account-server: 10.0.1.203 - - [09/Oct/2019:05:37:01 +0000] "HEAD /md0/848/account%3DAUTH_ee951409602249bc8d1f2425da8b9635" 404 - "HEAD http://localhost/v1/account%3DAUTH_ee951409602249bc8d1f2425da8b9635?format=json" "tx987532779b974dcfa9a94-005d9d71fd" "proxy-server 46779" 0.0004 "-" 50003 -
Oct 9 05:37:02 proxy4 swift: - - 09/Oct/2019/05/37/02 DELETE /v1/account%253DAUTH_ee951409602249bc8d1f2425da8b9635/container%253Dpublic/object%253D1489/4570836-kdg-_052jpg.jpg HTTP/1.0 499 - internal-terminator - - 70 - tx987532779b974dcfa9a94-005d9d71fd - 0.5505 - - 1570599421.745944977 1570599422.296478033 0
Oct 9 05:37:02 proxy4 object-server: 10.0.1.203 - - [09/Oct/2019:05:37:02 +0000] "DELETE /cold4/543/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" 404 70 "DELETE http://localhost/v1/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" "tx987532779b974dcfa9a94-005d9d71fd" "proxy-server 46779" 1.0616 "-" 50190 0
Oct 9 05:37:02 proxy4 container-server: 10.0.1.203 - - [09/Oct/2019:05:37:02 +0000] "DELETE /md0/83/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" 404 - "DELETE http://localhost/cold1/543/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" "tx987532779b974dcfa9a94-005d9d71fd" "object-server 50142" 0.0003 "-" 50004 0
Oct 9 05:37:03 proxy4 object-server: 10.0.1.203 - - [09/Oct/2019:05:37:03 +0000] "DELETE /cold7/543/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" 404 70 "DELETE http://localhost/v1/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" "tx987532779b974dcfa9a94-005d9d71fd" "proxy-server 46779" 1.4418 "-" 50277 0
Oct 9 05:37:03 proxy4 object-server: 10.0.1.203 - - [09/Oct/2019:05:37:03 +0000] "DELETE /cold1/543/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" 404 70 "DELETE http://localhost/v1/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" "tx987532779b974dcfa9a94-005d9d71fd" "proxy-server 46779" 2.1532 "-" 50142 0
Oct 9 05:37:05 proxy4 swift-container: Container sharder cycle starting, auto-sharding False
```

Revision history for this message
Anton (a.porabkovich) wrote :

Everything works! I just wasn't passing the arguments correctly, thanks!
[root@cz_swift bin]# kill-obj "AUTH_ee951409602249bc8d1f2425da8b9635" "public" "1489/4570836-kdg-_052jpg.jpg" --config="/etc/swift/internal-terminator.conf"
('params', 'AUTH_ee951409602249bc8d1f2425da8b9635', 'public', '1489/4570836-kdg-_052jpg.jpg')
('info ', 25, [{'index': 0, u'replication_port': 6201, u'weight': 100.0, u'zone': 4, u'ip': u'10.0.1.203', u'region': 1, u'id': 3, u'replication_ip': u'10.0.1.203', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 1, u'replication_port': 6201, u'weight': 100.0, u'zone': 1, u'ip': u'10.0.1.202', u'region': 1, u'id': 2, u'replication_ip': u'10.0.1.202', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 2, u'replication_port': 6201, u'weight': 100.0, u'zone': 2, u'ip': u'10.0.1.201', u'region': 1, u'id': 1, u'replication_ip': u'10.0.1.201', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 3, u'replication_port': 6201, u'weight': 100.0, u'zone': 3, u'ip': u'10.0.1.200', u'region': 1, u'id': 0, u'replication_ip': u'10.0.1.200', u'meta': u'', u'device': u'md0', u'port': 6201}])
Hasta la vista, baby.
[root@cz_swift bin]# swift-object-info /srv/node/object4/cold0/objects/264/66c/210b9a25e7c9edde5b63e5ca21d0566c/1544188438.18925#12#d.data
Data file doesn't exist

Revision history for this message
Anton (a.porabkovich) wrote :

Hi,
After deleting the objects, the logs on all servers are clogged with lines of the form:
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.201 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/1621/546499149-4e4a6521jpg.jpg" 404 - "DELETE http://localhost/cold4/1475/AUTH_ee951409602249bc8d1f2425da8b9635/public/1621/546499149-4e4a6521jpg.jpg" "tx74d3b1b9219b45ffbd77a-005daa7b87" "object-updater 8275" 0.0004 "-" 12690 0
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.202 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/1605/2115067551-pesnyakov---301218---0202jpg.JPG" 404 - "DELETE http://localhost/cold2/1650/AUTH_ee951409602249bc8d1f2425da8b9635/public/1605/2115067551-pesnyakov---301218---0202jpg.JPG" "txc623d8d14a74438ca88c2-005daaa972" "object-updater 19896" 0.0004 "-" 12966 0
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.202 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/417/1478098463-_U5A5686JPG.JPG" 404 - "DELETE http://localhost/cold2/904/AUTH_ee951409602249bc8d1f2425da8b9635/public/417/1478098463-_U5A5686JPG.JPG" "tx3e62e3ce85ce4d5e93ebd-005daaa8a2" "object-updater 19896" 0.0004 "-" 12762 0
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.200 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/1450/1038282845-image00070jpg.jpg" 404 - "DELETE http://localhost/cold2/384/AUTH_ee951409602249bc8d1f2425da8b9635/public/1450/1038282845-image00070jpg.jpg" "txe403ed28cb6b4d1683182-005dace4d4" "object-updater 14943" 0.0004 "-" 12770 0
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.203 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/1488/1670728476-dsc_8754jpg.JPG" 404 - "DELETE http://localhost/cold0/1188/AUTH_ee951409602249bc8d1f2425da8b9635/public/1488/1670728476-dsc_8754jpg.JPG" "txf225c685707440f5b92a2-005daa2b56" "object-updater 31065" 0.0003 "-" 12957 0

The requests arrive frequently, in batches of about 1,000 per IP.

Where do these attempts to delete non-existent objects come from?
How do I stop these requests?

Revision history for this message
clayg (clay-gerrard) wrote :

Those messages are from the object-updater; they're coming from async pendings and going to the container layer. I don't think they're directly related to the issue in this bug report.

Please post your question to the openstack mailing list [1] or ask for help on irc [2].

1. https://docs.openstack.org/project-team-guide/open-community.html#mailing-lists
2. https://wiki.openstack.org/wiki/IRC

Changed in swift:
assignee: nobody → Alistair Coles (alistair-coles)
Changed in swift:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to swift (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/swift/+/787279

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to swift (master)

Reviewed: https://review.opendev.org/c/openstack/swift/+/787279
Committed: https://opendev.org/openstack/swift/commit/7960097f02bab5a4568eddb6b98d7b8b47543561
Submitter: "Zuul (22348)"
Branch: master

commit 7960097f02bab5a4568eddb6b98d7b8b47543561
Author: Alistair Coles <email address hidden>
Date: Tue Apr 20 19:14:19 2021 +0100

    reconstructor: log more details when rebuild fails

    When the reconstructor fails to gather enough fragments to rebuild a
    missing fragment, log more details about the responses that it *did*
    get:

      - log total number of ok responses, as well as the number of useful
        responses, to reveal if, for example, there might have been
        duplicate frag indexes or mixed etags.

      - log the mix of error status codes received to reveal if, for
        example, they were all 404s.

    Also refactor reconstruct_fa to track all state related to a timestamp
    in a small data encapsulation class rather than in multiple dicts.

    Related-Bug: 1655608
    Change-Id: I3f87933f788685775ce59f3724f17d5db948d502

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (master)

Change abandoned by "Alistair Coles <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/swift/+/786084
Reason: preferred fix is https://review.opendev.org/c/openstack/swift/+/788833

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.opendev.org/c/openstack/swift/+/788833
Committed: https://opendev.org/openstack/swift/commit/46ea3aeae817132957fb4fee737a1ad8d1e07837
Submitter: "Zuul (22348)"
Branch: master

commit 46ea3aeae817132957fb4fee737a1ad8d1e07837
Author: Alistair Coles <email address hidden>
Date: Thu Apr 29 20:48:51 2021 +0100

    Quarantine stale EC fragments after checking handoffs

    If the reconstructor finds a fragment that appears to be stale then it
    will now quarantine the fragment. Fragments are considered stale if
    insufficient fragments at the same timestamp can be found to rebuild
    missing fragments, and the number found is less than or equal to a new
    reconstructor 'quarantine_threshold' config option.

    Before quarantining a fragment the reconstructor will attempt to fetch
    fragments from handoff nodes in addition to the usual primary nodes.
    The handoff requests are limited by a new 'request_node_count'
    config option.

    'quarantine_threshold' defaults to zero i.e. no fragments will be
    quarantined. 'request_node_count' defaults to '2 * replicas'.

    Closes-Bug: 1655608

    Change-Id: I08e1200291833dea3deba32cdb364baa99dc2816
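
For operators, turning this behaviour on would presumably look something like the fragment below in the object-server config (a sketch based only on the option names and defaults quoted above; check the released Swift documentation for the exact section and semantics):

```
[object-reconstructor]
# quarantine a stale fragment only when the number of useful responses
# found is less than or equal to this value (default 0 = never quarantine)
quarantine_threshold = 1
# how many primary + handoff nodes to query before giving up
# (default '2 * replicas')
request_node_count = 2 * replicas
```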

Changed in swift:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.28.0

This issue was fixed in the openstack/swift 2.28.0 release.
