Bug #1655608 “reconstructor tries to reconstruct a deleted objec...” : Bugs : OpenStack Object Storage (swift)

clayg (clay-gerrard) on 2017-01-11

Changed in swift:
status:	New → Confirmed

Revision history for this message

clayg (clay-gerrard) wrote on 2017-01-11:

#1

I knew about this. I remember chatting with acoles about it (in #openstack-swift on Freenode?) while working on https://review.openstack.org/#/c/385609/ and the other related bugs. However; I don't remember filing a bug for this, and can't find anything. So thanks!

This is sort of the equivalent of EC data data - except instead of the whole object it's just a piece - and instead of happily and silently repopulating the object on all nodes you get annoying messages in your logs FOREVER.

The work around is two fold:

1) don't reintroduce nodes after reclaim age - because that makes dark data

2) use a time machine to not run EC policies with swift < 2.11 because there's since fixed bugs with the reconstructor that can prevent *any* progress which leads to out of date parts/suffixes (it was effectively like reconstructors had been off months on end and then when you upgrade it's totally possible tombstones on handoff nodes get reaped instead of clearing out these orphaned frags)

The fix is not obvious to me :\

I'm also not sure on the priority.

IIRC the "Unable to get enough responses" message just causes the reconstructor to move onto the next hash in the suffix without disrupting the ssync protocol. If that triage is incorrect it's probably HIGH or CRITICAL until we can find a workaround. We need to make sure the reconstructor can make *other* progress even if these frags are unprocessable.

As long as the reconstructor is otherwise making progress I think we can leave it at MEDIUM or LOW priority until we grow some more braincells... once a cluster is fully upgraded and rebalanced it seems manageable that you could even just extract offending object names from the logs and script an audit by hand. Blasting out tombstones over these names with a superadmin/reseller token would totally clean them up.

Revision history for this message

Charles Hsu (charles0126) wrote on 2017-01-12:

#2

For (2), I can repro the issue in 2.11.0, so maybe it can be avoid in (1).

clayg (clay-gerrard) on 2017-01-17

Changed in swift:
importance:	Undecided → Medium

Revision history for this message

Romain LE DISEZ (rledisez) wrote on 2017-02-17:

#3

I would love (2) to be merged, because I did (1) ;)

We dealt with that by a simple rule (note: our EC policy is 12 data/3 parity):
if during a SYNC reconstruction we get 404_errors_from_fragments > ndata, we assume it's dark data, and we quarantine the local fragment

It's a pretty simple patch. The threshold might need to be more generic because in some config it would be dangerous (eg: 3 data/6 parity).

Revision history for this message

clayg (clay-gerrard) wrote on 2017-02-27:

#4

!? although (2) *would* fix a lot of problems - no one has a time machine?

I think my sarcasm was lost.

To workaround/avoid this issue today - you have to (1) operationally avoid re-intruducing hardware that's been out a reclaim age. Unfortunately you also are probably stuck with this issue if you've been following along since early versions of EC (2).

Because we try to have some safe-guards in place to avoid dark data; I was able to pull off a manual cleanup feeding some parsed logs into a bulk delete - and consider the issue mostly solved.

At the design summit in ATL (2/2017) it was pointed out the I/O consumption pattern of re-introducing dark data in Replicated is very different from EC. The noisy messages in logs and continually trying to rebuild data that can't be rebuilt can be more annoying than just having wasted bytes on disk depending on your use-case.

The best idea to date is to aggressively search [1] for potential fragments on un-reconstructable fragments of data older than a reclaim age and quarantine the fragments if there's too many 404's.

1. very aggressively, easily request_node_counts in the 100's

Revision history for this message

clayg (clay-gerrard) wrote on 2017-02-27:

#5

Ideally we can consider this effort as further justification to address lp bug #1655608

Revision history for this message

clayg (clay-gerrard) wrote on 2018-08-23:

#6

the best idea for now is to add some code around where we log that message we would go check handoffs and quarantine if the # of 200's was under some threshold - maybe 1/2 data or 2x parity - like if you have TOO many missing fragments quarantine is safe enough to assume the fragment was dark data instead of hoping "someday" the cluster will find enough fragments to rebuild the object to full health.

Revision history for this message

Anton (a.porabkovich) wrote on 2019-10-08:

#7

The problem is very relevant.
How to delete from the console a deleted object from an orphan fragments?

Revision history for this message

clayg (clay-gerrard) wrote on 2019-10-08:

#8

if the container is still around you can just issue a normal delete request

otherwise... maybe something like this:

https://gist.github.com/clayg/a01965d36ce0205a2b90364430802b62

Revision history for this message

Anton (a.porabkovich) wrote on 2019-10-09:

#9

thanks, but does not work
[root@swift bin]# swift-object-info /srv/node/object4/cold0/objects/264/66c/210b9a25e7c9edde5b63e5ca21d0566c/1544188438.18925#12#d.data
Path: /AUTH_ee951409602249bc8d1f2425da8b9635/public/1489/4570836-kdg-_052jpg.jpg
  Account: AUTH_ee951409602249bc8d1f2425da8b9635
  Container: public
  Object: 1489/4570836-kdg-_052jpg.jpg
  Object hash: 210b9a25e7c9edde5b63e5ca21d0566c
Content-Type: image/jpeg
Timestamp: 2018-12-07T13:13:58.189250 (1544188438.18925)
System Metadata:
  X-Object-Sysmeta-Ec-Content-Length: 2883383
  X-Object-Sysmeta-Ec-Etag: 263b0b368ce0278090cd38e77d51778c
  X-Object-Sysmeta-Ec-Frag-Index: 12
  X-Object-Sysmeta-Ec-Scheme: liberasurecode_rs_vand 10+4
  X-Object-Sysmeta-Ec-Segment-Size: 1048576
Transient System Metadata:
  No metadata found
User Metadata:
  No metadata found
Other Metadata:
  No metadata found
ETag: 5f82fe6c740ae5fe0f4392e0618ffeae (valid)
Content-Length: 288580 (valid)
Partition 264
Hash 210b9a25e7c9edde5b63e5ca21d0566c

[root@swift bin]# kill-obj account="AUTH_ee951409602249bc8d1f2425da8b9635" container="public" object="1489/4570836-kdg-_052jpg.jpg"
('params', 'account=AUTH_ee951409602249bc8d1f2425da8b9635', 'container=public', 'object=1489/4570836-kdg-_052jpg.jpg')
('info ', 83, [{'index': 0, u'replication_port': 6201, u'weight': 100.0, u'zone': 3, u'ip': u'10.0.1.200', u'region': 1, u'id': 0, u'replication_ip': u'10.0.1.200', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 1, u'replication_port': 6201, u'weight': 100.0, u'zone': 4, u'ip': u'10.0.1.203', u'region': 1, u'id': 3, u'replication_ip': u'10.0.1.203', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 2, u'replication_port': 6201, u'weight': 100.0, u'zone': 2, u'ip': u'10.0.1.201', u'region': 1, u'id': 1, u'replication_ip': u'10.0.1.201', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 3, u'replication_port': 6201, u'weight': 100.0, u'zone': 1, u'ip': u'10.0.1.202', u'region': 1, u'id': 2, u'replication_ip': u'10.0.1.202', u'meta': u'', u'device': u'md0', u'port': 6201}])
Hasta la vista, baby.

[root@swift bin]# swift-object-info /srv/node/object4/cold0/objects/264/66c/210b9a25e7c9edde5b63e5ca21d0566c/1544188438.18925#12#d.data
Path: /AUTH_ee951409602249bc8d1f2425da8b9635/public/1489/4570836-kdg-_052jpg.jpg
  Account: AUTH_ee951409602249bc8d1f2425da8b9635
  Container: public
  Object: 1489/4570836-kdg-_052jpg.jpg
  Object hash: 210b9a25e7c9edde5b63e5ca21d0566c
Content-Type: image/jpeg
Timestamp: 2018-12-07T13:13:58.189250 (1544188438.18925)
System Metadata:
  X-Object-Sysmeta-Ec-Content-Length: 2883383
  X-Object-Sysmeta-Ec-Etag: 263b0b368ce0278090cd38e77d51778c
  X-Object-Sysmeta-Ec-Frag-Index: 12
  X-Object-Sysmeta-Ec-Scheme: liberasurecode_rs_vand 10+4
  X-Object-Sysmeta-Ec-Segment-Size: 1048576
Transient System Metadata:
  No metadata found
User Metadata:
  No metadata found
Other Metadata:
  No metadata found
ETag: 5f82fe6c740ae5fe0f4392e0618ffeae (valid)
Content-Length: 288580 (valid)
Partition 264
Hash 210b9a25e7c9edde5b63e5ca21d0566c

thanks, but does not work
[root@swift bin]# swift-object-info /srv/node/object4/cold0/objects/264/66c/210b9a25e7c9edde5b63e5ca21d0566c/1544188438.18925#12#d.data
Path: /AUTH_ee951409602249bc8d1f2425da8b9635/public/1489/4570836-kdg-_052jpg.jpg
  Account: AUTH_ee951409602249bc8d1f2425da8b9635
  Container: public
  Object: 1489/4570836-kdg-_052jpg.jpg
  Object hash: 210b9a25e7c9edde5b63e5ca21d0566c
Content-Type: image/jpeg
Timestamp: 2018-12-07T13:13:58.189250 (1544188438.18925)
System Metadata:
  X-Object-Sysmeta-Ec-Content-Length: 2883383
  X-Object-Sysmeta-Ec-Etag: 263b0b368ce0278090cd38e77d51778c
  X-Object-Sysmeta-Ec-Frag-Index: 12
  X-Object-Sysmeta-Ec-Scheme: liberasurecode_rs_vand 10+4
  X-Object-Sysmeta-Ec-Segment-Size: 1048576
Transient System Metadata:
  No metadata found
User Metadata:
  No metadata found
Other Metadata:
  No metadata found
ETag: 5f82fe6c740ae5fe0f4392e0618ffeae (valid)
Content-Length: 288580 (valid)
Partition       264
Hash            210b9a25e7c9edde5b63e5ca21d0566c

[root@swift bin]# kill-obj account="AUTH_ee951409602249bc8d1f2425da8b9635" container="public" object="1489/4570836-kdg-_052jpg.jpg"
('params', 'account=AUTH_ee951409602249bc8d1f2425da8b9635', 'container=public', 'object=1489/4570836-kdg-_052jpg.jpg')
('info ', 83, [{'index': 0, u'replication_port': 6201, u'weight': 100.0, u'zone': 3, u'ip': u'10.0.1.200', u'region': 1, u'id': 0, u'replication_ip': u'10.0.1.200', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 1, u'replication_port': 6201, u'weight': 100.0, u'zone': 4, u'ip': u'10.0.1.203', u'region': 1, u'id': 3, u'replication_ip': u'10.0.1.203', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 2, u'replication_port': 6201, u'weight': 100.0, u'zone': 2, u'ip': u'10.0.1.201', u'region': 1, u'id': 1, u'replication_ip': u'10.0.1.201', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 3, u'replication_port': 6201, u'weight': 100.0, u'zone': 1, u'ip': u'10.0.1.202', u'region': 1, u'id': 2, u'replication_ip': u'10.0.1.202', u'meta': u'', u'device': u'md0', u'port': 6201}])
Hasta la vista, baby.

[root@swift bin]# swift-object-info /srv/node/object4/cold0/objects/264/66c/210b9a25e7c9edde5b63e5ca21d0566c/1544188438.18925#12#d.data
Path: /AUTH_ee951409602249bc8d1f2425da8b9635/public/1489/4570836-kdg-_052jpg.jpg
  Account: AUTH_ee951409602249bc8d1f2425da8b9635
  Container: public
  Object: 1489/4570836-kdg-_052jpg.jpg
  Object hash: 210b9a25e7c9edde5b63e5ca21d0566c
Content-Type: image/jpeg
Timestamp: 2018-12-07T13:13:58.189250 (1544188438.18925)
System Metadata:
  X-Object-Sysmeta-Ec-Content-Length: 2883383
  X-Object-Sysmeta-Ec-Etag: 263b0b368ce0278090cd38e77d51778c
  X-Object-Sysmeta-Ec-Frag-Index: 12
  X-Object-Sysmeta-Ec-Scheme: liberasurecode_rs_vand 10+4
  X-Object-Sysmeta-Ec-Segment-Size: 1048576
Transient System Metadata:
  No metadata found
User Metadata:
  No metadata found
Other Metadata:
  No metadata found
ETag: 5f82fe6c740ae5fe0f4392e0618ffeae (valid)
Content-Length: 288580 (valid)
Partition       264
Hash            210b9a25e7c9edde5b63e5ca21d0566c

Revision history for this message

Anton (a.porabkovich) wrote on 2019-10-09:

#10

Oct 9 05:37:00 proxy4 object-server: 10.0.1.203 - - [09/Oct/2019:05:37:00 +0000] "GET /cold3/1168/AUTH_ee951409602249bc8d1f2425da8b9635/public/1578/thumbs/thumb_1358327210-img_1458jpg.jpg" 200 11920 "-" "-" "obj-reconstructor
49470" 0.0007 "-" 50176 0
Oct 9 05:37:01 proxy4 account-server: 10.0.1.203 - - [09/Oct/2019:05:37:01 +0000] "HEAD /md0/848/account%3DAUTH_ee951409602249bc8d1f2425da8b9635" 404 - "HEAD http://localhost/v1/account%3DAUTH_ee951409602249bc8d1f2425da8b9635?
format=json" "tx987532779b974dcfa9a94-005d9d71fd" "proxy-server 46779" 0.0004 "-" 50003 -
Oct 9 05:37:02 proxy4 swift: - - 09/Oct/2019/05/37/02 DELETE /v1/account%253DAUTH_ee951409602249bc8d1f2425da8b9635/container%253Dpublic/object%253D1489/4570836-kdg-_052jpg.jpg HTTP/1.0 499 - internal-terminator - - 70 - tx9875
32779b974dcfa9a94-005d9d71fd - 0.5505 - - 1570599421.745944977 1570599422.296478033 0
Oct 9 05:37:02 proxy4 object-server: 10.0.1.203 - - [09/Oct/2019:05:37:02 +0000] "DELETE /cold4/543/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" 404 70 "DELETE http:
//localhost/v1/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" "tx987532779b974dcfa9a94-005d9d71fd" "proxy-server 46779" 1.0616 "-" 50190 0
Oct 9 05:37:02 proxy4 container-server: 10.0.1.203 - - [09/Oct/2019:05:37:02 +0000] "DELETE /md0/83/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" 404 - "DELETE http:/
/localhost/cold1/543/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" "tx987532779b974dcfa9a94-005d9d71fd" "object-server 50142" 0.0003 "-" 50004 0
Oct 9 05:37:03 proxy4 object-server: 10.0.1.203 - - [09/Oct/2019:05:37:03 +0000] "DELETE /cold7/543/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" 404 70 "DELETE http:
//localhost/v1/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" "tx987532779b974dcfa9a94-005d9d71fd" "proxy-server 46779" 1.4418 "-" 50277 0
Oct 9 05:37:03 proxy4 object-server: 10.0.1.203 - - [09/Oct/2019:05:37:03 +0000] "DELETE /cold1/543/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" 404 70 "DELETE http:
//localhost/v1/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" "tx987532779b974dcfa9a94-005d9d71fd" "proxy-server 46779" 2.1532 "-" 50142 0
Oct 9 05:37:05 proxy4 swift-container: Container sharder cycle starting, auto-sharding False

Oct  9 05:37:00 proxy4 object-server: 10.0.1.203 - - [09/Oct/2019:05:37:00 +0000] "GET /cold3/1168/AUTH_ee951409602249bc8d1f2425da8b9635/public/1578/thumbs/thumb_1358327210-img_1458jpg.jpg" 200 11920 "-" "-" "obj-reconstructor
49470" 0.0007 "-" 50176 0
Oct  9 05:37:01 proxy4 account-server: 10.0.1.203 - - [09/Oct/2019:05:37:01 +0000] "HEAD /md0/848/account%3DAUTH_ee951409602249bc8d1f2425da8b9635" 404 - "HEAD http://localhost/v1/account%3DAUTH_ee951409602249bc8d1f2425da8b9635?
format=json" "tx987532779b974dcfa9a94-005d9d71fd" "proxy-server 46779" 0.0004 "-" 50003 -
Oct  9 05:37:02 proxy4 swift: - - 09/Oct/2019/05/37/02 DELETE /v1/account%253DAUTH_ee951409602249bc8d1f2425da8b9635/container%253Dpublic/object%253D1489/4570836-kdg-_052jpg.jpg HTTP/1.0 499 - internal-terminator - - 70 - tx9875
32779b974dcfa9a94-005d9d71fd - 0.5505 - - 1570599421.745944977 1570599422.296478033 0
Oct  9 05:37:02 proxy4 object-server: 10.0.1.203 - - [09/Oct/2019:05:37:02 +0000] "DELETE /cold4/543/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" 404 70 "DELETE http:
//localhost/v1/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" "tx987532779b974dcfa9a94-005d9d71fd" "proxy-server 46779" 1.0616 "-" 50190 0
Oct  9 05:37:02 proxy4 container-server: 10.0.1.203 - - [09/Oct/2019:05:37:02 +0000] "DELETE /md0/83/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" 404 - "DELETE http:/
/localhost/cold1/543/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" "tx987532779b974dcfa9a94-005d9d71fd" "object-server 50142" 0.0003 "-" 50004 0
Oct  9 05:37:03 proxy4 object-server: 10.0.1.203 - - [09/Oct/2019:05:37:03 +0000] "DELETE /cold7/543/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" 404 70 "DELETE http:
//localhost/v1/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" "tx987532779b974dcfa9a94-005d9d71fd" "proxy-server 46779" 1.4418 "-" 50277 0
Oct  9 05:37:03 proxy4 object-server: 10.0.1.203 - - [09/Oct/2019:05:37:03 +0000] "DELETE /cold1/543/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" 404 70 "DELETE http:
//localhost/v1/account%3DAUTH_ee951409602249bc8d1f2425da8b9635/container%3Dpublic/object%3D1489/4570836-kdg-_052jpg.jpg" "tx987532779b974dcfa9a94-005d9d71fd" "proxy-server 46779" 2.1532 "-" 50142 0
Oct  9 05:37:05 proxy4 swift-container: Container sharder cycle starting, auto-sharding False

Revision history for this message

Anton (a.porabkovich) wrote on 2019-10-09:

#11

everything works! it I not correctly told arguments, thanks!
[root@cz_swift bin]# kill-obj "AUTH_ee951409602249bc8d1f2425da8b9635" "public" "1489/4570836-kdg-_052jpg.jpg" --config="/etc/swift/internal-terminator.conf"
('params', 'AUTH_ee951409602249bc8d1f2425da8b9635', 'public', '1489/4570836-kdg-_052jpg.jpg')
('info ', 25, [{'index': 0, u'replication_port': 6201, u'weight': 100.0, u'zone': 4, u'ip': u'10.0.1.203', u'region': 1, u'id': 3, u'replication_ip': u'10.0.1.203', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 1, u'replication_port': 6201, u'weight': 100.0, u'zone': 1, u'ip': u'10.0.1.202', u'region': 1, u'id': 2, u'replication_ip': u'10.0.1.202', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 2, u'replication_port': 6201, u'weight': 100.0, u'zone': 2, u'ip': u'10.0.1.201', u'region': 1, u'id': 1, u'replication_ip': u'10.0.1.201', u'meta': u'', u'device': u'md0', u'port': 6201}, {'index': 3, u'replication_port': 6201, u'weight': 100.0, u'zone': 3, u'ip': u'10.0.1.200', u'region': 1, u'id': 0, u'replication_ip': u'10.0.1.200', u'meta': u'', u'device': u'md0', u'port': 6201}])
Hasta la vista, baby.
[root@cz_swift bin]# swift-object-info /srv/node/object4/cold0/objects/264/66c/210b9a25e7c9edde5b63e5ca21d0566c/1544188438.18925#12#d.data
Data file doesn't exist

Revision history for this message

Anton (a.porabkovich) wrote on 2019-11-17:

#12

hi
After deleting objects, the entire log on all servers is clogged with strings of the form:
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.201 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/1621/546499149-4e4a6521jpg.jpg" 404 - "DELETE http://localhost/cold4/1475/AUTH_ee951409602249bc8d1f2425da8b9635/public/1621/546499149-4e4a6521jpg.jpg" "tx74d3b1b9219b45ffbd77a-005daa7b87" "object-updater 8275" 0.0004 "-" 12690 0
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.202 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/1605/2115067551-pesnyakov---301218---0202jpg.JPG" 404 - "DELETE http://localhost/cold2/1650/AUTH_ee951409602249bc8d1f2425da8b9635/public/1605/2115067551-pesnyakov---301218---0202jpg.JPG" "txc623d8d14a74438ca88c2-005daaa972" "object-updater 19896" 0.0004 "-" 12966 0
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.202 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/417/1478098463-_U5A5686JPG.JPG" 404 - "DELETE http://localhost/cold2/904/AUTH_ee951409602249bc8d1f2425da8b9635/public/417/1478098463-_U5A5686JPG.JPG" "tx3e62e3ce85ce4d5e93ebd-005daaa8a2" "object-updater 19896" 0.0004 "-" 12762 0
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.200 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/1450/1038282845-image00070jpg.jpg" 404 - "DELETE http://localhost/cold2/384/AUTH_ee951409602249bc8d1f2425da8b9635/public/1450/1038282845-image00070jpg.jpg" "txe403ed28cb6b4d1683182-005dace4d4" "object-updater 14943" 0.0004 "-" 12770 0
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.203 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/1488/1670728476-dsc_8754jpg.JPG" 404 - "DELETE http://localhost/cold0/1188/AUTH_ee951409602249bc8d1f2425da8b9635/public/1488/1670728476-dsc_8754jpg.JPG" "txf225c685707440f5b92a2-005daa2b56" "object-updater 31065" 0.0003 "-" 12957 0

Requests arrive frequently, a stack of 1,000 pieces per Ip

Where do you take these changes to delete non-existing objects?
How do I remove these requests?

hi
After deleting objects, the entire log on all servers is clogged with strings of the form:
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.201 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/1621/546499149-4e4a6521jpg.jpg" 404 - "DELETE http://localhost/cold4/1475/AUTH_ee951409602249bc8d1f2425da8b9635/public/1621/546499149-4e4a6521jpg.jpg" "tx74d3b1b9219b45ffbd77a-005daa7b87" "object-updater 8275" 0.0004 "-" 12690 0
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.202 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/1605/2115067551-pesnyakov---301218---0202jpg.JPG" 404 - "DELETE http://localhost/cold2/1650/AUTH_ee951409602249bc8d1f2425da8b9635/public/1605/2115067551-pesnyakov---301218---0202jpg.JPG" "txc623d8d14a74438ca88c2-005daaa972" "object-updater 19896" 0.0004 "-" 12966 0
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.202 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/417/1478098463-_U5A5686JPG.JPG" 404 - "DELETE http://localhost/cold2/904/AUTH_ee951409602249bc8d1f2425da8b9635/public/417/1478098463-_U5A5686JPG.JPG" "tx3e62e3ce85ce4d5e93ebd-005daaa8a2" "object-updater 19896" 0.0004 "-" 12762 0
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.200 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/1450/1038282845-image00070jpg.jpg" 404 - "DELETE http://localhost/cold2/384/AUTH_ee951409602249bc8d1f2425da8b9635/public/1450/1038282845-image00070jpg.jpg" "txe403ed28cb6b4d1683182-005dace4d4" "object-updater 14943" 0.0004 "-" 12770 0
Nov 17 05:27:30 cz-swift1 container-server: 10.0.1.203 - - [17/Nov/2019:05:27:30 +0000] "DELETE /md0/25/AUTH_ee951409602249bc8d1f2425da8b9635/public/1488/1670728476-dsc_8754jpg.JPG" 404 - "DELETE http://localhost/cold0/1188/AUTH_ee951409602249bc8d1f2425da8b9635/public/1488/1670728476-dsc_8754jpg.JPG" "txf225c685707440f5b92a2-005daa2b56" "object-updater 31065" 0.0003 "-" 12957 0

Requests arrive frequently, a stack of 1,000 pieces per Ip

Where do you take these changes to delete non-existing objects?
How do I remove these requests?

Revision history for this message

clayg (clay-gerrard) wrote on 2019-11-18:

#13

Those messages are from the object-updater, they're coming from async pendings and going to the container layer. I don't think they're directly related to the issue in this bug report.

Please post your question to the openstack mailing list [1] or ask for help on irc [2].

1. https://docs.openstack.org/project-team-guide/open-community.html#mailing-lists
2. https://wiki.openstack.org/wiki/IRC

Alistair Coles (alistair-coles) on 2021-04-13

Changed in swift:
assignee:	nobody → Alistair Coles (alistair-coles)

OpenStack Infra (hudson-openstack) on 2021-04-16

Changed in swift:
status:	Confirmed → In Progress

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2021-04-20: Related fix proposed to swift (master)

#14

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/swift/+/787279

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2021-04-28: Related fix merged to swift (master)

#15

Reviewed: https://review.opendev.org/c/openstack/swift/+/787279
Committed: https://opendev.org/openstack/swift/commit/7960097f02bab5a4568eddb6b98d7b8b47543561
Submitter: "Zuul (22348)"
Branch: master

commit 7960097f02bab5a4568eddb6b98d7b8b47543561
Author: Alistair Coles <email address hidden>
Date: Tue Apr 20 19:14:19 2021 +0100

reconstructor: log more details when rebuild fails

    When the reconstructor fails to gather enough fragments to rebuild a
    missing fragment, log more details about the responses that it *did*
    get:

      - log total number of ok responses, as well as the number of useful
        responses, to reveal if, for example, there might have been
        duplicate frag indexes or mixed etags.

- log the mix of error status codes received to reveal if, for
example, they were all 404s.

Also refactor reconstruct_fa to track all state related to a timestamp
in a small data encapsulation class rather than in multiple dicts.

Related-Bug: 1655608
Change-Id: I3f87933f788685775ce59f3724f17d5db948d502

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2021-05-11: Change abandoned on swift (master)

#16

Change abandoned by "Alistair Coles <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/swift/+/786084
Reason: preferred fix is https://review.opendev.org/c/openstack/swift/+/788833

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2021-05-11: Fix merged to swift (master)

#17

Reviewed: https://review.opendev.org/c/openstack/swift/+/788833
Committed: https://opendev.org/openstack/swift/commit/46ea3aeae817132957fb4fee737a1ad8d1e07837
Submitter: "Zuul (22348)"
Branch: master

commit 46ea3aeae817132957fb4fee737a1ad8d1e07837
Author: Alistair Coles <email address hidden>
Date: Thu Apr 29 20:48:51 2021 +0100

Quarantine stale EC fragments after checking handoffs

    If the reconstructor finds a fragment that appears to be stale then it
    will now quarantine the fragment. Fragments are considered stale if
    insufficient fragments at the same timestamp can be found to rebuild
    missing fragments, and the number found is less than or equal to a new
    reconstructor 'quarantine_threshold' config option.

    Before quarantining a fragment the reconstructor will attempt to fetch
    fragments from handoff nodes in addition to the usual primary nodes.
    The handoff requests are limited by a new 'request_node_count'
    config option.

'quarantine_threshold' defaults to zero i.e. no fragments will be
quarantined. 'request node count' defaults to '2 * replicas'.

Closes-Bug: 1655608

Change-Id: I08e1200291833dea3deba32cdb364baa99dc2816

Changed in swift:
status:	In Progress → Fix Released

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2021-07-27: Fix included in openstack/swift 2.28.0

#18

This issue was fixed in the openstack/swift 2.28.0 release.

OpenStack Object Storage (swift)

reconstructor tries to reconstruct a deleted object from an orphan fragment

Bug Description

Other bug subscribers

Remote bug watches