reconstructor tries to reconstruct a deleted object from an orphan fragment
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Object Storage (swift) | Fix Released | Medium | Alistair Coles | | |
Bug Description
I found the entries below in my reconstructor logs: the reconstructor keeps trying to reconstruct a deleted object from an orphan fragment.
```
root@dev21:
object-
object-
object-
object-
object-
object-
object-
object-
```
* Swift 2.9.0
* EC ring: 4+2
* Object reclaim age: 60 sec (shortened to make the problem easy to reproduce)
```
root@dev21:
curl -g -I -XHEAD "http://
curl -g -I -XHEAD "http://
curl -g -I -XHEAD "http://
curl -g -I -XHEAD "http://
curl -g -I -XHEAD "http://
curl -g -I -XHEAD "http://
curl -g -I -XHEAD "http://
curl -g -I -XHEAD "http://
curl -g -I -XHEAD "http://
```
1) Upload a file: `swift upload ec_container rc.local`
2) Unmount one of the EC drives
3) Upload a new file as the same object: `swift upload ec_container rc.local.update --object-name rc.local`
4) Mount all drives again
5) Delete the object: `swift delete ec_container rc.local`
6) Run the curl commands above to check
```
root@dev21:
HTTP/1.1 404 Not Found
X-Backend-
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT
HTTP/1.1 404 Not Found
X-Backend-
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT
HTTP/1.1 404 Not Found
X-Backend-
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT
HTTP/1.1 404 Not Found
X-Backend-
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT
HTTP/1.1 404 Not Found
X-Backend-
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT
HTTP/1.1 404 Not Found
X-Backend-
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT
HTTP/1.1 200 OK
Content-Length: 139
X-Backend-
X-Object-
Content-Type: application/
X-Object-
X-Object-
Last-Modified: Wed, 11 Jan 2017 09:43:48 GMT
Etag: "f66f68a63f4339
X-Timestamp: 1484127827.20242
X-Object-
X-Object-
X-Object-
Date: Wed, 11 Jan 2017 09:46:27 GMT
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:27 GMT
```
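The reproduction steps above can be sketched as an ordered command list. The `swift` commands are copied from the steps as written; the device and mount point arguments are hypothetical placeholders for whichever EC drive is taken offline, and the helper names are illustrative only. Nothing is executed unless `dry_run` is switched off.

```python
import subprocess


def reproduction_commands(device, mount_point):
    """Return the reproduction steps as argv lists, in order.

    `device` and `mount_point` are hypothetical placeholders for one EC drive;
    the report does not say which drive was unmounted.
    """
    return [
        ["swift", "upload", "ec_container", "rc.local"],
        ["umount", mount_point],
        ["swift", "upload", "ec_container", "rc.local.update",
         "--object-name", "rc.local"],
        ["mount", device, mount_point],
        ["swift", "delete", "ec_container", "rc.local"],
    ]


def run(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Calling `run(reproduction_commands("/dev/sdX1", "/srv/node/sdX1"))` with made-up paths only prints the five commands, which makes it easy to review the sequence before trying it on a test cluster.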
Run object-
```
root@dev21:
object-
object-
object-
object-
root@dev21:
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT
HTTP/1.1 200 OK
Content-Length: 139
X-Backend-
X-Object-
Content-Type: application/
X-Object-
X-Object-
Last-Modified: Wed, 11 Jan 2017 09:43:48 GMT
Etag: "f66f68a63f4339
X-Timestamp: 1484127827.20242
X-Object-
X-Object-
X-Object-
Date: Wed, 11 Jan 2017 09:46:48 GMT
HTTP/1.1 404 Not Found
X-Backend-
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT
HTTP/1.1 404 Not Found
X-Backend-
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT
HTTP/1.1 404 Not Found
X-Backend-
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT
HTTP/1.1 404 Not Found
X-Backend-
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT
HTTP/1.1 404 Not Found
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT
```
The partition directory has a newer timestamp than the other ts files, but the orphan fragment keeps the original timestamp in its filename.
```
root@dev21:
total 8.0K
drwxr-xr-x 2 swift swift 67 Jan 11 09:46 .
drwxr-xr-x 3 swift swift 45 Jan 11 09:46 ..
-rw------- 1 swift swift 139 Jan 11 09:46 1484127827.
-rw-r--r-- 1 swift swift 0 Jan 11 09:46 1484127827.
root@dev21:
Wed Jan 11 09:43:47 UTC 2017
HTTP/1.1 404 Not Found
X-Backend-
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Date: Wed, 11 Jan 2017 09:46:48 GMT
root@dev21:
Wed Jan 11 09:45:10 UTC 2017
```
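The timestamps in this listing can be checked by hand. The following is a minimal sketch, assuming that a tombstone becomes eligible for reclaim once it is older than the configured reclaim age; the helper names are illustrative and are not Swift's own functions.

```python
from datetime import datetime, timezone

RECLAIM_AGE = 60  # seconds, the value configured in this report


def to_utc(x_timestamp):
    """Convert a Swift-style epoch timestamp string to an aware UTC datetime."""
    return datetime.fromtimestamp(float(x_timestamp), tz=timezone.utc)


def is_reclaimable(tombstone_timestamp, now, reclaim_age=RECLAIM_AGE):
    """Assumed rule: a tombstone older than reclaim_age seconds may be removed."""
    return now - float(tombstone_timestamp) > reclaim_age


# X-Timestamp reported for the surviving fragment.
orphan = "1484127827.20242"
print(to_utc(orphan).strftime("%a %b %d %H:%M:%S UTC %Y"))
```

The printed date matches the first `date` output in the listing (09:43:47 UTC). With a 60 second reclaim age, any tombstone written for the delete becomes removable a minute later, after which nothing newer than the orphan fragment remains on disk to override it.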
After that, the reconstructor tries to reconstruct the deleted object from the orphan fragment.
```
root@dev21:
object-
object-
object-
object-
object-
object-
object-
object-
```
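Why a single orphan fragment can never be rebuilt follows from the 4+2 policy: a k+m erasure code needs at least k distinct fragments to decode. A small sketch of that arithmetic, where the function name and the fragment indexes are made up for illustration:

```python
def can_rebuild(available_fragment_indexes, ndata=4):
    """A k+m erasure code needs at least k distinct fragments to decode.

    ndata defaults to 4 to match the 4+2 ring in this report.
    """
    return len(set(available_fragment_indexes)) >= ndata


# Only the orphan fragment survives (index 3 is a hypothetical example).
print(can_rebuild([3]))           # False
# Any four distinct fragments would be enough.
print(can_rebuild([0, 1, 2, 5]))  # True
```

With only one fragment left in the cluster the check fails on every pass, which is consistent with the same message being logged repeatedly.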
Changed in swift: status: New → Confirmed
Changed in swift: importance: Undecided → Medium
Changed in swift: assignee: nobody → Alistair Coles (alistair-coles)
Changed in swift: status: Confirmed → In Progress
I knew about this. I remember chatting with acoles about it (in #openstack-swift on Freenode?) while working on https://review.openstack.org/#/c/385609/ and the other related bugs. However, I don't remember filing a bug for this, and can't find one. So thanks!
This is sort of the equivalent of EC dark data - except instead of the whole object it's just a piece - and instead of happily and silently repopulating the object on all nodes you get annoying messages in your logs FOREVER.
The workaround is twofold:
1) don't reintroduce nodes after reclaim age - because that makes dark data
2) use a time machine to not run EC policies with swift < 2.11, because there are since-fixed bugs in the reconstructor that can prevent *any* progress, which leads to out-of-date parts/suffixes (it was effectively as if the reconstructors had been off for months on end, and then when you upgrade it's entirely possible that tombstones on handoff nodes get reaped instead of clearing out these orphaned frags)
The fix is not obvious to me :\
I'm also not sure on the priority.
IIRC the "Unable to get enough responses" message just causes the reconstructor to move on to the next hash in the suffix without disrupting the ssync protocol. If that triage is incorrect, it's probably HIGH or CRITICAL until we can find a workaround. We need to make sure the reconstructor can make *other* progress even if these frags are unprocessable.
As long as the reconstructor is otherwise making progress I think we can leave it at MEDIUM or LOW priority until we grow some more braincells... once a cluster is fully upgraded and rebalanced it seems manageable that you could even just extract offending object names from the logs and script an audit by hand. Blasting out tombstones over these names with a superadmin/reseller token would totally clean them up.
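The log-scraping half of that manual cleanup could look roughly like the sketch below. The log line shape is an assumption: the pattern expects an `/account/container/object` path between "to reconstruct" and "with ETag", and the real reconstructor message differs between Swift versions, so the regex has to be adapted to the actual logs. The function name is hypothetical.

```python
import re

# ASSUMPTION: hypothetical log shape; adjust to the real reconstructor message.
PATTERN = re.compile(r"to reconstruct (?P<path>/[^/\s]+/[^/]+/.+?) with ETag")


def offending_objects(log_lines):
    """Return unique (account, container, object) tuples in first-seen order."""
    seen = set()
    ordered = []
    for line in log_lines:
        match = PATTERN.search(line)
        if not match:
            continue
        # Split on the first two slashes only: object names may contain "/".
        account, container, obj = match.group("path").lstrip("/").split("/", 2)
        entry = (account, container, obj)
        if entry not in seen:
            seen.add(entry)
            ordered.append(entry)
    return ordered


# Made-up sample line, for illustration only.
sample = [
    "Unable to get enough responses (1/4) to reconstruct "
    "/AUTH_test/ec_container/rc.local with ETag abc123",
]
print(offending_objects(sample))
```

The resulting names can then be reviewed by hand before deleting each object with suitably privileged credentials, for example with `swift delete` as in the reproduction steps.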