Failed to ssync object repeatedly
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Fix Released | Critical | Alistair Coles |
Bug Description
I'm trying to resolve a misplaced partition, but I repeatedly get a 500 error code.
Misplaced on node: d655
$ sudo /opt/ss/bin/python3 /opt/ss/
device: primary handoff( suffix) misplaced( suffix)
.....
d655: 0 0( 0) 1( 4089)
Get 500 error from remote node
$ /opt/ss/
object-
object-
object-
object-
object-
object-
object-
Error logs on remote node
2023-02-
2023-02-
....
Object info on remote node
$ ls
XXX.data XXX.meta
$ swift-object-info XXX
Path: XXX
Account: XXX
Container: XXX
Object: XXX
Object hash: XXX
Content-Type: binary/octet-stream
Timestamp: 2023-02-
System Metadata:
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
Transient System Metadata:
No metadata found
User Metadata:
No metadata found
Other Metadata:
No metadata found
ETag: XXX (valid)
Content-Length: 0 (valid)
Partition 229849
Hash XXX
Object info on source node
$ ls
XXX.data XXX.meta
$ swift-object-info XXX
Path: XXX
Account: XXX
Container: XXX
Object: XXX
Object hash: e0766dfd7443638
Content-Type: binary/octet-stream
Timestamp: 2023-02-
System Metadata:
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
X-Object-
Transient System Metadata:
No metadata found
User Metadata:
No metadata found
Other Metadata:
No metadata found
ETag: XXX (valid)
Content-Length: 0 (valid)
Partition 229849
Hash XXX
Changed in swift:
status: New → Confirmed
importance: Undecided → Critical
Changed in swift:
status: Confirmed → In Progress
The reconstructor/ssync appears to be repeatedly trying to sync an object that is already in sync. That causes a 409 Conflict in the ssync subrequest stream.
We inspected the on-disk files for this object on the sender and receiver sides. Both sides had identical timestamps. However, the data files have timestamps with an offset, and there is also a meta file.
i.e. both sender and receiver have:
t0_1#2#d.data
t1.meta
The sender encodes these timestamps in a compact form and the receiver decodes them and compares to its on disk file set.
The encoding represents t1 (the meta timestamp) as a delta from the data timestamp, NOT INCLUDING the data timestamp offset, i.e. delta = t1 - t0.
The decoding erroneously calculates the meta timestamp as the delta plus the data timestamp INCLUDING the offset.
The ssync receiver therefore erroneously thinks that the sender has a newer meta file:
t0_1#2#d.data
t1_1.meta
and so the receiver requests that the sender POST its meta file content. On receiving the POST, the object server on the ssync receiver compares the correct timestamps, finds that they are identical to what it has on disk, and returns a 409 Conflict response.
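The mismatch described above can be illustrated with a minimal, self-contained sketch. This is not Swift's code: the `Ts` class and the three helper functions are hypothetical stand-ins, timestamps are modelled as an integer time plus an integer offset, and `decode_meta_fixed` shows only one plausible correction, not necessarily the fix that was released.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Ts:
    """Hypothetical stand-in for a timestamp: integer time plus an offset."""
    raw: int         # time as an integer count (units do not matter here)
    offset: int = 0  # the "_1" suffix in t0_1 corresponds to offset == 1


def encode_meta_delta(ts_data: Ts, ts_meta: Ts) -> str:
    # Sender side: delta = t1 - t0, NOT including the data timestamp offset.
    return format(ts_meta.raw - ts_data.raw, "x")


def decode_meta_buggy(ts_data: Ts, delta_hex: str) -> Ts:
    # Receiver side (as described in this report): the data timestamp's
    # offset is carried over into the reconstructed meta timestamp.
    return Ts(ts_data.raw + int(delta_hex, 16), ts_data.offset)


def decode_meta_fixed(ts_data: Ts, delta_hex: str) -> Ts:
    # Assumed correction: apply the delta to the data time only and do not
    # inherit the data timestamp's offset.
    return Ts(ts_data.raw + int(delta_hex, 16), 0)


# Both nodes hold t0_1 (.data, offset 1) and t1 (.meta, no offset).
t0_1 = Ts(raw=1_000_000, offset=1)
t1 = Ts(raw=1_000_500, offset=0)

delta = encode_meta_delta(t0_1, t1)
wrong = decode_meta_buggy(t0_1, delta)   # behaves like t1_1
right = decode_meta_fixed(t0_1, delta)   # behaves like t1
```

In this sketch `wrong` compares as newer than the `t1` meta file the receiver already has, which matches the spurious request for a POST, while `right` equals `t1` and would not trigger one.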