reconstructor may fail to rebuild frag if other frags have different *metadata* timestamps
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Fix Released | Undecided | Alistair Coles |
Bug Description
When trying to rebuild a fragment missing on another node, the reconstructor makes requests for other fragments to other nodes and gathers responses in buckets according to the response X-Backend-
However, if objects have POSTed metadata then X-Backend-Timestamp is the timestamp of the .meta file, not the .data file. This can cause fragment responses with the same *data timestamp* to be placed in multiple buckets if some fragments have a .meta file in their hash dir while others do not (for example, because the POST may have landed on some handoffs).
Because frags are placed in different buckets, no single bucket may have enough frags to rebuild and so the reconstructor fails to rebuild the missing frag.
The situation would likely resolve itself once meta files have been replicated to all nodes, but until then there is a potential delay in increasing the durability of the data.
The buckets should be keyed by X-Backend-
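The bucketing problem described above can be sketched in a few lines. This is a simplified illustration, not the reconstructor's code: the dict fields `timestamp` (what the node reports, i.e. the .meta timestamp when a .meta file exists) and `data_timestamp` (the .data file timestamp), the helper names, and the value `NDATA = 4` are all hypothetical stand-ins chosen for the example.

```python
from collections import defaultdict

# Hypothetical number of fragments needed to rebuild a missing one.
NDATA = 4


def bucket_responses(responses, key):
    """Group fragment responses by the value of the given field."""
    buckets = defaultdict(list)
    for resp in responses:
        buckets[resp[key]].append(resp)
    return buckets


def can_rebuild(buckets, ndata=NDATA):
    """A rebuild is possible only if one bucket holds enough fragments."""
    return any(len(frags) >= ndata for frags in buckets.values())


# Five surviving fragments share the same .data timestamp (100.0).
# Two of them also have a .meta file (200.0), so the timestamp they
# report differs from the other three.
responses = [
    {"frag_index": 0, "data_timestamp": 100.0, "timestamp": 100.0},
    {"frag_index": 1, "data_timestamp": 100.0, "timestamp": 200.0},
    {"frag_index": 2, "data_timestamp": 100.0, "timestamp": 100.0},
    {"frag_index": 3, "data_timestamp": 100.0, "timestamp": 200.0},
    {"frag_index": 4, "data_timestamp": 100.0, "timestamp": 100.0},
]

# Keyed by the reported timestamp: two buckets, neither large enough.
by_reported = bucket_responses(responses, "timestamp")

# Keyed by the data timestamp: one bucket containing every fragment.
by_data = bucket_responses(responses, "data_timestamp")

print(can_rebuild(by_reported), can_rebuild(by_data))
```

In this sketch the five responses split three to two when keyed by the reported timestamp, so no bucket reaches the assumed threshold of four, while keying by the data timestamp keeps all five together.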
This unit test patch (against commit 46b8f941920) illustrates the bug:
def test_reconstruc
# verify scenario where all fragments have same data timestamp but some
# have different meta timestamp
job = {
}
part_nodes = self.policy.
node = part_nodes[4]
test_data = (b'rebuild' * self.policy.
etag = md5(test_data, usedforsecurity
broken_body = ec_archive_
ts_data = next(self.ts_iter) # all frags .data timestamp
ts_meta = next(self.ts_iter) # some frags .meta timestamp
ts_cycle = itertools.
responses = list()
for body in ec_archive_bodies:
ts = next(ts_cycle) # vary timestamp between data and meta
headers = get_header_
codes, body_iter, headers_iter = zip(*responses)
with mocked_
df = self.reconstruc
Test should pass but currently fails:
E
=======
ERROR: test_reconstruc
-------
Traceback (most recent call last):
File "/Users/
job, node, dict(self.
File "/Users/
raise DiskFileError(
swift.common.
-------
Ran 1 test in 0.075s
FAILED (errors=1)
object-
object-
object-
Error
Traceback (most recent call last):
File "/Users/
yield
File "/Users/
testMethod()
File "/Users/
job, node, dict(self.
File "/Users/
raise DiskFileError(
swift.common.
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/swift/+/790235