OpenStack Object Storage (swift)

Activity log for bug #1452553

Date	Who	What changed	Old value	New value	Message
2015-05-07 04:27:52	clayg	bug			added bug
2015-05-08 03:27:11	clayg	description	The reconstructor will rebuild any fragments it finds missing from it's partners - to do so it will GET fragment archives from the other primaries (excepting itself) and feed their fragments those through pyeclib.reconstruct to rebuild the missing fragment archive which it ships over to the other primary via ssync. This is not efficient on rebalance where the fragment archive exists - it's just on a handoff. If we were to setup the response green pile to include handoffs (possibly first) there's a non-zero chance we'd find the very fragment archive we're considering rebuilding! But either way, when gathering responses we should at a minimum verify that the fragment we want to rebuild isn't in the collection of nodes we're talking to - because if it is - it'd probably be better to let that node ship it over to the primary on it's own, or at a minimum close the other connections and just send the full fragment archive raw. Trying to rebuild a fragment index from a set of fragments that includes the fragment you're trying to rebuild can acctually cause a segfault on some backends [1]. 1. https://gist.github.com/clayg/396f341396cf43c99678	The reconstructor will rebuild any fragments it finds missing from it's partners - to do so it will GET fragment archives from the other primaries (excepting the failure) and feed their fragments those through pyeclib.reconstruct to rebuild the missing fragment archive which it ships over to the other primary via ssync. This is not efficient on rebalance where the fragment archive exists - it's just on a handoff. If we were to setup the response green pile to include handoffs (possibly first) there's a non-zero chance we'd find the very fragment archive we're considering rebuilding! 
But either way, when gathering responses we should at a minimum verify that the fragment we want to rebuild isn't in the collection of nodes we're talking to - because if it is - it'd probably be better to let that node ship it over to the primary on it's own, or at a minimum close the other connections and just send the full fragment archive raw. Trying to rebuild a fragment index from a set of fragments that includes the fragment you're trying to rebuild can acctually cause a segfault on some backends [1]. 1. https://gist.github.com/clayg/396f341396cf43c99678
2015-05-08 03:27:26	clayg	description	The reconstructor will rebuild any fragments it finds missing from it's partners - to do so it will GET fragment archives from the other primaries (excepting the failure) and feed their fragments those through pyeclib.reconstruct to rebuild the missing fragment archive which it ships over to the other primary via ssync. This is not efficient on rebalance where the fragment archive exists - it's just on a handoff. If we were to setup the response green pile to include handoffs (possibly first) there's a non-zero chance we'd find the very fragment archive we're considering rebuilding! But either way, when gathering responses we should at a minimum verify that the fragment we want to rebuild isn't in the collection of nodes we're talking to - because if it is - it'd probably be better to let that node ship it over to the primary on it's own, or at a minimum close the other connections and just send the full fragment archive raw. Trying to rebuild a fragment index from a set of fragments that includes the fragment you're trying to rebuild can acctually cause a segfault on some backends [1]. 1. https://gist.github.com/clayg/396f341396cf43c99678	The reconstructor will rebuild any fragments it finds missing from it's partners - to do so it will GET fragment archives from the other primaries (excepting the failure) and feed their fragments through pyeclib.reconstruct to rebuild the missing fragment archive which it ships over to the other primary via ssync. This is not efficient on rebalance where the fragment archive exists - it's just on a handoff. If we were to setup the response green pile to include handoffs (possibly first) there's a non-zero chance we'd find the very fragment archive we're considering rebuilding! 
But either way, when gathering responses we should at a minimum verify that the fragment we want to rebuild isn't in the collection of nodes we're talking to - because if it is - it'd probably be better to let that node ship it over to the primary on it's own, or at a minimum close the other connections and just send the full fragment archive raw. Trying to rebuild a fragment index from a set of fragments that includes the fragment you're trying to rebuild can acctually cause a segfault on some backends [1]. 1. https://gist.github.com/clayg/396f341396cf43c99678
2015-05-08 04:15:38	clayg	attachment added		don't use useless fragment archives in rebuild https://bugs.launchpad.net/swift/+bug/1452553/+attachment/4393343/+files/reconstructor.patch
2015-05-11 19:35:38	clayg	tags		ec
2015-06-11 18:49:53	Minwoo Bae	swift: assignee		Minwoo Bae (minwoob)
2015-06-18 19:53:18	OpenStack Infra	swift: status	New	In Progress
2015-06-25 07:29:47	clayg	swift: importance	Undecided	Critical
2015-06-29 17:48:24	clayg	summary	don't rebuild existing fragments	Skip over extra fragments that are not useful to reconstruct
2015-06-29 17:51:46	clayg	description	The reconstructor will rebuild any fragments it finds missing from it's partners - to do so it will GET fragment archives from the other primaries (excepting the failure) and feed their fragments through pyeclib.reconstruct to rebuild the missing fragment archive which it ships over to the other primary via ssync. This is not efficient on rebalance where the fragment archive exists - it's just on a handoff. If we were to setup the response green pile to include handoffs (possibly first) there's a non-zero chance we'd find the very fragment archive we're considering rebuilding! But either way, when gathering responses we should at a minimum verify that the fragment we want to rebuild isn't in the collection of nodes we're talking to - because if it is - it'd probably be better to let that node ship it over to the primary on it's own, or at a minimum close the other connections and just send the full fragment archive raw. Trying to rebuild a fragment index from a set of fragments that includes the fragment you're trying to rebuild can acctually cause a segfault on some backends [1]. 1. https://gist.github.com/clayg/396f341396cf43c99678	Trying to rebuild a fragment index from a set of fragments that includes duplicates will not work. The reconstructor should skip over nodes offering fragments it already has connections for. Trying to rebuild a fragment index from a set of fragments that includes the fragment you're trying to rebuild can actually cause a segfault on some backends [1]. Because the segfault will kill the process - as a quick workaround it should skip those to. See bug #1469815 for handling this case more efficiently. 1. https://gist.github.com/clayg/396f341396cf43c99678
2015-07-07 22:24:19	OpenStack Infra	swift: status	In Progress	Fix Committed
2015-07-15 21:17:16	OpenStack Infra	tags	ec	ec in-feature-hummingbird
2015-07-15 21:17:17	OpenStack Infra	bug watch added		http://bugs.python.org/issue16037
2015-07-24 23:49:01	OpenStack Infra	tags	ec in-feature-hummingbird	ec in-feature-crypto in-feature-hummingbird
2015-09-01 12:25:19	Thierry Carrez	swift: status	Fix Committed	Fix Released
2015-09-01 12:25:19	Thierry Carrez	swift: milestone		2.4.0
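The 2015-06-29 description asks the reconstructor to skip two kinds of responses: nodes offering a fragment index it already has a connection for, and nodes offering the very fragment index being rebuilt. That filter can be sketched over the gathered responses. This is a minimal sketch, not the committed patch. `FragResponse` and `select_useful_fragments` are hypothetical names. The sketch also assumes that the decoder needs `ec_ndata` distinct fragment indexes, with `ec_ndata` being the EC policy's data-fragment count. The actual rebuild call (pyeclib reconstruct) is left out.

```python
from typing import Iterable, List, NamedTuple, Optional


class FragResponse(NamedTuple):
    """Hypothetical stand-in for a fragment archive GET response."""
    node: str
    frag_index: int


def select_useful_fragments(responses: Iterable[FragResponse],
                            target_index: int,
                            ec_ndata: int) -> Optional[List[FragResponse]]:
    """Collect ec_ndata responses with distinct fragment indexes.

    Responses carrying target_index (the index being rebuilt) or an index
    already collected are skipped. Returns None if too few useful
    responses are available.
    """
    useful: List[FragResponse] = []
    seen = set()
    for resp in responses:
        if resp.frag_index == target_index:
            # Feeding the missing index back into reconstruct can segfault
            # some backends, so skip it (bug #1469815 covers shipping it raw).
            continue
        if resp.frag_index in seen:
            # A duplicate index adds nothing to the decode.
            continue
        seen.add(resp.frag_index)
        useful.append(resp)
        if len(useful) >= ec_ndata:
            return useful
    return None


responses = [FragResponse("a", 1), FragResponse("b", 1),
             FragResponse("c", 3), FragResponse("d", 2),
             FragResponse("e", 4)]
picked = select_useful_fragments(responses, target_index=3, ec_ndata=3)
print([r.frag_index for r in picked])  # prints [1, 2, 4]
```

In the example, node `b` is skipped because index 1 is already covered. Node `c` is skipped because it holds index 3, the fragment being rebuilt. Returning `None` instead of a short list lets the caller decide whether to keep looking, for example on handoffs.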