Activity log for bug #1452553

Date Who What changed Old value New value Message
2015-05-07 04:27:52 clayg bug added bug
2015-05-08 03:27:11 clayg description The reconstructor will rebuild any fragments it finds missing from it's partners - to do so it will GET fragment archives from the other primaries (excepting itself) and feed their fragments those through pyeclib.reconstruct to rebuild the missing fragment archive which it ships over to the other primary via ssync. This is not efficient on rebalance where the fragment archive exists - it's just on a handoff. If we were to setup the response green pile to include handoffs (possibly first) there's a non-zero chance we'd find the very fragment archive we're considering rebuilding! But either way, when gathering responses we should at a minimum verify that the fragment we want to rebuild isn't in the collection of nodes we're talking to - because if it is - it'd probably be better to let that node ship it over to the primary on it's own, or at a minimum close the other connections and just send the full fragment archive raw. Trying to rebuild a fragment index from a set of fragments that *includes* the fragment you're trying to rebuild can acctually cause a segfault on some backends [1]. 1. https://gist.github.com/clayg/396f341396cf43c99678 The reconstructor will rebuild any fragments it finds missing from it's partners - to do so it will GET fragment archives from the other primaries (excepting the failure) and feed their fragments those through pyeclib.reconstruct to rebuild the missing fragment archive which it ships over to the other primary via ssync. This is not efficient on rebalance where the fragment archive exists - it's just on a handoff. If we were to setup the response green pile to include handoffs (possibly first) there's a non-zero chance we'd find the very fragment archive we're considering rebuilding! 
But either way, when gathering responses we should at a minimum verify that the fragment we want to rebuild isn't in the collection of nodes we're talking to - because if it is - it'd probably be better to let that node ship it over to the primary on it's own, or at a minimum close the other connections and just send the full fragment archive raw. Trying to rebuild a fragment index from a set of fragments that *includes* the fragment you're trying to rebuild can acctually cause a segfault on some backends [1]. 1. https://gist.github.com/clayg/396f341396cf43c99678
2015-05-08 03:27:26 clayg description The reconstructor will rebuild any fragments it finds missing from it's partners - to do so it will GET fragment archives from the other primaries (excepting the failure) and feed their fragments those through pyeclib.reconstruct to rebuild the missing fragment archive which it ships over to the other primary via ssync. This is not efficient on rebalance where the fragment archive exists - it's just on a handoff. If we were to setup the response green pile to include handoffs (possibly first) there's a non-zero chance we'd find the very fragment archive we're considering rebuilding! But either way, when gathering responses we should at a minimum verify that the fragment we want to rebuild isn't in the collection of nodes we're talking to - because if it is - it'd probably be better to let that node ship it over to the primary on it's own, or at a minimum close the other connections and just send the full fragment archive raw. Trying to rebuild a fragment index from a set of fragments that *includes* the fragment you're trying to rebuild can acctually cause a segfault on some backends [1]. 1. https://gist.github.com/clayg/396f341396cf43c99678 The reconstructor will rebuild any fragments it finds missing from it's partners - to do so it will GET fragment archives from the other primaries (excepting the failure) and feed their fragments through pyeclib.reconstruct to rebuild the missing fragment archive which it ships over to the other primary via ssync. This is not efficient on rebalance where the fragment archive exists - it's just on a handoff. If we were to setup the response green pile to include handoffs (possibly first) there's a non-zero chance we'd find the very fragment archive we're considering rebuilding! 
But either way, when gathering responses we should at a minimum verify that the fragment we want to rebuild isn't in the collection of nodes we're talking to - because if it is - it'd probably be better to let that node ship it over to the primary on it's own, or at a minimum close the other connections and just send the full fragment archive raw. Trying to rebuild a fragment index from a set of fragments that *includes* the fragment you're trying to rebuild can acctually cause a segfault on some backends [1]. 1. https://gist.github.com/clayg/396f341396cf43c99678
2015-05-08 04:15:38 clayg attachment added don't use useless fragment archives in rebuild https://bugs.launchpad.net/swift/+bug/1452553/+attachment/4393343/+files/reconstructor.patch
2015-05-11 19:35:38 clayg tags ec
2015-06-11 18:49:53 Minwoo Bae swift: assignee Minwoo Bae (minwoob)
2015-06-18 19:53:18 OpenStack Infra swift: status New In Progress
2015-06-25 07:29:47 clayg swift: importance Undecided Critical
2015-06-29 17:48:24 clayg summary don't rebuild existing fragments Skip over extra fragments that are not useful to reconstruct
2015-06-29 17:51:46 clayg description The reconstructor will rebuild any fragments it finds missing from it's partners - to do so it will GET fragment archives from the other primaries (excepting the failure) and feed their fragments through pyeclib.reconstruct to rebuild the missing fragment archive which it ships over to the other primary via ssync. This is not efficient on rebalance where the fragment archive exists - it's just on a handoff. If we were to setup the response green pile to include handoffs (possibly first) there's a non-zero chance we'd find the very fragment archive we're considering rebuilding! But either way, when gathering responses we should at a minimum verify that the fragment we want to rebuild isn't in the collection of nodes we're talking to - because if it is - it'd probably be better to let that node ship it over to the primary on it's own, or at a minimum close the other connections and just send the full fragment archive raw. Trying to rebuild a fragment index from a set of fragments that *includes* the fragment you're trying to rebuild can acctually cause a segfault on some backends [1]. 1. https://gist.github.com/clayg/396f341396cf43c99678 Trying to rebuild a fragment index from a set of fragments that includes duplicates will not work. The reconstructor should skip over nodes offering fragments it already has connections for. Trying to rebuild a fragment index from a set of fragments that *includes* the fragment you're trying to rebuild can actually cause a segfault on some backends [1]. Because the segfault will kill the process - as a quick workaround it should skip those too. See bug #1469815 for handling this case more efficiently. 1. https://gist.github.com/clayg/396f341396cf43c99678
2015-07-07 22:24:19 OpenStack Infra swift: status In Progress Fix Committed
2015-07-15 21:17:16 OpenStack Infra tags ec ec in-feature-hummingbird
2015-07-15 21:17:17 OpenStack Infra bug watch added http://bugs.python.org/issue16037
2015-07-24 23:49:01 OpenStack Infra tags ec in-feature-hummingbird ec in-feature-crypto in-feature-hummingbird
2015-09-01 12:25:19 Thierry Carrez swift: status Fix Committed Fix Released
2015-09-01 12:25:19 Thierry Carrez swift: milestone 2.4.0
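
The final description asks the reconstructor to skip two kinds of responses when gathering fragments for a rebuild: those offering a fragment index it already holds a connection for, and those offering the very index being rebuilt. A minimal sketch of that filtering follows. It is not the committed Swift patch; `FragmentResponse`, `select_useful_responses`, the `needed` parameter, and the node names are all hypothetical, introduced only for illustration.

```python
from typing import Iterable, List, NamedTuple


class FragmentResponse(NamedTuple):
    """Hypothetical stand-in for one node's GET response."""
    node: str
    frag_index: int


def select_useful_responses(
    responses: Iterable[FragmentResponse],
    rebuild_index: int,
    needed: int,
) -> List[FragmentResponse]:
    """Keep up to `needed` responses with distinct fragment indexes,
    never including the index that is being rebuilt."""
    chosen = {}
    for resp in responses:
        if resp.frag_index == rebuild_index:
            # This node already holds the fragment we meant to rebuild.
            # The bug reports that passing it to the rebuild can segfault
            # some backends, so it is skipped.
            continue
        if resp.frag_index in chosen:
            # Duplicate index: it adds nothing to the rebuild.
            continue
        chosen[resp.frag_index] = resp
        if len(chosen) == needed:
            break
    return list(chosen.values())


# Illustrative input: one handoff holds index 2, and index 1 appears twice.
responses = [
    FragmentResponse("node-a", 0),
    FragmentResponse("node-b", 1),
    FragmentResponse("handoff-x", 2),
    FragmentResponse("node-c", 1),
    FragmentResponse("node-d", 3),
]
useful = select_useful_responses(responses, rebuild_index=2, needed=3)
print([r.node for r in useful])  # ['node-a', 'node-b', 'node-d']
```

Skipping is only the quick workaround named in the description; shipping the existing fragment archive directly instead of rebuilding it is the more efficient handling deferred to bug #1469815.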