Bug #1665141 “Reconstructor should not hash suffixes after failu...” : Bugs : OpenStack Object Storage (swift)

clayg (clay-gerrard) wrote on 2017-02-15:

#1

only sync on success (628 bytes, text/plain)

clayg (clay-gerrard) on 2017-02-15

Changed in swift:
importance:	Undecided → High

clayg (clay-gerrard) on 2017-02-15

summary:

- Reconstructor should not sync after failure
+ Reconstructor should not hash suffixes after failure

clayg (clay-gerrard) wrote on 2017-03-08:

#2

https://review.openstack.org/#/c/435152/

clayg (clay-gerrard) on 2017-04-12

Changed in swift:
importance:	High → Medium

OpenStack Infra (hudson-openstack) wrote on 2017-05-01: Fix merged to swift (master)

#3

Reviewed: https://review.openstack.org/435152
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=a0fcca1e0576a0dba7a61f05c86aba23d6ddd27f
Submitter: Jenkins
Branch: master

commit a0fcca1e0576a0dba7a61f05c86aba23d6ddd27f
Author: Clay Gerrard <email address hidden>
Date: Thu Feb 16 14:14:09 2017 -0800

Do not sync suffixes when remote rejects reconstructor revert

    SSYNC is designed to limit concurrent incoming connections in order to
    prevent IO contention. The reconstructor should expect remote
    replication servers to fail ssync_sender when the remote is too busy.
    When the remote rejects SSYNC, the reconstructor should avoid forcing
    additional IO against the remote with a REPLICATE request, which causes
    suffix rehashing.

    Suffix rehashing via REPLICATE verbs takes two forms:

    1) an initial pre-flight call to REPLICATE /dev/part will cause a remote
    primary to rehash any invalid suffixes and return a map for the local
    sender to compare so that a sync can be performed on any mis-matched
    suffixes.

    2) a final call to REPLICATE /dev/part/suf1-suf2-suf3[-sufX[...]] will
    cause the remote primary to rehash the *given* suffixes even if they are
    *not* invalid. This is a requirement for rsync replication because
    after a suffix is synced via rsync the contents of a suffix dir will
    likely have changed and the remote server needs to update its hashes.pkl
    to reflect the new data.

    SSYNC does not *need* to send a post-sync REPLICATE request. Any
    suffixes that are modified by the SSYNC protocol will call _finalize_put
    under the hood as it is syncing. It is however not harmful and
    potentially useful to go ahead and refresh hashes after an SSYNC while the
    inodes of those suffixes are warm in the cache.

    However, that only makes sense if the SSYNC conversation actually synced
    any suffixes - if SSYNC is rejected for concurrency before it ever got
    started there is no value in the remote performing a rehash. It may be
    that *another* reconstructor is pushing data into that same partition
    and the suffixes will become immediately invalidated.

    If a ssync_sender does not successfully finish a sync the reconstructor
    should skip the REPLICATE call entirely and move on to the next
    partition without causing any useless remote IO.

Closes-Bug: #1665141

Change-Id: Ia72c407247e4525ef071a1728750850807ae8231
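
The two REPLICATE forms described in the commit message above differ only in whether a suffix list is appended to the request path. A minimal sketch of that path construction follows; `replicate_path` is a hypothetical helper written for illustration, not a function from the swift codebase, and the device, partition, and suffix values are made up.

```python
def replicate_path(device, partition, suffixes=None):
    """Build the path for a REPLICATE request (hypothetical helper).

    Without suffixes this is the pre-flight form (/dev/part), which asks the
    remote to rehash only invalid suffixes and return its hash map.
    With suffixes this is the post-sync form (/dev/part/suf1-suf2-...), which
    asks the remote to rehash the given suffixes unconditionally.
    """
    path = "/%s/%s" % (device, partition)
    if suffixes:
        # Suffixes are joined with '-' as shown in the commit message.
        path += "/" + "-".join(suffixes)
    return path


if __name__ == "__main__":
    print(replicate_path("sda1", 42))
    print(replicate_path("sda1", 42, ["abc", "def", "123"]))
```

The second form is the one the fix avoids sending when the SSYNC conversation did not complete.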

Changed in swift:
status:	New → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2017-05-16: Fix proposed to swift (stable/ocata)

#4

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/464980

OpenStack Infra (hudson-openstack) wrote on 2017-05-16: Fix proposed to swift (stable/newton)

#5

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/464982

OpenStack Infra (hudson-openstack) wrote on 2017-05-20: Fix merged to swift (stable/ocata)

#6

Reviewed: https://review.openstack.org/464980
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=e127f2277c4436a97f5b2d74307a31af2c98297f
Submitter: Jenkins
Branch: stable/ocata

commit e127f2277c4436a97f5b2d74307a31af2c98297f
Author: Clay Gerrard <email address hidden>
Date: Thu Feb 16 14:14:09 2017 -0800

Do not sync suffixes when remote rejects reconstructor revert

    SSYNC is designed to limit concurrent incoming connections in order to
    prevent IO contention. The reconstructor should expect remote
    replication servers to fail ssync_sender when the remote is too busy.
    When the remote rejects SSYNC, the reconstructor should avoid forcing
    additional IO against the remote with a REPLICATE request, which causes
    suffix rehashing.

    Suffix rehashing via REPLICATE verbs takes two forms:

    1) an initial pre-flight call to REPLICATE /dev/part will cause a remote
    primary to rehash any invalid suffixes and return a map for the local
    sender to compare so that a sync can be performed on any mis-matched
    suffixes.

    2) a final call to REPLICATE /dev/part/suf1-suf2-suf3[-sufX[...]] will
    cause the remote primary to rehash the *given* suffixes even if they are
    *not* invalid. This is a requirement for rsync replication because
    after a suffix is synced via rsync the contents of a suffix dir will
    likely have changed and the remote server needs to update its hashes.pkl
    to reflect the new data.

    SSYNC does not *need* to send a post-sync REPLICATE request. Any
    suffixes that are modified by the SSYNC protocol will call _finalize_put
    under the hood as it is syncing. It is however not harmful and
    potentially useful to go ahead and refresh hashes after an SSYNC while the
    inodes of those suffixes are warm in the cache.

    However, that only makes sense if the SSYNC conversation actually synced
    any suffixes - if SSYNC is rejected for concurrency before it ever got
    started there is no value in the remote performing a rehash. It may be
    that *another* reconstructor is pushing data into that same partition
    and the suffixes will become immediately invalidated.

    If a ssync_sender does not successfully finish a sync the reconstructor
    should skip the REPLICATE call entirely and move on to the next
    partition without causing any useless remote IO.

Closes-Bug: #1665141

Change-Id: Ia72c407247e4525ef071a1728750850807ae8231

tags:

added: in-stable-ocata

OpenStack Infra (hudson-openstack) wrote on 2017-05-31: Fix included in openstack/swift 2.13.1

#7

This issue was fixed in the openstack/swift 2.13.1 release.

OpenStack Infra (hudson-openstack) wrote on 2017-06-16: Change abandoned on swift (stable/newton)

#8

Change abandoned by John Dickinson (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/464982
Reason: This backport depends on a feature that was landed after newton, so we're not going to backport this to newton.

OpenStack Infra (hudson-openstack) wrote on 2017-07-28: Fix included in openstack/swift 2.15.0

#9

This issue was fixed in the openstack/swift 2.15.0 release.

OpenStack Infra (hudson-openstack) wrote on 2019-06-03: Fix proposed to swift (master)

#10

Fix proposed to branch: master
Review: https://review.opendev.org/662735

OpenStack Infra (hudson-openstack) wrote on 2019-06-06: Fix merged to swift (master)

#11

Reviewed: https://review.opendev.org/662735
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=37fa12cd83849a3ae8374ff07861d1d710d53174
Submitter: Zuul
Branch: master

commit 37fa12cd83849a3ae8374ff07861d1d710d53174
Author: Kuan-Lin Chen <email address hidden>
Date: Mon Jun 3 18:39:51 2019 +0800

Do not sync suffixes when remote rejects reconstructor sync

    Commit a0fcca1e stops the reconstructor from syncing suffixes when the
    remote rejects a reconstructor revert. However, the exact same logic
    should be applied to SYNC jobs as well. REPLICATE requests aren't
    generally needed when using SSYNC (which the reconstructor always does).

    If a ssync_sender fails to finish a sync the reconstructor should skip
    the REPLICATE call entirely and move on to the next partition without
    causing any useless remote IO.

Change-Id: Ida50539e645ea7e2950ba668c7f031a8d10da787
Closes-Bug: #1665141
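
As a rough illustration of the behavior both commits aim for, the sketch below runs a job against each remote node and issues the post-sync REPLICATE only when ssync reported success, regardless of whether the job is a revert or a sync. The names `process_job`, `ssync`, and `replicate` are hypothetical stand-ins rather than the reconstructor's actual methods, and the job dictionary layout is assumed.

```python
def process_job(job, ssync, replicate):
    """Sync a job to each remote node; rehash remotely only after success.

    job       -- assumed dict with 'job_type', 'partition', 'suffixes',
                 and 'sync_to' (list of remote nodes)
    ssync     -- callable(node, job) returning True when the sync completed
    replicate -- callable(node, partition, suffixes) that sends REPLICATE

    Returns the list of nodes that received a REPLICATE request.
    """
    rehashed = []
    for node in job["sync_to"]:
        if not ssync(node, job):
            # The remote rejected or failed the sync (for example, it is at
            # its concurrency limit): skip REPLICATE and cause no extra IO.
            continue
        replicate(node, job["partition"], job["suffixes"])
        rehashed.append(node)
    return rehashed
```

Because the guard does not inspect `job_type`, the same path covers both the revert case from a0fcca1e and the sync case addressed here.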

OpenStack Infra (hudson-openstack) wrote on 2019-06-13: Fix proposed to swift (feature/losf)

#12

Fix proposed to branch: feature/losf
Review: https://review.opendev.org/665170

OpenStack Infra (hudson-openstack) wrote on 2019-06-14: Fix merged to swift (feature/losf)

#13

Reviewed: https://review.opendev.org/665170
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=b1ad1bcec95dbf898764b61d063490a36ae75c29
Submitter: Zuul
Branch: feature/losf

commit aa2f1db1b71c1b2bf746b72515e3efd15598b6aa
Author: Tim Burke <email address hidden>
Date: Tue Jun 11 14:50:49 2019 -0700

Ensure get_*_info keys are native strings

Change-Id: I29bbea48ae38cfabf449a9f4cca1f5f27769405a

commit b7b92b97b12f2a5c0e1beed59b6ffd4791cec896
Author: Tim Burke <email address hidden>
Date: Mon Jun 10 15:40:58 2019 -0700

Bump up minimum cryptography version

    ...not because we strictly *need* newer cryptography, but rather because
    distro packages have moved forward to the point where the 1.x series
    won't compile from source and PyPI doesn't have wheels for them.

See changes like:

- https://github.com/pyca/cryptography/commit/6e7ea2e
- https://github.com/pyca/cryptography/commit/f88aea5

Change-Id: I1ff5b61873cf382c7a89873ed4ba6153f299262a

commit dca658103a63d212bdf9195fcde6038557c13401
Author: Clay Gerrard <email address hidden>
Date: Thu Jun 6 14:25:22 2019 -0500

Fix swift with python <2.7.9

Closes-Bug: #1831932

Change-Id: I0d33864f4bffa401082548ee9a52f6eb50cb1f39

commit d9cafca246bb15e706d9f7546e1f4bedda1b6c8b
Author: Tim Burke <email address hidden>
Date: Tue May 21 18:04:05 2019 -0700

py3: port ssync

Change-Id: I63a502be13f5dcda2a457d38f2fc5f1ca469d562

commit 98637dc1e7a6ef5641079a6226d12bf106436b35
Author: 翟小君 <email address hidden>
Date: Wed Jun 5 12:35:00 2019 +0800

Bump openstackdocstheme to 1.30.0

...to pick up many improvements, including the return of table borders.

Change-Id: I166211b690b08521171b489582fa419d756b1972

commit 37fa12cd83849a3ae8374ff07861d1d710d53174
Author: Kuan-Lin Chen <email address hidden>
Date: Mon Jun 3 18:39:51 2019 +0800

Do not sync suffixes when remote rejects reconstructor sync

    Commit a0fcca1e stops the reconstructor from syncing suffixes when the
    remote rejects a reconstructor revert. However, the exact same logic
    should be applied to SYNC jobs as well. REPLICATE requests aren't
    generally needed when using SSYNC (which the reconstructor always does).

    If a ssync_sender fails to finish a sync the reconstructor should skip
    the REPLICATE call entirely and move on to the next partition without
    causing any useless remote IO.

Change-Id: Ida50539e645ea7e2950ba668c7f031a8d10da787
Closes-Bug: #1665141

commit 2e35376c6d6afb5aa2a36081861bab011c8c95c3
Author: Tim Burke <email address hidden>
Date: Thu May 30 11:55:58 2019 -0700

py3: symlink follow-up

- Have the unit tests use WSGI strings, like a real system.
- Port the func tests.

Change-Id: I3a6f409208de45ebf9f55f7f59e4fe6ac6fbe163

commit 82e446a8a0c0fd6a81f06717b76ed3d1be26a281
Author: Tim Burke <email address hidden>
Date: Mon May 20 11:44:21 2019 -0700

s3api: Allow clients to upload with UNSIGNED-PAYLOAD

(Some versions of?) awscli/boto3 will do v4 signatures but send a
Content-MD5 for end-to-end validation. Sin...