inconsistent suffix hashes after ssync replication of a tombstone

Bug #1534276 reported by Alistair Coles
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

When an object dir contains both a .data and a .meta file (which requires object_post_as_copy=False), ssync can replicate a tombstone file to that dir, leaving a tombstone and a .meta file. The source dir, however, may contain only the tombstone. The source and target dirs are then inconsistent and their nodes yield different suffix hashes, so replication syncs them repeatedly.
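
To make the mismatch concrete, here is a minimal sketch of hashing every filename in each object dir of a suffix. It is an illustration only, not Swift's actual code: the function name `legacy_suffix_hash`, the symbolic timestamps (t2, t4) and the list-of-lists layout are assumptions made for this example.

```python
import hashlib


def legacy_suffix_hash(object_dirs):
    """Hash the names of all files found in each object dir of a suffix."""
    md5 = hashlib.md5()
    for filenames in object_dirs:
        # Newest-first ordering; timestamps are assumed to sort lexically.
        for name in sorted(filenames, reverse=True):
            md5.update(name.encode("ascii"))
    return md5.hexdigest()


# State of the two nodes after ssync has replicated the tombstone.
suffix_on_source = [["t2.ts"]]
suffix_on_target = [["t2.ts", "t4.meta"]]

hashes_match = (legacy_suffix_hash(suffix_on_source)
                == legacy_suffix_hash(suffix_on_target))
print(hashes_match)  # False: the suffix is flagged for sync on every pass
```

Because the stray t4.meta contributes to the target's hash but can never be created on the source, no number of replication passes makes the two hashes converge.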

Copied from the commit message of the patch that will be proposed:

    Consider two replicas of the same object whose ondisk files
    have diverged due to failures:

      A has t2.ts
      B has t1.data, t4.meta

    (The DELETE at t2 did not make it to B. The POST at t4 was
    rejected by A.)

    After ssync replication the two ondisk file sets will not be
    consistent:

      A has t2.ts (ssync cannot POST t4.meta to this node)
      B has t2.ts, t4.meta (ssync should not delete t4.meta,
                            there may be a t3.data somewhere)

    Consequently the two nodes will report different hashes for the
    object's suffix, and replication will repeat, always with the
    inconsistent outcome. This scenario is reproduced by the probe
    test added in this patch.

    (Note that rsync replication does result in (t2.ts, t4.meta)
    on both nodes.)

    The solution is to change the way that suffix hashes are
    calculated. Currently the names of *all* files found in each
    object dir are added to the hash. With this patch the
    timestamps of only those files that could be used to
    construct a valid diskfile are added to the hash. File
    extensions are appended to the timestamp so that in most
    'normal' situations the result of the hashing is the same
    as before this patch. That avoids a storm of hash mismatches
    when this patch is deployed in an existing cluster.

    In the problem case described above, t4.meta is no longer
    added to the hash, since it is not useful for constructing
    a diskfile. (Note that t4.meta is not deleted because it
    may become useful should a t3.data be replicated in future).
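
The hashing change described in the commit message can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patch itself: `usable_files`, `all_files` and `suffix_hash` are hypothetical names, timestamps are symbolic strings assumed to sort lexically, and the selection rule (newest .data or .ts, plus a newer .meta only when a .data was chosen) is a reduced version of how a diskfile is constructed.

```python
import hashlib


def usable_files(filenames):
    """Pick the files that could be used to construct a valid diskfile."""
    # Split "t4.meta" into ["t4", "meta"] and sort newest first.
    parsed = sorted((name.rsplit(".", 1) for name in filenames), reverse=True)
    # The newest .data or .ts file decides the state of the object.
    primary = next(
        ((ts, ext) for ts, ext in parsed if ext in ("data", "ts")), None)
    if primary is None:
        return []
    chosen = [primary]
    if primary[1] == "data":
        # A .meta is only useful alongside an older .data file.
        meta = next(((ts, ext) for ts, ext in parsed
                     if ext == "meta" and ts > primary[0]), None)
        if meta is not None:
            chosen.append(meta)
    return [f"{ts}.{ext}" for ts, ext in sorted(chosen, reverse=True)]


def all_files(filenames):
    """Previous behaviour: every file in the object dir is hashed."""
    return sorted(filenames, reverse=True)


def suffix_hash(object_dirs, select=usable_files):
    """Hash timestamp + extension of the selected files in each object dir."""
    md5 = hashlib.md5()
    for filenames in object_dirs:
        for name in select(filenames):
            md5.update(name.encode("ascii"))
    return md5.hexdigest()


# Problem case: t4.meta no longer contributes, so the nodes agree.
print(suffix_hash([["t2.ts"]]) == suffix_hash([["t2.ts", "t4.meta"]]))  # True

# 'Normal' case: the result is the same as hashing all filenames.
normal = [["t1.data", "t4.meta"]]
print(suffix_hash(normal) == suffix_hash(normal, select=all_files))  # True
```

Keeping the extension in the hashed string is what lets the second comparison hold, which is why deploying the change should not trigger hash mismatches for object dirs that are already in a normal state.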

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/267788
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=2d55960a221c9934680053873bf1355c4690bb19
Submitter: Jenkins
Branch: master

commit 2d55960a221c9934680053873bf1355c4690bb19
Author: Alistair Coles <email address hidden>
Date: Thu Jan 14 18:31:21 2016 +0000

    Fix inconsistent suffix hashes after ssync of tombstone

    Closes-Bug: 1534276
    Change-Id: I99e88b8d5f5d9bc22b42112a99634ba942415e05

Changed in swift:
status: New → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/crypto)

Fix proposed to branch: feature/crypto
Review: https://review.openstack.org/288628

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/crypto)

Reviewed: https://review.openstack.org/288628
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=6731dd6da8b3723c8dd821b6b4ba40271b4ecb74
Submitter: Jenkins
Branch: feature/crypto

commit fad5fabe0a22e8a86635a66523dd3d3d3b1fa705
Author: Tim Burke <email address hidden>
Date: Thu Mar 3 15:07:08 2016 +0000

    During functional tests, 404 response to a DELETE is successful

    Previously, we would only consider 204 responses successful, which would
    cause some spurious gate failures, such as

    http://logs.openstack.org/66/287666/3/check/gate-swift-dsvm-functional/c6d2673/console.html#_2016-03-03_13_41_07_846

    Change-Id: Ic8c300647924352a297a2781b50064f7657038b4

commit e91de49d6864b3794f8dc5acd9c1bf0c2f7409d1
Author: Alistair Coles <email address hidden>
Date: Mon Aug 10 10:30:10 2015 -0500

    Update container on fast-POST

    This patch makes a number of changes to enable content-type
    metadata to be updated when using the fast-POST mode of
    operation, as proposed in the associated spec [1].

    * the object server and diskfile are modified to allow
      content-type to be updated by a POST and the updated value
      to be stored in .meta files.

    * the object server accepts PUTs and DELETEs with older
      timestamps than existing .meta files. This is to be
      consistent with replication that will leave a later .meta
      file in place when replicating a .data file.

    * the diskfile interface is modified to provide accessor
      methods for the content-type and its timestamp.

    * the naming of .meta files is modified to encode two
      timestamps when the .meta file contains a content-type value
      that was set prior to the latest metadata update; this
      enables consistency to be achieved when rsync is used for
      replication.

    * ssync is modified to sync meta files when content-type
      differs between local and remote copies of objects.

    * the object server issues container updates when handling
      POST requests, notifying the container server of the current
      immutable metadata (etag, size, hash, swift_bytes),
      content-type with their respective timestamps, and the
      mutable metadata timestamp.

    * the container server maintains the most recently reported
      values for immutable metadata, content-type and mutable
      metadata, each with their respective timestamps, in a single
      db row.

    * new probe tests verify that replication achieves eventual
      consistency of containers and objects after discrete updates
      to content-type and mutable metadata, and that container-sync
      syncs objects after fast-post updates.

    [1] spec change-id: I60688efc3df692d3a39557114dca8c5490f7837e

    Change-Id: Ia597cd460bb5fd40aa92e886e3e18a7542603d01
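
The idea of a single container row that keeps the most recently reported value for each group of metadata, each with its own timestamp, can be sketched as below. This is a hypothetical model for illustration: `ObjectRow`, its field names and `merge_update` are assumptions and do not reflect the container server's actual schema or code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ObjectRow:
    data_ts: float      # timestamp of the immutable metadata (etag, size)
    etag: str
    size: int
    ctype_ts: float     # timestamp of the content-type value
    content_type: str
    meta_ts: float      # timestamp of the latest mutable metadata update


def merge_update(row, update):
    """Keep, per metadata group, whichever value has the newest timestamp."""
    merged = row
    if update.data_ts > merged.data_ts:
        merged = replace(merged, data_ts=update.data_ts,
                         etag=update.etag, size=update.size)
    if update.ctype_ts > merged.ctype_ts:
        merged = replace(merged, ctype_ts=update.ctype_ts,
                         content_type=update.content_type)
    if update.meta_ts > merged.meta_ts:
        merged = replace(merged, meta_ts=update.meta_ts)
    return merged


put = ObjectRow(1.0, "abc", 10, 1.0, "text/plain", 1.0)
post_a = ObjectRow(1.0, "abc", 10, 2.0, "image/png", 2.0)
post_b = ObjectRow(1.0, "abc", 10, 3.0, "text/html", 3.0)

# Updates may arrive in any order; the merged row is the same.
in_order = merge_update(merge_update(put, post_a), post_b)
reordered = merge_update(merge_update(put, post_b), post_a)
print(in_order == reordered, in_order.content_type)  # True text/html
```

Merging each group independently by timestamp makes the result insensitive to the order in which updates arrive, which is the property eventual consistency relies on.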

commit 3c61ab4678a7aa9ff256ace4bc97ab449607fd49
Author: asettle <email address hidden>
Date: Wed Feb 10 17:58:05 2016 +1000

    Operational procedures guide

    This is the operational procedures guide that HPE used
    to operate and monitor their public Swift systems.
    It has been made publicly availabl...

tags: added: in-feature-crypto
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/hummingbird)

Fix proposed to branch: feature/hummingbird
Review: https://review.openstack.org/290148

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/hummingbird)

Reviewed: https://review.openstack.org/290148
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=0f7f1de233919a0b046349a3e31ae7fc8675a1c5
Submitter: Jenkins
Branch: feature/hummingbird

commit d6b4587a554b51ba733b151e0d924735b63d07e0
Author: Olga Saprycheva <email address hidden>
Date: Tue Mar 8 10:57:56 2016 -0600

    Removed redundant file for flake8 check

    Change-Id: I4322978aa20ee731391f7709bbd79dee140fc703

commit 643dbce134140530eef2ae62c42fef1107f905ed
Author: OpenStack Proposal Bot <email address hidden>
Date: Tue Mar 8 06:35:49 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I96b8ff1287bf219c5f8d56a3a4868c1063a953f9

commit 83713d37f0331c5ce9d377f4b4e8724551ae30ca
Author: Daisuke Morita <email address hidden>
Date: Mon Mar 7 18:30:47 2016 -0800

    Missing comments for storage policy parameter

    Comments about storage_policy_index were missing, so appropriate
    comments are added.

    Change-Id: I3de3f0e6864e65918ca1a13cce70f19c23d295f5

commit 2cff2dec3d1c4588f5103e39679c43b3dded6dcb
Author: Olga Saprycheva <email address hidden>
Date: Fri Mar 4 15:19:39 2016 -0600

    Fixed pep8 and flake8 errors in doc/source/conf.py and updated flake8 commands in tox.ini to test it.

    Change-Id: I2add370e4cfb55d1388e3a8b41f688a7f3f2c621

commit 043fbca6d08648baa314ea2236f1ccdca8785f16
Author: Christian Schwede <email address hidden>
Date: Fri Mar 4 09:33:17 2016 +0000

    Remove Erasure Coding beta status from docs

    This removes notes stating support for Erasure coding as beta. Questions
    regarding the stability of EC are coming up regularly, and are often referring
    to the docs that state EC as still in beta.

    Besides this, a note marking statsd support as beta has been removed as well.

    Change-Id: If4fb6a5c4cb741d42953db3cee8cb17a1d774e15

commit 09c73b86e9255f28fbd4cf571a52c17d549a8f9a
Author: Pete Zaitcev <email address hidden>
Date: Thu Mar 3 10:24:28 2016 -0700

    Fix a crash in exception printout

    Says the number of arguments does not match the number of '%'.

    Change-Id: I8b5e395a07328fb9d4ac7a19f8ed2ae1637bee3b

commit fad5fabe0a22e8a86635a66523dd3d3d3b1fa705
Author: Tim Burke <email address hidden>
Date: Thu Mar 3 15:07:08 2016 +0000

    During functional tests, 404 response to a DELETE is successful

    Previously, we would only consider 204 responses successful, which would
    cause some spurious gate failures, such as

    http://logs.openstack.org/66/287666/3/check/gate-swift-dsvm-functional/c6d2673/console.html#_2016-03-03_13_41_07_846

    Change-Id: Ic8c300647924352a297a2781b50064f7657038b4

commit e91de49d6864b3794f8dc5acd9c1bf0c2f7409d1
Author: Alistair Coles <email address hidden>
Date: Mon Aug 10 10:30:10 2015 -0500

    Update container on fast-POST

    This patch makes a number of changes to enable content-type
    metadata to be updated when using the fast-POST mode of
    operation, as proposed in the associated spec ...

tags: added: in-feature-hummingbird
Thierry Carrez (ttx) wrote : Fix included in openstack/swift 2.7.0

This issue was fixed in the openstack/swift 2.7.0 release.
