suffix hash invalidation may be lost

Bug #1651530 reported by Alistair Coles
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Critical
Unassigned

Bug Description

A fix for this has already been proposed in https://review.openstack.org/#/c/402043 but I'm filing this bug to keep a record of my understanding of how the race condition occurs.

Since commit 22685d6 (AFAICT) there is a possibility that a suffix hash invalidation triggered by a change to a diskfile may be lost, i.e. not register the suffix as needing its hash recalculated by diskfile's get_hashes method.

The scenario is:

No hashes.pkl for the partition exists.

Process X (e.g. replicator) calls get_hashes->_get_hashes->consolidate_hashes, which returns None because no hashes.pkl exists

In _get_hashes, process X does a listdir to find all suffixes in the partition, and then proceeds to calculate their hashes.

While that is happening, process Y (e.g. a DELETE) calls invalidate_hash for a new suffix (or one whose hash has already been calculated by process X). Because no hashes.pkl yet exists, invalidate_hash does nothing i.e. the invalidated suffix is not written to the hashes.invalid file. The presumption is that the absence of a hases.pkl file is sufficient to cause the suffix hash to be recalculated, but of course, that is already in progress in process X.

Process X completes recalculation of the suffixes it knew about and writes hashes.pkl. The invalidation event for the suffix that was changed by process Y is "lost".

The solution is to always create a hashes.pkl file when it does not exist in consolidate_hashes, so that a subsequent call to invalidate_hash in another process will write a suffix to hashes.invalid. That suffix will then eventually be consolidated to hashes.pkl

Changed in swift:
importance: Undecided → Critical
Revision history for this message
Alistair Coles (alistair-coles) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/418690
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=442cc1d16d45fbfd50c57fc62e2582674d7348c4
Submitter: Jenkins
Branch: master

commit 442cc1d16d45fbfd50c57fc62e2582674d7348c4
Author: Clay Gerrard <email address hidden>
Date: Tue Jan 10 18:53:08 2017 -0800

    Fix race in new partitions detecting new/invalid suffixes.

    The assumption that we don't need to write an entry in the invalidations
    file when the hashes.pkl does not exist turned out to be a premature
    optimization and also wrong.

    Primarily we should recognize the creation of hashes.pkl is the first
    thing that happens in a part when it lands on a new primary. The code
    should be optimized toward the assumption of the most common disk state.

    Also, in this case the extra stat calls to check if the hashes.pkl exists
    were not only un-optimized - but introducing a race.

    Consider the common case:

    proc 1 | proc 2
    -------------------------------|---------------------------
    a) read then truncate journal |
    b) do work | c) append to journal
    d) apply "a" to index |

    The index written at "d" may not (yet) reflect the entry writen by proc
    2 at "c"; however, it's clearly in the journal so it's easy to see we're
    safe.

    Adding in the extra stat call for the index existence check increases
    the state which can effect correctness.

    proc 1 | proc 2
    ------------------------------|---------------------------
    a) no index, truncate journal |
    b) do work | b) iff index exists
                                  | c) append to journal
    d) apply (or create) index |

    If step "c" doesn't happen because the index does not yet exist - the
    update is clearly lost.

    In our case we'd skip marking a suffix as invalid when the hashes.pkl
    does not exist because we know "the next time we rehash" we'll have to
    os.listdir suffixes anyway. But if another process is *currently*
    rehashing (and has already done it's os.listdir) instead we've just
    dropped an invalidation on the floor.

    Don't do that.

    Instead - write down the invalidation. The running rehash is welcome to
    proceed on outdated information - as long as the next pass will grab and
    hash the new suffix.

    Known-Issue(s):

    If the suffix already exists there's an even chance the running rehash
    will hash in the very update for which we want to invalidate the suffix,
    but that's ok it's idempotent.

    Co-Author: Pavel Kvasnička <email address hidden>
    Co-Author: Alistair Coles <email address hidden>
    Co-Author: Kota Tsuyuzaki <email address hidden>
    Related-Change-Id: I64cadb1a3feb4d819d545137eecfc295389794f0
    Change-Id: I2b48238d9d684e831d9777a7b18f91a3cef57cd1
    Closes-Bug: #1651530

Changed in swift:
status: New → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/425441

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (master)

Change abandoned by Pavel Kvasnička (<email address hidden>) on branch: master
Review: https://review.openstack.org/402043

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/433565

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.13.0

This issue was fixed in the openstack/swift 2.13.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (stable/newton)

Reviewed: https://review.openstack.org/433565
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=054ee3d44e949378e128f9bf487b12e7afda1223
Submitter: Jenkins
Branch: stable/newton

commit 054ee3d44e949378e128f9bf487b12e7afda1223
Author: Clay Gerrard <email address hidden>
Date: Tue Jan 10 18:53:08 2017 -0800

    Fix race in new partitions detecting new/invalid suffixes.

    The assumption that we don't need to write an entry in the invalidations
    file when the hashes.pkl does not exist turned out to be a premature
    optimization and also wrong.

    Primarily we should recognize the creation of hashes.pkl is the first
    thing that happens in a part when it lands on a new primary. The code
    should be optimized toward the assumption of the most common disk state.

    Also, in this case the extra stat calls to check if the hashes.pkl exists
    were not only un-optimized - but introducing a race.

    Consider the common case:

    proc 1 | proc 2
    -------------------------------|---------------------------
    a) read then truncate journal |
    b) do work | c) append to journal
    d) apply "a" to index |

    The index written at "d" may not (yet) reflect the entry writen by proc
    2 at "c"; however, it's clearly in the journal so it's easy to see we're
    safe.

    Adding in the extra stat call for the index existence check increases
    the state which can effect correctness.

    proc 1 | proc 2
    ------------------------------|---------------------------
    a) no index, truncate journal |
    b) do work | b) iff index exists
                                  | c) append to journal
    d) apply (or create) index |

    If step "c" doesn't happen because the index does not yet exist - the
    update is clearly lost.

    In our case we'd skip marking a suffix as invalid when the hashes.pkl
    does not exist because we know "the next time we rehash" we'll have to
    os.listdir suffixes anyway. But if another process is *currently*
    rehashing (and has already done it's os.listdir) instead we've just
    dropped an invalidation on the floor.

    Don't do that.

    Instead - write down the invalidation. The running rehash is welcome to
    proceed on outdated information - as long as the next pass will grab and
    hash the new suffix.

    Known-Issue(s):

    If the suffix already exists there's an even chance the running rehash
    will hash in the very update for which we want to invalidate the suffix,
    but that's ok it's idempotent.

    Co-Author: Pavel Kvasnička <email address hidden>
    Co-Author: Alistair Coles <email address hidden>
    Co-Author: Kota Tsuyuzaki <email address hidden>
    Related-Change-Id: I64cadb1a3feb4d819d545137eecfc295389794f0
    Change-Id: I2b48238d9d684e831d9777a7b18f91a3cef57cd1
    Closes-Bug: #1651530

tags: added: in-stable-newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.10.2

This issue was fixed in the openstack/swift 2.10.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.opendev.org/661296

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers