get_container_info should have some probability of bypassing memcache and going to disk

Bug #1883324 reported by Tim Burke
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

If you've got thousands of requests per second for objects in a single container, you basically NEVER want that container's info to fall out of memcache. If it *does*, all those clients are going to overload the container -- most will fail, and with bug #1883211 the ones that succeed may not get their info into memcache long enough to help much. You can try increasing recheck_container_existence, but eventually the entry is still going to fall out.

The solution (in my mind, anyway) is to have some small probability -- 0.01%, say -- of get_container_info skipping memcache altogether and going out to the container-server anyway; then we'll refresh the TTL in memcache and we're good for another minute.
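
To make the proposal concrete, here is a minimal sketch of the idea. It is not the Swift implementation: `TinyCache`, `get_info`, `skip_chance`, and the injectable `now`/`rng` parameters are hypothetical names chosen for illustration, and the cache is a plain in-memory stand-in for memcache.

```python
import random
import time


class TinyCache:
    """Hypothetical in-memory stand-in for memcache with per-key TTLs."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp)

    def get(self, key, now):
        entry = self._data.get(key)
        if entry is None or entry[1] <= now:
            return None  # never stored, or already expired
        return entry[0]

    def set(self, key, value, ttl, now):
        self._data[key] = (value, now + ttl)


def get_info(cache, key, fetch_from_backend, skip_chance, ttl,
             now=None, rng=random.random):
    """Return (info, outcome), where outcome is 'hit', 'miss' or 'skip'.

    skip_chance is a fraction, so 0.01% is written as 0.0001.
    """
    now = time.time() if now is None else now
    if skip_chance > 0 and rng() < skip_chance:
        # Deliberately bypass the cache so the entry gets refreshed.
        outcome = 'skip'
    else:
        info = cache.get(key, now)
        if info is not None:
            return info, 'hit'
        outcome = 'miss'
    # Both 'miss' and 'skip' go to the backend and push out the TTL.
    info = fetch_from_backend()
    cache.set(key, info, ttl, now)
    return info, outcome
```

Under these assumptions, a busy container keeps getting an occasional 'skip' that rewrites the cache entry before it expires, so the herd of simultaneous misses never forms.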

clayg (clay-gerrard) wrote :

Does any HEAD to the container push out the TTL, or does it specifically take a get_container_info call that misses?

Tim Burke (1-tim-z) wrote :

Any GET or HEAD should do it -- the probe test in https://review.opendev.org/#/c/735359/ depends on that behavior.

Christian Schwede (cschwede) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/swift/+/821921

OpenStack Infra (hudson-openstack) wrote : Related fix merged to swift (master)

Reviewed: https://review.opendev.org/c/openstack/swift/+/736802
Committed: https://opendev.org/openstack/swift/commit/8c6ccb5fd41864155a043856ff9240e84999e4bf
Submitter: "Zuul (22348)"
Branch: master

commit 8c6ccb5fd41864155a043856ff9240e84999e4bf
Author: Tim Burke <email address hidden>
Date: Thu Jun 18 11:48:14 2020 -0700

    proxy: Add a chance to skip memcache when looking for shard ranges

    By having some small portion of calls skip cache and go straight to
    disk, we can ensure the cache is always kept fresh and never expires (at
    least, for active containers). Previously, when shard ranges fell out of
    cache there would frequently be a thundering herd that could overwhelm
    the container server, leading to 503s served to clients or an increase
    in async pendings.

    Include metrics for hit/miss/skip rates.

    Change-Id: I6d74719fb41665f787375a08184c1969c86ce2cf
    Related-Bug: #1883324

OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/swift/+/850954

OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (master)

Change abandoned by "Tim Burke <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/swift/+/821921
Reason: Re-proposed as https://review.opendev.org/c/openstack/swift/+/850954 since CI went crazy here, and now Gerrit won't let me even leave a comment.

OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.opendev.org/c/openstack/swift/+/850954
Committed: https://opendev.org/openstack/swift/commit/5c6407bf591121fa10f8a8b10d22b3a64b9c4fe9
Submitter: "Zuul (22348)"
Branch: master

commit 5c6407bf591121fa10f8a8b10d22b3a64b9c4fe9
Author: Tim Burke <email address hidden>
Date: Thu Jan 6 12:09:58 2022 -0800

    proxy: Add a chance to skip memcache for get_*_info calls

    If you've got thousands of requests per second for objects in a single
    container, you basically NEVER want that container's info to ever fall
    out of memcache. If it *does*, all those clients are almost certainly
    going to overload the container.

    Avoid this by allowing some small fraction of requests to bypass and
    refresh the cache, pushing out the TTL as long as there continue to be
    requests to the container. The likelihood of skipping the cache is
    configurable, similar to what we did for shard range sets.

    Change-Id: If9249a42b30e2a2e7c4b0b91f947f24bf891b86f
    Closes-Bug: #1883324
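
As a rough check on why a tiny skip fraction is enough for a busy container, the sketch below estimates the chance that an entire TTL window passes without a single request skipping the cache. It assumes requests arrive at a steady rate and skip independently. The 2000 requests per second figure is an illustrative stand-in for "thousands", and the 60 second TTL follows the "another minute" in the bug description. `chance_of_no_refresh` is a hypothetical helper, not part of Swift.

```python
def chance_of_no_refresh(requests_per_second, skip_chance, ttl_seconds):
    """Probability that no request in one TTL window skips the cache.

    Assumes a constant request rate and independent skip decisions;
    skip_chance is a fraction (0.01% is 0.0001).
    """
    requests_in_window = requests_per_second * ttl_seconds
    return (1.0 - skip_chance) ** requests_in_window


# Busy container: 2000 req/s, 0.01% skip chance, 60 s TTL.
busy = chance_of_no_refresh(2000, 0.0001, 60)

# Quiet container: 10 req/s with the same settings.
quiet = chance_of_no_refresh(10, 0.0001, 60)
```

With these inputs the busy container has only a few-in-a-million chance of going a full window without a refresh, while the quiet container will usually still expire. That is consistent with the commit message's caveat that the TTL is pushed out only as long as requests keep coming.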

Changed in swift:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.31.0

This issue was fixed in the openstack/swift 2.31.0 release.
