sharding: container GETs to root container get slow

Bug #1781291 reported by John Dickinson
This bug affects 2 people
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

During an object PUT to a sharded container, the proxy does a GET to the container server to fetch the shard ranges. This container GET becomes very slow under high concurrency.

To reproduce, make a container with a lot of shards. Then start concurrently putting new objects into it. As concurrency increases, the container GET gets very slow.
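
A rough reproduction sketch (assuming a SAIO-style endpoint and python-swiftclient; the auth URL, credentials, container name, and worker/object counts are placeholders, and the container is presumed to have been sharded already, e.g. with swift-manage-shard-ranges):

    # Reproduction sketch: hammer an already-sharded container with concurrent
    # object PUTs while watching container GET latency on the root container.
    from concurrent.futures import ThreadPoolExecutor

    from swiftclient.client import Connection

    AUTH = dict(authurl='http://127.0.0.1:8080/auth/v1.0',
                user='test:tester', key='testing')
    CONTAINER = 'already_sharded_container'

    def put_one(i):
        # one connection per call keeps the workers independent
        conn = Connection(**AUTH)
        conn.put_object(CONTAINER, 'obj-%08d' % i, contents=b'x')

    with ThreadPoolExecutor(max_workers=64) as pool:  # raise to increase pressure
        list(pool.map(put_one, range(100000)))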

There are a few ideas for fixing this (none of them final):

Cache the shard ranges in the proxy. Since they are unbounded, memcache likely isn't a good place; maybe some in-process memory could be used, with some sort of LRU on it (a rough sketch follows this list of ideas). But then you've got cache invalidation issues...

Defer the child DB update until later. Only tell the object server to update the root, and let the updater handle the redirects. Also let the root DB enqueue child updates.

Instead of updating the root, send the updates to the root's handoffs (some number of them) so that they get sharded in place and the sharder handles getting the updates to the right place.

Don't do any of the deferring in the server and instead slow down the client (ie backpressure).
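
A minimal sketch of what the in-process LRU idea above could look like; the class name, key scheme, and default size are purely illustrative, not Swift API:

    # Illustrative in-process LRU for cached shard ranges, keyed by the root
    # container path; just the shape of the idea, not a proposed implementation.
    from collections import OrderedDict

    class ShardRangeLRU(object):
        def __init__(self, max_entries=1000):
            self.max_entries = max_entries
            self._cache = OrderedDict()  # 'account/container' -> shard ranges

        def get(self, root_path):
            try:
                self._cache.move_to_end(root_path)  # mark as most recently used
                return self._cache[root_path]
            except KeyError:
                return None  # caller falls back to a container GET

        def put(self, root_path, shard_ranges):
            self._cache[root_path] = shard_ranges
            self._cache.move_to_end(root_path)
            while len(self._cache) > self.max_entries:
                self._cache.popitem(last=False)  # evict least recently used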

Revision history for this message
Matthew Oliver (matt-0) wrote :

I like the idea of caching in the proxy, as the other options leave it to the sharder to move the updates later, i.e. they put the work off. In the sharded case, "most" of the time the updates will go to the correct place, and when they don't, we're just deferring them until later.

I don't think invalidation is necessary: if things go to the wrong place before the cache time/limit is up, then we're just giving the sharder some work.
If we _want_ invalidation, we could get get_container_info to include the latest shard range timestamp of the container (timestamp, not meta_timestamp, as we'd only care if there was a structural change in the shard network). When this timestamp is greater than the timestamp of the root container in the cache, we invalidate it and force a new GET.
The downside is that the info calls are also cached, so is it really that important to invalidate then?
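
A sketch of that timestamp-based check, assuming get_container_info() were extended with a (currently hypothetical) latest shard range timestamp field:

    # Hypothetical staleness check; 'latest_shard_range_timestamp' is NOT a
    # real get_container_info() field today, it's the extension suggested above.
    from swift.common.utils import Timestamp

    def shard_cache_is_stale(cached_root_timestamp, container_info):
        latest = container_info.get('latest_shard_range_timestamp')
        if latest is None:
            return False  # no signal; rely on the normal cache expiry
        # Only 'timestamp' (structural changes) matters here, not 'meta_timestamp'.
        return Timestamp(latest) > Timestamp(cached_root_timestamp)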

The real problem with shard caching is what to cache. If you look at the object controller in the proxy, we only ask for the shard the object resides in (https://github.com/openstack/swift/blob/2.18.0/swift/proxy/controllers/obj.py#L269-L281). That isn't a good cache candidate, since it _should_ mostly return just a single shard.
And then there's the problem of what happens when there are thousands of shards? Maybe in that case sending an update to the wrong one isn't too bad a cost?
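
For reference, that per-object lookup boils down to a shard-range GET against the root container scoped to a single object name; an illustrative direct request (host, device, partition, and names are placeholders) might look like:

    # Illustrative backend request mirroring the proxy's per-object shard lookup.
    import requests

    resp = requests.get(
        'http://127.0.0.1:6201/sdb1/1234/AUTH_test/sharded_container',
        headers={'X-Backend-Record-Type': 'shard'},  # ask for shard ranges
        params={'states': 'updating',                # ranges accepting updates
                'includes': 'some/object/name',      # only the range covering this name
                'format': 'json'})
    shard_ranges = resp.json()  # normally a single shard range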

So what do we cache? We could cache the results of container GETs... but how often does that happen?

We should get the sharder to update some memcache cache that the proxy can use... but then what do we cache? We don't want to cache _everything_, only the hot containers, I'd expect.

Anyway, just thought I'd put some comments in as I spend some time thinking about it.

Maybe caching isn't the answer then? Or is there some way to identify the top containers to cache? Proxies update a hotlist in memcache; sharders/replicators/auditors (something) keep caches of shards?

/me is just brainstorming

Revision history for this message
Matthew Oliver (matt-0) wrote :

s/We should get the sharder to update some memcache cache/We could get the sharder to update some memcache cache/

Revision history for this message
clayg (clay-gerrard) wrote :

I agree caching is hard, but we need to keep thinking about it. The caching doesn't have to be in the proxy-server process or memcache.

I'm still concerned that the PUT rate to a single replicated but unsharded container's objects table (even accounting for .pending file bulk inserts) is faster than the GET rate against a single replicated but sharded root container's shard range table. It doesn't make sense to me that this obviously needs to be true.

I really want to do some experimental analysis, exercising the container-server API directly while varying concurrency and cardinality, to provide some data on the cost of these operations so we can all look at it.

It's possible the solution is to index the shard ranges table, tweak vfs_cache_pressure, and let the page cache do its thing.
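
Something along these lines, for the "just index it" experiment (the DB path and column names are illustrative; check the actual container schema before trying this on a real cluster):

    # Sketch of the experiment: index the shard_range table of a root container
    # DB and let the OS page cache keep it hot. Path and columns are placeholders.
    import sqlite3

    DB_PATH = '/srv/node/sdb1/containers/PART/SUFFIX/HASH/HASH.db'

    conn = sqlite3.connect(DB_PATH)
    conn.execute('CREATE INDEX IF NOT EXISTS ix_shard_range_bounds '
                 'ON shard_range (lower, upper)')
    conn.commit()
    conn.close()

    # ...and on the container-server nodes, make the kernel less eager to
    # reclaim dentry/inode caches, e.g.:
    #   sysctl -w vm.vfs_cache_pressure=50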

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.opendev.org/667203
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=a1af3811a79963e2e5d1db3e5588cbc9748f9d57
Submitter: Zuul
Branch: master

commit a1af3811a79963e2e5d1db3e5588cbc9748f9d57
Author: Tim Burke <email address hidden>
Date: Mon Jun 24 12:25:33 2019 -0700

    sharding: Cache shard ranges for object writes

    Previously, we issued a GET to the root container for every object PUT,
    POST, and DELETE. This puts load on the container server, potentially
    leading to timeouts, error limiting, and erroneous 404s (!).

    Now, cache the complete set of 'updating' shards, and find the shard for
    this particular update in the proxy. Add a new config option,
    recheck_updating_shard_ranges, to control the cache time; it defaults to
    one hour. Set to 0 to fall back to previous behavior.

    Note that we should be able to tolerate stale shard data just fine; we
    already have to worry about async pendings that got written down with
    one shard but may not get processed until that shard has itself sharded
    or shrunk into another shard.

    Also note that memcache has a default value limit of 1MiB, which may be
    exceeded if a container has thousands of shards. In that case, set()
    will act like a delete(), causing increased memcache churn but otherwise
    preserving existing behavior. In the future, we may want to add support
    for gzipping the cached shard ranges as they should compress well.

    Change-Id: Ic7a732146ea19a47669114ad5dbee0bacbe66919
    Closes-Bug: 1781291
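
A minimal sketch of the behaviour this change describes, i.e. cache the full 'updating' listing and resolve the shard for a given object name in the proxy; the cache key, field handling, and the gzip step (only mentioned as future work above) are illustrative, not the merged implementation:

    # Sketch only: cache the whole 'updating' listing, pick the shard locally.
    import json
    import zlib

    def cache_updating_shards(memcache, root_path, shard_ranges, time=3600):
        # memcache values top out around 1 MiB; compressing the JSON (the
        # future work noted in the commit message) keeps big listings smaller.
        blob = zlib.compress(json.dumps(shard_ranges).encode('utf-8'))
        memcache.set('shard-updating/%s' % root_path, blob, time=time)

    def find_update_shard(cached_ranges, obj_name):
        # cached_ranges are sorted by upper bound; '' means "no upper bound"
        for sr in cached_ranges:
            if sr['upper'] == '' or obj_name <= sr['upper']:
                return sr
        return None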

Changed in swift:
status: New → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/losf)

Fix proposed to branch: feature/losf
Review: https://review.opendev.org/671181

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/losf)

Reviewed: https://review.opendev.org/671181
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=ea5cbf633cfa56a418a79cde4ccfea0c1f6f3f92
Submitter: Zuul
Branch: feature/losf

commit a1af3811a79963e2e5d1db3e5588cbc9748f9d57
Author: Tim Burke <email address hidden>
Date: Mon Jun 24 12:25:33 2019 -0700

    sharding: Cache shard ranges for object writes

    Previously, we issued a GET to the root container for every object PUT,
    POST, and DELETE. This puts load on the container server, potentially
    leading to timeouts, error limiting, and erroneous 404s (!).

    Now, cache the complete set of 'updating' shards, and find the shard for
    this particular update in the proxy. Add a new config option,
    recheck_updating_shard_ranges, to control the cache time; it defaults to
    one hour. Set to 0 to fall back to previous behavior.

    Note that we should be able to tolerate stale shard data just fine; we
    already have to worry about async pendings that got written down with
    one shard but may not get processed until that shard has itself sharded
    or shrunk into another shard.

    Also note that memcache has a default value limit of 1MiB, which may be
    exceeded if a container has thousands of shards. In that case, set()
    will act like a delete(), causing increased memcache churn but otherwise
    preserving existing behavior. In the future, we may want to add support
    for gzipping the cached shard ranges as they should compress well.

    Change-Id: Ic7a732146ea19a47669114ad5dbee0bacbe66919
    Closes-Bug: 1781291

commit 0ae1ad63c10ea8643fc0998dca159b844286c414
Author: zengjia <email address hidden>
Date: Wed Jul 10 15:46:32 2019 +0800

    Update auth_url in install docs

    Beginning with the Queens release, the keystone install guide
    recommends running all interfaces on the same port. This patch
    updates the swift install guide to reflect that change

    Change-Id: Id00cfd2c921da352abdbbbb6668b921f3cb31a1a
    Closes-bug: #1754104

commit f4bb1bea28640dc603da891559387d6b15f1f2da
Author: Tim Burke <email address hidden>
Date: Wed Jul 10 23:48:39 2019 -0700

    reconciler: Enqueue right work for shard containers

    This fixes newly-enqueued work going forward, but doesn't offer anything
    to try to parse existing bad work items or even to kick shards so they
    reset their reconciler high-water marks.

    Change-Id: I79d20209cea70a6447c4e94941e5e854889cbec5
    Closes-Bug: 1836082

commit c1d170225281a39973dda1b8e46f5b3b5c943d1a
Author: Tim Burke <email address hidden>
Date: Wed Jul 10 15:37:44 2019 -0700

    functests: Make test_PUT_metadata less flakey

    Change-Id: I846e746c2fe7591a3ab502428f587e3cbe753225

commit b10f4bae28bec9c0c394c340bf813a28ac8a3380
Author: Tim Burke <email address hidden>
Date: Tue Jun 18 09:54:02 2019 -0700

    func tests: tolerate more 404s when deleting

    Change-Id: I3129e4f94ac39964f2f17ea05b6b2dd807fba82a

commit 557335465561b7a00d08cc5a370d5fcd6e7d83b0
Author: Tim Burke <email address hidden>
Date: Sat Jun 1 10:46:54 2019 -0700

    Move calls to self.app outside of error handling

    On p...


tags: added: in-feature-losf
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.22.0

This issue was fixed in the openstack/swift 2.22.0 release.
