sharding: container GETs to root container get slow
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Fix Released | Medium | Unassigned |
Bug Description
During an object PUT to a sharded container, the proxy does a GET to the container server to fetch the shard ranges. This container GET becomes very slow under high concurrency.
To reproduce, create a container with a lot of shards, then start concurrently PUTting new objects into it. As concurrency increases, the container GET gets very slow.
There are a few ideas for fixing this (none of them final):
* Cache the shard ranges in the proxy. Since they are unbounded, memcache isn't likely a good place; maybe some in-process memory could be used, with some sort of LRU on it. But then you've got cache invalidation issues...
* Defer the child DB update until later. Only tell the object server to update the root, and let the updater handle the redirects. Also let the root DB enqueue child updates.
* Instead of updating the root, send the updates to some number of the root's handoffs, so that they get sharded in place and the sharder handles getting the updates to the right place.
* Don't do any of the deferring in the server and instead slow down the client (i.e. backpressure).
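The first option could look something like the following minimal sketch. Everything here is an assumption, not anything Swift actually ships: the class name, the size bound, and the TTL are all made up for illustration. It's just an in-process, size-bounded LRU mapping container paths to their shard range listings.

```python
import time
from collections import OrderedDict


class ShardRangeCache:
    """Hypothetical in-process LRU cache of per-container shard
    ranges, with a TTL so stale entries eventually age out.
    Sketch only -- not Swift's actual implementation."""

    def __init__(self, max_containers=1000, ttl=300):
        self.max_containers = max_containers
        self.ttl = ttl
        # container path -> (expiry time, shard ranges), in LRU order
        self._cache = OrderedDict()

    def get(self, container_path):
        entry = self._cache.get(container_path)
        if entry is None:
            return None
        expiry, shard_ranges = entry
        if time.time() > expiry:
            # entry aged out; drop it and force a fresh container GET
            del self._cache[container_path]
            return None
        # refresh LRU position on a hit
        self._cache.move_to_end(container_path)
        return shard_ranges

    def set(self, container_path, shard_ranges):
        self._cache[container_path] = (time.time() + self.ttl, shard_ranges)
        self._cache.move_to_end(container_path)
        # evict least-recently-used entries beyond the size bound
        while len(self._cache) > self.max_containers:
            self._cache.popitem(last=False)
```

The size bound matters because shard range listings are unbounded, which is the reason memcache was ruled out above; the TTL is what bounds how long misdirected updates can pile up for the sharder.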
I like the idea of caching in the proxy; the other options just leave it to the sharder to move the updates, i.e. they put the work off. With caching, "most" of the time the updates will go to the correct place, and when they don't, we're only putting the work off until later.
I don't think invalidation is necessary, because if things go to the wrong place before the cache time/limit is up, then we're just giving the sharder some work.
If we _want_ invalidation, we could have get_container_info include the latest shard range timestamp of the container (timestamp, not meta_timestamp, as we'd only care if there was a structural change in the shard network). When this timestamp is greater than the timestamp of the root container in the cache, we invalidate it and force a new GET.
The downside is that the info calls are also cached, so is it really important to invalidate then?
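That timestamp check could be sketched roughly like this, assuming a hypothetical `shard_timestamp` field were added to the dict that get_container_info returns (no such field exists today; the cache structure and function names are made up too):

```python
# Hypothetical per-process cache: container path -> (shard_timestamp, ranges).
# 'shard_timestamp' stands in for the proposed extra field in the container
# info dict; get_container_info does NOT return it today.
_shard_cache = {}


def cache_shard_ranges(container_path, shard_timestamp, shard_ranges):
    _shard_cache[container_path] = (shard_timestamp, shard_ranges)


def get_cached_shard_ranges(container_path, container_info):
    """Return cached shard ranges, unless the container info reports a
    structural shard-range timestamp newer than the one we cached."""
    entry = _shard_cache.get(container_path)
    if entry is None:
        return None
    cached_ts, shard_ranges = entry
    info_ts = container_info.get('shard_timestamp')
    if info_ts is not None and info_ts > cached_ts:
        # structural change in the shard network: invalidate and re-GET
        del _shard_cache[container_path]
        return None
    return shard_ranges
```

Swift's timestamps are zero-padded strings, so the plain `>` comparison works; but as noted below, the info call this relies on is itself cached, so the invalidation would lag anyway.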
The real problem with shard caching is deciding what to cache. If you look at the object controller in the proxy, we only ask for the shard in which the object resides (https://github.com/openstack/swift/blob/2.18.0/swift/proxy/controllers/obj.py#L269-L281). That isn't a good cache candidate, as it _should_ mostly return just a single shard.
And then there's the problem of what happens when there are thousands of shards? Maybe in that case sending to the wrong one isn't too bad a cost?
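For context, the per-object lookup the proxy needs is essentially "find the one range whose namespace interval contains this object name", which is cheap even with thousands of shards if the full listing is at hand. A simplified, hypothetical version, with shard ranges reduced to (lower, upper) marker pairs covering the interval (lower, upper] and sorted by upper bound (not Swift's real ShardRange objects):

```python
from bisect import bisect_left

# Sorts after any real object name; stands in for an unbounded upper marker.
MAX_MARKER = '\U0010ffff'


def find_shard_range(shard_ranges, obj_name):
    """Return the (lower, upper) pair whose interval (lower, upper]
    contains obj_name, or None. shard_ranges must be sorted by upper
    bound; '' as a bound means 'unbounded' on that side."""
    uppers = [upper or MAX_MARKER for _, upper in shard_ranges]
    # first range whose upper bound is >= obj_name
    i = bisect_left(uppers, obj_name)
    if i < len(shard_ranges):
        lower, upper = shard_ranges[i]
        if obj_name > lower:
            return shard_ranges[i]
    return None
```

The binary search makes a cached full listing O(log n) per PUT, which is the argument for caching whole listings rather than single-shard query results.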
So what do we cache? We could cache the results of container GETs... but how often do those happen?
We could get the sharder to update some memcache entry that the proxy can use... but then, what to cache? We don't want to cache _everything_, only the hot containers, I'd expect.
Anyway, just thought I'd put some comments in while I spend some time thinking about it.
Maybe caching isn't the answer then? Or is there some way to identify the top containers to cache? Proxies update a hotlist in memcache; sharders/replicators/auditors (something) keep caches of shards?
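A hotlist could be as simple as a per-proxy counter of recent container updates. This sketch (all names hypothetical, and using plain in-process counts rather than the memcache version suggested above) just exposes the current top-N for something else to decide to cache:

```python
from collections import Counter


class HotContainerTracker:
    """Hypothetical 'hotlist': count per-container update traffic seen
    by a proxy and report the top-N, which a sharder/replicator/etc.
    could use to decide whose shard ranges are worth caching."""

    def __init__(self, top_n=10):
        self.top_n = top_n
        self.counts = Counter()

    def record(self, container_path):
        # call on every object update routed through this proxy
        self.counts[container_path] += 1

    def hottest(self):
        return [path for path, _ in self.counts.most_common(self.top_n)]
```

A real version would need some decay or windowing so yesterday's hot container doesn't stay on the list forever, and a shared store (e.g. memcache) if multiple proxies are to agree on the list.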
/me is just brainstorming