Comment 1 for bug 1781291

Matthew Oliver (matt-0) wrote :

I like the idea of caching in the proxy, as the other options amount to leaving it to the sharder to move misplaced updates, i.e. just putting the work off. In the sharded case the updates will go to the correct place "most" of the time, and when they don't, we're only putting it off until later.

I don't think invalidation is necessary: if things go to the wrong place before the cache time/limit is up then we're just giving the sharder some work.
If we _want_ invalidation, we could get get_container_info to include the latest shard range timestamp of the container (timestamp, not meta_timestamp, as we'd only care if there was a structural change in the shard network). When this timestamp is greater than the timestamp of the root container entry in the cache, we invalidate it and force a new GET.
The downside is that the info calls are also cached, so is it really that important to invalidate?
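To make that concrete, roughly what I'm picturing (the cache layout, the 'latest_shard_timestamp' key and the helper name are all made up for illustration, not existing Swift code; 'cache' stands in for a memcache client with get()/delete()):

def get_cached_shard_ranges(cache, cache_key, container_info):
    """Return cached shard ranges, or None if the caller should do a fresh GET.

    Assumes container_info carries the latest shard range *timestamp* (not
    meta_timestamp), so it only moves on a structural change, and that the
    cache entry remembers the timestamp it was built against.
    """
    entry = cache.get(cache_key)
    if entry is None:
        return None
    latest_ts = container_info.get('latest_shard_timestamp')  # hypothetical key
    if latest_ts is not None and latest_ts > entry['root_timestamp']:
        # Structural change since we cached: invalidate and force a new GET.
        cache.delete(cache_key)
        return None
    return entry['shard_ranges']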

The real problem with shard caching is what to cache. If you look at the object controller in the proxy, we only ask for the shard that the object resides in (https://github.com/openstack/swift/blob/2.18.0/swift/proxy/controllers/obj.py#L269-L281). That isn't a good cache candidate, as it _should_ mostly just return a single shard.
And then there's the problem of what happens when there are thousands of shards. Maybe in that case sending an update to the wrong one isn't too bad a cost?

So what do we cache? We could cache the results of container GETs... but how often do those happen?
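If we did cache a container's full shard range listing (however it gets there), the single-object case above becomes a local lookup instead of a GET to the root container. A rough sketch, with the data shapes made up:

def find_shard(shard_ranges, obj_name):
    """shard_ranges: list of (lower, upper, shard_container) tuples sorted by
    upper, where a range owns names with lower < name <= upper and the last
    range may use upper=None to mean 'no upper bound'.

    A linear scan is fine for a sketch; with thousands of shards you'd bisect
    on the sorted upper bounds instead.
    """
    for lower, upper, shard_container in shard_ranges:
        if obj_name > lower and (upper is None or obj_name <= upper):
            return shard_container
    return None

# e.g. find_shard([('', 'cat', 'shard-0'), ('cat', 'dog', 'shard-1'),
#                  ('dog', None, 'shard-2')], 'dingo') -> 'shard-1'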

We should get the sharder to update some memcache entries that the proxy can use... but then what do we cache? We don't want to cache _everything_, only the hot containers, I'd expect.
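For the sharder-populates-memcache idea, the entry it writes could be as simple as this (key and value shapes are invented for illustration; 'cache' again stands in for a memcache client):

def cache_shard_ranges(cache, account, container, root_timestamp,
                       shard_ranges, ttl=300):
    """shard_ranges: e.g. a list of (lower, upper, shard_container) tuples,
    matching whatever the proxy-side lookup expects.

    A short TTL bounds how long misdirected updates pile up for the sharder
    if the cached picture goes stale before anything invalidates it.
    """
    key = 'shard-ranges/%s/%s' % (account, container)
    value = {'root_timestamp': root_timestamp, 'shard_ranges': shard_ranges}
    # Assumes the cache client serializes values itself (most memcache
    # clients do); otherwise json.dumps the value first.
    cache.set(key, value, time=ttl)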

Anyway, just thought I'd put some comments in while I spend some time thinking about it.

Maybe caching isn't the answer then? Or is there some way to identify the top containers to cache? Proxies update a hotlist in memcache; sharders/replicators/auditors (something) keep caches of shards?
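Something like the hotlist could be pretty cheap, e.g. (threshold, window, key names and the incr semantics are all made up; 'cache' is a stand-in for a memcache client):

HOT_THRESHOLD = 1000   # updates per window before a container counts as "hot"
WINDOW = 60            # seconds; counters expire so the hotlist stays fresh

def record_update(cache, account, container):
    """Proxy side: bump a counter on each object update to a sharded container.

    Assumes incr() creates the key if it's missing and returns the new count.
    """
    key = 'hotlist/%s/%s' % (account, container)
    return cache.incr(key, delta=1, time=WINDOW)

def is_hot(cache, account, container):
    """Sharder/replicator/auditor side: only keep shard ranges cached for
    containers that have crossed the threshold recently."""
    count = cache.get('hotlist/%s/%s' % (account, container))
    return count is not None and int(count) >= HOT_THRESHOLD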

/me is just brainstorming