get_container_info 503s shouldn't try to clear memcache
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Fix Released | Undecided | Unassigned |
Bug Description
I had a container server get overloaded, which caused a thundering-herd problem when its info fell out of cache. That deserves its own bug, but it had some weird side effects; in particular, my graph of memcache sets went *way up*, which seemed odd.
Tracing through get_container_info and set_info_cache, we can see why, though:
[0] we check cache, but it's a miss
[1] so we go to origin, but we get a 503
[2] we then check cache again, only that's still not populated
so we make an info dict based on the response and see about caching it. Now
[3] we have a response in hand, and it's unsuccessful
[4] so we kill the key in memcache (even though we *just checked* and had a miss!)
That seems useless, but it's actually *worse* than useless: if you've got a thousand or so clients all trying to get container info for that DB, some of them might actually have been able to get through. If they managed to set info in memcache between our second get and our kill, we just destroyed valuable information that would've helped the next guy.
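To make the race concrete, here's a minimal, self-contained Python model of the pre-fix flow. `FakeMemcache` and the `fetch_from_origin`/`make_info` helpers are illustrative stand-ins, not Swift's actual code; the bracketed comments map back to the steps above.

```python
# Illustrative model of the pre-fix flow; FakeMemcache and the helper
# functions below are stand-ins, not Swift's real classes.

class FakeMemcache(object):
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value, time=0):
        self.store[key] = value

    def delete(self, key):
        self.store.pop(key, None)


def fetch_from_origin(account, container):
    # Stand-in for the proxy's HEAD to the container server; here we
    # pretend the server is overloaded and always returns 503.
    return 503, {}


def make_info(status, headers):
    # Stand-in for building the info dict from the response.
    return {'status': status}


def set_info_cache(memcache, key, info):
    if not 200 <= info['status'] < 300:
        # [4] Pre-fix behavior: an unsuccessful response evicts the key,
        # even though we just confirmed it was a miss. If a concurrent
        # request managed a successful set in the meantime, this delete
        # destroys it.
        memcache.delete(key)
        return
    memcache.set(key, info, time=60)


def get_container_info(memcache, account, container):
    key = 'container/%s/%s' % (account, container)
    info = memcache.get(key)                       # [0] cache check: miss
    if info is None:
        status, headers = fetch_from_origin(account, container)  # [1] 503
        info = memcache.get(key)                   # [2] still not populated
        if info is None:
            info = make_info(status, headers)      # [3] unsuccessful info
            set_info_cache(memcache, key, info)    # [4] kills the key
    return info
```

Run this with a thousand concurrent callers and any successful `set` that lands between steps [2] and [4] gets wiped out by the delete.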
[0] https:/
[1] https:/
[2] https:/
[3] https:/
[4] https:/
Reviewed: https://review.opendev.org/735359
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=7be7cc966b5a98a35cbbccfc426a4efb65583c12
Submitter: Zuul
Branch: master
commit 7be7cc966b5a98a35cbbccfc426a4efb65583c12
Author: Tim Burke <email address hidden>
Date: Fri Jun 12 08:35:06 2020 -0700
proxy: Stop killing memcache entries on 5xx responses
When you've got a container server that's struggling to respond to HEAD
requests, it's really not helpful to keep evicting the cache when some
of the hundreds of concurrent requests trying to repopulate the cache
return 503.
Change-Id: I49174a21a854a4e8e564a7bbf997e1841f9dda71
Closes-Bug: #1883211
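In terms of the sketch above (again illustrative, not the literal patch), the fix boils down to leaving memcache alone on unsuccessful responses instead of deleting the key:

```python
def set_info_cache(memcache, key, info):
    if not 200 <= info['status'] < 300:
        # Post-fix behavior: don't evict. A concurrent request may have
        # just cached good info, and deleting it would only feed the
        # thundering herd. (Swift still caches authoritative 404s for a
        # short time; that detail is elided here.)
        return
    memcache.set(key, info, time=60)
```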