get_container_info (or our use of it) needs to be smarter in the face of failures

Bug #1883214 reported by Tim Burke
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

... because currently, Swift can do some pretty stupid things: http://paste.openstack.org/show/794669/

Container quota middleware tries to get container info, but the container's overloaded. Primaries are error limited, so it goes 6 deep into handoffs. Doesn't find anything; 503. That's OK, CQ says, "this will hopefully 404 later".

So we keep going down the pipeline. Versioned writes sees the request, tries to get container info. Container's still overloaded, we still go out to handoffs, we still get a 503. That's OK, VW says, I guess old-style versioning isn't enabled.

So we keep going down the pipeline. Object versioning sees the request, tries to get container info. Container's still overloaded, we still go out to handoffs, we still get a 503. That's OK, OV says, I guess new-style versioning isn't enabled.

So we keep going down the pipeline. Proxy server sees the request, tries to get container info, blah blah blah. *Finally* we can respond with a 503 to the client! After 24 useless backend requests (4 lookups, each going 6 deep into handoffs) :-(
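
The arithmetic above can be reproduced with a small stand-alone simulation. This is only a sketch and does not use Swift's real code: `OverloadedBackend`, `lookup_container_info`, `handle_request`, and the `remember_failures` flag are all hypothetical names, and remembering a failed lookup in the per-request environment is just one possible way of being "smarter", not necessarily what Swift does.

```python
HANDOFF_DEPTH = 6  # handoffs tried per lookup, as described in the report

# The four layers that each ask for container info in the report.
LAYERS = ("container_quotas", "versioned_writes",
          "object_versioning", "proxy_server")


class OverloadedBackend:
    """Hypothetical container backend that answers every request with 503."""

    def __init__(self):
        self.requests = 0

    def head_container(self):
        self.requests += 1
        return 503


def lookup_container_info(env, backend, remember_failures):
    """Hypothetical info lookup with a per-request cache kept in env."""
    cache = env.setdefault("infocache", {})
    if "container" in cache:
        return cache["container"]
    status = 503
    for _ in range(HANDOFF_DEPTH):
        status = backend.head_container()
        if 200 <= status < 300:
            break
    info = {"status": status}
    # Assumption: a "smarter" lookup also remembers failures for the
    # remainder of this one client request.
    if remember_failures or 200 <= status < 300:
        cache["container"] = info
    return info


def handle_request(remember_failures):
    """Run one client request through all layers; return backend requests made."""
    backend = OverloadedBackend()
    env = {}
    for _layer in LAYERS:
        lookup_container_info(env, backend, remember_failures)
    return backend.requests


print(handle_request(remember_failures=False))  # 24
print(handle_request(remember_failures=True))   # 6
```

With failures forgotten, every layer repeats the same six doomed requests; with the failure remembered for the lifetime of the client request, only the first layer pays for them.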

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to swift (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/swift/+/875819

OpenStack Infra (hudson-openstack) wrote : Related fix merged to swift (master)

Reviewed: https://review.opendev.org/c/openstack/swift/+/875819
Committed: https://opendev.org/openstack/swift/commit/b68cc893f74bf61a03439549d1ca5412cfc8d0ff
Submitter: "Zuul (22348)"
Branch: master

commit b68cc893f74bf61a03439549d1ca5412cfc8d0ff
Author: Tim Burke <email address hidden>
Date: Tue Feb 28 23:03:23 2023 -0800

    proxy: Reduce round-trips to memcache and backend on info misses

    Following a memcache restart in a SAIO, I've seen the following happen
    during an object HEAD:

    - etag_quoter wants to get account/container info to decide whether to
      quote-wrap or not
    - account info is a cache miss, so we make a no-auth'ed HEAD to the next
      filter in the pipeline
    - eventually this gets down to ratelimit, which *also* wants to get
      account info
    - still a cache miss, so we make a *separate* HEAD that eventually talks
      to the backend and populates cache
    - ratelimit realizes it can't ratelimit the request and lets the
      original HEAD through to the backend

    There's a related bug about how something similar can happen when the
    backend gets overloaded, but *everything is working* -- we just ought to
    be talking straight to the proxy app.

    Note that there's likely something similar going on with container info,
    but the hardcoded 10% sampling rate makes it harder to see if you're
    monitoring raw metric streams.

    I thought I fixed this in the related change, but no :-/

    Change-Id: I49447c62abf9375541f396f984c91e128b8a05d5
    Related-Change: If9249a42b30e2a2e7c4b0b91f947f24bf891b86f
    Related-Bug: #1883214
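
The cache-miss sequence in the commit message can be modelled roughly as below. This is a toy model under stated assumptions, not Swift's implementation: `ProxyApp`, `RateLimit`, `EtagQuoter`, `get_account_info`, and `count_backend_heads` are hypothetical stand-ins, and the `direct` switch only illustrates the idea of "talking straight to the proxy app" instead of sending the info request through the remaining filters.

```python
class ProxyApp:
    """Stand-in for the proxy app at the end of the pipeline."""

    def __init__(self, cache):
        self.cache = cache
        self.backend_heads = 0

    def head_account(self):
        # Every HEAD that reaches the proxy app goes to the backend
        # and populates the cache.
        self.backend_heads += 1
        info = {"status": 200}
        self.cache["account"] = info
        return info


def get_account_info(target, cache):
    """Return cached account info, or issue a HEAD to `target` on a miss."""
    if "account" in cache:
        return cache["account"]
    return target.head_account()


class RateLimit:
    """Stand-in filter that wants account info for every request it sees."""

    def __init__(self, app, cache):
        self.app = app
        self.cache = cache

    def head_account(self):
        # On a miss this issues its own separate HEAD...
        get_account_info(self.app, self.cache)
        # ...and then still lets the original HEAD through to the backend.
        return self.app.head_account()


class EtagQuoter:
    """Stand-in filter that needs account info during an object HEAD."""

    def __init__(self, info_target, cache):
        self.info_target = info_target
        self.cache = cache

    def head_object(self):
        return get_account_info(self.info_target, self.cache)


def count_backend_heads(direct):
    """Count backend account HEADs for one object HEAD on a cold cache."""
    cache = {}
    proxy = ProxyApp(cache)
    ratelimit = RateLimit(proxy, cache)
    # direct=True: info requests skip the remaining filters entirely.
    info_target = proxy if direct else ratelimit
    EtagQuoter(info_target, cache).head_object()
    return proxy.backend_heads


print(count_backend_heads(direct=False))  # 2
print(count_backend_heads(direct=True))   # 1
```

In this model the extra round-trip comes purely from the info request re-entering the filter chain; nothing is failing, which matches the commit message's point that everything is working and only the routing is wasteful.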
