Overloaded object primaries cause 404s on GET
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Fix Released | Medium | Tim Burke |
Bug Description
If a proxy sees
(Timeout, Timeout, Timeout, 404, 404, 404)
for a triple-replicated object GET, at the moment, we take *a* response over *no* response and return 404 to the client.
This seems like a poor choice, though: we have no reason to expect the data to be on handoffs! *Maybe* the 404 would be OK if we could assume that all data would have been written recently (i.e., while the primaries were unavailable) -- but it seems like a much safer assumption to think that if there's data, it would have been written long enough ago that the replicators would have gotten it settled on the primaries. The correct response (if the primaries weren't overloaded) may *still be* a 404 -- but the system isn't healthy enough for us to say either way with any confidence. 503 is more appropriate.
(That is, assuming those 404s aren't from tombstones -- if there's a timestamp, we *should* give that 404 consideration.)
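The decision rule described above can be sketched as a small function. This is illustrative only, not Swift's actual proxy code: `best_response` and the `(status, from_handoff, timestamp)` shape are invented here to model the idea that a 404 only "counts" toward quorum if it came from a primary or carries a tombstone timestamp.

```python
# Hypothetical sketch (NOT Swift's real internals): choose a client status
# for a replicated GET, discounting timestamp-less 404s from handoffs.

def best_response(responses, replica_count=3):
    """responses: list of (status, from_handoff, timestamp) tuples.

    Timeouts can be modeled as status 0. A 404 is "authoritative" only if
    it came from a primary, or if it carries a timestamp (a real tombstone).
    """
    for status, from_handoff, timestamp in responses:
        if 200 <= status < 300:
            return status  # any success wins outright

    authoritative_404s = sum(
        1 for status, from_handoff, timestamp in responses
        if status == 404 and (not from_handoff or timestamp is not None))

    quorum = replica_count // 2 + 1
    if authoritative_404s >= quorum:
        return 404  # enough *real* "not found" answers
    return 503      # primaries unreachable: apply back-pressure instead
```

With this rule, the bug's example of three primary timeouts plus three handoff 404s yields a 503, while three primary 404s (or handoff 404s backed by tombstone timestamps) still yield an honest 404.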
The story gets even funkier with EC -- on a 3+2 policy, you could get something like
(frag#0#d.data, frag#1#d.data, Timeout, Timeout, Timeout, 404, 404, 404)
so *we see durable data* and yet still 404. If we know we should be able to reconstruct but can't right now, 503 seems like the *only* appropriate response.
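The EC case can be sketched the same way. Again, this is a toy model with invented names (`ec_get_status`, string-tagged responses), not Swift's reconstructor or proxy logic: the point is simply that seeing *any* durable fragment, while having fewer than `ndata`, rules out an honest 404.

```python
# Hypothetical sketch of the EC argument: on a 3+2 policy we need ndata=3
# fragments to reconstruct. Responses are modeled as 'frag' (durable
# fragment), 'timeout', or '404'.

def ec_get_status(responses, ndata=3):
    frags = responses.count('frag')
    if frags >= ndata:
        return 200  # enough fragments: the object can be served
    if frags:
        return 503  # durable data exists but can't be rebuilt right now
    quorum = len(responses) // 2 + 1
    if responses.count('404') >= quorum:
        return 404  # a genuine majority says "not found"
    return 503
```

For the (frag, frag, Timeout, Timeout, Timeout, 404, 404, 404) example above, two durable fragments are visible but three are needed, so this sketch returns 503 rather than 404.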
Changed in swift:
importance: Undecided → High
importance: High → Medium
assignee: nobody → Tim Burke (1-tim-z)

Changed in swift:
status: New → In Progress
Reviewed: https://review.opendev.org/672186
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=3189410f9d5be3a845dbd47fdd77584f4b76da8a
Submitter: Zuul
Branch: master
commit 3189410f9d5be3a845dbd47fdd77584f4b76da8a
Author: Tim Burke <email address hidden>
Date: Mon Jul 22 12:38:30 2019 -0700
Ignore 404s from handoffs for objects when calculating quorum
We previously realized we needed to do that for accounts and containers,
where the consequences of treating the 404 as authoritative were more
obvious: we'd cache the non-existence, which prevented writes until it
fell out of cache.
The same basic logic applies for objects, though: if we see
(Timeout, Timeout, Timeout, 404, 404, 404)
on a triple-replica policy, we don't really have any reason to think
that a 404 is appropriate. In fact, it seems reasonably likely that
there's a thundering-herd problem where there are too many concurrent
requests for data that *definitely is there*. By responding with a 503,
we apply some back-pressure to clients, who hopefully have some
exponential backoff in their retries.
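The back-pressure argument assumes clients retry 503s with exponential backoff. A minimal, generic client-side sketch (not a Swift client; `get_with_backoff` and `fetch` are invented for illustration):

```python
import random
import time

def get_with_backoff(fetch, max_retries=5, base=0.5):
    """Call fetch() until it returns a non-503 status, sleeping
    base * 2**attempt (plus jitter) between attempts."""
    for attempt in range(max_retries):
        status, body = fetch()
        if status != 503:
            return status, body
        # exponential backoff with jitter before the next attempt
        time.sleep(base * (2 ** attempt) + random.uniform(0, base))
    return status, body
```

A 404, by contrast, is typically treated as final by clients, which is exactly why returning it when the primaries are merely overloaded is harmful.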
The situation gets a bit more complicated with erasure-coded data, but
the same basic principle applies. We're just more likely to have
confirmation that there *is* data out there; we just can't reconstruct
it (right now).
Note that we *still want to check* those handoffs, of course. Our
fail-in-place strategy has us replicate (and, more recently,
reconstruct) to handoffs to maintain durability; it'd be silly *not* to
look.
UpgradeImpact:
--------------
Be aware that this may cause an increase in 503 Service Unavailable
responses served by proxy-servers. However, this should more accurately
reflect the state of the system.
Co-Authored-By: Thiago da Silva <email address hidden>
Change-Id: Ia832e9bab13167
Closes-Bug: #1837819
Related-Bug: #1833612
Related-Change: I53ed04b5de20c2
Related-Change: Ief44ed39d97f65