Return fast 404's in multi-region cluster

Bug #1462527 reported by Paul T Burke
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned

Bug Description

In a multi-region cluster, a request that results in a 404 goes over the wire to check all remote disks and secondary locations before returning to the client. Ideally, when read affinity is enabled there would be an additional setting to check only the local region, rather than all regions, for the existence of an object, so that failures return very quickly. In a multi-region replication scenario such a setting would eliminate a great deal of over-the-wire latency to the remote regions when performing the lookup.

As Swift clusters continue to extend across multiple regions with object replicas in those regions, customers are expecting the same response times for their applications. Allowing the ability to fail fast would reduce this additional multi-region overhead for 404's.
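For context, this is roughly how read affinity is configured in proxy-server.conf today (the priority values are illustrative); the ask here is for an additional, currently nonexistent option that would stop the 404 search at the local region:

    [app:proxy-server]
    sorting_method = affinity
    # prefer region 1 replicas for reads (illustrative priorities)
    read_affinity = r1=100
    write_affinity = r1
    # the proxy will still try up to 2 * replicas nodes (including the
    # remote region) before it is willing to return a 404
    request_node_count = 2 * replicas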

Revision history for this message
clayg (clay-gerrard) wrote :

What if the only local copy is unavailable? Surely there is a use-case where, after a 201, we should not return 404 just because we can't find it on the local primary and a couple of local handoffs?

I guess immediately following a PUT there'd be a good chance to find it on the local handoffs if you have write affinity enabled. But most of the time it's stupid to check a handoff until you've checked all the primaries!

Why does the application want Swift to be so quick to tell them it doesn't have something? Isn't avoiding the risk of not returning data that we have, and that the client asked for, worth a little bit of latency? What are the stated goals, just "faster"? I think we could accomplish making 404's faster without even *talking* to the object servers :P

I'm worried this might be an anti-feature. Here's a thing you *could* turn on but don't do it because in practice under failure it leads to clients observing horrible behaviors.

*Maybe* next to read affinity you could add a setting something like "I have 4 replicas globally, check 2 primaries, then the local handoff that would have been used in the affinity write case, and then finally *one* of the two other primaries in the remote region, but that's it" - it still seems scary, but with enough local primaries it's probably workable, and from a network-traversal standpoint it might still be a little better than the default 2 * replicas checks.
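For illustration only, a rough Python sketch of the ordering described above, assuming 4 replicas globally and hypothetical node lists (this is not actual proxy code):

    def candidate_nodes(local_primaries, local_handoffs, remote_primaries):
        # check a couple of local primaries first
        nodes = list(local_primaries[:2])
        # then the local handoff that write affinity would have used
        nodes += local_handoffs[:1]
        # then just *one* of the remote primaries, and stop there
        nodes += remote_primaries[:1]
        return nodes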

-Clay

Revision history for this message
Paul T Burke (paul-burke) wrote :

Thanks Clay, I understand your concern. There is a common pattern of sending a HEAD request prior to a PUT to determine whether an object exists. As we extend to multiple regions, with object replicas spread across those regions, that 404 becomes extremely expensive in terms of latency. In a geo-distributed cluster, as regions are added and replicas are placed across them, the 404 response gets slower and slower. This behavior changes the customer application performance profile as regions are added.

The bug posed is an attempt to surface a setting that allows fast failure, in order to provide a deterministic 404 performance response to the applications, and to avoid having our customers pay a penalty for using this pattern with every region we add to their specific object policy.
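For reference, a minimal sketch of the client pattern being described, using the requests library; the URL and token values are placeholders:

    import requests

    STORAGE_URL = 'https://swift.example.com/v1/AUTH_test'  # placeholder
    TOKEN = 'AUTH_tk...'                                     # placeholder

    def upload_if_absent(container, name, body):
        url = '%s/%s/%s' % (STORAGE_URL, container, name)
        headers = {'X-Auth-Token': TOKEN}
        # in a multi-region cluster this HEAD walks remote replicas too,
        # so the 404 case gets slower with every region added
        if requests.head(url, headers=headers).status_code == 404:
            requests.put(url, headers=headers, data=body)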

>ptb

Revision history for this message
Samuel Merritt (torgomatic) wrote :

If the client actually made the GET request, then the proxy would go across the WAN to find the object, so the client would get a 200, albeit slowly. Getting the wrong answer fast doesn't seem particularly valuable.

Besides, if the client is making the HEAD request in order to avoid overwriting an object, and the proxy "fails fast" and returns a 404, then the client is going to upload the object even if it already exists. Worst case, this overwrites an older object and loses data; best case, the client re-uploads an identical object and only wastes time and data transfer. Surely a "fast" 404 and a subsequent useless upload is slower than a "normal" / "slow" 404 like Swift provides now.

Revision history for this message
Paul T Burke (paul-burke) wrote :

Thanks for the follow-up, Sam. The option of failing fast is valuable when customer applications are directly (negatively) affected as regions are added (which should be opaque to them). On our Swift cluster there are customers building applications that do a HEAD request as part of their logic, and their application performance profile changes as additional regions are introduced. These applications are high-throughput against a single primary region and are negatively impacted as regions increase. The result is that their application behavior changes, which cascades into a series of support calls, customer meetings, and explaining to the customer why they need to change their application. As you can imagine, this is not scalable, hence this discussion as we look at options.
>ptb

Revision history for this message
John Dickinson (notmyname) wrote :

Paul, you raise a good point, and Sam and Clay have given you some good guidance and insight. I agree with both Clay and Sam that "returning the wrong answer faster" isn't good and it's something we don't want to add to Swift. However, you do touch on an important issue that has come up several times before: how do we reduce the amount of cross-region network traffic when possible?

In addition to configuring Swift to look at a different number of handoffs, configuring subsets of large clusters as different storage policies, or rewriting apps to do different things (like using If-None-Match), there is a way Swift might be patched to give you a tradeoff between network usage and latency.
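As a concrete example of the If-None-Match option: Swift accepts If-None-Match: * on a PUT and returns 412 Precondition Failed if the object already exists, which removes the need for the HEAD entirely. A sketch, with placeholder URL and token:

    import requests

    resp = requests.put(
        'https://swift.example.com/v1/AUTH_test/cont/obj',  # placeholder
        headers={'X-Auth-Token': 'AUTH_tk...',              # placeholder
                 'If-None-Match': '*'},
        data=b'payload')
    if resp.status_code == 412:
        print('object already exists; nothing was uploaded')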

Perhaps Swift could choose to do read requests concurrently instead of serially and use the first server to respond with data. This could allow for lower time-to-first-byte latency on reads, but it would also create more network connections. Maybe, e.g., all replicas/handoffs in the same region could be queried concurrently to give something between "all serial" and "all concurrent". If you'd like to talk more about how this might be implemented, please join us in #openstack-swift on freenode IRC.
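A rough illustration of the concurrent-read idea (not how the proxy is actually structured; the per-replica URLs are hypothetical stand-ins for what the ring would provide):

    import concurrent.futures
    import requests

    # hypothetical object-server endpoints for the same object
    replica_urls = [
        'http://r1z1-obj:6000/sda/123/AUTH_test/cont/obj',
        'http://r1z2-obj:6000/sdb/123/AUTH_test/cont/obj',
        'http://r2z1-obj:6000/sdc/123/AUTH_test/cont/obj',
    ]

    # query all replicas at once and take the first good response;
    # lower time-to-first-byte at the cost of extra connections
    with concurrent.futures.ThreadPoolExecutor(len(replica_urls)) as pool:
        futures = [pool.submit(requests.get, url, timeout=5)
                   for url in replica_urls]
        for fut in concurrent.futures.as_completed(futures):
            try:
                resp = fut.result()
            except requests.RequestException:
                continue
            if resp.status_code == 200:
                break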

Changed in swift:
status: New → Won't Fix
Revision history for this message
John Dickinson (notmyname) wrote :

closing this as "won't fix" because of the very specific request, not because the idea is a bad one.
