Download large objects concurrently

Bug #1621562 reported by Tim Burke
This bug affects 1 person
Affects: python-swiftclient
Status: Confirmed
Importance: Wishlist
Assigned to: Unassigned

Bug Description

When uploading a segmented object, we use concurrent connections to boost throughput. We should do something similar during downloads. There are at least two ways we could do this:

Option 1: we could do something like what we do for --skip-identical: tack on a ?multipart-manifest=get query param, check whether we got a manifest back, and if so, start a separate download for each segment.
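
As a rough sketch of the manifest-parsing half of Option 1, assuming the body returned for ?multipart-manifest=get is an SLO-style JSON list whose entries carry "name" (as "/container/object") and "bytes" keys. The function name segments_from_manifest is hypothetical, not part of swiftclient, and DLOs, nested SLOs, and ranged segment entries are not handled here:

```python
import json


def segments_from_manifest(manifest_body):
    """Return (container, object_name, size) for each segment, in order.

    Assumes an SLO-style manifest: a JSON list of dicts with "name"
    ("/container/object") and "bytes" keys. Hypothetical helper.
    """
    segments = []
    for entry in json.loads(manifest_body):
        # Split "/container/object/with/slashes" at the first slash only,
        # since object names may themselves contain slashes.
        container, obj = entry["name"].lstrip("/").split("/", 1)
        segments.append((container, obj, int(entry["bytes"])))
    return segments
```

Each returned tuple could then be handed to its own download worker.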

Option 2: issue the GET normally, then check the Content-Length of the object we're getting. If it's larger than some threshold (1 GB? We'd probably want it configurable), issue several ranged GETs and close the original connection after reading the first "segment". Note that this segment size may not line up with the segment size used during upload.
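
To illustrate how Option 2 might carve an object into ranged GETs, here is a minimal sketch. The helper names byte_ranges and range_header and the chunk_size parameter are made up for illustration; only the "bytes=start-end" Range header syntax (inclusive at both ends) comes from HTTP itself:

```python
def byte_ranges(content_length, chunk_size):
    """Split [0, content_length) into inclusive (start, end) byte ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    for start in range(0, content_length, chunk_size):
        # HTTP byte ranges are inclusive, hence the "- 1".
        end = min(start + chunk_size, content_length) - 1
        ranges.append((start, end))
    return ranges


def range_header(start, end):
    """Format an inclusive byte range as an HTTP Range header value."""
    return "bytes=%d-%d" % (start, end)
```

For a 10-byte object and a 4-byte chunk size this gives (0, 3), (4, 7), (8, 9); the first range would be read from the original connection and the rest fetched concurrently.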

The second option has the benefit of not needing any special knowledge about the object, which means (1) it should work for any new types of large objects we might come up with and (2) it may be beneficial even for regular objects, as the subsequent connections may pull from different replicas. However, it will litter operators' swift logs with "Client disconnected" warnings.

clayg (clay-gerrard) wrote :

I don't think the Range request approach is likely to yield much benefit. The backend isn't doing anything to load-balance the disks servicing the requests, so if you make three requests you're just as likely to trample on yourself as to hit a different replica on a new disk.

If swiftclient wants to download the entirety of an SLO, it seems not unreasonable to me that, on a GET response with an X-Static-Large-Object header, we could consider the tradeoff of closing the current connection and instead downloading each segment into the gaps of a preallocated sparse file.
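
One way the preallocated-file idea could look, as a sketch only: the function download_into_sparse_file and its fetch_segment callback are hypothetical stand-ins for whatever actually retrieves a segment, and whether the preallocated file is really sparse on disk depends on the platform and filesystem.

```python
from concurrent.futures import ThreadPoolExecutor


def download_into_sparse_file(path, segment_sizes, fetch_segment, max_workers=4):
    """Write segments concurrently into their final offsets in one file.

    fetch_segment(index) -> bytes is a hypothetical callback standing in
    for the real per-segment download.
    """
    # Each segment's offset is the sum of the sizes before it.
    offsets = []
    total = 0
    for size in segment_sizes:
        offsets.append(total)
        total += size

    # Preallocate the full length up front; on most filesystems the
    # unwritten regions take no disk space until filled.
    with open(path, "wb") as f:
        f.truncate(total)

    def worker(index):
        data = fetch_segment(index)
        if len(data) != segment_sizes[index]:
            raise IOError("segment %d has unexpected length" % index)
        # Each worker uses its own file handle, so seeks do not interfere.
        with open(path, "r+b") as f:
            f.seek(offsets[index])
            f.write(data)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(worker, range(len(segment_sizes))))
    return total
```

Segments can finish in any order, since each one lands at a fixed offset computed from the manifest sizes.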
