OpenStack Object Storage (swift)

When a static large object (SLO) that was already uploaded with a manifest file is uploaded again with the same manifest, the upload (PUT requests) is executed a second time.

Bug #1699973 reported by Saurabh jangir on 2017-06-23

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Object Storage (swift)	New	Undecided	Unassigned

Bug Description

Because the object has already been uploaded, PUT requests from the second upload onwards are unnecessary and only add overhead.
Process flow from the second upload onwards:
1. HEAD request to check whether the object being uploaded exists and to read its properties (header information). HTTP status code 200 is returned.
2. GET request to retrieve the manifest file.
3. A series of HEAD requests for the container that was created to hold the segments during the previous upload.
4. PUT requests to upload the segments and the manifest file.

Step 4 is unnecessary: because the data being uploaded is exactly the same as in the previous upload, the PUT requests need not be made again.
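
The comparison that would make step 4 skippable can be sketched on the client side as below. This is only an illustration, not Swift or swiftclient code: the function names are hypothetical, and it assumes the remote manifest was fetched with ?multipart-manifest=get (entries keyed by "name", "hash" and "bytes") while the local segment list uses the manifest PUT format ("path", "etag", "size_bytes"). It also trusts whatever the HEAD and GET happened to return, which may be stale.

```python
import json


def is_slo(head_headers):
    """True if the HEAD response marks the object as an SLO manifest."""
    lowered = {k.lower(): v for k, v in head_headers.items()}
    return lowered.get("x-static-large-object", "").lower() == "true"


def normalize_remote(manifest_json):
    """Reduce a fetched manifest listing to (path, etag, size) tuples."""
    entries = json.loads(manifest_json)
    return [(e["name"].lstrip("/"), e["hash"], int(e["bytes"])) for e in entries]


def normalize_local(segments):
    """Reduce the segment list about to be PUT to the same tuple form."""
    return [(s["path"].lstrip("/"), s["etag"], int(s["size_bytes"])) for s in segments]


def upload_is_redundant(head_headers, remote_manifest_json, local_segments):
    """Hypothetical check: skip step 4 only if the existing object is an SLO
    whose segments match the local ones in order, path, etag and size."""
    if not is_slo(head_headers):
        return False
    return normalize_remote(remote_manifest_json) == normalize_local(local_segments)


# Example data (made up for illustration).
local = [
    {"path": "/segs/obj/0001", "etag": "aaa", "size_bytes": 1048576},
    {"path": "/segs/obj/0002", "etag": "bbb", "size_bytes": 524288},
]
remote = json.dumps([
    {"name": "/segs/obj/0001", "hash": "aaa", "bytes": 1048576},
    {"name": "/segs/obj/0002", "hash": "bbb", "bytes": 524288},
])
print(upload_is_redundant({"X-Static-Large-Object": "True"}, remote, local))
```

The check is deliberately strict: any difference in segment order, path, etag or size, or a missing X-Static-Large-Object header, means the upload proceeds as usual.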

Suggested Solution:
The design should be changed to add a private method in /swift/account/server.py, called from the GET method in the same file, that checks whether the information retrieved by the GET request is the same as the already uploaded object and, if so, prevents the duplicate upload. The _update_or_create() method in /swift/container/server.py also needs changes to prevent the duplicate upload.

Saurabh jangir (sjopenstack) on 2017-06-23

description:	updated
affects:	glance → swift

Tim Burke (1-tim-z) wrote on 2017-06-23:

Couple problems.

1. Swift is eventually-consistent -- as a result, we can't be sure that a GET during the PUT would reflect the eventually-consistent state of the system. Even if we included the X-Newest header (which would make the GET much more expensive), it would only query the nodes that are currently reachable; there may be a more-recent PUT recorded on a currently-offline node. This causes errors both ways; we may miss a more-recent manifest on an unreachable node (and still do all of the SLO validation that we normally do with some *new* overhead from the GET), or erroneously think that the manifest already exists when an unreachable node has recorded an overwrite with a new object. This is part of a larger class of problems; in general, atomic operations on eventually-consistent storage systems are hard.

2. Even if we did it anyway, we have to worry about what happens when the object we're overwriting *isn't* a manifest. Sure, the proxy can look at the headers and tear down the connection as soon as it sees there's no X-Static-Large-Object header, but meanwhile whichever object-server (or, with X-Newest, object-server*s*) is servicing the request has already done disk seeks, started filling buffers, and in general wasted a bunch of IOPS.
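
The failure mode in point 1 can be shown with a toy model. This is not Swift code: the Replica class, the node names, timestamps and etags are all made up, and newest_reachable() only mimics the idea of an X-Newest read (ask every reachable replica, keep the newest answer).

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    timestamp: float      # time of the newest write this node has recorded
    manifest_etag: str    # etag of the manifest stored at that time
    reachable: bool = True


def newest_reachable(replicas):
    """Mimic an X-Newest style read: only reachable nodes can answer."""
    candidates = [r for r in replicas if r.reachable]
    return max(candidates, key=lambda r: r.timestamp) if candidates else None


def newest_overall(replicas):
    """What the cluster will eventually converge on (unknowable at request time)."""
    return max(replicas, key=lambda r: r.timestamp)


replicas = [
    Replica("node-a", 100.0, "etag-v1"),
    Replica("node-b", 100.0, "etag-v1"),
    # This node recorded an overwrite, then went offline before replicating it.
    Replica("node-c", 200.0, "etag-v2", reachable=False),
]

incoming_etag = "etag-v1"  # the client re-uploads the old manifest

seen = newest_reachable(replicas)
truth = newest_overall(replicas)

would_skip = seen is not None and seen.manifest_etag == incoming_etag
skip_is_correct = truth.manifest_etag == incoming_etag

print(f"proxy sees {seen.manifest_etag}, would skip PUT: {would_skip}")
print(f"cluster converges on {truth.manifest_etag}, skip correct: {skip_is_correct}")
```

In this scenario the duplicate check would drop the PUT even though the object will eventually resolve to a different manifest, so the client's write is silently lost.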
