junk data when segmented uploading interrupted

Bug #1264423 reported by Zhou Yuan
This bug affects 3 people
Affects             Status  Importance  Assigned to  Milestone
python-swiftclient  New     Undecided   Unassigned

Bug Description

Junk data is left behind when a segmented upload is interrupted.

To reproduce:

1. Start a segmented upload of a big object. This happens frequently if the object is larger than 10 GB.

[root@proxy1 ~]# bash callswift.sh upload c data -S 1024000

2. Cancel the upload. Some segments are then already stored in the cluster, while the whole object is still not accessible from the client. If you successfully upload the object at some later time, new segments are stored (with new timestamps). The old segments are a big waste of disk space.

[root@proxy1 ~]# bash callswift.sh delete c data
Object 'c/data' not found

[root@proxy1 ~]# bash callswift.sh list c_segments
data/1388140244.78/1073741824/00000000
data/1388140244.78/1073741824/00000001
data/1388140244.78/1073741824/00000002
data/1388140244.78/1073741824/00000003
data/1388140244.78/1073741824/00000004
data/1388140244.78/1073741824/00000005
data/1388140244.78/1073741824/00000006
data/1388140244.78/1073741824/00000007
data/1388140244.78/1073741824/00000008
data/1388140244.78/1073741824/00000009
data/1388140244.78/1073741824/00000010
data/1388140244.78/1073741824/00000011
data/1388140244.78/1073741824/00000012
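
The leftover segments in the listing above can be grouped by upload attempt, which makes it easier to see which ones no manifest refers to. The following is only a minimal sketch, assuming the name layout shown in the listing (everything before the final numeric index identifies one upload attempt). The function names and the `live_prefixes` argument are hypothetical and not part of python-swiftclient; `live_prefixes` is assumed to hold the segment prefixes that existing manifests still point at.

```python
from collections import defaultdict


def group_segments(names):
    """Group segment names by upload attempt.

    Assumes the layout seen in the listing: the part before the last '/'
    identifies one upload attempt, the part after it is the segment index.
    """
    groups = defaultdict(list)
    for name in names:
        prefix, _, index = name.rpartition("/")
        groups[prefix].append(index)
    return dict(groups)


def orphaned_prefixes(names, live_prefixes):
    """Return upload-attempt prefixes that no live manifest refers to.

    `live_prefixes` is a hypothetical input: the set of segment prefixes
    referenced by manifests that still exist.
    """
    return sorted(p for p in group_segments(names) if p not in live_prefixes)
```

In the situation reported here, 'c/data' does not exist, so `live_prefixes` would be empty and every group in `c_segments` would be reported as orphaned.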

Changed in python-swiftclient:
assignee: nobody → Ritesh (rsritesh)
Changed in python-swiftclient:
assignee: Ritesh (rsritesh) → nobody
Kevin Bowrin (kbkbkbkbkb) wrote :

I thought the 'junk' segments were kept in swift to allow for 'resuming' uploads when an SLO upload failed. This is not the case. The wasteful behaviour that Zhou Yuan describes is accurate. It is doubly wasteful: It keeps unused data in swift, and it doesn't use already uploaded segments to speed up retries.

A new CLI option should be created for SLO segmented uploads, which controls what happens on partial uploads. The junk should either be deleted, or used by subsequent uploads to speed up retries. If the hash and name of a segment are the same as those of one that already exists, the segment should not be uploaded again.
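
The "skip segments that already exist" idea could be sketched roughly as follows. This is an illustration under stated assumptions, not how python-swiftclient behaves: it assumes a retry produces the same segment names as the interrupted attempt, and that the container listing reports each segment's MD5 in a `hash` field alongside `name`. The function names are hypothetical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a segment's content."""
    return hashlib.md5(data).hexdigest()


def segments_to_upload(local_segments, remote_listing):
    """Return the names of local segments that still need uploading.

    `local_segments` is an iterable of (name, data) pairs.
    `remote_listing` is a list of dicts with 'name' and 'hash' keys
    (assumed shape of a container listing entry).
    A segment is skipped only when both its name and its MD5 match.
    """
    remote = {entry["name"]: entry["hash"] for entry in remote_listing}
    return [
        name
        for name, data in local_segments
        if remote.get(name) != md5_hex(data)
    ]
```

Deleting the junk instead would be the other option the comment describes; this sketch covers only the reuse path.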
