s3api does not clean up orphan segment parts when MPU is overwritten

Bug #1813202 reported by Tim Burke
This bug affects 11 people
Affects: OpenStack Object Storage (swift)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Steps to repro:

1. Use S3 API to create a multipart upload
2. Use Swift API to find the backing data in the +segments container
3. Use S3 API to overwrite the multipart upload
4. Use Swift API to look in the +segments container -- the backing data from step 2 is still there, eating space!

Expected behavior:

The overwritten data should be removed, similar to what would have happened if we issued a DELETE before the overwrite.

This gets hairier when you think about the general case, where the object being overwritten may be an SLO (so it'd be "reasonable" to think that the segments should have their own lifecycle independent of the manifest), but they *definitely* should get cleaned up in the multipart-upload case.
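To make the expected behavior concrete, here is a minimal sketch of how the set of segments to clean up on overwrite could be computed. It is not the s3api implementation. It assumes both manifests are available as lists of entries with a "name" field of the form "/container/object" (the shape returned by an SLO manifest GET), and the function name `stale_segments` is hypothetical. Segments outside the MPU's own +segments container are left alone, since in the general SLO case they may have their own lifecycle.

```python
def stale_segments(old_manifest, new_manifest, segments_container):
    """Return segment paths used by the old manifest but not by the new one.

    Only paths inside `segments_container` are considered, so segments of a
    user-managed SLO living elsewhere are never selected for cleanup.
    """
    prefix = "/" + segments_container + "/"
    still_used = {entry["name"] for entry in new_manifest}
    return sorted({
        entry["name"]
        for entry in old_manifest
        if entry["name"].startswith(prefix) and entry["name"] not in still_used
    })


# Illustrative data only; the upload IDs are made up.
old = [
    {"name": "/bucket+segments/big.bin/UPLOAD1/1"},
    {"name": "/bucket+segments/big.bin/UPLOAD1/2"},
    {"name": "/user-segments/shared-part"},
]
new = [
    {"name": "/bucket+segments/big.bin/UPLOAD2/1"},
    {"name": "/bucket+segments/big.bin/UPLOAD2/2"},
]
print(stale_segments(old, new, "bucket+segments"))
```

In this sketch, the two UPLOAD1 parts are selected and the segment in the unrelated container is not.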

Tags: s3api

Bhaskar Singhal (bhaskarsinghal) wrote :

Thanks, Tim.
We also need to clean up parts when a multipart upload stops midway (or fails due to some error) or remains pending (complete is never called) for some time interval. Something like an slo-parts-expirer is needed to scan the +segments bucket and delete parts that do not have a valid manifest file.
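A rough sketch of the selection logic such an expirer might use, under stated assumptions: `listing` is a JSON container listing of the +segments container (entries with "name", "bytes" and "last_modified"), `referenced_names` is the set of segment object names that some valid manifest still points at (already stripped of the container prefix), and the grace period is an arbitrary value chosen for illustration. No such tool exists in Swift as far as this thread shows, and `find_orphans` is a hypothetical name.

```python
from datetime import datetime, timedelta


def find_orphans(listing, referenced_names, now, grace=timedelta(days=7)):
    """Pick unreferenced segments older than `grace`.

    Recently written parts are skipped because they may belong to an
    upload that is still in progress and has not been completed yet.
    Returns (orphan names, total bytes they occupy).
    """
    orphans = []
    wasted = 0
    for entry in listing:
        if entry["name"] in referenced_names:
            continue  # still backing a live manifest
        modified = datetime.fromisoformat(entry["last_modified"])
        if now - modified < grace:
            continue  # too young to judge, possibly a pending upload
        orphans.append(entry["name"])
        wasted += entry["bytes"]
    return orphans, wasted


# Illustrative data only; names, sizes and dates are made up.
listing = [
    {"name": "big.bin/UPLOAD1/1", "bytes": 8388608,
     "last_modified": "2019-01-01T00:00:00.000000"},
    {"name": "big.bin/UPLOAD2/1", "bytes": 8388608,
     "last_modified": "2019-01-02T00:00:00.000000"},
    {"name": "other.bin/UPLOAD3/1", "bytes": 1024,
     "last_modified": "2019-01-30T00:00:00.000000"},
]
referenced = {"big.bin/UPLOAD2/1"}
print(find_orphans(listing, referenced, now=datetime(2019, 1, 31)))
```

The grace period is what keeps this from deleting parts of an upload that is merely slow; how long it should be is a policy decision.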

Bhaskar Singhal (bhaskarsinghal) wrote :

Another case to consider is managing large objects (uploaded as multipart) in versioning-enabled buckets.

Yuxin Wang (chhyx2008) wrote :

We're also affected by this bug.

It would be great to have an expirer that checks for and removes leftover segments, as Bhaskar said, or to explicitly remove them after a new multipart upload with the same name succeeds.

clayg (clay-gerrard)
summary: - s3api doesn't clean up multipart upload parts when MU is overwritten
+ s3api does not clean up orphan segment parts when MPU is overwritten
Phat (letonphat1988) wrote :

We've also been affected by this bug since last year. Is there any update on this?
It would be great to have a workaround to detect and clean up the orphaned segment objects.

clayg (clay-gerrard)
Changed in swift:
status: New → Confirmed
Velychkovsky (ahvizl) wrote :

Hello, I've been affected by the same issue.
When I try to overwrite an object with s3api, it creates new segments in bucket+segments, leaving the old segments orphaned and wasting space.

At the same time, the Swift REST API works fine and cleans up the old segments after the object has been overwritten.

OpenStack Swift version: 2.23

Dmitry (kozlovdmtry) wrote :

Hi! It also affects our cluster. Orphaned segments are left in the bucket+segments container after a multipart object is overwritten.

Pawan Gupta (pawan-idrive) wrote :

This issue is affecting our cluster as well. We see orphan segments after re-uploading the same object via the S3 API on a non-versioned bucket.

Robert Winbladh (rowin2023) wrote :

Hi,
Yes, this affects us as well, on many containers. In one example, the default container occupies only ~4 TB, while its segments container eats up another ~11 TB.

That is a lot of wasted space, and this is just one container.

Has this been fixed yet?

Fabricio Campos Zuardi (fabricio) wrote :

Steps to reproduce (with aws-cli and swift cli):

- docker pull dockerswiftaio/docker-swift
- docker run -P -t dockerswiftaio/docker-swift
- docker ps (to get the port number)
- export SWIFT_URL=http://127.0.0.1:<port number>
- export BUCKET_NAME=<bucket name>
- export LARGE_FILE=/usr/bin/rclone
- export AWS_ACCESS_KEY_ID=test:tester
- export AWS_SECRET_ACCESS_KEY=testing
- export AWS_REGION=us-east-1
- aws s3 mb --endpoint "$SWIFT_URL" --region="$AWS_REGION" "s3://$BUCKET_NAME"
- aws s3 cp --endpoint "$SWIFT_URL" "$LARGE_FILE" "s3://$BUCKET_NAME"
- swift -A "$SWIFT_URL/auth/v1.0" -U "$AWS_ACCESS_KEY_ID" -K "$AWS_SECRET_ACCESS_KEY" -V "1.0" list "$BUCKET_NAME+segments"
  - (7 parts)
- aws s3 cp --endpoint "$SWIFT_URL" "$LARGE_FILE" "s3://$BUCKET_NAME"
- swift -A "$SWIFT_URL/auth/v1.0" -U "$AWS_ACCESS_KEY_ID" -K "$AWS_SECRET_ACCESS_KEY" -V "1.0" list "$BUCKET_NAME+segments"
  - (14 parts)
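
To see which uploads the listed parts belong to, the segment names can be grouped by upload ID. This is only a sketch: it assumes parts in the +segments container are named `<key>/<upload id>/<part number>`, and the helper name `parts_per_upload` and the upload IDs in the example are made up.

```python
from collections import defaultdict


def parts_per_upload(segment_names):
    """Count parts per (key, upload ID) from a +segments listing.

    Names whose last path component is not a part number are ignored.
    """
    counts = defaultdict(int)
    for name in segment_names:
        pieces = name.rsplit("/", 2)
        if len(pieces) == 3 and pieces[2].isdigit():
            key, upload_id, _part = pieces
            counts[(key, upload_id)] += 1
    return dict(counts)


# Illustrative listing mimicking the repro: two uploads of 7 parts each.
names = [f"rclone/UPLOAD_A/{n}" for n in range(1, 8)]
names += [f"rclone/UPLOAD_B/{n}" for n in range(1, 8)]
print(parts_per_upload(names))
```

With a listing like the one in the repro, this would show two upload IDs with 7 parts each for the same key, of which only the most recent one backs the current object.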
