[SRU] rgw: unable to abort multipart upload after the bucket got resharded

Bug #1868364 reported by dongdong tao
This bug affects 1 person
Affects               Status        Importance  Assigned to   Milestone
Ubuntu Cloud Archive  Fix Released  Undecided   Unassigned
  Queens              Fix Released  High        Unassigned
  Rocky               Won't Fix     Undecided   Unassigned
  Stein               Fix Released  Undecided   Unassigned
  Train               Fix Released  Undecided   Unassigned
ceph (Ubuntu)         Fix Released  Undecided   dongdong tao
  Bionic              Fix Released  High        Unassigned

Bug Description

[Impact]
This bug prevents a multipart upload from being aborted and leaves stale multipart entries behind in any bucket that had partial multipart uploads before it was resharded.

[Test Case]
Deploy a Ceph cluster running the latest Luminous release (12.2.13).
Create a bucket.
Upload a large file (200 MB or more) to that bucket.
Press Ctrl+C after several parts (each part is 15 MB) have been uploaded.
Manually reshard the bucket to 4 shards.
Abort the multipart upload.
Without the fix, you will not be able to abort the interrupted upload.

[Regression Potential]
Low - this fix has been accepted upstream and is included in releases from Mimic onward. However, if there is a very large multipart upload, all of its multipart entries will now be correctly mapped to the one expected shard. That shard will therefore contain more omap entries than before, which might make listing it somewhat slower.
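
The concentration effect described above can be sketched in Python. This is only an illustration under stated assumptions: CRC32 stands in for the hash Ceph actually uses, and the entry-name format and the helper names (shard_of, entries_per_shard) are hypothetical, not taken from the rgw code.

```python
import zlib
from collections import Counter

def shard_of(hash_source, num_shards):
    # Stand-in hash (CRC32); not the hash function Ceph uses.
    return zlib.crc32(hash_source.encode()) % num_shards

def entries_per_shard(obj, parts, num_shards, hash_by_object):
    """Count how many multipart entries of one upload land in each shard."""
    counts = Counter()
    for part in range(1, parts + 1):
        entry = f"_multipart_{obj}.upload-id.{part}"  # hypothetical entry name
        source = obj if hash_by_object else entry
        counts[shard_of(source, num_shards)] += 1
    return counts

with_fix = entries_per_shard("bigfile", parts=10000, num_shards=4, hash_by_object=True)
without_fix = entries_per_shard("bigfile", parts=10000, num_shards=4, hash_by_object=False)
print("largest shard with fix   :", max(with_fix.values()))
print("largest shard without fix:", max(without_fix.values()))
```

When the object name is the hash source, every entry of the upload lands in a single shard, so that shard holds all 10000 entries in this sketch. Hashing each entry name separately would typically spread them out, which is why the pre-fix shards looked smaller.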

[Original Bug Report]
There is a bug in how multipart entries are handled during resharding.
For all multipart entries, the hash source should be the object name, so that all of those entries are still
placed in the same bucket index shard object.
Right now the code calculates the shard id from each entry's own name, which is wrong.
As a result, the multipart upload cannot be aborted and stale multipart entries are left behind.
https://tracker.ceph.com/issues/43583
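
To make the mismatch concrete, here is a minimal Python sketch of the behaviour described in this report. It is a simplified model, not Ceph's code: CRC32 stands in for the real hash, the entry-name format is hypothetical, and the abort is assumed to look only in the shard that the object name maps to.

```python
import zlib

def shard_of(hash_source, num_shards):
    """Stand-in hash (CRC32); Ceph uses its own hash function."""
    return zlib.crc32(hash_source.encode()) % num_shards

def reshard(entries, obj, num_shards, hash_by_object):
    """Place each multipart entry into a shard of the new bucket index."""
    index = {shard: set() for shard in range(num_shards)}
    for entry in entries:
        # Buggy path hashes the entry's own name; fixed path hashes the object name.
        source = obj if hash_by_object else entry
        index[shard_of(source, num_shards)].add(entry)
    return index

def abort_upload(index, obj, entries, num_shards):
    """Remove the upload's entries from the shard the object name maps to.

    Returns whatever is left in the index afterwards (the stale entries).
    """
    index[shard_of(obj, num_shards)].difference_update(entries)
    return set().union(*index.values())

obj = "testfile"
entries = [f"_multipart_{obj}.upload-id.{n}" for n in range(1, 15)]  # hypothetical names
for fixed in (False, True):
    index = reshard(entries, obj, 4, hash_by_object=fixed)
    stale = abort_upload(index, obj, entries, 4)
    print(f"hash_by_object={fixed}: {len(stale)} stale entries")
```

With the object name as the hash source, the abort finds every entry in one shard and nothing is left behind. With per-entry hashing, any entry that landed in a different shard is missed by the abort in this model.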

dongdong tao (taodd)
Changed in ceph (Ubuntu):
assignee: nobody → dongdong tao (taodd)
Revision history for this message
James Page (james-page) wrote : Re: rgw: unable to abort multipart upload after the bucket got resharded

I have specifically not raised a bug target for eoan, as the fix for this issue is included in 14.2.8, which is in the proposed UNAPPROVED queue for SRU team review.

summary: - backport the multipart fix to luminous
+ rgw: unable to abort multipart upload after the bucket got resharded
Changed in ceph (Ubuntu):
status: New → Fix Released
Changed in ceph (Ubuntu Bionic):
status: New → Triaged
importance: Undecided → High
Revision history for this message
dongdong tao (taodd) wrote :

Uploaded the debdiff.

description: updated
dongdong tao (taodd)
tags: added: sts sts-sru-needed
Revision history for this message
Edward Hope-Morley (hopem) wrote :

@taodd since the fix is already in Train UCA and you need to get it as far as bionic-updates, you will need to now submit a patch for Stein and Rocky UCA (Disco and Cosmic are EOL so forget about them). Once R and S have landed we can deal with Bionic/Queens.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

13.2.9-0ubuntu0.19.04.1~cloud0 in stein uca already has this fix as well

rocky uca has 13.2.8-0ubuntu0.18.10.1~cloud0 which does not have the fix but perhaps these can just be synced.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Hi dongdong,

Thank you for this. Can you update the [Regression Potential] section to adhere to the description at https://wiki.ubuntu.com/StableReleaseUpdates? The regression potential of an SRU is expected to be low. This section should mention what could happen if the code were to cause a regression.

Also it seems like the patch has several releases in it. Is that expected?

Thank you,
Corey

Revision history for this message
dongdong tao (taodd) wrote :

Hi Corey,

I updated the regression potential section, and this patch is for bionic luminous.

Thanks,
Dongdong

description: updated
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Thanks. The patch repeats the debian/ bits several times. Is that intended?

Revision history for this message
dongdong tao (taodd) wrote :

This is only intended for Bionic ceph 12.2.13.
The debdiff file was generated via debdiff <old dsc> <new dsc>.
Let me upload a real patch here so that it is clearer.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Thanks. I don't work with the ceph package much so was confused by the repeated debian/ changes.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Got it, they're all symlinks to the main debian/ directory.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Dongdong, thanks again. A new version of ceph with your change has been uploaded to the bionic unapproved queue. https://launchpad.net/ubuntu/bionic/+queue?queue_state=1&queue_text=ceph

Changed in cloud-archive:
status: New → Invalid
status: Invalid → Fix Released
summary: - rgw: unable to abort multipart upload after the bucket got resharded
+ [SRU] rgw: unable to abort multipart upload after the bucket got
+ resharded
Revision history for this message
Robie Basak (racb) wrote :

[Regression Potential]

Resharding code is being adjusted, particularly in the handling of multipart entries during resharding. Any regression is likely to manifest there.

Revision history for this message
Robie Basak (racb) wrote : Please test proposed package

Hello dongdong, or anyone else affected,

Accepted ceph into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/12.2.13-0ubuntu0.18.04.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ceph (Ubuntu Bionic):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-bionic
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (ceph/12.2.13-0ubuntu0.18.04.3)

All autopkgtests for the newly accepted ceph (12.2.13-0ubuntu0.18.04.3) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

libvirt/4.0.0-1ubuntu8.17 (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#ceph

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
dongdong tao (taodd) wrote :

Hi Robie and Corey,

Is this autopkgtest regression expected?
It is not caused by my change to rgw. Do you need to re-upload the package or re-run the autopkgtests?

Thanks,
Dongdong

Revision history for this message
Corey Bryant (corey.bryant) wrote : Please test proposed package

Hello dongdong, or anyone else affected,

Accepted ceph into queens-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:queens-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-queens-needed to verification-queens-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-queens-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-queens-needed
Revision history for this message
dongdong tao (taodd) wrote :

Hi all,

I've verified that the ceph package 12.2.13-0ubuntu0.18.04.3 (bionic-proposed) fixes the problem.
The steps I followed:
1. Deploy a ceph cluster with version 12.2.13-0ubuntu0.18.04.2
2. s3cmd mb s3://test
3. s3cmd put testfile s3://test (testfile is 400 MB)
4. Press Ctrl+C to interrupt the multipart upload
5. radosgw-admin reshard add --bucket=test --num_shards=8
6. radosgw-admin reshard process
7. s3cmd abortmp s3://test/testfile <upload id>
8. s3cmd multipart <bucket> shows that the multipart upload could not be aborted

9. Upgrade the ceph package to 12.2.13-0ubuntu0.18.04.3
10. Repeat steps 3 to 8 with another 400 MB object; this time the partially uploaded object could be aborted.

Thanks,
Dongdong
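
For anyone repeating the verification steps above, the command sequence can be generated with a small Python helper. This is a sketch only: the function name reproduction_commands and its parameters are hypothetical, the commands are copied from the steps in the comment, and the Ctrl+C interruption and the lookup of the upload id still have to be done by hand.

```python
import shlex

def reproduction_commands(bucket, obj, local_file, upload_id, num_shards):
    """Return the commands from the verification steps as argv lists."""
    return [
        ["s3cmd", "mb", f"s3://{bucket}"],
        # Interrupt this upload with Ctrl+C after several parts have gone up.
        ["s3cmd", "put", local_file, f"s3://{bucket}"],
        ["radosgw-admin", "reshard", "add",
         f"--bucket={bucket}", f"--num_shards={num_shards}"],
        ["radosgw-admin", "reshard", "process"],
        ["s3cmd", "abortmp", f"s3://{bucket}/{obj}", upload_id],
        # Lists multipart uploads still pending on the bucket.
        ["s3cmd", "multipart", f"s3://{bucket}"],
    ]

# Print the sequence for review instead of executing it.
for argv in reproduction_commands("test", "testfile", "testfile", "<upload id>", 8):
    print(shlex.join(argv))
```

Printing rather than running keeps the helper safe to use on a machine without a cluster; each argv list can be passed to subprocess.run once the upload id is known.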

dongdong tao (taodd)
tags: added: verification-bionic-done verification-done
removed: verification-needed verification-needed-bionic
tags: added: verification-queens-done
removed: verification-queens-needed
Revision history for this message
dongdong tao (taodd) wrote :

Please confirm whether the "Autopkgtest regression report (ceph/12.2.13-0ubuntu0.18.04.3)" is an issue or not.

tags: added: verification-needed
removed: verification-done
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Hi Robie and Dongdong,

The autopkgtests are passing now. The libvirt smoke-lxc failure appears to have been unrelated.

Corey

dongdong tao (taodd)
tags: added: verification-done
removed: verification-needed
Mathew Hodson (mhodson)
tags: added: verification-done-bionic
removed: verification-bionic-done
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

I just noticed that the new version FTBFS for arm64 and armhf. Can someone take a look at that? We can't release it in this state.

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Hi Łukasz,

The arm64/armhf builds are successful now.

I noticed this and was checking it on Monday; even LP PPAs had a very long wait before builds started, at least on ARM (mine waited 10h+).
I guess this problem might have caused the builds to fail on the upload approval date, and with no retries afterward, the failed state stuck.

But apparently that's been resolved now, and it's good to go! :)

Thanks!
Mauricio

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 12.2.13-0ubuntu0.18.04.3

---------------
ceph (12.2.13-0ubuntu0.18.04.3) bionic; urgency=medium

  * d/p/bug1868364.patch: fix rgw unable to abort multipart upload after
    the bucket got resharded (LP: #1868364).

 -- Dongdong Tao <email address hidden> Fri, 03 Jul 2020 09:33:20 +0800

Changed in ceph (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for ceph has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

The verification of the Stable Release Update for ceph has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package ceph - 12.2.13-0ubuntu0.18.04.3~cloud0
---------------

 ceph (12.2.13-0ubuntu0.18.04.3~cloud0) xenial-queens; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 ceph (12.2.13-0ubuntu0.18.04.3) bionic; urgency=medium
 .
   * d/p/bug1868364.patch: fix rgw unable to abort multipart upload after
     the bucket got resharded (LP: #1868364).
