Backport of zero-length gc chain fixes to Luminous

Bug #1843085 reported by Kellen Renshaw on 2019-09-06
This bug affects 1 person
Affects                Importance   Assigned to
Ubuntu Cloud Archive   Undecided    Unassigned
  Queens               High         Dan Hill
  Rocky                High         Unassigned
ceph (Ubuntu)          Undecided    Unassigned
  Bionic               High         Dan Hill

Bug Description

[Impact]
Cancelling large S3/Swift object puts may result in garbage collection entries with zero-length chains. Rados gateway garbage collection does not efficiently process and clean up these zero-length chains.

A large number of zero-length chains causes rgw processes to spin rapidly through the garbage collection lists while doing very little useful work. This can result in abnormally high CPU utilization and op workloads.

[Test Case]
Disable garbage collection:
`juju config ceph-radosgw config-flags='{"rgw": {"rgw enable gc threads": "false"}}'`

Repeatedly kill 256MB object put requests for randomized object names.
`for i in {0..1000}; do f=$(mktemp); fallocate -l 256M "$f"; s3cmd put "$f" s3://test_bucket & pid=$!; sleep $((RANDOM % 3)); kill $pid; rm "$f"; done`
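
The one-liner above is dense, so here is a more readable sketch of the same loop. The helper name `cancel_puts`, its arguments, and the `UPLOAD_CMD` override are hypothetical and only illustrate the idea; by default the helper runs `s3cmd put` as in the test case. The override exists so the loop can be dry-run without an S3 endpoint.

```shell
#!/usr/bin/env bash
# Sketch only: start an upload, wait a random 0-2 s, then kill it mid-transfer.
# cancel_puts and UPLOAD_CMD are hypothetical names, not part of s3cmd or Ceph.
# Usage: cancel_puts ITERATIONS SIZE BUCKET_URI
cancel_puts() {
    local iterations=$1 size=$2 bucket=$3
    local i=0 f pid
    while [ "$i" -lt "$iterations" ]; do
        f=$(mktemp)                        # randomized object name
        fallocate -l "$size" "$f"          # allocate the payload
        # Intentionally unquoted so "s3cmd put" splits into command + argument.
        ${UPLOAD_CMD:-s3cmd put} "$f" "$bucket" &
        pid=$!
        sleep $((RANDOM % 3))              # let part of the object upload
        kill "$pid" 2>/dev/null || true    # cancel the put
        wait "$pid" 2>/dev/null || true    # reap the cancelled process
        rm -f "$f"
        i=$((i + 1))
    done
}
```

For the test case above this would be invoked as `cancel_puts 1000 256M s3://test_bucket`.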

Capture omap detail. Verify zero-length chains were created:
`for i in $(seq 0 $(( ${RGW_GC_MAX_OBJS:-32} - 1 ))); do rados -p default.rgw.log --namespace gc listomapvals gc.$i; done`
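
Raw omap values are hard to read by eye. As a possible alternative, `radosgw-admin gc list --include-all` prints pending gc entries as JSON, and an entry whose `objs` array is empty should correspond to a zero-length chain. The sketch below counts such entries. It assumes `jq` is installed and that each entry carries an `objs` array; `count_zero_length_chains` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `radosgw-admin gc list` JSON on stdin and
# prints the number of entries whose "objs" chain is empty.
# Assumes the output is a JSON array of objects with an "objs" array.
count_zero_length_chains() {
    jq '[.[] | select((.objs // []) | length == 0)] | length'
}

# Intended use on a radosgw node:
#   radosgw-admin gc list --include-all | count_zero_length_chains
```

A non-zero count after the cancelled puts suggests the test case reproduced the condition.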

Raise radosgw debug levels, and enable garbage collection:
`juju config ceph-radosgw config-flags='{"rgw": {"rgw enable gc threads": "true"}}' loglevel=20`

Verify zero-length chains are processed correctly by inspecting radosgw logs.

[Regression Potential]
Backport has been accepted into the Luminous release stable branch upstream.

[Other Information]
This issue has been reported upstream [0] and was fixed in Nautilus alongside a number of other garbage collection issues/enhancements in pr#26601 [1]:
* adds additional logging to make future debugging easier
* resolves a bug where the truncated flag was not always set correctly in gc_iterate_entries
* resolves a bug where the marker in RGWGC::process was not advanced
* resolves a bug where gc entries with a zero-length chain were not trimmed
* resolves a bug where the same gc entry tag was added to the deletion list multiple times

These fixes were slated for back-port into Luminous and Mimic, but the Luminous work was not completed because of a required dependency: AIO GC [2]. This dependency has been resolved upstream, and is pending SRU verification in Ubuntu packages [3].

[0] https://tracker.ceph.com/issues/38454
[1] https://github.com/ceph/ceph/pull/26601
[2] https://tracker.ceph.com/issues/23223
[3] https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1838858

Kellen Renshaw (krenshaw) wrote :

It appears that there is an existing backport:
https://tracker.ceph.com/issues/38714

that depends on:
https://tracker.ceph.com/issues/23223

Dan Hill (hillpd) on 2019-09-17
Changed in ceph (Ubuntu):
assignee: nobody → Dan Hill (hillpd)
Dan Hill (hillpd) on 2019-09-17
Changed in ceph (Ubuntu):
assignee: Dan Hill (hillpd) → nobody
Changed in ceph (Ubuntu Bionic):
assignee: nobody → Dan Hill (hillpd)
Dan Hill (hillpd) on 2019-09-17
description: updated
summary: - Need backport of 0-length gc chain fixes to Luminous
+ Backport of zero-length gc chain fixes to Luminous
Dan Hill (hillpd) wrote :

pr#30367 [0] is currently pending upstream review, but needs to have build issues resolved.

[0] https://github.com/ceph/ceph/pull/30367

Billy Olsen (billy-olsen) wrote :

Adding to UCA queens for luminous backport. Fix is in the mimic series (UCA rocky) already.

Dan Hill (hillpd) on 2019-09-17
Changed in ceph (Ubuntu Bionic):
status: New → In Progress
Dan Hill (hillpd) wrote :

Want to clearly state that while AIO GC is a dependency, these fixes do not address anything introduced by that feature.

The fixes address bugs that existed prior to AIO GC.

James Page (james-page) on 2019-09-19
Changed in cloud-archive:
status: New → Invalid
Changed in ceph (Ubuntu):
status: New → Invalid
Changed in ceph (Ubuntu Bionic):
importance: Undecided → High
tags: added: sts-sru-needed
Dan Hill (hillpd) wrote :

Upstream back-port is being tracked by issue#38714 [0], and pr#31664 [1] is pending upstream review.

[0] https://tracker.ceph.com/issues/38714
[1] https://github.com/ceph/ceph/pull/31664

James Page (james-page) on 2019-11-26
description: updated

Hello Kellen, or anyone else affected,

Accepted ceph into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ceph/12.2.12-0ubuntu0.18.04.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ceph (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-bionic
James Page (james-page) wrote :

Hello Kellen, or anyone else affected,

Accepted ceph into queens-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:queens-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-queens-needed to verification-queens-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-queens-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-queens-needed
Corey Bryant (corey.bryant) wrote :

Hello Kellen, or anyone else affected,

Accepted ceph into queens-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:queens-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-queens-needed to verification-queens-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-queens-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

James Page (james-page) wrote :

General regression testing of bionic/proposed completed OK:

======
Totals
======
Ran: 92 tests in 668.7828 sec.
 - Passed: 84
 - Skipped: 8
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 779.9697 sec.
