Object storage occasionally allows more objects than the quota

Bug #1967567 reported by Bas de Bruijne
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Ceph RADOS Gateway Charm | Invalid | Undecided | Unassigned |
tempest | Fix Released | Undecided | Bas de Bruijne |

Bug Description

In the testrun https://solutions.qa.canonical.com/testruns/testRun/7918f17d-d042-4c3a-9280-c43350dc87fb , Tempest fails on the test tempest.api.object_storage.test_container_quotas.ContainerQuotasTest.test_upload_too_many_objects:

----------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/snap/fcbtest/29/.rally/verification/verifier-b8d6efbe-b345-41a5-806a-eea2904d2bc2/repo/tempest/common/utils/__init__.py", line 89, in wrapper
    return func(*func_args, **func_kwargs)
  File "/home/ubuntu/snap/fcbtest/29/.rally/verification/verifier-b8d6efbe-b345-41a5-806a-eea2904d2bc2/repo/tempest/api/object_storage/test_container_quotas.py", line 100, in test_upload_too_many_objects
    self.container_name, "OverQuotaObject", "")
  File "/snap/fcbtest/29/lib/python3.6/site-packages/testtools/testcase.py", line 467, in assertRaises
    self.assertThat(our_callable, matcher)
  File "/snap/fcbtest/29/lib/python3.6/site-packages/testtools/testcase.py", line 480, in assertThat
    raise mismatch_error
testtools.matchers._impl.MismatchError: <bound method ObjectClient.create_object of <tempest.lib.services.object_storage.object_client.ObjectClient object at 0x7f8caac80f98>> returned ({'date': 'Tue, 29 Mar 2022 16:46:16 GMT', 'server': 'Apache/2.4.41 (Ubuntu)', 'etag': 'd41d8cd98f00b204e9800998ecf8427e', 'last-modified': 'Tue, 29 Mar 2022 16:46:16 GMT', 'x-trans-id': 'tx00000000000000000008b-00624337d8-31d5-default', 'x-openstack-request-id': 'tx00000000000000000008b-00624337d8-31d5-default', 'content-type': 'application/json; charset=utf-8', 'content-length': '0', 'connection': 'close', 'status': '201', 'content-location': 'https://rados.silo1.solutionsqa:443/swift/v1/tempest-TestContainer-575421513/OverQuotaObject'}, b'')
----------------------------------

This test tries to upload more objects than the set quota (3 in this case) and verifies that it gets an error when the quota is exceeded. The test fails because it can successfully create the 4th object: the request returns status 201 instead of an over-quota error.

I can't find any indication of why this is happening in the crashdumps. We see this problem roughly 1 in every 5 testruns.

Crashdumps, tempest.conf, bundle and more can be found here:
https://oil-jenkins.canonical.com/artifacts/7918f17d-d042-4c3a-9280-c43350dc87fb/index.html
All occurrences of this bug can be found here:
https://solutions.qa.canonical.com/bugs/bugs/bug/1967567

Tags: cdo-tempest
description: updated
Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

Looking at the SKUs in which we hit this bug:
https://solutions.qa.canonical.com/bugs/bugs/bug/1967567

I suspect it might be a bug introduced in wallaby. We have not had a wallaby/xena testrun that passed this test.

Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

I removed this test from SQA's regular testing until this bug is looked at.

I will try this on Yoga too before it releases.

Revision history for this message
Konstantinos Kaskavelis (kaskavel) wrote :
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

It might be related to one of these upstream bugs: https://tracker.ceph.com/issues/54121 or https://tracker.ceph.com/issues/36593 . It would probably need more investigation to work out what is going on.

Revision history for this message
Billy Olsen (billy-olsen) wrote :

I can say this issue is very likely not related to the two linked bugs in comment #4, as both of those bugs are addressing ceph-fs quotas and not object storage (radosgw).

I think this bug https://bugzilla.redhat.com/show_bug.cgi?id=1417775 is far more relevant to the quota scenario; it discusses some of the caching and tunables that are available as part of this.

tags: added: cdo-tempest
Revision history for this message
Luciano Lo Giudice (lmlogiudice) wrote :

As per Billy's comment and the bug description in upstream Ceph, this is considered _not_ a bug if there is more than one radosgw instance in the cluster (since quotas are cached and can't be reliably enforced until the TTL elapses).

Can you confirm that the test is running with multiple radosgw units, Bas? If so, the test could set a TTL of, say, 1 second, sleep for that amount of time and re-check that the quota is enforced.
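
For illustration, the "sleep and re-check" idea could be sketched roughly as below. This is a minimal sketch and not Tempest code: `OverQuota`, `wait_for_quota_enforcement` and `StaleGateway` are hypothetical names, and the toy gateway only imitates a quota cache that stays stale for a fixed number of requests.

```python
import time


class OverQuota(Exception):
    """Hypothetical stand-in for the error raised when a quota rejects an upload."""


def wait_for_quota_enforcement(upload, timeout=5.0, interval=0.5,
                               clock=time.monotonic, sleep=time.sleep):
    """Call upload() until it raises OverQuota or the timeout elapses.

    Returns how many over-quota uploads were accepted before the quota
    was enforced; raises TimeoutError if it was never enforced.
    """
    deadline = clock() + timeout
    accepted = 0
    while True:
        try:
            upload()
        except OverQuota:
            return accepted
        accepted += 1
        if clock() >= deadline:
            raise TimeoutError("quota still not enforced after %.1fs" % timeout)
        sleep(interval)


class StaleGateway:
    """Toy gateway (assumption): accepts `stale_calls` uploads before enforcing."""

    def __init__(self, stale_calls):
        self.remaining_stale = stale_calls

    def upload(self):
        if self.remaining_stale > 0:
            self.remaining_stale -= 1
            return "201"
        raise OverQuota("quota exceeded")
```

With a helper of this shape, a test would tolerate a bounded number of accepted uploads while the cache catches up, and would fail only if the quota is still not enforced once the timeout has passed.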

Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

There are indeed multiple radosgw units in this test, so that would explain it.
I will add a small delay in the test and see if that fixes it. I assume the test will need to wait for the quota stat caching to refresh, so we may need to decrease the caching time temporarily during the test.

Changed in charm-ceph-radosgw:
status: New → Invalid
Changed in tempest:
assignee: nobody → Bas de Bruijne (basdbruijne)
Changed in tempest:
status: New → Confirmed
status: Confirmed → In Progress
Revision history for this message
Lukas Piwowarski (lukas-piwowarski) wrote :

I'm putting this here just for reference. Related in-progress patch: https://review.opendev.org/c/openstack/tempest/+/850311. I don't know why Launchpad didn't update the bug itself.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tempest (master)

Reviewed: https://review.opendev.org/c/openstack/tempest/+/850311
Committed: https://opendev.org/openstack/tempest/commit/c9f9a038e0b343b5c87fcb80accbc676f92a580e
Submitter: "Zuul (22348)"
Branch: master

commit c9f9a038e0b343b5c87fcb80accbc676f92a580e
Author: Bas de Bruijne <email address hidden>
Date: Tue Jul 19 11:16:02 2022 -0300

    Add waiter for object creation

    Fixes out-of-sync object quota cache for ceph

    Closes-Bug: #1967567
    Change-Id: I39d0dcc6e629f278fdff718980b376d392e30084
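
As a rough illustration of what a "waiter for object creation" might look like, the sketch below polls until a newly created object can be read back, which gives the gateway's quota accounting time to catch up before the next upload. It is not the merged Tempest change: `NotFound`, `wait_for_object` and the `get_object` callable are hypothetical names chosen for this example.

```python
import time


class NotFound(Exception):
    """Hypothetical stand-in for a 404 response from the object store."""


def wait_for_object(get_object, container, name, timeout=10.0, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_object(container, name) until it stops raising NotFound.

    Returns whatever get_object returns once the object is visible;
    raises TimeoutError if it is still missing after `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        try:
            return get_object(container, name)
        except NotFound:
            if clock() >= deadline:
                raise TimeoutError(
                    "object %s/%s not visible within %.1fs"
                    % (container, name, timeout))
            sleep(interval)
```

A test could call such a waiter after each upload, so that the over-quota request is only attempted once every earlier object is confirmed to exist.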

Changed in tempest:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tempest 32.0.0

This issue was fixed in the openstack/tempest 32.0.0 release.
