rbd cannot delete residual image from ceph in some situations

Bug #1591081 reported by Alexander Rubtsov
This bug affects 2 people
Affects             Status        Importance  Assigned to      Milestone
Mirantis OpenStack  Fix Released  Medium      MOS Glance
6.1.x               Won't Fix     Medium      MOS Maintenance
7.0.x               Fix Released  Medium      Sergii Rizvan
8.0.x               Fix Released  Medium      Sergii Rizvan
9.x                 Fix Released  Medium      MOS Glance

Bug Description

Upstream bug: https://bugs.launchpad.net/glance-store/+bug/1473953

Description:
When a user uploads an image through the Glance RESTful API, the image is generally large.

Uploading a large enough image may fail because the HTTP connection breaks or for similar reasons.
The RBD store has a mechanism for this case: when the add operation fails, a rollback is triggered that deletes the residual image if one was created.

In a case we have already encountered, the incomplete image has not had a snapshot taken yet, so the rollback's attempt to unprotect the snapshot raises the exception "rbd.ImageNotFound".
The exception is eventually handled, but the code that removes the residual image from Ceph is skipped.

As a result, re-uploading the image with the same image ID fails because the residual image already exists, and the residual image has to be deleted manually from Ceph.
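
The failure mode described above can be illustrated with a minimal, self-contained sketch. It does not use the real rbd bindings and is not the glance_store code: FakePool, ImageNotFound, rollback_buggy and rollback_fixed are hypothetical stand-ins, written under the assumption that unprotecting a missing snapshot raises an ImageNotFound-style error, as this report states.

```python
class ImageNotFound(Exception):
    """Local stand-in for rbd.ImageNotFound (hypothetical, not the real class)."""


class FakePool:
    """In-memory stand-in for a Ceph pool; this is not the rbd API."""

    def __init__(self):
        self.images = {}  # image name -> set of snapshot names

    def unprotect_snap(self, image, snap):
        if snap not in self.images[image]:
            raise ImageNotFound(snap)

    def remove_snap(self, image, snap):
        self.images[image].discard(snap)

    def remove(self, image):
        if image not in self.images:
            raise ImageNotFound(image)
        del self.images[image]


def rollback_buggy(pool, image, snap):
    """One try block: a missing snapshot skips the image removal."""
    try:
        pool.unprotect_snap(image, snap)  # raises when no snapshot exists
        pool.remove_snap(image, snap)
        pool.remove(image)                # never reached in that case
    except ImageNotFound:
        pass                              # error swallowed, image left behind


def rollback_fixed(pool, image, snap):
    """Snapshot cleanup and image removal are handled independently."""
    try:
        pool.unprotect_snap(image, snap)
        pool.remove_snap(image, snap)
    except ImageNotFound:
        pass                              # no snapshot yet, nothing to undo
    try:
        pool.remove(image)
    except ImageNotFound:
        pass                              # image was never created


def residual_after(rollback):
    """Simulate a partial upload without a snapshot, then roll back."""
    pool = FakePool()
    pool.images["img-1"] = set()
    rollback(pool, "img-1", "snap")
    return sorted(pool.images)
```

In this sketch residual_after(rollback_buggy) still lists the image, while residual_after(rollback_fixed) returns an empty list. The actual upstream change may be structured differently.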

tags: added: customer-found
Revision history for this message
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

version

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Incomplete - please provide specific steps to reproduce and clarify which version of MOS is affected by this issue

Changed in mos:
status: New → Invalid
status: Invalid → Incomplete
Revision history for this message
Dina Belova (dbelova) wrote :

Assigning back to the bug creator so that more details can be provided.

Changed in mos:
assignee: nobody → Alexander Rubtsov (arubtsov)
Revision history for this message
Michael Petersen (mpetason) wrote :

In this case the issue is happening in 6.1. The customer's steps are:

Upload an image larger than the Glance image size limit (300 GB). The image file (322 GB) fails to upload. RBD receives some of the data (256 GB) but not the entire image. Glance allows the end user to delete the image in Glance, but the image is not deleted from Ceph.

Errors that lead to tagging this issue as the cause:

  File "/usr/lib/python2.7/dist-packages/glance/common/wsgi.py", line 683, in __call__
    request, **action_args)
  File "/usr/lib/python2.7/dist-packages/glance/common/wsgi.py", line 707, in dispatch
    return method(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/glance/common/utils.py", line 502, in wrapped
    return func(self, req, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/glance/api/v1/images.py", line 1084, in delete
    {'status': ori_status})
  File "/usr/lib/python2.7/dist-packages/glance/openstack/common/excutils.py", line 82, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/glance/api/v1/images.py", line 1080, in delete
    upload_utils.initiate_deletion(req, loc_data, id)
  File "/usr/lib/python2.7/dist-packages/glance/api/v1/upload_utils.py", line 46, in initiate_deletion
    id, location_data)
  File "/usr/lib/python2.7/dist-packages/glance/common/store_utils.py", line 124, in delete_image_location_from_backend
    safe_delete_from_backend(context, image_id, location)
  File "/usr/lib/python2.7/dist-packages/glance/common/store_utils.py", line 58, in safe_delete_from_backend
    ret = store_api.delete_from_backend(location['url'], context=context)
  File "/usr/lib/python2.7/dist-packages/glance_store/backend.py", line 280, in delete_from_backend
    return store.delete(loc, context=context)
  File "/usr/lib/python2.7/dist-packages/glance_store/_drivers/rbd.py", line 406, in delete
    self._delete_image(target_pool, loc.image, loc.snapshot)
  File "/usr/lib/python2.7/dist-packages/glance_store/_drivers/rbd.py", line 301, in _delete_image
    raise exceptions.InUseByStore()

Revision history for this message
Alexander Rubtsov (arubtsov) wrote :

As Michael has provided the requested information, I'm changing status back from Incomplete to New

Changed in mos:
status: Incomplete → New
Revision history for this message
Michael Petersen (mpetason) wrote :

Removed need-info tag as we provided information.

tags: removed: need-info
Revision history for this message
Kairat Kushaev (kkushaev) wrote :

The issue has been fixed in glance_store 0.11.0.
The Mitaka requirement is glance_store>=0.13.0,
so the issue must be fixed in 9.1 as well.
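
A small sketch of the version reasoning above, for checking a given glance_store version string against the release that contains the fix. The helper names parse_version and has_rbd_rollback_fix are hypothetical, and the check only reflects upstream version numbers: a distribution package with a backported patch would not be detected this way.

```python
FIXED_IN = (0, 11, 0)  # glance_store release named in this comment


def parse_version(text):
    """Turn '0.13.0' into (0, 13, 0); non-numeric parts are ignored."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def has_rbd_rollback_fix(version_text):
    """True if the upstream version is at or above the fixed release."""
    return parse_version(version_text) >= FIXED_IN
```

Comparing tuples rather than strings matters here: as strings, "0.9.1" would sort after "0.11.0".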

Revision history for this message
Denis Meltsaykin (dmeltsaykin) wrote :

Won't fix for the 6.x series, as it is unsupported and the issue has medium importance.

Revision history for this message
Michael Petersen (mpetason) wrote :

The version in question that opened the ticket originally is 6.1 and needs a backport.

tags: added: ct2
Revision history for this message
Sergii Rizvan (srizvan) wrote :

Note for QA:

Steps to reproduce:
1. Download a cloud image, for example:
wget http://cloud.centos.org/centos/6.6/images/CentOS-6-x86_64-GenericCloud-1508.qcow2

2. Inspect Ceph resource usage:
root@node-1:~# ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    113G     106G      6277M        5.42
POOLS:
    NAME         ID     USED       %USED     MAX AVAIL     OBJECTS
    data         0      0          0         54723M        0
    metadata     1      0          0         54723M        0
    rbd          2      0          0         54723M        0
    images       3      12976k     0.01      54723M        5
    volumes      4      0          0         54723M        0
    backups      5      0          0         54723M        0
    compute      6      0          0         54723M        0

As we can see, the 'images' pool contains only 5 objects.

3. Start creating an image with the glance CLI:
glance image-create --file ./CentOS-6-x86_64-GenericCloud-1508.qcow2 --disk-format qcow2 --container-format bare --progress

4. Interrupt the creation process midway with CTRL-C:
[=====================> ] 73%^C... terminating glance client

5. Inspect Ceph resource usage again:
root@node-1:~# ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    113G     105G      7798M        6.74
POOLS:
    NAME         ID     USED     %USED     MAX AVAIL     OBJECTS
    data         0      0        0         53790M        0
    metadata     1      0        0         53790M        0
    rbd          2      0        0         53790M        0
    images       3      524M     0.45      53790M        71
    volumes      4      0        0         53790M        0
    backups      5      0        0         53790M        0
    compute      6      0        0         53790M        0

Now the 'images' pool contains 71 objects, but the image has not actually been saved in Glance. So there are residual objects that have not been deleted.
After the patch is applied, Glance should automatically delete residual objects from Ceph when an image upload fails. For example: http://paste.openstack.org/show/585335/
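
The before/after comparison in the steps above can be scripted. The following is a rough sketch that reads the OBJECTS column for one pool from `ceph df` text in the plain layout shown in this comment. The function name pool_objects and the trimmed sample strings are illustrative assumptions, and other Ceph releases may print different columns.

```python
BEFORE = """\
POOLS:
    NAME ID USED %USED MAX AVAIL OBJECTS
    rbd 2 0 0 54723M 0
    images 3 12976k 0.01 54723M 5
    volumes 4 0 0 54723M 0
"""

AFTER = """\
POOLS:
    NAME ID USED %USED MAX AVAIL OBJECTS
    rbd 2 0 0 53790M 0
    images 3 524M 0.45 53790M 71
    volumes 4 0 0 53790M 0
"""


def pool_objects(ceph_df_text, pool_name):
    """Return the OBJECTS count (last column) for pool_name."""
    in_pools = False
    for line in ceph_df_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "POOLS:":
            in_pools = True      # pool rows follow this marker
            continue
        if in_pools and fields[0] == pool_name:
            return int(fields[-1])
    raise KeyError(pool_name)


# Objects left behind in 'images' by the interrupted upload.
residual = pool_objects(AFTER, "images") - pool_objects(BEFORE, "images")
```

With the patch applied, the same difference is expected to be zero after an interrupted upload.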

Revision history for this message
Sergii Rizvan (srizvan) wrote :

@mpetason, 6.1 is now unsupported and has broken CI, and we're not going to release any new MU for 6.1. That's why I've just created the patches [1], [2] to fix the issue in 6.1, but you will have to apply these patches manually on the customer's environment.

[1] https://review.fuel-infra.org/#/c/27383/
[2] https://review.fuel-infra.org/#/c/26845/

tags: added: on-verification
Revision history for this message
TatyanaGladysheva (tgladysheva) wrote :

Verified on MOS 7.0 + MU6 updates.

After the updates, when an image upload fails, Glance automatically deletes residual objects from Ceph: the 'images' pool contains the same number of objects as before the image creation process was interrupted.

tags: removed: on-verification
tags: added: on-automation
Revision history for this message
TatyanaGladysheva (tgladysheva) wrote :
tags: added: covered-automated-test
removed: on-automation
Revision history for this message
Chandra (reddydodda) wrote :

Hi team,

Is there any workaround to clean up the disk space until the fix is released?

I am using MOS 8.0.

Thanks.

Revision history for this message
Chandra (reddydodda) wrote :

I don't have access to this patch for 8.0:

https://review.fuel-infra.org/#/c/27369/

Revision history for this message
Sergii Rizvan (srizvan) wrote :

Hello Chandu. You can find a patch for MOS 8.0 in the attachment to this comment.

tags: added: on-verification
Revision history for this message
TatyanaGladysheva (tgladysheva) wrote :

Verified on MOS 8.0 + MU4 updates.

Actual results:
Glance automatically deletes residual objects from Ceph: the 'images' pool contains the same number of objects as before the image creation process was interrupted.

tags: removed: on-verification