Images in inconsistent state when calls to registry fail during image deletion

Bug #1511061 reported by Hemanth Makkapati
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Glance
Invalid
Critical
Prateek Goel
Juno
New
Undecided
Unassigned
Kilo
New
Undecided
Unassigned
Liberty
New
Undecided
Unassigned
OpenStack Security Advisory
Won't Fix
Undecided
Unassigned

Bug Description

[0] shows a sample image that was left in an inconsistent state when a call to registry failed during image deletion.

Glance v1 API makes two registry calls when deleting an image.
The first call [1] is made to to set the status of an image to deleted/pending_delete.
And, the other [2], to delete the rest of the metadata, which sets 'deleted_at' and 'deleted' fields in the db.

If the first call fails, the image deletion request fails and the image is left intact in it's previous status.
However, if the first call succeeds and the second one fails, the image is left in an inconsistent status where it's status is set to pending_delete/deleted but it's 'deleted_at' and 'deleted' fields are not set.

If delayed delete is turned on, these images are never collected by the scrubber as they won't appear as deleted images because their deleted field is not set. So, these images will continue to occupy storage in the backend.
Also, further attempts at deleting these images will fail with a 404 because the status is already set to pending_delete/deleted.

[0] http://paste.openstack.org/show/477577/
[1]: https://github.com/openstack/glance/blob/master/glance/api/v1/images.py#L1115-L1116
[2]: https://github.com/openstack/glance/blob/master/glance/api/v1/images.py#L1132

description: updated
Changed in glance:
status: New → Triaged
importance: Undecided → Critical
milestone: none → mitaka-1
assignee: nobody → Hemanth Makkapati (hemanth-makkapati)
Revision history for this message
Nikhil Komawar (nikhil-komawar) wrote :

1. I agree, the image deletion operation should be atomic.

2. Image data left behind means there is a potential risk of filling up storage quota and resulting into a DoS; be mindful that it's a denial of service but NOT a exploit as it is dependent on the operators' failure scenarios of g-api <-> reg communication.

3. The original description has information related to failure scenarios for only v1. So, a check is needed for the v2 as applicable.

information type: Public → Public Security
Revision history for this message
Nikhil Komawar (nikhil-komawar) wrote :

I am marking this as a public security as we need to let the operators know of such failure scenarios. There's no exploit at the user level so this was never a candidate for private security.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

Can user make the image deletion to fail ? e.g., can this be abused trivially ?

Changed in ossa:
status: New → Incomplete
Revision history for this message
Erno Kuvaja (jokke) wrote :

Nikhil,

Security flag should not be used just to inform operators for possible failure conditions.

Tristan,
IMO this bug has nothing to do with security as there is no attack vector here.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Thanks Erno, I've removed the OSSA task

Changed in ossa:
status: Incomplete → Won't Fix
information type: Public Security → Public
Revision history for this message
Prateek Goel (prateek.goel) wrote :

The history of code around this bug goes back to defect:
https://bugs.launchpad.net/glance/+bug/1065187
because of security issue the code was modified as follows:
step 1: Update image metadata with status = deleted
step 2: mark image in db as deleted=true and deleted_at = time
step 3: delete the image from backend
as per comment: https://bugs.launchpad.net/glance/+bug/1065187/comments/2

Later there was a bug filed https://bugs.launchpad.net/glance/+bug/1276142
Here they claimed, if deleting image from backend failed then
users can not use glance image-delete command to clean up the orphan image in backend.
For this purpose code was changed as follows:
step 1: Update image metadata with status = deleted
step 2: delete the image from backend
step 3: mark image in db as deleted=true and deleted_at = time

Now this bug points that what if, image metadata delete from db fails..

On closely looking at v2 api flow:
glance v2 checks the authorization with policy.json to allow delete action
deletes the image from backend
mark image in db as deleted=true and deleted_at = time

To maintain consistency between v1 and v2,
keeping v2 flow as reference in v1 we will do following:
step 1: try image metadata update status=[current image status]
if the above update fails that means user is not allowed to update the image and hence delete action will error
step 2: Then delete the image from backend
step 3: if, its successful, set the image status=pending delete/deleted, deleted=true and deleted_at=time.now()

Alternatively, we could delete the image metadata from db first so that the image is not at all usable
and then try deleting from backend, if, backend fails there can be an error message specifying what to do.

Looking forward to suggestions on this, so that I can proceed with the fix.

Changed in glance:
assignee: Hemanth Makkapati (hemanth-makkapati) → Prateek Goel (prateek.goel)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (master)

Fix proposed to branch: master
Review: https://review.openstack.org/539495

Revision history for this message
Prateek Goel (prateek.goel) wrote :

v1 API is deprecated.

Changed in glance:
status: In Progress → Invalid
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on glance (master)

Change abandoned by "Cyril Roelandt <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/glance/+/539495
Reason: The bug has been marked as invalid, so this is not required

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Related blueprints

Remote bug watches

Bug watches keep track of this bug in other bug trackers.