Enabling soft-deletes opens a DOS on compute hosts

Bug #1501808 reported by Matthew Booth
This bug affects 2 people
Affects                      Status     Importance  Assigned to  Milestone
OpenStack Compute (nova)     Opinion    Wishlist    Unassigned
OpenStack Security Advisory  Won't Fix  Undecided   Unassigned

Bug Description

If the operator sets reclaim_instance_interval to anything other than 0, then when a user requests an instance delete, the instance is soft-deleted instead. Soft delete explicitly releases the user's quota, but does not release the instance's resources until the periodic task _reclaim_queued_deletes runs, with a period of reclaim_instance_interval seconds.

A malicious authenticated user can repeatedly create and delete instances without limit, which will consume resources on the host without consuming their quota. If done quickly enough, this will exhaust host resources.
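The accounting gap described above can be illustrated with a toy model. This is only a sketch and not Nova code: the ToyCloud class and its method names are hypothetical, and reclaim_interval merely stands in for reclaim_instance_interval.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCloud:
    """Toy model of soft-delete accounting; hypothetical, not Nova code."""
    quota_limit: int        # max instances counted against the user
    reclaim_interval: int   # stand-in for reclaim_instance_interval (seconds)
    quota_used: int = 0
    active: int = 0
    soft_deleted: list = field(default_factory=list)  # soft-delete timestamps

    def create(self) -> bool:
        # Creation is refused only when the quota is exhausted.
        if self.quota_used >= self.quota_limit:
            return False
        self.quota_used += 1
        self.active += 1
        return True

    def delete(self, now: int) -> None:
        # Soft delete: quota is released at once, host resources are not.
        self.active -= 1
        self.quota_used -= 1
        self.soft_deleted.append(now)

    def reclaim(self, now: int) -> None:
        # Stand-in for the periodic reclaim task: drop entries that have
        # been soft-deleted for at least reclaim_interval seconds.
        self.soft_deleted = [
            t for t in self.soft_deleted if now - t < self.reclaim_interval
        ]

    def host_instances(self) -> int:
        # Instances still occupying host resources.
        return self.active + len(self.soft_deleted)


cloud = ToyCloud(quota_limit=1, reclaim_interval=3600)
for second in range(100):
    assert cloud.create()      # never refused: quota was already released
    cloud.delete(now=second)

print(cloud.quota_used, cloud.host_instances())
```

With a quota of one instance, the loop still leaves 100 soft-deleted instances on the host while the reported quota usage is zero.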

I'm not entirely sure what to suggest in remediation, as this seems to be a deliberate design. The most obvious fix would be to not release quota until the instance is reaped, but that would be a significant change in behaviour.
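A rough sketch of that "most obvious fix", again as a toy model rather than Nova code: the class and method names are hypothetical, and the only point is that quota is released by the reclaim step instead of by the delete request.

```python
class HoldQuotaUntilReaped:
    """Toy model: quota is released when an instance is reaped, not on delete."""

    def __init__(self, quota_limit, reclaim_interval):
        self.quota_limit = quota_limit
        self.reclaim_interval = reclaim_interval  # stand-in for reclaim_instance_interval
        self.quota_used = 0
        self.pending = []  # soft-delete timestamps that still hold quota

    def create(self):
        if self.quota_used >= self.quota_limit:
            return False
        self.quota_used += 1
        return True

    def delete(self, now):
        # Soft delete only queues the instance; quota stays consumed.
        self.pending.append(now)

    def reclaim(self, now):
        # Reap expired entries and release their quota at the same time.
        still_pending = [t for t in self.pending if now - t < self.reclaim_interval]
        self.quota_used -= len(self.pending) - len(still_pending)
        self.pending = still_pending


model = HoldQuotaUntilReaped(quota_limit=1, reclaim_interval=3600)
first = model.create()       # accepted
model.delete(now=0)
second = model.create()      # refused: the soft-deleted instance still holds quota
model.reclaim(now=3600)
third = model.create()       # accepted again once the instance is reaped
print(first, second, third)
```

The visible behaviour change is the refused second create: a user who deletes an instance cannot reuse that quota until the reclaim interval has passed.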

This is very similar to https://bugs.launchpad.net/bugs/cve/2015-3280 , except that we do it deliberately.

Tags: security
Revision history for this message
Jeremy Stanley (fungi) wrote :

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

Changed in ossa:
status: New → Incomplete
description: updated
Revision history for this message
Jeremy Stanley (fungi) wrote :

This is a grey area we get into with denial of service "vulnerabilities" related to unbounded (or loosely bounded) resource consumption. I strongly suspect the "fix" for this will happen in documentation (noting that the soft delete option allows users to temporarily exhaust resources), a new configurable feature (such as a secondary soft quota buffer), or a behavior change (perhaps altering/removing the soft delete functionality).

Input from Nova security reviewers is appreciated, but if the solution space is limited to one or more of the above then this report is likely not going to have an associated advisory.

Revision history for this message
Michael Still (mikal) wrote :

This is definitely by design. That said, I agree there is a DoS possible here.

It seems to me there is a tweak we could make where if a hypervisor becomes space constrained we delete earlier than the configured time, but that might be a surprise for administrators using a "fill first" scheduling methodology.
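One way such a tweak could look, purely as an illustration: the function name, the tuple layout and the min_free_gb threshold are all assumptions made for this sketch, not existing Nova options or code.

```python
def pick_early_reclaims(soft_deleted, free_gb, min_free_gb):
    """Return instance ids to reap early, oldest soft delete first.

    soft_deleted: list of (instance_id, deleted_at, disk_gb) tuples.
    Stops as soon as the projected free space reaches min_free_gb.
    """
    chosen = []
    for instance_id, _deleted_at, disk_gb in sorted(
        soft_deleted, key=lambda item: item[1]
    ):
        if free_gb >= min_free_gb:
            break
        chosen.append(instance_id)
        free_gb += disk_gb  # space that reaping this instance would free
    return chosen


queue = [("b", 20, 10), ("a", 10, 10), ("c", 30, 10)]
print(pick_early_reclaims(queue, free_gb=5, min_free_gb=20))
```

Reaping oldest-first keeps the most recently deleted instances restorable for as long as possible, but it still shortens the window the administrator configured, which is the surprise mentioned above.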

Changed in nova:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

This sounds like a B1 or D type of bug (according to https://security.openstack.org/vmt-process.html#incident-report-taxonomy ).
Any objections if we open this bug by the end of the week?

Revision history for this message
Tony Breeds (o-tony) wrote :

Actually I think it'll be an A class. The mitigation we have in mind will be effective on master and stable/* and will (we hope) leverage existing config options to reduce (not remove) the impact.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Any progress on that one, Tony?

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Tony, would you mind if we open this bug report so that an eventual patch could get posted directly on gerrit?

Revision history for this message
Tony Breeds (o-tony) wrote :

Yeah okay let's remove the embargo.

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

I've removed the privacy settings and put the OSSA tasks as Won't Fix based on comment #3. This can be put back to incomplete if the situation changes.

tags: added: security
description: updated
Changed in ossa:
status: Incomplete → Won't Fix
information type: Private Security → Public
Chris Martin (cm876n)
Changed in nova:
assignee: nobody → Chris Martin (cm876n)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/386756

Changed in nova:
status: Triaged → In Progress
Andia (wangyuwei)
Changed in nova:
assignee: Chris Martin (cm876n) → Andia (wangyuwei)
Revision history for this message
Chris Martin (cm876n) wrote :

Not sure why the assignment was changed. I have had a patch submitted to gerrit for 7 weeks but nobody has reviewed it. Would love some reviews on it.

Changed in nova:
assignee: Andia (wangyuwei) → Chris Martin (cm876n)
Changed in nova:
assignee: Chris Martin (cm876n) → Andia (wangyuwei)
Changed in nova:
assignee: Andia (wangyuwei) → Chris Martin (cm876n)
Changed in nova:
assignee: Chris Martin (cm876n) → Matt Riedemann (mriedem)
Revision history for this message
Sean Dague (sdague) wrote :

This is really a design decision; it's not clear that changing the expected behavior here would provide a good experience for operators. We punt on various classes of potential DoS (like API rate limiting).

Changed in nova:
status: In Progress → Won't Fix
importance: High → Wishlist
Revision history for this message
Sean Dague (sdague) wrote :

Found open reviews for this bug in gerrit, setting to In Progress.

review: https://review.openstack.org/386756 in branch: master
review: https://review.openstack.org/407877 in branch: master

Changed in nova:
status: Won't Fix → In Progress
assignee: Matt Riedemann (mriedem) → Chris Martin (cm876n)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/386756
Reason: This review is > 4 weeks without comment, and is not mergeable in its current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
huanhongda (hongda) wrote :

Is anyone working on this?
Can we change the code not to release quota until the instance is reclaimed?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/407877
Reason: Duplicate of https://review.openstack.org/#/c/386756/ which was abandoned and I'm going to abandon this also - as noted in that other review, this would be a change in behavior and requires wider discussion.

Matt Riedemann (mriedem)
Changed in nova:
status: In Progress → Opinion
assignee: Chris Martin (cm876n) → nobody