[OSSA 2014-038] List instances by IP results in DoS of nova-network (CVE-2014-3708)

Bug #1358583 reported by Mohammed Naser on 2014-08-19
This bug affects 1 person
Affects                        Importance  Assigned to
OpenStack Compute (nova)       High        Tristan Cacqueray
  Icehouse                     High        Tristan Cacqueray
  Juno                         High        Tristan Cacqueray
OpenStack Security Advisory    Medium      Tristan Cacqueray

Bug Description

Hi,

On a customer install which has approximately 500 VMs in the system, running the following will hang:

nova list --ip 199

What happens afterwards is that the nova-network process stops responding for a while; a trace shows that it's receiving a huge amount of data. Upon further investigation, it looks like the issue may be right here:

https://github.com/openstack/nova/blob/stable/icehouse/nova/network/manager.py#L420

On this installation:

nova=> select count(*) from virtual_interfaces;
 count
-------
 11985
(1 row)

So with 1 run, we're sending almost 12K records to a single nova-network process which takes up a huge CPU load (and blocks it from doing anything else).

What ends up happening is other things start timing out in the system, such as resizes and new deployments:

2014-08-19 03:44:49.511 31562 ERROR nova.compute.manager [req-e7b6d34f-81b5-46f9-a5e9-25ccfb863cfe bac292822cdf451f81201b3c1957914f 78578deaaf3542c087101746d1ad3f50] [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] Setting instance vm_state to ERROR
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] Traceback (most recent call last):
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3547, in finish_resize
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] disk_info, image)
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3490, in _finish_resize
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] migration['dest_compute'])
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] File "/usr/lib/python2.7/dist-packages/nova/network/api.py", line 95, in wrapped
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] return func(self, context, *args, **kwargs)
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] File "/usr/lib/python2.7/dist-packages/nova/network/api.py", line 509, in setup_networks_on_host
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] self.network_rpcapi.setup_networks_on_host(context, **args)
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] File "/usr/lib/python2.7/dist-packages/nova/network/rpcapi.py", line 270, in setup_networks_on_host
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] teardown=teardown)
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/client.py", line 361, in call
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] return self.prepare().call(ctxt, method, **kwargs)
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] File "/usr/lib/python2.7/dist-packages/oslo/messaging/rpc/client.py", line 150, in call
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] wait_for_reply=True, timeout=timeout)
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] File "/usr/lib/python2.7/dist-packages/oslo/messaging/transport.py", line 90, in _send
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] timeout=timeout)
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 412, in send
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] return self._send(target, ctxt, message, wait_for_reply, timeout)
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 405, in _send
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] raise result
2014-08-19 03:44:49.511 31562 TRACE nova.compute.manager [instance: 28bf47af-1063-473c-9c7c-bb6351e97064] RemoteError: Remote error: MessagingTimeout Timed out waiting for a reply to message ID dd6f75163f414ac4ade3cd629593cd2d

CVE References

CVE-2014-3708

Thanks for the report, the OSSA task is set to incomplete pending additional security review from @nova-coresec.

Changed in ossa:
status: New → Incomplete
Andrew Laski (alaski) wrote :

As the comment in the linked code indicates, that method is not going to work well for large amounts of data. The worst bottleneck is probably going to be the length of time it takes to complete the SQL query and retrieve the data. Unfortunately the MySQL library used by Nova blocks when it makes a call, so everything stops in that process for the time it takes to complete. But there should be one nova-network process per CPU of the machine it's running on, so even when one is blocked, work can continue on the others. And though the processing after retrieving the data will be CPU intensive, it should not prevent work in other processes or greenthreads from continuing.

But, multiple queries could tie up all of the nova-network processes and lead to a degradation of services. In short, I think there is potential for a DoS here.

Andrew Laski (alaski) wrote :

One way to address this would be to add a policy check around the use of the ip filters so they could be effectively disabled. In order to have this call work without causing issues some rethinking of how this filtering is accomplished is going to need to happen.

Mohammed Naser (mnaser) wrote :

Just an FYI to add on this, there is no way to have multiple nova-network processes (at least no simple way to do it).

https://github.com/openstack/nova/blob/master/nova/cmd/network.py

The service has no "workers" configuration, unlike services like the conductor...

https://github.com/openstack/nova/blob/master/nova/cmd/conductor.py

It is possible to launch multiple instances, but I don't think a cloud operator typically has reason to think that nova-network requires launching many instances; it is implied to be a fairly quiet, low-CPU process.

Mohammed Naser (mnaser) wrote :

Also, an alternative which seems fairly trivial to implement is requesting a filtered list from nova-conductor. That way, only "matched" search results are sent across.

Changed in ossa:
status: Incomplete → Confirmed
Thierry Carrez (ttx) on 2014-09-11
Changed in ossa:
importance: Undecided → Medium
Sean Dague (sdague) on 2014-09-12
Changed in nova:
status: New → Confirmed
Andrew Laski (alaski) wrote :

Unfortunately it is not trivial to get a list of instance UUIDs filtered by IP. The filtering for instances is somewhat complex and we don't want to duplicate that in the IP query. One potential approach would be to get a list of instance UUIDs not filtered by IP, and then send that to the network query to pare it down by IP filtering.

I think the easiest approach would be to put some policy in place around that filtering option to address the potential for a DOS and then the querying can be optimized later.
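
The policy idea above could look roughly like the following. This is a minimal sketch, not Nova's actual policy code: the rule name `compute:get_all:ip_filter`, the `RequestContext` stand-in, the `enforce` helper and the list of filter keys are all hypothetical, chosen only to show where such a check would sit before any expensive lookup is attempted.

```python
from dataclasses import dataclass, field

# Hypothetical policy table: the rule name and the allowed roles are
# illustrative assumptions, not taken from Nova's policy file.
POLICY_RULES = {"compute:get_all:ip_filter": {"admin"}}

# Assumed names of the IP-related search options ('ip' appears in the
# thread; 'ip6' is an assumption).
IP_FILTER_KEYS = ("ip", "ip6")


@dataclass
class RequestContext:
    """Stand-in for a request context that only carries user roles."""
    user_id: str
    roles: set = field(default_factory=set)


class PolicyNotAuthorized(Exception):
    """Raised when the caller may not use the requested filter."""


def enforce(context, rule):
    """Raise unless the context holds at least one role allowed by the rule."""
    allowed = POLICY_RULES.get(rule, set())
    if not (context.roles & allowed):
        raise PolicyNotAuthorized(rule)


def check_search_opts(context, search_opts):
    """Gate IP filters behind a policy rule; other filters pass through."""
    if any(key in search_opts for key in IP_FILTER_KEYS):
        enforce(context, "compute:get_all:ip_filter")
    return search_opts
```

With a rule like this, an operator could restrict the IP filter to admins (or disable it entirely by allowing no roles) without touching the query code.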

Thierry Carrez (ttx) on 2014-09-22
Changed in nova:
importance: Undecided → High
milestone: none → juno-rc1
Vish Ishaya (vishvananda) wrote :

I think we can change this to grab all instances and then filter locally based on the ip information in the instance info cache. This will take nova-network out of the picture and minimize the network traffic. Working on a solution now.

Vish Ishaya (vishvananda) wrote :

The above solution will still be expensive if trying to filter all instances in the system, but that would only be available to admins so I think it avoids the security issue.
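
For readers following along, the local-filtering idea can be sketched as below. This is an illustration under stated assumptions, not the actual patch: instances are plain dicts, the nested `info_cache` / `network_info` / `subnets` / `ips` layout is a simplified guess at the cached structure, and the helper names `cached_addresses` and `ip_filter` are hypothetical. The regex is applied with `match`, which anchors at the start of the address; that choice is also an assumption.

```python
import re


def cached_addresses(instance):
    """Yield every fixed IP address found in the instance's cached network info."""
    # A missing or empty cache is treated as "no addresses".
    nw_info = (instance.get("info_cache") or {}).get("network_info") or []
    for vif in nw_info:
        for subnet in vif.get("network", {}).get("subnets", []):
            for ip in subnet.get("ips", []):
                yield ip["address"]


def ip_filter(instances, pattern):
    """Keep only instances with at least one cached address matching pattern."""
    regex = re.compile(pattern)
    return [inst for inst in instances
            if any(regex.match(addr) for addr in cached_addresses(inst))]


def _vif(address):
    # Helper that builds one interface entry in the assumed cache layout.
    return {"network": {"subnets": [{"ips": [{"address": address}]}]}}


# Usage: the instance list is assumed to be already scoped to the caller's
# tenant (or to all tenants for an admin) before this filter runs.
instances = [
    {"uuid": "a", "info_cache": {"network_info": [_vif("199.19.0.5")]}},
    {"uuid": "b", "info_cache": {"network_info": [_vif("10.0.0.1")]}},
    {"uuid": "c", "info_cache": None},
]
print([inst["uuid"] for inst in ip_filter(instances, "199")])  # ['a']
```

Because the filter only walks data already attached to the instances being listed, the cost scales with the caller's own instance list rather than with every virtual interface in the system.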

Mohammed Naser (mnaser) wrote :

Can it take into account if the all_tenants flag is provided?

That way it will work both for normal users for their own accounts and admins can search all tenants with --ip, similar to how --name works

nova list --name <x> will just search your tenant
nova list --name <x> --all-tenants will search all tenants

Vish Ishaya (vishvananda) wrote :

This patch should resolve the issue

Vish Ishaya (vishvananda) wrote :

mnaser: the patch should respect all-tenants properly

Mohammed Naser (mnaser) wrote :

Yes that would seem to respect it properly, I can try testing it out and see how it performs.

Thanks

Michael Still (mikal) wrote :

Patch applies cleanly to master and looks good to me. The patch applies cleanly to stable/icehouse but doesn't work without rework (the patch uses objects).

I'm undecided if we should backport this or not and would appreciate other people's thoughts on this -- the DoS is only available to authenticated users and has been out there in code we've marked as deprecated for a very long time. Is it really worth backporting this fix?

Vish Ishaya (vishvananda) wrote :

If we consider it a security vuln we have to provide backports.

Vish Ishaya (vishvananda) wrote :

Icehouse backport was pretty simple so that is done.

@Michael Still, a security vulnerability requires backports to supported stable releases in order to warrant an advisory (OSSA). Since Havana entered end of life, only Icehouse is now supported.

Considering the effectiveness of this attack, I think we should work on a backport. However, since the Juno release is coming really fast, we should also consider whether this bug is really worth an embargoed disclosure. Note that if we keep the embargo, the disclosure process requires 3 business days before fixes can be submitted to gerrit.

I guess it boils down to the difficulty of this rework and whether someone is available to do it.

Vish Ishaya (vishvananda) wrote :

I should note that I have not tested the icehouse fix against a running system, although I did do that for the trunk patch

Oops, Vish already posted a patch! Thanks!

Here is impact description draft #1:

Title: Nova network DoS through ip filtering
Reporter: Mohammed Naser (Vexxhost)
Products: Nova
Versions: up to 2014.1.2

Description:
Mohammed Naser from Vexxhost reported a vulnerability in Nova network ip filtering. By listing active servers using an ip filter, an authenticated user may overload the nova-network process, resulting in a denial of service. All Nova network setups are affected.

Vish Ishaya (vishvananda) wrote :

Havana version (untested). I removed the tests because it was a serious pain to construct objects from scratch in Havana. Havana is EOL but I figured I would post a version for people that want to patch it on their own.

Vish Ishaya (vishvananda) wrote :

Re: impact description. Neutron isn't directly affected by the same issue, but it may have a similar one depending on how the queries are constructed to get all uuids by the filter. If it is also affected, the above patch should fix it.

@Vish: Thanks again for the feedback!

About the Havana patch, we could probably include it in the advisory, on the condition that Havana tests are still running in the OpenStack gate. Otherwise Jenkins won't be able to verify it, and thus we shouldn't ship it in my opinion.

Then about the impact description, is this new version enough to cover the Neutron impact?

Title: Nova network DoS through API filtering
Reporter: Mohammed Naser (Vexxhost)
Products: Nova
Versions: up to 2014.1.2

Description:
Mohammed Naser from Vexxhost reported a vulnerability in Nova API filters. By listing active servers using an ip filter, an authenticated user may overload the nova-network or neutron-server process, resulting in a denial of service. All Nova setups are affected.

Mohammed Naser (mnaser) wrote :

Hi,

I'm going to evaluate the icehouse patch on a large system and let you know.

Thank you.

Mohammed Naser (mnaser) wrote :

Ran this patch on the same cluster with that issue

nova list --all-tenants 0.92s user 0.19s system 16% cpu 6.804 total
nova list --all-tenants --ip 199.19 1.81s user 0.26s system 23% cpu 8.891 total

So OK for me regarding the Icehouse patch, apologies for saying I was going to test the master one early, but no large cluster running master here.

Thanks

Thierry Carrez (ttx) wrote :

Let's not tie this to the RC1 process

Changed in nova:
milestone: juno-rc1 → none
status: Confirmed → In Progress
Thierry Carrez (ttx) wrote :

Impact desc looks good.
@Nova-coresec: we could use another +1 on the patch here

Changed in ossa:
status: Confirmed → Triaged
Mohammed Naser (mnaser) wrote :

Sorry for the small bump but has anyone had a chance to look at this?

@mnaser: sadly this is not scheduled for Juno, so not many coresec members have time to review this now.
An update should be expected next week though.

Thanks for your patience...

Andrew Laski (alaski) wrote :

There are a few potential issues with the fix, the main one being that the info_cache is potentially up to a minute out of date. The second issue is that this doesn't work with cells. I don't think either of these is a real problem, and certainly not large enough to block a security fix. I'm +2 on the patches.

Changed in ossa:
status: Triaged → In Progress
assignee: nobody → Tristan Cacqueray (tristan-cacqueray)
summary: - List instances by IP results in DoS of nova-network
+ List instances by IP results in DoS of nova-network (CVE-2014-3708)

The Icehouse patch is failing run_tests.sh here with:
FAIL: nova.tests.compute.test_compute.ComputeAPITestCase.test_get_all_by_multiple_options_at_once
FAIL: nova.tests.compute.test_compute_cells.CellsComputeAPITestCase.test_get_all_by_multiple_options_at_once

pythonlogging:'': {{{
INFO [nova.virt.driver] Loading compute driver 'nova.virt.fake.FakeDriver'
AUDIT [nova.compute.resource_tracker] Auditing locally available compute resources
AUDIT [nova.compute.resource_tracker] Free ram (MB): 7680
AUDIT [nova.compute.resource_tracker] Free disk (GB): 1028
AUDIT [nova.compute.resource_tracker] Free VCPUS: 1
INFO [nova.compute.resource_tracker] Compute_service record created for fake-mini:fakenode1
AUDIT [nova.compute.manager] Deleting orphan compute node 2
}}}

Traceback (most recent call last):
  File "/opt/stack/nova/.venv/local/lib/python2.7/site-packages/mock.py", line 1201, in patched
    return func(*args, **keywargs)
  File "/opt/stack/nova/nova/tests/compute/test_compute.py", line 7633, in test_get_all_by_multiple_options_at_once
    search_opts={'ip': '.*\.1', 'name': 'not.*'})
  File "/opt/stack/nova/nova/compute/api.py", line 1889, in get_all
    inst_models = self._ip_filter(inst_models, filters)
  File "/opt/stack/nova/nova/compute/api.py", line 1908, in _ip_filter
    for vif in nw_info:
TypeError: 'NoneType' object is not iterable

Can someone confirm/fix this error please? We are willing to send the advance notification with a disclosure date set to:
2014-10-28, 1500UTC

Thanks in advance!

The stable/juno patch is also failing on:
FAIL: nova.tests.compute.test_compute.ComputeAPITestCase.test_get_all_by_multiple_options_at_once
FAIL: nova.tests.compute.test_compute_cells.CellsComputeAPITestCase.test_get_all_by_multiple_options_at_once
FAIL: nova.tests.network.test_neutronv2.TestNeutronv2.test_get_port_vnic_info_3

with the same kind of error:

pythonlogging:'': {{{
2014-10-21 21:37:03,300 INFO [nova.virt.driver] Loading compute driver 'nova.virt.fake.SmallFakeDriver'
2014-10-21 21:37:03,302 AUDIT [nova.compute.resource_tracker] Auditing locally available compute resources
2014-10-21 21:37:03,331 AUDIT [nova.compute.resource_tracker] Total physical ram (MB): 8192, total allocated virtual ram (MB): 512
2014-10-21 21:37:03,331 AUDIT [nova.compute.resource_tracker] Free disk (GB): 1028
2014-10-21 21:37:03,331 AUDIT [nova.compute.resource_tracker] Total usable vcpus: 1, total allocated vcpus: 0
2014-10-21 21:37:03,331 AUDIT [nova.compute.resource_tracker] PCI stats: []
2014-10-21 21:37:03,332 INFO [nova.compute.resource_tracker] Compute_service record created for fake-mini:fakenode1
2014-10-21 21:37:03,333 AUDIT [nova.compute.manager] Deleting orphan compute node 2
}}}

Traceback (most recent call last):
  File "/opt/stack/nova/.venv/local/lib/python2.7/site-packages/mock.py", line 1201, in patched
    return func(*args, **keywargs)
  File "/opt/stack/nova/nova/tests/compute/test_compute.py", line 8025, in test_get_all_by_multiple_options_at_once
    search_opts={'ip': '.*\.1', 'name': 'not.*'})
  File "/opt/stack/nova/nova/compute/api.py", line 1979, in get_all
    inst_models = self._ip_filter(inst_models, filters)
  File "/opt/stack/nova/nova/compute/api.py", line 1998, in _ip_filter
    for vif in nw_info:
TypeError: 'NoneType' object is not iterable

and also:

Traceback (most recent call last):
  File "/opt/stack/nova/nova/tests/network/test_neutronv2.py", line 2615, in test_get_port_vnic_info_3
    self._test_get_port_vnic_info()
  File "/opt/stack/nova/.venv/local/lib/python2.7/site-packages/mock.py", line 1201, in patched
    return func(*args, **keywargs)
  File "/opt/stack/nova/nova/tests/network/test_neutronv2.py", line 2607, in _test_get_port_vnic_info
    fields=['binding:vnic_type', 'network_id'])
  File "/opt/stack/nova/.venv/local/lib/python2.7/site-packages/mock.py", line 845, in assert_called_once_with
    raise AssertionError(msg)
AssertionError: Expected to be called once. Called 2 times.
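
The `TypeError: 'NoneType' object is not iterable` failures above come from looping over network info that is `None`. The following is a minimal sketch of that failure mode and one defensive way to handle it; the function names and the `id` key are hypothetical and this is not the code from the patch. According to a later comment in this thread, the actual resolution involved updating a test that relied on the old ip filtering.

```python
def vif_ids_unsafe(nw_info):
    # Mirrors the failing pattern: iterating directly over a value
    # that may be None raises TypeError.
    return [vif["id"] for vif in nw_info]


def vif_ids_safe(nw_info):
    # Treat a missing cache as "no interfaces" instead of raising.
    return [vif["id"] for vif in (nw_info or [])]


try:
    vif_ids_unsafe(None)
except TypeError as exc:
    print("unsafe:", exc)

print("safe:", vif_ids_safe(None))  # safe: []
```

Whether a guard like this or a corrected test fixture is the right fix depends on whether a `None` cache can occur outside the test environment.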

Andrew Laski (alaski) wrote :

I've updated the patch for master and Juno and this is passing tests for me. There was a test relying on the old ip filtering and it needed to be updated. Will get an Icehouse version together as well.

Andrew Laski (alaski) wrote :

Icehouse patch

Thanks Andrew for these updates!

The pre-OSSA has been sent with a disclosure date set to:
2014-10-28, 1500UTC

Can another Nova core review the patches in comments #31 and #32?

Thanks in advance!

Changed in ossa:
status: In Progress → Fix Committed

Unit tests are successful with both patches, besides some flake8 errors:

Icehouse:
./nova/tests/compute/test_compute.py:61:1: H306 imports not in alphabetical order (nova.objects.instance_info_cache, nova.objects.instance_group)
./nova/tests/compute/test_compute.py:85:1: F401 'test_network' imported but unused

Juno:
./nova/tests/compute/test_compute.py:88:1: F401 'test_network' imported but unused

Andrew Laski (alaski) wrote :

Of course, I forgot to run pep8.

Kilo/Juno fix attached.

Andrew Laski (alaski) wrote :

The patches in comments #35 and #36 look ready for prime time. Can we please get a review of those before the disclosure date (2014-10-28, 1500UTC)?

Thanks in advance!

Vish Ishaya (vishvananda) wrote :

+1 on both of these updates

information type: Private Security → Public Security

Fix proposed to branch: master
Review: https://review.openstack.org/131460

Changed in nova:
assignee: nobody → Tristan Cacqueray (tristan-cacqueray)
summary: - List instances by IP results in DoS of nova-network (CVE-2014-3708)
+ [OSSA 2014-038] List instances by IP results in DoS of nova-network
+ (CVE-2014-3708)
Changed in nova:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/131462
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e98738d55a2bd9e4a15df1b201f919b23d781afa
Submitter: Jenkins
Branch: stable/juno

commit e98738d55a2bd9e4a15df1b201f919b23d781afa
Author: Vishvananda Ishaya <email address hidden>
Date: Mon Sep 22 23:31:07 2014 -0700

    Fixes DOS issue in instance list ip filter

    Converts the ip filtering to filter the list locally based
    on the network info cache instead of making an extremely expensive
    call over to nova network where it attempts to retrieve a list
    of every instance in the system.

    Change-Id: I455f6ab4acdecacc5152b11a183027f933dc4475
    Closes-bug: #1358583

Reviewed: https://review.openstack.org/131460
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ae781ee97947c33d6d43e4c21df4f338c875bf1c
Submitter: Jenkins
Branch: master

commit ae781ee97947c33d6d43e4c21df4f338c875bf1c
Author: Vishvananda Ishaya <email address hidden>
Date: Mon Sep 22 23:31:07 2014 -0700

    Fixes DOS issue in instance list ip filter

    Converts the ip filtering to filter the list locally based
    on the network info cache instead of making an extremely expensive
    call over to nova network where it attempts to retrieve a list
    of every instance in the system.

    Change-Id: I455f6ab4acdecacc5152b11a183027f933dc4475
    Closes-bug: #1358583

Reviewed: https://review.openstack.org/131461
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b6a080bbdaf1a5d8534e8e0519e150f55c46d18c
Submitter: Jenkins
Branch: stable/icehouse

commit b6a080bbdaf1a5d8534e8e0519e150f55c46d18c
Author: Vishvananda Ishaya <email address hidden>
Date: Mon Sep 22 23:31:07 2014 -0700

    Fixes DOS issue in instance list ip filter

    Converts the ip filtering to filter the list locally based
    on the network info cache instead of making an extremely expensive
    call over to nova network where it attempts to retrieve a list
    of every instance in the system.

    Change-Id: I455f6ab4acdecacc5152b11a183027f933dc4475
    Closes-bug: #1358583
    (cherry picked from commit 24c8cc53fd6a62fcad1287b2cdcf32d2ff0991d9)

Thierry Carrez (ttx) on 2014-11-13
Changed in ossa:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2014-12-18
Changed in nova:
milestone: none → kilo-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2015-04-30
Changed in nova:
milestone: kilo-1 → 2015.1.0