[OSSA 2013-030] xenapi: secgroups are not in place after live-migration (CVE-2013-4497)

Bug #1202266 reported by Vangelis Tasoulas
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Fix Released | High | John Garbutt |
Grizzly | Fix Released | High | John Garbutt |
OpenStack Security Advisory | Fix Released | High | Jeremy Stanley |

Bug Description

Distributed Setup with:
2x Compute nodes with Debian Wheezy + XCP installed from repositories (Kronos). The VM controlling XCP is installed on Ubuntu Precise.
1x Controller node with Keystone and Nova (except Network and Compute) on Ubuntu Precise with OpenStack Grizzly installed from cloud archive.
1x Network node running Nova network with FlatDHCP (no quantum is used because it is not supported for XCP yet; I think it will be, starting with the Havana release). The network node has 3 interfaces: 1x Public, 1x Management, 1x Tenant.
1x Storage node running Cinder, Glance and NFSv3 for shared storage to support live migration

I have been experimenting with XCP and live migration these days, so after I configured everything else, I tried to configure floating IP addresses as well. Configuring the floating IPs was trivial. When I booted a VM, I immediately migrated it (that's what I am mostly testing) and then assigned a floating IP. Then I tried to ping it and connect to it using ssh, and everything worked fine.

I booted a second VM and this time did not migrate it. I assigned a floating IP address, but neither ping nor an ssh connection to this one works, even though the iptables rules (the SNAT and DNAT) have been set up correctly. When I migrate the VM, I can then connect to it using SSH without any problems.

At first I thought this was a bug: for some reason you cannot connect right after boot, even though you should be able to. After looking in the documentation I found this: http://docs.openstack.org/essex/openstack-compute/admin/content/configuring-openstack-compute-basics.html#enabling-access-to-vms-on-the-compute-node

What I understood from this is that it is the other way around and I should NOT be able to ping or connect to the VMs using SSH by default if I don't explicitly add the secgroup rules to allow such actions.

After adding these two rules everything works fine (I can access any vm, migrated or non-migrated):
$ nova secgroup-add-rule default icmp -1 -1 0.0.0.0/0
$ nova secgroup-add-rule default tcp 22 22 0.0.0.0/0

After removing them again, I cannot access the non-migrated VMs (correct), but I can still access those that were migrated once.

Even when I migrate them back to the hypervisor they were originally booted on, the secgroups still do not apply and I can access those VMs.
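
The behaviour described above can be modelled with a small toy sketch. This is not nova code: the `Host` class and `live_migrate` function are hypothetical names, and the model only assumes that filters are applied at boot and never re-applied on the destination.

```python
class Host:
    """Toy hypervisor that tracks which instances have filters in place."""

    def __init__(self, name):
        self.name = name
        self.running = set()    # instance ids running on this host
        self.filtered = set()   # instance ids with security-group filters applied

    def boot(self, instance_id):
        # Booting an instance applies its filters (assumed behaviour).
        self.running.add(instance_id)
        self.filtered.add(instance_id)

    def reachable(self, instance_id, rule_allows):
        # Traffic gets through if a rule allows it, or if nothing is filtering.
        unfiltered = instance_id not in self.filtered
        return instance_id in self.running and (rule_allows or unfiltered)


def live_migrate(instance_id, src, dst):
    """Move an instance without re-applying filters on the destination."""
    src.running.discard(instance_id)
    src.filtered.discard(instance_id)
    dst.running.add(instance_id)
    # dst.filtered is deliberately left untouched: this is the reported bug.
```

In this model an instance booted on one host is unreachable without an allow rule, becomes reachable after one live migration, and stays reachable after migrating back, which matches the steps in the report.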

Tags: xenserver

CVE References

Revision history for this message
Thierry Carrez (ttx) wrote :

Adding Nova PTL for sanity check

Changed in ossa:
status: New → Incomplete
Revision history for this message
Vangelis Tasoulas (cyberang3l) wrote :

Just to mention as well that I am using XenAPI pools and shared storage for the live migrations as described here: http://docs.openstack.org/trunk/openstack-compute/admin/content/configuring-migrations.html#configuring-migrations-xenserver-shared-storage

More information on my setup here:
https://answers.launchpad.net/nova/+question/232484

Revision history for this message
Russell Bryant (russellb) wrote :

I think the first thing I'd like to do here is try to replicate this behavior on another setup. I'm adding John Garbutt, nova-core and xenapi driver expert, to this bug to see if he can help try this out.

Revision history for this message
John Garbutt (johngarbutt) wrote :

I thought this sounded familiar:
https://bugs.launchpad.net/nova/+bug/1073306

But I agree, this is a security issue, for those using Security Groups.

Revision history for this message
John Garbutt (johngarbutt) wrote :

On re-reading, this might be a new case.

The old bug covers migration, not live-migration. But it's likely the cause.

Revision history for this message
Thierry Carrez (ttx) wrote :

So this is Xen-specific ?

I think it would qualify as a vulnerability because people would expect port filtering (at least default DROP/REJECT rules) to be reapplied ?

Revision history for this message
John Garbutt (johngarbutt) wrote :

Sorry, lost the email notification, but this is XenServer specific; it's in the XenAPI driver code.

There is quite a bit of work to fix up this feature around:
* resize, migrate, live-migrate

Basically, the security groups feature is currently totally untested, in many respects. However, I don't think this has really been communicated at all well (if at all).

There are deeper issues here too, because the feature was written for linux bridge, but XenServer now uses OVS by default, so the iptables rules are not good enough. Will need some digging around neutron vs nova here too. I know BobBall at Citrix was taking a look at the above deeper issues; it's probably worth bringing him in here.

Revision history for this message
John Garbutt (johngarbutt) wrote :

Just to confirm, my code inspection confirms that this bug happens with XenAPI.

I have not checked if this bug is present in any other drivers.

I am also not sure we have tests for this in tempest, as I heard XenAPI is very close to passing full tempest (on volume related test currently failing).

tags: added: xenserver
Changed in nova:
importance: Undecided → High
status: New → Confirmed
summary: - Secgroups are not in place after migration!
+ xenapi: secgroups are not in place after live-migration
Revision history for this message
Thierry Carrez (ttx) wrote : Re: xenapi: secgroups are not in place after live-migration

My take is that we'll need to issue an OSSA on this one.

Changed in ossa:
status: Incomplete → Confirmed
importance: Undecided → High
Revision history for this message
Michael Still (mikal) wrote :

@Gabe -- do you have a resource who might be able to help out with this one?

Revision history for this message
Matt Dietz (cerberus) wrote :

John: so in essence, we're talking about applying new flows once the VM has moved to the destination, correct?

As you point out, OVS is the default behavior here, and to my knowledge, no real implementation exists for applying OVS flows today. The implementation in Neutron (last I checked) was only a basic OVS pass, and actually utilized IPTables rules in addition. OVS was there more as a proof of concept than actual useful implementation. Ensuring that resize et al also attempt to apply security groups is insufficient since there's nothing (again, to the best of my knowledge) capable of applying those flows.

Revision history for this message
John Garbutt (johngarbutt) wrote :

So, not sure the best way to fix this stuff, advice needed.

I almost want to say we should just issue an advisory to clarify the state of the security groups feature as "experimental" with more work required before it is production ready, and the workaround is not to rely on security groups. But that doesn't feel like the right response. However, a proper fix will require this feature to (effectively) be implemented.

So let's try to summarize the issues:
* nova has missing calls to the firewall driver (there are open public bugs on this one, and there are fixes in progress, in the public, which is probably bad) - I am happy to look into getting this fixed, but do we need to backport these? Will need a networking expert to check the fixes.
* the firewall driver in nova doesn't work with OVS - I could do with a hand fixing that
* I don't know the state of the various neutron drivers and how they interact, we don't yet have the equivalent VIF drivers for XenAPI, but that might not matter - again, not something I really know how to fix
* MAC and IP address spoofing should also be checked

Going into the firewall driver issues: it was written when XenServer used bridge networking, back in 5.6. The OVS case has always been avoided because, until recently, the version of OVS shipping with XenServer (apparently) did not have the bit masking operation that would let you avoid the worst of the rule explosion. You need to take care, because there is a massive OVS slowdown once the rules don't fit in your processor's L2/L3 cache, or something like that, which would give users a sort of DoS attack on the other VMs on the host where their VM is present.
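
To illustrate why bit masking matters for the rule count, here is a minimal sketch that covers a port range with (value, mask) pairs instead of one exact-match rule per port. It is a generic range decomposition, not code from nova or OVS; `port_range_to_masks` and `matches` are hypothetical helper names.

```python
def port_range_to_masks(lo, hi, width=16):
    """Cover the inclusive port range [lo, hi] with (value, mask) pairs.

    A port p matches a pair when (p & mask) == value.
    """
    full = (1 << width) - 1
    pairs = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = (lo & -lo) if lo else (1 << width)
        # ... and that does not run past hi.
        while size > hi - lo + 1:
            size >>= 1
        pairs.append((lo, full & ~(size - 1)))
        lo += size
    return pairs


def matches(port, pairs):
    """Return True if the port is covered by any (value, mask) pair."""
    return any((port & mask) == value for value, mask in pairs)
```

For a security group rule covering ports 1000 to 1999, this decomposition needs 7 masked matches where exact matching would need 1000 separate rules.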

Revision history for this message
Thierry Carrez (ttx) wrote :

@John: if "fixing" it for stable/grizzly amounts to implementing the full feature there, then I would agree that we should issue a security note about it being non-usable and move on. The issue is, it looks implemented enough so that people would trust it... in particular, security groups seem to work properly until you do a migration ?

Sitting on the fence on this one.

Revision history for this message
Thierry Carrez (ttx) wrote :

@John: what would it take to plug the security hole (people run with security groups and they kinda work, migrate and expect them to be restored but poof they are gone) ? We don't necessarily need to fix all security group bugs :)

Revision history for this message
John Garbutt (johngarbutt) wrote :

@Thierry, sorry for the delay, screwed up my email filters, and just back from holiday.

I agree, its probably best to fix this up.

Not sure of the best way to phrase this, but it would be good to alert people to the fact this

A quick fix for non-live migration is probably just a cut and paste of this code:
https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L444
https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L462
And put it either side of the boot command here (for migration):
https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L311

In terms of the public bug, around migration not setting up security groups, you can see my attempt at a "proper" fix here:
https://review.openstack.org/#/c/38455/

I am not really sure how to fix it for live-migration, need someone with more neutron skills than me really.

The problem is the VM domain and networking is created on the destination by XenAPI, auto-magically, in a single live-migrate operation:
https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L1881

We could add a "post_live_migration_at_destination" that does the firewall setup, but that would be after the VM has become active on the destination, so there is likely to be a small period when the traffic is not filtered, unless we can somehow block all the traffic during that time (which I think might happen when you have an OVS controller present, but I am not sure). The missing code that would go in post_live_migration must be something like this:
https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L444
https://github.com/openstack/nova/blob/master/nova/virt/xenapi/vmops.py#L462
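
A minimal sketch of what such a hook could look like, using a stand-in driver that records calls instead of touching iptables. The class names are hypothetical, and the three method names are modelled on nova's firewall driver interface but should be treated as assumptions here rather than as the actual fix.

```python
class RecordingFirewallDriver:
    """Stand-in firewall driver that records calls for inspection."""

    def __init__(self):
        self.calls = []

    def setup_basic_filtering(self, instance, network_info):
        self.calls.append(("setup_basic_filtering", instance))

    def prepare_instance_filter(self, instance, network_info):
        self.calls.append(("prepare_instance_filter", instance))

    def apply_instance_filter(self, instance, network_info):
        self.calls.append(("apply_instance_filter", instance))


class VMOpsSketch:
    """Hypothetical, heavily reduced stand-in for the XenAPI VMOps class."""

    def __init__(self, firewall_driver):
        self.firewall_driver = firewall_driver

    def post_live_migration_at_destination(self, instance, network_info):
        # The VM is already running on the destination at this point, so
        # traffic is unfiltered until the last call below has completed.
        self.firewall_driver.setup_basic_filtering(instance, network_info)
        self.firewall_driver.prepare_instance_filter(instance, network_info)
        self.firewall_driver.apply_instance_filter(instance, network_info)
```

The ordering mirrors what happens at boot time (set up, prepare, then apply), only moved to after the migration completes.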

In libvirt, I think the VIFs are plugged before the live-migrate happens, so it's covered by this method call:
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3794
XenAPI has a rather dodgy implementation of that:
https://github.com/openstack/nova/blob/master/nova/virt/xenapi/driver.py#L418
I see familiar code in the libvirt driver, but I don't think that will work when the VIFs have not been created, and the VIFs are not created in time:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3332

As far as getting a networking guru, I know Salvatore Orlando worked on this first implementation of these bits, maybe he would be a good person to cast his eye across the above ideas. But there might be someone on the security team who can help out, not sure who is on the list.

Revision history for this message
Thierry Carrez (ttx) wrote :

I see no point in keeping this private, since the live migration case is mentioned on the public bug. Unless someone complains, I'll open this up, which should facilitate the fixing, since this is actually non-trivial.

Revision history for this message
Thierry Carrez (ttx) wrote :

CCing Salvatore: we might need your help around here.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Agreed, with bug 1073306 already mentioning that xenapi migrations don't apply security group filters and that it also affects live migration, this is now public knowledge. Opening up the discussion to the wider developer community will hopefully also get us a fix sooner.

Revision history for this message
Thierry Carrez (ttx) wrote :

Shall be a common OSSA with 1073306

information type: Private Security → Public Security
Revision history for this message
Bob Ball (bob-ball) wrote :

Potentially making this a little wider, my current understanding of the OVS is that the OVS does not call the netfilter code when it is forwarding traffic to VMs. In summary my belief is that only bridge-based systems support security groups, and if you configure a host to use libvirt and OVS (which I believe is possible?) then that would suffer from the same issue.

Agreed that there is a question about how to handle the live migration case with XAPI doing most of the work. There is a hook we can use in XAPI - but I'm not sure this is the best solution. I'd prefer to create the VM with fully blocked ports and then apply the correct security groups.
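
The difference between the two orderings can be shown with a toy timeline. This is only an illustration of the idea, not XAPI or nova behaviour; the `timeline` function and the state names are hypothetical.

```python
def timeline(block_first):
    """Return (event, port_state) pairs seen on the destination host."""
    # State of the VM's ports at the moment its VIF appears.
    state = "blocked" if block_first else "open"
    events = [("vif_created", state)]
    # The VM becomes active before any security groups are applied.
    events.append(("vm_running", state))
    # Security groups are applied once the VM is on the destination.
    state = "filtered"
    events.append(("secgroups_applied", state))
    return events
```

With `block_first=True` the ports are never open; with `block_first=False` there is a window where the running VM is unfiltered.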

Revision history for this message
John Garbutt (johngarbutt) wrote :

Bob, seems like a good option. Can you take on fixing the live migrate issue?

Revision history for this message
Jeremy Stanley (fungi) wrote :

In response to Thierry's comment #19, I'm unsure how we'll be able to issue a common OSSA if the proposed fix for bug 1073306 does not address this issue. Should we hold the advisory until such time as fixes for both are ready, or do they need to diverge?

Thierry Carrez (ttx)
Changed in nova:
milestone: none → havana-rc1
Revision history for this message
John Garbutt (johngarbutt) wrote :

I have a suggested partial fix for this issue.

I have split the issue to include some additional work in this bug:
https://bugs.launchpad.net/nova/+bug/1224587

Changed in nova:
assignee: nobody → John Garbutt (johngarbutt)
status: Confirmed → In Progress
Revision history for this message
John Garbutt (johngarbutt) wrote :

@Vangelis I would really appreciate help testing this, if you still have a setup you can check this on. The fix is still a bit of a work in progress, but I wanted to get people's opinions on whether this would be an acceptable way forward in the short term.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/46313

Revision history for this message
Vangelis Tasoulas (cyberang3l) wrote : Re: xenapi: secgroups are not in place after live-migration

John, I still have the setup so definitely I can help on testing. I'll report back early next week as I'm away at the moment.

Jeremy Stanley (fungi)
Changed in ossa:
assignee: nobody → Jeremy Stanley (fungi)
Revision history for this message
Jeremy Stanley (fungi) wrote :

Vangelis, did you have a chance to confirm whether John's patch above mitigates the issue on your setup?

Revision history for this message
Vangelis Tasoulas (cyberang3l) wrote :

I just tested it and it's not working for me.
However, I get the following error in the nova-compute.log:

2013-09-18 14:46:10.972 INFO nova.compute.manager [req-629ff9d2-fb6c-4c83-a672-e5aecd36cb20 8ca2f0ccd55f4d9ba53847755f2e0b18 10c16ce5ffb54c56925152b7d331a8d2] [instance: f521d7f2-d27f-40ff-81da-d1a25a5b69ac] Post operation of migration started
2013-09-18 14:46:12.221 ERROR nova.openstack.common.rpc.amqp [req-629ff9d2-fb6c-4c83-a672-e5aecd36cb20 8ca2f0ccd55f4d9ba53847755f2e0b18 10c16ce5ffb54c56925152b7d331a8d2] Exception during message handling
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 430, in _process_data
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.common.rpc.amqp rval = self.proxy.dispatch(ctxt, version, method, **args)
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 133, in dispatch
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.common.rpc.amqp return getattr(proxyobj, method)(ctxt, **kwargs)
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3273, in post_live_migration_at_destination
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.common.rpc.amqp block_migration, block_device_info)
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/nova/virt/xenapi/driver.py", line 521, in post_live_migration_at_destination
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.common.rpc.amqp self._vmops.post_live_migration_at_destination(ctxt, instance_ref,
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.common.rpc.amqp AttributeError: 'VMOps' object has no attribute 'post_live_migration_at_destination'
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.common.rpc.amqp
2013-09-18 14:46:12.230 ERROR nova.openstack.common.rpc.common [req-629ff9d2-fb6c-4c83-a672-e5aecd36cb20 8ca2f0ccd55f4d9ba53847755f2e0b18 10c16ce5ffb54c56925152b7d331a8d2] Returning exception 'VMOps' object has no attribute 'post_live_migration_at_destination' to caller
2013-09-18 14:46:12.230 ERROR nova.openstack.common.rpc.common [req-629ff9d2-fb6c-4c83-a672-e5aecd36cb20 8ca2f0ccd55f4d9ba53847755f2e0b18 10c16ce5ffb54c56925152b7d331a8d2] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/amqp.py", line 430, in _process_data\n rval = self.proxy.dispatch(ctxt, version, method, **args)\n', ' File "/usr/lib/python2.7/dist-packages/nova/openstack/common/rpc/dispatcher.py", line 133, in dispatch\n return getattr(proxyobj, method)(ctxt, **kwargs)\n', ' File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3273, in post_live_migration_at_destination\n block_migration, block_device_info)\n', ' File "/usr/lib/python2.7/dist-packages/nova/virt/xenapi/driver.py", line 521, in post_live_migration_at_destination\n self._vmops.post_live_migration_a...


Revision history for this message
John Garbutt (johngarbutt) wrote :

@Vangelis sorry, it will need to be against trunk, a lot of that code has changed recently :(

I could try a backport to grizzly.

Revision history for this message
Thierry Carrez (ttx) wrote :

@John Garbutt: Which other versions are affected ? Looks like Grizzly is ? What about Folsom ?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/46313
Committed: http://github.com/openstack/nova/commit/5cced7a6dd32d231c606e25dbf762d199bf9cca7
Submitter: Jenkins
Branch: master

commit 5cced7a6dd32d231c606e25dbf762d199bf9cca7
Author: John Garbutt <email address hidden>
Date: Thu Sep 12 18:11:49 2013 +0100

    xenapi: enforce filters after live-migration

    Currently and network filters, including security groups, are
    lost after a server has been live-migrated.

    This partially fixes the issue by ensuring that security groups are
    re-applied to the VM once it reached the destination, and been started.

    This leaves a small amount of time during the live-migrate where the VM
    is not protected. There is a further bug raised to close the rest of
    this whole, but this helps keep the VM protected for the majority of the
    time.

    Fixes bug 1202266

    Change-Id: I84fdb6e2a8ee38d75f243aadbe79945af1d6849d

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
Jeremy Stanley (fungi) wrote : Re: xenapi: secgroups are not in place after live-migration

Can we have backports to nova's stable/grizzly branch (and stable/folsom if affected similarly)?

Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Revision history for this message
Jeremy Stanley (fungi) wrote :

Any information on which stable release branches are/were affected by this (if any)? We'll want bug tasks and backports for them as far back as folsom if possible.

Thierry Carrez (ttx)
Changed in nova:
milestone: havana-rc1 → 2013.2
Revision history for this message
John Garbutt (johngarbutt) wrote :

This one is since folsom, when live-migrate landed, totally my bad:
https://blueprints.launchpad.net/nova/+spec/xenapi-live-migration

The backport should be more straightforward for this one. Although there has been quite a lot of rework around live-migrate recently, it shouldn't fundamentally change things.

Sorry for the delay, seem to have lost the updates on this in the regular email soup.

tags: added: folsom-backport-potential grizzly-backport-potential
Revision history for this message
Jeremy Stanley (fungi) wrote :

Great--thanks! I'll work on the combined impact description in bug 1073306 for now.

Revision history for this message
John Garbutt (johngarbutt) wrote :
Revision history for this message
Jeremy Stanley (fungi) wrote :

Vangelis: do you have an affiliation with any employer you want mentioned as part of your reporter credit on the security advisory for this issue?

Jeremy Stanley (fungi)
Changed in ossa:
status: Confirmed → Triaged
Revision history for this message
Vangelis Tasoulas (cyberang3l) wrote :

Jeremy: No employer needs to be mentioned, thanks :)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/grizzly)

Reviewed: https://review.openstack.org/52987
Committed: http://github.com/openstack/nova/commit/df2ea2e3acdede21b40d47b7adbeac04213d031b
Submitter: Jenkins
Branch: stable/grizzly

commit df2ea2e3acdede21b40d47b7adbeac04213d031b
Author: John Garbutt <email address hidden>
Date: Thu Sep 12 18:11:49 2013 +0100

    xenapi: enforce filters after live-migration

    Currently and network filters, including security groups, are
    lost after a server has been live-migrated.

    This partially fixes the issue by ensuring that security groups are
    re-applied to the VM once it reached the destination, and been started.

    This leaves a small amount of time during the live-migrate where the VM
    is not protected. There is a further bug raised to close the rest of
    this whole, but this helps keep the VM protected for the majority of the
    time.

    Fixes bug 1202266

    (Cherry picked from commit: 5cced7a6dd32d231c606e25dbf762d199bf9cca7)

    Change-Id: I66bc7af1c6da74e18dce47180af0cb6020ba2c1a

Jeremy Stanley (fungi)
Changed in ossa:
status: Triaged → In Progress
Jeremy Stanley (fungi)
summary: - xenapi: secgroups are not in place after live-migration
+ xenapi: secgroups are not in place after live-migration (CVE-2013-4497)
Thierry Carrez (ttx)
no longer affects: nova/folsom
Thierry Carrez (ttx)
Changed in ossa:
status: In Progress → Fix Committed
Revision history for this message
Thierry Carrez (ttx) wrote :

[OSSA 2013-030]

summary: - xenapi: secgroups are not in place after live-migration (CVE-2013-4497)
+ [OSSA 2013-030] xenapi: secgroups are not in place after live-migration
+ (CVE-2013-4497)
Changed in ossa:
status: Fix Committed → Fix Released
Alan Pevec (apevec)
tags: removed: folsom-backport-potential grizzly-backport-potential
This report contains Public Security information  
Everyone can see this security related information.
