[OSSA 2013-030] xenapi: secgroups are not in place after live-migration (CVE-2013-4497)
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Compute (nova) | | High | John Garbutt | |
Grizzly | | High | John Garbutt | |
OpenStack Security Advisory | | High | Jeremy Stanley | |
Bug Description
Distributed Setup with:
2x Compute nodes with Debian Wheezy + XCP installed from repositories (Kronos). The VM controlling XCP is installed on Ubuntu Precise.
1x Controller node with Keystone and Nova (except Network and Compute) on Ubuntu Precise with OpenStack Grizzly installed from cloud archive.
1x Network node running Nova network with FlatDHCP (Quantum is not used because it does not support XCP yet; I think it will starting with the Havana release). The network node has 3 interfaces: 1x Public, 1x Management, 1x Tenant.
1x Storage node running Cinder, Glance and NFSv3 for shared storage to support live migration
I have been experimenting with XCP and live migration lately, so after configuring everything else I tried to configure floating IP addresses as well. Configuring the floating IPs was trivial. I booted a VM, immediately migrated it (that's what I am mostly testing) and then assigned a floating IP. I could then ping it and connect to it using SSH, and everything worked fine.
I booted a second VM and this time did not migrate it. I assigned a floating IP address, but neither ping nor an SSH connection to this one was possible, even though the iptables rules (the SNAT and DNAT) had been set up correctly. After I migrated the VM, I could connect to it using SSH without any problems.
At first I thought this was a bug: for some reason, a freshly booted VM could not be reached even though it should have been. After looking in the documentation I found this: http://
What I understood from this is that it is the other way around and I should NOT be able to ping or connect to the VMs using SSH by default if I don't explicitly add the secgroup rules to allow such actions.
After adding these two rules everything works fine (I can access any vm, migrated or non-migrated):
$ nova secgroup-add-rule default icmp -1 -1 0.0.0.0/0
$ nova secgroup-add-rule default tcp 22 22 0.0.0.0/0
After removing them again, I cannot access the non-migrated VMs (correct), but I can still access those that were migrated once.
Even when I migrate them back to the hypervisor they were originally booted on, the secgroups still do not apply and I can access those VMs.
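For readers unfamiliar with the expected behaviour, here is a minimal Python sketch of default-deny ingress filtering with the two rules added above. It is only an illustration: the `Rule` class and the `allowed` function are hypothetical names invented for this example and are not nova code. The assumption is that `-1` in the port fields means "any", as in the `icmp -1 -1` rule.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rule:
    """Hypothetical model of one security group ingress rule."""
    protocol: str   # "icmp", "tcp" or "udp"
    from_port: int  # -1 means "any"
    to_port: int    # -1 means "any"
    cidr: str       # source range the rule applies to


def allowed(rules: Iterable[Rule], protocol: str, port: int, source_ip: str) -> bool:
    """Return True only if some rule explicitly permits the packet."""
    src = ipaddress.ip_address(source_ip)
    for rule in rules:
        if rule.protocol != protocol:
            continue
        if src not in ipaddress.ip_network(rule.cidr):
            continue
        if rule.from_port == -1 or rule.from_port <= port <= rule.to_port:
            return True
    return False  # nothing matched: default deny


# The two rules from the commands above.
default_group = [
    Rule("icmp", -1, -1, "0.0.0.0/0"),
    Rule("tcp", 22, 22, "0.0.0.0/0"),
]
```

With an empty rule list every call to `allowed` returns `False`, which is the behaviour seen on the non-migrated VMs. The migrated VMs behave as if no filter were evaluated at all.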
CVE References
Vangelis Tasoulas (cyberang3l) wrote : | #2 |
Just to mention as well that I am using XenAPI pools and shared storage for the live migrations as described here: http://
More information on my setup here:
https:/
Russell Bryant (russellb) wrote : | #3 |
I think the first thing I'd like to do here is try to replicate this behavior on another setup. I'm adding John Garbutt, nova-core and xenapi driver expert, to this bug to see if he can help try this out.
John Garbutt (johngarbutt) wrote : | #4 |
I thought this sounded familiar:
https:/
But I agree, this is a security issue, for those using Security Groups.
John Garbutt (johngarbutt) wrote : | #5 |
On re-reading, this might be a new case.
The old bug covers migration, not live-migration. But it's likely the cause.
Thierry Carrez (ttx) wrote : | #6 |
So this is Xen-specific ?
I think it would qualify as a vulnerability because people would expect port filtering (at least default DROP/REJECT rules) to be reapplied ?
John Garbutt (johngarbutt) wrote : | #7 |
Sorry, lost the email notification, but this is XenServer specific, it's in the XenAPI driver code.
There is quite a bit of work to fix up this feature around:
* resize, migrate, live-migrate
Basically, the security groups feature is currently totally untested in many respects. However, I don't think this has been communicated well (if at all).
There are deeper issues here too, because the feature was written for linux bridge, but XenServer now uses OVS by default, so the iptables rules are not good enough. Will need some digging around neutron vs nova here too. I know BobBall at Citrix was taking a look at the above deeper issues, it's probably worth bringing him in here.
John Garbutt (johngarbutt) wrote : | #8 |
Just to confirm, my code inspection confirms that this bug happens with XenAPI.
I have not checked if this bug is present in any other drivers.
I am also not sure we have tests for this in tempest, as I heard XenAPI is very close to passing full tempest (only a volume-related test is currently failing).
tags: | added: xenserver |
Changed in nova: | |
importance: | Undecided → High |
status: | New → Confirmed |
summary: |
- Secgroups are not in place after migration!
+ xenapi: secgroups are not in place after live-migration
My take is that we'll need to issue an OSSA on this one.
Changed in ossa: | |
status: | Incomplete → Confirmed |
importance: | Undecided → High |
Michael Still (mikal) wrote : | #10 |
@Gabe -- do you have a resource who might be able to help out with this one?
Matt Dietz (cerberus) wrote : | #11 |
John: so in essence, we're talking about applying new flows once the VM has moved to the destination, correct?
As you point out, OVS is the default behavior here, and to my knowledge, no real implementation exists for applying OVS flows today. The implementation in Neutron (last I checked) was only a basic OVS pass, and actually utilized IPTables rules in addition. OVS was there more as a proof of concept than actual useful implementation. Ensuring that resize et al also attempt to apply security groups is insufficient since there's nothing (again, to the best of my knowledge) capable of applying those flows.
John Garbutt (johngarbutt) wrote : | #12 |
So, not sure the best way to fix this stuff, advice needed.
I almost want to say we should just issue an advisory to clarify the state of the security groups feature as "experimental", with more work required before it is production ready, and the workaround is not to rely on security groups. But that doesn't feel like the right response. However, a proper fix will require this feature to (effectively) be implemented.
So let's try to summarize the issues:
* nova has missing calls to the firewall driver (there are open public bugs on this one, and there are fixes in progress, in the public, which is probably bad) - I am happy to look into getting this fixed, but do we need to backport these? Will need a networking expert to check the fixes.
* the firewall driver in nova doesn't work with OVS - I could do with a hand fixing that
* I don't know the state of the various neutron drivers and how they interact, we don't yet have the equivalent VIF drivers for XenAPI, but that might not matter - again, not something I really know how to fix
* MAC and IP address spoofing should also be checked
Going into the firewall driver issues: it was written when XenServer used bridge networking, back in 5.6. The OVS case has always been avoided because, until recently, the version of OVS shipping with XenServer (apparently) did not have the bit-masking operation that lets you avoid the worst of the explosion in the number of rules. You need to take care, because there is a massive OVS slowdown once the rules don't fit in your processor's L2/L3 cache, or something like that, which would give users a sort of DoS attack on the other VMs on the host where their VM is present.
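To illustrate why bit masking matters for rule counts, here is a small Python sketch that covers a port range with the fewest (value, mask) pairs. It assumes 16-bit ports and a switch that can match a port under a bitmask. The function name `port_range_to_masks` is made up for this example and the code is not taken from nova or OVS.

```python
def port_range_to_masks(lo, hi, bits=16):
    """Cover the inclusive range [lo, hi] with aligned power-of-two blocks.

    Each block is returned as (value, mask): a port p belongs to the block
    when p & mask == value.
    """
    full = (1 << bits) - 1
    blocks = []
    while lo <= hi:
        # Largest block that is aligned at lo (lowest set bit of lo) ...
        size = (lo & -lo) if lo else (1 << bits)
        # ... shrunk until it no longer overshoots hi.
        while size > hi - lo + 1:
            size >>= 1
        blocks.append((lo, full & ~(size - 1)))
        lo += size
    return blocks


# A rule for ports 1000-1999 needs 1000 exact-match entries without masking.
masked = port_range_to_masks(1000, 1999)
exact = list(range(1000, 2000))
```

Here `masked` holds 7 entries while `exact` holds 1000, which is the kind of reduction that keeps the flow table small.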
Thierry Carrez (ttx) wrote : | #13 |
@John: if "fixing" it for stable/grizzly amounts to implementing the full feature there, then I would agree that we should issue a security note about it being non-usable and move on. The issue is, it looks implemented enough so that people would trust it... in particular, security groups seem to work properly until you do a migration ?
Sitting on the fence on this one.
Thierry Carrez (ttx) wrote : | #14 |
@John: what would it take to plug the security hole (people run with security groups and they kinda work, migrate and expect them to be restored but poof they are gone) ? We don't necessarily need to fix all security group bugs :)
John Garbutt (johngarbutt) wrote : | #15 |
@Thierry, sorry for the delay, screwed up my email filters, and just back from holiday.
I agree, it's probably best to fix this up.
Not sure of the best way to phrase this, but it would be good to alert people to the fact this
A quick fix for non-live migration is probably just a cut and paste of this code:
https:/
https:/
And put it either side of the boot command here (for migration):
https:/
In terms of the public bug, around migration not setting up security groups, you can see my attempt at a "proper" fix here:
https:/
I am not really sure how to fix it for live-migration, need someone with more neutron skills than me really.
The problem is the VM domain and networking is created on the destination by XenAPI, auto-magically, in a single live-migrate operation:
https:/
We could add a "post_live_
https:/
https:/
In libvirt, I think the VIFs are plugged before the live-migrate happens, so it's covered by this method call:
https:/
XenAPI has a rather dodgy implementation of that:
https:/
I see familiar code in the libvirt driver, but I don't think that will work when the VIFs have not been created, and the VIFs are not created in time:
https:/
As far as getting a networking guru, I know Salvatore Orlando worked on this first implementation of these bits, maybe he would be a good person to cast his eye across the above ideas. But there might be someone on the security team who can help out, not sure who is on the list.
Thierry Carrez (ttx) wrote : | #16 |
I see no point in keeping this private, since the live migration case is mentioned on the public bug. Unless someone complains, I'll open this up, which should facilitate the fixing, since this is actually non-trivial.
Thierry Carrez (ttx) wrote : | #17 |
CCing Salvatore: we might need your help around here.
Jeremy Stanley (fungi) wrote : | #18 |
Agreed, with bug 1073306 already mentioning that xenapi migrations don't apply security group filters and that it also affects live migration, this is now public knowledge. Opening up the discussion to the wider developer community will hopefully also get us a fix sooner.
Thierry Carrez (ttx) wrote : | #19 |
Shall be a common OSSA with 1073306
information type: | Private Security → Public Security |
Bob Ball (bob-ball) wrote : | #20 |
Potentially making this a little wider, my current understanding of the OVS is that the OVS does not call the netfilter code when it is forwarding traffic to VMs. In summary my belief is that only bridge-based systems support security groups, and if you configure a host to use libvirt and OVS (which I believe is possible?) then that would suffer from the same issue.
Agreed that there is a question about how to handle the live migration case with XAPI doing most of the work. There is a hook we can use in XAPI - but I'm not sure this is the best solution. I'd prefer to create the VM with fully blocked ports and then apply the correct security groups.
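The "block first, then apply" ordering can be sketched with a toy model in Python. Everything here is hypothetical: `DestinationHost`, its methods and `migrate_block_first` are invented for illustration and do not correspond to XAPI or nova calls. Filters are simplified to a set of allowed TCP ports.

```python
class DestinationHost:
    """Toy stand-in for the filter state on a migration target."""

    DROP_ALL = object()  # sentinel meaning "every packet is dropped"

    def __init__(self):
        self.filters = {}  # vm name -> DROP_ALL or a set of allowed ports

    def create_vm_blocked(self, vm):
        self.filters[vm] = self.DROP_ALL

    def apply_security_groups(self, vm, allowed_ports):
        self.filters[vm] = set(allowed_ports)

    def accepts(self, vm, port):
        rules = self.filters.get(vm)
        if rules is None:
            return True  # no filter installed at all: the state this bug leaves behind
        if rules is self.DROP_ALL:
            return False
        return port in rules


def migrate_block_first(host, vm, allowed_ports, probe_port):
    """Return whether probe_port is reachable after each of the two steps."""
    observed = []
    host.create_vm_blocked(vm)
    observed.append(host.accepts(vm, probe_port))
    host.apply_security_groups(vm, allowed_ports)
    observed.append(host.accepts(vm, probe_port))
    return observed
```

In this model a port that the security group forbids is never reachable at any step, so there is no unprotected window. The cost is that permitted traffic is also dropped until the real rules are installed.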
John Garbutt (johngarbutt) wrote : | #21 |
Bob, seems like a good option. Can you take on fixing the live migrate issue?
Jeremy Stanley (fungi) wrote : | #22 |
In response to Thierry's comment #19, I'm unsure how we'll be able to issue a common OSSA if the proposed fix for bug 1073306 does not address this issue. Should we hold the advisory until such time as fixes for both are ready, or do they need to diverge?
Changed in nova: | |
milestone: | none → havana-rc1 |
John Garbutt (johngarbutt) wrote : | #23 |
I have a suggested partial fix for this issue.
I have split the issue to include some additional work in this bug:
https:/
Changed in nova: | |
assignee: | nobody → John Garbutt (johngarbutt) |
status: | Confirmed → In Progress |
John Garbutt (johngarbutt) wrote : | #24 |
@Vangelis I would really appreciate help testing this, if you still have a setup you can check this on. The fix is still a little work in progress, but I wanted to get people's opinions on whether this would be an acceptable way forward in the short term.
Fix proposed to branch: master
Review: https:/
Vangelis Tasoulas (cyberang3l) wrote : Re: xenapi: secgroups are not in place after live-migration | #26 |
John, I still have the setup so definitely I can help on testing. I'll report back early next week as I'm away at the moment.
Changed in ossa: | |
assignee: | nobody → Jeremy Stanley (fungi) |
Jeremy Stanley (fungi) wrote : | #27 |
Vangelis, did you have a chance to confirm whether John's patch above mitigates the issue on your setup?
Vangelis Tasoulas (cyberang3l) wrote : | #28 |
I just tested it and it's not working for me.
However, I get the following error in the nova-compute.log:
2013-09-18 14:46:10.972 INFO nova.compute.
2013-09-18 14:46:12.221 ERROR nova.openstack.
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.
2013-09-18 14:46:12.221 1418 TRACE nova.openstack.
2013-09-18 14:46:12.230 ERROR nova.openstack.
2013-09-18 14:46:12.230 ERROR nova.openstack.
John Garbutt (johngarbutt) wrote : | #29 |
@Vangelis sorry, it will need to be against trunk, a lot of that code has changed recently :(
I could try a backport to grizzly.
Thierry Carrez (ttx) wrote : | #30 |
@John Garbutt: Which other versions are affected ? Looks like Grizzly is ? What about Folsom ?
Reviewed: https:/
Committed: http://
Submitter: Jenkins
Branch: master
commit 5cced7a6dd32d23
Author: John Garbutt <email address hidden>
Date: Thu Sep 12 18:11:49 2013 +0100
xenapi: enforce filters after live-migration
Currently any network filters, including security groups, are
lost after a server has been live-migrated.
This partially fixes the issue by ensuring that security groups are
re-applied to the VM once it has reached the destination and been started.
This leaves a small amount of time during the live-migrate where the VM
is not protected. There is a further bug raised to close the rest of
this hole, but this helps keep the VM protected for the majority of the
time.
Fixes bug 1202266
Change-Id: I84fdb6e2a8ee38
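The effect described in this commit message can be sketched with a toy Python model. This is not the committed code: `ToyHost`, `live_migrate` and the `reapply_filters` flag are hypothetical names used only to show the ordering, under the assumption that the hypervisor starts the VM on the destination before the driver gets a chance to install filters.

```python
class ToyHost:
    """Hypothetical compute host that tracks which VMs have filters applied."""

    def __init__(self, name):
        self.name = name
        self.vms = set()
        self.filtered = set()


def live_migrate(vm, src, dst, reapply_filters):
    """Move vm from src to dst and record whether it is filtered at each step."""
    timeline = []
    src.vms.discard(vm)
    src.filtered.discard(vm)
    # The hypervisor creates and starts the VM on the destination in one step,
    # so no filters exist there yet.
    dst.vms.add(vm)
    timeline.append(("started on " + dst.name, vm in dst.filtered))
    if reapply_filters:
        # The partial fix: re-apply the filters once the VM has arrived.
        dst.filtered.add(vm)
    timeline.append(("migration finished", vm in dst.filtered))
    return timeline


before_fix = live_migrate("vm1", ToyHost("a"), ToyHost("b"), reapply_filters=False)
after_fix = live_migrate("vm1", ToyHost("a"), ToyHost("b"), reapply_filters=True)
```

In `after_fix` the first entry is still unfiltered, which corresponds to the short unprotected period the commit message mentions.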
Changed in nova: | |
status: | In Progress → Fix Committed |
Can we have backports to nova's stable/grizzly branch (and stable/folsom if affected similarly)?
Changed in nova: | |
status: | Fix Committed → Fix Released |
Jeremy Stanley (fungi) wrote : | #33 |
Any information on which stable release branches are/were affected by this (if any)? We'll want bug tasks and backports for them as far back as folsom if possible.
Changed in nova: | |
milestone: | havana-rc1 → 2013.2 |
John Garbutt (johngarbutt) wrote : | #34 |
This one is since folsom, when live-migrate landed, totally my bad:
https:/
The backport should be more straightforward for this one. Although there has been quite a lot of rework around live-migrate recently, it shouldn't fundamentally change things.
Sorry for the delay, seem to have lost the updates on this in the regular email soup.
tags: | added: folsom-backport-potential grizzly-backport-potential |
Jeremy Stanley (fungi) wrote : | #35 |
Great--thanks! I'll work on the combined impact description in bug 1073306 for now.
John Garbutt (johngarbutt) wrote : | #36 |
I have a first attempt at some backports here:
Jeremy Stanley (fungi) wrote : | #37 |
Vangelis: do you have an affiliation with any employer you want mentioned as part of your reporter credit on the security advisory for this issue?
Changed in ossa: | |
status: | Confirmed → Triaged |
Vangelis Tasoulas (cyberang3l) wrote : | #38 |
Jeremy: No employer needs to be mentioned, thanks :)
Reviewed: https:/
Committed: http://
Submitter: Jenkins
Branch: stable/grizzly
commit df2ea2e3acdede2
Author: John Garbutt <email address hidden>
Date: Thu Sep 12 18:11:49 2013 +0100
xenapi: enforce filters after live-migration
Currently any network filters, including security groups, are
lost after a server has been live-migrated.
This partially fixes the issue by ensuring that security groups are
re-applied to the VM once it has reached the destination and been started.
This leaves a small amount of time during the live-migrate where the VM
is not protected. There is a further bug raised to close the rest of
this hole, but this helps keep the VM protected for the majority of the
time.
Fixes bug 1202266
(Cherry picked from commit: 5cced7a6dd32d23
Change-Id: I66bc7af1c6da74
Changed in ossa: | |
status: | Triaged → In Progress |
summary: |
- xenapi: secgroups are not in place after live-migration
+ xenapi: secgroups are not in place after live-migration (CVE-2013-4497)
no longer affects: | nova/folsom |
Changed in ossa: | |
status: | In Progress → Fix Committed |
Thierry Carrez (ttx) wrote : | #40 |
[OSSA 2013-030]
summary: |
- xenapi: secgroups are not in place after live-migration (CVE-2013-4497)
+ [OSSA 2013-030] xenapi: secgroups are not in place after live-migration (CVE-2013-4497)
Changed in ossa: | |
status: | Fix Committed → Fix Released |
tags: | removed: folsom-backport-potential grizzly-backport-potential |
Adding Nova PTL for sanity check