Insecure live migration with libvirt driver

Bug #1240554 reported by Russell Bryant
This bug affects 2 people
Affects                      Status        Importance  Assigned to    Milestone
OpenStack Compute (nova)     Invalid       Low         Unassigned
OpenStack Security Advisory  Won't Fix     Undecided   Unassigned
OpenStack Security Notes     Fix Released  Undecided   Nathan Kinder

Bug Description

By default, libvirt on the receiving end of a live migration starts a qemu process listening on 0.0.0.0 waiting for a tcp connection from the sender. During block migration, qemu-nbd is started similarly. This is bad because compute nodes have interfaces on the guest network. As a result, guests can interfere with live migrations.

There is a migration flag to remedy this, VIR_MIGRATE_TUNNELLED, which tunnels traffic over the libvirt socket (which can be secured with TLS). This seems like a great option. Unfortunately it doesn't work with the new nbd-based block migration code, so there isn't a great option for securing the traffic.
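
A minimal sketch of the flag choice described above. The function name choose_migration_flags is hypothetical and is not Nova's actual code. The flag values are assumed to match libvirt's virDomainMigrateFlags enum; real code would take them from the libvirt Python bindings rather than redefine them. The sketch only illustrates that tunnelling can be requested for a plain live migration but not for the nbd-based block migration path.

```python
# Assumed to mirror libvirt's virDomainMigrateFlags; in real code use the
# constants exported by the libvirt module instead of redefining them.
VIR_MIGRATE_LIVE = 1 << 0
VIR_MIGRATE_PEER2PEER = 1 << 1
VIR_MIGRATE_TUNNELLED = 1 << 2
VIR_MIGRATE_NON_SHARED_INC = 1 << 7


def choose_migration_flags(block_migration: bool) -> int:
    """Return a flag bitmask for a live migration (hypothetical helper)."""
    # Tunnelled migration is a peer-to-peer mode, so PEER2PEER is always set.
    flags = VIR_MIGRATE_LIVE | VIR_MIGRATE_PEER2PEER
    if block_migration:
        # Per the report, the nbd-based storage copy cannot be tunnelled,
        # so the disk data travels over a separate, unprotected TCP socket.
        flags |= VIR_MIGRATE_NON_SHARED_INC
    else:
        # Memory/state traffic goes over the libvirt connection,
        # which can be secured with TLS.
        flags |= VIR_MIGRATE_TUNNELLED
    return flags
```

With this split, only the non-block case gets the tunnelled transport; the block migration case still needs network-level protection.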

Related to this, libvirt just added:

 - Default migration bind()/listen() IP addr in /etc/libvirt/qemu.conf
 - Pass in bind()/listen() IP address to migration APIs

So with libvirt >= 1.1.4, Nova will have the ability to control the interface used.

(Problem originally reported by Vish Ishaya)

description: updated
Revision history for this message
Daniel Berrange (berrange) wrote :

Even ignoring this live migration issue, guest VMs should never be allowed to access services on the host, except those explicitly intended for their usage, e.g. a DNS & DHCP service. So if Nova/Neutron have set up a guest on the same network as a host, they should also set up firewall rules to block access to *all* host ports, except for a whitelist of intended services.
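
The whitelist approach described above could look roughly like the sketch below. It only builds iptables command strings and does not run them. The helper name host_whitelist_rules, the bridge name br100 and the choice of DHCP (udp/67) and DNS (udp and tcp/53) as the allowed services are assumptions for illustration; a real deployment would need rules matched to its own network layout.

```python
from typing import List, Tuple

# (protocol, port) pairs that guests may reach on the host. DHCP and DNS
# are the examples named in the comment; adjust for a real deployment.
ALLOWED_SERVICES: List[Tuple[str, int]] = [("udp", 67), ("udp", 53), ("tcp", 53)]


def host_whitelist_rules(guest_iface: str,
                         allowed: List[Tuple[str, int]] = ALLOWED_SERVICES) -> List[str]:
    """Build iptables commands: allow listed services, drop everything else."""
    rules = [
        # Keep replies to connections that the host itself initiated.
        f"iptables -A INPUT -i {guest_iface} -m conntrack "
        f"--ctstate ESTABLISHED,RELATED -j ACCEPT"
    ]
    for proto, port in allowed:
        rules.append(
            f"iptables -A INPUT -i {guest_iface} -p {proto} --dport {port} -j ACCEPT"
        )
    # Default for the guest-facing interface: no access to host ports.
    rules.append(f"iptables -A INPUT -i {guest_iface} -j DROP")
    return rules
```

Calling host_whitelist_rules("br100") returns five commands, ending with the catch-all DROP, which is what would keep guests away from the migration listeners as well as any other host service.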

Revision history for this message
Thierry Carrez (ttx) wrote :

Ouch. The trick will be to fix this in a backportable security patch :)

Changed in nova:
importance: Undecided → High
status: New → Confirmed
Changed in ossa:
status: New → Incomplete
Revision history for this message
Andrew Laski (alaski) wrote :

We have a way to address this in cases where libvirt >= 1.1.4, but is there anything to be done in Nova for older versions? It sounds like advising deployers to set up firewall rules is the best option so far.

Revision history for this message
Thierry Carrez (ttx) wrote :

So there are two approaches here...

One is to find a way to fix this that would still fit our backporting guidelines, in which case we can issue an advisory about this and plug the hole.

The other is to document this issue in an OSSN so that deployers are aware of it, while we push new features (and/or libvirt requirements) in a future version so that it's no longer a deployment issue in the future.

Dan, what's your take on this ?

Revision history for this message
Daniel Berrange (berrange) wrote :

Per my comment #1 I think there are really two issues here. There's the specific live migration problem which we can directly solve with a newer libvirt version. More broadly though, if the guest is able to see the host's networks, then there could be other arbitrary things listening on TCP ports on the host OS, which the guest can possibly exploit.

So regardless of the technical fix with new libvirt which we should definitely do, I think we need something at the network level in general. Either we find a way for nova-network/neutron to explicitly block access to *anything* on the host OS, except explicitly intended services, or we have to document this and let admins figure out how to block this themselves.

I'm inclined to say we should

 1. Use the libvirt fix if available
 2. Document the firewall issue wrt host services listening on TCP
 3. Look at what if anything can be done in nova-network/neutron to fix this in general

I'd say we probably want items 1+2 for the immediate OSSA, and have item 3 be something we look at separately as a long-term "hardening" task.

Revision history for this message
Thierry Carrez (ttx) wrote :

(Adding OSSG team contacts)

The libvirt fix is not really an option for the OSSA, since we can't really backport it as a stable branch update.

So... if we can find a lightweight way to efficiently prevent VMs from accessing those ports (without changing behavior too much for other ports), then I think we could do an OSSA for Grizzly/Havana/Icehouse for this.

If we can't, I would document the issue as an OSSN, and fix the issue in future versions -- probably using a combination of libvirt upgrade AND generally protecting the host from VM access.

The attack is imho a little bit limited anyway, as you can only interact with the receiving end and will most likely not provide it what it expects.

Revision history for this message
Robert Clark (robert-clark) wrote :

So: if steps have not been taken to isolate the Compute host from attempted access from guest instances, it is possible that a malicious virtual machine could interfere with migrations even when SSL is enabled, as SSL does not affect nbd-based block migration. Is this correct?

There are typically options for stopping a guest VM from seeing anything on the host, usually ebtables and iptables. I imagine we can document that in an OSSN; it should also be added to the OpenStack Security Guide.

Question:
I'm not familiar with nbd but I'd like it if SSL worked _and_ interface binding works, anyone know if that's on the roadmap?

Revision history for this message
Bryan D. Payne (bdpayne) wrote :

Yes, Rob, your first paragraph is correct.

I don't think that there's anything we can do here to fix this in Nova (or even Neutron?). This is really a libvirt issue. To make things even more complicated, the version of libvirt needed is currently incompatible with OpenStack (due to deadlock issues that are being actively worked on). Alas...

I think the action that this group can take immediately is to issue 2 security notes:

1) Discuss the importance of separating networks.

2) Discuss the concerns related to live migration and how to mitigate them.

Revision history for this message
Paul McMillan (paul-mcmillan) wrote :

I believe that when we were investigating this, we also determined that the bug didn't occur if you used the ssh-based live migration. Obviously allowing nodes to ssh into each other has serious security implications, and I'm not comfortable recommending that to users as a fix.

As Bryan mentioned, the nbd protocol is unencrypted, not tunneled over TLS, and has no authentication. Ideally, an organization should consider at least using ipsec to protect the data on the wire.

> The attack is imho a little bit limited anyway, as you can only interact
> with the receiving end and will most likely not provide it what it expects.

I don't believe this is quite correct. It's been a while since I looked at the code, but as I remember it, the source end opens the nbd server, from which anyone can read during the transfer period (including local guests if there is no firewall), and the receiving end opens up the libvirt listener which can be secured with TLS (but is not by default). So you've got potential vulnerabilities on both ends.

Revision history for this message
Thierry Carrez (ttx) wrote :

@Dan: could you confirm the source end is also affected ?

Revision history for this message
Daniel Berrange (berrange) wrote :

The NBD stuff isn't my area of expertise, but I asked another libvirt maintainer what libvirt does

   the dst: "nbd-server-start @ port", and then for each disk "nbd-server-add $disk_alias"; then on the src: "drive-mirror $disk_alias nbd:dest:port:$disk_alias" for each disk

IOW, the QEMU process on the target host is opening an NBD socket for listening, and the source QEMU pushes the data to the target.

So in both cases the listening TCP socket is on the target host, never the client.
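
The ordering described above can be sketched as a simple plan builder. The function name nbd_block_migration_steps is hypothetical, and the command strings follow the shorthand used in the comment, not exact QEMU monitor syntax. The point is only that every listening step happens on the destination before the source starts pushing data.

```python
from typing import List, Tuple


def nbd_block_migration_steps(dest_host: str, port: int,
                              disk_aliases: List[str]) -> List[Tuple[str, str]]:
    """Return (host_role, command) pairs in the order libvirt issues them."""
    # 1. The destination QEMU starts the NBD server (the listening socket).
    steps = [("dst", f"nbd-server-start {port}")]
    # 2. The destination exports each disk through that server.
    steps += [("dst", f"nbd-server-add {alias}") for alias in disk_aliases]
    # 3. The source mirrors each disk to the matching export on the destination.
    steps += [
        ("src", f"drive-mirror {alias} nbd:{dest_host}:{port}:{alias}")
        for alias in disk_aliases
    ]
    return steps
```

For two disks this yields three destination steps followed by two source steps, so the only TCP listener in the sequence is the one on the target host.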

Revision history for this message
Thierry Carrez (ttx) wrote :

@Nova-core: do you see a lightweight (enough to be backportable) way to efficiently prevent VMs from accessing those ports (without changing behavior too much for other ports) ? That would be plan A (we would issue an OSSA).

As mentioned above, plan B is to issue an OSSN, and fix the issue in future versions -- probably using a combination of libvirt upgrade AND generally protecting the host from VM access.

Revision history for this message
John Garbutt (johngarbutt) wrote :

This might be a good reason to consider bumping the minimum libvirt version in Juno, maybe issue a deprecation warning when we release Icehouse?

Revision history for this message
Russell Bryant (russellb) wrote :

It seems that "plan B" is the best way forward here.

Revision history for this message
Daniel Berrange (berrange) wrote :

I agree with plan B. I don't think there's an acceptably low risk thing we can do in the stable branches for this.

Revision history for this message
Thierry Carrez (ttx) wrote :

OK, plan B it is then.

Rob, Bryan: do you want to keep this bug private until you have time to draft the OSSN ? Might take some time to research appropriate extra firewalling rules inside compute nodes...

Revision history for this message
Robert Clark (robert-clark) wrote :

Yes please, this is not going to be a trivial one to write.

Thierry Carrez (ttx)
Changed in ossa:
status: Incomplete → Won't Fix
Revision history for this message
Robert Clark (robert-clark) wrote :

I'd like to bring Nathan Kinder in on this.

Revision history for this message
Nathan Kinder (nkinder) wrote :

I will start drafting an OSSN to discuss the live migration concerns and outline actions that can be taken to address them:

- libvirt upgrade
- firewall config on Compute nodes

Changed in ossn:
assignee: nobody → Nathan Kinder (nkinder)
status: New → In Progress
Revision history for this message
Nathan Kinder (nkinder) wrote :

Libvirt uses the port range of 49152-49215 for QEMU migration by default. Is this port range also used for NBD migration?

Revision history for this message
Daniel Berrange (berrange) wrote :

Yes, the migration port range is used for NBD too.
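
Given that port range, a compute node can be audited for exposed listeners. The sketch below is an assumption-laden illustration, not something from the OSSN: it parses text in the Linux /proc/net/tcp format and reports ports in 49152-49215 that are in the LISTEN state on the wildcard address 0.0.0.0. The function name is hypothetical, and IPv6 sockets (/proc/net/tcp6) are not covered.

```python
from typing import List

# Default libvirt migration port range quoted in the thread (inclusive).
MIGRATION_PORTS = range(49152, 49216)
TCP_LISTEN = "0A"            # socket state code for LISTEN in /proc/net/tcp
WILDCARD_IPV4 = "00000000"   # 0.0.0.0 as encoded in /proc/net/tcp


def wildcard_migration_listeners(proc_net_tcp: str) -> List[int]:
    """Return migration-range ports listening on 0.0.0.0."""
    ports = []
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        addr_hex, port_hex = fields[1].split(":")
        port = int(port_hex, 16)
        if (fields[3] == TCP_LISTEN and addr_hex == WILDCARD_IPV4
                and port in MIGRATION_PORTS):
            ports.append(port)
    return sorted(ports)
```

On a live host the input would come from reading /proc/net/tcp during a migration; any port returned is reachable on every interface, including guest-facing ones, unless a firewall blocks it.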

Revision history for this message
Nathan Kinder (nkinder) wrote :

This issue is covered by the OSSN that was written for bug 1287194. That OSSN was just published to the following locations:

- Wiki (https://wiki.openstack.org/wiki/OSSN/OSSN-0007)
- <email address hidden>
- <email address hidden>

I would like to make this bug public now, as the above OSSN describes the issue in detail. Are there any objections?

Revision history for this message
Jeremy Stanley (fungi) wrote :

Public is public is public. If the gory details are public, then there is no reason for this bug to remain private.

information type: Private Security → Public Security
Nathan Kinder (nkinder)
Changed in ossn:
status: In Progress → Fix Released
Thierry Carrez (ttx)
information type: Public Security → Public
tags: added: live-migrate
Paul Murray (pmurray)
tags: added: live-migration
removed: live-migrate
lvmxh (shaohef)
Changed in nova:
assignee: nobody → lvmxh (shaohef)
Revision history for this message
Mark McLoughlin (markmc) wrote :

An updated summary of what we think is required in Nova would be helpful.

Especially taking into account that VIR_MIGRATE_TUNNELLED is now enabled by default: https://review.openstack.org/74600

lvmxh (shaohef)
Changed in nova:
assignee: lvmxh (shaohef) → nobody
Revision history for this message
Sean Dague (sdague) wrote :

Reading through this whole bug it seems like with newer libvirt, and markmc's changes to new defaults, we're basically fixed here. Does anything else still need to be addressed?

Marking as Incomplete; if no one chimes in within 60 days this will close out.

Changed in nova:
status: Confirmed → Incomplete
importance: High → Low
Revision history for this message
Pushkar Umaranikar (pushkar-umaranikar) wrote :

As Sean suggested, marking this bug as invalid. Feel free to reopen.

Changed in nova:
status: Incomplete → Invalid