OpenStack Compute (Nova)

[OSSA 2014-003] Live migration can leak root disk into ephemeral storage (CVE-2013-7130)

Reported by Loganathan Parthipan on 2013-11-15
276
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
High
Nikola Đipanov
Grizzly
High
Nikola Đipanov
Havana
High
Nikola Đipanov
OpenStack Security Advisory
High
Grant Murphy

Bug Description

During pre-live-migration required disks are created along with their backing files (if they don't already exist). However, the ephemeral backing file is created from a glance downloaded root disk.

# If the required ephemeral backing file is present then there's no issue.

# If the required ephemeral backing file is not already present, then the root disk is downloaded and saved as the ephemeral backing file. This will result in the following situations:

## The disk.local transferred during live-migration will be rebased on the ephemeral backing file so regardless of the content, the end result will be identical to the source disk.local.
## However, if a new instance of the same flavor is spawned on this compute node, then it will have an ephemeral storage that exposes a root disk.

Security concerns:

If the migrated VM was spawned off a snapshot, now it's possible for any instances of the correct flavor to see the snapshot contents of another user via the ephemeral storage.

CVE References

description: updated
description: updated
description: updated

One possible solution that seems to work:

diff --git a/nova/virt/libvirt/driver.py b/nova/virt/libvirt/driver.py
index bb3c312..68f68e6 100644
--- a/nova/virt/libvirt/driver.py
+++ b/nova/virt/libvirt/driver.py
@@ -4290,13 +4290,22 @@ class LibvirtDriver(driver.ComputeDriver):
                 image = self.image_backend.image(instance,
                                                  instance_disk,
                                                  CONF.libvirt_images_type)
- image.cache(fetch_func=libvirt_utils.fetch_image,
- context=context,
- filename=cache_name,
- image_id=instance['image_ref'],
- user_id=instance['user_id'],
- project_id=instance['project_id'],
- size=info['virt_disk_size'])
+
+ if cache_name.startswith('ephemeral'):
+ image.cache(fetch_func=self._create_ephemeral,
+ fs_label='ephemeral0',
+ os_type=instance["os_type"],
+ filename=cache_name,
+ size=info['virt_disk_size'],
+ ephemeral_size=instance['ephemeral_gb'])
+ else:
+ image.cache(fetch_func=libvirt_utils.fetch_image,
+ context=context,
+ filename=cache_name,
+ image_id=instance['image_ref'],
+ user_id=instance['user_id'],
+ project_id=instance['project_id'],
+ size=info['virt_disk_size'])

Jeremy Stanley (fungi) wrote :

This sounds fairly serious to me at first pass. Any idea which releases are affected?

Nova security reviewers, any input on the proposed patch above?

Changed in ossa:
status: New → Confirmed
importance: Undecided → High
Andrew Laski (alaski) wrote :

The fix seems sane to me, but I am not familiar with libvirt so I would like to see someone more knowledgeable confirm. It just appears that root disks won't be downloaded for ephemeral backing files, and empty images will be created.

But this fix does seem to just address instances migrated after the patch, is there anyway to retroactively address this in cases that a root disk has already been downloaded?

The following is suitable for deployments where it's possible to just put together a script and execute.

1. Delete all ephemeral backing files not currently being referred to by a disk.local cow layer. It's better to delete all such unused ephemeral files since step 2 below can be unreliable.
2. Identify the ephemeral files that are root disk images. This is quite straightforward if all the root disks have a partition table and all we do is look at the file signature to determine if there's a partitionless image with ext3 fs on it. But on a system with user uploaded glance files or where there are single partition root disks with kernel, there could be several false negatives. Anyway, assuming we managed to determine if a compute node has a bad ephemeral backing file then we should first disable the service to prevent any more instance scheduling on that node.
3. Then we could isolate all the VMs using the bad ephemeral file(s) to a set of fresh nodes using live-migration and delete the now unused bad ephemeral files and enable scheduling again.

If one doesn't want to take the chance with step 2 above, a more deterministic approach would be to migrate all the VMs at least once to a node known to contain none or new correct ephemeral disks.

What this gives us is that all ephemeral backing files will be corrected and new instances will see the expected block device with an ext3 fs. All existing VMs will continue to run including those that had already seen the exposed data. However, they all would be backed on good ephemeral backing files.

Nikola Đipanov (ndipanov) wrote :

So IIUC the issue can only happen when the block migration is used, as this code path is only hit if there is no shared storage - this is a limited use case, and the bug does not affect all live migration setups. I am saying this just for clarity - we should definitely fix the issue asap.

The fix looks like a good start, however I think that we should special-case the swap as well. For inspiration see nova.virt.libvirt.driver _create_image method that does these things on instance boot.

Thierry Carrez (ttx) on 2013-11-27
Changed in nova:
status: New → Confirmed

{code}
diff --git a/nova/virt/libvirt/driver.py b/nova/virt/libvirt/driver.py
index 440fe72..db7c6e6 100644
--- a/nova/virt/libvirt/driver.py
+++ b/nova/virt/libvirt/driver.py
@@ -4381,7 +4381,22 @@ class LibvirtDriver(driver.ComputeDriver):
                 image = self.image_backend.image(instance,
                                                  instance_disk,
                                                  CONF.libvirt.images_type)
- image.cache(fetch_func=libvirt_utils.fetch_image,
+ if cache_name.startswith('ephemeral'):
+ image.cache(fetch_func=self._create_ephemeral,
+ fs_label='ephemeral0',
+ os_type=instance["os_type"],
+ filename=cache_name,
+ size=info['virt_disk_size'],
+ ephemeral_size=instance['ephemeral_gb'])
+ elif cache_name.startswith('swap'):
+ inst_type = flavors.extract_flavor(instance)
+ swap_mb = inst_type['swap']
+ image.cache(fetch_func=self._create_swap,
+ filename="swap_%s" % swap_mb,
+ size=info['virt_disk_size'],
+ swap_mb=swap_mb)
+ else:
+ image.cache(fetch_func=libvirt_utils.fetch_image,
                             context=context,
                             filename=cache_name,
                             image_id=instance['image_ref'],
{code}

Nikola Đipanov (ndipanov) wrote :

Ok so this looks much better now. Launchpad is horrible for commenting on patches, but several issues with ephemerals that I can see:
  + fs_label='ephemeral0',
This should be cache_name since there can be more than one ephemeral disk
  + ephemeral_size=instance['ephemeral_gb'])
This is not always true - we are able to specify the size of ephemerals too through block device mapping. This is tricky since we don't have that info in the xml. I am not sure how to handle this - I need to look into this a bit more.

Having the correct size on an ephemeral backing file is important IMHO only on new instance spawning, since it may expects a filesystem with the size promised by the flavor. During migration it's going to be rebased anyway with the existing information from the qcow2 layer.

So to understand the second issue a bit more, are we going to have a backing file of some name ephemeral_SIZE_OS_TYPE but not of size SIZE?

Nikola Đipanov (ndipanov) wrote :

Hey, so - agreed on the size point. The fs_label should be ephemeral0, ephemeral1 etc. Not a huge deal imho.

Consider the patch +2 from my side.

And a huge thank you for fixing this so quickly.

I'll try to fix the fs_label now. Could you please take a look at the swap handling?

I'm a bit confused on the multiple ephemeral disk code. It appears that regardless of the number of disks created, the backing file is the same. ie fname = "ephemeral_%s_%s" % (eph['size'], os_type_with_default) is not dependent on idx. So I'm wondering if only ephemeral0 will be used in the backing file and the rest used in the qcow2 layers. If that's the case we hardcoding 'ephemeral0' for fs_label won't break anything.

So can we use the patch in https://bugs.launchpad.net/nova/+bug/1251590/comments/6 and move this bug along? Let me know if I need to do anything further on this. Thanks.

Tom Hancock (tom-hancock) wrote :

Maybe I misread, but,
    size=info['virt_disk_size'],
for swap should it be
    size = swap_mb * unit.Mi
based on code further down the file?

Nikola Đipanov (ndipanov) wrote :

>I'm a bit confused on the multiple ephemeral disk code. It appears that regardless of the number of disks created, the backing file is the same. ie fname = "ephemeral_%s_%s" % (eph['size'], os_type_with_default) is not dependent on idx. So I'm wondering if only ephemeral0 will be used in the backing file and the rest used in the qcow2 layers. If that's the case we hardcoding 'ephemeral0' for fs_label won't break anything.

so - we go back to the size issue - each ephemeral drive will have a backing file (assuming qcow Images are used by Nova) that is named
  fname = "ephemeral_%s_%s" % (eph['size'], os_type_with_default)
the sizes however can vary from device to device, so if we have for example an instances that allows ephemeral to be 100G we could have a disk of 50 and two disks of 25 - this will result in two backing files ephemeral_25_blah and ephemeral_50_blah and 3 qcow overlays - two that have the 25 one as base and one that has 50.

The fs_label is applied to the original file when created and it makes no sense for it to be ephemeral0,1 etc... or we should be formatting the newly created overlay which kind of defeats the purpose of having backing files. I will likely file a separate bug for this

Also Tom is right wrt size. The thing is - we will not be getting the real size here, as the user could have overriden it through block device mapping, and we do not save that in the xml.

In the long run - we should really be using the db data to do this and not only the xml, but that is definitely outside of the scope of the bugfix - so as soon as you fix the swap issue - I am +2 on this.

diff --git a/nova/virt/libvirt/driver.py b/nova/virt/libvirt/driver.py
index bf7609f..6a1591c 100644
--- a/nova/virt/libvirt/driver.py
+++ b/nova/virt/libvirt/driver.py
@@ -4381,7 +4381,22 @@ class LibvirtDriver(driver.ComputeDriver):
                 image = self.image_backend.image(instance,
                                                  instance_disk,
                                                  CONF.libvirt.images_type)
- image.cache(fetch_func=libvirt_utils.fetch_image,
+ if cache_name.startswith('ephemeral'):
+ image.cache(fetch_func=self._create_ephemeral,
+ fs_label=cache_name,
+ os_type=instance["os_type"],
+ filename=cache_name,
+ size=info['virt_disk_size'],
+ ephemeral_size=instance['ephemeral_gb'])
+ elif cache_name.startswith('swap'):
+ inst_type = flavors.extract_flavor(instance)
+ swap_mb = inst_type['swap']
+ image.cache(fetch_func=self._create_swap,
+ filename="swap_%s" % swap_mb,
+ size = swap_mb * unit.Mi
+ swap_mb=swap_mb)
+ else:
+ image.cache(fetch_func=libvirt_utils.fetch_image,
                             context=context,
                             filename=cache_name,
                             image_id=instance['image_ref'],

Hi all, is the code above ok? Let me know if you need anything else to fix this issue.

Thierry Carrez (ttx) wrote :

Nova core security contacts: could you review the proposed patches ?

Russell Bryant (russellb) wrote :

launchpad has destroyed the formatting in the patch it seems ... can you add the patch as an attachment?

Andrew Laski (alaski) wrote :

The good lgtm based on me indenting it the way I would expect, but I would also like to see it as an attachment to be sure. All comments within the scope of this change have been addressed as far as I can tell.

Andrew Laski (alaski) wrote :

s/good/change/

Nikola Đipanov (ndipanov) wrote :

Here's the patch,

tests pass - but we need more tests for this - I was working on adding them today but didn't manage.

We can post a fix and add tests later too actually.

Andrew Laski (alaski) wrote :

Thanks Nikola. The patch looks good to me, but tests would certainly be good to help prevent regressions.

Grant Murphy (gmurphy) on 2013-12-12
Changed in ossa:
assignee: nobody → Grant Murphy (gmurphy)
Grant Murphy (gmurphy) wrote :

Draft impact description -

Title : Live migration can leak root disk into ephemeral storage
Reporter: Loganathan Parthipan
Products : nova
Affects : grizzly, havana, icehouse

Description : Loganathan Parthipan reported a vulnerability in the nova libvirt driver. By live migrating a snapshotted virtual machine the contents of the snapshot could potentially be leaked via ephemeral storage. Only setups using KVM live block migration are affected.

Thierry Carrez (ttx) wrote :

Grant, a few remarks:
- We usually credit the company the reporter works for, if he provides one.
- We capitalize "Nova"
- For the affects line, we now use "All supported versions" which is a bit more accurate than an enumartion of versions. This may or may not affect Folsom, but we don't care because it's no longer supported...
- I would precise the attack scenario a bit more, in this case what type of attacker, and how they would get access to the leak. Is it local user ? Another cloud user ? In this case, I would add "An authenticated user could potentially access snapshot content by spawning new servers from the right flavor" or something like it
- We do not have space before colons (header-style)
- We usually add a newline after "Description:"

Grant Murphy (gmurphy) wrote :

Thanks ttx. Great feedback.

So is HP right for the company? I couldn't tell from the launchpad page..

Attempt #2
----

Title: Live migration can leak root disk into ephemeral storage
Reporter: Loganathan Parthipan
Products: Nova
Affects: All supported versions

Description:
Loganathan Parthipan from Hewlett Packard reported a vulnerability in the Nova libvirt driver. By spawning a server with the same flavor as another users migrated virtual machine an authenticated user can potentially access that users snapshot content resulting in information leakage. Only setups using KVM live block migration are affected.

Yes, HP is correct.

Thierry Carrez (ttx) wrote :

@Grant: looks good. I would just add " (HP)" at the end of the Reporter: line and add a comma between "machine" and "an" in the description.

Changed in ossa:
status: Confirmed → Triaged
Jeremy Stanley (fungi) wrote :

Also, should be "user's" in two places (possessive form). Otherwise, with Thierry's recommendations in mind, the impact description text in comment #27 looks fine to me.

Grant Murphy (gmurphy) wrote :

Title: Live migration can leak root disk into ephemeral storage
Reporter: Loganathan Parthipan (HP)
Products: Nova
Affects: All supported versions

Description:
Loganathan Parthipan from Hewlett Packard reported a vulnerability in the Nova libvirt driver. By spawning a server with the same flavor as another user's migrated virtual machine, an authenticated user can potentially access that user's snapshot content resulting in information leakage. Only setups using KVM live block migration are affected.

Jeremy Stanley (fungi) wrote :

Thanks Grant! That impact description should suffice.

Grant Murphy (gmurphy) on 2013-12-16
Changed in nova:
status: Confirmed → In Progress
importance: Undecided → High
Thierry Carrez (ttx) on 2013-12-16
Changed in ossa:
status: Triaged → In Progress
Nikola Đipanov (ndipanov) wrote :

All - here's the patch with tests for master - please review and make sure the test is valid as well.

Nikola Đipanov (ndipanov) wrote :

Here's the Havana patch. Am I correct to assume that we will want to fix this in Grizzly as well?

Nikola Đipanov (ndipanov) wrote :

All,

Wondering if we should open this up because of https://bugs.launchpad.net/nova/+bug/1237683 comment #1 (basically mentioning the crux of the security issue publicly), or move it along quickly.

I've found several more issues while fixing this that would be good to report and start fixing.

Jeremy Stanley (fungi) wrote :

I think the comment in bug 1237683 is vague enough as to not point out the security risks involved, so there may still be some benefit to keeping this embargoed while your patches are reviewed here.

Grant Murphy (gmurphy) on 2013-12-18
summary: - Live migration can leak root disk into ephemeral storage
+ Live migration can leak root disk into ephemeral storage (CVE-2013-7130)

@Nikola: if Grizzly is affected we'll want this patch for Grizzly as well.

Thierry Carrez (ttx) wrote :

@Nova core: another round of review/test on the master and havana patches would be awesome.

Nikola Đipanov (ndipanov) wrote :

Here is the rebased master patch

Nikola Đipanov (ndipanov) wrote :

The rebased havana patch

Nikola Đipanov (ndipanov) wrote :

Here is the grizzly patch - please test them.

Thanks a lot!

Thierry Carrez (ttx) wrote :

@Nova-core: please review proposed patches and backports !

Russell Bryant (russellb) wrote :

patches look good to me

Andrew Laski (alaski) wrote :

Except for the spurious i in i_create_images_and_backing method in the commit message they look good to me.

Grant Murphy (gmurphy) on 2014-01-20
Changed in nova:
milestone: none → next
milestone: next → icehouse-3
milestone: icehouse-3 → none
Grant Murphy (gmurphy) wrote :

pre-OSSA sent

Proposed public disclosure date/time: Thursday, January 23, 1500UTC.

Grant Murphy (gmurphy) on 2014-01-22
Changed in ossa:
status: In Progress → Fix Committed
Grant Murphy (gmurphy) wrote :

Just noticed that the unit tests for the havana patch aren't working for me. (Probably my setup).

./run_tests.sh test_create_images_and_backing_ephemeral_gets_created

...

FAIL: nova.tests.virt.libvirt.test_libvirt.LibvirtConnTestCase.test_create_images_and_backing_ephemeral_gets_created

Traceback (most recent call last):
  File "/home/gm/devel/github/openstack/nova/nova/tests/virt/libvirt/test_libvirt.py", line 3066, in test_create_images_and_backing_ephemeral_gets_created
    CONF.image_cache_subdirectory_name)
  File "/home/gm/devel/github/openstack/nova/.venv/lib/python2.7/site-packages/oslo/config/cfg.py", line 1648, in __getattr__
    raise NoSuchOptError(name)
NoSuchOptError: no such option: image_cache_subdirectory_name

Nikola Đipanov (ndipanov) wrote :

nope - that's a bug.

I'll update ASAP

Nikola Đipanov (ndipanov) wrote :

Updated the havana patch to fix the failing tests.

@Nikola: The new patch for havana looks like the same from the previous one ( 0001-libvirt-Fix-root-disk-leak-in-live-mig_havana.patch )

Nikola Đipanov (ndipanov) wrote :

@Tristan - you are right - my bad. The real patch is coming up.

Both master and grizzly seem to be OK so no need to update those.

Thank you Nikola, this last patch fix the test_create_images_and_backing_ephemeral_gets_created!

Sadly, flake8 reported another error afterward (on the last havana patch):

./nova/virt/libvirt/driver.py:4225:1: F821 undefined name 'unit'

Nikola Đipanov (ndipanov) wrote :

Yep - seems that that module does not exist on havana yet - it's just a numeric constant tho.

Patch updated.

Grant Murphy (gmurphy) on 2014-01-23
information type: Private Security → Public Security
Grant Murphy (gmurphy) wrote :

[OSSA 2014-003]

Thierry Carrez (ttx) on 2014-01-24
summary: - Live migration can leak root disk into ephemeral storage (CVE-2013-7130)
+ [OSSA 2014-003] Live migration can leak root disk into ephemeral storage
+ (CVE-2013-7130)

Reviewed: https://review.openstack.org/68658
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b0d36683fe064b32cbef013e1c0c46bd018ab9a1
Submitter: Jenkins
Branch: master

commit b0d36683fe064b32cbef013e1c0c46bd018ab9a1
Author: Nikola Dipanov <email address hidden>
Date: Tue Dec 10 17:43:17 2013 +0100

    libvirt: Fix root disk leak in live mig

    This patch makes sure that i_create_images_and_backing method of the
    libvirt driver (called in several places, but most problematic one is
    the call in the pre_live_migration method) creates all the files the
    instance needs that are not present.

    Prioir to this patch - the method would only attempt to download the
    image, and if it did so with the path of the ephemeral drives, it could
    expose the image to other users as an ephemeral devices. See the related
    bug for more detaiis.

    After this patch - we properly distinguish between image, ephemeral and
    swap files, and make sure that the imagebackend does the correct thing.

    Closes-bug: #1251590

    Co-authored-by: Loganathan Parthipan <email address hidden>

    Change-Id: I78aa2f4243899db4f4941e77014a7e18e27fc63e

Changed in nova:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/68660
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cbeb5e51886b0296349fc476305bfe3d63c627c3
Submitter: Jenkins
Branch: stable/grizzly

commit cbeb5e51886b0296349fc476305bfe3d63c627c3
Author: Nikola Dipanov <email address hidden>
Date: Tue Dec 10 17:43:17 2013 +0100

    libvirt: Fix root disk leak in live mig

    This patch makes sure that _create_images_and_backing method of the
    libvirt driver (called in several places, but most problematic one is
    the call in the pre_live_migration method) creates all the files the
    instance needs that are not present.

    Prioir to this patch - the method would only attempt to download the
    image, and if it did so with the path of the ephemeral drives, it could
    expose the image to other users as an ephemeral devices. See the related
    bug for more detaiis.

    After this patch - we properly distinguish between image, ephemeral and
    swap files, and make sure that the imagebackend does the correct thing.

    Closes-bug: #1251590

    Co-authored-by: Loganathan Parthipan <email address hidden>

    This patch also includes part of commit
    65386c91910ee03d947c2b8bcc226a53c30e060a, not cherry-picked as a whole
    due to the fact that it is a trivial change, and to avoud the
    proliferation of patches needed to fix this bug.

    (cherry picked from commit c69a619668b5f44e94a8fe1a23f3d887ba2834d7)

    Conflicts:
     nova/tests/test_libvirt.py
     nova/virt/libvirt/driver.py

    Change-Id: I78aa2f4243899db4f4941e77014a7e18e27fc63e

Alan Pevec (apevec) on 2014-01-28
Changed in nova:
assignee: nobody → Nikola Đipanov (ndipanov)

Reviewed: https://review.openstack.org/68659
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=15ee7e17f63f5583307a546ecf28952c364c88f9
Submitter: Jenkins
Branch: stable/havana

commit 15ee7e17f63f5583307a546ecf28952c364c88f9
Author: Nikola Dipanov <email address hidden>
Date: Tue Dec 10 17:43:17 2013 +0100

    libvirt: Fix root disk leak in live mig

    This patch makes sure that _create_images_and_backing method of the
    libvirt driver (called in several places, but most problematic one is
    the call in the pre_live_migration method) creates all the files the
    instance needs that are not present.

    Prioir to this patch - the method would only attempt to download the
    image, and if it did so with the path of the ephemeral drives, it could
    expose the image to other users as an ephemeral devices. See the related
    bug for more detaiis.

    After this patch - we properly distinguish between image, ephemeral and
    swap files, and make sure that the imagebackend does the correct thing.

    Closes-bug: #1251590

    Co-authored-by: Loganathan Parthipan <email address hidden>

    (cherry picked from commit c69a619668b5f44e94a8fe1a23f3d887ba2834d7)

    Conflicts:
     nova/virt/libvirt/driver.py
    Change-Id: I78aa2f4243899db4f4941e77014a7e18e27fc63e

Thierry Carrez (ttx) on 2014-01-30
Changed in ossa:
status: Fix Committed → Fix Released
Changed in nova:
milestone: none → icehouse-3
Thierry Carrez (ttx) on 2014-03-05
Changed in nova:
status: Fix Committed → Fix Released
To post a comment you must log in.