[OSSA-2024-002] Incomplete file access fix and regression for QCOW2 backing files and VMDK flat descriptors (CVE-2024-40767)

Bug #2071734 reported by Arnaud Morin
Affects                      Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)     In Progress   Critical    Unassigned
OpenStack Security Advisory  Fix Released  High        Jeremy Stanley

Bug Description

When fixing bug #2059809, a regression of the previous bug #1996188 has been introduced.

TLDR: nova is again allowing VMDK images with disallowed create-types and QCOW2 images with backing files.

Long:

The following steps were used to reproduce this on a Bobcat (2023.2) OpenStack with the nova patches for bug #2059809 applied (not yet merged when writing this report).

Create a vmdk file:
$ qemu-img create -f vmdk disk-vmdk.vmdk 1M -o subformat=monolithicFlat
$ sed -i -r 's|disk-vmdk-flat.vmdk|/etc/hosts|' disk-vmdk.vmdk

Create a faulty qcow image:
$ qemu-img create -f qcow2 -F raw -b /etc/hosts disk-bf.qcow2 1M

Upload both images as raw (the default):
$ for i in disk-bf.qcow2 disk-vmdk.vmdk ; do openstack image create --file $i $i ; done

Boot an instance from those images:
$ openstack server create --flavor small --image disk-vmdk.vmdk --net public disk-vmdk.vmdk
$ openstack server create --flavor small --image disk-bf.qcow2 --net public disk-bf.qcow2

Results
=======
VMDK monolithicFlat
-------------------

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2615, in _build_and_run_instance
    self.driver.spawn(context, instance, image_meta,
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4388, in spawn
    created_instance_dir, created_disks = self._create_image(
                                          ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4790, in _create_image
    created_disks = self._create_and_inject_local_root(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4921, in _create_and_inject_local_root
    self._try_fetch_image_cache(backend, fetch_func, context,
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 10940, in _try_fetch_image_cache
    image.cache(fetch_func=fetch_func,
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/imagebackend.py", line 288, in cache
    self.create_image(fetch_func_sync, base, size,
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/imagebackend.py", line 615, in create_image
    copy_raw_image(base, self.path, size)
  File "/usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py", line 414, in inner
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/imagebackend.py", line 590, in copy_raw_image
    self.resize_image(size)
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/imagebackend.py", line 621, in resize_image
    disk.extend(image, size)
  File "/usr/lib/python3/dist-packages/nova/virt/disk/api.py", line 128, in extend
    processutils.execute('qemu-img', 'resize', image.path, size)
  File "/usr/lib/python3/dist-packages/oslo_concurrency/processutils.py", line 438, in execute
    raise ProcessExecutionError(exit_code=_returncode,
oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: qemu-img resize /var/lib/nova/instances/07b5907b-5efc-4fdf-8b15-4b47a820c2f8/disk 2147483648
Exit code: 1
Stdout: ''
Stderr: "qemu-img: Could not open '/var/lib/nova/instances/07b5907b-5efc-4fdf-8b15-4b47a820c2f8/disk': Could not open '/etc/hosts': Permission denied\n"

Qemu tried to read /etc/hosts. My system permissions prevented it, but nova itself did nothing to block the attempt: wrong.

QCOW2 Backing File
------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 2615, in _build_and_run_instance
    self.driver.spawn(context, instance, image_meta,
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 4415, in spawn
    self._create_guest_with_network(
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7785, in _create_guest_with_network
    with excutils.save_and_reraise_exception():
  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
    self.force_reraise()
  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
    raise self.value
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7763, in _create_guest_with_network
    guest = self._create_guest(
            ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7702, in _create_guest
    guest.launch(pause=pause)
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/guest.py", line 167, in launch
    with excutils.save_and_reraise_exception():
  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
    self.force_reraise()
  File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
    raise self.value
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/guest.py", line 165, in launch
    return self._domain.createWithFlags(flags)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 193, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 151, in proxy_call
    rv = execute(f, *args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 132, in execute
    six.reraise(c, e, tb)
  File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 86, in tworker
    rv = meth(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/libvirt.py", line 1409, in createWithFlags
    raise libvirtError('virDomainCreateWithFlags() failed')
libvirt.libvirtError: internal error: cannot update AppArmor profile 'libvirt-6bd32822-2454-402a-9617-6ec66e0090f4'

In libvirtd journalctl:

Jul 02 20:22:44 compute-1 libvirtd[959438]: internal error: Child process (LIBVIRT_LOG_OUTPUTS=3:stderr /usr/lib/libvirt/virt-aa-helper -r -u libvirt-6bd32822-2454-402a-9617-6ec66e0090f4 -F /dev/net/tun) unexpected exit status 1: virt-aa-helper: error: /etc/hosts

Here it's AppArmor that prevented the boot, but nova should have caught it earlier: wrong.

Expected results
----------------

Nova should raise an exception like it did previously.
E.g. for VMDK: nova.exception.ImageUnacceptable: Image xyz is unacceptable: Invalid VMDK create-type specified

Jeremy Stanley (fungi)
description: updated
Changed in ossa:
status: New → Incomplete
Revision history for this message
Dan Smith (danms) wrote :

This patch removes the explicit format passed to qemu-img in favor of a comparison with format_inspector after the fact. This means we still get a "sniff test" of what qemu thinks the image is, to catch things that are masquerading as raw images but contain more complex formats inside that qemu may later try to interpret. It's yet another compromise in front of the real fix we need, which is a refactor of the whole image backend system to make this comprehensively better.

I'll let Arnaud speak officially for himself, but I received feedback from him that this squashes the known cases of this bug already.

Changed in nova:
status: New → Confirmed
importance: Undecided → Critical
Revision history for this message
Jeremy Stanley (fungi) wrote :

Since this report concerns a possible security risk, an incomplete
security advisory task has been added while the core security
reviewers for the affected project or projects confirm the bug and
discuss the scope of any vulnerability along with potential
solutions.

Given this was already partly discussed in late comments for bug
#2059809
(deleted at the last minute for the sake of the advisory
publication timeline in order to give everyone more time to solve
it properly), we may publish any resulting patches as errata to
OSSA-2024-001 rather than as a separate advisory, but can still
perform advance notice to downstream stakeholders before doing so.

Arnaud asked me to additionally subscribe Felix from the previous
bug (which I have done) and also Dan (who is subscribed already
anyway as part of the nova-coresec group).

Revision history for this message
Dan Smith (danms) wrote :

I added Zack Miele and Sean Mooney from RedHat who have context here already.

I would like to propose that we let this marinate for a short period of time, especially since the rest of this week is problematic for doing much else. That will give us some time to explore whether there are any other cases we can find that aren't covered here, and for proper review of the current situation. Then I think we should keep the inertia, notify the stakeholders soon (like next week), and plan to dispose of this quickly without letting it drag on. We might also want to set some ground rules about checking before expanding the scope of this, like happened with the last one, in case we find a big new thread to pull that would delay getting this fix out immediately, if possible.

Revision history for this message
Felix Huettner (felix.huettner) wrote :

Since I will be absent next week, I have subscribed Maxim to cover preparations for yaook.

Thanks everyone

Revision history for this message
Arnaud Morin (arnaud-morin) wrote :

One note about how to reproduce: you have to set use_cow_images=false in nova.conf.
Otherwise, the Qcow2 image backend class will be used and will partially catch the malicious images.
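
As an aside, a rough sketch of why this option changes the code path (the class names here are hypothetical stand-ins, not nova's actual imagebackend module):
```python
# Rough sketch only (hypothetical class names, not nova's imagebackend module)
# of why use_cow_images changes which code path handles the root disk.

class Qcow2Backend:
    def create_image(self, base, path, size):
        # qcow2 path: builds a qcow2 overlay on top of the cached base image,
        # which happened to partially reject these malicious images.
        ...


class FlatBackend:
    def create_image(self, base, path, size):
        # raw/flat path: copies the cached base and resizes it with qemu-img,
        # which is where the vulnerable call was reached in this report.
        ...


def pick_backend(use_cow_images: bool):
    # Roughly the selection that the use_cow_images option controls.
    return Qcow2Backend() if use_cow_images else FlatBackend()
```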

Revision history for this message
Dan Smith (danms) wrote :

I'm updating the patch previously attached with this one. It's almost identical, but with tests and one additional conditional that should ensure safety with a combination of non-cow environments and the deep inspection disabled, if so configured.

Revision history for this message
Sylvain Bauza (sylvain-bauza) wrote :

The additional tests are indeed appreciated. Looking at the implementation, I'm fine with it, given we continue to be able to disable the deep inspection.

I haven't found any issues with the patch, LGTM.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

As noted privately, an initial review of the previous patch looked OK.
In isolation this patch also looks OK, but I want to test it with known good and bad images before we proceed.

Felix also raised a gap in ISO format support in the original bug, as well as an additional attack vector that enabling ISO could introduce, so I'm going to focus on that first and then hopefully we can test this next week.

Revision history for this message
Felix Huettner (felix.huettner) wrote :

Thanks a lot @sean for covering that. I would not have posted it publicly if it were exploitable.

Revision history for this message
Dan Smith (danms) wrote :

I'm just realizing that this will re-open the data-file exposure if you hide a bad qcow in a glance image claiming to be another format. We won't know it's a qcow, won't safety-check it, and then will proceed to call qemu-img on it. I need to add back some code I had before but removed while trying to trim the patch down; I'm realizing we still need it. It will use format_inspector's detect method to make sure we agree with what glance thinks the image is (like we're doing below for qemu).

I'll get that respun and tested, it might not happen today with the holiday.

Revision history for this message
Dan Smith (danms) wrote :

Attached is a replacement patch that I think will plug the issues I just commented about. I have not tested it in devstack, only in unit tests. It has substantial changes from the one Arnaud initially verified, so we definitely need to scrutinize this. It adds format_inspector detection to make sure we don't pass something claiming to be a raw to qemu that is actually something else.

Thus three layers of confirmation:

- Glance says it's X
- format_inspector detects it as an X, matches glance
- qemu detects it as an X, matches format_inspector and glance
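
For illustration, a minimal standalone sketch of that three-way agreement check; the helper names are hypothetical stand-ins, not nova's actual functions:
```python
# Illustrative sketch of the three-layer agreement check described above.
# detect_format() and qemu_detected_format() are hypothetical stand-ins for
# the python format inspector and `qemu-img info`; this is not nova's API.

class ImageUnacceptable(Exception):
    pass


def verify_format_agreement(path, glance_disk_format,
                            detect_format, qemu_detected_format):
    inspected = detect_format(path)        # layer 2: python-based inspection
    qemu_fmt = qemu_detected_format(path)  # layer 3: what qemu thinks it is

    if inspected != glance_disk_format:    # layer 1 vs layer 2
        raise ImageUnacceptable(
            'Image claims to be %s but was detected as %s'
            % (glance_disk_format, inspected))
    if qemu_fmt != inspected:              # layer 2 vs layer 3
        raise ImageUnacceptable(
            'qemu reports %s but inspection detected %s'
            % (qemu_fmt, inspected))
    return inspected
```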

Also, AMI continues to be a real pain for strong assertions about what we accept. I think we really need to consider dropping support for that entirely, or come up with a subset of things we're willing to accept inside an AMI (et al) and encode those rules here.

Revision history for this message
Jeremy Stanley (fungi) wrote :

I've subscribed Thomas since he was following the state of this in the previous bug's comments that were deleted immediately before disclosure, and expressed an interest in helping test fixes for the new regression.

Revision history for this message
Dan Smith (danms) wrote (last edit ):

Still looking for detailed confirmation of the new patch, but I've now tested the latest version locally in devstack. I also am collecting a list of images, which hopefully are self-explanatory from the names:

bad-qcow-with-backing.qcow2
bad-qcow-with-datafile.qcow2
bad-vmdk-flat-expose.qcow2
bad-vmdk-flat-expose.vmdk
bad-vmdk-sparse-nbdurl.vmdk
good-qcow.qcow2
good-raw.raw
good-sparse-vmdk.vmdk
good-stream-vmdk.vmdk

All of the bad ones fail to boot, the good ones succeed, and all the non-raw ones are also uploaded as raw and fail to boot, even if they're otherwise harmless because we reject the disk_format mismatch. I've got that in a bash script locally, which we clearly need to get into automation (in python of course) somewhere at some point.

*tested with use_cow_images=True and False

Revision history for this message
Dan Smith (danms) wrote :

I've added gibi from the nova team as well, since Sylvain will be out next week and I will be as well after Monday.

Revision history for this message
Dan Smith (danms) wrote :

I've also tested this on top of Sean's latest (at this moment) iso regression fix and it catches and rejects these two additional images:

bad-iso-with-qcow.iso
good.iso

without regressing any of the others mentioned above.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

I have tested the change proposed by Dan on top of my changes for iso support. To do this I created an attack.iso image via the following commands:
```
# download known working images
wget http://download.cirros-cloud.net/0.6.2/cirros-0.6.2-x86_64-disk.img
wget https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-standard-3.20.1-x86.iso
# copy iso to serve as target file
cp alpine-standard-3.20.1-x86.iso attack.iso
dd if=cirros-0.6.2-x86_64-disk.img of=attack.iso bs=32K count=1
dd if=alpine-standard-3.20.1-x86.iso of=attack.iso bs=32K skip=1 seek=1
```
We can verify the image using file and qemu-img:
```
ubuntu@devstack-bugfix:~$ qemu-img info --output json attack.iso
{
    "virtual-size": 117440512,
    "filename": "attack.iso",
    "cluster-size": 65536,
    "format": "qcow2",
    "actual-size": 168824832,
    "format-specific": {
        "type": "qcow2",
        "data": {
            "compat": "1.1",
            "compression-type": "zlib",
            "lazy-refcounts": false,
            "refcount-bits": 16,
            "corrupt": false,
            "extended-l2": false
        }
    },
    "dirty-flag": false
}
ubuntu@devstack-bugfix:~$ file attack.iso
attack.iso: ISO 9660 CD-ROM filesystem data 'alpine-std 3.20.1 x86' (bootable)
```
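
For illustration, a small standalone check (not nova's or qemu's detection logic) that flags this kind of dual-identity file by looking for both the qcow2 magic at offset 0 and the ISO 9660 volume descriptor magic at offset 32769:
```python
# Standalone illustration only: spot a file that carries both a qcow2 header
# and an ISO 9660 primary volume descriptor, like the attack.iso built above.
QCOW2_MAGIC = b'QFI\xfb'        # first 4 bytes of a qcow2 header
ISO9660_MAGIC = b'CD001'        # bytes 1-5 of the volume descriptor at sector 16
ISO9660_OFFSET = 16 * 2048 + 1  # volume descriptors start after the system area


def looks_like_qcow2_and_iso(path):
    with open(path, 'rb') as f:
        head = f.read(4)
        f.seek(ISO9660_OFFSET)
        iso_sig = f.read(5)
    return head == QCOW2_MAGIC and iso_sig == ISO9660_MAGIC


# looks_like_qcow2_and_iso('attack.iso') should return True for the file
# constructed with the dd commands above.
```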
I uploaded all 3 files to Glance and uploaded the attack.iso as both raw and iso.

```
openstack --os-cloud devstack-admin image create --file attack.iso --disk-format iso attack-iso
openstack --os-cloud devstack-admin image create --file attack.iso --disk-format raw attack-raw
openstack --os-cloud devstack-admin image create --file alpine-standard-3.20.1-x86.iso --disk-format iso alpine
```

Applying Dan's patch on top of my ISO series required resolving one merge conflict in nova/virt/images.py related to the AttributeError handling.
The resolution is to accept the hunk from Dan's series:

```
-    except AttributeError:
-        # No inspector was found
-        LOG.warning('Unable to perform deep image inspection on type %r',
-                    img['disk_format'])
-        if disk_format in ('ami', 'aki', 'ari'):
-            # A lot of things can be in a UEC, although it is typically a raw
-            # filesystem. We really have nothing we can do other than treat it
-            # like a 'raw', which is what qemu-img will detect a filesystem as
-            # anyway. If someone puts a qcow2 inside, we should fail because
-            # we won't do our inspection.
-            disk_format = 'raw'
-        else:
-            raise exception.ImageUnacceptable(
-                image_id=image_href,
-                reason=_('Image not in a supported format'))
-
+    except Exception:
+        raise exception.ImageUnacceptable(
+            image_id=image_href,
+            reason=_('Image not in a supported format'))
     if disk_format == 'iso':
         # ISO image passed safety check; qemu will treat this as raw from here
         disk_format = 'raw'
-
     return disk_format
```

Note it is very important to keep
```
    if disk_format == 'iso':
        # ISO image passed safety check; qemu will treat this as raw from here
        disk_format = 'raw'
```
after the `except Exception:` block and ...


Revision history for this message
Dan Smith (danms) wrote :

Thanks Sean!

The iso fix is in the gate now:

https://review.opendev.org/c/openstack/nova/+/923533

My suggestion for next steps would be:

1. Wait for that to merge
2. Replace the patch here with one based on that and the conflict properly resolved
3. Give it a day or two to marinate amongst the people cc'd here (ideally with some positive acks)
4. Push ahead on the stakeholder notification

Unless we have a lot of problems with step 1, I'd say we could shoot for Wednesday or Thursday for step 4.

Thoughts?

Revision history for this message
sean mooney (sean-k-mooney) wrote :

I agree with Dan's timeline.

I suspect step 2 is quite doable tomorrow, or later today, depending on how long step 1 takes.

Wednesday or Thursday for step 4 seems reasonable to me, but I'll defer to others on that.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Downstream stakeholder advance notification on Thursday would be ideal. We generally avoid sending such notifications Friday through Monday, preferring Tuesday/Wednesday/Thursday for better visibility. If we send it Thursday of this week, I would plan for public disclosure the following Thursday (2024-07-18) in order to give everyone time to apply the new patches.

In the current state, and so close to last week's advisory, I'm inclined to consider this errata to OSSA-2024-001 and widely announce an amendment to the existing advisory, calling out the additional patches and guidance.

Revision history for this message
Zack Miele (zmiele) wrote :

Hey Fungi, a quick question on process here: do you intend to get another CVE assigned for this issue as well, strictly for this regression I mean? It would certainly help downstreams track things to make sure all necessary fixes are accounted for.

Revision history for this message
Thomas Goirand (thomas-goirand) wrote :

Hi there!

I don't understand. We're already setting up a clock for disclosure, but there's no patch available here that applies on top of the ISO fix from Sean. Could you please post a patch, give me a bit of time to have it backported, and see if it fits, before rushing into further stress?

Cheers,

Thomas Goirand (zigo)

Revision history for this message
sean mooney (sean-k-mooney) wrote (last edit ):

The ISO patch merged just as I was signing off last night.
The content of unforce-format.patch is correct, but it has a merge conflict.
I will be uploading my local version shortly to replace the existing one.

My plan for today is to backport the ISO patch to all relevant releases.
Once that is done I'll also upload backported variants of unforce-format.patch for each release
as an attachment that applies on top of the relevant backport of the ISO patch.

My goal for today is to have a public backport of the ISO patches up for review and a private attachment of the relevant patches here for this issue. That will likely take a few hours, but
I'll provide the master version after grabbing a coffee so that you can start testing while I'm working to prepare the other patches.

regards
sean

Revision history for this message
sean mooney (sean-k-mooney) wrote :

I have uploaded a master.patch, which is the same as unforce-format.patch rebased to master as of 2024-07-09 10:00 UTC.

once I have the proposed backports for the iso patches I will create a <branchname>.patch file for each based on the relevant Gerrit commit.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

I have now proposed the backport for the ISO format fix to all stable branches:

https://review.opendev.org/q/topic:%22format-inspector%22

The master.patch applies cleanly to 2024.1 https://review.opendev.org/c/openstack/nova/+/923724,
has a conflict on 2023.2 https://review.opendev.org/c/openstack/nova/+/923729,
and will also have conflicts on 2023.1 https://review.opendev.org/c/openstack/nova/+/923733.

I will prepare the relevant additional patch files after I have resolved the conflicts locally, and upload them.

I will also upload a 2024.1 copy of the master patch, even though they are identical, just to have a 1:1 mapping between patch and branch.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

actully applying patchs with "git am -3 < patch_file" works cleanly.

i have preperared the branch specific version in any case and will start uploading them now

i belive -3 is needed because the parent commit sha is changing between branch and git needs the extra context of the 3 way merge to resolve the applcaition.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

created by applying the master patch to https://review.opendev.org/c/openstack/nova/+/923724
with `git am < ~/cve-patches/master.patch`

Revision history for this message
sean mooney (sean-k-mooney) wrote :

created by applying the 2024.1 patch to https://review.opendev.org/c/openstack/nova/+/923729
with `git am < ~/cve-patches/2024.1.patch` and exporting with
`git format-patch --from HEAD^1 --output ~/cve-patches/2023.2.patch`

Revision history for this message
sean mooney (sean-k-mooney) wrote :

created by applying the 2023.2.patch to https://review.opendev.org/c/openstack/nova/+/923733
with `git am < ~/cve-patches/2023.2.patch` and exporting with
`git format-patch --from HEAD^1 --output ~/cve-patches/2023.1.patch`

Revision history for this message
Jeremy Stanley (fungi) wrote :

Thomas: We haven't notified downstream stakeholders (security contacts for distributions and cloud providers) yet, but the plan is to supply them with tested patches a full week ahead of any public disclosure. If patches for maintained stable branches aren't ready by Thursday, the earliest we can start that one-week clock is next Tuesday, 2024-07-16 (with a corresponding public disclosure a week later on 2024-07-23). Regardless, I plan to provide a week between supplying advance copies of the patches and the scheduled publication date.

Zack: Normally we don't seek a separate CVE assignment to track regressions in security fixes when we intend to handle them as errata for a prior advisory. Since this is an extraordinary case though, I'll submit a CVE assignment request tomorrow with the following details (everyone, please let me know if you spot inaccuracies, though they can also be adjusted later if needed):

Title: File access regression for QCOW2 backing files and VMDK flat descriptors
Reporter: Arnaud Morin with OVH
Products: Nova
Affects: ==27.4.0, ==28.2.0, ==29.1.0

Description:
Arnaud Morin (OVH) reported a vulnerability in Nova. By supplying a raw format image which is actually a specially created QCOW2 image with a backing file path or VMDK flat image with a descriptor file path, an authenticated user may convince systems to return a copy of that file’s contents from the server resulting in unauthorized access to potentially sensitive data. Only Nova deployments patched with the original OSSA-2024-001 fixes are affected.

Revision history for this message
Thomas Goirand (thomas-goirand) wrote :

Hi there!
Sean, I tried applying your backport patch on top of Bobcat and Caracal, and each time, it doesn't apply. I would resolve the conflicts by myself, if only this didn't mean I'm probably missing a (critical?) patch:

$ quilt push
Applying patch debian/patches/CVE-2024-XXXXX_4_Change-force_format-strategy-to-catch-mismatches_caracal.patch
patching file nova/tests/unit/virt/libvirt/test_utils.py
patching file nova/tests/unit/virt/test_images.py
Hunk #8 FAILED at 318.
Hunk #9 succeeded at 342 (offset -18 lines).
1 out of 9 hunks FAILED -- rejects in file nova/tests/unit/virt/test_images.py
patching file nova/virt/images.py
Hunk #1 FAILED at 143.
1 out of 2 hunks FAILED -- rejects in file nova/virt/images.py

Note that this is my current patch stack in debian/series:
CVE-2024-32498_1_nova-stable-2024.1_Reject_qcow_files_with_data-file_attributes.patch
CVE-2024-32498_2_nova-stable-2024.1_Check_images_with_format_inspector_for_safety.patch
CVE-2024-32498_3_nova-stable-2024.1_Additional-qemu-safety-checking-on-base-images.patch
CVE-2024-32498_4_Fix-vmdk_allowed_types-checking.patch
CVE-2024-XXXXX_1_port_format_inspector_tests_from_glance.patch
CVE-2024-XXXXX_2_Reproduce_iso_regression_with_deep_format_inspection.patch
CVE-2024-XXXXX_3_Add-iso-file-format-inspector.patch
CVE-2024-XXXXX_4_Change-force_format-strategy-to-catch-mismatches_caracal.patch

What patch am I missing?

Note I have the exact same issue in both Bobcat and Caracal (didn't try Antelope yet, will soon go for it, and all the way to Victoria).

Revision history for this message
Thomas Goirand (thomas-goirand) wrote :

I found out; it was *very* confusing. Looks like I had to update https://review.opendev.org/c/openstack/nova/+/923289 to its latest version.

Please, please, please, when such a patch is updated, you *must* let everyone know. Now it feels like I'd have to check every single patch from CVE-2024-32498 just to make sure I have the latest version... :/

Thomas

Revision history for this message
Thomas Goirand (thomas-goirand) wrote :

Looks like the backport patches all contain an error. Right after "def do_image_deep_inspection" they add an extra parenthesis:
disk_format = img['disk_format'])

which obviously isn't parseable. Once this is fixed, at least in Antelope (I haven't checked other branches yet), I'm getting this:

======================================================================
FAIL: nova.tests.unit.virt.test_images.QemuTestCase.test_fetch_iso_is_raw
nova.tests.unit.virt.test_images.QemuTestCase.test_fetch_iso_is_raw
----------------------------------------------------------------------
testtools.testresult.real._StringException: pythonlogging:'': {{{
2024-07-10 01:35:54,138 WARNING [oslo_policy.policy] JSON formatted policy_file support is deprecated since Victoria release. You need to use YAML format which will be default in future. You can use ``oslopolicy-convert-json-to-yaml`` tool to convert existing JSON-formatted policy file to YAML-formatted in backward compatible way: https://docs.openstack.org/oslo.policy/latest/cli/oslopolicy-convert-json-to-yaml.html.
2024-07-10 01:35:54,138 WARNING [oslo_policy.policy] JSON formatted policy_file support is deprecated since Victoria release. You need to use YAML format which will be default in future. You can use ``oslopolicy-convert-json-to-yaml`` tool to convert existing JSON-formatted policy file to YAML-formatted in backward compatible way: https://docs.openstack.org/oslo.policy/latest/cli/oslopolicy-convert-json-to-yaml.html.
2024-07-10 01:35:54,139 WARNING [oslo_policy.policy] Policy Rules ['os_compute_api:extensions', 'os_compute_api:os-floating-ip-pools', 'os_compute_api:os-quota-sets:defaults', 'os_compute_api:os-availability-zone:list', 'os_compute_api:limits', 'project_member_api', 'project_reader_api', 'project_member_or_admin', 'project_reader_or_admin', 'os_compute_api:limits:other_project', 'os_compute_api:os-lock-server:unlock:unlock_override', 'os_compute_api:servers:create:zero_disk_flavor', 'compute:servers:resize:cross_cell', 'os_compute_api:os-shelve:unshelve_to_host'] specified in policy files are the same as the defaults provided by the service. You can remove these rules from policy files which will make maintenance easier. You can detect these redundant rules by ``oslopolicy-list-redundant`` tool also.
}}}

Traceback (most recent call last):
  File "/<<PKGBUILDDIR>>/nova/virt/images.py", line 158, in do_image_deep_inspection
    inspector = format_inspector.detect_file_format(path)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/<<PKGBUILDDIR>>/nova/image/format_inspector.py", line 1014, in detect_file_format
    with open(filename, 'rb') as f:
         ^^^^^^^^^^^^^^^^^^^^
  File "/<<PKGBUILDDIR>>/nova/tests/fixtures/nova.py", line 1788, in toxic_wrapper
    return orig_f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'anypath.part'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/unittest/mock.py", line 1369, in patched
    return func(*newargs, **newkeywargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/<<PKGBUILDD...


Revision history for this message
Thomas Goirand (thomas-goirand) wrote :

I just checked, and it really is the last backport patch that's causing this (i.e. "Change force_format strategy to catch mismatches"). Maybe it's because I'm missing hunks from other previous patches in the stack? I'll be checking for this.

Revision history for this message
Thomas Goirand (thomas-goirand) wrote :

I took the latest version, committed to Gerrit, of:
- Check images with format_inspector for safety
- Additional qemu safety checking on base images
- Fix vmdk_allowed_types checking

Indeed, all 3 of the above were different from the ones sent during the advisory.

But it didn't help. I still have the above unit test failure.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

OK, let me revalidate that the backports are created correctly and apply cleanly.
I'll execute unit/functional tests to confirm via tox in a clean clone of Nova, to make sure I do not see any effect from my normal dev environment.

If I reproduce the observed issue I'll upload the patches and let you know which ones changed.

It's possible I did not save the file correctly before committing, or something simple like that.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

There is indeed an extra closing paren in the master.patch.

The original patch was

+    ami_formats = ('ami', 'aki', 'ari')
     disk_format = img['disk_format']

while the master.patch contains

-    disk_format = img['disk_format']
+    ami_formats = ('ami', 'aki', 'ari')
+    disk_format = img['disk_format'])

This was missed mainly because I was not reviewing the patches on Gerrit after I created them, and I didn't notice it locally with diff.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Sean reminded me yesterday that this vulnerability is mitigated by keeping use_cow_images enabled, so the last sentence for the description in comment #29 should instead be, "Only Nova deployments which disable use_cow_images are affected."

Revision history for this message
sean mooney (sean-k-mooney) wrote :

I have corrected the rebase artifact locally for master, as well as the failing unit test.

The ISO series adds one additional function call that was missing a mock in test_fetch_iso_is_raw.

I will upload a new revision of the branched patches in the next 30-60 minutes, once I have rerun the unit/functional tests on each branch.

Revision history for this message
Jeremy Stanley (fungi) wrote :

After further discussion with Sean, it seems I misunderstood the two separate concerns addressed in this patch. Since only the incomplete fix for QCOW2 is mitigated by use_cow_images and the VMDK regression is reachable under all conditions, let's go with this simpler description instead (for the errata publication we'll separate the two cases as different comments):

Title: Incomplete file access fix and regression for QCOW2 backing files and VMDK flat descriptors
Reporter: Arnaud Morin with OVH
Products: Nova
Affects: <27.4.1, >=28.0.0 <28.2.1, >=29.0.0 <29.1.1

Description:
Arnaud Morin (OVH) reported a vulnerability in Nova. By supplying a raw format image which is actually a specially crafted QCOW2 image with a backing file path or VMDK flat image with a descriptor file path, an authenticated user may convince systems to return a copy of the referenced file’s contents from the server resulting in unauthorized access to potentially sensitive data. All Nova deployments are affected.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

generated with git format-patch --from HEAD^1 --output ~/cve-patches/master.patch
after rebasing unforce-format.patch on the current head of master

commit a305571262481f7fa0152ed23fffdcc87fb821c9 (origin/master, origin/HEAD, gerrit/master, master)
Merge: cc2514d02e ee3ec9b8f2
Author: Zuul <email address hidden>
Date: Tue Jul 9 19:11:48 2024 +0000

    Merge "[ironic] Ensure we test iterators when needed"

Revision history for this message
sean mooney (sean-k-mooney) wrote :

created by applying the master.patch to https://review.opendev.org/c/openstack/nova/+/923729
with `git am < ~/cve-patches/master.patch` and exporting with
`git format-patch --from HEAD^1 --output ~/cve-patches/2024.1.patch`

commit eeda7c333c773216c216159926673874ce4843ba
Author: Sean Mooney <email address hidden>
Date: Thu Jul 4 20:09:31 2024 +0100

    Add iso file format inspector

Revision history for this message
sean mooney (sean-k-mooney) wrote :

created by applying the 2023.2 patch to https://review.opendev.org/c/openstack/nova/+/923733

commit 65f0789df05e2ba7f11c0eaf2c6959367acbced2
Author: Sean Mooney <email address hidden>
Date: Thu Jul 4 20:09:31 2024 +0100

    Add iso file format inspector

with `git am < ~/cve-patches/2023.2.patch` and exporting with
`git format-patch --from HEAD^1 --output ~/cve-patches/2023.1.patch`

Revision history for this message
sean mooney (sean-k-mooney) wrote :

created by applying the 2024.1 patch to https://review.opendev.org/c/openstack/nova/+/923729

commit 24628ecbbe9d5fdd4fe6767ca92395f0d3da9e48
Author: Sean Mooney <email address hidden>
Date: Thu Jul 4 20:09:31 2024 +0100

    Add iso file format inspector

with `git am < ~/cve-patches/2024.1.patch` and exporting with
`git format-patch --from HEAD^1 --output ~/cve-patches/2023.2.patch`

Revision history for this message
sean mooney (sean-k-mooney) wrote :

master.patch, 2024.1.patch, 2023.2.patch and 2023.1.patch are now updated.
I uploaded the incorrect patch for 2023.2.patch, so I redid that one.

I have run unit/functional tests on master.patch and 2024.1.patch.

The system I am preparing the patches on is not capable of running the tests on older releases,
so I will test the 2023.1.patch and 2023.2.patch files shortly, after I copy them to a test VM.

I will update this bug based on the results.

The patches should apply directly to the current revision of the ISO series on each branch,
and the master patch is based directly on the current tip of master.

Revision history for this message
Jeremy Stanley (fungi) wrote : Re: Regression VMDK/qcow arbitrary file access (CVE-2024-40767)

MITRE has assigned CVE-2024-40767 for this.

summary: - Regression VMDK/qcow arbitrary file access
+ Regression VMDK/qcow arbitrary file access (CVE-2024-40767)
Revision history for this message
Thomas Goirand (thomas-goirand) wrote :

Hi.

Thanks Jeremy for the new CVE number. That's indeed easier to manage for everyone.

Thanks Sean and Dan for the work and patches. FYI, I have been able to apply it from Antelope through Caracal, and could also build backports for Victoria through Zed, without any regression when running unit tests at package build time.

I still have to run functional tempest testing with my nested-virtualized-PoC [1] (under both Victoria and Zed, as they are respectively the versions in Bullseye and Bookworm), though so far it's looking good. :)

Cheers,

Thomas Goirand (zigo)

[1] https://salsa.debian.org/openstack-team/debian/openstack-cluster-installer#using-oci-poc-package-for-fun-and-profit

Revision history for this message
sean mooney (sean-k-mooney) wrote :

For older distro releases we have a number of optional follow-up patches for unit test stability/functionality,

for example https://review.opendev.org/c/openstack/nova/+/923878/3 and https://review.opendev.org/c/openstack/nova/+/923935/1

These just skip unit tests if the distro ships a qemu-img that does not support the relevant format, etc.

If you have any issues with the package build related to the ported unit tests, then you should be able to apply those commits.

We have proposed those on top of the ISO patches to all upstream stable releases, since they are not directly related to this CVE but are related to the content of the ISO fix series, namely the unit tests imported from Glance.

I have not had time to create patches for our downstream branches yet,
which are nominally based on Wallaby and Train, but I'm expecting the patches to apply cleanly to those as well if you have the ISO format backports.

I'm not sure how useful patches for our downstream branches would be to attach here, as we have some downstream-only feature backports, but I can potentially share those after the disclosure.

Revision history for this message
Jeremy Stanley (fungi) wrote :

When starting to put together the advance notification for downstream stakeholders, it quickly became apparent that the messaging will be complex while we're still in a state of flux with outstanding pre-requisite changes that have merged to some branches but not others. For example, at this point the proposed patch for stable/2024.1 should work when merged to the present state of that branch, but the same is not true for stable/2023.2 yet.

In order to simplify downstream patching as well as give time for more testing and feedback from Nova reviewers, I'm going to defer sending advance patch copies until Tuesday, 2024-07-16, with a proposed disclosure one week later at 15:00 UTC on 2024-07-23. Hopefully we can use this opportunity to prioritize approvals and gating for the ISO patch series on branches that don't have it merged yet.

Thanks again to everyone for all your hard work on this!

Revision history for this message
Thomas Goirand (thomas-goirand) wrote :

Hi Jeremy,

Indeed, unless someone is following closely what's going on, it's kind of hard to make a working patch set, especially when targeting EOL branches. I would suggest that, in your message, you point out that 3 out of 4 patches for CVE-2024-32498 need to be updated with the latest version from Gerrit, as I pointed out earlier. The only one that didn't change is the first one. Maybe a link to the 4 patches for CVE-2024-32498, for each of the 4 branches, would be enough? Here's the list, if that helps:

1/ Reject qcow files with data-file attributes:
master: https://review.opendev.org/c/openstack/nova/+/923255
Caracal: https://review.opendev.org/c/openstack/nova/+/923273
Bobcat: https://review.opendev.org/c/openstack/nova/+/923284
Antelope: https://review.opendev.org/c/openstack/nova/+/923288

2/ Check images with format_inspector for safety
master: https://review.opendev.org/c/openstack/nova/+/923256
Caracal: https://review.opendev.org/c/openstack/nova/+/923274
Bobcat: https://review.opendev.org/c/openstack/nova/+/923285
Antelope: https://review.opendev.org/c/openstack/nova/+/923289

3/ Additional qemu safety checking on base images
master: https://review.opendev.org/c/openstack/nova/+/923257
Caracal: https://review.opendev.org/c/openstack/nova/+/923275
Bobcat: https://review.opendev.org/c/openstack/nova/+/923286
Antelope: https://review.opendev.org/c/openstack/nova/+/923290

4/ Fix vmdk_allowed_types checking
master: https://review.opendev.org/c/openstack/nova/+/923258
Caracal: https://review.opendev.org/c/openstack/nova/+/923276
Bobcat: https://review.opendev.org/c/openstack/nova/+/923287
Antelope: https://review.opendev.org/c/openstack/nova/+/923291

I hope this helps.

Cheers,

Thomas Goirand (zigo)

Revision history for this message
Jeremy Stanley (fungi) wrote :

Thomas: Those are the same links that appeared for Nova in all copies of the official advisory, e.g. https://security.openstack.org/ossa/OSSA-2024-001.html , so it's probably sufficient to just mention that there were minor updates to them in the process of getting them merged to their corresponding public branches. That's not terribly unusual, and has happened with many of our past advisories. Private copies of patches supplied in advance are best-effort attempts to resolve vulnerabilities, but should eventually be replaced with whatever actually merges upstream since those may get adjustments to non-runtime code paths like test routines or rebased to address merge conflicts with changes that landed independently between patch creation and disclosure.

As far as the new patches attached to this bug, since they depend on additional public patches in each branch (besides those covered in OSSA-2024-001), it's looking more likely that we'll need to issue a completely separate advisory in order to make it clear that the new fixes are expected to be applied to the current states of those branches rather than merely as additions to the original changes from OSSA-2024-001.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

The ISO fixes are important, as I had to mitigate the publicly reported possible ISO attack vector of putting a different image header in the system area of the ISO file before I could enable ISO support.

The side effect of that is preventing multiple formats in a single file, for all file formats.

So that is a separate hardening opportunity that was not covered by either CVE and has been mitigated as part of the regression fix.

It does not change the scope of this CVE, but it's still important for improving the security posture.

Revision history for this message
Arnaud Morin (arnaud-morin) wrote :

Adding Julien Le Jeune, who works at OVH on nova topics; he will work on this subject while I'm off.

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

The attached patches seem not to be enough. I did testing based on 2023.1 and I see that the VMDK-based vulnerability is still possible. It seems the format_inspector does not support monolithicFlat, detects it as raw, and then nova calls qemu-img info on the image.

This can be reproduced via the original instructions on a system patched with the attached 2023.1.patch, or via the following unit test:
```
    def test_vmdk_monolithic_flat_format_detect(self):
        img = self._create_img("vmdk", 1 * units.Mi, subformat="monolithicFlat")
        self.assertEqual("vmdk", str(format_inspector.detect_file_format(img)))

```

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

I cannot reproduce the qcow2 case described by the reporter any more after applying the attached 2023.1.patch. Nova properly rejects the image without passing it to qemu-img:
```
Jul 15 13:35:35 edpm-compute-0 nova_compute[357969]: 2024-07-15 13:35:35.050 2 ERROR nova.compute.manager [instance: 0b848525-9c55-4b8a-b5d5-e7457da5c0eb] File "/usr/lib/python3.9/site-packages/nova/virt/images.py", line 201, in fetch_to_raw
Jul 15 13:35:35 edpm-compute-0 nova_compute[357969]: 2024-07-15 13:35:35.050 2 ERROR nova.compute.manager [instance: 0b848525-9c55-4b8a-b5d5-e7457da5c0eb] force_format = do_image_deep_inspection(img, image_href, path_tmp)
Jul 15 13:35:35 edpm-compute-0 nova_compute[357969]: 2024-07-15 13:35:35.050 2 ERROR nova.compute.manager [instance: 0b848525-9c55-4b8a-b5d5-e7457da5c0eb] File "/usr/lib/python3.9/site-packages/nova/virt/images.py", line 182, in do_image_deep_inspection
Jul 15 13:35:35 edpm-compute-0 nova_compute[357969]: 2024-07-15 13:35:35.050 2 ERROR nova.compute.manager [instance: 0b848525-9c55-4b8a-b5d5-e7457da5c0eb] raise exception.ImageUnacceptable(
Jul 15 13:35:35 edpm-compute-0 nova_compute[357969]: 2024-07-15 13:35:35.050 2 ERROR nova.compute.manager [instance: 0b848525-9c55-4b8a-b5d5-e7457da5c0eb] nova.exception.ImageUnacceptable: Image edc3e901-4410-4337-b622-96276f346ed7 is unacceptable: Image not in a supported format
```

Revision history for this message
Dan Smith (danms) wrote :

As part of the previous-previous-CVE (I'd have to go look up which one) we just limited the subtypes we'll allow by policy:

https://github.com/openstack/nova/blob/master/nova/virt/images.py#L137

So format_inspector doesn't support the formats we know to be dangerous (and less useful for cloudy type things). However, we should either be failing because raw (from format_inspector) doesn't match vmdk (from qemu-img), or failing the above vmdk safety check, whichever comes first.
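
For reference, a simplified standalone sketch of that allow-list style check; the names and default list here are illustrative assumptions, and the real check lives behind the link above:
```python
# Simplified, illustrative sketch of an allow-list check on the VMDK
# create-type; the default list and names here are assumptions for
# illustration only.

DEFAULT_VMDK_ALLOWED_TYPES = ['streamOptimized', 'monolithicSparse']


class ImageUnacceptable(Exception):
    pass


def check_vmdk_create_type(create_type, allowed_types=None):
    allowed = allowed_types or DEFAULT_VMDK_ALLOWED_TYPES
    if create_type not in allowed:
        # monolithicFlat and other multi-file subformats are rejected because
        # their descriptor can point qemu at arbitrary host files.
        raise ImageUnacceptable(
            'Invalid VMDK create-type specified: %s' % create_type)
```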

Are you saying that's not happening? And is it only on 2023.1 or potentially elsewhere?

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

> However, we should be either be failing because raw (from format_inspector) doesn't match vmdk (from qemu-img) or the above vmdk safety check, whichever comes first.

The vmdk safety_check does not run, as the format_inspector does not recognize the file as vmdk, but as raw.

Then we call qemu-img info, but that means we trigger the vulnerability, as we pass an unsafe image to qemu.

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

> And is it only on 2023.1 or potentially elsewhere?

I only tested it on top of 2023.1, but since format_inspector's lack of monolithicFlat support is also true on master (based on the unit test), I think all the branches are affected.

Revision history for this message
Dan Smith (danms) wrote :

Okay, let me summarize the conversation I just had with gibi here. tl;dr the patch as it is here should be good.

The actual VMDK regression in the original report can be boiled down to "if we hide a VMDK in a raw glance image (or a vmdk that FI thinks is raw), we will not call the old/existing vmdk allowed-types check." I think the OP tried a vulnerable image, saw that it failed but for the wrong reason (because it really opened the backing/extent file and failed before we ran the actual check) and then captured the trace from the logs. Gibi's concern was that we should have failed the safety check by that point, and ended up calling `qemu-img info` on the file, which triggered the permissions issue. However, for those unsupported subtypes, that's expected and if given a backing/extent file that doesn't fail that permission check, we _will_ proceed to call the VMDK types check as expected (and as we did before), thus closing the regression hole with this patch.

Longer-term we should teach format-inspector to recognize even the unsupported subtypes and just make the safety check fail for the ones that we don't support further deep inspection on.

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote (last edit ):

Thanks Dan. I agree with the summary. In the meantime I tested the vmdk scenario with an image that points to an existing and readable file on the hypervisor. Nova does call qemu-img info on the file, but then our new check, which compares the disk_format from glance and from the format inspector (raw in this case) with the disk_format returned by the qemu-img info call, fails and nova properly rejects the image.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

Thanks, Dan/Gibi, for confirming.

I tried to test this also and saw the permission failure, but missed the fact that it is expected.
Now that the scope has been clarified, and Gibi has confirmed the image is rejected when the referenced file is readable, I agree that additional hardening via teaching the format inspector about these unsupported formats can be done separately.

We should not expand the scope of this CVE to include that.
I briefly looked at what that involves, and it would be better to do it via the normal code review process than under the limited embargoed one.

It is possible to teach the inspector about the other vmdk formats, but it's a more invasive change than I think we want to make in the current mitigation, as it would introduce additional risk and delay closing the current bug without improving security enough to warrant the delay.

Calling qemu-img info will still reject the image in this case.

Revision history for this message
Arnaud Morin (arnaud-morin) wrote :

Hello,

I confirmed that the current patch mitigates the issue described in that bug.

However, as gibi said earlier, the code path lets the VMDK go through a qemu-img info call without enforcing the format.
I was not able to craft a VMDK image that could break through the system, but don't you think that would be possible using QMP (https://www.qemu.org/docs/master/interop/qemu-storage-daemon-qmp-ref.html)?

Revision history for this message
sean mooney (sean-k-mooney) wrote :

Preventing the call to qemu is out of scope for this bug.
That would require implementing support for parsing all VMDK formats, including the ones we have never supported.

OpenStack has only ever supported the single-file vmdk formats.

$ qemu-img create -f vmdk disk-vmdk.vmdk 1M -o subformat=monolithicFlat

creates a multi-file disk image; disk-vmdk.vmdk is the descriptor text file that describes that image.

so when you update disk-vmdk.vmdk

$ sed -i -r 's|disk-vmdk-flat.vmdk|/etc/hosts|' disk-vmdk.vmdk

you are just updating the descriptor file

and when you create the glance image

openstack server create --flavor small --image disk-vmdk.vmdk --net public disk-vmdk.vmdk

you are just uploading the descriptor file.

If you replaced the content of disk-vmdk.vmdk with a qcow, the format inspector would detect it as a qcow and reject it before calling qemu-img.

The scope of this CVE is to fix the regression of https://bugs.launchpad.net/bugs/1996188,
which was previously mitigated by using qemu-img.

Any additional hardening beyond that we intend to pursue as a normal hardening opportunity, not as part of an embargoed CVE.

Unless you can demonstrate an explicit path to exploiting this, I think we should proceed with the patches as presented.

In terms of storage daemons, nova has never supported or used the libvirt storage daemon.

As part of our new installer I even explicitly made sure not to enable the systemd service file for it:
https://github.com/openstack-k8s-operators/edpm-ansible/blob/main/roles/edpm_libvirt/defaults/main.yml#L29-L44

We did ask our virt team whether they knew of any way to use the nbd server from vmdk and vhd;
to our knowledge, the quorum driver available in the qcow format cannot be used in a similar way with vmdk.

Nova also does not have any integration with the qemu-storage-daemon:
we do not generate XML that uses the storage daemon to run the storage component in a separate process from the main qemu executable.

That functionality is mainly used for vhost-user-blk and storage backends like SPDK that do software-accelerated storage in userspace.

While it would not be impossible for os-brick to use the qemu-storage-daemon to create a block device that can be consumed by qemu for cinder volumes, that is out of the scope of this bug and I don't believe any implementations do that. With that context in mind, I don't see a regression in security by continuing to use qemu-img.

Long term we will likely want to move away from qemu-img as our primary introspection method, but doing that needs considerably more work to make our Python-based inspector capable of supporting all formats we may ever support.

Revision history for this message
Dan Smith (danms) wrote :

The QEMU team previously told me that the QMP stuff (i.e. the magic json blob) is only honored when present in a qcow's data-file. Even though it was asserted in the previous bug that the reporter thought it might be doable in other formats, I have confirmed again this morning that it's not exploitable in nova the way we're doing things. The details around why that is are a bit complex, which I will avoid going into here. But, tl;dr is, it's okay for us to call `qemu-img info` on the VMDKs of the subformats (and any other similar formats) we don't recognize in format_inspector at that stage.

So yes, we want to go for full safety eventually, and identify/safety check anything we can before we pass it to `qemu-img info` but doing so for all the vmdk subformats is going to be more work. Doing that in the context of this bug is not scope I want to add, unless and until there's an actual concern. I was hoping to get this out ASAP since it's a regression from the previous fix.

Revision history for this message
Jeremy Stanley (fungi) wrote :

Given there are no remaining blockers identified, and the four patches attached to comments #40-45 from Wednesday have been tested and reviewed favorably by multiple parties, I'll proceed with the downstream stakeholder advance notification with a plan to publish a new official advisory at 15:00 UTC on 2024-07-23.

In addition to the description draft from comment #39 I'll be including these additional notes:

- The patches included should apply cleanly to the present public state of their respective branches, and depend on some commits which merged after the OSSA-2024-001 fixes as well as the final states of the Nova changes linked from that advisory (those did see some minor adjustments before they merged).

- Neither the methods introduced in these patches nor the fixes for OSSA-2024-001 are capable of blocking malicious images which are already resident in Nova's cache. At this time we do not have useful operator guidance for identifying and removing such existing images from the cache but strongly caution, if you do attempt to use the qemu-img tool to find them, to make sure you're using a version of it patched for CVE-2024-4467.

Revision history for this message
Dan Smith (danms) wrote :

Draft extra comments look good to me, thanks Jeremy!

Revision history for this message
Jeremy Stanley (fungi) wrote :

Advance notification and copies of the patches have been sent to our downstream stakeholders, including the private linux-distros mailing list. As usual, I'll be subscribing stakeholders to this bug report at their request.

I also included this additional note, as discussed earlier in the bug:

- The QEMU issue is due to an incomplete fix in OSSA-2024-001 affecting systems where the use_cow_images configuration option is disabled, while the VMDK issue is a regression of the earlier OSSA-2023-002 vulnerability reintroduced by the new implementation in OSSA-2024-001. Both problems were identified in the final hours before OSSA-2024-001 publication but, due to time constraints, were redacted from that bug and moved to a separate report.

Jeremy Stanley (fungi)
summary: - Regression VMDK/qcow arbitrary file access (CVE-2024-40767)
+ Incomplete file access fix and regression for QCOW2 backing files and
+ VMDK flat descriptors (CVE-2024-40767)
Revision history for this message
Kurt Garloff (kgarloff) wrote : Re: Incomplete file access fix and regression for QCOW2 backing files and VMDK flat descriptors (CVE-2024-40767)

Testing note:

TL;DR:
I failed to exploit this issue with both VMDK flat and QCOW2 backing-file images using nova-28.1.1.20240710 from OSISM's 2023.2 kolla-ansible images.

Longer note:
I tested with 2023.2 stable branch, kolla-ansible images from OSISM
quay.io/osism/nova-compute 28.1.1.20240710 ce58c6c1512b 9 days ago 1.44GB

* Produced the raw images for VMDK (with flat file) and QCOW2 (with backing file) as described in the first post (I referenced /etc/gshadow for testing there) and registered them.
* Producing cinder volumes from them succeeds, and the created volumes do NOT contain the contents of /etc/gshadow of the cinder-volume container but just the plain content of the image file. Good. As expected.
* Creating a nova instance with local disk, the thing of course does not boot. Using rescue mode to create a booting instance with the disk as /dev/sdb, and inspecting sdb, I again see the contents of the raw image files and not /etc/gshadow. This is NOT what I expected based on this bug report.

Here's what I see, which is the IMVHO correct behavior from nova.

QCOW2:
QFI
   py*rawhWdirty bitcorrupt bitexternal data filecompression typeextended L2 entrieslazy refcountsbitmapsraw external data/etc/gshadow

VMDK:
# Disk DescriptorFile
version=1
CID=67b9d934
parentCID=ffffffff
createType="monolithicFlat"

# Extent description
RW 2048 FLAT "/etc/gshadow" 0

# The Disk Data Base

Revision history for this message
sean mooney (sean-k-mooney) wrote :
Download full text (13.6 KiB)

so i just happened to see this bug; I'm not working today

i quickly tested the rescue case again

(overcloud) [stack@undercloud-0 test_images]$ openstack image list
+--------------------------------------+----------------------------------+--------+
| ID | Name | Status |
+--------------------------------------+----------------------------------+--------+
| 0ca27d66-f167-4455-82e6-5732e26829c5 | bad-qcow-with-datafile.qcow2 | active |
| 1fe06aee-f66b-46b7-84ef-6689e3fb88e9 | cirros-0.5.2-x86_64-disk.img | active |
| d2fefc19-a51f-4e2f-9b51-192e7e2d9714 | cirros-0.5.2-x86_64-disk.img_alt | active |
+--------------------------------------+----------------------------------+--------+

(overcloud) [stack@undercloud-0 test_images]$ openstack server create --flavor test --image cirros-0.5.2-x86_64-disk.img --network public rescue

(overcloud) [stack@undercloud-0 test_images]$ openstack server show rescue
+-------------------------------------+---------------------------------------------------------------------+
| Field | Value |
+-------------------------------------+---------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | compute-1.redhat.local |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute-1.redhat.local |
| OS-EXT-SRV-ATTR:instance_name | instance-00000049 |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2024-07-20T12:01:22.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | public=10.0.0.176, 2620:52:0:13b8::1000:97 |
| config_drive | |
| created | 2024-07-20T12:01:16Z |
| flavor | test (5001348c-d056-42f7-95d6-d9b025719732) |
| hostId | 335611ec1351c7f90a5e5eb581dc958f89ad4c7be64406c4740a12d2 |
| id ...

Revision history for this message
sean mooney (sean-k-mooney) wrote :
Download full text (104.1 KiB)

if i do it the other way around

boot from a bad image and rescue with a good one, it's also blocked

(overcloud) [stack@undercloud-0 test_images]$ openstack server show rescue
(remainder of the `openstack server show rescue` output truncated)

Revision history for this message
sean mooney (sean-k-mooney) wrote :

can you clarify what you mean by

"* Creating a nova instance with local disk, the thing of course does not boot. Using rescue mode to create a booting instance with the disk as /dev/sdb, and inspecting sdb, I again see the contents of the raw image files and not /etc/gshadow. This is NOT what I expected based on this bug report."

Revision history for this message
Kurt Garloff (kgarloff) wrote :

# This creates an instance trying to boot it from the qcow2-bf image that I registered before
# Note SCS-2V-4-20s is a flavor with 2vCPUs, 4GiB RAM and a 20GiB local SSD
openstack server create --network oshm-network --security-group ssh --key-name oshm-key --flavor SCS-2V-4-20s --image disk-bf.qcow2 evil-qocw2.2
# Of course it does not come up; this image does not contain a bootloader/kernel
openstack server rescue --image "Debian 12" --password UNUSED evil-qocw2.2
# Now it boots from a disk with the Debian 12 image and attaches the previous boot disk as sdb
# I can connect with the oshm-key (either locally or after attaching a FIP) and look at /dev/sdb
# It may be /dev/vdb if your flavors do not prefer SCSI
# Same thing with the disk-vmdk.vmdk image
# In neither case will the referenced external file be on sdb, just the raw contents ...
# You can of course do all of this from Horizon.
#
# HTH

Revision history for this message
Kurt Garloff (kgarloff) wrote :

And in case it was unclear:
* I expected that this issue is exploitable and I can exfiltrate /etc/gshadow (or /etc/hosts if the original image is used) or worse from the nova-compute container. This was trivially possible before we added the fixes for CVE-2024-32498, via nova, cinder and glance. I expected to still find a hole in nova by registering a raw image with vmdk/qcow2 contents as described by the reporter. It did not work.
* So, the code in nova-compute may still go into areas it should not go into by allowing the misdeclared image to confuse it, but I did not find this to be exploitable, though I was possibly not creative enough.
* That said, I appreciate the patches from Dan Smith and others and I think they make us more robust. I'm just not sure that we are dealing with an exploitable security issue.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

i'll need to test this again with master on Monday

if we boot a VM from a bad image it will be blocked and end up not having any host.
so if you rescue that with any image you should get a 409 conflict as the VM is not on any host
as i noted in https://bugs.launchpad.net/nova/+bug/2071734/comments/71

and if you do it the other way (boot from a good image and rescue with a blocked one)

rescue fails as i noted in
https://bugs.launchpad.net/nova/+bug/2071734/comments/70

so at least with a quick attempt i didn't reproduce the behaviour you described in
https://bugs.launchpad.net/nova/+bug/2071734/comments/74

did you use an image that was in the image cache?

part of the mitigation of this requires that you first purge any bad images from the image cache on the compute node
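
As a rough sketch only (assuming nova's usual scheme of naming cached base files after the SHA-1 of the Glance image UUID, and the default instances_path; verify this on your own deployment and confirm no instance still uses the file before removing anything):

# Hypothetical example using the bad-qcow image UUID from the listing above.
$ IMAGE_ID=0ca27d66-f167-4455-82e6-5732e26829c5
$ ls -l /var/lib/nova/instances/_base/$(echo -n "$IMAGE_ID" | sha1sum | cut -d' ' -f1)*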

Revision history for this message
Kurt Garloff (kgarloff) wrote :

I wanted to show exploitability and tested with the 2023.2 stable branch from 2024-07-10, without the fixes for this CVE but with the ones for CVE-2024-32498.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

oh ok, that explains why you are not seeing the same behaviour.

there are ways to exploit https://bugs.launchpad.net/nova/+bug/2059809 (CVE-2024-32498)
that do not require the VM to be bootable to exfiltrate data.

as such it's important to ensure we have the extra protections provided by this patch to close those
attack vectors.

since you are testing the unpatched state, one thing to keep in mind is that this vulnerability manifests when you have [DEFAULT]use_cow_images=False, which is not the default.
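
For reference, that non-default setting looks like this in nova.conf on the compute node (a minimal illustration; other settings that select the flat/raw image backend can have the same effect):

# /etc/nova/nova.conf (compute node)
[DEFAULT]
# Non-default: store instance disks as flat/raw files instead of qcow2
# overlays, which is the code path affected by the QCOW2 half of this bug.
use_cow_images = False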

Revision history for this message
Kurt Garloff (kgarloff) wrote :

Yeah, exfiltration of data with CVE-2024-32498 was trivial. Just create a volume from the bad image and attach it ...
My setup does indeed NOT use `use_cow_images=False`.
Which seems to make the vulnerability non-exploitable. Which is good!

Revision history for this message
Dan Smith (danms) wrote :

Right, this is definitely exploitable without this patch, it just requires a use_cow_images=False (or equivalent - there are a few ways to get there) config on the backend.

So just to be clear (correct me if I'm wrong), this testing was for the reported issue without the patch for this bug and without the requisite config to exploit. Thus, we're still looking good with this bug/patch per plan.

Revision history for this message
Kurt Garloff (kgarloff) wrote :

@danms -- I meant to provide more clarity, not confuse anyone. Sorry if I did.
When I look at security vulns, I always go through some steps.
0. Try to understand the vuln
1. Reproduce it with unpatched code
2. Analyse the patches and deploy them
3. Fail reproduction

I got stuck at 1 -- probably because I failed to understand that this would only hit if I configured use_cow_images=False. I will try again now with the changed setting.

I see no reason why we should not go forward with patching.
(I do see it as a bit less urgent to deploy quickly, though, since we don't set use_cow_images=False by default.)

Revision history for this message
Jeremy Stanley (fungi) wrote :

Dan: That matches my interpretation as well. I plan to quietly switch this bug to public at 14:00 UTC tomorrow so that the Nova fix/backports can be pushed to gerrit, and then I'll get the links for them into the local draft advisory for publication at 15:00 UTC as planned.

Kurt: Keep in mind that the incomplete fix for use_cow_images=False situations is only half of this; the other problem is the OSSA-2023-002 regression, since the new image inspector wasn't catching VMDK flat descriptors correctly.

Revision history for this message
Kurt Garloff (kgarloff) wrote :

Jeremy: I also failed to exploit the VMDK issue, not just qcow2.
See comment #69. But let me read through the old bug report once more and see if the exploit needs to be done differently.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

i have created 4 copies of nova locally with the relevant patches and i have locally pre-updated the commit messages with the cherry-picked-from lines.
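
for anyone unfamiliar with the convention, that just means each stable commit message ends with the standard git backport marker, roughly of this form (placeholder hash):

    (cherry picked from commit <sha of the corresponding newer-branch commit>)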

otherwise, there is no delta to the ones attached to the bug.

i'll submit them for review once the bug changes to public.

if we want to bypass nova's stable backport validation logic i can add [stable-only] to each of them to disable that job.

if that is preferred, let me know and i'll add it to the commit messages before i submit them.

I'm pretty hopeful that we can merge all 4 patches to the relevant stable branches by the end of tomorrow.

summary: - Incomplete file access fix and regression for QCOW2 backing files and
- VMDK flat descriptors (CVE-2024-40767)
+ [OSSA-2024-002] Incomplete file access fix and regression for QCOW2
+ backing files and VMDK flat descriptors (CVE-2024-40767)
description: updated
information type: Private Security → Public Security
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/924731

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/2024.1)

Fix proposed to branch: stable/2024.1
Review: https://review.opendev.org/c/openstack/nova/+/924732

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/nova/+/924733

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/nova/+/924734

Changed in ossa:
status: Incomplete → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ossa (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/ossa/+/924735

Changed in ossa:
importance: Undecided → High
assignee: nobody → Jeremy Stanley (fungi)
Revision history for this message
sean mooney (sean-k-mooney) wrote :

the reviews are now live and attached as seen above ^ but i'll summarise here

Master: https://review.opendev.org/c/openstack/nova/+/924731
stable/2024.1: https://review.opendev.org/c/openstack/nova/+/924732
stable/2023.2: https://review.opendev.org/c/openstack/nova/+/924733
stable/2023.1: https://review.opendev.org/c/openstack/nova/+/924734

i have updated the commit shas in the stable branches, assuming the previous patches merge without changes.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ossa (master)

Reviewed: https://review.opendev.org/c/openstack/ossa/+/924735
Committed: https://opendev.org/openstack/ossa/commit/43ab26efbd928d03f6d8f93b7f6ff2afd765e4fc
Submitter: "Zuul (22348)"
Branch: master

commit 43ab26efbd928d03f6d8f93b7f6ff2afd765e4fc
Author: Jeremy Stanley <email address hidden>
Date: Tue Jul 23 13:25:26 2024 +0000

    Add OSSA-2024-002 (CVE-2024-40767)

    Change-Id: I0108e766ac2aa8145860736a92115b5a963d38aa
    Closes-Bug: #2071734

Changed in ossa:
status: In Progress → Fix Released
Revision history for this message
Jeremy Stanley (fungi) wrote :

The advisory text is now published to https://security.openstack.org/ and copies have been sent to the usual mailing lists. I've also notified MITRE that they can switch the CVE assignment to public.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/924746

Revision history for this message
sean mooney (sean-k-mooney) wrote :

the upstream ci found an issue with our handling of ami format images

https://review.opendev.org/c/openstack/nova/+/924775 is the fix; as a result, i will be updating all the gerrit patches to include it.

i will comment here again when that is done.

it's important that all distros ensure they pull the final merged version of the patches, not the versions attached to this bug report as patch files.

Revision history for this message
Dan Smith (danms) wrote :

To be clear, the original version of the patches on this bug and in the advisory _DO_ fix the CVE and are absolutely what should be packaged by distros ASAP to close the hole.

That said, the checks overzealously reject AMI images that qemu detects as raw if/when they are registered in glance as disk_format=ami (instead of raw). The final version we're working on in gerrit will have a small change from these to avoid that mismatch, but it's far more important to have the security fix. I'm not sure how many nova deployments actually use/support AMI, but none of the downstream testers noticed so I suspect very little. This continues to support the case for us to either deprecate support for AMI, or very strictly define the bounds around what we'll allow in something called AMI.
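
As a hedged illustration of the mismatch (not nova's actual check, just the shape of the problem): an AMI payload is typically a raw disk or filesystem image, so format inspection reports "raw", while Glance records disk_format=ami, and a strict detected-format-must-equal-declared-format comparison rejects the boot.

# Hypothetical reproduction sketch; image, flavor and network names are made up.
$ qemu-img create -f raw ami-style.img 1M
$ openstack image create --disk-format ami --container-format ami --file ami-style.img ami-style
$ openstack server create --flavor test --image ami-style --network public ami-test
# With the initial patches the detected "raw" does not match the declared
# "ami" and the boot is rejected; the follow-up change mentioned in the
# previous comment (review 924775) adjusts the check to allow this case.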
