HypervisorUnavailable error leaks compute host fqdn to non-admin users

Bug #1851587 reported by Archit Modi on 2019-11-06
This bug affects 1 person
Affects: OpenStack Compute (nova). Importance: Undecided. Assigned to: Unassigned.
Affects: OpenStack Security Advisory. Importance: Undecided. Assigned to: Unassigned.

Bug Description

Description
===========
When an instance encounters a HypervisorUnavailable error, the error message returned to the non-admin user includes the compute host FQDN.

Steps to reproduce
==================
1. Spin up an instance with non-admin user credentials
2. To reproduce the error, stop the libvirtd service on the compute host containing the instance
3. Delete the instance
4. Deletion fails with a HypervisorUnavailable error

Expected result
===============
The error message does not expose the compute host FQDN to a non-admin user

Actual result
=============
#spin up an instance
+--------------------------------------+------------+--------+------------+-------------+-------------------------------------+------------------------------+--------------------------------------+-------------+-----------+-------------------+------+------------+
| ID | Name | Status | Task State | Power State | Networks | Image Name | Image ID | Flavor Name | Flavor ID | Availability Zone | Host | Properties |
+--------------------------------------+------------+--------+------------+-------------+-------------------------------------+------------------------------+--------------------------------------+-------------+-----------+-------------------+------+------------+
| 4f42886d-e1f8-4607-a09d-0dc12a681880 | test-11869 | ACTIVE | None | Running | private=192.168.100.158, 10.0.0.243 | cirros-0.4.0-x86_64-disk.img | 5d0bd6a5-7331-4ebe-9328-d126189897e2 | | | nova | | |
+--------------------------------------+------------+--------+------------+-------------+-------------------------------------+------------------------------+--------------------------------------+-------------+-----------+-------------------+------+------------+

#instance is running on compute-0 node (only admin knows this)
[heat-admin@compute-0 ~]$ sudo virsh list --all
 Id Name State
----------------------------------------------------
 108 instance-00000092 running

#stop libvirtd service
[root@compute-0 heat-admin]# systemctl stop tripleo_nova_libvirt.service
[root@compute-0 heat-admin]# systemctl status tripleo_nova_libvirt.service
● tripleo_nova_libvirt.service - nova_libvirt container
   Loaded: loaded (/etc/systemd/system/tripleo_nova_libvirt.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Wed 2019-11-06 22:48:25 UTC; 5s ago
  Process: 8514 ExecStop=/usr/bin/podman stop -t 10 nova_libvirt (code=exited, status=0/SUCCESS)
 Main PID: 3783

Nov 06 22:29:48 compute-0 podman[3396]: 2019-11-06 22:29:48.443603571 +0000 UTC m=+1.325620613 container init a3e32121d12929e663b899b57cb7bc87581ddf5bdfb19cf8fee4bace41cb19bb (image=undercloud-0.ctlpla>
Nov 06 22:29:48 compute-0 podman[3396]: 2019-11-06 22:29:48.475946808 +0000 UTC m=+1.357963869 container start a3e32121d12929e663b899b57cb7bc87581ddf5bdfb19cf8fee4bace41cb19bb (image=undercloud-0.ctlpl>
Nov 06 22:29:48 compute-0 paunch-start-podman-container[3385]: nova_libvirt
Nov 06 22:29:48 compute-0 paunch-start-podman-container[3385]: Creating additional drop-in dependency for "nova_libvirt" (a3e32121d12929e663b899b57cb7bc87581ddf5bdfb19cf8fee4bace41cb19bb)
Nov 06 22:29:49 compute-0 systemd[1]: Started nova_libvirt container.
Nov 06 22:48:24 compute-0 systemd[1]: Stopping nova_libvirt container...
Nov 06 22:48:25 compute-0 podman[8514]: 2019-11-06 22:48:25.595405651 +0000 UTC m=+1.063832024 container died a3e32121d12929e663b899b57cb7bc87581ddf5bdfb19cf8fee4bace41cb19bb (image=undercloud-0.ctlpla>
Nov 06 22:48:25 compute-0 podman[8514]: 2019-11-06 22:48:25.597210594 +0000 UTC m=+1.065636903 container stop a3e32121d12929e663b899b57cb7bc87581ddf5bdfb19cf8fee4bace41cb19bb (image=undercloud-0.ctlpla>
Nov 06 22:48:25 compute-0 podman[8514]: a3e32121d12929e663b899b57cb7bc87581ddf5bdfb19cf8fee4bace41cb19bb
Nov 06 22:48:25 compute-0 systemd[1]: Stopped nova_libvirt container.

#delete the instance, it leaks compute host fqdn to the non-admin user
(overcloud) [stack@undercloud-0 ~]$ nova delete test-11869
Request to delete server test-11869 has been accepted.
(overcloud) [stack@undercloud-0 ~]$ openstack server list --long
+--------------------------------------+------------+--------+------------+-------------+----------+------------------------------+--------------------------------------+-------------+-----------+-------------------+------+------------+
| ID | Name | Status | Task State | Power State | Networks | Image Name | Image ID | Flavor Name | Flavor ID | Availability Zone | Host | Properties |
+--------------------------------------+------------+--------+------------+-------------+----------+------------------------------+--------------------------------------+-------------+-----------+-------------------+------+------------+
| 4f42886d-e1f8-4607-a09d-0dc12a681880 | test-11869 | ERROR | None | Running | | cirros-0.4.0-x86_64-disk.img | 5d0bd6a5-7331-4ebe-9328-d126189897e2 | | | nova | | |
+--------------------------------------+------------+--------+------------+-------------+----------+------------------------------+--------------------------------------+-------------+-----------+-------------------+------+------------+
(overcloud) [stack@undercloud-0 ~]$ openstack server show test-11869 <---debug output attached in logs
+-----------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-----------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | error |
| OS-SRV-USG:launched_at | 2019-11-06T22:13:08.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | |
| config_drive | |
| created | 2019-11-06T22:12:57Z |
| description | None |
| fault | {'code': 500, 'created': '2019-11-06T23:01:45Z', 'message': 'Connection to the hypervisor is broken on host: compute-0.redhat.local'} |
| flavor | disk='1', ephemeral='0', , original_name='m1.tiny', ram='512', swap='0', vcpus='1' |
| hostId | c7e6bf58b57f435659bb0aa9637c7f830f776ec202a0d6e430ee3168 |
| id | 4f42886d-e1f8-4607-a09d-0dc12a681880 |
| image | cirros-0.4.0-x86_64-disk.img (5d0bd6a5-7331-4ebe-9328-d126189897e2) |
| key_name | None |
| locked | False |
| locked_reason | None |
| name | test-11869 |
| project_id | 6e39619e17a9478580c93120e1cb16bc |
| properties | |
| server_groups | [] |
| status | ERROR |
| tags | [] |
| trusted_image_certificates | None |
| updated | 2019-11-06T23:01:45Z |
| user_id | 3cd6a8cb88eb49d3a84f9e67d89df598 |
| volumes_attached | |
+-----------------------------+---------------------------------------------------------------------------------------------------------------------------------------+

Archit Modi (amodi5) wrote :
Matt Riedemann (mriedem) wrote :

HypervisorUnavailable could probably crop up for any server action if the compute service is running but the hypervisor is down; it just gets blindly injected as an instance fault because of the @wrap_instance_fault decorator in the ComputeManager.

The fault details should be hidden from non-admin users but the message could probably be generically whitelisted and converted to something that doesn't contain the host name.
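
For illustration, here is a minimal sketch of what such a decorator does (not Nova's actual implementation; record_instance_fault is a made-up stand-in for the real fault-recording machinery): any exception raised by the wrapped compute method is recorded as an instance fault before being re-raised, so whatever text the exception carries, including a host name, becomes the user-visible fault message.

import functools

def record_instance_fault(context, instance, message):
    # Hypothetical stand-in for Nova's fault-recording machinery; in the
    # real code the message is persisted and later returned by the API.
    instance.setdefault('faults', []).append(message)

def wrap_instance_fault(func):
    # Illustrative decorator: record any exception as an instance fault,
    # then re-raise it.
    @functools.wraps(func)
    def wrapper(self, context, instance, *args, **kwargs):
        try:
            return func(self, context, instance, *args, **kwargs)
        except Exception as exc:
            # str(exc) becomes the fault message a non-admin user can read
            # via the API, including any compute host FQDN embedded in it.
            record_instance_fault(context, instance, message=str(exc))
            raise
    return wrapper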

tags: added: compute
sean mooney (sean-k-mooney) wrote :

Did we not just fix a CVE that was very similar to this, where we were passing exceptions back on failure to connect to Ceph and leaking the Ceph monitor info?

Granted, that info is available in the attachment details, but this feels similar in that we are exposing the compute node FQDN.

tags: added: security
information type: Public → Public Security
Archit Modi (amodi5) wrote :

This was found during validation of that CVE, and we discussed tracking this issue separately rather than treating it as a CVE.

melanie witt (melwitt) wrote :

Yup, what Archit said. To be a bit more verbose, the original CVE was about leaking the ceph monitor IP address and while we were verifying the fix for the original CVE, we stopped libvirtd to cause a server fault and repro the original bug. But stopping libvirtd resulted in the "Connection to the hypervisor is broken" state and we inadvertently exposed the HypervisorUnavailable exception containing the compute host FQDN.

So, Archit has followed up on that and written it up as a different issue.

Jeremy Stanley (fungi) wrote :

Per bug 1837877 this can be treated as a hardening opportunity, but no further advisory should be needed.

Changed in ossa:
status: New → Won't Fix
information type: Public Security → Public
Matt Riedemann (mriedem) wrote :

From the fix for bug 1837877 https://review.opendev.org/#/c/674821/:

"Note that nova exceptions with a %(reason)s replacement
variable could potentially be leaking sensitive details as
well but those would need to be cleaned up on a case-by-case
basis since we don't want to change the behavior of all
fault messages otherwise users might not see information
like NoValidHost when their server goes to ERROR status
during scheduling."

In this case HypervisorUnavailable is a NovaException so it's treated differently:

https://github.com/openstack/nova/blob/a90fe1951200ebd27fe74788c0a96c01104ac2cf/nova/exception.py#L508
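
Paraphrasing that declaration (not a verbatim copy), the relevant pieces look roughly like this; the %(host)s substitution in the message template is what carries the FQDN into the fault:

class NovaException(Exception):
    msg_fmt = "An unknown exception occurred."
    code = 500

    def __init__(self, message=None, **kwargs):
        # The formatted template becomes the exception message.
        if not message:
            message = self.msg_fmt % kwargs
        super().__init__(message)


class HypervisorUnavailable(NovaException):
    msg_fmt = "Connection to the hypervisor is broken on host: %(host)s"

Raising HypervisorUnavailable(host='compute-0.redhat.local') yields exactly the message shown in the fault above.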

As I said above, this could likely show up in fault messages in a lot of places, since the ComputeManager uses the wrap_instance_fault decorator to inject a fault when exceptions are raised, and in anything that changes the instance status to ERROR, e.g. a failed rebuild:

https://github.com/openstack/nova/blob/a90fe1951200ebd27fe74788c0a96c01104ac2cf/nova/compute/manager.py#L3061

https://github.com/openstack/nova/blob/a90fe1951200ebd27fe74788c0a96c01104ac2cf/nova/compute/manager.py#L3145

So one question is, do we need to start whitelisting certain exceptions?

And if we do, how? Because the API will always show the message:

https://github.com/openstack/nova/blob/a90fe1951200ebd27fe74788c0a96c01104ac2cf/nova/api/openstack/compute/views/servers.py#L331

but only show the details (traceback) for admins and non-500 (I guess, that's weird) error cases:

https://github.com/openstack/nova/blob/a90fe1951200ebd27fe74788c0a96c01104ac2cf/nova/api/openstack/compute/views/servers.py#L341
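
Simplifying those two spots in the view (a paraphrase, not the exact code), the effective logic is roughly:

def fault_view(fault, is_admin):
    # The fault message is returned to every user who can see the server.
    out = {
        'code': fault['code'],
        'created': fault['created_at'],
        'message': fault['message'],
    }
    # The details (traceback) are only returned to admins, or when the
    # fault code is not a 500-class error.
    if fault.get('details') and (is_admin or fault['code'] != 500):
        out['details'] = fault['details']
    return out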

When I was working on the CVE fix above, I found it's complicated to know, at the point we inject the fault, what should be shown based on context.is_admin, because an admin could be rebuilding some non-admin's server, so we can't really base things on that.

If we only showed the fault message in the API for admins in 500 code cases, then non-admin users will no longer see NoValidHost.

Do we need to get so granular that we need to set an attribute on each class of nova exception indicating if its fault message can be exposed to non-admins? That would be hard to maintain I imagine, but maybe it would just start with HypervisorUnavailable and we build on that for other known types of nova exceptions that leak host details?
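
As a purely hypothetical illustration of that idea (all names made up), each exception class could carry a flag that the fault-recording path consults before exposing the raw message to non-admins:

class NovaException(Exception):
    msg_fmt = "An unknown exception occurred."
    # Hypothetical flag: is the formatted message safe to show to
    # non-admin users in the instance fault?
    safe_for_non_admin = True


class HypervisorUnavailable(NovaException):
    msg_fmt = "Connection to the hypervisor is broken on host: %(host)s"
    # The message embeds the compute host FQDN, so opt out.
    safe_for_non_admin = False


def fault_message(exc, generic="Internal server error"):
    # Hypothetical helper used when recording the fault: exceptions that
    # opt out of exposure (or lack the flag entirely, e.g. non-nova
    # exceptions, in the spirit of the bug 1837877 fix) fall back to a
    # generic message.
    return str(exc) if getattr(exc, 'safe_for_non_admin', False) else generic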

melanie witt (melwitt) wrote :

> Do we need to get so granular that we need to set an attribute on each class of nova exception indicating if its fault message can be exposed to non-admins? That would be hard to maintain I imagine, but maybe it would just start with HypervisorUnavailable and we build on that for other known types of nova exceptions that leak host details?

This sounds like the most reasonable idea to me, assuming we want/need to keep the compute host FQDN in the exception message.

I had actually been thinking about the possibility of removing the FQDN from the exception message for HypervisorUnavailable altogether. If an admin sees HypervisorUnavailable, they can also easily see what host the guest is on (the instance 'host' field), and thus know what compute host has a broken connection. Just an idea.
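
The simpler alternative described here would amount to something like the following hypothetical change, where only the message template is touched:

class HypervisorUnavailable(NovaException):
    # Hypothetical variant with the host dropped from the message; an
    # admin can still find the affected compute node via the instance's
    # 'host' field.
    msg_fmt = "Connection to the hypervisor is broken."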

Nick Tait (nickthetait) wrote :

So far I agree with hardening classification. But what I don't yet understand is why revealing FQDN is a security problem. Pretend I am a non-admin, and I learn that there is a compute host available at www.example.com. Wouldn't I need valid access credentials to that host before viewing/tampering with it?

In the physical world, leaking your home address to an adversary would be a security problem. How does this transfer into the digital example? In the case of OpenStack being run as a public cloud, the non-admin user has to be treated as a potentially untrustworthy person.

Nick Tait (nickthetait) wrote :

Are there cases where a non-admin could cause a HypervisorUnavailable situation?

melanie witt (melwitt) wrote :

Hi Nick,

I hear you, and IMHO revealing the FQDN is kind of a "soft" problem: it can only hurt you (the deployer) if your hypervisor is exposed to the public internet, where revealing its address gives someone the opportunity to launch a targeted attack on it and brute force the credentials (or whatever else). Having a hypervisor exposed to the internet isn't typical or recommended and probably (hopefully) nobody does that, but if they do, it could be a problem.

Hence, this is a "hardening opportunity" and we've not proposed a patch to deal with it yet because (1) it's a "soft" problem and (2) it's not trivial to fix unless we just remove the FQDN from the exception message altogether (which I am personally fine with).

To answer your last question, yes a non-admin user can see HypervisorUnavailable if, for example, the libvirt process is stopped or nova otherwise can't reach the libvirt monitor when they attempt to delete their server. This is rare I expect, but could happen.

Matt Riedemann (mriedem) on 2019-12-12
Changed in nova:
status: New → Triaged