Compute resource tracker does not report correct information for drivers such as vSphere

Bug #1718212 reported by Jay Jahns on 2017-09-19
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Medium
Assigned to: Radoslav Gerganov

Bug Description

The compute resource tracker is ignoring values coming from compute drivers such as vSphere. Specifically, the resource tracker is accepting disk_total from the driver, but not disk_available and disk_used.

We've checked the vSphere driver, and it is indeed reporting the correct values:

<snip>
        data["vcpus"] = stats['vcpus']
        data["disk_total"] = capacity / units.Gi
        data["disk_available"] = freespace / units.Gi
        data["disk_used"] = data["disk_total"] - data["disk_available"]
        data["host_memory_total"] = stats['mem']['total']
        data["host_memory_free"] = stats['mem']['free']
        data["hypervisor_type"] = about_info.name
        data["hypervisor_version"] = versionutils.convert_version_to_int(
                str(about_info.version))
        data["hypervisor_hostname"] = self._host_name
        data["supported_instances"] = [
            (arch.I686, hv_type.VMWARE, vm_mode.HVM),
            (arch.X86_64, hv_type.VMWARE, vm_mode.HVM)]
</snip>

It looks like a patch was proposed to the resource tracker at https://review.openstack.org/#/c/126237/ but the change was voted down and not approved.

Since vSphere is reporting the correct data, maybe someone can tell us why Nova is ignoring these values and provide us with some guidance on correcting the problem in the driver, if that's where the change is needed.

This is affecting our production environment: we run upstream code, and clusters get overcommitted as a direct result of the scheduler seeing that no disk is "used".

This should get extreme priority since it affects the ability to launch instances!

Chris Dent (cdent) wrote :

In https://review.openstack.org/#/c/441543/ a new virt driver method, get_inventory(), was added that ought to help with this problem once implemented in the vmwareapi virt driver. It will allow disk, memory, etc. to express a "real" total and a "reserved" value representing any resources consumed by anything that nova does not track (e.g. pre-existing VMs).

When get_inventory is used, it allows the virt driver to become authoritative for inventory information. Anything that get_inventory doesn't provide falls back to the old way. The libvirt implementation of get_inventory has a bit more information: https://review.openstack.org/#/c/457782/

So I would guess that the way forward here is to provide a get_inventory method in the vmwareapi driver. If you can do that Jay Jahns, great, please say so. If not you, then I guess it will be me or Rado Gerganov.
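A minimal sketch of what such a get_inventory() could look like for the vmwareapi driver, assuming datastore capacity and free space are available in bytes. The function name matches the new virt driver method discussed above, but the parameter names and the exact set of inventory fields here are illustrative, not the real driver interface:

```python
GIB = 1024 ** 3

def get_inventory(capacity_bytes, freespace_bytes, total_vcpus, total_mem_mb):
    """Sketch: report non-OpenStack disk usage as 'reserved' inventory.

    Parameter names are illustrative; a real implementation would pull
    these figures from the datastore and cluster stats.
    """
    disk_total_gb = capacity_bytes // GIB
    disk_free_gb = freespace_bytes // GIB
    return {
        'VCPU': {'total': total_vcpus, 'reserved': 0},
        'MEMORY_MB': {'total': total_mem_mb, 'reserved': 0},
        'DISK_GB': {
            'total': disk_total_gb,
            # Space consumed by anything nova does not track
            # (e.g. pre-existing VMs) is reported as reserved.
            'reserved': disk_total_gb - disk_free_gb,
        },
    }
```

With the figures reported later in this bug (101720GB total, 74717GB free), this would report 27003GB as reserved instead of today's 0GB used.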

tags: added: placement
Matt Riedemann (mriedem) wrote :

What version of nova are you running? https://review.openstack.org/#/c/126237/ was proposed over 2 years ago and bug 1240200 was opened in 2013. So I would not consider this an extreme priority given how latent it is.

The reason https://review.openstack.org/#/c/126237/ was rejected was its proposed solution: a global config option that made the operator decide whether to trust the resource tracker or the virt driver. The rejection of that patch was about the solution, not a claim that the problem wasn't real. That was also expressed in this mailing list thread:

http://lists.openstack.org/pipermail/openstack-dev/2014-October/047849.html

From what I read, there was general favor of fixing the virt drivers to report the correct information and removing the code in the resource tracker that is being used today. The question was what other virt drivers needed to be fixed, which would require an audit and probably a blueprint. The change from Gary Kotton was abandoned as a result.

As Chris noted, I think the get_inventory() method is probably the solution here, since we're working toward removing, or very much slimming down, the ResourceTracker's role in resource claims and scheduling. That role is being replaced by Nova's usage of the Placement service, and the get_inventory() method feeds into that. Looking at the original bug 1240200, the issue is with reporting reserved space, which can be solved by implementing get_inventory() to report whatever amount of reserved VCPU/MEMORY_MB/DISK_GB is on the hypervisor. The ResourceTracker then feeds that data into the Placement service, which the nova scheduler relies on for scheduling decisions.

tags: added: vmware
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Jay Jahns (jjahns) wrote :

Thank you for responding.

The version I am running in prod is Newton.

We are upgrading to Pike in the coming days/weeks. First we are going to go to Ocata.

I'll take a look at this item and confirm.

Tracy Jones (tjones-i) wrote :

Rado is looking into this

Changed in nova:
assignee: nobody → Radoslav Gerganov (rgerganov)

Fix proposed to branch: master
Review: https://review.openstack.org/506175

Changed in nova:
status: Confirmed → In Progress
Radoslav Gerganov (rgerganov) wrote :

Unfortunately there is no easy way to calculate the "reserved" disk space, i.e. the non-OpenStack usage. In the VMware case this is not a static value, especially if you are using a shared datastore for both OpenStack and non-OpenStack workloads.

We may try to report the used disk space as "reserved", but I am not sure what implications that would have. IMO, calculating resource usage in Nova instead of using the data returned by the hypervisor is a poor design choice.

Jay, as a workaround you may try setting "reserved_host_disk_mb" in nova.conf.
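As a rough illustration of that workaround (the option name comes from the suggestion above; the value is purely illustrative, sized to cover roughly the 27003GB of untracked usage reported later in this bug):

```ini
[DEFAULT]
# Illustrative value: reserve ~27003 GB (27003 * 1024 MB) so the
# scheduler stops treating the datastore as empty.
reserved_host_disk_mb = 27651072
```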

Jay Jahns (jjahns) wrote :

Hi,

Here is what the virt driver is returning for my cluster. I am using one vSphere cluster (1 compute node) and there are no non-OpenStack instances on this cluster.

2017-09-28 23:04:35.562 3304 DEBUG nova.compute.resource_tracker [req-ae33aa02-bcf8-48dc-8884-9d6ddad56b61 - - - - -] Hypervisor/Node resource view: name=domain-c192.b9564b6d-07ee-4a65-89f2-48244db310b9 free_ram=4549438MB free_disk=74717GB free_vcpus=936 pci_devices=None _report_hypervisor_resource_view /opt/mhos/openstack/nova/lib/python2.7/site-packages/nova/compute/resource_tracker.py:672

If you notice, free_disk is 74717GB.

Next log output:

2017-09-28 22:21:59.466 3304 INFO nova.compute.resource_tracker [req-ae33aa02-bcf8-48dc-8884-9d6ddad56b61 - - - - -] Final resource view: name=domain-c192.b9564b6d-07ee-4a65-89f2-48244db310b9 phys_ram=6241318MB used_ram=2706944MB phys_disk=101720GB used_disk=0GB total_vcpus=936 used_vcpus=1575 pci_stats=[]

Used disk says 0GB and phys_disk says 101720GB.

That's incorrect. used_disk should be 101720 - 74717 = 27003GB.

None of my instances have root disks. They are completely ephemeral, and our users use volumes to store persistent data. Even then, we should be reporting what the compute node has reported back.

How this impacts production: There is no check to prevent gross oversubscription of storage.

In theory, I can log in, create a 500GB image in Glance, launch 1000 instances from it (which adds up to potentially 500000GB, or 500TB), then run a dd command in all of them simultaneously and destroy my vSphere cluster.
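The back-of-the-envelope arithmetic behind that scenario, using the phys_disk figure from the log output above (this is just illustration, not nova code):

```python
# Oversubscription arithmetic for the scenario described above.
phys_disk_gb = 101720   # phys_disk reported by the resource tracker
image_gb = 500          # hypothetical Glance image size
instances = 1000

provisioned_gb = image_gb * instances   # 500000 GB, i.e. ~500 TB
ratio = provisioned_gb / phys_disk_gb
print(f"provisioned {provisioned_gb} GB on {phys_disk_gb} GB: "
      f"{ratio:.1f}x oversubscribed")
```

Because used_disk is stuck at 0GB, nothing in the scheduler ever pushes back on this roughly 4.9x oversubscription.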

Jay Jahns (jjahns) wrote :

I should also add: instead of launching 1000 instances, a user could create 1000 volumes of 500GB each, run a dd command in each of them, and do the same thing.

We *have* to be able to report this data, or OpenStack will continue to provision things to this compute node and we end up in a really nasty, possibly catastrophic, situation.
