The name VCPUs (total) of Hypervisors is confusing

Bug #1202965 reported by Zhenguo Niu
This bug affects 33 people
Affects                         Status      Importance  Assigned to  Milestone
OpenStack Cinder-backup Charm   Incomplete  Undecided   Unassigned
OpenStack Compute (nova)        Opinion     Wishlist    Unassigned
OpenStack Dashboard (Horizon)   Invalid     Low         Unassigned

Bug Description

In the Hypervisors panel, the VCPUs (total) and VCPUs (used) fields cause confusion, because the used value can end up bigger than the total.

The virtual-CPU-to-physical-CPU allocation ratio defaults to 16.0, so reporting the physical CPU total as VCPUs (total) is not correct.
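
A minimal sketch of the arithmetic behind the confusion (the 2-core host is an illustrative assumption; 16.0 is the default ratio):

# Illustrative sketch: why VCPUs (used) can exceed VCPUs (total).
physical_cpus = 2            # what the API reports as "vcpus" (total)
cpu_allocation_ratio = 16.0  # scheduler-side overcommit ratio, default 16.0

schedulable_vcpus = physical_cpus * cpu_allocation_ratio  # 32.0
# The scheduler will happily place e.g. 10 one-vCPU instances on this host,
# so the panel shows VCPUs (used) = 10 against VCPUs (total) = 2.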

Revision history for this message
Gabriel Hurley (gabriel-hurley) wrote :

Yep, we could be more clear here.

Changed in horizon:
importance: Undecided → Low
milestone: none → havana-3
status: New → Confirmed
Changed in horizon:
milestone: havana-3 → none
Revision history for this message
melanie witt (melwitt) wrote :

Not seeing the behavior with the nova API:

RESP BODY: {"hypervisor_statistics": {"count": 1, "vcpus_used": 1, "local_gb_used": 10, "memory_mb": 4096, "current_workload": 1, "vcpus": 2, "running_vms": 1, "free_disk_gb": 15, "disk_available_least": 0, "local_gb": 25, "free_ram_mb": 3072, "memory_mb_used": 1024}}
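
A minimal sketch of fetching that same response, for anyone reproducing this (the endpoint and JSON shape are as shown above; the URL, tenant ID, and token are placeholders):

import requests

NOVA_URL = "http://controller:8774/v2/<tenant_id>"  # placeholder endpoint
TOKEN = "<auth-token>"                              # placeholder token

resp = requests.get(NOVA_URL + "/os-hypervisors/statistics",
                    headers={"X-Auth-Token": TOKEN})
stats = resp.json()["hypervisor_statistics"]
# e.g. 2 and 1, per the response body above; no allocation ratio is applied.
print(stats["vcpus"], stats["vcpus_used"])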

Changed in nova:
status: New → Invalid
Changed in horizon:
assignee: nobody → George Peristerakis (george-peristerakis)
Revision history for this message
George Peristerakis (george-peristerakis) wrote :

Is it possible to have a screenshot and a corresponding nova API call? It is not clear whether the problem is a display bug.

Changed in horizon:
status: Confirmed → Incomplete
Changed in horizon:
assignee: George Peristerakis (george-peristerakis) → nobody
Changed in nova:
assignee: nobody → Yassine (yassine-lamgarchal)
Revision history for this message
Yassine (yassine-lamgarchal) wrote :

Hi,

I think the issue is a Nova one: according to the nova.virt.libvirt.driver.LibvirtDriver.get_vcpu_total() function, the computation of the total vCPUs for a hypervisor doesn't take the CPU allocation ratio into account.

Here is a screenshot which illustrates the issue.

What do you think?
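
For context, a simplified sketch of such a driver-level total-vCPU computation (not the actual Nova code; it assumes libvirt's getInfo(), whose third element is the host's active physical CPU count):

import libvirt

def get_vcpu_total(uri="qemu:///system"):
    # Sketch only: returns the physical CPU count of the host.
    # No cpu_allocation_ratio appears anywhere in the computation.
    conn = libvirt.open(uri)
    try:
        # getInfo() -> [model, memory_mb, cpus, mhz, nodes, sockets, cores, threads]
        return conn.getInfo()[2]
    finally:
        conn.close()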

Revision history for this message
Yassine (yassine-lamgarchal) wrote :

By the way, we can observe the same behaviour for the "RAM (total)" column.

Changed in nova:
assignee: Yassine (yassine-lamgarchal) → Tristan Cacqueray (tristan-cacqueray)
Changed in nova:
status: Invalid → In Progress
Changed in nova:
assignee: Tristan Cacqueray (tristan-cacqueray) → nobody
Song Li (lisong-cruise)
Changed in nova:
assignee: nobody → Song Li (lisong-cruise)
Song Li (lisong-cruise)
Changed in horizon:
status: Incomplete → Invalid
Revision history for this message
Song Li (lisong-cruise) wrote :

As Icehouse will be released soon, I think it would be better to make some improvement for this issue in the next version.

Song Li (lisong-cruise)
Changed in nova:
assignee: Song Li (lisong-cruise) → nobody
Revision history for this message
Joe Gordon (jogo) wrote :

It looks like the hypervisor API lists vcpus and vcpus_used, but what it is really returning is physical CPUs and vcpus_used. This is a good candidate for the API formerly known as v3.

tags: added: rootwrap
tags: added: api
removed: rootwrap
Changed in nova:
status: In Progress → Triaged
importance: Undecided → Wishlist
Revision history for this message
Christopher Yeoh (cyeoh-0) wrote :

Agreed, we should fix this in Nova in Juno with the API formerly known as v3 (might be a good test case for the first microversion!)

Revision history for this message
Tom Fifield (fifieldt) wrote :

Did this get any attention in Juno, in the end?

tags: added: ops
tags: added: canonical-bootstack
Revision history for this message
Sean Dague (sdague) wrote :

This is really a UX feature addition

Changed in nova:
status: Triaged → Opinion
Revision history for this message
ofer blaut (oblaut) wrote :

Without this fix, we have no real knowledge of how many vCPUs are available when using:

nova hypervisor-stats
nova hypervisor-show
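
The real headroom can only be derived by combining those stats with the deployment's configured ratio; a sketch, assuming one uniform cpu_allocation_ratio across all hosts:

def vcpus_available(vcpus_total, vcpus_used, cpu_allocation_ratio=16.0):
    # "vcpus_total" is really the physical CPU count the API reports.
    return vcpus_total * cpu_allocation_ratio - vcpus_used

# With the statistics from the RESP BODY above (vcpus=2, vcpus_used=1):
print(vcpus_available(2, 1))  # 31.0 schedulable vCPUs remain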

Revision history for this message
Felipe Reyes (freyes) wrote :

According to the resolution on this blueprint [0], Nova won't be making changes to its API. Is there something I could do to improve the UX in Horizon?

[0] https://review.openstack.org/#/c/98058/

Revision history for this message
Rarylson Freitas (rarylson) wrote :

Hi,

I agree that it isn't a bug (it's a feature) and that the current Horizon behavior may cause misunderstanding among some administrators.

As an example, the VMware vSphere client shows the following values for disk overcommitting:

- Total space: storage space times the overcommitting ratio;
- Real total space: storage space (without considering any overcommitting);
- Used space: sum of user disk spaces;
- Real used space: actual space consumed, less than used space, since thin-provisioned images (and CoW) reduce space.

I don't know how to put all this information in the Horizon interface.

Revision history for this message
Rarylson Freitas (rarylson) wrote :

As an idea, we could show both concepts (physical total and total with overcommit) in the interface.

The attached image represents the idea.

It could appear on the screen after the user clicks a "more", "advanced" or "detailed" button.

Revision history for this message
Gregory Gee (gee-gregory) wrote :

I'm confused about the behaviour. Are you saying that the Nova API will always return the incorrect value for vcpus from now on? Is it intentional to return the incorrect value? How would an upstream VNFM or orchestrator know how much vCPU is really available? This doesn't sound like a UX issue to me if the data source is feeding the wrong info.

Revision history for this message
Stephen Gordon (sgordon) wrote :

My understanding of what is being suggested: if fields that return the total available VCPU/RAM counts factoring in overcommit are desired (or fields that return the ratios themselves), they should be added to the API as net-new fields. The existing fields that return the totals were never designed to factor in the overcommit, presumably because the ratio is a scheduler-side setting: neither the hypervisor nor the API service actually knew what the overcommit was, and thus had no ability to factor it in even if they wanted to (recent changes to help with issues in claim processing may help here).

The main source of confusion is the naming of the field as vcpus when really it is just physical CPUs (for which the correct value is in fact being returned). The RAM values seem to confuse people less, even though they behave exactly the same way, because the naming isn't as misleading.
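
A simplified sketch of that scheduler-side check, modeled loosely on Nova's CoreFilter (not a verbatim copy), to show where the ratio actually lives:

def host_passes(host_vcpus_total, host_vcpus_used, requested_vcpus,
                cpu_allocation_ratio=16.0):
    # The overcommit ratio is applied here, in the scheduler's filter,
    # never in the hypervisor statistics the API reports.
    limit = host_vcpus_total * cpu_allocation_ratio
    return host_vcpus_used + requested_vcpus <= limit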

Revision history for this message
Hayati Gonultas (hayati-gonultas) wrote :

Renaming VCPUs to physical CPUs in the Horizon UI may be a Horizon concern, but there should be an API to get the overcommit ratio of physical CPUs. Either each flavor or a global config value may contain this ratio, but some API is needed to get this setting.

Additionally, if "vcpus" in the response body shows the number of physical CPUs, then its name should be changed to an unambiguous one (vcpus sounds like it is the number of virtual CPUs). If it really is the number of virtual CPUs, as its name suggests, then its value should be updated with the overcommit ratio parameter.

Revision history for this message
Tobias Urdin (tobias-urdin) wrote :

I think this should be considered a flaw in Nova. It is similar to the issue where the resource_tracker in Nova counts Cinder volumes as local hypervisor storage, which I have tried to address, but it seems nobody wants to take a strong grip on these issues.

It might not be a functional flaw, but it's very misleading, and providing the wrong statistics is just annoying.

Both these issues should be brought up properly in meetings and addressed. I have heard this was going to be fixed when the resource-objects BP lands in Nova [1]; however, I worked on fixes for this for Liberty and they did not make it into master or Liberty. So fixing this in master and backporting to Liberty and now Mitaka should be prioritized.

[1] Think it's this one: https://github.com/openstack/nova-specs/blob/master/specs/liberty/approved/resource-objects.rst

Revision history for this message
Tobias Urdin (tobias-urdin) wrote :

CONFIRMED FOR: LIBERTY

Revision history for this message
Deepa (dpaclt) wrote :

Unfortunately the same can be seen in the Mitaka version. It seems this is not fixed.

Revision history for this message
do3meli (d-info-e) wrote :

This still seems to be an issue in the Pike release. Please consider rescheduling this, as it is totally confusing.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

I'm confused as to why this relates to the cinder-backup charm. I'll set it to Incomplete to solicit further info.

Changed in charm-cinder-backup:
status: New → Incomplete
Revision history for this message
Viktor Tikkanen (viktor-tikkanen) wrote :

One more example: let's use the hw:cpu_policy='dedicated'/hw:cpu_thread_policy='isolate' extra specs and launch as many instances as possible on 3 hosts (8 vCPUs each, HT enabled). Finally, "openstack hypervisor stats show" displays:

+----------------------+-------+
| Field                | Value |
+----------------------+-------+
| count                | 3     |
...
| running_vms          | 6     |
| vcpus                | 24    |
| vcpus_used           | 12    |
+----------------------+-------+

At this point no more instances can be launched with the same flavor (at least in my case) because NUMATopologyFilter doesn't give any suitable hosts. This is OK, but the vcpus/vcpus_used values are quite confusing because they give the impression that only 50% of CPU resources are used.
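
The arithmetic behind those numbers, assuming 4 cores x 2 threads per host and 2-vCPU instances (inferred from vcpus_used / running_vms):

# Sketch of why the hosts are actually full at an apparent "50% used".
hosts, cores_per_host, threads_per_core = 3, 4, 2
hw_threads = hosts * cores_per_host * threads_per_core  # 24, reported as "vcpus"

vcpus_per_instance = 2  # inferred: vcpus_used 12 / running_vms 6
# With hw:cpu_thread_policy='isolate', each guest vCPU claims a whole core
# (the sibling thread stays idle), i.e. 2 hardware threads per vCPU.
threads_per_instance = vcpus_per_instance * threads_per_core  # 4

max_instances = hw_threads // threads_per_instance        # 6, matches running_vms
reported_vcpus_used = max_instances * vcpus_per_instance  # 12, "50%" of 24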
