Ironic: Invalid hypervisor stats info while instance running

Bug #1637449 reported by Tuan on 2016-10-28
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)

Bug Description


`nova hypervisor-stats` shows incorrect resource totals for Ironic nodes.

Steps to reproduce
Environment was setup following

After deploying 3 Ironic nodes, each with 1 CPU, 1024 MB of memory, and 1 GB of disk, and with 2 instances running:
# nova hypervisor-stats
+----------------------+-------+
| Property             | Value |
+----------------------+-------+
| count                | 3     |
| current_workload     | 1     |
| disk_available_least | -10   |
| free_disk_gb         | 10    |
| free_ram_mb          | 1024  |
| local_gb             | 10    |
| local_gb_used        | 20    |
| memory_mb            | 1024  |
| memory_mb_used       | 2048  |
| running_vms          | 2     |
| vcpus                | 1     |
| vcpus_used           | 2     |
+----------------------+-------+

Expected result

vcpus should be 3.
memory_mb should be 3072.
local_gb should be 30.
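
The expected values are the per-node figures summed over all three nodes. A minimal sketch of that arithmetic, assuming each node contributes the single-node totals visible in the output above (1 vCPU, 1024 MB of memory, and a local_gb of 10, which is what the expected total of 30 implies). The function name is hypothetical and not part of nova:

```python
FIELDS = ("vcpus", "memory_mb", "local_gb")


def expected_totals(nodes):
    """Sum each resource field across all nodes."""
    return {field: sum(node[field] for node in nodes) for field in FIELDS}


# Assumed per-node capacity, inferred from the report (not read from a real deployment).
nodes = [{"vcpus": 1, "memory_mb": 1024, "local_gb": 10} for _ in range(3)]

print(expected_totals(nodes))
# {'vcpus': 3, 'memory_mb': 3072, 'local_gb': 30}
```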

Tuan (tuanla) on 2016-10-28
Changed in ironic:
assignee: nobody → Tuan (tuanla)
Changed in nova:
assignee: nobody → Tuan (tuanla)

Fix proposed to branch: master

Changed in nova:
status: New → In Progress
joel (uestcjoel) on 2016-10-28
Changed in nova:
assignee: Tuan (tuanla) → joel (uestcjoel)
assignee: joel (uestcjoel) → nobody
Changed in nova:
assignee: nobody → Tuan (tuanla)
Dmitry Tantsur (divius) wrote :

Thanks for reporting it, I think I've seen this problem myself. However, it's not related to the Ironic service, so I'm closing the Ironic part of this bug.

Changed in ironic:
status: New → Invalid
Tuan (tuanla) on 2016-11-01
Changed in ironic:
assignee: Tuan (tuanla) → nobody
Changed in nova:
assignee: Tuan (tuanla) → Dao Cong Tien (tiendc)
Changed in nova:
assignee: Dao Cong Tien (tiendc) → Tuan (tuanla)
Vladyslav Drok (vdrok) wrote :

The logic here [0] indeed seems to be incorrect. For example, when a node
is in the available state with instance_uuid set, the driver first
reports vcpus=vcpus_used=properties['vcpus'] and then sets vcpus=0,
leaving vcpus_used intact.
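
To illustrate how this two-step behaviour could produce the numbers in the bug description, here is a rough sketch. It is not the driver's code: the function names are hypothetical, and it assumes that both nodes hosting instances follow the path described (used set from the node properties, then the total zeroed) while the third node reports its capacity as free.

```python
FIELDS = ("vcpus", "memory_mb", "local_gb")


def report_as_described(props, has_instance):
    """Mimic the described behaviour for one node; returns (total, used)."""
    zero = dict.fromkeys(FIELDS, 0)
    total = dict(props)
    # Step 1: a node with an instance reports used == total.
    used = dict(props) if has_instance else dict(zero)
    # Step 2 (assumed trigger): the total is then zeroed, but used is left intact.
    if has_instance:
        total = dict(zero)
    return total, used


def sum_reports(reports):
    """Aggregate per-node (total, used) pairs the way hypervisor-stats sums them."""
    total = {f: sum(t[f] for t, _ in reports) for f in FIELDS}
    used = {f: sum(u[f] for _, u in reports) for f in FIELDS}
    return total, used


# Assumed per-node capacity, inferred from the output in the bug description.
props = {"vcpus": 1, "memory_mb": 1024, "local_gb": 10}
reports = [report_as_described(props, busy) for busy in (True, True, False)]
total, used = sum_reports(reports)

print(total)  # {'vcpus': 1, 'memory_mb': 1024, 'local_gb': 10}
print(used)   # {'vcpus': 2, 'memory_mb': 2048, 'local_gb': 20}
```

Under these assumptions the sums match the reported vcpus/vcpus_used, memory_mb/memory_mb_used, and local_gb/local_gb_used values.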

My proposal here is the following:

* If there is an instance_uuid on the node, no matter what provision/power
  state it's in, consider the resources as used. In case it's an orphan,
  an admin will need to take some manual action anyway.

* If there is no instance_uuid and a node is in cleaning/clean wait after
  tear down, this is part of the normal node lifecycle, so report all
  resources as used. This means we need a way to determine if it's a manual
  or automated clean.

* If there is no instance_uuid, and a node:
  - has a bad power state, or
  - is in maintenance, or
  - is undergoing a manual clean,
  or actually in any other case, consider it unavailable and report available
  resources = used resources = 0. Provision state does not matter in this
  logic; all cases that we wanted to take into account are described in
  the first two bullets.
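
The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the nova Ironic driver's implementation: the `Node` class, its field names, and the `manual_clean` flag are hypothetical (the comment itself notes that a way to tell manual from automated cleaning is still needed). The final branch for a healthy, unassigned node in the available state is also an assumption, since the comment does not spell that case out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Provision states named in the proposal.
CLEANING_STATES = {"cleaning", "clean wait"}


@dataclass
class Node:
    """Hypothetical stand-in for an Ironic node record."""
    vcpus: int
    memory_mb: int
    local_gb: int
    provision_state: str = "available"
    instance_uuid: Optional[str] = None
    power_ok: bool = True
    maintenance: bool = False
    manual_clean: bool = False  # assumed flag; detection mechanism is an open question


def report(node: Node) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (total, used) resources for one node."""
    total = {"vcpus": node.vcpus, "memory_mb": node.memory_mb,
             "local_gb": node.local_gb}
    zero = dict.fromkeys(total, 0)

    # Bullet 1: an instance is attached, so resources are used in any state.
    if node.instance_uuid is not None:
        return total, dict(total)

    # Bullet 3: bad power state, maintenance, or manual clean means unavailable.
    # Checked before bullet 2 so that a manual clean is not counted as automated.
    if not node.power_ok or node.maintenance or node.manual_clean:
        return zero, dict(zero)

    # Bullet 2: automated cleaning after tear down counts as fully used.
    if node.provision_state in CLEANING_STATES:
        return total, dict(total)

    # Assumption (not stated in the comment): a healthy free node is all free.
    if node.provision_state == "available":
        return total, dict(zero)

    # Any other case: unavailable.
    return zero, dict(zero)
```

For example, `report(Node(1, 1024, 10, provision_state="active", instance_uuid="abc"))` returns identical total and used dictionaries, so summing over nodes would keep vcpus and vcpus_used consistent.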


Change abandoned by Tuan Luong-Anh (<email address hidden>) on branch: master
