After evacuate, origin host still reports a running VM

Bug #1285259 reported by Juan Manuel Ollé on 2014-02-26
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: Low
Assigned to: jiang, yunhong

Bug Description

After evacuating a host with one instance to a target host, nova still reports that there is an instance on the origin hypervisor.

Report before the evacuation:

$ nova hypervisor-stats
+----------------------+-------+
| Property             | Value |
+----------------------+-------+
| count                | 2     |
| current_workload     | 0     |
| disk_available_least | 22    |
| free_disk_gb         | 50    |
| free_ram_mb          | 3860  |
| local_gb             | 50    |
| local_gb_used        | 0     |
| memory_mb            | 4948  |
| memory_mb_used       | 1088  |
| running_vms          | 1     |
| vcpus                | 3     |
| vcpus_used           | 1     |
+----------------------+-------+

$ nova hypervisor-list
+----+---------------------+
| ID | Hypervisor hostname |
+----+---------------------+
| 1  | jmolle-Controller   |
| 2  | jmolle-Node1        |
+----+---------------------+

$ nova hypervisor-show jmolle-Controller
+---------------------------+-------------------------------------------------------+
| Property                  | Value                                                 |
+---------------------------+-------------------------------------------------------+
| cpu_info_arch             | x86_64                                                |
| cpu_info_features         | ["rdtscp", "hypervisor", "x2apic", "ss", "ds", "vme"] |
| cpu_info_model            | Westmere                                              |
| cpu_info_topology_cores   | 1                                                     |
| cpu_info_topology_sockets | 2                                                     |
| cpu_info_topology_threads | 1                                                     |
| cpu_info_vendor           | Intel                                                 |
| current_workload          | 0                                                     |
| disk_available_least      | 10                                                    |
| free_disk_gb              | 25                                                    |
| free_ram_mb               | 3378                                                  |
| host_ip                   | 192.168.41.101                                        |
| hypervisor_hostname       | jmolle-Controller                                     |
| hypervisor_type           | QEMU                                                  |
| hypervisor_version        | 1000000                                               |
| id                        | 1                                                     |
| local_gb                  | 25                                                    |
| local_gb_used             | 0                                                     |
| memory_mb                 | 3954                                                  |
| memory_mb_used            | 576                                                   |
| running_vms               | 1                                                     |
| service_host              | jmolle-Controller                                     |
| service_id                | 4                                                     |
| vcpus                     | 2                                                     |
| vcpus_used                | 1                                                     |
+---------------------------+-------------------------------------------------------+

$ nova hypervisor-servers jmolle-Controller
+--------------------------------------+-------------------+---------------+---------------------+
| ID                                   | Name              | Hypervisor ID | Hypervisor Hostname |
+--------------------------------------+-------------------+---------------+---------------------+
| a3c291e5-05b0-43fc-b16b-121bf36f30c0 | instance-00000001 | 1             | jmolle-Controller   |
+--------------------------------------+-------------------+---------------+---------------------+

But after evacuating the instance we get:

$ nova hypervisor-stats
+----------------------+-------+
| Property             | Value |
+----------------------+-------+
| count                | 2     |
| current_workload     | 1     |
| disk_available_least | 22    |
| free_disk_gb         | 50    |
| free_ram_mb          | 3796  |
| local_gb             | 50    |
| local_gb_used        | 0     |
| memory_mb            | 4948  |
| memory_mb_used       | 1152  |
| running_vms          | 2     |
| vcpus                | 3     |
| vcpus_used           | 2     |
+----------------------+-------+

Here we see that there are now 2 running instances instead of one,
and if we use the show command we get:

$ nova hypervisor-show jmolle-Controller
+---------------------------+-------------------------------------------------------+
| Property                  | Value                                                 |
+---------------------------+-------------------------------------------------------+
| cpu_info_arch             | x86_64                                                |
| cpu_info_features         | ["rdtscp", "hypervisor", "x2apic", "ss", "ds", "vme"] |
| cpu_info_model            | Westmere                                              |
| cpu_info_topology_cores   | 1                                                     |
| cpu_info_topology_sockets | 2                                                     |
| cpu_info_topology_threads | 1                                                     |
| cpu_info_vendor           | Intel                                                 |
| current_workload          | 0                                                     |
| disk_available_least      | 10                                                    |
| free_disk_gb              | 25                                                    |
| free_ram_mb               | 3378                                                  |
| host_ip                   | 192.168.41.101                                        |
| hypervisor_hostname       | jmolle-Controller                                     |
| hypervisor_type           | QEMU                                                  |
| hypervisor_version        | 1000000                                               |
| id                        | 1                                                     |
| local_gb                  | 25                                                    |
| local_gb_used             | 0                                                     |
| memory_mb                 | 3954                                                  |
| memory_mb_used            | 576                                                   |
| running_vms               | 1                                                     |
| service_host              | jmolle-Controller                                     |
| service_id                | 4                                                     |
| vcpus                     | 2                                                     |
| vcpus_used                | 1                                                     |
+---------------------------+-------------------------------------------------------+

The origin host still shows 1 running instance,
but if we list its servers we get 0:

$ nova hypervisor-servers jmolle-Controller
+----+------+---------------+---------------------+
| ID | Name | Hypervisor ID | Hypervisor Hostname |
+----+------+---------------+---------------------+
+----+------+---------------+---------------------+

and the instance was evacuated to the other host correctly:

$ nova hypervisor-servers jmolle-Node1
+--------------------------------------+-------------------+---------------+---------------------+
| ID                                   | Name              | Hypervisor ID | Hypervisor Hostname |
+--------------------------------------+-------------------+---------------+---------------------+
| a3c291e5-05b0-43fc-b16b-121bf36f30c0 | instance-00000001 | 2             | jmolle-Node1        |
+--------------------------------------+-------------------+---------------+---------------------+

I think this is a nova bug, given the inconsistency between the hypervisor-show and hypervisor-servers nova commands.

Juan Manuel Ollé (juan-m-olle) wrote :

Additional info.

After the evacuation, the database query

select hypervisor_hostname,running_vms from compute_nodes;

reports

+---------------------+-------------+
| hypervisor_hostname | running_vms |
+---------------------+-------------+
| jmolle-Controller   | 1           |
| jmolle-Node1        | 1           |
+---------------------+-------------+

But as soon as the compute node comes back up, it updates the database, reporting running_vms = 0.

If the evacuate command succeeds, should it update the compute node record to report that no instance is running?
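
The mismatch can be reproduced with plain arithmetic. The following is a minimal sketch, not nova code: it assumes hypervisor-stats simply sums the per-node running_vms values stored in compute_nodes, while hypervisor-servers looks up instances by their current host. The helper names total_running_vms and servers_on are hypothetical; the data is taken from the output above.

```python
# Per-node rows as returned by the compute_nodes query in this report.
# The jmolle-Controller row is stale: that host is down and cannot refresh it.
compute_nodes = [
    {"hypervisor_hostname": "jmolle-Controller", "running_vms": 1},
    {"hypervisor_hostname": "jmolle-Node1", "running_vms": 1},
]

# Where the instance actually lives after the evacuation (per hypervisor-servers).
instances = [
    {"uuid": "a3c291e5-05b0-43fc-b16b-121bf36f30c0", "host": "jmolle-Node1"},
]


def total_running_vms(nodes):
    """Assumed behaviour of hypervisor-stats: sum the stored per-node counters."""
    return sum(node["running_vms"] for node in nodes)


def servers_on(host, all_instances):
    """Assumed behaviour of hypervisor-servers: filter instances by current host."""
    return [inst for inst in all_instances if inst["host"] == host]


stats_total = total_running_vms(compute_nodes)
controller_servers = servers_on("jmolle-Controller", instances)

print(stats_total)              # 2, although only one instance exists
print(len(controller_servers))  # 0
```

Under these assumptions the two commands disagree because they read different data: one reads a counter that only the (currently down) compute node updates, the other reads the instance records that evacuate already changed.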

Changed in nova:
assignee: nobody → jiang, yunhong (yunhong-jiang)
tags: added: api compute
Changed in nova:
importance: Undecided → Low
Changed in nova:
status: New → Confirmed
jiang, yunhong (yunhong-jiang) wrote :

I think there are in fact two things in this bug:

a) For "nova hypervisor-stats", we should not include compute nodes whose corresponding compute service is already disabled in the calculation.

b) For "nova hypervisor-show", it would be better to return the state of the compute node. I'm not sure whether that is needed, because the user can still get this information from "nova service-list", but I think returning the state would be helpful.

I don't think we can land change b) in the I release, so I will first try to prepare a patch for item a).

Thanks
--jyh
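
Item a) can be illustrated with a small, self-contained sketch. This is not the actual patch: it uses an in-memory SQLite database with a heavily simplified schema, and it assumes that compute_nodes rows point at a services row carrying a disabled flag, and that the origin host's service was disabled before the evacuation. The service id 5 for jmolle-Node1 and the function name hypervisor_stats are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE services (id INTEGER PRIMARY KEY, host TEXT, disabled INTEGER);
CREATE TABLE compute_nodes (
    id INTEGER PRIMARY KEY,
    service_id INTEGER REFERENCES services(id),
    hypervisor_hostname TEXT,
    running_vms INTEGER,
    vcpus_used INTEGER
);
-- Assumption: the evacuated host's service is disabled (disabled = 1).
INSERT INTO services VALUES (4, 'jmolle-Controller', 1);
INSERT INTO services VALUES (5, 'jmolle-Node1', 0);
INSERT INTO compute_nodes VALUES (1, 4, 'jmolle-Controller', 1, 1);
INSERT INTO compute_nodes VALUES (2, 5, 'jmolle-Node1', 1, 1);
""")


def hypervisor_stats(db, skip_disabled):
    """Aggregate compute node counters, optionally ignoring disabled services."""
    query = (
        "SELECT COUNT(*), COALESCE(SUM(cn.running_vms), 0), "
        "COALESCE(SUM(cn.vcpus_used), 0) "
        "FROM compute_nodes cn JOIN services s ON s.id = cn.service_id"
    )
    if skip_disabled:
        query += " WHERE s.disabled = 0"
    count, running_vms, vcpus_used = db.execute(query).fetchone()
    return {"count": count, "running_vms": running_vms, "vcpus_used": vcpus_used}


print(hypervisor_stats(conn, skip_disabled=False))
# {'count': 2, 'running_vms': 2, 'vcpus_used': 2}
print(hypervisor_stats(conn, skip_disabled=True))
# {'count': 1, 'running_vms': 1, 'vcpus_used': 1}
```

With the filter applied, the stale counters of the disabled node no longer inflate the totals. Note that this only helps when the service is disabled; a service that is merely down is still counted.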

Fix proposed to branch: master
Review: https://review.openstack.org/80707

Changed in nova:
status: Confirmed → In Progress

Fix proposed to branch: master
Review: https://review.openstack.org/80708

Reviewed: https://review.openstack.org/80708
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8cd2b890710ba6e53884f43b5d6ce095672732a4
Submitter: Jenkins
Branch: master

commit 8cd2b890710ba6e53884f43b5d6ce095672732a4
Author: Yunhong Jiang <email address hidden>
Date: Thu Mar 13 16:29:26 2014 -0700

    Not count disabled compute node for statistics

    No server will be scheduled to disabled compute service, thus we
    should not count the corresponding compute node information.

    It's arguable if we should count for 'down' service, since service
    may be marked down because of communication error. If we do want to
    exclude the down service, we need passing the information from caller
    because the up/down state is not kepts in database, and it means compute
    and cell api changes.

    Closes-Bug: #1285259

    Change-Id: I5e3e71ef30683c5eb5cc4462f58fa5f29d7c3f4b

Changed in nova:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/80707
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7d8a78a29128debe7ed49bea394f952f37cee498
Submitter: Jenkins
Branch: master

commit 7d8a78a29128debe7ed49bea394f952f37cee498
Author: Yunhong Jiang <email address hidden>
Date: Thu Mar 13 15:09:34 2014 -0700

    Return status for compute node

    Currently when return compute node information, there is no status returned.

    When the corresponding service is disabled or down and users try to
    do 'hypervisor-list' or 'hypervisor-show', they will have no idea of it.

    Implements: blueprint return-status-for-hypervisor-node
    Closes-Bug: #1285259

    DocImpact

    Change-Id: I17c53b454ccef023f298f1b8875daef965d2325d
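
As a rough illustration of what returning a status for a compute node could look like, here is a minimal sketch. It is not the code from this commit: the field names status and state, the 60-second threshold, the last_heartbeat key, and the function name hypervisor_view are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumed threshold after which a silent service is considered down.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def hypervisor_view(node, service, now):
    """Build a hypervisor summary that also exposes the service status and state."""
    is_up = (now - service["last_heartbeat"]) <= SERVICE_DOWN_TIME
    return {
        "id": node["id"],
        "hypervisor_hostname": node["hypervisor_hostname"],
        "running_vms": node["running_vms"],
        # Administrative setting: has an operator disabled the service?
        "status": "disabled" if service["disabled"] else "enabled",
        # Liveness: has the service reported in recently?
        "state": "up" if is_up else "down",
    }


now = datetime(2014, 3, 13, 15, 0, 0)
node = {"id": 1, "hypervisor_hostname": "jmolle-Controller", "running_vms": 1}
service = {"disabled": True, "last_heartbeat": now - timedelta(minutes=10)}

view = hypervisor_view(node, service, now)
print(view["status"], view["state"])  # disabled down
```

With such fields in the output, a user who sees running_vms = 1 for a host can tell at once that the number comes from a node whose service is disabled or down, and may therefore be stale.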

Changed in nova:
milestone: none → juno-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2014-10-16
Changed in nova:
milestone: juno-2 → 2014.2