hypervisor status does not update immediately after 'nova delete'

Bug #1210436 reported by zhgaoxa
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Medium
Assigned to: Marian Horban

Bug Description

Hypervisor status does not update immediately after 'nova delete'; it only becomes correct after about 1 minute. If we boot an instance on this hypervisor, or live migrate one to it, in the meantime, the operation goes wrong.
Also, I notice that when running the command 'nova boot --image cirros-0.3.1-x86_64-uec --flavor 1 vm1' once every second, the VMs mostly end up on one host, for the same reason. Furthermore, after many rounds of 'boot' and 'delete', the hypervisor status becomes muddled.
In fact, after 'nova delete' the compute_nodes table is not updated immediately either; it stays stale until the next periodic task runs.

But both should be updated immediately.
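
A toy model of the behaviour described above may help, assuming the usage figures are refreshed only by a periodic task after a delete. ToyHost and its methods are hypothetical names for illustration, not nova code, and the 512 MB host reservation is an assumption chosen so that the figures line up with the output in this report.

```python
class ToyHost:
    """Hypothetical stand-in for one compute node (not nova code)."""

    def __init__(self, memory_mb, reserved_mb=512):
        self.memory_mb = memory_mb
        self.reserved_mb = reserved_mb        # assumed host reservation
        self.instances = {}                   # instance name -> RAM in MB
        self.reported_used_mb = reserved_mb   # the figure other services read

    def actual_used_mb(self):
        return self.reserved_mb + sum(self.instances.values())

    def refresh(self):
        """What the periodic task does in this model: recompute the figure."""
        self.reported_used_mb = self.actual_used_mb()

    def boot(self, name, ram_mb):
        self.instances[name] = ram_mb
        self.refresh()                        # assumption: boot updates at once

    def delete(self, name, update_now=False):
        del self.instances[name]
        if update_now:                        # the behaviour the report asks for
            self.refresh()

    def reported_free_mb(self):
        return self.memory_mb - self.reported_used_mb


host = ToyHost(memory_mb=2003)
host.boot("vm1", ram_mb=512)
host.delete("vm1")
print(host.reported_free_mb())   # 979: still counts the deleted instance
host.refresh()                   # simulated periodic task
print(host.reported_free_mb())   # 1491
```

Passing update_now=True to delete() models the immediate update requested here; whether that is appropriate for nova is what this bug is about.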

Test steps:
1. Environment: there is an instance 'vm1' on hypervisor 'ubuntu'.
zhgaoxa@ubuntu:~/src$ nova list
+--------------------------------------+------+--------+------------+-------------+------------------+
| ID                                   | Name | Status | Task State | Power State | Networks         |
+--------------------------------------+------+--------+------------+-------------+------------------+
| a439e64e-0d1c-4c23-8cfd-6ab12dc9a209 | vm1  | ACTIVE | None       | Running     | private=10.0.0.2 |
+--------------------------------------+------+--------+------------+-------------+------------------+
zhgaoxa@ubuntu:~/src$ nova show vm1
+--------------------------------------+----------------------------------------------------------------+
| Property | Value |
+--------------------------------------+----------------------------------------------------------------+
| status | ACTIVE |
| updated | 2013-07-18T07:05:29Z |
| OS-EXT-STS:task_state | None |
| OS-EXT-SRV-ATTR:host | ubuntu |
| key_name | None |
| image | cirros-0.3.1-x86_64-uec (2898d4c0-40e7-4449-a21b-07264f32713d) |
| private network | 10.0.0.2 |
| hostId | 3ed8295691f7c645346e3ebd2878c11aa9463c1386b3b1c38e3e42e6 |
| OS-EXT-STS:vm_state | active |
| OS-EXT-SRV-ATTR:instance_name | instance-00000057 |
| OS-SRV-USG:launched_at | 2013-07-18T07:05:29.000000 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | ubuntu |
| flavor | m1.tiny (1) |
| id | a439e64e-0d1c-4c23-8cfd-6ab12dc9a209 |
| security_groups | [{u'name': u'default'}] |
| OS-SRV-USG:terminated_at | None |
| user_id | 2361015c303149f8aa5bca17fb5be92d |
| name | vm1 |
| created | 2013-07-18T07:05:25Z |
| tenant_id | cc05fb78cffb4c308ead93b731ca7357 |
| OS-DCF:diskConfig | MANUAL |
| metadata | {} |
| os-extended-volumes:volumes_attached | [] |
| accessIPv4 | |
| accessIPv6 | |
| progress | 0 |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-AZ:availability_zone | nova |
| config_drive | |
+--------------------------------------+----------------------------------------------------------------+

2. Delete vm1 and watch the hypervisor status. The values of free_ram_mb, free_disk_gb, vcpus_used, memory_mb_used, and running_vms are not correct until the periodic task runs about 1 minute later.
zhgaoxa@ubuntu:~/src$ nova delete vm1
zhgaoxa@ubuntu:~/src$ watch 'nova hypervisor-show ubuntu'
Every 2.0s: nova hypervisor-show ubuntu Thu Jul 18 15:09:18 2013

+----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| hypervisor_hostname | ubuntu |
| cpu_info | {"vendor": "Intel", "model": "Conroe", "arch": "x86_64", "features": ["rdtscp", "ht", "vme"], "topology": {"cores": 2, "threads": 1, "sockets": 1}} |
| free_disk_gb | 29 |
| hypervisor_version | 1000000 |
| disk_available_least | 25 |
| local_gb | 30 |
| free_ram_mb | 979 |
| id | 1 |
| vcpus_used | 1 |
| hypervisor_type | QEMU |
| local_gb_used | 1 |
| memory_mb_used | 1024 |
| memory_mb | 2003 |
| current_workload | 0 |
| vcpus | 2 |
| running_vms | 1 |
| service_id | 2 |
| service_host | ubuntu |
+----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+

4. Query the compute_nodes table:
mysql> select * from compute_nodes;
+---------------------+---------------------+------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+
| created_at | updated_at | deleted_at | id | service_id | vcpus | memory_mb | local_gb | vcpus_used | memory_mb_used | local_gb_used | hypervisor_type | hypervisor_version | cpu_info | disk_available_least | free_ram_mb | free_disk_gb | current_workload | running_vms | hypervisor_hostname | deleted |
+---------------------+---------------------+------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+
| 2013-07-17 05:55:55 | 2013-07-18 05:37:13 | NULL | 1 | 2 | 2 | 2003 | 30 | 0 | 512 | 0 | QEMU | 1000000 | {"vendor": "Intel", "model": "Conroe", "arch": "x86_64", "features": ["rdtscp", "ht", "vme"], "topology": {"cores": 2, "threads": 1, "sockets": 1}} | 25 | 1491 | 30 | 0 | 0 | ubuntu | 0 |
+---------------------+---------------------+------------+----+------------+-------+-----------+----------+------------+----------------+---------------+-----------------+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+-------------+--------------+------------------+-------------+---------------------+---------+
1 row in set (0.00 sec)
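
To make the mismatch easier to see, here is a small sketch that compares the usage fields of the two snapshots. The values are copied from the 'nova hypervisor-show' output in step 2 and from the compute_nodes row above; diff_usage is a hypothetical helper, not part of any nova tool.

```python
# Usage fields as shown by 'nova hypervisor-show ubuntu' in step 2.
hypervisor_show = {
    "vcpus_used": 1,
    "memory_mb_used": 1024,
    "local_gb_used": 1,
    "free_ram_mb": 979,
    "free_disk_gb": 29,
    "running_vms": 1,
}

# The same fields as shown in the compute_nodes row.
compute_nodes_row = {
    "vcpus_used": 0,
    "memory_mb_used": 512,
    "local_gb_used": 0,
    "free_ram_mb": 1491,
    "free_disk_gb": 30,
    "running_vms": 0,
}


def diff_usage(first, second):
    """Return {field: (first_value, second_value)} for fields that differ."""
    return {
        key: (first[key], second[key])
        for key in sorted(first)
        if first[key] != second.get(key)
    }


for field, (shown, stored) in diff_usage(hypervisor_show, compute_nodes_row).items():
    print(f"{field}: hypervisor-show={shown} compute_nodes={stored}")
```

In both snapshots free_ram_mb equals memory_mb (2003) minus memory_mb_used, so each is internally consistent; they differ by exactly one m1.tiny instance.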

zhgaoxa (gaozheng0123) wrote :

This shows how booting instances fails:
1. zhgaoxa@ubuntu:~/src$ for i in {1..10};do nova boot --image cirros-0.3.1-x86_64-uec --flavor 1 vm$i;done
2. zhgaoxa@ubuntu:~/src$ nova list
+--------------------------------------+------+--------+------------+-------------+------------------+
| ID                                   | Name | Status | Task State | Power State | Networks         |
+--------------------------------------+------+--------+------------+-------------+------------------+
| 19e24952-8ad1-43be-b63f-30d99048e5a5 | vm1  | ACTIVE | None       | Running     | private=10.0.0.2 |
| 5c4a0391-193a-4761-92a5-3389e6fcb7c0 | vm10 | ERROR  | None       | NOSTATE     |                  |
| 99afccff-fbcb-4997-bb67-b19524747b2b | vm2  | ACTIVE | None       | Running     | private=10.0.0.3 |
| 80a352e8-956a-4813-8075-acdf3296f4e3 | vm3  | ACTIVE | None       | Running     | private=10.0.0.4 |
| 07aa47da-7f6c-4f6a-907f-430b40562469 | vm4  | ACTIVE | None       | Running     | private=10.0.0.5 |
| 5c72622f-44a2-45ea-9d03-4c07da27e8ff | vm5  | ACTIVE | None       | Running     | private=10.0.0.6 |
| b491b24c-7eae-48dc-946d-32969c886699 | vm6  | ERROR  | None       | NOSTATE     |                  |
| 51f01870-77af-46f3-bc66-ca20e5bec178 | vm7  | ERROR  | None       | NOSTATE     |                  |
| 2bc26be6-4268-476d-bed0-e0fa26ee70f4 | vm8  | ERROR  | None       | NOSTATE     |                  |
| 692f9d20-1d82-49e6-ad14-1e1a6c479516 | vm9  | ERROR  | None       | NOSTATE     |                  |
+--------------------------------------+------+--------+------------+-------------+------------------+
3. zhgaoxa@ubuntu:~/src$ nova hypervisor-show ubuntu
+----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| hypervisor_hostname | ubuntu |
| cpu_info | {"vendor": "Intel", "model": "Conroe", "arch": "x86_64", "features": ["rdtscp", "ht", "vme"], "topology": {"cores": 2, "threads": 1, "sockets": 1}} |
| free_disk_gb | 25 |
| hypervisor_version | 1000000 |
| disk_available_least | 20 |
| local_gb | 30 ...

tags: added: compute libvirtt
removed: hypervisor status
tags: added: libvirt
removed: libvirtt
melanie witt (melwitt) wrote :

The delayed update in status can be a significant problem when it comes to scheduling rapidly created instances, as the bug reporter mentions.
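
A rough sketch of why stale figures matter for placement, assuming a scheduler that simply picks the host that appears to have the most free RAM. This is a toy model, not nova's scheduler; schedule and the host names are hypothetical.

```python
def schedule(requests_mb, free_mb, refresh_each_boot):
    """Place each request on the host that *appears* to have the most free RAM.

    free_mb maps host name -> free RAM in MB at the start.
    With refresh_each_boot=False the scheduler's view is never updated,
    which models requests arriving faster than the usage figures refresh.
    """
    actual = dict(free_mb)   # true free RAM per host
    view = dict(free_mb)     # what the scheduler believes
    placements = []
    for ram in requests_mb:
        # Ties are broken by host name so the result is deterministic.
        host = max(sorted(view), key=view.get)
        placements.append(host)
        actual[host] -= ram
        if refresh_each_boot:
            view = dict(actual)
    return placements


hosts = {"host-a": 2000, "host-b": 2000}
print(schedule([512] * 4, hosts, refresh_each_boot=False))
# ['host-a', 'host-a', 'host-a', 'host-a']
print(schedule([512] * 4, hosts, refresh_each_boot=True))
# ['host-a', 'host-b', 'host-a', 'host-b']
```

With a stale view every request lands on the same host, which resembles the pile-up the reporter describes.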

Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
Changed in nova:
assignee: nobody → Andres Buraschi (andres-buraschi)
Andres Buraschi (andres-buraschi) wrote :

Hi zhgaoxa,
I've tried to reproduce the instance creation error with no 'luck'... I can see the delay between a deletion and the update, which in my case is around 30 seconds, but the 10-VM creation simply takes a longer or shorter time, and in the end all the instances finish up and running.

I'm using the latest code (a recent 'git pull' on all repos). Can you still reproduce the issue? I'll be glad to help; any other information you can provide will be appreciated.

Regards.

Changed in nova:
assignee: Andres Buraschi (andres-buraschi) → nobody
Marian Horban (mhorban)
Changed in nova:
assignee: nobody → Marian Horban (mhorban)
Stephen Finucane (stephenfinucane) wrote :

I could be wrong, but this seems to be "by design". The Nova API is asynchronous, and requests aren't guaranteed to be handled immediately. The IRC log from a conversation with dansmith is included below.

http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2015-11-23.log.html#t2015-11-23T20:47:54

    <sfinucan> Does Nova have any concept of interrupt- or notification-driven tasks?
    <sfinucan> I see `periodic_task` used extensively and I'm also aware of the 'notify_about_instance_usage' and related functions. The latter is polling-based though, while the former only seems to issue user-focused messages
    <sfinucan> s/latter/former/ s/former/latter/
    <dansmith> sfinucan: confusing
    <sfinucan> Sorry
    <dansmith> sfinucan: the notify there is to send a notification about a change, which happens when a thing happens, to notify someone like the deployer about a change
    <dansmith> periodic tasks happen on a timer to poll for cleanups, etc
    <sfinucan> dansmith: OK. So for something like VM creation/deletion, we seem to use periodic tasks
    <dansmith> sfinucan: use more words
    <sfinucan> OK :)
    <dansmith> sfinucan: vm create/destroy happens because you asked for it via the api
    <sfinucan> Why do we use periodic tasks to check if VM creation/deletion is done?
    <dansmith> sfinucan: in case it fails in the middle and we need to clean it up
    <sfinucan> dansmith: But this means there's a period of time between when a VM is finished being created/deleted and when the user is told it's finished, correct?
    <sfinucan> Which causes bugs like this https://bugs.launchpad.net/nova/+bug/1210436
    <dansmith> sfinucan: no
    <sfinucan> dansmith: Oh?
    <dansmith> sfinucan: there is a delay between you asking for it to be deleted and it being deleted, but that is not because of a periodic
    <dansmith> sfinucan: that's because most API requests are async.. they return immediately and put something on the message queue to be processed
    <dansmith> sfinucan: which is a very fundamental design point of nova (since I can feel "why" coming)
    <sfinucan> dansmith: Hmm OK that's interesting. So we can say that as soon as the VM deletion request is received and completed by Nova, the DB should reflect the deletion?
    <dansmith> sfinucan: as soon as it's completed by nova, yeah
    <dansmith> sfinucan: which is not equal to "received"
    <sfinucan> dansmith: OK, that's interesting :)
    <sfinucan> I think the linked bug can be marked as invalid for this reason so I'll do just that. Thanks for the help, dansmith
    <dansmith> sfinucan: yep, probably (having read only the title)
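
The difference between "received" and "completed" in the log above can be sketched with a queue and a worker thread. This is only an illustration of the general pattern, not nova's API or messaging code; ToyApi and its methods are hypothetical names.

```python
import queue
import threading


class ToyApi:
    """Hypothetical async API: delete() returns before the work is done."""

    def __init__(self):
        self.db = {"vm1": "ACTIVE"}
        self._queue = queue.Queue()

    def delete(self, name):
        # The request is only put on the queue; nothing is deleted yet.
        self._queue.put(name)
        return "accepted"

    def start_worker(self):
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            name = self._queue.get()
            self.db.pop(name, None)   # the deletion actually happens here
            self._queue.task_done()

    def wait_until_done(self):
        self._queue.join()


api = ToyApi()
print(api.delete("vm1"))   # accepted
print("vm1" in api.db)     # True: request received, not yet completed
api.start_worker()
api.wait_until_done()
print("vm1" in api.db)     # False: completed, the 'DB' reflects the deletion
```

The worker is started only after the first check so that the example is deterministic; in a real deployment the gap between the two states depends on how quickly the queued message is processed.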

Changed in nova:
status: Confirmed → Invalid