Description
===========
Nova updates hypervisor resources using the function ./nova/compute/resource_tracker.py:update_available_resource().
When a host has *shut-down* instances, this can produce inconsistent values for resources such as vcpus_used.
Resources are taken from the function self.driver.get_available_resource():
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L617
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5766
This function calculates the allocated vCPUs based on the function _get_vcpu_total():
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5352
As we can see, _get_vcpu_total() calls *self._host.list_guests()* without the "only_running=False" parameter, so it does not count shut-down instances.
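To make the effect of the missing parameter concrete, here is a minimal sketch. It is not nova code: the Guest class and both helper functions are hypothetical stand-ins, and it assumes one vCPU per instance so that the totals line up with the 120 and 117 seen in this report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Guest:
    """Hypothetical stand-in for a hypervisor guest; not a nova class."""
    name: str
    vcpus: int
    running: bool


def list_guests(guests: List[Guest], only_running: bool = True) -> List[Guest]:
    """Toy guest listing that hides stopped guests unless asked not to."""
    if only_running:
        return [g for g in guests if g.running]
    return list(guests)


def vcpus_used(guests: List[Guest], only_running: bool = True) -> int:
    """Sum the vCPUs of whatever the listing returns."""
    return sum(g.vcpus for g in list_guests(guests, only_running=only_running))


# Assumption: 120 single-vCPU guests, 3 of which have been stopped.
guests = [Guest(f"vm-{i}", vcpus=1, running=(i >= 3)) for i in range(120)]

running_only = vcpus_used(guests)                    # default listing
all_guests = vcpus_used(guests, only_running=False)  # includes stopped guests

print(running_only, all_guests)
```

With the default listing the stopped guests drop out of the total, which is the lower of the two values that later shows up in the database.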
At the end of the resource update process, the function _update_available_resource() is called:
> /opt/stack/nova/nova/compute/resource_tracker.py(733)
677 @utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
678 def _update_available_resource(self, context, resources):
679
681 # initialize the compute node object, creating it
682 # if it does not already exist.
683 self._init_compute_node(context, resources)
It initializes the compute node object with resources that were calculated without shut-down instances. If the compute node object already exists, it *UPDATES* its fields - *for a while nova-api reports resource values that differ from the real ones.*
731 # update the compute_node
732 self._update(context, cn)
The inconsistency is fixed automatically later on, when this code runs:
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L709
But on heavily loaded hypervisors (for example, 100 active instances and 30 shut-down instances) this leaves wrong information in the nova database for about 4-5 seconds (in my use case). This can cause other problems, such as spawning on an already full hypervisor, because the scheduler has wrong information about hypervisor usage.
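The scheduling risk can be sketched as a two-step write with a reader in between. This is an illustration only, not nova's scheduler or resource tracker: the function names are hypothetical, and it assumes a host with 120 vCPUs, no overcommit, and the 117/120 values from this report.

```python
from typing import Dict, Iterator, List


def host_has_room(vcpus_used: int, vcpus_total: int, requested: int) -> bool:
    """Toy capacity check: does the request fit without overcommit?"""
    return vcpus_used + requested <= vcpus_total


def periodic_update(record: Dict[str, int], driver_value: int,
                    tracked_value: int) -> Iterator[int]:
    """Yield the stored value after each of the two writes.

    The first write stores the driver's figure (running guests only);
    the second write stores the figure recalculated from all tracked
    instances, which repairs the record.
    """
    record["vcpus_used"] = driver_value
    yield record["vcpus_used"]
    record["vcpus_used"] = tracked_value
    yield record["vcpus_used"]


record = {"vcpus_used": 120}

# A reader that checks the record after each write, asking for 2 vCPUs.
decisions: List[bool] = [
    host_has_room(seen, vcpus_total=120, requested=2)
    for seen in periodic_update(record, driver_value=117, tracked_value=120)
]

print(decisions)
```

A reader that looks at the record between the two writes concludes that the full host still has room; once the second write lands, the same request is rejected again.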
Steps to reproduce
==================
1) Start devstack
2) Create 120 instances
3) Stop some instances
4) Watch the values flicker in the output of nova hypervisor-show:
nova hypervisor-show e6dfc16b-7914-48fb-a235-6fe3a41bb6db
Expected result
===============
The returned values should stay the same throughout the test.
Actual result
=============
while true; do echo -n "$(date) "; echo "select hypervisor_hostname, vcpus_used from compute_nodes where hypervisor_hostname='example.compute.node.com';" | mysql nova_cell1; sleep 0.3; done
Thu Nov 2 14:50:09 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Bad values were stored for about 5 seconds. During this time nova-scheduler could pick this host.
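For longer captures, the length of the bad window can be measured instead of read off by eye. The helper below is a sketch, not part of nova; its name is hypothetical and it assumes the line layout produced by the polling loop in this report (time in the fourth field, vcpus_used in the last). The sample lines are a shortened excerpt of the output.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def bad_window(lines: Iterable[str],
               expected: int) -> Optional[Tuple[str, str, float]]:
    """Return (first bad time, last bad time, seconds between them).

    A sample is "bad" when its last field differs from `expected`.
    Returns None when every sample matches.
    """
    bad_times = []
    for line in lines:
        fields = line.split()
        # Skip anything that does not look like a sample line.
        if len(fields) < 8 or not fields[-1].isdigit():
            continue
        if int(fields[-1]) != expected:
            bad_times.append(datetime.strptime(fields[3], "%H:%M:%S"))
    if not bad_times:
        return None
    first, last = min(bad_times), max(bad_times)
    return (first.strftime("%H:%M:%S"), last.strftime("%H:%M:%S"),
            (last - first).total_seconds())


sample = [
    "Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120",
    "Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117",
    "Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117",
    "Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120",
]

window = bad_window(sample, expected=120)
print(window)
```

The sketch compares clock times only, so it assumes the capture does not cross midnight.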
Environment
===========
Devstack master (f974e3c3566f379211d7fdc790d07b5680925584).
Releases back to Newton are certainly affected as well.