Activity log for bug #1729621

Date Who What changed Old value New value Message
2017-11-02 14:54:05 Maciej Jozefczyk bug added bug
2017-11-02 14:56:34 Maciej Jozefczyk description

Old value: same text as the new value below, except that the closing sentence under "Actual result" read "Bad values where stored in for about 5 seconds."

New value:

Description
===========
Nova updates hypervisor resources with update_available_resource() in ./nova/compute/resource_tracker.py. When some instances are shut down, this can produce inconsistent values for resources such as vcpus_used.

Resources are taken from self.driver.get_available_resource():

https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L617
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5766

This function calculates allocated vCPUs using _get_vcpu_total():

https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5352

As we can see, _get_vcpu_total() calls *self._host.list_guests()* without the "only_running=False" parameter, so it does not take shut-down instances into account.

At the end of the resource update process, _update_available_resource() is called:

    > /opt/stack/nova/nova/compute/resource_tracker.py(733)
    @utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
    def _update_available_resource(self, context, resources):

        # initialize the compute node object, creating it
        # if it does not already exist.
        self._init_compute_node(context, resources)

It initializes the compute node object with resources that are calculated without the shut-down instances. If the compute node object already exists, it *UPDATES* its fields, so *for a while nova-api reports resource values that differ from the real ones.*

        # update the compute_node
        self._update(context, cn)

The inconsistency is fixed automatically by code that runs later:

https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L709

However, on heavily loaded hypervisors (for example 100 active instances and 30 shut-down instances) the nova database holds wrong information for about 4-5 seconds (in my use case). This can cause other problems, such as spawning on an already full hypervisor, because the scheduler has wrong information about hypervisor usage.

Steps to reproduce
==================
1) Start devstack
2) Create 120 instances
3) Stop some instances
4) Watch the values flicker in nova hypervisor-show:

    nova hypervisor-show e6dfc16b-7914-48fb-a235-6fe3a41bb6db

Expected result
===============
The returned values should stay the same during the test.

Actual result
=============
    while true; do echo -n "$(date) "; echo "select hypervisor_hostname, vcpus_used from compute_nodes where hypervisor_hostname='example.compute.node.com';" | mysql nova_cell1; sleep 0.3; done

    Thu Nov 2 14:50:09 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117
    Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
    Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120

Bad values were stored in the nova DB for about 5 seconds. During this time nova-scheduler could pick this host.

Environment
===========
Devstack master (f974e3c3566f379211d7fdc790d07b5680925584). Releases back to Newton are certainly affected.
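
The mismatch described in this report can be illustrated with a small standalone sketch. It does not use nova code: the `Guest` class and both helper functions are hypothetical stand-ins, and the scenario (120 single-vCPU instances, 3 of them stopped) is an assumption chosen only to match the 120 and 117 values in the output above. It shows how a count that includes only running guests comes out lower than a count over all guests.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    """Hypothetical stand-in for a libvirt guest; not a nova class."""
    name: str
    vcpus: int
    running: bool


def list_guests(guests, only_running=True):
    """Mimic the filtering the report attributes to list_guests().

    With only_running=True (assumed default), stopped guests are skipped.
    """
    return [g for g in guests if g.running or not only_running]


def vcpus_used(guests, only_running=True):
    """Sum the vCPUs of the guests returned by list_guests()."""
    return sum(g.vcpus for g in list_guests(guests, only_running))


# Assumed scenario: 120 single-vCPU instances, the first 3 are stopped.
guests = [Guest(f"vm-{i}", vcpus=1, running=(i >= 3)) for i in range(120)]

running_only = vcpus_used(guests)                    # stopped guests ignored
all_guests = vcpus_used(guests, only_running=False)  # stopped guests counted

print(running_only, all_guests)  # prints "117 120"
```

If the lower figure is written to the database first and corrected only later in the same update cycle, a reader polling in between would see the value dip and recover, as in the output above.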
2017-11-02 14:58:10 Maciej Jozefczyk bug added subscriber ElComandante
2017-11-13 05:23:44 Belmiro Moreira bug added subscriber Belmiro Moreira
2017-11-13 16:18:40 Matt Riedemann tags resource-tracker
2017-11-14 16:01:34 Maciej Jozefczyk nova: assignee Maciej Jozefczyk (maciej.jozefczyk)
2017-11-14 16:01:43 Maciej Jozefczyk nova: status New In Progress
2017-12-22 01:44:56 OpenStack Infra nova: assignee Maciej Jozefczyk (maciej.jozefczyk) Minho Ban (mhban)
2017-12-22 08:06:22 Maciej Jozefczyk nova: assignee Minho Ban (mhban) Maciej Jozefczyk (maciej.jozefczyk)
2018-01-29 23:12:21 Matt Riedemann nova: importance Undecided High
2018-01-29 23:12:26 Matt Riedemann nominated for series nova/ocata
2018-01-29 23:12:26 Matt Riedemann bug task added nova/ocata
2018-01-29 23:12:26 Matt Riedemann nominated for series nova/pike
2018-01-29 23:12:26 Matt Riedemann bug task added nova/pike
2018-08-06 15:51:12 OpenStack Infra nova: assignee Maciej Jozefczyk (maciej.jozefczyk) Eric Fried (efried)
2018-08-06 15:56:33 Eric Fried nova: assignee Eric Fried (efried) Maciej Jozefczyk (maciej.jozefczyk)
2018-08-21 17:36:55 OpenStack Infra nova: status In Progress Fix Released
2018-10-22 08:25:05 Radoslav Gerganov nominated for series nova/queens
2018-10-22 08:25:05 Radoslav Gerganov nominated for series nova/rocky
2018-10-22 09:45:09 OpenStack Infra nova/pike: status New In Progress
2018-10-22 09:45:09 OpenStack Infra nova/pike: assignee Radoslav Gerganov (rgerganov)
2018-11-12 18:17:15 OpenStack Infra tags resource-tracker in-stable-rocky resource-tracker
2019-03-25 05:47:54 OpenStack Infra nova/pike: assignee Radoslav Gerganov (rgerganov) Tony Breeds (o-tony)
2019-03-28 03:33:26 OpenStack Infra tags in-stable-rocky resource-tracker in-stable-queens in-stable-rocky resource-tracker
2019-08-21 16:15:11 Matt Riedemann bug task added nova/queens
2019-08-21 16:15:16 Matt Riedemann bug task added nova/rocky
2019-08-21 16:15:36 Matt Riedemann nova/queens: status New Fix Released
2019-08-21 16:15:47 Matt Riedemann nova/rocky: status New Fix Released
2019-08-21 16:16:00 Matt Riedemann nova/pike: assignee Tony Breeds (o-tony) Radoslav Gerganov (rgerganov)
2019-08-21 16:16:20 Matt Riedemann nova/pike: status In Progress Won't Fix
2019-08-21 16:16:42 Matt Riedemann nova/queens: assignee Radoslav Gerganov (rgerganov)
2019-08-21 16:16:59 Matt Riedemann nova/rocky: assignee Radoslav Gerganov (rgerganov)
2019-08-21 16:17:09 Matt Riedemann bug task deleted nova/ocata
2019-08-21 16:17:28 Matt Riedemann nova/queens: importance Undecided High
2019-08-21 16:17:40 Matt Riedemann nova/rocky: importance Undecided High
2019-08-21 16:17:48 Matt Riedemann nova/pike: importance Undecided High
2021-02-26 15:56:49 Balazs Gibizer nova/pike: status Won't Fix In Progress
2021-02-26 15:56:54 Balazs Gibizer nova/pike: assignee Radoslav Gerganov (rgerganov) Balazs Gibizer (balazs-gibizer)
2022-08-01 11:03:43 OpenStack Infra nova/pike: status In Progress Fix Released