2017-11-02 14:54:05 |
Maciej Jozefczyk |
bug |
|
|
added bug |
2017-11-02 14:56:34 |
Maciej Jozefczyk |
description |
Description
===========
Nova updates hypervisor resources using the function ./nova/compute/resource_tracker.py:update_available_resource().
When *shut down* instances are present, this can lead to inconsistent values for resources such as vcpus_used.
Resources are taken from the function self.driver.get_available_resource():
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L617
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5766
This function calculates the allocated vCPUs using the function _get_vcpu_total().
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5352
As we can see, the _get_vcpu_total() function calls *self._host.list_guests()* without the "only_running=False" parameter, so it does not account for shut down instances.
At the end of the resource update process, the function _update_available_resource() is called:
> /opt/stack/nova/nova/compute/resource_tracker.py(733)
@utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
def _update_available_resource(self, context, resources):

    # initialize the compute node object, creating it
    # if it does not already exist.
    self._init_compute_node(context, resources)
This initializes the compute node object with resources that were calculated without the shut down instances. If the compute node object already exists, it *UPDATES* its fields - *for a while nova-api reports resource values that differ from the real ones.*
    # update the compute_node
    self._update(context, cn)
The inconsistency is fixed automatically when later code runs:
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L709
But on heavily loaded hypervisors (for example 100 active instances and 30 shut down instances) wrong information is stored in the nova database for about 4-5 seconds (in my use case). This can cause other issues, such as spawning instances on an already full hypervisor, because the scheduler has wrong information about hypervisor usage.
Steps to reproduce
==================
1) Start devstack
2) Create 120 instances
3) Stop some instances
4) Watch the values flapping in nova hypervisor-show:
nova hypervisor-show e6dfc16b-7914-48fb-a235-6fe3a41bb6db
Expected result
===============
The returned values should stay the same throughout the test.
Actual result
=============
while true; do echo -n "$(date) "; echo "select hypervisor_hostname, vcpus_used from compute_nodes where hypervisor_hostname='example.compute.node.com';" | mysql nova_cell1; sleep 0.3; done
Thu Nov 2 14:50:09 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Bad values were stored for about 5 seconds. During this time nova-scheduler could select this host.
Environment
===========
Devstack master (f974e3c3566f379211d7fdc790d07b5680925584).
Releases down to Newton are certainly impacted. |
Description
===========
Nova updates hypervisor resources using the function ./nova/compute/resource_tracker.py:update_available_resource().
When *shut down* instances are present, this can lead to inconsistent values for resources such as vcpus_used.
Resources are taken from the function self.driver.get_available_resource():
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L617
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5766
This function calculates the allocated vCPUs using the function _get_vcpu_total().
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5352
As we can see, the _get_vcpu_total() function calls *self._host.list_guests()* without the "only_running=False" parameter, so it does not account for shut down instances.
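The effect of the default filter can be illustrated with a minimal, self-contained sketch. It does not use nova or libvirt: the Guest dataclass and both helper functions are hypothetical stand-ins, the guest counts are illustrative, and only the only_running flag name is taken from the call discussed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Guest:
    """Hypothetical stand-in for a hypervisor guest; not a nova class."""
    name: str
    vcpus: int
    running: bool


def list_guests(guests: List[Guest], only_running: bool = True) -> List[Guest]:
    """Mimic a listing call that hides stopped guests by default."""
    if only_running:
        return [g for g in guests if g.running]
    return list(guests)


def vcpus_used(guests: List[Guest], only_running: bool = True) -> int:
    """Sum the vCPUs of whatever guests the listing call returns."""
    return sum(g.vcpus for g in list_guests(guests, only_running))


# Illustrative host: 120 guests with one vCPU each, the first 3 stopped.
guests = [Guest(f"vm-{i}", 1, running=(i >= 3)) for i in range(120)]

running_only = vcpus_used(guests)                    # default filter applied
all_guests = vcpus_used(guests, only_running=False)  # stopped guests included
print(running_only, all_guests)
```

With the default filter the stopped guests simply drop out of the sum, which is the kind of gap the report describes between the driver's number and the usage that should be tracked.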
At the end of the resource update process, the function _update_available_resource() is called:
> /opt/stack/nova/nova/compute/resource_tracker.py(733)
@utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
def _update_available_resource(self, context, resources):

    # initialize the compute node object, creating it
    # if it does not already exist.
    self._init_compute_node(context, resources)
This initializes the compute node object with resources that were calculated without the shut down instances. If the compute node object already exists, it *UPDATES* its fields - *for a while nova-api reports resource values that differ from the real ones.*
    # update the compute_node
    self._update(context, cn)
The inconsistency is fixed automatically when later code runs:
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L709
But on heavily loaded hypervisors (for example 100 active instances and 30 shut down instances) wrong information is stored in the nova database for about 4-5 seconds (in my use case). This can cause other issues, such as spawning instances on an already full hypervisor, because the scheduler has wrong information about hypervisor usage.
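The sequence described above can be modelled as a two-step write, where a reader sampling between the steps sees the lower value. This is a simplified sketch and not nova code: the record dictionary, the function name, and the observe callback are assumptions made for illustration, and the numbers are the ones seen in the output further down.

```python
from typing import Callable, Dict, List


def update_available_resource(
    record: Dict[str, int],
    driver_vcpus_used: int,
    instance_vcpus: List[int],
    observe: Callable[[int], None],
) -> None:
    """Hypothetical model of the two-step update described in this report."""
    # Step 1: the stored value is overwritten with the driver's number,
    # which only counts running guests.
    record["vcpus_used"] = driver_vcpus_used
    observe(record["vcpus_used"])  # what a concurrent reader could see here

    # Step 2: usage is recomputed from all tracked instances,
    # including the stopped ones, and stored again.
    record["vcpus_used"] = sum(instance_vcpus)
    observe(record["vcpus_used"])


record = {"vcpus_used": 120}
seen: List[int] = []
update_available_resource(
    record,
    driver_vcpus_used=117,
    instance_vcpus=[1] * 120,
    observe=seen.append,
)
print(seen)
```

The longer step 2 takes, for example on a host with many instances, the longer the intermediate value stays visible to other services.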
Steps to reproduce
==================
1) Start devstack
2) Create 120 instances
3) Stop some instances
4) Watch the values flapping in nova hypervisor-show:
nova hypervisor-show e6dfc16b-7914-48fb-a235-6fe3a41bb6db
Expected result
===============
The returned values should stay the same throughout the test.
Actual result
=============
while true; do echo -n "$(date) "; echo "select hypervisor_hostname, vcpus_used from compute_nodes where hypervisor_hostname='example.compute.node.com';" | mysql nova_cell1; sleep 0.3; done
Thu Nov 2 14:50:09 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Bad values were stored in the nova DB for about 5 seconds. During this time nova-scheduler could select this host.
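The length of that window can be read off the samples directly. The sketch below is a hypothetical helper, not part of nova; it takes four of the output lines above (the samples on either side of each transition) and measures the time between the first and last sample that differ from the expected value.

```python
from datetime import datetime
from typing import List, Optional

# Boundary samples copied from the output above.
SAMPLES = [
    "Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120",
    "Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117",
    "Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117",
    "Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120",
]


def deviation_window_seconds(lines: List[str], expected: int) -> Optional[float]:
    """Seconds between the first and last sample that differ from expected."""
    times = []
    for line in lines:
        fields = line.split()
        value = int(fields[-1])  # last column is vcpus_used
        if value != expected:
            # fields[3] is the HH:MM:SS part of the `date` output
            times.append(datetime.strptime(fields[3], "%H:%M:%S"))
    if not times:
        return None
    return (max(times) - min(times)).total_seconds()


window = deviation_window_seconds(SAMPLES, expected=120)
print(window)
```

The helper assumes all samples fall on the same day, which holds for the run shown here.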
Environment
===========
Devstack master (f974e3c3566f379211d7fdc790d07b5680925584).
Releases down to Newton are certainly impacted. |
|
2017-11-02 14:58:10 |
Maciej Jozefczyk |
bug |
|
|
added subscriber ElComandante |
2017-11-13 05:23:44 |
Belmiro Moreira |
bug |
|
|
added subscriber Belmiro Moreira |
2017-11-13 16:18:40 |
Matt Riedemann |
tags |
|
resource-tracker |
|
2017-11-14 16:01:34 |
Maciej Jozefczyk |
nova: assignee |
|
Maciej Jozefczyk (maciej.jozefczyk) |
|
2017-11-14 16:01:43 |
Maciej Jozefczyk |
nova: status |
New |
In Progress |
|
2017-12-22 01:44:56 |
OpenStack Infra |
nova: assignee |
Maciej Jozefczyk (maciej.jozefczyk) |
Minho Ban (mhban) |
|
2017-12-22 08:06:22 |
Maciej Jozefczyk |
nova: assignee |
Minho Ban (mhban) |
Maciej Jozefczyk (maciej.jozefczyk) |
|
2018-01-29 23:12:21 |
Matt Riedemann |
nova: importance |
Undecided |
High |
|
2018-01-29 23:12:26 |
Matt Riedemann |
nominated for series |
|
nova/ocata |
|
2018-01-29 23:12:26 |
Matt Riedemann |
bug task added |
|
nova/ocata |
|
2018-01-29 23:12:26 |
Matt Riedemann |
nominated for series |
|
nova/pike |
|
2018-01-29 23:12:26 |
Matt Riedemann |
bug task added |
|
nova/pike |
|
2018-08-06 15:51:12 |
OpenStack Infra |
nova: assignee |
Maciej Jozefczyk (maciej.jozefczyk) |
Eric Fried (efried) |
|
2018-08-06 15:56:33 |
Eric Fried |
nova: assignee |
Eric Fried (efried) |
Maciej Jozefczyk (maciej.jozefczyk) |
|
2018-08-21 17:36:55 |
OpenStack Infra |
nova: status |
In Progress |
Fix Released |
|
2018-10-22 08:25:05 |
Radoslav Gerganov |
nominated for series |
|
nova/queens |
|
2018-10-22 08:25:05 |
Radoslav Gerganov |
nominated for series |
|
nova/rocky |
|
2018-10-22 09:45:09 |
OpenStack Infra |
nova/pike: status |
New |
In Progress |
|
2018-10-22 09:45:09 |
OpenStack Infra |
nova/pike: assignee |
|
Radoslav Gerganov (rgerganov) |
|
2018-11-12 18:17:15 |
OpenStack Infra |
tags |
resource-tracker |
in-stable-rocky resource-tracker |
|
2019-03-25 05:47:54 |
OpenStack Infra |
nova/pike: assignee |
Radoslav Gerganov (rgerganov) |
Tony Breeds (o-tony) |
|
2019-03-28 03:33:26 |
OpenStack Infra |
tags |
in-stable-rocky resource-tracker |
in-stable-queens in-stable-rocky resource-tracker |
|
2019-08-21 16:15:11 |
Matt Riedemann |
bug task added |
|
nova/queens |
|
2019-08-21 16:15:16 |
Matt Riedemann |
bug task added |
|
nova/rocky |
|
2019-08-21 16:15:36 |
Matt Riedemann |
nova/queens: status |
New |
Fix Released |
|
2019-08-21 16:15:47 |
Matt Riedemann |
nova/rocky: status |
New |
Fix Released |
|
2019-08-21 16:16:00 |
Matt Riedemann |
nova/pike: assignee |
Tony Breeds (o-tony) |
Radoslav Gerganov (rgerganov) |
|
2019-08-21 16:16:20 |
Matt Riedemann |
nova/pike: status |
In Progress |
Won't Fix |
|
2019-08-21 16:16:42 |
Matt Riedemann |
nova/queens: assignee |
|
Radoslav Gerganov (rgerganov) |
|
2019-08-21 16:16:59 |
Matt Riedemann |
nova/rocky: assignee |
|
Radoslav Gerganov (rgerganov) |
|
2019-08-21 16:17:09 |
Matt Riedemann |
bug task deleted |
nova/ocata |
|
|
2019-08-21 16:17:28 |
Matt Riedemann |
nova/queens: importance |
Undecided |
High |
|
2019-08-21 16:17:40 |
Matt Riedemann |
nova/rocky: importance |
Undecided |
High |
|
2019-08-21 16:17:48 |
Matt Riedemann |
nova/pike: importance |
Undecided |
High |
|
2021-02-26 15:56:49 |
Balazs Gibizer |
nova/pike: status |
Won't Fix |
In Progress |
|
2021-02-26 15:56:54 |
Balazs Gibizer |
nova/pike: assignee |
Radoslav Gerganov (rgerganov) |
Balazs Gibizer (balazs-gibizer) |
|
2022-08-01 11:03:43 |
OpenStack Infra |
nova/pike: status |
In Progress |
Fix Released |
|