Inconsistent value for vcpu_used

Bug #1729621 reported by Maciej Jozefczyk on 2017-11-02
This bug affects 6 people
Affects                    Importance  Assigned to
OpenStack Compute (nova)   High        Maciej Jozefczyk
  Pike                     High        Radoslav Gerganov
  Queens                   High        Radoslav Gerganov
  Rocky                    High        Radoslav Gerganov

Bug Description

Description
===========

Nova updates hypervisor resources using the function update_available_resource() in nova/compute/resource_tracker.py.

In the case of shut-down instances this can produce inconsistent values for resources such as vcpu_used.

The resources are taken from self.driver.get_available_resource():
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L617
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5766

This function calculates allocated vCPUs via _get_vcpu_total():
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/virt/libvirt/driver.py#L5352

As we can see, _get_vcpu_total() calls *self._host.list_guests()* without the "only_running=False" parameter, so it does not take shut-down instances into account.
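The effect of that default can be illustrated with a minimal sketch (this is not nova code; the Guest/Host classes below are stand-ins that only mimic the only_running behaviour of list_guests()):

```python
# Minimal sketch of how a default only_running=True listing
# undercounts the vCPUs of shut-off guests.
RUNNING, SHUTOFF = "running", "shutoff"

class Guest:
    def __init__(self, vcpus, state):
        self.vcpus = vcpus
        self.state = state

class Host:
    def __init__(self, guests):
        self._guests = guests

    def list_guests(self, only_running=True):
        # Mirrors the default that hides shut-off domains.
        if only_running:
            return [g for g in self._guests if g.state == RUNNING]
        return list(self._guests)

host = Host([Guest(4, RUNNING), Guest(2, SHUTOFF)])

# Counting over the default listing misses the shut-off guest:
print(sum(g.vcpus for g in host.list_guests()))                    # 4
# Passing only_running=False yields the stable, expected total:
print(sum(g.vcpus for g in host.list_guests(only_running=False)))  # 6
```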

At the end of the resource update process the function _update_available_resource() is called:
> /opt/stack/nova/nova/compute/resource_tracker.py(733)

 677 @utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
 678 def _update_available_resource(self, context, resources):
 679
 681 # initialize the compute node object, creating it
 682 # if it does not already exist.
 683 self._init_compute_node(context, resources)

It initializes the compute node object with resources calculated without shut-down instances. If the compute node object already exists, it *UPDATES* its fields - *so for a short time nova-api reports different resource values than are actually in use.*

 731 # update the compute_node
 732 self._update(context, cn)
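The two writes can be sketched as follows (a hypothetical model, not nova code, although the method names mirror the resource tracker; the 117/120 values are from the reproduction below):

```python
# Sketch of the two DB writes per update_available_resource() run.
# The first write uses driver-reported resources that exclude
# shut-down guests; only the second write adds them back.
class FakeDB:
    def __init__(self):
        self.vcpus_used = None

class Tracker:
    def __init__(self, db):
        self.db = db

    def _update(self, vcpus_used):
        self.db.vcpus_used = vcpus_used      # persisted immediately

    def _init_compute_node(self, driver_vcpus_used):
        # First write: running guests only (117 in this bug report).
        self._update(driver_vcpus_used)

    def update_available_resource(self, running, shut_down):
        self._init_compute_node(running)
        # ... instance claims are re-applied here, which can take
        # seconds on a loaded hypervisor; the DB is stale meanwhile ...
        self._update(running + shut_down)    # final, correct write

db = FakeDB()
t = Tracker(db)
t.update_available_resource(running=117, shut_down=3)
print(db.vcpus_used)  # 120 after the run; 117 during the window
```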

The inconsistency is automatically fixed later in the same code path:
https://github.com/openstack/nova/blob/f974e3c3566f379211d7fdc790d07b5680925584/nova/compute/resource_tracker.py#L709

But on heavily loaded hypervisors (e.g. 100 active instances and 30 shut-down instances) this leaves wrong information in the nova database for about 4-5 seconds (in my use case). That can trigger other issues, such as spawning on an already full hypervisor, because the scheduler has wrong information about hypervisor usage.

Steps to reproduce
==================

1) Start devstack
2) Create 120 instances
3) Stop some instances
4) Watch the values blink in nova hypervisor-show:
nova hypervisor-show e6dfc16b-7914-48fb-a235-6fe3a41bb6db

Expected result
===============
The returned values should remain stable for the duration of the test.

Actual result
=============
while true; do echo -n "$(date) "; echo "select hypervisor_hostname, vcpus_used from compute_nodes where hypervisor_hostname='example.compute.node.com';" | mysql nova_cell1; sleep 0.3; done

Thu Nov 2 14:50:09 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:10 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:11 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:12 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:13 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:14 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:15 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:16 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 117
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:17 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:18 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120
Thu Nov 2 14:50:19 UTC 2017 example.compute.node.com 120

Wrong values were stored in the nova DB for about 5 seconds. During this time nova-scheduler could pick this host.

Environment
===========
Devstack master (f974e3c3566f379211d7fdc790d07b5680925584).
Releases at least as far back as Newton are certainly affected.

description: updated

I see the following possible solutions:

1. Change self._init_compute_node() in _update_available_resource() so that it does not call self._update(), perhaps by introducing a new boolean parameter in the _init_compute_node() args to skip the self._update() call.

2. Add some kind of DB transaction (I don't think this is a good idea).

3. Modify the calls to self._host.list_guests() to list all instances (including shut-down ones) - but this would almost certainly break other things.

4. Reorganize the code (?)
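Option 1 could look roughly like this. The update_db flag is hypothetical (no such parameter exists in nova); the idea is that _init_compute_node() still builds the ComputeNode object but defers the DB write until the resources also account for shut-down instances:

```python
# Hedged sketch of option 1: skip the intermediate DB write.
class Tracker:
    def __init__(self):
        self.db_writes = []          # stand-in for the compute_nodes table

    def _update(self, resources):
        self.db_writes.append(dict(resources))

    def _init_compute_node(self, resources, update_db=True):
        # Hypothetical flag: build the ComputeNode object, but only
        # write it to the DB when explicitly asked to.
        self.cn = dict(resources)
        if update_db:
            self._update(self.cn)

    def _update_available_resource(self, resources, shut_down_vcpus):
        # Skip the intermediate write that causes the blinking values.
        self._init_compute_node(resources, update_db=False)
        # ... instance claims re-applied here, adding shut-down guests ...
        self.cn["vcpus_used"] += shut_down_vcpus
        self._update(self.cn)        # single, consistent write

t = Tracker()
t._update_available_resource({"vcpus_used": 117}, shut_down_vcpus=3)
print(t.db_writes)  # [{'vcpus_used': 120}] -- only the final value hits the DB
```

With this shape the compute_nodes row is only ever written once per periodic run, so the scheduler never observes the intermediate running-only total.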

Matt Riedemann (mriedem) on 2017-11-13
tags: added: resource-tracker
Changed in nova:
assignee: nobody → Maciej Jozefczyk (maciej.jozefczyk)
status: New → In Progress
Jay Pipes (jaypipes) wrote :

Please note that Pike and Ocata schedulers are not affected by this issue, since starting in Ocata, we stopped using the ComputeNode.vcpus_used value from the CoreFilter (which was deprecated/removed in Ocata) and instead use the (accurate) information from the placement API service about allocated VCPU resources for instances. Placement doesn't know or care whether an instance is shut down -- only whether the instance is "on" the host.

Matt Riedemann (mriedem) wrote :

To be clear, the CoreFilter isn't enabled by default, but Ram and Disk filters are.

The CoreFilter was not deprecated in Ocata:

https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/filters/core_filter.py

In fact it's still not deprecated:

https://github.com/openstack/nova/blob/master/nova/scheduler/filters/core_filter.py

Because the CachingScheduler will need to rely on it since the CachingScheduler isn't using Placement, and while the CachingScheduler itself was deprecated in Pike:

https://github.com/openstack/nova/commit/d48bba18a7cebc57e63f5b2c5a1e939654de0883

We can't really remove it until we have a migration path for people using the CachingScheduler to move over to the FilterScheduler and populate Placement with the allocations that the FilterScheduler would have been creating in Pike (remember that once all computes are upgraded to Pike+, the ResourceTracker in nova-compute stops reporting allocations to Placement).

Changed in nova:
assignee: Maciej Jozefczyk (maciej.jozefczyk) → Minho Ban (mhban)

Working on patch

Changed in nova:
assignee: Minho Ban (mhban) → Maciej Jozefczyk (maciej.jozefczyk)
Matt Riedemann (mriedem) wrote :

There are more details in duplicate bug 1739349.

Changed in nova:
importance: Undecided → High

Change abandoned by Maciej Jozefczyk (<email address hidden>) on branch: master
Review: https://review.openstack.org/532924

Changed in nova:
assignee: Maciej Jozefczyk (maciej.jozefczyk) → Eric Fried (efried)
Eric Fried (efried) on 2018-08-06
Changed in nova:
assignee: Eric Fried (efried) → Maciej Jozefczyk (maciej.jozefczyk)

Reviewed: https://review.openstack.org/520024
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c9b74bcfa09d11c2046ce1bfb6dd8463b3a2f3b0
Submitter: Zuul
Branch: master

commit c9b74bcfa09d11c2046ce1bfb6dd8463b3a2f3b0
Author: Maciej Józefczyk <email address hidden>
Date: Thu Nov 16 14:49:42 2017 +0100

    Update resources once in update_available_resource

    This change ensures that resources are updated only once per
    update_available_resource() call.

    Compute resources were previously updated during host
    object initialization and at the end of
    update_available_resource(). It could cause inconsistencies
    in resource tracking between compute host and DB for couple
    of second when final _update() at the end of
    update_available_resource() is being called.

    For example: nova-api shows that host uses 10GB of RAM, but
    in fact its 12GB because DB doesn't have resources that belongs
    to shutdown instance.

    Because of that fact nova-scheduler (CachingScheduler) could
    choose (based on imcomplete information) host which is already full.

    For more informations please see realted bug: #1729621

    Change-Id: I120a98cc4c11772f24099081ef3ac44a50daf71d
    Closes-Bug: #1729621

Changed in nova:
status: In Progress → Fix Released
Radoslav Gerganov (rgerganov) wrote :

This bug affects *all* stable releases because there is a race not only on vcpus_used but also on the compute stats which are used by the scheduler. See bug #1798806 for more details.

I will backport the fix to the stable releases.

Reviewed: https://review.openstack.org/612293
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=732b0571cc27a8a1aba30f44c18317f13325aad3
Submitter: Zuul
Branch: stable/rocky

commit 732b0571cc27a8a1aba30f44c18317f13325aad3
Author: Maciej Józefczyk <email address hidden>
Date: Thu Nov 16 14:49:42 2017 +0100

    Update resources once in update_available_resource

    This change ensures that resources are updated only once per
    update_available_resource() call.

    Compute resources were previously updated during host
    object initialization and at the end of
    update_available_resource(). It could cause inconsistencies
    in resource tracking between compute host and DB for couple
    of second when final _update() at the end of
    update_available_resource() is being called.

    For example: nova-api shows that host uses 10GB of RAM, but
    in fact its 12GB because DB doesn't have resources that belongs
    to shutdown instance.

    Because of that fact nova-scheduler (CachingScheduler) could
    choose (based on imcomplete information) host which is already full.

    For more informations please see realted bug: #1729621

    Change-Id: I120a98cc4c11772f24099081ef3ac44a50daf71d
    Closes-Bug: #1729621
    (cherry picked from commit c9b74bcfa09d11c2046ce1bfb6dd8463b3a2f3b0)

tags: added: in-stable-rocky

This issue was fixed in the openstack/nova 18.1.0 release.

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

Reviewed: https://review.openstack.org/612294
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=36d93675d9a6bf903ed64c216243c74a639a2087
Submitter: Zuul
Branch: stable/queens

commit 36d93675d9a6bf903ed64c216243c74a639a2087
Author: Maciej Józefczyk <email address hidden>
Date: Thu Nov 16 14:49:42 2017 +0100

    Update resources once in update_available_resource

    This change ensures that resources are updated only once per
    update_available_resource() call.

    Compute resources were previously updated during host
    object initialization and at the end of
    update_available_resource(). It could cause inconsistencies
    in resource tracking between compute host and DB for couple
    of second when final _update() at the end of
    update_available_resource() is being called.

    For example: nova-api shows that host uses 10GB of RAM, but
    in fact its 12GB because DB doesn't have resources that belongs
    to shutdown instance.

    Because of that fact nova-scheduler (CachingScheduler) could
    choose (based on imcomplete information) host which is already full.

    For more informations please see realted bug: #1729621

    Change-Id: I120a98cc4c11772f24099081ef3ac44a50daf71d
    Closes-Bug: #1729621
    (cherry picked from commit c9b74bcfa09d11c2046ce1bfb6dd8463b3a2f3b0)

tags: added: in-stable-queens

Change abandoned by Radoslav Gerganov (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/612295
Reason: fira

This issue was fixed in the openstack/nova 17.0.11 release.

Matt Riedemann (mriedem) on 2019-08-21
no longer affects: nova/ocata