Race condition between RT and scheduler

Bug #1798806 reported by Radoslav Gerganov
This bug report is a duplicate of: Bug #1729621: Inconsistent value for vcpu_used.
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: High
Assigned to: Radoslav Gerganov

Bug Description

The HostState object used by the scheduler derives its own values from the 'stats' property of the compute node, e.g.:

    self.stats = compute.stats or {}
    self.num_instances = int(self.stats.get('num_instances', 0))
    self.num_io_ops = int(self.stats.get('io_workload', 0))
    self.failed_builds = int(self.stats.get('failed_builds', 0))

These values are used for both filtering and weighing compute hosts. However, the 'stats' property of the compute node is cleared and then repopulated during the periodic update_available_resource(). The clearing occurs in RT._copy_resources(), which preserves only the old value of 'failed_builds'. This creates a race condition between the resource tracker (RT) and the scheduler: the scheduler may read the compute node after the clear but before the repopulation, so the HostState object may be populated with wrong values for 'num_io_ops' and 'num_instances', leading to incorrect scheduling decisions.
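
To illustrate the window described above, here is a minimal sketch that replays the interleaving deterministically, without threads or a database. It is not nova code: FakeComputeNode, HostStateSketch, clear_stats_keeping_failed_builds() and repopulate_stats() are hypothetical names invented for this example, and the two-step clear/repopulate split is a simplification of what the report attributes to RT._copy_resources() and the periodic update.

```python
class FakeComputeNode:
    """Hypothetical stand-in for a compute node record (not nova's class)."""

    def __init__(self, stats):
        self.stats = stats


class HostStateSketch:
    """Derives scheduler values from 'stats', mirroring the snippet above."""

    def __init__(self, compute):
        self.stats = compute.stats or {}
        self.num_instances = int(self.stats.get('num_instances', 0))
        self.num_io_ops = int(self.stats.get('io_workload', 0))
        self.failed_builds = int(self.stats.get('failed_builds', 0))


def clear_stats_keeping_failed_builds(compute):
    """Assumed step 1 of the periodic update: reset stats, keep failed_builds."""
    compute.stats = {'failed_builds': compute.stats.get('failed_builds', 0)}


def repopulate_stats(compute, num_instances, io_workload):
    """Assumed step 2 of the periodic update: fill the other values back in."""
    compute.stats['num_instances'] = num_instances
    compute.stats['io_workload'] = io_workload


node = FakeComputeNode(
    {'num_instances': 5, 'io_workload': 2, 'failed_builds': 1})

before = HostStateSketch(node)           # scheduler reads before the update
clear_stats_keeping_failed_builds(node)  # periodic update starts
during = HostStateSketch(node)           # scheduler reads inside the window
repopulate_stats(node, 5, 2)             # periodic update finishes
after = HostStateSketch(node)            # scheduler reads after the update

for label, state in (('before', before), ('during', during), ('after', after)):
    print(label, state.num_instances, state.num_io_ops, state.failed_builds)
```

In this sketch the read taken inside the window sees a host with no instances and no I/O workload, while 'failed_builds' survives, which is the kind of inconsistent view that could skew filtering and weighing.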

Tags: scheduler
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/611852

Changed in nova:
status: New → In Progress
Radoslav Gerganov (rgerganov) wrote :

I just found that this problem is already fixed in the master branch as part of bug #1729621. However, the fix has not been backported to the stable releases.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Radoslav Gerganov (<email address hidden>) on branch: master
Review: https://review.openstack.org/611852
Reason: https://review.openstack.org/#/c/520024/ is a better fix for this
