HostState in Scheduler can be incorrect

Bug #1528743 reported by Zhenyu Zheng
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

In nova-scheduler, we now use scheduler/host_manager/update_from_compute_node() to update information about a
host from a ComputeNode object. At the beginning of this function, we have a few lines of code:

https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L162-L164

if (self.updated and compute.updated_at
        and self.updated > compute.updated_at):
    return

Here we skip the update if the local update time is later than the compute node's update time.
This is generally correct, since the compute service has a periodic task to update the information:
https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L6243

but it only updates if the resources have changed:
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L659

This can lead to inconsistency if the scheduler has consumed (updated) the information but
the compute then fails to claim (the periodic task won't update because there are no changes).
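The sequence described above can be reproduced with a toy model. This is only a sketch: the ComputeNode and HostState classes below are simplified stand-ins written for illustration, not nova's real classes, and consume() is a hypothetical helper that models the scheduler deducting resources locally. Only the guard at the top of update_from_compute_node() is taken from the report.

```python
import datetime


class ComputeNode:
    """Hypothetical stand-in for the ComputeNode record (not nova's class)."""

    def __init__(self, free_ram_mb, updated_at):
        self.free_ram_mb = free_ram_mb
        self.updated_at = updated_at


class HostState:
    """Toy scheduler-side view of a host; it only tracks free RAM."""

    def __init__(self):
        self.free_ram_mb = 0
        self.updated = None

    def update_from_compute_node(self, compute):
        # Same guard as quoted in the report: keep the local state
        # whenever it is newer than the compute node record.
        if (self.updated and compute.updated_at
                and self.updated > compute.updated_at):
            return
        self.free_ram_mb = compute.free_ram_mb
        self.updated = compute.updated_at

    def consume(self, ram_mb, now):
        # Hypothetical helper: the scheduler deducts resources locally
        # and stamps its own update time.
        self.free_ram_mb -= ram_mb
        self.updated = now


t0 = datetime.datetime(2016, 1, 1, 12, 0, 0)
node = ComputeNode(free_ram_mb=4096, updated_at=t0)

host = HostState()
host.update_from_compute_node(node)   # scheduler learns about 4096 MB
host.consume(1024, now=t0 + datetime.timedelta(seconds=5))

# The claim fails on the compute side, so the record never changes and
# updated_at stays at t0. The next refresh is skipped by the guard.
host.update_from_compute_node(node)
stale_free_ram_mb = host.free_ram_mb
```

In this model the scheduler keeps reporting 3072 MB free while the compute node record still says 4096 MB, and nothing corrects it until the record changes for some other reason.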

We can add a configurable time limit to the above-mentioned "if" logic, so that if the difference
between the current time and self.updated is larger than the limit, we also update the
information from the ComputeNode object and avoid the inconsistency between the services.
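A minimal sketch of the proposed check, written as a standalone function so it can be run on its own. The function name should_skip_update and the max_age_seconds parameter are assumptions made for illustration; they are not existing nova options, and this is not the code from the proposed review.

```python
import datetime


def should_skip_update(host_updated, compute_updated_at, now, max_age_seconds):
    """Return True if the scheduler should keep its local host state.

    All names here are hypothetical. The first two checks mirror the
    existing guard; the last one adds the proposed time limit.
    """
    if not (host_updated and compute_updated_at):
        return False
    if host_updated <= compute_updated_at:
        return False
    # The local state is newer than the record, but it is only trusted
    # while it is younger than the configured limit.
    age = (now - host_updated).total_seconds()
    return age <= max_age_seconds


t0 = datetime.datetime(2016, 1, 1, 12, 0, 0)
consumed_at = t0 + datetime.timedelta(seconds=5)

# Shortly after the scheduler consumed resources: keep the local state.
fresh = should_skip_update(
    consumed_at, t0, now=t0 + datetime.timedelta(seconds=10),
    max_age_seconds=60)

# Two minutes later with no change on the compute side: refresh anyway.
expired = should_skip_update(
    consumed_at, t0, now=t0 + datetime.timedelta(seconds=120),
    max_age_seconds=60)
```

With a limit like this, a failed claim leaves the scheduler's view wrong for at most the configured interval instead of indefinitely. The trade-off is that a legitimately newer local state is also overwritten once it ages past the limit.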

Tags: scheduler
Changed in nova:
assignee: nobody → Zhenyu Zheng (zhengzhenyu)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/260920

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Michael Still on branch: master
Review: https://review.openstack.org/260920
Reason: This patch is quite old, so I am abandoning it to keep the review queue manageable. Feel free to restore the change if you're still interested in working on it.

Matt Riedemann (mriedem) wrote :

Once we start doing claims in the scheduler this might be obsolete:

https://review.openstack.org/#/c/437424/

tags: added: scheduler
Changed in nova:
assignee: Zhenyu Zheng (zhengzhenyu) → nobody
status: In Progress → Confirmed
Matt Riedemann (mriedem) wrote :

Is this still an issue now that we've been doing resource claims for DISK_GB, VCPU and MEMORY_MB (via the FilterScheduler and placement) since Pike? It could still be an issue for the CachingScheduler, but out of date information is kind of a known limitation when using the CachingScheduler.

Also, I still don't really understand the issue. You're saying the compute node record isn't getting updated because nothing changed, so then why do we need to update the host state information in the scheduler for that host if nothing changed on it?
