commit bf65fdd59ea7d12f7420fcfe2602a5f00cc61055
Author: melanie witt <email address hidden>
Date: Sat Jan 13 21:49:54 2018 +0000

Stop globally caching host states in scheduler HostManager

Currently, in the scheduler HostManager, we cache host states in
a map global to all requests. This used to be okay because we were
always querying the entire compute node list for every request to
pass on to filtering. So we cached the host states globally and
updated them per request and removed "dead nodes" from the cache
(compute nodes still in the cache that were not returned from
ComputeNodeList.get_all).
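
To make the failure mode concrete, here is a minimal Python sketch of that shared-cache pattern. It is illustrative only, not nova's actual code: HostManager, HostState, ComputeNode, and get_host_states below are toy stand-ins for the real objects (the real method is get_host_states_by_uuids).

import collections

# Toy stand-ins for nova's objects; names are illustrative only.
ComputeNode = collections.namedtuple('ComputeNode',
                                     ['host', 'hypervisor_hostname'])

class HostState(object):
    """Minimal stand-in for nova's HostState."""
    def __init__(self, node):
        self.node = node

    def update(self, node):
        self.node = node

class HostManager(object):
    def __init__(self):
        # One map shared by every scheduling request in this process.
        self.host_state_map = {}

    def get_host_states(self, compute_nodes):
        seen = set()
        for node in compute_nodes:
            key = (node.host, node.hypervisor_hostname)
            seen.add(key)
            if key in self.host_state_map:
                self.host_state_map[key].update(node)
            else:
                self.host_state_map[key] = HostState(node)
        # Prune "dead nodes": anything cached that this request's
        # query did not return.
        for dead in set(self.host_state_map) - seen:
            del self.host_state_map[dead]
        # Lazily hand out states from the *shared* map.
        return (self.host_state_map[key] for key in seen
                if key in self.host_state_map)
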
As of Ocata, we started filtering our ComputeNodeList query based on
an answer from placement about which resource providers could satisfy
the request, instead of querying the entire compute node list every
time. This is much more efficient (don't consider compute nodes that
can't possibly fulfill the request) BUT it doesn't play well with the
global host state cache. We started seeing "Removing dead compute node"
messages in the logs, signaling removal of compute nodes from the
global cache when compute nodes were actually available.

If request A comes in and all compute nodes can satisfy it, and
request B arrives concurrently with a request that no compute node
can satisfy, request B will remove all the compute nodes from the
global host state cache. Request A will then get "no valid hosts" at
the filtering stage, because get_host_states_by_uuids returns a
generator that hands out hosts from the global host state cache.
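
Continuing the same toy sketch above (not nova's real code paths), the interleaving can be replayed directly; request A's generator is consumed only after request B's empty query has pruned the shared map:

mgr = HostManager()
all_nodes = [ComputeNode('host1', 'node1'),
             ComputeNode('host2', 'node2')]

# Request A: placement says every compute node can satisfy it.
gen_a = mgr.get_host_states(all_nodes)  # generator; nothing consumed yet

# Request B arrives concurrently: placement returns no matching
# providers, so B's (empty) query prunes everything as "dead".
list(mgr.get_host_states([]))

# Request A finally runs its filters: the shared map is now empty,
# so filtering sees zero hosts -> "no valid hosts".
assert list(gen_a) == []
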
This removes the global host state cache from the scheduler HostManager
and instead generates a fresh host state map per request and uses that
to return hosts from the generator. Because we're filtering the
ComputeNodeList based on a placement query per request, each request
can have a completely different set of compute nodes that can fulfill
it, so we're not gaining much by caching host states anyway.
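
In terms of the same toy model, the fix builds the host state map locally per request, so the generator closes over request-local state and one request's result can never be pruned by another. Again a sketch under the assumptions above, not the actual patch:

class HostManagerFixed(HostManager):
    def get_host_states(self, compute_nodes):
        # Fresh map per request; no cross-request sharing, and no
        # "dead node" pruning needed, since the map only ever holds
        # this request's placement-filtered nodes.
        host_state_map = {}
        for node in compute_nodes:
            key = (node.host, node.hypervisor_hostname)
            host_state_map[key] = HostState(node)
        return iter(host_state_map.values())

# Replaying the interleaving from the earlier sketch: request B's
# empty query no longer disturbs request A's generator.
mgr = HostManagerFixed()
gen_a = mgr.get_host_states(all_nodes)
list(mgr.get_host_states([]))
assert len(list(gen_a)) == 2  # request A is unaffected
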
Co-Authored-By: Dan Smith <email address hidden>
Closes-Bug: #1742827
Related-Bug: #1739323
Change-Id: I40c17ed88f50ecbdedc4daf368fff10e90e7be11
(cherry picked from commit c98ac6adc561d70d34c724703a437b8435e6ddfa)

Reviewed: https://review.openstack.org/539005
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bf65fdd59ea7d12f7420fcfe2602a5f00cc61055
Submitter: Zuul
Branch: stable/pike