scheduler: re-calculate NUMA on consume_from_instance
This patch narrows down the race window between the filter running and
the consumption of resources from the instance after the host has been
chosen.
It does so by re-calculating the fitted NUMA topology just before consuming it
from the chosen host. Thus we avoid any locking, but also make sure that
the host_state is kept as up to date as possible for concurrent
requests, as there is no opportunity for switching threads inside a
consume_from_instance.
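The just-in-time re-fit described above can be sketched roughly as follows. This is a hypothetical, simplified illustration, not Nova's actual code: the names `HostState.consume_from_instance` and `fit_numa_topology`, and the dict-of-cells representation, are stand-ins for Nova's real host state and NUMA-fitting logic.

```python
# Minimal sketch of re-fitting NUMA resources at consume time.
# All names here (HostState, fit_numa_topology) are illustrative
# stand-ins, not Nova's real classes or functions.

class HostState:
    """Tracks free CPUs per NUMA cell, e.g. {cell_id: free_vcpus}."""

    def __init__(self, cells):
        self.numa_topology = dict(cells)

    def consume_from_instance(self, instance):
        # Re-calculate the fit here rather than reusing the result
        # computed during filtering: other requests may have changed
        # this host's state since the filter ran. Because nothing in
        # this method yields, no other greenthread can interleave
        # between the fit and the consumption, so no lock is needed.
        fitted = fit_numa_topology(self.numa_topology, instance)
        if fitted is None:
            raise ValueError("instance no longer fits on this host")
        for cell_id, cpus in fitted.items():
            self.numa_topology[cell_id] -= cpus


def fit_numa_topology(host_cells, instance):
    """Greedily place each requested per-cell vCPU count on a host cell.

    Returns {cell_id: vcpus_taken} on success, or None if the
    instance's requested topology can no longer be satisfied.
    """
    remaining = dict(host_cells)
    placement = {}
    for want in instance["cells"]:
        for cell_id, free in remaining.items():
            if free >= want:
                remaining[cell_id] = free - want
                placement[cell_id] = placement.get(cell_id, 0) + want
                break
        else:
            return None
    return placement
```

The key point is the window between `fit_numa_topology` and the subtraction: a filter's earlier answer may be stale, but a fit computed inside the non-yielding `consume_from_instance` cannot be invalidated before it is applied.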
Several things worth noting:
* The scheduler being lock free (and thus racy) does not affect
resources other than PCI and NUMA topology this badly - this is due
to the complexity of said resources. In order for scheduler decisions
not to be based on guessing, in the case of those two we will likely
need to introduce either locking or special heuristics.
* There is a lot of repeated code between the 'consume_from_instance'
method and the actual filters. This situation should really be fixed but
is out of scope for this bug fix (which is about preventing valid
requests failing because of races in the scheduler).
Reviewed: https://review.openstack.org/169245
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d6b3156a6c89ddff9b149452df34c4b32c50b6c3
Submitter: Jenkins
Branch: master
commit d6b3156a6c89ddff9b149452df34c4b32c50b6c3
Author: Nikola Dipanov <email address hidden>
Date: Tue Apr 7 20:53:32 2015 +0100
scheduler: re-calculate NUMA on consume_from_instance
This patch narrows down the race window between the filter running and
the consumption of resources from the instance after the host has been
chosen.
It does so by re-calculating the fitted NUMA topology just before consuming it
from the chosen host. Thus we avoid any locking, but also make sure that
the host_state is kept as up to date as possible for concurrent
requests, as there is no opportunity for switching threads inside a
consume_from_instance.
Several things worth noting:
* The scheduler being lock free (and thus racy) does not affect
resources other than PCI and NUMA topology this badly - this is due
to the complexity of said resources. In order for scheduler decisions
not to be based on guessing, in the case of those two we will likely
need to introduce either locking or special heuristics.
* There is a lot of repeated code between the 'consume_from_instance'
method and the actual filters. This situation should really be fixed but
is out of scope for this bug fix (which is about preventing valid
requests failing because of races in the scheduler).
Change-Id: If0c7ad20506c9dddf4dec1eb64c9d6dd4fb75633
Closes-bug: #1438238