2018-02-02 04:36:09 |
melanie witt |
bug |
|
|
added bug |
2018-02-02 05:15:17 |
melanie witt |
description |
I happened upon this while hacking on my WIP CellDatabases fixture patch. Some of the nova/tests/functional/test_server_group.py tests started failing with multiple cells, and I found that it's because there's a database query, objects.InstanceList.get_by_filters, for all instances that are members of the server group every time the ServerGroup[Anti|]AffinityFilter runs. The query for instances doesn't check all cells, so it fails to return any hosts that group members are currently on.
This makes the ServerGroup[Anti|]AffinityFilter a no-op for multiple cells. Affinity *is*, however, ultimately checked via the late-affinity check in compute, so affinity is not totally broken for multiple cells.
Aside from that, I would expect the database query to noticeably degrade scheduling performance if the ServerGroup[Anti|]AffinityFilter is in enabled_filters, in both the single-cell and multi-cell cases.
To fix this, I expect we'll need to pre-load RequestSpec.instance_group.hosts before we schedule each instance -- and make sure we query all cells for the instances. I'm not sure what special consideration we might need for multi-create.
This is the code that lazy-loads instance_group.hosts, which in turn calls InstanceGroup.get_hosts, which calls InstanceList.get_by_filters without targeting any cells:
nova/scheduler/filters/affinity_filter.py:
    group_hosts = (spec_obj.instance_group.hosts
                   if spec_obj.instance_group else [])
nova/objects/instance_group.py:
    def obj_load_attr(self, attrname):
        ...
        self.hosts = self.get_hosts()
        self.obj_reset_changes(['hosts'])
        ...

    @base.remotable
    def get_hosts(self, exclude=None):
        """Get a list of hosts for non-deleted instances in the group
        This method allows you to get a list of the hosts where instances in
        this group are currently running. There's also an option to exclude
        certain instance UUIDs from this calculation.
        """
        filter_uuids = self.members
        if exclude:
            filter_uuids = set(filter_uuids) - set(exclude)
        filters = {'uuid': filter_uuids, 'deleted': False}
        instances = objects.InstanceList.get_by_filters(self._context,
                                                        filters=filters)
        return list(set([instance.host for instance in instances
                         if instance.host])) |
I happened upon this while hacking on my WIP CellDatabases fixture patch. Some of the nova/tests/functional/test_server_group.py tests started failing with multiple cells, and I found that it's because there's a database query, objects.InstanceList.get_by_filters, for all instances that are members of the server group every time the ServerGroup[Anti|]AffinityFilter runs. The query for instances doesn't check all cells, so it fails to return any hosts that group members are currently on.
This makes the ServerGroup[Anti|]AffinityFilter a no-op for multiple cells. Affinity is checked again via the late-affinity check in compute, but compute uses the same InstanceGroup.get_hosts method and will only find group members' hosts that are in its own cell.
Aside from that, I would expect the database query to noticeably degrade scheduling performance if the ServerGroup[Anti|]AffinityFilter is in enabled_filters, in both the single-cell and multi-cell cases.
To fix this, I expect we'll need to pre-load RequestSpec.instance_group.hosts before we schedule each instance -- and make sure we query all cells for the instances. I'm not sure what special consideration we might need for multi-create.
This is the code that lazy-loads instance_group.hosts, which in turn calls InstanceGroup.get_hosts, which calls InstanceList.get_by_filters without targeting any cells:
nova/scheduler/filters/affinity_filter.py:
    group_hosts = (spec_obj.instance_group.hosts
                   if spec_obj.instance_group else [])
nova/objects/instance_group.py:
    def obj_load_attr(self, attrname):
        ...
        self.hosts = self.get_hosts()
        self.obj_reset_changes(['hosts'])
        ...

    @base.remotable
    def get_hosts(self, exclude=None):
        """Get a list of hosts for non-deleted instances in the group
        This method allows you to get a list of the hosts where instances in
        this group are currently running. There's also an option to exclude
        certain instance UUIDs from this calculation.
        """
        filter_uuids = self.members
        if exclude:
            filter_uuids = set(filter_uuids) - set(exclude)
        filters = {'uuid': filter_uuids, 'deleted': False}
        instances = objects.InstanceList.get_by_filters(self._context,
                                                        filters=filters)
        return list(set([instance.host for instance in instances
                         if instance.host])) |
|
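A minimal model of the failure, assuming each cell database can be reduced to a plain dict from instance UUID to host (None for an instance that is not on a host yet). `hosts_in_cell` plays the role of the untargeted `InstanceGroup.get_hosts` query, which only sees one database; `hosts_across_cells` runs the same filter against every cell and unions the results, which is roughly what "query all cells for the instances" asks for. All names here are hypothetical and are not nova APIs.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical model: a cell database maps instance UUID -> host.
# A host of None stands for an instance that has not landed on a host yet.
CellDB = Dict[str, Optional[str]]


def hosts_in_cell(cell: CellDB, members: Iterable[str],
                  exclude: Iterable[str] = ()) -> Set[str]:
    """Mimic the untargeted get_hosts(): look in ONE cell only."""
    wanted = set(members) - set(exclude)
    return {host for uuid, host in cell.items() if uuid in wanted and host}


def hosts_across_cells(cells: List[CellDB], members: Iterable[str],
                       exclude: Iterable[str] = ()) -> Set[str]:
    """Run the same lookup against every cell and union the results."""
    members = list(members)  # materialize so every cell sees the same UUIDs
    exclude = list(exclude)
    found: Set[str] = set()
    for cell in cells:
        found |= hosts_in_cell(cell, members, exclude)
    return found


cell0: CellDB = {}  # e.g. the database an untargeted context falls back to
cell1: CellDB = {"uuid-a": "compute1", "uuid-b": None}
cell2: CellDB = {"uuid-c": "compute2"}
group_members = ["uuid-a", "uuid-b", "uuid-c"]

print(sorted(hosts_in_cell(cell0, group_members)))                       # []
print(sorted(hosts_across_cells([cell0, cell1, cell2], group_members)))  # ['compute1', 'compute2']
```

In a real deployment each per-cell lookup would have to target that cell's database; the loop only shows why the union, and not any single cell, is what the filter needs.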
2018-02-02 05:36:52 |
melanie witt |
tags |
cells performance scheduler |
cells scheduler |
|
2018-02-02 05:37:28 |
melanie witt |
summary |
database query via lazy-load in ServerGroup(Anti|)AffinityFilter |
scheduler affinity doesn't work with multiple cells |
|
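The original summary names the mechanism: the filter reads `instance_group.hosts`, and because that field is unset, the read triggers a lazy-load and therefore a database query. The sketch below imitates that pattern with a hypothetical `LazyGroup` class built on Python's `__getattr__`; it is not nova's object framework. The first read of an unset `hosts` calls the loader and caches the result, while an object created with `hosts` already set never calls the loader. That second case corresponds to pre-loading the hosts before scheduling, so the caller, not the filter, decides which cells are queried.

```python
from typing import Callable, List, Optional


class LazyGroup:
    """Hypothetical stand-in for an object with a lazily loaded 'hosts'.

    Mirrors the shape of the obj_load_attr pattern: reading an unset 'hosts'
    calls the loader once and stores the result; later reads use it directly.
    """

    def __init__(self, loader: Callable[[], List[str]],
                 hosts: Optional[List[str]] = None) -> None:
        self._loader = loader
        self.load_count = 0
        if hosts is not None:
            self.hosts = hosts  # pre-loaded: the loader is never needed

    def __getattr__(self, name: str):
        # Python only calls this when normal lookup fails, i.e. 'hosts' unset.
        if name == "hosts":
            self.load_count += 1
            self.hosts = self._loader()
            return self.hosts
        raise AttributeError(name)


def untargeted_query() -> List[str]:
    # Stands in for a single-cell get_hosts() that finds no group members.
    return []


lazy = LazyGroup(untargeted_query)
print(lazy.hosts, lazy.load_count)  # [] 1
print(lazy.hosts, lazy.load_count)  # [] 1  (cached after the first read)

preloaded = LazyGroup(untargeted_query, hosts=["compute1", "compute2"])
print(preloaded.hosts, preloaded.load_count)  # ['compute1', 'compute2'] 0
```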
2018-02-02 05:39:47 |
melanie witt |
description |
I happened upon this while hacking on my WIP CellDatabases fixture patch. Some of the nova/tests/functional/test_server_group.py tests started failing with multiple cells, and I found that it's because there's a database query, objects.InstanceList.get_by_filters, for all instances that are members of the server group every time the ServerGroup[Anti|]AffinityFilter runs. The query for instances doesn't check all cells, so it fails to return any hosts that group members are currently on.
This makes the ServerGroup[Anti|]AffinityFilter a no-op for multiple cells. Affinity is checked again via the late-affinity check in compute, but compute uses the same InstanceGroup.get_hosts method and will only find group members' hosts that are in its own cell.
Aside from that, I would expect the database query to noticeably degrade scheduling performance if the ServerGroup[Anti|]AffinityFilter is in enabled_filters, in both the single-cell and multi-cell cases.
To fix this, I expect we'll need to pre-load RequestSpec.instance_group.hosts before we schedule each instance -- and make sure we query all cells for the instances. I'm not sure what special consideration we might need for multi-create.
This is the code that lazy-loads instance_group.hosts, which in turn calls InstanceGroup.get_hosts, which calls InstanceList.get_by_filters without targeting any cells:
nova/scheduler/filters/affinity_filter.py:
    group_hosts = (spec_obj.instance_group.hosts
                   if spec_obj.instance_group else [])
nova/objects/instance_group.py:
    def obj_load_attr(self, attrname):
        ...
        self.hosts = self.get_hosts()
        self.obj_reset_changes(['hosts'])
        ...

    @base.remotable
    def get_hosts(self, exclude=None):
        """Get a list of hosts for non-deleted instances in the group
        This method allows you to get a list of the hosts where instances in
        this group are currently running. There's also an option to exclude
        certain instance UUIDs from this calculation.
        """
        filter_uuids = self.members
        if exclude:
            filter_uuids = set(filter_uuids) - set(exclude)
        filters = {'uuid': filter_uuids, 'deleted': False}
        instances = objects.InstanceList.get_by_filters(self._context,
                                                        filters=filters)
        return list(set([instance.host for instance in instances
                         if instance.host])) |
I happened upon this while hacking on my WIP CellDatabases fixture patch.
Some of the nova/tests/functional/test_server_group.py tests started failing with multiple cells, and I found that it's because the affinity check relies on a database query, 'objects.InstanceList.get_by_filters', for all instances that are members of the server group. The query for instances doesn't check all cells, so it fails to return any hosts that group members are currently on.
This makes the ServerGroup[Anti|]AffinityFilter a no-op for multiple cells. Affinity is checked again via the late-affinity check in compute, but compute uses the same InstanceGroup.get_hosts method and will only find group members' hosts that are in its own cell.
This is the code that populates RequestSpec.instance_group.hosts via a lazy-load on first access:
nova/objects/instance_group.py:
    def obj_load_attr(self, attrname):
        ...
        self.hosts = self.get_hosts()
        self.obj_reset_changes(['hosts'])
        ...

    @base.remotable
    def get_hosts(self, exclude=None):
        """Get a list of hosts for non-deleted instances in the group
        This method allows you to get a list of the hosts where instances in
        this group are currently running. There's also an option to exclude
        certain instance UUIDs from this calculation.
        """
        filter_uuids = self.members
        if exclude:
            filter_uuids = set(filter_uuids) - set(exclude)
        filters = {'uuid': filter_uuids, 'deleted': False}
        instances = objects.InstanceList.get_by_filters(self._context,
                                                        filters=filters)
        return list(set([instance.host for instance in instances
                         if instance.host])) |
|
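Why an empty host list turns both filters into no-ops can be seen from the two checks in isolation. The functions below are a simplified sketch modeled on the behavior described here: anti-affinity rejects hosts the group already uses, and affinity requires one of them but lets any host through when the group has no known hosts yet. They are hypothetical helpers, not the filter classes in nova/scheduler/filters/affinity_filter.py.

```python
from typing import Iterable, List


def anti_affinity_passes(host: str, group_hosts: Iterable[str]) -> bool:
    """Reject a host that already runs a member of the group."""
    return host not in set(group_hosts)


def affinity_passes(host: str, group_hosts: Iterable[str]) -> bool:
    """Require a host the group already uses; allow any if none are known."""
    known = set(group_hosts)
    if not known:
        return True  # treated like the first member: nothing to be near yet
    return host in known


candidates = ["compute1", "compute2", "compute3"]
actual: List[str] = ["compute1"]  # where the group's members really run
seen: List[str] = []              # what a single-cell lookup may return

print([h for h in candidates if anti_affinity_passes(h, actual)])  # ['compute2', 'compute3']
print([h for h in candidates if anti_affinity_passes(h, seen)])    # ['compute1', 'compute2', 'compute3']
print([h for h in candidates if affinity_passes(h, actual)])       # ['compute1']
print([h for h in candidates if affinity_passes(h, seen)])         # ['compute1', 'compute2', 'compute3']
```

With `seen` empty, every candidate passes both checks, which is exactly the no-op behavior reported for multiple cells.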
2018-02-02 05:47:31 |
OpenStack Infra |
nova: status |
New |
In Progress |
|
2018-02-02 05:47:31 |
OpenStack Infra |
nova: assignee |
|
melanie witt (melwitt) |
|
2018-02-07 17:10:37 |
Matt Riedemann |
nominated for series |
|
nova/pike |
|
2018-02-07 17:10:37 |
Matt Riedemann |
bug task added |
|
nova/pike |
|
2018-02-07 17:10:47 |
Matt Riedemann |
nova/pike: status |
New |
Confirmed |
|
2018-02-07 17:10:51 |
Matt Riedemann |
nova/pike: importance |
Undecided |
High |
|
2018-02-07 17:10:54 |
Matt Riedemann |
nova: importance |
Undecided |
High |
|
2018-02-08 08:32:56 |
Belmiro Moreira |
bug |
|
|
added subscriber Belmiro Moreira |
2018-04-25 15:38:44 |
melanie witt |
nominated for series |
|
nova/queens |
|
2018-04-25 15:38:44 |
melanie witt |
bug task added |
|
nova/queens |
|
2018-04-25 15:39:07 |
melanie witt |
nova/queens: importance |
Undecided |
High |
|
2018-04-25 15:39:07 |
melanie witt |
nova/queens: status |
New |
Confirmed |
|
2018-07-11 09:12:38 |
Balazs Gibizer |
nova: status |
In Progress |
Won't Fix |
|
2018-07-11 09:12:43 |
Balazs Gibizer |
nova: status |
Won't Fix |
In Progress |
|
2018-08-13 07:28:49 |
Matt Riedemann |
nominated for series |
|
nova/rocky |
|
2018-08-13 07:28:49 |
Matt Riedemann |
bug task added |
|
nova/rocky |
|
2018-08-28 16:32:10 |
OpenStack Infra |
nova: status |
In Progress |
Fix Released |
|
2018-09-04 18:18:29 |
OpenStack Infra |
nova/rocky: status |
New |
In Progress |
|
2018-09-04 18:18:29 |
OpenStack Infra |
nova/rocky: assignee |
|
melanie witt (melwitt) |
|
2018-09-04 19:42:55 |
OpenStack Infra |
nova/queens: status |
Confirmed |
In Progress |
|
2018-09-04 19:42:55 |
OpenStack Infra |
nova/queens: assignee |
|
melanie witt (melwitt) |
|
2018-09-04 20:46:09 |
OpenStack Infra |
nova/pike: status |
Confirmed |
In Progress |
|
2018-09-04 20:46:09 |
OpenStack Infra |
nova/pike: assignee |
|
melanie witt (melwitt) |
|
2018-09-13 22:05:13 |
OpenStack Infra |
tags |
cells scheduler |
cells in-stable-rocky scheduler |
|
2018-09-13 23:30:16 |
OpenStack Infra |
nova/rocky: status |
In Progress |
Fix Committed |
|
2018-09-20 21:33:42 |
OpenStack Infra |
tags |
cells in-stable-rocky scheduler |
cells in-stable-queens in-stable-rocky scheduler |
|
2018-09-20 21:33:54 |
OpenStack Infra |
nova/queens: status |
In Progress |
Fix Committed |
|
2018-11-09 16:46:36 |
OpenStack Infra |
tags |
cells in-stable-queens in-stable-rocky scheduler |
cells in-stable-pike in-stable-queens in-stable-rocky scheduler |
|
2018-11-09 23:18:13 |
OpenStack Infra |
nova/pike: status |
In Progress |
Fix Committed |
|