Activity log for bug #1746863

Date Who What changed Old value New value Message
2018-02-02 04:36:09 melanie witt bug added bug
2018-02-02 05:15:17 melanie witt description

Old value:

I happened upon this while hacking on my WIP CellDatabases fixture patch. Some of the nova/tests/functional/test_server_group.py tests started failing with multiple cells, and I found that it's because of a database query, objects.InstanceList.get_by_filters, for all instances that are members of the server group, made every time the ServerGroup[Anti|]AffinityFilter runs. The query for instances doesn't check all cells, so it fails to return any hosts that group members are currently on. This makes the ServerGroup[Anti|]AffinityFilter a no-op for multiple cells. Affinity *is*, however, ultimately checked via the late-affinity check in compute, so affinity is not totally broken for multiple cells.

Aside from that, I would expect the database query to noticeably degrade scheduling performance if the ServerGroup[Anti|]AffinityFilter is in enabled_filters, for both the single-cell and multiple-cell cases.

To fix this, I expect we'll need to pre-load RequestSpec.instance_group.hosts before we schedule each instance, and make sure we query all cells for the instances. I'm not sure what special consideration we might need for multi-create.

This is the code that lazy-loads instance_group.hosts, which in turn calls InstanceGroup.get_hosts, which calls InstanceList.get_by_filters without targeting any cells:

nova/scheduler/filters/affinity_filter.py:

        group_hosts = (spec_obj.instance_group.hosts
                       if spec_obj.instance_group else [])

nova/objects/instance_group.py:

    def obj_load_attr(self, attrname):
        ...
        self.hosts = self.get_hosts()
        self.obj_reset_changes(['hosts'])
    ...

    @base.remotable
    def get_hosts(self, exclude=None):
        """Get a list of hosts for non-deleted instances in the group

        This method allows you to get a list of the hosts where instances in
        this group are currently running. There's also an option to exclude
        certain instance UUIDs from this calculation.
        """
        filter_uuids = self.members
        if exclude:
            filter_uuids = set(filter_uuids) - set(exclude)
        filters = {'uuid': filter_uuids, 'deleted': False}
        instances = objects.InstanceList.get_by_filters(self._context,
                                                        filters=filters)
        return list(set([instance.host for instance in instances
                         if instance.host]))

New value:

Identical to the old value, except that the sentence "Affinity *is*, however, ultimately checked via the late-affinity check in compute, so affinity is not totally broken for multiple cells." is replaced by:

Affinity is checked again via the late-affinity check in compute, but compute uses the same InstanceGroup.get_hosts method and will only find group members' hosts that are in its cell.
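The fix proposed in this description is to pre-load RequestSpec.instance_group.hosts before scheduling each instance instead of loading it from inside the filter. A toy model of that idea follows. CountingHostLookup, anti_affinity_passes and both filter functions are hypothetical illustrations, not Nova APIs, and the model makes no claim about how often Nova's real lazy-load fires; it only contrasts looking the group's hosts up on every filter call with computing them once up front, for an anti-affinity check.

```python
"""Toy model of pre-loading a server group's hosts before filtering.

All names here are hypothetical illustrations, not Nova APIs.
"""
from typing import List, Set


class CountingHostLookup:
    """Stand-in for the group-hosts database query; counts each call."""

    def __init__(self, hosts: Set[str]) -> None:
        self._hosts = set(hosts)
        self.calls = 0

    def __call__(self) -> Set[str]:
        self.calls += 1
        return set(self._hosts)


def anti_affinity_passes(host: str, group_hosts: Set[str]) -> bool:
    # Anti-affinity: reject a host that already runs a group member.
    return host not in group_hosts


def filter_query_each_time(candidates: List[str],
                           lookup: CountingHostLookup) -> List[str]:
    # Runs the lookup once per candidate host.
    return [h for h in candidates if anti_affinity_passes(h, lookup())]


def filter_preloaded(candidates: List[str],
                     lookup: CountingHostLookup) -> List[str]:
    # Runs the lookup once, then reuses the result for every candidate.
    group_hosts = lookup()
    return [h for h in candidates if anti_affinity_passes(h, group_hosts)]


candidates = ["host1", "host2", "host3"]

each_time = CountingHostLookup({"host2"})
print(filter_query_each_time(candidates, each_time), each_time.calls)
# ['host1', 'host3'] 3

preloaded = CountingHostLookup({"host2"})
print(filter_preloaded(candidates, preloaded), preloaded.calls)
# ['host1', 'host3'] 1
```

Both variants accept the same hosts; only the number of lookups differs. The description's second requirement, querying every cell, is a separate concern that this model does not cover.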
2018-02-02 05:36:52 melanie witt tags "cells performance scheduler" "cells scheduler"
2018-02-02 05:37:28 melanie witt summary "database query via lazy-load in ServerGroup(Anti|)AffinityFilter" "scheduler affinity doesn't work with multiple cells"
2018-02-02 05:39:47 melanie witt description

Old value:

Identical to the new value of the 2018-02-02 05:15:17 description change.

New value:

I happened upon this while hacking on my WIP CellDatabases fixture patch. Some of the nova/tests/functional/test_server_group.py tests started failing with multiple cells, and I found that it's because of a database query, 'objects.InstanceList.get_by_filters', for all instances that are members of the server group, used to do the affinity check. The query for instances doesn't check all cells, so it fails to return any hosts that group members are currently on. This makes the ServerGroup[Anti|]AffinityFilter a no-op for multiple cells. Affinity is checked again via the late-affinity check in compute, but compute uses the same InstanceGroup.get_hosts method and will only find group members' hosts that are in its cell.

This is the code that populates RequestSpec.instance_group.hosts via a lazy-load on first access:

nova/objects/instance_group.py:

    def obj_load_attr(self, attrname):
        ...
        self.hosts = self.get_hosts()
        self.obj_reset_changes(['hosts'])
    ...

    @base.remotable
    def get_hosts(self, exclude=None):
        """Get a list of hosts for non-deleted instances in the group

        This method allows you to get a list of the hosts where instances in
        this group are currently running. There's also an option to exclude
        certain instance UUIDs from this calculation.
        """
        filter_uuids = self.members
        if exclude:
            filter_uuids = set(filter_uuids) - set(exclude)
        filters = {'uuid': filter_uuids, 'deleted': False}
        instances = objects.InstanceList.get_by_filters(self._context,
                                                        filters=filters)
        return list(set([instance.host for instance in instances
                         if instance.host]))
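The failure described above comes from get_hosts querying instances without looking in every cell. The toy model below, in which in-memory lists are hypothetical stand-ins for per-cell databases (Instance, get_hosts_in_cell and get_hosts_all_cells are illustrations, not Nova code), shows how a single-cell query misses a group member that lives in another cell while a per-cell union finds it. It mirrors the filters in get_hosts: non-deleted group members, optional UUID exclusions, and only instances whose host is set.

```python
"""Toy model: collecting a server group's hosts from one cell vs. all cells.

Instance, get_hosts_in_cell and get_hosts_all_cells are hypothetical
illustrations, not Nova APIs; each list stands in for one cell database.
"""
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Instance:
    uuid: str
    host: Optional[str]
    deleted: bool = False


def get_hosts_in_cell(cell: List[Instance], members: Set[str]) -> Set[str]:
    # Like get_hosts(): non-deleted group members that have a host,
    # but only among the instances of a single cell.
    return {inst.host for inst in cell
            if inst.uuid in members and not inst.deleted and inst.host}


def get_hosts_all_cells(cells: Dict[str, List[Instance]],
                        members: Iterable[str],
                        exclude: Iterable[str] = ()) -> Set[str]:
    # Apply the same filters in every cell and union the results.
    wanted = set(members) - set(exclude)
    hosts: Set[str] = set()
    for cell in cells.values():
        hosts |= get_hosts_in_cell(cell, wanted)
    return hosts


cells = {
    "cell1": [Instance("a", "host1")],
    "cell2": [Instance("b", "host2"), Instance("c", "host3", deleted=True)],
}
members = {"a", "b", "c"}

print(sorted(get_hosts_in_cell(cells["cell1"], members)))  # ['host1']
print(sorted(get_hosts_all_cells(cells, members)))  # ['host1', 'host2']
print(sorted(get_hosts_all_cells(cells, members, exclude={"a"})))  # ['host2']
```

With only cell1 consulted, host2 is invisible, which is the no-op behaviour the report describes; a real fix also has to reach each cell's database, which this model reduces to iterating over a dict.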
2018-02-02 05:47:31 OpenStack Infra nova: status "New" "In Progress"
2018-02-02 05:47:31 OpenStack Infra nova: assignee melanie witt (melwitt)
2018-02-07 17:10:37 Matt Riedemann nominated for series nova/pike
2018-02-07 17:10:37 Matt Riedemann bug task added nova/pike
2018-02-07 17:10:47 Matt Riedemann nova/pike: status "New" "Confirmed"
2018-02-07 17:10:51 Matt Riedemann nova/pike: importance "Undecided" "High"
2018-02-07 17:10:54 Matt Riedemann nova: importance "Undecided" "High"
2018-02-08 08:32:56 Belmiro Moreira bug added subscriber Belmiro Moreira
2018-04-25 15:38:44 melanie witt nominated for series nova/queens
2018-04-25 15:38:44 melanie witt bug task added nova/queens
2018-04-25 15:39:07 melanie witt nova/queens: importance "Undecided" "High"
2018-04-25 15:39:07 melanie witt nova/queens: status "New" "Confirmed"
2018-07-11 09:12:38 Balazs Gibizer nova: status "In Progress" "Won't Fix"
2018-07-11 09:12:43 Balazs Gibizer nova: status "Won't Fix" "In Progress"
2018-08-13 07:28:49 Matt Riedemann nominated for series nova/rocky
2018-08-13 07:28:49 Matt Riedemann bug task added nova/rocky
2018-08-28 16:32:10 OpenStack Infra nova: status "In Progress" "Fix Released"
2018-09-04 18:18:29 OpenStack Infra nova/rocky: status "New" "In Progress"
2018-09-04 18:18:29 OpenStack Infra nova/rocky: assignee melanie witt (melwitt)
2018-09-04 19:42:55 OpenStack Infra nova/queens: status "Confirmed" "In Progress"
2018-09-04 19:42:55 OpenStack Infra nova/queens: assignee melanie witt (melwitt)
2018-09-04 20:46:09 OpenStack Infra nova/pike: status "Confirmed" "In Progress"
2018-09-04 20:46:09 OpenStack Infra nova/pike: assignee melanie witt (melwitt)
2018-09-13 22:05:13 OpenStack Infra tags "cells scheduler" "cells in-stable-rocky scheduler"
2018-09-13 23:30:16 OpenStack Infra nova/rocky: status "In Progress" "Fix Committed"
2018-09-20 21:33:42 OpenStack Infra tags "cells in-stable-rocky scheduler" "cells in-stable-queens in-stable-rocky scheduler"
2018-09-20 21:33:54 OpenStack Infra nova/queens: status "In Progress" "Fix Committed"
2018-11-09 16:46:36 OpenStack Infra tags "cells in-stable-queens in-stable-rocky scheduler" "cells in-stable-pike in-stable-queens in-stable-rocky scheduler"
2018-11-09 23:18:13 OpenStack Infra nova/pike: status "In Progress" "Fix Committed"