Scheduler connects to all cells DBs to gather compute nodes info

Bug #1767303 reported by Belmiro Moreira on 2018-04-27
Affects: OpenStack Compute (nova)
Importance: Undecided
Assigned to: Surya Seetharaman
Milestone: Declined for Queens by Matt Riedemann

Bug Description

The scheduler's HostManager connects to all cell DBs to get compute node info, even if only a subset of compute node UUIDs is returned by placement.

This has a performance impact in large cloud deployments with several cells.

Also related to: https://review.openstack.org/#/c/539617/9/nova/scheduler/host_manager.py

{code}
def _get_computes_for_cells(self, context, cells, compute_uuids=None):
    compute_nodes = collections.defaultdict(list)
    services = {}
    for cell in cells:
        LOG.debug('Getting compute nodes and services for cell %(cell)s',
                  {'cell': cell.identity})
        with context_module.target_cell(context, cell) as cctxt:
            if compute_uuids is None:
                compute_nodes[cell.uuid].extend(
                    objects.ComputeNodeList.get_all(cctxt))
            else:
                compute_nodes[cell.uuid].extend(
                    objects.ComputeNodeList.get_all_by_uuids(
                        cctxt, compute_uuids))
            services.update(
                {service.host: service
                 for service in objects.ServiceList.get_by_binary(
                         cctxt, 'nova-compute',
                         include_disabled=True)})
    return compute_nodes, services
{code}

Changed in nova:
assignee: nobody → Surya Seetharaman (tssurya)
tags: added: cells scheduler
Matt Riedemann (mriedem) wrote :

I'm confused, you reference https://review.openstack.org/#/c/539617/ but are pasting a code snippet of old code before that scatter/gather routine was added. Does https://review.openstack.org/#/c/539617/ resolve your issue or at least make it acceptable performance?

tags: added: performance
Changed in nova:
status: New → Incomplete
Matt Riedemann (mriedem) wrote :

If https://review.openstack.org/#/c/539617/ isn't enough, we could also think about adding a HostMapping.uuid field which mirrors the ComputeNode.uuid field and then we could get the list of host mappings by uuids and from that list get the list of cells from which to pull the compute nodes, but it would be good to know if https://review.openstack.org/#/c/539617/ made a big enough difference that it's good enough for now.
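The HostMapping.uuid idea above could be sketched roughly as follows. This is a minimal illustration, not existing nova code: the `compute_uuid` field on a host mapping and the helper name are assumptions, standing in for the mirrored ComputeNode.uuid field Matt describes.

```python
from collections import namedtuple

# Illustrative stand-ins for nova's CellMapping / HostMapping objects.
CellMapping = namedtuple('CellMapping', ['uuid', 'name'])
HostMapping = namedtuple('HostMapping', ['compute_uuid', 'cell_mapping'])

def cells_for_compute_uuids(host_mappings, compute_uuids):
    """Resolve placement's candidate compute UUIDs to the distinct
    cells that actually contain them, so only those cell DBs need
    to be queried for compute nodes."""
    wanted = set(compute_uuids)
    cells = {}
    for hm in host_mappings:
        if hm.compute_uuid in wanted:
            # Deduplicate cells by UUID.
            cells[hm.cell_mapping.uuid] = hm.cell_mapping
    return list(cells.values())
```

With ~70 cells and all allocation candidates in one cell, this would shrink the per-schedule query fan-out from 70 cell DBs to 1.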

Matt Riedemann (mriedem) wrote :

Or maybe at this point, this is a duplicate of bug 1737465?

Matt Riedemann (mriedem) wrote :

OK, I understand the issue now. The problem is that when we get results from placement, all of the allocation candidates (compute nodes) might be in a single cell, because maybe the request is tied to an aggregate which represents that cell. But when the HostManager queries the cell databases for the compute nodes, it iterates over all enabled cells, which could be ~70 in CERN's case. So we're doing a lot of extra DB queries that won't yield results, and might be hitting older, slower cell DBs which take longer to return a response.

If we could filter the cells up front based on the computes (via host_mappings maybe) like we do for filtering instances by project mapped to cells in the API using config:

https://docs.openstack.org/nova/latest/configuration/config.html#api.instance_list_per_project_cells

Then that might make scheduling faster, assuming the compute nodes are in fact restricted to a small subset of cells.
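Filtering the cell list up front, in the spirit of the config option above, might look something like this. Purely a sketch under assumed attribute names (`host`, `cell_uuid`); it is not the actual nova host_mapping API.

```python
from collections import namedtuple

# Illustrative stand-ins for a cell record and a host-to-cell mapping.
Cell = namedtuple('Cell', ['uuid'])
HostMap = namedtuple('HostMap', ['host', 'cell_uuid'])

def filter_cells_by_hosts(cells, host_mappings, candidate_hosts):
    """Drop any cell that contains none of the candidate compute hosts,
    so the scheduler never targets cells that cannot yield results."""
    wanted = set(candidate_hosts)
    cell_uuids = {hm.cell_uuid for hm in host_mappings if hm.host in wanted}
    return [cell for cell in cells if cell.uuid in cell_uuids]
```

The pre-filter would run once per scheduling request, before the per-cell compute node queries, mirroring how instance_list_per_project_cells narrows cells for instance listing in the API.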

Changed in nova:
status: Incomplete → Triaged

Fix proposed to branch: master
Review: https://review.openstack.org/635532

Changed in nova:
status: Triaged → In Progress

Change abandoned by Surya Seetharaman (<email address hidden>) on branch: master
Review: https://review.opendev.org/635532
Reason: cern specific
