add_instance_node during nova CDM build performance issue when listing servers

Bug #1834679 reported by Matt Riedemann
Affects   Status         Importance   Assigned to      Milestone
watcher   Fix Released   High         chenker
Stein     Fix Released   High         Matt Riedemann

Bug Description

https://review.opendev.org/#/c/659688/ (which was backported to stable/stein) changed the add_instance_node logic to list all servers on a given compute service host in a single API call, filtering on the host when getting servers with details, rather than iterating over the list of server IDs and making a separate GET /servers/{server_id} API call for each.

This introduced a performance regression when the number of servers on a given host is small, for example 4 servers on a host.

The problem is that watcher passes limit=-1 to novaclient when listing servers, which always results in at least two API calls so that novaclient can be sure it is done paging:

https://github.com/openstack/python-novaclient/blob/13.0.1/novaclient/v2/servers.py#L896
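
To illustrate why limit=-1 costs an extra round trip, here is a minimal sketch of a marker-based paging loop. It is a simplified model written for this report, not the novaclient code itself; make_fake_api and list_servers are hypothetical names, and the fake API only mimics the marker/limit behaviour described above.

```python
from typing import Callable, List, Optional, Tuple


def make_fake_api(server_ids: List[str], max_limit: int = 1000):
    """Return a fake 'list servers' call plus a log of the calls made to it.

    Hypothetical stand-in for the compute API: it returns at most
    max_limit servers per call, starting after the given marker.
    """
    calls: List[Tuple[Optional[str], Optional[int]]] = []

    def list_page(marker: Optional[str] = None,
                  limit: Optional[int] = None) -> List[str]:
        calls.append((marker, limit))
        start = 0 if marker is None else server_ids.index(marker) + 1
        page_size = max_limit if limit is None else min(limit, max_limit)
        return server_ids[start:start + page_size]

    return list_page, calls


def list_servers(list_page: Callable[..., List[str]],
                 limit: Optional[int] = None) -> List[str]:
    """Simplified paging loop (an assumption about the client's behaviour).

    With limit=-1 the loop keeps asking for the next page until it gets
    an empty one, so even a tiny result set needs a second call.
    With any other limit it stops after the first call.
    """
    result: List[str] = []
    marker: Optional[str] = None
    while True:
        page = list_page(marker=marker,
                         limit=None if limit == -1 else limit)
        result.extend(page)
        if not page or limit != -1:
            break
        marker = result[-1]
    return result


if __name__ == "__main__":
    ids = ["s1", "s2", "s3", "s4"]

    page, calls = make_fake_api(ids)
    list_servers(page, limit=-1)
    print("limit=-1 calls:", len(calls))

    page, calls = make_fake_api(ids)
    list_servers(page, limit=len(ids))
    print("limit=len(ids) calls:", len(calls))
```

In this model, the second call made with limit=-1 uses the last server as the marker and comes back empty, which is the wasted request this bug is about.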

If we can determine before listing servers that there are only a certain number of them (4 in this example), we should pass limit=len(servers) to novaclient and avoid the second paging call, which takes extra time and yields no results.

It's worth noting that by default the compute API will return a max of 1000 servers:

https://docs.openstack.org/nova/latest/configuration/config.html#api.max_limit

But when we filter by host, there should be far fewer (non-baremetal) servers on the compute service host, so we shouldn't reach that max limit. The nova CDM builder code does not support ironic baremetal nodes, so we don't have to account for the wrinkle where a single nova-compute service manages more than 1000 ironic nodes.
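
The idea could be sketched roughly as follows. This is only an illustration under stated assumptions, not the merged patch: pick_list_limit and list_host_servers are hypothetical helpers, the 1000 default mirrors the api.max_limit default mentioned above, and the search_opts values (host, all_tenants) are assumptions about how the caller filters.

```python
from typing import Any, List, Sequence

# Assumed default of nova's [api]/max_limit; a deployment may override it.
DEFAULT_MAX_LIMIT = 1000


def pick_list_limit(known_count: int,
                    max_limit: int = DEFAULT_MAX_LIMIT) -> int:
    """Choose the limit to pass when listing servers on one host.

    If the expected number of servers fits in a single page, ask for
    exactly that many so no follow-up paging call is needed. Otherwise
    fall back to -1 and let the client page through everything.
    """
    return known_count if known_count <= max_limit else -1


def list_host_servers(nova: Any, host: str,
                      known_server_ids: Sequence[str]) -> List[Any]:
    """List servers on a host using the limit chosen above.

    `nova` is expected to be a novaclient Client; servers.list() accepts
    search_opts and limit keyword arguments.
    """
    limit = pick_list_limit(len(known_server_ids))
    return nova.servers.list(
        search_opts={"host": host, "all_tenants": True},  # assumed filters
        limit=limit,
    )
```

With 4 known server IDs this passes limit=4, so a single request is enough; the -1 fallback only applies if a host ever reports more servers than the configured maximum.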

Matt Riedemann (mriedem) wrote :

chenker has a patch started here: https://review.opendev.org/#/c/668100/

Changed in watcher:
status: New → In Progress
importance: Undecided → High
assignee: nobody → chenker (chenker)
OpenStack Infra (hudson-openstack) wrote : Fix merged to watcher (master)

Reviewed: https://review.opendev.org/668100
Committed: https://git.openstack.org/cgit/openstack/watcher/commit/?id=1e8b17ac46a9052234c9f56da42a2a4bb8250216
Submitter: Zuul
Branch: master

commit 1e8b17ac46a9052234c9f56da42a2a4bb8250216
Author: chenke <email address hidden>
Date: Fri Jun 28 14:43:57 2019 +0800

    Reduce the query time of the instances when call get_instance_list()

    The problem is that watcher is passing limit=-1 to novaclient when
    listing servers which will always make at least two API calls to be
    sure it's done paging:

    https://github.com/openstack/python-novaclient/blob/13.0.1/novaclient/v2/servers.py#L896

    If we can determine before we list servers that there are only a
    certain number where the number of servers is less than 1000. For
    example: 4, we should just pass the limit=len(servers) to novaclient
    and avoid the second call for paging which takes extra time and
    yields no results.

    Change-Id: I797ad934a0f8496dbcbf65798e28b0443f238137
    Closes-Bug: #1834679

Changed in watcher:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to watcher (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/669876

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/watcher 3.0.0.0rc1

This issue was fixed in the openstack/watcher 3.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/watcher stein-eol

This issue was fixed in the openstack/watcher stein-eol release.
