List AVZs can take several seconds

Bug #1801897 reported by Belmiro Moreira on 2018-11-06
This bug affects 2 people
| Affects | Importance | Assigned to |
| OpenStack Compute (nova) | Medium | Matt Riedemann |
| Queens | Medium | Unassigned |
| Rocky | Medium | Matt Riedemann |
| Stein | Medium | Matt Riedemann |

Bug Description

Getting the list of AVZs can take several seconds (about 30 seconds in our case).
This is noticeable in Horizon when creating a new instance because the user can't select an AVZ until this completes.

workflow:
- get all services from all cells (~10000 for us)
- fetch all aggregates which are tagged as an AVZ
- construct a dict of {service['host']: avz.value}
- return a dict of {avz_value: list of hosts}
- separate available and not available zones.
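
The workflow above can be sketched roughly as follows. This is only an illustration, not the nova code: the dict shapes for services and aggregates, the function name build_zone_hosts, and the "nova" fallback zone for hosts outside any AVZ aggregate are all assumptions made for the example.

```python
from collections import defaultdict

DEFAULT_AZ = "nova"  # assumption: fallback zone for hosts in no AVZ aggregate


def build_zone_hosts(services, aggregates, default_az=DEFAULT_AZ):
    """Group service hosts by availability zone and split zones by availability.

    services:   iterable of dicts with 'host' and 'disabled' keys (hypothetical shape)
    aggregates: iterable of dicts with 'hosts' and 'metadata' keys (hypothetical shape)
    """
    # Map each host to the AVZ of the aggregate it belongs to.
    host_to_az = {}
    for agg in aggregates:
        az = agg["metadata"].get("availability_zone")
        if az is None:
            continue  # aggregate is not tagged as an AVZ
        for host in agg["hosts"]:
            host_to_az[host] = az

    # Invert into {avz_value: hosts}, kept separately for enabled and
    # disabled services.
    enabled = defaultdict(set)
    disabled = defaultdict(set)
    for svc in services:
        az = host_to_az.get(svc["host"], default_az)
        (disabled if svc["disabled"] else enabled)[az].add(svc["host"])

    # A zone counts as available if at least one of its services is enabled.
    available = dict(enabled)
    not_available = {az: hosts for az, hosts in disabled.items()
                     if az not in enabled}
    return available, not_available
```

With ~10000 services the loops themselves are cheap; the cost is more likely in fetching the services and aggregates that feed them.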

Reproducible in Queens, Rocky

Changed in nova:
assignee: nobody → Surya Seetharaman (tssurya)
tags: added: availability-zones performance
Matt Riedemann (mriedem) wrote :

Hmm, we do do a scatter/gather here when listing services:

https://github.com/openstack/nova/blob/b93b40c6c01a1161f40592d29353c3461669de19/nova/compute/api.py#L5106

Would be good to know where the majority of the time is spent in the overall flow.
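
One way to find where the time goes is to wrap each stage of the flow in a timer. The sketch below is a generic stand-in, not nova's scatter/gather implementation: it uses a thread pool from the Python standard library, and the names timed, timings and scatter_gather are hypothetical helpers for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

timings = {}  # stage name -> accumulated seconds


@contextmanager
def timed(stage):
    """Accumulate the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage] = timings.get(stage, 0.0) + elapsed


def scatter_gather(cells, fetch):
    """Call fetch(cell) for every cell concurrently; return {cell: result}."""
    cells = list(cells)
    if not cells:
        return {}
    with ThreadPoolExecutor(max_workers=len(cells)) as pool:
        results = list(pool.map(fetch, cells))
    return dict(zip(cells, results))
```

Wrapping the service listing, the aggregate lookup and the zone grouping each in their own `with timed("...")` block would show which stage dominates the ~30 seconds.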

Changed in nova:
status: New → Confirmed
importance: Undecided → High
Andrey Volkov (avolkov) wrote :

Assumption 3) is wrong: services are not fetched one by one.

Matt Riedemann (mriedem) on 2019-04-23
Changed in nova:
assignee: Surya Seetharaman (tssurya) → Andrey Volkov (avolkov)
status: Confirmed → In Progress
Changed in nova:
assignee: Andrey Volkov (avolkov) → Matt Riedemann (mriedem)
Matt Riedemann (mriedem) wrote :

Looks like this could go back to pike:

https://review.opendev.org/#/c/442163/

Although ^ probably isn't the reason it's slow; it has probably just always been slow.

Changed in nova:
assignee: Matt Riedemann (mriedem) → Andrey Volkov (avolkov)
Matt Riedemann (mriedem) wrote :

Note that the scatter/gather routine when fetching services from the HostAPI was already backported to pike:

https://review.opendev.org/#/q/I90b488102eb265d971cade29892279a22d3b5273

Matt Riedemann (mriedem) on 2019-04-23
Changed in nova:
importance: High → Medium
Changed in nova:
assignee: Andrey Volkov (avolkov) → Matt Riedemann (mriedem)

Reviewed: https://review.opendev.org/636947
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=74cefe4266a613d4c2afbb0c791e16eb7789aef4
Submitter: Zuul
Branch: master

commit 74cefe4266a613d4c2afbb0c791e16eb7789aef4
Author: Andrey Volkov <email address hidden>
Date: Thu Feb 14 15:39:45 2019 +0300

    AZ list performance optimization: avoid double service list DB fetch

    Assume number of services can be large (10000 as in the bug description),
    this patch removes second service_get_all call.

    zone_hosts changed from dict of lists to dict of sets.

    The HostAPI instance from the API controller is also passed to the
    get_availability_zones method so it does not have to recreate it
    per call (this is both for a slight performance gain but mostly also
    for test sanity).

    On devstack with 10000 services patch decreased response time twice.

    openstack availability zone list --long --timing

    ...

    Before:

    +-------------------------------------------------------------------------------------------+-----------+
    | URL | Seconds |
    +-------------------------------------------------------------------------------------------+-----------+
    | GET http://192.168.0.45/identity | 0.006816 |
    | POST http://192.168.0.45/identity/v3/auth/tokens | 0.456708 |
    | POST http://192.168.0.45/identity/v3/auth/tokens | 0.087485 |
    | GET http://172.18.237.203/compute/v2.1/os-availability-zone/detail | 95.667192 |
    | GET http://172.18.237.203/volume/v2/e2671d37ee2c4374bd1533645261f1d4/os-availability-zone | 0.036528 |
    | Total | 96.254729 |
    +-------------------------------------------------------------------------------------------+-----------+

    After:

    +-------------------------------------------------------------------------------------------+-----------+
    | URL | Seconds |
    +-------------------------------------------------------------------------------------------+-----------+
    | GET http://192.168.0.45/identity | 0.020215 |
    | POST http://192.168.0.45/identity/v3/auth/tokens | 0.102987 |
    | POST http://192.168.0.45/identity/v3/auth/tokens | 0.111899 |
    | GET http://172.18.237.203/compute/v2.1/os-availability-zone/detail | 39.346657 |
    | GET http://172.18.237.203/volume/v2/e2671d37ee2c4374bd1533645261f1d4/os-availability-zone | 0.026403 |
    | Total | 39.608161 |
    +-------------------------------------------------------------------------------...
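
The idea in the commit message can be illustrated with a small sketch: read the service list once and partition it in memory instead of querying twice. This is not the patch itself. FakeServiceDB, its `disabled` filter and the row shape are hypothetical, and it is an assumption here that the two original calls differed by enabled/disabled state. The timings are the compute endpoint figures from the tables above.

```python
class FakeServiceDB:
    """Hypothetical stand-in for the service table; counts queries."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0

    def service_get_all(self, disabled=None):
        self.calls += 1
        if disabled is None:
            return list(self.rows)
        return [r for r in self.rows if r["disabled"] == disabled]


def _group(rows):
    # zone_hosts as a dict of sets: duplicate hosts collapse automatically.
    zone_hosts = {}
    for row in rows:
        zone_hosts.setdefault(row["availability_zone"], set()).add(row["host"])
    return zone_hosts


def zones_two_fetches(db):
    # Before: one query per state, so the table is read twice.
    enabled = db.service_get_all(disabled=False)
    disabled = db.service_get_all(disabled=True)
    return _group(enabled), _group(disabled)


def zones_one_fetch(db):
    # After: read once, then partition in memory.
    enabled, disabled = [], []
    for row in db.service_get_all():
        (disabled if row["disabled"] else enabled).append(row)
    return _group(enabled), _group(disabled)


# Compute endpoint timings reported in the commit message, in seconds.
BEFORE_S = 95.667192
AFTER_S = 39.346657
SPEEDUP = BEFORE_S / AFTER_S  # roughly 2.4
```

Both functions return the same zone maps; the second reads the table once. On the reported numbers the compute call is roughly 2.4 times faster after the patch.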

Reviewed: https://review.opendev.org/656382
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c280d747fb23f1abaaf91eea7f6d11e716c6db42
Submitter: Zuul
Branch: stable/stein

commit c280d747fb23f1abaaf91eea7f6d11e716c6db42
Author: Andrey Volkov <email address hidden>
Date: Thu Feb 14 15:39:45 2019 +0300

    AZ list performance optimization: avoid double service list DB fetch

tags: added: in-stable-stein

Reviewed: https://review.opendev.org/656510
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fa275544c27a5b428ad34b4c03be511607d15bf6
Submitter: Zuul
Branch: stable/rocky

commit fa275544c27a5b428ad34b4c03be511607d15bf6
Author: Andrey Volkov <email address hidden>
Date: Thu Feb 14 15:39:45 2019 +0300

    AZ list performance optimization: avoid double service list DB fetch

tags: added: in-stable-rocky