Suspected performance regression for RBD back end linked to location sorting

Bug #2086675 reported by Andrew Bonney
Affects: Glance
Status: Fix Released
Importance: Undecided
Assigned to: Abhishek Kekane

Bug Description

Hi,
For the past few releases (Caracal, Bobcat, maybe earlier) we have noticed that listing images is particularly slow. I've finally had some time to dig into this, and I believe I've tracked it to https://review.opendev.org/c/openstack/glance/+/886811

We're running the RBD back end primarily, but also have HTTP and Cinder listed in 'enabled_backends', meaning that sort_image_locations executes in full. It appears that when this function executes for the RBD back end, it causes a connection to be opened to the back end. When doing a full image list operation, this happens once for every image in the list (the connection is not re-used). This appears to carry a 20-30ms time penalty per image. As such, for any reasonable set of images the response ends up taking several seconds.

In our case, images are unlikely to be held in more than one back end at a time, and I noted that adding a length check to the locations list in https://github.com/openstack/glance/blob/master/glance/common/utils.py#L718 so that the sorting doesn't occur when the list has just one element resolves the performance issue entirely.

Whilst a length check is a workaround, does the sorting operation actually require connections to be opened to the RBD back end? If they are required, could the connections at least be re-used to avoid this time penalty growing linearly with the number of images held by Glance?

Thanks

Revision history for this message
Abhishek Kekane (abhishek-kekane) wrote :

Maybe this needs an enhancement on the glance_store side. At the moment, weight is an attribute of the store instance, which we try to fetch at runtime, and fetching it also checks whether a connection exists. I think that needs to be avoided in the list call.
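A rough sketch of the idea, under the assumption that a store's weight is static configuration and can therefore be kept in memory instead of being read from a freshly initialised store instance: `WeightCache`, `order_by_weight` and the loader callable below are hypothetical names for illustration only and are not the glance_store interface.

```python
class WeightCache:
    """Remember each store's weight after the first (expensive) lookup.

    `loader` is a hypothetical callable that maps a store identifier to its
    weight; in this sketch it stands in for whatever initialises the driver.
    """

    def __init__(self, loader):
        self._loader = loader
        self._weights = {}

    def get(self, store_id):
        # A location without a store identifier is given the lowest weight.
        if store_id is None:
            return 0
        if store_id not in self._weights:
            self._weights[store_id] = self._loader(store_id)
        return self._weights[store_id]


def order_by_weight(locations, cache):
    """Sort locations by descending store weight using cached weights."""
    return sorted(
        locations,
        key=lambda loc: cache.get(loc["metadata"].get("store")),
        reverse=True,
    )
```

In this sketch the loader runs at most once per store identifier, so the cost of an image list no longer grows with the number of images.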

Changed in glance:
status: New → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/glance/+/934716

Changed in glance:
status: Triaged → In Progress
Changed in glance:
assignee: nobody → Abhishek Kekane (abhishek-kekane)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to glance (master)

Reviewed: https://review.opendev.org/c/openstack/glance/+/934716
Committed: https://opendev.org/openstack/glance/commit/45202b1e044f52aa863ed135564cc1fc74b43145
Submitter: "Zuul (22348)"
Branch: master

commit 45202b1e044f52aa863ed135564cc1fc74b43145
Author: Abhishek Kekane <email address hidden>
Date: Tue Nov 12 05:49:23 2024 +0000

    Fix performance glitch while sorting image locations

    Some of the available glance stores, like file, cinder etc., have the
    capability to reuse an already initiated driver (DRIVER_REUSABLE = 0b01000000). In
    Caracal we added a feature to sort image locations based on store
    weight. As the RBD driver of glance does not have this reuse capability, during
    an image list API call it initializes the RBD driver for each of the available
    images, which causes a noticeable delay in the list call.

    To avoid this, use the new interface added in glance_store, which directly
    gets the weight of the store from memory and returns it to the user.

    Depends-On: https://review.opendev.org/c/openstack/glance_store/+/934362
    Closes-Bug: #2086675
    Change-Id: I662ba19697e03917ca999920ea7be93a0b2a8296

Changed in glance:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to glance (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/glance/+/938921

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/glance 30.0.0.0b2

This issue was fixed in the openstack/glance 30.0.0.0b2 Epoxy development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to glance (master)

Reviewed: https://review.opendev.org/c/openstack/glance/+/938921
Committed: https://opendev.org/openstack/glance/commit/579ee7aded1b12e22107cfb37202e4292204af17
Submitter: "Zuul (22348)"
Branch: master

commit 579ee7aded1b12e22107cfb37202e4292204af17
Author: Takashi Kajinami <email address hidden>
Date: Sat Jan 11 00:57:11 2025 +0900

    Bump minimum version of glance_store

    This is a follow-up to 45202b1e044f52aa863ed135564cc1fc74b43145 to
    ensure that glance_store is new enough to use the new interface to
    obtain store weight.

    Related-Bug: #2086675
    Change-Id: I2f3fd2484613af50bfeaee6adec6f7dc2f99aa8a

Revision history for this message
Bartosz Bezak (bbezak) wrote :

Is this backportable? This problem also occurs in 2024.1.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (stable/2024.2)

Fix proposed to branch: stable/2024.2
Review: https://review.opendev.org/c/openstack/glance/+/940528

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (stable/2024.1)

Fix proposed to branch: stable/2024.1
Review: https://review.opendev.org/c/openstack/glance/+/940530

Revision history for this message
Tobias Urdin (tobias-urdin) wrote :

Interesting that this was just fixed as I was hitting this glance-store dependency issue just today!

Revision history for this message
Abhishek Kekane (abhishek-kekane) wrote :

This needs to be backported carefully: once the glance_store patch is backported and merged, we need to make a release of the store library, and only then should the glance patch be backported.
