Performance regression in libvirt get_available_resource()

Bug #1785827 reported by s10
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Lee Yarwood
Milestone: (none)

Bug Description

Description
===========
The periodic task update_available_resource takes 20 seconds on a host
with 100 instances on local storage:

https://github.com/openstack/nova/blob/stable/pike/nova/compute/resource_tracker.py#L694

Steps to reproduce
==================
1. /etc/nova/nova.conf:
[DEFAULT]
preallocate_images=space
[libvirt]
images_type=raw

2. Launch 100 instances on the host

3. Observe that every update_available_resource() call takes 20 seconds

This performance regression was introduced in this commit:
https://github.com/openstack/nova/commit/d88b75e81eabfbd463007f6a4f27e6966a466530
and the following commit doubles the time:
https://github.com/openstack/nova/commit/938c0a745325fa73d098c6d5ddd20b2a599f9624

Expected result
===============
update_available_resource() takes less than 5 seconds

Actual result
=============
update_available_resource() takes 20-30 seconds

Environment
===========
1. Exact version of OpenStack you are running:
OpenStack Pike
nova 16.1.4, commit b58c7f0

2. Which hypervisor did you use?
   Libvirt + KVM/QEMU

3. Which storage type did you use?
   local storage, raw

s10 (vlad-esten)
summary: - Performance regression in libvirt get_available_resource(nodename)
+ Performance regression in libvirt get_available_resource()
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/589513

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/589567

Changed in nova:
assignee: nobody → Lee Yarwood (lyarwood)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/589513
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d41ea9d87894111051b591d513cd046e5e357124
Submitter: Zuul
Branch: master

commit d41ea9d87894111051b591d513cd046e5e357124
Author: Lee Yarwood <email address hidden>
Date: Tue Aug 7 15:59:24 2018 +0100

    libvirt: Reduce calls to qemu-img during update_available_resource

    I464bc2b88123a012cd12213beac4b572c3c20a56 introduced a second call to
    ``qemu-img`` that can easily be collapsed into one with the addition of
    a new call within the disk_api.

    Related-Bug: #1785827
    Change-Id: Ibfd0527ed79f60282b542034d7cb97b424becba3
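
Collapsing the two ``qemu-img`` calls into one, as the commit message above describes, could look roughly like the sketch below. It assumes a single `qemu-img info --output=json` invocation, whose JSON output carries both the `virtual-size` and `actual-size` fields. The function names `parse_qemu_img_info` and `get_disk_sizes` are hypothetical and are not the helper that was added to nova's disk_api.

```python
import json
import subprocess


def parse_qemu_img_info(json_text):
    """Extract (virtual_size, allocated_size) in bytes from the JSON
    printed by `qemu-img info --output=json`.

    Hypothetical helper for illustration; not nova's actual code.
    """
    info = json.loads(json_text)
    return info["virtual-size"], info["actual-size"]


def get_disk_sizes(path):
    """Run qemu-img once per disk and return both sizes from that call."""
    out = subprocess.check_output(
        ["qemu-img", "info", "--output=json", path])
    return parse_qemu_img_info(out)
```

Each disk still costs one subprocess launch per update, so this halves the overhead rather than removing it.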

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/590253

Changed in nova:
assignee: Lee Yarwood (lyarwood) → Matthew Booth (mbooth-9)
s10 (vlad-esten) wrote :

My main concern in this bug is related to live migrations.

rt.update_available_resource() is called after every live migration, not only in the periodic task: https://github.com/openstack/nova/blob/stable/pike/nova/compute/manager.py#L5812

It looks like subsequent live migrations from this host are blocked in _post_live_migration() by this call, because of the config option max_concurrent_live_migrations=1.

Changed in nova:
assignee: Matthew Booth (mbooth-9) → Lee Yarwood (lyarwood)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matthew Booth (<email address hidden>) on branch: master
Review: https://review.openstack.org/590253
Reason: This isn't going to fix it as we've still got a second call to qemu-img, and Lee has a better patch which combines these 2 anyway.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/603358

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/589567
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e6af812865553fbc49114a419170693ff15d5545
Submitter: Zuul
Branch: master

commit e6af812865553fbc49114a419170693ff15d5545
Author: Lee Yarwood <email address hidden>
Date: Thu Aug 9 13:45:13 2018 +0100

    libvirt: Use os.stat and os.path.getsize for RAW disk inspection

    At present when inspecting a file based image we always use ``qemu-img`` to
    determine the virtual size of the image. This works well but can lead to
    the resource tracker taking considerable time to update on hosts with
    a large number of instances/images.

    This change switches to using os.stat and os.path.getsize to determine
    the allocated and virtual disk sizes of RAW disks.

    Future changes will look into caching the virtual size of the disk
    within disk.info locally on the host to also improve this for qcow2 and
    ploop, further simplifying this code path.

    Closes-bug: #1785827
    Change-Id: Ic5c41493dcdcd807209be2beaae0dbbdf5d2ba3f
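
The approach in the commit message above, inspecting RAW disks with os.stat and os.path.getsize instead of spawning ``qemu-img``, can be sketched as follows. This is a minimal illustration, not the merged nova code: the function name `get_raw_disk_sizes` is hypothetical, and it assumes a POSIX host where `st_blocks` counts 512-byte units.

```python
import os


def get_raw_disk_sizes(path):
    """Return (virtual_size, allocated_size) in bytes for a RAW image.

    For a RAW file the virtual disk size is the apparent file size,
    while the allocated size is the space actually occupied on disk,
    which is smaller for sparse files.
    """
    virtual_size = os.path.getsize(path)
    # Assumption: st_blocks is reported in 512-byte units (POSIX).
    allocated_size = os.stat(path).st_blocks * 512
    return virtual_size, allocated_size
```

Both values come from file metadata, so no external process is started per disk. This shortcut only applies to RAW images; qcow2 and ploop store the virtual size in their own headers.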

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/603488

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/604039

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/604295

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/603358
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d085c8fe536e28ff8a3d757c8407fce37ffc00e4
Submitter: Zuul
Branch: stable/queens

commit d085c8fe536e28ff8a3d757c8407fce37ffc00e4
Author: Vlad Gusev <email address hidden>
Date: Tue Sep 18 17:53:16 2018 +0300

    libvirt: Reduce calls to qemu-img during update_available_resource

    I464bc2b88123a012cd12213beac4b572c3c20a56 introduced a second call to
    ``qemu-img`` that can easily be collapsed into one with the addition of
    a new call within the disk_api.

    Conflicts:
       nova/tests/unit/virt/libvirt/test_driver.py

    NOTE(s10): The conflict was due to not having change
    Ic853743573aa0b74d5d2c5b8b47252b875d5f7ef in Queens.

    Related-Bug: #1785827
    Change-Id: Ibfd0527ed79f60282b542034d7cb97b424becba3
    (cherry picked from commit d41ea9d87894111051b591d513cd046e5e357124)

tags: added: in-stable-queens
tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/603488
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bda6173a8430c3bb83f52195d4ffd0bfee050e19
Submitter: Zuul
Branch: stable/rocky

commit bda6173a8430c3bb83f52195d4ffd0bfee050e19
Author: Lee Yarwood <email address hidden>
Date: Thu Aug 9 13:45:13 2018 +0100

    libvirt: Use os.stat and os.path.getsize for RAW disk inspection

    At present when inspecting a file based image we always use ``qemu-img`` to
    determine the virtual size of the image. This works well but can lead to
    the resource tracker taking considerable time to update on hosts with
    a large number of instances/images.

    This change switches to using os.stat and os.path.getsize to determine
    the allocated and virtual disk sizes of RAW disks.

    Future changes will look into caching the virtual size of the disk
    within disk.info locally on the host to also improve this for qcow2 and
    ploop, further simplifying this code path.

    Closes-bug: #1785827
    Change-Id: Ic5c41493dcdcd807209be2beaae0dbbdf5d2ba3f
    (cherry picked from commit e6af812865553fbc49114a419170693ff15d5545)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/607544

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/604295
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b58805753ab54e02d8377ba80c740eebe7a34ddd
Submitter: Zuul
Branch: stable/queens

commit b58805753ab54e02d8377ba80c740eebe7a34ddd
Author: Lee Yarwood <email address hidden>
Date: Thu Aug 9 13:45:13 2018 +0100

    libvirt: Use os.stat and os.path.getsize for RAW disk inspection

    At present when inspecting a file based image we always use ``qemu-img`` to
    determine the virtual size of the image. This works well but can lead to
    the resource tracker taking considerable time to update on hosts with
    a large number of instances/images.

    This change switches to using os.stat and os.path.getsize to determine
    the allocated and virtual disk sizes of RAW disks.

    Future changes will look into caching the virtual size of the disk
    within disk.info locally on the host to also improve this for qcow2 and
    ploop, further simplifying this code path.

    Closes-bug: #1785827
    Change-Id: Ic5c41493dcdcd807209be2beaae0dbbdf5d2ba3f
    (cherry picked from commit e6af812865553fbc49114a419170693ff15d5545)
    (cherry picked from commit bda6173a8430c3bb83f52195d4ffd0bfee050e19)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.2

This issue was fixed in the openstack/nova 18.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.7

This issue was fixed in the openstack/nova 17.0.7 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/604039
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=87883c13750a149ef6e6783a66d1a86472c842d5
Submitter: Zuul
Branch: stable/pike

commit 87883c13750a149ef6e6783a66d1a86472c842d5
Author: Lee Yarwood <email address hidden>
Date: Thu Sep 20 14:06:29 2018 +0300

    libvirt: Reduce calls to qemu-img during update_available_resource

    I464bc2b88123a012cd12213beac4b572c3c20a56 introduced a second call to
    ``qemu-img`` that can easily be collapsed into one with the addition of
    a new call within the disk_api.

    Conflicts:
       nova/tests/unit/virt/libvirt/test_driver.py

    NOTE(s10): The conflict was due to not having change
    Ic853743573aa0b74d5d2c5b8b47252b875d5f7ef in Queens.

    NOTE(s10): Another conflict was due to not having change
    I11e329ac5f5fe4b9819fefbcc32ff1ee504fc58b in Pike.

    Related-Bug: #1785827
    Change-Id: Ibfd0527ed79f60282b542034d7cb97b424becba3
    (cherry picked from commit d41ea9d87894111051b591d513cd046e5e357124)
    (cherry picked from commit d085c8fe536e28ff8a3d757c8407fce37ffc00e4)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/607544
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c46ef969f6f5de1fef52fdc2c4b247ecfd84abe6
Submitter: Zuul
Branch: stable/pike

commit c46ef969f6f5de1fef52fdc2c4b247ecfd84abe6
Author: Lee Yarwood <email address hidden>
Date: Thu Aug 9 13:45:13 2018 +0100

    libvirt: Use os.stat and os.path.getsize for RAW disk inspection

    At present when inspecting a file based image we always use ``qemu-img`` to
    determine the virtual size of the image. This works well but can lead to
    the resource tracker taking considerable time to update on hosts with
    a large number of instances/images.

    This change switches to using os.stat and os.path.getsize to determine
    the allocated and virtual disk sizes of RAW disks.

    Future changes will look into caching the virtual size of the disk
    within disk.info locally on the host to also improve this for qcow2 and
    ploop, further simplifying this code path.

    Closes-bug: #1785827
    Change-Id: Ic5c41493dcdcd807209be2beaae0dbbdf5d2ba3f
    (cherry picked from commit e6af812865553fbc49114a419170693ff15d5545)
    (cherry picked from commit bda6173a8430c3bb83f52195d4ffd0bfee050e19)
    (cherry picked from commit b58805753ab54e02d8377ba80c740eebe7a34ddd)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.8

This issue was fixed in the openstack/nova 16.1.8 release.
