Comment 4 for bug 1986838

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/853611
Committed: https://opendev.org/openstack/nova/commit/2b447b7236f95752d00ebcee8c32cfef4850cf5d
Submitter: "Zuul (22348)"
Branch: master

commit 2b447b7236f95752d00ebcee8c32cfef4850cf5d
Author: Balazs Gibizer <email address hidden>
Date: Wed Aug 17 18:19:15 2022 +0200

    Trigger reschedule if PCI consumption fails on compute

    The PciPassthroughFilter logic checks each InstancePCIRequest
    individually against the available PCI pools of a given host and given
    boot request. So it is possible that the scheduler accepts a host that
    has a single PCI device available even if two devices are requested for
    a single instance via two separate PCI aliases. Then the PCI claim on
    the compute detects this but does not stop the boot; it just logs an
    ERROR. This results in the instance being booted without any PCI device.

    This patch does two things:
    1) changes the PCI claim to fail with an exception and trigger a
       re-schedule instead of just logging an ERROR.
    2) changes the PciDeviceStats.support_requests that is called during
       scheduling to not just filter pools for individual requests but also
       consume the request from the pool within the scope of a single boot
       request.

    The fix in #2) would not be enough alone as two parallel scheduling
    requests could race for a single device on the same host. #1) is the
    ultimate place where we consume devices under a compute global lock so
    we need the fix there too.

    Closes-Bug: #1986838
    Change-Id: Iea477be57ae4e95dfc03acc9368f31d4be895343
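
The two changes described in the commit message can be illustrated with a minimal sketch. This is not nova's code: Pool, Request, PciClaimFailed, supports_individually, supports_cumulatively and claim are hypothetical names invented for the example, and pool matching is reduced to a single product_id comparison. It only shows why per-request checks accept a host that cannot satisfy the whole boot request, and how a claim under a lock can fail loudly instead of logging.

```python
import threading
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical stand-in for one PCI pool on a host."""
    product_id: str
    count: int


@dataclass
class Request:
    """Hypothetical stand-in for one InstancePCIRequest."""
    product_id: str
    count: int


class PciClaimFailed(Exception):
    """Hypothetical error; in this sketch it stands for 'trigger a re-schedule'."""


def _find(pools, req):
    # Simplified matching: first pool of the same product with enough devices.
    return next(
        (p for p in pools
         if p.product_id == req.product_id and p.count >= req.count),
        None,
    )


def supports_individually(pools, requests):
    # Behaviour described as the bug: each request sees the full, untouched pools.
    return all(_find(pools, req) is not None for req in requests)


def _consume(pools, requests):
    # Work on a copy and subtract each request, so later requests in the
    # same boot request only see what earlier ones left over.
    remaining = [Pool(p.product_id, p.count) for p in pools]
    for req in requests:
        pool = _find(remaining, req)
        if pool is None:
            return None
        pool.count -= req.count
    return remaining


def supports_cumulatively(pools, requests):
    # Idea behind change 2): filter and consume within one boot request.
    return _consume(pools, requests) is not None


_compute_lock = threading.Lock()  # stands in for the compute-global lock


def claim(pools, requests):
    # Idea behind change 1): the final consumption happens under a lock and
    # raises instead of logging, so the caller can re-schedule.
    with _compute_lock:
        remaining = _consume(pools, requests)
        if remaining is None:
            raise PciClaimFailed("not enough PCI devices on this host")
        return remaining
```

With one device in the pool and two single-device requests for the same product, supports_individually returns True while supports_cumulatively returns False, and claim raises PciClaimFailed. The scheduling-time check works on a copy, so two parallel boot requests can both pass it; only the locked claim decides which one actually gets the device.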