Booting with two identical PCI aliases on a host with a single matching device succeeds, but the instance has no PCI allocations

Bug #1986838 reported by Balazs Gibizer
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Detected while reading the code.

Reproduction
1) configure a host with a single PCI passthrough device
2) configure two PCI aliases (a1, a2) with different names but each matching the above device
3) boot an instance with 'pci_passthrough:alias': 'a1:1,a2:1' flavor extra_spec.

Expected result
The instance fails to schedule

Actual result
The instance schedules to the host but has no PCI allocations
The nova scheduler logs:
Selected host: compute1 failed to consume from instance. Error: PCI device request [InstancePCIRequest(alias_name='a1',count=1,is_new=<?>,numa_policy='legacy',request_id=None,requester_id=<?>,spec=[{product_id='1533',vendor_id='8086'}]), InstancePCIRequest(alias_name='a2',count=1,is_new=<?>,numa_policy='legacy',request_id=None,requester_id=<?>,spec=[{product_id='1533',vendor_id='8086'}])] failed

The nova compute logs:
Failed to allocate PCI devices for instance. Unassigning devices back to pools. This should not happen, since the scheduler should have accurate information, and allocation during claims is controlled via a hold on the compute node semaphore.

I think the root cause of the fault is that the PciDeviceStats.support_requests() [1] call matches each InstancePCIRequest object independently against the available PCI pools and does not update the state of the pools locally, so a single free device can appear to satisfy multiple requests from the same boot.

I will push a functional reproduction test shortly.

[1] https://github.com/openstack/nova/blob/69bc4c38d1c5b98fcbbe8b16a7dfeb654e3b8173/nova/pci/stats.py#L645
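The over-acceptance can be illustrated with a minimal, self-contained sketch. This is not nova's actual code: the plain dicts below are simplified stand-ins for PciDeviceStats pools and InstancePCIRequest objects, used only to show why checking each request against unmodified pools is wrong.

```python
# Minimal sketch (NOT nova's actual implementation) of the flawed logic:
# each request is checked against the same, unmodified pools, so the
# counts are never decremented between checks.

def support_requests_broken(pools, requests):
    """Return True if every request *individually* fits the pools.

    Flawed: a single free device can satisfy two separate requests
    because pool counts are not consumed between checks.
    """
    def matches(pool, spec):
        return all(pool.get(k) == v for k, v in spec.items())

    for req in requests:
        available = sum(p["count"] for p in pools if matches(p, req["spec"]))
        if available < req["count"]:
            return False
    return True

# One physical device on the host...
pools = [{"vendor_id": "8086", "product_id": "1533", "count": 1}]
# ...but two aliases, each requesting one matching device.
requests = [
    {"spec": {"vendor_id": "8086", "product_id": "1533"}, "count": 1},  # alias a1
    {"spec": {"vendor_id": "8086", "product_id": "1533"}, "count": 1},  # alias a2
]
print(support_requests_broken(pools, requests))  # True, though only 1 device exists
```

Each check sees the full pool count of 1, so both requests pass and the host is wrongly accepted.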

Tags: pci
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/853516

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/853611

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/853516
Committed: https://opendev.org/openstack/nova/commit/2aeb0a96b77e05172b13b4d1f692ff2b08f10bc9
Submitter: "Zuul (22348)"
Branch: master

commit 2aeb0a96b77e05172b13b4d1f692ff2b08f10bc9
Author: Balazs Gibizer <email address hidden>
Date: Wed Aug 17 17:53:45 2022 +0200

    Reproduce bug 1986838

    Related-Bug: #1986838
    Change-Id: I374b21fafff1a2f359d3cf887a9c271449f83635

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/c/openstack/nova/+/853611
Committed: https://opendev.org/openstack/nova/commit/2b447b7236f95752d00ebcee8c32cfef4850cf5d
Submitter: "Zuul (22348)"
Branch: master

commit 2b447b7236f95752d00ebcee8c32cfef4850cf5d
Author: Balazs Gibizer <email address hidden>
Date: Wed Aug 17 18:19:15 2022 +0200

    Trigger reschedule if PCI consumption fail on compute

    The PciPassthroughFilter logic checks each InstancePCIRequest
    individually against the available PCI pools of a given host and given
    boot request. So it is possible that the scheduler accepts a host that
    has only a single PCI device available even if two devices are
    requested for a single instance via two separate PCI aliases. Then the
    PCI claim on the compute detects this but does not stop the boot; it
    just logs an ERROR. This results in the instance being booted without
    any PCI device.

    This patch does two things:
    1) changes the PCI claim to fail with an exception and trigger a
       re-schedule instead of just logging an ERROR.
    2) changes PciDeviceStats.support_requests, which is called during
       scheduling, to not just filter pools for individual requests but
       also consume the requested devices from the pools within the scope
       of a single boot request.

    The fix in #2) alone would not be enough, as two parallel scheduling
    requests could race for a single device on the same host. #1) is the
    ultimate place where we consume devices under a compute-global lock,
    so we need the fix there too.

    Closes-Bug: #1986838
    Change-Id: Iea477be57ae4e95dfc03acc9368f31d4be895343
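The consume-within-one-boot behavior described in #2) can be sketched as follows. Again, this is not nova's actual code: the dicts are simplified stand-ins for the pools and InstancePCIRequest objects, and the function name is hypothetical. The key point is decrementing a local copy of the pools as each request is matched, so consumption is scoped to a single boot request without mutating the host's real pool state.

```python
import copy

def support_requests_fixed(pools, requests):
    """Check requests against a *local copy* of the pools, consuming
    devices as each request is matched, so one free device cannot
    satisfy two separate requests in the same boot.
    (Sketch only; NOT nova's actual implementation.)
    """
    def matches(pool, spec):
        return all(pool.get(k) == v for k, v in spec.items())

    local = copy.deepcopy(pools)  # scope consumption to this boot request
    for req in requests:
        needed = req["count"]
        for pool in local:
            if needed == 0:
                break
            if matches(pool, req["spec"]) and pool["count"] > 0:
                take = min(needed, pool["count"])
                pool["count"] -= take  # consume locally
                needed -= take
        if needed > 0:
            return False  # not enough devices left for this request
    return True
```

With one device in the pool and two single-device requests, the second request finds the local pool already empty and the host is correctly rejected; the original pools remain untouched for other scheduling decisions.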

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 26.0.0.0rc1

This issue was fixed in the openstack/nova 26.0.0.0rc1 release candidate.
