pci numa policies are not followed

Bug #1805891 reported by sean mooney on 2018-11-29
This bug affects 3 people
Affects                    Importance   Assigned to
OpenStack Compute (nova)   High         Stephen Finucane
  Queens                   Undecided    Unassigned
  Rocky                    Undecided    Unassigned
  Stein                    Undecided    Unassigned

Bug Description

Description
===========
https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/share-pci-between-numa-nodes.html
introduced the concept of numa affinity policies for pci passthrough devices.

upon testing it was observed that the preferred policy is broken.

for context, there is a separate bug to track the lack of support for neutron sriov interfaces:
https://bugs.launchpad.net/nova/+bug/1795920 so the scope of this bug is limited to
pci numa policies for passthrough devices using a flavor alias.

background
----------

by default in nova, pci devices are numa affinitized using the legacy policy,
but you can override this behavior via the alias. when set to preferred, nova
should fall back to no numa affinity between the guest and the pci device
if a device on a local numa node is not available.

the policies are described below.

legacy

    This is the default value and it describes the current nova behavior. Usually we have information about association of PCI devices with NUMA nodes. However, some PCI devices do not provide such information. The legacy value will mean that nova will boot instances with PCI device if either:

        The PCI device is associated with at least one of the NUMA nodes on which the instance will be booted
        There is no information about PCI-NUMA affinity available

preferred

    This value will mean that nova-scheduler will choose a compute host with minimal consideration for the NUMA affinity of PCI devices. nova-compute will attempt a best effort selection of PCI devices based on NUMA affinity, however, if this is not possible then nova-compute will fall back to scheduling on a NUMA node that is not associated with the PCI device.

    Note that even though the NUMATopologyFilter will not consider NUMA affinity, the weigher proposed in the Reserve NUMA Nodes with PCI Devices Attached spec [2] can be used to maximize the chance that a chosen host will have NUMA-affinitized PCI devices.
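
The intended behavior of the policies can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not nova's implementation: the function name `device_allowed` and its arguments are hypothetical, and the `required` branch is an assumption based on the spec (it is not quoted in this report).

```python
from typing import Optional, Set

POLICIES = {"legacy", "preferred", "required"}


def device_allowed(policy: str,
                   device_node: Optional[int],
                   instance_nodes: Set[int]) -> bool:
    """Return True if a PCI device may be given to the guest.

    device_node is the host NUMA node of the device, or None when the
    device reports no NUMA affinity. instance_nodes is the set of host
    NUMA nodes the guest is placed on.
    """
    if policy not in POLICIES:
        raise ValueError("unknown numa policy: %s" % policy)
    if policy == "preferred":
        # Best effort only: a remote device is acceptable as a fallback.
        return True
    if device_node is None:
        # legacy accepts devices without affinity information;
        # required (assumption from the spec) does not.
        return policy == "legacy"
    # legacy and required both need a device local to the guest.
    return device_node in instance_nodes


# The reproduction case in this report: device on node 0, guest on node 1.
print(device_allowed("preferred", 0, {1}))
print(device_allowed("legacy", 0, {1}))
```

With the configuration used in this report, the preferred policy should accept the remote device while legacy should reject it.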

Steps to reproduce
==================

the test case was relatively simple

- deploy a single node devstack install on a host with 2 numa nodes.
- enable the pci and numa topology filters
- whitelist a pci device attached to numa_node 0
  e.g. passthrough_whitelist = { "address": "0000:01:00.1" }
- adjust the vcpu_pin_set to only list the cpus on numa_node 1
  e.g. vcpu_pin_set=8-15
- create an alias in the pci section of the nova.conf
  alias = { "vendor_id":"8086", "product_id":"10c9", "device_type":"type-PF", "name":"nic-pf", "numa_policy": "preferred"}
- restart the nova services
  sudo systemctl restart devstack@n-*

- update a flavor with the alias and a numa topology of 1
  openstack flavor set --property pci_passthrough:alias='nic-pf:1' 42
  openstack flavor set --property hw:numa_nodes=1 42

+----------------------------+-----------------------------------------------------+
| Field                      | Value                                               |
+----------------------------+-----------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                               |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                   |
| access_project_ids         | None                                                |
| disk                       | 0                                                   |
| id                         | 42                                                  |
| name                       | m1.nano                                             |
| os-flavor-access:is_public | True                                                |
| properties                 | hw:numa_nodes='1', pci_passthrough:alias='nic-pf:1' |
| ram                        | 64                                                  |
| rxtx_factor                | 1.0                                                 |
| swap                       |                                                     |
| vcpus                      | 1                                                   |
+----------------------------+-----------------------------------------------------+

boot a vm with the flavor
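
As a quick sanity check of the configuration above, the alias value and the flavor property can be parsed with plain Python. This is a minimal sketch and does not use nova's own parsing code: `parse_alias` and `parse_flavor_alias` are hypothetical helpers, and defaulting a missing `numa_policy` to `legacy` is an assumption based on the background section.

```python
import json

ALLOWED_POLICIES = {"legacy", "preferred", "required"}


def parse_alias(raw: str) -> dict:
    """Parse the JSON value of the alias option and check numa_policy."""
    alias = json.loads(raw)
    # Assumption: an alias without numa_policy behaves as legacy.
    alias.setdefault("numa_policy", "legacy")
    if alias["numa_policy"] not in ALLOWED_POLICIES:
        raise ValueError("invalid numa_policy: %s" % alias["numa_policy"])
    return alias


def parse_flavor_alias(spec: str) -> tuple:
    """Split a pci_passthrough:alias value such as 'nic-pf:1'."""
    name, count = spec.split(":")
    return name, int(count)


alias = parse_alias(
    '{ "vendor_id":"8086", "product_id":"10c9", "device_type":"type-PF", '
    '"name":"nic-pf", "numa_policy": "preferred"}'
)
requested = parse_flavor_alias("nic-pf:1")
print(alias["name"], alias["numa_policy"], requested)
```

If the alias name in nova.conf and the name in the flavor property differ, the request cannot match any alias, so checking both together rules out a simple configuration mistake before looking at the claim failure.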

Expected result
===============
vm boots with cpus and ram from host numa node 1
and the pci device from host numa node 0

Actual result
=============

the resource tracker fails to claim the pci device as it cannot
create a guest with a virtual numa topology of 1 with a pci device from a remote numa node.
i.e. there is no fallback and the vm fails to boot due to nova trying to enforce numa affinity.

Environment
===========
1. Exact version of OpenStack you are running.
   master, but i believe this will be broken on queens and rocky too.

2. Which hypervisor did you use?
   libvirt kvm.

3. Which storage type did you use?
   N/a cinder lvm and default libvirt image backend

4. Which networking type did you use?
  N/A openvswitch.

sean mooney (sean-k-mooney) wrote :

i have set this to high as it likely affects multiple releases
and renders this feature effectively useless.

while the required policy has some uses, the preferred policy was the reason this
feature was implemented, and since that policy is broken this feature effectively
requires operators to work around this by using multi numa node guests

e.g. hw:numa_nodes=2

as this can have a performance impact for non numa aware workloads, this causes
problems for people with performance sensitive workloads.

Changed in nova:
importance: Undecided → High
status: New → Triaged
assignee: nobody → sean mooney (sean-k-mooney)
tags: added: numa pci
Changed in nova:
assignee: sean mooney (sean-k-mooney) → Stephen Finucane (stephenfinucane)

Fix proposed to branch: master
Review: https://review.openstack.org/624444

Changed in nova:
status: Triaged → In Progress

Reviewed: https://review.openstack.org/624444
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=59d94633518e6f6272e9f0654bb908e332f97a96
Submitter: Zuul
Branch: master

commit 59d94633518e6f6272e9f0654bb908e332f97a96
Author: Stephen Finucane <email address hidden>
Date: Tue Dec 11 16:01:38 2018 +0000

    objects: Store InstancePCIRequest.numa_policy in DB

    In change I9360fe29908, we added the 'numa_policy' field to the
    'InstancePCIRequest' object. Unfortunately we did not update the
    (de)serialization logic for the 'InstancePCIRequests' object, meaning
    this field was never saved to the database. As a result, claiming will
    always fail [1].

    The resolution is simple - add the (de)serialization logic and tests to
    prevent regression.

    [1] https://github.com/openstack/nova/blob/18.0.0/nova/compute/resource_tracker.py#L214-L215

    Change-Id: Id4d8ecb8fee46b21590ebcc62a2850030cef6508
    Closes-Bug: #1805891
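
The failure mode described in this commit message can be illustrated with a simplified round trip. This is a sketch only: `PCIRequest`, `to_json_buggy`, `to_json_fixed` and `from_json` are hypothetical stand-ins and do not reproduce nova's actual object or database code.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PCIRequest:
    alias_name: str
    count: int
    numa_policy: Optional[str] = None


def to_json_buggy(requests: List[PCIRequest]) -> str:
    # The new field is missing from the serialized form, so it is lost.
    return json.dumps(
        [{"alias_name": r.alias_name, "count": r.count} for r in requests]
    )


def to_json_fixed(requests: List[PCIRequest]) -> str:
    # The fix: include numa_policy when saving the requests.
    return json.dumps(
        [
            {
                "alias_name": r.alias_name,
                "count": r.count,
                "numa_policy": r.numa_policy,
            }
            for r in requests
        ]
    )


def from_json(blob: str) -> List[PCIRequest]:
    return [
        PCIRequest(
            alias_name=d["alias_name"],
            count=d["count"],
            numa_policy=d.get("numa_policy"),
        )
        for d in json.loads(blob)
    ]


original = [PCIRequest("nic-pf", 1, "preferred")]
print(from_json(to_json_buggy(original))[0].numa_policy)
print(from_json(to_json_fixed(original))[0].numa_policy)
```

In the buggy round trip the policy comes back unset, so any later step that reloads the request has no way to know that preferred was requested. A regression test along these lines (save, reload, compare) is the kind of check the commit message says was added.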

Changed in nova:
status: In Progress → Fix Released

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/rocky
Review: https://review.opendev.org/641653
Reason: It's been nearly 3 months since I -1ed this so I'm going to abandon it to cleanup the stable/rocky review queue so I can prepare for a release.

sean mooney (sean-k-mooney) wrote :

as this feature never worked on rocky and queens, i am marking it as won't fix, as it would effectively be a feature backport based on matt's comment here https://review.opendev.org/#/c/641653/1//COMMIT_MSG@13

Reviewed: https://review.opendev.org/674072
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8c7224172641c6194582ca4cf7ce11e907df50aa
Submitter: Zuul
Branch: master

commit 8c7224172641c6194582ca4cf7ce11e907df50aa
Author: Sean Mooney <email address hidden>
Date: Thu Aug 1 15:00:07 2019 +0000

    support pci numa affinity policies in flavor and image

    This addresses bug #1795920 by adding support for
    defining a pci numa affinity policy via the flavor
    extra specs or image metadata properties enabling
    the policies to be applied to neutron sriov port
    including hardware offloaded ovs.

    Closes-Bug: #1795920
    Related-Bug: #1805891
    Implements: blueprint vm-scoped-sriov-numa-affinity
    Change-Id: Ibd62b24c2bd2dd208d0f804378d4e4f2bbfdaed6
