VMs with SRIOV or PCI passthrough are not setting the IRQ CPUs correctly

Bug #1959925 reported by Heitor Matsui
This bug affects 1 person
Affects    Status        Importance  Assigned to    Milestone
StarlingX  Fix Released  Medium      Heitor Matsui

Bug Description

Brief Description
-----------------
Bringing up a VM using a port with SRIOV or PCI passthrough does not set up the correct CPUs for the IRQs.

Severity
-----------------
Critical

Steps to Reproduce
-----------------
Bring up a VM using a port with SRIOV/PCI-Passthrough capabilities

Expected Behavior
-----------------
VMs using a port with SRIOV/PCI-Passthrough capabilities set up the correct CPUs for the IRQs

Actual Behavior
-----------------
VMs using a port with SRIOV/PCI-Passthrough capabilities do not set up the correct CPUs for the IRQs

Reproducibility
-----------------
Reproducible

System Configuration
-----------------
Standard

Branch/Pull Time/Commit
-----------------
master/2022-02-02

Last Pass
-----------------
master/2022-01-28

Timestamp/Logs
-----------------
compute-0:~$ sudo virsh vcpupin 1
VCPU: CPU Affinity
----------------------------------
   0: 10
   1: 30
   2: 8
   3: 28

compute-0:~$ ls /sys/bus/pci/devices/0000\:06\:10.3/msi_irqs/
 66 67 68

compute-0:~$ cat /proc/irq/66/smp_affinity_list
12,14,16,18,32,34,36,38
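
The mismatch in the output above can be checked mechanically. The following is a minimal sketch, not the agent's code: it assumes that "correct" means the IRQ affinity list is a non-empty subset of the physical CPUs the instance's vCPUs are pinned to. The helper names `parse_cpu_list` and `irq_affinity_matches` are hypothetical.

```python
def parse_cpu_list(text):
    """Parse a Linux CPU list string such as '0-3,8,10' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # Ranges such as '0-3' are inclusive on both ends.
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def irq_affinity_matches(pinned_cpus, irq_affinity_list):
    """Return True if the IRQ is affined only to the instance's pinned CPUs.

    Assumption: a correct setup means the IRQ CPU set is non-empty and
    contained in the set of physical CPUs reported by 'virsh vcpupin'.
    """
    irq_cpus = parse_cpu_list(irq_affinity_list)
    return bool(irq_cpus) and irq_cpus <= set(pinned_cpus)


# Values taken from the output above: vCPUs pinned to 10, 30, 8, 28,
# while IRQ 66 is affined to a different set of CPUs.
pinned = [10, 30, 8, 28]
print(irq_affinity_matches(pinned, "12,14,16,18,32,34,36,38"))
```

With the values from this report the check fails, because none of the CPUs in the IRQ affinity list belongs to the instance.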

2022-02-03 14:37:18,555 Thread-11[6] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/agent.py.131 - INFO Instance online: uuid=711a1eb4-33d1-499d-9669-cf072d4b196c, instance_host=compute-0, event_type=compute.instance.create.end
2022-02-03 14:37:19,362 Thread-11[6] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/guest.py.281 - WARNING Failed to get domain for uuid=711a1eb4-33d1-499d-9669-cf072d4b196c! error='NoneType' object has no attribute 'lookupByUUIDString'
2022-02-03 14:38:18,266 Thread-1[6] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/guest.py.281 - WARNING Failed to get domain for uuid=711a1eb4-33d1-499d-9669-cf072d4b196c! error='NoneType' object has no attribute 'lookupByUUIDString'
2022-02-03 14:38:18,267 Thread-1[6] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/nova_provider.py.140 - WARNING Failed to get instances info! error='NoneType' object has no attribute '__getitem__'

Test Activity
-----------------
Feature Testing

Workaround
-----------------
No workaround

OpenStack Infra (hudson-openstack) wrote : Fix proposed to utilities (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/utilities/+/827755

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.distro.openstack
Changed in starlingx:
importance: Undecided → Medium
assignee: nobody → Heitor Matsui (heitormatsui)
OpenStack Infra (hudson-openstack) wrote : Fix merged to utilities (master)

Reviewed: https://review.opendev.org/c/starlingx/utilities/+/827755
Committed: https://opendev.org/starlingx/utilities/commit/5872b851ec2fc2fafabba7b9a35e6ca4ded586e7
Submitter: "Zuul (22348)"
Branch: master

commit 5872b851ec2fc2fafabba7b9a35e6ca4ded586e7
Author: Heitor Matsui <email address hidden>
Date: Thu Feb 3 15:33:55 2022 -0300

    Create libvirt connection on NovaProvider constructor

    Before I58cefac9076db52333b41633bf2cbaa5441dc98c the Nova client
    was created on nova_provider module and imported by other modules
    as a single and shared instance, and its libvirt connection was
    created once during the agent.process_main function execution if
    OpenStack was enabled.

    Now with the Nova client being created on-demand during the agent
    operation, the libvirt connection opening isn't behaving well inside
    threads and is returning an empty object when called.

    This commit makes the libvirt connection to be opened when the
    NovaProvider instance is created and closed when it is destroyed,
    and return to a similar behavior before the refactor, but now by
    creating a singleton NovaProvider instance on the main thread
    that will be used by other modules during the agent execution,
    preventing unnecessary code execution during nova_provider import.

    Test Plan:
    PASS: Create an instance with SRIOV or PCI-PT and verify that
          instance CPU cores are affined to the PCI device IRQs
    PASS: Verify that log shows no more warnings and errors related
          to domain information retrieval on libvirt

    Closes-bug: 1959925
    Change-Id: Ie72ccca5b63e1a984233703ed518f26564d67dd7
    Signed-off-by: Heitor Matsui <email address hidden>
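
The pattern described in the commit message (open the connection once in the constructor, on the main thread, and share a single provider instance with worker threads) can be illustrated as follows. This is a hedged sketch and not the code from the merged change: `Provider`, `get_provider`, `reset_provider`, and `FakeConnection` are hypothetical names, and the fake connection stands in for a real libvirt connection so the example runs without libvirt installed. Only `lookupByUUIDString` mirrors a method name that appears in the logs.

```python
import threading


class FakeConnection:
    """Stand-in for a hypervisor connection (assumption, for illustration)."""

    def __init__(self):
        self.closed = False

    def lookupByUUIDString(self, uuid):
        return "domain-" + uuid

    def close(self):
        self.closed = True


class Provider:
    """Opens its connection in the constructor instead of on demand."""

    def __init__(self, open_connection):
        self._conn = open_connection()
        if self._conn is None:
            # Fail early rather than hitting 'NoneType' errors later.
            raise RuntimeError("could not open hypervisor connection")

    def get_domain(self, uuid):
        return self._conn.lookupByUUIDString(uuid)

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None


_provider = None


def get_provider(open_connection=FakeConnection):
    """Return the shared provider, creating it on first call."""
    global _provider
    if _provider is None:
        _provider = Provider(open_connection)
    return _provider


def reset_provider():
    """Close and discard the shared provider."""
    global _provider
    if _provider is not None:
        _provider.close()
        _provider = None


def run_demo():
    # Create the provider on the main thread before any worker starts.
    get_provider()
    results = []

    def worker(uuid):
        # Workers reuse the existing instance; they never open a connection.
        results.append(get_provider().get_domain(uuid))

    threads = [threading.Thread(target=worker, args=(u,)) for u in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    reset_provider()
    return sorted(results)
```

Because the provider is created before any thread starts, the workers only read the already-open connection, which avoids the empty-object condition the commit message attributes to opening the connection inside threads.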

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil) wrote :

Screening: Adding stx.7.0 release tag since this fix will be included in the next stx release.

tags: added: stx.7.0