PCI IRQ Affinity settings are not persistent after VM server actions

Bug #1955051 reported by Heitor Matsui
Affects   | Status       | Importance | Assigned to   | Milestone
StarlingX | Fix Released | Medium     | Heitor Matsui |

Bug Description

Brief Description
-----------------
For VMs using SRIOV or Pass-through (PT) devices, PCI IRQ affinity settings are lost after any of the following server actions: shutting off, suspending, or live-migrating.

Severity
-----------------
Major.

Steps to Reproduce
-----------------
Boot a VM (SRIOV or PT) with the pci_irq_affinity_mask extra spec.
Verify that IRQs are pinned to the correct CPUs.
Shut off and power on, suspend, or live-migrate the server.
Verify that IRQs are no longer pinned to the previous CPUs.
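
The before/after check in the steps above can be scripted. The following is a minimal sketch, not part of the agent: the helper names (parse_cpu_list, read_irq_affinity, changed_irqs) are hypothetical, and the parser only handles the comma-separated values and simple ranges that /proc/irq/<N>/smp_affinity_list normally prints.

```python
def parse_cpu_list(text):
    """Parse a CPU list such as "6" or "12,14,16-18" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Expand a simple inclusive range like "16-18".
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_irq_affinity(irq, proc_root="/proc"):
    """Read the current affinity of one IRQ from procfs on the compute host."""
    with open("%s/irq/%s/smp_affinity_list" % (proc_root, irq)) as f:
        return parse_cpu_list(f.read())


def changed_irqs(before, after):
    """Return {irq: (before_cpus, after_cpus)} for IRQs whose affinity differs."""
    return {
        irq: (cpus, after.get(irq))
        for irq, cpus in before.items()
        if cpus != after.get(irq)
    }
```

On the compute host, take a snapshot with {irq: read_irq_affinity(irq) for irq in (72, 73, 74)} before the server action and again afterwards, then pass both snapshots to changed_irqs. An empty result means the pinning survived the action.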

Expected Behavior
-----------------
IRQs remain pinned to the expected CPUs after shutting off, suspending, or live-migrating the server.

Actual Behavior
-----------------
IRQs are no longer pinned to the previous CPUs.

Reproducibility
-----------------
100%.

System Configuration
-----------------
Any lab with SRIOV/PT capabilities

Branch/Pull Time/Commit
-----------------
2021-12-10

Last Pass
-----------------
Never tested before, new feature.

Timestamp/Logs
-----------------
// Before Shutting Off

compute-0:~$ IRQS="72, 73, 74"
compute-0:~$ for IRQ in ${IRQS//,/}; do echo "[$IRQ]"; cat /proc/irq/$IRQ/smp_affinity_list; done
[72]
6
[73]
6
[74]
6

// After Powering On

compute-0:~$ IRQS="72, 73, 74"
compute-0:~$ for IRQ in ${IRQS//,/}; do echo "[$IRQ]"; cat /proc/irq/$IRQ/smp_affinity_list; done
[72]
12,14,16,18,32,34,36,38
[73]
12,14,16,18,32,34,36,38
[74]
12,14,16,18,32,34,36,38

// No errors in the agent pods

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl logs -n openstack -l application=pci-irq-affinity-agent
2021-12-14 11:53:49,941 MainThread[6] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/agent.py.199 - INFO Enter PCIInterruptAffinity Agent
2021-12-14 11:53:50,019 MainThread[6] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/agent.py.178 - INFO rabbit://nova-rabbitmq-user:<email address hidden>:5672/nova
2021-12-14 11:53:50,544 MainThread[6] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/agent.py.191 - INFO Rabbitmq Client Started!
No handlers could be found for logger "oslo_messaging.server"
2021-12-14 15:56:09,012 Thread-15[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/utils.py.234 - INFO PCI IRQ 70 pinned to CPUS: 8,28
2021-12-14 15:56:09,012 Thread-15[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/utils.py.234 - INFO PCI IRQ 71 pinned to CPUS: 8,28
2021-12-14 15:56:09,013 Thread-15[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/driver.py.136 - INFO Instance=2ce1fd3e-c743-495e-8444-a167020cb1df: IRQs affined for pci_addr=0000:06:10.5, dev_id=, dev_type=, vendor_id=, product_id=, irqs=, msi_irqs=69, 70, 71, numa_node=0, cpulist=8,28
2021-12-14 16:53:48,249 Thread-55[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/agent.py.104 - INFO instance_created: uuid=3265f4d4-513e-4f03-b5f8-cd9815976dbb.
2021-12-14 16:54:05,214 Thread-55[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/utils.py.197 - INFO pinning pcpu list:set([26, 6])
2021-12-14 16:54:05,214 Thread-55[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/utils.py.108 - INFO pci_irq_affinity_mask: 1
2021-12-14 16:54:05,214 Thread-55[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/utils.py.234 - INFO PCI IRQ 72 pinned to CPUS: 6
2021-12-14 16:54:05,214 Thread-55[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/utils.py.234 - INFO PCI IRQ 73 pinned to CPUS: 6
2021-12-14 16:54:05,215 Thread-55[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/utils.py.234 - INFO PCI IRQ 74 pinned to CPUS: 6
2021-12-14 16:54:05,215 Thread-55[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/driver.py.136 - INFO Instance=3265f4d4-513e-4f03-b5f8-cd9815976dbb: IRQs affined for pci_addr=0000:06:10.3, dev_id=, dev_type=, vendor_id=, product_id=, irqs=, msi_irqs=72, 73, 74, numa_node=0, cpulist=6
2021-12-14 11:53:41,686 Thread-20[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/agent.py.146 - INFO instance_deleted: uuid=1c0543c5-ef83-48e2-9f9a-a41989f27f1a.
2021-12-14 17:53:11,656 Thread-19[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/agent.py.104 - INFO instance_created: uuid=b3b4f548-6e04-458b-9d20-0797ed806673.
2021-12-14 17:53:24,412 Thread-19[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/utils.py.197 - INFO pinning pcpu list:set([8, 10, 30])
2021-12-14 17:53:24,412 Thread-19[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/utils.py.108 - INFO pci_irq_affinity_mask: 0
2021-12-14 17:53:24,412 Thread-19[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/utils.py.234 - INFO PCI IRQ 66 pinned to CPUS: 10
2021-12-14 17:53:24,412 Thread-19[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/utils.py.234 - INFO PCI IRQ 67 pinned to CPUS: 10
2021-12-14 17:53:24,412 Thread-19[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/utils.py.234 - INFO PCI IRQ 68 pinned to CPUS: 10
2021-12-14 17:53:24,413 Thread-19[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/utils.py.234 - INFO PCI IRQ 69 pinned to CPUS: 10
2021-12-14 17:53:24,413 Thread-19[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/utils.py.234 - INFO PCI IRQ 70 pinned to CPUS: 10
2021-12-14 17:53:24,413 Thread-19[7] pci-interrupt-affinity./var/lib/openstack/lib/python2.7/site-packages/pci_irq_affinity/driver.py.136 - INFO Instance=b3b4f548-6e04-458b-9d20-0797ed806673: IRQs affined for pci_addr=0000:06:00.0, dev_id=, dev_type=, vendor_id=, product_id=, irqs=66, msi_irqs=67, 68, 69, 70, numa_node=0, cpulist=10

Alarms
-----------------
None

Test Activity
-----------------
Feature Testing

Workaround
-----------------
None

Changed in starlingx:
assignee: nobody → Heitor Matsui (heitormatsui)
status: New → In Progress
Ghada Khalil (gkhalil) wrote :

screening: stx.7.0 / medium - issue tied to testing of a new feature/capability on stx-openstack

summary: - PCI IRQ Affinity settings are not persistent after server actions
+ PCI IRQ Affinity settings are not persistent after VM server actions
tags: added: stx.distro.openstack
tags: added: stx.7.0
Changed in starlingx:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to utilities (master)

Reviewed: https://review.opendev.org/c/starlingx/utilities/+/821910
Committed: https://opendev.org/starlingx/utilities/commit/50c00c7450cfe4617474cb5b1305afe5dd629733
Submitter: "Zuul (22348)"
Branch: master

commit 50c00c7450cfe4617474cb5b1305afe5dd629733
Author: Heitor Matsui <email address hidden>
Date: Wed Dec 15 18:46:11 2021 -0300

    Add event types to Rabbit notification listener

    The pci-irq-affinity-agent currently receives notifications
    from Rabbit for only three instance event types: create, resize
    and delete. As a consequence, the agent doesn't set IRQ
    affinity settings correctly when other types of operation are
    executed against an instance, like a migration or shutdown.

    This commit refactors the listener notification endpoints to
    only two types: "instance online" and "instance offline", and
    adds other relevant event types to be received by the agent.

    Also this commit removes unused imports from the agent code.

    Test Plan:
    PASS: Verify that agent handles online operations for
          instances that match PCI IRQ affine conditions
    PASS: Verify that agent handles offline operations for
          instances that match PCI IRQ affine conditions
    PASS: Boot instance, trigger live migrate, verify agent
          resetting IRQ affinity on source host and setting
          IRQ affinity on destination host
    PASS: Boot instance, trigger cold migrate, verify agent
          resetting IRQ affinity on source host and setting
          IRQ affinity on destination host

    Regression:
    PASS: Verify that OpenStack applies successfully

    Closes-bug: 1955051
    Depends-on: https://review.opendev.org/c/starlingx/openstack-armada-app/+/822357
    Change-Id: I9f02730ec6110d5774c57065a3e82e9ae081d234
    Signed-off-by: Heitor Matsui <email address hidden>
    Co-authored-by: Iago Estrela <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
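
The commit message above describes collapsing the listener into two endpoint types, "instance online" and "instance offline". The following is a rough sketch of that idea only, not the agent's actual code: the event type strings are illustrative, modeled on Nova's notification naming, and the real set handled by the agent should be taken from the merged change. The names classify_event and AffinityDispatcher are hypothetical.

```python
# Illustrative event names only (assumed); see the merged change for the real list.
ONLINE_EVENTS = frozenset([
    "instance.create.end",
    "instance.power_on.end",
    "instance.resume.end",
])
OFFLINE_EVENTS = frozenset([
    "instance.delete.end",
    "instance.power_off.end",
    "instance.suspend.end",
])


def classify_event(event_type):
    """Map a notification event type to "online", "offline", or None."""
    if event_type in ONLINE_EVENTS:
        return "online"
    if event_type in OFFLINE_EVENTS:
        return "offline"
    return None


class AffinityDispatcher(object):
    """Route instance events to one of two callbacks (hypothetical helper)."""

    def __init__(self, on_online, on_offline):
        self._on_online = on_online    # e.g. apply IRQ affinity for the instance
        self._on_offline = on_offline  # e.g. reset IRQ affinity for the instance

    def handle(self, event_type, instance_uuid):
        kind = classify_event(event_type)
        if kind == "online":
            self._on_online(instance_uuid)
        elif kind == "offline":
            self._on_offline(instance_uuid)
        # Unrecognized event types are ignored and reported as None.
        return kind
```

Grouping events this way means a new lifecycle operation only needs an entry in one of the two sets rather than a dedicated handler, which seems to be the motivation for the refactor described in the commit.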