Brief Description
-----------------
When a user sets up virtual function (VF) network interfaces, the IRQ affinities of the new VF interfaces do not adhere to the global IRQ affinity setting specified via the irqaffinity= kernel command line option. According to user reports, this issue can result in watchdog-triggered system reboots due to platform CPU over-utilization, caused by interrupts being serviced (at least part of the time) on platform CPUs.
Severity
--------
Major: System may reboot due to heavy traffic.
Steps to Reproduce
------------------
I unfortunately do not have the exact steps to reproduce the watchdog-triggered reboots, but the following commands, provided by a colleague, should demonstrate the underlying issue. They are intended for a dual-CPU system where each CPU has 14 cores (with hyper-threading disabled). The intent is to force the irqaffinity= argument to be set to 26-27, then inspect the IRQ affinities of the VF interfaces and observe that they do not adhere to the irqaffinity= argument.
I did not reproduce the issue myself due to server unavailability, but the aforementioned colleague and I separately confirmed that a patched version of the kernel sets the IRQ affinities as expected.
```
NODE=controller-0
system host-lock ${NODE}
system host-label-assign ${NODE} sriovdp=enabled
system host-if-modify -m 1500 -n sriov1 -c pci-sriov -N 7 --vf-driver=netdevice ${NODE} data0
system host-if-add -c pci-sriov ${NODE} sriov2 vf sriov1 -N 4 --vf-driver=vfio
system datanetwork-add sriovnet1 vlan
system datanetwork-add sriovnet2 vlan
system interface-datanetwork-assign ${NODE} sriov1 sriovnet1
system interface-datanetwork-assign ${NODE} sriov2 sriovnet2
system host-label-assign ${NODE} kube-cpu-mgr-policy=static
system host-label-assign ${NODE} kube-topology-mgr-policy=restricted
system host-cpu-modify -f application-isolated -p0 12 ${NODE}
system host-cpu-modify -f application-isolated -p1 12 ${NODE}
system host-unlock ${NODE}
```
Wait for the system to reboot, then bring up each VF interface using "ip link set IFACE up", where IFACE is the interface name.
Then, inspect the IRQ thread affinities using 'ps-sched.sh | grep "irq/"', and the IRQ affinities by reading /proc/irq/IRQ_NUM/smp_affinity_list and /proc/irq/IRQ_NUM/effective_affinity_list, where IRQ_NUM is the number of an IRQ that belongs to a VF interface.
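The inspection step can be scripted. The following is a minimal sketch, not a tested procedure for this bug: it assumes that the IRQ names shown in /proc/interrupts contain the VF interface name (this depends on the driver), and the function name show_irq_affinity and its optional second argument (an alternate proc root, used only for testing) are hypothetical.

```shell
# Print the configured and effective CPU affinity of every IRQ whose
# /proc/interrupts entry matches PATTERN (for example, a VF interface name).
# Usage: show_irq_affinity PATTERN [PROC_ROOT]
show_irq_affinity() {
    pattern=$1
    proc_root=${2:-/proc}
    # The first column of /proc/interrupts is the IRQ number followed by ':'.
    grep -- "$pattern" "$proc_root/interrupts" |
    awk '{ sub(":", "", $1); print $1 }' |
    while read -r irq; do
        # Skip non-numeric rows such as NMI or LOC.
        case $irq in ''|*[!0-9]*) continue ;; esac
        printf 'irq %s smp=%s effective=%s\n' "$irq" \
            "$(cat "$proc_root/irq/$irq/smp_affinity_list")" \
            "$(cat "$proc_root/irq/$irq/effective_affinity_list")"
    done
}
```

For example, "show_irq_affinity IFACE" prints one line per matching IRQ, which can then be compared against the irqaffinity= value by eye.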
Expected Behavior
------------------
Virtual function interfaces should have IRQ affinities that adhere to the irqaffinity= command line argument's value.
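The expectation above can be checked mechanically. This is a rough sketch under stated assumptions: the helper names get_irqaffinity and check_irqs are hypothetical, the IRQ numbers must be supplied by the caller, and the comparison is a plain string match, so it only works when irqaffinity= is written in the same normalized form that the kernel prints (for example 26-27).

```shell
# Extract the irqaffinity= value from a kernel command line string.
# Usage: get_irqaffinity "$(cat /proc/cmdline)"
get_irqaffinity() {
    for arg in $1; do
        case $arg in
            irqaffinity=*) printf '%s\n' "${arg#irqaffinity=}" ;;
        esac
    done
}

# Report every listed IRQ whose smp_affinity_list differs from EXPECTED.
# Returns non-zero if at least one IRQ does not match.
# Usage: check_irqs EXPECTED PROC_ROOT IRQ_NUM...
check_irqs() {
    expected=$1
    proc_root=$2
    shift 2
    status=0
    for irq in "$@"; do
        actual=$(cat "$proc_root/irq/$irq/smp_affinity_list")
        if [ "$actual" != "$expected" ]; then
            printf 'irq %s: %s (expected %s)\n' "$irq" "$actual" "$expected"
            status=1
        fi
    done
    return $status
}
```

A possible invocation is check_irqs "$(get_irqaffinity "$(cat /proc/cmdline)")" /proc IRQ_NUM..., where the IRQ numbers are those of the VF interfaces. On an affected kernel, the VF IRQs would be expected to show up in the output.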
Actual Behavior
----------------
IRQ affinities are set up according to the device driver's preference.
Reproducibility
---------------
Reproducible.
System Configuration
--------------------
This issue should be reproducible on All-in-One systems running a version of StarlingX with the v5.10 kernel.
Branch/Pull Time/Commit
-----------------------
Recent StarlingX releases with the v5.10 kernel should reproduce this issue.
Last Pass
---------
StarlingX releases with the v3.10-based kernel do not reproduce this issue.
Timestamp/Logs
--------------
None available at this point.
Test Activity
-------------
Performance testing.
Workaround
----------
Users could modify the IRQ affinities manually by writing to /proc/irq/IRQ_NUM/smp_affinity_list, but this is not a scalable solution.
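For illustration, the manual workaround amounts to something like the sketch below. The function name set_irq_affinity and its argument order are hypothetical. The writes require root and change only runtime state, so they do not survive a reboot.

```shell
# Write the CPU list CPUS to smp_affinity_list for each IRQ given.
# Usage: set_irq_affinity CPUS PROC_ROOT IRQ_NUM...
set_irq_affinity() {
    cpus=$1
    proc_root=$2
    shift 2
    for irq in "$@"; do
        # The write can fail, for example if the IRQ does not exist.
        if ! printf '%s\n' "$cpus" > "$proc_root/irq/$irq/smp_affinity_list"; then
            printf 'failed to set affinity for irq %s\n' "$irq" >&2
        fi
    done
}
```

For example, "set_irq_affinity 26-27 /proc IRQ_NUM..." applies the 26-27 CPU list from the reproduction steps to the listed IRQs. Every VF IRQ has to be enumerated and rewritten by hand, which is why this approach does not scale.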