IRQ affinities of newly set up network interfaces do not align to irqaffinity=
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Medium | M. Vefa Bicakci | | |
Bug Description
Brief Description
-----------------
When a user sets up new network interfaces, the IRQ affinities of the new interfaces do not adhere to the global IRQ affinity setting specified via the irqaffinity= kernel command line option. This is most visible with virtual function (VF) interfaces, but applies to all interfaces that use a specific kernel API function (irq_set_
According to user reports, this issue can result in watchdog-triggered system reboots due to platform CPU over-utilization, caused by interrupts being serviced (at least part of the time) on platform CPUs.
Severity
--------
Major: System may reboot due to heavy traffic.
Steps to Reproduce
------------------
I unfortunately do not have the exact steps to reproduce the watchdog-triggered reboots, but the following commands ought to demonstrate the issue. A colleague provided these commands. They are intended to reproduce the issue using a dual-CPU system where each CPU has 14 cores (with hyper-threading disabled). The intent is to force the irqaffinity= argument to be set to 26-27, and inspect the IRQ affinities of the VF interfaces to notice that they do not adhere to the irqaffinity= argument.
Update: This procedure was verified as of Friday, 2022-01-21.
```
NODE=controller-0
system host-lock ${NODE}
system host-label-assign ${NODE} sriovdp=enabled
system host-if-modify -m 1500 -n sriov1 -c pci-sriov -N 7 --vf-driver=
system host-if-add -c pci-sriov ${NODE} sriov2 vf sriov1 -N 4 --vf-driver=vfio
system datanetwork-add sriovnet1 vlan
system datanetwork-add sriovnet2 vlan
system interface-
system interface-
system host-label-assign ${NODE} kube-cpu-
system host-label-assign ${NODE} kube-topology-
system host-cpu-modify -f application-
system host-cpu-modify -f application-
system host-unlock ${NODE}
```
Wait for the system to reboot, then bring up the VF interfaces using "ip link set IFACE up" where IFACE is an interface name.
Then, inspect the IRQ thread affinities using 'ps-sched.sh | grep "irq/"' and the IRQ affinities using /proc/irq/
The same exercise could be carried out with non-VF interfaces as well, by modifying the /etc/rc.
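As a rough sketch of the inspection step, the shell function below prints the affinity list of every IRQ whose /proc/interrupts entry mentions a given name. It assumes the standard procfs files /proc/interrupts and /proc/irq/<N>/smp_affinity_list. The function name list_irq_affinities and its optional second argument (an alternate procfs root, handy for trying it against a copy) are hypothetical and not part of the original procedure.

```shell
#!/bin/sh
# Print "<irq><TAB><cpu list>" for each IRQ whose /proc/interrupts line
# contains the given fixed string (for example a driver or interface name).
# $1: string to look for; $2: procfs root (hypothetical override, default /proc).
list_irq_affinities() {
    pattern=$1
    proc_root=${2:-/proc}
    # Numbered IRQ lines start with "<number>:"; header and NMI-style lines do not.
    awk -v pat="$pattern" \
        '$1 ~ /^[0-9]+:$/ && index($0, pat) { sub(":", "", $1); print $1 }' \
        "$proc_root/interrupts" |
    while read -r irq; do
        printf '%s\t%s\n' "$irq" "$(cat "$proc_root/irq/$irq/smp_affinity_list")"
    done
}
```

For example, `list_irq_affinities iavf` should list the IRQs registered by the iavf driver, assuming the driver name appears in the interrupt's action name on the system under test.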
Expected Behavior
------------------
Network interfaces set up after the system has fully initialized should have IRQ affinities that adhere to the irqaffinity= command line argument's value.
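To know which CPUs the interrupts are expected to land on, the irqaffinity= value can be read back from the running kernel's command line. The following is a minimal sketch that assumes the command line is available in /proc/cmdline; the function name get_irqaffinity and its optional file argument are hypothetical.

```shell
#!/bin/sh
# Print the value of the irqaffinity= kernel argument, or nothing if it is absent.
# $1: file holding the kernel command line (hypothetical override, default /proc/cmdline).
get_irqaffinity() {
    cmdline_file=${1:-/proc/cmdline}
    # Put each argument on its own line, keep only the irqaffinity= value,
    # and use the last occurrence if the argument is repeated.
    tr ' ' '\n' < "$cmdline_file" | sed -n 's/^irqaffinity=//p' | tail -n 1
}
```

On the system described in the reproduction steps, `get_irqaffinity` would be expected to print 26-27.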
Actual Behavior
----------------
IRQ affinities are set up according to the device driver's preference.
In the example above, the iavf device driver affines its interrupts to CPUs 0, 1, 2 and 3, instead of CPUs 26 and 27 as indicated by the irqaffinity= kernel command line argument. (As a side note, CPUs 2 and 3 were isolated CPUs in the example above.)
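The mismatch described above can be checked mechanically by testing whether an IRQ's CPU list is contained in the irqaffinity= list. The sketch below is one way to do that in shell. The helper names expand_cpulist and cpulist_within are hypothetical, and the code assumes CPU lists made of plain numbers and ranges (for example "0-3,26"), which is the form printed in smp_affinity_list.

```shell
#!/bin/sh
# Expand a CPU list such as "0-3,26" into one CPU number per line.
expand_cpulist() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue
        # A single CPU ("26") is treated as the range 26-26.
        seq "$first" "${last:-$first}"
    done
}

# Succeed only if every CPU in list $1 also appears in list $2.
cpulist_within() {
    allowed=" $(expand_cpulist "$2" | tr '\n' ' ')"
    for cpu in $(expand_cpulist "$1"); do
        case "$allowed" in
            *" $cpu "*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

With the values from the example, `cpulist_within 0-3 26-27` fails, which corresponds to the reported behavior, while `cpulist_within 26-27 26-27` succeeds.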
Reproducibility
---------------
Reproducible.
System Configuration
-------
This issue is reproducible with systems running a version of StarlingX with the v5.10 kernel.
It was reproduced with the compute-0 node of a two-controller-node and three-compute-node system, as an example.
One of my colleagues also reproduced this issue with an All-in-One (AIO) simplex system.
Branch/Pull Time/Commit
-------
Recent StarlingX releases with the v5.10 kernel should reproduce this issue.
Last Pass
---------
StarlingX releases with the v3.10-based kernel do not reproduce this issue.
Timestamp/Logs
--------------
None available at this point.
Test Activity
-------------
Performance testing.
Workaround
----------
Users could modify the IRQ affinities manually by writing to /proc/irq/
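A minimal sketch of such a manual workaround follows. It assumes the per-IRQ file /proc/irq/<N>/smp_affinity_list is the one being written, needs to run as root, and does not persist across reboots. The kernel may also reject the write for some interrupts. The function name reaffine_irqs and its optional third argument (an alternate procfs root) are hypothetical.

```shell
#!/bin/sh
# Re-affine every IRQ whose /proc/interrupts line contains a fixed string.
# $1: string to look for; $2: CPU list to apply (e.g. "26-27");
# $3: procfs root (hypothetical override, default /proc).
reaffine_irqs() {
    pattern=$1
    cpus=$2
    proc_root=${3:-/proc}
    awk -v pat="$pattern" \
        '$1 ~ /^[0-9]+:$/ && index($0, pat) { sub(":", "", $1); print $1 }' \
        "$proc_root/interrupts" |
    while read -r irq; do
        if printf '%s\n' "$cpus" > "$proc_root/irq/$irq/smp_affinity_list"; then
            echo "IRQ $irq -> $cpus"
        else
            echo "IRQ $irq: could not set affinity" >&2
        fi
    done
}
```

For the example in this report, `reaffine_irqs iavf 26-27` would move the iavf interrupts to CPUs 26 and 27, assuming the driver name appears in the interrupt names.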
Changed in starlingx: | |
assignee: | nobody → M. Vefa Bicakci (vbicakci) |
tags: | added: stx.7.0 stx.distro.other |
Changed in starlingx: | |
importance: | Undecided → Medium |
description: | updated |
summary: |
- IRQ affinities of newly set up VF interfaces do not align to
+ IRQ affinities of newly set up network interfaces do not align to
  irqaffinity= |
description: | updated |
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/kernel/+/825343