2022-01-24 16:22:04 |
M. Vefa Bicakci |
description |
Brief Description
-----------------
When a user sets up virtual function (VF) network interfaces, the IRQ affinities of the new VF interfaces do not adhere to the global IRQ affinity setting specified via the irqaffinity= kernel command line option. According to user reports, this issue can result in watchdog-triggered system reboots due to platform CPU over-utilization, caused by interrupts being handled (at least part of the time) on platform CPUs.
Severity
--------
Major: System may reboot due to heavy traffic.
Steps to Reproduce
------------------
I unfortunately do not have the exact steps to reproduce the watchdog-triggered reboots, but the following commands, provided by a colleague, ought to demonstrate the issue. They are intended for a dual-CPU system where each CPU has 14 cores (with hyper-threading disabled). The intent is to force the irqaffinity= argument to be set to 26-27 and then to inspect the IRQ affinities of the VF interfaces to observe that they do not adhere to it.
I should note that I was unable to reproduce the issue myself due to server unavailability, but the aforementioned colleague and I separately confirmed that a patched version of the kernel sets the IRQ affinities as expected.
```
NODE=controller-0
system host-lock ${NODE}
system host-label-assign ${NODE} sriovdp=enabled
system host-if-modify -m 1500 -n sriov1 -c pci-sriov -N 7 --vf-driver=netdevice ${NODE} data0
system host-if-add -c pci-sriov ${NODE} sriov2 vf sriov1 -N 4 --vf-driver=vfio
system datanetwork-add sriovnet1 vlan
system datanetwork-add sriovnet2 vlan
system interface-datanetwork-assign ${NODE} sriov1 sriovnet1
system interface-datanetwork-assign ${NODE} sriov2 sriovnet2
system host-label-assign ${NODE} kube-cpu-mgr-policy=static
system host-label-assign ${NODE} kube-topology-mgr-policy=restricted
system host-cpu-modify -f application-isolated -p0 12 ${NODE}
system host-cpu-modify -f application-isolated -p1 12 ${NODE}
system host-unlock ${NODE}
```
Wait for the system to reboot, then bring up the VF interfaces using "ip link set IFACE up" where IFACE is an interface name.
Then, inspect the IRQ thread affinities using 'ps-sched.sh | grep "irq/"' and the IRQ affinities using /proc/irq/IRQ_NUM/smp_affinity_list and /proc/irq/IRQ_NUM/effective_affinity_list, where IRQ_NUM is an IRQ number.
Expected Behavior
------------------
Virtual function interfaces should have IRQ affinities that adhere to the irqaffinity= command line argument's value.
Actual Behavior
----------------
IRQ affinities are set up according to the device driver's preference.
Reproducibility
---------------
Reproducible.
System Configuration
--------------------
This issue should be reproducible on All-in-One systems running a version of StarlingX with the v5.10 kernel.
Branch/Pull Time/Commit
-----------------------
Recent StarlingX releases with the v5.10 kernel should reproduce this issue.
Last Pass
---------
StarlingX releases with the v3.10-based kernel do not reproduce this issue.
Timestamp/Logs
--------------
None available at this point.
Test Activity
-------------
Performance testing.
Workaround
----------
Users could modify the IRQ affinities manually by writing to /proc/irq/IRQ_NUM/smp_affinity_list, but this is not a scalable solution. |
Brief Description
-----------------
When a user sets up virtual function (VF) network interfaces, the IRQ affinities of the new VF interfaces do not adhere to the global IRQ affinity setting specified via the irqaffinity= kernel command line option. According to user reports, this issue can result in watchdog-triggered system reboots due to platform CPU over-utilization, caused by interrupts being handled (at least part of the time) on platform CPUs.
Severity
--------
Major: System may reboot due to heavy traffic.
Steps to Reproduce
------------------
I unfortunately do not have the exact steps to reproduce the watchdog-triggered reboots, but the following commands, provided by a colleague, ought to demonstrate the issue. They are intended for a dual-CPU system where each CPU has 14 cores (with hyper-threading disabled). The intent is to force the irqaffinity= argument to be set to 26-27 and then to inspect the IRQ affinities of the VF interfaces to observe that they do not adhere to it.
Update: This procedure was verified as of Friday, 2022-01-21.
```
NODE=controller-0
system host-lock ${NODE}
system host-label-assign ${NODE} sriovdp=enabled
system host-if-modify -m 1500 -n sriov1 -c pci-sriov -N 7 --vf-driver=netdevice ${NODE} data0
system host-if-add -c pci-sriov ${NODE} sriov2 vf sriov1 -N 4 --vf-driver=vfio
system datanetwork-add sriovnet1 vlan
system datanetwork-add sriovnet2 vlan
system interface-datanetwork-assign ${NODE} sriov1 sriovnet1
system interface-datanetwork-assign ${NODE} sriov2 sriovnet2
system host-label-assign ${NODE} kube-cpu-mgr-policy=static
system host-label-assign ${NODE} kube-topology-mgr-policy=restricted
system host-cpu-modify -f application-isolated -p0 12 ${NODE}
system host-cpu-modify -f application-isolated -p1 12 ${NODE}
system host-unlock ${NODE}
```
Wait for the system to reboot, then bring up the VF interfaces using "ip link set IFACE up" where IFACE is an interface name.
Then, inspect the IRQ thread affinities using 'ps-sched.sh | grep "irq/"' and the IRQ affinities using /proc/irq/IRQ_NUM/smp_affinity_list and /proc/irq/IRQ_NUM/effective_affinity_list, where IRQ_NUM is an IRQ number.
Expected Behavior
------------------
Virtual function interfaces should have IRQ affinities that adhere to the irqaffinity= command line argument's value.
Actual Behavior
----------------
IRQ affinities are set up according to the device driver's preference.
In the example above, the iavf device driver affines its interrupts to CPUs 0, 1, 2 and 3, instead of CPUs 26 and 27 as indicated by the irqaffinity= kernel command line argument. (As a side note, CPUs 2 and 3 were isolated CPUs in the example above.)
Reproducibility
---------------
Reproducible.
System Configuration
--------------------
This issue is reproducible with systems running a version of StarlingX with the v5.10 kernel.
As an example, it was reproduced on the compute-0 node of a system with two controller nodes and three compute nodes.
One of my colleagues reproduced this issue with an All-in-One (AIO) simplex system as well.
Branch/Pull Time/Commit
-----------------------
Recent StarlingX releases with the v5.10 kernel should reproduce this issue.
Last Pass
---------
StarlingX releases with the v3.10-based kernel do not reproduce this issue.
Timestamp/Logs
--------------
None available at this point.
Test Activity
-------------
Performance testing.
Workaround
----------
Users could modify the IRQ affinities manually by writing to /proc/irq/IRQ_NUM/smp_affinity_list, but this is not a scalable solution. |
|
2022-02-07 22:50:01 |
M. Vefa Bicakci |
description |
Brief Description
-----------------
When a user sets up virtual function (VF) network interfaces, the IRQ affinities of the new VF interfaces do not adhere to the global IRQ affinity setting specified via the irqaffinity= kernel command line option. According to user reports, this issue can result in watchdog-triggered system reboots due to platform CPU over-utilization, caused by interrupts being handled (at least part of the time) on platform CPUs.
Severity
--------
Major: System may reboot due to heavy traffic.
Steps to Reproduce
------------------
I unfortunately do not have the exact steps to reproduce the watchdog-triggered reboots, but the following commands, provided by a colleague, ought to demonstrate the issue. They are intended for a dual-CPU system where each CPU has 14 cores (with hyper-threading disabled). The intent is to force the irqaffinity= argument to be set to 26-27 and then to inspect the IRQ affinities of the VF interfaces to observe that they do not adhere to it.
Update: This procedure was verified as of Friday, 2022-01-21.
```
NODE=controller-0
system host-lock ${NODE}
system host-label-assign ${NODE} sriovdp=enabled
system host-if-modify -m 1500 -n sriov1 -c pci-sriov -N 7 --vf-driver=netdevice ${NODE} data0
system host-if-add -c pci-sriov ${NODE} sriov2 vf sriov1 -N 4 --vf-driver=vfio
system datanetwork-add sriovnet1 vlan
system datanetwork-add sriovnet2 vlan
system interface-datanetwork-assign ${NODE} sriov1 sriovnet1
system interface-datanetwork-assign ${NODE} sriov2 sriovnet2
system host-label-assign ${NODE} kube-cpu-mgr-policy=static
system host-label-assign ${NODE} kube-topology-mgr-policy=restricted
system host-cpu-modify -f application-isolated -p0 12 ${NODE}
system host-cpu-modify -f application-isolated -p1 12 ${NODE}
system host-unlock ${NODE}
```
Wait for the system to reboot, then bring up the VF interfaces using "ip link set IFACE up" where IFACE is an interface name.
Then, inspect the IRQ thread affinities using 'ps-sched.sh | grep "irq/"' and the IRQ affinities using /proc/irq/IRQ_NUM/smp_affinity_list and /proc/irq/IRQ_NUM/effective_affinity_list, where IRQ_NUM is an IRQ number.
Expected Behavior
------------------
Virtual function interfaces should have IRQ affinities that adhere to the irqaffinity= command line argument's value.
Actual Behavior
----------------
IRQ affinities are set up according to the device driver's preference.
In the example above, the iavf device driver affines its interrupts to CPUs 0, 1, 2 and 3, instead of CPUs 26 and 27 as indicated by the irqaffinity= kernel command line argument. (As a side note, CPUs 2 and 3 were isolated CPUs in the example above.)
Reproducibility
---------------
Reproducible.
System Configuration
--------------------
This issue is reproducible with systems running a version of StarlingX with the v5.10 kernel.
As an example, it was reproduced on the compute-0 node of a system with two controller nodes and three compute nodes.
One of my colleagues reproduced this issue with an All-in-One (AIO) simplex system as well.
Branch/Pull Time/Commit
-----------------------
Recent StarlingX releases with the v5.10 kernel should reproduce this issue.
Last Pass
---------
StarlingX releases with the v3.10-based kernel do not reproduce this issue.
Timestamp/Logs
--------------
None available at this point.
Test Activity
-------------
Performance testing.
Workaround
----------
Users could modify the IRQ affinities manually by writing to /proc/irq/IRQ_NUM/smp_affinity_list, but this is not a scalable solution. |
Brief Description
-----------------
When a user sets up new network interfaces, the IRQ affinities of the new interfaces do not adhere to the global IRQ affinity setting specified via the irqaffinity= kernel command line option. This is most visible with virtual function (VF) interfaces, but it applies to all interfaces whose drivers use a specific kernel API function (irq_set_affinity_hint).
According to user reports, this issue can result in watchdog-triggered system reboots due to platform CPU over-utilization, caused by interrupts being handled (at least part of the time) on platform CPUs.
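To confirm the boot-time setting on an affected node, the running kernel's command line can be checked. A minimal sketch (the helper function name is mine, not a StarlingX tool):

```shell
# Extract the irqaffinity= value from a kernel command line string
# (factored into a function so it can be exercised on sample input).
irqaffinity_of() {
    grep -o 'irqaffinity=[^ ]*' <<< "$1" | cut -d= -f2
}

# On a live node: prints the CPU list the kernel booted with
# (empty output if the option is absent).
irqaffinity_of "$(cat /proc/cmdline)"
```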
Severity
--------
Major: System may reboot due to heavy traffic.
Steps to Reproduce
------------------
I unfortunately do not have the exact steps to reproduce the watchdog-triggered reboots, but the following commands, provided by a colleague, ought to demonstrate the issue. They are intended for a dual-CPU system where each CPU has 14 cores (with hyper-threading disabled). The intent is to force the irqaffinity= argument to be set to 26-27 and then to inspect the IRQ affinities of the VF interfaces to observe that they do not adhere to it.
Update: This procedure was verified as of Friday, 2022-01-21.
```
NODE=controller-0
system host-lock ${NODE}
system host-label-assign ${NODE} sriovdp=enabled
system host-if-modify -m 1500 -n sriov1 -c pci-sriov -N 7 --vf-driver=netdevice ${NODE} data0
system host-if-add -c pci-sriov ${NODE} sriov2 vf sriov1 -N 4 --vf-driver=vfio
system datanetwork-add sriovnet1 vlan
system datanetwork-add sriovnet2 vlan
system interface-datanetwork-assign ${NODE} sriov1 sriovnet1
system interface-datanetwork-assign ${NODE} sriov2 sriovnet2
system host-label-assign ${NODE} kube-cpu-mgr-policy=static
system host-label-assign ${NODE} kube-topology-mgr-policy=restricted
system host-cpu-modify -f application-isolated -p0 12 ${NODE}
system host-cpu-modify -f application-isolated -p1 12 ${NODE}
system host-unlock ${NODE}
```
Wait for the system to reboot, then bring up the VF interfaces using "ip link set IFACE up" where IFACE is an interface name.
Then, inspect the IRQ thread affinities using 'ps-sched.sh | grep "irq/"' and the IRQ affinities using /proc/irq/IRQ_NUM/smp_affinity_list and /proc/irq/IRQ_NUM/effective_affinity_list, where IRQ_NUM is an IRQ number.
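As a concrete sketch of this inspection step (the interface name and helper function below are illustrative, not part of the product), the per-IRQ affinities of an interface can be collected and compared against the irqaffinity= value along these lines:

```shell
#!/bin/bash
# Illustrative script; IFACE must be replaced with an actual VF
# interface name on the system under test.
IFACE="${IFACE:-enp24s0f0v0}"   # hypothetical VF interface name
EXPECTED="26-27"                # the value passed via irqaffinity=

# Expand a kernel CPU list such as "26-27,30" into "26 27 30".
expand_cpulist() {
    local out="" part
    IFS=',' read -ra parts <<< "$1"
    for part in "${parts[@]}"; do
        case "$part" in
            *-*) out+=" $(seq -s ' ' "${part%-*}" "${part#*-}")" ;;
            *)   out+=" $part" ;;
        esac
    done
    printf '%s\n' "${out# }"
}

# For each IRQ belonging to the interface, print the configured and
# effective affinity lists so they can be checked against EXPECTED.
for irq in $(awk -v ifc="$IFACE" -F: '$0 ~ ifc {gsub(/ /, "", $1); print $1}' /proc/interrupts); do
    printf 'IRQ %s: smp_affinity_list=%s effective=%s (expected: %s)\n' \
        "$irq" \
        "$(cat /proc/irq/"$irq"/smp_affinity_list)" \
        "$(cat /proc/irq/"$irq"/effective_affinity_list)" \
        "$(expand_cpulist "$EXPECTED")"
done
```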
The same exercise could be carried out with non-VF interfaces as well, by modifying the /etc/rc.d/init.d/affine-platform.sh and /usr/bin/affine-interrupts.sh initialization scripts so that they do not manipulate IRQ affinities and then rebooting the system.
Expected Behavior
------------------
Network interfaces set up after the system has fully initialized should have IRQ affinities that adhere to the irqaffinity= command line argument's value.
Actual Behavior
----------------
IRQ affinities are set up according to the device driver's preference.
In the example above, the iavf device driver affines its interrupts to CPUs 0, 1, 2 and 3, instead of CPUs 26 and 27 as indicated by the irqaffinity= kernel command line argument. (As a side note, CPUs 2 and 3 were isolated CPUs in the example above.)
Reproducibility
---------------
Reproducible.
System Configuration
--------------------
This issue is reproducible with systems running a version of StarlingX with the v5.10 kernel.
As an example, it was reproduced on the compute-0 node of a system with two controller nodes and three compute nodes.
One of my colleagues reproduced this issue with an All-in-One (AIO) simplex system as well.
Branch/Pull Time/Commit
-----------------------
Recent StarlingX releases with the v5.10 kernel should reproduce this issue.
Last Pass
---------
StarlingX releases with the v3.10-based kernel do not reproduce this issue.
Timestamp/Logs
--------------
None available at this point.
Test Activity
-------------
Performance testing.
Workaround
----------
Users could modify the IRQ affinities manually by writing to /proc/irq/IRQ_NUM/smp_affinity_list, but this is not a scalable solution.
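As a sketch of this workaround (the interface name and helper function are illustrative, and writing under /proc/irq requires root), all IRQs of a given interface could be re-pinned to the irqaffinity= CPU set as follows:

```shell
#!/bin/bash
# Extract IRQ numbers for lines matching an interface name from
# /proc/interrupts-formatted input (factored into a function so the
# parsing can be exercised on sample input).
irqs_for() {
    awk -v ifc="$1" -F: '$0 ~ ifc {gsub(/ /, "", $1); print $1}'
}

IFACE="enp24s0f0v0"   # placeholder: an actual VF interface name
CPUS="26-27"          # the CPU set passed via irqaffinity=

# Re-pin each of the interface's IRQs; must be run as root.
for irq in $(irqs_for "$IFACE" < /proc/interrupts); do
    echo "$CPUS" > "/proc/irq/$irq/smp_affinity_list"
done
```

This has to be re-done every time an interface is recreated or its driver reloaded, which is why it does not scale.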
|