Bug in iavf driver v4.5.3 results in a system hang
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX |
Fix Released
|
High
|
M. Vefa Bicakci |
Bug Description
Brief Description
-----------------
iavf driver version 4.5.3 has a bug that results in a system hang when virtual function (VF) interfaces are reset rapidly, by, for example, manipulating the trust and VLAN settings of a VF interface.
This bug was originally resolved in iavf version 4.6.1, and it is also resolved in iavf version 4.5.3.2.
Severity
--------
Major: Attaching (or possibly detaching) VF interfaces to Kubernetes pods may result in a system hang with an unpredictable probability.
Steps to Reproduce
------------------
The most straightforward approach is to use a script like the following to reproduce the issue:
```
#!/bin/bash
if test "${EUID}" -ne 0; then
echo "Please run this script as root"
exit 1
fi
# Debugging options
sysctl -w kernel.sysrq=1 ;
dmesg -n debug;
dmesg -E;
sync
echo 3 > /proc/sys/
limit="${1:-4000}"
iter=1;
set -x;
while test "${iter}" -le "${limit}"; do
tee <<<"Iteration: ${iter}" /dev/kmsg;
let iter+=1;
# enp81s17 is the first VF interface.
# enp81s0f2 is the corresponding PF interface.
ip l set dev enp81s17 up;
ip l set dev enp81s0f2 vf 0 trust on;
ip l set dev enp81s0f2 vf 0 vlan 333;
ip l set dev enp81s0f2 vf 0 trust off;
ip l set dev enp81s0f2 vf 0 vlan 310;
ip l set dev enp81s17 down;
sleep 0.1;
done
```
Expected Behavior
------------------
The script should not result in a system hang
Actual Behavior
----------------
The script results in a system hang, usually within 200 iterations of the loop.
Eventually, the iavf driver emits the following logs, which indicates that the issue has been triggered:
```
iavf 0000:xx:xx.x: Failed to init adminq: -53
iavf 0000:xx:xx.x: failed to allocate resources during reinit
```
Afterwards, the VF bring-down operation (ip l set dev ... down) hangs, followed by an eventual unresponsiveness of the rest of the system.
Reproducibility
---------------
Reliably reproducible with the script.
System Configuration
-------
This issue was reproduced on an All-in-One system with a quad-port Intel E810 NIC managed by the ice driver and VF interfaces managed by the iavf driver.
Branch/Pull Time/Commit
-------
The issue appears to be present in StarlingX versions that ship with iavf 4.5.3. Other versions of the iavf driver have not been verified, but we do know that the issue is fixed in iavf driver v4.6.1 and the intermediate release v4.5.3.2.
Last Pass
---------
Not sure.
Timestamp/Logs
--------------
(Please see the actual behaviour section.)
Test Activity
-------------
Normal use.
Workaround
----------
None.
Changed in starlingx: | |
assignee: | nobody → M. Vefa Bicakci (vbicakci) |
status: | New → In Progress |
description: | updated |
Changed in starlingx: | |
importance: | Undecided → High |
tags: | added: stx.9.0 stx.distro.other |
Fix proposed to branch: master /review. opendev. org/c/starlingx /kernel/ +/897242
Review: https:/