Reviewed: https://review.opendev.org/c/starlingx/kernel/+/897242
Committed: https://opendev.org/starlingx/kernel/commit/06a162c47d25064e573c139e05e7fb3278d114f4
Submitter: "Zuul (22348)"
Branch: master
commit 06a162c47d25064e573c139e05e7fb3278d114f4
Author: M. Vefa Bicakci <email address hidden>
Date: Tue Sep 12 12:39:51 2023 +0000
intel-iavf: Update from v4.5.3 to v4.5.3.2
This commit updates the default Intel NIC driver bundle version of the
iavf driver from v4.5.3 to v4.5.3.2 to resolve an issue in which the
system hangs after the iavf driver prints the following messages:
```
iavf 0000:51:11.0: Failed to init adminq: -53
iavf 0000:51:11.0: failed to allocate resources during reinit
```
This issue can be reproduced on iavf v4.5.3 with the following commands,
which carry out rapid virtual function (VF) interface resets:
```
while true; do
    # enp81s17 is the first VF interface
    ip link set dev enp81s17 up
    # enp81s0f2 is the corresponding PF interface
    ip link set dev enp81s0f2 vf 0 trust on
    ip link set dev enp81s0f2 vf 0 vlan 333
    ip link set dev enp81s0f2 vf 0 trust off
    ip link set dev enp81s0f2 vf 0 vlan 310
    ip link set dev enp81s17 down
    sleep 0.1
done
```
Eventually, iavf reports the aforementioned error messages, and the
operation that brings the VF interface down hangs. Many unrelated
processes then hang as well, likely because they block waiting for the
"rtnl" mutex held by the hung operation.
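The reproduction loop above runs until it is interrupted. For fixed-count runs, such as the 4000-iteration run described in the Verification section, a bounded variant can be more convenient. The following is a minimal sketch, not the script that was actually used: the function names (`iavf_error_count`, `vf_reset_once`, `vf_reset_loop`), the stop-on-first-failure behaviour and the final `dmesg` comparison are assumptions added for illustration. It must be run as root, and on an affected driver it may hang at the `down` step instead of returning.

```shell
#!/bin/sh
# Minimal sketch of a bounded variant of the reproduction loop above.
# Assumptions (not part of the original script): the interface names and
# iteration count are arguments, the loop stops at the first failing "ip"
# command, and the kernel log is checked for new iavf errors at the end.

# Print how many lines of kernel log text on stdin match either message.
iavf_error_count() {
    grep -Ec 'Failed to init adminq|failed to allocate resources during reinit'
}

# One pass of the original loop body; fails if any "ip" command fails.
# $1 = VF interface, $2 = corresponding PF interface.
vf_reset_once() {
    ip link set dev "$1" up &&
        ip link set dev "$2" vf 0 trust on &&
        ip link set dev "$2" vf 0 vlan 333 &&
        ip link set dev "$2" vf 0 trust off &&
        ip link set dev "$2" vf 0 vlan 310 &&
        ip link set dev "$1" down
}

vf_reset_loop() {
    vf_if=$1      # VF interface, e.g. enp81s17
    pf_if=$2      # corresponding PF interface, e.g. enp81s0f2
    max_iter=$3   # number of iterations, e.g. 4000
    # Baseline count, so messages from earlier runs are not reported.
    before=$(dmesg | iavf_error_count)
    i=0
    while [ "$i" -lt "$max_iter" ]; do
        i=$((i + 1))
        if ! vf_reset_once "$vf_if" "$pf_if"; then
            echo "ip command failed in iteration $i"
            return 1
        fi
        sleep 0.1
    done
    # On an affected driver the "down" step may hang instead of returning,
    # in which case this point is never reached.
    after=$(dmesg | iavf_error_count)
    if [ "$after" -gt "$before" ]; then
        echo "new iavf error messages after $i iterations"
        return 1
    fi
    echo "completed $i iterations without new iavf errors"
}
```

For example, `vf_reset_loop enp81s17 enp81s0f2 4000` repeats the loop body 4000 times on the interfaces named above. Comparing the number of matching kernel log lines before and after the run, rather than searching the whole log, avoids flagging messages left over from earlier runs.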
This commit updates iavf from v4.5.3 to v4.5.3.2 to resolve this issue
as well as other issues for which Intel has recommended fixes. Note that
this version of the iavf driver is located in the "unsupported"
directory of Intel's SourceForge project for NIC drivers, even though
Intel recommended it to fix the reported issue; this is how Intel
distributes fixed intermediate versions of its older NIC drivers on
SourceForge. Furthermore, despite its "unsupported" designation, this
version of iavf has been tested by Intel as well as by the StarlingX
community.
The corresponding mainline commits are as follows. Note, however, that
the changes in iavf v4.5.3.2 are only loosely based on these commits,
because the out-of-tree and mainline versions of the iavf source code
have diverged:
* Commit 31071173771e ("iavf: Fix reset error handling")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=31071173771e
(This is the commit that resolves the issue the user in question has
encountered.)
* Commit c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger it")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2ed2403f12c
* Commit 7598f4b40bd6 ("iavf: Move netdev_update_features() into watchdog task")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7598f4b40bd6
The iavf driver versions belonging to other Intel NIC driver bundle
versions are not updated, for the following reasons:
- intel-iavf-cvl-2.54: We do not yet know whether this version of iavf
(v4.0.1) is affected by this issue. The user who reported the issue
fixed by this commit is currently using iavf v4.5.3, and we have not
received field reports of a similar issue with iavf v4.0.1.
- intel-iavf-cvl-4.10: This version of iavf (v4.6.1) is not affected by
this issue, as the changes included in iavf v4.5.3.2 were backported
by Intel from iavf v4.6.1.
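The per-version reasoning above can be summarised as a small lookup, for example to check a host before or after an upgrade. This is a sketch only: the function name `iavf_issue_status` is hypothetical, and only the versions named in this commit message are classified; any other version, including v4.0.1, is reported as "unknown" rather than guessed.

```shell
#!/bin/sh
# Sketch: classify an iavf version according to the reasoning in this
# commit message. Only versions named here are classified; everything
# else (including v4.0.1 from intel-iavf-cvl-2.54) is "unknown".
# Glob patterns without wildcards match exactly, so "4.5.3" does not
# match "4.5.3.2".
iavf_issue_status() {
    case "$1" in
        4.5.3)   echo "affected" ;;      # reproduced; previous default version
        4.5.3.2) echo "fixed" ;;         # version introduced by this commit
        4.6.1)   echo "not-affected" ;;  # source of the backported changes
        *)       echo "unknown" ;;       # e.g. 4.0.1: no field reports yet
    esac
}
```

On a target host, `iavf_issue_status "$(modinfo -F version iavf)"` would classify the installed module. Note that `modinfo` reports the module file on disk, which can differ from the loaded module until the driver is reloaded or the host is rebooted.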
Verification
- With this commit, the following command successfully builds the iavf
kernel module for both the standard and PREEMPT_RT kernels:
build-pkgs -c -p iavf
- A StarlingX ISO image from 2023-09-28 was installed in low-latency
mode (i.e., with the PREEMPT_RT kernel) onto an All-in-One Duplex Dell
XR11 lab with one quad-port Intel E810 NIC per server.
- The issue was reproduced using a script similar to the one shown at
the beginning of this commit message. The issue usually manifests
within ~200 iterations of the loop.
- Afterwards, in a StarlingX build environment, the kernel and all of
the kernel modules were built from scratch with this commit. The
resulting *.deb files were copied to controller-1 of the StarlingX
installation and converted into a "sneaky" designer patch with a
customized version of the "sneaky_patch.py" script, the original
version of which is available in StarlingX.
- The resulting designer patch was successfully applied to
controller-0 of the aforementioned StarlingX installation, and the iavf
driver version was confirmed to have changed from 4.5.3 (before the
designer patch) to 4.5.3.2 (after the designer patch was applied).
- Next, a shell script based on the snippet quoted above was run for
4000 iterations of the loop without reproducing the original issue.
- Furthermore, basic tests with iavf-managed VF interfaces were carried
out: two network namespaces were created on controller-0, one
iavf-managed VF interface was assigned to each namespace, and iperf3
was run across the VF interfaces from within the namespaces.
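The namespace-based iperf3 test above can be sketched as follows. The namespace names (`ns_a`, `ns_b`), the 192.168.200.0/24 addresses, the 10-second duration and the function name `netns_iperf3_test` are hypothetical placeholders rather than details of the test that was actually run; the sketch requires root and an installed iperf3.

```shell
#!/bin/sh
# Sketch of the namespace-based iperf3 test described above.
# Hypothetical values: namespace names ns_a/ns_b, the 192.168.200.0/24
# addresses and the test duration are placeholders, not original values.
netns_iperf3_test() {
    vf_a=$1   # first iavf-managed VF interface
    vf_b=$2   # second iavf-managed VF interface
    ip netns add ns_a && ip netns add ns_b || return 1
    # Move one VF interface into each namespace, address it, bring it up.
    ip link set dev "$vf_a" netns ns_a &&
        ip link set dev "$vf_b" netns ns_b &&
        ip -n ns_a addr add 192.168.200.1/24 dev "$vf_a" &&
        ip -n ns_b addr add 192.168.200.2/24 dev "$vf_b" &&
        ip -n ns_a link set dev "$vf_a" up &&
        ip -n ns_b link set dev "$vf_b" up || return 1
    # Serve a single client from ns_a in the background (-1 = one-off),
    # then run a 10-second client test from ns_b against it.
    ip netns exec ns_a iperf3 -s -1 &
    server_pid=$!
    sleep 1   # crude wait for the server to start listening
    ip netns exec ns_b iperf3 -c 192.168.200.1 -t 10
    status=$?
    if [ "$status" -ne 0 ]; then
        # Don't leave the one-off server waiting for a client forever.
        kill "$server_pid" 2>/dev/null
    fi
    wait "$server_pid"
    return "$status"
}
```

For example, `netns_iperf3_test enp81s17 enp81s17f1`, where the second interface name is hypothetical. Running the server with `-1` lets it exit after a single test, so `wait` can reap it once the client finishes.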
Closes-Bug: 2037692
Change-Id: I75415e5668b002b91c2208bff081775c9eced083
Signed-off-by: M. Vefa Bicakci <email address hidden>