commit 1a089999c41ef887d6cec66ae8071651a0db24d1
Author: Jiping Ma <email address hidden>
Date: Fri Mar 22 06:19:04 2024 +0000
iavf: upgrade to iavf-4.5.3.4
This commit upgrades iavf to version 4.5.3.4 from 4.5.3.2 to fix the
issue "iavf 0000:17:01.6: Never saw reset".
The following root cause analysis comes from Intel.
"""
The iavf_adminq_task() function processes the device Admin queue,
which is used to handle receiving messages from the PF driver.
It calls iavf_clean_arq_element() to extract the message at the head
of the queue, and processes it by calling iavf_virtchnl_completion().
There is a subtle race between iavf_adminq_task() and
iavf_watchdog_task() involving the processing of
VIRTCHNL_EVENT_RESET_IMPENDING. The race results in the iavf driver
getting stuck waiting for a reset that has already completed, printing
"Never saw reset" once every 5 seconds, and locking the driver in the
__IAVF_RESET state, preventing normal operations from proceeding.
The entire race can be avoided if the iavf_adminq_task() stops holding
onto potentially stale data. To do this, acquire the
__IAVF_IN_CRITICAL_TASK at the start of the function. With this, it is
no longer possible for the function to be blocked holding the data in
its event buffer while the iavf_watchdog_task() function processes the
entire hardware reset.
Instead of sleeping with a while loop, just re-queue the
iavf_adminq_task() when we are unable to acquire the bit lock.
Additionally, align with upstream and check the removal status to
avoid re-queuing in the event that the driver has already started
remove.
This new flow also aligns with the way the upstream driver handles
locking and completely avoids the race. If the iavf_adminq_task()
happens to be delayed until the hardware reset completes, it will no
longer see the VIRTCHNL_EVENT_RESET_IMPENDING data, as this will have
been cleared by the hardware reset.
"""
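The locking flow described in the analysis above can be illustrated with a
small userspace model. This is only a sketch under stated assumptions, not
the iavf driver code: the struct, the model_* function names and the result
enum are hypothetical, a C11 atomic boolean stands in for the
__IAVF_IN_CRITICAL_TASK bit lock, and a counter stands in for re-queuing
the work item.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the adapter state; not the driver's struct. */
struct model_adapter {
    atomic_bool in_critical_task; /* models the __IAVF_IN_CRITICAL_TASK bit lock */
    bool removing;                /* models "driver has already started remove" */
    bool reset_event_pending;     /* models a queued VIRTCHNL_EVENT_RESET_IMPENDING */
    int requeue_count;            /* times the task asked to be run again */
    int events_processed;         /* reset events the task actually handled */
};

enum task_result { TASK_PROCESSED, TASK_REQUEUED, TASK_ABORTED };

void model_adapter_init(struct model_adapter *a)
{
    atomic_init(&a->in_critical_task, false);
    a->removing = false;
    a->reset_event_pending = false;
    a->requeue_count = 0;
    a->events_processed = 0;
}

/* Non-blocking acquire: returns true only if the lock was free. */
bool model_try_lock(struct model_adapter *a)
{
    return !atomic_exchange(&a->in_critical_task, true);
}

void model_unlock(struct model_adapter *a)
{
    atomic_store(&a->in_critical_task, false);
}

/* Models the watchdog finishing a hardware reset, which clears the event. */
void model_watchdog_complete_reset(struct model_adapter *a)
{
    a->reset_event_pending = false;
}

/*
 * Models the reworked admin queue task: take the lock before touching any
 * event data, and re-queue instead of sleeping when the lock is busy.
 */
enum task_result model_adminq_task(struct model_adapter *a)
{
    assert(a != NULL);

    if (!model_try_lock(a)) {
        if (a->removing)
            return TASK_ABORTED;  /* do not re-queue once removal started */
        a->requeue_count++;       /* stands in for re-queuing the work item */
        return TASK_REQUEUED;
    }

    /* The event is read only while the lock is held, so it cannot go stale. */
    if (a->reset_event_pending) {
        a->reset_event_pending = false;
        a->events_processed++;
    }

    model_unlock(a);
    return TASK_PROCESSED;
}
```

In this model, a task that runs while the watchdog holds the lock only
records a re-queue request. If the reset completes before the task runs
again, the task finds no pending reset event and has nothing stale to act
on.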
Verification:
- The following command with this commit results in a successful iavf
kernel module build for standard and PREEMPT_RT kernels:
build-pkgs -c -p iavf
- A StarlingX ISO image was installed onto an All-in-One Dell XR11 lab
server with one Intel E810 NIC, in low-latency mode.
- The user who reported this issue was provided with a StarlingX
designer patch that incorporates this change. The user in question
did not encounter any issues during their testing with the designer
patch.
Closes-Bug: 2058858
Change-Id: I448ee1e302bdc7277a6c5db990d4d5cfc485a0f4
Signed-off-by: Jiping Ma <email address hidden>
Reviewed: https://review.opendev.org/c/starlingx/kernel/+/914047
Committed: https://opendev.org/starlingx/kernel/commit/1a089999c41ef887d6cec66ae8071651a0db24d1
Submitter: "Zuul (22348)"
Branch: master