Devlink reload hangs: fix race and lock issue
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-bluefield (Ubuntu) | Invalid | Undecided | Unassigned |
Jammy | Fix Committed | Undecided | Unassigned |
Bug Description
Summary:
The machine hangs when running devlink reload.
How to reproduce:
Host:
[root@bu-lab24v ~]# echo '2' > /sys/class/
Arm:
root@bu-
5.15.0-
root@bu-
root@bu-
*Hangs*
Arm dmesg:
[ 1089.747409] INFO: task devlink:8753 blocked for more than 120 seconds.
[ 1089.760560] Tainted: G OE 5.15.0-
[ 1089.775086] "echo 0 > /proc/sys/
[ 1089.790829] task:devlink state:D stack: 0 pid: 8753 ppid: 5090 flags:0x00000004
[ 1089.790838] Call trace:
[ 1089.790840] __switch_
[ 1089.790857] __schedule+
[ 1089.790865] schedule+0x64/0x140
[ 1089.790870] schedule_
[ 1089.790874] __mutex_
[ 1089.790878] __mutex_
[ 1089.790883] mutex_lock+
[ 1089.790887] devl_lock+0x1c/0x30
[ 1089.790893] mlx5_detach_
[ 1089.791055] mlx5_unload_
[ 1089.791177] mlx5_devlink_
[ 1089.791318] devlink_
Fixes:
Checking the OFED source code, we found that the missing devl trap group functions
also need to be backported to avoid the deadlock.
void mlx5_detach_
{
...
#ifdef HAVE_DEVL_
#ifdef HAVE_DEVL_
#else
#endif /* HAVE_DEVL_
#endif /* HAVE_DEVL_
#ifdef HAVE_DEVL_
Related issue:
#2032378 Devlink backport: fix race and lock issue
Therefore, cherry-pick the following patch:
commit 852e85a704c2e11
Author: Jiri Pirko <email address hidden>
Date: Sat Jul 16 13:02:34 2022 +0200
net: devlink: add unlocked variants of devling_trap*() functions
Add unlocked variants of devl_trap*() functions to be used in drivers
called-in with devlink->lock held.
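The idea behind the commit can be sketched in user space. This is an illustration under stated assumptions, not the kernel code: `struct fake_devlink`, `fake_devl_lock()`, `fake_devl_trap_groups_register()` and `fake_devlink_trap_groups_register()` are hypothetical stand-ins, and a POSIX error-checking mutex replaces the kernel mutex so that a same-thread relock returns `EDEADLK` instead of blocking forever, as the `devlink` task does in the dmesg output above. The `devl_`-style variant expects the caller to already hold the lock; the `devlink_`-style variant takes the lock itself.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* for PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct devlink; not the kernel type. */
struct fake_devlink {
	pthread_mutex_t lock;
	bool locked;     /* set while held; stands in for a "caller must hold the lock" check */
	int trap_groups; /* number of registered trap groups */
};

static int fake_devlink_init(struct fake_devlink *dl)
{
	pthread_mutexattr_t attr;
	int err = pthread_mutexattr_init(&attr);

	if (err)
		return err;
	/* ERRORCHECK: a relock by the owning thread fails with EDEADLK
	 * instead of hanging (a kernel mutex would simply block). */
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&dl->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	dl->locked = false;
	dl->trap_groups = 0;
	return err;
}

static int fake_devl_lock(struct fake_devlink *dl)
{
	int err = pthread_mutex_lock(&dl->lock);

	if (!err)
		dl->locked = true;
	return err;
}

static void fake_devl_unlock(struct fake_devlink *dl)
{
	dl->locked = false;
	pthread_mutex_unlock(&dl->lock);
}

/* Unlocked variant: the caller must already hold dl->lock. */
static void fake_devl_trap_groups_register(struct fake_devlink *dl, int count)
{
	assert(dl->locked);
	dl->trap_groups += count;
}

/* Locking variant: takes dl->lock, then calls the unlocked variant. */
static int fake_devlink_trap_groups_register(struct fake_devlink *dl, int count)
{
	int err = fake_devl_lock(dl);

	if (err)
		return err; /* EDEADLK if this thread already holds the lock */
	fake_devl_trap_groups_register(dl, count);
	fake_devl_unlock(dl);
	return 0;
}
```

For example, after `fake_devl_lock(&dl)`, calling `fake_devlink_trap_groups_register(&dl, 1)` returns `EDEADLK`, while `fake_devl_trap_groups_register(&dl, 1)` succeeds. A path that already holds the lock, such as reload in this report, has to use the unlocked variant.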
Changed in linux-bluefield (Ubuntu):
status: New → Invalid
Changed in linux-bluefield (Ubuntu Jammy):
status: New → Fix Committed
status: Fix Committed → In Progress
Changed in linux-bluefield (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-done-jammy-linux-bluefield; removed: verification-needed-jammy-linux-bluefield
This bug is awaiting verification that the linux-bluefield/5.15.0-1029.31 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-done-jammy-linux-bluefield'. If the problem still exists, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-failed-jammy-linux-bluefield'.
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!