Hi Jeff,

Upstream commit 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler") was picked into the Ubuntu-5.4.0-56.62 kernel (hash bcd6e98bef76cc8a49a1b736b0fefffbffb75c30) as part of the v5.4.71 upstream stable release (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1902110). Since then a new issue has appeared: reloading the mlx5 modules logs the following error in the kernel buffer:

    cmd_work_handler:887:(pid 292): failed to allocate command entry

Reproduction:

    # modprobe -r mlx5_ib mlx5_core
    # modprobe mlx5_core mlx5_ib
    # dmesg
    [ 142.638490] mlx5_core 0000:08:00.1: E-Switch: cleanup
    [ 143.734339] mlx5_core 0000:08:00.0: E-Switch: cleanup
    [ 164.171511] mlx5_core: unknown parameter 'mlx5_ib' ignored
    [ 164.173501] mlx5_core 0000:08:00.0: firmware version: 16.28.1002
    [ 164.173576] mlx5_core 0000:08:00.0: 126.016 Gb/s available PCIe bandwidth (8 GT/s x16 link)
    [ 164.457342] mlx5_core 0000:08:00.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
    [ 164.457365] mlx5_core 0000:08:00.0: E-Switch: Total vports 2, per vport: max uc(1024) max mc(16384)
    [ 164.484659] port_module: 5 callbacks suppressed
    [ 164.484665] mlx5_core 0000:08:00.0: Port module event: module 0, Cable plugged
    [ 164.485112] mlx5_core 0000:08:00.0: mlx5_pcie_event:294:(pid 8): PCIe slot advertised sufficient power (75W).
    [ 164.494771] mlx5_core 0000:08:00.1: firmware version: 16.28.1002
    [ 164.494844] mlx5_core 0000:08:00.1: 126.016 Gb/s available PCIe bandwidth (8 GT/s x16 link)
    [ 164.779534] mlx5_core 0000:08:00.1: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
    [ 164.779552] mlx5_core 0000:08:00.1: E-Switch: Total vports 2, per vport: max uc(1024) max mc(16384)
    [ 164.808886] mlx5_core 0000:08:00.1: Port module event: module 1, Cable plugged
    [ 164.809228] mlx5_core 0000:08:00.1: mlx5_pcie_event:294:(pid 292): PCIe slot advertised sufficient power (75W).
    [ 164.840667] mlx5_core 0000:08:00.0: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
    [ 165.081342] mlx5_core 0000:08:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
    [ 165.282793] mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
    [ 165.438226] mlx5_core 0000:08:00.0: cmd_work_handler:887:(pid 292): failed to allocate command entry
    [ 165.442506] infiniband rocep8s0f0: reg_mr_callback:104:(pid 292): async reg mr failed. status -11

The following commits fix this issue:

    410bd754cd73 net/mlx5: Add retry mechanism to the command entry index allocation (upstream 5.9)
    1d5558b1f0de net/mlx5: poll cmd EQ in case of command timeout (upstream 5.9)
    d43b7007dbd1 net/mlx5: Fix a race when moving command interface to events mode (upstream 5.7-rc7)
    3ed879965cc4 net/mlx5: Use async EQ setup cleanup helpers for multiple EQs (upstream 5.6-rc1)

They are already on the master-next branch of the focal tree, synced from the v5.4.79 upstream stable release (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907151):

    # git log --oneline Ubuntu-5.4.0-59.65..master-next
    ....
    400ec5bb2816 net/mlx5: Add retry mechanism to the command entry index allocation
    2bd608898edd net/mlx5: Fix a race when moving command interface to events mode
    bec07c488db0 net/mlx5: poll cmd EQ in case of command timeout
    0c9bfdf598e1 net/mlx5: Use async EQ setup cleanup helpers for multiple EQs
    .....

I compiled master-next, booted the system with it, and the issue is resolved.
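To check a kernel build for this regression without reading through the whole dmesg output, the reproduction steps in this report could be wrapped in a small script. This is a sketch only: it assumes root privileges and an adapter bound to mlx5_core, and the script and helper names (`has_cmd_entry_error`, `reload_mlx5`) are hypothetical. It loads both modules with `modprobe -a`, because `modprobe mlx5_core mlx5_ib` passes `mlx5_ib` to mlx5_core as a module parameter, which is what the "unknown parameter 'mlx5_ib' ignored" line in the log above shows. Note that `dmesg -C` clears the kernel ring buffer.

```shell
#!/bin/sh
# Hypothetical helper: reload the mlx5 modules and report whether the kernel
# logged the "failed to allocate command entry" error from this report.
# Assumes root privileges and a ConnectX adapter bound to mlx5_core.

ERR_PATTERN='failed to allocate command entry'

# Return 0 if the log text on stdin contains the error, 1 otherwise.
has_cmd_entry_error() {
    grep -q "$ERR_PATTERN"
}

# Unload and reload the IB and core drivers.
reload_mlx5() {
    # With -r, every listed name is treated as a module to remove.
    modprobe -r mlx5_ib mlx5_core || return 1
    # -a loads every listed module; without it, "mlx5_ib" would be passed
    # to mlx5_core as a module parameter.
    modprobe -a mlx5_core mlx5_ib || return 1
    # Assumption: 5 seconds is enough for asynchronous driver init to log.
    sleep 5
}

# Only touch the hardware when explicitly asked to.
if [ "${1:-}" = run ]; then
    dmesg -C    # clear the ring buffer so only this reload is inspected
    reload_mlx5 || exit 2
    if dmesg | has_cmd_entry_error; then
        echo "issue reproduced"
        exit 1
    fi
    echo "no command entry allocation failure logged"
fi
```

Run as `sh check_mlx5_reload.sh run` on the test machine; exit status 1 means the error was logged, 0 means it was not. Without the `run` argument the script only defines the helpers, so the log check can be reused on a saved dmesg file.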