devlink: don't do reporter recovery if the state is healthy
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Invalid | Medium | Jeff Lane |
Focal | Fix Released | Medium | Jeff Lane |
Bug Description
Hi,
[Impact]
Currently in Focal, devlink reporter recovery is performed even when the reporter state is healthy.
[Fix]
402818205c9e devlink: don't do reporter recovery if the state is healthy
This upstream commit from kernel v5.5-rc1 applies cleanly to the Focal tree.
It prevents reporter recovery when the device is in a healthy state.
With the commit applied, issuing
# devlink health recover pci/0000:05:00.0 reporter fw_fatal
on a reporter in the healthy state returns successfully, but dmesg stays clean and the recover counter does not change.
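The behaviour change can be pictured with a toy model. This is only a sketch in shell, not the kernel code: the names `state`, `recover_count` and `reporter_recover` are made up for illustration, and the real check lives in the kernel's devlink health code.

```shell
# Toy model of a health reporter (hypothetical names, not kernel code).
state=healthy
recover_count=0

reporter_recover() {
    # With the fix: a healthy reporter returns success without doing anything.
    if [ "$state" = "healthy" ]; then
        return 0
    fi
    # Otherwise a recovery is performed and counted.
    recover_count=$((recover_count + 1))
    state=healthy
}

reporter_recover
echo "state=$state recover=$recover_count"
```

In this model, calling `reporter_recover` on a healthy reporter succeeds and leaves `recover_count` unchanged, which mirrors the clean dmesg and the unchanged recover counter described above.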
[Test Case]
1) Display the devlink health status:
# devlink health show pci/0000:05:00.0 reporter fw_fatal
pci/0000:05:00.0:
reporter fw_fatal
state healthy error 0 recover 0 grace_period 1200000 auto_recover true
2) Perform reporter recovery using devlink:
# devlink health recover pci/0000:05:00.0 reporter fw_fatal
3) See that recovery was performed even though the reporter was healthy:
# dmesg
[776733.438708] mlx5_core 0000:05:00.0: mlx5_health_
[776733.438717] mlx5_core 0000:05:00.0: mlx5_handle_
NIC but it is full driver
[776735.591522] mlx5_core 0000:05:00.0: mlx5_health_
...
# devlink health show pci/0000:05:00.0 reporter fw_fatal
pci/0000:05:00.0:
reporter fw_fatal
state healthy error 0 recover 1 grace_period 1200000 auto_recover true
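To compare the recover counter before and after step 2 without reading the output by eye, a small helper can pull the number out. This is a sketch only: `recover_count` is a hypothetical function name, and it assumes the output layout matches the sample lines above, where the counter follows the word `recover`.

```shell
# Hypothetical helper: read `devlink health show` output on stdin and print
# the value that follows the "recover" keyword.
# "auto_recover" is a different token and is not matched.
recover_count() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "recover") { print $(i + 1); exit } }'
}

# Sample line copied from the output in step 3.
sample='state healthy error 0 recover 1 grace_period 1200000 auto_recover true'
printf '%s\n' "$sample" | recover_count   # prints 1
```

On a real system the same helper could be fed directly, for example `devlink health show pci/0000:05:00.0 reporter fw_fatal | recover_count`, once before and once after the recover command.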
[Regression Potential]
Very small, as this is a minor change. The patch has also been tested internally on upstream setups for a while and no degradation has been found.
One obvious change is that a user can no longer force devlink recovery when the state is healthy, but I'm not aware of such a use case.
Thanks,
Amir
description: updated
description: updated
Changed in linux (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Jeff Lane (bladernr)
importance: Undecided → Medium
description: updated
Changed in linux (Ubuntu Focal):
status: Incomplete → In Progress
description: updated
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1915403
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.