Warnings/hang during error handling of SATA disks on SAS controller
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Undecided | dann frazier |
Bionic | Fix Released | Undecided | dann frazier |
Bug Description
[Impact]
When a SATA device attached to a SAS controller begins generating errors (e.g. because the device is failing or has been physically pulled), SAS error handling completes but may leave behind zombie ATA commands that are never properly processed or freed. This can produce warning messages on the console like the one below, and eventually leads to a system hang.
WARNING: CPU: 0 PID: 28512 at drivers/
ata_
CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W OE 4.14.0#1
......
Call trace:
[<ffff00000
[<ffff00000
[<ffff00000
[<ffff00000
[<ffff00000
[<ffff00000
[<ffff00000
[<ffff00000
[<ffff00000
[<ffff00000
[Test Case]
I don't have a reliable reproducer for this, but one possible test is to pull a hot-pluggable SATA disk from its controller while I/O to it is active, and check whether the symptoms above occur.
[Fix]
The fix is to call into libata so that it processes the remaining commands. This frees the zombie commands, preventing the leak and the eventual starvation.
[Regression Risk]
This is a clean cherry-pick from upstream, so any regression it causes would also affect upstream and should be addressed there. As of this writing, no changesets in linux-next carry a Fixes: tag referencing this commit, which suggests that upstream has not yet found or fixed any bugs related to it.
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
assignee: nobody → dann frazier (dannf)
Changed in linux (Ubuntu):
assignee: nobody → dann frazier (dannf)
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Released
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1768971
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.