2018-05-08 21:40:45 |
dann frazier |
description |
[Impact]
When a SATA device attached to a SAS controller begins generating errors (e.g. the device is failing, or someone yanked it), the SAS error handling will complete, but may leave zombie ATA commands that are never properly processed or freed. This causes ugly messages like the following on the console, and eventually leads to a system hang.
WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
ata_eh_finish+0xb4/0xcc
CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W OE 4.14.0#1
......
Call trace:
[<ffff0000088b7bd0>] ata_eh_finish+0xb4/0xcc
[<ffff0000088b8420>] ata_do_eh+0xc4/0xd8
[<ffff0000088b8478>] ata_std_error_handler+0x44/0x8c
[<ffff0000088b8068>] ata_scsi_port_error_handler+0x480/0x694
[<ffff000008875fc4>] async_sas_ata_eh+0x4c/0x80
[<ffff0000080f6be8>] async_run_entry_fn+0x4c/0x170
[<ffff0000080ebd70>] process_one_work+0x144/0x390
[<ffff0000080ec100>] worker_thread+0x144/0x418
[<ffff0000080f2c98>] kthread+0x10c/0x138
[<ffff0000080855dc>] ret_from_fork+0x10/0x18
[Test Case]
I don't have a reliable reproducer for this, but one possible test is to yank an active/hotpluggable SATA disk from its controller and see if the above symptoms occur.
[Regression Risk]
This is a clean cherry-pick from upstream, so any regressions should have upstream support. As of this writing, no changesets in linux-next are marked as fixing this commit, which suggests that upstream has not yet found or fixed any bugs related to it. |
[Impact]
When a SATA device attached to a SAS controller begins generating errors (e.g. the device is failing, or someone yanked it), the SAS error handling will complete, but may leave zombie ATA commands that are never properly processed or freed. This causes ugly messages like the following on the console, and eventually leads to a system hang.
WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037
ata_eh_finish+0xb4/0xcc
CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W OE 4.14.0#1
......
Call trace:
[<ffff0000088b7bd0>] ata_eh_finish+0xb4/0xcc
[<ffff0000088b8420>] ata_do_eh+0xc4/0xd8
[<ffff0000088b8478>] ata_std_error_handler+0x44/0x8c
[<ffff0000088b8068>] ata_scsi_port_error_handler+0x480/0x694
[<ffff000008875fc4>] async_sas_ata_eh+0x4c/0x80
[<ffff0000080f6be8>] async_run_entry_fn+0x4c/0x170
[<ffff0000080ebd70>] process_one_work+0x144/0x390
[<ffff0000080ec100>] worker_thread+0x144/0x418
[<ffff0000080f2c98>] kthread+0x10c/0x138
[<ffff0000080855dc>] ret_from_fork+0x10/0x18
[Test Case]
I don't have a reliable reproducer for this, but one possible test is to yank an active/hotpluggable SATA disk from its controller and see if the above symptoms occur.
[Fix]
The fix is to call into libata so that it processes the remaining commands. This frees the zombie commands, preventing the leak and the eventual starvation.
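To illustrate the leak and the cleanup pass described above, here is a toy userspace model in C. It is not the kernel code or the upstream patch: the struct, the function names (`pool_alloc`, `sas_eh_handle`, `ata_eh_reap`) and the queue depth are all hypothetical, chosen only to show why commands skipped by one error handler starve the pool until a second pass reaps them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_TAGS 4 /* hypothetical queue depth, for illustration only */

/* Toy stand-in for a fixed pool of command slots (tags). */
struct cmd_pool {
    bool in_use[NUM_TAGS]; /* true while a command owns the tag */
};

void pool_init(struct cmd_pool *p)
{
    for (int t = 0; t < NUM_TAGS; t++)
        p->in_use[t] = false;
}

/* Issue a command: returns its tag, or -1 when every tag is taken. */
int pool_alloc(struct cmd_pool *p)
{
    for (int t = 0; t < NUM_TAGS; t++) {
        if (!p->in_use[t]) {
            p->in_use[t] = true;
            return t;
        }
    }
    return -1; /* starvation: no free tag left */
}

/* Models the SAS-side handler: it frees only the commands it is told about. */
void sas_eh_handle(struct cmd_pool *p, const int *tags, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(tags[i] >= 0 && tags[i] < NUM_TAGS);
        p->in_use[tags[i]] = false;
    }
}

/* Models the extra pass: free whatever is still outstanding (the zombies).
 * Returns how many commands were reaped. */
size_t ata_eh_reap(struct cmd_pool *p)
{
    size_t reaped = 0;
    for (int t = 0; t < NUM_TAGS; t++) {
        if (p->in_use[t]) {
            p->in_use[t] = false;
            reaped++;
        }
    }
    return reaped;
}

size_t pool_free_count(const struct cmd_pool *p)
{
    size_t n = 0;
    for (int t = 0; t < NUM_TAGS; t++)
        if (!p->in_use[t])
            n++;
    return n;
}
```

In this model, if all four tags are allocated and `sas_eh_handle` is given only two of them, the other two stay allocated forever. Repeating that across error events drains the pool until `pool_alloc` returns -1. Calling `ata_eh_reap` after each error-handling run returns the pool to fully free.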
[Regression Risk]
This is a clean cherry-pick from upstream, so any regressions should have upstream support. As of this writing, no changesets in linux-next are marked as fixing this commit, which suggests that upstream has not yet found or fixed any bugs related to it. |
|