Comment 0 for bug 1768971

dann frazier (dannf) wrote :

[Impact]
When a SATA device attached to a SAS controller begins generating errors (e.g. the device is failing, or someone yanked it), SAS error handling completes but may leave behind zombie ATA commands that are never properly processed or freed. This produces warnings on the console such as the one below, and eventually leads to a system hang.

    WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037 ata_eh_finish+0xb4/0xcc
    CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W OE 4.14.0#1
    ......
    Call trace:
    [<ffff0000088b7bd0>] ata_eh_finish+0xb4/0xcc
    [<ffff0000088b8420>] ata_do_eh+0xc4/0xd8
    [<ffff0000088b8478>] ata_std_error_handler+0x44/0x8c
    [<ffff0000088b8068>] ata_scsi_port_error_handler+0x480/0x694
    [<ffff000008875fc4>] async_sas_ata_eh+0x4c/0x80
    [<ffff0000080f6be8>] async_run_entry_fn+0x4c/0x170
    [<ffff0000080ebd70>] process_one_work+0x144/0x390
    [<ffff0000080ec100>] worker_thread+0x144/0x418
    [<ffff0000080f2c98>] kthread+0x10c/0x138
    [<ffff0000080855dc>] ret_from_fork+0x10/0x18

[Test Case]
I don't have a reliable reproducer for this, but one possible test is to yank an active/hotpluggable SATA disk from its controller and see if the above symptoms occur.
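
Since the symptom is a console warning, checking for it after pulling the disk can be partly automated. The following is only a rough sketch, not part of the fix: the helper name find_eh_warnings is hypothetical, and it assumes the kernel log has been captured as text (for example from dmesg). It looks for a WARNING raised from drivers/ata/libata-eh.c in ata_eh_finish, as in the trace above, without pinning the source line number, which differs between kernel versions.

```python
import re

# A WARNING raised from libata-eh.c; the source line number is not matched
# exactly because it varies between kernel versions.
WARN_RE = re.compile(r"WARNING: .*drivers/ata/libata-eh\.c:\d+")
# The function that raised it, e.g. "ata_eh_finish+0xb4/0xcc".
FUNC_RE = re.compile(r"\bata_eh_finish\+0x[0-9a-f]+/0x[0-9a-f]+")


def find_eh_warnings(log_text):
    """Return (line_number, line) pairs for ata_eh_finish warnings.

    Hypothetical helper: line numbers are 1-based positions in log_text.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if WARN_RE.search(line):
            # The function name may be on the same line or wrapped onto
            # the next one, so look at both.
            window = " ".join(lines[i:i + 2])
            if FUNC_RE.search(window):
                hits.append((i + 1, line.strip()))
    return hits
```

An empty result does not prove the fix works, since the problem is not reliably reproducible; a non-empty result on a patched kernel would suggest it does not.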

[Regression Risk]
This is a clean cherry-pick from upstream, so any regressions should have upstream support. As of this writing, no changesets in linux-next are marked as fixing this commit (i.e. carry a Fixes: tag for it), which suggests that upstream has not yet found or fixed any bugs related to it.
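
The check for follow-up fixes can be repeated later. Below is a minimal sketch, assuming the commit messages of a tree such as linux-next have been exported as text (for instance with git log). The function name references_commit is hypothetical, and the upstream commit ID must be supplied by the reader, since it is not given here. It relies on the kernel convention of a "Fixes: <abbreviated sha> ("subject")" trailer.

```python
import re

# Kernel convention: Fixes: <abbreviated sha> ("subject line")
FIXES_RE = re.compile(
    r"^\s*Fixes:\s*([0-9a-f]{7,40})\b", re.IGNORECASE | re.MULTILINE
)


def references_commit(message, commit_sha):
    """Return True if a Fixes: trailer in `message` names `commit_sha`.

    Hypothetical helper. Either side may be abbreviated, so the IDs are
    compared by prefix in both directions.
    """
    target = commit_sha.lower()
    for match in FIXES_RE.finditer(message):
        tagged = match.group(1).lower()
        if target.startswith(tagged) or tagged.startswith(target):
            return True
    return False
```

Any commit message for which this returns True is a candidate follow-up fix that would need to be picked up along with the original change.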