[hns-1126]scsi: hisi_sas: Retry 3 times TMF IO for SAS disks when init device

Bug #1853993 reported by Fred Kimmy
Affects            Status         Importance   Assigned to   Milestone
kunpeng920         Fix Released   Undecided    Unassigned
Ubuntu-18.04       Won't Fix      Undecided    Ike Panhc
Ubuntu-18.04-hwe   Fix Released   Undecided    Unassigned
Ubuntu-19.04       Won't Fix      Undecided    Unassigned
Ubuntu-19.10       Fix Released   Undecided    Ike Panhc
Ubuntu-20.04       Fix Released   Undecided    Unassigned
Upstream-kernel    Fix Released   Undecided    Unassigned
linux (Ubuntu)     Fix Released   Undecided    Unassigned
  Bionic           Invalid        Undecided    Ike Panhc
  Eoan             Fix Released   Undecided    Ike Panhc
  Focal            Fix Released   Undecided    Unassigned

Bug Description

[Impact]
Disks will be lost on SAS interface reset

[Fix]
scsi: hisi_sas: Retry 3 times TMF IO for SAS disks when init device

[Test]
Resetting SAS interfaces shall not lose any disks

[Regression Potential]
The patch touches only hisi_sas, so the risk to other platforms and drivers is minimal.

"[Steps to Reproduce]
1. Close all the PHYs;
2. Inject error;
3. Open one PHY;

[Actual Results]
Some disks are lost

[Expected Results]
No disk will be lost

[Reproducibility]
occasionally

[Additional information]
Hardware: D06 CS
Firmware: NA
Kernel: NA

[Resolution]
When a SAS disk is initialized, the driver sends a TMF IO to clear the disk.
If that TMF IO is interrupted by another operation, such as a controller
reset injected by a HW RAS event, the TMF IO times out and the device is
eventually removed. The log looks like this:

hisi_sas_v3_hw 0000:74:02.0: dev[240:1] found
...
hisi_sas_v3_hw 0000:74:02.0: controller resetting...
hisi_sas_v3_hw 0000:74:02.0: phyup: phy7 link_rate=10(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy0 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy1 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy2 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy3 link_rate=9(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy6 link_rate=10(sata)
hisi_sas_v3_hw 0000:74:02.0: phyup: phy5 link_rate=11
hisi_sas_v3_hw 0000:74:02.0: phyup: phy4 link_rate=11
hisi_sas_v3_hw 0000:74:02.0: controller reset complete
hisi_sas_v3_hw 0000:74:02.0: abort tmf: TMF task timeout and not done
hisi_sas_v3_hw 0000:74:02.0: dev[240:1] is gone
sas: driver on host 0000:74:02.0 cannot handle device 5000c500a75a860d, error:5
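
When checking a reproduction run, it can help to scan the kernel log for this signature automatically. The following is a minimal sketch, not part of the fix: it assumes the message wording matches the excerpt above, and the function name `lost_devices` is made up for illustration.

```python
import re

# Patterns copied from the log excerpt; the driver prefix and PCI address
# are not matched because they differ between systems.
FOUND = re.compile(r"dev\[([0-9a-fA-F]+:[0-9a-fA-F]+)\] found")
GONE = re.compile(r"dev\[([0-9a-fA-F]+:[0-9a-fA-F]+)\] is gone")


def lost_devices(lines):
    """Return the dev[...] ids that were reported 'found' and later 'is gone'."""
    found = set()
    lost = []
    for line in lines:
        match = FOUND.search(line)
        if match:
            found.add(match.group(1))
            continue
        match = GONE.search(line)
        if match and match.group(1) in found:
            lost.append(match.group(1))
            found.discard(match.group(1))
    return lost


sample = [
    "hisi_sas_v3_hw 0000:74:02.0: dev[240:1] found",
    "hisi_sas_v3_hw 0000:74:02.0: controller resetting...",
    "hisi_sas_v3_hw 0000:74:02.0: controller reset complete",
    "hisi_sas_v3_hw 0000:74:02.0: abort tmf: TMF task timeout and not done",
    "hisi_sas_v3_hw 0000:74:02.0: dev[240:1] is gone",
]
print(lost_devices(sample))
```

An empty result after a SAS interface reset would be consistent with the expected behaviour (no disk lost); it does not prove the fix is present.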

To improve reliability, retry the TMF IO up to 3 times for SAS disks, which
is the same as what softreset does."
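
The retry idea can be modelled outside the kernel. The sketch below is a user-space illustration under stated assumptions, not the driver patch: `init_sas_disk`, `flaky_tmf`, and the boolean result convention are all hypothetical stand-ins, and only the limit of 3 attempts comes from this report.

```python
TMF_RETRIES = 3  # retry limit named in the bug title


def init_sas_disk(issue_tmf, retries=TMF_RETRIES):
    """Issue the clearing TMF up to `retries` times; True once one completes.

    `issue_tmf` is a hypothetical callable standing in for the driver's TMF
    request. It returns True on completion and False on timeout.
    """
    for _ in range(retries):
        if issue_tmf():
            return True
    return False  # every attempt timed out, so the device would be dropped


def flaky_tmf(timeouts):
    """Build a fake TMF that times out `timeouts` times, then completes."""
    remaining = [timeouts]

    def issue():
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return issue


# A TMF interrupted once (for example by a controller reset) succeeds on retry.
print(init_sas_disk(flaky_tmf(1)))
# Three consecutive timeouts still exhaust the retries.
print(init_sas_disk(flaky_tmf(3)))
```

With a single attempt, one interrupted TMF is enough to lose the disk; with a bounded retry, the device is only given up after every attempt has timed out.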

scsi: hisi_sas: Retry 3 times TMF IO for SAS disks when init device

dann frazier (dannf)
Changed in kunpeng920:
status: New → Triaged
Ike Panhc (ikepanhc)
tags: added: ikeradar
Ike Panhc (ikepanhc)
Changed in kunpeng920:
status: Triaged → In Progress
Ike Panhc (ikepanhc)
Changed in linux (Ubuntu Focal):
status: New → Fix Released
Changed in linux (Ubuntu Eoan):
assignee: nobody → Ike Panhc (ikepanhc)
status: New → In Progress
Changed in linux (Ubuntu Bionic):
assignee: nobody → Ike Panhc (ikepanhc)
status: New → In Progress
Ike Panhc (ikepanhc) wrote :
tags: removed: ikeradar
Ike Panhc (ikepanhc)
tags: added: ikeradar
Ike Panhc (ikepanhc)
description: updated
Khaled El Mously (kmously) wrote :

Didn't mean to set Bionic to "Fix Committed" in the first place.

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Eoan):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: Fix Committed → In Progress
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-eoan' to 'verification-done-eoan'. If the problem still exists, change the tag 'verification-needed-eoan' to 'verification-failed-eoan'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-eoan
Ike Panhc (ikepanhc) wrote :

Thanks. kernel 5.3.0-43.36 works for me.

tags: added: verification-done-eoan
removed: verification-needed-eoan
Andrew Cloke (andrew-cloke) wrote :

The fix is available in the 18.04 HWE kernel. Marking as Won't Fix for the Bionic GA 4.15 kernel.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.3.0-46.38

---------------
linux (5.3.0-46.38) eoan; urgency=medium

  * eoan/linux: 5.3.0-43.36 -proposed tracker (LP: #1867301)

  * Fix AMD Stoney Ridge screen flickering under 4K resolution (LP: #1864005)
    - iommu/amd: Disable IOMMU on Stoney Ridge systems

  * Allow BPF tracing under lockdown (LP: #1868626)
    - Revert "UBUNTU: SAUCE: (efi-lockdown) Lock down kprobes"
    - Revert "bpf: Restrict bpf when kernel lockdown is in confidentiality mode"

  * Missing wireless network interface after kernel 5.3.0-43 upgrade with eoan
    (LP: #1868442)
    - iwlwifi: mvm: Do not require PHY_SKU NVM section for 3168 devices

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - [Packaging] update helper scripts

  * iSCSI-target: Deleting a LUN hangs in the kernel (LP: #1862682)
    - scsi: Revert "target/core: Inline transport_lun_remove_cmd()"

  * Stop using get_scalar_status command in Dell AIO uart backlight driver
    (LP: #1865402)
    - SAUCE: platform/x86: dell-uart-backlight: add get_display_mode command

  * Eoan update: upstream stable patchset 2020-03-11 (LP: #1867051)
    - Revert "drm/sun4i: dsi: Change the start delay calculation"
    - ovl: fix lseek overflow on 32bit
    - kernel/module: Fix memleak in module_add_modinfo_attrs()
    - media: iguanair: fix endpoint sanity check
    - ocfs2: fix oops when writing cloned file
    - x86/cpu: Update cached HLE state on write to TSX_CTRL_CPUID_CLEAR
    - udf: Allow writing to 'Rewritable' partitions
    - printk: fix exclusive_console replaying
    - iwlwifi: mvm: fix NVM check for 3168 devices
    - sparc32: fix struct ipc64_perm type definition
    - cls_rsvp: fix rsvp_policy
    - gtp: use __GFP_NOWARN to avoid memalloc warning
    - l2tp: Allow duplicate session creation with UDP
    - net: hsr: fix possible NULL deref in hsr_handle_frame()
    - net_sched: fix an OOB access in cls_tcindex
    - net: stmmac: Delete txtimer in suspend()
    - bnxt_en: Fix TC queue mapping.
    - tcp: clear tp->total_retrans in tcp_disconnect()
    - tcp: clear tp->delivered in tcp_disconnect()
    - tcp: clear tp->data_segs{in|out} in tcp_disconnect()
    - tcp: clear tp->segs_{in|out} in tcp_disconnect()
    - rxrpc: Fix use-after-free in rxrpc_put_local()
    - rxrpc: Fix insufficient receive notification generation
    - rxrpc: Fix missing active use pinning of rxrpc_local object
    - rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect
    - media: uvcvideo: Avoid cyclic entity chains due to malformed USB descriptors
    - mfd: dln2: More sanity checking for endpoints
    - ipc/msg.c: consolidate all xxxctl_down() functions
    - tracing: Fix sched switch start/stop refcount racy updates
    - rcu: Avoid data-race in rcu_gp_fqs_check_wake()
    - brcmfmac: Fix memory leak in brcmf_usbdev_qinit
    - usb: typec: tcpci: mask event interrupts when remove driver
    - usb: gadget: legacy: set max_speed to super-speed
    - usb: gadget: f_ncm: Use atomic_t to track in-flight request
    - usb: gadget: f_ecm: Use atomic_t to track in-flight request
    - ALSA: usb-audio: Fix endianess in descriptor validatio...

Changed in linux (Ubuntu Eoan):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Bionic):
status: In Progress → Invalid
Changed in kunpeng920:
status: In Progress → Fix Committed
Ike Panhc (ikepanhc)
tags: removed: ikeradar
Ike Panhc (ikepanhc)
Changed in kunpeng920:
status: Fix Committed → Fix Released