mpt3sas - driver using the wrong register to update a queue index in FW

Bug #1810781 reported by Guilherme G. Piccoli on 2019-01-07
This bug affects 1 person
Affects              Importance  Assigned to
linux (Ubuntu)       Medium      Mauricio Faria de Oliveira
  Xenial             Medium      Mauricio Faria de Oliveira
  Bionic             Critical    Mauricio Faria de Oliveira
  Cosmic             Critical    Mauricio Faria de Oliveira
  Disco              Medium      Mauricio Faria de Oliveira

Bug Description

[Impact]

* Adapter resets periodically during high-load activity.

* I/O stalls until the reset/reinitialization completes (adding latency), and I/O performance degrades across the cluster (e.g., low throughput for data spread over several nodes).

* The mpt3sas driver relies on a FW queue (the Reply Post Descriptor Queue) in the I/O completion path. The driver uses an MMIO register, called Reply Post Host Index, to flag entries of this queue as empty (already consumed by the driver). This value is updated in the driver's interrupt routine [in the _base_interrupt() function].

* There are actually two registers representing the Reply Post Host Index, depending on the adapter type. The driver distinguishes between them through the "ioc->combined_reply_queue" check. According to the MPI specification (vendor spec), the driver should use the combined reply queue depending on the maximum number of MSI-X vectors the adapter exposes and on the spec version (SAS 3.0 vs SAS 3.5).

* Currently, this check is wrong for one class of adapters; this was fixed in upstream kernel commit 2b48be65685a [0]. Without this commit, we can observe spontaneous adapter resets by the driver due to queue overflow (the FW is not aware that there are free entries in the Reply Post Descriptor Queue). In this case, dmesg shows the following output:

  mpt3sas_cm0: fault_state(0x2100)!
  mpt3sas_cm0: sending diag reset !!
  mpt3sas_cm0: diag reset: SUCCESS
[followed by many driver messages resulting from the reset procedure]

* I/O is stalled during these resets, which can degrade performance.

[Test Case]

* The problem is not trivial to reproduce, but on a machine with an affected device, an I/O benchmark such as FIO can be used to stress the I/O path heavily and trigger the issue.

* We have reports that the adapter "LSI Logic / Symbios Logic Device [1000:00ac]" is affected by the issue, and that this commit resolved the problem.

[Regression Potential]

* This is a long-standing issue in the mpt3sas driver, affecting only one class of this vendor's adapters. Since it is clearly a bug, the fix is necessary. The regression potential is unknown but likely low: the patch only changes the register used for the index updates for a given set of adapter characteristics (according to the spec), which further restricts its scope.

Guilherme G. Piccoli (gpiccoli) wrote :
Changed in linux (Ubuntu Bionic):
importance: Undecided → Critical
Changed in linux (Ubuntu Cosmic):
importance: Undecided → Critical
Changed in linux (Ubuntu Xenial):
importance: Undecided → Critical
status: New → Confirmed
Changed in linux (Ubuntu Bionic):
status: New → Confirmed
Changed in linux (Ubuntu Cosmic):
status: New → Confirmed
Changed in linux (Ubuntu Disco):
status: Confirmed → Fix Released
Changed in linux (Ubuntu Cosmic):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Guilherme G. Piccoli (gpiccoli) wrote :

Xenial has no support for the SAS 3.5 adapter class, so we won't backport the patch there; it is only needed in the Bionic (4.15, also used as the Xenial HWE kernel) and Cosmic (4.18) kernels.

Changed in linux (Ubuntu Xenial):
status: Confirmed → Won't Fix
importance: Critical → Medium
Changed in linux (Ubuntu Disco):
importance: Critical → Medium
Changed in linux (Ubuntu Xenial):
assignee: Guilherme G. Piccoli (gpiccoli) → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Bionic):
assignee: Guilherme G. Piccoli (gpiccoli) → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Cosmic):
assignee: Guilherme G. Piccoli (gpiccoli) → Mauricio Faria de Oliveira (mfo)
Changed in linux (Ubuntu Disco):
assignee: Guilherme G. Piccoli (gpiccoli) → Mauricio Faria de Oliveira (mfo)
description: updated
description: updated

Patch submitted to the kernel-team mailing list; it got 2 ACKs.

https://lists.ubuntu.com/archives/kernel-team/2019-January/097471.html

Changed in linux (Ubuntu Bionic):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Cosmic):
status: Confirmed → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-cosmic' to 'verification-done-cosmic'. If the problem still exists, change the tag 'verification-needed-cosmic' to 'verification-failed-cosmic'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-cosmic
tags: added: verification-needed-bionic
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Verification done on Cosmic for regressions on an older adapter model: the I/O stress test (iozone) finishes successfully, and no errors are seen in dmesg.

Waiting for verification on Bionic by the reporter.

root@dixie:~# fdisk /dev/sdb # create one partition
root@dixie:~# mkfs.ext4 /dev/sdb1
root@dixie:~# mount /dev/sdb1 /test/
root@dixie:~# cd /test
root@dixie:/test# iozone -R -s 2G -r 1m -S 2048 -i 0 -i 2 -i 8 -G -c -o -l 128 -u 128 -t 128
root@dixie:/test# dmesg | tail
<...>
[ 693.674243] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)

tags: added: verification-done-cosmic
removed: verification-needed-cosmic

Verification done on Bionic, using the HWE kernel on Xenial
(i.e., 4.15.0-44.47~16.04.1, per the original reporter's environment).

The mpt3sas driver is running correctly: the sosreport shows that the previous kernel logged mpt3sas fault_state errors repeatedly, less than 10 minutes apart, while the current kernel has logged none in ~76 minutes (~4600 seconds of uptime in kern.log).

$ grep -e 'Linux version' -e 'mpt3sas.*fault_state' sosreport-<details omitted>/var/log/kern.log
...
Jan 18 04:32:36 <details omitted> kernel: [18225729.321846] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 04:41:08 <details omitted> kernel: [18226240.928889] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 04:48:47 <details omitted> kernel: [18226700.312831] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 04:57:29 <details omitted> kernel: [18227222.159601] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:05:46 <details omitted> kernel: [18227719.430826] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:12:52 <details omitted> kernel: [18228145.023317] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:17:22 <details omitted> kernel: [18228414.970544] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:22:22 <details omitted> kernel: [18228714.613254] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:26:57 <details omitted> kernel: [18228989.680424] mpt3sas_cm0: fault_state(0x2100)!
Jan 18 05:36:14 <details omitted> kernel: [ 0.000000] Linux version 4.15.0-44-generic (buildd@lcy01-amd64-025) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #47~16.04.1-Ubuntu SMP Mon Jan 14 20:50:30 UTC 2019 (Ubuntu 4.15.0-44.47~16.04.1-generic 4.15.18)

$ tail -n1 sosreport-<details omitted>/var/log/kern.log
Jan 18 06:52:36 <details omitted> kernel: [ 4613.908291] perf: interrupt took too long (3958 > 3952), lowering kernel.perf_event_max_sample_rate to 50500

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-44.47

---------------
linux (4.15.0-44.47) bionic; urgency=medium

  * linux: 4.15.0-44.47 -proposed tracker (LP: #1811419)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
    - blk-wbt: pass in enum wbt_flags to get_rq_wait()
    - blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
    - blk-wbt: move disable check into get_limit()
    - blk-wbt: use wq_has_sleeper() for wq active check
    - blk-wbt: fix has-sleeper queueing check
    - blk-wbt: abstract out end IO completion handler
    - blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
    - mmc: sdhci: Disable 1.8v modes (HS200/HS400/UHS) if controller can't support
      1.8v
    - mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
    - mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
    - mmc: rtsx_usb: Use MMC_CAP2_NO_SDIO
    - mmc: rtsx_usb: Enable MMC_CAP_ERASE to allow erase/discard/trim requests
    - mmc: rtsx_usb_sdmmc: Re-work runtime PM support
    - mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
    - memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
    - misc: rtsx_usb: Use USB remote wakeup signaling for card insertion detection
    - memstick: Prevent memstick host from getting runtime suspended during card
      detection
    - memstick: rtsx_usb_ms: Use ms_dev() helper
    - memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
    - iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
    - iommu/arm-smmu-v3: Implement flush_iotlb_all hook
    - iommu/dma: Add support for non-strict mode
    - iommu: Add "iommu.strict" command line option
    - iommu/io-pgtable-arm: Add support for non-strict mode
    - iommu/arm-smmu-v3: Add support for non-strict mode
    - iommu/io-pgtable-arm-v7s: Add support for non-strict mode
    - iommu/arm-smmu: Support non-strict mode

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
    - pinctrl: cannonlake: Fix community ordering for H variant
    - pinctrl: cannonlake: Fix HOSTSW_OWN register offset of H variant

  * Add Cavium ThunderX2 SoC UNCORE PMU driver (LP: #1811200)
    - perf: Export perf_event_update_userpage
    - Documentation: perf: Add documentation for ThunderX2 PMU uncore driver
    - drivers/perf: Add Cavium ThunderX2 SoC UNCORE PMU driver
    - [Config] New config CONFIG_THUNDERX2_PMU=m

  * Update hisilicon SoC-specific drivers (LP: #1810457)
    - SAUCE: Revert "net: hns3: Updates RX packet info fetch in case of multi BD"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: separate roce from nic when
      resetting"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: Use roce handle when calling roce
      callback function"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: Add calling roce callback
      function when link status change"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: optimize the process of notifying
      roce client"
    - Revert "UBUNTU: S...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.18.0-14.15

---------------
linux (4.18.0-14.15) cosmic; urgency=medium

  * linux: 4.18.0-14.15 -proposed tracker (LP: #1811406)

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
    - blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
    - blk-wbt: move disable check into get_limit()
    - blk-wbt: use wq_has_sleeper() for wq active check
    - blk-wbt: fix has-sleeper queueing check
    - blk-wbt: abstract out end IO completion handler
    - blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
    - mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
    - mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
    - mmc: rtsx_usb_sdmmc: Re-work runtime PM support
    - mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
    - memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
    - misc: rtsx_usb: Use USB remote wakeup signaling for card insertion detection
    - memstick: Prevent memstick host from getting runtime suspended during card
      detection
    - memstick: rtsx_usb_ms: Use ms_dev() helper
    - memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
    - iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
    - iommu/arm-smmu-v3: Implement flush_iotlb_all hook
    - iommu/dma: Add support for non-strict mode
    - iommu: Add "iommu.strict" command line option
    - iommu/io-pgtable-arm: Add support for non-strict mode
    - iommu/arm-smmu-v3: Add support for non-strict mode
    - iommu/io-pgtable-arm-v7s: Add support for non-strict mode
    - iommu/arm-smmu: Support non-strict mode

  * [Regression] crashkernel fails on HiSilicon D05 (LP: #1806766)
    - efi: honour memory reservations passed via a linux specific config table
    - efi/arm: libstub: add a root memreserve config table
    - efi: add API to reserve memory persistently across kexec reboot
    - irqchip/gic-v3-its: Change initialization ordering for LPIs
    - irqchip/gic-v3-its: Simplify LPI_PENDBASE_SZ usage
    - irqchip/gic-v3-its: Split property table clearing from allocation
    - irqchip/gic-v3-its: Move pending table allocation to init time
    - irqchip/gic-v3-its: Keep track of property table's PA and VA
    - irqchip/gic-v3-its: Allow use of pre-programmed LPI tables
    - irqchip/gic-v3-its: Use pre-programmed redistributor tables with kdump
      kernels
    - irqchip/gic-v3-its: Check that all RDs have the same property table
    - irqchip/gic-v3-its: Register LPI tables with EFI config table
    - irqchip/gic-v3-its: Allow use of LPI tables in reserved memory
    - arm64: memblock: don't permit memblock resizing until linear mapping is up
    - efi/arm: Defer persistent reservations until after paging_init()
    - efi: Permit calling efi_mem_reserve_persistent() from atomic context
    - efi: Prevent GICv3 WARN() by mapping the memreserve table before first use

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
    - pinctrl: cannonlake: Fix community ordering for H variant
    - pinctrl: c...

Changed in linux (Ubuntu Cosmic):
status: Fix Committed → Fix Released