Update SmartPQI driver

Bug #1933518 reported by Jeff Lane
Affects                    Importance  Assigned to
linux (Ubuntu)             Medium      Krzysztof Kozlowski
  Focal                    High        Krzysztof Kozlowski
  Hirsute                  High        Krzysztof Kozlowski
linux-hwe-5.11 (Ubuntu)    Undecided   Unassigned
  Focal                    Undecided   Unassigned
  Hirsute                  Undecided   Unassigned

Bug Description

[Impact]
Improves SmartPQI driver support for the most recent controllers.

[Fixes]
All are currently in mainline and cherry-pick cleanly into Hirsute.
c64aab41c5e1 scsi: smartpqi: Remove unused functions
5cad5a507241 scsi: smartpqi: Fix device pointer variable reference static checker issue
667298ceaf04 scsi: smartpqi: Fix blocks_per_row static checker issue
d56030f882a7 scsi: smartpqi: Update version to 2.1.8-045
75fbeacca3ad scsi: smartpqi: Add new PCI IDs
43e97ef482ee scsi: smartpqi: Correct system hangs when resuming from hibernation
d0cba99fd7a3 scsi: smartpqi: Update enclosure identifier in sysfs
18ff5f0877be scsi: smartpqi: Add additional logging for LUN resets
55732a46d6c5 scsi: smartpqi: Update SAS initiator_port_protocols and target_port_protocols
ec504b23df9d scsi: smartpqi: Add phy ID support for the physical drives
a425625277a1 scsi: smartpqi: Convert snprintf() to scnprintf()
3268b8a8cf77 scsi: smartpqi: Fix driver synchronization issues
66f1c2b40270 scsi: smartpqi: Update device scan operations
2790cd4d3f6a scsi: smartpqi: Update OFA management
5be9db069d3f scsi: smartpqi: Update RAID bypass handling
9fa820233609 scsi: smartpqi: Update suspend/resume and shutdown
37f3318199ce scsi: smartpqi: Synchronize device resets with mutex
4ccc354bac14 scsi: smartpqi: Update soft reset management for OFA
06b41e0d1800 scsi: smartpqi: Update event handler
7a84a821f194 scsi: smartpqi: Add support for wwid
ae0c189db4f1 scsi: smartpqi: Remove timeouts from internal cmds
99a12b487f19 scsi: smartpqi: Disable WRITE SAME for HBA NVMe disks
5be746d7d74b scsi: smartpqi: Add host level stream detection enable
c7ffedb3a774 scsi: smartpqi: Add stream detection
583891c9e509 scsi: smartpqi: Align code with oob driver
598bef8d7942 scsi: smartpqi: Add support for long firmware version
f6cc2a774aa7 scsi: smartpqi: Add support for BMIC sense feature cmd and feature bits
7a012c23c7a7 scsi: smartpqi: Add support for RAID1 writes
6702d2c40f31 scsi: smartpqi: Add support for RAID5 and RAID6 writes
1a22bc4bee22 scsi: smartpqi: Refactor scatterlist code
281a817f232e scsi: smartpqi: Refactor aio submission code
2708a25643ab scsi: smartpqi: Add support for new product ids
b622a601a13a scsi: smartpqi: Correct request leakage during reset operations
c6d3ee209b9e scsi: smartpqi: Use host-wide tag space
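
The changelog entries later in this report list the same subjects in the opposite order, which suggests the list above is newest-first. A minimal Python sketch, assuming that ordering, that turns such a listing into cherry-pick commands oldest-first. The helper name `cherry_pick_commands` and the three-line sample are illustrative, and `-x`/`-s` are standard `git cherry-pick` options rather than anything this report prescribes:

```python
import re

# Three lines copied from the [Fixes] list (newest first); the full list works the same way.
SAMPLE = """\
c64aab41c5e1 scsi: smartpqi: Remove unused functions
5cad5a507241 scsi: smartpqi: Fix device pointer variable reference static checker issue
667298ceaf04 scsi: smartpqi: Fix blocks_per_row static checker issue
"""

# A 12-character abbreviated hash followed by the commit subject.
LINE_RE = re.compile(r"^([0-9a-f]{12})\s+(.+)$")


def cherry_pick_commands(listing):
    """Return git commands that apply the listed commits oldest-first."""
    commits = []
    for line in listing.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            commits.append(match.groups())
    # The listing is assumed to be newest-first, so reverse it before applying.
    return [
        f"git cherry-pick -x -s {sha}  # {subject}"
        for sha, subject in reversed(commits)
    ]


if __name__ == "__main__":
    for command in cherry_pick_commands(SAMPLE):
        print(command)
```

Lines that do not start with a 12-character hash are skipped, so the bracketed section headers and blank lines can be pasted along with the list.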

For the commit below, the patches provided by Microchip apply only the hunk that touches the SmartPQI driver. Upstream it is a much wider commit that removes references to MODULE_SUPPORTED_DEVICE from many drivers across the kernel.
6417f03132a6 module: remove never implemented MODULE_SUPPORTED_DEVICE

[Testing]
On machines equipped with SmartPQI SCSI controller:
1. reboot tests
2. insmod/rmmod tests
3. fio testing: no performance regressions
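
One way to make "no performance regressions" checkable is to compare two fio runs numerically. A minimal Python sketch, assuming both runs were saved with fio's `--output-format=json` and that a drop of more than 5% in IOPS counts as a regression. The 5% threshold, the function name, and the sample numbers are hypothetical and not part of the test plan above:

```python
def find_regressions(baseline, candidate, tolerance=0.05):
    """Return (job, direction, base_iops, new_iops) for each drop beyond tolerance."""
    new_jobs = {job["jobname"]: job for job in candidate["jobs"]}
    regressions = []
    for job in baseline["jobs"]:
        new_job = new_jobs.get(job["jobname"])
        if new_job is None:
            continue  # job not present in the candidate run; nothing to compare
        for direction in ("read", "write"):
            base_iops = job[direction]["iops"]
            new_iops = new_job[direction]["iops"]
            # Skip directions the job did not exercise (fio reports 0 IOPS there).
            if base_iops > 0 and new_iops < base_iops * (1 - tolerance):
                regressions.append((job["jobname"], direction, base_iops, new_iops))
    return regressions


if __name__ == "__main__":
    # Made-up numbers in the shape of fio's JSON output, for illustration only.
    old = {"jobs": [{"jobname": "randread",
                     "read": {"iops": 100000.0}, "write": {"iops": 0.0}}]}
    new = {"jobs": [{"jobname": "randread",
                     "read": {"iops": 90000.0}, "write": {"iops": 0.0}}]}
    print(find_regressions(old, new))
```

With real result files, each argument would be loaded with `json.load()` from the baseline-kernel and test-kernel runs of the same job file.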

[Regression Risk]
The patchset changes only the smartpqi driver, so the regression risk is limited to systems equipped with this SCSI controller. On such SmartPQI-equipped systems the patchset could cause data corruption, data loss, unavailability of SCSI storage, or boot failure.

Jeff Lane (bladernr)
Changed in linux (Ubuntu):
assignee: nobody → Jeff Lane (bladernr)
status: New → In Progress
Changed in linux (Ubuntu Hirsute):
status: New → In Progress
assignee: nobody → Jeff Lane (bladernr)
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: Undecided → Medium
Jeff Lane (bladernr) wrote :

Tarball of patches covering these commits, provided by Microchip.

Jeff Lane (bladernr)
description: updated
Changed in linux (Ubuntu):
assignee: Jeff Lane (bladernr) → Krzysztof Kozlowski (krzk)
Changed in linux (Ubuntu Focal):
assignee: nobody → Krzysztof Kozlowski (krzk)
status: New → In Progress
Krzysztof Kozlowski (krzk) wrote :

Updated the "Regression Risk" section so it correctly lists the possible failures that would need investigation.

description: updated
Krzysztof Kozlowski (krzk) wrote :

Patches backported. Kernel packages for testing (Focal, hwe-5.11, x86_64):
https://kernel.ubuntu.com/~krzk/wip/focal-hwe-5.11-Ubuntu-hwe-5.11-5.11.0-23.24_20.04.1-30-g8fcdd67cf971/

Don Brace (bracedon) wrote :

I installed the provided kernel package and thus far performed:
1. reboot tests
2. insmod/rmmod tests
3. fio testing: no performance regressions

description: updated
Krzysztof Kozlowski (krzk) wrote :

Thanks, Don, for testing. I sent an SRU with the commits.

Changed in linux (Ubuntu Focal):
importance: Undecided → High
Changed in linux (Ubuntu Hirsute):
importance: Medium → High
assignee: Jeff Lane (bladernr) → Krzysztof Kozlowski (krzk)
Kleber Sacilotto de Souza (kleber-souza) wrote :

These patches are not needed for focal/linux. They are targeted for focal/linux-hwe-5.11, so having them applied for hirsute/linux first is the correct nomination.

Changed in linux (Ubuntu Focal):
status: In Progress → Invalid
Changed in linux (Ubuntu Hirsute):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hirsute' to 'verification-done-hirsute'. If the problem still exists, change the tag 'verification-needed-hirsute' to 'verification-failed-hirsute'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-hirsute
Jeff Lane (bladernr) wrote :

The request was also for 5.4 which is why I had added that task. Is Focal not possible? My apologies for any confusion there.

Changed in linux (Ubuntu Focal):
status: Invalid → New
Jeff Lane (bladernr) wrote :

OK, spoke with Microchip and it's OK to leave it as 5.11 HWE only for 20.04.3.

Changed in linux (Ubuntu Focal):
status: New → Won't Fix
Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Krzysztof Kozlowski (krzk) wrote :

Jeff Lane, someone from Microchip should verify whether the Focal hwe-5.11 (5.11.0-26.28_20.04.1) and Hirsute (5.11.0-26.28) kernels from proposed fix the issue.

Changed in linux (Ubuntu Hirsute):
status: Fix Released → Fix Committed
Kleber Sacilotto de Souza (kleber-souza) wrote :

Focal hwe-5.11 (5.11.0-26.28_20.04.1) has not been promoted to -proposed. We are going to re-spin the previous cycle kernel to include the SmartPQI driver update; I'll post the new kernel version here for testing.

Changed in linux-hwe-5.11 (Ubuntu Hirsute):
status: New → Invalid
Changed in linux-hwe-5.11 (Ubuntu Focal):
status: New → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Don Brace (dabrace) wrote :

What you need is some verification tests?

Krzysztof Kozlowski (krzk) wrote :

Hi Don,
Yes, we need verification that the new kernels (containing the changes) work as expected on machines with SmartPQI devices. You can test Hirsute (5.11.0-26.28) from proposed, but for Focal hwe-5.11 please wait for the re-spin and the new proposed kernel.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-hwe-5.11 - 5.11.0-27.29~20.04.1

---------------
linux-hwe-5.11 (5.11.0-27.29~20.04.1) focal; urgency=medium

  * focal/linux-hwe-5.11: 5.11.0-27.29~20.04.1 -proposed tracker (LP: #1939554)

  * Update SmartPQI driver (LP: #1933518)
    - scsi: smartpqi: Add support for new product ids
    - scsi: smartpqi: Refactor aio submission code
    - scsi: smartpqi: Refactor scatterlist code
    - scsi: smartpqi: Add support for RAID5 and RAID6 writes
    - scsi: smartpqi: Add support for RAID1 writes
    - scsi: smartpqi: Add support for BMIC sense feature cmd and feature bits
    - scsi: smartpqi: Add support for long firmware version
    - scsi: smartpqi: Align code with oob driver
    - scsi: smartpqi: Add stream detection
    - scsi: smartpqi: Add host level stream detection enable
    - scsi: smartpqi: Disable WRITE SAME for HBA NVMe disks
    - scsi: smartpqi: Remove timeouts from internal cmds
    - scsi: smartpqi: Add support for wwid
    - scsi: smartpqi: Update event handler
    - scsi: smartpqi: Update soft reset management for OFA
    - scsi: smartpqi: Synchronize device resets with mutex
    - scsi: smartpqi: Update suspend/resume and shutdown
    - scsi: smartpqi: Update RAID bypass handling
    - scsi: smartpqi: Update OFA management
    - scsi: smartpqi: Update device scan operations
    - scsi: smartpqi: Fix driver synchronization issues
    - scsi: smartpqi: Convert snprintf() to scnprintf()
    - scsi: smartpqi: Add phy ID support for the physical drives
    - scsi: smartpqi: Update SAS initiator_port_protocols and
      target_port_protocols
    - scsi: smartpqi: Add additional logging for LUN resets
    - scsi: smartpqi: Update enclosure identifier in sysfs
    - scsi: smartpqi: Correct system hangs when resuming from hibernation
    - scsi: smartpqi: Update version to 2.1.8-045
    - scsi: smartpqi: Fix blocks_per_row static checker issue
    - scsi: smartpqi: Fix device pointer variable reference static checker issue
    - scsi: smartpqi: Remove unused functions

  * Hirsute update: upstream stable patchset 2021-06-14 (LP: #1931896) // HWE
    kernels: NFSv4.1 NULL pointer dereference (LP: #1939157)
    - NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()

  * REGRESSION: shiftfs lets sendfile fail with EINVAL (LP: #1939301)
    - SAUCE: shiftfs: fix sendfile() invocations

 -- Kleber Sacilotto de Souza <email address hidden> Wed, 11 Aug 2021 16:53:07 +0200

Changed in linux-hwe-5.11 (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.11.0-31.33

---------------
linux (5.11.0-31.33) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-31.33 -proposed tracker (LP: #1939553)

  * REGRESSION: shiftfs lets sendfile fail with EINVAL (LP: #1939301)
    - SAUCE: shiftfs: fix sendfile() invocations

linux (5.11.0-26.28) hirsute; urgency=medium

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * large_dir in ext4 broken (LP: #1933074)
    - SAUCE: ext4: fix directory index node split corruption

  * Add l2tp.sh in net from ubuntu_kernel_selftests back (LP: #1934293)
    - Revert "UBUNTU: SAUCE: selftests/net -- disable l2tp.sh test"

  * icmp_redirect.sh in net from ubuntu_kernel_selftests failed on F-OEM-5.6 /
    F-OEM-5.10 / F-OEM-5.13 / F / G / H (LP: #1880645)
    - selftests: icmp_redirect: support expected failures

  * Mute/mic LEDs no function on some HP platfroms (LP: #1934878)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 450 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 445 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 630 G8

  * [SRU][OEM-5.10/H] Fix HDMI output issue on Intel TGL GPU (LP: #1934864)
    - drm/i915: Fix HAS_LSPCON macro for platforms between GEN9 and GEN10

  * mute/micmute LEDs no function on HP EliteBook 830 G8 Notebook PC
    (LP: #1934239)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 830 G8 Notebook PC

  * ubuntu-host driver lacks lseek ops (LP: #1934110)
    - ubuntu-host: add generic lseek op

  * ubuntu_kernel_selftests ftrace fails on arm64 F / aws-5.8 / amd64 F
    azure-5.8 (LP: #1927749)
    - selftests/ftrace: fix event-no-pid on 1-core machine

  * Hirsute update: upstream stable patchset 2021-06-29 (LP: #1934012)
    - proc: Track /proc/$pid/attr/ opener mm_struct
    - ASoC: max98088: fix ni clock divider calculation
    - ASoC: amd: fix for pcm_read() error
    - spi: Fix spi device unregister flow
    - spi: spi-zynq-qspi: Fix stack violation bug
    - bpf: Forbid trampoline attach for functions with variable arguments
    - net/nfc/rawsock.c: fix a permission check bug
    - usb: cdns3: Fix runtime PM imbalance on error
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Glavey TM800A550L tablet
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Lenovo Miix 3-830 tablet
    - vfio-ccw: Reset FSM state to IDLE inside FSM
    - vfio-ccw: Serialize FSM IDLE state with I/O completion
    - ASoC: sti-sas: add missing MODULE_DEVICE_TABLE
    - spi: sprd: Add missing MODULE_DEVICE_TABLE
    - usb: chipidea: udc: assign interrupt number to USB gadget structure
    - isdn: mISDN: netjet: Fix crash in nj_probe:
    - bonding: init notify_work earlier to avoid uninitialized use
    - netlink: disable IRQs for netlink_lock_table()
    - net: mdiobus: get rid of a BUG_ON()
    - cgroup: disable controllers at parse time
    - wq: handle VM suspension in stall detection
    - net/qla3xxx: fix schedule while atomic in ql_sem_spinlock
    - RDS tcp loopback connection can hang
    - net:sfc: fix non-freed irq in legacy irq mode
    - scsi: bnx2fc: Return failure if io_req is already in ABTS processing
    - scsi:...

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Don Brace (dabrace) wrote :

Completed my testing.
System Information
        Manufacturer: HPE
        Product Name: ProLiant DL385 Gen10
        Version: Not Specified
        Serial Number: 2M2935031X
        UUID: 37383738-3831-4d32-3239-333530333158
        Wake-up Type: Power Switch
        SKU Number: 878718-B21
        Family: ProLiant

lsscsi
[0:0:0:0] enclosu HPE Smart Adapter 3.53 -
[0:1:0:0] disk HPE LOGICAL VOLUME 3.53 /dev/sda
[0:2:0:0] storage HPE P408i-a SR Gen10 3.53 -
[1:0:0:0] disk Generic- SD/MMC CRW 1.00 /dev/sdb

Booted from P408i.

I spent some time testing kdump. I had to disable IOMMU to get kexec/kdump to work. I am not sure why, because neither the smartpqi driver nor our controllers care about IOMMU settings, so this is perhaps a platform issue.

I used the following kdump settings:
 /etc/default/grub.d/kdump-tools.cfg
   GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=512M-:640M"

vi /etc/default/kdump-tools
KDUMP_KEXEC_ARGS="--elf64-core-headers"
#KDUMP_CMDLINE=""
KDUMP_CMDLINE_APPEND="reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0 acpi_no_memhotplug numa=off lapic pci=nommconf pci=biosirq nosmep mem_encrypt=off"

Not sure what all is really required, but kdump works with all of the above settings.
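
For reference, `crashkernel=512M-:640M` uses the kernel's range syntax `start-[end]:size`: reserve 640M when system RAM is at least 512M, with no upper bound. A minimal Python sketch of that selection rule, assuming only K/M/G suffixes and ignoring the optional `@offset` part. The function names are illustrative and this is not the kernel's parser:

```python
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert '640M' style sizes to bytes; a bare number is taken as bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def crashkernel_reservation(spec, system_ram):
    """Return bytes reserved for 'start-[end]:size[,...]' given system RAM in bytes."""
    for entry in spec.split(","):
        ram_range, size = entry.split(":")
        start, _, end = ram_range.partition("-")
        low = parse_size(start)
        high = parse_size(end) if end else float("inf")
        # The first range containing the RAM size wins (start inclusive, end exclusive).
        if low <= system_ram < high:
            return parse_size(size)
    return 0  # no range matched, so nothing is reserved


if __name__ == "__main__":
    # A hypothetical 32 GiB machine; the result is printed in MiB.
    print(crashkernel_reservation("512M-:640M", 32 * UNITS["G"]) // UNITS["M"])
```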

ls -ltr /var/crash/202108271939/
total 2374244
-r-------- 1 root root 33816203264 Aug 27 19:40 vmcore.202108271939
root@sys2m2935031x:~# date
Fri 27 Aug 2021 07:45:45 PM UTC

All of my other tests looked good.
fio performance testing - No regressions.
reboot testing
configuration testing - (using our SSA tools to switch between RAID and HBA mode)

Krzysztof Kozlowski (krzk) wrote :

Thank you, Don!

Changed in linux (Ubuntu):
status: In Progress → Fix Released
tags: added: verification-done-focal verification-done-hirsute
removed: verification-needed-focal verification-needed-hirsute