Update SmartPQI driver

Bug #1933518 reported by Jeff Lane 
This bug affects 1 person
Affects                   Status        Importance  Assigned to          Milestone
linux (Ubuntu)            Fix Released  Medium      Krzysztof Kozlowski
  Focal                   Won't Fix     High        Krzysztof Kozlowski
  Hirsute                 Fix Released  High        Krzysztof Kozlowski
linux-hwe-5.11 (Ubuntu)   New           Undecided   Unassigned
  Focal                   Fix Released  Undecided   Unassigned
  Hirsute                 Invalid       Undecided   Unassigned

Bug Description

[Impact]
Improves SmartPQI support for the most recent controllers.

[Fixes]
All are currently in mainline and cherry-pick cleanly into Hirsute.
c64aab41c5e1 scsi: smartpqi: Remove unused functions
5cad5a507241 scsi: smartpqi: Fix device pointer variable reference static checker issue
667298ceaf04 scsi: smartpqi: Fix blocks_per_row static checker issue
d56030f882a7 scsi: smartpqi: Update version to 2.1.8-045
75fbeacca3ad scsi: smartpqi: Add new PCI IDs
43e97ef482ee scsi: smartpqi: Correct system hangs when resuming from hibernation
d0cba99fd7a3 scsi: smartpqi: Update enclosure identifier in sysfs
18ff5f0877be scsi: smartpqi: Add additional logging for LUN resets
55732a46d6c5 scsi: smartpqi: Update SAS initiator_port_protocols and target_port_protocols
ec504b23df9d scsi: smartpqi: Add phy ID support for the physical drives
a425625277a1 scsi: smartpqi: Convert snprintf() to scnprintf()
3268b8a8cf77 scsi: smartpqi: Fix driver synchronization issues
66f1c2b40270 scsi: smartpqi: Update device scan operations
2790cd4d3f6a scsi: smartpqi: Update OFA management
5be9db069d3f scsi: smartpqi: Update RAID bypass handling
9fa820233609 scsi: smartpqi: Update suspend/resume and shutdown
37f3318199ce scsi: smartpqi: Synchronize device resets with mutex
4ccc354bac14 scsi: smartpqi: Update soft reset management for OFA
06b41e0d1800 scsi: smartpqi: Update event handler
7a84a821f194 scsi: smartpqi: Add support for wwid
ae0c189db4f1 scsi: smartpqi: Remove timeouts from internal cmds
99a12b487f19 scsi: smartpqi: Disable WRITE SAME for HBA NVMe disks
5be746d7d74b scsi: smartpqi: Add host level stream detection enable
c7ffedb3a774 scsi: smartpqi: Add stream detection
583891c9e509 scsi: smartpqi: Align code with oob driver
598bef8d7942 scsi: smartpqi: Add support for long firmware version
f6cc2a774aa7 scsi: smartpqi: Add support for BMIC sense feature cmd and feature bits
7a012c23c7a7 scsi: smartpqi: Add support for RAID1 writes
6702d2c40f31 scsi: smartpqi: Add support for RAID5 and RAID6 writes
1a22bc4bee22 scsi: smartpqi: Refactor scatterlist code
281a817f232e scsi: smartpqi: Refactor aio submission code
2708a25643ab scsi: smartpqi: Add support for new product ids
b622a601a13a scsi: smartpqi: Correct request leakage during reset operations
c6d3ee209b9e scsi: smartpqi: Use host-wide tag space

For the following commit, the patches Microchip provided apply only the hunk that touches the SmartPQI driver. The upstream commit is much wider: it removes references to MODULE_SUPPORTED_DEVICE from many drivers across the kernel.
6417f03132a6 module: remove never implemented MODULE_SUPPORTED_DEVICE
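
The smartpqi commits above are listed newest-first, while the package changelog later in this report lists them in the opposite order. A minimal sketch, assuming the list is saved as plain text with one "<hash> <subject>" pair per line, of turning it into cherry-pick commands in application order. The function names and the choice of the `-x` flag are illustrative assumptions, not part of the workflow described in this bug:

```python
import re

# Matches "<12-hex-digit abbreviated hash> <subject>".
COMMIT_LINE = re.compile(r"^([0-9a-f]{12})\s+(.+)$")

def parse_commits(text):
    """Return (hash, subject) pairs in the order they appear in text."""
    commits = []
    for line in text.splitlines():
        match = COMMIT_LINE.match(line.strip())
        if match:
            commits.append((match.group(1), match.group(2)))
    return commits

def cherry_pick_commands(text):
    """Build git commands, oldest commit first (the list is newest-first)."""
    return [f"git cherry-pick -x {sha}  # {subject}"
            for sha, subject in reversed(parse_commits(text))]

# Three entries copied from the list above, kept in their listed order.
sample = """\
c64aab41c5e1 scsi: smartpqi: Remove unused functions
2708a25643ab scsi: smartpqi: Add support for new product ids
c6d3ee209b9e scsi: smartpqi: Use host-wide tag space
"""

for command in cherry_pick_commands(sample):
    print(command)
```

Reversing the list matters because later commits in the series build on earlier ones; applying them newest-first would likely produce conflicts.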

[Testing]
On machines equipped with a SmartPQI SCSI controller:
1. reboot tests
2. insmod/rmmod tests
3. fio testing: no performance regressions
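
The report does not say how "no performance regressions" was judged. One possible check, sketched here under the assumption that fio was run with `--output-format=json` before and after the update, compares total IOPS against a tolerance. The 5% threshold, the function names, and the trimmed-down sample reports are all hypothetical:

```python
import json

def total_iops(fio_json_text):
    """Sum read and write IOPS over all jobs in one fio JSON report."""
    report = json.loads(fio_json_text)
    return sum(job["read"]["iops"] + job["write"]["iops"]
               for job in report["jobs"])

def is_regression(baseline_iops, candidate_iops, tolerance=0.05):
    """True if the candidate is more than `tolerance` below the baseline."""
    return candidate_iops < baseline_iops * (1.0 - tolerance)

# Hypothetical reports; real fio output carries many more fields per job.
baseline = json.dumps(
    {"jobs": [{"read": {"iops": 1000.0}, "write": {"iops": 500.0}}]})
candidate = json.dumps(
    {"jobs": [{"read": {"iops": 980.0}, "write": {"iops": 490.0}}]})

print(is_regression(total_iops(baseline), total_iops(candidate)))
```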

[Regression Risk]
The patchset changes only the smartpqi driver, so the regression risk is limited to systems equipped with this SCSI device. On such SmartPQI-equipped systems the patchset could cause data corruption, data loss, unavailability of SCSI storage, or boot failure.

Jeff Lane  (bladernr)
Changed in linux (Ubuntu):
assignee: nobody → Jeff Lane (bladernr)
status: New → In Progress
Changed in linux (Ubuntu Hirsute):
status: New → In Progress
assignee: nobody → Jeff Lane (bladernr)
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: Undecided → Medium
Jeff Lane  (bladernr) wrote :

Tarball of patches covering these commits, provided by Microchip.

Jeff Lane  (bladernr)
description: updated
Changed in linux (Ubuntu):
assignee: Jeff Lane (bladernr) → Krzysztof Kozlowski (krzk)
Changed in linux (Ubuntu Focal):
assignee: nobody → Krzysztof Kozlowski (krzk)
status: New → In Progress
Krzysztof Kozlowski (krzk) wrote :

Updated the "Regression Risk" section so it correctly lists the possible failures that would need investigation.

description: updated
Krzysztof Kozlowski (krzk) wrote :

Patches backported. Kernel packages for testing (Focal, hwe-5.11, x86_64):
https://kernel.ubuntu.com/~krzk/wip/focal-hwe-5.11-Ubuntu-hwe-5.11-5.11.0-23.24_20.04.1-30-g8fcdd67cf971/

Don Brace (bracedon) wrote :

I installed the provided kernel package and thus far performed:
1. reboot tests
2. insmod/rmmod tests
3. fio testing: no performance regressions

description: updated
Krzysztof Kozlowski (krzk) wrote :

Thanks, Don, for testing. I sent an SRU with the commits.

Changed in linux (Ubuntu Focal):
importance: Undecided → High
Changed in linux (Ubuntu Hirsute):
importance: Medium → High
assignee: Jeff Lane (bladernr) → Krzysztof Kozlowski (krzk)
Kleber Sacilotto de Souza (kleber-souza) wrote :

These patches are not needed for focal/linux. They are targeted for focal/linux-hwe-5.11, so having them applied for hirsute/linux first is the correct nomination.

Changed in linux (Ubuntu Focal):
status: In Progress → Invalid
Changed in linux (Ubuntu Hirsute):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hirsute' to 'verification-done-hirsute'. If the problem still exists, change the tag 'verification-needed-hirsute' to 'verification-failed-hirsute'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-hirsute
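
The tag hand-off the bot describes amounts to swapping one tag for another per series. A minimal sketch of that rule; the function name and the error handling are assumptions for illustration, not Launchpad's API:

```python
def record_verification(tags, series, passed):
    """Swap verification-needed-<series> for the done or failed tag."""
    needed = f"verification-needed-{series}"
    if needed not in tags:
        raise ValueError(f"{needed} is not set on this bug")
    outcome = "done" if passed else "failed"
    return (set(tags) - {needed}) | {f"verification-{outcome}-{series}"}

tags = {"verification-needed-hirsute"}
print(sorted(record_verification(tags, "hirsute", passed=True)))
```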
Jeff Lane  (bladernr) wrote (last edit ):

The request was also for 5.4 which is why I had added that task. Is Focal not possible? My apologies for any confusion there.

Changed in linux (Ubuntu Focal):
status: Invalid → New
Jeff Lane  (bladernr) wrote :

OK, spoke with Microchip and it's OK to leave it as 5.11 HWE only for 20.04.3.

Changed in linux (Ubuntu Focal):
status: New → Won't Fix
Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Krzysztof Kozlowski (krzk) wrote :

Jeff Lane, someone from Microchip should verify whether the Focal hwe-5.11 (5.11.0-26.28_20.04.1) and Hirsute (5.11.0-26.28) kernels from proposed fix the issue.

Changed in linux (Ubuntu Hirsute):
status: Fix Released → Fix Committed
Kleber Sacilotto de Souza (kleber-souza) wrote :

Focal hwe-5.11 (5.11.0-26.28_20.04.1) has not been promoted to -proposed. We are going to re-spin the previous cycle kernel to include the SmartPQI driver update; I'll post the new kernel version here for testing.

Changed in linux-hwe-5.11 (Ubuntu Hirsute):
status: New → Invalid
Changed in linux-hwe-5.11 (Ubuntu Focal):
status: New → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Don Brace (dabrace) wrote :

So what you need are some verification tests?

Krzysztof Kozlowski (krzk) wrote :

Hi Don,
Yes, we need verification that the new kernels (containing the changes) work as expected on machines with SmartPQI devices. You can test Hirsute (5.11.0-26.28) from proposed, but for Focal hwe-5.11 please wait for the re-spin and the new proposed kernel.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-hwe-5.11 - 5.11.0-27.29~20.04.1

---------------
linux-hwe-5.11 (5.11.0-27.29~20.04.1) focal; urgency=medium

  * focal/linux-hwe-5.11: 5.11.0-27.29~20.04.1 -proposed tracker (LP: #1939554)

  * Update SmartPQI driver (LP: #1933518)
    - scsi: smartpqi: Add support for new product ids
    - scsi: smartpqi: Refactor aio submission code
    - scsi: smartpqi: Refactor scatterlist code
    - scsi: smartpqi: Add support for RAID5 and RAID6 writes
    - scsi: smartpqi: Add support for RAID1 writes
    - scsi: smartpqi: Add support for BMIC sense feature cmd and feature bits
    - scsi: smartpqi: Add support for long firmware version
    - scsi: smartpqi: Align code with oob driver
    - scsi: smartpqi: Add stream detection
    - scsi: smartpqi: Add host level stream detection enable
    - scsi: smartpqi: Disable WRITE SAME for HBA NVMe disks
    - scsi: smartpqi: Remove timeouts from internal cmds
    - scsi: smartpqi: Add support for wwid
    - scsi: smartpqi: Update event handler
    - scsi: smartpqi: Update soft reset management for OFA
    - scsi: smartpqi: Synchronize device resets with mutex
    - scsi: smartpqi: Update suspend/resume and shutdown
    - scsi: smartpqi: Update RAID bypass handling
    - scsi: smartpqi: Update OFA management
    - scsi: smartpqi: Update device scan operations
    - scsi: smartpqi: Fix driver synchronization issues
    - scsi: smartpqi: Convert snprintf() to scnprintf()
    - scsi: smartpqi: Add phy ID support for the physical drives
    - scsi: smartpqi: Update SAS initiator_port_protocols and
      target_port_protocols
    - scsi: smartpqi: Add additional logging for LUN resets
    - scsi: smartpqi: Update enclosure identifier in sysfs
    - scsi: smartpqi: Correct system hangs when resuming from hibernation
    - scsi: smartpqi: Update version to 2.1.8-045
    - scsi: smartpqi: Fix blocks_per_row static checker issue
    - scsi: smartpqi: Fix device pointer variable reference static checker issue
    - scsi: smartpqi: Remove unused functions

  * Hirsute update: upstream stable patchset 2021-06-14 (LP: #1931896) // HWE
    kernels: NFSv4.1 NULL pointer dereference (LP: #1939157)
    - NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()

  * REGRESSION: shiftfs lets sendfile fail with EINVAL (LP: #1939301)
    - SAUCE: shiftfs: fix sendfile() invocations

 -- Kleber Sacilotto de Souza <email address hidden> Wed, 11 Aug 2021 16:53:07 +0200

Changed in linux-hwe-5.11 (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.11.0-31.33

---------------
linux (5.11.0-31.33) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-31.33 -proposed tracker (LP: #1939553)

  * REGRESSION: shiftfs lets sendfile fail with EINVAL (LP: #1939301)
    - SAUCE: shiftfs: fix sendfile() invocations

linux (5.11.0-26.28) hirsute; urgency=medium

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * large_dir in ext4 broken (LP: #1933074)
    - SAUCE: ext4: fix directory index node split corruption

  * Add l2tp.sh in net from ubuntu_kernel_selftests back (LP: #1934293)
    - Revert "UBUNTU: SAUCE: selftests/net -- disable l2tp.sh test"

  * icmp_redirect.sh in net from ubuntu_kernel_selftests failed on F-OEM-5.6 /
    F-OEM-5.10 / F-OEM-5.13 / F / G / H (LP: #1880645)
    - selftests: icmp_redirect: support expected failures

  * Mute/mic LEDs no function on some HP platforms (LP: #1934878)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 450 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 445 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 630 G8

  * [SRU][OEM-5.10/H] Fix HDMI output issue on Intel TGL GPU (LP: #1934864)
    - drm/i915: Fix HAS_LSPCON macro for platforms between GEN9 and GEN10

  * mute/micmute LEDs no function on HP EliteBook 830 G8 Notebook PC
    (LP: #1934239)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 830 G8 Notebook PC

  * ubuntu-host driver lacks lseek ops (LP: #1934110)
    - ubuntu-host: add generic lseek op

  * ubuntu_kernel_selftests ftrace fails on arm64 F / aws-5.8 / amd64 F
    azure-5.8 (LP: #1927749)
    - selftests/ftrace: fix event-no-pid on 1-core machine

  * Hirsute update: upstream stable patchset 2021-06-29 (LP: #1934012)
    - proc: Track /proc/$pid/attr/ opener mm_struct
    - ASoC: max98088: fix ni clock divider calculation
    - ASoC: amd: fix for pcm_read() error
    - spi: Fix spi device unregister flow
    - spi: spi-zynq-qspi: Fix stack violation bug
    - bpf: Forbid trampoline attach for functions with variable arguments
    - net/nfc/rawsock.c: fix a permission check bug
    - usb: cdns3: Fix runtime PM imbalance on error
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Glavey TM800A550L tablet
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Lenovo Miix 3-830 tablet
    - vfio-ccw: Reset FSM state to IDLE inside FSM
    - vfio-ccw: Serialize FSM IDLE state with I/O completion
    - ASoC: sti-sas: add missing MODULE_DEVICE_TABLE
    - spi: sprd: Add missing MODULE_DEVICE_TABLE
    - usb: chipidea: udc: assign interrupt number to USB gadget structure
    - isdn: mISDN: netjet: Fix crash in nj_probe:
    - bonding: init notify_work earlier to avoid uninitialized use
    - netlink: disable IRQs for netlink_lock_table()
    - net: mdiobus: get rid of a BUG_ON()
    - cgroup: disable controllers at parse time
    - wq: handle VM suspension in stall detection
    - net/qla3xxx: fix schedule while atomic in ql_sem_spinlock
    - RDS tcp loopback connection can hang
    - net:sfc: fix non-freed irq in legacy irq mode
    - scsi: bnx2fc: Return failure if io_req is already in ABTS processing
    - scsi:...

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Don Brace (dabrace) wrote :

Completed my testing.
System Information
        Manufacturer: HPE
        Product Name: ProLiant DL385 Gen10
        Version: Not Specified
        Serial Number: 2M2935031X
        UUID: 37383738-3831-4d32-3239-333530333158
        Wake-up Type: Power Switch
        SKU Number: 878718-B21
        Family: ProLiant

lsscsi
[0:0:0:0] enclosu HPE Smart Adapter 3.53 -
[0:1:0:0] disk HPE LOGICAL VOLUME 3.53 /dev/sda
[0:2:0:0] storage HPE P408i-a SR Gen10 3.53 -
[1:0:0:0] disk Generic- SD/MMC CRW 1.00 /dev/sdb

Booted from P408i.

I spent some time testing kdump. I had to disable IOMMU to get kexec/kdump to work. I'm not sure why, because neither the smartpqi driver nor our controllers care about IOMMU settings, so this is perhaps a platform issue.

I used the following kdump settings:
 /etc/default/grub.d/kdump-tools.cfg
   GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=512M-:640M"

vi /etc/default/kdump-tools
KDUMP_KEXEC_ARGS="--elf64-core-headers"
#KDUMP_CMDLINE=""
KDUMP_CMDLINE_APPEND="reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb ata_piix.prefer_ms_hyperv=0 acpi_no_memhotplug numa=off lapic pci=nommconf pci=biosirq nosmep mem_encrypt=off"

I'm not sure which of these are really required, but kdump works with all of the above settings.
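
For readers unfamiliar with the crashkernel=512M-:640M value above: it uses the kernel's range form, "start-[end]:size", where start is inclusive, end is exclusive, and an omitted end means "no upper bound". A minimal sketch of how such a value resolves for a given amount of RAM. It handles only the range form (no "@offset" suffix and no plain "crashkernel=640M" form), and the 32 GiB figure is a hypothetical example, not a measurement from this system:

```python
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

def parse_size(text):
    """Convert '640M' style sizes to bytes; a bare number is bytes."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return int(text[:-1]) * UNITS[text[-1].upper()]
    return int(text)

def crashkernel_reservation(value, system_ram):
    """Return bytes reserved by the first range containing system_ram, or 0."""
    for entry in value.split(","):
        ram_range, size = entry.split(":")
        start, _, end = ram_range.partition("-")
        low = parse_size(start)
        high = parse_size(end) if end else None  # open-ended range
        if system_ram >= low and (high is None or system_ram < high):
            return parse_size(size)
    return 0

setting = "512M-:640M"
# Hypothetical machine with 32 GiB of RAM; prints the reservation in MiB.
print(crashkernel_reservation(setting, 32 * UNITS["G"]) // UNITS["M"])
```

With this setting, any machine with at least 512 MiB of RAM reserves 640 MiB for the crash kernel; smaller machines reserve nothing.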

ls -ltr /var/crash/202108271939/
total 2374244
-r-------- 1 root root 33816203264 Aug 27 19:40 vmcore.202108271939
root@sys2m2935031x:~# date
Fri 27 Aug 2021 07:45:45 PM UTC

All of my other tests looked good.
fio performance testing - No regressions.
reboot testing
configuration testing - (using our SSA tools to switch between RAID and HBA mode)

Krzysztof Kozlowski (krzk) wrote :

Thank you, Don!

Changed in linux (Ubuntu):
status: In Progress → Fix Released
tags: added: verification-done-focal verification-done-hirsute
removed: verification-needed-focal verification-needed-hirsute