NVMe drive fails at high write workload after kernel upgrades

Bug #2060770 reported by Enoch Leung
This bug affects 1 person
Affects          Status      Importance  Assigned to    Milestone
linux (Ubuntu)   Incomplete  Undecided   Kai-Heng Feng

Bug Description

My problem is similarly described in this old thread:
https://unix.stackexchange.com/questions/742360/

journalctl output (one of many similar occurrences):
Apr 09 15:37:40.096850 ****** kernel: Linux version 6.5.0-26-lowlatency (buildd@lcy02-amd64-109) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #26.1~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Mar 13 10:41:42 UTC (Ubuntu 6.5.0-26.26.1~22.04.1-lowlatency 6.5.13)
....................
Apr 09 15:43:46.238697 ****** kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Apr 09 15:43:46.239162 ****** kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Apr 09 15:43:46.239266 ****** kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Apr 09 15:43:46.690200 ****** kernel: nvme 0000:06:00.0: enabling device (0000 -> 0002)
Apr 09 15:43:46.690409 ****** kernel: nvme nvme0: Disabling device after reset failure: -19
Apr 09 15:43:46.698188 ****** kernel: I/O error, dev nvme0n1, sector 1216896 op 0x1:(WRITE) flags 0xc800 phys_seg 1 prio clas>
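
Note on the reset message in the log above: a CSTS value of 0xffffffff is all ones, which is typically what a register read returns when the device has stopped responding on the PCIe bus. The following is a minimal Python sketch, not part of the original report, for pulling those values out of saved journalctl output when many such lines need to be compared. The function name and the returned field names are hypothetical, and the pattern is based only on the message format shown above.

```python
import re

# Pattern modelled on the "controller is down" kernel message quoted above.
RESET_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): controller is down; will reset: "
    r"CSTS=(?P<csts>0x[0-9a-fA-F]+), PCI_STATUS=(?P<pci>0x[0-9a-fA-F]+)"
)


def parse_reset_line(line):
    """Return the decoded fields of a reset message, or None if no match."""
    match = RESET_RE.search(line)
    if match is None:
        return None
    csts = int(match.group("csts"), 16)
    return {
        "controller": match.group("ctrl"),
        "csts": csts,
        "pci_status": int(match.group("pci"), 16),
        # All ones usually means the register read itself failed.
        "csts_all_ones": csts == 0xFFFFFFFF,
    }
```

Feeding each line of a saved log through parse_reset_line and keeping the non-None results gives a quick list of reset events with their timestamps intact in the original lines.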

I was running 22.04.4 with the HWE kernel (6.5), as shown above.
I upgraded to the 24.04 development release hoping the problem would be resolved, but it still occurs (kernel 6.8).

The problem started after one of the kernel upgrades I installed after 2024-03-01, but I cannot pinpoint which one. The nvme_core kernel parameter suggested in the message above does not help.
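
When testing the parameters suggested by the kernel message, it can help to confirm that they actually reached the running kernel. The following is a minimal Python sketch, not part of the original report, that compares a kernel command line against the two suggested settings. The function names are hypothetical, and the parser assumes simple whitespace-separated tokens (quoted values containing spaces are not handled).

```python
from pathlib import Path

# The two settings named in the kernel's "Try ..." message.
SUGGESTED = {
    "nvme_core.default_ps_max_latency_us": "0",
    "pcie_aspm": "off",
}


def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in text.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def missing_params(cmdline_text, wanted=SUGGESTED):
    """Return the wanted settings that are absent or set to another value."""
    params = parse_cmdline(cmdline_text)
    return {k: v for k, v in wanted.items() if params.get(k) != v}


def check_running_kernel(path="/proc/cmdline"):
    """Check the command line of the currently booted kernel (Linux only)."""
    return missing_params(Path(path).read_text())
```

Calling check_running_kernel() on the affected host returns an empty dict when both settings are active; anything else lists what is still missing from the boot command line.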

The problem does NOT occur with the regular 22.04 kernel:
As a workaround, I created a VM that runs my heavy write workload with the NVMe drive attached via PCI passthrough, and it works fine. I cannot downgrade the host to an older kernel because the ZFS pool has been upgraded.

VM info (where my NVMe drive works okay)

uname -r
5.15.0-78-lowlatency

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy

Possibly relevant hardware:
CPU: AMD Ryzen 5750G (x8x4x4)
Chipset: AMD B450
NVMe: Samsung MZ1LB960HBJR-000FB (PM983a, f/w EDW73F2Q)

Enoch Leung (leun0036)
description: updated
Changed in linux (Ubuntu):
assignee: nobody → Anthony Wong (anthonywong)
Changed in linux (Ubuntu):
assignee: Anthony Wong (anthonywong) → Kai-Heng Feng (kaihengfeng)
Kai-Heng Feng (kaihengfeng) wrote :

Is it possible to attach said logs?

Changed in linux (Ubuntu):
status: New → Incomplete