NVME regression on kernel 6.8.0-32+ - Framework 16 crash & reboot after resuming from sleep

Bug #2071604 reported by Luis Alberto Pabón
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Upon upgrading to linux-image-6.8.0-36-generic from 6.8.0-31, my Framework 16 crashes and reboots a couple of minutes after resuming from sleep. I'm attaching dmesg and kernel logs to this report; notably, the laptop crashes a split second after the following error:

[ 494.141625] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
[ 494.141635] nvme nvme0: Does your device have a faulty power saving mode enabled?
[ 494.141638] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
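The driver message suggests two kernel parameters as a workaround, and the ProcKernelCmdLine recorded further down contains neither of them. A minimal sketch for checking a kernel command line for them; check_nvme_workaround is a hypothetical helper name, and it only does an exact, whole-word match on the two parameters as the message spells them.

```shell
#!/bin/sh
# Report whether each workaround parameter suggested by the nvme driver
# appears on a kernel command line (exact, whole-word match).
# check_nvme_workaround is a hypothetical helper name.
check_nvme_workaround() {
    # Use the argument if given, otherwise the running kernel's command line.
    cmdline=${1:-$(cat /proc/cmdline)}
    for param in nvme_core.default_ps_max_latency_us=0 pcie_aspm=off; do
        # Pad with spaces so a parameter at either end still matches as a word.
        case " $cmdline " in
            *" $param "*) echo "$param: set" ;;
            *) echo "$param: not set" ;;
        esac
    done
}
```

Called with no argument it checks the running kernel. Once nvme_core is loaded, its effective value is typically also readable from /sys/module/nvme_core/parameters/default_ps_max_latency_us.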

I have two NVMe drives:

 * nvme0: Crucial MP600 2230 drive
 * nvme1: WD Black SN850X 2280 drive

nvme0 holds a Windows 11 install.

Booting with linux-image-6.8.0-31-generic does not show this problem.

Edit: the issue also happens with 6.8.0-35; see comments below.

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: linux-image-6.8.0-36-generic 6.8.0-36.36
ProcVersionSignature: Ubuntu 6.8.0-36.36-generic 6.8.4
Uname: Linux 6.8.0-36-generic x86_64
NonfreeKernelModules: zfs
AlsaVersion: Advanced Linux Sound Architecture Driver Version k6.8.0-36-generic.
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.28.1-0ubuntu3
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
CRDA: N/A
Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
Card1.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
Card1.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
CasperMD5CheckResult: unknown
CurrentDesktop: sway
Date: Mon Jul 1 09:34:46 2024
MachineType: Framework Laptop 16 (AMD Ryzen 7040 Series)
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: root=zfs:zroot/ROOT/ubuntu_noble quiet splash rtc_cmos.use_acpi_alarm=1 spl.spl_hostid=0x00bab10c
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-6.8.0-36-generic N/A
 linux-backports-modules-6.8.0-36-generic N/A
 linux-firmware 20240318.git3b128b60-0ubuntu2.1
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to noble on 2024-04-22 (70 days ago)
dmi.bios.date: 03/27/2024
dmi.bios.release: 3.3
dmi.bios.vendor: INSYDE Corp.
dmi.bios.version: 03.03
dmi.board.asset.tag: *
dmi.board.name: FRANMZCP09
dmi.board.vendor: Framework
dmi.board.version: A9
dmi.chassis.asset.tag: FRAGAACPA940950016
dmi.chassis.type: 10
dmi.chassis.vendor: Framework
dmi.chassis.version: A9
dmi.modalias: dmi:bvnINSYDECorp.:bvr03.03:bd03/27/2024:br3.3:svnFramework:pnLaptop16(AMDRyzen7040Series):pvrA9:rvnFramework:rnFRANMZCP09:rvrA9:cvnFramework:ct10:cvrA9:skuFRAGAACP09:
dmi.product.family: 16in Laptop
dmi.product.name: Laptop 16 (AMD Ryzen 7040 Series)
dmi.product.sku: FRAGAACP09
dmi.product.version: A9
dmi.sys.vendor: Framework

Luis Alberto Pabón (copong) wrote:

This is the output of:

tail -f /var/log/kernel.log /var/log/syslog > syslog-kern.log

captured during suspend and resume, up to the crash and reboot.
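With a log captured this way, the failure signature quoted in the description can be pulled out with grep. A minimal sketch; find_nvme_resets is a hypothetical helper name, and the pattern simply matches the driver message text quoted at the top of this report.

```shell
#!/bin/sh
# Print, with line numbers, every "controller is down" message in a captured
# log file such as syslog-kern.log. find_nvme_resets is a hypothetical name.
find_nvme_resets() {
    grep -n -E 'nvme nvme[0-9]+: controller is down; will reset' "$1"
}
```

For example, find_nvme_resets syslog-kern.log. In that message, CSTS=0xffffffff and PCI_STATUS=0xffff are all-ones register reads, which usually means the drive has stopped responding on the PCIe link rather than reporting a specific controller status.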

Luis Alberto Pabón (copong) wrote:

I noticed that kernel 6.8.0-35, which came between 6.8.0-31 and 6.8.0-36, was one I had never installed. I tested it, and the issue also occurs with it.

description: updated
Luis Alberto Pabón (copong) wrote:

After further testing with kernels available in the -proposed repository:

6.8.0-31: no problem
6.8.0-32: problem
6.8.0-35: problem
6.8.0-36: problem
6.8.0-38: problem

Unfortunately, the changelog for 6.8.0-32 is huge and beyond my ability to parse:

https://bugs.launchpad.net/ubuntu/+source/linux/6.8.0-35.35

summary: - NVME regression: Framework 16 crash & reboot after resuming from sleep
+ NVME regression on kernel 6.8.0-32+ - Framework 16 crash & reboot after
+ resuming from sleep
Mario Limonciello (superm1) wrote:

If I were to guess, it's one of these patches that causes it:

965f593401bd PCI/ASPM: Update save_state when configuration changes
c12dda119c7a PCI/ASPM: Disable L1 before configuring L1 Substates
7fe5ec02955e PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state()
014516361233 PCI/ASPM: Save L1 PM Substates Capability for suspend/resume
9b6b60b75971 PCI/ASPM: Move pci_save_ltr_state() to aspm.c
1b9713d953d3 PCI/ASPM: Always build aspm.c
459c2d9f06f5 PCI/ASPM: Move pci_configure_ltr() to aspm.c

If it's one of those, that means it's a regression from https://bugs.launchpad.net/bugs/2042500.
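Since all of the suspected patches touch PCIe ASPM and L1 Substates handling, it may help to record which ASPM states are enabled on each NVMe drive before suspend and after resume. A minimal sketch, assuming a kernel that exposes the per-device link/*_aspm attributes described in the kernel's sysfs-bus-pci ABI documentation; show_aspm_state and the example PCI address are hypothetical.

```shell
#!/bin/sh
# Print the ASPM state attributes of one PCI device directory.
# Each attribute reads 1 (enabled) or 0 (disabled); "n/a" means the file is
# absent, e.g. because the device or kernel does not expose that state.
# show_aspm_state is a hypothetical helper name.
show_aspm_state() {
    dev=$1   # e.g. /sys/bus/pci/devices/0000:01:00.0 (hypothetical address)
    for attr in l0s_aspm l1_aspm l1_1_aspm l1_2_aspm; do
        if [ -r "$dev/link/$attr" ]; then
            echo "$attr=$(cat "$dev/link/$attr")"
        else
            echo "$attr=n/a"
        fi
    done
}
```

Running it once before suspend and once after resume, for example over each /sys/class/nvme/nvme*/device link, gives a before/after comparison for both drives.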

Could you bisect between these two tags?

Ubuntu-6.8.0-31.31 and
Ubuntu-6.8.0-32.32

If not, can you try a few other data points from mainline kernels to see whether it works in mainline?
https://kernel.ubuntu.com/mainline/v6.8.5/
https://kernel.ubuntu.com/mainline/v6.8.12/
https://kernel.ubuntu.com/mainline/v6.9/
https://kernel.ubuntu.com/mainline/v6.9.4/
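For the bisection between Ubuntu-6.8.0-31.31 and Ubuntu-6.8.0-32.32, git bisect does the bookkeeping: `git bisect start <bad> <good>` checks out a revision roughly halfway between the two, and each good/bad verdict halves the remaining range, so the number of test boots grows only with the logarithm of the commit count. The self-contained demo below shows just the mechanics on a throwaway repository, with `git bisect run` answering automatically; the tags good-tag and bad-tag and the file named bug are hypothetical stand-ins for the two Ubuntu tags and the resume crash.

```shell
#!/bin/sh
# Demonstrates `git bisect run` on a throwaway repository.
# good-tag, bad-tag and the file named "bug" are hypothetical stand-ins for
# Ubuntu-6.8.0-31.31, Ubuntu-6.8.0-32.32 and the resume crash.
# The function body runs in a subshell, so the cd does not leak out.
demo_bisect() (
    repo=$(mktemp -d) || exit 1
    cd "$repo" || exit 1
    git init -q
    git config user.email demo@example.com
    git config user.name demo
    i=1
    while [ "$i" -le 8 ]; do
        # Commit 5 introduces the "regression": an empty file named bug.
        if [ "$i" -eq 5 ]; then
            : > bug
        fi
        echo "$i" > version
        git add -A
        git commit -q -m "commit $i"
        # The first commit plays the role of the known-good tag.
        if [ "$i" -eq 1 ]; then
            git tag good-tag
        fi
        i=$((i + 1))
    done
    # The last commit plays the role of the known-bad tag.
    git tag bad-tag
    # Syntax: git bisect start <bad> <good>
    git bisect start bad-tag good-tag > /dev/null
    # The test command exits 0 (good) if "bug" is absent and 1 (bad) if present.
    git bisect run sh -c '! test -e bug' > /dev/null
    # When the run finishes, refs/bisect/bad names the first bad commit.
    git log -1 --format=%s refs/bisect/bad
    git bisect reset > /dev/null 2>&1
    cd / && rm -rf "$repo"
)
```

Here demo_bisect prints the subject of the first bad commit. On the kernel tree itself the equivalent start would be `git bisect start Ubuntu-6.8.0-32.32 Ubuntu-6.8.0-31.31`, and each step would mean building, installing and booting the checked-out kernel and doing a suspend/resume cycle before answering `git bisect good` or `git bisect bad` by hand.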

tags: added: regression-update
Luis Alberto Pabón (copong) wrote:

Thank you Mario. I'm looking into how to do a bisection, but I'm limited in what I can do with my setup since I use a ZFS root. I'll report back.

Kai-Heng Feng (kaihengfeng) wrote:
Luis Alberto Pabón (copong) wrote:

I recompiled the 6.8.0-36 kernel with that patch, and it solved the issue. I'll report on the other ticket. Thank you all.

Launchpad Janitor (janitor) wrote:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Luis Alberto Pabón (copong) wrote:
Changed in linux (Ubuntu):
status: Confirmed → Fix Released