NVME down errors after kernel update

Bug #1994059 reported by EPx
This bug affects 4 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

After the latest kernel update (from 5.15.0-50-generic to 5.15.0-52-generic), some NVME errors show up in dmesg, such as:

nvme nvme0: controller is down; will reset: CSTS=0xffffffff PCI_STATUS=0x10
nvme 0000:01:00.0: enabling device (0000 -> 0002)
nvme nvme0: Removing after probe failure status: -19
nvme0n1: detected capacity change from 1000215216 to 0

(Typed from a picture; I could not send the actual dmesg output, since the machine freezes shortly after losing the root filesystem and no new program can be loaded.)
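
For readers triaging similar logs, here is a minimal sketch that pulls the numeric fields out of the four messages above and decodes them. It is not part of the original report. The helper names (`decode_csts`, `summarize`) are hypothetical, and two points are assumptions: that an all-ones CSTS value means the register read itself failed (the usual result of a PCIe read when the device does not respond), and that the "capacity change" figures are counted in 512-byte sectors.

```python
import errno
import re

# The messages quoted in the report, as typed by the reporter.
LOG = """\
nvme nvme0: controller is down; will reset: CSTS=0xffffffff PCI_STATUS=0x10
nvme 0000:01:00.0: enabling device (0000 -> 0002)
nvme nvme0: Removing after probe failure status: -19
nvme0n1: detected capacity change from 1000215216 to 0
"""

CSTS_ALL_ONES = 0xFFFFFFFF   # assumption: the register read did not reach the device
PCI_STATUS_CAP_LIST = 0x10   # PCI status register bit 4: capabilities list present
SECTOR_BYTES = 512           # assumption: capacity is reported in 512-byte sectors


def decode_csts(value):
    """Decode an NVMe Controller Status (CSTS) register value."""
    if value == CSTS_ALL_ONES:
        return {"readable": False}
    return {
        "readable": True,
        "ready": bool(value & 0x1),             # RDY, bit 0
        "fatal": bool(value & 0x2),             # CFS, bit 1
        "shutdown_status": (value >> 2) & 0x3,  # SHST, bits 3:2
    }


def summarize(log):
    """Extract and decode the numeric fields found in the log text."""
    out = {}
    m = re.search(r"CSTS=(0x[0-9a-fA-F]+) PCI_STATUS=(0x[0-9a-fA-F]+)", log)
    if m:
        pci_status = int(m.group(2), 16)
        out["csts"] = decode_csts(int(m.group(1), 16))
        out["pci_status"] = pci_status
        out["pci_cap_list"] = bool(pci_status & PCI_STATUS_CAP_LIST)
    m = re.search(r"probe failure status: (-?\d+)", log)
    if m:
        # The kernel prints a negative errno; map its magnitude to a symbolic name.
        out["probe_errno"] = errno.errorcode.get(abs(int(m.group(1))), "unknown")
    m = re.search(r"detected capacity change from (\d+) to (\d+)", log)
    if m:
        out["old_bytes"] = int(m.group(1)) * SECTOR_BYTES
        out["new_bytes"] = int(m.group(2)) * SECTOR_BYTES
    return out
```

Under those assumptions, `summarize(LOG)` reports an unreadable CSTS, a probe failure of `ENODEV`, and a namespace that shrank from roughly 512 GB to zero, which would be consistent with the controller dropping off the bus rather than reporting an internal fault.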

I tried some tricks suggested on the Internet for similar problems, such as the kernel parameters nvme_core.default_ps_max_latency_us=0 and pcie_aspm=off. They only seem to delay the freeze, raising the average time from 20-30 min to 120-180 min, but do not solve the issue.
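
When testing workarounds like these, it helps to confirm that the parameters actually reached the running kernel. The following is a minimal sketch, not something from the original report: the function names are hypothetical, and the whitespace-based parser is a simplification that ignores quoted values containing spaces.

```python
from pathlib import Path

# The two workaround parameters named in the report and the values tried.
WORKAROUNDS = {
    "nvme_core.default_ps_max_latency_us": "0",
    "pcie_aspm": "off",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_workarounds(cmdline):
    """Return the workaround parameters that are absent or set to another value."""
    params = parse_cmdline(cmdline)
    return [name for name, want in WORKAROUNDS.items() if params.get(name) != want]


def running_cmdline():
    """Read the command line of the running kernel (Linux only)."""
    return Path("/proc/cmdline").read_text()
```

On the affected machine, `missing_workarounds(running_cmdline())` returning an empty list would indicate that both parameters are present; the `ProcKernelCmdLine` field in the report can be passed in as a string for the same check.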

Going back to kernel 5.15.0-50 seems to have stabilized the machine.

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-image-5.15.0-52-generic 5.15.0-52.58
ProcVersionSignature: Ubuntu 5.15.0-50.56-generic 5.15.60
Uname: Linux 5.15.0-50-generic x86_64
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: epx 1175 F.... pipewire-media-
                      epx 1176 F.... pulseaudio
 /dev/snd/seq: epx 1174 F.... pipewire
CRDA: N/A
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Mon Oct 24 14:57:35 2022
InstallationDate: Installed on 2022-10-17 (6 days ago)
InstallationMedia: Ubuntu 22.04.1 LTS "Jammy Jellyfish" - Release amd64 (20220809.1)
MachineType: SAMSUNG ELECTRONICS CO., LTD. 550XDA
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.15.0-50-generic root=/dev/mapper/vgubuntu-root ro quiet splash mitigations=off nvme_core.default_ps_max_latency_us=0 pcie_aspm=off vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-5.15.0-50-generic N/A
 linux-backports-modules-5.15.0-50-generic N/A
 linux-firmware 20220329.git681281e4-0ubuntu3.5
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/23/2022
dmi.bios.release: 5.19
dmi.bios.vendor: American Megatrends International, LLC.
dmi.bios.version: P17CFB.044.220223.HQ
dmi.board.asset.tag: No Asset Tag
dmi.board.name: NP550XDA-KH3BR
dmi.board.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.board.version: SGLB187A0D-C01-G001-S0001+10.0.22000
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnAmericanMegatrendsInternational,LLC.:bvrP17CFB.044.220223.HQ:bd02/23/2022:br5.19:svnSAMSUNGELECTRONICSCO.,LTD.:pn550XDA:pvrP17CFB:rvnSAMSUNGELECTRONICSCO.,LTD.:rnNP550XDA-KH3BR:rvrSGLB187A0D-C01-G001-S0001+10.0.22000:cvnSAMSUNGELECTRONICSCO.,LTD.:ct10:cvrN/A:skuSCAI-A5A5-A5A5-TGL3-PCFB:
dmi.product.family: Notebook Plus2
dmi.product.name: 550XDA
dmi.product.sku: SCAI-A5A5-A5A5-TGL3-PCFB
dmi.product.version: P17CFB
dmi.sys.vendor: SAMSUNG ELECTRONICS CO., LTD.

EPx (elvis-pfutzenreuter) wrote :

Picture of dmesg errors

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

BloodyIron (bloodyiron) wrote :

I'm getting NVMe problems with my OS drive, a Samsung 970 EVO Pro, after updating my Linux kernel to "5.15.0-52".

This is seriously concerning, as I thought my device was failing: one time there was a hard lock-up and hard reset after it reported being mounted read-only with I/O errors, and other times it just locked up for a minute or two and then came back.

I've had to switch back to 5.15.0-50, and this is a BREAKING CHANGE. This needs to be P1 as this kernel is unusable with NVMe devices as far as I can tell.

BloodyIron (bloodyiron) wrote :

Like honestly this kernel should be removed from the repo of available updates.

BloodyIron (bloodyiron) wrote :

So why is this not yet reverted? I see that 5.15.0-53 is now available, should I even bother updating my kernel at all now????????? I cannot risk the stability of my system with this broken code.

EPx (elvis-pfutzenreuter) wrote :

I suspect there are more kernels that cause instability with this NVME device (I don't know whether the hardware or the kernel is the culprit).

Using kernels 5.15.0-50 and 5.15.0-43 made the system less unstable than the reported kernel (5.15.0-52): the nvme0 power-off message did not show up, but the system still froze every <single-digit> hours (a hard freeze, not just an I/O stall, with no chance of ever getting dmesg). Sleep/resume did not work 50% of the time, and even disconnecting or reconnecting the power cord sometimes froze the system.

I installed a regular SSD in this computer and have had no freezes since. With the -52 kernel the nvme0 power-off message still shows up in dmesg, but it does not cause a crash, since the NVME disk is not in use at the moment.
