"nvme nvme0: Abort status: 0x0" / "nvme nvme0: I/O 14 QID 2 timeout, aborting"

Bug #1991291 reported by RevAngel
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
Ubuntu | Confirmed | Undecided | Unassigned |

Bug Description

I am running the 5.19 mainline kernel series on Ubuntu 22.04 and have had this issue from 5.19.1 (the starting version is a guess) up to 5.19.11, which is the most recent release and the one I am using now.

Gathered from the Ubuntu logs ("Protokolle" on my German version of Ubuntu) under "Hardware":
nvme nvme0: Abort status: 0x0
nvme nvme0: I/O 14 QID 2 timeout, aborting
nvme nvme0: Abort status: 0x0
nvme nvme0: I/O 62 QID 2 timeout, aborting
and so on... (the I/O number and QID number changes)
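To make it easier to see which queues are affected, the timeout lines above can be extracted from a saved kernel log with a small script. This is only a minimal sketch: the regular expression is written against the two message formats quoted in this report, and the names `parse_timeouts`, `timeouts_per_queue` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Matches the timeout message format quoted in this report, e.g.
# "nvme nvme0: I/O 14 QID 2 timeout, aborting"
TIMEOUT_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): I/O (?P<tag>\d+) QID (?P<qid>\d+) timeout, aborting"
)


def parse_timeouts(lines):
    """Return a (controller, io_tag, queue_id) tuple for every timeout line."""
    events = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            events.append((match["ctrl"], int(match["tag"]), int(match["qid"])))
    return events


def timeouts_per_queue(events):
    """Count how many timeouts were logged for each queue ID."""
    return Counter(qid for _ctrl, _tag, qid in events)


# The four log lines from the bug description, used as sample input.
SAMPLE = [
    "nvme nvme0: Abort status: 0x0",
    "nvme nvme0: I/O 14 QID 2 timeout, aborting",
    "nvme nvme0: Abort status: 0x0",
    "nvme nvme0: I/O 62 QID 2 timeout, aborting",
]

if __name__ == "__main__":
    events = parse_timeouts(SAMPLE)
    print(events)
    print(timeouts_per_queue(events))
```

In practice the input would be the lines of a file holding saved kernel messages (for example from `journalctl -k`) rather than `SAMPLE`.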

I have had these issues since I changed from an AMD 2400G on an AM4 board with AGESA 1.0.0.6 (X370 chipset) running Ubuntu 20.04 to an AMD 5600G on an AM4 board with AGESA 1.2.0.7 (A520 chipset) running Ubuntu 22.04.1 LTS. The NVMe drive is the same.

The NVMe drive has never exceeded its high-temperature limit and shows zero errors and very little wear in SMART tests, since I use it only in a simple "daily use" multimedia system.

When the "nvme nvme0: Abort status: 0x0" / "nvme nvme0: I/O 14 QID 2 timeout, aborting" errors occur, the system hangs for a while and no I/O operations are processed for up to 30 seconds. Then the system works normally again until the next error arrives. The interval between errors ranges from 2 minutes to several hours.
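The stalls of up to 30 seconds are consistent with the Linux NVMe driver's I/O timeout, which defaults to 30 seconds and is exposed as the `io_timeout` parameter of the `nvme_core` module. That connection is an inference and is not confirmed anywhere in this report. A minimal sketch for checking the value on the affected machine, with `read_io_timeout` as a made-up helper name:

```python
from pathlib import Path

# Standard sysfs location of the nvme_core module parameter on Linux.
IO_TIMEOUT_PARAM = Path("/sys/module/nvme_core/parameters/io_timeout")


def read_io_timeout(param_path=IO_TIMEOUT_PARAM):
    """Return the NVMe I/O timeout in seconds, or None if it cannot be read."""
    try:
        return int(Path(param_path).read_text().strip())
    except (OSError, ValueError):
        # File missing (module not loaded, non-Linux system) or unexpected content.
        return None


if __name__ == "__main__":
    timeout = read_io_timeout()
    if timeout is None:
        print("nvme_core io_timeout is not readable on this system")
    else:
        print(f"nvme_core io_timeout: {timeout} s")
```

If the printed value matches the length of the hangs, that would support the idea that each hang is one command waiting for the driver timeout before being aborted.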

The NVMe drive is an ADATA SX6000LNP, firmware version V9001c00 (gathered from the gSMARTcontrol info).
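For anyone who wants to confirm the model and firmware version without a GUI tool, the kernel also exposes both as text files under `/sys/class/nvme/<controller>/`. The following is a small sketch, assuming the controller is `nvme0` as in the log messages; `read_nvme_identity` is a made-up helper name.

```python
from pathlib import Path


def read_nvme_identity(ctrl="nvme0", sysfs_root="/sys/class/nvme"):
    """Read the model and firmware revision of an NVMe controller from sysfs.

    Attributes that cannot be read are returned as None.
    """
    base = Path(sysfs_root) / ctrl
    info = {}
    for attr in ("model", "firmware_rev"):
        try:
            info[attr] = (base / attr).read_text().strip()
        except OSError:
            info[attr] = None
    return info


if __name__ == "__main__":
    print(read_nvme_identity())
```

Including this output in a report lets others check whether they have the same drive and firmware.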

I also found the same issue reported for other Linux distributions and kernels here:
https://github.com/clearlinux/distribution/issues/2121
https://github.com/vmware/open-vm-tools/issues/579

If I can be of any help in providing more information, please consider my knowledge as "user-level": I can read, I can use the terminal, and I can try to draw conclusions, but I am not a Linux pro. So please be so kind as to help me with small steps and concrete commands if I can provide further information about this issue (thank you in advance for that).

RevAngel (revangel)
description: updated
Guruprasad (lgp171188)
affects: launchpad → ubuntu
RevAngel (revangel) wrote :

Bug stopped happening after updating to Kernel mainline 6.0.3

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu:
status: New → Confirmed
Peter Passchier (peter-passchier) wrote :

How did you update to mainline 6.0.3??

RevAngel (revangel) wrote :

@Peter Passchier: If you want an easy way on an x86 / x64 CPU, you can use Ukuu (short for Ubuntu Kernel Update Utility), a program that costs $15. It used to be open source; code from 2008 is on GitHub, and as far as I know there is no fork with a more recent update. Alternatively, you can read up on how to do it manually with the sources from https://kernel.ubuntu.com/~kernel-ppa/mainline/ . I am on mainline 6.2.8 right now.

RevAngel (revangel) wrote :

@Peter Passchier:
Wait, there is a fork:
https://github.com/bkw777/mainline
and an alternative:
https://github.com/orhun/kmon

RevAngel (revangel) wrote :

@Peter Passchier:
Be warned that using a recent mainline kernel has its risks.

from https://www.linuxuprising.com/2018/10/2-utilities-to-install-latest-kernel-in.html
WARNING - Please read before updating the kernel

Before installing anything you should know that usually it's not a good idea to install a mainline kernel on your Ubuntu machine. These kernels are built from the latest Linux sources, without any Ubuntu patches or any other modifications, and are unsupported.

What's more, installing a kernel from the Mainline Kernel PPA usually breaks proprietary drivers or out-of-tree modules, like the proprietary Nvidia graphics drivers, Broadcom wireless drivers, VirtualBox dkms module, and so on. As a result, your computer may boot to a black screen, you may experience random freezes, and / or your WiFi may not be working after installing and booting to a mainline kernel.

As an example, I installed the latest Linux 4.19 while having the Nvidia 396.54 graphics drivers installed, and the Nvidia module failed to build. Luckily, the Nvidia Graphics PPA has a newer drivers version that supports Linux 4.19 - Nvidia 410, so I installed that to solve the issue. But if Nvidia 410 hadn't been released or if my graphics card wouldn't have supported the latest version of the drivers from the PPA, my computer would have booted to a black screen using the 4.19 kernel (or I'd have to remove the proprietary Nvidia drivers and use Nouveau instead).

RevAngel (revangel) wrote :

Since nothing has happened since my last posts, I just wanted to note that this issue also occurred with the 6.7 kernel series and is now happening with 6.8(.0) as well.

On 6.7 it was so bad that I/O was lost completely and only a cold boot could recover the system. The system would stay stable for about 2-3 days, but sometimes for less than 2 hours, before a cold boot was needed. I have no clue what causes the instability; the applications running when it happens vary a lot. It mainly happens when browsing and loading new pages/tabs and/or when skipping around while viewing videos.

The error messages in the log (nvme nvme0: Abort status: 0x0) appear at intervals of anywhere between two minutes and two hours, even on a completely idle system.

Secondary log messages vary, but most are like this:

nvme nvme0: I/O tag 512 (f200) opcode 0x1 (I/O Cmd) QID 3 timeout, aborting req_op:WRITE(1) size:4096
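Newer kernels put more detail into the timeout message than the 5.19 lines at the top of this report. As a rough sketch, the fields of the line above can be pulled apart as follows; the pattern is written only against this one quoted line, and the field names (`hex_id`, `kind` and so on) as well as `parse_detailed_timeout` are labels chosen for illustration, not official kernel terminology.

```python
import re

# Pattern for the detailed timeout message quoted in this comment.
DETAILED_TIMEOUT_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): I/O tag (?P<tag>\d+) \((?P<hex_id>[0-9a-fA-F]+)\) "
    r"opcode (?P<opcode>0x[0-9a-fA-F]+) \((?P<kind>[^)]*)\) "
    r"QID (?P<qid>\d+) timeout, aborting "
    r"req_op:(?P<req_op>\w+)\((?P<req_op_num>\d+)\) size:(?P<size>\d+)"
)


def parse_detailed_timeout(line):
    """Return the fields of a detailed timeout line as a dict, or None."""
    match = DETAILED_TIMEOUT_RE.search(line)
    if match is None:
        return None
    return {
        "ctrl": match["ctrl"],
        "tag": int(match["tag"]),
        "hex_id": match["hex_id"],           # hex value in parentheses, kept as text
        "opcode": int(match["opcode"], 16),  # "0x1" -> 1
        "kind": match["kind"],
        "qid": int(match["qid"]),
        "req_op": match["req_op"],
        "size": int(match["size"]),          # unit as reported by the kernel
    }


# The log line from this comment, used as sample input.
SAMPLE_LINE = (
    "nvme nvme0: I/O tag 512 (f200) opcode 0x1 (I/O Cmd) QID 3 timeout, "
    "aborting req_op:WRITE(1) size:4096"
)

if __name__ == "__main__":
    print(parse_detailed_timeout(SAMPLE_LINE))
```

Collecting `req_op` and `size` over many such lines would show whether the timeouts are limited to small writes like this one or also affect reads.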

No error log can be captured when the OS loses I/O, because from that point on it is no longer possible even to load a secondary workspace or to log out, shut down, or restart the OS. A cold boot is necessary.

RevAngel (revangel) wrote :

I installed and booted the 6.8(.0) kernel today, so I cannot yet tell whether anything has changed compared to the 6.7 series.
