I'm getting this on

```
root@abbey:~# head /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
```

with this hardware:

```
root@abbey:~# lspci -vv -d 144d:
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 (rev 01) (prog-if 02 [NVM Express])
	Subsystem: Samsung Electronics Co Ltd PM963 2.5" NVMe PCIe SSD
	Physical Slot: 0
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
	Capabilities: [168 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [188 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [190 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	Kernel driver in use: nvme
	Kernel modules: nvme
```

Every time the system boots I get a worrisome email like

> SMART error (ErrorCount) detected on host: abbey
>
> Device: /dev/nvme0, number of Error Log entries increased from 0 to 1
>
> Device info:
> SAMSUNG MZVPV256HDGL-000L7, S/N:S27MNYAH710579, FW:5L6QBXW7

I was going to throw the chip out, but then I ran `badblocks` over the disk and it found nothing, so I got suspicious and decided to investigate further.

If I use `nvme error-log` (from `apt install nvme-cli`) I can see that the errors all look like

```
.................
 Entry[ 0]
.................
error_count	: 23
sqid		: 0
cmdid		: 0x1019
status_field	: 0x2002(INVALID_FIELD: A reserved coded value or an unsupported value in a defined field)
phase_tag	: 0
parm_err_loc	: 0
lba		: 0
nsid		: 0
vs		: 0
trtype		: The transport type is not indicated or the error is not transport related.
cs		: 0
trtype_spec_info: 0
```

I was able to find a clue from someone on Reddit: https://www.reddit.com/r/DataHoarder/comments/gspbur/nvme_errors_but_smart_selfassessment_passed_need/

> I'm 99.99% sure that 0x4004 is "You tried to talk to me with NVMe 1.z but i only speak NVMe 1.x-y"

and that thread suggests that a solution might be to upgrade the Samsung firmware on the drive so that it becomes compatible again, though that's a relatively difficult process.

So would this be an incompatibility with the kernel or with smartmontools? The Debian bug makes it sound like it's with the kernel.
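
A side note on why the Reddit quote says 0x4004 while my `nvme error-log` output says 0x2002: as far as I can tell these are the same status. In the NVMe error log entry, bit 0 of the 16-bit status word is the phase tag and the status proper sits in bits 15:1. My assumption (not verified against the nvme-cli source) is that `nvme error-log` prints the word already shifted right by one bit, while the 0x4004 quoted in the thread is the unshifted word. Below is a minimal Python sketch of that arithmetic; `decode_status` is a helper name I made up for illustration, not part of any tool.

```python
def decode_status(raw):
    """Split the 16-bit status word of an NVMe error log entry into fields.

    Bit layout as I read it from the NVMe base specification:
    bit 0 phase tag, bits 8:1 status code (SC), bits 11:9 status code
    type (SCT), bits 13:12 command retry delay, bit 14 More, bit 15 DNR.
    """
    return {
        "phase_tag": raw & 0x1,
        "sc": (raw >> 1) & 0xFF,
        "sct": (raw >> 9) & 0x7,
        "crd": (raw >> 12) & 0x3,
        "more": (raw >> 14) & 0x1,
        "dnr": (raw >> 15) & 0x1,
    }


# Assumption: nvme-cli shows the status word shifted right by one bit,
# so shifting it back left should give the unshifted word.
raw = 0x2002 << 1
fields = decode_status(raw)
print(hex(raw), fields)
```

If that reading is right, both numbers decode to generic status code 0x02, which matches the INVALID_FIELD text that `nvme error-log` prints, so the Reddit thread and my drive appear to be reporting the same error.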