FS disconnects on WD PC SN810 SDCPNRZ HP FW

Bug #2047712 reported by Martin Kvanta
This bug affects 1 person
Affects          Status   Importance   Assigned to   Milestone
linux (Ubuntu)   New      Undecided    Unassigned

Bug Description

Hello,

I bought an HP ZBook Firefly 14 G10 A notebook with an AMD Ryzen 7 7840HS CPU.
This notebook has an NVMe SSD, a WD PC SN810 SDCPNRZ-2T00 with HPS2 firmware.

With the standard Ubuntu kernel 6.5.0-14-generic, the disk regularly disconnects and the filesystem becomes read-only.
I found a similar issue reported for e.g. the Kingston A2000 NVMe disk. This seems to be a firmware bug.
I tried the same solution and it worked.

As a first workaround I added the kernel parameter "nvme_core.default_ps_max_latency_us=10000".
Behavior was much better, but the disk still sometimes disconnects (after connecting to or disconnecting from a Thunderbolt docking station, or after several sleep/wake cycles).

The second workaround was to patch the kernel source and set the quirk "NVME_QUIRK_NO_DEEPEST_PS" for this disk's identifier. Then I built a kernel package and installed it on the notebook running Ubuntu 23.10.
After ~4 days of usage this seems to be a working solution.
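
To compare the two workarounds, here is a minimal Python sketch of how idle power states might be selected for autonomous power state transitions. It is a simplified model, not the kernel code: it assumes a non-operational state is eligible when its exit latency does not exceed the "default_ps_max_latency_us" limit, and that "NVME_QUIRK_NO_DEEPEST_PS" excludes only the highest-numbered state. The latencies are copied from the smartctl power state table in this report; the names PowerState and apst_idle_states are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerState:
    index: int
    operational: bool   # "+" in the Op column of the smartctl table
    entry_lat_us: int   # Ent_Lat column
    exit_lat_us: int    # Ex_Lat column


# Values copied from the "Supported Power States" table in this report.
SN810_STATES = [
    PowerState(0, True, 0, 0),
    PowerState(1, True, 0, 0),
    PowerState(2, True, 0, 0),
    PowerState(3, False, 5000, 10000),
    PowerState(4, False, 3900, 45700),
]


def apst_idle_states(states, max_latency_us, no_deepest_ps=False):
    """Return indexes of states that may be used as idle targets.

    Assumption: only the exit latency is compared against the limit.
    """
    deepest = max(s.index for s in states)
    eligible = []
    for s in states:
        if s.operational:
            continue  # operational states are not idle targets
        if no_deepest_ps and s.index == deepest:
            continue  # modelled effect of NVME_QUIRK_NO_DEEPEST_PS
        if s.exit_lat_us > max_latency_us:
            continue  # too slow to wake up for the configured limit
        eligible.append(s.index)
    return eligible


if __name__ == "__main__":
    # 100000 is the assumed kernel default for default_ps_max_latency_us.
    print("default limit:", apst_idle_states(SN810_STATES, 100000))
    print("limit 10000  :", apst_idle_states(SN810_STATES, 10000))
    print("quirk only   :", apst_idle_states(SN810_STATES, 100000, True))
    print("limit 0      :", apst_idle_states(SN810_STATES, 0))
```

Under these assumptions, both the 10000 limit and the quirk leave state 3 as the only idle target and exclude state 4, so the two workarounds would be expected to behave alike on this drive. Only a limit of 0 removes state 3 as well.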

I am now running my own kernel build 6.5.3 (from sources downloaded via the apt package manager), which is why I cannot provide "ubuntu-bug linux" output.

Would it be possible to add this fix to the default kernel image, please?
I'm not a developer, so I'm not sure whether this is all you need.
I can provide additional information or testing if needed.

------
The file "linux-6.5.0/drivers/nvme/host/pci.c" was updated with the following lines:

--- drivers-nvme-host-pci.c.original 2023-12-29 20:48:38.048105439 +0100
+++ drivers-nvme-host-pci.c.new 2023-12-29 20:48:38.064104967 +0100
@@ -3399,6 +3399,8 @@
   .driver_data = NVME_QUIRK_BOGUS_NID, },
  { PCI_DEVICE(0x15b7, 0x2001), /* Sandisk Skyhawk */
   .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
+ { PCI_DEVICE(0x15b7, 0x5011), /* Sandisk Corp WD PC SN810 SDCPNRZ-2T00 (HPS2 FW) */
+ .driver_data = NVME_QUIRK_NO_DEEPEST_PS, },
  { PCI_DEVICE(0x1d97, 0x2263), /* SPCC */
   .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
  { PCI_DEVICE(0x144d, 0xa80b), /* Samsung PM9B1 256G and 512G */
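
The vendor and device IDs used in a PCI_DEVICE() entry can usually be read from sysfs. The following Python sketch assumes the controller is exposed as /sys/class/nvme/nvme0 with a "device" link to the PCI device; the function name pci_ids is hypothetical.

```python
from pathlib import Path


def pci_ids(ctrl_dir):
    """Return (vendor, device) read from <ctrl_dir>/device/{vendor,device}."""
    dev = Path(ctrl_dir) / "device"
    # sysfs stores the IDs as hex strings such as "0x15b7".
    vendor = int((dev / "vendor").read_text().strip(), 16)
    device = int((dev / "device").read_text().strip(), 16)
    return vendor, device


if __name__ == "__main__":
    ctrl = Path("/sys/class/nvme/nvme0")  # assumed controller name
    if ctrl.exists():
        vendor, device = pci_ids(ctrl)
        print(f"PCI_DEVICE(0x{vendor:04x}, 0x{device:04x})")
```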

------
$ lsb_release -rd
No LSB modules are available.
Description: Ubuntu 23.10
Release: 23.10

------
$ apt-cache policy linux-image-6.5.0-14-generic
linux-image-6.5.0-14-generic:
  Installed: 6.5.0-14.14
  Candidate: 6.5.0-14.14
  Version table:
 *** 6.5.0-14.14 500
        500 http://sk.archive.ubuntu.com/ubuntu mantic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu mantic-security/main amd64 Packages
        100 /var/lib/dpkg/status

$ apt-cache policy linux-image-6.5.3
linux-image-6.5.3:
  Installed: 6.5.3-2
  Candidate: 6.5.3-2
  Version table:
 *** 6.5.3-2 100
        100 /var/lib/dpkg/status

------
$ sudo smartctl -ic /dev/nvme0n1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.5.3] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: WD PC SN810 SDCPNRZ-2T00-1006
Serial Number:
Firmware Version: HPS2
PCI Vendor/Subsystem ID: 0x15b7
IEEE OUI Identifier: 0x001b44
Total NVM Capacity: 2 048 408 248 320 [2,04 TB]
Unallocated NVM Capacity: 0
Controller ID: 8224
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2 048 408 248 320 [2,04 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 001b44 8b4a02b37b
Local Time is: Fri Dec 29 21:04:43 2023 CET
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size: 128 Pages
Warning Comp. Temp. Threshold: 84 Celsius
Critical Comp. Temp. Threshold: 88 Celsius
Namespace 1 Features (0x02): NA_Fields

Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
 0 + 8.25W 8.25W - 0 0 0 0 0 0
 1 + 3.50W 3.50W - 0 0 0 0 0 0
 2 + 2.60W 2.60W - 0 0 0 0 0 0
 3 - 0.0250W - - 3 3 3 3 5000 10000
 4 - 0.0035W - - 4 4 4 4 3900 45700

Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
 0 + 512 0 2
 1 - 4096 0 1

------

Thanks

Martin Kvanta (markvan) wrote :

Hello,

5 minutes after I submitted this bug, the filesystem was disconnected again.
So it seems the issue is not solved.

When I tried to check the status of the disk, there was no response from the NVMe controller.
I tried both smartctl and the nvme tool.

Martin Kvanta (markvan) wrote :

Hello,

I tried some additional things.
I added remote logging to another device to be able to catch the error.
Below is what I found.

In the meantime I tried mainline kernel 6.7.0, but I still get the same error.

Then I tried adding another quirk (NVME_QUIRK_DELAY_BEFORE_CHK_RDY) for this disk in "linux/drivers/nvme/host/pci.c".
But after recompiling and installing the kernel I still get the same error.

I also tried the kernel parameters recommended in the error message.
After working for some time I got the same error.
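
When testing kernel parameters it helps to confirm that they actually took effect after reboot. A minimal Python sketch, assuming the usual /proc/cmdline and /sys/module/nvme_core/parameters paths; the helper name parse_cmdline is made up, and quoted values containing spaces are not handled.

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        params = parse_cmdline(proc.read_text())
        for name in ("nvme_core.default_ps_max_latency_us", "pcie_aspm"):
            print(name, "=", params.get(name, "<not set>"))
    # Value the loaded nvme_core module reports, if the module is present.
    live = Path("/sys/module/nvme_core/parameters/default_ps_max_latency_us")
    if live.exists():
        print("nvme_core reports", live.read_text().strip())
```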

Can I ask for some help or a recommendation, please?

This is what I found in the logs:

Jan 14 14:17:09 marvin kernel: [52934.215144] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Jan 14 14:17:09 marvin kernel: [52934.215157] nvme nvme0: Does your device have a faulty power saving mode enabled?
Jan 14 14:17:09 marvin kernel: [52934.215162] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Jan 14 14:17:09 marvin kernel: [52934.315175] nvme 0000:02:00.0: enabling device (0000 -> 0002)
Jan 14 14:17:09 marvin kernel: [52934.315562] nvme nvme0: Disabling device after reset failure: -19
Jan 14 14:17:09 marvin kernel: [52934.356714] EXT4-fs (nvme0n1p2): shut down requested (2)
Jan 14 14:17:09 marvin kernel: [52934.356718] Aborting journal on device nvme0n1p2-8.
Jan 14 14:17:09 marvin kernel: [52934.356722] JBD2: I/O error when updating journal superblock for nvme0n1p2-8.
Jan 14 14:17:57 marvin udisksd[1948]: Error performing housekeeping for drive /org/freedesktop/UDisks2/drives/WD_PC_SN810_SDCPNRZ_2T00_1006_23306V800421: Error updating Health Information: Failed to open device '/dev/nvme0': Resource temporarily unavailable (g-bd-nvme-error-quark, 2)
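
With remote logging in place, the "controller is down" events can be picked out automatically. The Python sketch below parses that log line and flags a CSTS value of 0xffffffff. A register read of all ones is commonly taken to mean the device no longer responds on the PCIe bus, but that interpretation is an assumption here, not something the log states. The names DOWN_RE and parse_controller_down are hypothetical.

```python
import re

# Matches the kernel message shown in the log above.
DOWN_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): controller is down; will reset: "
    r"CSTS=(?P<csts>0x[0-9a-fA-F]+), PCI_STATUS=(?P<pci>0x[0-9a-fA-F]+)"
)


def parse_controller_down(line):
    """Return details of a 'controller is down' message, or None."""
    m = DOWN_RE.search(line)
    if m is None:
        return None
    csts = int(m.group("csts"), 16)
    return {
        "ctrl": m.group("ctrl"),
        "csts": csts,
        "pci_status": int(m.group("pci"), 16),
        # All bits set: the read probably did not reach the controller.
        "all_ones": csts == 0xFFFFFFFF,
    }


if __name__ == "__main__":
    sample = (
        "Jan 14 14:17:09 marvin kernel: [52934.215144] nvme nvme0: "
        "controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10"
    )
    print(parse_controller_down(sample))
```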

Thanks

Farkas Miklós (miklosfarkas) wrote :

Hi!

The same issue is happening to me as well! In practice, I can't put the laptop to sleep, because at some random point 1-40 minutes after waking up, the device becomes read-only!
I've tried various kernel parameters, but without success.
It's very unpleasant! I'm open to any solutions :)

Thanks
----

root@farkasm-z:/home/farkasm# lsb_release -rd
Description: Ubuntu 22.04.3 LTS
Release: 22.04

----

sudo smartctl -ic /dev/nvme0n1
Place your right index finger on the fingerprint reader
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.7.0-060700-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: WD PC SN810 SDCPNRZ-2T00-1006
Serial Number: 23240W800405
Firmware Version: HPS2
PCI Vendor/Subsystem ID: 0x15b7
IEEE OUI Identifier: 0x001b44
Total NVM Capacity: 2.048.408.248.320 [2,04 TB]
Unallocated NVM Capacity: 0
Controller ID: 8224
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2.048.408.248.320 [2,04 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 001b44 8b4ab177c1
Local Time is: Fri Jan 19 10:46:05 2024 CET
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x1e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size: 128 Pages
Warning Comp. Temp. Threshold: 84 Celsius
Critical Comp. Temp. Threshold: 88 Celsius
Namespace 1 Features (0x02): NA_Fields

Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
 0 + 8.25W 8.25W - 0 0 0 0 0 0
 1 + 3.50W 3.50W - 0 0 0 0 0 0
 2 + 2.60W 2.60W - 0 0 0 0 0 0
 3 - 0.0250W - - 3 3 3 3 5000 10000
 4 - 0.0035W - - 4 4 4 4 3900 45700

Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
 0 + 512 0 2

----
uname -r
6.7.0-060700-generic

Martin Kvanta (markvan) wrote :

Hello,

After some additional investigation, this looks more like some kind of problem with standby mode.
I checked this discussion and tried the AMD s2idle analysis script to debug ACPI standby:
https://discussion.fedoraproject.org/t/please-improve-the-s0ix-experience-under-linux/79113/2
But I found no new information.

As a next step I will try installing another SSD, from Kingston, to check whether this is a disk or a system error.

For now I have configured the system to use hibernation instead of standby (the only available standby mode is s2idle).
I followed this discussion to enable hibernation in the system menu:
https://askubuntu.com/questions/1405990/hibernate-entry-in-the-menu-on-22-04
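
Which suspend modes a machine offers can be checked in /sys/power/mem_sleep, where the active mode is shown in square brackets. A small Python sketch for reading it, assuming that file format; the function name parse_mem_sleep is made up for illustration.

```python
from pathlib import Path


def parse_mem_sleep(text):
    """Return (available_modes, selected_mode) from mem_sleep content."""
    tokens = text.split()
    available = [t.strip("[]") for t in tokens]
    # The currently selected mode is the one wrapped in brackets.
    selected = next((t.strip("[]") for t in tokens if t.startswith("[")), None)
    return available, selected


if __name__ == "__main__":
    path = Path("/sys/power/mem_sleep")
    if path.exists():
        available, selected = parse_mem_sleep(path.read_text())
        print("available:", available, "selected:", selected)
```

On a system like the one described here, where only s2idle is offered, the file would be expected to contain just "[s2idle]".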

m
