APST quirk needed for PM951 Samsung 1TB NVMe Drive

Bug #1805816 reported by Ian Ozsvald
This bug affects 4 people
Affects          Status      Importance   Assigned to   Milestone
linux (Ubuntu)   Won't Fix   Undecided    Unassigned

Bug Description

This is a fresh post connected to a very similar, now-solved bug ("APST quirk needed for Samsung 512GB NVMe drive"): https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184

My NVMe is the less-common 1TB version. It exhibits similar (possibly identical) behaviour but has not yet been quirked.

I've followed up as Comment #101 but maybe I need to open this as a new issue? https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184/comments/101

Specifically, using kernel 4.19.0 on Mint 19.0, the default APST settings caused read-only errors within anywhere from a few minutes to an hour; this is fatal. Disabling APST solves the problem, and I now have it disabled in GRUB. Reducing the APST settings without disabling them does not stop the problem.
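Disabling APST at boot amounts to putting `nvme_core.default_ps_max_latency_us=0` on the kernel command line. On Ubuntu-based systems such as Mint that is normally done by editing `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` and then running `sudo update-grub`. The sketch below performs only the string edit on that one line; `set_nvme_latency` is a hypothetical helper, and it assumes the value is a single double-quoted string with no embedded quotes.

```python
import re

PARAM = "nvme_core.default_ps_max_latency_us"

def set_nvme_latency(grub_line: str, latency_us: int) -> str:
    """Return a GRUB_CMDLINE_LINUX_DEFAULT line with PARAM set to latency_us.

    Assumes the form GRUB_CMDLINE_LINUX_DEFAULT="..." (double quotes, no
    embedded quotes). Any existing PARAM=... entry is replaced.
    """
    match = re.fullmatch(r'(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"', grub_line.strip())
    if match is None:
        raise ValueError("not a simple GRUB_CMDLINE_LINUX_DEFAULT line")
    prefix, args = match.groups()
    # Keep every other kernel argument, dropping any previous latency setting.
    kept = [a for a in args.split() if not a.startswith(PARAM + "=")]
    kept.append(f"{PARAM}={latency_us}")
    return f'{prefix}"{" ".join(kept)}"'

print(set_nvme_latency('GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"', 0))
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=0"
```

After a reboot, `/sys/module/nvme_core/parameters/default_ps_max_latency_us` should show the value actually in effect.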

An example of the read-only result (copied from #101) that occurs if APST is left in its default state:
Oct 31 11:33:04 ian-XPS-15-9550 kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
Oct 31 11:33:04 ian-XPS-15-9550 kernel: EXT4-fs (dm-1): re-mounted. Opts: errors=remount-ro
Oct 31 11:33:05 ian-XPS-15-9550 kernel: EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
Oct 31 15:09:44 ian-XPS-15-9550 kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
Oct 31 15:09:44 ian-XPS-15-9550 kernel: EXT4-fs (dm-1): re-mounted. Opts: errors=remount-ro
Oct 31 15:09:44 ian-XPS-15-9550 kernel: EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)

I had tried `nvme_core.default_ps_max_latency_us=250`, but the read-only bug persists; it just takes longer to occur. Disabling APST with `nvme_core.default_ps_max_latency_us=0` has solved the issue.
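Why would 250 behave differently from 0? Below is a rough model based on my reading of `nvme_configure_apst()` in `drivers/nvme/host/core.c` from the 4.x kernels; treat the details as an assumption, not the driver itself, and note that it ignores device quirks. A value of 0 switches APST off. Otherwise each non-operational power state whose exit latency is within the limit becomes the idle target for the shallower states, with an idle timeout in milliseconds of roughly the total entry+exit latency in microseconds divided by 20, rounded up. The PM951 latencies are taken from the `nvme id-ctrl` output posted later in this thread; `PowerState` and `apst_table` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class PowerState:
    enlat_us: int          # entry latency reported by `nvme id-ctrl`
    exlat_us: int          # exit latency reported by `nvme id-ctrl`
    non_operational: bool  # only non-operational states are APST targets

# PM951 NVMe SAMSUNG 1024GB, ps 0..4 from the id-ctrl output in this thread
PM951 = [
    PowerState(5, 5, False),
    PowerState(30, 30, False),
    PowerState(100, 100, False),
    PowerState(500, 5000, True),
    PowerState(2000, 22000, True),
]

def apst_table(states: List[PowerState],
               max_latency_us: int) -> Optional[Dict[int, Tuple[int, int]]]:
    """Model {from_state: (target_state, idle_ms)}; None means APST is off."""
    if max_latency_us == 0:
        return None  # a limit of 0 disables APST entirely
    table: Dict[int, Tuple[int, int]] = {}
    target: Optional[Tuple[int, int]] = None
    # Walk from the deepest state upwards so shallower states inherit the
    # nearest usable deeper state as their idle target.
    for state in range(len(states) - 1, -1, -1):
        if target is not None:
            table[state] = target
        ps = states[state]
        if not ps.non_operational or ps.exlat_us > max_latency_us:
            continue
        total_us = ps.enlat_us + ps.exlat_us
        # ceil(total_us / 20), clamped to the 24-bit ITPT field
        idle_ms = min((total_us + 19) // 20, (1 << 24) - 1)
        target = (state, idle_ms)
    return table

for limit in (100000, 5500, 250, 0):
    print(limit, apst_table(PM951, limit))
```

Under these assumptions, the assumed driver default of 100000 µs lets the PM951 move from any of PS0-PS2 into PS3 after about 275 ms of idle time and from PS3 into PS4 after about 1200 ms; 5500 µs keeps only PS3; and 250 µs already rules out both non-operational states, leaving APST nominally enabled but with no state to enter.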

I've been using my laptop successfully (no read-only errors) for over a week since disabling APST, including many suspend/resume cycles. Before this, every boot would crash and burn with read-only errors within an hour or so.

This is the drive and kernel (copied from #101):
$ sudo nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     S2FZNYAG801690       PM951 NVMe SAMSUNG 1024GB                1         314.10 GB / 1.02 TB        512 B + 0 B      BXV76D0Q

$ uname -a
Linux ian-XPS-15-9550 4.19.0-041900-generic #201810221809 SMP Mon Oct 22 22:11:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -rd
Description: Linux Mint 19 Tara
Release: 19

I am of course happy to upload more info - let me know what's needed?

Could this be added as a quirk to the kernel please?
---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
CurrentDesktop: X-Cinnamon
DistroRelease: Linux Mint 19
InstallationDate: Installed on 2018-10-27 (32 days ago)
InstallationMedia: Linux Mint 19 "Tara" - Release amd64 20180717
Package: linux (not installed)
Tags: tara
Uname: Linux 4.19.0-041900-generic x86_64
UnreportableReason: The running kernel is not an Ubuntu kernel
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True

Ian Ozsvald (ian-x88) wrote :
tags: added: apport-collected tara
description: updated
Ian Ozsvald (ian-x88) wrote : ProcCpuinfoMinimal.txt

apport information

Ian Ozsvald (ian-x88) wrote : ProcEnviron.txt

apport information

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1805816

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Ian Ozsvald (ian-x88) wrote : ProcCpuinfoMinimal.txt

apport information

description: updated
Ian Ozsvald (ian-x88) wrote : ProcEnviron.txt

apport information

Ian Ozsvald (ian-x88) wrote :

Sorry - I've hit a hiccup. `apport` wants me to file extra information (as uploaded above, twice), but when I run it at the command line it notes:
`
It appears you are currently running a mainline kernel. It would be better to report this bug upstream at http://bugzilla.kernel.org/ so that the upstream kernel developers are aware of the issue. If you'd still like to file a bug against the Ubuntu kernel, please boot with an official Ubuntu kernel and re-file.
`

I believe I installed 4.19.0 from: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.19/ which I presume is the official Ubuntu kernel source - should I be installing from elsewhere?

As a first step I'll mark this as Complete in case someone can give me sensible guidance. I've _also_ posted it to the kernel.org Bugzilla (which might be the best place for this bug?):
https://bugzilla.kernel.org/show_bug.cgi?id=201811

Any and all guidance very happily received, thanks for your help, Ian.

Changed in linux (Ubuntu):
status: Incomplete → New
Ian Ozsvald (ian-x88) wrote :

Oops - I can mark this issue as "New" but not "Complete" (I had mis-read the bot's instruction in #4). Sorry for the noise.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1805816

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Kai-Heng Feng (kaihengfeng) wrote :

Ian,

Does this still happen?
Can you attach output of `sudo nvme id-ctrl /dev/nvme0`? I'd like to see the firmware version.

Ian Ozsvald (ian-x88) wrote :

Thanks for following up - we can close this bug; I no longer have this issue. I have attached the info below just for the record. I upgraded my system BIOS and the problems went away.

$ sudo nvme id-ctrl /dev/nvme0
[sudo] password for ian:
NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : S2FZNYAG801690
mn : PM951 NVMe SAMSUNG 1024GB
fr : BXV76D0Q
rab : 2
ieee : 002538
cmic : 0
mdts : 5
cntlid : 1
ver : 0
rtd3r : 0
rtd3e : 0
oaes : 0
ctratt : 0
oacs : 0x17
acl : 7
aerl : 3
frmw : 0x6
lpa : 0
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 0
cctemp : 0
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 0
unvmcap : 0
rpmbs : 0
edstt : 35
dsto : 0
fwug : 0
kas : 0
hctma : 0
mntmt : 0
mxtmt : 0
sanicap : 0
hmminds : 0
hmmaxd : 0
sqes : 0x66
cqes : 0x44
maxcmd : 0
nn : 1
oncs : 0x1f
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
subnqn :
ioccsz : 0
iorcsz : 0
icdoff : 0
ctrattr : 0
msdbd : 0
ps 0 : mp:6.00W operational enlat:5 exlat:5 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:4.20W operational enlat:30 exlat:30 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:3.10W operational enlat:100 exlat:100 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0700W non-operational enlat:500 exlat:5000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2000 exlat:22000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-

$ uname -a
Linux ian-XPS-15-9550 4.19.8-041908-generic #201812080831 SMP Sat Dec 8 13:34:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

System BIOS was upgraded to 1.9.0 - I believe this is the thing that fixed the NVMe issues.

Thanks for following up, please feel free to close the bug if you have the details you need (and hopefully these will help someone else).
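The `ps` lines in the id-ctrl output above carry the entry and exit latencies that `nvme_core.default_ps_max_latency_us` is compared against. A minimal parser follows, assuming the layout printed by the nvme-cli version used here (one `ps N :` line per state plus an indented continuation line; other releases may format it differently). `parse_power_states` and `PsEntry` are hypothetical names.

```python
import re
from typing import List, NamedTuple

class PsEntry(NamedTuple):
    index: int
    max_power_w: float
    operational: bool
    enlat_us: int
    exlat_us: int

# Matches e.g. "ps 3 : mp:0.0700W non-operational enlat:500 exlat:5000 ..."
PS_LINE = re.compile(
    r"^ps\s+(\d+)\s*:\s*mp:([\d.]+)W\s+(operational|non-operational)"
    r"\s+enlat:(\d+)\s+exlat:(\d+)"
)

def parse_power_states(id_ctrl_text: str) -> List[PsEntry]:
    states = []
    for line in id_ctrl_text.splitlines():
        m = PS_LINE.match(line.strip())
        if m:  # continuation lines ("rwt:... rwl:...") do not match and are skipped
            idx, mp, kind, enlat, exlat = m.groups()
            states.append(PsEntry(int(idx), float(mp), kind == "operational",
                                  int(enlat), int(exlat)))
    return states

sample = """\
ps 3 : mp:0.0700W non-operational enlat:500 exlat:5000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2000 exlat:22000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-
"""
for ps in parse_power_states(sample):
    print(ps)
```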

Kai-Heng Feng (kaihengfeng) wrote :

XPS 9550 is the consumer version of Precision 5510, right?

I did some tests on the Precision 5510 a while back, and the BIOS update did fix the issue.
The latest BIOS disables CommClk and ClockPM when running on battery, and disables ASPM entirely when running on the power adapter.
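CommClk and ClockPM are the names `lspci -vv` uses (under `LnkCtl`) for two bits of the PCIe Link Control register: Common Clock Configuration (bit 6) and Enable Clock Power Management (bit 8). The ASPM Control field is bits 1:0 of the same register. The sketch below decodes a raw 16-bit Link Control value, assuming you have already read it from the NVMe controller's PCIe capability; `decode_link_control` is a hypothetical name.

```python
from typing import Dict, Union

# ASPM Control field encoding (bits 1:0 of Link Control)
ASPM_MODES = {0b00: "Disabled", 0b01: "L0s", 0b10: "L1", 0b11: "L0s L1"}

def decode_link_control(value: int) -> Dict[str, Union[str, bool]]:
    """Decode selected fields of a 16-bit PCIe Link Control register value."""
    return {
        "aspm": ASPM_MODES[value & 0b11],   # bits 1:0, ASPM Control
        "CommClk": bool(value & (1 << 6)),  # bit 6, Common Clock Configuration
        "ClockPM": bool(value & (1 << 8)),  # bit 8, Enable Clock Power Management
    }

# Example: ASPM L1 with both CommClk and ClockPM set
print(decode_link_control(0x0142))
# {'aspm': 'L1', 'CommClk': True, 'ClockPM': True}
```

Comparing the decoded value on battery and on AC would be one way to observe the BIOS behaviour described above.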

Changed in linux (Ubuntu):
status: Incomplete → Won't Fix
David Pol (daviubu) wrote :

I have the same problem on a ThinkPad X1 Carbon 5th gen with an NVMe drive. My root filesystem becomes read-only every time I suspend my laptop (e.g. by closing the lid).

Things were fine under Ubuntu 16.04 (Xenial). Once I upgraded to the latest LTS, 18.04, the problem started. I was originally using a Toshiba NVMe drive (256GB). I then upgraded to the following drive (Samsung SSD 970 EVO Plus 500GB), but the problem persisted:

# nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : S4EVNG0M217854Y
mn : Samsung SSD 970 EVO Plus 500GB
fr : 1B2QEXM7
rab : 2
ieee : 002538
cmic : 0
mdts : 9
cntlid : 4
ver : 10300
rtd3r : 30d40
rtd3e : 7a1200
oaes : 0
ctratt : 0
oacs : 0x17
acl : 7
aerl : 3
frmw : 0x16
lpa : 0x3
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 358
cctemp : 358
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 500107862016
unvmcap : 0
rpmbs : 0
edstt : 35
dsto : 0
fwug : 0
kas : 0
hctma : 0x1
mntmt : 356
mxtmt : 358
sanicap : 0
hmminds : 0
hmmaxd : 0
sqes : 0x66
cqes : 0x44
maxcmd : 0
nn : 1
oncs : 0x5f
fuses : 0
fna : 0x5
vwc : 0x1
awun : 1023
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
subnqn :
ioccsz : 0
iorcsz : 0
icdoff : 0
ctrattr : 0
msdbd : 0
ps 0 : mp:7.80W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:6.00W operational enlat:0 exlat:0 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:3.40W operational enlat:0 exlat:0 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0700W non-operational enlat:210 exlat:1200 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0100W non-operational enlat:2000 exlat:8000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-

I tried different kernels: 4.15.18, 4.20.17 and 5.1.15.
Currently on:
4.15.18-041518-generic #201804190330 SMP Thu Apr 19 07:34:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

None of them fixed it.

I tried the suggested solution of setting nvme_core.default_ps_max_latency_us to 5500, 200 and 0.
Currently using:
nvme_core.default_ps_max_latency_us=0

APST is now disabled:

# nvme get-feature -f 0x0c -H /dev/nvme0
get-feature:0xc (Autonomous Power State Transition), Current value:00000000
 Autonomous Power State Transition Enable (APSTE): Disabled
 Auto PST Entries .................
 Entry[ 0]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0

The problem persists, however. As soon as I close the lid or suspend the laptop, it fails to resume and instead hangs on a black screen showing

EXT4-fs error (device nvme0n1p2) : I/O error while writing superblock
and so on...

Is there any solution to this yet?

Thanks
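The `get-feature` output in this comment consists of small bit fields. As I read the NVMe 1.3 specification, bit 0 of the feature 0x0C value is APSTE, and each 64-bit entry of the APST table holds ITPS in bits 7:3 and ITPT (in milliseconds) in bits 31:8. A round-trip sketch follows, with hypothetical function names and illustrative values:

```python
from typing import Tuple

def apst_enabled(dword11: int) -> bool:
    """APSTE is bit 0 of the feature value (0x00000000 -> disabled)."""
    return bool(dword11 & 0x1)

def encode_entry(itps: int, itpt_ms: int) -> int:
    """Pack one 64-bit APST table entry: ITPS in bits 7:3, ITPT in bits 31:8."""
    if not 0 <= itps < 32 or not 0 <= itpt_ms < (1 << 24):
        raise ValueError("field out of range")
    return (itps << 3) | (itpt_ms << 8)

def decode_entry(entry: int) -> Tuple[int, int]:
    """Return (ITPS, ITPT in ms) from a 64-bit APST table entry."""
    return (entry >> 3) & 0x1F, (entry >> 8) & 0xFFFFFF

print(apst_enabled(0x00000000))   # False, matches "APSTE: Disabled"
print(decode_entry(0))            # (0, 0), matches Entry[ 0]
print(hex(encode_entry(3, 275)))  # 0x11318
```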

Kai-Heng Feng (kaihengfeng) wrote :

What's the output of `cat /sys/power/mem_sleep`?

Junior Lawrence Kibirige (juniorkibirige) wrote :

I just ran into the same error on a Lenovo ThinkPad T470s with a Toshiba NVMe 256GB SSD.

Junior Lawrence Kibirige (juniorkibirige) wrote :

My output of `cat /sys/power/mem_sleep` is `s2idle [deep]`.
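In `/sys/power/mem_sleep` the bracketed entry is the mode currently used for suspend, so `s2idle [deep]` means the system uses deep sleep (suspend-to-RAM, S3 on ACPI systems) rather than suspend-to-idle. A tiny parser, as a sketch; the function name is hypothetical:

```python
from typing import List, Tuple

def parse_mem_sleep(text: str) -> Tuple[str, List[str]]:
    """Return (active_mode, available_modes) from /sys/power/mem_sleep content."""
    modes = text.split()
    # The active mode is the one wrapped in square brackets.
    active = next((m.strip("[]") for m in modes if m.startswith("[")), "")
    return active, [m.strip("[]") for m in modes]

print(parse_mem_sleep("s2idle [deep]\n"))  # ('deep', ['s2idle', 'deep'])
```

As root, writing one of the listed modes (for example `s2idle`) to the same file switches the mode for testing.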
