APST quirk needed for Samsung 512GB NVMe drive

Bug #1678184 reported by Andy Clayton on 2017-03-31
This bug affects 41 people
Affects         Importance  Assigned to
linux (Ubuntu)  Medium      Seth Forshee
Yakkety         Medium      Unassigned
Zesty           Medium      Unassigned

Bug Description

APST support just landed in the latest Zesty kernel (4.10.0-14.16) as part of https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1664602. That patch has a quirk for certain 256GB Samsung drives found in Dell laptops that do not behave well when APST is enabled. I am experiencing the same symptoms with the same model laptop except with a 512GB Samsung. Prior to manually disabling APST the drive would die and system would go down in flames with I/O errors within 20 to 40 minutes of boot.

$ sudo nvme list
Node SN Model Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 ************** PM951 NVMe SAMSUNG 512GB 1 500.20 GB / 512.11 GB 512 B + 0 B BXV76D0Q

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: linux-image-4.10.0-14-generic 4.10.0-14.16
ProcVersionSignature: Ubuntu 4.10.0-14.16-generic 4.10.3
Uname: Linux 4.10.0-14-generic x86_64
ApportVersion: 2.20.4-0ubuntu2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ajclayton 3305 F.... pulseaudio
CurrentDesktop: Unity:Unity7
Date: Fri Mar 31 09:42:38 2017
InstallationDate: Installed on 2012-09-08 (1665 days ago)
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Release amd64 (20120425)
MachineType: Dell Inc. XPS 15 9550
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.10.0-14-generic root=UUID=779e5929-5ffb-49b1-9786-1adcde824b7d ro rootflags=subvol=@ noprompt nouveau.modeset=0 log_buf_len=20M
RelatedPackageVersions:
 linux-restricted-modules-4.10.0-14-generic N/A
 linux-backports-modules-4.10.0-14-generic N/A
 linux-firmware 1.164
SourcePackage: linux
UpgradeStatus: Upgraded to zesty on 2017-03-07 (23 days ago)
dmi.bios.date: 04/07/2016
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 01.02.00
dmi.board.name: 0N7TVV
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 9
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr01.02.00:bd04/07/2016:svnDellInc.:pnXPS159550:pvr:rvnDellInc.:rn0N7TVV:rvrA00:cvnDellInc.:ct9:cvr:
dmi.product.name: XPS 15 9550
dmi.sys.vendor: Dell Inc.

Andy Clayton (q3aiml) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Seth Forshee (sforshee) on 2017-04-03
Changed in linux (Ubuntu):
assignee: nobody → Seth Forshee (sforshee)
importance: Undecided → Medium
status: Confirmed → In Progress
Kai-Heng Feng (kaihengfeng) wrote :

@Andy,

Can you install `nvme-cli` and attach the output of `nvme id-ctrl /dev/nvme0`?

Seth Forshee (sforshee) wrote :

Please test this kernel and let me know whether it fixes your problem. Thanks!

http://people.canonical.com/~sforshee/lp1678184/linux-4.10.0-16.18+lp1678184v201704030935/

Changed in linux (Ubuntu):
status: In Progress → Incomplete
Andy Clayton (q3aiml) wrote :

That quirk looks good! Running stable and also verified that `nvme get-feature -f 0x0c -H /dev/nvme0` reports Disabled.

Thanks for the quick responses!

Here's the id-ctrl output in case it is still helpful:

$ sudo nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : **************
mn : PM951 NVMe SAMSUNG 512GB
fr : BXV76D0Q
rab : 2
ieee : 002538
cmic : 0
mdts : 5
cntlid : 1
ver : 0
rtd3r : 0
rtd3e : 0
oaes : 0
oacs : 0x17
acl : 7
aerl : 3
frmw : 0x6
lpa : 0
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 0
cctemp : 0
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 0
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1f
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
subnqn :
ps 0 : mp:6.00W operational enlat:5 exlat:5 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:4.20W operational enlat:30 exlat:30 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:3.10W operational enlat:100 exlat:100 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0700W non-operational enlat:500 exlat:5000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2000 exlat:22000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-
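
The power-state table at the end of the id-ctrl output is what the later latency experiments hinge on. Below is a minimal Python sketch for pulling it into a structure, assuming the two-line `ps N : ...` layout printed above. The function name and dict keys are illustrative and not part of nvme-cli.

```python
import re

# Matches the first line of each power-state entry as printed by
# `nvme id-ctrl` in this report, e.g.
#   ps 3 : mp:0.0700W non-operational enlat:500 exlat:5000 rrt:3 rrl:3
PS_LINE = re.compile(
    r"^ps\s+(?P<idx>\d+)\s*:\s*mp:(?P<mp>[\d.]+)W\s+"
    r"(?P<kind>operational|non-operational)\s+"
    r"enlat:(?P<enlat>\d+)\s+exlat:(?P<exlat>\d+)"
)


def parse_power_states(id_ctrl_text):
    """Return a list of dicts, one per `ps N` line found in the text."""
    states = []
    for line in id_ctrl_text.splitlines():
        m = PS_LINE.match(line.strip())
        if m is None:
            continue  # continuation lines (rwt/rwl/...) and unrelated fields
        states.append({
            "index": int(m["idx"]),
            "max_power_w": float(m["mp"]),
            "operational": m["kind"] == "operational",
            "enlat_us": int(m["enlat"]),  # entry latency, microseconds
            "exlat_us": int(m["exlat"]),  # exit latency, microseconds
        })
    return states


SAMPLE = """\
ps 2 : mp:3.10W operational enlat:100 exlat:100 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0700W non-operational enlat:500 exlat:5000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
"""

for state in parse_power_states(SAMPLE):
    print(state)
```

The `operational` flag and the two latency fields are the only values the rest of this report relies on.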

Kai-Heng Feng (kaihengfeng) wrote :

@Andy,

Are you willing to help me test whether ps2/ps3 work?
My guess is that ps2 (most powersaving operational state) works but not ps3/ps4.

So I need your help to try kernel parameter "nvme_core.default_ps_max_latency_us=250", and "nvme_core.default_ps_max_latency_us=3000".

Thanks!

Kai-Heng Feng (kaihengfeng) wrote :

BTW you should use the kernel from repo instead of kernel from comment #4.

Andy Clayton (q3aiml) wrote :

Sure thing. I have been running with "nvme_core.default_ps_max_latency_us=250" and repo kernel for about 24 hours now without issue. Confirmed with nvme that APST is enabled. I will switch to "nvme_core.default_ps_max_latency_us=3000" later today or tomorrow.

$ sudo nvme get-feature -f 0x0c -H /dev/nvme0
get-feature:0xc (Autonomous Power State Transition), Current value:0x000001
 Autonomous Power State Transition Enable (APSTE): Enabled

$ uname -a
Linux TEC-AJC2-U 4.10.0-19-generic #21-Ubuntu SMP Thu Apr 6 17:04:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

André Düwel (aduewel) wrote :

Hi, I have had "nvme_core.default_ps_max_latency_us=3000" set for over a day now and it's working, too. No issues so far.

@Kai-Heng Feng: do you need any other testing?

Kai-Heng Feng (kaihengfeng) wrote :

Yes, can you try again with 6000? 3000 doesn't let it enter PS3. My bad.
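
The arithmetic behind these values can be sketched as follows. It assumes, based on the numbers quoted in this report rather than on a reading of the driver source, that APST may only target a non-operational state whose entry plus exit latency does not exceed `default_ps_max_latency_us`. The table is the PM951 one from the id-ctrl output earlier in this report, and the function name is illustrative.

```python
# (index, operational, enlat_us, exlat_us) for the PM951 512GB, copied from
# the id-ctrl output earlier in this report.
PM951_STATES = [
    (0, True, 5, 5),
    (1, True, 30, 30),
    (2, True, 100, 100),
    (3, False, 500, 5000),
    (4, False, 2000, 22000),
]


def apst_candidate_states(states, max_latency_us):
    """Non-operational states whose enlat + exlat fits under the limit.

    Assumption: this mirrors the selection rule implied by the thread
    (operational states are never APST targets; a limit of 0 turns APST off).
    """
    if max_latency_us == 0:
        return []
    return [
        idx
        for idx, operational, enlat, exlat in states
        if not operational and enlat + exlat <= max_latency_us
    ]


for limit in (250, 3000, 6000, 25000):
    print(limit, apst_candidate_states(PM951_STATES, limit))
```

Under that assumption, 250 and 3000 select no states for this drive (PS3 needs 500 + 5000 = 5500), 6000 selects PS3 only, and 25000 selects both PS3 and PS4.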

Nils (maeher) wrote :

I'm having the exact same problem (drive dying with filesystem read errors and system becoming unusable after using it for a while)

However, my nvme SSD is not from Samsung, but from Toshiba:

nvme list
Node SN Model Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 177S10WYTAMT THNSF5256GPUK TOSHIBA 1 256.06 GB / 256.06 GB 512 B + 0 B 51025KLA

nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid : 0x1179
ssvid : 0x1179
sn : 177S10WYTAMT
mn : THNSF5256GPUK TOSHIBA
fr : 51025KLA
rab : 1
ieee : 00080d
cmic : 0
mdts : 0
cntlid : 0
ver : 0
rtd3r : 0
rtd3e : 0
oaes : 0
oacs : 0x17
acl : 3
aerl : 3
frmw : 0x2
lpa : 0x2
elpe : 127
npss : 5
avscc : 0
apsta : 0x1
wctemp : 351
cctemp : 355
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 0
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1e
fuses : 0
fna : 0x4
vwc : 0x1
awun : 255
awupf : 0
nvscc : 0
acwu : 0
sgls : 0
subnqn :
ps 0 : mp:8.00W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:3.90W operational enlat:0 exlat:0 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:2.00W operational enlat:0 exlat:0 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.1200W non-operational enlat:1000 exlat:1000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0120W non-operational enlat:5000 exlat:10000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-
ps 5 : mp:0.0060W non-operational enlat:100000 exlat:50000 rrt:5 rrl:5
          rwt:5 rwl:5 idle_power:- active_power:-

Kai-Heng Feng (kaihengfeng) wrote :

@Nils,

Is your laptop XPS 9550/Precision 5510, too?

Nils (maeher) wrote :

No, it's a Lenovo ThinkPad X270. Do you need any other information?

Ben (white00lightning) wrote :

I'm also experiencing issues since upgrading to 17.04. When I switch to another TTY (via ctrl-alt-f1) the following errors get printed to the screen:

[ 746.341551] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #525023: comm NetworkManager: reading directory iblock 0
[ 746.343318] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #524289: comm pool: reading directory iblock 0
[ 746.356125] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272213: comm systemd-udevd: reading directory iblock 0
[ 746.356139] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272210: comm systemd-udevd: reading directory iblock 0
[ 746.356332] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272193: comm systemd-udevd: reading directory iblock 0
[ 746.356338] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272825: comm systemd-udevd: reading directory iblock 0
[ 746.356400] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272210: comm systemd-udevd: reading directory iblock 0
[ 746.474632] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #524539: comm unity-settings-: reading directory iblock 0
[ 746.992814] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #5506108: comm BrowserBlocking: reading directory iblock 0
[ 746.304451] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #5506117: comm BrowserBlocking: reading directory iblock 0

I also tried to run `smartctl -t long /dev/nvme0n1p7` however the results seem to indicate that the tool doesn't work with my particular SSD:

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.10.0-19-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: PM951 NVMe SAMSUNG 512GB
Serial Number: S29PNX0H611013
Firmware Version: BXV77D0Q
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Utilization: 254,982,533,120 [254 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Mon Apr 17 17:45:48 2017 AEST
Firmware Updates (0x06): 3 Slots
Optional Admin Commands (0x0017): Security Format Frmw_DL *Other*
Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size: 32 Pages

Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
 0 + 6.00W - - 0 0 0 0 5 5
 1 + 4.20W - - 1 1 1 1 30 30
 2 + 3.10W - - 2 2 2 2 100 100
 3 - 0.0700W - - 3 3 3 3 500 5000
 4 - 0.0050W - - 4 4 4 4 2000 22000

Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
 0 + 512 0 0

=== START OF SMART DATA SECTION ===
Read NVMe SMART/Health Information failed: NVMe Status 0x2002

----...


Kai-Heng Feng (kaihengfeng) wrote :

@Nils,
With the current default value, your NVMe will transition to PS4. So let's add the kernel parameter 'nvme-core.nvme_core.default_ps_max_latency_us=2000' to test whether PS3 shares the same issue.
If this issue persists, use 'nvme-core.nvme_core.default_ps_max_latency_us=0' to turn APST off.

@Ben,
Is your laptop XPS 9550/Precision 5510, too?

Please try 'nvme-core.nvme_core.default_ps_max_latency_us=5500', if the issue persists, please try 'nvme-core.nvme_core.default_ps_max_latency_us=200'.

Kai-Heng Feng (kaihengfeng) wrote :

@Ben, I see that you use an XPS 9550; the comment was clipped at the bottom.

André Düwel (aduewel) wrote :

@Kai-Heng Feng: I am testing with 6000 now. So far it looks good, but I haven't had much time to test during the holidays. I will report back later today.

Additional Info for my XPS15 9550:
NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : S29PNXAGB08057
mn : PM951 NVMe SAMSUNG 512GB
fr : BXV77D0Q
rab : 2
ieee : 002538
cmic : 0
mdts : 5
cntlid : 1
ver : 0
rtd3r : 0
rtd3e : 0
oaes : 0
oacs : 0x17
acl : 7
aerl : 3
frmw : 0x6
lpa : 0
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 0
cctemp : 0
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 0
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1f
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
subnqn :
ps 0 : mp:6.00W operational enlat:5 exlat:5 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:4.20W operational enlat:30 exlat:30 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:3.10W operational enlat:100 exlat:100 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0700W non-operational enlat:500 exlat:5000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2000 exlat:22000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-

Shantanu Goel (shantanu-goel) wrote :

@Kai-Heng Feng: I have a Precision 5510 and faced the same issue. I have been running with "nvme_core.default_ps_max_latency_us=6000" for the last 3 days without any problem.

Chris Sanders (chris.sanders) wrote :

I have an xps15 9550 same as above, same controller, same issue. I set nvme_core.default_ps_max_latency_us=6000 this morning and have ~1 hour right now with no crash. I'll continue to run it today and report back.

The only issue I have seen with the 6k setting that I did not see with a setting of 250 was a black screen on boot with an external display connected. I can't imagine how that's affected by this setting. I was able to boot by turning off the display until the laptop had booted and then turning it on, which seems to work. This reliably repeats with the 6k kernel setting and does not occur with the 250 setting, at least on the 3-4 reboots I've done in the last two days.

André Düwel (aduewel) wrote :

@Kai-Heng Feng: Almost the same here, the whole working day without any issues. Should I do a negative test with 23000 to confirm that PS4 is the issue for the Samsung PM951?

Kai-Heng Feng (kaihengfeng) wrote :

I don't think that's necessary, the default value (25000) is larger than 23000, so PS4 is enabled in this case.

Thank you for all the testing - this information will be useful for patch author, Dell and Samsung.

Antoine Hedgecock (macnibblet) wrote :

Do we have an ETA on the patch coming out?

André Düwel (aduewel) on 2017-04-18
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Francesco (flying-mugs) wrote :

I want to report that I got the same bug under Ubuntu 16.04 with hardware enablement stack. The bug came with a recent update of the linux-generic-hwe, I now have 4.8.0.48.20 and I am affected.
I also have an XPS 9550 with the 512GB Samsung NVMe. Was the APST support backported to 4.8?

For the moment my workaround is using the old 4.4 kernel, but I will try the kernel parameters as well...

Info about my drive:
$ sudo nvme list
Node SN Model Version Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- -------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 S29PNXAGC05771 PM951 NVMe SAMSUNG 512GB 1.1 1 450,10 GB / 512,11 GB 512 B + 0 B BXV77D0Q

Nils (maeher) wrote :

I have to admit that I'm not sure how or where to set 'nvme-core.nvme_core.default_ps_max_latency_us=2000'.

I assumed that it's a kernel boot parameter that I can add in grub? But when I do, nothing changes, and "sysctl -a", which I assumed should list the changed parameter, does not show anything related to nvme at all.

If you could advise me how or where to set that parameter, I will do so.

Chris Sanders (chris.sanders) wrote :

I can now confirm that I've had a stable machine for more than 7 hours using 6k for latency. I think the longest I had without the option was ~3 hours and generally it was ~45 minutes to crash.

Nils, I'm setting the parameter in /etc/default/grub by appending it to the "GRUB_CMDLINE_LINUX_DEFAULT" options then running `update-grub`. That has worked for me, although I'm not sure how to confirm it took. That is a permanent method, I started by just adding the 250 during boot so it was for a single boot while testing.
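
One way to confirm the parameter took, sketched in Python: look for it on the running kernel's command line. The helper below is hypothetical and only parses the whitespace-separated `key=value` form that `/proc/cmdline` shows; it does not handle quoting. The driver may also expose the live value under `/sys/module/nvme_core/parameters/`, but that path is an assumption here and is not confirmed anywhere in this report.

```python
PARAM = "nvme_core.default_ps_max_latency_us"


def cmdline_value(cmdline, key=PARAM):
    """Return the value given for `key` on a kernel command line, or None."""
    found = None
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if sep and name == key:
            found = value  # keep the last occurrence if the key is repeated
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print(cmdline_value(f.read()))
    except OSError:
        print("could not read /proc/cmdline")
```

A result of `None` means the exact key is not on the command line, which also catches misspelled variants of the name.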

Nils (maeher) wrote :

Thanks Chris.
I have tried 'nvme-core.nvme_core.default_ps_max_latency_us=2000' (though I have been unable to verify that the parameter actually changed) and the problem persisted.

Chris Sanders (chris.sanders) wrote :

Nils, from above use 'nvme-core.nvme_core.default_ps_max_latency_us=0' to turn APST off.

That was my first step to confirm this is indeed related to APST and the same bug. If that doesn't do it, I'm afraid you might have another issue.

André Düwel (aduewel) wrote :

@Nils: as Chris mentioned, try to set it to 0 and then check with 'sudo nvme get-feature -f 0x0c -H /dev/nvme0' whether it's really disabled.
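
That check can be scripted. The sketch below is a hypothetical helper that reads the human-readable `get-feature` output in the format pasted in this report and reports the APSTE state. It assumes the `(APSTE): Enabled` / `Disabled` wording shown here; other nvme-cli versions may print it differently.

```python
def apst_enabled(get_feature_output):
    """Return True/False from the APSTE line, or None if the line is absent."""
    for line in get_feature_output.splitlines():
        if "(APSTE)" in line:
            status = line.split(":", 1)[1].strip()
            return status == "Enabled"
    return None


SAMPLE = """\
get-feature:0xc (Autonomous Power State Transition), Current value:0x000001
 Autonomous Power State Transition Enable (APSTE): Enabled
"""

print(apst_enabled(SAMPLE))
```

Returning `None` for missing output keeps "command failed or printed something unexpected" distinct from "APST is disabled".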

Ben (white00lightning) wrote :

@Kai-Heng Feng

I tried 'nvme-core.nvme_core.default_ps_max_latency_us=5500' and got a crash.

To confirm I have added the boot parameter:

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.10.0-19-generic.efi.signed root=UUID=7cf9f325-5277-4249-b399-4df583e6286b ro quiet splash nvme-core.nvme_core.default_ps_max_latency_us=5500 vt.handoff=7

Will now try with 'nvme-core.nvme_core.default_ps_max_latency_us=200'. Will let you know within 24 hours if it happens again

Thanks a lot for your hard work on this mate, we all really appreciate it :)

If there's any other info you need, just let me know.

Nils (maeher) wrote :

Hm, ok I seem to be unable to disable APST. :(

In '/etc/default/grub', I have

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme-core.nvme_core.default_ps_max_latency_us=0"

and I get

cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.10.0-19-generic.efi.signed root=UUID=365f1a9c-9598-4ad5-a387-d02f771767a1 ro quiet splash nvme-core.nvme_core.default_ps_max_latency_us=0 vt.handoff=7

but when I try to confirm that it's disabled, I get

sudo nvme get-feature -f 0x0c -H /dev/nvme0
get-feature:0xc (Autonomous Power State Transition), Current value:0x000001
 Autonomous Power State Transition Enable (APSTE): Enabled
 Auto PST Entries .................

Not sure if I'm doing something wrong.

Kai-Heng Feng (kaihengfeng) wrote :

@Ben, @Nils,

You have two 'nvme-core' in the parameter...
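
A small Python sketch for catching this kind of slip before rebooting. It flags any command-line token that ends in the parameter name but is not spelled `nvme_core.default_ps_max_latency_us`. The helper name is made up for illustration. Dashes are normalized to underscores because the kernel documentation treats the two as equivalent in parameter names.

```python
EXPECTED = "nvme_core.default_ps_max_latency_us"
SUFFIX = "default_ps_max_latency_us"


def suspicious_tokens(cmdline):
    """Tokens that mention the APST latency parameter under a wrong name."""
    bad = []
    for token in cmdline.split():
        # Compare names only, with '-' and '_' treated as the same character.
        name = token.partition("=")[0].replace("-", "_")
        if name.endswith(SUFFIX) and name != EXPECTED:
            bad.append(token)
    return bad


print(suspicious_tokens(
    "ro quiet splash nvme-core.nvme_core.default_ps_max_latency_us=0 vt.handoff=7"
))
```

An empty list means either the parameter is spelled as expected or it is not present at all, so this complements a presence check and does not replace one.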

Nils (maeher) wrote :

Hm, thanks. That's copied verbatim from your comment above. ;)
I did not notice the inconsistency with the rest of the comments.

In the meantime I DID manage to disable APST with

sudo nvme set-feature -f 0x0c -v=0 /dev/nvme0

and that seems to have helped (no crash for 9 hours). I'll retry with 2000 now.

Nils (maeher) wrote :

With nvme_core.default_ps_max_latency_us=2000 the problem persists.
However, everything seems fine with nvme_core.default_ps_max_latency_us=0.

Kai-Heng Feng (kaihengfeng) wrote :

So on your machine (X270 + Toshiba), the NVMe has this issue as soon as it enters a non-operational power state (ps3). The deepest working state is ps2.

Aras Zagros (zeveen) wrote :

I have a Dell XPS 15 9550 (i7-6700HQ 16GB-RAM 4K 512GB SSD Samsung).

I'm experiencing the same problem that @Ben has described.

My story: I went from 16.04 to 16.10 (no serious issues with the previous versions) and then to 17.04 almost a week ago. Everything with the previous versions was great, but 17.04 seems to have two really bothersome issues on my device: this one and the "slow mouse pointer issue".

Guys, it seems like there aren't any solid fixes so far. Should I disable "APST" too?

André Düwel (aduewel) wrote :

@Aras: I have exactly the same Laptop as you have. You do not need to disable it completely, just change in the '/etc/default/grub' the following line

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
to
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=6000"

This way, at least the most power saving states are still enabled (6W if disabled vs. 0,07W with PS/3) :)
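
For anyone scripting this edit, here is a minimal Python sketch that works on the contents of `/etc/default/grub` as a string. The function name is illustrative, and it only handles the double-quoted, single-line `GRUB_CMDLINE_LINUX_DEFAULT="..."` form shown above. An existing option with the same key is replaced, not duplicated.

```python
import re

CMDLINE_RE = re.compile(
    r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")[ \t]*$', re.MULTILINE
)


def add_kernel_param(grub_text, param):
    """Return grub_text with `param` set in GRUB_CMDLINE_LINUX_DEFAULT."""
    key = param.partition("=")[0]

    def repl(match):
        # Drop any earlier setting of the same key, then append the new one.
        options = [
            opt for opt in match.group(2).split()
            if opt.partition("=")[0] != key
        ]
        options.append(param)
        return match.group(1) + " ".join(options) + match.group(3)

    return CMDLINE_RE.sub(repl, grub_text)


before = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_kernel_param(before, "nvme_core.default_ps_max_latency_us=6000"))
```

The function only transforms text; writing the file back, running `update-grub`, and rebooting are still needed for the change to take effect.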

Seth Forshee (sforshee) wrote :

@kaihengfeng - based on the number of incidents here I'm thinking that enabling APST was premature. Maybe we just need to revert "nvme: Enable autonomous power state transitions" for now.

André Düwel (aduewel) wrote :

I forgot to mention that you need to run 'update-grub' and reboot your system :)

André Düwel (aduewel) wrote :

@sforshee: I would suggest another solution, e.g. setting the default value for nvme_core.default_ps_max_latency_us to 0. This way it is still possible for experienced users to enable the power saving features without any impact on everyone else :)

Ben (white00lightning) wrote :

@kaihengfeng

I haven't gotten any issues so far with 'nvme_core.default_ps_max_latency_us=5500'. That was with around 12 hours of use yesterday (previously, the issue was occurring within 2 hours).

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.10.0-19-generic.efi.signed root=UUID=7cf9f325-5277-4249-b399-4df583e6286b ro quiet splash nvme_core.default_ps_max_latency_us=5500 vt.handoff=7

Stefan Bader (smb) on 2017-05-04
Changed in linux (Ubuntu Zesty):
importance: Undecided → Medium
status: New → Fix Committed
Stefan Bader (smb) on 2017-05-04
Changed in linux (Ubuntu Yakkety):
importance: Undecided → Medium
status: New → Fix Committed

Hi,

Running 64-bit Ubuntu 16.04 on a new quietpc "sentinel" fanless pc with a 256GB Samsung PM961 Polaris NVMe M.2 SSD.

Came back to my set-up this lunchtime and found that the screen had frozen - first ever crash in 7 years of using ubuntu!

Did Ctrl+Alt+F4 to get to a terminal and got this message:

EXT4-fs error (dev nvme0n1p4:ext4_find_entry:1461:inode #1460718: com (agetty): reading directory block lblock 0

Is it the same issue?

Solution ideas?

Kai-Heng Feng (kaihengfeng) wrote :

Stephen, what linux kernel version do you use?

Hi,

I'm running Zesty on the following:

Linux francois-XPS-15-9560 4.10.0-21-generic #23-Ubuntu SMP Fri Apr 28 16:14:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

and seem to be affected by the same bug.

I tried the kernel listed in #61 and was stable for at least 2 hours. When I got back from lunch and my computer logged off (blank screen), it had crashed with the following error:

EXT4-fs error (device nvme0n1p1): ext4_find_entry:1463: inode #1835009: comm (plymouth): reading directory lblock 0
ecryptfs_encrypt_page: Error attempting to write lower page; rc = [-5]

Let me know if there is any other info you'd like from me. Thanks!

$ sudo nvme id-ctrl /dev/nvme0

NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : S33XNX0HC01366
mn : PM961 NVMe SAMSUNG 1024GB
fr : CXY74D1Q
rab : 2
ieee : 002538
cmic : 0
mdts : 0
cntlid : 2
ver : 10200
rtd3r : 186a0
rtd3e : 4c4b40
oaes : 0
oacs : 0x17
acl : 7
aerl : 3
frmw : 0x16
lpa : 0x2
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 341
cctemp : 344
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 1024209543168
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1f
fuses : 0
fna : 0x1
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
subnqn :
ps 0 : mp:7.60W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:5.00W operational enlat:0 exlat:0 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:3.60W operational enlat:0 exlat:0 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0400W non-operational enlat:210 exlat:1500 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2200 exlat:6000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-

Kai-Heng Feng (kaihengfeng) wrote :

Sounds like even with new RSTe policy, we still need to disable PS4 for XPS 9550/Precision 5510 as a temporary solution.

Hi Kai-Heng,

You ask:

"Stephen, what linux kernel version do you use?"

azed@azed-H270N:~$ uname -a
Linux azed-H270N 4.8.0-51-generic #54~16.04.1-Ubuntu SMP Wed Apr 26 16:00:28 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-zesty
tags: added: verification-done-zesty
removed: verification-needed-zesty

@kaihengfeng: could you please expand on your comment in #73? Is it something that can be done with kernel options? Does the kernel in proposed mentioned in #74 also fix the issue you are mentioning? Also note that I have an XPS 9560 (not 9550).

Kai-Heng Feng (kaihengfeng) wrote :

My bad, didn't notice it's XPS 9560...
To disable the deepest power state, you can add this to your kernel parameter:
"nvme_core.default_ps_max_latency_us=6000".

Please report back if this works, thanks!

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-yakkety
John Neffenger (jgneff) wrote :

I think I hit this problem, but with the 256 GB Lite-On (or LiteOn) NVMe M.2 PCIe CX2 Series Enterprise Solid-State Drive Model CA1-8B256 that came standard in my 2017 Dell Precision Tower 3420.

I'm on the HWE Kernel with the "linux-generic-hwe-16.04" package, having installed from the Ubuntu 16.04.2 LTS amd64 ISO file. I didn't notice any file system errors until I upgraded to the "linux-image-4.8.0-53-generic" package on May 25, 2017. Below is my upgrade schedule showing when the errors occurred:

2017-04-04 Upgraded to 4.8.0-46.49
2017-04-24 Upgraded to 4.8.0-49.52 ← Adds NVMe APST support (LP: #1664602)
2017-05-01 Upgraded to 4.8.0-51.54
2017-05-16 Upgraded to 4.8.0-52.55
2017-05-25 Upgraded to 4.8.0-53.56 ← Started noticing errors on NVMe drive

First my Thunderbird Lightning calendar database got corrupted so I couldn't add events. One time the system just froze, with the mouse cursor and keyboard not working. Yesterday, the root file system remounted read-only after waking from suspend.

This morning I added "nvme_core.default_ps_max_latency_us=0" to the Grub configuration, and after repairing the file system errors with "e2fsck", everything seems to be working fine so far:

/etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash swapaccount=1 nvme_core.default_ps_max_latency_us=0"

I'm considering moving back to the GA 16.04 kernel, just to be more conservative in my system updates. Is the general advice to use the HWE kernel only if we need support for some new hardware, but otherwise, stay on the GA kernel?

Is there anything in Kernel version 4.8.0-53 that may have caused the problem or made it more likely to appear?

John Neffenger (jgneff) wrote :

Oops, wrong model number on the Lite-On drive in my Dell Precision Tower 3420. It should be Lite-On Model CX2-8B256-Q11.

The "nvme list" command reports:

Node: /dev/nvme0n1
Model: CX2-8B256-Q11 NVMe LITEON 256GB
Version: 1.2
Namespace: 1
Usage: 0.00 B / 256.06 GB
Format: 512 B + 0 B
FW Rev: 48811QD

Kai-Heng Feng (kaihengfeng) wrote :

@John

Can you paste the output of `nvme id-ctrl /dev/nvme0`?

@kaihengfeng, to follow up with #77, using 4.12.0rc3 and the following boot options

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.12.0-041200rc3-generic root=UUID=0768ab04-826b-49ba-81c1-9e51657ecd48 ro quiet splash acpi_enforce_resources=lax nouveau.modeset=0 nvme_core.default_ps_max_latency_us=6000 vt.handoff=7

I still experience this issue.

John Neffenger (jgneff) wrote :

@kaihengfeng Here you go:

$ sudo nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid : 0x14a4
ssvid : 0x1b4b
sn : TW0XVRV7LOH006AJ09F1
mn : CX2-8B256-Q11 NVMe LITEON 256GB
fr : 48811QD
rab : 0
ieee : 002303
cmic : 0
mdts : 5
cntlid : 1
ver : 10200
rtd3r : f4240
rtd3e : f4240
oaes : 0
oacs : 0x1f
acl : 3
aerl : 3
frmw : 0x14
lpa : 0x2
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 358
cctemp : 368
mtfa : 50
hmpre : 0
hmmin : 0
tnvmcap : 256060514304
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1f
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 7
nvscc : 1
acwu : 0
sgls : 0
ps 0 : mp:8.00W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:4.00W operational enlat:5 exlat:5 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:2.10W operational enlat:5 exlat:5 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.1000W non-operational enlat:5000 exlat:5000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0100W non-operational enlat:50000 exlat:100000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-

Kai-Heng Feng (kaihengfeng) wrote :

@John, I opened a new bug for LiteOn, LP: #1694596.

This is not a Samsung drive


Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-22.24

---------------
linux (4.10.0-22.24) zesty; urgency=low

  * linux: 4.10.0-22.24 -proposed tracker (LP: #1691146)

  * Fix NVLINK2 TCE route (LP: #1690155)
    - powerpc/powernv: Fix TCE kill on NVLink2

  * CVE-2017-0605
    - tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

  * perf: qcom: Add L3 cache PMU driver (LP: #1689856)
    - [Config] CONFIG_QCOM_L3_PMU=y
    - perf: qcom: Add L3 cache PMU driver

  * No PMU support for ACPI-based arm64 systems (LP: #1689661)
    - drivers/perf: arm_pmu: rework per-cpu allocation
    - drivers/perf: arm_pmu: manage interrupts per-cpu
    - drivers/perf: arm_pmu: split irq request from enable
    - drivers/perf: arm_pmu: remove pointless PMU disabling
    - drivers/perf: arm_pmu: define armpmu_init_fn
    - drivers/perf: arm_pmu: fold init into alloc
    - drivers/perf: arm_pmu: factor out pmu registration
    - drivers/perf: arm_pmu: simplify cpu_pmu_request_irqs()
    - drivers/perf: arm_pmu: handle no platform_device
    - drivers/perf: arm_pmu: rename irq request/free functions
    - drivers/perf: arm_pmu: split cpu-local irq request/free
    - drivers/perf: arm_pmu: move irq request/free into probe
    - drivers/perf: arm_pmu: split out platform device probe logic
    - arm64: add function to get a cpu's MADT GICC table
    - [Config] CONFIG_ARM_PMU_ACPI=y
    - drivers/perf: arm_pmu: add ACPI framework
    - arm64: pmuv3: handle !PMUv3 when probing
    - arm64: pmuv3: use arm_pmu ACPI framework

  * [SRU][Zesty]QDF2400 kernel oops on ipmitool fru write 0 fru.bin
    (LP: #1689886)
    - ipmi: Fix kernel panic at ipmi_ssif_thread()

  * tty: pl011: fix earlycon work-around for QDF2400 erratum 44 (LP: #1689818)
    - tty: pl011: fix earlycon work-around for QDF2400 erratum 44
    - tty: pl011: use "qdf2400_e44" as the earlycon name for QDF2400 E44

  * kernel-wedge fails in artful due to leftover squashfs-modules d-i files
    (LP: #1688259)
    - Remove squashfs-modules files from d-i
    - [Config] as squashfs-modules is builtin kernel-image must Provides: it

  * arm64/ACPI support for SBSA watchdog (LP: #1688114)
    - clocksource: arm_arch_timer: clean up printk usage
    - clocksource: arm_arch_timer: rename type macros
    - clocksource: arm_arch_timer: rename the PPI enum
    - clocksource: arm_arch_timer: move enums and defines to header file
    - clocksource: arm_arch_timer: add a new enum for spi type
    - clocksource: arm_arch_timer: rework PPI selection
    - clocksource: arm_arch_timer: split dt-only rate handling
    - clocksource: arm_arch_timer: refactor arch_timer_needs_probing
    - clocksource: arm_arch_timer: move arch_timer_needs_of_probing into DT init
      call
    - clocksource: arm_arch_timer: add structs to describe MMIO timer
    - clocksource: arm_arch_timer: split MMIO timer probing.
    - [Config] CONFIG_ACPI_GTDT=y
    - acpi/arm64: Add GTDT table parse driver
    - clocksource: arm_arch_timer: simplify ACPI support code.
    - acpi/arm64: Add memory-mapped timer support in GTDT driver
    - clocksource: arm_arch_timer: add GTDT support for memory-mapped timer
    - acpi/arm64: Add SBS...

Changed in linux (Ubuntu):
status: Confirmed → Fix Released

Hi, same issue here with a Lenovo X1 Yoga with a 512GB SSD.

sudo nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     S2CYNAAH200772       SAMSUNG MZVKV512HAJH-000L1               1         170.68 GB / 512.11 GB      512 B + 0 B      6L0QBXX7

marco@placebo:~$ sudo nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : S2CYNAAH200772
mn : SAMSUNG MZVKV512HAJH-000L1
fr : 6L0QBXX7
rab : 2
ieee : 002538
cmic : 0
mdts : 5
cntlid : 1
ver : 0
rtd3r : 0
rtd3e : 0
oaes : 0
oacs : 0x7
acl : 7
aerl : 3
frmw : 0x6
lpa : 0x1
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 0
cctemp : 0
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 0
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1f
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
subnqn :
ps 0 : mp:6.50W operational enlat:5 exlat:5 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:6.40W operational enlat:30 exlat:30 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:3.60W operational enlat:100 exlat:100 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0700W non-operational enlat:500 exlat:5000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2000 exlat:22000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-
marco@placebo:~$

sudo nvme get-feature -f 0x0c -H /dev/nvme0n1
get-feature:0xc (Autonomous Power State Transition), Current value:0x000001
 Autonomous Power State Transition Enable (APSTE): Enabled
 Auto PST Entries .................
 Entry[ 0]
 .................
 Idle Time Prior to Transition (ITPT): 275 ms
 Idle Transition Power State (ITPS): 3
 .................
 Entry[ 1]
 .................
 Idle Time Prior to Transition (ITPT): 275 ms
 Idle Transition Power State (ITPS): 3
 .................
 Entry[ 2]
 .................
 Idle Time Prior to Transition (ITPT): 275 ms
 Idle Transition Power State (ITPS): 3
 .................
 Entry[ 3]
 .................
 Idle Time Prior to Transition (ITPT): 1200 ms
 Idle Transition Power State (ITPS): 4
 .................
 Entry[ 4]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
# there are 16 Entries all like Entry[4]
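
The ITPT values in this output line up with the power-state latencies reported by `id-ctrl` for the same drive: PS3 has enlat 500 + exlat 5000 = 5500 us, and PS4 has 2000 + 22000 = 24000 us. The sketch below assumes the driver sets the idle time to about 50 times the total latency (total microseconds divided by 20, rounded up, read as milliseconds). That formula is an assumption; it is not stated in this thread, but it does reproduce the 275 ms and 1200 ms entries. The name `idle_time_ms` is hypothetical.

```python
def idle_time_ms(enlat_us: int, exlat_us: int) -> int:
    """Idle time before transition, assuming ~50x the total latency."""
    total_us = enlat_us + exlat_us
    # Divide by 20 and round up (assumed rule, see the lead-in).
    return (total_us + 19) // 20


# (enlat_us, exlat_us) of the non-operational states from the id-ctrl output
non_op_states = {3: (500, 5000), 4: (2000, 22000)}

for ps, (enlat, exlat) in non_op_states.items():
    print(f"PS{ps}: ITPT = {idle_time_ms(enlat, exlat)} ms")
```

Entries 0 to 2 all point at PS3 with 275 ms and entry 3 points at PS4 with 1200 ms, which is what the assumed rule gives for these latencies.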

Kai-Heng Feng (kaihengfeng) wrote :

@François
Add "nvme_core.default_ps_max_latency_us=0" to the kernel parameters to see if disabling APST completely helps.

@Marco
Add "nvme_core.default_ps_max_latency_us=5500" to the kernel parameters to see if disabling the deepest power state helps.
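
To illustrate why 5500 would rule out only the deepest state on Marco's drive, here is a rough sketch. It assumes the driver skips any non-operational state whose latency exceeds `default_ps_max_latency_us`. Whether the comparison uses exit latency alone or entry plus exit latency is also an assumption (it may differ between kernel versions), so the hypothetical helper `allowed_states` supports both. The latency values come from Marco's `id-ctrl` output.

```python
from typing import Dict, List, Tuple

# (enlat_us, exlat_us) for the non-operational states of Marco's drive
NON_OP_STATES: Dict[int, Tuple[int, int]] = {3: (500, 5000), 4: (2000, 22000)}


def allowed_states(max_latency_us: int, use_total: bool = True) -> List[int]:
    """Return the non-operational states that fit under the latency limit."""
    allowed = []
    for ps, (enlat, exlat) in sorted(NON_OP_STATES.items()):
        # Assumed comparison: total latency, or exit latency only.
        latency = enlat + exlat if use_total else exlat
        if latency <= max_latency_us:
            allowed.append(ps)
    return allowed


print(allowed_states(5500))                   # [3]
print(allowed_states(5500, use_total=False))  # [3]
print(allowed_states(0))                      # []
```

Under either comparison, 5500 keeps PS3 and drops PS4 on this drive, and 0 leaves no state to transition to.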

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
tags: added: verification-done-yakkety
removed: verification-needed-yakkety
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-54.57

---------------
linux (4.8.0-54.57) yakkety; urgency=low

  * linux: 4.8.0-54.57 -proposed tracker (LP: #1692589)

  * CVE-2017-0605
    - tracing: Use strlcpy() instead of strcpy() in __trace_find_cmdline()

  * Populating Hyper-V MSR for Ubuntu 13.10 (LP: #1193172)
    - SAUCE: (no-up) hv: Supply vendor ID and package ABI

  * [Hyper-V] Implement Hyper-V PTP Source (LP: #1676635)
    - hv: allocate synic pages for all present CPUs
    - hv: init percpu_list in hv_synic_alloc()
    - Drivers: hv: vmbus: Prevent sending data on a rescinded channel
    - hv: switch to cpuhp state machine for synic init/cleanup
    - hv: make CPU offlining prevention fine-grained
    - Drivers: hv: vmbus: Fix a rescind handling bug
    - Drivers: hv: util: kvp: Fix a rescind processing issue
    - Drivers: hv: util: Fcopy: Fix a rescind processing issue
    - Drivers: hv: util: Backup: Fix a rescind processing issue
    - Drivers: hv: vmbus: Move the definition of hv_x64_msr_hypercall_contents
    - Drivers: hv: vmbus: Move the definition of generate_guest_id()
    - Revert "UBUNTU: SAUCE: (no-up) hv: Supply vendor ID and package ABI"
    - Drivers: hv vmbus: Move Hypercall page setup out of common code
    - Drivers: hv: vmbus: Move Hypercall invocation code out of common code
    - Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource code
    - Drivers: hv: vmbus: Move the extracting of Hypervisor version information
    - Drivers: hv: vmbus: Move the crash notification function
    - Drivers: hv: vmbus: Move the check for hypercall page setup
    - Drivers: hv: vmbus: Move the code to signal end of message
    - Drivers: hv: vmbus: Restructure the clockevents code
    - Drivers: hv: util: Use hv_get_current_tick() to get current tick
    - Drivers: hv: vmbus: Get rid of an unsused variable
    - Drivers: hv: vmbus: Define APIs to manipulate the message page
    - Drivers: hv: vmbus: Define APIs to manipulate the event page
    - Drivers: hv: vmbus: Define APIs to manipulate the synthetic interrupt
      controller
    - Drivers: hv: vmbus: Define an API to retrieve virtual processor index
    - Drivers: hv: vmbus: Define an APIs to manage interrupt state
    - Drivers: hv: vmbus: Cleanup hyperv_vmbus.h
    - hv_util: switch to using timespec64
    - Drivers: hv: restore hypervcall page cleanup before kexec
    - Drivers: hv: restore TSC page cleanup before kexec
    - Drivers: hv: balloon: add a fall through comment to hv_memory_notifier()
    - Drivers: hv: vmbus: Use all supported IC versions to negotiate
    - Drivers: hv: Log the negotiated IC versions.
    - Drivers: hv: Fix the bug in generating the guest ID
    - hv: export current Hyper-V clocksource
    - hv_utils: implement Hyper-V PTP source
    - SAUCE: (no-up) hv: Supply vendor ID and package ABI

  * CIFS: Enable encryption for SMB3 (LP: #1670508)
    - SMB3: Add mount parameter to allow user to override max credits
    - SMB2: Separate Kerberos authentication from SMB2_sess_setup
    - SMB2: Separate RawNTLMSSP authentication from SMB2_sess_setup
    - SMB3: parsing for new snapshot timestamp mount parm
    - cifs: Simplify SMB...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Josh Lilly (edenist) wrote :

I would like to add that this bug also affects 256GB versions of the Samsung NVMe drives.
The quirk has been added for the 512GB, can we also have one added for the 256GB?

Here is my output of nvme-cli

**
nvme list

Node             SN                   Model                                    Version  Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- -------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     S29NNXAGC38399       PM951 NVMe SAMSUNG 256GB                 1.1      1         201.30 GB / 256.06 GB      512 B + 0 B      BXV77D0Q

nvme id-ctrl /dev/nvme0n1

NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : S29NNXAGC38399
mn : PM951 NVMe SAMSUNG 256GB
fr : BXV77D0Q
rab : 2
ieee : 002538
cmic : 0
mdts : 5
cntlid : 1
ver : 0
rtd3r : 0
rtd3e : 0
oaes : 0
oacs : 0x17
acl : 7
aerl : 3
frmw : 0x6
lpa : 0
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 0
cctemp : 0
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 0
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1f
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
ps 0 : mp:6.00W operational enlat:5 exlat:5 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:4.20W operational enlat:30 exlat:30 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:3.10W operational enlat:100 exlat:100 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0700W non-operational enlat:500 exlat:5000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2000 exlat:22000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-

Kai-Heng Feng (kaihengfeng) wrote :

Josh,

Can you file a new bug?

Also attach `lspci -nn` and `sudo nvme get-feature -f 0x0c -H /dev/nvme0` in the new bug. Thanks!

Thim Thom (thimhh) wrote :

I followed the instructions from here, but
nvme_core.default_ps_max_latency_us=6000
did not work for me.

I'm now on nvme_core.default_ps_max_latency_us=0 and will see if this helps.

Ubuntu 17.04, kernel 4.12 from mainline, BIOS from May, Dell Precision 5510 (from 12/2015), Samsung 512 GB

Kai-Heng Feng (kaihengfeng) wrote :

Thim, what happened to your NVMe? Can you file a new bug?

Steve Roberts (drgrumpy) wrote :

Just landed here with a similar issue on a Samsung 960 EVO 500GB: the system crashes/hangs with the disk inaccessible (?), mostly after resume from suspend, or at the latest 3 minutes after a cold boot...

Kernel 4.10.0-26-generic
M/B Asus Prime B350m-A

Seems it is either not fixed or I have another bug....

e.g.
Jul 20 16:32:59 phs08 kernel: [ 190.893571] INVALID_DEVICE_REQUEST device=00:00.0 address=0xfffffffdf8000000 flags=0x0a00]
Jul 20 16:33:05 phs08 kernel: [ 197.010928] nvme 0000:01:00.0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Jul 20 16:33:05 phs08 kernel: [ 197.046980] pci_raw_set_power_state: 4 callbacks suppressed
Jul 20 16:33:05 phs08 kernel: [ 197.046985] nvme 0000:01:00.0: Refused to change power state, currently in D3
Jul 20 16:33:05 phs08 kernel: [ 197.047163] nvme nvme0: Removing after probe failure status: -19
Jul 20 16:33:05 phs08 kernel: [ 197.047182] nvme0n1: detected capacity change from 500107862016 to 0
Jul 20 16:33:05 phs08 kernel: [ 197.047793] blk_update_request: I/O error, dev nvme0n1, sector 0

nvme list

/dev/nvme0n1 S3EUNX0J305518L Samsung SSD 960 EVO 500GB 1.2 1 125.20 GB / 500.11 GB 512 B + 0 B 2B7QCXE7

sudo nvme id-ctrl /dev/nvme0

NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : S3EUNX0J305518L
mn : Samsung SSD 960 EVO 500GB
fr : 2B7QCXE7
rab : 2
ieee : 002538
cmic : 0
mdts : 9
cntlid : 2
ver : 10200
rtd3r : 7a120
rtd3e : 4c4b40
oaes : 0
oacs : 0x7
acl : 7
aerl : 3
frmw : 0x16
lpa : 0x3
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 350
cctemp : 352
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 500107862016
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1f
fuses : 0
fna : 0x5
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
ps 0 : mp:6.04W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:5.09W operational enlat:0 exlat:0 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:4.08W operational enlat:0 exlat:0 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0400W non-operational enlat:210 exlat:1500 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2200 exlat:6000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-

Kai-Heng Feng (kaihengfeng) wrote :

What you faced (D3 issue) is not the same as this one (APST issue). Please file a new bug.

Bashar (bashar-mc) wrote :

Same problem here on a Lenovo ideapad Y700 and Ubuntu 16.04
/dev/nvme0n1 SAMSUNG MZVLW512HMJP-000L2

I disabled APST by adding nvme_core.default_ps_max_latency_us=0 to the kernel parameters in /etc/default/grub, and it works OK; no blocking since then.
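
For readers unfamiliar with that step, here is a minimal sketch of such an edit. It assumes the parameter goes into the `GRUB_CMDLINE_LINUX_DEFAULT` line of /etc/default/grub; the variable name is an assumption, since the comment names only the file. The helper `add_kernel_param` is hypothetical and only transforms text. It does not write the file.

```python
import re

PARAM = "nvme_core.default_ps_max_latency_us=0"


def add_kernel_param(grub_text: str, param: str = PARAM) -> str:
    """Return grub_text with param set in GRUB_CMDLINE_LINUX_DEFAULT."""
    key = param.split("=")[0]

    def patch(match):
        args = match.group(2).split()
        # Drop any earlier setting of the same key, then append the new one.
        args = [a for a in args if a.split("=")[0] != key]
        args.append(param)
        return match.group(1) + '"' + " ".join(args) + '"'

    return re.sub(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"',
        patch,
        grub_text,
        flags=re.M,
    )


example = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_kernel_param(example))
```

After changing the real file, the bootloader configuration still has to be regenerated (on Ubuntu, typically with `sudo update-grub`) and the machine rebooted before the parameter takes effect.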

Kai-Heng Feng (kaihengfeng) wrote :

> On 28 Oct 2017, at 9:10 PM, Bashar <email address hidden> wrote:
>
> Same problem here on a Lenovo ideapad Y700 and Ubuntu 16.04
> /dev/nvme0n1 SAMSUNG MZVLW512HMJP-000L2
>
> I disabled APST by adding nvme_core.default_ps_max_latency_us=0 to the
> kernel parameters in /etc/default/grub and works ok, no blocking since
> then.

Can you file a new bug? Thanks.

Lucas Zanella (lucaszanella) wrote :

I'm having this on my Razer Blade Stealth with a 512GB Samsung SSD on Ubuntu 17.10.1, but I also had the issue on 17.04 and, today, on 16.04 after updating (possibly related, or not). I installed 17.10.1 today and immediately updated, and the error occurred. I then reinstalled without updating and I've seen no problems so far.

Could somebody help me? Is this issue fixed for 17.10.1? What should I do?

 sudo nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     S33UNX0J324060       SAMSUNG MZVLW512HMJP-00000               1         25,30 GB / 512,11 GB       512 B + 0 B      CXY7501Q

NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : S33UNX0J324060
mn : SAMSUNG MZVLW512HMJP-00000
fr : CXY7501Q
rab : 2
ieee : 002538
cmic : 0
mdts : 0
cntlid : 2
ver : 10200
rtd3r : 186a0
rtd3e : 4c4b40
oaes : 0
oacs : 0x17
acl : 7
aerl : 3
frmw : 0x16
lpa : 0x3
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 341
cctemp : 344
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 512110190592
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1f
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
subnqn :
ps 0 : mp:7.60W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:6.00W operational enlat:0 exlat:0 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:5.10W operational enlat:0 exlat:0 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0400W non-operational enlat:210 exlat:1500 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2200 exlat:6000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-

Kai-Heng Feng (kaihengfeng) wrote :

> On 22 Jan 2018, at 9:34 AM, Lucas Zanella <email address hidden> wrote:
>
> I'm having this on my Razer Blade Stealth, 512gb samsung SSD, ubuntu
> 17.10.1, but also had the issue on 17.04 and today at 16.04 after
> updating (possibly related; or not). I installed 17.10.1 today and
> immediately updated and the error ocurred. I then reinstalled without
> updating and I've seen no problems until now.
>
> Could somebody help me? Is this issue fixed for 17.10.1? What should I
> do?

Please file a new bug. This issue is quite specific to each NVMe and motherboard combination.

You can try the kernel parameter "nvme_core.default_ps_max_latency_us=1500" to disable PS4.
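
The value 1500 matches the exit latency of PS3 on this drive. As a rough illustration, the sketch below pulls the entry and exit latencies out of `ps N :` lines as they are printed by `nvme id-ctrl` in this thread. The function name `parse_power_states` and the regular expression are assumptions tailored to that text layout; they are not part of nvme-cli.

```python
import re
from typing import Dict, Tuple

PS_LINE = re.compile(
    r"ps\s+(\d+)\s*:\s*mp:\S+\s+(non-operational|operational)\s+"
    r"enlat:(\d+)\s+exlat:(\d+)"
)


def parse_power_states(text: str) -> Dict[int, Tuple[bool, int, int]]:
    """Map power state number -> (non_operational, enlat_us, exlat_us)."""
    states = {}
    for m in PS_LINE.finditer(text):
        states[int(m.group(1))] = (
            m.group(2) == "non-operational",
            int(m.group(3)),
            int(m.group(4)),
        )
    return states


sample = """\
ps 2 : mp:5.10W operational enlat:0 exlat:0 rrt:2 rrl:2
ps 3 : mp:0.0400W non-operational enlat:210 exlat:1500 rrt:3 rrl:3
ps 4 : mp:0.0050W non-operational enlat:2200 exlat:6000 rrt:4 rrl:4
"""

for ps, (non_op, enlat, exlat) in parse_power_states(sample).items():
    print(ps, non_op, enlat, exlat)
```

PS4 reports an exit latency of 6000 us here, so a limit of 1500 excludes it.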

Ian Ozsvald (ian-x88) wrote :

I'd like to follow-up on this bug - it affects me using kernel 4.19.0 on a Dell XPS 9550 with a 1TB Samsung drive. Details below. Do I need to open a new bug as I've got a 1TB drive and this bug report was opened for a 512GB drive?

$ uname -a
Linux ian-XPS-15-9550 4.19.0-041900-generic #201810221809 SMP Mon Oct 22 22:11:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.19.0-041900-generic root=/dev/mapper/mint--vg-root ro quiet splash vt.handoff=1

$ sudo nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     S2FZNYAG801690       PM951 NVMe SAMSUNG 1024GB                1         314.10 GB / 1.02 TB        512 B + 0 B      BXV76D0Q

A few weeks back I reformatted my machine due to Dropbox's requirement to drop encrypted home folders (I went for full-disk encryption). I'm using Linux Mint 19.0. I used to run kernel 4.9.91, I had to stick to that in my previous machine's Mint installation as >4.9.91 had boot issues (e.g. missing firmware for my wifi) and going 4.10+ caused other issues including, from memory, strange disk issues that I didn't track down along with occasional GPU+second screen issues. I was keen to upgrade to 4.10+ but sticking to 4.9.91 gave me a stable machine. I'm keen to push on now that I'm running on a fresh installation.

Having installed Mint 19.0 and upgraded to 4.19.0 I was happy that everything worked...until left idle for a while when I got similar errors to the ones noted here:
Oct 31 11:33:04 ian-XPS-15-9550 kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
Oct 31 11:33:04 ian-XPS-15-9550 kernel: EXT4-fs (dm-1): re-mounted. Opts: errors=remount-ro
Oct 31 11:33:05 ian-XPS-15-9550 kernel: EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
Oct 31 15:09:44 ian-XPS-15-9550 kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
Oct 31 15:09:44 ian-XPS-15-9550 kernel: EXT4-fs (dm-1): re-mounted. Opts: errors=remount-ro
Oct 31 15:09:44 ian-XPS-15-9550 kernel: EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)

After this state the machine would be unusable due to a read-only filesystem.

So far I've only experimented with disabling APSTE (using: `sudo nvme set-feature -f 0x0c -v=0 /dev/nvme0`) - if I do this after every wake from suspend then I'm able to use the laptop for over a week without issues, across many suspend cycles. If I forget to disable APSTE after waking from suspend then I lose the hard drive to a read-only state in 1-2 hours depending on usage.

I'm going to start experimenting with adding the grub boot parameter to try different values starting with `nvme_core.default_ps_max_latency_us=250`. I figured opening up this report and taking guidance on what you'd need and whether I needed a new bug report would get the ball rolling.

Much obliged! Ian.

Note that the following command was run after I disabled A...

Judy Brock (judy.brock) wrote :

Hi,

Unfortunately I do not work in this area at Samsung, although I have tried in the past to find the right contacts (not consistently successfully).

Judy.

Ian Ozsvald (ian-x88) wrote :

To follow-up. Earlier today I fresh-booted and didn't use the APST disable command - my machine crashed out with read-only errors after minutes.

I then updated grub to use `nvme_core.default_ps_max_latency_us=250` as suggested - it was stable for 6 hours, through 2 suspends. A little while ago it crashed with read-only errors.

I'm now going to try `nvme_core.default_ps_max_latency_us=0` in GRUB.
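
For anyone wanting to try the same setting: on Ubuntu and Mint the parameter normally goes on the `GRUB_CMDLINE_LINUX_DEFAULT` line of /etc/default/grub. Below is a rough sketch of a helper that appends it; `add_kernel_param` is a made-up name, not part of GRUB, and it assumes GNU sed and a double-quoted value on that line.

```shell
#!/bin/sh
# add_kernel_param FILE PARAM   (hypothetical helper, not part of GRUB)
# Appends PARAM to the GRUB_CMDLINE_LINUX_DEFAULT="..." line in FILE,
# unless that exact string is already on the line.
# It does not remove an existing, different value of the same parameter;
# edit the file by hand in that case.
add_kernel_param() {
    file="$1"
    param="$2"
    # Already present on the command-line entry? Then leave the file alone.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote (GNU sed -i).
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$file"
}
```

Trying it on a copy of the file first seems wise. Once the real /etc/default/grub has been changed (as root), `sudo update-grub` and a reboot are needed, and `cat /proc/cmdline` should then show the parameter. The value the driver ended up with should also be readable from /sys/module/nvme_core/parameters/default_ps_max_latency_us, assuming the running kernel exposes it there.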

Kai-Heng Feng (kaihengfeng) wrote :

Ian, I am working on this issue and trying to find the root cause of it. Let's see what I can find.

Ian Ozsvald (ian-x88) wrote :

Hi Kai-Heng, thanks!

Overnight I've used `nvme_core.default_ps_max_latency_us=0` in GRUB with no read-only crashes (everything has been smooth). I left the machine awake all night and have since suspended and woken it twice; all seems to be OK.

As expected, APSTE is disabled:
`
$ sudo nvme get-feature -f 0x0c -H /dev/nvme0
get-feature:0xc (Autonomous Power State Transition), Current value:00000000
 Autonomous Power State Transition Enable (APSTE): Disabled
 Auto PST Entries .................
 Entry[ 0]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
 Entry[ 1]
 .................
 Idle Time Prior to Transition (ITPT): 0 ms
 Idle Transition Power State (ITPS): 0
 .................
...
`
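
If you only want the one-word state rather than the whole table, the output above can be filtered. This is a small sketch that relies on the "Autonomous Power State Transition Enable (APSTE):" line looking exactly as it does in the paste above; `apste_state` is an illustrative name, not an nvme-cli command.

```shell
#!/bin/sh
# apste_state: reads `nvme get-feature -f 0x0c -H` output on stdin and
# prints the word that follows "(APSTE):" (e.g. "Disabled").
# Returns non-zero if no such line is found.
apste_state() {
    sed -n 's/.*Autonomous Power State Transition Enable (APSTE): *\([A-Za-z]*\).*/\1/p' | grep .
}
```

Usage would be `sudo nvme get-feature -f 0x0c -H /dev/nvme0 | apste_state`, which could for example be run after each resume to see whether the setting has changed.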

Ian Ozsvald (ian-x88) wrote :

I have posted my 1TB (PM951) issue as a new bug here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1805816

Yingfang (jamesyuan) wrote :

I have been experiencing this issue for around a year; it recurs irregularly. Alienware 13 with a Samsung SSD. I have reinstalled more than 7 times and changed the SSD twice.

Ian Ozsvald (ian-x88) wrote :

Hello Yingfang - can I check please - was it a 1TB Samsung SSD? Which model of SSD (950? 960? 970? something else?). When you changed the SSD - did you keep the same model? What was your final solution? Regards, Ian.

Yingfang (jamesyuan) wrote :

@ian-x88 Hi Ian, nothing changed, so no solution for now. I am considering buying a new laptop. The problem leaves me confused and upset. The SSD is a Samsung 250 GB 860 EVO SATA III 64L V-NAND Solid State Drive. The infinite loop of ext4 fs errors occurs irregularly: it might happen twice in an hour, or the machine might work fine all night.

I posted my issue at the link below:
https://askubuntu.com/questions/1100838/ext4-fs-error-device-sda2-ext4-find-entry1436-ubuntu-18-04

Displaying first 40 and last 40 comments. View all 109 comments or add a comment.