Bug #1746340 “Samsung SSD corruption (fsck needed)” : Bugs : linux package : Ubuntu

Joseph Salisbury (jsalisbury) wrote on 2018-01-30:

#1

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.15 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15

Changed in linux (Ubuntu):
importance:	Undecided → High
tags:	added: kernel-da-key

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote on 2018-01-30: Missing required logs.

#2

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1746340

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete

Lucas Zanella (lucaszanella) wrote on 2018-01-30: AlsaInfo.txt

#3

AlsaInfo.txt (40.7 KiB, text/plain)

apport information

tags:	added: apport-collected artful wayland-session
description:	updated

Lucas Zanella (lucaszanella) wrote on 2018-01-30: CRDA.txt

#4

CRDA.txt (426 bytes, text/plain)

apport information

Lucas Zanella (lucaszanella) wrote on 2018-01-30: CurrentDmesg.txt

#5

CurrentDmesg.txt (63.4 KiB, text/plain)

apport information

Lucas Zanella (lucaszanella) wrote on 2018-01-30: IwConfig.txt

#6

IwConfig.txt (469 bytes, text/plain)

apport information

Lucas Zanella (lucaszanella) wrote on 2018-01-30: JournalErrors.txt

#7

JournalErrors.txt (32.6 KiB, text/plain)

apport information

Lucas Zanella (lucaszanella) wrote on 2018-01-30: Lspci.txt

#8

Lspci.txt (10.5 KiB, text/plain)

apport information

Lucas Zanella (lucaszanella) wrote on 2018-01-30: Lsusb.txt

#9

Lsusb.txt (430 bytes, text/plain)

apport information

Lucas Zanella (lucaszanella) wrote on 2018-01-30: ProcCpuinfo.txt

#10

ProcCpuinfo.txt (4.5 KiB, text/plain)

apport information

Lucas Zanella (lucaszanella) wrote on 2018-01-30: ProcCpuinfoMinimal.txt

#11

ProcCpuinfoMinimal.txt (1.1 KiB, text/plain)

apport information

Lucas Zanella (lucaszanella) wrote on 2018-01-30: ProcEnviron.txt

#12

ProcEnviron.txt (331 bytes, text/plain)

apport information

Lucas Zanella (lucaszanella) wrote on 2018-01-30: ProcInterrupts.txt

#13

ProcInterrupts.txt (3.2 KiB, text/plain)

apport information

Lucas Zanella (lucaszanella) wrote on 2018-01-30: ProcModules.txt

#14

ProcModules.txt (6.6 KiB, text/plain)

apport information

Lucas Zanella (lucaszanella) wrote on 2018-01-30: PulseList.txt

#15

PulseList.txt (26.8 KiB, text/plain)

apport information

Lucas Zanella (lucaszanella) wrote on 2018-01-30: RfKill.txt

#16

RfKill.txt (112 bytes, text/plain)

apport information

Lucas Zanella (lucaszanella) wrote on 2018-01-30: UdevDb.txt

#17

UdevDb.txt (187.3 KiB, text/plain)

apport information

Lucas Zanella (lucaszanella) wrote on 2018-01-30: WifiSyslog.txt

#18

WifiSyslog.txt (104.5 KiB, text/plain)

apport information

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

Lucas Zanella (lucaszanella) wrote on 2018-01-30:

#19

Which kernel exactly should I install, and how? I don't feel safe downloading it over plain HTTP.

Kai-Heng Feng (kaihengfeng) wrote on 2018-01-31:

#20

This is a known issue for Samsung NVMe.

Please attach the output of `sudo nvme id-ctrl /dev/nvme0` and `sudo nvme get-feature -f 0x0c -H /dev/nvme0 | less`. Thanks!

Kai-Heng Feng (kaihengfeng) wrote on 2018-01-31:

#21

Uhh sans the "less", thanks.

Lucas Zanella (lucaszanella) wrote on 2018-01-31:

#22

Thank you for your answer. I'm desperate. I just installed Debian, so I'm not going to be able to do it right now, but I have the output from the last time I was using Ubuntu.

I tried nvme_core.default_ps_max_latency_us=5500 and it didn't work. Then I set it to 0, which didn't work either. Well, with 0 it didn't generate errors during normal use, only while trying to update my machine, which always triggers the error anyway, so I don't know anymore. I remember seeing APST Disabled in the output, but the error always happens when I try to update my software...

Shouldn't this bug already be fixed? Or is it not fixed in my kernel? I could pay to get to the bottom of this, because I need my computer so much right now and this bug is happening every day and I can't continue my work!

The last kernel I had on Ubuntu was 4.13.0-26-generic; now I'm on Debian and I have 4.9.0-4.

sudo nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     S33UNX0J324060       SAMSUNG MZVLW512HMJP-00000               1         25,30 GB / 512,11 GB       512 B + 0 B      CXY7501Q

NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : S33UNX0J324060
mn : SAMSUNG MZVLW512HMJP-00000
fr : CXY7501Q
rab : 2
ieee : 002538
cmic : 0
mdts : 0
cntlid : 2
ver : 10200
rtd3r : 186a0
rtd3e : 4c4b40
oaes : 0
oacs : 0x17
acl : 7
aerl : 3
frmw : 0x16
lpa : 0x3
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 341
cctemp : 344
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 512110190592
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1f
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
subnqn :
ps 0 : mp:7.60W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:6.00W operational enlat:0 exlat:0 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:5.10W operational enlat:0 exlat:0 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0400W non-operational enlat:210 exlat:1500 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2200 exlat:6000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-
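
As a reading aid for the power state table above, here is a minimal Python sketch of how a latency limit such as nvme_core.default_ps_max_latency_us might narrow down the states available for autonomous transitions. It is a simplified model, not the kernel's code. It assumes that a state is eligible only if it is non-operational and its exit latency (exlat) does not exceed the limit, and that a limit of 0 switches APST off entirely. The names `eligible_states`, `PS_LINE` and `ID_CTRL_PS` are made up for this illustration.

```python
import re

# Power state lines as printed by `nvme id-ctrl` in this report.
ID_CTRL_PS = """\
ps 0 : mp:7.60W operational enlat:0 exlat:0 rrt:0 rrl:0
ps 1 : mp:6.00W operational enlat:0 exlat:0 rrt:1 rrl:1
ps 2 : mp:5.10W operational enlat:0 exlat:0 rrt:2 rrl:2
ps 3 : mp:0.0400W non-operational enlat:210 exlat:1500 rrt:3 rrl:3
ps 4 : mp:0.0050W non-operational enlat:2200 exlat:6000 rrt:4 rrl:4
"""

# Matches "ps N : mp:... (non-)operational enlat:X exlat:Y".
PS_LINE = re.compile(
    r"ps\s+(\d+)\s*:\s*mp:\S+\s+(non-operational|operational)\s+"
    r"enlat:(\d+)\s+exlat:(\d+)"
)


def eligible_states(id_ctrl_text, max_latency_us):
    """Return non-operational power states that fit the latency limit.

    Assumptions (simplified model, not the kernel implementation):
    - a limit of 0 means APST is off, so no state is eligible;
    - otherwise a state qualifies when it is non-operational and
      its exit latency in microseconds is <= max_latency_us.
    """
    if max_latency_us == 0:
        return []
    states = []
    for match in PS_LINE.finditer(id_ctrl_text):
        state, kind, _enlat, exlat = match.groups()
        if kind == "non-operational" and int(exlat) <= max_latency_us:
            states.append(int(state))
    return states


if __name__ == "__main__":
    for limit in (0, 5500, 100000):
        print(limit, eligible_states(ID_CTRL_PS, limit))
```

Under these assumptions, a limit of 5500 would still leave ps 3 available while ruling out ps 4 (exlat 6000), and a limit of 0 leaves no state available.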

Kai-Heng Feng (kaihengfeng) wrote on 2018-01-31: Re: [Bug 1746340] Re: Samsung SSD corruption (fsck needed)

#23

Kai-Heng

> On 31 Jan 2018, at 1:38 PM, Lucas Zanella <1746340@bugs.launchpad.net> wrote:
> 
> Thank you for your answer. I'm desperated. I just installed debian
> therefore I'm not going to able to do it right now, but I have output
> from the last time I was using Ubuntu.
> 
> I tried nvme_core.default_ps_max_latency_us=5500 and it didn't work.
> Then I've put it to 0, which didn't work too. Well, with 0 it didn't
> generate errors while using, but while trying to update my machine,
> which always happens too, so I don't know anymore. I remember seeing
> ATSP Disabled at the output, but the error always happens when I try to
> update my software…

I’d like to see the output of `sudo nvme get-feature -f 0x0c -H /dev/nvme0` when you use nvme_core.default_ps_max_latency_us=0.

> 
> Shouldn't this bug be already fixed? Or not in my kernel? I could pay to
> get to the bottom of this, because I need my computer so much right now
> and this bug is happening every day and I can't continue my work!

This is more likely a low-level NVMe/PCIe issue. If possible, please try to upgrade the firmware for the NVMe.

> 
> The last kernel I had on ubuntu was 4.13.0-26-generic, now I'm on debian
> and I have 4.9.0-4.

You’ll get hit by this issue (again) once the next Debian release uses a newer kernel.

> 
> sudo nvme list
> Node SN Model Namespace Usage Format FW Rev
> ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
> /dev/nvme0n1 S33UNX0J324060 SAMSUNG MZVLW512HMJP-00000 1 25,30 GB / 512,11 GB 512 B + 0 B CXY7501Q
> 
> NVME Identify Controller:
> vid : 0x144d
> ssvid : 0x144d
> sn : S33UNX0J324060
> mn : SAMSUNG MZVLW512HMJP-00000
> fr : CXY7501Q
> rab : 2
> ieee : 002538
> cmic : 0
> mdts : 0
> cntlid : 2
> ver : 10200
> rtd3r : 186a0
> rtd3e : 4c4b40
> oaes : 0
> oacs : 0x17
> acl : 7
> aerl : 3
> frmw : 0x16
> lpa : 0x3
> elpe : 63
> npss : 4
> avscc : 0x1
> apsta : 0x1
> wctemp : 341
> cctemp : 344
> mtfa : 0
> hmpre : 0
> hmmin : 0
> tnvmcap : 512110190592
> unvmcap : 0
> rpmbs : 0
> sqes : 0x66
> cqes : 0x44
> nn : 1
> oncs : 0x1f
> fuses : 0
> fna : 0
> vwc : 0x1
> awun : 255
> awupf : 0
> nvscc : 1
> acwu : 0
> sgls : 0
> subnqn :
> ps 0 : mp:7.60W operational enlat:0 exlat:0 rrt:0 rrl:0
>          rwt:0 rwl:0 idle_power:- active_power:-
> ps 1 : mp:6.00W operational enlat:0 exlat:0 rrt:1 rrl:1
>          rwt:1 rwl:1 idle_power:- active_power:-
> ps 2 : mp:5.10W operational enlat:0 exlat:0 rrt:2 rrl:2
>          rwt:2 rwl:2 idle_power:- active_power:-
> ps 3 : mp:0.0400W non-operational enlat:210 exlat:1500 rrt:3 rrl:3
>          rwt:3 rwl:3 idle_power:- active_power:-
> ps 4 : mp:0.0050W non-operational enlat:2200 exlat:6000 rrt:4 rrl:4
>          rwt:4 rwl:4 idle_power:- active_power:-
> 
> Title:
>  Samsung SSD corruption (fsck needed)
> 
> Status in linux package in Ubuntu:
>  Confirmed
> 
> Bug description:
>  Ubuntu 4.13.0-21.24-generic 4.13.13
> 
> 
>  I have a Razer Blade Stealth 2016. The first Ubuntu I installed was Ubuntu 17.04, which gave me this error after 2 weeks of usage. After that, I installed 16.04 and used it for MONTHS without any problems, until it produced the same error this week. I think it has to do with the ubuntu updates, because I did one recently and one today, just before this problem. Could be a coincidence though.
> 
>  I notice the error when I try to save something to disk and it tells me
>  that the disk is in read-only mode:
> 
>  lz@lz:/var/log$ touch something
>  touch: cannot touch 'something': Read-only file system
> 
> 
>  lz@lz:/var/log$ cat syslog
>  Jan 29 01:07:39 lz kernel: [62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
> 
> 
>  lz@lz:/var/log$ dmesg
>  [62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.377374] Aborting journal on device nvme0n1p2-8.
>  [62984.379343] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
>  [62984.379516] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.381486] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.383484] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.385469] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.387278] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.389262] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.391252] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.393341] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [63285.618078] audit: type=1400 audit(1517195560.393:63): apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=22495 comm="cupsd" capability=12  capname="net_admin"
> 
>  Rebooting Ubuntu gives me a black terminal where I can run
>  fsck /dev/nvme0n1p2 (something like that), and it will fix a lot of
>  orphaned inodes. Most of the time it boots back into a working
>  Ubuntu, but sometimes it boots into a broken Ubuntu (no images,
>  lots of things broken). I then have to reinstall Ubuntu.
> 
>  Every time I reinstall my Ubuntu, I have to try lots of times until it
>  installs without an Input/Output error. When it installs, I can use it
>  for some hours without having the problem, but if I run the software
>  updates, it ALWAYS crashes and enters in read-only mode, specifically
>  in the part that is installing kernel updates.
> 
>  I noticed that Ubuntu installs updates automatically when they're for
>  security reasons. Could this be the reason my Ubuntu worked for months
>  without the problem, but then an update was applied and it broke?
> 
>  I suspected that this bug was the cause:
>  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184 and tried
>  different nvme_core.default_ps_max_latency_us= values; all of them
>  gave errors. I just changed it to 0 and had no errors while using Ubuntu
>  (although I didn't test for long), but I still got the error
>  when trying to update Ubuntu.
> 
>  My Samsung 512gb SSD is:
> 
>  SAMSUNG MZVLW512HMJP-00000, FW REV: CXY7501Q
> 
>  on a Razer Blade Stealth.
> 
>  I also asked this on Ask Ubuntu, without success:
>  https://askubuntu.com/questions/998471/razer-blade-stealth-disk-corruption-fsck-needed-probably-samsung-ssd-bug-afte
> 
>  Please help me, as I need this computer to work on lots of things :c
>  --- 
>  ApportVersion: 2.20.7-0ubuntu3.7
>  Architecture: amd64
>  AudioDevicesInUse:
>   USER        PID ACCESS COMMAND
>   /dev/snd/controlC0:  lz         1088 F.... pulseaudio
>  CurrentDesktop: ubuntu:GNOME
>  DistroRelease: Ubuntu 17.10
>  InstallationDate: Installed on 2018-01-30 (0 days ago)
>  InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20180105.1)
>  MachineType: Razer Blade Stealth
>  Package: linux (not installed)
>  ProcFB: 0 inteldrmfb
>  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-21-generic.efi.signed root=UUID=0ca062da-7e8f-425a-88b1-1f784fb40346 ro quiet splash button.lid_init_state=open nvme_core.default_ps_max_latency_us=0
>  ProcVersionSignature: Ubuntu 4.13.0-21.24-generic 4.13.13
>  RelatedPackageVersions:
>   linux-restricted-modules-4.13.0-21-generic N/A
>   linux-backports-modules-4.13.0-21-generic  N/A
>   linux-firmware                             1.169.1
>  Tags:  wayland-session artful
>  Uname: Linux 4.13.0-21-generic x86_64
>  UpgradeStatus: No upgrade log present (probably fresh install)
>  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
>  _MarkForUpload: True
>  dmi.bios.date: 01/12/2017
>  dmi.bios.vendor: Razer
>  dmi.bios.version: 6.00
>  dmi.board.name: Razer
>  dmi.board.vendor: Razer
>  dmi.chassis.type: 9
>  dmi.chassis.vendor: Razer
>  dmi.modalias: dmi:bvnRazer:bvr6.00:bd01/12/2017:svnRazer:pnBladeStealth:pvr2.04:rvnRazer:rnRazer:rvr:cvnRazer:ct9:cvr:
>  dmi.product.family: 1A586752
>  dmi.product.name: Blade Stealth
>  dmi.product.version: 2.04
>  dmi.sys.vendor: Razer
> 

Lucas Zanella (lucaszanella) wrote on 2018-02-03:

#24

Hi. I've been trying to install Windows 10 in order to try to update my SSD firmware, but I'm getting an error:

https://imgur.com/a/BM0gG

Could it be that my SSD has a real hardware problem? I tried many different pen drives, in different USB ports, but I always get the same error.

I'm trying to install Ubuntu to get the output with nvme_core.default_ps_max_latency_us=0, but the installation always fails.

Lucas Zanella (lucaszanella) wrote on 2018-02-04:

#25

Hi! I managed to install Ubuntu again. These are the outputs you asked for, with the max latency set to 0:

NVME Identify Controller:
vid     : 0x144d
ssvid   : 0x144d
sn      : S33UNX0J324060      
mn      : SAMSUNG MZVLW512HMJP-00000              
fr      : CXY7501Q
rab     : 2
ieee    : 002538
cmic    : 0
mdts    : 0
cntlid  : 2
ver     : 10200
rtd3r   : 186a0
rtd3e   : 4c4b40
oaes    : 0
oacs    : 0x17
acl     : 7
aerl    : 3
frmw    : 0x16
lpa     : 0x3
elpe    : 63
npss    : 4
avscc   : 0x1
apsta   : 0x1
wctemp  : 341
cctemp  : 344
mtfa    : 0
hmpre   : 0
hmmin   : 0
tnvmcap : 512110190592
unvmcap : 0
rpmbs   : 0
sqes    : 0x66
cqes    : 0x44
nn      : 1
oncs    : 0x1f
fuses   : 0
fna     : 0
vwc     : 0x1
awun    : 255
awupf   : 0
nvscc   : 1
acwu    : 0
sgls    : 0
subnqn  : 
ps    0 : mp:7.60W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps    1 : mp:6.00W operational enlat:0 exlat:0 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps    2 : mp:5.10W operational enlat:0 exlat:0 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps    3 : mp:0.0400W non-operational enlat:210 exlat:1500 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps    4 : mp:0.0050W non-operational enlat:2200 exlat:6000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-

get-feature:0xc (Autonomous Power State Transition), Current value:00000000
	Autonomous Power State Transition Enable (APSTE): Disabled
	Auto PST Entries	.................
	Entry[ 0]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 1]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 2]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 3]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 4]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 5]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 6]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 7]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 8]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 9]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[10]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[11]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[12]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[13]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[14]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[15]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[16]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[17]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[18]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[19]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[20]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[21]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[22]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[23]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[24]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[25]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[26]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[27]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[28]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[29]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[30]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[31]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
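
For anyone comparing several of these dumps, a small Python sketch like the one below could condense the `nvme get-feature -f 0x0c -H` output into a one-line summary. It only relies on the line layout visible in this report. The word "Enabled" for the active case is an assumption (this report only shows "Disabled"), and `summarize_apst` and `SAMPLE` are hypothetical names used for illustration.

```python
import re

# Shortened sample in the layout shown in this report (first two entries only).
SAMPLE = """\
get-feature:0xc (Autonomous Power State Transition), Current value:00000000
Autonomous Power State Transition Enable (APSTE): Disabled
Auto PST Entries .................
Entry[ 0]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 1]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
"""

# One table entry: index, idle time in ms, target power state.
ENTRY = re.compile(
    r"Entry\[\s*(\d+)\].*?\(ITPT\):\s*(\d+)\s*ms.*?\(ITPS\):\s*(\d+)",
    re.DOTALL,
)


def summarize_apst(get_feature_text):
    """Return (enabled, entries) for a human-readable APST dump.

    entries maps the entry index to (idle_time_ms, target_power_state)
    and skips entries where both values are zero.
    """
    flag = re.search(r"\(APSTE\):\s*(Enabled|Disabled)", get_feature_text)
    if flag is None:
        raise ValueError("APSTE line not found")
    enabled = flag.group(1) == "Enabled"  # "Enabled" wording is assumed

    entries = {}
    for index, itpt, itps in ENTRY.findall(get_feature_text):
        if int(itpt) or int(itps):
            entries[int(index)] = (int(itpt), int(itps))
    return enabled, entries


if __name__ == "__main__":
    print(summarize_apst(SAMPLE))
```

For the output above, this reports APST as disabled with no non-zero table entries, which matches what one would expect with the latency parameter set to 0.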

Lucas Zanella (lucaszanella) wrote on 2018-02-04:

#26

I just installed 4.15.0-041500-generic

Lucas Zanella (lucaszanella) wrote on 2018-02-04:

#27

The problem persists with 4.15.0-041500-generic; it just happened.

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-05:

#28

So you have the issue on Linux v4.15 with nvme_core.default_ps_max_latency_us=0, but not on v4.9?

APST doesn't get enabled on either of them.

Lucas Zanella (lucaszanella) wrote on 2018-02-05:

#29

On Debian (4.9) I didn't notice the issue, but I didn't use it much. HOWEVER, when I run apt-get upgrade on Debian I do get the issue. The upgrade only updated the kernel files; it didn't run the new kernel (a reboot would have to happen).

On v4.15 I don't think I set nvme_core.default_ps_max_latency_us=0. I think I set it before upgrading to v4.15. But I can try again.

This is all very strange
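
Whether the option is in effect can be checked on the running system rather than remembered. This is a small sketch, assuming a Linux system where the boot command line is readable at `/proc/cmdline`; the helper names are hypothetical.

```python
def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line.

    Returns "" for a flag given without a value and None if it is absent.
    If the parameter appears more than once, the last occurrence is returned.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()
```

Calling `kernel_param(read_running_cmdline(), "nvme_core.default_ps_max_latency_us")` after each reboot shows whether the setting survived a kernel change. On systems where the `nvme_core` module is loaded, the current value should also be visible under `/sys/module/nvme_core/parameters/`, though I have not verified that on this machine.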

Lucas Zanella (lucaszanella) wrote on 2018-02-05:

#30

I forgot to mention that I reinstalled Windows and everything is fine. I even ran a benchmark test on the SSD, and I'm downloading lots of files to test.

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-05:

#31

I am not familiar with Windows; is there any way to check its APST table? I'd like to see whether the deepest power state is enabled.

Lucas Zanella (lucaszanella) wrote on 2018-02-05:

#32

I searched and found nothing.

So, even with APST disabled my SSD will fail on Linux. What should I do?
Does it work normally for other people when they disable it?

Lucas Zanella (lucaszanella) wrote on 2018-02-06:

#33

I found someone with the same problem as mine who also had a Razer Blade Stealth, but he didn't post anything more after that. He was in a thread with you. I also found some people with this same problem on the same SSD. Together with the fact that I've had no problem on Windows (more than 24 hours of usage by now), I think it can be fixed in the kernel.

I had no luck updating my SSD's firmware as it's OEM and Samsung's updater won't work for it. Do you have any idea? I don't have money to buy a new SSD, and I really need to work. I'd be so grateful if you could help with a solution.

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-06:

#34

Does the issue happen after system suspend?

Lucas Zanella (lucaszanella) wrote on 2018-02-06:

#35

Initially I noted that it'd happen after opening the lid of the notebook, so yes. But now after I install Ubuntu it immediately starts looking for software updates and that's when the problem happens for the first time, when I haven't even had time to close the notebook to suspend it.

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-07:

#36

Please try [1]. It will do a PCI reset for the NVMe device after resume.

[1] people.canonical.com/~khfeng/lp1746340/

Lucas Zanella (lucaszanella) wrote on 2018-02-07:

#37

Thanks. What's a 'PCI reset for NVMe device after resume'?

Here's the output of running sudo dpkg -i *.deb on the 4 files:

Selecting previously unselected package linux-headers-4.15.0+.
(Reading database ... 137951 files and directories currently installed.)
Preparing to unpack linux-headers-4.15.0+_4.15.0+-2_amd64.deb ...
Unpacking linux-headers-4.15.0+ (4.15.0+-2) ...
Selecting previously unselected package linux-image-4.15.0+.
Preparing to unpack linux-image-4.15.0+_4.15.0+-2_amd64.deb ...
Unpacking linux-image-4.15.0+ (4.15.0+-2) ...
Selecting previously unselected package linux-image-4.15.0+-dbg.
Preparing to unpack linux-image-4.15.0+-dbg_4.15.0+-2_amd64.deb ...
Unpacking linux-image-4.15.0+-dbg (4.15.0+-2) ...
dpkg-deb (subprocess): decompressing archive member: lzma error: compressed data is corrupt
dpkg-deb: error: subprocess <decompress> returned error exit status 2
dpkg: error processing archive linux-image-4.15.0+-dbg_4.15.0+-2_amd64.deb (--install):
cannot copy extracted data for './usr/lib/debug/lib/modules/4.15.0+/kernel/drivers/iio/pressure/zpa2326.ko' to '/usr/lib/debug/lib/modules/4.15.0+/kernel/drivers/iio/pressure/zpa2326.ko.dpkg-new': unexpected end of file or stream
Selecting previously unselected package linux-libc-dev.
Preparing to unpack linux-libc-dev_4.15.0+-2_amd64.deb ...
Unpacking linux-libc-dev (4.15.0+-2) ...
Setting up linux-headers-4.15.0+ (4.15.0+-2) ...
Setting up linux-image-4.15.0+ (4.15.0+-2) ...
update-initramfs: Generating /boot/initrd.img-4.15.0+
W: Possible missing firmware /lib/firmware/i915/skl_dmc_ver1_27.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_dmc_ver1_04.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_guc_ver9_39.bin for module i915
W: Possible missing firmware /lib/firmware/i915/bxt_guc_ver9_29.bin for module i915
W: Possible missing firmware /lib/firmware/i915/skl_guc_ver9_33.bin for module i915
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-4.15.0+
Found initrd image: /boot/initrd.img-4.15.0+
Found linux image: /boot/vmlinuz-4.13.0-21-generic
Found initrd image: /boot/initrd.img-4.13.0-21-generic
Adding boot menu entry for EFI firmware configuration
done
Setting up linux-libc-dev (4.15.0+-2) ...
Errors were encountered while processing:
linux-image-4.15.0+-dbg_4.15.0+-2_amd64.deb

Lucas Zanella (lucaszanella) wrote on 2018-02-08:

#38

I downloaded again and it seems that this time it wasn't corrupted.

Output:

Preparing to unpack linux-headers-4.15.0+_4.15.0+-2_amd64.deb ...
Unpacking linux-headers-4.15.0+ (4.15.0+-2) over (4.15.0+-2) ...
Preparing to unpack linux-image-4.15.0+_4.15.0+-2_amd64(1).deb ...
Unpacking linux-image-4.15.0+ (4.15.0+-2) over (4.15.0+-2) ...
Preparing to unpack linux-image-4.15.0+-dbg_4.15.0+-2_amd64(1).deb ...
Unpacking linux-image-4.15.0+-dbg (4.15.0+-2) ...
Preparing to unpack linux-libc-dev_4.15.0+-2_amd64.deb ...
Unpacking linux-libc-dev (4.15.0+-2) over (4.15.0+-2) ...
Setting up linux-headers-4.15.0+ (4.15.0+-2) ...
Setting up linux-image-4.15.0+ (4.15.0+-2) ...
update-initramfs: Generating /boot/initrd.img-4.15.0+
W: Possible missing firmware /lib/firmware/i915/skl_dmc_ver1_27.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_dmc_ver1_04.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_guc_ver9_39.bin for module i915
W: Possible missing firmware /lib/firmware/i915/bxt_guc_ver9_29.bin for module i915
W: Possible missing firmware /lib/firmware/i915/skl_guc_ver9_33.bin for module i915
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-4.15.0+
Found initrd image: /boot/initrd.img-4.15.0+
Found linux image: /boot/vmlinuz-4.13.0-21-generic
Found initrd image: /boot/initrd.img-4.13.0-21-generic
Adding boot menu entry for EFI firmware configuration
done
Setting up linux-image-4.15.0+-dbg (4.15.0+-2) ...
Setting up linux-libc-dev (4.15.0+-2) ...
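
Since the first download failed with an lzma error and the second did not, comparing the two copies is a cheap way to confirm they really differ. A minimal sketch follows, assuming both downloads are still on disk; the thread does not mention any published checksum for these packages, so this only compares files against each other, and the function names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files hash to the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Hashing the same file twice and getting different digests would point at the disk or filesystem rather than the download, which may be worth ruling out given the nature of this bug.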

Lucas Zanella (lucaszanella) wrote on 2018-02-08:

#39

After installing everything, I rebooted into the new kernel. I then installed updates on the machine to see if the problem would happen (running an update is the easiest way to trigger it). After the update, wireless stopped working. I restarted many times and it is still not working.

Could it be that the update triggered the error and the so-called PCIe reset in this kernel broke the wireless?

I'm going to keep using this kernel to see whether the read-only filesystem problem happens, though.
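
Rather than waiting for a failed write, the read-only remount can be detected by polling the mount table. The sketch below assumes the usual `/proc/mounts` layout (device, mount point, filesystem type, options); the function name and the `/dev/nvme` default filter are assumptions chosen for this machine, not something from the thread.

```python
def readonly_mounts(mounts_text, device_prefix="/dev/nvme"):
    """Return [(device, mountpoint)] for matching devices mounted read-only.

    `device_prefix` filters out pseudo filesystems and loop devices that are
    normally read-only; "/dev/nvme" is an assumption that fits this laptop.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        device, mountpoint, _fstype, options = fields[:4]
        if not device.startswith(device_prefix):
            continue
        # Match the exact "ro" option, not substrings such as errors=remount-ro.
        if "ro" in options.split(","):
            result.append((device, mountpoint))
    return result
```

Feeding it the contents of `/proc/mounts` from a periodic job gives a timestamp for when the filesystem flipped, which can then be lined up against dmesg.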

Lucas Zanella (lucaszanella) wrote on 2018-02-08:

#40

I added a USB wireless receiver to get internet access and download things, so I can see if something happens. I installed more system updates through the Ubuntu software updates. Is this OK? The kernel will still be yours, right?

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-08:

#41

> On Feb 8, 2018, at 10:19 AM, Lucas Zanella <1746340@bugs.launchpad.net> wrote:
> 
> I added an USB wireless receiver to use internet to download things so I
> can see if something happens. I installed more system updates through
> the ubuntu software updates. Is this ok? The kernel will still be yours,
> rigtht?

It should be. You can use `uname -r` to check the kernel version.

> 
> -- 
> You received this bug notification because you are subscribed to linux
> in Ubuntu.
> https://bugs.launchpad.net/bugs/1746340
> 
> Title:
>  Samsung SSD corruption (fsck needed)
> 
> Status in linux package in Ubuntu:
>  Confirmed
> 
> Bug description:
>  Ubuntu 4.13.0-21.24-generic 4.13.13
> 
> 
>  I have a Razer Blade Stealth 2016. The first Ubuntu I installed was Ubuntu 17.04, which gave me this error after 2 weeks of usage. After that, I installed 16.04 and used it for MONTHS without any problems, until it produced the same error this week. I think it has to do with the ubuntu updates, because I did one recently and one today, just before this problem. Could be a coincidence though.
> 
>  I notice the error when I try to save something on disk and it says me
>  that the disk is in read-only mode:
> 
>  lz@lz:/var/log$ touch something
>  touch: cannot touch 'something': Read-only file system
> 
> 
>  lz@lz:/var/log$ cat syslog
>  Jan 29 01:07:39 lz kernel: [62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
> 
> 
>  lz@lz:/var/log$ dmesg
>  [62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.377374] Aborting journal on device nvme0n1p2-8.
>  [62984.379343] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
>  [62984.379516] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.381486] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.383484] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.385469] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.387278] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.389262] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.391252] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [62984.393341] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>  [63285.618078] audit: type=1400 audit(1517195560.393:63): apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=22495 comm="cupsd" capability=12  capname="net_admin"
> 
>  Rebooting the ubuntu will give me a black terminal where I can run
>  fsck /dev/nvme0n1p2 (something like that) and it will fix a lot of
>  orphaned inodes. The majority of time it boots back to the Ubuntu
>  working good, but some times it boots to a broken ubuntu (no images,
>  lots of things broken). I have to reinstall ubuntu then.
> 
>  Every time I reinstall my Ubuntu, I have to try lots of times until it
>  installs without an Input/Output error. When it installs, I can use it
>  for some hours without having the problem, but if I run the software
>  updates, it ALWAYS crashes and enters in read-only mode, specifically
>  in the part that is installing kernel updates.
> 
>  I noticed that Ubuntu installs updates automatically when they're for
>  security reasons. Could this be the reason my Ubuntu worked for months
>  without the problem, but then an update was applied and it broke?
> 
>  I thought that this bug was happening:
>  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184 and tried
>  different nvme_core.default_ps_max_latency_us= combinations, all them
>  gave errors. I just changed to 0 and I had no error while using ubuntu
>  (however I didn't test for a long time) but I still had the error
>  after trying to update my ubuntu.
> 
>  My Samsung 512gb SSD is:
> 
>  SAMSUNG MZVLW512HMJP-00000, FW REV: CXY7501Q
> 
>  on a Razer Blade Stealth.
> 
>  I also asked this on ask ubuntu, without success:
>  https://askubuntu.com/questions/998471/razer-blade-stealth-disk-
>  corruption-fsck-needed-probably-samsung-ssd-bug-afte
> 
>  Please help me, as I need this computer to work on lots of things :c
>  --- 
>  ApportVersion: 2.20.7-0ubuntu3.7
>  Architecture: amd64
>  AudioDevicesInUse:
>   USER        PID ACCESS COMMAND
>   /dev/snd/controlC0:  lz         1088 F.... pulseaudio
>  CurrentDesktop: ubuntu:GNOME
>  DistroRelease: Ubuntu 17.10
>  InstallationDate: Installed on 2018-01-30 (0 days ago)
>  InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20180105.1)
>  MachineType: Razer Blade Stealth
>  Package: linux (not installed)
>  ProcFB: 0 inteldrmfb
>  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-21-generic.efi.signed root=UUID=0ca062da-7e8f-425a-88b1-1f784fb40346 ro quiet splash button.lid_init_state=open nvme_core.default_ps_max_latency_us=0
>  ProcVersionSignature: Ubuntu 4.13.0-21.24-generic 4.13.13
>  RelatedPackageVersions:
>   linux-restricted-modules-4.13.0-21-generic N/A
>   linux-backports-modules-4.13.0-21-generic  N/A
>   linux-firmware                             1.169.1
>  Tags:  wayland-session artful
>  Uname: Linux 4.13.0-21-generic x86_64
>  UpgradeStatus: No upgrade log present (probably fresh install)
>  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
>  _MarkForUpload: True
>  dmi.bios.date: 01/12/2017
>  dmi.bios.vendor: Razer
>  dmi.bios.version: 6.00
>  dmi.board.name: Razer
>  dmi.board.vendor: Razer
>  dmi.chassis.type: 9
>  dmi.chassis.vendor: Razer
>  dmi.modalias: dmi:bvnRazer:bvr6.00:bd01/12/2017:svnRazer:pnBladeStealth:pvr2.04:rvnRazer:rnRazer:rvr:cvnRazer:ct9:cvr:
>  dmi.product.family: 1A586752
>  dmi.product.name: Blade Stealth
>  dmi.product.version: 2.04
>  dmi.sys.vendor: Razer
> 
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1746340/+subscriptions

Lucas Zanella (lucaszanella) wrote on 2018-02-08:

#42

The new kernel has been running for almost a day and no problems happened (however I still have no PCIe wireless and no i915 firmware so I can't open things like kdenlive)

What does this fix of yours do and is it possible to make it work with everything?

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-09:

#43

I built another one based on Bionic; please use this kernel instead:
people.canonical.com/~khfeng/lp1746340-2/

Lucas Zanella (lucaszanella) wrote on 2018-02-09:

#44

Just after running sudo dpkg -i *.deb, and before rebooting, the error happened. Since the new kernel isn't running yet, I guess this current kernel still had the problem? That's strange because I've been running for more than 24 hours, downloading lots of torrents and had no problems.

I'm going to reboot now to test the new kernel

Here's the output:

sudo dpkg -i *.deb
[sudo] password for lz: 
Selecting previously unselected package linux-headers-4.14.0-17.
(Reading database ... 215114 files and directories currently installed.)
Preparing to unpack linux-headers-4.14.0-17_4.14.0-17.20~lp1746340_all.deb ...
Unpacking linux-headers-4.14.0-17 (4.14.0-17.20~lp1746340) ...
Selecting previously unselected package linux-headers-4.14.0-17-generic.
Preparing to unpack linux-headers-4.14.0-17-generic_4.14.0-17.20~lp1746340_amd64.deb ...
Unpacking linux-headers-4.14.0-17-generic (4.14.0-17.20~lp1746340) ...
Selecting previously unselected package linux-image-4.14.0-17-generic.
Preparing to unpack linux-image-4.14.0-17-generic_4.14.0-17.20~lp1746340_amd64.deb ...
Examining /etc/kernel/preinst.d/
Done.
Unpacking linux-image-4.14.0-17-generic (4.14.0-17.20~lp1746340) ...
Selecting previously unselected package linux-image-extra-4.14.0-17-generic.
Preparing to unpack linux-image-extra-4.14.0-17-generic_4.14.0-17.20~lp1746340_amd64.deb ...
Unpacking linux-image-extra-4.14.0-17-generic (4.14.0-17.20~lp1746340) ...
Setting up linux-headers-4.14.0-17 (4.14.0-17.20~lp1746340) ...
Setting up linux-headers-4.14.0-17-generic (4.14.0-17.20~lp1746340) ...
Setting up linux-image-4.14.0-17-generic (4.14.0-17.20~lp1746340) ...
Running depmod.
update-initramfs: deferring update (hook will be called later)
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
update-initramfs: Generating /boot/initrd.img-4.14.0-17-generic
run-parts: executing /etc/kernel/postinst.d/unattended-upgrades 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
run-parts: executing /etc/kernel/postinst.d/update-notifier 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-4.15.0+
Found initrd image: /boot/initrd.img-4.15.0+
Found linux image: /boot/vmlinuz-4.14.0-17-generic
Found initrd image: /boot/initrd.img-4.14.0-17-generic
Found linux image: /boot/vmlinuz-4.13.0-32-generic
Found initrd image: /boot/initrd.img-4.13.0-32-generic
Found linux image: /boot/vmlinuz-4.13.0-21-generic
Found initrd image: /boot/initrd.img-4.13.0-21-generic
Adding boot menu entry for EFI firmware configuration
done
Setting up linux-image-extra-4.14.0-17-generic (4.14.0-17.20~lp1746340) ...
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
update-initramfs: Generating /boot/initrd.img-4.14.0-17-generic
run-parts: executing /etc/kernel/postinst.d/unattended-upgrades 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
run-parts: executing /etc/kernel/postinst.d/update-notifier 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-4.15.0+
Found initrd image: /boot/initrd.img-4.15.0+
Found linux image: /boot/vmlinuz-4.14.0-17-generic
Found initrd image: /boot/initrd.img-4.14.0-17-generic
Found linux image: /boot/vmlinuz-4.13.0-32-generic
Found initrd image: /boot/initrd.img-4.13.0-32-generic
Found linux image: /boot/vmlinuz-4.13.0-21-generic
Found initrd image: /boot/initrd.img-4.13.0-21-generic
Adding boot menu entry for EFI firmware configuration
done

Lucas Zanella (lucaszanella) wrote on 2018-02-09:

#45

I rebooted and was still on 4.15. I then enabled the GRUB menu to select a kernel version and chose 4.14, which is yours. Booting it stops at a CPU FIFO underrun error and nothing boots.
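
Because GRUB kept booting 4.15 by default, it helps to confirm the running release before drawing conclusions from a test run. This is a tiny sketch using Python's `platform.release()`, which on Linux reports the same string as `uname -r`; the function name is hypothetical, and the expected release is taken from the package names in the dpkg output.

```python
import platform


def is_expected_kernel(expected, running=None):
    """Compare the running kernel release against the one under test."""
    running = platform.release() if running is None else running
    return running == expected


if __name__ == "__main__":
    # Release string of the Bionic-based test build, per the dpkg output.
    print(is_expected_kernel("4.14.0-17-generic"))
```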

Lucas Zanella (lucaszanella) wrote on 2018-02-09:

#46

initramf [drm: intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-09:

#47

> On Feb 9, 2018, at 3:34 PM, Lucas Zanella <1746340@bugs.launchpad.net>  
> wrote:
>
> Just after running sudo dpkg -i *.deb, and before rebooting, the error
> happened. Since the new kernel isn't running yet, I guess this current
> kernel still had the problem? That's strange because I've been running
> for more than 24 hours, downloading lots of torrents and had no
> problems.
>
> I'm going to reboot now to test the new kernel
>

The issue happens when the disk transitions between operational and
non-operational states.

If you are torrenting, chances are the disk stays in an operational state
the whole time, so you don't see the issue.

Kai-Heng

> Here's the output:
>
> sudo dpkg -i *.deb
> [sudo] password for lz:
> Selecting previously unselected package linux-headers-4.14.0-17.
> (Reading database ... 215114 files and directories currently installed.)
> Preparing to unpack  
> linux-headers-4.14.0-17_4.14.0-17.20~lp1746340_all.deb ...
> Unpacking linux-headers-4.14.0-17 (4.14.0-17.20~lp1746340) ...
> Selecting previously unselected package linux-headers-4.14.0-17-generic.
> Preparing to unpack  
> linux-headers-4.14.0-17-generic_4.14.0-17.20~lp1746340_amd64.deb ...
> Unpacking linux-headers-4.14.0-17-generic (4.14.0-17.20~lp1746340) ...
> Selecting previously unselected package linux-image-4.14.0-17-generic.
> Preparing to unpack  
> linux-image-4.14.0-17-generic_4.14.0-17.20~lp1746340_amd64.deb ...
> Examining /etc/kernel/preinst.d/
> Done.
> Unpacking linux-image-4.14.0-17-generic (4.14.0-17.20~lp1746340) ...
> Selecting previously unselected package  
> linux-image-extra-4.14.0-17-generic.
> Preparing to unpack  
> linux-image-extra-4.14.0-17-generic_4.14.0-17.20~lp1746340_amd64.deb ...
> Unpacking linux-image-extra-4.14.0-17-generic (4.14.0-17.20~lp1746340) ...
> Setting up linux-headers-4.14.0-17 (4.14.0-17.20~lp1746340) ...
> Setting up linux-headers-4.14.0-17-generic (4.14.0-17.20~lp1746340) ...
> Setting up linux-image-4.14.0-17-generic (4.14.0-17.20~lp1746340) ...
> Running depmod.
> update-initramfs: deferring update (hook will be called later)
> Examining /etc/kernel/postinst.d.
> run-parts: executing /etc/kernel/postinst.d/apt-auto-removal  
> 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
> run-parts: executing /etc/kernel/postinst.d/initramfs-tools  
> 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
> update-initramfs: Generating /boot/initrd.img-4.14.0-17-generic
> run-parts: executing /etc/kernel/postinst.d/unattended-upgrades  
> 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
> run-parts: executing /etc/kernel/postinst.d/update-notifier  
> 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
> run-parts: executing /etc/kernel/postinst.d/zz-update-grub  
> 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
> Generating grub configuration file ...
> Warning: Setting GRUB_TIMEOUT to a non-zero value when  
> GRUB_HIDDEN_TIMEOUT is set is no longer supported.
> Found linux image: /boot/vmlinuz-4.15.0+
> Found initrd image: /boot/initrd.img-4.15.0+
> Found linux image: /boot/vmlinuz-4.14.0-17-generic
> Found initrd image: /boot/initrd.img-4.14.0-17-generic
> Found linux image: /boot/vmlinuz-4.13.0-32-generic
> Found initrd image: /boot/initrd.img-4.13.0-32-generic
> Found linux image: /boot/vmlinuz-4.13.0-21-generic
> Found initrd image: /boot/initrd.img-4.13.0-21-generic
> Adding boot menu entry for EFI firmware configuration
> done
> Setting up linux-image-extra-4.14.0-17-generic (4.14.0-17.20~lp1746340) ...
> run-parts: executing /etc/kernel/postinst.d/apt-auto-removal  
> 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
> run-parts: executing /etc/kernel/postinst.d/initramfs-tools  
> 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
> update-initramfs: Generating /boot/initrd.img-4.14.0-17-generic
> run-parts: executing /etc/kernel/postinst.d/unattended-upgrades  
> 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
> run-parts: executing /etc/kernel/postinst.d/update-notifier  
> 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
> run-parts: executing /etc/kernel/postinst.d/zz-update-grub  
> 4.14.0-17-generic /boot/vmlinuz-4.14.0-17-generic
> Generating grub configuration file ...
> Warning: Setting GRUB_TIMEOUT to a non-zero value when  
> GRUB_HIDDEN_TIMEOUT is set is no longer supported.
> Found linux image: /boot/vmlinuz-4.15.0+
> Found initrd image: /boot/initrd.img-4.15.0+
> Found linux image: /boot/vmlinuz-4.14.0-17-generic
> Found initrd image: /boot/initrd.img-4.14.0-17-generic
> Found linux image: /boot/vmlinuz-4.13.0-32-generic
> Found initrd image: /boot/initrd.img-4.13.0-32-generic
> Found linux image: /boot/vmlinuz-4.13.0-21-generic
> Found initrd image: /boot/initrd.img-4.13.0-21-generic
> Adding boot menu entry for EFI firmware configuration
> done
>
> -- 
> You received this bug notification because you are subscribed to linux
> in Ubuntu.
> https://bugs.launchpad.net/bugs/1746340
>
> Title:
>   Samsung SSD corruption (fsck needed)
>
> Status in linux package in Ubuntu:
>   Confirmed
>
> Bug description:
>   Ubuntu 4.13.0-21.24-generic 4.13.13
>
>
>   I have a Razer Blade Stealth 2016. The first Ubuntu I installed was Ubuntu 17.04, which gave me this error after 2 weeks of usage. After that, I installed 16.04 and used it for MONTHS without any problems, until it produced the same error this week. I think it has to do with the ubuntu updates, because I did one recently and one today, just before this problem. Could be a coincidence though.
>
>   I notice the error when I try to save something on disk and it says me
>   that the disk is in read-only mode:
>
>   lz@lz:/var/log$ touch something
>   touch: cannot touch 'something': Read-only file system
>
>
>   lz@lz:/var/log$ cat syslog
>   Jan 29 01:07:39 lz kernel: [62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>
>
>   lz@lz:/var/log$ dmesg
>   [62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>   [62984.377374] Aborting journal on device nvme0n1p2-8.
>   [62984.379343] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
>   [62984.379516] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>   [62984.381486] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>   [62984.383484] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>   [62984.385469] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>   [62984.387278] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>   [62984.389262] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>   [62984.391252] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>   [62984.393341] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>   [63285.618078] audit: type=1400 audit(1517195560.393:63): apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=22495 comm="cupsd" capability=12 capname="net_admin"
>
>   Rebooting the ubuntu will give me a black terminal where I can run
>   fsck /dev/nvm30n1p2 (something like that) and it fill fix a lot of
>   orphaned inodes. The majority of time it boots back to the Ubuntu
>   working good, but some times it boots to a broken ubuntu (no images,
>   lots of things broken). I have to reinstall ubuntu then.
>
>   Every time I reinstall my Ubuntu, I have to try lots of times until it
>   installs without an Input/Output error. When it installs, I can use it
>   for some hours without having the problem, but if I run the software
>   updates, it ALWAYS crashes and enters in read-only mode, specifically
>   in the part that is installing kernel updates.
>
>   I noticed that Ubuntu installs updates automatically when they're for
>   security reasons. Could this be the reason my Ubuntu worked for months
>   without the problem, but then an update was applied and it broke?
>
>   I thought that this bug was happening:
>   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184 and tried
>   different nvme_core.default_ps_max_latency_us= combinations, all them
>   gave errors. I just changed to 0 and I had no error while using ubuntu
>   (however I didn't test for a long time) but I still had the error
>   after trying to update my ubuntu.
>
>   My Samsung 512gb SSD is:
>
>   SAMSUNG MZVLW512HMJP-00000, FW REV: CXY7501Q
>
>   on a Razer Blade Stealth.
>
>   I also asked this on ask ubuntu, without success:
>   https://askubuntu.com/questions/998471/razer-blade-stealth-disk-
>   corruption-fsck-needed-probably-samsung-ssd-bug-afte
>
>   Please help me, as I need this computer to work on lots of things :c
>   ---
>   ApportVersion: 2.20.7-0ubuntu3.7
>   Architecture: amd64
>   AudioDevicesInUse:
>    USER        PID ACCESS COMMAND
>    /dev/snd/controlC0:  lz         1088 F.... pulseaudio
>   CurrentDesktop: ubuntu:GNOME
>   DistroRelease: Ubuntu 17.10
>   InstallationDate: Installed on 2018-01-30 (0 days ago)
>   InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20180105.1)
>   MachineType: Razer Blade Stealth
>   Package: linux (not installed)
>   ProcFB: 0 inteldrmfb
>   ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-21-generic.efi.signed root=UUID=0ca062da-7e8f-425a-88b1-1f784fb40346 ro quiet splash button.lid_init_state=open nvme_core.default_ps_max_latency_us=0
>   ProcVersionSignature: Ubuntu 4.13.0-21.24-generic 4.13.13
>   RelatedPackageVersions:
>    linux-restricted-modules-4.13.0-21-generic N/A
>    linux-backports-modules-4.13.0-21-generic  N/A
>    linux-firmware                             1.169.1
>   Tags:  wayland-session artful
>   Uname: Linux 4.13.0-21-generic x86_64
>   UpgradeStatus: No upgrade log present (probably fresh install)
>   UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
>   _MarkForUpload: True
>   dmi.bios.date: 01/12/2017
>   dmi.bios.vendor: Razer
>   dmi.bios.version: 6.00
>   dmi.board.name: Razer
>   dmi.board.vendor: Razer
>   dmi.chassis.type: 9
>   dmi.chassis.vendor: Razer
>   dmi.modalias: dmi:bvnRazer:bvr6.00:bd01/12/2017:svnRazer:pnBladeStealth:pvr2.04:rvnRazer:rnRazer:rvr:cvnRazer:ct9:cvr:
>   dmi.product.family: 1A586752
>   dmi.product.name: Blade Stealth
>   dmi.product.version: 2.04
>   dmi.sys.vendor: Razer
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1746340/+subscriptions

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-09:

#48

Very important update: I bought a brand new Samsung 960 EVO, and I can't even install Ubuntu on it: I get an I/O error during the installation.

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-09:

#49

You need to boot with the kernel parameter "nvme_core.default_ps_max_latency_us=0".

Please try this kernel after installation:
http://people.canonical.com/~khfeng/lp1746340-artful/
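
To confirm that the parameter actually reached the running kernel, the boot command line can be inspected after a reboot. The following is a minimal sketch, assuming a POSIX shell; the helper name cmdline_param is hypothetical and not part of any existing tool:

```shell
# Print the value of a kernel command-line parameter.
# $1: parameter name; $2: file to read (defaults to /proc/cmdline).
cmdline_param() (
    name=$1
    file=${2:-/proc/cmdline}
    # Split the command line into one token per line, then match "name=value".
    tr ' ' '\n' < "$file" | while IFS= read -r token || [ -n "$token" ]; do
        case $token in
            "$name"=*) printf '%s\n' "${token#*=}" ;;
        esac
    done
)
```

Running `cmdline_param nvme_core.default_ps_max_latency_us` prints the value the kernel was booted with, and prints nothing if the parameter was not passed.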

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-09:

#50

Is it possible to boot the live installation media with the kernel parameter? I'm having problems installing Ubuntu on the new SSD; I always get I/O errors...

I'm also gonna try the new kernel on the old SSD, though.

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-10:

#51

You mean I need to boot with the parameter on your new kernel? I'm gonna try it

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-10:

#52

I just installed your new kernel on the old SSD and changed the nvme_core_..._us to 0.

It seems that a dependency is missing for the kernel:

dpkg: dependency problems prevent configuration of linux-headers-4.13.0-34-generic:
 linux-headers-4.13.0-34-generic depends on libssl1.1 (>= 1.1.0); however:
  Package libssl1.1 is not installed.

dpkg: error processing package linux-headers-4.13.0-34-generic (--install):
 dependency problems - leaving unconfigured
Setting up linux-image-4.13.0-34-generic (4.13.0-34.37~lp1746340) ...
Running depmod.
update-initramfs: deferring update (hook will be called later)
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.13.0-34-generic /boot/vmlinuz-4.13.0-34-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.13.0-34-generic /boot/vmlinuz-4.13.0-34-generic
update-initramfs: Generating /boot/initrd.img-4.13.0-34-generic
run-parts: executing /etc/kernel/postinst.d/unattended-upgrades 4.13.0-34-generic /boot/vmlinuz-4.13.0-34-generic
run-parts: executing /etc/kernel/postinst.d/update-notifier 4.13.0-34-generic /boot/vmlinuz-4.13.0-34-generic
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 4.13.0-34-generic /boot/vmlinuz-4.13.0-34-generic
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0+
Found initrd image: /boot/initrd.img-4.15.0+
Found linux image: /boot/vmlinuz-4.14.0-17-generic
Found initrd image: /boot/initrd.img-4.14.0-17-generic
Found linux image: /boot/vmlinuz-4.13.0-34-generic
Found initrd image: /boot/initrd.img-4.13.0-34-generic
Found linux image: /boot/vmlinuz-4.13.0-32-generic
Found initrd image: /boot/initrd.img-4.13.0-32-generic
Found linux image: /boot/vmlinuz-4.13.0-21-generic
Found initrd image: /boot/initrd.img-4.13.0-21-generic
Adding boot menu entry for EFI firmware configuration
done
Setting up linux-image-extra-4.13.0-34-generic (4.13.0-34.37~lp1746340) ...
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.13.0-34-generic /boot/vmlinuz-4.13.0-34-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.13.0-34-generic /boot/vmlinuz-4.13.0-34-generic
update-initramfs: Generating /boot/initrd.img-4.13.0-34-generic
run-parts: executing /etc/kernel/postinst.d/unattended-upgrades 4.13.0-34-generic /boot/vmlinuz-4.13.0-34-generic
run-parts: executing /etc/kernel/postinst.d/update-notifier 4.13.0-34-generic /boot/vmlinuz-4.13.0-34-generic
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 4.13.0-34-generic /boot/vmlinuz-4.13.0-34-generic
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0+
Found initrd image: /boot/initrd.img-4.15.0+
Found linux image: /boot/vmlinuz-4.14.0-17-generic
Found initrd image: /boot/initrd.img-4.14.0-17-generic
Found linux image: /boot/vmlinuz-4.13.0-34-generic
Found initrd image: /boot/initrd.img-4.13.0-34-generic
Found linux image: /boot/vmlinuz-4.13.0-32-generic
Found initrd image: /boot/initrd.img-4.13.0-32-generic
Found linux image: /boot/vmlinuz-4.13.0-21-generic
Found initrd image: /boot/initrd.img-4.13.0-21-generic
Adding boot menu entry for EFI firmware configuration
done
Errors were encountered while processing:
 linux-headers-4.13.0-34-generic
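
With output this long, the missing dependency is easy to overlook. A small sketch of how it could be pulled out of the dpkg output mechanically; the function name missing_packages is hypothetical, and the pattern only assumes the "Package ... is not installed." wording shown above:

```shell
# Read dpkg output on stdin and print each package reported as not installed.
missing_packages() {
    sed -n 's/^[[:space:]]*Package \([^[:space:]]*\) is not installed\.$/\1/p' | sort -u
}
```

For example, `sudo dpkg -i *.deb 2>&1 | missing_packages` would list libssl1.1 for the run above.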

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-10:

#53

Never mind, I installed libssl1.1 by adding the bionic repo. However, right before I could reinstall the kernel, the system went into read-only mode. I'm gonna try to get in and install the new kernel some way.

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-10:

#54

I think I got it: http://pastebin.com/raw/squFVnGi

I'm counting on this 4.13.0-34 being the new one, not the old one I had. I hope it is.

Just booted and PCIe wireless is working. uname -r gives 4.13.0-34-generic.

Going to let the system rest for a while to see if something happens; not going to download torrents again.
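
Since uname -r reports only the release name, it cannot tell a test build apart from a stock kernel of the same release. The package version can: the test packages above are versioned 4.13.0-34.37~lp1746340. A minimal sketch of that check; the helper name has_build_tag is hypothetical:

```shell
# Succeed if a package version string carries the given "~tag" suffix.
# $1: version string; $2: tag to look for (without the leading "~").
has_build_tag() {
    case $1 in
        *"~$2"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

One way to use it is `has_build_tag "$(dpkg-query -W -f='${Version}' linux-image-4.13.0-34-generic)" lp1746340 && echo "test build installed"`.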

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-12:

#55

I've been running for days without any problem (before, it would happen about 30 minutes after installation). So can you release the source? Will it be in mainline?

Also, how can I use this kernel with the live image? It's painful to install Ubuntu with this problem: I get an I/O error in 90% of my tries, and I have to try for hours until it installs properly.

Thank you so much!

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-13:

#56

Can you try again with [1]?

The one you used is with quirk NVME_QUIRK_NO_DEEPEST_PS, let's see if that quirk is unnecessary.

[1] http://people.canonical.com/~khfeng/lp1746340-pcireset/

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-13:

#57

Ok, just installed it. Gonna monitor it to see if any errors come up

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-15:

#58

Everything is OK with this new kernel. No errors.

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-15:

#59

https://lkml.org/lkml/2018/2/15/347

Changed in linux (Ubuntu):
assignee:	nobody → Kai-Heng Feng (kaihengfeng)

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-15:

#60

When will there be a kernel with this patch included?

What about the live image? It's going to take months for a live installation image to have this patch. Is it possible for me to use this kernel in a live image myself?

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-16:

#61

> When will there be a kernel with this patch included?
v4.16.

> What about the live image? It's going to take months for a live installation image to have this patch. Is it possible for me to use this kernel in a live image myself?
I'll backport the patch to v4.15 so the Bionic (18.04) live image will have this fix.

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-16:

#62

But v4.16-rc1 doesn't have "NVME_QUIRK_PCI_RESET_RESUME = (1 << 7)".

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-18:

#63

The patch hasn't been merged yet.

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-19:

#64

I compiled a kernel myself, applying the patch and using make deb-pkg, and got these files:

    linux-headers-4.15.4_4.15.4-4_amd64.deb
    linux-image-4.15.4_4.15.4-4_amd64.deb
    linux-image-4.15.4-dbg_4.15.4-4_amd64.deb
    linux-libc-dev_4.15.4-4_amd64.deb

But you don't have image...dbg or libc, and you have image-extra and headers...generic. What's the difference? Will mine work? If not, how do I get exactly your 4 files?

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-19:

#65

Build a Debian/Ubuntu kernel:
https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel#Building_the_kernel

Kernel source:
http://kernel.ubuntu.com/git/ubuntu/ubuntu-artful.git/
http://kernel.ubuntu.com/git/ubuntu/ubuntu-bionic.git/

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-19:

#66

I downloaded a fresh 4.15.0-9 kernel, applied the diff, and compiled it just as described on the page you sent.

I then formatted, installed Ubuntu 17.10.1, booted, enabled nvme_..._us=0, and rebooted. Then I started installing updates. The error occurred in the middle of it.

I've never tried to install updates on the kernels you had me test. Could it be that it's breaking something?

:(

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-20:

#67

I installed Ubuntu again, installed my compiled kernel, and disabled updates. When installing a package (virt-manager), the error occurred again. This package touches kernel-related things (KVM), but I used it before on your kernels and everything was fine (I didn't try to install it while running your kernel, though).

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-20:

#68

Here's what I did:

git clone bionic_git_url...
cd ubuntu-bionic
git checkout <tag of version 4.15.0-9>
patch -p1 < nvme_reset.diff #(from your diff file)
# gave an error about the last line, but I checked manually and it was OK (I guess it was because of the number at the end): https://pastebin.com/j4Tz1fDa
sudo apt-get build-dep linux-image-$(uname -r)
fakeroot debian/rules clean
fakeroot debian/rules binary-headers binary-generic binary-perarch

after a long time, I copied

linux-headers-4.15.0-9_4.15.0-9.10_all.deb
linux-headers-4.15.0-9-generic_4.15.0-9.10_amd64.deb
linux-image-4.15.0-9-generic_4.15.0-9.10_amd64.deb
linux-image-extra-4.15.0-9-generic_4.15.0-9.10_amd64.deb

and installed on my fresh ubuntu 17.10.1 install on the razer blade stealth by doing

sudo dpkg -i *.deb

then I added nvme_..._us=0 to grub and did

sudo update-grub

I rebooted and used it for a while (I confirmed with uname -r that the new kernel was running). The first time I did all this, the problem occurred while installing updates. The second time, the error occurred when I tried to install virt-manager.

Since the kernel worked perfectly except for that, I can only assume that your diff didn't go through. But if I do git checkout <tag> and then apply a diff on top of that tag, I can simply cd into that folder and compile, and the build will include the diff, right?

Thank you for your help!
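
One way to check whether a diff really landed in the tree before starting a long build is to search the sources for a symbol the diff introduces. A minimal sketch, assuming the diff adds the NVME_QUIRK_PCI_RESET_RESUME name mentioned in comment #62; the helper name tree_has_symbol is hypothetical:

```shell
# Succeed if a symbol appears as a whole word somewhere under a source directory.
# $1: source directory; $2: symbol to look for.
tree_has_symbol() {
    grep -rqw -- "$2" "$1"
}
```

From the top of the checked-out tree, `tree_has_symbol drivers/nvme/host NVME_QUIRK_PCI_RESET_RESUME && echo "patch present"` would confirm the change is in place. Running `git status` or `git diff --stat` in the same tree also shows which files the patch modified.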

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-20:

#69

I'd like to know the reason for compiling your own kernel. Is it because you still encounter disk errors even with the kernel parameter "nvme_core.default_ps_max_latency_us=0"?

If so, then we need to put the patch into Bionic's kernel and make sure the daily Bionic ISO uses the new kernel.

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-20:

#70

I added nvme_core.default_ps_max_latency_us=0 because you said to in an older comment. Is it necessary, or can I take it off?

I'm compiling my own because I want to learn and also to test new kernels as they are released, especially now with Spectre and Meltdown (it's going to take time for it to reach mainline and even more time for it to reach the live installer). It's also good practice for security reasons.

I don't see what I did wrong; my kernel should work exactly like yours.

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-20:

#71

Please remove the kernel parameter so we can make sure it works with APST enabled.
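
After removing the parameter and rebooting, the value the driver is actually using can be read back from sysfs. A value of 0 means APST is disabled; any other value is the maximum latency the driver will accept when choosing power states. A minimal sketch; the helper name apst_status is hypothetical:

```shell
# Describe what a default_ps_max_latency_us value means for APST.
# $1: the parameter value in microseconds.
apst_status() {
    case $1 in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;
    esac
    if [ "$1" -eq 0 ]; then
        echo "APST disabled"
    else
        echo "APST allowed, max latency $1 us"
    fi
}
```

For example: `apst_status "$(cat /sys/module/nvme_core/parameters/default_ps_max_latency_us)"`. Whether the drive really enters the deeper states also depends on what the device itself advertises, so this only confirms the driver-side setting.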

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-22:

#72

Since your last message I recompiled the kernel and reinstalled it. I tested again, and the problem occurred in about 1 hour. Then I took the kernel parameter off and started testing, and I've been running for more than 24 hours without errors. However, the error occurred inside a virtual machine. But the disk in that machine is named /dev/sda1, so it's not using the NVMe driver or anything like that. How is it possible for the error to occur inside the virtual machine but not on the host? Could this be due to another, completely unrelated problem?

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-22:

#73

Thanks Lucas. Sounds like the issue is gone when APST gets enabled.

It would be great if you could test it with more S3 cycles.

Regarding the VM issue, I can't be sure unless you attach the error message.
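
Repeated S3 (suspend-to-RAM) cycles can be automated with rtcwake from util-linux, which suspends the machine and wakes it again after a set time. A rough sketch, to be run as root; the function name s3_cycles, the DRY_RUN switch, and the 10-second pause are assumptions for illustration:

```shell
# Run (or, with DRY_RUN=1, only print) a number of suspend-to-RAM cycles.
# $1: number of cycles; $2: seconds to stay suspended per cycle (default 30).
s3_cycles() (
    count=$1
    secs=${2:-30}
    i=1
    while [ "$i" -le "$count" ]; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "rtcwake -m mem -s $secs"
        else
            rtcwake -m mem -s "$secs" || exit 1
            sleep 10   # let the system settle before the next cycle
        fi
        i=$((i + 1))
    done
)
```

Calling `s3_cycles 20 30` performs 20 cycles of 30 seconds each; checking dmesg for EXT4 or nvme errors afterwards shows whether the disk survived the resumes.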

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-22:

#74

The kernel is still good. The error happened again in the virtual machine, here's dmesg:

[ 6730.708866] EXT4-fs error (device sda1): htree_dirblock_to_tree:976: inode #418562: comm updatedb.mlocat: Directory block failed checksum
[ 6730.710121] Aborting journal on device sda1-8.
[ 6730.711514] EXT4-fs (sda1): Remounting filesystem read-only
[ 6730.713087] EXT4-fs error (device sda1): ext4_journal_check_start:60: Detected aborted journal
[ 7030.415582] audit: type=1400 audit(1519269087.344:26): apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=2851 comm="cupsd" capability=12 capname="net_admin"
[67539.479651] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
[67539.479670] clocksource: 'kvm-clock' wd_now: 55b3d2da2f60 wd_last: 269f11d4c146 mask: ffffffffffffffff
[67539.479673] clocksource: 'tsc' cs_now: 422f92c80a6a cs_last: 422e9dc07f56 mask: ffffffffffffffff

What do you think? My disk is /dev/sda1 on the virtual machine, so no NVMe... I'm using KVM with SPICE.
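
When comparing host and guest logs, it helps to reduce each dmesg dump to the devices that reported EXT4 errors. A minimal sketch that only assumes the "EXT4-fs error (device ...)" wording seen in the logs in this thread; the function name ext4_error_devices is hypothetical:

```shell
# Read kernel log text on stdin and print each device that reported an
# EXT4-fs error, followed by the number of error lines seen for it.
ext4_error_devices() {
    sed -n 's/.*EXT4-fs error (device \([^)]*\)).*/\1/p' | sort | uniq -c |
        awk '{ print $2, $1 }'
}
```

Running `dmesg | ext4_error_devices` on both the host and the guest gives a quick view of which block device each error belongs to.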

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-02-26:

#75

So... the error happened :(

I don't know if it's related, but I was compiling Qt 5 inside a virtual machine and went to sleep. When I woke up, the compilation had failed with an error about not being able to allocate virtual memory. The VM was unusable (it didn't respond to input), so I rebooted it, and during Ubuntu's boot there was a message about trying to write outside disk hd1. I then ran fsck, and the machine kept printing lines about disk writes indefinitely (it wouldn't stop). I tried to capture the output but couldn't save it.

Now, on the main machine, I ran touch a and got the "Read-only file system" error. Then I looked at dmesg:

[62526.097648] CPU3: Package temperature above threshold, cpu clock throttled (total events = 734393)
[62526.097650] CPU2: Package temperature above threshold, cpu clock throttled (total events = 734389)
[62526.097654] CPU0: Package temperature above threshold, cpu clock throttled (total events = 734421)
[62526.098643] CPU0: Core temperature/speed normal
[62526.098644] CPU2: Core temperature/speed normal
[62526.098644] CPU3: Package temperature/speed normal
[62526.098645] CPU1: Package temperature/speed normal
[62526.098646] CPU2: Package temperature/speed normal
[62526.098647] CPU0: Package temperature/speed normal
[62826.083664] CPU0: Core temperature/speed normal
[62826.083665] CPU2: Core temperature/speed normal
[62826.083666] CPU3: Package temperature/speed normal
[62826.083667] CPU1: Package temperature/speed normal
[62826.083667] CPU2: Package temperature/speed normal
[62826.083669] CPU0: Package temperature/speed normal
[63109.039660] CPU3: Core temperature above threshold, cpu clock throttled (total events = 122579)
[63109.039661] CPU1: Core temperature above threshold, cpu clock throttled (total events = 122586)
[63109.043637] CPU3: Core temperature/speed normal
[63109.043637] CPU1: Core temperature/speed normal
[63141.298625] CPU2: Core temperature above threshold, cpu clock throttled (total events = 685839)
[63141.298626] CPU0: Core temperature above threshold, cpu clock throttled (total events = 685861)
[63141.298628] CPU1: Package temperature above threshold, cpu clock throttled (total events = 752070)
[63141.298628] CPU3: Package temperature above threshold, cpu clock throttled (total events = 752043)
[63141.298630] CPU0: Package temperature above threshold, cpu clock throttled (total events = 752073)
[63141.298633] CPU2: Package temperature above threshold, cpu clock throttled (total events = 752042)
[63141.311665] CPU0: Core temperature/speed normal
[63141.311666] CPU2: Core temperature/speed normal
[63141.311667] CPU1: Package temperature/speed normal
[63141.311667] CPU3: Package temperature/speed normal
[63141.311668] CPU2: Package temperature/speed normal
[63141.311669] CPU0: Package temperature/speed normal
[63441.300764] CPU2: Core temperature/speed normal
[63441.300765] CPU0: Core temperature/speed normal
[63441.300766] CPU3: Package temperature/speed normal
[63441.300766] CPU1: Package temperature/speed normal
[63441.300767] CPU0: Package temperature/speed normal
[63441.300768] CPU2: Package temperature/speed normal
[63742.088404] CPU0: Core temperature above threshold, cpu clock throttled (total events = 702907)
[63742.088405] CPU1: Package temperature above threshold, cpu clock throttled (total events = 773051)
[63742.088406] CPU3: Package temperature above threshold, cpu clock throttled (total events = 773025)
[63742.088407] CPU2: Core temperature above threshold, cpu clock throttled (total events = 702884)
[63742.088408] CPU2: Package temperature above threshold, cpu clock throttled (total events = 773023)
[63742.088412] CPU0: Package temperature above threshold, cpu clock throttled (total events = 773055)
[63742.089411] CPU2: Core temperature/speed normal
[63742.089412] CPU0: Core temperature/speed normal
[63742.089412] CPU3: Package temperature/speed normal
[63742.089413] CPU1: Package temperature/speed normal
[63742.089413] CPU0: Package temperature/speed normal
[63742.089414] CPU2: Package temperature/speed normal
[64044.223423] CPU0: Core temperature above threshold, cpu clock throttled (total events = 712002)
[64044.223424] CPU2: Core temperature above threshold, cpu clock throttled (total events = 711979)
[64044.223425] CPU1: Package temperature above threshold, cpu clock throttled (total events = 783486)
[64044.223426] CPU3: Package temperature above threshold, cpu clock throttled (total events = 783460)
[64044.223428] CPU2: Package temperature above threshold, cpu clock throttled (total events = 783459)
[64044.223431] CPU0: Package temperature above threshold, cpu clock throttled (total events = 783491)
[64044.235444] CPU2: Core temperature/speed normal
[64044.235444] CPU3: Package temperature/speed normal
[64044.235445] CPU1: Package temperature/speed normal
[64044.235446] CPU0: Core temperature/speed normal
[64044.235447] CPU0: Package temperature/speed normal
[64044.235448] CPU2: Package temperature/speed normal
[68668.590466] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1447: inode #26479617: comm updatedb.mlocat: checksumming directory block 0
[68668.592440] Aborting journal on device nvme0n1p2-8.
[68668.595459] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
[68668.595466] EXT4-fs error (device nvme0n1p2): ext4_journal_check_start:61: Detected aborted journal
[68668.595469] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1447: inode #26479617: comm updatedb.mlocat: checksumming directory block 0
[68668.597557] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1447: inode #26479617: comm updatedb.mlocat: checksumming directory block 0
[68668.599520] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1447: inode #26479617: comm updatedb.mlocat: checksumming directory block 0
[68668.601525] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1447: inode #26479617: comm updatedb.mlocat: checksumming directory block 0
[68668.603489] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1447: inode #26479617: comm updatedb.mlocat: checksumming directory block 0
[68668.605403] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1447: inode #26479617: comm updatedb.mlocat: checksumming directory block 0
[68668.607379] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1447: inode #26479617: comm updatedb.mlocat: checksumming directory block 0
[68970.398177] audit: type=1400 audit(1519614768.685:44): apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=11872 comm="cupsd" capability=12  capname="net_admin"
[70578.929656] CPU2: Core temperature above threshold, cpu clock throttled (total events = 712200)
[70578.929656] CPU0: Core temperature above threshold, cpu clock throttled (total events = 712223)
[70578.929659] CPU3: Package temperature above threshold, cpu clock throttled (total events = 783686)
[70578.929659] CPU1: Package temperature above threshold, cpu clock throttled (total events = 783712)
[70578.929662] CPU0: Package temperature above threshold, cpu clock throttled (total events = 783717)
[70578.929666] CPU2: Package temperature above threshold, cpu clock throttled (total events = 783685)
[70578.930558] CPU2: Core temperature/speed normal
[70578.930559] CPU0: Core temperature/speed normal
[70578.930560] CPU1: Package temperature/speed normal
[70578.930560] CPU3: Package temperature/speed normal
[70578.930561] CPU0: Package temperature/speed normal
[70578.930561] CPU2: Package temperature/speed normal
[70630.805040] virbr0: port 2(vnet0) entered disabled state
[70630.806745] device vnet0 left promiscuous mode
[70630.806752] virbr0: port 2(vnet0) entered disabled state
[70631.364205] audit: type=1400 audit(1519616429.742:45): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvirt-07c4040c-b1a8-4849-9f15-dcf01356c56b" pid=12422 comm="apparmor_parser"

What's the relation between CPU temperature and the SSD going read-only? I'm not going to compile Qt again; it generates too much heat (and the laptop was on a flat surface).

What do you think?
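
One rough way to check that relation from the log itself is to measure how long before the first EXT4 error the last throttling event occurred. The sketch below is only an illustration: the function names are made up for this example, and the sample uses four lines copied from the dmesg above.

```python
import re

# Matches "[12345.678901] message" as printed by dmesg.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_dmesg(text):
    """Return (timestamp_seconds, message) pairs for timestamped lines."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            events.append((float(m.group(1)), m.group(2)))
    return events


def gap_before_first_ext4_error(events):
    """Seconds from the last throttle event to the first EXT4-fs error, or None."""
    first_err = next((t for t, msg in events if msg.startswith("EXT4-fs error")), None)
    if first_err is None:
        return None
    throttles = [t for t, msg in events if "above threshold" in msg and t < first_err]
    if not throttles:
        return None
    return first_err - max(throttles)


SAMPLE = """\
[64044.223431] CPU0: Package temperature above threshold, cpu clock throttled (total events = 783491)
[64044.235448] CPU2: Package temperature/speed normal
[68668.590466] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1447: inode #26479617: comm updatedb.mlocat: checksumming directory block 0
[68668.595459] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
"""

if __name__ == "__main__":
    gap = gap_before_first_ext4_error(parse_dmesg(SAMPLE))
    print(f"last throttle event was {gap:.0f} s before the first EXT4 error")
```

On this excerpt the gap is about 4624 seconds (roughly 77 minutes), so the filesystem error did not immediately follow a throttling event. That alone does not rule out a thermal cause.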

Lucas Zanella (lucaszanella) wrote on 2018-02-26:

#76

Here's a screenshot:

https://imgur.com/a/ZEBAP

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-26:

#77

Do you have full dmesg in comment #75? Do you see any NVMe error?
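
To make that check concrete, here is a minimal sketch for separating NVMe driver or block-layer messages from EXT4 messages in a saved dmesg. The heuristic is an assumption for illustration (driver lines start with "nvme ", block-layer failures mention "I/O error"), and the helper name is hypothetical. The excerpt is taken from comment #75.

```python
import re

# Matches "[12345.678901] message" as printed by dmesg.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def nvme_driver_lines(dmesg_text):
    """Return messages that look like NVMe driver or block-layer errors.

    Assumed heuristic: driver messages begin with "nvme " and block-layer
    failures contain "I/O error". EXT4 lines are skipped even though they
    mention the partition name (nvme0n1p2).
    """
    hits = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        msg = m.group(2)
        if msg.startswith("EXT4-fs"):
            continue
        if msg.startswith("nvme ") or "I/O error" in msg:
            hits.append(msg)
    return hits


EXCERPT = """\
[68668.590466] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1447: inode #26479617: comm updatedb.mlocat: checksumming directory block 0
[68668.592440] Aborting journal on device nvme0n1p2-8.
[68668.595459] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
"""

if __name__ == "__main__":
    print(nvme_driver_lines(EXCERPT))
```

The excerpt yields an empty list: every line comes from the filesystem or journal layer, which is why the full dmesg is needed to see whether the driver reported anything earlier.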

Lucas Zanella (lucaszanella) wrote on 2018-02-26:

#78

Nope, sorry, I rebooted after copying the dmesg. I thought what I copied was enough, since it includes [68668.595459] EXT4-fs (nvme0n1p2): Remounting filesystem read-only, which is where the error started. I'll cat the entire file next time, but I'm afraid it will only happen if I push my CPU as hard as I did (which is actually good, because the NVMe took a very long time to go read-only; that's very good progress).

Meanwhile, I think the read-only errors inside the VMs average about 0.8 per day. I always test the main machine when these errors happen and it's always fine. The only time it went wrong was the one I reported.

Lucas Zanella (lucaszanella) wrote on 2018-02-26:

#79

Just as a reminder, inside the VM the error looks like this:

[26547.754916] EXT4-fs error (device sda1): htree_dirblock_to_tree:976: inode #31777: comm gvfsd-trash: Directory block failed checksum
[26547.756301] Aborting journal on device sda1-8.
[26547.757724] EXT4-fs (sda1): Remounting filesystem read-only
[26547.762207] EXT4-fs error (device sda1): ext4_journal_check_start:60: Detected aborted journal
[26631.771204] EXT4-fs error (device sda1): htree_dirblock_to_tree:976: inode #302034: comm gvfsd-trash: Directory block failed checksum

while on the host there's no problem at all.

Kai-Heng Feng (kaihengfeng) wrote on 2018-02-26:

#80

I don't think the EXT4 issue inside VM is the same as NVMe one.

If you no longer have the issue on host machine, then the fix works.

Lucas Zanella (lucaszanella) wrote on 2018-02-26:

#81

Yes, the fix totally works. The only time I had a real NVMe error on the main machine was the one I reported. I don't know why, but it looks like the VM went terribly wrong. These VMs are just plain Ubuntu installs with Docker, Visual Studio Code and git. Nothing fancy, so I don't know why I keep getting ext4 problems.

Lucas Zanella (lucaszanella) wrote on 2018-02-28:

#82

Well, the error happened again, and I wasn't even running any VMs.

Maybe there's a rare case in which the fix from your diff doesn't get applied?

:(

Kai-Heng Feng (kaihengfeng) wrote on 2018-03-01:

#83

Please change the line in the patch from
"return NVME_QUIRK_PCI_RESET_RESUME;"
to
"return (NVME_QUIRK_PCI_RESET_RESUME | NVME_QUIRK_NO_DEEPEST_PS);"

And see if this still happens.
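
For anyone unsure what that one-line change does: the quirks are bit flags, so OR-ing them returns a value with both bits set, and each quirk can still be tested independently. A small sketch of the idea, written in Python rather than kernel C. The bit positions here are placeholders and not the kernel's real values, and the helper names are hypothetical.

```python
from enum import IntFlag


class Quirk(IntFlag):
    # Placeholder bit positions for illustration only; the kernel
    # defines its own values for these flags.
    PCI_RESET_RESUME = 1 << 0
    NO_DEEPEST_PS = 1 << 1


def quirks_before():
    """What the original patch line returns: a single quirk."""
    return Quirk.PCI_RESET_RESUME


def quirks_after():
    """What the modified line returns: both quirks combined with bitwise OR."""
    return Quirk.PCI_RESET_RESUME | Quirk.NO_DEEPEST_PS


def has(quirks, flag):
    """True if the given flag bit is set in the quirk mask."""
    return (quirks & flag) == flag


if __name__ == "__main__":
    print(has(quirks_before(), Quirk.NO_DEEPEST_PS))
    print(has(quirks_after(), Quirk.NO_DEEPEST_PS))
```

With the change, the reset-on-resume behaviour stays enabled and the "no deepest power state" quirk is applied on top of it.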

Lucas Zanella (lucaszanella) wrote on 2018-03-01:

#84

Ok, I'm compiling it now on my other PC. Meanwhile, the error happened twice in the same day. That's odd; it usually took at least 2 days to manifest again. Could it be that something changed on my PC?

Anyways, here's the error:

[ 4609.325351] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1447: inode #26610978: comm updatedb.mlocat: checksumming directory block 0
[ 4609.327443] Aborting journal on device nvme0n1p2-8.
[ 4609.329533] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
[ 4609.357281] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1447: inode #26739117: comm updatedb.mlocat: checksumming directory block 0
[ 4609.627350] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1447: inode #5901277: comm updatedb.mlocat: checksumming directory block 0
[ 4795.596378] perf: interrupt took too long (2563 > 2500), lowering kernel.perf_event_max_sample_rate to 78000
[ 4911.346846] audit: type=1400 audit(1519876882.781:29): apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=4149 comm="cupsd" capability=12 capname="net_admin"

I'll test the newly compiled kernel in a few hours.

Lucas Zanella (lucaszanella) wrote on 2018-03-03:

#85

After 2 days with the new kernel, it happened again. Seems like around every 2 days it happens. Maybe some rare nvme write that you didn't cover in the quirks?

Kai-Heng Feng (kaihengfeng) wrote on 2018-03-05:

#86

Again, can you attach full dmesg?
Because the message is not about NVMe, but EXT4.

Please fsck the rootfs before any further testing.

Lucas Zanella (lucaszanella) wrote on 2018-03-05:

#87

The next time it happens I will post the full dmesg. But even my first message (#1):

Jan 29 01:07:39 lz kernel: [62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0

cites EXT4 errors. It's always been like that; nothing changed.

Do I need to fsck the rootfs now, or when the error happens again? And how should I do it?

Kai-Heng Feng (kaihengfeng) wrote on 2018-03-05:

#88

Generally I boot up a live system and run fsck on the block device. A quick Google search shows there are several ways to achieve the same thing.
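
As a sketch of that procedure, the helper below refuses to build the fsck command if the target device is still mounted, which is the reason for using a live system in the first place. It assumes the root partition is ext4 on /dev/nvme0n1p2 (as the logs in this report show) and uses e2fsck with -f to force a full check. The function names and the sample mount table are hypothetical, and the script only prints the command instead of running it.

```python
def mounted_devices(mounts_text):
    """Return the set of /dev/... device paths found in /proc/mounts-style text."""
    devices = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("/dev/"):
            devices.add(fields[0])
    return devices


def fsck_command(device, mounts_text):
    """Build an e2fsck command line, refusing if the device is mounted."""
    if device in mounted_devices(mounts_text):
        raise RuntimeError(f"{device} is mounted; boot a live system first")
    # -f forces a full check even if the filesystem is marked clean.
    return ["e2fsck", "-f", device]


# Hypothetical mount table from a live session; on a real system use
# open("/proc/mounts").read() instead.
LIVE_MOUNTS = """\
/dev/sdb1 /cdrom iso9660 ro,noatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""

if __name__ == "__main__":
    print(" ".join(fsck_command("/dev/nvme0n1p2", LIVE_MOUNTS)))
```

The printed command would then be run as root from the live session.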

Kai-Heng Feng (kaihengfeng) wrote on 2018-03-09:

#89

So does the issue still happen after fsck?

Does quirk NVME_QUIRK_PCI_RESET_RESUME alone work for you?

Lucas Zanella (lucaszanella) wrote on 2018-03-10:

#90

Sorry, I haven't tested it yet. I had to travel and use the computer, so I ran fsck on nvme0n1p2 only, to get it working.

I thought fsck /dev/nvme0n1p2 was the same as fscking the rootfs. I do it every time the error happens. I didn't understand exactly what you meant.

Also, I think NVME_QUIRK_NO_DEEPEST_PS had no effect when added. However, I'd first try to make things work with it, and if everything goes OK I'd take it off to see if it continues to work.

If I need to do it before the error happens, I can just run my live ubuntu and do it.

Thank you for your help.

Lucas Zanella (lucaszanella) wrote on 2018-03-17:

#91

Ok so I didn't know exactly what to do.

I was using my machine and, even though I hadn't gotten any errors, I rebooted into an Ubuntu live image and ran fsck on /dev/nvme0n1p1 and /dev/nvme0n1p2. No errors showed up.

Then I continued using my machine and the error appeared. I rebooted into the live system and ran

fsck /dev/nvme0n1p2 and fsck /dev/nvme0n1p1

here are the outputs:

https://pastebin.com/jpz5SwrR

https://pastebin.com/xNMQPuVi

(on nvme0n1p1 I got a "dirty bit" message, which I don't understand, and on nvme0n1p2 the output looks the same as when I ran fsck from the SSD)

The error persists.

Kai-Heng Feng (kaihengfeng) wrote on 2018-03-18:

#92

Seems to me it's a bug in EXT4 instead of NVMe.

So seems like NVME_QUIRK_PCI_RESET_RESUME is not needed?

Lucas Zanella (lucaszanella) wrote on 2018-03-18:

#93

Without NVME_QUIRK_PCI_RESET_RESUME the bug happens every 2 hours. With it, the bug happens every 2 days or more (I think there was a time I ran 5 to 8 days without an error).

Lucas Zanella (lucaszanella) wrote on 2018-03-19:

#94

The error just happened again inside the VM, and upon VM reboot it said that something was trying to access outside the disk hd0 or hd1, I don't remember which. Then I noticed that the host machine had the error too.

This already happened before, and I mentioned it in comment #75.

Somehow the virtual machine errors and the host machine errors are related. However, the error happens even without virtual machine usage.

Kai-Heng Feng (kaihengfeng) wrote on 2018-03-19:

#95

Do you suspend/resume the system during your usage?

Lucas Zanella (lucaszanella) wrote on 2018-03-19:

#96

I do it a lot. However, the errors don't happen right after waking from suspend. They take some time.

Before the NVMe quirk I tried to see whether suspend/resume influenced the error. I tried booting the PC and leaving it on until the error occurred, which showed that suspend wasn't causing it.

However, now with the NVMe quirk the computer takes a long time to show the error, so I always end up closing it. I think there was one time the error happened right after turning the PC on, though I'm not 100% sure.

Kai-Heng Feng (kaihengfeng) wrote on 2018-03-20:

#97

Have you ever seen an error message saying the NVMe device stays in D3 and refuses to change to D0?

Lucas Zanella (lucaszanella) wrote on 2018-03-20:

#98

No. Whenever I've read dmesg there was nothing like this, nor any error popup. I'll grep for D0 and D3 next time, though.

Lucas Zanella (lucaszanella) wrote on 2018-03-28:

#99

I just noticed that almost every time I run

docker rm $(docker ps -a -q)

to remove all docker containers inside my VM, the error happens. Maybe high disk usage causes this?

The errors on my computer are taking time to happen, but in the VMs they happen every day.

I'm 100% thankful for what you've done. If you know something, I can pay; these errors are making it very hard for me to work.

Kai-Heng Feng (kaihengfeng) wrote on 2018-03-29:

#100

Please try this one, I built it with NVMe queue depth = 2.

https://people.canonical.com/~khfeng/lp1746340-q-2/

Also please attach the dmesg, thanks!

Lucas Zanella (lucaszanella) wrote on 2018-03-29:

#101

Would you mind posting the diff? I'm using custom kernel modifications (not related to disk and tested without them)

Kai-Heng Feng (kaihengfeng) wrote on 2018-04-01:

#102

Ok, there's actually a kernel parameter for that, please boot with "nvme.io_queue_depth=2"
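
After rebooting, one way to confirm the parameter was actually passed is to look at the kernel command line. A minimal sketch, assuming a simple whitespace-separated command line with no quoted values; the helper name and the sample string are hypothetical.

```python
def kernel_params(cmdline):
    """Parse a kernel command line into a dict; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


# Hypothetical command line; on a running system use
# open("/proc/cmdline").read() instead.
SAMPLE_CMDLINE = (
    "BOOT_IMAGE=/boot/vmlinuz root=/dev/nvme0n1p2 ro quiet splash "
    "nvme.io_queue_depth=2"
)

if __name__ == "__main__":
    print(kernel_params(SAMPLE_CMDLINE).get("nvme.io_queue_depth"))
```

If the lookup returns None on the real /proc/cmdline, the parameter never reached the kernel, for example because the bootloader configuration was not regenerated.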

Lucas Zanella (lucaszanella) wrote on 2018-04-01:

#103

I tried this parameter and the computer got stuck at the loading screen. Had to enter recovery mode and remove the parameter to make it boot again

Kai-Heng Feng (kaihengfeng) wrote on 2018-04-02:

#104

Can you try some value like 64? PM1725 NVMe uses this value.

Lucas Zanella (lucaszanella) wrote on 2018-04-03:

#105

I've been trying it since you wrote. No problems yet on the host machine, but the virtual machine has already shown the error twice today (not related to high disk usage).

Lucas Zanella (lucaszanella) wrote on 2018-04-07:

#106

Ok, the error just happened on the main machine. It took many more days to happen.

I picked up my computer while it was asleep but with the lid open, moved the mouse, and saw a black screen with nothing on it. Then I pressed the power button and something appeared: /.../libvirt .... read only file system, and then it turned off. Libvirt was running at the time, so I think it was just an error saying that libvirt was trying to write to the disk. I don't know if it's related.

Lucas Zanella (lucaszanella) wrote on 2018-04-07:

#107

And it just happened again. It's common for it to happen again right after I reboot and fsck from a previous one. Then it calms down for some days.

Kai-Heng Feng (kaihengfeng) wrote on 2018-04-10:

#108

Do you see similar behavior under Windows?

Lucas Zanella (lucaszanella) wrote on 2018-04-10:

#109

Back when I was having the error every 2 hours, I installed Windows and used it for some days without any problems, so I guess not. I also tried an old Debian and the installation went OK (on Ubuntu it fails 80% of the time; I have to try many times until I get a good installation).

Kai-Heng Feng (kaihengfeng) wrote on 2018-04-10:

#110

So do you see the same issue with mainline v4.9 kernel [1]?

From what I understand, disabling APST lets you fully install Ubuntu/Debian, but after some usage you still have to fsck the disk?

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.9.93/

Lucas Zanella (lucaszanella) wrote on 2018-04-10:

#111

I did not try to install with APST disabled. I tried to put a custom kernel on the live CD but it wouldn't boot. My current Ubuntu was installed by trial and error until it installed without any errors. The installation process is not so important to me; I can try 5 or 7 times before getting it right. I'm mainly concerned about usage afterwards.

So this is what happened: before any quirks, I was having the error every 2 hours. After your kernel quirk, I started having the error every 2 days on main machine and on average every day on the virtual machines. After the NVME queue_depth parameter it looks like it's taking 5 days, but the virtual machines continue giving the errors 1 time per day on average.
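
Put as numbers, the host-side figures above work out as follows. This is just arithmetic on the approximate intervals reported in this thread; the labels and helper names are made up for the example.

```python
HOURS_PER_DAY = 24

# Approximate intervals between errors on the host, as reported in this thread.
INTERVAL_HOURS = {
    "no quirk": 2,
    "NVME_QUIRK_PCI_RESET_RESUME": 2 * HOURS_PER_DAY,
    "quirk + nvme.io_queue_depth": 5 * HOURS_PER_DAY,
}


def errors_per_day(interval_hours):
    """Average number of errors per day for a given interval between errors."""
    return HOURS_PER_DAY / interval_hours


def improvement(interval_hours, baseline_hours=2):
    """How many times longer the interval is than the no-quirk baseline."""
    return interval_hours / baseline_hours


if __name__ == "__main__":
    for label, hours in INTERVAL_HOURS.items():
        print(
            f"{label}: {errors_per_day(hours):.2f} errors/day, "
            f"{improvement(hours):.0f}x the baseline interval"
        )
```

That is 12 errors per day with no quirk, 0.5 per day with the quirk (a 24x longer interval), and 0.2 per day with the queue depth parameter as well (60x longer), while the VMs stay at roughly 1 per day.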

So should I try this new kernel? I suppose it already has the quirk you created. I'll install it soon, but not now, because I need to back things up first: if the error happens during the kernel installation, the whole Ubuntu install is going to get wrecked.

Kai-Heng Feng (kaihengfeng) wrote on 2018-04-10:

#112

The quirk is not included.

I asked this because it looks like you didn't need to fsck under Debian with Linux v4.9.

Lucas Zanella (lucaszanella) wrote on 2018-04-10:

#113

I didn't need to, but then I installed some updates and it broke. However, I don't think the kernel was upgraded by that update.

Kai-Heng Feng (kaihengfeng) wrote on 2018-04-12:

#114

Can you use v4.9 under Ubuntu and see if this still happens? Or does your laptop need driver support from a newer kernel?

Lucas Zanella (lucaszanella) wrote on 2018-04-24:

#115

Hi, I'm back, sorry for the delay. I'll test it again soon. I tried, but the error happened in the middle of the update and broke my Ubuntu install. I'll reinstall and try again.

Kai-Heng Feng (kaihengfeng) wrote on 2018-06-14:

#116

Lucas,

Can you attach `sudo lspci -vvnn` here? Thanks!

Lucas Zanella (lucaszanella) wrote on 2018-06-14:

#117

Hello. Thank you for your continued support! I haven't been able to test the older kernel yet, as I'm using this PC constantly and cannot afford to lose it or have it unusable for long: when the system gets corrupted I have to spend hours trying to install it again without errors.

Here's the output:

00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:5904] (rev 02)
	Subsystem: Razer USA Ltd. Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [1a58:6752]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Capabilities: [e0] Vendor Specific Information: Len=10 <?>

00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02) (prog-if 00 [VGA controller])
	Subsystem: Razer USA Ltd. HD Graphics 620 [1a58:6752]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 127
	Region 0: Memory at db000000 (64-bit, non-prefetchable) [size=16M]
	Region 2: Memory at 90000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at f000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [40] Vendor Specific Information: Len=0c <?>
	Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
	Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee00018  Data: 0000
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Process Address Space ID (PASID)
		PASIDCap: Exec- Priv-, Max PASID Width: 14
		PASIDCtl: Enable- Exec- Priv-
	Capabilities: [200 v1] Address Translation Service (ATS)
		ATSCap:	Invalidate Queue Depth: 00
		ATSCtl:	Enable-, Smallest Translation Unit: 00
	Capabilities: [300 v1] Page Request Interface (PRI)
		PRICtl: Enable- Reset-
		PRISta: RF- UPRGI- Stopped+
		Page Request Capacity: 00008000, Page Request Allocation: 00000000
	Kernel driver in use: i915
	Kernel modules: i915

00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f] (rev 21) (prog-if 30 [XHCI])
	Subsystem: Razer USA Ltd. Sunrise Point-LP USB 3.0 xHCI Controller [1a58:6752]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 122
	Region 0: Memory at dc310000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [70] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
		Address: 00000000fee00258  Data: 0000
	Kernel driver in use: xhci_hcd

00:14.2 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Thermal subsystem [8086:9d31] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP Thermal subsystem [1a58:6752]
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin C routed to IRQ 18
	Region 0: Memory at dc32d000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Kernel driver in use: intel_pch_thermal
	Kernel modules: intel_pch_thermal

00:15.0 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 [8086:9d60] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP Serial IO I2C Controller [1a58:6752]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at dc32c000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] Vendor Specific Information: Len=14 <?>
	Kernel driver in use: intel-lpss
	Kernel modules: intel_lpss_pci

00:15.1 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 [8086:9d61] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP Serial IO I2C Controller [1a58:6752]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin B routed to IRQ 17
	Region 0: Memory at dc32b000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] Vendor Specific Information: Len=14 <?>
	Kernel driver in use: intel-lpss
	Kernel modules: intel_lpss_pci

00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-LP CSME HECI #1 [8086:9d3a] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP CSME HECI [1a58:6752]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 128
	Region 0: Memory at dc32a000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [8c] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee002f8  Data: 0000
	Kernel driver in use: mei_me
	Kernel modules: mei_me

00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d12] (rev f1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin C routed to IRQ 18
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	Memory behind bridge: dc000000-dc1fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 256 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #3, Speed 8GT/s, Width x1, ASPM L1, Exit Latency L0s <1us, L1 <16us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #6, PowerLimit 10.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [90] Subsystem: Razer USA Ltd. Sunrise Point-LP PCI Express Root Port [1a58:6752]
	Capabilities: [a0] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout+ NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [140 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [220 v1] #19
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:1c.4 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 [8086:9d14] (rev f1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Bus: primary=00, secondary=02, subordinate=3a, sec-latency=0
	I/O behind bridge: 00002000-00002fff
	Memory behind bridge: c4000000-da0fffff
	Prefetchable memory behind bridge: 00000000a0000000-00000000c1ffffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #5, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s unlimited, L1 <16us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+
			Slot #8, PowerLimit 25.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
			Changed: MRL- PresDet- LinkState-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [90] Subsystem: Razer USA Ltd. Sunrise Point-LP PCI Express Root Port [1a58:6752]
	Capabilities: [a0] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [140 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [200 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [220 v1] #19
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:1d.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 [8086:9d18] (rev f1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Bus: primary=00, secondary=3b, subordinate=3b, sec-latency=0
	Memory behind bridge: dc200000-dc2fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 256 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #9, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s <1us, L1 <16us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #12, PowerLimit 25.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet+ LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [90] Subsystem: Razer USA Ltd. Sunrise Point-LP PCI Express Root Port [1a58:6752]
	Capabilities: [a0] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [140 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [200 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
			   T_CommonMode=40us LTR1.2_Threshold=163840ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [220 v1] #19
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:1e.0 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Serial IO UART Controller #0 [8086:9d27] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP Serial IO UART Controller [1a58:6752]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 20
	Region 0: Memory at dc329000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] Vendor Specific Information: Len=14 <?>
	Kernel driver in use: intel-lpss
	Kernel modules: intel_lpss_pci

00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point-LP LPC Controller [8086:9d58] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP LPC Controller [1a58:6752]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0

00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-LP PMC [8086:9d21] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP PMC [1a58:6752]
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Region 0: Memory at dc324000 (32-bit, non-prefetchable) [size=16K]
	Kernel driver in use: intel_pmc_core

00:1f.3 Audio device [0403]: Intel Corporation Device [8086:9d71] (rev 21)
	Subsystem: Razer USA Ltd. Device [1a58:6752]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 32
	Interrupt: pin A routed to IRQ 130
	Region 0: Memory at dc320000 (64-bit, non-prefetchable) [size=16K]
	Region 4: Memory at dc300000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00378  Data: 0000
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel, snd_soc_skl

00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-LP SMBus [8086:9d23] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP SMBus [1a58:6752]
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 255
	Region 0: Memory at dc328000 (64-bit, non-prefetchable) [size=256]
	Region 4: I/O ports at f040 [size=32]
	Kernel modules: i2c_i801

01:00.0 Network controller [0280]: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter [168c:003e] (rev 32)
	Subsystem: Bigfoot Networks, Inc. QCA6174 802.11ac Wireless Network Adapter [1a56:1535]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 129
	Region 0: Memory at dc000000 (64-bit, non-prefetchable) [size=2M]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable+ Count=1/8 Maskable+ 64bit-
		Address: fee00358  Data: 0000
		Masking: 000000fe  Pending: 00000000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Via message
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [148 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [168 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [178 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [180 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=50us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	Kernel driver in use: ath10k_pci
	Kernel modules: ath10k_pci

3b:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 [144d:a804] (prog-if 02 [NVM Express])
	Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 [144d:a801]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	NUMA node: 0
	Region 0: Memory at dc200000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
		Vector table: BAR=0 offset=00003000
		PBA: BAR=0 offset=00002000
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [158 v1] Power Budgeting <?>
	Capabilities: [168 v1] #19
	Capabilities: [188 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [190 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
			   T_CommonMode=0us LTR1.2_Threshold=163840ns
		L1SubCtl2: T_PwrOn=10us
	Kernel driver in use: nvme

Kai-Heng Feng (kaihengfeng) wrote on 2018-06-14:

#118

If possible, please try this kernel:
https://people.canonical.com/~khfeng/pm961-disable-aspm/

Please also attach `sudo lspci -vvnn` with this kernel, thanks!
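When comparing the requested `lspci -vvnn` output between kernels, the field to look at is the ASPM state on each device's `LnkCtl` line. The following is a minimal sketch, not part of the test kernel or of any tool mentioned in this report, that pulls that field out of saved `lspci` text. The function name `aspm_by_device` and the sample text are made up for illustration. It assumes the plain `bus:dev.fn` address format seen in the dumps in this report (no domain prefix).

```python
import re

# Device header, e.g. "3b:00.0 Non-Volatile memory controller [0108]: ..."
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")
# ASPM field of a link-control line, e.g. "LnkCtl: ASPM L1 Enabled; RCB ..."
LNKCTL_RE = re.compile(r"LnkCtl:\s+ASPM ([^;]+);")


def aspm_by_device(lspci_text):
    """Map each PCI address to the ASPM field of its LnkCtl line."""
    result = {}
    current = None
    for line in lspci_text.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            # Remember which device the following indented lines belong to.
            current = header.group(1)
            continue
        link = LNKCTL_RE.search(line)
        if link and current is not None:
            result[current] = link.group(1).strip()
    return result


# Two entries shortened from the dump in this report (illustrative only).
SAMPLE = """\
00:1d.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 [8086:9d18] (rev f1)
\t\tLnkCtl:\tASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
3b:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 [144d:a804]
\t\tLnkCtl:\tASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
"""

if __name__ == "__main__":
    for address, state in aspm_by_device(SAMPLE).items():
        print(address, state)
```

If the quirk takes effect, the entry for `3b:00.0` would be expected to read `Disabled` instead of `L1 Enabled`. That expectation is an assumption about how `lspci` reports the state and should be checked against the real output.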

Lucas Zanella (lucaszanella) wrote on 2018-06-14:

#119

Could you provide the patch as a diff? I need to build the kernel with other modifications as well.

Or better yet, is there a kernel parameter to disable ASPM at boot?

Thank you so much
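
On the question of disabling ASPM at boot: the kernel documents a `pcie_aspm=off` boot parameter, which disables ASPM globally instead of only for the SSD. Whether it avoids the corruption reported here is not established in this thread. Below is a minimal sketch of appending a parameter to a `GRUB_CMDLINE_LINUX_DEFAULT` line. The helper name `add_kernel_param` is hypothetical, and the sample line is the one quoted elsewhere in this report.

```python
import re

# Splits the line into: prefix with opening quote, parameter list, closing quote.
CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")\s*$')


def add_kernel_param(line, param):
    """Return the GRUB line with `param` appended, unless it is already present."""
    match = CMDLINE_RE.match(line)
    if match is None:
        raise ValueError("not a GRUB_CMDLINE_LINUX_DEFAULT line")
    params = match.group(2).split()
    if param not in params:
        params.append(param)
    return match.group(1) + " ".join(params) + match.group(3)


if __name__ == "__main__":
    current = ('GRUB_CMDLINE_LINUX_DEFAULT="quiet splash '
               'button.lid_init_state=open nvme.io_queue_depth=64"')
    print(add_kernel_param(current, "pcie_aspm=off"))
```

On Ubuntu the edited line belongs in `/etc/default/grub`, followed by `sudo update-grub` and a reboot. The sketch only builds the string and does not touch that file.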

Kai-Heng Feng (kaihengfeng) wrote on 2018-06-15:

#120

0001-ASPM-quirk-for-SM-PM-EVO-961.patch (3.8 KiB, text/plain)

For the Bionic kernel.

Ubuntu Foundations Team Bug Bot (crichton) on 2018-06-15

tags:	added: patch

Lucas Zanella (lucaszanella) wrote on 2018-06-16:

#121

Should I also apply your other quirk, the one that made the problem occur less often?

https://lkml.org/lkml/2018/2/15/347

I'm using kernel 4.15-23.

Kai-Heng Feng (kaihengfeng) wrote on 2018-06-16:

#122

No, just use the patch in #120.

Lucas Zanella (lucaszanella) wrote on 2018-06-18:

#123

I just compiled the kernel, and the error appeared only minutes after the first boot.

A reminder: my kernel parameters are still like this:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash button.lid_init_state=open nvme.io_queue_depth=64"

Here's the output you wanted:

00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:5904] (rev 02)
	Subsystem: Razer USA Ltd. Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [1a58:6752]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
	Latency: 0
	Capabilities: [e0] Vendor Specific Information: Len=10 <?>

00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02) (prog-if 00 [VGA controller])
	Subsystem: Razer USA Ltd. HD Graphics 620 [1a58:6752]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 124
	Region 0: Memory at db000000 (64-bit, non-prefetchable) [size=16M]
	Region 2: Memory at 90000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at f000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [40] Vendor Specific Information: Len=0c <?>
	Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
	Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee00018  Data: 0000
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Process Address Space ID (PASID)
		PASIDCap: Exec- Priv-, Max PASID Width: 14
		PASIDCtl: Enable- Exec- Priv-
	Capabilities: [200 v1] Address Translation Service (ATS)
		ATSCap:	Invalidate Queue Depth: 00
		ATSCtl:	Enable-, Smallest Translation Unit: 00
	Capabilities: [300 v1] Page Request Interface (PRI)
		PRICtl: Enable- Reset-
		PRISta: RF- UPRGI- Stopped+
		Page Request Capacity: 00008000, Page Request Allocation: 00000000
	Kernel driver in use: i915
	Kernel modules: i915

00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f] (rev 21) (prog-if 30 [XHCI])
	Subsystem: Razer USA Ltd. Sunrise Point-LP USB 3.0 xHCI Controller [1a58:6752]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 122
	Region 0: Memory at dc310000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [70] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
		Address: 00000000fee00258  Data: 0000
	Kernel driver in use: xhci_hcd

00:14.2 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Thermal subsystem [8086:9d31] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP Thermal subsystem [1a58:6752]
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin C routed to IRQ 18
	Region 0: Memory at dc32d000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Kernel driver in use: intel_pch_thermal
	Kernel modules: intel_pch_thermal

00:15.0 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 [8086:9d60] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP Serial IO I2C Controller [1a58:6752]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at dc32c000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] Vendor Specific Information: Len=14 <?>
	Kernel driver in use: intel-lpss
	Kernel modules: intel_lpss_pci

00:15.1 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 [8086:9d61] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP Serial IO I2C Controller [1a58:6752]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin B routed to IRQ 17
	Region 0: Memory at dc32b000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] Vendor Specific Information: Len=14 <?>
	Kernel driver in use: intel-lpss
	Kernel modules: intel_lpss_pci

00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-LP CSME HECI #1 [8086:9d3a] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP CSME HECI [1a58:6752]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 128
	Region 0: Memory at dc32a000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [8c] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00318  Data: 0000
	Kernel driver in use: mei_me
	Kernel modules: mei_me

00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d12] (rev f1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin C routed to IRQ 18
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	Memory behind bridge: dc000000-dc1fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 256 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #3, Speed 8GT/s, Width x1, ASPM L1, Exit Latency L0s <1us, L1 <16us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #6, PowerLimit 10.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [90] Subsystem: Razer USA Ltd. Sunrise Point-LP PCI Express Root Port [1a58:6752]
	Capabilities: [a0] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout+ NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [140 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [220 v1] #19
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:1c.4 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 [8086:9d14] (rev f1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Bus: primary=00, secondary=02, subordinate=3a, sec-latency=0
	I/O behind bridge: 00002000-00002fff
	Memory behind bridge: c4000000-da0fffff
	Prefetchable memory behind bridge: 00000000a0000000-00000000c1ffffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #5, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s unlimited, L1 <16us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+
			Slot #8, PowerLimit 25.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
			Changed: MRL- PresDet- LinkState-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [90] Subsystem: Razer USA Ltd. Sunrise Point-LP PCI Express Root Port [1a58:6752]
	Capabilities: [a0] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [140 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [200 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [220 v1] #19
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:1d.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 [8086:9d18] (rev f1) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Bus: primary=00, secondary=3b, subordinate=3b, sec-latency=0
	Memory behind bridge: dc200000-dc2fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 256 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #9, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s <1us, L1 <16us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #12, PowerLimit 25.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [90] Subsystem: Razer USA Ltd. Sunrise Point-LP PCI Express Root Port [1a58:6752]
	Capabilities: [a0] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [140 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- EgressCtrl- DirectTrans-
		ACSCtl:	SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
	Capabilities: [200 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=40us LTR1.2_Threshold=163840ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [220 v1] #19
	Kernel driver in use: pcieport
	Kernel modules: shpchp

00:1e.0 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Serial IO UART Controller #0 [8086:9d27] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP Serial IO UART Controller [1a58:6752]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 20
	Region 0: Memory at dc329000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] Vendor Specific Information: Len=14 <?>
	Kernel driver in use: intel-lpss
	Kernel modules: intel_lpss_pci

00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point-LP LPC Controller [8086:9d58] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP LPC Controller [1a58:6752]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0

00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-LP PMC [8086:9d21] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP PMC [1a58:6752]
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Region 0: Memory at dc324000 (32-bit, non-prefetchable) [disabled] [size=16K]

00:1f.3 Audio device [0403]: Intel Corporation Device [8086:9d71] (rev 21)
	Subsystem: Razer USA Ltd. Device [1a58:6752]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 32
	Interrupt: pin A routed to IRQ 130
	Region 0: Memory at dc320000 (64-bit, non-prefetchable) [size=16K]
	Region 4: Memory at dc300000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00378  Data: 0000
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel, snd_soc_skl

00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-LP SMBus [8086:9d23] (rev 21)
	Subsystem: Razer USA Ltd. Sunrise Point-LP SMBus [1a58:6752]
	Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 255
	Region 0: Memory at dc328000 (64-bit, non-prefetchable) [size=256]
	Region 4: I/O ports at f040 [size=32]
	Kernel modules: i2c_i801

01:00.0 Network controller [0280]: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter [168c:003e] (rev 32)
	Subsystem: Bigfoot Networks, Inc. QCA6174 802.11ac Wireless Network Adapter [1a56:1535]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 129
	Region 0: Memory at dc000000 (64-bit, non-prefetchable) [size=2M]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable+ Count=1/8 Maskable+ 64bit-
		Address: fee00338  Data: 0000
		Masking: 000000fe  Pending: 00000000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Via message
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [148 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [168 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [178 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [180 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=50us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=0ns
		L1SubCtl2: T_PwrOn=10us
	Kernel driver in use: ath10k_pci
	Kernel modules: ath10k_pci

3b:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 [144d:a804] (prog-if 02 [NVM Express])
	Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 [144d:a801]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	NUMA node: 0
	Region 0: Memory at dc200000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/32 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr+ UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s unlimited, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
		Vector table: BAR=0 offset=00003000
		PBA: BAR=0 offset=00002000
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [158 v1] Power Budgeting <?>
	Capabilities: [168 v1] #19
	Capabilities: [188 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [190 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=163840ns
		L1SubCtl2: T_PwrOn=10us
	Kernel driver in use: nvme
	Kernel modules: nvme

Kai-Heng Feng (kaihengfeng) wrote on 2018-06-21:

#124

So the ASPM is indeed disabled:
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+

Can you try disabling the deepest power state under this kernel? I.e., use "nvme-core.default_ps_max_latency_us=1500".
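For anyone else comparing their own `lspci -vv` output against the log above, the ASPM setting of every link can be pulled out in one pass. This is a minimal Python sketch, not a tool used in this bug: the function name is made up for illustration, the sample text is abbreviated from the log above, and it assumes device addresses are printed without a PCI domain prefix.

```python
import re

def aspm_states(lspci_text):
    """Map each PCI address in `lspci -vv` output to its LnkCtl ASPM setting."""
    states = {}
    device = None
    for line in lspci_text.splitlines():
        # Device headers start in column 0 with an address such as "3b:00.0".
        header = re.match(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s", line)
        if header:
            device = header.group(1)
            continue
        # Matches "LnkCtl:" only; "LnkCtl2:" has no ASPM field and is skipped.
        link = re.search(r"LnkCtl:\s+ASPM ([^;]+);", line)
        if link and device is not None:
            states[device] = link.group(1).strip()
    return states

# Abbreviated sample taken from the lspci log above.
sample = """\
00:1d.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #9
\t\tLnkCtl:\tASPM Disabled; RCB 64 bytes Disabled- CommClk+
01:00.0 Network controller [0280]: Qualcomm Atheros QCA6174
\t\tLnkCtl:\tASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
3b:00.0 Non-Volatile memory controller [0108]: Samsung NVMe SSD Controller SM961/PM961
\t\tLnkCtl:\tASPM Disabled; RCB 64 bytes Disabled- CommClk+
"""

print(aspm_states(sample))
```

In the sample, both the root port (00:1d.0) and the NVMe controller (3b:00.0) report ASPM as disabled, while the wireless adapter still has L1 enabled.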

Kai-Heng Feng (kaihengfeng) wrote on 2018-06-21:

#125

FWIW, another user says that disabling ASPM fixes this issue for him.

Lucas Zanella (lucaszanella) wrote on 2018-06-21:

#126

Should I take nvme.io_queue_depth=64 out? I haven't experienced the problem again yet, except right after the first boot with the new kernel. However, I still experience the error inside the VMs.

I'm adding nvme-core.default_ps_max_latency_us=1500 now.

Is this a Razer Blade Stealth user? It would be good to talk to him to see if he experiences the problems inside VMs, which is very annoying as I do everything in them.
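To confirm that the parameter actually reached the kernel after a reboot, the boot command line can be checked. A minimal Python sketch, assuming the command line would be read from /proc/cmdline on the live system; the function name and the sample string are illustrative only.

```python
def ps_max_latency_from_cmdline(cmdline):
    """Return the default_ps_max_latency_us value from a kernel command line, or None."""
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        # The kernel treats '-' and '_' as equivalent in parameter names,
        # so both "nvme-core." and "nvme_core." spellings are accepted.
        if sep and name.replace("-", "_") == "nvme_core.default_ps_max_latency_us":
            return int(value)
    return None

# Hypothetical command line; on a real system pass the contents of /proc/cmdline.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/nvme0n1p2 ro quiet splash nvme-core.default_ps_max_latency_us=1500"

print(ps_max_latency_from_cmdline(sample))
```

A result of `None` means the option is missing from the command line, for example because the bootloader configuration was not regenerated after editing it.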

Kai-Heng Feng (kaihengfeng) wrote on 2018-06-21:

#127

No, the user uses an XPS 9560. I think removing the io_queue_depth parameter should be safe.

Lucas Zanella (lucaszanella) wrote on 2018-07-02:

#128

I've tested the new kernel with this parameter for 10 days and the error didn't happen, which is a record. So I decided to open a VM yesterday and it ran fine for about 12 hours. Then I went to sleep, and when I picked up the computer again the error had happened both inside and outside the VM. I don't know if it was caused by the VM itself, but for those 10 days I tested, it worked great.

Kai-Heng Feng (kaihengfeng) wrote on 2018-07-03:

#129

Can you attach the error? Maybe use something else, like VirtualBox, to see if KVM is the culprit?

hariprasad (hariprasad) wrote on 2018-07-30:

#130

I can confirm. I have had the same problems since 10.11.2017. The system works for some hours and then suddenly there are serious SSD problems. I replaced the SSD four times (warranty claims) and had the whole new computer replaced under warranty once.
HW: Intel NUC7i7BNH (Intel i7), Samsung EVO 960 M.2 NVMe, OPM Crucial 16GB.
SSD formatting: Partition table GPT, Primary Partitions EXT4
SW: Ubuntu Linux 18.04

Installation UEFI, legacy has no effect.

hariprasad (hariprasad) wrote on 2018-07-30:

#131

Additionally, I can confirm that the problem is not limited to Samsung NVMe SSDs; it occurs on Intel SSDs as well. It looks like the problem is in the SSD driver.

hariprasad (hariprasad) wrote on 2018-07-31:

#132

There is a known issue which could be our problem. It looks like the workaround is to disable TRIM. https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/

Lucas Zanella (lucaszanella) wrote on 2018-07-31:

#133

Hi hariprasad. nvme-core.default_ps_max_latency_us=1500 together with disabling ASPM through the kernel patch worked for me, at least for the main machine (my virtual machines inside this main machine still give the error after some time). I've been using the patch for almost a month without incidents. I had the problem inside a virtual machine after some days, but it hasn't happened again yet (I'm using VMs, but not for as long at a stretch as when the problem last happened).

I'm going to try to disable TRIM but it's going to take days for me to test if the VMs give any errors. Should sudo rm /etc/cron.weekly/fstrim be enough?

Have you experienced problems installing Ubuntu onto your SSD? My Ubuntu installation gives a disk error in 8/10 tries. That is, I need to reinstall Ubuntu 8 times on average for the installation to finish without any problems. I didn't try to install Ubuntu with the kernel patch because it's a lot of work to create a live CD installation with a patched kernel, but maybe I'll do it some day.

I also need to try virtualbox in place of virt-manager as kaihengfeng suggested. Problem is that I need to leave things open for days to notice these errors, so it's going to take time.
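On the TRIM question above: removing the weekly cron job would at most stop periodic trimming. A filesystem mounted with the `discard` option still issues TRIM continuously, so it may be worth checking for that as well. A minimal Python sketch, assuming the input has the format of /proc/mounts; the function name and the device names in the sample are made up for illustration.

```python
def mounts_with_discard(mounts_text):
    """Return (device, mount point) pairs whose mount options include 'discard'."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        if "discard" in options.split(","):
            found.append((device, mount_point))
    return found

# Hypothetical sample; on a real system pass the contents of /proc/mounts.
sample = """\
/dev/nvme0n1p2 / ext4 rw,relatime,errors=remount-ro 0 0
/dev/nvme0n1p3 /home ext4 rw,relatime,discard 0 0
"""

print(mounts_with_discard(sample))
```

An empty list means no mounted filesystem in the input uses continuous TRIM, so only periodic trimming (if any) remains to be ruled out.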

Kai-Heng Feng (kaihengfeng) wrote on 2018-08-01:

#134

Lucas,

I've found that the PCIe common clock may be the culprit here.
Please try the kernel [1].

[1] https://people.canonical.com/~khfeng/quirk-no-commclk/

Lucas Zanella (lucaszanella) wrote on 2018-08-02:

#135

Could you send me the patch? And should I use the ASPM patch together with it?

Another piece of information that you might find useful: I've been using the machine for 30 days without any problems on the main machine (only one incident inside the virtual machine), so today I decided to finally finish all the pending updates that I've been putting off since the problems arose. It updated more than 1000 packages, and as before it gave the Read Only error while updating the initramfs. It seems that it ALWAYS happens when updating the initramfs.

Here's the output I was able to save: https://pastebin.com/raw/PaSQwRJN

So after 30 days without any problems, it happens while doing this. It's gotta explain something, because it's very unusual, and I've always had problems with the initramfs update before.

Sam (samr28) wrote on 2018-08-06:

#136

I also have the same issue on a Razer Blade 2017 - 7500U model. My system has the exact same drive in it. I have just installed the 4.18.0-3 kernel linked above and will post here if I run into the issue again.

hariprasad (hariprasad) wrote on 2018-08-11:

#137

Hello, my Ubuntu Linux was out of order for 8-9 months, first on 17.10 and later on 18.04. I did many installations and tests, replaced the M.2 SSD four times (warranty claims accepted), and replaced the whole NUC7i7BNH (warranty claim accepted). I also logged an issue with Intel. Finally I installed Fedora 28, and the NVMe M.2 SSD is in good condition and works properly. I used the default LVM partition format. Ubuntu on LVM failed immediately during installation, when updates were applied. The problem is specific to Ubuntu. I didn't check which drivers Fedora and Ubuntu use; the Fedora kernel is '4.17.12-200.fc28.x86_64 #1 x86_64 GNU/Linux', so I cannot tell whether the cause is in the driver or in the system settings. But in the end I can say that my warranty claims were unjustified. Sadly, in that case, the easiest workaround for me is to use a different Linux distribution.

Lucas Zanella (lucaszanella) wrote on 2018-08-13:

#138

Hi hariprasad. If possible, you could try our patched kernel which disables ASPM: https://people.canonical.com/~khfeng/pm961-disable-aspm/. It worked for me, but only when I added the nvme-core.default_ps_max_latency_us=1500 kernel parameter. Maybe you can try it some day. We're still investigating the issue. When Kai-Heng sends me the patch, I can try it and see what changes.

hariprasad (hariprasad) wrote on 2018-08-28:

#139

Hello Lucas, thank you for the response. Yes, badly set kernel parameters can cause very serious problems. Additionally, I can confirm that the problem is not limited to Samsung NVMe SSDs; it occurs on Intel SSD-6 series drives as well. It looks like the problem is in the ASPM / SSD driver kernel parameters. Several errors show up at the same time. The easiest way to reproduce the initramfs error during startup/restart is to install Thunderbird and download thousands of emails from a cloud service, e.g. Google Mail, to generate traffic on the SSD. Then install and start Firefox, add video player plugins and start a video. Firefox for Linux (the last version I used was somewhere around 57-61) is unstable on Linux (generally, not only on Ubuntu), so Firefox starts crashing and asks "Would you like to restart and recover Firefox?". This stresses the SSD, and after a few restores (about 10) Thunderbird will probably start crashing too. That is the time to restart the system. Most likely Ubuntu will then fail to start and the initramfs error will appear.

Lucas Zanella (lucaszanella) wrote on 2018-08-30:

#140

It's worth saying that the ASPM patch + the 1500 kernel parameter worked for me for over a month without giving me a single error. After updating to 18.04, I now see the error every 2 or 3 days. Actually, in the middle of the update process to 18.04 it gave the error right on the initramfs update, which is where it always gives the error. This is sad; it was working perfectly except inside the VMs, and it was very stable :(

Revision history for this message

Sam (samr28) wrote on 2018-09-16:

#142

I'm also running the ASPM patch and haven't had problems for the last month or so. Any idea when this will get merged?

Revision history for this message

Janne Peltonen (janne-peltonen) wrote on 2018-10-02:

#143

Stumbled onto this bug from somewhere else, and noticed that I seem to have the same Samsung SSD drive, SM961/PM961 (same output from lspci -vvnn regarding the NVMe as Lucas posted). However, for me it has worked without any problems on stock Ubuntu 18.04 / Mint / Kubuntu installations. Perhaps it depends on the system configuration as well, instead of just the SSD? Not sure this information helps, but I thought I'd post it anyway.

Revision history for this message

Fabian (fabiangieseke) wrote on 2018-10-02:

#144

I have had the same issues with Ubuntu 18.04 and a Samsung MZ-V7E1T0 1000GB M.2 PCI Express 3.0 and the default installation (ext4): Plenty of errors, especially when upgrading/installing packages via apt.

I have reinstalled the whole system. Instead of the standard journaling file system (ext4), I have btrfs for the root mount point (/). System works perfectly now, no errors for a couple of days with plenty of software being installed.

Not sure, might be an ext4/kernel bug (?).

Revision history for this message

Janne Peltonen (janne-peltonen) wrote on 2018-10-02:

#145

To add to my previous comment, I've been running ext4 all the time.

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-10-02:

#146

Fabian, did you have any problems installing ubuntu? Mine would give disk errors about 7/10 times I tried to install. I had to try many times until no error appeared.

I'd like to try btrfs but I don't have the time to do it right now. I also had problems with apt, but when upgrading the system. It'd always give the error in the initramfs update, or something like that.

I'll try to install a fresh ubuntu 18.04 soon too, as Janne suggested.

Revision history for this message

Fabian (fabiangieseke) wrote on 2018-10-03:

#147

I have tried two things:

(1) Fresh install, Ubuntu 18.04 (about ten days ago), ext4. No errors during the installation. However, when installing stuff via apt afterwards (or upgrading), I got many errors along the lines described above (e.g., "compressed data is corrupt... unexpected end of file or stream"). This happened for, I guess, arbitrary packages. No errors for initramfs update for me ...

(2) Fresh install, Ubuntu 18.04 (about four days ago), btrfs for /. No errors at all.

Revision history for this message

Ole Christian Nilsen (oc-nilsen) wrote on 2018-10-14:

#148

I have this bug (MSI laptop, Ubuntu Studio 18.04) and it's getting quite annoying to be honest. If there's anything I can do to help remedy the situation within reasonable time (I'm about to reinstall) then let me know.

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-10-14:

#149

Hi Ole Christian. First, did you have any problems in the ubuntu installation? In mine I had to try to install several times until it installed without any disk errors.

Also, you can try this kernel https://people.canonical.com/~khfeng/pm961-disable-aspm/ with this kernel parameter "nvme-core.default_ps_max_latency_us=1500". This is what worked for me, but it's not a definitive solution, I still get the error in some situations (much more rare than before though). You can read our discussion to understand it better.

I guess someone is working on this bug for a definitive solution...

Revision history for this message

Ole Christian Nilsen (oc-nilsen) wrote on 2018-10-14:

#150

Hi Lucas! Thanks for the reply. No, I had no problems during installation. The computer just shuts down at random intervals to a black screen with all kinds of EXT4-fs errors and reports that the file system is read only. Often the disk isn't even recognized at reboot, so I have to boot into a live environment and use Gparted to fix it from there.

I do music production professionally, so if I can't get it fixed relatively easily and permanently then I'll have to look elsewhere unfortunately.

Thanks though. :)

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2018-10-14:

#151

Ok Christian, thanks for the info. You can try the kernel for now. I also read that using Ubuntu with a btrfs file system instead of ext4 solves the problem, so you could try that too.

Revision history for this message

Ole Christian Nilsen (oc-nilsen) wrote on 2018-10-14:

#152

I may try that. Are we sure it's a kernel bug though? I can't remember having this problem when I used Solus OS for a while. But I may not have used it for long enough, since I discovered it didn't support Jack2 and was pretty much unusable to me.

Revision history for this message

pleban (marek-zebrowski-gmail) wrote on 2018-10-24:

#153

I can confirm this bug with two different NVMe drives, a Samsung EVO970 and a WD Black, on the 4.18.0-10-generic and 4.15.0-20-generic kernels. The motherboard has an Intel H270 chipset.

Revision history for this message

Ole Christian Nilsen (oc-nilsen) wrote on 2018-10-24:

#154

I have the WD Black 256 GB drive.

Revision history for this message

pleban (marek-zebrowski-gmail) wrote on 2018-10-24:

#155

nvme-heavy-write-error.txt (5.4 KiB, text/plain)

The attachment contains the error from the dmesg output. For me, the reproduction steps are: write a large amount (>10G) of data to the NVMe SSD.
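
The reproduction step above (write a lot of data, then see whether it survived) could be sketched roughly as follows. This is only an illustration under assumptions: `write_and_verify` is a hypothetical helper name, the size is given in MiB (so 10240 for the ">10G" case), and the read-back may be served from the page cache rather than from the drive unless caches are dropped first.

```bash
#!/usr/bin/env bash
# Sketch only: write_and_verify is a hypothetical helper, not an existing tool.
# It writes SIZE_MB MiB of random data into DIR, checksums the stream while it
# is being written, flushes it, reads the file back and compares checksums.
write_and_verify() {
    local dir="$1" size_mb="$2"
    local file="$dir/nvme-write-test.bin"
    local sum_written sum_read

    # tee writes the stream to the file while sha256sum hashes the same bytes.
    sum_written=$(head -c "$((size_mb * 1024 * 1024))" /dev/urandom \
        | tee "$file" | sha256sum | cut -d' ' -f1)

    # Ask the kernel to flush dirty pages to the device.
    sync

    # Read the file back; note this may still come from the page cache.
    sum_read=$(sha256sum < "$file" | cut -d' ' -f1)
    rm -f "$file"

    # Exit status 0 means the checksums match.
    [ "$sum_written" = "$sum_read" ]
}

# Example (commented out; pick a directory on the NVMe drive):
# write_and_verify /path/on/nvme 10240 && echo "data intact" || echo "MISMATCH"
```

A mismatch or an I/O error during the write would be worth attaching together with the dmesg output from the same moment.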

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2018-10-25:

#156

What's the PCI ID for EVO 970 and WD Black?

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2018-10-25:

#157

If you use Samsung (144d:a804) or SK Hynix (1c5c:1285), please try the kernel in [1].

[1] https://people.canonical.com/~khfeng/lp1785715/
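
For anyone unsure which vendor:device ID their controller has, here is a small sketch that reads it from sysfs. It assumes each NVMe controller appears under /sys/class/nvme/ with a `device` link to its PCI device; `nvme_pci_ids` is a hypothetical helper name, and `lspci -nn` remains the usual way to get the same information.

```bash
#!/usr/bin/env bash
# Sketch only: nvme_pci_ids is a hypothetical helper.
# Assumption: /sys/class/nvme/nvmeN/device/{vendor,device} hold values
# such as 0x144d and 0xa804 for each NVMe controller.
nvme_pci_ids() {
    local root="${1:-/sys/class/nvme}"
    local ctrl vendor device
    for ctrl in "$root"/nvme*; do
        # Skip entries without readable PCI attributes (or an unmatched glob).
        [ -r "$ctrl/device/vendor" ] || continue
        vendor=$(cat "$ctrl/device/vendor")
        device=$(cat "$ctrl/device/device")
        # Strip the 0x prefix so the output looks like the IDs in lspci -nn.
        echo "$(basename "$ctrl") ${vendor#0x}:${device#0x}"
    done
}

# Example (commented out): nvme_pci_ids
```

The printed ID can then be compared against the ones listed above (144d:a804, 1c5c:1285).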

Revision history for this message

pleban (marek-zebrowski-gmail) wrote on 2018-10-25:

#158

My Samsung is indeed [144d:a808]. I'll check WD later on - it's not connected at this time.
I was not able to reproduce this bug using Clear Linux current kernel (4.18.16-645).

Revision history for this message

pleban (marek-zebrowski-gmail) wrote on 2018-10-25:

#159

My Samsung is indeed [144d:a808]. I'll check WD later on - it's not connected at this time.
I was not able to reproduce this bug using Clear Linux current kernel (4.18.16-645).
I checked kernel https://people.canonical.com/~khfeng/lp1785715/ with no nvme-core.default_ps_max_latency_us= settings and I was not able to reproduce the issue with my "copy lots of data" scenario that triggered the bug every time yesterday.
So it looks like success! I'll keep using that kernel for now and report if any problems arise.

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2018-10-25:

#160

The kernel doesn't do anything special for 144d:a808, it's for 144d:a804.

Revision history for this message

pleban (marek-zebrowski-gmail) wrote on 2018-10-25:

#161

Then I'm puzzled. I'll retest later with WD.

Revision history for this message

Janne Peltonen (janne-peltonen) wrote on 2018-10-25:

#162

lspci -vvnn output (3.3 KiB, text/plain)

Here is the output of lspci -vvnn on my computer. It's from the 256GB version of the samsung NVMe.
On my systems I've never had any corruption problems, even moving large (60GB+) VM files and installing OS on ext4 multiple times. Currently running stock LM 19. Hope this helps.

Revision history for this message

Ole Christian Nilsen (oc-nilsen) wrote on 2018-10-25:

#163

lshw output:

*-storage
                description: Non-Volatile memory controller
                product: Sandisk Corp
                vendor: Sandisk Corp
                physical id: 0
                bus info: pci@0000:04:00.0
                version: 00
                width: 64 bits
                clock: 33MHz
                capabilities: storage pm pciexpress msix nvm_express bus_master cap_list
                configuration: driver=nvme latency=0
                resources: irq:16 memory:df100000-df103fff

lspci output:

04:00.0 Non-Volatile memory controller: Sandisk Corp WD Black NVMe SSD

Revision history for this message

Ole Christian Nilsen (oc-nilsen) wrote on 2018-10-25:

#164

lspci -vvnn:

04:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Black NVMe SSD [15b7:5001] (prog-if 02 [NVM Express])
        Subsystem: Marvell Technology Group Ltd. WD Black NVMe SSD [1b4b:1093]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 16
        NUMA node: 0
        Region 0: Memory at df100000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: nvme
        Kernel modules: nvme

Revision history for this message

Ole Christian Nilsen (oc-nilsen) wrote on 2018-10-25:

#165

I should probably mention that while it is installed in my laptop it is not currently being used as I had to revert to using an ordinary HDD.

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2019-02-23:

#166

Any news on this problem? I'm still having it.

Revision history for this message

Richard Grieves (trickydickie) wrote on 2019-02-25:

#167

lspci.txt (26.7 KiB, text/plain)

I too am having an SSD corruption issue with Ubuntu 18.04, with exactly the same symptoms. I have a Kingston 480 GB SSD, not NVMe, connected over SATA. My PC is a desktop; I have attached the output of lspci -vvnn. I have to do a manual fsck every 1.5 weeks or so. When I am using my PC, it will freeze up occasionally for about 15 seconds with very high SSD I/O usage. I have attached an iotop log which recorded a freeze at around 18:03:22 (the log records every 1 second, and you will see there is a gap between a recording at 18:03:22 and 18:03:35 which indicates the freeze, followed by 90%+ I/O). I have included my SSD SMART info as well as my current lsblk output below:

=== START OF INFORMATION SECTION ===
Device Model:     KINGSTON SA400S37480G
Serial Number:    50026B76825B4FA0
LU WWN Device Id: 5 0026b7 6825b4fa0
Firmware Version: SBFKB1C2
User Capacity:    480,103,981,056 bytes [480 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Feb 25 17:57:34 2019 SAST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

lsblk::

NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0    7:0    0   3.7M  1 loop /snap/gnome-system-monitor/57
loop1    7:1    0    13M  1 loop /snap/gnome-characters/103
loop2    7:2    0    91M  1 loop /snap/core/6350
loop3    7:3    0   3.7M  1 loop /snap/gnome-system-monitor/51
loop4    7:4    0   2.3M  1 loop /snap/gnome-calculator/180
loop5    7:5    0 140.7M  1 loop /snap/gnome-3-26-1604/78
loop6    7:6    0 270.5M  1 loop /snap/pycharm-community/112
loop7    7:7    0  86.9M  1 loop /snap/core/4917
loop8    7:8    0    91M  1 loop /snap/core/6405
loop9    7:9    0  14.5M  1 loop /snap/gnome-logs/45
loop10   7:10   0 140.7M  1 loop /snap/gnome-3-26-1604/74
loop11   7:11   0    13M  1 loop /snap/gnome-characters/139
loop12   7:12   0  14.5M  1 loop /snap/gnome-logs/37
loop13   7:13   0   2.3M  1 loop /snap/gnome-calculator/260
loop14   7:14   0  34.7M  1 loop /snap/gtk-common-themes/319
loop15   7:15   0  34.6M  1 loop /snap/gtk-common-themes/818
loop16   7:16   0 140.9M  1 loop /snap/gnome-3-26-1604/70
loop17   7:17   0  34.8M  1 loop /snap/gtk-common-themes/1122
sda      8:0    0 447.1G  0 disk 
└─sda1   8:1    0 447.1G  0 part /

Revision history for this message

Richard Grieves (trickydickie) wrote on 2019-02-25:

#168

iotop.log (5.1 KiB, text/plain)

Here is the iotop log I mentioned above (attached)
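
Spotting such a gap by eye gets tedious in a long log, so here is a rough sketch that flags it automatically. It assumes every sample line starts with an HH:MM:SS timestamp (as iotop prints in batch mode with timestamps enabled) and that the log does not cross midnight; `find_log_gaps` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: find_log_gaps is a hypothetical helper.
# Reads a log on stdin and reports consecutive timestamps that are more than
# THRESHOLD seconds apart. Lines without a leading HH:MM:SS are ignored.
find_log_gaps() {
    local threshold="${1:-5}"
    awk -v max="$threshold" '
        match($0, /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/) {
            ts = substr($0, 1, 8)
            split(ts, p, ":")
            now = p[1] * 3600 + p[2] * 60 + p[3]
            # Compare with the previous timestamped line, if there was one.
            if (seen && now - prev > max)
                printf "gap of %d s between %s and %s\n", now - prev, prev_ts, ts
            prev = now; prev_ts = ts; seen = 1
        }'
}

# Example (commented out): find_log_gaps 5 < iotop.log
```

With a one-second sampling interval, a threshold of a few seconds should be enough to separate a real freeze from ordinary jitter.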

Revision history for this message

Richard Grieves (trickydickie) wrote on 2019-03-03:

#169

My issue has been resolved by upgrading the firmware of my SSD from SBFKB1C2 to SBFKB1C3.

https://askubuntu.com/questions/1107053/ubutnu-18-04-ssd-sometimes-freeze-for-seconds
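
To check which firmware a drive is running before and after such an upgrade, the version can be picked out of the SMART information section quoted earlier. A minimal sketch, assuming the text contains a `Firmware Version:` line as in that output; `firmware_version` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: firmware_version is a hypothetical helper.
# Reads smartctl-style text on stdin and prints the value of the first
# "Firmware Version:" line, if any.
firmware_version() {
    sed -n 's/^Firmware Version:[[:space:]]*//p' | head -n 1
}

# Example (commented out; needs smartmontools and root):
# sudo smartctl -i /dev/sda | firmware_version
```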

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2019-04-19:

#170

Just tried Ubuntu 19 today and the problem persists (can't even install ubuntu because it gives io error)

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2019-04-19:

#171

Hi Kai-Heng Feng, do you have any news on this problem? It'd be great to know.

Thank you so much!

Revision history for this message

Fabian (fabiangieseke) wrote on 2019-04-19:

#172

Hi,

a little update from my side: It seems that faulty memory was the reason for the data corruption in my case. I have replaced the memory module and everything seems to work fine now. I was quite surprised, though, that the memory was defective, since I tested it carefully for many hours with memtest (20+ passes without any errors). The errors only occurred when running Ubuntu ...

The memory was the only thing I have changed, so I am very sure that this was the cause ...

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2019-05-09:

#173

Lucas,
Do you still have this issue on mainline kernel?

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2019-05-10:

#174

I tried the Ubuntu 19.04 installer and I couldn't even install it because of IO errors. Does the Ubuntu 19.04 installer use the new kernel?

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2019-05-15:

#175

Hi Kai-Heng Feng, I just installed kernel 5.1.1 and the error still happens

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2019-06-27:

#176

lp1746340.patch (2.0 KiB, text/plain)

Disable ASPM. Only compile tested.

Revision history for this message

WinEunuchs2Unix (ricklee518) wrote on 2019-06-30:

#177

I've been using NVMe M.2 Samsung Pro 960 for 18 months and never had a problem.
Ubuntu 16.04.6 LTS, Kernel 4.14.114 LTS, Skylake 6700HQ, nVidia 970m
UEFI, GPT, AHCI (Intel Raid off), Secure Boot off

`/etc/fstab`:

UUID=b40b3925-70ef-447f-923e-1b05467c00e7 / ext4 errors=remount-ro 0 1
UUID=D656-F2A8 /boot/efi vfat umask=0077 0 1
UUID=b4512bc6-0ec8-4b17-9edd-88db0f031332 none swap sw 0 0

`/etc/default/grub`:
GRUB_CMDLINE_LINUX_DEFAULT="noplymouth fastboot acpiphp.disable=1 pcie_aspm=force vt.handoff=7 i915.fastboot=1 nopti nospectre_v2 nospec"

I've never had a single fsck error ever. Granted the `grub` boot option `fastboot` means `fsck` is not run on boot but I can check once FS is mounted RW with:

$ sudo fsck -n /dev/nvme0n1p6
fsck from util-linux 2.27.1
e2fsck 1.42.13 (17-May-2015)
Warning! /dev/nvme0n1p6 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
New_Ubuntu_16.04: clean, 712096/2953920 files, 5733245/11829504 blocks

Assuming your `/etc/fstab` is the same, the two important `grub` boot parameters are: `acpiphp.disable=1 pcie_aspm=force`. If memory serves me correctly, though, these were set up for suspend/resume reasons.

I hope this helps those affected by the bug a little, but more importantly that people realize the vast majority of NVMe installations work fine in Linux.

Kai-Heng Feng (kaihengfeng) on 2019-07-17

Changed in linux (Ubuntu):
assignee:	Kai-Heng Feng (kaihengfeng) → nobody

Brad Figg (brad-figg) on 2019-07-24

tags:

added: cscc

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2019-10-11:

#178

Hi, I didn't find the root of the problem, **BUT**...

Using Qubes OS I was able to run for more than 7 days without any problems! It normally would occur in the first hour of usage.

I guess Qubes's Xen drivers proxy the PCIe requests and therefore the failing NVMe/PCIe drivers aren't used. So it at least sheds some light on the problem.

Maybe someone with better understanding of how Xen works can reason about which drivers are being used and which are not and discover the root of the problem!

Revision history for this message

Juan Carlos Carvajal Bermúdez (jucajuca) wrote on 2019-11-17:

#179

I have exactly the same drive:

/dev/nvme0n1 S444NY0K600040 SAMSUNG MZVLB256HAHQ-00000 1 81.09 GB / 256.06 GB 512 B + 0 B EXD7101Q

and exactly the same problem.

I filed a bug before deactivating AER (pci=noaer)

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852479

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2019-11-17:

#180

Hi Juan, what computer are you using?

If you really want to use your computer with Linux, the only thing that solved it for me was to use Qubes OS.

Revision history for this message

trong luu (tronglx) wrote on 2019-11-27:

#181

Hi Lucas, I have the same problem. My laptop is a MateBook X Pro 2018 with an NVMe LITEON CA3-8D512. I do my work on Linux, and this is an unpleasant experience. Currently, when the issue occurs, I power the laptop off and on (once or more) and it works normally for a few hours. Do you have any other suggestions for Linux distributions?

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2019-11-27:

#182

Hi tronglx, the only way I found was to install Qubes OS

Revision history for this message

trong luu (tronglx) wrote on 2019-11-28:

#183

Thanks Lucas, have you tried Arch Linux? Qubes OS is very unfamiliar to me. I'm a developer, and the OS community is very important to me.

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2019-11-28:

#184

Hi tronglx,

I didn't try Arch but I THINK I tried Manjaro which is based on it. If I did it didn't work. I remember trying lots of linux and all of them failed.

Qubes OS works because it doesn't use linux kernels directly because it uses Xen microkernel, so somehow it excludes the bug. You can install Arch as a Qubes VM, there's a script for it, you just run it and then it generates an image that you can install. They also provide Ubuntu, Kali and others.

Since this bug is rare I don't think they'll try to fix, the guy that was helping here gave up.

Revision history for this message

trong luu (tronglx) wrote on 2019-11-28:

#185

Thanks Lucas,
It just happened on my laptop. I will try to find a solution.

Revision history for this message

trong luu (tronglx) wrote on 2019-12-07:

#186

I switched to recovery mode and ran: mount -o remount,rw /. The problem no longer appears; it seems to be fixed.

Revision history for this message

trong luu (tronglx) wrote on 2019-12-18:

#187

The error still happens.

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2019-12-18:

#188

Hi trong luu, for me the error happened every day, which is why I ended up using Qubes. It's the only way that I could find except for Windows

You can try older kernels, but that didn't work for me. Remember that downloading older Ubuntu releases will still give you a recent kernel; you have to downgrade it yourself. However, neither Ubuntu nor Debian kernels fixed the problem for me.

Revision history for this message

trong luu (tronglx) wrote on 2019-12-18:

#189

I think switching to another SSD type is the last option, but I really want to find the root cause of the problem. As I understand it, the system boots with Opts: errors=remount-ro. Then something goes wrong, and the system switches to read-only mode to protect the file system. Have you checked the system log for anything abnormal? NVMe is becoming more and more popular, so this is a big problem for Linux users.
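
Whether the root file system has already flipped to read-only can be checked from the mount table. A small sketch, assuming the usual /proc/mounts layout (device, mount point, type, options, ...); `root_mount_mode` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: root_mount_mode is a hypothetical helper.
# Prints "ro", "rw" or "unknown" for the file system mounted on /.
# The options are split on commas so that "errors=remount-ro" is not
# mistaken for the "ro" flag itself.
root_mount_mode() {
    local mounts="${1:-/proc/mounts}"
    awk '
        $2 == "/" { opts = $4 }          # the last entry for / wins
        END {
            if (opts == "") { print "unknown"; exit }
            n = split(opts, a, ",")
            for (i = 1; i <= n; i++)
                if (a[i] == "ro") { print "ro"; exit }
            print "rw"
        }' "$mounts"
}

# Example (commented out): root_mount_mode
```

If this prints "ro" on a running system, the kernel log from just before the remount (for example via dmesg) is the place to look for the triggering error.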

Revision history for this message

Juan Carlos Carvajal Bermúdez (jucajuca) wrote on 2019-12-18:

#190

For anyone struggling with this hideous bug, try the following:

add "nvme_core.default_ps_max_latency_us=250" in /etc/default/grub, for example:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pcie_aspm=off nvme_core.default_ps_max_latency_us=250"

then run "update-grub"
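
The edit above can also be scripted. This is only a sketch under assumptions: `add_grub_param` is a hypothetical helper name, the file has a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line, GNU sed is available, and the parameter contains no `|`, `&` or backslash characters. Back up the file before trying it on a real system.

```bash
#!/usr/bin/env bash
# Sketch only: add_grub_param is a hypothetical helper, not part of GRUB.
# Appends PARAM to the GRUB_CMDLINE_LINUX_DEFAULT line of FILE unless that
# line already contains it (plain substring match).
add_grub_param() {
    local file="$1" param="$2"

    # Already present on the command line? Then leave the file untouched.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi

    # Insert the parameter just before the closing double quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file"
}

# Example (commented out; needs root):
# sudo cp /etc/default/grub /etc/default/grub.bak
# add_grub_param /etc/default/grub "nvme_core.default_ps_max_latency_us=250"
# sudo update-grub
```

The new command line only takes effect after update-grub and a reboot.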

My laptop has been running smoothly for a week now. (/dev/nvme0n1 S444NY0K600040 SAMSUNG MZVLB256HAHQ-00000 1 81.09 GB / 256.06 GB 512 B + 0 B EXD7101Q)

see more infos here: https://wiki.archlinux.org/index.php/Solid_state_drive/NVMe

@kernel developers, wouldn't it be great to detect such disks and automatically lower nvme_core.default_ps_max_latency_us? This bug is really hard to detect and solve because there are NO logs whatsoever. The disk goes read-only, you know?

Revision history for this message

trong luu (tronglx) wrote on 2019-12-18:

#191

Hi Juan, I have tried your suggestion many times but the problem still happens. I also tried nvme_core.default_ps_max_latency_us=0, but no luck. I'm not sure APST is actually disabled. How can I check the APST status after the system has booted?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/core.c#n2282

Revision history for this message

Juan Carlos Carvajal Bermúdez (jucajuca) wrote on 2019-12-18:

#192

Hi
try:

cat /sys/module/nvme_core/parameters/default_ps_max_latency_us

sudo nvme get-feature -f 0x0c -H /dev/nvme0

Please read the link provided carefully; the info is there.
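
The first of the two checks can be wrapped in a small sketch that also says what the value means. It assumes, as the linked wiki page describes, that a limit of 0 disables APST; `apst_limit_summary` is a hypothetical helper name, and the controller's own view still has to be confirmed with the `nvme get-feature` command above.

```bash
#!/usr/bin/env bash
# Sketch only: apst_limit_summary is a hypothetical helper.
# Reads the nvme_core latency limit and prints a one-line interpretation.
# Assumption: a value of 0 means APST is disabled for new devices.
apst_limit_summary() {
    local file="${1:-/sys/module/nvme_core/parameters/default_ps_max_latency_us}"
    local value

    if [ ! -r "$file" ]; then
        echo "cannot read $file"
        return 1
    fi

    value=$(cat "$file")
    if [ "$value" = "0" ]; then
        echo "latency limit 0 us: APST should be disabled"
    else
        echo "latency limit ${value} us: APST may use states within this limit"
    fi
}

# Example (commented out): apst_limit_summary
```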

Revision history for this message

trong luu (tronglx) wrote on 2019-12-18:

#193

After running cat /sys/module/nvme_core/parameters/default_ps_max_latency_us, the output is 0.
sudo nvme get-feature -f 0x0c -H /dev/nvme0n1p2 
get-feature:0xc (Autonomous Power State Transition), Current value:00000000
	Autonomous Power State Transition Enable (APSTE): Disabled
	Auto PST Entries	.................
	Entry[ 0]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 1]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 2]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 3]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 4]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 5]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 6]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 7]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 8]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[ 9]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[10]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[11]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[12]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[13]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[14]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[15]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[16]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[17]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[18]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[19]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[20]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[21]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[22]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[23]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[24]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[25]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[26]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[27]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[28]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[29]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[30]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
	Entry[31]   
	.................
	Idle Time Prior to Transition (ITPT): 0 ms
	Idle Transition Power State   (ITPS): 0
	.................
       0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
00f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "................"
My grub default config is:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=0"
I have already tried nvme_core.default_ps_max_latency_us=250 and several other values, but the issue still occurs.
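
One thing worth ruling out is that the parameter never reached the running kernel (for example, if the GRUB configuration was edited but not regenerated). A minimal sketch in Python, assuming the command line is taken from /proc/cmdline; the helper name `kernel_param` is hypothetical and the whitespace split ignores quoted values:

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name` on a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and sep:
            # Assumption: if the parameter is repeated, the last one is kept.
            value = val
    return value


# Sample string taken from the GRUB line quoted above.
sample = "quiet splash nvme_core.default_ps_max_latency_us=0"
print(kernel_param(sample, "nvme_core.default_ps_max_latency_us"))  # prints 0
```

On a live system the same function could be fed the contents of /proc/cmdline instead of the sample string.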

Revision history for this message

trong luu (tronglx) wrote on 2019-12-18:

#194

Hi Lucas, would it work to install Windows and run Ubuntu in VMware?

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2019-12-18:

#195

Hi trong luu, I didn't test it, but I think that it depends on the way VMware virtualizes access to the disk. There may be multiple ways, one of which will work.

Revision history for this message

trong luu (tronglx) wrote on 2019-12-19:

#196

Hi Lucas, have you tried a new SSD? I don't think this is a hardware issue: my SSD's Power Cycles count is only 807. If there is no other solution, I think I will eventually buy a new SSD. Do you know which type of SSD works properly with Linux?
Smartctl output:
sudo smartctl -t long -a /dev/nvme0n1p2 
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-5.0.0-37-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       LITEON CA3-8D512
Serial Number:                      0028104000DN
Firmware Version:                   C49640A
PCI Vendor ID:                      0x14a4
PCI Vendor Subsystem ID:            0x1b4b
IEEE OUI Identifier:                0x002303
Total NVM Capacity:                 512,110,190,592 [512 GB]
Unallocated NVM Capacity:           0
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Thu Dec 19 08:54:28 2019 +07
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x001f):   Security Format Frmw_DL NS_Mngmt *Other*
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     83 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.00W       -        -    0  0  0  0        0       0
 1 +     4.50W       -        -    1  1  1  1        5       5
 2 +     3.00W       -        -    2  2  2  2        5       5
 3 -   0.0700W       -        -    3  3  3  3     1000    5000
 4 -   0.0100W       -        -    4  4  4  4     5000   45000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -     512       0         1
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning:                   0x00
Temperature:                        47 Celsius
Available Spare:                    100%
Available Spare Threshold:          0%
Percentage Used:                    0%
Data Units Read:                    5,773,150 [2.95 TB]
Data Units Written:                 6,405,757 [3.27 TB]
Host Read Commands:                 78,674,228
Host Write Commands:                91,754,035
Controller Busy Time:               10,405
Power Cycles:                       807
Power On Hours:                     312
Unsafe Shutdowns:                   104
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               47 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
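
The Supported Power States table above can be related to nvme_core.default_ps_max_latency_us with a rough model. The sketch below assumes a simplified rule: a non-operational state (Op column "-") is a candidate for autonomous transitions when its exit latency in microseconds is at or below the threshold, and a threshold of 0 turns the feature off. The real kernel logic depends on the kernel version and on device quirks, so treat this as an illustration only; `PowerState` and `apst_candidates` are hypothetical names.

```python
from typing import List, NamedTuple


class PowerState(NamedTuple):
    index: int
    operational: bool   # "+" in the Op column
    entry_lat_us: int   # Ent_Lat column
    exit_lat_us: int    # Ex_Lat column


# Values copied from the smartctl output above (LITEON CA3-8D512).
LITEON_STATES = [
    PowerState(0, True, 0, 0),
    PowerState(1, True, 5, 5),
    PowerState(2, True, 5, 5),
    PowerState(3, False, 1000, 5000),
    PowerState(4, False, 5000, 45000),
]


def apst_candidates(states: List[PowerState], max_latency_us: int) -> List[int]:
    """Indices of non-operational states allowed under the simplified rule."""
    if max_latency_us == 0:
        return []  # assumption: 0 disables autonomous transitions entirely
    return [s.index for s in states
            if not s.operational and s.exit_lat_us <= max_latency_us]


for limit in (0, 250, 5500, 100000):
    print(limit, apst_candidates(LITEON_STATES, limit))
```

Under this model, thresholds of 0 and 250 both leave no low-power state eligible on this drive, while 5500 would still allow state 3.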

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2019-12-19:

#197

Hi trong luu. I did buy a new SSD because I thought mine was faulty. However, I bought one of the same brand (Samsung); it didn't occur to me to try another brand. Anyway, the brand-new SSD has the same problem.

In my case it is definitely not a hardware problem. With Linux the problem happens every day, sometimes more than once a day. With Windows the error never happened, and I have been running Qubes for more than 2 months without any problems. So it's not hardware; something is definitely wrong with the Linux kernel.

Revision history for this message

trong luu (tronglx) wrote on 2019-12-19:

#198

Thanks Lucas, I think I will buy another type of SSD. Do you have any suggestions?

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2019-12-19:

#199

The other 2 good brands I know are Corsair and WD Black. Don't buy Samsung; the majority of people with this problem have Samsung drives.

Revision history for this message

trong luu (tronglx) wrote on 2019-12-19:

#200

Thank you. My SSD is a LITEON CA3-8D512, not a Samsung. So should I buy another type of SSD, a non-NVMe one?

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2019-12-19:

#201

Non-NVMe SSDs are pretty slow, around 8 times slower. Stick with NVMe, and if nothing works, install Qubes.

Revision history for this message

trong luu (tronglx) wrote on 2019-12-19:

#202

Thanks, Lucas.

Revision history for this message

Craigums Carlonious (craigsidcarlson) wrote on 2020-02-13:

#203

It's 2020; is there still no solution to this problem? I'm getting this error with Ubuntu 18 LTS and 19.

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2020-02-13:

#204

Craigums Carlonious,

Is the system exactly the same?

Revision history for this message

Craigums Carlonious (craigsidcarlson) wrote on 2020-02-13:

#205

Hi, yes. I am also trying to install onto the Razer Blade Stealth, like a lot of the other people above, which has the SAMSUNG MZVLB256HAHQ-00000 NVMe SSD. I'm getting the I/O error and have tried most of the fixes mentioned above, with no luck, and I would rather continue using Windows than switch to Qubes.

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2020-02-14:

#206

Can you please attach `sudo nvme id-ctrl /dev/nvme0`?

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2020-02-14:

#207

Hi Kai-Heng Feng, please note that after I installed Qubes, I never ever had the problem again. It may be useful in the debug process, and maybe the way Xen, PCIe and Linux work together in Qubes can give a hint on what's happening. Thank you for all your help to this day.

Revision history for this message

Juan Carlos Carvajal Bermúdez (jucajuca) wrote on 2020-02-24:

#208

An update on this:

It was actually pcie_aspm=off that helped to solve the problem.

I think the problem is related to the power management of PCIe ports.

Without pcie_aspm=off I started seeing errors like the following:

- [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=7841 end=7842) time 291 us, min 1063, max 1079, scanline start 1044, end 1092

pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
pcieport 0000:00:1d.0: AER: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
pcieport 0000:00:1d.0: AER: device [8086:a330] error status/mask=00000001/00002000

I think the bug is not with the nvme controller but somewhere in ASPM. But I am not a kernel developer.
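
To see whether such AER messages are frequent or tied to one port, the log can be summarized per device. A small sketch in Python, assuming the message layout shown in the lines above (other kernel versions may format them differently); `count_aer_errors` is a hypothetical helper that takes log text such as saved dmesg output:

```python
import re
from collections import Counter

# Matches lines such as:
#   pcieport 0000:00:1d.0: AER: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
AER_RE = re.compile(
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AER: PCIe Bus Error: "
    r"severity=(?P<sev>[^,]+), type=(?P<type>[^,]+)"
)


def count_aer_errors(log_text: str) -> Counter:
    """Count AER bus errors keyed by (device, severity, error type)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = AER_RE.search(line)
        if m:
            counts[(m["dev"], m["sev"], m["type"])] += 1
    return counts


# The three log lines quoted in this comment.
sample = """\
pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
pcieport 0000:00:1d.0: AER: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
pcieport 0000:00:1d.0: AER: device [8086:a330] error status/mask=00000001/00002000
"""
print(count_aer_errors(sample))
```

Only the "PCIe Bus Error" line is counted, so each reported error contributes once even though it spans several log lines.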

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2020-02-24:

#209

Juan, which hardware are you on? Razer?

Revision history for this message

Juan Carlos Carvajal Bermúdez (jucajuca) wrote on 2020-02-25:

#210

No, I have a laptop from XMG, a German brand.

Revision history for this message

Ramon Fontes (ramonreisfontes) wrote on 2020-03-06:

#211

Hello all!

I'm experiencing the same problem with an ADATA SU800NS38. My SSD works fine with kernel 4.17.0-041700-generic, but unfortunately this is the only kernel version with which it works perfectly. Besides other 4.x kernels, I also tried 5.0 through 5.5. The disk becomes read-only during use and I need to run fsck whenever I start the system.
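
A quick way to notice the read-only remount before a write fails is to inspect the mount options. A minimal sketch in Python, assuming input in the /proc/mounts layout (device, mount point, filesystem type, options, ...); `read_only_mounts` is a hypothetical helper and the sample lines are made up for illustration:

```python
from typing import List


def read_only_mounts(mounts_text: str) -> List[str]:
    """Return mount points of /dev-backed filesystems mounted read-only."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # "ro" must be a whole option; "errors=remount-ro" alone does not count.
        if device.startswith("/dev/") and "ro" in options.split(","):
            result.append(mount_point)
    return result


# Hypothetical sample: root has been remounted read-only, /boot/efi has not.
sample = """\
/dev/nvme0n1p2 / ext4 ro,relatime,errors=remount-ro 0 0
/dev/nvme0n1p1 /boot/efi vfat rw,relatime,errors=remount-ro 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
"""
print(read_only_mounts(sample))
```

Filesystems that are read-only by design, such as loop-mounted images, would also be listed, so the result needs to be read with that in mind.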

Revision history for this message

Ramon Fontes (ramonreisfontes) wrote on 2020-03-06:

#212

BTW, pcie_aspm=off and nvme_core.default_ps_max_latency_us=5500 didn't work.

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2020-03-06:

#213

Ramon, what hardware do you have? Razer?

Sent via ProtonMail mobile

-------- Original Message --------
On Mar 6, 2020, 16:34, Ramon Fontes wrote:

> BTW, pcie_aspm=off and nvme_core.default_ps_max_latency_us=5500 didn't
> work.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1746340
>
> Title:
> Samsung SSD corruption (fsck needed)
>
> Status in linux package in Ubuntu:
> Confirmed
>
> Bug description:
> Ubuntu 4.13.0-21.24-generic 4.13.13
>
> I have a Razer Blade Stealth 2016. The first Ubuntu I installed was Ubuntu 17.04, which gave me this error after 2 weeks of usage. After that, I installed 16.04 and used it for MONTHS without any problems, until it produced the same error this week. I think it has to do with the ubuntu updates, because I did one recently and one today, just before this problem. Could be a coincidence though.
>
> I notice the error when I try to save something on disk and it says me
> that the disk is in read-only mode:
>
> lz@lz:/var/log$ touch something
> touch: cannot touch 'something': Read-only file system
>
> lz@lz:/var/log$ cat syslog
> Jan 29 01:07:39 lz kernel: [62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
>
> lz@lz:/var/log$ dmesg
> [62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
> [62984.377374] Aborting journal on device nvme0n1p2-8.
> [62984.379343] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
> [62984.379516] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
> [62984.381486] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
> [62984.383484] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
> [62984.385469] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
> [62984.387278] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
> [62984.389262] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
> [62984.391252] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
> [62984.393341] EXT4-fs error (device nvme0n1p2): ext4_find_entry:1442: inode #26607929: comm updatedb.mlocat: checksumming directory block 0
> [63285.618078] audit: type=1400 audit(1517195560.393:63): apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=22495 comm="cupsd" capability=12 capname="net_admin"
>
> Rebooting the ubuntu will give me a black terminal where I can run
> fsck /dev/nvm30n1p2 (something like that) and it fill fix a lot of
> orphaned inodes. The majority of time it boots back to the Ubuntu
> working good, but some times it boots to a broken ubuntu (no images,
> lots of things broken). I have to reinstall ubuntu then.
>
> Every time I reinstall my Ubuntu, I have to try lots of times until it
> installs without an Input/Output error. When it installs, I can use it
> for some hours without having the problem, but if I run the software
> updates, it ALWAYS crashes and enters in read-only mode, specifically
> in the part that is installing kernel updates.
>
> I noticed that Ubuntu installs updates automatically when they're for
> security reasons. Could this be the reason my Ubuntu worked for months
> without the problem, but then an update was applied and it broke?
>
> I thought that this bug was happening:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184 and tried
> different nvme_core.default_ps_max_latency_us= combinations, all them
> gave errors. I just changed to 0 and I had no error while using ubuntu
> (however I didn't test for a long time) but I still had the error
> after trying to update my ubuntu.
>
> My Samsung 512gb SSD is:
>
> SAMSUNG MZVLW512HMJP-00000, FW REV: CXY7501Q
>
> on a Razer Blade Stealth.
>
> I also asked this on ask ubuntu, without success:
> https://askubuntu.com/questions/998471/razer-blade-stealth-disk-
> corruption-fsck-needed-probably-samsung-ssd-bug-afte
>
> Please help me, as I need this computer to work on lots of things :c
> ---
> ApportVersion: 2.20.7-0ubuntu3.7
> Architecture: amd64
> AudioDevicesInUse:
> USER PID ACCESS COMMAND
> /dev/snd/controlC0: lz 1088 F.... pulseaudio
> CurrentDesktop: ubuntu:GNOME
> DistroRelease: Ubuntu 17.10
> InstallationDate: Installed on 2018-01-30 (0 days ago)
> InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20180105.1)
> MachineType: Razer Blade Stealth
> Package: linux (not installed)
> ProcFB: 0 inteldrmfb
> ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-21-generic.efi.signed root=UUID=0ca062da-7e8f-425a-88b1-1f784fb40346 ro quiet splash button.lid_init_state=open nvme_core.default_ps_max_latency_us=0
> ProcVersionSignature: Ubuntu 4.13.0-21.24-generic 4.13.13
> RelatedPackageVersions:
> linux-restricted-modules-4.13.0-21-generic N/A
> linux-backports-modules-4.13.0-21-generic N/A
> linux-firmware 1.169.1
> Tags: wayland-session artful
> Uname: Linux 4.13.0-21-generic x86_64
> UpgradeStatus: No upgrade log present (probably fresh install)
> UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
> _MarkForUpload: True
> dmi.bios.date: 01/12/2017
> dmi.bios.vendor: Razer
> dmi.bios.version: 6.00
> dmi.board.name: Razer
> dmi.board.vendor: Razer
> dmi.chassis.type: 9
> dmi.chassis.vendor: Razer
> dmi.modalias: dmi:bvnRazer:bvr6.00:bd01/12/2017:svnRazer:pnBladeStealth:pvr2.04:rvnRazer:rnRazer:rvr:cvnRazer:ct9:cvr:
> dmi.product.family: 1A586752
> dmi.product.name: Blade Stealth
> dmi.product.version: 2.04
> dmi.sys.vendor: Razer
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1746340/+subscriptions

Revision history for this message

Ramon Fontes (ramonreisfontes) wrote on 2020-03-07:

#214

I have a Dell Inspiron 14 5000 Series-5480. The strangest thing is that I bought my laptop about 1 year ago, installed Ubuntu 18.04.1 with kernel 4.15.0-65-xxx (default installation), and everything worked as expected. However, the same problem happened with any other kernel version (including 4.17.0-041700-generic).

Then, after having some problems with my system, I installed Ubuntu 18.04.4. The kernel version installed with the system was 5.3, and after observing the same problem with the disk I tried installing 4.15.0-65, which did not solve the problem (I don't remember exactly which kernel version I had the first time, i.e. what the xxx was). Finally, I found that 4.17.0-041700-generic works, and I don't know why: it didn't work with Ubuntu 18.04.1 but it works with 18.04.4. This is really strange, and I need to use more recent kernel versions because I need some features I implemented for v5.5-rc1.

[1] https://github.com/torvalds/linux/commit/b5764696ac409523414f70421c13b7e7a9309454#diff-21081ef83e1374560c2e244926168e49
[2] https://github.com/torvalds/linux/commit/7dfd8ac327301f302b03072066c66eb32578e940#diff-21081ef83e1374560c2e244926168e49

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2020-03-07:

#215

I also had this experience: it worked for a year, then I updated and it stopped working. Rolling back the kernel did not make it work again.

Sent via ProtonMail mobile

-------- Original Message --------
On Mar 7, 2020, 10:17, Ramon Fontes wrote:

> I have a Dell Inspiron 14 5000 Series-5480. The most strange thing is
> that I bought my laptop about 1 year ago and I've installed Ubuntu
> 18.04.1 with kernel 4.15.0-65-xxx (default installation) and everything
> worked as expected. However, the same problem happened with any other
> kernel version (including 4.17.0-041700-generic).
>
> Then, after having some problems with my system I've installed Ubuntu 18.04.4. The kernel version installed with the system was 5.3 and after observing the same problem with the disk I've tried to install 4.15.0-65 and the problem has not been solved (I don't remember exactly which kernel version I had in the first time (e.g. what was the xxx)). Finally, I've found that 4.17.0-041700-generic works and I don't know why. It didn't work with Ubuntu 18.04.1 and it works with 18.04.4. This is really strange and I need to use most recent kernel versions, because I need to use some features I've implemented for v5.5-rc1.
>
> [1] https://github.com/torvalds/linux/commit/b5764696ac409523414f70421c13b7e7a9309454#diff-21081ef83e1374560c2e244926168e49
> [2] https://github.com/torvalds/linux/commit/7dfd8ac327301f302b03072066c66eb32578e940#diff-21081ef83e1374560c2e244926168e49

Revision history for this message

Kai-Heng Feng (kaihengfeng) wrote on 2020-03-09:

#216

Ramon,
Please file a separate bug since it's platform specific.

Revision history for this message

Ramon Fontes (ramonreisfontes) wrote on 2020-05-03:

#217

I thought I could help in some way with more information. By the way, I've found the solution and my SSD works fine right now. You may want to take a look at https://bugzilla.kernel.org/show_bug.cgi?id=201685. Comment #294 (https://bugzilla.kernel.org/show_bug.cgi?id=201685#c294), in particular, helped me to solve the problem.

Revision history for this message

Lucas Zanella (lucaszanella) wrote on 2021-06-22:

#218

I just want to say that after 2 years I remembered I had an SSD from a different brand than Samsung, a Kingston one. I installed it in my Razer and it worked perfectly for days; I ran several SSD stress tests with no errors.

The error is definitely with Samsung AND Linux. And it's not a faulty SSD, because it happens on both of my Samsung SSDs. It does not happen on Windows or on Qubes, with any SSD.

I tested the latest Ubuntu 21.04 and the problem still happens on Samsung SSDs, right on the installation screen.

Anyway, I'm not even using this computer anymore (I bought a Dell XPS 13), but the error persists and it's either Samsung's or Linux's fault. Probably Samsung's, since other brands work OK with Linux.

Revision history for this message

Anthony Durity (anthony-durity) wrote on 2023-01-16:

#219

omg_dmesg_b0rked.txt (109.7 KiB, text/plain)

I've hit this "bug". I have a nice Clevo ODM-based laptop and luckily it has two NVMe drives, so it's not a show-stopper for me, but obviously it's a concern. The Intel one is the boot drive and the Samsung one is the data drive, in a dual-boot setup. So, two data points to note: the Intel NVMe works in both Windows and Linux, while the Samsung works in Windows but not in Linux. When I say it doesn't work in Linux, I mean that the system brings the drive up, I can mount it read-write, and everything looks good, but as soon as I try to write files to it, it craps out with nothing written:

[369.798910] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
[369.798916] nvme nvme0: Does your device have a faulty power saving mode enabled?
[369.798918] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
[369.870912] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[369.871064] nvme nvme0: Removing after probe failure status: -19
[369.890931] nvme0n1: detected capacity change from 1953525168 to 0

Output of `dmesg` attached.
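
For anyone comparing their own logs against the excerpt above, here is a minimal Python sketch that scans dmesg lines for the same two signatures: the "controller is down" reset message and the capacity dropping to 0. The function name `scan_dmesg` and the regular expressions are illustrative assumptions modelled only on the lines quoted in this comment; other kernel versions may word the messages differently. The byte size assumes the kernel reports capacity in 512-byte sectors.

```python
import re

# Patterns modelled on the log lines quoted above (an assumption, not a
# stable kernel interface; wording can differ between kernel versions).
RESET_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): controller is down; will reset: "
    r"CSTS=(?P<csts>0x[0-9a-fA-F]+), PCI_STATUS=(?P<pci>0x[0-9a-fA-F]+)"
)
CAPACITY_RE = re.compile(
    r"(?P<dev>nvme\d+n\d+): detected capacity change "
    r"from (?P<old>\d+) to (?P<new>\d+)"
)

SECTOR_BYTES = 512  # assumed unit of the capacity figures


def scan_dmesg(lines):
    """Return a list of (event, device, value) tuples found in the log lines."""
    events = []
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            # value is the controller status register as an integer
            events.append(("reset", m.group("ctrl"), int(m.group("csts"), 16)))
            continue
        m = CAPACITY_RE.search(line)
        if m and int(m.group("new")) == 0:
            # value is the size the device had before it vanished, in bytes
            events.append(
                ("vanished", m.group("dev"), int(m.group("old")) * SECTOR_BYTES)
            )
    return events


SAMPLE = [
    "[369.798910] nvme nvme0: controller is down; will reset: "
    "CSTS=0xffffffff, PCI_STATUS=0x10",
    "[369.871064] nvme nvme0: Removing after probe failure status: -19",
    "[369.890931] nvme0n1: detected capacity change from 1953525168 to 0",
]

if __name__ == "__main__":
    for event in scan_dmesg(SAMPLE):
        print(event)
```

A CSTS value of 0xffffffff means the status register read back as all ones, which usually suggests the device stopped responding on the bus rather than reporting a specific error.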

Revision history for this message

tetebueno (tetebueno) wrote on 2024-03-19 (last edit on 2024-04-15):

#220

Bump:

PC: Lenovo Legion Y520-15IKBN

SSD: Samsung SM951 M.2 PCIe SSD Drive (MZ-HPV256)

OS: Elementary OS 7.1 Horus (Ubuntu 22.04.1)

Kernel: 6.5.0-14-generic

---

lshw relevant parts:

computer
    description: Notebook
    product: 80WK (LENOVO_MT_80WK_BU_idea_FM_Lenovo Y520-15IKBN)
    vendor: LENOVO
    version: Lenovo Y520-15IKBN
    serial: PF0UM7F3
    width: 64 bits
    capabilities: smbios-3.0.0 dmi-3.0.0 smp vsyscall32
    configuration: administrator_password=disabled boot=normal chassis=notebook family=IDEAPAD frontpanel_password=disabled keyboard_password=disabled power-on_password=disabled sku=LENOVO_MT_80WK_BU_idea_FM_Lenovo Y520-15IKBN uuid=e974c0b6-54a9-11ef-8ff5-54e13d454041
(...)
              *-disk
                   description: ATA Disk
                   product: SAMSUNG MZHPV256
                   physical id: 0.0.0
                   bus info: scsi@3:0.0.0
                   logical name: /dev/sdb
                   version: 500Q
                   serial: S1X2NYAG810617
                   size: 238GiB (256GB)
                   capabilities: gpt-1.00 partitioned partitioned:gpt
                   configuration: ansiversion=5 guid=094df7bd-71f3-4587-b36f-8cb0bd3ba964 logicalsectorsize=512 sectorsize=512

---

Update: I first tried the changes in comment #190, but that didn't work; the error persisted. Then I tried adding only the pcie_aspm=off parameter (removing the nvme_core.default_ps_max_latency_us parameter) and things got better: I went without errors for about three weeks straight before the error manifested again. One thing to note is that the day of the failure was the only day on which the computer was intentionally suspended. For now I'll keep this configuration, as it seems to be the most "stable" one.
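
Since several comments in this thread switch between combinations of these two parameters, it can help to confirm which ones the running kernel actually booted with. The following is a minimal Python sketch under that assumption; the helper names `parse_cmdline` and `workaround_status` are hypothetical, and the sample string is the ProcKernelCmdLine from the original bug description.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (such as 'quiet') map to None. Only the first '='
    splits a token, so 'root=UUID=...' keeps its full value.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def workaround_status(cmdline):
    """Report which of the two workarounds discussed here are active."""
    params = parse_cmdline(cmdline)
    return {
        # ASPM disabled for all PCIe links
        "pcie_aspm_off": params.get("pcie_aspm") == "off",
        # a max latency of 0 disables NVMe autonomous power state transitions
        "nvme_apst_disabled":
            params.get("nvme_core.default_ps_max_latency_us") == "0",
    }


# Command line taken from the original bug description.
SAMPLE = (
    "BOOT_IMAGE=/boot/vmlinuz-4.13.0-21-generic.efi.signed "
    "root=UUID=0ca062da-7e8f-425a-88b1-1f784fb40346 ro quiet splash "
    "button.lid_init_state=open nvme_core.default_ps_max_latency_us=0"
)

if __name__ == "__main__":
    print(workaround_status(SAMPLE))
```

On a live system the same check can be run against the booted kernel with `workaround_status(open("/proc/cmdline").read())`.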

