Samsung SSD corruption (fsck needed)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Confirmed | High | Unassigned |
Bug Description
Ubuntu 4.13.0-
I have a Razer Blade Stealth 2016. The first Ubuntu I installed was Ubuntu 17.04, which gave me this error after 2 weeks of usage. After that, I installed 16.04 and used it for MONTHS without any problems, until it produced the same error this week. I think it has to do with the ubuntu updates, because I did one recently and one today, just before this problem. Could be a coincidence though.
I notice the error when I try to save something to disk and it tells me that the disk is in read-only mode:
lz@lz:/var/log$ touch something
touch: cannot touch 'something': Read-only file system
lz@lz:/var/log$ cat syslog
Jan 29 01:07:39 lz kernel: [62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_
lz@lz:/var/log$ dmesg
[62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_
[62984.377374] Aborting journal on device nvme0n1p2-8.
[62984.379343] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
[62984.379516] EXT4-fs error (device nvme0n1p2): ext4_find_
[62984.381486] EXT4-fs error (device nvme0n1p2): ext4_find_
[62984.383484] EXT4-fs error (device nvme0n1p2): ext4_find_
[62984.385469] EXT4-fs error (device nvme0n1p2): ext4_find_
[62984.387278] EXT4-fs error (device nvme0n1p2): ext4_find_
[62984.389262] EXT4-fs error (device nvme0n1p2): ext4_find_
[62984.391252] EXT4-fs error (device nvme0n1p2): ext4_find_
[62984.393341] EXT4-fs error (device nvme0n1p2): ext4_find_
[63285.618078] audit: type=1400 audit(151719556
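One way to confirm from a shell that a filesystem has been remounted read-only, as in the log above, is to look at its mount options. A minimal sketch, assuming a Linux system with /proc/mounts; the helper name is_readonly is hypothetical and not part of the original report:

```shell
# is_readonly MOUNTPOINT
# Reads /proc/mounts-style lines on stdin and succeeds (exit 0) only if
# the given mount point carries the plain "ro" option.
# Hypothetical helper for illustration.
is_readonly() {
  awk -v mp="$1" '
    $2 == mp {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == "ro") found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}

# Usage on a live system:
#   is_readonly / < /proc/mounts && echo "root is mounted read-only"
```

Matching the exact option "ro" avoids a false positive on errors=remount-ro, which is a normal ext4 mount option on Ubuntu.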
Rebooting Ubuntu gives me a black terminal where I can run fsck /dev/nvme0n1p2 (something like that), and it will fix a lot of orphaned inodes. Most of the time it boots back into a working Ubuntu, but sometimes it boots into a broken Ubuntu (no images, lots of things broken). Then I have to reinstall Ubuntu.
Every time I reinstall Ubuntu, I have to try many times until it installs without an Input/Output error. Once it installs, I can use it for some hours without the problem, but if I run the software updates, it ALWAYS crashes and enters read-only mode, specifically while installing kernel updates.
I noticed that Ubuntu installs updates automatically when they're for security reasons. Could this be the reason my Ubuntu worked for months without the problem, but then an update was applied and it broke?
I thought that this bug was happening: https:/
My Samsung 512gb SSD is:
SAMSUNG MZVLW512HMJP-00000, FW REV: CXY7501Q
on a Razer Blade Stealth.
I also asked this on ask ubuntu, without success: https:/
Please help me, as I need this computer to work on lots of things :c
---
ApportVersion: 2.20.7-0ubuntu3.7
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 17.10
InstallationDate: Installed on 2018-01-30 (0 days ago)
InstallationMedia: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20180105.1)
MachineType: Razer Blade Stealth
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.169.1
Tags: wayland-session artful
Uname: Linux 4.13.0-21-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 01/12/2017
dmi.bios.vendor: Razer
dmi.bios.version: 6.00
dmi.board.name: Razer
dmi.board.vendor: Razer
dmi.chassis.type: 9
dmi.chassis.vendor: Razer
dmi.modalias: dmi:bvnRazer:
dmi.product.family: 1A586752
dmi.product.name: Blade Stealth
dmi.product.
dmi.sys.vendor: Razer
Joseph Salisbury (jsalisbury) wrote : | #1 |
Changed in linux (Ubuntu): | |
importance: | Undecided → High |
tags: | added: kernel-da-key |
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs. | #2 |
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1746340
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
Lucas Zanella (lucaszanella) wrote : AlsaInfo.txt | #3 |
tags: | added: apport-collected artful wayland-session |
description: | updated |
Lucas Zanella (lucaszanella) wrote : CRDA.txt | #4 |
Lucas Zanella (lucaszanella) wrote : CurrentDmesg.txt | #5 |
Lucas Zanella (lucaszanella) wrote : IwConfig.txt | #6 |
Lucas Zanella (lucaszanella) wrote : JournalErrors.txt | #7 |
Lucas Zanella (lucaszanella) wrote : Lspci.txt | #8 |
Lucas Zanella (lucaszanella) wrote : Lsusb.txt | #9 |
Lucas Zanella (lucaszanella) wrote : ProcCpuinfo.txt | #10 |
Lucas Zanella (lucaszanella) wrote : ProcCpuinfoMinimal.txt | #11 |
Lucas Zanella (lucaszanella) wrote : ProcEnviron.txt | #12 |
Lucas Zanella (lucaszanella) wrote : ProcInterrupts.txt | #13 |
Lucas Zanella (lucaszanella) wrote : ProcModules.txt | #14 |
Lucas Zanella (lucaszanella) wrote : PulseList.txt | #15 |
Lucas Zanella (lucaszanella) wrote : RfKill.txt | #16 |
Lucas Zanella (lucaszanella) wrote : UdevDb.txt | #17 |
Lucas Zanella (lucaszanella) wrote : WifiSyslog.txt | #18 |
Changed in linux (Ubuntu): | |
status: | Incomplete → Confirmed |
Lucas Zanella (lucaszanella) wrote : | #19 |
Which kernel should I install exactly, and how? I don't feel safe downloading from http
Kai-Heng Feng (kaihengfeng) wrote : | #20 |
This is a known issue for Samsung NVMe.
Please attach the output of `sudo nvme id-ctrl /dev/nvme0` and `sudo nvme get-feature -f 0x0c -H /dev/nvme0 | less`, Thanks!
Kai-Heng Feng (kaihengfeng) wrote : | #21 |
Uhh sans the "less", thanks.
Lucas Zanella (lucaszanella) wrote : | #22 |
Thank you for your answer. I'm desperate. I just installed Debian, so I'm not going to be able to do it right now, but I have output from the last time I was using Ubuntu.
I tried nvme_core.
Shouldn't this bug be already fixed? Or not in my kernel? I could pay to get to the bottom of this, because I need my computer so much right now and this bug is happening every day and I can't continue my work!
The last kernel I had on ubuntu was 4.13.0-26-generic, now I'm on debian and I have 4.9.0-4.
sudo nvme list
Node SN Model Namespace Usage Format FW Rev
---------------- -------
/dev/nvme0n1 S33UNX0J324060 SAMSUNG MZVLW512HMJP-00000 1 25,30 GB / 512,11 GB 512 B + 0 B CXY7501Q
NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : S33UNX0J324060
mn : SAMSUNG MZVLW512HMJP-00000
fr : CXY7501Q
rab : 2
ieee : 002538
cmic : 0
mdts : 0
cntlid : 2
ver : 10200
rtd3r : 186a0
rtd3e : 4c4b40
oaes : 0
oacs : 0x17
acl : 7
aerl : 3
frmw : 0x16
lpa : 0x3
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 341
cctemp : 344
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 512110190592
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1f
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
subnqn :
ps 0 : mp:7.60W operational enlat:0 exlat:0 rrt:0 rrl:0
rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:6.00W operational enlat:0 exlat:0 rrt:1 rrl:1
rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:5.10W operational enlat:0 exlat:0 rrt:2 rrl:2
rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0400W non-operational enlat:210 exlat:1500 rrt:3 rrl:3
rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2200 exlat:6000 rrt:4 rrl:4
rwt:4 rwl:4 idle_power:- active_power:-
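The power-state lines in this output are the part relevant to APST: each state lists an entry latency (enlat) and exit latency (exlat) in microseconds. A small sketch that summarises them from saved `nvme id-ctrl` output follows. The function name ps_latencies is hypothetical, and the idea that the driver compares the combined latency against a configurable limit is my understanding of the Linux NVMe driver rather than something stated in this thread:

```shell
# ps_latencies
# Reads `nvme id-ctrl` text on stdin and prints, for each power state,
# its number, whether it is operational, and enlat + exlat in microseconds.
# Hypothetical helper for illustration.
ps_latencies() {
  awk '
    $1 == "ps" {
      en = ex = 0
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^enlat:/) en = substr($i, 7)
        if ($i ~ /^exlat:/) ex = substr($i, 7)
      }
      kind = ($0 ~ /non-operational/) ? "non-operational" : "operational"
      print "ps" $2, kind, en + ex
    }
  '
}

# Usage on a live system:
#   sudo nvme id-ctrl /dev/nvme0 | ps_latencies
```

For the drive above, this reports ps3 and ps4 as the two non-operational states, with combined latencies of 1710 and 8200 microseconds.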
Kai-Heng Feng (kaihengfeng) wrote : Re: [Bug 1746340] Re: Samsung SSD corruption (fsck needed) | #23 |
Kai-Heng
> On 31 Jan 2018, at 1:38 PM, Lucas Zanella <email address hidden> wrote:
>
> Thank you for your answer. I'm desperate. I just installed debian
> therefore I'm not going to be able to do it right now, but I have output
> from the last time I was using Ubuntu.
>
> I tried nvme_core.
> Then I've put it to 0, which didn't work too. Well, with 0 it didn't
> generate errors while using, but while trying to update my machine,
> which always happens too, so I don't know anymore. I remember seeing
> APST Disabled in the output, but the error always happens when I try to
> update my software…
I’d like to see the output of `sudo nvme get-feature -f 0x0c -H /dev/nvme0` when you use nvme_core.
>
> Shouldn't this bug be already fixed? Or not in my kernel? I could pay to
> get to the bottom of this, because I need my computer so much right now
> and this bug is happening every day and I can't continue my work!
This is more likely a low-level NVMe/PCIe issue. If possible, please try to upgrade the firmware for the NVMe.
>
> The last kernel I had on ubuntu was 4.13.0-26-generic, now I'm on debian
> and I have 4.9.0-4.
You’ll get hit by this issue (again) once the next Debian release uses a newer kernel.
Lucas Zanella (lucaszanella) wrote : | #24 |
Hi. I've been trying to install Windows 10 in order to try to update my SSD firmware, but I'm getting an error:
could it be that my SSD has a real hardware problem? I tried many different pen drives, in different USB ports, but I always get the same error.
I'm trying to install Ubuntu to get the output of nvme_core.
Lucas Zanella (lucaszanella) wrote : | #25 |
Hi! I managed to install Ubuntu again. These are the outputs you asked for, with the time set to 0 milliseconds:
NVME Identify Controller:
vid : 0x144d
ssvid : 0x144d
sn : S33UNX0J324060
mn : SAMSUNG MZVLW512HMJP-00000
fr : CXY7501Q
rab : 2
ieee : 002538
cmic : 0
mdts : 0
cntlid : 2
ver : 10200
rtd3r : 186a0
rtd3e : 4c4b40
oaes : 0
oacs : 0x17
acl : 7
aerl : 3
frmw : 0x16
lpa : 0x3
elpe : 63
npss : 4
avscc : 0x1
apsta : 0x1
wctemp : 341
cctemp : 344
mtfa : 0
hmpre : 0
hmmin : 0
tnvmcap : 512110190592
unvmcap : 0
rpmbs : 0
sqes : 0x66
cqes : 0x44
nn : 1
oncs : 0x1f
fuses : 0
fna : 0
vwc : 0x1
awun : 255
awupf : 0
nvscc : 1
acwu : 0
sgls : 0
subnqn :
ps 0 : mp:7.60W operational enlat:0 exlat:0 rrt:0 rrl:0
rwt:0 rwl:0 idle_power:- active_power:-
ps 1 : mp:6.00W operational enlat:0 exlat:0 rrt:1 rrl:1
rwt:1 rwl:1 idle_power:- active_power:-
ps 2 : mp:5.10W operational enlat:0 exlat:0 rrt:2 rrl:2
rwt:2 rwl:2 idle_power:- active_power:-
ps 3 : mp:0.0400W non-operational enlat:210 exlat:1500 rrt:3 rrl:3
rwt:3 rwl:3 idle_power:- active_power:-
ps 4 : mp:0.0050W non-operational enlat:2200 exlat:6000 rrt:4 rrl:4
rwt:4 rwl:4 idle_power:- active_power:-
get-feature:0xc (Autonomous Power State Transition), Current value:00000000
Autonomous Power State Transition Enable (APSTE): Disabled
Auto PST Entries .................
Entry[ 0]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 1]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 2]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 3]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 4]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 5]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 6]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 7]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 8]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 9]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[10]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[11]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[12]
...
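A long APST table like the one above can be condensed to two facts: whether APSTE is enabled, and how many entries have a non-zero idle time. A minimal sketch that works on saved `nvme get-feature -f 0x0c -H` output; the function name apst_summary is hypothetical:

```shell
# apst_summary
# Reads human-readable APST feature output on stdin and prints the APSTE
# state plus the number of table entries with a non-zero ITPT.
# Hypothetical helper for illustration.
apst_summary() {
  awk -F': *' '
    /\(APSTE\)/ { state = $2 }
    /\(ITPT\)/  { if ($2 + 0 > 0) active++ }
    END { printf "APSTE=%s active_entries=%d\n", state, active }
  '
}

# Usage on a live system:
#   sudo nvme get-feature -f 0x0c -H /dev/nvme0 | apst_summary
```

With every ITPT at 0 ms and APSTE reported as Disabled, as in the table above, the summary shows zero active entries.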
Lucas Zanella (lucaszanella) wrote : | #26 |
I just installed 4.15.0-
Lucas Zanella (lucaszanella) wrote : | #27 |
Problem persists with 4.15.0-
Kai-Heng Feng (kaihengfeng) wrote : | #28 |
So you have the issue on Linux v4.15 with nvme_core.
APST isn't enabled on either of them.
Lucas Zanella (lucaszanella) wrote : | #29 |
On Debian (4.9) I didn't notice the issue, but I didn't use it much. HOWEVER, when I ran apt-get upgrade on Debian I did get the issue. It had only updated the kernel files; the new kernel wasn't running yet (that would have required a reboot).
On v4.15 I didn't change the nvme_core.
This is all very strange
Lucas Zanella (lucaszanella) wrote : | #30 |
I forgot to mention that I reinstalled windows and everything is fine. Even did a benchmark test on the SSD and I'm downloading lots of files to test
Kai-Heng Feng (kaihengfeng) wrote : | #31 |
I am not familiar with Windows. Is there any way to check its APST table? I'd like to see whether the deepest power state is enabled or not.
Lucas Zanella (lucaszanella) wrote : | #32 |
I searched and found nothing.
So, even with APST disabled my ssd will fail on linux. What should I do?
Does it work normally for other people when they disable it?
Lucas Zanella (lucaszanella) wrote : | #33 |
I found a guy with the same problem as mine who also had a Razer Blade Stealth, but he didn't post anything more after that. He was in a thread with you. I also found some people with this same problem on the same SSD. Together with the fact that I had no problem on Windows (more than 24 hours of usage by now), I think it can be fixed in the kernel.
I had no luck updating my SSD's firmware as it's OEM and Samsung's updater won't work for it. Do you have any idea? I don't have money to buy a new SSD, and I really need to work. I'd be so grateful if you could help with a solution.
Kai-Heng Feng (kaihengfeng) wrote : | #34 |
Does the issue happen after system suspend?
Lucas Zanella (lucaszanella) wrote : | #35 |
Initially I noted that it'd happen after opening the lid of the notebook, so yes. But now after I install Ubuntu it immediately starts looking for software updates and that's when the problem happens for the first time, when I haven't even had time to close the notebook to suspend it.
Kai-Heng Feng (kaihengfeng) wrote : | #36 |
Please try [1]. It will do a PCI reset for NVMe device after resume.
people.
Lucas Zanella (lucaszanella) wrote : | #37 |
Thanks. What's a 'PCI reset for NVMe device after resume'?
Here's the output of running sudo dpkg -i *.deb on the 4 files:
Selecting previously unselected package linux-headers-
(Reading database ... 137951 files and directories currently installed.)
Preparing to unpack linux-headers-
Unpacking linux-headers-
Selecting previously unselected package linux-image-
Preparing to unpack linux-image-
Unpacking linux-image-4.15.0+ (4.15.0+-2) ...
Selecting previously unselected package linux-image-
Preparing to unpack linux-image-
Unpacking linux-image-
dpkg-deb (subprocess): decompressing archive member: lzma error: compressed data is corrupt
dpkg-deb: error: subprocess <decompress> returned error exit status 2
dpkg: error processing archive linux-image-
cannot copy extracted data for './usr/
Selecting previously unselected package linux-libc-dev.
Preparing to unpack linux-libc-
Unpacking linux-libc-dev (4.15.0+-2) ...
Setting up linux-headers-
Setting up linux-image-4.15.0+ (4.15.0+-2) ...
update-initramfs: Generating /boot/initrd.
W: Possible missing firmware /lib/firmware/
W: Possible missing firmware /lib/firmware/
W: Possible missing firmware /lib/firmware/
W: Possible missing firmware /lib/firmware/
W: Possible missing firmware /lib/firmware/
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Adding boot menu entry for EFI firmware configuration
done
Setting up linux-libc-dev (4.15.0+-2) ...
Errors were encountered while processing:
linux-
Lucas Zanella (lucaszanella) wrote : | #38 |
I downloaded again and it seems that this time it wasn't corrupted.
Output:
Preparing to unpack linux-headers-
Unpacking linux-headers-
Preparing to unpack linux-image-
Unpacking linux-image-4.15.0+ (4.15.0+-2) over (4.15.0+-2) ...
Preparing to unpack linux-image-
Unpacking linux-image-
Preparing to unpack linux-libc-
Unpacking linux-libc-dev (4.15.0+-2) over (4.15.0+-2) ...
Setting up linux-headers-
Setting up linux-image-4.15.0+ (4.15.0+-2) ...
update-initramfs: Generating /boot/initrd.
W: Possible missing firmware /lib/firmware/
W: Possible missing firmware /lib/firmware/
W: Possible missing firmware /lib/firmware/
W: Possible missing firmware /lib/firmware/
W: Possible missing firmware /lib/firmware/
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Adding boot menu entry for EFI firmware configuration
done
Setting up linux-image-
Setting up linux-libc-dev (4.15.0+-2) ...
Lucas Zanella (lucaszanella) wrote : | #39 |
After installing everything, I rebooted to use the new kernel. I then installed updates on the machine to see if the problem would happen (the easiest way to trigger it is to run an update). After the update, wireless stopped working. I restarted many times and it is still not working.
Could it be that the update triggered the error and the so called pcie reset of this kernel made the wireless go wrong?
I'm gonna still use this kernel to see if the read only filesystem happens though
Lucas Zanella (lucaszanella) wrote : | #40 |
I added a USB wireless receiver to get internet access and download things, so I can see if something happens. I installed more system updates through the Ubuntu software updater. Is this ok? The kernel will still be yours, right?
Kai-Heng Feng (kaihengfeng) wrote : | #41 |
> On Feb 8, 2018, at 10:19 AM, Lucas Zanella <email address hidden> wrote:
>
> I added a USB wireless receiver to use internet to download things so I
> can see if something happens. I installed more system updates through
> the ubuntu software updates. Is this ok? The kernel will still be yours,
> right?
It should be. You can use `uname -r` to check the kernel version.
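The `uname -r` check can be wrapped in a one-line test so that a mismatch after a reboot or update is obvious. A minimal sketch; the function name running_kernel_is is hypothetical, and the release string in the usage comment is only an example taken from later in this thread:

```shell
# running_kernel_is RELEASE
# Succeeds (exit 0) only if the currently running kernel release, as
# reported by `uname -r`, equals RELEASE exactly.
# Hypothetical helper for illustration.
running_kernel_is() {
  [ "$(uname -r)" = "$1" ]
}

# Usage:
#   running_kernel_is 4.13.0-34-generic || echo "booted $(uname -r) instead"
```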
Lucas Zanella (lucaszanella) wrote : | #42 |
The new kernel has been running for almost a day and no problems happened (however I still have no PCIe wireless and no i915 firmware so I can't open things like kdenlive)
What does this fix of yours do and is it possible to make it work with everything?
Kai-Heng Feng (kaihengfeng) wrote : | #43 |
I built another one based on Bionic, please use this kernel instead,
people.
Lucas Zanella (lucaszanella) wrote : | #44 |
Just after running sudo dpkg -i *.deb, and before rebooting, the error happened. Since the new kernel isn't running yet, I guess this current kernel still had the problem? That's strange because I've been running for more than 24 hours, downloading lots of torrents and had no problems.
I'm going to reboot now to test the new kernel
Here's the output:
sudo dpkg -i *.deb
[sudo] password for lz:
Selecting previously unselected package linux-headers-
(Reading database ... 215114 files and directories currently installed.)
Preparing to unpack linux-headers-
Unpacking linux-headers-
Selecting previously unselected package linux-headers-
Preparing to unpack linux-headers-
Unpacking linux-headers-
Selecting previously unselected package linux-image-
Preparing to unpack linux-image-
Examining /etc/kernel/
Done.
Unpacking linux-image-
Selecting previously unselected package linux-image-
Preparing to unpack linux-image-
Unpacking linux-image-
Setting up linux-headers-
Setting up linux-headers-
Setting up linux-image-
Running depmod.
update-initramfs: deferring update (hook will be called later)
Examining /etc/kernel/
run-parts: executing /etc/kernel/
run-parts: executing /etc/kernel/
update-initramfs: Generating /boot/initrd.
run-parts: executing /etc/kernel/
run-parts: executing /etc/kernel/
run-parts: executing /etc/kernel/
Generating grub configuration file ...
Warning: Setting GRUB_TIMEOUT to a non-zero value when GRUB_HIDDEN_TIMEOUT is set is no longer supported.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Adding boot menu entry for EFI firmware configuration
done
Setting up linux-image-
run-parts: executing /etc/kernel/
run-parts: executing /etc/kernel/
Lucas Zanella (lucaszanella) wrote : | #45 |
I rebooted and was still on 4.15. I then brought up the GRUB menu to select a kernel version and chose 4.14..., which is yours. It then hit a cpu_fifo_underrun error and nothing booted.
Lucas Zanella (lucaszanella) wrote : | #46 |
initramf [drm: intel_cpu_
Kai-Heng Feng (kaihengfeng) wrote : | #47 |
> On Feb 9, 2018, at 3:34 PM, Lucas Zanella <email address hidden>
> wrote:
>
> Just after running sudo dpkg -i *.deb, and before rebooting, the error
> happened. Since the new kernel isn't running yet, I guess this current
> kernel still had the problem? That's strange because I've been running
> for more than 24 hours, downloading lots of torrents and had no
> problems.
>
> I'm going to reboot now to test the new kernel
>
The issue happens when the disk transitions between operational states and
non-operational states.
If you are torrenting, chances are the disk is always in an operational
state, so you don’t see the issue.
Kai-Heng
Lucas Zanella (lucaszanella) wrote : | #48 |
Very important update: I bought a brand new Samsung 960 EVO, and I can't even install Ubuntu on it: I get an I/O error during installation.
Kai-Heng Feng (kaihengfeng) wrote : | #49 |
You need to boot with kernel parameter "nvme_core.
Please try this kernel after installation:
http://
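On an installed Ubuntu system, a kernel parameter is usually made persistent by adding it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then running `sudo update-grub`. The sketch below edits a GRUB defaults file passed as an argument. The function name add_kernel_param is hypothetical, and the full parameter name in the usage comment (nvme_core.default_ps_max_latency_us) is an assumption, since the name is truncated in this thread:

```shell
# add_kernel_param FILE PARAM
# Appends PARAM inside the quoted GRUB_CMDLINE_LINUX_DEFAULT value in FILE,
# unless that line already contains it. Hypothetical helper for illustration.
add_kernel_param() {
  file="$1"
  param="$2"
  # Do nothing if the parameter is already present on that line.
  if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
    return 0
  fi
  # Insert the parameter just before the closing double quote.
  sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$file" \
    > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage (parameter name assumed, value 0 as discussed in the thread):
#   sudo cp /etc/default/grub /etc/default/grub.bak
#   cp /etc/default/grub /tmp/grub.edit
#   add_kernel_param /tmp/grub.edit nvme_core.default_ps_max_latency_us=0
#   sudo cp /tmp/grub.edit /etc/default/grub && sudo update-grub
```

After a reboot, `cat /proc/cmdline` shows whether the parameter actually reached the kernel.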
Lucas Zanella (lucaszanella) wrote : | #50 |
Is it possible to boot the live installation media with the kernel parameter? I'm having problems installing Ubuntu onto the new SSD; I always get I/O errors...
I'm gonna also try the new kernel on the old SSD though
Lucas Zanella (lucaszanella) wrote : | #51 |
You mean I need to boot with the parameter on your new kernel? I'm gonna try it
Lucas Zanella (lucaszanella) wrote : | #52 |
I just installed your new kernel on the old SSD and changed the nvme_core_..._us to 0.
It seems that a dependency is missing for the kernel:
dpkg: dependency problems prevent configuration of linux-headers-
linux-
Package libssl1.1 is not installed.
dpkg: error processing package linux-headers-
dependency problems - leaving unconfigured
Setting up linux-image-
Running depmod.
update-initramfs: deferring update (hook will be called later)
Examining /etc/kernel/
run-parts: executing /etc/kernel/
run-parts: executing /etc/kernel/
update-initramfs: Generating /boot/initrd.
run-parts: executing /etc/kernel/
run-parts: executing /etc/kernel/
run-parts: executing /etc/kernel/
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Adding boot menu entry for EFI firmware configuration
done
Setting up linux-image-
run-parts: executing /etc/kernel/
run-parts: executing /etc/kernel/
update-initramfs: Generating /boot/initrd.
run-parts: executing /etc/kernel/
run-parts: executing /etc/kernel/
run-parts: executing /etc/kernel/
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Found linux image: /boot/vmlinuz-
Found initrd image: /boot/initrd.
Adding boot menu entry for EFI firmware configuration
done
Errors were encountered while processing:
linux-hea...
Lucas Zanella (lucaszanella) wrote : | #53 |
Never mind, I installed libssl1.1 by adding the bionic repo. However, right before I could reinstall the kernel, the system entered read-only mode. I'm going to try to get back in and install the new kernel some way.
Lucas Zanella (lucaszanella) wrote : | #54 |
I think I got it: http://
I'm counting on this 4.13.0-34 being the new one, not the old one I had. I hope it is.
Just booted and PCIe wireless is working. uname -r gives 4.13.0-34-generic.
I'm going to let the system rest for a while to see if something happens; I'm not going to download torrents again.
Lucas Zanella (lucaszanella) wrote : | #55 |
I've been running for days without any problem (before, it would happen about 30 minutes after installation). So can you release the source? Will it be in mainline?
Also, how can I use this kernel with the live image? It's painful to install Ubuntu with these problems: I get an I/O error in 90% of my tries, and I have to try for hours until it installs properly.
Thank you so much!
Kai-Heng Feng (kaihengfeng) wrote : | #56 |
Can you try again with [1]?
The one you used is with quirk NVME_QUIRK_
[1] people.
Lucas Zanella (lucaszanella) wrote : | #57 |
Ok, just installed it. Gonna monitor it to see if any errors come up
Lucas Zanella (lucaszanella) wrote : | #58 |
Everything is ok with this new kernel. No errors.
Kai-Heng Feng (kaihengfeng) wrote : | #59 |
Changed in linux (Ubuntu): | |
assignee: | nobody → Kai-Heng Feng (kaihengfeng) |
Lucas Zanella (lucaszanella) wrote : | #60 |
When will there be a kernel with this patch included?
What about the live image? It's going to take months for a live installation image to have this patch. Is it possible for me to use this kernel in a live image myself?
Kai-Heng Feng (kaihengfeng) wrote : | #61 |
> When will there be a kernel with this patch included?
v4.16.
> What about the live image? It's going to take months for a live installation image to have this patch. Is it possible for me to use this kernel in a live image myself?
I'll back port the patch to v4.15 so Bionic (18.04) live image will have this fix.
Lucas Zanella (lucaszanella) wrote : | #62 |
but v4.16-rc1 doesn't have "NVME_QUIRK_
Kai-Heng Feng (kaihengfeng) wrote : | #63 |
The patch hasn't been merged yet.
Lucas Zanella (lucaszanella) wrote : | #64 |
I compiled a kernel myself, applying the patch and using make deb-pkg, and got these files:
linux-
linux-
linux-
linux-
But yours don't include image...dbg or libc, and you have image-extra and headers...generic. What's the difference? Will mine work? If not, how do I get exactly your 4 files?
Kai-Heng Feng (kaihengfeng) wrote : | #65 |
Build a Debian/Ubuntu kernel:
https:/
Kernel source:
http://
http://
Lucas Zanella (lucaszanella) wrote : | #66 |
I downloaded a fresh 4.15.0-9 kernel, applied the diff and compiled it just as described in the page you sent.
I then formatted, installed Ubuntu 17.10.1, booted, enabled nvme_..._us=0 and rebooted. Then I started installing updates. The error occurred in the middle of it.
I've never tried to install updates on the kernels you made me test. Could it be that it's breaking something?
:(
Lucas Zanella (lucaszanella) wrote : | #67 |
I installed Ubuntu again, installed my compiled kernel, and disabled updates. When installing a package (virt-manager), the error occurred again. This package touches the kernel (KVM things), but I used it before on your kernels and everything was fine (I didn't try to install it while on your kernel, though).
Lucas Zanella (lucaszanella) wrote : | #68 |
Here's what I did:
git clone bionic_git_url...
cd ubuntu-bionic
git checkout <tag of version 4.15.0-9>
patch -p1 < nvme_reset.diff #(from your diff file)
#gave an error about the last line, but I checked manually and it was ok (I guess it was because of the number at the end: https:/
sudo apt-get build-dep linux-image-$(uname -r)
fakeroot debian/rules clean
fakeroot debian/rules binary-headers binary-generic binary-perarch
after a long time, I copied
linux-headers-
linux-headers-
linux-image-
linux-image-
and installed on my fresh ubuntu 17.10.1 install on the razer blade stealth by doing
sudo dpkg -i *.deb
then I added nvme_..._us=0 to grub and did
sudo update-grub
rebooted and used it for a while (confirmed with uname -r that the new kernel was running). The first time I did all this, the problem occurred while installing updates. The second time, the error occurred when I tried to install virt-manager.
Since the kernel worked perfectly except for that, I can only assume that your diff didn't go through. But if I do git checkout <tag> and then apply a diff to that tag, then I can simply cd to this folder and compile and I'll be using the diff, right?
Thank you for your help!
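One way to sanity-check that a diff really made it into a source tree before building is to look for the lines it adds. The sketch below is only a heuristic and is not part of the Ubuntu build tooling: the function name missing_added_lines is made up for illustration, it matches added lines verbatim, and it assumes the diff file itself is stored outside the tree being searched.

```shell
# Print every line the diff adds that cannot be found verbatim in the tree.
# No output means every added line is present somewhere under the tree.
missing_added_lines() {
    diff_file=$1
    tree=$2
    # Lines starting with '+' are additions; '+++' is the per-file header.
    grep '^+' "$diff_file" | grep -v '^+++' | cut -c2- |
    while IFS= read -r line; do
        # Skip blank additions, which would match anywhere.
        [ -n "$(printf '%s' "$line" | tr -d '[:space:]')" ] || continue
        # -F: fixed string, -r: recurse, -q: quiet; report lines not found.
        grep -rqF -- "$line" "$tree" || printf '%s\n' "$line"
    done
}

# Example (paths are assumptions about where the files live):
#   missing_added_lines ../nvme_reset.diff drivers/nvme
```

If the function prints nothing for the checked-out tag, the added lines are in the tree, and anything built from that directory should include them.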
Kai-Heng Feng (kaihengfeng) wrote : | #69 |
I want to know the reason behind compiling your own kernel, is it because with kernel parameter "nvme_core.
If that's true, then we need to put the patch into Bionic's kernel and make sure the daily Bionic ISO uses the new kernel.
Lucas Zanella (lucaszanella) wrote : | #70 |
I added nvme_core.
I'm compiling my own because I want to learn and also to test new kernels as they are released, especially now with Spectre and Meltdown (it's going to take time for the fix to reach mainline and even more time for it to reach the live installer). It's also good practice for security reasons.
I don't see what I did wrong; my kernel should work exactly like yours.
Kai-Heng Feng (kaihengfeng) wrote : | #71 |
Please remove the kernel parameter so we can make sure it works with APST enabled.
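To confirm after a reboot that a parameter really is gone (or present), the running kernel's command line can be read from /proc/cmdline. A small sketch; the function name cmdline_has_param and the parameter name in the example are placeholders, not names taken from this thread.

```shell
# Succeed if the kernel command line contains "name" or "name=value".
# The second argument defaults to /proc/cmdline and exists so the
# function can be tried against a saved copy of a command line.
cmdline_has_param() {
    name=$1
    file=${2:-/proc/cmdline}
    awk -v n="$name" '
        # Check each whitespace-separated word for an exact match
        # or a match on the "name=" prefix.
        { for (i = 1; i <= NF; i++)
              if ($i == n || index($i, n "=") == 1) found = 1 }
        END { exit !found }
    ' "$file"
}

# Example (placeholder parameter name):
#   if cmdline_has_param some_module.some_param; then
#       echo "parameter is still set"
#   fi
```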
Lucas Zanella (lucaszanella) wrote : | #72 |
Since you wrote the last message, I recompiled the kernel and reinstalled. Tested again, and the problem occurred in about 1 hour. Then I took the kernel parameter off and started testing, and I've been running for more than 24 hours without errors. However, the error occurred inside a virtual machine. But the disk in the VM is named /dev/sda1, so it's not using NVMe drivers or anything like that. How is it possible for the error to occur inside the virtual machine but not on the main machine? Could this be due to another, completely unrelated problem?
Kai-Heng Feng (kaihengfeng) wrote : | #73 |
Thanks Lucas. Sounds like the issue is gone when APST gets enabled.
It would be great if you could test it with more S3 (suspend/resume) cycles.
Regarding the VM issue, I can't be sure unless you attach the error message.
Lucas Zanella (lucaszanella) wrote : | #74 |
The kernel is still good. The error happened again in the virtual machine, here's dmesg:
[ 6730.708866] EXT4-fs error (device sda1): htree_dirblock_
[ 6730.710121] Aborting journal on device sda1-8.
[ 6730.711514] EXT4-fs (sda1): Remounting filesystem read-only
[ 6730.713087] EXT4-fs error (device sda1): ext4_journal_
[ 7030.415582] audit: type=1400 audit(151926908
[67539.479651] clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
[67539.479670] clocksource: 'kvm-clock' wd_now: 55b3d2da2f60 wd_last: 269f11d4c146 mask: ffffffffffffffff
[67539.479673] clocksource: 'tsc' cs_now: 422f92c80a6a cs_last: 422e9dc07f56 mask: ffffffffffffffff
What do you think? My disk is /dev/sda1 in the virtual machine, so no NVMe... I'm using KVM with SPICE.
Lucas Zanella (lucaszanella) wrote : | #75 |
So... the error happened :(
I don't know if it's related, but I was compiling Qt 5 inside a virtual machine and went to sleep. When I woke up there was an error in the compilation about not being able to allocate virtual memory. The VM was unusable (I pressed things and they wouldn't work), so I rebooted the VM, and during Ubuntu's initialization there was something about trying to write outside disk hd1. I ran fsck then, and the machine kept printing lines about disk writes indefinitely (it wouldn't stop). I tried to take a screenshot but couldn't save it.
Now, on the main machine, I ran touch a and got the Read-only file system error. Then I looked at dmesg:
[62526.097648] CPU3: Package temperature above threshold, cpu clock throttled (total events = 734393)
[62526.097650] CPU2: Package temperature above threshold, cpu clock throttled (total events = 734389)
[62526.097654] CPU0: Package temperature above threshold, cpu clock throttled (total events = 734421)
[62526.098643] CPU0: Core temperature/speed normal
[62526.098644] CPU2: Core temperature/speed normal
[62526.098644] CPU3: Package temperature/speed normal
[62526.098645] CPU1: Package temperature/speed normal
[62526.098646] CPU2: Package temperature/speed normal
[62526.098647] CPU0: Package temperature/speed normal
[62826.083664] CPU0: Core temperature/speed normal
[62826.083665] CPU2: Core temperature/speed normal
[62826.083666] CPU3: Package temperature/speed normal
[62826.083667] CPU1: Package temperature/speed normal
[62826.083667] CPU2: Package temperature/speed normal
[62826.083669] CPU0: Package temperature/speed normal
[63109.039660] CPU3: Core temperature above threshold, cpu clock throttled (total events = 122579)
[63109.039661] CPU1: Core temperature above threshold, cpu clock throttled (total events = 122586)
[63109.043637] CPU3: Core temperature/speed normal
[63109.043637] CPU1: Core temperature/speed normal
[63141.298625] CPU2: Core temperature above threshold, cpu clock throttled (total events = 685839)
[63141.298626] CPU0: Core temperature above threshold, cpu clock throttled (total events = 685861)
[63141.298628] CPU1: Package temperature above threshold, cpu clock throttled (total events = 752070)
[63141.298628] CPU3: Package temperature above threshold, cpu clock throttled (total events = 752043)
[63141.298630] CPU0: Package temperature above threshold, cpu clock throttled (total events = 752073)
[63141.298633] CPU2: Package temperature above threshold, cpu clock throttled (total events = 752042)
[63141.311665] CPU0: Core temperature/speed normal
[63141.311666] CPU2: Core temperature/speed normal
[63141.311667] CPU1: Package temperature/speed normal
[63141.311667] CPU3: Package temperature/speed normal
[63141.311668] CPU2: Package temperature/speed normal
[63141.311669] CPU0: Package temperature/speed normal
[63441.300764] CPU2: Core temperature/speed normal
[63441.300765] CPU0: Core temperature/speed normal
[63441.300766] CPU3: Package temperature/speed normal
[63441.300766] CPU1: Package temperature/speed normal
[63441.300767] CPU0: Package temperature/speed normal
[63441.300768] CPU2: Package temperature/speed normal
[63742.088404] CPU0: Core temperature above threshold, cpu cloc...
Lucas Zanella (lucaszanella) wrote : | #76 |
Here's a screenshot:
Kai-Heng Feng (kaihengfeng) wrote : | #77 |
Do you have full dmesg in comment #75? Do you see any NVMe error?
Lucas Zanella (lucaszanella) wrote : | #78 |
Nope, sorry, I rebooted after copying the dmesg. I thought that since what I copied included [68668.595459] EXT4-fs (nvme0n1p2): Remounting filesystem read-only, it was enough, because that is where the error started. I'll cat the entire file next time, but I'm afraid it'll only happen if I push my CPU as hard as I did (actually this is good: the NVMe took a very long time to go read-only, which is very good progress).
Meanwhile, I think the average rate of read-only errors inside the VMs is about 0.8/day. I always test the main machine when these errors happen and it's always fine. The only time it went wrong was the one I reported.
Lucas Zanella (lucaszanella) wrote : | #79 |
Just as a reminder, inside the VM the error looks like this:
[26547.754916] EXT4-fs error (device sda1): htree_dirblock_
[26547.756301] Aborting journal on device sda1-8.
[26547.757724] EXT4-fs (sda1): Remounting filesystem read-only
[26547.762207] EXT4-fs error (device sda1): ext4_journal_
[26631.771204] EXT4-fs error (device sda1): htree_dirblock_
while outside the VM there's no problem at all.
Kai-Heng Feng (kaihengfeng) wrote : | #80 |
I don't think the EXT4 issue inside the VM is the same as the NVMe one.
If you no longer have the issue on the host machine, then the fix works.
Lucas Zanella (lucaszanella) wrote : | #81 |
Yes, the fix totally works. The only time I had a real NVMe error on the main machine was the one I reported. I don't know why, but it looks like the VM went terribly wrong. These VMs are just plain Ubuntu installs with Docker, Visual Studio Code and git. Nothing fancy; I don't know why I keep getting ext4 problems.
Lucas Zanella (lucaszanella) wrote : | #82 |
Well, the error happened again, and I wasn't even running any VMs
Maybe there's a rare case in which your diff correction didn't get applied?
:(
Kai-Heng Feng (kaihengfeng) wrote : | #83 |
Please change the line in the patch from
"return NVME_QUIRK_
to
"return (NVME_QUIRK_
And see if this still happens.
Lucas Zanella (lucaszanella) wrote : | #84 |
Ok, I'm compiling it now on my other PC. Meanwhile, the error happened twice in the same day. That's odd; it usually took at least 2 days to show up again. Could it be that something changed on my PC?
Anyways, here's the error:
[ 4609.325351] EXT4-fs error (device nvme0n1p2): ext4_find_
[ 4609.327443] Aborting journal on device nvme0n1p2-8.
[ 4609.329533] EXT4-fs (nvme0n1p2): Remounting filesystem read-only
[ 4609.357281] EXT4-fs error (device nvme0n1p2): ext4_find_
[ 4609.627350] EXT4-fs error (device nvme0n1p2): ext4_find_
[ 4795.596378] perf: interrupt took too long (2563 > 2500), lowering kernel.
[ 4911.346846] audit: type=1400 audit(151987688
I'll test the newly compiled kernel in a few hours.
Lucas Zanella (lucaszanella) wrote : | #85 |
After 2 days with the new kernel, it happened again. Seems like around every 2 days it happens. Maybe some rare nvme write that you didn't cover in the quirks?
Kai-Heng Feng (kaihengfeng) wrote : | #86 |
Again, can you attach full dmesg?
Because the message is not about NVMe, but EXT4.
Please fsck the rootfs before any further testing.
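Once the root filesystem has gone read-only, the full dmesg cannot be saved to the usual places, so it has to go somewhere that is still writable, such as a tmpfs or removable media. A minimal sketch, assuming /dev/shm is a tmpfs as on stock Ubuntu; the helper name first_writable_dir and the USB mount point are made up for illustration.

```shell
# Print the first directory from the argument list that exists and is
# writable; fail if none is.
first_writable_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example: save the complete kernel log before rebooting.
#   out=$(first_writable_dir /dev/shm /media/usb) &&
#       dmesg > "$out/dmesg-full.txt"
```

Files in a tmpfs are lost on reboot, so the saved log still has to be copied off the machine (USB stick, scp) before restarting.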
Lucas Zanella (lucaszanella) wrote : | #87 |
The next time it happens I will post the full dmesg. But even my first message (#1):
Jan 29 01:07:39 lz kernel: [62984.375393] EXT4-fs error (device nvme0n1p2): ext4_find_
cites EXT4 errors. It's always been like that; nothing changed.
Do I need to fsck the rootfs now or when the error happens again? And how should I do it?
Kai-Heng Feng (kaihengfeng) wrote : | #88 |
Generally I boot up a live system and run fsck on the block device. A quick Google search shows there are several ways to achieve the same thing.
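From a live system, the main thing to get right is that the partition is not mounted when fsck runs. A minimal sketch of that check; the helper name is_mounted is made up for illustration, and the device name is the one used elsewhere in this thread.

```shell
# Succeed if the given device appears as a mount source.
# The second argument defaults to /proc/mounts and exists so the
# function can be tried against a saved copy of the mount table.
is_mounted() {
    dev=$1
    mounts=${2:-/proc/mounts}
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# Example, run from the live session:
#   if is_mounted /dev/nvme0n1p2; then
#       echo "unmount it first" >&2
#   else
#       sudo fsck -f /dev/nvme0n1p2   # -f forces a full check on ext4
#   fi
```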
Kai-Heng Feng (kaihengfeng) wrote : | #89 |
So does the issue still happen after fsck?
Does quirk NVME_QUIRK_
Lucas Zanella (lucaszanella) wrote : | #90 |
Sorry, I haven't tested it yet. I had to travel and use the computer, so I only ran fsck on nvme0n1p2 to get it working.
I thought fsck /dev/nvme0n1p2 was the same as fscking the rootfs. I do it every time the error happens. I didn't understand exactly what you meant.
Also I think NVME_QUIRK_
If I need to do it before the error happens, I can just run my live ubuntu and do it.
Thank you for your help.
Lucas Zanella (lucaszanella) wrote : | #91 |
Ok so I didn't know exactly what to do.
I was using my machine and, even though I didn't get any errors, I rebooted, booted the Ubuntu live image and ran fsck on /dev/nvme0n1p1 and /dev/nvme0n1p2. No errors showed up.
Then I continued using my machine and the error appeared. I rebooted into the live machine and did
fsck /dev/nvme0n1p2 and fsck /dev/nvme0n1p1
here are the outputs:
(on nvme0n1p1 I got a dirty bit message, I don't know what that is, and on nvme0n1p2 the output looks the same as when I ran fsck from the SSD)
The error persists.
Kai-Heng Feng (kaihengfeng) wrote : | #92 |
Seems to me it's a bug in EXT4 instead of NVMe.
So seems like NVME_QUIRK_
Lucas Zanella (lucaszanella) wrote : | #93 |
Without NVME_QUIRK_
Lucas Zanella (lucaszanella) wrote : | #94 |
The error just happened again inside the VM, and upon VM reboot it said that something was trying to access outside the disk hd0 or hd1, I don't remember which. Then I noticed that the host machine had the error too.
This already happened before, and I mentioned it in comment #75.
Somehow the virtual machine errors and the host machine errors are related. However, the error happens even without virtual machine usage.
Kai-Heng Feng (kaihengfeng) wrote : | #95 |
Do you suspend/resume the system during your usage?
Lucas Zanella (lucaszanella) wrote : | #96 |
I do it a lot. However, the errors don't happen right after waking from suspend. They take some time.
Before the NVMe quirk, I tried to see whether suspend/resume influenced the error. I tried booting the PC and leaving it open until the error occurred, which showed that suspend wasn't causing it.
However, now with the NVMe quirk the computer takes a long time to show the error, so I always end up closing it. I think there was one time the error happened right after turning the PC on, though I'm not 100% sure.
Kai-Heng Feng (kaihengfeng) wrote : | #97 |
Have you ever seen an error message saying that the NVMe device stays in D3 and refuses to change to D0?
Lucas Zanella (lucaszanella) wrote : | #98 |
No. The times I've read dmesg there was nothing like this, nor was there an error popup. I'll grep for D0 and D3 next time, though.
Lucas Zanella (lucaszanella) wrote : | #99 |
I just noticed that almost always when I do
docker rm $(docker ps -a -q)
to remove all docker containers inside my VM, the error happens. Maybe high disk usage causes this?
The errors on my computer take time to happen, but in the VMs they happen every day.
I'm 100% thankful for what you've done. If you know something, I can pay; these errors are making it very hard for me to work.
Kai-Heng Feng (kaihengfeng) wrote : | #100 |
Please try this one, I built it with NVMe queue depth = 2.
https:/
Also please attach the dmesg, thanks!
Lucas Zanella (lucaszanella) wrote : | #101 |
Would you mind posting the diff? I'm using custom kernel modifications (not related to disk and tested without them)
Kai-Heng Feng (kaihengfeng) wrote : | #102 |
Ok, there's actually a kernel parameter for that, please boot with "nvme.io_
Lucas Zanella (lucaszanella) wrote : | #103 |
I tried this parameter and the computer got stuck at the loading screen. Had to enter recovery mode and remove the parameter to make it boot again
Kai-Heng Feng (kaihengfeng) wrote : | #104 |
Can you try some value like 64? PM1725 NVMe uses this value.
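After rebooting with a different value, the setting a module actually picked up can usually be read back from sysfs, which is a quick way to rule out a typo on the kernel command line. A minimal sketch; the helper name module_param is made up, and the assumption is that the nvme module exposes io_queue_depth under /sys/module/nvme/parameters on the kernel in use.

```shell
# Print the current value of a module parameter as exposed in sysfs.
# The third argument defaults to /sys and exists so the function can be
# tried against a scratch directory.
module_param() {
    module=$1
    param=$2
    root=${3:-/sys}
    file=$root/module/$module/parameters/$param
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "no such parameter: $module.$param" >&2
        return 1
    fi
}

# Example:
#   module_param nvme io_queue_depth
```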
Lucas Zanella (lucaszanella) wrote : | #105 |
I've been trying it since you wrote. No problems yet on the host machine, but the virtual machine has already shown the error twice today (not related to high disk usage).
Lucas Zanella (lucaszanella) wrote : | #106 |
Ok, the error just happened on the main machine. It took many more days to happen.
I picked up my computer while it was sleeping with the lid open, moved the mouse, and saw a black screen with nothing on it. Then I pressed power and something appeared: /.../libvirt .... read only file system, and then it turned off. Libvirt was running at the time, so I think it was just an error saying that libvirt was trying to write to the disk. Don't know if it's related.
Lucas Zanella (lucaszanella) wrote : | #107 |
And it just happened again. It's common for it to happen again right after I've rebooted and fscked following a previous one. Then it calms down for some days.
Kai-Heng Feng (kaihengfeng) wrote : | #108 |
Do you see similar behavior under Windows?
Lucas Zanella (lucaszanella) wrote : | #109 |
Back when I was having the error every 2 hours, I installed Windows and used it for some days without any problems, so I guess not. I also tried an old Debian and the installation went OK (on Ubuntu it fails 80% of the time; I have to try many times until I get a good installation).
Kai-Heng Feng (kaihengfeng) wrote : | #110 |
So do you see the same issue with mainline v4.9 kernel [1]?
From what I can understand, disabling APST lets you fully install Ubuntu/Debian, but after some usage you still have to fsck the disk?
Lucas Zanella (lucaszanella) wrote : | #111 |
I did not try to install with APST disabled. I tried to put a custom kernel on the live CD but it wouldn't boot. My current Ubuntu was installed by trial and error until it installed without any errors. The installation process is not so important for me; I can try 5 or 7 times before getting it right. I'm mainly concerned about usage afterwards.
So this is what happened: before any quirks, I was having the error every 2 hours. After your kernel quirk, I started having the error every 2 days on main machine and on average every day on the virtual machines. After the NVME queue_depth parameter it looks like it's taking 5 days, but the virtual machines continue giving the errors 1 time per day on average.
So I should try this new kernel? I suppose it already has the quirk you created. I'll install it soon. Not now because I need to backup things, because if the error happens during the kernel installation then the whole ubuntu is going to get wrecked.
Kai-Heng Feng (kaihengfeng) wrote : | #112 |
The quirk is not included.
I asked this because it looks like you didn't need to fsck under Debian with Linux v4.9.
Lucas Zanella (lucaszanella) wrote : | #113 |
I didn't need to, but then I installed some updates and it broke. However, I don't think the kernel got upgraded with that update.
Kai-Heng Feng (kaihengfeng) wrote : | #114 |
Can you use v4.9 under Ubuntu and see if this still happens? Or does your laptop need driver support from a newer kernel?
Lucas Zanella (lucaszanella) wrote : | #115 |
Hi, I'm back, sorry for the delay. I'll test it soon again. I tried and the error happened in the middle of the update and broke my ubuntu. I'll reinstall and try again
Kai-Heng Feng (kaihengfeng) wrote : | #116 |
Lucas,
Can you attach `sudo lspci -vvnn` here? Thanks!
Lucas Zanella (lucaszanella) wrote : | #117 |
Hello. Thank you for your continued support! I haven't been able to test the older kernel yet, as I'm using this PC constantly and can't afford to lose it or have it unusable for long: when the system gets corrupted I have to spend hours trying to install it again without errors.
Here's the output:
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:5904] (rev 02)
Subsystem: Razer USA Ltd. Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [1a58:6752]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Capabilities: [e0] Vendor Specific Information: Len=10 <?>
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02) (prog-if 00 [VGA controller])
Subsystem: Razer USA Ltd. HD Graphics 620 [1a58:6752]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 127
Region 0: Memory at db000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at 90000000 (64-bit, prefetchable) [size=256M]
Region 4: I/O ports at f000 [size=64]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee00018 Data: 0000
Capabilities: [d0] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Process Address Space ID (PASID)
PASIDCap: Exec- Priv-, Max PASID Width: 14
PASIDCtl: Enable- Exec- Priv-
Capabilities: [200 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable-, Smallest Translation Unit: 00
Capabilities: [300 v1] Page Request Interface (PRI)
PRICtl: Enable- Reset-
PRISta: RF- UPRGI- Stopped+
Page Request Capacity: 00008000, Page Request Allocation: 00000000
Kernel driver in use: i915
Kernel modules: i915
00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f] (rev 21) (prog-if 30 [XHCI])
Subsystem: Razer USA Ltd. Sunrise Point-LP USB 3.0 xHCI Controller [1a58:6752]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr...
Kai-Heng Feng (kaihengfeng) wrote : | #118 |
If possible, please try this kernel:
https:/
Please also attach `sudo lspci -vvnn` with this kernel, thanks!
Lucas Zanella (lucaszanella) wrote : | #119 |
Could you provide the diff file? I need to compile with other modifications.
Or better yet, is there a parameter to disable ASPM at boot?
Thank you so much
Kai-Heng Feng (kaihengfeng) wrote : | #120 |
tags: | added: patch |
Lucas Zanella (lucaszanella) wrote : | #121 |
Should I also add the other quirk you made which made the problem happen fewer times?
https:/
I'm using the 4.15-23
Kai-Heng Feng (kaihengfeng) wrote : | #122 |
No, just use the patch in #120.
Lucas Zanella (lucaszanella) wrote : | #123 |
I just compiled the kernel and it presented the error just minutes after the first boot.
A reminder: my kernel parameters are still like this:
GRUB_CMDLINE_
Here's the output you wanted:
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:5904] (rev 02)
Subsystem: Razer USA Ltd. Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [1a58:6752]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
Latency: 0
Capabilities: [e0] Vendor Specific Information: Len=10 <?>
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02) (prog-if 00 [VGA controller])
Subsystem: Razer USA Ltd. HD Graphics 620 [1a58:6752]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 124
Region 0: Memory at db000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at 90000000 (64-bit, prefetchable) [size=256M]
Region 4: I/O ports at f000 [size=64]
[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
Address: fee00018 Data: 0000
Capabilities: [d0] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Process Address Space ID (PASID)
PASIDCap: Exec- Priv-, Max PASID Width: 14
PASIDCtl: Enable- Exec- Priv-
Capabilities: [200 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable-, Smallest Translation Unit: 00
Capabilities: [300 v1] Page Request Interface (PRI)
PRICtl: Enable- Reset-
PRISta: RF- UPRGI- Stopped+
Page Request Capacity: 00008000, Page Request Allocation: 00000000
Kernel driver in use: i915
Kernel modules: i915
00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f] (rev 21) (prog-if 30 [XHCI])
Subsystem: Razer USA Ltd. Sunrise Point-LP USB 3.0 xHCI Controller [1a58:6752]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >T...
Kai-Heng Feng (kaihengfeng) wrote : | #124 |
So the ASPM is indeed disabled:
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
Can you try disabling the deepest power state under this kernel? i.e. use "nvme-core.
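The LnkCtl line quoted above comes from the lspci -vv output, so the same check can be repeated after each kernel change without reading the whole dump. A small sketch; the function name aspm_states is made up, and it assumes the "LnkCtl: ASPM <state>;" layout shown in this thread.

```shell
# Read lspci -vv output on stdin and print the ASPM state from every
# LnkCtl line, one per line (for example "Disabled" or "L1 Enabled").
aspm_states() {
    sed -n 's/.*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}

# Example:
#   sudo lspci -vvnn | aspm_states
```

The output has one line per PCIe link, so on a real machine it is easier to interpret when lspci is limited to the NVMe device with its -s option.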
Kai-Heng Feng (kaihengfeng) wrote : | #125 |
FWIW, another user says that disabling ASPM fixes this issue for him.
Lucas Zanella (lucaszanella) wrote : | #126 |
Should I take nvme.io_
I'm adding nvme-core.
Is this a Razer Blade Stealth user? It would be good to talk to him to see whether he experiences the problems inside VMs, which are very annoying as I do everything in them.
Kai-Heng Feng (kaihengfeng) wrote : | #127 |
No, the user uses an XPS 9560. I think removing the io_queue_depth parameter should be safe.
Lucas Zanella (lucaszanella) wrote : | #128 |
I've tested the new kernel with this parameter for 10 days and the error didn't happen, which is a record. So I decided to open a VM yesterday and it held up fine for about 12 hours; then I went to sleep, and when I grabbed the computer again the error had happened both inside and outside the VM. I don't know if it was caused by the VM itself, but for those 10 days I tested, it worked great.
Kai-Heng Feng (kaihengfeng) wrote : | #129 |
Can you attach the error? Maybe use something else, like VirtualBox, to see if KVM is the culprit?
hariprasad (hariprasad) wrote : | #130 |
I can confirm. I have had the same problems since 10.11.2017. The system works for some hours and then suddenly there are serious SSD problems. I replaced the SSD four times (warranty claims) and once got a whole new computer under a warranty claim.
HW: Intel NUC7i7BNH (Intel i7), Samsung EVO 960 M.2 NVMe, OPM Crucial 16GB.
SSD formatting: Partition table GPT, Primary Partitions EXT4
SW: Ubuntu Linux 18.04
Installation: UEFI; switching to legacy has no effect.
hariprasad (hariprasad) wrote : | #131 |
Additionally, I can confirm that the problem is not limited to Samsung NVMe SSDs; it occurs on Intel SSDs as well. It looks like the problem is in the SSD driver.
hariprasad (hariprasad) wrote : | #132 |
There is an issue which could be our problem. It looks like the workaround is to disable TRIM. https:/
Lucas Zanella (lucaszanella) wrote : | #133 |
Hi hariprasad. nvme-core.
I'm going to try to disable TRIM but it's going to take days for me to test if the VMs give any errors. Should sudo rm /etc/cron.
Have you experienced problems installing Ubuntu onto your SSD? My Ubuntu installation gives a disk error in 8/10 tries. That is, I need to reinstall Ubuntu 8 times on average for the installation to finish without any problems. I didn't try to install Ubuntu with the kernel patch because it's a lot of work to create a live CD installation with a patched kernel, but maybe I'll do it some day.
I also need to try virtualbox in place of virt-manager as kaihengfeng suggested. Problem is that I need to leave things open for days to notice these errors, so it's going to take time.
Kai-Heng Feng (kaihengfeng) wrote : | #134 |
Lucas,
I've found that the PCIe common clock may be the culprit here.
Please try the kernel [1].
Lucas Zanella (lucaszanella) wrote : | #135 |
Could you send me the patch? (And should I use the ASPM patch together with it?)
Another piece of information that you might find useful: I've been using it for 30 days without any problems on the main machine (only one incident inside the virtual machine), so today I decided to finally install all the pending updates that I'd been putting off since the problems arose. It updated more than 1000 packages, and as before it gave the read-only error while updating the initramfs. It seems that it ALWAYS happens when updating the initramfs.
Here's the output I was able to save: https:/
So after 30 days without any problems, it happens while doing this. It's gotta explain something, because it's very unusual, and I've always had problems with the initramfs before.
Sam (samr28) wrote : | #136 |
I also have the same issue on a Razer Blade 2017 - 7500U model. My system has the exact same drive in it. I have just installed the 4.18.0-3 kernel linked above and will post here if I run into the issue again.
hariprasad (hariprasad) wrote : | #137 |
Hello, my Ubuntu Linux was out of order for 8-9 months, first on 17.10 and later on 18.04. I did many installations and tests, replaced the M.2 SSD four times (accepted warranty claims), and replaced the whole NUC7i7BNH (accepted warranty claim). I logged an issue with Intel. Finally I installed Fedora 28, and the NVMe M.2 SSD is in good condition and works properly. I used the default LVM partition format. On Ubuntu, installation on LVM failed immediately, while updates were being applied. The problem is specific to Ubuntu. I didn't check which driver Fedora and Ubuntu use; the Fedora kernel is '4.17.12-
Lucas Zanella (lucaszanella) wrote : | #138 |
Hi hariprasad. If possible, you could try our patched kernel which disables ASPM: https:/
hariprasad (hariprasad) wrote : | #139 |
Hello Lucas, thank you for the response. Yes, badly set kernel parameters can cause very serious problems. Additionally, I can confirm that the problem is not limited to Samsung NVMe SSDs; it occurs on the Intel SSD 6 series as well. It looks like the problem is in the ASPM/SSD driver kernel parameters. There are a few errors that come at the same time. The easiest way to reproduce the initramfs error during startup/restart is to install Thunderbird and download thousands of emails from a cloud service, e.g. Google Mail, to generate traffic on the SSD. Then install and start Firefox, add video plugins (Player) and start a video. Firefox for Linux (the last version was somewhere around 57-61) is unstable on Linux (generally, not only on Ubuntu), so Firefox begins to crash and asks "Would you like to restart and recover Firefox?". This stresses the SSD, and after a few restores (about 10) Thunderbird will probably begin to crash. That is the time to restart the system. There will probably be a message that it is not possible to start Ubuntu, and an initramfs error occurs.
Lucas Zanella (lucaszanella) wrote : | #140 |
It's worth saying that the ASPM patch + the 1500 kernel parameter worked for me for over a month without a single error. After updating to 18.04, I now see the error every 2 or 3 days. Actually, in the middle of the upgrade to 18.04 it gave the error right on the initramfs update, which is where it always gives the error. This is sad; it was working perfectly except inside the VMs, but it was very stable :(
Sam (samr28) wrote : | #142 |
I'm also running the ASPM patch and haven't had problems for the last month or so. Any idea when this will get merged?
Janne Peltonen (janne-peltonen) wrote : | #143 |
Stumbled onto this bug from somewhere else, and noticed that I seem to have the same Samsung SSD, the SM961/PM961 (same output from lspci -vvnn regarding the NVMe as Lucas posted). However, for me it has worked without any problems on stock Ubuntu 18.04 / Mint / Kubuntu installations. Perhaps it depends on the system configuration as well, not just the SSD? Not sure this information helps, but I thought I'd post it anyway.
Fabian (fabiangieseke) wrote : | #144 |
I have had the same issues with Ubuntu 18.04 and a Samsung MZ-V7E1T0 1000GB M.2 PCI Express 3.0 and the default installation (ext4): Plenty of errors, especially when upgrading/
I have reinstalled the whole system. Instead of the standard journaling file system (ext4), I have btrfs for the root mount point (/). System works perfectly now, no errors for a couple of days with plenty of software being installed.
Not sure, might be an ext4/kernel bug (?).
Janne Peltonen (janne-peltonen) wrote : | #145 |
To add to my previous comment, I've been running ext4 all the time.
Lucas Zanella (lucaszanella) wrote : | #146 |
Fabian, did you have any problems installing ubuntu? Mine would give disk errors about 7/10 times I tried to install. I had to try many times until no error appeared.
I'd like to try btrfs but I don't have the time to do it right now. I also had problems with apt, but when upgrading the system. It'd always give the error in the initramfs update, or something like that.
I'll try to install a fresh ubuntu 18.04 soon too, as Janne suggested.
Fabian (fabiangieseke) wrote : | #147 |
I have tried two things:
(1) Fresh install, Ubuntu 18.04 (about ten days ago), ext4. No errors during the installation. However, when installing stuff via apt afterwards (or upgrading), I got many errors along the lines described above (e.g., "compressed data is corrupt... unexpected end of file or stream"). This happened for, I guess, arbitrary packages. No errors for initramfs update for me ...
(2) Fresh install, Ubuntu 18.04 (about four days ago), btrfs for /. No errors at all.
Ole Christian Nilsen (oc-nilsen) wrote : | #148 |
I have this bug (MSI laptop, Ubuntu Studio 18.04) and it's getting quite annoying to be honest. If there's anything I can do to help remedy the situation within reasonable time (I'm about to reinstall) then let me know.
Lucas Zanella (lucaszanella) wrote : | #149 |
Hi Ole Christian. First, did you have any problems in the ubuntu installation? In mine I had to try to install several times until it installed without any disk errors.
Also, you can try this kernel https:/
I guess someone is working on this bug for a definitive solution...
Ole Christian Nilsen (oc-nilsen) wrote : | #150 |
Hi Lucas! Thanks for the reply. No, I had no problems during installation. The computer just shuts down at random intervals to a black screen with all kinds of EXT4-fs errors and reports that the file system is read only. Often the disk isn't even recognized at reboot, so I have to boot into a live environment and use Gparted to fix it from there.
I do music production professionally, so if I can't get it fixed relatively easily and permanently then I'll have to look elsewhere unfortunately.
Thanks though. :)
Lucas Zanella (lucaszanella) wrote : | #151 |
Ok Christian, thanks for the info. You can try the kernel for now. I also read that using Ubuntu with a btrfs filesystem instead of ext4 solves the problem, so you could try that.
Ole Christian Nilsen (oc-nilsen) wrote : | #152 |
I may try that. Are we sure it's a kernel bug though? I can't remember having this problem when I used Solus OS for a while. But I may not have used it for long enough, since I discovered it didn't support Jack2 and was pretty much unusable for me.
pleban (marek-zebrowski-gmail) wrote : | #153 |
I can confirm this bug with two different NVMe drives, a Samsung EVO970 and a WD Black, on the 4.18.0-10-generic and 4.15.0-20-generic kernels. The motherboard has an Intel H270 chipset.
Ole Christian Nilsen (oc-nilsen) wrote : | #154 |
I have the WD Black 256 GB drive.
pleban (marek-zebrowski-gmail) wrote : | #155 |
- nvme-heavy-write-error.txt Edit (5.4 KiB, text/plain)
The attachment contains the error from the dmesg output. For me, the reproduction steps are: write a large (>10G) amount of data to the NVMe SSD.
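The heavy-write reproduction described above can be sketched roughly as follows. This is only an illustration, not the procedure behind the attached log: the function name `write_and_verify`, the chunk size, and any path passed to it are assumptions. It writes random data in chunks, forces it to the device with `fsync`, then reads the file back and compares SHA-256 digests.

```python
import hashlib
import os


def write_and_verify(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write total_bytes of random data to path, then re-read and compare hashes."""
    written = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = os.urandom(min(chunk_bytes, remaining))
            f.write(chunk)
            written.update(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data to the device before reading back

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_bytes), b""):
            read_back.update(chunk)
    return written.hexdigest() == read_back.hexdigest()
```

A call such as `write_and_verify("/mnt/nvme/test.bin", 10 * 1024**3)` would match the ">10G" figure; the mount point is hypothetical. The read-back may be served from the page cache, so a mismatch is meaningful but a match does not prove the data on the drive is intact. Checking `dmesg` afterwards is still needed.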
Kai-Heng Feng (kaihengfeng) wrote : | #156 |
What's the PCI ID for EVO 970 and WD Black?
Kai-Heng Feng (kaihengfeng) wrote : | #157 |
If you use Samsung (144d:a804) or Sk Hynix (1c5c:1285), please try kernel in [1].
pleban (marek-zebrowski-gmail) wrote : | #158 |
My Samsung is indeed [144d:a808]. I'll check WD later on - it's not connected at this time.
I was not able to reproduce this bug using Clear Linux current kernel (4.18.16-645).
pleban (marek-zebrowski-gmail) wrote : | #159 |
I checked kernel https:/
So it looks like success! I'll keep using that kernel for now and report if any problems arise.
Kai-Heng Feng (kaihengfeng) wrote : | #160 |
The kernel doesn't do anything special for 144d:a808, it's for 144d:a804.
pleban (marek-zebrowski-gmail) wrote : | #161 |
Then I'm puzzled. I'll retest later with WD.
Janne Peltonen (janne-peltonen) wrote : | #162 |
- lspci -vvnn output Edit (3.3 KiB, text/plain)
Here is the output of lspci -vvnn on my computer. It's from the 256GB version of the samsung NVMe.
On my systems I've never had any corruption problems, even moving large (60GB+) VM files and installing OS on ext4 multiple times. Currently running stock LM 19. Hope this helps.
Ole Christian Nilsen (oc-nilsen) wrote : | #163 |
lshw output:
*-storage
bus info: pci@0000:04:00.0
lspci output:
04:00.0 Non-Volatile memory controller: Sandisk Corp WD Black NVMe SSD
Ole Christian Nilsen (oc-nilsen) wrote : | #164 |
lspci -vvnn:
04:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Black NVMe SSD [15b7:5001] (prog-if 02 [NVM Express])
Subsystem: Marvell Technology Group Ltd. WD Black NVMe SSD [1b4b:1093]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 16
NUMA node: 0
Region 0: Memory at df100000 (64-bit, non-prefetchable) [size=16K]
Kernel driver in use: nvme
Kernel modules: nvme
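Since the discussion above hinges on exact PCI IDs (144d:a804 vs 144d:a808, and 15b7:5001 here), a small sketch for pulling the vendor:device pair out of saved `lspci -nn` or `lspci -vvnn` text may help when comparing reports. The function name `nvme_pci_ids` is an assumption for illustration; the line format is taken from the output quoted above.

```python
import re

# Matches a PCI vendor:device pair such as [15b7:5001] in `lspci -nn` output.
PCI_ID = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def nvme_pci_ids(lspci_text):
    """Return (vendor, device) pairs for lines describing NVMe controllers."""
    ids = []
    for line in lspci_text.splitlines():
        # Only the device header line carries the class name; Subsystem lines are skipped.
        if "Non-Volatile memory controller" not in line:
            continue
        matches = PCI_ID.findall(line)
        if matches:
            ids.append(tuple(part.lower() for part in matches[-1]))
    return ids
```

The class code `[0108]` and `[NVM Express]` do not match the pattern because they lack the `xxxx:xxxx` shape, so only the device ID is returned.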
Ole Christian Nilsen (oc-nilsen) wrote : | #165 |
I should probably mention that while it is installed in my laptop it is not currently being used as I had to revert to using an ordinary HDD.
Lucas Zanella (lucaszanella) wrote : | #166 |
Any news on this problem? I'm still having it.
Richard Grieves (trickydickie) wrote : | #167 |
I too am having an SSD corruption issue with Ubuntu 18.04, with exactly the same symptoms. I have a Kingston 480 GB SSD, not NVMe, connected over SATA. My PC is a desktop; I have attached the output of lspci -vvnn. I have to run a manual fsck every 1.5 weeks or so. When I am using my PC, it occasionally freezes for about 15 seconds with very high SSD I/O usage. I have attached an iotop log which recorded a freeze at around 18:03:22 (the log records every second, and you will see a gap between the samples at 18:03:22 and 18:03:35, which indicates the freeze, followed by 90%+ I/O). I have included my SSD SMART info as well as my current lsblk output below:
=== START OF INFORMATION SECTION ===
Device Model: KINGSTON SA400S37480G
Serial Number: 50026B76825B4FA0
LU WWN Device Id: 5 0026b7 6825b4fa0
Firmware Version: SBFKB1C2
User Capacity: 480,103,981,056 bytes [480 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Feb 25 17:57:34 2019 SAST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
lsblk:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 3.7M 1 loop /snap/gnome-
loop1 7:1 0 13M 1 loop /snap/gnome-
loop2 7:2 0 91M 1 loop /snap/core/6350
loop3 7:3 0 3.7M 1 loop /snap/gnome-
loop4 7:4 0 2.3M 1 loop /snap/gnome-
loop5 7:5 0 140.7M 1 loop /snap/gnome-
loop6 7:6 0 270.5M 1 loop /snap/pycharm-
loop7 7:7 0 86.9M 1 loop /snap/core/4917
loop8 7:8 0 91M 1 loop /snap/core/6405
loop9 7:9 0 14.5M 1 loop /snap/gnome-logs/45
loop10 7:10 0 140.7M 1 loop /snap/gnome-
loop11 7:11 0 13M 1 loop /snap/gnome-
loop12 7:12 0 14.5M 1 loop /snap/gnome-logs/37
loop13 7:13 0 2.3M 1 loop /snap/gnome-
loop14 7:14 0 34.7M 1 loop /snap/gtk-
loop15 7:15 0 34.6M 1 loop /snap/gtk-
loop16 7:16 0 140.9M 1 loop /snap/gnome-
loop17 7:17 0 34.8M 1 loop /snap/gtk-
sda 8:0 0 447.1G 0 disk
└─sda1 8:1 0 447.1G 0 part /
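The freeze described in this comment shows up as a gap in a log that is otherwise sampled once per second. A rough sketch of finding such gaps is below. The attached iotop log format is not shown in this thread, so the sketch assumes the `HH:MM:SS` timestamps have already been extracted into a list and all fall on the same day; `find_gaps` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_seconds=1, tolerance_seconds=1):
    """Return (start, end, seconds) for consecutive samples further apart than expected."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    limit = timedelta(seconds=expected_seconds + tolerance_seconds)
    gaps = []
    for prev, cur in zip(parsed, parsed[1:]):
        delta = cur - prev
        if delta > limit:
            gaps.append(
                (prev.strftime("%H:%M:%S"), cur.strftime("%H:%M:%S"), int(delta.total_seconds()))
            )
    return gaps
```

With the two samples mentioned above, 18:03:22 and 18:03:35, this reports a single 13-second gap.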
Richard Grieves (trickydickie) wrote : | #168 |
Richard Grieves (trickydickie) wrote : | #169 |
My issue has been resolved by upgrading the firmware of my SSD from SBFKB1C2 to SBFKB1C3.
https:/
Lucas Zanella (lucaszanella) wrote : | #170 |
Just tried Ubuntu 19 today and the problem persists (I can't even install Ubuntu because it gives an I/O error).
Lucas Zanella (lucaszanella) wrote : | #171 |
Hi Kai-Heng Feng, do you have any news on this problem? It'd be great to know.
Thank you so much!
Fabian (fabiangieseke) wrote : | #172 |
Hi,
a little update from my side: It seems that faulty memory was the reason for the data corruption in my case. I have replaced the memory module and everything seems to work fine now. I was quite surprised that the memory was defective, though, since I had tested it carefully for many hours with memtest (20+ passes without any errors). The errors only occurred when running Ubuntu ...
The memory was the only thing I have changed, so I am very sure that this was the cause ...
Kai-Heng Feng (kaihengfeng) wrote : | #173 |
Lucas,
Do you still have this issue on mainline kernel?
Lucas Zanella (lucaszanella) wrote : | #174 |
I tried the Ubuntu 19.04 installer and I couldn't even install it because of I/O errors. Does the Ubuntu 19.04 installer use the new kernel?
Lucas Zanella (lucaszanella) wrote : | #175 |
Hi Kai-Heng Feng, I just installed kernel 5.1.1 and the error still happens
Kai-Heng Feng (kaihengfeng) wrote : | #176 |
WinEunuchs2Unix (ricklee518) wrote : | #177 |
I've been using NVMe M.2 Samsung Pro 960 for 18 months and never had a problem.
Ubuntu 16.04.6 LTS, Kernel 4.14.114 LTS, Skylake 6700HQ, nVidia 970m
UEFI, GPT, AHCI (Intel Raid off), Secure Boot off
`/etc/fstab`:
UUID=b40b3925-
UUID=D656-F2A8 /boot/efi vfat umask=0077 0 1
UUID=b4512bc6-
`/etc/default/
GRUB_CMDLINE_
I've never had a single fsck error ever. Granted the `grub` boot option `fastboot` means `fsck` is not run on boot but I can check once FS is mounted RW with:
$ sudo fsck -n /dev/nvme0n1p6
fsck from util-linux 2.27.1
e2fsck 1.42.13 (17-May-2015)
Warning! /dev/nvme0n1p6 is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
New_Ubuntu_16.04: clean, 712096/2953920 files, 5733245/11829504 blocks
Assuming your `/etc/fstab` is the same, the two important `grub` boot parameters are: `acpiphp.disable=1 pcie_aspm=force`. If memory serves me correctly, though, these were set up for suspend/resume reasons.
I hope this helps those affected by this bug a little, but more importantly that people realize the vast majority of NVMe installations work fine in Linux.
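When comparing setups like the one above, it helps to confirm which parameters the running kernel actually booted with, since an edit to the GRUB defaults only takes effect after `update-grub` and a reboot. A minimal sketch, assuming the standard `/proc/cmdline` file; the helper names are illustrative, and quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        # Split on the first "=" only, so root=UUID=... keeps its full value.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_running_cmdline(path="/proc/cmdline"):
    """Parse the command line of the currently running kernel."""
    with open(path) as f:
        return parse_cmdline(f.read())
```

For example, `read_running_cmdline().get("pcie_aspm")` returns `"force"` on a system booted with the parameters quoted above, and `None` if the parameter is absent.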
Changed in linux (Ubuntu): | |
assignee: | Kai-Heng Feng (kaihengfeng) → nobody |
tags: | added: cscc |
Lucas Zanella (lucaszanella) wrote : | #178 |
Hi, I didn't find the root of the problem, **BUT**...
Using Qubes OS I was able to run for more than 7 days without any problems! It normally would occur in the first hour of usage.
I guess Qubes's Xen drivers proxy the PCIe requests, and therefore the failing NVMe/PCIe drivers aren't used. So it at least sheds some light on the problem.
Maybe someone with better understanding of how Xen works can reason about which drivers are being used and which are not and discover the root of the problem!
Juan Carlos Carvajal Bermúdez (jucajuca) wrote : | #179 |
I have exactly the same drive:
/dev/nvme0n1 S444NY0K600040 SAMSUNG MZVLB256HAHQ-00000 1 81.09 GB / 256.06 GB 512 B + 0 B EXD7101Q
and exactly the same problem.
I filed a bug before deactivating AER (pci=noaer)
https:/
Lucas Zanella (lucaszanella) wrote : | #180 |
Hi Juan, what computer are you on?
If you really want to use your computer with Linux, the only thing that solved it for me was to use Qubes OS.
trong luu (tronglx) wrote : | #181 |
Hi Lucas, I have the same problem. My laptop is a MateBook X Pro 2018 with a LITEON CA3-8D512 NVMe drive. I work on Linux, and this is an unpleasant experience. Currently, when the issue occurs, I power my laptop off and on (once or more) and it then works normally for a few hours. Do you have any other suggestions regarding Linux distributions?
Lucas Zanella (lucaszanella) wrote : | #182 |
Hi tronglx, the only way I found was to install Qubes OS
trong luu (tronglx) wrote : | #183 |
Thanks Lucas, have you tried Arch Linux? Qubes OS is very unfamiliar to me. I'm a developer, and the OS community is very important to me.
Lucas Zanella (lucaszanella) wrote : | #184 |
Hi tronglx,
I didn't try Arch, but I THINK I tried Manjaro, which is based on it. If I did, it didn't work. I remember trying lots of Linux distributions, and all of them failed.
Qubes OS works because Linux doesn't run directly on the hardware there; it runs on top of the Xen hypervisor, so somehow that sidesteps the bug. You can install Arch as a Qubes VM: there's a script for it, you just run it and it generates an image that you can install. They also provide Ubuntu, Kali and others.
Since this bug is rare I don't think they'll try to fix it; the guy that was helping here gave up.
trong luu (tronglx) wrote : | #185 |
Thanks Lucas,
It just happened on my laptop again. I will try to find a solution.
trong luu (tronglx) wrote : | #186 |
I switched to recovery mode and ran: mount -o remount,rw /. The problem no longer appears; it seems to be fixed.
trong luu (tronglx) wrote : | #187 |
The error still happens.
Lucas Zanella (lucaszanella) wrote : | #188 |
Hi trong luu, for me the error happened every day, which is why I ended up using Qubes. It's the only way that I could find, other than Windows.
You can try older kernels, but that didn't work for me. Remember that downloading older Ubuntu releases will still give you a recent kernel; you have to downgrade it yourself. However, no Ubuntu or Debian kernel fixed the problem for me.
trong luu (tronglx) wrote : | #189 |
I think a different type of SSD is the last option. But I really want to find the root cause of the problem. As I understand it, the system boots with Opts: errors=remount-ro. Then something goes wrong, and the system switches to read-only mode to protect the file system. Have you checked the system log for anything abnormal? NVMe is becoming more and more popular. This is a big problem for Linux users.
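The `errors=remount-ro` behaviour described above can be observed from userspace: after the remount, the filesystem's entry in `/proc/mounts` carries the `ro` option instead of `rw`. A minimal sketch of that check follows; the function name `readonly_mounts` is an assumption, and it works on text so it can also be run against a saved copy of `/proc/mounts`.

```python
def readonly_mounts(mounts_text, fstypes=("ext4",)):
    """Return (device, mountpoint) for filesystems currently mounted read-only."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        device, mountpoint, fstype, options = fields[:4]
        # Compare whole options, so "errors=remount-ro" is not mistaken for "ro".
        if fstype in fstypes and "ro" in options.split(","):
            result.append((device, mountpoint))
    return result
```

Running it as `readonly_mounts(open("/proc/mounts").read())` from a periodic job gives a timestamp for when the remount happened, which can then be matched against `dmesg` or the system log.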
Juan Carlos Carvajal Bermúdez (jucajuca) wrote : | #190 |
For anyone struggling with this hideous bug, try the following:
add "nvme_core.
GRUB_CMDLINE_
then run "update-grub"
My laptop has been running smoothly for a week now. (/dev/nvme0n1 S444NY0K600040 SAMSUNG MZVLB256HAHQ-00000 1 81.09 GB / 256.06 GB 512 B + 0 B EXD7101Q)
see more infos here: https:/
@kernel developers, wouldn't it be great to detect such disks and automatically lower the nvme_core.
trong luu (tronglx) wrote : | #191 |
Hi Juan, I have tried your suggestion many times but the problem still happens. I also tried with nvme_core.
https:/
Juan Carlos Carvajal Bermúdez (jucajuca) wrote : | #192 |
Hi
try:
cat /sys/module/
sudo nvme get-feature -f 0x0c -H /dev/nvme0
Please read the provided link carefully. The info is there.
trong luu (tronglx) wrote : | #193 |
After running cat /sys/module/
sudo nvme get-feature -f 0x0c -H /dev/nvme0n1p2
get-feature:0xc (Autonomous Power State Transition), Current value:00000000
Autonomous Power State Transition Enable (APSTE): Disabled
Auto PST Entries .................
Entry[ 0]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 1]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 2]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 3]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 4]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 5]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 6]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 7]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 8]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 9]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[10]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[11]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[12]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[13]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[14]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[15]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[16]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[17]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[18]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[19]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[20]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
...
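The output above is long but says one thing: APST is disabled and every table entry is zero. A small sketch for summarizing such output is below. It assumes the text layout shown in this comment (other nvme-cli versions may format it differently), and `summarize_apst` is a hypothetical helper name.

```python
import re


def summarize_apst(output):
    """Return (apste_enabled, nonzero_entries) from `nvme get-feature -f 0x0c -H` text."""
    enabled = None  # stays None if the APSTE line is not present
    m = re.search(r"\(APSTE\):\s*(\w+)", output)
    if m:
        enabled = m.group(1).lower() == "enabled"
    # Collect every "Idle Time Prior to Transition" value, in milliseconds.
    itpt = [int(v) for v in re.findall(r"\(ITPT\):\s*(\d+)\s*ms", output)]
    nonzero = sum(1 for v in itpt if v != 0)
    return enabled, nonzero
```

For the output posted here the result would be `(False, 0)`: the drive is not making autonomous power state transitions at the time of the query.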
trong luu (tronglx) wrote : | #194 |
Hi Lucas, would it be OK to install Windows and use Ubuntu in VMware?
Lucas Zanella (lucaszanella) wrote : | #195 |
Hi trong luu, I didn't test it, but I think that it depends on the way VMware virtualizes access to the disk. There may be multiple ways, one of which will work.
trong luu (tronglx) wrote : | #196 |
Hi Lucas, have you tried a new SSD? I don't think this is a hardware issue. My SSD's Power Cycles count is only 807. Eventually, if there is no other solution, I think I will buy a new SSD. Do you know which type of SSD works properly with Linux?
Smartctl output:
sudo smartctl -t long -a /dev/nvme0n1p2
smartctl 6.6 2016-05-31 r4324 [x86_64-
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontoo
=== START OF INFORMATION SECTION ===
Model Number: LITEON CA3-8D512
Serial Number: 0028104000DN
Firmware Version: C49640A
PCI Vendor ID: 0x14a4
PCI Vendor Subsystem ID: 0x1b4b
IEEE OUI Identifier: 0x002303
Total NVM Capacity: 512,110,190,592 [512 GB]
Unallocated NVM Capacity: 0
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Thu Dec 19 08:54:28 2019 +07
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x001f): Security Format Frmw_DL NS_Mngmt *Other*
Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size: 32 Pages
Warning Comp. Temp. Threshold: 83 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 8.00W - - 0 0 0 0 0 0
1 + 4.50W - - 1 1 1 1 5 5
2 + 3.00W - - 2 2 2 2 5 5
3 - 0.0700W - - 3 3 3 3 1000 5000
4 - 0.0100W - - 4 4 4 4 5000 45000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 - 512 0 1
1 - 4096 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning: 0x00
Temperature: 47 Celsius
Available Spare: 100%
Available Spare Threshold: 0%
Percentage Used: 0%
Data Units Read: 5,773,150 [2.95 TB]
Data Units Written: 6,405,757 [3.27 TB]
Host Read Commands: 78,674,228
Host Write Commands: 91,754,035
Controller Busy Time: 10,405
Power Cycles: 807
Power On Hours: 312
Unsafe Shutdowns: 104
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 47 Celsius
Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged
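For comparing SMART reports between affected machines, the `Key: value` lines of the health section above can be turned into a dictionary. This is a rough sketch assuming the smartctl text layout shown in this comment; `parse_smart_health` and `leading_int` are illustrative names, and `leading_int` is only meant for decimal values (it would fail on `0x00`).

```python
def parse_smart_health(text):
    """Map 'Key: value' lines from smartctl's NVMe output to a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as timestamps contain more.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def leading_int(value):
    """Parse the leading integer of a value such as '5,773,150 [2.95 TB]' or '100%'."""
    token = value.split()[0].replace(",", "")
    return int(token.rstrip("%"))
```

In the report above, `Unsafe Shutdowns` is 104 against 807 `Power Cycles`, roughly 13%, which fits the forced power-offs described earlier in the thread, while `Media and Data Integrity Errors` is 0.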
Lucas Zanella (lucaszanella) wrote : | #197 |
Hi trong luu. I did buy a new SSD because I thought mine was faulty. However, I bought one of the same brand (Samsung); it didn't occur to me to buy another brand. Anyway, the brand new SSD also has the problem.
In my case it is definitely not a hardware problem. With Linux the problem happens every day, sometimes more than once per day. With Windows the error never happened, and with Qubes I have been running for more than 2 months without any problems. So it's not the hardware; something is definitely wrong with the Linux kernel.
trong luu (tronglx) wrote : | #198 |
Thanks Lucas, I think I will buy another type of SSD. Do you have any suggestions?
Lucas Zanella (lucaszanella) wrote : | #199 |
The other 2 good brands I know are Corsair and WD Black. Don't buy Samsung, the majority of people with this problem have Samsung
trong luu (tronglx) wrote : | #200 |
Thank you. My SSD is a LITEON CA3-8D512, not a Samsung. So, should I buy another type of SSD? A non-NVMe one?
Lucas Zanella (lucaszanella) wrote : | #201 |
Non-NVMe SSDs are pretty slow, like 8 times slower. Stick with NVMe, and if nothing works install Qubes.
trong luu (tronglx) wrote : | #202 |
Thanks Lucas.
Craigums Carlonious (craigsidcarlson) wrote : | #203 |
It's 2020, is there still no solution to this problem? Getting this error with ubuntu 18 LTS and 19
Kai-Heng Feng (kaihengfeng) wrote : | #204 |
Craigums Carlonious,
Is the system exactly the same?
Craigums Carlonious (craigsidcarlson) wrote : | #205 |
Hi, yes I also am trying to install onto the razer blade stealth like a lot of the other people above which have the SAMSUNG MZVLB256HAHQ-00000 nvme ssd. Getting the I/O Error and have tried most of the fixes mentioned above, but no luck, and I would rather continue using Windows instead of Qubes.
Kai-Heng Feng (kaihengfeng) wrote : | #206 |
Can you please attach `sudo nvme id-ctrl /dev/nvme0`?
Lucas Zanella (lucaszanella) wrote : | #207 |
Hi Kai-Heng Feng, please note that after I installed Qubes, I never ever had the problem again. It may be useful in the debug process, and maybe the way Xen, PCIe and Linux work together in Qubes can give a hint on what's happening. Thank you for all your help to this day.
Juan Carlos Carvajal Bermúdez (jucajuca) wrote : | #208 |
an update on this:
it was actually pcie_aspm=off that helped to solve the problem.
I think the problem is related to the power management of PCIe ports.
Without pcie_aspm=off I started seeing errors like the following ones:
- [drm:intel_
pcieport 0000:00:1d.0: AER: Corrected error received: 0000:00:1d.0
pcieport 0000:00:1d.0: AER: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
pcieport 0000:00:1d.0: AER: device [8086:a330] error status/
I think the bug is not with the nvme controller but somewhere in ASPM. But I am not a kernel developer.
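To judge whether a setting such as pcie_aspm=off actually changes anything, it can help to count the corrected AER messages per PCIe port before and after. The sketch below assumes the message format quoted in this comment (it differs between kernel versions) and works on saved `dmesg` text; `count_corrected_aer` is a hypothetical name.

```python
import re
from collections import Counter

# Format as quoted above: "pcieport <port>: AER: Corrected error received: <device>"
AER_LINE = re.compile(r"pcieport (\S+): AER: Corrected error received: (\S+)")


def count_corrected_aer(dmesg_text):
    """Count 'Corrected error received' AER messages per reporting device."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = AER_LINE.search(line)
        if m:
            counts[m.group(2)] += 1
    return counts
```

A port that keeps accumulating corrected Physical Layer errors points at the link rather than the filesystem, which is in line with the ASPM suspicion voiced above.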
Lucas Zanella (lucaszanella) wrote : | #209 |
Juan, which hardware are you on? Razer?
Juan Carlos Carvajal Bermúdez (jucajuca) wrote : | #210 |
No, I have a laptop from XMG, a German brand.
Ramon Fontes (ramonreisfontes) wrote : | #211 |
Hello all!
I'm experiencing the same problem with an adata SU800NS38. My SSD works fine with the 4.17.0-
Ramon Fontes (ramonreisfontes) wrote : | #212 |
BTW, pcie_aspm=off and nvme_core.
Lucas Zanella (lucaszanella) wrote : | #213 |
Ramon, which hardware is yours? Razer?
Ramon Fontes (ramonreisfontes) wrote : | #214 |
I have a Dell Inspiron 14 5000 Series-5480. The strangest thing is that I bought my laptop about 1 year ago and installed Ubuntu 18.04.1 with kernel 4.15.0-65-xxx (default installation), and everything worked as expected. However, the same problem happened with any other kernel version (including 4.17.0-
Then, after having some problems with my system, I installed Ubuntu 18.04.4. The kernel version installed with the system was 5.3, and after observing the same problem with the disk I tried installing 4.15.0-65, but that did not solve the problem (I don't remember exactly which kernel version I had the first time, i.e. what the xxx was). Finally, I've found that 4.17.0-
[1] https:/
[2] https:/
Lucas Zanella (lucaszanella) wrote : | #215 |
I also had this problem: it worked for a year, then I updated and it stopped working. Then I rolled back the kernel and it still wouldn't work.
Kai-Heng Feng (kaihengfeng) wrote : | #216 |
Ramon,
Please file a separate bug since it's platform specific.
Ramon Fontes (ramonreisfontes) wrote : | #217 |
I thought I could help in some way with more information. By the way, I've found the solution and my SSD works fine right now. You may want to take a look at https:/
Lucas Zanella (lucaszanella) wrote : | #218 |
I just want to say that after 2 years I remembered I had an SSD from a different brand than Samsung, a Kingston one. I installed it in my Razer and it worked perfectly for days; I ran several SSD stress tests with no errors.
The error is definitely with Samsung AND Linux. And it's not a faulty SSD, because it happens on both of my Samsung SSDs. It does not happen on Windows or Qubes with any SSD.
I tested the latest Ubuntu 21.04 and the problem still happens on Samsung SSDs, right on the installation screen.
Anyway, I'm not even using this computer anymore (I bought a Dell XPS 13), but the error persists and it's either Samsung's or Linux's fault. Probably Samsung's, since other brands work OK with Linux.
Anthony Durity (anthony-durity) wrote : | #219 |
- omg_dmesg_b0rked.txt (109.7 KiB, text/plain)
I've hit this "bug". I have a nice Clevo ODM-based laptop, and luckily it has two NVMe drives, so it's not a show-stopper for me, but obviously it's a concern. I have an Intel one, which is the boot drive, and a Samsung one, which is the data drive. I have a dual-boot setup, so there are two data points to note: the Intel NVMe works in both Windows and Linux; the Samsung works in Windows, but not in Linux. When I say that it doesn't work in Linux, I should say that the system brings the drive up, I can mount it read-write, and everything looks good, but as soon as I try to write files to it, it craps out with nothing written:
[369.798910] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
[369.798916] nvme nvme0: Does your device have a faulty power saving mode enabled?
[369.798918] nvme nvme0: Try "nvme_core.
[369.870912] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[369.871064] nvme nvme0: Removing after probe failure status: -19
[369.890931] nvme0n1: detected capacity change from 1953525168 to 0
Output of `dmesg` attached.
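For anyone comparing logs across machines, here is a minimal sketch in Python that picks the "controller is down; will reset" lines out of saved `dmesg` text and decodes the CSTS and PCI_STATUS values. The function name `find_nvme_resets` and the `SAMPLE` text are made up for illustration (the sample is copied from the lines quoted above); the regular expression only assumes the message format shown in this comment and may need adjusting for other kernel versions.

```python
import re

# Matches kernel log lines of the form quoted in this comment, e.g.
# [369.798910] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
RESET_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+nvme (?P<ctrl>nvme\d+): controller is down; "
    r"will reset: CSTS=(?P<csts>0x[0-9a-fA-F]+), PCI_STATUS=(?P<pci>0x[0-9a-fA-F]+)"
)


def find_nvme_resets(log_text):
    """Return one dict per 'controller is down' line found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = RESET_RE.match(line.strip())
        if match:
            events.append({
                "timestamp": float(match.group("ts")),
                "controller": match.group("ctrl"),
                "csts": int(match.group("csts"), 16),
                "pci_status": int(match.group("pci"), 16),
            })
    return events


# Hypothetical sample input, taken from the log lines quoted above.
SAMPLE = """\
[369.798910] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
[369.798916] nvme nvme0: Does your device have a faulty power saving mode enabled?
[369.870912] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[369.871064] nvme nvme0: Removing after probe failure status: -19
"""

if __name__ == "__main__":
    for event in find_nvme_resets(SAMPLE):
        print(event)
```

To run it against a real log, save the output of `dmesg` to a file and pass the file contents to `find_nvme_resets` instead of `SAMPLE`.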
tetebueno (tetebueno) wrote : | #220 |
Bump:
PC: Lenovo Legion Y520-15IKBN
SSD: Samsung SM951 M.2 PCIe SSD Drive (MZ-HPV256)
OS: Elementary OS 7.1 Horus (Ubuntu 22.04.1)
Kernel: 6.5.0-14-generic
---
lshw relevant parts:
computer
description: Notebook
product: 80WK (LENOVO_
vendor: LENOVO
version: Lenovo Y520-15IKBN
serial: PF0UM7F3
width: 64 bits
capabilities: smbios-3.0.0 dmi-3.0.0 smp vsyscall32
configuration: administrator_
(...)
---
Update: I first tried the changes in comment #190, but that didn't work; the error persisted. Then I tried adding only the pcie_aspm=off parameter (removing the nvme_core.
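When testing boot parameters like this, it helps to confirm that the running kernel actually received them. Below is a minimal sketch in Python that parses a kernel command line (as exposed in `/proc/cmdline`) and checks for `pcie_aspm=off`. The helper names `kernel_params` and `aspm_disabled` are hypothetical, and the parser splits on whitespace only, so it is an assumption that no parameter value contains quoted spaces.

```python
import os


def kernel_params(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: tokens are split on whitespace, so quoted values
    containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def aspm_disabled(cmdline):
    """Return True if the command line contains pcie_aspm=off."""
    return kernel_params(cmdline).get("pcie_aspm") == "off"


if __name__ == "__main__":
    path = "/proc/cmdline"  # command line of the running kernel on Linux
    if os.path.exists(path):
        with open(path) as handle:
            print("pcie_aspm=off active:", aspm_disabled(handle.read()))
    else:
        print(path, "not found; this check only works on Linux")
```

If the script reports that the parameter is missing after a reboot, the bootloader configuration was probably not regenerated or a different boot entry was used.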
Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.15 kernel[0].
If this bug is fixed in the mainline kernel, please add the following tag: 'kernel-fixed-upstream'.
If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".
Thanks in advance.
[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15