trim does not work with Samsung 840 EVO after firmware update (EXT0DB6Q)

Bug #1449005 reported by Marcuz93 on 2015-04-27
This bug affects 35 people
Affects          Importance  Assigned to
fstrim (Ubuntu)  Undecided   Unassigned
  Trusty         Undecided   Unassigned
  Utopic         Undecided   Unassigned
  Vivid          Undecided   Unassigned
linux (Ubuntu)   High        Unassigned
  Trusty         High        Dave Chiluk
  Utopic         High        Dave Chiluk
  Vivid          High        Dave Chiluk

Bug Description

Hi,
after updating my Samsung SSD 840 EVO to the latest firmware (EXT0DB6Q), the fstrim command doesn't work anymore. Besides, there are a lot of file system errors and "bad sectors" reports. The problem is certainly caused by Samsung's firmware, but I reported it as a bug because in Windows TRIM works normally after the upgrade. I have attached an image uploaded by a user who is experiencing the same problem. If you execute the command "sudo fstrim /", the console gets stuck. If you add that command to rc.local, the boot is compromised. I really hope this problem can be solved.
Thanks

===
break-fix: - 6fc4d97a4987c5d247655a157a9377996626221a
break-fix: - 9a9324d3969678d44b330e1230ad2c8ae67acf81

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in fstrim (Ubuntu):
status: New → Confirmed
rozwell (rozwell69) wrote :

Same here, and in addition to the image with errors above I've attached some other errors I actually captured.
In my case, I had to disable the discard option in /etc/fstab because the system wasn't able to boot anymore.

delfi (korkyra52) wrote :

Sadly, same trouble. fstrim fails and taints ext4 partition, discard prevents mount.

Robert Hooker (sarvatt) wrote :

What kernel version are you using? Not seeing any problems at all with 3.16.0-36-generic from 14.04.2

3.19.0-15 from latest Ubuntu, but the same problem occurred with Mint 17.1 which is based on 14.04.2

rozwell (rozwell69) wrote :

I moved from kernel 3.19.3 to 4.0.1 and the problem is still there.
There were no issues on previous firmware.
Looks like TRIM simply doesn't work on EVO now.
It's not a valid Ubuntu bug; maybe they can fix it in the kernel somehow, but maybe it's just time for a new drive...

As I said in the description I realize that it's probably caused by the new controller, but I reported it as a bug because in Windows I am able to trim the disk.

I have the same problem. After updating my Evo 840 to firmware EXT0DB6Q I got a lot of read errors (kernel 3.16.0-37), but the smart status did not show any errors logged. I currently run this drive in an older Thinkpad X201. Interestingly I was able to boot the system partially but it got hung up due to read errors in the middle of the process.

As an initial workaround I switched the controller from AHCI to compatibility before I found this bug report. Then I was able to boot the system without problems.

Disabling the discard option in fstab made my system work in AHCI mode again. So I can definitely confirm it is related to trimming.
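
For anyone who wants to script the fstab workaround described in the comments above, here is a minimal Python sketch. It is an illustration only: the helper names `has_discard_option` and `drop_discard` are hypothetical, the functions work on the text of a single fstab line and never touch /etc/fstab, and the rewritten line is joined with single spaces rather than preserving the original column alignment.

```python
def has_discard_option(fstab_line):
    """Return True if an fstab entry lists `discard` among its mount options."""
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        # Blank lines and comments carry no mount options.
        return False
    fields = stripped.split()
    if len(fields) < 4:
        # The options column is the fourth field of an fstab entry.
        return False
    return "discard" in fields[3].split(",")


def drop_discard(fstab_line):
    """Return the entry without `discard`; other lines come back unchanged."""
    if not has_discard_option(fstab_line):
        return fstab_line
    fields = fstab_line.split()
    options = [opt for opt in fields[3].split(",") if opt != "discard"]
    # Fall back to "defaults" so the options column is never left empty.
    fields[3] = ",".join(options) or "defaults"
    return " ".join(fields)


if __name__ == "__main__":
    # "UUID=example" is a placeholder, not a real filesystem identifier.
    print(drop_discard("UUID=example / ext4 errors=remount-ro,discard 0 1"))
```

Removing the option only stops the filesystem from issuing TRIM on every delete; it does not change what the drive firmware advertises.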

Henning Paul (hnch) wrote :

Confirmed under openSUSE 13.1. Disabling NCQ makes discard work again, so it must be the combination of TRIM with Linux's NCQ implementation. Someone should contact Samsung w.r.t. this issue.

I contacted Samsung and this was the answer:

"Dear Marco,
Thank you very much for your fast feedback and information.

We escalate to other dept. to find out the root cause and solution.

Thank you in advance for your patience.

Kind Regards"

I really hope they'll fix this issue soon.
For anyone interested in contacting them this is the email address: <email address hidden>

cki (charles-kirsch) wrote :

I confirm for Kubuntu 15.04 with 3.19.0-15-generic. With the hints in #11 and #12 I could implement a workaround with discard enabled in fstab. Instead of deactivating the controller, I only disabled NCQ in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash libata.force=noncq"
After update-grub and reboot everything seems to work.
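
The GRUB edit above can be sketched in Python. This is a rough illustration under stated assumptions: the helper name `add_kernel_param` is hypothetical, it only transforms the text of a single `GRUB_CMDLINE_LINUX_DEFAULT="..."` line, and it does not write to /etc/default/grub. You would still need to run update-grub and reboot afterwards, as described above.

```python
import re

# Splits the line into: variable name and opening quote, the parameters,
# the closing quote, and any trailing whitespace (such as a newline).
CMDLINE_RE = re.compile(r'(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")(\s*)')


def add_kernel_param(line, param):
    """Append `param` to a GRUB_CMDLINE_LINUX_DEFAULT line if it is missing."""
    match = CMDLINE_RE.fullmatch(line)
    if match is None:
        # Some other line of the file: leave it untouched.
        return line
    prefix, params, quote, trailing = match.groups()
    current = params.split()
    if param in current:
        # Already present, so applying the edit twice changes nothing.
        return line
    return prefix + " ".join(current + [param]) + quote + trailing


if __name__ == "__main__":
    print(add_kernel_param('GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"',
                           "libata.force=noncq"))
```

Keeping the helper idempotent means it can be run again after a package update rewrites the file without duplicating the parameter.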

rozwell (rozwell69) wrote :

But from what I know NCQ is important to maintain SSD performance... did you test it in both cases?

Lukas Loehrer (loehrerl) wrote :

It may not be necessary to disable NCQ completely, but only for the
TRIM command. The 3.16.0 kernel apparently already contains a blacklist of
devices for which TRIM together with NCQ causes trouble. It is located
in the file:

drivers/ata/libata-core.c

Could anyone who has already applied the firmware update please try
to add an entry like the following to the array ata_device_blacklist in the above file?

{ "Samsung SSD 840 EVO *", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }

On boot, the resulting kernel should say something like the following
for the 840 EVO:

"disabling queued TRIM support"

On my way. I am certainly pi**ed off by this Samsung FW ... see attached patch.

I will report back after testing.

Looks good so far: http://paste.ubuntu.com/10977400/

Without that workaround it would not even boot without severe FS corruption. I use discard from /etc/fstab.

rozwell (rozwell69) wrote :

Amazing, I've encountered those issues on regular use, without TRIM enabled.
At least that's what I think, because I had to force-shutdown the laptop.
I guess this is it for me.

Okay, some further info:

I built Kodi from source (complete ffmpeg, a lot of dependencies, a whole lot of small files) with 4 threads. No issues so far. I am certainly no kernel dev, though, so someone with more clue should have a look and second my testing.

I used stable branch of linux kernel with the above patch applied (4.0.1).

Thanks very much Lukas for the tip.

The attachment "Disable queued trimming" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Sven (sven-koehler) wrote :

Peter, did you send your patch upstream?

Nope, I did not. Though someone of Ubuntu will pick it up - as I basically don't know 100% what I am doing there :-). I just disabled delayed trim with native command queueing. But it's rather crucial as this concerns the integrity and safety of user's data. So someone in charge should reproduce and revisit this.

Sven (sven-koehler) wrote :

The problem was discussed on the linux-ide mailing list yesterday. A patch will appear upstream (already has been applied I think) and it disables queued TRIM for all Samsung 8xx SSDs.

Sven could you explain the consequences of disabling the queued trim? Will the trim command work as before? Or this means that trim is disabled?

There are no consequences. The older firmware simply did not announce that feature, so it was not used at all. The newer one says: "Hey, it's me, your ssd, I can do delayed trimming in combination with ncq" - but in reality it does not support it at all.

So - to my knowledge - this was always disabled, as not announced by the ssd. So nothing should change.

Sven (sven-koehler) wrote :

Peter is right. With previous firmwares, the drive would not announce that it supports queued TRIM. The drive would however still support an "unqueued" TRIM (so the discard option will work). With the patch, Samsung SSDs are blacklisted and the kernel will not use queued but only unqueued TRIM commands.
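
The decision described in the last few comments can be modelled in a few lines of Python. This is only a sketch of the reasoning, not the kernel code: the function names are hypothetical, the pattern is the one proposed for ata_device_blacklist earlier in this thread, and `fnmatchcase` is assumed to be close enough to the kernel's own glob matching for simple `*` patterns.

```python
from fnmatch import fnmatchcase

# Model patterns for which queued TRIM must not be used. The single entry
# is the one suggested earlier in the thread for the 840 EVO.
NO_NCQ_TRIM_PATTERNS = ["Samsung SSD 840 EVO *"]


def is_blacklisted(model, patterns=NO_NCQ_TRIM_PATTERNS):
    """Return True if the drive's model string matches any blacklist pattern."""
    return any(fnmatchcase(model, pattern) for pattern in patterns)


def trim_mode(model, advertises_queued_trim, patterns=NO_NCQ_TRIM_PATTERNS):
    """Pick the TRIM flavour that would be used for a drive.

    Queued TRIM is only chosen when the firmware advertises it and the
    model is not blacklisted; otherwise plain unqueued TRIM is used.
    """
    if advertises_queued_trim and not is_blacklisted(model, patterns):
        return "queued"
    return "unqueued"


if __name__ == "__main__":
    model = "Samsung SSD 840 EVO 500GB"
    print(trim_mode(model, advertises_queued_trim=False))              # old firmware
    print(trim_mode(model, advertises_queued_trim=True, patterns=[]))  # new firmware, unpatched
    print(trim_mode(model, advertises_queued_trim=True))               # new firmware, patched
```

The old-firmware case and the patched case both end up on unqueued TRIM, which is why nothing should change for users compared to before the firmware update.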

Thanks for the answers, I tried to execute "fstrim /" command and now it works. However, there's no output. Before the patch I remember it reported the number of the blocks trimmed. Is this normal?

Kendek (nemh) wrote :

Yeah, just use the "-v" option to print the number of discarded bytes.

Thanks Kendek, it works. However, if you insert the command "fstrim /" in /etc/rc.local the PC still doesn't boot. If you execute it in a shell it works.

Lukas Loehrer (loehrerl) wrote :

Does anyone know if the fix will be backported to the Ubuntu kernel
and thus be available in the 3.13.0 or 3.16.0 kernel packages for
Trusty? What would be necessary to make this happen?

Diep Pham (favadi) wrote :

I made the mistake of upgrading all my Samsung 840 EVO SSDs to the latest firmware, and TRIM is now broken in Ubuntu 14.04. A fix for 3.16 would be nice.

GrzesiekC (grzesiekc) wrote :

Do the procedure from #14.
As far as I know the "fix" does the same but on a different level.
TRIM in my EVO 840 works now.

Josh Hill (ingenium) wrote :

What performance issues (if any) would be encountered as a result of disabling NCQ? It appears it has more of an impact on HDDs than SSDs?

I'm guessing there's no boot flag that can just disable queued TRIM but leave NCQ enabled for everything else?

Sven (sven-koehler) wrote :

I think you shouldn't worry too much and either disable NCQ or remove the discard option from fstab. I will not speculate on the performance impact of disabling NCQ. Please take your favorite I/O benchmark and evaluate the different options.

I saw patches that add an option to only disable queued TRIM and leave NCQ enabled for everything else. But they will only show up in Linux 4.1, as far as I know.

I can't boot at all in runlevel 5 and I get a lot of "FAILED COMMAND: WRITE FPDMA QUEUED".

I can boot only in runlevel 1, and after "echo 1 >/sys/block/sda/device/queue_depth" everything works properly.

Today it failed even in runlevel 1, the best solution seems to be "libata.force=noncq" if you can't rebuild the kernel.
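
To check what the queue_depth workaround above actually left in place, a small Python sketch like the following could be used. The function names are hypothetical, the default path is simply the one from the echo command above (your device may not be sda), and the interpretation that a depth of 1 leaves no room for queued commands is my understanding of libata rather than something stated in this thread.

```python
from pathlib import Path

# Path taken from the echo command quoted above; adjust for other devices.
DEFAULT_QUEUE_DEPTH_PATH = "/sys/block/sda/device/queue_depth"


def read_queue_depth(path=DEFAULT_QUEUE_DEPTH_PATH):
    """Read the queue depth as an integer from a sysfs-style text file."""
    return int(Path(path).read_text().strip())


def ncq_effectively_disabled(depth):
    """With a depth of 1 only one command can be outstanding at a time."""
    return depth <= 1


if __name__ == "__main__":
    try:
        depth = read_queue_depth()
    except OSError as exc:
        print("could not read queue depth:", exc)
    else:
        state = "disabled" if ncq_effectively_disabled(depth) else "enabled"
        print("queue depth", depth, "-> NCQ effectively", state)
```

Unlike libata.force=noncq on the kernel command line, a value written to sysfs does not survive a reboot.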

Dave Chiluk (chiluk) wrote :

So I have an 840 EVO, and have ncq enabled and am running the latest firmware. I do not have the discard option in fstab, but instead run fstrim weekly as is the default in ubuntu. Everything seems to be functioning fine for me.

It's entirely possible that something else is at play here. Perhaps your sectors had electrically degraded to the point of data loss, and the firmware update simply marked them bad for lack of being refreshed.

[ 1.108071] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.108119] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 1.108155] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.108180] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.108206] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.108234] ata4: SATA link down (SStatus 0 SControl 300)
[ 1.110025] ata2.00: supports DRM functions and may not be fully accessible
[ 1.110107] ata2.00: ATA-9: Samsung SSD 840 EVO 500GB, EXT0DB6Q, max UDMA/133
[ 1.110110] ata2.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.110293] ata2.00: supports DRM functions and may not be fully accessible
[ 1.110336] ata2.00: configured for UDMA/133
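
When comparing reports like the one above, it helps to pull the model and firmware revision out of the boot log. A minimal Python sketch, assuming the identification line always has the shape shown in this log (`ataN.NN: ATA-N: <model>, <firmware>, max ...`); the function name `parse_identity` is hypothetical.

```python
import re

# Matches libata identification lines such as:
#   ata2.00: ATA-9: Samsung SSD 840 EVO 500GB, EXT0DB6Q, max UDMA/133
IDENT_RE = re.compile(
    r"ata\d+\.\d+: ATA-\d+: (?P<model>[^,]+), (?P<firmware>[^,]+), max "
)


def parse_identity(log_line):
    """Return (model, firmware) from an identification line, else None."""
    match = IDENT_RE.search(log_line)
    if match is None:
        return None
    return match.group("model").strip(), match.group("firmware").strip()


if __name__ == "__main__":
    line = ("[ 1.110107] ata2.00: ATA-9: Samsung SSD 840 EVO 500GB, "
            "EXT0DB6Q, max UDMA/133")
    print(parse_identity(line))
```

Feeding it the output of dmesg line by line and keeping the non-None results gives one (model, firmware) pair per attached ATA device.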

rozwell (rozwell69) wrote :

@chiluk: For me discard option in fstab was causing the problems and I did full secure erase. Even twice, after I couldn't boot because of mentioned errors.

@chiluk I quote rozwell's comment, I also did a secure erase and the problem persisted.

@chiluk:
Nothing wrong electrically and NO data loss, simply the new firmware is buggy.
With EXT0DB6Q errors are every time on different sectors.
With "libata.force=noncq", or even better with a patched kernel (ATA_HORKAGE_NO_NCQ_TRIM for the "Samsung SSD 840 EVO") there is no error at all, no data loss and no errors.

Sven (sven-koehler) wrote :

Running fstrim or secure-erasing your SSD has no impact on the problem. The problem lies in the firmware advertising (but not supporting) queued TRIM, and this will not go away magically by doing a secure erase. The problem will reappear the next time you delete a file (if you have discard enabled).

@sven: Don't explain too much, some guys don't read at all which causes further confusion.

Josh Hill (ingenium) wrote :

I'm currently recompiling the 3.16 kernel with ATA_HORKAGE_NO_NCQ_TRIM added for the 840 EVO, derived from this patch: https://launchpadlibrarian.net/205494465/0001-libata-Disable-native-queued-TRIM-support-Reason-fw-.patch

I'll report back on if it actually works. However, it seems that ATA_HORKAGE_ZERO_AFTER_TRIM isn't defined in the 3.16 kernel (it may not appear until 3.19?). What does it do? Is it necessary for 3.16?

delfi (korkyra52) wrote :

Did somebody get a meaningful, non generic reply from Samsung? This issue, very likely affects thousands of users (double boot with Win, where magician did the update), and most of them won't be able to switch to a custom kernel, or if on older LTS, won't receive the kernel fix ever. Disabling NCQ in grub defies the purpose/main idea of SSD :)

Ioannis Vranos (cppdeveloper) wrote :

As I read on this thread:

http://comments.gmane.org/gmane.linux.ide/59791

the issue is common to many Samsung SSDs.

I hope Canonical will backport the kernel fix to *buntus 15.04, and 14.04 soon.

Andy Whitcroft (apw) on 2015-05-21
Changed in fstrim (Ubuntu):
status: Confirmed → Invalid
Changed in fstrim (Ubuntu Trusty):
status: New → Invalid
Changed in fstrim (Ubuntu Utopic):
status: New → Invalid
Changed in fstrim (Ubuntu Vivid):
status: New → Invalid
Changed in linux (Ubuntu):
status: New → Triaged
Changed in linux (Ubuntu Trusty):
status: New → Triaged
Changed in linux (Ubuntu Utopic):
status: New → Triaged
Changed in linux (Ubuntu Vivid):
status: New → Triaged
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Trusty):
importance: Undecided → High
Changed in linux (Ubuntu Utopic):
importance: Undecided → High
Changed in linux (Ubuntu Vivid):
importance: Undecided → High
description: updated
tags: added: kernel-bug-break-fix
Andy Whitcroft (apw) on 2015-05-21
Changed in linux (Ubuntu Trusty):
status: Triaged → Confirmed
Changed in linux (Ubuntu Utopic):
status: Triaged → Confirmed
Changed in linux (Ubuntu Vivid):
status: Triaged → Confirmed
Changed in linux (Ubuntu):
status: Triaged → Confirmed
delfi (korkyra52) wrote :

Thx Andy! Btw, finally got some response from Samsung:

'As Linux is open source and can be modified by anyone, *we do not support Linux*. We advise users to disable Queued TRIM in Linux, as doing so will allow Sequential TRIM to run in the OS. .... and have a good day'

Brilliant. Guess no more Samsung for me.

I really can't understand the logical implication: As Linux is open source - > we don't support Linux. No more samsung for me either.

delfi (korkyra52) wrote :

Well, I guess any logic in this case is lost in corporate hallways:)
Sad, and not very bright, as they did have that Mk.1 Freedos fix for linux, and they probably need to change only 1 bit/byte in fw to prevent false trim capability reporting.

Sven (sven-koehler) wrote :

I don't know who at Samsung wrote that response. They have a bug in their SSD firmware. Full stop. The issue with Linux probably is, that it supports a feature that Windows doesn't and thus is the only OS affected by the broken firmware.

Martin Petersen from the linux-ide mailing list wrote, when he was sending the patch blacklisting all 800 series SSDs, that they would work with Samsung to resolve the issue - whatever that means.

delfi (korkyra52) wrote :

The response was from Samsung SSD division from USofA. I hope that EU or Asia might have a different stance on the issue.

Mark Rein (gimpsmart) wrote :

The exact same bug also affects the 850 PRO: http://www.spinics.net/lists/linux-ide/msg50342.html
Originally, they came with EXM01B6Q, which reports "I do NOT support NCQ TRIM!".
Then, samsung released EXM02B6Q, which wrongly reports "I SUPPORT NCQ TRIM!", but of course they don't; the data corrupts if you try.

Samsung then pulled EXM02B6Q from their website for a DIFFERENT issue: The firmware update bricked many devices. But the problem is that new 850 PROs from the factory come with the new, buggy firmware preinstalled. And Samsung are not yet aware that their NCQ TRIM is buggy.

I have contacted Samsung via their regular tech support form and pointed out the NCQ TRIM bug and hopefully they will fix it. I also know that Martin K. Petersen from Oracle has contacts at Samsung and is talking to them to get it resolved in their firmware, see here:
http://permalink.gmane.org/gmane.linux.ide/59802
(Full thread: http://comments.gmane.org/gmane.linux.ide/59791)

I hope Samsung will get on this swiftly, and at least release an intermediary firmware which disables NCQ TRIM until they're ready to re-add it.

Mark Rein (gimpsmart) wrote :

Followup: My email to Samsung clearly and succinctly stated that there is a bug in their firmware, where it wrongly reports "I support NCQ TRIM!" but it really doesn't, and that they introduced the bug recently. I said nothing about Linux.

Their mentally retarded response: "Dear Customer,

Thank you for contacting Samsung Support regarding your concerns and inquiries. We apologize for any inconvenience this may be causing you. Since Linux is open source, Samsung does not support the OS. All we can tell you is that you can disable Queued TRIM in Linux. Linux will then start using Sequential Trim, which is similar to how Windows does TRIM. Newer versions of Linux kernels blacklist the drive so that Queued TRIM automatically is disabled.

Thank you again for contacting Samsung Support and have a good day.

DM

CS20204"

In other words: "Our product is broken, so just tell the OS not to use the broken feature and hey presto you're a winner! Now stop bugging us with your Loonixes and stuff."

Can we try to get press attention on this? Random corruption caused by regular TRIM operation is almost as serious as the 840 Evo's data retention issue.

Mark Rein (gimpsmart) wrote :

Turns out this "I support Queued TRIM!" issue is a year old and affected the entire previous generation too:
Discovered by kernel devs in the 840 Pro and 840 Evo: https://bugzilla.kernel.org/show_bug.cgi?id=72341
and also discussed here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1338706

Here is another brand (WD) doing the same thing in late 2013: http://lists.infradead.org/pipermail/linux-arm-kernel/2013-October/203508.html

The bottom line is: Samsung SSD firmware has been updated to the SATA 3.2 spec, which includes Queued TRIM, and the drives advertise that they support it, but in reality the firmware doesn't process those queued TRIMs properly and seems to eat random blocks instead, thus destroying random data.

From what I understand from the first link (see comment 48), these drives set "ATA IDENTIFY's word 77, bit 6" to 1, which means "RECEIVE/SEND FPDMA QUEUED supported". That "FPDMA QUEUED" is the thing used to send a queued TRIM. The firmware does not actually support RECV/SEND FPDMA QUEUED, and just wrongly claims that it does. If you try to retrieve "log 13h" the drive errors out, but the spec says that if RECV/SEND FPDMA is supported then log 13h MUST also be supported. So this is a clear case of Samsung's firmware department ticking a flag for all the shiny SATA 3.2 features, and not actually making sure they implemented them all. Very, very shoddy.
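
The bit test described in the paragraph above is small enough to show directly. A minimal Python sketch, assuming the IDENTIFY DEVICE data is already available as a sequence of 256 16-bit words; how to obtain that data from a real drive is outside the scope of the sketch, and the function name is hypothetical.

```python
IDENTIFY_WORD_COUNT = 256     # IDENTIFY DEVICE returns 256 16-bit words
SATA_CAPABILITY_2_WORD = 77   # the word discussed above
SEND_RECV_FPDMA_BIT = 6       # "RECEIVE/SEND FPDMA QUEUED supported"


def claims_send_recv_fpdma(identify_words):
    """Return True if word 77, bit 6 is set in the IDENTIFY DEVICE data.

    This only reports what the drive claims; as this thread shows, the
    claim says nothing about whether the firmware really implements it.
    """
    if len(identify_words) != IDENTIFY_WORD_COUNT:
        raise ValueError("expected 256 IDENTIFY DEVICE words")
    word = identify_words[SATA_CAPABILITY_2_WORD]
    return bool(word & (1 << SEND_RECV_FPDMA_BIT))


if __name__ == "__main__":
    words = [0] * IDENTIFY_WORD_COUNT     # synthetic data, not a real drive
    words[SATA_CAPABILITY_2_WORD] = 1 << SEND_RECV_FPDMA_BIT
    print(claims_send_recv_fpdma(words))
```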

Since this problem affects all modern Samsung SSDs, it's really up to Samsung to fix their firmware. It's NOT up to the operating systems to blacklist misbehaving drives. Are we really gonna have to wait until Windows does Queued TRIM and millions of people lose data, for them to react to their broken firmware?

It has proven futile to talk to Samsung via their regular support contact form. This bug report needs to come from someone with better access to intelligent humans at Samsung (they must exist, right?). I know of only two people with "direct access" to Samsung firmware people:
- Marc Carino < marc.ceeeee [at] gmail.com >, who wrote in kernel.org bug #72341 on 2014-05, that he was reaching out to Samsung's firmware department about the problem.
- Martin K. Petersen from Oracle < martin.petersen [at] oracle.com >, who wrote on the kernel mailing list on 2015-05-04 about contacting Samsung.

Dave Chiluk (chiluk) on 2015-05-29
Changed in linux (Ubuntu Trusty):
assignee: nobody → Dave Chiluk (chiluk)
Changed in linux (Ubuntu Utopic):
assignee: nobody → Dave Chiluk (chiluk)
Changed in linux (Ubuntu Vivid):
assignee: nobody → Dave Chiluk (chiluk)
Andy Whitcroft (apw) on 2015-06-04
Changed in linux (Ubuntu Vivid):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Trusty):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Utopic):
status: Confirmed → Fix Committed
Mark Rein (gimpsmart) wrote :

I have now sent a second email to Samsung's support, this time being much more detailed about *exactly* why their drive is broken. If they keep coming back with their retarded form-reply again, I will just keep sending it, and will be talking to the two guys mentioned earlier who have direct contacts at Samsung.

Here is the new email (to save time, I copied heavily from my previous post here on this thread so readers will recognize lots of segments):

---
My Samsung 850 PRO 500gb drive came shipped with firmware EXM02B6Q from the factory.

This firmware revision includes SATA 3.2 spec features, but they are not properly implemented.

The drive sets "ATA IDENTIFY's" word 77, bit 6 to 1 ("true"), which means "RECEIVE/SEND FPDMA QUEUED supported".

But the firmware does NOT actually support RECV/SEND FPDMA QUEUED, and just wrongly claims that it does. If you try to retrieve "log 13h" the drive errors out, but the spec says that if RECV/SEND FPDMA is supported then log 13h MUST also be supported.

So this is a case of Samsung's firmware department ticking a flag for all the shiny SATA 3.2 features, and not actually making sure they implemented them all.

A secondary problem of you incorrectly setting "ATA IDENTIFY's" word 77, bit 6 to 1 ("true") is that FPDMA QUEUED TRIM *must* ALSO be supported if you do that. But the drive does not support queued trim.

So the false advertisement of ATA IDENTIFY word 77 bit 6, without actually supporting that new feature, means that the drive is severely broken in multiple ways.

Two possible solutions to this situation:
1) A firmware update which sets "ATA IDENTIFY's" word 77, bit 6 to 0 ("FALSE!") instead, to PROPERLY show that the drive does NOT support SATA 3.2 FPDMA QUEUED features.
2) Alternatively, a firmware update which implements FPDMA QUEUED, log 13h, FPDMA QUEUED TRIM, etc, so that the drive actually supports what it *claims* it does.

Of these two, #1 is the easiest and makes the most sense. Either way, there's a problem in the firmware and it needs a fix.

Thank you for your time,

Mark
---

Mark Rein (gimpsmart) wrote :

Got that braindead form-reply again, yet again blaming SAMSUNG'S FAULT on Linux:

---

Thank you for contacting Samsung Support regarding your concerns and inquiries. We apologize for any inconvenience this may be causing you. Linux is the only operating system that has this issue with the Queued TRIM. Linux is open source and can be modified by anyone, as such we do not support the OS. We have seen with other customers that updating your kernel version to 4.0.5 addresses the issue.

Thank you again for contacting Samsung Support and have a good day.

DM

CS20204

---

Time to re-send the message with *zero* mention of the word TRIM, because maybe they've got a trigger for the support-monkeys to auto-suggest using this response when "queued trim" is mentioned.

Mark Rein (gimpsmart) wrote :

My final attempt at contacting Samsung's braindead support monkeys, this time without any mention of the word "trim", and I decided to lie and say that I'm a hardware controller manufacturer who discovered the flaw to hopefully get the monkey to take notice. Let's see the support monkeys screw this up again - I am SURE they will, as they've repeatedly done so for *every one of us* who has tried contacting them. And if/when that happens, I'll do one last effort and take this up with the two guys above who have direct contacts at Samsung, who can bypass the monkey brigade.

---

I am working on programming a new hardware RAID controller, and one of my test disks is a Samsung 850 PRO 500gb which shipped with firmware EXM02B6Q from the factory.

During my testing, I probe all the drives and discovered that your firmware revision includes SATA 3.2 spec features, but that they are not properly implemented.

The drive sets "ATA IDENTIFY's" word 77, bit 6 to 1 ("true"), which means "RECEIVE/SEND FPDMA QUEUED supported".

But the firmware does NOT actually support RECV/SEND FPDMA QUEUED, and just wrongly claims that it does. If you try to retrieve "log 13h" the drive errors out, but the spec says that if RECV/SEND FPDMA is supported then log 13h MUST also be supported.

So this is a case of Samsung's firmware department ticking a flag for all the shiny SATA 3.2 features, and not actually making sure they implemented them all.

The false advertisement of ATA IDENTIFY word 77 bit 6, without actually supporting that new feature, means that the drive is severely broken in multiple ways.

Two possible solutions to this situation:
1) A firmware update which sets "ATA IDENTIFY's" word 77, bit 6 to 0 ("FALSE!") instead, to PROPERLY show that the drive does NOT support SATA 3.2 FPDMA QUEUED features.
2) Alternatively, a firmware update which implements FPDMA QUEUED, log 13h, etc, so that the drive actually supports what it *claims* it does.

Of these two, #1 is the easiest and makes the most sense. Either way, there's a problem in the firmware and it needs a fix.

Thank you for your time,

Richard

Mark Rein (gimpsmart) wrote :

Success! By not mentioning "triggering keywords", the support person wasn't offered the normal form-letter reply, and had to actually read what I said. As a result, they replied with the direct email to their firmware department and asked me to send the technical report to them. I've now done so. Maybe we'll see a solution to this after all.

Mark Rein (gimpsmart) wrote :

Here's an edited summary of the reply I got from the "please contact our firmware engineers at xxx@xxx" address; turns out there's no engineers there either, but at least we finally have a sane non-formletter reply which *for the first time* states that Samsung is aware of the issue and is working on a fix at the firmware level. Good news for everyone:

"We are just the SSD tech support. We are not the FW engineers. They are in Korea and have been aware of the issue since it first started being reported online.

The latest stable Linux Kernel 4.0.5 successfully blacklists the affected drive(s) from queued TRIM. Linux can still use Sequential TRIM, so you’re not losing out on anything.

We see that you are well aware that Marc Carino and Martin K. Petersen are in contact with Samsung on how to resolve the issue since it started.

All we can say to users at this moment is that Linux is the only operating system that has this issue with the Queued TRIM. Linux is open source and can be modified by anyone, as such we do not support the OS. We recommend updating to the new kernel, as we have seen that other users have done so and it alleviated their issue(s)."

CJ (cjpostor) wrote :

Can anyone tell me if this bug affects kernel 3.10?
I don't think this has the SATA 3.1 codes yet.. so I think it should be fine?

Thanks!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-22.22

---------------
linux (3.19.0-22.22) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1465755

  [ Tai Nguyen ]

  * SAUCE: power: reset: Add syscon reboot device node for APM X-Gene
    platform
    - LP: #1463211

  [ Upstream Kernel Changes ]

  * Revert "dm crypt: fix deadlock when async crypto algorithm returns
    -EBUSY"
    - LP: #1465696
  * Bluetooth: ath3k: Add a new ID 0cf3:e006 to ath3k list
    - LP: #1459934
  * cdc-acm: prevent infinite loop when parsing CDC headers.
    - LP: #1460657
  * (upstream) libata: Blacklist queued TRIM on all Samsung 800-series
    - LP: #1338706, #1449005
  * powerpc/powernv: Check image loaded or not before calling flash
    - LP: #1461553
  * ahci: avoton port-disable reset-quirk
    - LP: #1458617
  * Bluetooth: btusb: support public address configuration for ath3012
    - LP: #1459937
  * Bluetooth: btusb: Add setup callback for chip init on USB
    - LP: #1459937
  * Bluetooth: btusb: Add support for QCA ROME chipset family
    - LP: #1459937
  * Bluetooth: btusb: Fix incorrect type in qca_device_info
    - LP: #1459937
  * Bluetooth: btusb: Fix minor whitespace issue in QCA ROME device entries
    - LP: #1459937
  * Bluetooth: btusb: Add support for 0cf3:e007
    - LP: #1459937
  * storvsc: Set the SRB flags correctly when no data transfer is needed
    - LP: #1439780
  * vfs: read file_handle only once in handle_to_path
    - LP: #1416503
    - CVE-2015-1420
  * ozwpan: Use unsigned ints to prevent heap overflow
    - LP: #1463442
    - CVE-2015-4001
  * ozwpan: divide-by-zero leading to panic
    - LP: #1463445
    - CVE-2015-4003
  * ozwpan: Use proper check to prevent heap overflow
    - LP: #1463444
    - CVE-2015-4002
  * ozwpan: unchecked signed subtraction leads to DoS
    - LP: #1463444
    - CVE-2015-4002
  * enclosure: fix WARN_ON removing an adapter in multi-path devices
    - LP: #1415178
  * ASoC: tfa9879: Fix return value check in tfa9879_i2c_probe()
    - LP: #1465696
  * ASoC: samsung: s3c24xx-i2s: Fix return value check in
    s3c24xx_iis_dev_probe()
    - LP: #1465696
  * ASoC: dapm: Enable autodisable on SOC_DAPM_SINGLE_TLV_AUTODISABLE
    - LP: #1465696
  * ASoC: rt5677: add register patch for PLL
    - LP: #1465696
  * btrfs: unlock i_mutex after attempting to delete subvolume during send
    - LP: #1465696
  * ALSA: hda - Fix mute-LED fixed mode
    - LP: #1465696
  * ALSA: hda - Add mute-LED mode control to Thinkpad
    - LP: #1465696
  * arm64: dma-mapping: always clear allocated buffers
    - LP: #1465696
  * ALSA: emu10k1: Fix card shortname string buffer overflow
    - LP: #1465696
  * ALSA: emux: Fix mutex deadlock at unloading
    - LP: #1465696
  * drm/radeon: Use drm_calloc_ab for CS relocs
    - LP: #1465696
  * drm/radeon: adjust pll when audio is not enabled
    - LP: #1465696
  * drm/radeon: add SI DPM quirk for Sapphire R9 270 Dual-X 2G GDDR5
    - LP: #1465696
  * drm/radeon: fix lockup when BOs aren't part of the VM on release
    - LP: #1465696
  * drm/radeon: reset BOs address after clearing it.
    - LP: #1465696
  * drm/radeon: check new address before removing old one
  ...


Changed in linux (Ubuntu):
status: Confirmed → Fix Released
UweBrauer (oub) wrote :

Hi

I bought a Samsung 840 almost a year ago.
I still use a 3.7 kernel and it seems that the fstrim command works without problems.
So most likely I still use the old firmware
However since I run also Windows (which I have not booted for ages), I would like to know if windows might/will
upgrade the firmware (without asking) and causing me trouble?

Does anybody know about this? thanks

Uwe Brauer

Markus Strobl (mstrobl2) wrote :

Uwe: No, Windows will not update the SSD firmware by itself. Samsung has a windows application for updates, but it has to be downloaded from Samsung and executed.

You really should update the firmware though. I also bought my 840 about a year ago and before updating the firmware noticed severe performance degradation and a few corrupted files. Performance according to "hdparm -t /dev/sda" was down to 60 MB/s. After the firmware upgrade performance was restored to 516 MB/s.

Ioannis Vranos (cppdeveloper) wrote :

@CJ, this bug is duplicate of bug #1338706.

The title of it, is:

"Samsung SSD 840 failed to get NCQ Send/Recv Log Emask 0x1 failed to set xfermode (err_mask=0x40) on upstream kernels >= 3.12".

So I presume the bug affects kernels since version 3.12, and kernel 3.10 does not have the bug.
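
That presumption can be written down as a simple version comparison. The Python sketch below only encodes the ">= 3.12" threshold from the duplicate bug's title; the function names are hypothetical, and the check knows nothing about distribution kernels that carry the blacklist fix, so a True result means "possibly affected", not "definitely affected".

```python
def parse_version(text):
    """Turn a string like '3.16.0-37-generic' into (major, minor)."""
    major, minor = text.split(".")[:2]
    # Drop any suffix such as '-rc1' that may follow the minor number.
    return int(major), int(minor.split("-")[0])


def presumably_affected(version_text, first_affected=(3, 12)):
    """Return True if the kernel is at or above the presumed first bad version."""
    return parse_version(version_text) >= first_affected


if __name__ == "__main__":
    for version in ("3.10", "3.12", "3.16.0-37-generic", "4.0.1"):
        print(version, presumably_affected(version))
```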

Brad Figg (brad-figg) on 2015-06-25
tags: added: verification-done-trusty verification-done-utopic
tags: added: verification-done-vivid
removed: verification-done-utopic
Brad Figg (brad-figg) on 2015-06-26
tags: added: verification-done-utopic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-22.22

---------------
linux (3.19.0-22.22) vivid; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1465755

  [ Tai Nguyen ]

  * SAUCE: power: reset: Add syscon reboot device node for APM X-Gene
    platform
    - LP: #1463211

  [ Upstream Kernel Changes ]

  * Revert "dm crypt: fix deadlock when async crypto algorithm returns
    -EBUSY"
    - LP: #1465696
  * Bluetooth: ath3k: Add a new ID 0cf3:e006 to ath3k list
    - LP: #1459934
  * cdc-acm: prevent infinite loop when parsing CDC headers.
    - LP: #1460657
  * (upstream) libata: Blacklist queued TRIM on all Samsung 800-series
    - LP: #1338706, #1449005
  * powerpc/powernv: Check image loaded or not before calling flash
    - LP: #1461553
  * ahci: avoton port-disable reset-quirk
    - LP: #1458617
  * Bluetooth: btusb: support public address configuration for ath3012
    - LP: #1459937
  * Bluetooth: btusb: Add setup callback for chip init on USB
    - LP: #1459937
  * Bluetooth: btusb: Add support for QCA ROME chipset family
    - LP: #1459937
  * Bluetooth: btusb: Fix incorrect type in qca_device_info
    - LP: #1459937
  * Bluetooth: btusb: Fix minor whitespace issue in QCA ROME device entries
    - LP: #1459937
  * Bluetooth: btusb: Add support for 0cf3:e007
    - LP: #1459937
  * storvsc: Set the SRB flags correctly when no data transfer is needed
    - LP: #1439780
  * vfs: read file_handle only once in handle_to_path
    - LP: #1416503
    - CVE-2015-1420
  * ozwpan: Use unsigned ints to prevent heap overflow
    - LP: #1463442
    - CVE-2015-4001
  * ozwpan: divide-by-zero leading to panic
    - LP: #1463445
    - CVE-2015-4003
  * ozwpan: Use proper check to prevent heap overflow
    - LP: #1463444
    - CVE-2015-4002
  * ozwpan: unchecked signed subtraction leads to DoS
    - LP: #1463444
    - CVE-2015-4002
  * enclosure: fix WARN_ON removing an adapter in multi-path devices
    - LP: #1415178
  * ASoC: tfa9879: Fix return value check in tfa9879_i2c_probe()
    - LP: #1465696
  * ASoC: samsung: s3c24xx-i2s: Fix return value check in
    s3c24xx_iis_dev_probe()
    - LP: #1465696
  * ASoC: dapm: Enable autodisable on SOC_DAPM_SINGLE_TLV_AUTODISABLE
    - LP: #1465696
  * ASoC: rt5677: add register patch for PLL
    - LP: #1465696
  * btrfs: unlock i_mutex after attempting to delete subvolume during send
    - LP: #1465696
  * ALSA: hda - Fix mute-LED fixed mode
    - LP: #1465696
  * ALSA: hda - Add mute-LED mode control to Thinkpad
    - LP: #1465696
  * arm64: dma-mapping: always clear allocated buffers
    - LP: #1465696
  * ALSA: emu10k1: Fix card shortname string buffer overflow
    - LP: #1465696
  * ALSA: emux: Fix mutex deadlock at unloading
    - LP: #1465696
  * drm/radeon: Use drm_calloc_ab for CS relocs
    - LP: #1465696
  * drm/radeon: adjust pll when audio is not enabled
    - LP: #1465696
  * drm/radeon: add SI DPM quirk for Sapphire R9 270 Dual-X 2G GDDR5
    - LP: #1465696
  * drm/radeon: fix lockup when BOs aren't part of the VM on release
    - LP: #1465696
  * drm/radeon: reset BOs address after clearing it.
    - LP: #1465696
  * drm/radeon: check new address before removing old one
  ...


Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.16.0-43.58

---------------
linux (3.16.0-43.58) utopic; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1466792

  [ Brad Figg ]

  * Merged back Ubuntu-3.16.0-41.57 regression fix for security release

linux (3.16.0-42.56) utopic; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1465714

  [ Chris J Arges ]

  * [config] CONFIG_IPMI_POWERNV=m on ppc64el
    - LP: #1439562

  [ Luis Henriques ]

  * [Config] Disable CONFIG_USB_OTG
    - LP: #1411295

  [ Upstream Kernel Changes ]

  * Revert "i2c: Mark adapter devices with pm_runtime_no_callbacks"
    - LP: #1465613
  * Revert "mm/hugetlb: use pmd_page() in follow_huge_pmd()"
    - LP: #1465613
  * cdc-acm: prevent infinite loop when parsing CDC headers.
    - LP: #1460657
  * drivers/char/ipmi: Add powernv IPMI driver
    - LP: #1439562
  * powerpc/powernv: Add OPAL IPMI interface
    - LP: #1439562
  * powerpc/powernv: Support OPAL requested heartbeat
    - LP: #1439562
  * powerpc/kernel: Make syscall_exit a local label
    - LP: #1439562
  * powerpc: Remove old compile time disabled syscall tracing code
    - LP: #1439562
  * powerpc/powernv: Remove "opal" prefix from pr_xxx()s
    - LP: #1439562
  * powerpc/powernv: Separate function for OPAL IRQ setup
    - LP: #1439562
  * powerpc/powernv: Add OPAL message notifier unregister function
    - LP: #1439562
  * device: Add dev_of_node() accessor
    - LP: #1439562
  * drivers/core/of: Add symlink to device-tree from devices with an OF
    node
    - LP: #1439562
  * powerpc: Add a proper syscall for switching endianness
    - LP: #1439562
  * (upstream) libata: Blacklist queued TRIM on all Samsung 800-series
    - LP: #1338706, #1449005
  * ahci: avoton port-disable reset-quirk
    - LP: #1458617
  * udf: Remove repeated loads blocksize
    - LP: #1462173
    - CVE-2015-4167
  * udf: Check length of extended attributes and allocation descriptors
    - LP: #1462173
    - CVE-2015-4167
  * (upstream)scsi_lib: remove the description string in
    scsi_io_completion()
    - LP: #1449372
  * vfs: read file_handle only once in handle_to_path
    - LP: #1416503
    - CVE-2015-1420
  * ozwpan: Use unsigned ints to prevent heap overflow
    - LP: #1463442
    - CVE-2015-4001
  * ozwpan: divide-by-zero leading to panic
    - LP: #1463445
    - CVE-2015-4003
  * ozwpan: Use proper check to prevent heap overflow
    - LP: #1463444
    - CVE-2015-4002
  * ozwpan: unchecked signed subtraction leads to DoS
    - LP: #1463444
    - CVE-2015-4002
  * net: eth: xgene: devm_ioremap() returns NULL on error
    - LP: #1458042
  * drivers: net: xgene: fix new firmware backward compatibility with older
    driver
    - LP: #1458042
  * drivers: net: xgene: constify of_device_id array
    - LP: #1458042
  * drivers: net: xgene: Add second SGMII based 1G interface
    - LP: #1458042
  * dtb: change binding name to match with newer firmware DT
    - LP: #1458042
  * dtb: xgene: Add second SGMII based 1G interface node
    - LP: #1458042
  * mlx4: Fix tx ring affinity_mask creation
    - LP: #1465613
  * net/mlx4_en: Schedule napi when RX buffers allocation fails
    - LP: #1465613
...

Changed in linux (Ubuntu Utopic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-57.95

---------------
linux (3.13.0-57.95) trusty; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1466592

  [ Brad Figg ]

  * Merged back Ubuntu-3.13.0-55.94 regression fix for security release

linux (3.13.0-56.93) trusty; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1465798

  [ Upstream Kernel Changes ]

  * net: eth: xgene: devm_ioremap() returns NULL on error
    - LP: #1458042
  * drivers: net: xgene: fix new firmware backward compatibility with older
    driver
    - LP: #1458042
  * drivers: net: xgene: constify of_device_id array
    - LP: #1458042
  * drivers: net: xgene: Add second SGMII based 1G interface
    - LP: #1458042
  * net: phy: re-design phy_modes to be self-contained
    - LP: #1458042
  * dtb: change binding name to match with newer firmware DT
    - LP: #1458042
  * dtb: xgene: Add second SGMII based 1G interface node
    - LP: #1458042
  * Btrfs: make xattr replace operations atomic
    - LP: #1438501
    - CVE-2014-9710
  * cdc-acm: prevent infinite loop when parsing CDC headers.
    - LP: #1460657
  * (upstream) libata: Blacklist queued TRIM on all Samsung 800-series
    - LP: #1338706, #1449005
  * ahci: avoton port-disable reset-quirk
    - LP: #1458617
  * xfs: avoid false quotacheck after unclean shutdown
    - LP: #1461730
  * (upstream)[SCSI] Add timeout to avoid infinite command retry
    - LP: #1449372
  * (upstream)scsi_lib: remove the description string in
    scsi_io_completion()
    - LP: #1449372
  * udf: Remove repeated loads blocksize
    - LP: #1462173
    - CVE-2015-4167
  * udf: Check length of extended attributes and allocation descriptors
    - LP: #1462173
    - CVE-2015-4167
  * vfs: read file_handle only once in handle_to_path
    - LP: #1416503
    - CVE-2015-1420
  * ozwpan: Use unsigned ints to prevent heap overflow
    - LP: #1463442
    - CVE-2015-4001
  * ozwpan: divide-by-zero leading to panic
    - LP: #1463445
    - CVE-2015-4003
  * ozwpan: Use proper check to prevent heap overflow
    - LP: #1463444
    - CVE-2015-4002
  * ozwpan: unchecked signed subtraction leads to DoS
    - LP: #1463444
    - CVE-2015-4002
  * Input: elantech - add new icbody type
    - LP: #1464490
  * Bluetooth: ath3k: Add support Atheros AR5B195 combo Mini PCIe card
    - LP: #1465796
  * power_supply: twl4030_madc: Check return value of power_supply_register
    - LP: #1465796
  * power_supply: lp8788-charger: Fix leaked power supply on probe fail
    - LP: #1465796
  * ARM: dts: dove: Fix uart[23] reg property
    - LP: #1465796
  * xtensa: xtfpga: fix hardware lockup caused by LCD driver
    - LP: #1465796
  * Drivers: hv: vmbus: Fix a bug in the error path in vmbus_open()
    - LP: #1465796
  * xtensa: provide __NR_sync_file_range2 instead of __NR_sync_file_range
    - LP: #1465796
  * KVM: s390: Zero out current VMDB of STSI before including level3 data.
    - LP: #1465796
  * usb: musb: core: fix TX/RX endpoint order
    - LP: #1465796
  * drm/radeon: fix doublescan modes (v2)
    - LP: #1465796
  * usb: phy: Find the right match in devm_usb_phy_match
    - LP: #1465796
  * tools lib traceevent kbuffer: Rem...


Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
ray (arkibott) wrote :

Is this the same problem as the one described at https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/ ?

This also affects the 850 EVO, not just the PRO.

The 'bugfix' for 3.16.0-43.58 is there, but somehow the kernel does not say 'disabling queued TRIM support'.
I currently get 'failed to get NCQ Send/Recv Log Emask 0x1'.

I am also a bit confused, though this may be a separate issue that only I face: the system should currently be 14.04.2 with an lts-utopic kernel, yet right after an upgrade uname -a reports the following for the latest kernel:

Linux .... 3.16.0-43-generic #58~14.04.1-Ubuntu SMP Mon Jun 22 10:21:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

This looks somewhat old given the stated date. Why does aptitude update && aptitude dist-upgrade not upgrade to the version with the 'bugfix', and why does the version number nevertheless match the version with the 'bugfix'?

cat /sys/block/sda/device/queue_depth reports 31.

Off-topic:

Is anyone working on detecting or scanning at a low level (dm, logical volume manager, file system) for zeroed 512-byte areas in suspicious places, perhaps even reporting the name of the possibly corrupted file? Information about fstrim and its real inner workings in combination with volume managers etc. is hard to find.

I would welcome it if someone picked up the tests that Algolia published on GitHub to build some stress tests for SSDs in general.

Does it zero out 'random' blocks only under 'heavy load', or always?

Dave Chiluk (chiluk) wrote :

@arkibott

The patch does this.
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4228,6 +4228,7 @@ static const struct ata_blacklist_entry ata_device_blacklist []
        { "Crucial_CT???M500SSD*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, },
        { "Micron_M550*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, },
        { "Crucial_CT*M550SSD*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, },
+       { "Samsung SSD 8*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, },

So, barring any unexpected drive strings, it should apply to all Samsung 8** SSDs.
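
As a rough illustration of how such a pattern list selects drives, here is a minimal sketch in Python. It assumes that Python's fnmatchcase behaves like the kernel's glob matching for these simple patterns ('*' and '?' only). The model strings in the loop are hypothetical examples, not taken from this report.

```python
from fnmatch import fnmatchcase

# Model-string patterns copied from the patch hunk quoted above.
NO_NCQ_TRIM_PATTERNS = [
    "Crucial_CT???M500SSD*",
    "Micron_M550*",
    "Crucial_CT*M550SSD*",
    "Samsung SSD 8*",
]


def has_no_ncq_trim_quirk(model: str) -> bool:
    """Return True if the model string matches any pattern in the list.

    fnmatchcase is case-sensitive: '*' matches any run of characters
    and '?' matches exactly one character.
    """
    return any(fnmatchcase(model, pattern) for pattern in NO_NCQ_TRIM_PATTERNS)


# Hypothetical model strings, for illustration only.
for model in ("Samsung SSD 840 EVO 250GB",
              "Samsung SSD 850 PRO 512GB",
              "INTEL SSDSC2BW240A4"):
    print(model, "->", has_no_ncq_trim_quirk(model))
```

A drive whose model string carries an unexpected prefix or spacing would not match, which is the caveat about "unexpected drive strings".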

The fix does not disable TRIM or NCQ completely; it only disables queued (NCQ) TRIM.

The kernel looks "old" because kernels go through regression testing before being pushed to end-users. The date you see is the build date.
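
To check whether a running kernel is at least as new as one of the fixed packages named in this report (3.19.0-22.22, 3.16.0-43.58, 3.13.0-57.95), something like the following Python sketch could be used. It assumes the usual Ubuntu convention that `uname -r` contains the ABI number (the 43 in 3.16.0-43-generic) and that `uname -v` starts with '#' followed by the upload number (the 58 in #58~14.04.1-Ubuntu). The function name is made up for illustration.

```python
import re
from typing import Optional

# Fixed package versions as listed in this bug report:
# base version -> (ABI number, upload number).
FIXED = {
    "3.19.0": (22, 22),  # vivid
    "3.16.0": (43, 58),  # utopic
    "3.13.0": (57, 95),  # trusty
}


def includes_fix(release: str, version: str) -> Optional[bool]:
    """Compare `uname -r` / `uname -v` strings against the fixed versions.

    Returns True or False for the three kernel series listed above,
    and None when the strings cannot be parsed or the series is unknown.
    """
    rel = re.match(r"(\d+\.\d+\.\d+)-(\d+)", release)
    ver = re.match(r"#(\d+)", version)
    if not rel or not ver:
        return None
    base, abi = rel.group(1), int(rel.group(2))
    if base not in FIXED:
        return None
    return (abi, int(ver.group(1))) >= FIXED[base]


# The uname output quoted earlier in this thread.
print(includes_fix("3.16.0-43-generic",
                   "#58~14.04.1-Ubuntu SMP Mon Jun 22 10:21:20 UTC 2015"))
```

On a live system the two strings could probably be taken from platform.release() and platform.version(). A True result only says the package is new enough, not that the quirk actually matched the drive.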

QIII (qiii) wrote :

The next time Samsung tech support tells you they don't support Linux, ask them "Isn't that a bit odd, considering Samsung is a member of The Linux Foundation?"

zoolook (nbensa) wrote :

Hello,

I have a Samsung 850 EVO (500GB), and yesterday my laptop became unresponsive. I had to do a hard power-off.

After a few minutes of usage, it crashed again. This time I was able to switch to the console and take a picture (see the attachment; I'm sorry for the quality).

I moved the disk to my desktop computer to make a backup, but at 30% the disk died. After a reboot, I was able to read the disk again and could back up almost everything. The problem seems to be a large torrent (~100GB) that I was downloading.

I ran fstrim manually two or three times in the past weeks, and had no problem doing so.

Is the disk damaged/dying, or is it just a kernel issue?

Thanks!

Firmware is EMT01B6Q.

Kernel: 4.2.0-19-generic

Ubuntu 15.10

This question does not really fit here: as you write yourself, it is totally unrelated to the TRIM issues mentioned above, and this bug report is a duplicate in any case.

That being said, your logs don't look too promising, and I'd assume an SSD failure. One of the first things I'd try in this case is taking a closer look at the S.M.A.R.T. stats (smartctl --xall /dev/sdX | less) and running the different S.M.A.R.T. self-tests (man smartctl). This should give you more information about the health of your SSD and might help to rule out any software issues immediately.

In any case, you should not discuss the results here but on an appropriate mailing list or help forum, which bugs.launchpad.net isn't. (You can also contact me directly via mail; however, while I have some experience using Linux systems, including with SSDs, I'm no ATA / libata expert.)

Good luck!
