Kernel errors triggered with SSD storage

Bug #502219 reported by Vishal Rao
This bug affects 3 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

Boot takes a long time, and dmesg and kern.log (attached) show ATA errors. This happens with the Lucid kernel 2.6.32-9.13-generic and also with mainline 2.6.33rc2 from the kernel team PPA. The system otherwise appears to run fine. My boot drive is a solid state disk made by Crucial.com, model CT128M225 (128 GB SSD model M225), with the current 1819 firmware, which has TRIM support.

cat /proc/version_signature is "Ubuntu 2.6.32-9.13-generic"
uname -a is "Linux thunderbird 2.6.32-9-generic #13-Ubuntu SMP Thu Dec 17 17:01:59 UTC 2009 x86_64 GNU/Linux"

Attaching dmesg and others for both .32-9.13 and .33rc2...

ProblemType: Bug
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: vishal 1856 F.... knotify4
                      vishal 1875 F.... kmix
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xe3220000 irq 22'
   Mixer name : 'SigmaTel STAC9271D'
   Components : 'HDA:83847627,80863001,00100201'
   Controls : 37
   Simple ctrls : 24
CurrentDmesg:
 [ 20.020013] eth1: no IPv6 routers present
 [ 25.199490] CPUFREQ: Per core ondemand sysfs interface is deprecated - up_threshold
DKDisksMonitorLog: Monitoring activity from the disks daemon. Press Ctrl+C to cancel.
Date: Sat Jan 2 06:54:46 2010
DistroRelease: Ubuntu 10.04
HibernationDevice: RESUME=UUID=e9aa0a1b-0c43-440d-a3e2-261539cb7f5a
HotplugNewDevices:

HotplugNewMounts:

InstallationMedia: Kubuntu 10.04 "Lucid Lynx" - Alpha amd64 (20091225)
IwConfig:
 lo no wireless extensions.

 eth1 no wireless extensions.
MachineType: Vishal Rao BlackBird
Package: linux-image-2.6.32-9-generic 2.6.32-9.13
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-9-generic root=UUID=02af207c-81f2-491f-a3b7-1906a38964cd ro quiet splash
ProcEnviron:
 LANGUAGE=
 LANG=C
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-9.13-generic
Regression: Yes
RelatedPackageVersions: linux-firmware 1.28
Reproducible: Yes
RfKill:

SourcePackage: linux
Symptom: storage
Tags: lucid needs-upstream-testing regression-release
TestedUpstream: No
UdevMonitorLog:
 monitor will print the received events for:
 UDEV - the event which udev sends out after rule processing
Uname: Linux 2.6.32-9-generic x86_64
dmi.bios.date: 03/06/2008
dmi.bios.vendor: Intel Corp.
dmi.bios.version: DPP3510J.86A.0413.2008.0306.2218
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: DP35DP
dmi.board.vendor: Intel Corporation
dmi.board.version: AAD81073-207
dmi.chassis.asset.tag: 000
dmi.chassis.type: 3
dmi.chassis.vendor: Circle
dmi.chassis.version: cc 715
dmi.modalias: dmi:bvnIntelCorp.:bvrDPP3510J.86A.0413.2008.0306.2218:bd03/06/2008:svnVishalRao:pnBlackBird:pvr001:rvnIntelCorporation:rnDP35DP:rvrAAD81073-207:cvnCircle:ct3:cvrcc715:
dmi.product.name: BlackBird
dmi.product.version: 001
dmi.sys.vendor: Vishal Rao

Revision history for this message
Vishal Rao (vishalrao) wrote :
Tommy Trussell (tommy-trussell) wrote :

you asked in Bug 445852 if this might be a duplicate of that bug; try the workaround listed in comment 136 https://bugs.launchpad.net/linux/+bug/445852/comments/136 (though there's a typo) -- here it is, corrected:

1) Run the following command. It renames the existing file to devkit-disks-probe-ata-smart.bak and tells the package manager to install any new updates to the _changed_ file name:

$ sudo dpkg-divert --add --rename --divert /lib/udev/devkit-disks-probe-ata-smart.bak /lib/udev/devkit-disks-probe-ata-smart

If you want to see your new divert (and others in the system):

$ sudo dpkg-divert --list

2) Now create a dummy file using nano (substitute gedit or another editor for nano if you prefer).

$ sudo nano /lib/udev/devkit-disks-probe-ata-smart

Type the following three lines into the file then save it and exit the editor:

#!/bin/bash
#
exit 0

3) Make the new dummy file executable so udev can run it:

$ sudo chmod 755 /lib/udev/devkit-disks-probe-ata-smart

-----

If this workaround does not change your problem (or when the bug in libatasmart gets fixed) you will want to remove the dummy file and the divert entry by executing these two commands:

$ sudo rm /lib/udev/devkit-disks-probe-ata-smart
$ sudo dpkg-divert --rename --remove /lib/udev/devkit-disks-probe-ata-smart

Vishal Rao (vishalrao) wrote : Re: [Bug 502219] Re: Kernel errors triggered with SSD storage

I have since tried the workaround from the other bug (with a reboot) and it hasn't helped.
I will also post on the crucial.com forums to ask what their experience with Linux on these disks is.

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Vishal Rao (vishalrao) wrote :

Update from my side after installing Kubuntu Lucid Alpha 2:

It does not seem to be related to bug #445852 or SMART because I tried the workaround above to no effect and I also simply disabled SMART in the BIOS to no effect. I still get the ata errors in dmesg as before.

Although it may not be significant or relevant, note that right after I installed Lucid A2 I also updated the firmware on my Crucial 128 GB SSD model CT128M225 from 1819 to 1916, which seems to help with the performance issues reported by users of Win7 etc. It now has active GC along with TRIM.

I will now also post on the forums.crucial.com about these ata errors in dmesg and see if they can do anything about it.

Vishal Rao (vishalrao) wrote :

A forum user reports a normal-looking dmesg using an Intel SSD: http://ubuntuforums.org/showpost.php?p=8673825&postcount=11 but I'm sad that Win7 RC appears to have been working smoothly with this SSD ever since I got it.

Vishal Rao (vishalrao) wrote :

Is there anything I can do to help diagnose/investigate the problem?

Vishal Rao (vishalrao) wrote :

OK I think I know what's happening and I've fixed my issue!

I should have figured this out sooner but the problem is/was NCQ not SMART.

I did a web search for "failed command READ FPDMA QUEUED" and saw some references to NCQ, SSDs and all that jazz.
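For anyone checking their own logs for the same signature, here is a minimal sketch. The function name count_ncq_errors is made up for illustration; it only counts lines that mention an FPDMA QUEUED command in a saved log file, and the exact wording of the kernel message may differ between kernel versions:

```shell
#!/bin/sh
# count_ncq_errors FILE
# Prints how many lines in FILE mention an NCQ command (READ/WRITE FPDMA QUEUED).
# Hypothetical helper for illustration only.
count_ncq_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is deliberately ignored.
    grep -c 'FPDMA QUEUED' "$1" || true
}
```

Typical use would be to save the kernel log first (dmesg > /tmp/dmesg.txt) and then run count_ncq_errors /tmp/dmesg.txt, or to point it at /var/log/kern.log.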

So all I need to do is pass the libata.force=noncq Linux kernel boot parameter along with the existing (unrelated) parameters "quiet splash", like:

[code]
quiet splash libata.force=noncq
[/code]
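To keep the parameter across reboots on a GRUB 2 system such as Lucid, one option is to add it to the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub and regenerate the boot menu. A minimal sketch, assuming the stock Ubuntu file layout; the helper name add_kernel_param is made up for illustration and is not part of GRUB:

```shell
#!/bin/sh
# add_kernel_param FILE PARAM
# Appends PARAM to the GRUB_CMDLINE_LINUX_DEFAULT="..." line in FILE,
# unless PARAM is already present on that line.
# Hypothetical helper; PARAM must not contain '|' or '&' (sed metacharacters here).
add_kernel_param() {
    file=$1
    param=$2
    # Nothing to do if the parameter is already on the line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote,
    # writing to a temporary file and then replacing the original.
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}
```

Run as root, this would be add_kernel_param /etc/default/grub libata.force=noncq, followed by update-grub and a reboot. Afterwards cat /proc/cmdline should list the parameter.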

Intel evidently gets NCQ support right, and OCZ is apparently already blacklisted in the code, which looks like the reason the people above don't get this error.

The place to blacklist in the Linux kernel (.32 series) libata source would be drivers/ata/libata-core.c, around line 4252, next to the existing OCZ ata_blacklist_entry item:

[code]
{ "CRUCIAL_CT128M225", NULL, ATA_HORKAGE_NONCQ },
[/code]

This blacklists just the 128 GB model which I have; I'm guessing the 64 and 256 models also need blacklisting with their own ID strings. I guess you pass NULL for the firmware version string so that it blacklists all firmware versions.

I will post this info on the Crucial.com forums and also on the forums and see what happens from there.

Not sure if this should be fixed by blacklisting these Crucial M225 models in the libata linux kernel source or it should be fixed by Crucial in perhaps a new model and/or BIOS update.

I hope Ubuntu devs might get this into the .32 kernel released with Lucid to avoid headaches for other Crucial SSD users.

Vishal Rao (vishalrao) wrote :

Attaching a proposed (sample) patch, not yet tested as I am a first-timer with my kernel PPA. When it builds and I am able to run the kernel I will post back whether the patch worked or not...

Vishal Rao (vishalrao) wrote :

Confirmed to work at least for my Crucial 128 GB SSD model CT128M225.

Patch sent upstream to LKML and ATA subsystem maintainer Jeff Garzik, see:

http://lkml.org/lkml/2010/1/26/185
http://lkml.org/lkml/2010/1/26/202

If it gets dropped, hopefully it might in essence still be incorporated into Ubuntu kernels to improve the quality of the user experience of Lucid LTS just that little extra bit :-)

Jeremy Foshee (jeremyfoshee) wrote :

Vishal,
     As this is in heavy discussion upstream, we recommend that you open an upstream bug for this issue so that you can get them the information they need.

If you have already opened one, please add a bug watch to this report so that we can track the progress of the upstream ticket.

Thanks!

-Jfo

Changed in linux (Ubuntu):
status: Triaged → Incomplete
ove (ove) wrote :

Just an FYI: my Super Talent UltraDrive GX2 suffers from the same problem. Those errors only occur with firmware versions 1819 and 1916, but not 1571. Disabling NCQ fixed this for me.
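Whether NCQ is actually in use for a disk can be read from the queue_depth attribute in sysfs: a value of 1 means commands are not being queued, while a larger value (often 31) means NCQ is active. A minimal sketch; the function names ncq_depth and ncq_active are made up for illustration, and the optional second argument exists only so the sysfs root can be overridden for testing:

```shell
#!/bin/sh
# ncq_depth DEV [SYSFS_ROOT]
# Prints the queue depth of block device DEV (e.g. sda).
# Hypothetical helper; SYSFS_ROOT defaults to /sys.
ncq_depth() {
    cat "${2:-/sys}/block/$1/device/queue_depth"
}

# ncq_active DEV [SYSFS_ROOT]
# Succeeds (exit status 0) if the queue depth is greater than 1.
ncq_active() {
    [ "$(ncq_depth "$@")" -gt 1 ]
}
```

For example, ncq_active sda && echo "NCQ on" || echo "NCQ off". With libata.force=noncq on the kernel command line the depth should read 1. Writing 1 to the same file as root (echo 1 > /sys/block/sda/device/queue_depth) should also turn queuing off for that disk until the next reboot, which is handy for testing before changing the boot configuration.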

tags: added: patch
dirk (dirk-kuijsten) wrote :

Hi,

Thanks for this bug report. I finally found out why my new SSD wasn't working. I have a 64 GB Crucial M225 (fw 1916). I tried Ubuntu 10.04 with these kernels: 2.6.32-22-generic and 2.6.34-020634-generic. It failed both with my onboard nvidia SATA chip (GeForce 6150 motherboard, MCP51 chipset, sata_nv module) and with a separate PCI-e SATA controller: Silicon Image Sil3531 (module sata_sil24).
So upstream really should blacklist at least all of these Crucial models; mine is: CRUCIAL_CT64M225

I hope upstream will now pick up that at least two different Crucial SSDs and a Super Talent are faulty with Linux and require the no-NCQ setting.

Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired