On bootup smartd reports FailedReadSmartSelfTestLog but there is no hardware problem

Bug #1471462 reported by Richard
This bug affects 1 person
Affects: smartmontools (Ubuntu)
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned

Bug Description

SYMPTOM:
smartd reports that it cannot read the SMART self-test log for two Intel SSD drives.
Problems are reported for both drives, indicating that this is a software problem.

AFFECTS:
The issue started after upgrading to Ubuntu 15.04.
The issue did not happen on any Ubuntu release up to and including 14.10.

IMPACT:
errors written to syslog on bootup
emails sent to root on bootup

DIAGNOSE:
The log can be read without issues when the computer is running:
sudo smartctl --log=selftest /dev/sda

Perhaps the drive is accessed before it is ready, or some similar problem prevents the smartd commands from succeeding.

Emails are sent to root:
This message was generated by the smartd daemon running on:

   host name: c505
   DNS domain: [Empty]

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], Read SMART Self-Test Log Failed

Device info:
INTEL SSDSA2MH080G1GC, S/N:CVEM838000PY080DGN, WWN:5-001517-0ed62633c, FW:045C8820, 80.0 GB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
The original message about this issue was sent at Sat Jun 20 06:48:32 2015 PDT
Another message will be sent in 24 hours if the problem persists.

On bootup, the log displays:

Jul 3 21:47:26 c505 kernel: [1128701.864062] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 3 21:47:26 c505 kernel: [1128701.864069] ata4.00: failed command: SMART
Jul 3 21:47:26 c505 kernel: [1128701.864075] ata4.00: cmd b0/d5:01:06:4f:c2/00:00:00:00:00/00 tag 2 pio 512 in
Jul 3 21:47:26 c505 kernel: [1128701.864075] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 3 21:47:26 c505 kernel: [1128701.864079] ata4.00: status: { DRDY }
Jul 3 21:47:26 c505 kernel: [1128701.864083] ata4: hard resetting link
Jul 3 21:47:27 c505 kernel: [1128702.356041] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 3 21:47:27 c505 kernel: [1128702.356534] ata4.00: configured for UDMA/133
Jul 3 21:47:27 c505 kernel: [1128702.356552] ata4: EH complete
Jul 3 21:47:27 c505 kernel: [1128702.356782] ata4.00: Enabling discard_zeroes_data

ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: smartmontools 6.3+svn4002-2
ProcVersionSignature: Ubuntu 3.19.0-22.22-generic 3.19.8-ckt1
Uname: Linux 3.19.0-22-generic x86_64
NonfreeKernelModules: nvidia
ApportVersion: 2.17.2-0ubuntu1.1
Architecture: amd64
CurrentDesktop: GNOME-Flashback:Unity:
Date: Sat Jul 4 12:03:27 2015
ProcEnviron:
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: smartmontools
UpgradeStatus: Upgraded to vivid on 2015-06-20 (14 days ago)

Revision history for this message
Richard (ismail-a) wrote :
Christian Franke (christian-franke) wrote :

> ... kernel: [1128701.864075] ata4.00: cmd b0/d5:01:06:4f:c2/...
> ... kernel: [1128701.864075] res ... Emask 0x4 (timeout)

This ATA cmd is a correct SMART READ LOG 0x06 (self-test log) command. If the command fails with "timeout", this is possibly not a smartmontools issue.
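To illustrate how the kernel's "cmd" line maps to a SMART READ LOG request, here is a minimal Python sketch. The field order (command/feature:count:lba_low:lba_mid:lba_high) is assumed from the libata error message layout, and the function names are hypothetical, not part of smartmontools or the kernel:

```python
# Sketch: decode the first bytes of a libata "cmd" string.
# Assumed field order: command/feature:count:lba_low:lba_mid:lba_high/...
FIELDS = ("command", "feature", "count", "lba_low", "lba_mid", "lba_high")


def decode_ata_cmd(cmd):
    """Return the first six taskfile bytes of a libata 'cmd' string as a dict."""
    groups = cmd.split("/")
    raw = [groups[0]] + groups[1].split(":")
    return dict(zip(FIELDS, (int(byte, 16) for byte in raw)))


def describe(cmd):
    """Describe the command if it is a SMART READ LOG, else say that it is not."""
    tf = decode_ata_cmd(cmd)
    # SMART commands use opcode 0xB0 with the signature 0x4F/0xC2 in LBA mid/high.
    is_smart = tf["command"] == 0xB0 and (tf["lba_mid"], tf["lba_high"]) == (0x4F, 0xC2)
    # Feature 0xD5 selects SMART READ LOG; the LBA low byte holds the log address.
    if is_smart and tf["feature"] == 0xD5:
        return "SMART READ LOG, log address 0x%02x, %d sector(s)" % (tf["lba_low"], tf["count"])
    return "not a SMART READ LOG command"


print(describe("b0/d5:01:06:4f:c2/00:00:00:00:00/00"))
# prints: SMART READ LOG, log address 0x06, 1 sector(s)
```

Log address 0x06 is the self-test log, which matches the command that timed out in the syslog excerpt.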

> Device info:
> INTEL SSDSA2MH080G1GC, ...

Some older Intel SSDs have known issues with the SMART READ LOG 0x00 (log directory) command: After this command was completed, the drive may hang. Smartd reads the log directory to check which log is supported, see https://www.smartmontools.org/ticket/89.

This very old X18/X25-M G1 series SSD may or may not be affected. Please check whether the problem persists when '-F nologdir' is added to the smartd.conf configuration line.
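
As a sketch, the test might look like this in /etc/smartd.conf. This assumes the drive is monitored through an explicit /dev/sda line with the '-a' directive; the reporter's actual configuration is not shown in this report, and a stock install may use a DEVICESCAN line instead:

```
# /etc/smartd.conf (hypothetical line; adapt to the existing configuration)
# -F nologdir: do not read the SMART log directory (log 0x00)
/dev/sda -a -F nologdir
```

smartd has to be restarted for the change to take effect.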

Christian Franke (christian-franke) wrote :

This is possibly not the Intel firmware bug mentioned in my previous comment but an older one:

X18/X25-M/V G2 series with Firmware before 2CV102J8 may hang if self-test log is read. G1 series may also be affected. Smartctl and smartd print a related warning for G2 series but not for G1.

Please update the SSD firmware if possible.

See also this ticket from 2010:
"smartd Read SMART Self Test Log Failed on Intel X25-M SSD":
https://bugs.launchpad.net/ubuntu/+source/smartmontools/+bug/597518

Richard (ismail-a) wrote :

For the 34 nm, Intel has an update.
For the 50 nm, there is no fix from Intel.

Robie Basak (racb) wrote :

Christian, thank you for looking after smartmontools bugs in Ubuntu. Do you consider this a valid bug for which we should track a fix in upstream and Ubuntu?

Christian Franke (christian-franke) wrote :

It is possibly a bug in the Intel X25 G1 firmware that Intel never fixed (a similar bug in G2 was fixed). But I don't remember any related report since the G1 series was introduced in 2009(!).

It is possibly not a smartmontools bug, except if the problem did not occur with older versions.

Please test if '-F nologdir' changes anything. If not, test whether another problem occurs if '-l selftest' is not specified: Don't use '-a' in smartd.conf, use '-H -f -t -l error' instead.
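
For illustration, the second test might look like this in /etc/smartd.conf. Again, the explicit /dev/sda device line is an assumption, since the reporter's configuration is not shown:

```
# /etc/smartd.conf (hypothetical line; adapt to the existing configuration)
# Same checks as suggested above, without '-a', so '-l selftest' is not enabled
/dev/sda -H -f -t -l error
```

If the timeout no longer appears in syslog with this line, the self-test log read is the likely trigger.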

All that could be done in smartmontools is to add another -F (--firmwarebug) option to suppress accesses to the self-test log. If desired, please create a ticket upstream. But I'm not sure whether this enhancement is worth the (non-trivial) effort for a single SSD series introduced 6+ years ago.

Richard (ismail-a) wrote :

I updated the G2 34 nm drive, works fine.

The G1 has the problem, and Intel has decided not to provide a fix.

It is true that these drives are 7 years old, however the wear indicator says there's another 14 years in them.
They sure were expensive but the only product tested to be reliable at the time.

Joshua Powers (powersj)
Changed in smartmontools (Ubuntu):
status: New → Incomplete