MAAS

Bug #1783889
Comment #7

Tyler Gray (tyler.gray) wrote on 2018-07-30:

Okay, maybe I need more help reading these logs, but I've been trying to study them. From what I can tell, what you posted does not actually appear to show read errors that we would need to be concerned about.

Here are some wikis I've used to help me understand the errors:
https://lime-technology.com/wiki/Understanding_SMART_Reports
https://en.wikipedia.org/wiki/S.M.A.R.T.

According to these wikis, there are a few things to note:
1) The VALUE and WORST columns are normalized values that tend to start at 100 (or higher) and count down, while THRESH is the fixed failure threshold for that attribute. So if the current VALUE (currently at 130) dropped to or below the THRESH of 039, that would signify a problem with the drive.

2) The column FAIL seems to indicate the last operational hour (from attribute 9 Power_On_Hours) that this attribute failed. Right now that column is blank ('-').

3) It mentions that the RAW_VALUE column should basically be ignored. Its meaning is entirely up to the drive manufacturer. These are Intel drives.

The overall result of that section of the test was:
SMART overall-health self-assessment test result: PASSED

So even with those values, smartctl isn't really declaring that the drive is having issues.

Here's an example from another server of ours where the smartctl results came back clean, with the only difference being that there were no entries in the device's error log:
ID# ATTRIBUTE_NAME         FLAGS   VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate    -OSR--  130   130   039    -    8637
 13 Read_Soft_Error_Rate   -OSRC-  130   130   000    -    8637
201 Unknown_SSD_Attribute  PO--CK  100   100   010    -    103079492898
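
To make the VALUE versus THRESH reading from point 1 concrete, here is a rough Python sketch that parses the three rows above and flags any attribute whose VALUE is at or below its THRESH. The helper names (parse_row, at_or_below_threshold) are made up for illustration, and the simple "VALUE <= THRESH" rule is my reading of the wikis, not something taken from smartctl's own code.

```python
# Rows copied from the smartctl attribute output quoted above.
SAMPLE = """\
1 Raw_Read_Error_Rate -OSR-- 130 130 039 - 8637
13 Read_Soft_Error_Rate -OSRC- 130 130 000 - 8637
201 Unknown_SSD_Attribute PO--CK 100 100 010 - 103079492898
"""


def parse_row(line):
    """Split one whitespace-separated attribute row into named fields."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "flags": fields[2],
        "value": int(fields[3]),   # normalized current value
        "worst": int(fields[4]),   # lowest normalized value seen so far
        "thresh": int(fields[5]),  # failure threshold
        "fail": fields[6],         # '-' when the attribute has not failed
        "raw": fields[7],          # vendor-specific, left as a string
    }


def at_or_below_threshold(attr):
    """Assumed rule: an attribute is a concern once VALUE <= THRESH."""
    return attr["value"] <= attr["thresh"]


rows = [parse_row(line) for line in SAMPLE.splitlines()]
suspect = [r["name"] for r in rows if at_or_below_threshold(r)]
print(suspect or "no attributes at or below threshold")
```

None of the three rows trips the check, which matches the PASSED overall result.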

So this is my reasoning, from what I can tell: if a drive simply has an entry in its error log, that flips bit 6 of the smartctl return code (giving a return code of 64 in decimal). Those entries stay in the log permanently, so the return code could be ignored if bit 6 is the only bit flagged.
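
For illustration, here is a minimal Python sketch of that check. The bit meanings are paraphrased from the smartctl man page's exit status section; the function names (decode_exit_status, only_error_log_flagged) are hypothetical and not part of MAAS or smartmontools.

```python
# Paraphrased from the "EXIT STATUS" section of the smartctl man page.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device in low-power mode",
    2: "a SMART or ATA command failed, or checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}

ERROR_LOG_BIT = 6


def decode_exit_status(status):
    """Return the list of bit numbers set in a smartctl exit status."""
    return [bit for bit in SMARTCTL_EXIT_BITS if status & (1 << bit)]


def only_error_log_flagged(status):
    """True when bit 6 is set and no other bit is."""
    return status == 1 << ERROR_LOG_BIT


for status in (0, 64, 72):
    bits = decode_exit_status(status)
    reasons = [SMARTCTL_EXIT_BITS[b] for b in bits]
    print(status, bits, reasons, only_error_log_flagged(status))
```

With this, a return code of 64 decodes to bit 6 alone, while something like 72 (bits 3 and 6) would still be treated as a real failure.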