Bug #1783889 “COMMISSION S.M.A.R.T Tests fail unnecessarily on ...” : Bugs : MAAS

Andres Rodriguez (andreserl) on 2018-07-26

Changed in maas:
milestone:	none → 2.5.0alpha2
status:	New → Triaged
importance:	Undecided → Medium

Evan Sikorski (evan.sikorski) wrote on 2018-07-26:

#1

We made a small improvement to this if you wish to incorporate it.

if (proc.returncode & 187) != 0:

and

return 0 if (proc.returncode & 187) == 0 else proc.returncode

"Those would be smarter ways to do what we wanted, similar to what the munin guy is doing. Now *ALL* of the return codes 4, 64, and 68 will return 0!
Not sure if you can update the bug with those lines as suggestions in place of what I had before :confused:
This will also always mask those errors, so when people see the return codes they can ignore the `4` and the `64` because they won't even show in the exit code"
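
As a minimal sketch of where the 187 comes from, assuming the intent is to ignore only bit 2 (value 4) and bit 6 (value 64) of smartctl's 8-bit exit status: the mask is the 8-bit complement of 4 | 64. The names IGNORED_BITS, MASK, and is_failure are illustrative and are not taken from the MAAS script.

```python
# Illustrative names only; none of these come from the MAAS smartctl script.
IGNORED_BITS = (1 << 2) | (1 << 6)  # 4 | 64 == 68
MASK = 0xFF & ~IGNORED_BITS         # 0b10111011 == 187


def is_failure(returncode):
    """Return True if any bit other than bit 2 or bit 6 is set."""
    return (returncode & MASK) != 0


if __name__ == "__main__":
    # 0, 4, 64 and 68 are treated as passing; 8 and 72 still fail.
    for rc in (0, 4, 64, 68, 8, 72):
        print(rc, is_failure(rc))
```

Writing the mask this way keeps the ignored bits visible in the code instead of hiding them in a magic number.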

Tyler Gray (tyler.gray) wrote on 2018-07-26:

#2

There are two ways to solve this:
(Still referencing lines 159 and 172)

1) Account for all combinations of the bits flagged for the test failures we want to ignore (0, 4, 64, and 68)

In which case the code would need to be:
if (proc.returncode != 0 and proc.returncode != 4 and proc.returncode != 64 and proc.returncode != 68):

and

return (0 if proc.returncode == 4 or proc.returncode == 64 or proc.returncode == 68 else proc.returncode)

2) Ignore those flags entirely by using a bitwise mask that excludes those bits from the check

In which case the code could be:
if (proc.returncode & 187) != 0:

and

return 0 if (proc.returncode & 187) == 0 else proc.returncode

And you might even be able to shorten line 172 to just the below, and remove the if/else completely:
return (proc.returncode & 187)

Not sure how the devs would prefer to handle it, but I wanted to throw out a more complete solution.
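
To compare the options side by side, here is a minimal sketch. The function names are hypothetical, and the plain `returncode` argument stands in for `proc.returncode`. It checks that options 1 and 2 agree for every 8-bit exit status and shows how the shortened form behaves when another bit is set alongside bit 6.

```python
def ignore_by_enumeration(returncode):
    """Option 1: list every combination of the ignored bits explicitly."""
    return 0 if returncode in (0, 4, 64, 68) else returncode


def ignore_by_mask(returncode):
    """Option 2: test only the bits that are not ignored (187 == 0b10111011)."""
    return 0 if (returncode & 187) == 0 else returncode


def shortened(returncode):
    """Shortened form: always strips bits 2 and 6 from the result."""
    return returncode & 187


# Options 1 and 2 give the same result for every possible 8-bit exit status.
assert all(ignore_by_enumeration(rc) == ignore_by_mask(rc) for rc in range(256))

if __name__ == "__main__":
    # 72 == 64 + 8: option 2 reports 72 unchanged, the shortened form reports 8.
    print(ignore_by_mask(72), shortened(72))
```

The shortened form differs from the other two only when a non-ignored bit is set together with bit 2 or bit 6: it still reports a failure, but the reported code no longer shows the ignored bits.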

Lee Trager (ltrager) wrote on 2018-07-26:

#3

Thanks for the report. I'm trying to understand what exactly is failing. Could you please post output of the following:

* The output of the MAAS smartctl test
* sudo smartctl --xall <disk with error>; echo $?

Changed in maas:
status:	Triaged → Incomplete

Tyler Gray (tyler.gray) wrote on 2018-07-27:

#4

A few things to note:
1) Scratch the last suggestion of "return (proc.returncode & 187)": it would strip bits 2 and 6 from the returned code even when other bits are also flagged by a failure.

2) According to the smartctl man page, any error recorded in the device error log, even a past one, keeps bit 6 set (with the 8 bits numbered 0-7). So even if the log only contains old errors and nothing is actively wrong with the disk, smartctl still sets bit 6, which gives an exit code of 64.

I'll post a failed log for you when I get a chance.
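
For reference, a minimal sketch of decoding a smartctl exit status into its individual bits. The bit descriptions are paraphrased from the smartctl man page and the names SMARTCTL_EXIT_BITS and decode_exit_status are hypothetical, so check the wording against the man page for the installed version.

```python
# Bit meanings paraphrased from the smartctl man page (exit status section).
# The dictionary and function names are illustrative, not part of any tool.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device is in a low-power mode",
    2: "a SMART/ATA command failed or a SMART data structure had a checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes are at or below threshold",
    5: "some attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_exit_status(returncode):
    """Return the descriptions of every bit set in a smartctl exit status."""
    return [
        description
        for bit, description in sorted(SMARTCTL_EXIT_BITS.items())
        if returncode & (1 << bit)
    ]


if __name__ == "__main__":
    for rc in (64, 68):
        print(rc, decode_exit_status(rc))
```

Decoding 64 this way yields only the bit 6 entry, which matches the case described above where the error log has old records but the disk is otherwise healthy.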

Tyler Gray (tyler.gray) wrote on 2018-07-27:

#5

INFO: Veriying SMART support for the following drive: /dev/sdm
INFO: Running command: sudo -n smartctl --all /dev/sdm

INFO: SMART support is available; continuing...
INFO: Verifying and/or validating SMART tests...
INFO: Running command: sudo -n smartctl --xall /dev/sdm

FAILURE: SMART tests have FAILED for: /dev/sdm
The test exited with return code 64! See the smarctl manpage for information on the return code meaning. For more information on the test failures, review the test output provided below.
---------------------------------------------------

smartctl 6.5 2016-01-24 r4214 [x86_64-linux-4.4.0-108-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SSDSC2BB120G7R
Serial Number:    PHDV808402XH150MGN
LU WWN Device Id: 5 5cd2e4 14f1c6a0b
Add. Product Id:  DELL(tm)
Firmware Version: N201DL43
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jul 26 18:55:37 2018 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Unavailable
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(   18) seconds.
Offline data collection
capabilities: 			 (0x79) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 (  60) minutes.
Conveyance self-test routine
recommended polling time: 	 (  60) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     -OSR--   130   130   039    -    8597
  5 Reallocated_Sector_Ct   PO--CK   100   100   001    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    24
 12 Power_Cycle_Count       -O--CK   100   100   000    -    43
 13 Read_Soft_Error_Rate    -OSRC-   130   130   000    -    8597
170 Unknown_Attribute       PO--CK   100   100   010    -    0
174 Unknown_Attribute       -O--CK   100   100   000    -    38
179 Used_Rsvd_Blk_Cnt_Tot   PO--CK   100   100   010    -    0
180 Unused_Rsvd_Blk_Cnt_Tot -O--CK   100   100   000    -    5751
181 Program_Fail_Cnt_Total  -O-RCK   100   100   000    -    0
182 Erase_Fail_Count_Total  -O-RCK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   000    -    0
194 Temperature_Celsius     -O---K   100   100   000    -    34
195 Hardware_ECC_Recovered  -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   100   100   000    -    1084
201 Unknown_SSD_Attribute   PO--CK   100   100   010    -    227633478306
202 Unknown_SSD_Attribute   POS--K   100   100   000    -    0
225 Unknown_SSD_Attribute   -O--CK   100   100   000    -    5
226 Unknown_SSD_Attribute   -O--CK   100   100   000    -    102400
227 Unknown_SSD_Attribute   -O--CK   100   100   000    -    0
228 Power-off_Retract_Count -O--CK   100   100   000    -    360789450
232 Available_Reservd_Space PO--CK   100   100   010    -    0
233 Media_Wearout_Indicator -O--CK   100   100   000    -    5
234 Unknown_Attribute       -O--CK   100   100   000    -    0
241 Total_LBAs_Written      -O--CK   100   100   000    -    5
242 Total_LBAs_Read         -O--CK   100   100   000    -    7602
245 Unknown_Attribute       -O--CK   100   100   000    -    100
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x02           SL  R/O      8  Comprehensive SMART error log
0x03       GPL     R/O     20  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      2  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80       GPL     R/W  25600  Host vendor specific log
0x81       GPL,SL  R/W    100  Host vendor specific log
0x85       GPL     R/W  13028  Host vendor specific log
0x85       SL      R/W    228  Host vendor specific log
0x86       GPL     R/W  25600  Host vendor specific log
0x87       GPL     R/W  51812  Host vendor specific log
0x87       SL      R/W    100  Host vendor specific log
0x88       GPL     R/W  33077  Host vendor specific log
0x88       SL      R/W     53  Host vendor specific log
0x89       GPL,SL  R/W     21  Host vendor specific log
0x8b       GPL     R/W  13288  Host vendor specific log
0x8b       SL      R/W    232  Host vendor specific log
0x8c       GPL     R/W  25600  Host vendor specific log
0x8d       GPL,SL  R/W    100  Host vendor specific log
0x91       GPL     R/W  13033  Host vendor specific log
0x91       SL      R/W    233  Host vendor specific log
0x92       GPL     R/W  25600  Host vendor specific log
0x93       GPL     R/W   1380  Host vendor specific log
0x93       SL      R/W    100  Host vendor specific log
0x97       GPL     R/W  13034  Host vendor specific log
0x97       SL      R/W    234  Host vendor specific log
0x98       GPL     R/W  25600  Host vendor specific log
0x99-0x9a  GPL,SL  R/W      1  Host vendor specific log
0x9d       GPL     R/W  13041  Host vendor specific log
0x9d       SL      R/W    241  Host vendor specific log
0x9e       GPL     R/W  25600  Host vendor specific log
0x9f       GPL     R/W   1380  Host vendor specific log
0x9f       SL      R/W    100  Host vendor specific log
0xa3       GPL     VS   13042  Device vendor specific log
0xa3       SL      VS     242  Device vendor specific log
0xa4       GPL     VS   25600  Device vendor specific log
0xa5       GPL     VS   45668  Device vendor specific log
0xa5       SL      VS     100  Device vendor specific log
0xa6       GPL,SL  VS      29  Device vendor specific log
0xa9       GPL     VS   13045  Device vendor specific log
0xa9       SL      VS     245  Device vendor specific log
0xaa       GPL     VS   25600  Device vendor specific log
0xab       GPL     VS   25700  Device vendor specific log
0xab       SL      VS     100  Device vendor specific log
0xb5       GPL,SL  VS       2  Device vendor specific log
0xb6       GPL,SL  VS      18  Device vendor specific log
0xb7       GPL     VS   30976  Device vendor specific log
0xb8       GPL,SL  VS       1  Device vendor specific log
0xb9       GPL     VS    1704  Device vendor specific log
0xb9       SL      VS     168  Device vendor specific log
0xba       GPL     VS    1280  Device vendor specific log
0xbb       GPL,SL  VS      60  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      8  SCT Data Transfer
0xff       GPL     -    19968  Reserved

SMART Extended Comprehensive Error Log Version: 1 (20 sectors)
Device Error Count: 6
	CR     = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH     = LBA High (was: Cylinder High) Register    ]   LBA
	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
	LL     = LBA Low (was: Sector Number) Register     ]
	DV     = Device (was: Device/Head) Register
	DC     = Device Control Register
	ER     = Error register
	ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Warning! SMART Extended Comprehensive Error Log Structure error: invalid SMART checksum.
Error 6 [5] occurred at disk power-on lifetime: 17 hours (0 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  84 -- 41 03 ff 00 00 0d f9 0f 23 40 00  Error: ICRC, ABRT at LBA = 0x0df90f23 = 234426147

Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 00 03 ff 00 00 0d f9 0f 23 40 00     00:06:15.741  WRITE FPDMA QUEUED
  ef 00 02 00 00 00 00 00 00 00 00 00 00     00:06:15.741  SET FEATURES [Enable write cache]
  ef 00 aa 00 00 00 00 00 00 00 00 00 00     00:06:15.741  SET FEATURES [Enable read look-ahead]
  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:06:15.740  IDENTIFY DEVICE
  b0 00 d0 00 00 00 00 00 c2 4f 00 00 00     00:06:15.739  SMART READ DATA

Error 5 [4] occurred at disk power-on lifetime: 10 hours (0 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  84 -- 41 03 ff 00 00 0d f9 13 22 40 00  Error: ICRC, ABRT at LBA = 0x0df91322 = 234427170

Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 00 03 ff 00 00 0d f9 13 22 40 00     00:07:19.045  WRITE FPDMA QUEUED
  ef 00 02 00 00 00 00 00 00 00 00 00 00     00:07:19.045  SET FEATURES [Enable write cache]
  ef 00 aa 00 00 00 00 00 00 00 00 00 00     00:07:19.045  SET FEATURES [Enable read look-ahead]
  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:07:19.044  IDENTIFY DEVICE
  b0 00 d0 00 00 00 00 00 c2 4f 00 00 00     00:07:19.043  SMART READ DATA

Error 4 [3] occurred at disk power-on lifetime: 8 hours (0 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  84 -- 41 03 ff 00 00 0d f9 0f 23 40 00  Error: ICRC, ABRT at LBA = 0x0df90f23 = 234426147

Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 00 03 ff 00 00 0d f9 0f 23 40 00     00:32:47.402  WRITE FPDMA QUEUED
  ef 00 02 00 00 00 00 00 00 00 00 00 00     00:32:47.402  SET FEATURES [Enable write cache]
  ef 00 aa 00 00 00 00 00 00 00 00 00 00     00:32:47.402  SET FEATURES [Enable read look-ahead]
  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:32:47.401  IDENTIFY DEVICE
  b0 00 d0 00 00 00 00 00 c2 4f 00 00 00     00:32:47.400  SMART READ DATA

Error 3 [2] occurred at disk power-on lifetime: 7 hours (0 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  84 -- 41 03 ff 00 00 0d f9 0b 24 40 00  Error: ICRC, ABRT at LBA = 0x0df90b24 = 234425124

Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 00 03 ff 00 00 0d f9 0b 24 40 00     00:51:18.936  WRITE FPDMA QUEUED
  ef 00 02 00 00 00 00 00 00 00 00 00 00     00:51:18.936  SET FEATURES [Enable write cache]
  ef 00 aa 00 00 00 00 00 00 00 00 00 00     00:51:18.936  SET FEATURES [Enable read look-ahead]
  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:51:18.935  IDENTIFY DEVICE
  b0 00 d0 00 00 00 00 00 c2 4f 00 00 00     00:51:18.934  SMART READ DATA

Error 2 [1] occurred at disk power-on lifetime: 6 hours (0 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  84 -- 41 03 ff 00 00 0d f9 13 22 40 00  Error: ICRC, ABRT at LBA = 0x0df91322 = 234427170

Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 00 03 ff 00 00 0d f9 13 22 40 00     00:35:19.875  WRITE FPDMA QUEUED
  ef 00 02 00 00 00 00 00 00 00 00 00 00     00:35:19.875  SET FEATURES [Enable write cache]
  ef 00 aa 00 00 00 00 00 00 00 00 00 00     00:35:19.874  SET FEATURES [Enable read look-ahead]
  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:35:19.873  IDENTIFY DEVICE
  b0 00 d0 00 00 00 00 00 c2 4f 00 00 00     00:35:19.872  SMART READ DATA

Error 1 [0] occurred at disk power-on lifetime: 6 hours (0 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  84 -- 41 03 ff 00 00 0d f9 03 26 40 00  Error: ICRC, ABRT at LBA = 0x0df90326 = 234423078

Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 00 03 ff 00 00 0d f9 03 26 40 00     00:24:26.654  WRITE FPDMA QUEUED
  ef 00 02 00 00 00 00 00 00 00 00 00 00     00:24:26.654  SET FEATURES [Enable write cache]
  ef 00 aa 00 00 00 00 00 00 00 00 00 00     00:24:26.654  SET FEATURES [Enable read look-ahead]
  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:24:26.653  IDENTIFY DEVICE
  b0 00 d0 00 00 00 00 00 c2 4f 00 00 00     00:24:26.653  SMART READ DATA

SMART Extended Self-test Log Version: 1 (2 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%        20         -
# 2  Short offline       Aborted by host               00%        20         -
# 3  Short offline       Completed without error       00%        19         -
# 4  Extended offline    Aborted by host               00%        19         -
# 5  Extended offline    Completed without error       00%         2         -
# 6  Short offline       Aborted by host               00%         2         -
# 7  Short offline       Completed without error       00%         2         -
# 8  Vendor (0x40)       Aborted by host               00%         2         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       1 (0x0001)
SCT Support Level:                   0
Device State:                        Active (0)
Current Temperature:                    34 Celsius
Power Cycle Min/Max Temperature:     26/34 Celsius
Lifetime    Min/Max Temperature:     12/56 Celsius
Under/Over Temperature Limit Count:   0/0
Vendor specific:
00 00 2c 01 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/70 Celsius
Min/Max Temperature Limit:            0/70 Celsius
Temperature History Size (Index):    478 (94)

Index    Estimated Time   Temperature Celsius
  95    2018-07-26 10:58    53  **********************************
 ...    ..(  8 skipped).    ..  **********************************
 104    2018-07-26 11:07    53  **********************************
 105    2018-07-26 11:08    52  *********************************
 ...    ..( 20 skipped).    ..  *********************************
 126    2018-07-26 11:29    52  *********************************
 127    2018-07-26 11:30     ?  -
 128    2018-07-26 11:31    35  ****************
 129    2018-07-26 11:32    36  *****************
 130    2018-07-26 11:33    37  ******************
 131    2018-07-26 11:34    38  *******************
 132    2018-07-26 11:35    38  *******************
 133    2018-07-26 11:36    39  ********************
 134    2018-07-26 11:37    40  *********************
 135    2018-07-26 11:38    41  **********************
 136    2018-07-26 11:39    41  **********************
 137    2018-07-26 11:40    42  ***********************
 138    2018-07-26 11:41    43  ************************
 139    2018-07-26 11:42    43  ************************
 140    2018-07-26 11:43    44  *************************
 141    2018-07-26 11:44    44  *************************
 142    2018-07-26 11:45    44  *************************
 143    2018-07-26 11:46    45  **************************
 144    2018-07-26 11:47    45  **************************
 145    2018-07-26 11:48    45  **************************
 146    2018-07-26 11:49    46  ***************************
 147    2018-07-26 11:50    46  ***************************
 148    2018-07-26 11:51    46  ***************************
 149    2018-07-26 11:52    47  ****************************
 ...    ..(  3 skipped).    ..  ****************************
 153    2018-07-26 11:56    47  ****************************
 154    2018-07-26 11:57    48  *****************************
 ...    ..(  2 skipped).    ..  *****************************
 157    2018-07-26 12:00    48  *****************************
 158    2018-07-26 12:01    49  ******************************
 ...    ..(  4 skipped).    ..  ******************************
 163    2018-07-26 12:06    49  ******************************
 164    2018-07-26 12:07    50  *******************************
 ...    ..(  4 skipped).    ..  *******************************
 169    2018-07-26 12:12    50  *******************************
 170    2018-07-26 12:13    51  ********************************
 ...    ..(  5 skipped).    ..  ********************************
 176    2018-07-26 12:19    51  ********************************
 177    2018-07-26 12:20    52  *********************************
 ...    ..(  7 skipped).    ..  *********************************
 185    2018-07-26 12:28    52  *********************************
 186    2018-07-26 12:29    53  **********************************
 ...    ..(  9 skipped).    ..  **********************************
 196    2018-07-26 12:39    53  **********************************
 197    2018-07-26 12:40    54  ***********************************
 ...    ..( 10 skipped).    ..  ***********************************
 208    2018-07-26 12:51    54  ***********************************
 209    2018-07-26 12:52     ?  -
 210    2018-07-26 12:53    18  -
 211    2018-07-26 12:54    19  -
 212    2018-07-26 12:55    20  *
 213    2018-07-26 12:56    21  **
 214    2018-07-26 12:57    21  **
 215    2018-07-26 12:58    22  ***
 216    2018-07-26 12:59    23  ****
 217    2018-07-26 13:00    23  ****
 218    2018-07-26 13:01    24  *****
 219    2018-07-26 13:02    25  ******
 220    2018-07-26 13:03    26  *******
 221    2018-07-26 13:04    27  ********
 222    2018-07-26 13:05    27  ********
 223    2018-07-26 13:06    28  *********
 224    2018-07-26 13:07    29  **********
 225    2018-07-26 13:08    30  ***********
 226    2018-07-26 13:09    30  ***********
 227    2018-07-26 13:10     ?  -
 228    2018-07-26 13:11    30  ***********
 229    2018-07-26 13:12     ?  -
 230    2018-07-26 13:13    32  *************
 231    2018-07-26 13:14    33  **************
 232    2018-07-26 13:15    34  ***************
 233    2018-07-26 13:16    35  ****************
 234    2018-07-26 13:17    35  ****************
 235    2018-07-26 13:18    36  *****************
 236    2018-07-26 13:19    37  ******************
 237    2018-07-26 13:20    37  ******************
 238    2018-07-26 13:21    38  *******************
 239    2018-07-26 13:22    38  *******************
 240    2018-07-26 13:23    38  *******************
 241    2018-07-26 13:24    39  ********************
 242    2018-07-26 13:25    39  ********************
 243    2018-07-26 13:26    40  *********************
 244    2018-07-26 13:27    40  *********************
 245    2018-07-26 13:28    41  **********************
 246    2018-07-26 13:29    41  **********************
 247    2018-07-26 13:30    41  **********************
 248    2018-07-26 13:31    42  ***********************
 249    2018-07-26 13:32    42  ***********************
 250    2018-07-26 13:33    42  ***********************
 251    2018-07-26 13:34    43  ************************
 252    2018-07-26 13:35    43  ************************
 253    2018-07-26 13:36     ?  -
 254    2018-07-26 13:37    43  ************************
 255    2018-07-26 13:38    44  *************************
 256    2018-07-26 13:39    44  *************************
 257    2018-07-26 13:40    45  **************************
 258    2018-07-26 13:41    45  **************************
 259    2018-07-26 13:42    46  ***************************
 ...    ..( 10 skipped).    ..  ***************************
 270    2018-07-26 13:53    46  ***************************
 271    2018-07-26 13:54    47  ****************************
 ...    ..(  7 skipped).    ..  ****************************
 279    2018-07-26 14:02    47  ****************************
 280    2018-07-26 14:03    48  *****************************
 ...    ..(  3 skipped).    ..  *****************************
 284    2018-07-26 14:07    48  *****************************
 285    2018-07-26 14:08    49  ******************************
 ...    ..(  9 skipped).    ..  ******************************
 295    2018-07-26 14:18    49  ******************************
 296    2018-07-26 14:19    50  *******************************
 297    2018-07-26 14:20    51  ********************************
 298    2018-07-26 14:21    51  ********************************
 299    2018-07-26 14:22    52  *********************************
 ...    ..(  3 skipped).    ..  *********************************
 303    2018-07-26 14:26    52  *********************************
 304    2018-07-26 14:27    51  ********************************
 305    2018-07-26 14:28    51  ********************************
 306    2018-07-26 14:29    50  *******************************
 307    2018-07-26 14:30    50  *******************************
 308    2018-07-26 14:31    50  *******************************
 309    2018-07-26 14:32    49  ******************************
 310    2018-07-26 14:33    49  ******************************
 311    2018-07-26 14:34    49  ******************************
 312    2018-07-26 14:35    48  *****************************
 ...    ..(  3 skipped).    ..  *****************************
 316    2018-07-26 14:39    48  *****************************
 317    2018-07-26 14:40    49  ******************************
 318    2018-07-26 14:41    50  *******************************
 319    2018-07-26 14:42    50  *******************************
 320    2018-07-26 14:43    51  ********************************
 321    2018-07-26 14:44    52  *********************************
 ...    ..(  5 skipped).    ..  *********************************
 327    2018-07-26 14:50    52  *********************************
 328    2018-07-26 14:51    51  ********************************
 329    2018-07-26 14:52    51  ********************************
 330    2018-07-26 14:53    51  ********************************
 331    2018-07-26 14:54    50  *******************************
 332    2018-07-26 14:55    50  *******************************
 333    2018-07-26 14:56    49  ******************************
 334    2018-07-26 14:57    49  ******************************
 335    2018-07-26 14:58    48  *****************************
 336    2018-07-26 14:59    48  *****************************
 337    2018-07-26 15:00    48  *****************************
 338    2018-07-26 15:01    49  ******************************
 339    2018-07-26 15:02    49  ******************************
 340    2018-07-26 15:03    50  *******************************
 341    2018-07-26 15:04    51  ********************************
 342    2018-07-26 15:05    51  ********************************
 343    2018-07-26 15:06    52  *********************************
 344    2018-07-26 15:07    52  *********************************
 345    2018-07-26 15:08    53  **********************************
 346    2018-07-26 15:09    53  **********************************
 347    2018-07-26 15:10    52  *********************************
 348    2018-07-26 15:11    52  *********************************
 349    2018-07-26 15:12    52  *********************************
 350    2018-07-26 15:13    51  ********************************
 351    2018-07-26 15:14    51  ********************************
 352    2018-07-26 15:15    50  *******************************
 353    2018-07-26 15:16    50  *******************************
 354    2018-07-26 15:17    49  ******************************
 355    2018-07-26 15:18    49  ******************************
 356    2018-07-26 15:19    49  ******************************
 357    2018-07-26 15:20    48  *****************************
 ...    ..(  4 skipped).    ..  *****************************
 362    2018-07-26 15:25    48  *****************************
 363    2018-07-26 15:26    49  ******************************
 364    2018-07-26 15:27    49  ******************************
 365    2018-07-26 15:28    50  *******************************
 366    2018-07-26 15:29    51  ********************************
 367    2018-07-26 15:30    51  ********************************
 368    2018-07-26 15:31    52  *********************************
 ...    ..(  2 skipped).    ..  *********************************
 371    2018-07-26 15:34    52  *********************************
 372    2018-07-26 15:35    51  ********************************
 373    2018-07-26 15:36    51  ********************************
 374    2018-07-26 15:37    51  ********************************
 375    2018-07-26 15:38    50  *******************************
 ...    ..(  2 skipped).    ..  *******************************
 378    2018-07-26 15:41    50  *******************************
 379    2018-07-26 15:42    49  ******************************
 ...    ..(  4 skipped).    ..  ******************************
 384    2018-07-26 15:47    49  ******************************
 385    2018-07-26 15:48    50  *******************************
 386    2018-07-26 15:49    51  ********************************
 387    2018-07-26 15:50    52  *********************************
 388    2018-07-26 15:51    52  *********************************
 389    2018-07-26 15:52    53  **********************************
 ...    ..(  3 skipped).    ..  **********************************
 393    2018-07-26 15:56    53  **********************************
 394    2018-07-26 15:57    52  *********************************
 395    2018-07-26 15:58    51  ********************************
 396    2018-07-26 15:59     ?  -
 397    2018-07-26 16:00    48  *****************************
 398    2018-07-26 16:01    48  *****************************
 399    2018-07-26 16:02    47  ****************************
 ...    ..(  8 skipped).    ..  ****************************
 408    2018-07-26 16:11    47  ****************************
 409    2018-07-26 16:12     ?  -
 410    2018-07-26 16:13    47  ****************************
 ...    ..(  3 skipped).    ..  ****************************
 414    2018-07-26 16:17    47  ****************************
 415    2018-07-26 16:18    48  *****************************
 416    2018-07-26 16:19    48  *****************************
 417    2018-07-26 16:20    49  ******************************
 418    2018-07-26 16:21    49  ******************************
 419    2018-07-26 16:22    50  *******************************
 420    2018-07-26 16:23    51  ********************************
 421    2018-07-26 16:24     ?  -
 422    2018-07-26 16:25    52  *********************************
 ...    ..( 20 skipped).    ..  *********************************
 443    2018-07-26 16:46    52  *********************************
 444    2018-07-26 16:47    53  **********************************
 ...    ..(  2 skipped).    ..  **********************************
 447    2018-07-26 16:50    53  **********************************
 448    2018-07-26 16:51    54  ***********************************
 449    2018-07-26 16:52     ?  -
 450    2018-07-26 16:53    54  ***********************************
 451    2018-07-26 16:54    53  **********************************
 452    2018-07-26 16:55    53  **********************************
 453    2018-07-26 16:56    52  *********************************
 454    2018-07-26 16:57    52  *********************************
 455    2018-07-26 16:58    52  *********************************
 456    2018-07-26 16:59    51  ********************************
 ...    ..(  2 skipped).    ..  ********************************
 459    2018-07-26 17:02    51  ********************************
 460    2018-07-26 17:03    50  *******************************
 ...    ..(  2 skipped).    ..  *******************************
 463    2018-07-26 17:06    50  *******************************
 464    2018-07-26 17:07    49  ******************************
 ...    ..(  2 skipped).    ..  ******************************
 467    2018-07-26 17:10    49  ******************************
 468    2018-07-26 17:11    48  *****************************
 ...    ..(  3 skipped).    ..  *****************************
 472    2018-07-26 17:15    48  *****************************
 473    2018-07-26 17:16    47  ****************************
 ...    ..(  9 skipped).    ..  ****************************
   5    2018-07-26 17:26    47  ****************************
   6    2018-07-26 17:27    48  *****************************
   7    2018-07-26 17:28    48  *****************************
   8    2018-07-26 17:29    49  ******************************
   9    2018-07-26 17:30    49  ******************************
  10    2018-07-26 17:31    50  *******************************
  11    2018-07-26 17:32    50  *******************************
  12    2018-07-26 17:33    51  ********************************
  13    2018-07-26 17:34    51  ********************************
  14    2018-07-26 17:35    52  *********************************
  15    2018-07-26 17:36    52  *********************************
  16    2018-07-26 17:37     ?  -
  17    2018-07-26 17:38    53  **********************************
  18    2018-07-26 17:39     ?  -
  19    2018-07-26 17:40    33  **************
  20    2018-07-26 17:41    34  ***************
  21    2018-07-26 17:42     ?  -
  22    2018-07-26 17:43    32  *************
  23    2018-07-26 17:44    33  **************
  24    2018-07-26 17:45    33  **************
  25    2018-07-26 17:46     ?  -
  26    2018-07-26 17:47    35  ****************
  27    2018-07-26 17:48    36  *****************
  28    2018-07-26 17:49     ?  -
  29    2018-07-26 17:50    37  ******************
  30    2018-07-26 17:51    38  *******************
  31    2018-07-26 17:52    38  *******************
  32    2018-07-26 17:53    39  ********************
  33    2018-07-26 17:54    40  *********************
  34    2018-07-26 17:55    40  *********************
  35    2018-07-26 17:56    41  **********************
  36    2018-07-26 17:57    42  ***********************
  37    2018-07-26 17:58    42  ***********************
  38    2018-07-26 17:59    43  ************************
  39    2018-07-26 18:00    44  *************************
  40    2018-07-26 18:01    45  **************************
  41    2018-07-26 18:02    46  ***************************
  42    2018-07-26 18:03    46  ***************************
  43    2018-07-26 18:04     ?  -
  44    2018-07-26 18:05    48  *****************************
  45    2018-07-26 18:06    48  *****************************
  46    2018-07-26 18:07    49  ******************************
 ...    ..(  2 skipped).    ..  ******************************
  49    2018-07-26 18:10    49  ******************************
  50    2018-07-26 18:11    48  *****************************
  51    2018-07-26 18:12    48  *****************************
  52    2018-07-26 18:13    47  ****************************
  53    2018-07-26 18:14    47  ****************************
  54    2018-07-26 18:15    47  ****************************
  55    2018-07-26 18:16    46  ***************************
  56    2018-07-26 18:17    46  ***************************
  57    2018-07-26 18:18     ?  -
  58    2018-07-26 18:19    45  **************************
  59    2018-07-26 18:20    45  **************************
  60    2018-07-26 18:21    45  **************************
  61    2018-07-26 18:22    44  *************************
  62    2018-07-26 18:23    45  **************************
  63    2018-07-26 18:24    45  **************************
  64    2018-07-26 18:25    45  **************************
  65    2018-07-26 18:26    46  ***************************
  66    2018-07-26 18:27    46  ***************************
  67    2018-07-26 18:28    47  ****************************
  68    2018-07-26 18:29    47  ****************************
  69    2018-07-26 18:30    48  *****************************
  70    2018-07-26 18:31    49  ******************************
  71    2018-07-26 18:32    49  ******************************
  72    2018-07-26 18:33    50  *******************************
  73    2018-07-26 18:34    50  *******************************
  74    2018-07-26 18:35    51  ********************************
  75    2018-07-26 18:36    52  *********************************
 ...    ..(  2 skipped).    ..  *********************************
  78    2018-07-26 18:39    52  *********************************
  79    2018-07-26 18:40    51  ********************************
  80    2018-07-26 18:41    51  ********************************
  81    2018-07-26 18:42     ?  -
  82    2018-07-26 18:43    50  *******************************
  83    2018-07-26 18:44    50  *******************************
  84    2018-07-26 18:45    49  ******************************
  85    2018-07-26 18:46    49  ******************************
  86    2018-07-26 18:47    48  *****************************
  87    2018-07-26 18:48     ?  -
  88    2018-07-26 18:49    32  *************
  89    2018-07-26 18:50    33  **************
  90    2018-07-26 18:51    33  **************
  91    2018-07-26 18:52     ?  -
  92    2018-07-26 18:53    32  *************
  93    2018-07-26 18:54    33  **************
  94    2018-07-26 18:55    33  **************

SCT Error Recovery Control:
           Read:     80 (8.0 seconds)
          Write:     80 (8.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4            0  Command failed due to ICRC error
0x0003  4            0  R_ERR response for device-to-host data FIS
0x0004  4            0  R_ERR response for host-to-device data FIS
0x0006  4            0  R_ERR response for device-to-host non-data FIS
0x000a  4            1  Device-to-host register FISes sent due to a COMRESET
0x000b  4            0  CRC errors within host-to-device FIS
0x000d  4            0  Non-CRC errors within host-to-device FIS

Lee Trager (ltrager) wrote on 2018-07-27:

#6

Thanks for posting the log. It looks like your drive is experiencing read errors. While these errors are recoverable, they may affect performance. You can still use the machine by using the 'override failed test' machine operation in the UI or over the API.

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     -OSR--   130   130   039    -    8597
 13 Read_Soft_Error_Rate    -OSRC-   130   130   000    -    8597
201 Unknown_SSD_Attribute   PO--CK   100   100   010    -    227633478306

Tyler Gray (tyler.gray) wrote on 2018-07-30:

#7

Okay, maybe I need more help reading these logs, but I've been trying to study them. From what I can tell, what you posted does not actually appear to show read errors that one would need to be concerned with.

Here's some wikis I've used to try to help me understand errors:
https://lime-technology.com/wiki/Understanding_SMART_Reports
https://en.wikipedia.org/wiki/S.M.A.R.T.

According to these wikis, there are a few things to note:
1) The columns VALUE, WORST, and THRESH tend to start at 100 and count down. So if the current value was lower than 039 (currently at 130), then it would signify that there is a problem with the drive.

2) The column FAIL seems to indicate the last operational hour (from attribute 9 Power_On_Hours) that this attribute failed. Right now that column is blank ('-').

3) It mentions that the RAW_VALUE column should basically be ignored. Its meaning is entirely up to the drive manufacturer. These are Intel drives.

The overall result of that section of the test was:
SMART overall-health self-assessment test result: PASSED

So even with those values, smartctl isn't really declaring that the drive is having issues.
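
As a rough illustration of the reading described above, here is a minimal Python sketch that parses attribute rows in the format quoted in this thread and flags an attribute only when its normalized VALUE has dropped to or below THRESH. The names `SmartAttribute`, `parse_attribute_row`, and `is_failing` are hypothetical, and the "at or below threshold" comparison is an assumption based on how the smartctl man page describes attribute failure, not something confirmed in this bug.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    """One row of the smartctl attribute table (hypothetical helper type)."""
    attr_id: int
    name: str
    flags: str
    value: int
    worst: int
    thresh: int
    fail: str
    raw_value: str


def parse_attribute_row(row: str) -> SmartAttribute:
    """Parse a row laid out as:
    ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
    """
    # Split into at most 8 fields so a RAW_VALUE containing spaces stays whole.
    fields = row.split(None, 7)
    if len(fields) != 8:
        raise ValueError("unexpected attribute row: %r" % row)
    attr_id, name, flags, value, worst, thresh, fail, raw_value = fields
    return SmartAttribute(
        int(attr_id), name, flags,
        int(value), int(worst), int(thresh),
        fail, raw_value)


def is_failing(attr: SmartAttribute) -> bool:
    # Assumption: an attribute is failing when its normalized value is at or
    # below its threshold. RAW_VALUE is vendor-specific and ignored here.
    return attr.value <= attr.thresh


row = "1 Raw_Read_Error_Rate -OSR-- 130 130 039 - 8597"
attr = parse_attribute_row(row)
print(attr.name, "failing" if is_failing(attr) else "ok")
```

With the row quoted in comment #6, VALUE is 130 against a THRESH of 39, so the sketch does not flag the attribute regardless of the large RAW_VALUE.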

Here's an example from another server of ours where the smartctl results came back clean, with the only difference being that there were no entries in the device's error log:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     -OSR--   130   130   039    -    8637
 13 Read_Soft_Error_Rate    -OSRC-   130   130   000    -    8637
201 Unknown_SSD_Attribute   PO--CK   100   100   010    -    103079492898

So this is why, from what I can tell, an entry in the device's error log, which flips bit 6 (return code 64 in decimal), stays in that log permanently and could thus be ignored if that's the only bit flagged in a smartctl return code.
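
To make the bit arithmetic concrete, the following Python sketch decodes a smartctl return code into the individual conditions it encodes. The descriptions are paraphrased from the exit status section of the smartctl man page, and `SMARTCTL_EXIT_BITS` and `decode_returncode` are hypothetical names used only for illustration; this is not code from MAAS.

```python
# Bit meanings, paraphrased from the smartctl man page (exit status section).
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device did not identify itself",
    2: "a SMART or other ATA command failed, or a checksum error was found",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes are at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_returncode(returncode):
    """Return the descriptions of every bit set in a smartctl return code."""
    return [
        description
        for bit, description in sorted(SMARTCTL_EXIT_BITS.items())
        if returncode & (1 << bit)
    ]


for code in (0, 64, 68):
    print(code, decode_returncode(code))
```

A return code of 64 decodes to the error-log bit alone, while 68 additionally sets bit 2, which is why the two cases may deserve different handling.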

Andres Rodriguez (andreserl) on 2018-08-23

Changed in maas:
milestone:	2.5.0alpha2 → 2.5.0beta1

Andres Rodriguez (andreserl) on 2018-09-05

Changed in maas:
milestone:	2.5.0beta1 → 2.5.0beta2

Andres Rodriguez (andreserl) on 2018-09-27

Changed in maas:
milestone:	2.5.0beta2 → 2.5.0rc1

Andres Rodriguez (andreserl) on 2018-10-09

Changed in maas:
milestone:	2.5.0rc1 → 2.5.x

KingJ (kj-kingj) wrote on 2019-01-12:

#8

smartctl output.txt (27.0 KiB, text/plain)

I am also experiencing this on a few drives. Their regular smartctl -a output passes with a return code of 0, however when run with --xall old errors present in the log cause the test to fail.

In my case, the errors are all related to failed WRITE FPDMA QUEUED commands. These were caused by a faulty backplane, rather than a faulty disk. However, as a result the disk is now persistently marked as failing smartctl tests by MAAS despite smartctl reporting that it has passed every single test after the WRITE FPDMA QUEUED errors as well as a badblocks test.

Personally, it feels wrong to mark the drive as failed in this instance since the fault was caused by other hardware in the past, and the drive has subsequently passed any checks performed against it*.

* Checking the log again, I see that I've not performed an extended test at any point. I wonder whether, if I were to perform an extended test and it passed, smartctl would disregard the old entries in the log and return an RC of 0? smartctl's man page does seem to imply that this is the case for bit 7 - "The device self-test log contains records of errors. [ATA only] Failed self-tests outdated by a newer successful extended self-test are ignored.". However, as RC=64 is bit 6 it may not work. I'll try it and report back here...

I've attached a log showing the output of;

1) smartctl -a
2) echo $?
3) smartctl --xall
4) echo $?

Jan Klare (j-klare) wrote on 2019-07-12:

#9

Is there still work going on to get this merged? We are currently also encountering this issue with SSDs where the smartctl error log includes old errors from a power outage about 2 years ago, and therefore the return code is 64, as mentioned in the bug description.

Jan Klare (j-klare) wrote on 2020-02-12:

#10

bump

Paul Tobias (tobias.pal) wrote on 2020-03-12:

#11

Maybe instead of --xall it would be better to use --health?

For me smartctl --xall exits with code 64:
# smartctl --xall /dev/sdh; echo $?
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-88-generic] (local build)
...snip...
SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
Device Error Count: 4
...snip...
64

But with --health it returns with success:
# smartctl --health /dev/sdh; echo $?
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-88-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

0

These errors in the logs are past errors. They do not indicate that the drive is failing. For example, an error log like this is generated if `smartctl --test=short --captive` is run, because the kernel detects that the drive hasn't responded for a long time and then resets the drive, logging `ataX: hard resetting link` in dmesg. There seems to be no way of clearing these logs, so the drive will report as Failed in MAAS forever if --xall is used.

Paul Tobias (tobias.pal) wrote on 2020-03-12:

#12

Until then I've fixed it for myself with this patch and restarted maas-regiond and now I don't have failing hardware tests any more:

--- /usr/lib/python3/dist-packages/metadataserver/builtin_scripts/smartctl.py.orig
+++ /usr/lib/python3/dist-packages/metadataserver/builtin_scripts/smartctl.py
@@ -245,7 +245,7 @@
     print('INFO: Verifying SMART data on %s' % device_name)
     try:
         output = run_smartctl(
-            blockdevice, ['--xall'], device, output=True, stderr=STDOUT)
+            blockdevice, ['--health'], device, output=True, stderr=STDOUT)
     except TimeoutExpired:
         print('ERROR: Validating %s timed out!' % device_name)
         raise

Jan Klare (j-klare) wrote on 2020-03-27:

#13

Just ran into that issue again while installing another MAAS server. It would be great if we could agree on a solution for this.
Cheers,
Jan

Adam Collard (adam-collard) on 2020-05-11

Changed in maas:
status:	Incomplete → New
no longer affects:	maas/2.4
Changed in maas:
milestone:	2.5.x → none

Björn Tillenius (bjornt) wrote on 2020-09-30:

#14

I'm fine with either solution:

  1) run smartctl --health instead of --xall
  2) detect that the error was in the past, and either
     ignore it, or automatically override testing, so
     that the machine is usable, but with a warning.
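
Option 2 could look roughly like the Python sketch below, which maps a smartctl return code to pass, pass-with-warning, or fail. It assumes that only bit 6 (value 64, the device error log) is safe to downgrade to a warning; the `Verdict` enum, `classify` function, and `ignorable` parameter are hypothetical and are not part of the MAAS script.

```python
from enum import Enum

# Bit 6 of the smartctl return code: the device error log contains records.
ERROR_LOG_BIT = 1 << 6  # 64


class Verdict(Enum):
    PASSED = "passed"
    PASSED_WITH_WARNING = "passed with warning"
    FAILED = "failed"


def classify(returncode, ignorable=ERROR_LOG_BIT):
    """Classify a smartctl return code.

    `ignorable` is a bitmask of conditions treated as past, non-fatal
    events. Which bits belong there is a policy decision; 64 is only the
    assumption made in this sketch.
    """
    if returncode == 0:
        return Verdict.PASSED
    if returncode & ~ignorable == 0:
        # Only ignorable bits are set: usable machine, but surface a warning.
        return Verdict.PASSED_WITH_WARNING
    return Verdict.FAILED


for code in (0, 64, 68, 8):
    print(code, classify(code).value)
```

Passing `ignorable=64 | 4` would reproduce the effect of the 187 mask suggested earlier in this bug, with the difference that the ignored bits still produce a visible warning instead of being silently dropped.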

Changed in maas:
status:	New → Triaged

Brent Barr (brentbarr) wrote on 2021-09-08:

#15

Still an issue in 3.0/stable.

INFO: Verifying SMART support for the following drive: /dev/sdb
INFO: Running command: sudo -n smartctl --all /dev/sdb
INFO: SMART support is available; continuing...
INFO: Verifying SMART data on /dev/sdb
INFO: Running command: sudo -n smartctl --xall /dev/sdb
FAILURE: SMART tests have FAILED for: /dev/sdb
The test exited with return code 64! See the smarctl manpage for information on the return code meaning. For more information on the test failures, review the test output provided below.

Bryan Seitz (seitz-a) wrote on 2021-10-15:

#16

Confirmed an issue for me as well with 3/stable from SNAP.

INFO: Verifying SMART support for the following drive: /dev/sda
INFO: Running command: sudo -n smartctl --all /dev/sda
INFO: SMART support is available; continuing...
INFO: Verifying SMART data on /dev/sda
INFO: Running command: sudo -n smartctl --xall /dev/sda
FAILURE: SMART tests have FAILED for: /dev/sda
The test exited with return code 64! See the smarctl manpage for information on the return code meaning. For more information on the test failures, review the test output provided below.

Jerzy Husakowski (jhusakowski) on 2022-08-04

Changed in maas:
milestone:	none → 3.4.0

Alberto Donato (ack) on 2023-06-29

Changed in maas:
milestone:	3.4.0 → 3.4.x

Jan Klare (j-klare) wrote on 2023-06-29:

#17

Just saw that this was moved again. IMHO it is pretty low-hanging fruit, since there were already a bunch of suggestions on how to fix it, including at least one patch set. I am happy to rebase this, but I would not want to invest the time if nobody cares on the maintainer side.

Anton Troyanov (troyanov) on 2024-03-05

Changed in maas:
milestone:	3.4.x → 3.5.x

Jan Klare (j-klare) wrote on 2024-03-06:

#18

Hi Anton,

I just saw that this was moved again from 3.4 to 3.5. What is blocking here and what needs to happen to get this implemented/merged?

Cheers,
Jan

MAAS

COMMISSION S.M.A.R.T Tests fail unnecessarily on code 64 (past log entries)
