Opened 8 years ago

Closed 7 years ago

#746 closed defect (fixed)

Invalid SMART STATUS output values on FreeNAS

Reported by: Michał Fita
Owned by:
Priority: major
Milestone: Release 6.6
Component: all
Version: 6.5
Keywords: ata freebsd
Cc:

Description

I have a SanDisk SSD from the Plus series connected to my FreeNAS box. This is what smartctl reports for it:

freenas# smartctl -a /dev/ada0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     SanDisk SDSSDA120G
Serial Number:    160696404971
LU WWN Device Id: 5 001b44 4a46cb55b
Firmware Version: U21010RL
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Oct  5 20:34:04 2016 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
Error SMART Status command failed
Please get assistance from
http://www.smartmontools.org/
Register values returned from SMART Status command are:
CMD=0xb0
FR =0xda
NS =0xffff
SC =0xff
CL =0xff
CH =0xff
RETURN =0x0000
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
No failed Attributes found.

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x71) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Retired_Block_Count     0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   179   100   000    Old_age   Always       -       1203h+00m+00.000s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
166 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1
167 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
168 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       2
169 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       27
170 Reserve_Block_Count     0x0032   100   100   000    Old_age   Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_SandForce_Attr  0x0032   100   100   ---    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       3
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   067   100   000    Old_age   Always       -       33 (Min/Max 0/51)
199 SATA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
230 Life_Curve_Status       0x0032   100   100   000    Old_age   Always       -       0
232 Available_Reservd_Space 0x0033   100   100   004    Pre-fail  Always       -       100
233 SandForce_Internal      0x0032   100   100   000    Old_age   Always       -       125
241 Lifetime_Writes_GiB     0x0030   253   253   000    Old_age   Offline      -       110
242 Lifetime_Reads_GiB      0x0030   253   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       173         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

freenas# smartctl --version
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

smartctl comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
the terms of the GNU General Public License; either
version 2, or (at your option) any later version.
See http://www.gnu.org for further details.

smartmontools release 6.5 dated 2016-05-07 at 11:17:46 UTC
smartmontools SVN rev 4318 dated 2016-05-07 at 11:18:20
smartmontools build host: amd64-portbld-freebsd10.3
smartmontools build with: C++98, GCC 4.2.1 Compatible FreeBSD Clang 3.4.1 (tags/RELEASE_34/dot1-final 208032)
smartmontools configure arguments: '--disable-dependency-tracking' '--enable-sample' '--with-initscriptdir=/usr/local/etc/rc.d' '--prefix=/usr/local' '--localstatedir=/var' '--mandir=/usr/local/man' '--infodir=/usr/local/info/' '--build=amd64-portbld-freebsd10.3' 'build_alias=amd64-portbld-freebsd10.3' 'CXX=c++' 'CXXFLAGS=-O2 -pipe -fstack-protector -fno-strict-aliasing ' 'LDFLAGS= -fstack-protector' 'LIBS=' 'CPPFLAGS=' 'CC=cc' 'CFLAGS=-O2 -pipe  -fstack-protector -fno-strict-aliasing'

The problematic part of the report above is:

Register values returned from SMART Status command are:
CMD=0xb0
FR =0xda
NS =0xffff
SC =0xff
CL =0xff
CH =0xff
RETURN =0x0000

Samsung says they don't see any problem with the drive. Their own SMART checker doesn't report any problem either. It has to be something to do with how the drive reacts to the sequence of commands.

The drive is connected to the motherboard through a SATA expansion card, a Lycom PE-115 SATA 3 2 Port 6Gbps Low Profile PCI-e 2.0 Host Adapter. Could this be related? I've seen problems with this brand in connection with USB.

This text:

SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.

causes my FreeNAS installation to panic and spam me with tons of alarms every day. I'd like to see it solved.

Change History (7)

comment:1 by Christian Franke, 8 years ago

Component: smartctl → all
Keywords: ata added; freenas removed
Milestone: unscheduled
Summary: Problems with SanDisk SDSSDA120G → Invalid SMART STATUS output values on FreeNAS

Apparently the driver used for this controller does not properly return the ATA output registers from the SMART RETURN STATUS command (expected: CL=0x4f, CH=0xc2 for PASSED; CL=0xf4, CH=0x2c for FAILED).

The (legacy) code in os_freebsd.cpp interprets this as SMART FAILED. It should be interpreted as COMMAND NOT SUPPORTED instead.
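
For illustration only, here is a minimal Python sketch of the interpretation proposed above. It assumes nothing beyond the two register signatures quoted in this comment. The names SmartStatus and classify_smart_status are hypothetical, and this is not the code in os_freebsd.cpp.

```python
from enum import Enum


class SmartStatus(Enum):
    PASSED = "passed"
    FAILED = "failed"
    NOT_SUPPORTED = "not supported"


def classify_smart_status(cl: int, ch: int) -> SmartStatus:
    """Classify the CL/CH output registers of SMART RETURN STATUS.

    Only the two signatures named in this ticket are recognized.
    Anything else is treated as "not supported" rather than as a failure.
    """
    if (cl, ch) == (0x4F, 0xC2):
        return SmartStatus.PASSED
    if (cl, ch) == (0xF4, 0x2C):
        return SmartStatus.FAILED
    return SmartStatus.NOT_SUPPORTED


# Register values from the report in the description (CL=0xff, CH=0xff).
print(classify_smart_status(0xFF, 0xFF))
```

With this rule, the CL=0xff, CH=0xff values from the description are no longer reported as a drive failure.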

The SSD drive itself looks sane.

Workaround: run smartctl without -H option:

  smartctl -i -c -A -l error -l selftest -l selective /dev/ada0

comment:2 by paul.warwicker, 8 years ago

I have the same issue with an array of five 2 TB WD drives on FreeNAS Corral 10.0.2. All drives have identical specs and firmware versions, yet only two of them report failures.
I pulled one of the drives and tried WD's LifeGuard tools. They report that the SMART status is 'PASS', but of course they might not use the command/sub-command 0xb0/0xda.
I am currently discussing this with WD.

The workaround doesn't work in Corral because the monitoring uses the pySMART module, which explicitly greps for 'PASS' in line 5 of the output of the smartctl --health command.
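
As a rough sketch of a less position-dependent check (an assumption on my part, not pySMART's actual code), a monitor could search the whole smartctl output for the overall-health line instead of looking at a fixed line number. The names HEALTH_RE and parse_health are hypothetical.

```python
import re
from typing import Optional

# Matches the result line printed by smartctl, wherever it appears in the output.
HEALTH_RE = re.compile(r"overall-health self-assessment test result:\s*(\S+)")


def parse_health(output: str) -> Optional[str]:
    """Return the reported health word (e.g. PASSED or FAILED), or None if absent."""
    match = HEALTH_RE.search(output)
    if match is None:
        return None
    # smartctl prints "FAILED!" with a trailing exclamation mark; strip it.
    return match.group(1).rstrip("!")


sample = (
    "Error SMART Status command failed\n"
    "SMART overall-health self-assessment test result: FAILED!\n"
    "Drive failure expected in less than 24 hours. SAVE ALL DATA.\n"
)
print(parse_health(sample))
```

This does not fix the bogus FAILED result itself. It only avoids depending on where the result line happens to sit in the output.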

btw, one weird point: the "affected" drives had been ata0 and ata1 for weeks. After a recent reboot they became ata1 and ata2, and ata0 now returns the data correctly.
Note: I had confirmed that the drive was okay before I started ignoring the warning!

comment:3 by paul.warwicker, 8 years ago

fwiw WD have confirmed that the equivalent of the ATA_SMART_STATUS sub-command is used as part of their LifeGuard utility.
-paul

comment:4 by Alex Samorukov, 8 years ago

Please try r4425; the duplicated code was removed from os_freebsd.cpp.

comment:5 by Christian Franke, 8 years ago

Milestone: unscheduled → Release 6.6

comment:6 by Vladimir Grebenschikov, 7 years ago

Well, before patch r4425 it showed:

# smartctl -H /dev/ada0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.1-RELEASE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===

Error SMART Status command failed
Please get assistance from
http://www.smartmontools.org/
Register values returned from SMART Status command are:
CMD=0xb0
FR =0xda
NS =0x00
SC =0xff
CL =0x3f
CH =0x3a
RETURN =0x0000
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
No failed Attributes found.

After patch it shows:

# smartctl -H /dev/ada0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.1-RELEASE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===

SMART Status command failed
Please get assistance from http://www.smartmontools.org/
Register values returned from SMART Status command are:

ERR=0x00, SC=0x00, LL=0xff, LM=0x3f, LH=0x3a, DEV=...., STS=....

SMART Status not supported: Invalid ATA output register values
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

This is a bit better, but still suspicious.
What do LM=0x3f, LH=0x3a mean? Is the drive good?
All other SMART values look fine.

What is strange is that the first HDD in the system gets this status only after some time. Just after connection it usually shows PASSED on --health, but after a while it starts to show "Error SMART Status command failed" (or, after the patch, "SMART Status not supported: Invalid ATA output register values").

How should I interpret that?

Version 1, edited 7 years ago by Vladimir Grebenschikov

comment:7 by Alex Samorukov, 7 years ago

Resolution: fixed
Status: new → closed

Yes, it is good.

The message means exactly what it says: the output registers are incorrect. This has nothing to do with smartctl/smartd; it is a firmware or controller issue. However, based on the SMART attribute values, we can consider the drive healthy.
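
To illustrate what an attribute-based check looks like, here is a minimal Python sketch. It assumes the conventional rule that an attribute counts as failing when its normalized VALUE is at or below a nonzero THRESH. The function name failing_attributes is hypothetical, and the sample rows are copied from the attribute table in the ticket description purely as example data.

```python
from typing import List, Tuple

# (ID, normalized VALUE, THRESH), taken from the table in the description.
ATTRIBUTES: List[Tuple[int, int, int]] = [
    (5, 100, 0),    # Retired_Block_Count
    (194, 67, 0),   # Temperature_Celsius
    (232, 100, 4),  # Available_Reservd_Space
]


def failing_attributes(attrs: List[Tuple[int, int, int]]) -> List[int]:
    """Return the IDs whose value has dropped to or below a nonzero threshold.

    Assumption: a threshold of 0 means the attribute can never fail.
    """
    return [attr_id for attr_id, value, thresh in attrs
            if thresh > 0 and value <= thresh]


print(failing_attributes(ATTRIBUTES))
```

An empty result corresponds to the "No failed Attributes found." line in the smartctl output.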
