Opened 3 years ago

Last modified 5 months ago

#1620 new enhancement

WD WDS SSDs show incorrect Media Wearout Indicator (WDS400T2B0A)

Reported by: jeyare Owned by:
Priority: minor Milestone: undecided
Component: drivedb Version: 7.3
Keywords: ssd Cc:

Description

Drives that are almost one year old show MWI (attribute 230) with a normalized value of 001 and a raw hex value of 0x015d011e015d.
This is wrong: the drives are not at 1% of their lifespan, they are at 99%.

"-v 230,hex48,Media_Wearout_Indicator "

Model Family: WD Blue / Red / Green SSDs
Device Model: WDC WDS400T2B0A-00SM50
Serial Number: 1926D7420114
LU WWN Device Id: 5 001b44 4a8e02f24
Firmware Version: 411030WD
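
The raw value quoted in the description is displayed with the hex48 format, i.e. as one 48-bit number. A minimal sketch of splitting it into three 16-bit words follows; the helper name `split_words` is hypothetical, and the ticket does not document what the individual words mean, so no interpretation is implied.

```python
from typing import Tuple


def split_words(raw48: int) -> Tuple[int, int, int]:
    """Split a 48-bit raw value into three 16-bit words, most significant first."""
    if not 0 <= raw48 < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    return ((raw48 >> 32) & 0xFFFF, (raw48 >> 16) & 0xFFFF, raw48 & 0xFFFF)


# Raw value of attribute 230 as reported in the ticket description.
raw = 0x015D011E015D
print(f"0x{raw:012x}", split_words(raw))
```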

The same behavior is reported in many posts on the internet.

Issue:
Synology NAS reads this value as 1% lifespan remaining, which is wrong.
Win10 and Linux report 99%, as expected, and categorize the same drives as healthy (native WD tool or smartctl).
When MWI is watched as one of the health indicators, the system evaluates the drive status as unhealthy; you can imagine all the consequences.
What is your view? Thx.
J.

Attachments (2)

Screenshot2.png (69.8 KB ) - added by jeyare 3 years ago.
Screenshot1.png (705.7 KB ) - added by jeyare 3 years ago.

Change History (16)

comment:1 by Christian Franke, 3 years ago

Keywords: ssd added; WDS removed
Milestone: undecided

Please provide sample smartctl -x ... output of an affected device.

comment:2 by jeyare, 3 years ago

smartctl -x /dev/sata1p6

smartctl 6.5 (build date Oct  7 2021) [x86_64-linux-4.4.180+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               WDC
Product:              WDS400T2B0A-00SM
Revision:             30WD
User Capacity:        4,000,787,030,016 bytes [4.00 TB]
Logical block size:   512 bytes
LU is fully provisioned
Rotation Rate:        Solid State Device
Form Factor:          2.5 inches
Logical Unit id:      0x5001b444a8e02f24
Serial number:        1926D7420114
Device type:          disk
Local Time is:        Sat Jun  4 16:52:32 2022 BST
SMART support is:     Unavailable - device lacks SMART capability.
Read Cache is:        Enabled
Writeback Cache is:   Enabled

=== START OF READ SMART DATA SECTION ===
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Error Counter logging not supported


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
Device does not support Background scan results logging


----

smartctl -a -d ata /dev/sata1p6

smartctl 6.5 (build date Oct  7 2021) [x86_64-linux-4.4.180+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     WD Blue / Red / Green SSDs
Device Model:     WDC  WDS400T2B0A-00SM50
Serial Number:    1926D7420114
LU WWN Device Id: 5 001b44 4a8e02f24
Firmware Version: 411030WD
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   Unknown(0x0ff0), ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA >3.2 (0x1ff), 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jun  4 16:53:38 2022 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x11) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    No Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME                                                   FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct                                            0x0032   100   100   ---    Old_age   Always       -       0
  9 Power_On_Hours                                                   0x0032   100   100   ---    Old_age   Always       -       20705
 12 Power_Cycle_Count                                                0x0032   100   100   ---    Old_age   Always       -       29
165 Block_Erase_Count                                                0x0032   100   100   ---    Old_age   Always       -       8670610390
166 Minimum_PE_Cycles_TLC                                            0x0032   100   100   ---    Old_age   Always       -       4
167 Max_Bad_Blocks_per_Die                                           0x0032   100   100   ---    Old_age   Always       -       125
168 Maximum_PE_Cycles_TLC                                            0x0032   100   100   ---    Old_age   Always       -       47
169 Total_Bad_Blocks                                                 0x0032   100   100   ---    Old_age   Always       -       2899
170 Grown_Bad_Blocks                                                 0x0032   100   100   ---    Old_age   Always       -       0
171 Program_Fail_Count                                               0x0032   100   100   ---    Old_age   Always       -       0
172 Erase_Fail_Count                                                 0x0032   100   100   ---    Old_age   Always       -       0
173 Average_PE_Cycles_TLC                                            0x0032   100   100   ---    Old_age   Always       -       13
174 Unexpected_Power_Loss                                            0x0032   100   100   ---    Old_age   Always       -       3
184 End-to-End_Error                                                 0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect                                               0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout                                                  0x0032   100   100   ---    Old_age   Always       -       77
194 Temperature_Celsius                                              0x0022   063   048   ---    Old_age   Always       -       37 (Min/Max 18/48)
199 UDMA_CRC_Error_Count                                             0x0032   100   100   ---    Old_age   Always       -       0
230 Media_Wearout_Indicator                                          0x0032   001   001   ---    Old_age   Always       -       0x0160011e0160
232 Available_Reservd_Space                                          0x0033   100   100   004    Pre-fail  Always       -       100
233 NAND_GB_Written_TLC                                              0x0032   100   100   ---    Old_age   Always       -       52683
234 NAND_GB_Written_SLC                                              0x0032   100   100   ---    Old_age   Always       -       61340
241 Host_Writes_GiB                                                  0x0030   253   253   ---    Old_age   Offline      -       41701
242 Host_Reads_GiB                                                   0x0030   253   253   ---    Old_age   Offline      -       191267
244 Temp_Throttle_Status                                             0x0032   000   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     20652         -
# 2  Short offline       Completed without error       00%     20633         -
# 3  Short offline       Completed without error       00%     20620         -
# 4  Short offline       Completed without error       00%     20452         -
# 5  Short offline       Completed without error       00%     20284         -
# 6  Short offline       Completed without error       00%     20116         -
# 7  Short offline       Completed without error       00%     19949         -
# 8  Short offline       Completed without error       00%     19781         -
# 9  Short offline       Completed without error       00%     19613         -
#10  Short offline       Completed without error       00%     19445         -
#11  Short offline       Completed without error       00%     19277         -
#12  Short offline       Completed without error       00%     19109         -
#13  Short offline       Completed without error       00%     18942         -
#14  Short offline       Completed without error       00%     18774         -
#15  Short offline       Completed without error       00%     18606         -
#16  Short offline       Completed without error       00%     18494         -
#17  Short offline       Completed without error       00%     18318         -
#18  Short offline       Completed without error       00%     18150         -
#19  Short offline       Completed without error       00%     17982         -
#20  Short offline       Completed without error       00%     17814         -
#21  Short offline       Completed without error       00%     17646         -

Selective Self-tests/Logging not supported


----

smartctl -P show -d ata /dev/sata1p6

smartctl 6.5 (build date Oct  7 2021) [x86_64-linux-4.4.180+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

Drive found in smartmontools Database.  Drive identity strings:
MODEL:              WDC  WDS400T2B0A-00SM50
FIRMWARE:           411030WD
match smartmontools Drive Database entry:
MODEL REGEXP:       WDC WDBNCE(250|500|00[124])0PNC(-.*)?|WDC  ?WDS((120|240|250|480|500)G|[124]00T)(1B|2B|1G|2G|1R)0[AB](-.*)?
FIRMWARE REGEXP:    .*
MODEL FAMILY:       WD Blue / Red / Green SSDs
ATTRIBUTE OPTIONS:  165 Block_Erase_Count
                    166 Minimum_PE_Cycles_TLC
                    167 Max_Bad_Blocks_per_Die
                    168 Maximum_PE_Cycles_TLC
                    169 Total_Bad_Blocks
                    170 Grown_Bad_Blocks
                    171 Program_Fail_Count
                    172 Erase_Fail_Count
                    173 Average_PE_Cycles_TLC
                    174 Unexpected_Power_Loss
                    230 Media_Wearout_Indicator
                    233 NAND_GB_Written_TLC
                    234 NAND_GB_Written_SLC
                    241 Host_Writes_GiB
                    242 Host_Reads_GiB
                    244 Temp_Throttle_Status
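
The `-P show` output above lists the MODEL REGEXP that selected this database entry. As a rough illustration, the same pattern can be tried against the reported model string. This sketch uses Python's `re.fullmatch` as a stand-in for the regular expression matching done inside smartmontools, which is an assumption; the function name `model_matches` is hypothetical.

```python
import re

# MODEL REGEXP copied from the "-P show" output in this ticket.
MODEL_REGEXP = (
    r"WDC WDBNCE(250|500|00[124])0PNC(-.*)?"
    r"|WDC  ?WDS((120|240|250|480|500)G|[124]00T)(1B|2B|1G|2G|1R)0[AB](-.*)?"
)


def model_matches(model: str) -> bool:
    """Return True if the whole model string matches the database pattern."""
    return re.fullmatch(MODEL_REGEXP, model) is not None


# Model string exactly as reported by the drive (note the two spaces).
print(model_matches("WDC  WDS400T2B0A-00SM50"))
```

The optional second space in `WDC  ?WDS` is what lets the pattern accept both the single-space and the double-space spelling of the model string.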
Last edited 3 years ago by Christian Franke

in reply to:  2 comment:3 by Christian Franke, 3 years ago

smartctl -a -d ata /dev/sata1p6

-a includes only legacy SMART information, which is why I requested -x. Please provide smartctl -x -d ata ... output. Use plain-text attachments or wiki markup.

comment:4 by jeyare, 3 years ago

smartctl -x -d ata /dev/sata1p6
smartctl 6.5 (build date Oct  7 2021) [x86_64-linux-4.4.180+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     WD Blue / Red / Green SSDs
Device Model:     WDC  WDS400T2B0A-00SM50
Serial Number:    1926D7420114
LU WWN Device Id: 5 001b44 4a8e02f24
Firmware Version: 411030WD
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   Unknown(0x0ff0), ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA >3.2 (0x1ff), 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jun  4 20:46:14 2022 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x11) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    No Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME                                                   FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct                                            -O--CK   100   100   ---    -    0
  9 Power_On_Hours                                                   -O--CK   100   100   ---    -    20709
 12 Power_Cycle_Count                                                -O--CK   100   100   ---    -    29
165 Block_Erase_Count                                                -O--CK   100   100   ---    -    8670610390
166 Minimum_PE_Cycles_TLC                                            -O--CK   100   100   ---    -    4
167 Max_Bad_Blocks_per_Die                                           -O--CK   100   100   ---    -    125
168 Maximum_PE_Cycles_TLC                                            -O--CK   100   100   ---    -    47
169 Total_Bad_Blocks                                                 -O--CK   100   100   ---    -    2899
170 Grown_Bad_Blocks                                                 -O--CK   100   100   ---    -    0
171 Program_Fail_Count                                               -O--CK   100   100   ---    -    0
172 Erase_Fail_Count                                                 -O--CK   100   100   ---    -    0
173 Average_PE_Cycles_TLC                                            -O--CK   100   100   ---    -    13
174 Unexpected_Power_Loss                                            -O--CK   100   100   ---    -    3
184 End-to-End_Error                                                 -O--CK   100   100   ---    -    0
187 Reported_Uncorrect                                               -O--CK   100   100   ---    -    0
188 Command_Timeout                                                  -O--CK   100   100   ---    -    77
194 Temperature_Celsius                                              -O---K   064   048   ---    -    36 (Min/Max 18/48)
199 UDMA_CRC_Error_Count                                             -O--CK   100   100   ---    -    0
230 Media_Wearout_Indicator                                          -O--CK   001   001   ---    -    0x0160011e0160
232 Available_Reservd_Space                                          PO--CK   100   100   004    -    100
233 NAND_GB_Written_TLC                                              -O--CK   100   100   ---    -    52686
234 NAND_GB_Written_SLC                                              -O--CK   100   100   ---    -    61344
241 Host_Writes_GiB                                                  ----CK   253   253   ---    -    41704
242 Host_Reads_GiB                                                   ----CK   253   253   ---    -    191277
244 Temp_Throttle_Status                                             -O--CK   000   100   ---    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented
Read GP Log Directory failed

SMART Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00           SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      2  Comprehensive SMART error log
0x04           SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x30           SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f      SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported

SMART Error Log Version: 1
No Errors Logged

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     20652         -
# 2  Short offline       Completed without error       00%     20633         -
# 3  Short offline       Completed without error       00%     20620         -
# 4  Short offline       Completed without error       00%     20452         -
# 5  Short offline       Completed without error       00%     20284         -
# 6  Short offline       Completed without error       00%     20116         -
# 7  Short offline       Completed without error       00%     19949         -
# 8  Short offline       Completed without error       00%     19781         -
# 9  Short offline       Completed without error       00%     19613         -
#10  Short offline       Completed without error       00%     19445         -
#11  Short offline       Completed without error       00%     19277         -
#12  Short offline       Completed without error       00%     19109         -
#13  Short offline       Completed without error       00%     18942         -
#14  Short offline       Completed without error       00%     18774         -
#15  Short offline       Completed without error       00%     18606         -
#16  Short offline       Completed without error       00%     18494         -
#17  Short offline       Completed without error       00%     18318         -
#18  Short offline       Completed without error       00%     18150         -
#19  Short offline       Completed without error       00%     17982         -
#20  Short offline       Completed without error       00%     17814         -
#21  Short offline       Completed without error       00%     17646         -

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (SMART Log 0x04)
Page  Offset Size        Value Flags Description
ATA_SMART_READ_LOG failed: Multi-sector ATA commands not implemented
Read Device Statistics pages 0x00-0x07 failed

ATA_READ_LOG_EXT (addr=0x11:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented
Read SATA Phy Event Counters failed
Last edited 3 years ago by Christian Franke

in reply to:  4 comment:5 by Christian Franke, 3 years ago

smartctl 6.5 (build date Oct  7 2021) [x86_64-linux-4.4.180+] (local build)

Please note that this release is 6+ years old.

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
...
230 Media_Wearout_Indicator -O--CK   001   001   ---    -    0x0160011e0160

Most thresholds are missing. The normalized value is possibly increasing from 0 instead of decreasing from 100. This is a drive firmware issue which cannot be fixed by a drive database entry. From the ATA standards point of view, it is not a firmware bug because the SMART data structure was never part of any ATA standard.
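
The two possible readings of the normalized value can be compared with a short sketch. It only illustrates the arithmetic; the function name and the `counts_up` flag are hypothetical, and whether this firmware really counts up from 0 is the assumption stated above, not a confirmed fact.

```python
def remaining_life_percent(normalized_value: int, counts_up: bool) -> int:
    """Convert a normalized SMART VALUE (0..100) into remaining life in percent.

    counts_up=False: VALUE decreases from 100 (the usual convention).
    counts_up=True:  VALUE increases from 0 (suspected for this firmware).
    """
    if not 0 <= normalized_value <= 100:
        raise ValueError("normalized value outside 0..100")
    return 100 - normalized_value if counts_up else normalized_value


value = 1  # VALUE column of attribute 230 in this ticket
print(remaining_life_percent(value, counts_up=False))  # reading that treats the drive as worn out
print(remaining_life_percent(value, counts_up=True))   # reading that matches the other tools
```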

Device Statistics (SMART Log 0x04)
Page  Offset Size        Value Flags Description
ATA_SMART_READ_LOG failed: Multi-sector ATA commands not implemented
Read Device Statistics pages 0x00-0x07 failed

Device Statistics usually contains the correct Percentage Used Endurance Indicator, but unfortunately this SATA driver does not implement the functionality required to read this log.

There are two alternatives which might disable the false interpretation of attribute 230 by Synology NAS:

Rename the attribute:
-v 230,hex48,Bogus_Media_Wearout_Ind

Move VALUE WORST to RAW_VALUE:
-v 230,hex64,Media_Wearout_Indicator

comment:6 by Christian Franke, 3 years ago

Summary: WD WDS SSDs show incorrect Media Wearout Indicator → WD WDS SSDs show incorrect Media Wearout Indicator (WDS400T2B0A)

comment:7 by jeyare, 3 years ago

Thanks, Christian, for your valuable support.
Synology support spent 3 weeks (10+ iterations) responding to the MWI issue without resolving it. Instead, they labeled these 6 identical SSDs as unhealthy (since the last DSM7 system update, never in previous system versions), and subsequently wrote that the drive is incompatible with their NASes (which is stup.d). As a result of this upgrade, the volume became degraded. It is a similar story with support on the WD side: they are still looking for reasons not to deal with it, even though I asked them for a clear definition of the MWI normalization algorithm. Instead, they asked me for 'msinfo' output (Win OS), even though the NAS runs FreeBSD (which they know from me).
Thanks again. Great support from you. I will check the proposed steps.

J.

comment:8 by Christian Franke, 3 years ago

Type: defect → enhancement

You're welcome.

Apparently only few WD SSD models and/or firmware versions are affected.

Tickets for similar drives which correctly show attribute 230 with a VALUE WORST of 100 100: #767, #771, #845, #980, #1048, #1162, #1073, #1169, #1198, #1321.

There are two exceptions:
WDC WDS400T1R0A-68A4W0/411000WR (#1450, #1601).
WDC WDS400T2B0A-00SM50/411030WD (this ticket).

Perhaps we could add a separate drivedb entry to handle these cases if it is known how the bogus behavior of Synology NAS and others could be prevented.
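
A monitoring tool could special-case the two model/firmware combinations listed above before trusting attribute 230. The following is only a sketch of that idea: the set contents come from this comment, while the names `AFFECTED` and `is_affected` and the whitespace normalization are assumptions made for illustration.

```python
# Model/firmware pairs named in this ticket as showing VALUE 001 for attribute 230.
AFFECTED = {
    ("WDC WDS400T1R0A-68A4W0", "411000WR"),
    ("WDC WDS400T2B0A-00SM50", "411030WD"),
}


def is_affected(model: str, firmware: str) -> bool:
    """Check a drive identity against the known exceptions.

    Runs of whitespace in the model string are collapsed because smartctl
    reports this drive as "WDC  WDS400T2B0A-00SM50" (two spaces).
    """
    normalized_model = " ".join(model.split())
    return (normalized_model, firmware.strip()) in AFFECTED


print(is_affected("WDC  WDS400T2B0A-00SM50", "411030WD"))
```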

Leaving the ticket open as undecided for now.

by jeyare, 3 years ago

Attachment: Screenshot2.png added

comment:9 by jeyare, 3 years ago

Christian,

1. test

Rename the attribute:
-v 230,hex48,Bogus_Media_Wearout_Ind

result:
This option stops DSM from sampling the Estimated lifespan value displayed in the DSM GUI; instead of the inverted % value it shows a '-'.
Attached: Screenshot 1
Verdict: doesn't work

2. test

Move VALUE WORST to RAW_VALUE:
-v 230,hex64,Media_Wearout_Indicator

result:
The second option retains the Estimated lifespan value but changes it to 0%, triggering the Damaged / Critical warnings.
Attached: Screenshot 2
Verdict: doesn't work

So it seems tricky to find a setting that prevents the Synology DSM system from reading this drive attribute.

Thx for the help

J.

Last edited 3 years ago by jeyare

by jeyare, 3 years ago

Attachment: Screenshot1.png added

comment:10 by Christian Franke, 3 years ago

Screenshot1.png shows the expected behavior of 2. test, not 1. test:
VALUE and WORST are both returned as --- and both appear in RAW_VALUE as least significant bytes (0x0160011e01600101). Originally introduced for the unusual 64-bit format of very old Indilinx SSD controllers, this should prevent any interpretation of the VALUE.
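
The hex64 value quoted above can be reproduced with a little bit arithmetic. This is a sketch under an assumption: that WORST lands in the second-lowest byte and VALUE in the lowest byte. Because both are 001 in this ticket, the result is the same for either order. The function name `pack_hex64` is hypothetical.

```python
def pack_hex64(raw48: int, value: int, worst: int) -> str:
    """Append WORST and VALUE as the two least significant bytes of the raw value."""
    if not 0 <= raw48 < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    if not (0 <= value <= 0xFF and 0 <= worst <= 0xFF):
        raise ValueError("VALUE and WORST must each fit in one byte")
    # Assumed byte order: raw48 | WORST | VALUE (most to least significant).
    combined = (raw48 << 16) | (worst << 8) | value
    return f"0x{combined:016x}"


# Raw value, VALUE and WORST of attribute 230 from the smartctl output in this ticket.
print(pack_hex64(0x0160011E0160, value=1, worst=1))
```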

comment:11 by jeyare, 3 years ago

Chris,
thx for the help. The issue is temporarily resolved (till the next Synology system update) by:

-v 230,hex48,Disabled_Media_Wearout_Ind

The solution from WD, exchanging the SSDs for new ones, is an absolutely terrible approach because it won't solve anything.

The solution from Synology, "this SSD is not supported in our NAS", is like burying one's head in the sand.

Thx again.

J.

comment:12 by jeyare, 3 years ago

I'm just curious.
For a correct lookup in drivedb.h, is it recommended to read:

  • the attribute ID, then the attribute NAME?
  • or the attribute name only?

Because if I look at drivedb.h, nobody uses ID 230 for MWI except WD SSDs:

  • however, Seagate uses "-v 231,hex56,SSD_Life_Left " (different ID and NAME for MWI)
  • your general table contains: "-v 233, raw48, Media_Wearout_Indicator, SSD"
  • Intel also uses "-v 233, raw48, Media_Wearout_Indicator, SSD"
  • same for Samsung

I understand that consistency among drive vendors is more of a fairy tale.

This means that anyone who wants to integrate SMART into a system must not only have an up-to-date DB from you, but must also understand that MWI is attribute ID 233 on Intel, ID 231 on Seagate, ID 230 on WD, for Samsung ...
or be sure that the NAME field is written exactly as Media_Wearout_Indicator, which isn't true on the Seagate side.

Thx
J.
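
The per-vendor lookup described in this comment could be sketched as a small table. The IDs and names below are taken only from the comment itself and are not verified against drivedb.h; the names `WEAR_ATTRIBUTES` and `wear_attribute` are hypothetical.

```python
from typing import Optional, Tuple

# Vendor -> (attribute ID, attribute name), as quoted in the comment above.
WEAR_ATTRIBUTES = {
    "WD": (230, "Media_Wearout_Indicator"),
    "Seagate": (231, "SSD_Life_Left"),
    "Intel": (233, "Media_Wearout_Indicator"),
}


def wear_attribute(vendor: str) -> Optional[Tuple[int, str]]:
    """Return the wear attribute for a vendor, or None if it is not in the table."""
    return WEAR_ATTRIBUTES.get(vendor)


for vendor in ("WD", "Seagate", "Intel"):
    print(vendor, wear_attribute(vendor))
```

Looking up by ID alone or by name alone would both misidentify at least one of these vendors, which is the point the comment makes.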

comment:13 by Christian Franke, 3 years ago

SMART attributes were never part of any standard. The related command SMART READ DATA was declared obsolete in ATA ACS-4 (2015).

In practice, SMART attributes are vendor and device (and sometimes firmware version) specific. HDDs use a common set (except newer Helium-related entries), but SSD vendors continue to invent new attribute sets.

Monitoring tools interested in MWI should consult Percentage Used Endurance Indicator from Device Statistics (smartctl -l devstat) if supported. Otherwise a drive database entry and proper interpretation is required.
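
A monitoring tool following this advice could scan the Device Statistics text for the Percentage Used Endurance Indicator row. The sketch below assumes the six-column layout shown by the header line in this ticket (Page, Offset, Size, Value, Flags, Description). The `SAMPLE` row is a made-up illustration of that layout, not output from a real drive, while `FAILED` is the output actually posted in this ticket.

```python
from typing import Optional

TARGET = "Percentage Used Endurance Indicator"


def percentage_used(devstat_text: str) -> Optional[int]:
    """Return the Value column of the target row, or None if it is absent."""
    for line in devstat_text.splitlines():
        # Page, Offset, Size, Value, Flags, then the free-text Description.
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].strip() == TARGET:
            try:
                return int(fields[3])
            except ValueError:
                return None  # Value column is not a plain integer
    return None


# Hypothetical row, constructed only to match the header layout.
SAMPLE = """\
Page  Offset Size        Value Flags Description
0x07  0x008  1               1  ---  Percentage Used Endurance Indicator
"""

# Output posted in this ticket, where the log could not be read.
FAILED = """\
Page  Offset Size        Value Flags Description
ATA_SMART_READ_LOG failed: Multi-sector ATA commands not implemented
Read Device Statistics pages 0x00-0x07 failed
"""

print(percentage_used(SAMPLE))
print(percentage_used(FAILED))
```

Returning None when the row is missing lets the caller fall back to a drive database entry, as described above.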

Last edited 3 years ago by Christian Franke

comment:14 by Christian Franke, 5 months ago

Ticket #1852 (WDC WDS100T1R0A) has been marked as a duplicate of this ticket.
