Opened 2 years ago

Last modified 4 months ago

#1620 new enhancement

WD WDS SSDs show incorrect Media Wearout Indicator (WDS400T2B0A)

Reported by: jeyare Owned by:
Priority: minor Milestone: undecided
Component: drivedb Version: 7.3
Keywords: ssd Cc:

Description

Almost one-year-old drives show MWI (230) with value 001 and hex 0x015d011e015d,
which is wrong, because it's not 1% of the lifespan, it's 99%.

"-v 230,hex48,Media_Wearout_Indicator "

Model Family: WD Blue / Red / Green SSDs
Device Model: WDC WDS400T2B0A-00SM50
Serial Number: 1926D7420114
LU WWN Device Id: 5 001b44 4a8e02f24
Firmware Version: 411030WD

The same is reported in many posts on the internet.

Issue:
Synology NAS reads this information as 1% lifespan, which is wrong.
Win10 and Linux OSs read it as 99%, as expected, and they categorize the same drives as healthy (native WD tool or smartctl).
When MWI is watched as one of the health indicators, the system evaluates the drive status as unhealthy; you can imagine all the consequences.
What is your view on this? Thx.
J.

Attachments (2)

Screenshot2.png (69.8 KB ) - added by jeyare 2 years ago.
Screenshot1.png (705.7 KB ) - added by jeyare 2 years ago.


Change History (16)

comment:1 by Christian Franke, 2 years ago

Keywords: ssd added; WDS removed
Milestone: undecided

Please provide sample smartctl -x ... output of an affected device.

comment:2 by jeyare, 2 years ago

smartctl -x /dev/sata1p6

smartctl 6.5 (build date Oct 7 2021) [x86_64-linux-4.4.180+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

START OF INFORMATION SECTION

Vendor: WDC
Product: WDS400T2B0A-00SM
Revision: 30WD
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Logical block size: 512 bytes
LU is fully provisioned
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Logical Unit id: 0x5001b444a8e02f24
Serial number: 1926D7420114
Device type: disk
Local Time is: Sat Jun 4 16:52:32 2022 BST
SMART support is: Unavailable - device lacks SMART capability.
Read Cache is: Enabled
Writeback Cache is: Enabled

START OF READ SMART DATA SECTION

Current Drive Temperature: 0 C
Drive Trip Temperature: 0 C

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging
Device does not support Background scan results logging


smartctl -a -d ata /dev/sata1p6

smartctl 6.5 (build date Oct 7 2021) [x86_64-linux-4.4.180+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

START OF INFORMATION SECTION

Model Family: WD Blue / Red / Green SSDs
Device Model: WDC WDS400T2B0A-00SM50
Serial Number: 1926D7420114
LU WWN Device Id: 5 001b44 4a8e02f24
Firmware Version: 411030WD
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: Unknown(0x0ff0), ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is: SATA >3.2 (0x1ff), 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat Jun 4 16:53:38 2022 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

START OF READ SMART DATA SECTION

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (    0) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0032 100 100 --- Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 --- Old_age Always - 20705
12 Power_Cycle_Count 0x0032 100 100 --- Old_age Always - 29

165 Block_Erase_Count 0x0032 100 100 --- Old_age Always - 8670610390
166 Minimum_PE_Cycles_TLC 0x0032 100 100 --- Old_age Always - 4
167 Max_Bad_Blocks_per_Die 0x0032 100 100 --- Old_age Always - 125
168 Maximum_PE_Cycles_TLC 0x0032 100 100 --- Old_age Always - 47
169 Total_Bad_Blocks 0x0032 100 100 --- Old_age Always - 2899
170 Grown_Bad_Blocks 0x0032 100 100 --- Old_age Always - 0
171 Program_Fail_Count 0x0032 100 100 --- Old_age Always - 0
172 Erase_Fail_Count 0x0032 100 100 --- Old_age Always - 0
173 Average_PE_Cycles_TLC 0x0032 100 100 --- Old_age Always - 13
174 Unexpected_Power_Loss 0x0032 100 100 --- Old_age Always - 3
184 End-to-End_Error 0x0032 100 100 --- Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 --- Old_age Always - 0
188 Command_Timeout 0x0032 100 100 --- Old_age Always - 77
194 Temperature_Celsius 0x0022 063 048 --- Old_age Always - 37 (Min/Max 18/48)
199 UDMA_CRC_Error_Count 0x0032 100 100 --- Old_age Always - 0
230 Media_Wearout_Indicator 0x0032 001 001 --- Old_age Always - 0x0160011e0160
232 Available_Reservd_Space 0x0033 100 100 004 Pre-fail Always - 100
233 NAND_GB_Written_TLC 0x0032 100 100 --- Old_age Always - 52683
234 NAND_GB_Written_SLC 0x0032 100 100 --- Old_age Always - 61340
241 Host_Writes_GiB 0x0030 253 253 --- Old_age Offline - 41701
242 Host_Reads_GiB 0x0030 253 253 --- Old_age Offline - 191267
244 Temp_Throttle_Status 0x0032 000 100 --- Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 20652 -
# 2 Short offline Completed without error 00% 20633 -
# 3 Short offline Completed without error 00% 20620 -
# 4 Short offline Completed without error 00% 20452 -
# 5 Short offline Completed without error 00% 20284 -
# 6 Short offline Completed without error 00% 20116 -
# 7 Short offline Completed without error 00% 19949 -
# 8 Short offline Completed without error 00% 19781 -
# 9 Short offline Completed without error 00% 19613 -
#10 Short offline Completed without error 00% 19445 -
#11 Short offline Completed without error 00% 19277 -
#12 Short offline Completed without error 00% 19109 -
#13 Short offline Completed without error 00% 18942 -
#14 Short offline Completed without error 00% 18774 -
#15 Short offline Completed without error 00% 18606 -
#16 Short offline Completed without error 00% 18494 -
#17 Short offline Completed without error 00% 18318 -
#18 Short offline Completed without error 00% 18150 -
#19 Short offline Completed without error 00% 17982 -
#20 Short offline Completed without error 00% 17814 -
#21 Short offline Completed without error 00% 17646 -

Selective Self-tests/Logging not supported


smartctl -P show -d ata /dev/sata1p6

smartctl 6.5 (build date Oct 7 2021) [x86_64-linux-4.4.180+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

Drive found in smartmontools Database. Drive identity strings:
MODEL: WDC WDS400T2B0A-00SM50
FIRMWARE: 411030WD
match smartmontools Drive Database entry:
MODEL REGEXP: WDC WDBNCE(250|500|00[124])0PNC(-.*)?|WDC ?WDS((120|240|250|480|500)G|[124]00T)(1B|2B|1G|2G|1R)0[AB](-.*)?
FIRMWARE REGEXP: .*
MODEL FAMILY: WD Blue / Red / Green SSDs
ATTRIBUTE OPTIONS: 165 Block_Erase_Count
166 Minimum_PE_Cycles_TLC
167 Max_Bad_Blocks_per_Die
168 Maximum_PE_Cycles_TLC
169 Total_Bad_Blocks
170 Grown_Bad_Blocks
171 Program_Fail_Count
172 Erase_Fail_Count
173 Average_PE_Cycles_TLC
174 Unexpected_Power_Loss
230 Media_Wearout_Indicator
233 NAND_GB_Written_TLC
234 NAND_GB_Written_SLC
241 Host_Writes_GiB
242 Host_Reads_GiB
244 Temp_Throttle_Status


in reply to:  2 comment:3 by Christian Franke, 2 years ago

smartctl -a -d ata /dev/sata1p6

-a only includes legacy SMART information, which is why I requested -x. Please provide smartctl -x -d ata ... output. Use plain-text attachments or wiki markup.

comment:4 by jeyare, 2 years ago

smartctl -x -d ata /dev/sata1p6
smartctl 6.5 (build date Oct  7 2021) [x86_64-linux-4.4.180+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     WD Blue / Red / Green SSDs
Device Model:     WDC  WDS400T2B0A-00SM50
Serial Number:    1926D7420114
LU WWN Device Id: 5 001b44 4a8e02f24
Firmware Version: 411030WD
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   Unknown(0x0ff0), ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA >3.2 (0x1ff), 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jun  4 20:46:14 2022 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x11) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    No Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME                                                   FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct                                            -O--CK   100   100   ---    -    0
  9 Power_On_Hours                                                   -O--CK   100   100   ---    -    20709
 12 Power_Cycle_Count                                                -O--CK   100   100   ---    -    29
165 Block_Erase_Count                                                -O--CK   100   100   ---    -    8670610390
166 Minimum_PE_Cycles_TLC                                            -O--CK   100   100   ---    -    4
167 Max_Bad_Blocks_per_Die                                           -O--CK   100   100   ---    -    125
168 Maximum_PE_Cycles_TLC                                            -O--CK   100   100   ---    -    47
169 Total_Bad_Blocks                                                 -O--CK   100   100   ---    -    2899
170 Grown_Bad_Blocks                                                 -O--CK   100   100   ---    -    0
171 Program_Fail_Count                                               -O--CK   100   100   ---    -    0
172 Erase_Fail_Count                                                 -O--CK   100   100   ---    -    0
173 Average_PE_Cycles_TLC                                            -O--CK   100   100   ---    -    13
174 Unexpected_Power_Loss                                            -O--CK   100   100   ---    -    3
184 End-to-End_Error                                                 -O--CK   100   100   ---    -    0
187 Reported_Uncorrect                                               -O--CK   100   100   ---    -    0
188 Command_Timeout                                                  -O--CK   100   100   ---    -    77
194 Temperature_Celsius                                              -O---K   064   048   ---    -    36 (Min/Max 18/48)
199 UDMA_CRC_Error_Count                                             -O--CK   100   100   ---    -    0
230 Media_Wearout_Indicator                                          -O--CK   001   001   ---    -    0x0160011e0160
232 Available_Reservd_Space                                          PO--CK   100   100   004    -    100
233 NAND_GB_Written_TLC                                              -O--CK   100   100   ---    -    52686
234 NAND_GB_Written_SLC                                              -O--CK   100   100   ---    -    61344
241 Host_Writes_GiB                                                  ----CK   253   253   ---    -    41704
242 Host_Reads_GiB                                                   ----CK   253   253   ---    -    191277
244 Temp_Throttle_Status                                             -O--CK   000   100   ---    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented
Read GP Log Directory failed

SMART Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00           SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      2  Comprehensive SMART error log
0x04           SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x30           SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f      SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported

SMART Error Log Version: 1
No Errors Logged

SMART Extended Self-test Log (GP Log 0x07) not supported

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     20652         -
# 2  Short offline       Completed without error       00%     20633         -
# 3  Short offline       Completed without error       00%     20620         -
# 4  Short offline       Completed without error       00%     20452         -
# 5  Short offline       Completed without error       00%     20284         -
# 6  Short offline       Completed without error       00%     20116         -
# 7  Short offline       Completed without error       00%     19949         -
# 8  Short offline       Completed without error       00%     19781         -
# 9  Short offline       Completed without error       00%     19613         -
#10  Short offline       Completed without error       00%     19445         -
#11  Short offline       Completed without error       00%     19277         -
#12  Short offline       Completed without error       00%     19109         -
#13  Short offline       Completed without error       00%     18942         -
#14  Short offline       Completed without error       00%     18774         -
#15  Short offline       Completed without error       00%     18606         -
#16  Short offline       Completed without error       00%     18494         -
#17  Short offline       Completed without error       00%     18318         -
#18  Short offline       Completed without error       00%     18150         -
#19  Short offline       Completed without error       00%     17982         -
#20  Short offline       Completed without error       00%     17814         -
#21  Short offline       Completed without error       00%     17646         -

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (SMART Log 0x04)
Page  Offset Size        Value Flags Description
ATA_SMART_READ_LOG failed: Multi-sector ATA commands not implemented
Read Device Statistics pages 0x00-0x07 failed

ATA_READ_LOG_EXT (addr=0x11:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented
Read SATA Phy Event Counters failed

in reply to:  4 comment:5 by Christian Franke, 2 years ago

smartctl 6.5 (build date Oct  7 2021) [x86_64-linux-4.4.180+] (local build)

Please note that this release is 6+ years old.

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
...
230 Media_Wearout_Indicator -O--CK   001   001   ---    -    0x0160011e0160

Most thresholds are missing. The normalized value is possibly increasing from 0 instead of decreasing from 100. This is a drive firmware issue which cannot be fixed by a drive database entry. From the ATA standards point of view, it is not a firmware bug because the SMART data structure was never part of any ATA standard.
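
The two possible readings of the normalized VALUE can be sketched as follows. This is only an illustration, not smartmontools code: the function name `remaining_life_percent` and the `counts_up` flag are hypothetical, the count-up convention is merely the possibility suggested above rather than confirmed firmware behavior, and the sketch only handles values in the range 0..100.

```python
def remaining_life_percent(value: int, counts_up: bool) -> int:
    """Estimate remaining life in percent from a normalized VALUE.

    counts_up=False: usual convention, VALUE decreases from 100 (new).
    counts_up=True:  assumed convention for the affected firmware,
                     VALUE increases from 0 (new) as the drive wears.
    """
    if not 0 <= value <= 100:
        raise ValueError("expected a normalized value in 0..100")
    return 100 - value if counts_up else value


# VALUE 001 from the attribute 230 row quoted above
print(remaining_life_percent(1, counts_up=False))  # 1
print(remaining_life_percent(1, counts_up=True))   # 99
```

A tool that assumes the usual convention reports 1% remaining, while the count-up reading gives 99%, which matches the discrepancy described in this ticket.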

Device Statistics (SMART Log 0x04)
Page  Offset Size        Value Flags Description
ATA_SMART_READ_LOG failed: Multi-sector ATA commands not implemented
Read Device Statistics pages 0x00-0x07 failed

Device Statistics usually contains the correct Percentage Used Endurance Indicator but unfortunately this SATA driver does not implement the required functionality to read this log.

There are two alternatives which might disable the false interpretation of attribute 230 by Synology NAS:

Rename the attribute:
-v 230,hex48,Bogus_Media_Wearout_Ind

Move VALUE WORST to RAW_VALUE:
-v 230,hex64,Media_Wearout_Indicator

comment:6 by Christian Franke, 2 years ago

Summary: WD WDS SSDs show incorrect Media Wearout Indicator → WD WDS SSDs show incorrect Media Wearout Indicator (WDS400T2B0A)

comment:7 by jeyare, 2 years ago

Thanks Christian for your valuable support.
The gents from Synology support spent 3 weeks responding to the MWI issue (10+ iterations). Instead, they labeled these 6 identical SSDs as unhealthy (since the last DSM7 system update, never in previous system versions), and subsequently they wrote that the drive is incompatible with their NASes (which is stup.d). As a result of this upgrade the volume is degraded. Similar story with the support on the WD side: they are still looking for reasons not to do so, even though I asked them for a clear definition of the MWI algorithm for the normalized value. Instead they asked me for 'msinfo' information (Win OS) even though the NAS runs FreeBSD (which they know from me).
Thanks again. Great support from you. I will check the proposed steps.

J.

comment:8 by Christian Franke, 2 years ago

Type: defect → enhancement

You're welcome.

Apparently only few WD SSD models and/or firmware versions are affected.

Tickets for similar drives which correctly show attribute 230 with a VALUE WORST of 100 100: #767, #771, #845, #980, #1048, #1162, #1073, #1169, #1198, #1321.

There are two exceptions:
WDC WDS400T1R0A-68A4W0/411000WR (#1450, #1601).
WDC WDS400T2B0A-00SM50/411030WD (this ticket).

Perhaps we could add a separate drivedb entry to handle these cases if it is known how the bogus behavior of Synology NAS and others could be prevented.
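
As a rough sketch of how a monitoring script might single out the two exceptions listed above, the check below matches on the model and firmware strings. The names `AFFECTED` and `is_affected` are hypothetical and this is not a drivedb entry; the whitespace normalization is an assumption made because identity strings can appear with extra padding in smartctl output.

```python
# Model/firmware pairs listed above as exceptions
# (tickets #1450, #1601 and this ticket).
AFFECTED = {
    ("WDC WDS400T1R0A-68A4W0", "411000WR"),
    ("WDC WDS400T2B0A-00SM50", "411030WD"),
}


def is_affected(model: str, firmware: str) -> bool:
    """Return True if the identity strings match a known exception."""
    # Collapse repeated whitespace, since identity strings may be padded.
    normalized_model = " ".join(model.split())
    return (normalized_model, firmware.strip()) in AFFECTED


print(is_affected("WDC  WDS400T2B0A-00SM50", "411030WD"))  # True
```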

Leaving the ticket open as undecided for now.

by jeyare, 2 years ago

Attachment: Screenshot2.png added

comment:9 by jeyare, 2 years ago

Christian,

1. test

Rename the attribute:
-v 230,hex48,Bogus_Media_Wearout_Ind

result:
This option stops DSM from sampling the Estimated lifespan value displayed in the DSM GUI; rather than the inverted % value, it shows a '-' instead.
Attached: Screenshot 1
Verdict: doesn't work

2. test

Move VALUE WORST to RAW_VALUE:
-v 230,hex64,Media_Wearout_Indicator

result:
The second retains the Estimated lifespan value but changes the value to 0% triggering the Damaged / Critical warnings.
Attached: Screenshot 2
Verdict: doesn't work

So it seems tricky to find a setting that prevents the Synology DSM system from reading this drive attribute.

Thx for the help

J.


by jeyare, 2 years ago

Attachment: Screenshot1.png added

comment:10 by Christian Franke, 2 years ago

Screenshot1.png shows the expected behavior of 2. test, not 1. test:
VALUE and WORST are both returned as --- and both appear in RAW_VALUE as least significant bytes (0x0160011e01600101). Originally introduced for the unusual 64-bit format of very old Indilinx SSD controllers, this should prevent any interpretation of the VALUE.
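
The arithmetic behind that 64-bit value can be sketched as below. The helper name `hex64_raw` is hypothetical, and the byte order (WORST in the second-lowest byte, VALUE in the lowest) is an assumption; since both are 01 here, the result is the same either way.

```python
def hex64_raw(raw48: int, value: int, worst: int) -> str:
    """Combine the 48-bit raw value with WORST and VALUE into 64-bit hex.

    Assumption: WORST and VALUE occupy the two least significant bytes,
    with VALUE last.
    """
    if not 0 <= raw48 < (1 << 48):
        raise ValueError("raw48 must fit in 48 bits")
    if not (0 <= value <= 0xFF and 0 <= worst <= 0xFF):
        raise ValueError("value and worst must each fit in one byte")
    combined = (raw48 << 16) | (worst << 8) | value
    return f"0x{combined:016x}"


# Raw value, VALUE and WORST of attribute 230 as reported in this ticket
print(hex64_raw(0x0160011E0160, value=1, worst=1))  # 0x0160011e01600101
```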

comment:11 by jeyare, 2 years ago

Chris,
thx for the help. The issue is temporarily resolved (till the next Synology system update) by:

-v 230,hex48,Disabled_Media_Wearout_Ind

The solution from WD, exchanging your SSDs for new ones, is an absolutely terrible approach because it won't solve anything.

The solution from Synology, "this SSD is not supported in our NAS", is like sticking their head in the sand.

Thx again.

J.

comment:12 by jeyare, 2 years ago

I'm just curious.
For a correct reading from drivedb.h it is recommended to read:

  • attribute ID, then attribute NAME?
  • or just attribute name only?

Because if I look at drivedb.h, no one there uses ID230 for MWI, only WD SSDs

  • however, Seagate uses "-v 231,hex56,SSD_Life_Left " - a different ID and NAME for MWI
  • your general table contains: "-v 233, raw48, Media_Wearout_Indicator, SSD"
  • Intel also uses "-v 233, raw48, Media_Wearout_Indicator, SSD"
  • same for Samsung

I understand that compliance with drive vendors is more of a fairy tale.

This means that anyone who wants to integrate SMART into a system must not only have an up-to-date DB from you, but must also understand that MWI is read from attribute ID 233 on Intel, ID 231 on Seagate, ID 230 on WD, for Samsung ...
or be sure that the NAME field is written exactly as Media_Wearout_Indicator, which isn't true on the Seagate side.

Thx
J.
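
The per-vendor lookup described in this comment could look roughly like the sketch below. The table only restates the IDs and names quoted in the comment, not verified drivedb.h content, and the names `WEAR_ATTRIBUTE_BY_VENDOR` and `wear_attribute` are hypothetical.

```python
from typing import Tuple

# (attribute ID, attribute name) per vendor, as quoted in the comment above.
WEAR_ATTRIBUTE_BY_VENDOR = {
    "Intel": (233, "Media_Wearout_Indicator"),
    "Seagate": (231, "SSD_Life_Left"),
    "WD": (230, "Media_Wearout_Indicator"),
}


def wear_attribute(vendor: str) -> Tuple[int, str]:
    """Return the (ID, name) pair to look for on a given vendor's SSDs."""
    try:
        return WEAR_ATTRIBUTE_BY_VENDOR[vendor]
    except KeyError:
        raise LookupError(f"no wear attribute known for {vendor!r}") from None


print(wear_attribute("WD"))  # (230, 'Media_Wearout_Indicator')
```

Keying on the vendor and then on the ID avoids relying on the NAME field alone, which differs between vendors.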

comment:13 by Christian Franke, 2 years ago

SMART attributes were never part of any standard. The related command SMART READ DATA was declared obsolete in ATA ACS-4 (2015).

In practice, SMART attributes are vendor and device (and sometimes firmware version) specific. HDDs use a common set (except newer Helium-related entries), but SSD vendors continue to invent new attribute sets.

Monitoring tools interested in MWI should consult Percentage Used Endurance Indicator from Device Statistics (smartctl -l devstat) if supported. Otherwise a drive database entry and proper interpretation is required.
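
A minimal sketch of that approach is shown below. It assumes the text layout of the Device Statistics table follows the header quoted earlier in this ticket (Page, Offset, Size, Value, Flags, Description); the function name `percentage_used` is hypothetical, and real output may differ in details.

```python
from typing import Optional

DESCRIPTION = "Percentage Used Endurance Indicator"


def percentage_used(devstat_text: str) -> Optional[int]:
    """Extract the Percentage Used Endurance Indicator from devstat text.

    Returns None if the line is missing or its value is not numeric.
    """
    for line in devstat_text.splitlines():
        # Assumed columns: Page Offset Size Value Flags Description
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].strip() == DESCRIPTION:
            try:
                return int(fields[3])
            except ValueError:
                return None  # value not numeric, e.g. shown as "-"
    return None
```

The input text could be obtained with `subprocess.run(["smartctl", "-l", "devstat", device], capture_output=True, text=True).stdout`, where `device` is the device path. A result of None means the tool has to fall back to a drive database entry, as noted above.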


comment:14 by Christian Franke, 4 months ago

Ticket #1852 (WDC WDS100T1R0A) has been marked as a duplicate of this ticket.
