Opened 5 years ago
Last modified 4 years ago
#1313 new enhancement
Beware of SMR drives in CMR clothing — at Initial Version
Reported by: | Stoat | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | undecided |
Component: | all | Version: | |
Keywords: | | Cc: | Christopher Kenna, Christoph, Andrew Clayton |
Description
I'm really not sure how this can be addressed easily.
There are a lot of SMR drives quietly submarining into supply channels that are programmed to "look" like "conventional" (CMR) drives. This appears to be an attempt to end-run around consumer resistance.
WD and Seagate are _both_ shipping drive-managed SMR (DM-SMR) drives which don't report themselves as SMR when questioned via conventional means.
What's worse, they're shipping DM-SMR drives as "RAID" and "NAS" drives.
This is causing MAJOR problems - such as the latest iteration of WD REDs (WDx0EFAX replacing WDx0EFRX) being unusable for rebuilding RAID[56] or ZFS RAIDZ sets: they rebuild for a while (1-2 hours), then throw errors and get kicked out of the set.
When this happens, the drives themselves report such oddities in the logs as "IDNF" ("Sector ID not found") when interrogated with smartctl -x:
Error 451 [18] occurred at disk power-on lifetime: 286 hours (11 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 00 00 00 cf ae b9 20 40 00 Error: IDNF at LBA = 0xcfaeb920 = 3484334368
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
61 00 10 00 28 00 00 00 00 0a 10 40 00 8d+03:27:28.841 WRITE FPDMA QUEUED
61 00 10 00 18 00 01 d1 c0 76 10 40 00 8d+03:27:28.841 WRITE FPDMA QUEUED
61 00 10 00 10 00 01 d1 c0 74 10 40 00 8d+03:27:28.765 WRITE FPDMA QUEUED
61 00 90 00 08 00 00 cf ae ba 60 40 00 8d+03:27:28.765 WRITE FPDMA QUEUED
61 00 a0 00 00 00 00 cf ae b9 b8 40 00 8d+03:27:28.765 WRITE FPDMA QUEUED
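As a cross-check on the log above, here is a minimal Python sketch that pulls the 48-bit LBA out of one register row. The helper name lba48_from_row is hypothetical, and the column layout (five byte-sized fields before the six LBA_48 bytes) is an assumption read off the column headers in this particular log, not a documented smartctl interface.

```python
def lba48_from_row(row: str) -> int:
    """Return the 48-bit LBA from one register row of the error log above.

    Assumed layout (taken from the column headers in this log):
      error row:   ER, --, ST, COUNT (2 bytes), LBA_48 (6 bytes), DV, DC
      command row: CR, FEATR (2 bytes), COUNT (2 bytes), LBA_48 (6 bytes), DV, DC
    In both cases the LBA bytes are tokens 5..10, most significant byte first.
    """
    tokens = row.split()
    lba_bytes = tokens[5:11]
    return int("".join(lba_bytes), 16)


error_row = "10 -- 51 00 00 00 00 cf ae b9 20 40 00"
print(lba48_from_row(error_row))  # 3484334368
```

Applied to the last two WRITE FPDMA QUEUED rows, the same helper gives 3484334688 and 3484334520, so the failing LBA and the queued writes around it can be compared as plain sector numbers.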
This manifests in the Linux kernel log as:
[20809.396248] blk_update_request: critical target error, dev sdd, sector 3484334368 op 0x1:(WRITE) flags 0x700 phys_seg 2 prio class 0
[20809.396275] sd 0:0:3:0: [sdd] tag#830 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[20809.396279] sd 0:0:3:0: [sdd] tag#829 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
[20809.396280] sd 0:0:3:0: [sdd] tag#830 CDB: Write(16) 8a 00 00 00 00 00 cf ae ba 60 00 00 00 90 00 00
[20809.396284] blk_update_request: I/O error, dev sdd, sector 3484334688 op 0x1:(WRITE) flags 0x700 phys_seg 2 prio class 0
[20809.396285] sd 0:0:3:0: [sdd] tag#829 CDB: Write(16) 8a 00 00 00 00 00 cf ae b9 b8 00 00 00 a0 00 00
[20809.396289] blk_update_request: I/O error, dev sdd, sector 3484334520 op 0x1:(WRITE) flags 0x700 phys_seg 2 prio class 0
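To tie the kernel messages back to the drive's own log, the Write(16) CDB bytes can be decoded by hand. The sketch below assumes the standard SCSI WRITE(16) layout (opcode 0x8a, an 8-byte big-endian LBA in bytes 2-9, a 4-byte transfer length in bytes 10-13); decode_write16 is a hypothetical helper name.

```python
from typing import Tuple


def decode_write16(cdb_hex: str) -> Tuple[int, int]:
    """Decode a WRITE(16) CDB given as space-separated hex bytes.

    Returns (lba, transfer_length_in_blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")       # bytes 2..9
    length = int.from_bytes(cdb[10:14], "big")   # bytes 10..13
    return lba, length


# CDB for tag#830, copied from the kernel log above
print(decode_write16("8a 00 00 00 00 00 cf ae ba 60 00 00 00 90 00 00"))
# (3484334688, 144)
```

The decoded LBAs (3484334688 for tag#830, 3484334520 for tag#829) are the sectors named in the two "I/O error" lines, and they are the same addresses as the last two WRITE FPDMA QUEUED entries in the drive's error log.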
I'd originally thought this was caused by the drives reading ahead into parts of the active shingled zone that had not yet been written and hitting an unformatted block(*), but zero-filling an entire drive resulted in the same errors.
(*) SMR zone filling is different to CMR preformatted layouts.
The only real external clue that these drives are not what they should be is that they report themselves as "trim" capable - which a CMR drive shouldn't be able to do.
For a WD40EFAX, hdparm -I shows:
- Data Set Management TRIM supported (limit 10 blocks)
- Deterministic read ZEROs after TRIM
And to make matters worse, not all DM-SMR drives are doing this - for example, Seagate ST3000DM-007 drives are DM-SMR but don't report TRIM capabilities.
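The TRIM clue could be turned into a rough check along these lines. This is only a sketch of the heuristic described here, not anything smartmontools or hdparm provides: the function name is hypothetical, the caller is assumed to already know whether the device is a spinning disk, and the match is done on the exact wording of the hdparm -I line quoted above.

```python
def trim_on_rotational(hdparm_text: str, is_rotational: bool) -> bool:
    """Heuristic: a spinning disk that advertises TRIM is a DM-SMR suspect.

    hdparm_text   -- captured output of `hdparm -I` for the device
    is_rotational -- supplied by the caller (assumption: known by other means)

    A False result proves nothing: as noted above, some DM-SMR drives
    do not advertise TRIM at all.
    """
    advertises_trim = "Data Set Management TRIM supported" in hdparm_text
    return is_rotational and advertises_trim


# The two lines quoted in this ticket for a WD40EFAX
sample = """\
- Data Set Management TRIM supported (limit 10 blocks)
- Deterministic read ZEROs after TRIM
"""
print(trim_on_rotational(sample, is_rotational=True))  # True
```

On Linux the is_rotational flag could plausibly be taken from /sys/block/<dev>/queue/rotational, but that wiring is left out here so the check stays a pure function over text.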
Caveats:
Beware of using "PMR" as a term. SMR (shingled) is an extension of PMR, and if you say "PMR" instead of "CMR" (for "conventional"), the drive makers will latch onto it to twist words and block your complaints. I've already found this out the hard way, and the fact that they're resorting to weasel tactics like that shows they know they have created problems.
"Two dimensional magnetic recording"(TDMR) is a necessary part of SMR (shingled+zoning), but the marketers don't see it that way. The effect is the same and the implications of both are the same.