Opened 3 years ago
Last modified 3 years ago
#1614 assigned patch
Add more status strings for ASC 0xb
Reported by: | asomers | Owned by: | Doug Gilbert |
---|---|---|---|
Priority: | minor | Milestone: | undecided |
Component: | all | Version: | |
Keywords: | scsi | Cc: |
Description
This patch adds all currently defined status strings for ASC 0xb. In particular, I find that Seagate ST16000NM002G disks frequently return 0xb/0x14.
Attachments (1)
Change History (8)
by , 3 years ago
Attachment: | 0001-Add-more-status-strings-for-ASC-0xb.patch added |
---|
comment:1 by , 3 years ago
Component: | smartctl → all |
---|---|
Keywords: | scsi added |
Milestone: | → undecided |
comment:2 by , 3 years ago
comment:3 by , 3 years ago
Oh, that's a good idea! Unfortunately I've already RMAed all of the drives that were reporting this error. But it happens fairly often. It'll probably happen again within a month, and then I'll run that command for you.
comment:4 by , 3 years ago
We got lucky: I just got another error of that type. But the output isn't very interesting I'm afraid:
$ sudo sg_get_elem_status --filter=1 -vv /dev/da557
Get physical element status cdb: [9e 17 00 00 00 00 00 00 00 00 00 00 00 20 40 00]
response length 32 bytes
Number of descriptors: 1
Number of descriptors returned: 0
Identifier of element being depopulated: 0
No complete physical element status descriptors available
For comparison, here is the same command run on a healthy drive:
$ sudo sg_get_elem_status --filter=1 -vv /dev/da556
Get physical element status cdb: [9e 17 00 00 00 00 00 00 00 00 00 00 00 20 40 00]
response length 32 bytes
Number of descriptors: 0
Number of descriptors returned: 0
Identifier of element being depopulated: 0
No complete physical element status descriptors available
And here is the output without the filter bit set. It's the same on both healthy and degraded drives.
$ sudo sg_get_elem_status --filter=0 -vv /dev/da557
Get physical element status cdb: [9e 17 00 00 00 00 00 00 00 00 00 00 00 20 00 00]
response length 32 bytes
Number of descriptors: 18
Number of descriptors returned: 0
Identifier of element being depopulated: 0
No complete physical element status descriptors available
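The two CDB dumps above differ only in byte 14 (0x40 with --filter=1, 0x00 with --filter=0), and bytes 10 to 13 hold 0x20, which matches the 32-byte response length. A minimal sketch of decoding those fields follows. The field layout is an assumption inferred from these dumps (service action in the low 5 bits of byte 1, big-endian starting element in bytes 6 to 9, allocation length in bytes 10 to 13, filter in the top two bits of byte 14, report type in its low four bits); `decode_gpes_cdb` is a hypothetical helper, not part of sg3_utils.

```python
import struct


def decode_gpes_cdb(cdb: bytes) -> dict:
    """Decode a 16-byte GET PHYSICAL ELEMENT STATUS CDB (assumed layout)."""
    if len(cdb) != 16:
        raise ValueError("expected a 16-byte CDB")
    return {
        "opcode": cdb[0],                                        # 0x9e in the dumps
        "service_action": cdb[1] & 0x1F,                         # 0x17 in the dumps
        "starting_element": struct.unpack(">I", cdb[6:10])[0],   # big-endian, assumed
        "allocation_length": struct.unpack(">I", cdb[10:14])[0], # big-endian, assumed
        "filter": cdb[14] >> 6,                                  # top two bits, assumed
        "report_type": cdb[14] & 0x0F,                           # low four bits, assumed
    }


# The two CDBs printed by `sg_get_elem_status -vv` in this comment.
filtered = decode_gpes_cdb(
    bytes.fromhex("9e 17 00 00 00 00 00 00 00 00 00 00 00 20 40 00"))
unfiltered = decode_gpes_cdb(
    bytes.fromhex("9e 17 00 00 00 00 00 00 00 00 00 00 00 20 00 00"))

print(filtered)
print(unfiltered)
```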
comment:5 by , 3 years ago
Thanks for that, as it's the first time I've seen a real response to that command. In the last case, adding the --maxlen=1k option should print out the 18 descriptors. I should change that. Anyway, 18 seems a bit strange as it's a 16 TB disk. I would like to see the full output; perhaps you could email it to me.
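A rough sketch of why a 32-byte response carries no descriptors while --maxlen=1k is enough for all 18. It assumes a 32-byte parameter data header followed by 32-byte descriptors, and that "1k" means 1024 bytes; both are assumptions, and the function names are hypothetical.

```python
HEADER_LEN = 32       # assumed size of the parameter data header
DESCRIPTOR_LEN = 32   # assumed size of one physical element status descriptor


def descriptors_that_fit(alloc_len: int) -> int:
    """Number of complete descriptors that fit in alloc_len bytes."""
    return max(0, (alloc_len - HEADER_LEN) // DESCRIPTOR_LEN)


def bytes_needed(num_descriptors: int) -> int:
    """Allocation length needed to return num_descriptors descriptors."""
    return HEADER_LEN + num_descriptors * DESCRIPTOR_LEN


print(descriptors_that_fit(32))    # the 32-byte response seen earlier
print(bytes_needed(18))            # all 18 descriptors
print(descriptors_that_fit(1024))  # --maxlen=1k, assuming 1k == 1024
```

Under those assumptions the 32-byte allocation holds only the header, which would explain "Number of descriptors returned: 0".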
I would like to see the "0xb,0x14" sense data also include an INFO field that says _which_ physical element id it is reporting. Would a physical element "coming good" qualify for this warning, since it is a change? I found a product manual for that disk family [100845788g.pdf] but it says virtually nothing about "physical elements" apart from saying that the Get physical element status and Remove element and truncate commands are supported.
There is a Physical element health field for each element, where values 0x1 through 0x63 are okay, 0x64 is on the edge, and >= 0x65 is kaput. T10 doesn't say whether that is a sliding scale (like endurance on an SSD).
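The health ranges above can be sketched as a small classifier. This follows only the summary given here: 0x00 is left unclassified, and everything from 0x65 up is treated as outside limits, although the standard may give some high values other meanings. `classify_element_health` is a hypothetical name.

```python
def classify_element_health(value: int) -> str:
    """Map a physical element health byte to a coarse status string."""
    if not 0 <= value <= 0xFF:
        raise ValueError("health is a single byte")
    if 0x01 <= value <= 0x63:
        return "within manufacturer's specification limits"
    if value == 0x64:
        return "at manufacturer's specification limit"
    if value >= 0x65:
        # Simplification: high values may have other meanings in the standard.
        return "outside manufacturer's specification limits"
    return "unclassified"  # 0x00 is not covered by the summary above


for health in (0x01, 0x64, 0x65):
    print(hex(health), classify_element_health(health))
```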
comment:6 by , 3 years ago
Seagate's datasheet doesn't say so, but other websites describe this disk as having 9 platters. So each one of those physical elements probably corresponds to a surface. Here's the command output with --maxlen=1k.
For the degraded disk:
$ sudo sg_get_elem_status --filter=0 --maxlen=1k /dev/da557
Number of descriptors: 18
Number of descriptors returned: 18
Identifier of element being depopulated: 0
Element descriptors:
[1] identifier: 0x000001 associated LBs: not specified health: within manufacturer's specification limits <1>
[2] identifier: 0x000002 associated LBs: not specified health: within manufacturer's specification limits <1>
[3] identifier: 0x000003 associated LBs: not specified health: within manufacturer's specification limits <1>
[4] identifier: 0x000004 associated LBs: not specified health: within manufacturer's specification limits <1>
[5] identifier: 0x000005 associated LBs: not specified health: within manufacturer's specification limits <1>
[6] identifier: 0x000006 associated LBs: not specified health: within manufacturer's specification limits <1>
[7] identifier: 0x000007 associated LBs: not specified health: within manufacturer's specification limits <1>
[8] identifier: 0x000008 associated LBs: not specified health: within manufacturer's specification limits <1>
[9] identifier: 0x000009 associated LBs: not specified health: within manufacturer's specification limits <1>
[10] identifier: 0x00000a associated LBs: not specified health: outside manufacturer's specification limits <101>
[11] identifier: 0x00000b associated LBs: not specified health: within manufacturer's specification limits <1>
[12] identifier: 0x00000c associated LBs: not specified health: within manufacturer's specification limits <1>
[13] identifier: 0x00000d associated LBs: not specified health: within manufacturer's specification limits <1>
[14] identifier: 0x00000e associated LBs: not specified health: within manufacturer's specification limits <1>
[15] identifier: 0x00000f associated LBs: not specified health: within manufacturer's specification limits <1>
[16] identifier: 0x000010 associated LBs: not specified health: within manufacturer's specification limits <1>
[17] identifier: 0x000011 associated LBs: not specified health: within manufacturer's specification limits <1>
[18] identifier: 0x000012 associated LBs: not specified health: within manufacturer's specification limits <1>
And for a healthy disk:
$ sudo sg_get_elem_status --filter=0 --maxlen=1k /dev/da556
Number of descriptors: 18
Number of descriptors returned: 18
Identifier of element being depopulated: 0
Element descriptors:
[1] identifier: 0x000001 associated LBs: not specified health: within manufacturer's specification limits <1>
[2] identifier: 0x000002 associated LBs: not specified health: within manufacturer's specification limits <1>
[3] identifier: 0x000003 associated LBs: not specified health: within manufacturer's specification limits <1>
[4] identifier: 0x000004 associated LBs: not specified health: within manufacturer's specification limits <1>
[5] identifier: 0x000005 associated LBs: not specified health: within manufacturer's specification limits <1>
[6] identifier: 0x000006 associated LBs: not specified health: within manufacturer's specification limits <1>
[7] identifier: 0x000007 associated LBs: not specified health: within manufacturer's specification limits <1>
[8] identifier: 0x000008 associated LBs: not specified health: within manufacturer's specification limits <1>
[9] identifier: 0x000009 associated LBs: not specified health: within manufacturer's specification limits <1>
[10] identifier: 0x00000a associated LBs: not specified health: within manufacturer's specification limits <1>
[11] identifier: 0x00000b associated LBs: not specified health: within manufacturer's specification limits <1>
[12] identifier: 0x00000c associated LBs: not specified health: within manufacturer's specification limits <1>
[13] identifier: 0x00000d associated LBs: not specified health: within manufacturer's specification limits <1>
[14] identifier: 0x00000e associated LBs: not specified health: within manufacturer's specification limits <1>
[15] identifier: 0x00000f associated LBs: not specified health: within manufacturer's specification limits <1>
[16] identifier: 0x000010 associated LBs: not specified health: within manufacturer's specification limits <1>
[17] identifier: 0x000011 associated LBs: not specified health: within manufacturer's specification limits <1>
[18] identifier: 0x000012 associated LBs: not specified health: within manufacturer's specification limits <1>
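To spot the odd element without reading all 18 lines, the text output can be filtered with a short script. This is a sketch that assumes the line format shown above, which may differ between sg3_utils versions; `parse_descriptors` and `outside_spec` are hypothetical helpers, and the 0x64 threshold is the "at the limit" value mentioned in comment 5.

```python
import re

# Matches lines such as:
# [10] identifier: 0x00000a associated LBs: ... health: <text> <101>
DESCRIPTOR_RE = re.compile(
    r"^\s*\[(?P<index>\d+)\]\s+identifier:\s+(?P<ident>0x[0-9a-fA-F]+)\s+"
    r".*health:\s+(?P<text>.*?)\s*<(?P<value>\d+)>\s*$"
)


def parse_descriptors(output: str) -> list:
    """Extract element descriptors from sg_get_elem_status text output."""
    elements = []
    for line in output.splitlines():
        m = DESCRIPTOR_RE.match(line)
        if m:
            elements.append({
                "index": int(m["index"]),
                "identifier": int(m["ident"], 16),
                "health": int(m["value"]),
                "text": m["text"],
            })
    return elements


def outside_spec(elements: list, limit: int = 0x64) -> list:
    """Return elements whose health value is above the limit."""
    return [e for e in elements if e["health"] > limit]


sample = """Number of descriptors: 18
[9] identifier: 0x000009 associated LBs: not specified health: within manufacturer's specification limits <1>
[10] identifier: 0x00000a associated LBs: not specified health: outside manufacturer's specification limits <101>
"""

elements = parse_descriptors(sample)
for bad in outside_spec(elements):
    print(f"element 0x{bad['identifier']:06x}: {bad['text']} <{bad['health']}>")
```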
comment:7 by , 3 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
A complete list of decoded SCSI ASC/ASCQ codes is pretty large and smartmontools doesn't have one (sg3_utils does in its library). That said, 0xb,0x14 is a new one and looks pretty important: "WARNING - PHYSICAL ELEMENT STATUS CHANGE". I only have recent WD SAS disks and they don't use physical elements; it looks like 16 TB Seagate SAS disks do have physical elements. According to sbc5r01.pdf section 4.36.2, that warning should prompt a GET PHYSICAL ELEMENT STATUS command with a filter value of 1. Could you try the sg_get_elem_status utility in sg3_utils to see what it reports (when --filter=1) and report if it shows anything of note?
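As an illustration only, the lookup and the suggested follow-up could look like the sketch below. The table holds just the one entry quoted in this ticket, not the full ASC 0xb list from the patch, and `describe_sense` and `follow_up_command` are hypothetical names rather than smartmontools or sg3_utils functions.

```python
# Only the entry quoted in this ticket; the real table would be much larger.
ADDITIONAL_SENSE_STRINGS = {
    (0x0B, 0x14): "WARNING - PHYSICAL ELEMENT STATUS CHANGE",
}


def describe_sense(asc: int, ascq: int) -> str:
    """Return a status string for an ASC/ASCQ pair, or a generic fallback."""
    return ADDITIONAL_SENSE_STRINGS.get(
        (asc, ascq),
        f"unknown additional sense: asc=0x{asc:02x}, ascq=0x{ascq:02x}",
    )


def follow_up_command(asc: int, ascq: int, device: str):
    """Suggest a follow-up command for the sense data, if one applies."""
    if (asc, ascq) == (0x0B, 0x14):
        # Filter value 1, as suggested for this warning.
        return ["sg_get_elem_status", "--filter=1", device]
    return None


print(describe_sense(0x0B, 0x14))
print(follow_up_command(0x0B, 0x14, "/dev/da557"))
```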