#608 closed defect (invalid)
Long test hung on HGST drives
Reported by: | janardhan | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | all | Version: | |
Keywords: | scsi | Cc: |
Description
We have HGST drives in our server. When we started a long self-test on these drives, it hung, and the smartctl output still shows the long test as running.
Sample outputs:
    SMART Self-test log
    Num  Test              Status                     segment  LifeTime  LBA_first_err [SK ASC ASQ]
         Description                                  number   (hours)
    # 1  Background long   Self test in progress ...     1       NOW                 - [-   -    -]
    # 2  Background long   Aborted (device reset ?)      8         0                 - [-   -    -]
We contacted the HGST support team. They reproduced the issue and ran their own tools on the drive. They found that no background test is running, but the smartctl output still shows the test as in progress.
We already tried smartctl -X to abort the test, but it returns an error:
    smartctl -X /dev/sdf
    smartctl 6.0 2012-10-10 r3643 [x86_64-linux-2.6.30.5-43.ami26.fc11.x86_64] (local build)
    Copyright 2002-12, Bruce Allen, Christian Franke, www.smartmontools.org

    Abort self test failed [unsupported field in scsi command]
Please help us stop this test.
Attachments (1)
Change History (8)
comment:1 by , 9 years ago
Description: | modified (diff) |
---|---|
Keywords: | scsi added |
Milestone: | → undecided |
by , 9 years ago
comment:2 by , 9 years ago
Please find attached the output of "smartctl -r ioctl,2 -a /dev/sdf".
follow-up: 7 comment:3 by , 9 years ago
The drive returns the following self-test log entry #2 (at offset 0x18...0x2b):
    ...
    Incoming data, len=404 [only first 256 bytes shown]:
     00     10 00 01 90 00 01 03 10 20 00 25 b4 ff ff ff ff
     10     ff ff ff ff 00 00 00 01 [00 02 03 10 4f 01 00 00
     NUMBER = 2 ---------------------^^^^^       || || |||||
     TYPE<<1 = 2<<1 (Background long) -----------^| || |||||
     STATUS = 0xf (Self test in progress ...) ----^ || |||||
     SEGMENT = 1 -----------------------------------^^ |||||
     HOURS = 0 ----------------------------------------^^^^^
     20     ff ff ff ff ff ff ff ff 00 00 00 00]00 03 03 10 ...
This and the other entries are properly printed by smartctl:
    SMART Self-test log
    Num  Test              Status                     segment  LifeTime  LBA_first_err [SK ASC ASQ]
         Description                                  number   (hours)
    # 1  Background short  Completed                     -      9652                 - [-   -    -]
    # 2  Background long   Self test in progress ...     1       NOW                 - [-   -    -]
    # 3  Background long   Aborted (device reset ?)      8         0                 - [-   -    -]
This is probably a harmless HGST firmware bug: Entries from previously running self-tests may not be cleaned up properly if the self-test was aborted by power loss or similar.
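The field layout annotated in the dump above can be reproduced with a few lines of Python. This is only a minimal sketch, not smartctl's actual code: `decode_entry` and the dictionary keys are hypothetical names chosen for illustration, and the two byte strings are copied from the dump (entry #1 at offset 0x04...0x17, entry #2 at offset 0x18...0x2b).

```python
import struct

# Log entries copied from the dump above (20 bytes each).
ENTRY_1 = bytes.fromhex("00 01 03 10 20 00 25 b4 ff ff ff ff ff ff ff ff 00 00 00 01")
ENTRY_2 = bytes.fromhex("00 02 03 10 4f 01 00 00 ff ff ff ff ff ff ff ff 00 00 00 00")


def decode_entry(raw):
    """Decode one 20-byte self-test log entry (hypothetical helper)."""
    # 2-byte entry number, 2 control/length bytes, 1 type/status byte,
    # 1 segment byte, 2-byte power-on hours (all big-endian).
    number, _ctrl, _length, code, segment, hours = struct.unpack(">HBBBBH", raw[:8])
    lba = int.from_bytes(raw[8:16], "big")
    return {
        "number": number,
        "type": code >> 5,       # upper 3 bits: 1 = background short, 2 = background long
        "status": code & 0x0F,   # lower 4 bits: 0x0 = completed, 0xf = in progress
        "segment": segment,
        "hours": hours,
        "lba": None if lba == 0xFFFFFFFFFFFFFFFF else lba,  # all ff: no failing LBA
    }


for entry in (ENTRY_1, ENTRY_2):
    print(decode_entry(entry))
```

Entry #2 decodes to type 2, status 0xf, segment 1, and 0 hours, which is exactly what smartctl prints as "Background long / Self test in progress ...".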
follow-up: 5 comment:4 by , 9 years ago
Thanks for your response.
This disk is running in a server, and there has been no power loss.
Other than power loss, is there any possibility of aborting the self-test?
comment:5 by , 9 years ago
Milestone: | undecided |
---|---|
Resolution: | → invalid |
Status: | new → closed |
Version: | 6.0 |
Replying to janardhan:
> Thanks for your response.
> This disk is running in a server, and there has been no power loss.
> Other than power loss, is there any possibility of aborting the self-test?
This can happen with SCSI-related issues: the OS may try to reset the controller, which typically aborts the test. Check the dmesg log. I am closing this ticket because it does not look like a smartmontools issue.
comment:6 by , 9 years ago
We got confirmation from the HGST team that no background test is running. But the smartctl output shows it as running now. Can you explain why this is happening?
comment:7 by , 9 years ago
Because smartctl simply prints what the drive returns in its self-test log. This was already explained in detail, see comment 3 above.
Note that the "Self test in progress ..." entry moved from number #1 to #2 and therefore is no longer the most recent entry. This is evidence that the #2 "Background long" self-test was actually aborted before the successful #1 "Background short" test was started, that is, before "9652" hours of lifetime.
Please ask HGST team why their firmware did not change the state of this entry from "Self test in progress ..." to "Aborted" or similar.
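The reasoning above (an "in progress" entry that is no longer entry #1 cannot really be running) can be expressed as a simple check. This is a hedged sketch only: `stale_in_progress` is a hypothetical helper, not part of smartmontools, and it assumes entries are given as (number, status) pairs with status 0xf meaning "Self test in progress ...".

```python
IN_PROGRESS = 0xF  # status nibble shown as "Self test in progress ..."


def stale_in_progress(entries):
    """Return numbers of 'in progress' entries that are not the most recent one.

    entries: iterable of (number, status) pairs, where number 1 is the
    most recent self-test log entry.
    """
    return [number for number, status in entries
            if status == IN_PROGRESS and number != 1]


# (number, status) pairs for the three entries printed in comment 3;
# the status value 0x1 for the aborted entry #3 is an illustrative assumption.
log = [(1, 0x0), (2, IN_PROGRESS), (3, 0x1)]
print(stale_in_progress(log))
```

The check cannot tell whether an "in progress" entry at position #1 is genuine, so it only flags the case described in this comment.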
Please provide output of
smartctl -r ioctl,2 -a /dev/sdf
as an attachment.