Opened 3 years ago
Last modified 3 years ago
#1523 new enhancement
Schrodinger's test - Captive tests fail when SMART is checked
Reported by: | Kevin C | Owned by: | |
---|---|---|---|
Priority: | minor | Milestone: | undecided |
Component: | smartctl | Version: | 6.5 |
Keywords: | ata | Cc: |
Description
I am trying to run Captive tests on my Synology drives because Short Offline (background) tests don't show any problems, and when I tried Extended Offline tests, they ran for months without finishing. I have several sectors pending relocation, but they never seem to relocate.
However, I've found that a Short Captive test will tell me once it hits a bad block. I can then force the issue with hdparm on that LBA and repeat the process to clear the sectors pending relocation.
The commands I am using are these:
# start the test
smartctl -t short -C /dev/sda -d ata
Wait a while (generally about 10 minutes to an hour), then check whether it is done with
# check disk information
smartctl -a /dev/sda -d ata
If the test has finished, you should get bad-block LBA information. If it hasn't, then presumably the commands requesting SMART data fail to return in time and cause the host reset that the output mentions as having interrupted the test.
Use hdparm to forcibly fix or relocate the sector:
hdparm --repair-sector 218492585 --yes-i-know-what-i-am-doing /dev/sda
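The LBA passed to hdparm above comes from the self-test log. As a rough sketch only, the lookup could be scripted as below. It assumes that log rows start with "# <num>", that a failed entry contains the text "read failure", and that the LBA is the last column; the helper name first_error_lba is hypothetical and not part of smartmontools.

```shell
#!/bin/sh
# Sketch: print the LBA of the first self-test log entry that reports a
# read failure. Reads `smartctl -l selftest` style text on stdin.
# Assumptions: rows start with "# <num>", a failed row contains
# "read failure", and LBA_of_first_error is the last column.
first_error_lba() {
    awk '/^# *[0-9]+/ && /read failure/ { print $NF; exit }'
}

# Hypothetical usage (not run here; only prints the LBA, repairs nothing):
#   lba=$(smartctl -l selftest -d ata /dev/sda | first_error_lba)
#   [ -n "$lba" ] && echo "first failing LBA: $lba"
```

The function prints nothing when no entry reports a read failure, so an empty result means there is nothing to pass to hdparm.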
My main uncertainty is this: as you can see from the screenshot, I often accidentally check whether the test is done before it has finished, which seems to cause the driver or drive to interrupt the captive test and prevent it from finishing. Is there a better way to check whether a Captive test is still in progress?
Until then, I am calling this Schrodinger's Test: we don't know whether it is still alive, has errored, or was killed by our checking until we check.
Change History (4)
comment:1 by , 3 years ago
comment:2 by , 3 years ago
This is the behavior of a specific (which?) drive model in conjunction with a 5 year old version of smartctl.
During captive tests, the drive does not accept any ATA command except a possible device reset performed by the OS driver after some configured timeout. A captive test may therefore always be interrupted by the OS before completion (related: ticket #1066). It is dangerous to use captive tests on drives with mounted partitions.
Non-captive tests should perform the same checks as captive tests, but without these problems. If not, this is a disk firmware bug.
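Since a non-captive test leaves the drive able to answer other commands, its progress can be polled without a host reset. A minimal sketch, assuming the capabilities output of smartctl -c contains the text "Self-test routine in progress" while a test is running; the exact wording may differ between drives and smartctl versions, and the helper name selftest_in_progress is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the text on stdin indicates a running self-test.
# Assumption: `smartctl -c` reports "Self-test routine in progress" in its
# "Self-test execution status" field while a test is active.
selftest_in_progress() {
    grep -q 'Self-test routine in progress'
}

# Hypothetical usage (not run here):
#   smartctl -t short -d ata /dev/sda      # background test, no -C
#   while smartctl -c -d ata /dev/sda | selftest_in_progress; do
#       sleep 60
#   done
#   smartctl -l selftest -d ata /dev/sda   # read the result
```

This only applies to tests started without -C; during a captive test the drive would not answer the status query.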
If long tests do not work, I would recommend running a regular read test with badblocks or ddrescue. See also the FAQ and the Bad Block HOWTO.
If you have any related enhancement request, please be more specific and describe it here.
For future support questions, please use the smartmontools-support mailing list instead. Thanks.
PS: Please do not use screen shots. Please do not paste smartctl output unchanged to tickets. Use plain-text attachments or wiki markup instead.
comment:3 by , 3 years ago
Milestone: | → undecided |
---|
comment:4 by , 3 years ago
Component: | all → smartctl |
---|---|
Keywords: | ata added |
In general, if it says "interrupted", assume that I started a fresh test shortly afterward. You can see that some of these ran for a long time: # 6 Extended Captive was running for 92 hours before I interrupted it, and # 2 Short Captive was running for ~20 hours before I checked on it.
So that Short Captive test was still running after 20 hours, when the program had estimated it would be done in ~2 minutes.