#1153 closed defect (wontfix)
Command timeout occurred when I used the command "smartctl -C -t short" on HDD test
Reported by: | jerrytw168 | Owned by: | |
---|---|---|---|
Priority: | critical | Milestone: | |
Component: | smartctl | Version: | 6.6 |
Keywords: | Cc: | linjerrytw@… |
Description
Hi,
When I used the command "smartctl -C -t short /dev/sdb" to test an SSD, the self-test log (viewed with smartctl -a) showed "Interrupted (host reset)" as follows:
# 6 Short captive Interrupted (host reset) 70% 2423 -
# 7 Short captive Interrupted (host reset) 70% 2408 -
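Each of these self-test log entries packs two facts into one status byte: the outcome (upper four bits) and the share of the test still remaining (lower four bits, in 10% steps). Below is a minimal Python sketch of that decoding, assuming the ATA self-test execution status layout. The byte value 0x27 is inferred from the "Interrupted (host reset) 70%" entries, not read from the drive. Only the text for code 0x2 is confirmed by the log. The other strings imitate smartctl's style and are not guaranteed to match its exact wording.

```python
# Upper nibble of the ATA self-test execution status byte.
# Only the 0x2 text is confirmed by the log above; the others imitate smartctl's style.
STATUS_TEXT = {
    0x0: "Completed without error",
    0x1: "Aborted by host",
    0x2: "Interrupted (host reset)",
    0x3: "Fatal or unknown error",
    0x4: "Completed: unknown failure",
    0x5: "Completed: electrical failure",
    0x6: "Completed: servo/seek failure",
    0x7: "Completed: read failure",
    0x8: "Completed: handling damage suspected",
    0xF: "Self-test routine in progress",
}


def decode_selftest_status(status_byte: int) -> tuple[str, int]:
    """Return (status text, percent of the test remaining) for one status byte."""
    if not 0 <= status_byte <= 0xFF:
        raise ValueError("status byte must fit in 8 bits")
    code = status_byte >> 4                  # bits 7-4: outcome, or 0xF = in progress
    remaining = (status_byte & 0x0F) * 10    # bits 3-0: remaining work in 10% steps
    return STATUS_TEXT.get(code, f"Reserved ({code:#x})"), remaining


# 0x27 is the byte assumed to sit behind "Interrupted (host reset) 70%".
print(decode_selftest_status(0x27))  # ('Interrupted (host reset)', 70)
```

Code 0x2 means the host reset the drive while the test was running. This matches the link reset that the kernel log below records.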
The kernel log (/var/log/dmesg) also showed the error messages below.
However, when I removed -C (captive mode), these issues disappeared. I tried many SSDs (Intel, Samsung, HGST) and got the same symptom.
Could you please advise whether the "-C" option cannot be used together with "-t"? Or is this a bug in smartctl? I will be grateful for any help you can provide.
/var/log/dmesg
=================================================================
[166867.098164] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[166867.098172] ata2.00: failed command: SMART
[166867.098180] ata2.00: cmd b0/d4:00:81:4f:c2/00:00:00:00:00/00 tag 25
res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[166867.098184] ata2.00: status: { DRDY }
[166867.098189] ata2: hard resetting link
[166867.403151] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[166867.403678] ata2.00: supports DRM functions and may not be fully accessible
[166867.404715] ata2.00: supports DRM functions and may not be fully accessible
[166867.405200] ata2.00: configured for UDMA/133
[166867.405218] ata2: EH complete
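The "cmd" and "res" fields in the log above are ATA register dumps. The following minimal Python sketch decodes them. It assumes libata's field order (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device), and in a "res" line the first two fields are the Status and Error registers. The helper names are illustrative and do not come from smartctl or the kernel.

```python
# Register order of libata's "cmd"/"res" fields in the kernel log.
# In a "res" line the first two fields hold the Status and Error registers instead.
FIELDS = ("command", "feature", "nsect", "lbal", "lbam", "lbah",
          "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah", "device")

# SMART EXECUTE OFF-LINE IMMEDIATE (command B0h, feature D4h) subcommands, passed in LBA low.
SMART_OFFLINE_SUBCMD = {
    0x00: "off-line routine, off-line mode",
    0x01: "short self-test, off-line mode",
    0x02: "extended self-test, off-line mode",
    0x03: "conveyance self-test, off-line mode",
    0x04: "selective self-test, off-line mode",
    0x7F: "abort off-line mode self-test",
    0x81: "short self-test, captive mode",
    0x82: "extended self-test, captive mode",
    0x83: "conveyance self-test, captive mode",
    0x84: "selective self-test, captive mode",
}


def parse_taskfile(token: str) -> dict[str, int]:
    """Turn 'b0/d4:00:81:4f:c2/00:00:00:00:00/00' into a register-name -> value dict."""
    values = [int(v, 16) for v in token.replace("/", ":").split(":")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} registers, got {len(values)}")
    return dict(zip(FIELDS, values))


def describe_cmd(token: str) -> str:
    """Name the ATA command in a libata 'cmd' field, decoding SMART self-test subcommands."""
    tf = parse_taskfile(token)
    # LBA mid/high = 4Fh/C2h is the fixed SMART signature.
    if (tf["command"], tf["feature"], tf["lbam"], tf["lbah"]) == (0xB0, 0xD4, 0x4F, 0xC2):
        sub = SMART_OFFLINE_SUBCMD.get(tf["lbal"], f"subcommand {tf['lbal']:#04x}")
        return f"SMART EXECUTE OFF-LINE IMMEDIATE: {sub}"
    return f"ATA command {tf['command']:#04x}"


cmd_line = "ata2.00: cmd b0/d4:00:81:4f:c2/00:00:00:00:00/00 tag 25"
print(describe_cmd(cmd_line.split("cmd ", 1)[1].split()[0]))
# SMART EXECUTE OFF-LINE IMMEDIATE: short self-test, captive mode

res = parse_taskfile("40/00:ff:00:00:00/00:00:00:00:00/00")
print(f"status={res['command']:#04x} DRDY={bool(res['command'] & 0x40)}")
# status=0x40 DRDY=True
```

Bit 7 of the subcommand (81h versus 01h) distinguishes captive mode from off-line mode. The command that timed out is therefore exactly the captive short self-test requested with -C.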
Change History (7)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
Thanks for your prompt reply.
As for your question, "Why do you need captive tests?":
According to the description in "SMART RUN/ABORT OFFLINE TEST AND self-test OPTIONS", the '-C' option can be used in conjunction with a short or long self-test. That's why I ran the test in captive mode.
To be honest, I don't understand the purpose of captive mode. If possible, could you explain when I would actually need to run an SSD self-test in captive mode?
Thanks for your help.
comment:3 by , 6 years ago
Summary: | Some issues occurred when I used the command "smartctl -C -t short" on HDD test → Command timeout occurred when I used the command "smartctl -C -t short" on HDD test |
---|
There is no need to use the captive mode. I never use it.
- Off-line (Background) test: The test command returns immediately and the test itself continues in background. The drive is accessible during the test (see also the FAQ).
- Captive (Foreground) test: The test command waits until the test has finished. The drive is not accessible during the test. Captive tests are aborted if the device driver times out the command and resets the link. Therefore
smartctl -C -t ...
should set a sufficiently long timeout when issuing the test command. Currently, smartctl does not do this for ATA/SATA devices.
Leaving the ticket open, as we should either fix the timeout setting or remove the -C option.
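The off-line (background) alternative described above can be scripted so that it still waits for the result without a captive command. This minimal Python sketch starts the test without -C and then polls `smartctl -c`. It assumes that the output contains a "Self-test execution status: (NNN)" line, whose upper four bits are 0xF while a test is running. The function names and the 30-second polling interval are illustrative choices, not part of smartctl.

```python
import re
import subprocess
import time

# Matches e.g. "Self-test execution status:      ( 249)" in `smartctl -c` output.
EXEC_STATUS_RE = re.compile(r"Self-test execution status:\s*\(\s*(\d+)\)")


def selftest_in_progress(smartctl_c_output: str) -> bool:
    """True if the execution status byte says a self-test is still running (upper nibble 0xF)."""
    m = EXEC_STATUS_RE.search(smartctl_c_output)
    if m is None:
        raise ValueError("no 'Self-test execution status' line found")
    return (int(m.group(1)) >> 4) == 0xF


def run_offline_short_test(device: str, poll_seconds: int = 30) -> None:
    """Start an off-line (background) short self-test, then poll until it finishes."""
    # No -C: smartctl returns right away and the drive stays usable during the test.
    start = subprocess.run(["smartctl", "-t", "short", device])
    # Exit-status bits 0-2: bad command line, device open failure, failed ATA/SMART command.
    if start.returncode & 0x07:
        raise RuntimeError(f"smartctl -t short failed (exit status {start.returncode})")
    while True:
        time.sleep(poll_seconds)
        out = subprocess.run(["smartctl", "-c", device],
                             capture_output=True, text=True).stdout
        if not selftest_in_progress(out):
            return
```

Run it as root, for example `run_offline_short_test("/dev/sdb")`. Afterwards, the result appears in the self-test log shown by `smartctl -l selftest` or `smartctl -a`. Only the higher exit-status bits are ignored, because smartctl also sets them for conditions unrelated to starting the test.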
comment:4 by , 5 years ago
Christian, I think this timeout comes not from smartmontools but from the system itself, if the drive is mounted and in use. We could warn the user that a captive test is dangerous and will take the device offline for some time, and require --force for it.
comment:5 by , 4 years ago
Could this breakage be avoided if smartctl temporarily increased the relevant timeouts, e.g. the one set in /sys/block/${disk}/device/timeout on Linux?
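The save/raise/restore idea in this comment can be sketched as a Python context manager. This is a hypothetical helper that only illustrates the pattern. Whether the sysfs value governs smartctl's pass-through SMART command is not established here. comment:7 attributes the failure to the per-command timeout that smartctl passes to the driver, so raising this default might have no effect.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def raised_timeout(timeout_file: Path, seconds: int):
    """Temporarily raise the integer timeout stored in timeout_file, then restore it.

    Hypothetical helper illustrating the idea from this comment; not part of smartctl.
    """
    old = timeout_file.read_text().strip()
    if int(old) >= seconds:
        yield int(old)            # already long enough: leave the file untouched
        return
    timeout_file.write_text(f"{seconds}\n")
    try:
        yield seconds
    finally:
        timeout_file.write_text(old + "\n")  # restore even if the body raised
```

A hypothetical use, as root, would wrap the captive test in `with raised_timeout(Path("/sys/block/sdb/device/timeout"), 600):`. The value 600 seconds is an arbitrary example.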
The drive is not accessible during the [captive "-C"] test.
This is quite a limitation.
So, for me, the keep-awake solution for reliable offline (background) self-tests from https://www.smartmontools.org/ticket/1443 seems preferable.
comment:6 by , 2 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Workaround provided in the ticket
comment:7 by , 2 years ago
Milestone: | undecided |
---|
The kernel log shows that the SMART command which runs the captive test was aborted by the driver with "timeout". Then the driver resets the device. The device reset aborts the running self test. This is then recorded as "host reset" in the self-test log.
The problem is that smartctl does not pass a sufficiently long command timeout to the driver in this case. Some drivers don't even support long timeouts.
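One plausible way to size such a command timeout is to start from the drive's own "recommended polling time" for the short self-test, as reported by `smartctl -c`, and add a safety margin. This is sketched below in Python as an assumption, not as smartctl's actual behavior. The margin factor of 2.0 and the 60-second floor are arbitrary, and the regex assumes that smartctl prints the polling time over two lines.

```python
import re

# `smartctl -c` prints the polling time over two lines, e.g.
#   Short self-test routine
#   recommended polling time:        (   2) minutes.
POLL_RE = re.compile(
    r"Short self-test routine\s+recommended polling time:\s*\(\s*(\d+)\)\s*minutes")


def captive_short_timeout_seconds(smartctl_c_output: str, margin: float = 2.0,
                                  floor_seconds: int = 60) -> int:
    """Suggest a command timeout (seconds) for a captive short self-test.

    margin and floor_seconds are arbitrary safety factors (assumptions), not smartctl values.
    """
    m = POLL_RE.search(smartctl_c_output)
    if m is None:
        raise ValueError("no short self-test polling time found")
    minutes = int(m.group(1))
    return max(floor_seconds, int(minutes * 60 * margin))
```

Even with a suitable value, the driver must accept a timeout this long, which comment:7 notes is not always the case.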
Why do you need captive tests?
PS: For future submission, please do not set a milestone.