Opened 13 years ago
Last modified 10 years ago
#215 new enhancement
Read xerror and append new error(s) to log file
Reported by: | John Peterson | Owned by: | somebody |
---|---|---|---|
Priority: | minor | Milestone: | undecided |
Component: | all | Version: | 5.42 |
Keywords: | Cc: |
Description
The Bad block HOWTO for smartmontools shows how to locate a pending sector and write zeros to it.
I would appreciate an automation of this task.
This would be especially helpful for an NTFS disk, as the example there is for a different file system.
Thanks!
Change History (8)
comment:1 by , 13 years ago
Priority: | major → minor |
---|
comment:2 by , 13 years ago
Summary: | Automate pending sector zero write → Read and append new xerror |
---|
OK, thanks. That's too bad! Since unstable sectors are serious, I'm surprised my disk is designed to keep only the last 24 errors. I would have kept many more than that, since each error entry is only around 600 bytes including the command history leading up to it.
My first inclination is obviously to reread only the marked sectors, but since that's not possible, I see my best option as running
ddrescue --force /dev/sda /dev/null
from Cygwin, with ddrescue.exe set to low I/O priority so that it disturbs other disk operations as little as possible. It's a 2 TB USB 2.0 drive, so a full read will take about 56 hours assuming 10 MB/s (USB max throughput is 20 MB/s, but file-sharing tasks continuously use the disk).
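A quick check of the read-time estimate, as a minimal Python sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 MB/s = 10^6 bytes/s) and a constant read rate; real throughput on a shared USB 2.0 disk will vary.

```python
def full_read_hours(capacity_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to read capacity_bytes at a constant rate (assumed)."""
    return capacity_bytes / rate_bytes_per_s / 3600


# 2 TB drive (decimal units, as drive vendors use them) at a sustained 10 MB/s
hours = full_read_hours(2e12, 10e6)
print(f"{hours:.1f} h")  # prints "55.6 h"
```

At the 20 MB/s ceiling the same call gives roughly half that, about 27.8 hours.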
Since the disk doesn't keep a list of all marked errors, I'm changing my enhancement request. Can you provide a script that runs smartctl --log xerror and appends only new errors to a specified log file? I could then schedule this script in Task Scheduler and maintain a complete log of marked sectors. Preferably it would be accompanied by a script that reads the log and performs a read of the marked sectors with dd.
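The requested "append only new errors" step could look roughly like the hypothetical Python sketch below. It assumes each xerror entry begins with a line of the form "Error N [M] occurred at disk power-on lifetime: ..." and that N is the drive's running error count, so anything with a higher N than the newest logged entry is new. Since the drive keeps only its last 24 entries, errors that scroll out between two runs would still be missed, so the script has to run often enough.

```python
import re
from pathlib import Path

# ASSUMPTION: each entry in `smartctl -l xerror` output starts with a line like
# "Error 35 [10] occurred at disk power-on lifetime: 6034 hours (...)",
# where the first number is the drive's running error count.
HEADER = re.compile(r"^Error (\d+) \[\d+\] occurred at", re.MULTILINE)


def split_entries(text: str) -> dict[int, str]:
    """Map each error number to the full text of its entry."""
    matches = list(HEADER.finditer(text))
    entries = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries[int(m.group(1))] = text[m.start():end].rstrip() + "\n\n"
    return entries


def new_entries(xerror_text: str, log_text: str) -> list[str]:
    """Entries newer than anything already in the log, oldest first."""
    last_logged = max(split_entries(log_text), default=0)
    current = split_entries(xerror_text)
    return [current[n] for n in sorted(current) if n > last_logged]


def append_new_errors(xerror_text: str, log_path: Path) -> int:
    """Append unseen entries to log_path and return how many were added."""
    log_text = log_path.read_text() if log_path.exists() else ""
    fresh = new_entries(xerror_text, log_text)
    with log_path.open("a") as log:
        log.writelines(fresh)
    return len(fresh)
```

The xerror text could be captured with subprocess.run(["smartctl", "--log", "xerror,99", "p:"], capture_output=True, text=True).stdout; the device name here is only an example.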
I also have a separate question: is there sometimes, or always, overlap between Current_Pending_Sector and Offline_Uncorrectable? I have 32 pending and 29 uncorrectable sectors; do I have 61, 32, or some other number of marked sectors? Here's my smartctl --xall output: http://pastebin.com/iqKFPKSM.
Thanks!
comment:3 by , 13 years ago
Or rather, since LBA_of_first_error is 3575199272, I should use
ddrescue --force --input-position=1830502027264 /dev/sdb /dev/null
to skip the first 1.8 TB (3575199272 × 512 bytes).
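The byte offset passed to --input-position is just the LBA multiplied by the logical sector size. A minimal Python sketch, assuming 512-byte logical sectors (a drive with 4096-byte logical sectors would need sector_size=4096):

```python
SECTOR_SIZE = 512  # assumed logical sector size of this drive


def lba_to_offset(lba: int, sector_size: int = SECTOR_SIZE) -> int:
    """Byte offset of the first byte of the given LBA."""
    return lba * sector_size


offset = lba_to_offset(3575199272)
print(offset)                     # 1830502027264
print(f"{offset / 1e12:.2f} TB")  # 1.83 TB
```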
comment:4 by , 13 years ago
Yes, makes sense. You can pass the LBA unchanged if 'b' is appended. Always use a ddrescue log file to record good/bad/undone status for disk ranges. With a log file you can interrupt ddrescue at any time and resume it later at the same point:
ddrescue -v --force --input-position=3575199272b /dev/sdb /dev/null disk.log
By running ddrescue again later with --max-retries=N, it may be possible to force sector reallocation without losing data.
SMART attributes are not standardized at all. The exact meaning of Current_Pending_Sector and Offline_Uncorrectable is vendor specific. Offline_Uncorrectable may count bad sectors found during SMART self-test.
comment:5 by , 13 years ago
Summary: | Read and append new xerror → Read xerror and append new error(s) to log file |
---|
Cool, thanks.
So is LBA_of_first_error the first bad sector counting from zero during the self-test, rather than the first one discovered? An extended self-test is supposed to be a complete disk read, so its LBA_of_first_error should give the first error from zero.
How would that move a sector to Reallocated_Sector_Ct? ddrescue never writes any data to the input file, right?
OK, I get it. I thought the difference was between 'pending' and 'uncorrectable', and I misunderstood uncorrectable as meaning a sector that failed a certain number of reads. Now I understand that uncorrectable means uncorrectable by ECC, which is the same error as for pending sectors, and that the distinction is in the UPDATED column, which is either Offline (self-test) or Always.
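The UPDATED column mentioned above can be pulled out of the attribute table mechanically. A minimal Python sketch, assuming smartctl's default -A table layout (ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE); the --xall output may print the brief layout instead, which this does not handle. The two sample rows are illustrative, not copied from the pastebin output.

```python
from typing import NamedTuple


class Attribute(NamedTuple):
    attr_id: int
    name: str
    updated: str  # "Always" or "Offline"
    raw: str


def parse_attribute_line(line: str) -> Attribute:
    """Parse one row of the (assumed) default smartctl -A attribute table.

    The raw value is the last column and may itself contain spaces,
    so at most 9 splits are made.
    """
    fields = line.strip().split(maxsplit=9)
    return Attribute(int(fields[0]), fields[1], fields[7], fields[9])


rows = [
    "197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       32",
    "198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       29",
]
for attr in map(parse_attribute_line, rows):
    print(attr.name, attr.updated, attr.raw)
# Current_Pending_Sector Always 32
# Offline_Uncorrectable Offline 29
```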
By the way, please add phpBB to Hosted Apps!
Thanks!
comment:6 by , 13 years ago
Normally an extended self-test reports the first bad sector from zero as LBA_of_first_error, and the self-test is typically aborted at that point. You could use selective self-tests to test the remaining sectors.
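A selective self-test over everything after the first bad sector needs a span argument for smartctl -t select,N-M. A minimal Python sketch that builds it; the total sector count is an assumption the caller must supply (for example the "User Capacity" bytes from smartctl -i divided by 512), and 3907029168 is used here only as a typical value for a 2 TB drive. If the test stops again at the next bad sector, the span would be rebuilt from the new LBA_of_first_error.

```python
def selective_span_after(first_bad_lba: int, total_sectors: int) -> str:
    """Argument for `smartctl -t` covering every LBA after first_bad_lba."""
    start = first_bad_lba + 1
    end = total_sectors - 1  # last addressable LBA
    if start > end:
        raise ValueError("first_bad_lba is already the last sector")
    return f"select,{start}-{end}"


# Hypothetical 2 TB drive with 3907029168 sectors of 512 bytes
print(selective_span_after(3575199272, 3907029168))
# select,3575199273-3907029167
```

The result would be passed as smartctl -t select,3575199273-3907029167 /dev/sdb, and progress checked with smartctl -l selective.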
In some cases ddrescue --max-retries=N ... is able to read a bad sector after many retries. The firmware should then reallocate the sector and write the old data to the spare sector.
comment:7 by , 13 years ago
That's interesting; I would not have guessed that the self-test ends on a read error. Perhaps the wiki could carry some information about the extended test too. I suggest merging the https://sourceforge.net/wiki/selftest_short and https://sourceforge.net/wiki/test_offline articles and adding information about the extended test.
That has not happened with my drive: ddrescue shows 8 unreadable sectors after a complete disk read (down from a higher number before a retry), but Reallocated_Sector_Ct remains at zero (and Current_Pending_Sector increased from 32 to 35).
I'm also happy to report that I've written the requested tools and they can be found at http://code.google.com/p/file-management-tools/. Example usage:
php smarterr.php "smartctl --log xerror,99 p:" smarterr.log
php smartest.php "ddrescue -vfdM /dev/sdb /dev/null ddrescue.log" smarterr.log
and an example of scheduling the S.M.A.R.T. error log update to run regularly
schtasks /Create /TN "User\smarterr" /F /SC DAILY /ST 03:00 /TR "php C:\...\smarterr.php \"smartctl -d sat --log xerror,99 p:\" \"C:\...\smarterr.log\""
Since ddrescue remembers its progress in its log file, there's no reason to skip duplicate sectors in the smarterr log; smartest therefore sends all read errors to ddrescue.
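smartest.php itself isn't reproduced in this ticket, so the following is only a hypothetical Python sketch of the same idea: collect every failing LBA from the accumulated log, duplicates included, and turn each into a one-sector ddrescue read that shares a single mapfile. The format of the failing-LBA line, the 512-byte sector size, and the /dev/sdb and ddrescue.log defaults are all assumptions.

```python
import re

# ASSUMPTION: xerror entries contain a line ending like
# "Error: UNC at LBA = 0xd5193628 = 3575199272"
LBA_LINE = re.compile(r"Error: [^\n]*? at LBA = 0x[0-9a-fA-F]+ = (\d+)")
SECTOR_SIZE = 512  # assumed logical sector size


def logged_lbas(log_text: str) -> list[int]:
    """All failing LBAs found in the log, in order, duplicates kept."""
    return [int(m.group(1)) for m in LBA_LINE.finditer(log_text)]


def ddrescue_commands(lbas: list[int], device: str = "/dev/sdb",
                      mapfile: str = "ddrescue.log") -> list[list[str]]:
    """One single-sector ddrescue read per LBA, all sharing one mapfile."""
    return [
        ["ddrescue", "-v", "--force",
         f"--input-position={lba * SECTOR_SIZE}", f"--size={SECTOR_SIZE}",
         device, "/dev/null", mapfile]
        for lba in lbas
    ]
```

Each command list could then be run with subprocess.run(cmd); because all reads share the mapfile, repeated LBAs cost little, which is the reasoning given above.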
I believe these features could be added to this project suite; if they don't fit in smartctl, they could go into a new executable called, for example, smartest.
Thanks!
comment:8 by , 10 years ago
Milestone: | → undecided |
---|
There are other tools available that can do a full read scan and then overwrite the bad blocks:
A full read scan is needed to get the logical block addresses of all bad sectors. ATA/SATA devices do not provide a command to read the list of all pending-sector addresses. The self-test and error logs list only a subset of these sectors, and the old SMART logs do not support 48-bit LBAs.
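To see why the old logs fall short on a drive this size: assuming they hold a 28-bit LBA (the LBA Low/Mid/High registers plus the low four bits of the Device register), they cannot represent any sector beyond the first 128 GiB. A minimal Python illustration using the LBA from this ticket; as_28_bit only shows what the low 28 bits would be, not what any particular firmware actually records.

```python
LBA28_LIMIT = 1 << 28  # sectors addressable with a 28-bit LBA


def fits_in_28_bits(lba: int) -> bool:
    return lba < LBA28_LIMIT


def as_28_bit(lba: int) -> int:
    """What remains of an LBA if only its low 28 bits are kept."""
    return lba & (LBA28_LIMIT - 1)


print(LBA28_LIMIT * 512)            # 137438953472 bytes, i.e. 128 GiB
print(fits_in_28_bits(3575199272))  # False
print(as_28_bit(3575199272))        # 85538344
```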
To detect the affected files on NTFS, you could use