Changes between Version 13 and Version 14 of BadBlockHowto


Timestamp:
Mar 29, 2017, 10:52:46 PM
Author:
Christian Franke
Comment:

Add section Case Studies with a real-life example using ddrescue and sleuthkit on Windows

  • BadBlockHowto

    v13 v14  
    738738
    739739
     740== Case Studies ==
     741
     742This section is intended to collect step-by-step descriptions of some real-life use cases.
     743
     744=== Recovering a (mostly) unreadable sector of a Notebook HDD ===
     745
     746This was done in March 2016 under Windows 7 using ''Cygwin''^[#footnote12 12.]^ ports of ''GNU ddrescue''^[#footnote13 13.]^ and ''The Sleuth Kit (TSK)''^[#footnote14 14.]^. All commands shown should work similarly on other platforms and with other filesystems.
     747
     748==== Determine Logical Block Address of unreadable sector ====
     749
     750Examine smartctl output:
     751{{{
     752root:~# smartctl -x /dev/sdb
     753smartctl 6.5 2016-02-29 r4227 [x86_64-w64-mingw32-win7-sp1] (daily-20160229)
     754...
     755Model Family:     SAMSUNG SpinPoint MP5
     756Device Model:     SAMSUNG HM640JJ
     757...
     758Firmware Version: 2AK10001
     759User Capacity:    640.135.028.736 bytes [640 GB]
     760Sector Size:      512 bytes logical/physical
     761Rotation Rate:    7200 rpm
     762Form Factor:      2.5 inches
     763...
     764ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
     765...
     766  5 Reallocated_Sector_Ct   PO--CK   252   252   010    -    0
     767...
     768  9 Power_On_Hours          -O--CK   100   100   000    -    251  <=== See Self-test Log below
     769...
     770197 Current_Pending_Sector  -O--CK   100   100   000    -    1    <=== At least 1 bad sector
     771...
     772SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
     773Device Error Count: 351 (device log contains only the most recent 8 errors)
     774...
     775Error 351 [6] occurred at disk power-on lifetime: 251 hours (10 days + 11 hours)
     776  When the command that caused the error occurred, the device was active or idle.
     777
     778  After command completion occurred, registers were:
     779  ER -- ST COUNT  LBA_48  LH LM LL DV DC
     780  -- -- -- == -- == == == -- -- -- -- --
     781  40 -- 51 00 01 00 00 33 3f d8 a6 40 00  Error: UNC 1 sectors at LBA = 0x333fd8a6 = 859822246  <=== Its LBA
     782
     783  Commands leading to the command that caused the error were:
     784  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
     785  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
     786  25 00 00 00 01 00 00 33 3f d8 a6 40 00     00:00:06.924  READ DMA EXT
     787...
     788
     789SMART Extended Self-test Log Version: 1 (2 sectors)
     790Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
     791# 1  Short offline       Completed: read failure       90%       176         859822246  <=== Detected 75 power on hours ago
     792}}}
     793
     794A read scan helps to verify the LBA and checks for other possible bad sectors
     795(alternatively replace `/dev/null` by a file path to create a disk image):
     796
     797{{{
     798root:~# ddrescue --ask --verbose --binary-prefixes --idirect --force /dev/sdb /dev/null disk.map
     799GNU ddrescue 1.21-rc2
     800About to copy 610480 MiBytes from /dev/sdb [SAMSUNG HM640JJ::...] to /dev/null [0].
     801Proceed (y/N)? y
     802...
     803non-tried:        0 B,     errsize:      512 B,      run time:          2h
     804  rescued: 610480 MiB,      errors:        1,  remaining time:         n/a
     805percent rescued:  99.99%      time since last successful read:         20s
     806Finished
     807}}}
     808
     809The `ddrescue` map file now shows byte ranges of good and bad disk areas:
     810
     811{{{
     812root:~# cat disk.map
     813...
     814#      pos        size      status
     815  0x00000000  0x667FB14C00  +
     8160x667FB14C00    0x00000200  -  <=== 512 bytes unreadable
     8170x667FB14E00  0x2E8B541200  +
     818}}}
     819
     820Translate the byte position to the LBA:
     821
     822{{{
     823root:~# echo $((0x667FB14C00/512))
     824859822246
     825}}}
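The same translation can be applied to every bad area in a map file. The following is a minimal sketch, not part of ''ddrescue'': the function name `list_bad_lbas` and the sample file are hypothetical, and it assumes that bad areas are the data lines with status `-` and that the logical sector size is 512 bytes.

```shell
# list_bad_lbas MAPFILE [SECTOR_SIZE]   (hypothetical helper)
# Prints "first_lba sector_count" for every area marked '-' in a ddrescue map file.
list_bad_lbas() {
    sector_size=${2:-512}
    while read -r pos size status rest; do
        case "$pos" in 0x*) ;; *) continue ;; esac    # skip comment lines
        case "$size" in 0x*) ;; *) continue ;; esac   # skip the current-status line
        [ "$status" = "-" ] || continue               # keep bad areas only
        echo "$(($pos / $sector_size)) $(($size / $sector_size))"
    done < "$1"
}

# Demo on a copy of the data lines shown above (assumed sample, not the real disk.map).
sample_map=$(mktemp)
cat > "$sample_map" <<'EOF'
#      pos        size      status
0x00000000  0x667FB14C00  +
0x667FB14C00    0x00000200  -
0x667FB14E00  0x2E8B541200  +
EOF
list_bad_lbas "$sample_map"   # prints: 859822246 1
rm -f "$sample_map"
```

The second column is the length of the bad area in sectors, which is 1 here because the area is exactly 512 bytes long.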
     826
     827Or convert the map file to a `badblocks`-like list with `ddrescuelog` (part of recent versions of the ''ddrescue'' package):
     828
     829{{{
     830root:~# ddrescuelog --list-blocks=- disk.map
     831859822246
     832}}}
     833
     834Both match the LBA reported by `smartctl`.
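The comparison can also be done in the shell. This is only a sketch: the variable names are made up for illustration, and the two hex values are copied by hand from the `smartctl` error log and the map file above.

```shell
# Cross-check the LBA from the smartctl error log against the byte position
# from the ddrescue map file (both values copied from the outputs above).
SECTOR_SIZE=512                                # logical sector size reported by smartctl
lba_from_smart=$((0x333fd8a6))                 # hex LBA from the error log
lba_from_map=$((0x667FB14C00 / SECTOR_SIZE))   # byte position from disk.map

if [ "$lba_from_smart" -eq "$lba_from_map" ]; then
    echo "match: LBA $lba_from_map"            # prints: match: LBA 859822246
else
    echo "mismatch: $lba_from_smart vs $lba_from_map"
fi
```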
     835
     836==== Find affected file ====
     837
     838Get start offset of affected partition:
     839
     840{{{
     841root:~# fdisk --list /dev/sdb
     842...
     843Device     Boot Start        End    Sectors   Size Id Type
     844/dev/sdb1          63 1250258624 1250258562 596.2G  7 HPFS/NTFS/exFAT
     845}}}
     846
     847Get filesystem block (cluster) size if unknown (4096 in many cases):
     848
     849{{{
     850root:~# fsstat /dev/sdb1
     851...
     852File System Type: NTFS
     853...
     854Sector Size: 512
     855Cluster Size: 4096
     856...
     857}}}
     858
     859Calculate the number of the bad cluster as `(BAD_LBA - START_LBA) / SECTORS_PER_CLUSTER` (integer division, with `SECTORS_PER_CLUSTER = 4096 / 512 = 8` here):
     860
     861{{{
     862root:~# echo $(((859822246-63)/8))
     863107477772
     864}}}
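The calculation above can be wrapped in a small function that also reports which sector inside the cluster is affected. This is a sketch under the same assumptions as the example (partition start in sectors, cluster size in bytes, 512-byte sectors by default); the name `lba_to_cluster` is hypothetical.

```shell
# lba_to_cluster BAD_LBA START_LBA CLUSTER_SIZE [SECTOR_SIZE]   (hypothetical helper)
# Prints "cluster_number sector_index_within_cluster".
lba_to_cluster() {
    sector_size=${4:-512}
    sectors_per_cluster=$(($3 / $sector_size))
    offset=$(($1 - $2))                        # sectors from the partition start
    echo "$(($offset / $sectors_per_cluster)) $(($offset % $sectors_per_cluster))"
}

lba_to_cluster 859822246 63 4096   # prints: 107477772 7
```

The remainder of 7 indicates that the bad sector is the last of the 8 sectors in cluster 107477772.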
     865
     866Find inode (here: MFT entry) used by this cluster:
     867
     868{{{
     869root:~# ifind -d 107477772 /dev/sdb1
     870663-128-2
     871}}}
     872
     873Print some info about this inode:
     874
     875{{{
     876root:~# istat /dev/sdb1 663-128-2
     877...
     878Name: Backup_2015-12-17.zip
     879Parent MFT Entry: 30    Sequence: 1
     880Allocated Size: 4660039680      Actual Size: 4660039516
     881Created:        2015-12-17 13:43:30.460000000 (CET)
     882File Modified:  2015-12-17 13:46:19.647000000 (CET)
     883...
     884Type: $DATA (128-2)   Name: N/A   Non-Resident   size: 4660039516  init_size: 4660039516
     885106950180 106950181 ...
     886...
     887107477772  <=== The bad cluster
     888...
     889108087884
     890}}}
     891
     892Find full path of affected file:
     893
     894{{{
     895root:~# ffind /dev/sdb1 663-128-2
     896/Backups/2015/Backup_2015-12-17.zip
     897}}}
     898
     899If the file is no longer needed, it can be overwritten in place and then removed. This is easy with `shred` from ''GNU coreutils'': `shred --iterations=1 --remove /PATH/TO/FILE`. Overwriting should prompt the drive to reallocate (or rewrite) the bad sector in most cases.
     900
     901==== Try to recover the bad sector ====
     902
     903Start with 100 read retries of the bad sector, write to `recovered.bin` if successful:
     904
     905{{{
     906root:~# ddrescue --ask --verbose --binary-prefixes --idirect --retry=100 \
     907                 --input-position=859822246s --output-position=0 --size=1s \
     908                 /dev/sdb recovered.bin recovered.map
     909...
     910Current status
     911     ipos: 419835 MiB, non-trimmed:        0 B,  current rate:      32 B/s
     912     opos:        0 B, non-scraped:        0 B,  average rate:       4 B/s
     913non-tried:        0 B,     errsize:        0 B,      run time:      1m 49s
     914  rescued:      512 B,      errors:        0,  remaining time:         n/a
     915percent rescued: 100.00%      time since last successful read:          0s
     916Finished
     917}}}
     918
     919We were very lucky:
     920
     921{{{
     922root:~# cat recovered.map
     923...
     924#      pos        size      status
     925  0x00000000  0x667FB14C00  ?
     9260x667FB14C00  0x00000200    +  <=== Now OK!
     9270x667FB14E00  0x2E8B541200  ?
     928}}}
     929
     930Check whether the disk firmware took the chance to reallocate the sector using the recovered data:
     931
     932{{{
     933root:~# dd skip=859822246 count=1 iflag=direct if=/dev/sdb of=test.bin
     934dd: error reading ‘/dev/sdb’: Input/output error
     9350+0 records in
     9360+0 records out
     9370 bytes (0 B) copied, 23.5006 s, 0.0 kB/s
     938}}}
     939
     940No luck in this case. So overwrite the sector manually:
     941
     942{{{
     943root:~# dd seek=859822246 count=1 oflag=direct if=recovered.bin of=/dev/sdb
     9441+0 records in
     9451+0 records out
     946512 bytes (512 B) copied, 1.05331 s, 0.5 kB/s
     947}}}
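Because a raw write to the disk is hard to undo, it may be worth checking first that the recovered file holds exactly one sector. The following is a minimal sketch of such a guard; the function name `is_single_sector` is hypothetical and the demo uses a temporary file instead of the real `recovered.bin`.

```shell
# is_single_sector FILE [SECTOR_SIZE]   (hypothetical helper)
# Succeeds only if FILE exists and is exactly one sector long.
is_single_sector() {
    [ -f "$1" ] || return 1
    actual=$(($(wc -c < "$1")))                # arithmetic strips any padding from wc
    [ "$actual" -eq "${2:-512}" ]
}

# Demo with a 512-byte temporary file standing in for recovered.bin.
demo=$(mktemp)
dd if=/dev/zero of="$demo" bs=512 count=1 2>/dev/null
if is_single_sector "$demo"; then
    echo "size OK"                             # prints: size OK
else
    echo "unexpected size, do not write"
fi
rm -f "$demo"
```

In practice the check would be run as `is_single_sector recovered.bin` before the `dd seek=...` command.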
     948
     949Read data back and check:
     950
     951{{{
     952root:~# dd skip=859822246 count=1 iflag=direct if=/dev/sdb of=test.bin
     9531+0 records in
     9541+0 records out
     955512 bytes (512 B) copied, 0.0211745 s, 24.2 kB/s
     956
     957root:~# diff -s recovered.bin test.bin
     958Files recovered.bin and test.bin are identical
     959}}}
     960
     961Finally, run a SMART self-test and check its result:
     962
     963{{{
     964root:~# smartctl -t short /dev/sdb
     965...
     966Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
     967...
     968Please wait 2 minutes for test to complete.
     969
     970root:~# sleep 120 # :-)
     971
     972root:~# smartctl -x /dev/sdb
     973...
     974ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
     975...
     976  5 Reallocated_Sector_Ct   PO--CK   252   252   010    -    0   <=== Interesting...
     977...
     978  9 Power_On_Hours          -O--CK   100   100   000    -    252
     979...
     980197 Current_Pending_Sector  -O--CK   100   100   000    -    0   <=== As expected
     981...
     982SMART Extended Self-test Log Version: 1 (2 sectors)
     983Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
     984# 1  Short offline       Completed without error       00%       252         -          <=== Works again!
     985# 2  Short offline       Completed: read failure       90%       176         859822246
     986}}}
     987
     988Interestingly, `Reallocated_Sector_Ct` did not increase. Either the firmware did not record the reallocation or it decided to reuse the original sector.
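The before/after comparison of an attribute can be scripted. This sketch assumes the brief attribute format printed by `smartctl -x` as shown above, where the raw value is the eighth column; other output formats use a different column. The function name `smart_raw_value` is hypothetical, and the sample lines are copied from the two outputs above.

```shell
# smart_raw_value NAME   (hypothetical helper)
# Reads smartctl attribute lines on stdin and prints the raw value of attribute NAME.
# Assumes the brief "-x" layout: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $8; exit }'
}

# Sample attribute lines copied from the two smartctl runs above.
before='197 Current_Pending_Sector  -O--CK   100   100   000    -    1'
after='197 Current_Pending_Sector  -O--CK   100   100   000    -    0'

pending_before=$(printf '%s\n' "$before" | smart_raw_value Current_Pending_Sector)
pending_after=$(printf '%s\n' "$after" | smart_raw_value Current_Pending_Sector)
echo "pending sectors: $pending_before -> $pending_after"   # prints: pending sectors: 1 -> 0
```

On a live system the function would be fed from `smartctl -x /dev/sdb` instead of the sample strings.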
     989
     990Done!
     991
    740992== Footnotes ==
    741993
     
    7621014[=#footnote11 [11]] Most window managers have a handy calculator that will do hex to decimal conversions.
    7631015
     1016[=#footnote12 [12]] See [https://cygwin.com/].
     1017
     1018[=#footnote13 [13]] See [https://www.gnu.org/software/ddrescue/]. Note that on Debian and Ubuntu the package is named [https://packages.debian.org/stable/gddrescue gddrescue] because the (no longer available) package ''ddrescue'' provided the tool [http://www.garloff.de/kurt/linux/ddrescue/ dd_rescue].
     1019
     1020[=#footnote14 [14]] See [https://www.sleuthkit.org/sleuthkit/].
     1021
     1022
    7641023== Changelog ==
    7651024
    7661025|| Date || Author || Description ||
     1026||2017-03-29||chrfranke||Add section Case Studies with a real-life example using ddrescue and sleuthkit on Windows||
    7671027||2009-08-11||dipohl||Add documentation improvements by Francesco Potorti` (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=540359)||
    7681028||2009-01-28||ballen4705||Incorporated suggestion from Danie Marais (https://sourceforge.net/p/smartmontools/mailman/message/21437469/)||