| 740 | == Case Studies == |
| 741 | |
| 742 | This section collects step-by-step descriptions of real-life use cases. |
| 743 | |
| 744 | === Recovering a (mostly) unreadable sector of a Notebook HDD === |
| 745 | |
| 746 | This was done in March 2016 under Windows 7 using ''Cygwin''^[#footnote12 12.]^ ports of ''GNU ddrescue''^[#footnote13 13.]^ and ''The Sleuth Kit (TSK)''^[#footnote14 14.]^. All commands shown should work similarly on other platforms and with other filesystems. |
| 747 | |
| 748 | ==== Determine Logical Block Address of unreadable sector ==== |
| 749 | |
| 750 | Examine smartctl output: |
| 751 | {{{ |
| 752 | root:~# smartctl -x /dev/sdb |
| 753 | smartctl 6.5 2016-02-29 r4227 [x86_64-w64-mingw32-win7-sp1] (daily-20160229) |
| 754 | ... |
| 755 | Model Family: SAMSUNG SpinPoint MP5 |
| 756 | Device Model: SAMSUNG HM640JJ |
| 757 | ... |
| 758 | Firmware Version: 2AK10001 |
| 759 | User Capacity: 640.135.028.736 bytes [640 GB] |
| 760 | Sector Size: 512 bytes logical/physical |
| 761 | Rotation Rate: 7200 rpm |
| 762 | Form Factor: 2.5 inches |
| 763 | ... |
| 764 | ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE |
| 765 | ... |
| 766 | 5 Reallocated_Sector_Ct PO--CK 252 252 010 - 0 |
| 767 | ... |
| 768 | 9 Power_On_Hours -O--CK 100 100 000 - 251 <=== See Self-test Log below |
| 769 | ... |
| 770 | 197 Current_Pending_Sector -O--CK 100 100 000 - 1 <=== At least 1 bad sector |
| 771 | ... |
| 772 | SMART Extended Comprehensive Error Log Version: 1 (2 sectors) |
| 773 | Device Error Count: 351 (device log contains only the most recent 8 errors) |
| 774 | ... |
| 775 | Error 351 [6] occurred at disk power-on lifetime: 251 hours (10 days + 11 hours) |
| 776 | When the command that caused the error occurred, the device was active or idle. |
| 777 | |
| 778 | After command completion occurred, registers were: |
| 779 | ER -- ST COUNT LBA_48 LH LM LL DV DC |
| 780 | -- -- -- == -- == == == -- -- -- -- -- |
| 781 | 40 -- 51 00 01 00 00 33 3f d8 a6 40 00 Error: UNC 1 sectors at LBA = 0x333fd8a6 = 859822246 <=== Its LBA |
| 782 | |
| 783 | Commands leading to the command that caused the error were: |
| 784 | CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name |
| 785 | -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- |
| 786 | 25 00 00 00 01 00 00 33 3f d8 a6 40 00 00:00:06.924 READ DMA EXT |
| 787 | ... |
| 788 | |
| 789 | SMART Extended Self-test Log Version: 1 (2 sectors) |
| 790 | Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error |
| 791 | # 1 Short offline Completed: read failure 90% 176 859822246 <=== Detected 75 power on hours ago |
| 792 | }}} |
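
The six `LBA_48` register bytes in the error log are the hexadecimal form of the decimal LBA printed at the end of the same line. As a minimal sketch (the helper name `lba48_to_dec` is hypothetical and not part of ''smartmontools''), the conversion can be repeated in the shell:

```shell
# Hypothetical helper: join the LBA_48 register bytes (most significant
# byte first, as printed by smartctl) and convert them to a decimal LBA.
lba48_to_dec() {
    hex=$(printf '%s' "$*" | tr -d ' ')
    echo $(( 0x$hex ))
}

# Register bytes taken from the error log above.
lba48_to_dec 00 00 33 3f d8 a6   # prints 859822246
```

The result matches the `LBA_of_first_error` column of the self-test log.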
| 793 | |
| 794 | A read scan verifies the LBA and checks for other bad sectors |
| 795 | (alternatively, replace `/dev/null` with a file path to create a disk image): |
| 796 | |
| 797 | {{{ |
| 798 | root:~# ddrescue --ask --verbose --binary-prefixes --idirect --force /dev/sdb /dev/null disk.map |
| 799 | GNU ddrescue 1.21-rc2 |
| 800 | About to copy 610480 MiBytes from /dev/sdb [SAMSUNG HM640JJ::...] to /dev/null [0]. |
| 801 | Proceed (y/N)? y |
| 802 | ... |
| 803 | non-tried: 0 B, errsize: 512 B, run time: 2h |
| 804 | rescued: 610480 MiB, errors: 1, remaining time: n/a |
| 805 | percent rescued: 99.99% time since last successful read: 20s |
| 806 | Finished |
| 807 | }}} |
| 808 | |
| 809 | The `ddrescue` map file now shows byte ranges of good and bad disk areas: |
| 810 | |
| 811 | {{{ |
| 812 | root:~# cat disk.map |
| 813 | ... |
| 814 | # pos size status |
| 815 | 0x00000000 0x667FB14C00 + |
| 816 | 0x667FB14C00 0x00000200 - <=== 512 bytes unreadable |
| 817 | 0x667FB14E00 0x2E8B541200 + |
| 818 | }}} |
| 819 | |
| 820 | Translate the byte position to the LBA: |
| 821 | |
| 822 | {{{ |
| 823 | root:~# echo $((0x667FB14C00/512)) |
| 824 | 859822246 |
| 825 | }}} |
| 826 | |
| 827 | Or convert the map file to a `badblocks`-like list with `ddrescuelog` (part of recent versions of the ''ddrescue'' package): |
| 828 | |
| 829 | {{{ |
| 830 | root:~# ddrescuelog --list-blocks=- disk.map |
| 831 | 859822246 |
| 832 | }}} |
| 833 | |
| 834 | Both match the LBA reported by `smartctl`. |
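
If `ddrescuelog` is not available, the same list can be approximated in plain shell. The following is only a sketch under stated assumptions: the function name `list_bad_lbas` is hypothetical, only areas marked `-` (bad sector) are reported, and the first non-comment line of the map file is treated as the status line and skipped.

```shell
# Hypothetical helper: print the LBA of every sector inside a '-' (bad)
# area of a ddrescue map file. Runs in a subshell to keep variables local.
# Usage: list_bad_lbas MAPFILE [SECTOR_SIZE]
list_bad_lbas() (
    mapfile=$1
    sector_size=${2:-512}
    seen_status_line=0
    while read -r pos size status rest; do
        # Skip blank lines and comments.
        case $pos in
            ''|'#'*) continue ;;
        esac
        # The first non-comment line is the status line, not a data block.
        if [ "$seen_status_line" -eq 0 ]; then
            seen_status_line=1
            continue
        fi
        [ "$status" = "-" ] || continue
        # Convert the byte range to sector numbers (end is exclusive).
        lba=$(( $pos / $sector_size ))
        end=$(( ($pos + $size + $sector_size - 1) / $sector_size ))
        while [ "$lba" -lt "$end" ]; do
            echo "$lba"
            lba=$(( lba + 1 ))
        done
    done < "$mapfile"
)
```

Called as `list_bad_lbas disk.map`, it should print one LBA per line, like the `ddrescuelog` call above.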
| 835 | |
| 836 | ==== Find affected file ==== |
| 837 | |
| 838 | Get start offset of affected partition: |
| 839 | |
| 840 | {{{ |
| 841 | root:~# fdisk --list /dev/sdb |
| 842 | ... |
| 843 | Device Boot Start End Sectors Size Id Type |
| 844 | /dev/sdb1 63 1250258624 1250258562 596.2G 7 HPFS/NTFS/exFAT |
| 845 | }}} |
| 846 | |
| 847 | Get the filesystem block (cluster) size if it is unknown (4096 bytes in many cases): |
| 848 | |
| 849 | {{{ |
| 850 | root:~# fsstat /dev/sdb1 |
| 851 | ... |
| 852 | File System Type: NTFS |
| 853 | ... |
| 854 | Sector Size: 512 |
| 855 | Cluster Size: 4096 |
| 856 | ... |
| 857 | }}} |
| 858 | |
| 859 | Calculate the number of the bad cluster as `(BAD_LBA - START_LBA) / SECTORS_PER_CLUSTER`, where `SECTORS_PER_CLUSTER` is 4096 / 512 = 8 here: |
| 860 | |
| 861 | {{{ |
| 862 | root:~# echo $(((859822246-63)/8)) |
| 863 | 107477772 |
| 864 | }}} |
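
For repeated use, the same arithmetic can be wrapped in a small shell function. This is a sketch only; the name `lba_to_cluster` and its argument order are hypothetical, and the defaults assume 4096-byte clusters on 512-byte sectors as reported by `fsstat` above.

```shell
# Hypothetical helper: map a disk LBA to a cluster number inside a partition.
# Usage: lba_to_cluster BAD_LBA START_LBA [CLUSTER_SIZE] [SECTOR_SIZE]
lba_to_cluster() {
    bad_lba=$1
    start_lba=$2
    cluster_size=${3:-4096}
    sector_size=${4:-512}
    sectors_per_cluster=$(( $cluster_size / $sector_size ))
    # Integer division rounds down to the cluster that contains the sector.
    echo $(( ($bad_lba - $start_lba) / $sectors_per_cluster ))
}

# Values from the smartctl, fdisk and fsstat output above.
lba_to_cluster 859822246 63 4096 512   # prints 107477772
```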
| 865 | |
| 866 | Find inode (here: MFT entry) used by this cluster: |
| 867 | |
| 868 | {{{ |
| 869 | root:~# ifind -d 107477772 /dev/sdb1 |
| 870 | 663-128-2 |
| 871 | }}} |
| 872 | |
| 873 | Print some info about this inode: |
| 874 | |
| 875 | {{{ |
| 876 | root:~# istat /dev/sdb1 663-128-2 |
| 877 | ... |
| 878 | Name: Backup_2015-12-17.zip |
| 879 | Parent MFT Entry: 30 Sequence: 1 |
| 880 | Allocated Size: 4660039680 Actual Size: 4660039516 |
| 881 | Created: 2015-12-17 13:43:30.460000000 (CET) |
| 882 | File Modified: 2015-12-17 13:46:19.647000000 (CET) |
| 883 | ... |
| 884 | Type: $DATA (128-2) Name: N/A Non-Resident size: 4660039516 init_size: 4660039516 |
| 885 | 106950180 106950181 ... |
| 886 | ... |
| 887 | 107477772 <=== The bad cluster |
| 888 | ... |
| 889 | 108087884 |
| 890 | }}} |
| 891 | |
| 892 | Find full path of affected file: |
| 893 | |
| 894 | {{{ |
| 895 | root:~# ffind /dev/sdb1 663-128-2 |
| 896 | /Backups/2015/Backup_2015-12-17.zip |
| 897 | }}} |
| 898 | |
| 899 | If the file is no longer needed, it can be overwritten in place and then removed. This is easy with `shred` from ''GNU coreutils'': `shred --iterations=1 --remove /PATH/TO/FILE`. In most cases this should reallocate the bad sector. |
| 900 | |
| 901 | ==== Try to recover the bad sector ==== |
| 902 | |
| 903 | Start with up to 100 read retries of the bad sector; the data is written to `recovered.bin` if a read succeeds: |
| 904 | |
| 905 | {{{ |
| 906 | root:~# ddrescue --ask --verbose --binary-prefixes --idirect --retry=100 \ |
| 907 | --input-position=859822246s --output-position=0 --size=1s \ |
| 908 | /dev/sdb recovered.bin recovered.map |
| 909 | ... |
| 910 | Current status |
| 911 | ipos: 419835 MiB, non-trimmed: 0 B, current rate: 32 B/s |
| 912 | opos: 0 B, non-scraped: 0 B, average rate: 4 B/s |
| 913 | non-tried: 0 B, errsize: 0 B, run time: 1m 49s |
| 914 | rescued: 512 B, errors: 0, remaining time: n/a |
| 915 | percent rescued: 100.00% time since last successful read: 0s |
| 916 | Finished |
| 917 | }}} |
| 918 | |
| 919 | We were very lucky: |
| 920 | |
| 921 | {{{ |
| 922 | root:~# cat recovered.map |
| 923 | ... |
| 924 | # pos size status |
| 925 | 0x00000000 0x667FB14C00 ? |
| 926 | 0x667FB14C00 0x00000200 + <=== Now OK! |
| 927 | 0x667FB14E00 0x2E8B541200 ? |
| 928 | }}} |
| 929 | |
| 930 | Check whether the disk firmware took the chance to reallocate the sector using the recovered data: |
| 931 | |
| 932 | {{{ |
| 933 | root:~# dd skip=859822246 count=1 iflag=direct if=/dev/sdb of=test.bin |
| 934 | dd: error reading ‘/dev/sdb’: Input/output error |
| 935 | 0+0 records in |
| 936 | 0+0 records out |
| 937 | 0 bytes (0 B) copied, 23.5006 s, 0.0 kB/s |
| 938 | }}} |
| 939 | |
| 940 | No luck in this case, so overwrite the sector manually: |
| 941 | |
| 942 | {{{ |
| 943 | root:~# dd seek=859822246 count=1 oflag=direct if=recovered.bin of=/dev/sdb |
| 944 | 1+0 records in |
| 945 | 1+0 records out |
| 946 | 512 bytes (512 B) copied, 1.05331 s, 0.5 kB/s |
| 947 | }}} |
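
A direct write to the disk device goes wherever `seek=` points, so it may be worth confirming beforehand that the input file holds exactly one sector. The following is a minimal sketch; the helper name `is_single_sector` is hypothetical, and 512 bytes is assumed as the default sector size.

```shell
# Hypothetical helper: succeed only if FILE exists and is exactly one
# sector long. Usage: is_single_sector FILE [SECTOR_SIZE]
is_single_sector() {
    file=$1
    sector_size=${2:-512}
    [ -f "$file" ] || return 1
    # The arithmetic expansion strips padding that some wc versions print.
    actual=$(( $(wc -c < "$file") ))
    [ "$actual" -eq "$sector_size" ]
}
```

It could be chained in front of the write, for example `is_single_sector recovered.bin && dd seek=859822246 count=1 oflag=direct if=recovered.bin of=/dev/sdb`.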
| 948 | |
| 949 | Read data back and check: |
| 950 | |
| 951 | {{{ |
| 952 | root:~# dd skip=859822246 count=1 iflag=direct if=/dev/sdb of=test.bin |
| 953 | 1+0 records in |
| 954 | 1+0 records out |
| 955 | 512 bytes (512 B) copied, 0.0211745 s, 24.2 kB/s |
| 956 | |
| 957 | root:~# diff -s recovered.bin test.bin |
| 958 | Files recovered.bin and test.bin are identical |
| 959 | }}} |
| 960 | |
| 961 | Finally, run a SMART self-test and check its result: |
| 962 | |
| 963 | {{{ |
| 964 | root:~# smartctl -t short /dev/sdb |
| 965 | ... |
| 966 | Sending command: "Execute SMART Short self-test routine immediately in off-line mode". |
| 967 | ... |
| 968 | Please wait 2 minutes for test to complete. |
| 969 | |
| 970 | root:~# sleep 120 # :-) |
| 971 | |
| 972 | root:~# smartctl -x /dev/sdb |
| 973 | ... |
| 974 | ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE |
| 975 | ... |
| 976 | 5 Reallocated_Sector_Ct PO--CK 252 252 010 - 0 <=== Interesting... |
| 977 | ... |
| 978 | 9 Power_On_Hours -O--CK 100 100 000 - 252 |
| 979 | ... |
| 980 | 197 Current_Pending_Sector -O--CK 100 100 000 - 0 <=== As expected |
| 981 | ... |
| 982 | SMART Extended Self-test Log Version: 1 (2 sectors) |
| 983 | Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error |
| 984 | # 1 Short offline Completed without error 00% 252 - <=== Works again! |
| 985 | # 2 Short offline Completed: read failure 90% 176 859822246 |
| 986 | }}} |
| 987 | |
| 988 | Interestingly, `Reallocated_Sector_Ct` did not increase. Either the firmware did not record the reallocation, or it decided to reuse the original sector. |
| 989 | |
| 990 | Done! |
| 991 | |