| 501 | === Bad block reassignment === |
| 502 | |
| 503 | The SCSI disk command set and associated disk architecture are assumed in this section. SCSI disks have their own logical to physical mapping allowing a damaged sector (usually carrying 512 bytes of data) to be remapped irrespective of the operating system, file system or software RAID being used. |
| 504 | |
| 505 | The terms ''block'' and ''sector'' are used interchangeably, although ''block'' tends to be used in higher-level or more abstract contexts such as a ''logical block''. |
| 506 | |
| 507 | When a SCSI disk is formatted, defective sectors are left out of the logical block map. These are the sectors identified during the manufacturing process (the so-called primary list: PLIST), those found during the format itself (the certification list: CLIST), those given explicitly to the format command (the DLIST) and, optionally, those in the previous grown list (GLIST). The number (and low level addresses) of the unmapped sectors can be found with the SCSI `READ DEFECT DATA` command. |
| 508 | |
| 509 | SCSI disks tend to be divided into zones which have spare sectors and perhaps spare tracks, to support the logical block address mapping process. The idea is that if a logical block is remapped, the heads do not have to move a long way to access the replacement sector. Note that spare sectors are a scarce resource. |
| 510 | |
| 511 | Once a SCSI disk format has completed successfully, other problems may appear over time. These fall into two categories: |
| 512 | |
| 513 | * recoverable: the Error Correction Codes (ECC) detect a problem but it is small enough to be corrected. Optionally other strategies such as retrying the access may retrieve the data. |
| 514 | * unrecoverable: try as it may, the disk logic and ECC algorithms cannot recover the data. This is often reported as a ''medium error''. |
| 515 | |
| 516 | Other things can go wrong, typically associated with the transport, and they will be reported using a term other than ''medium error''. For example, a disk may decide a read operation was successful, but a computer's host bus adapter (HBA) checking the incoming data detects a CRC error due to a bad cable or termination. |
| 517 | |
| 518 | Depending on the disk vendor, recoverable errors can be ignored. After all, some disks have up to 68 bytes of ECC above the payload size of 512 bytes, so why use up spare sectors, which are limited in number ^[#footnote8 [8]]^? If the disk can recover the data and does decide to re-allocate (reassign) a sector, it first checks the settings of the `ARRE` and `AWRE` bits in the read-write error recovery mode page. Usually these bits are set ^[#footnote9 [9]]^, enabling automatic (read or write) re-allocation. The automatic re-allocation may also fail if the zone (or disk) has run out of spare sectors. |
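
As an illustration of the two bits mentioned above, the following Python sketch decodes byte 2 of the read-write error recovery mode page (page code 0x01), where `AWRE` is bit 7 and `ARRE` is bit 6. The function name and the sample byte values are hypothetical, and fetching the mode page from a real disk is not shown.

```python
AWRE_MASK = 0x80  # bit 7: automatic write reallocation enabled
ARRE_MASK = 0x40  # bit 6: automatic read reallocation enabled


def reallocation_flags(byte2):
    """Decode the AWRE and ARRE bits from byte 2 of mode page 0x01."""
    if not 0 <= byte2 <= 0xFF:
        raise ValueError("expected a single byte")
    return {
        "AWRE": bool(byte2 & AWRE_MASK),
        "ARRE": bool(byte2 & ARRE_MASK),
    }


# Hypothetical sample values: both bits set, then both bits clear.
print(reallocation_flags(0xC0))
print(reallocation_flags(0x00))
```

If either flag is reported as false, the disk would not be expected to reassign sectors automatically for that direction (read or write).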
| 519 | |
| 520 | Another consideration with RAIDs, and applications that require a high data rate without pauses, is that the controller logic may not want a disk to spend too long trying to recover an error. |
| 521 | |
| 522 | Unrecoverable errors will cause a ''medium error'' sense key, perhaps with some useful additional sense information. If the extended background self test includes a full disk read scan, one would expect the self test log to list the bad block, as shown in section [#Repairsinafilesystem Repairs in a file system]. Recent SCSI disks with a periodic background scan should also list unrecoverable read errors (and some recoverable errors as well). The advantage of the background scan is that it runs to completion while self tests will often terminate at the first serious error. |
| 523 | |
| 524 | SCSI disks expect unrecoverable errors to be fixed manually using the SCSI `REASSIGN BLOCKS` command since loss of data is involved. It is possible that an operating system or a file system could issue the `REASSIGN BLOCKS` command itself but the authors are unaware of any examples. The `REASSIGN BLOCKS` command reassigns one or more blocks: it attempts to recover the data, perhaps partially (a forlorn hope at this stage), fetches an unused spare sector from the current zone and adds the damaged old sector to the GLIST (hence the name ''grown'' list). The contents of the GLIST may not be that interesting, but `smartctl` prints the number of entries in the grown list, and if that number grows quickly, the disk may be approaching the end of its useful life. |
| 525 | |
| 526 | Here is an alternative brute force technique to consider: if the data on the SCSI or ATA disk has all been backed up (e.g. is held on the other disks in a RAID 5 enclosure), then simply reformatting the disk may be the least cumbersome approach. |
| 527 | |
| 528 | ==== Example ==== |
| 529 | |
| 530 | Given a ''bad block'', it still may be useful to look at the `fdisk` command (if the disk has multiple partitions) to find out which partition is involved, then use `debugfs` (or a similar tool for the file system in question) to find out which, if any, file or other part of the file system may have been damaged. This is discussed in section [#Repairsinafilesystem Repairs in a file system]. |
| 531 | |
| 532 | Then a program that can execute the SCSI `REASSIGN BLOCKS` command is required. In Linux (2.4 and 2.6 series), FreeBSD, Tru64 (OSF) and Windows, the author's `sg_reassign` utility in the `sg3_utils` package can be used. Also found in that package is `sg_verify`, which can be used to check that a block is readable. |
| 533 | |
| 534 | Assume that logical block address `1193046` (which is `0x123456` in hex) is corrupt ^[#footnote10 [10]]^ on the disk at `/dev/sdb`. A long self-test started with `smartctl -t long /dev/sdb` may produce log results like this: |
| 535 | |
| 536 | {{{ |
| 537 | # smartctl -l selftest /dev/sdb |
| 538 | smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen |
| 539 | Home page is http://smartmontools.sourceforge.net/ |
| 540 | SMART Self-test log |
| 541 | Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ] |
| 542 | Description number (hours) |
| 543 | # 1 Background long Failed in segment - 354 1193046 [0x3 0x11 0x0] |
| 544 | # 2 Background short Completed - 323 - [- - -] |
| 545 | # 3 Background short Completed - 194 - [- - -] |
| 546 | }}} |
| 547 | |
| 548 | The `sg_verify` utility can be used to confirm that there is a problem at that address: |
| 549 | |
| 550 | {{{ |
| 551 | # sg_verify --lba=1193046 /dev/sdb |
| 552 | verify (10): Fixed format, current; Sense key: Medium Error |
| 553 | Additional sense: Unrecovered read error |
| 554 | Info fld=0x123456 [1193046] |
| 555 | Field replaceable unit code: 228 |
| 556 | Actual retry count: 0x008b |
| 557 | medium or hardware error, reported lba=0x123456 |
| 558 | }}} |
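
The bracketed `[SK ASC ASQ]` triple in the self-test log and the text printed by `sg_verify` describe the same condition. The Python sketch below shows how the triple `[0x3 0x11 0x0]` maps to the names seen above. The function name is hypothetical and the lookup tables are deliberately tiny, holding only the sense key and additional sense code that occur in this example.

```python
# Illustrative tables: only the values from this example are listed.
SENSE_KEYS = {0x3: "Medium Error"}
ADDITIONAL_SENSE = {(0x11, 0x00): "Unrecovered read error"}


def decode_sense_triple(field):
    """Turn a log field like '[0x3 0x11 0x0]' into (sense key, additional sense).

    Returns None for an empty field such as '[- - -]'.
    """
    tokens = field.strip().strip("[]").split()
    if len(tokens) != 3 or "-" in tokens:
        return None
    sk, asc, ascq = (int(tok, 16) for tok in tokens)
    key_name = SENSE_KEYS.get(sk, "sense key 0x%x" % sk)
    asc_name = ADDITIONAL_SENSE.get((asc, ascq),
                                    "asc=0x%02x ascq=0x%02x" % (asc, ascq))
    return key_name, asc_name


print(decode_sense_triple("[0x3 0x11 0x0]"))
# The decimal LBA in the log and the hex LBA from sg_verify are the same number.
print(int("123456", 16) == 1193046)
```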
| 559 | |
| 560 | Now the GLIST length is checked before the block reassignment: |
| 561 | |
| 562 | {{{ |
| 563 | # sg_reassign --grown /dev/sdb |
| 564 | >> Elements in grown defect list: 0 |
| 565 | }}} |
| 566 | |
| 567 | And now for the actual reassignment followed by another check of the GLIST length: |
| 568 | |
| 569 | {{{ |
| 570 | # sg_reassign --address=1193046 /dev/sdb |
| 571 | # sg_reassign --grown /dev/sdb |
| 572 | >> Elements in grown defect list: 1 |
| 573 | }}} |
| 574 | |
| 575 | The GLIST length has grown by one as expected. If the disk was unable to recover any data, then the ''new'' block at LBA `0x123456` has vendor specific data in it. The `sg_reassign` utility can also do bulk reassigns; see `man sg_reassign` for more information. |
| 576 | |
| 577 | The `dd` command could be used to read the contents of the ''new'' block: |
| 578 | |
| 579 | {{{ |
| 580 | # dd if=/dev/sdb iflag=direct skip=1193046 of=blk.img bs=512 count=1 |
| 581 | }}} |
| 582 | |
| 583 | and a hex editor ^[#footnote11 [11]]^ used to view and potentially change the `blk.img` file. An altered `blk.img` file (or `/dev/zero`) could be written back with: |
| 584 | |
| 585 | {{{ |
| 586 | # dd if=blk.img of=/dev/sdb seek=1193046 oflag=direct bs=512 count=1 |
| 587 | }}} |
| 588 | |
| 589 | More work may be needed at the file system level, especially if the reassigned block held critical file system information such as a superblock or a directory. |
| 590 | |
| 591 | Even if a full backup of the disk is available, or the disk has been ''ejected'' from a RAID, it may still be worthwhile to reassign the bad block(s) that caused the problem, or simply format the disk (see `sg_format` in the `sg3_utils` package), and re-use the disk later (not unlike the way a replacement disk from a manufacturer might be used). |
| 592 | |
| 593 | |
| 594 | == Case Studies == |
| 595 | |
| 596 | This section is intended to collect step-by-step descriptions of some real-life use cases. |
| 597 | |
| 598 | === Recovering a (mostly) unreadable sector of a Notebook HDD === |
| 599 | |
| 600 | This was done in March 2016 under Windows 7 using ''Cygwin''^[#footnote12 12.]^ ports of ''GNU ddrescue''^[#footnote13 13.]^ and ''The Sleuth Kit (TSK)''^[#footnote14 14.]^. All commands shown should work similarly on other platforms and with other filesystems. |
| 601 | |
| 602 | ==== Determine Logical Block Address of unreadable sector ==== |
| 603 | |
| 604 | Examine the `smartctl` output: |
| 605 | {{{ |
| 606 | root:~# smartctl -x /dev/sdb |
| 607 | smartctl 6.5 2016-02-29 r4227 [x86_64-w64-mingw32-win7-sp1] (daily-20160229) |
| 608 | ... |
| 609 | Model Family: SAMSUNG SpinPoint MP5 |
| 610 | Device Model: SAMSUNG HM640JJ |
| 611 | ... |
| 612 | Firmware Version: 2AK10001 |
| 613 | User Capacity: 640.135.028.736 bytes [640 GB] |
| 614 | Sector Size: 512 bytes logical/physical |
| 615 | Rotation Rate: 7200 rpm |
| 616 | Form Factor: 2.5 inches |
| 617 | ... |
| 618 | ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE |
| 619 | ... |
| 620 | 5 Reallocated_Sector_Ct PO--CK 252 252 010 - 0 |
| 621 | ... |
| 622 | 9 Power_On_Hours -O--CK 100 100 000 - 251 <=== See Self-test Log below |
| 623 | ... |
| 624 | 197 Current_Pending_Sector -O--CK 100 100 000 - 1 <=== At least 1 bad sector |
| 625 | ... |
| 626 | SMART Extended Comprehensive Error Log Version: 1 (2 sectors) |
| 627 | Device Error Count: 351 (device log contains only the most recent 8 errors) |
| 628 | ... |
| 629 | Error 351 [6] occurred at disk power-on lifetime: 251 hours (10 days + 11 hours) |
| 630 | When the command that caused the error occurred, the device was active or idle. |
| 631 | |
| 632 | After command completion occurred, registers were: |
| 633 | ER -- ST COUNT LBA_48 LH LM LL DV DC |
| 634 | -- -- -- == -- == == == -- -- -- -- -- |
| 635 | 40 -- 51 00 01 00 00 33 3f d8 a6 40 00 Error: UNC 1 sectors at LBA = 0x333fd8a6 = 859822246 <=== Its LBA |
| 636 | |
| 637 | Commands leading to the command that caused the error were: |
| 638 | CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name |
| 639 | -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- |
| 640 | 25 00 00 00 01 00 00 33 3f d8 a6 40 00 00:00:06.924 READ DMA EXT |
| 641 | ... |
| 642 | |
| 643 | SMART Extended Self-test Log Version: 1 (2 sectors) |
| 644 | Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error |
| 645 | # 1 Short offline Completed: read failure 90% 176 859822246 <=== Detected 75 power on hours ago |
| 646 | }}} |
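
The LBA that `smartctl` prints after `Error: UNC` can be cross-checked against the raw register bytes in the same row. The Python sketch below assumes the column layout shown in the header above (13 byte columns, with the six LBA bytes in columns 6 to 11, most significant first); the function name is hypothetical.

```python
def lba_from_error_row(row):
    """Extract the 48-bit LBA from a register row of the extended error log.

    Assumes the layout 'ER -- ST COUNT(2) LBA_48(3) LH LM LL DV DC',
    i.e. the six LBA bytes are tokens 5..10, most significant byte first.
    """
    tokens = row.split()
    if len(tokens) < 13:
        raise ValueError("expected at least 13 register columns")
    return int("".join(tokens[5:11]), 16)


row = "40 -- 51 00 01 00 00 33 3f d8 a6 40 00"
lba = lba_from_error_row(row)
print(hex(lba), lba)
```

For the row shown above this yields the same value that `smartctl` reports, `0x333fd8a6`, which is 859822246 in decimal.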
| 647 | |
| 648 | A read scan helps to verify the LBA and checks for other possible bad sectors (alternatively, replace `/dev/null` with a file path to create a disk image): |
| 650 | |
| 651 | {{{ |
| 652 | root:~# ddrescue --ask --verbose --binary-prefixes --idirect --force /dev/sdb /dev/null disk.map |
| 653 | GNU ddrescue 1.21-rc2 |
| 654 | About to copy 610480 MiBytes from /dev/sdb [SAMSUNG HM640JJ::...] to /dev/null [0]. |
| 655 | Proceed (y/N)? y |
| 656 | ... |
| 657 | non-tried: 0 B, errsize: 512 B, run time: 2h |
| 658 | rescued: 610480 MiB, errors: 1, remaining time: n/a |
| 659 | percent rescued: 99.99% time since last successful read: 20s |
| 660 | Finished |
| 661 | }}} |
| 662 | |
| 663 | The `ddrescue` map file now shows byte ranges of good and bad disk areas: |
| 664 | |
| 665 | {{{ |
| 666 | root:~# cat disk.map |
| 667 | ... |
| 668 | # pos size status |
| 669 | 0x00000000 0x667FB14C00 + |
| 670 | 0x667FB14C00 0x00000200 - <=== 512 bytes unreadable |
| 671 | 0x667FB14E00 0x2E8B541200 + |
| 672 | }}} |
| 673 | |
| 674 | Translate the byte position to the LBA: |
| 675 | |
| 676 | {{{ |
| 677 | root:~# echo $((0x667FB14C00/512)) |
| 678 | 859822246 |
| 679 | }}} |
| 680 | |
| 681 | Or convert the map file to a `badblocks`-like list with `ddrescuelog` (part of recent versions of the ''ddrescue'' package): |
| 682 | |
| 683 | {{{ |
| 684 | root:~# ddrescuelog --list-blocks=- disk.map |
| 685 | 859822246 |
| 686 | }}} |
| 687 | |
| 688 | Both match the LBA reported by `smartctl`. |
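
The same translation can be sketched in Python for cases where `ddrescuelog` is not available. This is only a minimal sketch: the function name is hypothetical, it assumes bad areas are aligned to the sector size, and the status line in the sample is a placeholder because the real one is elided in the listing above.

```python
def bad_sectors_from_mapfile(text, sector_size=512):
    """Return the LBAs of all sectors that a ddrescue map file marks as bad ('-')."""
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    bad = []
    # rows[0] is the status line (current position and status); data blocks follow.
    for row in rows[1:]:
        pos, size, status = row[:3]
        if status == "-":
            first = int(pos, 0) // sector_size
            count = int(size, 0) // sector_size
            bad.extend(range(first, first + count))
    return bad


SAMPLE = """\
# The status line below is a placeholder, not taken from the case study.
0x00000000     +
#      pos        size  status
0x00000000  0x667FB14C00  +
0x667FB14C00  0x00000200  -
0x667FB14E00  0x2E8B541200  +
"""
print(bad_sectors_from_mapfile(SAMPLE))
```

With the three data block lines from the map file shown earlier, the only LBA returned is 859822246.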
| 689 | |
| 690 | ==== Find affected file ==== |
| 691 | |
| 692 | Get start offset of affected partition: |
| 693 | |
| 694 | {{{ |
| 695 | root:~# fdisk --list /dev/sdb |
| 696 | ... |
| 697 | Device Boot Start End Sectors Size Id Type |
| 698 | /dev/sdb1 63 1250258624 1250258562 596.2G 7 HPFS/NTFS/exFAT |
| 699 | }}} |
| 700 | |
| 701 | Get filesystem block (cluster) size if unknown (4096 in many cases): |
| 702 | |
| 703 | {{{ |
| 704 | root:~# fsstat /dev/sdb1 |
| 705 | ... |
| 706 | File System Type: NTFS |
| 707 | ... |
| 708 | Sector Size: 512 |
| 709 | Cluster Size: 4096 |
| 710 | ... |
| 711 | }}} |
| 712 | |
| 713 | Calculate the number of the bad cluster as `(BAD_LBA - START_LBA) / SECTORS_PER_CLUSTER` (integer division; here 4096 / 512 = 8 sectors per cluster): |
| 714 | |
| 715 | {{{ |
| 716 | root:~# echo $(((859822246-63)/8)) |
| 717 | 107477772 |
| 718 | }}} |
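
The shell arithmetic above discards the remainder. The Python sketch below performs the same calculation but also returns the position of the bad sector inside its cluster, which can help when inspecting the file contents later. The function and parameter names are hypothetical; the numbers are the ones from this case study.

```python
def lba_to_cluster(bad_lba, partition_start_lba, sector_size=512, cluster_size=4096):
    """Map a disk LBA to (cluster number, sector index within that cluster).

    Both results are relative to the start of the partition.
    """
    if cluster_size % sector_size != 0:
        raise ValueError("cluster size must be a multiple of the sector size")
    if bad_lba < partition_start_lba:
        raise ValueError("LBA lies before the start of the partition")
    sectors_per_cluster = cluster_size // sector_size
    return divmod(bad_lba - partition_start_lba, sectors_per_cluster)


cluster, sector_in_cluster = lba_to_cluster(859822246, 63)
print(cluster, sector_in_cluster)
```

Here the bad sector turns out to be the last of the eight sectors (index 7) in cluster 107477772.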
| 719 | |
| 720 | Find inode (here: MFT entry) used by this cluster: |
| 721 | |
| 722 | {{{ |
| 723 | root:~# ifind -d 107477772 /dev/sdb1 |
| 724 | 663-128-2 |
| 725 | }}} |
| 726 | |
| 727 | Print some info about this inode: |
| 728 | |
| 729 | {{{ |
| 730 | root:~# istat /dev/sdb1 663-128-2 |
| 731 | ... |
| 732 | Name: Backup_2015-12-17.zip |
| 733 | Parent MFT Entry: 30 Sequence: 1 |
| 734 | Allocated Size: 4660039680 Actual Size: 4660039516 |
| 735 | Created: 2015-12-17 13:43:30.460000000 (CET) |
| 736 | File Modified: 2015-12-17 13:46:19.647000000 (CET) |
| 737 | ... |
| 738 | Type: $DATA (128-2) Name: N/A Non-Resident size: 4660039516 init_size: 4660039516 |
| 739 | 106950180 106950181 ... |
| 740 | ... |
| 741 | 107477772 <=== The bad cluster |
| 742 | ... |
| 743 | 108087884 |
| 744 | }}} |
| 745 | |
| 746 | Find full path of affected file: |
| 747 | |
| 748 | {{{ |
| 749 | root:~# ffind /dev/sdb1 663-128-2 |
| 750 | /Backups/2015/Backup_2015-12-17.zip |
| 751 | }}} |
| 752 | |
| 753 | If the file is no longer needed, it can be overwritten in place and then removed. This is easy with `shred` from ''GNU coreutils'': `shred --iterations=1 --remove /PATH/TO/FILE`. In most cases the overwrite should lead the disk to reallocate the bad sector. |
| 754 | |
| 755 | ==== Try to recover the bad sector ==== |
| 756 | |
| 757 | Start with 100 read retries of the bad sector, writing the data to `recovered.bin` if a read succeeds: |
| 758 | |
| 759 | {{{ |
| 760 | root:~# ddrescue --ask --verbose --binary-prefixes --idirect --retry=100 \ |
| 761 | --input-position=859822246s --output-position=0 --size=1s \ |
| 762 | /dev/sdb recovered.bin recovered.map |
| 763 | ... |
| 764 | Current status |
| 765 | ipos: 419835 MiB, non-trimmed: 0 B, current rate: 32 B/s |
| 766 | opos: 0 B, non-scraped: 0 B, average rate: 4 B/s |
| 767 | non-tried: 0 B, errsize: 0 B, run time: 1m 49s |
| 768 | rescued: 512 B, errors: 0, remaining time: n/a |
| 769 | percent rescued: 100.00% time since last successful read: 0s |
| 770 | Finished |
| 771 | }}} |
| 772 | |
| 773 | We were very lucky: |
| 774 | |
| 775 | {{{ |
| 776 | root:~# cat recovered.map |
| 777 | ... |
| 778 | # pos size status |
| 779 | 0x00000000 0x667FB14C00 ? |
| 780 | 0x667FB14C00 0x00000200 + <=== Now OK! |
| 781 | 0x667FB14E00 0x2E8B541200 ? |
| 782 | }}} |
| 783 | |
| 784 | Check whether the disk firmware took the chance to reallocate the sector using the recovered data: |
| 785 | |
| 786 | {{{ |
| 787 | root:~# dd skip=859822246 count=1 iflag=direct if=/dev/sdb of=test.bin |
| 788 | dd: error reading ‘/dev/sdb’: Input/output error |
| 789 | 0+0 records in |
| 790 | 0+0 records out |
| 791 | 0 bytes (0 B) copied, 23.5006 s, 0.0 kB/s |
| 792 | }}} |
| 793 | |
| 794 | No luck in this case. So overwrite the sector manually: |
| 795 | |
| 796 | {{{ |
| 797 | root:~# dd seek=859822246 count=1 oflag=direct if=recovered.bin of=/dev/sdb |
| 798 | 1+0 records in |
| 799 | 1+0 records out |
| 800 | 512 bytes (512 B) copied, 1.05331 s, 0.5 kB/s |
| 801 | }}} |
| 802 | |
| 803 | Read data back and check: |
| 804 | |
| 805 | {{{ |
| 806 | root:~# dd skip=859822246 count=1 iflag=direct if=/dev/sdb of=test.bin |
| 807 | 1+0 records in |
| 808 | 1+0 records out |
| 809 | 512 bytes (512 B) copied, 0.0211745 s, 24.2 kB/s |
| 810 | |
| 811 | root:~# diff -s recovered.bin test.bin |
| 812 | Files recovered.bin and test.bin are identical |
| 813 | }}} |
| 814 | |
| 815 | Finally, run a SMART self-test and check its result: |
| 816 | |
| 817 | {{{ |
| 818 | root:~# smartctl -t short /dev/sdb |
| 819 | ... |
| 820 | Sending command: "Execute SMART Short self-test routine immediately in off-line mode". |
| 821 | ... |
| 822 | Please wait 2 minutes for test to complete. |
| 823 | |
| 824 | root:~# sleep 120 # :-) |
| 825 | |
| 826 | root:~# smartctl -x /dev/sdb |
| 827 | ... |
| 828 | ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE |
| 829 | ... |
| 830 | 5 Reallocated_Sector_Ct PO--CK 252 252 010 - 0 <=== Interesting... |
| 831 | ... |
| 832 | 9 Power_On_Hours -O--CK 100 100 000 - 252 |
| 833 | ... |
| 834 | 197 Current_Pending_Sector -O--CK 100 100 000 - 0 <=== As expected |
| 835 | ... |
| 836 | SMART Extended Self-test Log Version: 1 (2 sectors) |
| 837 | Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error |
| 838 | # 1 Short offline Completed without error 00% 252 - <=== Works again! |
| 839 | # 2 Short offline Completed: read failure 90% 176 859822246 |
| 840 | }}} |
| 841 | |
| 842 | Interestingly, the `Reallocated_Sector_Ct` did not increase. Either the firmware did not record the reallocation or it decided to reuse the original sector. |
| 843 | |
| 844 | Done! |
645 | | Et voilà ! |
646 | | |
647 | | === Bad block reassignment === |
648 | | |
649 | | The SCSI disk command set and associated disk architecture are assumed in this section. SCSI disks have their own logical to physical mapping allowing a damaged sector (usually carrying 512 bytes of data) to be remapped irrespective of the operating system, file system or software RAID being used. |
650 | | |
651 | | The terms ''block and sector'' are used interchangeably, although block tends to get used in higher level or more abstract contexts such as a ''logical block''. |
652 | | |
653 | | When a SCSI disk is formatted, defective sectors identified during the manufacturing process (the so called primary list: PLIST), those found during the format itself (the certification list: CLIST), those given explicitly to the format command (the DLIST) and optionally the previous grown list (GLIST) are not used in the logical block map. The number (and low level addresses) of the unmapped sectors can be found with the `READ DEFECT DATA SCSI` command. |
654 | | |
655 | | SCSI disks tend to be divided into zones which have spare sectors and perhaps spare tracks, to support the logical block address mapping process. The idea is that if a logical block is remapped, the heads do not have to move a long way to access the replacement sector. Note that spare sectors are a scarce resource. |
656 | | |
657 | | Once a SCSI disk format has completed successfully, other problems may appear over time. These fall into two categories: |
658 | | |
659 | | * recoverable: the Error Correction Codes (ECC) detect a problem but it is small enough to be corrected. Optionally other strategies such as retrying the access may retrieve the data. |
660 | | * unrecoverable: try as it may, the disk logic and ECC algorithms cannot recover the data. This is often reported as a ''medium error''. |
661 | | |
662 | | Other things can go wrong, typically associated with the transport and they will be reported using a term other than ''medium error''. For example a disk may decide a read operation was successful but a computer's host bus adapter (HBA) checking the incoming data detects a CRC error due to a bad cable or termination. |
663 | | |
664 | | Depending on the disk vendor, recoverable errors can be ignored. After all, some disks have up to 68 bytes of ECC above the payload size of 512 bytes so why use up spare sectors which are limited in number ^[#footnote8 [8]]^ ? If the disk can recover the data and does decide to re-allocate (reassign) a sector, then first it checks the settings of the `ARRE` and `AWRE` bits in the read-write error recovery mode page. Usually these bits are set ^[#footnote9 [9]]^ enabling automatic (read or write) re-allocation. The automatic re-allocation may also fail if the zone (or disk) has run out of spare sectors. |
665 | | |
666 | | Another consideration with RAIDs, and applications that require a high data rate without pauses, is that the controller logic may not want a disk to spend too long trying to recover an error. |
667 | | |
668 | | Unrecoverable errors will cause a ''medium error'' sense key, perhaps with some useful additional sense information. If the extended background self test includes a full disk read scan, one would expect the self test log to list the bad block, as shown in section [#Repairsinafilesystem Repairs in a file system]. Recent SCSI disks with a periodic background scan should also list unrecoverable read errors (and some recoverable errors as well). The advantage of the background scan is that it runs to completion while self tests will often terminate at the first serious error. |
669 | | |
670 | | SCSI disks expect unrecoverable errors to be fixed manually using the `REASSIGN BLOCKS SCSI` command since loss of data is involved. It is possible that an operating system or a file system could issue the `REASSIGN BLOCKS` command itself but the authors are unaware of any examples. The `REASSIGN BLOCKS` command will reassign one or more blocks, attempting to (partially ?) recover the data (a forlorn hope at this stage), fetch an unused spare sector from the current zone while adding the damaged old sector to the GLIST (hence the name ''grown'' list). The contents of the GLIST may not be that interesting but `smartctl` prints out the number of entries in the grown list and if that number grows quickly, the disk may be approaching the end of its useful life. |
671 | | |
672 | | Here is an alternate brute force technique to consider: if the data on the SCSI or ATA disk has all been backed up (e.g. is held on the other disks in a RAID 5 enclosure), then simply reformatting the disk may be the least cumbersome approach. |
673 | | |
674 | | ==== Example ==== |
675 | | |
676 | | Given a ''bad block'', it still may be useful to look at the `fdisk` command (if the disk has multiple partitions) to find out which partition is involved, then use `debugfs` (or a similar tool for the file system in question) to find out which, if any, file or other part of the file system may have been damaged. This is discussed in section [#Repairsinafilesystem Repairs in a file system]. |
677 | | |
678 | | Then a program that can execute the `REASSIGN BLOCKS SCSI` command is required. In Linux (2.4 and 2.6 series), FreeBSD, Tru64(OSF) and Windows the author's `sg_reassign` utility in the `sg3_utils` package can be used. Also found in that package is `sg_verify` which can be used to check that a block is readable. |
679 | | |
680 | | Assume that `logical block address 1193046` (which is `123456` in hex) is corrupt ^[#footnote10 [10]]^ on the disk at `/dev/sdb`. A long selftest command like `smartctl -t long /dev/sdb` may result in log results like this: |
681 | | |
682 | | {{{ |
683 | | # smartctl -l selftest /dev/sdb |
684 | | smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen |
685 | | Home page is http://smartmontools.sourceforge.net/ |
686 | | SMART Self-test log |
687 | | Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ] |
688 | | Description number (hours) |
689 | | # 1 Background long Failed in segment - 354 1193046 [0x3 0x11 0x0] |
690 | | # 2 Background short Completed - 323 - [- - -] |
691 | | # 3 Background short Completed - 194 - [- - -] |
692 | | }}} |
693 | | |
694 | | The `sg_verify` utility can be used to confirm that there is a problem at that address: |
695 | | |
696 | | {{{ |
697 | | # sg_verify --lba=1193046 /dev/sdb |
698 | | verify (10): Fixed format, current; Sense key: Medium Error |
699 | | Additional sense: Unrecovered read error |
700 | | Info fld=0x123456 [1193046] |
701 | | Field replaceable unit code: 228 |
702 | | Actual retry count: 0x008b |
703 | | medium or hardware error, reported lba=0x123456 |
704 | | }}} |
705 | | |
706 | | Now the GLIST length is checked before the block reassignment: |
707 | | |
708 | | {{{ |
709 | | # sg_reassign --grown /dev/sdb |
710 | | >> Elements in grown defect list: 0 |
711 | | }}} |
712 | | |
713 | | And now for the actual reassignment followed by another check of the GLIST length: |
714 | | |
715 | | {{{ |
716 | | # sg_reassign --address=1193046 /dev/sdb |
717 | | # sg_reassign --grown /dev/sdb |
718 | | >> Elements in grown defect list: 1 |
719 | | }}} |
720 | | |
721 | | The GLIST length has grown by one as expected. If the disk was unable to recover any data, then the ''new'' block at lba `0x123456` has vendor specific data in it. The `sg_reassign` utility can also do bulk reassigns, see `man sg_reassign` for more information. |
722 | | |
723 | | The `dd` command could be used to read the contents of the ''new'' block: |
724 | | |
725 | | {{{ |
726 | | # dd if=/dev/sdb iflag=direct skip=1193046 of=blk.img bs=512 count=1 |
727 | | }}} |
728 | | |
729 | | and a hex editor ^[#footnote11 [11]]^ used to view and potentially change the `blk.img` file. An altered `blk.img` file (or `/dev/zero`) could be written back with: |
730 | | |
731 | | {{{ |
732 | | # dd if=blk.img of=/dev/sdb seek=1193046 oflag=direct bs=512 count=1 |
733 | | }}} |
734 | | |
735 | | More work may be needed at the file system level, especially if the reassigned block held critical file system information such as a superblock or a directory. |
736 | | |
737 | | Even if a full backup of the disk is available, or the disk has been ''ejected'' from a RAID, it may still be worthwhile to reassign the bad block(s) that caused the problem (or simply format the disk (see `sg_format` in the `sg3_utils package`)) and re-use the disk later (not unlike the way a replacement disk from a manufacturer might be used). |
738 | | |
739 | | |
740 | | == Case Studies == |
741 | | |
742 | | This section is intended to collect step-by-step descriptions of some real-life use cases. |
743 | | |
744 | | === Recovering a (mostly) unreadable sector of a Notebook HDD === |
745 | | |
746 | | This was done in March 2016 under Windows 7 using ''Cygwin''^[#footnote12 12.]^ ports of ''GNU ddrescue''^[#footnote13 13.]^ and ''The Sleuth Kit (TSK)''^[#footnote14 14.]^. All commands shown should work similar on other platforms and with other filesystems. |
747 | | |
748 | | ==== Determine Logical Block Address of unreadable sector ==== |
749 | | |
750 | | Examine smartctl output: |
751 | | {{{ |
752 | | root:~# smartctl -x /dev/sdb |
753 | | smartctl 6.5 2016-02-29 r4227 [x86_64-w64-mingw32-win7-sp1] (daily-20160229) |
754 | | ... |
755 | | Model Family: SAMSUNG SpinPoint MP5 |
756 | | Device Model: SAMSUNG HM640JJ |
757 | | ... |
758 | | Firmware Version: 2AK10001 |
759 | | User Capacity: 640.135.028.736 bytes [640 GB] |
760 | | Sector Size: 512 bytes logical/physical |
761 | | Rotation Rate: 7200 rpm |
762 | | Form Factor: 2.5 inches |
763 | | ... |
764 | | ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE |
765 | | ... |
766 | | 5 Reallocated_Sector_Ct PO--CK 252 252 010 - 0 |
767 | | ... |
768 | | 9 Power_On_Hours -O--CK 100 100 000 - 251 <=== See Self-test Log below |
769 | | ... |
770 | | 197 Current_Pending_Sector -O--CK 100 100 000 - 1 <=== At least 1 bad sector |
771 | | ... |
772 | | SMART Extended Comprehensive Error Log Version: 1 (2 sectors) |
773 | | Device Error Count: 351 (device log contains only the most recent 8 errors) |
774 | | ... |
775 | | Error 351 [6] occurred at disk power-on lifetime: 251 hours (10 days + 11 hours) |
776 | | When the command that caused the error occurred, the device was active or idle. |
777 | | |
778 | | After command completion occurred, registers were: |
779 | | ER -- ST COUNT LBA_48 LH LM LL DV DC |
780 | | -- -- -- == -- == == == -- -- -- -- -- |
781 | | 40 -- 51 00 01 00 00 33 3f d8 a6 40 00 Error: UNC 1 sectors at LBA = 0x333fd8a6 = 859822246 <=== Its LBA |
782 | | |
783 | | Commands leading to the command that caused the error were: |
784 | | CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name |
785 | | -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- |
786 | | 25 00 00 00 01 00 00 33 3f d8 a6 40 00 00:00:06.924 READ DMA EXT |
787 | | ... |
788 | | |
789 | | SMART Extended Self-test Log Version: 1 (2 sectors) |
790 | | Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error |
791 | | # 1 Short offline Completed: read failure 90% 176 859822246 <=== Detected 75 power on hours ago |
792 | | }}} |
793 | | |
794 | | A read scan helps to verify the LBA and checks for other possible bad sectors |
795 | | (alternatively replace `/dev/null` by a file path to create a disk image): |
796 | | |
797 | | {{{ |
798 | | root:~# ddrescue --ask --verbose --binary-prefixes --idirect --force /dev/sdb /dev/null disk.map |
799 | | GNU ddrescue 1.21-rc2 |
800 | | About to copy 610480 MiBytes from /dev/sdb [SAMSUNG HM640JJ::...] to /dev/null [0]. |
801 | | Proceed (y/N)? y |
802 | | ... |
803 | | non-tried: 0 B, errsize: 512 B, run time: 2h |
804 | | rescued: 610480 MiB, errors: 1, remaining time: n/a |
805 | | percent rescued: 99.99% time since last successful read: 20s |
806 | | Finished |
807 | | }}} |
808 | | |
809 | | The `ddrescue` map file now shows byte ranges of good and bad disk areas: |
810 | | |
811 | | {{{ |
812 | | root:~# cat disk.map |
813 | | ... |
814 | | # pos size status |
815 | | 0x00000000 0x667FB14C00 + |
816 | | 0x667FB14C00 0x00000200 - <=== 512 bytes unreadable |
817 | | 0x667FB14E00 0x2E8B541200 + |
818 | | }}} |
819 | | |
820 | | Translate the byte position to the LBA: |
821 | | |
822 | | {{{ |
823 | | root:~# echo $((0x667FB14C00/512)) |
824 | | 859822246 |
825 | | }}} |
826 | | |
827 | | Or convert the map file to a `badblocks` like list with `ddrescuelog` (part of recent versions of ''ddrescue'' package): |
828 | | |
829 | | {{{ |
830 | | root:~# ddrescuelog --list-blocks=- disk.map |
831 | | 859822246 |
832 | | }}} |
833 | | |
834 | | Both match the LBA reported by `smartctl`. |
835 | | |
836 | | ==== Find affected file ==== |
837 | | |
838 | | Get start offset of affected partition: |
839 | | |
840 | | {{{ |
841 | | root:~# fdisk --list /dev/sdb |
842 | | ... |
843 | | Device Boot Start End Sectors Size Id Type |
844 | | /dev/sdb1 63 1250258624 1250258562 596.2G 7 HPFS/NTFS/exFAT |
845 | | }}} |
846 | | |
847 | | Get filesystem block (cluster) size if unknown (4096 in many cases): |
848 | | |
849 | | {{{ |
850 | | root:~# fsstat /dev/sdb1 |
851 | | ... |
852 | | File System Type: NTFS |
853 | | ... |
854 | | Sector Size: 512 |
855 | | Cluster Size: 4096 |
856 | | ... |
857 | | }}} |
858 | | |
859 | | Calculate number of bad cluster as `(BAD_LBA - START_LBA) / SECTORS_PER_CLUSTER`: |
860 | | |
861 | | {{{ |
862 | | root:~# echo $(((859822246-63)/8)) |
863 | | 107477772 |
864 | | }}} |
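The same arithmetic can be wrapped in a small function that also reports which sector inside the cluster is affected. This is only a sketch: `lba_to_cluster` is a hypothetical helper name, and the values in the example call are the ones from the `fdisk` and `fsstat` output above.

```shell
#!/bin/sh
# Sketch: map a disk LBA to a filesystem cluster number.
# lba_to_cluster is a hypothetical helper; all arguments are plain integers.
# Usage: lba_to_cluster BAD_LBA PART_START_LBA CLUSTER_BYTES [SECTOR_BYTES]
lba_to_cluster() {
    bad_lba=$1 start_lba=$2 cluster_bytes=$3 sector_bytes=${4:-512}
    spc=$(( cluster_bytes / sector_bytes ))   # sectors per cluster
    rel=$(( bad_lba - start_lba ))            # sector offset inside the partition
    if [ "$rel" -lt 0 ]; then
        echo "LBA $bad_lba lies before the partition start" >&2
        return 1
    fi
    echo "cluster=$(( rel / spc )) sector_in_cluster=$(( rel % spc ))"
}

# Values taken from the example above.
lba_to_cluster 859822246 63 4096 512
# prints: cluster=107477772 sector_in_cluster=7
```

The remainder shows that the bad sector is the last of the eight sectors in cluster 107477772.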
865 | | |
866 | | Find the inode (here: the MFT entry) that uses this cluster: |
867 | | |
868 | | {{{ |
869 | | root:~# ifind -d 107477772 /dev/sdb1 |
870 | | 663-128-2 |
871 | | }}} |
872 | | |
873 | | Print some info about this inode: |
874 | | |
875 | | {{{ |
876 | | root:~# istat /dev/sdb1 663-128-2 |
877 | | ... |
878 | | Name: Backup_2015-12-17.zip |
879 | | Parent MFT Entry: 30 Sequence: 1 |
880 | | Allocated Size: 4660039680 Actual Size: 4660039516 |
881 | | Created: 2015-12-17 13:43:30.460000000 (CET) |
882 | | File Modified: 2015-12-17 13:46:19.647000000 (CET) |
883 | | ... |
884 | | Type: $DATA (128-2) Name: N/A Non-Resident size: 4660039516 init_size: 4660039516 |
885 | | 106950180 106950181 ... |
886 | | ... |
887 | | 107477772 <=== The bad cluster |
888 | | ... |
889 | | 108087884 |
890 | | }}} |
891 | | |
892 | | Find the full path of the affected file: |
893 | | |
894 | | {{{ |
895 | | root:~# ffind /dev/sdb1 663-128-2 |
896 | | /Backups/2015/Backup_2015-12-17.zip |
897 | | }}} |
898 | | |
899 | | If the file is no longer needed, it can be overwritten in place and then removed. This is easy with `shred` from ''GNU coreutils'': `shred --iterations=1 --remove /PATH/TO/FILE`. In most cases the write gives the firmware the opportunity to reallocate the bad sector. |
900 | | |
901 | | ==== Try to recover the bad sector ==== |
902 | | |
903 | | Start with up to 100 read retries of the bad sector; if one succeeds, the data is written to `recovered.bin`: |
904 | | |
905 | | {{{ |
906 | | root:~# ddrescue --ask --verbose --binary-prefixes --idirect --retry=100 \ |
907 | | --input-position=859822246s --output-position=0 --size=1s \ |
908 | | /dev/sdb recovered.bin recovered.map |
909 | | ... |
910 | | Current status |
911 | | ipos: 419835 MiB, non-trimmed: 0 B, current rate: 32 B/s |
912 | | opos: 0 B, non-scraped: 0 B, average rate: 4 B/s |
913 | | non-tried: 0 B, errsize: 0 B, run time: 1m 49s |
914 | | rescued: 512 B, errors: 0, remaining time: n/a |
915 | | percent rescued: 100.00% time since last successful read: 0s |
916 | | Finished |
917 | | }}} |
918 | | |
919 | | We were very lucky: |
920 | | |
921 | | {{{ |
922 | | root:~# cat recovered.map |
923 | | ... |
924 | | # pos size status |
925 | | 0x00000000 0x667FB14C00 ? |
926 | | 0x667FB14C00 0x00000200 + <=== Now OK! |
927 | | 0x667FB14E00 0x2E8B541200 ? |
928 | | }}} |
929 | | |
930 | | Check whether the disk firmware took the chance to reallocate the sector using the recovered data: |
931 | | |
932 | | {{{ |
933 | | root:~# dd skip=859822246 count=1 iflag=direct if=/dev/sdb of=test.bin |
934 | | dd: error reading ‘/dev/sdb’: Input/output error |
935 | | 0+0 records in |
936 | | 0+0 records out |
937 | | 0 bytes (0 B) copied, 23.5006 s, 0.0 kB/s |
938 | | }}} |
939 | | |
940 | | No luck in this case, so overwrite the sector manually: |
941 | | |
942 | | {{{ |
943 | | root:~# dd seek=859822246 count=1 oflag=direct if=recovered.bin of=/dev/sdb |
944 | | 1+0 records in |
945 | | 1+0 records out |
946 | | 512 bytes (512 B) copied, 1.05331 s, 0.5 kB/s |
947 | | }}} |
948 | | |
949 | | Read data back and check: |
950 | | |
951 | | {{{ |
952 | | root:~# dd skip=859822246 count=1 iflag=direct if=/dev/sdb of=test.bin |
953 | | 1+0 records in |
954 | | 1+0 records out |
955 | | 512 bytes (512 B) copied, 0.0211745 s, 24.2 kB/s |
956 | | |
957 | | root:~# diff -s recovered.bin test.bin |
958 | | Files recovered.bin and test.bin are identical |
959 | | }}} |
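Because a wrong `seek=` value overwrites the wrong sector, it may be worth rehearsing the write and read-back steps on a disk image file first. The following sketch assumes such an image file as the target; `rewrite_sector` is a hypothetical helper name. It leaves out the direct I/O flags used above and adds `conv=notrunc`, which matters for regular files only.

```shell
#!/bin/sh
# Sketch: write one 512-byte sector into a target and verify it by reading
# it back. rewrite_sector is a hypothetical helper. TARGET is assumed to be
# a disk image file; on a real device the direct I/O flags from the commands
# above would be used and conv=notrunc is not needed.
# Usage: rewrite_sector TARGET LBA DATAFILE   (DATAFILE must be 512 bytes)
rewrite_sector() {
    target=$1 lba=$2 data=$3
    if [ "$(( $(wc -c < "$data") ))" -ne 512 ]; then
        echo "data file must be exactly 512 bytes" >&2
        return 1
    fi
    # conv=notrunc keeps the rest of an image file intact.
    dd if="$data" of="$target" bs=512 seek="$lba" count=1 conv=notrunc \
        2>/dev/null || return 1
    # Read the sector back and compare it byte for byte with the source data.
    dd if="$target" bs=512 skip="$lba" count=1 2>/dev/null | cmp -s - "$data"
}
```

A call such as `rewrite_sector test.img 3 recovered.bin` returns 0 only if the sector read back is identical to the data written.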
960 | | |
961 | | Finally, run a SMART self-test and check its result: |
962 | | |
963 | | {{{ |
964 | | root:~# smartctl -t short /dev/sdb |
965 | | ... |
966 | | Sending command: "Execute SMART Short self-test routine immediately in off-line mode". |
967 | | ... |
968 | | Please wait 2 minutes for test to complete. |
969 | | |
970 | | root:~# sleep 120 # :-) |
971 | | |
972 | | root:~# smartctl -x /dev/sdb |
973 | | ... |
974 | | ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE |
975 | | ... |
976 | | 5 Reallocated_Sector_Ct PO--CK 252 252 010 - 0 <=== Interesting... |
977 | | ... |
978 | | 9 Power_On_Hours -O--CK 100 100 000 - 252 |
979 | | ... |
980 | | 197 Current_Pending_Sector -O--CK 100 100 000 - 0 <=== As expected |
981 | | ... |
982 | | SMART Extended Self-test Log Version: 1 (2 sectors) |
983 | | Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error |
984 | | # 1 Short offline Completed without error 00% 252 - <=== Works again! |
985 | | # 2 Short offline Completed: read failure 90% 176 859822246 |
986 | | }}} |
987 | | |
988 | | Interestingly, `Reallocated_Sector_Ct` did not increase. Either the firmware did not record the reallocation, or it decided to reuse the original sector. |
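To compare such counters before and after a repair, the raw values can be extracted from the attribute table with a short filter. This is a sketch only: `smart_raw` is a hypothetical helper name, and it assumes that the raw value is the last column of the attribute line, as in the output above. Some drives print additional text after the raw value, in which case the filter would need adjusting.

```shell
#!/bin/sh
# Sketch: print the RAW_VALUE of one SMART attribute from smartctl's
# attribute table read on stdin. smart_raw is a hypothetical helper; it
# assumes the raw value is the last column of the matching line.
# Usage: smartctl -x /dev/sdX | smart_raw ATTRIBUTE_NAME
smart_raw() {
    awk -v name="$1" '
        $2 == name { print $NF; found = 1 }   # second column is the name
        END { exit !found }                   # non-zero status if not found
    '
}
```

For example, `smartctl -x /dev/sdb | smart_raw Current_Pending_Sector` would print the pending sector count, so the value can be saved before the repair and compared afterwards.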
989 | | |
990 | | Done! |