Changes between Version 26 and Version 27 of BadBlockHowto
- Timestamp: May 21, 2023, 11:20:24 AM
BadBlockHowto
--- v26
+++ v27
@@ -52,5 +52,5 @@
 }}}
 
-First Step: We need to locate the partition on which this sector of the disk lives:
+**First Step**: We need to locate the partition on which this sector of the disk lives:
 
 {{{
@@ -77,5 +77,5 @@
 You can see that this is an `ext2` file system, mounted at `/data`.
 
-Second Step:we need to find the block size of the file system (normally 4096 bytes for `ext2`):
+**Second Step:** we need to find the block size of the file system (normally 4096 bytes for `ext2`):
 
 {{{
@@ -85,5 +85,7 @@
 }}}
 
-In this case the block size is 4096 bytes. Third Step: we need to determine which File System Block contains this LBA. The formula is:
+In this case the block size is 4096 bytes.
+
+**Third Step**: we need to determine which File System Block contains this LBA. The formula is:
 
 {{{
@@ -109,5 +111,5 @@
 Note: the fractional part of `0.125` indicates that this problem LBA is actually the second of the eight sectors that make up this file system block.
 
-Fourth Step:we use `debugfs` to locate the inode stored in this block, and the file that contains that inode:
+**Fourth Step:** we use `debugfs` to locate the inode stored in this block, and the file that contains that inode:
 
 {{{
@@ -161,5 +163,5 @@
 }}}
 
-Fifth StepNOTE: '''This last step will permanently and irretrievably destroy the contents of the file system block that is damaged''': if the block was allocated to a file, some of the data that is in this file is going to be overwritten with zeros. You will not be able to recover that data unless you can replace the file with a fresh or correct version.
+**Fifth Step** NOTE: '''This last step will permanently and irretrievably destroy the contents of the file system block that is damaged''': if the block was allocated to a file, some of the data that is in this file is going to be overwritten with zeros. You will not be able to recover that data unless you can replace the file with a fresh or correct version.
 
 To force the disk to reallocate this bad block we'll write zeros to the bad block, and sync the disk:
@@ -414,4 +416,6 @@
 creates the file. Leave it running until the partition/file system is full. This will make the disk reallocate those sectors which do not belong to a file. Check the `smartctl -a` output after that and make sure that the sectors are reallocated. If any remain, use the debugfs method. Of course the usual caveats apply - back it up first, and so on.
 
+Comment by Mingye Wang: wouldn't it be easier to skip to step 5 and do the `dd` or `hdparm`?
+
 === ReiserFS example ===
 
@@ -427,9 +431,9 @@
 }}}
 
-[Step 0]The SMART selftest/error log (see `smartctl -l selftest`) indicated there was a problem with block address (i.e. the 512 byte sector at) `58656333`. The partition table (e.g. see `sfdisk -luS /dev/hda` or `fdisk -ul /dev/hda`) indicated that this block was in the `/dev/hda3` partition which contained a `ReiserFS` file system. That partition started at block address `54781650`.
+**[Step 0]** The SMART selftest/error log (see `smartctl -l selftest`) indicated there was a problem with block address (i.e. the 512 byte sector at) `58656333`. The partition table (e.g. see `sfdisk -luS /dev/hda` or `fdisk -ul /dev/hda`) indicated that this block was in the `/dev/hda3` partition which contained a `ReiserFS` file system. That partition started at block address `54781650`.
 
 While doing the initial analysis it may also be useful to take a copy of the disk attributes returned by `smartctl -A /dev/hda`. Specifically the values associated with the `Reallocated_Sector_Ct` and `Reallocated_Event_Count` attributes (for `ATA` disks, the grown list (`GLIST`) length for SCSI disks). If these are incremented at the end of the procedure it indicates that the disk has re-allocated one or more sectors.
 
-[Step 1]Get the file system's block size:
+**[Step 1]** Get the file system's block size:
 
 {{{
@@ -438,5 +442,5 @@
 }}}
 
-[Step 2]Calculate the block number:
+**[Step 2]** Calculate the block number:
 
 {{{
@@ -447,5 +451,5 @@
 It is re-assuring that the calculated 4 KB damaged block address in `/dev/hda3` is less than `Count of blocks on the device` shown in the output of `debugreiserfs` shown above.
 
-[Step 3]Try to get more info about this block => reading the block fails as expected but at least we see now that it seems to be unused. If we do not get the `Cannot read the block` error we should check if our calculation in [Step 2] was correct ;)
+**[Step 3]** Try to get more info about this block => reading the block fails as expected but at least we see now that it seems to be unused. If we do not get the `Cannot read the block` error we should check if our calculation in [Step 2] was correct ;)
 
 {{{
@@ -465,5 +469,5 @@
 So it looks like we have the right (i.e. faulty) block address.
 
-[Step 4]Try then to find the affected file ^[#footnote3 [3]]^:
+**[Step 4]** Try then to find the affected file ^[#footnote3 [3]]^:
 
 {{{
@@ -473,5 +477,5 @@
 If you do not find any unreadable files, then the block may be free or located in some metadata of the file system.
 
-[Step 5]Try your luck: bang the affected block with `badblocks -n` (non-destructive read-write mode, do unmount first), if you are very lucky the failure is transient and you can provoke reallocation ^[#footnote4 [4]]^:
+**[Step 5]** Try your luck: bang the affected block with `badblocks -n` (non-destructive read-write mode, do unmount first), if you are very lucky the failure is transient and you can provoke reallocation ^[#footnote4 [4]]^:
 
 {{{
@@ -483,5 +487,5 @@
 check success with `debugreiserfs -1 484335 /dev/hda3`. Otherwise:
 
-[Step 6]Perform this step only if Step 5 has failed to fix the problem: overwrite that block to force reallocation:
+**[Step 6]** Perform this step only if Step 5 has failed to fix the problem: overwrite that block to force reallocation:
 
 {{{
@@ -492,5 +496,5 @@
 }}}
 
-[Step 7]If you can't rule out the bad block being in metadata, do a file system check:
+**[Step 7]** If you can't rule out the bad block being in metadata, do a file system check:
 
 {{{
@@ -500,5 +504,5 @@
 This could take a long time so you probably better go for lunch ...
 
-[Step 8]Proceed as stated earlier. For example, sync disk and run a long selftest that should succeed now.
+**[Step 8]** Proceed as stated earlier. For example, sync disk and run a long selftest that should succeed now.
 
 == Repairs at the disk level ==
@@ -540,5 +544,5 @@
 The SCSI disk command set and associated disk architecture are assumed in this section. SCSI disks have their own logical to physical mapping allowing a damaged sector (usually carrying 512 bytes of data) to be remapped irrespective of the operating system, file system or software RAID being used.
 
-The terms ''block andsector'' are used interchangeably, although block tends to get used in higher level or more abstract contexts such as a ''logical block''.
+The terms ''block'' and ''sector'' are used interchangeably, although block tends to get used in higher level or more abstract contexts such as a ''logical block''.
 
 When a SCSI disk is formatted, defective sectors identified during the manufacturing process (the so called primary list: PLIST), those found during the format itself (the certification list: CLIST), those given explicitly to the format command (the DLIST) and optionally the previous grown list (GLIST) are not used in the logical block map. The number (and low level addresses) of the unmapped sectors can be found with the `READ DEFECT DATA SCSI` command.
@@ -561,5 +565,5 @@
 SCSI disks expect unrecoverable errors to be fixed manually using the `REASSIGN BLOCKS SCSI` command since loss of data is involved. It is possible that an operating system or a file system could issue the `REASSIGN BLOCKS` command itself but the authors are unaware of any examples. The `REASSIGN BLOCKS` command will reassign one or more blocks, attempting to (partially ?) recover the data (a forlorn hope at this stage), fetch an unused spare sector from the current zone while adding the damaged old sector to the GLIST (hence the name ''grown'' list). The contents of the GLIST may not be that interesting but `smartctl` prints out the number of entries in the grown list and if that number grows quickly, the disk may be approaching the end of its useful life.
 
-Here is an alternate brute force technique to consider: if the data on the SCSI or ATA disk has all been backed up (e.g. is held on the other disks in a RAID 5 enclosure), then simply reformatting the disk may be the least cumbersome approach.
+Here is an alternate brute force technique to consider: if the data on the SCSI or ATA disk has all been backed up (e.g. is held on the other disks in a RAID 5 enclosure), then simply reformatting the disk may be the least cumbersome approach. Make sure to disable "quick format" so the formatting actually write through the entire disk!
 
 ==== Example ====
@@ -1062,4 +1066,5 @@
 
 || Date || Author || Description ||
+||2023-05-21||Artoria2e5||Formatting: make the steps bold so they are easier to find & 1 comment||
 ||2021-06-12||ttsiodras||Added a note about LUKS-encrypted partitions hosting ext filesystems||
 ||2017-03-29||chrfranke||Add section Case Studies with a real-life example using ddrescue and sleuthkit on Windows||
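
Reviewer note on the ReiserFS example touched by this change: the arithmetic behind [Step 2] (mapping the failing LBA to a file system block) can be sketched in Python. This is only an illustrative sketch and not part of the wiki page. The function name `lba_to_fs_block` and its parameters are hypothetical; the 512-byte sector size, the 4 KB block size and the two addresses are the values quoted in [Step 0] and [Step 2] of the page.

```python
def lba_to_fs_block(lba, partition_start, sector_size=512, block_size=4096):
    """Map an absolute disk LBA to (file system block, sector index in that block).

    Hypothetical helper: assumes the block size is a whole multiple of the
    sector size and that the file system starts at the partition's first sector.
    """
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    if block_size % sector_size != 0:
        raise ValueError("block size must be a multiple of the sector size")
    sectors_per_block = block_size // sector_size
    # Offset of the bad sector inside the partition, split into whole blocks
    # and the leftover sector position within the block.
    return divmod(lba - partition_start, sectors_per_block)


# Values quoted in [Step 0]: failing LBA and start of /dev/hda3.
block, sector = lba_to_fs_block(58656333, 54781650)
print(block, sector)
```

With the addresses from [Step 0] this gives block 484335, the same number passed to `debugreiserfs -1` on the page. The second value is the zero-based position of the bad sector inside that 4 KB block.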