=== ReiserFS example ===

This section was written by Joachim Jautz with additions from Manfred Schwarb.

The following problems were reported during a scheduled test:

{{{
smartd[575]: Device: /dev/hda, starting scheduled Offline Immediate Test.
[... 1 hour later ...]
smartd[575]: Device: /dev/hda, 1 Currently unreadable (pending) sectors
smartd[575]: Device: /dev/hda, 1 Offline uncorrectable sectors
}}}

[Step 0] The SMART self-test/error log (see `smartctl -l selftest`) indicated a problem at block address `58656333`, i.e. the 512-byte sector with that address. The partition table (see, for example, `sfdisk -luS /dev/hda` or `fdisk -ul /dev/hda`) showed that this block was in the `/dev/hda3` partition, which contained a `ReiserFS` file system. That partition started at block address `54781650`.

During the initial analysis it may also be useful to save a copy of the disk attributes returned by `smartctl -A /dev/hda`, specifically the values of the `Reallocated_Sector_Ct` and `Reallocated_Event_Count` attributes for ATA disks, or the length of the grown defect list (`GLIST`) for SCSI disks. If these values have increased by the end of the procedure, the disk has reallocated one or more sectors.
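
One way to compare the attributes before and after the procedure is to pull out just the raw value of the attribute of interest. The following is a minimal sketch, not part of the original procedure: the helper name `attr_raw`, the file names in the comments and the sample line are illustrative assumptions, and it relies on the raw value being the tenth column of the ATA attribute table printed by `smartctl -A`.

```shell
#!/bin/sh
# Print the RAW_VALUE column (10th field) of one named attribute from
# `smartctl -A` output supplied on standard input.
# attr_raw is a hypothetical helper name, not a smartmontools command.
attr_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Possible use on a live system (file names are placeholders):
#   smartctl -A /dev/hda > attrs.before
#   ... carry out the repair steps ...
#   smartctl -A /dev/hda > attrs.after
#   attr_raw Reallocated_Sector_Ct < attrs.before
#   attr_raw Reallocated_Sector_Ct < attrs.after

# Illustrative sample line in the usual ATA attribute table layout.
# The numbers are made up and do not come from the disk in this example.
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1'
printf '%s\n' "$sample" | attr_raw Reallocated_Sector_Ct
```

If the value printed for the "after" file is larger than the one for the "before" file, the disk has reallocated at least one sector in between.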

[Step 1] Get the file system's block size:

{{{
# debugreiserfs /dev/hda3 | grep '^Blocksize'
Blocksize: 4096
}}}

[Step 2] Calculate the block number:

{{{
# echo "(58656333-54781650)*512/4096" | bc -l
484335.37500000000000000000
}}}

The integer part, `484335`, is the number of the damaged 4 KB block within `/dev/hda3`. The fractional part only gives the position of the bad sector inside that block: `.375` is 3/8, i.e. the fourth of its eight 512-byte sectors. It is reassuring that this block number is less than the `Count of blocks on the device` value reported by `debugreiserfs /dev/hda3`.
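
The same calculation can be done with integer shell arithmetic, which avoids the fractional result. This is a minimal sketch under the assumptions of this example (512-byte sectors); the function names `lba_to_fs_block` and `sector_in_block` are made up for illustration.

```shell
#!/bin/sh
# Map a disk LBA to a file system block number inside a partition.
# Arguments: failing LBA, partition start LBA, file system block size in bytes.
# Assumes 512-byte sectors, as in the example above.
lba_to_fs_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

# Zero-based index of the failing 512-byte sector inside that block.
sector_in_block() {
    echo $(( ($1 - $2) % ($3 / 512) ))
}

lba_to_fs_block 58656333 54781650 4096   # prints 484335
sector_in_block 58656333 54781650 4096   # prints 3
```

The second function reports which sector inside the block is affected, which is the information carried by the fractional part of the `bc` result.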

[Step 3] Try to get more information about this block. Reading the block fails, as expected, but the output at least shows that the block appears to be unused. If the `Cannot read the block` error does not appear, check whether the calculation in [Step 2] was correct.

{{{
# debugreiserfs -1 484335 /dev/hda3
debugreiserfs 3.6.19 (2003 http://www.namesys.com)
484335 is free in ondisk bitmap
The problem has occurred looks like a hardware problem. If you have
bad blocks, we advise you to get a new hard drive, because once you
get one bad block that the disk drive internals cannot hide from
your sight, the chances of getting more are generally said to become
much higher (precise statistics are unknown to us), and this disk
drive is probably not expensive enough for you to risk your time and
data on it. If you don't want to follow that advice then if you have
just a few bad blocks, try writing to the bad blocks and see if the
drive remaps the bad blocks (that means it takes a block it has in
reserve and allocates it for use for of that block number). If it
cannot remap the block, use badblock option (-B) with reiserfs utils
to handle this block correctly.
bread: Cannot read the block (484335): (Input/output error).
Aborted
}}}

So it looks like we have the right (i.e. faulty) block address.

[Step 4] Then try to find the affected file [#footnote3 [3]]:

{{{
# tar -cO /mydir | cat >/dev/null
}}}

If you do not find any unreadable files, the block may be free or may be located in file system metadata.
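
As an alternative to reading everything through `tar`, each file can be read individually so that the names of failing files are printed directly. This is only a sketch: `list_unreadable` is a hypothetical helper name, and the approach assumes that a read error on the bad block makes `cat` exit with a non-zero status.

```shell
#!/bin/sh
# Print every regular file under the given directory that cannot be read
# in full. Each file is read with cat, so its data is actually requested
# rather than skipped. -xdev keeps find on the starting file system.
# list_unreadable is a hypothetical helper name.
list_unreadable() {
    find "$1" -xdev -type f -exec sh -c '
        for f in "$@"; do
            cat "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
        done' sh {} +
}

# Example (not run here): list_unreadable /mydir
```

An empty result has the same meaning as a clean `tar` run: no file under that directory uses the bad block.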

[Step 5] Try your luck: hit the affected block with `badblocks -n` (non-destructive read-write mode; unmount the file system first). If you are very lucky, the failure is transient and you can provoke reallocation [#footnote4 [4]] [#footnote5 [5]]:

{{{
# badblocks -b 4096 -p 3 -s -v -n /dev/hda3 `expr 484335 + 100` `expr 484335 - 100`
}}}

Check for success with `debugreiserfs -1 484335 /dev/hda3`. If the block is still unreadable, continue with Step 6.
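
The two block arguments in Step 5 are easy to swap, because `badblocks` expects the last block before the first block. The sketch below computes both values for a given block and margin; `badblocks_range` is an illustrative name, and the clamp at block 0 is an added safeguard for blocks near the start of a partition that this example does not need.

```shell
#!/bin/sh
# Print "last first" block numbers for a badblocks run centred on a block.
# badblocks takes the last block before the first block on its command line.
# Arguments: block number, margin (blocks to test on either side).
# badblocks_range is a hypothetical helper name.
badblocks_range() {
    first=$(( $1 - $2 ))
    if [ "$first" -lt 0 ]; then
        first=0          # do not go below the start of the partition
    fi
    echo "$(( $1 + $2 )) $first"
}

badblocks_range 484335 100   # prints: 484435 484235
```

The output can be substituted unquoted so that it splits into two arguments, for example `badblocks -b 4096 -p 3 -s -v -n /dev/hda3 $(badblocks_range 484335 100)`.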

[Step 6] Perform this step only if Step 5 has failed to fix the problem: overwrite that block to force reallocation:

{{{
# dd if=/dev/zero of=/dev/hda3 count=1 bs=4096 seek=484335
1+0 records in
1+0 records out
4096 bytes transferred in 0.007770 seconds (527153 bytes/sec)
}}}
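
Because this `dd` command destroys data if the device, block size or block number is wrong, it can help to print the command and the byte offset it will write to before running it. The following sketch only prints and never writes; `zero_block_preview` is a made-up name, and `oflag=direct` is included on the assumption that the installed `dd` supports it (see the `dd` footnote).

```shell
#!/bin/sh
# Print (do not run) the dd command that zeroes one file system block,
# together with the byte offset inside the partition that it will write to.
# Arguments: partition device, block size in bytes, block number.
# zero_block_preview is a hypothetical helper name.
zero_block_preview() {
    echo "byte offset in $1: $(( $2 * $3 ))"
    echo "dd if=/dev/zero of=$1 bs=$2 count=1 seek=$3 oflag=direct"
}

zero_block_preview /dev/hda3 4096 484335
```

The printed offset can be checked against the failing sector: it should equal (failing LBA minus partition start LBA) times 512, rounded down to a multiple of the block size.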

[Step 7] If you cannot rule out that the bad block is in metadata, do a file system check:

{{{
# reiserfsck --check /dev/hda3
}}}

This could take a long time, so you may want to go for lunch ...

[Step 8] Proceed as stated earlier. For example, sync the disk and run a long self-test, which should now succeed.

[=#footnote2 [2]] Starting with GNU coreutils release 5.3.0, the `dd` command in Linux includes the options `iflag=direct` and `oflag=direct`. Using these with the `dd` commands should be helpful, because they should avoid any interaction with the block buffering I/O layer in Linux and permit direct reads and writes from the raw device. Use `dd --help` to see if your version of `dd` supports these options. If not, the latest code for `dd` can be found at https://www.gnu.org/software/coreutils/.

[=#footnote3 [3]] Do not use `tar -c -f /dev/null` or `tar -cO /mydir >/dev/null`. GNU tar does not actually read the files if `/dev/null` is used as the archive path or as standard output; see `info tar`.

[=#footnote4 [4]] Important: the size of the block range is arbitrary, but do not test only a single block, as bad blocks are often social (they tend to come in groups). Do not make the range too large either, as this test probably does not carry zero risk.

[=#footnote5 [5]] The rather awkward `expr 484335 + 100` (note the back quotes) can be replaced with `$((484335+100))` if `bash` or another shell with POSIX arithmetic expansion is being used. Similarly, the last argument can become `$((484335-100))`.