| 197 | |
| 198 | ---- |
| 199 | === What is error recovery control (ERC), and why is it important to enable it for (S)ATA disks in RAID? === |
| 200 | |
| 201 | |
| 202 | In computing, error recovery control (ERC) (Western Digital: time-limited error recovery (TLER), Samsung/Hitachi: command completion time limit (CCTL)) is a feature of hard disks that allows a system administrator to configure how long a drive's firmware may spend recovering from a read or write error. Limiting the recovery time improves error handling in hardware or software RAID environments. Without such a limit, the drive and the RAID implementation can both try to handle the same error: the drive keeps retrying internally while the RAID layer is waiting for a response, which leads to drives being marked as unusable and to significant performance degradation that could otherwise have been avoided. |
| 203 | |
| 204 | ERC should be enabled on every disk in a RAID array so that the drive's recovery time after a read or write error cannot exceed the RAID implementation's timeout threshold. If a drive times out, it is dropped from the array and has to be re-added manually, which requires a rebuild and resynchronization of that disk. |
| 205 | |
| 206 | On disks that fully implement the ATA-8 standard, the smartctl utility can read and change the ERC behavior through the SCT Error Recovery Control (scterc) parameter: |
| 207 | |
| 208 | - Reading current settings: |
| 209 | {{{ |
| 211 | smartctl -l scterc /dev/sda |
| 212 | SCT Error Recovery Control: |
| 213 | Read: Disabled |
| 214 | Write: Disabled |
| 215 | }}} |
| 216 | - Changing the setting (values are given in tenths of a second, so 150 means 15.0 seconds): |
| 217 | {{{ |
| 218 | smartctl -l scterc,150,150 /dev/sda |
| 219 | SCT Error Recovery Control: |
| 220 | Read: 150 (15.0 seconds) |
| 221 | Write: 150 (15.0 seconds) |
| 222 | }}} |
| 223 | |
| 224 | The ERC setting has to be applied at boot time and again after a disk has been hot-replaced, because drives generally do not keep it across a power cycle. |
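Applying the setting at boot could be scripted roughly as follows. This is only a sketch, not a tested boot script: the function name `set_erc` and the variables `SMARTCTL` and `ERC_DECISECONDS` are made up for illustration, and the device names in the example call are placeholders. It also assumes that smartctl returns a non-zero exit status when the setting cannot be applied. That may not hold for every smartctl version or drive, so it is worth checking the result afterwards with `smartctl -l scterc`.

```shell
#!/bin/sh
# Sketch: apply the same ERC limit for reads and writes to each listed disk.
# SMARTCTL and ERC_DECISECONDS are illustrative names, not part of smartctl.
SMARTCTL="${SMARTCTL:-smartctl}"
ERC_DECISECONDS="${ERC_DECISECONDS:-150}"   # 150 = 15.0 seconds, as in the example above

set_erc() {
    # Returns the number of disks on which the command reported a failure.
    failed=0
    for disk in "$@"; do
        # Assumption: a non-zero exit status means the setting was not applied.
        if "$SMARTCTL" -l "scterc,${ERC_DECISECONDS},${ERC_DECISECONDS}" "$disk" >/dev/null 2>&1; then
            echo "ERC set on $disk"
        else
            echo "ERC not set on $disk" >&2
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}

# Example call; replace the placeholders with the members of the array.
# set_erc /dev/sda /dev/sdb
```

Running the same function from whatever procedure is used to hot-replace a disk would cover the second case mentioned above.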