Smartmontools Frequently Asked Questions (FAQ)
Table of Contents
- Attributes
  - I see some strange output from smartctl. What does it mean?
  - Why is my disk temperature reported by smartd as 150 Celsius?
  - Attribute 194 (Temperature Celsius) behaves strangely on my Seagate disk
  - smartctl reports the age as thousands of hours for my Maxtor/Hitachi/Fujitsu disk, yet it is only a few days old
  - The power-on timer (Attribute 9 raw value) on my Maxtor disk acts strange.
  - The time stamps in the self-test log don't correspond to the power-on time, when test was run on my Western Digital (WD) disk
  - The (normalized) WORST Attribute values of my Western Digital (WD) disk are larger than the (normalized) CURRENT Attribute values
  - What Attributes does smartmontools not yet recognize?
- Configuration
- Protocols, Devices and Controllers
- Smartmontools Database
- Selftests
- Operating System
- Firmware Issues
- Distribution
Attributes
I see some strange output from smartctl. What does it mean?
The raw SMART attributes (temperature, power-on lifetime, and so on) are stored in vendor-specific structures. Sometimes these are strange. Hitachi disks (at least some of them) store power-on lifetime in minutes, rather than hours (see the Attribute 9 questions below). IBM disks (at least some of them) have three temperatures stored in the raw structure, not just one. And so on.
If you find strange output, or unknown attributes, have a look at our wiki pages, where we collect vendor-specific info.
If you don't find an answer to your question there, please send an email to smartmontools-support and we'll help you try and figure it out.
Why is my disk temperature reported by smartd as 150 Celsius?
It's not. Please read the end of the smartd man page (NOTES).
For example, in the message:
Device: /dev/hda, SMART Attribute: 194 Temperature_Celsius changed from 94 to 93
the value given is the Normalized, not the Raw, Attribute value (the disk temperature in this case is about 22 Celsius). The '-R' and '-r' Directives modify this behavior, so that the information is printed with the Raw values as well, for example:
Device: /dev/hda, SMART Attribute: 194 Temperature_Celsius changed from 94 [Raw 22] to 93 [Raw 23]
Here the Raw values are the actual disk temperatures in Celsius. The way in which the Raw values are printed, and the names under which the Attributes are reported, is governed by the various '-v Num,Description' Directives described in the smartd man page. Please see the smartctl manual page for further explanation of the differences between Normalized and Raw Attribute values.
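When post-processing syslog, the two message forms quoted above can be told apart mechanically. The following is only a sketch: `parse_attribute_change` and its regular expression are hypothetical helpers written for this FAQ, not part of smartmontools, and they assume the message layout is exactly as in the two examples above.

```python
import re

# Hypothetical pattern, modelled only on the two example messages in this FAQ.
PATTERN = re.compile(
    r"SMART Attribute: (?P<id>\d+) (?P<name>\S+) changed from "
    r"(?P<old>\d+)(?: \[Raw (?P<old_raw>\d+)\])? to "
    r"(?P<new>\d+)(?: \[Raw (?P<new_raw>\d+)\])?"
)


def parse_attribute_change(line):
    """Return the fields of a smartd attribute-change message, or None.

    'old' and 'new' are Normalized values; 'old_raw' and 'new_raw' are the
    Raw values and are None when smartd did not print them.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = {}
    for key, value in match.groupdict().items():
        # Convert numeric fields to int; keep the attribute name as text.
        if value is not None and value.isdigit():
            fields[key] = int(value)
        else:
            fields[key] = value
    return fields


line = ("Device: /dev/hda, SMART Attribute: 194 Temperature_Celsius "
        "changed from 94 [Raw 22] to 93 [Raw 23]")
print(parse_attribute_change(line))
```

For Attribute 194 in this example, the temperature is in the `old_raw` and `new_raw` fields, not in `old` and `new`.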
Attribute 194 (Temperature Celsius) behaves strangely on my Seagate disk
Some Seagate disks store the current temperature in Celsius in both the RAW and NORMALIZED Attribute 194 values, and the maximum lifetime temperature in Celsius in the WORST value. Since cooler is better, this means that in this case lower NORMALIZED Attribute values are farther from failure, and that over time the WORST Attribute values get larger, not smaller (as with other Attributes).
smartctl reports the age as thousands of hours for my Maxtor/Hitachi/Fujitsu disk, yet it is only a few days old
On recent disks, Maxtor has started to use Attribute 9 to store the power-on disk lifetime in minutes rather than hours. In this case, use the '-v 9,minutes' option to correctly display hours and minutes.
Some models of Fujitsu disks use Attribute 9 to store the power-on disk lifetime in seconds. In that case, use the '-v 9,seconds' option to correctly display hours, minutes and seconds.
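To see why a young disk can appear to be thousands of hours old, here is a small sketch of the unit conversion. The function `power_on_time` is a hypothetical illustration, not smartmontools code, and the raw value 4320 is made up for the example.

```python
def power_on_time(raw, unit="hours"):
    """Convert an Attribute 9 raw value to (hours, minutes, seconds).

    'unit' is the unit the disk actually counts in: "hours", "minutes"
    or "seconds".
    """
    seconds_per_unit = {"hours": 3600, "minutes": 60, "seconds": 1}
    total_seconds = raw * seconds_per_unit[unit]
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return hours, minutes, seconds


# A disk that counts in minutes and has been powered on for three days:
print(power_on_time(4320, "minutes"))  # (72, 0, 0)
# The same raw value misread as hours looks like half a year of uptime:
print(power_on_time(4320, "hours"))    # (4320, 0, 0)
```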
The power-on timer (Attribute 9 raw value) on my Maxtor disk acts strange.
There are three related problems with Maxtor's SMART firmware:
- On some Maxtor disks, the raw value of Attribute 9 (Power On Time) is supposed to be minutes. But it advances at an unpredictable rate, always more slowly than one count per minute. This is because when the disk is in idle mode, the counter stops advancing. This is only supposed to happen in standby mode. This will be corrected in Maxtor product lines released after October 2004.
- In Maxtor disks that use the raw value of Attribute 9 as a minutes counter, only two bytes (of the six available) are used to store the raw value. So it resets to zero once every 65536 = 2^16 minutes, or about once every 1092 hours. This is fixed in all Maxtor disks manufactured after July 2003, where the raw value was extended to four bytes.
- In Maxtor disks that use the raw value of Attribute 9 as a minutes counter, the hour time-stamps in the self-test and ATA error logs are calculated by right shifting 6 bits. This is equivalent to dividing by 64 rather than by 60. As a result, the hour time stamps in these logs advance 7% more slowly than they should. Thus, if you do self-tests once per week at the same time, instead of the time-stamps being 168 hours apart, they are 157 hours apart. This is also fixed in all Maxtor disks manufactured after July 2003.
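The arithmetic behind the last two items can be checked with a short sketch. The function names are hypothetical and only model the behaviour described above; this is not firmware or smartmontools code.

```python
def log_hours_shift(minutes):
    """Hour stamp as described above: right shift by 6 bits (divide by 64)."""
    return minutes >> 6


def log_hours_correct(minutes):
    """Hour stamp as it should be: divide by 60."""
    return minutes // 60


def wrapped_minutes(minutes, counter_bytes=2):
    """A minutes counter held in 'counter_bytes' bytes wraps modulo 2**(8*bytes)."""
    return minutes % (1 << (8 * counter_bytes))


one_week = 168 * 60                  # 10080 minutes between weekly self-tests
print(log_hours_correct(one_week))   # 168
print(log_hours_shift(one_week))     # 157
print(wrapped_minutes(65535))        # 65535 (last value before the reset)
print(wrapped_minutes(65536))        # 0 (two-byte counter has wrapped)
print(65536 / 60)                    # about 1092.27 hours between resets
```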
The time stamps in the self-test log don't correspond to the power-on time, when test was run on my Western Digital (WD) disk
The self-test log timestamps in many WD disks roll back to zero every 1092 hours (65536 minutes). This problem is due to a WD firmware bug. The power-on lifetime in hours is correctly stored in Attribute 9. However, when the power-on lifetime is calculated for self-test log entries, the lifetime in minutes is put into a 16-bit register and then divided by 60. The 16-bit register overflows and wraps around every 1092 hours.
For WD drives that exhibit this firmware bug, the relationship between Attribute 9's raw value (H) and the time-stamps in the self-test log (h) is given by:
Let H = power on hours as shown by Attribute 9 (correct)
Let M = 60*H (power on minutes, correct)
Let m = M mod 65536 (incorrect value of power on minutes)
Let h = m/60 (incorrect value of power on hours, shown in self-test log)
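The relationship between H and h can be tried out with a minimal sketch. The function name `wd_selftest_hours` is hypothetical, and the use of integer division for m/60 is an assumption (the log stores whole hours).

```python
def wd_selftest_hours(H):
    """Self-test log hour stamp h for a true power-on time of H hours."""
    M = 60 * H        # power-on minutes, correct
    m = M % 65536     # value left after the 16-bit register wraps
    return m // 60    # assumed integer division: the log shows whole hours


for H in (1000, 1092, 1093, 2000):
    print(H, wd_selftest_hours(H))
# 1000 1000
# 1092 1092
# 1093 0
# 2000 907
```

So a self-test run at 2000 power-on hours would be logged at 907 hours on an affected drive.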
The (normalized) WORST Attribute values of my Western Digital (WD) disk are larger than the (normalized) CURRENT Attribute values
Western Digital firmware initializes SMART Attributes 10, 11, and 199 after either 120 spin-ups or 8 power-on hours. Until that time, they have the uninitialized value 253.
What Attributes does smartmontools not yet recognize?
From Maxtor disks: Attributes 99, 100, and 101. These are not used by Maxtor in SMART revision 5. They will be used in SMART revision 6, but the engineering group has not yet decided what to monitor with these Attributes.
Configuration
My Fedora Core Linux system displays the startup message: 'smartd [FAILED]'
Fedora Core is distributed with a smartd configuration file /etc/smartd.conf that monitors the first IDE disk /dev/hda. If this device does not exist (or lacks SMART capability) you will get the error message above. Look in SYSLOG (/var/log/messages) for additional details about what is going wrong.
The solution: If your system has only SCSI disks, or has IDE disk(s) on a non-primary controller, just edit /etc/smartd.conf to reflect the correct location of the drive(s). Please also read the smartd.conf man page for additional information.
Protocols, Devices and Controllers
Can I monitor disks behind RAID controllers?
Support for disks behind RAID controllers is highly dependent on both platform and controller type. See our page about smartmontools RAID controller support for the details.
Smartmontools for FireWire, USB, and SATA disks/systems
As for USB and FireWire (IEEE 1394) disks and tape drives, the news is not good. They appear to the operating system as SCSI devices, but their implementations usually do not support the SCSI commands needed by smartmontools. A consortium associated with IEEE 1394 certified some external enclosures (containing an ATA disk and a protocol bridge) as being compliant with the relevant standards. Even so, that compliance means that they tend to support only the bare minimum of commands needed for device operation (i.e. SMART support is an unsupported extra). Hopefully external USB and FireWire devices will support SAT in the future; see below. Some USB devices based on Cypress chips support a proprietary protocol (ATACB) that allows raw ATA commands, and hence SMART commands, to be sent.
Smartmontools should work correctly with SATA drives under both Linux 2.4 and 2.6 kernels. Depending on which subsystem the SATA controller is in (i.e. drivers/ide, drivers/ata or libata (under drivers/scsi)), a SATA drive will appear as /dev/hd* or /dev/sd*. Either way, smartmontools should be able to figure out what is going on and act accordingly. In some cases smartmontools may need a hint in the form of a '-d sat' or '-d ata' option on the smartctl command line or in the /etc/smartd.conf file. A hint to add one of those options may appear in the log file when smartd is run as a daemon, or in the command-line output of smartctl. The '-d ata' option means that even though the drive has a SCSI device name, it should be treated as an ATA disk. Unfortunately such an approach doesn't often work. The next paragraph has more information about '-d sat'.
The SCSI to ATA Translation (SAT) standard (ANSI INCITS 431-2007) may solve many problems in this area. It defines how SCSI commands will be translated to the corresponding ATA commands and defines a pass-through mechanism. ATA commands are conveyed natively by two transports: parallel and serial ATA. SCSI commands can be conveyed by many transports: the veteran SCSI Parallel Interface (SPI), Fibre Channel (FC), Infiniband (SRP), Serial Attached SCSI (SAS), IP (iSCSI and iSER), USB (mass storage), and IEEE 1394 (SBP), to name some. Due to their cost and storage capacity, more and more ATA disks (especially SATA disks) are appearing "behind" a SCSI transport. This is especially true of the SAS transport, which can painlessly accommodate both SAS and SATA disks. Enter another acronym: SATL, which stands for SCSI to ATA Translation Layer. In Linux, libata has a SATL in it. Some SAS host bus adapters have a SATL in their firmware. FC might have a SATL in a switch. Perhaps in the future USB and IEEE 1394 enclosures will have a SATL in them. Starting from smartmontools versions 5.36 and 5.37, no matter where a SATL is, irrespective of the operating system in use, the user should have fewer problems with ATA disks, no matter which transport is involved. As always, it helps to know a little of what is happening under the covers. The '-d sat' option instructs smartctl and smartd to assume a SATL is in place and act accordingly. The smartctl command can often detect a SATL and autoconfigure, while in smartmontools version 5.37 smartd often needs a hint.
The current USB mass storage specification is based on a version of SCSI (SPC-2) that can't support SAT. But some chip manufacturers implement proprietary SCSI commands that allow ATA pass-through (similar to SAT). A well-known example is the Cypress chipset, which provides a proprietary ATACB pass-through (for ATA commands passed through SCSI commands) for which some information is publicly available. The SVN version of smartmontools supports these Cypress chips via the '-d usbcypress' option on the smartctl command line. There is no autodetection at the moment. If you want to know whether your device supports it, check your device's USB ID (most Cypress USB-ATA bridges have vid=0x04b4, pid=0x6830) or try calling smartctl with the '-d usbcypress' option. If the USB device doesn't support ATACB, smartmontools will abort.
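The USB ID check mentioned above can be scripted. This is a sketch only: it assumes the usual `lsusb` line layout containing `ID vvvv:pppp`, the helper names are hypothetical, and the sample lines are invented for illustration. A matching ID only suggests ATACB support, since the text says "most" (not all) Cypress bridges use it.

```python
import re

# vid/pid quoted in this FAQ for most Cypress USB-ATA bridges.
CYPRESS_ATACB_ID = (0x04B4, 0x6830)


def usb_ids(lsusb_line):
    """Extract (vendor_id, product_id) from one line of lsusb-style output."""
    match = re.search(r"\bID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b", lsusb_line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def may_support_usbcypress(lsusb_line):
    """True if the device ID matches the Cypress bridge ID given above."""
    return usb_ids(lsusb_line) == CYPRESS_ATACB_ID


# Invented sample lines in lsusb style:
print(may_support_usbcypress("Bus 001 Device 004: ID 04b4:6830 Example bridge"))  # True
print(may_support_usbcypress("Bus 001 Device 002: ID 1234:5678 Example hub"))     # False
```

A device that passes this check is a candidate for trying the '-d usbcypress' option.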
Smartmontools for SCSI disks and tapes (TapeAlert)
Smartmontools for SCSI disks and tapes (including medium changers) is discussed on a separate page.
Smartmontools Database
My ATA drive is not in the smartctl/smartd database
Does this break anything? How do I get it added?
If your drive is not in the database, then the names of the Attributes (displayed in the ATTRIBUTE_NAME column of smartctl -A /dev/hd?) and the format of the raw Attribute values shown in the RAW_VALUE column may be incorrect. This is mostly cosmetic: the essential drive health monitoring/testing functionality of smartmontools does not depend upon the database.
If your drive is not in the database, please check the sourceforge project page to be sure that you are using the latest smartmontools release. Each new release has additional drives added to the database. Please do not submit a new drive for the database without checking to see if it is already in the database of the current smartmontools release version.
If your drive is not in the database of the current release, to have it added to the database, first use the command:
smartctl -t short /dev/hd?
to run a short self-test on the drive, and wait a few minutes for the test to complete. Then email the entire output from:
smartctl -a /dev/hd?
to smartmontools-database as a plain-text ASCII email attachment (file type: ".txt"). The timestamp in the self-test log will help us to determine whether Attribute 9 is being used to store the lifetime in hours, minutes, or seconds.
If you need to use any of the vendor-specific display options (-v options) with the drive, or if any of the Attributes are behaving strangely, please include that information as well.
Selftests
ATA drive is failing self-tests, but SMART health status is 'PASSED'. What's going on?
If your ATA drive supports self-tests, you should run them on a regular basis, for example once per week:
smartctl -t long /dev/hd?
After the test has completed, you should examine the results with:
smartctl -l selftest /dev/hd?
If the drive fails a self-test, but still has 'PASSED' SMART health status, this usually means that there is a corrupted (uncorrectable=UNC) sector on the disk. This means that the ECC data stored at that sector is not consistent with the user data stored at that sector, and an attempt to read the sector fails with a UNC error. This can be a one-time transient effect: a sudden power failure while the disk was writing to the sector corrupted the ECC code or data, but the sector could correctly store new data. Or it can be a permanent effect: the magnetic media has been damaged by a bit of dust, and the sector could not correctly store new data.
If the disk can read the sector of data a single time, and the damage is permanent, not transient, then the disk firmware will mark the sector as 'bad' and allocate a spare sector to replace it. But if the disk can't read the sector even once, then it won't reallocate the sector, in hopes of being able, at some time in the future, to read the data from it. A write to an unreadable (corrupted) sector will fix the problem. If the damage is transient, then new consistent data will be written to the sector. If the damage is permanent, then the write will force sector reallocation. Please see Bad block HOWTO for instructions about how to force this sector to reallocate (Linux only).
The disk still has passing health status because the firmware has not found other signs of trouble, such as a failing servo.
Such disks can often be repaired by using the disk manufacturer's 'disk evaluation and repair' utility. Beware: this may force reallocation of the lost sector and thus corrupt or destroy any file system on the disk. See Bad block HOWTO for generic Linux instructions.
smartd is warning that my ATA disk has unreadable or uncorrectable or pending sectors. What's going on?
Disk drives store data in blocks (sectors) of 512 bytes. Each 512 bytes has additional bytes appended to it (usually 40 to 60) which are used internally by the disk firmware for error checking/detection and correction. These are called ECC bytes.
Sometimes the data in a sector gets corrupted. This can happen because a speck of dust scratched the disk, or because the disk was powered down while writing data to that sector, or for other reasons. Usually the ECC bytes can be used to correct the corrupted data. However if the ECC bytes are inconsistent or can't be used to correct the bad data, then the 512 bytes of data are lost. Such a sector is called unreadable or uncorrectable.
If your disk has an unreadable sector, this means that some of your data can't be retrieved. You can force the disk to replace the unreadable sector with a spare good sector, but only at the price of losing the 512 bytes of data forever.
Disks with uncorrectable sectors can often be repaired by using the disk manufacturer's 'disk evaluation and repair' utility (see previous FAQ entry). Beware: this may force reallocation of the lost sector and thus corrupt or destroy any file system on the disk. See Bad block HOWTO for generic Linux instructions.
Normally when an uncorrectable sector is found, the disk puts this onto a 'pending sector list' to indicate that it should be replaced with a spare good sector. However this replacement won't take place until either the disk can read the data on the bad sector, or is commanded to write new data to that bad sector.
Where can I find manufacturer-specific disk-testing utilities?
A good listing of such utilities can be found here. Unfortunately most of these are for MS operating systems, but most can be run from a MS-DOS boot disk.
The UBCD (Ultimate Boot CD) includes most of these disk-testing utilities and many other useful diagnostic tools ready to boot from CD or USB memory stick. The UBCD can be customized by adding other images, like one containing smartmontools.
Note: if you do run one of these utilities, and it identifies the meanings of any SMART Attributes that are not known to smartmontools, please report them to the smartmontools-support mailing list or add the info to our info pages on vendor specific SMART Attributes.
These utilities have an important role to fill. If your disk has bad sectors (for example, as revealed by running self-tests with smartmontools) and the disk is not able to recover the data from those sectors, then the disk will not automatically reallocate those damaged sectors from its set of spare sectors, because forcing the reallocation to take place may entail some loss of data. Because the commands that force such reallocation are Vendor Specific, most manufacturers provide a utility for this purpose. It may cause data loss but can repair damaged sectors (at least, until it runs out of replacement sectors).
Operating System
What are the operating system requirements?
Please see the first section of the INSTALL file.
BIOS has a SMART enable/disable setting. What does it do, and how should I set it?
Some BIOSes can check the SMART health status of a disk at bootup: the equivalent of 'smartctl -H /dev/hd?'. This one-time check on bootup is done if the BIOS SMART setting is set to ENABLE, and is not done if the setting is set to DISABLE.
If this one-time check is done, and the disk's health status is found to be FAILED, then typically the BIOS will display an error message and refuse to boot the machine.
For the proper functioning of smartmontools, either BIOS setting may be used.
On Windows smartctl prints the message: "...Log Read failed: Function not implemented"
What is going wrong?
This means that the device driver does not support the command SMART READ LOG. The message does not indicate a hard disk problem. Nor does it mean that the disk itself does not support SMART logs. It may still be possible to read the logs with a Linux version of smartmontools run from some bootable CD.
To access ATA SMART functionality on Windows, smartmontools uses the I/O control calls SMART_RCV_DRIVE_DATA and SMART_SEND_DRIVE_CMD. These calls were available since Win95 OSR2. An example program from Microsoft can be found here (the related KB article 208048 is no longer available).
Starting with NT4, these calls do more restrictive parameter checks. In particular, the command codes for SMART READ LOG and ABORT SELF-TEST are not accepted. To perform these functions, smartmontools uses the undocumented functions SCSIOP_ATA_PASSTHROUGH (NT4) or IOCTL_IDE_PASS_THROUGH (2000/XP) instead. An example program using these calls can be found here, a related newsgroup thread is here.
Unfortunately, these undocumented functions are not implemented in most vendor-specific ATA device drivers. smartctl prints a "Function not implemented" message in this case.
A new I/O control call IOCTL_ATA_PASS_THROUGH is available since Win2003 and XP SP2. It should be supported by most new drivers. Experimental code using this call was added 2006-04-27 and is included in smartmontools release 5.37.
I found in syslog: 'Can't locate module block-major-65'
When I run smartd, the SYSLOG /var/log/messages contains messages like this:
smartd: Reading Device /dev/sdv
modprobe: modprobe: Can't locate module block-major-65
This is because when smartd starts, if there is no configuration file, it looks for all ATA and SCSI devices to monitor (matching the pattern /dev/hd[a-t] or /dev/sd[a-z]). The log messages appear because your system doesn't have most of these devices.
The solution is simple: use the smartd configuration file /etc/smartd.conf to specify which devices to monitor.
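The following sketch lists the device names covered by the two patterns above and shows where "block-major-65" comes from. It assumes the standard Linux block device numbering, in which the first 16 SCSI disks (sda to sdp) use major number 8 and the next 16 use major number 65. The function names are hypothetical.

```python
import string


def default_scan_candidates():
    """Device names matching /dev/hd[a-t] and /dev/sd[a-z]."""
    ata = ["/dev/hd" + c for c in string.ascii_lowercase[:20]]  # a..t
    scsi = ["/dev/sd" + c for c in string.ascii_lowercase]      # a..z
    return ata + scsi


def sd_block_major(device):
    """Block major number for /dev/sd[a-z], assuming Linux numbering."""
    index = ord(device[-1]) - ord("a")   # sda -> 0, sdv -> 21
    group = index // 16                  # 16 disks per major number
    return 8 if group == 0 else 64 + group


print(len(default_scan_candidates()))   # 46
print(sd_block_major("/dev/sda"))       # 8
print(sd_block_major("/dev/sdv"))       # 65
```

On a machine with only a handful of disks, probing /dev/sdv therefore asks the kernel for a block driver under major number 65, which is what the modprobe message reports.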
Firmware Issues
What's the story on IBM SMART disks?
Apparently some of the older SMART firmware on IBM disks can interfere with the regular operation of the disk. If you have this problem, here are some links to an IBM Firmware Upgrade that fixes the problem:
Geocities Site
IBM Site #1
IBM Site #2
Distribution
Is there a bootable standalone CD or floppy that contains smartmontools?
Yes, there are. See the section "Run from Live-system" on the download page.
Does it work on Windows?
Yes, finally it does. A Windows port of smartctl 5.26 by Christian Franke was first checked in on 2004/02/23 on CVS branch RELEASE_5_26_WIN32_BRANCH and was later merged into the CVS trunk.
The Cygwin environment can be used to build both Cygwin and Windows (using MinGW) versions of smartctl and smartd.
Installation instructions for binary distributions can be found here for Cygwin and here for Windows.
Why did the release version scheme change?
It was non-standard. So with the move to GNU Autoconf and GNU Automake it changed from 5.X-Y (where X and Y are one or more digits) to 5.Y. Starting with the first release, and moving forward in time, the releases are numbered as follows:
5.0-1, 5.0-2, ..., 5.0-45, 5.1-1, ..., 5.1-18, 5.19, 5.20, ...
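Because the two schemes do not sort correctly as plain strings or as decimal numbers, a script that compares release names needs a custom key. The sketch below is one way to do it; `release_key` is a hypothetical helper that handles only the two forms listed above.

```python
import re


def release_key(version):
    """Sort key that orders old-style 5.X-Y releases before new-style 5.Y."""
    old = re.fullmatch(r"5\.(\d+)-(\d+)", version)
    if old:
        return (0, int(old.group(1)), int(old.group(2)))
    new = re.fullmatch(r"5\.(\d+)", version)
    if new:
        return (1, int(new.group(1)), 0)
    raise ValueError("unrecognized release name: " + version)


releases = ["5.20", "5.1-18", "5.0-45", "5.19", "5.0-2", "5.1-1", "5.0-1"]
print(sorted(releases, key=release_key))
# ['5.0-1', '5.0-2', '5.0-45', '5.1-1', '5.1-18', '5.19', '5.20']
```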
What's this smartctl message mean?: Warning: ATA error count 9 inconsistent with error log pointer 5
The ATA error log is stored in a circular buffer, and the ATA specifications are unambiguous about how the entries should be ordered. This warning message means that the disk's firmware does not strictly obey the ATA specification regarding the ordering of the error log entries in the circular buffer. Smartmontools will correct for this oversight, so this warning message can be safely ignored by users. (On the other hand, firmware engineers: please read the ATA specs more closely then fix your code!).
On Windows, smartctl aborts with the message "...SMART_GET_VERSION failed". What is going wrong?
A failing SMART_GET_VERSION call means that the device driver does not implement the I/O controls (see below) to access ATA SMART functionality.
Some Windows drivers for (S)ATA controllers are implemented as SCSI class drivers. This is usually the case for drivers which support RAID. Unfortunately, such drivers do not support the ATA specific SMART I/O controls.
How can I check that the package hasn't been tampered with?
Since the smartmontools utilities run as root, you might be concerned about something harmful being embedded within them. Starting with release 5.19 of smartmontools, the .rpm files and tarball have been GPG signed. The tarball's fingerprint is given in a file on the release page with a name like smartmontools-5.32.tar.gz.asc.
Please verify these using the