S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology; often written as SMART) is a monitoring system for computer hard disk drives to detect and report on various indicators of reliability, in the hope of anticipating failures.
When a failure is anticipated by S.M.A.R.T., the user may choose to replace the drive to avoid an unexpected outage and data loss. The manufacturer may be able to use the S.M.A.R.T. data to discover where faults lie and prevent them from recurring in future drive designs.
The purpose of S.M.A.R.T. is to warn a user of impending drive failure while there is still time to take action, such as copying the data to a replacement device.
Hard disk failures fall into one of two basic classes:
- Predictable failures: These failures result from slow processes such as mechanical wear and gradual degradation of storage surfaces. Monitoring can determine when such failures are becoming more likely.
- Unpredictable failures: These failures happen suddenly and without warning. They range from electronic components becoming defective to a sudden mechanical failure (perhaps due to improper handling).
Mechanical failures account for about 60% of all drive failures.[1] While the eventual failure may be catastrophic, most mechanical failures result from gradual wear and there are usually certain indications that failure is imminent. These may include increased heat output, increased noise level, problems with reading and writing of data, or an increase in the number of damaged disk sectors.
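As a rough illustration of how such indications might be watched over time, the following minimal Python sketch compares the earliest and latest samples of a few indicators and reports which ones have risen. It is not part of S.M.A.R.T. itself: the `Reading` fields, the `rising_indicators` function, and the sample values are hypothetical, and a real monitor would read vendor-defined attributes from the drive rather than these simplified fields.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Reading:
    """One periodic sample of drive indicators (hypothetical field names)."""
    temperature_c: float
    read_write_errors: int
    damaged_sectors: int


def rising_indicators(history: Sequence[Reading],
                      min_increase: float = 0.0) -> List[str]:
    """Name the indicators whose latest value exceeds the earliest
    value by more than min_increase."""
    if len(history) < 2:
        return []  # a trend needs at least two samples
    first, last = history[0], history[-1]
    flagged = []
    for name in ("temperature_c", "read_write_errors", "damaged_sectors"):
        if getattr(last, name) - getattr(first, name) > min_increase:
            flagged.append(name)
    return flagged


# Illustrative samples only, not measurements from a real drive.
history = [
    Reading(temperature_c=38.0, read_write_errors=0, damaged_sectors=0),
    Reading(temperature_c=39.0, read_write_errors=2, damaged_sectors=0),
    Reading(temperature_c=41.5, read_write_errors=5, damaged_sectors=8),
]
print(rising_indicators(history))
```

Comparing only the first and last samples keeps the sketch short; a practical monitor would more likely smooth over several samples and use per-indicator thresholds.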
Work at Google on over 100,000 drives found correlations between certain S.M.A.R.T. information and actual failure rates. In the 60 days following the first scan error on a drive, the drive was, on average, 39 times more likely to fail than it would have been had no such error occurred.
[...continues...]