How to Read SMART Data and Predict SSD Failure

GGFix Technical Team
8 April 2025 · 9 min read · 112 views

SMART data gives you weeks of warning before an SSD fails — but only if you know which attributes matter. Most guides list every SMART attribute. This one tells you which five actually predict failure, what the numbers mean, and how to act before you lose data.

SMART stands for Self-Monitoring, Analysis and Reporting Technology. Every SSD and HDD manufactured since the late 1990s reports this data. Most people never look at it. When their drive dies, the data was there the whole time.

Why SMART Data Is Usually Ignored (And Why That's a Problem)

The typical PC user has never opened CrystalDiskInfo. The typical IT professional checks SMART reactively — after a machine starts acting up. By then, the drive has often been failing for 4-8 weeks.

Backblaze's quarterly drive failure reports (the most comprehensive public dataset on hard drive reliability, covering hundreds of thousands of drives over more than a decade) consistently show that drives exhibiting elevated SMART errors have dramatically higher failure rates within 30-60 days. The data isn't theoretical; it's empirical evidence that SMART monitoring works.

For an organization managing 50 machines, expect at least one drive to hit early failure indicators in a typical quarter. The question is whether you catch it at SMART health 85% (2-4 weeks before failure) or at SMART health 0% (the morning it doesn't boot).

The Five SMART Attributes That Actually Predict Failure

Most SMART monitoring tools display 30-50 attributes. The majority are informational noise. These five are the ones that actually correlate with imminent failure:

1. Reallocated Sectors Count (ID 05)

What it means: The drive found a bad sector and remapped it to a spare area. This is normal in small quantities — drives ship with spare sectors specifically for this. When this count starts rising, the drive is running out of healthy cells to remap to.

When to worry: Any non-zero value warrants attention. A value above 5 on a SATA SSD is serious (NVMe drives report spare-block health through their own health log instead of this attribute). A value above 20 on any drive means replace it now. For HDDs, rising values (even if currently low) indicate the drive is actively degrading.

The critical distinction: A static value of 3 that hasn't moved in 6 months is different from a value that went from 0 to 3 in the last week. Trend matters more than absolute value.
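
To make the "trend over absolute value" point concrete, here is a minimal Python sketch that compares the latest raw value against an older baseline reading. The function name, the labels it returns, and the 30-day default window are illustrative assumptions, not part of any SMART tool.

```python
from datetime import date

def classify_trend(readings, window_days=30):
    """Classify a series of (date, raw_value) readings, sorted oldest to newest.

    window_days is an assumed comparison window, not a SMART standard.
    """
    if not readings:
        return "no data"
    latest_day, latest_value = readings[-1]
    if latest_value == 0:
        return "clean"
    # Use the most recent reading that is at least window_days older than the latest.
    baseline = None
    for day, value in readings:
        if (latest_day - day).days >= window_days:
            baseline = value
    if baseline is None:
        return "unknown trend"  # not enough history to judge
    return "rising" if latest_value > baseline else "stable"
```

With this sketch, a value of 3 recorded on 1 January and again on 1 July classifies as "stable", while a jump from 0 to 3 within one week (using `window_days=7`) classifies as "rising".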

2. Current Pending Sector Count (ID C5 / 197)

What it means: Sectors the drive has flagged as unstable — it hasn't remapped them yet because it's waiting to confirm they're truly bad. These are sectors that returned errors when read. The drive will attempt to remap them on next write.

When to worry: Any non-zero value is a yellow flag. This attribute rising means active read errors are occurring right now. Combined with a rising Reallocated Sectors Count, this indicates a drive in active failure.

Real impact: If Windows needs to read a file from a pending sector, the read can fail with an error. System files on pending sectors cause BSODs. User files return read errors. Database files corrupt. This is where "SMART looked fine until it didn't" actually means "people missed attribute 197."

3. Uncorrectable Sector Count (ID C6 / 198)

What it means: Sectors that couldn't be corrected even with error correction codes. These are permanently lost sectors. The data that was there is gone.

When to worry: Any non-zero value is a critical alert. Replace the drive immediately. This attribute at zero is normal. This attribute at 1 means the drive has already lost data and will lose more.

4. Wear Leveling Count (ID AD / 173 or B1 / 177, varies by manufacturer)

What it means: Specific to SSDs. The normalized value reports the remaining NAND endurance as a percentage: 100 = new, 0 = NAND cells are fully worn. This is the attribute that tracks how much write life the drive has left.

When to worry: Values below 10 mean the drive is near end-of-life for write operations. On enterprise SSDs with a TBW (Terabytes Written) rating, this counter declines as the rated TBW is consumed. Consumer SSDs vary: a 500 GB drive with a 300 TBW rating used heavily in a workstation role will decline faster than the same drive used for light office work.
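
The workstation versus office comparison is simple arithmetic. The sketch below shows it in Python; the function names and the daily write rates in the usage note are hypothetical figures chosen for illustration, and real drives may outlive or fall short of their TBW rating.

```python
def tbw_consumed_percent(tb_written, rated_tbw):
    """Share of the rated endurance already used, as a percentage."""
    return 100.0 * tb_written / rated_tbw

def years_until_rated_tbw(tb_written, rated_tbw, tb_per_day):
    """Years until the rated TBW is reached at a constant daily write rate."""
    remaining_tb = max(rated_tbw - tb_written, 0.0)
    return remaining_tb / tb_per_day / 365.0
```

For a new drive rated at 300 TBW, an assumed 0.2 TB written per day reaches the rating in about 4.1 years, while an assumed 0.02 TB per day would take about 41 years.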

Note: Samsung calls this "Wear Leveling Count" (177), Intel calls it "Media Wearout Indicator" (233), and Crucial/Micron report "Percent Lifetime Remaining" (202). The concept is the same across manufacturers.

5. Power-On Hours (ID 09)

What it means: Total hours the drive has been powered on. Not a failure predictor by itself, but essential context. A drive with Reallocated Sectors at 3 and 200 hours of use is alarming. The same count at 40,000 hours is a sign of a well-used drive reaching normal end-of-life.

When to use it: Cross-reference with other attributes to assess severity. Also useful for lifecycle planning — drives approaching 5 years of use (roughly 40,000 hours) warrant proactive replacement regardless of SMART health percentage.
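
The rules for the five attributes can be sketched as one small Python function. This is a simplified illustration covering a subset of the thresholds described above, not how any particular tool scores a drive. The class name, the severity labels, and the 1,000-hour "early life" cutoff are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class DriveSnapshot:
    reallocated: int                # attribute 05, raw value
    pending: int                    # attribute C5 / 197, raw value
    uncorrectable: int              # attribute C6 / 198, raw value
    wear_remaining: Optional[int]   # normalized wear value, None for HDDs
    power_on_hours: int             # attribute 09, raw value

EARLY_LIFE_HOURS = 1000  # assumption: below this, any remapping counts as alarming

def assess(snapshot: DriveSnapshot) -> List[Tuple[str, str]]:
    """Return (severity, message) pairs for a single point-in-time reading."""
    findings = []
    if snapshot.uncorrectable > 0:
        findings.append(("critical", "uncorrectable sectors: data already lost, replace immediately"))
    if snapshot.reallocated > 20:
        findings.append(("critical", "reallocated sectors above 20: replace now"))
    elif snapshot.reallocated > 0:
        # The same count is more worrying on a young drive than on an old one.
        level = "warning" if snapshot.power_on_hours < EARLY_LIFE_HOURS else "attention"
        findings.append((level, "reallocated sectors are non-zero: watch the trend"))
    if snapshot.pending > 0:
        findings.append(("warning", "pending sectors: read errors are occurring"))
    if snapshot.wear_remaining is not None and snapshot.wear_remaining < 10:
        findings.append(("warning", "wear value below 10: near end of write life"))
    if snapshot.power_on_hours >= 40000:
        findings.append(("attention", "about 5 years powered on: plan proactive replacement"))
    return findings
```

A snapshot with 3 reallocated sectors at 200 power-on hours yields a "warning", while the same count at 40,000 hours yields two "attention" findings. Because this works on a single reading, it cannot see trends, so it complements rather than replaces tracking values over time.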

The Health Percentage Number

CrystalDiskInfo and similar tools display a single health percentage. Here is how to read it:

  • 95-100%: Healthy. Monitor quarterly.
  • 85-94%: Caution. Monitor monthly. Plan replacement.
  • 70-84%: Warning. Order a replacement immediately. Back up now.
  • Below 70%: Critical. Data at risk. Replace before next use if possible.

This percentage is a composite score calculated from the weighted critical attributes above. The exact formula varies by tool, which is why two tools may show slightly different percentages for the same drive. The trend matters more than the exact number — a drive dropping 3% per month is a different situation from a drive that's been stable at 88% for two years.
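
As a rough sketch, the bands above and the "trend over exact number" idea can be expressed in Python as follows. The function names are illustrative, and the average-drop calculation assumes one health reading per month.

```python
def health_band(percent):
    """Map a health percentage to the bands listed above."""
    if percent >= 95:
        return "healthy"
    if percent >= 85:
        return "caution"
    if percent >= 70:
        return "warning"
    return "critical"

def monthly_drop(history):
    """Average drop per month, given one reading per month, oldest first."""
    if len(history) < 2:
        return 0.0
    return (history[0] - history[-1]) / (len(history) - 1)
```

Both `[88] * 24` and `[97, 94, 91, 88]` end in the "caution" band, but the first history has a monthly drop of 0.0 and the second a drop of 3.0 points per month, which is the difference the paragraph above describes.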

NVMe vs. SATA: Different Attributes, Same Principles

NVMe drives use the NVMe Health Information Log rather than traditional SMART, but the monitoring principles are identical:

  • Percentage Used (NVMe equivalent of wear leveling): 0% = new, 100% = end of rated life. Note: the value can legitimately exceed 100% (the specification allows up to 255%), which means the drive has passed its rated endurance and should be treated as a warning sign.
  • Available Spare: percentage of spare NAND blocks remaining for remapping. When this hits 0%, the drive can no longer remap bad blocks.
  • Data Units Written: total write workload, counted in units of 512,000 bytes (1,000 × 512-byte blocks), useful for comparing against the TBW rating.
  • Media and Data Integrity Errors: NVMe equivalent of uncorrectable errors. Non-zero means data has been lost.

CrystalDiskInfo reads NVMe health data correctly on most drives. Some NVMe drives expose full health detail only through manufacturer-specific tools (Samsung Magician, Western Digital Dashboard, Crucial Storage Executive).
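
A short Python sketch of how the NVMe fields above might be interpreted. One data unit in the NVMe health log is 1,000 blocks of 512 bytes. The function names and the default spare threshold of 10% are assumptions for illustration; drives report their own Available Spare Threshold, which should be preferred when available.

```python
BYTES_PER_DATA_UNIT = 512 * 1000  # one NVMe data unit = 1,000 blocks of 512 bytes

def data_units_to_tb(data_units):
    """Convert the Data Units Written counter to decimal terabytes."""
    return data_units * BYTES_PER_DATA_UNIT / 1e12

def nvme_findings(percentage_used, available_spare, media_errors,
                  available_spare_threshold=10):
    """Return a list of plain-text findings for one NVMe health reading."""
    findings = []
    if percentage_used >= 100:
        findings.append("percentage used at or past 100: rated endurance consumed")
    if available_spare <= available_spare_threshold:
        findings.append("available spare at or below threshold: few blocks left for remapping")
    if media_errors > 0:
        findings.append("media and data integrity errors: data has been lost")
    return findings
```

The result of `data_units_to_tb` can be compared directly against the drive's TBW rating to see how much of the rated write endurance has been consumed.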

How to Check SMART Data Right Now

Option 1: CrystalDiskInfo (free, Windows) — Download from the official site. Open it, select your drive, look for the overall health status (Good/Caution/Bad) and check the five critical attributes above. Caution on any critical attribute = plan replacement. Bad = replace now.

Option 2: Windows built-in. Open PowerShell as admin and run:

    Get-PhysicalDisk | Get-StorageReliabilityCounter | Select-Object DeviceId, ReadErrorsTotal, WriteErrorsTotal, Temperature, Wear

This is limited compared to CrystalDiskInfo but available without any software install.
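
If you want to collect these counters from a script, one possible approach is to call the same PowerShell pipeline from Python and parse its JSON output. This is a sketch under several assumptions: it runs on Windows with `powershell` on the PATH and administrator rights, `| ConvertTo-Json` is appended to the command (it is not part of the command above), and the function names are illustrative.

```python
import json
import subprocess

PS_COMMAND = (
    "Get-PhysicalDisk | Get-StorageReliabilityCounter | "
    "Select-Object DeviceId, ReadErrorsTotal, WriteErrorsTotal, Temperature, Wear | "
    "ConvertTo-Json"
)

def parse_counters(json_text):
    """Parse ConvertTo-Json output into a list of dicts, one per disk."""
    if not json_text.strip():
        return []
    data = json.loads(json_text)
    # With a single disk, ConvertTo-Json emits one object instead of an array.
    if isinstance(data, dict):
        data = [data]
    return data

def read_counters():
    """Run the PowerShell pipeline and return the parsed counters (Windows only)."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_COMMAND],
        capture_output=True, text=True, check=True,
    )
    return parse_counters(result.stdout)
```

Some drives leave fields such as `Wear` or `Temperature` empty, so any code consuming the result should treat missing or null values as "not reported" rather than as zero.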

Option 3: Continuous monitoring — Manual SMART checks are point-in-time. A drive that's healthy Monday can start showing pending sectors by Thursday. The SSD thermal throttling and health monitoring guide covers the case for continuous monitoring vs. periodic checks.

GGFix reads SMART data from every monitored drive every 60 seconds and uploads aggregated health data every 5 minutes. When any critical attribute shows a non-zero value or when health percentage drops below configurable thresholds, an alert fires immediately. On a fleet of 50 machines, this means no drive failure goes undetected for weeks — the alert fires within hours of the first SMART anomaly.

Building a SMART Monitoring Routine

For individual machines: check CrystalDiskInfo monthly. Takes 2 minutes. Screenshot the health report so you have a baseline for trend comparison.

For IT professionals managing multiple machines: manual SMART checks don't scale. At 2 minutes per machine, monthly checks take 40 minutes for 20 machines and 100 minutes for 50. And monthly is still too infrequent: a drive can go from healthy to failing in 2-3 weeks.

Automated monitoring is the only practical approach at scale. Configure alerts at:

  • SMART health below 90%: Warning notification
  • SMART health below 80%: Critical alert, replacement required
  • Reallocated sectors above 0: Immediate alert
  • Uncorrectable errors above 0: Emergency alert, potential data loss
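
The thresholds above can be sketched as a single check in Python. The function name and message strings are illustrative assumptions and do not represent the configuration format of any specific monitoring product.

```python
def alerts_for(health_percent, reallocated, uncorrectable):
    """Return the alerts triggered by one reading, most severe first."""
    alerts = []
    if uncorrectable > 0:
        alerts.append("emergency: uncorrectable errors, potential data loss")
    if reallocated > 0:
        alerts.append("immediate: reallocated sectors above 0")
    if health_percent < 80:
        alerts.append("critical: health below 80%, replacement required")
    elif health_percent < 90:
        alerts.append("warning: health below 90%")
    return alerts
```

A reading of 92% health with no reallocated or uncorrectable sectors returns an empty list. A reading of 85% health with 2 reallocated sectors returns the "immediate" and "warning" alerts.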

The hardware monitoring alert thresholds guide covers threshold configuration across all hardware components, not just storage.

Frequently Asked Questions

Q: My SSD shows "Good" health but is slow. Can SMART miss performance issues?

Yes. SMART health percentage primarily tracks cell health and error rates, not read/write performance degradation. An SSD that's thermally throttling (reducing speed due to heat) can show 100% SMART health while performing at 20% of rated speed. For performance monitoring, you need thermal data alongside SMART data. An NVMe drive running at 75°C will typically throttle regardless of what SMART reports.

Q: How often do drives fail without any SMART warning?

Backblaze's research suggests roughly 30% of drive failures occur in drives with no prior SMART anomalies. This is often cited as evidence that SMART monitoring is unreliable, which misses the point: in the other 70% of failures, SMART provided warning. The roughly 30% of failures that arrive without warning cannot be reduced with current technology. Monitoring catches about 70% of failures before they happen; not monitoring catches 0%.

Q: Should I trust a drive that shows Caution but has been stable for months?

A stable Caution status (attributes non-zero but not rising) is different from a worsening Caution status. Document the values and check monthly. If they haven't moved in 6 months, the drive has settled into its post-remapping state and may be stable. If any value is rising month over month, the drive is actively degrading — replace it regardless of how slowly.

Q: Does SMART work on external USB drives?

Partially. USB-to-SATA and USB-to-NVMe bridges pass SMART queries through on most modern adapters, but some budget adapters block SMART commands entirely. CrystalDiskInfo will show "SMART Status Not Available" for drives behind blocking adapters. If you need reliable SMART on external drives, use a direct SATA/NVMe connection (like a dock with direct passthrough) rather than an integrated external drive enclosure.

Q: How much warning time does SMART actually give before failure?

It varies by failure mode. Gradual wear (Reallocated Sectors rising slowly) typically gives 2-8 weeks of warning. Sudden controller failure gives zero warning — but this type of failure is rare (roughly 10-15% of SSD failures). For the most common failure modes — NAND wear, sector degradation, and age-related failure — SMART provides actionable warning the majority of the time.
