The IT Technician's Hardware Diagnostic Toolkit

7 April 2026 · 9 min read
Your hardware is degrading. The question is whether you find out first.

GGFix monitors 50+ sensors per machine, tracks the top 25 processes every minute, decodes every BSOD into plain English, and alerts you in under 10 seconds — before degradation turns into a failure, a repair bill, or lost work.

Every IT technician develops a preferred set of diagnostic tools over time. After 8 years of hardware repair work, diagnosing everything from "my computer is slow" complaints to enterprise server failures, our toolkit has settled into a stable set of tools that covers the full diagnostic surface of a Windows PC: hardware sensors, storage health, RAM integrity, CPU stability, and system-level diagnostics. This guide covers the tools that belong in every technician's diagnostic workflow, what each one is best at, and when to reach for which.

For ongoing fleet monitoring (as opposed to one-time diagnostics), see our complete hardware monitoring guide. For post-repair validation, see our post-repair hardware validation guide.

Category 1: Real-Time Hardware Sensor Readers

HWiNFO64

What it is: The most comprehensive hardware sensor reader for Windows. Reads every accessible sensor: CPU per-core temperatures, GPU core/VRAM/hotspot temperatures, fan speeds, voltage rails, clock frequencies, power draw.

When to use: Any time you need to know exactly what a machine's hardware sensors are reporting. Baseline diagnostic when a machine is "running hot" or "crashing unexpectedly." Validation after hardware repair (thermal paste, fan replacement) to confirm temperatures are in range.

How to use it effectively: Run it for 5–10 minutes under the workload that triggers the problem. Use the "min/max" columns — the maximum temperatures recorded over the session are more informative than the current readings.
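
HWiNFO can log sensor readings to a CSV file, so the same min/max comparison can be made after the session has ended. The sketch below is a minimal example, assuming a simplified log in which each column holds one sensor and each row one sample. The function name, column names, and sample data are hypothetical, and real logs may need extra handling (encoding, units in headers, non-numeric cells).

```python
import csv
import io


def session_extremes(csv_text):
    """Return {sensor: (min, max)} for every numeric column in a sensor log."""
    extremes = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for sensor, cell in row.items():
            try:
                value = float(cell)
            except (TypeError, ValueError):
                continue  # skip timestamps, blanks, and other non-numeric cells
            low, high = extremes.get(sensor, (value, value))
            extremes[sensor] = (min(low, value), max(high, value))
    return extremes


# Hypothetical three-sample log; the column names are illustrative only.
SAMPLE_LOG = """Time,CPU Package [C],GPU Hot Spot [C]
12:00:01,48.0,52.5
12:00:03,91.5,88.0
12:00:05,63.0,71.0
"""

if __name__ == "__main__":
    for sensor, (low, high) in session_extremes(SAMPLE_LOG).items():
        print(f"{sensor}: min {low:.1f}, max {high:.1f}")
```

In this sample the final CPU reading is 63.0, but the session maximum is 91.5, which is the figure worth writing down on the ticket.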

Free: Yes. hwinfo.com.

GPU-Z

What it is: GPU-specific diagnostic tool from TechPowerUp. Shows GPU specifications, VBIOS version, and real-time sensor data specifically for GPU hardware.

When to use: GPU-specific issues. When you need to verify GPU specifications (useful when clients report incorrect GPU information), check VBIOS version, or log GPU-specific sensors over a gaming session or render job.

Free: Yes. techpowerup.com/gpuz.

CPU-Z

What it is: CPU identification and memory specification tool. Shows CPU model, stepping, voltage, clock speed, and detailed RAM timing information.

When to use: Identifying exact CPU stepping (relevant for Intel Raptor Lake instability diagnosis), verifying RAM frequency and timings, confirming XMP/EXPO profile is active.

Free: Yes. cpuid.com/softwares/cpu-z.html.

Category 2: Storage Health

CrystalDiskInfo

What it is: S.M.A.R.T. attribute reader for all connected drives. Shows drive health status (Good/Caution/Bad), individual S.M.A.R.T. attribute values, temperature, power-on hours, and power cycle count.

When to use: Any time storage problems are suspected. "Slow computer" complaints. Before imaging a machine (confirm drive health before trusting the image). After recovering data from a failing drive. Regular fleet health checks.

The most important attributes to check:

  • Reallocated Sectors Count (ID 5): Any non-zero value is a warning sign
  • Reported Uncorrectable Errors (ID 187): Non-zero indicates read errors
  • Current Pending Sector Count (ID 197): Non-zero means sectors waiting to be reallocated
  • SSD Wear Leveling Count / Media Wearout Indicator: Shows remaining SSD endurance
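
The non-zero checks in the list above can be sketched as a small function. This is a minimal illustration, assuming the raw attribute values have already been read by some other tool and passed in as a dictionary keyed by attribute ID. The function and variable names are hypothetical. SSD wear indicators are left out because their IDs and scales vary by vendor.

```python
# Attribute IDs and the "non-zero is a warning" rule come from the list above.
# Raw values are assumed to be supplied as {attribute_id: raw_value}.
WATCHED_ATTRIBUTES = {
    5: "Reallocated Sectors Count",
    187: "Reported Uncorrectable Errors",
    197: "Current Pending Sector Count",
}


def smart_warnings(raw_values):
    """Return one warning string per watched attribute with a non-zero raw value."""
    warnings = []
    for attr_id, name in WATCHED_ATTRIBUTES.items():
        raw = raw_values.get(attr_id)
        if raw is None:
            continue  # the drive does not report this attribute
        if raw > 0:
            warnings.append(f"ID {attr_id} {name}: raw value {raw}")
    return warnings


if __name__ == "__main__":
    # Hypothetical readings from a drive with reallocated and pending sectors.
    for line in smart_warnings({5: 8, 187: 0, 197: 2}):
        print(line)
```

An empty result means none of the three watched attributes is non-zero. It does not by itself mean the drive is healthy.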

Free: Yes. crystalmark.info.

CrystalDiskMark

What it is: Storage benchmark tool. Measures sequential and random read/write performance.

When to use: Performance complaints. Validation after SSD repair or replacement. Identifying SSDs that are thermal throttling (if the benchmark shows unexpected slowness while drive temperatures are high, throttling is the likely cause).

Free: Yes. crystalmark.info.

Victoria

What it is: Advanced disk diagnostics and repair tool. Can perform surface scans, remap bad sectors (on drives that support it), and test drive response times at a lower level than CrystalDiskInfo.

When to use: When CrystalDiskInfo shows warnings and you need to understand severity. Can sometimes "fix" drives with remappable sectors by forcing the remap process, extending usable life for recovery purposes. Advanced tool — understand what you're doing before using the write functions.

Free: Yes (basic version). hdd.by.

Category 3: RAM Diagnostics

MemTest86

What it is: Bootable RAM testing tool. Runs outside Windows, testing all accessible RAM with a comprehensive set of memory patterns. The gold standard for RAM diagnostics.

When to use: Any unexplained BSODs, application crashes, data corruption events, or "system instability" complaints. RAM errors are frequently misdiagnosed as software problems. MemTest86 is the most thorough test in this toolkit: a clean run (all tests, at least 2 passes) makes RAM a much less likely culprit, although it cannot fully exclude intermittent faults.

How to use it effectively: Run at least one full pass (can take 1–4 hours depending on RAM capacity). For a definitive result, run 2 complete passes. Test with all sticks installed first; if errors appear, remove sticks one at a time to identify the faulty module.

Free: Yes. memtest86.com.

Windows Memory Diagnostic

What it is: Windows' built-in RAM testing tool, accessible via Control Panel, by searching for "Windows Memory Diagnostic," or by running mdsched.exe.

When to use: Quick check when you cannot boot from external media. Less comprehensive than MemTest86 but faster and does not require a bootable USB.

Limitation: Less sensitive than MemTest86. A pass here is weaker evidence than a MemTest86 pass and should not be taken as ruling out RAM issues.

Category 4: CPU and System Stability Testing

Prime95

What it is: A prime number calculation program repurposed as a CPU stress test. Its torture test offers Small FFTs, Large FFTs, and Blend modes, which push CPU thermal and electrical load close to the maximum.

When to use: Validating CPU cooling after thermal paste replacement. Confirming CPU stability after overclocking. Diagnosing power delivery issues (Prime95 places a very heavy load on the VRMs).

How to use it effectively: Run Small FFTs for maximum CPU heat generation (useful for validating cooling solutions). Run Blend for mixed CPU/RAM stress (useful for stability validation). Run for 15–30 minutes for a quick check; 24 hours for thorough stability validation.

Important: Prime95 generates unrealistically high CPU loads that consumer cooling solutions are not always designed for. It can trigger throttling that does not occur in normal use. Do not use Prime95 results as the only test of system adequacy — also test under actual workloads.

Free: Yes. mersenne.org/prime95.

AIDA64 Extreme

What it is: System information and stress testing tool. More calibrated than Prime95 for realistic workload simulation. Useful for long-duration stability testing.

When to use: Long-duration stability tests (overnight runs) where Prime95's aggressive FFT tests are too unrealistic. Hardware identification and documentation.

Cost: $40 for a personal license. Frequently used in professional IT environments for its comprehensive hardware reporting.

FurMark

What it is: GPU stress test that maximizes GPU thermal load beyond what games or typical workloads produce.

When to use: Validating GPU cooling after thermal paste replacement. Testing GPU stability after repair. Identifying GPUs that throttle under sustained maximum load.

Caution: FurMark pushes GPUs harder than most real workloads. Very old or already-degraded GPUs can fail during FurMark tests. Appropriate for post-repair validation; less appropriate for routine testing on customer hardware you haven't inspected.

Free: Yes. geeks3d.com/furmark.

Category 5: System-Level Diagnostics

Speccy

What it is: System information tool from Piriform. Provides a quick summary of all installed hardware, temperatures, and component specifications.

When to use: Quickly documenting a system's hardware configuration for repair records, warranty claims, or client documentation.

Free: Yes (basic version). ccleaner.com/speccy.

Windows Event Viewer (built-in)

What it is: Windows' built-in event logging system. Records hardware errors, application crashes, driver failures, and system events.

When to use: Any unexplained crash or error. Always check Event Viewer before and after any hardware repair to document the error state and confirm resolution. See our Windows Event Viewer hardware diagnostics guide for specific event IDs to look for.
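
For repeatable before-and-after checks, the same events can be pulled from the command line with Windows' built-in wevtutil utility. The sketch below is a minimal example, assuming the events of interest are in the System log. The function names are hypothetical, and Event IDs 41 (Kernel-Power, unexpected shutdown) and 1001 (BugCheck) are used only as illustrative inputs.

```python
import subprocess


def build_event_query(event_ids, max_events=20):
    """Build a wevtutil command listing the newest System-log events with the given IDs."""
    id_filter = " or ".join(f"EventID={int(i)}" for i in event_ids)
    xpath = f"*[System[({id_filter})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        f"/c:{max_events}",  # cap the number of events returned
        "/rd:true",          # newest events first
        "/f:text",           # human-readable output instead of XML
    ]


def query_events(event_ids, max_events=20):
    """Run the query (Windows only) and return wevtutil's text output."""
    command = build_event_query(event_ids, max_events)
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # Print the command rather than running it, so this works on any OS.
    print(" ".join(build_event_query([41, 1001], max_events=5)))
```

Saving the output of query_events before the repair and again afterwards gives a simple record of the error state and its resolution.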

NirSoft BlueScreenView

What it is: BSOD (Blue Screen of Death) analysis tool. Reads Windows minidump files from previous crashes and identifies the driver or component that caused each crash.

When to use: Systems with a history of BSODs. Helps distinguish three classes of crash: hardware-caused BSODs (crashes attributed to core OS files such as ntoskrnl.exe or hal.dll, which often point to RAM or other hardware), driver-caused BSODs (attributed to a specific driver file), and BSODs caused by file corruption.
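
When a machine has many minidumps, a rough tally by category can show which direction to investigate first. The sketch below is a minimal illustration of that heuristic, assuming you already have the driver file name each crash was attributed to. The function names and the set of core OS file names are assumptions chosen for illustration, and a crash attributed to a core OS file is only a hint toward hardware, not proof.

```python
from collections import Counter

# Illustrative assumption: crashes attributed to these core OS files are
# treated as hardware/RAM suspects rather than faults in a specific driver.
CORE_OS_FILES = {"ntoskrnl.exe", "hal.dll"}


def classify_crash(caused_by):
    """Map the file a crash was attributed to onto a rough suspect category."""
    name = caused_by.strip().lower()
    if name in CORE_OS_FILES:
        return "hardware-suspect"
    if name.endswith(".sys"):
        return "driver-suspect"
    return "unknown"


def summarize(crash_files):
    """Count suspect categories across a list of attributed file names."""
    return Counter(classify_crash(name) for name in crash_files)


if __name__ == "__main__":
    # Hypothetical crash history; the driver names are made up for the example.
    history = ["ntoskrnl.exe", "examplegpu.sys", "ntoskrnl.exe", ""]
    print(dict(summarize(history)))
```

A history dominated by one specific driver file points toward a driver update or rollback, while a history dominated by core OS files is a reason to move on to RAM and sensor checks.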

Free: Yes. nirsoft.net/utils/blue_screen_view.html.

The Diagnostic Workflow

For an incoming "my computer is acting strange" ticket, the following diagnostic sequence covers roughly 90% of hardware causes in our experience:

  1. CrystalDiskInfo: Check drive health first — storage failure is the most common cause of "strange behavior." If S.M.A.R.T. shows warnings, prioritize data backup immediately.
  2. HWiNFO64 under load: Check temperatures and voltages during the problematic workload. Document the maximum values.
  3. BlueScreenView: If there are BSODs, identify the crash type.
  4. MemTest86: If crashes are random and temperatures are normal, RAM is the next suspect. Run at least one full pass.
  5. Prime95/FurMark: If the problem is heat-related, validate that the cooling solution is adequate after cleaning/repair.
  6. Event Viewer: Document hardware errors and confirm clean system after repair.
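
The sequence above can be sketched as a small triage function that turns ticket observations into an ordered checklist. This is a simplified illustration, not a complete decision procedure. The Ticket fields and the function name are hypothetical, and the branching conditions are one reading of the steps listed.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    smart_warning: bool = False   # CrystalDiskInfo reports Caution/Bad
    has_bsod: bool = False        # crash dumps exist on the machine
    overheating: bool = False     # max temperatures under load are out of range
    random_crashes: bool = False  # crashes with no obvious trigger


def next_steps(ticket):
    """Return the diagnostic steps for a ticket, in workflow order."""
    steps = ["CrystalDiskInfo: check drive health"]
    if ticket.smart_warning:
        steps.append("Back up data immediately")
    steps.append("HWiNFO64: log temperatures and voltages under load")
    if ticket.has_bsod:
        steps.append("BlueScreenView: identify the crash type")
    if ticket.random_crashes and not ticket.overheating:
        steps.append("MemTest86: run at least one full pass")
    if ticket.overheating:
        steps.append("Prime95/FurMark: validate cooling after cleaning/repair")
    steps.append("Event Viewer: document errors and confirm a clean system")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(next_steps(Ticket(has_bsod=True, random_crashes=True)), 1):
        print(f"{number}. {step}")
```

Drive health and sensor logging always run, because they are cheap and inform every later step. The other tools are added only when the observations call for them.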

For fleet-wide monitoring (rather than one-off diagnostics), GGFix provides continuous visibility across all machines, surfacing issues before they require a technician visit.

Frequently Asked Questions

Is there a single tool that does everything?

No. Each tool is specialized for a specific diagnostic layer. HWiNFO reads sensors but doesn't test RAM. MemTest86 tests RAM but doesn't show GPU temperatures. A complete diagnostic workflow uses multiple tools in sequence. The combination of CrystalDiskInfo + HWiNFO + MemTest86 covers the three most common hardware failure categories (storage, thermal, RAM) for typical diagnostic cases.

Are all these tools safe to run on customer machines?

Mostly. The exceptions are write-capable tools (Victoria's repair functions), which should only be used with the customer's explicit consent, and heavy stress tests such as FurMark, which can push already-degraded hardware to failure. The sensor readers (HWiNFO, CrystalDiskInfo, Speccy) and the standard tests (Prime95, MemTest86) are diagnostic tools used throughout the industry.

How long should stress tests run to be meaningful?

15–30 minutes is sufficient for thermal validation (confirming temperatures are stable after repair). For stability validation (confirming no crashes under load), 4–8 hours is a more reliable test. Overnight runs (8–12 hours) give high confidence in stability for critical systems.
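
For the thermal validation case, "temperatures are stable" can be made concrete by checking that logged readings stay under a limit and have stopped drifting. The sketch below is a minimal illustration. The function name, the window size, the 2 °C tolerance, and the temperature limit are all assumptions to adjust for the component being tested.

```python
def temperature_verdict(samples_c, limit_c, window=5, tolerance_c=2.0):
    """Judge a stress-test temperature log: 'over-limit', 'not-settled', or 'stable'."""
    if len(samples_c) < window:
        raise ValueError("need at least `window` samples")
    if max(samples_c) > limit_c:
        return "over-limit"   # exceeded the limit at some point in the run
    recent = samples_c[-window:]
    if max(recent) - min(recent) > tolerance_c:
        return "not-settled"  # still drifting; keep the test running
    return "stable"


if __name__ == "__main__":
    # Hypothetical readings (degrees C) sampled during a stress test.
    log = [50, 62, 70, 74, 75, 75.5, 76, 76, 75.5]
    print(temperature_verdict(log, limit_c=90))
```

A "not-settled" verdict at the end of a short run is a sign that the test needs more time before the cooling can be signed off.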

What is the most important tool to carry on a USB drive for field diagnostics?

For a single tool, MemTest86 (bootable USB) is the most universally useful because it tests the hardware component (RAM) most commonly involved in unexplained instability that has no obvious visible cause. Add CrystalDiskInfo for storage diagnostics and HWiNFO for sensor reading, and you have the core diagnostic triad covered.

What does ignoring this actually cost?

Typical cost by scenario (USD):

  • Emergency repair after hardware failure: $300 – $1,500
  • Data recovery (worst case): $500 – $2,500
  • Lost workday per incident: $150 – $800
  • Preventive maintenance (if flagged early): $30 – $130
  • GGFix monitoring (per machine / month): $20
  • GGFix monitoring (per machine / year, 2 months free): $200

Early warning is the cheapest insurance you can buy. GGFix catches problems when the fix is still cheap, and names the exact app, sensor, or BSOD code responsible.
