RMM Hardware Monitoring: What Your Platform Misses (And How to Fill the Gap)

7 April 2026 · 13 min read

The RMM hardware monitoring gap is real, documented in vendor knowledge bases, and costs MSPs clients every year. ConnectWise Automate, Datto RMM, and NinjaOne all offer "hardware monitoring" — but what their agents actually read is a fraction of the sensor data that predicts component failure. A client machine can burn out a GPU, seize a CPU fan, or lose a drive while your RMM dashboard shows everything green. This guide breaks down exactly what each platform monitors, what it misses, and how a dedicated hardware sensor layer fills the difference.

What MSPs Mean When They Say "Hardware Monitoring"

Two definitions of hardware monitoring exist in the industry, and the gap between them is where machines fail silently.

The RMM definition: uptime, CPU utilization percentage, RAM usage, disk space, network connectivity, SMART pass/fail status. These are availability metrics — they tell you whether a machine is online and whether its resources are being consumed. They were designed for server management in the early 2010s and inherited by endpoint RMMs.

The sensor definition: 80 to 120 hardware channels per machine. Per-core CPU temperatures. GPU core temperature, hotspot temperature, and VRAM temperature (three distinct sensors on a modern RTX card). Individual fan RPMs by header. VRM (Voltage Regulator Module) temperatures. NVMe SSD temperature and health log. Motherboard voltage rails. SMART raw attributes — reallocated sectors, pending sectors, uncorrectable sectors — not just the single pass/fail flag.

The difference is not subtle. A machine running an RTX 4080 at 95°C hotspot with a CPU fan degrading from 1,800 RPM to 400 RPM over three weeks looks completely healthy by RMM availability metrics. The resource utilization is normal. The SMART flag is PASSED. The machine is online. And then it fails.

For MSPs managing client fleets, this distinction determines whether you are actually delivering proactive hardware management or just monitoring uptime. Our complete PC fleet management guide covers how this sensor layer fits into a broader fleet monitoring strategy.

What ConnectWise, Datto, and NinjaOne Actually Monitor

All three platforms have published documentation that reveals both their capabilities and their gaps. The research below is drawn directly from official sources.

The Sensor Coverage Matrix

| Sensor | NinjaOne | ConnectWise Automate | Datto RMM | GGFix |
|---|---|---|---|---|
| CPU temperature | Yes (aggregate) | Yes (per-core via sensor chip) | Script, polled | Yes (per-core, real-time) |
| GPU temperature | No (Task Manager workaround) | No (documented gap) | No | Yes |
| Fan RPM | No | Sensor class status | Normal/not-Normal status | Yes (per fan, RPM value) |
| VRM temperature | No | No | No | Yes (where supported) |
| NVMe SSD temperature | No (manual script required) | No | No | Yes |
| SMART raw attributes | No | Partial (smartctl.exe required) | No (pass/fail only) | Yes |
| Motherboard voltages | No | Yes (sensor chip) | No | Yes |
| Monitoring model | Threshold alerts | Polled inventory cycle | Polled component scripts | 1-min polling, 5-min cloud upload |

ConnectWise Automate

ConnectWise Automate has the deepest native hardware sensor coverage of the three. Its agent reads data from the motherboard sensor chip — system temperatures, per-core CPU temperatures, fan speed classes, and voltage readings. Sensor monitors can be scoped to individual sensors or to a sensor class (all temperature sensors, all fan sensors).

The documented gap is GPU temperature. ConnectWise's own documentation notes that most motherboards do not expose graphics card temperature through standard sensor chip channels. The GPU thermal data lives on a separate bus that the agent does not read. For MSPs with clients running workstation GPUs — CAD, rendering, video production — this is a significant blind spot.

The other structural limitation: sensor data is collected during the inventory cycle, not streamed continuously. Alert latency can run from minutes to hours depending on inventory schedule configuration.

Datto RMM

Datto RMM's out-of-box monitoring policies cover CPU usage, disk space, memory, and network. Hardware sensor data beyond that baseline requires ComStore component monitors — pre-built PowerShell scripts that run on a polling schedule. CPU temperature and SMART drive health monitoring are available as ComStore components, but they are not deployed by default.

Datto's own blog post on custom workstation monitoring is candid about the gap: the default monitoring policies leave workstation hardware undercovered, and MSPs need to build or deploy custom component monitors to address it. This is a vendor admission, not speculation.

The monitoring model matters here. ComStore components run on a schedule and return a status. This is a polling model — not a real-time sensor feed. A CPU that spikes to 105°C under a 45-minute render job may cool before the next polling cycle fires.
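
The effect of a polling interval on a short thermal event can be estimated with simple arithmetic. This sketch assumes polls are instantaneous, evenly spaced, and that the event starts at a random point relative to the schedule. The function names and the example numbers are illustrative only.

```python
import math


def detection_probability(event_minutes, poll_interval_minutes):
    """Chance that at least one poll lands inside an event of the given length."""
    if poll_interval_minutes <= 0:
        raise ValueError("poll interval must be positive")
    return min(1.0, event_minutes / poll_interval_minutes)


def polls_during_event(event_start, event_minutes, poll_interval, first_poll=0.0):
    """Return the poll times (in minutes) that fall inside the event window."""
    event_end = event_start + event_minutes
    # Index of the first poll at or after the start of the event
    k = max(0, math.ceil((event_start - first_poll) / poll_interval))
    t = first_poll + k * poll_interval
    hits = []
    while t < event_end:
        hits.append(t)
        t += poll_interval
    return hits


# A 20-minute spike starting at minute 5, polled every 30 minutes vs every minute
print(polls_during_event(5, 20, 30))       # no poll lands inside the spike
print(len(polls_during_event(5, 20, 1)))   # every minute of the spike is sampled
print(detection_probability(20, 30))
```

Under these assumptions a 30-minute schedule catches a 20-minute spike about two times in three, and the remaining third of such events leave no trace in the data.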

NinjaOne

NinjaOne's hardware monitoring page lists CPU temperature, disk health, fan speed, and battery health as monitored metrics. The fan speed and CPU temperature claims have documentation behind them. GPU temperature is not among the listed metrics, and the documentation shows why.

NinjaOne has published a knowledge base post titled "Monitor GPU Temperature from Task Manager." The official answer to how MSPs should check client GPU temperatures is to open Windows Task Manager on the machine. This is not remote monitoring; it is a manual diagnostic step that requires someone to be at the machine or in a remote session.

A separate NinjaOne blog post on checking NVMe SSD temperatures in Windows 11 similarly routes users through manual tools rather than the RMM agent. When a platform publishes workaround tutorials for sensors it cannot read, that is the clearest possible documentation of the gap.

The Three Failures RMMs Won't Catch in Time

These are not edge cases. They are the most common pre-failure signatures in workstation hardware, and all three live in the sensor gap.

1. GPU thermal runaway. Modern NVIDIA RTX cards have three distinct temperature sensors: the core temperature, the hotspot temperature (a second sensor on the die), and the VRAM temperature. The hotspot can run 20 to 30°C above the core temperature under sustained load. VRAM temperatures are independent and can climb faster than core temps on memory-intensive workloads. None of these sensors are available through the standard channels RMM agents use. A GPU cooking at 95°C hotspot — where NVIDIA's thermal throttle triggers at 83°C and protection at 110°C — registers as a healthy, online machine in every major RMM.

2. Fan bearing degradation. A fan that normally runs at 1,800 RPM and gradually drops to 400 RPM over six weeks is a machine three weeks from a thermal event. This failure mode produces a clear, readable trend in RPM data. It is invisible to platforms that report only Normal/not-Normal fan status, or that do not collect fan RPM at all. By the time the fan reaches zero RPM and the machine shuts down thermally, the trend data was sitting unread for weeks.

3. SSD dying with a PASSED status. Google's 2007 study of failures across its disk drive fleet found that 56% of failed drives showed no counts in any of the four strongest SMART failure signals before they failed. That study covered hard disk drives rather than SSDs, but it illustrates the same limitation: the SMART overall health assessment, the single PASSED/FAILED flag that most RMMs report, has poor predictive accuracy on its own. The predictive value is in the raw attribute trends: Attribute 5 (reallocated sectors), Attribute 197 (pending sectors), Attribute 198 (uncorrectable sectors). Any non-zero value in these attributes is a warning; rising values are an active failure in progress. A drive at PASSED with 47 reallocated sectors is not a healthy drive. Our guide to reading SMART data and predicting SSD failure covers this in detail.
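
The rule in the third item (non-zero is a warning, rising is a failure in progress) can be sketched against the JSON that `smartctl --json` produces for ATA drives. The sample report below is hand-written for illustration, and the function names and verdict labels are assumptions rather than part of any tool.

```python
# Raw attributes the article singles out as predictive
WATCHED = {5: "reallocated sectors", 197: "pending sectors", 198: "uncorrectable sectors"}


def raw_values(report):
    """Extract raw values for the watched attribute IDs from a smartctl-style report."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    return {row["id"]: row["raw"]["value"] for row in table if row["id"] in WATCHED}


def assess(report, previous_raw=None):
    """Return 'failing' if any watched value rose, 'warning' if any is non-zero, else 'ok'."""
    current = raw_values(report)
    if previous_raw is not None and any(
        value > previous_raw.get(attr_id, 0) for attr_id, value in current.items()
    ):
        return "failing"
    if any(value > 0 for value in current.values()):
        return "warning"
    return "ok"


# Hand-written sample: the overall flag says PASSED while attribute 5 is at 47
sample = {
    "smart_status": {"passed": True},
    "ata_smart_attributes": {"table": [
        {"id": 5, "name": "Reallocated_Sector_Ct", "raw": {"value": 47}},
        {"id": 197, "name": "Current_Pending_Sector", "raw": {"value": 0}},
        {"id": 198, "name": "Offline_Uncorrectable", "raw": {"value": 0}},
    ]},
}

print(sample["smart_status"]["passed"], assess(sample), assess(sample, {5: 0, 197: 0, 198: 0}))
```

The same drive is PASSED by the flag, a warning by its raw values, and failing once last month's zero is taken into account.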

The Development Priority Problem

In 2024 and 2025, all three platforms released major product updates. NinjaOne launched compound condition monitoring, AI-assisted patch approvals, and digital experience metrics. ConnectWise rebuilt its platform on the Asio architecture with 1,200+ out-of-box monitors and AI-powered alert noise reduction. Datto RMM shipped a 200-app software management library, Microsoft 365 onboarding integration, and EDR consolidation.

None of them announced hardware sensor depth improvements.

This is not an oversight — it reflects a deliberate product philosophy. RMMs were built for remote management: patching, scripting, software deployment, ticketing integration. Hardware health was added as a secondary feature layer that relies on Windows WMI, which was never designed for deep hardware sensor access. WMI's Win32_TemperatureProbe class exists in the schema but is rarely populated by hardware drivers. GPU temperature is not exposed through standard WMI on modern NVIDIA and AMD hardware when queried by a background service process — this is a documented limitation, not a configuration issue.
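
The WMI limitation is simple to check on any Windows machine. The sketch below shells out to PowerShell's `Get-CimInstance` and reports which `Win32_TemperatureProbe` instances carry a reading. The helper names are hypothetical, and `query_temperature_probes` only works on Windows with PowerShell available.

```python
import json
import subprocess

PS_COMMAND = (
    "Get-CimInstance -ClassName Win32_TemperatureProbe | "
    "Select-Object Name, CurrentReading | ConvertTo-Json"
)


def parse_probe_json(text):
    """Normalize ConvertTo-Json output (empty, one object, or an array) to a list of dicts."""
    text = text.strip()
    if not text:
        return []
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def populated_probes(probes):
    """Keep only probes whose CurrentReading was actually filled in."""
    return [p for p in probes if p.get("CurrentReading") is not None]


def query_temperature_probes():
    """Windows only: run the PowerShell query and return the parsed probe rows."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", PS_COMMAND],
        capture_output=True, text=True, check=True,
    )
    return parse_probe_json(result.stdout)


# Illustrative output shape for a machine that lists a probe with no reading
example = '{"Name": "Temperature Probe", "CurrentReading": null}'
print(populated_probes(parse_probe_json(example)))
```

An empty list from `populated_probes` on a real machine indicates that the class exists but carries no usable temperature data there.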

The sensor gap is not closing. Modern hardware is adding more sensor channels, not fewer. NVMe drives, including current Gen 5 models, report health through an NVMe health log with media errors, wear indicators, and available spare capacity, none of which map to legacy SMART attributes. Current-generation GPUs have GDDR7 VRAM with separate thermal sensors. High-core-count workstation CPUs expose per-chiplet temperature data that aggregate monitoring masks. The platforms investing in security integrations and UI redesigns are not investing in reading this data.
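
For NVMe drives, the fields mentioned above appear in the `nvme_smart_health_information_log` section of `smartctl --json` output. The sketch below evaluates a hand-written sample of that section. The function name and the 90% wear limit are assumptions chosen for illustration.

```python
def nvme_findings(report, wear_limit_percent=90):
    """List human-readable concerns found in an NVMe health log section."""
    log = report.get("nvme_smart_health_information_log", {})
    findings = []
    if log.get("critical_warning", 0) != 0:
        findings.append("critical warning bits set")
    media_errors = log.get("media_errors", 0)
    if media_errors > 0:
        findings.append(f"{media_errors} media errors")
    # Spare capacity is reported as a percentage alongside the drive's own threshold
    if log.get("available_spare", 100) <= log.get("available_spare_threshold", 0):
        findings.append("available spare at or below threshold")
    if log.get("percentage_used", 0) >= wear_limit_percent:
        findings.append("rated endurance nearly consumed")
    return findings


# Hand-written sample values, not taken from a real drive
worn = {"nvme_smart_health_information_log": {
    "critical_warning": 0, "temperature": 48, "available_spare": 100,
    "available_spare_threshold": 10, "percentage_used": 12, "media_errors": 3,
}}
healthy = {"nvme_smart_health_information_log": {
    "critical_warning": 0, "temperature": 41, "available_spare": 100,
    "available_spare_threshold": 10, "percentage_used": 2, "media_errors": 0,
}}

print(nvme_findings(worn), nvme_findings(healthy))
```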

Can You Run a Dedicated Monitor Alongside Your RMM?

Yes. This is the question MSPs avoid asking because of agent sprawl concerns, but the answer is straightforward.

A purpose-built hardware monitoring agent like GGFix runs as a Windows service at approximately 15 MB of RAM. It does not intercept network traffic, manage patches, or interact with the same system interfaces as RMM agents. It reads hardware sensor data via LibreHardwareMonitor, aggregates it every 60 seconds, and uploads to the cloud every 5 minutes. The agent coexists with NinjaOne, ConnectWise Automate, and Datto RMM on the same machines without conflict — different data layer, different purpose, no resource competition worth measuring.
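
The sample-every-minute, upload-every-five-minutes pattern can be sketched as a small buffer. This is not GGFix's agent code. It is an illustration of the batching idea, with readings passed in as plain dictionaries instead of being read from LibreHardwareMonitor, and with a hypothetical class name and summary format.

```python
from statistics import mean


class SampleBuffer:
    """Collects one reading per minute and releases a summary every N readings."""

    def __init__(self, samples_per_upload=5):
        self.samples_per_upload = samples_per_upload
        self._samples = []

    def add(self, reading):
        """Store one {sensor: value} reading; return a summary when the batch is full."""
        self._samples.append(reading)
        if len(self._samples) < self.samples_per_upload:
            return None
        batch, self._samples = self._samples, []
        summary = {}
        for name in sorted({key for sample in batch for key in sample}):
            values = [sample[name] for sample in batch if name in sample]
            summary[name] = {"min": min(values), "max": max(values), "avg": mean(values)}
        return summary


# Five one-minute readings of a hypothetical CPU temperature sensor
buffer = SampleBuffer()
uploads = [buffer.add({"cpu_c": temp}) for temp in (60, 62, 65, 63, 70)]
print(uploads[-1])
```

Keeping the per-minute maximum in the summary means a short spike still reaches the dashboard even though the upload happens only every five minutes.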

The positioning matters here: dedicated hardware monitoring does not replace your RMM. Your RMM handles patch management, remote access, scripting, and ticketing. The hardware monitor handles the 80 to 120 sensor channels your RMM agent cannot read. Stack them: they solve different problems.

For MSPs managing larger fleets, our breakdown of remote hardware monitoring for MSP clients covers deployment workflows and how sensor data integrates with existing alert pipelines.

What to Look for in a Dedicated Hardware Monitoring Tool

If you are evaluating dedicated hardware monitoring for your MSP, five criteria separate tools that actually fill the gap from tools that replicate what your RMM already does.

1. Sensor library depth. The tool should use LibreHardwareMonitor or an equivalent library that reads hardware sensors directly, not via WMI. Confirm it reads GPU hotspot, VRAM temp, VRM temperature, per-fan RPM, NVMe health log, and SMART raw attributes — not just the headline metrics.

2. Per-machine pricing. Tools priced per technician or per site do not scale cleanly for MSPs with varied client sizes. Per-machine pricing (GGFix charges 89 DKK per machine per month, approximately 13 USD) gives you predictable costs proportional to actual fleet size.

3. Alert latency. A polled component script that runs every 30 minutes can miss a thermal event that spikes and recovers in 20 minutes, because the whole event may fall between two polls. Real-time or near-real-time sensor streaming is the standard that matters for catching failures before they complete: 1-minute polling at the agent, with alerts firing on threshold breach rather than waiting for the next cycle.

4. Fleet-level visibility. Individual machine dashboards are table stakes. The value for MSPs comes from fleet-wide views: how many machines across all clients have GPU temperatures above 85°C right now, which machines have rising reallocated sector counts, which fans are trending toward bearing failure. Our analysis of monitoring stacks that scale to 50+ machines covers how this fleet visibility changes the economics of proactive IT.

5. Agent footprint and deployment simplicity. If onboarding a new client requires half a day of agent configuration, the tool does not scale. A hardware monitoring agent should install silently, require no manual sensor configuration, and report to the dashboard within minutes of deployment.
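
The fleet-wide questions in the fourth criterion reduce to simple filters once every machine reports the same fields. A minimal sketch, assuming a hypothetical `MachineSnapshot` record; the field names, client names, and readings are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MachineSnapshot:
    client: str
    hostname: str
    gpu_temp_c: Optional[float]   # None when the machine reports no GPU sensor
    reallocated_now: int          # SMART attribute 5 raw value, latest reading
    reallocated_previous: int     # same attribute at the previous check


def hot_gpus(fleet, limit_c=85.0):
    """Machines whose GPU temperature is above the limit right now."""
    return [m for m in fleet if m.gpu_temp_c is not None and m.gpu_temp_c > limit_c]


def rising_reallocations(fleet):
    """Machines whose reallocated sector count grew since the previous check."""
    return [m for m in fleet if m.reallocated_now > m.reallocated_previous]


fleet = [
    MachineSnapshot("Studio A", "render-01", 91.0, 0, 0),
    MachineSnapshot("Studio A", "render-02", 72.5, 47, 12),
    MachineSnapshot("Firm B", "front-desk", None, 0, 0),
]

print([m.hostname for m in hot_gpus(fleet)])
print([m.hostname for m in rising_reallocations(fleet)])
```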

Frequently Asked Questions

Does NinjaOne monitor CPU temperature?

NinjaOne monitors CPU temperature as an aggregate metric and can alert when it exceeds a configured threshold. It does not expose per-core temperature data and does not read GPU temperature, VRM temperature, or NVMe SSD temperature through the RMM agent. NinjaOne's own documentation addresses GPU temperature by directing users to Windows Task Manager — a manual check, not a remote monitoring capability.

Can ConnectWise Automate alert on GPU overheating?

Not natively. ConnectWise Automate reads data from the motherboard sensor chip, which exposes CPU temperatures, fan class status, and voltage readings. GPU temperature data is not accessible through this channel — the GPU thermal sensors are on a separate bus that standard agent polling does not reach. ConnectWise's documentation acknowledges this limitation explicitly.

What is the difference between RMM monitoring and hardware monitoring?

RMM monitoring tracks endpoint availability, resource utilization, and software state — designed to manage and support machines remotely. Hardware monitoring reads physical sensor channels: temperatures, voltages, fan speeds, drive health attributes. These are complementary, not competing. An RMM tells you a machine is online and its disk is 80% full. A hardware monitor tells you that disk has 47 reallocated sectors, the CPU fan is running at 400 RPM, and the GPU hotspot is 94°C.

How do MSPs monitor GPU temperature on client machines?

The practical approach is a dedicated hardware monitoring agent using LibreHardwareMonitor or equivalent, which reads GPU sensor data directly and transmits it to a cloud dashboard with configurable alerts. This runs alongside existing RMM agents without conflict. The alternative — scripted solutions or manual checks — does not scale across a client fleet and lacks the trend data needed for early failure detection.

Is hardware monitoring included in NinjaOne, Datto, or ConnectWise pricing?

Basic hardware inventory — CPU model, RAM capacity, drive presence — is included. Thermal sensor monitoring, fan speed tracking, voltage monitoring, and deep SMART attribute collection are not included in any of the three platforms at the level required for predictive hardware management. These capabilities require either custom scripting effort within the RMM (Datto ComStore components, ConnectWise monitors with smartctl.exe) or a dedicated hardware monitoring layer.

What happens to SMART data when a drive is actually failing?

The SMART overall health flag (PASSED/FAILED) often stays PASSED until catastrophic failure is imminent. Google's study of drive failures found that 56% of failed drives showed no counts in the strongest SMART failure signals beforehand. The predictive value comes from raw attribute trends, specifically Attribute 5 (reallocated sectors), Attribute 197 (current pending sectors), and Attribute 198 (uncorrectable sectors). A drive with zero reallocated sectors today and 47 reallocated sectors next month is actively failing, regardless of what the health flag reports.
