VMware ESXi Host Disconnected or Not Responding: How to Fix It
— ny_wk

Few things spike a sysadmin's heart rate like an ESXi host going red / "Not Responding" / disconnected in vCenter — especially when VMs are still serving traffic on it. The good news: in most cases the VMs are fine and it's the management layer that's stuck. Here's how to diagnose it calmly and bring the host back, in the order that fixes it fastest.
First, understand what "disconnected" means
vCenter manages each host through two agents that run on the host: vpxa (the vCenter agent), which communicates with vCenter, and hostd (the host daemon), which vpxa hands requests to. When the host shows "Not Responding," it usually means vCenter lost contact with those agents, not that the host or its VMs are down. So step one is: don't panic, and don't power-cycle a host with running VMs unless you've ruled everything else out.
Step 1 — Check whether the host and VMs are actually alive
- Can you ping the host's management IP?
- Are the VMs still reachable on the network (serving traffic)?
- Can you reach the host directly via the ESXi web UI or SSH?
If VMs are up and you can reach the host directly, it's almost certainly a management-agent or vCenter-connection issue — the safe, common case.
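The three checks above reduce to a small decision rule. The sketch below is one way to write it down as a POSIX shell helper run from an admin workstation, not from the host. The function names `triage` and `probe` and the hostname in the usage comment are hypothetical, and the "inconclusive" branch is an assumption for combinations the checklist does not cover.

```shell
#!/bin/sh
# triage PING_OK VMS_OK DIRECT_OK
# Each argument is "yes" or "no", matching the three Step 1 questions.
triage() {
    ping_ok=$1
    vms_ok=$2
    direct_ok=$3
    if [ "$vms_ok" = yes ] && [ "$direct_ok" = yes ]; then
        # VMs serving traffic and host reachable directly: the safe, common case.
        echo "management-agent or vCenter-connection issue"
    elif [ "$ping_ok" = no ] && [ "$vms_ok" = no ]; then
        # Nothing answers: treat as a possible real outage.
        echo "possible outage: check console, network, power"
    else
        echo "inconclusive: keep investigating"
    fi
}

# probe HOST: print "yes" if a single ICMP echo is answered, otherwise "no".
probe() {
    if ping -c 1 "$1" >/dev/null 2>&1; then
        echo yes
    else
        echo no
    fi
}

# Example (hypothetical hostname; VM and direct-access answers entered by hand):
#   triage "$(probe esxi01.example.com)" yes yes
```

Keeping the decision separate from the probing makes it easy to feed in answers you gathered by hand, for example when the web UI check was done in a browser.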
Step 2 — Try a simple reconnect in vCenter
Right-click the host → Connection → Reconnect. A transient network blip often resolves with this alone. If it reconnects, you're done.
Step 3 — Restart the management agents (the usual fix)
This restarts hostd/vpxa without touching running VMs. Two ways:
- Via DCUI (console/iLO/iDRAC): log in → Troubleshooting Options → Restart Management Agents → confirm.
- Via SSH (if enabled): run /etc/init.d/hostd restart and /etc/init.d/vpxa restart, or restart all with services.sh restart.
Give it a minute, then reconnect the host in vCenter. This resolves the large majority of "Not Responding" cases.
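For the SSH route, a minimal sketch of the sequence as a script might look like the following. The `run` wrapper and the `DRY_RUN` variable are illustrative additions (not part of ESXi) so you can preview the commands before executing them. The `status` calls at the end assume the init scripts accept a `status` argument, which is worth confirming on your ESXi version.

```shell
#!/bin/sh
# run CMD...: execute the command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Restart the two management agents, then report their status.
# Running VMs are not stopped by these restarts.
restart_agents() {
    run /etc/init.d/hostd restart || return 1
    run /etc/init.d/vpxa restart || return 1
    run /etc/init.d/hostd status
    run /etc/init.d/vpxa status
}

# Preview only:   DRY_RUN=1; restart_agents
# For real:       unset DRY_RUN; restart_agents
```

Restarting the two agents individually is narrower than services.sh restart, which restarts every service on the host, so it is a reasonable first attempt.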
Step 4 — Check the deeper causes if it persists
| Symptom | Likely cause | What to check |
|---|---|---|
| Host pingable, agents restart but re-disconnect | vpxa/heartbeat or vCenter-side issue | vCenter health; remove + re-add host |
| Host unpingable, VMs down | Network or PSOD (purple screen) | Physical NIC/switch; console for PSOD |
| Host slow/agents won't start | Resource exhaustion (root FS full, RAM) | vdf -h for full /var or scratch; logs |
| Storage all-paths-down | SAN/datastore connectivity | Storage paths, HBA, multipathing |
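For the resource-exhaustion row, the output of vdf -h can be filtered so only nearly full entries stand out. This is a rough sketch that assumes the usage percentage is the fifth whitespace-separated column (as in a header like "Ramdisk Size Used Available Use% Mounted on"). The function name `over_threshold` and the 90% cutoff are arbitrary choices for illustration.

```shell
#!/bin/sh
# over_threshold PCT
# Reads vdf -h style output on stdin and prints "<name> <use%>" for every
# row whose fifth column is a percentage at or above PCT.
over_threshold() {
    awk -v limit="$1" '
        $5 ~ /^[0-9]+%$/ {
            pct = $5
            sub(/%/, "", pct)            # strip the % sign for the numeric compare
            if (pct + 0 >= limit + 0) print $1, $5
        }'
}

# On the host:
#   vdf -h | over_threshold 90
```

Header lines are skipped automatically because their fifth column is not a number followed by %.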
Step 5 — Read the logs
On the host, the key logs are /var/log/hostd.log and /var/log/vpxa.log (and vmkernel.log for hardware/storage). A full /var or scratch partition is a classic cause of agents dying — check disk space first if restarts don't stick.
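When reading those logs, a quick filter for obvious trouble can save scrolling. The helper below is a sketch only: the function name `recent_problems` is hypothetical, and the search patterns are guesses at useful keywords rather than an official list of hostd or vpxa messages.

```shell
#!/bin/sh
# recent_problems FILE [COUNT]
# Print the last COUNT (default 20) lines of FILE that match a few
# trouble keywords, case-insensitively.
recent_problems() {
    file=$1
    count=${2:-20}
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    grep -iE 'error|panic|no space left|out of memory' "$file" | tail -n "$count"
}

# On the host:
#   recent_problems /var/log/hostd.log
#   recent_problems /var/log/vpxa.log 50
```

If "no space left" shows up, go straight to the disk-space check before restarting the agents again.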
Step 6 — Last resorts
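- Remove and re-add the host in vCenter (re-registers the agents). Try a plain Disconnect followed by Connect first, since removing a host can discard its historical data and cluster-specific settings in vCenter.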
- Only if the host is truly hung and VMs are already down/migrated: a hard reboot. Never do this lightly with running VMs.
Prevent the next 2 a.m. page
- Monitor CPU, memory, datastore, and host reachability with thresholds + instant alerts, so you catch a filling /var or a flaky path before it disconnects.
- Keep the scratch/log partition from filling (redirect logs to a syslog server / datastore).
- Watch network and storage paths — most "mystery" disconnects trace back to one of them.
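Redirecting logs to a syslog server, as suggested above, can be done from the ESXi shell with esxcli. The sketch below assumes a hypothetical collector at udp://syslog.example.com:514. The `run` wrapper and `DRY_RUN` variable are illustrative helpers for previewing the commands; check the esxcli options against your ESXi version before relying on them.

```shell
#!/bin/sh
# run CMD...: execute the command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# configure_remote_syslog LOGHOST_URL
# Point the host at a remote syslog target, reload syslog, and open the
# outbound syslog firewall ruleset.
configure_remote_syslog() {
    loghost=$1
    if [ -z "$loghost" ]; then
        echo "usage: configure_remote_syslog <loghost-url>" >&2
        return 1
    fi
    run esxcli system syslog config set --loghost="$loghost" || return 1
    run esxcli system syslog reload || return 1
    run esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true || return 1
    run esxcli network firewall refresh
}

# Preview only (hypothetical collector address):
#   DRY_RUN=1; configure_remote_syslog udp://syslog.example.com:514
```

Shipping logs off the host keeps them available even when the host itself is unreachable.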
Key takeaways
- "Not Responding" usually means vCenter lost the management agents (hostd/vpxa) — VMs are often fine.
- Order of fixes: verify host/VMs alive → Reconnect → Restart Management Agents (DCUI or services.sh restart) → investigate network/storage/resources → re-add host.
- A full /var or scratch partition is a top cause — check disk space and logs.
- Monitoring with alerts prevents most of these from becoming outages.
Frequently asked questions
Will restarting management agents affect running VMs?
No — hostd/vpxa restarts don't stop or reboot VMs. It's the safe first real fix.
How do I restart the agents without SSH?
Use the DCUI (console/iLO) → Troubleshooting Options → Restart Management Agents.
The host is unpingable and VMs are down — now what?
Check the physical console for a PSOD and inspect network/power. This is the genuine outage case; only then consider a hard reboot.
Why does this keep happening?
Recurring disconnects usually point to a full log/scratch partition, a flaky NIC/path, or resource exhaustion — fix the root cause and add monitoring.
Stay calm, confirm the VMs are actually fine, restart the management agents, and only escalate to network/storage/reboot if that doesn't hold — that sequence resolves the vast majority of ESXi "Not Responding" alerts.