VMware ESXi Host Disconnected or Not Responding: How to Fix It

— ny_wk

Few things spike a sysadmin's heart rate like an ESXi host turning red in vCenter and showing "Not Responding" or "Disconnected," especially when VMs are still serving traffic on it. The good news: in most cases the VMs are fine and only the management layer is stuck. Here's how to diagnose it calmly and bring the host back, working from the quickest, least disruptive fix to the most drastic.

First, understand what "disconnected" means

Two management agents run on each host: vpxa (the vCenter agent), which vCenter talks to directly, and hostd (the host daemon), which vpxa passes requests to. When the host shows "Not Responding," it usually means vCenter has lost contact with those agents or stopped receiving the host's heartbeats, not that the host or its VMs are down. So step one is: don't panic, and don't power-cycle a host with running VMs unless you've ruled everything else out.

Step 1 — Check whether the host and VMs are actually alive

Can you ping the host's management IP?
Are the VMs still reachable on the network (serving traffic)?
Can you reach the host directly via the ESXi web UI or SSH?

If VMs are up and you can reach the host directly, it's almost certainly a management-agent or vCenter-connection issue — the safe, common case.
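
The host-side checks above can be scripted from an admin workstation. What follows is a minimal sketch, not a definitive tool: it assumes a Linux-style ping (-c, -W) and an nc that supports -z and -w, and the function names are made up for illustration. It does not test the VMs themselves, since what counts as "serving traffic" depends on the workload.

```shell
#!/bin/sh
# probe_tcp HOST PORT: succeed if the TCP port accepts a connection within 3 s.
# Assumes an nc that supports -z (connect only) and -w (timeout).
probe_tcp() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# classify_host PING HTTPS SSH
# Each argument is the exit status of one probe (0 = reachable).
classify_host() {
    if [ "$2" -eq 0 ] || [ "$3" -eq 0 ]; then
        # A management service answers, so the host itself is up.
        echo "host alive: suspect the management agents or the vCenter connection"
    elif [ "$1" -eq 0 ]; then
        echo "host pings but no management service answers: check hostd from the console"
    else
        echo "host unreachable: check network, power, and the console for a PSOD"
    fi
}

# check_host ADDRESS: run the three probes and print a verdict.
check_host() {
    p=0; h=0; s=0
    ping -c 2 -W 2 "$1" >/dev/null 2>&1 || p=$?
    probe_tcp "$1" 443 || h=$?   # ESXi web UI
    probe_tcp "$1" 22 || s=$?    # SSH, often disabled by default
    classify_host "$p" "$h" "$s"
}
```

Call it as check_host 192.0.2.10, where the address is a placeholder for the host's management IP. A closed SSH port alone proves nothing, because SSH is commonly left disabled on ESXi.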

Step 2 — Try a simple reconnect in vCenter

Right-click the host → Connection → Reconnect. A transient network blip often resolves with this alone. If it reconnects, you're done.

Step 3 — Restart the management agents (the usual fix)

This restarts hostd/vpxa without touching running VMs. Two ways:

Via DCUI (console/iLO/iDRAC): log in → Troubleshooting Options → Restart Management Agents → confirm.
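Via SSH (if enabled): run /etc/init.d/hostd restart and /etc/init.d/vpxa restart, or restart every management service at once with services.sh restart. The second option is heavier and takes longer, and VMware advises against it on hosts that use LACP, so prefer restarting the two agents individually.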

Give it a minute, then reconnect the host in vCenter. This resolves the large majority of "Not Responding" cases.
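
For the SSH route, the restart and a follow-up status check can be wrapped in a small script. This is a sketch under stated assumptions: the run and restart_agents helpers and the DRY_RUN switch are illustrative names, not part of ESXi. Only the /etc/init.d/hostd and /etc/init.d/vpxa scripts come from the host itself.

```shell
#!/bin/sh
# Run on the ESXi host over SSH.
# DRY_RUN=1 (the default in this sketch) only prints each command so it can
# be reviewed first; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# restart_agents: restart hostd and vpxa, then ask each for its status.
restart_agents() {
    run /etc/init.d/hostd restart
    run /etc/init.d/vpxa restart
    run /etc/init.d/hostd status
    run /etc/init.d/vpxa status
}
```

Running restart_agents with the default settings prints the four commands. Running it with DRY_RUN=0 performs the restart, after which both status calls should report the agent as running.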

Step 4 — Check the deeper causes if it persists

Symptom	Likely cause	What to check
Host pingable, agents restart but re-disconnect	vpxa/heartbeat or vCenter-side issue	vCenter health; remove + re-add host
Host unpingable, VMs down	Network or PSOD (purple screen)	Physical NIC/switch; console for PSOD
Host slow/agents won't start	Resource exhaustion (root FS full, RAM)	`vdf -h` for full /var or scratch; logs
Storage all-paths-down	SAN/datastore connectivity	Storage paths, HBA, multipathing
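
For the resource-exhaustion row, a filter like the one below can pick the nearly full entries out of vdf -h output. It is a rough sketch: full_filesystems is a hypothetical helper, and it assumes only that usage appears as a field shaped like "NN%", which is how df-style listings normally print it.

```shell
#!/bin/sh
# full_filesystems THRESHOLD
# Reads df-style text on stdin and prints every line whose usage field is at
# or above THRESHOLD percent. It searches each line for the first field that
# looks like "NN%" instead of relying on a fixed column, because not every
# line of the listing carries a Use% column.
full_filesystems() {
    awk -v limit="$1" '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[0-9]+%$/) {
                    pct = $i
                    sub(/%/, "", pct)
                    if (pct + 0 >= limit + 0) print
                    break
                }
            }
        }'
}
```

On the host, vdf -h | full_filesystems 90 would list anything at 90% or more. No output means nothing crossed the threshold.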

Step 5 — Read the logs

On the host, the key logs are /var/log/hostd.log and /var/log/vpxa.log (and vmkernel.log for hardware/storage). A full /var or scratch partition is a classic cause of agents dying — check disk space first if restarts don't stick.
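
A quick first pass over those three logs might look like the sketch below. The keyword pattern is an assumption chosen as a starting point, not an official list of ESXi error strings, and log line formats differ between ESXi versions, so treat the counts as a hint about which file to read first.

```shell
#!/bin/sh
# Keywords worth a closer look. "no space left" targets the full-partition case.
# This pattern is an illustrative guess; tune it to what your logs contain.
PATTERN='error|fail|panic|no space left'

# count_matches: count stdin lines that match PATTERN, ignoring case.
# grep exits non-zero when nothing matches, so "|| true" keeps callers happy.
count_matches() {
    grep -ciE "$PATTERN" || true
}

# summarize_logs: report how many of the last 500 lines in each log match.
summarize_logs() {
    for f in /var/log/hostd.log /var/log/vpxa.log /var/log/vmkernel.log; do
        if [ -r "$f" ]; then
            printf '%s: %s matching lines in the last 500\n' \
                "$f" "$(tail -n 500 "$f" | count_matches)"
        else
            printf '%s: not readable\n' "$f"
        fi
    done
}
```

Run summarize_logs on the host, then open the noisiest file with tail or less to read the actual messages.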

Step 6 — Last resorts

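Disconnect the host in vCenter and connect it again; if that fails, remove and re-add it (both re-register the agents). Removing the host discards its performance history in vCenter and needs extra care if the host is on a distributed switch, so try the disconnect first.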
Only if the host is truly hung and VMs are already down/migrated: a hard reboot. Never do this lightly with running VMs.

Prevent the next 2 a.m. page

Monitor CPU, memory, datastore, and host reachability with thresholds + instant alerts, so you catch a filling /var or a flaky path before it disconnects.
Keep the scratch/log partition from filling (redirect logs to a syslog server / datastore).
Watch network and storage paths — most "mystery" disconnects trace back to one of them.
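
As one example of keeping logs off the local partition, the sketch below prints the esxcli commands that point a host at a remote syslog target and open the matching firewall ruleset. The syslog_commands helper and the example target are hypothetical. The function only prints the commands, so they can be reviewed before anything changes on the host.

```shell
#!/bin/sh
# syslog_commands LOGHOST
# Prints the commands that send the host's logs to a remote syslog target.
# LOGHOST is a URL such as udp://syslog.example.com:514 (placeholder name).
syslog_commands() {
    echo "esxcli system syslog config set --loghost='$1'"
    echo "esxcli system syslog reload"
    # Allow outbound syslog traffic through the ESXi firewall.
    echo "esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true"
    echo "esxcli network firewall refresh"
}
```

On the host, syslog_commands 'udp://syslog.example.com:514' shows the four commands, and piping that output to sh applies them.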

Key takeaways

"Not Responding" usually means vCenter lost the management agents (hostd/vpxa) — VMs are often fine.
Order of fixes: verify host/VMs alive → Reconnect → Restart Management Agents (DCUI or services.sh restart) → investigate network/storage/resources → re-add host.
A full /var or scratch partition is a top cause — check disk space and logs.
Monitoring with alerts prevents most of these from becoming outages.

Frequently asked questions

Will restarting management agents affect running VMs?

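No. Restarting hostd and vpxa doesn't stop or reboot VMs, which makes it the safe first real fix. Tasks that are in flight on that host (a migration or a backup snapshot, for example) can fail, so let them finish first if you can.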

How do I restart the agents without SSH?

Use the DCUI (console/iLO/iDRAC) → Troubleshooting Options → Restart Management Agents.

The host is unpingable and VMs are down — now what?

Check the physical console for a PSOD and inspect network/power. This is the genuine outage case; only then consider a hard reboot.

Why does this keep happening?

Recurring disconnects usually point to a full log/scratch partition, a flaky NIC/path, or resource exhaustion — fix the root cause and add monitoring.

Stay calm, confirm the VMs are actually fine, restart the management agents, and escalate to network, storage, or a reboot only if that doesn't hold. That sequence resolves the vast majority of ESXi "Not Responding" alerts.