How the Kubernetes Scheduler Actually Picks a Node

— ny_wk

When you run kubectl apply and a pod springs to life on some node, a quiet piece of the control plane made that call: the kube-scheduler. It is one of those components you never think about until a pod is stuck in Pending and you are staring at it wondering why. Here is what it actually does, in plain terms.

The one job it owns

The scheduler has a narrow, important responsibility: decide which node a new pod should run on. That is it. It does not start containers, pull images, or babysit anything afterward — that is the kubelet's job on each node. The scheduler just watches for pods that have no node assigned and binds each one to the best available node.

So the lifecycle looks like this:

1. You create a pod (directly, or via a Deployment).
2. The scheduler notices a pod whose nodeName is empty.
3. It picks a node and writes that binding back to the API server.
4. The kubelet on that node sees the pod is now its problem and actually runs it.
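
The handoff above can be sketched as a toy loop. This is only an illustration, not the scheduler's real code: the Pod class, pick_node, and schedule_pending are hypothetical names, and setting node_name stands in for the binding that gets written to the API server.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pod:
    name: str
    node_name: Optional[str] = None  # empty until the scheduler binds the pod


def pick_node(pod: Pod, nodes: List[str]) -> Optional[str]:
    # Placeholder decision (hypothetical): the real scheduler filters and scores here.
    return nodes[0] if nodes else None


def schedule_pending(pods: List[Pod], nodes: List[str]) -> List[Pod]:
    """Bind every pod that has no node yet; return the pods bound in this pass."""
    bound = []
    for pod in pods:
        if pod.node_name is not None:
            continue  # already assigned: the kubelet on that node owns it now
        choice = pick_node(pod, nodes)
        if choice is None:
            continue  # no node available: the pod stays Pending
        pod.node_name = choice  # stands in for writing the binding to the API server
        bound.append(pod)
    return bound


pods = [Pod("web-1"), Pod("web-2", node_name="node-b")]
newly_bound = schedule_pending(pods, ["node-a", "node-b"])
print([(p.name, p.node_name) for p in newly_bound])  # [('web-1', 'node-a')]
```

Note that web-2 is skipped entirely: once a pod has a node, the scheduler is done with it.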

How it chooses — filter, then score

The decision happens in two phases, and understanding them explains almost every "why is my pod Pending?" mystery.

1. Filtering (can this node even run it?)

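The scheduler throws out every node that cannot host the pod. A node gets filtered out if it does not have enough unreserved CPU or memory for the pod's requests, if it does not match the pod's nodeSelector or required affinity rules, if it carries a taint the pod does not tolerate, or if a required volume cannot attach there. "Unreserved" means the node's allocatable capacity minus the requests of pods already assigned to it, not what those pods happen to be using right now. What survives is the list of feasible nodes.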
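
A rough sketch of the filtering idea, assuming a simplified model: the Node and Pod classes, their field names, and rejection_reason are invented for illustration, taints are reduced to plain strings, and the volume check is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    name: str
    free_cpu_m: int   # allocatable CPU minus requests already placed, in millicores
    free_mem_mi: int  # same idea for memory, in MiB
    labels: Dict[str, str] = field(default_factory=dict)
    taints: Set[str] = field(default_factory=set)


@dataclass
class Pod:
    name: str
    cpu_m: int   # CPU request in millicores
    mem_mi: int  # memory request in MiB
    node_selector: Dict[str, str] = field(default_factory=dict)
    tolerations: Set[str] = field(default_factory=set)


def rejection_reason(pod: Pod, node: Node) -> Optional[str]:
    """Return why the node cannot host the pod, or None if it is feasible."""
    if pod.cpu_m > node.free_cpu_m:
        return "insufficient cpu"
    if pod.mem_mi > node.free_mem_mi:
        return "insufficient memory"
    if any(node.labels.get(k) != v for k, v in pod.node_selector.items()):
        return "node selector mismatch"
    if node.taints - pod.tolerations:
        return "untolerated taint"
    return None


def feasible_nodes(pod: Pod, nodes: List[Node]) -> List[Node]:
    return [n for n in nodes if rejection_reason(pod, n) is None]


nodes = [
    Node("small", 500, 1024, {"disk": "ssd"}),
    Node("big", 4000, 8192, {"disk": "hdd"}),
    Node("gpu", 4000, 8192, {"disk": "ssd"}, {"gpu-only"}),
    Node("ok", 2000, 4096, {"disk": "ssd"}),
]
pod = Pod("api", 1000, 2048, {"disk": "ssd"})
print([n.name for n in feasible_nodes(pod, nodes)])  # ['ok']
```

Each rejected node fails for a different reason here, which mirrors how a single Pending pod can be blocked by several unrelated rules at once.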

2. Scoring (which feasible node is best?)

Among the survivors, the scheduler gives each node a score. Depending on how it is configured, scoring can favor spreading pods across nodes or packing them tightly, and it honors soft preferences such as preferred affinity rules. The highest-scoring node wins, and the pod is bound there.
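
To make scoring concrete, here is a sketch of one possible scoring rule that prefers the node with the most headroom left after placement. The formula, the Node fields, and the names spread_score and pick_best are assumptions for illustration; the real scheduler combines several scoring plugins with their own formulas and weights.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    alloc_cpu_m: int   # total allocatable CPU in millicores
    alloc_mem_mi: int  # total allocatable memory in MiB
    used_cpu_m: int    # sum of CPU requests already placed
    used_mem_mi: int   # sum of memory requests already placed


def spread_score(node: Node, cpu_m: int, mem_mi: int) -> int:
    """Score 0..100; higher means more headroom left after placing the pod."""
    cpu_free = (node.alloc_cpu_m - node.used_cpu_m - cpu_m) / node.alloc_cpu_m
    mem_free = (node.alloc_mem_mi - node.used_mem_mi - mem_mi) / node.alloc_mem_mi
    return round(100 * (cpu_free + mem_free) / 2)


def pick_best(nodes: List[Node], cpu_m: int, mem_mi: int) -> Node:
    # Ties go to the first node in the list, an arbitrary choice for this sketch.
    return max(nodes, key=lambda n: spread_score(n, cpu_m, mem_mi))


busy = Node("busy", 4000, 8192, used_cpu_m=3000, used_mem_mi=6144)
idle = Node("idle", 4000, 8192, used_cpu_m=1000, used_mem_mi=2048)

# Both nodes can fit a pod requesting 1000m CPU and 2048Mi memory.
print(spread_score(busy, 1000, 2048))          # 0
print(spread_score(idle, 1000, 2048))          # 50
print(pick_best([busy, idle], 1000, 2048).name)  # idle
```

A packing-oriented rule would simply invert the preference and reward the node that ends up fullest.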

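If zero nodes survive filtering, the pod stays Pending, and that is your signal to check requests, taints, and affinity rules. The FailedScheduling event in the output of kubectl describe pod summarizes how many nodes failed each check.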

Why this matters in real life

Most scheduling headaches trace back to this model:

Pod stuck Pending: usually no node has enough requested CPU/memory, or a selector/taint rules them all out.
Everything piling on one node: often missing resource requests, so the scheduler cannot reason about capacity.
Pods not landing where you expect: check nodeSelector, node/pod affinity, and tolerations.

Set sensible CPU and memory requests on your pods. That single habit gives the scheduler the information it needs and prevents most of these problems.
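
A toy model of the capacity check shows why requests matter. It assumes a single node tracked only by unreserved CPU; the place function is hypothetical and ignores memory, scoring, and everything else the scheduler considers.

```python
from typing import List


def place(free_cpu_m: int, requests: List[int]) -> int:
    """Offer pods to one node in order; return how many the capacity check accepts."""
    accepted = 0
    for req in requests:
        if req <= free_cpu_m:
            free_cpu_m -= req  # reserve the requested CPU on the node
            accepted += 1
    return accepted


# Twenty pods with no CPU request: the check never sees the node fill up.
print(place(1000, [0] * 20))    # 20

# The same pods requesting 250m each: the node is full after four.
print(place(1000, [250] * 20))  # 4
```

With no requests, the bookkeeping never changes, so nothing in the capacity check stops pods from piling onto the same node.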

Key takeaways

The kube-scheduler's only job is binding unscheduled pods to nodes — not running them.
It works in two passes: filter out unfit nodes, then score the rest and pick the best.
A pod that stays Pending with no node assigned almost always means filtering eliminated every node.
Accurate resource requests are the key to predictable scheduling.

Frequently asked questions

Does the scheduler run the container?

No. It only assigns the pod to a node. The kubelet on that node pulls images and runs the containers.

Why is my pod stuck in Pending?

No node passed the filtering phase — typically not enough requested CPU/memory, or a nodeSelector, affinity rule, or taint excluded every node.

Can I influence which node a pod lands on?

Yes. Use nodeSelector, node/pod affinity and anti-affinity, taints and tolerations, and resource requests. For special cases you can also run additional schedulers and pick one per pod with the pod's schedulerName field.

What happens if two pods want the same node?

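Each is scheduled independently, one at a time. The scheduler re-evaluates capacity for every pod, so the second decision already accounts for the first placement. If the node no longer fits, the second pod goes to another feasible node or stays Pending.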
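
A small sketch of that one-at-a-time behavior, under simplifying assumptions: only CPU is tracked, the node with the most unreserved CPU is preferred, and schedule_in_order is a hypothetical helper rather than anything from Kubernetes.

```python
from typing import Dict, List, Optional, Tuple


def schedule_in_order(
    pods: List[Tuple[str, int]], free_cpu_m: Dict[str, int]
) -> Dict[str, Optional[str]]:
    """Place pods one by one, updating unreserved CPU after each placement."""
    free = dict(free_cpu_m)  # work on a copy so the caller's data is untouched
    placement: Dict[str, Optional[str]] = {}
    for name, req in pods:
        candidates = [n for n, cpu in free.items() if cpu >= req]
        if not candidates:
            placement[name] = None  # no feasible node: the pod stays Pending
            continue
        best = max(candidates, key=lambda n: free[n])
        free[best] -= req  # the next pod sees the reduced capacity
        placement[name] = best
    return placement


result = schedule_in_order(
    [("a", 600), ("b", 600), ("c", 600)],
    {"node-1": 1000, "node-2": 700},
)
print(result)  # {'a': 'node-1', 'b': 'node-2', 'c': None}
```

Pods a and b would both have preferred node-1, but once a is placed there, only node-2 still fits b, and nothing fits c.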

The scheduler feels like magic until you see the filter-then-score logic — then "why did it land there?" turns into a question you can actually answer.