# Taints, Tolerations, Node Selector & Node Affinity in Kubernetes
## 🧠 Problem-Driven Storyline: Smarter Scheduling with Rules

In a multi-node Kubernetes cluster, controlling which Pods run on which nodes is crucial for optimal resource usage, security, and performance. But out of the box, Kubernetes treats all nodes equally unless we give it some hints.

Let's walk through how we can guide Kubernetes toward smarter scheduling decisions: from rejecting unwanted Pods to steering preferred ones with pinpoint control.
## 1️⃣ Taints & Tolerations: Making Nodes Say "No!"

### 🤔 The Problem

What if a node is reserved for special workloads? How can we ensure that only specific Pods land there and others stay away?

### 🛠️ The Solution: Taints & Tolerations

- A taint is applied to a node. It says: "Don't schedule any Pod here unless it tolerates me!"
- A toleration is added to a Pod. It says: "It's okay, I can live with that taint."
### 📌 Taint Syntax

```bash
kubectl taint node ibtisam-worker flower=rose:NoSchedule
```

### 📌 Remove the Taint

```bash
kubectl taint node ibtisam-worker flower=rose:NoSchedule-
```

### 📌 Inspect Taints on a Node

```bash
kubectl describe node ibtisam-control-plane | grep -i taint -5
```
### ⚙️ Taint Effects

- `NoSchedule`: Pods that don't tolerate this taint will not be scheduled on the node.
- `PreferNoSchedule`: The scheduler tries to avoid the tainted node, but placement there is not ruled out.
- `NoExecute`: New Pods are not scheduled, and existing Pods without a matching toleration are evicted from the node.
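With `NoExecute`, a Pod can also declare how long it is willing to remain on a tainted node before eviction, via `tolerationSeconds`. A minimal sketch (the Pod name and the taint key `gpu=nvidia` are assumptions for illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: timed-tol-po   # hypothetical name for illustration
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "nvidia"
    effect: "NoExecute"
    tolerationSeconds: 300   # evicted 5 minutes after the taint appears
```

Without `tolerationSeconds`, a Pod tolerating a `NoExecute` taint stays on the node indefinitely.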
### 🧪 YAML Example: Tolerated vs Non-Tolerated

```yaml
# Plain pod (no toleration)
apiVersion: v1
kind: Pod
metadata:
  name: plain-po
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
```

```yaml
# Pod that tolerates the taint flower=rose:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: tol-po
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  tolerations:
  - key: "flower"
    operator: "Equal"
    value: "rose"
    effect: "NoSchedule"
```
### 🤯 Problem Solved?

Yes: unwanted Pods are repelled from tainted nodes.

### ❗ Still a Problem

The wanted Pod (with the toleration) could be scheduled anywhere else too, not necessarily on the desired node. A toleration only permits scheduling onto the tainted node; it does not require it.
## 2️⃣ Labels, NodeSelector, and Directed Scheduling

### ⚠️ The Next Problem

We want not only to tolerate a node's taint, but also to target that specific node.

### ✅ The Solution: Labels + nodeSelector

- Labels are key-value pairs that we can apply to various Kubernetes resources (nodes, Pods, Services, etc.).
- nodeSelector schedules a Pod only onto nodes that carry the specified labels.

### 🏷️ Label the Node

```bash
kubectl label node ibtisam-worker2 cpu=large
```
### 🧪 YAML: nodeSelector

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nodeselector-po
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  nodeSelector:
    cpu: large
```
### ⚠️ Limitations of nodeSelector

- Only equality-based matching (`key = value`).
- No advanced operators like `In`, `NotIn`, or `Exists`.
- Cannot express multiple complex conditions.
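One thing nodeSelector can do: listing several labels already gives a logical AND, since the node must carry all of them. A sketch of a Pod spec fragment (the `disktype` label is an assumption for illustration):

```yaml
# Fragment of a Pod spec: the node must carry BOTH labels.
spec:
  nodeSelector:
    cpu: large
    disktype: ssd
```

What it cannot do is anything beyond exact equality, which is where node affinity comes in.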
## 3️⃣ Node Affinity: Advanced Scheduling Logic

### 🚀 The Upgrade from nodeSelector

Node Affinity lets you define more expressive rules using match expressions with multiple operators.

### 💡 Two Types (as of now)

- `requiredDuringSchedulingIgnoredDuringExecution`: hard rule. The Pod must match, or it won't be scheduled.
- `preferredDuringSchedulingIgnoredDuringExecution`: soft rule. The scheduler tries to match, but can fall back to other nodes.
### 🧪 Hard Node Affinity (Required)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ha-po
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
            - hdd
```
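Besides `In`, match expressions support the operators `NotIn`, `Exists`, `DoesNotExist`, `Gt`, and `Lt`. A sketch combining two of them (the label keys are assumptions for illustration):

```yaml
# Fragment of a Pod's affinity section.
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - key: disktype
        operator: NotIn      # avoid nodes labeled disktype=hdd
        values:
        - hdd
      - key: gpu
        operator: Exists     # node must carry a gpu label (any value)
```

Note that `Exists` and `DoesNotExist` take no `values` list; they test only for the presence or absence of the key.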
### 🧪 Soft Node Affinity (Preferred)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sa-po
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
```
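The `weight` (1 to 100) matters once several preferences are listed: for each candidate node, the scheduler adds up the weights of the preferences that match and favors the highest total. A sketch with two weighted preferences (the label keys are assumptions for illustration):

```yaml
# Fragment: prefer ssd nodes strongly, frontend-tier nodes mildly.
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 80
  preference:
    matchExpressions:
    - key: disktype
      operator: In
      values:
      - ssd
- weight: 20
  preference:
    matchExpressions:
    - key: tier
      operator: In
      values:
      - frontend
```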
### 🧪 Combo: Toleration + Node Affinity

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tol-ha-po
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  tolerations:
  - key: "flower"
    operator: "Equal"
    value: "rose"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
```
## 4️⃣ Multi-Taint & Multi-Label Logic

### 🧪 Multiple Taints on a Node

```bash
kubectl taint nodes node1 env=prod:NoSchedule
kubectl taint nodes node1 gpu=nvidia:NoExecute
```

A Pod must tolerate every taint on a node to be scheduled (and remain) there; tolerating just one of the two above is not enough.

### 🧪 Multiple Labels on a Node

```bash
kubectl label nodes node1 env=prod disktype=ssd tier=frontend
```
Multiple expressions inside a single `matchExpressions` list are combined with logical AND: all must be satisfied. (Multiple entries under `nodeSelectorTerms`, by contrast, are ORed.)

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: env
          operator: In
          values:
          - prod
        - key: disktype
          operator: In
          values:
          - ssd
```
## 5️⃣ Labels vs. Taints: Key Distinction
| Feature | Labels | Taints & Tolerations |
|---|---|---|
| Applied To | Any resource | Only nodes |
| Purpose | Selection / Match | Restriction / Repelling |
| Pod Role | selector/affinity | toleration |
| Enforcement | Soft (opt-in) | Hard (opt-out) |
Labels are universal selectors. Taints are strict gatekeepers on nodes.
### 🧪 YAML Lab: Multi-Taint + Multi-Affinity Example

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: multi-constraint-po
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  tolerations:
  - key: "env"
    operator: "Equal"
    value: "prod"
    effect: "NoSchedule"
  - key: "gpu"
    operator: "Exists"
    effect: "NoExecute"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
          - key: tier
            operator: In
            values:
            - frontend
```
## ✅ Summary: Dot-Connecting Review

- Taints repel Pods; tolerations let Pods tolerate taints.
- Labels attract Pods via nodeSelector or nodeAffinity.
- nodeSelector is basic; nodeAffinity is expressive.
- You can combine a toleration with affinity to precisely target nodes.
- With multiple taints or label requirements, every condition must be satisfied.

This layering gives you surgical scheduling control for real-world production environments.
### 🧪 Bonus: Pod Anti-Affinity

This Deployment uses podAntiAffinity with `topologyKey: "kubernetes.io/hostname"` so that no two `app: store` replicas are scheduled onto the same node.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:          # matchLabels are key-value pairs.
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:   # key "app", operator "In", values ["store"]
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2-alpine
```