Taints, Tolerations, Node Selector & Node Affinity in Kubernetes

🧠 Problem-Driven Storyline: Smarter Scheduling with Rules

In a multi-node Kubernetes cluster, controlling which Pods run on which nodes is crucial for optimal resource usage, security, and performance. But out-of-the-box, Kubernetes treats all nodes equally unless we give it some hints.

Let’s journey through how we intelligently guide Kubernetes to make smarter scheduling decisions: from rejecting unwanted Pods to steering preferred ones with pinpoint control.


1️⃣ Taints & Tolerations: Making Nodes Say "No!"

🤔 The Problem

What if a node is reserved for special workloads? How can we ensure that only specific pods land there and others stay away?

πŸ› οΈ The Solution: Taints & Tolerations

  • A taint is applied to a node. It says: β€œDon’t schedule any pod here unless it tolerates me!”
  • A toleration is added to a pod. It says: β€œIt’s okay, I can live with that taint.”

📌 Taint Syntax:

kubectl taint node ibtisam-worker flower=rose:NoSchedule

πŸ” To remove the taint:

kubectl taint node ibtisam-worker flower=rose:NoSchedule-

πŸ” Inspect taints on a node:

kubectl describe node ibtisam-control-plane | grep -i taint -5

βš™οΈ Taint Effects:

  • NoSchedule: Pods that don’t tolerate this taint will not be scheduled.
  • PreferNoSchedule: The scheduler tries to avoid the tainted node, but avoidance is not guaranteed.
  • NoExecute: Existing pods without toleration will be evicted from the node.
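
For NoExecute, a toleration can additionally set tolerationSeconds: the pod may stay on the node for that many seconds after the taint appears before being evicted. A minimal sketch (the pod name is illustrative, and it assumes a node tainted flower=rose:NoExecute):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: grace-po              # illustrative name
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  tolerations:
  - key: "flower"
    operator: "Equal"
    value: "rose"
    effect: "NoExecute"
    tolerationSeconds: 60     # evicted about 60s after the taint is applied
```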

🧪 YAML Example: Tolerated vs Non-Tolerated

# Plain pod (no toleration)
apiVersion: v1
kind: Pod
metadata:
  name: plain-po
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
---
# Pod that tolerates the taint flower=rose:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: tol-po
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  tolerations:
  - key: "flower"
    operator: "Equal"
    value: "rose"
    effect: "NoSchedule"

🤯 Problem Solved?

Yes: Unwanted pods are repelled from tainted nodes.

❗ Still a Problem:

The wanted pod (with toleration) could still be scheduled anywhere else, not necessarily on the desired node.


2️⃣ Labels, NodeSelector, and Directed Scheduling

⚠️ The Next Problem

We want not only to tolerate a node’s taint but also to target that specific node.

✅ The Solution: Labels + nodeSelector

  • Labels are key-value pairs that we can apply to various Kubernetes resources (nodes, pods, services, etc.).
  • nodeSelector allows us to schedule pods on nodes that have specific labels.

🏷️ Label the Node

kubectl label node ibtisam-worker2 cpu=large

🧪 YAML: nodeSelector

apiVersion: v1
kind: Pod
metadata:
  name: nodeselector-po
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  nodeSelector:
    cpu: large

⚠️ Limitations of nodeSelector:

  • Only uses equality-based matching (key = value).
  • No advanced expressions like "In", "Exists", etc.
  • Cannot match multiple complex conditions.
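
Even within these limits, nodeSelector does accept several labels at once; they are combined with logical AND. A sketch assuming a node labeled both cpu=large and disktype=ssd:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: multi-selector-po     # illustrative name
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  nodeSelector:               # every listed label must match (AND)
    cpu: large
    disktype: ssd
```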

3️⃣ Node Affinity: Advanced Scheduling Logic

🚀 The Upgrade from nodeSelector

Node Affinity lets you define more expressive rules using match expressions with multiple operators.

💡 Two Types (as of now):

  • requiredDuringSchedulingIgnoredDuringExecution: Hard rule. The pod must match, or it won’t be scheduled.
  • preferredDuringSchedulingIgnoredDuringExecution: Soft rule. The scheduler tries to match, but can schedule elsewhere.

🧪 Hard Node Affinity (Required)

apiVersion: v1
kind: Pod
metadata:
  name: ha-po
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
            - hdd

🧪 Soft Node Affinity (Preferred)

apiVersion: v1
kind: Pod
metadata:
  name: sa-po
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
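
The weight (1–100) is added to a node’s score for every preference it matches, and the highest-scoring node wins. A sketch combining two weighted preferences (the labels are illustrative):

```yaml
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 80              # strongly prefer SSD nodes
      preference:
        matchExpressions:
        - key: disktype
          operator: In
          values: ["ssd"]
    - weight: 20              # mildly prefer the frontend tier
      preference:
        matchExpressions:
        - key: tier
          operator: In
          values: ["frontend"]
```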

🧪 Combo: Toleration + Node Affinity

apiVersion: v1
kind: Pod
metadata:
  name: tol-ha-po
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  tolerations:
  - key: "flower"
    operator: "Equal"
    value: "rose"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd

4️⃣ Multi-Taint & Multi-Label Logic

🧪 Multiple Taints on a Node

kubectl taint nodes node1 env=prod:NoSchedule
kubectl taint nodes node1 gpu=nvidia:NoExecute
A Pod must tolerate all of a node’s taints to be scheduled (or, for NoExecute, to remain) there.
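
At the other extreme, a toleration with operator: "Exists" and no key tolerates every taint; DaemonSets sometimes use this pattern to run on all nodes. Use it sparingly:

```yaml
tolerations:
- operator: "Exists"          # no key or effect: tolerates any taint
```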

🧪 Multiple Labels on a Node

kubectl label nodes node1 env=prod disktype=ssd tier=frontend

You can combine conditions with logical AND by listing multiple expressions under a single matchExpressions entry; all of them must be satisfied. (Multiple nodeSelectorTerms entries, by contrast, are ORed.)

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: env
          operator: In
          values:
          - prod
        - key: disktype
          operator: In
          values:
          - ssd
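
By contrast, listing multiple entries under nodeSelectorTerms gives logical OR: a node need only satisfy one term. A sketch:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:      # terms are ORed; one match is enough
      - matchExpressions:
        - key: env
          operator: In
          values: ["prod"]
      - matchExpressions:
        - key: disktype
          operator: In
          values: ["ssd"]
```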

5️⃣ Labels vs. Taints: Key Distinction

Feature     | Labels               | Taints & Tolerations
----------- | -------------------- | ------------------------
Applied To  | Any resource         | Only nodes
Purpose     | Selection / matching | Restriction / repelling
Pod Role    | selector / affinity  | toleration
Enforcement | Soft (opt-in)        | Hard (opt-out)

Labels are universal selectors. Taints are strict gatekeepers on nodes.


🧪 YAML Lab: Multi-Taint + Multi-Affinity Example

apiVersion: v1
kind: Pod
metadata:
  name: multi-constraint-po
spec:
  containers:
  - name: abcd
    image: busybox
    command: ["sleep", "3600"]
  tolerations:
  - key: "env"
    operator: "Equal"
    value: "prod"
    effect: "NoSchedule"
  - key: "gpu"
    operator: "Exists"
    effect: "NoExecute"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
          - key: tier
            operator: In
            values:
            - frontend

✅ Summary: Dot-Connecting Review

  • Taints repel pods, tolerations let pods tolerate taints.
  • Labels attract pods via nodeSelector or nodeAffinity.
  • nodeSelector is basic; nodeAffinity is expressive.
  • You can combine toleration + affinity to precisely target nodes.
  • Multiple taints or match expressions apply with AND semantics: every condition must be satisfied.

This layering gives you surgical scheduling control for real-world production environments.


🧪 Bonus: Pod Anti-Affinity

Affinity rules can also key off other Pods. This Deployment uses podAntiAffinity to spread its replicas: no two Pods labeled app=store may share a node (topologyKey: kubernetes.io/hostname).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:                                  # matchLabels are key-value pairs.
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:    # each expression has a key, an operator, and a values list
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2-alpine
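
On a three-node cluster, a fourth replica of this Deployment would stay Pending, because the required rule forbids co-location. A softer variant, sketched here, prefers spreading but still schedules when it must:

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values: ["store"]
        topologyKey: "kubernetes.io/hostname"
```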