
K8s by Example: Taints & Tolerations

Taints mark nodes to repel Pods; tolerations allow Pods to schedule onto tainted nodes. Unlike affinity (which attracts Pods to nodes), taints work by repulsion. Use them for dedicated nodes, GPU nodes, node maintenance, and keeping general workloads off specialized hardware.

terminal

Three taint effects: NoSchedule (new Pods won’t be scheduled), PreferNoSchedule (a soft preference to avoid the node), NoExecute (existing Pods are evicted and new Pods won’t be scheduled). A taint consists of a key, an optional value, and an effect.

kubectl taint nodes worker-1 dedicated=gpu:NoSchedule

kubectl taint nodes worker-2 preferred=lowpriority:PreferNoSchedule

kubectl taint nodes worker-3 maintenance=true:NoExecute
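
For reference, this is how the first taint above is stored in the Node object (a sketch of the relevant part of .spec on worker-1; taints are normally managed with kubectl taint rather than by editing the Node directly):

spec:
  taints:
    - key: dedicated
      value: gpu
      effect: NoSchedule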
toleration-exact.yaml

The Equal operator requires an exact key and value match. This Pod can run on nodes carrying the dedicated=gpu:NoSchedule taint; without the toleration, the scheduler would refuse to place it on them.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
  containers:
    - name: ml-training
      image: ml-trainer:v1
      resources:
        limits:
          nvidia.com/gpu: 1
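
A Pod without this toleration stays Pending, and the scheduler records the reason as an event (a sketch with a hypothetical Pod name; the exact message wording varies by Kubernetes version):

$ kubectl describe pod no-toleration | grep FailedScheduling
  Warning  FailedScheduling  0/3 nodes are available:
  1 node(s) had untolerated taint {dedicated: gpu}, ...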
toleration-exists.yaml

The Exists operator matches any taint with the specified key, regardless of value. Useful when you don’t care about the specific taint value; omit value when using Exists. The two bare snippets after the Pod go further: an Exists toleration with no key tolerates every taint with the given effect, and one with neither key nor effect tolerates everything.

apiVersion: v1
kind: Pod
metadata:
  name: tolerate-any-dedicated
spec:
  tolerations:
    - key: "dedicated"
      operator: "Exists"
      effect: "NoSchedule"
  containers:
    - name: app
      image: my-app:v1
---
tolerations:
  - operator: "Exists"
    effect: "NoSchedule"
---
tolerations:
  - operator: "Exists"
toleration-seconds.yaml

tolerationSeconds specifies how long a Pod may remain bound to a node after a matching NoExecute taint is applied, which is useful for graceful migration during maintenance. Without any toleration, Pods are evicted immediately; with a toleration but no tolerationSeconds, they stay indefinitely.

apiVersion: v1
kind: Pod
metadata:
  name: graceful-eviction
spec:
  tolerations:
    - key: "node.kubernetes.io/not-ready"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 300
    - key: "node.kubernetes.io/unreachable"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 300
  containers:
    - name: app
      image: my-app:v1
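
The same mechanism works for your own NoExecute taints. A minimal sketch, reusing the maintenance=true:NoExecute taint from above and an assumed 10-minute grace period: the Pod keeps running for up to 600 seconds after the taint lands, then is evicted.

tolerations:
  - key: "maintenance"
    operator: "Equal"
    value: "true"
    effect: "NoExecute"
    tolerationSeconds: 600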
dedicated-nodes.yaml

Dedicated nodes pattern: taint nodes for specific teams or workloads. Only Pods with a matching toleration can run there, and node affinity steers them onto those nodes. This prevents noisy neighbors and enables cost allocation.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: team-alpha-app
spec:
  selector:
    matchLabels:
      app: team-alpha-app
  template:
    metadata:
      labels:
        app: team-alpha-app
    spec:
      tolerations:
        - key: "team"
          operator: "Equal"
          value: "alpha"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: team
                    operator: In
                    values: ["alpha"]
      containers:
        - name: app
          image: team-alpha-app:v1
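
For this to work, the team’s nodes need both the taint (to repel other workloads) and a matching label (for the affinity to target). A sketch, assuming a hypothetical node named worker-4:

$ kubectl taint nodes worker-4 team=alpha:NoSchedule
node/worker-4 tainted

$ kubectl label nodes worker-4 team=alpha
node/worker-4 labeled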
maintenance-drain.yaml

Maintenance workflow: taint node with NoExecute to evict workloads. DaemonSets and critical Pods need tolerations to survive. After maintenance, remove the taint to allow scheduling again.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      tolerations:
        - key: "maintenance"
          operator: "Exists"
          effect: "NoExecute"
        - key: "node-role.kubernetes.io/control-plane"
          operator: "Exists"
          effect: "NoSchedule"
      containers:
        - name: node-exporter
          image: prom/node-exporter:v1.7.0
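
A minimal maintenance round, assuming worker-3 from the earlier example: apply the NoExecute taint to evict everything that doesn’t tolerate it, do the work, then remove the taint with a trailing minus.

$ kubectl taint nodes worker-3 maintenance=true:NoExecute
node/worker-3 tainted

$ kubectl taint nodes worker-3 maintenance=true:NoExecute-
node/worker-3 untainted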
built-in-taints.yaml

Kubernetes adds taints automatically for node conditions such as not-ready and unreachable. Default tolerations for these taints give Pods 5 minutes on a failing node before eviction; override them with your own tolerations (as below) to tune that window. These built-in taints let node failures be handled gracefully.

tolerations:
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 300
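
What the automatic taint looks like on a node that has gone NotReady (a sketch with a hypothetical node name; the node controller adds and removes this taint itself, so you never set it manually):

$ kubectl describe node worker-5 | grep Taints
Taints:             node.kubernetes.io/not-ready:NoExecute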
terminal

Dedicated GPU nodes for LLM inference: taint GPU nodes to prevent regular workloads from scheduling there. Also label nodes for affinity targeting. Taints repel, labels attract.

$ kubectl taint nodes gpu-node-1 nvidia.com/gpu=true:NoSchedule
node/gpu-node-1 tainted

$ kubectl taint nodes gpu-node-2 nvidia.com/gpu=true:NoSchedule
node/gpu-node-2 tainted

$ kubectl label nodes gpu-node-1 gpu-node-2 accelerator=nvidia-a100
node/gpu-node-1 labeled
node/gpu-node-2 labeled

Verify taints are applied. GPU nodes now reject Pods without matching tolerations. Regular workloads automatically schedule on CPU-only nodes.

$ kubectl get nodes -L accelerator
NAME         STATUS   ROLES    ACCELERATOR
cpu-node-1   Ready    <none>
cpu-node-2   Ready    <none>
gpu-node-1   Ready    <none>   nvidia-a100
gpu-node-2   Ready    <none>   nvidia-a100

$ kubectl describe node gpu-node-1 | grep Taints
Taints:             nvidia.com/gpu=true:NoSchedule
llm-inference-deployment.yaml

LLM inference Deployment with a GPU toleration. The toleration allows scheduling on the tainted GPU nodes; without it, the scheduler will not place the Pods there.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: llm-inference
  template:
    metadata:
      labels:
        app: llm-inference
    spec:
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"

Combine toleration with nodeAffinity to ensure Pods only run on GPU nodes. Toleration permits GPU nodes; affinity requires them. Request nvidia.com/gpu resource for actual GPU allocation.

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: accelerator
                    operator: In
                    values: ["nvidia-a100"]
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          resources:
            limits:
              nvidia.com/gpu: 1
          ports:
            - containerPort: 8000
terminal

Verify placement: LLM Pods run on GPU nodes, regular Pods without tolerations land on CPU nodes. Taints protect expensive GPU capacity from non-GPU workloads.

$ kubectl get pods -o wide
NAME                   READY   NODE
llm-inference-abc12    1/1     gpu-node-1
llm-inference-def34    1/1     gpu-node-2
web-frontend-xyz99     1/1     cpu-node-1

$ kubectl run test --image=nginx
pod/test created

$ kubectl get pod test -o wide
NAME   READY   STATUS    NODE
test   1/1     Running   cpu-node-1
terminal

Manage taints with kubectl. Add taints to repel Pods; remove a taint by repeating it with a trailing minus. View taints in the node description or in kubectl get nodes output.

$ kubectl taint nodes worker-1 dedicated=gpu:NoSchedule
node/worker-1 tainted

$ kubectl taint nodes worker-2 team=alpha:NoSchedule
node/worker-2 tainted

$ kubectl describe node worker-1 | grep -A3 Taints
Taints:             dedicated=gpu:NoSchedule

$ kubectl get nodes -o custom-columns=\
NAME:.metadata.name,TAINTS:.spec.taints
NAME       TAINTS
worker-1   [map[effect:NoSchedule key:dedicated value:gpu]]
worker-2   [map[effect:NoSchedule key:team value:alpha]]

$ kubectl taint nodes worker-1 dedicated=gpu:NoSchedule-
node/worker-1 untainted
