K8s by Example: Graceful Shutdown

Graceful shutdown lets in-flight requests complete before a Pod terminates. When Kubernetes sends SIGTERM, the app should stop accepting new requests, finish existing ones, then exit. The terminationGracePeriodSeconds field sets the deadline for all of this. Use for: zero-downtime deployments, preventing data loss, clean connection handling.

graceful-shutdown-basic.yaml

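The terminationGracePeriodSeconds field gives your app time to shut down gracefully. The default is 30 seconds. When the period expires, Kubernetes sends SIGKILL to any container that is still running. Set it higher than your longest request plus the time any preStop hook takes, because the hook runs inside the same window.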

apiVersion: v1
kind: Pod
metadata:
  name: web-server
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: app
      image: my-app:v1
      ports:
        - containerPort: 8080

prestop-hook.yaml

A preStop hook runs before SIGTERM is sent. Use it to deregister from service discovery, drain connections, or notify load balancers. The sleep gives time for endpoints to update before shutdown begins.

apiVersion: v1
kind: Pod
metadata:
  name: web-server
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: app
      image: my-app:v1
      ports:
        - containerPort: 8080
      lifecycle:
        preStop:
          exec:
            command:
              - /bin/sh
              - -c
              - sleep 10
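
The hook's runtime counts against terminationGracePeriodSeconds, so the app's own shutdown timeout has to fit in whatever is left. A minimal sketch of that arithmetic, using the values from the manifest above; the function name and the 5-second safety margin are assumptions for illustration, not Kubernetes settings.

```python
def app_shutdown_budget(grace_period_s: int, prestop_s: int, margin_s: int = 5) -> int:
    """Seconds left for the app's SIGTERM handling after the preStop hook.

    margin_s is an assumed safety buffer so the app exits before SIGKILL.
    """
    budget = grace_period_s - prestop_s - margin_s
    if budget <= 0:
        raise ValueError("preStop hook and margin consume the whole grace period")
    return budget


if __name__ == "__main__":
    # Values from the manifest above: 60 s grace period, 10 s preStop sleep.
    print(app_shutdown_budget(60, 10))  # prints 45
```

If the result is smaller than your longest request, raise terminationGracePeriodSeconds rather than shortening the sleep.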

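The preStop sleep works around a race condition: Pod termination and endpoint removal happen concurrently. The sleep gives kube-proxy and load balancers time to stop routing to the Pod before your app receives SIGTERM and stops accepting connections. The hook can also be an HTTP request to an endpoint your app exposes instead of an exec command: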

lifecycle:
  preStop:
    httpGet:
      path: /prestop
      port: 8080
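
On the app side, an httpGet hook needs a handler at that path. The sketch below shows one possible shape, assuming a Python app built on the standard library's http.server. The draining flag, PRESTOP_DELAY_S, and the handler layout are illustrative choices, not part of any Kubernetes API. Because the kubelet waits for the hook to finish before sending SIGTERM, holding the response open plays the same role as the sleep in the exec variant.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

draining = threading.Event()  # set once the preStop hook has fired
PRESTOP_DELAY_S = 10          # assumed delay; mirrors the `sleep 10` exec hook


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/prestop":
            draining.set()
            # The hook completes only when this response is sent,
            # so SIGTERM is held back for the duration of the sleep.
            time.sleep(PRESTOP_DELAY_S)
            self._reply(200, b"draining\n")
        elif self.path == "/ready":
            # Report not-ready as soon as draining has started.
            self._reply(503 if draining.is_set() else 200, b"")
        else:
            self._reply(200, b"hello\n")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet
```

To try it locally, serve the handler with ThreadingHTTPServer(("", 8080), Handler).serve_forever() and request /prestop followed by /ready.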

shutdown-sequence.yaml

A complete shutdown-aware Deployment. The app handles SIGTERM by stopping new connections, waiting for in-flight requests, closing database connections, and exiting cleanly. This is combined with a preStop hook, a readiness probe, and a grace period long enough to cover both the hook and the app's own shutdown.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      terminationGracePeriodSeconds: 120
      containers:
        - name: api
          image: api-server:v1
          ports:
            - containerPort: 8080
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 15"]
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
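
The manifest only helps if the app cooperates. A minimal sketch of the SIGTERM sequence described above (stop accepting, wait for in-flight requests, clean up, exit), assuming a Python service on the standard library's http.server. InFlight, run, and SHUTDOWN_TIMEOUT_S are hypothetical names, and the 100-second value is an assumption chosen to fit inside the 120-second grace period after the 15-second preStop sleep.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

SHUTDOWN_TIMEOUT_S = 100  # assumed: 120 s grace period minus 15 s preStop, with margin

stop_requested = threading.Event()


class InFlight:
    """Counts requests that are currently being handled."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def __enter__(self):
        with self._cond:
            self._count += 1

    def __exit__(self, *exc):
        with self._cond:
            self._count -= 1
            self._cond.notify_all()

    def wait_idle(self, timeout):
        """Return True if the count reached zero within `timeout` seconds."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout)


in_flight = InFlight()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        with in_flight:
            body = b"ok\n"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def run(server, timeout=SHUTDOWN_TIMEOUT_S):
    """Serve until stop_requested is set, then drain. Returns True on a clean drain."""
    threading.Thread(target=server.serve_forever, daemon=True).start()
    stop_requested.wait()
    server.shutdown()      # stop the accept loop
    server.server_close()  # close the listening socket
    drained = in_flight.wait_idle(timeout)
    # Close database connections and flush buffers here.
    return drained


def main():
    # The handler only sets a flag; the real work happens in run().
    signal.signal(signal.SIGTERM, lambda signum, frame: stop_requested.set())
    server = ThreadingHTTPServer(("", 8080), Handler)
    raise SystemExit(0 if run(server) else 1)
```

Calling main() as the container entrypoint wires everything together. The timeout is a soft limit inside the app; SIGKILL at the end of the grace period remains the hard stop.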

connection-draining.yaml

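For long-lived connections (WebSockets, gRPC streams), configure connection draining. The app tracks active connections and, during shutdown, waits for them to close naturally or closes them after a timeout. SHUTDOWN_TIMEOUT and CONNECTION_DRAIN_TIMEOUT are read by the app, not by Kubernetes; keep the preStop time plus SHUTDOWN_TIMEOUT below the grace period (here roughly 15 + 280 < 300). The hook assumes the image ships curl and the app exposes /admin/drain.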

apiVersion: apps/v1
kind: Deployment
metadata:
  name: websocket-server
spec:
  selector:
    matchLabels:
      app: websocket-server
  template:
    metadata:
      labels:
        app: websocket-server
    spec:
      terminationGracePeriodSeconds: 300
      containers:
        - name: ws
          image: websocket-server:v1
          env:
            - name: SHUTDOWN_TIMEOUT
              value: "280"
            - name: CONNECTION_DRAIN_TIMEOUT
              value: "60"
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - |
                    curl -X POST localhost:8080/admin/drain
                    sleep 15
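
The tracking described above can be sketched as a small registry. This is an illustration under assumptions, not the API of any WebSocket or gRPC library: ConnectionRegistry and drain_timeout_from_env are hypothetical names, and a connection is any object with a close() method.

```python
import os
import threading


class ConnectionRegistry:
    """Tracks open long-lived connections so shutdown can drain them."""

    def __init__(self):
        self._conns = set()
        self._cond = threading.Condition()
        self.draining = False

    def add(self, conn):
        """Register a connection; returns False once draining has begun."""
        with self._cond:
            if self.draining:
                return False
            self._conns.add(conn)
            return True

    def remove(self, conn):
        """Call when a connection closes on its own."""
        with self._cond:
            self._conns.discard(conn)
            self._cond.notify_all()

    def drain(self, timeout):
        """Refuse new connections, wait up to `timeout` s, then force-close the rest.

        Returns the number of connections that had to be force-closed.
        """
        with self._cond:
            self.draining = True
            self._cond.wait_for(lambda: not self._conns, timeout)
            leftover = list(self._conns)
            self._conns.clear()
        for conn in leftover:
            conn.close()
        return len(leftover)


def drain_timeout_from_env(default=60.0):
    # CONNECTION_DRAIN_TIMEOUT is the variable set in the manifest above.
    return float(os.environ.get("CONNECTION_DRAIN_TIMEOUT", default))
```

An /admin/drain handler could call registry.drain(drain_timeout_from_env()) in a background thread, so the preStop curl returns promptly while connections wind down.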

job-graceful-shutdown.yaml

Jobs and batch workloads need graceful shutdown too. If terminated mid-processing, the job should checkpoint progress or mark items for retry. The activeDeadlineSeconds limits total runtime.

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processor
spec:
  activeDeadlineSeconds: 3600
  template:
    spec:
      terminationGracePeriodSeconds: 300
      restartPolicy: OnFailure
      containers:
        - name: processor
          image: batch-processor:v1
          env:
            - name: CHECKPOINT_DIR
              value: "/data/checkpoints"
            - name: GRACEFUL_SHUTDOWN_TIMEOUT
              value: "290"
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: batch-data
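
What checkpointing on SIGTERM might look like inside the processor, as a rough sketch. It assumes the work is an ordered list of items and that progress can be captured as a single index. The file name progress.json and all function names are hypothetical; only CHECKPOINT_DIR comes from the manifest above.

```python
import json
import os
import signal
import threading

stop_requested = threading.Event()


def load_checkpoint(path):
    """Return the index of the next unprocessed item (0 if no checkpoint exists)."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_index):
    """Write to a temp file and rename, so a kill mid-write cannot corrupt the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def process_items(items, handle, checkpoint_path):
    """Process items from the last checkpoint; stop early if SIGTERM arrived.

    Returns True when every item was processed.
    """
    start = load_checkpoint(checkpoint_path)
    for i in range(start, len(items)):
        if stop_requested.is_set():
            save_checkpoint(checkpoint_path, i)
            return False
        handle(items[i])
    save_checkpoint(checkpoint_path, len(items))
    return True


def main(items, handle):
    signal.signal(signal.SIGTERM, lambda signum, frame: stop_requested.set())
    ckpt_dir = os.environ.get("CHECKPOINT_DIR", "/data/checkpoints")
    os.makedirs(ckpt_dir, exist_ok=True)
    done = process_items(items, handle, os.path.join(ckpt_dir, "progress.json"))
    # Exit non-zero so an interrupted run is not counted as a successful completion.
    raise SystemExit(0 if done else 1)
```

This version checks the stop flag between items, so a single item must finish well within the grace period. Saving the checkpoint after every item would also protect against a SIGKILL, at the cost of more writes.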

terminal

Monitor shutdown behavior through Pod events and logs while the Pod is still Terminating. The Killing event marks the start of termination. A successful preStop hook leaves no event; a failed one is reported as FailedPreStopHook. Stream the logs before deleting, because they are gone once the Pod object is removed.

$ kubectl delete pod web-server --wait=false
pod "web-server" deleted

$ kubectl describe pod web-server
Events:
  Type    Reason     Message
  ----    ------     -------
  Normal  Killing    Stopping container app

$ kubectl get pod web-server -w
NAME         READY   STATUS        RESTARTS   AGE
web-server   1/1     Terminating   0          1h
web-server   0/1     Terminating   0          1h

# In a second terminal, started before the delete:
$ kubectl logs -f web-server
2024-01-15T10:30:00Z SIGTERM received, starting graceful shutdown
2024-01-15T10:30:00Z Stopped accepting new connections
2024-01-15T10:30:05Z Waiting for 3 in-flight requests
2024-01-15T10:30:08Z All requests completed, exiting
