K8s by Example: Horizontal Pod Autoscaler

HPA scales Pods based on CPU, memory, or custom metrics, adjusting replicas automatically to maintain a target utilization. Resource metrics (CPU, memory) require metrics-server in the cluster; custom and external metrics require a metrics adapter.

hpa.yaml

HPA uses the autoscaling/v2 API. scaleTargetRef points to the Deployment to scale. HPA adjusts replicas between min and max based on metrics.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
pod-resources.yaml

HPA calculates utilization as current/request. Requires resource requests to be defined. Without requests, HPA cannot determine utilization percentage.

spec:
  containers:
    - name: app
      resources:
        requests:
          cpu: 200m
          memory: 256Mi
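The utilization math above follows the documented HPA algorithm: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). A minimal sketch (the function name is illustrative, not part of any Kubernetes API):

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization):
    """Sketch of the documented HPA formula:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# 3 Pods averaging 210m CPU against a 200m request = 105% utilization,
# against a 70% target: 3 * 105/70 = 4.5, rounded up.
print(desired_replicas(3, 105, 70))  # → 5
```

When current utilization equals the target, the ratio is 1 and the replica count is unchanged; the real controller also applies a small tolerance band around the target before acting.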
hpa-multi-metric.yaml

Combine CPU and memory metrics. HPA takes the max of calculated replicas from all metrics. Good for workloads that are both CPU and memory bound.

spec:
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
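With multiple metrics, HPA computes a desired replica count per metric and uses the highest. A sketch of that selection, assuming the per-metric formula above (function names are illustrative):

```python
import math

def desired_replicas(current, current_value, target_value):
    """Per-metric HPA formula: ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current * current_value / target_value)

def combined_desired(current, metrics):
    """HPA evaluates each metric independently and takes the max
    of the resulting desired replica counts (sketch)."""
    return max(desired_replicas(current, cur, tgt) for cur, tgt in metrics)

# CPU at 90% vs. a 70% target, memory at 60% vs. an 80% target:
# CPU wants ceil(4 * 90/70) = 6, memory wants ceil(4 * 60/80) = 3.
print(combined_desired(4, [(90, 70), (60, 80)]))  # → 6
```

Taking the max means any single saturated resource can trigger a scale-up, which is why this pattern suits workloads bound by either CPU or memory.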
hpa-behavior.yaml

Control scaling behavior: scale up fast, scale down slowly. stabilizationWindowSeconds prevents flapping during traffic spikes. Each policy caps the change allowed per periodSeconds; selectPolicy: Max applies whichever policy permits the larger change.

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
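The policies above can be worked through numerically. A sketch of the rate limits they impose, assuming the config shown (rounding details are illustrative, not the controller's exact code):

```python
import math

def max_after_scale_up(current, percent=100, pods=4):
    """selectPolicy: Max picks the more permissive policy per 15s window:
    add 100% of current replicas, or add 4 Pods, whichever is larger."""
    return current + max(math.ceil(current * percent / 100), pods)

def min_after_scale_down(current, percent=10):
    """scaleDown: remove at most 10% of current replicas per 60s window."""
    return current - math.floor(current * percent / 100)

# From 2 replicas, scale-up allows max(2, 4) = 4 new Pods in one window.
print(max_after_scale_up(2))     # → 6
# From 10 replicas, scale-down removes at most 1 Pod per minute.
print(min_after_scale_down(10))  # → 9
```

At small replica counts the Pods policy dominates scale-up (doubling 2 replicas only adds 2); at large counts the Percent policy takes over.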
hpa-custom.yaml

Custom metrics from Prometheus or other sources. Requires metrics adapter (prometheus-adapter, KEDA). Scale on queue depth, requests per second, or any application metric.

spec:
  metrics:
    - type: Pods
      pods:
        metric:
          name: requests_per_second
        target:
          type: AverageValue
          averageValue: 1000
    - type: Object
      object:
        metric:
          name: queue_depth
        describedObject:
          apiVersion: v1
          kind: Service
          name: rabbitmq
        target:
          type: Value
          value: 100
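For type: AverageValue targets like requests_per_second above, the documented math divides the metric total across Pods by the per-Pod target. A sketch (function name is illustrative):

```python
import math

def desired_from_average_value(pod_values, target_average):
    """type: AverageValue sketch: desired = ceil(sum(metric) / target),
    i.e. enough Pods that the per-Pod average meets the target."""
    return math.ceil(sum(pod_values) / target_average)

# Three Pods each serving ~1500 req/s against a 1000 req/s target:
# 4500 total / 1000 per Pod = 4.5, rounded up.
print(desired_from_average_value([1500, 1400, 1600], 1000))  # → 5
```

A type: Value target (like queue_depth above) instead compares the single metric value to the target directly, without dividing by the Pod count.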
hpa-external.yaml

External metrics come from outside the cluster (cloud monitoring, SaaS). Scale based on SQS queue length, Pub/Sub backlog, or any external data source. Requires external metrics provider like KEDA.

spec:
  metrics:
    - type: External
      external:
        metric:
          name: sqs_queue_messages
          selector:
            matchLabels:
              queue: orders
        target:
          type: AverageValue
          averageValue: 10
hpa-targets.yaml

HPA works with Deployments, StatefulSets, and ReplicaSets (any target that implements the scale subresource). It cannot scale DaemonSets, which run one Pod per node by design. Don't mix HPA with manual scaling: HPA overwrites manual replica changes.

spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
---
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres
terminal

Debug HPA by checking events, current metrics, and conditions. Common issues: metrics-server not running, missing resource requests, invalid metric names. TARGETS shows current/target utilization.

$ kubectl get hpa
NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-app   Deployment/my-app   65%/70%   2         10        3          5m

$ kubectl describe hpa my-app
Conditions:
  Type            Status  Reason
  AbleToScale     True    ReadyForNewScale
  ScalingActive   True    ValidMetricFound

$ kubectl get pods -n kube-system | grep metrics-server
metrics-server-abc12   1/1     Running

$ kubectl top pods
NAME        CPU(cores)   MEMORY(bytes)
my-app-1    150m         200Mi
my-app-2    140m         195Mi

$ kubectl autoscale deployment my-app \
    --min=2 --max=10 --cpu-percent=70
horizontalpodautoscaler.autoscaling/my-app created
