Horizontal Pod Autoscaler (HPA): Metrics & Custom Metrics

DodaTech

This tutorial explains how the Horizontal Pod Autoscaler (HPA) consumes resource, custom, and external metrics, with manifests you can adapt and practice questions at the end.

The Horizontal Pod Autoscaler automatically adjusts the replica count of a workload such as a Deployment or StatefulSet based on observed metrics. It supports CPU, memory, custom application metrics, and external metrics from sources outside the cluster, such as cloud provider services.

What You'll Learn

This tutorial covers HPA configuration for resource metrics, custom metrics with Prometheus, external metrics for cloud scaling, scaling policies, and behavior tuning to avoid thrashing.

Why It Matters

Without autoscaling, workloads are either over-provisioned and expensive or under-provisioned and slow. HPA adds replicas during traffic spikes and removes them when load drops, which keeps cost down during quiet periods.

Real-World Use

Spotify uses HPA with custom metrics based on audio streaming latency to scale their backend services. Zalando uses HPA with Prometheus metrics to scale e-commerce services during flash sales.

HPA with CPU and Memory

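The simplest HPA uses resource metrics from the metrics server. A Utilization target is measured as a percentage of the resource request, so the containers in the target pods must declare CPU and memory requests.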

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

# Ensure metrics server is running
kubectl -n kube-system get pods -l k8s-app=metrics-server

# Check HPA status
kubectl get hpa app-hpa

# Watch HPA in real time
kubectl get hpa app-hpa --watch

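The TARGETS column shows the current utilization of each metric next to its target. A value of <unknown> usually means the metrics server has no data for the pods yet or the containers have no resource requests.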
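
To make the utilization targets above concrete, here is a minimal Python sketch of the replica calculation described in the Kubernetes documentation: desired replicas = ceil(current replicas × current value / target value), clamped to minReplicas and maxReplicas. The function name and the sample numbers are hypothetical, and the 10 percent tolerance is assumed to be the controller default rather than something set in this tutorial.

```python
import math

def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the documented HPA rule for a single metric."""
    ratio = current_value / target_value
    # Assumption: inside the tolerance band the replica count is left alone.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * current_value / target_value)
    # The result is always clamped to minReplicas..maxReplicas.
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(4, 90, 70, 2, 10))   # 6: CPU at 90% against a 70% target
print(desired_replicas(4, 72, 70, 2, 10))   # 4: inside the tolerance band
print(desired_replicas(8, 140, 70, 2, 10))  # 10: capped by maxReplicas
```

The sketch ignores details such as pods that are not yet ready or have missing metrics, which the real controller treats specially.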

Custom Metrics with Prometheus

Use the Prometheus Adapter to expose application-specific metrics to the HPA through the custom metrics API (custom.metrics.k8s.io).

Deploy Prometheus Adapter

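helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# prometheus-server is this tutorial's example service name; use the URL of your own Prometheus service
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --set prometheus.url=http://prometheus-server \
  --set prometheus.port=9090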

Configure a Custom Metric

# values.yaml for prometheus-adapter
rules:
  default: true
  custom:
  # Select only series that carry namespace and pod labels
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "http_requests_total"
      as: "requests_per_second"
    # Aggregate to one value per pod so the HPA receives a single series for each pod
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
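
The metricsQuery above turns a monotonically increasing counter into a per-second rate over a 2-minute window. The following Python sketch shows the idea in simplified form. It is not Prometheus's exact algorithm (which also extrapolates to the window boundaries), and the function name and sample values are hypothetical.

```python
def per_second_rate(samples):
    """samples: list of (timestamp_seconds, counter_value), oldest first."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop in a counter is treated as a reset to zero.
        increase += cur - prev if cur >= prev else cur
    return increase / elapsed

# Five scrapes, 30 seconds apart, covering a 2-minute window.
window = [(0, 1000), (30, 4000), (60, 7000), (90, 10000), (120, 13000)]
print(per_second_rate(window))  # 100.0
```

A pod whose counter grows by 12,000 requests in 120 seconds therefore reports 100 requests per second.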

# HPA that scales on the requests_per_second metric exposed by the adapter
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
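
For a Pods metric with an AverageValue target, the HPA averages the metric across pods and compares it with the target, which is equivalent to dividing the sum of the per-pod values by the target. The Python sketch below works through that arithmetic with the minReplicas and maxReplicas from the manifest above. The function name and the per-pod readings are hypothetical, and the tolerance band is left out for brevity.

```python
import math

def replicas_for_pods_metric(per_pod_values, target_average,
                             min_replicas=2, max_replicas=20):
    """desired = ceil(sum of per-pod values / target average), then clamp."""
    desired = math.ceil(sum(per_pod_values) / target_average)
    return max(min_replicas, min(max_replicas, desired))

# Three pods serving 150, 130 and 170 requests per second, target 100 per pod.
print(replicas_for_pods_metric([150, 130, 170], 100))  # 5
# Two lightly loaded pods: the result is held at minReplicas.
print(replicas_for_pods_metric([40, 50], 100))         # 2
```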

External Metrics

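Scale on metrics that are not tied to any Kubernetes object, such as the depth of a cloud message queue or the number of database connections. This requires an adapter that serves the external metrics API (external.metrics.k8s.io). The metric name aws_sqs_queue_depth below is illustrative and depends on how that adapter is configured.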

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 1
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: aws_sqs_queue_depth
      target:
        type: AverageValue
        averageValue: "10"
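
With an External metric and an AverageValue target, the total metric value is divided by the number of pods before it is compared with the target, so the desired replica count works out to ceil(total / averageValue). The Python sketch below applies that arithmetic to the queue-worker manifest above. The function name and queue depths are hypothetical, and the tolerance band is ignored.

```python
import math

def replicas_for_queue(queue_depth, target_per_pod,
                       min_replicas=1, max_replicas=50):
    """Replicas needed so that each pod handles about target_per_pod messages."""
    desired = math.ceil(queue_depth / target_per_pod)
    return max(min_replicas, min(max_replicas, desired))

for depth in (0, 235, 1200):
    print(depth, replicas_for_queue(depth, 10))
# 0 1      (never below minReplicas)
# 235 24
# 1200 50  (capped by maxReplicas)
```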

Scaling Policies and Behavior

Fine-tune scaling speed to prevent thrashing. The behavior block below is a fragment that belongs under spec in an HPA such as app-hpa.

spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Pods
        value: 4
        periodSeconds: 15
      - type: Percent
        value: 100
        periodSeconds: 15

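With selectPolicy left at its default of Max, the scaleUp rules allow adding 4 pods or doubling the replica count every 15 seconds, whichever permits the larger change, and the zero-second stabilization window lets scale-up react immediately. The scaleDown rules remove at most 10 percent of the replicas per 60 seconds, and the 300-second stabilization window makes the HPA act on the highest replica recommendation from the last 5 minutes.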
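
The effect of these settings can be sketched in Python. This is a simplified model, not the controller's implementation. It treats each period in isolation, assumes selectPolicy is Max, and assumes percentages round in the direction that still allows progress (up when scaling up, down when scaling down). All function names and sample values are hypothetical.

```python
def scale_up_limit(current, pods=4, percent=100):
    """Highest replica count reachable within one 15-second period."""
    by_pods = current + pods
    by_percent = -(-current * (100 + percent) // 100)  # ceiling division
    return max(by_pods, by_percent)

def scale_down_floor(current, percent=10):
    """Lowest replica count reachable within one 60-second period."""
    return current * (100 - percent) // 100

def stabilized_scale_down(history, now, window_seconds=300):
    """Pick the highest recommendation seen during the stabilization window.

    history: list of (timestamp_seconds, recommended_replicas).
    """
    recent = [replicas for ts, replicas in history if now - ts <= window_seconds]
    return max(recent)

print(scale_up_limit(3))     # 7: adding 4 pods beats doubling
print(scale_up_limit(10))    # 20: doubling beats adding 4 pods
print(scale_down_floor(20))  # 18: at most 10 percent removed

history = [(0, 10), (120, 6), (240, 4)]
print(stabilized_scale_down(history, now=290))  # 10: the old peak still counts
print(stabilized_scale_down(history, now=310))  # 6: the peak has aged out
```

The last two lines show why a 300-second window suppresses thrashing: a drop in load only reduces the replica count once every higher recommendation has left the window.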

Multiple Metrics

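When multiple metrics are specified, the HPA computes a desired replica count for each metric and uses the largest. If a metric cannot be read and the remaining metrics suggest scaling down, the HPA skips the scale-down.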
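
The rule above can be sketched in Python for an HPA that watches CPU and memory, like app-hpa. The current utilization figures are made up for illustration, the helper name is hypothetical, and the tolerance band and min/max clamping are omitted.

```python
import math

def replicas_for_metric(current_replicas, current_value, target_value):
    """Per-metric proposal: ceil(current replicas * current / target)."""
    return math.ceil(current_replicas * current_value / target_value)

current = 4
proposals = {
    "cpu": replicas_for_metric(current, 90, 70),     # 90% observed, 70% target
    "memory": replicas_for_metric(current, 60, 80),  # 60% observed, 80% target
}
winner = max(proposals, key=proposals.get)
print(proposals)                  # {'cpu': 6, 'memory': 3}
print(winner, proposals[winner])  # cpu 6
```

Memory alone would allow a scale-down to 3 replicas, but CPU asks for 6, so the HPA scales to 6.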

# View HPA decisions
kubectl describe hpa app-hpa

# Check events
kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler

Practice Questions

What metrics does the default HPA support? CPU and memory utilization from the metrics server.
How do you scale based on HTTP requests per second? Use a Prometheus Adapter with a custom metric query that calculates rate of requests.
What is the stabilization window in HPA behavior? It prevents flapping: when scaling down, the HPA uses the highest replica count recommended during the window, so a brief dip in load does not remove pods.
How does HPA handle multiple metrics? It calculates the desired replicas for each metric and uses the highest count.
What is the difference between AverageValue and Utilization target types? AverageValue targets an absolute value per pod. Utilization targets a percentage of the resource request.
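
To illustrate the last answer, the Python sketch below computes both views of the same CPU readings. The usage and request figures are hypothetical. Utilization is taken as total usage divided by total requests across the pods, truncated to a whole percentage, which is an assumption about how the controller aggregates.

```python
def average_utilization(usage_millicores, request_millicores):
    """Percent of requested CPU in use across all pods."""
    return 100 * sum(usage_millicores) // sum(request_millicores)

def average_value(usage_millicores):
    """Absolute CPU usage per pod, in millicores."""
    return sum(usage_millicores) / len(usage_millicores)

usage = [350, 450, 400]      # millicores used by each pod
requests = [500, 500, 500]   # millicores requested by each pod

print(average_utilization(usage, requests))  # 80
print(average_value(usage))                  # 400.0
```

Against a Utilization target of 70 these pods are over target, while against an AverageValue target of 500m they are under it, so the choice of target type changes the scaling decision.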
