Horizontal Pod Autoscaler (HPA): Metrics & Custom Metrics
This tutorial explains how the Horizontal Pod Autoscaler (HPA) consumes resource, custom, and external metrics, with practical examples and best practices for each.
The Horizontal Pod Autoscaler automatically adjusts the number of pod replicas in a workload such as a Deployment based on observed metrics. It supports CPU, memory, custom application metrics, and external metrics from systems outside the cluster, such as cloud provider services.
What You'll Learn
This tutorial covers HPA configuration for resource metrics, custom metrics with Prometheus, external metrics for cloud scaling, scaling policies, and behavior tuning to avoid thrashing.
Why It Matters
Without autoscaling, clusters are either over-provisioned and expensive or under-provisioned and slow. HPA ensures applications handle traffic spikes automatically while minimizing cost during low usage.
Real-World Use
Spotify uses HPA with custom metrics based on audio streaming latency to scale their backend services. Zalando uses HPA with Prometheus metrics to scale e-commerce services during flash sales.
HPA with CPU and Memory
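The simplest HPA uses resource metrics from the metrics server. A Utilization target is a percentage of each container's resource request, so the target pods must declare CPU and memory requests for these metrics to work.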
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
# Ensure metrics server is running
kubectl -n kube-system get pods -l k8s-app=metrics-server
# Check HPA status
kubectl get hpa app-hpa
# Watch HPA in real time
kubectl get hpa app-hpa --watch
Expected output shows current CPU and memory utilization with target values.
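To make the target values concrete, the sketch below approximates how the HPA turns a utilization reading into a replica count, following the formula in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). The function name and the sample readings are illustrative assumptions; the bounds mirror the app-hpa manifest above.

```python
import math

def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=2, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_value / target_value
    # Inside the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the minReplicas/maxReplicas bounds from the manifest.
    return max(min_replicas, min(max_replicas, desired))

# Hypothetical reading: 4 pods averaging 90% CPU against the 70% target.
print(desired_replicas(4, 90, 70))  # 6
# 72% is within 10% of the 70% target, so nothing changes.
print(desired_replicas(4, 72, 70))  # 4
```

The sketch ignores details such as pods that are not yet ready or are missing metrics, which the real controller handles separately.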
Custom Metrics with Prometheus
Use the Prometheus Adapter to expose application-specific metrics.
Deploy Prometheus Adapter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --set prometheus.url=http://prometheus-server \
  --set prometheus.port=9090
Configure a Custom Metric
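# values.yaml for prometheus-adapter
rules:
  default: true
  custom:
    - seriesQuery: 'http_requests_total{namespace!=""}'
      # Map Prometheus labels to the Kubernetes resources they describe
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      # Expose the counter under the name the HPA will reference
      name:
        matches: "http_requests_total"
        as: "requests_per_second"
      # Aggregate by the requested resource so each pod yields a single value
      metricsQuery: 'sum(rate(http_requests_total{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'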
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
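For a Pods metric, the HPA averages the per-pod values and compares that average with averageValue. A minimal sketch of the resulting arithmetic follows; the function name and the per-pod request rates are made up for illustration, and the tolerance check is left out for brevity.

```python
import math

def replicas_for_pods_metric(per_pod_values, target_average,
                             min_replicas=2, max_replicas=20):
    """Estimate replicas for a Pods metric with an AverageValue target."""
    # current_replicas * (average / target) simplifies to total / target.
    desired = math.ceil(sum(per_pod_values) / target_average)
    # Clamp to the minReplicas/maxReplicas bounds from the manifest.
    return max(min_replicas, min(max_replicas, desired))

# Hypothetical: three pods serving 150, 120 and 180 requests per second.
print(replicas_for_pods_metric([150, 120, 180], 100))  # 5
```

With 450 requests per second in total and a target of 100 per pod, five pods are needed to bring the average back under the target.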
External Metrics
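Scale on metrics that originate outside the cluster, such as queue depth or database connections reported by a cloud provider. This requires an external metrics adapter that serves the metric under the name used in the manifest.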
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 1
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: aws_sqs_queue_depth
        target:
          type: AverageValue
          averageValue: "10"
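External metrics accept either a Value or an AverageValue target, and the choice changes the arithmetic. With AverageValue the metric is divided by the number of pods before comparison, so the queue depth is effectively shared out per worker. The sketch below contrasts the two; the function name and the queue depths are hypothetical, and tolerance is again omitted.

```python
import math

def replicas_for_external(metric_value, target, target_type, current_replicas,
                          min_replicas=1, max_replicas=50):
    """Estimate replicas for an External metric under either target type."""
    if target_type == "AverageValue":
        # Metric is split across pods: replicas = total / per-pod target.
        desired = math.ceil(metric_value / target)
    elif target_type == "Value":
        # Metric is compared as-is: scale the current count by the ratio.
        desired = math.ceil(current_replicas * metric_value / target)
    else:
        raise ValueError(f"unsupported target type: {target_type}")
    return max(min_replicas, min(max_replicas, desired))

# Hypothetical queue of 235 messages with 5 workers running.
print(replicas_for_external(235, 10, "AverageValue", 5))  # 24
print(replicas_for_external(235, 100, "Value", 5))        # 12
```

AverageValue suits work queues because the target reads as "messages per worker" and the result does not depend on how many workers are currently running.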
Scaling Policies and Behavior
Fine-tune scaling speed to prevent thrashing.
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Pods
          value: 4
          periodSeconds: 15
        - type: Percent
          value: 100
          periodSeconds: 15
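With two scaleUp policies and the default selectPolicy of Max, the HPA applies whichever policy permits the larger change: every 15 seconds it may add up to 4 pods or double the replica count (100 percent), with no stabilization delay. The scaleDown policy removes at most 10 percent of the replicas per 60 seconds, and the 300-second stabilization window means the HPA acts on the highest recommendation seen in the last five minutes.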
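The per-period limits these policies impose can be sketched as follows. This is a simplified model under stated assumptions: it ignores scaling events that already happened earlier in the same period, and the rounding (up for growth, down for shrinkage) is an assumption of the sketch rather than a statement about the controller's exact behavior. Function names are illustrative.

```python
def scale_up_limit(current, pods_per_period=4, percent_per_period=100):
    """Largest replica count reachable in one 15-second period."""
    by_pods = current + pods_per_period
    # Integer ceiling of current * (100 + percent) / 100.
    by_percent = (current * (100 + percent_per_period) + 99) // 100
    # Default selectPolicy is Max: the policy allowing more change wins.
    return max(by_pods, by_percent)

def scale_down_limit(current, percent_per_period=10):
    """Smallest replica count reachable in one 60-second period."""
    # Integer floor of current * (100 - percent) / 100 (assumed rounding).
    return current * (100 - percent_per_period) // 100

print(scale_up_limit(2))     # 6: adding 4 pods beats doubling to 4
print(scale_up_limit(10))    # 20: doubling beats adding 4 pods
print(scale_down_limit(20))  # 18: at most 10 percent removed
```

At small replica counts the Pods policy dominates scale-up, while the Percent policy takes over once the workload has more than 4 replicas.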
Multiple Metrics
When multiple metrics are specified, the HPA uses the metric that requires the most replicas.
# View HPA decisions
kubectl describe hpa app-hpa
# Check events
kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler
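The "most replicas wins" rule can be illustrated with a short sketch that computes a proposal per metric and keeps the largest. The helper names and the utilization readings are hypothetical, and tolerance and min/max clamping are omitted to keep the focus on the selection step.

```python
import math

def desired_for_metric(current_replicas, current_value, target_value):
    """Replica proposal for one metric, without tolerance or clamping."""
    return math.ceil(current_replicas * current_value / target_value)

def desired_for_all(current_replicas, metrics):
    """metrics maps a metric name to (current_value, target_value)."""
    proposals = {name: desired_for_metric(current_replicas, cur, tgt)
                 for name, (cur, tgt) in metrics.items()}
    # The metric asking for the most replicas decides the outcome.
    winner = max(proposals, key=proposals.get)
    return proposals[winner], winner, proposals

# Hypothetical: 4 pods, CPU at 90% (target 70), memory at 60% (target 80).
replicas, winner, proposals = desired_for_all(
    4, {"cpu": (90, 70), "memory": (60, 80)})
print(proposals)         # {'cpu': 6, 'memory': 3}
print(replicas, winner)  # 6 cpu
```

Memory alone would allow scaling in to 3 pods, but CPU pressure keeps the workload at 6.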
Practice Questions
What metrics does the default HPA support? CPU and memory utilization from the metrics server.
How do you scale based on HTTP requests per second? Use a Prometheus Adapter with a custom metric query that calculates rate of requests.
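What is the stabilization window in HPA behavior? It is a look-back period over past replica recommendations. When scaling down, the HPA uses the highest recommendation within the window, which prevents flapping when metrics fluctuate.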
How does HPA handle multiple metrics? It calculates the desired replicas for each metric and uses the highest count.
What is the difference between AverageValue and Utilization target types? AverageValue targets an absolute value per pod. Utilization targets a percentage of the resource request.
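As a companion to the stabilization window question, here is a minimal sketch of the scale-down case: the HPA keeps recent recommendations and acts on the highest one still inside the window. The function name, timestamps, and recommendation history are invented for illustration, and the history is assumed to include the current recommendation.

```python
def stabilized_scale_down(recommendations, now, window_seconds=300):
    """Pick the replica count to act on when scaling down.

    recommendations: list of (timestamp_seconds, desired_replicas),
    assumed to include the recommendation made at time `now`.
    """
    recent = [replicas for ts, replicas in recommendations
              if now - ts <= window_seconds]
    # The highest recent recommendation wins, so a brief dip is ignored.
    return max(recent)

history = [(0, 8), (100, 6), (200, 4), (300, 3)]
print(stabilized_scale_down(history, now=300))  # 8: the t=0 spike still counts
history.append((310, 3))
print(stabilized_scale_down(history, now=310))  # 6: the t=0 entry has expired
```

Load has to stay low for the whole window before the replica count follows it down, which is why a 300-second window delays scale-down by up to five minutes.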