Kubernetes Monitoring with Prometheus and Grafana
This tutorial covers the key concepts, practical examples, and best practices for monitoring a Kubernetes cluster with Prometheus and Grafana.
What You'll Learn
Monitor Kubernetes with Prometheus and Grafana — install the monitoring stack, collect pod/node metrics, create dashboards, and configure alerts.
Why It Matters
Without monitoring, you're flying blind. Prometheus and Grafana are the de facto standard stack for Kubernetes observability: metrics, visualization, and alerting.
Real-World Use
Alerting when a pod is crash-looping, visualizing CPU/memory trends, tracking API error rates, and capacity planning.
Architecture
Pods (export metrics)
↓
Prometheus Server (scrapes every 15s)
├→ Alertmanager → Slack/Email/PagerDuty
└→ Grafana (queries Prometheus for dashboards)
Install kube-prometheus-stack
# Add the Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install the stack
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
# Check components
kubectl get pods -n monitoring
# Expect pods for: prometheus-operator, prometheus, alertmanager, grafana,
# kube-state-metrics, prometheus-node-exporter
Access Grafana
# Port forward to Grafana
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
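# Default credentials (chart defaults; change them for anything beyond a local test)
# Username: admin
# Password: prom-operator
# Or read the password from the Secret the chart creates:
kubectl get secret -n monitoring monitoring-grafana -o jsonpath="{.data.admin-password}" | base64 -d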
What You Get
Out of the box, you get dashboards for:
| Dashboard | What It Shows |
|---|---|
| Kubernetes / Compute Resources / Namespace | CPU, memory, network per namespace |
| Kubernetes / Compute Resources / Pod | Pod-level resource usage |
| Kubernetes / Networking | Network traffic and errors |
| Kubernetes / API Server | API server latency, request rate |
| Node Exporter / Nodes | OS-level metrics (disk, load, memory) |
| Kubernetes / StatefulSets | StatefulSet status and replica count |
Key Metrics to Monitor
# Pod CPU usage as a fraction of its CPU request (multiply by 100 for percent)
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod, namespace)
  /
sum(kube_pod_container_resource_requests{resource="cpu"}) by (pod, namespace)
# Pod memory usage as a fraction of its memory request
sum(container_memory_working_set_bytes{container!=""}) by (pod, namespace)
  /
sum(kube_pod_container_resource_requests{resource="memory"}) by (pod, namespace)
# Container restarts over the last hour
increase(kube_pod_container_status_restarts_total[1h])
# Node CPU utilization per node (fraction between 0 and 1)
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
# Root filesystem space used (percent)
100 - (node_filesystem_free_bytes{mountpoint="/"}
  / node_filesystem_size_bytes{mountpoint="/"} * 100)
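Most of the queries above rely on rate(), which turns an ever-increasing counter into a per-second value. The following is a minimal Python sketch of that idea, assuming evenly scraped (timestamp, value) samples. The function name `simple_rate` and the sample numbers are hypothetical, and real Prometheus additionally extrapolates to the window boundaries, so its results differ slightly.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Approximate per-second increase of a counter over the given samples.

    Simplified: handles counter resets but skips the boundary
    extrapolation that Prometheus applies.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter reset (e.g. a container restart):
        # count the new value as the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# CPU seconds consumed by a hypothetical container, scraped every 15s.
cpu_seconds = [(0, 100.0), (15, 103.0), (30, 106.0), (45, 109.0), (60, 112.0)]
cores_used = simple_rate(cpu_seconds)   # 12 CPU-seconds over 60 s
cpu_request_cores = 0.5                 # hypothetical CPU request
print(f"{cores_used:.2f} cores, {cores_used / cpu_request_cores:.0%} of request")
```

Dividing the rate by the request is the same ratio the first query computes per pod.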
Custom ServiceMonitor
To monitor your own apps:
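apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  labels:
    release: monitoring  # Helm release name; the chart's Prometheus selects on this label by default
spec:
  selector:
    matchLabels:
      app: my-app        # labels on the Service to scrape
  endpoints:
    - port: http         # name of the Service port, not a number
      path: /metrics
      interval: 15s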
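The selector in a ServiceMonitor matches Services by label: every key/value pair under matchLabels must be present on the Service, and extra labels on the Service are ignored. A small Python sketch of that rule follows; the `matches` helper and the example labels are hypothetical and only illustrate the matching logic.

```python
from typing import Dict


def matches(match_labels: Dict[str, str], service_labels: Dict[str, str]) -> bool:
    """True when every matchLabels pair is present on the Service."""
    return all(service_labels.get(k) == v for k, v in match_labels.items())


selector = {"app": "my-app"}
print(matches(selector, {"app": "my-app", "tier": "backend"}))  # True
print(matches(selector, {"app": "other-app"}))                  # False
```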
Your app needs a /metrics endpoint exposing Prometheus-formatted metrics:
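from prometheus_client import start_http_server, Counter, Histogram
import random
import time

REQUESTS = Counter('http_requests_total', 'Total HTTP requests')
ERRORS = Counter('http_request_exceptions_total', 'Requests that raised an exception')
LATENCY = Histogram('http_request_duration_seconds', 'Request latency')

@ERRORS.count_exceptions()  # count failures separately from total requests
def handle_request():
    REQUESTS.inc()
    with LATENCY.time():
        time.sleep(random.random())  # simulated work

if __name__ == '__main__':
    start_http_server(8000)  # serves /metrics on port 8000
    while True:              # keep the process alive and generate sample traffic
        handle_request()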
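For reference, the text a scrape of /metrics returns looks roughly like the output of the sketch below. It renders the format by hand, using only the standard library, to show the structure. The `render_counter` helper and the value 42.0 are hypothetical; in practice prometheus_client produces this output for you, along with some extra series.

```python
def render_counter(name: str, help_text: str, value: float) -> str:
    """Render one unlabeled counter in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} counter\n"
        f"{name} {value}\n"
    )


body = render_counter("http_requests_total", "Total HTTP requests", 42.0)
print(body, end="")
# Prints:
# # HELP http_requests_total Total HTTP requests
# # TYPE http_requests_total counter
# http_requests_total 42.0
```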
Alerting Rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-alerts
  labels:
    release: monitoring  # Helm release name; matches the chart's default rule selector
spec:
  groups:
    - name: kubernetes
      rules:
        - alert: HighCpuUsage
          # Absolute threshold: 0.8 CPU cores, not 80% of the pod's request
          expr: |
            sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
              by (namespace, pod) > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} has used more than 0.8 CPU cores for 5m"
        - alert: PodCrashLooping
          expr: |
            increase(kube_pod_container_status_restarts_total[10m]) > 3
          for: 2m
          labels:
            severity: critical
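The `for:` field delays firing: an alert is pending from the first evaluation where the expression is true, and it fires only once the expression has stayed true for the full duration. Any false evaluation resets the timer. Here is a minimal Python sketch of that state machine, assuming a 60-second evaluation interval; the `alert_states` function and the sample timeline are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def alert_states(evaluations: Iterable[Tuple[float, bool]],
                 for_seconds: float) -> List[str]:
    """Return the alert state after each (timestamp, condition_is_true) evaluation."""
    states: List[str] = []
    active_since: Optional[float] = None
    for ts, condition in evaluations:
        if not condition:
            active_since = None          # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts            # condition just became true
        held = ts - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states


# Condition true from t=60, with a single dip at t=240.
evals = [(0, False), (60, True), (120, True), (180, True), (240, False),
         (300, True), (360, True), (420, True)]
print(alert_states(evals, for_seconds=120))  # 120 s corresponds to "for: 2m"
```

The dip at t=240 shows why a flapping condition can take longer to fire than the `for:` value suggests.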
Resource Recommendations
# Show current CPU/memory usage per pod and node (requires metrics-server)
kubectl top pods
kubectl top nodes
# For request recommendations, install the Vertical Pod Autoscaler (VPA),
# which suggests CPU/memory requests based on observed usage
Key Commands
# Port forward Prometheus
kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-prometheus 9090:9090
# Port forward Alertmanager
kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-alertmanager 9093:9093
# Query Prometheus API
curl 'http://localhost:9090/api/v1/query?query=up'
# Check alerting rules
curl http://localhost:9090/api/v1/rules
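Queries more complex than `up` contain braces and quotes that must be URL-encoded. The Python sketch below builds an instant-query URL and parses a response using only the standard library. It assumes the Prometheus port-forward above is running on localhost:9090; the helper names are hypothetical, and the sample payload is hand-written in the shape the API returns rather than real cluster output.

```python
import json
from urllib.parse import urlencode

PROMETHEUS_URL = "http://localhost:9090"  # assumes the port-forward is running


def instant_query_url(promql: str) -> str:
    """Build a URL-encoded instant-query URL for the Prometheus HTTP API."""
    return f"{PROMETHEUS_URL}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: str) -> dict:
    """Map each series' sorted label pairs to its float value."""
    doc = json.loads(payload)
    if doc["status"] != "success":
        raise RuntimeError(doc.get("error", "query failed"))
    return {
        tuple(sorted(item["metric"].items())): float(item["value"][1])
        for item in doc["data"]["result"]
    }


print(instant_query_url('up{job="kubelet"}'))

# Hand-written sample response; values arrive as strings and are converted.
sample = '''{"status": "success", "data": {"resultType": "vector", "result": [
  {"metric": {"__name__": "up", "job": "kubelet"}, "value": [1700000000, "1"]}
]}}'''
print(parse_vector(sample))
```

To run it against a live cluster, fetch the URL with `urllib.request.urlopen` and pass the decoded body to `parse_vector`.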