Kubernetes Monitoring with Prometheus and Grafana

DodaTech

This tutorial shows how to monitor a Kubernetes cluster with Prometheus and Grafana: installing the stack with Helm, querying pod and node metrics, scraping your own applications, and defining alert rules.

What You'll Learn

Monitor Kubernetes with Prometheus and Grafana — install the monitoring stack, collect pod/node metrics, create dashboards, and configure alerts.

Why It Matters

Without monitoring, you're flying blind. Prometheus and Grafana are the de facto standard stack for Kubernetes observability: Prometheus collects metrics and evaluates alert rules, and Grafana visualizes them.

Real-World Use

Alerting when a pod is crash-looping, visualizing CPU/memory trends, tracking API error rates, and capacity planning.

Architecture

Pods and exporters (expose metrics)
    ↓
Prometheus Server (scrapes on an interval, e.g. every 15s)
    ├→ Alertmanager → Slack/Email/PagerDuty
    └→ Grafana (queries Prometheus and visualizes)

Install kube-prometheus-stack

# Add the Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the stack
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

# Check components
kubectl get pods -n monitoring
# Expect pods for prometheus-operator, prometheus, alertmanager, grafana,
# kube-state-metrics, and prometheus-node-exporter

Access Grafana

# Port forward to Grafana
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80

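# Default credentials (may differ by chart version or if overridden in your values)
# Username: admin
# Password: prom-operator
# If that does not work, read the password from the Grafana secret:
kubectl get secret -n monitoring monitoring-grafana \
  -o jsonpath="{.data.admin-password}" | base64 -d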

What You Get

Out of the box, you get dashboards for:

Dashboard	What It Shows
Kubernetes / Compute Resources / Namespace	CPU, memory, network per namespace
Kubernetes / Compute Resources / Pod	Pod-level resource usage
Kubernetes / Networking	Network traffic and errors
Kubernetes / API Server	API server latency, request rate
Node Exporter / Nodes	OS-level metrics (disk, load, memory)
Kubernetes / StatefulSets	StatefulSet status and replica count

Key Metrics to Monitor

# Pod CPU usage (fraction of request; multiply by 100 for a percentage)
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
  by (pod, namespace)
/
sum(kube_pod_container_resource_requests{resource="cpu"})
  by (pod, namespace)

# Pod memory usage (fraction of request; multiply by 100 for a percentage)
sum(container_memory_working_set_bytes{container!=""})
  by (pod, namespace)
/
sum(kube_pod_container_resource_requests{resource="memory"})
  by (pod, namespace)

# Pod restart count
increase(kube_pod_container_status_restarts_total[1h])

# Node CPU utilization
1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Disk space (percentage used)
100 - (node_filesystem_free_bytes{mountpoint="/"} /
        node_filesystem_size_bytes{mountpoint="/"} * 100)
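
To make the arithmetic behind the first query concrete, here is a minimal Python sketch using made-up sample values. It approximates rate() as a plain difference quotient between two counter samples; real Prometheus also handles counter resets and extrapolates to the window edges, so treat this as an illustration only. The function names are hypothetical.

```python
def simple_rate(first_value, last_value, window_seconds):
    """Approximate per-second increase of a counter over a window."""
    return (last_value - first_value) / window_seconds


def usage_ratio(used, requested):
    """Usage as a fraction of the request (1.0 means exactly at the request)."""
    return used / requested


# Made-up samples: the CPU counter grew from 120 to 150 CPU-seconds in 5 minutes.
cpu_cores_used = simple_rate(120.0, 150.0, 300)  # average cores in use
cpu_ratio = usage_ratio(cpu_cores_used, 0.25)    # request of 250m = 0.25 cores

print(f"CPU: {cpu_cores_used:.2f} cores, {cpu_ratio:.0%} of request")
```

A ratio above 1.0 means the pod is using more than it requested, which is usually a sign that the request should be raised.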

Custom ServiceMonitor

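To monitor your own apps, create a ServiceMonitor that selects your app's Service by label. The port field refers to a named port on that Service: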

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
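metadata:
  name: my-app-monitor
  labels:
    # Assumed: matches the Helm release name ("monitoring") so that the
    # stack's default selector picks up this ServiceMonitor
    release: monitoring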
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: http
      path: /metrics
      interval: 15s
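
The matchLabels selector above is an equality match: a Service is selected only if it carries every listed label with the same value, and extra labels on the Service are ignored. A minimal Python sketch of that rule follows; the service names and labels are invented for illustration.

```python
def matches(match_labels, labels):
    """True when every key/value pair in match_labels is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


selector = {"app": "my-app"}

# Hypothetical Services and their labels
services = {
    "my-app": {"app": "my-app", "tier": "backend"},
    "other-app": {"app": "other-app"},
    "unlabeled": {},
}

selected = [name for name, labels in services.items() if matches(selector, labels)]
print(selected)
```

If a ServiceMonitor seems to do nothing, a label mismatch between the selector and the Service is a common cause worth checking first.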

Your app needs a /metrics endpoint exposing Prometheus-formatted metrics:

import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter('http_requests_total', 'Total HTTP requests')
LATENCY = Histogram('http_request_duration_seconds', 'Request latency')

def handle_request():
    # Count every request and record how long it took
    REQUESTS.inc()
    with LATENCY.time():
        time.sleep(random.random())  # simulated work

if __name__ == '__main__':
    start_http_server(8000)  # serves /metrics on port 8000
    # The metrics server runs in a background thread, so keep the process alive
    while True:
        handle_request()
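
For a sense of what Prometheus receives when it scrapes /metrics, the sketch below renders a single unlabeled counter in the text exposition format using only the standard library. It is a hand-rolled illustration with a hypothetical helper name, not how the client library is implemented; the real library's output contains more lines and may format values differently.

```python
def render_counter(name, help_text, value):
    """Render one unlabeled counter in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} counter\n"
        f"{name} {value}\n"
    )


print(render_counter("http_requests_total", "Total HTTP requests", 42), end="")
```

Each metric is a HELP line, a TYPE line, and one sample line per label combination, which is why the format is easy to inspect with curl while debugging.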

Alerting Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
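metadata:
  name: my-alerts
  labels:
    # Assumed: matches the Helm release name ("monitoring") so that the
    # stack's default rule selector loads this PrometheusRule
    release: monitoring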
spec:
  groups:
    - name: kubernetes
      rules:
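        - alert: HighCpuUsage
          # Fires when a pod uses more than 80% of its CPU request
          expr: |
            sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
              by (pod, namespace)
            /
            sum(kube_pod_container_resource_requests{resource="cpu"})
              by (pod, namespace)
            > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} CPU above 80% of its request for 5m"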

        - alert: PodCrashLooping
          expr: |
            increase(kube_pod_container_status_restarts_total[10m]) > 3
          for: 2m
          labels:
            severity: critical
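
The for: field in the rules above means the expression must stay true for that long before the alert fires; until then the alert is pending, and any evaluation where the expression is false resets the timer. Here is a simplified Python model of that behavior. The function name and the evaluation data are invented, and real Prometheus tracks this per label set.

```python
def alert_states(condition_by_time, for_seconds):
    """Return the alert state at each evaluation time.

    condition_by_time: list of (timestamp_seconds, condition_is_true) pairs,
    in evaluation order.
    """
    states = []
    active_since = None  # when the condition most recently became true
    for timestamp, is_true in condition_by_time:
        if not is_true:
            active_since = None  # a false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        if timestamp - active_since >= for_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states


# Evaluations every 60s; the condition dips at t=120, which resets the timer.
evaluations = [(0, True), (60, True), (120, False),
               (180, True), (240, True), (300, True)]
print(alert_states(evaluations, for_seconds=120))
```

A longer for: value filters out short spikes at the cost of slower notification, so critical alerts usually get a shorter value than warnings.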

Resource Recommendations

# See current CPU/memory usage per pod and node (requires metrics-server)
kubectl top pods
kubectl top nodes

# For recommendations, install the Vertical Pod Autoscaler (VPA),
# which suggests CPU/memory requests based on observed usage

Key Commands

# Port forward Prometheus
kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-prometheus 9090:9090

# Port forward Alertmanager
kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-alertmanager 9093:9093

# Query Prometheus API
curl 'http://localhost:9090/api/v1/query?query=up'

# Check alerting rules
curl http://localhost:9090/api/v1/rules
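
Queries with braces or quotes must be URL-encoded before they go into the query parameter. The following Python sketch builds such a URL and parses an instant-vector response using only the standard library. The helper names and the job="my-app" label are hypothetical, and the sample payload is hand-written to match the response shape rather than captured from a real server.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Build an instant-query URL with the PromQL expression URL-encoded."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def extract_values(response_text):
    """Return (labels, value) pairs from an instant-vector response."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return [
        (item["metric"], float(item["value"][1]))
        for item in payload["data"]["result"]
    ]


url = build_query_url("http://localhost:9090", 'up{job="my-app"}')
print(url)

# Hand-written sample shaped like an instant-vector response, not real output.
sample = (
    '{"status":"success","data":{"resultType":"vector","result":'
    '[{"metric":{"job":"my-app"},"value":[1700000000,"1"]}]}}'
)
print(extract_values(sample))
```

Sample values arrive as strings in the JSON, so the sketch converts them to floats before returning them.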
