Monitoring Kubernetes: kube-state-metrics and cAdvisor
In this tutorial, you'll learn how to monitor Kubernetes with kube-state-metrics and cAdvisor. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.
What You Will Learn
This tutorial teaches you how to set up comprehensive monitoring for a Kubernetes cluster using kube-state-metrics for object-level metrics, cAdvisor for container metrics, and Prometheus for collection and alerting.
Why It Matters
Kubernetes is dynamic -- pods come and go, nodes scale, and workloads shift. Monitoring tools that depend on a static list of hosts cannot keep up. You need a monitoring stack designed for ephemeral infrastructure that automatically discovers new targets as they appear.
Real-World Use
The DodaTech infrastructure team manages 15 Kubernetes clusters across three regions. When a node failed in us-east-1, Prometheus detected the node condition change, kube-state-metrics showed the pod redistribution, and cAdvisor reported the resource pressure on remaining nodes -- all within 30 seconds of the failure.
Monitoring Kubernetes requires three layers: node-level metrics (cAdvisor for containers, Node Exporter for hosts), object-level metrics (kube-state-metrics for deployments, services, pods), and control plane metrics (API server, scheduler, controller manager). The Kubernetes monitoring ecosystem is built around Prometheus and its Kubernetes service discovery.
Prerequisites
- A running Kubernetes cluster (local minikube or cloud-based)
- Docker installed
- Basic knowledge of kubectl commands
- Familiarity with Prometheus basics (see the Prometheus Introduction tutorial)
Step-by-Step Tutorial
Step 1: Deploy Prometheus Stack with Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
Expected output: A complete monitoring stack deployed including Prometheus, Alertmanager, Grafana, kube-state-metrics, and node-exporter.
Step 2: Verify the Deployment
kubectl get pods -n monitoring
kubectl get svc -n monitoring
Look for pods whose names contain kube-state-metrics, node-exporter, and prometheus. Exact names depend on the Helm release name; with the release name prometheus used in Step 1, they start with prometheus-. Unlike the standalone prometheus chart, kube-prometheus-stack does not create a pod named prometheus-server.
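The same check can be scripted. The following is a minimal sketch, assuming kubectl is on the PATH and already points at the cluster. The function names fetch_pods and not_running are illustrative, and SAMPLE is hypothetical data that only mimics the shape of `kubectl get pods -o json` output.

```python
import json
import subprocess


def fetch_pods(namespace: str = "monitoring") -> dict:
    """Return the parsed output of `kubectl get pods -n <namespace> -o json`."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def not_running(pod_list: dict) -> list:
    """Names of pods whose phase is neither Running nor Succeeded."""
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") not in ("Running", "Succeeded")
    ]


# Hypothetical sample; pod names are made up for illustration.
SAMPLE = {
    "items": [
        {"metadata": {"name": "prometheus-kube-state-metrics-abc"},
         "status": {"phase": "Running"}},
        {"metadata": {"name": "prometheus-grafana-def"},
         "status": {"phase": "Pending"}},
    ]
}

if __name__ == "__main__":
    # Against a live cluster: print(not_running(fetch_pods()))
    print(not_running(SAMPLE))
```

An empty list means every pod in the namespace has reached Running (or Succeeded, for one-off jobs).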
Step 3: Explore kube-state-metrics
kube-state-metrics generates metrics about Kubernetes objects. Port-forward and view:
kubectl port-forward -n monitoring svc/prometheus-kube-state-metrics 8080:8080
curl http://localhost:8080/metrics | head -30
Expected output: Metrics like kube_deployment_status_replicas, kube_pod_status_phase, kube_node_status_condition.
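To see how these metrics are structured, it can help to parse the text output yourself. The sketch below is an illustration only: SAMPLE_METRICS is a hand-written excerpt (real output carries more labels, such as uid), and the parser handles only the simple `name{labels} value` lines, not the full exposition format. It relies on the fact that kube_pod_status_phase emits one series per pod and phase, with value 1 for the current phase and 0 for the others.

```python
import re
from collections import defaultdict

# Hand-written excerpt in the Prometheus text exposition format (assumed data).
SAMPLE_METRICS = """\
# HELP kube_pod_status_phase The pods current phase.
# TYPE kube_pod_status_phase gauge
kube_pod_status_phase{namespace="default",pod="web-1",phase="Pending"} 0
kube_pod_status_phase{namespace="default",pod="web-1",phase="Running"} 1
kube_pod_status_phase{namespace="default",pod="web-2",phase="Pending"} 1
kube_pod_status_phase{namespace="default",pod="web-2",phase="Running"} 0
kube_deployment_status_replicas{namespace="default",deployment="web"} 2
"""

LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def pods_per_phase(samples):
    """Count pods per phase, using only the series whose value is 1."""
    counts = defaultdict(int)
    for name, labels, value in samples:
        if name == "kube_pod_status_phase" and value == 1:
            counts[labels["phase"]] += 1
    return dict(counts)


if __name__ == "__main__":
    print(pods_per_phase(parse_metrics(SAMPLE_METRICS)))
```

Because every pod has a series for every phase, counting series would report each pod once per phase; filtering on the value is what yields the real per-phase totals.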
Step 4: Key kube-state-metrics Queries
# Number of running pods (the metric has one series per pod and phase, valued 0 or 1, so sum the values)
sum(kube_pod_status_phase{phase="Running"})
# Deployments with unavailable replicas
kube_deployment_status_replicas_unavailable > 0
# Node memory capacity
kube_node_status_capacity{resource="memory"}
# Pods by node
count by (node) (kube_pod_info)
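Queries like these can also be run outside the Prometheus UI through the HTTP API endpoint /api/v1/query. The following is a rough sketch using only the Python standard library. The base URL http://localhost:9090 is an assumption (for example, a port-forward to the Prometheus service), the function names are illustrative, and SAMPLE_RESPONSE is made-up data in the shape of an instant-vector response.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def instant_query(base_url, promql, timeout=10.0):
    """Run an instant query and return the decoded JSON response."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def vector_to_dict(response, label):
    """Map the value of `label` to the sample value for an instant vector."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    return {
        sample["metric"].get(label, ""): float(sample["value"][1])
        for sample in data["result"]
    }


# Made-up response for: count by (node) (kube_pod_info)
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"node": "node-a"}, "value": [1700000000.0, "12"]},
            {"metric": {"node": "node-b"}, "value": [1700000000.0, "9"]},
        ],
    },
}

if __name__ == "__main__":
    # Live usage (assumed address):
    #   resp = instant_query("http://localhost:9090", "count by (node) (kube_pod_info)")
    print(vector_to_dict(SAMPLE_RESPONSE, "node"))
```

Sample values arrive as strings in the JSON response, which is why the sketch converts them with float().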
Step 5: Explore cAdvisor Metrics
cAdvisor is embedded in the kubelet. It exposes container-level metrics:
# Container CPU usage
rate(container_cpu_usage_seconds_total[5m])
# Container memory usage
container_memory_usage_bytes
# Container network receive rate
rate(container_network_receive_bytes_total[5m])
# Container filesystem usage
container_fs_usage_bytes
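The CPU and network queries wrap a counter in rate() because the raw values only ever grow. The sketch below shows the idea behind that calculation under simplifying assumptions: it takes the increase between consecutive samples, treats a drop as a counter reset (for example, a container restart), and divides by the elapsed time. Prometheus additionally extrapolates to the window boundaries, which this sketch omits, so the results will not match rate() exactly. The function name simple_rate and the sample values are illustrative.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset; count the new value from zero.
        increase += cur if cur < prev else cur - prev
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


if __name__ == "__main__":
    # container_cpu_usage_seconds_total sampled every 60s over 5 minutes
    cpu_seconds = [(0, 100.0), (60, 130.0), (120, 160.0),
                   (180, 190.0), (240, 220.0), (300, 250.0)]
    print(simple_rate(cpu_seconds))  # CPU cores used on average
```

For a CPU counter measured in seconds, the result reads as the average number of cores in use over the window.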
Step 6: Monitor Kubernetes Control Plane
If your cluster exposes control plane metrics, scrape the API server:
# Add to the Prometheus scrape configuration
- job_name: "kubernetes-apiservers"
  kubernetes_sd_configs:
    - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # Keep verification on; skipping it would make the CA file above pointless
    insecure_skip_verify: false
  authorization:
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
    # Keep only the https endpoints of the kubernetes service in the default namespace
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https
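The relabel rule in this job is what narrows the discovered endpoints down to the API server. As a rough model of the keep action: Prometheus joins the values of the source labels with the separator (";" by default) and keeps the target only if the regex matches the whole joined string. The sketch below imitates that with Python's re module, which is close to but not identical to the RE2 syntax Prometheus uses. The function name keep_target and the sample targets are illustrative.

```python
import re

SOURCE_LABELS = [
    "__meta_kubernetes_namespace",
    "__meta_kubernetes_service_name",
    "__meta_kubernetes_endpoint_port_name",
]


def keep_target(target_labels, source_labels, regex, separator=";"):
    """Return True if a `keep` relabel rule would retain this target."""
    # Missing labels are treated as empty strings.
    joined = separator.join(target_labels.get(name, "") for name in source_labels)
    # Relabel regexes are anchored, so the whole string must match.
    return re.fullmatch(regex, joined) is not None


if __name__ == "__main__":
    api_server = {
        "__meta_kubernetes_namespace": "default",
        "__meta_kubernetes_service_name": "kubernetes",
        "__meta_kubernetes_endpoint_port_name": "https",
    }
    other = dict(api_server, __meta_kubernetes_service_name="my-app")
    for target in (api_server, other):
        print(keep_target(target, SOURCE_LABELS, "default;kubernetes;https"))
```

Every other endpoint in the cluster produces a different joined string and is dropped before it is ever scraped.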
Step 7: Set Up ServiceMonitors for Custom Applications
ServiceMonitor is a custom resource provided by the Prometheus Operator (installed as part of kube-prometheus-stack). It tells Prometheus which Services to scrape and how:
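apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  namespace: monitoring
  labels:
    # With default kube-prometheus-stack values, Prometheus only selects
    # ServiceMonitors labeled with the Helm release name ("prometheus" in Step 1)
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: http
      interval: 15s
  namespaceSelector:
    matchNames:
      - default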
Apply it:
kubectl apply -f servicemonitor.yaml
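If the application does not show up as a target afterwards, it is usually because one of the selectors does not match. The sketch below is a simplified model of how the ServiceMonitor above picks Services: the namespace must be listed in matchNames, every matchLabels pair must be present on the Service, and the Service needs a port with the referenced name. It is not the operator's actual code; the function name would_scrape and the sample Service dictionaries are hypothetical.

```python
def would_scrape(monitor_spec, service):
    """Return True if the ServiceMonitor spec would yield targets for the Service."""
    namespaces = monitor_spec.get("namespaceSelector", {}).get("matchNames", [])
    if service["namespace"] not in namespaces:
        return False

    # matchLabels: every key/value pair must be present on the Service.
    match_labels = monitor_spec.get("selector", {}).get("matchLabels", {})
    labels = service.get("labels", {})
    if any(labels.get(key) != value for key, value in match_labels.items()):
        return False

    # `port` in an endpoint refers to the *name* of a Service port.
    wanted_ports = {ep["port"] for ep in monitor_spec.get("endpoints", [])}
    return bool(wanted_ports & set(service.get("port_names", [])))


MONITOR_SPEC = {
    "selector": {"matchLabels": {"app": "my-app"}},
    "endpoints": [{"port": "http", "interval": "15s"}],
    "namespaceSelector": {"matchNames": ["default"]},
}

if __name__ == "__main__":
    good = {"namespace": "default", "labels": {"app": "my-app", "tier": "web"},
            "port_names": ["http"]}
    unnamed_port = dict(good, port_names=[])
    print(would_scrape(MONITOR_SPEC, good), would_scrape(MONITOR_SPEC, unnamed_port))
```

The second case is a common pitfall: a Service whose port has no name cannot be referenced by `port: http`, so no targets are produced even though the labels match.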
Step 8: Create Kubernetes-Specific Alerts
groups:
  - name: kubernetes
    rules:
      - alert: PodNotRunning
        expr: kube_pod_status_phase{phase=~"Pending|Unknown|Failed"} > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} is not running"
      - alert: NodeNotReady
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.node }} is not ready"
      - alert: HighPodRestartRate
        # rate() is per second, so "> 1" would require more than one restart per second;
        # increase() counts restarts over the 10-minute window instead
        expr: increase(kube_pod_container_status_restarts_total[10m]) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} is restarting frequently"
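Each rule uses `for: 5m`, which delays the alert until the expression has been true at every evaluation for five minutes. A simplified model of that state machine is sketched below: an alert is inactive while the expression is false, pending once it becomes true, and firing once it has stayed true for at least the `for` duration. The function name alert_states and the evaluation data are illustrative, and details such as per-series tracking are left out.

```python
def alert_states(condition_by_eval, eval_interval_s, for_s):
    """Return the alert state after each rule evaluation.

    condition_by_eval: list of booleans, one per evaluation cycle.
    """
    states = []
    active_since = None
    for i, condition in enumerate(condition_by_eval):
        now = i * eval_interval_s
        if not condition:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


if __name__ == "__main__":
    # Evaluated every 60s with a 120s hold time to keep the example short.
    print(alert_states([False, True, True, True, False, True], 60, 120))
```

This is why a pod that is Pending for a minute during a rollout does not page anyone, while one stuck for five minutes does.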
Learning Path
flowchart LR
A[Kubernetes Cluster] --> B[kube-state-metrics]
A --> C[cAdvisor/kubelet]
A --> D[Node Exporter]
B --> E[Prometheus]
C --> E
D --> E
E --> F[Grafana]
E --> G[Alertmanager]
B -.-> H[Deployment/Pod/Node metrics]
C -.-> I[Container CPU/Memory/Network]
style A fill:#4a90d9,color:#fff
style E fill:#e67e22,color:#fff
Common Errors
kube-state-metrics shows no data -- The service account does not have sufficient RBAC permissions. Check the ClusterRole bindings for kube-state-metrics.
cAdvisor metrics are missing -- The kubelet is not configured for cAdvisor or the port is blocked. Verify kubelet is listening on port 10250.
Pod restarts cause metric gaps -- Prometheus scrapes targets by pod IP. When a pod is recreated or rescheduled, its IP changes, and Prometheus only picks up the new IP on the next service discovery refresh. A short gap in the series is expected.
Helm chart installation fails -- The Helm repository URL is wrong or the chart name is incorrect. Verify with helm search repo prometheus-community/kube-prometheus-stack.
ServiceMonitor does not appear in Prometheus -- The CRD was not installed. Ensure the kube-prometheus-stack CRDs are present with kubectl get crd.
Control plane scrape returns 403 -- Token authentication is not configured correctly. Verify the service account has permissions to access the API server metrics endpoint.
High memory usage from Prometheus in cluster -- The number of time series is too high for the allocated memory. Reduce series cardinality (for example, by dropping unused metrics or high-cardinality labels) or raise the Prometheus memory request and limit. Lowering --storage.tsdb.retention.time mainly saves disk space, not memory.
Practice Questions
What does kube-state-metrics expose? Answer: Metrics about the state of Kubernetes objects: pods, deployments, nodes, services, namespaces, and other resources.
Where does cAdvisor run in a Kubernetes cluster? Answer: cAdvisor is embedded in the kubelet binary on each node. It exposes container-level metrics through the kubelet API.
What is a ServiceMonitor in the Prometheus Operator ecosystem? Answer: A custom resource that defines how Prometheus should scrape metrics from a set of Kubernetes services, including label selectors and port configuration.
How does Prometheus discover targets in Kubernetes? Answer: Through kubernetes_sd_configs with roles like pod, service, endpoints, node, and ingress.
What is the purpose of the kube-prometheus-stack Helm chart? Answer: It deploys a complete Prometheus monitoring stack for Kubernetes, including Prometheus, Alertmanager, Grafana, kube-state-metrics, and node-exporter with preconfigured dashboards and alerts.
Challenge
Deploy the kube-prometheus-stack Helm chart to a Kubernetes cluster with three worker nodes. Verify that kube-state-metrics exposes deployment, pod, and node metrics. Confirm cAdvisor metrics are available from each node. Create a custom ServiceMonitor for a sample application deployed with 3 replicas. Write Prometheus alerting rules for: node not ready (critical), pod in CrashLoopBackOff (critical), and node disk pressure (warning). Import the Kubernetes cluster Grafana dashboard (ID 315) and verify all panels show data. Generate load on one node and observe the dashboard change.
Built by the developers of DodaTech