Chaos Mesh — Kubernetes Chaos Engineering Platform

DodaTech Updated 2026-06-21 4 min read

This tutorial introduces Chaos Mesh: its key concepts, practical examples, and best practices for running it safely on a Kubernetes cluster.

Chaos Mesh is an open-source Chaos Engineering platform designed specifically for Kubernetes. It provides a rich set of fault types — pod kills, network partitions, DNS failures, disk I/O delays, and CPU stress — all managed through Kubernetes custom resources.

What You Will Learn

This tutorial teaches you how to install Chaos Mesh, define chaos experiments as Kubernetes resources, and run safe fault injection experiments on your cluster.

Why It Matters

Chaos Mesh turns Chaos Engineering into a native Kubernetes experience. You define experiments with the same tools and workflows you already use for deployments. This reduces the barrier to entry and makes experiments reproducible and version-controlled.

Real-World Use

DodaTech uses Chaos Mesh to run weekly experiments across 40 microservices. Every experiment is defined as a YAML file stored in the same Git repository as the service manifests, making experiments auditable and repeatable.

Prerequisites

Before starting, you should understand:

Kubernetes cluster operations and kubectl commands
Chaos Engineering fundamentals from the previous tutorials
How to create and apply Kubernetes custom resources
Basic YAML syntax

Step 1: Install Chaos Mesh

Install Chaos Mesh using Helm (the project also offers a quick installation script, which this tutorial does not cover):

# Install Chaos Mesh using Helm
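helm repo add chaos-mesh https://charts.chaos-mesh.org
helm repo update
helm install chaos-mesh chaos-mesh/chaos-mesh \
  --namespace chaos-mesh \
  --create-namespace \
  --version 2.7.0
# On containerd-based clusters, the Chaos Mesh docs additionally pass:
#   --set chaosDaemon.runtime=containerd \
#   --set chaosDaemon.socketPath=/run/containerd/containerd.sock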

# Verify installation
kubectl get pods -n chaos-mesh
# Expected output (abridged; pod name suffixes and replica counts will differ):
# NAME                                        READY   STATUS
# chaos-controller-manager-7d9f8c6b4f-abc1   1/1     Running
# chaos-daemon-5h6k8                         1/1     Running
# chaos-dashboard-abc123                     1/1     Running
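
One way to script the verification step is sketched below. The helper name `not_running` is hypothetical (it is not part of Chaos Mesh or kubectl), and it assumes the default `kubectl get pods` column order of NAME, READY, STATUS.

```shell
# not_running: hypothetical helper, not part of Chaos Mesh or kubectl.
# Reads `kubectl get pods --no-headers` output on stdin and prints the
# name of every pod whose STATUS column (third field) is not "Running".
not_running() {
  awk '$3 != "Running" { print $1 }'
}

# Usage against a live cluster (left commented so the snippet has no side effects):
#   kubectl get pods -n chaos-mesh --no-headers | not_running
```

If the command prints nothing, every Chaos Mesh pod reports Running. Any name it prints is a pod worth inspecting with `kubectl describe pod` before you start an experiment.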

Step 2: Explore the Fault Types

Chaos Mesh exposes each fault type as its own custom resource kind. List the kinds installed on your cluster:

# List available chaos kinds
kubectl api-resources | grep chaos
# Expected output (abridged):
# podchaos               chaos-mesh.org/v1alpha1
# networkchaos           chaos-mesh.org/v1alpha1
# dnschaos               chaos-mesh.org/v1alpha1
# httpchaos              chaos-mesh.org/v1alpha1
# iochaos                chaos-mesh.org/v1alpha1
# stresschaos            chaos-mesh.org/v1alpha1
# kernelchaos            chaos-mesh.org/v1alpha1
# timechaos              chaos-mesh.org/v1alpha1

Step 3: Create a Pod Kill Experiment

The simplest Chaos Mesh experiment kills a single pod:

# pod-kill-example.yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-demo
spec:
  action: pod-kill
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      app: nginx
  # pod-kill is a one-shot action; duration matters most for ongoing faults such as pod-failure
  duration: 30s

kubectl apply -f pod-kill-example.yaml
# Expected output:
# podchaos.chaos-mesh.org/pod-kill-demo created
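
The experiment only has an effect if pods labelled app: nginx exist in the default namespace. As a sketch, the snippet below writes a throwaway Deployment manifest to give the experiment a target. The file name nginx-target.yaml, the replica count, and the image tag are assumptions chosen for illustration.

```shell
# Write a small Deployment whose pods carry the app=nginx label that
# pod-kill-demo selects. File name and replica count are illustrative.
cat > nginx-target.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:stable
EOF

# Against a live cluster you would then run (commented out here):
#   kubectl apply -f nginx-target.yaml
#   kubectl get pods -l app=nginx --watch   # one pod should be replaced after the kill
```

Running several replicas means the Deployment controller recreates the killed pod while the others keep serving, which is the recovery behaviour the experiment is meant to exercise.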

Step 4: Create a Network Latency Experiment

Simulate network delays to test how services handle slow connections:

# network-latency.yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-latency-demo
spec:
  action: delay
  mode: all
  selector:
    namespaces:
      - default
    labelSelectors:
      app: web-service
  delay:
    latency: 500ms
    correlation: "50"  # string-typed field, so the number is quoted
    jitter: 100ms
  duration: 60s

kubectl apply -f network-latency.yaml
# Expected output:
# networkchaos.chaos-mesh.org/network-latency-demo created
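
Chaos Mesh applies delay faults through the Linux tc netem queueing discipline, where jitter is a plus-or-minus spread around latency and correlation controls how strongly each packet's delay depends on the previous one. As a rough sketch (the helper name `delay_range` is hypothetical), the approximate per-packet delay window for the manifest above can be computed like this:

```shell
# delay_range: hypothetical helper that prints the approximate window of
# per-packet delays for a given latency and jitter, both in milliseconds.
delay_range() {
  local latency_ms=$1 jitter_ms=$2
  local low=$(( latency_ms - jitter_ms ))
  local high=$(( latency_ms + jitter_ms ))
  # A delay cannot be negative, so clamp the lower bound at zero.
  if [ "$low" -lt 0 ]; then low=0; fi
  echo "${low}-${high} ms"
}

delay_range 500 100   # values from network-latency.yaml; prints: 400-600 ms
```

Client timeouts shorter than the upper bound are the ones most likely to fire during the experiment, so this window is a useful number to compare against your service's timeout settings.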

Step 5: Monitor and Stop Experiments

You can monitor active experiments in the Chaos Mesh dashboard or from the command line with kubectl:

# List active experiments
kubectl get podchaos
# Expected output (illustrative; the columns shown may differ between Chaos Mesh versions):
# NAME              ACTION    DURATION   STATUS
# pod-kill-demo     pod-kill  30s        Running

# Manually stop an experiment
kubectl delete podchaos pod-kill-demo
# Expected output:
# podchaos.chaos-mesh.org "pod-kill-demo" deleted
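
Deleting the resource ends the experiment for good. Chaos Mesh also documents a pause annotation, experiment.chaos-mesh.org/pause, for suspending an ongoing experiment without deleting it. The wrapper functions below are a hypothetical convenience sketch; the KUBECTL variable is an assumption that lets you substitute echo for a dry run.

```shell
# Hypothetical wrappers around the documented pause annotation.
# KUBECTL defaults to kubectl; set KUBECTL=echo to print the arguments instead.
pause_experiment() {   # usage: pause_experiment <kind> <name>
  ${KUBECTL:-kubectl} annotate "$1" "$2" experiment.chaos-mesh.org/pause=true
}

resume_experiment() {  # usage: resume_experiment <kind> <name>
  # A trailing dash removes the annotation, which resumes the experiment.
  ${KUBECTL:-kubectl} annotate "$1" "$2" experiment.chaos-mesh.org/pause-
}

# Dry run: print the kubectl arguments for the network latency experiment.
KUBECTL=echo pause_experiment networkchaos network-latency-demo
KUBECTL=echo resume_experiment networkchaos network-latency-demo
```

Pausing is most useful for ongoing faults such as network delay. A one-shot action like pod-kill has already happened by the time you could pause it.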

Learning Path

flowchart LR
  A[Game Days] --> B[Chaos Mesh]
  B --> C[LitmusChaos]
  C --> D[Gremlin Platform]
  D --> E[AWS Fault Injection]
  style B fill:#f90,color:#fff

Common Errors

Forgetting to set a duration: Without a duration, an ongoing fault such as network delay runs until you delete the resource. Always set a duration, even when a schedule triggers the experiment.
Using mode: all without understanding blast radius: mode: all affects every matching pod. Use mode: one for initial experiments.
Network chaos blocking critical system traffic: Carefully scope network chaos selectors to avoid blocking Kubernetes control plane traffic.
Not verifying Chaos Mesh pod status after installation: If the chaos-daemon pods are not running, experiments will silently fail.
Applying chaos resources to the wrong namespace: Double-check the selector namespace. An experiment meant for staging might target production.
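
To make the scoping advice concrete, here is a sketch of a delay experiment that affects only traffic from one pod to a specific downstream workload, rather than all traffic of every matching pod. It relies on the direction and target fields of NetworkChaos. The resource name, the file name, and the app: backend-api label are hypothetical.

```shell
# Write a narrowly scoped NetworkChaos manifest (names are illustrative).
cat > network-latency-scoped.yaml <<'EOF'
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-latency-scoped
spec:
  action: delay
  mode: one                 # a single source pod, to keep the blast radius small
  selector:
    namespaces:
      - default
    labelSelectors:
      app: web-service
  direction: to             # only traffic from the selected pod to the target
  target:
    mode: all
    selector:
      namespaces:
        - default
      labelSelectors:
        app: backend-api    # hypothetical downstream service
  delay:
    latency: 200ms
    jitter: 0ms
  duration: 60s
EOF

# Validate without injecting anything (requires a cluster with Chaos Mesh installed):
#   kubectl apply --dry-run=server -f network-latency-scoped.yaml
```

Naming an explicit target keeps the delay away from unrelated traffic such as health checks and DNS lookups, and the server-side dry run catches schema mistakes before any fault is injected.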

Practice Questions

Name six of the fault types supported by Chaos Mesh.
How do you limit a Chaos Mesh experiment to a single pod?
What is the purpose of the chaos-daemon component?
How do you stop a running Chaos Mesh experiment?
Why should you set a duration on every experiment?

Challenge

Create a Chaos Mesh experiment that injects 300ms of latency into a web service for 90 seconds, targeting only pods with the label tier: frontend in the staging namespace. Verify the experiment is running and then stop it manually.

FAQ

What is Chaos Mesh?

Chaos Mesh is an open-source Chaos Engineering platform for Kubernetes that provides various fault types as Kubernetes custom resources.

How is Chaos Mesh different from Chaos Monkey?

Chaos Monkey only terminates instances. Chaos Mesh provides pod kills, network partitions, DNS failures, disk I/O faults, CPU stress, and time skew faults.

Do I need to modify my application code to use Chaos Mesh?

No. Chaos Mesh operates at the infrastructure level. Your application code does not need any changes.

Can Chaos Mesh run scheduled experiments?

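Yes. In Chaos Mesh 2.x you define a Schedule resource, which wraps an experiment spec and a cron expression to run it on a recurring schedule. The scheduler field on the experiment itself belongs to the older 1.x API.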
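
As a sketch of what a recurring experiment can look like in Chaos Mesh 2.x, the snippet below writes a Schedule custom resource that embeds a PodChaos spec. The resource name, file name, cron expression, and history limit are illustrative assumptions.

```shell
# Write a Schedule manifest that runs a pod-kill on a cron schedule.
cat > pod-kill-weekly.yaml <<'EOF'
apiVersion: chaos-mesh.org/v1alpha1
kind: Schedule
metadata:
  name: pod-kill-weekly
spec:
  schedule: '0 10 * * 1'      # every Monday at 10:00 (illustrative)
  historyLimit: 3
  concurrencyPolicy: Forbid   # skip a run if the previous one is still active
  type: PodChaos
  podChaos:
    action: pod-kill
    mode: one
    selector:
      namespaces:
        - default
      labelSelectors:
        app: nginx
EOF

# Against a live cluster you would then run (commented out here):
#   kubectl apply -f pod-kill-weekly.yaml
#   kubectl get schedule
```

Because the schedule is just another manifest, it can live in the same Git repository as the experiments it triggers.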

Is Chaos Mesh production-safe?

Yes, when used with proper blast radius controls. Start with staging, limit scope with selectors, and always set a duration.
