LitmusChaos Guide — Cloud-Native Chaos Engineering for Kubernetes
In this tutorial, you'll learn how to use LitmusChaos. It covers the key concepts, practical examples, and best practices you need to apply it effectively.
Litmus is an open-source Chaos Engineering platform designed for cloud-native environments. It extends Kubernetes with workflow orchestration, a ChaosHub experiment marketplace, GitOps integration, and automated resilience scoring that helps teams track their progress over time.
What You Will Learn
This tutorial teaches you how to install LitmusChaos, browse and run experiments from ChaosHub, create multi-step chaos workflows, integrate with CI/CD pipelines, and interpret resilience scores.
Why It Matters
LitmusChaos turns Chaos Engineering into a continuous practice that integrates with your existing development workflows. Instead of running manual experiments, you define chaos workflows as code, execute them automatically after deployments, and track resilience improvements through quantitative scores.
Real-World Use
DodaTech integrated LitmusChaos into the deployment pipeline for Durga Antivirus Pro. Every canary deployment triggers a chaos workflow that must pass before the release reaches more than 10 percent of users. This catches resilience regressions before they affect customers.
Prerequisites
Before starting, you should understand:
- Kubernetes operations and custom resource definitions
- Chaos Engineering fundamentals (hypothesis, steady state, blast radius)
- Basic CI/CD pipeline concepts
- Helm package manager
Step 1: Install LitmusChaos
Install LitmusChaos using Helm with the Litmus Portal for experiment management.
# Add the LitmusChaos Helm repository
helm repo add litmuschaos https://litmuschaos.github.io/litmus-helm
helm repo update
# Install LitmusChaos infrastructure
helm install litmus litmuschaos/litmus \
  --namespace litmus \
  --create-namespace \
  --set portal.frontend.service.type=NodePort
# Verify all pods are running
kubectl get pods -n litmus
Expected output (pod names, hash suffixes, and the exact set of pods vary by chart version):
NAME READY STATUS
litmus-frontend-7d9f8c6b4f-abc1 1/1 Running
litmus-server-5b7c8d9e4f-def2 1/1 Running
mongo-0 1/1 Running
chaos-exporter-c6b7d8e9f-ghi3 1/1 Running
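A quick way to spot pods that are not yet healthy is to filter the STATUS column of that listing. The sketch below assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, ...). The function name `not_running` is made up for this example and is not part of kubectl or Litmus.

```shell
#!/usr/bin/env bash
# Print the names of pods whose STATUS column is not "Running".
# Assumes the default `kubectl get pods` layout: NAME READY STATUS ...
# `not_running` is a hypothetical helper name used only in this sketch.
not_running() {
  # Skip the header row (NR > 1) and compare the third column.
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Usage against a live cluster (prints nothing when every pod is Running):
#   kubectl get pods -n litmus | not_running
```

If you would rather block until the pods are ready, `kubectl wait --for=condition=Ready pods --all -n litmus --timeout=300s` does that without any parsing.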
Step 2: Access the Litmus Portal
Port-forward to access the Litmus web UI and create your first project.
kubectl port-forward svc/litmus-frontend-service 9091:9091 -n litmus
# Expected output:
# Forwarding from 127.0.0.1:9091 -> 9091
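Open http://localhost:9091 in your browser. Log in with the admin account (a fresh install typically ships with the username admin and the password litmus, and asks you to change the password on first login), then create a project with a name that matches your team or service.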
Step 3: Run an Experiment from ChaosHub
ChaosHub is Litmus's built-in marketplace of pre-built chaos experiments. Browse and execute experiments without writing YAML from scratch.
# List available experiments using litmusctl
# (subcommand names differ between litmusctl releases; confirm with `litmusctl --help`)
litmusctl get experiments
# Expected output:
# EXPERIMENT NAME CHAOSHUB
# pod-delete litmuschaos
# node-cpu-hog litmuschaos
# network-loss litmuschaos
# kubelet-service-kill litmuschaos
# pod-autoscaler litmuschaos
# Run a pod-delete experiment on a target deployment
litmusctl create experiment pod-delete \
  --target-namespace default \
  --app-label app=nginx \
  --duration 30s
# Expected output:
# Experiment pod-delete scheduled successfully
Step 4: Create a Multi-Step Chaos Workflow
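LitmusChaos supports workflows that chain multiple experiments sequentially or in parallel. The manifest below runs three experiments in sequence: each step after the first lists its predecessor under dependsOn, so it starts only once that step has finished. Litmus builds its workflows on Argo Workflows, so treat the schema here as a simplified illustration and check the field names against your installed version.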
# chaos-workflow.yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosWorkflow
metadata:
  name: post-deployment-workflow
  namespace: litmus
spec:
  workflow:
    steps:
      - name: pod-delete-test
        template: pod-delete
      - name: network-loss-test
        template: network-loss
        dependsOn:
          - pod-delete-test
      - name: cpu-hog-test
        template: cpu-hog
        dependsOn:
          - network-loss-test
  templates:
    - name: pod-delete
      experiment: pod-delete
      spec:
        duration: 30s
    - name: network-loss
      experiment: network-loss
      spec:
        duration: 45s
        lossPercent: "30"
    - name: cpu-hog
      experiment: cpu-hog
      spec:
        duration: 60s
        cpuPercent: "80"
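To see the run order that the dependsOn entries in the manifest above imply, you can feed them to `tsort` from coreutils as "prerequisite dependent" pairs. This is only a sketch for reasoning about the dependency chain. It is not how Litmus schedules steps, and `run_order` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Derive a run order from "prerequisite dependent" pairs with tsort (coreutils).
# Each input line mirrors one dependsOn entry from the workflow manifest.
# `run_order` is a hypothetical helper name used only in this sketch.
run_order() {
  tsort <<'EOF'
pod-delete-test network-loss-test
network-loss-test cpu-hog-test
EOF
}

run_order
```

Because the three steps form a single chain, there is only one valid order: pod-delete-test, then network-loss-test, then cpu-hog-test. Steps without a dependsOn entry would have no pair here, which is a sign that they may run in parallel with the others.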
Step 5: Integrate with CI/CD Pipeline
Add chaos experiments to your CI/CD pipeline to automatically test resilience after every deployment.
# .github/workflows/chaos-pipeline.yml
name: Chaos Engineering Pipeline
on: deployment_status
jobs:
  chaos-test:
    # deployment_status has no activity types, so filter on the state here
    if: github.event.deployment_status.state == 'success'
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so chaos-workflow.yaml is available to the job
      - uses: actions/checkout@v4
      - name: Install litmusctl
        # Verify the download URL against the litmusctl releases page
        run: |
          curl -LO https://litmusctl.litmuschaos.io/latest/linux/litmusctl
          chmod +x litmusctl
          sudo mv litmusctl /usr/local/bin/
      - name: Run chaos workflow
        run: |
          litmusctl create workflow \
            --file chaos-workflow.yaml \
            --project-id ${{ secrets.LITMUS_PROJECT_ID }}
      - name: Check resilience score
        run: |
          litmusctl get resilience-score \
            --workflow post-deployment-workflow
Expected resilience score output:
Resilience Score: 92/100
Result: PASS (threshold: 80/100)
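For the score to gate a release, the pipeline step has to exit non-zero when the score is below the threshold. The sketch below assumes the report contains a line shaped like `Resilience Score: 92/100`, as in the sample output above. `check_resilience` is a hypothetical helper, not a litmusctl feature.

```shell
#!/usr/bin/env bash
# Fail (non-zero exit) when a reported resilience score is below a threshold.
# Assumes the input contains a line like "Resilience Score: 92/100".
# `check_resilience` is a hypothetical helper name used only in this sketch.
check_resilience() {
  local threshold="$1" score
  # Extract the integer between "Resilience Score: " and the slash.
  score=$(sed -n 's/^Resilience Score: \([0-9][0-9]*\)\/.*/\1/p' | head -n 1)
  if [ -z "$score" ]; then
    echo "no resilience score found" >&2
    return 2
  fi
  if [ "$score" -ge "$threshold" ]; then
    echo "PASS ($score >= $threshold)"
  else
    echo "FAIL ($score < $threshold)"
    return 1
  fi
}

# Usage in a pipeline step (reads the report from stdin):
#   litmusctl get resilience-score --workflow post-deployment-workflow \
#     | check_resilience 80
```

Returning a distinct exit code (2) when no score line is found keeps a malformed or empty report from being mistaken for a passing run.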
Learning Path
flowchart LR
    A[Chaos Mesh] --> B[LitmusChaos]
    B --> C[Gremlin]
    C --> D[AWS Fault Injection]
    D --> E[Azure Chaos Studio]
    style B fill:#f90,color:#fff
Common Errors
- Not connecting a chaos infrastructure agent before running experiments: The agent must be installed on the target cluster. Without it, experiments fail with infrastructure unavailable errors.
- Skipping probe configuration in experiments: Without probes, LitmusChaos cannot verify steady state during the experiment. Always configure HTTP or command probes.
- Running workflows without sequential dependencies: When multiple experiments run simultaneously, it is impossible to attribute degradation to a specific fault. Use dependsOn for sequential execution.
- Forgetting to label namespaces as chaos targets: LitmusChaos uses namespace labels to identify safe targets. Label namespaces with litmuschaos.io/chaos: enabled.
- Overlooking the ChaosHub experiment version: ChaosHub experiments receive updates. Pin specific versions in your workflow to avoid unexpected changes.
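To make the probe item above more concrete, here is a rough shell analogue of what a continuous HTTP probe checks: every sampled status code must equal the expected one, and a single mismatch fails the steady-state check. This is a conceptual sketch, not Litmus's probe implementation. `probe_verdict` and the URL in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Conceptual analogue of a continuous HTTP probe: every sampled status code
# must equal the expected one, otherwise the steady-state check fails.
# `probe_verdict` is a hypothetical helper, not a Litmus command.
probe_verdict() {
  local expected="$1" code
  shift
  for code in "$@"; do
    if [ "$code" != "$expected" ]; then
      echo "Fail"
      return 1
    fi
  done
  echo "Pass"
}

# Sampling a real endpoint (the URL is a placeholder):
#   code=$(curl -s -o /dev/null -w '%{http_code}' http://nginx.default.svc)
#   probe_verdict 200 "$code"
```

In an actual experiment the probe is declared in the experiment spec and evaluated by Litmus itself. The point here is only that a verdict needs an explicit expected value to compare against.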
Practice Questions
- What is ChaosHub and how does it simplify experiment creation?
- How do you create a sequential chaos workflow in LitmusChaos?
- What is a resilience score and how is it calculated?
- How do you integrate LitmusChaos with a GitHub Actions CI/CD pipeline?
- What are probes and why are they critical for experiment safety?
Challenge
Set up a LitmusChaos workflow that runs three sequential experiments after a deployment: pod-delete, network-loss with 30 percent packet loss, and cpu-hog at 80 percent utilization. Configure HTTP probes to verify service health throughout. Each experiment should only proceed if the previous one passes. Integrate the workflow into a GitHub Actions pipeline and verify the resilience score.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.