Gremlin Platform — Managed Chaos Engineering Service

DodaTech, updated 2026-06-21

This tutorial introduces the Gremlin platform: installing the agent, authenticating, running CPU and network latency attacks, and chaining attacks into a scenario.

Gremlin is a managed Chaos Engineering platform that provides safe, controlled failure injection for both Kubernetes and traditional infrastructure. Gremlin offers a web console, CLI, and API for running experiments with built-in safety controls.

What You Will Learn

This tutorial teaches you how to use the Gremlin platform to inject faults, create scenarios, and run experiments with automated guardrails and team collaboration.

Why It Matters

Gremlin abstracts away the complexity of building your own Chaos Engineering infrastructure. It provides a curated set of attack types — from CPU exhaustion to blackhole networking — with safety controls that prevent experiments from going out of control.

Real-World Use

DodaTech uses Gremlin for chaos experiments on legacy infrastructure that does not run on Kubernetes. The Gremlin agent supports Linux and Windows hosts, making it possible to run chaos experiments on bare metal servers running Durga Antivirus Pro scanning nodes.

Prerequisites

Before starting you should understand:

Chaos Engineering concepts (steady state, hypothesis, blast radius)
How to install and configure software agents on Linux
Basic networking concepts (latency, packet loss, bandwidth)

Step 1: Install the Gremlin Agent

# Install Gremlin agent on Ubuntu/Debian
curl -fsSL https://www.gremlin.com/install/ubuntu.sh | sudo bash
sudo systemctl start gremlind
sudo systemctl enable gremlind

# Verify agent is running
sudo gremlin status
# Expected output:
# Gremlin daemon is running
# Client ID: abc123-def456
# Team ID: team-789

Step 2: Authenticate and List Attack Types

Configure the agent with your team credentials:

# Authenticate with Gremlin
gremlin login --team admin-team --password ********
# Expected output:
# ✅ Successfully authenticated
# Welcome to Gremlin

# List available attack types
gremlin help attacks
# Expected output:
# CPU - Consume CPU resources
# Memory - Consume memory resources
# Blackhole - Drop all network traffic
# Latency - Add network latency
# Packet Loss - Drop network packets
# DNS - Block or modify DNS responses
# Process Kill - Terminate a specific process
# Shutdown - Shutdown or reboot the host

Step 3: Run a CPU Attack

The simplest attack consumes CPU resources on a target host:

# Saturate 1 CPU core for 60 seconds
gremlin attack cpu \
  --length 60 \
  --cores 1

# Expected output:
# ✅ CPU attack created
# Attack ID: cpu-attack-001
# Type: CPU
# Targets: 1 host
# Duration: 60 seconds

Monitor the effect on the target:

# On the target host check CPU usage
top -bn1 | head -5
# Expected output on a single-core host (with N cores, expect about 100/N % user time):
# %Cpu(s): 100.0 us,  0.0 sy,  0.0 ni
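
A single `top` snapshot can be noisy. As a rough alternative, the sketch below computes CPU utilization between two samples of the first line of /proc/stat on a Linux host. The function name `cpu_busy_pct` is illustrative and not part of Gremlin; the field order (user, nice, system, idle, iowait, irq, softirq, steal) is assumed to follow the Linux proc documentation.

```shell
#!/bin/sh
# cpu_busy_pct SAMPLE1 SAMPLE2
# Each sample is the first line of /proc/stat, e.g. "cpu  100 0 50 800 0 0 0 0".
# Prints the percentage of CPU time spent non-idle between the two samples.
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      idle[NR]  = $5 + $6                        # idle + iowait
      total[NR] = 0
      for (i = 2; i <= 9; i++) total[NR] += $i   # user through steal
    }
    END {
      dt = total[2] - total[1]
      di = idle[2] - idle[1]
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * (dt - di) / dt
    }'
}

# Worked example with made-up counters: 100 busy ticks out of 400 total.
cpu_busy_pct "cpu 0 0 0 0 0 0 0 0" "cpu 100 0 0 300 0 0 0 0"   # prints 25.0

# Live use on the target host (uncomment to run):
# s1=$(head -n 1 /proc/stat); sleep 5; s2=$(head -n 1 /proc/stat)
# cpu_busy_pct "$s1" "$s2"
```

On a four-core host with one core saturated, a result near 25 is what this calculation would be expected to show, since the figure is averaged over all cores.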

Step 4: Run a Network Latency Attack

Simulate network delays between services:

# Add 200ms latency to port 8080 for 120 seconds
gremlin attack latency \
  --length 120 \
  --target 1 \
  --port 8080 \
  --latency 200

# Expected output:
# ✅ Latency attack created
# Attack ID: lat-attack-002
# Type: Latency
# Latency: 200ms
# Duration: 120 seconds

Verify the latency is applied:

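# From another host, time a request to the affected port (assuming an HTTP service on 8080).
# ping uses ICMP, so it does not exercise a delay scoped to TCP port 8080.
curl -o /dev/null -s -w '%{time_total}\n' http://target-host:8080/
# Expected output: a total time noticeably higher than the pre-attack baseline
# (each delayed round trip adds about 0.2 seconds)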
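
One way to judge whether the injected delay took effect is to compare a measurement taken before the attack with one taken during it. The sketch below does that arithmetic. The function names and the tolerance parameter are illustrative assumptions, not part of Gremlin, and the inputs are whatever timings you collected, in milliseconds.

```shell
#!/bin/sh
# latency_increase BASELINE_MS ATTACK_MS
# Prints how much slower the measurement taken during the attack is.
latency_increase() {
  awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", a - b }'
}

# latency_increase_ok BASELINE_MS ATTACK_MS EXPECTED_MS TOLERANCE_MS
# Succeeds when the observed increase is at least EXPECTED_MS minus TOLERANCE_MS.
latency_increase_ok() {
  awk -v b="$1" -v a="$2" -v e="$3" -v t="$4" 'BEGIN {
    if ((a - b) >= (e - t)) exit 0
    exit 1
  }'
}

# Worked example with made-up timings: 12.5 ms before, 216 ms during the attack.
latency_increase 12.5 216          # prints 203.5
if latency_increase_ok 12.5 216 200 10; then
  echo "latency attack appears to be active"
fi
```

The check is deliberately one-sided: a request that needs several round trips can pick up the delay more than once, so the observed increase may exceed the configured value.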

Step 5: Create a Scenario with Multiple Attacks

Scenarios chain multiple attacks together for complex failure simulations:

# Create a scenario that simulates a degraded database
gremlin scenario create \
  --name "Degraded Database" \
  --description "Simulates a database experiencing resource pressure"
# Then add steps:
gremlin scenario step add \
  --scenario-id "scenario-001" \
  --attack cpu --length 120 --target db-host

gremlin scenario step add \
  --scenario-id "scenario-001" \
  --attack latency --length 120 --target db-host --port 5432 --latency 100

gremlin scenario run --scenario-id "scenario-001"
# Expected output:
# ✅ Scenario 'Degraded Database' is now running

Learning Path

flowchart LR
  A[LitmusChaos] --> B[Gremlin Platform]
  B --> C[AWS Fault Injection]
  C --> D[Azure Chaos Studio]
  D --> E[Latency Injection]
  style B fill:#f90,color:#fff

Common Errors

Running attacks without Gremlin daemon running: The gremlind daemon must be active on the target host. Check with sudo gremlin status.
Not setting a duration on attacks: Without a length parameter attacks run until manually stopped. Always set a duration.
Using incorrect port numbers for latency attacks: When targeting specific ports ensure the service is actually listening on that port.
Overlapping attacks on the same host: Multiple concurrent attacks on a single host can produce unpredictable results.
Forgetting to halt the attack when done: Use gremlin halt <attack-id> to stop an attack early.
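
Several of these mistakes can be caught before an attack is launched. As a minimal sketch, the guard below rejects a missing, non-numeric, or overly long duration. The function name `check_length` and the per-team maximum are hypothetical; Gremlin itself does not provide this helper.

```shell
#!/bin/sh
# check_length LENGTH_SECONDS MAX_SECONDS
# Succeeds only when LENGTH_SECONDS is a positive whole number no larger than MAX_SECONDS.
check_length() {
  case "$1" in
    ''|*[!0-9]*)
      echo "length must be a whole number of seconds" >&2
      return 1
      ;;
  esac
  if [ "$1" -le 0 ]; then
    echo "length must be greater than zero" >&2
    return 1
  fi
  if [ "$1" -gt "$2" ]; then
    echo "length $1 exceeds the allowed maximum of $2 seconds" >&2
    return 1
  fi
  return 0
}

# Example: allow attacks of up to 300 seconds.
if check_length 60 300; then
  echo "duration accepted"
fi
```

A wrapper script could call this check first and only then invoke the attack command with the validated value.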

Practice Questions

What attack types does Gremlin support for network chaos?
How do you install the Gremlin agent on a Linux host?
What is a Gremlin scenario and how does it differ from a single attack?
How do you verify that a latency attack is active?
How do you stop a running Gremlin attack manually?

Challenge

Create a Gremlin scenario that simulates a three-stage failure: first inject 50 percent CPU load on the application server, then after 60 seconds add 300ms latency to the database connection, and finally after another 60 seconds kill a process on a cache server. Run the scenario and document the system behavior at each stage.

FAQ

What is Gremlin?

Gremlin is a managed Chaos Engineering platform that provides safe failure injection for infrastructure and applications through a web console, CLI, and API.

Does Gremlin require agents on every target?

Yes. Gremlin installs lightweight agents on target hosts or clusters to execute attacks. The agent communicates with the Gremlin API.

Can Gremlin run experiments on Windows servers?

Yes. Gremlin supports Windows Server 2016 and later. Installation uses an MSI package instead of the Linux script.

How does Gremlin handle safety and guardrails?

Gremlin provides halt-on-alert integrations, team-based permissions, scheduled halt times, and configurable Blast Radius limits.
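
The idea behind halt-on-alert can be illustrated with a small watchdog. This is a conceptual sketch rather than how Gremlin implements the feature. It reads one metric value per line from standard input and runs a caller-supplied halt command the first time a value exceeds a threshold. The function name `watch_and_halt` and the sample values are hypothetical.

```shell
#!/bin/sh
# watch_and_halt THRESHOLD HALT_COMMAND [ARGS...]
# Reads one numeric metric value per line from stdin. On the first value above
# THRESHOLD it runs HALT_COMMAND and returns 1; returns 0 if no value breaches.
watch_and_halt() {
  threshold=$1
  shift
  while IFS= read -r value; do
    if awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v > t) }'; then
      "$@"
      return 1
    fi
  done
  return 0
}

# Example with made-up response times in ms and echo standing in for the halt command.
if ! printf '120\n180\n650\n900\n' | watch_and_halt 500 echo "halt requested"; then
  echo "experiment aborted"
fi
```

In practice the input would come from a monitoring query and the halt command would be whatever your team uses to stop the attack.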

Is Gremlin free?

Gremlin offers a free tier with limited attacks and targets. Paid plans provide full access to all attack types, scenarios, and team features.
