Fault Injection Proxy — Toxiproxy & Service Mesh Chaos
This tutorial shows how to use a Fault Injection proxy to test how an application behaves when its dependencies slow down or fail. It covers the core concepts, worked examples, and common mistakes.
A Fault Injection proxy sits between your application and its dependencies, intercepting traffic to inject failures like latency, disconnections, and data corruption. Toxiproxy and service mesh proxies like Envoy are the most common tools for this approach.
What You Will Learn
This tutorial teaches you how to use Toxiproxy for application-level Fault Injection and how service meshes provide built-in chaos capabilities through Envoy proxy configurations.
Why It Matters
Fault Injection proxies sit in the connection path itself, at the TCP level (Toxiproxy) or the HTTP level (Envoy), so the failures they produce reach your code the same way a real dependency failure would: as slow responses, error statuses, or dropped connections. This makes them better suited than infrastructure-level faults for testing how your code handles database timeouts, API errors, and connection resets.
Real-World Use
DodaTech uses Toxiproxy in the CI pipeline for every microservice. The pipeline starts Toxiproxy with a pre-configured set of faults and runs the integration test suite against the degraded dependencies. This catches timeout and retry bugs before they reach staging.
Prerequisites
Before starting, you should understand:
- Chaos Engineering fundamentals (hypothesis, Blast Radius)
- How TCP connections and HTTP requests work
- Docker basics for running containerized tools
- Application code concepts for timeouts and retries
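As a refresher on the last prerequisite, here is a minimal sketch of a retry wrapper with exponential backoff. The names `call_with_retries` and `flaky_query` are hypothetical and exist only for this illustration; a real application would wrap its own database or HTTP call.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on TimeoutError, retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)  # 0.1s, then 0.2s, then 0.4s, ...


# Hypothetical dependency that times out twice and then succeeds.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated database timeout")
    return "row"


# Record the backoff delays instead of really sleeping, so the demo runs instantly.
delays = []
result = call_with_retries(flaky_query, sleep=delays.append)
print(result, delays)  # row [0.1, 0.2]
```

This is the kind of code path a Fault Injection proxy exercises: without injected latency or resets, the retry branch rarely runs in tests.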
Step 1: Set Up Toxiproxy
Toxiproxy is a proxy that can inject fault conditions into TCP connections:
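# docker-compose-toxiproxy.yaml
# Recent Toxiproxy images are published on GitHub Container Registry
services:
  toxiproxy:
    image: ghcr.io/shopify/toxiproxy:2.8.0
    ports:
      - "8474:8474"     # Toxiproxy HTTP API
      - "15432:15432"   # proxy listener for PostgreSQL
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret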
docker compose -f docker-compose-toxiproxy.yaml up -d
# Expected output:
# [+] Running 3/3
# - Container postgres Started
# - Container toxiproxy Started
Step 2: Create a Proxy and Add a Toxic
Use the Toxiproxy CLI to create a proxy and add a fault:
# Create a proxy for PostgreSQL (flags go before the proxy name)
toxiproxy-cli create \
  --listen 0.0.0.0:15432 \
  --upstream postgres:5432 \
  payment-db
# Expected output:
# Created new proxy payment-db
# Add a latency toxic (500ms delay with 50ms jitter on downstream traffic)
toxiproxy-cli toxic add \
  --type latency \
  --attribute latency=500 \
  --attribute jitter=50 \
  payment-db
# Expected output:
# Added downstream latency toxic 'latency_downstream' on proxy 'payment-db'
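The CLI is a thin client over Toxiproxy's HTTP API on port 8474, so a test harness can create the same proxy and toxic with plain HTTP calls. The sketch below only builds the two requests and does not send them. The helper name `build_request` is hypothetical, and the payloads assume the proxy name, ports, and attributes used in this step.

```python
import json
import urllib.request

API = "http://localhost:8474"  # Toxiproxy API port published in Step 1


def build_request(path, payload):
    """Build (but do not send) a JSON POST request for the Toxiproxy API."""
    return urllib.request.Request(
        API + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Equivalent of the proxy created with the CLI above
create_proxy = build_request("/proxies", {
    "name": "payment-db",
    "listen": "0.0.0.0:15432",
    "upstream": "postgres:5432",
    "enabled": True,
})

# Equivalent of the latency toxic added with the CLI above
add_latency = build_request("/proxies/payment-db/toxics", {
    "type": "latency",
    "stream": "downstream",
    "toxicity": 1.0,  # probability that the toxic applies to a connection
    "attributes": {"latency": 500, "jitter": 50},
})

for req in (create_proxy, add_latency):
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Building the requests separately from sending them keeps the fault definitions easy to review and reuse across test suites.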
Step 3: Test Application Behavior Under Proxy Faults
Point your application at the Toxiproxy port instead of the real database:
# Run application pointing to Toxiproxy port
# (default postgres user and database, with the password set in Step 1)
DATABASE_URL=postgres://postgres:secret@localhost:15432/postgres python app.py
# In another terminal, measure query time
time curl http://localhost:8080/api/data
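# Example output with the 500ms latency toxic (timings are illustrative):
# {"data": [...]}
# real 0m1.234s
# Every response from PostgreSQL is held for about 500ms, so a request
# that needs more than one database round trip takes over a second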
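A measured time well above 500ms is plausible because the latency toxic delays each response that passes through the proxy, not the request as a whole. The rough model below is a sketch under that assumption. The function name and the round-trip counts are illustrative only; the real number of delayed responses depends on the driver and on connection pooling.

```python
def added_delay_range_ms(latency_ms, jitter_ms, delayed_responses):
    """Rough bounds on added delay when each response is held for latency +/- jitter."""
    low = (latency_ms - jitter_ms) * delayed_responses
    high = (latency_ms + jitter_ms) * delayed_responses
    return low, high


# One query on an already-open connection: a single delayed response.
print(added_delay_range_ms(500, 50, 1))  # (450, 550)

# Assumed example: a request that needs two delayed responses from the database.
print(added_delay_range_ms(500, 50, 2))  # (900, 1100)
```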
Step 4: Use Envoy Service Mesh for Chaos
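In an Istio service mesh, the Envoy sidecar proxies inject faults based on a VirtualService configuration. This example delays 50 percent of requests by 3 seconds and aborts 10 percent with an HTTP 500: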
# envoy-fault-injection.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-service-delay
spec:
  hosts:
    - payment-service
  http:
    - fault:
        delay:
          percentage:
            value: 50
          fixedDelay: 3s
        abort:
          percentage:
            value: 10
          httpStatus: 500
      route:
        - destination:
            host: payment-service
Apply the configuration:
kubectl apply -f envoy-fault-injection.yaml
# Expected output:
# virtualservice.networking.istio.io/payment-service-delay created
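To get a feel for what percentage-based faults do to a test run, the following sketch simulates the 50 percent delay and 10 percent abort settings over many requests. It assumes the two faults are sampled independently for each request, so a single request can be both delayed and aborted. The function `simulate` is hypothetical and does not talk to Envoy.

```python
import random


def simulate(requests, delay_pct=50.0, abort_pct=10.0, seed=42):
    """Return the observed (delay_rate, abort_rate) over simulated requests."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    delayed = aborted = 0
    for _ in range(requests):
        if rng.random() * 100 < delay_pct:
            delayed += 1
        if rng.random() * 100 < abort_pct:
            aborted += 1
    return delayed / requests, aborted / requests


delay_rate, abort_rate = simulate(100_000)
print(f"delayed: {delay_rate:.1%}, aborted: {abort_rate:.1%}")
```

With only a handful of requests the observed rates can differ a lot from the configured percentages, which is worth remembering when a small integration suite passes under a 10 percent abort rate.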
Step 5: Remove Faults and Verify Recovery
After testing, remove the fault configurations:
# Remove the Toxiproxy toxic (its default name is <type>_<stream>)
toxiproxy-cli toxic remove -n latency_downstream payment-db
# Remove Envoy fault injection
kubectl delete virtualservice payment-service-delay
# Verify normal behavior
curl -s -o /dev/null -w "%{time_total}s\n" http://localhost:8080/api/data
# Expected output (no latency):
# 0.234s
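A single curl timing is noisy, so an automated check may compare several samples against a known baseline instead. The sketch below is one possible approach. The function `has_recovered`, the 50 percent tolerance, and the sample values are assumptions chosen for illustration.

```python
import statistics


def has_recovered(samples_s, baseline_s, tolerance=0.5):
    """True when the median latency is within `tolerance` (a fraction) of baseline."""
    return statistics.median(samples_s) <= baseline_s * (1 + tolerance)


# Samples close to the 0.234s baseline: the fault is gone.
print(has_recovered([0.23, 0.25, 0.24], baseline_s=0.234))  # True

# Samples still about 500ms above baseline: a toxic is probably still active.
print(has_recovered([0.74, 0.76, 0.73], baseline_s=0.234))  # False
```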
Learning Path
flowchart LR
    A[Latency Injection] --> B[Fault Injection Proxy]
    B --> C[Dependency Testing]
    C --> D[Resilience Testing]
    D --> E[Database Faults]
    style B fill:#f90,color:#fff
Common Errors
- Pointing the application to the wrong port: Ensure the application connects to the Toxiproxy listen port, not directly to the upstream service.
- Adding a toxic to the wrong stream direction: A downstream toxic affects data flowing from the server back to the client (responses). An upstream toxic affects data flowing from the client to the server (requests). Test both directions.
- Forgetting to remove toxics after testing: An active toxic keeps affecting every connection through the proxy until it is removed. Always clean up.
- Setting abort percentages too high: A 50 percent abort rate will cause most integration tests to fail. Start with 1-5 percent.
- Using Fault Injection in production without proper isolation: Service mesh Fault Injection affects real traffic. Use percentage-based faults and monitor closely.
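One way to avoid the cleanup mistake above is to scope each fault to a block of test code, so it is removed even when the test fails. The sketch below uses a context manager for this. `FakeToxiproxy` is a hypothetical in-memory stand-in rather than a real client library; in practice its two methods would call the Toxiproxy API or CLI.

```python
from contextlib import contextmanager


class FakeToxiproxy:
    """Stand-in for a real client; records active toxics in memory only."""

    def __init__(self):
        self.toxics = set()

    def add_toxic(self, name):
        self.toxics.add(name)

    def remove_toxic(self, name):
        self.toxics.discard(name)


@contextmanager
def toxic(client, name):
    """Keep a toxic active only for the duration of the with-block."""
    client.add_toxic(name)
    try:
        yield
    finally:
        client.remove_toxic(name)  # runs even if the test body raises


client = FakeToxiproxy()
try:
    with toxic(client, "latency_downstream"):
        assert "latency_downstream" in client.toxics
        raise RuntimeError("simulated test failure")
except RuntimeError:
    pass

print(client.toxics)  # set()
```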
Practice Questions
- What is the difference between Toxiproxy and tc for Fault Injection?
- How do you create a proxy and add a toxic using the Toxiproxy CLI?
- How does Envoy Fault Injection differ from Toxiproxy?
- What does the downstream stream direction mean in Toxiproxy?
- How do you verify that a Fault Injection has been removed?
Challenge
Set up a complete Fault Injection test environment using Docker Compose with an application, a database, and Toxiproxy between them. Write integration tests that verify the application handles 2-second database latency, connection resets, and 10 percent abort rates correctly.