Fault Injection Proxy — Toxiproxy & Service Mesh Chaos

DodaTech Updated 2026-06-21 4 min read

This tutorial explains what a Fault Injection proxy is, walks through practical examples with Toxiproxy and an Istio service mesh, and lists common mistakes to avoid.

A Fault Injection proxy sits between your application and its dependencies, intercepting traffic to inject failures like latency, dropped connections, and bandwidth limits. Toxiproxy and service mesh proxies like Envoy are the most common tools for this approach.

What You Will Learn

This tutorial teaches you how to use Toxiproxy for application-level Fault Injection and how service meshes provide built-in chaos capabilities through Envoy proxy configurations.

Why It Matters

Fault Injection proxies operate on live connections and requests (TCP for Toxiproxy, HTTP for Envoy), so the failures they produce closely resemble real dependency failures from the application's point of view. For testing how your code handles database timeouts, API errors, and connection resets, this is often more realistic than infrastructure-level faults.
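To make the idea concrete, here is a minimal sketch of a latency-injecting TCP proxy in Python. It is an illustration only, not how Toxiproxy is implemented: the names pipe, make_proxy, echo, and demo are hypothetical, and the echo server merely stands in for a real dependency.

```python
import asyncio
import time


async def pipe(reader, writer, delay=0.0):
    """Copy bytes from reader to writer, pausing `delay` seconds before each chunk."""
    try:
        while chunk := await reader.read(4096):
            if delay:
                await asyncio.sleep(delay)  # the injected fault: added latency
            writer.write(chunk)
            await writer.drain()
    finally:
        writer.close()


def make_proxy(upstream_host, upstream_port, downstream_delay):
    """Build a connection handler that forwards traffic to the upstream service."""
    async def handle(client_reader, client_writer):
        up_reader, up_writer = await asyncio.open_connection(upstream_host, upstream_port)
        await asyncio.gather(
            pipe(client_reader, up_writer),                   # requests: unchanged
            pipe(up_reader, client_writer, downstream_delay), # responses: delayed
        )
    return handle


async def echo(reader, writer):
    """Stand-in dependency: send back whatever arrives, then close."""
    writer.write(await reader.read(4096))
    await writer.drain()
    writer.close()


async def demo(delay=0.2):
    """Send one message through the proxy and time the round trip."""
    upstream = await asyncio.start_server(echo, "127.0.0.1", 0)
    up_port = upstream.sockets[0].getsockname()[1]
    proxy = await asyncio.start_server(
        make_proxy("127.0.0.1", up_port, delay), "127.0.0.1", 0
    )
    proxy_port = proxy.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", proxy_port)
    start = time.monotonic()
    writer.write(b"ping")
    await writer.drain()
    reply = await reader.read(4096)
    elapsed = time.monotonic() - start

    writer.close()
    proxy.close()
    upstream.close()
    return reply, elapsed


if __name__ == "__main__":
    reply, elapsed = asyncio.run(demo())
    print(reply, f"{elapsed:.3f}s")
```

The client never learns that a proxy is involved; it only sees a slow dependency, which is the property that makes this style of testing useful.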

Real-World Use

DodaTech uses Toxiproxy in the CI pipeline for every microservice. The pipeline starts Toxiproxy with a pre-configured set of faults and runs the integration test suite against the degraded dependencies. This catches timeout and retry bugs before they reach staging.

Prerequisites

Before starting, you should understand:

  • Chaos Engineering fundamentals (hypothesis, Blast Radius)
  • How TCP connections and HTTP requests work
  • Docker basics for running containerized tools
  • Application code concepts for timeouts and retries

Step 1: Set Up Toxiproxy

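Toxiproxy is a TCP proxy that injects fault conditions into the connections it forwards. The Compose file below runs it next to PostgreSQL, exposing the Toxiproxy API port (8474) and the proxy listen port (15432):

# docker-compose-toxiproxy.yaml
services:
  toxiproxy:
    # Recent Toxiproxy releases are published to GitHub Container Registry
    image: ghcr.io/shopify/toxiproxy:2.8.0
    ports:
      - "8474:8474"
      - "15432:15432"
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret

Start both containers:

docker compose -f docker-compose-toxiproxy.yaml up -d
# Expected output (abbreviated; names are prefixed with your Compose project name):
# [+] Running 3/3
#  - Network default     Created
#  - Container postgres  Started
#  - Container toxiproxy Started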

Step 2: Create a Proxy and Add a Toxic

In Toxiproxy, a fault is called a toxic. Use the Toxiproxy CLI to create a proxy and add one:

# Create a proxy for PostgreSQL (flags go before the proxy name)
toxiproxy-cli create \
  --listen 0.0.0.0:15432 \
  --upstream postgres:5432 \
  payment-db

# Expected output:
# Created new proxy payment-db

# Add a latency toxic (500 ms delay with 50 ms jitter on downstream traffic)
toxiproxy-cli toxic add \
  --type latency \
  --attribute latency=500 \
  --attribute jitter=50 \
  payment-db

# Expected output:
# Added downstream latency toxic 'latency_downstream' on proxy 'payment-db'
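The CLI is a thin client for Toxiproxy's HTTP API on port 8474, so the same setup can be scripted from a test suite. The sketch below builds the two requests with Python's standard library. The helper names create_proxy_request, add_toxic_request, and send are hypothetical, and the API address assumes the port mapping from Step 1.

```python
import json
import urllib.request

API = "http://localhost:8474"  # assumed: API port published as in Step 1


def _post(url, body):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_proxy_request(name, listen, upstream, api=API):
    """Request that creates a proxy: POST /proxies."""
    body = {"name": name, "listen": listen, "upstream": upstream, "enabled": True}
    return _post(f"{api}/proxies", body)


def add_toxic_request(proxy, toxic_type, attributes,
                      stream="downstream", toxicity=1.0, api=API):
    """Request that adds a toxic: POST /proxies/{proxy}/toxics.

    `toxicity` is the probability (0.0 to 1.0) that the toxic is applied
    to a given connection.
    """
    body = {
        "type": toxic_type,
        "stream": stream,
        "toxicity": toxicity,
        "attributes": attributes,
    }
    return _post(f"{api}/proxies/{proxy}/toxics", body)


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires a running Toxiproxy; mirrors the two CLI commands above.
    send(create_proxy_request("payment-db", "0.0.0.0:15432", "postgres:5432"))
    send(add_toxic_request("payment-db", "latency", {"latency": 500, "jitter": 50}))
```

Keeping request construction separate from sending makes the payloads easy to inspect in a unit test without a running proxy.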

Step 3: Test Application Behavior Under Proxy Faults

Point your application at the Toxiproxy port instead of the real database:

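# Run the application against the Toxiproxy port
# (credentials match the Compose file: default user "postgres", password "secret")
DATABASE_URL=postgres://postgres:secret@localhost:15432/postgres python app.py

# In another terminal, measure the request time
time curl http://localhost:8080/api/data
# Expected output with 500ms latency:
# {"data": [...]}
# real    0m1.234s
# The request takes longer than 500 ms because the delay applies to every
# response from the database, so connection setup, authentication, and the
# query itself can each add roughly 500 ms.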
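What you are really testing here is the application's timeout and retry logic. As a reference point, the sketch below shows one common pattern, retrying on timeout with exponential backoff. The function call_with_retries is hypothetical, and it assumes the operation signals a timeout by raising TimeoutError; your database driver may raise a different exception.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Run operation(); on TimeoutError, retry with exponential backoff.

    `sleep` is injectable so tests can record the waits instead of sleeping.
    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # waits 0.1 s, 0.2 s, 0.4 s, ...


if __name__ == "__main__":
    outcomes = iter([TimeoutError("slow"), TimeoutError("slow"), "ok"])

    def flaky_query():
        # Stand-in for a database call that times out twice, then succeeds.
        outcome = next(outcomes)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    print(call_with_retries(flaky_query))
```

With a latency toxic active, a test can assert that the application either succeeds within its retry budget or fails fast with a clear error, rather than hanging.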

Step 4: Use Envoy Service Mesh for Chaos

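In an Istio service mesh, a VirtualService tells the Envoy sidecar proxies which faults to inject. The configuration below delays 50 percent of requests to payment-service by 3 seconds and aborts 10 percent with HTTP 500: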

# envoy-fault-injection.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-service-delay
spec:
  hosts:
    - payment-service
  http:
    - fault:
        delay:
          percentage:
            value: 50
          fixedDelay: 3s
        abort:
          percentage:
            value: 10
          httpStatus: 500
      route:
        - destination:
            host: payment-service

Apply the configuration:

kubectl apply -f envoy-fault-injection.yaml
# Expected output:
# virtualservice.networking.istio.io/payment-service-delay created
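Before running the experiment, it helps to estimate what share of requests each fault will touch. The simulation below is a rough sketch, not a model of Envoy's internals: it assumes the delay and the abort are sampled independently for each request, and the function simulate is hypothetical.

```python
import random


def simulate(requests, delay_pct=50.0, abort_pct=10.0, seed=0):
    """Return the fraction of requests delayed, aborted, and hit by both.

    Assumption: each fault is an independent per-request coin flip.
    """
    rng = random.Random(seed)
    delayed = aborted = both = 0
    for _ in range(requests):
        is_delayed = rng.random() * 100 < delay_pct
        is_aborted = rng.random() * 100 < abort_pct
        delayed += is_delayed
        aborted += is_aborted
        both += is_delayed and is_aborted
    return delayed / requests, aborted / requests, both / requests


if __name__ == "__main__":
    delayed, aborted, both = simulate(100_000)
    print(f"delayed: {delayed:.1%}  aborted: {aborted:.1%}  both: {both:.1%}")
    # Mean added latency per request for a 3 s fixed delay
    print(f"mean added delay: {delayed * 3:.2f}s")
```

Under these assumptions, about half of all requests gain 3 seconds, so the mean added latency is close to 1.5 seconds. Client timeouts shorter than 3 seconds will turn every delayed request into an error.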

Step 5: Remove Faults and Verify Recovery

After testing, remove the fault configurations:

# Remove the Toxiproxy toxic by name (the default name is <type>_<stream>)
toxiproxy-cli toxic remove -n latency_downstream payment-db

# Remove Envoy fault injection
kubectl delete virtualservice payment-service-delay

# Verify normal behavior
curl -s -o /dev/null -w "%{time_total}s\n" http://localhost:8080/api/data
# Expected output (no latency):
# 0.234s
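A single curl timing can be noisy, so an automated check usually compares several samples against a known baseline. The sketch below is one way to do that. The function names measure and has_recovered, the 50 percent tolerance, and the baseline of 0.234 seconds (taken from the example output above) are all assumptions to adjust for your service.

```python
import statistics
import time
import urllib.request


def measure(url, samples=5, timeout=5.0):
    """Return the wall-clock time in seconds of `samples` sequential GET requests."""
    timings = []
    for _ in range(samples):
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
        timings.append(time.monotonic() - start)
    return timings


def has_recovered(timings, baseline, tolerance=0.5):
    """True when the median timing is at most `tolerance` above the baseline.

    tolerance=0.5 accepts a median up to 50 percent slower than baseline.
    """
    return statistics.median(timings) <= baseline * (1 + tolerance)


if __name__ == "__main__":
    # Requires the application from Step 3 to be running.
    timings = measure("http://localhost:8080/api/data")
    print("recovered" if has_recovered(timings, baseline=0.234) else "still degraded")
```

Using the median rather than the mean keeps one slow outlier from failing the check.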

Learning Path

flowchart LR
  A[Latency Injection] --> B[Fault Injection Proxy]
  B --> C[Dependency Testing]
  C --> D[Resilience Testing]
  D --> E[Database Faults]
  style B fill:#f90,color:#fff

Common Errors

  1. Pointing the application to the wrong port: Ensure the application connects to the Toxiproxy listen port, not directly to the upstream service.
  2. Adding a toxic to the wrong stream direction: A downstream toxic affects responses to the client. An upstream toxic affects requests to the server. Test both directions.
  3. Forgetting to remove toxics after testing: A proxy with active toxics keeps affecting every connection that passes through it. Always clean up.
  4. Setting abort percentages too high: A 50 percent abort rate is likely to fail most integration tests outright. Start with 1-5 percent.
  5. Using Fault Injection in production without proper isolation: Service mesh Fault Injection affects real traffic. Use percentage-based faults and monitor closely.

Practice Questions

  1. What is the difference between Toxiproxy and tc for Fault Injection?
  2. How do you create a proxy and add a toxic using the Toxiproxy CLI?
  3. How does Envoy Fault Injection differ from Toxiproxy?
  4. What does the downstream stream direction mean in Toxiproxy?
  5. How do you verify that a Fault Injection has been removed?

Challenge

Set up a complete Fault Injection test environment using Docker Compose with an application, a database, and Toxiproxy between them. Write integration tests that verify the application handles 2-second database latency, connection resets, and 10 percent abort rates correctly.

FAQ

What is a Fault Injection proxy?

A Fault Injection proxy sits between services and modifies traffic to simulate failures like latency, disconnections, and errors at the connection (TCP) or request (HTTP) level.

What is Toxiproxy?

Toxiproxy is an open-source TCP proxy from Shopify that lets you inject fault conditions into network connections for testing.

Can I use Fault Injection proxies in production?

Yes, with care. In production, use percentage-based faults through service mesh proxies so that only a small share of traffic is affected. Dedicated proxies like Toxiproxy are better suited for test environments.

How does service mesh Fault Injection work?

Service mesh sidecar proxies (Envoy) intercept all traffic between services and can inject delays, aborts, or bandwidth limits based on percentage or request matching.

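What are the common types of toxics in Toxiproxy?

The built-in toxics are latency (delays data), bandwidth (limits throughput), slow_close (delays closing the connection), timeout (stops all data and closes the connection after a timeout), reset_peer (resets the TCP connection), slicer (splits data into smaller chunks), and limit_data (closes the connection after a set number of bytes). There is no built-in toxic that corrupts data; that requires writing a custom toxic.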
