Network Partitioning — Simulating Split-Brain Scenarios
In this tutorial, you'll learn about Network Partitioning: the key concepts, hands-on experiments, and the best practices that help you apply it effectively.
Network Partitioning is a Chaos Engineering technique that splits a cluster into isolated groups that cannot communicate with each other. This simulates real-world network failures like switch failures, firewall misconfigurations, or cloud provider availability zone isolation.
What You Will Learn
This tutorial teaches you how to create network partitions using Chaos Mesh, iptables, and cloud networking tools, and how to test your application's behavior during split-brain scenarios.
Why It Matters
Network partitions are among the hardest failures for Distributed Systems to handle correctly. During a partition, each side of the split may believe it is the authoritative source of truth. Without proper quorum and consensus mechanisms, this can lead to data corruption, duplicate processing, or full system unavailability.
Real-World Use
DodaTech uses Network Partitioning experiments to test the etcd cluster that backs Durga Antivirus Pro Configuration Management. By isolating one etcd node, the team confirmed that the remaining nodes kept quorum and the application continued to operate normally.
Prerequisites
Before starting, you should understand:
- Chaos Engineering fundamentals (hypothesis, Blast Radius)
- Kubernetes networking basics (pods, services, DNS)
- Distributed consensus concepts (quorum, leader election)
- Docker for local cluster experiments
Step 1: Create a Local etcd Cluster
Set up a three-node etcd cluster for partition experiments:
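# docker-compose-etcd.yaml
# Settings shared by all three members (merged into each service below)
x-etcd-env: &etcd-env
  ALLOW_NONE_AUTHENTICATION: "yes"   # Bitnami image: permit running without auth (lab use only)
  ETCD_INITIAL_CLUSTER: infra1=http://etcd1:2380,infra2=http://etcd2:2380,infra3=http://etcd3:2380
  ETCD_INITIAL_CLUSTER_STATE: new
  ETCD_INITIAL_CLUSTER_TOKEN: etcd-partition-lab
  ETCD_LISTEN_PEER_URLS: http://0.0.0.0:2380
  ETCD_LISTEN_CLIENT_URLS: http://0.0.0.0:2379
services:
  etcd1:
    image: bitnami/etcd:3.5
    container_name: etcd1   # fixed name so "docker exec etcd1 ..." works
    environment:
      <<: *etcd-env
      ETCD_NAME: infra1
      ETCD_INITIAL_ADVERTISE_PEER_URLS: http://etcd1:2380
      ETCD_ADVERTISE_CLIENT_URLS: http://etcd1:2379
  etcd2:
    image: bitnami/etcd:3.5
    container_name: etcd2
    environment:
      <<: *etcd-env
      ETCD_NAME: infra2
      ETCD_INITIAL_ADVERTISE_PEER_URLS: http://etcd2:2380
      ETCD_ADVERTISE_CLIENT_URLS: http://etcd2:2379
  etcd3:
    image: bitnami/etcd:3.5
    container_name: etcd3
    environment:
      <<: *etcd-env
      ETCD_NAME: infra3
      ETCD_INITIAL_ADVERTISE_PEER_URLS: http://etcd3:2380
      ETCD_ADVERTISE_CLIENT_URLS: http://etcd3:2379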
docker compose -f docker-compose-etcd.yaml up -d
# Expected output:
# [+] Running 3/3
# - Container etcd1 Started
# - Container etcd2 Started
# - Container etcd3 Started
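Compose returns as soon as the containers start, but the members may need a few more seconds to find each other and elect a leader. The sketch below polls etcdctl endpoint health --cluster until it succeeds or a timeout expires. wait_for_cluster is a hypothetical helper name, not part of etcd or Docker. The sketch also assumes the container names set in the Compose file.

```bash
# Hypothetical helper (not part of etcd or Docker): wait until every member reports healthy.
# Usage: wait_for_cluster <container> [timeout_seconds]
wait_for_cluster() {
  local container="$1"
  local timeout="${2:-60}"
  local waited=0
  # etcdctl exits non-zero while any member of the cluster is unhealthy.
  until docker exec "$container" etcdctl endpoint health --cluster >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "cluster not healthy after ${timeout}s" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "cluster healthy after ${waited}s"
}
```

After sourcing the function, run wait_for_cluster etcd1 once docker compose up -d returns. The timeout keeps a misconfigured cluster from hanging a script indefinitely.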
Step 2: Create a Network Partition
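Cut the link between etcd1 and etcd2 with iptables, leaving etcd3 connected to both (a partial partition). The Bitnami image runs as a non-root user and may not include iptables. These commands therefore apply the rules from a throwaway nicolaka/netshoot container that joins etcd1's network namespace:
# Block traffic between etcd1 and etcd2 (rules live in etcd1's network namespace)
docker run --rm --network container:etcd1 --cap-add NET_ADMIN nicolaka/netshoot iptables -A INPUT -s etcd2 -j DROP
docker run --rm --network container:etcd1 --cap-add NET_ADMIN nicolaka/netshoot iptables -A OUTPUT -d etcd2 -j DROP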
# Check cluster health from etcd1
docker exec etcd1 etcdctl endpoint health --cluster
# Expected output (abbreviated; results depend on which member was leader when the rules went in):
# http://etcd1:2379 is healthy
# http://etcd2:2379 is unhealthy
# http://etcd3:2379 is healthy
# Check from etcd2
docker exec etcd2 etcdctl endpoint health --cluster
# Expected output (abbreviated):
# http://etcd1:2379 is unhealthy
# http://etcd2:2379 is healthy
# http://etcd3:2379 is healthy
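Every partition in this tutorial is built from the same pair of DROP rules, so it can help to wrap them in helpers that can also undo the change. The sketch below makes one assumption: the nicolaka/netshoot image, chosen because it bundles iptables, runs as a sidecar that joins the target container's network namespace. partition_pair and heal_pair are hypothetical names. iptables -D with the same rule specification deletes exactly the rule that -A appended, so other rules in the namespace stay untouched.

```bash
# Assumption: the nicolaka/netshoot image provides iptables; override via NETSHOOT_IMAGE.
NETSHOOT_IMAGE="${NETSHOOT_IMAGE:-nicolaka/netshoot}"

# Run iptables inside the network namespace of container $1 using a throwaway sidecar.
ns_iptables() {
  local container="$1"
  shift
  docker run --rm --network "container:${container}" --cap-add NET_ADMIN \
    "$NETSHOOT_IMAGE" iptables "$@"
}

# Hypothetical helper: drop all traffic between $1 and $2.
# Both rules live in $1's namespace, which is enough to cut the link in both directions.
partition_pair() {
  ns_iptables "$1" -A INPUT -s "$2" -j DROP &&
    ns_iptables "$1" -A OUTPUT -d "$2" -j DROP
}

# Hypothetical helper: delete exactly the two rules that partition_pair appended.
heal_pair() {
  ns_iptables "$1" -D INPUT -s "$2" -j DROP &&
    ns_iptables "$1" -D OUTPUT -d "$2" -j DROP
}
```

For example, partition_pair etcd1 etcd2 cuts the etcd1/etcd2 link. Later, heal_pair etcd1 etcd2 followed by docker exec etcd1 etcdctl endpoint health --cluster checks that the cluster recovers.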
Step 3: Test Quorum During Partition
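Writes should still commit, because every member can still reach a majority (2 of 3) of the cluster through etcd3. If etcd1 or etcd2 happened to be the leader, though, the other one has lost its direct link to the leader, so results can differ from run to run:
# Write a key through etcd2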
docker exec etcd2 etcdctl put /config/key "value1"
# Expected output:
# OK
# Read the key back through etcd1, which cannot reach etcd2 directly
docker exec etcd1 etcdctl get /config/key
# Expected output:
# /config/key
# value1
# etcd1 still sees the write because etcd3 still communicates with both etcd1 and etcd2
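Whether the Step 3 write and read succeed depends partly on which member led the cluster when the rules went in. A follower that cannot reach the leader cannot forward writes to it. etcdctl endpoint status prints a table whose IS LEADER column shows the leader, which you can check before and after injecting a partition. The wrappers below are a sketch. show_status and leader_endpoint are hypothetical names, and leader_endpoint assumes ENDPOINT is the first table column.

```bash
# Hypothetical wrappers around "etcdctl endpoint status", run inside container $1.

# Print the status table. IS LEADER marks the leader; a RAFT TERM that grows between
# two checks means at least one election took place.
show_status() {
  docker exec "$1" etcdctl endpoint status --cluster -w table
}

# Print only the client URL of the current leader.
leader_endpoint() {
  show_status "$1" |
    awk -F'|' '
      # Find the IS LEADER column from the header instead of hard-coding its position.
      /IS LEADER/ { for (i = 1; i <= NF; i++) if ($i ~ /IS LEADER/) col = i; next }
      # Data rows: column 2 is ENDPOINT; strip the padding spaces before printing.
      col && $col ~ /true/ { ep = $2; gsub(/ /, "", ep); print ep }
    '
}
```

Running leader_endpoint etcd3 before Step 2 tells you which scenario you are about to exercise. For a repeatable starting point, etcdctl also provides a move-leader command that can transfer leadership (for example, to etcd3) before you partition the cluster.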
Step 4: Test Full Cluster Split
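Building on the Step 2 rules, also isolate etcd3 from both other members. No two nodes can now talk to each other, so the cluster is split 1-1-1:
# Isolate etcd3 from both etcd1 and etcd2 (rules applied in etcd3's network namespace via a netshoot sidecar)
docker run --rm --network container:etcd3 --cap-add NET_ADMIN nicolaka/netshoot iptables -A INPUT -s etcd1 -j DROP
docker run --rm --network container:etcd3 --cap-add NET_ADMIN nicolaka/netshoot iptables -A OUTPUT -d etcd1 -j DROP
docker run --rm --network container:etcd3 --cap-add NET_ADMIN nicolaka/netshoot iptables -A INPUT -s etcd2 -j DROP
docker run --rm --network container:etcd3 --cap-add NET_ADMIN nicolaka/netshoot iptables -A OUTPUT -d etcd2 -j DROP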
# Try to write from the single-node side
docker exec etcd3 etcdctl put /config/key "value2"
# Expected output:
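# Error: context deadline exceeded
# (the exact message varies; "etcdserver: request timed out" or "leader changed" are also possible)
# A single node cannot form quorum: a 3-member cluster needs 2 votes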
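Writes are not the only operations affected by a quorum loss. etcd's default read is linearizable, which means the cluster must confirm the data is current, so etcdctl get from etcd3 should also fail in the 1-1-1 split. A serializable read (--consistency=s) is answered from the member's local copy and may return stale data. The hypothetical helper below runs both kinds of read on one member so you can compare them.

```bash
# Hypothetical helper: compare read consistency levels on one member.
# Usage: compare_reads <container> <key>
compare_reads() {
  local node="$1"
  local key="$2"
  echo "== linearizable (needs quorum) =="
  docker exec "$node" etcdctl get "$key" --consistency=l || echo "(read failed)"
  echo "== serializable (local data, possibly stale) =="
  docker exec "$node" etcdctl get "$key" --consistency=s || echo "(read failed)"
}
```

During the split, compare_reads etcd3 /config/key should show the linearizable read failing while the serializable read returns whatever value etcd3 last applied.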
Step 5: Use Chaos Mesh for Kubernetes Partitions
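In Kubernetes, Chaos Mesh's NetworkChaos resource can partition pods. In this manifest, selector chooses one side of the partition (all etcd pods in the production namespace) and target chooses the other side (the single pod etcd-0). The result cuts etcd-0 off from its peers for 60 seconds: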
# k8s-partition.yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: etcd-partition
spec:
  action: partition
  mode: all                  # source side: every pod matched by selector
  selector:
    namespaces:
      - production
    labelSelectors:
      app: etcd
  direction: both            # drop traffic in both directions
  target:
    mode: one                # other side: the single pod matched below
    selector:
      namespaces:
        - production
      labelSelectors:
        app: etcd
        statefulset.kubernetes.io/pod-name: etcd-0
  duration: 60s              # Chaos Mesh lifts the partition after 60 seconds
kubectl apply -f k8s-partition.yaml
# Expected output:
# networkchaos.chaos-mesh.org/etcd-partition created
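To watch the experiment's effect while it runs, you can poll etcd from a pod on the majority side. Deleting the NetworkChaos resource ends the experiment early. The sketch below rests on several assumptions: the StatefulSet pods are named etcd-0, etcd-1, and so on; the etcd image ships etcdctl; and etcdctl inside the pod reaches the cluster without extra TLS flags. watch_partition is a hypothetical helper name.

```bash
# Hypothetical helper: apply the NetworkChaos manifest, watch etcd from a majority-side pod, then clean up.
# Assumptions: namespace "production", observer pod etcd-1, and an etcdctl binary in the etcd image
# that reaches the cluster without extra TLS flags.
watch_partition() {
  local manifest="${1:-k8s-partition.yaml}"
  local observer="${2:-etcd-1}"
  local rounds="${3:-6}"
  local i=0
  kubectl apply -f "$manifest" || return 1
  while [ "$i" -lt "$rounds" ]; do
    # While the partition is active, etcd-0 is expected to be missing or reported as an error.
    kubectl exec -n production "$observer" -- etcdctl endpoint status --cluster -w table
    sleep 10
    i=$((i + 1))
  done
  # Deleting the NetworkChaos object ends the experiment early if the 60s duration has not elapsed.
  kubectl delete -f "$manifest" --ignore-not-found
}
```

With the defaults, watch_partition polls six times at 10-second intervals, roughly matching the 60-second duration in the manifest.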
Learning Path
Database Faults → Network Partitioning (this tutorial) → Infrastructure Faults → Kubernetes Chaos → Chaos Engineering Pipeline
Common Errors
- Assuming symmetric partitions: Real network faults are often asymmetric or partial. Node A may be able to reach B while B cannot reach A. Test both directions.
- Not testing partition recovery: Healing a partition deserves its own test. Make sure the iptables rules are removed and that isolated members rejoin the cluster and catch up on missed writes.
- Forgetting that etcd requires majority quorum: A 3-node cluster needs at least 2 members for quorum. In a 1-2 split, the 2-node side keeps accepting writes while the single node cannot. In a 1-1-1 split, no side has quorum and writes stop everywhere.
- Using random partition targets: Choose the isolated nodes deliberately, for example the current leader versus a follower. Random partitions may not exercise the scenario you need.
- Not considering the blast radius of networking experiments: A Network Partition on a shared network can affect unrelated services. Use dedicated test clusters.
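One way to approximate an asymmetric fault is to drop traffic in one direction only. The sketch below uses the same netshoot sidecar approach, assuming the nicolaka/netshoot image provides iptables; block_inbound and unblock_inbound are hypothetical names. block_inbound A B drops the packets A receives from B but leaves A's outbound packets to B alone. At the TCP level the picture is murkier: B may keep receiving data on connections that were already open, but new connections between the two cannot complete.

```bash
# Assumption: nicolaka/netshoot provides iptables; override via NETSHOOT_IMAGE.
NETSHOOT_IMAGE="${NETSHOOT_IMAGE:-nicolaka/netshoot}"

# Hypothetical helper: container $1 drops every packet it receives from $2.
# Packets from $1 to $2 still leave, so the fault is one-directional at the IP level.
block_inbound() {
  docker run --rm --network "container:$1" --cap-add NET_ADMIN \
    "$NETSHOOT_IMAGE" iptables -A INPUT -s "$2" -j DROP
}

# Hypothetical helper: remove the rule added by block_inbound.
unblock_inbound() {
  docker run --rm --network "container:$1" --cap-add NET_ADMIN \
    "$NETSHOOT_IMAGE" iptables -D INPUT -s "$2" -j DROP
}
```

Running block_inbound etcd1 etcd2 and then the reverse, block_inbound etcd2 etcd1, on a fresh cluster tests both directions separately, as the first item above recommends.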
Practice Questions
- What is a Network Partition and why is it dangerous for Distributed Systems?
- How many nodes are needed for quorum in a 5-node etcd cluster?
- What happens when a Network Partition prevents quorum?
- How do you create a Network Partition using iptables?
- How does Chaos Mesh define partition targets?
Challenge
Create a 5-node etcd cluster and design three partition scenarios: a 3-2 split (quorum maintained), a 2-2-1 split (no quorum), and an asymmetric partition where one node can send but not receive. For each scenario document whether the cluster remains available and whether writes succeed.
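Before running the challenge, it helps to predict each outcome. In a Raft-based system like etcd, a group can elect a leader and commit writes only if it holds a strict majority of members, floor(n/2) + 1. The small calculator below (hypothetical name partition_report) takes the cluster size followed by the size of each group and reports which groups have quorum. It models only symmetric splits, so the asymmetric scenario still has to be observed experimentally.

```bash
# Hypothetical helper: predict quorum for each side of a symmetric partition.
# Usage: partition_report <cluster_size> <group_size>...
partition_report() {
  local total="$1"
  shift
  local quorum=$(( total / 2 + 1 ))
  local sum=0 g
  for g in "$@"; do
    sum=$(( sum + g ))
  done
  # Reject group sizes that do not account for every member exactly once.
  if [ "$sum" -ne "$total" ]; then
    echo "group sizes add up to $sum, expected $total" >&2
    return 1
  fi
  echo "quorum for $total members: $quorum"
  for g in "$@"; do
    if [ "$g" -ge "$quorum" ]; then
      echo "group of $g: quorum (can elect a leader and commit writes)"
    else
      echo "group of $g: no quorum (writes and linearizable reads stall)"
    fi
  done
}
```

partition_report 5 3 2 reports that only the three-member side keeps quorum. partition_report 5 2 2 1 shows that no group reaches the 3 votes required.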
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro