Network Partitioning — Simulating Split-Brain Scenarios

DodaTech Updated 2026-06-21 5 min read

In this tutorial, you'll learn how to simulate network partitions, how a quorum-based cluster such as etcd behaves during them, and which mistakes to avoid when running these experiments.

Network Partitioning is a Chaos Engineering technique that splits a cluster into isolated groups that cannot communicate with each other. This simulates real-world network failures like switch failures, firewall misconfigurations, or cloud provider availability zone isolation.

What You Will Learn

This tutorial teaches you how to create network partitions with iptables and Chaos Mesh, and how to test your application's behavior during split-brain scenarios.

Why It Matters

Network partitions are among the hardest failures for Distributed Systems to handle correctly. During a partition, each side of the split may believe it is the authoritative source of truth. Without proper quorum and consensus mechanisms, this can lead to data corruption, duplicate processing, or full system unavailability.
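The quorum rule reduces to simple arithmetic: a side of a partition may accept writes only if it holds a strict majority of the voting members, and two disjoint sides can never both hold one. A minimal Python sketch of that check (the function names are illustrative, not from etcd or any library):

```python
def majority(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return cluster_size // 2 + 1


def sides_with_quorum(sides: list[int]) -> list[int]:
    """Indices of the partition sides that still hold a majority.

    `sides` lists how many members ended up on each side of the partition.
    """
    total = sum(sides)
    return [i for i, size in enumerate(sides) if size >= majority(total)]


if __name__ == "__main__":
    for split in ([2, 1], [1, 1, 1], [3, 2], [2, 2, 1]):
        print(split, "-> sides with quorum:", sides_with_quorum(split))
```

Running it prints `[0]` for the 2-1 and 3-2 splits and an empty list for the 1-1-1 and 2-2-1 splits. Because two disjoint groups cannot each contain more than half the members, at most one side is ever returned, which is what prevents both sides from accepting conflicting writes.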

Real-World Use

DodaTech uses network partitioning experiments to test the etcd cluster that backs configuration management for Durga Antivirus Pro. By isolating one etcd node, the team confirmed that the remaining nodes maintain quorum and the application continues to operate normally.

Prerequisites

Before starting you should understand:

Chaos Engineering fundamentals (hypothesis, Blast Radius)
Kubernetes networking basics (pods, services, DNS)
Distributed consensus concepts (quorum, leader election)
Docker for local cluster experiments

Step 1: Create a Local Etcd Cluster

Set up a three-node etcd cluster for partition experiments:

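# docker-compose-etcd.yaml
# Shared settings; the Bitnami image passes ETCD_* variables to etcd as flags.
x-etcd-env: &etcd-env
  ALLOW_NONE_AUTHENTICATION: "yes"   # no auth: local experiments only
  ETCD_INITIAL_CLUSTER: infra1=http://etcd1:2380,infra2=http://etcd2:2380,infra3=http://etcd3:2380
  ETCD_INITIAL_CLUSTER_STATE: new
  ETCD_LISTEN_PEER_URLS: http://0.0.0.0:2380
  ETCD_LISTEN_CLIENT_URLS: http://0.0.0.0:2379
services:
  etcd1:
    image: bitnami/etcd:3.5
    container_name: etcd1            # lets `docker exec etcd1 ...` work as written
    environment:
      <<: *etcd-env
      ETCD_NAME: infra1
      ETCD_INITIAL_ADVERTISE_PEER_URLS: http://etcd1:2380
      ETCD_ADVERTISE_CLIENT_URLS: http://etcd1:2379
  etcd2:
    image: bitnami/etcd:3.5
    container_name: etcd2
    environment:
      <<: *etcd-env
      ETCD_NAME: infra2
      ETCD_INITIAL_ADVERTISE_PEER_URLS: http://etcd2:2380
      ETCD_ADVERTISE_CLIENT_URLS: http://etcd2:2379
  etcd3:
    image: bitnami/etcd:3.5
    container_name: etcd3
    environment:
      <<: *etcd-env
      ETCD_NAME: infra3
      ETCD_INITIAL_ADVERTISE_PEER_URLS: http://etcd3:2380
      ETCD_ADVERTISE_CLIENT_URLS: http://etcd3:2379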

docker compose -f docker-compose-etcd.yaml up -d
# Expected output:
# [+] Running 3/3
#  - Container etcd1 Started
#  - Container etcd2 Started
#  - Container etcd3 Started
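The ETCD_INITIAL_CLUSTER value must list every member's name and peer URL, identically on each node. For larger clusters, such as the five-node cluster in the Challenge, it can be generated instead of typed by hand. A small sketch, assuming the same naming scheme as the compose file (member infraN on hostname etcdN, peer port 2380); the function name is hypothetical:

```python
def initial_cluster(count: int, peer_port: int = 2380) -> str:
    """Build an ETCD_INITIAL_CLUSTER value for `count` members.

    Assumes member N is named infraN and is reachable at hostname etcdN.
    """
    return ",".join(
        f"infra{n}=http://etcd{n}:{peer_port}" for n in range(1, count + 1)
    )


if __name__ == "__main__":
    # Prints the exact value used in the three-node compose file.
    print(initial_cluster(3))
```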

Step 2: Create a Network Partition

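Cut the link between etcd1 and etcd2 using iptables. Both nodes can still reach etcd3, so this is a partial partition rather than the full isolation of one node: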

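# The Bitnami etcd image runs as a non-root user and may not include iptables,
# so run iptables from a helper container that shares etcd1's network
# namespace (assumes the nicolaka/netshoot image, which ships iptables).
# On etcd1, block traffic to/from etcd2
docker run --rm --net container:etcd1 --cap-add NET_ADMIN nicolaka/netshoot iptables -A INPUT -s etcd2 -j DROP
docker run --rm --net container:etcd1 --cap-add NET_ADMIN nicolaka/netshoot iptables -A OUTPUT -d etcd2 -j DROP

# Check cluster health from etcd1
docker exec etcd1 etcdctl endpoint health --cluster
# Expected output (abbreviated; assumes etcd3 is the Raft leader, with a
# different leader more endpoints may report unhealthy):
# http://etcd1:2379 is healthy
# http://etcd2:2379 is unhealthy
# http://etcd3:2379 is healthy

# Check from etcd2
docker exec etcd2 etcdctl endpoint health --cluster
# Expected output (abbreviated, same assumption):
# http://etcd1:2379 is unhealthy
# http://etcd2:2379 is healthy
# http://etcd3:2379 is healthy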
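The two DROP rules on etcd1 cut only the link between etcd1 and etcd2. Modeling blocked links as a set of ordered (sender, receiver) pairs makes it easy to see what each member can still reach. This plain-Python sketch only models the rules and does not touch the network; the names are illustrative:

```python
NODES = ["etcd1", "etcd2", "etcd3"]


def cut(a: str, b: str) -> set[tuple[str, str]]:
    """Both directions of a link, like one INPUT plus one OUTPUT DROP rule on `a`."""
    return {(a, b), (b, a)}


def reachable_peers(nodes: list[str],
                    blocked: set[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each node to the peers it can still send packets to directly."""
    return {
        src: [dst for dst in nodes if dst != src and (src, dst) not in blocked]
        for src in nodes
    }


if __name__ == "__main__":
    step2 = cut("etcd1", "etcd2")  # the two rules installed on etcd1 above
    for node, peers in reachable_peers(NODES, step2).items():
        print(node, "->", peers)
```

It reports that etcd1 and etcd2 each still reach etcd3 and that etcd3 reaches both, so the cluster is partially, not fully, partitioned.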

Step 3: Test Quorum During Partition

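With the link between etcd1 and etcd2 cut, any leader can still reach a majority (itself plus etcd3), so the cluster keeps quorum. Whether an individual request succeeds, however, depends on which member is the Raft leader: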

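# Find the leader (IS LEADER column); etcd3 can still reach every member
docker exec etcd3 etcdctl endpoint status --cluster -w table

# Write a key through etcd2
docker exec etcd2 etcdctl put /config/key "value1"
# Expected output when etcd2 or etcd3 is the leader:
# OK

# Read the key through etcd1
docker exec etcd1 etcdctl get /config/key
# Expected output when etcd3 is the leader:
# /config/key
# value1
# Followers forward writes and linearizable reads to the leader, and only
# etcd3 can reach both etcd1 and etcd2. If etcd1 or etcd2 leads, the request
# sent through the member on the other side of the cut times out.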
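The leader dependency can be captured in a few lines. This simplified model (it ignores elections, timeouts, and etcd's pre-vote and check-quorum behavior; the function name is illustrative) says a write or linearizable read sent to a member succeeds only if that member can talk to the leader and the leader can reach a majority:

```python
def request_succeeds(member: str, leader: str, nodes: list[str],
                     blocked: set[tuple[str, str]]) -> bool:
    """Simplified model of a write or linearizable read sent to `member`.

    The request succeeds only if `member` can talk to the leader (or is the
    leader) and the leader can reach a strict majority, itself included.
    Elections, timeouts, and etcd's pre-vote/check-quorum are ignored.
    """
    def linked(a: str, b: str) -> bool:
        # A TCP connection needs both directions, so either blocked direction cuts it.
        return a == b or ((a, b) not in blocked and (b, a) not in blocked)

    reachable = sum(1 for node in nodes if linked(leader, node))
    has_quorum = reachable >= len(nodes) // 2 + 1
    return has_quorum and linked(member, leader)


if __name__ == "__main__":
    nodes = ["etcd1", "etcd2", "etcd3"]
    blocked = {("etcd1", "etcd2"), ("etcd2", "etcd1")}  # the Step 2 rules
    for leader in nodes:
        put_ok = request_succeeds("etcd2", leader, nodes, blocked)
        get_ok = request_succeeds("etcd1", leader, nodes, blocked)
        print(f"leader={leader}: put via etcd2 ok={put_ok}, get via etcd1 ok={get_ok}")
```

With etcd3 as leader both requests succeed; with etcd1 or etcd2 as leader, the request sent through the other one fails, which matches the leader-dependent outputs above.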

Step 4: Test Full Cluster Split

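Now cut etcd3 off from both etcd1 and etcd2. With the Step 2 rules still in place, every member is isolated from every other member (a 1-1-1 split), so no group holds a majority: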

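# Isolate etcd3 from both etcd1 and etcd2. iptables runs from a helper
# container in etcd3's network namespace (assumes nicolaka/netshoot, which
# ships iptables; the etcd image runs as a non-root user).
docker run --rm --net container:etcd3 --cap-add NET_ADMIN nicolaka/netshoot iptables -A INPUT -s etcd1 -j DROP
docker run --rm --net container:etcd3 --cap-add NET_ADMIN nicolaka/netshoot iptables -A OUTPUT -d etcd1 -j DROP
docker run --rm --net container:etcd3 --cap-add NET_ADMIN nicolaka/netshoot iptables -A INPUT -s etcd2 -j DROP
docker run --rm --net container:etcd3 --cap-add NET_ADMIN nicolaka/netshoot iptables -A OUTPUT -d etcd2 -j DROP

# Try to write from etcd3, which is now alone
docker exec etcd3 etcdctl put /config/key "value2"
# Expected output (the exact error varies with timing), for example:
# Error: context deadline exceeded
# "etcdserver: leader changed" can appear instead if the leader steps down mid-request.
# A single node cannot form quorum (needs a majority of 3 = 2)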
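A Raft candidate needs votes from a majority, so a member that cannot directly reach a majority (counting itself) cannot become leader. This sketch applies that rule to the Step 2 and Step 4 rule sets; it is a simplified model that ignores log freshness and election timing, and the names are illustrative:

```python
NODES = ["etcd1", "etcd2", "etcd3"]


def blocked_pairs(*pairs: tuple[str, str]) -> set[frozenset[str]]:
    """Links cut in both directions, as the INPUT plus OUTPUT DROP rules do."""
    return {frozenset(pair) for pair in pairs}


def possible_leaders(nodes: list[str], blocked: set[frozenset[str]]) -> list[str]:
    """Members that can directly reach a strict majority, themselves included.

    Only these members could win an election. An empty result means no
    write can commit anywhere. (Log freshness and timing are ignored.)
    """
    needed = len(nodes) // 2 + 1
    leaders = []
    for node in nodes:
        peers = sum(1 for other in nodes
                    if other != node and frozenset((node, other)) not in blocked)
        if 1 + peers >= needed:
            leaders.append(node)
    return leaders


if __name__ == "__main__":
    step2 = blocked_pairs(("etcd1", "etcd2"))
    step4 = step2 | blocked_pairs(("etcd3", "etcd1"), ("etcd3", "etcd2"))
    print("after Step 2:", possible_leaders(NODES, step2))
    print("after Step 4:", possible_leaders(NODES, step4))
```

After Step 2 all three members could still win an election; after Step 4 the list is empty, which is why the write from etcd3 fails.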

Step 5: Use Chaos Mesh for Kubernetes Partitions

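In Kubernetes, use a Chaos Mesh NetworkChaos resource with action: partition. The example below cuts the pod etcd-0 off from the app: etcd pods in the production namespace, in both directions, for 60 seconds: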

# k8s-partition.yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: etcd-partition
spec:
  action: partition
  mode: all
  selector:
    namespaces:
      - production
    labelSelectors:
      app: etcd
  direction: both
  target:
    mode: one
    selector:
      namespaces:
        - production
      labelSelectors:
        app: etcd
        statefulset.kubernetes.io/pod-name: etcd-0
  duration: 60s

kubectl apply -f k8s-partition.yaml
# Expected output:
# networkchaos.chaos-mesh.org/etcd-partition created
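To isolate each member in turn, the manifest can be generated per pod rather than edited by hand. kubectl apply -f accepts JSON as well as YAML, so this sketch builds the same NetworkChaos fields as the example above with Python's json module. The function name and the etcd-partition-<pod> naming scheme are assumptions for illustration:

```python
import json


def partition_manifest(pod: str, namespace: str = "production",
                       duration: str = "60s") -> dict:
    """NetworkChaos that cuts `pod` off from the app=etcd pods, both directions.

    Mirrors the fields of the YAML example; the resource name is an assumption.
    """
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "NetworkChaos",
        "metadata": {"name": f"etcd-partition-{pod}"},
        "spec": {
            "action": "partition",
            "mode": "all",
            "selector": {
                "namespaces": [namespace],
                "labelSelectors": {"app": "etcd"},
            },
            "direction": "both",
            "target": {
                "mode": "one",
                "selector": {
                    "namespaces": [namespace],
                    "labelSelectors": {
                        "app": "etcd",
                        "statefulset.kubernetes.io/pod-name": pod,
                    },
                },
            },
            "duration": duration,
        },
    }


if __name__ == "__main__":
    # Pipe into `kubectl apply -f -` to create the experiment for etcd-1.
    print(json.dumps(partition_manifest("etcd-1"), indent=2))
```

Applying one generated manifest at a time keeps the blast radius to a single isolated member per run.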

Learning Path

flowchart LR
  A[Database Faults] --> B[Network Partitioning]
  B --> C[Infrastructure Faults]
  C --> D[Kubernetes Chaos]
  D --> E[Chaos Engineering Pipeline]
  style B fill:#f90,color:#fff

Common Errors

Assuming symmetric partitions: Real network failures are often asymmetric. Node A may be able to reach B while B cannot reach A, so test each direction separately.
Not testing partition recovery: Healing a partition needs as much testing as creating it. Make sure every iptables rule is removed (iptables -D with the same rule specification) and that the isolated members rejoin the cluster and catch up.
Forgetting that etcd requires majority quorum: With 3 nodes you need at least 2 for quorum. In a 1-2 split the two-node side keeps accepting writes while the single node cannot; in a 1-1-1 split no side can accept writes.
Using random partition targets: Design which nodes are isolated intentionally. Random partitions may not test the specific scenario you need.
Not considering the blast radius of networking experiments: A Network Partition on a shared network can affect unrelated services. Use dedicated test clusters.
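An iptables rule added with -A is removed by running the same rule specification with -D. Deriving the cleanup commands from the list of rules you added makes recovery hard to forget. This sketch only builds argument lists; executing them (for example with subprocess) is left out, and the helper names are hypothetical:

```python
def drop_rules(peer: str) -> list[list[str]]:
    """iptables arguments that block traffic to and from `peer`, as in Step 2."""
    return [
        ["-A", "INPUT", "-s", peer, "-j", "DROP"],
        ["-A", "OUTPUT", "-d", peer, "-j", "DROP"],
    ]


def undo(rules: list[list[str]]) -> list[list[str]]:
    """Turn each appended (-A) rule into its delete (-D) form, newest first."""
    return [["-D", *rule[1:]] for rule in reversed(rules) if rule[0] == "-A"]


if __name__ == "__main__":
    added = drop_rules("etcd2")
    for rule in undo(added):
        print("iptables", " ".join(rule))
```

Run the resulting commands in the same container or network namespace where the rules were added, then confirm with etcdctl endpoint health --cluster that every member reports healthy again.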

Practice Questions

What is a Network Partition and why is it dangerous for Distributed Systems?
How many nodes are needed for quorum in a 5-node etcd cluster?
What happens when a Network Partition prevents quorum?
How do you create a Network Partition using iptables?
How does Chaos Mesh define partition targets?

Challenge

Create a 5-node etcd cluster and design three partition scenarios: a 3-2 split (quorum maintained), a 2-2-1 split (no quorum), and an asymmetric partition where one node can send but not receive. For each scenario, document whether the cluster remains available and whether writes succeed.
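For the 3-2 and 2-2-1 scenarios, every pair of members on different sides needs its link cut; missing one silently turns a full split into a partial partition. A sketch that enumerates those pairs up front, assuming members named etcd1 to etcd5 (the function name is illustrative):

```python
from itertools import product


def cross_links(sides: list[list[str]]) -> list[tuple[str, str]]:
    """Every (node, peer) pair whose members sit on different sides.

    Each pair needs DROP rules on `node` for traffic to and from `peer`
    (one INPUT and one OUTPUT rule), as in the three-node steps.
    """
    links: list[tuple[str, str]] = []
    for i, side in enumerate(sides):
        for other_side in sides[i + 1:]:
            links.extend(product(side, other_side))
    return links


if __name__ == "__main__":
    scenarios = {
        "3-2": [["etcd1", "etcd2", "etcd3"], ["etcd4", "etcd5"]],
        "2-2-1": [["etcd1", "etcd2"], ["etcd3", "etcd4"], ["etcd5"]],
    }
    for name, sides in scenarios.items():
        links = cross_links(sides)
        print(f"{name}: {len(links)} links to cut -> {links}")
```

The 3-2 split needs 6 links cut and the 2-2-1 split needs 8.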

FAQ

What is a Network Partition?

A Network Partition occurs when a network failure splits a cluster into isolated groups that cannot communicate with each other.

What is split-brain in Distributed Systems?

Split-brain happens when both sides of a Network Partition continue operating independently and make conflicting changes that can be difficult or impossible to reconcile when the partition heals.

How does etcd handle network partitions?

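etcd uses the Raft consensus algorithm, which requires a majority of members (quorum) to commit writes. Members in a minority partition cannot accept writes or serve linearizable reads; they can only answer serializable reads (etcdctl get --consistency=s), which may return stale data.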

Can network partitioning cause data loss?

Yes, in systems without proper consensus mechanisms. Systems built on Raft or Paxos avoid losing acknowledged writes, because a write is acknowledged only after a quorum has accepted it.

How do you recover from a Network Partition?

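Restore connectivity by fixing the underlying network issue (in the experiments above, delete the iptables rules or let the NetworkChaos duration expire). Members that fell behind then catch up from the leader's log or a snapshot through the consensus protocol.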

