Kubernetes StatefulSets Guide — Stateful Application Management

DodaTech Updated 2026-06-24 9 min read

This tutorial covers the key concepts behind Kubernetes StatefulSets, with practical examples and best practices for running stateful workloads.

Kubernetes StatefulSets manage stateful applications by providing stable network identities, ordered deployment and scaling, and persistent storage that follows each pod through rescheduling.

What You'll Learn

You'll master StatefulSets — stable pod identities with ordinal indexing and hostnames, PersistentVolumeClaim templates for per-pod storage, headless Services, ordered rolling updates, and graceful scaling for stateful workloads.

Why This Problem Matters

Deployments treat pods as interchangeable. Databases, message queues, and distributed systems need stable identities: each pod must be uniquely identifiable and keep its storage across rescheduling. StatefulSets provide these guarantees for stateful applications.

Real-World Use

DodaZIP's metadata database cluster runs on a StatefulSet with three replicas. Each pod has a dedicated PVC that persists across rescheduling. The headless Service gives each pod a stable DNS name (for example, postgres-0.postgres.dodatech.svc.cluster.local) for use in replication configuration.

StatefulSet Architecture

flowchart TB
  subgraph StatefulSet
    SS[StatefulSet: postgres]
    SS --> Pod0[postgres-0]
    SS --> Pod1[postgres-1]
    SS --> Pod2[postgres-2]
  end
  subgraph HeadlessService
    SVC["Service: postgres<br/>clusterIP: None"]
  end
  subgraph Storage
    PVC0[PVC data-postgres-0 ➔ 100Gi]
    PVC1[PVC data-postgres-1 ➔ 100Gi]
    PVC2[PVC data-postgres-2 ➔ 100Gi]
  end
  subgraph DNS
    DNS0[postgres-0.postgres.default.svc.cluster.local]
    DNS1[postgres-1.postgres.default.svc.cluster.local]
    DNS2[postgres-2.postgres.default.svc.cluster.local]
  end
  Pod0 --- PVC0
  Pod0 --- DNS0
  Pod1 --- PVC1
  Pod1 --- DNS1
  Pod2 --- PVC2
  Pod2 --- DNS2
  Pod0 --- SVC
  Pod1 --- SVC
  Pod2 --- SVC

Basic StatefulSet

# statefulset-postgres.yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  clusterIP: None  # Headless service
  ports:
    - port: 5432
      name: postgres
  selector:
    app: postgres
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          env:
            - name: POSTGRES_PASSWORD
              value: secret  # Demo only; reference a Kubernetes Secret in production
          ports:
            - containerPort: 5432
              name: postgres
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 100Gi

kubectl apply -f statefulset-postgres.yaml
kubectl get sts
kubectl get pods -w

Expected output:

NAME       READY   AGE
postgres   3/3     2m

NAME         READY   STATUS    RESTARTS   AGE
postgres-0   1/1     Running   0          2m
postgres-1   1/1     Running   0          1m
postgres-2   1/1     Running   0          30s

Notice the ordered creation: postgres-0 starts first, then postgres-1 after it's Ready, then postgres-2.

Stable DNS Names

# Query DNS from another pod
kubectl run dns-test --image=busybox:1.36 --rm -it --restart=Never -- \
  nslookup postgres-0.postgres.default.svc.cluster.local

Expected output:

Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      postgres-0.postgres.default.svc.cluster.local
Address 1: 10.244.1.5
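
Because the names are predictable, a pod can work out its own ordinal and its peers' DNS names from nothing more than its hostname. The following is a minimal sketch of that idea, not part of any Kubernetes client library: the helper names ordinal_of and peer_dns_names are hypothetical, and the default cluster domain cluster.local is an assumption that may differ on your cluster.

```python
def ordinal_of(hostname: str) -> int:
    """Return the ordinal from a StatefulSet pod hostname such as 'postgres-2'."""
    # The ordinal is everything after the last hyphen.
    _, _, suffix = hostname.rpartition("-")
    return int(suffix)


def peer_dns_names(hostname: str, service: str, namespace: str,
                   replicas: int,
                   cluster_domain: str = "cluster.local") -> list[str]:
    """List the per-pod DNS names of every replica except this one."""
    base, _, _ = hostname.rpartition("-")  # StatefulSet name
    own = ordinal_of(hostname)
    return [
        f"{base}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
        if i != own
    ]


# Hypothetical values: pod postgres-1 in a 3-replica set, namespace "default".
for name in peer_dns_names("postgres-1", "postgres", "default", 3):
    print(name)
```

Run as postgres-1, this prints the DNS names of postgres-0 and postgres-2, which is the kind of list a replication configuration typically needs.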

Pod Identity Simulation

from typing import Optional

class StatefulSetPod:
    def __init__(self, ordinal: int, statefulset: str, service_name: str):
        self.ordinal = ordinal
        self.name = f"{statefulset}-{ordinal}"
        self.hostname = self.name
        # The DNS subdomain is the governing (headless) Service name,
        # which need not match the StatefulSet name.
        self.subdomain = service_name
        self.ready = False

    def dns_name(self, namespace: str = "default") -> str:
        return (f"{self.name}.{self.subdomain}."
                f"{namespace}.svc.cluster.local")

    def __repr__(self):
        return (f"Pod({self.name}, "
                f"dns={self.dns_name()}, "
                f"ready={self.ready})")

class StatefulSet:
    def __init__(self, name: str, replicas: int,
                 service_name: Optional[str] = None):
        self.name = name
        self.replicas = replicas
        self.service_name = service_name or name
        self.pods = [
            StatefulSetPod(i, name, self.service_name)
            for i in range(replicas)
        ]
        self.volumes = {}

    def scale(self, new_replicas: int):
        if new_replicas > self.replicas:
            # Scale up: new pods take the next ordinals in ascending order.
            for i in range(self.replicas, new_replicas):
                self.pods.append(
                    StatefulSetPod(i, self.name, self.service_name)
                )
        elif new_replicas < self.replicas:
            # Scale down: the highest ordinals are removed.
            self.pods = self.pods[:new_replicas]
            print(f"Scaling down to {new_replicas}: "
                  f"pods {new_replicas}-{self.replicas - 1} "
                  f"terminated")
        self.replicas = new_replicas

    def rolling_update(self, new_image: str):
        for pod in reversed(self.pods):
            print(f"Updating {pod.name} to {new_image}...")
            pod.ready = False
            # Simulate update
            pod.ready = True
            print(f"  {pod.name} updated")

    def get_pod_by_ordinal(self, ordinal: int) -> StatefulSetPod:
        return self.pods[ordinal]

sts = StatefulSet("postgres", 3)
for pod in sts.pods:
    pod.ready = True
    print(f"  {pod.name}: {pod.dns_name()}")

print("\nScaling from 3 to 5...")
sts.scale(5)
for pod in sts.pods:
    print(f"  {pod.name}: hostname={pod.hostname}")

print("\nRolling update from postgres:16 to postgres:17...")
sts.rolling_update("postgres:17")

Expected output:

  postgres-0: postgres-0.postgres.default.svc.cluster.local
  postgres-1: postgres-1.postgres.default.svc.cluster.local
  postgres-2: postgres-2.postgres.default.svc.cluster.local

Scaling from 3 to 5...
  postgres-0: hostname=postgres-0
  postgres-1: hostname=postgres-1
  postgres-2: hostname=postgres-2
  postgres-3: hostname=postgres-3
  postgres-4: hostname=postgres-4

Rolling update from postgres:16 to postgres:17...
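Updating postgres-4 to postgres:17...
  postgres-4 updated
Updating postgres-3 to postgres:17...
  postgres-3 updated
Updating postgres-2 to postgres:17...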
  postgres-2 updated
Updating postgres-1 to postgres:17...
  postgres-1 updated
Updating postgres-0 to postgres:17...
  postgres-0 updated

Ordered Pod Management

# Ordered pod management policies
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zookeeper
spec:
  podManagementPolicy: OrderedReady  # Default: create/delete one at a time
  # Alternative: Parallel (start all pods simultaneously)
  # podManagementPolicy: Parallel
  serviceName: zookeeper
  replicas: 3
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
        - name: zookeeper
          image: zookeeper:3.9

Parallel Pod Management

For workloads where startup ordering doesn't matter:

spec:
  podManagementPolicy: Parallel

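Parallel suits workloads whose pods can start or terminate simultaneously without waiting for one another. The setting affects only scaling operations; updates are not affected. Cassandra is a commonly cited example because each node joins the ring on its own, but check the datastore's bootstrap requirements before relying on it.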
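
To make the trade-off concrete, here is a rough sketch that compares how long a scale-up takes under each policy. It is a simplified model rather than a description of the controller: the functions startup_schedule and total_startup_time are hypothetical, and the per-pod readiness times are made-up inputs.

```python
def startup_schedule(ready_seconds: list[float],
                     policy: str = "OrderedReady") -> list[tuple[float, float]]:
    """Return (start, ready) times per ordinal under a simplified model."""
    schedule = []
    clock = 0.0
    for seconds in ready_seconds:
        if policy == "Parallel":
            # Every pod is launched at time zero.
            schedule.append((0.0, seconds))
        elif policy == "OrderedReady":
            # Each pod waits until its predecessor is Ready.
            schedule.append((clock, clock + seconds))
            clock += seconds
        else:
            raise ValueError(f"unknown policy: {policy}")
    return schedule


def total_startup_time(ready_seconds: list[float], policy: str) -> float:
    """Time until the last pod is Ready."""
    schedule = startup_schedule(ready_seconds, policy)
    return max((ready for _, ready in schedule), default=0.0)


# Assumed time each pod needs to become Ready, in seconds.
times = [30.0, 20.0, 10.0]
print("OrderedReady:", total_startup_time(times, "OrderedReady"))
print("Parallel:", total_startup_time(times, "Parallel"))
```

In this model the ordered start takes the sum of the readiness times and the parallel start takes only the longest one, which is the main reason to choose Parallel when ordering is not required.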

Update Strategy

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  updateStrategy:
    type: RollingUpdate  # Default
    rollingUpdate:
      maxUnavailable: 1  # Max pods down during update; depends on the MaxUnavailableStatefulSet feature gate
      partition: 0       # Only update ordinals >= partition
  # Alternative: OnDelete (manual pod deletion triggers update)
  # updateStrategy:
  #   type: OnDelete

Canary Updates with Partition

spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2  # Only update pods with ordinal >= 2

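With partition: 2, only pods with ordinal 2 or higher (pod-2, pod-3, and so on) are updated. Pod-0 and pod-1 stay on the old version, so the updated pods act as a canary that you can test in production before lowering the partition to roll out the rest.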
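
The partition rule can be written as a one-line range. The sketch below uses a hypothetical helper, pods_to_update, to list which pods a RollingUpdate would touch and in what order; it models only the ordinal arithmetic, not readiness checks.

```python
def pods_to_update(name: str, replicas: int, partition: int = 0) -> list[str]:
    """Pods a RollingUpdate touches, in update order (highest ordinal first)."""
    if partition < 0:
        raise ValueError("partition must be non-negative")
    # Only ordinals >= partition are updated; lower ordinals keep the old spec.
    return [f"{name}-{i}" for i in range(replicas - 1, partition - 1, -1)]


print(pods_to_update("postgres", 5, partition=2))
print(pods_to_update("postgres", 5, partition=0))
print(pods_to_update("postgres", 5, partition=5))
```

With five replicas, partition 2 yields postgres-4, postgres-3, postgres-2; partition 0 yields all five pods; and a partition equal to or above the replica count yields an empty list, meaning nothing is updated.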

Persistent Storage Per Pod

class VolumeClaimTemplate:
    def __init__(self, name: str, size: str, storage_class: str):
        self.name = name
        self.size = size
        self.storage_class = storage_class

    def claim_name(self, pod_name: str) -> str:
        return f"{self.name}-{pod_name}"

class StatefulSetPVCManager:
    def __init__(self):
        self.pvcs = {}

    def create_pvc(self, sts_name: str, ordinal: int,
                   template: VolumeClaimTemplate):
        pod_name = f"{sts_name}-{ordinal}"
        claim_name = template.claim_name(pod_name)
        if claim_name not in self.pvcs:
            self.pvcs[claim_name] = {
                "size": template.size,
                "storage_class": template.storage_class,
                "bound_to": pod_name,
                "status": "Bound"
            }
            print(f"Created PVC {claim_name} ({template.size}) "
                  f"for {pod_name}")

    def delete_pod_pvcs(self, sts_name: str, ordinal: int):
        pod_name = f"{sts_name}-{ordinal}"
        to_delete = [
            name for name, pvc in self.pvcs.items()
            if pvc["bound_to"] == pod_name
        ]
        for name in to_delete:
            del self.pvcs[name]
            print(f"Deleted PVC {name}")

    def verify_storage_retention(self, sts_name: str, ordinal: int,
                                 template_name: str = "data"):
        pod_name = f"{sts_name}-{ordinal}"
        # PVC names follow <template>-<statefulset>-<ordinal>.
        claim_name = f"{template_name}-{pod_name}"
        return claim_name in self.pvcs

manager = StatefulSetPVCManager()
template = VolumeClaimTemplate("data", "100Gi", "fast-ssd")

for i in range(3):
    manager.create_pvc("postgres", i, template)

print(f"\nPVC for postgres-0 exists: "
      f"{manager.verify_storage_retention('postgres', 0)}")

manager.delete_pod_pvcs("postgres", 0)
print(f"PVC for postgres-0 after delete: "
      f"{manager.verify_storage_retention('postgres', 0)}")

Expected output:

Created PVC data-postgres-0 (100Gi) for postgres-0
Created PVC data-postgres-1 (100Gi) for postgres-1
Created PVC data-postgres-2 (100Gi) for postgres-2

PVC for postgres-0 exists: True
Deleted PVC data-postgres-0
PVC for postgres-0 after delete: False

Common Mistakes

1. Using Deployment for Stateful Workloads

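Deployments don't guarantee stable pod identities or per-pod storage. When a pod is recreated it gets a new name with a random suffix, and every replica shares whatever PVC the pod template references instead of getting its own. Prefer a StatefulSet for databases, queues, and other services that need a stable identity and a dedicated volume.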

2. Not Using Headless Service

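Without a headless Service (clusterIP: None), the per-pod DNS records are never created, so peers cannot address an individual pod by name. The StatefulSet's serviceName must point to a headless Service, and you have to create that Service yourself.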

3. Scaling Down Without Draining

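Scaling a StatefulSet from 5 to 3 deletes pod 4 and then pod 3. If these are database nodes, data may be lost unless replication has caught up. A PodDisruptionBudget does not guard against this, because it only limits voluntary evictions such as node drains; decommission the database members and confirm their data lives elsewhere before scaling down.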
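
A pre-flight check for a scale-down might look like the sketch below. Both helpers are hypothetical, and the set of pods that have finished handing off their data is something your own tooling would have to supply; here it is hard-coded for illustration.

```python
def scale_down_plan(name: str, current: int, target: int) -> list[str]:
    """Pods removed by a scale-down, in termination order (highest ordinal first)."""
    if not 0 <= target <= current:
        raise ValueError("target must be between 0 and the current replica count")
    return [f"{name}-{i}" for i in range(current - 1, target - 1, -1)]


def blocking_pods(plan: list[str], drained: set[str]) -> list[str]:
    """Pods in the plan that have not yet handed off their data."""
    return [pod for pod in plan if pod not in drained]


plan = scale_down_plan("postgres", current=5, target=3)
print("Will terminate:", plan)

# Assumption: only postgres-4 has been decommissioned so far.
blockers = blocking_pods(plan, drained={"postgres-4"})
if blockers:
    print("Not safe to scale down yet; still holding data:", blockers)
```

The point of the design is to compute the exact pods that will disappear first, then refuse to proceed until each of them has been confirmed as drained.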

4. Forgetting PVC Retention

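By default, deleting a StatefulSet leaves its PVCs in place (the StatefulSet does not own them). This prevents data loss but also means storage costs continue. Manage the PVC lifecycle separately, or set spec.persistentVolumeClaimRetentionPolicy (whenDeleted / whenScaled) on clusters that support it.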

5. Using ReadWriteOnce for Shared Access

ReadWriteOnce can only be mounted by one node. If multi-node read access is needed, use ReadWriteMany via NFS or EFS. Each StatefulSet pod gets its own RWO volume.

6. Ordered Pod Deletion Without Dependencies

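When scaling down, a StatefulSet terminates pods in reverse ordinal order (3, 2, 1, 0). That suits layouts where higher ordinals depend on lower ones, such as replicas following a primary on pod-0, because the dependents stop first. If a lower ordinal depends on a higher one, or if the StatefulSet itself is deleted (which gives no ordering guarantee), the dependency can break. Handle such cases in preStop hooks.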

7. No StatefulSet-Specific Monitoring

StatefulSet failures often involve storage (PVC pending, volume attachment timeout). Monitor PVC status, volume attachment errors, and pod eviction events separately from stateless deployments.
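
As one possible starting point for such monitoring, the sketch below scans a PVC listing for claims that are not Bound. It assumes the input is the parsed JSON that kubectl get pvc -o json produces (a list object with items, each carrying metadata.name and status.phase); the function name unbound_pvcs and the sample data are made up for illustration.

```python
import json


def unbound_pvcs(pvc_list: dict) -> list[tuple[str, str]]:
    """Return (name, phase) for every PVC whose phase is not 'Bound'."""
    result = []
    for item in pvc_list.get("items", []):
        name = item["metadata"]["name"]
        phase = item.get("status", {}).get("phase", "Unknown")
        if phase != "Bound":
            result.append((name, phase))
    return result


# Made-up sample shaped like `kubectl get pvc -o json` output.
SAMPLE = json.loads("""
{
  "kind": "List",
  "items": [
    {"metadata": {"name": "data-postgres-0"}, "status": {"phase": "Bound"}},
    {"metadata": {"name": "data-postgres-1"}, "status": {"phase": "Bound"}},
    {"metadata": {"name": "data-postgres-2"}, "status": {"phase": "Pending"}}
  ]
}
""")

for name, phase in unbound_pvcs(SAMPLE):
    print(f"{name}: {phase}")
```

For the sample above this reports data-postgres-2 as Pending. In practice you would feed it real kubectl output on a schedule and alert when the list is non-empty for longer than provisioning normally takes.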

Practice Questions

1. How does a StatefulSet differ from a Deployment?

StatefulSet provides stable pod identity (statefulset-name-ordinal, e.g. postgres-0), ordered deployment/scaling, and per-pod persistent storage. Deployment provides identical, interchangeable pods with no guaranteed identity or storage persistence.

2. What is the purpose of the headless service in a StatefulSet?

The headless service (clusterIP: None) enables DNS-based pod discovery. Each pod gets a DNS A record like pod-name.service-name.namespace.svc.cluster.local, resolving directly to the pod's IP.

3. How does rolling update work in StatefulSet?

Pods are updated in reverse ordinal order (largest to smallest). Each pod is terminated and recreated with the new spec, and must be Running and Ready before the next one is updated. With partition, only pods whose ordinal is greater than or equal to the partition value are updated, which enables canary deployments.

4. What happens to PVCs when a StatefulSet is scaled down?

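PVCs are NOT deleted when a StatefulSet is scaled down. They remain in the cluster to preserve data, and are reused if you scale back up. To remove them, delete the PVCs manually with kubectl delete pvc, or configure persistentVolumeClaimRetentionPolicy with whenScaled: Delete on clusters that support it. Note that kubectl delete sts --cascade=orphan removes only the StatefulSet object and leaves its pods running; it does not clean up PVCs.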
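
To see which claims a scale-down leaves behind, you can derive their names from the volumeClaimTemplates. This is a small sketch assuming PVCs are retained on scale-down as described above; retained_pvcs is a hypothetical helper, not a kubectl or client-library function.

```python
def retained_pvcs(sts_name: str, old_replicas: int, new_replicas: int,
                  template_names: list[str]) -> list[str]:
    """PVC names left behind after scaling down from old_replicas to new_replicas."""
    # PVC names follow <template>-<statefulset>-<ordinal>.
    return [
        f"{template}-{sts_name}-{ordinal}"
        for ordinal in range(new_replicas, old_replicas)
        for template in template_names
    ]


leftover = retained_pvcs("postgres", old_replicas=5, new_replicas=3,
                         template_names=["data"])
for claim in leftover:
    # Printed for review only; nothing is deleted by this script.
    print(f"kubectl delete pvc {claim}")
```

For a scale-down from 5 to 3 with a single template named data, the leftovers are data-postgres-3 and data-postgres-4. Printing the commands instead of running them keeps the deletion a deliberate, reviewed step.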

5. Challenge: Design a StatefulSet-based Cassandra cluster.

Cassandra needs each node to have a unique identity (for gossip protocol), persistent storage, and ordered bootstrap (first node seeds the cluster). Design the StatefulSet configuration with: headless service, volumeClaimTemplates, ordered pod management, initial readiness check that waits for the seed node, and update strategy with partition for rolling upgrades.

Mini Project: StatefulSet Cluster Manager

class ClusterNode:
    def __init__(self, ordinal: int, cluster_size: int, seed: bool = False):
        self.ordinal = ordinal
        self.name = f"node-{ordinal}"
        self.seed = seed
        self.data = {}
        self.ready = False

    def join_cluster(self, seed_node):
        print(f"  {self.name} joining via {seed_node.name}")
        self.ready = True

    def __repr__(self):
        return (f"{self.name} (seed={self.seed}, "
                f"ready={self.ready})")

class StatefulCluster:
    def __init__(self, name: str, replicas: int):
        self.name = name
        self.nodes = []
        self.seed_node = None
        self.scale_to(replicas)

    def scale_to(self, n: int):
        if n > len(self.nodes):
            for i in range(len(self.nodes), n):
                is_seed = (i == 0)
                node = ClusterNode(i, n, seed=is_seed)
                self.nodes.append(node)
                if is_seed:
                    self.seed_node = node
        elif n < len(self.nodes):
            self.nodes = self.nodes[:n]
        print(f"\nCluster scaled to {n} nodes:")
        self.bootstrap()

    def bootstrap(self):
        for node in self.nodes:
            if node.seed:
                node.ready = True
                print(f"  {node.name} bootstrapped (seed)")
            elif self.seed_node:
                node.join_cluster(self.seed_node)

cluster = StatefulCluster("cassandra", 3)
print("\nNodes:")
for n in cluster.nodes:
    print(f"  {n}")

Expected output:

Cluster scaled to 3 nodes:
  node-0 bootstrapped (seed)
  node-1 joining via node-0
  node-2 joining via node-0

Nodes:
  node-0 (seed=True, ready=True)
  node-1 (seed=False, ready=True)
  node-2 (seed=False, ready=True)

FAQ

Can I use a regular Service with StatefulSet?

Yes, but the regular Service provides load-balanced access to any ready pod. For stable per-pod DNS, you need a headless Service. Many deployments use both: a headless Service for intra-cluster pod communication and a regular Service for external access.

How do I back up data for a StatefulSet pod?

Since each pod has its own PVC, you can back up by connecting to the pod and running backup commands, using a sidecar backup container, or using the cloud provider's snapshot feature on the underlying volume.

What happens when a StatefulSet pod is rescheduled to a different node?

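The pod keeps its name, hostname, and DNS entry. The underlying volume is detached from the old node and attached to the new one, provided the volume can be attached there (zonal or node-local volumes restrict where the pod can run). Data persists because the recreated pod claims the same PVC, even on a different node.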

What's Next

Kubernetes Service Accounts Guide

Kubernetes Persistent Volumes

Kubernetes Storage Classes

Congratulations on completing this StatefulSets guide! Here's where to go from here:

Practice daily — Deploy a StatefulSet with a database
Build a project — Set up a Cassandra or PostgreSQL cluster on StatefulSets
Explore related topics — Operator pattern for databases, volume snapshots, backup/restore with Velero
Join the community — Share your StatefulSet configurations and get feedback

Remember: every expert was once a beginner. Keep stateful!
