Kubernetes Pod Lifecycle Guide — From Pending to Running

DodaTech Updated 2026-06-24 9 min read

In this tutorial, you'll learn how the Kubernetes pod lifecycle works. We cover the key concepts, practical examples, and best practices you need to debug and configure pods effectively.

The Kubernetes pod lifecycle defines the sequence of states a pod passes through from creation to termination, including initialization, readiness checks, and graceful shutdown.

What You'll Learn

You'll master the pod lifecycle — phase transitions (Pending, Running, Succeeded, Failed, Unknown), container state machines, init containers, probe types (liveness, readiness, startup), and the termination grace period.

Why This Problem Matters

Understanding the pod lifecycle is essential for debugging failed deployments, designing graceful shutdowns, and configuring rolling updates. A pod whose phase is "Running" may in fact be stuck in CrashLoopBackOff, and without understanding lifecycle states you can waste hours troubleshooting.

Real-World Use

Doda Browser's Kubernetes deployment uses init containers to download malware signature databases before the main application starts. Readiness probes ensure traffic is only routed to pods that have fully loaded their signatures.

Pod Lifecycle Phases

flowchart TB
  Create[User creates Pod] --> Pending[Phase: Pending]
  Pending -->|Scheduler assigns node| ContainerCreating[Container Creating]
  ContainerCreating -->|Init containers complete| Running[Phase: Running]
  
  Running -->|All containers exit 0| Succeeded[Phase: Succeeded]
  Running -->|All containers terminated, at least one failed| Failed[Phase: Failed]
  Running -->|Node unreachable| Unknown[Phase: Unknown]
  Pending -->|Scheduling failure: stays Pending| Pending
  
  subgraph InitFlow
    Init[Init Container 1] --> Init2[Init Container 2]
    Init2 --> Main[Main Container Starts]
  end
  
  ContainerCreating --> InitFlow

Pod Phase Demonstration

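import time

from kubernetes import client, config, watch

# Load credentials from the local kubeconfig
# (use config.load_incluster_config() when running inside a pod)
config.load_kube_config()
v1 = client.CoreV1Api()

def watch_pod_lifecycle(namespace: str, pod_name: str, timeout: int = 120):
    """Print every status change of one pod until it reaches a terminal phase."""
    start = time.time()
    w = watch.Watch()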
    for event in w.stream(
        v1.list_namespaced_pod,
        namespace,
        field_selector=f"metadata.name={pod_name}",
        timeout_seconds=timeout
    ):
        pod = event["object"]
        phase = pod.status.phase
        conditions = {
            c.type: c.status for c in (pod.status.conditions or [])
        }
        container_statuses = pod.status.container_statuses or []
        container_states = {}
        for cs in container_statuses:
            state = cs.state
            if state.running:
                container_states[cs.name] = "running"
            elif state.waiting:
                container_states[cs.name] = f"waiting({state.waiting.reason})"
            elif state.terminated:
                container_states[cs.name] = f"terminated({state.terminated.exit_code})"

        elapsed = time.time() - start
        print(f"[{elapsed:5.1f}s] Phase: {phase:>10} | "
              f"Conditions: {conditions} | "
              f"Containers: {container_states}")

        if phase in ("Succeeded", "Failed"):
            break
    w.stop()

# Run a simple pod and watch its lifecycle
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "lifecycle-demo"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [{
            "name": "main",
            "image": "alpine",
            "command": ["sh", "-c", "echo 'Starting...'; sleep 3; echo 'Done!'"]
        }]
    }
}

v1.create_namespaced_pod("default", pod_manifest)
watch_pod_lifecycle("default", "lifecycle-demo")
v1.delete_namespaced_pod("default", "lifecycle-demo")

Expected output (illustrative; timings vary, and real clusters report additional conditions such as ContainersReady):

[  0.5s] Phase:    Pending | Conditions: {} | Containers: {}
[  1.2s] Phase:    Pending | Conditions: {'PodScheduled': 'True'} | Containers: {}
[  2.8s] Phase:    Running | Conditions: {'PodScheduled': 'True', 'Initialized': 'True', 'Ready': 'True'} | Containers: {'main': 'running'}
[  6.0s] Phase:  Succeeded | Conditions: {'PodScheduled': 'True', 'Initialized': 'True', 'Ready': 'False'} | Containers: {'main': 'terminated(0)'}

Init Containers

Init containers run sequentially before the main application starts:

# init-container-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-demo
spec:
  restartPolicy: Never  # the main container exits after printing, so do not restart it
  initContainers:
    - name: init-download
      image: busybox:1.36
      command:
        - wget
        - "-O"
        - "/data/signatures.db"
        - "https://cdn.dodatech.com/signatures/latest.db"
      volumeMounts:
        - name: data
          mountPath: /data
    - name: init-check
      image: alpine:3.20
      command:
        - sh
        - "-c"
        - "test -s /data/signatures.db && echo 'Signatures verified'"
      volumeMounts:
        - name: data
          mountPath: /data
  containers:
    - name: main
      image: alpine:3.20
      command: ["sh", "-c", "cat /data/signatures.db | head -5"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      emptyDir: {}

kubectl apply -f init-container-pod.yaml
kubectl get pod init-demo -w

Expected output:

NAME        READY   STATUS            RESTARTS   AGE
init-demo   0/1     Init:0/2          0          0s
init-demo   0/1     Init:0/2          0          3s
init-demo   0/1     Init:1/2          0          7s
init-demo   0/1     PodInitializing   0          10s
init-demo   1/1     Running           0          12s
init-demo   0/1     Completed         0          14s

Probes: Liveness, Readiness, Startup

apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
    - name: app
      image: python:3.12-slim
      command:
        - python
        - "-c"
        - |
          import http.server, time
          class H(http.server.BaseHTTPRequestHandler):
              ready = False
              def do_GET(self):
                  if self.path == "/healthz":
                      if H.ready:
                          self.send_response(200)
                      else:
                          self.send_response(503)
                  else:
                      self.send_response(200)
                  self.end_headers()
          time.sleep(5)  # Simulate slow startup
          H.ready = True
          http.server.HTTPServer(("0.0.0.0", 8080), H).serve_forever()
      ports:
        - containerPort: 8080
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 0
        periodSeconds: 2
        failureThreshold: 30
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 5
        initialDelaySeconds: 0
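
The startupProbe settings above bound how long the container may take to start before the kubelet restarts it. The sketch below works through that arithmetic with the probe-demo values. The function names are made up for illustration, and the results are approximate upper bounds that ignore per-probe timeouts.

```python
def startup_budget_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate time a container gets to pass its startup probe."""
    return initial_delay + period * failure_threshold

def liveness_detection_seconds(period: int, failure_threshold: int) -> int:
    """Approximate time before repeated liveness failures trigger a restart."""
    return period * failure_threshold

# Values taken from the probe-demo manifest
print(startup_budget_seconds(initial_delay=0, period=2, failure_threshold=30))  # 60
print(liveness_detection_seconds(period=10, failure_threshold=3))               # 30
```

With these numbers the simulated 5-second startup fits comfortably inside the 60-second startup budget.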

Container State Machine

import random

class ContainerStateMachine:
    """Toy model of a container's waiting -> running -> terminated cycle."""

    # Simulation cutoff only: with restartPolicy: Always a real kubelet keeps retrying
    MAX_RESTARTS = 3

    def __init__(self):
        self.state = "waiting"
        self.reason = "ContainerCreating"
        self.restart_count = 0
        self.exit_code = None

    def transition(self):
        transitions = {
            "waiting": self._waiting_transition,
            "running": self._running_transition,
            "terminated": self._terminated_transition,
        }
        return transitions.get(self.state, lambda: self.state)()

    def _waiting_transition(self):
        if self.reason == "CrashLoopBackOff":
            self.restart_count += 1
            # The kubelet back-off starts at 10s and doubles, capped at 300s
            backoff = min(10 * 2 ** (self.restart_count - 1), 300)
            print(f"Waiting: CrashLoopBackOff (backoff {backoff}s)")
        else:
            print(f"Waiting: {self.reason}")
        # Once the wait is over, the container starts (or restarts)
        self.state = "running"
        self.reason = None

    def _running_transition(self):
        print("Running")
        if random.random() < 0.2:  # 20% chance of crash
            self.state = "terminated"
            self.exit_code = 1
            self.reason = "Error"

    def _terminated_transition(self):
        print(f"Terminated: exit code {self.exit_code}")
        if self.exit_code != 0 and self.restart_count < self.MAX_RESTARTS:
            self.state = "waiting"
            self.reason = "CrashLoopBackOff"
        else:
            print("Pod failed permanently")
            self.state = "failed"

fsm = ContainerStateMachine()
for i in range(10):
    print(f"Step {i}: ", end="")
    fsm.transition()
    if fsm.state == "failed":
        break

Sample output (crashes are random, so each run differs):

Step 0: Waiting: ContainerCreating
Step 1: Running
Step 2: Running
Step 3: Terminated: exit code 1
Step 4: Waiting: CrashLoopBackOff (backoff 10s)
Step 5: Running
Step 6: Running
Step 7: Terminated: exit code 1
Step 8: Waiting: CrashLoopBackOff (backoff 20s)
Step 9: Running

Pod Termination Flow

flowchart LR
  A[Delete Pod] --> B[PreStop Hook]
  A --> G[Pod removed from Endpoints]
  B --> C[SIGTERM to PID 1]
  C --> D{Grace Period<br/>Default 30s}
  D -->|Within period| E[Process exits cleanly]
  D -->|Timeout| F[SIGKILL to PID 1]
  E --> H[Pod resource freed]
  F --> H

apiVersion: v1
kind: Pod
metadata:
  name: graceful-shutdown
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: app
      image: python:3.12-slim
      command:
        - python
        - "-c"
        - |
          import signal, time, sys
          def handler(signum, frame):
              print("Received SIGTERM, draining connections...")
              time.sleep(3)  # Complete in-flight requests
              print("Shutdown complete")
              sys.exit(0)
          signal.signal(signal.SIGTERM, handler)
          print("Server started")
          while True:
              time.sleep(1)
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "echo 'PreStop: removing from LB'; sleep 2"]

Common Mistakes

1. No Readiness Probe

Without readiness probes, the Service sends traffic to pods before they're ready. The first few requests fail. Always set a readiness probe that checks application-level readiness (not just port listening).

2. Liveness Probe That Depends on External Services

A liveness probe that checks a database causes a restart loop when the DB is temporarily slow. Use readiness for external dependencies, liveness for application-internal health.
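
One way to keep the two concerns apart is to serve them from separate endpoints. The sketch below shows only the decision logic behind each endpoint. The function and parameter names are hypothetical, and the status codes are a convention: the kubelet treats any HTTP status from 200 to 399 as success.

```python
def liveness_status(worker_thread_alive: bool) -> int:
    """Liveness looks only at the process itself, never at dependencies."""
    return 200 if worker_thread_alive else 500

def readiness_status(signatures_loaded: bool, database_reachable: bool) -> int:
    """Readiness may include external dependencies: failing it stops traffic, not the pod."""
    return 200 if signatures_loaded and database_reachable else 503

# Database outage: the pod is taken out of rotation but is not restarted
print(liveness_status(worker_thread_alive=True))                            # 200
print(readiness_status(signatures_loaded=True, database_reachable=False))  # 503
```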

3. Init Container Resource Limits

Init containers without resource limits can consume all node resources before the main app starts. Set CPU/memory limits on init containers too.

4. Setting terminationGracePeriodSeconds Too Short

If the application takes 10 seconds to drain connections but the grace period is 5 seconds, the process gets SIGKILL with connections still open. Set the grace period to at least 2x the expected drain time, and remember that the countdown also covers the time spent in the preStop hook.
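
As a rough sizing aid, the rule of thumb can be written as a small helper. This is a sketch under stated assumptions: the function name and the default safety factor of 2 are illustrative, and the preStop duration is added because the grace period starts counting before the hook runs.

```python
import math

def recommended_grace_period(pre_stop_seconds: float, drain_seconds: float,
                             safety_factor: float = 2.0) -> int:
    """Suggest a terminationGracePeriodSeconds value, rounded up to whole seconds."""
    return math.ceil(pre_stop_seconds + safety_factor * drain_seconds)

# 10s drain with a 2s preStop hook
print(recommended_grace_period(pre_stop_seconds=2, drain_seconds=10))  # 22
# The graceful-shutdown example: 2s preStop sleep, 3s drain
print(recommended_grace_period(pre_stop_seconds=2, drain_seconds=3))   # 8
```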

5. Ignoring CrashLoopBackOff

CrashLoopBackOff is a symptom, not a cause. Check container logs (kubectl logs <pod> --previous) to see the previous (crashed) container's output.
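
To find affected containers programmatically, you can inspect the pod's status as returned by kubectl get pod <pod> -o json. The sketch below is a minimal example that works on a plain dictionary. The sample_pod data is hypothetical and trimmed to the fields the function reads.

```python
from typing import Any

def crash_looping_containers(pod: dict[str, Any]) -> list[tuple[str, int]]:
    """Return (container name, restart count) for containers in CrashLoopBackOff."""
    found = []
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for cs in statuses:
        waiting = (cs.get("state") or {}).get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            found.append((cs["name"], cs.get("restartCount", 0)))
    return found

# Hypothetical, trimmed-down pod object for illustration
sample_pod = {
    "status": {
        "phase": "Running",
        "containerStatuses": [
            {"name": "app", "restartCount": 5,
             "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
            {"name": "proxy", "restartCount": 0,
             "state": {"running": {"startedAt": "2026-01-01T00:00:00Z"}}},
        ],
    }
}

for name, restarts in crash_looping_containers(sample_pod):
    print(f"{name}: CrashLoopBackOff after {restarts} restarts")
```

Note that the sample pod's phase is still Running even though one container is crash looping, which is why the phase alone is not enough for diagnosis.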

6. No PreStop Hook for Stateful Workloads

Databases and queue consumers need PreStop hooks to deregister from the cluster before shutting down. Without one, other nodes keep trying to route traffic to a dying pod.

7. RestartPolicy Always for Batch Jobs

Use restartPolicy: OnFailure or Never for batch jobs. The default Always restarts a completed job container, causing it to run again unnecessarily.

Practice Questions

1. What is the difference between init containers and sidecar containers?

Init containers run to completion before the main container starts. Sidecar containers run alongside the main container for the pod's entire lifetime (e.g., log shippers, proxies). Init containers are sequential; sidecars are parallel.

2. Why does a pod stay in Pending state?

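Pending means the pod has been accepted by the cluster but its containers are not all running yet: either it hasn't been scheduled to a node, or it is scheduled and still pulling images or running init containers. Common causes: insufficient resources (CPU/memory), node selector or taint mismatch, persistent volume claim not bound, slow or failing image pulls, or scheduler issues.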
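
The usual first check can be sketched as a function over the pod's JSON status: look at the PodScheduled condition, then at any waiting containers. The function name and the sample data below are hypothetical. The field names follow the output of kubectl get pod -o json.

```python
def explain_pending(pod: dict) -> str:
    """Give a one-line reason why a pod is still Pending."""
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return "pod is not Pending"
    # 1. Has the scheduler placed the pod on a node?
    for cond in status.get("conditions") or []:
        if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
            return f"not scheduled: {cond.get('reason', 'unknown')}: {cond.get('message', '')}"
    # 2. Scheduled, so look for init or app containers that are still waiting
    statuses = (status.get("initContainerStatuses") or []) + (status.get("containerStatuses") or [])
    for cs in statuses:
        waiting = (cs.get("state") or {}).get("waiting")
        if waiting:
            return f"container {cs['name']} waiting: {waiting.get('reason', 'unknown')}"
    return "scheduled; waiting for containers to start"

# Hypothetical examples
unschedulable = {"status": {"phase": "Pending", "conditions": [
    {"type": "PodScheduled", "status": "False", "reason": "Unschedulable",
     "message": "0/3 nodes are available: 3 Insufficient cpu."}]}}
pulling = {"status": {"phase": "Pending",
    "conditions": [{"type": "PodScheduled", "status": "True"}],
    "containerStatuses": [{"name": "main", "state": {"waiting": {"reason": "ImagePullBackOff"}}}]}}

print(explain_pending(unschedulable))
print(explain_pending(pulling))
```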

3. How does kubectl delete pod behave differently from kubectl delete pod --force?

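Without --force, the kubelet runs any PreStop hook, sends SIGTERM, and gives the containers up to terminationGracePeriodSeconds to shut down before sending SIGKILL. With --force (usually combined with --grace-period=0), the pod object is removed from the API server immediately, without waiting for the kubelet to confirm that the containers have stopped, so processes may keep running on the node for a short time.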

4. What happens to pods when a node fails?

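When the node stops reporting for node-monitor-grace-period (a controller-manager flag; 40s is the long-standing default), its Ready condition becomes Unknown and the node is tainted with node.kubernetes.io/unreachable. Pods tolerate that taint for 300 seconds by default (tolerationSeconds), after which they are evicted; pods managed by a controller such as a Deployment are recreated on healthy nodes. Older clusters expressed the same 5-minute delay through the pod-eviction-timeout flag.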

5. Challenge: Design a zero-downtime deployment using pod lifecycle hooks.

A web service with 10 replicas. During rollout, new pods must be fully ready before old pods are terminated. Design the probe configuration, termination grace period, and PreStop hook needed for zero-downtime updates.
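
One possible starting point, not a definitive answer, is sketched below as a Python dictionary that prints a Deployment manifest as JSON. The image name, the /readyz path, and the timing constants are assumptions made for illustration. The key ideas are maxUnavailable: 0 (no old pod is removed until a new one is Ready), a readiness probe, and a PreStop sleep that gives endpoint removal time to propagate.

```python
import json

PRE_STOP_SLEEP = 5   # assumed seconds for endpoint removal to propagate
DRAIN_SECONDS = 10   # assumed time to finish in-flight requests

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 10,
        "selector": {"matchLabels": {"app": "web"}},
        "strategy": {
            "type": "RollingUpdate",
            # Never drop below 10 ready pods; add at most 2 extra during rollout
            "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 2},
        },
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                # PreStop time plus twice the drain time
                "terminationGracePeriodSeconds": PRE_STOP_SLEEP + 2 * DRAIN_SECONDS,
                "containers": [{
                    "name": "web",
                    "image": "registry.example.com/web:1.0",  # placeholder image
                    "ports": [{"containerPort": 8080}],
                    "readinessProbe": {
                        "httpGet": {"path": "/readyz", "port": 8080},
                        "periodSeconds": 5,
                    },
                    "lifecycle": {
                        "preStop": {"exec": {"command": ["sh", "-c", f"sleep {PRE_STOP_SLEEP}"]}},
                    },
                }],
            },
        },
    },
}

print(json.dumps(deployment, indent=2))
```

kubectl apply accepts JSON as well as YAML, so the printed manifest can be saved to a file and applied directly once the placeholders are replaced.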

Mini Project: Pod Lifecycle Simulator

import time
import random

class Pod:
    def __init__(self, name: str):
        self.name = name
        self.phase = "Pending"
        self.ready = False
        self.init_complete = False
        self.startup_ok = False
        self.restart_count = 0
        self.deleted = False

    def tick(self):
        if self.deleted:
            # Deletion overrides every other transition
            self.phase = "Terminating"
            self.ready = False
            return

        if self.phase == "Pending":
            # Simulate init containers finishing on a random tick
            if random.random() < 0.5:
                self.init_complete = True
            if self.init_complete:
                # Simplification: startup and readiness probes pass immediately
                self.phase = "Running"
                self.startup_ok = True
                self.ready = True
            return

        if self.phase == "Running" and random.random() < 0.05:
            # Simulated container crash
            self.restart_count += 1
            if self.restart_count > 3:
                # Not a real pod phase; it mirrors the STATUS column kubectl shows
                self.phase = "CrashLoopBackOff"
                self.ready = False

    def show(self):
        status = f"{self.phase:>15}"
        if self.phase == "Running":
            status += f" (restarts: {self.restart_count})"
        return f"Pod {self.name:>10}: {status} | Ready: {str(self.ready):>5}"

pods = [Pod(f"web-{i}") for i in range(3)]
for _ in range(20):
    for p in pods:
        p.tick()
    for p in pods:
        print(p.show())
    print()
    time.sleep(0.3)

Sample output (first three ticks; transitions are random, so each run differs):

Pod      web-0:         Pending | Ready: False
Pod      web-1:         Pending | Ready: False
Pod      web-2:         Pending | Ready: False

Pod      web-0:         Running (restarts: 0) | Ready:  True
Pod      web-1:         Running (restarts: 0) | Ready:  True
Pod      web-2:         Pending | Ready: False

Pod      web-0:         Running (restarts: 0) | Ready:  True
Pod      web-1:         Running (restarts: 0) | Ready:  True
Pod      web-2:         Running (restarts: 0) | Ready:  True

FAQ

What is the difference between a liveness probe and a readiness probe?

A liveness probe determines if the container should be restarted (is the app deadlocked?). A readiness probe determines if the container should receive traffic (is the app ready to serve?). A failing readiness probe removes the pod from Service endpoints but doesn't restart it.

How many init containers can a pod have?

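A pod can have any number of init containers. They run sequentially, each one starting only after the previous one exits successfully. The pod's phase stays Pending until all of them complete; kubectl shows the progress in the STATUS column as Init:N/M. If an init container fails, the kubelet restarts it until it succeeds, unless restartPolicy is Never, in which case the pod is marked Failed.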
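
The Init:N/M progress indicator can be approximated from the pod's JSON status. The sketch below is a simplified illustration, not kubectl's actual logic (which also reports states such as Init:Error). The function name and sample data are hypothetical.

```python
def init_progress(pod: dict) -> str:
    """Summarise init container progress in a kubectl-like 'Init:N/M' form."""
    statuses = pod.get("status", {}).get("initContainerStatuses") or []
    total = len(statuses)
    done = 0
    for cs in statuses:
        terminated = (cs.get("state") or {}).get("terminated")
        # An init container counts as done only if it exited with code 0
        if terminated and terminated.get("exitCode") == 0:
            done += 1
    if done == total:
        return "init complete"
    return f"Init:{done}/{total}"

# Hypothetical status for the init-demo pod after the download step finished
sample = {"status": {"initContainerStatuses": [
    {"name": "init-download", "state": {"terminated": {"exitCode": 0}}},
    {"name": "init-check", "state": {"running": {}}},
]}}

print(init_progress(sample))  # Init:1/2
```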

What is the startup probe used for?

The startup probe disables liveness and readiness checks until it succeeds. This prevents slow-starting containers (30-60s initialization) from being killed by liveness probes before they're ready. Once the startup probe succeeds, liveness and readiness probes take over.

What's Next


Congratulations on completing this pod lifecycle guide! Here's where to go from here:

Practice daily — Inspect pod states in your cluster
Build a project — Create a deployment with init containers and probes
Explore related topics — Container lifecycle hooks, pod disruption budgets, taints and tolerations
Join the community — Share your lifecycle debugging stories and get feedback

Remember: every expert was once a beginner. Keep orchestrating!
